GraalでのOSR - Fight the Future

以前JIT コンパイルでのOSR（on-stack replacement）について見ました。

jyukutyo.hatenablog.com

次は、GraalでのOSRの実装を見てみます。org.graalvm.compiler.hotspot.phases.OnStackReplacementPhaseです。各Phaseクラスではrun(StructuredGraph)を実装しなければなりません。OnStackReplacementPhaseクラスではrun()メソッドがすごく長いです。

public class OnStackReplacementPhase extends Phase {
...
    @Override
    protected void run(StructuredGraph graph) {
        DebugContext debug = graph.getDebug();
        if (graph.getEntryBCI() == JVMCICompiler.INVOCATION_ENTRY_BCI) {
            // This happens during inlining in a OSR method, because the same phase plan will be
            // used.
            assert graph.getNodes(EntryMarkerNode.TYPE).isEmpty();
            return;
        }
        debug.dump(DebugContext.DETAILED_LEVEL, graph, "OnStackReplacement initial at bci %d", graph.getEntryBCI());

        EntryMarkerNode osr;
        int maxIterations = -1;
        int iterations = 0;

        final EntryMarkerNode originalOSRNode = getEntryMarker(graph);
        final LoopBeginNode originalOSRLoop = osrLoop(originalOSRNode);
        final boolean currentOSRWithLocks = osrWithLocks(originalOSRNode);
...

まずEntryBCIがJVMCICompiler.INVOCATION_ENTRY_BCIか確認します。BCI = ByteCode Index（バイトコードインデックス）です。インデックスがINVOCATION_ENTRY_BCIと同じであれば、それはOSRでないコンパイルリクエストなので、returnして終了です。INVOCATION_ENTRY_BCIの実際の値は-1です。BCIはJVMCIのCompilationRequestオブジェクトから取得できます。

package jdk.vm.ci.code;

/**
 * Represents a request to compile a method.
 */
public class CompilationRequest {
...
    /**
     * Gets the bytecode index (BCI) at which to start compiling where -1 denotes a non-OSR
     * compilation request and all other values denote an on stack replacement (OSR) compilation
     * request.
     */
    public int getEntryBCI() {
        return entryBCI;
    }

次にEntryMarkerNodeノードをグラフから取り出します。

/**
 * This node will be inserted at point specified by {@link StructuredGraph#getEntryBCI()}, usually
 * by the graph builder.
 */
@NodeInfo(allowedUsageTypes = Association, cycles = CYCLES_0, size = SIZE_0)
public final class EntryMarkerNode extends BeginStateSplitNode implements IterableNodeType, LIRLowerable {

これはOSR対象であるグラフに1つだけ設定されています。

さらにLoopBeginNodeループ開始ノード、モニタでのロックがあるかを取り出します。

        do {
            osr = getEntryMarker(graph);
            LoopsData loops = new LoopsData(graph);
            // Find the loop that contains the EntryMarker
            Loop<Block> l = loops.getCFG().getNodeToBlock().get(osr).getLoop();
            if (l == null) {
                break;
            }

            iterations++;
            if (maxIterations == -1) {
                maxIterations = l.getDepth();
            } else if (iterations > maxIterations) {
                throw GraalError.shouldNotReachHere();
            }
            // Peel the outermost loop first
            while (l.getParent() != null) {
                l = l.getParent();
            }

            LoopTransformations.peel(loops.loop(l));
            osr.replaceAtUsages(InputType.Guard, AbstractBeginNode.prevBegin((FixedNode) osr.predecessor()));
            for (Node usage : osr.usages().snapshot()) {
                EntryProxyNode proxy = (EntryProxyNode) usage;
                proxy.replaceAndDelete(proxy.value());
            }
            GraphUtil.removeFixedWithUnusedInputs(osr);
            debug.dump(DebugContext.DETAILED_LEVEL, graph, "OnStackReplacement loop peeling result");
        } while (true);

loops.getCFG().getNodeToBlock().get(osr).getLoop()を見ます。

CFGは"Control Flow Graph: 制御フローグラフ"です。Graalのグラフオブジェクトはさまざまな情報を持っていますので、これをCFGとして扱えるようラップしたオブジェクトを使います。さらに、ノードから対応するブロックを取り出せるようにしてEntryMarkerNodeに対応するノードを取り出し、そのLoopオブジェクトを取得しています。

maxIterationsはローカル変数で初期値は-1です。

"Peel the outermost loop first"というコメントがあります。ループのピーリング（ループカウンタをアップして、ループ内の処理をループの外側に通常のコードとして配置する）をするために、もっとも外側のループから始めたいからです。Loopオブジェクトはさらに外側のループをparentとして持っています。そのあとのLoopTransformations.peel()でピーリングしているようです。

そのあとはノードの削除をしてるようですが、どういうことかは僕はわかっていません。

        FrameState osrState = osr.stateAfter();
        osr.setStateAfter(null);
        OSRStartNode osrStart = graph.add(new OSRStartNode());
        StartNode start = graph.start();
        FixedNode next = osr.next();
        osr.setNext(null);
        osrStart.setNext(next);
        graph.setStart(osrStart);
        osrStart.setStateAfter(osrState);

EntryMarkerNodeからFrameStateを取得します。FrameStateは抽象解釈において特定地点でのframe state（ローカル変数とオペランドスタック）をカプセル化したオブジェクトです。

StartNodeは後に使います。EntryMarkerNodeの次のノードを、新しく生成したOSRStartNodeのnextに設定します。ノードの移動でしょう。なぜするのか？僕にはまだ見えていません。

        final int localsSize = osrState.localsSize();
        final int locksSize = osrState.locksSize();

localsSizeはローカル変数のサイズ、locksSizeはframe stateの現在のサイズ（つまり高さ）です。

        for (int i = 0; i < localsSize + locksSize; i++) {
            ValueNode value = null;
            if (i >= localsSize) {
                value = osrState.lockAt(i - localsSize);
            } else {
                value = osrState.localAt(i);
            }
            if (value instanceof EntryProxyNode) {
                EntryProxyNode proxy = (EntryProxyNode) value;
                /*
                 * We need to drop the stamp since the types we see during OSR may be too precise
                 * (if a branch was not parsed for example). In cases when this is possible, we
                 * insert a guard and narrow the OSRLocal stamp at its usages.
                 */
                Stamp narrowedStamp = proxy.value().stamp();
                Stamp unrestrictedStamp = proxy.stamp().unrestricted();
                ValueNode osrLocal;
                if (i >= localsSize) {
                    osrLocal = graph.addOrUnique(new OSRLockNode(i - localsSize, unrestrictedStamp));
                } else {
                    osrLocal = graph.addOrUnique(new OSRLocalNode(i, unrestrictedStamp));
                }
                // Speculate on the OSRLocal stamps that could be more precise.
                OSRLocalSpeculationReason reason = new OSRLocalSpeculationReason(osrState.bci, narrowedStamp, i);
                if (graph.getSpeculationLog().maySpeculate(reason) && osrLocal instanceof OSRLocalNode && value.getStackKind().equals(JavaKind.Object) && !narrowedStamp.isUnrestricted()) {
                    // Add guard.
                    LogicNode check = graph.addOrUniqueWithInputs(InstanceOfNode.createHelper((ObjectStamp) narrowedStamp, osrLocal, null, null));
                    JavaConstant constant = graph.getSpeculationLog().speculate(reason);
                    FixedGuardNode guard = graph.add(new FixedGuardNode(check, DeoptimizationReason.OptimizedTypeCheckViolated, DeoptimizationAction.InvalidateRecompile, constant, false));
                    graph.addAfterFixed(osrStart, guard);

                    // Replace with a more specific type at usages.
                    // We know that we are at the root,
                    // so we need to replace the proxy in the state.
                    proxy.replaceAtMatchingUsages(osrLocal, n -> n == osrState);
                    osrLocal = graph.addOrUnique(new PiNode(osrLocal, narrowedStamp, guard));
                }
                proxy.replaceAndDelete(osrLocal);
            } else {
                assert value == null || value instanceof OSRLocalNode;
            }
        }

いやーわかりません。さらにまだ続くのですが、このアプローチではここまでのようです。