Shipping a CNN game AI on-device in Expo with ONNX (the linking, the testing, the config plugin)

A 303 KB distilled AlphaZero lives inside two of my iOS/Android games. Here's the three-part Expo integration that wasn't in any tutorial: the autolinking hole, the Jest sandbox workaround, and the model pipeline.

Jelmata and Cell Division both ship an “Elite” AI difficulty that uses a small convolutional neural network, on-device, no network call. Jelmata’s board is 10×10 and the model is 303 KB. Cell Division’s board is 8×8 and the model is 262 KB. Both load in about 40 ms and run inference in tens of milliseconds on CPU through the native ONNX runtime — fast enough to feel instant on a five-year-old Android phone with no GPU. The .onnx file sits next to the app icon in the bundle.

Getting there took three specific integration details that were either absent from every tutorial I read or scattered across a dozen. This post collects them.

Why a CNN at all?

Easy / Medium / Hard difficulty tiers use inline heuristics — openness, boundary penalty, immediate score delta, that kind of thing. They’re beatable by anyone who notices the three rules. For Elite to feel meaningfully stronger, it has to play moves that don’t reduce to a one-liner.

A policy network trained to imitate a much stronger teacher does exactly that, and it does it for roughly 300 KB of weights. That’s smaller than most of the PNGs in the app bundle. At that size, “just ship the model” stops being an exotic choice.

Training in two stages (why the model on-device isn’t the model I trained)

Each game has its own training pipeline in ai/scripts/. Two stages:

  1. Teacher (AlphaZero-style). A ResNet with 64 channels and 6 residual blocks, policy + value heads, trained via MCTS self-play (80 simulations per position). This is the “real” model. It’s several megabytes and too large to ship.
  2. Student (distillation). A SmallPolicyNet with 32 channels, 3 residual blocks, policy head only. It’s trained by imitation learning on the teacher’s MCTS move distributions — not on game outcomes, not with its own MCTS at inference time. It just predicts “what would the teacher do here?” in one forward pass.

After both stages comes the export step: export_elite_onnx.py takes the student PyTorch checkpoint and emits an opset-13 ONNX model with a dynamic batch dimension. That .onnx is what gets bundled.

The student model doesn’t have to reconstruct the teacher’s reasoning. It has to mimic it on the move distribution, which is a much easier learning problem than beating a strong opponent from scratch. Cell Division’s student is about 65K parameters for the 8×8 board; Jelmata’s is a bit larger for the 10×10 grid. Either way you get most of the way to AlphaZero-level play at a small fraction of the teacher’s size.

The input/output shape that ships

Input [1, 3, N, N] where N is the board dimension (10 for Jelmata, 8 for Cell Division):

  • Channel 0: current player’s stones (1.0 / 0.0).
  • Channel 1: opponent’s stones.
  • Channel 2: valid-cell mask (1.0 where an empty legal cell exists).
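Packing those channels into the flat buffer is mechanical but easy to get wrong by one stride. A minimal sketch, where the Cell type and board layout are illustrative rather than the shipped ones:

```typescript
// Illustrative cell states; the real game's board representation differs.
type Cell = 'me' | 'them' | 'empty' | 'blocked';

// Encode a board into the flat [1, 3, n, n] observation (batch dim implicit).
function encodeBoard(board: Cell[][], n: number): Float32Array {
  const obs = new Float32Array(3 * n * n);
  for (let y = 0; y < n; y++) {
    for (let x = 0; x < n; x++) {
      const i = y * n + x;
      const cell = board[y][x];
      if (cell === 'me') obs[i] = 1;                // channel 0: my stones
      else if (cell === 'them') obs[n * n + i] = 1; // channel 1: opponent
      if (cell === 'empty') obs[2 * n * n + i] = 1; // channel 2: legal mask
    }
  }
  return obs;
}
```

Channel c for cell (x, y) lands at c·N·N + y·N + x, which is exactly the NCHW layout the [1, 3, N, N] input expects.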

Output [1, N*N] — policy logits over grid positions. Argmax for strict Elite play; sampled softmax with a temperature if I want an “elite-but-slightly-loose” variant.

import * as OnnxRuntime from 'onnxruntime-react-native';

const inputTensor = new OnnxRuntime.Tensor('float32', obs, [1, 3, MAX_BOARD, MAX_BOARD]);
const results = await session.run({ obs: inputTensor });
const logits = results['logits'].data as Float32Array;

That’s the hot path. Constructing the input tensor is the most allocation-heavy line in the whole move-selection routine, and it still finishes in single-digit milliseconds on mid-range hardware.
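The argmax-versus-temperature choice from the output description reduces to a few lines. A sketch (function names are mine, not the app’s) that masks illegal cells before either selection:

```typescript
// Pick the highest-logit legal cell (strict Elite play).
function argmaxMove(logits: Float32Array, mask: Float32Array): number {
  let best = -1, bestVal = -Infinity;
  for (let i = 0; i < logits.length; i++) {
    if (mask[i] === 1 && logits[i] > bestVal) { best = i; bestVal = logits[i]; }
  }
  return best; // flat index; gridX = best % N, gridY = Math.floor(best / N)
}

// Softmax sampling over legal cells with temperature temp > 0.
function sampleMove(logits: Float32Array, mask: Float32Array, temp: number): number {
  const scaled = Array.from(logits, (v, i) => (mask[i] === 1 ? v / temp : -Infinity));
  const max = Math.max(...scaled);
  const exps = scaled.map((v) => Math.exp(v - max)); // illegal cells become 0
  const total = exps.reduce((a, b) => a + b, 0);
  let r = Math.random() * total;
  for (let i = 0; i < exps.length; i++) {
    r -= exps[i];
    if (r <= 0) return i;
  }
  return argmaxMove(logits, mask); // floating-point fallback
}
```

As temp approaches 0 the sampled distribution collapses onto the argmax move; near 1.0 it tracks the raw policy, which is where an “elite-but-slightly-loose” variant lives.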

Integration piece 1: the autolinking gap

This is the thing that cost me an afternoon.

onnxruntime-react-native doesn’t ship a react-native.config.js. Expo’s prebuild autolinking walks node_modules looking for that file on each package. Packages without it get silently skipped.

The failure mode is devious: the JavaScript import works, so nothing breaks at module-load time. You only discover the problem when you try to create an InferenceSession:

function isOnnxAvailable(): boolean {
  try {
    const { NativeModules } = require('react-native');
    return NativeModules.Onnxruntime != null;
  } catch {
    return false;
  }
}

NativeModules.Onnxruntime is null. No stack trace. No build error. Everything looked fine. Classic silent failure.
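Built on that check, a guarded session factory lets move selection degrade to a heuristic evaluator instead of throwing at the first inference. A sketch, repeating isOnnxAvailable so it stands alone; createEliteSession and the fallback contract are illustrative:

```typescript
function isOnnxAvailable(): boolean {
  try {
    const { NativeModules } = require('react-native');
    return NativeModules.Onnxruntime != null;
  } catch {
    return false;
  }
}

// Returns a live session when the native module is linked, or null so the
// caller can fall back to the linear heuristic. Illustrative sketch.
async function createEliteSession(modelPath: string): Promise<unknown | null> {
  if (!isOnnxAvailable()) return null; // autolinking gap: don't even try
  const { InferenceSession } = require('onnxruntime-react-native');
  return InferenceSession.create(modelPath);
}
```

A null here is also exactly what a plain Node environment produces, since require('react-native') throws outside Metro.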

Integration piece 2: a twenty-line Expo config plugin

The fix is to register the Android native package explicitly in MainApplication.kt. Doing that by editing the generated file is pointless — npx expo prebuild --clean wipes it — so I wrote a config plugin:

// plugins/withOnnxruntimePackage.js
const { withMainApplication } = require('@expo/config-plugins');

const IMPORT_LINE = 'import ai.onnxruntime.reactnative.OnnxruntimePackage';
const PACKAGE_LINE = '              add(OnnxruntimePackage())';

module.exports = function withOnnxruntimePackage(config) {
  return withMainApplication(config, (config) => {
    let contents = config.modResults.contents;

    if (!contents.includes(IMPORT_LINE)) {
      contents = contents.replace(
        /^(package .+\n)/m,
        `$1\n${IMPORT_LINE}\n`,
      );
    }

    if (!contents.includes('OnnxruntimePackage()')) {
      contents = contents.replace(
        /(override fun getPackages\(\).*\n.*\.apply \{)\n/,
        `$1\n${PACKAGE_LINE}\n`,
      );
    }

    config.modResults.contents = contents;
    return config;
  });
};

Registered via app.config.js:

plugins: [
  'onnxruntime-react-native',
  './plugins/withOnnxruntimePackage',
]

This pattern generalizes. Any time you use a native React Native package that doesn’t ship autolinking config, the move is:

  1. Identify the import statement and add(…) line you’d manually add to MainApplication.kt.
  2. Write a withMainApplication plugin that idempotently inserts them.
  3. Register the plugin.

It survives prebuilds, it’s readable, and it makes the native integration part of your repo rather than a “remember to edit this file after prebuild” footgun.

Integration piece 3: the Jest sandbox workaround

This one you run into when you try to test the inference path.

onnxruntime-node (the Node build, used in unit tests) uses N-API buffers under the hood. Jest runs each test file inside a VM sandbox, which rewraps typed arrays like Float32Array. The consequence: when the native module does input instanceof Float32Array, the check returns false even though the array is a Float32Array — just one from a different V8 realm. Inference throws.
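The realm mismatch is easy to reproduce without Jest at all, using Node’s built-in vm module:

```typescript
import * as vm from 'node:vm';

// Construct a typed array inside a separate V8 context (a different "realm").
const context = vm.createContext({});
const foreign = vm.runInContext('new Float32Array(4)', context);

// Same binary layout, different constructor identity:
console.log(foreign instanceof Float32Array); // false
console.log(ArrayBuffer.isView(foreign));     // true
```

Realm-agnostic checks like ArrayBuffer.isView pass where instanceof fails, but that check lives inside the native binding, not in your test code, which is why the practical fix is to move inference out of the sandbox rather than patch the check.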

The workaround that actually works: run inference in a child process, outside the Jest sandbox.

// inside the test
const raw = execSync(`node test/cnn-inference-worker.js`, {
  input: JSON.stringify(board),
}).toString();
const { gridX, gridY } = JSON.parse(raw);

test/cnn-inference-worker.js is a standalone Node script. It reads a board from stdin, loads the .onnx, runs inference, prints {gridX, gridY} as JSON. Slower per call than in-process inference, but that’s acceptable for test suites — you’re running dozens of inferences, not millions — and it means your tests exercise the real ONNX runtime instead of a mock.
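A stripped-down version of such a worker might look like this; the model path, input format, and guard are assumptions, not the real script:

```typescript
// Worker sketch: read a board from stdin, run the model, print the cell as JSON.
function argmax(a: Float32Array): number {
  let best = 0;
  for (let i = 1; i < a.length; i++) if (a[i] > a[best]) best = i;
  return best;
}

// Convert a flat policy index back to grid coordinates.
function indexToGrid(i: number, n: number) {
  return { gridX: i % n, gridY: Math.floor(i / n) };
}

async function main() {
  // Lazy require: onnxruntime-node loads its native binding here, which is
  // exactly what works in a plain child process and breaks under Jest.
  const ort = require('onnxruntime-node');
  const chunks: Buffer[] = [];
  for await (const chunk of process.stdin) chunks.push(chunk);
  const { n, obs } = JSON.parse(Buffer.concat(chunks).toString());
  const session = await ort.InferenceSession.create('assets/elite.onnx');
  const input = new ort.Tensor('float32', Float32Array.from(obs), [1, 3, n, n]);
  const { logits } = await session.run({ obs: input });
  process.stdout.write(JSON.stringify(indexToGrid(argmax(logits.data), n)));
}

if (process.env.RUN_WORKER) main(); // guard keeps importing the file side-effect-free
```

Invoked from the test via execSync with the board serialized over stdin, it exercises the real runtime end to end; the pure helpers above remain unit-testable in-process.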

Integration piece 4: the tournament test

With inference working in tests, I can assert the CNN is actually stronger than the linear baselines in CI.

src/engine/ai/tournament.test.ts runs a round-robin across every difficulty tier, every board size from 5×5 through 8×8, and both player orders. The shape of the assertions is:

  • Hard beats Easy more than 50% of the time.
  • Elite-CNN beats Elite-linear more than some threshold.
  • Lower-tier beating higher-tier outside of statistical noise is a test failure.
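Stripped of the engine, the statistical core of those assertions is a win-rate helper that alternates player order. A sketch; Picker and playGame are stand-ins for the real move loop:

```typescript
// A Picker chooses a flat move index; playGame (hypothetical) plays one
// full game and names the winner by seat.
type Picker = (obs: Float32Array) => number;
type PlayGame = (p1: Picker, p2: Picker) => 'p1' | 'p2';

function winRate(a: Picker, b: Picker, games: number, playGame: PlayGame): number {
  let wins = 0;
  for (let g = 0; g < games; g++) {
    const aFirst = g % 2 === 0; // alternate seats to cancel first-move advantage
    const winner = aFirst ? playGame(a, b) : playGame(b, a);
    if (winner === (aFirst ? 'p1' : 'p2')) wins++;
  }
  return wins / games;
}

// In the suite, e.g.: expect(winRate(eliteCnn, eliteLinear, 100, playGame)).toBeGreaterThan(0.6);
```

Alternating who moves first is what lets “more than 50%” measure strength rather than first-move advantage.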

If I change the training pipeline and the student drifts — say, I increase the teacher’s temperature too much and the student learns worse moves — the tournament test catches it before the model ships. This is the second place the student model earns its keep. The first is inference latency on-device; the second is regression-proofing in CI.

Two things I wouldn’t do

  • Ship onnxruntime-web inside a React Native app. The WebAssembly build is excellent for the browser, but using it inside a WebView inside React Native is two layers of indirection where you already have a native bridge available. onnxruntime-react-native is CPU, simple, and reliably fast enough.
  • Build platform-specific variants (Core ML / NNAPI) before there’s a reason. A 300 KB model does CPU inference in low tens of milliseconds, which is a rounding error on a human’s move-thinking time. If the model grows 10× or I start doing continuous inference during animation, that calculation changes.

The meta-pattern

Last week I wrote about bundling static assets into the Docker image instead of fetching them at boot. Different context, same underlying move:

Put the intelligent thing inside the app bundle. Stop treating it like it needs a network.

A 300 KB model plus a 20-line config plugin is a game AI that five years ago would have required a server, a subscription, and a telemetry story. “Indie can’t ship real ML” is a constraint that exists mostly in people’s heads. Small distilled models, native runtimes that already speak ONNX, and a build-time export pipeline make it mostly a plumbing problem. The plumbing is what this post was about.


If you want to see the result: Jelmata and Cell Division are both on the App Store, and Elite difficulty on either is the CNN. It doesn’t lose to obvious heuristics, and that’s the point.

For the player-facing angle on the same AI — why it feels instant, why it’s offline, what Elite actually is from a player’s perspective — there are companion posts on each game’s own blog: How Elite Plays: The AI Inside Cell Division and How Jelmata’s Elite AI Plays.