<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[tapflow]]></title><description><![CDATA[A self-hosted Appetize / BrowserStack alternative for mobile QA teams]]></description><link>https://tapflow.hashnode.dev</link><image><url>https://cdn.hashnode.com/uploads/logos/6a184e59badcd8afcba8296c/a64917de-7d4c-42dc-a283-da2dc25969f0.png</url><title>tapflow</title><link>https://tapflow.hashnode.dev</link></image><generator>RSS for Node</generator><lastBuildDate>Sun, 21 Jun 2026 17:05:23 GMT</lastBuildDate><atom:link href="https://tapflow.hashnode.dev/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[The DX we wanted for tapflow setup: a host-ready Mac in one command]]></title><description><![CDATA[tapflow streams iOS simulators and Android emulators into a browser, so a whole team can test an app without installing anything. 
The simulators run on a Mac that hosts the tapflow agent, and that Mac]]></description><link>https://tapflow.hashnode.dev/the-dx-we-wanted-for-tapflow-setup-a-host-ready-mac-in-one-command</link><guid isPermaLink="true">https://tapflow.hashnode.dev/the-dx-we-wanted-for-tapflow-setup-a-host-ready-mac-in-one-command</guid><category><![CDATA[Node.js]]></category><category><![CDATA[cli]]></category><category><![CDATA[Open Source]]></category><category><![CDATA[devtools]]></category><category><![CDATA[Mobile Development]]></category><category><![CDATA[mobile testing,]]></category><category><![CDATA[testing tools]]></category><category><![CDATA[iOS]]></category><category><![CDATA[Android]]></category><category><![CDATA[React Native]]></category><category><![CDATA[Flutter]]></category><dc:creator><![CDATA[duchan jo]]></dc:creator><pubDate>Wed, 17 Jun 2026 14:00:05 GMT</pubDate><content:encoded><![CDATA[<p><a href="https://github.com/jo-duchan/tapflow">tapflow</a> streams iOS simulators and Android emulators into a browser, so a whole team can test an app without installing anything. The simulators run on a Mac that hosts the tapflow agent, and that Mac needs the mobile toolchain: Xcode, a simulator runtime, the Android SDK, an emulator, AVDs.</p>
<p>tapflow already gives the people who <em>use</em> it a zero-install experience. We wanted the person who <em>hosts</em> it to get the same one. So <code>tapflow setup</code> brings the whole host environment up in as close to one command as the toolchain allows.</p>
<p><a class="embed-card" href="https://youtu.be/RTLBJIrHf9M">https://youtu.be/RTLBJIrHf9M</a></p>

<p>Here are the DX decisions behind it.</p>
<hr />
<h2><code>doctor</code> diagnoses, <code>setup</code> installs and configures</h2>
<p>We split diagnosis from the steps that install and configure.</p>
<p><code>tapflow doctor</code> is read-only: it checks the prerequisites — Xcode, <code>simctl</code>, a simulator runtime; the SDK, <code>adb</code>, an AVD — and reports. It never changes your machine, so it's safe to run anywhere, anytime. A clean run looks like this:</p>
<pre><code class="language-text">tapflow doctor

  ✓  Node v20.11.0

  iOS
  ✓  Xcode 16.2
  ✓  xcrun simctl
  ✓  Simulator available (8)

  Android
  ✓  Android SDK: ~/Library/Android/sdk
  ✓  adb found: ~/Library/Android/sdk/platform-tools/adb
  ✓  AVD available: tapflow-phone

  All checks passed.
</code></pre>
<p>When something's missing, each failing line carries the exact fix — <code>⚠ AVD → No AVD found. Run: tapflow setup android</code> — so <code>doctor</code> always points straight at <code>setup</code>.</p>
<p><code>tapflow setup</code> is the one command allowed to install and configure. The two mirror each other, so the mutating verb lives in exactly one place. Run <code>setup</code> with no argument and it reads the environment:</p>
<pre><code class="language-typescript">if (process.platform === 'darwin') platforms.push('ios')
if (resolveAdb() !== null) platforms.push('android')
</code></pre>
<p>macOS implies iOS; an existing <code>adb</code> implies you care about Android. If neither signal is there, it asks instead of guessing.</p>
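<p>The detection rule, including the "ask instead of guessing" branch, can be sketched as a pure function. This is an illustration only: <code>detectPlatforms</code> and the <code>'ask'</code> return value are hypothetical names, and the adb path is assumed to come from something like <code>resolveAdb()</code>.</p>

```typescript
type Platform = 'ios' | 'android'

// Hypothetical pure version of the detection step: the host platform and the
// resolved adb path are passed in so the decision is easy to test.
function detectPlatforms(hostPlatform: string, adbPath: string | null): Platform[] | 'ask' {
  const platforms: Platform[] = []
  if (hostPlatform === 'darwin') platforms.push('ios')   // macOS implies iOS
  if (adbPath !== null) platforms.push('android')        // an existing adb implies Android
  // No signal at all: let the caller prompt instead of guessing.
  return platforms.length > 0 ? platforms : 'ask'
}
```

<p>Keeping the decision separate from the prompt means the "neither signal" case can be tested without a terminal.</p>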
<h2>iOS: installed ≠ usable</h2>
<p>Xcode can only come from the App Store, so <code>setup</code> doesn't fake it — it opens the right page and waits.</p>
<p>The part we kept getting wrong was that a freshly installed Xcode isn't a working one. Three steps stand between "the app exists" and "<code>xcodebuild</code> runs": point the active developer directory at Xcode (<code>xcode-select -s</code>), accept the license, and finish first launch. <code>setup</code> runs them for you, after asking, since they need <code>sudo</code>. The check that matters isn't "does Xcode.app exist" — it's whether <code>xcodebuild -version</code> actually runs.</p>
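<p>A minimal sketch of that "does it actually run" check, assuming the command runner is injected so the logic can be exercised on a machine without Xcode. <code>Runner</code> and <code>xcodeVersion</code> are hypothetical names, not tapflow's API.</p>

```typescript
// A runner returns the exit status and stdout of a command. It is injected so
// the check can be exercised without Xcode installed (hypothetical design).
type RunResult = { status: number | null; stdout: string }
type Runner = (cmd: string, args: string[]) => RunResult

// "Usable" means xcodebuild exits 0 and prints a version line such as "Xcode 16.2".
function xcodeVersion(run: Runner): string | null {
  const result = run('xcodebuild', ['-version'])
  if (result.status !== 0) return null
  const match = /^Xcode\s+(\S+)/m.exec(result.stdout)
  return match?.[1] ?? null
}
```

<p>On a real host the runner could wrap <code>spawnSync</code> from <code>node:child_process</code> with <code>encoding: 'utf8'</code>. A non-zero exit (unaccepted license, wrong developer directory) then reads as "not usable" even though Xcode.app exists.</p>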
<h2>Android: a self-contained SDK</h2>
<p>This is the decision we're happiest with.</p>
<p>The obvious path is "install Android Studio." We didn't. The host doesn't need a GUI IDE, and depending on one means fighting whatever SDK location, <code>ANDROID_HOME</code>, and AVDs the user already has. Instead <code>setup</code> builds a self-contained SDK under one path we own:</p>
<pre><code class="language-bash">sdkmanager --sdk_root="$HOME/Library/Android/sdk" \
  "cmdline-tools;latest" "platform-tools" "emulator" \
  "system-images;android-35;google_apis;arm64-v8a"
</code></pre>
<p>After that, every Android binary tapflow touches comes from inside that directory. A couple of details that make it reliable:</p>
<ul>
<li><p><code>sdkmanager</code> needs a JDK or it won't run, so a Temurin check happens first.</p>
</li>
<li><p>The system image is <code>google_apis</code>, not the Play Store one, which is unstable the way we drive it.</p>
</li>
</ul>
<p>The one thing that has to outlive the process is the environment: <code>ANDROID_HOME</code>, plus the SDK tool directories on your <code>PATH</code>. <code>setup</code> writes both into your shell rc inside a marker block, and only if the block isn't already there, so re-running never duplicates it:</p>
<pre><code class="language-bash"># &gt;&gt;&gt; tapflow android sdk &gt;&gt;&gt;
export ANDROID_HOME="$HOME/Library/Android/sdk"
export PATH="$ANDROID_HOME/platform-tools:$ANDROID_HOME/emulator:$PATH"
# &lt;&lt;&lt; tapflow android sdk &lt;&lt;&lt;
</code></pre>
<p>The catch it can't remove: the variable isn't in your <em>current</em> shell, so <code>setup</code> tells you to open a new terminal before <code>doctor</code>.</p>
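<p>The idempotent write can be sketched as a string transform. This is an assumption about the shape of the logic, not the actual implementation: <code>ensureMarkerBlock</code> is a hypothetical helper, and reading or writing the rc file is left to the caller.</p>

```typescript
const BEGIN = '# >>> tapflow android sdk >>>'
const END = '# <<< tapflow android sdk <<<'

// Returns the rc file content with the marker block present exactly once.
// Hypothetical helper: takes and returns strings so file I/O stays with the caller.
function ensureMarkerBlock(rc: string, body: string[]): string {
  if (rc.includes(BEGIN)) return rc          // already written: leave untouched
  const block = [BEGIN, ...body, END].join('\n')
  const separator = rc.length === 0 || rc.endsWith('\n') ? '' : '\n'
  return rc + separator + block + '\n'
}
```

<p>Checking for the begin marker rather than for the <code>export</code> lines themselves keeps the check stable even if the block's contents change between versions.</p>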
<h2>setup prepares; the relay boots</h2>
<p>For both platforms, <code>setup</code> stops at "a bootable device exists." It never boots a simulator or emulator. That's the relay's job — it boots the right device on demand when a teammate joins a QA session. Two components owning device lifecycle would just race each other.</p>
<h2>One rail under all of it</h2>
<p>Every step that changes the machine asks first, and only auto-runs in an interactive terminal. Run <code>setup</code> in CI and instead of curling an install script as root, it prints guidance and exits clean. Nothing in tapflow deletes your data — the only teardown command is <code>reset</code>, which shuts down running simulators.</p>
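<p>One way to express that gate, as a sketch under assumptions: the helper name <code>mutationMode</code> is hypothetical, and treating a non-empty <code>CI</code> environment variable as "running in CI" is a common convention rather than something the post specifies.</p>

```typescript
type Mode = 'prompt' | 'guidance'

// Hypothetical gate: a mutating step may prompt only when a human is attached.
// `isTTY` mirrors process.stdin.isTTY; `ci` mirrors the CI environment variable.
function mutationMode(isTTY: boolean | undefined, ci: string | undefined): Mode {
  const inCi = ci !== undefined && ci !== '' && ci !== 'false'
  return isTTY === true && !inCi ? 'prompt' : 'guidance'
}
```

<p>A caller would pass <code>process.stdin.isTTY</code> and <code>process.env.CI</code>; anything other than <code>'prompt'</code> means print the manual steps and exit without touching the machine.</p>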
<hr />
<h2>Honest limitations</h2>
<ul>
<li><p>Xcode is still a manual App Store download. There's no API for it; <code>setup</code> automates everything around it.</p>
</li>
<li><p>The first run usually needs a new shell for the Android <code>PATH</code> to take effect.</p>
</li>
<li><p>The agent host is a Mac, since that's the only place iOS simulators run. (The relay itself runs on Linux.)</p>
</li>
<li><p>Still v0.x, so the steps will keep moving as the toolchain shifts.</p>
</li>
</ul>
<hr />
<h2>Takeaway</h2>
<p>The downloads were never the hard part. The DX lives in the glue around them — Xcode activation, the <code>PATH</code> that needs a fresh shell, knowing to stop at "bootable" and let the relay boot the rest. Automating a dev environment means automating everything that isn't the install.</p>
<hr />
<h2>Try it</h2>
<p>tapflow is MIT licensed.</p>
<pre><code class="language-bash">npm install -g tapflow
tapflow doctor     # what's missing?
tapflow setup      # set it up
tapflow start
</code></pre>
<ul>
<li><p>🔗 GitHub: <a href="https://github.com/jo-duchan/tapflow">https://github.com/jo-duchan/tapflow</a></p>
</li>
<li><p>📖 Docs: <a href="https://www.tapflow.dev">https://www.tapflow.dev</a></p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[We switched simulator streaming to H.264 and it felt worse. Here's how we fixed the latency.]]></title><description><![CDATA[In an earlier post I described how tapflow streams iOS simulators to the browser: pull frames off the simulator's IOSurface, JPEG-encode them on the Mac, push them over WebSocket at ~30fps.
JPEG has o]]></description><link>https://tapflow.hashnode.dev/we-switched-simulator-streaming-to-h-264-and-it-felt-worse-here-s-how-we-fixed-the-latency</link><guid isPermaLink="true">https://tapflow.hashnode.dev/we-switched-simulator-streaming-to-h-264-and-it-felt-worse-here-s-how-we-fixed-the-latency</guid><dc:creator><![CDATA[duchan jo]]></dc:creator><pubDate>Wed, 10 Jun 2026 07:13:34 GMT</pubDate><content:encoded><![CDATA[<p>In an earlier post I described how <a href="https://github.com/jo-duchan/tapflow">tapflow</a> streams iOS simulators to the browser: pull frames off the simulator's <code>IOSurface</code>, JPEG-encode them on the Mac, push them over WebSocket at ~30fps.</p>
<p>JPEG has one great property for interactive streaming: every frame is independent and decodes instantly. There's no buffer, no inter-frame dependency. On localhost it feels like you're touching the simulator directly.</p>
<p>It also has one terrible property: size. A full-frame JPEG of a scrolling screen is ~590KB. On a LAN that's 12–16 MB/s, and our relay started dropping 16–27 frames a second under backpressure — visible tearing.</p>
<p>So we did the obvious thing and moved to H.264. Bandwidth dropped roughly 140× on a still screen and 5× while scrolling. Drops nearly vanished.</p>
<p>And the stream felt <em>worse</em>.</p>
<p>This post is about why, and the two fixes that got H.264 back to "feels like direct touch."</p>
<hr />
<h2>The bar: localhost JPEG</h2>
<p>Before touching anything I needed a number, not a vibe. So I instrumented the pipeline end to end — a per-stage panel that reports <code>decode→present</code> and <code>glass→glass</code> (capture timestamp to on-screen) latencies live.</p>
<blockquote>
<p>One caveat I'll repeat throughout: <code>glass→glass</code> absolute values are only valid on localhost, where capture and display share one clock. <code>decode→present</code> is a same-machine delta and valid anywhere, so I'll lean on it for the cross-environment claims.</p>
</blockquote>
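<p>For reference, the p50/p95 figures in the tables that follow can be derived from raw per-frame samples with a percentile helper like this one. It is a sketch that assumes the nearest-rank method; the panel's actual method isn't stated, and both function names are illustrative.</p>

```typescript
// Nearest-rank percentile over latency samples in milliseconds.
// p is in (0, 100]; returns NaN for an empty sample set.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) return NaN
  const sorted = [...samples].sort((a, b) => a - b)
  const rank = Math.ceil((p / 100) * sorted.length)   // 1-based rank
  const index = Math.min(sorted.length, Math.max(1, rank)) - 1
  return sorted[index] ?? NaN
}

// decode→present: a delta between two timestamps taken on the same machine,
// which is why it stays valid when capture and display run on different clocks.
function decodeToPresent(decodedAtMs: number, presentedAtMs: number): number {
  return presentedAtMs - decodedAtMs
}
```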
<p>Here's the baseline that mattered, measured on localhost:</p>
<table>
<thead>
<tr>
<th>Path</th>
<th>decode→present p50/p95 (ms)</th>
</tr>
</thead>
<tbody><tr>
<td>JPEG still</td>
<td>12.4 / 15.4</td>
</tr>
<tr>
<td>JPEG scroll</td>
<td>9.4 / 11.6</td>
</tr>
<tr>
<td><strong>H.264 (WebCodecs) still</strong></td>
<td><strong>267 / 274</strong></td>
</tr>
</tbody></table>
<p>H.264 decode was <strong>~20× slower</strong> than JPEG. On a hardware decoder. That made no sense — until I looked at what the decoder was actually doing.</p>
<hr />
<h2>Fix 1: the decoder was buffering 8 frames for no reason</h2>
<p>The transport was clean (~1ms), the input queue was empty. The latency was entirely inside the decoder: it was holding ~8 frames before emitting the first one.</p>
<p>That's a DPB (decoded picture buffer). A decoder reorders frames when B-frames are present — it has to wait for future frames to arrive before it can output the current one in display order. So it buffers up to the level's maximum.</p>
<p>But our encoder is <strong>baseline H.264, B-frames off</strong>. There is no reordering. The actual reorder depth is zero. The decoder was buffering anyway because the bitstream never <em>told</em> it the reorder depth was zero.</p>
<p>The signal lives in the SPS (sequence parameter set), in the <code>bitstream_restriction</code> flags inside VUI. Our VideoToolbox encoder wasn't setting them, so the decoder fell back to the worst case for the level — <code>max_dec_frame_buffering</code> of ~8 frames at Level 5.0.</p>
<p>The fix is to rewrite the SPS and inject the missing declaration:</p>
<pre><code class="language-plaintext">max_num_reorder_frames = 0
max_dec_frame_buffering = num_ref_frames
</code></pre>
<p>We do this in the agent, on the keyframe SPS, before the frame ever leaves the Mac — so <em>every</em> decoder downstream benefits, not just one browser path:</p>
<pre><code class="language-typescript">// agent-core/utils/sps.ts — rewrite the SPS to declare zero reordering
function rewriteLowLatencySps(sps: Uint8Array): Uint8Array {
  const bits = new BitstreamWriter(parseSps(sps))
  bits.vui.bitstreamRestriction = true
  bits.vui.maxNumReorderFrames = 0
  bits.vui.maxDecFrameBuffering = bits.numRefFrames
  return serialize(bits)
}
</code></pre>
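<p>In the H.264 syntax, <code>max_num_reorder_frames</code> and <code>max_dec_frame_buffering</code> are coded as unsigned Exp-Golomb values, <code>ue(v)</code>, so writing them means emitting those codes into the SPS bitstream. Here is a small sketch of the coding, using strings of bits for readability. The helper names are hypothetical and this is not the code in <code>agent-core</code>.</p>

```typescript
// Unsigned Exp-Golomb code ue(v), returned as a string of '0'/'1' for readability.
// A value k is written as (bit length of k+1, minus 1) zero bits, then k+1 in binary.
function encodeUe(k: number): string {
  if (!Number.isInteger(k) || k < 0) throw new RangeError('ue(v) needs a non-negative integer')
  const binary = (k + 1).toString(2)
  return '0'.repeat(binary.length - 1) + binary
}

// Inverse: read one ue(v) starting at `pos`, return the value and the next position.
function decodeUe(bits: string, pos: number): { value: number; next: number } {
  let zeros = 0
  while (bits[pos + zeros] === '0') zeros++
  const end = pos + 2 * zeros + 1
  if (end > bits.length) throw new RangeError('truncated ue(v)')
  return { value: parseInt(bits.slice(pos + zeros, end), 2) - 1, next: end }
}
```

<p>Zero encodes as the single bit <code>1</code>, so declaring <code>max_num_reorder_frames = 0</code> costs one bit. A real rewrite also has to handle emulation-prevention bytes and the RBSP trailing bits, which this sketch leaves out.</p>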
<p>Result on localhost:</p>
<table>
<thead>
<tr>
<th>Path</th>
<th>decode→present p50/p95 (ms)</th>
</tr>
</thead>
<tbody><tr>
<td>H.264 WebCodecs still (before)</td>
<td>267 / 274</td>
</tr>
<tr>
<td><strong>H.264 WebCodecs still (after)</strong></td>
<td><strong>2.5 / 4</strong></td>
</tr>
<tr>
<td><strong>H.264 WebCodecs scroll (after)</strong></td>
<td><strong>2.1 / 3.9</strong></td>
</tr>
</tbody></table>
<p><code>267 → 2.5ms</code>, roughly 100×. The encoder was lying to the decoder by omission, and the decoder defended itself by buffering. One declaration fixed it.</p>
<p>The browser confirms it's receiving the rewrite — the SPS now reports <code>bitstreamRestriction: true, maxNumReorderFrames: 0</code>.</p>
<hr />
<h2>Fix 2: MSE is a buffer you can't turn off</h2>
<p>Fix 1 only helps the WebCodecs path. And WebCodecs has a hard constraint: it only runs in a secure context — HTTPS or localhost.</p>
<p>A team using tapflow over their LAN hits it at plain <code>http://&lt;mac-ip&gt;:4000</code>. That's a non-secure context, so the browser can't use WebCodecs. The fallback at the time was MSE (Media Source Extensions): feed the H.264 into a <code>&lt;video&gt;</code> element through a muxer.</p>
<p>The problem is that <code>&lt;video&gt;</code> <em>is</em> a buffer. It's designed for media playback, where a jitter buffer is a feature. For interactive streaming it's structural latency you can't remove. I measured it on localhost by forcing the MSE tier:</p>
<table>
<thead>
<tr>
<th>Path</th>
<th>decode→present p50/p95 (ms)</th>
</tr>
</thead>
<tbody><tr>
<td>H.264 MSE still</td>
<td>239 / 254</td>
</tr>
<tr>
<td>H.264 MSE scroll</td>
<td>229 / 244</td>
</tr>
</tbody></table>
<p>~235ms, on the <em>same</em> <code>reorder=0</code> stream that WebCodecs decoded in 2.5ms. The SPS fix can't reach this — it's the media-element buffer, not the decoder's DPB. I'd already set the muxer's <code>flushingTime</code> to 0. There was nothing left to shave.</p>
<p>So I stopped trying to make MSE fast and removed it.</p>
<p>The decoder layer is now two tiers, picked automatically per environment:</p>
<pre><code class="language-typescript">// pickDecoder — secure → WebCodecs, otherwise WASM
export function pickDecoder(): Decoder | null {
  if (isSecureContext &amp;&amp; 'VideoDecoder' in window) {
    return new WebCodecsDecoder()      // HW, lowest latency
  }
  if (webgl2Available &amp;&amp; wasmSupported) {
    return new WASMDecoder()           // tinyh264, zero-buffer
  }
  return null                          // → fall back to JPEG
}
</code></pre>
<p>On non-secure LAN-HTTP, we decode H.264 in WASM (tinyh264). It's a software decoder, so it costs CPU — but it has <strong>no media-element buffer at all</strong>. That's the whole point: it gives you JPEG's immediacy with H.264's bandwidth, on plain HTTP.</p>
<p>Measured on localhost (the worst case — encoder and decoder share one Mac):</p>
<table>
<thead>
<tr>
<th>Path</th>
<th>decode→present p50/p95 (ms)</th>
</tr>
</thead>
<tbody><tr>
<td>H.264 WASM still</td>
<td>8.7 / 30.4</td>
</tr>
<tr>
<td>H.264 WASM scroll</td>
<td>14.3 / 37.9</td>
</tr>
</tbody></table>
<p>At p50, that's on par with the localhost-JPEG baseline (12.4 ms still, 9.4 ms scroll), the bar we set at the start. The p95 tail is wider: 30.4 / 37.9 ms against JPEG's 15.4 / 11.6 ms. Removing MSE also let us drop the muxer dependency entirely.</p>
<p>One constraint this introduces: tinyh264 only decodes baseline H.264. iOS already encodes baseline. For Android we pin scrcpy to baseline (<code>profile:int=1</code>) so both platforms share the exact same HTTP→WASM path. High profile is still available on the WebCodecs (secure) tier.</p>
<hr />
<h2>One more thing: dropping H.264 isn't like dropping JPEG</h2>
<p>There's a subtlety the switch exposed. With JPEG, every frame is a keyframe, so dropping a frame under backpressure is harmless — the next one stands alone. With H.264, if you drop a P-frame, every following P-frame references something the decoder never received. A zero-buffer decoder like WASM tinyh264 shears until the next IDR arrives.</p>
<p>So the relay had to become keyframe-aware: once it starts dropping under backpressure, it drops the whole GOP until the next keyframe, rather than handing the decoder a broken reference chain. The keyframe flag rides in our frame envelope, so this needs zero NAL parsing on the relay.</p>
<pre><code class="language-typescript">// relay — once dropping, drop until the next keyframe
if (backpressured) dropping = true    // start (or keep) dropping
if (dropping) {
  if (backpressured || !frame.isKeyframe) return  // skip frames in a broken GOP
  dropping = false                    // keyframe resets the chain
}
</code></pre>
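<p>As a self-contained illustration of the same rule, here is a small stateful filter. It is a sketch under assumptions: <code>GopAwareDropper</code> is a hypothetical name, and backpressure is modeled as a boolean the caller already knows for each frame.</p>

```typescript
type Frame = { isKeyframe: boolean }

// Hypothetical stateful filter: decides per frame whether the relay forwards it.
class GopAwareDropper {
  private dropping = false

  shouldForward(frame: Frame, backpressured: boolean): boolean {
    if (backpressured) {
      this.dropping = true                  // the reference chain is broken from here on
      return false
    }
    if (this.dropping) {
      if (!frame.isKeyframe) return false   // P-frame would reference a dropped frame
      this.dropping = false                 // keyframe starts a clean GOP
    }
    return true
  }
}
```

<p>The important property is that a P-frame arriving after backpressure clears is still dropped; only a keyframe reopens the stream.</p>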
<hr />
<h2>Honest limitations</h2>
<ul>
<li><p><strong>WASM decode is CPU-bound.</strong> At high resolution × fps it hits a CPU ceiling. We mitigate by downscaling the encode resolution — the display is small, so it's a triple win on bandwidth, CPU, and latency.</p>
</li>
<li><p><strong>The localhost numbers are best-case for latency and worst-case for CPU.</strong> On a real LAN the decoder runs on a separate machine. In our cross-machine measurements, scroll p95 climbs to ~50ms on <em>both</em> decoders — at that point the bottleneck is load/transport, not the codec. The <code>decode→present</code> deltas above hold; the <code>glass→glass</code> absolutes do not transfer across two clocks.</p>
</li>
<li><p>Still v0.x. The decoder tiers and SPS rewrite are in <code>agent-core</code>; expect them to keep moving.</p>
</li>
</ul>
<hr />
<h2>Takeaway</h2>
<p>Two bugs, same symptom ("H.264 feels laggy"), completely different causes:</p>
<ol>
<li><p>The decoder's DPB buffered 8 frames because the SPS didn't declare <code>reorder=0</code>. Fix: rewrite the SPS at the encoder.</p>
</li>
<li><p>The media-element buffer in MSE added ~235ms that no encoder flag can reach. Fix: remove MSE, decode in WASM on non-secure contexts.</p>
</li>
</ol>
<p>The lesson I keep relearning: when streaming feels slow, measure each stage before you change the codec. The codec usually isn't the problem — the buffer you didn't know you had is.</p>
<hr />
<h2>Try it</h2>
<p>tapflow is MIT licensed.</p>
<pre><code class="language-bash">npm install -g tapflow
tapflow start
</code></pre>
<ul>
<li><p>🔗 GitHub: <a href="https://github.com/jo-duchan/tapflow">https://github.com/jo-duchan/tapflow</a></p>
</li>
<li><p>📖 Docs: <a href="https://www.tapflow.dev">https://www.tapflow.dev</a></p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Giving an LLM Eyes and Hands on a Mobile Simulator]]></title><description><![CDATA[Mobile QA has a scaling problem.
Unit tests and API tests run in CI automatically. But the thing that actually matters to most users — does tapping this button do the right thing, does this screen loo]]></description><link>https://tapflow.hashnode.dev/giving-an-llm-eyes-and-hands-on-a-mobile-simulator</link><guid isPermaLink="true">https://tapflow.hashnode.dev/giving-an-llm-eyes-and-hands-on-a-mobile-simulator</guid><category><![CDATA[Open Source]]></category><category><![CDATA[iOS]]></category><category><![CDATA[Android]]></category><category><![CDATA[app development]]></category><category><![CDATA[mcp]]></category><category><![CDATA[mcp server]]></category><category><![CDATA[llm]]></category><category><![CDATA[devtools]]></category><category><![CDATA[qa testing]]></category><category><![CDATA[Testing]]></category><dc:creator><![CDATA[duchan jo]]></dc:creator><pubDate>Sun, 31 May 2026 05:39:53 GMT</pubDate><content:encoded><![CDATA[<p>Mobile QA has a scaling problem.</p>
<p>Unit tests and API tests run in CI automatically. But the thing that actually matters to most users — does tapping this button do the right thing, does this screen look right after this flow, does the deeplink open the correct state — none of that runs automatically. Someone has to open the simulator, walk through the steps, and verify. Every time.</p>
<p>The usual answer is Appium or XCUITest. But those require engineers to write and maintain test code that mirrors the UI, breaks whenever the screen changes, and only runs against builds developers already have locally.</p>
<p>We had a different idea. tapflow already lets humans control a simulator through a browser. What if we gave an LLM the same interface?</p>
<hr />
<h2>The interface a human uses</h2>
<p>When a person does QA in tapflow, the loop is:</p>
<ol>
<li><p>Look at the simulator screen</p>
</li>
<li><p>Decide what to do (tap, swipe, type)</p>
</li>
<li><p>Do it</p>
</li>
<li><p>Look again</p>
</li>
</ol>
<p>This is exactly the perception-action loop that vision-capable LLMs are built for. The model sees a screenshot, reasons about what it shows, decides what action to take, and calls a tool to execute it.</p>
<p>We didn't need to build a new automation layer. We just needed to expose tapflow's existing WebSocket and REST APIs as MCP tools.</p>
<hr />
<h2>What the MCP server does</h2>
<p><code>@tapflowio/mcp-server</code> connects to a running tapflow relay and registers 13 tools that any MCP-compatible client can call:</p>
<pre><code class="language-plaintext">list_devices       — see all simulators registered on the relay
connect_device     — join a device session
boot_device        — boot a simulator (waits up to 30s for ready state)
screenshot         — capture the current screen
tap                — tap at a pixel coordinate
swipe              — swipe between two coordinates
type_text          — type into the focused field
press_key          — press a keyboard key (Return, Delete, Escape...)
press_button       — press a hardware button (home, lock)
install_app        — install a build from App Center
launch_app         — launch an installed app
list_builds        — list available builds on the relay
disconnect_device  — end the session
</code></pre>
<p>Setup is two environment variables:</p>
<pre><code class="language-bash">export TAPFLOW_RELAY_URL=wss://your-relay-url
export TAPFLOW_TOKEN=your-pat-token
npx @tapflowio/mcp-server
</code></pre>
<p>Add it as an MCP server in your client config, and those tools appear in the model's tool list.</p>
<hr />
<h2>How the tools are implemented</h2>
<h3>Screenshot — the model's eyes</h3>
<p>The <code>screenshot</code> tool calls the REST endpoint we added in v0.3.0 (<code>GET /api/v1/sessions/:id/screenshot</code>), gets back a PNG or JPEG buffer, base64-encodes it, and returns it as MCP <code>image</code> content alongside the pixel dimensions:</p>
<pre><code class="language-typescript">return {
  content: [
    { type: 'image', data: buf.toString('base64'), mimeType },
    { type: 'text', text: `Screenshot saved: ${filePath} (${width}×${height}px)` },
  ],
}
</code></pre>
<p>The model receives the actual image. It can read text on screen, identify UI elements, notice error states — the same things a human would.</p>
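<p>For PNG responses, the pixel dimensions can be read straight from the file header without decoding the image. The sketch below shows that approach; it is one possible way to get the numbers, not necessarily how tapflow does it, and <code>pngDimensions</code> is a hypothetical helper.</p>

```typescript
// Read width/height from a PNG's IHDR chunk without decoding the image.
// Layout: 8-byte signature, 4-byte chunk length, 4-byte "IHDR" type,
// then width and height as big-endian 32-bit integers.
function pngDimensions(buf: Uint8Array): { width: number; height: number } | null {
  const signature = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]
  if (buf.length < 24) return null
  if (!signature.every((byte, i) => buf[i] === byte)) return null
  const view = new DataView(buf.buffer, buf.byteOffset, buf.byteLength)
  return { width: view.getUint32(16), height: view.getUint32(20) }
}
```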
<h3>Tap and swipe — normalized coordinates</h3>
<p>Here's the part that took a few iterations to get right. The simulator's logical coordinate space is different from screenshot pixel coordinates, and it changes with screen resolution, device type, and scale factor.</p>
<p>Rather than exposing logical coordinates (which the model can't reason about without device-specific knowledge), we have the model work entirely in screenshot pixel space. The <code>tap</code> tool takes pixel coordinates plus the screenshot dimensions, then normalizes internally:</p>
<pre><code class="language-typescript">// tools.ts
client.tap(sessionId, x / screenshotWidth, y / screenshotHeight)
</code></pre>
<p>The model calls <code>screenshot</code> first, reads the dimensions from the response, then uses those same dimensions when calling <code>tap</code>. This means the model can identify "the button is at roughly pixel 200, 450" from the image and tap it directly — no coordinate system translation required.</p>
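<p>The normalization itself is a division per axis. A slightly more defensive sketch is shown here; the clamping and the error on non-positive dimensions are assumptions added for illustration, and <code>normalizeTap</code> is a hypothetical name.</p>

```typescript
type NormalizedPoint = { x: number; y: number }

// Convert a screenshot pixel coordinate to the 0..1 range, clamped to the image.
// Clamping and the error on bad dimensions are assumptions, not confirmed tapflow behavior.
function normalizeTap(x: number, y: number, width: number, height: number): NormalizedPoint {
  if (!(width > 0) || !(height > 0)) throw new RangeError('screenshot dimensions must be positive')
  const clamp01 = (v: number) => Math.min(1, Math.max(0, v))
  return { x: clamp01(x / width), y: clamp01(y / height) }
}
```

<p>Because the result is resolution-independent, the same tap lands on the same element whether the screenshot was captured at full size or downscaled.</p>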
<p>Swipe works the same way, interpolating <code>touch:move</code> events over 8 steps across the duration to simulate a natural gesture:</p>
<pre><code class="language-typescript">// client.ts — swipe interpolation
const STEPS = 8
const interval = durationMs / STEPS

this.send({ type: 'input:touch:start', sessionId, payload: { x: startX, y: startY } })
for (let i = 1; i &lt; STEPS; i++) {
  await delay(interval)
  const t = i / STEPS
  this.send({
    type: 'input:touch:move',
    sessionId,
    payload: {
      x: Math.round(startX + (endX - startX) * t),
      y: Math.round(startY + (endY - startY) * t),
    },
  })
}
</code></pre>
<h3>Async operations over WebSocket</h3>
<p>Several tools involve async operations — booting a device, installing an app — where the relay sends a confirmation back over WebSocket after the operation completes.</p>
<p>The client uses a <code>waitFor</code> pattern: register a predicate against incoming messages, return a promise that resolves when a matching message arrives, and reject if a timeout fires first.</p>
<pre><code class="language-typescript">// client.ts — waitFor
private waitFor(predicate: (msg: RelayMsg) =&gt; boolean, timeoutMs: number): Promise&lt;RelayMsg&gt; {
  return new Promise((resolve, reject) =&gt; {
    const timer = setTimeout(() =&gt; {
      // Remove this waiter on timeout; guard against -1 so splice never drops an unrelated entry
      const idx = this.waiters.findIndex(w =&gt; w.resolve === resolve)
      if (idx !== -1) this.waiters.splice(idx, 1)
      reject(new Error('Request timed out'))
    }, timeoutMs)
    this.waiters.push({ predicate, resolve, reject, timer })
  })
}
</code></pre>
<p><code>boot_device</code> waits up to 30 seconds. <code>install_app</code> waits 60 seconds. Each resolves on the confirmation message or rejects with the error payload.</p>
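<p>The other half of the pattern is the dispatch step that runs for every incoming message. Here is a self-contained sketch of both halves together. The <code>Waiters</code> class, the <code>dispatch</code> method, and the <code>device:booted</code> message type are hypothetical and only illustrate the shape.</p>

```typescript
type RelayMsg = { type: string; sessionId?: string }
type Waiter = {
  predicate: (msg: RelayMsg) => boolean
  resolve: (msg: RelayMsg) => void
  timer: ReturnType<typeof setTimeout>
}

// Hypothetical registry showing both halves: registration and dispatch.
class Waiters {
  private waiters: Waiter[] = []

  waitFor(predicate: (msg: RelayMsg) => boolean, timeoutMs: number): Promise<RelayMsg> {
    return new Promise<RelayMsg>((resolve, reject) => {
      const timer = setTimeout(() => {
        this.waiters = this.waiters.filter(w => w.timer !== timer)
        reject(new Error('Request timed out'))
      }, timeoutMs)
      this.waiters.push({ predicate, resolve, timer })
    })
  }

  // Call for every incoming WebSocket message; returns how many waiters it settled.
  dispatch(msg: RelayMsg): number {
    const matched = this.waiters.filter(w => w.predicate(msg))
    this.waiters = this.waiters.filter(w => !matched.includes(w))
    for (const w of matched) {
      clearTimeout(w.timer)   // settled in time, so the timeout must not fire
      w.resolve(msg)
    }
    return matched.length
  }

  get pending(): number {
    return this.waiters.length
  }
}
```

<p>A boot call would then look like <code>await waiters.waitFor(m =&gt; m.type === 'device:booted', 30_000)</code>, with the socket's message handler calling <code>dispatch</code>.</p>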
<hr />
<h2>What a session looks like</h2>
<p>A model running a login flow might do this:</p>
<pre><code class="language-plaintext">1. list_devices → pick a session
2. connect_device
3. list_builds → find the build to test
4. boot_device
5. install_app
6. launch_app
7. screenshot → see the login screen
8. tap(email field coordinates) → focus the input
9. type_text("test@example.com")
10. tap(password field coordinates)
11. type_text("password")
12. tap(login button coordinates)
13. screenshot → verify the home screen loaded
14. disconnect_device
</code></pre>
<p>Each screenshot gives the model a chance to verify state before proceeding. If step 13 shows an error message instead of the home screen, the model knows something went wrong.</p>
<hr />
<h2>Where we are: experimental</h2>
<p>The version says <code>0.3.1-experimental.1</code> for a reason. The tools work, but the layer needs more hardening before we'd call it reliable.</p>
<p>The core issue is consistency. The same sequence of tool calls should produce predictable behavior every time. Right now it doesn't always — there are timing edge cases where an action fires before the UI has fully settled, device state can drift between steps without the model noticing, and error recovery when something unexpected happens mid-flow is rough.</p>
<p>These are solvable problems, but we want to solve them before presenting this as something teams should build pipelines on.</p>
<hr />
<h2>Where we're going: CI/CD without a QA script</h2>
<p>The direction we're aiming at is using the MCP server as the foundation for LLM-driven smoke tests in CI.</p>
<p>The scenario: a new build passes unit tests and gets uploaded to App Center. A CI step spins up the MCP server, points it at the relay, and gives a model a natural-language test spec:</p>
<blockquote>
<p>"Install the latest build. Log in with test credentials. Navigate to the cart, add an item, and confirm the checkout screen shows the correct total. Take a screenshot at each step."</p>
</blockquote>
<p>The model does the steps, captures evidence, and reports what it saw. No automation code to write. No selectors to maintain when the UI changes. The spec is just a description of what a human would do.</p>
<p>This isn't production-ready yet. The stability work comes first. But the pieces — browser-controllable simulators, screenshot REST endpoint, MCP tool layer — are in place. The question is whether the model can run a flow reliably enough to be trusted in CI without a human verifying each run.</p>
<p>We think it can. That's what we're building toward.</p>
<hr />
<h2>Try the MCP server (experimental)</h2>
<pre><code class="language-bash">npm install -g @tapflowio/mcp-server@experimental
</code></pre>
<p>You'll need a running tapflow relay and a PAT token with viewer scope. Configure it in your MCP client:</p>
<pre><code class="language-json">{
  "mcpServers": {
    "tapflow": {
      "command": "npx",
      "args": ["@tapflowio/mcp-server"],
      "env": {
        "TAPFLOW_RELAY_URL": "wss://your-relay-url",
        "TAPFLOW_TOKEN": "your-pat-token"
      }
    }
  }
}
</code></pre>
<p>If you try it and hit rough edges, open an issue — that feedback is exactly what's shaping the stability work.</p>
<ul>
<li><p>🔗 GitHub: <a href="https://github.com/jo-duchan/tapflow">https://github.com/jo-duchan/tapflow</a></p>
</li>
<li><p>📖 Docs: <a href="https://www.tapflow.dev/guide/mcp-server">https://www.tapflow.dev/guide/mcp-server</a></p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[tapflow v0.3.x: Deeplinks, Keyboard Shortcuts, Screenshot API, and an Experimental MCP Server]]></title><description><![CDATA[tapflow started as a simple idea: stream iOS simulators and Android emulators to the browser so anyone on the team can do mobile QA without touching Xcode or Android Studio. v0.2.x got the core workin]]></description><link>https://tapflow.hashnode.dev/tapflow-v0-3-x-deeplinks-keyboard-shortcuts-screenshot-api-and-an-experimental-mcp-server</link><guid isPermaLink="true">https://tapflow.hashnode.dev/tapflow-v0-3-x-deeplinks-keyboard-shortcuts-screenshot-api-and-an-experimental-mcp-server</guid><category><![CDATA[Open Source]]></category><category><![CDATA[devtools]]></category><category><![CDATA[iOS]]></category><category><![CDATA[Android]]></category><category><![CDATA[self-hosted]]></category><category><![CDATA[app development]]></category><category><![CDATA[React Native]]></category><category><![CDATA[qa testing]]></category><category><![CDATA[mobile app development]]></category><category><![CDATA[appetize-alternative]]></category><category><![CDATA[mcp]]></category><dc:creator><![CDATA[duchan jo]]></dc:creator><pubDate>Fri, 29 May 2026 07:56:44 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/6a184e59badcd8afcba8296c/4d09a6ff-cb6b-4293-b26d-843576c1e79d.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<hr />
<p>tapflow started as a simple idea: stream iOS simulators and Android emulators to the browser so anyone on the team can do mobile QA without touching Xcode or Android Studio. v0.2.x got the core working — streaming, touch input, App Center, session recording.</p>
<p>v0.3.x is about filling in the gaps that matter during actual QA sessions. This post covers what shipped and ends with something we're still figuring out: an experimental MCP server that lets LLM agents control simulators directly.</p>
<hr />
<h2>Deeplink execution from the browser</h2>
<p><a class="embed-card" href="https://youtu.be/MQaikcQd37w">https://youtu.be/MQaikcQd37w</a></p>

<p>The one that came up most in real usage: testers frequently need to trigger deeplinks to verify specific app states — product detail pages, notification payloads, OAuth redirects. The old workflow always involved a mobile developer — either having them trigger it on their machine or building a debug menu inside the app specifically for this purpose.</p>
<p>In v0.3.0 you can now fire a deeplink directly from the QA session toolbar. Click the link icon (or <code>⌘K</code>), enter the URL, and it executes on the active device.</p>
<p>Under the hood it's a new <code>open-url</code> WebSocket message type that routes browser → relay → agent:</p>
<pre><code class="language-plaintext">Browser ──open-url──► Relay ──open-url──► Mac Agent
                                              │
                           iOS: xcrun simctl openurl booted &lt;url&gt;
                           Android: adb shell am start -a android.intent.action.VIEW -d &lt;url&gt;
Browser ◄──open-url:done/error── Relay ◄──────┘
</code></pre>
<p>The <code>DeviceAgent</code> interface got a new <code>openUrl(url)</code> method, so both iOS and Android agents implement it symmetrically. The relay routes it and returns either <code>open-url:done</code> or <code>open-url:error</code> with the failure reason. The dashboard shows a toast either way.</p>
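<p>As a rough illustration of that symmetry, here is a minimal sketch of how an agent might map <code>openUrl(url)</code> onto the two platform commands from the diagram. The <code>buildOpenUrlCommand</code> helper, the <code>Platform</code> type, and the <code>simulator</code> parameter are hypothetical names for this example, not tapflow's actual code:</p>
<pre><code class="language-typescript">type Platform = "ios" | "android";

// Hypothetical helper: returns the argv an agent could hand to a process spawner.
function buildOpenUrlCommand(platform: Platform, url: string, simulator = "booted"): string[] {
  if (platform === "ios") {
    // "booted" targets whichever simulator is currently booted
    return ["xcrun", "simctl", "openurl", simulator, url];
  }
  // Android: start a VIEW intent that carries the URL as its data URI
  return ["adb", "shell", "am", "start", "-a", "android.intent.action.VIEW", "-d", url];
}

console.log(buildOpenUrlCommand("ios", "myapp://product/42").join(" "));
console.log(buildOpenUrlCommand("android", "myapp://product/42").join(" "));
</code></pre>
<p>Because <code>adb shell</code> hands its arguments to a shell on the device, a URL containing <code>&amp;</code> would need quoting before it is passed along.</p>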
<hr />
<h2>Keyboard shortcuts for simulator controls</h2>
<p>QA sessions are repetitive, and reaching for the toolbar icons on every screenshot or rotation adds up. v0.3.0 adds keyboard shortcuts for all the common actions:</p>
<table>
<thead>
<tr>
<th>Shortcut</th>
<th>Action</th>
</tr>
</thead>
<tbody><tr>
<td><code>⌘K</code></td>
<td>Open deeplink dialog</td>
</tr>
<tr>
<td><code>⌘S</code></td>
<td>Take screenshot</td>
</tr>
<tr>
<td><code>⌘⇧Y</code></td>
<td>Start / stop recording</td>
</tr>
<tr>
<td><code>⌘⇧O</code></td>
<td>Rotate simulator</td>
</tr>
<tr>
<td><code>⌘⇧U</code></td>
<td>iOS: press Home</td>
</tr>
<tr>
<td><code>⌘⇧K</code></td>
<td>iOS: toggle software keyboard</td>
</tr>
</tbody></table>
<p>Tooltips now show the shortcut hint inline, so they're discoverable without reading docs. One implementation detail worth noting: key detection uses <code>e.code</code> instead of <code>e.key</code>. This matters for IME input — Korean, Japanese, and Chinese users composing text would otherwise trigger shortcuts mid-composition.</p>
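<p>A minimal sketch of what <code>e.code</code>-based matching could look like for the table above. The action names, the <code>KeyInput</code> shape, and the <code>isComposing</code> guard are assumptions made for illustration; the dashboard's real handler may differ:</p>
<pre><code class="language-typescript">// The subset of KeyboardEvent fields this sketch reads.
interface KeyInput {
  code: string;
  metaKey: boolean;
  shiftKey: boolean;
  isComposing?: boolean;
}

// Hypothetical action names keyed by modifier + physical key code.
const SHORTCUTS: { [combo: string]: string } = {
  "Meta+KeyK": "open-deeplink",
  "Meta+KeyS": "take-screenshot",
  "Meta+Shift+KeyY": "toggle-recording",
  "Meta+Shift+KeyO": "rotate",
  "Meta+Shift+KeyU": "ios-home",
  "Meta+Shift+KeyK": "ios-toggle-keyboard",
};

function resolveShortcut(e: KeyInput): string | null {
  if (e.isComposing) return null; // assumed guard: ignore keys during IME composition
  if (!e.metaKey) return null;
  // e.code names the physical key, so the active layout or IME does not change it
  const combo = "Meta+" + (e.shiftKey ? "Shift+" : "") + e.code;
  return SHORTCUTS[combo] ?? null;
}

console.log(resolveShortcut({ code: "KeyK", metaKey: true, shiftKey: false })); // "open-deeplink"
</code></pre>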
<hr />
<h2>Screenshot REST endpoint</h2>
<p>This one unlocks a new class of CI usage.</p>
<p><code>GET /api/v1/sessions/:sessionId/screenshot</code> returns a PNG or JPEG of the current simulator screen. You can call it with a PAT from any CI step: before asserting a visual state, during an automated flow, or after a build install.</p>
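<p>For a sense of what a CI step might do, here is a hedged sketch of calling the endpoint from Node 18+ (which provides a global <code>fetch</code>). The helper names are hypothetical, and sending the PAT as an <code>Authorization: Bearer</code> header is an assumption; check the docs for the scheme the relay actually expects:</p>
<pre><code class="language-typescript">// Hypothetical helper: builds the URL and headers for the screenshot endpoint.
function buildScreenshotRequest(baseUrl: string, sessionId: string, pat: string) {
  const root = baseUrl.replace(/\/+$/, ""); // tolerate a trailing slash
  const url = root + "/api/v1/sessions/" + encodeURIComponent(sessionId) + "/screenshot";
  // Assumption: the PAT travels as a bearer token.
  return { url, headers: { Authorization: "Bearer " + pat } };
}

// Fetches the screenshot and returns the raw image bytes.
async function fetchScreenshot(baseUrl: string, sessionId: string, pat: string) {
  const { url, headers } = buildScreenshotRequest(baseUrl, sessionId, pat);
  const response = await fetch(url, { headers });
  if (!response.ok) {
    throw new Error("Screenshot request failed: HTTP " + String(response.status));
  }
  return new Uint8Array(await response.arrayBuffer());
}
</code></pre>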
<p>The tricky part was the request/response pattern. The relay communicates with agents over WebSocket (long-lived, multiplexed), but HTTP is request/response. Screenshots are taken on the Mac, not the relay.</p>
<p>We introduced a requestId-based pending map: the relay generates a unique ID, sends a <code>take-screenshot</code> message to the agent over WebSocket, registers a promise keyed by requestId, and resolves it when <code>screenshot:result</code> comes back. The HTTP handler awaits that promise and sends the binary payload:</p>
<pre><code class="language-plaintext">GET /api/v1/sessions/:id/screenshot
    │
    ▼
Relay: generate requestId, push to pending map
    │
    ├──screenshot-request──► Mac Agent
    │                            │ simctl io screenshot (iOS)
    │                            │ ADB screencap (Android)
    ◄──screenshot:result─────────┘
    │
    ▼
HTTP 200 (binary image)
</code></pre>
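<p>The pending-map pattern can be sketched roughly as follows. This is an illustration under assumptions, not the relay's actual code: the <code>ScreenshotBroker</code> class, the counter-based IDs, and the timeout are all made up for the example, while the <code>take-screenshot</code> and <code>screenshot:result</code> message names come from the description above:</p>
<pre><code class="language-typescript">interface PendingEntry {
  resolve: (image: Uint8Array) => void;
  reject: (reason: Error) => void;
  cancelTimeout: () => void;
}

interface OutgoingMessage {
  type: string;
  requestId: string;
}

class ScreenshotBroker {
  private pending: { [requestId: string]: PendingEntry } = {};
  private counter = 0;
  private sendToAgent: (message: OutgoingMessage) => void;
  private timeoutMs: number;

  constructor(sendToAgent: (message: OutgoingMessage) => void, timeoutMs = 5000) {
    this.sendToAgent = sendToAgent;
    this.timeoutMs = timeoutMs;
  }

  // Called by the HTTP handler, which awaits `result` before responding.
  request() {
    this.counter += 1;
    const requestId = "shot-" + String(this.counter);
    const result = new Promise((resolve, reject) => {
      // Assumed safeguard: fail the request if the agent never answers.
      const timer = setTimeout(() => {
        delete this.pending[requestId];
        reject(new Error("screenshot timed out"));
      }, this.timeoutMs);
      this.pending[requestId] = { resolve, reject, cancelTimeout: () => clearTimeout(timer) };
    });
    this.sendToAgent({ type: "take-screenshot", requestId });
    return { requestId, result };
  }

  // Called when a screenshot:result message arrives on the agent WebSocket.
  handleResult(requestId: string, image: Uint8Array): boolean {
    const entry = this.pending[requestId];
    if (!entry) return false; // unknown or already timed-out request
    entry.cancelTimeout();
    delete this.pending[requestId];
    entry.resolve(image);
    return true;
  }
}
</code></pre>
<p>Keying by <code>requestId</code> is what lets several HTTP requests share one multiplexed WebSocket: each response finds its own waiting promise regardless of arrival order.</p>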
<p>iOS supports both PNG and JPEG via <code>--type</code>. Android returns PNG regardless — ADB doesn't offer format selection at this layer.</p>
<hr />
<h2>PAT scope enforcement</h2>
<p>Personal Access Tokens existed before v0.3.0, but the scope field wasn't actually enforced on API routes. A <code>developer</code>-scoped token could call any endpoint.</p>
<p>v0.3.0 adds proper scope checks to all builds endpoints. Scopes are now enforced at the middleware layer: a token issued for <code>builds</code> access can upload and manage builds, but can't touch team settings or session data. This makes it safe to issue narrow tokens for CI pipelines without giving them broader access than they need.</p>
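<p>A minimal sketch of the kind of check such a middleware could run. The route prefixes other than <code>/api/v1/sessions</code>, the scope names other than <code>builds</code>, and the <code>isAuthorized</code> helper are hypothetical, chosen only to illustrate the idea:</p>
<pre><code class="language-typescript">// Hypothetical route-to-scope table; prefixes and scope names are illustrative only.
const REQUIRED_SCOPE: { prefix: string; scope: string }[] = [
  { prefix: "/api/v1/builds", scope: "builds" },
  { prefix: "/api/v1/sessions", scope: "sessions" },
  { prefix: "/api/v1/team", scope: "team" },
];

// Returns true when the token's scopes cover the requested path.
function isAuthorized(tokenScopes: string[], path: string): boolean {
  const rule = REQUIRED_SCOPE.find((r) => path.startsWith(r.prefix));
  if (!rule) return false; // deny routes that declare no scope
  return tokenScopes.includes(rule.scope);
}

console.log(isAuthorized(["builds"], "/api/v1/builds/123")); // true
console.log(isAuthorized(["builds"], "/api/v1/team/settings")); // false
</code></pre>
<p>Denying by default when no rule matches means a newly added route stays closed to PATs until someone assigns it a scope.</p>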
<hr />
<h2>Frame performance instrumentation</h2>
<p>For anyone debugging streaming latency: v0.3.x adds per-frame hop timestamps via a binary header (<code>TFFE</code> — tapflow frame envelope). Each frame now carries the capture time, relay-received time, and client-received time in an 8-byte prefix before the JPEG/H.264 payload.</p>
<p>The dashboard can surface a live performance overlay showing frame latency broken down by segment (agent → relay, relay → browser). Useful when diagnosing whether a slowdown is in the network leg or the browser decode path.</p>
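<p>The per-segment breakdown is simple subtraction once the three timestamps are available. The sketch below assumes they have already been decoded into millisecond values; the field names are hypothetical and say nothing about the binary layout of the <code>TFFE</code> header:</p>
<pre><code class="language-typescript">// Hypothetical decoded form of one frame's hop timestamps, in milliseconds.
interface FrameTimestamps {
  capturedAt: number;       // stamped by the agent at capture
  relayReceivedAt: number;  // stamped by the relay
  clientReceivedAt: number; // stamped by the browser
}

function latencyBreakdown(t: FrameTimestamps) {
  return {
    agentToRelayMs: t.relayReceivedAt - t.capturedAt,
    relayToBrowserMs: t.clientReceivedAt - t.relayReceivedAt,
    totalMs: t.clientReceivedAt - t.capturedAt,
  };
}

console.log(latencyBreakdown({ capturedAt: 1000, relayReceivedAt: 1012, clientReceivedAt: 1030 }));
</code></pre>
<p>Since the timestamps are taken on different machines, each segment is only as accurate as the clock synchronization between them.</p>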
<hr />
<h2>Experimental: an MCP server</h2>
<p>v0.3.x also ships <code>@tapflowio/mcp-server</code> (<code>0.3.1-experimental.1</code>) — it exposes tapflow's WebSocket/REST APIs as MCP tools so an LLM agent can drive a simulator the same way a human does in the browser: screenshot → reason → tap/type → screenshot again.</p>
<p>It's early (the <code>experimental</code> suffix is literal — consistency and error-recovery still need work), and it's a big enough topic to have its own write-up: <a href="https://dev.to/joduchan/-giving-an-llm-eyes-and-hands-on-a-mobile-simulator-5963"><strong>Giving an LLM Eyes and Hands on a Mobile Simulator</strong></a> covers the full tool list, the normalized-coordinate tap/swipe, and where this is headed (LLM-driven smoke tests in CI).</p>
<pre><code class="language-bash">npm install -g @tapflowio/mcp-server@experimental
</code></pre>
<hr />
<h2>Try it</h2>
<pre><code class="language-bash">npm install -g tapflow
tapflow start
# http://localhost:4000
</code></pre>
<ul>
<li><p>🔗 GitHub: <a href="https://github.com/jo-duchan/tapflow">https://github.com/jo-duchan/tapflow</a></p>
</li>
<li><p>📖 Docs: <a href="https://www.tapflow.dev">https://www.tapflow.dev</a></p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Your whole team can now run mobile QA from the browser. Here's how we built it.]]></title><description><![CDATA[If you work on a mobile product, you've probably seen this.
Physical devices are never enough. Covering every OS version is even harder — iOS doesn't support downgrading, so maintaining a range of ver]]></description><link>https://tapflow.hashnode.dev/your-whole-team-can-now-run-mobile-qa-from-the-browser-here-s-how-we-built-it</link><guid isPermaLink="true">https://tapflow.hashnode.dev/your-whole-team-can-now-run-mobile-qa-from-the-browser-here-s-how-we-built-it</guid><category><![CDATA[open source]]></category><category><![CDATA[devtools]]></category><category><![CDATA[iOS]]></category><category><![CDATA[Android]]></category><category><![CDATA[self-hosted]]></category><category><![CDATA[app development]]></category><category><![CDATA[React Native]]></category><category><![CDATA[qa testing]]></category><category><![CDATA[mobile app development]]></category><category><![CDATA[appetize-alternative]]></category><dc:creator><![CDATA[duchan jo]]></dc:creator><pubDate>Thu, 28 May 2026 15:01:11 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/6a184e59badcd8afcba8296c/d16cf8d7-373e-4729-a774-9186b020b835.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>If you work on a mobile product, you've probably seen this.</p>
<p>Physical devices are never enough. Covering every OS version is even harder — iOS doesn't support downgrading, so maintaining a range of versions means managing a pool of locked devices, which is overhead nobody wants.</p>
<p>But the bigger friction is access. Simulators only run on a developer's Mac, behind complex toolchains. Anyone on the team who isn't a mobile developer has to ask one every single time they need to verify something:</p>
<blockquote>
<p><strong>Server / FE developer</strong> — "How do I install the sandbox build to check what was deployed?"</p>
<p><strong>Product manager</strong> — "I keep having to install and remove different versions just to compare behavior."</p>
<p><strong>Designer</strong> — "I need to check the layout across screen sizes, but I don't have the right devices."</p>
</blockquote>
<p>Cloud simulator services exist. But uploading internal app builds to an external service — and paying monthly fees for simulators already running on Macs you own — was never something we wanted to do.</p>
<p>So we built <a href="https://github.com/jo-duchan/tapflow">tapflow</a>: an open-source, self-hosted tool that streams iOS simulators and Android emulators to the browser. Anyone on your team opens the dashboard, picks a device, and starts interacting — no Xcode, no Android Studio, no setup.</p>
<pre><code class="language-bash">npm install -g tapflow
tapflow start
# → http://localhost:4000
</code></pre>
<p>This post is about how we built it — specifically the parts that weren't obvious.</p>
<hr />
<h2>Demo Video</h2>
<p><a class="embed-card" href="https://youtu.be/BfoS-i5aMcM">https://youtu.be/BfoS-i5aMcM</a></p>

<hr />
<h2>Why we didn't just use Appetize or BrowserStack</h2>
<p>Both services solve the browser access problem. We evaluated them seriously. Before signing up, we hit two blockers:</p>
<ul>
<li><p><strong>Cost.</strong> Appetize starts at $59/month and scales with team size.</p>
</li>
<li><p><strong>Data.</strong> Both require uploading your app binary to external servers. For anything with sensitive business logic, that's a non-starter.</p>
</li>
</ul>
<p>We already had Macs in the office. So we built tapflow instead.</p>
<hr />
<h2>Architecture</h2>
<pre><code class="language-plaintext">Browser (your team)  ←─ WebSocket ─→  Relay Server  ←─ WebSocket (outbound) ─→  Mac Agent
                                     (Linux / Mac)                           (iOS · Android)
</code></pre>
<p>The Mac Agent connects <strong>outbound</strong> to the relay — no firewall or NAT configuration needed. The relay can run on a small Linux server (a ~$5/month Fly.io instance handles it). App data never leaves your infrastructure.</p>
<hr />
<h2>iOS touch — without WebDriverAgent</h2>
<p>WebDriverAgent was the obvious starting point. We didn't use it.</p>
<p>The problems: WDA breaks on Xcode updates, requires provisioning profiles, needs the app to be in the foreground, and adds a layer of process management complexity we didn't want to own.</p>
<p>Instead, we load <code>CoreSimulator.framework</code> dynamically via <code>dlopen</code> in a Swift binary (<code>touch-helper</code>), then inject HID events directly through <code>SimDeviceLegacyHIDClient</code> and <code>IndigoHID</code>:</p>
<pre><code class="language-swift">// touch-helper — HID event injection into the simulator
let client = SimDeviceLegacyHIDClient(device: device)
let event = IndigoHIDEvent.touch(x: x, y: y, phase: .began)
client.send(event)
</code></pre>
<p>This bypasses WDA entirely. It works independently of the app lifecycle and doesn't break on Xcode updates.</p>
<p>The tradeoff: these are private APIs. They've been stable across Xcode versions in our testing, but Apple could remove them. We think that's a better bet than WDA's reliability track record.</p>
<hr />
<h2>iOS streaming — IOSurface</h2>
<p><code>xcrun simctl io screenshot</code> works, but the latency is too high for interactive use.</p>
<p>Instead, we access <code>IOSurface</code> directly through SimulatorKit, pulling frames straight from the simulator's GPU surface. <del>Frames are JPEG-encoded on the Mac and streamed over WebSocket at ~30fps.</del></p>
<p>For slow clients, we drop frames rather than buffering. Backpressure is handled at the WebSocket layer to prevent memory accumulation on the relay when a client can't keep up.</p>
<p><strong>Update:</strong> JPEG was the first version. The default is now H.264 with a buffer-free 2-tier browser decoder (WebCodecs on secure contexts, WASM on plain HTTP). The full teardown — why H.264 first felt <em>worse</em>, and the two fixes that solved it — is a separate post: <a href="https://tapflow.hashnode.dev/we-switched-simulator-streaming-to-h-264-and-it-felt-worse-here-s-how-we-fixed-the-latency">We switched simulator streaming to H.264 and it felt worse</a>.</p>
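<p>The drop-instead-of-buffer policy for slow clients can be sketched with the socket's <code>bufferedAmount</code>, which both the browser <code>WebSocket</code> and the Node <code>ws</code> package expose. The <code>sendOrDrop</code> helper and the 512 KiB threshold are assumptions for illustration, not tapflow's actual values:</p>
<pre><code class="language-typescript">// Minimal socket shape needed by this sketch.
interface FrameSocket {
  bufferedAmount: number; // bytes queued but not yet written to the network
  send(data: Uint8Array): void;
}

// Assumed threshold: above this many queued bytes the client is treated as slow.
const MAX_BUFFERED_BYTES = 512 * 1024;

// Sends the frame unless the socket is backed up; returns false when the frame was dropped.
function sendOrDrop(socket: FrameSocket, frame: Uint8Array, maxBuffered = MAX_BUFFERED_BYTES): boolean {
  if (socket.bufferedAmount > maxBuffered) {
    return false; // skip this frame instead of growing the queue
  }
  socket.send(frame);
  return true;
}
</code></pre>
<p>Skipping frames this way is straightforward when every frame is independently decodable, as with JPEG. An H.264 stream needs more care, because later frames depend on earlier ones.</p>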
<hr />
<h2>Android — scrcpy H.264 → WebGL</h2>
<p>Android was cleaner. scrcpy already does the hard work of capturing the emulator display as an H.264 stream.</p>
<p>We receive the H.264 Annex B stream from scrcpy over a local TCP socket, relay it through WebSocket, then decode and render it in the browser. Android now shares the same buffer-free 2-tier decoder as iOS (see the update above).</p>
<pre><code class="language-plaintext">scrcpy server (emulator)
    → TCP socket
    → Mac Agent
    → WebSocket
    → Browser (WebGL2)
</code></pre>
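<p>One step a pipeline like this might need is locating NAL unit boundaries, since an Annex B stream separates units with <code>00 00 01</code> or <code>00 00 00 01</code> start codes. The sketch below is a generic start-code scanner, not tapflow's decoder; the function names are hypothetical:</p>
<pre><code class="language-typescript">// Scans an Annex B buffer and returns the offset of the first byte after each start code.
function findNalUnitOffsets(data: Uint8Array): number[] {
  const offsets: number[] = [];
  let i = 0;
  while (data.length - i >= 3) {
    // Matching 00 00 01 also catches the tail of the 4-byte form 00 00 00 01.
    if (data[i] === 0) {
      if (data[i + 1] === 0) {
        if (data[i + 2] === 1) {
          offsets.push(i + 3);
          i += 3;
          continue;
        }
      }
    }
    i += 1;
  }
  return offsets;
}

// The NAL unit type is the low 5 bits of the first byte after the start code.
function nalUnitType(data: Uint8Array, offset: number): number {
  return data[offset] % 32;
}

const sample = new Uint8Array([0, 0, 0, 1, 0x67, 0xaa, 0, 0, 1, 0x68, 0xbb]);
console.log(findNalUnitOffsets(sample)); // [4, 9]
</code></pre>
<p>In the sample, the bytes at offsets 4 and 9 yield types 7 and 8, which H.264 defines as the sequence and picture parameter sets.</p>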
<h3>Pinch gestures</h3>
<p>scrcpy's <code>INJECT_TOUCH_EVENT</code> supports multiple pointer IDs. Pinch is implemented by sending two simultaneous touch events:</p>
<pre><code class="language-typescript">// ScrcpyControl — multi-touch injection
pinchStart(x1: number, y1: number, x2: number, y2: number): void {
  this.touchDown(0, x1, y1)
  this.touchDown(1, x2, y2)
}
</code></pre>
<hr />
<h2>What's included</h2>
<p>Beyond streaming and input:</p>
<ul>
<li><p><strong>App Center</strong> — upload <code>.app.zip</code> (iOS) or <code>.apk</code> (Android), manage build status (Backlog / In Progress / Done / Rejected), REST API + Personal Access Tokens for CI/CD integration</p>
</li>
<li><p><strong>Session recording</strong> — record and share QA sessions, kept for ~72 hours before automatic cleanup</p>
</li>
<li><p><strong>Team management</strong> — invite links, role-based access (Admin / Developer / QA / Viewer)</p>
</li>
<li><p><strong>Mac resource monitoring</strong> — CPU and RAM time-series charts per agent</p>
</li>
</ul>
<hr />
<h2>Honest limitations</h2>
<ul>
<li><p>iOS simulators require macOS — Apple's constraint, not ours</p>
</li>
<li><p>One Mac typically handles 2–4 simultaneous simulators depending on RAM; connect multiple Macs to pool devices</p>
</li>
<li><p>Still v0.x — breaking changes may appear before v1.0</p>
</li>
</ul>
<hr />
<h2>Try it</h2>
<p>tapflow is MIT licensed.</p>
<pre><code class="language-bash">npm install -g tapflow
tapflow start
tapflow init  # create the first admin account
</code></pre>
<p>For team deployments with a shared relay:</p>
<pre><code class="language-bash"># Relay server (Linux/macOS)
JWT_SECRET=$(openssl rand -hex 32) tapflow relay start

# Each Mac agent
tapflow agent start --relay wss://your-relay-url
</code></pre>
<ul>
<li><p>🔗 GitHub: <a href="https://github.com/jo-duchan/tapflow">https://github.com/jo-duchan/tapflow</a></p>
</li>
<li><p>📖 Docs: <a href="https://www.tapflow.dev">https://www.tapflow.dev</a></p>
</li>
</ul>
]]></content:encoded></item></channel></rss>