Build AI features by simulating production scenarios locally.

The best agentic engineering experience to test, build, and optimize multi-turn background agents and data pipelines.

Book a demo Read the docs

Teams already adding Bitfab to their software factories

Haystack

Elo

Lore

Test AI features end-to-end against real scenarios before deploying.

AI features that use multiple sources of context are notoriously difficult to test without reproducing the original data used in production. Bitfab lets engineers simulate any scenario safely, define outcomes, and have their coding agents make verifiable changes in a loop.

Claude Code

> /bitfab:assistant fix the refund bug

loaded dataset refund-emails · 20 captured scenarios

attempt 1: tighten prompt wording

running experiment...

✗ 12/20 passing, 2 regressions

attempt 2: add order context to the prompt

running experiment...

17/20 passing, closer

attempt 3: split refund policy into its own prompt

running experiment...

✓ 20/20 scenarios passing

✓ fix verified, kept as regression tests

Bitfab · Experiments · refund-emails · 20 scenarios

Capture · Simulate · Judge

attempt 1 · tighten prompt wording · running...

attempt 2 · add order context · running...

attempt 3 · split refund policy prompt · running...

1. Capture

Capture scenarios from production.

A scenario is everything needed to relive one real customer interaction, captured automatically: the code execution, the database it read, and every external API response.

run_support_agent    app/agents/support.py:31     run
load_thread_state    app/agents/support.py:74     run
db.get_thread        app/store/threads.py:18      mock
db.get_last_reply    app/store/messages.py:42     mock
build_context        app/agents/context.py:12     run
summarize_history    app/agents/context.py:58     run
agent.run_turn       app/agents/runtime.py:22     run
llm.complete         app/agents/runtime.py:87     run
tool_calls           app/agents/runtime.py:104    run
persist_turn         app/agents/persist.py:9      mock
db.create_message    app/store/messages.py:77     skipped on replay
analytics.capture    app/lib/analytics.py:15      skipped on replay
email.send_reply     app/mailer.py:31             mock

Every call is recorded in execution order with its inputs, outputs, and timestamps. Recorded values are two-way serializable, down to datetimes, and capture adds zero latency to your production code.
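To make the capture step concrete, here is a minimal sketch of call recording in plain Python. It is not Bitfab's implementation: the `record` decorator, the `TRACE` list, and the `encode`/`decode` helpers are hypothetical names chosen for illustration. It shows calls logged in execution order with inputs, outputs, and timestamps, and a datetime surviving a serialize/deserialize round trip.

```python
import functools
import json
from datetime import datetime, timezone

TRACE = []  # hypothetical in-memory trace; one entry per recorded call


def record(name):
    """Hypothetical decorator: log inputs, output, and start time of each call."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            entry = {
                "name": name,
                "inputs": {"args": list(args), "kwargs": kwargs},
                "started_at": datetime.now(timezone.utc),
            }
            TRACE.append(entry)  # appended before the call, so parents precede children
            entry["output"] = fn(*args, **kwargs)
            return entry["output"]
        return inner
    return wrap


def encode(value):
    """Tag datetimes so they can be restored on load."""
    if isinstance(value, datetime):
        return {"__datetime__": value.isoformat()}
    raise TypeError(f"cannot serialize {type(value).__name__}")


def decode(obj):
    """Inverse of encode: turn tagged dicts back into datetimes."""
    if "__datetime__" in obj:
        return datetime.fromisoformat(obj["__datetime__"])
    return obj


@record("db.get_thread")
def get_thread(thread_id):
    return {"id": thread_id, "opened_at": datetime(2024, 5, 1, 9, 30, tzinfo=timezone.utc)}


@record("load_thread_state")
def load_thread_state(thread_id):
    return get_thread(thread_id)


load_thread_state(7)
# Round trip through JSON: datetimes come back as datetimes, not strings.
restored = json.loads(json.dumps(TRACE, default=encode), object_hook=decode)
```

Tagging datetimes on the way out and restoring them on the way in is one simple way to get a two-way round trip; a production system would need to cover more types than this sketch does.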

2. Simulate

Simulate any scenario during development.

Replay any scenario locally against the database as it was at capture time, with every external call mocked. Or change one detail, such as a different user or a different order, and test the what-if. Every result is judged against the scenario's outcome, so a pass means what your team says it means.
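The replay idea can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `Scenario` shape, the `simulate` function, and the 30-day refund rule are hypothetical and do not describe Bitfab's API. The sketch replays a captured scenario with its external call mocked, then changes one detail to test a what-if.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class Scenario:
    """Hypothetical captured scenario: inputs, recorded external outputs, expected outcome."""
    inputs: dict
    recorded: dict = field(default_factory=dict)  # external call name -> recorded output
    expected: str = ""


def answer_refund(inputs, call_external):
    """Code under test. External access goes through `call_external` so it can be mocked."""
    order = call_external("db.get_order", inputs["order_id"])
    # Hypothetical business rule, used only to give the example something to judge.
    return "refund" if order["days_since_purchase"] <= 30 else "deny"


def simulate(scenario):
    """Replay locally: every external call returns its recorded output."""
    def mocked(name, *_args):
        return scenario.recorded[name]

    result = answer_refund(scenario.inputs, mocked)
    return result, result == scenario.expected  # judged against the scenario's outcome


captured = Scenario(
    inputs={"order_id": "A1"},
    recorded={"db.get_order": {"days_since_purchase": 12}},
    expected="refund",
)

# What-if: same scenario, but the order is older and the team expects a denial.
what_if = replace(
    captured,
    recorded={"db.get_order": {"days_since_purchase": 45}},
    expected="deny",
)

print(simulate(captured))
print(simulate(what_if))
```

Because the expected outcome travels with the scenario, the what-if carries its own definition of a pass.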

recording

Executing on your server

answer_question · input · output

parse_question · input · output

vector_db.search · input · output

embed_query · input · output

pinecone.query · input · output

generate_answer · input · output

db.log_interaction · input · output

Executing locally

answer_question

parse_question

vector_db.search

embed_query

pinecone.query

generate_answer

db.log_interaction

Handles side effects

Detects and instruments dangerous calls
Replays original outputs for safe simulation
Configurable on the fly by you or your coding agent

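# `client` is your Bitfab client; `stripe` and `slack` are your existing integrations.

@client.span("refund", mock_on_simulate=True)
async def refund(row):
    # On simulation, the recorded output is replayed instead of calling Stripe.
    return await stripe.refund(row)

@client.span("checkout-agent")
async def run(order):
    ...
    return await refund(row)

@client.span("notify")
async def notify(row):
    await slack.post_message(row)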
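For intuition about what a flag like `mock_on_simulate` could do, here is a self-contained sketch. The `span` decorator below is a hypothetical stand-in written for this example, not Bitfab's implementation, and `RECORDED`, `SIMULATING`, and `real_calls` are assumed names. When simulating, a span marked for mocking returns its captured output and its body never runs.

```python
import asyncio
import functools

RECORDED = {"refund": {"status": "refunded"}}  # hypothetical captured outputs, by span name
SIMULATING = True                               # hypothetical switch; a real SDK would manage this
real_calls = []                                 # names of spans whose bodies actually ran


def span(name, mock_on_simulate=False):
    """Hypothetical stand-in for a span decorator."""
    def wrap(fn):
        @functools.wraps(fn)
        async def inner(*args, **kwargs):
            if SIMULATING and mock_on_simulate:
                return RECORDED[name]  # replay the captured output, skip the side effect
            real_calls.append(name)
            return await fn(*args, **kwargs)
        return inner
    return wrap


@span("refund", mock_on_simulate=True)
async def refund(row):
    # Stands in for a dangerous call; it must never run during simulation.
    raise RuntimeError("would have charged a real payment provider")


@span("checkout-agent")
async def run(order):
    return await refund(order)


result = asyncio.run(run({"order_id": "A1"}))
print(result, real_calls)
```

The agent code runs for real while the payment call is replayed, which is the split between "run" and "mock" that the trace view describes.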

Managed scenario data

Search across billions of scenarios and connect to your monitoring tools
Adapts scenarios to survive code changes automatically
Groups similar traces into scenarios that define behavior

3. Build

Workflows to test, build, and optimize AI features.

Bitfab lets you build coding-agent loops for AI features that were not possible without simulation and your team's judgement. It provides interfaces connected to your coding agent so your team can define outcomes, review changes, and visualize results.

Fix issues

Start from a failing scenario, simulate it until it passes, and keep it as a regression test.

/bitfab:assistant fix this trace

replaying refund-email ✗ fail

applying fix, re-replaying ✓ pass

kept as a regression test

Run regression tests

Simulate the whole scenario set and catch what broke before you deploy.

/bitfab:assistant run my dataset

refund-email ✓

checkout-summary ✓

support-triage ✗

19 / 20 passing · 1 regression
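A regression run reduces to a loop over the scenario set. The sketch below assumes a hypothetical `run_dataset` helper and a toy keyword router as the feature under test; neither comes from Bitfab. Each scenario pairs an input with the outcome the team expects.

```python
def run_dataset(scenarios, feature):
    """Run every scenario through `feature`; return the pass count and failing names."""
    failures = [
        name
        for name, (inputs, expected) in scenarios.items()
        if feature(inputs) != expected
    ]
    return len(scenarios) - len(failures), failures


def classify(text):
    """Toy feature under test (hypothetical): route a message by keyword."""
    if "refund" in text:
        return "billing"
    if "order" in text:
        return "orders"
    return "general"


SCENARIOS = {
    "refund-email": ("please refund my purchase", "billing"),
    "checkout-summary": ("summarize my order", "orders"),
    "support-triage": ("my refund for this order never arrived", "orders"),
}

passed, failures = run_dataset(SCENARIOS, classify)
print(f"{passed} / {len(SCENARIOS)} passing, failing: {failures}")
```

A failure on a scenario that passed in the previous run is what counts as a regression.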

Run experiments

Compare before and after across every scenario to see if a change actually helped.

/bitfab:assistant run an experiment

before: 72% pass

after: 91% pass
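Comparing pass rates alone can hide scenarios that got worse, so an experiment report usually tracks per-scenario changes too. This is a small sketch under assumed names (`compare`, and scenario ids `s1` to `s4` with made-up results), not Bitfab's reporting logic.

```python
def compare(before, after):
    """Compare two runs over the same scenarios; each maps scenario name -> passed (bool)."""
    names = sorted(before)
    return {
        "before_rate": sum(before[n] for n in names) / len(names),
        "after_rate": sum(after[n] for n in names) / len(names),
        "fixed": [n for n in names if not before[n] and after[n]],
        "regressed": [n for n in names if before[n] and not after[n]],
    }


# Made-up results for illustration only.
before = {"s1": True, "s2": False, "s3": False, "s4": True}
after = {"s1": True, "s2": True, "s3": True, "s4": False}

report = compare(before, after)
print(report)
```

Here the pass rate rises while `s4` regresses, which is why the report lists regressions separately from the headline rate.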

Optimize costs

Profile token spend and iterate to reduce it without losing output quality.

/bitfab:assistant cut tokens

tokens / run: 1,240 → 720

cost / run: $0.031 → $0.018

42% cheaper, same pass rate
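The 42% figure follows directly from the numbers shown. A short check, with `percent_saved` as a hypothetical helper name:

```python
def percent_saved(before, after):
    """Relative reduction from `before` to `after`, as a whole percentage."""
    return round((before - after) / before * 100)


tokens_saved = percent_saved(1240, 720)    # tokens per run
cost_saved = percent_saved(0.031, 0.018)   # dollars per run
print(tokens_saved, cost_saved)
```

Both the token count and the cost per run fall by the same rounded percentage.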

Bitfab learns your team's judgement as you scale.

The scenarios and labels you accumulate become the training data for everything that comes next.

Early stage

Label outcomes

Your team labels what good looks like for each scenario, with the agent doing the heavy lifting.

Growth

Coming soon

Invariants

Scenarios are classified into judges that automate review, tuned on your team's ground-truth labels.

Scale

Coming soon

Train models

Reuse your scenarios and invariants to train models that fit your product exactly.

One-line install. 15-minute setup.

Works with TypeScript, Python, Ruby, and Go, driven from Claude Code, Cursor, or Codex. Start building methodically now.
