Overview

SafeBuilder

Durable Orchestration · Integrity at Every Boundary · Verified Artifacts

SafeBuilder is a durable, model-agnostic orchestration spine — with a visual editor served to your browser — that mechanizes the AI-first build methodology. You author a build workflow as a graph; SafeBuilder schedules executors to run it, enforces integrity at every stage boundary, and moves verified artifacts between stages — without ever reasoning about the codebase itself.

It is to the AI-first methodology what a CI/CD engine is to compilation: it never does the work, it orchestrates the things that do, and it enforces contracts at the boundaries. The executors that write code are treated as commodity, swappable runners. The integrity guarantees that wrap them are the defensible core. SafeBuilder owns the loop; it rents the muscle.

Who this is for. This page is for adopters inside Employbridge and across the Apollo portfolio who are evaluating SafeBuilder for their own AI-first builds; it is not a public launch announcement. The question it answers is the one a team burned by a confident-but-false "it's done" actually has: why is a SafeBuilder "ready to ship" trustworthy? The answer is mechanical, not rhetorical: the system trusts telemetry, not testimony. Every "done" is backed by a harness recording of what the executor actually read and by deterministic gates over that recording, not by the executor's own claim. The orchestration core that delivers this is built and proven end-to-end; where a piece is still forthcoming (the native desktop installers), this page says so plainly.

What It Is

An orchestration and verification system for the AI-first build methodology. It never reasons about the code. It interprets an authored workflow graph, schedules executors, enforces contracts, and moves verified artifacts between stages. Executors (agentic CLIs) do the work; gates verify it; a visual editor — served by the backend and opened in your browser — is where you compose and operate the build. (The same SPA is designed to run inside a native desktop shell; those per-OS installers are packaged from source and not yet validated on hardware — see Install & Run.)

The system is deliberately small and stays that way: work is always delegated outward to standalone, swappable external processes, and the spine is the only component coupled to the workflow engine.

The Problem It Solves

The AI-first engineering methodology is a process in which a reasoning LLM drafts task prompts, an agentic executor CLI runs each prompt against a real codebase, and the cycle repeats 40+ times per build with periodic audits and recalibration. It produces verified, fully tested, 100%-AI-generated code at rates far beyond traditional development. Run by hand, however, it has two structural weaknesses. The first is integrity loss rooted in long-horizon state loss, which takes two forms:

Drift (the Xerox effect). Across a multi-prompt cycle, an executor progressively substitutes its own paraphrase of the source-of-truth documents for the documents themselves. Each successive prompt operates on a lossier copy of the prior understanding. Prompt 30 is materially worse than prompt 3.
Decay. The executor stops re-reading source documents because they are nominally "already in context" from many prompts earlier — except the runtime has since compressed or evicted them, and the executor now confabulates from a half-remembered summary rather than re-reading from disk.

The shared root cause is that the executor is permitted to treat its own memory as a substitute for the source of truth. The governing principle — never from memory, conversation context, or inference; always from verified source of truth — degrades precisely when it is needed most if it is enforced only by instruction-following. A second weakness is orchestration fragility: the cycle, audit cadence, and corrective re-invocations live in the operator's working memory with no durable record and no mechanical enforcement.

SafeBuilder removes both by making integrity a mechanical gate, not a hope, and by moving long-horizon state into a durable spine.

Vision

The frontier labs are commoditizing the executor — the agent that turns a well-specified task into code. SafeBuilder deliberately does not compete there. Its value is the orchestration-and-verification layer that guarantees specification fidelity before execution and verification quality after it. The executors are commodity, swappable runners; the integrity guarantees that wrap them are the defensible core.

Key Principles

The spine never reasons about the code. It schedules, enforces contracts, and moves artifacts. All reasoning about the codebase happens inside executors and gates, never in the orchestrator.
Source of truth is referenced, never remembered — and provenance is observed, never self-reported. No stage passes another a paraphrase. Stages communicate exclusively through verifiable artifacts with line-anchored provenance, and that provenance is captured by the harness at the tool boundary — a recording of what the executor actually read, not a manifest the executor produces about itself. The system trusts telemetry, not testimony.
Sessions are ephemeral by design. Each prompt cycle is born, hydrates from durable on-disk artifacts, runs one task, emits its report, and dies. No transient understanding accumulates or degrades across cycles.
Integrity is a gate, not a hope. Methodology rules are enforced mechanically at stage boundaries, not requested in prose preambles.
Not monolithic. Executors and gates are standalone, swappable, external CLIs. The system stays small because work is always delegated outward.
Integrity is deterministic and provable. Every gate is a mechanical check; no probabilistic model judgment sits in the critical path.

How It Works — Architectural Overview

SafeBuilder is a five-concern system. The concerns are kept genuinely separate; that separation is the anti-monolith discipline.

Concern	Responsibility	Realization
Spine	Durable orchestration: interprets the authored workflow graph — position, advancement, scheduling, correction. Never reasons about code.	Temporal parent + child workflows interpreting the spine (Java SDK)
Executor plugins	Adapt a generic "run this prompt against this repo, capture reads, return a report" into a specific agentic CLI, run under confinement.	Trusted-provenance adapters (Claude Code, OpenCode, LockedCode) sandboxed to the workspace
Gate plugins	Verify a completed cycle's artifacts; return pass/fail + findings. Methodology made mechanical.	Deterministic gates: drift, citation-integrity, write-scope, coverage. Pluggable interface accepts further gates.
Artifact bus	Move and persist verifiable artifacts with provenance between stages.	Local-filesystem build workspace
Front doors	Author, start, and observe builds.	Visual editor (browser-served; native desktop forthcoming); control/observation HTTP API; Factory CLI

The spine is the only component coupled to Temporal. Executors and gates are standalone external processes the spine invokes through activities. This is what makes the muscle swappable and the loop ownable.

The Three-Layer Execution Model

An executor is not a black box. It decomposes into three layers, and the distinction is load-bearing for provenance:

  Reasoning Model        (e.g., Claude Opus)      — decides what to do
        |
        v
     Executor            (e.g., Claude Code)      — turns decisions into tool calls
        |
        v
      Tools              (e.g., the Read tool)    — touch the source of truth

The source-of-truth interaction happens in the tool layer. Provenance is therefore captured there, at the tool-call boundary, and is not inferred from what the model says it did. The read-log is a recording made at this layer. This is the mechanical basis for the principle that provenance is observed, never self-reported.
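The idea of recording at the tool layer can be sketched in a few lines of plain Java. This is an illustration only, not SafeBuilder's adapter code: the class name RecordingReadTool, the injected read function, and the reduced ReadEntry shape (tool, path, content hash) are assumptions made for the example. The point it shows is that the caller can only obtain content through a method that also writes the log entry.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch of a tool-boundary interceptor; not SafeBuilder's actual adapter code. */
final class RecordingReadTool {

    /** One read-log entry. Field names loosely follow the read-log contract. */
    record ReadEntry(String tool, String path, String contentHash) {}

    private final Function<String, String> underlyingRead; // the real tool: path -> content
    private final List<ReadEntry> readLog = new ArrayList<>();

    RecordingReadTool(Function<String, String> underlyingRead) {
        this.underlyingRead = underlyingRead;
    }

    /** Serves the read and records it; content is never returned without a log entry. */
    String read(String path) {
        String content = underlyingRead.apply(path);
        readLog.add(new ReadEntry("Read", path, sha256(content)));
        return content;
    }

    /** Immutable snapshot of the reads recorded so far, in call order. */
    List<ReadEntry> readLog() {
        return List.copyOf(readLog);
    }

    static String sha256(String text) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }

    public static void main(String[] args) {
        // An in-memory map stands in for the filesystem in this example.
        Map<String, String> disk = Map.of("docs/openapi.yaml", "openapi: 3.1.0");
        RecordingReadTool tool = new RecordingReadTool(disk::get);
        tool.read("docs/openapi.yaml");
        System.out.println(tool.readLog());
    }
}
```

The model and the executor sit above this layer, so nothing they say about their own reads enters the log.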

Build Lifecycle (Canonical Flow)

  Authored Spine Graph
        |
        v
  [ Spine Interpreter ]  (parent — durable, holds position + advancement rule)
        |
        |  walk the graph in dependency order:
        v
  +------------------------------------------------------------------+
  |  [ PromptCycleWorkflow ]  (child — one task, ephemeral)          |
  |        |                                                         |
  |        v                                                         |
  |   Pre-flight gates  ---- fail ---->  (report to parent)          |
  |        | pass                                                    |
  |        v                                                         |
  |   Executor activity  (heartbeating, confined to workspace)       |
  |     - adapter installs tool-boundary interceptor                 |
  |     - CLI runs the prompt; every read captured to read-log       |
  |     - CLI writes code + completion-report.json to workspace      |
  |        |                                                         |
  |        v                                                         |
  |   Post-flight gates  (run in parallel — read-only):              |
  |     - drift gate        (read-log vs disk)                       |
  |     - citation-integrity gate (code vs read-log)                 |
  |     - write-scope gate  (files-touched within scope)             |
  |     - coverage gate     (measured % vs contract)                 |
  |        |                                                         |
  |        v                                                         |
  |   Gate verdict  -------> returned to parent                      |
  +------------------------------------------------------------------+
        |
        v
  [ Parent advancement decision ]
        |
        +-- all gates pass ----------> commit + push, advance to next node
        |
        +-- gate failure ------------> reset working tree to last passed
        |                              commit; spawn corrective child
        |                              (same task, seeded with prior findings)
        |
        +-- audit cadence reached ---> spawn audit child, then recalibrate
        |
        +-- graph exhausted ---------> build complete (STATUS=complete)

  (Between children only, never mid-child:
   parent may Continue-As-New to bound its event history.)
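The parent's advancement decision in the flow above can be read as a small pure function over structured fields. The following is a minimal sketch under stated assumptions: the names AdvancementSketch, CycleResult, and Decision are hypothetical, the order of the checks is a guess, and the attempt budget that turns a failure into a halt is an assumed parameter that the flow does not specify.

```java
import java.util.List;

/** Hypothetical sketch of the parent's advancement rule; names and check order are assumptions. */
final class AdvancementSketch {

    enum Decision { ADVANCE, CORRECTIVE_CHILD, AUDIT_CHILD, HALT, COMPLETE }

    /** Structured fields only: the parent never inspects code, prompt, or report bodies. */
    record CycleResult(List<Boolean> gateVerdicts, int attempt) {
        boolean allGatesPass() {
            return gateVerdicts.stream().allMatch(Boolean::booleanValue);
        }
    }

    static Decision decide(CycleResult result, int maxAttempts,
                           int cyclesSinceAudit, int auditCadence, boolean graphExhausted) {
        if (!result.allGatesPass()) {
            // Failure: retry from the last passed commit, or halt once the assumed budget is spent.
            return result.attempt() >= maxAttempts ? Decision.HALT : Decision.CORRECTIVE_CHILD;
        }
        if (graphExhausted) {
            return Decision.COMPLETE;
        }
        if (cyclesSinceAudit >= auditCadence) {
            return Decision.AUDIT_CHILD;
        }
        return Decision.ADVANCE;
    }

    public static void main(String[] args) {
        CycleResult failed = new CycleResult(List.of(true, false, true, true), 1);
        System.out.println(decide(failed, 3, 2, 10, false));
    }
}
```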

Authoring & Operating — Visual Editor (Browser or Desktop)

A build is an authored workflow graph, not a fixed script. SafeBuilder's visual editor is a node-graph canvas — served by the backend and opened in your browser — where you compose the workflow and operate it live. The backend serves the editor and every file and control operation over a local API; the same SPA is designed to run unchanged inside a native desktop shell (forthcoming — see Install & Run), because both surfaces speak only to that local API.

Typed node palette. Control-flow nodes (start, end, decision, parallel/join, bounded loop), execution nodes (prompt cycle, executor task, gate task, audit task), integration nodes (wait, signal, external activity), and annotations. Nodes connect with edges; connectivity is the single source of truth, and routing decisions are expressed in a small closed predicate grammar on the edges (allGatesPass, gateVerdict, taskStatus, lastVerdict, iterationCount) — never in free-form code.
Per-node configuration. A properties inspector exposes each node's settings: executor and model, timeouts and retries, the gate set, the declared write-scope, loop bounds, join policy, decision outcomes.
Folder-based projects. A project is a directory containing many named build workflows (*.spine.json), opened and managed through an IDE-style project tree. Nothing is stored in a database.
Validation before run. Every authored graph is checked against the spine invariants — exactly one start, full reachability, at least one reachable end, structured parallel/join pairing, bounded loops with body containment, decision-outcome coverage, registered executor/gate references, a well-formed predicate grammar — and rejected with itemized errors before it can run.
Live operation. You run the orchestration backend and its workflow engine (the bundled Temporal dev server) locally, and open the editor as a page the backend serves. Runs are started, paused, resumed, stopped, and signaled from the UI over the local API, and a live overlay animates per-node status as the build proceeds. The native desktop shell — which is designed to launch the backend and the workflow engine as managed, health-gated sidecars with no operator setup — is forthcoming: it is packaged from source (see Install & Run) but not yet validated on real hardware.
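A closed predicate grammar on edges can be modeled as a fixed set of data types, with no place for free-form code. The sketch below is illustrative only: the five predicate names come from the palette description above, but their parameters, the RunContext shape, and the class name EdgePredicateSketch are assumptions, and the real serialized form in a *.spine.json file is not shown here.

```java
import java.util.Map;

/** Hypothetical model of the closed edge-predicate grammar; parameter shapes are assumptions. */
final class EdgePredicateSketch {

    /** Structured run state that an edge predicate may consult. */
    record RunContext(Map<String, Boolean> gateVerdicts, String taskStatus,
                      boolean lastVerdict, int iterationCount) {}

    interface EdgePredicate {
        boolean test(RunContext ctx);
    }

    record AllGatesPass() implements EdgePredicate {
        public boolean test(RunContext ctx) {
            return ctx.gateVerdicts().values().stream().allMatch(Boolean::booleanValue);
        }
    }

    record GateVerdict(String gate, boolean expected) implements EdgePredicate {
        public boolean test(RunContext ctx) {
            Boolean actual = ctx.gateVerdicts().get(gate);
            return actual != null && actual == expected; // an unknown gate never matches
        }
    }

    record TaskStatus(String expected) implements EdgePredicate {
        public boolean test(RunContext ctx) {
            return expected.equals(ctx.taskStatus());
        }
    }

    record LastVerdict(boolean expected) implements EdgePredicate {
        public boolean test(RunContext ctx) {
            return ctx.lastVerdict() == expected;
        }
    }

    /** Assumed form of iterationCount: true while the loop counter is below a bound. */
    record IterationCountBelow(int bound) implements EdgePredicate {
        public boolean test(RunContext ctx) {
            return ctx.iterationCount() < bound;
        }
    }

    public static void main(String[] args) {
        RunContext ctx = new RunContext(Map.of("drift", true, "coverage", false), "complete", false, 2);
        System.out.println(new AllGatesPass().test(ctx));
        System.out.println(new GateVerdict("drift", true).test(ctx));
        System.out.println(new IterationCountBelow(3).test(ctx));
    }
}
```

Because each predicate is plain data, a validator can reject anything outside the set before a run starts.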

The graph the editor draws is the artifact the spine interprets — there is no translation step and no second source of truth.
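Three of the spine invariants listed under "Validation before run" (exactly one start, full reachability, at least one reachable end) reduce to a graph traversal. The sketch below covers only those three and returns itemized errors; the Node and Edge records, the type strings, and the error wording are assumptions for illustration, and the remaining invariants (parallel/join pairing, loop bounds, predicate grammar, and so on) are deliberately left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of three spine invariants; the real validator checks more. */
final class SpineValidationSketch {

    record Node(String id, String type) {}
    record Edge(String from, String to) {}

    /** Returns itemized errors; an empty list means these three invariants hold. */
    static List<String> validate(List<Node> nodes, List<Edge> edges) {
        List<String> errors = new ArrayList<>();
        List<Node> starts = nodes.stream().filter(n -> n.type().equals("start")).toList();
        if (starts.size() != 1) {
            errors.add("expected exactly one start node, found " + starts.size());
            return errors; // reachability is undefined without a single start
        }

        Map<String, List<String>> adjacency = new HashMap<>();
        for (Edge e : edges) {
            adjacency.computeIfAbsent(e.from(), k -> new ArrayList<>()).add(e.to());
        }

        // Walk the graph from the start node, following edges only.
        Set<String> reachable = new HashSet<>();
        Deque<String> frontier = new ArrayDeque<>(List.of(starts.get(0).id()));
        while (!frontier.isEmpty()) {
            String id = frontier.pop();
            if (reachable.add(id)) {
                frontier.addAll(adjacency.getOrDefault(id, List.of()));
            }
        }

        for (Node n : nodes) {
            if (!reachable.contains(n.id())) {
                errors.add("unreachable node: " + n.id());
            }
        }
        boolean endReachable = nodes.stream()
                .anyMatch(n -> n.type().equals("end") && reachable.contains(n.id()));
        if (!endReachable) {
            errors.add("no reachable end node");
        }
        return errors;
    }

    public static void main(String[] args) {
        List<Node> nodes = List.of(new Node("s", "start"), new Node("t", "prompt-cycle"),
                new Node("e", "end"), new Node("o", "prompt-cycle"));
        List<Edge> edges = List.of(new Edge("s", "t"), new Edge("t", "e"));
        System.out.println(validate(nodes, edges));
    }
}
```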

The Spine — Temporal Workflows

The Parent — Spine Interpreter

One instance per build. It interprets the authored spine graph and owns durable long-horizon state and nothing transient: the workflow graph, the current position within it, per-node status, the audit-cadence counter, and pointers to each completed cycle's artifacts. It determines the next runnable node, spawns children, applies the advancement decision (advance / corrective child / audit child / halt / complete), and manages its own event-history size via Continue-As-New.

What it must never do: read, parse, summarize, or reason about codebase content, prompt bodies, or report bodies beyond the structured status fields it needs to decide advancement. Any such logic is a defect.

PromptCycleWorkflow (Child)

One instance per task attempt. Born, hydrates from on-disk artifacts, runs one cycle, returns a structured result, dies — its event history dying with it, which is why the parent stays lean. It runs pre-flight gates, invokes the executor (heartbeating activity), runs post-flight gates in parallel, and returns the cycle result. The child does not commit or push; that is the parent's decision after seeing the verdict, keeping "what passed" and "what gets persisted to git" as one authority.

Continue-As-New Invariant (Hard Rule)

The parent bounds its event history with Continue-As-New. Child workflows are not retained across a parent's Continue-As-New: the parent receives a new run ID, so in-flight children would be orphaned and their signals broken. The rule is therefore:

The parent may invoke Continue-As-New only between prompt children, after explicitly awaiting completion of the current child and before spawning the next. It must never Continue-As-New while any child is in flight.

This is safe by construction given sequential execution.
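The invariant can be expressed as a guard that the parent consults before continuing. The sketch below is a plain-Java model, not Temporal code: in the Temporal Java SDK the actual hand-off would be a call such as Workflow.continueAsNew(...), which this example deliberately does not depend on. The class name ContinueAsNewGuard and the event-count threshold are assumptions.

```java
/** Plain-Java model of the Continue-As-New invariant; names and threshold are assumptions. */
final class ContinueAsNewGuard {

    private final int historyThreshold; // hypothetical event budget per run
    private int childrenInFlight;
    private int eventsThisRun;

    ContinueAsNewGuard(int historyThreshold) {
        this.historyThreshold = historyThreshold;
    }

    void childStarted() {
        childrenInFlight++;
        eventsThisRun++;
    }

    void childCompleted() {
        if (childrenInFlight == 0) {
            throw new IllegalStateException("no child in flight");
        }
        childrenInFlight--;
        eventsThisRun++;
    }

    /** True only between children, and only once the history budget is spent. */
    boolean mayContinueAsNew() {
        return childrenInFlight == 0 && eventsThisRun >= historyThreshold;
    }

    /** Models the new run: history starts over. Refuses if a child is still in flight. */
    void continueAsNew() {
        if (childrenInFlight > 0) {
            throw new IllegalStateException("Continue-As-New with a child in flight");
        }
        eventsThisRun = 0;
    }

    public static void main(String[] args) {
        ContinueAsNewGuard guard = new ContinueAsNewGuard(2);
        guard.childStarted();
        System.out.println(guard.mayContinueAsNew()); // a child is running
        guard.childCompleted();
        System.out.println(guard.mayContinueAsNew()); // between children, budget spent
    }
}
```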

Execution Order

Default execution follows the spine's edges. Concurrency is a first-class part of the vocabulary: a parallel node forks concurrent branches that a paired join re-converges under a declared policy (all / any / n) — structured concurrency, expressed in the graph and never inferred by the spine (inferring it would be reasoning about the work). The Continue-As-New invariant applies at parallel/join boundaries exactly as to single children. (This first-class parallel/join construct replaced v1's roadmap-declared-independent batches, which were retired with the v1 fixed parent.)
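The declared join policy (all / any / n) is a small, mechanical check over branch counts. A minimal sketch follows; the names JoinPolicySketch and JoinPolicy are hypothetical, and whether a branch "counts" on completion or only on a passing verdict is an assumption this example does not settle.

```java
/** Hypothetical sketch of the all / any / n join policies. */
final class JoinPolicySketch {

    enum Kind { ALL, ANY, N }

    record JoinPolicy(Kind kind, int n) {

        static JoinPolicy all() { return new JoinPolicy(Kind.ALL, 0); }
        static JoinPolicy any() { return new JoinPolicy(Kind.ANY, 0); }
        static JoinPolicy atLeast(int n) { return new JoinPolicy(Kind.N, n); }

        /** Decides whether the join may release, given how many branches have finished. */
        boolean satisfied(int completedBranches, int totalBranches) {
            return switch (kind) {
                case ALL -> completedBranches == totalBranches;
                case ANY -> completedBranches >= 1;
                case N -> completedBranches >= n;
            };
        }
    }

    public static void main(String[] args) {
        System.out.println(JoinPolicy.atLeast(2).satisfied(2, 3));
    }
}
```

The policy is read from the graph and applied as written; the spine does not decide which branches could safely run together.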

Activities

Activity	Role	Notes
`RunPreflightGates`	Execute pre-flight gate CLIs	Returns structured findings
`InvokeExecutor`	Shell out to the executor adapter, confined to the workspace	Heartbeating; long-running
`RunPostflightGates`	Execute post-flight gate CLIs in parallel	Read-only; per-gate verdicts
`ResetWorkingTree`	Reset the repo working tree to the last passed commit before a corrective cycle	Parent-authorized
`CommitAndPush`	Git commit + push on pass	Parent-authorized only; records commit hash
`RunAudit`	Execute the audit step for an audit child	Produces a fresh audit artifact

InvokeExecutor uses Temporal heartbeating with a heartbeat timeout (to detect a hung executor) and a start-to-close timeout (to bound cycle duration). Retries honor the STOP-LOSS discipline: a cycle that cannot go green is not looped indefinitely — it fails to the parent with the suspected cause logged, and the parent decides corrective re-invocation vs. halt. All timeout and retry values are centralized constants.
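The two timeouts answer different questions: the heartbeat timeout asks "is the executor still alive?", the start-to-close timeout asks "has this cycle run too long?". Temporal enforces both itself; the plain-Java sketch below only illustrates the semantics. The class name, the Health enum, and both duration values are hypothetical, chosen for the example rather than taken from SafeBuilder's centralized constants.

```java
import java.time.Duration;
import java.time.Instant;

/** Illustration of heartbeat vs start-to-close semantics; values are hypothetical. */
final class ExecutorTimeoutSketch {

    // Hypothetical values; the document only states that such constants are centralized.
    static final Duration HEARTBEAT_TIMEOUT = Duration.ofSeconds(60);
    static final Duration START_TO_CLOSE_TIMEOUT = Duration.ofMinutes(45);

    enum Health { HEALTHY, HUNG, OVER_BUDGET }

    static Health classify(Instant started, Instant lastHeartbeat, Instant now) {
        // Total cycle duration is bounded regardless of how lively the executor is.
        if (Duration.between(started, now).compareTo(START_TO_CLOSE_TIMEOUT) > 0) {
            return Health.OVER_BUDGET;
        }
        // A long silence since the last heartbeat marks the executor as hung.
        if (Duration.between(lastHeartbeat, now).compareTo(HEARTBEAT_TIMEOUT) > 0) {
            return Health.HUNG;
        }
        return Health.HEALTHY;
    }

    public static void main(String[] args) {
        Instant start = Instant.EPOCH;
        System.out.println(classify(start, start.plusSeconds(10), start.plusSeconds(130)));
    }
}
```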

Contracts

Both contracts are JSON Schema documents validating on-disk artifacts, not network payloads. The executor activity passes data by reference: it writes artifacts to the workspace and returns their paths; the spine and gates read them from disk.

completion-report (Executor → Spine)

Carries task_id, attempt, status (complete | blocked | partial), commit_hash (required when complete; absence fails validation), coverage ({ line_pct, branch_pct, measured: true } — self-declared, then independently re-checked by the coverage gate, which is authoritative), files_touched (checked by the write-scope gate against the declared blast radius), deviations (the STOP-LOSS log), gate_results (filled by the spine, not the executor), and read_log_ref (a pointer to the harness-captured read-log — the executor does not inline provenance). The human-readable .md report is a rendering of this JSON, not the source of truth.
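The conditional rule on commit_hash is the kind of constraint a schema validator enforces. The real contract is a JSON Schema document; the Java sketch below only mirrors a subset of its fields to show the rule in action. The record shape, the error wording, and the two extra checks (attempt at least 1, non-blank read_log_ref) are assumptions, not statements about the actual schema.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical mirror of part of the completion-report contract. */
final class CompletionReportSketch {

    enum Status { COMPLETE, BLOCKED, PARTIAL }

    record CompletionReport(String taskId, int attempt, Status status, String commitHash,
                            List<String> filesTouched, String readLogRef) {}

    /** Returns itemized validation errors; an empty list means the report is acceptable. */
    static List<String> validate(CompletionReport report) {
        List<String> errors = new ArrayList<>();
        if (report.attempt() < 1) {
            errors.add("attempt must be at least 1"); // assumed rule
        }
        if (report.status() == Status.COMPLETE
                && (report.commitHash() == null || report.commitHash().isBlank())) {
            errors.add("commit_hash is required when status is complete"); // stated in the contract
        }
        if (report.readLogRef() == null || report.readLogRef().isBlank()) {
            errors.add("read_log_ref must point to the harness-captured read-log"); // assumed rule
        }
        return errors;
    }

    public static void main(String[] args) {
        CompletionReport report = new CompletionReport(
                "T-07", 1, Status.COMPLETE, null, List.of("src/orders/Order.java"), "read-log.json");
        System.out.println(validate(report));
    }
}
```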

read-log (Harness → Gates)

Produced by the adapter's interceptor, never by the executor — the provenance of record. Carries task_id / attempt, executor (and capability tier — always trusted-provenance), an ordered reads list (one entry per content-surfacing tool call, with tool, path, range, content_hash, returned_content_ref, and timestamp), and an interception_manifest confirming the log was produced under a complete interception configuration. Because it is a recording, the entity being recorded cannot fabricate it.

Executor Input

The executor receives a prompt.md (beginning with the mandatory STOP / read-these-files header), the repo_path, the authorized source_pointers (the source-of-truth files: OpenAPI spec, audit, architecture, conventions), and the task's declared write-scope. The source pointers are the read authorization set and the write-scope is the write authorization set; the read-log is the actual read set and files_touched is the actual write set; the gates compare them.

Executor Plugins and the Interception Map

Each adapter wraps one agentic CLI and is responsible for two things: invoking it under the executor contract, and installing the tool-boundary interceptor that produces the read-log. Every supported executor is a trusted-provenance executor — one whose file reads are observable at the tool boundary. The adapter's defining responsibility is interception completeness: capturing every path by which the executor obtains file content, not only direct reads.

Claude Code — capture via Claude Code hooks at the tool boundary. The Read tool and content-surfacing search tools (Grep, Glob, and any file-content-returning search) are each matched at the post-tool hook and recorded as read entries. Hooks fire deterministically (shell commands, not model prompts), so capture is not subject to model cooperation.
OpenCode — capture via the plugin system: tool.execute.after is the primary capture hook, tool.execute.before a secondary control point, and experimental.session.compacting is captured as a first-class integrity event — the literal Xerox moment — so context-compaction is recorded rather than silently losing reads.
LockedCode (OpenCode fork) — inherits all OpenCode capture points, hardened in the fork: because the read path is owned, every file-content-returning operation is routed through a single recording chokepoint, making the read-log a property of the runtime rather than a cooperating plugin. This yields provenance the executor cannot route around.

Detection is paired with confinement. The adapter runs its CLI inside a sandbox scoped to the build workspace, so the executor's filesystem writes are physically bounded to the repo working tree rather than only checked after the fact. Confinement and the write-scope gate are complementary: the sandbox prevents an out-of-workspace write from ever escaping, while the gate fails the cycle when an in-workspace write lands outside the task's authorized blast radius. Neither relies on the executor's cooperation.
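The difference between the two checks is easiest to see side by side. The sketch below is a path-level illustration only: the real confinement is a sandbox, which a path comparison does not replace (symlinks, for example, are ignored here). The class and method names are hypothetical, and treating the write-scope as a list of directory prefixes is an assumption about how scope is declared.

```java
import java.nio.file.Path;
import java.util.List;

/** Path-level illustration of confinement vs write-scope; not a substitute for a sandbox. */
final class WriteScopeSketch {

    /** Confinement question: does the path stay inside the build workspace at all? */
    static boolean insideWorkspace(Path workspace, Path candidate) {
        Path root = workspace.toAbsolutePath().normalize();
        return root.resolve(candidate).normalize().startsWith(root);
    }

    /** Write-scope question: which touched files fall outside the task's declared scope? */
    static List<String> outOfScope(List<String> filesTouched, List<String> writeScopePrefixes) {
        return filesTouched.stream()
                .filter(file -> writeScopePrefixes.stream().noneMatch(
                        scope -> Path.of(file).normalize().startsWith(Path.of(scope).normalize())))
                .toList();
    }

    public static void main(String[] args) {
        System.out.println(insideWorkspace(Path.of("/tmp/ws"), Path.of("../etc/passwd")));
        System.out.println(outOfScope(
                List.of("src/orders/Order.java", "README.md"), List.of("src/orders")));
    }
}
```

A write to README.md passes the first check and fails the second: it is inside the workspace but outside the task's mandate.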

Every adapter emits the same two artifacts (completion-report.json rendered also as .md, and read-log.json) — the uniformity that makes a new trusted-provenance executor one new adapter plus its interception map and per-path capture tests.

Integrity Gates

Gates are standalone external CLIs. Each consumes a cycle's artifacts and emits a structured findings artifact plus a pass/fail exit status. The spine aggregates verdicts; the spine never judges code itself. Every gate conforms to one pluggable interface (workspace path + artifact pointers in; gate-findings.json + exit status out) and is read-only with respect to the repo, which is what permits running post-flight gates in parallel.
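The one-interface shape described above (workspace path and artifact pointers in; a findings file and an exit status out) can be sketched as follows. In SafeBuilder the gates are external CLIs; this example models them in-process so that it stays self-contained. The names GateRunnerSketch, Gate, and GateResult are hypothetical, and treating exit status 0 as "pass" is an assumption based on the usual CLI convention.

```java
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

/** In-process model of the pluggable gate interface; the real gates are external CLIs. */
final class GateRunnerSketch {

    record GateResult(String gate, int exitStatus, Path findingsFile) {
        boolean pass() {
            return exitStatus == 0; // assumed convention: 0 means pass
        }
    }

    interface Gate {
        GateResult verify(Path workspace, Map<String, Path> artifacts);
    }

    /** Gates are read-only with respect to the repo, so they can safely run concurrently. */
    static List<GateResult> runInParallel(List<Gate> gates, Path workspace,
                                          Map<String, Path> artifacts) {
        List<CompletableFuture<GateResult>> pending = gates.stream()
                .map(gate -> CompletableFuture.supplyAsync(() -> gate.verify(workspace, artifacts)))
                .toList();
        return pending.stream().map(CompletableFuture::join).toList();
    }

    /** The spine only aggregates verdicts; it never looks at the code under test. */
    static boolean allGatesPass(List<GateResult> results) {
        return results.stream().allMatch(GateResult::pass);
    }

    public static void main(String[] args) {
        Gate drift = (ws, artifacts) -> new GateResult("drift", 0, ws.resolve("drift-findings.json"));
        Gate coverage = (ws, artifacts) -> new GateResult("coverage", 1, ws.resolve("coverage-findings.json"));
        List<GateResult> results = runInParallel(List.of(drift, coverage), Path.of("workspace"), Map.of());
        System.out.println(results + " allPass=" + allGatesPass(results));
    }
}
```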

Drift Gate. Consumes the read-log; for each recorded read, compares the captured content (by hash, then by content on hash mismatch) against the current on-disk content at the recorded path/range, verifying the range is still valid. Fails on any mismatch or invalid range. Zero model involvement. This catches the most common and most dangerous failure mode — stale context / the Xerox effect — provably.
Citation-Integrity Gate. Verifies that every entity the produced code depends on or modifies corresponds to something present in the read-log. An entity touched but never read is generation-from-ignorance — hallucination caught mechanically at the tool layer, with no model cooperation.
Write-Scope (Blast-Radius) Gate. Compares the cycle's files_touched against the task's declared write-scope and fails any modification outside it. Where citation-integrity catches reading too little (code touching an entity that was never read), write-scope catches writing too much: an executor that reads correctly but edits files beyond its mandate, or scatters unrequested changes across the repository. Together the two gates bound both ends of the read/write relationship.
Coverage Gate. Independently measures line and branch coverage (it does not trust the report's self-declared number) and compares against the 100% contract. "Coverage" here is strictly test coverage of code, never build completeness.

Every gate is deterministic — a mechanical comparison, not a judgment. No model sits in the verdict path.
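As an example of how mechanical such a comparison is, here is a minimal drift check: re-hash what is on disk now and compare it with the hash captured at read time. It is a simplified sketch, not the gate itself. Range validation and the content comparison on hash mismatch are omitted, an in-memory map stands in for the filesystem, and the names DriftGateSketch and ReadEntry plus the finding strings are assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Simplified drift check: hash comparison only, no range handling. */
final class DriftGateSketch {

    record ReadEntry(String path, String contentHash) {}

    /** Returns one finding per recorded read whose current content no longer matches. */
    static List<String> findDrift(List<ReadEntry> readLog, Map<String, String> currentDisk) {
        List<String> findings = new ArrayList<>();
        for (ReadEntry entry : readLog) {
            String current = currentDisk.get(entry.path());
            if (current == null) {
                findings.add("missing on disk: " + entry.path());
            } else if (!sha256(current).equals(entry.contentHash())) {
                findings.add("stale read: " + entry.path());
            }
        }
        return findings;
    }

    static String sha256(String text) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }

    public static void main(String[] args) {
        // The spec was read as "v1" but has since changed on disk to "v2".
        List<ReadEntry> readLog = List.of(new ReadEntry("docs/openapi.yaml", sha256("v1")));
        System.out.println(findDrift(readLog, Map.of("docs/openapi.yaml", "v2")));
    }
}
```

An empty findings list is a pass; any entry fails the cycle. No model is consulted at any point.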

Artifact Bus and Trust Boundary

All stage-to-stage communication is through verifiable artifacts on disk. The store is the local filesystem: a per-build workspace containing prompts, the repo, completion reports, read-logs with captured content, gate findings, drift reports, and audit artifacts. Every artifact is associated with its run, node, and attempt; failed-attempt artifacts are retained as a forensic history of drift-and-correction.

This forensic record is what makes the corrective loop cheap and clean. Because failed cycles never commit, the last passed commit is always the last green state, so a corrective child begins by resetting the repo working tree to that commit and re-running the task from known-good. The failed attempt's partial edits are discarded from the tree, but its artifacts (read-log, gate findings, deviations) survive outside the tree as the seed for the next attempt. There is no expensive state-rewind; recovery is a git reset plus a fresh, bounded re-invocation.

Because Temporal holds only pointers, the artifact store is inside the trust boundary — every gate assumes store integrity.

Front Doors

SafeBuilder is operated through one shared control layer (FactoryControl) behind these surfaces:

Control & Observation API. A loopback HTTP API (OpenAPI 3.1, openapi.yaml): project and workflow management, spine validate/generate, run lifecycle (POST /projects/{id}/runs to start, plus stop/pause/resume), signal delivery, a live SSE event stream, and artifact access. A run is started spine-first — the API hands the project's saved spine.json to the interpreter; there is no roadmap input. The visual editor is built on this API.
Factory CLI. The same control layer from the command line: start --repo <dir> reads the project's spine.json and starts the run, after which you can stream status and drift/correction events and inspect a cycle's artifacts.
Visual editor (browser). Compose, validate, run, and observe a build on the node-graph canvas; the live overlay tracks per-node status. Served by the backend over the local API and opened in a browser; the same SPA is designed to run unchanged inside the forthcoming native desktop shell.

All surfaces delegate the heavy lifting to the spine; none of them reason about code.
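For orientation, a client call to the run-start endpoint might be assembled like this with the JDK's built-in HTTP client types. Only the path POST /projects/{id}/runs comes from this page. The loopback port in the example, the empty request body, and the class name StartRunRequestSketch are assumptions; consult openapi.yaml for the actual request shape. The sketch builds the request without sending it, and assumes the project id is already URL-safe.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical client-side sketch; the request body and port are assumptions. */
final class StartRunRequestSketch {

    /** Builds (but does not send) a run-start request for the given project. */
    static HttpRequest startRun(URI apiBase, String projectId) {
        return HttpRequest.newBuilder(apiBase.resolve("/projects/" + projectId + "/runs"))
                .POST(HttpRequest.BodyPublishers.noBody()) // assumed: the saved spine is used, no body
                .build();
    }

    public static void main(String[] args) {
        // 127.0.0.1:8080 is a placeholder for wherever the loopback API listens.
        HttpRequest request = startRun(URI.create("http://127.0.0.1:8080"), "demo");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Sending it would be a matter of passing the request to java.net.http.HttpClient with a running backend.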

The Suite

SafeBuilder is the code-generation core of a four-repo suite; the repos couple through on-disk JSON-Schema artifacts, not a network API:

safebuilder-construct (this repo — built) — the orchestration-and-verification engine: the spine interpreter, deterministic gates, trusted-provenance executor adapters, the control layer, the HTTP API, and the CLI. Verified end-to-end against a real Temporal dev server.
safebuilder-plan — model-assisted authoring of the build plan / spine (outside the integrity path).
safebuilder-contracts — the shared seam contracts (buildPlan in, constructionRecord out, findings back as corrective requirements) as JSON Schema over on-disk files.
safebuilder-verify (forthcoming — not yet scheduled) — an independent verdict layer built ground-up: a telemetry-anchored verdict (from the harness-captured read-logs, gate findings, and run history this repo emits — never self-report), verification agents as bounded lenses over that evidence, an adversarial squad that tries to refute a "done" claim from multiple angles, and a durable, content-hashed audit trail of every verdict. Construct-first: the engine ships first; Verify is designed (Architecture §17) and follows.

Install & Run

The orchestration engine runs today from source. The native desktop installers are packaged but not yet validated on real hardware (see the note at the end of this section).

Prerequisites: Java 21, Maven, the Temporal dev-server binary, and at least one executor CLI (Claude Code, OpenCode, or LockedCode). Maven only — no Gradle.

Build the backend — mvn clean package from the repo root (Java 21).
Start the durable workflow engine, disk-backed so build state survives a restart: temporal server start-dev --db-filename ~/temporal/safebuilder.db (the in-memory default does not satisfy restart-resumption).
Start the backend (the spine worker + the loopback HTTP API that also serves the editor SPA), then open the editor in your browser.
Author a build — compose and validate a spine graph in the editor (or hand-author a project's spine.json).
Start a run — spine-first. From the CLI: start --repo <project-dir> reads that project's spine.json and starts the interpreter. From the API: POST /projects/{id}/runs starts the interpreter on the project's saved spine. (There is no roadmap/BuildInput input — that v1 path was retired.)

Native desktop app (forthcoming). A Rust/Tauri shell that launches the JVM backend and the bundled Temporal dev server as managed, health-gated sidecars (zero operator setup) is packaged per OS from src-tauri/ — the full build matrix and signing/notarization steps are in src-tauri/BUILD.md. These installers require a real per-OS build host and have not yet been validated on hardware — there is no signed, downloadable binary today. Until then, run the engine from source as above.

Cross-Cutting Conventions

Centralized logging — the spine, adapters, and gates log through a centralized configuration; no ad hoc logging.
Centralized constants — all timeouts, retry counts, cadence values, and coverage thresholds live in one location, never inline.
Documentation — Javadoc on every class and public method (excluding DTOs, entities, generated code), shipped in the same pass.
Testing — unit + integration tests at 100% line and branch coverage, in the same pass; never a follow-up task.
Maven only — the Java spine builds with Maven; no Gradle.
Persistence during development — Hibernate, not Flyway; Flyway only at production.

What SafeBuilder Guarantees

These are the integrity guarantees the system is built to enforce. The mechanical ones below are proven today, exercised by the green test suite and by live runs against a real Temporal dev server. The two full-system outcomes (a complete unattended build, and mid-build restart-resume) are to be demonstrated end-to-end in the final on-hardware validation step, which is bound to a build host. The mechanisms they rest on are proven; the headline multi-cycle demonstration is the last step.

(full-system, final validation step) A dependency-ordered build of non-trivial size runs to completion through the spine, executors, and gates, unattended across the full cycle.
(full-system, final validation step) The spine survives a process restart mid-build and resumes correctly from durable state. (The durable engine and disk-backed persistence are in place; the mid-build demonstration is part of the on-hardware step.)
The drift gate detects a stale read against the harness read-log and halts or corrects the cycle.
The citation-integrity gate fails a cycle whose code touches an entity absent from the read-log; the write-scope gate fails a cycle that writes outside its declared blast radius; the coverage gate fails an under-covered cycle.
Interception is complete: every content-surfacing tool path (direct read plus grep/glob/search) is captured in the read-log, with a test per path.
A failed gate resets the working tree to the last passed commit and triggers a corrective re-invocation seeded with prior findings — or fails loud (a RESET_FAILED halt) rather than building a correction atop a tree it could not reset; the forensic history of the drift-and-correction is retained and queryable.
The executor is swappable: a second trusted-provenance adapter substitutes via configuration alone, with no spine code change.
All code ships with unit and integration tests at 100% line and branch coverage (measured) and documentation on every class and public method, in the same pass.

What SafeBuilder Is Not

Not a coding agent. Not a code editor or IDE — the visual editor composes build-workflow graphs, not source. Not a replacement for executors. Not a semantic correctness oracle — it verifies provenance, scope, and contract conformance, not whether the code is good.

Governing documents: SafeBuilder Charter v2.0.0 · Architecture v2.0.0