- Design Goal
- Canonical Model
- Philosophy
- Current Implementation Priorities
- System Boundary
- Structured Render Contract
- Module Layer Rules
- Adapter Boundary
- Pipeline
- Invariants
- Category B Behaviors
- Why Persistent Token Accounting Matters
- Why SHA-256 Matters
- Why Output Spans Matter
- Config Safety and Merge Semantics
- Extraction Semantics
Track A produces proof-grade artifacts such as bundles, manifests, hashes, and verification results. Track B produces hypotheses, exploratory reasoning, and live workspace investigation state. Use the proof path when later humans or CI must trust the output. Use the live path when the work is still moving.
Trust remains asymmetric across the system:
This page recaps only the local boundary. Use Cross-cutting Concepts when you need the full canonical model.
This is the implementation reference for contributors who need to understand how the codebase enforces the canonical model.
Read this after Mental Model, System Contracts, and System Map, not before them.
Design Goal
cx exists to make context bundling reproducible enough for automation.
For the documentation map, see Documentation Index.
The project now centers a native render kernel because rendering alone is not enough for CI/CD. A pipeline also needs deterministic planning, exact metadata, verification, and explicit recovery semantics. The remaining adapter/oracle seam exists for diagnostics and parity visibility, not as the shipped proof path.
Canonical Model
See Operating Modes and Mental Model.
This document starts where the shared mental model stops. The mental model defines the CX triad, Track A vs Track B, MCP policy tiers, and the artifact lifecycle. Architecture explains how the codebase enforces that model with deterministic planning, manifest truth, verification rules, and extraction guardrails.
Philosophy
cx is an operational bundler with a kernel-owned proof path.
That distinction explains most of the architecture:
- exploratory packing tolerates heuristics
- operational bundling needs reproducibility
- local prompt assembly can accept "close enough"
- remote automation needs "provably the same"
This is why cx records token accounting in the manifest, writes
canonical JSON, and protects emitted artifacts with SHA-256 checksums.
Those are not decorative constraints. They are the mechanisms that let a
bundle survive time, transport, and automation boundaries.
Current Implementation Priorities
The current reliability program avoids broad framework churn and instead prioritizes boundary cleanup and clearer ownership.
The active modernization targets are:
- keeping command I/O injectable instead of process-global
- keeping workspace context explicit instead of ambient
- keeping Vitest as the authoritative shared-suite test runner and coverage lane
- keeping Bun limited to explicit runtime compatibility smoke
- keeping the adapter seam parity-only and out of the shipped proof path
That means the main architectural work is still about deterministic boundaries, not about swapping frameworks for their own sake.
System Boundary
cx does not shell out to the adapter/oracle runtime. It uses a narrow
boundary and only relies on the small public surface it actually needs
for diagnostics and parity work.
Core responsibility split:
- Render kernel (src/render/): proof-path interfaces, ordering, spans, and plan hashing
- Adapter/oracle seam (src/adapter/): parity-only adapter integration and runtime capability reporting for expert oracle diagnostics during migration
- cx planner: decide which files belong where
- cx manifest layer: describe the bundle in stable JSON
- cx verification layer: confirm artifacts and source-tree alignment
- cx extraction layer: recover files according to manifest truth
Structured Render Contract
The canonical proof-path contract now lives in Render Kernel Contract. This section gives architectural context for the current implementation.
The stabilized internal seams that sit beneath that contract are recorded in Internal API Contract.
cx enforces a deterministic structured render contract instead of
relying on heuristic span parsing. The native kernel owns the production
proof model; the adapter/oracle seam is comparison-only.
The structured render contract is represented as:
    interface StructuredRenderEntry {
      path: string;
      content: string;
      sha256: string; // Content hash, not dependent on rendering
      tokenCount: number;
    }

    interface StructuredRenderPlan {
      entries: StructuredRenderEntry[];
      ordering: string[]; // Canonical lexicographic ordering
    }
Enforcement
- Deterministic ordering: All files are sorted lexicographically. validatePlanOrdering() enforces this invariant.
- Content integrity: Each file’s sha256 is computed during structured extraction, not after rendering. validateEntryHashes() verifies consistency.
- Plan hash: The manifest stores renderPlanHash, computed from the deterministic JSON representation of the plan. This gives verification a cryptographic check that the recorded render plan has not changed.
- Verification: cx verify now checks:
  - Ordering is deterministic (no regressions in file order)
  - All entry hashes are consistent (content didn’t drift)
  - Plan hash is reproducible (render contract is sound)
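The three checks above can be sketched in a few lines. This is only an illustration under stated assumptions: checkPlan, planHash, and sha256Hex are hypothetical names (not the validatePlanOrdering() or validateEntryHashes() implementations), "lexicographic" is assumed to mean plain code-unit string order, and the exact JSON shape that feeds renderPlanHash is assumed.

```typescript
import { createHash } from "node:crypto";

interface StructuredRenderEntry {
  path: string;
  content: string;
  sha256: string;
  tokenCount: number;
}

interface StructuredRenderPlan {
  entries: StructuredRenderEntry[];
  ordering: string[];
}

const sha256Hex = (text: string): string =>
  createHash("sha256").update(text, "utf8").digest("hex");

// Assumption: lexicographic means default code-unit ordering, not locale collation.
function isLexicographicallySorted(paths: string[]): boolean {
  const sorted = [...paths].sort();
  return paths.every((p, i) => p === sorted[i]);
}

// Hypothetical checker: returns a list of problems, empty when the plan is sound.
function checkPlan(plan: StructuredRenderPlan): string[] {
  const problems: string[] = [];
  if (!isLexicographicallySorted(plan.ordering)) problems.push("ordering");
  for (const entry of plan.entries) {
    if (sha256Hex(entry.content) !== entry.sha256) problems.push(`hash:${entry.path}`);
  }
  return problems;
}

// Assumed canonical form: fixed key order, content represented by its hash.
function planHash(plan: StructuredRenderPlan): string {
  const canonical = JSON.stringify({
    ordering: plan.ordering,
    entries: plan.entries.map((e) => ({
      path: e.path,
      sha256: e.sha256,
      tokenCount: e.tokenCount,
    })),
  });
  return sha256Hex(canonical);
}
```

Because every input to planHash is serialized in a fixed order, rebuilding the same plan on another machine should yield the same digest, which is the property the plan-hash check relies on.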
Files
- src/render/types.ts: kernel-owned proof-path types
- src/render/engine.ts: render engine interface plus the current native-first default implementation
- src/render/native/: native proof-path renderers for XML, Markdown, Plain, and JSON section outputs plus shared render helpers
- src/render/structuredPlan.ts: kernel-owned structured plan construction for native rendering plus adapter-plan extraction for parity/oracle use
- src/render/ordering.ts: deterministic ordering invariant checks
- src/shared/tokenizer.ts: TokenizerProvider interface plus the default tokenizer implementation used by bundle, inspect, and render planning
- src/doctor/scanner.ts: ScannerPipeline interface plus the reference scanner bridge used by cx doctor secrets
- src/render/planHash.ts: section and aggregate render-plan hashing
- src/render/spans.ts: style-aware output span helpers
- src/adapter/oracleRender.ts: reference-oracle renderer retained for parity checks and compatibility diagnostics
- src/manifest/types.ts: Added renderPlanHash field
- src/manifest/build.ts: Computes aggregate plan hash from sections
- src/bundle/verify.ts: Validates plan integrity during verification
Module Layer Rules
The codebase enforces strict import directionality. Violations are bugs, not style issues.
config/ shared/ vcs/ ← foundation (no domain imports)
↓
notes/ planning/ manifest/ ← domain modules
↓
inspect/ doctor/ adapter/ ← cross-domain orchestration
↓
mcp/ ← transport layer (imports domain only)
↓
cli/commands/ ← presentation layer (thin shells)
Enforced boundaries:
- planning/ must not import from notes/. The planner classifies files; note graph enrichment is an orchestration concern. notes/planner.ts provides enrichPlanWithLinkedNotes as a post-planning step called by the CLI bundle and inspect paths, not by the planner itself.
- mcp/ must not import from cli/commands/. MCP is a transport layer; it imports domain functions from doctor/, inspect/, notes/, and planning/ directly. The CLI command files are thin presentation shells that re-export from those same domain modules.
- Note CRUD operations (createNewNote, readNote, updateNote, renameNote, deleteNote, searchNotes, listNotes) live in notes/crud.ts, not in cli/commands/. CLI and MCP both import from the domain module.
- MCP config resolution (resolveMcpConfigPath) lives in mcp/config.ts, not in cli/commands/. The CLI mcp command re-exports it from there.
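One way to picture the directionality rule is a small lint-style check. The sketch below is hypothetical: LAYER_RANK, FORBIDDEN, and isImportAllowed are illustrative names, the ranks are read off the diagram above, and the src/ prefix on every module path is an assumption. It is not the mechanism the codebase actually uses.

```typescript
// Ranks read from the layer diagram; lower numbers sit closer to the foundation.
const LAYER_RANK: Record<string, number> = {
  config: 0, shared: 0, vcs: 0,
  notes: 1, planning: 1, manifest: 1,
  inspect: 2, doctor: 2, adapter: 2,
  mcp: 3,
  "cli/commands": 4,
};

// Same-layer import that the text bans explicitly.
const FORBIDDEN: Array<[string, string]> = [["planning", "notes"]];

// Assumption: modules live under src/<module>/. Longest prefix wins.
function moduleOf(file: string): string | undefined {
  return Object.keys(LAYER_RANK)
    .sort((a, b) => b.length - a.length)
    .find((m) => file.startsWith(`src/${m}/`));
}

function isImportAllowed(fromFile: string, toFile: string): boolean {
  const from = moduleOf(fromFile);
  const to = moduleOf(toFile);
  if (from === undefined || to === undefined) return true; // outside the modeled layers
  if (FORBIDDEN.some(([f, t]) => f === from && t === to)) return false;
  // A module may import from its own layer or from any layer beneath it.
  return LAYER_RANK[to]! <= LAYER_RANK[from]!;
}
```

Under these assumptions an import from src/mcp/ into src/cli/commands/ is rejected because it points up the stack, while the CLI importing from notes/ is accepted.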
Module inventory (domain layer):
| Module | Responsibility |
|---|---|
| | Note CRUD I/O and search |
| | Note link graph construction and queries |
| | Linked-note enrichment for bundle plans |
| | VCS-anchored master file list construction |
| | MCP profile diagnostic report |
| | Section overlap diagnostic report |
| | Secret scan diagnostic report |
| | Workflow mode recommendation |
| | Bundle plan inspection report |
| | MCP config path resolution |
| | Shared MCP tool catalog and capability metadata |
| | Shared MCP registration wrapper that carries capability into enforcement |
| | Workspace navigation tools |
| | Bundle preview and inspect tools |
| | Doctor diagnostic tools |
| | Note CRUD and graph tools |
Adapter Boundary
The src/adapter/ directory is the explicit boundary between cx core
logic and the external oracle/reference runtime.
This is an expert surface:
- parity diagnostics
- oracle capability inspection
- migration and reference testing
It is not required for ordinary bundle, verify, validate, or extract flows.
- oracleRender.ts owns parity-only calls into the adapter. Nothing outside this module invokes adapter render functions directly.
- capabilities.ts performs runtime feature detection (is packStructured available? is renderWithMap available?) so the planner and manifest builder never need to know which rendering path was taken.
- section.ts provides the render-command wrapper for explicit section output without letting the CLI reach into render internals directly.
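Runtime feature detection of this kind usually amounts to probing a loaded module for callable exports. The following is a minimal sketch, assuming the adapter is available as a plain module object; AdapterCapabilities and detectCapabilities are hypothetical names, and only packStructured and renderWithMap come from the text.

```typescript
interface AdapterCapabilities {
  packStructured: boolean;
  renderWithMap: boolean;
}

// Probe an arbitrary module object without assuming anything about its shape.
function detectCapabilities(adapterModule: unknown): AdapterCapabilities {
  const has = (name: string): boolean =>
    typeof adapterModule === "object" &&
    adapterModule !== null &&
    typeof (adapterModule as Record<string, unknown>)[name] === "function";

  return {
    packStructured: has("packStructured"),
    renderWithMap: has("renderWithMap"),
  };
}
```

Callers receive a plain capability record, so nothing outside the adapter boundary needs to inspect the adapter module itself.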
The [repomix] section in cx.toml is adapter-specific
configuration. cx.toml keys like show_line_numbers,
include_empty_directories, and security_check are passed
through exclusively within adapter/scanner bridge code and are never
interpreted by the planner, manifest builder, or any other core
proof-path module. style is the single key shared between the two
layers: it is used by the planner to determine output file extensions
and by adapter/oracle code only when comparison tooling needs it.
Future external comparison backends should follow the same pattern: a
new src/<backend>/ directory that exposes the same comparison
seam while keeping core modules unaware of adapter internals.
Current Migration State
The proof path now uses a native-only production engine:
- XML, Markdown, Plain, and JSON sections are rendered through the native kernel path in src/render/native/
- tests/render/nativeParity.test.ts is the release-gating evidence that the reference oracle and native kernel still satisfy the same proof contract
This split is intentional. It lets the kernel own the production proof
path without weakening extraction, verification, or manifest trust.
Official repomix remains only as an optional reference oracle for
parity/testing work; the older fork is historical. The reference-oracle lane now
includes scripts/repomix-adapter-parity.js, which keeps adapter upgrades bound
to a checked-in repomix 1.13.1 baseline for rendered XML bytes, per-file token
counts, total token count, and security findings.
Pipeline
1. Configuration load
cx loads cx.toml, expands supported paths, resolves behavioral
settings, and validates the project configuration.
2. Deterministic planning
Planning is a three-phase pipeline anchored to the VCS-derived master list.
Phase 1: Build the master base
The planner resolves the source of truth for candidate files from version control, not from a broad filesystem walk. The priority order is:
- Git: git ls-files --cached provides the tracked set.
- Fossil: fossil ls provides the tracked set.
- Filesystem: used as a fallback when no VCS root is detected (for test environments or unversioned workspaces).
That VCS-derived list is the master base for all later planning decisions.
Tests validate VCS dispatch using real temporary repositories rather than module mocking. This avoids worker-global mock leakage and keeps the planner behavior identical to the actual Git/Fossil/Hg provider code paths used in production.
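The provider priority can be sketched as a small selection function. Everything here is illustrative: WorkspaceProbe, selectProvider, and buildMasterBase are hypothetical names, the listers are injected so the sketch does not spawn processes (a real planner would obtain the lists from git ls-files --cached or fossil ls), and sorting the result is an assumption made for determinism.

```typescript
type VcsProvider = "git" | "fossil" | "none";

interface WorkspaceProbe {
  hasGitRoot: boolean;
  hasFossilRoot: boolean;
}

// Priority order from the text: Git first, then Fossil, then the filesystem fallback.
function selectProvider(probe: WorkspaceProbe): VcsProvider {
  if (probe.hasGitRoot) return "git";
  if (probe.hasFossilRoot) return "fossil";
  return "none";
}

type Lister = () => string[];

// Assumption: the master base is deduplicated and sorted before later phases see it.
function buildMasterBase(
  probe: WorkspaceProbe,
  listers: Record<VcsProvider, Lister>,
): string[] {
  const files = listers[selectProvider(probe)]();
  return Array.from(new Set(files)).sort();
}
```

Injecting the listers also mirrors the testing approach described above: a test can point each lister at a real temporary repository instead of mocking modules.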
Phase 2: Apply global list shaping
The global [files] include array can extend the master base
with extra paths that the VCS does not track, such as generated
artefacts or build outputs. The global [files] exclude array
is applied after all extensions to remove paths that should never be
planned.
This shaping step changes membership in the master base, but it still does not let sections discover new files on their own.
Phase 3: Classify into sections
Section include and exclude globs are classifiers, not
discoverers. They operate only on the already-computed master base and
can never add a file that is not already in it. This separates the
question of what exists from the question of where it belongs.
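Phases 2 and 3 can be illustrated together. This is a sketch under stated assumptions: globs are replaced by plain predicate functions (Matcher) to keep the example self-contained, shapeMasterBase and classify are hypothetical names, and overlap handling is reduced to "first matching section" because overlap resolution is a separate concern.

```typescript
type Matcher = (path: string) => boolean;

interface SectionRule {
  name: string;
  include: Matcher;
  exclude?: Matcher;
}

// Phase 2: extend the VCS list with explicit extras, then apply the global exclude.
function shapeMasterBase(
  vcsFiles: string[],
  extraIncludes: string[],
  globalExclude: Matcher,
): string[] {
  const extended = Array.from(new Set([...vcsFiles, ...extraIncludes]));
  return extended.filter((p) => !globalExclude(p)).sort();
}

// Phase 3: sections only classify paths that are already in the master base.
function classify(masterBase: string[], sections: SectionRule[]) {
  const bySection = new Map<string, string[]>();
  const unmatched: string[] = [];
  for (const path of masterBase) {
    const owner = sections.find(
      (s) => s.include(path) && !(s.exclude?.(path) ?? false),
    );
    if (owner === undefined) {
      unmatched.push(path);
      continue;
    }
    const bucket = bySection.get(owner.name) ?? [];
    bucket.push(path);
    bySection.set(owner.name, bucket);
  }
  return { bySection, unmatched };
}
```

Note that classify never receives anything but the master base, so a section matcher has no way to introduce a file that phase 2 did not already admit.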
At this stage the planner resolves:
- project name
- source root
- output directory
- VCS provider and working-tree dirty state
- section membership (including priority-based overlap resolution)
- copied assets
- unmatched files
- overlap and collision failures
This happens before rendering because the plan must be settled first.
Dirty state taxonomy
After deriving the master list, the planner classifies the working tree into one of four states:
| State | Condition | Default behavior |
|---|---|---|
| | No modified or untracked files | Plan proceeds normally |
| | Only untracked files present | Plan proceeds with a warning |
| | Tracked files have uncommitted modifications | Planning aborts (exit 7) |
| | | Plan proceeds with a warning |
| | | Plan proceeds with a warning |
The unsafe_dirty guard exists because a bundle built from a dirty
tracked file cannot be reliably reproduced or verified later. Two escape
hatches are available: --force for local experimentation where a human
is present, and --ci for automated pipelines. Both are recorded in the
manifest under distinct state labels so audit tooling can distinguish
human overrides from pipeline overrides.
Why this protects you: a tracked-file bundle built from a moving working tree cannot later prove what source state it represents. Refusing to proceed by default keeps the artifact contract anchored to reproducible input.
VCS state is not tracked for filesystem-fallback workspaces. Those
always produce dirtyState = "clean" and vcsProvider = "none".
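A classification along these lines might look like the sketch below. It is an illustration, not the planner's code: TreeStatus and classifyDirtyState are hypothetical names, the mapping of the clean, safe_dirty, and unsafe_dirty labels to conditions is inferred from the surrounding prose, and the label recorded for --ci overrides is left out because the text does not name it.

```typescript
type DirtyState = "clean" | "safe_dirty" | "unsafe_dirty" | "forced_dirty";

interface TreeStatus {
  provider: "git" | "fossil" | "none";
  modifiedTracked: string[];
  untracked: string[];
}

const EXIT_UNSAFE_DIRTY = 7;

function classifyDirtyState(status: TreeStatus, force: boolean): DirtyState {
  // Filesystem-fallback workspaces do not track VCS state at all.
  if (status.provider === "none") return "clean";
  if (status.modifiedTracked.length > 0) {
    return force ? "forced_dirty" : "unsafe_dirty";
  }
  if (status.untracked.length > 0) return "safe_dirty";
  return "clean";
}

// Only the unguarded unsafe state aborts planning.
function exitCodeFor(state: DirtyState): number {
  return state === "unsafe_dirty" ? EXIT_UNSAFE_DIRTY : 0;
}
```

Keeping the override in the returned label, rather than silently mapping it back to clean, is what lets later audit tooling see that a human chose to proceed.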
Section priority and overlap resolution
Sections are ordered by their priority value (descending) before
overlap resolution begins. A file claimed by multiple sections is
assigned to whichever section appears first in the resolved order.
Sections without an explicit priority are treated as priority 0 and
their relative order follows dedup.order (config position or lexical)
as a stable tie-breaker.
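The ordering rule can be sketched as a comparator followed by a first-match lookup. The names below (Section, resolveOrder, assignOwner) and the "config" and "lexical" spellings of dedup.order are assumptions for illustration, and the tie-breaker is applied to any pair of sections with equal priority, not only to those without an explicit value.

```typescript
interface Section {
  name: string;
  priority?: number;
  configIndex: number; // position in the config file
  claims: (path: string) => boolean;
}

function resolveOrder(sections: Section[], dedupOrder: "config" | "lexical"): Section[] {
  return [...sections].sort((a, b) => {
    // Higher priority first; a missing priority counts as 0.
    const byPriority = (b.priority ?? 0) - (a.priority ?? 0);
    if (byPriority !== 0) return byPriority;
    if (dedupOrder === "config") return a.configIndex - b.configIndex;
    return a.name < b.name ? -1 : a.name > b.name ? 1 : 0;
  });
}

// The first section in resolved order that claims the path owns it.
function assignOwner(path: string, ordered: Section[]): string | undefined {
  return ordered.find((s) => s.claims(path))?.name;
}
```

With this shape, raising the priority of the intended owner is enough to change the outcome, without touching any exclude list.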
Catch-all sections
A section may set catch_all = true instead of providing an
include list. A catch-all section absorbs all files in the master list
that were not claimed by any other section. It runs last in the planning
pipeline, after all normal sections have consumed their files, and may
still apply an exclude list to filter what it absorbs.
At most one catch-all section is permitted per configuration.
This same mechanism lets repository notes live in the normal planning
model. The default docs section now includes notes/**, so
durable human intent is bundled beside documentation and code-facing
Markdown instead of being treated as a separate side channel.
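Catch-all absorption can be pictured as a final pass over whatever the normal sections left unclaimed. This sketch assumes the set of claimed paths is already known; SectionConfig and absorbRemainder are hypothetical names, and the exclude list is modeled as a predicate instead of globs.

```typescript
interface SectionConfig {
  name: string;
  catchAll?: boolean;
  exclude?: (path: string) => boolean;
}

function absorbRemainder(
  masterBase: string[],
  claimed: Set<string>,
  sections: SectionConfig[],
): { section: string | undefined; files: string[] } {
  const catchAlls = sections.filter((s) => s.catchAll === true);
  if (catchAlls.length > 1) {
    throw new Error("at most one catch_all section is permitted");
  }
  const [catchAll] = catchAlls;
  if (catchAll === undefined) return { section: undefined, files: [] };

  // Runs last: only files no other section claimed, minus the section's own excludes.
  const files = masterBase.filter(
    (p) => !claimed.has(p) && !(catchAll.exclude?.(p) ?? false),
  );
  return { section: catchAll.name, files };
}
```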
Three overlap handling strategies are available via dedup.mode:
- fail: planning aborts with an actionable message. cx doctor can propose static exclude fixes, or you can set higher priority on the owning section and switch to first-wins to avoid static TOML mutations entirely.
- warn: conflicts are reported to stderr and resolution proceeds using priority order.
- first-wins: overlaps are resolved silently using priority order.
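The three modes differ only in what happens before the priority winner is taken. A minimal sketch, assuming overlaps have already been collected with claimants listed in resolved priority order; Overlap and resolveOverlaps are hypothetical names and the message wording is invented for the example.

```typescript
type DedupMode = "fail" | "warn" | "first-wins";

interface Overlap {
  path: string;
  claimants: string[]; // section names, highest priority first
}

function resolveOverlaps(
  overlaps: Overlap[],
  mode: DedupMode,
  warn: (message: string) => void = (m) => console.error(m),
): Map<string, string> {
  if (mode === "fail" && overlaps.length > 0) {
    const paths = overlaps.map((o) => o.path).join(", ");
    throw new Error(`section overlap on: ${paths}`);
  }
  const owners = new Map<string, string>();
  for (const overlap of overlaps) {
    const [winner] = overlap.claimants;
    if (winner === undefined) continue;
    if (mode === "warn") {
      warn(`${overlap.path}: claimed by ${overlap.claimants.join(", ")}; assigned to ${winner}`);
    }
    owners.set(overlap.path, winner);
  }
  return owners;
}
```

In this shape warn and first-wins produce identical ownership; the only difference is whether the operator hears about it.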
For rapidly evolving codebases where new files frequently match multiple
sections, dedup.mode = "first-wins" with explicit priority values is
preferable to accumulating static exclude paths in cx.toml. Static
excludes become stale as files are renamed or moved, and they generate
merge conflicts when multiple developers resolve overlaps concurrently
on separate branches.
3. Section rendering
Each section is rendered as one Repomix-compatible output file in the configured style:
- xml
- markdown
- json
- plain
cx also supplies a section-specific Repomix header through the
documented output.headerText option so the file itself carries
cx-oriented handover context without post-processing the generated
output. That header is intentionally a presentation-layer prolog rather
than part of the core format semantics: it explains what the artifact
is, how to interpret logical paths, where edits belong, and which bundle
contracts remain authoritative.
This yields more semantic fluency for humans and AI consumers without relaxing the deterministic contract:
- rigid core semantics for markers, directives, manifests, and validation
- expressive consumer guidance in the generated prolog and shared handover
- optional narrative cues that never override canonical bundle data
The renderer also reports output token counts. If the adapter supports
exact span capture, cx records absolute outputStartLine and
outputEndLine values for each packed text file in XML, Markdown, and
plain sections. Those spans are the primary lookup path for those text
formats. JSON uses direct object lookup instead of span metadata, and
JSON-only bundles may omit spans entirely.
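A span lookup then reduces to slicing the rendered output by line number. The sketch below assumes outputStartLine and outputEndLine are 1-based and inclusive, which the text does not state; OutputSpan and sliceBySpan are hypothetical names.

```typescript
interface OutputSpan {
  outputStartLine: number;
  outputEndLine: number;
}

// Assumption: spans are 1-based and inclusive on both ends.
function sliceBySpan(sectionOutput: string, span: OutputSpan): string {
  const lines = sectionOutput.split("\n");
  const valid =
    span.outputStartLine >= 1 &&
    span.outputEndLine >= span.outputStartLine &&
    span.outputEndLine <= lines.length;
  if (!valid) {
    throw new RangeError(
      `span ${span.outputStartLine}-${span.outputEndLine} is outside the section output`,
    );
  }
  return lines.slice(span.outputStartLine - 1, span.outputEndLine).join("\n");
}
```

Failing loudly on an out-of-range span matters here: a span that no longer fits the output is a sign the output and manifest have drifted apart.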
4. Shared handover
cx bundle writes a kernel-rendered shared handover file alongside the
section outputs. The handover is meant to travel with the section files
when multiple outputs are handed over together, so the shared context is
externalized without breaking the self-contained section files.
For XML-style bundles, that handover intentionally stays mostly plain
text. Rare tags such as <section_inventory> and
<recent_repository_history> act as semantic anchors
for LLMs and operators rather than as a full XML document contract.
5. Manifest build
cx writes a canonical manifest that records:
- bundle identity and versions
- source root and bundle directory
- VCS provider (git, fossil, or none)
- dirty state at bundle time (clean, safe_dirty, or forced_dirty)
- list of uncommitted modified files when forced_dirty
- checksum algorithm
- the shared handover filename (currently stored in the manifest handoverFile field)
- section outputs
- copied assets
- repository note metadata, including extracted summaries when notes are present
- per-file token counts
- source metadata such as size, media type, and mtime
- output spans for XML, Markdown, and plain sections when exact span capture is available; JSON sections do not need them
The manifest is not just a report. It is the contract other commands operate against.
6. Lock file and checksums
cx bundle also writes:
- a lock file containing the resolved Category B behavioral settings used during bundle creation
- a SHA-256 checksum sidecar covering the manifest, lock file, section outputs, and copied assets
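The sidecar idea can be sketched with Node's built-in crypto module. The line format used here (64 hex characters, two spaces, file name, in the style of sha256sum) is an assumption, as are the names writeSidecar and verifySidecar; the actual sidecar layout may differ.

```typescript
import { createHash } from "node:crypto";

type Artifacts = Map<string, string | Uint8Array>;

const sha256Hex = (data: string | Uint8Array): string =>
  createHash("sha256").update(data).digest("hex");

// Assumed format: "<hex digest>  <name>", one artifact per line, sorted by name.
function writeSidecar(artifacts: Artifacts): string {
  return Array.from(artifacts.keys())
    .sort()
    .map((name) => `${sha256Hex(artifacts.get(name) ?? "")}  ${name}`)
    .join("\n");
}

function verifySidecar(sidecar: string, artifacts: Artifacts): string[] {
  const failures: string[] = [];
  const listed = new Set<string>();
  for (const line of sidecar.split("\n")) {
    if (line === "") continue;
    const hex = line.slice(0, 64);
    const name = line.slice(66);
    listed.add(name);
    const data = artifacts.get(name);
    if (data === undefined) failures.push(`missing artifact: ${name}`);
    else if (sha256Hex(data) !== hex) failures.push(`checksum mismatch: ${name}`);
  }
  // An artifact with no sidecar entry is also a failure.
  artifacts.forEach((_data, name) => {
    if (!listed.has(name)) failures.push(`missing checksum entry: ${name}`);
  });
  return failures;
}
```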
7. Post-build consumers
After bundling, other commands use the recorded state:
- validate checks bundle structure and schema
- verify checks integrity and optional source-tree drift
- list surfaces stored metadata
- extract reconstructs files according to manifest semantics
Invariants
Some failures are fundamental and intentionally hard:
- section overlap when overlap failure mode is active
- asset collision between copied assets and packed files
- missing core adapter contract
- unsafe_dirty working tree without --force (exit code 7)
These are Category A invariants. They are never configurable away because doing so would make the bundle ambiguous or unverifiable.
Category B Behaviors
Some operational friction points are configurable:
- overlap handling mode
- missing cx-specific adapter extension mode
- duplicate config entry mode
These are recorded in the lock file so later verification can detect drift between the settings used to build the bundle and the settings currently in effect.
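Drift detection of this kind is essentially a keyed comparison between the locked settings and the current ones. The sketch below is hypothetical throughout: BehaviorSettings, detectDrift, and the setting keys in the usage note are illustrative and do not reflect the real lock file schema.

```typescript
type BehaviorSettings = Record<string, string>;

interface Drift {
  key: string;
  locked: string | undefined;
  current: string | undefined;
}

// Report every setting whose value differs, including keys present on one side only.
function detectDrift(locked: BehaviorSettings, current: BehaviorSettings): Drift[] {
  const keys = Array.from(
    new Set([...Object.keys(locked), ...Object.keys(current)]),
  ).sort();
  return keys
    .filter((key) => locked[key] !== current[key])
    .map((key) => ({ key, locked: locked[key], current: current[key] }));
}
```

For example, a bundle locked with an overlap mode of "fail" and later verified under "warn" would surface as a single drift entry for that key.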
Why Persistent Token Accounting Matters
Repomix can already calculate token counts while rendering. cx adds a
different guarantee: those counts are carried forward as part of the
artifact contract instead of disappearing with a single run.
That matters in automation because later verification and downstream tooling can read the manifest’s recorded token counts directly instead of re-running a render or relying on a fresh estimate in a different environment.
Why SHA-256 Matters
Checksums are not included for cryptography theater. They prove that the artifacts the runner sees are the same artifacts the bundler emitted.
Why this protects you: checksum failures are evidence that the artifact
set in hand is no longer provably the one cx wrote. Verification stops
before a tampered, partial, or substituted bundle can pass as
authoritative.
That lets verify detect:
- manifest tampering
- section output drift
- missing checksum entries
- copied asset mutation
For packed text rows, the manifest hash covers the normalized packed
content emitted by Repomix. That keeps verification aligned with the
actual handover payload instead of pretending cx is a source-byte
archiver.
Why Output Spans Matter
When exact span capture is available, the manifest can tell downstream tooling where each file lives in the rendered section output.
Those spans are only useful if the output remains deterministic. That is why degraded extraction is treated carefully: once the parser can no longer reconstruct the packed output cleanly, absolute coordinates can become unsafe for downstream automation.
Config Safety and Merge Semantics
Configuration files often inherit from other configurations (e.g.,
project-specific settings extending organization defaults). cx
enforces explicit merge semantics to prevent silent overwrites and make
configuration conflicts visible.
Merge Rules
When merging two configurations (base and override), cx applies these
rules:
- Scalars: Override wins (right overwrites left). Conflicting scalar values are recorded as conflicts.
- Arrays: Append-only semantics (never silent replace). When both base and override have non-empty arrays, they are concatenated. This prevents accidentally dropping existing patterns or values.
- Objects: Deep merge (recursive application of rules). Nested structures are merged field-by-field.
- Undefined: Treated as "not set". Missing fields in override do not affect base values.
- Null: Valid overwrite value. Explicit null in override overwrites base (unlike undefined).
Conflict Detection
Every merge operation returns a conflict list documenting:
- path: The configuration path (e.g., files.exclude, dedup.mode)
- reason: Why the conflict occurred (e.g., "scalar value replaced", "array append behavior")
- baseValue and overrideValue: The actual conflicting values
This explicit logging lets operators audit configuration inheritance chains and detect unintended changes that silent merges would hide.
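The merge rules and the conflict list fit naturally into one recursive function. This is a sketch of the rules as described, not the project's merge code: mergeConfig and Conflict are hypothetical names, and the choice to record an array append as a conflict entry only when both arrays are non-empty is an assumption.

```typescript
interface Conflict {
  path: string;
  reason: string;
  baseValue: unknown;
  overrideValue: unknown;
}

const isObject = (v: unknown): v is Record<string, unknown> =>
  typeof v === "object" && v !== null && !Array.isArray(v);

function mergeConfig(
  base: unknown,
  override: unknown,
  path = "",
  conflicts: Conflict[] = [],
): { merged: unknown; conflicts: Conflict[] } {
  // Undefined means "not set" on either side.
  if (override === undefined) return { merged: base, conflicts };
  if (base === undefined) return { merged: override, conflicts };

  if (Array.isArray(base) && Array.isArray(override)) {
    if (base.length > 0 && override.length > 0) {
      conflicts.push({ path, reason: "array append behavior", baseValue: base, overrideValue: override });
    }
    return { merged: [...base, ...override], conflicts };
  }

  if (isObject(base) && isObject(override)) {
    const merged: Record<string, unknown> = {};
    const keys = Array.from(new Set([...Object.keys(base), ...Object.keys(override)]));
    for (const key of keys) {
      const childPath = path === "" ? key : `${path}.${key}`;
      merged[key] = mergeConfig(base[key], override[key], childPath, conflicts).merged;
    }
    return { merged, conflicts };
  }

  // Scalars, explicit null, and mismatched types: override wins and the change is logged.
  if (base !== override) {
    conflicts.push({ path, reason: "scalar value replaced", baseValue: base, overrideValue: override });
  }
  return { merged: override, conflicts };
}
```

Explicit null flows through the final branch, so it overwrites the base value and shows up in the conflict list, whereas an absent key never reaches that branch.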
Why This Matters
Configuration is not arbitrary application state. Changes to section
definitions, file patterns, or dedup rules affect the reproducibility of
a bundle. By making conflicts visible rather than silent, cx ensures
that operational decisions about configuration are explicit and
auditable.
Extraction Semantics
cx classifies extraction outcomes as:
- intact: reconstructed text matches the packed-content hash in the manifest
- copied: asset restored directly from stored bundle content
- degraded: text is recoverable but does not match the packed-content hash in the manifest
- blocked: deterministic recovery is not possible from the stored output
degraded is intentionally not the default success path. It requires
explicit operator consent with --allow-degraded.
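The four outcomes and the consent gate can be sketched as two small functions. Candidate, classifyExtraction, and mayWrite are hypothetical names, and modeling "not recoverable" as an undefined recoveredText is an assumption made to keep the example self-contained.

```typescript
import { createHash } from "node:crypto";

type Outcome = "intact" | "copied" | "degraded" | "blocked";

interface Candidate {
  kind: "asset" | "text";
  recoveredText?: string; // undefined when nothing could be reconstructed
  packedSha256?: string;  // packed-content hash recorded in the manifest
}

function classifyExtraction(candidate: Candidate): Outcome {
  if (candidate.kind === "asset") return "copied";
  if (candidate.recoveredText === undefined) return "blocked";
  const actual = createHash("sha256")
    .update(candidate.recoveredText, "utf8")
    .digest("hex");
  return actual === candidate.packedSha256 ? "intact" : "degraded";
}

// Degraded output is written only with explicit consent (--allow-degraded).
function mayWrite(outcome: Outcome, allowDegraded: boolean): boolean {
  if (outcome === "blocked") return false;
  if (outcome === "degraded") return allowDegraded;
  return true;
}
```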
See Extraction Safety for the operational consequences.