Little Bear Foundry
Architect & sole engineer · 2026
A multi-agent autonomy platform deployed next to a customer's live operation
Multi-agent · MCP · Claude / vLLM · Docker Compose · PostgreSQL · RDF / SPARQL · TypeScript · Observability
A federation of role-specialized AI agents over a vendor-neutral ontological substrate: operational events become an RDF/SPARQL knowledge graph the agents reason against, every decision is proposed rather than auto-applied, and approved decisions write back to the system of record through a policy-gated, fully auditable executor. Built to run on-prem on open or hosted models.


> The_Problem
A field-operations business ran on data scattered across forms, spreadsheets, and people's heads. They needed an agent layer that could reason over all of it — but the data couldn't leave their network, and an agent that's confidently wrong can't be allowed to touch the operational record.
> What_I_Built
- ▸A federation of nine role-specialized agents (foreman, scheduler, financial, schedule-risk, critic, and more) plus a 22-tool operational agent that scores invoices, flags discrepancies between field forms and source data, and recommends crew, equipment, and schedule decisions.
- ▸The ontological engine at its core: operational events are projected into an RDF-star knowledge graph (TrustGraph, with a lightweight Oxigraph fallback) keyed on five universal primitives — WorkUnit, Actor, Decision, Outcome, Resource. Agents query reality with SPARQL instead of brittle joins, and the same ontology maps onto new verticals (construction today, HVAC stubbed) without a rewrite.
- ▸Integrated the customer's systems and feeds into agent workflows over the Model Context Protocol (MCP): direct Postgres, the SPARQL graph, a DoWhy/PyMC causal-inference service, and external sources like weather and permits.
- ▸The writeback execution path: a policy gate holds every proposed mutation for human approval, then a vendor-neutral executor applies it to the system of record under a one-shot action token — capturing a full pre-state rollback record and stamping the reason. Approved decisions are auditable and reversible, and nothing the model proposes touches the record until a person approves it.
- ▸Model-agnostic inference (Claude or self-hosted open-weight models via vLLM) across 13 services orchestrated with Docker Compose. The stack runs under local-only network isolation with automated pre-deploy verification, so it can be deployed on-prem or air-gapped.
- ▸Production observability and evaluation: structured logging, metrics, append-only audit trails, per-agent cost tracking, and a 47-case eval harness that separates real model regressions from infrastructure flakes.
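The writeback path described above (propose, hold for human approval, execute under a one-shot token, keep a pre-state rollback record) can be sketched roughly as follows. This is a minimal illustration, not the platform's actual code: `WritebackGate`, `Proposal`, `AuditEntry`, and the in-memory `Map` standing in for the system of record are all hypothetical names chosen for the example.

```typescript
import { randomUUID } from "node:crypto";

// All names here are hypothetical, for illustration only.
type Row = Record<string, unknown>;

interface Proposal {
  id: string;
  recordId: string;
  patch: Row;     // fields the agent proposes to change
  reason: string; // rationale stamped onto the audit entry
}

interface AuditEntry {
  proposalId: string;
  recordId: string;
  approvedBy: string;
  reason: string;
  preState: Row | undefined; // snapshot used for rollback
}

class WritebackGate {
  private pending = new Map<string, Proposal>();
  private grants = new Map<string, { proposal: Proposal; approvedBy: string }>();
  readonly audit: AuditEntry[] = []; // append-only in this sketch

  // The Map is an in-memory stand-in for the system of record.
  constructor(private store: Map<string, Row>) {}

  // Agents can only propose; nothing is applied at this point.
  propose(p: Omit<Proposal, "id">): string {
    const id = randomUUID();
    this.pending.set(id, { id, ...p });
    return id;
  }

  // A human approval moves the proposal out of the queue and mints a token.
  approve(proposalId: string, approvedBy: string): string {
    const proposal = this.pending.get(proposalId);
    if (!proposal) throw new Error("unknown or already approved proposal");
    this.pending.delete(proposalId);
    const token = randomUUID();
    this.grants.set(token, { proposal, approvedBy });
    return token;
  }

  // The executor consumes the token, snapshots pre-state, then applies the patch.
  execute(token: string): AuditEntry {
    const grant = this.grants.get(token);
    if (!grant) throw new Error("invalid or already used token");
    this.grants.delete(token); // one-shot: a second use fails
    const { proposal, approvedBy } = grant;
    const current = this.store.get(proposal.recordId);
    const preState = current ? { ...current } : undefined;
    this.store.set(proposal.recordId, { ...(current ?? {}), ...proposal.patch });
    const entry: AuditEntry = {
      proposalId: proposal.id,
      recordId: proposal.recordId,
      approvedBy,
      reason: proposal.reason,
      preState,
    };
    this.audit.push(entry);
    return entry;
  }

  // Restore the record to the snapshot captured before the write.
  rollback(entry: AuditEntry): void {
    if (entry.preState === undefined) this.store.delete(entry.recordId);
    else this.store.set(entry.recordId, { ...entry.preState });
  }
}
```

The design point the sketch tries to capture is that `propose` has no code path to the store: the only mutation happens in `execute`, and only with a token that a person's approval created.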
> Outcome
- ✓Open models can run entirely inside the customer's network — no operational data sent to a third party.
- ✓Agent proposals are observable, costed, and gated on human approval, so a wrong answer never silently corrupts the record.
> Why_It_Matters
- →The knowledge graph is a derived read-cache, and the agents only ever read it — they never author substrate facts directly. That half-loop is deliberate: evaluation showed small models are wrong too often to let them write the operational record, so closure runs through human-approved writeback. Correctness doesn't hinge on the model being right.
- →Because the substrate is a universal ontology behind vendor-neutral adapters, the engine generalizes past construction — a new industry is a new adapter and event mapping, not a new system. OTTER is simply the first connected vertical.
- →It runs end-to-end on the customer's own hardware with open models, so regulated or air-gapped operators get agent intelligence without their data ever leaving the building.
- →Every approved decision and its rationale is captured as structured, queryable history, so the system compounds — it gets more useful as institutional knowledge accumulates, instead of walking out the door when people leave.
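The claim that a new industry is "a new adapter and event mapping" can be illustrated with a small sketch. It assumes the five primitives named earlier (WorkUnit, Actor, Decision, Outcome, Resource); everything else, including `Fact`, `VerticalAdapter`, the event shapes, and the ID prefixes, is hypothetical and not taken from the real system.

```typescript
// The five universal primitives from the ontology.
type Primitive = "WorkUnit" | "Actor" | "Decision" | "Outcome" | "Resource";

// Hypothetical vendor-neutral fact shape the graph projection would consume.
interface Fact {
  kind: Primitive;
  id: string;
  attrs: Record<string, string>;
}

// Hypothetical adapter contract: one per vertical, mapping its events to facts.
interface VerticalAdapter<E> {
  vertical: string;
  toFacts(event: E): Fact[];
}

// Illustrative construction event (field names are assumptions).
interface CrewAssigned {
  jobId: string;
  crewLead: string;
  excavatorId: string;
}

const constructionAdapter: VerticalAdapter<CrewAssigned> = {
  vertical: "construction",
  toFacts: (e) => [
    { kind: "WorkUnit", id: `job:${e.jobId}`, attrs: {} },
    { kind: "Actor", id: `person:${e.crewLead}`, attrs: { assignedTo: `job:${e.jobId}` } },
    { kind: "Resource", id: `equip:${e.excavatorId}`, attrs: { allocatedTo: `job:${e.jobId}` } },
  ],
};

// Illustrative HVAC event: a different source shape, same target primitives.
interface ServiceCall {
  ticket: string;
  technician: string;
}

const hvacAdapter: VerticalAdapter<ServiceCall> = {
  vertical: "hvac",
  toFacts: (e) => [
    { kind: "WorkUnit", id: `ticket:${e.ticket}`, attrs: {} },
    { kind: "Actor", id: `person:${e.technician}`, attrs: { assignedTo: `ticket:${e.ticket}` } },
  ],
};

// Downstream code sees only primitives, so it is identical for every vertical.
function countByKind(facts: Fact[]): Partial<Record<Primitive, number>> {
  const counts: Partial<Record<Primitive, number>> = {};
  for (const f of facts) counts[f.kind] = (counts[f.kind] ?? 0) + 1;
  return counts;
}
```

In this framing, `countByKind` stands in for anything that reads the graph: it never mentions construction or HVAC, which is the property that would let a new vertical plug in without a rewrite.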