Little Bear Foundry

Architect & sole engineer · 2026

A multi-agent autonomy platform deployed next to a customer's live operation

Multi-agent · MCP · Claude / vLLM · Docker Compose · PostgreSQL · RDF / SPARQL · TypeScript · Observability

A federation of role-specialized AI agents over a vendor-neutral ontological substrate: operational events become an RDF/SPARQL knowledge graph the agents reason against, every decision is proposed rather than auto-applied, and approved decisions write back to the system of record through a policy-gated, fully auditable executor. Built to run on-prem on open or hosted models.

Agent foreman proposal: the pick, self-reported confidence, rationale, tools used, and the ranked candidate cohort

The compounding loop: a plain-English operator redirect is written to the knowledge substrate, the agent re-picks citing that feedback, and the human-approved decision writes back

> The_Problem

A field-operations business ran on data scattered across forms, spreadsheets, and people's heads. They needed an agent layer that could reason over all of it — but the data couldn't leave their network, and an agent that's confidently wrong can't be allowed to touch the operational record.

> What_I_Built

▸ A federation of nine role-specialized agents (foreman, scheduler, financial, schedule-risk, critic, and more) plus a 22-tool operational agent that scores invoices, flags discrepancies between field forms and source data, and recommends crew, equipment, and schedule decisions.
▸ The ontological engine at its core: operational events are projected into an RDF-star knowledge graph (TrustGraph, with a lightweight Oxigraph fallback) keyed on five universal primitives: WorkUnit, Actor, Decision, Outcome, Resource. Agents query reality with SPARQL instead of brittle joins, and the same ontology maps onto new verticals (construction today, HVAC stubbed) without a rewrite.
▸ Integrated the customer's systems and feeds into agent workflows over the Model Context Protocol (MCP): direct Postgres, the SPARQL graph, a DoWhy/PyMC causal-inference service, and external sources like weather and permits.
▸ The writeback execution path: a policy gate holds every proposed mutation for human approval, then a vendor-neutral executor applies it to the system of record under a one-shot action token, capturing a full pre-state rollback record and stamping the reason. Approved decisions are auditable and reversible, and nothing the model proposes touches the record until a person approves it.
▸ Model-agnostic inference (Claude or self-hosted open-weight models via vLLM) across 13 services orchestrated with Docker Compose under local-only network isolation with automated pre-deploy verification, deployable on-prem or air-gapped.
▸ Production observability and evaluation: structured logging, metrics, append-only audit trails, per-agent cost tracking, and a 47-case eval harness that separates real model regressions from infrastructure flakes.
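
As an illustration of the writeback path described above, here is a minimal TypeScript sketch of a policy gate with one-shot action tokens and pre-state rollback records. It is not the project's actual code: `WritebackGate`, `Proposal`, `RollbackRecord`, the in-memory `Map` store, and the counter-based token are all hypothetical stand-ins chosen to keep the example self-contained. A real deployment would presumably use unguessable tokens and a durable audit store.

```typescript
// Illustrative sketch only: every name here is hypothetical.
type Row = Record<string, unknown>;

interface Proposal {
  id: string;
  recordId: string; // key of the record in the system of record
  patch: Row;       // fields the agent proposes to change
  reason: string;   // rationale stamped onto the audit trail
}

interface RollbackRecord {
  proposalId: string;
  recordId: string;
  preState: Row | undefined; // snapshot taken before the write
  reason: string;
  approvedBy: string;
}

class WritebackGate {
  private store: Map<string, Row>;
  private pending = new Map<string, Proposal>();
  private tokens = new Map<string, { proposalId: string; approvedBy: string }>();
  private counter = 0;
  readonly audit: RollbackRecord[] = []; // only ever appended to

  constructor(store: Map<string, Row>) {
    this.store = store;
  }

  // Agents can only propose; nothing is written here.
  propose(p: Proposal): void {
    this.pending.set(p.id, p);
  }

  // A human approval mints a token bound to exactly one proposal.
  // (A counter is used for readability; assume a random token in practice.)
  approve(proposalId: string, approvedBy: string): string {
    if (!this.pending.has(proposalId)) throw new Error("unknown proposal");
    const token = `tok-${++this.counter}`;
    this.tokens.set(token, { proposalId, approvedBy });
    return token;
  }

  // The executor consumes the token, snapshots pre-state, then applies the patch.
  execute(token: string): RollbackRecord {
    const grant = this.tokens.get(token);
    if (grant === undefined) throw new Error("invalid or already-used token");
    this.tokens.delete(token); // one-shot: the token cannot be replayed
    const p = this.pending.get(grant.proposalId);
    if (p === undefined) throw new Error("proposal no longer pending");
    this.pending.delete(p.id);

    const current = this.store.get(p.recordId);
    const record: RollbackRecord = {
      proposalId: p.id,
      recordId: p.recordId,
      preState: current ? { ...current } : undefined,
      reason: p.reason,
      approvedBy: grant.approvedBy,
    };
    this.audit.push(record);
    this.store.set(p.recordId, { ...(current ?? {}), ...p.patch });
    return record;
  }

  // Reversal restores the captured pre-state (or removes a newly created record).
  rollback(record: RollbackRecord): void {
    if (record.preState === undefined) this.store.delete(record.recordId);
    else this.store.set(record.recordId, { ...record.preState });
  }
}
```

In this sketch a caller would `propose`, have an operator `approve` to obtain a token, then `execute` once. A second `execute` with the same token throws, and `rollback` restores the snapshot taken at execution time.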

> Outcome

✓ Open models can run entirely inside the customer's network, with no operational data sent to a third party.
✓ Agent proposals are observable, costed, and gated on human approval, so a wrong answer never silently corrupts the record.

> Why_It_Matters

→ The knowledge graph is a derived read-cache, and the agents only ever read it; they never author substrate facts directly. That half-loop is deliberate: evaluation showed small models are wrong too often to let them write the operational record, so closure runs through human-approved writeback. Correctness doesn't hinge on the model being right.
→ Because the substrate is a universal ontology behind vendor-neutral adapters, the engine generalizes past construction: a new industry is a new adapter and event mapping, not a new system. OTTER is simply the first connected vertical.
→ It runs end-to-end on the customer's own hardware with open models, so regulated or air-gapped operators get agent intelligence without their data ever leaving the building.
→ Every approved decision and its rationale is captured as structured, queryable history, so the system compounds: it gets more useful as institutional knowledge accumulates, instead of walking out the door when people leave.
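
To make the "new adapter and event mapping, not a new system" point concrete, here is a rough TypeScript sketch of how a vertical adapter might project a vendor event onto the five primitives as RDF triples. Everything in it is assumed for illustration: the `VerticalAdapter` interface, the `CrewAssignedEvent` shape, the `assignedTo` and `uses` predicates, and the `https://example.org/substrate#` namespace are placeholders, not the project's real ontology or adapter API.

```typescript
// Illustrative sketch only: namespace, predicates, and event shape are placeholders.
type Primitive = "WorkUnit" | "Actor" | "Decision" | "Outcome" | "Resource";

interface Triple {
  subject: string;
  predicate: string;
  object: string;
}

// One adapter per vertical; the ontology itself stays fixed.
interface VerticalAdapter<E> {
  vertical: string;
  toTriples(event: E): Triple[];
}

const NS = "https://example.org/substrate#"; // hypothetical namespace
const RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";

// Declare an entity as an instance of one of the five primitives.
function entity(kind: Primitive, id: string): Triple {
  return {
    subject: `${NS}${kind}/${encodeURIComponent(id)}`,
    predicate: RDF_TYPE,
    object: `${NS}${kind}`,
  };
}

// Hypothetical construction event as a vendor system might emit it.
interface CrewAssignedEvent {
  jobId: string;
  crewId: string;
  equipmentId: string;
}

const constructionAdapter: VerticalAdapter<CrewAssignedEvent> = {
  vertical: "construction",
  toTriples(e) {
    const job = entity("WorkUnit", e.jobId);
    const crew = entity("Actor", e.crewId);
    const gear = entity("Resource", e.equipmentId);
    return [
      job,
      crew,
      gear,
      { subject: job.subject, predicate: `${NS}assignedTo`, object: crew.subject },
      { subject: job.subject, predicate: `${NS}uses`, object: gear.subject },
    ];
  },
};

// Serialize as N-Triples lines (IRIs only; literals are omitted in this sketch).
function toNTriples(triples: Triple[]): string {
  return triples
    .map((t) => `<${t.subject}> <${t.predicate}> <${t.object}> .`)
    .join("\n");
}
```

Under these assumptions, a second vertical such as HVAC would supply its own event type and `toTriples` mapping while reusing `entity`, `toNTriples`, and the same five primitive types.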

Read the research report (PDF) →