Introduction

insigz is a geospatial intelligence fusion platform. This documentation explains how it's built, how to put data into it, how to reason over it, and how to run sessions on top of it. Read in order; each section depends on the previous one.

What insigz is

insigz is the layer between continuously published world data (AIS, electrical grids, sanctions registries, news feeds, weather, cyber advisories) and the people who have to make decisions on top of it. We fuse live sources across maritime, energy, sanctions, news and research into a single canonical model: Source → Observation → Entity → Event → Case → Report.

Three things make insigz different from "another intelligence platform":

The data is live. AIS positions update every minute. ENTSO-E grid flows update every 15 minutes. News and sanctions update as they're published. The world the platform shows is the world you're acting on.
The AI shows its work. Every agent suggestion carries its reasoning. Faculty and analysts approve, edit, or override. Nothing publishes autonomously.
One canonical model. No bespoke pipelines, no proprietary formats. Add a source: write one ingestor, run one migration. Everything else inherits.

PRINCIPLE

AI supports, never decides. This is the binding constraint that shapes every agent surface. See #oversight for how it's enforced.

Quickstart

A working insigz tenant in five steps. Assumes you have a contract scope and a deployment target (Cloud Run on GCP — our Swiss europe-west6 region, or a customer region).

# 1. Provision a tenant database
$ insigzctl tenant create --name=baltic-2027 --region=eu-central

# 2. Seed the canonical model
$ insigzctl migrate --tenant=baltic-2027 --version=latest

# 3. Enable the data sources you need
$ insigzctl sources enable ais entso-e ofac ncsc noaa

# 4. Verify ingestion
$ insigzctl observe --since="5 min ago" --count
   14,602 observations

# 5. Open the working surface
$ insigzctl open

Within ~10 minutes you should see the live map populating in the Workshop view. From there, scenarios and sessions are built using the Workshop authoring tools.

Core concepts

Five nouns are enough to describe everything insigz models:

Source

An external system insigz reads from. Examples: ais.marinetraffic, entso-e.transparency, ofac.sdn. Each source has an ingestor, a polling cadence, and a provenance signature.

Observation

An atomic, timestamped, geolocated fact produced by an ingestor. Observations are immutable and signed by their source. Example: vessel:imo:9123456 position:(59.43,24.75) at:2027-02-15T04:17:08Z source:ais.

Entity

A noun in the world. Vessels, cables, ports, substations, sanctioned parties, news outlets. Entities accumulate observations over time and have stable canonical IDs across sessions.

Event

A change in entity state, derived from one or more observations. Examples: cable-throughput-drop, ais-gap, designation-added. Events carry confidence, provenance, and a back-reference to their constituting observations.

Case

A bound collection of events that mean something together. Analysts open cases; insigz suggests bindings. A case is the unit of analysis — what becomes a report.
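
As an illustration of how an Event is derived from Observations and keeps a back-reference to them, here is a minimal Python sketch of an ais-gap detector. The class layouts, the one-hour silence threshold and the fixed confidence value are assumptions made for the example, not the platform's actual schema or rules.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)  # frozen: observations are immutable once produced
class Observation:
    id: str
    source_id: str
    entity_id: str
    observed_at: datetime


@dataclass(frozen=True)
class Event:
    kind: str
    entity_id: str
    confidence: float        # placeholder value in this sketch
    observation_ids: tuple   # back-reference to the constituting observations


def derive_ais_gaps(observations, max_silence=timedelta(hours=1)):
    """Emit an 'ais-gap' event when an entity's consecutive positions
    are further apart in time than max_silence (hypothetical threshold)."""
    events = []
    last_seen = {}
    for obs in sorted(observations, key=lambda o: o.observed_at):
        prev = last_seen.get(obs.entity_id)
        if prev is not None and obs.observed_at - prev.observed_at > max_silence:
            events.append(Event("ais-gap", obs.entity_id, 0.8, (prev.id, obs.id)))
        last_seen[obs.entity_id] = obs
    return events
```

A Case would then bind several such events together; the event's observation_ids are what let a reviewer walk back from a case to the underlying facts.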

The canonical model

Every table in every insigz tenant is one of: a source registry, an observation log, an entity table, an event log, a case binding, or a report manifest. There are no exceptions and no escape hatches.

CREATE TABLE observations (
  id              uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  source_id       text NOT NULL REFERENCES sources(id),
  entity_id       uuid REFERENCES entities(id),
  observed_at     timestamptz NOT NULL,
  geom            geometry(Point, 4326),
  payload         jsonb NOT NULL,
  signature       bytea NOT NULL,
  received_at     timestamptz NOT NULL DEFAULT now()
);

Every observation has a signature from its source — an Ed25519 signature over a canonical hash of (source_id, observed_at, payload). The signature is what makes the audit log defensible: if a faculty member disputes a consequence, we can trace it back through events to specific signed observations.
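
The text does not specify how the canonical hash is built, so the following Python sketch makes two assumptions: the hash is SHA-256, and the canonical form is JSON with sorted keys and no insignificant whitespace. It shows only the hashing step; the Ed25519 signature would be computed over this digest with a dedicated cryptography library, which is not shown because the Python standard library has no Ed25519 support.

```python
import hashlib
import json


def canonical_hash(source_id: str, observed_at: str, payload: dict) -> bytes:
    """SHA-256 digest over a canonical JSON encoding of
    (source_id, observed_at, payload).

    Sorting keys and fixing separators means two payloads with the same
    content always hash identically, regardless of key order.
    (Assumed canonicalisation; the real scheme may differ.)
    """
    canonical = json.dumps(
        [source_id, observed_at, payload],
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode("utf-8")
    return hashlib.sha256(canonical).digest()
```

The point of a canonical encoding is that verification is reproducible: anyone holding the stored row can recompute the digest and check the signature without access to the original byte stream.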

Framework lineage

The Source → Observation → Entity → Event → Case → Report chain is a deliberate synthesis of two established traditions — not a bespoke invention. Naming the lineage matters: it places the model for any reviewer with a fusion or platform background.

Defense / IC data fusion (JDL / DFIG)

The Joint Directors of Laboratories model is the reference framework for intelligence data fusion. Our stages map onto its levels almost one-to-one:

JDL level                          insigz stage
L0     source / signal refinement  Source, Observation
L1     object refinement           Entity
L2     situation refinement        Event
L2/L3  situation & threat          Case
L3     impact assessment           Report
L4     process refinement          the platform itself (ingestion, tasking, audit)

Entity-centric ontology (commercial)

The closer comparison is the modern ontology pattern — object types (an entity or event), properties (its characteristics), and link types (relationships), all mapped from source data. insigz grounds its AI analyst at the observation / entity layer, the same move ontology platforms made when the ontology became the backbone for AI agents from ~2023. The analyst cites [OBS-####] rather than asserting unsupported facts.

Why "Case" is a stage

In pure sensor-fusion vocabulary, Case isn't a level — it belongs to investigative / compliance / OSINT practice (case management, sanctions work, link analysis). It's a deliberate, recognisable choice for our maritime, sanctions and academic users; for a hard-defense audience the same stage reads as "investigation."

CAVEAT

The chain is a communication device, not a claim that data only flows left-to-right. Fusion isn't strictly linear — any level can run on another's output. What earns credibility is the properties around the chain: provenance & lineage, entity resolution, grounding, auditability, and time-travel reconstruction (AsOf). The public explainer is at knowledge.html#lineage.

Source onboarding — the Ontology Agent

◐ PREVIEW · NOT YET GA

The Ontology Agent ships today as a guided, scripted preview of the onboarding flow; full self-serve onboarding of arbitrary sources is on the roadmap. The Q9 guarantee below — the agent drafts, a human commits every schema change — is firm.

Adding a source is traditionally a two-specialist job: a Connector Engineer wires the feed and maps its fields to the canonical model, and a Canonical Model Steward (ontologist) decides how those fields become entities and events without forking the schema. The Ontology Agent compresses both into one guided flow driven from the analyst chat. You drop a source — a URL, an API spec, a sample payload, a file, or a catalog pick — and the agent profiles it, drafts the connector, drafts the field→Observation mapping, proposes which entity/event types it maps onto (reusing existing ones wherever possible), proposes resolution keys, and dry-runs the result against a live sample. It drafts; a human commits.

PRINCIPLE · Q9

Onboarding is agent-drafted, human-committed. The agent proposes connector config, mappings, new types and resolution rules; the Steward approves a diff before anything touches the canonical model. No silent schema changes, no silent entity merges. Q9 is the extension of Q8 (#oversight) into the data layer: the canonical model is a protected branch; the agent opens the pull request, a human merges it.

The staged pipeline

The agent runs a fixed, auditable pipeline. Each stage proposes an artifact and waits for approve / edit / reject.

# org-layer: make a source exist & be trustworthy
0 Intake     → pull a representative sample (N records)
1 Profile    → protocol, rate, auth; field types, units, semantic roles; PII flags
2 Connector  → endpoint, auth binding, cadence, health probe
3 Mapping    → native field → Observation field; observed_property; UTC/WGS84
4 Ontology   → match existing entity/event types (reuse>new); resolution keys
5 Dry-run    → replay sample, no commit: resolved/new/dedup/conflicts + coverage
6 Steward    → Q9 GATE: human approves the full diff (logged)
7 Catalog    → register in org connector catalog + entitlement
# project-layer: decide to use it (existing flow)
8 Connect    → Project Lead adds it to a project, sets alerts
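
The org-layer stages (0 to 7) can be pictured as a loop in which every drafted artifact waits for a human decision. The Python sketch below is illustrative only: the Decision type, the draft and review callbacks, and the choice to halt on the first rejection are assumptions, since the text does not say what happens after a reject.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Org-layer stages from the listing above; stage 8 (Connect) is project-layer.
STAGES = ["intake", "profile", "connector", "mapping",
          "ontology", "dry-run", "steward", "catalog"]


@dataclass
class Decision:
    verdict: str                  # "approve" | "edit" | "reject"
    edited: Optional[dict] = None  # replacement artifact when verdict == "edit"


def run_pipeline(draft: Callable[[str, dict], dict],
                 review: Callable[[str, dict], Decision]) -> dict:
    """Run stages in order; nothing advances without a human decision."""
    approved: dict = {}
    audit: list = []
    for stage in STAGES:
        artifact = draft(stage, approved)    # the agent proposes
        decision = review(stage, artifact)   # a human decides
        audit.append((stage, decision.verdict))
        if decision.verdict == "reject":
            # Assumption: a rejection stops the run and commits nothing further.
            return {"status": "rejected", "stopped_at": stage,
                    "artifacts": approved, "audit": audit}
        approved[stage] = decision.edited if decision.verdict == "edit" else artifact
    return {"status": "committed", "stopped_at": None,
            "artifacts": approved, "audit": audit}
```

Keeping the audit trail as data returned by the runner, rather than as log lines, makes it straightforward to persist every decision alongside the resulting diff.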

Automated vs. human-gated

Agent drafts: protocol/field detection, unit·timestamp·geo normalization, connector config + health probe, field→Observation mapping, proposed resolution keys & thresholds, dry-run replay with conflict/dedup detection, and reuse suggestions (anti-fork).
Human only (never autonomous): committing a new entity/event type (Steward, Q9), supplying credentials/classification (Admin), merging two entities in the review band (analyst/Steward), connecting a source into a project (Project Lead).

The Observation schema

One shape for every fact, from every source — the atomic unit the analyst is grounded on and lineage is tracked at.

// Observation — atomic, timestamped, sourced fact
{
  "id": "OBS-50412",
  "source_id": "spire-ais",
  "observed_property": "vessel_position",  // typed enum; governs value shape
  "observed_at": "2027-02-17T14:33:07Z",    // EVENT time (when true), UTC
  "ingested_at": "2027-02-17T14:33:11Z",    // SYSTEM time (when received), UTC
  "entity_ref": "VESSEL-SF12",             // resolved entity, or null
  "entity_candidate": null,
  "geo": { "lat": 59.81, "lon": 24.77, "geohash": "ud9…" },  // WGS84 | null
  "value": { "sog_kn": 0.2, "cog_deg": 184 },        // schema per observed_property
  "source_native": { "mmsi": 209123456 },          // raw fields, preserved for replay
  "provenance": {
    "connector_type": "ws",        // ws | http | file | manual
    "signature": "ed25519:…",       // Ed25519 signature over the record hash
    "classification": "OPEN",      // → drives entitlement/visibility
    "resolution": { "method": "deterministic", "confidence": 1.0, "keys_used": ["mmsi"] }
  }
}

Bitemporal (observed_at vs ingested_at) — this is what powers the AsOf control (Q2). Do not collapse the two axes.
source_native is never discarded — normalization is additive; lineage and replay need the raw fields.
signature + classification on every record — a degraded feed can be down-weighted or quarantined without corrupting the picture; access is scoped at the row.
value is polymorphic by observed_property — keep a registry of property types; a genuinely new property type is part of the Q9 diff.
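
To see why the two time axes must stay separate, consider a minimal Python sketch of an as-of filter. It assumes AsOf is answered by filtering on both observed_at and ingested_at; the function names and the sample records are hypothetical, and the real control may work differently.

```python
from datetime import datetime


def parse_utc(ts: str) -> datetime:
    """Parse an ISO-8601 UTC timestamp such as 2027-02-17T14:33:07Z."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def as_of(observations, event_time: datetime, system_time: datetime):
    """Return the observations that describe the world up to event_time
    AND that the platform had already received by system_time."""
    return [
        o for o in observations
        if o["observed_at"] <= event_time and o["ingested_at"] <= system_time
    ]
```

A late-arriving record (observed early, ingested late) is visible when reconstructing "what was true then" from today's vantage point, but not when reconstructing "what we knew then". Collapsing the axes into one timestamp would make those two questions indistinguishable.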

Entity resolution

Resolution is where fusion creates value — and where it most often goes wrong at scale. Rules are declared per entity type, proposed by the agent at stage 4, approved by the Steward.

// runtime — per incoming Observation
1. try deterministic keys in priority order   // e.g. VESSEL: imo > mmsi > callsign
     unique hit            → attach, confidence 1.0, method=deterministic
2. else block + score (probabilistic)
     block on coarse key (geo-cell+time, or name prefix); score signals → s∈[0,1]
3. decide by band:
     s ≥ AUTO   (≈0.92)            → attach, method=probabilistic
     REVIEW (≈0.65) ≤ s < AUTO     → candidate + MERGE-REVIEW task (no silent merge · Q9)
     s < REVIEW                    → create NEW candidate entity

Candidate → confirmed — probabilistic/unresolved attaches create candidate entities; promotion is an analyst/Steward action, logged.
Merge & split are first-class, audited, reversible — resolution is never destructive; a merge can be undone, restoring both lineages.
Attribute conflicts are kept, not overwritten — disagreeing sources both retained with source_ref; the entity surfaces the freshest value and flags the conflict. No last-writer-wins.
Thresholds are per-type and tunable — AUTO/REVIEW bands live on the entity type, version-controlled with the ontology diff.
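
The runtime steps above can be sketched in Python as a single decision function. The index layout, the scorer callback and the action names are assumptions made for the example; blocking and scoring are delegated to the scorer and not implemented here.

```python
AUTO, REVIEW = 0.92, 0.65  # approximate band values quoted in the text


def resolve(keys: dict, index: dict, priority: list, scorer) -> dict:
    """Return a resolution decision for one incoming observation.

    keys:     identifying fields on the observation, e.g. {"mmsi": "209123456"}
    index:    {(key_name, key_value): [entity_id, ...]} deterministic lookup
    priority: key names in priority order, e.g. ["imo", "mmsi", "callsign"]
    scorer:   callable returning (best_entity_id, score in [0, 1])
    """
    # 1. Deterministic keys, highest priority first; only a unique hit attaches.
    for name in priority:
        if name in keys:
            hits = index.get((name, keys[name]), [])
            if len(hits) == 1:
                return {"action": "attach", "entity": hits[0],
                        "method": "deterministic", "confidence": 1.0,
                        "keys_used": [name]}
    # 2. Probabilistic fallback (block + score) via the supplied scorer.
    entity, s = scorer(keys)
    # 3. Decide by band.
    if s >= AUTO:
        return {"action": "attach", "entity": entity,
                "method": "probabilistic", "confidence": s}
    if s >= REVIEW:
        # No silent merge: a human reviews the candidate (Q9).
        return {"action": "merge-review", "entity": entity,
                "method": "probabilistic", "confidence": s}
    return {"action": "new-candidate", "entity": None,
            "method": "probabilistic", "confidence": s}
```

Note that an ambiguous deterministic key (more than one hit) falls through to scoring rather than attaching to an arbitrary match.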

ANTI-FORK

The biggest risk of many sources is schema sprawl — 50 feeds inventing 50 near-duplicate "vessel" types. At stage 4 the agent embeds the source's concepts, matches them against existing types, and defaults to reuse; new types are proposed only when the fit is poor, as the minimal covering type, with near-neighbours shown. Every accepted change is a versioned ontology_diff in the audit log. The customer-facing explainer is at knowledge.html#agent.

The Autonomy Layer — Watches & Proposals

Watches turn insigz from a platform you query into one that watches for you — without crossing into autonomous action. A Watch runs a grounded agent against the live feed on a trigger or schedule and stages a Proposal; a human Accepts, Modifies or Rejects it from the Proposal Inbox. A committed proposal becomes an ordinary Event / Case-update / Alert — no parallel data world.

PRINCIPLE · Q10

Autonomy is bounded and reversible. An agent may auto-apply only internal, reversible outputs (attach an observation, flag an entity, raise an alert, open a WATCH-status case) above the AUTO confidence band (≈0.92). Anything customer-facing or irreversible — report drafts, published artifacts, external notifications — always stages for review. Enforced in the band function and again server-side. Q10 extends Q8/Q9 from one-off approvals to always-on monitoring.

Autonomy bands

confidence ≥ AUTO (0.92)  AND  output ∈ auto_allowed_kinds  → auto-apply + audit
confidence ≥ AUTO (0.92)  AND  output ∉ auto_allowed_kinds  → stage for review (alert)
REVIEW (0.65) ≤ confidence < AUTO                            → stage for review (alert)
confidence < REVIEW                                          → stage for review, flagged low

Each needs-review proposal emits a watch-review alert into the one notification inbox (no standalone queue). The Review panel’s Show Logic opens the cited [OBS-####] observations + the tool-use trace + the reasoning chain — the same provenance substrate the analyst chat uses.
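
A band function along these lines would implement the Q10 rule. This is a sketch, not the platform's code: the kind identifiers in AUTO_ALLOWED_KINDS are hypothetical names for the four internal, reversible outputs listed under Q10, and the return labels are invented for the example.

```python
AUTO, REVIEW = 0.92, 0.65

# Hypothetical identifiers for the internal, reversible outputs named in Q10.
AUTO_ALLOWED_KINDS = frozenset({
    "attach-observation",
    "flag-entity",
    "raise-alert",
    "open-watch-case",
})


def band(confidence: float, kind: str) -> str:
    """Map a proposal to its handling. High confidence alone is never
    enough: the output kind must also be on the allow-list."""
    if confidence >= AUTO and kind in AUTO_ALLOWED_KINDS:
        return "auto-apply"
    if confidence >= REVIEW:
        return "review"
    return "review-flagged-low"
```

For example, a report draft at confidence 0.99 still stages for review, because its kind is not on the allow-list. The same check would need to be repeated server-side, as Q10 requires.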

Trust Evals & the Scorecard

Before an agent drives a watch, it must pass an eval suite. Evaluators: citation-accuracy (the fraction of claims citing a real observation; this is the defensibility number), refusal-correctness (does it refuse when unsupported?), answer-fidelity (vs. an expert ground truth), plus contains-key-details and ROUGE. Runs compare against a baseline so a model/prompt change can’t silently regress citation accuracy. The output is a signed Scorecard (“Every claim cited · citation-accuracy 0.98 · n=120”) that attaches to a report or case; for regulated buyers this is the reason to buy.
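
As a rough illustration of the citation-accuracy evaluator, the Python sketch below counts a claim as supported when it cites at least one [OBS-####] marker and every marker it cites exists in the known observation set. That reading of "citing a real observation" is an assumption, as are the function name and the choice to return 0.0 for an empty claim list.

```python
import re

CITATION = re.compile(r"\[(OBS-\d+)\]")


def citation_accuracy(claims: list, known_ids: set) -> float:
    """Fraction of claims that cite at least one observation
    and cite only observations that actually exist."""
    if not claims:
        return 0.0  # assumption: no claims means nothing was demonstrated
    supported = 0
    for claim in claims:
        cited = CITATION.findall(claim)
        if cited and all(c in known_ids for c in cited):
            supported += 1
    return supported / len(claims)
```

Comparing this number against a stored baseline on each model or prompt change is what would catch a silent regression.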

Analyst Studio

A no-code builder for grounded agents. Config (instructions · tool scope · grounding rules · model) on the left, a live test panel on the right. Every published agent inherits the grounding contract — citation-required, the literal refusal string, an observation scope — so you cannot ship an ungrounded analyst. Publishing makes the agent selectable in the New-Watch picker and as an Eval target. The four defaults (analyst chat, Ontology Agent, Adjudicator, After-Action) are read-only.

The Entity Network

The Entity Network renders the canonical model as a force-directed graph. Nodes are entities; edges are ties derived from shared observations and events. The layout runs a 3D force simulation client-side and projects it to a navigable point cloud — grab to rotate, scroll to zoom, click a node to open the Entity Inspector.

Edges are typed and weighted. Intra-community edges take the community colour; cross-community ties render as faint bridges; agent-suggested ties render dashed until a human confirms them (Q10). Community assignment and centrality are computed over the live tie graph, so the colouring and node sizing reflect the current state of the model, not a cached snapshot.

Colour by community (default), entity type, or risk.
Centrality drives node size and brightness — degree within the live subgraph.
Ego-network — select a node to scope the graph to its neighbourhood.
Every edge resolves to the observations and events that established it; nothing in the graph is ungrounded.
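
Degree centrality and ego-network scoping are simple graph operations. The Python sketch below uses an unweighted, undirected edge list for brevity (the real edges are typed and weighted), and the entity IDs in the usage note are hypothetical.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Undirected adjacency sets from (entity_a, entity_b) ties."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def degree_centrality(adj):
    """Degree of each node within the current subgraph."""
    return {node: len(neighbours) for node, neighbours in adj.items()}


def ego_network(adj, node, radius=1):
    """Nodes within `radius` hops of `node`, including the node itself."""
    seen = {node}
    frontier = {node}
    for _ in range(radius):
        # .get avoids creating empty entries for unknown nodes
        frontier = {m for n in frontier for m in adj.get(n, set())} - seen
        seen |= frontier
    return seen
```

With ties VESSEL-A to PORT-X, VESSEL-B to PORT-X and VESSEL-B to CABLE-1, PORT-X and VESSEL-B each have degree 2, and the one-hop ego-network of VESSEL-A contains only VESSEL-A and PORT-X.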

Pattern of Life

Pattern of Life scores an entity's activity over time against a baseline computed from its own history. The baseline band uses robust statistics (median + robust spread) so it resists single-point noise. Each time bucket is compared to the band; buckets outside it are flagged and scored.

// per bucket
band   = robust_baseline(entity, window)      // median ± k·MAD
z      = (count - band.center) / band.spread  // robust deviation
anomaly = count < band.lo OR count > band.hi
direction = count > band.hi ? 'elevated' : 'suppressed'

Analysts brush a window to aggregate (total, peak, anomalies vs expected) or click a bucket to inspect a single moment and open the evidence. A scored deviation is the canonical trigger for a Standing Watch — the same condition surfaced here is what a watch evaluates against the live feed before staging a Proposal. See #autonomy.
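
The per-bucket scoring can be made concrete with a short Python sketch. It follows the median ± k·MAD band from the listing above; the default k of 3.0 and the fallback spread of 1.0 for a perfectly flat history are assumptions chosen to keep the example runnable, not documented platform values.

```python
from statistics import median


def score_bucket(count: float, history: list, k: float = 3.0) -> dict:
    """Score one time bucket against a robust baseline from the entity's
    own history: band = median +/- k * MAD."""
    center = median(history)
    mad = median([abs(x - center) for x in history])
    spread = mad if mad > 0 else 1.0  # assumption: avoid dividing by zero
    lo, hi = center - k * spread, center + k * spread
    anomaly = count < lo or count > hi
    direction = None
    if anomaly:
        direction = "elevated" if count > hi else "suppressed"
    return {
        "z": (count - center) / spread,  # robust deviation
        "lo": lo,
        "hi": hi,
        "anomaly": anomaly,
        "direction": direction,
    }
```

Because both the centre and the spread are medians, a single extreme bucket in the history barely moves the band, which is the noise resistance the text describes.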

Human-in-loop oversight

The agent triad — Adjudicator, Inject Generator, After-Action analyst — runs in suggestion mode. Every output is queued for human review. Implementation:

INSERT INTO adjudicated_consequences (
  action_id, variant, text, confidence_low, confidence_high,
  reasoning, historical_analogs, approval
) VALUES (
  $1, $2, $3, $4, $5, $6, $7, 'pending'
);

-- approval state transitions only via a faculty session;
-- the pending guard keeps an already-decided row from being re-approved
UPDATE adjudicated_consequences
   SET approval = 'approved', approved_by = $1, approved_at = now()
 WHERE id = $2
   AND approval = 'pending';

Faculty users have a database-level bypass on visibility views but not on approval requirements. The two policies are independent and both are enforced in Postgres, not in application code. See #deploy-arch for the row-level-security configuration.
