Inside the Knowledge Architecture: Trust-Aware Research Memory

Most AI research systems bolt RAG onto a chat interface and call it a day. Upload your documents, chunk them, embed them, retrieve the closest vectors when someone asks a question. It works for simple Q&A. It falls apart for serious research — the kind where source credibility matters, where findings accumulate over months, and where your research data is too sensitive to store on someone else’s infrastructure.

This article walks through the architecture of LumaVista’s unified knowledge system, the persistent layer that sits behind every research agent and makes them collectively smarter over time. If you’ve read “Why Your AI Research Tool Forgets Everything and Trusts Everyone,” this is the machinery underneath.

The problem we replaced

We started with six separate knowledge subsystems, each solving one piece of the puzzle and none talking to the others.

Memory entities lived in a dedicated database with text-only search — no semantic understanding. A mindmap store extracted concepts per-project but died when the project closed. An external gRPC service called LeANN handled vector search but stored embeddings unencrypted and couldn’t work with per-user encryption keys. Document concepts were extracted and then never queried. Notes had no semantic search. Project results — the actual findings from research — were trapped inside node outputs with no way to promote them to persistent knowledge.

The replacement is a single unified knowledge database per user. One data model, one search pipeline, one trust assessment system, one encryption boundary.

Data model: KnowledgeNode and KnowledgeEdge

Everything in the knowledge system is either a KnowledgeNode (a piece of knowledge) or a KnowledgeEdge (a relationship between two pieces).

A KnowledgeNode carries more metadata than a typical RAG chunk. Beyond the obvious fields — name, description, summary — each node has a classification (what kind of knowledge is this?), a trust profile (how reliable is it?), a provenance chain (where did it come from?), and temporal validity (when is this knowledge true?).

The identity system uses deterministic SHA-256 hashing. When two different sources mention the same concept (say, a document upload and a research project both discover “OAuth2”), they generate the same node ID. Instead of creating a duplicate, the system merges on write: the mention count increments, evidence arrays append (deduplicated by source), the trust tier is promoted if the new source is more authoritative, and the last-updated timestamp changes. The original creation date is preserved.

This merge-on-write semantic is the key to cross-source deduplication. A concept discovered through document indexing and independently confirmed through live research doesn’t become two entries that confuse search results. It becomes one entity with two evidence sources — which is exactly how knowledge works.
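
A minimal Go sketch of merge-on-write may make this concrete. It assumes the node ID is the SHA-256 of a trimmed, lowercased concept name and that trust tiers are ordered integers; the Node fields, NodeID, and Upsert are illustrative names, not LumaVista's actual API.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// Node is a hypothetical, trimmed-down KnowledgeNode; field names are assumptions.
type Node struct {
	ID           string
	Name         string
	MentionCount int
	Evidence     []string // source identifiers, deduplicated by source
	TrustTier    int      // assumed ordering: higher means more authoritative
	CreatedAt    int64    // Unix seconds, never changed after the first write
	UpdatedAt    int64    // Unix seconds of the latest merge
}

// NodeID hashes the normalized concept name so every source derives the same ID.
// Trimming and lowercasing is an assumed normalization.
func NodeID(name string) string {
	sum := sha256.Sum256([]byte(strings.ToLower(strings.TrimSpace(name))))
	return hex.EncodeToString(sum[:])
}

// Upsert merges a mention into an existing node, or creates the node on first sight.
func Upsert(store map[string]*Node, name, source string, tier int, now int64) *Node {
	id := NodeID(name)
	n, ok := store[id]
	if !ok {
		n = &Node{ID: id, Name: strings.TrimSpace(name), CreatedAt: now}
		store[id] = n
	}
	n.MentionCount++
	known := false
	for _, s := range n.Evidence {
		if s == source {
			known = true
			break
		}
	}
	if !known {
		n.Evidence = append(n.Evidence, source)
	}
	if tier > n.TrustTier {
		n.TrustTier = tier // promote only, never demote
	}
	n.UpdatedAt = now
	return n
}

func main() {
	store := map[string]*Node{}
	Upsert(store, "OAuth2", "doc:upload-17", 1, 1700000000)
	n := Upsert(store, " oauth2 ", "project:42", 3, 1700086400)
	fmt.Println(len(store), n.MentionCount, len(n.Evidence), n.TrustTier) // prints: 1 2 2 3
}
```

Both writes land on one node: the second adds an evidence source and promotes the tier while leaving CreatedAt untouched.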

Edges carry relationship types: defines, requires, enables, part_of, cites, supersedes, contradicts. Each edge has its own confidence weight and evidence chain. The “contradicts” relationship is especially important — it’s how the system knows when two sources disagree, rather than silently picking one.

Palace hierarchy: organizing knowledge by topic

Knowledge that’s stored but can’t be navigated is barely better than knowledge that’s lost. The Palace hierarchy provides a three-level topical structure that organizes knowledge semantically, not by source.

The three levels are Wing (domain), Room (subtopic within a domain), and Hall (knowledge type). A wing might be “security” or “compliance” or “infrastructure.” A room within the “security” wing might be “oauth2” or “encryption” or “zero-trust.” Hall is a fixed vocabulary of knowledge types: fact, event, discovery, preference, advice, claim, sentiment.

Classification uses a hybrid approach. A fast keyword pass scans node names and descriptions against a vocabulary of ~70 patterns. If confidence exceeds 0.7, the classification is assigned directly — no LLM call needed. For ambiguous cases, nodes are batched (up to 10 per call) and sent to an LLM for classification. The vocabulary self-improves: when the LLM suggests a new keyword-to-wing mapping, it’s extracted and added to the vocabulary for future keyword passes.
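
The routing between the keyword pass and the LLM could look roughly like the Go sketch below. The vocabulary entries and the confidence score (the share of keyword hits pointing at the winning wing) are assumptions made for illustration; only the 0.7 threshold and the batch size of 10 come from the description above.

```go
package main

import (
	"fmt"
	"strings"
)

// Classification is a hypothetical result of the keyword pass.
type Classification struct {
	Wing       string
	Confidence float64
}

// vocab maps keywords to wings. These few entries are illustrative stand-ins
// for the real vocabulary of roughly 70 patterns.
var vocab = map[string]string{
	"oauth2":     "security",
	"encryption": "security",
	"gdpr":       "compliance",
	"kubernetes": "infrastructure",
}

const (
	directThreshold = 0.7 // above this, assign without an LLM call
	maxBatch        = 10  // nodes per LLM classification call
)

// keywordPass scores each wing by its share of keyword hits (assumed scoring).
func keywordPass(text string) Classification {
	hits := map[string]int{}
	total := 0
	for _, tok := range strings.Fields(strings.ToLower(text)) {
		tok = strings.Trim(tok, ".,;:!?()\"'")
		if wing, ok := vocab[tok]; ok {
			hits[wing]++
			total++
		}
	}
	best := Classification{}
	for wing, n := range hits {
		c := float64(n) / float64(total)
		// Ties break alphabetically so the result is deterministic.
		if c > best.Confidence || (c == best.Confidence && wing < best.Wing) {
			best = Classification{Wing: wing, Confidence: c}
		}
	}
	return best
}

// route assigns confident texts directly and groups the rest into LLM batches.
func route(texts []string) (map[string]string, [][]string) {
	direct := map[string]string{}
	var pending []string
	for _, t := range texts {
		c := keywordPass(t)
		if c.Confidence > directThreshold {
			direct[t] = c.Wing
		} else {
			pending = append(pending, t)
		}
	}
	var batches [][]string
	for start := 0; start < len(pending); start += maxBatch {
		end := start + maxBatch
		if end > len(pending) {
			end = len(pending)
		}
		batches = append(batches, pending[start:end])
	}
	return direct, batches
}

func main() {
	direct, batches := route([]string{
		"OAuth2 token refresh",
		"GDPR encryption policy",
		"quarterly roadmap",
	})
	fmt.Println(direct, len(batches))
}
```

Here only the first text is classified directly; the other two are ambiguous or unmatched and go to the LLM in a single batch.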

Cross-domain connections happen naturally through edges. When nodes in different wings share edges, those implicit paths — we call them tunnels — allow traversal across domains. A “security” concept that links to a “compliance” concept creates a navigable connection that wouldn’t exist in a flat document store.

Trust pipeline: four layers, escalating cost

Not every piece of knowledge deserves the same level of scrutiny. The trust assessment pipeline runs four layers, each more expensive than the last, and stops as soon as it has enough confidence.

Layer 0 (free — metadata only): Source tier from URL domain or type field. Institutional sources (.gov, .edu, peer-reviewed journals) start at 0.85 reliability. Social media starts at 0.40. Anonymous sources at 0.20. This layer also pulls any existing threat score from the inbound content filter.

Layer 1 (cheap — keyword detection, <1ms): Scans for certainty markers (“reportedly” vs “proven”), arousal signals (“breaking” vs “note”), bias indicators (“buy now,” “everyone knows”), and sentiment balance. These are fast string matches — no model inference, just pattern recognition.
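
As a rough illustration of Layer 1, the Go sketch below counts marker phrases with plain substring matching. The marker lists are short hypothetical samples built around the examples above, and the Signals type is an assumed shape rather than the system's real one.

```go
package main

import (
	"fmt"
	"strings"
)

// Signals is a hypothetical summary of the Layer 1 scan.
type Signals struct {
	Hedged    int  // certainty markers such as "reportedly"
	Assertive int  // certainty markers such as "proven"
	Alarming  bool // at least one high-arousal marker
	Bias      int  // bias indicators such as "buy now"
}

// Marker lists are illustrative; a real vocabulary would be larger.
var (
	hedgedMarkers    = []string{"reportedly", "allegedly"}
	assertiveMarkers = []string{"proven", "confirmed"}
	alarmMarkers     = []string{"breaking", "shocking"}
	biasMarkers      = []string{"buy now", "everyone knows"}
)

// countAny sums substring occurrences of every marker in the text.
// Substring matching is a simplification: "breaking" also matches
// "groundbreaking", so a stricter version would match on word boundaries.
func countAny(text string, markers []string) int {
	n := 0
	for _, m := range markers {
		n += strings.Count(text, m)
	}
	return n
}

// Scan runs the string matches on lowercased text. No model inference is involved.
func Scan(text string) Signals {
	t := strings.ToLower(text)
	return Signals{
		Hedged:    countAny(t, hedgedMarkers),
		Assertive: countAny(t, assertiveMarkers),
		Alarming:  countAny(t, alarmMarkers) > 0,
		Bias:      countAny(t, biasMarkers),
	}
}

func main() {
	fmt.Printf("%+v\n", Scan("BREAKING: everyone knows this is proven. Buy now!"))
}
```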

Layer 2 (moderate — cross-graph corroboration): BM25 searches the existing knowledge graph for similar claims. If a different source already says the same thing, the corroboration count increments and reliability goes up. If an existing node contradicts the new claim, both get flagged as contentious and a “contradicts” edge is created. This layer runs on insert and on periodic sweeps.

Layer 3 (expensive — LLM assessment): For nodes that need nuanced evaluation. Evidence grade classification (is this primary data or someone’s opinion?), sophisticated bias detection (cherry-picking, astroturfing, false balance), and summary generation with trust context. This layer is batched with the Palace classification to amortize the LLM call cost.

The composite reliability formula weights all of these:

reliability = sourceTierBase + evidenceGradeAdjustment
            + min(corroboration × 0.05, 0.25)
            - min(contradictions × 0.10, 0.30)
            - min(biasSignals × 0.05, 0.20)
            - (alarming arousal ? 0.10 : 0)
            → clamped to [0.0, 1.0]
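
The same formula, written out as a small Go function. The TrustInputs struct and its field names are assumptions; the weights and caps are taken directly from the formula above.

```go
package main

import (
	"fmt"
	"math"
)

// TrustInputs is a hypothetical bundle of the signals gathered by Layers 0-3.
type TrustInputs struct {
	SourceTierBase          float64 // e.g. 0.85 institutional, 0.40 social media
	EvidenceGradeAdjustment float64 // may be negative
	Corroborations          int
	Contradictions          int
	BiasSignals             int
	AlarmingArousal         bool
}

// Reliability applies the composite formula and clamps the result to [0, 1].
func Reliability(in TrustInputs) float64 {
	r := in.SourceTierBase + in.EvidenceGradeAdjustment
	r += math.Min(float64(in.Corroborations)*0.05, 0.25) // bonus capped at 0.25
	r -= math.Min(float64(in.Contradictions)*0.10, 0.30) // penalty capped at 0.30
	r -= math.Min(float64(in.BiasSignals)*0.05, 0.20)    // penalty capped at 0.20
	if in.AlarmingArousal {
		r -= 0.10
	}
	return math.Max(0, math.Min(1, r))
}

func main() {
	gov := TrustInputs{SourceTierBase: 0.85, Corroborations: 2}
	fmt.Printf("%.2f\n", Reliability(gov)) // prints: 0.95
}
```

The caps matter: ten corroborations add no more than five do, so a widely repeated claim from weak sources cannot climb past what its source tier supports by more than 0.25.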

User corrections override everything. If a researcher manually adjusts a trust score, that override is permanent — the system never re-assesses it.

Search architecture: BM25 + int8 vectors + RRF

The search pipeline replaces three separate search implementations (LeANN vector, memory Jaccard similarity, and note full-text) with one unified hybrid search.

BM25 index. Standard inverted index stored in Badger. Terms are tokenized, and posting lists are varint-encoded for compactness. Searches produce a ranked top-100 by BM25 score. Sub-millisecond on typical knowledge bases.

Vector index. Embeddings come from a stateless sidecar running nomic-embed-text-v1.5 (137M parameters, Apache 2.0 license). The model produces 768-dimensional float32 vectors. Because it was trained with Matryoshka representation learning, the vectors can be truncated to their first 256 dimensions with limited quality loss; they are then quantized to int8 using per-user calibration. Storage: 256 bytes per vector. Search: brute-force int8 cosine similarity, top-100.

That “brute-force” might sound alarming, but the math works out. 50,000 vectors × 256 dimensions × 1 byte (int8) = ~12.8 MB. With SIMD optimization, a full scan takes ~2ms. With pure Go, ~20ms. Both are fast enough that an approximate nearest-neighbor index (HNSW, IVF) would add complexity without meaningful benefit at this scale.
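
For a sense of how little code a brute-force scan needs, here is a pure-Go sketch. It assumes symmetric quantization with a single scale factor standing in for “per-user calibration” (how that scale is actually derived is not described here), and the names Quantize, TopK, and Hit are illustrative.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

const dims = 256 // dimensions kept after truncation

// Quantize keeps the first dims components and maps each to int8.
// scale must be positive; it stands in for the per-user calibration value.
func Quantize(v []float32, scale float32) []int8 {
	out := make([]int8, dims)
	for i := 0; i < dims && i < len(v); i++ {
		q := math.Round(float64(v[i] / scale * 127))
		if q > 127 {
			q = 127
		}
		if q < -127 {
			q = -127
		}
		out[i] = int8(q)
	}
	return out
}

// cosine computes cosine similarity of two equal-length int8 vectors
// using integer accumulators.
func cosine(a, b []int8) float64 {
	var dot, na, nb int64
	for i := range a {
		x, y := int64(a[i]), int64(b[i])
		dot += x * y
		na += x * x
		nb += y * y
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return float64(dot) / (math.Sqrt(float64(na)) * math.Sqrt(float64(nb)))
}

// Hit is one scored vector, identified by its position in the scanned slice.
type Hit struct {
	Index int
	Score float64
}

// TopK scores every vector against the query and returns the k best.
func TopK(query []int8, vectors [][]int8, k int) []Hit {
	hits := make([]Hit, 0, len(vectors))
	for i, v := range vectors {
		hits = append(hits, Hit{Index: i, Score: cosine(query, v)})
	}
	sort.Slice(hits, func(i, j int) bool {
		if hits[i].Score != hits[j].Score {
			return hits[i].Score > hits[j].Score
		}
		return hits[i].Index < hits[j].Index
	})
	if len(hits) > k {
		hits = hits[:k]
	}
	return hits
}

func main() {
	a, b := make([]float32, 768), make([]float32, 768)
	a[0], b[1] = 1, 1
	vectors := [][]int8{Quantize(a, 1), Quantize(b, 1)}
	fmt.Println(TopK(Quantize(a, 1), vectors, 1))
}
```

Sorting all scores is the simplest approach; at 50,000 vectors a bounded heap would avoid the full sort, but the scan itself dominates either way.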

RRF fusion (k=60). Reciprocal Rank Fusion merges the BM25 and vector result lists. Each result gets a score of 1/(k + rank), and scores are summed across lists. Results that rank well in both methods rise to the top. Results that only appear in one list still surface but with lower combined scores.
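
Reciprocal Rank Fusion fits in a few lines. The Go sketch below assumes 1-based ranks and string node IDs; Fuse and Fused are illustrative names.

```go
package main

import (
	"fmt"
	"sort"
)

const rrfK = 60

// Fused is one result with its combined RRF score.
type Fused struct {
	ID    string
	Score float64
}

// Fuse merges ranked ID lists. Each appearance contributes 1/(k + rank),
// with rank starting at 1 for the top result of a list.
func Fuse(lists ...[]string) []Fused {
	scores := map[string]float64{}
	for _, list := range lists {
		for i, id := range list {
			scores[id] += 1.0 / float64(rrfK+i+1)
		}
	}
	out := make([]Fused, 0, len(scores))
	for id, s := range scores {
		out = append(out, Fused{ID: id, Score: s})
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Score != out[j].Score {
			return out[i].Score > out[j].Score
		}
		return out[i].ID < out[j].ID // deterministic tie-break
	})
	return out
}

func main() {
	bm25 := []string{"a", "b", "c"}
	vec := []string{"b", "d", "a"}
	for _, f := range Fuse(bm25, vec) {
		fmt.Printf("%s %.4f\n", f.ID, f.Score)
	}
}
```

In this example "b" (ranks 2 and 1) edges out "a" (ranks 1 and 3), and both outrank "d" and "c", which appear in only one list. If the vector list is empty, as in BM25-only fallback, Fuse simply returns the BM25 order.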

Graceful degradation. If the embedding sidecar goes down, the search pipeline automatically falls back to BM25-only mode. Research continues — you lose semantic matching but keep precise term search. When the sidecar recovers, hybrid search resumes transparently.

Metadata filters apply before or after fusion depending on the filter type. Wing, room, hall, kind, and tag constraints use Badger prefix scans on secondary indexes — existence-only keys with no values, designed purely for filtering. Optional KG enrichment follows edges from matched nodes (1-2 hops) to surface related knowledge that didn’t match the query directly.

Memory stack: L0-L3 token-budgeted injection

Search is pull — you ask a question and get results. The memory stack is push — knowledge injected into every agent’s context automatically, without being asked.

The injection happens at a single point: the executor adapter’s BuildAgentRequest function. Every agent request passes through this function once, which means knowledge injection reaches all agents — planner, searcher, reasoning, aggregator, report writer — without each one needing its own integration.

Four layers, each with a token budget:

L0: Identity (~100 tokens). Always injected. User profile from settings — who they are, what they work on. This grounds every agent in the user’s context.

L1: Essentials (~500 tokens). Always injected. Preferences, high-reliability facts (≥0.7), core decisions, trusted advice (≥0.6). Priority: importance × recency. Exclusions: contentious claims and nodes with reliability below 0.3 never make it into L1. These are the things that should color every interaction.

L2: Room Context (~200-500 tokens). Injected when the system detects a topic in the current task. If the agent is working on something related to “gdpr,” L2 pulls knowledge from the gdpr room via prefix scan. This gives agents awareness of what the user already knows about the current topic — avoiding re-discovery.

L3: Deep Search (unlimited). Explicit retrieval only — when a RAG query fires or an agent specifically requests knowledge. Full hybrid search with KG enrichment. This is where the heavy lifting happens.

L1 is what the system always remembers about you. L2 is what it knows about your current topic. L3 is what it finds when you ask. Each layer has a token budget so injection never blows up context costs.

Different agents benefit differently. The planner gets L2 so it can skip topics the user already understands. The searcher gets L2 so it avoids re-discovering known sources. The aggregator gets L3 with contradiction detection so it can flag when new findings conflict with existing knowledge. The report writer gets L0+L1 only — enough awareness without overloading.

Token budgets are configurable per user. A fast approximation (word count × 4/3 for English) keeps the counting cheap. A global ceiling via environment variable provides a hard stop. Every injection emits metrics (which layers fired, how many tokens each consumed, which nodes were selected), so the system’s knowledge behavior is observable and tunable.
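
A budgeted layer fill might look like the Go sketch below. It assumes the 4/3 factor is applied to the word count and that items are taken greedily in priority order; EstimateTokens, Item, and FillLayer are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// EstimateTokens approximates the token count as ceil(words * 4/3).
// Applying the factor to the word count is an assumption.
func EstimateTokens(text string) int {
	words := len(strings.Fields(text))
	return (words*4 + 2) / 3
}

// Item is a hypothetical candidate for injection into a layer.
type Item struct {
	Text     string
	Priority float64 // e.g. importance × recency for L1
}

// FillLayer picks items in descending priority and skips any item that
// would push the layer past its token budget.
func FillLayer(items []Item, budget int) ([]Item, int) {
	sorted := append([]Item(nil), items...) // leave the caller's slice untouched
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Priority > sorted[j].Priority
	})
	var selected []Item
	used := 0
	for _, it := range sorted {
		cost := EstimateTokens(it.Text)
		if used+cost > budget {
			continue
		}
		selected = append(selected, it)
		used += cost
	}
	return selected, used
}

func main() {
	items := []Item{
		{Text: "prefers concise reports", Priority: 0.9},
		{Text: "uses OAuth2 with PKCE for all mobile clients", Priority: 0.8},
		{Text: "likes tea", Priority: 0.1},
	}
	selected, used := FillLayer(items, 8)
	fmt.Println(len(selected), used) // prints: 2 7
}
```

Skipping an oversized item rather than stopping lets smaller, lower-priority items use the remaining budget; stopping at the first miss would be an equally defensible policy.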

Storage and encryption

All knowledge lives in Badger KV — the same embedded key-value store used for project data. No external database, no network calls to a vector service, no data leaving the user’s encryption boundary.

The key schema is designed for prefix scanning:

kn:<nodeID>                              → KnowledgeNode JSON
kn:idx:wing:<wing>:<nodeID>              → (empty, existence scan)
kn:idx:room:<wing>:<room>:<nodeID>       → (empty)
kn:idx:kind:<kind>:<nodeID>              → (empty)
ke:<edgeID>                              → KnowledgeEdge JSON
ke:idx:from:<sourceNodeID>:<edgeID>      → (empty)
si:vec:<nodeID>                          → 256-byte int8 embedding
si:bm25:<term>                           → varint posting list

Index keys carry no values — they exist solely for prefix scanning. This means a “give me all nodes in the security wing” query is a single Badger prefix iteration, not a table scan with filtering.
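
To show why value-less index keys are enough, the Go sketch below builds keys following the schema above and scans a sorted key list by prefix. The sorted slice is a stand-in for Badger's ordered key space, not Badger's API, and all function names are illustrative.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Key builders following the schema above.
func NodeKey(id string) string            { return "kn:" + id }
func WingIndexKey(wing, id string) string { return "kn:idx:wing:" + wing + ":" + id }
func RoomIndexKey(wing, room, id string) string {
	return "kn:idx:room:" + wing + ":" + room + ":" + id
}

// SortedKeys returns a sorted copy, mimicking an ordered key-value store.
func SortedKeys(keys ...string) []string {
	out := append([]string(nil), keys...)
	sort.Strings(out)
	return out
}

// PrefixScan seeks to the first key >= prefix and walks forward until the
// prefix no longer matches, the way an ordered KV iterator would.
func PrefixScan(sortedKeys []string, prefix string) []string {
	start := sort.SearchStrings(sortedKeys, prefix)
	var out []string
	for _, k := range sortedKeys[start:] {
		if !strings.HasPrefix(k, prefix) {
			break
		}
		out = append(out, k)
	}
	return out
}

// NodeIDsInWing recovers node IDs from the index keys alone; no values are read.
// The trailing ':' in the prefix keeps "sec" from matching "security".
func NodeIDsInWing(sortedKeys []string, wing string) []string {
	prefix := "kn:idx:wing:" + wing + ":"
	var ids []string
	for _, k := range PrefixScan(sortedKeys, prefix) {
		ids = append(ids, strings.TrimPrefix(k, prefix))
	}
	return ids
}

func main() {
	keys := SortedKeys(
		NodeKey("n1"),
		WingIndexKey("security", "n2"),
		WingIndexKey("compliance", "n9"),
		WingIndexKey("security", "n1"),
		RoomIndexKey("security", "oauth2", "n1"),
	)
	fmt.Println(NodeIDsInWing(keys, "security")) // prints: [n1 n2]
}
```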

Per-user encryption uses a data encryption key (DEK) derived from the user’s authentication credential. The entire knowledge.db file is encrypted at rest. In memory, Badger handles encryption and decryption transparently. When the user’s session ends, the database closes and the DEK is discarded. Nobody, not even the platform operator, can read a user’s knowledge without that credential.

Storage estimates at heavy usage (10,000 documents, ~50,000 chunks):

Component	Size
KnowledgeNodes (~5K entities/concepts)	~5-10 MB
Chunk embeddings (50K × 256 bytes)	~12.8 MB
BM25 index (~10% of text)	~10 MB
KnowledgeEdges	~2 MB
Palace vocab + metadata	~1 MB
Total knowledge.db	~30-40 MB

That’s a heavy user with thousands of documents. A typical researcher might have 5-10 MB total. The entire knowledge system for 100 active users fits on a single machine with room to spare.

Enterprise knowledge layer

Individual knowledge is half the story. Organizations need shared knowledge that every team member can access — company policies, institutional expertise, standard operating procedures — without compromising per-user encryption.

Enterprise knowledge lives in PostgreSQL (org-level, not per-user Badger). It’s a separate table with versioned rows containing the same KnowledgeNode JSON structure. An org-level version counter tracks changes.

Propagation uses copy-on-session-open. When a user authenticates, the system compares their last-seen enterprise version with the current org version. If there’s a delta, new or updated enterprise nodes are copied into the user’s personal knowledge.db with tier set to “inherited.” They arrive pre-classified (wing, room, hall already assigned) and their terms get added to the user’s Palace vocabulary.

The key constraints: users can annotate enterprise nodes (add notes, adjust importance, add tags) but cannot delete them. Deleted enterprise nodes re-inject on the next session sync. Users can promote enterprise knowledge to a higher tier through their own research — “inherited” becomes “researched” when the user independently verifies it.
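
The propagation and constraints above can be sketched in Go as follows. Types and names are hypothetical, versions are assumed to be monotonically increasing integers, and re-injecting any missing node on every sync is one reading of “deleted enterprise nodes re-inject.”

```go
package main

import "fmt"

// EnterpriseNode is a hypothetical org-level row; Version is the org version
// at which the row was last changed.
type EnterpriseNode struct {
	ID      string
	Version int
	Body    string
}

// UserNode is the user's local copy, which may carry personal annotations.
type UserNode struct {
	ID          string
	Body        string
	Tier        string // "inherited" until the user promotes it
	Annotations []string
}

// UserDB stands in for the per-user knowledge.db.
type UserDB struct {
	LastSeenVersion int
	Nodes           map[string]*UserNode
}

// SyncOnOpen copies new or updated enterprise nodes into the user's store and
// re-injects any the user deleted. It returns the number of nodes written.
func SyncOnOpen(db *UserDB, org []EnterpriseNode, orgVersion int) int {
	written := 0
	for _, en := range org {
		local, present := db.Nodes[en.ID]
		if present && en.Version <= db.LastSeenVersion {
			continue // already up to date
		}
		if present {
			local.Body = en.Body // keep the user's tier and annotations
		} else {
			db.Nodes[en.ID] = &UserNode{ID: en.ID, Body: en.Body, Tier: "inherited"}
		}
		written++
	}
	db.LastSeenVersion = orgVersion
	return written
}

func main() {
	db := &UserDB{Nodes: map[string]*UserNode{}}
	org := []EnterpriseNode{{ID: "p1", Version: 1, Body: "Password policy"}}
	fmt.Println(SyncOnOpen(db, org, 1), db.Nodes["p1"].Tier) // prints: 1 inherited
}
```

Updating the body in place while leaving Tier and Annotations alone is what lets a user's promotion to “researched” survive later enterprise updates.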

Enterprise nodes participate in all searches alongside the user’s personal knowledge. L1 injection includes enterprise nodes with importance ≥ 4. This means organizational expertise is always available to every agent, blended seamlessly with the individual’s own accumulated knowledge.

Passive learning

The system gets smarter without anyone filling out a feedback form. Every interaction point doubles as an observation point.

When a node is accessed through search or retrieval, its importance score gets a small boost and its last-accessed timestamp updates. Frequently accessed knowledge naturally rises in L1 priority. Knowledge that’s never accessed gradually fades — not deleted, but deprioritized.

When a search returns fewer than two results, the system records a knowledge gap — the query hash, the text, and a timestamp. Gaps with three or more hits within seven days surface in the UI as “topics you’ve asked about but don’t have stored knowledge for.” This turns the absence of knowledge into actionable information.
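
Gap tracking is simple enough to sketch in Go. The version below assumes queries are normalized by trimming and lowercasing before hashing, and it reads “within seven days” as a trailing seven-day window; GapLog and its methods are illustrative names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

const (
	minResults  = 2                // fewer results than this counts as a gap
	surfaceHits = 3                // hits needed before a gap is shown
	windowSecs  = 7 * 24 * 60 * 60 // seven days in seconds
)

// GapLog is a hypothetical in-memory record of knowledge gaps.
type GapLog struct {
	hits map[string][]int64 // query hash -> Unix timestamps of gap hits
	text map[string]string  // query hash -> most recent query text
}

func NewGapLog() *GapLog {
	return &GapLog{hits: map[string][]int64{}, text: map[string]string{}}
}

// queryHash hashes the normalized query (assumed normalization).
func queryHash(q string) string {
	sum := sha256.Sum256([]byte(strings.ToLower(strings.TrimSpace(q))))
	return hex.EncodeToString(sum[:])
}

// Record stores a gap hit when a search returned fewer than two results.
func (g *GapLog) Record(query string, resultCount int, nowUnix int64) {
	if resultCount >= minResults {
		return
	}
	h := queryHash(query)
	g.hits[h] = append(g.hits[h], nowUnix)
	g.text[h] = query
}

// Surfaced returns the queries with three or more hits in the last seven days.
func (g *GapLog) Surfaced(nowUnix int64) []string {
	var out []string
	for h, stamps := range g.hits {
		n := 0
		for _, ts := range stamps {
			if ts <= nowUnix && nowUnix-ts <= windowSecs {
				n++
			}
		}
		if n >= surfaceHits {
			out = append(out, g.text[h])
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	const day = int64(86400)
	g := NewGapLog()
	g.Record("zero-trust pricing", 0, 0)
	g.Record("Zero-Trust Pricing", 1, 2*day)
	g.Record("zero-trust pricing", 0, 5*day)
	fmt.Println(g.Surfaced(6 * day)) // prints: [zero-trust pricing]
}
```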

When users review extracted knowledge — approving, dismissing, or editing before committing — every action is a signal. Approvals confirm the extraction quality. Dismissals say “this isn’t worth keeping.” Edits correct the classification. All of these feed back into the keyword vocabulary and scoring models that drive future extraction.

The cumulative effect is a knowledge system that asymptotically approaches the user’s actual research needs. Not through explicit training, but through the accumulated weight of thousands of small signals. The longer you use it, the better it gets at predicting what you need.

What to do now

Evaluate your current RAG architecture against these dimensions. Does it handle deduplication? Trust assessment? Per-user encryption? If the answer is “we store everything in a shared Pinecone index,” you have structural limitations that no prompt engineering will fix.
Map your knowledge fragmentation. How many separate stores hold pieces of your users’ knowledge? Chat logs, document vectors, extracted entities, project outputs — if these live in different systems, your users are doing the integration work manually.
Assess your trust posture. Does your system distinguish source quality? Can it flag contradictions? If all retrieved content gets the same confidence treatment, your system is fast but untrustworthy.
Check your encryption boundary. Are vectors stored encrypted per-user, or in a shared index? Can the platform operator read user knowledge? If you’re in a regulated industry, the answer matters.
Consider the compound effect. A knowledge system that resets every session has linear value: each session is independent. A system that accumulates has compounding value: each session builds on everything that came before. The gap between the two widens over months.
Talk to us. If you’re building or evaluating knowledge architectures for sovereign AI deployment, we’re happy to walk through the technical details in depth.