Product · Lighthouse
Pre-alpha
Knowledge base for AI agents.
Retrieval over MCP, proposals over HTTP, a temporal knowledge graph underneath. Live sources, auto-expiring facts, and a curator pipeline that gets stronger with every project that contributes.
0
stale facts
auto-expiring validity windows
30+
source connectors
Markdown · GitHub · Web · Notion
3
MCP tools
search · fetch · propose
100%
curated
no fact enters without evaluation
The problem
AI agents work with knowledge they don't have.
Every team in the industry is solving this independently — and getting it wrong in the same ways.
Without Lighthouse
The stale knowledge problem
Models are trained on yesterday's data. They confidently recommend APIs that no longer exist and versions that broke months ago. There is no layer between the agent and the cutoff.
Ten teams hit the same edge cases. Nobody shares fixes structurally — only in Slack threads that vanish.
With Lighthouse
The temporal knowledge graph
Every fact carries a validity window. What was true a year ago is structurally separate from what is true now. Agents query the present, not the archive.
Live sources auto-expire outdated facts. The graph gets more accurate as more projects contribute corrections.
What makes it different
Four architectural bets.
Not a wiki someone maintains. An index over live sources, with temporal semantics and a curation loop that scales with contributions.
01
Index, not wiki
Content is connected, not written. When a source updates, outdated facts are automatically marked invalid. No editor required; no drift between source and knowledge base.
02
Temporal knowledge graph
Every fact carries a validity window — valid_from and valid_until. What was true a year ago is structurally separate from what is true now. Agents query the present, not the archive.
03
Network effect as a filter
Ten projects independently proposing the same correction is a strong signal. One project is weak. Critical mass is the quality gate — the library gets more accurate as more teams contribute.
04
Curation model, no direct writes
Every fact enters through a proposal. The librarian agent evaluates: accept if verifiable, reject with a pointer to the duplicate, escalate if evidence is thin. Humans see only what the agent can't decide.
Architecture
Three levels of knowledge, different policies.
Each layer has a distinct owner, scope, and trust level. Project knowledge references global knowledge without duplicating it. Role memory stays separate.
Shared · cross-industry
Global Library
Canonical facts about technologies. Sources: official documentation, RFCs, curated cookbooks. Practice-derived facts from many contributing projects. Critical mass equals trust.
owner: Shared resource
Private · per project
Project KB
Connect your Notion, Confluence, or repo. The graph is built and indexed privately. It references the Global Library without duplicating it — and has its own librarian calibration.
owner: Project team
Personal · per agent role
Role Memory
Habits and style for a specific role. Accumulated from feedback, not from ingest. Kept separate from knowledge — this is about how the agent works, not what is true.
owner: The role itself
How it flows
From source to agent in one pipeline.
Sources are ingested on a schedule, facts are curated by the librarian, and agents retrieve over MCP — the same engine throughout.
Consumers — any MCP-compatible agent
API surface
search
Hybrid BM25 + vector + graph BFS
fetch
Node by UUID, full record
propose
Submit fact for curation
Temporal knowledge graph
Graphiti + FalkorDB
Every fact carries valid_from / valid_until. Hybrid search: vector + BM25 + graph BFS + reranker.
Librarian agent
Claude Haiku · Sonnet
Source connectors · scheduled ingest
API surfaces
Three surfaces. One engine.
The same graph powers retrieval, the proposal pipeline, and the MCP server adapter. Swap your transport, keep your tool definitions.
GET /search · /fetch · /health
MCP Retrieval
Hybrid search: BM25 + vector + graph BFS with cross-encoder reranking. Returns ranked facts with validity windows. Stateless — the same shape an MCP server wraps on top.
Exposes: search, fetch, propose — three tools for any MCP-compatible client.
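A toy illustration of the late-fusion step behind hybrid search — normalize per-retriever scores, then blend. The weights and normalization here are assumptions for the sketch; the real pipeline also walks the graph and reranks with a cross-encoder:

```python
def fuse(bm25: dict[str, float], vec: dict[str, float],
         w_bm25: float = 0.4, w_vec: float = 0.6) -> list[str]:
    """Blend normalized BM25 and vector scores into one ranking (toy sketch)."""
    def norm(scores: dict[str, float]) -> dict[str, float]:
        hi = max(scores.values()) or 1.0
        return {k: v / hi for k, v in scores.items()}
    b, v = norm(bm25), norm(vec)
    ids = set(b) | set(v)
    fused = {i: w_bm25 * b.get(i, 0.0) + w_vec * v.get(i, 0.0) for i in ids}
    return sorted(fused, key=fused.get, reverse=True)

# A doc that both retrievers like outranks a doc only one of them likes:
print(fuse({"a": 2.0, "b": 1.0}, {"b": 0.9, "c": 0.5}))  # → ['b', 'a', 'c']
```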
POST /v1/propose · GET /v1/proposals/:id
Proposal Pipeline
Submit new facts, corrections, or deprecations over HTTP. Fire-and-forget 202 Accepted — the librarian picks it up async. Poll the ID for the decision: accepted, rejected, escalated.
Types: add · correct · deprecate. Statuses: queued → evaluating → accepted | rejected | escalated.
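The status flow above is a small state machine. A minimal sketch of the documented transitions — the function and table here are illustrative, not the actual implementation:

```python
# Legal proposal transitions, as documented: queued → evaluating → terminal.
TRANSITIONS: dict[str, set[str]] = {
    "queued": {"evaluating"},
    "evaluating": {"accepted", "rejected", "escalated"},
}

def advance(status: str, new_status: str) -> str:
    """Move a proposal to its next status, rejecting illegal jumps."""
    if new_status not in TRANSITIONS.get(status, set()):
        raise ValueError(f"illegal transition {status} -> {new_status}")
    return new_status

s = advance("queued", "evaluating")
s = advance(s, "accepted")  # terminal: accepted | rejected | escalated
```

A client polling GET /v1/proposals/:id only ever observes one of these five statuses, so "queued" jumping straight to "accepted" would indicate a bug.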
stdio / HTTP-SSE
MCP Server
Three tools: search, fetch, propose. stdio transport for desktop clients (Claude Desktop, Cursor); HTTP/SSE for remote agents. Swap transports without changing tool definitions.
Compatible with any MCP client — Claude, Cursor, Codex, custom.
Librarian agent
Every fact earns its place.
No fact enters the graph without evaluation. The librarian — a Claude-backed curator agent with a stable, cached rubric — decides accept, reject, or escalate on every proposal.
Accept · Verifiable against the supplied evidence. Written to the graph as a new episode.
Reject · Wrong, duplicate, or too project-specific. Returned with a reason and a pointer.
Escalate · Evidence too thin to decide. Lands in a human queue.
Proposal store is git-backed — every state change is a commit. History, audit trail, and grep all come free.
Source connectors
Connect the sources you own.
The scheduler pulls from sources on your cadence. When a source changes, derived facts are marked for re-evaluation. The library stays current without manual curation.
Runner schedule uses simple duration notation (6h, 1d, 30m). An error in one source only fails that source — the scheduler continues.
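Both behaviors fit in a few lines. A minimal sketch — the function names and the exact error-reporting shape are assumptions; only the duration notation (6h, 1d, 30m) and the fail-one-source-continue semantics come from the text above:

```python
import re
from datetime import timedelta

def parse_every(spec: str) -> timedelta:
    """Parse the simple duration notation: '30m', '6h', '1d' (and 's')."""
    m = re.fullmatch(r"(\d+)([smhd])", spec)
    if not m:
        raise ValueError(f"bad duration: {spec!r}")
    n, unit = int(m.group(1)), m.group(2)
    names = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}
    return timedelta(**{names[unit]: n})

def run_all(sources: dict) -> dict[str, str]:
    """One failing source must not take down the run: catch per source, continue."""
    results = {}
    for name, pull in sources.items():
        try:
            pull()
            results[name] = "ok"
        except Exception as e:
            results[name] = f"error: {e}"
    return results

print(parse_every("6h"))  # → 6:00:00
```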
Why now
The window opened in 2025–26.
Four shifts converged. Any one of them alone would not have been enough.
01
MCP standardized access
Any agent connects to any MCP server without integration work. Before MCP, the retrieval layer was bound to a specific model. That lock-in is gone.
02
Graph memory matured
Graphiti, temporal graphs, provenance tracking — ready-to-use open-source primitives. A year ago this had to be built from scratch.
03
A vacuum opened in shared knowledge
A generation of developers moved from Stack Overflow to LLMs. Those LLMs need a knowledge layer that is accurate, live, and shared — not buried in a training corpus.
04
Agentic AI is going to production
Analysts project that by 2027 half of companies with GenAI investment will run production agents. Infrastructure-grade knowledge is not optional at that scale.
Market position
Not personal memory. Shared infrastructure.
Memory systems — Mem0, Letta, Cognee — solve "the agent remembers this specific user." Lighthouse solves "the entire industry shares knowledge about technologies." These are different categories. We do not compete — we sit on top.
The differentiator is the temporal graph plus the proposal pipeline. Context7 and similar tools give agents docs. Lighthouse gives agents facts with provenance, validity windows, and a feedback loop that improves with use.
Under the hood
Boring primitives, no proprietary lock-in.
Every component is open-source or replaceable. The graph store, the embedding model, the curator LLM — all swappable without touching retrieval or proposal contracts.
Graph DB
FalkorDB + Graphiti
Temporal semantics, hybrid search, provenance
Entity extraction
OpenAI gpt-4o-mini
Graphiti framework; Gemini / Anthropic variants possible
Embeddings
text-embedding-3-small
1024-dim via the dimensions parameter; any OpenAI-compatible endpoint
Curator LLM
Claude (Haiku · Sonnet)
Anthropic SDK with prompt caching on rubric
Source connectors
LlamaIndex hub
30+ connectors: Markdown, GitHub, Web, Notion…
MCP adapter
mcp v1.0+
stdio for desktop, HTTP/SSE for remote agents
API framework
FastAPI + Uvicorn
Python 3.12+, Pydantic v2 schemas
Proposal store
Git-backed markdown
Every state change is a commit. Grep-friendly.
Isolation model
Separate instances
No tenant model — deploy two for global + project
Pre-alpha
If it works, this is infrastructure for the agentic era.
Lighthouse is open-source and in active development. We are looking for teams willing to run it against real projects and calibrate the librarian on real knowledge gaps.