Product · Lighthouse
Pre-alpha
Knowledge base for AI agents.
Retrieval over MCP, proposals over HTTP, a temporal knowledge graph underneath. Live sources, auto-expiring facts, and a curator pipeline that gets stronger with every project that contributes.
0
stale facts
auto-expiring validity windows
30+
source connectors
Markdown · GitHub · Web · Notion
3
MCP tools
search · fetch · propose
100%
curated
no fact enters without evaluation
The problem
AI agents work with knowledge they don't have.
Every team in the industry is solving this independently — and getting it wrong in the same ways.
Without Lighthouse
The stale knowledge problem
Models are trained on yesterday's data. They confidently recommend APIs that no longer exist and versions that broke months ago. There is no layer between the agent and the cutoff.
Ten teams hit the same edge cases. Nobody shares fixes structurally — only in Slack threads that vanish.
With Lighthouse
The temporal knowledge graph
Every fact carries a validity window. What was true a year ago is structurally separate from what is true now. Agents query the present, not the archive.
Live sources auto-expire outdated facts. The graph gets more accurate as more projects contribute corrections.
What makes it different
Four architectural bets.
Not a wiki someone maintains. An index over live sources, with temporal semantics and a curation loop that scales with contributions.
01
Index, not wiki
Content is connected, not written. When a source updates, outdated facts are automatically marked invalid. No editor required; no drift between source and knowledge base.
02
Temporal knowledge graph
Every fact carries a validity window — valid_from and valid_until. What was true a year ago is structurally separate from what is true now. Agents query the present, not the archive.
03
Network effect as a filter
Ten projects independently proposing the same correction is a strong signal. One project is weak. Critical mass is the quality gate — the library gets more accurate as more teams contribute.
04
Curation model, no direct writes
Every fact enters through a proposal. The librarian agent evaluates: accept if verifiable, reject with a pointer to the duplicate, escalate if evidence is thin. Humans see only what the agent can't decide.
Architecture
Three levels of knowledge, different policies.
Each layer has a distinct owner, scope, and trust level. Project knowledge references global knowledge without duplicating it. Role memory stays separate.
Shared · cross-industry
Global Library
Canonical facts about technologies. Sources: official documentation, RFCs, curated cookbooks. Practice-derived facts from many contributing projects. Critical mass equals trust.
owner: Shared resource
Private · per project
Project KB
Connect your Notion, Confluence, or repo. The graph is built and indexed privately. It references the Global Library without duplicating it — and has its own librarian calibration.
owner: Project team
Personal · per agent role
Role Memory
Habits and style for a specific role. Accumulated from feedback, not from ingest. Kept separate from knowledge — this is about how the agent works, not what is true.
owner: The role itself
How it flows
From source to agent in one pipeline.
Sources are ingested on a schedule, facts are curated by the librarian, and agents retrieve over MCP — the same engine throughout.
Consumers — any MCP-compatible agent
API surface
search
Hybrid BM25 + vector + graph BFS
fetch
Node by UUID, full record
propose
Submit fact for curation
Temporal knowledge graph
Graphiti + FalkorDB
Every fact carries valid_from / valid_until. Hybrid search: vector + BM25 + graph BFS + reranker.
Librarian agent
Claude Haiku · Sonnet
Source connectors · scheduled ingest
API surfaces
Three surfaces. One engine.
The same graph powers retrieval, the proposal pipeline, and the MCP server adapter. Swap your transport, keep your tool definitions.
GET /search · /fetch · /health
MCP Retrieval
Hybrid search: BM25 + vector + graph BFS with cross-encoder reranking. Returns ranked facts with validity windows. Stateless — the same shape an MCP server wraps on top.
Exposes: search, fetch, propose — three tools for any MCP-compatible client.
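A toy illustration of the late-fusion step behind hybrid search — normalize per-retriever scores, then blend. The weights and normalization here are assumptions for the sketch; the real pipeline also walks the graph and reranks with a cross-encoder:

```python
def fuse(bm25: dict[str, float], vec: dict[str, float],
         w_bm25: float = 0.4, w_vec: float = 0.6) -> list[str]:
    """Blend normalized BM25 and vector scores into one ranking (toy sketch)."""
    def norm(scores: dict[str, float]) -> dict[str, float]:
        hi = max(scores.values()) or 1.0
        return {k: v / hi for k, v in scores.items()}
    b, v = norm(bm25), norm(vec)
    ids = set(b) | set(v)
    fused = {i: w_bm25 * b.get(i, 0.0) + w_vec * v.get(i, 0.0) for i in ids}
    return sorted(fused, key=fused.get, reverse=True)

# A doc that both retrievers like outranks a doc only one of them likes:
print(fuse({"a": 2.0, "b": 1.0}, {"b": 0.9, "c": 0.5}))  # → ['b', 'a', 'c']
```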
POST /v1/propose · GET /v1/proposals/:id
Proposal Pipeline
Submit new facts, corrections, or deprecations over HTTP. Fire-and-forget 202 Accepted — the librarian picks it up async. Poll the ID for the decision: accepted, rejected, escalated.
Types: add · correct · deprecate. Statuses: queued → evaluating → accepted | rejected | escalated.
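The status flow above is a small state machine. A minimal sketch of the documented transitions — the function and table here are illustrative, not the actual implementation:

```python
# Legal proposal transitions, as documented: queued → evaluating → terminal.
TRANSITIONS: dict[str, set[str]] = {
    "queued": {"evaluating"},
    "evaluating": {"accepted", "rejected", "escalated"},
}

def advance(status: str, new_status: str) -> str:
    """Move a proposal to its next status, rejecting illegal jumps."""
    if new_status not in TRANSITIONS.get(status, set()):
        raise ValueError(f"illegal transition {status} -> {new_status}")
    return new_status

s = advance("queued", "evaluating")
s = advance(s, "accepted")  # terminal: accepted | rejected | escalated
```

A client polling GET /v1/proposals/:id only ever observes one of these five statuses, so "queued" jumping straight to "accepted" would indicate a bug.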
stdio / HTTP-SSE
MCP Server
Three tools: search, fetch, propose. stdio transport for desktop clients (Claude Desktop, Cursor); HTTP/SSE for remote agents. Swap transports without changing tool definitions.
Compatible with any MCP client — Claude, Cursor, Codex, custom.
Librarian agent
Every fact earns its place.
No fact enters the graph without evaluation. The librarian — a Claude-backed curator agent with a stable, cached rubric — decides accept, reject, or escalate on every proposal.
Accept · Verifiable against the supplied evidence. Written to the graph as a new episode.
Reject · Wrong, duplicate, or too project-specific. Returned with a reason and a pointer.
Escalate · Evidence too thin to decide. Lands in a human queue.
Proposal store is git-backed — every state change is a commit. History, audit trail, and grep all come free.
Source connectors
Connect the sources you own.
The scheduler pulls from sources on your cadence. When a source changes, derived facts are marked for re-evaluation. The library stays current without manual curation.
Runner schedule uses simple duration notation (6h, 1d, 30m). An error in one source only fails that source — the scheduler continues.
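Both behaviors fit in a few lines. A minimal sketch — the function names and the exact error-reporting shape are assumptions; only the duration notation (6h, 1d, 30m) and the fail-one-source-continue semantics come from the text above:

```python
import re
from datetime import timedelta

def parse_every(spec: str) -> timedelta:
    """Parse the simple duration notation: '30m', '6h', '1d' (and 's')."""
    m = re.fullmatch(r"(\d+)([smhd])", spec)
    if not m:
        raise ValueError(f"bad duration: {spec!r}")
    n, unit = int(m.group(1)), m.group(2)
    names = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}
    return timedelta(**{names[unit]: n})

def run_all(sources: dict) -> dict[str, str]:
    """One failing source must not take down the run: catch per source, continue."""
    results = {}
    for name, pull in sources.items():
        try:
            pull()
            results[name] = "ok"
        except Exception as e:
            results[name] = f"error: {e}"
    return results

print(parse_every("6h"))  # → 6:00:00
```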
Why now
The window opened in 2025–26.
Four shifts converged. Any one of them alone would not have been enough.
01
MCP standardized access
Any agent connects to any MCP server without integration work. Before MCP, the retrieval layer was bound to a specific model. That lock-in is gone.
02
Graph memory matured
Graphiti, temporal graphs, provenance tracking — ready-to-use open-source primitives. A year ago this had to be built from scratch.
03
A vacuum opened in shared knowledge
A generation of developers moved from Stack Overflow to LLMs. Those LLMs need a knowledge layer that is accurate, live, and shared — not buried in a training corpus.
04
Agentic AI is going to production
Analysts project that by 2027 half of companies with GenAI investment will run production agents. Infrastructure-grade knowledge is not optional at that scale.
Market position
Not personal memory. Shared infrastructure.
Memory systems — Mem0, Letta, Cognee — solve "the agent remembers this specific user." Lighthouse solves "the entire industry shares knowledge about technologies." These are different categories. We do not compete — we sit on top.
The differentiator is the temporal graph plus the proposal pipeline. Context7 and similar tools give agents docs. Lighthouse gives agents facts with provenance, validity windows, and a feedback loop that improves with use.
Under the hood
Boring primitives, no proprietary lock-in.
Every component is open-source or replaceable. The graph store, the embedding model, the curator LLM — all swappable without touching retrieval or proposal contracts.
Graph DB
FalkorDB + Graphiti
Temporal semantics, hybrid search, provenance
Entity extraction
OpenAI gpt-4o-mini
Graphiti framework; Gemini / Anthropic variants possible
Embeddings
text-embedding-3-small
1024-dim via the dimensions parameter; any OpenAI-compatible endpoint
Curator LLM
Claude (Haiku · Sonnet)
Anthropic SDK with prompt caching on rubric
Source connectors
LlamaIndex hub
30+ connectors: Markdown, GitHub, Web, Notion…
MCP adapter
mcp v1.0+
stdio for desktop, HTTP/SSE for remote agents
API framework
FastAPI + Uvicorn
Python 3.12+, Pydantic v2 schemas
Proposal store
Git-backed markdown
Every state change is a commit. Grep-friendly.
Isolation model
Separate instances
No tenant model — deploy two for global + project
Pre-alpha
If it works, this is infrastructure for the agentic era.
Lighthouse is open-source and in active development. We are looking for teams willing to run it against real projects and calibrate the librarian on real knowledge gaps.