Product · Lighthouse
Live · global instanceCanonical docs your agent can cite.
Lighthouse hosts a curated knowledge graph for coding agents — RFCs, OWASP, NIST, framework documentation, methodology pages — indexed, chunked, citation-ready over MCP. Plug your client in and the agent stops hallucinating axios v0.21.
One product, two ways to run it: plug into the hosted instance, or self-host the Apache-2.0 engine with your own corpus ↗
By the numbers
The corpus, today.
A live snapshot of what your agent gets the moment you plug in. Refreshed by our ingest crons on per-role cadences.
71k+
Chunks
Indexed paragraphs your agent can search and cite verbatim.
14k
Sources
RFCs, OWASP, NIST SP-800, framework docs, methodology pages.
21
Role recipes
Per-role manifests tuned for security, ML, frontend, DevOps, more.
Connect
One config block — any MCP client.
A read-only global instance is running at https://lighthouse.harborgang.com. No signup for the anonymous tier; sign in to get a personal MCP token. Lighthouse appears in your tool picker with search and fetch_source.
Claude Code
Project (.mcp.json) or user (~/.claude/mcp.json)
claude mcp add lighthouse \ --transport http \ --url https://lighthouse.harborgang.com/mcp/
Adds Lighthouse to the current project. Use --scope user for a global registration. Restart and `search` appears in the tool picker.
Cursor
~/.cursor/mcp.json (or .cursor/mcp.json per project)
{
"mcpServers": {
"lighthouse": {
"url": "https://lighthouse.harborgang.com/mcp/"
}
}
}Settings → MCP picks it up. Lighthouse appears in the composer's tool list after a reload.
Claude Desktop
~/Library/Application Support/Claude/claude_desktop_config.json (macOS)
{
"mcpServers": {
"lighthouse": {
"type": "http",
"url": "https://lighthouse.harborgang.com/mcp/"
}
}
}Merge into any existing mcpServers block. Quit + relaunch Claude Desktop.
Other clients work too — Continue · Cline · Zed · Aider · LangGraph · Codex CLI · ChatGPT (with connectors). Anything that speaks the MCP protocol.
Why hosted
A curated corpus is the floor. We keep it refreshed.
You could run it yourself — the engine is Apache-2.0 and complete. Hosted Lighthouse buys you three things the engine alone can't.
Curation
The corpus is opinionated.
We picked the canonical sources for each role and we maintain the ingest recipes. You don't have to argue with your team about whether MDN or W3C is the source of truth.
Freshness
Crons refresh on cadence.
Per-role refresh schedules. Hash-delta detects changes — re-runs are cheap. When a framework ships a new release, the relevant chunks roll over within a day.
Rerank
Cross-encoder, always on.
The hosted instance runs a tuned cross-encoder on top of BM25 + vector. Pro queries jump the rerank queue under load. Most of the eval lift comes from this layer.
Pricing
Same corpus on every tier. Caps reset 00:00 UTC.
Anonymous works without an account. Paid tiers unlock larger result sets, the cross-encoder rerank, and per-agent MCP tokens. Cancel anytime, prorated.
Anonymous
Kicking the tyres.
- 30 searches / day per IP
- Top-10 results
- Links to original sources
- No MCP token
Reader
Daily driver for one engineer.
- 200 searches / day
- Top-15 results
- 1 personal MCP endpoint
- Sort by date or relevance
Pro
One engineer, serious about it.
- 1,500 searches / day
- Top-30 results
- Cross-encoder rerank + summary boost
- Per-agent MCP tokens, revocable
- Notify on changes to bookmarked sources
- Priority in rerank queue
Team
Shared usage view, org SSO.
- Everything in Pro · 1,500/seat/day
- GitHub-org SSO (rolling out)
- Shared bookmarks + usage dashboard
- Org-scoped private corpus (opt-in)
Prefer to run it yourself? Self-hosting is free — the full engine, Apache-2.0, your infra.
Annual billing on Pro and Team saves 25%. Switch on the pricing page.
Does it actually help?
376 side-by-side tasks. 11 frontier models. Eight wins, two losses, the receipts.
We ran the same prompts twice — once cold, once with Lighthouse — across ten engineering roles. Gemini, Kimi, Qwen, and DeepSeek saw the largest gains; Claude moved slightly, GPT moved a hair. Six of ten roles improved overall. The page is honest about the roles that regressed.
Self-host
Open source · Apache-2.0The same engine, your hardware, your corpus.
Everything in this repo is the full product: hybrid BM25 + pgvector retrieval, cross-encoder rerank, MCP server, 30 importers (Notion, Confluence, Slack, S3, GitHub, sitemaps, …), admin UI, coverage-gap analytics, API keys, multi-workspace tenancy. Private repos, internal runbooks, vendor docs your compliance team won't let leave the boundary — one container + Postgres on a $20/mo VPS.
Clone and boot — Postgres included, no API keys required.
git clone https://github.com/ElMundiUA/lighthouse.git && cd lighthouse cp .env.example .env docker compose up --build
Open the admin UI, add a source (your docs sitemap, Notion, a GitHub repo — 30 importer types), hit Run.
open http://localhost:8000/ui/
Point your agent at it. Done — your corpus, citation-ready.
claude mcp add my-docs \ --transport http \ --url http://localhost:8000/mcp/
What will always be free: the engine is complete and Apache-2.0 — we will never move an existing feature behind a paywall. Hosted tiers sell curation, operations, and scale, not withheld code.
Keep going
Try it
Search the corpus
Live search box on the hosted instance — no signup.
Open
Effectiveness
Read the evals
376 tasks, 11 frontier models. Six wins, two losses, the receipts.
Open
Self-host
Lighthouse on GitHub
The same engine, Apache-2.0 — one container + Postgres on a $20/mo VPS.
Open
Pair with
Ship — the loop
Run the ticket-to-merged-PR cadence that produces the questions.
Open
Live · global instance
The cheapest way to lift weaker models is to give them a memory.
Plug into the public instance, sign in for a personal MCP token, or start Pro for the rerank-and-priority lane.
