I Gave My AI a Search Engine For My Own Notes
I have 35 markdown files documenting every service in my homelab. Caddy configs, Proxmox cluster notes, AI agent skills, firewall rules, Plex migration war stories -- the lot. They live in a private GitHub repo, and they're genuinely useful. When they're open in front of me. Which they never are when I actually need them.
The problem isn't that the documentation doesn't exist. It's that my AI assistant can't always find it when it matters.
The CLAUDE.md Approach
I'd already taken a stab at solving this. Claude Code has a memory system -- a CLAUDE.md file that gets loaded into every conversation automatically. Think of it as a cheat sheet that's always pinned to the top of the context window. Mine is... substantial.
It has a map of my infrastructure -- which nodes exist, what runs where, how to reach each service. Networking topology, branching strategy, commit conventions, platform-specific quirks (like using Homebrew's rsync on macOS instead of the ancient system one). It's a condensed brain dump of everything I thought Claude might need to know to work effectively in my environment.
And it works. When I say "SSH into the Grafana box," Claude knows exactly which IP to connect to without me spelling it out. When I create a branch, it follows the naming convention. It knows which Caddy instance handles internal traffic and which one sits in the DMZ. All of that context is right there, every time.
But here's the thing about CLAUDE.md: it's a flat file with a finite size. It's brilliant for structured reference data -- infrastructure maps, service locations, conventions. The stuff that fits neatly into a table or a bullet list. What it can't do is hold 35 detailed service architecture documents, each with their own setup steps, troubleshooting notes, config examples, and "here's what went wrong at 2am" war stories.
I tried cramming more in. It got unwieldy. The file was already pushing the limits of what's reasonable to load into every single conversation. And most of the detail was irrelevant most of the time -- you don't need the full Plex migration history when you're debugging a Caddy route.
CLAUDE.md is the index card pinned to the monitor. The docs repo is the filing cabinet. I needed a way for Claude to open the filing cabinet.
The Context Gap
Without a search tool, getting Claude to use the docs required manual orchestration. "Read the InfluxDB doc." "Now read the internal Caddy config." "Actually, check the firewall doc too, I think the port mapping is in there."
This assumes I remember which file contains what. Across 35 documents with names that range from self-explanatory to cryptic abbreviations, that's optimistic. I wrote these files. I still can't reliably tell you which one has the thing about the Sonarr custom header.
What I remember is vibes. "Something about certificate syncing between nodes." "The InfluxDB retention policy setup." "That bit where Plex needed the GPU passthrough." I know what I'm looking for. I just don't know where I put it.
This is the documentation equivalent of knowing your keys are somewhere in the house. You could check every room systematically, or you could give someone a description and let them find it. I went looking for that someone.
Enter qmd
qmd is a local search engine for markdown files. Not a cloud service, not a SaaS product, not something that sends your private homelab documentation to a server in Virginia. It runs entirely on your machine, indexes your markdown files, and lets you search them three different ways:
- Keyword search (BM25) -- the classic. Fast, exact, good for when you know the words you're looking for
- Semantic search (vector) -- the clever one. Uses embeddings to find documents by meaning, not just matching words. Search for "reverse proxy SSL" and it finds your certificate syncing doc even if those exact words aren't in it
- Hybrid search -- both of the above, combined with LLM re-ranking. The "I have no idea what I called this but here's roughly what it was about" mode
All of this runs locally using GGUF models via node-llama-cpp. The embedding model is about 330MB. It downloaded in under 30 seconds and I haven't thought about it since.
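The vector mode boils down to something simple: embed the query, embed the documents, and rank by how closely the vectors point in the same direction. Here's a toy sketch of that idea -- the three-dimensional "embeddings" and filenames are made up for illustration, and real embedding models produce vectors with hundreds of dimensions:

```python
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors:
    # dot product divided by the product of their lengths.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy 3-dimensional "embeddings" standing in for real model output.
docs = {
    "caddy-certs.md":    [0.9, 0.1, 0.2],  # about TLS / reverse proxying
    "plex-migration.md": [0.1, 0.8, 0.3],  # about media server moves
}
query = [0.85, 0.15, 0.25]  # embedding of "reverse proxy SSL"

best = max(docs, key=lambda name: cosine(query, docs[name]))
```

The query never mentions "certificate" or "syncing", but its vector lands closest to the TLS document -- which is exactly why "reverse proxy SSL" finds the cert doc even when the words don't match.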
Setting It Up
Four commands. I'm not exaggerating.
```shell
# Install
npm install -g @tobilu/qmd

# Point it at your docs
qmd collection add ~/path/to/docs --name my-docs

# Generate embeddings
qmd embed

# Add as MCP server to Claude Code
claude mcp add qmd -- qmd mcp
```

The collection scan found 41 markdown files and indexed them immediately. The embedding step chunked those into 153 pieces and vectorised the lot in about 9 seconds. The model download was the slowest part, and even that was under a minute.
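For a sense of what "chunked into 153 pieces" means in practice: indexers typically split each file at heading boundaries and cap the chunk size so every piece can be embedded and retrieved on its own. This is a minimal sketch of that general idea, not qmd's actual algorithm:

```python
import re

def chunk_markdown(text, max_chars=1200):
    # Split at markdown headings first, then cap each section's length,
    # so every chunk stays small enough to embed and retrieve on its own.
    sections = re.split(r"(?m)^(?=#{1,6} )", text)
    chunks = []
    for section in sections:
        section = section.strip()
        while len(section) > max_chars:
            chunks.append(section[:max_chars])
            section = section[max_chars:]
        if section:
            chunks.append(section)
    return chunks

doc = "# Caddy\nInternal proxy config...\n## TLS\nCertificate syncing notes..."
chunks = chunk_markdown(doc)  # two heading-delimited chunks
```

That's how 41 files become 153 chunks: each heading-delimited section becomes its own searchable unit.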
That's it. No config files. No YAML to wrestle with. No Docker compose that needs three environment variables you'll forget to set.
CLAUDE.md + qmd: The Two-Layer System
Here's what the setup looks like now. CLAUDE.md is still there, still loaded every conversation, still doing what it does best -- quick reference tables, infrastructure topology, conventions. The stuff Claude needs in every session regardless of what we're working on.
qmd sits behind it as the deep knowledge layer. When Claude needs more than what's on the cheat sheet -- the full architecture of a service, the step-by-step setup, the troubleshooting notes from when something broke -- it searches the docs collection and pulls in exactly the chunks it needs.
Before, the workflow looked like this:
"What's the Caddy config for the Overseerr route?"
Claude checks CLAUDE.md. Finds the IP for the internal reverse proxy. Doesn't know the actual Caddyfile config. I say "read the Caddy doc." Claude reads the whole file. Finds the relevant section. Answers.
Now:
"What's the Caddy config for the Overseerr route?"
Claude already knows the IP from CLAUDE.md. Searches qmd for "Caddy Overseerr route config." Gets back the specific chunk with the Caddyfile block. Answers. One step. I didn't have to remember the filename or tell it where to look.
The real payoff is the moments where you're mid-task and need context you didn't know you needed. I was setting up a new Caddy route and Claude pulled in information from three different docs -- the DMZ proxy config, the certificate syncing setup, and the firewall rules -- without being asked. It found them because they were semantically relevant to what we were doing. I didn't have to orchestrate that. The search engine in the background just... handled it.
CLAUDE.md tells Claude what my homelab looks like. qmd tells it how everything works.
The MCP Tools
When qmd runs as an MCP server, it exposes four tools that Claude Code can use natively:
- query -- the main search. Supports keyword, vector, and hybrid modes. This is the one that does the heavy lifting
- get -- fetch a specific document by path or ID. For when you know exactly what you want
- multi_get -- batch retrieval by glob pattern. "Give me everything in the agent skills subdirectory"
- status -- index health and collection info. Mostly useful for checking it's actually running
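Under the hood, Claude invokes these over MCP's JSON-RPC transport. The `tools/call` method is standard MCP; the argument names below (`q`, `mode`) are illustrative guesses, not qmd's documented schema:

```python
import json

# A JSON-RPC "tools/call" request as defined by the MCP spec. The method
# name is standard MCP; the argument names ("q", "mode") are assumptions
# for illustration, not qmd's documented tool schema.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "query",
        "arguments": {"q": "Caddy Overseerr route config", "mode": "hybrid"},
    },
}
payload = json.dumps(request)  # what actually crosses the wire
```

The point is that there's no plugin SDK or custom glue: any MCP client that can send this shape of request gets the same search tools Claude Code does.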
The query tool is where the magic is. It takes typed sub-queries -- you can specify whether you want lexical, vector, or HyDE (Hypothetical Document Embedding) search -- and combines results using Reciprocal Rank Fusion before optionally re-ranking with an LLM. That's a lot of words to say "it finds the right document even when your search terms are vague."
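Reciprocal Rank Fusion itself is pleasingly simple: each document scores the sum of 1/(k + rank) across every ranked list it appears in, so anything that ranks well in multiple lists floats to the top. A minimal sketch of the general technique (the filenames are made up; this isn't qmd's source):

```python
def rrf(rankings, k=60):
    # Reciprocal Rank Fusion: each doc scores sum(1 / (k + rank)) across
    # the ranked lists it appears in. k=60 is the commonly used constant.
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["caddy.md", "firewall.md", "influxdb.md"]
vector_hits  = ["certs.md", "caddy.md", "plex.md"]

fused = rrf([keyword_hits, vector_hits])
# caddy.md appears near the top of both lists, so it wins the fusion.
```

Neither list needs to agree on scores or even scales -- only ranks matter, which is what makes RRF a good glue between keyword and vector results.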
Living Documentation
There's a secondary effect I didn't anticipate. Knowing that my docs are actually being searched and used -- not just by me squinting at a GitHub repo, but by an AI that surfaces them at the right moment -- has made me write better documentation.
When CLAUDE.md was the only bridge, I was tempted to cram everything important into it. Port numbers, config snippets, the lot. The detailed docs were an afterthought -- nice to have, but Claude wouldn't see them unless I specifically pointed it there. So they'd drift. Ports would change, auth sections would go stale, and I wouldn't notice because nobody was reading them.
Now that qmd makes every sentence in those docs searchable and retrievable, I actually care about keeping them accurate. I've started adding more context to my notes. Instead of just a port number in the InfluxDB doc, I now include the full connection URL and any auth details, because I know a search for "how to connect to InfluxDB" will land on that sentence and it needs to be self-contained.
Documentation that gets used gets maintained. Documentation that sits in a repo untouched slowly rots until someone (me, three months later) reads it and discovers half the ports have changed and the entire auth section is from before the migration. Having a search engine pointed at your docs turns them from a write-once archive into something closer to a living system.
The Bottom Line
35 markdown files. 153 indexed chunks. A 330MB embedding model running locally. Four CLI commands to set up.
CLAUDE.md gives Claude the quick-reference card -- infrastructure topology, service locations, conventions. qmd gives it the full library. Between the two, my AI assistant now knows everything I've ever documented about my homelab, and can find the relevant bits faster than I can remember where I put them. Which, given my track record with remembering where I put things, is a low bar. But it clears it comfortably.
qmd is on GitHub. If you've got a pile of markdown files that you know are useful but can never find when you need them, point qmd at the directory and give your AI a memory upgrade. It takes five minutes and it's one of those rare tools where the setup cost is genuinely less than the time it saves you the first time you use it.