DocSage — RAG over PDFs with Citations
A RAG chatbot over any PDF set, answering with inline source citations and confidence scores.
Founders with large document sets need AI that cites its sources rather than hallucinating them. Legal, policy, manuals, and API docs all require verifiable answers.
Built a RAG pipeline with semantic chunking, hybrid retrieval (vector + keyword), and strict citation enforcement via structured output. Chose Postgres + pgvector over Pinecone to keep infra near zero for small-to-medium corpora. GPT-4o-mini keeps per-query cost under $0.002.
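The write-up does not say how the vector and keyword result lists are merged. One common option is reciprocal rank fusion (RRF), sketched below under that assumption. The function name, the chunk IDs, and the constant `k=60` are illustrative and not taken from DocSage.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk IDs into one ranked list.

    Each chunk scores 1 / (k + rank) for every list it appears in, so
    chunks ranked well by both retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk ID for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical results from the two retrievers (best match first).
vector_hits = ["c3", "c1", "c7"]   # e.g. pgvector nearest neighbours
keyword_hits = ["c1", "c9", "c3"]  # e.g. Postgres full-text search

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)
```

RRF needs only ranks, not raw scores, so it avoids having to normalize cosine distances against keyword relevance scores.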
Under $0.002 per query
3-5s end-to-end latency
citation coverage
Zero infra lock-in
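The per-query cost figure can be sanity-checked with simple arithmetic. The sketch below assumes GPT-4o-mini list prices of $0.15 per million input tokens and $0.60 per million output tokens (check current pricing) and a hypothetical query of 8,000 prompt tokens and 500 completion tokens. None of these numbers come from the project itself, and only the generation call is counted.

```python
# Assumed list prices in USD per million tokens; verify before relying on them.
INPUT_USD_PER_MTOK = 0.15
OUTPUT_USD_PER_MTOK = 0.60


def query_cost_usd(input_tokens, output_tokens):
    """Estimate the cost of one generation call from its token counts."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


# Hypothetical token counts: retrieved chunks plus question in, short answer out.
cost = query_cost_usd(input_tokens=8_000, output_tokens=500)
print(f"${cost:.4f} per query")
```

Under these assumptions the estimate comes to roughly $0.0015, which is consistent with the stated ceiling of $0.002.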
Live demo preloaded with Next.js, Stripe, and Kubernetes documentation. Visitors can ask technical questions and see inline citations to the exact PDF and page.
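A minimal sketch of what strict citation enforcement could look like once the model returns structured output: every citation must point to a PDF and page that was actually retrieved, and an answer with no citations is rejected. The JSON shape, the `Citation` type, the function name, and the file name are assumptions for illustration, not DocSage's actual schema.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    file: str
    page: int


def parse_cited_answer(raw_json, retrieved):
    """Parse the model's JSON and reject missing or unknown citations."""
    data = json.loads(raw_json)
    citations = [Citation(c["file"], int(c["page"])) for c in data.get("citations", [])]
    if not citations:
        raise ValueError("answer has no citations")
    unknown = [c for c in citations if c not in retrieved]
    if unknown:
        raise ValueError(f"citations not in retrieved context: {unknown}")
    return data["answer"], citations


# Hypothetical retrieved context: the (file, page) pairs passed to the model.
retrieved = {Citation("k8s-concepts.pdf", 12), Citation("k8s-concepts.pdf", 13)}

raw = json.dumps({
    "answer": "A Pod is the smallest deployable unit.",
    "citations": [{"file": "k8s-concepts.pdf", "page": 12}],
})
answer, cites = parse_cited_answer(raw, retrieved)
print(answer, cites)
```

A rejected answer can be retried or replaced with an explicit "not found in the documents" response, so an uncited claim never reaches the user.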
