I built Clonar, a Node.js RAG pipeline inspired by Perplexity that does explicit multihop reasoning over query intent, retrieval planning, and answer synthesis.
Why multihop matters: Most RAG systems do "retrieve → synthesize" in one shot. Clonar chains 8 reasoning stages where each step conditions on prior outputs: query rewrite → clarification gate → filter extraction → grounding decision → retrieval planning → execution → quality-aware synthesis → optional deep-mode critique (which triggers a second full retrieval pass if the answer is insufficient).
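To make the chaining concrete, here's a simplified sketch of the orchestration shape. Every name below is illustrative only, not the actual types in orchestrator.ts; it just shows the idea of each stage reading the context prior stages wrote, plus deep mode re-running the pipeline when the first answer scores poorly:

```ts
// Illustrative sketch only: the real orchestrator.ts uses different names/types.
interface PipelineContext {
  originalQuery: string;
  rewrittenQuery?: string;
  needsClarification?: boolean;
  filters?: Record<string, string>;
  retrievalPlan?: { source: string; query: string }[];
  retrievedDocs?: { url?: string; snippet: string }[];
  answer?: string;
  quality?: number; // 0..1 confidence derived during synthesis
}

type Stage = (ctx: PipelineContext) => Promise<PipelineContext>;

async function runPipeline(stages: Stage[], query: string): Promise<PipelineContext> {
  let ctx: PipelineContext = { originalQuery: query };
  for (const stage of stages) {
    ctx = await stage(ctx);            // each stage conditions on everything produced so far
    if (ctx.needsClarification) break; // clarification gate short-circuits retrieval
  }
  return ctx;
}

async function runWithDeepMode(firstSevenStages: Stage[], critique: Stage, query: string) {
  let ctx = await runPipeline(firstSevenStages, query);
  ctx = await critique(ctx);                 // step 8: judge the first answer
  if ((ctx.quality ?? 1) < 0.5) {
    // weak answer: expand the query and re-run the first seven stages
    ctx = await runPipeline(firstSevenStages, ctx.rewrittenQuery ?? query);
  }
  return ctx;
}
```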
Technical approach:
TypeScript/Node backend with REST and SSE streaming endpoints
Orchestrator pattern: orchestrator.ts sequences LLM calls for planning vs synthesis
Pluggable retrievers via pipeline-deps.ts (currently web via the Perplexity API, extensible to vector DBs / SQL; see the retriever sketch after this list)
Session + user memory for cross-turn reasoning
Quality-driven formatting: derives retrieval confidence and adjusts UI hints
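For a feel of the retriever seam, here's a simplified sketch of the contract. The real shape in pipeline-deps.ts differs; names here are assumptions, and the pgvector retriever is just a placeholder to show where a custom backend would plug in:

```ts
// Illustrative sketch of a pluggable retriever contract (not the actual pipeline-deps.ts API).
interface RetrievedDoc {
  content: string;
  url?: string;
  score?: number;
}

interface Retriever {
  id: string;
  retrieve(query: string, filters?: Record<string, string>): Promise<RetrievedDoc[]>;
}

interface PipelineDeps {
  retrievers: Retriever[];                    // the planner picks from these
  llm: (prompt: string) => Promise<string>;   // planning/synthesis calls
}

// A custom vector-store retriever would just be another entry in deps.retrievers:
const pgvectorRetriever: Retriever = {
  id: "pgvector",
  async retrieve(query, filters) {
    // embed the query, run a similarity search, map rows to RetrievedDoc
    // (embedding + SQL details omitted in this sketch)
    return [];
  },
};
```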
What's interesting:
The model reasons about whether to clarify ambiguous queries before retrieval (step 2)
Retrieval plan is generated dynamically based on extracted filters and vertical (step 5), not hardcoded
Deep mode (step 8): critique → expand prompt → second 7-stage run when first answer is weak
No frontend needed: call /api/query or /api/query/stream from curl/Postman (minimal client sketch below)
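Minimal client sketch in Node 18+ (assumes the server is on localhost:3000 and the endpoints take a JSON body like { query: string }; the payload shape and port are my defaults, adjust to your setup):

```ts
// Assumes Node 18+ (global fetch) and a { query } JSON body; adjust host/shape as needed.
async function main() {
  // One-shot answer
  const res = await fetch("http://localhost:3000/api/query", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ query: "best gravel bikes under $2000" }),
  });
  console.log(await res.json());

  // Streaming variant: read the SSE body as it arrives
  const stream = await fetch("http://localhost:3000/api/query/stream", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ query: "best gravel bikes under $2000" }),
  });
  const reader = stream.body!.getReader();
  const decoder = new TextDecoder();
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    process.stdout.write(decoder.decode(value)); // raw "data: ..." SSE lines
  }
}

main().catch(console.error);
```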
Feedback I'd love:
Is the 8-stage flow overkill, or are there steps I'm missing?
API design for plugging in custom retrievers (vector, SQL, etc.)
Production observability—I have basic tracing/metrics; what else would you prioritize?
Repo: https://github.com/clonar714-jpg/clonar
Stack: Node 18+, TypeScript, OpenAI + Perplexity APIs, optional Redis/Postgres
Happy to answer questions about the architecture or design tradeoffs.