Tree-RAG freed from provider lock-in. Offline execution across Ollama, llama.cpp, and vLLM: finish-reason normalization makes traversal correctness independent of which runtime answers.
PageIndex's tree-RAG was hardwired to OpenAI's API contract through inline prompt strings and completion handling that was never normalized. Any provider switch silently corrupted recursive traversal, with errors surfacing only at collapse. All inference required a live API key, which blocked offline and air-gapped execution entirely. Token encoding differences across providers produced inconsistent chunk boundaries with no visible signal.
The failure mode was particularly dangerous: traversal would proceed, produce output, and only reveal the corruption at the final collapse stage. There was no early signal that the provider difference had invalidated the intermediate steps.
Forked and refactored with a provider-routing abstraction resolved via environment variables. A finish-reason normalization layer stabilizes recursive traversal across model outputs, so traversal depends on stable internal contracts rather than on whichever runtime answered. Prompts are externalized into a registry loader, removing the coupling between prompt content and provider expectations.
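A minimal sketch of how environment-based routing and finish-reason normalization might fit together. This is not the fork's actual code. The variable names TREE_RAG_PROVIDER and TREE_RAG_BASE_URL, the FinishReason enum, and both function names are hypothetical. The base URLs assume each runtime's usual local default port, and the raw finish-reason strings in the table are illustrative and should be checked against each runtime's documentation.

```python
import os
from enum import Enum


class FinishReason(Enum):
    """Internal contract that traversal logic is allowed to depend on."""
    STOP = "stop"        # the model ended its answer on its own
    LENGTH = "length"    # the output was cut off by a token limit
    UNKNOWN = "unknown"  # anything the table below does not recognize


# Illustrative raw values only (assumption); extend per runtime as needed.
_RAW_TO_INTERNAL = {
    "stop": FinishReason.STOP,
    "eos": FinishReason.STOP,
    "length": FinishReason.LENGTH,
    "limit": FinishReason.LENGTH,
    "max_tokens": FinishReason.LENGTH,
}

# Assumed local defaults for each runtime's OpenAI-compatible endpoint.
_DEFAULT_BASE_URLS = {
    "ollama": "http://localhost:11434/v1",
    "llamacpp": "http://localhost:8080/v1",
    "vllm": "http://localhost:8000/v1",
}


def resolve_provider(env=None):
    """Pick a provider and base URL from environment variables."""
    env = os.environ if env is None else env
    name = env.get("TREE_RAG_PROVIDER", "ollama").strip().lower()
    if name not in _DEFAULT_BASE_URLS:
        raise ValueError(f"unknown provider: {name!r}")
    base_url = env.get("TREE_RAG_BASE_URL", _DEFAULT_BASE_URLS[name])
    return name, base_url


def normalize_finish_reason(raw):
    """Map a runtime-specific finish reason onto the internal enum."""
    if raw is None:
        return FinishReason.UNKNOWN
    return _RAW_TO_INTERNAL.get(str(raw).strip().lower(), FinishReason.UNKNOWN)
```

With this shape, traversal code would branch only on FinishReason values, for example retrying or continuing a node when the result is LENGTH, and would never inspect a provider-specific string directly.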
The key insight: tree-RAG correctness depends on finish-reason semantics, not provider identity. Normalizing finish reasons before they enter traversal logic makes the rest of the pipeline provider-agnostic. Bounded async concurrency across TOC generation and summarization prevents resource saturation on constrained hardware. Hierarchical fallback handles cases where the primary chunking strategy exceeds available VRAM.
Fully offline tree-RAG with Ollama, with no API keys required. Seamless provider switching across Ollama, llama.cpp, and vLLM via stable internal contracts, with no traversal corruption on change. Vendor lock-in eliminated. Fallback chunk policies handle constrained VRAM and RAM without requiring manual tuning.
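One way a fallback chunk policy could work is a ladder of chunk sizes tried from largest to smallest. The sketch below is an assumption about the general shape, not the fork's actual policy: chunk_with_fallback, split_by_words, the size ladder, and the caller-supplied fits predicate are all hypothetical, and word counts stand in for real token counts and memory estimates.

```python
def split_by_words(text, max_words):
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


def chunk_with_fallback(text, fits, sizes=(2048, 1024, 512, 256)):
    """Return (size, chunks) for the first size whose chunks all fit.

    `fits` is a caller-supplied predicate (hypothetical) that decides
    whether one chunk is within the available memory budget.
    """
    for size in sizes:
        chunks = split_by_words(text, size)
        if all(fits(chunk) for chunk in chunks):
            return size, chunks
    raise MemoryError("no chunk size in the fallback ladder fits the budget")
```

Because the ladder is walked automatically, the caller only supplies the budget check, which is what removes the need for manual tuning on constrained hardware.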