ADR-014: Knowledge Mound Architecture
Status
Accepted
Context
Enterprise deployments require a unified knowledge management system that:
- Supports multiple document formats
- Enables semantic search across documents
- Integrates with vertical-specific knowledge bases
- Provides fact extraction and verification
- Maintains provenance and audit trails
Decision
We implemented the Knowledge Mound system with the following architecture:
Core Components
Knowledge Pipeline (aragora/knowledge/pipeline.py):
Document Ingestion -> Chunking -> Embedding -> Vector Storage -> Retrieval
Pipeline stages:
- Ingestion: Multiple format support (PDF, DOCX, MD, etc.)
- Chunking: Semantic-aware text splitting
- Embedding: OpenAI/local embedding models
- Storage: Weaviate/Pinecone/local vector store
- Retrieval: Hybrid semantic + keyword search
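The stages above can be sketched end to end in a few dozen lines. This is a minimal illustration, not the code in aragora/knowledge/pipeline.py. Every name here (`chunk_document`, `embed`, `InMemoryStore`, the `alpha` weight) is hypothetical, and the bag-of-words "embedding" is a stand-in for a real OpenAI or local embedding model.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_id: str
    text: str
    vector: Counter  # toy bag-of-words vector (assumption, not a real embedding)


def tokenize(text: str) -> list:
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk_document(text: str, max_words: int = 40) -> list:
    """Split on blank lines, then merge paragraphs until max_words is reached.

    A single paragraph longer than max_words is kept whole in this sketch.
    """
    chunks, current = [], []
    for para in re.split(r"\n\s*\n", text.strip()):
        words = para.split()
        if current and len(current) + len(words) > max_words:
            chunks.append(" ".join(current))
            current = []
        current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return chunks


def embed(text: str) -> Counter:
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def keyword_score(query: str, text: str) -> float:
    """Fraction of distinct query terms that appear in the chunk."""
    terms = set(tokenize(query))
    return len(terms & set(tokenize(text))) / len(terms) if terms else 0.0


class InMemoryStore:
    """Hypothetical stand-in for the vector storage stage."""

    def __init__(self):
        self.chunks = []

    def ingest(self, doc_id: str, text: str) -> None:
        for piece in chunk_document(text):
            self.chunks.append(Chunk(doc_id, piece, embed(piece)))

    def retrieve(self, query: str, limit: int = 10, alpha: float = 0.5) -> list:
        """Hybrid ranking: alpha * semantic score + (1 - alpha) * keyword score."""
        qv = embed(query)
        scored = [
            (alpha * cosine(qv, c.vector)
             + (1 - alpha) * keyword_score(query, c.text), c)
            for c in self.chunks
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:limit]


store = InMemoryStore()
store.ingest("policy", "Data retention policy.\n\nRecords are kept for seven years.")
store.ingest("faq", "Office hours are nine to five.")
top_score, top_chunk = store.retrieve("how long are records kept", limit=1)[0]
print(top_chunk.doc_id, round(top_score, 3))
```

The weighted sum is only one way to combine the two signals. A production backend such as Weaviate exposes its own hybrid search, so the `alpha` blend here is illustrative rather than a description of the deployed ranking.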
Vector Store Integration
Located in aragora/documents/indexing/:
Weaviate Store (weaviate_store.py):
- Primary production backend
- Supports hybrid search
- Multi-tenant isolation
Local Store:
- Development/testing
- SQLite + numpy for embeddings
Fact Registry
Located in aragora/knowledge/:
FactRegistry:
- Extracted facts from documents
- Confidence scoring
- Source attribution
VerticalKnowledge:
- Industry-specific knowledge bases
- Pre-loaded compliance frameworks
- Domain terminology
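The three FactRegistry responsibilities listed above (extracted facts, confidence scoring, source attribution) map naturally onto a small record type plus a queryable collection. The sketch below is one possible shape, assuming a 0.0 to 1.0 confidence scale and character-offset spans. `Fact` and `SketchFactRegistry` are hypothetical names, not the classes in aragora/knowledge/.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Fact:
    statement: str
    confidence: float            # assumed scale: 0.0 (unverified) to 1.0 (certain)
    source_doc: str              # source attribution: originating document
    source_span: Tuple[int, int]  # assumed: character offsets within the document

    def __post_init__(self):
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0.0 and 1.0")


class SketchFactRegistry:
    """Hypothetical in-memory registry, for illustration only."""

    def __init__(self):
        self._facts = []

    def register(self, fact: Fact) -> None:
        self._facts.append(fact)

    def query(self, min_confidence: float = 0.0,
              source_doc: Optional[str] = None) -> list:
        """Return matching facts, highest confidence first."""
        matches = [
            f for f in self._facts
            if f.confidence >= min_confidence
            and (source_doc is None or f.source_doc == source_doc)
        ]
        return sorted(matches, key=lambda f: f.confidence, reverse=True)


registry = SketchFactRegistry()
registry.register(Fact("Retention period is seven years", 0.9, "policy.pdf", (10, 40)))
registry.register(Fact("Office opens at nine", 0.4, "faq.md", (0, 20)))
for fact in registry.query(min_confidence=0.5):
    print(fact.statement, fact.source_doc)
```

Making `Fact` immutable keeps the link between a statement and its source stable once recorded, which is what lets provenance be audited later.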
Integration Points
Debate Integration:
# In debate context
knowledge = await knowledge_mound.retrieve(
    query=debate_topic,
    filters={"vertical": "legal"},
    limit=10,
)
Workflow Integration:
- knowledge_pipeline node type in workflows
- Automatic document processing in workflows
Security & Compliance
- Document-level access control
- Encryption at rest
- Audit logging for all retrievals
- GDPR/CCPA compliant deletion
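To make the interaction between these controls concrete, the sketch below filters results by document-level roles, writes an audit record for every retrieval, and logs deletions. It is a simplified illustration under stated assumptions: the role-based model, the `GuardedRetriever` class, and the JSON audit format are all hypothetical. Substring matching stands in for real search, and encryption at rest is out of scope here.

```python
import json
import time
from dataclasses import dataclass


@dataclass
class Document:
    doc_id: str
    text: str
    allowed_roles: frozenset  # assumed access model: roles permitted to read


class GuardedRetriever:
    """Hypothetical wrapper that enforces access control and audits retrievals."""

    def __init__(self, documents):
        self._documents = list(documents)
        self.audit_log = []  # JSON lines; a real system would use durable storage

    def _audit(self, event: str, **details) -> None:
        record = {"ts": time.time(), "event": event}
        record.update(details)
        self.audit_log.append(json.dumps(record))

    def retrieve(self, user: str, roles, query: str) -> list:
        roles = set(roles)
        visible = [
            d for d in self._documents
            if d.allowed_roles & roles and query.lower() in d.text.lower()
        ]
        # Log every retrieval, including those that return nothing.
        self._audit("retrieve", user=user, query=query,
                    returned=[d.doc_id for d in visible])
        return visible

    def delete_document(self, user: str, doc_id: str) -> bool:
        """Remove a document and record the deletion; returns True if found."""
        before = len(self._documents)
        self._documents = [d for d in self._documents if d.doc_id != doc_id]
        deleted = len(self._documents) < before
        self._audit("delete", user=user, doc_id=doc_id, deleted=deleted)
        return deleted


docs = [
    Document("d1", "Merger contract terms", frozenset({"legal"})),
    Document("d2", "Contract template", frozenset({"legal", "sales"})),
]
retriever = GuardedRetriever(docs)
print([d.doc_id for d in retriever.retrieve("alice", ["sales"], "contract")])
```

Filtering before results leave the retriever means a caller never sees documents outside its roles, and the audit entry records exactly what was returned.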
Consequences
Positive:
- Unified knowledge access across verticals
- Semantic search improves relevance
- Provenance enables audit trails
- Scalable vector storage
Negative:
- Vector store dependency (Weaviate/Pinecone)
- Embedding costs for large document sets
- Chunk size tuning required per use case
- Complex multi-tenant isolation
References
- aragora/knowledge/pipeline.py - Main pipeline
- aragora/documents/indexing/weaviate_store.py - Vector store
- aragora/knowledge/ - Knowledge module (40K lines)
- docs/EVIDENCE.md - Evidence/knowledge documentation