Document Ingestion & Processing

Aragora ingests documents as evidence for debates, audits, and workflows. The DocumentConnector uses the unified DocumentParser to extract text, tables, and metadata into structured evidence objects.

Supported Formats

The parser supports common business and technical formats, including:

Office: PDF, DOC/DOCX, XLS/XLSX, PPT/PPTX
Text: TXT, Markdown, RST, HTML, RTF
Structured data: JSON, YAML, XML, CSV
Notebooks: IPYNB
E-books: EPUB, MOBI
Archives: ZIP, TAR, GZIP (expanded before parsing)
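To make the grouping above concrete, here is a minimal sketch that maps a filename to one of the listed categories. It is not part of Aragora: the `FORMAT_CATEGORIES` table and the `categorize` helper are hypothetical, and the extensions chosen for each format (for example `.md` for Markdown, `.yml` for YAML, `.gz` for GZIP) are assumptions about how those formats usually appear on disk.

```python
from pathlib import Path
from typing import Optional

# Hypothetical lookup table mirroring the categories listed above.
# The extensions are assumptions, not taken from parser.py.
FORMAT_CATEGORIES = {
    "office": {".pdf", ".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx"},
    "text": {".txt", ".md", ".rst", ".html", ".rtf"},
    "structured": {".json", ".yaml", ".yml", ".xml", ".csv"},
    "notebook": {".ipynb"},
    "ebook": {".epub", ".mobi"},
    "archive": {".zip", ".tar", ".gz"},
}


def categorize(filename: str) -> Optional[str]:
    """Return the category for a filename, or None if the extension is unknown."""
    # Only the final suffix is inspected, so "bundle.tar.gz" resolves via ".gz".
    suffix = Path(filename).suffix.lower()
    for category, extensions in FORMAT_CATEGORIES.items():
        if suffix in extensions:
            return category
    return None
```

A check like this can be useful as a cheap pre-filter before handing files to the connector, for example to skip unsupported uploads early.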

Source: aragora/connectors/documents/parser.py

Connector Usage

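import asyncio

from aragora.connectors.documents import DocumentConnector


async def main() -> None:
    # Cap parsing at 100 pages per document and extract tables
    connector = DocumentConnector(max_pages=100, extract_tables=True)

    # Parse a file from disk
    file_results = await connector.search_file("/path/to/report.pdf")

    # Parse raw bytes; the filename tells the parser which format to expect
    with open("policy.docx", "rb") as f:
        bytes_results = await connector.search_bytes(f.read(), filename="policy.docx")

    # Search within parsed content
    matches = await connector.search("access control", limit=5)


# The connector methods are coroutines, so run them inside an event loop
asyncio.run(main())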

Source: aragora/connectors/documents/connector.py

Where Documents Flow

Parsed documents feed three downstream systems:

Evidence system for grounded debate (see EVIDENCE.md)
Gauntlet for adversarial audits (see GAUNTLET.md)
Knowledge Mound for long-term storage (see KNOWLEDGE_MOUND.md)

Configuration Notes

Parsing limits and table extraction are set in the DocumentConnector constructor, as with max_pages and extract_tables in the usage example. For production workloads, tune the page limit and the maximum content size to match your typical document sizes and compute budget.
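One way to keep these limits in a single place is a small budget object that is checked before any bytes reach the connector. The sketch below is standalone and does not call Aragora. `IngestBudget`, `fits_budget`, and `connector_kwargs` are hypothetical names, and `max_content_bytes` is an assumed field: this page does not name the connector's content-size parameter, so only `max_pages` and `extract_tables` (both shown in the usage example) are passed through.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IngestBudget:
    """Hypothetical container for per-deployment ingestion limits."""
    max_pages: int = 100
    max_content_bytes: int = 10 * 1024 * 1024  # assumed 10 MiB default
    extract_tables: bool = True


def fits_budget(payload: bytes, budget: IngestBudget) -> bool:
    """Return True if the raw payload is within the configured size limit."""
    return len(payload) <= budget.max_content_bytes


def connector_kwargs(budget: IngestBudget) -> dict:
    """Build keyword arguments for the two constructor options shown on this page."""
    return {
        "max_pages": budget.max_pages,
        "extract_tables": budget.extract_tables,
    }
```

With this in place, a caller could reject oversized uploads with `fits_budget(...)` and construct the connector as `DocumentConnector(**connector_kwargs(budget))`, assuming the constructor accepts those two keywords as in the usage example.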
