Testing Guide

Comprehensive guide for testing the Aragora codebase.

Quick Start

# Run all tests
pytest tests/

# Run with verbose output
pytest tests/ -v

# Run specific test file
pytest tests/test_debate_convergence_comprehensive.py

# Run specific test class
pytest tests/test_debate_convergence_comprehensive.py::TestJaccardBackend

# Run specific test
pytest tests/test_debate_convergence_comprehensive.py::TestJaccardBackend::test_identical_texts

Test Tiers

Use scripts/test_tiers.sh for common tiers:

Tier       Command                          Notes
fast       scripts/test_tiers.sh fast       Skip slow/load/e2e for rapid feedback
ci         scripts/test_tiers.sh ci         Mirrors the main CI test run
lint       scripts/test_tiers.sh lint       Black + Ruff checks
typecheck  scripts/test_tiers.sh typecheck  Mypy checks
frontend   scripts/test_tiers.sh frontend   Jest/RTL in aragora/live
e2e        scripts/test_tiers.sh e2e        Playwright E2E in aragora/live

Integration Baseline Runner

Use scripts/run_integration_baseline.py for tiered integration baselines that match CI expectations and provide quick local feedback:

# Smoke tests (fastest)
python scripts/run_integration_baseline.py smoke

# PR-safe integration baseline (mocked dependencies)
python scripts/run_integration_baseline.py offline

# Knowledge/CDC integration focus
python scripts/run_integration_baseline.py knowledge

# Full integration suite
python scripts/run_integration_baseline.py full

Environment controls:

  • ARAGORA_BASELINE_PARALLEL - worker count (default: auto)
  • ARAGORA_BASELINE_TIMEOUT - per-test timeout seconds (default: 60)

CI Mapping

CI workflows and what they cover:

Workflow                            Purpose
.github/workflows/test.yml          Pytest matrix + smoke tests + frontend build
.github/workflows/lint.yml          Black, Ruff, mypy, ESLint, Bandit
.github/workflows/e2e.yml           Full Playwright E2E + Python E2E harness tests
.github/workflows/integration.yml   E2E harness, integration, and control plane tests
.github/workflows/load-tests.yml    Scheduled load tests and memory checks

TestFixer Automation

Automated fix loop for the first failing test:

  • Workflow: .github/workflows/testfixer-auto.yml
  • Trigger: on failed Tests workflow or manual dispatch
  • Requires: OPENAI_API_KEY, ANTHROPIC_API_KEY secrets
  • Output: PR with proposed fix, plus .testfixer/attempts.jsonl artifact
  • Artifacts: .testfixer/runs/<timestamp>_<runid>_<sha>/ (stdout/stderr, exit code, env/resources, kernel logs when available)

You can also run it locally:

aragora testfixer . --test-command "pytest tests/ -q --maxfail=1"

# Custom artifact location or disable diagnostics
aragora testfixer . --artifacts-dir /tmp/testfixer-runs
aragora testfixer . --no-diagnostics

Test Organization

tests/
├── conftest.py                      # Shared fixtures and setup
├── test_*.py                        # Main test files (400+)
├── benchmarks/                      # Performance benchmarks
├── e2e/                             # End-to-end tests with test harness
│   ├── conftest.py                  # E2E-specific fixtures
│   ├── harness.py                   # E2E test harness implementation
│   └── test_full_flow.py            # Full system flow tests
├── integration/                     # Integration tests
│   ├── conftest.py                  # Integration-specific fixtures
│   ├── test_api_workflow.py         # API workflow tests
│   ├── test_debate_lifecycle.py     # Full debate lifecycle
│   └── test_websocket_events.py     # WebSocket event tests
├── skills/                          # Skills system tests
│   ├── builtin/                     # Built-in skill coverage
│   └── test_*.py                    # Registry/loader/base tests
├── security/                        # Security tests
│   ├── test_auth_boundaries.py      # Auth boundary tests
│   ├── test_cors.py                 # CORS handling
│   ├── test_csrf_protection.py      # CSRF protection
│   ├── test_input_validation.py     # Input validation
│   ├── test_rate_limit_enforcement.py
│   └── test_sql_injection.py        # SQL injection prevention
└── storage/                         # Storage layer tests
    └── test_*.py                    # Database and persistence tests

Running Tests

Basic Commands

# Run all tests with coverage
pytest tests/ --cov=aragora --cov-report=html

# Run tests excluding slow tests
pytest tests/ -m "not slow"

# Run only integration tests
pytest tests/integration/ -m integration

# Run with specific timeout
pytest tests/ --timeout=30

# Run in parallel (requires pytest-xdist)
pytest tests/ -n auto

# Run with output capture disabled (see prints)
pytest tests/ -s

Auth/RBAC Checks

By default, handler permission decorators are bypassed during tests. Set ARAGORA_TEST_REAL_AUTH=1 to exercise real RBAC checks in a test run.

If RBAC decorators block integration endpoints, mock an AuthorizationContext in the integration tests (see tests/integration/test_*_api.py).

OpenAPI Route Validation

Validate handler routes against the OpenAPI spec:

python scripts/validate_openapi_routes.py

Test Markers

Tests use markers to categorize them:

Marker               Description                                          Usage
slow                 Tests that take >5 seconds                           -m "not slow" to skip
load                 Load/stress tests                                    -m load to run only
integration          Integration tests                                    -m integration
integration_minimal  Minimal integration baseline (no external services)  -m integration_minimal
knowledge            Knowledge Mound tests                                -m knowledge
e2e                  End-to-end tests                                     -m e2e
no_auto_auth         Opt out of autouse auth mocking in handler tests     -m no_auto_auth

# Run only fast tests
pytest tests/ -m "not slow and not load"

# Run integration tests only
pytest tests/ -m integration

# Run minimal integration baseline
pytest tests/integration/ -m integration_minimal

# Run everything except e2e
pytest tests/ -m "not e2e"
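Custom markers like these must be registered so pytest does not warn about unknown marks (and so --strict-markers passes). One way is a pytest_configure hook in conftest.py — a sketch; the project may declare them in pytest.ini or pyproject.toml instead:

```python
# conftest.py (sketch): register the custom markers used in this guide.
MARKERS = {
    "slow": "tests that take more than 5 seconds",
    "load": "load/stress tests",
    "integration": "integration tests",
    "integration_minimal": "minimal integration baseline (no external services)",
    "knowledge": "Knowledge Mound tests",
    "e2e": "end-to-end tests",
    "no_auto_auth": "opt out of autouse auth mocking in handler tests",
}

def pytest_configure(config):
    # addinivalue_line appends to the [markers] ini section at runtime.
    for name, description in MARKERS.items():
        config.addinivalue_line("markers", f"{name}: {description}")
```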

Environment Variables

Some tests require specific environment configuration:

# Force specific similarity backend (avoid slow model loading)
ARAGORA_CONVERGENCE_BACKEND=jaccard pytest tests/test_debate_convergence_comprehensive.py

# Set test API keys (use mock values for unit tests)
ANTHROPIC_API_KEY=test-key pytest tests/

# Enable debug logging
ARAGORA_DEBUG=1 pytest tests/ -v

Writing Tests

Test File Naming

  • Test files: test_<module_name>.py
  • Test classes: Test<ClassName>
  • Test methods: test_<behavior_description>
# tests/test_debate_convergence.py
class TestJaccardBackend:
    def test_identical_texts_return_similarity_of_one(self):
        ...

Using Fixtures

Common fixtures from conftest.py:

import pytest

def test_with_temp_database(temp_db):
    """Uses a temporary SQLite database that's cleaned up after the test."""
    from aragora.ranking.elo import EloSystem
    elo = EloSystem(db_path=temp_db)
    # ... test code

def test_with_mock_storage(mock_storage):
    """Uses a pre-configured mock DebateStorage."""
    debates = mock_storage.list_debates()
    assert len(debates) == 2

def test_with_mock_agents(mock_agents):
    """Uses a list of 3 mock agents: claude, gemini, gpt4."""
    assert len(mock_agents) == 3

def test_requiring_api_keys(mock_api_keys):
    """Sets mock ANTHROPIC_API_KEY, OPENAI_API_KEY, GOOGLE_API_KEY."""
    import os
    assert os.getenv("ANTHROPIC_API_KEY") == "test-anthropic-key"

def test_with_clean_environment(clean_env):
    """All API keys removed for clean-slate testing."""
    import os
    assert os.getenv("ANTHROPIC_API_KEY") is None

Async Tests

Use pytest-asyncio for async tests:

import pytest

@pytest.mark.asyncio
async def test_async_debate():
    """Async tests are automatically detected."""
    result = await some_async_function()
    assert result is not None

class TestAsyncClass:
    @pytest.mark.asyncio
    async def test_async_method(self):
        """Async methods in classes work too."""
        pass

Mocking External Services

from unittest.mock import Mock, patch, AsyncMock

def test_api_call_mocked():
"""Mock external API calls."""
with patch("aragora.agents.anthropic.AnthropicAgent._call_api") as mock:
mock.return_value = {"content": "mocked response"}
# ... test code

@pytest.mark.asyncio
async def test_async_api_mocked():
"""Mock async API calls."""
with patch("module.async_function", new=AsyncMock(return_value="result")):
result = await some_function()
assert result == "result"

Testing Error Handling

import pytest

def test_raises_on_invalid_input():
    """Test that exceptions are raised correctly."""
    with pytest.raises(ValueError, match="Invalid input"):
        function_that_should_raise("bad input")

def test_custom_exception():
    """Test custom exception types."""
    from aragora.exceptions import DebateError
    with pytest.raises(DebateError):
        start_invalid_debate()

Fixtures Reference

Auto-Used Fixtures

These run automatically for every test:

Fixture                 Purpose
reset_circuit_breakers  Resets CircuitBreaker state
clear_handler_cache     Clears handler TTL cache
reset_supabase_env      Clears Supabase env vars
reset_lazy_globals      Resets module-level lazy globals

Available Fixtures

Fixture                 Description
temp_db                 Temporary SQLite database path
temp_dir                Temporary directory (Path)
temp_nomic_dir          Temporary nomic state directory
mock_storage            Mock DebateStorage with sample data
mock_elo_system         Mock EloSystem with sample rankings
mock_agent              Single mock agent
mock_agents             List of 3 mock agents
mock_environment        Mock Environment for arena testing
mock_emitter            Mock event emitter
mock_auth_config        Mock AuthConfig
handler_context         Complete handler context dict
elo_system              Real EloSystem with temp database
continuum_memory        Real ContinuumMemory with temp database
clean_env               Clears all API key env vars
mock_api_keys           Sets mock API keys
sample_debate_messages  Sample debate message list
sample_critique         Sample critique dict

Coverage

Running Coverage

# Basic coverage
pytest tests/ --cov=aragora

# HTML report
pytest tests/ --cov=aragora --cov-report=html
open htmlcov/index.html

# Terminal report
pytest tests/ --cov=aragora --cov-report=term-missing

# Fail if below threshold
pytest tests/ --cov=aragora --cov-fail-under=70

# Coverage for specific modules
pytest tests/ --cov=aragora/debate --cov=aragora/server

Current Coverage Targets

Module                    Target  Priority
aragora/debate/           80%     Critical
aragora/server/handlers/  70%     High
aragora/agents/           60%     Medium
aragora/memory/           70%     High
aragora/billing/          80%     Critical

E2E Test Harness

The E2E test harness provides a complete integration testing environment for testing full system workflows including the control plane, task scheduling, and debate orchestration.

Harness Overview

The harness (tests/e2e/harness.py) spins up:

  • ControlPlaneCoordinator
  • TaskScheduler
  • Mock agents with configurable behaviors
  • Optional Redis/PostgreSQL connections
  • Metrics and tracing support

Running E2E Tests Locally

# Run all E2E tests
pytest tests/e2e/ -v

# Run with Redis (requires Redis running locally)
REDIS_URL=redis://localhost:6379 pytest tests/e2e/ -v

# Run with verbose CI-style logging
ARAGORA_CI=true pytest tests/e2e/ -v

# Run specific test class
pytest tests/e2e/test_full_flow.py::TestDebateIntegration -v

CI-safe E2E subset

Some E2E suites exercise external services (Redis, third-party connectors) and may require local dependencies. For a stable CI-friendly subset that stays in-memory, run:

pytest \
  tests/e2e/test_full_flow.py \
  tests/e2e/test_control_plane_workflows.py \
  tests/e2e/test_debate_crash_recovery.py \
  tests/e2e/test_chat_result_routing.py \
  tests/e2e/test_api_rate_limiting.py \
  tests/e2e/test_complete_user_signup_flow.py -v

Using the Harness in Tests

Basic usage with context manager:

import pytest
from tests.e2e.harness import e2e_environment, E2ETestConfig

@pytest.mark.asyncio
async def test_task_workflow():
    async with e2e_environment() as harness:
        # Submit a task
        task_id = await harness.submit_task(
            task_type="analysis",
            payload={"input": "test data"},
            required_capabilities=["analysis"],
        )

        # Wait for completion (harness auto-processes with mock agents)
        result = await harness.wait_for_task(task_id)

        assert result is not None
        assert result.status.value == "completed"

Using fixtures from conftest.py:

@pytest.mark.asyncio
async def test_with_fixture(e2e_harness):
    # Fixture provides a pre-configured harness with 3 agents
    task_id = await e2e_harness.submit_task("test", {"data": "value"})
    result = await e2e_harness.wait_for_task(task_id)
    assert result is not None

Running Debates Through the Harness

@pytest.mark.asyncio
async def test_debate():
    async with e2e_environment() as harness:
        # Run a debate directly
        result = await harness.run_debate(
            topic="Should we use microservices?",
            rounds=3,
        )

        assert result is not None

        # Or run via the control plane (full task lifecycle)
        result = await harness.run_debate_via_control_plane(
            topic="API design best practices",
            rounds=2,
        )

        assert result["consensus_reached"] is True

Custom Agent Configuration

from tests.e2e.harness import E2ETestConfig, MockAgentConfig, e2e_environment

@pytest.mark.asyncio
async def test_custom_agents():
    config = E2ETestConfig(
        num_agents=5,
        agent_capabilities=["code", "review"],
        fail_rate=0.1,  # 10% simulated failures
    )

    async with e2e_environment(config) as harness:
        # Create an additional specialized agent
        agent = await harness.create_agent(
            agent_id="specialist",
            capabilities=["security", "audit"],
        )

        # Test with a specialized capability
        task_id = await harness.submit_task(
            task_type="audit",
            payload={},
            required_capabilities=["security"],
        )

        result = await harness.wait_for_task(task_id)

Available E2E Fixtures

Fixture                 Description
e2e_harness             Basic harness with 3 agents, in-memory storage
e2e_harness_with_redis  Harness with Redis backend
debate_harness          Debate-focused harness with 4 agents
load_test_harness       Load testing harness with 10 agents
harness_config          Customizable configuration object
mock_agent_factory      Factory for creating mock agents

CI Integration

E2E tests run in CI via .github/workflows/integration.yml:

# What CI runs
pytest tests/e2e/ -v --tb=short --timeout=180

# With environment
REDIS_URL=redis://localhost:6379
ARAGORA_CI=true

The harness automatically adjusts timeouts when running in CI:

  • Default timeout: 30s (local) / 60s (CI)
  • Task timeout: 10s (local) / 30s (CI)
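That adjustment amounts to an environment check along these lines — illustrative only; the real logic lives in tests/e2e/harness.py and these names are assumptions:

```python
import os

# CI-aware timeout selection (names are illustrative, not the harness API).
IS_CI = os.getenv("ARAGORA_CI", "").lower() == "true"
DEFAULT_TIMEOUT = 60 if IS_CI else 30  # seconds
TASK_TIMEOUT = 30 if IS_CI else 10     # seconds
```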

Specialized Harnesses

DebateTestHarness - For debate-focused testing:

from tests.e2e.harness import DebateTestHarness

harness = DebateTestHarness()
await harness.start()

# Run tracked debates
await harness.run_debate_with_tracking("Topic 1", rounds=2)
await harness.run_debate_with_tracking("Topic 2", rounds=2)

# Get metrics
rate = harness.get_consensus_rate()
results = harness.get_debate_results()

await harness.stop()

LoadTestHarness - For load/performance testing:

from tests.e2e.harness import LoadTestHarness

harness = LoadTestHarness()
await harness.start()

# Submit many tasks concurrently
task_ids = await harness.submit_concurrent_tasks(count=100)

# Measure throughput
metrics = await harness.measure_throughput(task_count=50)
print(f"Tasks/second: {metrics['tasks_per_second']}")

await harness.stop()

CI/CD Integration

Tests run automatically on CI:

# .github/workflows/test.yml (simplified)
jobs:
  test:
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [ubuntu-latest, windows-latest, macos-latest]
        python-version: ["3.10", "3.11", "3.12"]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - run: pip install -e ".[dev,research]" && pip install pytest-cov pytest-timeout
      - run: pytest tests/ -v --timeout=60 --cov=aragora --cov-report=xml --cov-report=term-missing -x --tb=short

Test Commands in CI

# Full test suite
pytest tests/ -v --timeout=60 --cov=aragora --cov-report=xml --cov-report=term-missing -x --tb=short

# CLI smoke (demo)
aragora ask "Smoke test a demo debate" --demo --rounds 1

# Server smoke (non-default ports to avoid collisions)
aragora serve --api-port 8090 --ws-port 8766 --host 127.0.0.1

Debugging Tests

Using pdb

# Drop into debugger on failure
pytest tests/test_file.py --pdb

# Drop into debugger on first failure
pytest tests/test_file.py --pdb -x

# Start debugger at specific line
# Add: import pdb; pdb.set_trace() in code
pytest tests/test_file.py -s

Verbose Output

# Show test names as they run
pytest tests/ -v

# Show full diff on assertion failures
pytest tests/ -vv

# Show local variables in tracebacks
pytest tests/ --showlocals

# Show only the first N failures
pytest tests/ --maxfail=3

Logging

import logging

def test_with_logging(caplog):
    """Capture log output during tests."""
    with caplog.at_level(logging.DEBUG):
        function_that_logs()
    assert "expected message" in caplog.text

Performance Testing

Benchmarks

# Run benchmarks
pytest tests/benchmarks/ -v

# Run task ROI/quality benchmark harness (offline demo mode)
python benchmarks/task_bench.py --mode demo --profile fast
# Include Pulse trending context (network required)
python benchmarks/task_bench.py --mode demo --profile fast --enable-trending

Sample artifacts live in `examples/task_bench/demo`.

# Run with timing info
pytest tests/ --durations=10

# Profile slow tests (requires pytest-profiling)
pytest tests/ --profile

Load Tests

# Run load tests only
pytest tests/ -m load -v

# Example load test
pytest tests/integration/test_server_under_load.py -v

Best Practices

Test Isolation

  1. Use fixtures for setup/teardown - Don't rely on test order
  2. Mock external dependencies - Don't make real API calls
  3. Use temporary databases - Clean state for each test
  4. Reset global state - Use autouse fixtures

Test Quality

  1. One assertion per test (when practical)
  2. Descriptive test names - test_<action>_<expected_result>
  3. Test edge cases - Empty inputs, None values, boundaries
  4. Test error paths - Exceptions, invalid inputs

Performance

  1. Mock slow operations - API calls, model loading
  2. Use @pytest.mark.slow - For tests >5 seconds
  3. Parallelize with -n auto - When tests are isolated
  4. Cache expensive fixtures - Use scope="session" when safe

Domain-Specific E2E Tests

In addition to the system-level E2E harness, Aragora includes domain-specific E2E test suites for compliance, privacy, and security features.

Compliance E2E Tests

Location: tests/e2e/test_compliance_e2e.py

Tests SOC 2, GDPR, and audit compliance workflows:

# Run compliance E2E tests
pytest tests/e2e/test_compliance_e2e.py -v

# Run specific compliance area
pytest tests/e2e/test_compliance_e2e.py::TestSOC2ReportGeneration -v
pytest tests/e2e/test_compliance_e2e.py::TestGDPRExport -v
pytest tests/e2e/test_compliance_e2e.py::TestRightToBeForgotten -v

Coverage includes:

  • SOC 2 report generation
  • GDPR data export (Article 15)
  • Right-to-be-Forgotten (Article 17) with grace periods
  • Audit log query and export
  • Audit integrity verification

Privacy E2E Tests

Location: tests/e2e/test_privacy_e2e.py

Tests consent management, data retention, and anonymization:

# Run privacy E2E tests
pytest tests/e2e/test_privacy_e2e.py -v

# Run specific privacy area
pytest tests/e2e/test_privacy_e2e.py::TestConsentFlow -v
pytest tests/e2e/test_privacy_e2e.py::TestDataRetention -v
pytest tests/e2e/test_privacy_e2e.py::TestAnonymization -v

Coverage includes:

  • Consent lifecycle (grant, check, revoke)
  • Data retention policies (delete, archive, anonymize)
  • HIPAA anonymization (redaction, hashing)
  • K-anonymity verification
  • Differential privacy validation
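As an illustration of what a K-anonymity check verifies, here is a minimal sketch (not Aragora's implementation): every combination of quasi-identifier values must occur at least k times.

```python
from collections import Counter

def is_k_anonymous(rows, quasi_identifiers, k):
    """Return True if every quasi-identifier combination appears >= k times."""
    groups = Counter(
        tuple(row[q] for q in quasi_identifiers) for row in rows
    )
    return all(count >= k for count in groups.values())

# Three records, two quasi-identifier columns.
rows = [
    {"zip": "941", "age": "30-39"},
    {"zip": "941", "age": "30-39"},
    {"zip": "103", "age": "40-49"},
]
```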

DSAR/ABAC E2E Tests

Location: tests/e2e/test_dsar_abac_lifecycle.py

Tests Data Subject Access Requests and Attribute-Based Access Control:

# Run DSAR/ABAC tests
pytest tests/e2e/test_dsar_abac_lifecycle.py -v

# Run specific areas
pytest tests/e2e/test_dsar_abac_lifecycle.py::TestGDPRArticle15Export -v
pytest tests/e2e/test_dsar_abac_lifecycle.py::TestABACTimeConditions -v
pytest tests/e2e/test_dsar_abac_lifecycle.py::TestABACIPConditions -v

Coverage includes:

  • GDPR Article 15 data export workflow
  • RTBF lifecycle (grace periods, legal holds, cancellation)
  • Data portability (machine-readable format)
  • Multi-tenant data isolation
  • Time-based access control (business hours, date ranges)
  • IP-based restrictions (allowlist, CIDR, blocklist)
  • Resource ownership conditions
  • Tag-based access control
  • Combined ABAC conditions
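The IP-based conditions can be illustrated with the standard library's ipaddress module — a sketch of the allowlist/CIDR idea, not the project's actual condition classes:

```python
import ipaddress

def ip_allowed(client_ip, allowed_cidrs):
    """Allowlist check: is client_ip inside any permitted CIDR range?"""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in allowed_cidrs)
```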

Secrets Rotation E2E Tests

Location: tests/e2e/test_secrets_rotation_lifecycle.py

Tests secrets management and rotation for SOC 2 CC6.2 compliance:

# Run secrets rotation tests
pytest tests/e2e/test_secrets_rotation_lifecycle.py -v

# Run specific areas
pytest tests/e2e/test_secrets_rotation_lifecycle.py::TestSecretRotation -v
pytest tests/e2e/test_secrets_rotation_lifecycle.py::TestVerificationAndRollback -v
pytest tests/e2e/test_secrets_rotation_lifecycle.py::TestComplianceReporting -v

Coverage includes:

  • Secret registration (API keys, JWT, database, OAuth, encryption keys)
  • Full rotation lifecycle (rotate, verify, complete)
  • Grace period handling
  • Verification and rollback scenarios
  • Custom rotation handlers
  • Compliance reporting
  • Due secrets detection and processing
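The grace-period idea can be sketched as follows — illustrative only; the class and method names are assumptions, not Aragora's rotation API:

```python
import time
from dataclasses import dataclass
from typing import Optional

@dataclass
class RotatingSecret:
    current: str
    previous: Optional[str] = None
    grace_until: float = 0.0

    def rotate(self, new_value, grace_seconds=3600.0):
        # The old value stays valid for the grace period so in-flight
        # clients are not broken mid-rotation.
        self.previous = self.current
        self.current = new_value
        self.grace_until = time.time() + grace_seconds

    def is_valid(self, value):
        if value == self.current:
            return True
        return value == self.previous and time.time() < self.grace_until
```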

Audit Retention E2E Tests

Location: tests/e2e/test_audit_retention_lifecycle.py

Tests audit log retention and compliance export for SOC 2 CC6.3 and CC7.1:

# Run audit retention tests
pytest tests/e2e/test_audit_retention_lifecycle.py -v

# Run specific areas
pytest tests/e2e/test_audit_retention_lifecycle.py::TestRetentionEnforcement -v
pytest tests/e2e/test_audit_retention_lifecycle.py::TestComplianceExport -v
pytest tests/e2e/test_audit_retention_lifecycle.py::TestIntegrityVerification -v

Coverage includes:

  • Audit entry creation and hash computation
  • Retention policy management (90-day, 7-year SOC 2)
  • Retention enforcement (removal of expired entries)
  • Compliance export (SOC 2 Type II, ISO 27001, Syslog)
  • Audit query and filtering (action, actor, time range)
  • Hash chain integrity verification (tamper detection)
  • Sequence number continuity checks
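The hash-chain idea these integrity checks rely on can be sketched in a few lines — illustrative; the field names are not Aragora's actual audit schema:

```python
import hashlib
import json

def entry_hash(payload, prev_hash):
    """Hash covers the payload plus the previous entry's hash."""
    blob = json.dumps(payload, sort_keys=True) + prev_hash
    return hashlib.sha256(blob.encode()).hexdigest()

def verify_chain(entries):
    """Recompute every link; tampering with one entry breaks all later hashes."""
    prev = ""
    for entry in entries:
        if entry["hash"] != entry_hash(entry["payload"], prev):
            return False
        prev = entry["hash"]
    return True
```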

SOC 2 Test Organization

Domain E2E tests map to SOC 2 controls:

Control                      Test File                           Key Areas
CC6.2 Credential Management  test_secrets_rotation_lifecycle.py  Key rotation, grace periods
CC6.3 Change Management      test_audit_retention_lifecycle.py   Config change audit trails
CC7.1 System Monitoring      test_audit_retention_lifecycle.py   Audit retention, integrity
CC7.2 Security Monitoring    test_compliance_e2e.py              Audit logs, SOC 2 reports
P1-02 Data Subject Rights    test_dsar_abac_lifecycle.py         DSAR export, RTBF
CC6.1 Access Control         test_dsar_abac_lifecycle.py         ABAC conditions
P3-01 Privacy Notices        test_privacy_e2e.py                 Consent management

Running All Domain E2E Tests

# Run all domain-specific E2E tests
pytest tests/e2e/test_compliance_e2e.py tests/e2e/test_privacy_e2e.py \
  tests/e2e/test_dsar_abac_lifecycle.py tests/e2e/test_secrets_rotation_lifecycle.py \
  tests/e2e/test_audit_retention_lifecycle.py -v

# Quick compliance validation
pytest tests/e2e/ -k "compliance or privacy or dsar or secrets or audit" -v --tb=short

# Full E2E suite (system + domain)
pytest tests/e2e/ -v

Troubleshooting

Common Issues

Tests hang loading models:

# Force fast backend
ARAGORA_CONVERGENCE_BACKEND=jaccard pytest tests/

Tests fail with missing env vars:

# Use mock API keys fixture
@pytest.fixture(autouse=True)
def setup_env(mock_api_keys):
    yield

Tests pollute each other:

# Add to conftest.py
@pytest.fixture(autouse=True)
def reset_state():
    yield
    # Reset after each test
    clear_global_state()

Async tests timeout:

@pytest.mark.asyncio
@pytest.mark.timeout(30)  # Explicit timeout
async def test_slow_async():
    ...

Getting Help

  • Check existing test files for patterns
  • Review conftest.py for available fixtures
  • Run with -v --tb=long for detailed errors
  • Use --pdb to drop into debugger