Troubleshooting Guide
Common issues and solutions for Aragora.
Table of Contents
- Quick Diagnostics
- Authentication Issues
- Redis Connection Issues
- Rate Limiting Issues
- Server Issues
- WebSocket Connection Issues
- Nomic Loop Problems
- Database Issues
- API Key Configuration
- Frontend Issues
Quick Diagnostics
Run the built-in diagnostics command:
aragora doctor
This checks:
- API key configuration
- Database connectivity
- Redis connection (if configured)
- Circuit breaker status
- Memory usage
Authentication Issues
"Invalid credentials" on login
Symptoms: Login fails with "Invalid email or password"
Solutions:
- Check password case sensitivity (passwords are case-sensitive)
- Check whether the account is locked:
curl -X GET "http://localhost:8080/api/v2/admin/users/{user_id}/lockout-status" \
-H "Authorization: Bearer $ADMIN_TOKEN"
- Reset the password via CLI:
python -c "
from aragora.storage.user_store import UserStore
store = UserStore('.nomic/aragora_users.db')
store.reset_password('user@example.com', 'new-password')
"
"Account locked" error
Symptoms: "Account locked for X seconds" after failed login attempts
Lockout durations:
- 5 failed attempts: 1 minute
- 10 failed attempts: 15 minutes
- 15+ failed attempts: 1 hour
Solutions:
- Wait for the lockout to expire
- Admin unlock:
curl -X POST "http://localhost:8080/api/v2/admin/users/{user_id}/unlock" \
-H "Authorization: Bearer $ADMIN_TOKEN"
- Direct database unlock (emergency only):
sqlite3 .nomic/aragora_users.db \
"UPDATE users SET locked_until=NULL, failed_login_count=0 WHERE email='user@example.com'"
MFA verification fails
Symptoms: TOTP codes not accepted
Solutions:
- Check time synchronization:
  - TOTP codes use 30-second windows
  - Ensure device and server clocks are synced (UTC)
date -u  # Should match the device's time in UTC
- Use backup codes (each code works only once)
- Admin MFA reset:
curl -X DELETE "http://localhost:8080/api/v2/admin/users/{user_id}/mfa" \
-H "Authorization: Bearer $ADMIN_TOKEN"
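Why a small clock drift can break MFA: TOTP (RFC 6238) derives each code from a shared secret and the index of the current 30-second window, so a device whose clock is off by more than the server tolerates lands in a different window and produces a different code. The sketch below is the standard RFC 6238 algorithm (HMAC-SHA1, 6 digits), not Aragora's own implementation; the base32 secret is a made-up example. Many servers also accept the adjacent window to absorb small drift, but this guide doesn't say whether Aragora does.

```python
import base64
import hashlib
import hmac
import struct
import time


def totp(secret: bytes, for_time: float, step: int = 30, digits: int = 6) -> str:
    """Standard RFC 6238 TOTP with HMAC-SHA1: depends only on the secret and the time window."""
    counter = int(for_time // step)                   # index of the 30-second window
    msg = struct.pack(">Q", counter)                  # 8-byte big-endian counter
    digest = hmac.new(secret, msg, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                        # dynamic truncation (RFC 4226)
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


if __name__ == "__main__":
    # Authenticator apps show secrets base32-encoded; this one is a made-up example.
    secret = base64.b32decode("JBSWY3DPEHPK3PXP")
    now = time.time()
    print("code with correct clock:", totp(secret, now))
    print("code from a device 45s fast:", totp(secret, now + 45))  # a different window
```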
JWT token errors
Symptoms: "Token expired" or "Invalid token"
Solutions:
- Re-authenticate to get a fresh token
- Check the token's expiry:
import jwt  # PyJWT
from datetime import datetime, timezone
token = "<paste-token-here>"
decoded = jwt.decode(token, options={"verify_signature": False})
print(datetime.fromtimestamp(decoded["exp"], tz=timezone.utc))  # Expiry time (UTC)
- Verify the token hasn't been revoked:
  - Tokens are revoked on password change
  - Tokens are revoked on logout
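The same inspection works without PyJWT installed: a JWT's payload is the base64url-encoded JSON between its first and second dots. The standard-library sketch below reports how many seconds remain before `exp`. It does not verify the signature, and an unexpired token can still be rejected if it was revoked by a password change or logout, as noted above.

```python
import base64
import json
import time
from typing import Optional


def jwt_claims(token: str) -> dict:
    """Decode a JWT payload WITHOUT verifying the signature (for inspection only)."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)   # restore the stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def seconds_until_expiry(token: str, now: Optional[float] = None) -> float:
    """Seconds until the 'exp' claim; a negative value means the token already expired."""
    now = time.time() if now is None else now
    return jwt_claims(token)["exp"] - now
```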
Redis Connection Issues
"Redis connection refused"
Symptoms: ConnectionRefusedError: Connection refused
Solutions:
- Verify Redis is running:
redis-cli ping
# Should return: PONG
- Check the Redis URL:
echo $REDIS_URL
# Format: redis://host:6379/0, or rediss://... for TLS
- Test the connection:
redis-cli -u "$REDIS_URL" ping
- Note: Aragora falls back to in-memory storage automatically
  - Check logs for: "Redis unavailable, using in-memory storage"
  - Multi-replica deployments will then have inconsistent state, because each replica keeps its own in-memory copy
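A malformed URL can look like a connectivity problem. The sketch below checks a `REDIS_URL` against the format given above using only the standard library. It validates shape (scheme, host, numeric port, integer database index) and cannot tell whether the server is actually reachable; `check_redis_url` is a hypothetical helper, not part of Aragora.

```python
import os
from urllib.parse import urlparse


def check_redis_url(url: str) -> list[str]:
    """Return a list of problems with a Redis URL; an empty list means it looks well-formed."""
    problems = []
    parsed = urlparse(url)
    if parsed.scheme not in ("redis", "rediss"):
        problems.append(f"scheme {parsed.scheme!r} should be 'redis' or 'rediss' (TLS)")
    if not parsed.hostname:
        problems.append("missing host")
    try:
        _ = parsed.port          # raises ValueError for a non-numeric or out-of-range port
    except ValueError:
        problems.append("invalid port")
    db = parsed.path.lstrip("/")
    if db and not db.isdigit():
        problems.append(f"database index {db!r} should be an integer")
    return problems


if __name__ == "__main__":
    print(check_redis_url(os.environ.get("REDIS_URL", "")) or "REDIS_URL looks well-formed")
```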
Redis timeout errors
Symptoms: TimeoutError: Connection timed out
Solutions:
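- Increase the timeout:
export REDIS_TIMEOUT=10
- Check Redis server load (connected_clients is in the clients section, so query the full INFO output):
redis-cli -u "$REDIS_URL" info | grep -E "instantaneous_ops|connected_clients"
- Check network latency:
redis-cli -u "$REDIS_URL" --latency-history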
Rate Limiting Issues
"Too many requests" (429 error)
Symptoms: API returns 429 status code
Check rate limit headers:
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1704134400
Retry-After: 60
Solutions:
- Wait for the reset (check the Retry-After header)
- Implement exponential backoff:
import time

class RateLimitError(Exception):
    """Stand-in: replace with the exception your HTTP client raises on a 429."""

def call_with_backoff(fn, max_retries=5):
    for attempt in range(max_retries):
        try:
            return fn()
        except RateLimitError:
            time.sleep(2 ** attempt)  # wait 1s, 2s, 4s, 8s, 16s
    raise RuntimeError("Max retries exceeded")
- Increase limits (admin):
export ARAGORA_RATE_LIMIT_DEFAULT=200
export ARAGORA_RATE_LIMIT_AUTH=20
Default limits:
| Endpoint | Limit |
|---|---|
| Authentication | 10/min |
| Debate creation | 30/min |
| General API | 100/min |
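When a 429 response carries the headers shown above, the client can wait exactly as long as the server asks instead of guessing. A sketch, assuming `X-RateLimit-Reset` is a Unix timestamp in seconds (the example value 1704134400 fits that reading) and that `Retry-After` is given in seconds; RFC 9110 also allows an HTTP-date there, which this sketch does not parse. With a plain `dict` the lookup is case-sensitive, whereas clients such as `requests` expose case-insensitive headers.

```python
import time
from typing import Mapping, Optional


def seconds_to_wait(headers: Mapping[str, str], attempt: int = 0,
                    now: Optional[float] = None) -> float:
    """Choose a retry delay from 429 headers, falling back to exponential backoff."""
    now = time.time() if now is None else now
    retry_after = headers.get("Retry-After")
    if retry_after is not None and retry_after.isdigit():
        return float(retry_after)                  # server stated the delay in seconds
    reset = headers.get("X-RateLimit-Reset")
    if reset is not None and reset.isdigit():
        return max(0.0, float(reset) - now)        # time left until the window resets
    return float(2 ** attempt)                     # no hints: 1s, 2s, 4s, ...
```

Pass the result to `time.sleep()` in place of the fixed `2 ** attempt` delay.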
Server Issues
Server Won't Start
Symptoms: Server fails to start or crashes immediately.
Solutions:
- Check port availability:
lsof -i :8080
# Kill any existing process:
kill -9 <PID>
- Verify the Python environment:
python --version  # Should be 3.11+
pip list | grep aragora
- Check for import errors:
python -c "from aragora.server.unified_server import UnifiedServer; print('OK')"
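Where `lsof` isn't installed (some minimal containers, Windows), a standard-library check can tell whether something is already listening on the port. A sketch; it only reports that the port is taken, not which process holds it, so you still need `lsof` or an equivalent to find the PID.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1.0)
        # connect_ex returns 0 when a listener accepted the TCP connection
        return s.connect_ex((host, port)) == 0


if __name__ == "__main__":
    print("port 8080 in use:", port_in_use(8080))
```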
High Memory Usage
Solutions:
- Reduce the cache size:
export ARAGORA_CACHE_MAX_ENTRIES=500
- Lower batch sizes for large debates:
  - Reduce the limit parameter in API calls
  - Use pagination
WebSocket Connection Issues
Connection Fails Immediately
Symptoms: WebSocket disconnects right after connecting.
Solutions:
- Check that the server is running:
curl http://localhost:8080/api/health
- Verify CORS settings:
export ARAGORA_ALLOWED_ORIGINS="http://localhost:3000,http://localhost:8080"
- Check for a proxy or firewall blocking the WebSocket upgrade:
  - WebSocket connections start with an HTTP Upgrade handshake
  - Some proxies block or don't support this
Connection Drops During Debate
Solutions:
- Check heartbeat settings:
  - The server sends a ping every 30 seconds
  - The client should respond with a pong
- Check network stability:
  - Look for intermittent network issues
  - Consider adding reconnection logic on the client
- Review server logs for errors:
# The server logs connection events
grep "WebSocket" server.log
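Reconnection logic usually means retrying with growing, randomized delays so that many clients don't all reconnect at the same instant after a server restart. A minimal asyncio sketch, assuming your client wraps one connection's lifetime in a coroutine (the hypothetical `session`) that raises when the connection drops and returns only on intentional shutdown. It catches `OSError`; with the `websockets` library you would also catch its `ConnectionClosed` exception, which is not an `OSError`.

```python
import asyncio
import random
from typing import Awaitable, Callable, Iterator


def backoff_delays(base: float = 1.0, cap: float = 30.0) -> Iterator[float]:
    """Yield 'full jitter' delays: uniform in [0, min(cap, base * 2**n)] for n = 0, 1, 2, ..."""
    n = 0
    while True:
        ceiling = min(cap, base * 2 ** n)
        yield random.uniform(0, ceiling)
        if ceiling < cap:          # stop growing n once the cap is reached
            n += 1


async def run_with_reconnect(session: Callable[[], Awaitable[None]],
                             max_attempts: int = 10, base: float = 1.0) -> None:
    """Re-run `session` after dropped connections, sleeping a jittered delay between tries."""
    delays = backoff_delays(base=base)
    for attempt in range(1, max_attempts + 1):
        try:
            await session()
            return                                   # normal return: stop reconnecting
        except OSError as exc:                       # includes ConnectionError subclasses
            if attempt == max_attempts:
                raise
            delay = next(delays)
            print(f"connection lost ({exc!r}); retry {attempt} in {delay:.2f}s")
            await asyncio.sleep(delay)
```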
Nomic Loop Problems
Loop Hangs or Doesn't Progress
Symptoms: Nomic loop stuck on a phase.
Solutions:
- Check phase timeouts:
cat .nomic/circuit_breaker.json
Default timeouts:
- context: 300s
- debate: 600s
- design: 300s
- implement: 900s
- verify: 300s
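- Verify API keys are configured (these commands only confirm that each SDK imports and a client can be created; they make no API call, so they don't prove a key is valid. The test commands under API Key Configuration make a real request):
# Test Anthropic
python -c "import anthropic; c = anthropic.Anthropic(); print('OK')"
# Test OpenAI
python -c "import openai; c = openai.OpenAI(); print('OK')"
- Check for rate limits:
  - Review logs for 429 errors
  - Reduce the concurrent agent count
  - Add delays between API calls
- Review replay events:
cat .nomic/replays/nomic-cycle-*/events.jsonl | tail -20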
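The shell glob above sorts names alphabetically (if cycles are numbered, `nomic-cycle-10` sorts before `nomic-cycle-9`), so the last 20 lines may not come from the newest cycle. A sketch that picks the most recently modified cycle instead and parses each line as JSON; it assumes the `.nomic/replays/nomic-cycle-*/events.jsonl` layout shown above and makes no assumptions about the event fields. Lines that aren't valid JSON, such as one cut off by a crash, are kept as raw text.

```python
import json
from collections import deque
from pathlib import Path


def latest_events(replays_dir: str = ".nomic/replays", n: int = 20) -> list:
    """Return the last n events from the most recently modified nomic-cycle-* log."""
    logs = sorted(Path(replays_dir).glob("nomic-cycle-*/events.jsonl"),
                  key=lambda p: p.stat().st_mtime)
    if not logs:
        return []
    with logs[-1].open() as f:
        tail = deque(f, maxlen=n)                  # keeps only the last n lines in memory
    events = []
    for line in tail:
        line = line.strip()
        if not line:
            continue
        try:
            events.append(json.loads(line))
        except json.JSONDecodeError:
            events.append({"unparsed": line})      # e.g. a line truncated by a crash
    return events


if __name__ == "__main__":
    for event in latest_events():
        print(event)
```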
Phase Failures
Symptoms: Phase fails and rolls back.
Solutions:
- Check for the specific error in the events:
grep "error" .nomic/replays/nomic-cycle-*/events.jsonl
- Verify implementation tests pass:
pytest tests/ -x --timeout=60  # --timeout requires the pytest-timeout plugin
- Check for protected file modifications:
  - Review CLAUDE.md for the list of protected files
  - The nomic loop won't modify protected files
Rollback Issues
Symptoms: Rollback fails or corrupts state.
Solutions:
- Restore from backup:
# List available backups
ls -la .nomic/backups/
# Restore a specific backup
cp -r .nomic/backups/backup_YYYYMMDD_HHMMSS/* .nomic/
- Force a reset to a clean state:
# Backup current state first!
cp -r .nomic .nomic.bak
# Reset nomic state
rm -rf .nomic/replays/*
python -c "from aragora.modes import NomicLoop; NomicLoop().reset()"
Database Issues
Database Validation
Run validation to check database health:
python scripts/migrate_databases.py --validate
Database Corruption
Symptoms: SQLite errors, missing data, crashes on read.
Solutions:
- Stop all processes accessing the database:
# Find processes
lsof *.db
- Create a backup before recovery:
python scripts/migrate_databases.py --backup
- Try SQLite recovery (the .recover command needs sqlite3 3.29 or newer):
sqlite3 corrupted.db ".recover" | sqlite3 recovered.db
- If recovery fails, delete and recreate:
# The server recreates empty databases on startup
rm corrupted.db
aragora serve
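To find which file is damaged before choosing between `.recover` and recreating it, SQLite's built-in `PRAGMA integrity_check` returns the single row `ok` for a healthy database. A sketch that opens each file read-only, so the check itself cannot write to a suspect database; it assumes the databases are `*.db` files under `.nomic/`, like `.nomic/aragora_users.db` above. Stop other processes first, as in the steps above.

```python
import sqlite3
from pathlib import Path


def integrity_ok(db_path: str) -> bool:
    """Run PRAGMA integrity_check on a read-only connection; True only if it reports 'ok'."""
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        rows = conn.execute("PRAGMA integrity_check").fetchall()
    except sqlite3.DatabaseError:          # e.g. "file is not a database"
        return False
    finally:
        conn.close()
    return rows == [("ok",)]


if __name__ == "__main__":
    for db in sorted(Path(".nomic").glob("*.db")):
        print(db, "ok" if integrity_ok(str(db)) else "CORRUPT")
```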
Database Locking
Symptoms: "database is locked" errors.
Solutions:
- Find and kill blocking processes:
lsof *.db
kill <PID>
- Increase the connection timeout:
import sqlite3
conn = sqlite3.connect(db_path, timeout=30)  # wait up to 30 s for a lock before raising
- Use WAL mode (recommended):
conn.execute("PRAGMA journal_mode=WAL")
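The two settings above combine naturally into one connection helper. A sketch (`open_db` is a hypothetical name, not an Aragora API): WAL lets readers proceed while a writer holds the lock, and `timeout` makes a blocked connection wait instead of raising "database is locked" immediately. WAL mode is stored in the database file, so it persists once set; SQLite's documentation notes that WAL does not work over network filesystems.

```python
import sqlite3


def open_db(db_path: str, timeout: float = 30.0) -> sqlite3.Connection:
    """Open SQLite with a lock-wait timeout and WAL journaling."""
    conn = sqlite3.connect(db_path, timeout=timeout)    # wait up to `timeout` s for locks
    # The pragma returns the journal mode actually in effect after the request.
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    if mode.lower() != "wal":
        # e.g. in-memory databases cannot use WAL
        print(f"warning: journal_mode is {mode!r}, not WAL")
    return conn
```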
Running Database Migration
Consolidate multiple databases:
# 1. Create backup
python scripts/migrate_databases.py --backup
# 2. Preview migration plan
python scripts/migrate_databases.py --dry-run
# 3. Execute migration
python scripts/migrate_databases.py --migrate
# 4. Verify
python scripts/migrate_databases.py --report
API Key Configuration
Required Environment Variables
| Provider | Variable | Test Command |
|---|---|---|
| Anthropic | ANTHROPIC_API_KEY | python -c "import anthropic; print(anthropic.Anthropic().models.list())" |
| OpenAI | OPENAI_API_KEY | python -c "import openai; print(openai.OpenAI().models.list())" |
| Gemini | GEMINI_API_KEY | Check console output on startup |
| xAI | XAI_API_KEY | Check console output on startup |
Setting Keys
# In .env file
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
# Or export directly
export ANTHROPIC_API_KEY=sk-ant-...
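To see which of the variables in the table are actually visible to a Python process without printing the secrets, a sketch follows. A standalone script only sees values from `.env` if something has loaded them into the environment (your shell or a dotenv loader, for example). Presence is not validity; use the test commands in the table for a live check.

```python
import os
from typing import Mapping

# Variable names from the table above; which providers you need depends on your agents.
PROVIDER_KEYS = {
    "Anthropic": "ANTHROPIC_API_KEY",
    "OpenAI": "OPENAI_API_KEY",
    "Gemini": "GEMINI_API_KEY",
    "xAI": "XAI_API_KEY",
}


def mask(value: str) -> str:
    """Show just enough of a key to recognise it, never the whole secret."""
    return value[:6] + "..." + value[-4:] if len(value) > 12 else "***"


def key_report(env: Mapping[str, str] = os.environ) -> dict:
    """Map each provider to a masked key, or 'MISSING' if unset or blank."""
    report = {}
    for provider, var in PROVIDER_KEYS.items():
        value = env.get(var, "").strip()
        report[provider] = mask(value) if value else "MISSING"
    return report


if __name__ == "__main__":
    for provider, status in key_report().items():
        print(f"{provider:10} {status}")
```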
Verifying Keys Work
# Quick test
python -c "
from aragora.agents.api_agents import AnthropicAPIAgent
agent = AnthropicAPIAgent(name=\"anthropic-api\")
print('Anthropic API: OK')
"
Rate Limit Errors
Symptoms: 429 errors, "rate limit exceeded"
Solutions:
- Reduce concurrent requests:
# In debate config
protocol = DebateProtocol(max_concurrent_agents=2)
- Add delays between calls:
import time
time.sleep(1)  # Between API calls
- Use different API tiers:
  - Consider upgrading your API plan
  - Use multiple API keys with rotation
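A fixed `time.sleep(1)` waits a full second even when the previous call already took longer. A minimum-interval throttle sleeps only for whatever remains of the interval. A thread-safe sketch; `MinIntervalThrottle` is a hypothetical helper, not part of Aragora.

```python
import threading
import time


class MinIntervalThrottle:
    """Make successive calls start at least `interval` seconds apart."""

    def __init__(self, interval: float):
        self.interval = interval
        self._lock = threading.Lock()
        self._next_allowed = 0.0

    def wait(self) -> None:
        with self._lock:                           # serialise callers across threads
            now = time.monotonic()
            if now < self._next_allowed:
                time.sleep(self._next_allowed - now)
                now = self._next_allowed
            self._next_allowed = now + self.interval
```

Call `throttle.wait()` immediately before each provider request; an interval of 1.0 caps that process at roughly one request per second.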
Frontend Issues
Frontend Not Loading
Symptoms: Blank page, loading spinner stuck.
Solutions:
- Check that the build succeeded:
cd aragora/live
npm run build
- Verify the development server:
npm run dev
# Should be available at http://localhost:3000
- Check the browser console for errors:
  - Open DevTools (F12)
  - Check the Console and Network tabs
- Verify the API backend is running:
curl http://localhost:8080/api/health
WebSocket Not Connecting from Frontend
Solutions:
- Check the CORS configuration:
export ARAGORA_ALLOWED_ORIGINS="http://localhost:3000"
- Verify the WebSocket URL in the frontend:
// Should match your backend
const WS_URL = 'ws://localhost:8765/ws';
- Check for an HTTPS/WSS mismatch:
  - HTTP pages should use ws://
  - HTTPS pages must use wss:// (browsers block insecure ws:// connections from secure pages)
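The ws/wss rule can be applied mechanically by deriving the socket URL from the HTTP(S) URL instead of hard-coding it. Shown in Python to match the other examples; in the frontend the same mapping would be applied to `window.location.protocol`. The `/ws` path and port 8765 come from the snippet above; `aragora.example.com` is a placeholder domain.

```python
from urllib.parse import urlsplit, urlunsplit


def to_websocket_url(http_url: str, path: str = "/ws") -> str:
    """Map http -> ws and https -> wss so the socket matches the page's security."""
    parts = urlsplit(http_url)
    scheme = {"http": "ws", "https": "wss"}.get(parts.scheme)
    if scheme is None:
        raise ValueError(f"expected an http(s) URL, got {http_url!r}")
    return urlunsplit((scheme, parts.netloc, path, "", ""))
```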
Slow Performance
Solutions:
- Enable the production build:
npm run build
npm start  # Instead of npm run dev
- Check bundle size:
npm run analyze
- Verify lazy loading is working:
  - Check the Network tab in DevTools
  - Heavy components should load on demand
Getting Help
- Check documentation:
  - docs/API_REFERENCE.md - Complete API reference
  - docs/ENVIRONMENT.md - Environment setup
  - docs/ARCHITECTURE.md - System overview
- Review logs:
# Server logs
tail -f server.log
# Nomic loop events
tail -f .nomic/replays/*/events.jsonl
- Report issues:
Quick Diagnostic Commands
# Check system status
python -c "
from aragora.server.unified_server import UnifiedServer
print('Import: OK')
"
# Validate databases
python scripts/migrate_databases.py --validate
# Check API health
curl http://localhost:8080/api/health
# List recent debates
curl "http://localhost:8080/api/debates?limit=5"
# Get agent leaderboard
curl "http://localhost:8080/api/leaderboard?limit=10"
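For scripted checks (cron jobs, CI, container health probes), the health endpoint can be polled with Python's standard library instead of curl. A sketch that assumes only that `/api/health` answers 200 when the server is healthy; the response body's format isn't documented in this guide, so it's returned unparsed. `check_health` is a hypothetical helper name.

```python
import urllib.request
from urllib.error import URLError


def check_health(base_url: str = "http://localhost:8080",
                 timeout: float = 5.0) -> tuple[bool, str]:
    """GET {base_url}/api/health and return (healthy, detail)."""
    url = base_url.rstrip("/") + "/api/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return resp.status == 200, body
    except (URLError, TimeoutError) as exc:   # refused, DNS failure, HTTP 4xx/5xx, timeout
        return False, str(exc)
```

`check_health()` returns `(False, ...)` with the error text when nothing answers on port 8080.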