API Rate Limiting

This document describes the rate limiting policies for the Aragora API.

Overview

Rate limiting protects the API from abuse and ensures fair usage across all clients. Aragora implements tiered rate limits based on authentication level and endpoint type.

Rate Limit Tiers

Anonymous (Unauthenticated)

| Endpoint Type | Limit | Window |
|---|---|---|
| Read endpoints | 60 requests | 1 minute |
| Write endpoints | 10 requests | 1 minute |
| WebSocket connections | 2 | concurrent |

Authenticated (API Key)

| Endpoint Type | Limit | Window |
|---|---|---|
| Read endpoints | 1000 requests | 1 minute |
| Write endpoints | 100 requests | 1 minute |
| Debate creation | 20 debates | 1 hour |
| WebSocket connections | 10 | concurrent |

Premium/Enterprise

| Endpoint Type | Limit | Window |
|---|---|---|
| Read endpoints | 10000 requests | 1 minute |
| Write endpoints | 1000 requests | 1 minute |
| Debate creation | Unlimited | - |
| WebSocket connections | 100 | concurrent |

Endpoint-Specific Limits

High-Cost Endpoints

These endpoints have stricter limits due to computational cost:

| Endpoint | Limit | Scope |
|---|---|---|
| POST /api/debates | 20/hour | per API key |
| POST /api/debates/{id}/analyze | 10/hour | per API key |
| POST /api/agents/train | 5/day | per API key |
| POST /api/knowledge/ingest | 100/hour | per API key |

Bulk Operations

| Endpoint | Limit | Scope |
|---|---|---|
| POST /api/batch/* | 10/hour | per API key |
| GET /api/export/* | 5/hour | per API key |

Response Headers

All API responses include rate limit headers:

```
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 999
X-RateLimit-Reset: 1640000000
X-RateLimit-Policy: authenticated
```

| Header | Description |
|---|---|
| X-RateLimit-Limit | Maximum requests allowed in the window |
| X-RateLimit-Remaining | Requests remaining in the current window |
| X-RateLimit-Reset | Unix timestamp when the limit resets |
| X-RateLimit-Policy | Active rate limit policy |
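Clients can use these headers to pace themselves proactively instead of waiting for a 429. A minimal sketch (the header names match those above; the function name and its explicit `now` parameter are illustrative):

```python
def seconds_until_next_request(headers: dict, now: float) -> float:
    """Spread the remaining request budget evenly across the rest of the window."""
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    reset = int(headers.get("X-RateLimit-Reset", str(int(now))))
    window_left = max(reset - now, 0.0)
    if remaining <= 0:
        # Budget exhausted: wait until the window resets.
        return window_left
    return window_left / remaining
```

For example, with 10 requests remaining and 30 seconds until the reset, the client would wait about 3 seconds between calls.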

Rate Limit Exceeded Response

When a rate limit is exceeded, the API returns:

```
HTTP/1.1 429 Too Many Requests
Content-Type: application/json
Retry-After: 30

{
  "error": "RATE_LIMIT_EXCEEDED",
  "message": "Rate limit exceeded. Please retry after 30 seconds.",
  "retry_after": 30,
  "limit": 1000,
  "reset_at": "2026-01-20T12:00:00Z"
}
```

Implementation Details

Token Bucket Algorithm

Aragora uses a token bucket algorithm with Redis for distributed rate limiting:

```python
# Rate limiter configuration
RATE_LIMIT_CONFIG = {
    "anonymous": {
        "read": {"tokens": 60, "interval": 60},
        "write": {"tokens": 10, "interval": 60},
    },
    "authenticated": {
        "read": {"tokens": 1000, "interval": 60},
        "write": {"tokens": 100, "interval": 60},
    },
    "premium": {
        "read": {"tokens": 10000, "interval": 60},
        "write": {"tokens": 1000, "interval": 60},
    },
}
```

Redis Key Structure

```
ratelimit:{api_key}:{endpoint_type}:{window}
```

Example:

```
ratelimit:sk_abc123:read:1640000000
```

Best Practices

Client Implementation

  1. Respect Retry-After: Always wait the specified time before retrying
  2. Implement exponential backoff: For repeated 429s, increase wait time
  3. Cache responses: Reduce API calls by caching read responses
  4. Batch requests: Use batch endpoints when available

Example: Python Client

```python
import time

import requests
from tenacity import retry, retry_if_result, wait_exponential

def is_rate_limited(response):
    return response.status_code == 429

@retry(
    retry=retry_if_result(is_rate_limited),
    wait=wait_exponential(multiplier=1, min=1, max=60),
)
def api_request(url, **kwargs):
    response = requests.get(url, **kwargs)
    if is_rate_limited(response):
        # Honor the server's Retry-After hint; tenacity then adds
        # exponential backoff on top before the next attempt.
        retry_after = int(response.headers.get("Retry-After", 30))
        time.sleep(retry_after)
    return response
```

Example: JavaScript Client

```javascript
async function apiRequest(url, options = {}) {
  const response = await fetch(url, options);

  if (response.status === 429) {
    // Wait for the server-specified interval, then retry.
    // Note: this retries indefinitely; cap the attempts in production.
    const retryAfter = parseInt(response.headers.get('Retry-After') || '30', 10);
    await new Promise(resolve => setTimeout(resolve, retryAfter * 1000));
    return apiRequest(url, options);
  }

  return response;
}
```

Monitoring

Prometheus Metrics

```
# Rate limit hits
aragora_ratelimit_hits_total{policy="authenticated", endpoint_type="read"}

# Rate limit rejections
aragora_ratelimit_rejected_total{policy="authenticated", endpoint_type="read"}

# Current token count
aragora_ratelimit_tokens_remaining{api_key_hash="xxx"}
```

Alerts

| Alert | Condition | Severity |
|---|---|---|
| High Rejection Rate | >10% of requests rejected | Warning |
| Single Client Abuse | >50% of capacity | Warning |
| Distributed Attack | Many clients at limit | Critical |

Configuration

Environment Variables

| Variable | Default | Description |
|---|---|---|
| RATE_LIMIT_ENABLED | true | Enable/disable rate limiting |
| RATE_LIMIT_REDIS_URL | - | Redis URL for distributed limiting |
| RATE_LIMIT_DEFAULT_TIER | anonymous | Default tier for unauthenticated requests |

API Key Tier Override

Admins can set custom limits per API key:

```
# Admin API
POST /api/admin/rate-limits
{
  "api_key": "sk_xxx",
  "tier": "premium",
  "custom_limits": {
    "debates_per_hour": 50
  }
}
```

Exemptions

The following are exempt from rate limiting:

  1. Health check endpoint (/api/health)
  2. Metrics endpoint (/metrics)
  3. Internal service-to-service calls (via service mesh)
  4. Whitelisted IP addresses (configurable)
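A middleware check for these exemptions might look like the following sketch (the function, path set, and allowlist parameter are illustrative; service-mesh detection of internal calls is elided):

```python
EXEMPT_PATHS = {"/api/health", "/metrics"}

def is_exempt(path: str, client_ip: str, ip_allowlist: frozenset = frozenset()) -> bool:
    """Return True if the request should bypass rate limiting."""
    if path in EXEMPT_PATHS:       # health checks and metrics scrapes
        return True
    if client_ip in ip_allowlist:  # configured IP whitelist
        return True
    return False                   # everything else is rate limited
```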

Distributed Rate Limiting

Architecture

In production deployments with multiple server instances, rate limits are coordinated via Redis:

```
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  Server 1   │     │  Server 2   │     │  Server 3   │
│  Instance   │     │  Instance   │     │  Instance   │
└──────┬──────┘     └──────┬──────┘     └──────┬──────┘
       │                   │                   │
       └───────────────────┼───────────────────┘
                           │
                    ┌──────┴──────┐
                    │    Redis    │
                    │   Cluster   │
                    └─────────────┘
```

Usage

```python
from aragora.server.middleware.rate_limit.distributed import (
    get_distributed_limiter,
    configure_distributed_endpoint,
)

# Get the global limiter
limiter = get_distributed_limiter()

# Configure an endpoint
configure_distributed_endpoint(
    endpoint="/api/debates",
    requests_per_minute=60,
    burst_size=120,
)

# Check rate limit
result = limiter.allow(
    client_ip="192.168.1.1",
    endpoint="/api/debates",
    tenant_id="tenant-123",
)
```

Environment Variables

| Variable | Description | Default |
|---|---|---|
| ARAGORA_RATE_LIMIT_STRICT | Require Redis (fail-closed mode) | false |
| REDIS_URL / ARAGORA_REDIS_URL | Redis connection URL | None |
| ARAGORA_REDIS_MODE | Redis mode: standalone, sentinel, cluster | standalone |
| ARAGORA_INSTANCE_ID | Unique server instance identifier | Auto-generated |

Strict Mode

When ARAGORA_RATE_LIMIT_STRICT=true:

  • Production: Raises error if Redis unavailable (fail-closed)
  • Development: Logs warning and falls back to in-memory
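That decision can be sketched as follows (the function name and `environment` string are illustrative, not the actual implementation):

```python
def on_redis_unavailable(strict: bool, environment: str) -> str:
    """Pick the fallback behavior when Redis cannot be reached."""
    if strict and environment == "production":
        # Fail closed: refuse to run without distributed limits.
        raise RuntimeError("ARAGORA_RATE_LIMIT_STRICT=true requires Redis")
    # Development, or non-strict mode: warn and degrade to in-memory limits.
    return "in-memory-fallback"
```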

Circuit Breaker

The circuit breaker protects against Redis failures:

```
CLOSED ──(failures)──▶ OPEN ──(timeout)──▶ HALF_OPEN
   ▲                     ▲                     │
   │                     └──────(failure)──────┤
   └──────────────────(success)────────────────┘
```

When the circuit is open, requests fall back to in-memory rate limiting to maintain service availability.

Specialized Limiters

| Limiter | Use Case | Default Rate |
|---|---|---|
| TenantRateLimiter | Per-tenant API limits | 1000/min |
| TierRateLimiter | Subscription tier-based limits | Varies |
| UserRateLimiter | Per-user rate limiting | 60/min |
| PlatformRateLimiter | Third-party platform limits | 30/min |
| OAuthRateLimiter | OAuth endpoint protection | 10/min |

Stats Endpoint

```bash
curl http://localhost:8080/api/v1/admin/rate-limits/stats
```

Returns:

```json
{
  "instance_id": "server-1",
  "backend": "redis",
  "strict_mode": true,
  "total_requests": 150000,
  "redis_requests": 149500,
  "fallback_requests": 500
}
```

Troubleshooting

Sudden Rate Limit Issues

  1. Check for leaked API keys
  2. Review client implementation for request loops
  3. Check for WebSocket reconnection storms
  4. Verify clock synchronization

Distributed Rate Limiting Issues

Rate limits not shared across instances

  • Verify Redis connectivity: redis-cli ping
  • Check REDIS_URL is set correctly
  • Confirm backend is "redis" in stats

Circuit breaker stuck open

  • Check Redis health
  • Review error logs for connection issues
  • Monitor rate_limit_circuit_breaker_state metric

Capacity Planning

| Scenario | Expected RPS | Redis Memory |
|---|---|---|
| 100 anonymous users | 100 | 10 MB |
| 1000 authenticated users | 1000 | 100 MB |
| 100 premium users | 10000 | 100 MB |

Testing

```bash
# Unit tests
pytest tests/server/middleware/rate_limit/ -v

# Integration tests (requires Redis)
pytest tests/server/middleware/rate_limit/test_distributed_integration.py -v --integration
```