API Rate Limits
Audience: This guide is for API consumers who need to understand and work with rate limits. For implementation details and developer documentation, see RATE_LIMITING.md.
Aragora implements rate limiting to ensure fair usage and service stability. This document describes the rate limiting system and how to work within its constraints.
Rate Limit Tiers
Rate limits are based on your subscription tier:
| Tier | Requests/Minute | Burst Size | Notes |
|---|---|---|---|
| Free | 10 | 60 | Unauthenticated or free tier |
| Starter | 50 | 100 | Basic paid plan |
| Professional | 200 | 400 | Standard paid plan |
| Enterprise | 1000 | 2000 | Custom limits available |
What is "Burst Size"?
The burst size allows short spikes above your sustained rate. For example, a free tier user can make up to 60 requests in quick succession, but sustained usage must stay under 10 req/min.
Default Limits
For endpoints without tier-specific limits:
| Limit Type | Default | Environment Variable |
|---|---|---|
| Per-endpoint | 60 req/min | ARAGORA_RATE_LIMIT |
| Per-IP (global) | 120 req/min | ARAGORA_IP_RATE_LIMIT |
| Burst multiplier | 2x | ARAGORA_BURST_MULTIPLIER |
Response Headers
Rate-limited responses include these headers:
X-RateLimit-Limit: 60 # Your rate limit
X-RateLimit-Remaining: 45 # Requests remaining in window
X-RateLimit-Reset: 1705123456 # Unix timestamp when limit resets
Retry-After: 30 # Seconds to wait (only on 429)
Rate Limit Responses
When rate limited, the API returns:
HTTP/1.1 429 Too Many Requests
Content-Type: application/json
Retry-After: 30
{
"error": "Rate limit exceeded",
"retry_after": 30,
"limit": 60,
"remaining": 0
}
Best Practices
1. Implement Exponential Backoff
When you receive a 429 response, wait and retry with increasing delays:
import time
import random
def make_request_with_backoff(url, max_retries=5):
for attempt in range(max_retries):
response = requests.get(url)
if response.status_code != 429:
return response
# Get retry delay from header or calculate
retry_after = int(response.headers.get('Retry-After', 0))
if not retry_after:
retry_after = min(60, (2 ** attempt) + random.random())
time.sleep(retry_after)
raise Exception("Max retries exceeded")
2. Monitor Your Usage
Check the X-RateLimit-Remaining header to see how many requests you have left:
response = requests.get(f"\{API_BASE\}/api/debates")
remaining = int(response.headers.get('X-RateLimit-Remaining', 0))
if remaining < 10:
print(f"Warning: Only \{remaining\} requests remaining")
3. Batch Operations When Possible
Instead of making many small requests, use batch endpoints:
# Instead of this (10 requests):
for debate_id in debate_ids:
response = requests.get(f"\{API_BASE\}/api/debates/\{debate_id\}")
# Do this (1 request):
response = requests.post(f"\{API_BASE\}/api/debates/batch", json={
"debate_ids": debate_ids
})
4. Cache Responses
Cache responses that don't change frequently:
- Debate results (after completion)
- Agent rankings and ELO scores
- Historical analytics
5. Use Webhooks for Real-Time Updates
Instead of polling for debate status, register webhooks:
requests.post(f"\{API_BASE\}/api/webhooks", json={
"url": "https://your-app.com/webhook",
"events": ["debate.completed", "debate.verdict"]
})
Per-User Rate Limiting
For authenticated users, Aragora implements per-user rate limiting based on user ID rather than IP address. This provides fairer limits when users share IPs (e.g., corporate networks) and prevents abuse via IP rotation.
Action-Based Limits
Different operations have different limits per authenticated user:
| Action | Limit (req/min) | Description |
|---|---|---|
default | 60 | Default for authenticated requests |
debate_create | 10 | Creating new debates |
vote | 30 | Voting on proposals |
agent_call | 120 | Calling agent APIs |
export | 5 | Exporting data |
admin | 300 | Admin operations |
How It Works
- Authenticated requests are rate-limited by
user_id - Unauthenticated requests fall back to IP-based limiting
- Each action has its own bucket (limits don't share)
- Burst multiplier (2x) allows short spikes
Response Headers
Per-user rate-limited responses include additional context:
X-RateLimit-Limit: 10 # Your limit for this action
X-RateLimit-Remaining: 7 # Requests remaining
X-RateLimit-Reset: 1705123456 # Reset timestamp
X-RateLimit-Action: debate_create # Which action was limited
Decorator Usage
Backend handlers use the @user_rate_limit decorator:
from aragora.server.handlers.utils.rate_limit import user_rate_limit
class DebatesHandler(BaseHandler):
@user_rate_limit(action="debate_create")
def _create_debate(self, handler):
# Limited to 10 debates/minute per user
...
@user_rate_limit(action="vote")
def _submit_vote(self, handler):
# Limited to 30 votes/minute per user
...
Checking Your Status
To see your current rate limit status across all actions:
curl -H "Authorization: Bearer $TOKEN" \
https://api.aragora.ai/api/rate-limit/status
Response:
{
"user_id": "user-123",
"limits": {
"debate_create": { "remaining": 8, "limit": 10, "retry_after": 0 },
"vote": { "remaining": 30, "limit": 30, "retry_after": 0 }
}
}
Endpoint-Specific Limits
Some endpoints have stricter limits to prevent abuse:
| Endpoint | Limit | Notes |
|---|---|---|
POST /api/debate | 10/min | Debate creation is expensive |
POST /api/gauntlet | 5/min | Stress tests are resource-intensive |
GET /api/health | 120/min | Higher limit for monitoring |
GET /api/debates | 60/min | Standard list endpoint |
IP-Based Limits
In addition to tier limits, there's a global per-IP limit of 120 req/min across all endpoints. This prevents a single source from overwhelming the API even with multiple accounts.
Trusted Proxies
If you're behind a load balancer or proxy, configure ARAGORA_TRUSTED_PROXIES so rate limits apply to the real client IP:
export ARAGORA_TRUSTED_PROXIES="10.0.0.0/8,172.16.0.0/12,192.168.0.0/16"
Need Higher Limits?
Enterprise customers can request custom rate limits. Contact support or upgrade your tier at /billing.
Debugging Rate Limits
To check your current rate limit status:
curl -v https://api.aragora.ai/api/health 2>&1 | grep -i ratelimit
Or in your code:
response = requests.get(f"\{API_BASE\}/api/health")
print(f"Limit: {response.headers.get('X-RateLimit-Limit')}")
print(f"Remaining: {response.headers.get('X-RateLimit-Remaining')}")
print(f"Resets at: {response.headers.get('X-RateLimit-Reset')}")