API Rate Limits

Audience: This guide is for API consumers who need to understand and work with rate limits. For implementation details and developer documentation, see RATE_LIMITING.md.

Aragora implements rate limiting to ensure fair usage and service stability. This document describes the rate limiting system and how to work within its constraints.

Rate Limit Tiers

Rate limits are based on your subscription tier:

Tier	Requests/Minute	Burst Size	Notes
Free	10	60	Unauthenticated or free tier
Starter	50	100	Basic paid plan
Professional	200	400	Standard paid plan
Enterprise	1000	2000	Custom limits available

What is "Burst Size"?

The burst size allows short spikes above your sustained rate. For example, a free tier user can make up to 60 requests in quick succession, but sustained usage must stay under 10 req/min.

Default Limits

For endpoints without tier-specific limits:

Limit Type	Default	Environment Variable
Per-endpoint	60 req/min	`ARAGORA_RATE_LIMIT`
Per-IP (global)	120 req/min	`ARAGORA_IP_RATE_LIMIT`
Burst multiplier	2x	`ARAGORA_BURST_MULTIPLIER`

Response Headers

Rate-limited responses include these headers:

X-RateLimit-Limit: 60          # Your rate limit
X-RateLimit-Remaining: 45      # Requests remaining in window
X-RateLimit-Reset: 1705123456  # Unix timestamp when limit resets
Retry-After: 30                # Seconds to wait (only on 429)

Rate Limit Responses

When rate limited, the API returns:

HTTP/1.1 429 Too Many Requests
Content-Type: application/json
Retry-After: 30

{
  "error": "Rate limit exceeded",
  "retry_after": 30,
  "limit": 60,
  "remaining": 0
}

Best Practices

1. Implement Exponential Backoff

When you receive a 429 response, wait and retry with increasing delays:

import time
import random

def make_request_with_backoff(url, max_retries=5):
    for attempt in range(max_retries):
        response = requests.get(url)

        if response.status_code != 429:
            return response

        # Get retry delay from header or calculate
        retry_after = int(response.headers.get('Retry-After', 0))
        if not retry_after:
            retry_after = min(60, (2 ** attempt) + random.random())

        time.sleep(retry_after)

    raise Exception("Max retries exceeded")

2. Monitor Your Usage

Check the X-RateLimit-Remaining header to see how many requests you have left:

response = requests.get(f"\{API_BASE\}/api/debates")
remaining = int(response.headers.get('X-RateLimit-Remaining', 0))

if remaining < 10:
    print(f"Warning: Only \{remaining\} requests remaining")

3. Batch Operations When Possible

Instead of making many small requests, use batch endpoints:

# Instead of this (10 requests):
for debate_id in debate_ids:
    response = requests.get(f"\{API_BASE\}/api/debates/\{debate_id\}")

# Do this (1 request):
response = requests.post(f"\{API_BASE\}/api/debates/batch", json={
    "debate_ids": debate_ids
})

4. Cache Responses

Cache responses that don't change frequently:

Debate results (after completion)
Agent rankings and ELO scores
Historical analytics

5. Use Webhooks for Real-Time Updates

Instead of polling for debate status, register webhooks:

requests.post(f"\{API_BASE\}/api/webhooks", json={
    "url": "https://your-app.com/webhook",
    "events": ["debate.completed", "debate.verdict"]
})

Per-User Rate Limiting

For authenticated users, Aragora implements per-user rate limiting based on user ID rather than IP address. This provides fairer limits when users share IPs (e.g., corporate networks) and prevents abuse via IP rotation.

Action-Based Limits

Different operations have different limits per authenticated user:

Action	Limit (req/min)	Description
`default`	60	Default for authenticated requests
`debate_create`	10	Creating new debates
`vote`	30	Voting on proposals
`agent_call`	120	Calling agent APIs
`export`	5	Exporting data
`admin`	300	Admin operations

How It Works

Authenticated requests are rate-limited by user_id
Unauthenticated requests fall back to IP-based limiting
Each action has its own bucket (limits don't share)
Burst multiplier (2x) allows short spikes

Response Headers

Per-user rate-limited responses include additional context:

X-RateLimit-Limit: 10           # Your limit for this action
X-RateLimit-Remaining: 7        # Requests remaining
X-RateLimit-Reset: 1705123456   # Reset timestamp
X-RateLimit-Action: debate_create  # Which action was limited

Decorator Usage

Backend handlers use the @user_rate_limit decorator:

from aragora.server.handlers.utils.rate_limit import user_rate_limit

class DebatesHandler(BaseHandler):
    @user_rate_limit(action="debate_create")
    def _create_debate(self, handler):
        # Limited to 10 debates/minute per user
        ...

    @user_rate_limit(action="vote")
    def _submit_vote(self, handler):
        # Limited to 30 votes/minute per user
        ...

Checking Your Status

To see your current rate limit status across all actions:

curl -H "Authorization: Bearer $TOKEN" \
  https://api.aragora.ai/api/rate-limit/status

Response:

{
  "user_id": "user-123",
  "limits": {
    "debate_create": { "remaining": 8, "limit": 10, "retry_after": 0 },
    "vote": { "remaining": 30, "limit": 30, "retry_after": 0 }
  }
}

Endpoint-Specific Limits

Some endpoints have stricter limits to prevent abuse:

Endpoint	Limit	Notes
`POST /api/debate`	10/min	Debate creation is expensive
`POST /api/gauntlet`	5/min	Stress tests are resource-intensive
`GET /api/health`	120/min	Higher limit for monitoring
`GET /api/debates`	60/min	Standard list endpoint

IP-Based Limits

In addition to tier limits, there's a global per-IP limit of 120 req/min across all endpoints. This prevents a single source from overwhelming the API even with multiple accounts.

Trusted Proxies

If you're behind a load balancer or proxy, configure ARAGORA_TRUSTED_PROXIES so rate limits apply to the real client IP:

export ARAGORA_TRUSTED_PROXIES="10.0.0.0/8,172.16.0.0/12,192.168.0.0/16"

Need Higher Limits?

Enterprise customers can request custom rate limits. Contact support or upgrade your tier at /billing.

Debugging Rate Limits

To check your current rate limit status:

curl -v https://api.aragora.ai/api/health 2>&1 | grep -i ratelimit

Or in your code:

response = requests.get(f"\{API_BASE\}/api/health")
print(f"Limit: {response.headers.get('X-RateLimit-Limit')}")
print(f"Remaining: {response.headers.get('X-RateLimit-Remaining')}")
print(f"Resets at: {response.headers.get('X-RateLimit-Reset')}")

Rate Limit Tiers​

What is "Burst Size"?​

Default Limits​

Response Headers​

Rate Limit Responses​

Best Practices​

1. Implement Exponential Backoff​

2. Monitor Your Usage​

3. Batch Operations When Possible​

4. Cache Responses​

5. Use Webhooks for Real-Time Updates​

Per-User Rate Limiting​

Action-Based Limits​

How It Works​

Response Headers​

Decorator Usage​

Checking Your Status​

Endpoint-Specific Limits​

IP-Based Limits​

Trusted Proxies​

Need Higher Limits?​

Debugging Rate Limits​

Rate Limit Tiers

What is "Burst Size"?

Default Limits

Response Headers

Rate Limit Responses

Best Practices

1. Implement Exponential Backoff

2. Monitor Your Usage

3. Batch Operations When Possible

4. Cache Responses

5. Use Webhooks for Real-Time Updates

Per-User Rate Limiting

Action-Based Limits

How It Works

Response Headers

Decorator Usage

Checking Your Status

Endpoint-Specific Limits

IP-Based Limits

Trusted Proxies

Need Higher Limits?

Debugging Rate Limits