Usage & Quotas

Understand how requests are metered, quota limits, burst behavior, and HTTP 429 rate limiting in IOA Cloud.

What Counts as a Request?

In IOA Cloud, a request is a single governance evaluation event. Each time your application makes an LLM call that passes through IOA's governance layer, it counts as one request.

Request Examples

  • Single LLM Call: 1 request
  • Multi-LLM Consensus (3 models): 1 request (not 3)
  • Failed/Blocked Call: Still counts as 1 request
  • Retries: Each retry counts as a separate request
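
The counting rules above can be written as a small function. This is a minimal sketch, and the `LLMCall` type and its fields are hypothetical, not part of any IOA Cloud SDK. It shows that the number of models and the outcome of a call do not change the count, while every attempt (including each retry) does.

```python
from dataclasses import dataclass


@dataclass
class LLMCall:
    """Hypothetical description of one logical LLM call made by your app."""
    models: int = 1        # LLMs consulted (e.g. 3 for a 3-model consensus)
    retries: int = 0       # retries sent after the first attempt
    blocked: bool = False  # failed or blocked by governance


def metered_requests(call: LLMCall) -> int:
    """Governance requests metered for one logical call.

    Model count and outcome are ignored: each attempt that reaches the
    governance layer is one request, so every retry adds one request.
    """
    return 1 + call.retries


# Mirrors the examples above.
assert metered_requests(LLMCall()) == 1              # single LLM call
assert metered_requests(LLMCall(models=3)) == 1      # 3-model consensus
assert metered_requests(LLMCall(blocked=True)) == 1  # failed/blocked call
assert metered_requests(LLMCall(retries=2)) == 3     # first try + 2 retries
```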

Note: Consensus mode uses multiple LLMs internally but counts as only one governance request toward your quota. Consensus requests are also tracked on a separate meter for add-on billing.

Consensus Request Metering

When using Consensus Mode (multi-LLM quorum), IOA tracks two separate meters:

Standard Request Meter

  • What it counts: Number of governance evaluations
  • Applies to: All modes (Shadow, Enforce, Consensus, Federated)
  • Example: 1 consensus call with 3 LLMs = 1 standard request

Consensus Request Meter

  • What it counts: Consensus-specific calls (multi-LLM fan-out)
  • Applies to: Consensus mode only
  • Example: 1 consensus call with 3 LLMs = 1 consensus request

Why separate? Consensus mode has higher compute costs (multiple LLM calls, quorum logic, dissent recording).

Consensus Metering Example

You make 100 LLM calls in a month:

  • 90 calls in Enforce mode → 90 standard requests, 0 consensus requests
  • 10 calls in Consensus mode → 10 standard requests, 10 consensus requests

Total metered: 100 standard requests + 10 consensus requests
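
A tally over a month of calls reproduces the numbers above. This sketch assumes each call is recorded by its governance mode as a lowercase string ("shadow", "enforce", "consensus", "federated"). That representation is illustrative only.

```python
from typing import Dict, Iterable


def tally_meters(modes: Iterable[str]) -> Dict[str, int]:
    """Count standard and consensus requests for a sequence of calls.

    Every call increments the standard meter; only consensus-mode calls
    also increment the consensus meter (once, regardless of model count).
    """
    meters = {"standard": 0, "consensus": 0}
    for mode in modes:
        meters["standard"] += 1
        if mode == "consensus":
            meters["consensus"] += 1
    return meters


month = ["enforce"] * 90 + ["consensus"] * 10
print(tally_meters(month))  # {'standard': 100, 'consensus': 10}
```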

Consensus Add-On Pricing

Consensus Packs add to your consensus request allowance:

  • Consensus Pack: 10,000 consensus requests/month for $299/month
  • Multiple Packs: stackable (e.g., 2 packs = 20,000 consensus requests/month) for $299 × quantity per month

Important: If you exceed your consensus request allowance, consensus-mode calls will be rate-limited. Standard requests (Shadow/Enforce) are unaffected.
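
The pack arithmetic and the cut-off described above can be sketched as follows. The 10,000-request pack size and $299 price come from the pack listing. The `included` parameter stands for any consensus allowance that comes with your plan. This page does not state one, so treat it as an assumption; it defaults to 0.

```python
PACK_SIZE = 10_000    # consensus requests per Consensus Pack per month
PACK_PRICE_USD = 299  # monthly price per pack


def consensus_allowance(packs: int, included: int = 0) -> int:
    """Monthly consensus allowance; packs stack linearly."""
    return included + packs * PACK_SIZE


def packs_cost_usd(packs: int) -> int:
    """Monthly cost of the packs ($299 x quantity)."""
    return packs * PACK_PRICE_USD


def consensus_call_allowed(used: int, packs: int, included: int = 0) -> bool:
    """False once the consensus meter reaches the allowance.

    Past that point consensus-mode calls are rate-limited. Shadow/Enforce
    calls are unaffected, so apply this check to consensus traffic only.
    """
    return used < consensus_allowance(packs, included)


print(consensus_allowance(2), packs_cost_usd(2))  # 20000 598
print(consensus_call_allowed(19_999, 2))           # True
print(consensus_call_allowed(20_000, 2))           # False
```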

Plan Quotas

Each plan includes a monthly request allowance with soft rate limits and burst capabilities.

Launch (Free)

  • Requests/Month: 1,000
  • RPS Soft Limit: 2
  • Burst Allowance: 10% (+100 requests)

Scale ($49/mo)

  • Requests/Month: 25,000
  • RPS Soft Limit: 5
  • Burst Allowance: 10% (+2,500 requests)

Trust (Custom)

  • Requests/Month: Unlimited
  • RPS Soft Limit: Custom
  • Burst Allowance: Negotiated
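
For client-side budgeting, you can keep the published plan figures in a small lookup table. The dictionary keys and field names in this sketch are illustrative. The parenthetical request counts in the plan listing equal 10% of each monthly allowance, and the helper reproduces that arithmetic. Trust is left out because its limits are negotiated.

```python
from typing import Dict, TypedDict


class PlanLimits(TypedDict):
    monthly_requests: int
    rps_soft_limit: float
    burst_pct: int


# Figures from the plan listing (Trust is custom, so it is not listed).
PLANS: Dict[str, PlanLimits] = {
    "launch": {"monthly_requests": 1_000, "rps_soft_limit": 2, "burst_pct": 10},
    "scale": {"monthly_requests": 25_000, "rps_soft_limit": 5, "burst_pct": 10},
}


def burst_requests(plan: str) -> int:
    """Extra requests implied by the burst percentage of the monthly allowance."""
    limits = PLANS[plan]
    return limits["monthly_requests"] * limits["burst_pct"] // 100


print(burst_requests("launch"), burst_requests("scale"))  # 100 2500
```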

Burst Behavior

IOA Cloud allows temporary bursts above your soft rate limit to handle traffic spikes gracefully.

  1. Soft limit exceeded: When you exceed your RPS soft limit, IOA Cloud lets you burst up to 10% above it for up to 1 hour.
  2. Burst window: During the burst window, requests are processed normally but still count toward your monthly quota.
  3. Rate limiting: After the burst window expires, requests exceeding the soft limit receive HTTP 429 responses.

Example: Scale Plan Burst

  • Scale plan soft limit: 5 RPS
  • Burst allowance: 10%, i.e. up to 5.5 RPS for 1 hour
  • After 1 hour: requests above 5 RPS return HTTP 429
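
One way to model the burst rule is as a soft limit raised by 10% for one hour from the moment it is first exceeded. This sketch only restates the behaviour described above. It is not IOA Cloud's server-side implementation: how the service starts the burst window and measures per-second rates is assumed here.

```python
BURST_PCT = 10
BURST_WINDOW_S = 3600  # 1 hour


def effective_rps_limit(soft_limit: float, seconds_since_exceeded: float) -> float:
    """Allowed RPS given how long ago the soft limit was first exceeded.

    Inside the burst window the limit is raised by BURST_PCT percent;
    afterwards it falls back to the soft limit.
    """
    if seconds_since_exceeded < BURST_WINDOW_S:
        return soft_limit * (100 + BURST_PCT) / 100
    return soft_limit


def would_get_429(current_rps: float, soft_limit: float,
                  seconds_since_exceeded: float) -> bool:
    return current_rps > effective_rps_limit(soft_limit, seconds_since_exceeded)


# Scale plan (5 RPS soft limit)
print(effective_rps_limit(5, 600))   # 5.5 (10 minutes into the burst)
print(would_get_429(5.4, 5, 600))    # False
print(would_get_429(5.4, 5, 3_600))  # True (burst window over)
```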

HTTP 429 Rate Limiting

When you exceed your rate limits, IOA Cloud returns an HTTP 429 response with headers that tell you how long to wait and how much capacity remains.

Rate Limit Exceeded

  • Response code: 429 Too Many Requests
  • Retry-After: time until the rate limit resets
  • X-RateLimit-Limit: your current RPS limit
  • X-RateLimit-Remaining: requests remaining in the current window
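
A client can read these headers from a 429 response to decide how long to wait. This minimal sketch uses only the Python standard library. It assumes `Retry-After` usually carries a number of seconds, and it also accepts the HTTP-date form that the HTTP specification permits. How you get the header mapping depends on your HTTP client.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


@dataclass
class RateLimitInfo:
    retry_after_s: Optional[float]  # seconds to wait before retrying
    limit: Optional[float]          # X-RateLimit-Limit (current RPS limit)
    remaining: Optional[int]        # X-RateLimit-Remaining in this window


def parse_rate_limit(headers: Mapping[str, str],
                     now: Optional[datetime] = None) -> RateLimitInfo:
    # Header names are case-insensitive, so normalize them first.
    h = {k.lower(): v.strip() for k, v in headers.items()}

    retry_after: Optional[float] = None
    raw = h.get("retry-after")
    if raw:
        try:
            retry_after = float(raw)  # delta-seconds form, e.g. "3"
        except ValueError:
            when = parsedate_to_datetime(raw)  # HTTP-date form
            if when.tzinfo is None:
                when = when.replace(tzinfo=timezone.utc)
            if now is None:
                now = datetime.now(timezone.utc)
            retry_after = max(0.0, (when - now).total_seconds())

    limit = h.get("x-ratelimit-limit")
    remaining = h.get("x-ratelimit-remaining")
    return RateLimitInfo(
        retry_after_s=retry_after,
        limit=float(limit) if limit else None,
        remaining=int(remaining) if remaining else None,
    )


info = parse_rate_limit({"Retry-After": "3", "X-RateLimit-Limit": "5",
                         "X-RateLimit-Remaining": "0"})
print(info.retry_after_s, info.limit, info.remaining)  # 3.0 5.0 0
```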

Recommended Retry Strategies

  • Exponential backoff: increase the delay between retries (1s, 2s, 4s, 8s, 16s).
  • Jitter: add random variation to each delay so that many clients do not retry in lockstep (the "thundering herd" problem).
  • Circuit breaker: stop retrying after multiple consecutive failures.
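
The three strategies work well together. The sketch below shows one way to combine them, and several parts are assumptions. `send` is a hypothetical callable that stands in for your HTTP request and returns the status code plus the parsed `Retry-After` seconds (or `None`). The 5-retry cap, the 16 s ceiling, the failure threshold, and the cooldown are illustrative defaults. Jitter uses the "full jitter" variant, which picks a uniform random delay up to the backoff ceiling, and the server's `Retry-After` acts as a minimum wait. Each retry counts as a separate request, so capping retries also caps how much quota a burst of 429s can consume.

```python
import random
import time
from typing import Callable, Optional, Tuple

# Hypothetical: send() performs one request and returns
# (status_code, retry_after_seconds or None).
Send = Callable[[], Tuple[int, Optional[float]]]


class CircuitOpenError(RuntimeError):
    """Raised when the breaker is open and the call is not attempted."""


class CircuitBreaker:
    """Opens after `threshold` consecutive failures, stays open for `cooldown_s`."""

    def __init__(self, threshold: int = 5, cooldown_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.threshold = threshold
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        if self.clock() - self.opened_at >= self.cooldown_s:
            # Half-open: allow a trial request; one more failure re-opens.
            self.opened_at = None
            self.failures = self.threshold - 1
            return True
        return False

    def record(self, ok: bool) -> None:
        if ok:
            self.failures = 0
            return
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = self.clock()


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 16.0,
                  rng: Callable[[], float] = random.random) -> float:
    """Full jitter: uniform in [0, min(cap, base * 2**attempt)).

    The ceilings for attempts 0..4 are 1, 2, 4, 8 and 16 seconds.
    """
    return rng() * min(cap, base * 2 ** attempt)


def call_with_retries(send: Send, breaker: CircuitBreaker, max_retries: int = 5,
                      sleep: Callable[[float], None] = time.sleep,
                      rng: Callable[[], float] = random.random) -> int:
    """Retry HTTP 429 responses; every attempt counts toward your quota."""
    for attempt in range(max_retries + 1):
        if not breaker.allow():
            raise CircuitOpenError("circuit open: skipping request")
        status, retry_after = send()
        breaker.record(status != 429)  # only 429 counts as a failure here
        if status != 429:
            return status
        if attempt == max_retries:
            break
        delay = backoff_delay(attempt, rng=rng)
        if retry_after is not None:
            delay = max(delay, retry_after)  # never retry earlier than the server asks
        sleep(delay)
    return 429
```

Share one `CircuitBreaker` across all calls to the same endpoint so that it sees consecutive failures from every caller, not just one.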

Auto-Add-On Purchasing

Scale plan users can enable automatic purchasing of Capacity Packs when approaching quota limits.

  • Automatic scaling: Capacity Packs are purchased automatically when you reach 90% of your monthly quota.
  • Cost control: set spending limits and receive notifications before auto-purchases occur.
  • Seamless experience: no service interruption; requests continue processing while add-ons are activated.
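
The 90% trigger and the spending limit reduce to a simple check. This sketch is illustrative only. The function name and return values are hypothetical, and so is the idea of comparing month-to-date add-on spend plus one pack's price against your limit. This page does not give Capacity Pack prices, so `pack_price_usd` is a parameter and the price in the example is made up.

```python
def auto_purchase_decision(used: int, quota: int, spent_usd: float,
                           pack_price_usd: float, spend_limit_usd: float,
                           threshold_pct: int = 90) -> str:
    """Return "none", "purchase", or "blocked_by_limit".

    Integer percentages avoid floating-point edge cases at the 90% mark.
    """
    if used * 100 < threshold_pct * quota:
        return "none"              # below 90% of the monthly quota
    if spent_usd + pack_price_usd > spend_limit_usd:
        return "blocked_by_limit"  # another pack would exceed your cap
    return "purchase"


# Scale plan: 90% of 25,000 is 22,500 (pack price of 20 is a made-up example).
print(auto_purchase_decision(22_499, 25_000, 0, 20, 100))   # none
print(auto_purchase_decision(22_500, 25_000, 0, 20, 100))   # purchase
print(auto_purchase_decision(22_500, 25_000, 90, 20, 100))  # blocked_by_limit
```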

Frequently Asked Questions

What happens if I exceed my monthly quota?

You'll receive HTTP 429 responses until your quota resets at the beginning of the next month. You can purchase Capacity Packs to increase your quota immediately.

Can I monitor my usage in real-time?

Yes, the IOA Cloud dashboard shows your current usage, remaining quota, and projected monthly consumption based on current trends.
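
This page does not document how the dashboard computes its projection. A rough client-side equivalent is a straight-line projection that extrapolates month-to-date usage to the full month. This is an assumed approach for illustration and may differ from the dashboard's own figure.

```python
import calendar
from datetime import date


def project_monthly_usage(used_to_date: int, today: date) -> float:
    """Linear projection: average daily usage so far times days in the month."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    return used_to_date / today.day * days_in_month


projected = project_monthly_usage(10_000, date(2024, 4, 10))
print(projected)           # 30000.0
print(projected > 25_000)  # True: on track to exceed the Scale plan allowance
```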

Do failed requests count toward my quota?

Yes, all requests that pass through the governance layer count toward your quota, regardless of whether they succeed or fail.

How accurate is the usage tracking?

Usage is tracked in real time with 99.9% accuracy. Minor discrepancies may occur due to network latency and retry attempts.
