Usage & Quotas
Understand how IOA Cloud meters requests, applies quota limits, handles bursts, and rate-limits with HTTP 429.
What Counts as a Request?
In IOA Cloud, a request is a single governance evaluation event. Each time your application makes an LLM call that passes through IOA's governance layer, it counts as one request.
Request Examples
- Single LLM Call: 1 request
- Multi-LLM Consensus (3 models): 1 request (not 3)
- Failed/Blocked Call: Still counts as 1 request
- Retries: Each retry counts as a separate request
Note: Consensus mode calls multiple LLMs internally but counts as only one governance request toward your quota. Consensus requests are also tracked on a separate meter for add-on billing.
Consensus Request Metering
When using Consensus Mode (multi-LLM quorum), IOA tracks two separate meters:
Standard Request Meter
What it counts: Number of governance evaluations
Applies to: All modes (Shadow, Enforce, Consensus, Federated)
Example: 1 consensus call with 3 LLMs = 1 standard request
Consensus Request Meter
What it counts: Consensus-specific calls (multi-LLM fan-out)
Applies to: Only Consensus mode
Example: 1 consensus call with 3 LLMs = 1 consensus request
Why separate? Consensus mode has higher compute costs (multiple LLM calls, quorum logic, dissent recording).
Consensus Metering Example
You make 100 LLM calls in a month:
- 90 calls in Enforce mode → 90 standard requests, 0 consensus requests
- 10 calls in Consensus mode → 10 standard requests, 10 consensus requests
Total metered: 100 standard requests + 10 consensus requests
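The two-meter rule above can be sketched in a few lines of Python. This is only an illustration of the counting logic described here, not IOA Cloud's actual metering code; the `meter` function and the lowercase mode strings are hypothetical names chosen for the example.

```python
# Modes named in this document. Every call feeds the standard meter;
# only consensus calls also feed the consensus meter.
MODES = {"shadow", "enforce", "consensus", "federated"}


def meter(calls):
    """Return (standard_requests, consensus_requests) for a list of mode names."""
    standard = 0
    consensus = 0
    for mode in calls:
        if mode not in MODES:
            raise ValueError(f"unknown mode: {mode!r}")
        standard += 1            # one governance evaluation per call
        if mode == "consensus":
            consensus += 1       # fan-out to several LLMs still counts once
    return standard, consensus


# The month from the example: 90 Enforce calls and 10 Consensus calls.
calls = ["enforce"] * 90 + ["consensus"] * 10
print(meter(calls))  # (100, 10)
```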
Consensus Add-On Pricing
Consensus Packs add to your consensus request allowance:
| Add-On | Consensus Requests | Price |
|---|---|---|
| Consensus Pack | 10,000 consensus requests/month | $299/month |
| Multiple Packs | Stackable (e.g., 2 packs = 20k requests) | $299 × quantity |
Important: If you exceed your consensus request allowance, consensus-mode calls will be rate-limited. Standard requests (Shadow/Enforce) are unaffected.
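As a rough planning aid, the pack arithmetic from the table can be written out as below. The pack size and price come from the table; the `included` parameter (a base consensus allowance) is an assumption added for the example, since this page does not state how many consensus requests a plan includes before packs are added.

```python
PACK_SIZE = 10_000       # consensus requests per pack, from the table
PACK_PRICE_USD = 299     # price per pack per month, from the table


def packs_needed(expected_consensus_requests, included=0):
    """Smallest number of packs covering the expected monthly consensus volume.

    `included` is a hypothetical base allowance; it defaults to 0.
    """
    shortfall = max(0, expected_consensus_requests - included)
    return -(-shortfall // PACK_SIZE)  # integer ceiling division


def monthly_cost_usd(packs):
    """Packs are stackable, so cost is price times quantity."""
    return packs * PACK_PRICE_USD


packs = packs_needed(25_000)
print(packs, monthly_cost_usd(packs))  # 3 897
```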
Plan Quotas
Each plan includes a monthly request allowance with soft rate limits and burst capabilities.
- Launch: Free
- Scale: $49/mo
- Trust: Custom
Burst Behavior
IOA Cloud allows temporary bursts above your soft rate limit to handle traffic spikes gracefully.
Soft Limit Exceeded
When you exceed your RPS soft limit, IOA Cloud permits bursts of up to 10% above that limit for up to 1 hour.
Burst Window
During the burst window, requests are processed normally but count toward your monthly quota.
Rate Limiting
After the burst window expires, requests exceeding the soft limit receive HTTP 429 responses.
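The three stages above can be modelled as a small decision function. This is a sketch of the behavior as described on this page, not the service's implementation. One point is an assumption: the page does not say what happens to traffic above the 10% ceiling while the burst window is still open, so the sketch treats it as rate-limited. The function names and the `seconds_in_burst` parameter are hypothetical.

```python
BURST_PERCENT = 10            # burst allowance above the soft limit
BURST_WINDOW_SECONDS = 3600   # burst is tolerated for up to 1 hour


def burst_ceiling(soft_limit_rps):
    """Highest rate tolerated during the burst window."""
    return soft_limit_rps * (100 + BURST_PERCENT) / 100


def is_allowed(rate_rps, soft_limit_rps, seconds_in_burst):
    """True if traffic at `rate_rps` would be processed rather than get a 429.

    `seconds_in_burst` is how long the caller has been above the soft limit.
    """
    if rate_rps <= soft_limit_rps:
        return True                                   # within the soft limit
    if seconds_in_burst < BURST_WINDOW_SECONDS:
        # Assumption: anything above the ceiling is rejected even in the window.
        return rate_rps <= burst_ceiling(soft_limit_rps)
    return False                                      # window expired: HTTP 429


print(burst_ceiling(5))          # 5.5
print(is_allowed(5.5, 5, 600))   # True
print(is_allowed(5.5, 5, 3600))  # False
```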
Example: Scale Plan Burst
Scale Plan: 5 RPS soft limit
Burst Allowance: 10% = 5.5 RPS for 1 hour
After 1 hour: Requests > 5 RPS return HTTP 429
HTTP 429 Rate Limiting
When you exceed your rate limits, IOA Cloud returns HTTP 429 responses with helpful information.
Rate Limit Exceeded
Response Code: 429 Too Many Requests
Retry-After: Time until rate limit resets
X-RateLimit-Limit: Your current RPS limit
X-RateLimit-Remaining: Requests remaining in current window
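A client can use these headers to decide how long to wait. The page does not specify the format of `Retry-After`, so the sketch below accepts both forms the HTTP specification allows (a number of seconds or an HTTP date) and falls back to a default otherwise. The function name and the `default` value are assumptions for illustration; `headers` is a plain dict here, whereas real HTTP clients usually expose a case-insensitive mapping.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_delay_seconds(headers, now=None, default=1.0):
    """Seconds to wait before retrying, derived from a 429 response's headers."""
    value = headers.get("Retry-After")
    if value is None:
        return default
    value = value.strip()
    if value.isdigit():
        return float(value)               # delta-seconds form, e.g. "30"
    try:
        when = parsedate_to_datetime(value)   # HTTP-date form
    except (TypeError, ValueError):
        return default                    # unparseable: fall back
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


print(retry_delay_seconds({"Retry-After": "30"}))  # 30.0
```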
Recommended Retry Strategies
Exponential Backoff
Increase delay between retries: 1s, 2s, 4s, 8s, 16s
Jitter
Add random variation to prevent thundering herd
Circuit Breaker
Stop retrying after multiple consecutive failures
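The three strategies combine naturally into one retry loop. The following is a minimal sketch, not an official client: `send` is a hypothetical zero-argument callable returning a `(status, body)` tuple, and the jitter fraction and failure threshold are illustrative defaults. Keep in mind that each retry counts as a separate request toward your quota, so a low failure threshold is usually worthwhile.

```python
import random
import time


class CircuitOpen(Exception):
    """Raised when too many consecutive attempts were rate-limited."""


def backoff_delay(attempt, base=1.0, factor=2.0, jitter=0.5, rng=random.random):
    """Delay in seconds before retry number `attempt` (0-based).

    With jitter=0 the schedule is 1s, 2s, 4s, 8s, 16s. Otherwise each delay
    is stretched by a random fraction of up to `jitter` of itself, so that
    many clients do not retry at the same instant.
    """
    delay = base * factor ** attempt
    return delay * (1.0 + jitter * rng())


def call_with_retries(send, max_failures=5, sleep=time.sleep, jitter=0.5):
    """Call `send` until it stops returning 429 or the breaker trips."""
    failures = 0
    while True:
        status, body = send()
        if status != 429:
            return status, body
        failures += 1
        if failures >= max_failures:
            # Circuit breaker: stop after consecutive rate-limited attempts.
            raise CircuitOpen(f"{failures} consecutive HTTP 429 responses")
        sleep(backoff_delay(failures - 1, jitter=jitter))
```

Passing `sleep` and `rng` as parameters keeps the loop easy to test without real waiting.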
Auto-Add-On Purchasing
Scale plan users can enable automatic purchasing of Capacity Packs when approaching quota limits.
Automatic Scaling
Capacity Packs are automatically purchased when you reach 90% of your monthly quota.
Cost Control
Set spending limits and receive notifications before auto-purchases occur.
Seamless Experience
No service interruption - requests continue processing while add-ons are activated.
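The trigger and the cost control interact as sketched below. This is only an illustration of the rule as described here: the 90% threshold comes from this page, while the function name, its parameters, and the idea of checking a pack's price against the remaining budget are assumptions, since Capacity Pack pricing and the exact spending-limit semantics are not given on this page.

```python
AUTO_PURCHASE_THRESHOLD_PERCENT = 90  # from this page


def should_auto_purchase(used, quota, pack_price, spent_so_far, spending_limit):
    """True if a pack would be bought: quota threshold reached and budget allows.

    `pack_price`, `spent_so_far`, and `spending_limit` are hypothetical inputs
    in the same currency unit.
    """
    threshold_reached = used * 100 >= quota * AUTO_PURCHASE_THRESHOLD_PERCENT
    within_budget = spent_so_far + pack_price <= spending_limit
    return threshold_reached and within_budget


# 9,000 of 10,000 requests used, with room left in the budget.
print(should_auto_purchase(9_000, 10_000, pack_price=50, spent_so_far=0, spending_limit=100))  # True
```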
Frequently Asked Questions
What happens if I exceed my monthly quota?
You'll receive HTTP 429 responses until your quota resets at the beginning of the next month. You can purchase Capacity Packs to increase your quota immediately.
Can I monitor my usage in real-time?
Yes, the IOA Cloud dashboard shows your current usage, remaining quota, and projected monthly consumption based on current trends.
Do failed requests count toward my quota?
Yes, all requests that pass through the governance layer count toward your quota, regardless of whether they succeed or fail.
How accurate is the usage tracking?
Usage is tracked in real-time with 99.9% accuracy. Minor discrepancies may occur due to network latency and retry attempts.