Usage & Quotas
Understand how IOA Cloud meters requests, applies quota limits, handles traffic bursts, and rate-limits with HTTP 429.
What Counts as a Request?
In IOA Cloud, a request is a single governance evaluation event. Each time your application makes an LLM call that passes through IOA's governance layer, it counts as one request.
Request Examples
- Single LLM Call: 1 request
- Multi-LLM Consensus (3 models): 1 request (not 3)
- Failed/Blocked Call: Still counts as 1 request
- Retries: Each retry counts as a separate request
Note: Consensus mode uses multiple LLMs internally but still counts as one governance request toward your request quota. Consensus usage is additionally tracked in a separate consensus-token meter for add-on billing.
Consensus Token Metering
When using Consensus Mode (multi-LLM quorum), IOA tracks two separate meters:
Standard Request Meter
What it counts: Number of governance evaluations
Applies to: All modes (Shadow, Enforce, Consensus, Federated)
Example: 1 consensus call with 3 LLMs = 1 standard request
Consensus Token Meter
What it counts: Consensus tokens consumed by multi-LLM quorum runs
Applies to: Only Consensus mode
Example: 1 consensus call consumes consensus tokens; the amount depends on the models used
Why separate? Consensus mode has higher compute costs (multiple LLM calls, quorum logic, dissent recording).
Consensus Metering Example
You make 100 LLM calls in a month:
- 90 calls in Enforce mode → 90 standard requests, 0 consensus tokens
- 10 calls in Consensus mode → 10 standard requests + consensus-token usage for those 10 calls
Total metered: 100 standard requests + consensus tokens consumed
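The two-meter tally above can be reproduced with a few lines of Python. This is only an illustrative sketch: `tally_usage` and the lowercase mode names are hypothetical and not part of any IOA Cloud SDK, and actual consensus-token consumption is model-dependent, so only the number of consensus calls is counted here.

```python
from collections import Counter

def tally_usage(call_modes):
    """Tally metered usage from a list of mode names, one entry per governed call.

    Hypothetical helper for illustration; retries must appear as separate entries
    because each retry counts as its own request.
    """
    counts = Counter(call_modes)
    return {
        # Every governed call counts once, whatever the mode.
        "standard_requests": sum(counts.values()),
        # Consensus calls additionally draw from the consensus-token meter.
        "consensus_calls": counts.get("consensus", 0),
    }

usage = tally_usage(["enforce"] * 90 + ["consensus"] * 10)
print(usage)  # {'standard_requests': 100, 'consensus_calls': 10}
```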
Consensus Add-On Pricing
Consensus Packs add to your consensus token allowance:
| Add-On | Consensus Tokens | Price |
|---|---|---|
| Consensus Pack | 50,000 consensus tokens/month | $49/month |
| Multiple Packs | Stackable (e.g., 2 packs = 100k tokens) | $49 × quantity |
Important: If you exceed your consensus token allowance, consensus-mode calls will be rate-limited. Standard requests (Shadow/Enforce) are unaffected.
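To estimate how many Consensus Packs a workload needs, the pricing table can be turned into a small calculation. The sketch below uses the pack size and price from the table. The `included_tokens` parameter is an assumption, since this page does not state a base consensus allowance per plan.

```python
import math

PACK_TOKENS = 50_000   # consensus tokens per pack per month (from the table)
PACK_PRICE_USD = 49    # price per pack per month (from the table)

def packs_needed(expected_tokens, included_tokens=0):
    """Return the number of packs needed to cover expected monthly consensus tokens.

    included_tokens is a hypothetical base allowance; it defaults to 0.
    """
    shortfall = max(0, expected_tokens - included_tokens)
    return math.ceil(shortfall / PACK_TOKENS)

def monthly_pack_cost(packs):
    """Packs are stackable, so the cost is simply price times quantity."""
    return packs * PACK_PRICE_USD

packs = packs_needed(120_000)
print(packs, monthly_pack_cost(packs))  # 3 147
```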
Plan Quotas
Each plan includes a monthly request allowance with soft rate limits and burst capabilities.
- Launch: Free
- Scale: $299/mo
- Trust: Custom
QiXChat Self-Serve Plans
QiXChat plans meter governed requests (RQM): each policy-evaluated chat turn counts as one request.
- QiXChat Solo: $39/mo
- QiXChat Practice: $119/mo
- QiXChat Growth: $279/mo
Burst Behavior
IOA Cloud allows temporary bursts above your soft rate limit to handle traffic spikes gracefully.
Soft Limit Exceeded
When you exceed your RPS soft limit, IOA Cloud allows a 10% burst allowance for up to 1 hour.
Burst Window
During the burst window, requests are processed normally but count toward your monthly quota.
Rate Limiting
After the burst window expires, requests exceeding the soft limit receive HTTP 429 responses.
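The three stages above can be modeled on the client side to predict when requests will start being rejected. This is a rough sketch, not IOA Cloud's actual limiter: the function names are hypothetical, and it assumes the burst window starts the first time the soft limit is exceeded and ends exactly one hour later.

```python
BURST_PERCENT = 10            # burst allowance above the soft limit
BURST_WINDOW_SECONDS = 3600   # burst is allowed for up to 1 hour

def effective_rps_limit(soft_limit_rps, seconds_since_burst_start):
    """Return the RPS ceiling that applies right now.

    seconds_since_burst_start is None if the soft limit has not been exceeded yet
    (assumption: the window opens on the first excess request).
    """
    in_window = (
        seconds_since_burst_start is not None
        and seconds_since_burst_start < BURST_WINDOW_SECONDS
    )
    if in_window:
        return soft_limit_rps * (100 + BURST_PERCENT) / 100
    return float(soft_limit_rps)

def is_rate_limited(current_rps, soft_limit_rps, seconds_since_burst_start):
    """True if traffic at current_rps would be expected to receive HTTP 429."""
    return current_rps > effective_rps_limit(soft_limit_rps, seconds_since_burst_start)

print(effective_rps_limit(100, 600))    # 110.0
print(is_rate_limited(105, 100, 600))   # False
print(is_rate_limited(105, 100, 4000))  # True
```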
Example: Scale Plan Burst
Scale Plan: 100 RPS soft limit
Burst Allowance: 10% = 110 RPS for 1 hour
After 1 hour: Requests > 100 RPS return HTTP 429
HTTP 429 Rate Limiting
When you exceed your rate limits, IOA Cloud returns HTTP 429 responses with helpful information.
Rate Limit Exceeded
Response Code: 429 Too Many Requests
Retry-After: Time until rate limit resets
X-RateLimit-Limit: Your current RPS limit
X-RateLimit-Remaining: Requests remaining in current window
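A client can read these headers to decide how long to wait. The sketch below parses them from a plain dictionary. The helper name is hypothetical, and it assumes all three values are integers, with `Retry-After` given in seconds. HTTP also permits `Retry-After` as a date, in which case this sketch returns `None` for that field.

```python
def parse_rate_limit_headers(headers):
    """Extract rate-limit information from a 429 response's headers.

    Header names are matched case-insensitively. Missing or non-integer
    values are returned as None rather than raising.
    """
    lowered = {name.lower(): value for name, value in headers.items()}

    def as_int(name):
        try:
            return int(lowered.get(name))
        except (TypeError, ValueError):
            return None

    return {
        "retry_after_seconds": as_int("retry-after"),
        "limit_rps": as_int("x-ratelimit-limit"),
        "remaining": as_int("x-ratelimit-remaining"),
    }

info = parse_rate_limit_headers({
    "Retry-After": "30",
    "X-RateLimit-Limit": "100",
    "X-RateLimit-Remaining": "0",
})
print(info)  # {'retry_after_seconds': 30, 'limit_rps': 100, 'remaining': 0}
```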
Recommended Retry Strategies
Exponential Backoff
Increase delay between retries: 1s, 2s, 4s, 8s, 16s
Jitter
Add random variation to prevent thundering herd
Circuit Breaker
Stop retrying after multiple consecutive failures
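The three strategies can be combined in one retry loop. The following is a minimal sketch under stated assumptions: `send` is a hypothetical callable that returns a `(status_code, retry_after_seconds_or_None)` tuple, and the "circuit breaker" is the simple form described above (give up after a fixed number of consecutive 429s), not a full stateful breaker shared across calls.

```python
import random
import time

class CircuitOpenError(Exception):
    """Raised when retries are abandoned after repeated 429 responses."""

def backoff_delays(max_retries=5, base=1.0, factor=2.0):
    """Exponential backoff schedule: 1s, 2s, 4s, 8s, 16s by default."""
    return [base * factor ** attempt for attempt in range(max_retries)]

def call_with_retries(send, max_retries=5, sleep=time.sleep):
    """Call send() until it stops returning 429 or the retry budget is spent."""
    delays = backoff_delays(max_retries)
    for attempt in range(max_retries + 1):
        status, retry_after = send()
        if status != 429:
            return status
        if attempt == max_retries:
            break  # stop retrying after consecutive failures
        # Wait at least as long as the server asked for, if it said anything.
        wait = max(delays[attempt], retry_after or 0)
        # Jitter: add up to 50% extra so clients do not retry in lockstep.
        sleep(wait + random.uniform(0, wait / 2))
    raise CircuitOpenError(f"still rate-limited after {max_retries} retries")

print(backoff_delays())  # [1.0, 2.0, 4.0, 8.0, 16.0]
```

Because each retry counts as a separate request toward the monthly quota, a small retry budget helps keep usage down as well as load.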
Auto-Add-On Purchasing
Scale plan users can enable automatic purchasing of Capacity Packs when approaching quota limits.
Automatic Scaling
Capacity Packs are automatically purchased when you reach 90% of your monthly quota.
Cost Control
Set spending limits and receive notifications before auto-purchases occur.
Seamless Experience
No service interruption: requests continue processing while add-ons are activated.
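The trigger and the spending guard can be expressed as a single check. This sketch is illustrative only: the function name and all parameters are hypothetical, the Capacity Pack price is not stated on this page and must be supplied by the caller, and the way a spending limit blocks a purchase is an assumption.

```python
AUTO_PURCHASE_PERCENT = 90  # purchase triggers at 90% of the monthly quota

def should_auto_purchase(used, monthly_quota, spent_usd, spend_limit_usd, pack_price_usd):
    """Return True if a Capacity Pack purchase would be triggered and allowed.

    Assumption: a purchase is skipped when it would push add-on spending
    past the configured limit.
    """
    # Integer arithmetic avoids floating-point surprises at the threshold.
    threshold_reached = used * 100 >= monthly_quota * AUTO_PURCHASE_PERCENT
    within_budget = spent_usd + pack_price_usd <= spend_limit_usd
    return threshold_reached and within_budget

# pack_price_usd=50 is a placeholder value, not a published price.
print(should_auto_purchase(9_000, 10_000, 0, 100, 50))   # True
print(should_auto_purchase(8_999, 10_000, 0, 100, 50))   # False
print(should_auto_purchase(9_500, 10_000, 80, 100, 50))  # False
```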
Frequently Asked Questions
What happens if I exceed my monthly quota?
You'll receive HTTP 429 responses until your quota resets at the beginning of the next month. You can purchase Capacity Packs to increase your quota immediately.
Can I monitor my usage in real-time?
Yes, the IOA Cloud dashboard shows your current usage, remaining quota, and projected monthly consumption based on current trends.
Do failed requests count toward my quota?
Yes, all requests that pass through the governance layer count toward your quota, regardless of whether they succeed or fail.
How accurate is the usage tracking?
Usage is tracked in real-time with 99.9% accuracy. Minor discrepancies may occur due to network latency and retry attempts.