Usage & Quotas

Learn how IOA Cloud meters requests, enforces quota limits, handles bursts, and applies HTTP 429 rate limiting.

What Counts as a Request?

In IOA Cloud, a request is a single governance evaluation event. Each time your application makes an LLM call that passes through IOA's governance layer, it counts as one request.

Request Examples

  • Single LLM Call: 1 request
  • Multi-LLM Consensus (3 models): 1 request (not 3)
  • Failed/Blocked Call: Still counts as 1 request
  • Retries: Each retry counts as a separate request

Note: Consensus mode uses multiple LLMs internally but still counts as one governance request toward your request quota. Consensus usage is additionally tracked in a separate consensus-token meter for add-on billing.
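The counting rules above can be sketched in a few lines of Python. This is an illustration only: the `Call` type and the `metered_requests` function are hypothetical names, not part of any IOA Cloud SDK.

```python
from dataclasses import dataclass


@dataclass
class Call:
    """One application-level LLM call sent through the governance layer (hypothetical type)."""
    models: int = 1        # number of LLMs consulted (more than 1 in Consensus mode)
    retries: int = 0       # additional attempts after the first one
    blocked: bool = False  # failed or blocked calls are still metered


def metered_requests(calls: list[Call]) -> int:
    """One request per attempt, regardless of model count or outcome."""
    return sum(1 + call.retries for call in calls)


calls = [
    Call(),              # single LLM call: 1 request
    Call(models=3),      # 3-model consensus: 1 request, not 3
    Call(blocked=True),  # blocked call: still 1 request
    Call(retries=2),     # first attempt plus 2 retries: 3 requests
]
print(metered_requests(calls))  # 6
```

Note that neither `models` nor `blocked` affects the total; only the number of attempts does.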

Consensus Token Metering

When using Consensus Mode (multi-LLM quorum), IOA tracks two separate meters:

Standard Request Meter

What it counts: Number of governance evaluations

Applies to: All modes (Shadow, Enforce, Consensus, Federated)

Example: 1 consensus call with 3 LLMs = 1 standard request

Consensus Token Meter

What it counts: Consensus tokens consumed by multi-LLM quorum runs

Applies to: Only Consensus mode

Example: 1 consensus call consumes model-dependent consensus tokens

Why separate? Consensus mode costs more to run: each call involves multiple LLM calls, quorum logic, and dissent recording.

Consensus Metering Example

You make 100 LLM calls in a month:

  • 90 calls in Enforce mode → 90 standard requests, 0 consensus tokens
  • 10 calls in Consensus mode → 10 standard requests + consensus-token usage for those 10 calls

Total metered: 100 standard requests + consensus tokens consumed
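As a rough sketch, the two meters in this example could be tallied as follows. The function name and the per-call token figure are assumptions made for illustration; actual consensus-token consumption is model-dependent and is not specified here.

```python
def tally_usage(calls):
    """Tally both meters from (mode, consensus_tokens_used) pairs."""
    standard_requests = 0
    consensus_tokens = 0
    for mode, tokens_used in calls:
        standard_requests += 1            # every governed call counts once
        if mode == "consensus":
            consensus_tokens += tokens_used  # only Consensus mode uses tokens
    return standard_requests, consensus_tokens


# Made-up figure for illustration; real consumption depends on the models used.
TOKENS_PER_CONSENSUS_CALL = 1_200

calls = [("enforce", 0)] * 90 + [("consensus", TOKENS_PER_CONSENSUS_CALL)] * 10
print(tally_usage(calls))  # (100, 12000)
```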

Consensus Add-On Pricing

Consensus Packs add to your consensus token allowance:

Add-On | Consensus Tokens | Price
Consensus Pack | 50,000 consensus tokens/month | $49/month
Multiple Packs | Stackable (e.g., 2 packs = 100k tokens) | $49 × quantity

Important: If you exceed your consensus token allowance, consensus-mode calls will be rate-limited. Standard requests (Shadow/Enforce) are unaffected.
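To estimate how many Consensus Packs a workload needs, the pack size and price above can be turned into a small calculation. This is a sketch: the function names are hypothetical, and `included_tokens` (a base allowance that packs add to) is an assumption that defaults to zero because no base figure is given here.

```python
import math

PACK_TOKENS = 50_000   # consensus tokens per pack, per month
PACK_PRICE_USD = 49    # price per pack, per month


def packs_needed(expected_tokens: int, included_tokens: int = 0) -> int:
    """Smallest number of packs covering the expected monthly token usage."""
    shortfall = max(0, expected_tokens - included_tokens)
    return math.ceil(shortfall / PACK_TOKENS)


def monthly_addon_cost(packs: int) -> int:
    """Packs are stackable, so the cost is price times quantity."""
    return packs * PACK_PRICE_USD


packs = packs_needed(120_000)
print(packs, monthly_addon_cost(packs))  # 3 147
```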

Plan Quotas

Each plan includes a monthly request allowance with soft rate limits and burst capabilities.

Launch

Price: Free
Requests/Month: 1,000
RPS Soft Limit: 10
Burst Allowance: 10% (+100 req)

Scale

Price: $299/mo
Requests/Month: 25,000
RPS Soft Limit: 100
Burst Allowance: 10% (+2,500 req)

Trust

Price: Custom
Requests/Month: Unlimited
RPS Soft Limit: Custom
Burst Allowance: Negotiated
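The burst figures in the plan cards follow directly from the monthly allowance. The sketch below encodes the three plans as plain data and recomputes them; the `PLANS` structure and `burst_requests` function are illustrative names, and `None` stands in for the Trust plan's unlimited or negotiated values.

```python
from typing import Optional

PLANS = {
    "Launch": {"requests_per_month": 1_000, "rps_soft_limit": 10, "burst_percent": 10},
    "Scale": {"requests_per_month": 25_000, "rps_soft_limit": 100, "burst_percent": 10},
    # Trust is custom: unlimited requests, negotiated limits.
    "Trust": {"requests_per_month": None, "rps_soft_limit": None, "burst_percent": None},
}


def burst_requests(plan: str) -> Optional[int]:
    """Extra requests implied by the plan's burst percentage, or None if not fixed."""
    quota = PLANS[plan]["requests_per_month"]
    percent = PLANS[plan]["burst_percent"]
    if quota is None or percent is None:
        return None
    return quota * percent // 100


for name in PLANS:
    print(name, burst_requests(name))
```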

QiXChat Self-Serve Plans

QiXChat plans meter governed requests (RQM) — each policy-evaluated chat turn counts as one request.

QiXChat Solo

Price: $39/mo
Governed RQM: 5,000
Seats: 1
Modes: Shadow

QiXChat Practice

Price: $119/mo
Governed RQM: 30,000
Seats: 5
Modes: Shadow + Enforce

QiXChat Growth

Price: $279/mo
Governed RQM: 100,000
Seats: 20
Modes: Shadow + Enforce + Consensus

Burst Behavior

IOA Cloud allows temporary bursts above your soft rate limit to handle traffic spikes gracefully.

1. Soft Limit Exceeded

When you exceed your RPS soft limit, IOA Cloud permits bursts of up to 10% above that limit for up to 1 hour.

2. Burst Window

During the burst window, requests are processed normally and still count toward your monthly quota.

3. Rate Limiting

After the burst window expires, requests exceeding the soft limit receive HTTP 429 responses.

Example: Scale Plan Burst

Scale Plan: 100 RPS soft limit
Burst Allowance: 10% above the soft limit = up to 110 RPS for 1 hour
After 1 hour: Requests above 100 RPS return HTTP 429
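The burst rules can be modeled client-side to predict when 429 responses are likely. This sketch rests on two assumptions that are not spelled out above: traffic above the burst ceiling is rejected even during the window, and the caller tracks how long the current burst has been running. Function and constant names are illustrative.

```python
BURST_PERCENT = 10            # burst headroom above the soft limit
BURST_WINDOW_SECONDS = 3600   # bursts are tolerated for up to 1 hour


def rps_ceiling(soft_limit_rps: int, seconds_in_burst: float) -> int:
    """Highest request rate expected to be served without a 429."""
    if seconds_in_burst < BURST_WINDOW_SECONDS:
        return soft_limit_rps + soft_limit_rps * BURST_PERCENT // 100
    return soft_limit_rps  # window expired: back to the soft limit


def is_rate_limited(current_rps: float, soft_limit_rps: int, seconds_in_burst: float) -> bool:
    """True if the current rate is above what the plan allows right now."""
    return current_rps > rps_ceiling(soft_limit_rps, seconds_in_burst)


# Scale plan: 100 RPS soft limit
print(rps_ceiling(100, 0))              # 110
print(rps_ceiling(100, 3600))           # 100
print(is_rate_limited(105, 100, 1800))  # False
print(is_rate_limited(105, 100, 3600))  # True
```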

HTTP 429 Rate Limiting

When you exceed your rate limits, IOA Cloud returns an HTTP 429 response whose headers indicate when to retry and how much capacity remains.

Rate Limit Exceeded

Response Code: 429 Too Many Requests

Retry-After: Time until rate limit resets

X-RateLimit-Limit: Your current RPS limit

X-RateLimit-Remaining: Requests remaining in current window
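A client can read these headers into a small structure before deciding how to retry. The sketch below assumes `Retry-After` carries a number of seconds; HTTP also permits a date in that header, and the format IOA Cloud uses is not stated here, so non-numeric values are treated as missing. `RateLimitInfo` and `parse_rate_limit_headers` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitInfo:
    retry_after_seconds: Optional[float]  # from Retry-After (assumed to be seconds)
    limit: Optional[int]                  # from X-RateLimit-Limit
    remaining: Optional[int]              # from X-RateLimit-Remaining


def _parse(raw: Optional[str], cast):
    """Convert a header value, returning None if it is absent or malformed."""
    if raw is None:
        return None
    try:
        return cast(raw.strip())
    except ValueError:
        return None


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitInfo:
    # Header names are case-insensitive, so normalize before looking them up.
    lowered = {name.lower(): value for name, value in headers.items()}
    return RateLimitInfo(
        retry_after_seconds=_parse(lowered.get("retry-after"), float),
        limit=_parse(lowered.get("x-ratelimit-limit"), int),
        remaining=_parse(lowered.get("x-ratelimit-remaining"), int),
    )


info = parse_rate_limit_headers(
    {"Retry-After": "2", "X-RateLimit-Limit": "100", "X-RateLimit-Remaining": "0"}
)
print(info.retry_after_seconds, info.limit, info.remaining)  # 2.0 100 0
```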

Recommended Retry Strategies

Exponential Backoff

Increase delay between retries: 1s, 2s, 4s, 8s, 16s

Jitter

Add random variation to prevent thundering herd

Circuit Breaker

Stop retrying after multiple consecutive failures
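The three strategies can be combined in one retry loop. The following is a minimal sketch, not an official client: `send` is a hypothetical callable that performs one request and returns the status code plus any `Retry-After` value in seconds, the 50% jitter range is an arbitrary choice, and the "circuit breaker" is simplified to giving up after a fixed number of consecutive 429 responses rather than tracking open and half-open states.

```python
import random
import time
from typing import Callable, Optional, Tuple


class CircuitOpenError(RuntimeError):
    """Raised when retries are abandoned after too many consecutive 429s."""


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 16.0,
                  rng: Callable[[], float] = random.random) -> float:
    """Exponential delay (1s, 2s, 4s, 8s, 16s) plus up to 50% random jitter."""
    delay = min(cap, base * 2 ** attempt)
    return delay * (1 + 0.5 * rng())


def call_with_retries(send: Callable[[], Tuple[int, Optional[float]]],
                      max_failures: int = 5,
                      sleep: Callable[[float], None] = time.sleep,
                      rng: Callable[[], float] = random.random) -> int:
    """Call send() until it returns something other than 429, or give up."""
    for attempt in range(max_failures):
        status, retry_after = send()
        if status != 429:
            return status
        if attempt == max_failures - 1:
            break  # no point sleeping before giving up
        delay = backoff_delay(attempt, rng=rng)
        if retry_after is not None:
            delay = max(delay, retry_after)  # never retry sooner than the server asks
        sleep(delay)
    raise CircuitOpenError(f"gave up after {max_failures} consecutive 429 responses")


# Demo with a fake sender and a no-op sleep so it runs instantly.
fake_responses = iter([(429, None), (429, 3.0), (200, None)])
print(call_with_retries(lambda: next(fake_responses), sleep=lambda seconds: None))  # 200
```

Because each retry counts as a separate request toward the quota, a low `max_failures` keeps a rate-limited client from consuming its allowance on retries.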

Auto-Add-On Purchasing

Scale plan users can enable automatic purchasing of Capacity Packs when approaching quota limits.

Automatic Scaling

Capacity Packs are automatically purchased when you reach 90% of your monthly quota.

Cost Control

Set spending limits and receive notifications before auto-purchases occur.

Seamless Experience

No service interruption: requests continue processing while add-ons are activated.
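The interaction between the 90% trigger and a spending limit can be sketched as a single check. This illustrates the decision logic only and is not how IOA Cloud implements it; the function name is hypothetical, and the Capacity Pack price used in the example is a placeholder because no price is listed here.

```python
def should_auto_purchase(used_requests: int, monthly_quota: int,
                         spent_usd: float, pack_price_usd: float,
                         spending_limit_usd: float,
                         threshold_percent: int = 90) -> bool:
    """True when usage has reached the threshold and one more pack fits the budget."""
    reached_threshold = used_requests * 100 >= monthly_quota * threshold_percent
    within_budget = spent_usd + pack_price_usd <= spending_limit_usd
    return reached_threshold and within_budget


# Scale plan quota of 25,000 requests; the $50 pack price is a placeholder.
print(should_auto_purchase(22_500, 25_000, spent_usd=0, pack_price_usd=50, spending_limit_usd=100))   # True
print(should_auto_purchase(22_499, 25_000, spent_usd=0, pack_price_usd=50, spending_limit_usd=100))   # False
print(should_auto_purchase(24_000, 25_000, spent_usd=60, pack_price_usd=50, spending_limit_usd=100))  # False
```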

Frequently Asked Questions

What happens if I exceed my monthly quota?

You'll receive HTTP 429 responses until your quota resets at the beginning of the next month. You can purchase Capacity Packs to increase your quota immediately.

Can I monitor my usage in real time?

Yes, the IOA Cloud dashboard shows your current usage, remaining quota, and projected monthly consumption based on current trends.

Do failed requests count toward my quota?

Yes, all requests that pass through the governance layer count toward your quota, regardless of whether they succeed or fail.

How accurate is the usage tracking?

Usage is tracked in real time with 99.9% accuracy. Minor discrepancies may occur due to network latency and retry attempts.
