Pricing

Understand how AI Router displays model pricing, quotas, and usage-based billing

AI Router is an OpenAI-compatible AI gateway. The Pricing page shows the billing rate for each available model so you can estimate how much quota a request will consume before you call it.

This is a demo deployment. The concrete prices, currency, top-up amounts, and payment methods shown in the UI are not finalized. Treat any numbers you see as placeholders (pricing and rates TBD).

Where to find it

Open the Pricing entry in the left sidebar, or visit the /pricing path directly. The pricing list is generally viewable without logging in.

The page lists the available models. Each row shows the model name along with its input rate and output rate (the quota the model consumes per unit of input and output tokens).

Use the search box at the top to filter by a model-name keyword and jump to a specific model's rate.

The models you can see and their rates depend on your model group and token group. Different groups may resolve to different billing multipliers, so the rate you see can differ from what another account sees. Actual rates for this demo are still being decided (rates TBD).

How billing works

AI Router bills on a usage basis, measured in quota units. The conceptual model is the same as any OpenAI-compatible gateway:

Input rate — quota consumed per unit of input (prompt) tokens.
Output rate — quota consumed per unit of output (completion) tokens.
Per-request cost ≈ (input tokens × input rate) + (output tokens × output rate). Some models or endpoints may add a fixed per-call component.
Group multipliers — your model/token group can apply a multiplier on top of the base rate, so two users may pay different amounts for the same model.

The exact unit prices, the size of a "unit" of tokens, the multiplier values, and the currency are TBD for this demo. Do not rely on any displayed amount as a real charge.
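As a rough illustration of the per-request formula above, the Python sketch below estimates the quota a request would consume. Every value in it is a made-up placeholder: the rates, the unit size (`tokens_per_unit`), the group multiplier, and the fixed per-call component are assumptions, not figures from this deployment. It also assumes the multiplier scales only the token-based part of the cost, which this page does not confirm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRate:
    """Hypothetical rate entry; field names are illustrative only."""
    input_rate: float             # quota per unit of input (prompt) tokens
    output_rate: float            # quota per unit of output (completion) tokens
    tokens_per_unit: int = 1000   # assumed unit size; the real size is TBD
    fixed_per_call: float = 0.0   # optional fixed per-call component


def estimate_cost(rate: ModelRate, input_tokens: int, output_tokens: int,
                  group_multiplier: float = 1.0) -> float:
    """Estimate quota for one request.

    (input tokens x input rate) + (output tokens x output rate),
    scaled by the group multiplier, plus any fixed per-call component.
    """
    token_cost = (
        input_tokens / rate.tokens_per_unit * rate.input_rate
        + output_tokens / rate.tokens_per_unit * rate.output_rate
    )
    return token_cost * group_multiplier + rate.fixed_per_call


# Placeholder numbers, not real prices.
demo_rate = ModelRate(input_rate=2.0, output_rate=8.0)
estimate = estimate_cost(demo_rate, input_tokens=1500, output_tokens=500,
                         group_multiplier=1.5)
print(estimate)  # 10.5 quota units with these placeholder values
```

With these placeholders, the input side contributes 3.0, the output side 4.0, and the 1.5 multiplier brings the total to 10.5 quota units. Swap in the rates shown on the Pricing page once they are finalized.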

Quotas and balance

Each key/token can carry a remaining quota, and your account carries an overall balance. When you call the API:

The gateway resolves the requested model to a provider channel based on your group.
It checks any applicable limits (per-token remaining quota, account balance, and rate/usage limits).
After a successful response, it deducts the computed cost from your remaining quota/balance and records the usage.
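The three steps above can be sketched as a small in-memory model. This is not AI Router's implementation: the class names, the `CHANNELS` routing table, and the `handle_request` signature are hypothetical, rate/usage limits are left out for brevity, and deducting from both the key's quota and the account balance is an assumption about how the two relate.

```python
from dataclasses import dataclass, field


class RequestRejected(Exception):
    """Raised when routing or a limit check fails before calling upstream."""


@dataclass
class Account:
    balance: float                      # overall account balance, in quota units
    usage_log: list = field(default_factory=list)


@dataclass
class ApiKey:
    remaining_quota: float              # remaining quota carried by this key
    group: str                          # group used to select a channel


# Hypothetical routing table: (group, model) -> provider channel name.
CHANNELS = {("default", "demo-model"): "channel-a"}


def handle_request(account, key, model, call_provider, compute_cost):
    # 1. Resolve the requested model to a provider channel for this group.
    channel = CHANNELS.get((key.group, model))
    if channel is None:
        raise RequestRejected(f"{model!r} is not available to group {key.group!r}")

    # 2. Check limits before forwarding the request.
    if key.remaining_quota <= 0 or account.balance <= 0:
        raise RequestRejected("insufficient quota or balance")

    # 3. Call the provider; deduct and record only if the call succeeds
    #    (an exception from call_provider skips the deduction).
    response = call_provider(channel, model)
    cost = compute_cost(response)
    key.remaining_quota -= cost
    account.balance -= cost
    account.usage_log.append({"model": model, "channel": channel, "cost": cost})
    return response


# Usage with stand-in callables and placeholder amounts.
account = Account(balance=100.0)
key = ApiKey(remaining_quota=20.0, group="default")
handle_request(account, key, "demo-model",
               call_provider=lambda channel, model: {"ok": True},
               compute_cost=lambda response: 10.5)
print(key.remaining_quota, account.balance)  # 9.5 89.5
```

Because the deduction happens after the provider call returns, a failed upstream call leaves the quota and balance untouched in this sketch, matching the "after a successful response" wording.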

You can review consumed quota and per-request detail on your usage / logs page. The displayed balance change reflects the actual deduction for that request.
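To cross-check a log entry against a response you received, you can read the token counts from the response body. The sketch below assumes the gateway returns the standard OpenAI-style `usage` object (`prompt_tokens`, `completion_tokens`), which is typical for OpenAI-compatible endpoints but not confirmed by this page. The helper name `token_counts` is hypothetical.

```python
def token_counts(response: dict) -> tuple[int, int]:
    """Return (input_tokens, output_tokens) from an OpenAI-style response body.

    Missing fields fall back to 0 so the caller can detect an absent usage block.
    """
    usage = response.get("usage") or {}
    return usage.get("prompt_tokens", 0), usage.get("completion_tokens", 0)


# A trimmed, made-up response body for illustration.
sample = {
    "model": "demo-model",
    "usage": {"prompt_tokens": 1500, "completion_tokens": 500, "total_tokens": 2000},
}
print(token_counts(sample))  # (1500, 500)
```

Multiplying these counts by the input and output rates for your group gives a figure to compare with the deduction shown on the usage / logs page.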

Top-up and payment

Top-up flows and payment methods are not configured in this demo. No specific payment provider, balance amount, or purchase tier shown here should be treated as real or available (top-up methods, payment channels, and amounts TBD).

When top-up is enabled, the typical flow is:

Go to the top-up / wallet page from your account menu.
Choose an amount or package (amounts and packages TBD).
Complete payment through the configured method (method TBD), after which your balance updates and the new quota becomes available to your keys.

FAQ

Why is the rate I see different from someone else's?

How do I estimate cost before calling?

Where do I report incorrect or missing pricing?
