# Ainfera — The Inference of AI Agents > Ainfera is outcome-aware inference routing for AI agents. Point your agents at one endpoint and every call is routed to the model most likely to finish the task — scored on the result, neutral across providers. You pay a share of what routing saves you versus your own baseline, so your all-in cost is always lower than calling providers directly. This file gives language models and agents a complete, plain-text overview of Ainfera. The human-facing site is at https://ainfera.ai. ## What Ainfera does AI agents make many model calls, and the best model for each call differs by task — capability, latency, and cost trade off differently every time, and the right choice keeps changing as models and prices change. Ainfera places each call on the model most likely to complete it, learns from the outcome, and keeps improving. It is neutral by design: Ainfera does not own a model and is not steering you toward one provider. ## One-line integration Ainfera exposes a drop-in, OpenAI-compatible API. Change one line and keep the rest of your request as-is: - Set `model = "ainfera-inference"` - Point the base URL at `https://api.ainfera.ai` - Authenticate with an Ainfera key (one `ainfera_*` key per agent) That is the whole change. Exact request examples are in the Quickstart: https://ainfera.ai/quickstart ## How routing works - Each call's candidate models are scored across capability, latency, and cost for that specific task. - The call is routed to the highest-scoring candidate; deterministic fallbacks cover a degraded route. - Completed calls are scored (automated evaluations plus your own production signals), and those outcomes feed back as routing weight — the policy is not frozen at launch. - Policy changes are replayed offline before they ship. ## Audit and observability Every routed call is a record: the candidates considered, their scores, the decision made, and the result. Any decision can be inspected and replayed. Cost and latency are attributed per provider, and traces export to your stack via OpenTelemetry. ## Pricing Ainfera charges a share of the savings that routing creates versus your own baseline — never a markup on tokens. If routing does not save you money, Ainfera does not make any, and your all-in cost is always lower than calling providers directly. Details: https://ainfera.ai/pricing ## The routable catalog One endpoint reaches frontier and open models across every major provider. The current catalog is published at https://ainfera.ai/models ## Who it is for - Builders and teams running AI agents in production who want the best result per call without hand-maintaining model routing. - The agents themselves: one stable endpoint and model string, so an agent's stack does not have to change as the model landscape does. ## Key facts - Product: Ainfera Inference — route every call to the best model, score the outcome, keep every decision auditable. - Wire string: `ainfera-inference` - API: https://api.ainfera.ai (OpenAI-compatible) - Dashboard: https://app.ainfera.ai - Neutral across providers; Ainfera does not own a model. - Pricing: a share of savings versus your baseline; all-in cost always below calling providers directly. ## Links - Quickstart: https://ainfera.ai/quickstart - Docs: https://ainfera.ai/docs - Models: https://ainfera.ai/models - How routing works: https://ainfera.ai/routing - Pricing: https://ainfera.ai/pricing - Contact: https://ainfera.ai/contact