Ainfera

Route every agent call to the model that finishes the task.

Point your agents at one endpoint and every call goes to the model most likely to finish the task.

model ="ainfera-inference"Drop-in and OpenAI-compatible. That one line is the whole switch.
The routable frontier

No single model leads on both intelligence and price.

Intelligence and price trade off differently for every task, and the frontier moves week to week. Pin one model and you overpay on the calls it isn’t best at — route the frontier and you don’t.

020406080$0.10$1$10$100Reference price · $ / 1M tokens (blended, log) →Intelligence Index ↑Claude Opus 4.7 1M · Anthropic · index 70 · $60/Mtok · coverageGemini 3.1 Flash · Google · index 57 · $0.85/Mtok · coverageZGLM-5 · Z.ai (GLM) · index 50 · $2/Mtok · coverageMiniMax M2.7 (Novita) · MiniMax · index 50 · $0.53/Mtok · coverageQwen3.6 Plus · Alibaba (Qwen) · index 50 · $1/Mtok · coverageGemini 3.1 Flash Lite · Google · index 48 · $0.17/Mtok · coverageZGLM-5-Turbo · Z.ai (GLM) · index 47 · $2/Mtok · coverageDeepseek V4 Flash · DeepSeek · index 46 · $0.13/Mtok · coverageQwen3.6-27B · Alibaba (Qwen) · index 46 · $1/Mtok · coverageQwen3.6-35B-A3B · Alibaba (Qwen) · index 44 · $0.35/Mtok · coverageZGLM-5V-Turbo · Z.ai (GLM) · index 43 · $2/Mtok · coverageZGLM-4.7 · Z.ai (GLM) · index 42 · $1/Mtok · coverageMiniMax M2.5 · MiniMax · index 42 · $0.53/Mtok · coverageQwen3.5-122B-A10B · Alibaba (Qwen) · index 42 · $1/Mtok · coverageQwen3.5-27B · Alibaba (Qwen) · index 42 · $0.80/Mtok · coverageQwen3-Max-Thinking · Alibaba (Qwen) · index 40 · $2/Mtok · coverageMinimax M2.1 · MiniMax · index 39 · $0.53/Mtok · coverageQwen3.5-35B-A3B · Alibaba (Qwen) · index 37 · $0.35/Mtok · coverageDeepseek V3.2 · DeepSeek · index 32 · $0.30/Mtok · coverageQwen3.5-9B · Alibaba (Qwen) · index 32 · $0.11/Mtok · coverageQwen3-Max · Alibaba (Qwen) · index 31 · $2/Mtok · coverageZGLM-4.6 · Z.ai (GLM) · index 30 · $0.76/Mtok · coverageZGLM-4.7-Flash · Z.ai (GLM) · index 30 · $0.14/Mtok · coverageDeepSeek-V3.1 · DeepSeek · index 28 · $0.35/Mtok · coverageDeepSeek-V3.1-Terminus · DeepSeek · index 28 · $0.44/Mtok · coverageQwen3 Coder Next · Alibaba (Qwen) · index 28 · $0.53/Mtok · coverageZGLM-4.5 · Z.ai (GLM) · index 26 · $1/Mtok · coverageQwen3 235B A22B Instruct 2507 · Alibaba (Qwen) · index 25 · $0.21/Mtok · coverageQwen3 Coder 480B A35B Instruct · Alibaba (Qwen) · index 25 · $0.67/Mtok · coverageMiniMax M1 · MiniMax · index 24 · $0.96/Mtok · coverageZzai-org/glm-4.5-air · Z.ai (GLM) · index 23 · $0.31/Mtok · coverageDeepSeek-V3-0324 · DeepSeek · index 22 · $0.34/Mtok · coverageQwen3-VL-235B-A22B-Instruct · Alibaba (Qwen) · index 21 · $0.37/Mtok · coverageQwen QwQ-32B · Alibaba (Qwen) · index 20 · $1/Mtok · coverageQwen3 Coder 30b A3B Instruct · Alibaba (Qwen) · index 20 · $0.12/Mtok · coverageQwen3-Next-80B-A3B-Instruct · Alibaba (Qwen) · index 20 · $0.34/Mtok · coverageZGLM 4.6V · Z.ai (GLM) · index 17 · $0.45/Mtok · coverageQwen3-VL-32B-Instruct · Alibaba (Qwen) · index 17 · $0.75/Mtok · coverageDeepSeek R1 Distill LLama 70B · DeepSeek · index 16 · $0.80/Mtok · coverageDeepSeek R1 Distill Qwen 14B · DeepSeek · index 16 · $2/Mtok · coverageDeepSeek-V3 · DeepSeek · index 16 · $0.46/Mtok · coverageQwen2.5-72B-Instruct · Alibaba (Qwen) · index 16 · $0.37/Mtok · coverageQwen3-VL-30B-A3B-Instruct · Alibaba (Qwen) · index 16 · $0.26/Mtok · coverageqwen/qwen3-vl-8b-instruct · Alibaba (Qwen) · index 14 · $0.18/Mtok · coverageZGLM 4.5V · Z.ai (GLM) · index 13 · $0.90/Mtok · coverageQwen 2.5 Coder 32B Instruct · Alibaba (Qwen) · index 13 · $0.80/Mtok · coverageQwen2 72B Instruct · Alibaba (Qwen) · index 12 · $0.90/Mtok · coverageQwen3 Omni 30B A3B Instruct · Alibaba (Qwen) · index 11 · $0.43/Mtok · coverageDeepSeek R1 Distill Qwen 1.5B · DeepSeek · index 9 · $0.18/Mtok · coverageClaude Opus 4.7 · Anthropic · index 73 · $30/Mtok · preferred coreOpenAI o4-pro · OpenAI · index 72 · $105/Mtok · preferred coreGPT-5.5 · OpenAI · index 70 · $8/Mtok · preferred coreGemini 3.1 Pro · Google · index 68 · $3/Mtok · preferred coreGrok 4 · xAI · index 65 · $8/Mtok · preferred coreClaude Sonnet 4.7 · Anthropic · index 63 · $6/Mtok · preferred coreLlama 4 405B (Together) · Meta · index 62 · $3/Mtok · preferred coreMistral Large 3 · Mistral · index 60 · $3/Mtok · preferred coreGPT-5.5 Mini · OpenAI · index 58 · $0.70/Mtok · preferred coreQwen3.7 Max (Novita) · Alibaba (Qwen) · index 57 · $2/Mtok · preferred coreGrok 4 Mini · xAI · index 55 · $0.88/Mtok · preferred coreMiniMax-M3 · MiniMax · index 55 · $0.53/Mtok · preferred coreMistral Medium 3 · Mistral · index 54 · $2/Mtok · preferred coreDeepSeek V4 Pro (Together) · DeepSeek · index 52 · $2/Mtok · preferred coreZGLM 5.1 (Novita) · Z.ai (GLM) · index 51 · $2/Mtok · preferred coreQwen3.5 397B A17B (DeepInfra) · Alibaba (Qwen) · index 45 · $1/Mtok · preferred coreQwen3.5 397B A17B (Together) · Alibaba (Qwen) · index 45 · $1/Mtok · preferred coreGPT-OSS 120B (Novita) · OpenAI · index 33 · $0.10/Mtok · preferred coreGPT-OSS 20B (Novita) · OpenAI · index 24 · $0.07/Mtok · preferred core
We route inside this frontier, per call — and prefer the proven core.188+ models, refreshed daily — no single model leads on both intelligence and price.

Intelligence: Artificial Analysis · artificialanalysis.ai

Neutral across every provider
Alibaba (Qwen)AnthropicArcee AIBaidu (ERNIE)ByteDance (Seed)DeepCogitoDeepSeekEssential AIGoogleGrypheinclusionAI (Ling)Kwaipilot (Kuaishou)Liquid AIMetaMicrosoft (Phi)MiniMaxMistralMoonshot AINous ResearchNVIDIAOpenAISao10KStepFunxAIXiaomi (MiMo)Z.ai (GLM)
Outcome-aware routing

Pick the model by the result, not the reputation.

Every agent call is hard to place — capability, latency and cost trade off differently each time. Ainfera scores the candidates against the task and routes to the one most likely to finish it.

  • Per-call scoring across capability, latency and cost
  • One endpoint, every provider and open model
  • Deterministic fallbacks when a route degrades
  • Policy controls for cost ceilings and data residency
Route · agent.call()live
routingtulkas.call()
gpt-oss-120b-novita
gpt-5-5
gpt-oss-20b-novita
Evaluation-driven

The router learns from what actually shipped.

Outcomes feed back. Ainfera scores completed calls — with automated evals and your own signals — and the routing improves with every result instead of staying frozen at launch.

  • LLM-as-judge and task-specific scoring
  • Your production signals as routing weight
  • Win-rates tracked per model, per task type
  • Offline replay before any policy change ships
Eval · win-rate by task
Win-rates publish when routed-outcome volume clears the empirical threshold.
Observability

See exactly why each call went where it did.

Every route is a record: the candidates considered, the scores, the decision, the result. Audit any call, replay any decision, and keep the whole path inspectable.

  • Full decision trace for every routed call
  • Candidate scores and the chosen route, retained
  • Cost and latency attributed per provider
  • Export to your stack via OpenTelemetry
Trace · live auditlive
timehash · agentevent · modelseq
00:21:51
0x1f50…8aaa · tulkas
provider ok · novita
3,834
00:21:51
0x1c21…db99 · tulkas
created
3,835
00:21:51
0xe4f4…484e · tulkas
debited
3,833
00:21:37
0xc9d0…89a6 · tulkas
routed · gpt-oss-120b-novita
3,832
00:21:37
0xfe08…3a58 · tulkas
request · gpt-oss-120b-novita
3,830
00:21:37
0x1f22…9327 · tulkas
debited
3,831
How it works

Send, route, complete.

01

Send

Point your agent at one Ainfera endpoint. No SDK lock-in — keep your framework, change the base URL.

02

Route

Ainfera scores every eligible model against the task and routes to the one most likely to finish it, within your policy.

03

Complete

The result returns, the outcome is scored, and the next routing decision is a little sharper than the last.

Proof

Every decision signed, on a public chain.

Every routed call is hashed, Ed25519-signed, and appended to an append-only public chain. No account, no key, no dashboard claim — re-hash it yourself.

Trace · live auditlive
timehash · agentevent · modelseq
00:21:51
0x1f50…8aaa · tulkas
provider ok · novita
3,834
00:21:51
0x1c21…db99 · tulkas
created
3,835
00:21:51
0xe4f4…484e · tulkas
debited
3,833
00:21:37
0xc9d0…89a6 · tulkas
routed · gpt-oss-120b-novita
3,832
00:21:37
0xfe08…3a58 · tulkas
request · gpt-oss-120b-novita
3,830
00:21:37
0x1f22…9327 · tulkas
debited
3,831
verify — no key required
# the public chain is keyless
curl https://api.ainfera.ai/v1/audit/public

# → each entry: the routed model, provider,
#   sequence, block height and the Ed25519
#   signature. Re-hash it yourself to verify.

Our own fleet of seven production agents routes every call through ainfera-inference — verify their decisions live on the public chain.

  • namo
  • varda
  • yavanna
  • tulkas
  • aule
  • ulmo
  • vaire
live audit feed · block #12,650
Built for agents, not chatbots

Your agent calls one endpoint. We place every call.

Point an agent at Ainfera and outcome-aware routing handles cost, latency and task completion — neutral across providers, every call signed and on the audit chain. We run our own fleet on it too.

One endpoint, every model

No SDK rewrite and no model to pin. Two strings on your existing OpenAI- or Anthropic-style client.

Keeps working when a provider degrades

Routing spreads across model brands with deterministic fallbacks, so one bad provider doesn't stall the agent.

Accountable by default

Every routed call is an Ed25519-signed record on a public chain — spend and decisions are a line item, not a mystery.

vs calling direct · vs a gateway

Built for agents, not dashboards.

Calling directA static gatewayAinfera
Model choiceYou pin one model and babysit itA fixed routing table you hand-tuneRouted per call to the model that finishes the task
ProofLogs you keep yourselfA dashboard claim, not a recordEvery decision signed on a public chain
CostYou eat every price and latency regressionA markup, seats, or a per-agent taxYou pay from the savings routing proves — never a markup
IntegrationPer-provider SDKs to maintainA new client to adoptTwo strings on the client you already have

Your all-in is always below calling direct. If a month isn't, the difference is free.

Neutral by design

Never paid to pick a provider.

Routing is neutral across every provider. We only make money when routing saves you money — so we're never paid to send your agents somewhere worse.

246
Models · one endpoint
#12,650
Signed audit blocks · cumulative
Ed25519
Audit signature
active
Routing · status

Stop picking models. Start finishing tasks.

One endpoint, every provider, each call routed to the model that will complete it.

Your all-in is always below calling direct. If a month isn't, the difference is free.

routing · activeblock #12,650models · 246audit · on-chainainfera · the inference of ai agents