Last Updated: June 2026

Kimi K2.5 API — Complete Reference for Developers 2026

Quick Answer

Kimi K2.5 is Moonshot AI's flagship language model, offering a 128K-token context window, vision support, and a thinking mode. Pricing is $1.40 per million input tokens and $4.20 per million output tokens. The Kimi K2.5 API is OpenAI-compatible and available through both Moonshot's platform and Nvidia's free API Catalog.

What is Kimi K2.5 API

Kimi K2.5 is the latest flagship model from Moonshot AI, released in 2025 and updated through 2026. It is the most capable model in the Kimi family, with strengths in complex reasoning, code generation, mathematical problem-solving, and multimodal understanding (vision inputs).

The Kimi K2.5 API follows the OpenAI chat completions format, so developers familiar with the GPT-4 API can integrate it with minimal code changes. The model is identified as `kimi-k2-5` in API requests, with a thinking variant available as `kimi-k2-5-thinking`.
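
Since the API follows the OpenAI chat completions format, an image request can plausibly be built from OpenAI-style content parts. The sketch below assumes the endpoint accepts `image_url` parts carrying a base64 data URL, as the OpenAI format does. `build_vision_message` is a hypothetical helper name, and supported image types and size limits should be checked against Moonshot's own documentation.

```python
import base64


def build_vision_message(prompt: str, image_bytes: bytes, mime: str = "image/png") -> dict:
    """Return a user message pairing text with one inline image.

    Assumption: the endpoint accepts OpenAI-style content parts
    ("text" and "image_url") with a base64 data URL.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{encoded}"}},
        ],
    }


if __name__ == "__main__":
    # Placeholder bytes; in practice, read a real file with open(path, "rb").read().
    message = build_vision_message("Describe this image.", b"placeholder-bytes")
    print(message["content"][0]["text"])
```

The returned dictionary can be placed in the `messages` list of a chat completions request in the same position as a plain text user message.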

Kimi K2.5 Specifications

Specification	Value
Model ID	`kimi-k2-5`
Context Window	131,072 tokens (128K)
Max Output Tokens	8,192 tokens
Input Price	$1.40 / million tokens
Output Price	$4.20 / million tokens
Vision Support	Yes (image inputs)
Thinking Mode	Yes (`kimi-k2-5-thinking`)
Function Calling	Yes
Streaming	Yes
OpenAI Compatible	Yes
Free Tier	Available
Nvidia NIM	Free Access
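
The context window and output limit in the table above imply a budget for prompt size. The following is a minimal sketch of that arithmetic, assuming the prompt and the completion share the same 131,072-token window (a common convention, but not confirmed here for this API). `max_prompt_tokens` is an illustrative helper, not part of any SDK.

```python
CONTEXT_WINDOW = 131_072    # tokens, from the specification table
MAX_OUTPUT_TOKENS = 8_192   # tokens, from the specification table


def max_prompt_tokens(reserved_output: int = MAX_OUTPUT_TOKENS) -> int:
    """Tokens left for the prompt after reserving room for the reply.

    Assumption: prompt and completion tokens count against one shared window.
    """
    if not 0 < reserved_output <= MAX_OUTPUT_TOKENS:
        raise ValueError("reserved_output must be between 1 and MAX_OUTPUT_TOKENS")
    return CONTEXT_WINDOW - reserved_output


if __name__ == "__main__":
    print(max_prompt_tokens())      # room for the prompt with a full-length reply reserved
    print(max_prompt_tokens(2048))  # room for the prompt with max_tokens=2048
```

Under that assumption, reserving the full 8,192 output tokens leaves 122,880 tokens for the prompt.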

How to Call Kimi K2.5 API with Python, cURL, and JavaScript

Here's how to make a basic Kimi K2.5 API call using Python with the OpenAI SDK:

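from openai import OpenAI

# Point the OpenAI SDK at Moonshot's OpenAI-compatible endpoint.
client = OpenAI(
    api_key="your-kimi-api-key",  # replace with your own key; keep it out of source control
    base_url="https://api.moonshot.cn/v1"
)

# Send a standard chat completions request.
response = client.chat.completions.create(
    model="kimi-k2-5",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    temperature=0.7,
    max_tokens=2048
)

# The reply text is in the first choice.
print(response.choices[0].message.content)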

The same request with cURL:

curl https://api.moonshot.cn/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-kimi-api-key" \
  -d '{
    "model": "kimi-k2-5",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    "temperature": 0.7,
    "max_tokens": 2048
  }'

The equivalent call in JavaScript, using the OpenAI Node.js SDK:

import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'your-kimi-api-key',
  baseURL: 'https://api.moonshot.cn/v1'
});

const response = await client.chat.completions.create({
  model: 'kimi-k2-5',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Explain quantum computing in simple terms.' }
  ],
  temperature: 0.7,
  max_tokens: 2048
});

console.log(response.choices[0].message.content);
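
The calls above wait for the complete reply before returning. Since the specification table lists streaming support, here is a hedged sketch of consuming a streamed reply in Python. It assumes the endpoint emits OpenAI-style chunks, where each chunk carries incremental text in `choices[0].delta.content`. `collect_stream` is a hypothetical helper, and the demo uses stand-in chunk objects rather than a live API call.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_stream(chunks: Iterable, echo: bool = False) -> str:
    """Concatenate the text deltas from an OpenAI-style chunk stream.

    Chunks with no choices or an empty delta are skipped.
    """
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue
        text = chunk.choices[0].delta.content
        if text:
            if echo:
                print(text, end="", flush=True)
            parts.append(text)
    return "".join(parts)


if __name__ == "__main__":
    # Stand-in chunks that mimic the shape of streamed responses.
    def fake_chunk(text):
        delta = SimpleNamespace(content=text)
        return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])

    print(collect_stream([fake_chunk("Hello"), fake_chunk(None), fake_chunk(", world")]))

# Usage with the Python client from the example above (needs network access and a valid key):
#   stream = client.chat.completions.create(
#       model="kimi-k2-5",
#       messages=[{"role": "user", "content": "Explain quantum computing in simple terms."}],
#       stream=True,
#   )
#   reply = collect_stream(stream, echo=True)
```

Passing `stream=True` is the standard OpenAI SDK way to request incremental output. Whether every chunk field matches OpenAI's exactly on this endpoint is an assumption worth verifying.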

Is Kimi K2.5 API Better Than GPT-4o

Metric	Kimi K2.5	GPT-4o
Input Price (per 1M tokens)	$1.40	$2.50
Output Price (per 1M tokens)	$4.20	$10.00
Context Window	128K	128K
Vision Support	Yes	Yes
Thinking Mode	Yes	No (separate o3)
Free Tier	Available	Limited
OpenAI SDK Compatible	Yes	Yes (native)
Nvidia NIM Access	Free	No

At these list prices, Kimi K2.5 is 44% cheaper than GPT-4o for input tokens ($1.40 vs. $2.50 per million) and 58% cheaper for output tokens ($4.20 vs. $10.00 per million), while achieving competitive benchmark scores. For developers prioritizing cost efficiency without sacrificing quality, the Kimi K2.5 API is a strong alternative to OpenAI's offerings.
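
The savings percentages can be reproduced from the prices in the comparison table. The sketch below also estimates the bill for a sample workload. The workload size (10 million input tokens and 2 million output tokens) is an arbitrary illustration, and the helper names are hypothetical.

```python
# USD per million tokens (input, output), taken from the comparison table.
PRICES = {
    "kimi-k2-5": (1.40, 4.20),
    "gpt-4o": (2.50, 10.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for the given token counts at list price."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def savings_percent(cheaper: float, baseline: float) -> float:
    """How much cheaper `cheaper` is than `baseline`, as a percentage."""
    return (1 - cheaper / baseline) * 100


if __name__ == "__main__":
    print(f"Input savings:  {savings_percent(1.40, 2.50):.0f}%")
    print(f"Output savings: {savings_percent(4.20, 10.00):.0f}%")
    for model in PRICES:
        print(f"{model}: ${request_cost(model, 10_000_000, 2_000_000):.2f}")
```

For that sample workload the totals come to $22.40 for Kimi K2.5 and $45.00 for GPT-4o. The blended saving depends on the ratio of input to output tokens, so output-heavy workloads benefit more.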

Kimi K2.5 API Pricing

For detailed Kimi K2.5 API pricing, including volume discounts, enterprise plans, and free tier limits, see our complete Kimi API Pricing guide. The cost of the Kimi K2.5 API is among the most competitive for a model of this capability level.

Frequently Asked Questions About Kimi API

What is Kimi K2.5 thinking mode?

Kimi K2.5 thinking mode (`kimi-k2-5-thinking`) enables chain-of-thought reasoning, in which the model shows its step-by-step thought process. This improves accuracy on math, logic, and complex reasoning tasks in exchange for slightly higher latency and cost.
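
Because the thinking variant is exposed as a separate model ID, switching modes can be as simple as changing the `model` field. The sketch below shows one way to do that. `build_chat_request` is a hypothetical helper, and how the reasoning steps appear in the response is not covered here.

```python
DEFAULT_MODEL = "kimi-k2-5"
THINKING_MODEL = "kimi-k2-5-thinking"


def build_chat_request(prompt: str, thinking: bool = False, max_tokens: int = 2048) -> dict:
    """Build keyword arguments for a chat completions call.

    Selects the thinking variant when `thinking` is True.
    """
    return {
        "model": THINKING_MODEL if thinking else DEFAULT_MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


if __name__ == "__main__":
    print(build_chat_request("Is 391 prime? Show your reasoning.", thinking=True)["model"])

# Usage with an OpenAI SDK client configured for Moonshot's base URL:
#   response = client.chat.completions.create(
#       **build_chat_request("Is 391 prime? Show your reasoning.", thinking=True)
#   )
```

Keeping the switch in one place makes it easy to reserve the slower, costlier variant for math and logic tasks and use the default model elsewhere.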

Can I use Kimi K2.5 API with Python?

Yes. The Kimi K2.5 API is OpenAI-compatible. Install the official OpenAI Python SDK (`pip install openai`) and set `base_url` to `https://api.moonshot.cn/v1`. Standard chat completions parameters such as `temperature` and `max_tokens` are passed the same way as with OpenAI.

What are Kimi K2.5 API rate limits?

Rate limits vary by account tier. Free accounts receive approximately 3 requests per minute with daily token limits. Paid accounts have higher limits that scale with usage. Enterprise accounts can request custom rate limits through Moonshot AI.
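
On a free account, a simple client-side throttle helps avoid hitting the limit. The sketch below spaces calls evenly, assuming the roughly 3 requests per minute figure above (one request every 20 seconds). `MinIntervalLimiter` is a hypothetical helper, not part of any SDK, and it does not handle the daily token limits.

```python
import time


class MinIntervalLimiter:
    """Enforce a minimum gap between calls, e.g. 20 s for 3 requests per minute."""

    def __init__(self, requests_per_minute: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / requests_per_minute
        self._clock = clock      # injectable for testing
        self._sleep = sleep      # injectable for testing
        self._last_call = None   # time of the most recent permitted call

    def wait(self) -> float:
        """Block until the next request is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last_call is not None:
            delay = max(0.0, self.min_interval - (now - self._last_call))
            if delay > 0:
                self._sleep(delay)
        self._last_call = now + delay
        return delay


if __name__ == "__main__":
    limiter = MinIntervalLimiter(requests_per_minute=3)
    limiter.wait()  # first call passes immediately
    # Call limiter.wait() before each subsequent API request.
```

Calling `limiter.wait()` before each request keeps a single-threaded client under the assumed limit. Paid tiers with higher limits would only need a different `requests_per_minute` value.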

Summary

The Kimi K2.5 API delivers flagship-level AI capabilities at a fraction of GPT-4o's cost, with free access available through Nvidia's API Catalog. Start with our API Key guide to get your first K2.5 call running in minutes.