Kimi K2.5 API — Complete Reference for Developers 2026
Quick Answer
Kimi K2.5 is Moonshot AI's flagship language model, offering a 128K context window, vision support, and a thinking mode. Pricing is $1.40 per million input tokens and $4.20 per million output tokens. The Kimi K2.5 API is OpenAI-compatible and available through both Moonshot's platform and Nvidia's free API Catalog.
What is Kimi K2.5 API
Kimi K2.5 is the latest flagship model from Moonshot AI, released in 2025 and continuously updated through 2026. It is the most capable model in the Kimi family, excelling at complex reasoning, code generation, mathematical problem-solving, and multimodal understanding (vision inputs).
The Kimi K2.5 API follows the OpenAI chat completions format, so developers familiar with the GPT-4 API can integrate Kimi K2.5 with minimal code changes. The model is identified as kimi-k2-5 in API requests, with a thinking variant available as kimi-k2-5-thinking.
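As a minimal sketch of how the two model IDs might be selected in code, the helper below builds the keyword arguments for a chat completions call. The function name `build_chat_request` and its `thinking` flag are hypothetical, introduced only for illustration; the model IDs are the ones this guide lists and should be checked against your account's model list.

```python
# Model IDs as given in this guide; verify them against your account's model list.
STANDARD_MODEL = "kimi-k2-5"
THINKING_MODEL = "kimi-k2-5-thinking"


def build_chat_request(prompt: str, thinking: bool = False, max_tokens: int = 2048) -> dict:
    """Return keyword arguments for client.chat.completions.create().

    Hypothetical helper: only the model ID changes between the two variants.
    """
    return {
        "model": THINKING_MODEL if thinking else STANDARD_MODEL,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
        "max_tokens": max_tokens,
    }


request = build_chat_request("Prove that the square root of 2 is irrational.", thinking=True)
print(request["model"])  # kimi-k2-5-thinking
```

The resulting dictionary can be unpacked into an OpenAI SDK call, for example `client.chat.completions.create(**request)`, once a client has been configured for Kimi.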
Kimi K2.5 Specifications
| Specification | Value |
|---|---|
| Model ID | kimi-k2-5 |
| Context Window | 131,072 tokens (128K) |
| Max Output Tokens | 8,192 tokens |
| Input Price | $1.40 / million tokens |
| Output Price | $4.20 / million tokens |
| Vision Support | Yes (image inputs) |
| Thinking Mode | Yes (kimi-k2-5-thinking) |
| Function Calling | Yes |
| Streaming | Yes |
| OpenAI Compatible | Yes |
| Free Tier | Available |
| Nvidia NIM | Free Access |
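The context window and output cap in the table suggest a simple budget check before sending a request. The sketch below assumes that prompt and completion share the same context window (a common convention for chat completions APIs, though not confirmed here for Kimi). The function name `max_completion_tokens` is hypothetical, and the prompt token count must come from your own tokenizer or estimate.

```python
CONTEXT_WINDOW = 131_072   # tokens, from the specifications table
MAX_OUTPUT_TOKENS = 8_192  # tokens, from the specifications table


def max_completion_tokens(prompt_tokens: int, requested: int = MAX_OUTPUT_TOKENS) -> int:
    """Clamp a requested max_tokens value to what the limits allow.

    Assumes prompt and completion share the same context window.
    """
    if prompt_tokens < 0 or requested < 0:
        raise ValueError("token counts must be non-negative")
    remaining = CONTEXT_WINDOW - prompt_tokens
    if remaining <= 0:
        raise ValueError("prompt alone fills or exceeds the context window")
    return min(requested, MAX_OUTPUT_TOKENS, remaining)


print(max_completion_tokens(1_000))    # 8192: short prompt, capped by the output limit
print(max_completion_tokens(130_000))  # 1072: long prompt, capped by remaining context
```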
How to Call Kimi K2.5 API with Python and cURL
Here's how to make a basic Kimi K2.5 API call using Python with the OpenAI SDK:
from openai import OpenAI

client = OpenAI(
    api_key="your-kimi-api-key",
    base_url="https://api.moonshot.cn/v1"
)

response = client.chat.completions.create(
    model="kimi-k2-5",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    temperature=0.7,
    max_tokens=2048
)

print(response.choices[0].message.content)

The same request with cURL:

curl https://api.moonshot.cn/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-kimi-api-key" \
  -d '{
    "model": "kimi-k2-5",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    "temperature": 0.7,
    "max_tokens": 2048
  }'

An equivalent call in JavaScript (Node.js, OpenAI SDK):

import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'your-kimi-api-key',
  baseURL: 'https://api.moonshot.cn/v1'
});

const response = await client.chat.completions.create({
  model: 'kimi-k2-5',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Explain quantum computing in simple terms.' }
  ],
  temperature: 0.7,
  max_tokens: 2048
});

console.log(response.choices[0].message.content);

Is Kimi K2.5 API Better Than GPT-4o
| Metric | Kimi K2.5 | GPT-4o |
|---|---|---|
| Input Price (per 1M tokens) | $1.40 | $2.50 |
| Output Price (per 1M tokens) | $4.20 | $10.00 |
| Context Window | 128K | 128K |
| Vision Support | Yes | Yes |
| Thinking Mode | Yes | No (separate o3) |
| Free Tier | Available | Limited |
| OpenAI SDK Compatible | Yes | Yes (native) |
| Nvidia NIM Access | Free | No |
At the list prices above, the Kimi K2.5 API is about 44% cheaper than GPT-4o for input tokens ($1.40 vs. $2.50) and 58% cheaper for output tokens ($4.20 vs. $10.00), while achieving competitive benchmark scores. For developers prioritizing cost efficiency without sacrificing quality, the Kimi K2.5 API is a strong alternative to OpenAI's offerings.
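The percentages can be reproduced from the table, and the same prices give a rough per-request cost estimate. This is a sketch using only the list prices quoted in this guide; the function names are hypothetical, and actual billing may differ (for example through discounts or free-tier allowances).

```python
# USD per one million tokens, taken from the comparison table above.
PRICES = {
    "kimi-k2-5": {"input": 1.40, "output": 4.20},
    "gpt-4o": {"input": 2.50, "output": 10.00},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in USD for one request at list prices."""
    price = PRICES[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000


def savings_percent(kind: str) -> float:
    """Percentage by which Kimi K2.5 undercuts GPT-4o for 'input' or 'output' tokens."""
    kimi, gpt = PRICES["kimi-k2-5"][kind], PRICES["gpt-4o"][kind]
    return (gpt - kimi) / gpt * 100


print(round(savings_percent("input")))   # 44
print(round(savings_percent("output")))  # 58

# Example: a request with 10,000 input tokens and 2,000 output tokens.
print(f"{request_cost('kimi-k2-5', 10_000, 2_000):.4f}")  # 0.0224
print(f"{request_cost('gpt-4o', 10_000, 2_000):.4f}")     # 0.0450
```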
Kimi K2.5 API Pricing
For detailed Kimi K2.5 API pricing, including volume discounts, enterprise plans, and free tier limits, see our complete Kimi API Pricing guide. Kimi K2.5 API pricing is among the most competitive for a model of this capability level.
Frequently Asked Questions About Kimi API
What is Kimi K2.5 thinking mode?
Kimi K2.5 thinking mode (kimi-k2-5-thinking) enables chain-of-thought reasoning where the model shows its step-by-step thought process. This improves accuracy on math, logic, and complex reasoning tasks at slightly higher latency and cost.
Can I use Kimi K2.5 API with Python?
Yes. The Kimi K2.5 API is OpenAI-compatible. Install the official OpenAI Python SDK (pip install openai) and set base_url to https://api.moonshot.cn/v1. Standard chat completions parameters such as temperature and max_tokens are passed the same way as with OpenAI models.
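Since the specifications list streaming support, the same SDK call can presumably be made with `stream=True`. The sketch below assumes streamed chunks follow the OpenAI SDK shape (`chunk.choices[0].delta.content`); the helper name `collect_stream` is hypothetical. The network call appears only in comments so the block runs without an API key.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_stream(chunks: Iterable) -> str:
    """Concatenate text deltas from an OpenAI-style streaming response."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:      # some chunks may carry no choices
            continue
        delta = chunk.choices[0].delta.content
        if delta:                  # some chunks may carry no text
            parts.append(delta)
    return "".join(parts)


# With a real client (assumed setup, as in the Python example in this guide):
#   stream = client.chat.completions.create(model="kimi-k2-5", messages=..., stream=True)
#   text = collect_stream(stream)

# Stand-in chunks that mimic the SDK's object shape, for a dry run:
def fake_chunk(text):
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])


demo = [fake_chunk(None), fake_chunk("Hello"), fake_chunk(", world"), SimpleNamespace(choices=[])]
print(collect_stream(demo))  # Hello, world
```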
What are Kimi K2.5 API rate limits?
Rate limits vary by account tier. Free accounts receive approximately 3 requests per minute with daily token limits. Paid accounts have higher limits that scale with usage. Enterprise accounts can request custom rate limits through Moonshot AI.
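One way to stay under a low per-minute limit is to space requests on the client side. The sketch below is a simple sliding-window limiter. The class name `RateLimiter` is hypothetical, and the default of 3 requests per 60 seconds only mirrors the approximate free-tier figure above; the actual limits on your account may differ.

```python
import time
from collections import deque


class RateLimiter:
    """Allow at most max_calls within any sliding window of `period` seconds."""

    def __init__(self, max_calls=3, period=60.0, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock      # injectable so the limiter can be tested without waiting
        self._sleep = sleep
        self._calls = deque()    # timestamps of recent calls, oldest first

    def wait(self) -> float:
        """Block until a call is allowed; return the number of seconds slept."""
        now = self._clock()
        # Drop timestamps that have left the window.
        while self._calls and now - self._calls[0] >= self.period:
            self._calls.popleft()
        slept = 0.0
        if len(self._calls) >= self.max_calls:
            # Sleep until the oldest call falls out of the window.
            slept = self.period - (now - self._calls[0])
            self._sleep(slept)
            self._calls.popleft()
        self._calls.append(self._clock())
        return slept
```

In use, a single shared `limiter = RateLimiter()` would be created once and `limiter.wait()` called immediately before each `client.chat.completions.create(...)` request. Handling rate-limit error responses from the server with a retry is a complementary measure that this sketch does not cover.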
Summary
The Kimi K2.5 API delivers flagship-level AI capabilities at a fraction of GPT-4o's cost, with free access available through Nvidia's API Catalog. Start with our API Key guide to get your first K2.5 call running in minutes.