Last Updated: June 2026

Kimi K2.5 API — Complete Reference for Developers 2026

Quick Answer

Kimi K2.5 is Moonshot AI's flagship language model, offering a 128K context window, vision support, and a thinking mode. Pricing is $1.40 per million input tokens and $4.20 per million output tokens. The Kimi K2.5 API is OpenAI-compatible and available through both Moonshot's platform and Nvidia's free API Catalog.

What is Kimi K2.5 API

Kimi K2.5 is the latest flagship model from Moonshot AI, released in 2025 and continuously updated through 2026. It is the most capable model in the Kimi family, excelling at complex reasoning, code generation, mathematical problem-solving, and multimodal understanding (vision inputs).

The Kimi K2.5 API follows the OpenAI chat completions format, so developers familiar with GPT-4 can integrate Kimi K2.5 with minimal code changes. The model is identified as kimi-k2-5 in API requests, with a thinking variant available as kimi-k2-5-thinking.
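Since the API follows the OpenAI chat completions format, a vision request would presumably carry the image the way that format does: as a content list mixing a text part and an image_url part. The sketch below builds such a message from raw image bytes. That Kimi K2.5 accepts this exact shape, including base64 data URLs, is an assumption borrowed from the OpenAI convention and is not confirmed by this guide; build_image_message is a hypothetical helper name.

```python
import base64


def build_image_message(prompt: str, image_bytes: bytes, mime_type: str = "image/png") -> dict:
    """Build one OpenAI-style user message carrying text plus an inline image.

    Assumption: the endpoint accepts OpenAI's `image_url` content part with a
    base64 data URL. Check the provider's documentation before relying on it.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": f"data:{mime_type};base64,{encoded}"}},
        ],
    }


# Placeholder bytes for illustration; real code would read an image file.
message = build_image_message("What is in this picture?", b"\x89PNG")
print(message["content"][1]["image_url"]["url"])
```

The resulting dictionary would go into the messages list of a chat completions request in place of a plain-text user message.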

Kimi K2.5 Specifications

| Specification | Value |
| --- | --- |
| Model ID | kimi-k2-5 |
| Context Window | 131,072 tokens (128K) |
| Max Output Tokens | 8,192 tokens |
| Input Price | $1.40 / million tokens |
| Output Price | $4.20 / million tokens |
| Vision Support | Yes (image inputs) |
| Thinking Mode | Yes (kimi-k2-5-thinking) |
| Function Calling | Yes |
| Streaming | Yes |
| OpenAI Compatible | Yes |
| Free Tier | Available |
| Nvidia NIM | Free Access |
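The context window and output limit above imply a budget for the prompt. A minimal sketch of that arithmetic follows, assuming (as is typical for chat models, though not stated in this guide) that prompt and completion tokens share the same 131,072-token window. The function name max_prompt_tokens is illustrative only.

```python
CONTEXT_WINDOW = 131_072   # tokens, from the specifications table
MAX_OUTPUT_TOKENS = 8_192  # tokens, from the specifications table


def max_prompt_tokens(requested_output: int = MAX_OUTPUT_TOKENS) -> int:
    """Return how many prompt tokens fit alongside the requested output.

    Assumption: prompt and completion share one context window.
    """
    if not 0 < requested_output <= MAX_OUTPUT_TOKENS:
        raise ValueError(f"max_tokens must be between 1 and {MAX_OUTPUT_TOKENS}")
    return CONTEXT_WINDOW - requested_output


print(max_prompt_tokens())      # 131072 - 8192 = 122880
print(max_prompt_tokens(2048))  # 131072 - 2048 = 129024
```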

How to Call Kimi K2.5 API with Python and cURL

Here's how to make a basic Kimi K2.5 API call using Python with the OpenAI SDK:

from openai import OpenAI

client = OpenAI(
    api_key="your-kimi-api-key",
    base_url="https://api.moonshot.cn/v1"
)

response = client.chat.completions.create(
    model="kimi-k2-5",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    temperature=0.7,
    max_tokens=2048
)

print(response.choices[0].message.content)

The equivalent request with cURL:

curl https://api.moonshot.cn/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-kimi-api-key" \
  -d '{
    "model": "kimi-k2-5",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    "temperature": 0.7,
    "max_tokens": 2048
  }'

The same call in JavaScript, using the OpenAI Node.js SDK:

import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'your-kimi-api-key',
  baseURL: 'https://api.moonshot.cn/v1'
});

const response = await client.chat.completions.create({
  model: 'kimi-k2-5',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Explain quantum computing in simple terms.' }
  ],
  temperature: 0.7,
  max_tokens: 2048
});

console.log(response.choices[0].message.content);
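The specifications list streaming as supported. With the OpenAI Python SDK, streaming is requested by passing stream=True and iterating over the returned chunks. The sketch below shows that pattern. The helper stream_reply and the offline fake_client stand-in are hypothetical and exist only so the example runs without an API key; that Kimi's chunks match the OpenAI chunk shape is assumed from the compatibility claim.

```python
from types import SimpleNamespace


def stream_reply(client, prompt: str, model: str = "kimi-k2-5") -> str:
    """Request a streamed completion and return the concatenated text."""
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,  # ask for incremental chunks instead of one response
    )
    parts = []
    for chunk in stream:
        # Some chunks carry no choices or an empty delta; skip those.
        if not chunk.choices:
            continue
        text = chunk.choices[0].delta.content
        if text:
            print(text, end="", flush=True)
            parts.append(text)
    print()
    return "".join(parts)


# Offline stand-in for the OpenAI client so the sketch runs without a key.
def _fake_create(**kwargs):
    def chunk(text):
        return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])
    return iter([chunk("Hello"), chunk(None), SimpleNamespace(choices=[]), chunk(", world")])


fake_client = SimpleNamespace(
    chat=SimpleNamespace(completions=SimpleNamespace(create=_fake_create))
)
reply = stream_reply(fake_client, "Say hello")
```

In real use, pass the client object from the Python example above instead of fake_client.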

Is Kimi K2.5 API Better Than GPT-4o

| Metric | Kimi K2.5 | GPT-4o |
| --- | --- | --- |
| Input Price (per 1M tokens) | $1.40 | $2.50 |
| Output Price (per 1M tokens) | $4.20 | $10.00 |
| Context Window | 128K | 128K |
| Vision Support | Yes | Yes |
| Thinking Mode | Yes | No (separate o3) |
| Free Tier | Available | Limited |
| OpenAI SDK Compatible | Yes | Yes (native) |
| Nvidia NIM Access | Free | No |

At the prices listed above, the Kimi K2.5 API is approximately 44% cheaper than GPT-4o for input tokens ($1.40 vs. $2.50) and 58% cheaper for output tokens ($4.20 vs. $10.00), while achieving competitive benchmark scores. For developers prioritizing cost efficiency without sacrificing quality, the Kimi K2.5 API is a strong alternative to OpenAI's offerings.
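The percentages follow directly from the per-million-token prices in the comparison table. The sketch below reproduces that arithmetic and prices one example request. The 10,000 / 2,000 token counts are made-up illustration values, and request_cost and savings_percent are hypothetical helper names.

```python
# Prices in USD per million tokens, copied from the comparison table.
PRICES = {
    "kimi-k2-5": {"input": 1.40, "output": 4.20},
    "gpt-4o": {"input": 2.50, "output": 10.00},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request at the listed per-million-token prices."""
    price = PRICES[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000


def savings_percent(kind: str) -> float:
    """Percentage by which Kimi K2.5 undercuts GPT-4o for 'input' or 'output'."""
    kimi, gpt = PRICES["kimi-k2-5"][kind], PRICES["gpt-4o"][kind]
    return (gpt - kimi) / gpt * 100


print(f"input savings:  {savings_percent('input'):.0f}%")
print(f"output savings: {savings_percent('output'):.0f}%")

# A hypothetical request with 10,000 prompt tokens and 2,000 completion tokens.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 10_000, 2_000):.4f}")
```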

Kimi K2.5 API Pricing

For detailed Kimi K2.5 API pricing, including volume discounts, enterprise plans, and free tier limits, see our complete Kimi API Pricing guide. The cost of the Kimi K2.5 API is among the most competitive for a model of this capability level.

Frequently Asked Questions About Kimi API

What is Kimi K2.5 thinking mode?

Kimi K2.5 thinking mode (kimi-k2-5-thinking) enables chain-of-thought reasoning where the model shows its step-by-step thought process. This improves accuracy on math, logic, and complex reasoning tasks at slightly higher latency and cost.
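Because thinking mode is exposed as a separate model ID, choosing it per request comes down to which model string is sent. A minimal sketch under that reading follows. build_request is a hypothetical helper, and how the reasoning text appears in the response is not specified in this guide, so the sketch only covers the request side.

```python
def build_request(prompt: str, thinking: bool = False, max_tokens: int = 2048) -> dict:
    """Build chat completions keyword arguments for either model variant."""
    return {
        # Model IDs as given in this guide.
        "model": "kimi-k2-5-thinking" if thinking else "kimi-k2-5",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


# Route simple lookups to the standard model, harder reasoning to thinking mode.
fast = build_request("What is the capital of France?")
careful = build_request("Prove that the square root of 2 is irrational.", thinking=True)
print(fast["model"], careful["model"])
```

With an OpenAI SDK client configured for Moonshot's base URL, the dictionary can be unpacked into the call, for example client.chat.completions.create(**careful).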

Can I use Kimi K2.5 API with Python?

Yes. The Kimi K2.5 API is OpenAI-compatible. Install the official OpenAI Python SDK (pip install openai) and set base_url to https://api.moonshot.cn/v1. The chat completions parameters used in this guide (model, messages, temperature, max_tokens) are passed exactly as they would be to OpenAI.

What are Kimi K2.5 API rate limits?

Rate limits vary by account tier. Free accounts receive approximately 3 requests per minute with daily token limits. Paid accounts have higher limits that scale with usage. Enterprise accounts can request custom rate limits through Moonshot AI.
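One way to stay under a requests-per-minute limit is to space calls out on the client side. The sketch below is a simple minimum-interval throttle, not Moonshot's own mechanism. The class name MinIntervalThrottle is hypothetical, and the figure of 3 requests per minute is the approximate free-tier number quoted above, so treat it as a placeholder for your account's actual limit.

```python
import time


class MinIntervalThrottle:
    """Space calls at least `60 / requests_per_minute` seconds apart."""

    def __init__(self, requests_per_minute: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / requests_per_minute
        self._clock = clock  # injectable so the class can be tested offline
        self._sleep = sleep
        self._last_call = None  # scheduled time of the previous call, if any

    def wait(self) -> float:
        """Block until the next call is allowed; return the seconds waited."""
        now = self._clock()
        delay = 0.0
        if self._last_call is not None:
            delay = max(0.0, self._last_call + self.min_interval - now)
        if delay:
            self._sleep(delay)
        self._last_call = now + delay
        return delay


# Approximate free-tier limit from the answer above.
throttle = MinIntervalThrottle(requests_per_minute=3)
print(throttle.min_interval)  # seconds between calls
```

Calling throttle.wait() immediately before each API request keeps consecutive requests at least that interval apart. It does not account for daily token limits, which would need separate tracking.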

Summary

The Kimi K2.5 API delivers flagship-level AI capabilities at a fraction of GPT-4o's cost, with free access available through Nvidia's API Catalog. Start with our API Key guide to get your first K2.5 call running in minutes.