OpenAI API Pricing: Every Model, Every Cost
The OpenAI API offers a range of models at different price points, from the budget GPT-4o-mini at $0.15 per million input tokens to the frontier O3-Pro at $150.00 per million tokens for both input and output. This page covers every available model with input, output, cached, and batch pricing so you can choose the right model for your workload and budget.
GPT-5 Family
| Model | Input / 1M | Output / 1M | Cached Input | Batch (In/Out) | Notes |
|---|---|---|---|---|---|
| GPT-5 | $1.25 | $10.00 | $0.625 | $0.625 / $5.00 | Flagship model |
| GPT-5-mini | $0.25 | $2.00 | $0.125 | $0.125 / $1.00 | Budget GPT-5 |
| GPT-5.2 Instant | $0.50 | $3.00 | $0.25 | $0.25 / $1.50 | Fast variant (Go plan model) |
Reasoning Models
| Model | Input / 1M | Output / 1M | Cached Input | Batch (In/Out) | Notes |
|---|---|---|---|---|---|
| O3 | $2.00 | $8.00 | $1.00 | $1.00 / $4.00 | Advanced reasoning |
| O3-mini | $0.50 | $2.00 | $0.25 | $0.25 / $1.00 | Budget reasoning |
| O3-Pro | $150.00 | $150.00 | N/A | N/A | Maximum capability |
GPT-4o Family (Legacy)
| Model | Input / 1M | Output / 1M | Cached Input | Batch (In/Out) | Notes |
|---|---|---|---|---|---|
| GPT-4o | $2.50 | $10.00 | $1.25 | $1.25 / $5.00 | Previous flagship |
| GPT-4o-mini | $0.15 | $0.60 | $0.075 | $0.075 / $0.30 | Cheapest model |
Embedding Models
| Model | Input / 1M | Output / 1M | Cached Input | Batch (In/Out) | Notes |
|---|---|---|---|---|---|
| text-embedding-3-large | $0.13 | - | - | $0.065 | Best quality embeddings |
| text-embedding-3-small | $0.02 | - | - | $0.01 | Budget embeddings |
All prices in USD per million tokens unless noted. Source: developers.openai.com/api/docs/pricing.
How Tokens Work
What Is a Token?
Tokens are the fundamental units that language models process. In English, one token roughly equals 4 characters or about 0.75 words. A typical sentence contains 15-20 tokens. A full page of text (approximately 500 words) contains about 670 tokens. Common words like "the", "is", and "a" are single tokens, while longer or rarer words may be split into multiple tokens. Punctuation marks are usually separate tokens.
Quick reference:
- 1 sentence ~ 15-20 tokens
- 1 paragraph (100 words) ~ 130 tokens
- 1 page (500 words) ~ 670 tokens
- 1 book (80,000 words) ~ 107K tokens
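The ratios above can be turned into a rough estimator. This is only a heuristic based on the ~0.75 words-per-token figure quoted here; for exact counts you would use a real tokenizer such as OpenAI's tiktoken library, which this sketch deliberately avoids to stay dependency-free:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~0.75 words-per-token ratio above.

    A heuristic only; exact counts require a real tokenizer
    (e.g. OpenAI's tiktoken library).
    """
    words = len(text.split())
    return round(words / 0.75)

# A 500-word page should land near the ~670-token figure quoted above.
page = " ".join(["word"] * 500)
print(estimate_tokens(page))  # 667
```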
Input vs Output Pricing
You pay separately for input tokens (what you send to the model, including your prompt and any context) and output tokens (what the model generates in response). Output tokens are usually more expensive, typically 2-8x the input cost, because generating new text requires more computation than reading existing text.
This pricing structure means you can optimise costs by sending concise prompts (reducing input tokens) and requesting concise responses with the max_tokens parameter (reducing output tokens). For GPT-5, the output-to-input price ratio is 8x ($10.00 vs $1.25), making output optimisation particularly impactful.
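The split-rate arithmetic is simple enough to sketch. A minimal cost function, using the GPT-5 rates from the table above (prices in USD per million tokens):

```python
# Prices in USD per million tokens, from the GPT-5 table above.
GPT5_INPUT = 1.25
GPT5_OUTPUT = 10.00

def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float = GPT5_INPUT,
                 output_price: float = GPT5_OUTPUT) -> float:
    """Cost of a single request in USD."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# 2,000 prompt tokens and 500 completion tokens on GPT-5:
print(request_cost(2_000, 500))  # 0.0075
```

Note that the 500 output tokens cost twice as much as the 2,000 input tokens here, which is the 8x price ratio at work.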
Cost Reduction Features
Batch API (50% Discount)
The Batch API allows you to submit large sets of requests for asynchronous processing within a 24-hour window. In exchange for this flexibility on timing, OpenAI charges exactly half the standard rate for both input and output tokens. The results are identical to real-time API calls: same models, same quality, same format.
Best for: content generation pipelines, data classification, email processing, document summarisation, and any workload where you do not need instant results. Not suitable for chatbots, real-time applications, or interactive user experiences.
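A batch job starts from a JSONL file with one request object per line. The sketch below builds those lines with the standard library, following the request shape OpenAI's Batch API documentation describes (custom_id, method, url, body); the file upload and batch creation steps via the openai SDK are omitted, and the model name is just an example:

```python
import json

def build_batch_lines(prompts, model="gpt-5-mini"):
    """Build the JSONL body for a Batch API input file.

    One JSON object per line; format per OpenAI's Batch API docs.
    Uploading the file and creating the batch (openai SDK) are omitted.
    """
    lines = []
    for i, prompt in enumerate(prompts):
        lines.append(json.dumps({
            "custom_id": f"request-{i}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": model,
                "messages": [{"role": "user", "content": prompt}],
            },
        }))
    return "\n".join(lines)

jsonl = build_batch_lines(["Summarise this document.", "Classify this email."])
```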
Prompt Caching (50% Input Discount)
When you send the same prompt prefix repeatedly (common in chatbots and RAG applications), OpenAI caches the processed tokens and charges half the standard input rate for cached tokens. This happens automatically; you do not need to opt in. The cache is maintained per model and typically expires after a few minutes of inactivity.
Best for: chatbots with system prompts, RAG applications with consistent context, any application where the beginning of each prompt is the same. Can save 30-50% on input costs for qualifying workloads.
Rate Limit Tiers
| Tier | Qualification | RPM (GPT-5) | TPM (GPT-5) |
|---|---|---|---|
| Free | $0 spent | 3 | 40,000 |
| Tier 1 | $5+ spent | 500 | 200,000 |
| Tier 2 | $50+ spent, 7+ days | 5,000 | 2,000,000 |
| Tier 3 | $100+ spent, 7+ days | 5,000 | 10,000,000 |
| Tier 4 | $250+ spent, 14+ days | 10,000 | 50,000,000 |
| Tier 5 | $1,000+ spent, 30+ days | 10,000 | 150,000,000 |
RPM = Requests per minute. TPM = Tokens per minute. Limits vary by model.
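Staying under an RPM cap client-side can be done with a sliding window. This is a minimal sketch with injected timestamps (not the openai SDK's built-in retry logic, and it ignores the TPM dimension):

```python
from collections import deque

class SlidingWindowLimiter:
    """Minimal client-side RPM throttle: tracks request timestamps in a
    60-second window and reports how long to wait before sending."""

    def __init__(self, rpm: int, window: float = 60.0):
        self.rpm = rpm
        self.window = window
        self.sent = deque()

    def acquire(self, now: float) -> float:
        """Seconds to wait at time `now`; 0.0 means send immediately."""
        # Drop timestamps that have aged out of the window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) < self.rpm:
            self.sent.append(now)
            return 0.0
        # Window full: wait until the oldest request ages out.
        return self.sent[0] + self.window - now

# Free tier allows 3 RPM: the fourth request in quick succession must wait.
limiter = SlidingWindowLimiter(rpm=3)
print([limiter.acquire(t) for t in (0.0, 1.0, 2.0, 3.0)])  # [0.0, 0.0, 0.0, 57.0]
```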
Real-World Cost Examples
Customer Support Chatbot
Model: GPT-5-mini | 1,000 conversations/day
Each conversation averages 500 input tokens (system prompt + user message) and 300 output tokens. Using GPT-5-mini keeps costs under $1/day for moderate volume.
Daily cost: $0.73 | Monthly cost: $21.90
Content Generation Pipeline
Model: GPT-5 (Batch) | 100 articles/day
Generating 100 blog articles daily with detailed prompts. Using Batch API cuts costs 50%. Output-heavy workload means output tokens dominate the cost.
Daily cost: $10.12 | Monthly cost: $303.75
RAG Application
Model: GPT-5-mini | 5,000 queries/day
Retrieval-augmented generation with cached system prompt and context. High input volume offset by prompt caching (50% input discount).
Daily cost: $2.63 | Monthly cost: $78.75
Code Review Tool
Model: GPT-5 | 200 reviews/day
Reviewing code diffs with detailed analysis. Using the full GPT-5 model for quality. Each review averages 5K input tokens (code + prompt) and 2.5K output tokens.
Daily cost: $6.25 | Monthly cost: $187.50
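The code review scenario is the one with fully stated token counts, so its figures can be checked directly. A quick sanity check, assuming a 30-day month:

```python
# Reproducing the Code Review Tool figures: GPT-5 at $1.25 in / $10.00 out
# per 1M tokens, 200 reviews/day, 5K input + 2.5K output tokens each.
# (30-day month assumed.)
IN_PRICE, OUT_PRICE = 1.25, 10.00

def daily_cost(requests: int, in_tokens: int, out_tokens: int) -> float:
    per_request = (in_tokens * IN_PRICE + out_tokens * OUT_PRICE) / 1_000_000
    return requests * per_request

daily = daily_cost(200, 5_000, 2_500)
print(f"${daily:.2f}/day, ${daily * 30:.2f}/month")  # $6.25/day, $187.50/month
```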
When to Use the API vs a Subscription
Choose the API When
- You need programmatic access to integrate into your application
- You process large volumes of data automatically
- You need fine-grained control over model parameters
- Your monthly usage would exceed the Plus message limits
- You want to use Batch API for 50% cost savings
- You need to use multiple models for different tasks
Choose a Subscription When
- You primarily interact with ChatGPT through the web interface
- You need Deep Research, image generation, or Sora
- You want a predictable monthly cost with no surprises
- Your usage is moderate (under 80 messages per 3 hours)
- You need team management and admin features
- You prefer a no-code approach without API integration