ChatGPTPricing.com is an independent pricing guide. We are not affiliated with, endorsed by, or connected to OpenAI, ChatGPT, or any AI vendor. All pricing data is sourced from publicly available information and may change without notice.

Last verified April 2026

Cut Your OpenAI API Bill: Practical Cost Optimisation Guide

OpenAI API costs can scale quickly as your application grows, but there are proven strategies to reduce your bill by 60-90% without sacrificing quality. This guide covers 10 practical techniques, from simple model selection to advanced model routing, with real cost examples for each.

Cumulative Impact Example

Consider a production application spending $5,000/month on GPT-5 API. Switching to GPT-5-mini for 80% of traffic (-$3,200), enabling batch for offline processing (-$500), leveraging prompt caching (-$300), and compressing prompts (-$200) could reduce the bill to approximately $800/month, an 84% reduction. These savings compound as you scale.

1. Model Selection: Use GPT-5-mini Instead of GPT-5

~80% savings

The single biggest cost lever is choosing the right model. GPT-5-mini costs $0.25 per million input tokens and $2.00 per million output tokens, compared to $1.25 and $10.00 for GPT-5. That is an 80% reduction in both input and output costs. For most standard tasks (summarisation, classification, simple Q&A, content generation), GPT-5-mini produces comparable quality. Reserve GPT-5 for complex reasoning, nuanced analysis, and tasks where quality degradation is unacceptable. Many production systems use GPT-5-mini for 80-90% of requests.
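The arithmetic is easy to check yourself. A small sketch using the list prices quoted above; the monthly token volumes are illustrative assumptions, not real usage data:

```python
# USD per 1M tokens: (input, output), using the list prices quoted above.
PRICES = {
    "gpt-5": (1.25, 10.00),
    "gpt-5-mini": (0.25, 2.00),
}

def monthly_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a month's token volume on a given model."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# Illustrative volume: 400M input + 80M output tokens per month.
full = monthly_cost("gpt-5", 400_000_000, 80_000_000)       # $1,300.00
mini = monthly_cost("gpt-5-mini", 400_000_000, 80_000_000)  # $260.00
print(f"GPT-5: ${full:,.2f}  GPT-5-mini: ${mini:,.2f}  savings: {1 - mini/full:.0%}")
```

Because both the input and output rates drop by the same factor, the savings are 80% regardless of your input/output mix.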

2. Batch API: 50% Off for Non-Real-Time Work

50% savings

The Batch API offers a flat 50% discount on all token costs in exchange for asynchronous processing within a 24-hour window. Any workload that does not require instant responses should use batch processing. Content generation pipelines, nightly data processing, email summarisation queues, and classification tasks are all excellent candidates. The output quality is identical to real-time requests. A $2,000/month real-time workload becomes $1,000/month on batch with zero quality trade-off.
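Batch jobs are submitted as a JSONL file in which each line carries a `custom_id` plus the same request body a real-time chat completions call would use. A minimal sketch of building that file; the prompts, `task-` ids, and `gpt-5-mini` default are illustrative, and the submission step (shown commented) requires the `openai` SDK and an API key:

```python
import json

def batch_request(custom_id: str, prompt: str, model: str = "gpt-5-mini") -> dict:
    """One line of the JSONL input file the Batch API expects."""
    return {
        "custom_id": custom_id,
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        },
    }

prompts = ["Summarise article 1...", "Summarise article 2..."]
jsonl = "\n".join(json.dumps(batch_request(f"task-{i}", p))
                 for i, p in enumerate(prompts))

# Submitting the file (sketch only; requires an API key):
# client = openai.OpenAI()
# f = client.files.create(file=jsonl.encode(), purpose="batch")
# client.batches.create(input_file_id=f.id, endpoint="/v1/chat/completions",
#                       completion_window="24h")
```

Results arrive as a matching JSONL output file; the `custom_id` on each line lets you join responses back to your original requests.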

3. Prompt Caching: Save on Repeated Prefixes

20-50% savings on input

When your requests share a common prefix (system prompt, few-shot examples, RAG context), OpenAI automatically caches the processed tokens and charges 50% less for cached input tokens. This is especially impactful for chatbots with long system prompts, RAG applications, and any workflow where the beginning of each prompt is consistent. No configuration is needed; caching happens automatically. For applications with 2,000-token system prompts, this can save 30-50% on input costs.
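To see how the discount plays out, here is a small sketch of the effective input cost per request; the 2,500/2,000-token request shape is an illustrative assumption, and the rate is the GPT-5-mini input price quoted earlier:

```python
RATE = 0.25  # USD per 1M input tokens (gpt-5-mini list price quoted above)

def input_cost(total_in: int, cached_prefix: int, cached_discount: float = 0.5) -> float:
    """Effective input cost per request when the first `cached_prefix` tokens
    hit the prompt cache and are billed at a 50% discount."""
    fresh = total_in - cached_prefix
    return (fresh + cached_prefix * cached_discount) * RATE / 1_000_000

# 2,500-token request whose first 2,000 tokens (system prompt + examples) are cached:
cold = input_cost(2500, 0)     # no cache hit
warm = input_cost(2500, 2000)  # prefix cached
print(f"input savings on this request shape: {1 - warm/cold:.0%}")
```

The bigger the shared prefix relative to the whole prompt, the closer the input savings approach the 50% discount ceiling.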

4. Prompt Compression: Shorter Prompts, Fewer Tokens

15-30% savings

Every token in your prompt costs money. Techniques to reduce prompt size include removing unnecessary instructions, using concise phrasing, eliminating redundant examples, and structuring prompts efficiently. A prompt that says 'Please provide a brief summary of the following text, keeping your response to approximately 3 sentences' can be shortened to 'Summarize in 3 sentences:' with identical results. Across millions of requests, these savings compound significantly.
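A quick way to sanity-check a compression pass is to compare token counts before and after. The sketch below uses a deliberately rough chars-per-token heuristic so it stays dependency-free; in production you would count with a real tokenizer such as tiktoken:

```python
def approx_tokens(text: str) -> int:
    """Very rough token estimate (~4 characters per token for English text);
    use a real tokenizer like tiktoken for exact counts."""
    return max(1, len(text) // 4)

verbose = ("Please provide a brief summary of the following text, "
           "keeping your response to approximately 3 sentences")
concise = "Summarize in 3 sentences:"

saved = approx_tokens(verbose) - approx_tokens(concise)
print(f"~{approx_tokens(verbose)} -> ~{approx_tokens(concise)} tokens "
      f"(~{saved} saved per request)")
```

A saving of roughly 20 tokens looks trivial on one request, but at a million requests per month it removes ~20M input tokens from the bill.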

5. Semantic Caching: Cache Similar Queries

20-40% savings

Beyond OpenAI's built-in prompt caching, implement application-level semantic caching. When a user asks a question similar to one already answered, return the cached response instead of making a new API call. Tools like GPTCache, Redis with vector search, or custom embedding-based caches can achieve 20-40% cache hit rates for common question patterns. This eliminates API calls entirely for repeated or similar queries.
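A minimal in-memory sketch of the idea: embed each query, and on a new query return the stored answer if the nearest cached embedding is similar enough. The bag-of-words `toy_embed` stands in for a real embedding model, and the 0.85 threshold is an illustrative assumption you would tune on your own traffic:

```python
import math
import re
from collections import Counter

def toy_embed(text: str) -> Counter:
    """Stand-in for a real embedding model: a bag-of-words vector,
    good enough to illustrate the cache logic."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    def __init__(self, threshold: float = 0.85):
        self.threshold = threshold
        self.entries = []  # list of (embedding, answer) pairs

    def get(self, query: str):
        """Return a cached answer for a sufficiently similar query, else None."""
        emb = toy_embed(query)
        best = max(self.entries, key=lambda e: cosine(emb, e[0]), default=None)
        if best and cosine(emb, best[0]) >= self.threshold:
            return best[1]  # cache hit: no API call needed
        return None

    def put(self, query: str, answer: str) -> None:
        self.entries.append((toy_embed(query), answer))

cache = SemanticCache()
cache.put("what is your refund policy", "Refunds within 30 days.")
print(cache.get("What is your refund policy?"))  # near-identical query hits the cache
```

A production version would swap in real embeddings and a vector store (e.g. Redis with vector search), but the hit/miss logic is the same.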

6. Output Length Limits: Control Response Length

10-40% savings

Output tokens are 2-8x more expensive than input tokens. Use the max_tokens parameter to limit response length, and instruct the model to be concise in your system prompt. A response limited to 200 tokens costs significantly less than one that runs to 2,000 tokens. For many tasks (classification, extraction, short answers), responses of 50-200 tokens are sufficient. Setting appropriate limits prevents the model from generating unnecessarily verbose responses.
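The impact is linear in output length, so it is easy to quantify. A sketch at the GPT-5-mini output rate quoted earlier; the request volume and token caps are illustrative, and the commented line shows where the cap is actually set (`max_tokens` on chat completions; newer endpoints use `max_output_tokens`):

```python
OUT_RATE = 2.00 / 1_000_000  # USD per output token at gpt-5-mini's quoted price

def output_cost(tokens_per_response: int, requests: int) -> float:
    """Monthly output spend for a given average response length."""
    return tokens_per_response * requests * OUT_RATE

# 1M requests/month: capped at 200 tokens vs uncapped responses averaging 2,000.
capped = output_cost(200, 1_000_000)      # ~$400
uncapped = output_cost(2000, 1_000_000)   # ~$4,000
print(f"capped: ${capped:,.0f}  uncapped: ${uncapped:,.0f}")

# Setting the cap on the request itself (sketch only):
# client.chat.completions.create(model="gpt-5-mini", messages=..., max_tokens=200)
```

Pair the hard cap with a "be concise" system-prompt instruction; the cap alone can truncate answers mid-sentence, while the instruction shapes them to fit.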

7. Fine-Tuning: Bake Knowledge Into the Model

40-60% savings

Fine-tuning a model on your specific task allows you to use shorter prompts (since the model already knows your domain) and get better results with a smaller model. A fine-tuned GPT-4o-mini can often match or exceed a standard GPT-5 on domain-specific tasks. The upfront cost of fine-tuning is offset by dramatic per-request savings when you eliminate lengthy system prompts and few-shot examples from every API call.
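Whether fine-tuning pays off is a break-even question: divide the one-off training cost by the per-request savings from the shorter prompt. A sketch with illustrative figures (the $100 training cost and the 1,800-to-200-token prompt reduction are assumptions, not quoted prices):

```python
def breakeven_requests(finetune_cost: float, base_prompt_tokens: int,
                       ft_prompt_tokens: int, input_rate_per_m: float) -> float:
    """Requests needed before per-request input savings from the shorter
    prompt repay the one-off fine-tuning cost."""
    saved_per_request = (base_prompt_tokens - ft_prompt_tokens) * input_rate_per_m / 1e6
    return finetune_cost / saved_per_request

# Assume: $100 fine-tuning run; a 1,800-token prompt (system prompt plus
# few-shot examples) shrinks to 200 tokens; input billed at $0.25 per 1M tokens.
n = breakeven_requests(100.0, 1800, 200, 0.25)
print(f"~{n:,.0f} requests to break even")
```

Past the break-even point every request is cheaper, and the savings grow further if the fine-tuned model also lets you drop to a smaller base model.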

8. Structured Outputs: Eliminate Retry Waste

5-15% savings

When you need JSON or structured data from the API, use structured outputs (json_schema) to guarantee valid output format. Without structured outputs, a significant percentage of responses may have formatting errors that require retries, wasting tokens. Structured outputs eliminate this waste by ensuring every response matches your schema on the first attempt. This is particularly impactful for data extraction and classification pipelines.
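A structured-output request attaches a JSON Schema via the `response_format` parameter. A sketch of the payload shape; the ticket-classification schema itself is an illustrative assumption, and the commented line shows where it plugs into a chat completions call:

```python
# response_format payload for structured outputs; "strict": True asks the API
# to guarantee the response conforms to the schema.
schema = {
    "type": "json_schema",
    "json_schema": {
        "name": "ticket_classification",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "category": {"type": "string", "enum": ["billing", "bug", "other"]},
                "urgent": {"type": "boolean"},
            },
            "required": ["category", "urgent"],
            "additionalProperties": False,
        },
    },
}

# client.chat.completions.create(model="gpt-5-mini", messages=...,
#                                response_format=schema)
```

Every response then parses on the first attempt, so the tokens that would have been spent on malformed-output retries are never billed.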

9. Model Routing: Smart Task Assignment

50-70% savings

Not every request needs the same model. Implement a routing layer that analyses incoming requests and assigns them to the most cost-effective model that meets quality requirements. Simple tasks (classification, extraction, yes/no questions) go to GPT-4o-mini. Standard tasks go to GPT-5-mini. Only complex reasoning and analysis tasks go to GPT-5. Many production systems route 60-80% of traffic to the cheapest adequate model.
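The routing layer itself can be very simple once requests carry a task type. A minimal sketch of the tier mapping described above; the task labels are illustrative, and real systems often use a cheap classifier (or a mini model) to assign them:

```python
def route(task_type: str) -> str:
    """Map a pre-classified task type to the cheapest adequate model tier."""
    simple = {"classification", "extraction", "yes_no"}
    complex_tasks = {"reasoning", "analysis"}
    if task_type in simple:
        return "gpt-4o-mini"     # cheapest tier for simple tasks
    if task_type in complex_tasks:
        return "gpt-5"           # reserved for hard problems
    return "gpt-5-mini"          # default tier for standard tasks

for t in ("classification", "summarisation", "reasoning"):
    print(t, "->", route(t))
```

The default-to-cheap-with-escalation pattern matters: a router that falls back to the expensive model on uncertainty quietly erodes the savings.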

10. Monitoring and Alerts: Catch Runaway Costs

Preventive savings

Set up usage monitoring and spending alerts through the OpenAI dashboard. Configure daily and monthly spending limits. Track cost per request, cost per user, and cost per feature in your application. Anomaly detection can catch bugs that cause excessive API calls (infinite loops, retry storms, prompt injection attacks that inflate token counts). A single day of runaway costs can exceed your entire monthly budget if not caught quickly.
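Alongside the dashboard limits, a small in-app tracker can flag anomalies per request, before a runaway loop runs all day. A sketch with illustrative thresholds (the $200 daily limit and the 5%-of-budget anomaly rule are assumptions you would tune):

```python
class SpendMonitor:
    """In-app spend tracking with simple anomaly alerts."""

    def __init__(self, daily_limit_usd: float = 200.0):
        self.daily_limit = daily_limit_usd
        self.today_usd = 0.0
        self.requests = 0

    def record(self, cost_usd: float) -> list:
        """Record one request's cost; return any alerts it triggers."""
        self.today_usd += cost_usd
        self.requests += 1
        alerts = []
        if self.today_usd > self.daily_limit:
            alerts.append(f"daily spend ${self.today_usd:.2f} "
                          f"over ${self.daily_limit:.2f} limit")
        # A single request costing >5% of the daily budget usually signals a
        # bug (runaway prompt, retry storm) rather than normal traffic.
        if cost_usd > 0.05 * self.daily_limit:
            alerts.append(f"anomalous request cost ${cost_usd:.2f}")
        return alerts

mon = SpendMonitor(daily_limit_usd=10.0)
mon.record(0.02)          # normal request: no alerts
print(mon.record(1.50))   # one oversized request trips the anomaly alert
```

Wire the returned alerts into whatever paging or logging you already run; the point is that the check happens on every request, not once a day.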