
Start using GPT‑5 through OpenAI's API, today

5-minute read


Introduction to GPT-5

GPT-5 is OpenAI’s flagship model for the second half of 2025. It’s built for deeper reasoning, better coding, and agentic workflows, and it adds two controls that matter in practice: verbosity and reasoning effort. It runs with a 400,000-token total context and can emit up to 128,000 tokens per response.

If you’re new to large language models, skim my plain-English explainer on how GPT-style LLMs work. You’ll prompt better afterward.

Ready? Let’s ship your first GPT-5 request.

Create an account to get your GPT-5 API key

  1. Create an account or sign in.

Creating an account on OpenAI

  2. Confirm your email address.
  3. Log in.
  4. Open the Billing overview page and add credit or a payment method so your keys work right away. (The free-credit program ended mid-2024.)
  5. Generate your first API key for GPT-5. Keys are shown once; paste it into a password manager immediately.

API key generation on OpenAI

Got your key? Great. Time to hit the API.

How to make your first request to GPT-5

OpenAI’s Responses API is the modern endpoint. Chat Completions still works, but start with Responses unless you have a hard reason not to.

macOS and Linux (Responses API):

curl -s https://api.openai.com/v1/responses \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5",
    "input": [
      { "role": "user", "content": [{ "type": "input_text", "text": "Hello!" }] }
    ],
    "text": { "verbosity": "medium" },
    "reasoning": { "effort": "minimal" },
    "max_output_tokens": 200
  }'

Windows (one-liner, Chat Completions still fine):

curl -X POST -H "Content-Type: application/json" -H "Authorization: Bearer %OPENAI_API_KEY%" https://api.openai.com/v1/chat/completions -d "{ \"model\": \"gpt-5\", \"messages\": [{\"role\":\"user\",\"content\":\"Hello!\"}], \"verbosity\":\"medium\", \"reasoning_effort\":\"minimal\", \"max_completion_tokens\":200 }"

Pro tip: Use gpt-5 to track the latest GPT-5 snapshot. If you need strict reproducibility, pin a dated snapshot in your stack instead.

Token budget: a single call supports up to 400,000 tokens total. Max output is 128,000 tokens. Your rate-limit tier must be high enough to feed that much TPM; check your org’s quotas before long prompts.
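Before a long prompt, it helps to sanity-check your token budget locally. A minimal sketch, assuming a crude 4-characters-per-token heuristic (use a real tokenizer such as tiktoken for accurate counts):

```python
# Rough pre-flight check that a prompt plus the requested output budget
# fits in GPT-5's 400K-token total window. The 4-chars-per-token ratio
# is only a heuristic for English text.

CONTEXT_WINDOW = 400_000
MAX_OUTPUT = 128_000

def fits_in_context(prompt: str, max_output_tokens: int) -> bool:
    """True if the estimated prompt tokens plus the output budget
    stay inside the total context window."""
    if max_output_tokens > MAX_OUTPUT:
        raise ValueError("max_output_tokens exceeds the 128K output cap")
    estimated_prompt_tokens = len(prompt) // 4  # crude estimate
    return estimated_prompt_tokens + max_output_tokens <= CONTEXT_WINDOW
```

If this returns False, trim the prompt or lower max_output_tokens before spending money on a request that will be rejected or truncated.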

Rock-solid JSON with Structured Outputs (Responses API)

In the Responses API, JSON output is configured under text.format. Sending the Chat Completions-style response_format to this endpoint fails with an unknown-parameter error. Use this shape:

curl -s https://api.openai.com/v1/responses \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5",
    "input": [
      { "role": "system", "content": [{ "type": "input_text", "text": "Return compact JSON only." }] },
      { "role": "user",   "content": [{ "type": "input_text", "text": "Solve 8x + 31 = 2." }] }
    ],
    "text": {
      "format": {
        "type": "json_schema",
        "name": "equation_solution",
        "schema": {
          "type": "object",
          "properties": {
            "steps": { "type": "array", "items": { "type": "string" } },
            "final_answer": { "type": "string" }
          },
          "required": ["steps", "final_answer"],
          "additionalProperties": false
        },
        "strict": true
      }
    }
  }'

That’s the Responses-API-correct way to enforce a schema. For Chat Completions, you still use response_format.
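Even with strict mode on, it’s cheap insurance to validate what you parse before trusting it downstream. A stdlib-only sketch checking the equation_solution payload above against its required keys and types (the jsonschema package would be more thorough):

```python
import json

def parse_equation_solution(raw: str) -> dict:
    """Parse and minimally validate a response shaped like the
    equation_solution schema: steps (array of strings) and
    final_answer (string), both required."""
    data = json.loads(raw)
    if not isinstance(data.get("steps"), list) or not all(
        isinstance(s, str) for s in data["steps"]
    ):
        raise ValueError("steps must be an array of strings")
    if not isinstance(data.get("final_answer"), str):
        raise ValueError("final_answer must be a string")
    return data

# Example with a response shaped like the schema above:
raw = '{"steps": ["8x = 2 - 31", "8x = -29", "x = -29/8"], "final_answer": "x = -29/8"}'
solution = parse_equation_solution(raw)
```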

Vision and multimodal (quick-start)

GPT-5 accepts text and images in one request. With the Responses API, set image parts as { "type": "input_image", "image_url": "<url or data URL>" }, then put your text after the image for better results.

Supported image formats: PNG, JPEG/JPG, WEBP, and non-animated GIF. Size limit: up to 50 MB of image payload per request, across all image parts.

Image URL example:

curl -s https://api.openai.com/v1/responses \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5",
    "input": [
      {
        "role": "user",
        "content": [
          { "type": "input_image", "image_url": "https://cdn.example.com/slide.jpg" },
          { "type": "input_text",  "text": "Describe this slide in 5 bullets." }
        ]
      }
    ],
    "max_output_tokens": 250
  }'

Base64 option:

{ "type": "input_image", "image_url": "data:image/jpeg;base64,...." }
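Building that data URL by hand is error-prone; a small helper keeps it consistent. A sketch, with illustrative placeholder bytes standing in for a real image file:

```python
import base64

def image_part_from_bytes(image_bytes: bytes, mime: str = "image/jpeg") -> dict:
    """Wrap raw image bytes as a Responses-API input_image content part
    using the data-URL form."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "type": "input_image",
        "image_url": f"data:{mime};base64,{encoded}",
    }

# Placeholder bytes for illustration; read a real file in practice.
part = image_part_from_bytes(b"\xff\xd8\xff\xe0fake-jpeg-bytes")
```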

Pro tips

  • One image per request unless you’re explicitly comparing; caption each image if you send several.
  • Prefer URLs in long threads to avoid resending Base64.
  • Always cap max_output_tokens so multimodal answers don’t run wild.

Verbosity (new)

What it does: constrains how compact or expansive the answer is without rewriting the prompt. Values: "low", "medium" (default), "high". In the Responses API it nests under text; in Chat Completions it’s a top-level verbosity parameter. Set it deliberately.

When to use low: terse assistants, tool-first UX, status replies. When to use high: audits, code reviews, pedagogical explanations.

"text": { "verbosity": "low" }

Reasoning effort (new)

What it does: controls how much internal reasoning the model does before responding. Values: "minimal", "low", "medium" (default), "high". "minimal" is new and fast for simple tasks. In the Responses API it nests under reasoning as effort; in Chat Completions it’s a top-level reasoning_effort parameter.

  • Use “minimal” for retrieval, formatting, simple transforms, low-latency UX.
  • Use “high” for complex planning, multi-step refactors, ambiguous tradeoffs.
"reasoning": { "effort": "minimal" }
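In practice you’ll pick verbosity and effort per task type rather than per request. A sketch of a payload builder with hypothetical task categories (the preset names and mappings are my own, not OpenAI’s):

```python
# Map task categories to verbosity/effort presets and merge them into
# a Responses API payload. Tune the categories to your application.

PRESETS = {
    "lookup":   {"text": {"verbosity": "low"},    "reasoning": {"effort": "minimal"}},
    "refactor": {"text": {"verbosity": "high"},   "reasoning": {"effort": "high"}},
    "default":  {"text": {"verbosity": "medium"}, "reasoning": {"effort": "medium"}},
}

def build_payload(task: str, prompt: str) -> dict:
    """Build a Responses API payload, falling back to the default
    preset for unknown task categories."""
    preset = PRESETS.get(task, PRESETS["default"])
    return {
        "model": "gpt-5",
        "input": [{"role": "user",
                   "content": [{"type": "input_text", "text": prompt}]}],
        **preset,
    }
```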

GPT-5 pricing

Model                       Input (per 1M)   Output (per 1M)
gpt-5 (400K context)        $1.25            $10.00
gpt-5-mini (400K context)   $0.25            $2.00
gpt-5-nano (400K context)   $0.05            $0.40
gpt-4.1 (1M context)        $2.00            $8.00
gpt-4.1-mini (1M context)   $0.40            $1.60
gpt-4.1-nano (1M context)   $0.10            $0.40

Prompt-cached input is cheaper; check the official pricing and your model page for cached-input rates.
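A back-of-the-envelope cost estimate from the table above helps when budgeting (cached-input discounts ignored here):

```python
# USD per 1M tokens for the GPT-5 family, from the pricing table.
PRICES = {
    "gpt-5":      (1.25, 10.00),
    "gpt-5-mini": (0.25, 2.00),
    "gpt-5-nano": (0.05, 0.40),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated request cost in USD: tokens scaled to millions,
    multiplied by the per-1M input and output rates."""
    inp_rate, out_rate = PRICES[model]
    return input_tokens / 1e6 * inp_rate + output_tokens / 1e6 * out_rate

cost = estimate_cost("gpt-5", 10_000, 2_000)  # 0.0125 + 0.02 = 0.0325
```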

Output limits: GPT-5 can emit up to 128K tokens per call; GPT-4.1’s max output is ~32K with a ~1M-token context. If you need the absolute longest context, 4.1 still has the edge; otherwise default to GPT-5.

GPT-5 (full), mini, or nano?

  • GPT-5 (full): flagship quality for deep reasoning, complex coding, long-context analysis.
  • GPT-5 mini: cost-sensitive apps with crisp prompts.
  • GPT-5 nano: ultra-low latency and volume workloads.

There’s also gpt-5-chat-latest if you want a non-reasoning chat flavor.

