I Read the GPT-5 Prompting Playbook. Here’s My Hot Take.
- saurabhsarkar
- Aug 14
- 3 min read
Let’s keep it to the five knobs that actually move metrics, with concrete snippets you can drop into prod. Light humor. No fluff.

1) Calibrate agentic eagerness like a throttle.
Why it matters: GPT-5 can either politely wait for orders or go full “let me handle it.” You want different settings per workflow (triage vs. research vs. code edits), not one global vibe. The Playbook frames this explicitly and shows patterns for dialing eagerness up/down.
a) Lower eagerness / faster answers
<context_gathering>
Goal: Get just-enough context and act.
Method:
- Start broad, then branch only if needed.
- One parallel batch of lookups; dedupe; avoid repeats.
Early-stop when:
- You can name the exact object to change OR
- Top sources converge (~70%) on the same action.
If uncertain: proceed with the most plausible assumption and note it.
Budget: ≤ 2 tool calls.
</context_gathering>
b) Higher eagerness / end-to-end autonomy
<persistence>
- You are an agent. Keep going until the task is fully resolved.
- Don’t hand back on uncertainty; research/deduce a reasonable path and continue.
- Terminate only when the solution is verified or blocked by a hard permission boundary.
</persistence>
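If you’d rather set the throttle in code than in prose, the same dial exists at the API level as reasoning effort: pair the low-eagerness prompt with low effort and the persistence prompt with high effort. A minimal sketch (the workflow names and mapping are mine, not the Playbook’s):
from openai import OpenAI

client = OpenAI()

# Hypothetical per-workflow mapping; tune to your own metrics.
EAGERNESS = {
    "triage": "low",       # answer fast, minimal exploration
    "research": "medium",
    "code_edits": "high",  # keep going until verified
}

def run(task: str, workflow: str):
    return client.responses.create(
        model="gpt-5",
        reasoning={"effort": EAGERNESS[workflow]},
        input=task,
    )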
2) Reason once, reuse everywhere (Responses API + previous_response_id)
Why it matters: In agentic flows, re-planning on every call is self-inflicted latency and token burn. The Playbook recommends the Responses API because it persists reasoning between tool calls, and the OpenAI Cookbook and docs show how to thread state with previous_response_id.
Bare-bones Python pattern:
from openai import OpenAI

client = OpenAI()

def step(new_input, prev=None):
    # previous_response_id threads the chain: the prior plan and reasoning
    # persist server-side, so each call only needs to send what's new.
    return client.responses.create(
        model="gpt-5",
        input=new_input,
        # if you support verbosity or reasoning knobs, pass them here too
        text={"format": {"type": "text"}},
        previous_response_id=prev,
    )

# 1) Plan
r1 = step([{"role": "user", "content": "Fix flaky tests in payment module and explain root cause."}])

# 2) Tool result comes back; continue the SAME chain instead of re-planning.
#    (Shown as a user message for brevity; real tool calls return
#    function_call_output items keyed by call_id.)
r2 = step([{"role": "user", "content": "pytest output... failing test X due to race on cache"}], prev=r1.id)

# 3) Keep going until done.
r3 = step([{"role": "user", "content": "applied patch; tests passing"}], prev=r2.id)

For a research agent fetching 5 sources and then synthesizing, keep the same previous_response_id across “search → read → synthesize” so the plan and partial notes persist rather than being re-derived every hop.
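Concretely, that loop can reuse the step helper above (fetch_sources is a hypothetical stand-in for your own retrieval layer):
prev = None
r = step([{"role": "user", "content": "Plan a 5-source scan on cache races; track open questions."}])
prev = r.id

for url in fetch_sources(limit=5):  # hypothetical retrieval layer
    # Each hop sends only the new notes; the plan carries over via prev.
    r = step([{"role": "user", "content": f"Notes from {url}: ..."}], prev=prev)
    prev = r.id

final = step([{"role": "user", "content": "Synthesize the findings."}], prev=prev)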
3) Split thinking hard from talking long (reasoning effort ≠ verbosity)
Why it matters: GPT-5 separates verbosity (how long the final answer is) from reasoning effort (how much internal thinking it does). That lets you keep chat terse but code diffs verbose, or vice versa, without mangling prompts.
Practical defaults:
- Chat surfaces: verbosity = "low"; reasoning effort = "medium".
- Audit/explanations: verbosity = "high"; reasoning effort = "high".
- Latency-sensitive chores (formatting/extractors): verbosity = "low"; reasoning effort = "minimal".
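As request parameters, those defaults look like this, assuming the Responses API’s text.verbosity and reasoning.effort knobs:
from openai import OpenAI

client = OpenAI()

# Chat surface: terse answer, normal thinking.
chat = client.responses.create(
    model="gpt-5",
    input="Summarize this incident for the on-call channel.",
    text={"verbosity": "low"},
    reasoning={"effort": "medium"},
)

# Latency-sensitive extractor: think as little as possible, answer tersely.
extract = client.responses.create(
    model="gpt-5",
    input="Extract the invoice total from: ...",
    text={"verbosity": "low"},
    reasoning={"effort": "minimal"},
)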
Per-tool override example:
{"model": "gpt-5",
"input": "...user task...",
"text": {"verbosity": "low"},
"tools": [
{
"type": "custom", "name": "code_edit",
"description": "Apply patch to repository",
"defaults": {"text": {"verbosity": "high"}} // verbose only when patching
}
]
}
4) Clean up instruction hierarchy (this quietly eats ~30% of performance)
Why it matters: GPT-5 is very literal. Contradictions make it waste reasoning tokens trying to reconcile impossible rules. The Playbook shows a medical triage spec where “never schedule without consent” collides with “auto-assign same-day slot without contacting the patient.” Fix the hierarchy and the model stops chasing its tail.
Bad (contradictory) sketch:
- Never schedule without consent.
- For red/orange cases, auto-assign earliest same-day slot without contacting the patient.
Fixed hierarchy (promptable skeleton):
<policy>
<global>
- Never schedule without explicit consent in chart.
- Always look up patient before actions.
</global>
<exceptions>
- Emergency: bypass lookup; instruct 911 immediately; no scheduling.
</exceptions>
<red_orange>
- Inform patient first, record consent, then auto-assign earliest same-day slot.
</red_orange>
</policy>
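Once the hierarchy is clean, ship it at the top of the instruction stack so per-task prompts can’t re-introduce the contradiction. A minimal sketch, assuming the Responses API’s instructions parameter (the task input is hypothetical):
from openai import OpenAI

client = OpenAI()

POLICY = """<policy>
... the hierarchy above ...
</policy>"""

resp = client.responses.create(
    model="gpt-5",
    instructions=POLICY,  # system-level rules outrank the task input
    input="Patient reports red-flag symptoms; handle scheduling.",
)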
5) Make code edits repo-safe with explicit “editing rules”
Why it matters: Most “the model messed up my repo” complaints are actually missing constraints. The Playbook cites Cursor’s experience: GPT-5 obeys structured, scoped rules extremely well, especially for verbosity, code style, and tool behavior.
<code_editing_rules>
Principles:
- Small, reviewable diffs. Prefer `apply_patch` edits.
- Preserve public APIs unless user asks to break them.
Repo Layout:
- /apps/web (Next.js/TS, Tailwind, shadcn/ui)
- /packages/ui (design system)
- /packages/lib (pure functions; no fetch)
UI Patterns:
- Use shadcn components; icons via lucide-react.
- CSS via Tailwind; no inline styles.
State & Data:
- Server actions for mutations; SWR for client fetch.
- Zod for input validation.
Testing:
- Add/adjust tests when changing behavior; run `pnpm test`.
Diff Style:
- One logical change per commit; include migration notes if needed.
</code_editing_rules>
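One way to wire this up: pass the rules as system-level instructions next to the patching tool from the verbosity example. A sketch under the same assumptions (the task input is hypothetical; the tool shape mirrors the JSON in section 3):
from openai import OpenAI

client = OpenAI()

RULES = """<code_editing_rules>
... the rules above ...
</code_editing_rules>"""

resp = client.responses.create(
    model="gpt-5",
    instructions=RULES,
    input="Rename the checkout hook in /apps/web and update its tests.",
    tools=[{
        "type": "custom",  # freeform tool: the model emits raw patch text
        "name": "code_edit",
        "description": "Apply patch to repository",
    }],
)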


