Good prompting is not about making requests tiny. It is about making requests legible. The best prompt is often not the shortest one. It is the one that prevents dumb misfires.
What good prompting is
A good prompt gives the model four things: goal, scope, constraints, and output shape.
- Goal: what you want done
- Scope: what to focus on
- Constraints: what not to do, or what matters most
- Output: what kind of answer you want back
For example: “Review this config. Focus only on backups and logging. Don’t change anything yet. Give me the top three issues and the recommended order of action.”
That is strong because it reduces ambiguity without turning into ceremony.
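The four-part pattern can be sketched as a tiny prompt builder. Everything here — the `Prompt` dataclass and its `render` method — is a hypothetical helper for illustration, not part of any SDK:

```python
from dataclasses import dataclass, field

@dataclass
class Prompt:
    # The four things a legible request carries.
    goal: str                   # what you want done
    scope: str                  # what to focus on
    constraints: list[str] = field(default_factory=list)  # what not to do, or what matters most
    output: str = ""            # what kind of answer you want back

    def render(self) -> str:
        lines = [self.goal, f"Focus only on: {self.scope}."]
        lines += [f"Constraint: {c}." for c in self.constraints]
        if self.output:
            lines.append(f"Answer with: {self.output}.")
        return "\n".join(lines)

review = Prompt(
    goal="Review this config.",
    scope="backups and logging",
    constraints=["Don't change anything yet"],
    output="the top three issues and the recommended order of action",
)
print(review.render())
```

The structure matters more than the helper: writing the four parts as separate fields makes it obvious when one is missing.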
What memory is for
Memory should hold things that still matter later. Preferences. Lasting decisions. Environment facts. Lessons learned. Patterns that change future behaviour.
Good memory items sound like this:
- “Work through agreed lists in order and don’t introduce side quests mid-stream.”
- “Use the archive layer for temp artifacts; keep Git focused on docs, memory, and source-ish files.”
- “Backups must live outside the tree being backed up.”
Bad memory items are mostly just session exhaust — trivia that won’t help next time.
What context is for
Context is the working set for the current task. Think of it as expensive attention. You want enough of it to avoid re-explaining everything, but not so much that the model spends energy dragging dead weight around.
Good use of context looks like:
- including the specific file or message that matters right now
- reminding the model of one decision that changes what “correct” means
- supplying examples when formatting matters
Bad use of context looks like pasting large irrelevant logs, rehashing old decisions that no longer matter, or carrying stale assumptions forward because nobody cleaned them up.
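One way to make "expensive attention" concrete is a token budget: rank candidate snippets by how much they matter right now and stop when the budget is spent. A minimal greedy sketch (the relevance scores and token counts are assumed inputs, supplied by you or some retrieval step):

```python
def pack_context(snippets, budget_tokens):
    """Greedy pack: take the most relevant snippets first, skip what doesn't fit.
    `snippets` is a list of (text, relevance, token_count) tuples."""
    chosen, used = [], 0
    for text, _relevance, cost in sorted(snippets, key=lambda s: s[1], reverse=True):
        if used + cost <= budget_tokens:
            chosen.append(text)
            used += cost
    return chosen

ctx = pack_context(
    [
        ("the config file under review", 0.9, 400),
        ("yesterday's unrelated deploy log", 0.1, 3000),
        ("the decision that changed what 'correct' means", 0.8, 120),
    ],
    budget_tokens=1000,
)
# Small, high-relevance items make the cut; the large irrelevant log does not.
```

The point is not the algorithm but the discipline: every snippet has to justify its token cost against the current task.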
How to read the graphs
Input tokens
This is what the model had to read: your prompt, the conversation, tool results, files, instructions, and whatever else got loaded for the turn.
High input is not automatically bad. It is bad when it is mostly irrelevant.
Output tokens
This is what the model wrote back.
Long output is not the problem. Unnecessary output is the problem. If extra explanation prevents a wrong action, it can be worth every token.
Cache reads
Usually a good sign. It means previously processed context was reused instead of being paid for again from scratch. In other words: continuity is working.
Cache writes
This is the setup work that lets later turns reuse context efficiently. You usually do not need to micromanage it. It is part of making follow-up turns cheaper.
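The input and cache numbers reduce to one useful ratio: how much of what the model read was served from cache rather than reprocessed. A rough sketch — the function name and the assumption that cache reads are counted as part of input tokens are mine; real APIs report these fields under their own names:

```python
def cache_hit_ratio(input_tokens: int, cache_read_tokens: int) -> float:
    """Fraction of the turn's input that was reused from cache.
    Assumes cache_read_tokens is counted within input_tokens."""
    if input_tokens == 0:
        return 0.0
    return cache_read_tokens / input_tokens

# A turn where 12,000 of 15,000 input tokens came from cache:
ratio = cache_hit_ratio(15_000, 12_000)
print(f"{ratio:.0%}")  # → 80%
```

A high ratio on follow-up turns is what "continuity is working" looks like in the numbers.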
What good usage looks like
The healthy question is not “were the token numbers high?” It is “did the extra context reduce guessing, rework, or mistakes?”
Spend tokens where they buy accuracy. Do not spend them on noise.
High usage is worth it when the task is messy, when the downside of error is expensive, or when structured explanation now prevents five confused follow-up turns later.
Usage is wasteful when the model is hauling around irrelevant baggage, branching into side quests, or solving the wrong problem because the scope was never made clear.
The practical standard
You do not need perfect prompts. You need prompts that are good enough to reduce avoidable drift.
- Say what you want.
- Say what to focus on.
- Say what not to do yet.
- Say what kind of answer you want back.
That is already enough to get noticeably better results in everyday work.