Think Ahead

How async design changes AI response speed

The point of this essay: asking AI to "look it up right now" is slower, more expensive, and less stable than asking it to "prepare this by morning." The classic design wisdom of separating heavy and light processes is more powerful than ever in the age of AI.

1. The moment I noticed the weight of "right now"

When I first automated my morning briefing, the design was simple: say "good morning," and the AI would check the project board, read the inbox, search for news, and return a summary. One request, everything at once.

It worked. It was useful. But it was a little slow. News retrieval alone took over ten seconds. Waiting for that every morning was fine — but something nagged at me.

The source of that nagging became clear quickly. News is what happened between yesterday and today. Whether I fetch it at 6am or 7am makes no real difference. So why was the system searching the web every single time I asked for a briefing?

2. An old idea: synchronous and asynchronous

Computer science has a pair of concepts called synchronous and asynchronous processing. Synchronous means waiting for one task to finish before moving on. Asynchronous means starting a task and continuing without waiting, then collecting the result later, when it's ready — or, pushed further, doing the heavy work in advance so that only the result needs fetching at the moment of use.

This is standard practice in web services. Image conversion doesn't run at request time — it runs in nightly batches. Search indexes aren't rebuilt on every access — they're prepared in advance. Heavy work is done ahead of time. Only the light work runs in real time.

This wisdom is decades old. What's new is asking the same question about AI: does this process need to happen right now?
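The contrast fits in a few lines of Python. This is a sketch, not anyone's production code: the heavy task is a stub standing in for something like image conversion or index building.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def heavy_task():
    time.sleep(0.2)  # stand-in for image conversion or index building
    return "result"

# Synchronous: the caller blocks for the full duration at request time.
start = time.time()
sync_result = heavy_task()
sync_wait = time.time() - start  # roughly the full 0.2s

# Asynchronous: kick the work off ahead of time, do other things,
# then collect only the finished result when it's needed.
pool = ThreadPoolExecutor()
future = pool.submit(heavy_task)   # started in advance
time.sleep(0.25)                   # meanwhile, other work happens
start = time.time()
async_result = future.result()     # already done; returns immediately
async_wait = time.time() - start   # near zero
pool.shutdown()
```

The result is identical either way; only the moment the waiting happens has moved.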

3. The idea of "pre-loading" the news

I changed the design. I separated news retrieval from the briefing itself.

Every morning at 6am (automated)
  → Fetch news (web search + AI summary)
  → Save result to file

Whenever I say "good morning"
  → Generate briefing
  → Read the saved news file (no API call)

News retrieval is heavy. It calls a large model, searches the web, summarizes results. But it only needs to happen once a day. The briefing became light — reading a file takes almost no time. The response became nearly instant.
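The split might look like this in Python. The file path and function names are illustrative, and the heavy step is stubbed where the real web search and model call would go.

```python
import json
import time
from pathlib import Path

# Hypothetical cache location; the essay doesn't specify one.
NEWS_CACHE = Path("news_cache.json")

def fetch_news() -> None:
    """Heavy step: web search plus model summary, stubbed out here.
    Runs once a day on a schedule, never at briefing time."""
    summary = "Top stories of the day."  # stand-in for the real search + summarize call
    NEWS_CACHE.write_text(json.dumps({"fetched_at": time.time(), "summary": summary}))

def briefing() -> str:
    """Light step: read the saved file. No API call, near-instant."""
    cached = json.loads(NEWS_CACHE.read_text())
    return "Good morning. " + cached["summary"]
```

The file is the seam between the two halves: the scheduler calls `fetch_news` at 6am, and "good morning" only ever calls `briefing`.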

Costs dropped too. News retrieval uses a capable model but runs once. The briefing can be called as many times as needed at low cost. Same quality output, fewer resources consumed.

4. When "right now" is actually justified

Some processes genuinely need real-time data. Stock prices. Breaking news. The live status of a running system. These can't be pre-loaded — their value is entirely in freshness.

The question to ask is: "If I gathered this information a few hours ago, would it lose its value?"

Morning news — not really. Project task status — unlikely to have changed overnight. Voice memo insights — entered the previous evening. All of these are processes that can be done in advance.

The demand for real-time is often an assumption rather than a requirement. When you question the design, many processes turn out to be candidates for pre-computation.
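That question can even be written down as a rule of thumb: pre-compute when the output stays useful at least as long as the gap between refreshes. A Python sketch, with illustrative staleness budgets — the specific numbers are assumptions, not from the essay.

```python
from datetime import timedelta

# Illustrative staleness budgets, following the essay's examples.
STALENESS_BUDGET = {
    "morning_news": timedelta(hours=24),   # yesterday-to-today; the exact hour doesn't matter
    "task_status": timedelta(hours=12),    # unlikely to change overnight
    "stock_price": timedelta(seconds=15),  # value is entirely in freshness
}

def should_precompute(kind: str, refresh_interval: timedelta) -> bool:
    """A process is a pre-computation candidate when its output stays
    useful at least as long as the gap between scheduled refreshes."""
    return STALENESS_BUDGET[kind] >= refresh_interval
```

With a daily 6am refresh, morning news and task status qualify; stock prices do not.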

5. Human thinking has the same structure

This design principle isn't unique to AI or computers. Human thinking works the same way.

Skilled decision-makers don't gather information in the moment of decision. They read the materials the night before, think through the implications during the morning commute, and arrive at the meeting with pre-loaded thinking already in place. When the discussion starts, they move quickly and clearly — not because they're reacting faster, but because the heavy work was already done.

The value of thinking ahead isn't only speed. It's the clarity that comes from a lighter cognitive load at the moment of action. Deciding while information is still being collected is harder than deciding after it's been absorbed. Separating the processes raises the quality of judgment.

6. The scheduler working in the background

To realize this design, you need something that works while no one is watching. In technical terms, a scheduler — a mechanism that triggers tasks at specified times.

Every morning at 6am, it quietly starts, fetches the news, saves the file, and goes back to sleep. No one asked it to. No one is watching. But by the time anyone wakes up, the preparation is already done.

Personal infrastructure can now include this kind of "work that happens while you sleep." Cloud schedulers are cheap or free, and a small overnight job costs almost nothing to run. The system thinks ahead so you don't have to.

7. Separation creates freedom

When processes are separated, each one can evolve independently.

To improve news quality, only the retrieval component needs to change. To adjust how the briefing looks, only the display component needs to change. Neither affects the other. This is what loosely coupled design enables.

When everything was combined, changing one thing risked breaking everything. After separation, changes became small and safe. The system became easier to grow.

8. Changing the question changes the design

"How do we make this faster?" is an optimization question. "Does this need to happen right now?" is a design question. The second one cuts deeper.

As more people bring AI into their daily work, the same question will keep surfacing. Is this process I'm running in real time actually time-sensitive? Could this content I'm generating on demand be prepared in advance?

Changing the question changes the design. Changing the design reduces cost, increases speed, and simplifies the system. And simple systems are the ones that keep running.

"Right now" is often an assumption. Thinking ahead is what turns AI from a tool into a habit.

TokiStorage is a project for preserving voice, image, and text for 1,000 years.
The design principle of thinking ahead is built into the preservation system itself.
