Tomáš Kubica

AI, PDF, and audio

Spiral reading: AI podcast from PDFs without losing context

How to build a simple workflow in which AI turns a PDF into an audiobook without losing the document's original structure.

From PDF to audio

Long PDFs, studies, technical documents, and long manuals have one common problem: I rarely need to read them linearly from page one to the last page. Most of the time I need to understand the map first, then decide where to dive deeper. For me, AI is great exactly for this: it helps me circle around a document repeatedly, each time at a different level of detail.

I call this spiral reading. First, a very high-level overview. Then a more detailed summary. Then questions about specific chapters. And eventually audio, so I can take the content with me into the car or on a walk.

Core idea

Do not generate an audiobook from a PDF in one blind step. First have AI understand the document structure, verify what is important, and only then turn it into spoken form.

  1. Upload the PDF to an AI assistant.Let it map the structure: chapters, sections, tables, important definitions, warnings, and examples.
  2. Ask for a hierarchical summary.First in five bullets, then by chapter, then by the parts that are relevant to you.
  3. Create a reading plan.Decide which parts are worth listening to, which are enough as a summary, and which can be skipped.
  4. Generate the audio script.Have the AI rewrite the relevant parts into fluent spoken language while preserving technical meaning.
  5. Feed the script into TTS.Use a text-to-speech tool or a service that can create a podcast-style output.

This workflow is a little slower than the naive “turn the whole PDF into audio” approach, but it produces much better results. Above all, it keeps context.

When AI converts a document into audio, it tends to smooth things out. That is pleasant to listen to, but risky for technical materials. I therefore watch three things in particular.

The listener should know where they are in the document. Is this an introduction, a method, a result, a limitation, or a recommendation? Without this, the audio sounds nice but is hard to use later.

If the document says “may”, “usually”, or “only if”, the audio script must not turn it into “always”. Modal verbs and caveats matter.

Numbers, units, table labels, and assumptions must either be spoken accurately or intentionally omitted. Half-remembered figures are worse than no figures.

The goal is not a theatrical podcast. The goal is useful spoken reading that still respects the source.

Prompts and checks

I usually split the work into several prompts. One prompt asks for structure, another for the summary, another for a spoken script. For example:

You are helping me read this PDF in a spiral.
First identify the document structure and the author's main argument.
Do not rewrite the document yet.
Return:
1. hierarchy of chapters and sections,
2. key claims,
3. warnings, limitations, or assumptions,
4. parts that are probably worth turning into audio.

Only after that do I ask for the spoken version:

Create a spoken audio script from the selected parts.
Keep the meaning precise, but make the sentences easier to listen to.
Clearly announce transitions between chapters.
Do not invent examples or conclusions that are not in the PDF.

Before I let the text go to TTS, I ask a few control questions. They often reveal that the model has oversimplified something.

  • Which important claim from the original PDF did you omit and why?
  • Which numbers or tables did you intentionally not include in the audio script?
  • Which parts of the script are paraphrases rather than direct content from the document?
  • What should I verify in the original PDF before sharing this audio with someone else?

This is also the moment when I decide whether the output is only for my personal orientation, or whether it is good enough to send to someone else.

This approach is useful mainly when I do not need pixel-perfect fidelity, but I do need to keep the author's logic in my head.

  1. First passI listen to the high-level version to decide whether the document deserves attention.
  2. Focused passI listen only to chapters that matter to my current problem.
  3. Return to sourceFor anything important, I go back to the original PDF and verify details.

For legal, contractual, safety, or medical content I would not rely on audio alone. There the audiobook can help with orientation, but the original remains authoritative.

Practical conclusion

Start with structure, not with audio generation.
Keep the hierarchy of the original document visible in the script.
Force the AI to state what it omitted or simplified.
Use TTS only after you know the script is faithful enough.

AI can turn a PDF into an audiobook, but the real trick is not the voice. The trick is preserving context while changing the medium.