I Wrote a Script to Fix Audible's Unreadable PDF Filenames


If you buy audiobooks on Audible and download the PDF companions, you get files named like this:

bk_adbl_022796.pdf
bk_rand_002806.pdf
bk_sans_007977.pdf
bk_harp_004529.pdf

Which one is “Thinking, Fast and Slow”? Which one is “Leonardo da Vinci”? You have no idea. Audible names these files with internal publisher codes that mean nothing to a human. If you have five PDFs, it’s annoying. If you have fifty, it’s a project.

I had fifty.

What the tool does

audible-pdf-renamer is a Python CLI that extracts the actual book title from each PDF and renames the file. Point it at a folder, and:

bk_rand_002806.pdf  →  Thinking Fast and Slow.pdf
bk_sans_007977.pdf  →  Leonardo da Vinci.pdf
bk_upfr_000065.pdf  →  Information Architecture for the Web and Beyond.pdf

That’s it. One command fixes the whole folder.

Why three tiers matter

The obvious approach is to read the PDF metadata title field. Many tools stop there. The problem is that a lot of Audible PDFs don’t have useful metadata – the title field is blank, or it contains the same cryptic code as the filename.

So the tool uses a three-tier fallback:

Tier 1: PDF metadata. Check the title property. If it’s a real title, use it. This is the fastest path and works for maybe 60% of the files.
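A minimal sketch of what a tier 1 check could look like, assuming pypdf (one of the listed dependencies). `PdfReader(...).metadata.title` is pypdf's real API. The `is_plausible_title` heuristic and the `AUDIBLE_CODE` regex are hypothetical: the regex's shape is inferred from the example filenames above, not taken from the tool.

```python
import re
from typing import Optional

# Hypothetical pattern inferred from the example filenames:
# "bk_", a short publisher code, "_", then digits, optionally ".pdf".
AUDIBLE_CODE = re.compile(r"^bk_[a-z]+_\d+(\.pdf)?$", re.IGNORECASE)


def is_plausible_title(title: Optional[str]) -> bool:
    """Hypothetical heuristic: reject blank titles and bare Audible codes."""
    if title is None:
        return False
    cleaned = title.strip()
    if len(cleaned) < 2:
        return False
    return not AUDIBLE_CODE.match(cleaned)


def metadata_title(path: str) -> Optional[str]:
    """Return the PDF's metadata title if it looks like a real book title."""
    from pypdf import PdfReader  # lazy import so the heuristic works without pypdf

    info = PdfReader(path).metadata  # None when the PDF has no info dictionary
    title = info.title if info is not None else None
    return title.strip() if is_plausible_title(title) else None
```

Rejecting the `bk_` code itself covers the case where the metadata title just repeats the filename.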

Tier 2: Text extraction. If metadata fails, extract text from the first few pages and look for title patterns. The tool skips boilerplate (copyright notices, publisher info, “Also by this author” blocks) and finds the actual title. This catches most of the remaining files.
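Here is one way the boilerplate skipping in tier 2 might work. This is a sketch under assumptions: the marker list and length limits are my guesses, not the tool’s actual rules. `pdfplumber.open()` and `page.extract_text()` are pdfplumber’s real calls. The heuristic skips an entire “Also by” block, not just its header, because the lines under that header are titles of other books.

```python
import re
from typing import Optional

# Hypothetical boilerplate markers; the real tool's list isn't published in this post.
BOILERPLATE = re.compile(
    r"copyright|©|all rights reserved|isbn|published by|printed in|www\.|https?://",
    re.IGNORECASE,
)
ALSO_BY = re.compile(r"^also by\b", re.IGNORECASE)


def title_from_text(text: str, max_lines: int = 40) -> Optional[str]:
    """Return the first line that isn't boilerplate, an 'Also by' list, or noise."""
    in_also_by = False
    for raw in text.splitlines()[:max_lines]:
        line = " ".join(raw.split())  # collapse runs of whitespace
        if not line:
            in_also_by = False  # a blank line ends an "Also by" list
            continue
        if ALSO_BY.match(line):
            in_also_by = True  # the following lines are other books' titles
            continue
        if in_also_by or BOILERPLATE.search(line):
            continue
        if len(line) < 3 or len(line) > 120 or not any(c.isalpha() for c in line):
            continue  # page numbers, rules, or a paragraph of body text
        return line
    return None


def text_title(path: str, pages: int = 3) -> Optional[str]:
    """Extract text from the first few pages with pdfplumber and pick a title."""
    import pdfplumber  # lazy import; only needed for real PDFs

    with pdfplumber.open(path) as pdf:
        for page in pdf.pages[:pages]:
            title = title_from_text(page.extract_text() or "")
            if title:
                return title
    return None
```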

Tier 3: OCR. Some PDFs – particularly from publishers like O’Reilly – are image-based. There’s no text to extract. The tool renders the pages as images and runs Tesseract OCR to read the title off the page. This is slower but catches the files that tiers 1 and 2 miss entirely.
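A rough sketch of the OCR tier, assuming the optional pdf2image and pytesseract packages from the install section. `convert_from_path(..., first_page=, last_page=, dpi=)` and `pytesseract.image_to_string()` are those libraries’ real functions. The `ocr_available` check is my own illustration of how a tool could report “OCR: Available” or fall back when Tesseract or Poppler is missing.

```python
import shutil


def ocr_available() -> bool:
    """True when pdf2image, pytesseract, Tesseract and Poppler are all present."""
    try:
        import pdf2image  # noqa: F401
        import pytesseract  # noqa: F401
    except ImportError:
        return False
    # Tesseract does the OCR; Poppler's pdftoppm renders PDF pages to images.
    return shutil.which("tesseract") is not None and shutil.which("pdftoppm") is not None


def ocr_text(path: str, pages: int = 3, dpi: int = 200) -> str:
    """Render the first few pages to images and OCR them with Tesseract."""
    from pdf2image import convert_from_path
    import pytesseract

    images = convert_from_path(path, dpi=dpi, first_page=1, last_page=pages)
    return "\n\n".join(pytesseract.image_to_string(img) for img in images)
```

The OCR output is just text, so it would then be filtered for a title the same way as tier 2’s extracted text. Rendering at a moderate DPI keeps this tier from being even slower than it already is.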

Each tier is tried in order. The first one that produces a plausible title wins. If all three fail, the file is left alone and reported in the summary.
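The fallback logic itself is small. This is a minimal sketch, not the tool’s actual code: each tier is a function that returns a title or `None`, and the first non-empty result wins. The tier names match the `(extracted via text)` / `(extracted via ocr)` labels in the output. Treating a crashing tier as a miss is my assumption.

```python
from typing import Callable, Optional, Sequence, Tuple

Extractor = Callable[[str], Optional[str]]


def extract_title(
    path: str, tiers: Sequence[Tuple[str, Extractor]]
) -> Tuple[Optional[str], Optional[str]]:
    """Try each (name, extractor) in order; return (title, tier name) or (None, None)."""
    for name, extractor in tiers:
        try:
            title = extractor(path)
        except Exception:
            title = None  # a tier that crashes on a bad PDF counts as a miss
        if title:
            return title, name
    return None, None  # caller leaves the file alone and lists it in the summary
```

For example, `extract_title(path, [("metadata", metadata_fn), ("text", text_fn), ("ocr", ocr_fn)])` (hypothetical function names) returns `("Thinking Fast and Slow", "text")` when metadata comes back empty. Dropping the OCR entry from the list is all a skip-OCR mode would need.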

Usage

# Rename PDFs in a specific folder
python audible_pdf_renamer.py ~/Downloads/Audible

# Preview changes without renaming anything
python audible_pdf_renamer.py ~/Downloads/Audible --dry-run

# See how each title was extracted
python audible_pdf_renamer.py ~/Downloads/Audible --verbose

# Skip OCR for faster processing
python audible_pdf_renamer.py --no-ocr

# Process all PDFs, not just Audible-named ones
python audible_pdf_renamer.py --pattern "*.pdf"
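If you want to build something similar, the flags above map naturally onto argparse. This is a sketch, not the script’s real parser. Two defaults are assumptions: that the folder falls back to the current directory when omitted (as in the `--no-ocr` and `--pattern` examples), and that the default pattern is `bk_*.pdf`.

```python
import argparse
from pathlib import Path
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Rename Audible PDF companions by title.")
    parser.add_argument("folder", nargs="?", default=".",
                        help="folder to scan (assumed default: current directory)")
    parser.add_argument("--dry-run", action="store_true", help="preview without renaming")
    parser.add_argument("--verbose", action="store_true",
                        help="show which tier found each title")
    parser.add_argument("--no-ocr", action="store_true", help="skip the OCR tier")
    parser.add_argument("--pattern", default="bk_*.pdf",
                        help="glob of files to process (assumed default)")
    return parser.parse_args(argv)


def find_pdfs(folder: str, pattern: str) -> List[Path]:
    """List matching files in a stable order; expanduser handles a quoted '~'."""
    return sorted(Path(folder).expanduser().glob(pattern))
```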

The --dry-run flag exists because renaming 50 files at once is something you should preview first.
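One simple way to get a trustworthy dry run is to compute every rename first and make the disk write the only step the flag skips. This sketch assumes a plan of `(source, destination)` pairs. It illustrates the idea and is not the tool’s implementation.

```python
from pathlib import Path
from typing import List, Tuple


def apply_plan(plan: List[Tuple[Path, Path]], dry_run: bool) -> int:
    """Print every planned rename; perform them only when dry_run is False."""
    done = 0
    for src, dest in plan:
        print(f"{src.name}\n  → {dest.name}")
        if dry_run:
            continue  # preview only: nothing on disk changes
        src.rename(dest)
        done += 1
    return done
```

Because the preview and the real run share the same plan, what `--dry-run` shows is exactly what a real run would do.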

Install

# Basic (metadata + text extraction)
pip install pdfplumber pypdf

# With OCR support (recommended)
pip install pdfplumber pypdf pytesseract pdf2image

# macOS OCR dependencies
brew install tesseract poppler

# Ubuntu/Debian
sudo apt-get install tesseract-ocr poppler-utils

Or clone the repo:

git clone https://github.com/snapsynapse/audible-pdf-renamer.git
cd audible-pdf-renamer
pip install -r requirements.txt

What the output looks like

Audible PDF Renamer v1.0.0
Folder: /Users/you/Downloads/Audible Booknotes
OCR: Available

Found 51 PDF(s) to process

======================================================================

bk_rand_002806.pdf
  → Thinking Fast and Slow.pdf
    (extracted via text)
  ✓ Renamed successfully

bk_upfr_000065.pdf
  → Information Architecture for the Web and Beyond.pdf
    (extracted via ocr)
  ✓ Renamed successfully

======================================================================

Summary:
  ✓ Renamed: 51
  ✗ Failed: 0

Each file shows the original name, the new name, which extraction tier succeeded, and whether the rename worked. No surprises.

Edge cases

The tool handles filename conflicts (it appends a number if a file with that title already exists), strips special characters that would break file systems, and by default only processes files matching Audible’s bk_ naming pattern. The --pattern flag overrides this if your PDFs follow a different naming convention.
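Here is a sketch of those two edge cases. The post doesn’t say which characters the real tool strips or how it formats the appended number, so the unsafe-character set (Windows-reserved characters plus control characters) and the ` (2)` suffix style are my assumptions.

```python
import re
from pathlib import Path

# Assumed unsafe set: characters Windows forbids in filenames, plus control characters.
UNSAFE = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_filename(title: str, max_len: int = 200) -> str:
    """Strip characters that would break common file systems."""
    name = UNSAFE.sub("", title)
    name = " ".join(name.split()).rstrip(". ")  # Windows rejects trailing dots/spaces
    return name[:max_len] or "untitled"


def unique_path(folder: Path, stem: str, suffix: str = ".pdf") -> Path:
    """Append ' (2)', ' (3)', ... until the name is free (numbering style assumed)."""
    candidate = folder / f"{stem}{suffix}"
    n = 2
    while candidate.exists():
        candidate = folder / f"{stem} ({n}){suffix}"
        n += 1
    return candidate
```

For example, `safe_filename("Leonardo da Vinci: A Biography?")` gives `Leonardo da Vinci A Biography`. A second book with that same title would get `Leonardo da Vinci A Biography (2).pdf`.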

Who this is for

Anyone with a growing folder of Audible PDF companions who’s tired of opening files one by one to figure out which book they belong to. It’s a small annoyance that compounds over time, and this fixes it in about ten seconds.

GitHub repo – Python, MIT licensed, v1.0.0.

Built by SnapSynapse. Filed under “I needed this, so I built it.”