PDF Text Extraction Shootout: `pdftotext` vs. The Rest

by pdfkungfoo
macOS ◆ xterm-256color ◆ bash

This asciinema cast features a PDF-processing competition. See how different PDF tools score when it comes to extracting a text snippet containing source code from a specific page!

(At certain points the cast may progress too fast for you to follow closely. Use the Pause button in the lower left corner if you need more time to read and understand the screen contents. You can also scroll back if needed.)

The combatants are:

  • Apple Mac OS X Preview.app
  • Adobe Reader XI
  • Acrobat Professional XI
  • Evince (Poppler-based)
  • Chrome web browser with its native, built-in PDF renderer, PDFium
  • Chrome web browser with the external PDF renderer PDF.js
  • Poppler-based command-line tool pdftotext
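To reproduce the command-line side of this comparison, here is a minimal Python sketch of single-page extraction with pdftotext. It assumes Poppler's pdftotext is installed and on PATH, and it uses the tool's documented -f (first page), -l (last page) and -layout options. The helper names, file name and page number are hypothetical. This is not necessarily the exact invocation used in the cast.

```python
import shutil
import subprocess
from typing import List


def build_pdftotext_cmd(pdf_path: str, page: int, layout: bool = True) -> List[str]:
    """Build a pdftotext command that extracts a single page to stdout.

    Hypothetical helper for illustration; not part of Poppler itself.
    """
    if page < 1:
        raise ValueError("page numbers start at 1")
    # Restrict extraction to one page: first page == last page.
    cmd = ["pdftotext", "-f", str(page), "-l", str(page)]
    if layout:
        # -layout asks pdftotext to maintain the original physical layout,
        # which usually helps keep the indentation of source-code listings.
        cmd.append("-layout")
    # A "-" in place of the output file name sends the text to stdout.
    cmd += [pdf_path, "-"]
    return cmd


def extract_page_text(pdf_path: str, page: int, layout: bool = True) -> str:
    """Run pdftotext (assumed to be on PATH) and return the extracted text."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext (Poppler) is not installed or not on PATH")
    result = subprocess.run(
        build_pdftotext_cmd(pdf_path, page, layout),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Hypothetical file name and page number, for illustration only.
    print(" ".join(build_pdftotext_cmd("example.pdf", 7)))
```

Run as a script, it prints the shell command `pdftotext -f 7 -l 7 -layout example.pdf -`. You can paste that command into a terminal, or call `extract_page_text("example.pdf", 7)` to capture the text directly. Building the argument list separately from running it keeps the command easy to inspect before pointing it at a real PDF.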