Don't play this version, play this one instead: https://asciinema.org/a/85999

by dann
macOS ◆ xterm-256color ◆ bash

NOTE

Visit this version instead: https://asciinema.org/a/85999

I trimmed that recording because showing ALL of the output, as this one does, produced tens of thousands of lines, which will probably make the asciinema player crash your browser.

Original text as follows:

Demonstration of basic ack usage with the Clinton emails as packaged by the WSJ:

http://graphics.wsj.com/hillary-clinton-email-documents/

I downloaded a couple of the zip files, unzipped them into subdirectories, and then ran poppler’s pdftotext to create a text file from each PDF:

                  ls *.pdf | xargs -n 1 pdftotext -layout
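
The one-liner above splits on whitespace, so it would trip over PDF names that contain spaces. A minimal sketch of a more defensive variant follows, assuming bash and poppler's pdftotext on the PATH. The function name pdfs_to_text is made up for illustration and is not part of poppler.

```shell
# Convert every PDF in a directory to a same-named .txt file.
# pdfs_to_text is a hypothetical helper name, not a poppler command.
pdfs_to_text() {
  local dir pdf
  dir=${1:-.}                      # default to the current directory
  for pdf in "$dir"/*.pdf; do
    [ -e "$pdf" ] || continue      # skip the unexpanded glob when no PDFs exist
    pdftotext -layout "$pdf"       # with no output name given, foo.pdf becomes foo.txt
  done
}
```

Run it once per unzipped subdirectory, passing the directory path as the only argument.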

With the given directory tree of:

    hereiam/
    |__ clinton-emails/
         |__dec
         |__jun

The following ack command searches through the subdirectories for all text files that have a line containing the word “To” or “From”, followed by a colon, followed by 0 or more spaces, followed by a single capital H (which is how Clinton’s name shows up in the email fields), followed by a non-word character, i.e. anything other than a letter, digit, or underscore (she goes by “H” and not “Hillary”):

                ack '(?:From|To): *H\W' .

Example result:

       clinton-emails/june/C05763789.txt
       19:From: H [mailto:HDR22@clintonemail.com]

The top line is the relative path to the file. The second line is the actual match. The number 19 is the line number within the file (you can tell ack not to include this metadata if you want).

Here’s that same search with the -C flag, which by default adds 2 lines of context before and after each match. Note how the context lines start with the line number and then a hyphen, while the line with the actual match starts with the line number followed by a colon, as before:

        clinton-emails/june/C05763789.txt
        17-
        18-    Original Message
        19:From: H [mailto:HDR22@clintonemail.com]
        20-Sent: Thursday, July 16, 2009 7:48 AM
        21-To: Chollet, Derek H
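
Because matches and context lines differ only in the separator after the line number, the two are easy to tell apart mechanically. The sketch below is one way to do it, assuming the grouped output format shown above. The function names are hypothetical and the sample text is copied from this walkthrough.

```shell
# The -C output from above, reproduced as sample input.
context_output() {
  cat <<'EOF'
clinton-emails/june/C05763789.txt
17-
18-    Original Message
19:From: H [mailto:HDR22@clintonemail.com]
20-Sent: Thursday, July 16, 2009 7:48 AM
21-To: Chollet, Derek H
EOF
}

# Keep only real matches: lines that start with a line number and a colon.
only_matches() {
  grep -E '^[0-9]+:'
}

# Keep only context lines: lines that start with a line number and a hyphen.
only_context() {
  grep -E '^[0-9]+-'
}

context_output | only_matches
```

This prints just the line numbered 19. Swapping in only_context prints the four surrounding lines instead, and the file path line is dropped by both filters because it does not start with a number.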