NOTE
Visit this version instead: https://asciinema.org/a/85999
I trimmed that recording because showing ALL of the output produced tens of thousands of lines, which would probably make the asciinema player crash your browser.
Original text as follows:
Demonstration of basic ack usage with the Clinton emails as packaged by the WSJ:
http://graphics.wsj.com/hillary-clinton-email-documents/
I downloaded a couple of the zip files, unzipped them into subdirectories, and then ran poppler’s pdftotext to create a text file from each PDF:
ls *.pdf | xargs -n 1 pdftotext -layout
With the given directory tree of:
hereiam/
|__ clinton-emails/
    |__ dec
    |__ jun
The following ack command searches through the subdirectories for all text files containing a line with the word “To” or “From”, followed by a colon, followed by zero or more spaces, followed by a single capital H (which is how Clinton’s name shows up in the email fields), followed by any non-word character, i.e. anything other than a letter, digit, or underscore (she goes by “H” and not “Hillary”):
ack '(?:From|To): *H\W' .
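To get a feel for what the pattern accepts and rejects, here is a minimal Python sketch. It assumes Python's re module behaves like ack's Perl regex engine for the constructs used here (non-capturing group, ` *`, `\W`). Only the first sample line comes from the actual output; the others are made up for illustration.

```python
import re

# Same pattern as in the ack command above.
PATTERN = re.compile(r"(?:From|To): *H\W")

# First line is taken from the example result; the rest are hypothetical.
samples = [
    "From: H [mailto:HDR22@clintonemail.com]",  # H followed by a space
    "To:H <hypothetical@example.com>",          # zero spaces after the colon
    "From: Hillary Clinton",                    # H followed by a letter
    "To: Chollet, Derek H",                     # H is not right after "To:"
]

# Keep only the lines the pattern matches somewhere.
matches = [line for line in samples if PATTERN.search(line)]

for line in matches:
    print(line)
```

Only the first two samples are printed: in the third the H is followed by a word character, and in the fourth the H is not in the position right after the colon.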
Example result:
clinton-emails/june/C05763789.txt
19:From: H [mailto:HDR22@clintonemail.com]
The top line is the relative path to the file. The second line is the actual match. The number 19 is the line number within the file (you can tell ack to leave out this metadata if you want).
Here’s that same search except with the -C flag, which adds the 2 lines before and after each match. Note how the context lines start with the line number and then a hyphen. The actual line with the match starts with a number followed by a colon, as before:
clinton-emails/june/C05763789.txt
17-
18- Original Message
19:From: H [mailto:HDR22@clintonemail.com]
20-Sent: Thursday, July 16, 2009 7:48 AM
21-To: Chollet, Derek H
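The line-number prefixes can be reproduced with a short Python sketch. This is only an imitation of the output format, not how ack is implemented: the function name `with_context` and its `first_lineno` parameter are hypothetical, and it skips details such as the separator ack prints between non-adjacent groups of lines.

```python
import re

PATTERN = re.compile(r"(?:From|To): *H\W")

def with_context(lines, context=2, first_lineno=1):
    """Return 'N:text' for matching lines and 'N-text' for context lines."""
    hits = {i for i, line in enumerate(lines) if PATTERN.search(line)}
    shown = set()
    for i in hits:
        # Include up to `context` lines on each side of every match.
        start = max(0, i - context)
        stop = min(len(lines), i + context + 1)
        shown.update(range(start, stop))
    out = []
    for i in sorted(shown):
        sep = ":" if i in hits else "-"
        out.append(f"{first_lineno + i}{sep}{lines[i]}")
    return out

# Lines 17 to 21 of the example file, as shown in the output above.
excerpt = [
    "",
    " Original Message",
    "From: H [mailto:HDR22@clintonemail.com]",
    "Sent: Thursday, July 16, 2009 7:48 AM",
    "To: Chollet, Derek H",
]

for row in with_context(excerpt, context=2, first_lineno=17):
    print(row)
```

Note that line 21 is printed as context (hyphen) rather than as a match (colon), because the H in "Derek H" does not come directly after "To:".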