Using ack and regex to convert unstructured NASA GISS temperature data to CSV

by dann
macOS ◆ xterm-256color ◆ bash

A simple exercise in text processing and wrangling with a grep-like tool (ack) and regular expressions.

Context: http://2017.compciv.org/syllabus/assignments/homework/contrived-cli-data-crunching.html#nasa-annual-global-temperature-averages

Original data URL: https://data.giss.nasa.gov/gistemp/graphs_v3/Fig.A.txt

Test data URL: http://stash.compciv.org/2017/nasa-gistemp-sample.txt

Ack: beyondgrep.com/documentation/ack-2.14-man.html

Regex basics:

Note that I throw in an extra ack call for clarity's sake, to separate the captured-group matching from the pattern matching. To do everything with a single ack:

   curl -s http://stash.compciv.org/2017/nasa-gistemp-sample.txt \
     | ack '(\d{4}) +(-?\d+\.\d+)' --output '$1,$2'
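If ack isn't installed, roughly the same year,value extraction can be sketched with sed. This is a swapped-in technique, not what the exercise uses. The function name giss_to_csv and the sample lines are made up for illustration (they imitate the layout of the data, they are not actual GISS values). sed -E has no \d, so the pattern spells out [0-9], and it is anchored to the start of the line, which makes it a bit stricter than the unanchored ack pattern.

```shell
# Hypothetical helper: read GISS-style text on stdin, print "year,value" lines.
# Assumes each data row starts with optional spaces, a 4-digit year,
# then spaces and a decimal number (optionally negative).
giss_to_csv() {
  sed -nE 's/^ *([0-9]{4}) +(-?[0-9]+\.[0-9]+).*$/\1,\2/p'
}

# Made-up sample input in the same shape as the test data (not real values).
giss_to_csv <<'EOF'
 Year  Annual_Mean 5-year_Mean
------------------------------------
 1880     -0.20     -0.13
 1881     -0.12         *
EOF
# prints:
# 1880,-0.20
# 1881,-0.12
```

The -n flag plus the trailing p means only lines where the substitution succeeded are printed, so header and separator rows drop out without a separate filtering step.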