JP Vossen via plug on 15 Apr 2022 14:41:51 -0700



[PLUG] Book: Data Science at the Command Line


I mentioned this one in the "A list of new(ish) command line tools" thread, but I've finished reading it now and wanted to call some things out.

For us on this list, if nothing else, go read https://datascienceatthecommandline.com/2e/list-of-command-line-tools.html.

I really liked this book.  It reads well, especially for such a potentially dry topic; it's engaging and it just flows.  I learned about a TON of commands I didn't know existed, mostly because most of them aren't installed by default, and some aren't in the repos at all.  That's a bummer, but knowing the tool exists is sometimes 95% of the battle.  I've called out 3 really interesting CSV commands in the list below.

Full disclosure, the author recommended one of my books in a couple of places, but I didn't know that until I read it.

----
* https://www.oreilly.com/library/view/data-science-at/9781492087908/
    * Data Science at the Command Line, 2nd Edition
    * by Jeroen Janssens
    * Released August 2021
    * Publisher(s): O'Reilly Media, Inc.
    * ISBN: 9781492087915
* Code: https://github.com/jeroenjanssens/data-science-at-the-command-line
* Site: https://datascienceatthecommandline.com/
* Read it for free: https://datascienceatthecommandline.com/2e/
* AWESOME list of CLI tools: https://datascienceatthecommandline.com/2e/list-of-command-line-tools.html
    * The only problem is that many of these tools are NOT installed by default, and some are not in repos.
    * But some are: apt install moreutils csvtool
* And there's a (largeish Ubuntu zsh) Docker container with ALL of those tools!
    * Read: https://datascienceatthecommandline.com/2e/chapter-2-getting-started.html
    * docker pull datasciencetoolbox/dsatcl2e
    * docker run --rm -it -v "$PWD":/data datasciencetoolbox/dsatcl2e

Amazing tools: https://csvkit.rtfd.org:
    sudo apt install csvkit     # LOTS of Python modules
    csvcut      Filter and truncate CSV files
    csvgrep     Search CSV files
    csvjoin     Execute a SQL-like join to merge CSV files on a specified column or columns
    csvlook     Render a CSV file in the console as a Markdown-compatible, fixed-width table
    csvsort     Sort CSV files
    csvsql      Execute SQL statements on CSV files           <<<<<<<<<<<<<<<<<<<<<<<
    csvstack    Stack up the rows from multiple CSV files
    csvstat     Print descriptive statistics for each column in a CSV file
    in2csv      Convert common, but less awesome, tabular data formats to CSV
        in2csv data.xls > data.csv
    sql2csv     Execute an SQL query on a database and output the result to a CSV file
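
To give a feel for how a few of these chain together, here is a minimal sketch, assuming csvkit is installed.  The file sales.csv and its columns are made up for illustration, and the csvkit steps are skipped if the tools aren't on your PATH.

```shell
# Hypothetical sample data; sales.csv and its columns are invented for this example.
cat > sales.csv <<'EOF'
region,item,qty
east,widget,10
west,gadget,5
east,gadget,7
west,widget,3
EOF

# Only run the csvkit steps if csvkit is actually installed.
if command -v csvsql >/dev/null 2>&1; then
    # Pick two columns by name, keep rows whose region matches "east",
    # and render the result as a fixed-width table.
    csvcut -c region,qty sales.csv | csvgrep -c region -m east | csvlook

    # Run SQL straight against the file; by default the table name is
    # the file name without the ".csv" extension.
    csvsql --query 'SELECT region, SUM(qty) AS total FROM sales GROUP BY region ORDER BY region' sales.csv
else
    echo "csvkit not found; try: sudo apt install csvkit" >&2
fi
```

The nice part is that every step reads and writes plain CSV, so the tools compose with pipes like any other filter.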

Related:
    json2csv    Convert JSON to CSV
        https://github.com/jehiah/json2csv
        Or with csvkit: in2csv data.json > data.csv
    xml2json    Convert an XML input to a JSON output, using xml-mapping
        https://github.com/parmentf/xml2json
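
A quick sketch of the in2csv route for JSON, again assuming csvkit is installed.  The file people.json and its fields are hypothetical, and the input here is a flat list of objects.

```shell
# Hypothetical sample input; people.json and its fields are invented for this example.
cat > people.json <<'EOF'
[
  {"name": "Ada", "lang": "bash"},
  {"name": "Grace", "lang": "awk"}
]
EOF

# Only run the conversion if csvkit's in2csv is actually installed.
if command -v in2csv >/dev/null 2>&1; then
    # in2csv guesses the input format from the file extension.
    in2csv people.json > people.csv
    # Show the converted file as a table.
    csvlook people.csv
else
    echo "in2csv not found; try: sudo apt install csvkit" >&2
fi
```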
----

Later,
JP
--  -------------------------------------------------------------------
JP Vossen, CISSP | http://www.jpsdomain.org/ | http://bashcookbook.com/
___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug