Re: [PLUG] scientific computing data formats

Isn't size a big factor? If one is keeping huge amounts of data, then binary formats may be better for time and space, at the cost of possibly more complex data handling.
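
To make the size point concrete, here is a rough sketch that stores the same numbers once as CSV text and once as raw binary doubles. The data is randomly generated for illustration, and the exact CSV size will vary with the values; only the standard library is assumed.

```python
import array
import csv
import io
import random

# Hypothetical data: 100,000 double-precision measurements.
random.seed(0)
values = [random.random() for _ in range(100_000)]

# Text form: one value per row, written as CSV.
text_buf = io.StringIO()
writer = csv.writer(text_buf)
for v in values:
    writer.writerow([repr(v)])
text_size = len(text_buf.getvalue().encode("ascii"))

# Binary form: raw 8-byte doubles in the machine's native byte order.
binary = array.array("d", values).tobytes()
binary_size = len(binary)

print("CSV bytes:   ", text_size)
print("binary bytes:", binary_size)

# Round trip: reading the bytes back restores the values exactly.
restored = array.array("d")
restored.frombytes(binary)
assert list(restored) == values
```

The "more complex data handling" shows up right away: the raw bytes carry no column names or types, and native byte order is not portable between all machines, so a real binary format needs some agreed-upon header or schema.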

From: Walt Mankowski <waltman@pobox.com>
To: Philadelphia Linux User's Group Discussion List <plug@lists.phillylinux.org>
Sent: Thursday, March 26, 2009 9:37:56 AM
Subject: Re: [PLUG] scientific computing data formats

On Thu, Mar 26, 2009 at 08:56:09AM -0400, Paul L. Snyder wrote:
> What about CSV isn't working for you?
> 
> How are you going to be using the data?
> 
> What languages will you be using it from?
> 
> How much data are you talking about?

Most science grad students I know end up using CSV files.  Those are
mostly physics/astronomy students, who tend to be relatively
computer-savvy, and CS students, who tend to be a little less savvy
but can usually handle a text file. :)  When I was doing
bioinformatics work a few years ago, they all used Excel spreadsheets.

> Lately I've been using Python for this sort of thing.  It's quite
> convenient to use one script to read your data into Python data structures,
> then use the pickle module to save them out.  Other scripts can then work
> directly with the pickled data.
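
A minimal sketch of the pickle workflow described in the quoted text, assuming a small CSV input. The column names, sample values, and the file name records.pkl are made up for illustration.

```python
import csv
import io
import os
import pickle
import tempfile

# Stand-in for a real CSV file; the columns are hypothetical.
raw = io.StringIO("sample,mass\na,1.5\nb,2.25\n")

# "Script one": parse the text once into plain Python structures.
records = [
    {"sample": row["sample"], "mass": float(row["mass"])}
    for row in csv.DictReader(raw)
]

# Save the parsed structures out with pickle.
path = os.path.join(tempfile.mkdtemp(), "records.pkl")
with open(path, "wb") as f:
    pickle.dump(records, f, protocol=pickle.HIGHEST_PROTOCOL)

# "Script two": load the pickled data directly, with no re-parsing.
with open(path, "rb") as f:
    loaded = pickle.load(f)

print(loaded)
```

Two caveats: pickle files are only readable from Python, and unpickling data from an untrusted source is unsafe because it can execute arbitrary code.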

I've heard a lot of good things about SciPy, but I haven't used it
myself and don't know if it would be useful for what you're doing.
Perl has a large and well-maintained module called BioPerl that may or
may not be of use to you.  I used SQLite for a recent project and it
worked quite well.  As Paul says, it all depends on what your data
looks like, what languages you know, how many people need to use it,
and so on.

Walt
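
For anyone curious about the SQLite option mentioned above, here is a minimal sketch using Python's built-in sqlite3 module. The table name, columns, and values are hypothetical, and this is not necessarily how SQLite was used in the project described.

```python
import sqlite3

# An in-memory database; passing a file path instead would persist the data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (sample TEXT, mass REAL)")
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?)",
    [("a", 1.5), ("b", 2.25), ("a", 3.0)],
)
conn.commit()

# Let the database do the grouping and averaging.
rows = conn.execute(
    "SELECT sample, COUNT(*), AVG(mass) FROM measurements "
    "GROUP BY sample ORDER BY sample"
).fetchall()
print(rows)
conn.close()
```

The database is a single file that can be read from most languages, which helps when several people with different tools need the same data.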