Rich Freeman on 19 Dec 2016 08:17:12 -0800



Re: [PLUG] Wanted: volunteers with bandwidth, storage, coding skills to help save climate data


On Mon, Dec 19, 2016 at 11:11 AM, Doug Stewart <zamoose@gmail.com> wrote:
> The problem with data is that, even at the fattest pipe speeds, the fastest
> transit method is still overnighting HDDs via FedEx. We used to get DNA
> sequences from Tufts, Johns Hopkins, etc. via this method when I was at
> CHOP. Transfer time via Internet2 connections: ~1 month. Via FedEx: 2 days.
>

How long ago was that?  A human genome is only about 3 gigabases
(call it 4 to keep the math round), with 2 bits per base (before
compression).  Granted, I hear some plant genomes are insanely large,
but a lot of that is repetitive.

At 2 bits per base that's 1GB, and 1GB isn't THAT much data to
transfer, and that is before compression.
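To make the 2-bits-per-base figure concrete, here's a quick Python sketch that packs a DNA string four bases to a byte. The A/C/G/T-to-0..3 mapping and the name pack_2bit are my own illustrative choices, not any particular file format, and it assumes the sequence holds only A, C, G and T (no N or other ambiguity codes).

```python
# Hypothetical 2-bit code; the mapping is arbitrary and only for illustration.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def pack_2bit(seq: str) -> bytes:
    """Pack an A/C/G/T string into bytes, 4 bases per byte, first base in the high bits.

    A short final group is padded with zero bits, so the caller has to keep
    the original length around to decode it unambiguously.
    """
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]  # KeyError on anything but A/C/G/T
        byte <<= 2 * (4 - len(chunk))  # left-align a short final chunk
        out.append(byte)
    return bytes(out)


seq = "ACGTACGTAC"
packed = pack_2bit(seq)
print(len(seq), "bases ->", len(packed), "bytes")  # 10 bases -> 3 bytes
```

At four bases per byte, 4 gigabases packs into 1e9 bytes, which is where the 1GB figure comes from.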

Now, if it is all stored as ASCII files with 1 character per base, plus
maybe 10-20% overhead for things like line numbers and such, then I
could see it expanding, but that is still only a 4-5x expansion in
size (8 bits per base instead of 2, plus the overhead).
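The 4-5x number is just 8 bits per character versus 2 bits per base, times the formatting overhead. A quick sanity check in Python; the 10% and 20% overheads are the rough guesses above, not measurements of any real sequence file, and size_gb is a made-up helper name.

```python
GENOME_BASES = 4e9  # the 4-gigabase figure used above


def size_gb(bases: float, bits_per_base: float, overhead: float = 0.0) -> float:
    """Storage in decimal gigabytes (1e9 bytes) at a given encoding and fractional overhead."""
    return bases * bits_per_base * (1 + overhead) / 8 / 1e9


packed = size_gb(GENOME_BASES, 2)  # 2-bit packing, no overhead
for overhead in (0.10, 0.20):  # guessed ASCII formatting overhead
    ascii_gb = size_gb(GENOME_BASES, 8, overhead)
    print(f"{overhead:.0%} overhead: {ascii_gb:.1f} GB, "
          f"{ascii_gb / packed:.1f}x the 2-bit size")
```

That works out to 4.4GB and 4.8GB, i.e. the 4-5x range.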

So, with a human genome that is 10-20x oversampled (you're sending the
raw reads and not the assembled result) and poorly encoded, you're
looking at roughly 40-100GB, or about a day of downloading.

Unless you're talking about 1998 and your network admin doesn't want
you using more than 20kb/s of bandwidth...
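Putting the last two paragraphs into numbers: the dataset is the 1GB 2-bit genome times the oversampling times the encoding blow-up, and transfer time is size over link rate. This sketch assumes the worst case from above (20x oversampling, 5x encoding, so 100GB), decimal units, "kb/s" meaning kilobits per second, and a fully used link with no protocol overhead. The 10 Mbit/s rate is my own illustrative "ordinary connection", not a figure from the thread, and transfer_days is a made-up helper.

```python
GB = 1e9  # decimal gigabyte, in bytes


def transfer_days(size_bytes: float, bits_per_second: float) -> float:
    """Days to move size_bytes over a fully used link; ignores protocol overhead and retries."""
    return size_bytes * 8 / bits_per_second / 86_400


# Worst case from above: 1 GB genome, 20x oversampled, 5x encoding expansion.
dataset = 1 * GB * 20 * 5

# Assumptions: 10 Mbit/s is an illustrative link; "20kb/s" read as 20 kilobits/s.
for label, rate in [("10 Mbit/s link", 10e6), ("20 kb/s cap", 20e3)]:
    print(f"{label}: {transfer_days(dataset, rate):,.1f} days")
```

Under those assumptions the 10 Mbit/s link needs a bit under a day, while the 20 kb/s cap stretches it to roughly 460 days, at which point FedEx wins easily.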

-- 
Rich
___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug