Doug Stewart on 19 Dec 2016 09:02:58 -0800
Re: [PLUG] Wanted: volunteers with bandwidth, storage, coding skills to help save climate data
On Mon, Dec 19, 2016 at 11:11 AM, Doug Stewart <zamoose@gmail.com> wrote:
> The problem with data is that, even at the fattest pipe speeds, the fastest
> transit method is still overnighting HDDs via FedEx. We used to get DNA
> sequences from Tufts, Johns Hopkins, etc. via this method when I was at
> CHOP. Transfer time via Internet2 connections: ~1 month. Via FedEx: 2 days.
>
How long ago was that? A human genome is only about 4 gigabases, at 2
bits per base (before compression). Granted, I hear some plant genomes
are just insane, but a lot of that is duplicative.
At 2 bits per base, 4 gigabases is 8 gigabits, or 1 GB. That isn't THAT
much data to transfer, and that is before compression.
Now, if it is all stored as ASCII files with 1 character per base and
maybe 10-20% overhead for things like line numbers, then I could see it
expanding, but that is still only a 4-5x expansion in size.
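To make the size arithmetic concrete, here is a rough sketch in Python. The 4-gigabase figure and the 10-20% overhead are the round numbers from this message, not measured values, and the function names are made up for illustration.

```python
# Back-of-the-envelope genome sizes, using the round figures from the message.
BASES = 4_000_000_000      # "4 gigabases" (a rounded figure, not exact)
BITS_PER_BASE = 2          # A, C, G, T fit in 2 bits


def packed_bytes(bases: int = BASES, bits_per_base: int = BITS_PER_BASE) -> int:
    """Size in bytes when each base is packed into bits_per_base bits."""
    return bases * bits_per_base // 8


def ascii_bytes(bases: int = BASES, overhead_pct: int = 0) -> int:
    """Size in bytes at 1 ASCII character per base plus a percentage of overhead."""
    return bases * (100 + overhead_pct) // 100


if __name__ == "__main__":
    packed = packed_bytes()
    print(f"2-bit packed: {packed / 1e9:.1f} GB")
    for pct in (10, 20):
        size = ascii_bytes(overhead_pct=pct)
        print(f"ASCII +{pct}%: {size / 1e9:.1f} GB ({size / packed:.1f}x)")
```

The ASCII cases come out between 4x and 5x the packed size, which is where the "4-5x expansion" comes from.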
So, for a human genome that is 10-20x oversampled (you're sending raw
reads and not the assembled result) and poorly encoded, you're talking
about maybe a day of downloading.
Unless you're talking about 1998 and your network admin doesn't want
you using more than 20kb/s of bandwidth...
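For anyone who wants to check the transfer-time estimate, a minimal sketch follows. It assumes the worst case above (1 GB packed, 5x encoding expansion, 20x oversampling) and reads "20kb/s" as 20 kilobits per second; both are assumptions, and the helper names are invented for this example.

```python
# Rough transfer-time estimate for the worst case described in the message.
PACKED_BYTES = 1_000_000_000   # 4 gigabases at 2 bits per base
ENCODING_EXPANSION = 5         # upper end of the "4-5x" ASCII expansion
OVERSAMPLING = 20              # upper end of "10-20x oversampled"
SECONDS_PER_DAY = 86_400

WORST_CASE_BYTES = PACKED_BYTES * ENCODING_EXPANSION * OVERSAMPLING


def transfer_seconds(size_bytes: float, rate_bits_per_s: float) -> float:
    """Time to move size_bytes over a link running at rate_bits_per_s."""
    return size_bytes * 8 / rate_bits_per_s


def required_rate_bits_per_s(size_bytes: float, seconds: float) -> float:
    """Sustained link rate needed to move size_bytes within the given time."""
    return size_bytes * 8 / seconds


if __name__ == "__main__":
    rate = required_rate_bits_per_s(WORST_CASE_BYTES, SECONDS_PER_DAY)
    print(f"Worst case: {WORST_CASE_BYTES / 1e9:.0f} GB")
    print(f"Rate needed to finish in one day: {rate / 1e6:.2f} Mbit/s")

    # Assumption: "20kb/s" means 20 kilobits per second.
    slow = transfer_seconds(WORST_CASE_BYTES, 20_000)
    print(f"At 20 kbit/s: {slow / SECONDS_PER_DAY:.0f} days")
```

Under these assumptions the worst case is 100 GB, which needs a sustained rate of a bit over 9 Mbit/s to finish in a day. At 20 kbit/s the same transfer would take more than a year.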
--
Rich
___________________________________________________________________________
Philadelphia Linux Users Group -- http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion -- http://lists.phillylinux.org/mailman/listinfo/plug