Re: [PLUG] genome sequencing (was Re: Wanted: volunteers with bandwidth,

Soren Harward on 19 Dec 2016 15:22:16 -0800



From: Soren Harward <stharward@gmail.com>
To: "Philadelphia Linux User's Group Discussion List" <plug@lists.phillylinux.org>
Subject: Re: [PLUG] genome sequencing (was Re: Wanted: volunteers with bandwidth, storage, coding skills to help save climate data)
Date: Mon, 19 Dec 2016 23:21:58 +0000
Reply-to: Philadelphia Linux User's Group Discussion List <plug@lists.phillylinux.org>
Sender: "plug" <plug-bounces@lists.phillylinux.org>

On Mon, Dec 19, 2016 at 4:49 PM Rich Freeman <r-plug@thefreemanclan.net> wrote:

> (I assume they're still using replication termination of some kind to do sequencing).

Sort of. The Solexa/Illumina platform does "sequencing by synthesis". Basically, you start with several million short, single-stranded oligonucleotides anchored to a surface. Then you add fluorescently tagged nucleotide bases one at a time (each addition is called a "flow"), and image the surface after each flow. You repeat this for 30–100 cycles of flows, gradually synthesizing the complementary sequence; hence the name "sequencing by synthesis". You figure out the sequence of each short oligonucleotide by seeing which flow makes it light up in each cycle. So TAAGTC would light up on the A flow in the first cycle (remember, the base that gets added is the complement), the T flow in the second and third cycles, the C flow in the fourth cycle, etc.
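The TAAGTC example can be sketched in a few lines of Python. This is only an illustration of the bookkeeping described above, not anything from the actual sequencer software; the function names `flows_for_template` and `template_from_flows` are made up for this sketch, and it assumes exactly one flow lights up per cycle.

```python
# Hypothetical sketch: which flow lights up in each cycle, and how to read
# the anchored sequence back from that. Assumes one clean signal per cycle.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def flows_for_template(template):
    """Return, cycle by cycle, the flow (added base) that lights up."""
    return [COMPLEMENT[base] for base in template]


def template_from_flows(lit_flows):
    """Recover the anchored sequence from the flow that lit up in each cycle."""
    return "".join(COMPLEMENT[flow] for flow in lit_flows)


lit = flows_for_template("TAAGTC")
print(lit)                       # flows that light up, one per cycle
print(template_from_flows(lit))  # the original anchored sequence
```

The first four entries of `lit` are A, T, T, C, matching the cycles walked through above.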

Most of the ~1TB of raw data from a single sequencing run consists of hundreds of multi-megapixel, uncompressed 16-bit grayscale images of the surface. It's so much data that even in 2005, Solexa had to use an FPGA accelerator so that the image analysis didn't take weeks. The software reduces the images to base calls for each oligonucleotide "read". So even though the final sequences of all the reads compress down to a few dozen MB, it's good practice to keep the raw image data around until you're certain you don't need it.

The main reason you'd need the raw image data is that calling bases, or at least doing it well, is much, much harder than you'd expect. Early versions of the Solexa software weren't that good at base calling, and even now I think the stock software is still a bit behind the state of the art. I'm a patent examiner in bioinformatics, and every year I examine a couple of applications for new base-calling algorithms.
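To give a feel for why base calling is hard, here is a deliberately naive caller, sketched under made-up assumptions: each cycle is reduced to one intensity per flow, the brightest flow wins, and the call becomes "N" when the runner-up is too close. The names `call_base` and `call_read`, the `min_margin` threshold, and the intensity values are all hypothetical. Real callers have to work from noisy images rather than tidy numbers like these.

```python
# Hypothetical sketch of a naive base caller. The threshold and data are
# invented for illustration; this is not the Solexa/Illumina algorithm.

def call_base(intensities, min_margin=0.2):
    """Pick the brightest flow for one cycle, or 'N' if the call is ambiguous.

    intensities: dict mapping flow base ('A', 'C', 'G', 'T') to its signal.
    min_margin: required relative gap between the top two signals.
    """
    ranked = sorted(intensities.items(), key=lambda kv: kv[1], reverse=True)
    (best, top), (_, second) = ranked[0], ranked[1]
    if top <= 0 or (top - second) / top < min_margin:
        return "N"
    return best


def call_read(cycles, min_margin=0.2):
    """Call one base per cycle and join them into the synthesized strand."""
    return "".join(call_base(c, min_margin) for c in cycles)


cycles = [
    {"A": 0.90, "C": 0.10, "G": 0.05, "T": 0.10},  # clear A
    {"A": 0.50, "C": 0.45, "G": 0.10, "T": 0.10},  # A vs. C too close to call
    {"A": 0.10, "C": 0.10, "G": 0.10, "T": 0.80},  # clear T
]
print(call_read(cycles))
```

The middle cycle is the interesting one: with the raw images still on disk, a better algorithm could revisit that call later, which is the argument for keeping them.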

--

Soren Harward
stharward@gmail.com

