Rich Freeman on 19 Dec 2016 16:29:15 -0800

Re: [PLUG] genome sequencing (was Re: Wanted: volunteers with bandwidth, storage, coding skills to help save climate data)

On Mon, Dec 19, 2016 at 6:21 PM, Soren Harward wrote:
> The main reason you'd need the raw image data is that calling bases — at
> least doing it well — is much, much harder than you'd expect.  Early
> versions of the Solexa software weren't that good at base calling, and even
> now I think the stock software is still a bit behind the state of the art.
> I'm a patent examiner in bioinformatics, and every year I do a couple
> applications for new base calling algorithms.

Yup, the large file sizes for the raw image data make perfect sense
given that mechanism, which is rather clever and certainly likely to
be faster.

Back in the day, when we were still stuck using gels, base calling
was less than perfect too, at least once you got past maybe 100 bases
or so. That was before everything was automated, and you really did
want 400-500 bases of good sequence so that you weren't going nuts
putting it together. But I got out of it right about the time that
Craig Venter started turning everything upside down.

Philadelphia Linux Users Group