Rich Kulawiec on 17 Dec 2016 10:17:53 -0800
Re: [PLUG] Wanted: volunteers with bandwidth, storage, coding skills to help save climate data
On Sat, Dec 17, 2016 at 11:22:53AM -0500, Charlie Li wrote:
> This has happened rather recently in Canada when Stephen Harper was
> Prime Minister (the PM immediately previous to Justin Trudeau). Except
> he did it by slashing budgets to government scientific institutions so
> they'd eventually shutter, and with shuttering comes inaccessibility of
> data.

To respond to this (and a couple of others upthread):

1. The worst possible outcome of such efforts is that data which doesn't really need to be backed up gets backed up. I don't think that's so bad. And we may learn some lessons about how to do this sort of thing, lessons which may serve us well in other contexts.

2. In re Charlie's comments, let me tell you a story -- a bedtime tale for sysadmins, if you will. Please settle in on this snowy/icy Saturday afternoon with your favorite beverage and read on.

Once upon a time, there was a medical research unit at a major university. They hired a new system administrator, who soon discovered -- to his horror -- that they had no off-campus backups. Not a single byte. For an operation doing studies over years' and decades' worth of data, this was unacceptable.

So the sysadmin set up an offsite backup system. It had a retention and rotation schedule. It had catalogs. It had encryption. It had compression. It had checking (in order to avoid the all-too-familiar and entirely sad story of we-have-backups-oh-wait-no-we-don't-they're-unreadable). It had redundancy to mitigate most single points of failure. It had an archiving function for finished work. It had a disaster recovery mechanism (and that was tested, to avoid another kind of sad story). It was built entirely on open-source software plus a few bits of shell and Perl. (A rough sketch of that sort of pipeline is in the P.S. below.) The code was documented(!). The code was commented(!!). There was even a formal document that explained the whole thing: rationale, policy, procedure, etc.

It worked perfectly -- zero recovery failures -- for most of a decade. And it scaled just fine as the operation grew from terabytes of storage toward half a petabyte. All was serene and calm and boring -- which is of course how sysadmins want things like backup systems to be.

But then, one day, as sometimes happens in academia, there was a shift in the political winds, followed by a regime change and budget cuts. And the sysadmin found himself being nudged toward the exit, along with others.

So he did what any professional would do: he spent copious time ensuring a smooth handoff of all functions to those remaining. He wrote, he talked, he explained, he demonstrated, he made certain that everything -- including the backup system -- would keep right on going. And he was assured, repeatedly, that it would, that it was fully understood, and that everything would be fine.

And so one day he handed in his keys and his badge and set off to do something more respectable than sysadmin, like playing three-card monte on street corners in order to fleece schoolchildren out of their lunch money.

Then... many months later... he received an urgent email. "We have lost the encryption keys," it said. "Woe is us, for we can access nothing. No backups. No archives. No disaster recovery. Surely you can fix this for us?"

And the sysadmin -- once he recovered from the initial shock -- had to tell them that no, he couldn't fix it, for he had disposed of his copies of the encryption keys the day he left employment. And because he had chosen strong, open-source, vetted encryption, there wasn't going to be a way around it.
All that data, all those years of work, everything, was -- absent perhaps intervention from the NSA or an equivalent cryptologic powerhouse -- gone.

The moral of the story is that things like this happen constantly, due to negligence or incompetence or forgetfulness or changes in personnel or budget cuts or reorganizations or just the passage of time. Remember that we live in a country that managed to be the first to land someone on the moon, but then lost the original videotapes of it. This is the norm, not the exception.

So if there is valuable scientific data out there, and it can be backed up [1], it should be backed up. (As to what "valuable" means: the value of data is the cost to recreate it, whether that cost is money or time or opportunity or anything else. By this metric, some data is priceless, because there exist no means to recreate it -- at any cost.)

Let me also add that often we don't know the value of data. Surely there were scrolls in the library at Alexandria that their authors would have considered far too mundane to be of any possible future interest to anyone. But today's scholars would likely disagree with that assessment. (And more recently, and close to heart for many of us, some of the original Doctor Who tapes were re-used by the BBC.)

---rsk

[1] Not everything can be, at least not easily. A lot of medical research data involving human subjects is covered by HIPAA and/or contractual agreements with its suppliers. This can make backups... complicated.
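P.S. For the curious: here is a rough, hypothetical sketch of the kind of pipeline the story describes -- compress, encrypt, checksum, ship offsite, verify. Every path, hostname, and key ID below is invented for illustration, and the real system also had rotation, catalogs, retention, and disaster-recovery machinery wrapped around something like this.

    #!/bin/sh
    # Hypothetical sketch only: one encrypted, compressed, verified offsite
    # backup run.  Paths, hostname, and key ID are made up for illustration.
    set -eu

    SRC=/data/research                     # tree to back up (assumed path)
    STAMP=$(date +%Y%m%d)
    OUT=backup-$STAMP.tar.gz.gpg
    KEYID=backups@example.org              # recipient public key (assumed)
    OFFSITE=archive@offsite.example.org    # offsite host (assumed)

    # Compress and encrypt in one pass; only the private-key holder can read it.
    tar -cf - "$SRC" | gzip -9 | gpg --encrypt --recipient "$KEYID" --output "$OUT"

    # Record a checksum so the copy can be verified later.
    sha256sum "$OUT" > "$OUT.sha256"

    # Ship both to the offsite host.
    scp "$OUT" "$OUT.sha256" "$OFFSITE:/backups/"

    # The "checking" part: confirm the offsite copy is intact.
    # (A fuller test would also decrypt, restore a sample, and compare it.)
    ssh "$OFFSITE" "cd /backups && sha256sum -c $OUT.sha256"

And, per the moral of the story, none of this is worth anything unless copies of the private key and its passphrase outlive any one employee.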