brent timothy saner on 7 Dec 2018 10:43:06 -0800
Re: [PLUG] How to Store Video Files for 25 Years?
On 12/7/18 12:47 PM, Rich Freeman wrote:
> On Fri, Dec 7, 2018 at 12:31 PM Michael Lazin <microlaser@gmail.com> wrote:
>>
>> I was skeptical of the practicality of your suggestion Brent so I tested it. I turned a picture of me in Heidelberg Germany into base64 at the command line and did a line count with wc -l and was surprised that it was only 296 lines of base64 code.
>
> Base64 is 6-bit encoding (as suggested by the name), so it takes up
> 25% more space than the original file.
>
(SNIP)

hence my recommendation to first XZ it, which compresses most mixed media data down to about 10-50% of the original size on average, yes. (strictly speaking, the base64 overhead is closer to 33%: every 3 input bytes become 4 output characters, plus a newline every 76 characters unless you use -w0.)

let's use http://hubblesource.stsci.edu/sources/video/clips/details/images/centaur_2.mpg as an example file.

_______________________________________________________________
[bts@cylon poc]$ ls -l centaur_2.mpg
-rw-r--r-- 1 bts bts 10727428 Jul 9 2003 centaur_2.mpg
[bts@cylon poc]$ base64 centaur_2.mpg > centaur_2.mpg.b64
[bts@cylon poc]$ base64 -w0 centaur_2.mpg > centaur_2.mpg.1line.b64
[bts@cylon poc]$ time xz -c -9e centaur_2.mpg > centaur_2.mpg.xz

real 0m2.819s
user 0m2.702s
sys 0m0.107s
_______________________________________________________________

it takes a bit of time, but you get a decent reduction (about 15% smaller here, even on already-compressed MPEG video), which definitely helps offset the base64 overhead:

_______________________________________________________________
[bts@cylon poc]$ ls -l
total 65544
-rw-r--r-- 1 bts bts 10727428 Jul 9 2003 centaur_2.mpg
-rw-r--r-- 1 bts bts 14303240 Dec 7 13:15 centaur_2.mpg.1line.b64
-rw-r--r-- 1 bts bts 14491441 Dec 7 13:15 centaur_2.mpg.b64
-rw-r--r-- 1 bts bts 9088848 Dec 7 13:15 centaur_2.mpg.xz
_______________________________________________________________

lrzip gets *slightly* better compression and is noticeably faster (about 1.8s vs. 2.8s wall-clock here):

_______________________________________________________________
[bts@cylon poc]$ time lrzip -o - centaur_2.mpg > centaur_2.mpg.lrz
centaur_2.mpg - Compression Ratio: inf. Average Compression Speed: 5.000MB/s.
Total time: 00:00:01.77

real 0m1.788s
user 0m2.445s
sys 0m0.197s
[bts@cylon poc]$ ls -l *lrz
-rw-r--r-- 1 bts bts 9082074 Dec 7 13:19 centaur_2.mpg.lrz
_______________________________________________________________

lrzip's ZPAQ mode (-z) takes much longer, but saves even more space:

_______________________________________________________________
[bts@cylon poc]$ time lrzip -o - -z centaur_2.mpg > centaur_2.mpg.zpaq.lrz
centaur_2.mpg - Compression Ratio: inf. Average Compression Speed: 0.909MB/s.
Total time: 00:00:10.88

real 0m10.884s
user 0m10.728s
sys 0m0.279s
[bts@cylon poc]$ ls -l *lrz
-rw-r--r-- 1 bts bts 9082074 Dec 7 13:19 centaur_2.mpg.lrz
-rw-r--r-- 1 bts bts 8709175 Dec 7 13:23 centaur_2.mpg.zpaq.lrz
_______________________________________________________________

but there are definitely diminishing returns (roughly six times the time for about 4% less output).

_______________________________________________________________
[bts@cylon poc]$ base64 centaur_2.mpg.lrz > centaur_2.mpg.lrz.b64
[bts@cylon poc]$ base64 -w0 centaur_2.mpg.lrz > centaur_2.mpg.lrz.1line.b64
[bts@cylon poc]$ base64 centaur_2.mpg.zpaq.lrz > centaur_2.mpg.zpaq.lrz.b64
[bts@cylon poc]$ base64 -w0 centaur_2.mpg.zpaq.lrz > centaur_2.mpg.zpaq.lrz.1line.b64
[bts@cylon poc]$ ls -l *.b64
-rw-r--r-- 1 bts bts 14303240 Dec 7 13:26 centaur_2.mpg.1line.b64
-rw-r--r-- 1 bts bts 14491441 Dec 7 13:26 centaur_2.mpg.b64
-rw-r--r-- 1 bts bts 12109432 Dec 7 13:29 centaur_2.mpg.lrz.1line.b64
-rw-r--r-- 1 bts bts 12268767 Dec 7 13:28 centaur_2.mpg.lrz.b64
-rw-r--r-- 1 bts bts 11612236 Dec 7 13:29 centaur_2.mpg.zpaq.lrz.1line.b64
-rw-r--r-- 1 bts bts 11765029 Dec 7 13:29 centaur_2.mpg.zpaq.lrz.b64
_______________________________________________________________

as shown, the smallest result is centaur_2.mpg.zpaq.lrz.1line.b64, which (keep in mind this is base64'd) is only 108% of the size of the original file (10727428 bytes). the plain, uncompressed base64 with no linebreaks is 133%.
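the base64 sizes in that last listing are fully predictable from the input size: every 3 input bytes (rounded up) become 4 output characters, and GNU base64's default mode adds a newline after every 76 characters. a minimal sketch, assuming GNU coreutils base64 and its default 76-column wrap (the function names are made up for illustration):

```shell
#!/usr/bin/env bash
# Predict GNU base64 output sizes from an input size in bytes.
# Assumption: GNU coreutils base64, default wrap width of 76 columns.

# Size with -w0: ceil(n / 3) groups of 4 characters, no newlines at all.
b64_size_oneline() {
  local n=$1
  echo $(( (n + 2) / 3 * 4 ))
}

# Size with default wrapping: one newline per (possibly partial) 76-char line.
b64_size_wrapped() {
  local chars
  chars=$(b64_size_oneline "$1")
  echo $(( chars + (chars + 75) / 76 ))
}

# Demo on the three input sizes from the listings (original, .lrz, .zpaq.lrz).
orig=10727428
for n in 10727428 9082074 8709175; do
  one=$(b64_size_oneline "$n")
  printf '%s bytes -> %s (-w0), %s (wrapped), %s%% of original\n' \
    "$n" "$one" "$(b64_size_wrapped "$n")" "$(( one * 100 / orig ))"
done
```

for the original file this gives 14303240 and 14491441 bytes, and for the ZPAQ file 11612236 and 11765029, which match the ls output byte for byte.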
i'd say those are pretty significant savings if it's something important to you (and time isn't). however, there is some criticism of XZ's suitability for long-term archiving in particular[0], so if you're banking legal liability on this, you'd want to use something else[1]. but i suspect it should be fine as long as the paper itself is stored under archival conditions (acid-free, kept dark, etc.) and the conversion back into data has been tested.

using the above "best-case" scenario and the layout from my previous email (FreeMono 8pt, standard LibreOffice margins and kerning, i.e. 8652 chars per page face), you're looking at 1343 printed page faces. hope you have a duplexer.

i never claimed base64 was ideal by any means (or that paper is a good medium for data archival in general), just *possible*. :P

[0] https://www.nongnu.org/lzip/xz_inadequate.html
[1] https://suchanek.name/texts/archiving/index.html#TOC19
    https://suchanek.name/texts/archiving/index.html#TOC23
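the "test the conversion back into data" part is easy to automate for the digital half of the pipeline. a minimal sketch, assuming GNU coreutils (base64, cmp, mktemp) and xz-utils; the function names and the XZ_OPTS override are hypothetical, and the print/scan/OCR step in the middle is not covered. it also includes the page-count arithmetic (bytes over chars per page face, rounded up):

```shell
#!/usr/bin/env bash
# Round-trip check: xz -> base64 -w0 -> (print/scan, not covered) -> base64 -d -> xz -d -> compare.
# Assumptions: GNU coreutils and xz-utils. XZ_OPTS is a hypothetical override;
# -9e (the preset used in the listings) needs several hundred MiB of RAM.
set -o pipefail  # so a failing xz is not masked by a succeeding base64

# encode_for_print SRC OUT: compress SRC, then base64 it on a single line.
encode_for_print() {
  xz -c "${XZ_OPTS:--9e}" -- "$1" | base64 -w0 > "$2"
}

# decode_from_print IN OUT: reverse the pipeline.
decode_from_print() {
  base64 -d -- "$1" | xz -dc > "$2"
}

# verify_roundtrip SRC: encode, decode, and byte-compare with the original.
verify_roundtrip() {
  local src=$1 tmp rc=0
  tmp=$(mktemp -d) || return 1
  if encode_for_print "$src" "$tmp/encoded.b64" &&
     decode_from_print "$tmp/encoded.b64" "$tmp/restored" &&
     cmp -s -- "$src" "$tmp/restored"; then
    echo "OK: $src survives the round trip"
  else
    echo "MISMATCH: $src" >&2
    rc=1
  fi
  rm -rf -- "$tmp"
  return "$rc"
}

# pages_needed BYTES CHARS_PER_PAGE: ceil(BYTES / CHARS_PER_PAGE).
pages_needed() {
  echo $(( ($1 + $2 - 1) / $2 ))
}
```

e.g. `verify_roundtrip centaur_2.mpg`, and `pages_needed 11612236 8652` gives the 1343 page faces quoted above. swapping in lrzip would mean changing both the compress and decompress commands.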
___________________________________________________________________________
Philadelphia Linux Users Group -- http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion -- http://lists.phillylinux.org/mailman/listinfo/plug