brent timothy saner on 7 Dec 2018 10:43:06 -0800



Re: [PLUG] How to Store Video Files for 25 Years?


On 12/7/18 12:47 PM, Rich Freeman wrote:
> On Fri, Dec 7, 2018 at 12:31 PM Michael Lazin <microlaser@gmail.com> wrote:
>>
>> I was skeptical of the practicality of your suggestion Brent so I tested it.  I turned a picture of me in Heidelberg Germany into base64 at the command line and did a line count with wc -l and was surprised that it was only 296 lines of base64 code.
> 
> Base64 is 6-bit encoding (as suggested by the name), so it takes up
> 25% more space than the original file.
> 
(SNIP)

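hence my recommendation to first XZ it, yes. (strictly, base64's overhead
is closer to 33% than 25%: every 3 input bytes become 4 output
characters, before any linebreaks.) xz shrinks many kinds of data
substantially, though already-compressed media like the MPEG below gains
much less.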
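since base64 emits 4 characters for every 3 input bytes (padding the last
group), the base64 sizes in the listings below can be predicted exactly.
a minimal bash sketch, assuming GNU coreutils base64 behaviour (76-column
lines, each ending in a newline, by default; no newline at all with -w0).
b64_size is a hypothetical helper name, not part of any tool:

```shell
#!/usr/bin/env bash
# predict the byte size of GNU coreutils base64 output for an n-byte input.
# assumption: GNU base64 semantics, i.e. 4 chars per 3 input bytes (last
# group padded with '='), and, when wrapping, one newline per output line
# including the final partial line.
b64_size() {
  local n=$1 wrap=${2:-76}              # wrap=0 mimics `base64 -w0`
  local enc=$(( (n + 2) / 3 * 4 ))      # ceil(n/3) groups of 4 chars
  if (( wrap == 0 || enc == 0 )); then
    echo "$enc"
  else
    echo $(( enc + (enc + wrap - 1) / wrap ))  # plus ceil(enc/wrap) newlines
  fi
}

b64_size 10727428 0   # like base64 -w0 centaur_2.mpg
b64_size 10727428     # like plain base64 centaur_2.mpg
```

the two calls print 14303240 and 14491441, the same sizes ls reports for
centaur_2.mpg.1line.b64 and centaur_2.mpg.b64 further down.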

let's use
http://hubblesource.stsci.edu/sources/video/clips/details/images/centaur_2.mpg
as an example file.

_______________________________________________________________
[bts@cylon poc]$ ls -l centaur_2.mpg
-rw-r--r-- 1 bts bts 10727428 Jul  9  2003 centaur_2.mpg
[bts@cylon poc]$ base64 centaur_2.mpg > centaur_2.mpg.b64
[bts@cylon poc]$ base64 -w0 centaur_2.mpg > centaur_2.mpg.1line.b64
[bts@cylon poc]$ time xz -c -9e centaur_2.mpg > centaur_2.mpg.xz

real	0m2.819s
user	0m2.702s
sys	0m0.107s
_______________________________________________________________

takes a bit of time, and since MPEG video is already compressed the gain
is modest (about 15% smaller), but it still helps ease the base64
overhead:

_______________________________________________________________
[bts@cylon poc]$ ls -l
total 65544
-rw-r--r-- 1 bts bts 10727428 Jul  9  2003 centaur_2.mpg
-rw-r--r-- 1 bts bts 14303240 Dec  7 13:15 centaur_2.mpg.1line.b64
-rw-r--r-- 1 bts bts 14491441 Dec  7 13:15 centaur_2.mpg.b64
-rw-r--r-- 1 bts bts  9088848 Dec  7 13:15 centaur_2.mpg.xz
_______________________________________________________________


lrzip can be used for *slightly* better compression (6774 bytes smaller
than xz here) and a noticeable speedup (1.8s wall-clock vs. 2.8s):

_______________________________________________________________
[bts@cylon poc]$ time lrzip -o - centaur_2.mpg > centaur_2.mpg.lrz
centaur_2.mpg - Compression Ratio: inf. Average Compression Speed: 5.000MB/s.
Total time: 00:00:01.77

real	0m1.788s
user	0m2.445s
sys	0m0.197s
[bts@cylon poc]$ ls -l *lrz
-rw-r--r-- 1 bts bts 9082074 Dec  7 13:19 centaur_2.mpg.lrz
_______________________________________________________________


lrzip's ZPAQ mode (-z) takes about six times as long, but saves roughly
another 4%:

_______________________________________________________________
[bts@cylon poc]$ time lrzip -o - -z centaur_2.mpg > centaur_2.mpg.zpaq.lrz
centaur_2.mpg - Compression Ratio: inf. Average Compression Speed: 0.909MB/s.
Total time: 00:00:10.88

real	0m10.884s
user	0m10.728s
sys	0m0.279s
[bts@cylon poc]$ ls -l *lrz
-rw-r--r-- 1 bts bts 9082074 Dec  7 13:19 centaur_2.mpg.lrz
-rw-r--r-- 1 bts bts 8709175 Dec  7 13:23 centaur_2.mpg.zpaq.lrz
_______________________________________________________________
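to put rough numbers on that tradeoff, a small bash/awk sketch.
marginal_cost is a hypothetical helper, and its inputs are just the sizes
and `real` times copied from the transcripts, so it says nothing about
other files or machines:

```shell
#!/usr/bin/env bash
# compare a faster compressor run against a slower one.
# args: fast_size fast_secs slow_size slow_secs (bytes, wall-clock seconds)
# prints how much smaller the slower output is and how much longer it took.
marginal_cost() {
  awk -v a="$1" -v ta="$2" -v b="$3" -v tb="$4" \
    'BEGIN { printf "%.1f%% smaller, %.1fx the time\n", 100 * (a - b) / a, tb / ta }'
}

# plain lrzip vs. lrzip -z (zpaq), numbers from the transcripts above
marginal_cost 9082074 1.788 8709175 10.884
```

this prints "4.1% smaller, 6.1x the time".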

but the returns definitely diminish.

_______________________________________________________________
[bts@cylon poc]$ base64 centaur_2.mpg.lrz > centaur_2.mpg.lrz.b64
[bts@cylon poc]$ base64 -w0 centaur_2.mpg.lrz > centaur_2.mpg.lrz.1line.b64
[bts@cylon poc]$ base64 centaur_2.mpg.zpaq.lrz > centaur_2.mpg.zpaq.lrz.b64
[bts@cylon poc]$ base64 -w0 centaur_2.mpg.zpaq.lrz > centaur_2.mpg.zpaq.lrz.1line.b64
[bts@cylon poc]$ ls -l *.b64
-rw-r--r-- 1 bts bts 14303240 Dec  7 13:26 centaur_2.mpg.1line.b64
-rw-r--r-- 1 bts bts 14491441 Dec  7 13:26 centaur_2.mpg.b64
-rw-r--r-- 1 bts bts 12109432 Dec  7 13:29 centaur_2.mpg.lrz.1line.b64
-rw-r--r-- 1 bts bts 12268767 Dec  7 13:28 centaur_2.mpg.lrz.b64
-rw-r--r-- 1 bts bts 11612236 Dec  7 13:29 centaur_2.mpg.zpaq.lrz.1line.b64
-rw-r--r-- 1 bts bts 11765029 Dec  7 13:29 centaur_2.mpg.zpaq.lrz.b64
_______________________________________________________________

as shown, the smallest result is centaur_2.mpg.zpaq.lrz.1line.b64,
which, keep in mind, is base64'd and still only 108% the size of the
original file (10727428 bytes). (base64 of the uncompressed original,
with no linebreaks, is 133%.) i'd say those are pretty significant
savings if it's something important to you (and time isn't).
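the percentages above can be reproduced with a small awk filter.
pct_table is a hypothetical helper that reads "size name" pairs (for
example from GNU `stat -c '%s %n'`) and prints each size as a percentage
of the original:

```shell
#!/usr/bin/env bash
# read "size name" lines on stdin; print each size as a percentage of the
# original file size passed as $1, to one decimal place.
pct_table() {
  awk -v o="$1" '{ printf "%.1f%% %s\n", 100 * $1 / o, $2 }'
}

# sizes copied from the ls listing above. on a live system, something like
#   stat -c '%s %n' *.b64 | pct_table 10727428
# would do the same for the real files (assumes GNU stat).
printf '%s\n' \
  '11612236 centaur_2.mpg.zpaq.lrz.1line.b64' \
  '14303240 centaur_2.mpg.1line.b64' |
  pct_table 10727428
```

this prints 108.2% for the zpaq/base64 file and 133.3% for the base64 of
the uncompressed original.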

however, XZ in particular has been criticized as a long-term archive
format[0], and if you're banking legal liability on this you'd want to
use something else[1].

but i suspect it would be fine as long as the paper itself is properly
preserved (acid-free, kept dark, etc.) and the conversion back into data
has been tested.
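a minimal sketch of the digital half of that test, assuming GNU coreutils
(base64 -d, cmp, mktemp). it only checks that base64 encode/decode
round-trips byte for byte; a real test would decode from scanned/OCR'd
pages and also run the decompressor (e.g. `lrzip -d`). roundtrip_ok and
the random sample file are hypothetical stand-ins:

```shell
#!/usr/bin/env bash
# round-trip check: encode a file to base64, decode it back, and confirm
# the bytes are identical. assumes GNU coreutils.
set -euo pipefail

roundtrip_ok() {
  local src=$1 tmp status=0
  tmp=$(mktemp -d)
  base64 "$src" > "$tmp/enc.b64"             # what would be printed
  base64 -d "$tmp/enc.b64" > "$tmp/dec.bin"  # what you'd get back
  cmp -s "$src" "$tmp/dec.bin" || status=1
  rm -rf "$tmp"
  if (( status == 0 )); then echo ok; else echo MISMATCH; fi
  return "$status"
}

# demo on a random file standing in for centaur_2.mpg.zpaq.lrz
sample=$(mktemp)
head -c 100000 /dev/urandom > "$sample"
roundtrip_ok "$sample"
rm -f "$sample"
```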

using the above "best-case" file and the layout from my previous email
(FreeMono, standard libreoffice margins and kerning, 8pt; in other
words, 8652 chars per page face), you're looking at 1343 printed page
faces. hope you have a duplexer. i never claimed base64 was ideal by any
means (or that paper is ideal for data archival in general), just
*possible*. :P
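the page count is a ceiling division. pages_needed is a hypothetical
helper, and the 8652 chars per face default is the layout figure quoted
above, not something the script measures:

```shell
#!/usr/bin/env bash
# page faces (and duplex sheets) needed to print a file of `chars` bytes,
# assuming a fixed number of characters per page face (default 8652).
pages_needed() {
  local chars=$1 per_face=${2:-8652}
  local faces=$(( (chars + per_face - 1) / per_face ))  # ceil(chars/per_face)
  echo "$faces faces, $(( (faces + 1) / 2 )) duplex sheets"
}

pages_needed 11612236   # centaur_2.mpg.zpaq.lrz.1line.b64
```

this prints "1343 faces, 672 duplex sheets". for comparison, the
uncompressed -w0 file (14303240 bytes) works out to 1654 faces by the
same arithmetic.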



[0] https://www.nongnu.org/lzip/xz_inadequate.html
[1] https://suchanek.name/texts/archiving/index.html#TOC19
    https://suchanek.name/texts/archiving/index.html#TOC23


___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug