William H. Magill on 26 Nov 2003 01:12:02 -0500



Re: [PLUG] XML, text, and the development of unix


On 24 Nov, 2003, at 22:19, Jeff Abrahamson wrote:
What are we losing, I would have asked Ritchie, as more and more files
become XML?

What is being lost is a point of view.

Unix "flat files" are really no different than XML's flat files. Only their content is different.
Both are 100% "text files."


What is different is the content of the text, and the amount and kind of foreknowledge necessary to read it. With classic Unix files, you, the human, need to know the format, syntax and content of the file. With XML, the file tells you.

But Unix is not based on text files. It is based on "lines of input." All Unix commands operate on a single line of text delimited by a single character, the newline (line feed), a convention particular to Unix.
(And if you feed it too much text, you get buffer overflows.)


A plain text file created by, say, the Mac or Windows has a different "line" delimiter.

Change the concept of a line and all your Unix commands give you "unexpected" results.
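
To make that concrete, here is a minimal Python sketch (not from the original discussion) that counts "lines" the way a classic Unix tool would, treating only LF as the delimiter. The three sample records and the helper name unix_lines are made up for illustration; the same records are written with Unix (LF), Windows (CRLF) and classic Mac (CR) endings.

```python
def unix_lines(data: bytes) -> list:
    """Split the way a classic Unix tool does: only LF (0x0A) ends a line."""
    parts = data.split(b"\n")
    # A trailing LF terminates the last line rather than starting a new one.
    if parts and parts[-1] == b"":
        parts.pop()
    return parts


records = [b"alpha", b"beta", b"gamma"]  # hypothetical sample data

unix_text = b"".join(r + b"\n" for r in records)
dos_text = b"".join(r + b"\r\n" for r in records)
mac_text = b"".join(r + b"\r" for r in records)

print(len(unix_lines(unix_text)))  # 3
print(len(unix_lines(dos_text)))   # 3, but each line carries a stray CR
print(len(unix_lines(mac_text)))   # 1, the whole file is one "line"
print(unix_lines(dos_text)[0])     # b'alpha\r'
```

The CR-only file is the "unexpected" case: a line-oriented tool sees the entire file as a single line.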

How does Unix deal with such issues? Usually with perl or awk.

Somebody writes a special program to deal with some file whose format is something other than line by line.


So, you need a different set of tools that understand that an XML file has a different LOGICAL line ending than does a classic Unix file.

Unix was not written around text as much as it was written around the concept that there is no such thing as a line (a single buffer full of data), but rather a continuous stream of bytes delimited in some way. This "stream of bytes" is what makes it possible to pipeline things together or to parse different formats.

grep /bin/bash /etc/passwd | wc -l

Why does this work?

It does NOT work because each line of /etc/passwd is delimited by tabs or semicolons or frogs. It works because grep finds the N lines that contain "/bin/bash" and wc counts them. The command substitutes your brain for a lot of the computer's brain. Suppose you had two or three occurrences of the string "/bin/bash" in the line. How would you know which one to count? All of them? None of them? Are any of those strings in the correct place? You are making an assumption about the content, its format and its validity in order to make the command "do what you want."
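
A small Python sketch of that assumption, using made-up passwd-style data rather than a real /etc/passwd. The function count_matching_lines is only a rough stand-in for the pipeline: grep prints each matching line once and wc -l counts lines, so a line that mentions "/bin/bash" twice is still counted once. Both function names are hypothetical.

```python
def count_matching_lines(text: str, needle: str) -> int:
    """Rough equivalent of `grep needle | wc -l`: count lines containing needle."""
    return sum(1 for line in text.splitlines() if needle in line)


def count_occurrences(text: str, needle: str) -> int:
    """Count every occurrence of needle, even several on one line."""
    return text.count(needle)


# Hypothetical passwd-style data; the second entry mentions /bin/bash twice.
sample = (
    "alice:x:1001:1001:Alice:/home/alice:/bin/bash\n"
    "bob:x:1002:1002:prefers /bin/bash:/home/bob:/bin/bash\n"
    "carol:x:1003:1003:Carol:/home/carol:/bin/sh\n"
)

print(count_matching_lines(sample, "/bin/bash"))  # 2
print(count_occurrences(sample, "/bin/bash"))     # 3
```

Neither number is "right" on its own; which one you want depends on what you assumed the file means.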

But if you are working with a C2 class machine, that command wouldn't necessarily work anyway. Because the REAL file is a different file and might have different contents. You ASSUME that the file is an accurate representation of the real data being used. No different than with XML. You just have to have a different set of assumptions.

This example is particularly bad, because a password file coded in XML would give you exactly the same answer.

However, answering the question "Which users are using bash?" would be a different story. Leave off the pipe to wc and you get lines of data output. Your brain then has to parse each line and find the user name or userid (which one did you want?).
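
What the brain does there can be written down. The sketch below is an illustration, not anything from the original thread: it assumes the usual seven colon-separated passwd fields (name, password, UID, GID, GECOS, home, shell), and the function name and sample entries are hypothetical. It checks the shell field only, so a stray "/bin/bash" in a comment field does not count.

```python
def bash_users(passwd_text: str, shell: str = "/bin/bash") -> list:
    """Return (name, uid) pairs for entries whose login shell is `shell`."""
    users = []
    for line in passwd_text.splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        # Assumed layout: name:password:UID:GID:GECOS:home:shell
        if len(fields) == 7 and fields[6] == shell:
            users.append((fields[0], fields[2]))
    return users


# Hypothetical data: bob only mentions /bin/bash in his comment field.
sample = (
    "alice:x:1001:1001:Alice:/home/alice:/bin/bash\n"
    "bob:x:1002:1002:prefers /bin/bash:/home/bob:/bin/sh\n"
    "carol:x:1003:1003:Carol:/home/carol:/bin/bash\n"
)

print(bash_users(sample))  # [('alice', '1001'), ('carol', '1003')]
```

All the knowledge about which field is which lives in the program, not in the file.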

For that you need "xgrep" or whatever, and you would pipe it into something which could find the tag that you wanted and print out the value.

Exact same process... just different tools, and different requirements on the assumptions necessary.
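
For comparison, a hedged sketch of what an "xgrep"-style query might look like, using Python's standard xml.etree.ElementTree. The XML layout (passwd, user, name, uid, shell) is invented purely for illustration, and xgrep_users is a hypothetical name; no real tool or password format is implied.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML rendering of a password file, invented for this example.
PASSWD_XML = """\
<passwd>
  <user><name>alice</name><uid>1001</uid><shell>/bin/bash</shell></user>
  <user><name>bob</name><uid>1002</uid><shell>/bin/sh</shell></user>
  <user><name>carol</name><uid>1003</uid><shell>/bin/bash</shell></user>
</passwd>
"""


def xgrep_users(xml_text, shell="/bin/bash", want="name"):
    """Return the text of the `want` tag for every user with the given shell."""
    root = ET.fromstring(xml_text)
    return [
        user.findtext(want)
        for user in root.iter("user")
        if user.findtext("shell") == shell
    ]


print(xgrep_users(PASSWD_XML))              # ['alice', 'carol']
print(xgrep_users(PASSWD_XML, want="uid"))  # ['1001', '1003']
```

Here the file names its own fields, so the query asks for a tag instead of a column position, but the user still has to know which tag to ask for.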

Easier? Simpler? I don't think there is any difference on the part of the user. Only the program itself, grep or xgrep, needs to be "smarter" in a different way.

While Unix is very powerful, it is extremely primitive in its strength. Think about Conan (the Barbarian, not O'Brien) - extremely powerful, but rendered inoperative by a Connecticut Yankee.

It's all in the point of view and the tools available.

Back to your original question...
What are we losing,

One response could be -- "innocence." Another would be -- certainty that our knowledge was complete.

We'll have to learn YAG -- Yet Another Grep. We'll all have to turn in our pointed hats with the stars on them and go back to the Unseen University, where the Freshmen who have been using xgrep since 6th grade will find things we can't.

Of course, this could just be like BAL, SNOBOL, PL/I, PASCAL, APL, and even C ... just a passing fancy.
... After all, they teach freshmen Java now instead of C.


Just for yucks... compare these two... They are not all that different.

<dict>
        <key>AppendAMPM</key>
        <false/>
        <key>ClockDigital</key>
        <integer>1</integer>
        <key>ClockEnabled</key>
        <true/>
        <key>ClockLocation</key>
        <integer>0</integer>
        <key>DisplaySeconds</key>
        <false/>
        <key>FlashSeparators</key>
        <true/>
        <key>LastSavedGlobalTimeString</key>
        <string>HH':'mm':'ss</string>
        <key>PreferencesVersion</key>
        <integer>2</integer>
        <key>ShowDay</key>
        <true/>
        <key>Transparency</key>
        <real>0.80000001192092896</real>
        <key>Use24HourClock</key>
        <true/>
</dict>

XClock*geometry:                        180x30-263+0
XClock*update:                            1
#ifdef COLOR
XClock*background:                      LightBlue
XClock*foreground:                        red
XClock*highlight:                            yellow
#endif

The first is OS X's clock "controls." The second is X11's.
Virtually all of Apple's applications use ".plist" files, which contain XML data, to hold the parameters for that application.
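
To see how close the two really are, here is a rough Python sketch that reads a shortened copy of each into a dictionary. The plist side uses the standard plistlib module. The XML declaration and the plist wrapper element are added here as an assumption, since a complete .plist file carries them around the dict shown above. The X resource parser parse_xresources is a hypothetical helper that skips preprocessor lines, so anything inside #ifdef is always included.

```python
import plistlib

# Shortened copy of the plist above; the XML declaration and <plist>
# wrapper are assumed, since only the <dict> body was quoted.
PLIST = b"""<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<dict>
        <key>ClockDigital</key>
        <integer>1</integer>
        <key>DisplaySeconds</key>
        <false/>
        <key>Transparency</key>
        <real>0.80000001192092896</real>
        <key>Use24HourClock</key>
        <true/>
</dict>
</plist>
"""

# Shortened copy of the X resources above.
XRESOURCES = """\
XClock*geometry:                        180x30-263+0
XClock*update:                            1
#ifdef COLOR
XClock*background:                      LightBlue
#endif
"""


def parse_xresources(text):
    """Parse 'name: value' lines into a dict of strings."""
    settings = {}
    for line in text.splitlines():
        # Simplification: preprocessor (#) and comment (!) lines are skipped,
        # so settings inside #ifdef blocks are always included.
        if not line.strip() or line.startswith(("#", "!")):
            continue
        key, _, value = line.partition(":")
        settings[key.strip()] = value.strip()
    return settings


clock = plistlib.loads(PLIST)
xclock = parse_xresources(XRESOURCES)

print(clock["Use24HourClock"], type(clock["ClockDigital"]).__name__)    # True int
print(xclock["XClock*update"], type(xclock["XClock*update"]).__name__)  # 1 str
```

Both end up as key/value pairs. The visible difference is that the plist tells the reader each value's type, while the X resource values stay strings until the application interprets them.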



T.T.F.N.
William H. Magill
# Beige G3 - Rev A motherboard - 768 Meg
# Flat-panel iMac (2.1) 800MHz - Super Drive - 768 Meg
# PWS433a [Alpha 21164 Rev 7.2 (EV56)- 64 Meg]- Tru64 5.1a
magill@mcgillsociety.org
magill@acm.org
magill@mac.com

___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug