gabriel rosenkoetter on 28 Nov 2003 12:17:02 -0500



Re: [PLUG] XML, text, and the development of unix


On Fri, Nov 28, 2003 at 10:44:54AM -0500, Alex Birch wrote:
> What's the more syntax to know with XML? I'd say there are more 
> semantics but the syntax seems pretty straight forward to me.

I'd say that knowing that XML functions around tags set between
angle brackets, drawn from some vocabulary of tags (you don't need
to know what the tags are, just that there exists a vocabulary and
a set of things outside that vocabulary); that meta-data can be
conveyed with name=value pairs within the angle brackets; and that
actual data lies between opening and closing tags (closing tags
have the same name as opening tags, but include a / to mark them as
closing) is more syntax than that of a Makefile, which just says
"rules start on a new line; definitions of rules start on a new
line indented by a tab".
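That entire rule fits in a two-line sketch (the file names are
hypothetical; the whitespace before $(CC) must be a literal tab):

```make
# A minimal, hypothetical Makefile: the rule name starts at column one,
# and each command line beneath it MUST begin with a literal ASCII tab.
hello: hello.c
	$(CC) -o hello hello.c
```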

It's a function of the *editor* that a tab looks like a few spaces
to you, but they are very much NOT the same ASCII character.

> I was in a hurry to create a makefile. That was my first nontrivial 
> linux program. I just wanted to create a makefile that worked quickly 
> and I learn best by initially copying and changing. After more 
> experience I can build from scratch, but usually not the first couple of 
> times.

I don't understand how this absolves the author of the book you were
reading from pointing out that Makefiles require tabs here or, if he
did point it out, you from *reading* that note in the process of
copying the Makefile text.

Furthermore, if you had simply copied a functional Makefile from
some other software, you would NOT have had this problem, because if
someone else's Makefile worked, then it was using tabs. You could
have just replaced the definition contents with your own (probably
replacing only what follows "	$(CC)") and never noticed.

I'm not going to suggest that Makefile's choice of tabs as being
special (rather than "whitespace") is not a bit confusing, nor that
it hasn't confused me. It confuses everyone... exactly once. It's
one thing to remember. There's way more to remember about XML.

This choice is one of laziness on the part of the Makefile format's
designers (it's harder to parse for "leading white space" than it
is for "leading ASCII tab character"), but the format's pretty well
set in de facto stone at this point, and it really isn't THAT
oppressive.

I believe that you liked Ant's format better and found it easier to
use, but I think that that's because you already knew a lot about
XML and so it clicked easily. That is, you had both syntactic and
semantic baggage going into the process of using a Makefile and you
weren't willing to give a little bit when things didn't work exactly
the way you thought they should the first time.

> That was my entire point, regarding the readability of ant versus makefile.

If all you're doing is reading a Makefile, with the knowledge of how
to use a C compiler (or whatever the Makefile's calling in the rule
definitions) but without any prior knowledge of Makefile format,
it's actually quite legible. It's a set of names to match, and
commands to execute when that name is passed to it.

I'd say that Ant requires *greater* prior knowledge for legibility:
you have to know how to visually parse a lot more, not just how to
visually parse a definition list and Unix command line (the second
part of which we presume you know how to parse, since you've managed
to open the Makefile in an editor and, one assumes, been building
software by typing all the commands long enough and frequently
enough to *want* to use a Makefile).

There's this one, rather small, snag in *generating* a Makefile, but
it is one snag, and a rather small one, and it certainly doesn't
bear on legibility.

> file.xml
> ...
> <foo att1="val1" att2="val2">
> grepValue
> </foo>
> <foo att1="val1" att2="val2">
> another value
> </foo>
> 
> Then if I did xgrep -i "grepvalue" file.xml
> 
> It would produce:
> <foo att1="val1" att2="val2">
> grepValue
> </foo>

I think that this makes sense as a start, especially bearing in mind
William's good points about what constitutes a "line" ending simply
being different in XML than in ASCII (with which points I agree
completely), but suppose that I have something like this:

<foo att="qux">
 not the value
 <bar att="quux">
  grepValue
 </bar>
</foo> 

First, is that actually valid syntax from your point of view? (It
certainly is for HTML, but I haven't really worked much with XML, so
I don't know if it forbids nested tags. I wouldn't expect it to,
since that would severely limit its usefulness for a lot of things,
not just display presentation.)
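For what it's worth, XML not only permits nesting, it requires tags
to nest properly (no overlapping like <a><b></a></b>). A quick check
with Python's standard-library parser, using your example verbatim:

```python
import xml.etree.ElementTree as ET

doc = """<foo att="qux">
 not the value
 <bar att="quux">
  grepValue
 </bar>
</foo>"""

# fromstring() raises ParseError if the document were not well-formed,
# so merely getting past this line demonstrates nesting is legal.
root = ET.fromstring(doc)
print(root.tag)                        # foo
print(root.find("bar").text.strip())   # grepValue
```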

Second, xgrep -i grepvalue as you describe it would just return:

 <bar att="quux">
  grepValue
 </bar>

which is correct. But let's say that I *want* to see one nesting
level above. I think that I should be able to ask for that, with
something like xgrep -n1 -i grepvalue, and get the whole example
block that I give above. (Don't necessarily take my advice on using
"-n" for this; it may well collide with grep's existing namespace,
which should be preserved wherever those flags still make sense in
operation upon XML.)
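A rough sketch of that behavior, assuming a hypothetical xgrep built
on Python's standard-library ElementTree (the -n semantics here are
my guess at the proposal: climb N ancestors above the matching
element before printing it):

```python
import xml.etree.ElementTree as ET

def xgrep(xml_text, pattern, levels=0, ignore_case=True):
    """Return serialized elements whose text matches `pattern`,
    climbing `levels` ancestors above each match first."""
    root = ET.fromstring(xml_text)
    # ElementTree has no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    needle = pattern.lower() if ignore_case else pattern
    hits = []
    for elem in root.iter():
        text = elem.text or ""
        if needle in (text.lower() if ignore_case else text):
            for _ in range(levels):      # climb toward the root
                if elem in parents:
                    elem = parents[elem]
            hits.append(elem)
    return [ET.tostring(e, encoding="unicode") for e in hits]

doc = '<foo att="qux"> not the value <bar att="quux"> grepValue </bar></foo>'
print(xgrep(doc, "grepvalue")[0])      # just the <bar> chunk
print(xgrep(doc, "grepvalue", 1)[0])   # the enclosing <foo> chunk
```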

This comes about because XML's end-tags are *not* really the same
thing as ASCII's EOL character; they're merely closure for a chunk
that can be nested in a theoretically infinite number of other
chunks.

This means that xgrep would have to be substantially more complex
than grep, a complexity that exists because XML is substantially
more complicated in its syntactic structure than ASCII.

> >>What other traditional tools would be nice? xsed? xgrep?
> >Weren't we discussing xgrep? (Did you mean xawk?)
> I was thinking of all the possible friends of xgrep and other line 
> oriented text manipulation files.

My point was that I think you wrote xgrep where you intended xawk,
that's all.

> Why not have both? have special fields map to the values? $v1, $v2 while 
> the $0 maps to the entire tag. This could allow people to quickly get 
> the tag and value that they want.

That's a good idea. One would probably also want $tN to be able to
get just the tags. What about the in-tag arguments (name=value
pairs) as opposed to the between-tag values?
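To make the thought experiment concrete, here is one way those
fields might bind, sketched with Python's standard-library parser
($0 and $t are the hypothetical names from this thread, not anything
an existing tool defines; attributes get their own mapping):

```python
import xml.etree.ElementTree as ET

def fields(element):
    """Map an element to awk-style fields: $0 is the entire tag,
    $t the tag name; between-tag values and in-tag name=value
    pairs are kept separately."""
    return {
        "$0": ET.tostring(element, encoding="unicode"),   # whole tag
        "$t": element.tag,                                # just the tag name
        # $v1, $v2, ...: the between-tag values, in document order
        "values": [t.strip() for t in element.itertext() if t.strip()],
        # the in-tag name=value pairs
        "attrs": dict(element.attrib),
    }

doc = '<foo att1="val1" att2="val2">grepValue</foo>'
f = fields(ET.fromstring(doc))
print(f["$t"])       # foo
print(f["values"])   # ['grepValue']
print(f["attrs"])    # {'att1': 'val1', 'att2': 'val2'}
```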

(I can think of some ways to deal with this, and I'm sure you can
too. I'm not trying to badger you with this. I think your idea is
good; I'm trying to help with the thought experiment of designing
the expanded interface to make the jump from grep for ASCII to xgrep
for XML be one that is logical for those familiar with grep's
behavior.)

Have you taken a look at the package that Kevin Murphy pointed out
(http://xmlstar.sourceforge.net/)? I haven't (because I don't really
have a need for this software at the moment, though I can see that a
need exists for it), but I think you would do well to start from
there rather than reinventing the wheel, provided what they've got
is basically functional, if incomplete.

> That's why I'm trying to get feedback before I start a project like 
> this.

Which is exactly why I'm *giving* you feedback. I'm not trying to
put your idea down, I'm trying to help you develop it. :^>

-- 
gabriel rosenkoetter
gr@eclipsed.net
