gabriel rosenkoetter on 28 Nov 2003 12:17:02 -0500
On Fri, Nov 28, 2003 at 10:44:54AM -0500, Alex Birch wrote:
> What's the more syntax to know with XML? I'd say there are more
> semantics but the syntax seems pretty straight forward to me.

I'd say that knowing that XML functions around tags set between angle brackets with some vocabulary of tags (you don't need to know what the tags are, just that there exists a vocabulary and a set of things outside that vocabulary), that meta-data can be conveyed with name=value pairs within the angle brackets, and that actual data lies inside opening and closing tags (closing tags have the same name as opening tags, but also include a / to mark them as closing) is more syntax than that of a Makefile, which just says "rules start on a new line; definitions of rules start on a new line indented by a tab". It's a function of the *editor* that a tab looks like a few spaces to you, but they are very much NOT the same ASCII character.

> I was in a hurry to create a makefile. That was my first nontrivial
> linux program. I just wanted to create a makefile that worked quickly
> and I learn best by initially copying and changing. After more
> experience I can build from scratch, but usually not the first couple of
> times.

I don't understand how this absolves the author of the book you were reading from pointing out that Makefiles use tabs for this or, if he did, absolves you from *reading* that note in the process of copying the Makefile text. Furthermore, if you had simply copied a functional Makefile from some other software, you would NOT have had this problem, because if someone else's Makefile worked, then it was using tabs. You could have just replaced the definition contents with your own (probably replacing only what follows "$(CC)") and never noticed.

I'm not going to suggest that Makefile's choice of tabs as being special (rather than "whitespace" generally) is not a bit confusing, nor that it hasn't confused me. It confuses everyone... exactly once. It's one thing to remember.
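The tab-versus-spaces distinction is easy to check mechanically, by the way. Here's a rough Python sketch (my own, hypothetical, and deliberately naive: it ignores rule context, continuations, and everything else make actually does) that flags command lines indented with spaces instead of a tab:

```python
def check_makefile(text):
    """Return (line number, line) pairs for lines that start with
    spaces -- in a real Makefile these should start with a tab."""
    problems = []
    for num, line in enumerate(text.splitlines(), 1):
        if line.startswith(" ") and line.strip():
            problems.append((num, line))
    return problems

# Indented with four spaces, which make would reject:
sample = "all: foo.o\n    $(CC) -o all foo.o\n"
print(check_makefile(sample))  # -> [(2, '    $(CC) -o all foo.o')]
```

The whole point being that the editor shows you the same thing either way; only a byte-level look (or a check like this) reveals the difference.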
There's way more to remember about XML. The tab choice is one of laziness on the part of the Makefile format's designers (it's harder to parse for "leading white space" than for a "leading ASCII tab character"), but the format is pretty well set in de facto stone at this point, and it really isn't THAT oppressive.

I believe that you liked Ant's format better and found it easier to use, but I think that's because you already knew a lot about XML, so it clicked easily. That is, you had both syntactic and semantic baggage going into the process of using a Makefile, and you weren't willing to give a little when things didn't work exactly the way you thought they should the first time.

> That was my entire point, regarding the readability of ant versus makefile.

If all you're doing is reading a Makefile, with the knowledge of how to use a C compiler (or whatever the Makefile is calling in its rule definitions) but without any prior knowledge of the Makefile format, it's actually quite legible. It's a set of names to match, and commands to execute when that name is passed to it.

I'd say that Ant requires *greater* prior knowledge for legibility: you have to know how to visually parse a lot more, not just how to visually parse a definition list and a Unix command line (the second of which we presume you know how to parse, since you've managed to open the Makefile in an editor and, one assumes, have been building software by typing all the commands long enough and frequently enough to *want* to use a Makefile). There's this one, rather small, snag in *generating* a Makefile, but it is one snag, and a rather small one, and it certainly doesn't bear on legibility.

> file.xml
> ...
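To make that "names to match, commands to execute" claim concrete: a Makefile's rule structure is simple enough that even a few lines of throwaway Python (again, a sketch of mine, not anything make provides; it ignores variables, includes, and line continuations) can recover it:

```python
def parse_rules(text):
    """Rough sketch: read a Makefile as a mapping from 'names to
    match' to 'commands to execute'."""
    rules, current = {}, None
    for line in text.splitlines():
        if line.startswith("\t") and current is not None:
            rules[current].append(line.lstrip("\t"))
        elif ":" in line and not line.startswith("\t"):
            current = line.split(":", 1)[0].strip()
            rules[current] = []
    return rules

mk = "all: foo.o\n\t$(CC) -o all foo.o\nclean:\n\trm -f *.o all\n"
print(parse_rules(mk))
# -> {'all': ['$(CC) -o all foo.o'], 'clean': ['rm -f *.o all']}
```

Try writing the equivalent for an Ant build.xml without an XML library and the "greater prior knowledge" point makes itself.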
> <foo att1="val1" att2="val2">
> grepValue
> </foo>
> <foo att1="val1" att2="val2">
> another value
> </foo>
>
> Then if I did xgrep -i "grepvalue" file.xml
>
> It would produce:
> <foo att1="val1" att2="val2">
> grepValue
> </foo>

I think that this makes sense as a start, especially bearing in mind William's good points about what constitutes a "line" ending simply being different in XML than in ASCII (with which points I agree completely), but suppose that I have something like this:

<foo att="qux">
  not the value
  <bar att="quux">
    grepValue
  </bar>
</foo>

First, is that actually valid syntax from your point of view? (It certainly is for HTML, but I haven't really worked much with XML, so I don't know whether it forbids nested tags. I wouldn't expect it to, since that would severely limit its usefulness for a lot of things, not just display presentation.)

Second, xgrep -i grepvalue as you describe it would just return:

<bar att="quux">
  grepValue
</bar>

which is correct. But let's say that I *want* to see one nesting level above. I think that I should be able to ask for that, with something like xgrep -n1 -i grepvalue, and get the whole example block I give above. (Don't necessarily take my advice on using "-n" for this; it may well collide with grep's existing namespace, which should be preserved wherever those flags still make sense in operation upon XML.)

This comes about because XML's end-tags are *not* really the same thing as ASCII's EOL character; they're merely closure for a chunk that can be nested in a theoretically infinite number of other chunks. This means that xgrep would have to be substantially more complex than grep, a complexity that exists because XML is substantially more complicated in its syntactic structure than ASCII.

> >> What other traditional tools would be nice? xsed? xgrep?
> > Weren't we discussing xgrep? (Did you mean xawk?)
> I was thinking of all the possible friends of xgrep and other line
> oriented text manipulation files.
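For what it's worth, the match-then-climb behavior I'm proposing is only a few lines with a real XML parser. Here's a sketch using Python's ElementTree; the `levels` parameter is my hypothetical stand-in for the -n flag (none of this is an existing tool, just the thought experiment made runnable):

```python
import xml.etree.ElementTree as ET

def xgrep(xml_text, pattern, levels=0, ignore_case=True):
    """Return elements whose text matches `pattern`, optionally
    climbing `levels` ancestors (the hypothetical -n flag)."""
    root = ET.fromstring(xml_text)
    # ElementTree has no parent pointers, so build a child->parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    needle = pattern.lower() if ignore_case else pattern
    hits = []
    for elem in root.iter():
        text = elem.text or ""
        if needle in (text.lower() if ignore_case else text):
            for _ in range(levels):
                if elem in parents:  # stop climbing at the root
                    elem = parents[elem]
            hits.append(elem)
    return hits

doc = """<foo att="qux">
  not the value
  <bar att="quux">
    grepValue
  </bar>
</foo>"""

print([e.tag for e in xgrep(doc, "grepvalue")])            # -> ['bar']
print([e.tag for e in xgrep(doc, "grepvalue", levels=1)])  # -> ['foo']
```

Note that even this toy needs a full parse and a parent map before it can match anything, which is exactly the "substantially more complex than grep" cost I mean.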
My point was that I think you wrote xgrep where you intended xawk, that's all.

> Why not have both? have special fields map to the values? $v1, $v2 while
> the $0 maps to the entire tag. This could allow people to quickly get
> the tag and value that they want.

That's a good idea. One would probably also want $tN to be able to get just the tags. What about the in-tag arguments (name=value pairs) as opposed to the between-tag values? (I can think of some ways to deal with this, and I'm sure you can too. I'm not trying to badger you with this. I think your idea is good; I'm trying to help with the thought experiment of designing the expanded interface so that the jump from grep for ASCII to xgrep for XML is logical for those familiar with grep's behavior.)

Have you taken a look at the package that Kevin Murphy pointed out (http://xmlstar.sourceforge.net/)? I haven't (because I don't really have a need for this software at the moment, though I can see that a need for it exists), but I think you would do well to start from there rather than reinventing the wheel, provided what they've got is basically functional, if incomplete.

> That's why I'm trying to get feedback before I start a project like
> this.

Which is exactly why I'm *giving* you feedback. I'm not trying to put your idea down; I'm trying to help you develop it. :^>

--
gabriel rosenkoetter
gr@eclipsed.net
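P.S. Here's a quick Python sketch of how your $0/$vN idea (plus my $tN suggestion, and a made-up $aN for the in-tag name=value pairs) might hang together. All the field names are hypothetical design points, not an existing interface:

```python
import xml.etree.ElementTree as ET

def fields(xml_text):
    """Map awk-like field names onto an XML chunk (a design sketch):
    $0  -> the entire serialized tag
    $tN -> the Nth element's tag name
    $vN -> the Nth element's text value
    $aN -> the Nth element's in-tag name=value pairs (my invention)"""
    root = ET.fromstring(xml_text)
    f = {"$0": ET.tostring(root, encoding="unicode")}
    for i, elem in enumerate(root.iter(), 1):
        f[f"$t{i}"] = elem.tag
        f[f"$v{i}"] = (elem.text or "").strip()
        f[f"$a{i}"] = dict(elem.attrib)
    return f

f = fields('<foo att1="val1"><bar>grepValue</bar></foo>')
print(f["$t1"], f["$v2"], f["$a1"])
# -> foo grepValue {'att1': 'val1'}
```

The open question this makes obvious is numbering: document order works for a toy, but nesting means $v1 and $v2 aren't "lines" the way awk fields are.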