Mark Dominus on 18 Oct 2005 17:25:51 -0000 |
A popular pasttime of programmers is to write a program that prints out its own source code. Solutions to this problem are more or less contrived. But a deep theorem of computer science says that this is no trick. *Any* reasonable programming language is sufficiently powerful to write a program that can print its own source code, and in a standard and formulaic way. (The only programming languages not powerful enough to do this are also not powerful to do a lot of other things that are more important, like arithmetic.) In fact, not only is it true that programming languages can write programs that print their own source code, they can do more. In any reasonable programming language, for any program P, it is possible to write a version of P that has the exact same behavior as P but that can also print out its own source code. This is the "recursion theorem" of computer science. The recursion theory is easy to prove, but the proof is usually presented involving fixed points of partial recursive functions. It is really only illuminating once you translate it into a conventional programming language. The essence of the proof is this. We want to write a program that prints its own source code. We can do this by writing the program in two parts, A and B. Part A is just a quoted string that contains the source code for part B. Part B is a function that does the actual writing. It acquires the string from part A. Then it prints a quoted version of this string---thus printing part A---and an unquoted version---this printing part B. B might also need to print some decorations; for example, if parts A and B are separated by some newlines, then the B function will need to print some newlines in between the quoted and the unquoted versions of the string. So, for example, consider Perl.. In Perl, quoting a string is easy. If some text has balanced curly braces, you can quote it by just writing it between "q{" and "}". So function B will look like this: sub B { print "\$A = "; print "q{$A};\n"; print "$A\n"; print "B();\n"; } You can see that it is printing a quoted version of $A (that's the second print) and then an unquoted version (that's the third print) and also some decorations: the assignment to $A at the beginning (that's the first print) and a call to function B at the end (that's the fourth print) and also some necessary punctuation in between (newlines, semicolons, and such.) String $A should contain the source code for sub B, like this: $A = q{sub B { print "\$A = "; print "q{$A};\n"; print "$A\n"; print "B();\n"; }}; I just took the code for sub B and stuck it between q{...}. And the program needs to call function B, or nothing will be printed: B(); and indeed, this is a program that prints out its own source code. (Complete code is at http://www.plover.com/~mjd/misc/quine.pl .) There are many such programs. For example, we can change B to use a single print statement instead of using four: sub B { print "\$A = q{$A};\n$A\nB();\n"; } and then A, which should contain B's source code, must change correspondingly: $A = q{sub B { print "\$A = q{$A};\n$A\nB();\n"; }}; but it still works; this version is at http://www.plover.com/~mjd/misc/quine/q2.pl . In both these programs, I relied on Perl's convenient "q{...}" construction, which is simple, so that the resulting program is small. So it might appear that I am depending on a language gimmick. But no, the q{...} construction is inessential. For example, I can use Perl's ordinary double-quotes instead. The only trick is that to quote a string in double quotes is not quite as simple as to quote it with "q{...}"---if the string contains quotes, backslashes, or dollar signs, these must be backslashed. No problem; Perl is good at such things: sub b { my $s = shift; my $q = $s; $q =~ s/([\$\\\"])/\\$1/g; print "\n\n\$a = \"$q\";\n\n$s\n\nb(\$a);\n"; } Here the b function gets an argument string, $s, and makes a copy, $q. It then prepares $q to be double quoted, by putting a backslash before every '$', '\', or '"' character; that's the third statement. Then it prints out the quoted version $q and the unquoted version $s, plus some decorations, such as the call to b($a) that invokes function b on its own source code. The complete program is at http://www.plover.com/~mjd/misc/quine/self.pl . If you think about all the ways of writing string data in a program, you will realize how many ways there are to quote strings. http://www.plover.com/~mjd/misc/quine/cs2.c is a version in C that encodes the source code of the B function in the form: char data[] = { 0x69, 0x6e, 0x74, 0x20, 0x6d, 0x61, 0x69, 0x6e, 0x28, 0x76, 0x6f, 0x69, 0x64, 0x29, 0x20, 0x0a, ... 0x74, 0x75, 0x72, 0x6e, 0x20, 0x30, 0x3b, 0x0a, 0x7d, 0x0a, 0x00, }; Then the B function (named "main" in this version) scans this array twice, once to regenerate the quoted version and the initialization of data[], and a second time to print out the contents of the array, thus regenerating its own source code: int main(void) { unsigned i; printf("\n#include <stdio.h>\n\nchar data[] = {\n"); for (i = 0; i < sizeof(data); i++) { if (i%8 == 0) putchar(' '); printf(" 0x%02x,", data[i]); if (i%8 == 7) putchar('\n'); } printf("\n};\n\n"); for (i = 0; data[i]; i++) putchar(data[i]); return 0; } I have some other similar examples under http://www.plover.com/~mjd/misc/quine/ David Madore's wonderful web page at http://www.madore.org/~david/computers/quine.html discusses this in greater detail. I highly recommend it. ___________________________________________________________________________ Philadelphia Linux Users Group -- http://www.phillylinux.org Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce General Discussion -- http://lists.phillylinux.org/mailman/listinfo/plug
|
|