Mark Dominus on 18 Oct 2005 17:27:07 -0000

[Date Prev] [Date Next] [Thread Prev] [Thread Next] [Date Index] [Thread Index]

Self-printing programs are easy to write

A popular pasttime of programmers is to write a program that prints
out its own source code.  Solutions to this problem are more or less
contrived.  But a deep theorem of computer science says that this is
no trick.  *Any* reasonable programming language is sufficiently
powerful to write a program that can print its own source code, and in
a standard and formulaic way.  (The only programming languages not
powerful enough to do this are also not powerful to do a lot of other
things that are more important, like arithmetic.)

In fact, not only is it true that programming languages can write
programs that print their own source code, they can do more.  In any
reasonable programming language, for any program P, it is possible to
write a version of P that has the exact same behavior as P but that
can also print out its own source code.

This is the "recursion theorem" of computer science.  The recursion
theory is easy to prove, but the proof is usually presented involving
fixed points of partial recursive functions.  It is really only
illuminating once you translate it into a conventional programming

The essence of the proof is this.  We want to write a program that
prints its own source code.  We can do this by writing the program in
two parts, A and B.  Part A is just a quoted string that contains the
source code for part B.  Part B is a function that does the actual
writing.  It acquires the string from part A.  Then it prints a quoted
version of this string---thus printing part A---and an unquoted
version---this printing part B.  B might also need to print some
decorations; for example, if parts A and B are separated by some
newlines, then the B function will need to print some newlines in
between the quoted and the unquoted versions of the string.

So, for example, consider Perl..  In Perl, quoting a string is easy.
If some text has balanced curly braces, you can quote it by just
writing it between "q{" and "}".  So function B will look like this:

        sub B {
          print "\$A = ";
          print "q{$A};\n";
          print "$A\n";
          print "B();\n";

You can see that it is printing a quoted version of $A (that's the
second print) and then an unquoted version (that's the third print)
and also some decorations: the assignment to $A at the beginning
(that's the first print) and a call to function B at the end (that's
the fourth print) and also some necessary punctuation in between
(newlines, semicolons, and such.)

String $A should contain the source code for sub B, like this:

        $A = q{sub B {
          print "\$A = ";
          print "q{$A};\n";
          print "$A\n";
          print "B();\n";

I just took the code for sub B and stuck it between q{...}.

And the program needs to call function B, or nothing will be printed:


and indeed, this is a program that prints out its own source code.
(Complete code is at .)

There are many such programs. For example, we can change B to use a
single print statement instead of using four:

        sub B {
          print "\$A = q{$A};\n$A\nB();\n";

and then A, which should contain B's source code, must change

        $A = q{sub B {
          print "\$A = q{$A};\n$A\nB();\n";

but it still works; this version is at .

In both these programs, I relied on Perl's convenient "q{...}"
construction, which is simple, so that the resulting program is
small.  So it might appear that I am depending on a language gimmick.
But no, the q{...} construction is inessential.  For example, I can
use Perl's ordinary double-quotes instead.  The only trick is that to
quote a string in double quotes is not quite as simple as to quote it
with "q{...}"---if the string contains quotes, backslashes, or dollar
signs, these must be backslashed.  No problem; Perl is good at such

        sub b {
          my $s = shift;
          my $q = $s;
          $q =~ s/([\$\\\"])/\\$1/g;
          print "\n\n\$a = \"$q\";\n\n$s\n\nb(\$a);\n";

Here the b function gets an argument string, $s, and makes a copy, $q.
It then prepares $q to be double quoted, by putting a backslash before
every '$', '\', or '"' character; that's the third statement.  Then it
prints out the quoted version $q and the unquoted version $s, plus
some  decorations, such as the call to b($a) that invokes
function b on its own source code.  The complete program is at .

If you think about all the ways of writing string data in a program,
you will realize how many ways there are to quote strings. is a version in C that
encodes the source code of the B function in the form:

        char data[] = {
          0x69, 0x6e, 0x74, 0x20, 0x6d, 0x61, 0x69, 0x6e,
          0x28, 0x76, 0x6f, 0x69, 0x64, 0x29, 0x20, 0x0a,
          0x74, 0x75, 0x72, 0x6e, 0x20, 0x30, 0x3b, 0x0a,
          0x7d, 0x0a, 0x00,

Then the B function (named "main" in this version) scans this array
twice, once to regenerate the quoted version and the initialization of
data[], and a second time to print out the contents of the array, thus
regenerating its own source code:

        int main(void) 
          unsigned i;
          printf("\n#include <stdio.h>\n\nchar data[] = {\n");
          for (i = 0; i < sizeof(data); i++) {
            if (i%8 == 0) putchar(' ');
            printf(" 0x%02x,", data[i]);
            if (i%8 == 7) putchar('\n');
          for (i = 0; data[i]; i++)
          return 0;

I have some other similar examples under

David Madore's wonderful web page at

discusses this in greater detail.  I highly recommend it.

**Majordomo list services provided by PANIX <URL:>**
**To Unsubscribe, send "unsubscribe phl" to**