IOCCC image by Matt Zucker

The International Obfuscated C Code Contest

1996/august - Best of Show

Subset of C compiler and byte code interpreter

Author:

To build:

    make all

To use:

    cat august.c test.oc | ./august > test.oo
    ./august < test.oo

The above should print a ! followed by a newline.

Try:

    ./try.sh

Judges’ remarks:

This entry can feed on itself. If you C pre-process the source, you can compile the interpreter:

    make august.oc

We can now compile the interpreter as follows:

    cat august.c august.oc | ./august > august.oo

and use the compiled interpreter to execute some previously compiled code:

    cat august.oo test.oo | ./august

And we can have the compiled interpreter interpret itself which in turn compiles the test program:

    cat august.oo august.oo fac.oo | ./august

And if you have lots of spare time, you can recurse one level deeper:

    cat august.oo august.oo august.oo fac.oo | ./august

We (the judges) recommend that you spend some time studying this entry as it surely will make the ‘best of the IOCCC’ list. It was very well received by those who attended the IOCCC BOF.

Author’s remarks:

This program is a bytecode interpreter. This fact is not particularly obfuscated; any experienced C programmer can see that in about a minute.

The interpreter reads the bytecode from the standard input. What remains of the standard input after this is left as standard input for the interpreted program.

Having an interpreter but no code isn’t any fun. But within a comment in the beginning of the interpreter is a sequence of bytes that is code that the interpreter understands. It is in fact the code for a compiler that compiles from OC (Obfuscated C, a subset of C) into the byte code that the interpreter understands.

The language the compiler understands is a subset of C (the grammar is below) with the functions getchar(3) and putchar(3) available. The subset has the types int, char, and pointers to those (it has function pointers too, but not with the proper syntax). Arrays can only be global. There is a rather limited set of operators, and control constructs.

The compiler has absolutely no error checking, so things can go wrong. But this is what seasoned C programmers are used to.

Assume that the interpreter has been compiled (source in august.c and binary in august) and that we have this in a file named test.oc:

    main() { putchar('!'); putchar('\n'); exit(0); }

We can then compile by

    cat august.c test.oc | ./august > test.oo

And run the compiled program by

    ./august < test.oo

Or even simpler:

    cat august.c test.oc | ./august | ./august

A larger example is included fac.oc.

The compiler for OC is naturally written in OC (where did you think the byte code came from?). It is in the file parse.oc.

Hmm… So now we have an interpreter written in OC and a compiler for OC, can we compile the interpreter? Yes, we can! The OC compiler does not have a preprocessor (hey, what do you expect from a program that is this small), so we need to use an external one. To pre-process august.c use the command from the Makefile, but add the -E flag and change 60000 to 40000, i.e.:

    cc -E -DZ=40000 .... august.c > august.oc

If august.oc contain some junk line starting with a # (most likely it does) then remove it like:

    sed -i'' '/^#/d' august.oc

OK, we can now compile the interpreter:

    cat august.c august.oc | ./august > august.oo

And we can run it:

    cat august.oo test.oo | ./august

Here we have the interpreter interpreting another interpreter that runs the program. Did you think that was too fast? Just throw in another level of interpretation:

    cat august.oo august.oo test.oo | ./august

And another…

    cat august.oo august.oo august.oo test.oo | ./august

Fitting the byte code for the compiler into the initial comment was a challenge. I had to strip the compiler/interpreter of a lot of functionality to bring down their sizes. The initial version had a full set of binary operators as well as prefix &. Given a handful more bytes it would be easy to put them back.

Anyway, the bytecode from the compiler is first optimized in an external optimizer (not supplied here); this brings down the size of it by about 30%. The remaining code is then LZ-compressed (roughly) with a sliding window of 255 bytes. The most common occurring bytes are then encoded using " \n\t;{}" to avoid having them counted as significant. About 200 bytes of the interpreter handles the unpacking of the byte code (this is a lot, but it pays off).

Despite all this work I still had to put a lot of code in the compilation command. This disturbs me. It is possible to compress the code further with a better entropy coder (e.g. an arithmetic coder), but decompression then takes up too much code. Maybe this would be a way to fit it all into the 1536 bytes, but I don’t have time to try it.


Execution starts at main(). Nothing can be used before it is defined. getchar() and putchar() are predefined functions.

Below is a description of the OC grammar.

OC grammar

Terminals are in quotes, () is used for bracketing.

    program:    decl*

    decl:       vardecl
                    fundecl

    vardecl:    type NAME ;
                    type NAME "[" INT "]" ;

    fundecl:    type NAME "(" args ")" "{" body "}"

    args:       /*empty*/
                    ( arg "," )* arg

    arg:        type NAME

    body:       vardecl* stmt*

    stmt:       ifstmt
                    whilestmt
                    dowhilestmt
                    "return" expr ";"
                    expr ";"
                    "{" stmt* "}"
                    ";"

    ifstmt:     "if" "(" expr ")" stmt
                    "if" "(" expr ")" stmt "else" stmt

    whilestmt:  "while" "(" expr ")" stmt

    dowhilestmt:"do" stmt "while" "(" expr ")" ";"

    expr:       expr binop expr
                    unop expr
                    expr "[" expr "]"
                    "(" expr ")"
                    expr "(" exprs ")"
                    NAME
                    INT
                    CHAR
                    STRING

    exprs:      /*empty*/
                    (expr ",")* expr

    binop:      "+" | "-" | "*" | "/" | "%" |
                    "=" |
                    "<" | "==" | "!="

    unop:       "!" | "-" | "*"

    type:       "int" stars
                    "char" stars

    stars:      "*"*

Well, that is enough ranting for now.

Inventory for 1996/august

Primary files

Secondary files


Jump to: top