make
The current status of this entry is:
STATUS: INABIAF - please **DO NOT** fix
For more detailed information see 2013 endoh1 bugs.
./endoh1 [file.lazy]
./try.sh
You might also enjoy running:
cc -E endoh1.c | less
We liked this entry because it can serve as a standalone program as well as an include file; and, as usual, because its input data is at least as obfuscated as the program itself.
This entry can be considered an abuse of the GCC’s optimizer; it takes GCC about 4x longer to compile it with -O3 than without, compared to clang’s 2x.
less endoh1.c
gcc -o endoh1 endoh1.c
./endoh1
… Isn’t there any more to it?
Yes, of course.
This is a tool for C programmers to play the SKI combinator calculus, especially, Lazy K.
This program will shine when it is used as a library. For example, hello.lazy is a “Hello world” program written in Lazy K.
$ lazy hello.lazy
Hello, world!
where lazy
is a reference implementation of Lazy K. At the same time,
hello.lazy is a valid C program that uses endoh1.c
as a library.
$ gcc -o hello -xc hello.lazy
$ ./hello
Hello, world!
In other words, this program is a kind of polyglot.
The usage is simple: you just have to wrap Lazy K program with #include "endoh1.c"
. Note that #
is a comment in Lazy K.
… is inherent in SKI combinator calculus :-)
In addition, it uses various hacks to parse SKI code as C, and to satisfy IOCCC’s size rule.
These led to the good obfuscation.
See Spoiler section in detail, if you need.
This program supports only “Combinator-calculus style notation” of Lazy K. “Unlambda style” and “Iota and Jot” style are not supported.
Also, it requires a space between identifiers. In short, use (S K)
instead of
(SK)
, “`sk”, **i*i*i*ii*i*i*ii
, or 11111100011100
.
Huge memory may be required to compile the program (about 300 MB on my machine).
In addition, there are some limitations (and workarounds) mentioned in Spoiler section.
But I think it would only matter when you run the attached programs.
I confirmed that the program successfully worked with the following compilers:
Recent compilers with -Wall -W -Wextra -pedantic
say nothing.
gcc -Wall -W -Wextra -pedantic prog.c
clang -Wall -W -Wextra -pedantic prog.c
tcc -Wall -W -Wextra -pedantic prog.c
I think it will work on almost all platforms. I confirmed:
To check whether the program is specified on the command line or included from
another source file, the program uses a predefined macro __INCLUDE_LEVEL__
.
It is a gcc extension, and also supported by clang. If your compiler does not
support it, you cannot use the program as a library. But you can compile and
run it as a standalone program, at least. In fact, tcc does not support the
macro, but does work.
The whole program is interpreted by macro expansion.
For example, S (K I)
is translated to a normal C code, (s)((k)(i))
, or
simply s(k(i))
.
This expression returns an abstract syntax tree, and endoh1.c evaluates it.
This program uses a very simple “term rewriting” approach for evaluating SKI combinator calculus. The rewriting rules are shown in the shape of the code.
Consider a sequence of function applications in C:
(x)(x)(x)(x)... s
What type should s
have? Unfortunately, C does not provide a “recursive
type”, such as typedef f (*f)();
.
So, I used a heavily nested function pointer type:
typedef void *(*(*(*...(*(f))()...)())())
This limits how many arguments one function can be consecutively applied to.
But you can increase the number by tweaking the definition of macro p
.
Next, we need to encode “closures”. A closure is a function together with an environment which is a reference to non-local variable.
In this case, we need something that meets all of the following criteria:
However, no type in C is callable and has a reference at the same time. (It is feasible by dynamic code generation, but it is far from portable.)
So, I addressed this issue by generating many function definitions statically by (ab)using macros:
void *x1; void f1(void *y) { return apply(x1, y); }
void *x2; void f2(void *y) { return apply(x2, y); }
void *x3; void f3(void *y) { return apply(x3, y); }
...
void *(h[]) = { f1, f2, f3, ... }
and by allotting each of them when a closure is needed.
This leads to another limitation: the number of pre-defined closures limits how
many identifiers (S
K
I
) one program can use.
But you can increase this number by tweaking the definition of macro A
B
C
and D
. (Note that closures are allotted only when parsing; after the
evaluation starts, “out of closure” cannot occur.)
This margin is too narrow to contain a detailed explanation. Instead, I just
ask you one question. Can you tell what v s[]={0,0,s+6,s+2,s+4,s,s+3,s+5,s+1};
is? I found this by using an SMT
solver.