Author:
- Name: Yusuke Endoh
Location: JP - Japan
To build:
make
Bugs and (Mis)features:
The current status of this entry is:
STATUS: INABIAF - please DO NOT fix
For more detailed information see 2013/endoh1 in bugs.html.
To use:
./endoh1 [file.lazy]
Try:
./try.sh
You might also enjoy running:
cc -E endoh1.c | less
Judges’ remarks:
We liked this entry because it can serve as a standalone program as well as an include file; and, as usual, because its input data is at least as obfuscated as the program itself.
This entry can be considered an abuse of the GCC’s optimizer; it takes GCC about 4x longer to compile it with -O3 than without, compared to clang’s 2x.
Author’s remarks:
Remarks
less endoh1.c
gcc -o endoh1 endoh1.c
./endoh1
… Isn’t there any more to it?
Yes, of course.
This is a tool for C programmers to play the SKI combinator calculus, especially, Lazy K.
This program will shine when it is used as a library. For example, hello.lazy is a “Hello world” program written in Lazy K.
$ lazy hello.lazy
Hello, world!
where lazy
is a reference implementation of Lazy K. At the same time,
hello.lazy is a valid C program that uses endoh1.c
as a library.
$ gcc -o hello -xc hello.lazy
$ ./hello
Hello, world!
In other words, this program is a kind of polyglot.
The usage is simple: you just have to wrap Lazy K program with #include "endoh1.c"
. Note that #
is a comment in Lazy K.
Obfuscation
… is inherent in SKI combinator calculus :-)
In addition, it uses various hacks to parse SKI code as C, and to satisfy IOCCC’s size rule.
- Abuse of function pointers.
- Code duplication by macros.
- Short coding.
These led to the good obfuscation.
See the How this entry works section in detail, if you need.
Limitation
This program supports only “Combinator-calculus style notation” of Lazy K. “Unlambda style” and “Iota and Jot” style are not supported.
Also, it requires a space between identifiers. In short, use (S K)
instead of
(SK)
, “`sk”, **i*i*i*ii*i*i*ii
, or 11111100011100
.
Huge memory may be required to compile the program (about 300 MB on my machine).
In addition, there are some limitations (and workarounds) mentioned in the How this entry works section.
But I think it would only matter when you run the attached programs.
Portability
I confirmed that the program successfully worked with the following compilers:
- gcc 4.6.3 [Linux, Mac OS X]
- clang 3.0 [Linux, Mac OS X]
- tcc 0.9.26 [Linux]
- N.B. tcc 0.9.25 seems to have a bug and fails to run it.
Recent compilers with -Wall -W -Wextra -pedantic
say nothing.
gcc -Wall -W -Wextra -pedantic prog.c
clang -Wall -W -Wextra -pedantic prog.c
tcc -Wall -W -Wextra -pedantic prog.c
I think it will work on almost all platforms. I confirmed:
- Ubuntu Linux
- Mac OS X
- Raspberry PI
To check whether the program is specified on the command line or included from
another source file, the program uses a predefined macro __INCLUDE_LEVEL__
.
It is a gcc extension, and also supported by clang. If your compiler does not
support it, you cannot use the program as a library. But you can compile and
run it as a standalone program, at least. In fact, tcc does not support the
macro, but does work.
NOTICE to those who wish for a greater challenge:
If you want a greater challenge, don’t read any further: just try to understand the program via the source.
If you get stuck, come back and read below for additional hints and information.
How this entry works:
The whole program is interpreted by macro expansion.
For example, S (K I)
is translated to a normal C code, (s)((k)(i))
, or
simply s(k(i))
.
This expression returns an abstract syntax tree, and endoh1.c evaluates it.
This program uses a very simple “term rewriting” approach for evaluating SKI combinator calculus. The rewriting rules are shown in the shape of the code.
Abuse of function pointers
Consider a sequence of function applications in C:
s(x)(x)(x)(x)...
What type should s
have? Unfortunately, C does not provide a “recursive
type”, such as typedef f (*f)();
.
So, I used a heavily nested function pointer type:
typedef void *(*(*(*...(*(f))()...)())())
This limits how many arguments one function can be consecutively applied to.
But you can increase the number by tweaking the definition of macro p
.
Code duplication by macros
Next, we need to encode “closures”. A closure is a function together with an environment which is a reference to non-local variable.
In this case, we need something that meets all of the following criteria:
- Is callable itself.
- Internally possesses a reference to another closure as a non-local variable.
- Applies the internal closure to an argument when it is called.
However, no type in C is callable and has a reference at the same time. (It is feasible by dynamic code generation, but it is far from portable.)
So, I addressed this issue by generating many function definitions statically by (ab)using macros:
void *x1; void f1(void *y) { return apply(x1, y); }
void *x2; void f2(void *y) { return apply(x2, y); }
void *x3; void f3(void *y) { return apply(x3, y); }
...
void *(h[]) = { f1, f2, f3, ... }
and by allotting each of them when a closure is needed.
This leads to another limitation: the number of pre-defined closures limits how
many identifiers (S
K
I
) one program can use.
But you can increase this number by tweaking the definition of macro A
B
C
and D
. (Note that closures are allotted only when parsing; after the
evaluation starts, “out of closure” cannot occur.)
Short cording
This margin is too narrow to contain a detailed explanation. Instead, I just
ask you one question. Can you tell what v s[]={0,0,s+6,s+2,s+4,s,s+3,s+5,s+1};
is? I found this by using an SMT
solver.
Inventory for 2013/endoh1
Primary files
- endoh1.c - entry source code
- Makefile - entry Makefile
- endoh1.orig.c - original source code
- try.sh - script to try entry
- echo.lazy - lazy input file
- hello.lazy - lazy input file
- rot13.ed - rot13 ed script
- tac.lazy - lazy input file
Secondary files
- 2013_endoh1.tar.bz2 - download entry tarball
- README.md - markdown source for this web page
- .entry.json - entry summary and manifest in JSON
- .gitignore - list of files that should not be committed under git
- .path - directory path from top level directory
- index.html - this web page