Author:
- Name: Qiming HOU
Location: CN - People’s Republic of China (China)
To build:
make
To use:
./hou syntax-file file-to-process
Try:
./try.sh
View the remarks.htm file in your web browser.
The result is best viewed in an 80x25 console with ANSI colors.
Try (Windows):
copy header.htm hou.htm /y
hou html.txt hou.c >> hou.htm
start hou.htm
Judges’ remarks:
What a versatile entry! It can be used to check the size of IOCCC entries and to publish them as HTML.
For extra credit: what is the meaning of the number 2321237?
Author’s remarks:
Disclaimer
The reviewer may have already noticed that the 2nd page of this program is a dense blob, which is discouraged by the contest rules. We wish to point out that a majority of state-of-the-art programming editors support syntax highlighting, which should be enabled when reading this entry. Anticipating that the reviewer’s preferred color setting may produce a suboptimal visual effect, a few syntax files are provided to highlight the source code under an author-provided setting, using the submitted program itself. Syntax highlighting would also visually improve the 3rd page.
The code may throw a few warnings due to formatting constraints. The author did try to remove warnings under gcc and Visual Studio, though.
The author does not own the original anime series from which the embedded ASCII art comes. For comparison, original non-ASCII art depictions of the relevant characters and objects can be found by googling the embedded text messages.
Syntax files
The syntax file consists of a number of rewriting rules. Each rule consists of a
regular expression, a space, a format string, and a newline. All text matching
the regular expression is replaced by the format string. In case of a
conflict, rules appearing earlier take precedence. One can refer to the
original text in the format string using %s. Other % characters in the
format string must be escaped with another %. The space character can be used
in the regular expression by escaping it with [] or "".
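The rule layout described above can be sketched in a few lines of Python. This is only an illustration of the file format, not the code in hou.c: the helper names `split_rule` and `expand` are hypothetical, and the sketch assumes that the first space outside `[]` and `""` separates the regular expression from the format string.

```python
def split_rule(line):
    """Split one rule into (regex, format) at the first space
    that is not protected by [] or "" (assumed rule layout)."""
    in_class = in_quote = False
    for i, ch in enumerate(line):
        if in_quote:
            in_quote = ch != '"'      # leave the quote on the closing "
        elif in_class:
            in_class = ch != "]"      # leave the class on the closing ]
        elif ch == '"':
            in_quote = True
        elif ch == "[":
            in_class = True
        elif ch == " ":
            return line[:i], line[i + 1:]
    raise ValueError("rule has no format string")


def expand(fmt, matched):
    """Expand a format string: %s becomes the matched text, %% becomes %."""
    out = []
    i = 0
    while i < len(fmt):
        pair = fmt[i:i + 2]
        if pair == "%s":
            out.append(matched)
            i += 2
        elif pair == "%%":
            out.append("%")
            i += 2
        else:
            out.append(fmt[i])
            i += 1
    return "".join(out)


# Hypothetical rule: wrap the keyword "int" plus trailing spaces in bold tags.
regex, fmt = split_rule('"int"[ ]+ <b>%s</b>')
print(regex)
print(expand(fmt, "int "))
```

The space inside `[ ]` stays part of the regular expression because the scanner is still inside a bracket expression when it sees it.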
The following regular expression operators are supported:
() [] * + ? | ""
For example, one can use the following expression to match a certain declaration statement in hou.c:
"char"[ *]*[a-zA-Z_][0-9a-zA-Z_]*[ ]*((=[0-9a-zA-Z_ ]+)|(\[[0-9a-zA-Z_ ]*\]))?[ ]*(,[ *]*[a-zA-Z_][0-9a-zA-Z_]*[ ]*((=[0-9a-zA-Z_ ]+)|(\[[0-9a-zA-Z_ ]*\]))?[ ]*)*;
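To get a feel for what this expression accepts, it can be approximated with Python's `re` module. The translation below is an assumption: it treats the quoted `"char"` as the literal text `char` and keeps the rest of the expression unchanged, so it may differ from the entry's own engine in corner cases.

```python
import re

# Rough Python translation of the declaration rule (assumption: "char" is literal text).
DECL = re.compile(
    r"char[ *]*[a-zA-Z_][0-9a-zA-Z_]*[ ]*"
    r"((=[0-9a-zA-Z_ ]+)|(\[[0-9a-zA-Z_ ]*\]))?[ ]*"
    r"(,[ *]*[a-zA-Z_][0-9a-zA-Z_]*[ ]*"
    r"((=[0-9a-zA-Z_ ]+)|(\[[0-9a-zA-Z_ ]*\]))?[ ]*)*;"
)

for line in ["char *p, buf[256], c=0;", "char f(void);", "int x;"]:
    print(line, "->", bool(DECL.fullmatch(line)))
```

The first sample is a plain list of declarators with an optional initializer or array size, which is the shape the expression describes; the other two fall outside it.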
The regex engine is also algorithmically efficient. To illustrate the point, ansi.txt contains a pathological expression [1] that guarantees a hang for the competing Perl engine while matching itself. Try to compare these two engines:
./hou ansi.txt ansi.txt
perl patho.pl < ansi.txt
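The gap between the two approaches can be illustrated with a small step-counting experiment. The sketch below does not use the expression in ansi.txt; it assumes the well-known pathological family of n optional `a`s followed by n literal `a`s, matched against a string of n `a`s, and compares a naive backtracking matcher with one that advances a set of pattern positions in lockstep. All names are hypothetical.

```python
def backtrack(pattern, text):
    """Naive backtracking matcher; returns (matched, steps)."""
    steps = 0

    def go(pi, ti):
        nonlocal steps
        steps += 1
        if pi == len(pattern):
            return ti == len(text)
        kind, ch = pattern[pi]
        # Try to consume a character first, then fall back to skipping.
        if ti < len(text) and text[ti] == ch and go(pi + 1, ti + 1):
            return True
        return kind == "opt" and go(pi + 1, ti)

    return go(0, 0), steps


def closure(pattern, states):
    """Add every position reachable by skipping optional tokens."""
    seen = set()
    stack = list(states)
    while stack:
        pi = stack.pop()
        if pi in seen:
            continue
        seen.add(pi)
        if pi < len(pattern) and pattern[pi][0] == "opt":
            stack.append(pi + 1)
    return seen


def lockstep(pattern, text):
    """Advance all live pattern positions together; returns (matched, steps)."""
    steps = 0
    states = closure(pattern, {0})
    for ch in text:
        following = set()
        for pi in states:
            steps += 1
            if pi < len(pattern) and pattern[pi][1] == ch:
                following.add(pi + 1)
        states = closure(pattern, following)
    return len(pattern) in states, steps


n = 12
pattern = [("opt", "a")] * n + [("lit", "a")] * n
text = "a" * n
print("backtracking:", backtrack(pattern, text))
print("lockstep:    ", lockstep(pattern, text))
```

The backtracking matcher explores every consume-or-skip choice for the optional tokens before it finds the one assignment that works, so its step count grows exponentially in n, while the lockstep matcher does at most one step per pattern position per input character.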
Limitations
Finally, there are a few limitations…
- The only escape sequence supported is \n and it doesn’t work in [].
- The maximum file/rule size is a linear function of the hard-coded constant M.
- Single-character matches at the end of file may be missed.
- The operator | doesn’t obey precedence rules and some extra ()s may be required.
- Incorrect syntax files are not tolerated.
- The program relies on ASCII.
Why obfuscated
The first layer of obfuscation comes from the challenge of embedding a large
piece of ASCII art in a dense blob. The entire art is composed using only keywords,
strings and character constants. That results in a number of otherwise useless
#defines and quite a number of unconventionally written constants. While the
preprocessor may remove the former, the latter would remain regardless of
beautification and preprocessing.
The second layer of obfuscation comes from the need to squeeze a reasonably powerful regex engine into the remaining area that isn’t occupied by excessive keywords and useless constants. To achieve this, the regex compiler is written using the same threaded virtual machine [2] that parses the regex. Due to compiler limitations, the generated VM code also ends up obscured by a large number of spaghetti branches and virtual thread creations. Finally, the VM in [3] is extended to track multiple expression matches for the actual formatting.
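For readers unfamiliar with the threaded virtual machine approach, a minimal sketch may help. The instruction names (`char`, `split`, `jmp`, `match`) and the hand-assembled program for `a+b` are assumptions taken from the general technique; they are not the opcodes used in hou.c, and this sketch does not track multiple matches the way the entry does.

```python
def add_thread(prog, threads, pc):
    """Add pc to the thread list, following jmp/split without consuming input."""
    if pc in threads:
        return                      # each instruction is scheduled at most once
    threads.append(pc)
    op = prog[pc]
    if op[0] == "jmp":
        add_thread(prog, threads, op[1])
    elif op[0] == "split":
        add_thread(prog, threads, op[1])
        add_thread(prog, threads, op[2])


def run(prog, text):
    """Return True if the program matches the whole of text."""
    current = []
    add_thread(prog, current, 0)
    for ch in text:
        following = []
        for pc in current:
            op = prog[pc]
            if op[0] == "char" and op[1] == ch:
                add_thread(prog, following, pc + 1)
        current = following
    return any(prog[pc][0] == "match" for pc in current)


# Hand-assembled program for the expression a+b (hypothetical encoding).
A_PLUS_B = [
    ("char", "a"),     # 0: consume one 'a'
    ("split", 0, 2),   # 1: either loop back for another 'a' or move on
    ("char", "b"),     # 2: consume 'b'
    ("match",),        # 3: accept
]

for sample in ["ab", "aaab", "b", "abb"]:
    print(sample, "->", run(A_PLUS_B, sample))
```

Because a program counter is added to a thread list at most once per input character, the work per character is bounded by the program length, which is where the linear running time of this style of engine comes from.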
As an extra tweak, the text message actually does something useful. Remove the “Make a contract with me” catch line and the program will cease to function.
References
Footnotes
Inventory for 2012/hou
Primary files
- hou.c - entry source code
- Makefile - entry Makefile
- hou.orig.c - original source code
- ansi.txt - sample input with a pathological expression
- chk.txt - sample input
- hint.html - hint document
- html.txt - sample input
- markdown.txt - sample input
- patho.pl - perl program - takes a long time to process ansi.txt
- try.sh - script to try entry
- header.htm - HTML header
- perl-notes.txt - time command diff of patho.pl and prog
Secondary files
- 2012_hou.tar.bz2 - download entry tarball
- README.md - markdown source for this web page
- .entry.json - entry summary and manifest in JSON
- .gitignore - list of files that should not be committed under git
- hint.md - markdown source for hint.html
- .path - directory path from top level directory
- index.html - this web page