Author:
- Name: Adrian Cable
Location: US - United States of America (United States)
To build:
make
Bugs and (Mis)features:
The current status of this entry is:
STATUS: INABIAF - please DO NOT fix
For more detailed information see 2013/cable2 in bugs.html.
To use:
./cable2 file.bmp [color]
Try:
For different OCR examples:
./try.sh
Judges’ remarks:
We can confirm that this is the first time the IOCCC has been used to peer-review new research. We had quite a bit of fun with a sharpie and a scanner.
Author’s remarks:
OCR - Obfuscated Character Recognition of Handwritten Text
This entry takes a BMP image file of hand-drawn (mouse-drawn?) text, specified as the first command-line parameter, and converts it to an ASCII text document. Magic!
BMP files created by most paint programs should work. The author recommends Paint.NET.
Features
- Every printable ASCII character is recognized (94 characters plus space), plus a special bonus character.
- Input images can be any size (limited only by memory), containing any number of lines of text.
- BMP is really a mess of different formats, but the program supports most of them from Windows 3.1 and later.
- Hand-drawn characters in the input image can be variably spaced and positioned and be of any width or height, but the bigger the better for accurate recognition (300 pixels tall or more is recommended). Characters must be drawn with a stroke width of around 9 pixels, and separated from adjacent characters. (In Paint.NET a brush of width 9 works well, with anti-aliasing turned off.)
- You can use any colors you like for the text and background, even different colors for different text characters.
- If “color” is specified as the second command line parameter, the program’s output will also be in color, on ANSI/VTxxx terminals.
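As an illustration of the color output feature, here is a minimal sketch of one common way to map an RGB pixel to one of the eight basic ANSI foreground colors. It is not the entry's `P` macro and makes no claim about how the entry does it; the names `rgb_to_ansi` and `color_char` and the threshold of 127 are assumptions made for this example.

```c
#include <stdio.h>

/* Hypothetical helper (not taken from cable2.c): threshold each channel
 * and pack the three bits into an ANSI foreground code in 30..37.
 * In the ANSI palette, bit 0 is red, bit 1 is green and bit 2 is blue. */
int rgb_to_ansi(unsigned char r, unsigned char g, unsigned char b)
{
    return 30 + (r > 127) + 2 * (g > 127) + 4 * (b > 127);
}

/* Hypothetical helper: write one character wrapped in "set color" and
 * "reset" escape sequences into buf, returning snprintf's result. */
int color_char(char *buf, size_t n, char ch,
               unsigned char r, unsigned char g, unsigned char b)
{
    return snprintf(buf, n, "\033[%dm%c\033[0m", rgb_to_ansi(r, g, b), ch);
}
```

Under these assumptions a pure red pixel maps to code 31 and a pure blue pixel to code 34. Thresholding like this also suggests why anti-aliased edges, which blend text and background colors, could confuse a color detector.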
Why is this entry obfuscated/interesting?
- The source is presented as a beautiful moonlit scene, in which a quick brown fox jumps over a lazy dog. Far too often in software development is aesthetic nuance lost in the name of serious-sounding conventions like “GNU style”, but not here.
- Newcomers to C find it hard to learn all those different ways to control flow: `for`, `while`, `if`, `do`, `goto`, `continue`, `break` and heaven knows what else! So, in this program we only use `for`, so absolute beginners can get into the code straight away.
- To teach newcomers all the important features of C, we demonstrate the importance of the liberal use of short circuits, sequence points, the ternary operator, using `x^y` or `x-y` instead of `x!=y`, using `~x` in place of `x!=-1` for conciseness, mixing `x[y]` and `y[x]` for variety, educational `#define`s, and so on.
- `main()` is the most useful function in all of C - so it is a mystery to the author why most programs use it only once. Here we use it over and over for maximum benefit.
- The number 42 is featured in the source to provide an answer to the judges’ deeper questions.
- How are character colors converted from RGB in the input image to ANSI escape sequences? Clue: the weird macro `P` does the interesting part of it, but how?
- The program prints a string when the bonus character is detected. How does it do this? Clue: the string to print is defined in the macro `$`, but how can a string defined in that way ever get printed?
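The idioms listed above can be seen in isolation in the following minimal sketch. It is written for this page rather than taken from cable2.c, and the function names `find_char` and `count_diff` are hypothetical. All control flow uses `for` only.

```c
/* Hypothetical helper: index of the first occurrence of c in s, or -1. */
int find_char(const char *s, int c)
{
    int i, found = -1;

    /* `for` used as a while loop. s[i] ^ c is nonzero exactly when
     * s[i] != c, and i[s] names the same element as s[i]. The && short
     * circuit stops the scan at the terminating NUL. */
    for (i = 0; i[s] && s[i] ^ c; i++)
        ;

    /* `for` used as an if. ~found is zero only when found == -1, so
     * !~found is true on the first pass and false once found is set. */
    for (; s[i] && !~found; )
        found = i;

    return found;
}

/* Hypothetical helper: count positions where a and b differ, up to the
 * shorter length. a[i] - b[i] is nonzero exactly when the characters
 * differ, and !! turns that into 0 or 1. */
int count_diff(const char *a, const char *b)
{
    int i, n = 0;

    for (i = 0; a[i] && b[i]; i++)
        n += !!(a[i] - b[i]);

    return n;
}
```

For example, `find_char("fox", 'x')` yields 2, `find_char("fox", 'z')` yields -1, and `count_diff("lazy", "hazy")` yields 1.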
Other notes
- The novel character recognition algorithm used has not been previously published and was developed by the author specifically for the contest. Should this entry win the IOCCC, it will be the first time (to the author’s knowledge) that the IOCCC has been used for peer review/publication of original research.
- OCR in general is a hard problem. Here, we only recognise one possible form for each character. You can see what the supported character shapes look like by looking at the examples in ascii.bmp. If you copy the shape of the character forms you find in there, recognition accuracy should be very good (95%+). A more “useful” version would extend the character stroke table to support multiple different commonly-used forms for each character.
- Despite being the largest possible IOCCC entry at 4096 bytes, it is also probably the smallest general-purpose OCR program ever written (including the character stroke data), maybe by several orders of magnitude.
- The author used an interesting compression algorithm (suggested by Gareth McCaughan - thank you!) to squeeze the character stroke table (containing around 2000 strokes for the whole ASCII character set) into a string constant of just 472 characters.
Bugs/features
- The width of input images must be a multiple of 4.
- Input images can be 8-bit greyscale, 24-bit color or 32-bit color only. 8-bit color and 16-bit color are not supported. If you use the “color” command-line feature, do not use a greyscale input file!
- Normal Windows BMP files are stored “upside-down”, i.e. the bottom line in the image is written first. Some (old) graphics programs actually write BMP files “top-to-bottom” and the program does not support such files.
- Using anti-aliased brush strokes to draw your letters is fine; however, anti-aliasing interferes with color detection in “color” mode.
- Only runs on little-endian machines (since the BMP format is little-endian, and endianness conversion would make the source too large for IOCCC rule 2).
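The input constraints above can be checked before running the entry. The sketch below is a hypothetical pre-flight check, not part of the entry: it assumes a 14-byte file header followed by a BITMAPINFOHEADER-style header (the common layout since Windows 3.x), and the name `bmp_check` and its return codes are invented for this example. It reads fields byte by byte, so the check itself does not depend on host endianness.

```c
#include <stddef.h>
#include <stdint.h>

/* Assemble little-endian fields from individual bytes. */
static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static uint16_t le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | p[1] << 8);
}

/* Hypothetical check of the first 54 bytes of a BMP file.
 * Returns 0 if the file looks acceptable, otherwise:
 *   1 = not a BMP (or too short), 2 = width not a positive multiple of 4,
 *   3 = top-down or empty image,  4 = unsupported bit depth.
 * Note: the bit depth alone cannot tell 8-bit greyscale from 8-bit
 * palette color; that would require inspecting the color table. */
int bmp_check(const unsigned char *hdr, size_t len)
{
    if (len < 54 || hdr[0] != 'B' || hdr[1] != 'M')
        return 1;

    /* Width and height are signed 32-bit values at offsets 18 and 22. */
    int32_t width  = (int32_t)le32(hdr + 18);
    int32_t height = (int32_t)le32(hdr + 22);
    uint16_t bpp   = le16(hdr + 28);

    if (width <= 0 || width % 4)
        return 2;
    if (height <= 0)            /* negative height means top-to-bottom rows */
        return 3;
    if (bpp != 8 && bpp != 24 && bpp != 32)
        return 4;
    return 0;
}
```

A caller would read the first 54 bytes of the file into a buffer and pass it to `bmp_check`; a nonzero result points at the constraint that the image violates.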
Compiler warnings
`clang` warns about unused expression results, missing type specifiers, and incompatible pointer conversions - all just a fun consequence of the obfuscations.
Inventory for 2013/cable2
Primary files
- cable2.c - entry source code
- Makefile - entry Makefile
- cable2.orig.c - original source code
- try.sh - script to try entry
- ascii.bmp - PC bitmap image
- bonus.bmp - PC bitmap image
- hello_world.bmp - PC bitmap image
- mixed_sizes.bmp - PC bitmap image
- multi_color.bmp - PC bitmap image
- typeset.bmp - PC bitmap image
Secondary files
- 2013_cable2.tar.bz2 - download entry tarball
- README.md - markdown source for this web page
- .entry.json - entry summary and manifest in JSON
- .gitignore - list of files that should not be committed under git
- .path - directory path from top level directory
- index.html - this web page