Author:
- Name: Anthony C. Howe
Location: CA - Canada
To build:
make
There are two alternate versions provided by the author. See Alternate code and the author’s remarks below for more details.
Bugs and (Mis)features:
The current status of this entry is:
STATUS: known bug - please help us fix
For more detailed information see 2015/howe in bugs.html.
To use:
./prog file1 file2
./prog -d file1 file2
Try:
./try.sh
Alternate code:
This alternate version does not use a 64-bit FNV hash. This was the original entry but was changed to be the alt version. Later, in 2024, the author provided an unobfuscated version which is in prog.alt2.c.
Alternate build:
To build these versions:
make alt
Alternate use:
Use prog.alt
and/or prog.alt2
as you would prog
above.
Alternate try:
./try.alt.sh
Judges’ remarks:
This is the best use of the FNV that we have seen in the IOCCC so far!
The output, when used without -d
, is compatible with POSIX diff(1)
and
is suitable for use with patch(1)
.
We welcome back Canada to the list of nations from where winning entries have been submitted.
Is this code a bug or a feature? :-) Or is this an attempt to corrupt the programming of our youth? Should we heed Kyle’s mom words that she uttered during a South Park P.T.A. Meeting?
“We must stop dirty (C) language from getting to our children’s ears!”
Or should we teach our youth to understand the intricacies of this code? Ying Tong Iddle I Po! We suggest you read the source for yourself, which might be easier than the academic papers it was inspired by.
NOTE: Unlike the original entry source, prog.alt.c, prog.c uses a 64 bit FNV hash and fixes a function call warning.
Author’s remarks:
Features
“An O(NP) Sequence Comparison Algorithm” by Wu, Manber, Myers, Miller.
Output compatible with
patch(1)
.Strokes one of the judge’s ego (I couldn’t email chocolate or curry).
Description
This is a functioning micro diff(1)
tool using an O(NP) algorithm,
compared to the older O(ND) difference algorithm used by some versions
of diff(1)
. Its output is based on the default diff(1)
output described by POSIX
and the Open Group Base Specifications. The output is suitable for use
with patch(1)
.
The -d
option simply writes the edit distance between file1
and file2
.
Observations
The FNV1a hash is a little slow compared to the trivial hash GNU
diff uses. I downloaded a plain text
copy of “War And Peace” from Project Gutenberg,
used makeholes.c to make 1000 random changes, then profiled and
timed the program verses GNU diff(1)
. The bottleneck appears to be in the
file I/O and line hashing with an average +0.05s slower. Using a huge file like
“War And Peace” for testing offsets the diff(1)
optimised file I/O.
There is no hash collision checking, partly because FNV1a appears to generate very few collisions and an assumption that localised collisions within a region of edits are highly unlikely.
Support Files
prog-test.sh Basic test program verifies known test edit distances and
patch(1)
support.avgtime.sh Run
time(1)
on a commandN
times to determine its average runtime.makeholes.c Random edits (holes) made to a file in-place.
References
Wu, Manber, Myers, and Miller; August 1989;
“An O(NP) Sequence Comparison Algorithm”;
http://myerslab.mpi-cbg.de/wp-content/uploads/2014/06/np_diff.pdf
Fowler, Noll, Vo; 1994
http://www.isthe.com/chongo/tech/comp/fnv/index.html
Fowler, Noll, Vo on Wikipedia
https://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function
“diff”, The Open Group Base Specifications Issue 7
IEEE Std 1003.1, 2013 Edition
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/diff.html
Eugene W. Myers; “An O(ND) Difference Algorithm and Its Variations”;
Algorithmica, 1986, pages. 251-266
http://www.xmailserver.org/diff2.pdf
Webb Miller and Eugene W. Myers; “A File Comparison Program”;
Software-Practice And Experience, Vol. 15(11). 1025-1040 (November 1985)
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.189.70&rep=rep1&type=pdf
D.S. Hirschberg, “A linear space algorithm for computing maximal common subsequence problem”;
Comm. of ACM, Vol. 18, June 1975, pages 341-343
http://www.mathcs.emory.edu/~cheung/Courses/323/Syllabus/DynProg/Docs/Hirschberg-LCS-1975.pdf
Which hashing algorithm is best for uniqueness and speed?
http://programmers.stackexchange.com/questions/49550/which-hashing-algorithm-is-best-for-uniqueness-and-speed
Inventory for 2015/howe
Primary files
- prog.c - entry source code
- Makefile - entry Makefile
- prog.alt.c - author provided source code without 64-bit FNV hash
- prog.orig.c - original source code
- avgtime.sh - report average time running a command N times
- makeholes.c - C code to add random holes to a file
- prog.alt-test.sh - script to run test suite with alternate code
- prog-test.sh - script to run test suite
- try.alt.sh - script to try alternate code
- try.sh - script to try entry
- cc.1 - cc man page
- err.h - author provided header file for unobfuscated code
- prog.alt2.c - author provided unobfuscated code
- war-and-peace.txt - War and Peace translated to English
Secondary files
- 2015_howe.tar.bz2 - download entry tarball
- README.md - markdown source for this web page
- .entry.json - entry summary and manifest in JSON
- .gitignore - list of files that should not be committed under git
- .path - directory path from top level directory
- index.html - this web page