Author:
- Name: Adrian Mariano
Location: US - United States of America (United States)
To build:
make all
Bugs and (Mis)features:
The current status of this entry is:
STATUS: INABIAF - please DO NOT fix
For more detailed information see 1992/adrian in bugs.html.
Try:
./try.sh
For the slow minded, try:
./adsleep 32767
Judges’ remarks:
Once you get past the obfuscation, you have an opportunity to learn about regular expressions and state machines.
NOTE: Some compilers have had trouble optimizing this entry.
Author’s remarks:
ADrian’s GREP (adgrep)
For those confused by the complexity of full-blown egrep(1)
style regular
expressions, this program offers an alternative. It implements an equivalent
search, using a deterministic finite automaton.
A deterministic finite automaton consists of a finite set of states, along with transition rules to move from one state to another, an initial state, and a set of accepting states. The automaton takes a string as input and begins in the start state. It reads a character of the string and consults the rules for the current state, moving to the new state indicated by the appropriate rule. This process is repeated until the string is consumed. If the current state at this point is one of the accepting states, then the string is accepted.
The deterministic finite automaton is specified as a series of rules for each state:
<state> chars1 <dest1> chars2 <dest2> ...
chars1
is a list of characters (only the first 8 are significant) which
should trigger a transition to <dest1>
. <dest1>
is another state which
should have a similar specification somewhere. A state is accepting if
it is specified in square brackets: [final]
; state strings are
significant to only eight characters like chars1
.
Example 1: matches ^abc$
<q0> a <q1> The first state to
appear is the
start state
<q1> b <q2>
<q2> c [q3]
[q3]
Technically, a deterministic finite automaton should have a rule for each
possible input character at each state. To simplify descriptions of the
automata, if no rule is present, the string will not be accepted. Also,
the .
character matches any character if it occurs first in the
character list.
Example 2: ^abc
<q0> a <q1>
<q1> b <q2>
<q2> c [q3]
[q3] . [q3]
Example 3: abc$
<q0> a <q1> . <q0>
<q1> b <q2> a <q1> . <q0>
<q2> c [q3] a <q1> . <q0>
[q3] . <q1>
Example 4: ^(abc)*$
[q0] a <q1>
<q1> b <q2>
<q2> c [q0]
Example 5: ^[ab][cd][ef]$
<q0> ab <q1>
<q1> cd <q2>
<q2> ef [q3]
[q3]
Example 6: ^(abc|efg)$
<q0> a <q1> e <q3>
<q1> b <q2>
<q2> c [q5]
<q3> f <q4>
<q4> g [q5]
[q5]
With the automaton specification in filename
, invoke the program by
typing
./adrian filename
It will read the file and print out all the lines which the automaton
accepts. If the file cannot be opened, a system error message will
be printed. If the input contains errors, then an error message along
with the number of the offending line will be printed to stderr. The
number of rules for each state is limited to 17. If more than 17 rules
are present, you get the error too_many_rules
, and the state that was
being processed is printed. Error no_destination
occurs if you specify a
set of characters, but no destination state, and error too_many_states
occurs if your automaton has more than 257 states.
Running:
./adrian from < your_mailbox
will perform a function similar to that of the unix from
command.
If no filename is specified on the command line, then __FILE__
is used as the
specification for the automaton. (Originally this was "adgrep.c"
but the file
was renamed to adrian.c
by the judges and for more portability the arg was
changed to __FILE__
.) In this case, the program will search for matches to
the regular expression:
^.[^|C][^w[Q]*(Q|[w[]c).*|^.[C|]$
I suggest using adrian.c
as input, and storing the output in adwc.c
:
./adrian < adrian.c > adwc.c
Compiling the new file, adwc.c
, yields a clone of the unix wc
command. It
runs on one file, defaulting to "adrian.c"
if no file is given (again, this
was changed to be __FILE__
) and displays the number of lines, words, and bytes
in the input file.
Another possibly interesting automaton can be created by slightly
adjusting the adrian.c
file. Change the first line to read
/* . echo| . */
and repeat the process above
./adrian <adrian.c > adecho.c
The new file now contains all lines which match
^.[^5|m^]*[m^]\([e=p,;]|[^e=+p,;].*\)$
Compile and run. This is an echo(1)
clone. Note the efficient algorithm
employed.
Two other adjustments to the first line also yield useful results. By changing it to
/* . head; . */
you can search for matches to
^.[^W]*W..*$
By some freak happenstance, lines of adrian.c
which match this regular
expression form a unix head(1)
command. It prints the first ten lines of
the file specified on the command line or adrian.c
if no file is
specified (again this was changed to __FILE__
).
By setting the first line to
/* . basename . */
a clone of the unix basename(1)
command can be unearthed. The automaton will
search for
^.[^j]*jr.*$
on standard input. And the program which results by running adrian.c
through this filter requires two parameters. The first is meant to be a
filename, and the second, an extension. All leading pathname components
are removed from the filename, and the extension is removed if present.
The resulting base name is printed to stdout.
Lastly, by setting the first line to
/* . sleep . */
you can search for
^.[^(~][^s]*sl.*$
Filtering adrian.c
through this search yields a clone of the sleep(1)
command. Invoke with a single integer parameter, and it will pause for that
many seconds.
If either adbasename
or adsleep
is invoked with too few parameters,
the program will print the error message:
Segmentation fault (core dumped)
(The exact text of the above error messages varies from machine to
machine.) The four programs which read from stdin
require lines
shorter than 999 characters.
The other info files are adrian.grep.[1-6]
which contain the six
examples that appear above, and from, which is used to emulate the
unix from
command. For reasons of clarity, the name "from"
should
probably not be changed if possible. I wouldn’t want to be accused of
confusing people by giving the input files weird names.
If you want to change the default input filename (line 80) you must be
careful to choose a name that doesn’t match the wrong search patterns,
introducing extra lines into one of the programs (again, this was changed to be
__FILE__
).
The program will produce at least one warning and possible several when compiled depending on the compiler.
Inventory for 1992/adrian
Primary files
- adrian.c - entry source code
- Makefile - entry Makefile
- adrian.orig.c - original source code
- adrian.grep.1 - agrep script
- adrian.grep.2 - agrep script
- adrian.grep.3 - agrep script
- adrian.grep.4 - agrep script
- adrian.grep.5 - agrep script
- adrian.grep.from - agrep script
- adrian.grep.try - agrep script
- try.sh - script to try entry
Secondary files
- 1992_adrian.tar.bz2 - download entry tarball
- README.md - markdown source for this web page
- .entry.json - entry summary and manifest in JSON
- .gitignore - list of files that should not be committed under git
- .path - directory path from top level directory
- index.html - this web page