Author:
- Name: Sean Barrett
Location: US - United States of America (United States)
To build:
make all
There is an alternate version of this program. See Alternate code below for details.
To use:
First, you must make sure that first
is made first (even though make all
should make first first :-) ):
make first
Second:
# get help:
echo help | cat third help.th - | ./first
# try out a demo:
cat third demo5.th | ./first
Third:
cat third help.th - | ./first
NOTE: Wait until Ok
is printed and the type:
2 3 + . cr
Yes you should really type the 2 letters: cr
.
Fourth:
Sorry, this is third!
Try:
./try.sh
Alternate code:
This alternate version, buzzard.2.alt.c, does not contain a fix that the author notes in their remarks.
Alternate build:
make alt
Alternate use:
Use buzzard.2.alt
as you would buzzard.2
above.
Alternate try:
./try.alt.sh
Judges’ remarks:
Have fun!
Author’s remarks:
What it does
first
implements a relatively primitive stack machine. How
primitive? It supplies 13 visible primitives: 3 arithmetic,
1 comparison, 2 memory-access, 2 character I/O, 3 primitives
for defining new words, 1 tokenizing, and 1 special stack
operation. (There are also three internal operations for
the stack machine: ‘push this integer’, ‘call this code’,
and ‘compile a call to this code’.)
It is very difficult to accomplish anything with this set of primitives, but they do have an interesting property.
This–what this interesting property is, or in other words what first is good for–is the major obfuscation; there are also minor source obfuscations, as well as some design tricks that are effectively obfuscations. Details on the obfuscations are below, and the interesting property is discussed much further down.
How to run it
first
expects you to first enter the names of the 13 primitives,
separated by whitespace–it doesn’t care what you name them, but
if all the names aren’t unique, you won’t be able to use some of
them. After this you may type any sequence of valid first
input.
Valid first
input is defined as any sequence of whitespace-delimited
tokens which consist of primitives, new words you’ve defined, and
integers (as parsed by "%d"
). Invalid input behaves unpredictably,
but gives no warning messages. A sample program, demo1.1st
, is
included, but it only works on ASCII systems.
Do not expect to be able to do anything interesting with first
.
To do something interesting, you need to feed first
the file
third first. In unix, you can do
cat third help.th - | ./first
to do this. Hopefully most operating systems will provide a way to do this. It
may take some time for this to complete (I seem to remember it taking several
minutes on an 8086 PC); THIRD
will prompt you when it is finished. The file
third has not been obfuscated, due to sheer kindness on the author’s
part.
For more information on what you can do once you’ve piped third into
first, type help
and consult
FORTH manuals for
further reference. Six sample THIRD programs are included in the files
demo1.th
through demo6.th
. See buzzard.2.README.html
for more information.
Keep in mind that you are still running first
, and are for the most part
limited by first
’s tokenizer (notably, unknown words will attempt to be parsed
as integers.) It is possible to build a new parser that parses by hand, reading
a single character at a time; however, such a parser cannot easily use the
existing dictionary, and so would have to implement its own, thus requiring
reimplementing all of first
and third
a second time–I did not care to
tackle this project.
Compiling
first
is reasonably portable. You may need to adjust the size of the buffers
on smaller machines; m[]
needs to be at least 2000 long, though.
I say first
is portable mainly because it uses native types. Unlike
FORTH, which
traditionally allows byte and multi-byte operations, all operations are
performed on C int
s. That means first
code is only as portable as the same
code would be in C. As in C, the result of dividing -1
by 2
is machine (or
rather compiler) dependent.
How is first obfuscated?
first
is obfuscated in several ways. Some minor obfuscations like
&w[&m[1]][s]
for s+m[w+1]
were in the original source but are no longer
because, apparently, ANSI doesn’t allow it (gcc -ansi -pedantic
doesn’t mind
it, though.)
Other related obfuscations are still present. The top of the stack is cached in
a variable, which increases performance massively if the compiler can figure out
to keep it in a register; it also obfuscates the code. (Unfortunately, the top
of stack is a global variable and neither gcc nor most bundled compilers seem to
register
allocate it.)
More significant are the design obfuscations. m[0]
is the “dictionary
pointer”, used when compiling words, and m[1]
is the return stack index. Both
are used as integer offsets into m
. Both are kept in m
, instead of as
separate pointers, because they are then accessible to first
programs, which
is a crucial property of first
. Similarly the way words are stored in the
dictionary is not obvious, so it can be difficult to follow exactly what the
compiler words are doing.
Assuming you’ve waded through all that, you still have to penetrate the most
significant obfuscation. Traditionally, the question is whether a reader can
answer the question “what will this do when I run it”. A reader who has
deciphered first
to this point may think they know the answer to this
question, but they may not know the answer to the more important question, “what
will this program do when given the right input?”
FORTH aficionados,
and especially FORTH implementers, may recognize the similarity of the internal
compiler format to many FORTH internal representations, and, being aware that
FORTH interpreters can often be self-compiling, may be suspicious that this
program can compile FORTH, or a significant subset of it, or at least be capable
of doing so if fed the right input. Of course, the name “THIRD” should be a
dead giveaway, if the name “first” wasn’t. (These numbers were largely chosen
because they were five letters long, like “FORTH”, and would not require
truncation to five letters, which would be a dead giveaway. Besides, THIRD
represents a step backwards, in more ways than one.)
What exactly is first
, then?
first
is a tiny interpreter which implements a sufficient pseudo-subset of
FORTH to allow it
to bootstrap a relatively complete version of FORTH (based loosely on forth79),
which I call THIRD. Complete relative to what, I’m not sure.
I believe first
is close to the smallest amount of code possible to get this
effect using forth-style primitives, and still have some efficiency (it is
possible to get by without multiplication if you have addition, obviously). In
the design file, design, I give a justification for why each primitive in
first
was included.
THIRD is sorta slow, because first
has so few primitives that many things that
are primitives in
FORTH (like swap)
take a significant amount of time in THIRD.
When you get the Ok.
message from third
, try out some sample
FORTH code
(first
has no way of knowing if keyboard input is waiting, so it can’t actually prompt
you in a normal way. It only prints 'Ok.'
after you define a word).
2 3 + . cr
(which adds 2 and 3, and print the result and a newline.)
THIRD responds:
5
Now try:
: test 11 1 do i . loop cr ;
test
and THIRD responds
1 2 3 4 5 6 7 8 9 10
When in THIRD
, you can see how much space you’re currently
using by typing:
here .
The number THIRD replies with is the number of machine words (int
s) that the
dictionary (the first code) takes up, plus the 512 int
s for the return stack.
If you compile the basic THIRD system without the help word (strings take up one
int
per character in the string!), you should find that you’re using around
1000 int
s (plus the return stack).
Thus THIRD gives you a relatively complete
FORTH system in
less than 700 chars of C source + about 1000 int
s of memory–and it’s portable
too (you could copy over the THIRD memory dump to another machine, in theory).
If the above numbers seem to you to be mixing apples and oranges (C source and
compiled THIRD code), note that you should in theory be able to stick the
compiled THIRD code into the C source.
Software Construction Company gets credit for rekindling my interest in FORTH and thus indirectly inspiring me to write this program.
Inventory for 1992/buzzard.2
Primary files
- buzzard.2.c - entry source code
- Makefile - entry Makefile
- buzzard.2.alt.c - alternate source code
- buzzard.2.orig.c - original source code
- buzzard.2.README.html - author’s description of files
- buzzard.2.design.html - about the FORTH language
- try.alt.sh - script to try alternate code
- try.sh - script to try entry
- demo1.1st - demo FORTH source
- demo1.th - demo FORTH source
- demo2.th - demo FORTH source
- demo3.th - demo FORTH source
- demo4.th - demo FORTH source
- demo5.th - demo FORTH source
- demo6.th - demo FORTH source
- help.th - demo FORTH source
- third - the third source
Secondary files
- 1992_buzzard.2.tar.bz2 - download entry tarball
- README.md - markdown source for this web page
- buzzard.2.README.md - markdown source for buzzard.2.README.html
- buzzard.2.design.md - markdown source for buzzard.2.design.html
- .entry.json - entry summary and manifest in JSON
- .gitignore - list of files that should not be committed under git
- .path - directory path from top level directory
- index.html - this web page