1998Q1/
<!-- MHonArc v2.4.4 -->
<!--X-Subject: Tutorial: Let's build a Compiler! &#45; Part X: Introducing TINY -->
<!--X-From-R13: "Xba O. Znzoreg" <wyflfvapNvk.argpbz.pbz> -->
<!--X-Date: Sat, 28 Feb 1998 21:43:27 +0000 -->
<!--X-Message-Id: 000801bd4492$42f7ec20$883bd8ce@default -->
<!--X-Content-Type: text/plain -->
<!--X-Head-End-->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<html>
<head>
<title>MUD-Dev message, Tutorial: Let's build a Compiler! - Part X: Introducing TINY</title>
<!-- meta name="robots" content="noindex,nofollow" -->
<link rev="made" href="mailto:jlsysinc#ix,netcom.com">
</head>
<body background="/backgrounds/paperback.gif" bgcolor="#ffffff"
      text="#000000" link="#0000FF" alink="#FF0000" vlink="#006000">

  <font size="+4" color="#804040">
    <strong><em>MUD-Dev<br>mailing list archive</em></strong>
  </font>
      
<br>
[&nbsp;<a href="../">Other Periods</a>
&nbsp;|&nbsp;<a href="../../">Other mailing lists</a>
&nbsp;|&nbsp;<a href="/search.php3">Search</a>
&nbsp;]
<br clear=all><hr>
<!--X-Body-Begin-->
<!--X-User-Header-->
<!--X-User-Header-End-->
<!--X-TopPNI-->

Date:&nbsp;
[&nbsp;<a href="msg00648.html">Previous</a>
&nbsp;|&nbsp;<a href="msg00650.html">Next</a>
&nbsp;]
&nbsp;&nbsp;&nbsp;&nbsp;
Thread:&nbsp;
[&nbsp;<a href="msg00802.html">Previous</a>
&nbsp;|&nbsp;<a href="msg00648.html">Next</a>
&nbsp;]
&nbsp;&nbsp;&nbsp;&nbsp;
Index:&nbsp;
[&nbsp;<A HREF="author.html#00649">Author</A>
&nbsp;|&nbsp;<A HREF="#00649">Date</A>
&nbsp;|&nbsp;<A HREF="thread.html#00649">Thread</A>
&nbsp;]

<!--X-TopPNI-End-->
<!--X-MsgBody-->
<!--X-Subject-Header-Begin-->
<H1>Tutorial: Let's build a Compiler! - Part X: Introducing TINY</H1>
<HR>
<!--X-Subject-Header-End-->
<!--X-Head-of-Message-->
<UL>
<LI><em>To</em>: &lt;<A HREF="mailto:mud-dev#null,net">mud-dev#null,net</A>&gt;</LI>
<LI><em>Subject</em>: Tutorial: Let's build a Compiler! - Part X: Introducing TINY</LI>
<LI><em>From</em>: "Jon A. Lambert" &lt;<A HREF="mailto:jlsysinc#ix,netcom.com">jlsysinc#ix,netcom.com</A>&gt;</LI>
<LI><em>Date</em>: Sat, 28 Feb 1998 16:45:26 -0500</LI>
</UL>
<!--X-Head-of-Message-End-->
<!--X-Head-Body-Sep-Begin-->
<HR>
<!--X-Head-Body-Sep-End-->
<!--X-Body-of-Message-->
<PRE>

                     LET'S BUILD A COMPILER!

                                By

                     Jack W. Crenshaw, Ph.D.

                           21 May 1989


                    Part X: INTRODUCING "TINY"


*****************************************************************
*                                                               *
*                        COPYRIGHT NOTICE                       *
*                                                               *
*   Copyright (C) 1989 Jack W. Crenshaw. All rights reserved.   *
*                                                               *
*****************************************************************


INTRODUCTION

In the last installment, I showed you the general  idea  for  the
top-down development of  a  compiler.    I gave you the first few
steps  of  the process for compilers for  Pascal  and  C,  but  I
stopped  far  short  of  pushing  it through to completion.   The
reason was simple: if we're going to produce  a  real, functional
compiler  for  any  language, I'd rather  do  it  for  KISS,  the
language that I've been defining in this tutorial series.

In this installment, we're going to do just that, for a subset of
KISS which I've chosen to call TINY.

The process  will be essentially that outlined in Installment IX,
except  for  one  notable  difference.   In that  installment,  I
suggested  that  you  begin  with  a full BNF description of  the
language.  That's fine for something like Pascal or C,  for which
the language definition is  firm.   In the case of TINY, however,
we don't yet have a full  description  ... we seem to be defining
the language as we go.  That's OK.    In  fact,  it's preferable,
since we can tailor the language  slightly  as we go, to keep the
parsing easy.

So in the development  that  follows,  we'll  actually be doing a
top-down development of BOTH the  language and its compiler.  The
BNF description will grow along with the compiler.

In this process, there will be a number of decisions to  be made,
each of which will influence the BNF and therefore the  nature of
the language.   At  each  decision  point I'll try to remember to
explain  the  decision  and the rationale behind my choice.  That
way, if you happen to hold a different opinion and would prefer a
different option, you can choose it instead.  You  now  have  the
background  to  do  that.  I guess the important thing to note is
that  nothing  we  do  here  is  cast  in  concrete.  When YOU'RE
designing YOUR language, you should feel free to do it YOUR way.

Many of you may be asking at this point: Why bother starting over
from  scratch?  We had a working subset of KISS as the outcome of
Installment  VII  (lexical  scanning).  Why not just extend it as
needed?  The  answer  is  threefold.    First of all, I have been
making  a  number  of changes to further simplify the program ...
changes  like  encapsulating  the  code generation procedures, so
that  we  can  convert to a different target machine more easily.
Second, I want you to see how the development can indeed  be done
from the top down as outlined in the last installment.   Finally,
we both need the practice.  Each time I go through this exercise,
I get a little better at it, and you will, also.


GETTING STARTED

Many  years  ago  there were languages called  Tiny  BASIC,  Tiny
Pascal, and Tiny C, each of which was a subset of its parent full
language.  Tiny BASIC,  for  example,  had  only single-character
variable names and global variables.   It supported only a single
data type.  Sound familiar?  At this point we have almost all the
tools we need to build a compiler like that.

Yet a language called Tiny-anything  still  carries  some baggage
inherited from its parent language.   I've often wondered if this
is a  good  idea.    Granted,  a  language based upon some parent
language will have the  advantage  of  familiarity, but there may
also  be  some  peculiar syntax carried over from the parent that
may tend  to add unnecessary complexity to the compiler. (Nowhere
is this more true than in Small C.)

I've wondered just how small and simple a compiler could  be made
and  still  be  useful, if it were designed from the outset to be
both easy to use and to  parse.    Let's find out.  This language
will just be called "TINY," period.  It's a subset of KISS, which
I  also  haven't  fully  defined,  so  that  at  least  makes  us
consistent (!).  I suppose you could call it TINY KISS.  But that
opens  up a whole can of worms involving  cuter  and  cuter  (and
perhaps more risque) names, so let's just stick with TINY.

The main limitations  of  TINY  will  be because of the things we
haven't yet covered, such as data types.  Like its cousins Tiny C
and Tiny BASIC,  TINY  will  have  only one data type, the 16-bit
integer.    The  first  version  we  develop  will also  have  no
procedure  calls  and  will  use single-character variable names,
although as you will see we can remove these restrictions without
much effort.

The language I have in mind will share some of the  good features
of  Pascal,  C,  and Ada.  Taking a lesson from the comparison of
the Pascal and  C  compilers in the previous installment, though,
TINY will have a decided Pascal flavor.  Wherever  feasible,    a
language structure will  be  bracketed by keywords or symbols, so
that  the parser will know where it's  going  without  having  to
guess.

One other ground rule:  As we go, I'd like  to  keep the compiler
producing real, executable code.  Even though it may not  DO much
at the beginning, it will at least do it correctly.

Finally,  I'll  use  a couple of Pascal  restrictions  that  make
sense:  All data and procedures must be declared before  they are
used.  That makes good sense,  even  though for now the only data
type we'll use  is a word.  This rule in turn means that the only
reasonable place to put the  executable code for the main program
is at the end of the listing.

The top-level definition will be similar to Pascal:


     &lt;program&gt; ::= PROGRAM &lt;top-level decl&gt; &lt;main&gt; '.'


Already, we've reached a decision point.  My first thought was to
make the main block optional.   It  doesn't seem to make sense to
write a "program" with no main program, but it does make sense if
we're  allowing  for  multiple modules, linked together.    As  a
matter of fact,  I intend to allow for this in KISS.  But then we
begin  to open up a can of worms that I'd rather leave closed for
now.  For example, the  term "PROGRAM" really becomes a misnomer.
The MODULE of Modula-2 or the Unit of Turbo Pascal would  be more
appropriate.  Second,  what  about  scope  rules?    We'd  need a
convention for  dealing  with  name  visibility  across  modules.
Better  for  now  to  just  keep  it  simple  and ignore the idea
altogether.

There's also a decision in choosing to require  the  main program
to  be  last.    I  toyed  with  the idea of making its  position
optional,  as  in  C.  The nature of SK*DOS, the OS I'm compiling
for, make this very easy to do.   But  this  doesn't  really make
much sense in view of the Pascal-like requirement  that  all data
and procedures  be declared before they're referenced.  Since the
main  program can only call procedures  that  have  already  been
declared, the only position that makes sense is at the end,  a la
Pascal.

Given  the  BNF  above, let's write a parser that just recognizes
the brackets:


{--------------------------------------------------------------}
{  Parse and Translate a Program }

procedure Prog;
begin
   Match('p');
   Header;
   Prolog;
   Match('.');
   Epilog;
end;
{--------------------------------------------------------------}


The procedure Header just emits  the startup code required by the
assembler:
                              

{--------------------------------------------------------------}
{ Write Header Info }

procedure Header;
begin
   WriteLn('WARMST', TAB, 'EQU $A01E');
end;
{--------------------------------------------------------------}


The procedures Prolog and  Epilog  emit  the code for identifying
the main program, and for returning to the OS:


{--------------------------------------------------------------}
{ Write the Prolog }

procedure Prolog;
begin
   PostLabel('MAIN');
end;


{--------------------------------------------------------------}
{ Write the Epilog }

procedure Epilog;
begin
   EmitLn('DC WARMST');
   EmitLn('END MAIN');
end;
{--------------------------------------------------------------}


The  main program just calls Prog, and then  looks  for  a  clean
ending:


{--------------------------------------------------------------}
{ Main Program }

begin
   Init;
   Prog;
   if Look &lt;&gt; CR then Abort('Unexpected data after ''.''');
end.
{--------------------------------------------------------------}


At this point, TINY  will  accept  only  one input "program," the
null program:


     PROGRAM .   (or 'p.' in our shorthand.)

Note, though, that the  compiler  DOES  generate correct code for
this program.  It will run, and do  what  you'd  expect  the null
program to do, that is, nothing but return gracefully to the OS.

As a matter of interest, one of my  favorite  compiler benchmarks
is to compile, link,  and  execute  the  null program in whatever
language   is   involved.     You  can  learn  a  lot  about  the
implementation by measuring  the  overhead  in  time  required to
compile what should be a trivial case.  It's also  interesting to
measure the amount of code produced.  In many compilers, the code
can be fairly large, because they always include  the  whole run-
time  library whether they need it or not.    Early  versions  of
Turbo Pascal produced a 12K object file for  this  case.    VAX C
generates 50K!

The  smallest  null  programs  I've  seen are those  produced  by
Modula-2 compilers, and they run about 200-800 bytes.

In the case of TINY, we HAVE no run-time library  as  yet, so the
object code is indeed tiny:  two  bytes.    That's  got  to  be a
record, and it's  likely  to  remain  one since it is the minimum
size required by the OS.

The  next step is to process the code for the main program.  I'll
use the Pascal BEGIN-block:


     &lt;main&gt; ::= BEGIN &lt;block&gt; END


Here,  again,  we  have made a decision.  We could have chosen to
require a "PROCEDURE MAIN" sort of declaration, similar to C.   I
must  admit  that  this  is  not  a bad idea at all ...  I  don't
particularly  like  the  Pascal  approach  since I tend  to  have
trouble locating the main  program  in a Pascal listing.  But the
alternative is a little awkward, too, since you have to deal with
the  error condition where the user omits  the  main  program  or
misspells its name.  Here I'm taking the easy way out.

Another solution to the "where is the main program" problem might
be to require a name for  the  program, and then bracket the main
by


     BEGIN &lt;name&gt;
     END &lt;name&gt;


similar to the convention of  Modula  2.    This  adds  a  bit of
"syntactic sugar" to the language.  Things like this are  easy to
add or change to your liking, if the language is your own design.

To parse this definition of a main block,  change  procedure Prog
to read:

{--------------------------------------------------------------}
{  Parse and Translate a Program }

procedure Prog;
begin
   Match('p');
   Header;
   Main;
   Match('.');
end;
{--------------------------------------------------------------}


and add the new procedure:


{--------------------------------------------------------------}
{ Parse and Translate a Main Program }

procedure Main;
begin
   Match('b');
   Prolog;
   Match('e');
   Epilog;
end;
{--------------------------------------------------------------}


Now, the only legal program is:


     PROGRAM BEGIN END . (or 'pbe.')


Aren't we making progress???  Well, as usual it gets better.  You
might try some deliberate errors here, like omitting  the  'b' or
the 'e', and see what happens.  As always,  the  compiler  should
flag all illegal inputs.


DECLARATIONS

The obvious next step is to decide what we mean by a declaration.
My  intent  here  is to have two kinds of declarations: variables
and  procedures/functions.    At  the  top  level,   only  global
declarations are allowed, just as in C.

For now, there  can  only be variable declarations, identified by
the keyword VAR (abbreviated 'v'):


     &lt;top-level decls&gt; ::= ( &lt;data declaration&gt; )*

     &lt;data declaration&gt; ::= VAR &lt;var-list&gt;


Note that since there is only one variable type, there is no need
to  declare the type.  Later on, for full KISS, we can easily add
a type description.

The procedure Prog becomes:


{--------------------------------------------------------------}
{  Parse and Translate a Program }

procedure Prog;
begin
   Match('p');
   Header;
   TopDecls;
   Main;
   Match('.');
end;
{--------------------------------------------------------------}


Now, add the two new procedures:


{--------------------------------------------------------------}
{ Process a Data Declaration }

procedure Decl;
begin
   Match('v');
   GetChar;
end;


{--------------------------------------------------------------}
{ Parse and Translate Global Declarations }

procedure TopDecls;
begin
   while Look &lt;&gt; 'b' do
      case Look of
        'v': Decl;
      else Abort('Unrecognized Keyword ''' + Look + '''');
      end;
end;
{--------------------------------------------------------------}


Note that at this point, Decl  is  just  a stub.  It generates no
code, and it doesn't process a list ... every variable must occur
in a separate VAR statement.

OK,  now  we  can have any  number  of  data  declarations,  each
starting with a 'v' for VAR,  before  the BEGIN-block.  Try a few
cases and see what happens.


DECLARATIONS AND SYMBOLS

That looks pretty good, but  we're still only generating the null
program  for  output.    A  real compiler would  issue  assembler
directives to allocate storage for  the  variables.    It's about
time we actually produced some code.

With  a  little  extra  code,  that's  an  easy  thing to do from
procedure Decl.  Modify it as follows:


{--------------------------------------------------------------}
{ Parse and Translate a Data Declaration }

procedure Decl;
var Name: char;
begin
   Match('v');
   Alloc(GetName);
end;
{--------------------------------------------------------------}


The procedure Alloc just  issues  a  command  to the assembler to
allocate storage:


{--------------------------------------------------------------}
{ Allocate Storage for a Variable }

procedure Alloc(N: char);
begin
   WriteLn(N, ':', TAB, 'DC 0');
end;
{--------------------------------------------------------------}


Give  this  one  a  whirl.    Try  an  input  that declares  some
variables, such as:

     pvxvyvzbe.

See how the storage is allocated?    Simple, huh?  Note also that
the entry point, "MAIN," comes out in the right place.

For the record, a "real" compiler would also have a  symbol table
to record the variables being used.  Normally,  the  symbol table
is necessary to record the type  of  each variable.  But since in
this case  all  variables  have  the  same  type, we don't need a
symbol  table  for  that reason.  As it turns out, we're going to
find a symbol  necessary  even without different types, but let's
postpone that need until it arises.

Of course, we haven't really parsed the correct syntax for a data
declaration, since it involves a variable list.  Our version only
permits a single variable.  That's easy to fix, too.

The BNF for &lt;var-list&gt; is


     &lt;var-list&gt; ::= &lt;ident&gt; (, &lt;ident&gt;)*


Adding this syntax to Decl gives this new version:


{--------------------------------------------------------------}
{ Parse and Translate a Data Declaration }

procedure Decl;
var Name: char;
begin
   Match('v');
   Alloc(GetName);
   while Look = ',' do begin
      GetChar;
      Alloc(GetName);
   end;
end;
{--------------------------------------------------------------}


OK, now compile this code and give it  a  try.    Try a number of
lines of VAR declarations, try a list of several variables on one
line, and try combinations of the two.  Does it work?


INITIALIZERS

As long as we're dealing with data declarations, one thing that's
always  bothered  me  about  Pascal  is  that  it  doesn't  allow
initializing  data items in the declaration.    That  feature  is
admittedly sort of a frill, and it  may  be  out  of  place  in a
language that purports to  be  a minimal language.  But it's also
SO easy to add that it seems a shame not  to  do  so.    The  BNF
becomes:


     &lt;var-list&gt; ::= &lt;var&gt; ( &lt;var&gt; )*

     &lt;var&gt; ::= &lt;ident&gt; [ = &lt;integer&gt; ]

Change Alloc as follows:


{--------------------------------------------------------------}
{ Allocate Storage for a Variable }

procedure Alloc(N: char);
begin
   Write(N, ':', TAB, 'DC ');
   if Look = '=' then begin
      Match('=');
      WriteLn(GetNum);
      end
   else
      WriteLn('0');
end;
{--------------------------------------------------------------}


There you are: an initializer with six added lines of Pascal.

OK, try this  version  of  TINY  and verify that you can, indeed,
give the variables initial values.

By golly, this thing is starting to look  real!    Of  course, it
still doesn't DO anything, but it looks good, doesn't it?

Before leaving this section, I should point out  that  we've used
two versions of function GetNum.  One, the earlier one, returns a
character value, a single digit.  The other accepts a multi-digit
integer and returns an integer value.  Either one will work here,
since WriteLn will handle either type.  But there's no  reason to
limit ourselves  to  single-digit  values  here,  so  the correct
version to use is the one that returns an integer.  Here it is:


{--------------------------------------------------------------}
{ Get a Number }

function GetNum: integer;
var Val: integer;
begin
   Val := 0;
   if not IsDigit(Look) then Expected('Integer');
   while IsDigit(Look) do begin
      Val := 10 * Val + Ord(Look) - Ord('0');
      GetChar;
   end;
   GetNum := Val;
end;
{--------------------------------------------------------------}

As a matter  of  fact,  strictly  speaking  we  should  allow for
expressions in the data field of the initializer, or at  the very
least  for  negative  values.  For  now,  let's  just  allow  for
negative values by changing the code for Alloc as follows:


{--------------------------------------------------------------}
{ Allocate Storage for a Variable }

procedure Alloc(N: char);
begin
   if InTable(N) then Abort('Duplicate Variable Name ' + N);
   ST[N] := 'v';
   Write(N, ':', TAB, 'DC ');
   if Look = '=' then begin
      Match('=');
      If Look = '-' then begin
         Write(Look);
         Match('-');
      end;
      WriteLn(GetNum);
      end
   else
      WriteLn('0');
end;
{--------------------------------------------------------------}


Now  you should be able to  initialize  variables  with  negative
and/or multi-digit values.


THE SYMBOL TABLE

There's one problem  with  the  compiler  as it stands so far: it
doesn't do anything to record a variable when we declare it.   So
the compiler is perfectly content to allocate storage for several
variables with the same name.  You can easily verify this with an
input like


     pvavavabe.


Here we've declared the variable A three times.  As you  can see,
the compiler will  cheerfully  accept  that,  and  generate three
identical labels.  Not good.

Later on,  when we start referencing variables, the compiler will
also let us reference variables  that don't exist.  The assembler
will  catch  both  of these error conditions, but it doesn't seem
friendly at all to pass such errors along to the assembler.   The
compiler should catch such things at the source language level.

So even though we don't need a symbol table to record data types,
we ought to install  one  just to check for these two conditions.
Since at this  point  we are still restricted to single-character
variable names, the symbol table can be trivial.  To  provide for
it, first add the following  declaration at the beginning of your
program:


     var ST: array['A'..'Z'] of char;


and insert the following function:


{--------------------------------------------------------------}
{ Look for Symbol in Table }

function InTable(n: char): Boolean;
begin
   InTable := ST[n] &lt;&gt; ' ';
end;
{--------------------------------------------------------------}


We  also  need  to initialize the  table  to  all  blanks.    The
following lines in Init will do the job:


var i: char;
begin
   for i := 'A' to 'Z' do
      ST[i] := ' ';
   ...


Finally,  insert  the  following two lines at  the  beginning  of
Alloc:


   if InTable(N) then Abort('Duplicate Variable Name ' + N);
   ST[N] := 'v';


That  should  do  it.  The  compiler  will  now  catch  duplicate
declarations.  Later, we can  also  use  InTable  when generating
references to the variables.


EXECUTABLE STATEMENTS

At this point, we can generate a null program that has  some data
variables  declared  and  possibly initialized.  But  so  far  we
haven't arranged to generate the first line of executable code.

Believe  it or not, though, we almost  have  a  usable  language!
What's missing is the executable code that must go into  the main
program.  But that code is just assignment statements and control
statements ... all stuff we have done before.   So  it  shouldn't
take us long to provide for them, as well.

The BNF definition given earlier  for the main program included a
statement block, which we have so far ignored:


     &lt;main&gt; ::= BEGIN &lt;block&gt; END


For now,  we  can  just  consider  a  block  to  be  a  series of
assignment statements:


     &lt;block&gt; ::= (Assignment)*


Let's start things off by adding  a  parser for the block.  We'll
begin with a stub for the assignment statement:


{--------------------------------------------------------------}
{ Parse and Translate an Assignment Statement }

procedure Assignment;
begin
   GetChar;
end;


{--------------------------------------------------------------}
{ Parse and Translate a Block of Statements }

procedure Block;
begin
   while Look &lt;&gt; 'e' do
      Assignment;
end;
{--------------------------------------------------------------}


Modify procedure Main to call Block as shown below:


{--------------------------------------------------------------}
{ Parse and Translate a Main Program }

procedure Main;
begin
   Match('b');
   Prolog;
   Block;
   Match('e');
   Epilog;
end;
{--------------------------------------------------------------}


This version still won't generate any code for  the   "assignment
statements" ... all it does is to eat characters  until  it  sees
the 'e' for 'END.'  But it sets the stage for what is to follow.

The  next  step,  of  course,  is  to  flesh out the code for  an
assignment statement.  This  is  something  we've done many times
before,  so  I  won't belabor it.  This time, though, I'd like to
deal with the code generation a little differently.  Up till now,
we've always just inserted the Emits that generate output code in
line with  the parsing routines.  A little unstructured, perhaps,
but it seemed the most straightforward approach, and made it easy
to see what kind of code would be emitted for each construct.

However, I realize that most of you are using an  80x86 computer,
so  the 68000 code generated is of little use to you.  Several of
you have asked me if the CPU-dependent code couldn't be collected
into one spot  where  it  would  be easier to retarget to another
CPU.  The answer, of course, is yes.

To  accomplish  this,  insert  the  following  "code  generation"
routines:


{---------------------------------------------------------------}
{ Clear the Primary Register }

procedure Clear;
begin
   EmitLn('CLR D0');
end;


{---------------------------------------------------------------}
{ Negate the Primary Register }

procedure Negate;
begin
   EmitLn('NEG D0');
end;


{---------------------------------------------------------------}
{ Load a Constant Value to Primary Register }

procedure LoadConst(n: integer);
begin
   Emit('MOVE #');
   WriteLn(n, ',D0');
end;


{---------------------------------------------------------------}
{ Load a Variable to Primary Register }

procedure LoadVar(Name: char);
begin
   if not InTable(Name) then Undefined(Name);
   EmitLn('MOVE ' + Name + '(PC),D0');
end;


{---------------------------------------------------------------}
{ Push Primary onto Stack }

procedure Push;
begin
   EmitLn('MOVE D0,-(SP)');
end;


{---------------------------------------------------------------}
{ Add Top of Stack to Primary }

procedure PopAdd;
begin
   EmitLn('ADD (SP)+,D0');
end;


{---------------------------------------------------------------}
{ Subtract Primary from Top of Stack }

procedure PopSub;
begin
   EmitLn('SUB (SP)+,D0');
   EmitLn('NEG D0');
end;


{---------------------------------------------------------------}
{ Multiply Top of Stack by Primary }

procedure PopMul;
begin
   EmitLn('MULS (SP)+,D0');
end;


{---------------------------------------------------------------}
{ Divide Top of Stack by Primary }

procedure PopDiv;
begin
   EmitLn('MOVE (SP)+,D7');
   EmitLn('EXT.L D7');
   EmitLn('DIVS D0,D7');
   EmitLn('MOVE D7,D0');
end;


{---------------------------------------------------------------}
{ Store Primary to Variable }

procedure Store(Name: char);
begin
   if not InTable(Name) then Undefined(Name);
   EmitLn('LEA ' + Name + '(PC),A0');
   EmitLn('MOVE D0,(A0)')
end;
{---------------------------------------------------------------}


The  nice  part  of  this  approach,  of  course,  is that we can
retarget  the compiler to a new CPU  simply  by  rewriting  these
"code generator" procedures.  In  addition,  we  will  find later
that we can improve the code quality by tweaking these routines a
bit, without having to modify the compiler proper.

Note that both LoadVar  and  Store check the symbol table to make
sure that the variable is defined.  The  error  handler Undefined
simply calls Abort:


{--------------------------------------------------------------}
{ Report an Undefined Identifier }

procedure Undefined(n: string);
begin
   Abort('Undefined Identifier ' + n);
end;
{--------------------------------------------------------------}


OK, we are now finally ready to begin processing executable code.
We'll  do  that  by  replacing  the  stub  version  of  procedure
Assignment.

We've been down this  road  many times before, so this should all
be familiar to you.    In fact, except for the changes associated
with the code generation, we  could just copy the procedures from
Part  VII.    Since we are making some changes, I won't just copy
them, but we will go a little faster than usual.

The BNF for the assignment statement is:

     &lt;assignment&gt; ::= &lt;ident&gt; = &lt;expression&gt;

     &lt;expression&gt; ::= &lt;first term&gt; ( &lt;addop&gt; &lt;term&gt; )*

     &lt;first term&gt; ::= &lt;first factor&gt; &lt;rest&gt;

     &lt;term&gt; ::= &lt;factor&gt; &lt;rest&gt;

     &lt;rest&gt; ::= ( &lt;mulop&gt; &lt;factor&gt; )*

     &lt;first factor&gt; ::= [ &lt;addop&gt; ] &lt;factor&gt;

     &lt;factor&gt; ::= &lt;var&gt; | &lt;number&gt; | ( &lt;expression&gt; )


This version of the BNF is  also  a bit different than we've used
before ... yet another "variation on the theme of an expression."
This particular version  has  what  I  consider  to  be  the best
treatment  of  the  unary minus.  As you'll see later, it lets us
handle   negative  constant  values  efficiently.    It's   worth
mentioning  here  that  we  have  often  seen  the advantages  of
"tweaking"  the  BNF  as we go, to help make the language easy to
parse.    What  you're looking at here is a bit different:  we've
tweaked  the  BNF  to make the CODE  GENERATION  more  efficient!
That's a first for this series.

Anyhow, the following code implements the BNF:


{---------------------------------------------------------------}
{ Parse and Translate a Math Factor }

procedure Expression; Forward;

procedure Factor;
begin
   if Look = '(' then begin
      Match('(');
      Expression;
      Match(')');
      end
   else if IsAlpha(Look) then
      LoadVar(GetName)
   else
      LoadConst(GetNum);
end;


{--------------------------------------------------------------}
{ Parse and Translate a Negative Factor }

procedure NegFactor;
begin
   Match('-');
   if IsDigit(Look) then
      LoadConst(-GetNum)
   else begin
      Factor;
      Negate;
   end;
end;


{--------------------------------------------------------------}
{ Parse and Translate a Leading Factor }

procedure FirstFactor;
begin
   case Look of
     '+': begin
             Match('+');
             Factor;
          end;
     '-': NegFactor;
   else  Factor;
   end;
end;


{--------------------------------------------------------------}
{ Recognize and Translate a Multiply }

procedure Multiply;
begin
   Match('*');
   Factor;
   PopMul;
end;


{-------------------------------------------------------------}
{ Recognize and Translate a Divide }

procedure Divide;
begin
   Match('/');
   Factor;
   PopDiv;
end;


{---------------------------------------------------------------}
{ Common Code Used by Term and FirstTerm }

procedure Term1;
begin
   while IsMulop(Look) do begin
      Push;
      case Look of
       '*': Multiply;
       '/': Divide;
      end;
   end;
end;


{---------------------------------------------------------------}
{ Parse and Translate a Math Term }

procedure Term;
begin
   Factor;
   Term1;
end;


{---------------------------------------------------------------}
{ Parse and Translate a Leading Term }

procedure FirstTerm;
begin
   FirstFactor;
   Term1;
end;


{--------------------------------------------------------------}
{ Recognize and Translate an Add }

procedure Add;
begin
   Match('+');
   Term;
   PopAdd;
end;


{-------------------------------------------------------------}
{ Recognize and Translate a Subtract }

procedure Subtract;
begin
   Match('-');
   Term;
   PopSub;
end;


{---------------------------------------------------------------}
{ Parse and Translate an Expression }

procedure Expression;
begin
   FirstTerm;
   while IsAddop(Look) do begin
      Push;
      case Look of
       '+': Add;
       '-': Subtract;
      end;
   end;
end;


{--------------------------------------------------------------}
{ Parse and Translate an Assignment Statement }

procedure Assignment;
var Name: char;
begin
   Name := GetName;
   Match('=');
   Expression;
   Store(Name);
end;
{--------------------------------------------------------------}


OK, if you've  got  all  this  code inserted, then compile it and
check  it out.  You should  be  seeing  reasonable-looking  code,
representing a complete program that will  assemble  and execute.
We have a compiler!


BOOLEANS

The next step should also  be  familiar  to  you.    We  must add
Boolean  expressions  and relational operations.    Again,  since
we've already dealt with them more than once,  I  won't elaborate
much on them, except  where  they  are  different from what we've
done before.  Again, we won't just copy from other  files because
I've changed a few things just a bit.  Most  of  the changes just
involve encapsulating the machine-dependent parts as  we  did for
the   arithmetic  operations.    I've  also  modified   procedure
NotFactor  somewhat,  to  parallel  the structure of FirstFactor.
Finally,  I  corrected  an  error  in  the  object code  for  the
relational operators:  The Scc instruction I used  only  sets the
low 8 bits of D0.  We want all 16 bits set for a logical true, so
I've added an instruction to sign-extend the low byte.

To begin, we're going to need some more recognizers:


{--------------------------------------------------------------}
{ Recognize a Boolean Orop }

function IsOrop(c: char): boolean;
begin
   IsOrop := c in ['|', '~'];
end;


{--------------------------------------------------------------}
{ Recognize a Relop }

function IsRelop(c: char): boolean;
begin
   IsRelop := c in ['=', '#', '&lt;', '&gt;'];
end;
{--------------------------------------------------------------}


Also, we're going to need some more code generation routines:


{---------------------------------------------------------------}
{ Complement the Primary Register }

procedure NotIt;
begin
   EmitLn('NOT D0');
end;
{---------------------------------------------------------------}

</PRE>

<!--X-Body-of-Message-End-->
<!--X-MsgBody-End-->
<!--X-Follow-Ups-->
<HR>
<!--X-Follow-Ups-End-->
<!--X-References-->
<!--X-References-End-->
<!--X-BotPNI-->
<UL>
<LI>Prev by Date:
<STRONG><A HREF="msg00648.html">Tutorial: Let's build a Compiler! - Part VII: Lexical Scanning</A></STRONG>
</LI>
<LI>Next by Date:
<STRONG><A HREF="msg00650.html">Re: [MUD-Dev] Net protocols for MUDing (was: Moore's Law sucks)</A></STRONG>
</LI>
<LI>Prev by thread:
<STRONG><A HREF="msg00802.html">Re: [MUD-Dev] Net protocols for MUDing (was: Moore's Law sucks)</A></STRONG>
</LI>
<LI>Next by thread:
<STRONG><A HREF="msg00648.html">Tutorial: Let's build a Compiler! - Part VII: Lexical Scanning</A></STRONG>
</LI>
<LI>Index(es):
<UL>
<LI><A HREF="index.html#00649"><STRONG>Date</STRONG></A></LI>
<LI><A HREF="thread.html#00649"><STRONG>Thread</STRONG></A></LI>
</UL>
</LI>
</UL>

<!--X-BotPNI-End-->
<!--X-User-Footer-->
<!--X-User-Footer-End-->
<ul><li>Thread context:
<BLOCKQUOTE><UL>
<LI><STRONG>Re: [MUD-Dev] Net protocols for MUDing (was: Moore's Law sucks)</STRONG>, <EM>(continued)</EM>
<ul compact>
<LI><strong><A NAME="00666" HREF="msg00666.html">Re: [MUD-Dev] Net protocols for MUDing (was: Moore's Law sucks)</A></strong>, 
Chris Gray <a href="mailto:cg#ami-cg,GraySage.Edmonton.AB.CA">cg#ami-cg,GraySage.Edmonton.AB.CA</a>, Mon 02 Mar 1998, 09:28 GMT
</LI>
<LI><strong><A NAME="00790" HREF="msg00790.html">Re: [MUD-Dev] Net protocols for MUDing (was: Moore's Law sucks)</A></strong>, 
Chris Gray <a href="mailto:cg#ami-cg,GraySage.Edmonton.AB.CA">cg#ami-cg,GraySage.Edmonton.AB.CA</a>, Fri 20 Mar 1998, 16:36 GMT
<UL>
<LI><strong><A NAME="00792" HREF="msg00792.html">Re: [MUD-Dev] Net protocols for MUDing (was: Moore's Law sucks)</A></strong>, 
Joel Dillon <a href="mailto:emily#cornholio,new.ox.ac.uk">emily#cornholio,new.ox.ac.uk</a>, Fri 20 Mar 1998, 17:05 GMT
</LI>
</UL>
</LI>
<LI><strong><A NAME="00802" HREF="msg00802.html">Re: [MUD-Dev] Net protocols for MUDing (was: Moore's Law sucks)</A></strong>, 
Chris Gray <a href="mailto:cg#ami-cg,GraySage.Edmonton.AB.CA">cg#ami-cg,GraySage.Edmonton.AB.CA</a>, Sat 21 Mar 1998, 02:29 GMT
</LI>
</ul>
</LI>
<LI><strong><A NAME="00649" HREF="msg00649.html">Tutorial: Let's build a Compiler! - Part X: Introducing TINY</A></strong>, 
Jon A. Lambert <a href="mailto:jlsysinc#ix,netcom.com">jlsysinc#ix,netcom.com</a>, Sat 28 Feb 1998, 21:43 GMT
<LI><strong><A NAME="00648" HREF="msg00648.html">Tutorial: Let's build a Compiler! - Part VII: Lexical Scanning</A></strong>, 
Jon A. Lambert <a href="mailto:jlsysinc#ix,netcom.com">jlsysinc#ix,netcom.com</a>, Sat 28 Feb 1998, 21:42 GMT
<LI><strong><A NAME="00647" HREF="msg00647.html">Tutorial: Let's build a Compiler! - Part IX: A Top View</A></strong>, 
Jon A. Lambert <a href="mailto:jlsysinc#ix,netcom.com">jlsysinc#ix,netcom.com</a>, Sat 28 Feb 1998, 21:42 GMT
<LI><strong><A NAME="00646" HREF="msg00646.html">Tutorial: Let's build a Compiler! - Part VIII: A Little Philosophy</A></strong>, 
Jon A. Lambert <a href="mailto:jlsysinc#ix,netcom.com">jlsysinc#ix,netcom.com</a>, Sat 28 Feb 1998, 21:42 GMT
<LI><strong><A NAME="00637" HREF="msg00637.html">Re: [MUD-Dev] Clients</A></strong>, 
Chris Gray <a href="mailto:cg#ami-cg,GraySage.Edmonton.AB.CA">cg#ami-cg,GraySage.Edmonton.AB.CA</a>, Sat 28 Feb 1998, 02:20 GMT
</LI>
</UL></BLOCKQUOTE>

</ul>
<hr>
<center>
[&nbsp;<a href="../">Other Periods</a>
&nbsp;|&nbsp;<a href="../../">Other mailing lists</a>
&nbsp;|&nbsp;<a href="/search.php3">Search</a>
&nbsp;]
</center>
<hr>
</body>
</html>