Tracing
A trace of parser execution was useful, e.g., to judge the quality of error recovery:
error: (5) ']' is not allowed in "'(' argument */ ',' ')'"
expecting: ')'
4 (5) ']' reduce parentheses: '(' argum obj one,2
4 (5) ']' shift ']' obj null
4 (5) '.' shift '.' obj null
3 (6) '}' reduce brackets: '[' parenthe obj one,2Tracing is implemented using the Observer design pattern: A subject is informed about an observer and sends back messages as the state of the subject changes. The page.LL1 object plays the role of subject which calls an Observer function. The initial connection is either made when the parser is constructed or when one of the parsing methods is called.
For top-down parsing with either, page.LL1#parse or page.LL1#parser, there are four messages describing state changes:
ruleindicates that the parser starts to match a new rule.shiftindicates that a terminal symbol has been matched.reduceindicates that the parser is done matching a rule.error, finally, indicates that the parser could not match the current input symbol.
Each message is accompanied by additional information:
a number indicating a position in the grammar which could be a rule number,
the current input page.Tuple,
a page.BNF.Rule or page.BNF.T object as context from the grammar, or an error message string,
and a reference to the subject sending the message which could be queried for other information.
Tracing is most convenient if it can be added quickly at any time. Therefore, page has a factory method page.trace to create observers which will simply print the messages.
For example, the stack algorithm for the error recovery example can be traced as follows:
ll1 = new page.LL1(page.EBNF.from( Grammar.value ), page.trace());
input = ll1.tokenizer()( Input.value );
ll1.parse(input.concat([null]));If the trace of the recursive algorithm needs a new header, a fresh observer can be provided when the algorithm is invoked:
ll1.parser(input, page.trace(/reduce/));
The trace can be restricted to messages selected by a pattern argument to page.trace.
Collecting values
So far, input symbols are abandoned as soon as they are matched during parsing.
An observer can change that because it sees all matched tuples during shift messages:
collect = [];
function observer (_at, _tuple, _action, _info, _sender) {
if (_action == 'shift') collect.push(_tuple.value);
return collect;
}For example, the following EBNF grammar describes a sum as a sequence of numbers separated by plus or minus signs:
%token Number '[0-9]+';
sum: Number +/ ('+' | '-');The observer can be used to collect all input symbols:
ll1 = new page.LL1(page.EBNF.from( Grammar.value ), observer);
input = ll1.tokenizer()( '1 + 3 - 5 + 7' );
ll1.parser(input).join(' . ')
By convention, the parsing algorithm returns the last value which it received from the observer. Here this is
1 . + . 3 . - . 5 . + . 7 . These are the input symbols (pushed into the array collect above), joined with blanks and periods for display purposes.
The trailing period is caused by matching and collecting $eof.
page.trace accepts an observer as a first argument and observes its result:
ll1.parser(input, page.trace(observer))produces
AT TUPLE ACTION RETURN
0 (2) Number '1' rule $accept: sum $eof; [ ]
1 (2) Number '1' rule sum: Number +/ ('+' | ' [ ]
1 (2) Number '1' shift Number [ ] 1
1 (2) '+' shift '+' [ ] 1,+
1 (2) Number '3' shift Number [ ] 1,+,3
1 (2) '-' shift '-' [ ] 1,+,3,-
1 (2) Number '5' shift Number [ ] 1,+,3,-,5
1 (2) '+' shift '+' [ ] 1,+,3,-,5,+
1 (2) Number '7' shift Number [ ] 1,+,3,-,5,+,7
1 (0) $eof reduce sum: Number +/ ('+' | ' [ ] 1,+,3,-,5,+,7
1 (0) $eof shift $eof [ ] 1,+,3,-,5,+,7,
0 (0) $eof reduce $accept: sum $eof; [ ] 1,+,3,-,5,+,7,
1,+,3,-,5,+,7,The RETURN column shows how collect acquires one matched input symbol after another.
Collecting values turns out to be so useful for further processing that it is built into page as well. page.LL1#reflect returns an observer which collects those input strings which were matched by page.BNF.Token patterns:
ll1.parser(input, ll1.reflect());produces a flat array
1,3,5,7There is a significant difference between the two built-in observers: page.trace is a static function and produces a state-less observer, page.LL1#reflect is a method of a parser object because the observer has state — the collected values. Either observer is serially (but not concurrently) reusable.
Semantic actions
What can be done with the collected values?
The sum example hints that one could try to build a calculator.
However, by design choice page.LL1#reflect does not collect literal operators.
Here is a trace for a slightly different grammar:
%token Number '[0-9]+';
sum: Number (add | sub)*;
add: '+' Number;
sub: '-' Number;The reduce messages from parsing '1 + 3 - 5 + 7'
AT TUPLE ACTION RETURN
2 (2) '-' reduce add: '+' Number; [ ] 3
3 (2) '+' reduce sub: '-' Number; [ ] 5
2 (0) $eof reduce add: '+' Number; [ ] 7
1 (0) $eof reduce sum: Number (add | sub) [ ] 1,3,5,7
0 (0) $eof reduce $accept: sum $eof; [ ] 1,3,5,7
show how each but the first Number is collected in a reduce message for a rule which is specific to the desired arithmetic operation.
The first Number is collected in rule 1, near the end of the trace.
Note that the value collected for a token is a string from the input.
To avoid a lot of explicit conversions, the grammar is changed so that Number appears in only one rule:
sum: num (add | sub)*;
add: '+' num;
sub: '-' num;
num: Number;Given a collection of functions as an argument, page.LL1#reflect can call a Builder function corresponding to the name of the non-terminal on the left-hand side:
actions = {
num: function (values) { return Number(values[0]); },
add: function (values, env) { return env[0] += values[0]; },
sub: function (values, env) { return env[0] -= values[0]; },
sum: function (values, env) { return values[0] + env[0]; }
};If a Builder function with the name of a rules left-hand side can be found, it receives
the list of values collected for the rule,
an environment object originally passed to
reflect,and a reference to the parser which can be queried for the current input page.Tuple.
The function can perform arbitrary actions. It is known as a semantic action because it defines the meaning of the grammar rule.
The function result
is collected by reflect into the list for the encompassing rule; an array result is flattened.
The function num shown above converts the input string representing Number into a numerical value
and returns that to reflect for the outer rules. The grammar rule for num above was added
exactly to place the conversion into a single Builder function.
add and sub assume that env is an array, initialized with a single element of value zero,
and accumulate the running total in that array element.
Finally, sum combines the value of the first Number with the running total in the environment
and returns the result.
The environment is initially handed to reflect, together with the Builder function collection:
ll1.parser(input, page.trace( ll1.reflect([0], actions), /reduce/));
The reduce messages from the trace show how the numbers are collected until the total 6 is computed.
AT TUPLE ACTION RETURN
4 (2) '+' reduce num: Number; num 1
4 (2) '-' reduce num: Number; num 3
2 (2) '-' reduce add: '+' num; num 3
4 (2) '+' reduce num: Number; num 5
3 (2) '+' reduce sub: '-' num; num -2
4 (0) $eof reduce num: Number; num 7
2 (0) $eof reduce add: '+' num; num 5
1 (0) $eof reduce sum: num (add | sub)*; num 6
0 (0) $eof reduce $accept: sum $eof; [ ] 6
6The trace shows that all the return values passed through reflect are numerical values.
However, there is no semantic action for rule 0; therefore, the final value returned from reflect
is an array containing the last numerical value.
This example should suffice to explain why by design choice reflect does not collect literal input strings:
they are already known in the grammar. Only strings representing token patterns are interesting for further processing.
Experience shows that the grammar can always be (re-)written so that a Builder function knows exactly
what literals and how many token values it deals with.
Similarly, while a Builder function can return an array with zero or more values to reflect,
and while these values are flattened into the list for the next outer rule,
the next outer Builder function knows from its corresponding grammar rule that it receives values for a specific non-terminal
or EBNF repetition construct and should be able to deal even with a variable number of such values.
As an example, consider the fact that, technically, arithmetic is not quite performed from left to right with the Builder functions above. The following semantic actions eliminate this issue:
actions = {
num: function (values) { return Number(values[0]); },
sub: function (values) { return - values[0]; },
sum: function (values) {
return values.slice(1).reduce(
function (left, right) { return left + right; },
values[0]
);
}
};
print( ll1.parser(input, page.trace(ll1.reflect(null, actions), /reduce/)) )
This time the environment is not needed; therefore, null is sent to reflect initially.
The trace shows that the add rule relies on reflect to hand the value to the outer rule
and that the sub rule merely changes the sign of the numerical value.
AT TUPLE ACTION RETURN
4 (2) '+' reduce num: Number; num 1
4 (2) '-' reduce num: Number; num 3
2 (2) '-' reduce add: '+' num; [ ] 3
4 (2) '+' reduce num: Number; num 5
3 (2) '+' reduce sub: '-' num; num -5
4 (0) $eof reduce num: Number; num 7
2 (0) $eof reduce add: '+' num; [ ] 7
1 (0) $eof reduce sum: num (add | sub)*; num 6
0 (0) $eof reduce $accept: sum $eof; [ ] 6
6The actual computation takes place in the sum function which receives the first number
and any number of additional summands.
While right recursion is used to implement the iteration
in the grammar rule for sum, arithmetic is still performed from left to right,
courtesy of the JavaScript reduce method for Array objects.
Interpreting arithmetic expressions
The previous example evaluates strictly from left to right which is sufficient for addition mixed with subtraction. Full-scale arithmetic, however, requires operator precedence and parentheses. The grammar has to be extended:
%token Number '[0-9]+';
expr: prod (add | sub)*;
add: '+' prod;
sub: '-' prod;
prod: factor (mul | div)*;
mul: '*' factor;
div: '/' factor;
factor: prim power*;
power: '**' prim;
prim: num | '(' expr ')';
num: Number;Commonly, exponentiation is evaluated from right to left and takes precedence over the other binary operations which are evaluated from left to right. Multiplication and division take precedence over addition and subtraction. Before the addition in
2 + 3 * 4can happen, the rule for prod has to be reduced
and the observer created by page.LL1#reflect will execute the corresponding function
which can perform the multiplication prior to the addition.
Here are the necessary Builder functions for a full-scale interpreter:
actions = {
expr: function (values) {
return values.slice(1).reduce(
function (left, right) { return left + right; }, values[0]);
},
sub: function (values) { return - values[0]; },
prod: function (values) {
return values.slice(1).reduce(
function (left, right) { return left * right; }, values[0]);
},
div: function (values) { return 1 / values[0]; },
factor: function (values) {
return values.reduceRight(
function (right, left) { return Math.pow(left, right); }, values.pop());
},
num: function (values) { return Number(values[0]); }
};
The functions for expr and prod perform left-to-right evaluation.
It is debatable if creating an inverse in the function for div for later multiplication in prod
is numerically very sound, but at least JavaScript performs floating point arithmetic
and will not crash if a division by zero is attempted.
Alternatively, the observer could throw an error message to abort interpretation.
Right associativity is implemented using the JavaScript reduceRight method
for Array objects
which, however, presents the operands in reverse order for exponentiation with Math.pow().
The trace shows how the computation of
1 + 2*3 - 45/(1 + 2 ** 3)progresses:
AT TUPLE ACTION RETURN
10 (2) '+' reduce num: Number; num 1
9 (2) '+' reduce prim: num | '(' expr ') [ ] 1
7 (2) '+' reduce factor: prim power*; num 1
4 (2) '+' reduce prod: factor (mul | div num 1
10 (2) '*' reduce num: Number; num 2
9 (2) '*' reduce prim: num | '(' expr ') [ ] 2
7 (2) '*' reduce factor: prim power*; num 2
10 (2) '-' reduce num: Number; num 3
9 (2) '-' reduce prim: num | '(' expr ') [ ] 3
7 (2) '-' reduce factor: prim power*; num 3
5 (2) '-' reduce mul: '*' factor; [ ] 3
4 (2) '-' reduce prod: factor (mul | div num 6
2 (2) '-' reduce add: '+' prod; [ ] 6
10 (2) '/' reduce num: Number; num 45
9 (2) '/' reduce prim: num | '(' expr ') [ ] 45
7 (2) '/' reduce factor: prim power*; num 45
10 (2) '+' reduce num: Number; num 1
9 (2) '+' reduce prim: num | '(' expr ') [ ] 1
7 (2) '+' reduce factor: prim power*; num 1
4 (2) '+' reduce prod: factor (mul | div num 1
10 (2) '**' reduce num: Number; num 2
9 (2) '**' reduce prim: num | '(' expr ') [ ] 2
10 (2) ')' reduce num: Number; num 3
9 (2) ')' reduce prim: num | '(' expr ') [ ] 3
8 (2) ')' reduce power: '**' prim; [ ] 3
7 (2) ')' reduce factor: prim power*; num 8
4 (2) ')' reduce prod: factor (mul | div num 8
2 (2) ')' reduce add: '+' prod; [ ] 8
1 (2) ')' reduce expr: prod (add | sub)* num 9
9 (0) $eof reduce prim: num | '(' expr ') [ ] 9
7 (0) $eof reduce factor: prim power*; num 9
6 (0) $eof reduce div: '/' factor; num 0.1111111111111111
4 (0) $eof reduce prod: factor (mul | div num 5
3 (0) $eof reduce sub: '-' prod; num -5
1 (0) $eof reduce expr: prod (add | sub)* num 2
0 (0) $eof reduce $accept: expr $eof; [ ] 2
2The trace shows that rule 4 — the product of 2 and 3 — is reduced long before rule 1 — the overall sum — which has lowest precedence
and is reduced just before rule 0 accepts the expression.
The effect of the parentheses and precedence of ** over + can be seen by the order in which
rule 7 — 2 to the third power, rule 1 — the sum of 1 and 2 ** 3, and rule 4 — the quotient — are reduced.
It should be noted that the same set of Builder functions always works for both, the recursive page.LL1#parser algorithm and the stack-based page.LL1#parse algorithm. The latter is strictly based on a BNF grammar, i.e., value collection works equally well for grammars described using BNF or extended BNF.
Representing arithmetic expressions
An arithmetic expression is often represented as a tree with the operations in the branch nodes and the numbers in the leaves. Lisp's prefix notation is a good way to display such a tree. The following Builder functions create Lisp notation from an arithmetic expression which is a sentence of the grammar shown before.
actions = {
expr: fromLeft,
add: function (values) { return { op: '+', val: values[0] }; },
sub: function (values) { return { op: '-', val: values[0] }; },
prod: fromLeft,
mul: function (values) { return { op: '*', val: values[0] }; },
div: function (values) { return { op: '/', val: values[0] }; },
factor: fromRight,
num: function (values) { return Number(values[0]); }
};
As before, num converts all matched Number tokens into JavaScript number values.
The semantic actions for the operations add, sub, mul, and div capture the
corresponding Lisp operator and its right argument in a collection.
The semantic actions at the two precedence levels of arithmetic, prod and expr
share a function fromLeft which creates Lisp notation to combine all operations
at the same precedence level, associating from left to right:
function fromLeft (values) {
return values.slice(1).reduce(
function (left, right) {
return '(' + right.op + ' ' + left + ' ' + right.val + ')';
}, values[0]);
}fromLeft uses JavaScripts reduce method for Array objects
to create one branch node at a time,
proceeding from left to right. The first collected value was not processed by one of the operators
but all others are collections. Therefore, fromLeft combines the first value and the value from the collection
right with the operator from the right.
function fromRight (values) {
return values.reduceRight(
function (right, left) {
return '(expt ' + left + ' ' + right + ')';
}, values.pop());
}The semantic action for exponentiation uses a function from Right which is based on JavaScripts
reduceRight method for Array objects
to create branch nodes proceeding from right to left.
from Right is much simpler and the default action — value collection — is sufficient for power
because there is only one operation at this precedence level.
Otherwise, if power were encoded like the other operations
power: function (values) { return { op: 'expt', val: values[0] }; },In this case fromRight has to treat the leftmost value as a special case and encode the intermediate terms
function fromRight (values) {
return values.reduceRight(
function (right, left) {
return typeof left == 'number' ?
'(' + right.op + ' ' + left + ' ' + right.val + ')' :
{ op: left.op, val: '(' + right.op + ' ' + left.val + ' ' + right .val + ')' };
}, values.pop());
}
This would be the general blueprint for right-associativity.
The formula 1 + 2*3 - 45/(1 + 2 ** 3) is represented as Lisp expression
(- (+ 1 (* 2 3)) (/ 45 (+ 1 (expt 2 3))))
Note how multiplication and exponentiation as well as parentheses do take precedence and that addition and subtraction associate to the left.
As for right-associativity, (2 ** 3 ** 4 ** 5) is represented as
(expt 2 (expt 3 (expt 4 5)))Compiling arithmetic expressions
JavaScript has functions as first-order values, i.e., they can be passed as arguments and returned as values. It is tempting to build an arithmetic expression into a nest of functions:
actions = {
expr: fromLeft,
add: function (values) { return function (left) { return left() + values[0](); } },
sub: function (values) { return function (left) { return left() - values[0](); } },
prod: fromLeft,
mul: function (values) { return function (left) { return left() * values[0](); } },
div: function (values) { return function (left) { return left() / values[0](); } },
factor: fromRight,
num: function (values) { return function () { return Number(values[0]); }; }
};
num turns a numerical value from a string into a parameter-less function returning the value.
An operator such as add expects the collected value to be a parameter-less function returning the right-hand operand value.
The semantic actions returns a function which expects a parameter-less function returning the left-hand operand value,
evaluates to get both operand values, combines the values and returns the value.
combines it with the right-hand value, and returns the result.
function fromLeft (values) {
return values.slice(1).reduce(
function (left, right) {
return function () { return right(left); };
}, values[0]);
}fromLeft creates a left-associative tower of parameter-less function calls.
Exponentiation is again handled as a special case of right-associativity:
function fromRight (values) { // special case: Math.pow only
return values.reduceRight(
function (right, left) {
return function () { return Math.pow(left(), right()); };
}, values.pop());
}The result of parsing 1 + 2*3 - 45/(1 + 2 ** 3)
looks somewhat disappointing:
function () { return right(left); }On the other hand, the trace shows that the result is an array containing a function which can be unwrapped and executed:
ll1.parser(input, ll1.reflect(null, actions))[0]()This does return the expected value 2. Note that the result of a reflect observer will be an array
containing whatever values were collected for rule 0, i.e., for the start symbol of the grammar. The code shows
how to extract the function from the array and call it.
Adding memory
Here is a bit of a twist:
num: function (_, _tuple, _values) { return function () { return memory[Number(_values[0])]; }; }This way num turns a Number value into a reference to the array memory which at this point does not even exist.
Parsing
7 + 6*5 - 4/(3 + 2 ** 1)produces the result
function () { return item(total); }But if memory is set and the result of parsing unwrapped and executed
expr = ll1.parser(input, ll1.reflect(null, actions))[0];
memory = [0, 2, 4, 7, 69, 5, 4, 3]
print( memory.join(' ') )
print( expr() )
delete memory
memory = [0, 3, 2, 1, 45, 3, 2, 1]
print( memory.join(' ') )
print( expr() )we get
0 2 4 7 69 5 4 3
20
0 3 2 1 45 3 2 1
2i.e., the result expr is a nest of functions representing the parsed sequence of operations
which can be executed
on whatever values are stored into memory — even if the name is destroyed and a new binding is created.
A little language
Programming languages usually combine arithmetic expressions with the ability to control the flow of execution. Euclid's algorithm to compute the greatest common divisor of two natural numbers exercises both aspects:
while (x != y)
if (x > y) x = x - y;
else y = y - x;The notation is borrowed from JavaScript and many other languages.
This input is a sentence for a grammar which combines the grammar used for most of this chapter
with control flow and the use of variable names.
The following needs to be inserted between the definitions for Number and expr:
%token Name '[a-z_A-Z][a-z_A-Z0-9]*';
program: statement*;
statement: assign | if | while | block;
assign: Name '=' expression ';';
if: 'if' '(' expression ')' statement else?;
else: 'else' statement;
while: 'while' '(' expression ')' statement;
block: '{' statement* '}';
expression: conj or*;
or: '||' conj;
conj: cmp and*;
and: '&&' cmp;
cmp: expr ((lt | le | gt | ge | eq | ne) expr)?;
lt: '<';
le: '<=';
gt: '>';
ge: '>=';
eq: '==';
ne: '!=';prim needs to include a variable Name in addition to a Number:
prim: num | name | '(' expression ')';
num: Number;
name: Name;
prim allows a full expression so that all operations can use parentheses.
The semantic actions discussed at the end of the last section
implement expr as parameter-less functions. They have to be augmented
to handle control flow, conditions, and variable names.
function sequence (values) {
return values.slice(1).reduce(
function (left, right) {
return function () { left(); return right(); };
}, values[0]);
}
actions = {
program: sequence,
assign: function (values) {
return function () { memory[values[0]] = values[1](); };
},The result of most Builder functions are parameter-less functions.
sequence uses reduce to build a tower of function calls which calls such functions
in order from left to right.
One such function could be an assignment:
assign expects the string value for a Name, i.e, a variable name,
and a parameter-less function producing the value to be assigned, and returns a parameter-less function
which will evaluate and assign to some memory location identified by name.
The other statements are similarly mapped to JavaScript:
'if': function (values) {
if (values.length == 2)
return function () { if (values[0]()) values[1](); };
else
return function () { if (values[0]()) values[1](); else values[2](); };
},
'while': function (values) {
return function () { while (values[0]()) values[1](); };
},
block: sequence,The arithmetic expressions have to be augmented with comparisons and logical operations but they all follow the familiar pattern:
expression: fromLeft,
or: function (values) { return function (left) { return left() || values[0](); } },
conj: fromLeft,
and: function (values) { return function (left) { return left() && values[0](); } },
cmp: function (values) {
if (values.length == 1) return values[0];
return function() { return values[1]( values[0], values[2]); };
},
lt: function (values) { return function (left, right) { return left() < right(); } },
le: function (values) { return function (left, right) { return left() <= right(); } },
gt: function (values) { return function (left, right) { return left() > right(); } },
ge: function (values) { return function (left, right) { return left() >= right(); } },
eq: function (values) { return function (left, right) { return left() == right(); } },
ne: function (values) { return function (left, right) { return left() != right(); } },JavaScript permits chaining comparison operators; however, a < b < c does not bracket b between a and c.
Therefore, the grammar of the little language discussed here does not permit such chaining,
at least not without using parentheses. The grammar forces comparison to be non-associative.
prim added name to produce the value associated with a variable name and
num, of course, needs to produce the value of a Number token:
num: function (values) { return function () { return Number(values[0]); }; },
name: function (values) { return function () { return memory[values[0]]; }; }memory now is a collection of values, selected by variable names. Here is how Euclid's algorithm would
be compiled and executed:
ll1 = new page.LL1(page.EBNF.from( Grammar.value )); // the grammar
input = ll1.tokenizer()( Input.value ); // the program
expr = ll1.parser(input, ll1.reflect(null, actions))[0]; // the executable code
memory = { x: 36, y: 54 } // initialize x and y
expr() // run the executable code
print( memory.x + ' ' + memory.y )This prints
18 18
That's all, folks!
page