Re: [jboske] Syntactic unambiguity



Arnt Richard Johansen scripsit:

> Do we know what these hacks are?

Yes.  In grammar.300, the tokens in the 700-series are inserted *before*
the problematic areas to disambiguate them.  For example, we don't know
whether a number is going to be a proper number or a -MAI free modifier
until we get to its end, so we insert a lexer_L_712 in front of it in
the former case, and a lexer_A_701 in front of it in the latter case.
The assumption is that a preparser stage inserts these tokens as needed
by snuffling ahead as far as necessary.
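
As a rough illustration of that preparser idea (a minimal sketch, not the
actual preparser): the token representation as (selma'o, word) pairs, the
function name, and the simplification that a number is just a run of PA
tokens are all assumptions made here. Only the marker names lexer_L_712
and lexer_A_701 come from the description above.

```python
# Hypothetical token representation: (selmaho, word) pairs.
# Marker names are taken from the message; everything else is illustrative.
NUMBER_MARKER = ("lexer_L_712", "")  # the number is a proper number
MAI_MARKER = ("lexer_A_701", "")     # the number ends in a MAI cmavo


def insert_lookahead_markers(tokens):
    """Insert a marker token in front of each run of PA tokens.

    The scan looks ahead as far as needed to reach the end of the number,
    so the parser proper only ever needs one token of lookahead.
    Assumption: a number is simply a maximal run of PA tokens.
    """
    out = []
    i = 0
    while i < len(tokens):
        if tokens[i][0] == "PA":
            # Look ahead to the end of the number.
            j = i
            while j < len(tokens) and tokens[j][0] == "PA":
                j += 1
            ends_in_mai = j < len(tokens) and tokens[j][0] == "MAI"
            out.append(MAI_MARKER if ends_in_mai else NUMBER_MARKER)
            out.extend(tokens[i:j])
            i = j
        else:
            out.append(tokens[i])
            i += 1
    return out


if __name__ == "__main__":
    print(insert_lookahead_markers([("PA", "pa"), ("PA", "re"), ("MAI", "mai")]))
    print(insert_lookahead_markers([("PA", "pa"), ("KOhA", "mi")]))
```

With the marker in place, a one-token-lookahead parser can choose the right
rule as soon as it sees the marker, before it reads any digit.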

The official parser actually uses a slightly different scheme.  Rather
than inserting a 700-series token before the problematic sequence, it
replaces the whole sequence with a 900-series token.  These tokens are
normally commented out, but are restored when constructing a parser;
at the same time, the 900-series *rules* are commented out.
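
The replacement scheme can be sketched in the same spirit. This is only an
illustration under stated assumptions: the compound token name "number_9xx"
is a placeholder, not a real token from the grammar, and the
(selma'o, word) pair representation and the function name are hypothetical.

```python
def collapse_numbers(tokens, compound_name="number_9xx"):
    """Replace each maximal run of PA tokens with one compound token.

    The compound token keeps the original tokens as its payload, so no
    information is lost; the main grammar just sees a single terminal.
    "number_9xx" is a placeholder name, not an actual grammar token.
    """
    out = []
    i = 0
    while i < len(tokens):
        if tokens[i][0] == "PA":
            j = i
            while j < len(tokens) and tokens[j][0] == "PA":
                j += 1
            out.append((compound_name, tokens[i:j]))
            i = j
        else:
            out.append(tokens[i])
            i += 1
    return out


if __name__ == "__main__":
    print(collapse_numbers([("PA", "pa"), ("PA", "re"), ("KOhA", "mi")]))
```

The difference from the insertion scheme is that the main parser never sees
the inner tokens at all, which is why the corresponding rules can be
commented out when the replacing tokens are enabled.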

> I was of the belief that iff YACC is an LALR(1) parser, then it can't cope
> with more than 1-token lookahead. Why are we writing rules for it that
> require more than 1-token lookahead?

It's Lojban, not the YACC grammar, that contains such rules.  We use the
inserted (or replacing) tokens to overcome YACC's limitations.  Similarly,
we use YACC's error-recovery machinery to insert elidable terminators.

> That is very disturbing. I hope those hacks are only a few, so that we can
> have a hope of *proving* the grammar to be unambiguous.

Currently there are 24 of them.  See the comments on the definitions of
the 700-series and 900-series tokens.  Most of them are either bounded
(like EK) or unbounded but repetitive (like numbers).  The worst case
is tenses, which have a substantial grammar-within-the-grammar.

With the exception of NAI, there is no selma'o that is usable both in
the main grammar and in the "preparser grammar".  The only possible
ambiguities, therefore, are between preparser rules.

-- 
"Well, I'm back."  --Sam        John Cowan <jcowan@hidden.email>