Contextual Predicates

A contextual predicate allows you to add conditions at choice points based on scanning back in the call/lookahead stack. We are not aware of any other parser generator tool that has this feature.

The easiest way to describe this is with some actual examples.

Specifying that a production is non-reentrant

The most typical use will probably be to guarantee that a production is non-reentrant, i.e. that it cannot be nested recursively inside itself. This can now be expressed very cleanly with a contextual predicate as follows:

   [ SCAN ~\...\Foo => Foo ]

First of all, the tilde “~” that starts the predicate indicates negation. The predicate scans backward through the call stack to see whether we have already entered a Foo production. Because the condition is negated with the “~”, we enter the Foo production only if no such production is found.
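To make the semantics concrete, here is a minimal Java sketch. It is not the code CongoCC generates; it only models the check, assuming the call stack is available as a list of production names ordered from the root to the current production. The class and method names are hypothetical.

```java
import java.util.List;

// Hypothetical model, not CongoCC's generated code: the call stack is a list of
// production names ordered from the root (index 0) to the current production (last).
final class NonReentrantCheck {

    // Models [ SCAN ~\...\Foo => Foo ] with name = "Foo".
    // Scan backward from the current production toward the root; the ellipsis lets
    // the match skip any number (including zero) of productions, so the current
    // production itself is checked too. The ~ negates the result.
    static boolean mayEnter(List<String> callStack, String name) {
        for (int i = callStack.size() - 1; i >= 0; i--) {
            if (callStack.get(i).equals(name)) {
                return false; // already inside a production called `name`
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(mayEnter(List.of("Root", "Expr"), "Foo"));        // true
        System.out.println(mayEnter(List.of("Root", "Foo", "Expr"), "Foo")); // false
    }
}
```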

The above sort of predicate will probably be the most commonly used pattern. However, more complex conditions can be formed.

Scanning Forward vs. Backward, Ellipsis and Wild-card

The elements of a contextual predicate are separated by either a backslash “\” or a forward slash “/”. The previous example used a backslash, which means that we scan backward from the current production up toward the root; a forward slash means that we scan forward from the root.

In the above example, the ellipsis “...” that follows the backslash means that there can be an arbitrary number of intervening productions in the call stack. The wild-card, a single dot “.”, matches exactly one production of any type. If, for example, we wrote:

   [ SCAN ~\.\Bar => Foo ]

this would mean that we enter the Foo production only if the direct parent of the current production is not a Bar.

Conversely,

   [ SCAN \.\Bar => Foo ]

would mean that we enter the Foo production only if the parent of the current production is a Bar. (Note that this predicate does not start with a “~”, so it is not negated.)
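Under the same hypothetical list model, the parent test is a single lookup: `.` consumes the current production and `Bar` is compared with the entry just above it. Treating a stack with no parent as a failed match is an assumption of this sketch, not documented behavior.

```java
import java.util.List;

// Hypothetical model: call stack ordered from the root (index 0) to the current production (last).
final class ParentCheck {

    // Models \.\Bar : "." matches the current production, whatever it is,
    // and the next element back must be its direct parent.
    // Assumption: if the current production has no parent, the match fails.
    static boolean parentIs(List<String> callStack, String name) {
        int parent = callStack.size() - 2;
        return parent >= 0 && callStack.get(parent).equals(name);
    }

    public static void main(String[] args) {
        List<String> stack = List.of("Root", "Bar", "Expr");
        System.out.println(parentIs(stack, "Bar"));  // true:  [ SCAN \.\Bar => Foo ] enters Foo
        System.out.println(!parentIs(stack, "Bar")); // false: [ SCAN ~\.\Bar => Foo ] does not
    }
}
```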

Now, consider the following predicate that uses a forward slash:

   [ SCAN /Foo/Bar => Baz ]

This means that we enter the Baz production only if the root production is a Foo and Foo directly entered a Bar.
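A forward predicate is anchored at the other end of the stack. In the same assumed root-first list model, `/Foo/Bar` only inspects the first two entries and ignores whatever lies below `Bar`. The names are again hypothetical.

```java
import java.util.List;

// Hypothetical model: call stack ordered from the root (index 0) to the current production (last).
final class RootPathCheck {

    // Models /Foo/Bar : scanning forward from the root, the root must be `root`
    // and the production it entered directly must be `child`. With no ending
    // backslash, any productions may follow below `child`.
    static boolean startsWith(List<String> callStack, String root, String child) {
        return callStack.size() >= 2
                && callStack.get(0).equals(root)
                && callStack.get(1).equals(child);
    }

    public static void main(String[] args) {
        System.out.println(startsWith(List.of("Foo", "Bar", "Expr"), "Foo", "Bar")); // true
        System.out.println(startsWith(List.of("Foo", "Qux", "Bar"), "Foo", "Bar"));  // false: Bar not entered directly
    }
}
```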

Optional Ending Slash

A predicate that begins with a forward slash may optionally end with a backslash, and vice versa: a predicate that begins with a backslash may optionally end with a forward slash. For example, consider the following predicate:

   [ SCAN /Root/.../Foo\ => Bar ]

This checks that the root production, our entry point, is Root and that the current production is Foo, with any number of productions in between. It is the ending backslash that requires Foo to be the current production. Note also that the following two predicates are equivalent:

   [ SCAN /Root/.../Foo/...\ => Bar ]

and simply:

   [ SCAN /Root/.../Foo => Bar ]
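A small backtracking matcher makes the roles of the ellipsis, the dot and the ending slash explicit. This is a sketch for forward predicates only, under the same assumed root-first list model; `endsAtCurrent` is a hypothetical flag standing in for the optional ending backslash. The last two calls in `main` show why the two predicates above are equivalent.

```java
import java.util.List;

// Hypothetical sketch for forward predicates only; call stack ordered root-first.
// "..." matches any number (including zero) of productions, "." exactly one of any type.
final class ForwardPathMatcher {

    // endsAtCurrent stands for the optional ending backslash: the last element
    // must then be the current production; otherwise anything may follow.
    static boolean matches(List<String> pattern, List<String> stack, boolean endsAtCurrent) {
        return match(pattern, 0, stack, 0, endsAtCurrent);
    }

    private static boolean match(List<String> p, int pi, List<String> s, int si, boolean endsAtCurrent) {
        if (pi == p.size()) {
            return !endsAtCurrent || si == s.size();
        }
        String elem = p.get(pi);
        if (elem.equals("...")) {
            // Backtrack over every possible number of skipped productions.
            for (int next = si; next <= s.size(); next++) {
                if (match(p, pi + 1, s, next, endsAtCurrent)) {
                    return true;
                }
            }
            return false;
        }
        if (si == s.size()) {
            return false; // the pattern needs a production but the stack is exhausted
        }
        boolean hit = elem.equals(".") || elem.equals(s.get(si));
        return hit && match(p, pi + 1, s, si + 1, endsAtCurrent);
    }

    public static void main(String[] args) {
        List<String> atFoo = List.of("Root", "Stmt", "Foo");
        List<String> belowFoo = List.of("Root", "Foo", "Expr");
        List<String> rootDotsFoo = List.of("Root", "...", "Foo");
        // /Root/.../Foo\ : Foo must be the current production
        System.out.println(matches(rootDotsFoo, atFoo, true));    // true
        System.out.println(matches(rootDotsFoo, belowFoo, true)); // false
        // /Root/.../Foo/...\ and /Root/.../Foo agree
        System.out.println(matches(List.of("Root", "...", "Foo", "..."), belowFoo, true)); // true
        System.out.println(matches(rootDotsFoo, belowFoo, false));                         // true
    }
}
```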

Summary

A contextual predicate optionally starts with a tilde “~” to indicate negation. The first character after the tilde (or simply the first character, if there is no tilde) must be either a backslash or a forward slash. A backslash means that we scan backward from the current production, and a forward slash means that we scan forward from the root.

An ellipsis “...” means that we can have an arbitrary number (including zero) of intervening productions. A dot “.” means that we have exactly one production of any type.
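The syntax rules in this summary can be checked with a small parser. The sketch below is hypothetical (it is not CongoCC's own grammar for predicates) and assumes no whitespace inside the predicate. It only splits a predicate into its parts: negation, direction, the optional ending slash and the list of elements.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of the summary's syntax rules; not CongoCC's own parser.
final class PredicateSyntax {
    final boolean negated;       // leading "~"
    final boolean forward;       // "/" scans forward from the root, "\" backward from the current production
    final boolean closed;        // optional ending slash of the opposite kind
    final List<String> elements; // production names, "..." or "."

    private PredicateSyntax(boolean negated, boolean forward, boolean closed, List<String> elements) {
        this.negated = negated;
        this.forward = forward;
        this.closed = closed;
        this.elements = elements;
    }

    static PredicateSyntax parse(String text) {
        String s = text.trim();
        boolean negated = s.startsWith("~");
        if (negated) {
            s = s.substring(1);
        }
        if (s.isEmpty() || (s.charAt(0) != '/' && s.charAt(0) != '\\')) {
            throw new IllegalArgumentException("expected / or \\ after optional ~: " + text);
        }
        char sep = s.charAt(0);
        char opposite = (sep == '/') ? '\\' : '/';
        boolean closed = s.length() > 1 && s.charAt(s.length() - 1) == opposite;
        if (closed) {
            s = s.substring(0, s.length() - 1);
        }
        // Drop the leading separator, then split on it (quoted, since "\" is a regex metacharacter).
        String[] parts = s.substring(1).split(Pattern.quote(String.valueOf(sep)));
        return new PredicateSyntax(negated, sep == '/', closed, Arrays.asList(parts));
    }

    public static void main(String[] args) {
        PredicateSyntax p = parse("/Root/.../Foo\\");
        System.out.println(p.negated + " " + p.forward + " " + p.closed + " " + p.elements);
        // false true true [Root, ..., Foo]
    }
}
```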

A contextual predicate can be combined with other conditions in a SCAN instruction, such as numerical or syntactic lookahead:

   ( SCAN 2 ~\...\Foo => Foo )*

This checks that we are not already inside a Foo production AND uses up to 2 tokens of lookahead to decide whether to enter Foo. If either check fails, we break out of the loop.

Alternatively, we can combine it with syntactic and/or semantic lookahead:

   ( SCAN ~\...\Foo "bar" "baz" => Foo )*

Here we specify that Foo must be non-reentrant and also that the next two tokens must be “bar” followed by “baz”; otherwise we jump out of the loop.
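The interplay between the contextual predicate and the lookahead can be modeled with a toy recursive-descent parser. The grammar in the comments is hypothetical: a `Foo` that itself contains the same loop. This is a sketch of the decision logic, not CongoCC output.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Toy recursive-descent model of a hypothetical grammar (not CongoCC output):
//   Root : ( SCAN ~\...\Foo "bar" "baz" => Foo )* ;
//   Foo  : "bar" "baz" ( SCAN ~\...\Foo "bar" "baz" => Foo )* ;
final class FooLoopModel {
    private final List<String> tokens;
    private final Deque<String> callStack = new ArrayDeque<>();
    int pos = 0;         // index of the next unconsumed token
    int fooCount = 0;    // how many Foo productions were entered
    int maxFooDepth = 0; // deepest nesting of Foo seen

    FooLoopModel(List<String> tokens) {
        this.tokens = tokens;
    }

    // Contextual predicate ~\...\Foo: no Foo anywhere on the call stack.
    private boolean notInsideFoo() {
        return !callStack.contains("Foo");
    }

    // Syntactic lookahead "bar" "baz": the next two tokens must match exactly.
    private boolean nextAreBarBaz() {
        return pos + 1 < tokens.size()
                && tokens.get(pos).equals("bar")
                && tokens.get(pos + 1).equals("baz");
    }

    // The loop body is entered only if BOTH conditions hold; otherwise we break out.
    private void fooLoop() {
        while (notInsideFoo() && nextAreBarBaz()) {
            foo();
        }
    }

    void root() {
        callStack.push("Root");
        fooLoop();
        callStack.pop();
    }

    private void foo() {
        callStack.push("Foo");
        fooCount++;
        long depth = callStack.stream().filter("Foo"::equals).count();
        maxFooDepth = Math.max(maxFooDepth, (int) depth);
        pos += 2;  // consume "bar" "baz", already checked by the lookahead
        fooLoop(); // the predicate fails here, since Foo is now the current production
        callStack.pop();
    }

    public static void main(String[] args) {
        FooLoopModel m = new FooLoopModel(List.of("bar", "baz", "bar", "baz"));
        m.root();
        System.out.println(m.fooCount + " " + m.maxFooDepth + " " + m.pos); // 2 1 4
    }
}
```

Because the predicate fails whenever Foo is already on the call stack, a run of “bar” “baz” pairs is parsed as sibling Foo productions under Root rather than as nested ones.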

NB. If a SCAN statement specifies neither numerical nor syntactic lookahead, the generated code will scan ahead an unlimited number of tokens, unless the expansion to be parsed is constrained by an up-to-here marker. This is a key characteristic of the newer SCAN statement.

Note also that contextual predicates, like syntactic lookahead in CongoCC, can be nested arbitrarily and also work inside arbitrarily nested scanahead routines.