Overview of the Newer Streamlined Syntax

The following is a summary of CongoCC's streamlined syntax from the perspective of somebody migrating from legacy JavaCC. Note that, unlike its predecessor, JavaCC 21, CongoCC does not support the legacy syntax, so any existing grammar file must be converted. A utility is available that automatically converts the legacy syntax to the streamlined syntax: pick up the latest build of JavaCC 21 and invoke the converter via java -jar javacc-full.jar convert MyGrammarfile. (N.B. The converter is somewhat imperfect, and you may well need to hand-edit the results. Even so, it is bound to be a timesaver.)

Nonterminals

There is no need to write empty parentheses after a nonterminal for a production that takes no parameters. Thus:

Foo() Bar() Baz()

can now be written as:

Foo Bar Baz

BNF Productions

  1. There is no need to write void in front of productions with no return value.
  2. A production which takes no parameters no longer needs empty parentheses.
  3. There is no need for an empty code block, i.e. {}, as the first thing in your production's definition. (That is, assuming that you don't actually need to put some code at the top of your production.)
  4. Rather than being placed inside braces (like a Java action), the definition of your production is preferably written with no opening delimiter and then terminated with a semicolon.

The above four points (along with the earlier point about no-args nonterminals not needing parentheses) combine such that, where you would previously write:

void Foobar() :
{}
{
    Foo() Bar() 
}

you would now write:

Foobar : Foo Bar;
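
The streamlined form still allows Java code inside a production. As a minimal sketch (the println call and its message are assumptions made up for illustration, not taken from any real grammar), a production with an embedded action might look like:

```
Foobar :
    Foo
    {System.out.println("matched Foo");}
    Bar
;
```

Here the braces enclose only the Java action; the production itself is still delimited by the colon and the closing semicolon.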

Lists of lexical specifications, a.k.a. Token Productions, are written without the curly braces.

In this case, they are written with no opening { and the list is ended with a semicolon. Thus, instead of writing:

TOKEN #Delimiter :
{
    <LPAREN : "(">
    |
    <RPAREN : ")">
    |
    <LBRACE : "{">
    |
    <RBRACE : "}">
}

the newer syntax is:

TOKEN #Delimiter :
  <LPAREN: "(" > 
  | 
  <RPAREN: ")" >
  | 
  <LBRACE: "{" > 
  |
  <RBRACE: "}" > 
;

This was deemed preferable, not because it saves much space (it doesn't!) but because, in the newer syntax, {...} is reserved for elements that really are embedded Java actions. As you can see, in the newer syntax for BNF productions, the only use of {...} is for actual Java code.

The Options at the top of a file do not need to be in any sort of block.

Since options such as TREE_BUILDING_ENABLED=false can only occur at the very top of a file anyway, there is no need for them to be wrapped in a special options {...} construct. Thus, where you would previously have:

options {
    BASE_SRC_DIR="..";
    PARSER_PACKAGE="com.acme.foolang";
}

You now simply put:

BASE_SRC_DIR="..";
PARSER_PACKAGE="com.acme.foolang";

at the top of your file.
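
Putting the pieces so far together, a complete small grammar file in the streamlined syntax might look roughly like the sketch below. The package name and the delimiter tokens are reused from the examples above; the WHITESPACE token and the Group and Block productions are hypothetical names chosen only for illustration.

```
// Settings go at the very top, with no enclosing options block.
PARSER_PACKAGE="com.acme.foolang";

// Token productions: no braces, terminated by a semicolon.
SKIP : <WHITESPACE : (" " | "\t" | "\n" | "\r")+> ;

TOKEN #Delimiter :
  <LPAREN: "(" >
  |
  <RPAREN: ")" >
  |
  <LBRACE: "{" >
  |
  <RBRACE: "}" >
;

// BNF productions: no void, no empty parentheses, no empty code block.
Group : <LPAREN> (Group | Block)* <RPAREN> ;

Block : <LBRACE> (Group | Block)* <RBRACE> ;
```

Note that no {...} appears anywhere outside the string literal for the brace tokens, since this sketch contains no Java actions.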

The syntax for INJECT is also streamlined.

N.B. This, of course, is not a change from legacy JavaCC, since legacy JavaCC never had an INJECT statement!

You can (optionally) dispense with the parentheses in: INJECT(ClassDeclaration) :

The first block after the colon does not need braces around it. Either part of the injection can be left out. Thus, if the only point is to indicate that a Node extends a class (or implements an interface or you want to use an Annotation), where you previously had to write:

  INJECT(MyNode) :
  {
      extends AbstractBaseNode
  }
  {}

(Actually, the final empty block {} has been optional for some time in these spots, but I don't believe I ever documented that! But now it is much more streamlined.)

You can now write:

 INJECT MyNode : extends AbstractBaseNode

A more complex INJECT that does insert some code might now look something like:

 INJECT MyNode :
     import java.util.List;
     extends AbstractBaseNode
     implements Nullable
{
    private List<Foo> foos;

    public List<Foo> getFoos() {return foos;}
    
    public void setFoos(List<Foo> foos) {this.foos = foos;}
}

Note that, of the statements immediately following the colon (and immediately preceding the opening brace), the first ends with a semicolon and the other two do not. This mirrors Java, where an import statement ends with a semicolon while extends and implements clauses do not. However, if the above looks funny to you, you can (optionally) end the other two lines with a semicolon and the parser will not complain!
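
For comparison, here is a sketch of the same kind of injection with the optional semicolons added, so that all three header lines end the same way. It reuses the MyNode, AbstractBaseNode, Nullable and Foo names from the example above, which are themselves only placeholders.

```
INJECT MyNode :
    import java.util.List;
    extends AbstractBaseNode;
    implements Nullable;
{
    private List<Foo> foos;
}
```

Which form to use is purely a matter of taste.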

New SCAN construct which replaces LOOKAHEAD

The new SCAN instruction supersedes the legacy LOOKAHEAD. See here for more information.

New "up to here" syntax

The "up to here" syntax provides a much cleaner, more intuitive way to specify lookahead. See here for more information.