Here, by category, is a list of all the settings in CongoCC that can be set at the top of a grammar file.

====== Options relating to File/Class/Package Naming ======

By default, the tool has some naming conventions that, actually, you might as well use. For example, if your grammar lies in a file named ''Foo.ccc'', then the tool will generate ''FooParser.java'' and ''FooLexer.java'' based on the filename. You can override that default naming using the options **PARSER_CLASS** and **LEXER_CLASS** respectively.

You might prefer to use **BASE_NAME**. If you set:

  BASE_NAME=Foo;

at the top of the grammar, then you have set the parser class to ''FooParser'' and the lexer class to ''FooLexer'' in one fell swoop. If you set:

  BASE_NAME="";

then the parser class is ''Parser'' and the lexer class is ''Lexer'', and some people would like that.

Unlike its predecessor, CongoCC does not generate any code into the //default// or //unnamed// package. In fact, it puts the parser and lexer code into a package that you can specify via the **PARSER_PACKAGE** setting. And, assuming you have tree-building turned on, it generates the various parse tree nodes in a package that can be specified via the **NODE_PACKAGE** setting.

If you don't set either of those things, the packages are named based on the parser class name. So, if your parser class name is ''FooParser'', it will create a package called ''fooparser'' and the node package (assuming it is also unspecified) will be ''fooparser.ast''.

All of that is generated relative to the location of the grammar file, **unless** you override that with **BASE_SRC_DIR** (which most people will do!). The **BASE_SRC_DIR** is either an absolute directory on the file system or (more likely) a relative directory, relative to where the grammar file is. So, if you don't specify it, it is the same as saying:

  BASE_SRC_DIR=".";

Something like:

  BASE_SRC_DIR="../../build/generated-code";

would be pretty typical. Note also that **BASE_SRC_DIR** is one of the handful of settings that can also be set on the command line, via the ''-d'' option. If it is set both in the grammar and on the command line, the command-line setting takes precedence.

====== Options relating to Tree Building ======

By default, tree-building is **on**. You can turn it off via:

  TREE_BUILDING_ENABLED=false;

If you want tree-building to be enabled, but off by default, you can use:

  TREE_BUILDING_DEFAULT=false;

In CongoCC (as in JavaCC 21, but **not** legacy JavaCC) Tokens are considered the terminal nodes in the parse tree, so they are added to the tree. You can disable that via:

  TOKENS_ARE_NODES=false;

The default tree-building pattern is that a production creates a node if there are two (or more) nodes on the stack when the production exits. That is called "smart node creation" and is very typically what one wants, but maybe not. You can turn it off via:

  SMART_NODE_CREATION=false;

and then the tree-building machinery will create a new node for a production whenever **one** or more nodes have been created. If you don't want any nodes created by default, you can use:

  NODE_DEFAULT_VOID=true;

or just:

  NODE_DEFAULT_VOID;

for short. (All these boolean-valued settings can be set to true by just writing them with no value.)

====== Options Relating to Niggling Whitespace Issues ======

By default, CongoCC //normalizes// newlines to a lone line-feed character, i.e. it converts CR or CR-LF to LF (''\n''). If you want to preserve the newlines as they were in the input, you can write:

  PRESERVE_LINE_ENDINGS;

at the top of your grammar.
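Putting a few of these together, the top of a (hypothetical) ''Foo.ccc'' grammar might look something like the following sketch. The package names and output directory here are just placeholder values, not defaults, and the rest of the grammar (token and production definitions) is elided:

  // Hypothetical preamble for a grammar file named Foo.ccc
  BASE_NAME="Foo";                            // parser is FooParser, lexer is FooLexer
  PARSER_PACKAGE="org.example.foo";           // placeholder package for the parser/lexer
  NODE_PACKAGE="org.example.foo.ast";         // placeholder package for the node classes
  BASE_SRC_DIR="../../build/generated-code";  // relative to the grammar file
  SMART_NODE_CREATION=false;                  // a node per production once any node is created
  PRESERVE_LINE_ENDINGS;                      // boolean settings can be written with no value
  // ... token and production definitions would follow here ...
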
By default, hard tab characters are left as they are in the input and also, for error-reporting purposes, are treated as one horizontal space. This is not usually what you want. If you write:

  TAB_SIZE=4;

at the top of the grammar, the tabs are converted to spaces on the basis of the tab stops being at 4-space intervals. (Or whatever interval you want, of course.) If you really want to preserve the original tab characters, you can also write:

  PRESERVE_TABS;

If you want to ensure that the input to the parser ends with a final newline (i.e. one gets tacked on if it's not there), you can use:

  ENSURE_FINAL_EOL;

If you want to ensure that the input to a parser ends with some specific string, possibly a control character, you can use the more general:

  TERMINATING_STRING="\u001A";

In the above case, the character ''\u001A'' (hex ''1A'', a CTRL-Z control character) is tacked on if it is not there. But the terminating string can be anything you specify. By the way, ''ENSURE_FINAL_EOL;'' is the exact same thing as ''TERMINATING_STRING="\n";''.

====== Settings relating to Lexical Processing ======

If you want certain token types to be inactive by default (though presumably turned on at key spots) you can use the **DEACTIVATE_TOKENS** setting.

  DEACTIVATE_TOKENS=LPAREN,RPAREN;

means that, by default, those tokens are inactive. You can use:

  DEFAULT_LEXICAL_STATE=JAVA;

to specify that the parser, by default, starts in the ''JAVA'' lexical state. If this setting is unused, the default lexical state is taken to be one named ''DEFAULT''.

You can use the **EXTRA_TOKENS** setting to specify extra token types that are not defined with regular expressions in the lexical grammar. This can be particularly useful in token hook routines, in particular for generating //synthetic// tokens. For example, this is how the synthetic **INDENT** and **DEDENT** tokens in Python are handled. You will see the line:

  EXTRA_TOKENS=INDENT,DEDENT;

which defines these token types, even though they are nowhere to be found in the lexical grammar!

====== Settings related to Token class generation ======

One feature that is //sometimes// useful is //token chaining//. You can insert a synthetic token into the chain of tokens. This is actually a very tricky, error-prone usage pattern that is off by default, and we recommend that you only turn it on if you really need it! If you examine the included Python grammar, you will see this in use. You turn it on via:

  TOKEN_CHAINING;

Conversely, you can turn on the **MINIMAL_TOKEN** setting to generate a minimal token. With that turned on, ''Token'' does not have an ''image'' field. It has a ''getImage()'' method that uses its ''beginOffset/endOffset'' to get its string image on demand. But there is no ''setImage()'' method. There are some (possibly tricky) coding patterns that involve setting a token's string image to a different value from what was read in. But if you don't need that, you can use **MINIMAL_TOKEN**, and this also reduces the memory footprint of your token objects.

====== Settings related to generating a fault-tolerant parser ======

You can set:

  FAULT_TOLERANT;

at the top of your grammar to turn on the experimental support for building a [[fault tolerant]] parser. It is off by default.
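To pull the last few sections together, here is another hedged sketch of a grammar preamble that combines the whitespace, lexical, token, and fault-tolerance settings discussed above. The token names and values are just the illustrative ones from this page, not defaults:

  // Hypothetical preamble combining the settings discussed above
  TAB_SIZE=4;                       // expand tabs using 4-space tab stops
  TERMINATING_STRING="\u001A";      // make sure the input ends with CTRL-Z
  DEFAULT_LEXICAL_STATE=JAVA;       // start lexing in the JAVA state
  DEACTIVATE_TOKENS=LPAREN,RPAREN;  // these token types start out inactive
  EXTRA_TOKENS=INDENT,DEDENT;       // synthetic token types with no regex in the grammar
  MINIMAL_TOKEN;                    // Token has no image field or setImage() method
  FAULT_TOLERANT;                   // experimental fault-tolerant parsing support
  // ... lexical states, token definitions, and productions would follow here ...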