Differences
This shows you the differences between two versions of the page.
Old revision: include, 2020/02/14 00:17 by revusky. New revision: include, 2023/03/03 16:16 by revusky.
Old line 1 / new line 1:
- ====== The INCLUDE Statement ======
+ <
- JavaCC 21's **INCLUDE** statement allows you to break up your grammar file into multiple physical files. It would look like this typically:
+ # The INCLUDE
- INCLUDE("
+ Congo'
- //This feature is not present in legacy JavaCC.//
+ INCLUDE "
+
+ *This feature is not present in legacy JavaCC.*
The motivation behind **INCLUDE** should be obvious. By allowing you to reuse a base grammar or generally useful fragment in various files, you can avoid the copy-paste-modify *antipattern* that would have been necessary when using legacy JavaCC. Generally speaking, being able to organize a large grammar into multiple physical files can be a big win for maintainability.
Old line 11 / new line 13:
Still, as they say, the devil is in the details, and there are various wrinkles that need to be covered here.
- ===== The DEFAULT_LEXICAL_STATE setting
+ ## The DEFAULT_LEXICAL_STATE setting
- In legacy JavaCC, if you defined a token production without specifying a lexical state, any lexical definitions belonged to a lexical state called "
+ In legacy JavaCC, if you defined a token production without specifying a lexical state, any lexical definitions belonged to a lexical state called "
- Thus, JavaCC 21 introduces
+ Thus, CongoCC has a setting called **DEFAULT_LEXICAL_STATE**: any lexical specification that does not name a lexical state is placed in that state. A JSON grammar, for example, would likely have something like this at the top:
+
+
+ DEFAULT_LEXICAL_STATE=JSON;
- options {
-
- }
In that case, any grammar for a language that wants to handle embedded JSON data would presumably define its own "
At the moment, **DEFAULT_LEXICAL_STATE** is the only setting in an **INCLUDE**d grammar that has any effect. All other options are simply ignored, since they are presumably set in the top-level *including* grammar. In legacy JavaCC, if you defined a token production without specifying a lexical state, those patterns are matched in a lexical state called "
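The two rules in this section (a token production that names no lexical state goes into the grammar's default state, and an **INCLUDE**d grammar contributes only its DEFAULT_LEXICAL_STATE setting) can be pictured with a small Java model. This is a hypothetical sketch for illustration, not CongoCC's implementation: the class, record, and method names are invented, `SOME_OTHER_OPTION` is a placeholder rather than a real option, and the including grammar's default state is assumed to be "DEFAULT", matching the legacy JavaCC name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical toy model of DEFAULT_LEXICAL_STATE handling.
 * Illustration only; not CongoCC's actual code.
 */
public class LexicalStateModel {

    static final String DEFAULT_LEXICAL_STATE = "DEFAULT_LEXICAL_STATE";

    /** A token production; explicitState is null when the production names no state. */
    record TokenProduction(String name, String explicitState) {}

    /** An unqualified production is placed in the grammar's default lexical state. */
    static String stateFor(TokenProduction p, String defaultLexicalState) {
        return p.explicitState() != null ? p.explicitState() : defaultLexicalState;
    }

    /** Of an INCLUDEd grammar's settings, only DEFAULT_LEXICAL_STATE is kept. */
    static Map<String, String> honoredIncludedSettings(Map<String, String> settings) {
        Map<String, String> honored = new LinkedHashMap<>();
        if (settings.containsKey(DEFAULT_LEXICAL_STATE)) {
            honored.put(DEFAULT_LEXICAL_STATE, settings.get(DEFAULT_LEXICAL_STATE));
        }
        return honored;
    }

    public static void main(String[] args) {
        // Settings found in a hypothetical included JSON grammar.
        // SOME_OTHER_OPTION is a placeholder name, not a real option.
        Map<String, String> jsonSettings = new LinkedHashMap<>();
        jsonSettings.put(DEFAULT_LEXICAL_STATE, "JSON");
        jsonSettings.put("SOME_OTHER_OPTION", "true");

        Map<String, String> honored = honoredIncludedSettings(jsonSettings);
        System.out.println(honored);

        // Assumed default state of the including Foo grammar (legacy JavaCC name).
        String fooDefault = "DEFAULT";
        String jsonDefault = honored.get(DEFAULT_LEXICAL_STATE);

        TokenProduction fooIdentifier = new TokenProduction("IDENTIFIER", null);
        TokenProduction jsonString = new TokenProduction("STRING_LITERAL", null);
        System.out.println(fooIdentifier.name() + " -> " + stateFor(fooIdentifier, fooDefault));
        System.out.println(jsonString.name() + " -> " + stateFor(jsonString, jsonDefault));
    }
}
```

Running `main` prints `{DEFAULT_LEXICAL_STATE=JSON}`, then `IDENTIFIER -> DEFAULT` and `STRING_LITERAL -> JSON`. In this model, the unqualified tokens of the two grammars end up in different lexical states instead of colliding in a single default state.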
-
- <
## Wrinkles with Code Injection
- JavaCC still supports the legacy JavaCC constructs of **PARSER_BEGIN...PARSER_END** and **TOKEN_MGR_DECLS**. (For how much longer, I am not making any promises...). However, those constructs are ignored
+ You can
- You can still *inject* code into the generated parser or lexer class, from within an included grammar, but you need to write something like:
+
-
+
-
+
- {
+
- ...
+
- }
+
{
...
Old line 43 / new line 38:
or:
- INJECT(**LEXER_CLASS**) :
+ INJECT LEXER_CLASS :
- {
+
- ...
+
- }
+
{
...
Old line 53 / new line 45:
JavaCC 21 will replace the **PARSER_CLASS** and **LEXER_CLASS** aliases with the appropriate names, that is, the actual class names of the XXXParser or XXXLexer being generated. So, if you have a Foo language in which you want to embed JSON expressions,
- INJECT(JSONParser) :
+ INJECT JSONParser :
- {
+
- ...
+
- }
+
{
...
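The alias substitution described above can be pictured as a lookup table built from the names of the classes actually being generated. What follows is a hypothetical Java sketch, not CongoCC's code: the `<prefix>Parser` / `<prefix>Lexer` naming follows the XXXParser/XXXLexer pattern mentioned above, the class and method names are invented, and treating any non-alias name as a literal class name is an assumption of the model.

```java
import java.util.Map;

/**
 * Hypothetical toy model of INJECT target resolution: PARSER_CLASS and
 * LEXER_CLASS resolve to the classes being generated, while any other
 * name is taken literally. Illustration only; not CongoCC's actual code.
 */
public class InjectAliasModel {

    /** Aliases for a grammar whose generated classes are <prefix>Parser and <prefix>Lexer. */
    static Map<String, String> aliasesFor(String prefix) {
        return Map.of(
            "PARSER_CLASS", prefix + "Parser",
            "LEXER_CLASS", prefix + "Lexer");
    }

    /** Resolves the target named in an INJECT statement; non-aliases pass through unchanged. */
    static String resolveInjectTarget(String target, Map<String, String> aliases) {
        return aliases.getOrDefault(target, target);
    }

    public static void main(String[] args) {
        // A Foo grammar that INCLUDEs a JSON grammar generates FooParser and FooLexer.
        Map<String, String> fooAliases = aliasesFor("Foo");

        // Targets as they might appear inside the included JSON grammar:
        System.out.println(resolveInjectTarget("PARSER_CLASS", fooAliases)); // FooParser
        System.out.println(resolveInjectTarget("LEXER_CLASS", fooAliases));  // FooLexer
        System.out.println(resolveInjectTarget("JSONParser", fooAliases));   // JSONParser
    }
}
```

In this model, the alias follows whichever grammar is doing the including, while a hard-coded name such as `JSONParser` stays `JSONParser` even though the parser being generated for the Foo language is `FooParser`.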
Old line 65 / new line 54:
So do not be surprised when code within PARSER_BEGIN...PARSER_END is ignored in an INCLUDEd grammar. You need to write INJECT(PARSER_CLASS) to achieve the desired result.
- In fact, the aliases **PARSER_CLASS**,
+ In fact, the aliases **PARSER_CLASS**,
To see a concrete example of **INCLUDE** in use, you can take a look at https://
Old line 76 / new line 65:
to only contain Java source code. Thus, writing:
-
+
is exactly the same as if you wrote: