Differences
This shows you the differences between two versions of the page.
include: revision of 2020/02/14 00:19 by revusky (old) compared with revision of 2023/03/03 16:16 by revusky (new)
Line 1 (old) / Line 1 (new):
- ====== The INCLUDE Statement ======
+ <
- JavaCC 21's **INCLUDE** statement allows you to break up your grammar file into multiple physical files. It would typically look like this:
+ # The INCLUDE
- INCLUDE("
+ Congo'
- //This feature is not present in legacy JavaCC.//
+ INCLUDE "
+
+ *This feature is not present in legacy JavaCC.*
The motivation behind **INCLUDE** should be obvious. By allowing you to reuse a base grammar or generally useful fragment in various files, you can avoid the copy-paste-modify *antipattern* that would have been necessary when using legacy JavaCC. Generally speaking, being able to organize a large grammar into multiple physical files can be a big win in terms of maintainability.
Line 11 (old) / Line 13 (new):
Still, as they say, the devil is in the details, and there are various wrinkles that need to be covered here.
- ===== The DEFAULT_LEXICAL_STATE setting
+ ## The DEFAULT_LEXICAL_STATE setting
- In legacy JavaCC, if you defined a token production without specifying a lexical state, any lexical definitions belonged to a lexical state called "
+ In legacy JavaCC, if you defined a token production without specifying a lexical state, any lexical definitions belonged to a lexical state called "
- Thus, JavaCC 21 introduces
+ Thus, CongoCC has a setting called **DEFAULT_LEXICAL_STATE**: any lexical specification that does not name a lexical state is placed in that state. A JSON grammar would therefore likely have something like this at the top:
+
+
+ DEFAULT_LEXICAL_STATE=JSON;
- options {
-
- }

In that case, any grammar for a language that wants to handle embedded JSON data would presumably define its own "
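To make the rule concrete, here is a small Java model of how a token pattern that names no lexical state could end up in a state. It is a hypothetical sketch, not CongoCC code: the types TokenDef, GrammarFile and ResolvedToken and the method resolve are invented for illustration, each grammar file (top-level or INCLUDEd) is assumed to carry its own DEFAULT_LEXICAL_STATE value, and a file that leaves the setting unset is assumed to fall back to legacy JavaCC's DEFAULT state.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, not CongoCC's implementation: every grammar file is
// assumed to carry its own DEFAULT_LEXICAL_STATE, and a file that leaves it
// unset falls back to legacy JavaCC's DEFAULT state.
public class LexicalStateDemo {

    // A token pattern as written in a grammar file; explicitState is null
    // when the token production does not name a lexical state.
    record TokenDef(String name, String explicitState) {}

    // One grammar file and its DEFAULT_LEXICAL_STATE value (null if unset).
    record GrammarFile(String fileName, String defaultLexicalState, List<TokenDef> tokens) {}

    // A token together with the lexical state it was assigned.
    record ResolvedToken(String name, String state) {}

    static List<ResolvedToken> resolve(List<GrammarFile> files) {
        List<ResolvedToken> resolved = new ArrayList<>();
        for (GrammarFile file : files) {
            // The file's own setting wins; otherwise use the legacy name.
            String fallback = file.defaultLexicalState() != null
                    ? file.defaultLexicalState()
                    : "DEFAULT";
            for (TokenDef token : file.tokens()) {
                // An explicitly named state always overrides the file default.
                String state = token.explicitState() != null ? token.explicitState() : fallback;
                resolved.add(new ResolvedToken(token.name(), state));
            }
        }
        return resolved;
    }

    public static void main(String[] args) {
        // The including Foo grammar, which leaves the setting unset.
        GrammarFile foo = new GrammarFile("Foo grammar", null,
                List.of(new TokenDef("IDENTIFIER", null), new TokenDef("SEMICOLON", null)));
        // An embeddable JSON grammar that sets DEFAULT_LEXICAL_STATE=JSON.
        GrammarFile json = new GrammarFile("JSON grammar", "JSON",
                List.of(new TokenDef("OPEN_BRACE", null), new TokenDef("STRING_LITERAL", null)));
        for (ResolvedToken token : resolve(List.of(foo, json))) {
            System.out.println(token.name() + " -> " + token.state());
        }
    }
}
```

Run as is, the two Foo tokens are assigned DEFAULT while the two tokens from the JSON grammar are assigned JSON, so in this model the two sets of patterns never compete within the same lexical state.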
Line 25 (old) / Line 27 (new):
Actually, at the moment, **DEFAULT_LEXICAL_STATE** is the only setting you can put in an **INCLUDE**d grammar that has any effect. All other options are simply ignored, since they are presumably set in the top-level *including* grammar. In legacy JavaCC, if you defined a token production without specifying a lexical state, those patterns are matched in a lexical state called "
- ===== Wrinkles with Code Injection
+ ## Wrinkles with Code Injection
- JavaCC still supports the legacy JavaCC constructs of **PARSER_BEGIN...PARSER_END** and **TOKEN_MGR_DECLS**. (For how much longer, I am not making any promises.) However, those constructs are ignored
+ You can
- You can still *inject* code into the generated parser or lexer class from within an included grammar, but you need to write something like:
-
-
- {
- ...
- }
{
...
Line 41 (old) / Line 38 (new):
or:
- INJECT(**LEXER_CLASS**) :
+ INJECT LEXER_CLASS :
- {
- ...
- }
{
...
Line 51 (old) / Line 45 (new):
JavaCC 21 will replace the **PARSER_CLASS** and **LEXER_CLASS** aliases with the appropriate names, i.e. the actual class names of the XXXParser or XXXLexer being generated. So, if you have a Foo language in which you want to embed JSON expressions,
- INJECT(JSONParser) :
+ INJECT JSONParser :
{
...
}
- {
- ...
- }
-
- <
because the parser class we are generating is not JSONParser, it is FooParser! However, the person writing a generally useful JSON grammar that can be embedded in other grammars does not know the name of the parser (or lexer) class that is being generated. So, he needs to use the alias **PARSER_CLASS** or possibly **LEXER_CLASS** for the injected code to be included.
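The following Java sketch models why an embeddable grammar should inject into the PARSER_CLASS alias rather than into a concrete class name. It is illustrative only: expandAlias and injectionTarget are hypothetical helpers, the generated classes are assumed to be named after a grammar prefix (FooParser, FooLexer), following the XXXParser/XXXLexer pattern in the text, and every other generated class is ignored. The model only reports where injected code would land; it does not claim what CongoCC itself reports when a target class is not generated.

```java
import java.util.Optional;
import java.util.Set;

// Hypothetical model of INJECT alias expansion; not CongoCC's real code.
public class InjectAliasDemo {

    // Replaces the PARSER_CLASS / LEXER_CLASS aliases with the class names
    // assumed to be generated for the given grammar prefix.
    static String expandAlias(String injectTarget, String prefix) {
        return switch (injectTarget) {
            case "PARSER_CLASS" -> prefix + "Parser";
            case "LEXER_CLASS" -> prefix + "Lexer";
            default -> injectTarget; // a concrete class name is used verbatim
        };
    }

    // Returns the generated class that would receive the injected code,
    // or an empty Optional if the target names no generated class.
    static Optional<String> injectionTarget(String injectTarget, String prefix) {
        String className = expandAlias(injectTarget, prefix);
        Set<String> generated = Set.of(prefix + "Parser", prefix + "Lexer");
        return generated.contains(className) ? Optional.of(className) : Optional.empty();
    }

    public static void main(String[] args) {
        // The same JSON grammar, built on its own ("JSON") and included by Foo.
        for (String prefix : new String[] {"JSON", "Foo"}) {
            for (String target : new String[] {"JSONParser", "PARSER_CLASS"}) {
                System.out.printf("%s grammar, INJECT %s -> %s%n", prefix, target,
                        injectionTarget(target, prefix).orElse("(no generated class)"));
            }
        }
    }
}
```

Built on its own, the JSON grammar sends both forms to JSONParser; once it is included by the Foo grammar, only the PARSER_CLASS form still reaches a generated class (FooParser), which is exactly why the alias is needed.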
Line 65 (old) / Line 54 (new):
So, do not be surprised when the code within PARSER_BEGIN...PARSER_END is ignored if it is within an INCLUDEd grammar. You need to write INJECT(PARSER_CLASS) to achieve the desired result.
- In fact, the aliases **PARSER_CLASS**,
+ In fact, the aliases **PARSER_CLASS**,
To see a concrete example of **INCLUDE** in use, you can take a look at https://
Line 76 (old) / Line 65 (new):
to only contain Java source code. Thus, writing:
-
+
is exactly the same as if you wrote: