Hi,
I ran into a few issues when trying to parse the IFC EXPRESS schema.
One of them is that something like `TrueNorth` gets parsed (e.g. in an `expression` rule) as a boolean literal, because the `literal` rule has higher priority than the `simple_id` rule. Simply swapping the order of these rules doesn't help either, because then something like `True` is no longer parsed as a boolean literal.
I think there are two ways to solve this at its core (I suspect it's just a symptom of further issues that may arise from ambiguous parsing):
Either all the basic parsing rules check that their input isn't matched by another basic parsing rule (e.g. `simple_id` checks that it isn't a literal or anything else that could also look like a `simple_id`), or a separate lexer/tokenizer weeds these cases out up front.
I personally prefer using a lexer: it makes it easier to restrict the problem space and to build the parser as an abstraction on top of it. I have also had issues with weird parsing ambiguities in the past when not using a separate lexer (in much simpler languages). I think the BNF grammars of STEP and EXPRESS should allow tokenizing/lexing the whole input without having to think about modal lexing etc., but I'm not sure yet.
I have actually started writing a parser/lexer for the EXPRESS language, though I'm not sure yet whether I will take this project much further (I guess I underestimated the scope of supporting STEP completely).
My original motivation was better error recovery/messages (by using something like chumsky as the parser combinator library).
I think the lexer is almost complete, so you may be interested in this:
https://github.com/Philipp-M/express-parser/blob/6464b29e5eb14d70b0445b84567ed58fdfd144b6/src/lexer.rs
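
To illustrate what I mean by letting the lexer sort out the `TrueNorth` case, here is a minimal sketch. It is not the code from the lexer linked above: the `Token` type and `lex_word` function are made up for this example, and I'm assuming that logical literals are `TRUE`, `FALSE` and `UNKNOWN` and that keywords are matched case-insensitively. The idea is to consume the longest identifier-like word first and only then decide whether the whole word is a literal.

```rust
// Hypothetical token type for illustration only.
#[derive(Debug, PartialEq)]
enum Token {
    LogicalLiteral(String), // assumed: TRUE, FALSE, UNKNOWN
    SimpleId(String),
}

/// Lex one identifier-like word from the start of `input`.
/// Returns the token and the remaining input, or `None` if `input`
/// does not start with an ASCII letter.
fn lex_word(input: &str) -> Option<(Token, &str)> {
    let first = input.chars().next()?;
    if !first.is_ascii_alphabetic() {
        return None;
    }
    // Maximal munch: take the longest run of letters, digits and '_'.
    let end = input
        .find(|c: char| !(c.is_ascii_alphanumeric() || c == '_'))
        .unwrap_or(input.len());
    let (word, rest) = input.split_at(end);
    // Classify only after the whole word is known, so a keyword prefix
    // such as "True" in "TrueNorth" can never match on its own.
    let token = match word.to_ascii_lowercase().as_str() {
        "true" | "false" | "unknown" => Token::LogicalLiteral(word.to_string()),
        _ => Token::SimpleId(word.to_string()),
    };
    Some((token, rest))
}
```

With this, `lex_word("TrueNorth : x")` yields a `SimpleId` for the whole word, while `lex_word("True;")` yields a `LogicalLiteral`. The parser on top then only ever sees tokens, so the ordering of `literal` vs. `simple_id` no longer matters.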