Custom grammar #36

Open
Ghesselink opened this issue Jan 17, 2023 · 5 comments

Comments

@Ghesselink
Contributor

Ghesselink commented Jan 17, 2023

As an addition to behave's default step parsing, a custom grammar could specify which values a step permits. For example:
Given all instances of IfcGeometricRepresentationContext[without subtypes]

A solution would then be to use pyparsing:

ifc_entity = Ifc\w+
grammar = ifc_entity + [ "with subtypes" | "without subtypes" ]

Advantages

  • It's more descriptive in terms of what values it can accept
  • It's stricter in making sure you don't supply a wrong pattern
  • You get much better error messages when it doesn't comply
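The pseudocode and the advantages above can be sketched with pyparsing roughly as follows. This is only a minimal illustration, not the code applied in the repository: it assumes the qualifier is optional and written in square brackets directly after the entity name, as in the example step, and the names `subtype_flag`, `entity`, and `subtypes` are made up for the example.

```python
from pyparsing import Optional, ParseException, Regex, Suppress, oneOf

# Entity names are assumed to start with "Ifc", as in the pseudocode above.
ifc_entity = Regex(r"Ifc\w+")("entity")
subtype_flag = oneOf(["with subtypes", "without subtypes"])("subtypes")
# Assumption: the qualifier is optional and wrapped in square brackets.
grammar = ifc_entity + Optional(Suppress("[") + subtype_flag + Suppress("]"))

step_value = "IfcGeometricRepresentationContext[without subtypes]"
result = grammar.parseString(step_value, parseAll=True)
entity = result["entity"]
subtypes = result["subtypes"]

# A value outside the permitted set is rejected with a positioned error.
try:
    grammar.parseString("IfcWall[no subtypes]", parseAll=True)
    error_message = None
except ParseException as exc:
    error_message = str(exc)
print(error_message)
```

The exception text includes the character position at which parsing stopped, which is what makes the error easier to act on than a step that silently fails to match.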

This issue is also a place to link relevant code snippets and PR comments.

@Ghesselink
Contributor Author

Applied in 6a125d3

@JTurbakiewicz
Contributor

Hi,

I am wondering what the advantages are of using pyparsing versus using regex in the decorator, as in the examples below:

https://behave.readthedocs.io/en/stable/api.html#step-parameters
https://jenisys.github.io/behave.example/step_matcher/re_matcher.html

The first thing that comes to my mind is that regex gets unreadable really quickly. With pyparsing this is perhaps less of an issue.

@Ghesselink
Contributor Author

Ghesselink commented Feb 16, 2023

I think you can achieve the same with regex as well, but pyparsing seems far more readable, as you mention. There are some additional advantages that are probably also doable with regular expressions, but I found them quite intuitive with pyparsing.

E.g., it's possible to modify a parsed value immediately. For instance, the following line of code converts a string to an integer:

integer = Word(nums)("integer").setParseAction(lambda t: int(t[0]))

One benefit of this is that the parsed sentence is stored in a dictionary-like result, so the integer can be retrieved simply with parse['integer'].
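As a small self-contained sketch of that conversion (the input string "42" is just an illustrative value):

```python
from pyparsing import Word, nums

# The parse action runs as soon as the token matches and replaces the
# matched string with an int; the results name "integer" then refers
# to the converted value.
integer = Word(nums)("integer").setParseAction(lambda t: int(t[0]))

parse = integer.parseString("42")
value = parse["integer"]
print(type(value).__name__, value)
```

Because the conversion happens inside the grammar, the step implementation does not need its own `int(...)` call or error handling for non-numeric input.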

It is also possible to group/nest statements. For example, in the following two sentences:

  • Then Each "IfcAlignment" must be nested by exactly 1 instance(s) of "IfcAlignmentHorizontal"
  • Then Each "IfcAlignmentHorizontal" is nested by a list of only instance(s) of "IfcAlignmentSegment"

Both can be parsed with

condition_stmt = Optional(oneOf(conditions)('condition') | 
                        Group(Group(Word(alphas) + Optional(Word(alphas)))('operator') + integer('int'))('operator_stmt')
                        )

For the first sentence,
Then Each "IfcAlignment" must be nested by exactly 1 instance(s) of "IfcAlignmentHorizontal"
the parsed values can be used directly in the Python implementation:

import operator
operator_stmt = parse.get('operator_stmt', False) # exactly 1
stmt_to_op = {'exactly': operator.eq, "at most": operator.le}
constraint = ' '.join(operator_stmt['operator']) # exactly
num = operator_stmt['int'] # 1

The same applies to
Then Each "IfcAlignmentHorizontal" is nested by a list of only instance(s) of "IfcAlignmentSegment"
condition = parse.get('condition', False) # 'a list of only'
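Put together, the grouping and lookup described above might look like the sketch below. It only parses the condition fragment of a step, not the whole sentence; the contents of `conditions` are an assumption (the thread does not show the real list), and the observed count of 1 is a made-up value used to show how `stmt_to_op` could be applied.

```python
import operator

from pyparsing import Group, Optional, Word, alphas, nums, oneOf

# Assumed contents; the real list of fixed conditions is not shown in the thread.
conditions = ["a list of only"]

integer = Word(nums).setParseAction(lambda t: int(t[0]))

# Either a fixed condition phrase, or a one/two-word operator plus an integer.
condition_stmt = Optional(
    oneOf(conditions)("condition")
    | Group(
        Group(Word(alphas) + Optional(Word(alphas)))("operator") + integer("int")
    )("operator_stmt")
)

stmt_to_op = {"exactly": operator.eq, "at most": operator.le}

# Operator branch: nested names are read from the named group.
parse = condition_stmt.parseString("at most 2")
operator_stmt = parse["operator_stmt"]
constraint = " ".join(operator_stmt["operator"])
num = operator_stmt["int"]
satisfied = stmt_to_op[constraint](1, num)  # hypothetical observed count of 1

# Fixed-condition branch: only the "condition" name is set.
parse_list = condition_stmt.parseString("a list of only")
condition = parse_list.get("condition", False)
```

Using `get` with a default, as in the snippets above, lets the step implementation branch on which alternative actually matched.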

There is also another example for the relationship_type (e.g. is nested by, nests, assigning, etc.) in PR #48. More specifically:

relationship_type = Group(
    Or(map(Literal, list(conjugate_verbs(relationship_verbs))))('relationship_type') +
    Optional("by")  # e.g. useful for the difference between 'Nests' and 'nestedBy'
)('relationship_type')

Pyparsing also supports nested statements like this through recursion. I feel this would be a bit more robust and readable, as there would be fewer 'Optional' statements and less dependence on order. But I am still experimenting with it.

Something like:

relationship_fragment = Forward() # create forward reference for recursion
relationship_fragment <<= (condition_sentence | relationship_type) # | other type of grammar | other type of grammar

# define grammar for entire string
grammar = OneOrMore(relationship_fragment)

# parse the string using the grammar
parse = grammar.parseString(input_string)
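A runnable version of that idea could look like the following sketch. The two fragments are simplified, hypothetical stand-ins (the project's real `condition_sentence` and `relationship_type` are not reproduced here), and the word lists are invented for the example. It shows that the fragments can appear in either order without extra `Optional` wrappers.

```python
from pyparsing import Forward, Group, OneOrMore, Optional, Word, nums, oneOf

integer = Word(nums).setParseAction(lambda t: int(t[0]))

# Hypothetical fragments standing in for the project's real grammar pieces.
condition_sentence = Group(
    oneOf(["exactly", "at least"])("operator") + integer("int")
)("condition")
relationship_type = Group(
    oneOf(["nested", "nests"])("verb") + Optional("by")
)("relationship_type")

# Forward reference, filled in afterwards with the alternatives.
relationship_fragment = Forward()
relationship_fragment <<= condition_sentence | relationship_type

# One or more fragments, in any order.
grammar = OneOrMore(relationship_fragment)

first = grammar.parseString("nested by exactly 1")
second = grammar.parseString("exactly 1 nested by")
print(first["relationship_type"]["verb"], first["condition"]["int"])
print(second["relationship_type"]["verb"], second["condition"]["int"])
```

Each fragment here must consume at least one token; wrapping an `Optional` expression directly in `OneOrMore` is best avoided, since an alternative that can match nothing gives the repetition no clear stopping point.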

@Ghesselink
Contributor Author

When considering the parsing of IFC versions and MVDs, is it possible to achieve this without using the " symbols, in order to maintain consistency?

Given A file with Model View Definition "CoordinationView"
And A file with Schema Identifier "IFC2X3"

@evandroAlfieri
Collaborator

@aothms @Ghesselink is this still valid or obsolete?
