Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SWE002 No mirroring (IVS-304) #333

Open
wants to merge 3 commits into
base: development
Choose a base branch
from
Open

Conversation

aothms
Copy link
Collaborator

@aothms aothms commented Dec 27, 2024

@civilx64 @Ghesselink I had a bit of a nasty regression here that took me some time to figure out, see ee42006 (tests still failing though...)

I added a 'shortcut' to "The value of attribute {attribute} must be ... " because I wanted to reuse the operators we have defined there. However I don't want to apply it to an attribute value, but to the result of some calculation (matrix determinant) so I had to alias out the "of attribute {attribute}".

@gherkin_ifc.step('The value of attribute {attribute} must be {value_or_comparison_op}')
@gherkin_ifc.step('The value of attribute {attribute} must be {value_or_comparison_op} {display_entity:display_entity}')
@gherkin_ifc.step('The value of attribute {attribute} must be {value_or_comparison_op} the expression: {expression}')
@gherkin_ifc.step('The resulting value must be {value_or_comparison_op}')
def step_impl(context, inst, value_or_comparison_op:str, attribute:str=None, expression:str=None, display_entity=0):
"""
Compare an attribute to an expression based on attributes.
The {comparison_op} operator can be 'equal to', 'not equal to', 'greater than', 'less than', 'greater than or equal to', and 'less than or equal to'.
The {expression} should be composed by attribute values, and use the following operators:
+ : addition;
- : subtraction;
* : multiplication;
/ : division;
% : modulus;
** : exponentiation.
"""
binary_operators = {
'equal to' : operator.eq,
'not equal to' : operator.ne,
'greater than' : operator.gt,
'less than' : operator.lt,
'greater than or equal to' : operator.ge,
'less than or equal to' : operator.le,
}

In doing so the step became ambiguous with the following:

@gherkin_ifc.step('The {i:value_or_type} must be "{value}"')
def step_impl(context, inst, i, value):
values = [v.lower() for v in misc.strip_split(value, strp='"', splt=' or ')]
inst = recursive_unpack_value(inst)
if isinstance(inst, ifcopenshell.entity_instance): # redundant due to the statement 'Its entity type must be X; see e.g. ALS007 & ALS008'. This also allows to check for inheritance
inst = inst.is_a()
if inst.lower() not in values:
yield ValidationOutcome(inst=inst, expected= value, observed = inst, severity=OutcomeSeverity.ERROR)

However, Behave never warned me about the ambiguous step (maybe because we have more complex custom parse types??) so all of a sudden a large amount of tests started failing and it took me a long time to figure out that this was because the steps got directed to another implementation.

What didn't help was that I felt there was a bit of a gap in the feature file I first looked at, because we directly compare the value of an entity instance to the expected type, I would have expected something with "its entity type".

Given Its attribute Items
Then The value must be "IfcCurveSegment"

So we kind of saw this coming, we have the same functionality in a bunch of permutations:

  • With or without attribute resolution or direct application to preceeding value
  • With or without explicit binary predicates
  • With or without ability to add Expression
  • With or without automatic entity type resolution

This was expected because we keep adding adhoc things and because we can only match full sentences.

Another problem I have is with the cognitive load of the attributes.py impl I referenced first here. It's not very consistent, with all the branches: there is the expression, the empty / not empty handling and now my regex with binary operands. All have different behaviour, functionality and assumptions.

I think this would be a good candidate to clean up a bit. For example by uplifting the inline binary operators to a new implementation function with those binary ops as a full blown parse type. As a guideline (not strict rule) I think we should aim for no top-level branches in the implementation functions with different functionality. I know this can be tricky with making unique sentences.

  • Consistently apply the attribute access in a separate step, before the then step (is that possible??)
  • Consistently apply the is_a() step in a separate step before the then step (is that possible??)
  • Make sure all steps consistently support different value types: str, number etc.
  • Break down complex implementation functions into separate syntax patterns.

@Ghesselink as the most familiar person with all the rules and the most eager cleaner upper, is this something you'd like to tackle in the new year?

@civilx64
Copy link
Contributor

Thanks for doing the dirty work to get to a resolution. I am trending in the same direction of adding to this step implementation to handle comparison on results of a calculation with my current WIP for linear placement in LIP002. I like the change to The resulting value and will plan to follow that pattern as well.

Also thanks for pointing out the dissonance on ALS004. I agree that the step should involve a check of entity type and not of attribute value. I've captured that as a to-do for the backlog.

In terms of the larger clean up - yes, I can certainly agree that this implementation is a prime candidate for refactoring and housekeeping. I agree that @Ghesselink is the best candidate to lead but I will be happy to help if and where I can.

@aothms do you think this PR is ready for review as-is or is there the possibility that resolution of the failing tests (looks like IFC102 and SPS007) will affect your work here on SWE002?

@aothms
Copy link
Collaborator Author

aothms commented Dec 28, 2024

Well as you said, so far not really a resolution. It seems I've coded myself in a corner that's hard to get out. Simplest was to adapt the quotes in the IFC102 feature text so that the implementation logic could be a bit further simplified.

Not sure if you were already on board at that time @civilx64 but Geert and I have long discussed whether we should make the separation between parameters and control language more clear in the scenarios.

Something like:

Then [its type] is not .IfcProductRepresentation. (excluding subtypes)

Or some other permutation of brackets/parens/quotes/backticks/tildes to consistently indicate whether something is:

  • prose used for matching
  • schema constructs -> .ADDED.
  • freeform literal values -> 1.0 / "hi"
  • 'modifiers' -> excluding subtypes / A *required* relationship
  • binary or unary operator

I don't know how far we would go there, maybe distinguishing between datatypes is a bit overkill, but anyway.

  1. This provides clarity for the end user and rule writer because it is more clear what words can be alternated to form different scenarios.
  2. It helps in step disambiguation a bit like preventing SQL injection, if your pattern is something should be [{x}] without subtypes then the matching of x can never overflow into "without subtypes" because it is constrained by the square brackets.

Maybe also something to formalize and think about as a new year resolution.

Edit: OK tests do seem to be passing now (I don't have IFC4X3 schema locally, so not all tests run...) So I guess this is somewhat ready for review.

Edit2: By all means would be happy if you're involved in any refactoring, just thought your plate is a bit full with alignment rules still. New rules I guess is always a priority :)

@Ghesselink
Copy link
Contributor

I spent some time looking through the code, but I wasn’t sure where to start—it’s quite a mess. This is probably a good moment to invest some effort in cleaning it up, as I feel the current solution, its type is not "<Entity>" excluding subtypes, is rather hacky.. This isn’t your fault, but as you mentioned, these are older parts of the code that we kept expanding without taking a step back to refactor. Initially, we did attempt to come up with a strategy using pyparsing, but I think we abandoned that effort partly due to a lack of rules for predicting parsing patterns at the time. However, our rule set has grown significantly since then. Another con we noted was that the feature becomes less human-readable. We have now learned that very few people actually look at the feature files, and the souls that are brave enough to read them are probably also capable enough to read through brackets/quotes etc.

I tried to find some logical structure within the step implementations themselves, but perhaps we should reverse the process: start with a blank page and develop a coherent parsing strategy, rewrite the Gherkin steps using clearer language in the scenarios, and adjust the step implementations accordingly.

Another con we noted back then was that the feature becomes less human-readable. We have now learned that not many people look at this feature file, and the souls that are brave enough to read them are probably also capable enough to read through brackets/quotes etc.

To add even more complication; we are not even consistent with the way we quote some values. E.g.

Given RepresentationIdentifier = 'Axis' (single quote)

and

Then The value must be "Curve3D" (double quote)

When we mix these (using single instead of double), the behave parser doesn't pick it up ..

Consistently apply the is_a() step in a separate step before the then step (is that possible??)

I think this was the first step of 'cornering' ourselves; we stepped back from writing two separate steps for this because of inheritance. Instead of

 Given its type 
 Then the value must be X 
We started writing 
Its type is( or is not) X *excluding subtypes*

Because with he latter we still have access to inst in the step implementation function, which is needed if we'd like to say inst.is_a(X)

The good news is that the its type is X implementation is equivalent to the Given statement. This means we could perform a simple check and return an Outcome:

If the condition is true, yield an Outcome with PASSED severity.
If false, yield an Outcome with an ERROR.
For instance;
Then Its Entity Type is 'IfcLocalPlacement'

Consistently apply the attribute access in a separate step, before the then step (is that possible??)

Again, sometimes yes and sometimes not ..
I've modified some gherkin rules from
Then The value of attribute X must be Y
to

Given Its attribute X 
Then the value must be Y

In these cases, the simplification works. Issues start arising when we say the attribute must be empty. When we separate these steps, the last The value must be empty doesn't get reached (context.instances is simply empty and we don't proceed to the next gherkin step). We could find a way around this, we could also simply reuse the given step

@gherkin_ifc.step("{attribute} {comparison_op:equal_or_not_equal} {value}")

Perhaps a good plan would then be to

  • Identify which implementations are similar and re-write them
  • Clarification of parameters and control language in the scenarios.

For the latter, let's start with the mentioned categories;

prose used for matching ; opening and closing square brackets []
e.g.

Then [its type] is not IfcWall
Then [Its value] must be greater than 0 (the Given equivalent would be;  [Its value] is greater than 0
  • Schema constructs -> dots; .
    e.g.
Its type is .IfcWall. 
.RepresentationType. is .Annotation.
The .SegmentLength. of the segment must be 0
.OperationType. is empty
  • Freeform literal values -> quotes "smth"
    e.g.
RepresentationType is "Annotation"
The SegmentLength of the segment must be "0"
Its attribute "Representation"
  • modifiers' -> excluding subtypes / A required relationship -> more than / less than symbols (I'd rather not use opening and closing brackets due to the ambiguety with sentences containing it; e.g.instance(s))
There must be less than 1 instance(s)  <excluding subtypes>
Its entity type is IfcElement <including subtypes>
  • binary or unary operator -> asterix *
Its value must be *greater than* 0
There must be *at least* 1 instance(s) of IfcBuilding 
There must be *at most* 4 instance(s) of IfcSite
There must be *less than* 1 instance(s) of IfcBuilding excluding subtypes
The values must be *identical*
The RepresentationType must be *empty*

@Ghesselink
Copy link
Contributor

Some WIP progress trying it out :)
#336

@aothms
Copy link
Collaborator Author

aothms commented Jan 5, 2025

I like the readability of what you're proposing here. But only glanced through it quickly.

Then [Its value] must be greater than 0 (the Given equivalent would be; [Its value] is greater than 0

  1. Do we have something for this actually? i.e to couple must be with then and is with given statements, but otherwise using the same impl function. Could this be something to further generalize in our decorator?

  2. Another thing to consider is whether the string statement we pass to 'our decorator' is always passed through behave unaltered or whether we allow for some 'multiplexing'. We could for example say that we always write "the something is bla", and that we pass this to behave.given and then pass "the something is bla".replace('is', 'must be') to behave.then. I'm paraphrasing here, but you get the idea. This obviously add a layer of magic that is hard to decipher for new comers, but I'd argue it's a small step of additional surprise on top of the already quite complex decorator behaviour.

  3. The same applies maybe to the pyparsing grammars or the parse types constructed from enums, nothing forbids us to build our own intelligence here when we allow the string to modified before feeding to behave. In particular maybe we could be a little bit more intelligent with type annotations. When we have a function with def impl(context, inst, param : typing.Literal["should be", "must"]). Then we could transparently unpack these type annotations to a constraint in gherkin.

Anyway, just some additional ideas... Let's discuss

@Ghesselink
Copy link
Contributor

Do we have something for this actually? i.e to couple must be with then and is with given statements, but otherwise using the same impl function. Could this be something to further generalize in our decorator?

I think so, yes. We're now categorizing implemenations into a given, then and steps, it would probably be better to sort on domain (alignment, values, geometry) instead.
For instance, in the given folder there is the implementation
@gherkin_ifc.step("{attribute} {comparison_op:equal_or_not_equal} {value}")
This one is used for both Given and Then statements at the moment. Some examples;

Given PredefinedType = 'POSITION' or 'STATION' -> ALB005
Then Its entity type is 'IfcGeometricRepresentationContext' -> GEM051

Furthermore, many Then steps can be reformulated to fit into this category.

The value of attribute RepresentationType must be Curve3D
---> 
RepresentationType = Curve3D
---> Re-written 
.RepresentationType. must be "Curve3D"

So the pattern we use would probably be something like

@step_impl('.{schema_construct}.is"{literal_string}"') -> for Given 
@step_impl('.{schema_construct}.must be"{literal_string}"') -> for Then
def step_impl(context, inst, schema_construct, literal_string):
      if not check: 
           return ERROR
       else: 
            return PASSED 

I'm paraphrasing here, but you get the idea.

So instead of the previous example, we'd have then simply one statement
@step_impl('.{schema_construct}.must be"{literal_string}"') because .RepresentationType. is "some_value" would be altered in our decorator to be captured in this implementation.


hard to decipher for new comers

I'm afraid the boat 'easy for new comers' sailed away a while ago.
Although I noticed with the refactoring that some old implementations can also be re-written. This reduces the amount of implementations. Arguably it would be nicer for newcomers even since there would be one obvious way of selecting instances for validation. But maybe this is a naive thought ..
For example (the example above also applies)
```
Given An IfcAlignment

Then It must be nested by only the following entities: IfcAlignmentHorizontal, IfcAlignmentVertical, IfcAlignmentCant, IfcReferent

Can be re-written to
Given An .IfcAlignment.
Given Its attribute .IfcRelNests.
Given Its attribute .RelatedObjects.

Then [Its type] *is equal to* .IfcAlignmentHorizontal. or .IfcAlignmentVertical. or .IfcAlignmentCant. or .IfcReferent.

---
> The same applies maybe to the pyparsing grammars or the parse types constructed from enums

This would be nice, to really have control over where our statements are matched. This worked out well with the enum values; the ambiguity result in the issues we're having, but when we match literal strings this is not the case anymore.   

@E00050
Feature: SWE002 - Mirroring within IfcDerivedProfileDef shall not be used

The rule verifies that IfcDerivedProfileDef is
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

truncated description?

@evandroAlfieri evandroAlfieri changed the title SWE002 No mirroring SWE002 No mirroring (IVS-304) Jan 8, 2025
@civilx64
Copy link
Contributor

civilx64 commented Jan 9, 2025

Not sure if you were already on board at that time @civilx64 but Geert and I have long discussed whether we should make the separation between parameters and control language more clear in the scenarios.

I don't remember it but it does make sense.

I think so, yes. We're now categorizing implemenations into a given, then and steps, it would probably be better to sort on domain (alignment, values, geometry) instead.

+1

The same applies maybe to the pyparsing grammars or the parse types constructed from enums, nothing forbids us to build our own intelligence here when we allow the string to modified before feeding to behave. In particular maybe we could be a little bit more intelligent with type annotations

I would vote against heading this direction and sticking with something more transparent like Geert's example

@step_impl('.{schema_construct}. is "{literal_string}"') -> for Given 
@step_impl('.{schema_construct}. must be "{literal_string}"') -> for Then
def step_impl(context, inst, schema_construct, literal_string):
      if not check: 
           return ERROR
       else: 
            return PASSED 

So how do we move forward? The WIP branch appears to be a solid step in a great direction but unfortunately is blocking any further rule development ATM. I would like to finalize LIP002 but there are considerable moving parts on both the geometric calculation and comparison side:

  1. should I use mpmath for LIP002?
  2. can we make some overall consolidation of geometric checking within a particular tolerance?
  3. can we settle on a single place to determine the precision / epsilon for this tolerance?
  4. Do we need to refactor compare_with_precision?

as well as on the feature file itself:

  1. I think the step should look like .Coordinates. must be *equal to* "the calculated linear placement" ...???

@Ghesselink
Copy link
Contributor

The WIP branch appears to be a solid step in a great direction but unfortunately is blocking any further rule development ATM.

Don't let it block you :-) If it's merged before the WIP, I'll address the conflicts (some help would probably be nice).

Do we need to refactor compare_with_precision?

I'm not sure if it answers the question, but I think this
The value of attribute RadiusOfCurvature must be equal to the expression: HorizontalLength / ( EndGradient - StartGradient )
Will be in a seperate statement.

I think the step should look like .Coordinates. must be equal to "the calculated linear placement" ...???`

I'd think just

.Coordinates. must be *equal to* the calculated linear placement

And then behave will match @('.{schema_construct}.must be *binary_pred* the calculated linear placement')
But as said, I'd say try and push it with the current codebase and rebase the WIP branch / handle the merge conflicts (depending on the merge sequence)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants