Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parser should not fail on unknown elements #51

Open
barredterra opened this issue Dec 4, 2024 · 2 comments · May be fixed by #52 or #55
Open

Parser should not fail on unknown elements #51

barredterra opened this issue Dec 4, 2024 · 2 comments · May be fixed by #52 or #55

Comments

@barredterra
Copy link

barredterra commented Dec 4, 2024

There are and likely always will be some legal elements that we forgot to define in our python structure. The parser should not fail when it encounters such unknown elements.

For example, the formerly unknown element that is introduced in #50 should not have made the parser fail. The element is rather unimportant. I'd still like to read the rest of the invoice, even when this piece of information gets lost.

Reproduce

Terminal:

mkdir drafthorse-51
cd drafthorse-51
python3 -m venv env
source env/bin/activate
pip install drafthorse
curl https://raw.githubusercontent.com/ConnectingEurope/eInvoicing-EN16931/7ce3772aff315588f37e38b509173f253d340e45/cii/examples/CII_example5.xml > sample.xml
python

REPL:

>>> from drafthorse.models.document import Document
>>> samplexml = open("sample.xml", "rb").read()
... 
>>> doc = Document.parse(samplexml)
Traceback (most recent call last):
  File "<python-input-2>", line 1, in <module>
    doc = Document.parse(samplexml)
  File "/Users/raffael/Code/drafthorse-51/env/lib/python3.13/site-packages/drafthorse/models/elements.py", line 124, in parse
    return cls().from_etree(root)
           ~~~~~~~~~~~~~~~~^^^^^^
  File "/Users/raffael/Code/drafthorse-51/env/lib/python3.13/site-packages/drafthorse/models/elements.py", line 114, in from_etree
    getattr(self, name).from_etree(child)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^
  File "/Users/raffael/Code/drafthorse-51/env/lib/python3.13/site-packages/drafthorse/models/elements.py", line 114, in from_etree
    getattr(self, name).from_etree(child)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^
  File "/Users/raffael/Code/drafthorse-51/env/lib/python3.13/site-packages/drafthorse/models/elements.py", line 116, in from_etree
    raise TypeError("Unknown element {}".format(child.tag))
TypeError: Unknown element {urn:un:unece:uncefact:data:standard:ReusableAggregateBusinessInformationEntity:100}ReceivingAdviceReferencedDocument
@raphaelm
Copy link
Member

raphaelm commented Dec 9, 2024

Hmm, I'm undecided on this one. Currently, the library is "roundtrip-safe", i.e. if you open an XML and serialize it back out, you can be sure to have all information in there. With fixing this like #52 proposes, this would be lost, which could lead to unexpected bugs if someone relies on this property for some post-processing of documents. But I agree that there are legitimate cases for unknown elements to be in there. So we need to decide how important this roundtip-mode is to us and maybe either introduce a "strict mode" where the validation is turned back on, or store the unexpected elements in a way we can serialize them again (likely too much effort).

@barredterra
Copy link
Author

"strict mode" sounds fine to me. This could even be enabled by default, so we don't have any breaking changes.

@barredterra barredterra linked a pull request Dec 16, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants