-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Incorrect behavior of OdfElement.getTextContent() #229
Comments
i don't know the ODFDOM code well, but as a general rule about ODF: inside of a paragraph (text:p/text:h) and its child elements, there is character content which is part of the paragraph text (most elements), and there is character content which is part of some nested object and not part of the paragraph text. there are a bunch of elements for drawing shapes, draw:frame etc. and also office:annotation, that should comprise most of the latter case - these should probably not be included in any API that promises some sort of text content of "the paragraph" or "the table cell". the criteria for what is considered paragraph text are in section "6.1.2 White Space Characters" of ODF 1.3 part 3, and i think the text of a table cell should simply be the concatenation of its paragraphs. |
You are absolutely right, Todd. OdfElement has the base functionality to concatenate the text content: but the every text node containing element like OdfTextSpan should override this method and define its specific behavior. I am uncertain atm, if we have a way in the ODF Specification to differentiate this behavior, if not it would be worth to add it. Taking this over issue over! Thanks for pointing out! |
The ODF specification explains in the bullet list of 6.1.1 all elements containing text that should not be considered, see The funcationality is currently spread and needs to harmonized, e.g. this whitespace collapsing function is implemented in the characters(..) method of ChangesFileSaxHandler #220 and this #229 are quite related and will likely be fixed in one patch. |
While investigating #220, I came across
OdfTableCell.getOdfElement().getTextContent()
, which is defined byorg.w3c.dom.Node.getTextContent()
and implemented inOdfElement
. My interpretation of theNode.getTextContent()
documentation is that it should recursively retrieve the text contents of all child nodes and concatenate them together. However, theOdfElement
implementation only looks at the children, rather than all descendants.I think it is also worth noting that
OdfWhitespaceParser.getText(Node)
appears to do whatOdfElement.getTextContent()
is supposed to do. Or at least something very similar.The text was updated successfully, but these errors were encountered: