Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OdfTableCellRange() is unusably slow for spreadsheets with whole-sheet formatting #285

Closed
Nnnes opened this issue Apr 12, 2024 · 5 comments
Milestone

Comments

@Nnnes
Copy link
Contributor

Nnnes commented Apr 12, 2024

This issue is very closely related to #64. A fix for that issue would most likely also fix this one. The root problem is how LibreOffice handles setting formatting for an entire spreadsheet and how this library interacts with such spreadsheets.

Steps to reproduce:

  1. Create a new spreadsheet document in LibreOffice Calc
  2. Select all cells (Ctrl-A, or click to the left of the column letter list)
  3. Set any kind of formatting (e.g. change the font)
  4. Save

This creates a content.xml with the following inner content:

<table:table-column table:style-name="co1" table:number-columns-repeated="16384" table:default-cell-style-name="ce1" />
<table:table-row table:style-name="ro1" table:number-rows-repeated="1048575">
    <table:table-cell table:number-columns-repeated="16384" />
</table:table-row>
<table:table-row table:style-name="ro1">
    <table:table-cell table:number-columns-repeated="16384" />
</table:table-row>

getColumnCount() and getRowCount() then respectively return 16384 and 1048576. This is a problem for functions like getCellRangeByName(...) and getCellRangeByPosition(...) that eventually call getCellCoverInfos(), which will then attempt to iterate over every cell in the table multiple times.

It might be possible to fix this by patching getOwnerCellByPosition() without having to deal with the spreadsheet size issue. I'll try to do so and submit a pull request if I can figure out what a covered cell is.

@svanteschubert
Copy link
Contributor

@Nnnes A pull request is very much welcome. Thanks in advance!
Regarding your question:
A covered cell is a cell that is "below" a spanned cell, see https://docs.oasis-open.org/office/OpenDocument/v1.3/os/part3-schema/OpenDocument-v1.3-os-part3-schema.html#element-table_covered-table-cell

@Nnnes
Copy link
Contributor Author

Nnnes commented Apr 13, 2024

I'm making progress. Laying out some of my thought process here, mostly for me to use as a reference for myself.

According to the spec, table:number-columns-repeated may only be used for <table:table-cell>/<table:covered-table-cell> if "[t]he cells are not merged horizontally or vertically" ("merged" I think means table:number-rows-spanned/table:number-columns-spanned is greater than 1, which excludes the covered cells - didn't see this clearly defined in v1.2 or v1.3 of the spec, but v1.1 mentioned it).

Similarly, table:number-rows-repeated may only be used if the rows "do not contain vertically merged cells". The spec doesn't mention horizontal merges, but if I try to collapse 6 identical rows with horizontally merged cells into a single <table:table-row table:number-rows-repeated="6"> by manually editing content.xml, LibreOffice seems to completely ignore the table:number-rows-repeated when I reopen the file, and only one row appears. So to avoid causing any breakage, I should not write anything that creates a horizontal merge in a repeated row.

As far as I can tell, all of this means I can handle repeated rows all at once in getCellCoverInfos() without affecting the result, similarly to how it's already being done here.

There are a few other places with patterns like for (int i = 0; i < table.getRowCount(); i++) (example 1, example 2) that I suspect could be improved in the same way. It might be worthwhile to write some kind of utility methods for iterating over rows/columns efficiently, but I'll save that work for a different pull request.

@svanteschubert
Copy link
Contributor

Or https://github.com/tdf/odftoolkit/blob/master/odfdom/src/test/resources/test-input/_tableRepeated.ods
You may run a query (some bash that peeks into the XML within the ZIP looking for certain tags) or query/search the JSON of office operations within
https://github.com/tdf/odftoolkit/tree/master/odfdom/src/test/resources/test-reference/operations
to identify some functionality you require.
The allowed/possible operations (or document changes) are defined:
https://tdf.github.io/odftoolkit/odfdom/operations/operations.html

Godspeed!

@svanteschubert
Copy link
Contributor

Cool! @Nnnes Thanks for the patch!
@mistmist Thanks for the review!

@mistmist mistmist added this to the 0.13.0 milestone Jul 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants