Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Normalized xpointers, complete incomplete tables #328

Merged
merged 11 commits into from
Feb 24, 2020

Conversation

poire-z
Copy link
Contributor

@poire-z poire-z commented Feb 23, 2020

See individual commit messages for details.
Some additional notes:

(Upstream) createXPointer/toString: normalized xpointers
Bits from 3 commits by @pkb from buggins/coolreader#125, stuff I never dared doing :) but glad we finally have that: XPointers without the boxing elements, that should survive changes in boxing/CSS display: property values.
Note that I didn't pick their own CRHistFile xpointers migration and gDOMVersionRequested update (that I initially discussed there with them), as we are actually using that CRHistfile in this little bit of frontend code that deal with orientation/window size change - and given the hard time I had to make that work, and the fact that CRHistFile wasn't really easy to understand, best to not take risks and leave it untouched.

toStringUsingNames(): rename to toStringV2()
toStringV2(): output [1] when first but not single
DOM_VERSION_WITH_NORMALIZED_XPOINTERS 20200223
are small adaptations to that Upsteam code.

CSS/List items: skip boxing elements when walking is, I hope, the last needed bit to finally get transparent boxing elements: instead of iterating all the parent's children in a possibly-boxed-DOM, and so possibly missing some of them, we start from the first child and use getUnboxedNextSibling() to iterate all original siblings by walking up and down boxing elements.
This is a bit more expensive, but not really noticable on most books (but on the nwt-E crappy EPUB from #277 (comment), there is a 25% increase of load & rerendering times as it abuses :first-child/nth-child() pseudoclasses - and it's not because of the next commit CSS: parse pseudoclass nth_child(3n+2), same 25% without it).

Tables: complete incomplete tables was discussed and detailed alongside #325 (comment). This may reveal new content that was previously just discarded/invisible.
Follow rules from https://www.w3.org/TR/CSS22/tables.html#anonymous-boxes and https://www.w3.org/TR/css-tables-3/#fixup except that we do 3) Generate missing parents before 2) Generate missing child wrappers.

This comes with:
DOM_VERSION_CURRENT 20200223
DOM_VERSION_WITH_NORMALIZED_XPOINTERS 20200223
so only newly opened books will benefit from them.
But I'll bump this with some frontend code to propose a migration/upgrade, or do it without asking when no risk. Still need a bit of thinking on this.

image

image


This change is Reviewable

SimpleCacheFileHeader( lUInt32 dirtyFlag ) {
memcpy( _magic, _compressCachedData ? COMPRESSED_CACHE_FILE_MAGIC : UNCOMPRESSED_CACHE_FILE_MAGIC, CACHE_FILE_MAGIC_SIZE );
_dirty = dirtyFlag;
_dom_version = gDOMVersionRequested;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. 👍

Copy link
Member

@Frenzie Frenzie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

@poire-z
Copy link
Contributor Author

poire-z commented Feb 23, 2020

Rebasing to just re-order 2 commits (that I had badly ordered when grouping them by topic):

toStringV2(): output [1] when first but not single depends on stuff added by CSS/List items: skip boxing elements when walking

@poire-z poire-z force-pushed the normalized_xpointers branch from d77164e to 9f90509 Compare February 23, 2020 16:08
Unserialize/serialize xpointers (node position in the DOM)
in a normalized way, conformed to the original DOM tree,
without any of the added autoBoxing/floatBox/inlineBox
wrapping elements needed for a correct rendering, and that
may change depending on CSS styles and rendering features.
This should allow bookmarks/highlights/lastposition
to survive such changes, and avoid future needs
to bump DOM_VERSION_CURRENT when just adding features.
and toStringUsingNamesOld() to toStringV1(), for symmetry
with createXPointerV2()/createXPointerV1(), and ease of
just renaming toString() to toStringV1() when debugging
boxing elements.
Update this to a new DOM_VERSION, as we've been using
20180528 for quite some time (unlike upstream)
Adds getUnboxedParent(), getUnboxedFirstChild(),
getUnboxedLastChild() getUnboxedPrevSibling() and
getUnboxedNextSibling() to allow walking a node's
children and sibling as if our internal boxing
elements weren't there.
Used in the bits that need that, so fixing them:
- CSS descendant and sibling combinators
- CSS pseudoclasses :first-child ... :nth-last-of-type()
- List items walking to get its marker incremented
  value and max width
Previously limited to values "odd" and "even".
toStringV1() would output "/div/" when the element
is the single DIV in its parent children, but would
output "/div[1]/" when parent has more than one DIV
and element is the first one of them, which is nice.
toStringV2() was outputing "div" in both cases.
This has toStringV2() do as toStringV1().
(Note that createXPointerV2() would find the correct
node with both kind of input.)
Adds a flag COMPLETE_INCOMPLETE_TABLES to have enhanced
rendering code add missing children and parents table
elements to those missing them, so they can be drawn
correctly instead of simply dropped.
This applies to HTML TR/TD elements missing a TABLE
parent (which must be rare), but also to any HTML
element which have a "display: table-cell" or
"display: table" while their parent or children are
lacking the expected "display: table-row", which
is surprisingly not rare (some books use that for
footnotes or TOC).
Also properly handle "display: inline-table" by having
it handled like a table, so missing table elements will
be added if it initially have none, which will ensure
the expected rendering.
Also, when not enable, and when we can, render and
draw as erm_killed unproper table elements that we
know would not be shown.
Also convert unauthorized display style on IMG.
Allows discarding previous cache when DOM version
is upgraded.
As we increased the size of RenderRectAccessor by 4 when
adding Enhanced Block Rendering (66dac2c).
@poire-z
Copy link
Contributor Author

poire-z commented Feb 25, 2020

There seems to be a case where xpointers include some bits of our boxing elements, may be when that boxing element is the last.

When bookmarking a page that starts with some inline box containing an image:

["page"] = "/body/DocFragment[16]/body/div[2]/div[3]/div[1]/div[2]/inlineBox.0",
["text"] = "Page 1 in Couverture @ 2020-02-25 22:03:32",

which resolves badly to page 1 :(
When selecting and highlighting some text in that inline box, we get:

["page"] = "/body/DocFragment[16]/body/div[2]/div[3]/div[1]/div[2]/div[1]/span/text().14",
["highlighted"] = true,
["pos1"] = "/body/DocFragment[16]/body/div[2]/div[3]/div[1]/div[2]/div[1]/span/text().33",
["pos0"] = "/body/DocFragment[16]/body/div[2]/div[3]/div[1]/div[2]/div[1]/span/text().14"

@poire-z
Copy link
Contributor Author

poire-z commented Feb 25, 2020

I think this should solve that ^:

@@ -8458,10 +8500,18 @@ lString16 ldomXPointer::toStringV2()
         return path;
     ldomNode * node = getNode();
     int offset = getOffset();
-    if ( offset >= 0 ) {
-        path << "." << fmt::decimal(offset);
-    }
     ldomNode * p = node;
+    if ( !node->isBoxingNode() ) {
+        if ( offset >= 0 ) {
+            path << "." << fmt::decimal(offset);
+        }
+    }
+    else {
+        if ( offset < p->getChildCount() )
+            p = p->getChildNode(offset);
+        else
+            p = p->getParentNode();
+    }
     ldomNode * mainNode = node->getDocument()->getRootNode();
     while (p && p!=mainNode) {
         ldomNode * parent = p->getParentNode();

We'd then get:

["page"] = "/body/DocFragment[16]/body/div[2]/div[3]/div[1]/div[2]/div[1]"
instead of
["page"] = "/body/DocFragment[16]/body/div[2]/div[3]/div[1]/div[2]/inlineBox.0",

@Frenzie
Copy link
Member

Frenzie commented Feb 26, 2020

Alright, nice. :-)

@poire-z
Copy link
Contributor Author

poire-z commented Feb 28, 2020

Cross-referencing: CSS/List items: skip boxing elements when walking does what I mentionned in koreader/koreader#3934 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants