-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tables rendering: fixes and improvements #243
Conversation
Allow local or embedded fonts like symbol.ttf or Wingding3.ttf to be used. Fix infinite loop in measureText() when neither current font nor fallback font have a glyph for a char.
Issues mostly noticable in table cells: - Really ignore leading spaces, so they don't cause a wrap (and so a leading blank line) when the next word doesn't fit in the available width. - When a word exactly fits in the available width, don't forget to check if the following char (even if it itself doesn't fit) is a space, in which case it is a candidate for wrap. Otherwise, such a word would be splitted by the hyphenation rules (This also helps in normal text when the last word fits exactly the line: it will avoid hyphenating that last word, and reduce the amount of added spacing between words because of that.) - Fix some crash when some image gets resized to have a zero width (can happen with some mix of CSS properties)
When consecutive lines or blocks with all page-break: avoid, if they wouldn't fit on a page, some of them may be missing in the computed pages (while still visible in scroll mode). This was because StartPage() did set 'last=line', which was wrong in the case we split on an earlier line (when all current lines have pb avoid, we split on a past one where page break was allowed we remembered (as 'next'). (Mostly noticable with tables with tall rows: currently, a table with 3 rows has all pb avoid between rows). This commit's real change is just the removal of 'last=line' in StartPage(). Also moved the single 'last = line' that was at the end of AddLine() to the proper logical place in each of the if/else branches as it makes more sense there when reading. Added many comments and debugging code (lvpagesplitter may need a revisit.)
- Adds support for named value 'auto' in some length'ed properties (width: and margin:, will only be supported for now when applied to a TABLE element). - Remove support of units in % for border* properties according to CSS specs (in some cases, when the container width is not yet known, it couldn't be computed anyway). - Fixed computing of border named values (thin, medium, thick) that I forgot to *256 when I added support for dpi-based css units (koreader#213).
It was too short by 1px since koreader#211 (bottom border needed koreader#211 fix, and I added in koreader#211 the same kind of fix for the right border as it looked needed - but the more obvious error, easier seen on the right than on the bottom, was already workarounded by some removed '+1' in the their drawing code. Now, they are all similar.
Add forgotten styles, and re-order in the same order styles are defined in lvstyles.h, for easier futur comparisons and updates.
Added more comments about how things work. Adds rendering method 'erm_killed' as a last resort fallback when there is no room to draw some element. Might be needed in some situations when width < 0, and used only with tables for now (was needed while fixing tables as some bug made negative widths, but it looks like I can't reproduce the need for it... but well, let's keep it, might be handy someday). Draw a little figure when erm_killed, so the user can notice some content is missing. Also adds optional SHOW_REND_METHOD option to writeNodeEx() used to return HTML of selection (not enabled in KOReader 'View HTML', but useful to me when debugging).
Needed for proper sizing and positionning of some elements in various contexts: will be used to properly size table cells, but could be useful if we were to add support for 'float:'. Estimate min and max content width of node when rendered, the way it is mentionned in various CSS or HTML specs: - maxWidth: width if it would be rendered on an infinite width area, breaking only on explicit <BR/>. - minWidth: width with a break on all spaces (no hyphenation), so the width taken by the longest word or image. This does mostly what renderBlockElement, renderFinalBlock and lvtextfm.cpp do, but caring only about widths and horizontal margin/border/padding and indent (with no width constraint, so no line splitting and hyphenation - and we don't care about vertical spacing and alignment).
(Many related changes that can't easily be splitted into individual commits... So, describing them as they appear top to bottom in this commit.) LookupElem(): - Table cells (TD/TH): adds support for CSS properties vertical-align, text-align and width. - In ldomDocumentWriterFilter(), used for plain HTML files, translate deprecated tag attributes width=, align= and valign= to their equivalent CSS properties. PlaceCells(): - Fixed oversized colspan and rowspan by adjusting everything to the right amount of really used cols and rows. - border-collapse: proper collapsing from TR/THEAD/TABLE (now correctly displays border from TRs when TDs don't have any). Proper collapsing from TDs to adjacent TDs is not implemented. - Proper computation of cells min and max rendered width with the help of getRenderedWidths() (previously, some bad approximations were used by just computing the length of the whole plain text of the content with the TD cell font size). - Use cells min and max widths to better compute final column widths. Do that as a 2nd pass for colspan cells while trying to keep single-cells column widths ratios. (previously, any colspan was adding too much to all the cells it spanned, with the result of all the columns having the same widths). - Properly compute the assignable width for columns, and avoid modifying final column widths once computed (border-spacing was taken care of too late, and was reducing computed cells width, noticable on cells where the width has been set to its min-content width) - Add optional shrink_to_fit parameter to reduce table width instead of distributing remaining space to cells. Used by default and when table has "width: auto". Previous behaviour can be obtained with "width: 100%" renderCells(): - Never modify computed table cells widths. - Clearer computation of elements' relative positions to their parent (RenderRectAccessor x/y/w/h) - Fix TRs borders being drawn over cells content by just never drawing them: they should only be drawn when border-collapse to the cells, by the cells. - Clearer page splitting (context.Addline()) by doing it in the flow as we met elements (instead of only at the end as it was done previously). - Implement vertical-align: middle/bottom of cell content. - (Moved wrapping function renderTable() from far below to just after the table rendering code). renderBlockElement(): - Switch to erm_killed when erm_table has zero or negative width (just in case, hard to reproduce). - Use shrink_to_fit or not depending on table width: style: - when unspecified or "width: auto": shrink_to_fit - when it's a length (in %, px or em): no shrink_to_fit - Properly align (center or right) table when table's margin-left: and/or margin-right: is 'auto'. DrawDocument(): - Don't draw background for TR and THEAD/TBODY/TFOOT: bgcolor is now spread to cells (TD) by setNodeStyle(). (Firefox never draws the TR bgcolor outside of cells: the border-spacing stays with the bgcolor of the table). - Don't draw borders for TR and THEAD/TBODY/TFOOT: they are never drawned by Firefox when no border-collapse, and when border-collapse, they were collapsed to the cells if needed, and will be drawn by the cells ldomDocumentWriterFilter::OnAttribute(): - HTML autoclosing, with plain HTML documents: fix TR that were autoclosing the THEAD/TBODY/TFOOT they are part of (so these THEAD... styles were not applied to/inherited by any of their TRs)
Not to be merged/bumped before 2018.12 is released ! as there will probably be new bugs or glitches... :| |
Oh, I forgot all about that. :-P I'll probably toss in a MuPDF default scaleByDPI thingy (for EPUB) before I do, and also I don't really feel like it right now. :-) |
I have feeling this is better for wiki maybe this should be the default for wikipedia https://user-images.githubusercontent.com/24273478/49339210-f0949d00-f62e-11e8-8f2c-c2b29b1f3ca5.png You are a monster. Thanks a lot for commit!! |
Is that a Wikipedia page? I'll just ask my standard question: what do the browsers do? (Because even if there's no written standard, there's often a de facto one.) |
Yes, some random mixed chinese/english page I once found and now use for testing: Browsers have a luxury we don't have, they can overflow :) Which is what firefox does: But OK, testing with just extracting some text from this table and putting it in a simple table with no width specified, Firefox does not mind wrapping down to a single CJK char: (right truncated) Same simpler table with this PR crengine, with the is_cjk stuff still commented out (so, full sequence of CJK considered as a unbreakable unit): With the is_cjk uncommented (so, each CJK char considered a small unbreakable unit): |
That's what Firefox does. We'll see and adapt when we get feedback from CJK users.
No more Travis CI auto-check ? |
I'll add some style for BLOCKQUOTE in wikipedia.lua, because currently, there is none, and so P inside BLOCKQUOTE just look like any non-quote P. blockquote {
overflow: hidden;
margin: 1em 0;
padding: 0 40px;
} Thinking about using: blockquote {
margin: 0.5em 0;
padding: 0 2em;
} Should this has its place in our main epub.css instead? |
If that's not in the styles, I indeed think that'd be good to add. It's fairly standard. |
Travis is probably just a hiccup. |
Don't avoid wrap at end of line when ending on a space preceded by a CH_PROP_AVOID_WRAP_AFTER char.
OK, switched to using margin-left/right instead of padding-left/right as suggested by Frenzie in a comment to the (now gone) epub.css previous commit. |
If you want me to read over the actual code, that won't be for today. CSS is just something I can do at a glance. ;-) |
Not forcing that on you :) I'll merge and bump all that tomorrow. |
For Chinese, one character is one word and there is no rules for minimal characters in a line. I don't know about Japanese and Korean. Actually it looks much better now after your change. There is a W3C format requirement for Japanese, which is most comprehensive. Japanese text layout |
Thanks for the feedback! |
I guess that initial test case is a good start to refine it to an even smaller one :) And may be we won't find a smaller one, and it's the whole combination that causes the issue. |
Absolutely, I intended it as a PSA as much as anything. 👍 15 years ago I occasionally reported bugs to Opera, which is when I was given some friendly pointers to improve my test cases. I liked to write minimal HTML, of this variety: <!DOCTYPE html>
<title>test case</title>
<p style="color:blue">This should be blue but it's red! (Okay, obviously this never happened.) And they told me that generally speaking I should first try to write it like this instead to ensure it's truly an issue about incorrect color parsing: <!DOCTYPE html>
<html>
<head>
<title>test case</title>
</head>
<body>
<p style="color:blue">This should be blue but it's red! (Okay, obviously this never happened.)</p>
</body>
</html> In other words, an ideal minimal testcase only includes the absolute minimum to reproduce the issue but it avoids going so minimal that it might introduce confounding issues. The above is a somewhat badly chosen illustration since the "confounding issues" from my sample are the entire point. Perhaps it should be reversed for illustrative purposes instead: <!DOCTYPE html>
<title>test case</title>
<p style="color:blue">This should render but it doesn't because the DOM thinks there's no BODY element!
(So what's that style=color:blue doing there? It's just distracting. :-) ) Btw, @a-kohout is excellent at providing minimal test cases. See koreader/koreader#3192 for example. :-) |
If we remove some of the confounding/obscuring factors you still get a plenty weird result. The test case could still be slightly more minimal, but the following is roughly what I like to see in a test case. ^_^ <!DOCTYPE html>
<html>
<head>
<title>Table autoclosing</title>
</head>
<body>
<table>
<tr>
<td>
<b>Сферы производства</b>
<td>
<b>1890</b>
<td>
<b>1900</b>
<td>
<b>Прирост</b>
<tr>
<td>
Чугун, <i>т</i>
<td>
927 100
<td>
2 933 700
<td>
216%
</table>
</body>
</html> |
(Another thing that can be at play : |
Sure? please read again:
|
I just showed you how changes in one line broke the auto-closing of the |
Now go ahead and restore the string |
Disabling the COMPLETE_INCOMPLETE_TABLES flag (to not add mess to the mess). When describing the algorithm:
<tr>
<td>
<b>Прирост</b>
<tr> <!-- This will close the preceeding TD, but not the first TR -->
<td>
Чугун, <i>т</i> So, the next TR will close the TD, and will the be sibling of the TD, which is unexpected and some bit of the code decides to set rend_method = erm_killed: crengine/crengine/src/lvtinydom.cpp Lines 5724 to 5767 in 8a15844
and so the content of the other TRs are not shown. So, for this case, I'd say it's a limitation of the AutoClose algorithm, that is stopping at the first parent in the AC_TR list, and won't close the preceeding row TR. If you add to that the "complete incomplete table" stuff, it will wrap the standalone TR (that should not be there as a child of TR) into a TABLE-like tabularBox element: So, I'd say it's the AutoClose algorithm that should be fixed/enhanced, so to correctly close multiple unclosed elements (if that's even possible without taking wrong decisions). (note: I'm not interested in digging into that at all :/)
Indeed, with this AC_TR list, it won't close the TD, and won't stop there, and so it will properly close the TR. So, that looks like an easy solution - but I don't know it's side effects :) |
@virxkane I'm not trying to prove you wrong, unless you disagree about effective communication through minimal test cases. In that case I am. ^_^ However, your proposed fix only renders correctly in this particular case somewhat by accident. The resulting DOM is incorrect even if we feed it a fully fledged out table as included below. I would agree it seems like a reasonable workaround since <!DOCTYPE html>
<html>
<head>
<title>Complete table</title>
<style>
thead {color:red}
tbody {color:blue}
</style>
</head>
<body>
<table>
<thead>
<tr>
<th>Сферы производства</th>
<th>1890</th>
<th>1900</th>
<th>Прирост</th>
</tr>
</thead>
<tbody>
<tr>
<td>Чугун, <i>т</i></td>
<td>927 100</td>
<td>2 933 700</td>
<td>216%</td>
</tr>
</tbody>
</table>
</body>
</html> The intended rendering looks like: But since the By comparison, here's the correct DOM with the correct rendering as in the current code. I'd be inclined to agree that the damage is worse without your suggested workaround, but it's definitely a workaround, not a fix. :-) |
Seem like that, thank you.
Ok. The next sample will be smaller :)
No, no, no. I don't proposed this fix, I show problem, I ask how to fix. Point. |
So, anyway, if none of us want to think too much about that, we could agree on this scenario: -static const char * AC_TR[] = {"tr", "tr", "thead", "tfoot", "tbody", NULL};
+static const char * AC_TR[] = {"tr", "tr", "td", NULL}; I did this in 2018 because having THEAD, TBODY... is wrong. -static const char * AC_TR[] = {"tr", "tr", "td", NULL};
+static const char * AC_TR[] = {"tr", "tr", NULL}; But after that, we're not yet done with your initial test case :( |
Same reason for the imbricated DD. |
I tried reading the specs but I finally gave up and went back at the original AutoClose() code and tried to make it do what we expect - and not fail on the problems noticed (nested elements, stopping too early) - may be somehow doing it in the spirits of what I got from the specs. This seems to solve the issues noticed above. --- a/crengine/src/lvtinydom.cpp
+++ b/crengine/src/lvtinydom.cpp
@@ -12952,24 +12952,46 @@ void ldomDocumentWriterFilter::appendStyle( const lChar16 * style )
void ldomDocumentWriterFilter::AutoClose( lUInt16 tag_id, bool open )
{
lUInt16 * rule = _rules[tag_id];
if ( !rule )
return;
if ( open ) {
ldomElementWriter * found = NULL;
+ ldomElementWriter * found_fallback = NULL;
ldomElementWriter * p = _currNode;
while ( p && !found ) {
lUInt16 id = p->_element->getNodeId();
for ( int i=0; rule[i]; i++ ) {
if ( rule[i]==id ) {
- found = p;
- break;
+ if ( i == 0 ) {
+ // rule[0] should be the same as the element that we see opening
+ // If we meet one, we're done auto-closing
+ found = p;
+ break;
+ }
+ else {
+ // rule[1+] are other elements that we can meet and that we
+ // should auto-close to become a sibling of them, but only
+ // when no same-element (from rule[0] has been found.
+ // Not sure if we should stop at the nearest one met,
+ // or take the upper most one.
+ found_fallback = p;
+ }
}
}
p = p->_parent;
+ if ( p && p->_element->getNodeId() == el_table ) {
+ // Don't cross TABLE: a new DD inside a TABLE CELL should
+ // not auto-close a upper DD that contains this TABLE.
+ // This feels generic enough, so do it in all cases
+ break;
+ }
+ }
+ if ( !found && found_fallback ) {
+ found = found_fallback;
}
// found auto-close target
if ( found != NULL ) {
bool done = false;
while ( !done && _currNode ) {
if ( _currNode == found )
done = true; |
Possibly Good Enough™ and better either way. But it's much too warm to think clearly over here. :-) |
Although it might still not be right.
(these are shifted one slot, so rule[0] is actually the 2nd element I'm handling |
Pfffffff. If we update the AC_* lists or the AutoClose() algorithm, we might have a totally different DOM, and previously made bookmarks/highlights would not work. So, that would mean a new DOM_VERSION - and keep using the old AC_ and AutoClose() function for older DOM versions, and switch to new ones for new books - with no migration. If I trust some past comments of mine:
So, should we care about not breaking past highlights for these formats that I consider little-used document formats ? |
I'll drop these tests here https://github.com/html5lib/html5lib-tests/blob/71eebd59772d1d39aced0c0582ae9c09acf3ce6e/serializer/optionaltags.test My view is similar to yours. Breaking it should be relatively low-impact. But if we do, unfortunately that makes the option of incremental improvements (i.e., continuous breakage) worse than it might have been. |
So, I guess my first idea above is not right (assuming the second element is the same as the element and we should stop there). --- a/crengine/src/lvtinydom.cpp
+++ b/crengine/src/lvtinydom.cpp
@@ -12954,21 +12954,50 @@ void ldomDocumentWriterFilter::AutoClose( lUInt16 tag_id, bool open )
lUInt16 * rule = _rules[tag_id];
if ( !rule )
return;
- if ( open ) {
+ if ( open ) { // called by OnTagOpen()
ldomElementWriter * found = NULL;
ldomElementWriter * p = _currNode;
- while ( p && !found ) {
- lUInt16 id = p->_element->getNodeId();
+ lUInt16 id = p ? p->_element->getNodeId() : 0;
+ while ( p ) {
for ( int i=0; rule[i]; i++ ) {
if ( rule[i]==id ) {
found = p;
- break;
+ // We don't stop at the first one met: we continue checking
+ // upper nodes that this node might close. But we might break
+ // below when meeting one of the stop element.
}
}
p = p->_parent;
+ if ( !p )
+ break;
+ id = p->_element->getNodeId();
+ // Some hardcoded bits (to not complicate AC_* rules and their parsing...)
+ // for parent elements that should stop us.
+ if ( id == el_table ) {
+ // Never cross a TABLE (a new DD inside a TABLE CELL should
+ // not auto-close a upper DD that contains this TABLE).
+ // This feels generic enough, as a table has independant block
+ // rendering contexts for each of its cells.
+ break;
+ }
+ if ( id == el_ruby ) {
+ // RUBY should mostly containe ruby sub-element (and possible
+ // some other inline elements that don't auto close).
+ // So, we shouldn't need to check for tag_id: stop there
+ break;
+ }
+ if ( tag_id == el_li && (id == el_ul || id == el_ol) ) {
+ // A LI stops at its UL or OL container, so it does not
+ // close a upper LI
+ break;
+ }
+ if ( id == el_dl && (tag_id == el_dl || tag_id == el_dt || tag_id == el_dd ) ) {
+ // A DL, DT or DD stop at its DL container, so it does not close it
+ // not any upper DT or DD
+ break;
+ }
}
- // found auto-close target
- if ( found != NULL ) {
+ if ( found != NULL ) { // found auto-close target
bool done = false;
while ( !done && _currNode ) {
if ( _currNode == found ) And this AC_ stuff I re-ordered and commented (with a few changes but not showing the diff as it would make it less readable): // Most block container elements close a P.
// Most should not close themselves, as they can be nested
// When closing upper nodes, we stop at TABLE to not close them, as their content
// is an independant rendering context (this is hardcoded in AutoClose())
static const char * AC_P[] = {"p", "p", NULL}; // a P close any P, so there can't be nested <P>s
static const char * AC_DIV[] = {"div", "p", NULL};
// (UL/OL used to have "li" there, but this was wrong as a UL/OL can be inside a LI, with nested lists)
static const char * AC_UL[] = {"ul", "p", NULL};
static const char * AC_OL[] = {"ol", "p", NULL};
// A LI should close a previous LI, but should not close an upper UL or OL, and should not
// go look upper them and close any upper LI (this is hardcoded in AutoClose())
static const char * AC_LI[] = {"li", "li", "p", NULL};
// Description list (DL = list, DT = term, DD = definition/description)
// None of these should close an upper DL containing them, and should not go look upper
// their containing DL and close any upper DD/DT (this is hardcoded in AutoClose())
// A new DL can happen in a DD, but not in a DT.
static const char * AC_DL[] = {"dl", "dt", "p", NULL};
// DT and DD close any previous term or definition
static const char * AC_DT[] = {"dt", "dt", "dd", "p", NULL};
static const char * AC_DD[] = {"dd", "dd", "dt", "p", NULL}; // (used to not close DT, but no reason not to)
// A TABLE should not close anything (except any previous P), because of nested tables
static const char * AC_TABLE[] = {"table", "p", NULL};
// Table rows container should close any previous rows container or standalone rows
static const char * AC_THEAD[] = {"thead", "tr", "thead", "tfoot", "tbody", NULL};
static const char * AC_TBODY[] = {"tbody", "tr", "thead", "tfoot", "tbody", NULL};
static const char * AC_TFOOT[] = {"tfoot", "tr", "thead", "tfoot", "tbody", NULL};
// A TR should close any previous TR (or unclosed previous TD, that would anyway
// be closed when closing its containing TR)
static const char * AC_TR[] = {"tr", "tr", "td", NULL};
// TH and TD close any previous TH or TD
static const char * AC_TH[] = {"th", "th", "td", NULL};
static const char * AC_TD[] = {"td", "td", "th", NULL};
// Ruby elements (often without a closing element and expecting auto closing
// according to various rules, sometimes contradictory...)
// When closing upper nodes, we stop at RUBY to not close upper ruby sub-elements
// and allow nested ruby (this is hardcoded in AutoClose())
// A new RBC and RTC close every other ruby sub-element
static const char * AC_RBC[] = {"rbc", "rbc", "rb", "rtc", "rt", NULL};
static const char * AC_RTC[] = {"rtc", "rtc", "rt", "rbc", "rb", NULL};
// A new RB closes every other ruby sub-element, except RBC (which may contain multiple RB)
static const char * AC_RB[] = {"rb", "rb", "rt", "rtc", NULL};
// A new RT closes every other ruby sub-element, except RTC (which may contain multiple RT)
static const char * AC_RT[] = {"rt", "rt", "rb", "rbc", NULL};
// A RP (ruby fallback parenthesis) closes ruby sub-element, except containers
static const char * AC_RP[] = {"rp", "rp", "rb", "rt", NULL};
// A new one of these should close any previous same one
static const char * AC_OPTION[] = {"option", "option", NULL};
static const char * AC_PARAM[] = {"param", "param", NULL};
static const char * AC_PRE[] = {"pre", "pre", NULL};
// Auto-closing (no sub-element accepted)
static const char * AC_HR[] = {"hr", NULL};
static const char * AC_BR[] = {"br", NULL};
static const char * AC_WBR[] = {"wbr", NULL};
static const char * AC_IMG[]= {"img", NULL};
static const char * AC_COL[] = {"col", NULL};
static const char * AC_BASE[] = {"base", NULL};
static const char * AC_LINK[] = {"link", NULL};
static const char * AC_META[] = {"meta", NULL};
static const char * AC_AREA[] = {"area", NULL};
static const char * AC_EMBED[] = {"embed", NULL};
static const char * AC_INPUT[] = {"input", NULL};
static const char * AC_SOURCE[] = {"source", NULL};
static const char * AC_TRACK[] = {"track", NULL}; Not much change to the original AC (except UL/OL not closing any LI, and DT also closing a DD, for symetry), I've just added comment to explain why these rules with what I know about these tags, and not by following the specs. The new stuff is the stop rules in the C code. |
I'm still trying to understand the specs, and I need a bit of spec reading wits :) There is list of tags that are considered "special", which looks mostly like all elements that are not inline (so, block and others found in head, etc...) Can you look at these steps in the section https://html.spec.whatwg.org/multipage/parsing.html#parsing-main-inbody: They both are bout walking up the parents to close the matching start tag, but they mention: Do you understand (like I do) that all these "special" tags are stop tags when walking up parents? (so, with Above the There's another set: So, it looks like we could do with a few sets of elements, and if we order them correctly in fb2def.h, we could hardcode a few (There's lots of stuff we won't be able to do easily, like active formatting elements |
I think so. I think that should be |
As long as it's related (like images or whatever), that should be fine. But indeed they'll be auto-closed by other block-level elements. |
I'm still working on that HMTL parser. I think I made sense of the specs... And there's the lib.ru hacks in that parser to possibly try to solve. (All that specs reading and work for so little personal benefit :), as most formats like FB2 or EPUB don't use that HTML parser... but better get something right than add more hacks :/ ) |
Good news.
For me, this is still an open question. Now, in the program settings, we can choose 'legacy' or 'latest'. And this choice is saved for each book. If we remove this option, we will have to do something with the bookmarks saved in the database. |
It started with the 30 lines first commit, to support symbol fonts (details in koreader/koreader#4196 (comment)), and...:
and it ended up with 1500+ lines so we can get instead:
Fixed all the buggy stuff I met in that journey (including a right border drawing bug, noticable in all the screenshots in #233: the right border was thicker - and now the various
Vocalu-lo
are no more cut - and the background issue noticed in #233 (comment)).Main improvements and changes:
More adequate computation of column widths, so they don't go smaller than the longest word inside a cell. This should avoid words being cut, and a proper sizing when a cell content uses a different font than the cell itself.
Added
getRenderedWidths()
to get the min and max content width from any node. Used for now for sizing table cells, but could be used for other things (like floats).Added support for table and cells width when specified with CSS (there was support for that when specified as HTML tag attributes, which worked only with plain HTML documents, not with EPUBs).
When no table width specified, defaults to "shrink to fit" (as most browsers do) and no more to 100%.
Implement
vertical-align: middle/bottom
for table cells. Closes koreader/koreader#2846Details in the individual commit messages.
More comparisons (stuff look constrained in the After screenshots, because I used
padding:0
for easier testing - we'd probably find some padding in regular books):Before:
After:
Before:
After:
Before:
After:
(There must be a bug in the font with the first symbol glyph, as its width is different than its rendered width... the other symbols glyphs are ok.)
Some notes and questions:
Supporting cells specified widths is a bit questionable, as I feel it's always better when not, and just using the computed width according to min/max content width feels always good, and specified width can be boggus and change with DPI. We'll see if we need to remove that.
Unless tables are set with a specified width, they will now take only the width necessary to display cells content (shrink to fit) just like current browsers do.
This might be questionable: I feel like always taking 100% for unstyled tables is most often better than not on our devices (otherwise, smaller tables are stuck to the left, which feels strange with justified text... a 100% width table feels like a justified table :)
Got this feeling with most Wikipedia EPUBs, that looked before/after like:
(there is actually a
style="width:9em"
on the first TD in a row, that's the reason they look aligned)What do you think?
If we'd want to restore previous behaviour for unstyled tables, we could just add to epub.css:
table { width: 100%; }
.I'll add a few Style tweaks so people can choose alternatives (ignore table and td widths, make tables full width, make tables shrink, center shrinked tables...) but we should choose the most adequate default.
This is a followup to #233, and change quite a bit of what @frankyifei introduced in #35 and #38.
There might be something left to do for CJK, but I can't judge.
I left some code related to CJK commented out in
getRenderedWidths()
, as logically, one CJK char being itself a word and candidate for wrap, we could get when sizing (in this example, there's a witdh=100% specified in the middle TD...)but not doing it gives a better looking:
@frankyifei or other CJK readers: any feeling about that? Any minimal number of CJK chars to aggregate on a single line, or are there some kind of rules for deciding that?