Fix a few CSS selector issues #178

Merged
merged 1 commit into from
May 5, 2018


@poire-z (Contributor) commented May 5, 2018

Fix some of the issues reported in #176.

  • fix standalone #ID (it previously worked only as ELEM#ID)
  • fix non-lowercase element names (element names are lowercased internally, so the selector must lowercase them too for any match to be possible)
  • fix the E+F selector, which should ignore immediately preceding text nodes and consider the first element node met
  • add the E~F selector (like E+F, but any preceding sibling, not only the immediate predecessor, is considered for a match)

- fix standalone #ID (was working only as ELEM#ID)
- fix non-lowercase element name (elements are internally
  lowercased, so we should lowercase there too to expect any match)
- fix E+F selector, that should ignore immediate preceding text
  nodes, and consider the first met element node
- adds E~F selector (like E+F, but *any* instead of "immediate"
  predecessor is considered for a match)
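
The difference between E+F and E~F described above can be sketched as follows. This is only an illustration under stated assumptions, not the code from this PR: `Node`, `matchesAdjacent` and `matchesGeneral` are hypothetical names, and the sibling list is modelled as a plain vector rather than the engine's real DOM.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified DOM node (not the engine's real node class).
struct Node {
    bool isElement;    // false for text nodes
    std::string name;  // lowercased element name, empty for text nodes
};

// E + F: walk back from sibling i, skip text nodes, and test only the
// first element node met.
bool matchesAdjacent(const std::vector<Node>& siblings, std::size_t i,
                     const std::string& prevName) {
    assert(i < siblings.size());
    while (i > 0) {
        --i;
        if (!siblings[i].isElement)
            continue;                         // ignore preceding text nodes
        return siblings[i].name == prevName;  // the first element decides
    }
    return false;                             // no preceding element at all
}

// E ~ F: same walk, but any preceding element sibling may match.
bool matchesGeneral(const std::vector<Node>& siblings, std::size_t i,
                    const std::string& prevName) {
    assert(i < siblings.size());
    while (i > 0) {
        --i;
        if (siblings[i].isElement && siblings[i].name == prevName)
            return true;
    }
    return false;
}
```

With siblings `h1`, text, `p`, text, `div`, the `div` matches `p + div` and `h1 ~ div`, but not `h1 + div`, because `p` is the first element met when walking back.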
@@ -94,6 +94,7 @@ enum LVCssSelectorRuleType
cssrt_parent, // E > F
cssrt_ancessor, // E F
cssrt_predecessor, // E + F
+cssrt_predsibling, // E ~ F
Member

<3

else if ( css_is_alpha( *str ) )
{
// ident
char ident[64];
if (!parse_ident( str, ident ))
return false;
-_id = doc->getElementNameIndex( lString16(ident).c_str() );
+_id = doc->getElementNameIndex( lString16(ident).lowercase().c_str() );
Member

What's the effect on the XML vs HTML split?

Contributor Author

HTML documents (including those found in EPUB) are parsed by HTMLParser, a subclass of XMLParser that sets m_citags=true (XMLParser leaves it false) for case-insensitive tags, which makes it lowercase the element names. So they are all lowercased internally when added to the DOM tree.

Contributor Author

doc->getElementNameIndex( "DIV" ) would work (if I remember correctly) like this: "oh, you're asking me for DIV, I haven't seen it yet, let's make a new ID for DIV for you, here it is". But there would be no element with that new ID in the DOM.
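
A minimal sketch of that lookup behaviour, assuming the name table hands out a fresh ID for any name it has not seen. `NameTable`, `getOrCreate` and `toLowerAscii` are hypothetical stand-ins; the real `getElementNameIndex()` may differ in detail.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the document's element name table.
class NameTable {
public:
    // Return the ID for name, allocating a fresh one if it is unseen.
    int getOrCreate(const std::string& name) {
        assert(!name.empty());
        auto it = ids_.find(name);
        if (it != ids_.end())
            return it->second;
        int id = next_++;
        ids_.emplace(name, id);
        return id;
    }

private:
    std::unordered_map<std::string, int> ids_;
    int next_ = 1;
};

// ASCII-only lowercasing, standing in for lString16::lowercase().
std::string toLowerAscii(std::string s) {
    for (char& c : s)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return s;
}
```

If the parser has registered `div`, a selector that looks up `DIV` unchanged gets a different, freshly allocated ID that no DOM element carries, so it can never match. Looking up `toLowerAscii("DIV")` returns the ID the elements actually use.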

Member

And this CSS rule only applies to HTML?

Contributor Author

It applies to each DOM node: the selector gets that node's element name ID (which always corresponds to a lowercased name) and compares it to its own _id (which is now looked up from the lowercased name too).
I don't think we build the DOM from anything but HTML.

Member

So what happens to the XHTML? I thought you just said it's handled case sensitive? :-P

Contributor Author

Both standalone HTML and the HTML/XHTML found in EPUB are parsed by the lowercasing HTMLParser (only the other XML files in the EPUB, such as OPF and NCX, are parsed by XMLParser).
This HTMLParser then feeds a "documentWriter" with tags, attributes and text.
Standalone HTML uses ldomDocumentWriterFilter, which handles autoclose and LIB.RU.
EPUB uses ldomDocumentFragmentWriter, which doesn't do autoclose but appends the bodies of multiple HTML files as docFragments into a single DOM.
But both are fed lowercased element tag names.

Member

Weird, but okay in practice I suppose. :-p
