Fix a few CSS selector issues #178
- Fix standalone #ID (was working only as ELEM#ID).
- Fix non-lowercase element names (elements are lowercased internally, so the selector's element name must be lowercased too for any match to be possible).
- Fix the E+F selector, which should ignore immediately preceding text nodes and consider the first element node met.
- Add the E~F selector (like E+F, but *any* preceding sibling, instead of only the immediate predecessor, is considered for a match).
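To illustrate the difference between the two sibling combinators described above, here is a minimal, self-contained sketch. The `Node` struct and both function names are hypothetical and are not the crengine types or the code from this PR; siblings are modeled as a flat vector in document order.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical minimal node type (not crengine's ldomNode).
struct Node {
    bool isElement;    // false for text nodes
    std::string name;  // lowercased element name, empty for text nodes
};

// E + F: walking back from the node at index i, skip text nodes and
// test only the first element node met.
bool matchesAdjacent(const std::vector<Node>& siblings, std::size_t i,
                     const std::string& e) {
    for (std::size_t j = i; j-- > 0; ) {
        if (!siblings[j].isElement)
            continue;                  // ignore preceding text nodes
        return siblings[j].name == e;  // the first element node decides
    }
    return false;                      // no preceding element sibling
}

// E ~ F: any preceding element sibling may match.
bool matchesGeneral(const std::vector<Node>& siblings, std::size_t i,
                    const std::string& e) {
    for (std::size_t j = i; j-- > 0; ) {
        if (siblings[j].isElement && siblings[j].name == e)
            return true;
    }
    return false;
}
```

With siblings `h1, text, p, text, p`, the first `p` matches both `h1 + p` and `h1 ~ p`, while the second `p` matches only `h1 ~ p`.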
@@ -94,6 +94,7 @@ enum LVCssSelectorRuleType
     cssrt_parent,         // E > F
     cssrt_ancessor,       // E F
     cssrt_predecessor,    // E + F
+    cssrt_predsibling,    // E ~ F
<3
 else if ( css_is_alpha( *str ) )
 {
     // ident
     char ident[64];
     if (!parse_ident( str, ident ))
         return false;
-    _id = doc->getElementNameIndex( lString16(ident).c_str() );
+    _id = doc->getElementNameIndex( lString16(ident).lowercase().c_str() );
What's the effect on the XML vs HTML split?
HTML files (including those found in EPUB) are parsed by HTMLParser, a subclass of XMLParser that sets m_citags=true for case-insensitive tags (plain XMLParser leaves it false), which makes it lowercase the element names. So they are all lowercased internally when added to the DOM tree.
doc->getElementNameIndex( "DIV" )
would work (if I remember correctly) like this: "oh, you're asking me for DIV, I haven't seen it yet, so let me make a new ID for DIV, here it is". But there would be no element with that new ID in the DOM.
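The behavior recalled above can be sketched with a small name-interning table. This is only an illustration under the assumption that lookups allocate a fresh ID for unseen names; `NameTable` and `toLower` are hypothetical helpers, not crengine code, and only the method name `getElementNameIndex` is borrowed from the diff.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the document's element-name table.
class NameTable {
public:
    // Returns the ID for a name, allocating a fresh one on first sight.
    int getElementNameIndex(const std::string& name) {
        auto it = ids_.find(name);
        if (it != ids_.end())
            return it->second;
        int id = next_++;
        ids_.emplace(name, id);
        return id;
    }

private:
    std::unordered_map<std::string, int> ids_;
    int next_ = 1;
};

// ASCII lowercasing helper (assumption: element names are ASCII).
std::string toLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}
```

If the parser registered `div` while building the DOM, a selector that asks for `DIV` gets a different, freshly allocated ID that no element carries, so it can never match. Lowercasing the selector's identifier first yields the same ID the elements use.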
And this CSS rule only applies to HTML?
It applies to each DOM node: the selector gets that node's element name ID (which will always be for a lowercase name) and compares it to its own _id (which is now for a lowercased name too).
I don't think we build DOM from anything but HTML.
So what happens to the XHTML? I thought you just said it's handled case sensitive? :-P
Both standalone HTML and the HTML/XHTML found in EPUB are parsed by the lowercasing HTMLParser (only the other OPF/NCX XML files in the EPUB are parsed by XMLParser).
This HTMLParser then feeds a "documentWriter" with tags, attributes and text.
Standalone HTML uses ldomDocumentWriterFilter, which handles autoclose and LIB.RU.
EPUB uses ldomDocumentFragmentWriter, which doesn't do autoclose, but deals with appending the bodies of multiple HTML files as docFragments into a single DOM.
But both are fed lowercased element tag names.
Weird, but okay in practice I suppose. :-p
Fix some of the issues reported in #176.