Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Self-closing tag on void elements shouldn't be a parse error/warning in Dom\HTMLParser #17485

Open
xPaw opened this issue Jan 16, 2025 · 17 comments

Comments

@xPaw
Copy link
Contributor

xPaw commented Jan 16, 2025

Description

The following code:

<?php
$Data = "<!DOCTYPE HTML>\n<br />\n<input />";
$Document = \Dom\HTMLDocument::createFromString( $Data );
echo $Document->saveHTML();

Resulted in this output:

Warning: Dom\HTMLDocument::createFromString(): tree error non-void-html-element-start-tag-with-trailing-solidus in Entity, line: 2, column: 2-3 in test.php on line 3
Warning: Dom\HTMLDocument::createFromString(): tree error non-void-html-element-start-tag-with-trailing-solidus in Entity, line: 3, column: 2-6 in test.php on line 3

I believe the html standard specifically allows this on void elements: ref

Then, if the element is one of the void elements, or if the element is a foreign element, then there may be a single U+002F SOLIDUS character (/), which on foreign elements marks the start tag as self-closing. On void elements, it does not mark the start tag as self-closing but instead is unnecessary and has no effect of any kind. For such void elements, it should be used only with caution — especially since, if directly preceded by an unquoted attribute value, it becomes part of the attribute value rather than being discarded by the parser.

it doesn't warn for tags like <div /> and simply opens it as normal.

relevant code:

bool is_void = lxb_html_tag_is_void(token->tag_id);
if (is_void) {
lxb_html_tree_parse_error(tree, token,
LXB_HTML_RULES_ERROR_NOVOHTELSTTAWITRSO);
}

PHP Version

8.4.3

Operating System

No response

@xPaw
Copy link
Contributor Author

xPaw commented Jan 16, 2025

@lexborisov should I have opened this in your repo?

@lexborisov
Copy link

@xPaw

It looks like it to me.

This is where you have to look at the tree construction specification. It is written there:

https://html.spec.whatwg.org/multipage/parsing.html#acknowledge-self-closing-flag

When a start tag token is emitted with its self-closing flag set, if the flag is not acknowledged when it is processed by the tree construction stage, that is a non-void-html-element-start-tag-with-trailing-solidus parse error.

@xPaw
Copy link
Contributor Author

xPaw commented Jan 16, 2025

Hmm, that seems confusing to me as to what's the correct implementation for a parser. Browsers obviously accept this.

Edit: actually that appears to be correct, since void elements should be acceptable with the slash? Even the error string implies non-void.

@lexborisov
Copy link

@xPaw

Hmm, that seems confusing to me as to what's the correct implementation for a parser. Browsers obviously accept this.

Edit: actually that appears to be correct, since void elements should be acceptable with the slash?

After reading the spec sheet again,
The problem seems to be solved by changing the condition in the code, to negation.
https://github.com/lexbor/lexbor/blob/58c88147db7342aefbc45d5e5621a974c210106a/source/lexbor/html/tree.h#L333

if (!is_void) {

@xPaw
Copy link
Contributor Author

xPaw commented Jan 16, 2025

This is how I read it too, but assumed it was too simple of a fix.

@nielsdos
Copy link
Member

I also think the check should probably be negated.

@lexborisov
Copy link

@xPaw @nielsdos

Yeah, it all adds up. I had doubts, because in the code the <param> tag used to be void, but then it was excluded and I didn't notice it, unfortunately.

I'll get the patch ready.

@lexborisov
Copy link

Fixed in lexbor/lexbor@e39083b and lexbor/lexbor@ae97abf

xPaw referenced this issue in lexbor/lexbor Jan 16, 2025
Per report from Pavel Djundik.

This related #17485 issue on GitHub.
@nielsdos
Copy link
Member

@lexborisov Thanks a lot! I'm not sure this is entirely correctly fixed however. After cherry-picking your commits into PHP I get a new warning for this line in this test:

It's a foreign element (in SVG) and I believe it shouldn't warn here based on the spec comments.

@xPaw
Copy link
Contributor Author

xPaw commented Jan 16, 2025

Just to add about svgs (before the fix at least), I've noticed that <path /> would turn into <path></path> when calling saveHtml.

@nielsdos
Copy link
Member

@xPaw that is at least consistent with my browser. Try this JS:

html = `<!DOCTYPE html>
<html>
<body>
    <svg width="100" height="100">
        <path />
    </svg>
</body>
</html>`;
dom = (new DOMParser).parseFromString(html, 'text/html');
dom.body.innerHTML

@lexborisov
Copy link

lexborisov commented Jan 16, 2025

@nielsdos

@lexborisov Thanks a lot! I'm not sure this is entirely correctly fixed however. After cherry-picking your commits into PHP I get a new warning for this line in this test:

php-src/ext/dom/tests/modern/html/interactions/HTMLDocument_getElementsByTagName.phpt

Line 18 in a4fa1e7

     <circle cx="0" cy="0" r="10"/> 

It's a foreign element (in SVG) and I believe it shouldn't warn here based on the spec comments.

I figured it out, void elements are only checked in the HTML namespace, in other cases it's always false.
But I didn't find any direct text about it. It just comes out of the references in the specification.

https://html.spec.whatwg.org/multipage/syntax.html#elements-2
https://html.spec.whatwg.org/multipage/parsing.html#parsing-main-inforeign "Any other start tag"

@xPaw
Copy link
Contributor Author

xPaw commented Jan 16, 2025

@lexborisov Do you mean this?

Foreign elements must either have a start tag and an end tag, or a start tag that is marked as self-closing, in which case they must not have an end tag.

@lexborisov
Copy link

@lexborisov Do you mean this?

Foreign elements must either have a start tag and an end tag, or a start tag that is marked as self-closing, in which case they must not have an end tag.

But then it is not quite clear why the specification requires to call acknowledge-self-closing-flag for Foreign elements start tag.

Otherwise
Pop the current node off the stack of open elements and acknowledge the token's self-closing flag.

@xPaw
Copy link
Contributor Author

xPaw commented Jan 16, 2025

Seems like acknowledge disables the error:

When a start tag token is emitted with its self-closing flag set, if the flag is not acknowledged when it is processed by the tree construction stage, that is a non-void-html-element-start-tag-with-trailing-solidus parse error.

@lexborisov
Copy link

Seems like acknowledge disables the error:

When a start tag token is emitted with its self-closing flag set, if the flag is not acknowledged when it is processed by the tree construction stage, that is a non-void-html-element-start-tag-with-trailing-solidus parse error.

It seems to be, but it would seem that it will always be false for Foreign elements. What's the point of calling it in then?
Ok, I'll reread the spec sheet again and if anything, I'll contact the folks at WATHWG.

@xPaw
Copy link
Contributor Author

xPaw commented Jan 16, 2025

The spec is kind of all over the place, but this would imply that it gets set to true during tag name parse:

U+002F SOLIDUS (/)
Switch to the self-closing start tag state.

3.2.5.40 Self-closing start tag state
Consume the next input character:

U+003E GREATER-THAN SIGN (>)
Set the self-closing flag of the current tag token. Switch to the data state. Emit the current tag token.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants