Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexpected sourcepos. #301

Closed
ttakuru88 opened this issue May 3, 2023 · 16 comments · Fixed by #439
Closed

Unexpected sourcepos. #301

ttakuru88 opened this issue May 3, 2023 · 16 comments · Fixed by #439

Comments

@ttakuru88
Copy link

ttakuru88 commented May 3, 2023

Hi. "sourcepos" is exactly what I was looking for.
What do you think of this result?

markdown

[AB
CD](/)

my expected

<p data-sourcepos="1:1-2:6"><a data-sourcepos="1:1-2:6" href="/">AB
CD</a></p>

comrak v0.18.0

<p data-sourcepos="1:1-2:6"><a data-sourcepos="2:1-2:6" href="/">AB
CD</a></p>

Similar pattern.

markdown

- A
[![B](/B.png)](/B)

comrak

<ul data-sourcepos="1:1-2:18">
<li data-sourcepos="1:1-2:18">A
<a data-sourcepos="2:3-2:20" href="/B"><img data-sourcepos="2:4-2:15" src="/B.png" alt="B" /></a></li>

cmark

<ul data-sourcepos="1:1-2:18">
<li data-sourcepos="1:1-2:18">A
<a href="/B"><img src="/B.png" alt="B" /></a></li>
</ul>
@kivikakk
Copy link
Owner

kivikakk commented May 4, 2023

That is a bit surprising! It's actually almost per upstream. Here's cmark-gfm:

$ cat o
[AB
CD](/)

$ build/src/cmark-gfm o -t xml --sourcepos
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE document SYSTEM "CommonMark.dtd">
<document sourcepos="1:1-2:6" xmlns="http://commonmark.org/xml/1.0">
  <paragraph sourcepos="1:1-2:6">
    <link sourcepos="2:1-2:6" destination="/" title="">
      <text sourcepos="1:2-1:3" xml:space="preserve">AB</text>
      <softbreak />
      <text sourcepos="2:1-2:2" xml:space="preserve">CD</text>
    </link>
  </paragraph>
</document>

$ build/src/cmark-gfm o --sourcepos
<p data-sourcepos="1:1-2:6"><a href="/">AB
CD</a></p>

So, in terms of the AST, comrak is doing the same thing as upstream, as shown in <link sourcepos="2:1-2:6"; the difference is that upstream doesn't actually put sourcepos on <a> in output HTML at all.

I'd be happy with fixing this in either direction; either fix how we record sourcepos on the link node, or stop outputting in it HTML. (It's possible that "fixing" the former might have unintended consequences.)

@ttakuru88
Copy link
Author

I see. I prefer the latter(stop outputting).
I don't think writing logic that has nothing to do with the upstream is a good idea since it will make things complex.

@kivikakk
Copy link
Owner

kivikakk commented May 4, 2023

Me too! PRs happily accepted; I don't have the time for this right now myself.

@digitalmoksha
Copy link
Collaborator

Wait...the idea is to not have sourcepos on the <a>? Are you wanting to remove from other non-block html? Please don't...

I tried running it against markdown-it.rs, I get what was expected

<p data-sourcepos="1:1-2:6"><a data-sourcepos="1:1-2:6" href="/">AB
CD</a></p>

@kivikakk
Copy link
Owner

kivikakk commented May 9, 2023

In general, I'd rather "matches the upstream we explicitly follow" than not. Right now we have "doesn't match upstream and is incorrect". I'd accept a solution either way.

@digitalmoksha
Copy link
Collaborator

But it does match the upstream, doesn't it? Block-level sourcepos works correctly I think.

Inline sourcepos is a great enhancement. I'd certainly rather see a fix, but if that's not feasible at the moment, then how about an option to enable it? Those that want it can enable it, and work on a fix. comrak is already going beyond what upstream offers (thankfully) while trying to maintain backward compatibility. So I think ripping it out should be last resort.

@kivikakk kivikakk added the bug label Jul 11, 2024
@kivikakk
Copy link
Owner

Funny thing I found while getting to work on fixing this. I tried pasting the last example, i.e. this:

- A
[![B](/B.png)](/B)

directly into a GitHub comment. Imagine my surprise:

  • A
    B

@kivikakk
Copy link
Owner

#439 fixes this, but only by no longer reporting sourcepos information for inlines in output HTML. (Block-level is unaffected, and the data's still there in the AST or XML output if you want it.)

See #439 (comment) for comments on effort required to get inline sourcepos working consistently.

@digitalmoksha
Copy link
Collaborator

@kivikakk ouch. This is a huge breaking change for us. We've built significant functionality on top of inline sourcepos. Its removal is a real problem for us.

I understand it's not 100% accurate, but it's been pretty darn accurate for most things.

Also releasing it along with the other great fixes you just put in means we can't use them.

Not sure what I'm going to do yet. Probably the only thing I can do is build our own HTML outputter. Or if an inline-sourcepos option was added to re-enable it, obviously with the caveat that it's sometimes inaccurate.

@kivikakk
Copy link
Owner

I'd be happy to add it back in with an option, yeah! But "sometimes inaccurate" to me reads as "garbage in, garbage out"; of course, if your pipeline means those cases aren't hit, it's not such a big deal, but we can't even enumerate them all because there's some fundamental brokenness here. (It's why, I'm guessing, they aren't exposed in cmark.) That brokenness then becomes the sourcepos user's brokenness in unexpected ways.

So, happy to take an option to re-add them in the HTML output, but it needs to be called something like experimental-inline-sourcepos. I think removing it from the default sourcepos output is correct, and I'm sorry that this hurts your workflow!

@digitalmoksha
Copy link
Collaborator

Thanks @kivikakk 🙇. I agree with naming an option experimental-inline-sourcepos.

I'm not happy with "garbage in, garbage out" either. So far we haven't hit these particular cases, but as we come to rely more heavily on it, I'm sure we will eventually run across them.

I'll see about reallocating some of my time to work on some type of fix, though I may have to lean on your expertise 😄

@digitalmoksha
Copy link
Collaborator

To better clarify what we use it for, in GitLab we have a rich text editor for our markdown fields (comments, descriptions, etc). Markdown is the single source of truth, and we convert from markdown to rich text and back. There is actually a button in the edit field that allow you to switch back and forth.

Sourcepos allows us to have fidelity when converting between to the two representations - in general we don't want to change markdown the user has created but not edited. It's still a work in progress, but it's evolving.

You can test it out in any comment field in an issue, for example in https://gitlab.com/gitlab-org/gitlab/-/issues/456980

@kivikakk
Copy link
Owner

I don't quite understand how sourcepos is used in that conversion (my experiments resulted in changes on every roundtrip), but I'm glad it's helpful!

@digitalmoksha
Copy link
Collaborator

Yeah, it's very much a work in progress 😅

Though I am going to have to go back and adjust expectations.

@kivikakk
Copy link
Owner

:D I see! It looks like you might be after more CST-like support as well (thinking of #422 here). In general I don't mind going that route, but for sourcepos fidelity, the big change forewarned about in #439 (comment) will be unavoidable.

@digitalmoksha
Copy link
Collaborator

Possibly, though I don't understand the ramifications of that yet, particularly since at the moment it's the generated HTML that gets converted into the editor.

the big change forewarned about in #439 (comment) will be unavoidable.

yeah I figured

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants