support HTML definition lists (<dl>, <dt>, and <dd>) #173

Open: wants to merge 1 commit into develop

@chrispy-snps (Collaborator) commented Dec 31, 2024

Fixes #172.

This PR adds new convert_dt() and convert_dd() functions that follow the PHP Markdown Extra definition-list syntax:

https://michelf.ca/projects/php-markdown/extra/#def-list
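As a rough illustration of that target syntax, here is a standalone sketch. It is not the code in this PR, and the function name render_def_list and the (terms, definitions) grouping are hypothetical. Each term goes on its own line, each definition starts with a colon padded to a four-column indent, and groups are separated by a blank line. Only bare (non-paragraph) definitions are handled.

```python
from typing import List, Tuple

# Hypothetical grouping: several <dt> terms may share one or more <dd> definitions.
Group = Tuple[List[str], List[str]]


def render_def_list(groups: List[Group]) -> str:
    """Render definition-list groups in PHP Markdown Extra syntax (bare definitions only)."""
    blocks = []
    for terms, definitions in groups:
        # Terms that share a definition sit on directly adjacent lines.
        lines = list(terms)
        # Each definition starts with ":" followed by spaces (4-column indent).
        lines += [":   " + d for d in definitions]
        blocks.append("\n".join(lines))
    # A blank line between groups keeps the next term from being read
    # as part of the previous definition.
    return "\n\n".join(blocks) + "\n"


print(render_def_list([(["Apple"], ["Pomaceous fruit"]),
                       (["term 1a", "term 1b"], ["definition"])]))
```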

If additional definition list dialects are requested in the future, a configuration option can be added to select the format.

No convert_dl() function is added; the child-tag conversion functions do all the work.
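To illustrate why a <dl> handler is not needed, here is a toy converter. It is hypothetical and built on xml.etree.ElementTree rather than markdownify/BeautifulSoup, so it only accepts well-formed snippets. It dispatches to convert_<tag> methods, following the naming used in this PR, and passes the concatenated output of any tag without a handler straight through.

```python
import xml.etree.ElementTree as ET


class TinyConverter:
    """Toy convert_<tag> dispatcher (hypothetical; NOT markdownify's code)."""

    def convert(self, el: ET.Element) -> str:
        # Convert children first; tail text between elements is ignored in this toy.
        text = (el.text or "").strip()
        for child in el:
            text += self.convert(child)
        # Tags without a convert_<tag> method (such as <dl>) pass text through.
        handler = getattr(self, "convert_" + el.tag, None)
        return handler(el, text) if handler else text

    def convert_dt(self, el, text):
        # Start each term on a fresh line; after a definition this leaves a blank line.
        return "\n" + text + "\n"

    def convert_dd(self, el, text):
        # Definition marker: colon plus spaces.
        return ":   " + text + "\n"


root = ET.fromstring("<dl><dt>term</dt><dd>definition</dd></dl>")
print(TinyConverter().convert(root))
```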

The regression tests are updated to cover various definition-list structures. I also used Pandoc to confirm that all Markdownify results convert back to the expected HTML source.
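The exact Pandoc command is not shown here. The following is a minimal sketch of such a round-trip check. It assumes pandoc is installed and on PATH, and the helper name markdown_to_html is hypothetical. Pandoc's default markdown reader enables the definition_lists extension, whose syntax is based on PHP Markdown Extra.

```python
import shutil
import subprocess


def markdown_to_html(md: str) -> str:
    """Convert Markdown to HTML by piping it through Pandoc (assumed to be on PATH)."""
    result = subprocess.run(
        ["pandoc", "--from", "markdown", "--to", "html"],
        input=md,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if pandoc fails
    )
    return result.stdout


# Only run the check when pandoc is actually available.
if shutil.which("pandoc"):
    print(markdown_to_html("term\n:   definition\n"))
```

Comparing this output against the original HTML (after normalizing whitespace) gives the round-trip check.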

Note: This pull request requires that #171 be merged first; otherwise the test_dl unit test will fail.

Limitations

This support has two limitations, both caused by the blank lines that are added outside the scope of the convert_dt() and convert_dd() functions.

Limitation 1 - multiple terms that share the same definition are not handled properly (the term lines are separated by a blank line instead of being kept directly adjacent):

<dl>
  <dt>term 1a</dt>
  <dt>term 1b</dt>
  <dd>definition</dd>
</dl>
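Here is a simplified model of the cause. The helpers naive_join and context_join are hypothetical and are not markdownify's actual joining logic. If the caller pads every converted <dt>/<dd> with a blank line, the shared terms end up separated. A context-aware join that only starts a new group before a <dt> that follows a <dd> keeps them adjacent.

```python
from typing import List, Optional, Tuple

# Hypothetical input: (tag, rendered_text) for each direct child of <dl>.
Item = Tuple[str, str]


def naive_join(items: List[Item]) -> str:
    # Model of the limitation: every child is separated by a blank line,
    # regardless of which tag precedes it.
    return "\n\n".join(text for _, text in items) + "\n"


def context_join(items: List[Item]) -> str:
    # Sketch of a fix: a blank line only before a <dt> that follows a <dd>.
    out = []
    prev: Optional[str] = None
    for tag, text in items:
        if prev is not None:
            out.append("\n\n" if (tag == "dt" and prev == "dd") else "\n")
        out.append(text)
        prev = tag
    return "".join(out) + "\n"


items = [("dt", "term 1a"), ("dt", "term 1b"), ("dd", ":   definition")]
print(naive_join(items))
print(context_join(items))
```

The naive join also puts a blank line before the definition, which is the behavior described in Limitation 2.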

Limitation 2 - a blank line is always inserted before definitions, which marks them as paragraph-based definitions even when the source definitions were not wrapped in paragraphs:

<dl>
  <dt>term 1</dt>
  <dd>bare definition</dd>
  <dt>term 2</dt>
  <dd><p>definition in paragraph</p></dd>
</dl>
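One possible way to keep the two kinds of definitions distinct is sketched below. It uses xml.etree.ElementTree as an assumption (this is not how markdownify walks the tree), and render_dl is a hypothetical name. The idea is to emit a blank line before a definition only when its <dd> contains a <p>, because PHP Markdown Extra wraps a definition that is preceded by a blank line in a paragraph.

```python
import xml.etree.ElementTree as ET


def render_dl(dl: ET.Element) -> str:
    """Render <dl> children, keeping bare and paragraph definitions distinct (sketch)."""
    out = []
    prev = None
    for child in dl:
        text = "".join(child.itertext()).strip()
        if child.tag == "dt":
            if prev == "dd":
                out.append("")  # blank line before a new term group
            out.append(text)
        elif child.tag == "dd":
            if child.find("p") is not None:
                out.append("")  # blank line => paragraph-based definition
            out.append(":   " + text)
        prev = child.tag
    return "\n".join(out) + "\n"


dl = ET.fromstring(
    "<dl><dt>term 1</dt><dd>bare definition</dd>"
    "<dt>term 2</dt><dd><p>definition in paragraph</p></dd></dl>"
)
print(render_dl(dl))
```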

@ninsbl commented Jan 8, 2025

I just tested the feature locally, since I am interested in this functionality for Markdown conversion. It is great to see this becoming available, but I noticed that the result is one big paragraph of text. I think an additional newline is needed after the <dd> elements, for example.

@chrispy-snps (Collaborator, Author)

@ninsbl - can you share a small test case here so I can reproduce it?

@ninsbl commented Jan 8, 2025

Sure. We are currently looking into moving documentation for GRASS GIS from HTML to Markdown.

For example, this manual page uses <dl>, <dt>, and <dd>: https://grass.osgeo.org/grass84/manuals/grass.html

Here is the code used to convert it to Markdown (with python-markdownify installed from this PR's branch):

import bs4
import requests

from markdownify import MarkdownConverter

# Fetch the GRASS GIS manual page that uses <dl>/<dt>/<dd> markup
resp = requests.get("https://grass.osgeo.org/grass84/manuals/grass.html")
resp.raise_for_status()
soup = bs4.BeautifulSoup(resp.text, "html5lib")  # requires the html5lib package

# Same options as before, passed as keyword arguments
markdown = MarkdownConverter(
    heading_style="atx",
    escape_misc=True,
    code_language="shell",
    newline_style="backslash",
    wrap_width=79,
).convert_soup(soup)
print(markdown)

You can see the resulting Markdown here: https://github.com/ninsbl/grass/edit/md_test/md/grass.md. Please look at the "FLAGS" section.

Note the line break before the ":" (which becomes whitespace), and that only one line break follows each <dd> element, so it also becomes a space.

Also, element content seems to be only partially wrapped.

2 participants