Commit

Merge pull request #4206 from omnivore-app/fix/substack-headings
fix: Substack headings are removed because their class name contains "header"
sywhb authored Jul 18, 2024
2 parents fdbd182 + 7d2e10b commit 52341e5
Showing 7 changed files with 2,268 additions and 341 deletions.
5 changes: 4 additions & 1 deletion packages/readabilityjs/Readability.js
@@ -206,7 +206,10 @@ Readability.prototype = {
unlikelyCandidates: /\bad\b|ai2html|banner|breadcrumbs|breadcrumb|combx|comment|community|cover-wrap|disqus|extra|footer|gdpr|header|legends|menu|related|remark|replies|rss|shoutbox|sidebar|skyscraper|social|sponsor|supplemental|ad-break|agegate|pagination|pager(?!ow)|popup|yom-remote|copyright|keywords|outline|infinite-list|beta|recirculation|site-index|hide-for-print|post-end-share-cta|post-end-cta-full|post-footer|post-head|post-tag|li-date|main-navigation|programtic-ads|outstream_article|hfeed|comment-holder|back-to-top|show-up-next|onward-journey|topic-tracker|list-nav|block-ad-entity|adSpecs|gift-article-button|modal-title|in-story-masthead|share-tools|standard-dock|expanded-dock|margins-h|subscribe-dialog|icon|bumped|dvz-social-media-buttons|post-toc|mobile-menu|mobile-navbar|tl_article_header|mvp(-post)*-(add-story|soc(-mob)*-wrap)|w-condition-invisible|rich-text-block main w-richtext|rich-text-block_ataglance at-a-glance test w-richtext|PostsPage-commentsSection|hide-text|text-blurple|bottom-wrapper/i,
// okMaybeItsACandidate: /and|article(?!-breadcrumb)|body|column|content|main|shadow|post-header/i,
get okMaybeItsACandidate() {
- return new RegExp(`and|(?<!${this.articleNegativeLookAheadCandidates.source})article(?!-(${this.articleNegativeLookBehindCandidates.source}))|body|column|content|^(?!main-navigation|main-header)main|shadow|post-header|hfeed site|blog-posts hfeed|container-banners|menu-opacity|header-with-anchor-widget|commentOnSelection|highlight--with-header`, 'i')
+ return new RegExp(
+ `and|(?<!${this.articleNegativeLookAheadCandidates.source})article(?!-(${this.articleNegativeLookBehindCandidates.source}))|body|column|content|^(?!main-navigation|main-header)main|shadow|post-header|hfeed site|blog-posts hfeed|container-banners|menu-opacity|header-with-anchor-widget|commentOnSelection|highlight--with-header|header-anchor-post`,
+ 'i'
+ )
},

positive: /article|body|content|entry|hentry|h-entry|main|page|pagination|post|text|blog|story|tweet(-\w+)?|instagram|image|container-banners|player|commentOnSelection/i,
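Why the headings disappeared, as a minimal sketch: this assumes the fork keeps upstream Mozilla Readability's rule that an element is stripped when its `className + " " + id` matches `unlikelyCandidates` but not `okMaybeItsACandidate`. The regexes below are abbreviated excerpts of the ones in the diff, `isStripped` is a hypothetical helper, and the class strings are illustrative rather than copied from the Substack test page.

```javascript
// Sketch only: assumes upstream Mozilla Readability's removal rule.
// Patterns are abbreviated excerpts of the real regexes, not the full ones.
const unlikelyCandidates = /banner|comment|footer|header|menu|sidebar/i;

// okMaybeItsACandidate before the fix (abbreviated excerpt).
const okMaybeBefore =
  /article|body|content|main|post-header|header-with-anchor-widget|highlight--with-header/i;

// After the fix: the same pattern plus the new "header-anchor-post" token.
const okMaybeAfter = new RegExp(`${okMaybeBefore.source}|header-anchor-post`, "i");

// Hypothetical helper: true when a node with this class/id would be removed.
function isStripped(className, id, okMaybe) {
  const matchString = `${className} ${id}`;
  return unlikelyCandidates.test(matchString) && !okMaybe.test(matchString);
}

// Illustrative class strings (not taken from the test page).
const substackHeading = "header-anchor-post";
const siteHeader = "site-header";

console.log(isStripped(substackHeading, "", okMaybeBefore)); // true: heading removed
console.log(isStripped(substackHeading, "", okMaybeAfter)); // false: heading kept
console.log(isStripped(siteHeader, "", okMaybeAfter)); // true: still filtered
```

Because the check is an unanchored substring match, any class containing `header` is at risk. Whitelisting the specific `header-anchor-post` token, rather than dropping `header` from `unlikelyCandidates`, presumably keeps ordinary site headers such as `site-header` filtered out.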
686 changes: 346 additions & 340 deletions packages/readabilityjs/test/index.html

Large diffs are not rendered by default.

Large diffs are not rendered by default.

@@ -0,0 +1,12 @@
{
  "title": "Seniority",
  "byline": "Franco Fernando",
  "dir": null,
  "excerpt": "How to behave as a senior software engineer: 7 behaviors that every senior software should have.",
  "siteName": "The Polymathic Engineer",
  "siteIcon": "https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52a2cb03-66ce-42b1-b84f-69b1c253a8db%2Ffavicon.ico",
  "previewImage": "https://substackcdn.com/image/fetch/f_auto,q_auto:best,fl_progressive:steep/https%3A%2F%2Ffrancofernando.substack.com%2Fapi%2Fv1%2Fpost_preview%2F146543353%2Ftwitter.jpg%3Fversion%3D4",
  "publishedDate": "2024-07-13T07:30:49.000Z",
  "language": "English",
  "readerable": true
}
179 changes: 179 additions & 0 deletions packages/readabilityjs/test/test-pages/substack-headings/expected.html

Large diffs are not rendered by default.

1,631 changes: 1,631 additions & 0 deletions packages/readabilityjs/test/test-pages/substack-headings/source.html

Large diffs are not rendered by default.

@@ -0,0 +1 @@
https://newsletter.francofernando.com/p/seniority
