Allow custom options for URL normalisation #40

Open
falkecarlsen opened this issue Oct 23, 2019 · 0 comments
Labels
enhancement New feature or request req/could-have could be a nice to finish before deadline

Comments

@falkecarlsen
Member

Note that the www domain label alone is enough to make two otherwise identical URLs compare as unequal according to the Url-package. This is expected behaviour, since adding or removing the label does change the host the URL refers to.

For scraping, some links may include the www label and others may not, e.g. human-written links would probably omit it, while automatically generated links would probably include it for completeness. This could result in two seemingly different tasks being created that actually point to the same page.
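
To make the request concrete, here is a minimal sketch of what opt-in normalisation could look like. It is written in Python with only the standard library and is not based on the Url-package; `NormaliseOptions`, `strip_www`, `strip_fragment` and `normalise_url` are hypothetical names chosen for illustration, and the defaults are assumptions rather than a proposal for this project's API.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit, urlunsplit


@dataclass(frozen=True)
class NormaliseOptions:
    """Hypothetical option set; names and defaults are illustrative only."""
    strip_www: bool = True       # treat www.example.com and example.com as equal
    strip_fragment: bool = True  # ignore "#section" when comparing pages


def normalise_url(url: str, options: NormaliseOptions = NormaliseOptions()) -> str:
    """Return a canonical form of `url` for de-duplicating scrape tasks.

    Limitations of this sketch: userinfo (user:pass@) is dropped and
    IPv6 literal hosts are not handled.
    """
    parts = urlsplit(url)
    # `hostname` is already lower-cased by urlsplit; guard against None.
    host = parts.hostname or ""
    if options.strip_www and host.startswith("www."):
        host = host[len("www."):]
    # Keep an explicit port if the original URL had one.
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    fragment = "" if options.strip_fragment else parts.fragment
    return urlunsplit(
        (parts.scheme.lower(), netloc, parts.path, parts.query, fragment)
    )
```

With the assumed defaults, `normalise_url("https://www.example.com/a")` and `normalise_url("https://example.com/a")` produce the same string, so the two links would map to one task. Passing `NormaliseOptions(strip_www=False)` keeps the stricter comparison for sites where the www host really does serve different content.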

@falkecarlsen falkecarlsen added the enhancement New feature or request label Oct 23, 2019
@lars4509 lars4509 self-assigned this Oct 23, 2019
@falkecarlsen falkecarlsen added the req/could-have could be a nice to finish before deadline label Oct 28, 2019

2 participants