
Buggy output on pages with javascript #53

Open
falkecarlsen opened this issue Oct 28, 2019 · 3 comments
Labels
bug Something isn't working

Comments

@falkecarlsen
Member

While testing on Reddit, the following output was produced when a task hit a submission. Something janky might have been submitted to the queue.

Output:

Worker W1 received task https://old.reddit.com/r/BetterEveryLoop/gilded
Worker W1 received task https://old.reddit.com/r/BetterEveryLoop/comments/do6b4r/donald_slowly_realising_a_whole_stadium_is_booing/
Worker W1 received task javascript: void 0;
W1 failed to download a page.
Worker W1 received task https://old.reddit.com/#
Worker W1 received task javascript:void(0)
W1 failed to download a page.
Worker W1 received task javascript:void(0)
W1 failed to download a page.
Worker W1 received task https://old.reddit.com/
@falkecarlsen falkecarlsen added the bug Something isn't working label Oct 28, 2019
@falkecarlsen
Member Author

Could have something to do with the extractor that @jenrik built.

@jenrik
Member

jenrik commented Oct 29, 2019

The HTMLLinkExtractor assumes that the href attribute of an anchor tag is always a valid URL, so yes, it's broken because of the extractor I built.
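
To illustrate the failure mode described above, here is a minimal Python sketch. It is not the project's HTMLLinkExtractor: `NaiveLinkExtractor` and `extract_naive` are hypothetical names, and the only assumption carried over from the thread is that every href is passed on unfiltered.

```python
from html.parser import HTMLParser


class NaiveLinkExtractor(HTMLParser):
    """Hypothetical stand-in: collects every href on <a> tags, unfiltered."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_naive(html):
    """Return all href values found in anchor tags, valid URL or not."""
    parser = NaiveLinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


page = (
    '<a href="https://old.reddit.com/">home</a>'
    '<a href="javascript:void(0)">load more</a>'
)
print(extract_naive(page))
# → ['https://old.reddit.com/', 'javascript:void(0)']
```

Anything returned here would end up in the queue as a task, which would match the `javascript:void(0)` entries in the output above.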

@lars4509
Contributor

lars4509 commented Nov 21, 2019

The extractor now only accepts links whose scheme is http or https. This also means that javascript:void(0) will be discarded, since the parser will recognise javascript as the scheme.
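
A rough sketch of that kind of scheme check, written in Python with the standard `urllib.parse` module. This is an illustration under assumptions, not the code that was merged: `filter_links` and `ALLOWED_SCHEMES` are hypothetical names, and resolving relative links against the page URL is an extra step the comment above does not mention.

```python
from urllib.parse import urljoin, urlsplit

# Assumption: only these two schemes are worth crawling.
ALLOWED_SCHEMES = {"http", "https"}


def filter_links(base_url, hrefs):
    """Resolve hrefs against base_url and keep only http/https links."""
    kept = []
    for href in hrefs:
        # Relative links inherit the scheme of the page they were found on;
        # hrefs with their own scheme (e.g. javascript:) are left untouched.
        absolute = urljoin(base_url, href.strip())
        if urlsplit(absolute).scheme.lower() in ALLOWED_SCHEMES:
            kept.append(absolute)
    return kept


links = filter_links(
    "https://old.reddit.com/r/BetterEveryLoop/",
    [
        "javascript:void(0)",
        "javascript: void 0;",
        "/r/BetterEveryLoop/gilded",
        "https://old.reddit.com/",
    ],
)
print(links)
```

With this check, both `javascript:` entries from the reported output are dropped before they reach the queue, while the relative and absolute Reddit links are kept.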
