Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add query parameter [Based on #357] #403

Merged
merged 11 commits into from
Apr 19, 2024
Merged

Add query parameter [Based on #357] #403

merged 11 commits into from
Apr 19, 2024

Conversation

addie9800
Copy link
Collaborator

This PR is a rebrand of and closes #392 where the code is changed to work with #357
The major change compared to the former version ist that the query_parameter attribute accepts a dictionary of key value pairs. Adding Heise will be done in a separate PR

@addie9800 addie9800 added the DON'T MERGE Current PR is based on another PR. When used, indicate in title with [Based on #...] label Apr 6, 2024
@addie9800 addie9800 changed the title Add query parameter Add query parameter [Based on #357] Apr 6, 2024
@MaxDall MaxDall changed the base branch from master to unbatch-fundus April 6, 2024 13:26
addie9800 added a commit that referenced this pull request Apr 14, 2024
@addie9800 addie9800 removed the DON'T MERGE Current PR is based on another PR. When used, indicate in title with [Based on #...] label Apr 17, 2024
Base automatically changed from unbatch-fundus to master April 18, 2024 10:28
Copy link
Collaborator

@MaxDall MaxDall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@addie9800 Thanks for adding. Looks like a great feature :)

@@ -18,7 +18,7 @@

def get_test_article(enum: PublisherEnum, url: Optional[str] = None) -> Optional[Article]:
if url is not None:
source = WebSource([url], publisher=enum.publisher_name)
source = WebSource([url], publisher=enum.publisher_name, query_parameters=enum.query_parameter)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is about crawling a specified URL we should not pass the query parameter here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, I'll remove that

delay: Optional[Delay] = None,
):
self.url_source = url_source
self.publisher = publisher
self.url_filter = url_filter
self.request_header = request_header or _default_header
self.query_parameters = query_parameters
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would suggest

self.query_parameters = query_parameters or {}

and remove

if self.query_parameters is not None:

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good

@addie9800 addie9800 requested a review from MaxDall April 19, 2024 10:44
src/fundus/scraping/html.py Outdated Show resolved Hide resolved
@addie9800 addie9800 merged commit 6e5f229 into master Apr 19, 2024
5 checks passed
@addie9800 addie9800 deleted the add-query-parameter branch April 19, 2024 11:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants