Allow resume (Crawler) #127

Open
stooit opened this issue Apr 6, 2020 · 0 comments
Labels
enhancement New feature or request

Comments


stooit commented Apr 6, 2020

Description
If a crawl is interrupted, it has to start over. The cache makes this very fast, since the crawler passes straight through cached content, but it would be better if it simply resumed from where it left off.

Proposed solution
Keep a copy of the MigrateCrawlQueue urls and pendingUrls sets in a lockfile. If the same config is run again, ask the user whether they would like to resume.
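
As a rough sketch of what that lockfile could look like, assuming a PHP codebase: the class name CrawlLockfile, the JSON layout, and the lockfile location below are all hypothetical, not the project's existing API. The file is keyed to a hash of the config so a run with a different config never picks up stale state, and it is written atomically so an interrupted save cannot corrupt it.

```php
<?php
// Hypothetical sketch only: CrawlLockfile and its JSON layout are
// illustrative, not the project's actual API.

class CrawlLockfile
{
    private string $path;

    public function __construct(string $configPath)
    {
        // Key the lockfile to the config contents so a different
        // config never resumes another crawl's state.
        $this->path = sys_get_temp_dir() . '/crawl-' . sha1_file($configPath) . '.lock.json';
    }

    /** Persist the processed and pending URL sets. */
    public function save(array $urls, array $pendingUrls): void
    {
        // Write to a temp file first, then rename: an interrupt
        // mid-save cannot leave a half-written lockfile behind.
        $tmp = $this->path . '.tmp';
        file_put_contents($tmp, json_encode([
            'urls'        => array_values($urls),
            'pendingUrls' => array_values($pendingUrls),
        ]));
        rename($tmp, $this->path);
    }

    /** Return saved state, or null if there is nothing to resume. */
    public function load(): ?array
    {
        if (!is_file($this->path)) {
            return null;
        }
        $state = json_decode(file_get_contents($this->path), true);
        return is_array($state) ? $state : null;
    }

    /** Remove the lockfile once the crawl completes normally. */
    public function clear(): void
    {
        @unlink($this->path);
    }
}

// Example resume prompt at startup (readline ships with most PHP CLI builds):
$lock = new CrawlLockfile('config.yml');
if (($state = $lock->load()) !== null) {
    $answer = readline(count($state['pendingUrls']) . " pending URLs found. Resume? [y/N] ");
    if (strtolower(trim($answer)) !== 'y') {
        $lock->clear();
        $state = null;
    }
}
```

The crawler would call save() after each crawled page (or each batch), and clear() on normal completion so a finished crawl never prompts for a resume.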

Additional context
This is currently moderately painful when crawling very large sites (e.g. millions of pages).
