Allow resume (Crawler) #127

Open
stooit opened this issue Apr 6, 2020 · 0 comments
Labels
enhancement New feature or request

Comments


stooit commented Apr 6, 2020

Description
If a crawl is interrupted, it has to start over. The cache makes this very fast, since the crawler passes straight through cached content, but it would be better if it simply resumed from where it left off.

Proposed solution
Keep a copy of the MigrateCrawlQueue urls and pendingUrls sets in a lockfile. If the same config is run again, ask the user whether they would like to resume.
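
As a rough sketch of what that lockfile could look like, assuming a PHP codebase: the class name CrawlLockfile, the JSON layout, and the lockfile location below are all hypothetical, not the project's existing API. The file is keyed to a hash of the config so a run with a different config never picks up stale state, and it is written atomically so an interrupted save cannot corrupt it.

```php
<?php
// Hypothetical sketch only: CrawlLockfile and its JSON layout are
// illustrative, not the project's actual API.

class CrawlLockfile
{
    private string $path;

    public function __construct(string $configPath)
    {
        // Key the lockfile to the config contents so a different
        // config never resumes another crawl's state.
        $this->path = sys_get_temp_dir() . '/crawl-' . sha1_file($configPath) . '.lock.json';
    }

    /** Persist the processed and pending URL sets. */
    public function save(array $urls, array $pendingUrls): void
    {
        // Write to a temp file first, then rename: an interrupt
        // mid-save cannot leave a half-written lockfile behind.
        $tmp = $this->path . '.tmp';
        file_put_contents($tmp, json_encode([
            'urls'        => array_values($urls),
            'pendingUrls' => array_values($pendingUrls),
        ]));
        rename($tmp, $this->path);
    }

    /** Return saved state, or null if there is nothing to resume. */
    public function load(): ?array
    {
        if (!is_file($this->path)) {
            return null;
        }
        $state = json_decode(file_get_contents($this->path), true);
        return is_array($state) ? $state : null;
    }

    /** Remove the lockfile once the crawl completes normally. */
    public function clear(): void
    {
        @unlink($this->path);
    }
}

// Example resume prompt at startup (readline ships with most PHP CLI builds):
$lock = new CrawlLockfile('config.yml');
if (($state = $lock->load()) !== null) {
    $answer = readline(count($state['pendingUrls']) . " pending URLs found. Resume? [y/N] ");
    if (strtolower(trim($answer)) !== 'y') {
        $lock->clear();
        $state = null;
    }
}
```

The crawler would call save() after each crawled page (or each batch), and clear() on normal completion so a finished crawl never prompts for a resume.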

Additional context
This is currently moderately painful when crawling very large sites (e.g. millions of pages).
