web-scraper

This is a scraper script that uses multi-threaded spiders to find all the links on a web page.
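Finding the links on a page comes down to parsing its HTML and collecting every `<a href="...">`. Below is a minimal sketch of that step using only the standard library's `html.parser` and `urllib.parse.urljoin`. The names `LinkFinder` and `find_links` are illustrative assumptions and are not necessarily what this repository's code uses.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkFinder(HTMLParser):
    """Collects the absolute URLs of all <a href="..."> tags in an HTML page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links such as "/about" against the page URL
                self.links.add(urljoin(self.base_url, value))


def find_links(base_url: str, html: str) -> set[str]:
    """Return the set of absolute link URLs found in the given HTML."""
    finder = LinkFinder(base_url)
    finder.feed(html)
    return finder.links


if __name__ == "__main__":
    page = '<a href="/about">About</a> <a href="https://example.com/blog">Blog</a>'
    print(sorted(find_links("https://example.com/", page)))
    # prints ['https://example.com/about', 'https://example.com/blog']
```

Using a set means a link that appears several times on the same page is only recorded once.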

Warning: the script runs many threads at once, so it can slow down your computer.
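One way to keep a multi-threaded crawler from overloading your machine is to cap the number of worker threads. Here is a minimal sketch of that idea with `queue.Queue` and `threading` from the standard library. `NUMBER_OF_THREADS`, `crawl` and `get_links` are hypothetical names for illustration, and `get_links(url)` is assumed to download one page and return the links on it.

```python
import queue
import threading

NUMBER_OF_THREADS = 4  # hypothetical setting: lower it if the crawl slows your computer


def crawl(homepage, get_links, number_of_threads=NUMBER_OF_THREADS):
    """Crawl outward from homepage using a fixed pool of worker threads.

    get_links(url) is a caller-supplied function (an assumption of this
    sketch) that returns the links found on one page.
    """
    jobs = queue.Queue()
    seen = {homepage}  # every URL ever queued, so no page is crawled twice
    crawled = set()
    lock = threading.Lock()

    def worker():
        while True:
            url = jobs.get()
            try:
                links = get_links(url)
            except Exception:
                jobs.task_done()  # skip pages that fail to download or parse
                continue
            for link in links:
                with lock:
                    if link in seen:
                        continue
                    seen.add(link)
                # Queue new work *before* task_done so jobs.join() keeps waiting
                jobs.put(link)
            with lock:
                crawled.add(url)
            jobs.task_done()

    for _ in range(number_of_threads):
        # Daemon threads do not keep the program alive once crawling is done
        threading.Thread(target=worker, daemon=True).start()
    jobs.put(homepage)
    jobs.join()  # returns once every queued URL has been processed
    return crawled
```

The thread count, not the size of the site, decides how many pages are fetched at the same time; the shared `seen` set guarded by a lock is what stops two threads from crawling the same URL.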

How to use:

  1. In main.py, set the variables PROJECT_NAME (the name of the output directory) and HOMEPAGE (the URL to start crawling from)
  2. In your terminal, run the script: python main.py
  3. Wait for the crawl to finish; this can take some time
  4. When it finishes, the links are saved in "<PROJECT_NAME>/crawled.txt"
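A minimal sketch of the last step, writing the results to `<PROJECT_NAME>/crawled.txt`. The README does not describe the file format, so this assumes one URL per line; `save_crawled` is a hypothetical helper name, not a function from main.py.

```python
import os


def save_crawled(project_name, links):
    """Write one URL per line to <project_name>/crawled.txt and return its path.

    Assumes the one-URL-per-line format; sorting keeps the output stable.
    """
    os.makedirs(project_name, exist_ok=True)  # create the project directory if needed
    path = os.path.join(project_name, "crawled.txt")
    with open(path, "w", encoding="utf-8") as f:
        for link in sorted(links):
            f.write(link + "\n")
    return path
```

For example, `save_crawled(PROJECT_NAME, crawled_links)` would create the directory named after your project on the first run and overwrite crawled.txt on later runs.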

To see the total number of crawled pages:

  1. Go to the <PROJECT_NAME> directory
  2. In your terminal, run: python number.py
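The contents of number.py are not shown in this README. A counter along these lines would work if crawled.txt holds one URL per line (an assumption); `count_pages` is a hypothetical name.

```python
import os


def count_pages(path="crawled.txt"):
    """Count the non-empty lines in crawled.txt, assuming one URL per line."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for line in f if line.strip())


if __name__ == "__main__":
    if os.path.exists("crawled.txt"):
        print(count_pages())
    else:
        print("crawled.txt not found; run this from the <PROJECT_NAME> directory")
```

Skipping blank lines keeps a trailing newline at the end of the file from being counted as an extra page.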

Resource used: thenewboston