web-scraper

A scraper script that finds all the links on a page using multi-threaded spiders.
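To illustrate what "finding all the links on a page" involves, here is a minimal sketch using only the Python standard library. It is not the project's actual code: the names `LinkFinder` and `find_links` are assumptions made for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkFinder(HTMLParser):
    """Collect the absolute URL of every <a href="..."> in an HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.add(urljoin(self.base_url, value))


def find_links(html, base_url):
    """Return the set of links found in the given HTML string."""
    finder = LinkFinder(base_url)
    finder.feed(html)
    return finder.links
```

Calling `find_links(page_html, "https://example.com/")` returns a set, so a link that appears several times on the page is only reported once.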

Warning: it runs multiple threads, so it can slow down your computer.
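The load depends mostly on how many worker threads run at once. The sketch below shows one common way to run a fixed number of workers over a shared queue. It is an illustration under assumptions, not the project's implementation: `crawl_all`, `crawl_page`, and `NUMBER_OF_THREADS` are hypothetical names.

```python
import queue
import threading

NUMBER_OF_THREADS = 4  # hypothetical setting; fewer threads means less load


def crawl_all(urls, crawl_page, number_of_threads=NUMBER_OF_THREADS):
    """Run crawl_page(url) for every URL using a fixed pool of worker threads."""
    jobs = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            url = jobs.get()
            try:
                outcome = crawl_page(url)
            except Exception as error:  # keep the worker alive on a failed page
                outcome = error
            with lock:
                results[url] = outcome
            jobs.task_done()

    # Daemon threads stay blocked on the empty queue and end with the program.
    for _ in range(number_of_threads):
        threading.Thread(target=worker, daemon=True).start()

    for url in urls:
        jobs.put(url)

    jobs.join()  # wait until every queued URL has been processed
    return results
```

If the machine becomes sluggish, lowering the number of threads in a design like this trades crawl speed for responsiveness.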

Usage:

In main.py, set the variables PROJECT_NAME and HOMEPAGE
In your terminal, run: python main.py
The crawl can take some time to complete
When it finishes, your links will be in the file "<PROJECT_NAME>/crawled.txt"
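As an illustration of the first step, the two variables might be set as follows. The values are placeholders, and `CRAWLED_FILE` is a hypothetical helper name added here only to show where the output path comes from.

```python
import os

# Hypothetical example values; replace them with your own project and site.
PROJECT_NAME = "example-site"
HOMEPAGE = "https://example.com/"

# Location of the output file described in the steps above.
CRAWLED_FILE = os.path.join(PROJECT_NAME, "crawled.txt")
```

With these placeholder values, the links would be written to example-site/crawled.txt.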

To see the total number of pages:

Go to the <PROJECT_NAME> directory
In your terminal, run: python number.py
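The contents of number.py are not shown here. As a rough sketch of how such a count could be done, assuming crawled.txt holds one link per line, the hypothetical function `count_links` below counts the non-empty lines in a file.

```python
def count_links(path="crawled.txt"):
    """Return the number of non-empty lines in the file at `path`."""
    with open(path, encoding="utf-8") as handle:
        return sum(1 for line in handle if line.strip())
```

Run from inside the <PROJECT_NAME> directory, `print(count_links())` would print the number of links recorded in crawled.txt.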

Resource used: thenewboston