- Clone the repository into a directory of your choice and navigate to the vis folder (hereafter referred to as <vis_home>).
- From .../<vis_home>/, run `sudo sh setup_env.sh`.
- Then, from the command line, run `python3 setup.py install`.
To begin using this module, first `import scraper`. This imports `scraper.Session`, which keeps track of the scraping session you are currently running and handles multi-threading internally.
In the Scraper module there are two important classes: Scraper and Action. As a rule of thumb, Scraper is the information source, while Action is an action that acts on the information present in a Scraper.
The Scraper class is the source of truth for any Action that acts on it. This means the Scraper itself does not actually do anything; you submit actions to the Scraper, initialize the Actions, and then run the queue.
Instances of the Action class act on a Scraper. To run an action immediately, use `Action.execute(self)`. To spawn an action that attaches to the queue and runs when resources are available, use `Action.run(self)`. The latter attaches an action method to the Session queue, which runs it once a thread is available to handle it.
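The difference between running an action immediately and queueing it can be sketched with standard-library threads. This is a minimal, self-contained illustration, not this module's implementation: the `Scraper` and `Action` classes below are hypothetical stand-ins, and the `pending` queue, the `label` parameter, and the `drain` helper are assumptions made only for the example.

```python
import queue
import threading


class Scraper:
    """Hypothetical stand-in: holds information, performs no work itself."""

    def __init__(self, site):
        self.site = site
        self.results = []
        self.lock = threading.Lock()


class Action:
    """Hypothetical stand-in action that records a label on its Scraper."""

    pending = queue.Queue()  # assumed stand-in for the Session queue

    def __init__(self, scraper, label):
        self.scraper = scraper
        self.label = label

    def execute(self):
        # Run immediately on the calling thread.
        with self.scraper.lock:
            self.scraper.results.append(self.label)

    def run(self):
        # Defer: attach the bound execute method to the shared queue.
        Action.pending.put(self.execute)


def drain(num_threads=2):
    """Start worker threads that run queued actions until the queue is empty."""

    def worker():
        while True:
            try:
                task = Action.pending.get_nowait()
            except queue.Empty:
                return
            task()
            Action.pending.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


site = Scraper("https://www.google.com/")
Action(site, "immediate").execute()  # takes effect right away
Action(site, "queued").run()         # only enqueued at this point
before = list(site.results)          # ['immediate']
drain()
after = list(site.results)           # ['immediate', 'queued']
```

In this sketch the queued action has no effect until a worker thread picks it up, which mirrors the description above of an action waiting for an available thread.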
To create custom actions, extend the Action class and override the `Action.get_act(self, scraper)` method. This method should return a function (making `get_act` a higher-order function) that, when run, acts on the information stored in the Scraper. Actions can be chained by creating sub-actions inside the returned function and calling `Action.execute(scraper)` on each to run it immediately, thereby consolidating several pieces of functionality into a single Action.
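The custom-action pattern described above can be sketched as follows. This is a self-contained illustration under stated assumptions rather than the module's real code: the base `Action` and `Scraper` here are hypothetical stand-ins, `execute` is assumed to take the scraper as its argument, and `RecordScheme`, `RecordHost`, and `RecordBoth` are invented names.

```python
class Scraper:
    """Hypothetical stand-in: holds a site URL and any collected data."""

    def __init__(self, site):
        self.site = site
        self.data = {}


class Action:
    """Hypothetical stand-in base class; subclasses override get_act."""

    def get_act(self, scraper):
        raise NotImplementedError

    def execute(self, scraper):
        # Build the function for this scraper, then call it immediately.
        return self.get_act(scraper)()


class RecordScheme(Action):
    def get_act(self, scraper):
        def act():
            # 'https://host/...' -> 'https'
            scraper.data["scheme"] = scraper.site.split(":")[0]
        return act


class RecordHost(Action):
    def get_act(self, scraper):
        def act():
            # 'https://host/...' -> 'host'
            scraper.data["host"] = scraper.site.split("/")[2]
        return act


class RecordBoth(Action):
    """Composite action: chains two sub-actions inside its own function."""

    def get_act(self, scraper):
        def act():
            RecordScheme().execute(scraper)
            RecordHost().execute(scraper)
        return act


site = Scraper("https://www.google.com/")
RecordBoth().execute(site)
print(site.data)  # {'scheme': 'https', 'host': 'www.google.com'}
```

Because `get_act` returns a closure over the scraper, the composite action can be treated like any other single action while still reusing the smaller ones.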
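A complete session, from import to running the queue, looks like this:

    import scraper

    # Session exposes the action queue for the current scraping session.
    queue = scraper.Session.action_queue

    # Create an action and submit it to a Scraper for the target site.
    get_action = scraper.Scraper.Get_Action()
    scraper1 = scraper.Scraper.Scraper(site='https://www.google.com/', actions=[get_action])

    # Load the submitted actions into the queue, then run them.
    queue.populate_queue()
    queue.run()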