Installation instructions

  1. Make sure you have Python 3 installed. I also recommend pyenv for virtualenv and version management.
  2. Clone this repository with `git clone https://github.com/kshepherd/feed2html.git` and `cd` into the new `feed2html` directory; the remaining commands are run from there.
  3. Optional: create or activate a virtualenv, either with the standard tools (e.g. `python3 -m venv`) or with pyenv.
  4. Install the requirements with `pip install -r requirements`
  5. Identify the start URL for the OAI-PMH `ListRecords` verb on your DSpace repository, e.g. `https://openaccess.myinstitution.edu/oai/request?verb=ListRecords&metadataPrefix=oai_dc`
  6. Give the output/oaidc2html.xsl stylesheet a quick check to make sure it is
  7. Set up a base directory for your OCFL repository and CSS files, and note its full path.
  8. Copy or symlink `output/css` into this base directory.
  9. Begin a crawl! Using the example URL above and a base directory of `/tmp/site`:
```shell
scrapy crawl oaipmh_dc_xml \
   -a url="https://openaccess.myinstitution.edu/oai/request?verb=ListRecords&metadataPrefix=oai_dc" \
   -a website_title="Test" \
   -a website_subtitle="open access research" \
   -a path_to_assets="/tmp/site" \
   -a path_to_ocfl="/tmp/site/repository" \
   -L INFO
```
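
For reference, steps 7 and 8 might look like the following in a POSIX shell. This is a minimal sketch that assumes you run it from the root of the cloned repository and use `/tmp/site` as in the example crawl; `SITE_DIR` is a hypothetical variable name. It deliberately does not create the `repository` subdirectory, since it is not stated here whether the spider expects to initialise the OCFL root itself.

```shell
#!/bin/sh
# Sketch of steps 7-8. Assumes the current directory is the feed2html checkout.
set -e

# Base directory for the OCFL repository and CSS files.
# SITE_DIR is a hypothetical name; /tmp/site matches the example crawl.
SITE_DIR="${SITE_DIR:-/tmp/site}"
mkdir -p "$SITE_DIR"

# Resolve and print the full path (the cd runs in a subshell, so the
# current directory is unchanged). Pass this value as path_to_assets.
SITE_DIR="$(cd "$SITE_DIR" && pwd)"
echo "Base directory: $SITE_DIR"

# Symlink output/css into the base directory.
# -f replaces an existing link; -n stops ln from descending into an
# existing symlinked directory and creating the new link inside it.
ln -sfn "$(pwd)/output/css" "$SITE_DIR/css"
```

If you prefer a copy to a symlink, replace the `ln` line with `cp -R output/css "$SITE_DIR/"`.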

To test just the first page of the OAI results, uncomment the `CLOSESPIDER_ITEMCOUNT` setting in `feed2html/spiders/oaipmh_dc_xml.py`.
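
Alternatively, Scrapy's standard `-s NAME=VALUE` option can set `CLOSESPIDER_ITEMCOUNT` for a single run without editing the spider; command-line settings normally take precedence over a spider's `custom_settings`. The `first_page_crawl` function below is a hypothetical wrapper around the example crawl, and its default limit of 100 is an assumption: set `ITEM_LIMIT` to the number of records your OAI endpoint actually returns per page. Closing a spider is not instantaneous, so a few records beyond the limit may still be processed.

```shell
#!/bin/sh
# Hypothetical helper: run the example crawl with an item limit instead of
# uncommenting CLOSESPIDER_ITEMCOUNT in the spider.
# ITEM_LIMIT and SITE_DIR are assumed names; 100 is an assumed page size.
first_page_crawl() {
    site_dir="${SITE_DIR:-/tmp/site}"
    scrapy crawl oaipmh_dc_xml \
        -a url="https://openaccess.myinstitution.edu/oai/request?verb=ListRecords&metadataPrefix=oai_dc" \
        -a website_title="Test" \
        -a website_subtitle="open access research" \
        -a path_to_assets="$site_dir" \
        -a path_to_ocfl="$site_dir/repository" \
        -s CLOSESPIDER_ITEMCOUNT="${ITEM_LIMIT:-100}" \
        -L INFO
}
```

After defining the function (for example by pasting it into your shell), run `ITEM_LIMIT=100 first_page_crawl` from the repository root.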