ivanprytula/price-navigator

Web application for efficient (easy and cheap) shopping of groceries & household items

Badges: Built with Cookiecutter Django · CI · pre-commit.ci status · Black code style

Formulation of the current problem

There is no easy(!) way to do a head-to-head price comparison of specific food items between different shops for a specific day or period. Why is that so?

Big local stores and supermarkets publish their promotional brochures/leaflets in paper and digital forms:

  • paper brochures/leaflets, which are distributed to residents' physical mailboxes and are also available inside stores near the entrance
  • proprietary mobile apps and websites of a specific store/supermarket
  • an 'aggregator' mobile app and website, "Moja Gazetka", with leaflets from many stores/supermarkets

The latter option is so far the best, because many leaflets with prices are gathered in one place.

But the main inconvenience of using those leaflets (either paper or digital) is that they are made as a carousel of HTML pages - like a paper magazine or flip book. Those pages include photos and price tags with previous/new prices, the unit/quantity the price applies to, etc. Examples: link1, link2, link3

As a customer, I want a convenient "single source of truth": one web/mobile app that helps me buy goods cheaply whenever I go shopping.

The questions that must be answered and solved are:

  • how to easily grab these prices and item names, so we can have a list of all items that are promoted on specific dates (usually 2-3 days or 1 week)?
  • how to parse/collect data from many webpages?
  • where can we find APIs of those stores/supermarkets? It is much easier to work with structured JSON/XML responses

Solution, v0.0.1

Web application with the following high-level components and functionality:

  • Backend
    • web framework
      • web scraping/parsing
        • agnostic scraping, i.e. it is possible to get data from any shop with only configuration adjustments (see the sketch after this list)
        • asynchronous scraping and data processing for faster I/O operations
      • data analysis
        • ETL pipeline to aggregate data into the database
        • filtering, grouping, and aggregating the data to extract insights (see the comparison sketch below)
        • data visualization: analytic dashboard with sorting/filtering, visualization, data export in different formats
      • user accounts: authentication/authorization (sign up/sign in), profile, UI settings for personalization, etc.
      • admin panel
    • databases (SQL and noSQL)
      • store and retrieve data
        • design schemas
        • write queries
        • CRUD operations
        • optimize performance
    • background processing
      • display up-to-date information without any delays
    • testing
      • unit tests to ensure the code works correctly
      • load testing to ensure that the dashboard can handle high levels of traffic
  • Frontend
    • design / mockups
    • lightweight UI framework (HTML/CSS/JS) or big JavaScript framework
    • visualization dashboard
    • users can enter the URL of a website they want to scrape, specify what data they want to extract, and view the results of the scraping and analysis
  • Backend / frontend communication
    • RESTful API
    • OpenAPI docs
  • DevOps
    • code repository
    • containerization
    • orchestration
    • continuous integration and delivery (CI/CD)
    • cloud services
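
The 'agnostic scraping' and 'asynchronous scraping' items above can be sketched as a small per-shop configuration (CSS selectors for item name and price) fetched concurrently. This is only an illustrative sketch, not the project's actual scraper; the shop names, URLs, and selectors below are hypothetical placeholders.

# Minimal sketch of config-driven, asynchronous scraping (hypothetical shops/selectors)
import asyncio
from dataclasses import dataclass

import httpx
from bs4 import BeautifulSoup


@dataclass
class ShopConfig:
    name: str
    leaflet_url: str
    item_selector: str   # CSS selector matching one promoted item
    name_selector: str   # selector for the item name inside an item node
    price_selector: str  # selector for the new price inside an item node


SHOPS = [
    ShopConfig("shop-a", "https://example.com/shop-a/leaflet",
               ".offer", ".offer__name", ".offer__price"),
    ShopConfig("shop-b", "https://example.com/shop-b/promotions",
               "li.promo", "span.title", "span.price"),
]


async def scrape_shop(client: httpx.AsyncClient, shop: ShopConfig) -> list[dict]:
    response = await client.get(shop.leaflet_url, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    items = []
    for node in soup.select(shop.item_selector):
        name = node.select_one(shop.name_selector)
        price = node.select_one(shop.price_selector)
        if name and price:
            items.append({"shop": shop.name,
                          "item": name.get_text(strip=True),
                          "price": price.get_text(strip=True)})
    return items


async def scrape_all(shops: list[ShopConfig]) -> list[dict]:
    # Fetch all shops concurrently; parsing is cheap enough to keep inline.
    async with httpx.AsyncClient(follow_redirects=True) as client:
        results = await asyncio.gather(*(scrape_shop(client, s) for s in shops))
    return [item for shop_items in results for item in shop_items]


if __name__ == "__main__":
    print(asyncio.run(scrape_all(SHOPS)))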

More details about the architecture and the tech stack can be found in the documentation
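
To make the head-to-head comparison from the 'data analysis' item concrete: once scraped rows land in the database, they can be pivoted per item across shops. Below is a minimal pandas sketch; the rows and column names are hypothetical, not the project's actual schema.

import pandas as pd

# Hypothetical scraped rows: one row per promoted item per shop.
rows = [
    {"shop": "shop-a", "item": "butter 200g", "price": 5.99, "valid_to": "2023-06-18"},
    {"shop": "shop-b", "item": "butter 200g", "price": 6.49, "valid_to": "2023-06-20"},
    {"shop": "shop-a", "item": "milk 1l", "price": 3.29, "valid_to": "2023-06-18"},
    {"shop": "shop-b", "item": "milk 1l", "price": 2.99, "valid_to": "2023-06-20"},
]
df = pd.DataFrame(rows)

# Head-to-head comparison: one row per item, one column per shop.
comparison = df.pivot_table(index="item", columns="shop", values="price")
comparison["cheapest_shop"] = comparison.idxmin(axis=1)
print(comparison)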

Installation

git clone https://github.com/ivanprytula/price-navigator.git

# or
gh repo clone ivanprytula/price-navigator

cd price-navigator

Local setup

  1. Start [check status of | stop] the PostgreSQL server: sudo systemctl start postgresql or sudo service postgresql start
  2. Create a new PostgreSQL database with one of the following:
    1. the PostgreSQL client psql (steps below)
    2. the shell CLI createdb
    3. pgAdmin
    4. your preferred way
  3. Create/activate a virtualenv
    1. python3.10 -m venv <virtual env path>
    2. source <virtual env path>/bin/activate
    3. pip install -r requirements.local.txt
  4. Install pre-commit hook: pre-commit install
  5. Set the environment variables:
    1. Create/copy an .env file in the root of your project with all needed variables: mv env.example.local .env or cp env.example.local .env
      1. then export DJANGO_READ_DOT_ENV_FILE=True
      2. or use a local environment manager like direnv (NB: you also need an .envrc file)
  6. 'Dry run' without applying migrations - just spin up the classic ./manage.py runserver or ./manage.py runserver_plus (with watchdog and the Werkzeug debugger)
  7. Or skip the previous step and do a 'full run': ./manage.py migrate -> ./manage.py runserver 0.0.0.0:8000
  8. Visit http://127.0.0.1:8000/
  9. Setting up your users:
    1. normal user account: just go to Sign Up and fill out the form. Once you submit it, you'll see a "Verify Your E-mail Address" page. Go to your console to see a simulated email verification message. Copy the link into your browser. Now the user's email should be verified and ready to go.
    2. python manage.py createsuperuser
  10. Sanity checks of code quality: run the tests, type checks, linter, import sorting, and formatter
    1. pytest -p no:warnings -v
    2. mypy price_navigator/
    3. flake8
    4. isort .
    5. black --config pyproject.toml .
  11. Run the following command from the project directory to build and explore HTML documentation: make -C docs livehtml
# verbose option for step 2.1 (psql client)
sudo -u postgres -i psql

CREATE DATABASE price_navigator;
CREATE USER price_dwh_user WITH PASSWORD 'my_password';
# it is also recommended to set the following
# https://docs.djangoproject.com/en/4.2/ref/databases/#optimizing-postgresql-s-configuration
ALTER ROLE price_dwh_user SET client_encoding TO 'utf8';
ALTER ROLE price_dwh_user SET default_transaction_isolation TO 'read committed';
ALTER ROLE price_dwh_user SET timezone TO 'UTC';
GRANT ALL PRIVILEGES ON DATABASE "price_navigator" to price_dwh_user;

# just in case, if you are in a hurry, here is a simplified one-liner
sudo -u postgres psql -c 'create database price_navigator;'
postgres=# \l  # list all databases
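
For reference, the database created above typically reaches Django via environment variables. A minimal sketch of how the settings could be wired with django-environ (the usual cookiecutter-django approach; the exact settings module layout in this project may differ):

# config/settings/local.py (sketch only; the actual settings layout may differ)
import environ

env = environ.Env()
# .env is read only when DJANGO_READ_DOT_ENV_FILE=True is exported (see step 5 above)
if env.bool("DJANGO_READ_DOT_ENV_FILE", default=False):
    env.read_env(".env")

# Expects e.g. DATABASE_URL=postgres://price_dwh_user:my_password@localhost:5432/price_navigator
DATABASES = {"default": env.db("DATABASE_URL")}
DATABASES["default"]["ATOMIC_REQUESTS"] = True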

Dockerized setup

Development setup
# OPTION 1. Use dedicated files in .envs/.local/

# [!] This option is currently used (17-06-2023)
# docker-compose.yml + docker-compose.dev.yml

# NB: read this if you have Docker Engine 23.0+
docker info
# https://docs.docker.com/engine/reference/commandline/build/#use-a-dockerignore-file

# OPTION 2. Load environment variables into shell
# This method is similar to CI: on the CI pipeline we have env vars (plain text) and secrets (configured for the repository)
# For this use: docker-compose.yml + docker-compose.dev-with-environment-attribute.yml

# check the current shell environment variables
env

# double-check that the load_env_vars.sh file is executable: 'x' must be present in the owner permissions
ls -l --human-readable
-rwxr-xr-x   1 owner group  size Jun 13 03:50 load_env_vars.sh
# if not:
chmod u=rwx,go=r load_env_vars.sh
# or, if in a rush:
chmod +x load_env_vars.sh

. ./load_env_vars.sh
# or
source ./load_env_vars.sh

# confirm that the project env vars are now in the shell environment
env

# 2. Build and spin up the containers
make build
make up

# open a shell inside the django & db containers
make sh-django
make sh-db

# stop and remove containers
make down

# Connect to db in VSCode with SQLTools extension
# 1. get the container's IP address
docker inspect price_navigator_local_postgres  # container name
# 2. find this line in the output:
"IPAddress": "192.168.80.3", # the address may differ
# 3. use this IPAddress and the other data from the .postgres/.env file to configure the connection

# 4. example
{
  "previewLimit": 50,
  "server": "192.168.80.3",
  "port": 5432,
  "driver": "PostgreSQL",
  "name": "price_navigator_docker",
  "database": "postgres",
  "username": "postgres"
}
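
Alternatively, the connection can be sanity-checked from Python instead of SQLTools. A small psycopg2 sketch; the values below are placeholders, so take the real ones from the same Postgres env file and docker inspect:

import psycopg2

# Placeholder credentials; use the values from the Postgres env file and `docker inspect`
conn = psycopg2.connect(
    host="192.168.80.3",   # container IPAddress (or "localhost" if the port is published)
    port=5432,
    dbname="postgres",
    user="postgres",
    password="<POSTGRES_PASSWORD>",
)
with conn, conn.cursor() as cur:
    cur.execute("SELECT version();")
    print(cur.fetchone()[0])
conn.close()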

Working process

  • Check Makefile for shortened versions of verbose commands
  • Explore and use Django management commands: ./manage.py --help and django-extensions commands
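
Project-specific commands would sit next to the built-in and django-extensions ones. A purely hypothetical example of such a command (the app path and command name are illustrative only, not existing code):

# price_navigator/<app>/management/commands/scrape_leaflets.py (hypothetical)
from django.core.management.base import BaseCommand


class Command(BaseCommand):
    help = "Scrape configured shop leaflets and store the promoted prices."

    def add_arguments(self, parser):
        parser.add_argument("--shop", help="Limit scraping to a single shop slug.")

    def handle(self, *args, **options):
        shop = options.get("shop") or "all shops"
        # ... call the scraping/ETL layer here ...
        self.stdout.write(self.style.SUCCESS(f"Scraped leaflets for {shop}"))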

Main URLs
