
Make AutoRAG to Monorepo #960

Draft
wants to merge 63 commits into
base: main

Conversation

vkehfdl1
Contributor

No description provided.

@vkehfdl1 vkehfdl1 marked this pull request as ready for review November 19, 2024 10:14
@vkehfdl1 vkehfdl1 requested review from hongsw and bwook00 November 19, 2024 10:14
@vkehfdl1 vkehfdl1 marked this pull request as draft November 19, 2024 11:21
홍승우 added 8 commits November 19, 2024 21:00
Added various entries to ignore specific files and directories in both the root directory's .gitignore and the api directory's .dockerignore. Additionally, included a Dockerfile for building a Python 3.10-slim-based API image with specified dependencies and runtime configurations. A docker-compose.yml file was introduced to define services and networks for frontend and API components.
…ents

This commit updates the project naming convention in the README file from "AutoRAG API Server" to "AutoRAG-API" for consistency. Additionally, it modifies the version requirement in the `requirements.txt` file for AutoRAG to be greater than or equal to 0.3.8 to ensure compatibility with the latest features.
jeffrey and others added 3 commits November 20, 2024 09:42
# Conflicts:
#	autorag/autorag/vectordb/couchbase.py
hongsw
hongsw previously approved these changes Nov 20, 2024
.github/workflows/publish.yml
vkehfdl1 and others added 23 commits November 23, 2024 22:11
* add delete endpoint and change to .env based operations

* add api endpoint for gathering all env settings

* load env variable when start each task

* change GET /env to return everything (key & values)

---------

Co-authored-by: jeffrey <[email protected]>
# Conflicts:
#	autorag/autorag/vectordb/qdrant.py
…987)

* feat: refactor SQL Trial DB from Pandas Trial DB, and Test code

* 🚑 fix: Set correct WORK_DIR based on environment variable

- Updated the logic in app.py to properly set the `WORK_DIR` based on the environment variable `AUTORAG_API_ENV`. If the environment is 'dev', the `WORK_DIR` will be located at `"../projects"`, otherwise, it will be set to `"projects"`. Additionally, the `.env` file path is now correctly constructed using the determined `WORK_DIR` value.
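A minimal sketch of the environment-based selection described above, assuming only what the commit message states. The helper names `resolve_work_dir` and `resolve_env_file` are hypothetical and do not come from app.py; the real code may be structured differently.

```python
import os
from pathlib import Path


def resolve_work_dir(env=None):
    """Pick the work directory from AUTORAG_API_ENV (hypothetical helper)."""
    env = os.environ if env is None else env
    # 'dev' keeps projects one level above the API directory;
    # any other value (or no value) uses ./projects.
    if env.get("AUTORAG_API_ENV") == "dev":
        return Path("../projects")
    return Path("projects")


def resolve_env_file(env=None):
    """Build the .env path from the resolved work directory."""
    return resolve_work_dir(env) / ".env"


if __name__ == "__main__":
    print(resolve_work_dir(), resolve_env_file())
```

Passing the mapping explicitly keeps the lookup testable without mutating `os.environ`.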

* 🚑 fix: Use model_validate_json when assigning trial_dict['config'], and have set_trial_config store the TrialConfig model dump JSON for the given trial_id. Add the get_all_config_ids and get_all_trial_ids SQL query functions.

* ✨ feat: Add CORS headers and handle OPTIONS requests

This commit introduces the addition of CORS headers in every response and explicit handling of OPTIONS requests in the API server. Includes setting Access-Control-Allow-Origin, Access-Control-Allow-Credentials, Access-Control-Allow-Headers, and Access-Control-Allow-Methods based on the request origin.
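The behaviour described above can be sketched without any web framework. This is an illustration only: `cors_headers`, `handle_request`, and the allowed method and header lists are assumptions, not the API server's actual code. The request origin is echoed back because a wildcard `Access-Control-Allow-Origin` cannot be combined with credentials.

```python
# Assumed values; the commit message does not list the allowed methods or headers.
ALLOWED_METHODS = "GET, POST, PUT, PATCH, DELETE, OPTIONS"
ALLOWED_HEADERS = "Content-Type, Authorization"


def cors_headers(origin):
    """Build the CORS headers for a response, based on the request origin."""
    headers = {
        "Access-Control-Allow-Credentials": "true",
        "Access-Control-Allow-Headers": ALLOWED_HEADERS,
        "Access-Control-Allow-Methods": ALLOWED_METHODS,
    }
    if origin:
        # Echo the caller's origin and tell caches the response varies by it.
        headers["Access-Control-Allow-Origin"] = origin
        headers["Vary"] = "Origin"
    return headers


def handle_request(method, origin, handler):
    """Return (status, headers, body); OPTIONS is answered without calling handler."""
    if method == "OPTIONS":
        return 204, cors_headers(origin), ""
    status, body = handler()
    return status, cors_headers(origin), body


if __name__ == "__main__":
    print(handle_request("OPTIONS", "http://localhost:3000", lambda: (200, "ok")))
```

Echoing every origin is permissive; a deployment would more likely compare the origin against an allowlist before setting the header.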

* ✅ test: add test file for project creation with setup and cleanup fixtures, including logging configurations, environment setup, client creation, and project directory validation

* 🚑 fix: Remove unnecessary commented-out properties in Trial class

* 🚑 fix: Set correct WORK_DIR based on environment variable AUTORAG_WORK_DIR

* ♻️ refactor: Update code in app.py and schema.py for better handling of working directory and model configuration. Fix deprecated usage in test_app.py and enhance testing in test_trial_config.py.

* 📝 docs: update README with instructions for running using Docker Compose and monitoring options.

* ✨ feat: start parsing documents task with improved import handling

This commit introduces changes to the document parsing task initiation. The import statement for `parse_documents` has been updated within the file. Additionally, the logic for initiating the parsing process has been streamlined and improved for better performance and handling of imports.

* ✅ test: add tests for project database operations such as initializing DB, setting/getting trials, updating trial configurations, and retrieving trial information by project or ID.

* ♻️ refactor: Improve database initialization in SQLiteProjectDB

- Refactored the `_init_db` method to enhance database initialization.
- Added logging and enhanced debugging statements for better clarity.
- Now checks for the existence of the database file and its directory before initializing.
- If the database file does not exist, it creates the necessary directory and tables.
- Adjusted permissions for directories (777) and the database file (666) accordingly.
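The initialization steps listed above might look roughly like the following. It is a sketch under stated assumptions: the `init_db` function, the `trials` table, and its columns are hypothetical, and only the overall flow (check for the file, create the directory and tables, set 777/666 permissions) is taken from the commit message.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def init_db(db_path):
    """Create the database file and tables if missing; return True if created."""
    db_path = Path(db_path)
    db_created = not db_path.exists()
    dir_created = not db_path.parent.exists()
    if dir_created:
        db_path.parent.mkdir(parents=True, exist_ok=True)
        os.chmod(db_path.parent, 0o777)  # directory permissions from the commit
    conn = sqlite3.connect(db_path)
    try:
        # Hypothetical schema, for illustration only.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS trials ("
            "id TEXT PRIMARY KEY, project_id TEXT NOT NULL, config TEXT)"
        )
        conn.commit()
    finally:
        conn.close()
    if db_created:
        os.chmod(db_path, 0o666)  # database file permissions from the commit
    return db_created


if __name__ == "__main__":
    demo_path = Path(tempfile.mkdtemp()) / "nested" / "project.db"
    print(init_db(demo_path), init_db(demo_path))
```

World-writable permissions are convenient when several containers share a volume, but they trade away isolation, so tighter modes may be preferable outside development.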

* 🚑 fix: correct chunking and parsing tasks in trial_tasks.py

* 🔧 chore: Update imports and debug logging level in app.py

- Updated import statement in app.py to include chunk_documents from trial_tasks module.
- Changed the logging level from INFO to DEBUG for more detailed logging information.

* ♻️ refactor: refactor parsing endpoint and improve error handling

- Refactored the parsing endpoint to handle configuration data retrieval more efficiently.
- Improved error handling to provide more informative error messages in case of missing data or failed tasks.

* 🚑 fix: Correct chunked data path and task handling in start_chunking function

* ✨ feat: Configure not to use uvloop, apply nest_asyncio, and correct import in app.py

- Avoid using uvloop by setting asyncio event loop policy to DefaultEventLoopPolicy().
- Apply nest_asyncio after that to prevent conflicts.
- Change the import in app.py from `from database.project_db import SQLiteProjectDB` to the correct import.

refactor: Update Celery configuration in celery_app.py

- Adjust broker and backend URLs to use 'redis://redis:6379/0'.
- Modify the timezone to 'Asia/Seoul' for better synchronization.

* 🚑 fix: Install system dependencies and pip, adjust Dockerfile for API service

- Removed unnecessary comments related to installing pip as it's clear from the command itself
- Added installation of 'watchfiles', setting PYTHONPATH and PYTHONUNBUFFERED environment variables
- Created a directory for celery beat schedule and added an entrypoint script
- Adjusted permissions for the entrypoint script and removed Windows line endings
- Updated entrypoint to /entrypoint.sh in the API service section
- Added environment variables for watching files, setting time zone, log level, and disabling Python output buffering

* 🔧 chore: update subproject commit reference in autorag-frontend

* 🔧 chore: add test_projects to .gitignore

* add new lines and fix .env.dev

* fix chunk_documents

---------

Co-authored-by: Seungwoo hong <Seungwoo hong [email protected]>
Co-authored-by: jeffrey <[email protected]>
* Change all datetime.now() calls to use the UTC timezone

* properly working UTC timezone in the API server
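For reference, a timezone-aware replacement for a bare `datetime.now()` call can be sketched as below. The helper names `utc_now` and `to_iso` are hypothetical; the API server may simply call `datetime.now(timezone.utc)` inline.

```python
from datetime import datetime, timezone


def utc_now():
    """Return the current time as a timezone-aware datetime in UTC."""
    return datetime.now(timezone.utc)


def to_iso(ts):
    """Serialize a datetime; aware UTC values carry an explicit +00:00 offset."""
    return ts.isoformat()


if __name__ == "__main__":
    print(to_iso(utc_now()))
```

Unlike the naive value returned by `datetime.now()`, an aware timestamp keeps its offset when serialized, so clients in other timezones can convert it unambiguously.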

---------

Co-authored-by: jeffrey <[email protected]>
…py (#1005)

* ✨ feat: Add QA document generation task in trial_tasks.py and schema.py

- Added a new field `qa_task_id` in the Trial schema to store the QA task ID.
- Introduced `generate_qa_documents` shared task in `trial_tasks.py` for creating QA documents.
- Updated imports and added `QACreationRequest` in `trial_tasks.py`.
- Included function `run_qa_creation` in `generate_qa_documents` task for generating QA documents with status tracking and database updates.

* 🚑 fix: Return full trial config in get_trial_config

Adjusts the return statement in `get_trial_config` to return the complete trial configuration instead of just the model dump.

* 🔧 chore: update subproject commit in autorag-frontend to 1434e797

---------

Co-authored-by: Seungwoo hong <Seungwoo hong [email protected]>
* Change the WORK_DIR setting

* send file directly
…id. (#1011)

* get all parsed documents; parsing is no longer tied to the trial_id

* add an endpoint to the API server for getting the chunk list

* chunk document at project view

* /parse POST with parse_name

* QA creation endpoint
* Refactor start_evaluate api endpoint

* if there is no .env, make one

* consolidate file content retrieval into one API endpoint: /artifacts/content

* add a DELETE operation on /artifacts/content to delete the file

* support uploading files with Korean filenames

* working parse with frontend

* working QA!

* validation back to normal, shout!

* checkpoint (working but no result at evaluation)

* Fix problem that trial_tasks.py cannot load the env

* Finally success!!!!
Working evaluate and validate
…erver with streaming (#1021)

* working running dashboard

* working running and closing report

* working and closing the chat streamlit server

* working and closing the external api server port to 8100
* add parsed file get endpoint

* Add an "all_files" endpoint.
* change to the dynamic root directory

* enable uploading html and data file extensions
Successfully merging this pull request may close these issues.

Add AutoRAG front-end and API server for run AutoRAG
3 participants