graphrag/.github/workflows/python-integration-tests.yml
KennyZhang1 8368b12532
Add Cosmos DB storage/cache option (#1431)
* added cosmosdb constructor and database methods

* added rest of abstract method headers

* added cosmos db container methods

* implemented has and delete methods

* finished implementing abstract class methods

* integrated class into storage factory

* integrated cosmosdb class into cache factory

* added support for new config file fields

* replaced primary key cosmosdb initialization with connection strings

* modified cosmosdb setter to require json

* Fix non-default emitters

* Format

* Ruff

* ruff

* first successful run of cosmosdb indexing

* removed extraneous container_name setting

* require base_dir to be typed as str

* reverted merged changed from closed branch

* removed nested try statement

* readded initial non-parquet emitter fix

* added basic support for parquet emitter using internal conversions

* merged with main and resolved conflicts

* fixed more merge conflicts

* added cosmosdb functionality to query pipeline

* tested query for cosmosdb

* collapsed cosmosdb schema to use minimal containers and databases

* simplified create_database and create_container functions

* ruff fixes and semversioner

* spellcheck and ci fixes

* updated pyproject toml and lock file

* apply fixes after merge from main

* add temporary comments

* refactor cache factory

* refactored storage factory

* minor formatting

* update dictionary

* fix spellcheck typo

* fix default value

* fix pydantic model defaults

* update pydantic models

* fix init_content

* cleanup how factory passes parameters to file storage

* remove unnecessary output file type

* update pydantic model

* cleanup code

* implemented clear method

* fix merge from main

* add test stub for cosmosdb

* regenerate lock file

* modified set method to collapse parquet rows

* modified get method to collapse parquet rows

* updated has and delete methods and docstrings to adhere to new schema

* added prefix helper function

* replaced delimiter for prefixed id

* verified empty tests are passing

* fix merges from main

* add find test

* update cicd step name

* tested querying for new schema

* resolved errors from merge conflicts

* refactored set method to handle cache in new schema

* refactored get method to handle cache in new schema

* force unique ids to be written to cosmos for nodes

* found bug with has and delete methods

* modified has and delete to work with cache in new schema

* fix the merge from main

* minor typo fixes

* update lock file

* spellcheck fix

* fix init function signature

* minor formatting updates

* remove https protocol

* change localhost to 127.0.0.1 address

* update pytest to use bacj engine

* verified cache tests

* improved speed of has function

* resolved pytest error with find function

* added test for child method

* make container_name variable private as _container_name

* minor variable name fix

* cleanup cosmos pytest and make the cosmosdb storage class operations more efficient

* update cicd to use different cosmosdb emulator

* test with http protocol

* added pytest for clear()

* add longer timeout for cosmosdb emulator startup

* revert http connection back to https

* add comments to cicd code for future dev usage

* set to container and database clients to none upon deletion

* ruff changes

* add comments to cicd code

* removed unneeded None statements and ruff fixes

* more ruff fixes

* Update test_run.py

* remove unnecessary call to delete container

* ruff format updates

* Reverted test_run.py

* fix ruff formatter errors

* cleanup variable names to be more consistent

* remove extra semversioner file

* revert pydantic model changes

* revert pydantic model change

* revert pydantic model change

* re-enable inline formatting rule

* update documentation in dev guide

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
Co-authored-by: Josh Bradley <joshbradley@microsoft.com>
2024-12-19 13:43:21 -06:00

name: Python Integration Tests
on:
  push:
    branches:
      - "**/main" # match branches like feature/main
      - "main" # match the main branch
  pull_request:
    types:
      - opened
      - reopened
      - synchronize
      - ready_for_review
    branches:
      - "**/main"
      - "main"
    paths-ignore:
      - "**/*.md"
      - ".semversioner/**"
permissions:
  contents: read
  pull-requests: read
concurrency:
  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
  # only run for the latest commit
  cancel-in-progress: true
env:
  POETRY_VERSION: 1.8.3
jobs:
  python-ci:
    # skip draft PRs
    if: github.event.pull_request.draft == false
    strategy:
      matrix:
        python-version: ["3.10"]
        os: [ubuntu-latest, windows-latest]
      fail-fast: false # continue running all jobs even if one fails
    env:
      DEBUG: 1
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      - uses: dorny/paths-filter@v3
        id: changes
        with:
          filters: |
            python:
              - 'graphrag/**/*'
              - 'poetry.lock'
              - 'pyproject.toml'
              - '**/*.py'
              - '**/*.toml'
              - '**/*.ipynb'
              - '.github/workflows/python*.yml'
              - 'tests/integration/**/*'
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install Poetry
        uses: abatilo/actions-poetry@v3.0.0
        with:
          poetry-version: $POETRY_VERSION
      - name: Install dependencies
        shell: bash
        run: |
          poetry self add setuptools wheel
          poetry run python -m pip install gensim
          poetry install
      - name: Build
        run: |
          poetry build
      - name: Install Azurite
        id: azuright
        uses: potatoqualitee/azuright@v1.1
      # For more information on installation/setup of the Azure Cosmos DB Emulator, see
      # https://learn.microsoft.com/en-us/azure/cosmos-db/how-to-develop-emulator?tabs=docker-linux%2Cpython&pivots=api-nosql
      # Note: the emulator is only available on Windows runners. Its first startup can take longer than the default timeout, so we increase the timeout.
      # If a job fails due to a timeout, restarting the CI/CD job usually resolves the problem.
      - name: Install Azure Cosmos DB Emulator
        if: runner.os == 'Windows'
        run: |
          Write-Host "Launching Cosmos DB Emulator"
          Import-Module "$env:ProgramFiles\Azure Cosmos DB Emulator\PSModules\Microsoft.Azure.CosmosDB.Emulator"
          Start-CosmosDbEmulator -Timeout 500
      - name: Integration Test
        run: |
          poetry run poe test_integration
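# Sketch only, not part of the original workflow: the `changes` step above
# computes a `python` filter, but no later step in this file reads its result.
# Assuming the intent is to skip the test run when no Python-related files
# changed, a step could be gated on the filter output, which
# dorny/paths-filter exposes as the string 'true' or 'false':
#
#   - name: Integration Test
#     if: steps.changes.outputs.python == 'true'
#     run: |
#       poetry run poe test_integration
#
# Whether skipping is desirable here is an assumption; as written, the
# workflow runs the integration tests unconditionally.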