Commit Graph

75 Commits

Author SHA1 Message Date
Derek Worthen
80367be018
Remove config input models (#1570)
* Remove config input models

* remove unit tests related to config input models

* add semversioner change

* Merge branch 'main' into config-remove-input-models
2025-01-02 15:25:10 -08:00
gaudyb
185f513ca7
Basic search implementation (#1563)
* basic search implementation

* basic streaming functionality

* format check

* check fix

* release change

* Chore/gleanings any encoding (#1569)

* Make claims and entities independent of encoding

* Semver

* Change semver release type

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2025-01-02 13:49:11 -06:00
Nathan Evans
a2647da473
Simplify flow config (#1554)
* Flatten compute_communities config

* Remove cluster strategy type

* Flatten create_base_text_units config

* Move cluster seed to config default, leave as None in functions

* Remove "prechunked" logic

* Remove hard-coded encoding model

* Remove unused variables

* Strongly type embed_config

* Simplify layout_graph config

* Semver

* Fix integration test

* Fix config unit tests: ignore new config defaults

* Remove pipeline integ test
2024-12-27 16:38:36 -08:00
KennyZhang1
8368b12532
Add Cosmos DB storage/cache option (#1431)
* added cosmosdb constructor and database methods

* added rest of abstract method headers

* added cosmos db container methods

* implemented has and delete methods

* finished implementing abstract class methods

* integrated class into storage factory

* integrated cosmosdb class into cache factory

* added support for new config file fields

* replaced primary key cosmosdb initialization with connection strings

* modified cosmosdb setter to require json

* Fix non-default emitters

* Format

* Ruff

* ruff

* first successful run of cosmosdb indexing

* removed extraneous container_name setting

* require base_dir to be typed as str

* reverted merged changed from closed branch

* removed nested try statement

* readded initial non-parquet emitter fix

* added basic support for parquet emitter using internal conversions

* merged with main and resolved conflicts

* fixed more merge conflicts

* added cosmosdb functionality to query pipeline

* tested query for cosmosdb

* collapsed cosmosdb schema to use minimal containers and databases

* simplified create_database and create_container functions

* ruff fixes and semversioner

* spellcheck and ci fixes

* updated pyproject toml and lock file

* apply fixes after merge from main

* add temporary comments

* refactor cache factory

* refactored storage factory

* minor formatting

* update dictionary

* fix spellcheck typo

* fix default value

* fix pydantic model defaults

* update pydantic models

* fix init_content

* cleanup how factory passes parameters to file storage

* remove unnecessary output file type

* update pydantic model

* cleanup code

* implemented clear method

* fix merge from main

* add test stub for cosmosdb

* regenerate lock file

* modified set method to collapse parquet rows

* modified get method to collapse parquet rows

* updated has and delete methods and docstrings to adhere to new schema

* added prefix helper function

* replaced delimiter for prefixed id

* verified empty tests are passing

* fix merges from main

* add find test

* update cicd step name

* tested querying for new schema

* resolved errors from merge conflicts

* refactored set method to handle cache in new schema

* refactored get method to handle cache in new schema

* force unique ids to be written to cosmos for nodes

* found bug with has and delete methods

* modified has and delete to work with cache in new schema

* fix the merge from main

* minor typo fixes

* update lock file

* spellcheck fix

* fix init function signature

* minor formatting updates

* remove https protocol

* change localhost to 127.0.0.1 address

* update pytest to use bacj engine

* verified cache tests

* improved speed of has function

* resolved pytest error with find function

* added test for child method

* make container_name variable private as _container_name

* minor variable name fix

* cleanup cosmos pytest and make the cosmosdb storage class operations more efficient

* update cicd to use different cosmosdb emulator

* test with http protocol

* added pytest for clear()

* add longer timeout for cosmosdb emulator startup

* revert http connection back to https

* add comments to cicd code for future dev usage

* set to container and database clients to none upon deletion

* ruff changes

* add comments to cicd code

* removed unneeded None statements and ruff fixes

* more ruff fixes

* Update test_run.py

* remove unnecessary call to delete container

* ruff format updates

* Reverted test_run.py

* fix ruff formatter errors

* cleanup variable names to be more consistent

* remove extra semversioner file

* revert pydantic model changes

* revert pydantic model change

* revert pydantic model change

* re-enable inline formatting rule

* update documentation in dev guide

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
Co-authored-by: Josh Bradley <joshbradley@microsoft.com>
2024-12-19 13:43:21 -06:00
Nathan Evans
c1c09bab80
Flow cleanup (#1510)
* Move snapshots out of flows into verbs

* Move degree compute out of extract_graph

* Move entity/relationship df merging into extract

* Move "title" to extraction source

* Move text_unit_ids agg closer to extraction

* Move data definition

* Update test data

* Semver

* Update smoke tests

* Fix empty degree field and update smoke tests and verb data

* Move extractors (#1516)

* Consolidate graph embedding and umap

* Consolidate claim extraction

* Consolidate graph extractor

* Move graph utils

* Move summarizers

* Semver

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>

* Fix syntax typo

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-12-18 18:07:44 -08:00
Nathan Evans
d0543d1fd6
Move extractors (#1516)
* Consolidate graph embedding and umap

* Consolidate claim extraction

* Consolidate graph extractor

* Move graph utils

* Move summarizers

* Semver

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-12-18 16:21:41 -08:00
Nathan Evans
1d68af308b
Community workflow (#1495)
* Create separate communities workflow

* Add test for new workflow

* Rename workflows

* Collapse subflows into parents

* Rename flows, reuse variables

* Semver

* Fix integration test

* Fix smoke tests

* Fix megapipeline format

* Rename missed files

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-12-11 15:41:16 -06:00
Josh Bradley
823342188d
Cleanup factory methods (#1482)
* cleanup factory methods to have similar design pattern across codebase

* add semversioner file

* cleanup logging factory

* update developer guide

* add comment

* typo fix

* cleanup reporter terminology

* renmae reporter to logger

* fix comments

* update comment

* instantiate factory classes correctly and update index api callback parameter

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-12-10 16:11:11 -06:00
Alonso Guevara
04405803db
Add Parent to communities in data model (#1491)
* Add Parent to communities in data model

* Semver

* Pyright

* Update docs

* Use leiden cluster parent id

* Format
2024-12-10 14:38:11 -06:00
Alonso Guevara
1c3b0f34c3
Chore/lib updates (#1477)
* Update dependencies and fix issues

* Format

* Semver

* Fix Pyright

* Pyright

* More Pyright

* Pyright
2024-12-06 14:08:24 -06:00
Chris Trevino
5ff2d3c76d
Remove graphrag.llm, replace with fnllm (#1315)
* add fnllm; remove llm folder

* remove llm unit tests

* update imports

* update imports

* formatting

* enable autosave

* update mockllm

* update community reports extractor

* move most llm usage to fnllm

* update type issues

* fix unit tests

* type updates

* update dictionary

* semver

* update llm construction, get integration tests working

* load from llmparameters model

* move ruff settings to ruff.toml

* add gitattributes file

* ignore ruff.toml spelling

* update .gitattributes

* update gitignore

* update config construction

* update prompt var usage

* add cache adapter

* use cache adapter in embeddings calls

* update embedding strategy

* add fnllm

* add pytest-dotenv

* fix some verb tests

* get verbtests running

* update ruff.toml for vscode

* enable ruff native server in vscode

* update artifact inspecting code

* remove local-test update

* use string.replace instead of string.format in community reprots etxractor

* bump timeout

* revert ruff.toml, vscode settings for another pr

* revert cspell config

* revert gitignore

* remove json-repair, update fnllm

* use fnllm generic type interfaces

* update load_llm to use target models

* consolidate chat parameters

* add 'extra_attributes' prop to community report response

* formatting

* update fnllm

* formatting

* formatting

* Add defaults to some llm params to avoid null on params hash

* Formatting

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
Co-authored-by: Josh Bradley <joshbradley@microsoft.com>
2024-12-05 18:07:47 -06:00
Alonso Guevara
d43124e576
Refactor Create Final Community reports to simplify code (#1456)
* Optimize prep claims

* Optimize community hierarchy restore

* Partial optimization of prepare_community_reports

* More optimization code

* Fix context string generation

* Filter community -1

* Fix cache, add more optimization fixes

* Fix local search community ids

* Cleanup

* Format

* Semver

* Remove perf counter

* Unused import

* Format

* Fix edge addition to reports

* Add edge by edge context creation

* Re-org of the optimization code

* Format

* Ruff

* Some Ruff fixes

* More pyright

* More pyright

* Pyright

* Pyright

* Update tests
2024-12-05 17:13:05 -06:00
KennyZhang1
10f84c91eb
Replace md5 hash (#1470)
* switched hashing function helper to sha256

* refactored references to hashing util

* semversioner

* switched from sha256 to sha512

* new semversioner

* updated tests/verbs/data folder

* generated fresh parquet files in data folder

* moved ignore flag
2024-12-05 13:24:35 -06:00
Nathan Evans
d17dfd01f9
Graph collapse (#1464)
* Refactor graph creation

* Semver

* Spellcheck

* Update integ pipeline

* Fix cast

* Improve pandas chaining

* Cleaner apply

* Use list comprehensions

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-12-05 11:57:26 -06:00
Josh Bradley
dad2176b3c
Miscellaneous code cleanup procedures (#1452) 2024-11-27 13:27:43 -05:00
Josh Bradley
22a57d14c7
Improve CLI speed with lazy imports (#1319) 2024-11-15 19:41:10 -05:00
Nathan Evans
9b4f24ebce
First cut at config cleanup (#1411)
* Firsst cut at config cleanup

* Reorder top nav

* Add query prompts to tuning page

* Remove dynamic notebook from nav

* Add more thorough yml config descriptions in docs

* Further clean out the config

* Semver

* Add new blog post

* Emphasize yaml

* Clarify output

* Fix unit test

* Fix bullet nesting
2024-11-15 14:33:26 -08:00
Nathan Evans
51912b2e03
Move prompts (#1404)
* Move indexing prompts to root

* Move query prompts to root

* Export query prompts during init

* Extract general knowledge prompt

* Load query prompts from disk

* Semver

* Fix unit tests
2024-11-14 10:45:37 -08:00
Nathan Evans
c8c354e357
Artifact cleanup (#1341)
* Add source documents for verb tests

* Remove entity_type erroneous column

* Add new test data

* Remove source/target degree columns

* Remove top_level_node_id

* Remove chunk column configs

* Rename "chunk" to "text"

* Rename "chunk" to "text" in base

* Re-map document input to use base text units

* Revert base text units as final documents dep

* Update test data

* Split/rename node source_id

* Drop node size (dup of degree)

* Drop document_ids from covariates

* Remove unused document_ids from models

* Remove n_tokens from covariate table

* Fix missed document_ids delete

* Wire base text units to final documents

* Rename relationship rank as combined_degree

* Add rank as first-class property to Relationship

* Remove split_text operation

* Fix relationships test parquet

* Update test parquets

* Add entity ids to community table

* Remove stored graph embedding columns

* Format

* Semver

* Fix JSON typo

* Spelling

* Rename lancedb

* Sort lancedb

* Fix unit test

* Fix test to account for changing period

* Update tests for separate embeddings

* Format

* Better assertion printing

* Fix unit test for windows

* Rename document.raw_content -> document.text

* Remove read_documents function

* Remove unused document summary from model

* Remove unused imports

* Format

* Add new snapshots to default init

* Use util to construct embeddings collection name

* Align inc index model with branch changes

* Update data and tests for int ids

* Clean up embedding locs

* Switch entity "name" to "title" for consistency

* Fix short_id -> human_readable_id defaults

* Format

* Rework community IDs

* Fix community size compute

* Fix unit tests

* Fix report read

* Pare down nodes table output

* Fix unit test

* Fix merge

* Fix community loading

* Format

* Fix community id report extraction

* Update tests

* Consistent short IDs and ordering

* Update ordering and tests

* Update incremental for new nodes model

* Guard document columns loc

* Match column ordering

* Fix document guard

* Update smoke tests

* Fill NA on community extract

* Logging for smoke test debug

* Add parquet schema details doc

* Fix community hierarchy guard

* Use better empty hierarchy guard

* Back-compat shims

* Semver

* Fix warning

* Format

* Remove default fallback

* Reuse key
2024-11-13 15:11:19 -08:00
Alonso Guevara
d9f985ae52
Drift Search CLI, API, Docs and Example Notebook (#1348)
* Drift CLI and backwards compat

* Adding DRIFT Cli, Docs and example notebook

* Update tests and fix ruff

* Format

* Small cleanup

* Fix smoke tests

* Update notebook

* Oopsie fix

* Delete duplicate img
2024-11-05 12:05:19 -06:00
Nathan Evans
634e3ed62a
Transient entity graph (#1349)
* Make base_entity_graph transient

* Add transient snapshots

* Semver

* Fix unit test

* Fix smoke tests
2024-11-04 17:23:29 -08:00
gaudyb
17658c5df8
New workflow to generate embeddings in a single workflow (#1296)
* New workflow to generate embeddings in a single workflow

* New workflow to generate embeddings in a single workflow

* version change

* clean tests without any embeddings references

* clean tests without any embeddings references

* remove code

* feedback implemented

* changes in logic

* feedback implemented

* store in table bug fixed

* smoke test for generate_text_embeddings workflow

* smoke test fix

* add generate_text_embeddings to the list of transient workflows

* smoke tests

* fix

* ruff formatting updates

* fix

* smoke test fixed

* smoke test fixed

* fix lancedb import

* smoke test fix

* ignore sorting

* smoke test fixed

* smoke test fixed

* check smoke test

* smoke test fixed

* change config for vector store

* format fix

* vector store changes

* revert debug profile back to empty filepath

* merge conflict solved

* merge conflict solved

* format fixed

* format fixed

* fix return dataframe

* snapshot fix

* format fix

* embeddings param implemented

* validation fixes

* fix map

* fix map

* fix properties

* config updates

* smoke test fixed

* settings change

* Update collection config and rework back-compat

* Repalce . with - for embedding store

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
Co-authored-by: Josh Bradley <joshbradley@microsoft.com>
Co-authored-by: Nathan Evans <github@talkswithnumbers.com>
2024-11-01 15:01:35 -07:00
Alonso Guevara
7235c6faf5
Add Incremental Indexing v1 (#1318)
* Create entypoint for cli and api (#1067)

* Add cli and api entrypoints for update index

* Semver

* Update docs

* Run tests on feature branch main

* Better /main handling in tests

* Incremental indexing/file delta (#1123)

* Calculate new inputs and deleted inputs on update

* Semver

* Clear ruff checks

* Fix pyright

* Fix PyRight

* Ruff again

* Update relationships after inc index (#1236)

* Collapse create final community reports (#1227)

* Remove extraneous param

* Add community report mocking assertions

* Collapse primary report generation

* Collapse embeddings

* Format

* Semver

* Remove extraneous check

* Move option set

* Collapse create base entity graph (#1233)

* Collapse create_base_entity_graph

* Format/typing

* Semver

* Fix smoke tests

* Simplify assignment

* Collapse create summarized entities (#1237)

* Collapse entity summarize

* Semver

* Collapse create base extracted entities (#1235)

* Set up base assertions

* Replace entity_extract

* Finish collapsing workflow

* Semver

* Update snoke tests

* Incremental indexing/update final text units (#1241)

* Update final text units

* Format

* Address comments

* Add v1 community merge using time period (#1257)

* Add naive community merge using time period

* formatting

* Query fixes

* Add descriptions from merged_entities

* Add summarization and embeddings

* Use iso format

* Ruff

* Pyright and smoke tests

* Pyright

* Pyright

* Update parquet for verb tests

* Fix smoke tests

* Remove sorting

* Update smoke tests

* Smoke tests

* Smoke tests

* Updated verb test to ack for latest changes on covariates

* Add config for incremental index + Bug fixes (#1317)

* Add config for incremental index + Bug fixes

* Ruff

* Fix smoke tests

* Semversioner

* Small refactor

* Remove unused file

* Ruff

* Update verb tests inputs

* Update verb tests inputs

---------

Co-authored-by: Nathan Evans <github@talkswithnumbers.com>
2024-10-30 11:59:44 -06:00
Josh Bradley
0cc79b9cf7
Add backwards compatibility patch for vector store (#1334) 2024-10-29 14:54:08 -04:00
Josh Bradley
083de12bcf
Auto-generate CLI doc pages (#1325) 2024-10-25 19:00:24 -04:00
Josh Bradley
d6e6f5c077
Convert CLI to Typer app (#1305) 2024-10-24 14:22:32 -04:00
Nathan Evans
94f1e62e5c
Rework workflow architecture (#1311)
* Rename pipeline_storage file

* Add runtime storage option to context

* Fix import

* Switch to memory storage for runtime

* Infra for workflow runtime storage

* Migrate base_text_units to runtime storage

* Fix comment

* Semver

* Remove whitespace

* Remove subflow smoke tests and ignore transient artifacts

* Remove entity graph from transient list (not yet implemented)

* Increase smoke runtime allotment for create_base_entity_graph

* Revert format fix

* Remove noqa
2024-10-24 10:20:03 -07:00
Alonso Guevara
8a6d4e66fe
DRIFT Search (#1285)
* drift search

* args for drift global query in local search

* accept drift context in search base

* optionally parse embeddings from df when creating CommunityReport

* abstract class for drift context

* pathing for drift config

* drift config

* add defs for drift config

* formatting

* capture generated tokens in token count

* semversion

* Formatting and ruff

* Some algorithmic refactors

* Ruff

* Format

* Use asdict()

* Address comments

* Update smoke tests

* Update smoke tests

* Update smoke tests part 2

---------

Co-authored-by: Julian Whiting <j2whitin@gmail.com>
2024-10-21 17:22:11 -06:00
KennyZhang1
e0840a2dc4
Fix vector store logic and refactor audience parameter (#1259) 2024-10-21 16:56:56 -04:00
Matthieu Maitre
6aae386b30
Perf optimizations in map_query_to_entities() (#1276)
* Address perf issue in map_query_to_entities()

* Add semver

---------

Co-authored-by: Matthieu Maitre <mmaitre@microsoft.com>
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-10-21 12:03:48 -06:00
Nathan Evans
1f70d42572
Empty workflow returns (#1291)
* Skip emitting empty dataframes

* Semver

* Better empty df check
2024-10-17 09:25:36 -07:00
Nathan Evans
ce5b1207e0
Collapse graph documents workflows (#1284)
* Copy base documents logic into final documents

* Delete create_base_documents

* Combine graph creation under create_base_entity_graph

* Delete collapsed workflows

* Migrate most graph internals to nx.Graph

* Fix None edge case

* Semver

* Remove comment typo

* Fix smoke tests
2024-10-15 13:58:58 -06:00
Andres Morales
fc9895f793
Replace current docs by mkdocs (#1263)
* Replace docs by mkdocs-material

* Fix markdown

* Fix verions in gh-pages workflow

* remove whitespaces

* add semver

* Add build docs check on python-ci

* Fix command in index cli

* Spellcheck

* Spellcheck

* remove docsite paths

* clear outputs from notebook

* remove dependabot npm for docsite

* remove more docsite left overs

* execute notebooks

* Update notebooks

* update poetry lock

* Remove notebook build from ci

* Revert dep update

* Navigation tabs

* Fix stylesheet

* add kwds to dictionary

* Turn on notebook execution

* Update gitignore

* Add MSR Blog posts

* spellcheck

* Accessibility Changes

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-10-11 13:39:03 -06:00
Nathan Evans
61b3d6d56a
Migrate helper verbs (#1248)
* Remove genid

* Move snapshot_rows

* Move snapshot

* Delete spread_json

* Delete unzip

* Delete zip

* Move unpack_graph

* Move compute_edge_combined_degree

* Delete create_graph

* Delete concat

* Delete text replace

* Delete text_translate

* Move text_split

* Inline aggregate override

* Move cluster_graph

* Move merge_graphs

* Semver

* Move text_chunk

* Move layout_graph and fix some __init__s

* Move extract_covariates

* Rename text_split -> split_text

* Move extract_entities

* Move summarize_descriptions

* Rename text_chunk -> chunk_text

* Move community report creation

* Remove verb-level packing operators

* Streamline some naming

* Streamline param name/order

* Move mock LLM data to tests

* Fixed missed rename

* Update some strategy refs

* Rename run_gi

* Inject mock responses into integ test config
2024-10-09 13:46:44 -07:00
Nathan Evans
f5c5876dde
Reorganize flows (#1240)
* Extract base docs and entity graph

* Move extracted entities and text units

* Move communities and community reports

* Move covariates and final documents

* Move entities, nodes, relationships

* Move text_units and summarized entities

* Assert all snapshot null cases

* Remove disabled steps util

* Remove incorrect use of input "others"

* Convert text_embed_df to just return the embeddings, not update the df

* Convert snapshot functions to noops

* Semver

* Remove lingering covariates_enabled param

* Name consistency

* Syntax cleanup
2024-10-02 08:57:08 -07:00
Nathan Evans
9070ea5c3c
Collapse create base extracted entities (#1235)
* Set up base assertions

* Replace entity_extract

* Finish collapsing workflow

* Semver

* Update snoke tests
2024-09-30 17:32:56 -07:00
Nathan Evans
630679f8e3
Collapse create summarized entities (#1237)
* Collapse entity summarize

* Semver
2024-09-30 17:17:44 -07:00
Nathan Evans
5220bb7ecc
Collapse create base entity graph (#1233)
* Collapse create_base_entity_graph

* Format/typing

* Semver

* Fix smoke tests

* Simplify assignment
2024-09-30 15:39:42 -07:00
Nathan Evans
00d5e77568
Collapse create final community reports (#1227)
* Remove extraneous param

* Add community report mocking assertions

* Collapse primary report generation

* Collapse embeddings

* Format

* Semver

* Remove extraneous check

* Move option set
2024-09-30 10:46:07 -07:00
Nathan Evans
ce71bcf7fb
Collapse create final entities (#1220)
* Collapse create_final_entities

* Update smoke tests

* Semver

* Remove prints

* Update embedding assertions
2024-09-25 17:35:44 -07:00
Nathan Evans
3217013019
Revisit create final text units (#1216)
* Add embeddings to collapsed subflow

* Semver

* Fix smoke tests
2024-09-25 16:55:27 -07:00
Nathan Evans
73e709b686
Collapse create final covariates (#1215)
* Add covariate test

* Add detailed mock assertions

* Collapse create_final_covariates

* Delete unused doc_id field

* Semver

* Update smoke test

* Remove unused subject/object type columns
2024-09-25 16:30:22 -07:00
Nathan Evans
14750f4d37
Collapse create final documents (#1217)
* Collapse create_final_documents

* Semver
2024-09-25 15:50:46 -07:00
Nathan Evans
f518c8b80b
Collapse relationship embeddings (#1199)
* Merge text_embed into a single relationships subflow

* Update smoke tests

* Semver

* Spelling
2024-09-24 15:03:26 -07:00
Nathan Evans
1755afbdec
Collapse create base text units (#1178)
* Collapse non-attribute verbs

* Include document_column_attributes in collapse

* Remove merge_override verb

* Semver

* Setup initial test and config

* Collapse create_base_text_units

* Semver

* Spelling

* Fix smoke tests

* Addres PR comments

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-09-23 16:55:53 -07:00
Nathan Evans
fbc483e4e5
Collapse create base documents (#1176)
* Collapse non-attribute verbs

* Include document_column_attributes in collapse

* Remove merge_override verb

* Semver

* Clean up some df/tests
2024-09-23 13:24:06 -07:00
Nathan Evans
f8ab1b30dc
Collapse create_final_nodes (#1171)
* Collapse create_final_nodes

* Update smoke tests

* Typo

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-09-20 13:48:56 -07:00
Nathan Evans
ae094bb144
Collapse create final relationships (#1158)
* Collapse pre/post embedding workflows

* Semver

* Fix smoke tests

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-09-19 17:38:01 -06:00
Derek Worthen
3b09df6e07
Migrate towards using static output directories (#1113)
* Migrate towards using static output directories

- Fixes load_config eagering resolving directories.
    Directories are only resolved when the output
    directories are local.
- Add support for `--output` and `--reporting` flags
    for index CLI. To achieve previous output structure
    `index --output run1/artifacts --reports run1/reports`.
- Use static output directories when initializing
    a new project.
- Maintains backward compatibility for those using
    timestamp outputs locally.

* fix smoke tests

* update query cli to work with static directories

* remove eager path resolution from load_config. Support CLI overrides that can be resolved.

* add docs and output logs/artifacts to same directory

* use match statement

* switch back to if statement

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-09-18 17:36:50 -06:00
Nathan Evans
aa5b426f1d
Collapse final communities workflow (#1150)
* Collapse create_final_communities

* Semver

* Spellcheck

* Clean up filtering

* Add space in title

* Format

* Cleanup imports and format

* Spruce up the tests

* Update dictionary.txt

* Spellcheck

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-09-17 17:04:42 -07:00