+---------------------------------------------------------------------------
+NameError Traceback (most recent call last)
+Cell In[6], line 1
+----> 1 resp = await search.asearch("Who is agent Mercer?")
+
+NameError: name 'search' is not defined
@@ -2511,10 +2488,16 @@ search = DRIFTSearch(
Out[7]:
"Agent Alex Mercer is a central figure at the Dulce Military Base, particularly within the Paranormal Military Squad. He is noted for his leadership in navigating complex interstellar communications and decoding alien transmissions. Mercer's background as a former military officer equips him with strategic thinking and a calm, authoritative demeanor, which significantly influences the squad's operations [Data: Reports (50); Sources (20, 16, 18, 23, 15, 24, 26)].\n\n### Role and Responsibilities\nMercer leads missions that require strategic thinking and adaptability, indicating a background likely involving military training and experience in high-stakes, covert operations. His leadership style is characterized by diplomacy and caution, ensuring mission success while maintaining focus amid high-stakes activities [Data: Reports (50); Sources (20, 16, 18, 23, 15, 24, 26)].\n\n### Collaboration\nHe collaborates with key figures like Taylor Cruz and Dr. Jordan Hayes. While Mercer focuses on strategic decisions, Cruz brings pragmatism, and Dr. Hayes provides technical analysis, highlighting the collaborative nature of their work [Data: Reports (50)].\n\n### Symbolic Presence\nMercer's presence is symbolic of authority and respect, underscoring his influence and the pivotal nature of his leadership in ensuring the success of the squad's mission [Data: Reports (50)].\n\nOverall, Alex Mercer is a prominent leader with a focus on strategic thinking and interstellar diplomacy, although the full scope of his impact and the specific nature of the alien messages he handles remain somewhat mysterious [Data: Reports (50)]."
+---------------------------------------------------------------------------
+NameError Traceback (most recent call last)
+Cell In[7], line 1
+----> 1 resp.response
+
+NameError: name 'resp' is not defined
@@ -2555,11 +2538,11 @@ search = DRIFTSearch(
---------------------------------------------------------------------------
-TypeError Traceback (most recent call last)
+NameError Traceback (most recent call last)
Cell In[8], line 1
-----> 1 resp.response["nodes"][0]["answer"]
+----> 1 resp.response["nodes"][0]["answer"]
-TypeError: string indices must be integers, not 'str'
Load community reports as context for global search
-
Load all community reports in the create_final_community_reports table from the GraphRAG, to be used as context data for global search.
-
Load entities from the create_final_nodes and create_final_entities tables from the GraphRAG, to be used for calculating community weights for context ranking. Note that this is optional (if no entities are provided, we will not calculate community weights and only use the rank attribute in the community reports table for context ranking)
-
Load all communities in the create_final_communites table from the GraphRAG, to be used to reconstruct the community graph hierarchy for dynamic community selection.
+
Load all community reports in the community_reports table from GraphRAG, to be used as context data for global search.
+
Load entities from the entities table from GraphRAG, to be used for calculating community weights for context ranking. Note that this is optional (if no entities are provided, we will not calculate community weights and will only use the rank attribute in the community reports table for context ranking).
+
Load all communities in the communities table from GraphRAG, to be used to reconstruct the community graph hierarchy for dynamic community selection.
@@ -2172,10 +2171,9 @@ token_encoder = tiktoken.encoding_for_model(llm_model)
# parquet files generated from indexing pipeline
INPUT_DIR = "./inputs/operation dulce"
-COMMUNITY_TABLE = "create_final_communities"
-COMMUNITY_REPORT_TABLE = "create_final_community_reports"
-ENTITY_TABLE = "create_final_nodes"
-ENTITY_EMBEDDING_TABLE = "create_final_entities"
+COMMUNITY_TABLE = "communities"
+COMMUNITY_REPORT_TABLE = "community_reports"
+ENTITY_TABLE = "entities"
# community level in the Leiden community hierarchy from which we will load the community reports
# higher value means we use reports from more fine-grained communities (at the cost of higher computation cost)
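The constants above feed directly into building the context objects the search engine needs. A minimal sketch of how reports, entities, and communities would be materialized from those parquet files, assuming the read_indexer_* adapters from graphrag.query.indexer_adapters (the argument order shown is an assumption and varies across GraphRAG versions; check your installed release):

import pandas as pd

from graphrag.query.indexer_adapters import (
    read_indexer_communities,
    read_indexer_entities,
    read_indexer_reports,
)

# load the parquet outputs produced by the indexing pipeline
community_df = pd.read_parquet(f"{INPUT_DIR}/{COMMUNITY_TABLE}.parquet")
entity_df = pd.read_parquet(f"{INPUT_DIR}/{ENTITY_TABLE}.parquet")
report_df = pd.read_parquet(f"{INPUT_DIR}/{COMMUNITY_REPORT_TABLE}.parquet")

# adapt the raw frames into the query engine's model objects;
# COMMUNITY_LEVEL filters reports to the chosen Leiden hierarchy level
communities = read_indexer_communities(community_df, report_df)
reports = read_indexer_reports(report_df, community_df, COMMUNITY_LEVEL)
entities = read_indexer_entities(entity_df, community_df, COMMUNITY_LEVEL)

print(f"Total report count: {len(report_df)}")
print(f"Report count after filtering by community level {COMMUNITY_LEVEL}: {len(reports)}")

Defining these names is what the later cells depend on; the NameError tracebacks below ('reports' is not defined) cascade from the failed parquet load.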
@@ -2207,11 +2205,10 @@ COMMUNITY_LEVEL = 2
Total report count: 72
-Report count after filtering by community level 2: 56
Out[5]:

  | id                               | human_readable_id | community | parent | level | title                                        | summary                                            | full_content                                       | rank | rank_explanation                                   | findings                                           | full_content_json                                | period     | size
0 | 16949a5d17b740b2b4a6f787b0a637f1 | 43                | 43        | 10    | 2     | Ben Bloomberg and the Harmoniser Project     | The community centers around Ben Bloomberg, a ...  | # Ben Bloomberg and the Harmoniser Project\n\n...  | 7.5  | The impact severity rating is high due to the ...  | [{'explanation': 'Ben Bloomberg is a pivotal f...  | {\n "title": "Ben Bloomberg and the Harmoni...   | 2025-01-10 | 35
1 | 4ff756b7041f4dcab6612e016af2b14d | 44                | 44        | 10    | 2     | North Hampton and Influential Musicians      | The community centers around North Hampton, a ...  | # North Hampton and Influential Musicians\n\nT...  | 6.5  | The impact severity rating is moderately high ...  | [{'explanation': 'North Hampton serves as the ...  | {\n "title": "North Hampton and Influential...   | 2025-01-10 | 4
2 | 2d3df394272743a781606ad80ccb5312 | 45                | 45        | 10    | 2     | Prince of Monaco and Monaco                  | The community revolves around the Prince of Mo...  | # Prince of Monaco and Monaco\n\nThe community...  | 4.0  | The impact severity rating is moderate due to ...  | [{'explanation': 'The Prince of Monaco is a ke...  | {\n "title": "Prince of Monaco and Monaco",...    | 2025-01-10 | 2
3 | becbd958973f42b0bd53cca9250feaf1 | 46                | 46        | 10    | 2     | Robot Opera and Broadway                     | The community revolves around the Robot Opera,...  | # Robot Opera and Broadway\n\nThe community re...  | 7.5  | The impact severity rating is high due to the ...  | [{'explanation': 'The Robot Opera is a notable...  | {\n "title": "Robot Opera and Broadway",\n ...    | 2025-01-10 | 2
4 | f7d29921ae3e41a79ae7f88dae584892 | 47                | 47        | 13    | 2     | Ben and Jacob's Fusion of Art and Technology | The community centers around Ben and Jacob, wh...  | # Ben and Jacob's Fusion of Art and Technology...  | 7.5  | The impact severity rating is high due to the ...  | [{'explanation': 'Ben and Jacob are key collab...  | {\n "title": "Ben and Jacob's Fusion of Art...    | 2025-01-10 | 5
+File ~/.cache/pypoetry/virtualenvs/graphrag-F2jvqev7-py3.11/lib/python3.11/site-packages/pandas/io/parquet.py:267, in PyArrowImpl.read(self, path, columns, filters, use_nullable_dtypes, dtype_backend, storage_options, filesystem, **kwargs)
+    264 if manager == "array":
+    265     to_pandas_kwargs["split_blocks"] = True  # type: ignore[assignment]
+--> 267 path_or_handle, handles, filesystem = _get_path_or_handle(
+    268     path,
+    269     filesystem,
+    270     storage_options=storage_options,
+    271     mode="rb",
+    272 )
+    273 try:
+    274     pa_table = self.api.parquet.read_table(
+    275         path_or_handle,
+    276         columns=columns,
+    (...)
+    279         **kwargs,
+    280     )
+File ~/.cache/pypoetry/virtualenvs/graphrag-F2jvqev7-py3.11/lib/python3.11/site-packages/pandas/io/parquet.py:140, in _get_path_or_handle(path, fs, storage_options, mode, is_dir)
+    130 handles = None
+    131 if (
+    132     not fs
+    133     and not is_dir
+    (...)
+    138     # fsspec resources can also point to directories
+    139     # this branch is used for example when reading from non-fsspec URLs
+--> 140     handles = get_handle(
+    141         path_or_handle, mode, is_text=False, storage_options=storage_options
+    142     )
+    143     fs = None
+    144     path_or_handle = handles.handle
+
+File ~/.cache/pypoetry/virtualenvs/graphrag-F2jvqev7-py3.11/lib/python3.11/site-packages/pandas/io/common.py:882, in get_handle(path_or_buf, mode, encoding, compression, memory_map, is_text, errors, storage_options)
+    873     handle = open(
+    874         handle,
+    875         ioargs.mode,
+    (...)
+    878         newline="",
+    879     )
+    880 else:
+    881     # Binary mode
+--> 882     handle = open(handle, ioargs.mode)
+    883 handles.append(handle)
+    885 # Convert BytesIO or file objects passed with an encoding
+
+FileNotFoundError: [Errno 2] No such file or directory: './inputs/operation dulce/communities.parquet'
@@ -2429,6 +2361,29 @@ Report count after filtering by community level 2: 56
+---------------------------------------------------------------------------
+NameError Traceback (most recent call last)
+Cell In[6], line 2
+ 1 context_builder = GlobalCommunityContext(
+----> 2 community_reports=reports,
+ 3 communities=communities,
+ 4 entities=entities, # default to None if you don't want to use community weights for ranking
+ 5 token_encoder=token_encoder,
+ 6 )
+
+NameError: name 'reports' is not defined
@@ -2562,6 +2517,36 @@ reduce_llm_params = {
+---------------------------------------------------------------------------
+NameError Traceback (most recent call last)
+Cell In[8], line 3
+ 1 search_engine = GlobalSearch(
+ 2 llm=llm,
+----> 3 context_builder=context_builder,
+ 4 token_encoder=token_encoder,
+ 5 max_data_tokens=12_000, # change this based on the token limit you have on your model (if you are using a model with 8k limit, a good setting could be 5000)
+ 6 map_llm_params=map_llm_params,
+ 7 reduce_llm_params=reduce_llm_params,
+ 8 allow_general_knowledge=False, # set this to True will add instruction to encourage the LLM to incorporate general knowledge in the response, which may increase hallucinations, but could be useful in some use cases.
+ 9 json_mode=True, # set this to False if your LLM model does not support JSON mode.
+ 10 context_builder_params=context_builder_params,
+ 11 concurrent_coroutines=32,
+ 12 response_type="multiple paragraphs", # free form text describing the response type and format, can be anything, e.g. prioritized list, single paragraph, multiple paragraphs, multiple-page report
+ 13 )
+
+NameError: name 'context_builder' is not defined
@@ -2603,17 +2588,17 @@ print(result.response)
### Cosmic Vocalization: An Overview
+---------------------------------------------------------------------------
+NameError Traceback (most recent call last)
+Cell In[9], line 1
+----> 1 result = await search_engine.asearch(
+      2     "What is Cosmic Vocalization and who are involved in it?"
+      3 )
+      5 print(result.response)
-Cosmic Vocalization is a term coined by Jordan Hayes to describe a repeating sequence found in cryptic communications. This concept is pivotal as it serves as a reference point for both humanity and extraterrestrial entities, facilitating a mutual understanding and interpretation of signals exchanged during the Interstellar Duet [Data: Reports (65)]. The idea of Cosmic Vocalization underscores the importance of establishing a common ground in interstellar communications, which is crucial for the success of such exchanges.
-
-### Key Participants
-
-The Paranormal Military Squad plays a significant role in activities related to Cosmic Vocalization. They are integral participants in the Galactic Orchestra, which encompasses the Interstellar Duet and the exchange of Harmonious Signals [Data: Reports (65)]. This involvement highlights the strategic importance of Cosmic Vocalization in broader interstellar and paranormal military operations, suggesting that these communications are not only scientific but also have potential military applications.
-
-In summary, Cosmic Vocalization is a critical concept in the realm of interstellar communication, with the Paranormal Military Squad being key participants in its related activities. This involvement indicates a blend of scientific exploration and strategic military interests in the ongoing efforts to understand and utilize these cryptic communications.
-
Load community reports as context for global search
-
Load all community reports in the create_final_community_reports table from the ire-indexing engine, to be used as context data for global search.
-
Load entities from the create_final_nodes and create_final_entities tables from the ire-indexing engine, to be used for calculating community weights for context ranking. Note that this is optional (if no entities are provided, we will not calculate community weights and only use the rank attribute in the community reports table for context ranking)
-
Load all communities in the create_final_communites table from the ire-indexing engine, to be used to reconstruct the community graph hierarchy for dynamic community selection.
+
Load all community reports in the community_reports table from the indexing engine, to be used as context data for global search.
+
Load entities from the entities table from the indexing engine, to be used for calculating community weights for context ranking. Note that this is optional (if no entities are provided, we will not calculate community weights and will only use the rank attribute in the community reports table for context ranking).
+
Load all communities in the communities table from the indexing engine, to be used to reconstruct the community graph hierarchy for dynamic community selection.
@@ -2066,10 +2065,9 @@ token_encoder = tiktoken.encoding_for_model(llm_model)
# parquet files generated from indexing pipeline
INPUT_DIR = "./inputs/operation dulce"
-COMMUNITY_TABLE = "create_final_communities"
-COMMUNITY_REPORT_TABLE = "create_final_community_reports"
-ENTITY_TABLE = "create_final_nodes"
-ENTITY_EMBEDDING_TABLE = "create_final_entities"
+COMMUNITY_TABLE = "communities"
+COMMUNITY_REPORT_TABLE = "community_reports"
+ENTITY_TABLE = "entities"
# we don't fix a specific community level but instead use an agent to dynamically
# search through all the community reports to check if they are relevant.
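With COMMUNITY_LEVEL = None, report screening is delegated to a cheaper rater model at query time. A sketch of the dynamic-selection context builder, reconstructed from the failing cell in the traceback below (reports, communities, and entities are assumed to be loaded as in the global search setup):

mini_llm = ChatOpenAI(
    api_key=api_key,
    model="gpt-4o-mini",
    api_type=OpenaiApiType.OpenAI,  # OpenaiApiType.OpenAI or OpenaiApiType.AzureOpenAI
    max_retries=20,
)
mini_token_encoder = tiktoken.encoding_for_model(mini_llm.model)

context_builder = GlobalCommunityContext(
    community_reports=reports,
    communities=communities,
    entities=entities,  # default to None if you don't want to use community weights for ranking
    token_encoder=token_encoder,
    dynamic_community_selection=True,
    dynamic_community_selection_kwargs={
        "llm": mini_llm,  # the cheaper model used only to rate report relevance
        "token_encoder": mini_token_encoder,
    },
)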
@@ -2101,17 +2099,16 @@ COMMUNITY_LEVEL = None
+---------------------------------------------------------------------------
+NameError Traceback (most recent call last)
+Cell In[6], line 10
+ 1 mini_llm = ChatOpenAI(
+ 2 api_key=api_key,
+ 3 model="gpt-4o-mini",
+ 4 api_type=OpenaiApiType.OpenAI, # OpenaiApiType.OpenAI or OpenaiApiType.AzureOpenAI
+ 5 max_retries=20,
+ 6 )
+ 7 mini_token_encoder = tiktoken.encoding_for_model(mini_llm.model)
+ 9 context_builder = GlobalCommunityContext(
+---> 10 community_reports=reports,
+ 11 communities=communities,
+ 12 entities=entities, # default to None if you don't want to use community weights for ranking
+ 13 token_encoder=token_encoder,
+ 14 dynamic_community_selection=True,
+ 15 dynamic_community_selection_kwargs={
+ 16"llm": mini_llm,
+ 17"token_encoder": mini_token_encoder,
+ 18 },
+ 19 )
+
+NameError: name 'reports' is not defined
@@ -2498,6 +2465,36 @@ reduce_llm_params = {
+---------------------------------------------------------------------------
+NameError Traceback (most recent call last)
+Cell In[8], line 3
+ 1 search_engine = GlobalSearch(
+ 2 llm=llm,
+----> 3 context_builder=context_builder,
+ 4 token_encoder=token_encoder,
+ 5 max_data_tokens=12_000, # change this based on the token limit you have on your model (if you are using a model with 8k limit, a good setting could be 5000)
+ 6 map_llm_params=map_llm_params,
+ 7 reduce_llm_params=reduce_llm_params,
+ 8 allow_general_knowledge=False, # set this to True will add instruction to encourage the LLM to incorporate general knowledge in the response, which may increase hallucinations, but could be useful in some use cases.
+ 9 json_mode=True, # set this to False if your LLM model does not support JSON mode.
+ 10 context_builder_params=context_builder_params,
+ 11 concurrent_coroutines=32,
+ 12 response_type="multiple paragraphs", # free form text describing the response type and format, can be anything, e.g. prioritized list, single paragraph, multiple paragraphs, multiple-page report
+ 13 )
+
+NameError: name 'context_builder' is not defined
@@ -2539,17 +2536,17 @@ print(result.response)
### Cosmic Vocalization: An Overview
+---------------------------------------------------------------------------
+NameError Traceback (most recent call last)
+Cell In[9], line 1
+----> 1 result = await search_engine.asearch(
+      2     "What is Cosmic Vocalization and who are involved in it?"
+      3 )
+      5 print(result.response)
-Cosmic Vocalization is a term coined by Jordan Hayes to describe a repeating sequence found in cryptic communications. This concept is pivotal as it serves as a reference point for both humanity and extraterrestrial entities, facilitating a mutual understanding and interpretation of signals exchanged during what is known as the Interstellar Duet [Data: Reports (65)].
-
-### Key Involvement
-
-Jordan Hayes is notably involved in the development and use of the concept of Cosmic Vocalization. They utilize this term to articulate the repeating sequence in these communications, highlighting its significance in bridging the communicative gap between different entities [Data: Reports (65)].
-
-The involvement of Jordan Hayes underscores the importance of Cosmic Vocalization in the broader context of interstellar communication, suggesting that it may play a crucial role in future interactions and understandings between humans and extraterrestrial beings.
-
+---------------------------------------------------------------------------
+NameError Traceback (most recent call last)
+Cell In[11], line 2
+      1 # inspect number of LLM calls and tokens in dynamic community selection
+----> 2 llm_calls = result.llm_calls_categories["build_context"]
+ 3 prompt_tokens = result.prompt_tokens_categories["build_context"]
+ 4 output_tokens = result.output_tokens_categories["build_context"]
+
+NameError: name 'result' is not defined
diff --git a/examples_notebooks/index_migration/index.html b/examples_notebooks/index_migration_to_v1/index.html
similarity index 98%
rename from examples_notebooks/index_migration/index.html
rename to examples_notebooks/index_migration_to_v1/index.html
index 4eaaa3ba..a1c0a3f6 100644
--- a/examples_notebooks/index_migration/index.html
+++ b/examples_notebooks/index_migration_to_v1/index.html
@@ -16,7 +16,7 @@
- Index migration - GraphRAG
+ Index migration to v1 - GraphRAG
@@ -72,7 +72,7 @@
This notebook is used to maintain data model parity with older indexes for version 2.0 of GraphRAG. If you have a pre-2.0 index and need to migrate without re-running the entire pipeline, you can use this notebook to only update the pieces necessary for alignment. If you have a pre-1.0 index, please run the v1 migration notebook first!
+
NOTE: we recommend regenerating your settings.yml with the latest version of GraphRAG using graphrag init. Copy your LLM settings into it before running this notebook. This ensures your config is aligned with the latest version for the migration. This also ensures that you have default vector store config, which is now required or indexing will fail.
+
WARNING: This will overwrite your parquet files; you may want to make a backup first!
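Since the notebook rewrites the parquet files in place, copying the output directory first is a cheap safeguard. A minimal sketch, assuming the default GraphRAG layout where the parquet files live under <project>/output:

import shutil
from pathlib import Path

# assumed default layout; adjust if your settings.yaml points elsewhere
output_dir = Path("<your project directory>") / "output"
shutil.copytree(output_dir, output_dir.with_name("output_backup"))  # raises if the backup already exists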
In [2]:
# This is the directory that has your settings.yaml
+PROJECT_DIRECTORY = "
+
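The cell that produced the traceback below loads the project config and opens its output storage; its source is reconstructed here from the traceback frames (the pathlib import is assumed, since line 1 of the cell is not shown):

from pathlib import Path

from graphrag.config.load_config import load_config
from graphrag.storage.factory import StorageFactory

config = load_config(Path(PROJECT_DIRECTORY))  # fails fast if PROJECT_DIRECTORY is not a real directory
storage_config = config.output.model_dump()
storage = StorageFactory().create_storage(
    storage_type=storage_config["type"],
    kwargs=storage_config,
)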
+---------------------------------------------------------------------------
+FileNotFoundError Traceback (most recent call last)
+Cell In[3], line 6
+      3 from graphrag.config.load_config import load_config
+      4 from graphrag.storage.factory import StorageFactory
+----> 6 config = load_config(Path(PROJECT_DIRECTORY))
+      7 storage_config = config.output.model_dump()
+      8 storage = StorageFactory().create_storage(
+      9     storage_type=storage_config["type"],
+     10     kwargs=storage_config,
+     11 )
+
+File ~/work/graphrag/graphrag/graphrag/config/load_config.py:183, in load_config(root_dir, config_filepath, cli_overrides)
+    151 """Load configuration from a file.
+    152
+    153 Parameters
+    (...)
+    180     If there are pydantic validation errors when instantiating the config.
+    181 """
+    182 root = root_dir.resolve()
+--> 183 config_path = _get_config_path(root, config_filepath)
+    184 _load_dotenv(config_path)
+    185 config_extension = config_path.suffix
+
+File ~/work/graphrag/graphrag/graphrag/config/load_config.py:106, in _get_config_path(root_dir, config_filepath)
+    104     raise FileNotFoundError(msg)
+    105 else:
+--> 106     config_path = _search_for_config_in_root_dir(root_dir)
+    108 if not config_path:
+    109     msg = f"Config file not found in root directory: {root_dir}"
+
+File ~/work/graphrag/graphrag/graphrag/config/load_config.py:40, in _search_for_config_in_root_dir(root)
+     38 if not root.is_dir():
+     39     msg = f"Invalid config path: {root} is not a directory"
+---> 40     raise FileNotFoundError(msg)
+     42 for file in _default_config_files:
+     43     if (root / file).is_file():
+
+FileNotFoundError: Invalid config path: /home/runner/work/graphrag/graphrag/docs/examples_notebooks/<your project directory> is not a directory
In [4]:
def remove_columns(df, columns):
+ """Remove columns from a DataFrame, suppressing errors."""
+ df.drop(labels=columns, axis=1, errors="ignore", inplace=True)
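Usage is a one-liner per table; the column names below are hypothetical examples of legacy columns, and errors="ignore" makes the call a no-op when a column is already gone:

# hypothetical legacy columns; drop whichever your old index still carries
remove_columns(final_entities, ["graph_embedding", "top_level_node_id"])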
In [5]:
+
from graphrag.utils.storage import (
+ delete_table_from_storage,
+ load_table_from_storage,
+ write_table_to_storage,
+)
+
+final_documents = await load_table_from_storage("create_final_documents", storage)
+final_text_units = await load_table_from_storage("create_final_text_units", storage)
+final_entities = await load_table_from_storage("create_final_entities", storage)
+final_nodes = await load_table_from_storage("create_final_nodes", storage)
+final_relationships = await load_table_from_storage(
+ "create_final_relationships", storage
+)
+final_communities = await load_table_from_storage("create_final_communities", storage)
+final_community_reports = await load_table_from_storage(
+ "create_final_community_reports", storage
+)
+
+# we've renamed document attributes as metadata
+if "attributes" in final_documents.columns:
+ final_documents.rename(columns={"attributes": "metadata"}, inplace=True)
+
+# we're removing the nodes table, so we need to copy the graph columns into entities
+graph_props = (
+ final_nodes.loc[:, ["id", "degree", "x", "y"]].groupby("id").first().reset_index()
+)
+final_entities = final_entities.merge(graph_props, on="id", how="left")
+
+# we renamed all the output files for better clarity now that we don't have workflow naming constraints from DataShaper
+await write_table_to_storage(final_documents, "documents", storage)
+await write_table_to_storage(final_text_units, "text_units", storage)
+await write_table_to_storage(final_entities, "entities", storage)
+await write_table_to_storage(final_relationships, "relationships", storage)
+await write_table_to_storage(final_communities, "communities", storage)
+await write_table_to_storage(final_community_reports, "community_reports", storage)
+
+# delete all the old versions
+await delete_table_from_storage("create_final_documents", storage)
+await delete_table_from_storage("create_final_text_units", storage)
+await delete_table_from_storage("create_final_entities", storage)
+await delete_table_from_storage("create_final_nodes", storage)
+await delete_table_from_storage("create_final_relationships", storage)
+await delete_table_from_storage("create_final_communities", storage)
+await delete_table_from_storage("create_final_community_reports", storage)