Traditional RAG systems excel at finding semantically similar documents but fail catastrophically when queries require connecting information across multiple sources or understanding dataset-wide themes. A question like “What are the main themes in our customer feedback?” cannot be answered by retrieving the top 10 most similar chunks because no single chunk contains the answer. GraphRAG addresses this fundamental limitation by constructing knowledge graphs that capture entities, relationships, and hierarchical structures, enabling both local queries (specific facts) and global queries (dataset-wide insights).
The Context Collapse Problem in Traditional RAG
Vector-based RAG systems suffer from context collapse: the loss of crucial relationships between pieces of information when documents are chunked and embedded independently. Consider a corpus of internal company documents discussing a product launch. Traditional RAG might successfully retrieve chunks mentioning the product name, launch date, or key stakeholders, but it cannot see how these elements connect into a coherent narrative about project delays, budget overruns, or strategic pivots.
Microsoft Research demonstrated this limitation using the VIINA dataset of news articles. When asked “What has Novorossiya done?”, baseline RAG failed completely because no single chunk contained a comprehensive answer. The information was distributed across multiple articles discussing different incidents and relationships. GraphRAG succeeded by constructing a knowledge graph where Novorossiya appeared as an entity connected to various actions through explicit relationships, enabling retrieval of all relevant information regardless of document boundaries.
Research shows traditional RAG achieves only 23% accuracy on multi-hop reasoning tasks: questions that require synthesizing information across multiple sources. GraphRAG achieves 87% accuracy on the same tasks by maintaining explicit relationship structures that enable traversal from one piece of information to another through shared attributes.
graph TD
A[Traditional RAG Limitations] --> B[Context Collapse]
A --> C[Poor Multi-Hop Reasoning]
A --> D[Cannot Answer Global Questions]
B --> B1[Chunks Processed Independently]
B --> B2[Relationships Lost]
B --> B3[23% Accuracy on Complex Queries]
C --> C1[Cannot Connect Dots]
C --> C2[Fails on Traversal Questions]
C --> C3[Limited to Single-Hop Retrieval]
D --> D1[Cannot Summarize Themes]
D --> D2[No Dataset-Wide Understanding]
D --> D3[Retrieves Similar NOT Relevant]
E[GraphRAG Solution] --> F[Knowledge Graph Construction]
E --> G[Community Detection]
E --> H[Hierarchical Summaries]
F --> F1[Entities and Relationships]
F --> F2[87% Accuracy on Complex Queries]
F --> F3[Explicit Connections Preserved]
G --> G1[Semantic Clustering]
G --> G2[Theme Identification]
G --> G3[Multi-Level Organization]
H --> H1[Global Query Support]
H --> H2[Dataset-Wide Insights]
H --> H3[Pre-Computed Summaries]
style A fill:#ffebee
style E fill:#e8f5e9
style B3 fill:#fff4e1
style F2 fill:#e1f5ff
GraphRAG Architecture: From Documents to Knowledge Graphs
GraphRAG transforms unstructured text into structured knowledge through a multi-stage pipeline that extracts entities, relationships, and hierarchical communities. The process begins by chunking documents into analyzable units, then uses LLMs to identify entities (people, organizations, concepts) and relationships (actions, associations, temporal sequences) within each chunk. These entities and relationships form nodes and edges in a knowledge graph that spans the entire corpus.
The architecture differs fundamentally from traditional RAG in three ways. First, it performs comprehensive analysis of all documents during indexing rather than deferring all analysis to query time. Second, it builds explicit relationship structures that enable graph traversal algorithms. Third, it applies community detection algorithms to identify semantic clusters and generates summaries at multiple hierarchical levels, enabling both detailed and high-level queries.
graph LR
A[Raw Documents] --> B[Text Chunking]
B --> C[Entity Extraction via LLM]
B --> D[Relationship Extraction via LLM]
C --> E[Entity Nodes]
D --> F[Relationship Edges]
E --> G[Knowledge Graph]
F --> G
G --> H[Community Detection]
H --> I[Hierarchical Clustering]
I --> J[Level 0: Global Themes]
I --> K[Level 1: Topic Clusters]
I --> L[Level 2: Specific Concepts]
J --> M[LLM Summary Generation]
K --> M
L --> M
M --> N[Community Summaries]
O[Query] --> P{Query Type}
P -->|Local| Q[Entity/Relationship Search]
P -->|Global| R[Community Summary Retrieval]
Q --> S[Graph Traversal]
S --> T[Multi-Hop Results]
R --> U[Hierarchical Navigation]
U --> V[Dataset-Wide Insights]
G --> Q
N --> R
style A fill:#f0f0f0
style G fill:#e1f5ff
style N fill:#fff4e1
style T fill:#e8f5e9
style V fill:#f3e5f5
Entity and Relationship Extraction
The extraction phase uses carefully designed prompts that instruct the LLM to identify domain-specific entities and relationships. For a technical documentation corpus, entities might include software components, API endpoints, configuration parameters, and error codes, while relationships capture dependencies, version compatibility, and integration patterns. For business documents, entities might be projects, stakeholders, budgets, and milestones, with relationships representing ownership, funding, and temporal sequences.
Microsoft’s implementation uses GPT-4 for extraction with prompts tuned to the specific domain. The prompt engineering process typically starts with generic entity types (Person, Organization, Location, Event) and refines based on corpus analysis. Automatic prompt tuning techniques analyze sample documents to identify frequently occurring entity types and relationship patterns, then generate domain-specific extraction prompts that capture 40-60% more relevant information than generic prompts.
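The sketch below illustrates the core idea behind automatic prompt tuning: sample a handful of chunks, ask the model which entity types they contain, and keep the most frequent proposals as the domain-specific type list. The helper name and prompt wording are illustrative assumptions, not the GraphRAG library's actual tuning API; the resulting list can be passed as entity_types to the extraction method shown next.
# Illustrative sketch of automatic prompt tuning: analyze a sample of chunks
# to propose domain-specific entity types before full indexing. The helper
# name and prompt wording are assumptions, not the GraphRAG library's API.
import json
from collections import Counter

def propose_entity_types(client, deployment: str, sample_chunks, top_k: int = 8):
    """Ask the LLM which entity types a sample of the corpus contains,
    then keep the most frequently proposed types."""
    counts = Counter()
    for chunk in sample_chunks:
        response = client.chat.completions.create(
            model=deployment,
            messages=[
                {"role": "system", "content": "You return only valid JSON."},
                {"role": "user", "content": (
                    'List the entity types present in this text as '
                    f'{{"types": ["..."]}}.\n\nTEXT:\n{chunk}'
                )}
            ],
            temperature=0.0,
            response_format={"type": "json_object"}
        )
        proposed = json.loads(response.choices[0].message.content).get("types", [])
        counts.update(t.strip().title() for t in proposed)
    # The most common proposals become the domain-specific type list
    return [t for t, _ in counts.most_common(top_k)]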
Here is a Python implementation showing entity and relationship extraction with Azure OpenAI:
from openai import AzureOpenAI
import json
from typing import List, Dict, Tuple
import networkx as nx
from dataclasses import dataclass
import hashlib
@dataclass
class Entity:
name: str
type: str
description: str
source_chunks: List[str]
@dataclass
class Relationship:
source: str
target: str
relationship_type: str
description: str
source_chunks: List[str]
class GraphRAGIndexer:
def __init__(
self,
azure_endpoint: str,
api_key: str,
api_version: str = "2024-02-15-preview",
deployment_name: str = "gpt-4o"
):
self.client = AzureOpenAI(
azure_endpoint=azure_endpoint,
api_key=api_key,
api_version=api_version
)
self.deployment = deployment_name
        self.graph = nx.Graph()
        self.entities = {}
        self.relationships = []
        # Map lowercase entity names to node IDs so relationships,
        # which reference entities by name only, can be resolved later
        self._name_to_id = {}
def extract_entities_relationships(
self,
text_chunk: str,
chunk_id: str,
entity_types: List[str] = None
) -> Tuple[List[Entity], List[Relationship]]:
"""Extract entities and relationships from a text chunk using LLM"""
if entity_types is None:
entity_types = [
"Person", "Organization", "Location", "Technology",
"Product", "Concept", "Event", "Date"
]
extraction_prompt = f"""
You are an expert at extracting structured information from text.
Extract all entities and relationships from the following text.
ENTITY TYPES TO IDENTIFY:
{', '.join(entity_types)}
RELATIONSHIP TYPES TO IDENTIFY:
- WORKS_FOR: Person works for Organization
- LOCATED_IN: Entity is located in Location
- USES: Entity uses Technology/Product
- PART_OF: Entity is part of another Entity
- RELATES_TO: General relationship between entities
- HAPPENED_ON: Event happened on Date
- DEVELOPED_BY: Product/Technology developed by Person/Organization
TEXT TO ANALYZE:
{text_chunk}
Return a JSON object with this exact structure:
{{
"entities": [
{{
"name": "entity name",
"type": "entity type from list above",
"description": "brief description of the entity"
}}
],
"relationships": [
{{
"source": "source entity name",
"target": "target entity name",
"type": "relationship type from list above",
"description": "description of the relationship"
}}
]
}}
Return ONLY the JSON object, no other text.
"""
response = self.client.chat.completions.create(
model=self.deployment,
messages=[
{"role": "system", "content": "You extract structured information and return only valid JSON."},
{"role": "user", "content": extraction_prompt}
],
temperature=0.1,
response_format={"type": "json_object"}
)
result = json.loads(response.choices[0].message.content)
# Convert to Entity and Relationship objects
entities = []
for e in result.get("entities", []):
entity = Entity(
name=e["name"],
type=e["type"],
description=e.get("description", ""),
source_chunks=[chunk_id]
)
entities.append(entity)
relationships = []
for r in result.get("relationships", []):
relationship = Relationship(
source=r["source"],
target=r["target"],
relationship_type=r["type"],
description=r.get("description", ""),
source_chunks=[chunk_id]
)
relationships.append(relationship)
return entities, relationships
def build_knowledge_graph(
self,
documents: List[Dict[str, str]],
chunk_size: int = 1000,
chunk_overlap: int = 200
):
"""Build knowledge graph from documents"""
print(f"Processing {len(documents)} documents...")
for doc_idx, doc in enumerate(documents):
content = doc.get("content", "")
doc_id = doc.get("id", f"doc_{doc_idx}")
# Chunk document
chunks = self._chunk_text(content, chunk_size, chunk_overlap)
for chunk_idx, chunk in enumerate(chunks):
chunk_id = f"{doc_id}_chunk_{chunk_idx}"
print(f" Processing {chunk_id}...")
# Extract entities and relationships
entities, relationships = self.extract_entities_relationships(
chunk,
chunk_id
)
                # Add entities to graph
                for entity in entities:
                    entity_id = self._get_entity_id(entity.name, entity.type)
                    # Record the name -> ID mapping so relationship mentions
                    # (which carry names only, not types) can be resolved
                    self._name_to_id[entity.name.lower()] = entity_id
                    if entity_id in self.entities:
                        # Merge with existing entity
                        self.entities[entity_id].source_chunks.append(chunk_id)
                    else:
                        # Add new entity
                        self.entities[entity_id] = entity
                        self.graph.add_node(
                            entity_id,
                            name=entity.name,
                            type=entity.type,
                            description=entity.description
                        )
                # Add relationships to graph
                for rel in relationships:
                    # Look up by name: entity IDs include the type, which
                    # relationship mentions do not carry
                    source_id = self._name_to_id.get(rel.source.lower())
                    target_id = self._name_to_id.get(rel.target.lower())
                    if source_id in self.graph and target_id in self.graph:
self.graph.add_edge(
source_id,
target_id,
type=rel.relationship_type,
description=rel.description
)
self.relationships.append(rel)
print(f"\nKnowledge graph built:")
print(f" Entities: {len(self.entities)}")
print(f" Relationships: {len(self.relationships)}")
print(f" Graph nodes: {self.graph.number_of_nodes()}")
print(f" Graph edges: {self.graph.number_of_edges()}")
def detect_communities(self, resolution: float = 1.0) -> Dict[str, int]:
"""Detect communities using Louvain algorithm"""
from networkx.algorithms import community
print("\nDetecting communities...")
# Use Louvain method for community detection
communities = community.louvain_communities(
self.graph,
resolution=resolution,
seed=42
)
# Create mapping of node to community
node_to_community = {}
for comm_idx, comm_nodes in enumerate(communities):
for node in comm_nodes:
node_to_community[node] = comm_idx
print(f" Detected {len(communities)} communities")
# Add community info to graph
nx.set_node_attributes(self.graph, node_to_community, "community")
return node_to_community
def generate_community_summaries(
self,
node_to_community: Dict[str, int]
) -> Dict[int, str]:
"""Generate summaries for each community using LLM"""
print("\nGenerating community summaries...")
# Group entities by community
communities = {}
for node_id, comm_id in node_to_community.items():
if comm_id not in communities:
communities[comm_id] = []
entity = self.entities.get(node_id)
if entity:
communities[comm_id].append(entity)
# Generate summary for each community
summaries = {}
for comm_id, entities in communities.items():
print(f" Summarizing community {comm_id} ({len(entities)} entities)...")
# Prepare entity descriptions
entity_descriptions = [
f"- {e.name} ({e.type}): {e.description}"
for e in entities[:50] # Limit to avoid token overflow
]
summary_prompt = f"""
Analyze the following entities that belong to a semantic community and provide:
1. A descriptive title for this community (5-10 words)
2. A comprehensive summary of the main themes and concepts (2-3 paragraphs)
3. Key entities that best represent this community (top 5)
ENTITIES IN COMMUNITY:
{chr(10).join(entity_descriptions)}
Return a JSON object:
{{
"title": "community title",
"summary": "detailed summary",
"key_entities": ["entity1", "entity2", ...]
}}
"""
response = self.client.chat.completions.create(
model=self.deployment,
messages=[
{"role": "system", "content": "You analyze entity clusters and create summaries."},
{"role": "user", "content": summary_prompt}
],
temperature=0.3,
response_format={"type": "json_object"}
)
summary_data = json.loads(response.choices[0].message.content)
summaries[comm_id] = summary_data
return summaries
def _chunk_text(
self,
text: str,
chunk_size: int,
overlap: int
) -> List[str]:
"""Simple text chunking with overlap"""
chunks = []
start = 0
while start < len(text):
end = start + chunk_size
chunk = text[start:end]
chunks.append(chunk)
start += chunk_size - overlap
return chunks
def _get_entity_id(self, name: str, entity_type: str = None) -> str:
"""Generate consistent entity ID"""
key = f"{name.lower()}_{entity_type}" if entity_type else name.lower()
return hashlib.md5(key.encode()).hexdigest()[:16]
def save_graph(self, filepath: str):
"""Save graph to file"""
nx.write_graphml(self.graph, filepath)
print(f"\nGraph saved to {filepath}")
# Usage example
indexer = GraphRAGIndexer(
azure_endpoint="https://your-openai.openai.azure.com",
api_key="your-api-key",
deployment_name="gpt-4o"
)
# Sample documents
documents = [
{
"id": "doc1",
"content": """Azure AI Search is a cloud search service that provides infrastructure,
APIs, and tools for building search experiences. It was developed by Microsoft and
integrates with Azure OpenAI Service for embedding generation. The service uses
vector search algorithms like HNSW for efficient similarity search."""
},
{
"id": "doc2",
"content": """Microsoft Azure offers comprehensive AI services through Azure OpenAI Service.
This service provides access to GPT-4 and other language models. Azure AI Search works
alongside Azure OpenAI to enable RAG implementations for enterprise applications."""
}
]
# Build knowledge graph
indexer.build_knowledge_graph(documents)
# Detect communities
communities = indexer.detect_communities(resolution=1.0)
# Generate summaries
summaries = indexer.generate_community_summaries(communities)
# Print community summaries
for comm_id, summary in summaries.items():
print(f"\n{'='*60}")
print(f"Community {comm_id}: {summary['title']}")
print(f"{'='*60}")
print(summary['summary'])
print(f"\nKey Entities: {', '.join(summary['key_entities'])}")
# Save graph
indexer.save_graph("knowledge_graph.graphml")
This implementation demonstrates the core GraphRAG indexing pipeline. The extract_entities_relationships method uses structured prompts with JSON response format to ensure consistent extraction. The build_knowledge_graph method processes documents in chunks, merging duplicate entities and building the graph structure. Community detection uses the Louvain algorithm, which identifies densely connected clusters through modularity optimization. The generate_community_summaries method creates natural language descriptions of each community, enabling global query answering.
Hierarchical Community Structure
Community detection algorithms like Louvain produce hierarchical clusters at multiple resolution levels. Level 0 communities represent the highest-level themes (3-10 communities for typical datasets), capturing broad semantic categories. Level 1 communities provide more granular topic clusters (20-50 communities), while Level 2 identifies specific concepts and subcategories (100-200 communities).
For a podcast transcript dataset with 1 million tokens, GraphRAG might identify Level 0 communities like “Technology Innovation”, “Business Strategy”, and “Social Impact”. Level 1 communities under Technology Innovation might include “Artificial Intelligence”, “Cloud Computing”, and “Cybersecurity”. Level 2 communities under Artificial Intelligence might cover “Large Language Models”, “Computer Vision”, and “Reinforcement Learning”. This hierarchical structure enables queries at different granularities from broad themes to specific details.
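NetworkX exposes the intermediate Louvain levels directly, which offers one way to approximate this hierarchy. The sketch below assumes the indexer.graph built by the GraphRAGIndexer shown earlier; note that louvain_partitions yields the finest partition first.
# Sketch: recover a community hierarchy from the intermediate Louvain levels.
# Assumes `indexer.graph` is the NetworkX graph built by GraphRAGIndexer.
from networkx.algorithms import community

levels = list(community.louvain_partitions(indexer.graph, seed=42))
# louvain_partitions yields the finest partition first; reverse so that
# level 0 is the coarsest, matching "Level 0 = global themes" above
for level, partition in enumerate(reversed(levels)):
    print(f"Level {level}: {len(partition)} communities")
    # Each partition is a list of node sets; each set can be summarized
    # with the same generate_community_summaries pattern shown earlier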
Query Types: Local vs Global
GraphRAG supports two fundamentally different query patterns that traditional RAG cannot handle effectively. Local queries seek specific facts about particular entities or relationships, similar to traditional RAG but enhanced by graph traversal. Global queries ask about dataset-wide themes, trends, or comprehensive summaries that require synthesizing information across the entire corpus.
graph TD
A[Query] --> B{Classify Query Type}
B -->|Local Query| C[Entity Identification]
B -->|Global Query| D[Community Summary Retrieval]
C --> C1[Find Entities in Graph]
C1 --> C2[Traverse Relationships]
C2 --> C3[Multi-Hop Navigation]
C3 --> C4[Collect Connected Information]
C4 --> E[Generate Answer with Context]
D --> D1[Identify Relevant Communities]
D1 --> D2[Retrieve Level 0 Summaries]
D2 --> D3[Drill Down to Level 1/2]
D3 --> D4[Aggregate Insights]
D4 --> F[Generate Comprehensive Answer]
G[Local Query Examples] --> G1[What has Entity X done?]
G --> G2[How is A related to B?]
G --> G3[Who works for Company Y?]
H[Global Query Examples] --> H1[What are main themes?]
H --> H2[Summarize the dataset]
H --> H3[What trends emerged?]
style C fill:#e1f5ff
style D fill:#fff4e1
style E fill:#e8f5e9
style F fill:#f3e5f5
Local Query Implementation
Local queries start by identifying entities mentioned in the query, then traverse the graph to find connected information. A query like “What partnerships has Microsoft announced?” would identify “Microsoft” as an entity, find all edges with relationship type “PARTNERSHIP”, and retrieve descriptions of connected entities and the relationships themselves.
Multi-hop queries extend this pattern by traversing multiple edges. “What technologies do companies that partner with Microsoft use?” would first find Microsoft’s partners, then traverse from each partner to their associated technologies. This pattern enables answering complex questions that require synthesis across multiple relationship hops.
Here is a Node.js implementation for local query processing:
import { AzureOpenAI } from 'openai';
import neo4j, { Driver } from 'neo4j-driver';
interface Entity {
name: string;
type: string;
description: string;
}
interface Relationship {
source: string;
target: string;
type: string;
description: string;
}
class GraphRAGQuery {
private openaiClient: AzureOpenAI;
private neo4jDriver: Driver;
private deploymentName: string;
constructor(
openaiEndpoint: string,
openaiKey: string,
neo4jUri: string,
neo4jUser: string,
neo4jPassword: string,
deploymentName: string = 'gpt-4o'
) {
this.openaiClient = new AzureOpenAI({
endpoint: openaiEndpoint,
apiKey: openaiKey,
apiVersion: '2024-02-15-preview'
});
this.neo4jDriver = neo4j.driver(
neo4jUri,
neo4j.auth.basic(neo4jUser, neo4jPassword)
);
this.deploymentName = deploymentName;
}
async localQuery(query: string, maxHops: number = 2): Promise<string> {
console.log(`\nProcessing local query: ${query}`);
// Step 1: Extract entities from query
const entities = await this.extractQueryEntities(query);
console.log(`Identified entities:`, entities);
if (entities.length === 0) {
return "No entities found in query. Please try a more specific question.";
}
// Step 2: Find entities in graph and traverse relationships
const graphContext = await this.traverseGraph(entities, maxHops);
console.log(`Retrieved ${graphContext.entities.length} entities and ${graphContext.relationships.length} relationships`);
// Step 3: Generate answer using graph context
const answer = await this.generateAnswer(query, graphContext);
return answer;
}
private async extractQueryEntities(query: string): Promise<string[]> {
    const extractionPrompt = `
Extract all entity names mentioned in this query.
Return a JSON object with an "entities" array, nothing else.
Query: ${query}
Example response: {"entities": ["Microsoft", "Azure", "OpenAI"]}
`;
const response = await this.openaiClient.chat.completions.create({
model: this.deploymentName,
messages: [
{ role: 'system', content: 'You extract entity names and return JSON arrays.' },
{ role: 'user', content: extractionPrompt }
],
temperature: 0.1,
response_format: { type: 'json_object' }
});
const result = JSON.parse(response.choices[0].message.content ?? '{}');
return result.entities || [];
}
private async traverseGraph(
entities: string[],
maxHops: number
): Promise<{ entities: Entity[], relationships: Relationship[] }> {
const session = this.neo4jDriver.session();
try {
// Cypher query to find entities and traverse relationships
const cypherQuery = `
MATCH (e:Entity)
WHERE e.name IN $entityNames
CALL {
WITH e
MATCH path = (e)-[r*1..${maxHops}]-(connected)
RETURN e as entity, relationships(path) as rels, nodes(path) as pathNodes
}
RETURN DISTINCT
entity.name as entityName,
entity.type as entityType,
entity.description as entityDescription,
[rel in rels | {
source: startNode(rel).name,
target: endNode(rel).name,
type: type(rel),
description: rel.description
}] as relationships,
[node in pathNodes | {
name: node.name,
type: node.type,
description: node.description
}] as connectedEntities
LIMIT 100
`;
const result = await session.run(cypherQuery, {
entityNames: entities
});
const allEntities = new Map<string, Entity>();
const allRelationships: Relationship[] = [];
for (const record of result.records) {
// Add main entity
const entityName = record.get('entityName');
if (!allEntities.has(entityName)) {
allEntities.set(entityName, {
name: entityName,
type: record.get('entityType'),
description: record.get('entityDescription')
});
}
// Add connected entities
const connected = record.get('connectedEntities') || [];
for (const node of connected) {
if (!allEntities.has(node.name)) {
allEntities.set(node.name, node);
}
}
// Add relationships
const rels = record.get('relationships') || [];
allRelationships.push(...rels);
}
return {
entities: Array.from(allEntities.values()),
relationships: allRelationships
};
} finally {
await session.close();
}
}
private async generateAnswer(
query: string,
context: { entities: Entity[], relationships: Relationship[] }
): Promise<string> {
// Format context for LLM
const entityContext = context.entities
.map(e => `- ${e.name} (${e.type}): ${e.description}`)
.join('\n');
const relationshipContext = context.relationships
.map(r => `- ${r.source} ${r.type} ${r.target}: ${r.description}`)
.join('\n');
const answerPrompt = `
You are answering a question using information from a knowledge graph.
QUESTION:
${query}
ENTITIES IN GRAPH:
${entityContext}
RELATIONSHIPS IN GRAPH:
${relationshipContext}
Provide a comprehensive answer to the question using the graph information above.
Include specific details from the entities and relationships.
If the graph doesn't contain enough information to answer fully, acknowledge this.
`;
const response = await this.openaiClient.chat.completions.create({
model: this.deploymentName,
messages: [
{ role: 'system', content: 'You answer questions using knowledge graph information.' },
{ role: 'user', content: answerPrompt }
],
temperature: 0.3
});
return response.choices[0].message.content ?? '';
}
async globalQuery(query: string): Promise<string> {
console.log(`\nProcessing global query: ${query}`);
// Step 1: Retrieve community summaries
const summaries = await this.retrieveCommunitySummaries();
console.log(`Retrieved ${summaries.length} community summaries`);
// Step 2: Generate answer from summaries
const answer = await this.generateGlobalAnswer(query, summaries);
return answer;
}
private async retrieveCommunitySummaries(): Promise<any[]> {
const session = this.neo4jDriver.session();
try {
// Retrieve all community summaries
const cypherQuery = `
MATCH (c:Community)
RETURN c.id as communityId,
c.title as title,
c.summary as summary,
c.level as level,
c.key_entities as keyEntities
ORDER BY c.level, c.id
`;
const result = await session.run(cypherQuery);
return result.records.map(record => ({
communityId: record.get('communityId'),
title: record.get('title'),
summary: record.get('summary'),
level: record.get('level'),
keyEntities: record.get('keyEntities')
}));
} finally {
await session.close();
}
}
private async generateGlobalAnswer(
query: string,
summaries: any[]
): Promise<string> {
// Format summaries for LLM
const summaryContext = summaries
.map(s => `\nCommunity: ${s.title}\n${s.summary}`)
.join('\n---');
const answerPrompt = `
You are answering a high-level question about a dataset using community summaries from a knowledge graph.
QUESTION:
${query}
COMMUNITY SUMMARIES:
${summaryContext}
Provide a comprehensive answer that synthesizes insights from the community summaries.
Organize your answer with clear themes and supporting details.
`;
const response = await this.openaiClient.chat.completions.create({
model: this.deploymentName,
messages: [
{ role: 'system', content: 'You answer high-level questions by synthesizing community summaries.' },
{ role: 'user', content: answerPrompt }
],
temperature: 0.5
});
return response.choices[0].message.content ?? '';
}
async close(): Promise<void> {
await this.neo4jDriver.close();
}
}
// Usage example
const graphRAG = new GraphRAGQuery(
'https://your-openai.openai.azure.com',
'your-openai-key',
'neo4j://localhost:7687',
'neo4j',
'password',
'gpt-4o'
);
// Local query example
const localAnswer = await graphRAG.localQuery(
  'What partnerships has Microsoft announced with AI companies?',
  2  // maxHops
);
console.log('\nLocal Query Answer:');
console.log(localAnswer);
// Global query example
const globalAnswer = await graphRAG.globalQuery(
'What are the main themes and trends in the technology industry?'
);
console.log('\nGlobal Query Answer:');
console.log(globalAnswer);
await graphRAG.close();
This implementation uses Neo4j as the graph database, which provides native support for graph traversal through Cypher queries. The traverseGraph method finds entities matching the query and explores relationships up to maxHops away, capturing both direct and indirect connections. For global queries, the system retrieves pre-computed community summaries rather than traversing the graph, enabling efficient answering of dataset-wide questions.
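For prototyping without a graph database, the same traversal pattern can be approximated on the in-memory NetworkX graph from the indexing example. The sketch below assumes the GraphRAGIndexer instance built earlier (including its name-to-ID map) and uses nx.ego_graph to collect everything within max_hops of a seed entity:
# Sketch: multi-hop local retrieval on the in-memory NetworkX graph,
# approximating the Neo4j traversal above for prototyping.
# Assumes `indexer` is the GraphRAGIndexer built in the indexing example.
import networkx as nx

def local_context(indexer, entity_name: str, max_hops: int = 2):
    """Collect entity and relationship descriptions within max_hops
    of a named entity."""
    seed = indexer._name_to_id.get(entity_name.lower())
    if seed is None or seed not in indexer.graph:
        return [], []
    # ego_graph returns the subgraph induced by all nodes within the radius
    subgraph = nx.ego_graph(indexer.graph, seed, radius=max_hops)
    entities = [
        f"{data['name']} ({data['type']}): {data['description']}"
        for _, data in subgraph.nodes(data=True)
    ]
    relationships = [
        f"{subgraph.nodes[u]['name']} {data.get('type', 'RELATES_TO')} "
        f"{subgraph.nodes[v]['name']}: {data.get('description', '')}"
        for u, v, data in subgraph.edges(data=True)
    ]
    return entities, relationships

# The formatted strings drop into the same answer-generation prompt
# pattern used by generateAnswer above.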
Production Implementation with Azure
Deploying GraphRAG in production requires addressing indexing costs, query latency, and infrastructure complexity. The indexing phase typically costs $20-50 per million tokens processed depending on the model used (GPT-4 vs GPT-3.5-turbo). For a 10 million token dataset, initial indexing might cost $200-500 and take 2-4 hours. This upfront investment enables much more capable querying than traditional RAG.
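The arithmetic is worth sanity-checking before committing to a full index. The helper below is a back-of-the-envelope estimate using the per-million-token range quoted above, not current price quotes:
# Back-of-the-envelope indexing cost check using the ranges quoted above.
def estimate_indexing_cost(corpus_tokens: int, low_rate: float = 20.0,
                           high_rate: float = 50.0):
    """Return (low, high) USD estimates at low_rate-high_rate per 1M tokens."""
    millions = corpus_tokens / 1_000_000
    return millions * low_rate, millions * high_rate

low, high = estimate_indexing_cost(10_000_000)
print(f"Estimated indexing cost: ${low:,.0f} - ${high:,.0f}")  # $200 - $500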
Microsoft’s implementation provides several cost optimization strategies. LazyGraphRAG reduces costs by eliminating LLM-based knowledge graph extraction, instead using smaller local models to extract nouns and building community structures based on co-occurrence patterns. Community summaries are generated dynamically during queries rather than pre-computed. This approach reduces indexing costs by 70-80% while sacrificing some accuracy on complex queries.
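A minimal sketch of the LazyGraphRAG idea follows, using spaCy's small English model as an assumed stand-in for the "smaller local models": noun phrases are linked when they co-occur in a chunk, and no LLM is called at index time.
# Sketch of the LazyGraphRAG indexing idea: no LLM calls at index time.
# spaCy's small English model is an assumed stand-in for the "smaller
# local models" used to extract nouns.
import itertools
import networkx as nx
import spacy

nlp = spacy.load("en_core_web_sm")

def build_cooccurrence_graph(chunks):
    """Link noun phrases that appear in the same chunk; edge weights count
    co-occurrences and later drive community detection."""
    graph = nx.Graph()
    for chunk in chunks:
        nouns = {np.text.lower() for np in nlp(chunk).noun_chunks}
        for a, b in itertools.combinations(sorted(nouns), 2):
            if graph.has_edge(a, b):
                graph[a][b]["weight"] += 1
            else:
                graph.add_edge(a, b, weight=1)
    return graph

# Community detection runs on this graph exactly as before; summaries are
# generated lazily, only for the communities a query actually touches.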
Here is a complete C# implementation for Azure deployment:
using Azure.AI.OpenAI;
using Azure;
using Neo4j.Driver;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;
using System.Threading.Tasks;
public class AzureGraphRAG : IAsyncDisposable
{
private readonly OpenAIClient _openAIClient;
private readonly IDriver _neo4jDriver;
private readonly string _deploymentName;
// Cost tracking
private int _totalTokensUsed;
private decimal _estimatedCost;
public AzureGraphRAG(
string openAIEndpoint,
string openAIKey,
string neo4jUri,
string neo4jUser,
string neo4jPassword,
string deploymentName = "gpt-4o")
{
_openAIClient = new OpenAIClient(
new Uri(openAIEndpoint),
new AzureKeyCredential(openAIKey)
);
_neo4jDriver = GraphDatabase.Driver(
neo4jUri,
AuthTokens.Basic(neo4jUser, neo4jPassword)
);
_deploymentName = deploymentName;
_totalTokensUsed = 0;
_estimatedCost = 0m;
}
public async Task<IndexingResult> IndexDocuments(
List<Document> documents,
int chunkSize = 1000,
int chunkOverlap = 200)
{
var startTime = DateTime.UtcNow;
var stats = new IndexingStats();
Console.WriteLine($"Starting indexing of {documents.Count} documents...");
await using var session = _neo4jDriver.AsyncSession();
foreach (var (doc, docIndex) in documents.Select((d, i) => (d, i)))
{
Console.WriteLine($"Processing document {docIndex + 1}/{documents.Count}...");
// Chunk document
var chunks = ChunkText(doc.Content, chunkSize, chunkOverlap);
foreach (var (chunk, chunkIndex) in chunks.Select((c, i) => (c, i)))
{
try
{
// Extract entities and relationships
var (entities, relationships) = await ExtractEntitiesAndRelationships(
chunk,
$"{doc.Id}_chunk_{chunkIndex}"
);
// Store in Neo4j
await StoreInGraph(session, entities, relationships);
stats.EntitiesExtracted += entities.Count;
stats.RelationshipsExtracted += relationships.Count;
stats.ChunksProcessed++;
}
catch (Exception ex)
{
Console.WriteLine($"Error processing chunk: {ex.Message}");
stats.ErrorsEncountered++;
}
}
}
// Detect communities
Console.WriteLine("\nDetecting communities...");
var communityCount = await DetectCommunities(session);
stats.CommunitiesDetected = communityCount;
// Generate community summaries
Console.WriteLine("Generating community summaries...");
await GenerateCommunitySummaries(session);
var elapsed = DateTime.UtcNow - startTime;
return new IndexingResult
{
Stats = stats,
Duration = elapsed,
TotalTokensUsed = _totalTokensUsed,
EstimatedCost = _estimatedCost
};
}
private async Task<(List<Entity>, List<Relationship>)> ExtractEntitiesAndRelationships(
string text,
string chunkId)
{
var extractionPrompt = $@"
Extract entities and relationships from the text below.
Entity types: Person, Organization, Location, Technology, Product, Concept, Event
Relationship types: WORKS_FOR, LOCATED_IN, USES, PART_OF, RELATES_TO, DEVELOPED_BY
TEXT:
{text}
Return JSON:
{{
""entities"": [{{""name"": """", ""type"": """", ""description"": """"}}],
""relationships"": [{{""source"": """", ""target"": """", ""type"": """", ""description"": """"}}]
}}";
var chatCompletionsOptions = new ChatCompletionsOptions
{
DeploymentName = _deploymentName,
Messages =
{
new ChatRequestSystemMessage("You extract structured information and return JSON."),
new ChatRequestUserMessage(extractionPrompt)
},
Temperature = 0.1f,
ResponseFormat = ChatCompletionsResponseFormat.JsonObject
};
var response = await _openAIClient.GetChatCompletionsAsync(chatCompletionsOptions);
var result = response.Value;
// Track token usage
_totalTokensUsed += result.Usage.TotalTokens;
_estimatedCost += CalculateCost(result.Usage.TotalTokens);
        var content = result.Choices[0].Message.Content;
        // JSON keys are lowercase; match PascalCase properties case-insensitively
        var extraction = JsonSerializer.Deserialize<ExtractionResult>(
            content,
            new JsonSerializerOptions { PropertyNameCaseInsensitive = true });
var entities = extraction.Entities.Select(e => new Entity
{
Name = e.Name,
Type = e.Type,
Description = e.Description,
SourceChunk = chunkId
}).ToList();
var relationships = extraction.Relationships.Select(r => new Relationship
{
Source = r.Source,
Target = r.Target,
Type = r.Type,
Description = r.Description,
SourceChunk = chunkId
}).ToList();
return (entities, relationships);
}
private async Task StoreInGraph(
IAsyncSession session,
List<Entity> entities,
List<Relationship> relationships)
{
// Store entities
foreach (var entity in entities)
{
await session.ExecuteWriteAsync(async tx =>
{
var query = @"
MERGE (e:Entity {name: $name})
ON CREATE SET e.type = $type, e.description = $description, e.source_chunks = [$sourceChunk]
ON MATCH SET e.source_chunks = e.source_chunks + $sourceChunk";
await tx.RunAsync(query, new
{
name = entity.Name,
type = entity.Type,
description = entity.Description,
sourceChunk = entity.SourceChunk
});
});
}
// Store relationships
foreach (var rel in relationships)
{
await session.ExecuteWriteAsync(async tx =>
{
var query = $@"
MATCH (source:Entity {{name: $source}})
MATCH (target:Entity {{name: $target}})
MERGE (source)-[r:{rel.Type}]->(target)
ON CREATE SET r.description = $description, r.source_chunks = [$sourceChunk]
ON MATCH SET r.source_chunks = r.source_chunks + $sourceChunk";
await tx.RunAsync(query, new
{
source = rel.Source,
target = rel.Target,
description = rel.Description,
sourceChunk = rel.SourceChunk
});
});
}
}
private async Task<int> DetectCommunities(IAsyncSession session)
{
        // Run Louvain via the Neo4j Graph Data Science library.
        // Assumes an in-memory projection named 'myGraph' already exists,
        // created with: CALL gds.graph.project('myGraph', 'Entity', '*')
        var query = @"
            CALL gds.louvain.write('myGraph', {
                writeProperty: 'community',
                includeIntermediateCommunities: true
            })
            YIELD communityCount
            RETURN communityCount";
        // gds.louvain.write sets node properties, so use a write transaction
        var result = await session.ExecuteWriteAsync(async tx =>
{
var cursor = await tx.RunAsync(query);
var record = await cursor.SingleAsync();
return record["communityCount"].As<int>();
});
return result;
}
private async Task GenerateCommunitySummaries(IAsyncSession session)
{
// Get communities
var communities = await session.ExecuteReadAsync(async tx =>
{
var query = @"
MATCH (e:Entity)
WITH e.community as communityId, collect(e) as entities
RETURN communityId, entities
LIMIT 50";
var cursor = await tx.RunAsync(query);
var records = await cursor.ToListAsync();
return records.Select(r => new
{
CommunityId = r["communityId"].As<int>(),
Entities = r["entities"].As<List<INode>>()
}).ToList();
});
// Generate summary for each community
foreach (var community in communities)
{
var entityDescriptions = string.Join("\n",
community.Entities.Take(50).Select(e =>
$"- {e["name"].As<string>()} ({e["type"].As<string>()}): {e["description"].As<string>()}"
)
);
var summaryPrompt = $@"
Analyze these entities from a semantic community and provide:
1. Title (5-10 words)
2. Summary (2-3 paragraphs)
3. Key entities (top 5)
ENTITIES:
{entityDescriptions}
Return JSON:
{{""title"": """", ""summary"": """", ""key_entities"": []}}";
var chatOptions = new ChatCompletionsOptions
{
DeploymentName = _deploymentName,
Messages =
{
new ChatRequestSystemMessage("You analyze entity clusters."),
new ChatRequestUserMessage(summaryPrompt)
},
Temperature = 0.3f,
ResponseFormat = ChatCompletionsResponseFormat.JsonObject
};
var response = await _openAIClient.GetChatCompletionsAsync(chatOptions);
            // Case-insensitive so lowercase JSON keys bind to PascalCase properties
            var summaryData = JsonSerializer.Deserialize<CommunitySummary>(
                response.Value.Choices[0].Message.Content,
                new JsonSerializerOptions { PropertyNameCaseInsensitive = true });
// Store summary
await session.ExecuteWriteAsync(async tx =>
{
var query = @"
MERGE (c:Community {id: $communityId})
SET c.title = $title,
c.summary = $summary,
c.key_entities = $keyEntities";
await tx.RunAsync(query, new
{
communityId = community.CommunityId,
title = summaryData.Title,
summary = summaryData.Summary,
keyEntities = summaryData.KeyEntities
});
});
}
}
private List<string> ChunkText(string text, int size, int overlap)
{
var chunks = new List<string>();
var start = 0;
while (start < text.Length)
{
var end = Math.Min(start + size, text.Length);
chunks.Add(text.Substring(start, end - start));
start += size - overlap;
}
return chunks;
}
private decimal CalculateCost(int tokens)
{
        // GPT-4o pricing: $5 per 1M input tokens, $15 per 1M output tokens
        // Simplified blended rate assuming a 60/40 input/output split:
        // 0.6 * $5 + 0.4 * $15 = $9 per 1M tokens
        return tokens * 0.000009m;
}
public async ValueTask DisposeAsync()
{
        if (_neo4jDriver != null)
        {
            await _neo4jDriver.DisposeAsync();
        }
}
}
// Supporting classes
public class Document
{
public string Id { get; set; }
public string Content { get; set; }
}
public class Entity
{
public string Name { get; set; }
public string Type { get; set; }
public string Description { get; set; }
public string SourceChunk { get; set; }
}
public class Relationship
{
public string Source { get; set; }
public string Target { get; set; }
public string Type { get; set; }
public string Description { get; set; }
public string SourceChunk { get; set; }
}
public class ExtractionResult
{
public List<EntityData> Entities { get; set; }
public List<RelationshipData> Relationships { get; set; }
}
public class EntityData
{
public string Name { get; set; }
public string Type { get; set; }
public string Description { get; set; }
}
public class RelationshipData
{
public string Source { get; set; }
public string Target { get; set; }
public string Type { get; set; }
public string Description { get; set; }
}
public class CommunitySummary
{
public string Title { get; set; }
public string Summary { get; set; }
public List<string> KeyEntities { get; set; }
}
public class IndexingStats
{
public int ChunksProcessed { get; set; }
public int EntitiesExtracted { get; set; }
public int RelationshipsExtracted { get; set; }
public int CommunitiesDetected { get; set; }
public int ErrorsEncountered { get; set; }
}
public class IndexingResult
{
public IndexingStats Stats { get; set; }
public TimeSpan Duration { get; set; }
public int TotalTokensUsed { get; set; }
public decimal EstimatedCost { get; set; }
}
// Usage
var graphRAG = new AzureGraphRAG(
"https://your-openai.openai.azure.com",
"your-key",
"neo4j://localhost:7687",
"neo4j",
"password"
);
var documents = new List<Document>
{
new Document
{
Id = "doc1",
Content = "Your document content here..."
}
};
var result = await graphRAG.IndexDocuments(documents);
Console.WriteLine($"\nIndexing Complete:");
Console.WriteLine($" Duration: {result.Duration.TotalMinutes:F1} minutes");
Console.WriteLine($" Entities: {result.Stats.EntitiesExtracted}");
Console.WriteLine($" Relationships: {result.Stats.RelationshipsExtracted}");
Console.WriteLine($" Communities: {result.Stats.CommunitiesDetected}");
Console.WriteLine($" Tokens Used: {result.TotalTokensUsed:N0}");
Console.WriteLine($" Estimated Cost: ${result.EstimatedCost:F2}");This production implementation includes comprehensive cost tracking, error handling, and batch processing. The CalculateCost method estimates spending based on token usage, enabling budget monitoring during indexing. The implementation uses Neo4j’s native Louvain algorithm through the Graph Data Science library for efficient community detection at scale.
Performance Characteristics and Optimization
GraphRAG indexing costs are substantial but front-loaded, while query costs remain comparable to traditional RAG. A 1 million token dataset might cost $20-50 to index but only $0.01-0.05 per query depending on whether it is local or global. For applications with high query volume relative to indexing frequency, GraphRAG economics are favorable despite higher upfront costs.
Query latency depends primarily on graph database performance and community summary retrieval speed. Local queries with 2-hop traversal typically complete in 100-300ms on properly indexed graphs with millions of entities. Global queries access pre-computed summaries and complete in 50-150ms, faster than local queries despite synthesizing more information. This performance enables interactive applications where GraphRAG was previously considered too slow.
Optimization strategies focus on three areas. First, caching frequently accessed subgraphs and community summaries reduces database load by 60-80%. Second, hierarchical querying starts with high-level community summaries and drills down only when needed, reducing token usage by 40-50% compared to always retrieving full context. Third, hybrid approaches combine GraphRAG for complex queries with traditional RAG for simple lookups, optimizing cost and performance based on query complexity.
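The first two strategies are straightforward to apply together. The sketch below caches summaries with functools.lru_cache and drills down the hierarchy only for communities that match the query; fetch_summary_from_db and the hierarchy structure are hypothetical stand-ins for the Neo4j lookups shown earlier:
# Sketch: cache community summaries and drill down the hierarchy lazily.
# fetch_summary_from_db and the hierarchy structure are hypothetical
# stand-ins for the Neo4j lookups shown earlier.
from functools import lru_cache

@lru_cache(maxsize=1024)
def get_community_summary(community_id: int) -> str:
    # Cached: repeated global queries skip the database entirely
    return fetch_summary_from_db(community_id)

def gather_context(query_topics: set, hierarchy: dict) -> list:
    """Start at level 0 and expand only communities relevant to the query."""
    context = []
    for comm in hierarchy[0]:  # level 0: global themes
        context.append(get_community_summary(comm["id"]))
        if query_topics & set(comm["key_entities"]):
            # Drill down only where a high-level community matches
            for child in comm["children"]:
                context.append(get_community_summary(child["id"]))
    return context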
When to Use GraphRAG vs Traditional RAG
GraphRAG excels at three specific scenarios that traditional RAG handles poorly. First, multi-hop reasoning questions that require connecting information across multiple documents through shared entities. Second, global queries asking about themes, trends, or comprehensive summaries of entire datasets. Third, exploratory analysis where users need to understand information structure before formulating specific questions.
Traditional RAG remains superior for simple fact retrieval from clearly defined sources, questions with obvious keywords that enable effective vector search, and scenarios where indexing costs cannot be justified. A hybrid architecture using both approaches provides optimal results, routing queries to GraphRAG when they require relationship understanding or global synthesis, and to traditional RAG for straightforward lookups.
The decision often comes down to query complexity distribution. If 80% of queries are simple lookups and 20% require complex reasoning, implementing GraphRAG for the entire corpus may not be cost-effective. However, if 40% or more of queries benefit from relationship understanding, GraphRAG typically justifies its costs through substantially improved answer quality and user satisfaction.
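One way to implement that routing is a cheap classification step in front of both pipelines. In the sketch below, traditional_rag, graph_local_query, and graph_global_query are assumed to be the retrieval functions described in this article:
# Sketch: route queries by complexity. traditional_rag, graph_local_query,
# and graph_global_query are assumed to be the retrieval functions
# described in this article.
GLOBAL_CUES = ("main themes", "summarize", "overall", "trends", "across")

def route_query(query: str, llm_classify=None) -> str:
    q = query.lower()
    if any(cue in q for cue in GLOBAL_CUES):
        return graph_global_query(query)   # dataset-wide synthesis
    if llm_classify and llm_classify(query) == "multi_hop":
        return graph_local_query(query)    # relationship traversal
    return traditional_rag(query)          # simple fact lookup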
Key Takeaways
GraphRAG represents a fundamental architecture shift from document similarity to relationship understanding. By constructing knowledge graphs with explicit entities, relationships, and hierarchical communities, it enables answering complex questions that traditional RAG cannot handle. Multi-hop reasoning accuracy improves from 23% with traditional RAG to 87% with GraphRAG, while global queries become answerable for the first time.
Production implementation requires careful consideration of indexing costs, typically $20-50 per million tokens, and infrastructure complexity with graph databases like Neo4j. However, query costs remain comparable to traditional RAG at $0.01-0.05 per query, making GraphRAG economically viable for applications with high query volume.
The optimal strategy for most organizations is a hybrid architecture that routes simple queries to traditional RAG and complex queries to GraphRAG. This approach balances cost, performance, and capability, delivering exceptional results on complex questions while maintaining efficiency on straightforward lookups.
The next part examines production deployment patterns, covering infrastructure architecture, scaling strategies, monitoring approaches, and operational best practices for running vector database and GraphRAG systems in production environments.
References
- Microsoft – “GraphRAG Documentation”
- Microsoft Research – “GraphRAG: Unlocking LLM discovery on narrative private data”
- Microsoft Research – “GraphRAG: New tool for complex data discovery now on GitHub”
- GitHub – “microsoft/graphrag”
- IBM – “What is GraphRAG?”
- RAG About It – “The GraphRAG Revolution”
- Jorge Arango – “Seeing the Forest: Using Graph RAG for Information Architecture”
- MarkTechPost – “Microsoft Research Introduces GraphRAG”
- RAGFlow – “The Rise and Evolution of RAG in 2024”
