Re: SQL Property Graph Queries (SQL/PGQ) - Mailing list pgsql-hackers
| From | Henson Choi |
|---|---|
| Subject | Re: SQL Property Graph Queries (SQL/PGQ) |
| Date | |
| Msg-id | CAAAe_zAJz1nP3h2vHNi+wgN_cmzRcNpSHrUj8tRyP+VPexUwcg@mail.gmail.com Whole thread Raw |
| In response to | Re: SQL Property Graph Queries (SQL/PGQ) (Henson Choi <assam258@gmail.com>) |
| Responses |
Re: SQL Property Graph Queries (SQL/PGQ)
|
| List | pgsql-hackers |
Hi hackers,
I think it may be a bit early for this, but I wanted to share this design
document now so we can revisit and discuss when the time is right.
===================================================================
Variable Length Edge (VLE) Implementation Strategies for PostgreSQL
===================================================================
1. Overview
-----------
This document describes two implementation strategies for Variable
Length Edge (VLE) traversal in PostgreSQL-based graph databases.
VLE Query Example:
MATCH (a)-[*1..5]->(b) RETURN a, b
MATCH p = (a)-[:knows*]->(b) RETURN p
Two implementation approaches:
1. Unoptimized: Query rewriting to WITH RECURSIVE (CTE)
2. Optimized: Custom executor node with DFS algorithm
2. Unoptimized Approach: WITH RECURSIVE Rewriting
-------------------------------------------------
2.1 Query Transformation
Cypher query:
MATCH (a:person)-[:knows*1..3]->(b) WHERE a.name = 'Alice' RETURN b
Rewritten SQL using WITH RECURSIVE:
WITH RECURSIVE vle_paths AS (
-- Base case: start vertices
SELECT v.id as start_vid,
v.id as end_vid,
ARRAY[] as edge_ids,
ARRAY[] as edges,
0 as depth
FROM person v
WHERE name = 'Alice'
UNION ALL
-- Recursive case: expand one hop
SELECT p.start_vid,
e.end_id,
p.edge_ids || e.id,
p.edges || e,
p.depth + 1
FROM vle_paths p
JOIN knows e ON p.end_vid = e.start_id
WHERE p.depth < 3
AND e.id <> ALL(p.edge_ids) -- Edge uniqueness check
)
SEARCH DEPTH FIRST BY end_vid SET ordercol
SELECT b.*, edges
FROM vle_paths
JOIN person b ON b.id = end_vid
WHERE depth BETWEEN 1 AND 3;
2.2 Execution Plan
Hash Join (b vertex lookup)
Hash Cond: (end_vid = b.id)
-> CTE Scan on vle_paths
Filter: (depth >= 1 AND depth <= 3)
Sort Key: ordercol -- SEARCH DEPTH FIRST ordering
CTE vle_paths
Recursive Union
Seq Scan on person v
Filter: (name = 'Alice')
Hash Join
Hash Cond: (p.end_vid = e.start_id)
Filter: (e.id <> ALL(p.edge_ids)) AND (p.depth < 3)
WorkTable Scan on vle_paths p
Hash
Seq Scan on knows e
-> Hash
-> Seq Scan on person b
2.3 Problems
┌─────────────────────┬──────────────────────────────────────────┐
│ Problem │ Description │
├─────────────────────┼──────────────────────────────────────────┤
│ Execution Model │ BFS execution (set-based iteration) │
│ │ SEARCH DEPTH FIRST changes output order │
│ │ only, not execution or memory model │
├─────────────────────┼──────────────────────────────────────────┤
│ No Backtracking │ CTE stores all intermediate results │
│ │ Cannot restore previous state │
├─────────────────────┼──────────────────────────────────────────┤
│ Memory Usage │ O(total_paths) - all partial paths │
│ │ Exponential growth with depth │
├─────────────────────┼──────────────────────────────────────────┤
│ Edge Uniqueness │ e.id <> ALL(p.edge_ids) │
│ │ O(path_length) comparison per row │
├─────────────────────┼──────────────────────────────────────────┤
│ No Streaming │ Must complete all iterations before │
│ │ returning results │
├─────────────────────┼──────────────────────────────────────────┤
│ Array Operations │ p.edge_ids || e.id creates new array │
│ │ O(n) copy per iteration │
└─────────────────────┴──────────────────────────────────────────┘
2.4 Memory Analysis
Graph: Tree with branching factor B=10, max depth D=5
Iteration 1: 10 rows (depth 1)
Iteration 2: 100 rows (depth 2)
Iteration 3: 1,000 rows (depth 3)
Iteration 4: 10,000 rows (depth 4)
Iteration 5: 100,000 rows (depth 5)
Total rows in WorkTable: 111,110
Each row contains: start_vid + end_vid + edge_ids array + edges array
Memory = 111,110 * (8 + 8 + avg_path_len * 8 + avg_path_len * edge_size)
3. Optimized Approach: Custom Executor Node
-------------------------------------------
3.1 Design Principle
Instead of query rewriting, implement a dedicated executor node that:
- Performs true DFS traversal with backtracking
- Maintains only current path in memory
- Uses PlanState stack for depth-wise edge scanning
- Yields results incrementally (streaming)
3.2 Plan Node Structure
┌────────────────────────────────────────────────────────┐
│ Nested Loop (b lookup) │
└────────────────────────────────────────────────────────┘
│ │
│ outer │ inner
▼ ▼
┌──────────────────────────────┐ ┌─────────────────────────┐
│ VLEPlan Node │ │ IndexScan on person │
│ - DFS with PlanState stack │ │ (b vertex lookup) │
│ - O(depth) memory │ └─────────────────────────┘
└──────────────────────────────┘
│ │
outer │ │ inner (edge subplan)
▼ ▼
┌───────────┐ ┌───────────────────────────┐
│ IndexScan │ │ IndexScan on knows │
│ (person a)│ │ (re-scanned per vertex) │
└───────────┘ └───────────────────────────┘
Execution Flow:
Outer plan returns vertex A
│
▼
VLEPlan performs DFS:
│
├──→ Push PlanState for depth 0 (edges from A)
│ │
│ ├──→ Find edge A->B, push PlanState for depth 1
│ │ │
│ │ ├──→ Find edge B->D, push PlanState for depth 2
│ │ │ │
│ │ │ └──→ No more edges, pop PlanState
│ │ │
│ │ ├──→ Continue B's PlanState, find B->E
│ │ │
│ │ └──→ Pop PlanState (B exhausted)
│ │
│ └──→ Continue A's PlanState, find A->C
│
└──→ Yield results when depth in [min, max]
3.3 Key Concept: PlanState Stack
The executor maintains a stack of inner subplan states (PlanState),
one per depth level. Each PlanState holds the scan position for
edges from that depth's vertex.
At each depth:
- Re-scan inner subplan with current vertex as parameter
- On finding an edge, push new PlanState for next depth
- On exhausting edges, pop PlanState and backtrack
Memory: Only current path + one PlanState per depth level
3.4 Execution Plan (Optimized)
EXPLAIN MATCH (a:person)-[:knows*1..3]->(b) WHERE a.name='Alice' RETURN b
Nested Loop (b vertex lookup)
-> VLEPlan (minDepth=1 maxDepth=3 direction=RIGHT)
-> Index Scan on person a (outer: start vertices)
Index Cond: (name = 'Alice')
-> Index Scan on knows e (inner: edge subplan)
Index Cond: (start_id = $current_vertex)
-> Index Scan on person b
Index Cond: (id = vle.end_vertex)
4. Comparison Summary
---------------------
┌─────────────────────┬─────────────────────────┬─────────────────────────┐
│ Aspect │ Unoptimized (CTE) │ Optimized (Executor) │
├─────────────────────┼─────────────────────────┼─────────────────────────┤
│ Implementation │ Query rewriting │ Custom executor node │
├─────────────────────┼─────────────────────────┼─────────────────────────┤
│ Traversal Order │ BFS (set-based) │ True DFS │
├─────────────────────┼─────────────────────────┼─────────────────────────┤
│ Backtracking │ Not possible │ PlanState stack pop/push│
├─────────────────────┼─────────────────────────┼─────────────────────────┤
│ Memory Usage │ O(B^D) all paths │ O(D) current path only │
├─────────────────────┼─────────────────────────┼─────────────────────────┤
│ Edge Uniqueness │ <> ALL(array) O(n) │ Array scan O(n) │
│ │ + array copy per row │ No array copy │
├─────────────────────┼─────────────────────────┼─────────────────────────┤
│ Result Streaming │ After all iterations │ Yield as found │
├─────────────────────┼─────────────────────────┼─────────────────────────┤
│ Code Complexity │ Low (rewriting only) │ High (new executor) │
├─────────────────────┼─────────────────────────┼─────────────────────────┤
│ PostgreSQL Changes │ None │ New plan/executor node │
└─────────────────────┴─────────────────────────┴─────────────────────────┘
5. Memory Usage Example
-----------------------
Graph: Social network, avg 100 friends per person
Query: MATCH (a)-[:friend*1..4]->(b) RETURN b
Unoptimized (CTE):
Depth 1: 100 rows
Depth 2: 10,000 rows
Depth 3: 1,000,000 rows
Depth 4: 100,000,000 rows (!)
Total: ~101 million partial paths in memory
Each path: vertex IDs + edge arrays (growing)
Optimized (DFS Executor):
PlanState stack: max 4 PlanState (~1KB each)
Path arrays: max 4 edges
Total: ~10KB regardless of graph size
6. Key Techniques Summary
-------------------------
┌────┬───────────────────────────┬────────────────────────────────────────┐
│ # │ Technique │ Benefit │
├────┼───────────────────────────┼────────────────────────────────────────┤
│ 1 │ Non-Directional Handling │ Support -- (bidirectional) edges │
│ 2 │ Multi-Label Scan │ Handle label inheritance │
│ 3 │ PlanState Stack Backtrack │ True DFS with O(depth) memory │
│ 4 │ Edge Uniqueness Check │ Prevent cycles efficiently │
│ 5 │ Property Filtering │ Early pruning during scan │
│ 6 │ Direction-Aware Traversal │ Support ->, <-, -- correctly │
│ 7 │ Result Streaming │ Return results without full compute │
│ 8 │ Depth Expansion │ Continue traversal to max depth │
└────┴───────────────────────────┴────────────────────────────────────────┘
7. Conclusion
-------------
The unoptimized CTE-based approach is simple to implement (query
rewriting only) but suffers from:
- Exponential memory growth with depth
- No true DFS traversal capability
- No result streaming
- Expensive array operations per row
The optimized executor-based approach requires more implementation
effort but provides:
- O(depth) memory usage regardless of graph size
- True DFS with backtracking capability
- Streaming results as they are discovered
- Efficient PlanState-based edge scanning
- Clean integration with PostgreSQL executor framework
For production graph database systems, the optimized approach is
essential for handling real-world graph sizes and traversal depths.
This document focuses on the conceptual design. Implementation details
such as PlanState cloning vs. scan position saving, memory allocation
strategies, and index structures are left for follow-up discussion.
I think it may be a bit early for this, but I wanted to share this design
document now so we can revisit and discuss when the time is right.
===================================================================
Variable Length Edge (VLE) Implementation Strategies for PostgreSQL
===================================================================
1. Overview
-----------
This document describes two implementation strategies for Variable
Length Edge (VLE) traversal in PostgreSQL-based graph databases.
VLE Query Example:
MATCH (a)-[*1..5]->(b) RETURN a, b
MATCH p = (a)-[:knows*]->(b) RETURN p
Two implementation approaches:
1. Unoptimized: Query rewriting to WITH RECURSIVE (CTE)
2. Optimized: Custom executor node with DFS algorithm
2. Unoptimized Approach: WITH RECURSIVE Rewriting
-------------------------------------------------
2.1 Query Transformation
Cypher query:
MATCH (a:person)-[:knows*1..3]->(b) WHERE a.name = 'Alice' RETURN b
Rewritten SQL using WITH RECURSIVE:
WITH RECURSIVE vle_paths AS (
-- Base case: start vertices
SELECT v.id as start_vid,
v.id as end_vid,
ARRAY[] as edge_ids,
ARRAY[] as edges,
0 as depth
FROM person v
WHERE name = 'Alice'
UNION ALL
-- Recursive case: expand one hop
SELECT p.start_vid,
e.end_id,
p.edge_ids || e.id,
p.edges || e,
p.depth + 1
FROM vle_paths p
JOIN knows e ON p.end_vid = e.start_id
WHERE p.depth < 3
AND e.id <> ALL(p.edge_ids) -- Edge uniqueness check
)
SEARCH DEPTH FIRST BY end_vid SET ordercol
SELECT b.*, edges
FROM vle_paths
JOIN person b ON b.id = end_vid
WHERE depth BETWEEN 1 AND 3;
2.2 Execution Plan
Hash Join (b vertex lookup)
Hash Cond: (end_vid = b.id)
-> CTE Scan on vle_paths
Filter: (depth >= 1 AND depth <= 3)
Sort Key: ordercol -- SEARCH DEPTH FIRST ordering
CTE vle_paths
Recursive Union
Seq Scan on person v
Filter: (name = 'Alice')
Hash Join
Hash Cond: (p.end_vid = e.start_id)
Filter: (e.id <> ALL(p.edge_ids)) AND (p.depth < 3)
WorkTable Scan on vle_paths p
Hash
Seq Scan on knows e
-> Hash
-> Seq Scan on person b
2.3 Problems
┌─────────────────────┬──────────────────────────────────────────┐
│ Problem │ Description │
├─────────────────────┼──────────────────────────────────────────┤
│ Execution Model │ BFS execution (set-based iteration) │
│ │ SEARCH DEPTH FIRST changes output order │
│ │ only, not execution or memory model │
├─────────────────────┼──────────────────────────────────────────┤
│ No Backtracking │ CTE stores all intermediate results │
│ │ Cannot restore previous state │
├─────────────────────┼──────────────────────────────────────────┤
│ Memory Usage │ O(total_paths) - all partial paths │
│ │ Exponential growth with depth │
├─────────────────────┼──────────────────────────────────────────┤
│ Edge Uniqueness │ e.id <> ALL(p.edge_ids) │
│ │ O(path_length) comparison per row │
├─────────────────────┼──────────────────────────────────────────┤
│ No Streaming │ Must complete all iterations before │
│ │ returning results │
├─────────────────────┼──────────────────────────────────────────┤
│ Array Operations │ p.edge_ids || e.id creates new array │
│ │ O(n) copy per iteration │
└─────────────────────┴──────────────────────────────────────────┘
2.4 Memory Analysis
Graph: Tree with branching factor B=10, max depth D=5
Iteration 1: 10 rows (depth 1)
Iteration 2: 100 rows (depth 2)
Iteration 3: 1,000 rows (depth 3)
Iteration 4: 10,000 rows (depth 4)
Iteration 5: 100,000 rows (depth 5)
Total rows in WorkTable: 111,110
Each row contains: start_vid + end_vid + edge_ids array + edges array
Memory = 111,110 * (8 + 8 + avg_path_len * 8 + avg_path_len * edge_size)
3. Optimized Approach: Custom Executor Node
-------------------------------------------
3.1 Design Principle
Instead of query rewriting, implement a dedicated executor node that:
- Performs true DFS traversal with backtracking
- Maintains only current path in memory
- Uses PlanState stack for depth-wise edge scanning
- Yields results incrementally (streaming)
3.2 Plan Node Structure
┌────────────────────────────────────────────────────────┐
│ Nested Loop (b lookup) │
└────────────────────────────────────────────────────────┘
│ │
│ outer │ inner
▼ ▼
┌──────────────────────────────┐ ┌─────────────────────────┐
│ VLEPlan Node │ │ IndexScan on person │
│ - DFS with PlanState stack │ │ (b vertex lookup) │
│ - O(depth) memory │ └─────────────────────────┘
└──────────────────────────────┘
│ │
outer │ │ inner (edge subplan)
▼ ▼
┌───────────┐ ┌───────────────────────────┐
│ IndexScan │ │ IndexScan on knows │
│ (person a)│ │ (re-scanned per vertex) │
└───────────┘ └───────────────────────────┘
Execution Flow:
Outer plan returns vertex A
│
▼
VLEPlan performs DFS:
│
├──→ Push PlanState for depth 0 (edges from A)
│ │
│ ├──→ Find edge A->B, push PlanState for depth 1
│ │ │
│ │ ├──→ Find edge B->D, push PlanState for depth 2
│ │ │ │
│ │ │ └──→ No more edges, pop PlanState
│ │ │
│ │ ├──→ Continue B's PlanState, find B->E
│ │ │
│ │ └──→ Pop PlanState (B exhausted)
│ │
│ └──→ Continue A's PlanState, find A->C
│
└──→ Yield results when depth in [min, max]
3.3 Key Concept: PlanState Stack
The executor maintains a stack of inner subplan states (PlanState),
one per depth level. Each PlanState holds the scan position for
edges from that depth's vertex.
At each depth:
- Re-scan inner subplan with current vertex as parameter
- On finding an edge, push new PlanState for next depth
- On exhausting edges, pop PlanState and backtrack
Memory: Only current path + one PlanState per depth level
3.4 Execution Plan (Optimized)
EXPLAIN MATCH (a:person)-[:knows*1..3]->(b) WHERE a.name='Alice' RETURN b
Nested Loop (b vertex lookup)
-> VLEPlan (minDepth=1 maxDepth=3 direction=RIGHT)
-> Index Scan on person a (outer: start vertices)
Index Cond: (name = 'Alice')
-> Index Scan on knows e (inner: edge subplan)
Index Cond: (start_id = $current_vertex)
-> Index Scan on person b
Index Cond: (id = vle.end_vertex)
4. Comparison Summary
---------------------
┌─────────────────────┬─────────────────────────┬─────────────────────────┐
│ Aspect │ Unoptimized (CTE) │ Optimized (Executor) │
├─────────────────────┼─────────────────────────┼─────────────────────────┤
│ Implementation │ Query rewriting │ Custom executor node │
├─────────────────────┼─────────────────────────┼─────────────────────────┤
│ Traversal Order │ BFS (set-based) │ True DFS │
├─────────────────────┼─────────────────────────┼─────────────────────────┤
│ Backtracking │ Not possible │ PlanState stack pop/push│
├─────────────────────┼─────────────────────────┼─────────────────────────┤
│ Memory Usage │ O(B^D) all paths │ O(D) current path only │
├─────────────────────┼─────────────────────────┼─────────────────────────┤
│ Edge Uniqueness │ <> ALL(array) O(n) │ Array scan O(n) │
│ │ + array copy per row │ No array copy │
├─────────────────────┼─────────────────────────┼─────────────────────────┤
│ Result Streaming │ After all iterations │ Yield as found │
├─────────────────────┼─────────────────────────┼─────────────────────────┤
│ Code Complexity │ Low (rewriting only) │ High (new executor) │
├─────────────────────┼─────────────────────────┼─────────────────────────┤
│ PostgreSQL Changes │ None │ New plan/executor node │
└─────────────────────┴─────────────────────────┴─────────────────────────┘
5. Memory Usage Example
-----------------------
Graph: Social network, avg 100 friends per person
Query: MATCH (a)-[:friend*1..4]->(b) RETURN b
Unoptimized (CTE):
Depth 1: 100 rows
Depth 2: 10,000 rows
Depth 3: 1,000,000 rows
Depth 4: 100,000,000 rows (!)
Total: ~101 million partial paths in memory
Each path: vertex IDs + edge arrays (growing)
Optimized (DFS Executor):
PlanState stack: max 4 PlanState (~1KB each)
Path arrays: max 4 edges
Total: ~10KB regardless of graph size
6. Key Techniques Summary
-------------------------
┌────┬───────────────────────────┬────────────────────────────────────────┐
│ # │ Technique │ Benefit │
├────┼───────────────────────────┼────────────────────────────────────────┤
│ 1 │ Non-Directional Handling │ Support -- (bidirectional) edges │
│ 2 │ Multi-Label Scan │ Handle label inheritance │
│ 3 │ PlanState Stack Backtrack │ True DFS with O(depth) memory │
│ 4 │ Edge Uniqueness Check │ Prevent cycles efficiently │
│ 5 │ Property Filtering │ Early pruning during scan │
│ 6 │ Direction-Aware Traversal │ Support ->, <-, -- correctly │
│ 7 │ Result Streaming │ Return results without full compute │
│ 8 │ Depth Expansion │ Continue traversal to max depth │
└────┴───────────────────────────┴────────────────────────────────────────┘
7. Conclusion
-------------
The unoptimized CTE-based approach is simple to implement (query
rewriting only) but suffers from:
- Exponential memory growth with depth
- No true DFS traversal capability
- No result streaming
- Expensive array operations per row
The optimized executor-based approach requires more implementation
effort but provides:
- O(depth) memory usage regardless of graph size
- True DFS with backtracking capability
- Streaming results as they are discovered
- Efficient PlanState-based edge scanning
- Clean integration with PostgreSQL executor framework
For production graph database systems, the optimized approach is
essential for handling real-world graph sizes and traversal depths.
This document focuses on the conceptual design. Implementation details
such as PlanState cloning vs. scan position saving, memory allocation
strategies, and index structures are left for follow-up discussion.
pgsql-hackers by date: