Re: SQL Property Graph Queries (SQL/PGQ) - Mailing list pgsql-hackers

From Junwang Zhao
Subject Re: SQL Property Graph Queries (SQL/PGQ)
Date
Msg-id CAEG8a3+fdtHSEUGCd7o74ChO+wq4yV07NyjTh_dOmuBBye_1Wg@mail.gmail.com
Whole thread Raw
In response to Re: SQL Property Graph Queries (SQL/PGQ)  (Henson Choi <assam258@gmail.com>)
Responses Re: SQL Property Graph Queries (SQL/PGQ)
List pgsql-hackers
On Mon, Jan 5, 2026 at 9:56 PM Henson Choi <assam258@gmail.com> wrote:
>
> Hi hackers,
>
> I think it may be a bit early for this, but I wanted to share this design
> document now so we can revisit and discuss when the time is right.
>
> ===================================================================
> Variable Length Edge (VLE) Implementation Strategies for PostgreSQL
> ===================================================================
>
> 1. Overview
> -----------
>
> This document describes two implementation strategies for Variable
> Length Edge (VLE) traversal in PostgreSQL-based graph databases.
>
> VLE Query Example:
>
>     MATCH (a)-[*1..5]->(b) RETURN a, b
>     MATCH p = (a)-[:knows*]->(b) RETURN p
>
> Two implementation approaches:
>
>     1. Unoptimized: Query rewriting to WITH RECURSIVE (CTE)
>     2. Optimized:   Custom executor node with DFS algorithm
>

Interesting ideas. I was thinking about expanding the edge pattern using the
lower and upper but it seems it can not handle the * case.

ISTM these two approaches can both work when we need runtime
pruning, for example if we are going to support path mode.

The first option can be costly, since you have to pre calculate the recursive
CTE, it can not utilize the predicate from previous hops. Can it work with
quantifier * case?

The second option seems more promising, but it needs to touch the planner
and executor. I guess you need a special node to represent the VLE in
the rewrite phase.

>
> 2. Unoptimized Approach: WITH RECURSIVE Rewriting
> -------------------------------------------------
>
> 2.1 Query Transformation
>
> Cypher query:
>
>     MATCH (a:person)-[:knows*1..3]->(b) WHERE a.name = 'Alice' RETURN b
>
> Rewritten SQL using WITH RECURSIVE:
>
>     WITH RECURSIVE vle_paths AS (
>         -- Base case: start vertices
>         SELECT v.id as start_vid,
>                v.id as end_vid,
>                ARRAY[] as edge_ids,
>                ARRAY[] as edges,
>                0 as depth
>         FROM person v
>         WHERE name = 'Alice'
>
>         UNION ALL
>
>         -- Recursive case: expand one hop
>         SELECT p.start_vid,
>                e.end_id,
>                p.edge_ids || e.id,
>                p.edges || e,
>                p.depth + 1
>         FROM vle_paths p
>         JOIN knows e ON p.end_vid = e.start_id
>         WHERE p.depth < 3
>           AND e.id <> ALL(p.edge_ids)  -- Edge uniqueness check
>     )
>     SEARCH DEPTH FIRST BY end_vid SET ordercol
>     SELECT b.*, edges
>     FROM vle_paths
>     JOIN person b ON b.id = end_vid
>     WHERE depth BETWEEN 1 AND 3;
>
> 2.2 Execution Plan
>
>     Hash Join  (b vertex lookup)
>       Hash Cond: (end_vid = b.id)
>       ->  CTE Scan on vle_paths
>             Filter: (depth >= 1 AND depth <= 3)
>             Sort Key: ordercol  -- SEARCH DEPTH FIRST ordering
>             CTE vle_paths
>               Recursive Union
>                 Seq Scan on person v
>                   Filter: (name = 'Alice')
>                 Hash Join
>                   Hash Cond: (p.end_vid = e.start_id)
>                   Filter: (e.id <> ALL(p.edge_ids)) AND (p.depth < 3)
>                   WorkTable Scan on vle_paths p
>                   Hash
>                     Seq Scan on knows e
>       ->  Hash
>             ->  Seq Scan on person b
>
> 2.3 Problems
>
>     ┌─────────────────────┬──────────────────────────────────────────┐
>     │ Problem             │ Description                              │
>     ├─────────────────────┼──────────────────────────────────────────┤
>     │ Execution Model     │ BFS execution (set-based iteration)      │
>     │                     │ SEARCH DEPTH FIRST changes output order  │
>     │                     │ only, not execution or memory model      │
>     ├─────────────────────┼──────────────────────────────────────────┤
>     │ No Backtracking     │ CTE stores all intermediate results      │
>     │                     │ Cannot restore previous state            │
>     ├─────────────────────┼──────────────────────────────────────────┤
>     │ Memory Usage        │ O(total_paths) - all partial paths       │
>     │                     │ Exponential growth with depth            │
>     ├─────────────────────┼──────────────────────────────────────────┤
>     │ Edge Uniqueness     │ e.id <> ALL(p.edge_ids)                  │
>     │                     │ O(path_length) comparison per row        │
>     ├─────────────────────┼──────────────────────────────────────────┤
>     │ No Streaming        │ Must complete all iterations before      │
>     │                     │ returning results                        │
>     ├─────────────────────┼──────────────────────────────────────────┤
>     │ Array Operations    │ p.edge_ids || e.id creates new array     │
>     │                     │ O(n) copy per iteration                  │
>     └─────────────────────┴──────────────────────────────────────────┘
>
> 2.4 Memory Analysis
>
>     Graph: Tree with branching factor B=10, max depth D=5
>
>     Iteration 1: 10 rows      (depth 1)
>     Iteration 2: 100 rows     (depth 2)
>     Iteration 3: 1,000 rows   (depth 3)
>     Iteration 4: 10,000 rows  (depth 4)
>     Iteration 5: 100,000 rows (depth 5)
>
>     Total rows in WorkTable: 111,110
>     Each row contains: start_vid + end_vid + edge_ids array + edges array
>
>     Memory = 111,110 * (8 + 8 + avg_path_len * 8 + avg_path_len * edge_size)
>
>
> 3. Optimized Approach: Custom Executor Node
> -------------------------------------------
>
> 3.1 Design Principle
>
> Instead of query rewriting, implement a dedicated executor node that:
>
>     - Performs true DFS traversal with backtracking
>     - Maintains only current path in memory
>     - Uses PlanState stack for depth-wise edge scanning
>     - Yields results incrementally (streaming)
>
> 3.2 Plan Node Structure
>
>     ┌────────────────────────────────────────────────────────┐
>     │                 Nested Loop (b lookup)                 │
>     └────────────────────────────────────────────────────────┘
>                    │                           │
>                    │ outer                     │ inner
>                    ▼                           ▼
>     ┌──────────────────────────────┐   ┌─────────────────────────┐
>     │        VLEPlan Node          │   │   IndexScan on person   │
>     │  - DFS with PlanState stack  │   │   (b vertex lookup)     │
>     │  - O(depth) memory           │   └─────────────────────────┘
>     └──────────────────────────────┘
>            │           │
>      outer │           │ inner (edge subplan)
>            ▼           ▼
>     ┌───────────┐  ┌───────────────────────────┐
>     │ IndexScan │  │ IndexScan on knows        │
>     │ (person a)│  │ (re-scanned per vertex)   │
>     └───────────┘  └───────────────────────────┘
>
>     Execution Flow:
>
>     Outer plan returns vertex A
>            │
>            ▼
>     VLEPlan performs DFS:
>            │
>            ├──→ Push PlanState for depth 0 (edges from A)
>            │         │
>            │         ├──→ Find edge A->B, push PlanState for depth 1
>            │         │         │
>            │         │         ├──→ Find edge B->D, push PlanState for depth 2
>            │         │         │         │
>            │         │         │         └──→ No more edges, pop PlanState
>            │         │         │
>            │         │         ├──→ Continue B's PlanState, find B->E
>            │         │         │
>            │         │         └──→ Pop PlanState (B exhausted)
>            │         │
>            │         └──→ Continue A's PlanState, find A->C
>            │
>            └──→ Yield results when depth in [min, max]
>
>
> 3.3 Key Concept: PlanState Stack
>
>     The executor maintains a stack of inner subplan states (PlanState),
>     one per depth level. Each PlanState holds the scan position for
>     edges from that depth's vertex.
>
>     At each depth:
>         - Re-scan inner subplan with current vertex as parameter
>         - On finding an edge, push new PlanState for next depth
>         - On exhausting edges, pop PlanState and backtrack
>
>     Memory: Only current path + one PlanState per depth level
>
>
> 3.4 Execution Plan (Optimized)
>
>     EXPLAIN MATCH (a:person)-[:knows*1..3]->(b) WHERE a.name='Alice' RETURN b
>
>     Nested Loop  (b vertex lookup)
>       ->  VLEPlan  (minDepth=1 maxDepth=3 direction=RIGHT)
>             ->  Index Scan on person a  (outer: start vertices)
>                   Index Cond: (name = 'Alice')
>             ->  Index Scan on knows e   (inner: edge subplan)
>                   Index Cond: (start_id = $current_vertex)
>       ->  Index Scan on person b
>             Index Cond: (id = vle.end_vertex)
>
>
> 4. Comparison Summary
> ---------------------
>
>     ┌─────────────────────┬─────────────────────────┬─────────────────────────┐
>     │ Aspect              │ Unoptimized (CTE)       │ Optimized (Executor)    │
>     ├─────────────────────┼─────────────────────────┼─────────────────────────┤
>     │ Implementation      │ Query rewriting         │ Custom executor node    │
>     ├─────────────────────┼─────────────────────────┼─────────────────────────┤
>     │ Traversal Order     │ BFS (set-based)         │ True DFS                │
>     ├─────────────────────┼─────────────────────────┼─────────────────────────┤
>     │ Backtracking        │ Not possible            │ PlanState stack pop/push│
>     ├─────────────────────┼─────────────────────────┼─────────────────────────┤
>     │ Memory Usage        │ O(B^D) all paths        │ O(D) current path only  │
>     ├─────────────────────┼─────────────────────────┼─────────────────────────┤
>     │ Edge Uniqueness     │ <> ALL(array) O(n)      │ Array scan O(n)         │
>     │                     │ + array copy per row    │ No array copy           │
>     ├─────────────────────┼─────────────────────────┼─────────────────────────┤
>     │ Result Streaming    │ After all iterations    │ Yield as found          │
>     ├─────────────────────┼─────────────────────────┼─────────────────────────┤
>     │ Code Complexity     │ Low (rewriting only)    │ High (new executor)     │
>     ├─────────────────────┼─────────────────────────┼─────────────────────────┤
>     │ PostgreSQL Changes  │ None                    │ New plan/executor node  │
>     └─────────────────────┴─────────────────────────┴─────────────────────────┘
>
>
> 5. Memory Usage Example
> -----------------------
>
>     Graph: Social network, avg 100 friends per person
>     Query: MATCH (a)-[:friend*1..4]->(b) RETURN b
>
>     Unoptimized (CTE):
>         Depth 1: 100 rows
>         Depth 2: 10,000 rows
>         Depth 3: 1,000,000 rows
>         Depth 4: 100,000,000 rows (!)
>
>         Total: ~101 million partial paths in memory
>         Each path: vertex IDs + edge arrays (growing)
>
>     Optimized (DFS Executor):
>         PlanState stack: max 4 PlanState (~1KB each)
>         Path arrays: max 4 edges
>         Total: ~10KB regardless of graph size
>
>
> 6. Key Techniques Summary
> -------------------------
>
>     ┌────┬───────────────────────────┬────────────────────────────────────────┐
>     │ #  │ Technique                 │ Benefit                                │
>     ├────┼───────────────────────────┼────────────────────────────────────────┤
>     │ 1  │ Non-Directional Handling  │ Support -- (bidirectional) edges       │
>     │ 2  │ Multi-Label Scan          │ Handle label inheritance               │
>     │ 3  │ PlanState Stack Backtrack │ True DFS with O(depth) memory          │
>     │ 4  │ Edge Uniqueness Check     │ Prevent cycles efficiently             │
>     │ 5  │ Property Filtering        │ Early pruning during scan              │
>     │ 6  │ Direction-Aware Traversal │ Support ->, <-, -- correctly           │
>     │ 7  │ Result Streaming          │ Return results without full compute    │
>     │ 8  │ Depth Expansion           │ Continue traversal to max depth        │
>     └────┴───────────────────────────┴────────────────────────────────────────┘
>
>
> 7. Conclusion
> -------------
>
> The unoptimized CTE-based approach is simple to implement (query
> rewriting only) but suffers from:
>
>     - Exponential memory growth with depth
>     - No true DFS traversal capability
>     - No result streaming
>     - Expensive array operations per row
>
> The optimized executor-based approach requires more implementation
> effort but provides:
>
>     - O(depth) memory usage regardless of graph size
>     - True DFS with backtracking capability
>     - Streaming results as they are discovered
>     - Efficient PlanState-based edge scanning
>     - Clean integration with PostgreSQL executor framework
>
> For production graph database systems, the optimized approach is
> essential for handling real-world graph sizes and traversal depths.
>
> This document focuses on the conceptual design. Implementation details
> such as PlanState cloning vs. scan position saving, memory allocation
> strategies, and index structures are left for follow-up discussion.



-- 
Regards
Junwang Zhao

pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: Decouple C++ support in Meson's PGXS from LLVM enablement
Next
From: Dilip Kumar
Date:
Subject: Re: Proposal: Conflict log history table for Logical Replication