Re: SQL Property Graph Queries (SQL/PGQ) - Mailing list pgsql-hackers

From Henson Choi
Subject Re: SQL Property Graph Queries (SQL/PGQ)
Date
Msg-id CAAAe_zAJz1nP3h2vHNi+wgN_cmzRcNpSHrUj8tRyP+VPexUwcg@mail.gmail.com
Whole thread Raw
In response to Re: SQL Property Graph Queries (SQL/PGQ)  (Henson Choi <assam258@gmail.com>)
Responses Re: SQL Property Graph Queries (SQL/PGQ)
List pgsql-hackers
Hi hackers,

I think it may be a bit early for this, but I wanted to share this design
document now so we can revisit and discuss when the time is right.

===================================================================
Variable Length Edge (VLE) Implementation Strategies for PostgreSQL
===================================================================

1. Overview
-----------

This document describes two implementation strategies for Variable
Length Edge (VLE) traversal in PostgreSQL-based graph databases.

VLE Query Example:

    MATCH (a)-[*1..5]->(b) RETURN a, b
    MATCH p = (a)-[:knows*]->(b) RETURN p

Two implementation approaches:

    1. Unoptimized: Query rewriting to WITH RECURSIVE (CTE)
    2. Optimized:   Custom executor node with DFS algorithm


2. Unoptimized Approach: WITH RECURSIVE Rewriting
-------------------------------------------------

2.1 Query Transformation

Cypher query:

    MATCH (a:person)-[:knows*1..3]->(b) WHERE a.name = 'Alice' RETURN b

Rewritten SQL using WITH RECURSIVE:

    WITH RECURSIVE vle_paths AS (
        -- Base case: start vertices
        SELECT v.id as start_vid,
               v.id as end_vid,
               ARRAY[] as edge_ids,
               ARRAY[] as edges,
               0 as depth
        FROM person v
        WHERE name = 'Alice'

        UNION ALL

        -- Recursive case: expand one hop
        SELECT p.start_vid,
               e.end_id,
               p.edge_ids || e.id,
               p.edges || e,
               p.depth + 1
        FROM vle_paths p
        JOIN knows e ON p.end_vid = e.start_id
        WHERE p.depth < 3
          AND e.id <> ALL(p.edge_ids)  -- Edge uniqueness check
    )
    SEARCH DEPTH FIRST BY end_vid SET ordercol
    SELECT b.*, edges
    FROM vle_paths
    JOIN person b ON b.id = end_vid
    WHERE depth BETWEEN 1 AND 3;

2.2 Execution Plan

    Hash Join  (b vertex lookup)
      Hash Cond: (end_vid = b.id)
      ->  CTE Scan on vle_paths
            Filter: (depth >= 1 AND depth <= 3)
            Sort Key: ordercol  -- SEARCH DEPTH FIRST ordering
            CTE vle_paths
              Recursive Union
                Seq Scan on person v
                  Filter: (name = 'Alice')
                Hash Join
                  Hash Cond: (p.end_vid = e.start_id)
                  Filter: (e.id <> ALL(p.edge_ids)) AND (p.depth < 3)
                  WorkTable Scan on vle_paths p
                  Hash
                    Seq Scan on knows e
      ->  Hash
            ->  Seq Scan on person b

2.3 Problems

    ┌─────────────────────┬──────────────────────────────────────────┐
    │ Problem             │ Description                              │
    ├─────────────────────┼──────────────────────────────────────────┤
    │ Execution Model     │ BFS execution (set-based iteration)      │
    │                     │ SEARCH DEPTH FIRST changes output order  │
    │                     │ only, not execution or memory model      │
    ├─────────────────────┼──────────────────────────────────────────┤
    │ No Backtracking     │ CTE stores all intermediate results      │
    │                     │ Cannot restore previous state            │
    ├─────────────────────┼──────────────────────────────────────────┤
    │ Memory Usage        │ O(total_paths) - all partial paths       │
    │                     │ Exponential growth with depth            │
    ├─────────────────────┼──────────────────────────────────────────┤
    │ Edge Uniqueness     │ e.id <> ALL(p.edge_ids)                  │
    │                     │ O(path_length) comparison per row        │
    ├─────────────────────┼──────────────────────────────────────────┤
    │ No Streaming        │ Must complete all iterations before      │
    │                     │ returning results                        │
    ├─────────────────────┼──────────────────────────────────────────┤
    │ Array Operations    │ p.edge_ids || e.id creates new array     │
    │                     │ O(n) copy per iteration                  │
    └─────────────────────┴──────────────────────────────────────────┘

2.4 Memory Analysis

    Graph: Tree with branching factor B=10, max depth D=5

    Iteration 1: 10 rows      (depth 1)
    Iteration 2: 100 rows     (depth 2)
    Iteration 3: 1,000 rows   (depth 3)
    Iteration 4: 10,000 rows  (depth 4)
    Iteration 5: 100,000 rows (depth 5)

    Total rows in WorkTable: 111,110
    Each row contains: start_vid + end_vid + edge_ids array + edges array

    Memory = 111,110 * (8 + 8 + avg_path_len * 8 + avg_path_len * edge_size)


3. Optimized Approach: Custom Executor Node
-------------------------------------------

3.1 Design Principle

Instead of query rewriting, implement a dedicated executor node that:

    - Performs true DFS traversal with backtracking
    - Maintains only current path in memory
    - Uses PlanState stack for depth-wise edge scanning
    - Yields results incrementally (streaming)

3.2 Plan Node Structure

    ┌────────────────────────────────────────────────────────┐
    │                 Nested Loop (b lookup)                 │
    └────────────────────────────────────────────────────────┘
                   │                           │
                   │ outer                     │ inner
                   ▼                           ▼
    ┌──────────────────────────────┐   ┌─────────────────────────┐
    │        VLEPlan Node          │   │   IndexScan on person   │
    │  - DFS with PlanState stack  │   │   (b vertex lookup)     │
    │  - O(depth) memory           │   └─────────────────────────┘
    └──────────────────────────────┘
           │           │
     outer │           │ inner (edge subplan)
           ▼           ▼
    ┌───────────┐  ┌───────────────────────────┐
    │ IndexScan │  │ IndexScan on knows        │
    │ (person a)│  │ (re-scanned per vertex)   │
    └───────────┘  └───────────────────────────┘

    Execution Flow:

    Outer plan returns vertex A
           │
           ▼
    VLEPlan performs DFS:
           │
           ├──→ Push PlanState for depth 0 (edges from A)
           │         │
           │         ├──→ Find edge A->B, push PlanState for depth 1
           │         │         │
           │         │         ├──→ Find edge B->D, push PlanState for depth 2
           │         │         │         │
           │         │         │         └──→ No more edges, pop PlanState
           │         │         │
           │         │         ├──→ Continue B's PlanState, find B->E
           │         │         │
           │         │         └──→ Pop PlanState (B exhausted)
           │         │
           │         └──→ Continue A's PlanState, find A->C
           │
           └──→ Yield results when depth in [min, max]


3.3 Key Concept: PlanState Stack

    The executor maintains a stack of inner subplan states (PlanState),
    one per depth level. Each PlanState holds the scan position for
    edges from that depth's vertex.

    At each depth:
        - Re-scan inner subplan with current vertex as parameter
        - On finding an edge, push new PlanState for next depth
        - On exhausting edges, pop PlanState and backtrack

    Memory: Only current path + one PlanState per depth level


3.4 Execution Plan (Optimized)

    EXPLAIN MATCH (a:person)-[:knows*1..3]->(b) WHERE a.name='Alice' RETURN b

    Nested Loop  (b vertex lookup)
      ->  VLEPlan  (minDepth=1 maxDepth=3 direction=RIGHT)
            ->  Index Scan on person a  (outer: start vertices)
                  Index Cond: (name = 'Alice')
            ->  Index Scan on knows e   (inner: edge subplan)
                  Index Cond: (start_id = $current_vertex)
      ->  Index Scan on person b
            Index Cond: (id = vle.end_vertex)


4. Comparison Summary
---------------------

    ┌─────────────────────┬─────────────────────────┬─────────────────────────┐
    │ Aspect              │ Unoptimized (CTE)       │ Optimized (Executor)    │
    ├─────────────────────┼─────────────────────────┼─────────────────────────┤
    │ Implementation      │ Query rewriting         │ Custom executor node    │
    ├─────────────────────┼─────────────────────────┼─────────────────────────┤
    │ Traversal Order     │ BFS (set-based)         │ True DFS                │
    ├─────────────────────┼─────────────────────────┼─────────────────────────┤
    │ Backtracking        │ Not possible            │ PlanState stack pop/push│
    ├─────────────────────┼─────────────────────────┼─────────────────────────┤
    │ Memory Usage        │ O(B^D) all paths        │ O(D) current path only  │
    ├─────────────────────┼─────────────────────────┼─────────────────────────┤
    │ Edge Uniqueness     │ <> ALL(array) O(n)      │ Array scan O(n)         │
    │                     │ + array copy per row    │ No array copy           │
    ├─────────────────────┼─────────────────────────┼─────────────────────────┤
    │ Result Streaming    │ After all iterations    │ Yield as found          │
    ├─────────────────────┼─────────────────────────┼─────────────────────────┤
    │ Code Complexity     │ Low (rewriting only)    │ High (new executor)     │
    ├─────────────────────┼─────────────────────────┼─────────────────────────┤
    │ PostgreSQL Changes  │ None                    │ New plan/executor node  │
    └─────────────────────┴─────────────────────────┴─────────────────────────┘


5. Memory Usage Example
-----------------------

    Graph: Social network, avg 100 friends per person
    Query: MATCH (a)-[:friend*1..4]->(b) RETURN b

    Unoptimized (CTE):
        Depth 1: 100 rows
        Depth 2: 10,000 rows
        Depth 3: 1,000,000 rows
        Depth 4: 100,000,000 rows (!)

        Total: ~101 million partial paths in memory
        Each path: vertex IDs + edge arrays (growing)

    Optimized (DFS Executor):
        PlanState stack: max 4 PlanState (~1KB each)
        Path arrays: max 4 edges
        Total: ~10KB regardless of graph size


6. Key Techniques Summary
-------------------------

    ┌────┬───────────────────────────┬────────────────────────────────────────┐
    │ #  │ Technique                 │ Benefit                                │
    ├────┼───────────────────────────┼────────────────────────────────────────┤
    │ 1  │ Non-Directional Handling  │ Support -- (bidirectional) edges       │
    │ 2  │ Multi-Label Scan          │ Handle label inheritance               │
    │ 3  │ PlanState Stack Backtrack │ True DFS with O(depth) memory          │
    │ 4  │ Edge Uniqueness Check     │ Prevent cycles efficiently             │
    │ 5  │ Property Filtering        │ Early pruning during scan              │
    │ 6  │ Direction-Aware Traversal │ Support ->, <-, -- correctly           │
    │ 7  │ Result Streaming          │ Return results without full compute    │
    │ 8  │ Depth Expansion           │ Continue traversal to max depth        │
    └────┴───────────────────────────┴────────────────────────────────────────┘


7. Conclusion
-------------

The unoptimized CTE-based approach is simple to implement (query
rewriting only) but suffers from:

    - Exponential memory growth with depth
    - No true DFS traversal capability
    - No result streaming
    - Expensive array operations per row

The optimized executor-based approach requires more implementation
effort but provides:

    - O(depth) memory usage regardless of graph size
    - True DFS with backtracking capability
    - Streaming results as they are discovered
    - Efficient PlanState-based edge scanning
    - Clean integration with PostgreSQL executor framework

For production graph database systems, the optimized approach is
essential for handling real-world graph sizes and traversal depths.

This document focuses on the conceptual design. Implementation details
such as PlanState cloning vs. scan position saving, memory allocation
strategies, and index structures are left for follow-up discussion.

pgsql-hackers by date:

Previous
From: "Matheus Alcantara"
Date:
Subject: Re: postgres_fdw: Use COPY to speed up batch inserts
Next
From: Henson Choi
Date:
Subject: Re: SQL Property Graph Queries (SQL/PGQ)