Re: SQL Property Graph Queries (SQL/PGQ) - Mailing list pgsql-hackers

From Henson Choi
Subject Re: SQL Property Graph Queries (SQL/PGQ)
Date
Msg-id CAAAe_zDbS-uDYgbQ3pQn+BD02JvCTek2LzpdK4b-xy48k7ZMmg@mail.gmail.com
Whole thread Raw
In response to Re: SQL Property Graph Queries (SQL/PGQ)  (Henson Choi <assam258@gmail.com>)
List pgsql-hackers
Hi hackers,

I think it may be a bit early for this, but I wanted to share this design
document now so we can revisit and discuss when the time is right.

=====================================================
Shortestpath Implementation Strategies for PostgreSQL
=====================================================

1. Overview
-----------

This document describes two implementation strategies for shortest path
queries in PostgreSQL-based graph databases.

Shortestpath Query Example:

    MATCH p = shortestpath((a)-[*]->(b)) RETURN p
    MATCH p = allshortestpaths((a)-[:knows*]->(b)) RETURN p

Two implementation approaches:

    1. Unoptimized: Query rewriting to WITH RECURSIVE (CTE)
    2. Optimized:   Custom executor node with bidirectional BFS


2. Unoptimized Approach: WITH RECURSIVE Rewriting
-------------------------------------------------

2.1 Query Transformation

Cypher query:

    MATCH p = shortestpath((a:person)-[:knows*]->(b:person))
    WHERE a.name = 'Alice' AND b.name = 'Bob'
    RETURN p

Rewritten SQL using WITH RECURSIVE:

    WITH RECURSIVE sp_paths AS (
        -- Base case: start vertex
        SELECT v.id as start_vid,
               v.id as end_vid,
               ARRAY[v.id] as vertex_ids,
               ARRAY[]::bigint[] as edge_ids,
               0 as depth
        FROM person v
        WHERE name = 'Alice'

        UNION ALL

        -- Recursive case: expand one hop
        SELECT p.start_vid,
               e.end_id,
               p.vertex_ids || e.end_id,
               p.edge_ids || e.id,
               p.depth + 1
        FROM sp_paths p
        JOIN knows e ON p.end_vid = e.start_id
        WHERE e.end_id <> ALL(p.vertex_ids)  -- Cycle prevention
    )
    SELECT vertex_ids, edge_ids, depth
    FROM sp_paths
    WHERE end_vid = (SELECT id FROM person WHERE name = 'Bob')
    ORDER BY depth
    LIMIT 1;

2.2 Execution Plan

    Limit
      ->  Sort
            Sort Key: depth
            ->  CTE Scan on sp_paths
                  Filter: (end_vid = $bob_id)
                  CTE sp_paths
                    Recursive Union
                      Seq Scan on person v
                        Filter: (name = 'Alice')
                      Hash Join
                        Hash Cond: (p.end_vid = e.start_id)
                        Filter: (e.end_id <> ALL(p.vertex_ids))
                        WorkTable Scan on sp_paths p
                        Hash
                          Seq Scan on knows e

2.3 Problems

    ┌─────────────────────┬──────────────────────────────────────────┐
    │ Problem             │ Description                              │
    ├─────────────────────┼──────────────────────────────────────────┤
    │ Unidirectional      │ Search expands only from start vertex    │
    │                     │ Misses optimization from end vertex      │
    ├─────────────────────┼──────────────────────────────────────────┤
    │ Full Exploration    │ Must explore all paths up to min depth   │
    │                     │ before finding shortest                  │
    ├─────────────────────┼──────────────────────────────────────────┤
    │ No Early Termination│ Cannot stop when shortest found          │
    │                     │ CTE runs to completion                   │
    ├─────────────────────┼──────────────────────────────────────────┤
    │ Memory Usage        │ O(V) visited vertices per iteration      │
    │                     │ All partial paths stored                 │
    ├─────────────────────┼──────────────────────────────────────────┤
    │ Cycle Prevention    │ <> ALL(array) grows with path length     │
    │                     │ O(path_length) per edge check            │
    └─────────────────────┴──────────────────────────────────────────┘

2.4 Search Space Comparison

    Unidirectional BFS (CTE):

        Start ──────────────────────────────────────> End
              ╲                                      ╱
               ╲    Explores entire cone           ╱
                ╲      from start                 ╱
                 ╲                              ╱
                  ╲   Search space: O(B^D)    ╱
                   ╲                        ╱
                    ────────────────────────

    Bidirectional BFS (Optimized):

        Start ────────> <──────── End
              ╲        ╱╲       ╱
               ╲      ╱  ╲     ╱
                ╲    ╱    ╲   ╱
                 ╲  ╱ Meet ╲ ╱
                  ╲╱ Point  ╲
                   Search space: O(2 * B^(D/2))


3. Optimized Approach: Custom Executor Node
-------------------------------------------

3.1 Design Principle

Instead of query rewriting, implement a dedicated executor node that:

    - Performs bidirectional BFS from both start and end vertices
    - Uses hash tables to track visited vertices from each direction
    - Detects meeting point where both searches intersect
    - Terminates early when shortest path is found

3.2 Plan Node Structure

    ┌────────────────────────────────────────────────────────────┐
    │                    ShortestpathPlan                        │
    │  - Bidirectional BFS with dual hash tables                 │
    │  - Early termination on path found                         │
    └────────────────────────────────────────────────────────────┘
                    │                       │
              left  │                       │ right
                    ▼                       ▼
    ┌───────────────────────────┐   ┌───────────────────────────┐
    │     Hash2Side (Left)      │   │     Hash2Side (Right)     │
    │  - KeyTable + HashTable   │   │  - KeyTable + HashTable   │
    │  - Forward direction      │   │  - Reverse direction      │
    └───────────────────────────┘   └───────────────────────────┘
                    │                       │
                    ▼                       ▼
    ┌───────────────────────────┐   ┌───────────────────────────┐
    │   SubPlan (Forward)       │   │   SubPlan (Reverse)       │
    │  Index Cond:              │   │  Index Cond:              │
    │  start_id = $current      │   │  end_id = $current        │
    └───────────────────────────┘   └───────────────────────────┘

    Execution Flow:

    Initialize:
        Left HashTable  = {start_vertex}
        Right HashTable = {end_vertex}
           │
           ▼
    Expansion loop (smaller side first):
           │
           ├──→ Compare |Left frontier| vs |Right frontier|
           │         │
           │         └──→ Expand smaller side
           │                   │
           │                   └──→ Check intersection with other side
           │
           ├──→ Repeat until intersection found
           │
           └──→ Return shortest path(s)


3.3 Key Concept: Dual Hash Tables

    The executor maintains two hash tables for bidirectional search:

    ┌─────────────────────────────────────────────────────────────┐
    │  Left Direction (Start → End)                               │
    │  ┌─────────────────────────────────────────────────────┐   │
    │  │ KeyTable:   Previous hop vertices (expansion base)  │   │
    │  │ HashTable:  Current hop results (newly discovered)  │   │
    │  └─────────────────────────────────────────────────────┘   │
    ├─────────────────────────────────────────────────────────────┤
    │  Right Direction (End → Start)                              │
    │  ┌─────────────────────────────────────────────────────┐   │
    │  │ KeyTable:   Previous hop vertices (expansion base)  │   │
    │  │ HashTable:  Current hop results (newly discovered)  │   │
    │  └─────────────────────────────────────────────────────┘   │
    └─────────────────────────────────────────────────────────────┘

    After each hop:
        - Swap KeyTable and HashTable (new base for next hop)
        - Check for intersection between Left and Right HashTables
        - If intersection found, reconstruct path through meeting point

3.4 Key Concept: Smaller Side Expansion

    At each hop, expand from the direction with fewer frontier vertices.
    This minimizes the total number of edges scanned.

    Example (highly unbalanced tree):

        Graph structure:
            - Forward (Left->Right):  branching factor = 100
            - Backward (Right->Left): branching factor = 2
            - Path distance: 6 hops

        Simple alternating:

            Hop 1: expand Left  (1)      -> Left=100,       Right=1
            Hop 2: expand Right (1)      -> Left=100,       Right=2
            Hop 3: expand Left  (100)    -> Left=10,000,    Right=2
            Hop 4: expand Right (2)      -> Left=10,000,    Right=4
            Hop 5: expand Left  (10,000) -> Left=1,000,000, Right=4
            Hop 6: expand Right (4)      -> meet!

            Total vertices expanded: 1 + 1 + 100 + 2 + 10,000 + 4 = 10,108

        With smaller side expansion:

            Hop 1: Left=1, Right=1 -> expand Left  -> Left=100, Right=1
            Hop 2: 1 < 100         -> expand Right -> Left=100, Right=2
            Hop 3: 2 < 100         -> expand Right -> Left=100, Right=4
            Hop 4: 4 < 100         -> expand Right -> Left=100, Right=8
            Hop 5: 8 < 100         -> expand Right -> Left=100, Right=16
            Hop 6: 16 < 100        -> expand Right -> meet!

            Total vertices expanded: 1 + 1 + 2 + 4 + 8 + 16 = 32

        Speedup: 10,108 / 32 = ~300x fewer vertices explored


3.5 Key Concept: Cycle Detection via Hash Lookup

    Hash table insertion automatically prevents revisiting vertices.

    Two tuple types in HashTable:

        ┌─────────────────────────────────────────────────────────────┐
        │  Long Tuple (path data)                                     │
        │  ┌─────────┬─────────┬─────────────────────────────────┐   │
        │  │ Header  │ Graphid │ Edge Rowids (path to vertex)    │   │
        │  └─────────┴─────────┴─────────────────────────────────┘   │
        ├─────────────────────────────────────────────────────────────┤
        │  Short Tuple (visited marker)                               │
        │  ┌─────────┬─────────┐                                     │
        │  │ Header  │ Graphid │  <- No path data, just marker       │
        │  └─────────┴─────────┘                                     │
        └─────────────────────────────────────────────────────────────┘

    Insertion logic:

        1. Inserting long tuple (with path):
           - Check if short tuple (marker) exists for same Graphid
           - If marker exists: reject insertion (already visited)

        2. Inserting short tuple (marker):
           - Insert marker first
           - Remove all existing long tuples with same Graphid
           - Vertex is now "locked" - no more paths to this vertex

    Example:

        Left Hop 1: Expand from A
            HashTable = {B(long), C(long)}  <- paths A->B, A->C

        Left and Right meet at C -> shortest path found

        Insert C(short) marker:
            HashTable = {B(long), C(short)} <- C's long tuple removed

        Later, another path tries to reach C (e.g., D->C):
            -> C(short) marker exists -> insertion rejected
            -> Already processed at shortest hop

    Benefits:
        - O(1) duplicate detection via hash lookup
        - No separate "visited" set needed
        - Automatic cleanup of superseded paths
        - Memory efficient (markers are minimal size)


3.6 Key Concept: Batch Synchronization

    When hash table exceeds memory, it splits into multiple batches.
    Left and Right hash tables must have matching batch structure:

        Problem without synchronization:

            Left:   nbatch=1  [all data in batch 0]
            Right:  nbatch=4  [batch0] [batch1] [batch2] [batch3]

            -> Cannot find intersection (different batch structures)

        Solution:

            When Right splits into 4 batches, clone structure to Left:

            Left:   nbatch=4  [batch0] [batch1] [batch2] [batch3]
            Right:  nbatch=4  [batch0] [batch1] [batch2] [batch3]
                                 |        |        |        |
                              Join only matching batches

    This ensures correct intersection detection across batch boundaries.


3.7 Execution Plan (Optimized)

    EXPLAIN MATCH p = shortestpath((a:person)-[:knows*]->(b:person))
            WHERE a.name='Alice' AND b.name='Bob' RETURN p

    Shortestpath  (limit=1)
      ->  Hash2Side  (left direction=FORWARD)
            ->  Index Scan on knows e
                  Index Cond: (start_id = $current_vertex)
      ->  Hash2Side  (right direction=REVERSE)
            ->  Index Scan on knows e
                  Index Cond: (end_id = $current_vertex)


4. Comparison Summary
---------------------

    ┌─────────────────────┬─────────────────────────┬─────────────────────────┐
    │ Aspect              │ Unoptimized (CTE)       │ Optimized (Executor)    │
    ├─────────────────────┼─────────────────────────┼─────────────────────────┤
    │ Implementation      │ Query rewriting         │ Custom executor node    │
    ├─────────────────────┼─────────────────────────┼─────────────────────────┤
    │ Search Direction    │ Unidirectional          │ Bidirectional           │
    ├─────────────────────┼─────────────────────────┼─────────────────────────┤
    │ Search Space        │ O(B^D)                  │ O(2 * B^(D/2))          │
    ├─────────────────────┼─────────────────────────┼─────────────────────────┤
    │ Early Termination   │ Not possible            │ Stop when path found    │
    ├─────────────────────┼─────────────────────────┼─────────────────────────┤
    │ Data Structure      │ WorkTable (tuplestore)  │ Dual hash tables        │
    ├─────────────────────┼─────────────────────────┼─────────────────────────┤
    │ Cycle Prevention    │ Array membership check  │ Hash table lookup O(1)  │
    ├─────────────────────┼─────────────────────────┼─────────────────────────┤
    │ Code Complexity     │ Low (rewriting only)    │ High (new executor)     │
    ├─────────────────────┼─────────────────────────┼─────────────────────────┤
    │ PostgreSQL Changes  │ None                    │ New plan/executor node  │
    └─────────────────────┴─────────────────────────┴─────────────────────────┘


5. Search Space Example
-----------------------

    Graph: Social network, avg 100 connections per person
    Query: shortestpath((Alice)-[*]->(Bob)), actual distance = 6 hops

    Unidirectional BFS (CTE):
        Hop 1: 100 vertices
        Hop 2: 10,000 vertices
        Hop 3: 1,000,000 vertices
        Hop 4: 100,000,000 vertices
        Hop 5: 10,000,000,000 vertices
        Hop 6: Find Bob

        Total vertices explored: ~10 billion

    Bidirectional BFS with smaller side expansion:
        Hop 1: Left=1, Right=1   -> expand Left  -> Left=100
        Hop 2: 1 < 100           -> expand Right -> Right=100
        Hop 3: 100 = 100         -> expand Left  -> Left=10,000
        Hop 4: 100 < 10,000      -> expand Right -> Right=10,000
        Hop 5: 10,000 = 10,000   -> expand Left  -> Left=1,000,000
        Hop 6: 10,000 < 1,000,000-> expand Right -> Meet!

        Total vertices explored: ~1 million (each side explores ~500K)

    Speedup: ~10,000x fewer vertices explored


6. Key Techniques Summary
-------------------------

    ┌────┬───────────────────────────┬────────────────────────────────────────┐
    │ #  │ Technique                 │ Benefit                                │
    ├────┼───────────────────────────┼────────────────────────────────────────┤
    │ 1  │ Bidirectional BFS         │ Exponential reduction in search space  │
    │ 2  │ Dual Hash Tables          │ O(1) visited check, O(1) intersection  │
    │ 3  │ Smaller Side Expansion    │ Expand from side with fewer vertices   │
    │ 4  │ Early Termination         │ Stop immediately when path found       │
    │ 5  │ Cycle Detection           │ Hash lookup prevents revisiting vertex │
    │ 6  │ Batch Synchronization     │ Match nbatch between Left/Right hashes │
    │ 7  │ Path Reconstruction       │ Merge left and right paths at meeting  │
    │ 8  │ Parameterized Subplan     │ Reuse edge scan plan per vertex        │
    │ 9  │ Batch Processing          │ Handle memory overflow with batching   │
    │ 10 │ Limit Control             │ shortestpath (1) vs allshortestpaths   │
    └────┴───────────────────────────┴────────────────────────────────────────┘


7. Conclusion
-------------

The unoptimized CTE-based approach is simpler to implement but suffers from:

    - Unidirectional search with O(B^D) complexity
    - No early termination capability
    - Inefficient cycle prevention with arrays
    - Must explore all paths before finding shortest

The optimized executor-based approach requires more implementation
effort but provides:

    - Bidirectional BFS with O(B^(D/2)) complexity
    - Early termination when shortest path found
    - Efficient hash-based visited tracking
    - Support for both shortestpath and allshortestpaths

For production graph database systems, the bidirectional BFS approach
is essential for practical shortest path queries on large graphs.

This document focuses on the conceptual design. Implementation details
such as hash table sizing, batch overflow handling, and path reconstruction
algorithms are left for follow-up discussion.

pgsql-hackers by date:

Previous
From: Henson Choi
Date:
Subject: Re: SQL Property Graph Queries (SQL/PGQ)
Next
From: Alvaro Herrera
Date:
Subject: Re: Adding REPACK [concurrently]