Re: SQL Property Graph Queries (SQL/PGQ) - Mailing list pgsql-hackers

From Henson Choi
Subject Re: SQL Property Graph Queries (SQL/PGQ)
Date
Msg-id CAAAe_zCPm3W7ZTSBfKUgVJPrTn4-D99agGsz5P2MBLZN=tME3g@mail.gmail.com
Whole thread Raw
In response to Re: SQL Property Graph Queries (SQL/PGQ)  (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>)
List pgsql-hackers
SQL/PGQ Patch Review Report
============================

Patch: SQL Property Graph Queries (SQL/PGQ)
Commitfest: https://commitfest.postgresql.org/patch/4904

Review Methodology:
  This review focused on quality assessment, not line-by-line code audit.
  Key code paths and quality issues were examined with surrounding context
  when concerns arose. Documentation files were reviewed with AI-assisted
  grammar and typo checking. Code coverage was measured using gcov and
  custom analysis tools.

Limitations:
  - Not a security audit or formal verification
  - Parser and executor internals reviewed at module level, not exhaustively
  - Performance characteristics not benchmarked


TABLE OF CONTENTS
-----------------

1. Executive Summary
2. Issues Found
   2.1 Critical / Major
   2.2 Minor Issues
   2.3 Suggestions for Discussion
3. Test Coverage
   3.1 Covered Areas
   3.2 Untested Items
   3.3 Unimplemented Features (No Test Needed)
4. Code Coverage Analysis
   4.1 Overall Coverage
   4.2 Coverage by File
   4.3 Uncovered Code Risk Assessment
5. Recommended Additional Tests
   5.1 Black-box Tests (Functional)
   5.2 White-box Tests (Coverage)
   5.3 Untestable Code (Defensive)
6. Recommendations
7. Conclusion


1. EXECUTIVE SUMMARY
--------------------

Overall assessment: GOOD

The SQL/PGQ patch demonstrates solid implementation quality within its
defined scope. Code follows PostgreSQL coding standards, test coverage
is comprehensive at 90.5%, and documentation is thorough with only
minor typos.

Rating by Area:
- Code Quality:     Excellent (PostgreSQL style compliant, 1 FIXME)
- Test Coverage:    Good (90.5% line coverage, ~180 test cases)
- Documentation:    Good (Complete, 5 minor issues)
- Build/Regress:    Pass (make check-world, 56 test suites passed)


2. ISSUES FOUND
---------------

2.1 Critical / Major
(None)

2.2 Minor Issues

#1 [Code] src/backend/commands/propgraphcmds.c:1632
   FIXME: Missing index for plppropid column causes sequential scan.
   Decision needed: (a) add index, or (c) allow seq scan for rare path.

#2 [Doc] doc/src/sgml/ref/alter_property_graph.sgml:286
   Compatibility section incorrectly states "CREATE PROPERTY GRAPH"
   Should be: "ALTER PROPERTY GRAPH"

#3 [Doc] doc/src/sgml/ref/alter_property_graph.sgml:65
   Synopsis shows VERTEX/NODE and EDGE/RELATIONSHIP as separate clauses,
   but Description merges them as {VERTEX|NODE|EDGE|RELATIONSHIP}.

#4 [Doc] doc/src/sgml/ref/alter_property_graph.sgml:80
   Grammar error: "the graph removed" should be "the graph is removed"

#5 [Doc] doc/src/sgml/queries.sgml:2929,2931
   TODO comments for unimplemented features without clear limitation notes.

#6 [Doc] doc/src/sgml/ddl.sgml:5717
   Typo: "infrastucture" should be "infrastructure"

2.3 Alternative Approaches for Discussion

#1 Support CREATE PROPERTY GRAPH IF NOT EXISTS
   Rationale: PostgreSQL-style extension, consistent with other DDL.

#2 Return 0 rows for non-existent labels instead of error
   Current: SELECT * FROM GRAPH_TABLE(g MATCH (a:NonExistent) ...) raises error
   Alternative: Return empty result set instead
   Rationale: Better for exploratory queries, similar to SQL WHERE on
   non-existent values returning 0 rows rather than error.

#3 Return 0 rows when same variable has different labels
   Current: SELECT * FROM GRAPH_TABLE(g MATCH (a:Person)-[e]-(a:Company) ...)
   raises error because variable 'a' has conflicting labels.
   Alternative: Return empty result set instead
   Rationale: Logically, a node cannot be both Person AND Company,
   so 0 rows is the correct result. Consistent with standard SQL
   WHERE semantics (impossible condition = 0 rows, not error).


3. TEST COVERAGE
----------------

3.1 Covered Areas

- CREATE PROPERTY GRAPH: Empty graph, VERTEX/EDGE TABLES, KEY, LABEL, PROPERTIES
- ALTER PROPERTY GRAPH: ADD/DROP VERTEX/EDGE, ADD/DROP LABEL, ADD/DROP PROPERTIES
- DROP PROPERTY GRAPH: Basic DROP, IF EXISTS
- RENAME TO / SET SCHEMA: Tested in alter_generic.sql
- OWNER TO: Permission change tests
- GRAPH_TABLE queries: Pattern matching, WHERE, COLUMNS, LATERAL, etc.
- RLS integration: PERMISSIVE/RESTRICTIVE policies, inheritance/partition tables
- Error cases: Type/collation mismatch, non-existent objects, etc.
- psql integration: \dG, \d commands
- pg_dump: Tested in t/002_pg_dump.pl

3.2 Untested Items

- CREATE TEMP PROPERTY GRAPH (Medium priority)
- DROP ... CASCADE (Low priority)
- DROP ... RESTRICT (Low priority)

3.3 Unimplemented Features (No Test Needed)

- Multiple path patterns: Parser supports, execution not implemented
- Quantifier {n,m}: Parser supports, execution not implemented


4. CODE COVERAGE ANALYSIS
-------------------------

4.1 Overall Coverage

90.5% (2,029 / 2,243 lines)

4.2 Coverage by File

propgraphcmds.c:      94.6% (DDL command processing)
rewriteGraphTable.c:  94.6% (GRAPH_TABLE rewrite)
parse_graphtable.c:   97.6% (Graph query parser)
parse_clause.c:       98.0% (FROM clause parser)
ruleutils.c:          85.1% (pg_get_propgraphdef, etc.)
pg_dump.c:            84.0% (Dump/restore)
describe.c:           88.9% (psql \dG command)

4.3 Uncovered Code Risk Assessment

Low Risk (13 items):
- Defensive code, debug functions, unimplemented feature guards

Medium Risk (1 item):
- TEMP graph functionality untested

Conclusion: Most uncovered code consists of error handling, unimplemented
feature guards, and debug utilities. No security or functional risk.


5. RECOMMENDED ADDITIONAL TESTS
-------------------------------

5.1 Black-box Tests (Functional)

Based on feature specification, independent of code structure.

Feature              Test Case                                    Priority
-------------------------------------------------------------------------------
TEMP graph           CREATE TEMP PROPERTY GRAPH                   Medium
TEMP graph           TEMP graph with permanent table reference    Medium
TEMP graph           ALTER ADD permanent table to TEMP graph      Medium
CASCADE              DROP ... CASCADE (dependent view cascade)    Low
RESTRICT             DROP ... RESTRICT (dependent object error)   Low

5.2 White-box Tests (Coverage)

Based on uncovered code paths identified in coverage analysis.

File:Line                          Test Case                      Target Branch
-------------------------------------------------------------------------------
propgraphcmds.c:136,179,254-262    TEMP table in graph creation   Auto TEMP conversion
propgraphcmds.c:1323,1364          ALTER ADD TEMP to perm graph   Error branch
propgraphcmds.c:724-734            CREATE without LABEL clause    Default label else
propgraphcmds.c:1300,1303          ALTER IF EXISTS non-existent   missing_ok branch
propgraphcmds.c:116                CREATE UNLOGGED attempt        UNLOGGED error
execMain.c:1162-1163               DML on graph                   RELKIND_PROPGRAPH
rewriteGraphTable.c:120            Multiple path patterns         Length check
rewriteGraphTable.c:205            Quantifier usage               Quantifier error
ruleutils.c:7939-7963              Complex label VIEW deparsing   T_BoolExpr branch

5.3 Untestable Code (Defensive)

File:Line                          Reason
-------------------------------------------------------------------------------
rewriteGraphTable.c:202            PAREN_EXPR blocked at parser level
rewriteGraphTable.c:297,304,319    Cyclic pattern edge conflict - unreachable
rewriteGraphTable.c:801-818        get_gep_kind_name - error path only
nodeFuncs.c:4585-4596,4755-4774    Walker branches - internal implementation


6. RECOMMENDATIONS
------------------

6.1 Code Review Required (Minor, Decision Needed)

Location: propgraphcmds.c:1632
Issue: FIXME - No single-column index for plppropid
Current: InvalidOid causes sequential scan
Purpose: Check property references across all labels when deleting orphaned properties
Options:
  (a) Add index: Create new single-column index on plppropid
  (b) Use existing: NOT POSSIBLE (cannot specify plpellabelid, need all labels)
  (c) Allow: Rare path (property deletion), sequential scan acceptable

6.2 Documentation Fixes (Minor)

- alter_property_graph.sgml: 3 typos/errors to fix
- queries.sgml: Add clear limitation notes for unimplemented features
- ddl.sgml: Fix "infrastucture" typo

6.3 Test Additions (Optional)

- CREATE TEMP PROPERTY GRAPH tests
- DROP ... CASCADE/RESTRICT tests


7. CONCLUSION
-------------

Test Quality: GOOD

Core functionality is thoroughly tested with comprehensive error case and
security (RLS) integration tests.

The patch is well-implemented within its defined scope. Identified issues
are minor documentation typos and one code decision point (FIXME).
No critical or major issues found.

Recommended actions before commit:
1. Decide on FIXME at propgraphcmds.c:1632 (index vs. allow seq scan)
2. Fix 5 documentation issues
3. Consider adding TEMP graph tests (optional but recommended)

Points for discussion (optional):
4. CREATE PROPERTY GRAPH IF NOT EXISTS support
5. Error vs. 0 rows behavior for non-existent/conflicting labels


Attachment:
- coverage_report.tar.gz (HTML coverage report generated by gcov)


---
End of Report

2025년 12월 30일 (화) PM 8:27, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>님이 작성:
On Tue, Dec 30, 2025 at 3:14 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> On Thu, Dec 18, 2025 at 2:45 PM Ashutosh Bapat
> <ashutosh.bapat.oss@gmail.com> wrote:
> >
> >
> > >
> > > I did another investigation about whether this level of checking is
> > > necessary.  I think according to the letter of the SQL standard, the
> > > typmods must indeed match.  It seems Oracle does not check (the example
> > > mentioned above came from an Oracle source).  But I think it's okay to
> > > keep the check.  In PostgreSQL, it is much less common to write like
> > > varchar(1000).  And we can always consider relaxing it later.
> >
> > +1.
> >
> > Attached patch adds a couple more test statements.
> >
>
> Squashed this into the main patchset.
>
> > >
> > > 2) I had it in my notes to consider whether we should support the colon
> > > syntax for label expressions.  I think we might have talked about that
> > > before.
> > >
> > > I'm now leaning toward not supporting it in the first iteration.  I
> > > don't know that we have fully explored possible conflicts with host
> > > variable syntax in ecpg and psql and the like.  Maybe avoid that for now.
> > >
> >
> > I was aware of ecpg and I vaguely remember we fixed something in ECPG
> > to allow : in MATCH statement. Probably following changes in
> > psqlscan.l and pgc.l
> > -self                   [,()\[\].;\:\+\-\*\/\%\^\<\>\=]
> > +self                   [,()\[\].;\:\|\+\-\*\/\%\^\<\>\=]
> >
> > Those changes add | after : (and not the : itself) so maybe they are
> > not about supporting : . Do you remember what those are?
>
> I reverted those changes from both the files and ran "meson test". I
> did not observe any failure. It seems those changes are not needed.
> But adding them as a separate commit (0004) in case CI bot reveals any
> failures without them.
>
> I noticed that there were no ECPG tests for SQL/PGQ. Added a basic
> test in patch 0003.
>
> >
> > I spotted some examples that use : in ddl.sgml.
> > <programlisting>
> > SELECT customer_name FROM GRAPH_TABLE (myshop MATCH
> > (c:customer)-[:has]->(o:"order" WHERE o.ordered_when = current_date)
> > COLUMNS (c.name AS customer_name));
> > </programlisting>
> >
> > The query demonstrates that one can use label names in a way that will
> > make the pattern look like an English sentence. Replacing : with IS
> > defeats that purpose.
> >
> > As written in that paragraph, the labels serve the purpose of exposing
> > the table with a different logical view (using different label and
> > property names). So we need that paragraph, but I think we should
> > change the example to use IS instead of :. Attached is suggested
> > minimal change, but I am not happy with it. Another possibility is we
> > completely remove that paragraph; I don't think we need to discuss all
> > possible usages the users will come up with.
> >
> > The patch changes one more instance of : by IS. But that's straight forward.
> >
> > In ddl.sgml I noticed a seemingly incomplete sentence
> >    A property graph is a way to represent database contents, instead of using
> >    relational structures such as tables.
> >
> > Represent the contents as what? I feel the complete sentence should be
> > one of the following
> > property graph is a way to represent database contents as a graph,
> > instead of representing those as relational structures OR
> > property graph is another way to represent database contents instead
> > of using relational structures such as tables
> >
> > But I can't figure out what was originally intended.
>
>
> 0002 contains some edits to this part of documentation. I think the
> paragraph reads better than before. Let me know what you think.
>
> Please let me know which of 0002 to 0004 look good to you. I will
> squash those into the patchset in the next version.

The previous patchset had a whitespace difference in ECPG expected
output files. Fixed in the attached patchset.

--
Best Wishes,
Ashutosh Bapat
Attachment

pgsql-hackers by date:

Previous
From: jian he
Date:
Subject: Re: CREATE TABLE LIKE INCLUDING TRIGGERS
Next
From: Yugo Nagata
Date:
Subject: Re: Fix comments on _bt_skiparray_strat_increment/decrement