Re: SQL Property Graph Queries (SQL/PGQ) - Mailing list pgsql-hackers

From Junwang Zhao
Subject Re: SQL Property Graph Queries (SQL/PGQ)
Date
Msg-id CAEG8a3JCFJ8-TwAZOeDgnjzhxZ4c1vb2RUuMRN=UiCU=6OKw_Q@mail.gmail.com
Whole thread Raw
In response to Re: SQL Property Graph Queries (SQL/PGQ)  (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>)
List pgsql-hackers
Hi Ashutosh and Peter,

On Wed, Mar 12, 2025 at 12:31 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> Thanks
>
> On Tue, Mar 11, 2025 at 7:51 PM Junwang Zhao <zhjwpku@gmail.com> wrote:
>
> >
> > Here is a new version with Amit's fix and my trivial refactors.
> >
> > 0001-0010 is the same as Ashutosh's last email
> > 0011 is Amit's fix of the crash in ExecCheckPermissions
>
> I think that fix is correct and it fixes the crash. I need to think
> more about it - especially whether there are more places where we are
> missing RTE_GRAPH_TABLE. But will do that as time permits.
>
> > 0012 is a trivial fix that remove the test with default collation, or
> > it will fail CI, see[1]
> > 0013-0015 are some trivial fix and refactor, feel free to incorporate
> > or drop when you review them.
> >
> > [1]:
https://api.cirrus-ci.com/v1/artifact/task/5290818690351104/testrun/build/testrun/regress/regress/regression.diffs
> >
>
> Thanks Junwang. I have added your patches to my local repository. Next
> time I post an updated set, I will post it along with your patches.
> Will merge them into the original patches as time permits.
>
> --
> Best Wishes,
> Ashutosh Bapat

Since this PGQ feature won't be in PG 18, I'd like to raise a discussion of
the possibility of implementing the quantifier feature, which I think is a
quite useful feature in the graph database area.

I'll start with a graph definition first.

`Person(id, name, age, sex)` with id as PK
`Knows(id, start_id, end_id, since)` with id as PK, start_id and
end_id FK referencing Person's id

insert into Person values(1, 'A', 31, 'M'), (2, 'B', 30, 'F'), (3,
'C', 33, 'M'), (4, 'D', 31, 'F'), (5, 'E', 32, 'M'), (6, 'F', 33,
'M');
insert into Knows values (1, 1, 2, '2020');  -- A knows B since 2020
insert into Knows values (2, 1, 3, '2021');  -- A knows C since 2021
insert into Knows values (3, 1, 4, '2020');  -- A knows D since 2020
insert into Knows values (4, 2, 4, '2023');  -- B knows D since 2023
insert into Knows values (5, 3, 5, '2022');  -- C knows E since 2022
insert into Knows values (6, 2, 6, '2021');  -- B knows F since 2021
insert into Knows values (7, 4, 6, '2020');  -- D knows F since 2020

Then we create a property graph:

CREATE property graph new_graph
VERTEX TABLES (Person)
EDGE TABLES (Knows);

If we want to find A's non-directly known friends within 3 hops, we can query:

select name from graph_table (new_graph match (a:Person WHERE a.name =
'A') --> (b:Person) --> (c:Person) COLUMNS (c.name))
union
select name from graph_table (new_graph match (a:Person WHERE a.name =
'A') --> (b:Person) -->(c:Person)-->(d:Person) COLUMNS (d.name));

Or if we support quantifier, we can simply the query as:

select name from graph_table (new_graph match (a:Person WHERE a.name =
'A') -->{2,3} (b:Person) COLUMNS (b.name));

In the current design of PostgreSQL, we can rewrite this pattern with
quantifiers to
the union form with some effort.

But what if the pattern is more complicated, for example:

1. select name, since from graph_table (new_graph match (a:Person
WHERE a.name = 'A') -[r:Knows]->{2,3} (b:Person) COLUMNS (b.name,
r.since));
Can we support the r.since column? I guess not, in this case r is a
variable length edge.

2. select name, count from graph_table (new_graph match (a:Person
WHERE a.name = 'A') -[r:Knows]->{2,3} (b:Person) COLUMNS (b.name,
count(r)));
Can we support this count aggregation(this is called horizontal
aggregation in Oracle's pgql)? How can the executor know the length of
the variable length edge?

3. What if the query doesn't specify the Label of edge, and there can
be different edge labels of r, can we easily do the rewrite?

I did some study of the apache age, they have fixed columns for node
labels(id, agtype)
and edge labels(id, source_id, end_id, agtype), agtype is kind of
json. So they can
resolve the above question easily.

Above are just my random thoughts of the quantifier feature, I don't have a copy
of the PGQ standard, so I'd like to hear your opinion about this.

[1] https://pgql-lang.org/spec/1.2/#horizontal-aggregation-using-group-variables

--
Regards
Junwang Zhao



pgsql-hackers by date:

Previous
From: Bernd Helmle
Date:
Subject: Re: Modern SHA2- based password hashes for pgcrypto
Next
From: Tomas Vondra
Date:
Subject: Re: Draft for basic NUMA observability