Re: Remove restrictions in recursive query - Mailing list pgsql-hackers
From | Renan Alves Fonseca |
---|---|
Subject | Re: Remove restrictions in recursive query |
Date | |
Msg-id | CAN_p2QjTZ4M+o=5hkpTR1UYv-ga+2+9EuyHY=ho6ib9ys3O_YQ@mail.gmail.com Whole thread Raw |
In response to | Re: Remove restrictions in recursive query (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: Remove restrictions in recursive query
|
List | pgsql-hackers |
On Thu, Mar 27, 2025 at 7:32 PM Robert Haas <robertmhaas@gmail.com> wrote: > > > I'll assume that the silence about allowing GROUP BY means it is not a > > great idea... > > I don't think there's really anything to keep you from doing this -- > just put the grouping operation where you refer to the recursive CTE, > instead of inside the recursive CTE itself. I think allowing it to > appear inside the recursive CTE would be rather confusing -- it's > probably best if the mandatory UNION operation is at the top level. > I don't think I was clear enough about the proposed improvements. I hope this example will clarify. Consider a table representing the edges of a directed acyclic graph. The following lines will create an example of such a table. -- create DAG example \set output_degree 10 \set maxN 20000 create temp table dag(source,target) as (with recursive t1(source,target) as (select 1,1 union select target, target * i from t1, generate_series(2,:output_degree +1) i where target*i< :maxN ) select * from t1 where source!=target ); Then, suppose we want to count the number of distinct paths from the root (1) to all sink nodes. We use the following auxiliary table for the sink nodes: create temp table sink_nodes(node) as (select target from dag except select source from dag); The solution supported in PostgreSQL is the following: with recursive t1(node,path) as (select 1,array[1] union select dag.target, t1.path||dag.target from t1 join dag on t1.node=dag.source ) select count(*) from t1 join sink_nodes using (node) ; This query enumerates every path and probably has something like exponential complexity on the number of edges. It gives these results in my laptop: count --------- 1114163 (1 row) Time: 8121.044 ms (00:08.121) The solution using GROUP BY in the recursive query is the following: with recursive t1(node,nb_path) as (select 1,1::numeric union all (select dag.target, sum(nb_path) from t1 join dag on t1.node=dag.source group by 1) ) select sum(nb_path) from t1 join sink_nodes using (node) ; At every iteration step, we sum the number of paths that arrived at a given node. That is another class of algorithm complexity. The first query cannot scale to larger inputs, while this one can scale. Here is the result: sum --------- 1114163 (1 row) Time: 27.123 ms I hope this example made clear that allowing the GROUP BY in the recursive clause significantly increases the capabilities of doing analytics on graph data, and it is not only syntax sugar. Please let me know if there is another way of handling this problem efficiently without the proposed change. Thanks again for your attention, Renan
pgsql-hackers by date: