Re: When you really want to force a certain join type? - Mailing list pgsql-performance
From | Gunther Schadow |
---|---|
Subject | Re: When you really want to force a certain join type? |
Date | |
Msg-id | c0cf4fbd-a5db-e5dc-935f-399a355c02ee@gusw.net Whole thread Raw |
In response to | Re: When you really want to force a certain join type? (Justin Pryzby <pryzby@telsasoft.com>) |
Responses |
Re: When you really want to force a certain join type?
|
List | pgsql-performance |
Maybe the new parameter in v15 would help. https://www.postgresql.org/docs/15/runtime-config-query.html#GUC-RECURSIVE-WORKTABLE-FACTOR recursive_worktable_factor (floating point) Sets the planner's estimate of the average size of the working table of a recursive query, as a multiple of the estimated size of the initial non-recursive term of the query. This helps the planner choose the most appropriate method for joining the working table to the query's other tables. The default value is 10.0. A smaller value such as 1.0 can be helpful when the recursion has low “fan-out” from one step to the next, as for example in shortest-path queries. Graph analytics queries may benefit from larger-than-default values.
Thanks that's something I will try after I upgraded.
But speaking of such other uses for recursive queries, I can say I have quite a bit of experience of turning graph related "traversal" and search and optimization and classification queries into SQL, in short, computing the transitive closure. And usually I have stayed away from the recursive WITH query and instead set up a start table and then perform the iterative step. And there are two ways to go about it. Say you have a graph, simple nodes and arcs. You want to find all paths through the graph.
Now you can set up start nodes and then extend them at the end by joining the recursive table to the simple arc table and extend your path every time. This is what the WITH RECURSIVE supports. These queries linearly iterate as many times as the length of the longest path.
with recursive arcs as ( select source, target, 1 as distance, ... from ... ), paths as ( select * from arcs union all select a.source, b.target, a.distance + 1 as distance, ... from paths a inner join arcs b on(b.source = a.target) ) select * from paths
But another way is to join paths with paths. It would be this, which I think I have seen postgresql unable to deal with:
with recursive arcs as ( select source, target, 1 as distance, ... from ... ), paths as ( select * from arcs union all select a.source, b.target, a.distance + 1 as distance, ... from paths a inner join paths b on(b.source = a.target) ) select * from paths
So, instead of the recursive union to join back to the fixed table, it joins the recursive table to the recursive table, and the benefit of that is that these queries converge much quicker. Instead of going 10 iterations to find a path of length 10, you go 1 iteration to find all paths of 2 (if distance 1 is the base table of all arcs), then next you find paths of up to 4 then you find paths of up to 8, then 16, 32, ... This converges much faster. I usually do that as follows
create table paths as select source, target, 1 as distance, ... from arcs; prepare rep as insert into paths(source, target, distance, ...) select a.source, b.target, a.distance + b.distance as distance, ... from paths a inner join paths b on(b.source = a.target) except select * from paths; execute rep; execute rep; ...
or instead of the except, in order to minimize distances:
where not exists (select 1 from paths x where x.source = a.source and x.target = a.target and x.distance < a.distance)
I have even done a group by in the recursive step which replaces the paths relation at every iteration (e.g. with only minimal distance paths).
Since this converges so rapidly I often prefer that approach over a recursive union query.
I think in IBM DB2 allowed to join the recursive table with itself. Is this something you want to support at some time?
Also, why even use the RECURSIVE keyword, DB2 didn't need it, and the query analyzer should immediately see the recursion, so no need to have that keyword.
regards,
-Gunther
pgsql-performance by date: