Re: Asymmetric partition-wise JOIN - Mailing list pgsql-hackers

From Andrei Lepikhov
Subject Re: Asymmetric partition-wise JOIN
Date
Msg-id 944ed18c-3e7d-42ef-816e-0afc41610e93@postgrespro.ru
Whole thread Raw
In response to Re: Asymmetric partition-wise JOIN  (Alexander Korotkov <aekorotkov@gmail.com>)
List pgsql-hackers
On 1/8/2024 20:56, Alexander Korotkov wrote:
> On Tue, Apr 2, 2024 at 6:07 AM Andrei Lepikhov
> <a.lepikhov@postgrespro.ru> wrote:
> Actually, the idea I tried to express is the combination of #1 and #2:
> to build individual plan for every partition, but consider the 'Common
> Resources'.  Let me explain this a bit more.
Thanks for keeping your eye on it!
> My idea is to introduce a new property for paths selection.
> 3) Usage of common resources.  The common resource can be: hash
> representation of relation, memoize over relation scan, etc.  We can
> exclude the cost of common resource generation from the path cost, but
> keep the reference for the common resource with its generation cost.
> If one path uses more common resources than another path, it could
> cost-dominate another one only if its cheaper together with its extra
> common resources cost.  If one path uses less or equal common
> resources than another, it could normally cost-dominate another one.
The most challenging part for me is the cost calculation, which is 
bonded with estimations of other paths. To correctly estimate the 
effect, we need to remember at least the whole number of paths sharing 
resources.
Also, I wonder if it can cause some corner cases where prediction error 
on a shared resource will cause an even worse situation upstream.
I think we could push off here from an example and a counter-example, 
but I still can't find them.

> However, I understand this is huge amount of work given we have to
> introduce new basic optimizer concepts.  I get that the main
> application of this patch is sharding.  If we have global tables
> residing each shard, we can push down any joins with them.  Given this
> patch gives some optimization for non-sharded case, I think we
> *probably* can accept its concept even that it this optimization is
> obviously not perfect.
Yes, right now sharding is the most profitable case. We can push down 
parts of the plan which references only some common resources: 
FunctionScan, ValueScan, tables which can be proved are existed 
everywhere and provide the same output. But for now it is too far from 
the core code, IMO. - So, I search for cases that can be helpful for a 
single instance.

-- 
regards,
Andrei Lepikhov
Postgres Professional




pgsql-hackers by date:

Previous
From: "Andrey M. Borodin"
Date:
Subject: Re: MultiXact\SLRU buffers configuration
Next
From: Jelte Fennema-Nio
Date:
Subject: Re: gitmaster server problem?