Re: Additional improvements to extended statistics - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Additional improvements to extended statistics
Date
Msg-id 4ba455c6-a0fa-cae7-7bc3-4aa5b6cd11d4@enterprisedb.com
In response to Re: Additional improvements to extended statistics  (Dean Rasheed <dean.a.rasheed@gmail.com>)
List pgsql-hackers
On 12/7/20 5:15 PM, Dean Rasheed wrote:
> On Wed, 2 Dec 2020 at 15:51, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>>
>> The sort of queries I had in mind were things like this:
>>
>>   WHERE (a = 1 AND b = 1) OR (a = 2 AND b = 2)
>>
>> However, the new code doesn't apply the extended stats directly using
>> clauselist_selectivity_or() for this kind of query because there are
>> no RestrictInfos for the nested AND clauses, so
>> find_single_rel_for_clauses() (and similarly
>> statext_is_compatible_clause()) regards those clauses as not
>> compatible with extended stats. So what ends up happening is that
>> extended stats are used only when we descend down to the two AND
>> clauses, and their results are combined using the original "s1 + s2 -
>> s1 * s2" formula. That actually works OK in this case, because there
>> is no overlap between the two AND clauses, but it wouldn't work so
>> well if there was.
>>
>> I'm pretty sure that can be fixed by teaching
>> find_single_rel_for_clauses() and statext_is_compatible_clause() to
>> handle BoolExpr clauses, looking for RestrictInfos underneath them,
>> but I think that should be left for a follow-on patch.
> 
> Attached is a patch doing that, which improves a couple of the
> estimates for queries with AND clauses underneath OR clauses, as
> expected.
> 
> This also revealed a minor bug in the way that the estimates for
> multiple statistics objects were combined while processing an OR
> clause -- the estimates for the overlaps between clauses only apply
> for the current statistics object, so we really have to combine the
> estimates for each set of clauses for each statistics object as if
> they were independent of one another.
> 
> 0001 fixes the multiple-extended-stats issue for OR clauses, and 0002
> improves the estimates for sub-AND clauses underneath OR clauses.
> 
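
To make the quoted point about the "s1 + s2 - s1 * s2" formula concrete, here is a small sketch in Python. It is not PostgreSQL code: the table, the helper names (selectivity, or_independent) and the second, overlapping query are assumptions made up for illustration. It compares the exact selectivity of an OR of two AND clauses with the estimate obtained by combining the two arms as if they were independent.

```python
from fractions import Fraction

def selectivity(rows, pred):
    """Exact fraction of rows satisfying pred."""
    return Fraction(sum(1 for r in rows if pred(r)), len(rows))

def or_independent(s1, s2):
    """Combine two selectivities as if the clauses were independent."""
    return s1 + s2 - s1 * s2

# Hypothetical table: 100 rows of (a, b) with a = b, values 0..9.
rows = [(i % 10, i % 10) for i in range(100)]

# Arms of the OR clause, each an AND of two conditions.
arm1 = lambda r: r[0] == 1 and r[1] == 1
arm2 = lambda r: r[0] == 2 and r[1] == 2   # disjoint from arm1
arm3 = lambda r: r[0] == 1 and r[1] < 5    # overlaps arm1 (assumed example)

s1 = selectivity(rows, arm1)
s2 = selectivity(rows, arm2)
s3 = selectivity(rows, arm3)

# (a = 1 AND b = 1) OR (a = 2 AND b = 2): no overlap between the arms.
disjoint_actual = selectivity(rows, lambda r: arm1(r) or arm2(r))
disjoint_estimate = or_independent(s1, s2)

# (a = 1 AND b = 1) OR (a = 1 AND b < 5): the arms match the same rows.
overlap_actual = selectivity(rows, lambda r: arm1(r) or arm3(r))
overlap_estimate = or_independent(s1, s3)

print("disjoint:", disjoint_actual, "vs estimate", disjoint_estimate)
print("overlap: ", overlap_actual, "vs estimate", overlap_estimate)
```

On this made-up data the disjoint case is estimated at 19/100 against an actual 1/5, which is close. The overlapping case gets the same 19/100 estimate against an actual 1/10, which is the kind of error the quoted text warns about.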

Cool! Thanks for taking the time to investigate and fix those. Both
patches seem fine to me.
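
For readers following along, the combination rule described for 0001 can be sketched roughly as below. This is a Python illustration, not the patch itself: the function name combine_or_across_stats and the input values are hypothetical. It assumes each statistics object has already produced one OR-selectivity for the clauses it covers (with overlaps handled within that object), and that the per-object results are then merged as if they were independent.

```python
from fractions import Fraction
from functools import reduce

def combine_or_across_stats(per_object_sels):
    """Merge per-statistics-object OR selectivities, assuming independence.

    Each element is the selectivity of the OR of the clauses covered by
    one statistics object. Overlap information is only known within an
    object, so across objects the generic s1 + s2 - s1 * s2 rule is used.
    """
    return reduce(lambda acc, s: acc + s - acc * s, per_object_sels, Fraction(0))

# Hypothetical per-object estimates for two statistics objects.
example = combine_or_across_stats([Fraction(1, 5), Fraction(1, 10)])
print(example)
```

Starting the fold from 0 means a single statistics object passes its estimate through unchanged, and an empty list yields 0.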

> These are both quite small patches, that hopefully won't interfere
> with any of the other extended stats patches.
> 

I haven't tried, but they should not interfere with those too much.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


