Re: Bad estimate with partial index - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Bad estimate with partial index
Date
Msg-id 34f4218c-f267-a851-d744-75e242661848@enterprisedb.com
Whole thread Raw
In response to Re: Bad estimate with partial index  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses Re: Bad estimate with partial index  (Tom Lane <tgl@sss.pgh.pa.us>)
RE: Bad estimate with partial index  (André Hänsel <andre@webkr.de>)
List pgsql-hackers

On 4/20/22 09:58, Tomas Vondra wrote:
> On 4/19/22 23:08, Tom Lane wrote:
>> I wrote:
>>> it looks like the problem is that the extended stats haven't been used
>>> while forming the estimate of the number of index entries retrieved, so
>>> we overestimate the cost of using this index.
>>> That seems like a bug.  Tomas?
>>
>> I dug into this enough to locate the source of the problem.
>> btcostestimate includes the partial index clauses in what it
>> sends to clauselist_selectivity, but per the comments for
>> add_predicate_to_index_quals:
>>
>>  * Note that indexQuals contains RestrictInfo nodes while the indpred
>>  * does not, so the output list will be mixed.  This is OK for both
>>  * predicate_implied_by() and clauselist_selectivity(), but might be
>>  * problematic if the result were passed to other things.
>>
>> That comment was true when it was written, but it's been falsified
>> by the extended-stats patches, which have added a whole lot of logic
>> in and under clauselist_selectivity that ignores clauses that are not
>> RestrictInfos.
>>
>> While we could perhaps fix this by having add_predicate_to_index_quals
>> add RestrictInfos, I'm inclined to feel that the extended-stats code
>> is in the wrong.  The contract for clauselist_selectivity has always
>> been that it could optimize if given RestrictInfos rather than bare
>> clauses, not that it would fail to work entirely without them.
>> There are probably more places besides add_predicate_to_index_quals
>> that are relying on that.
>>
> 
> Yes, that seems like a fair assessment. I'll look into fixing this, not
> sure how invasive it will get, though.
> 

So, here's a WIP fix that improves the example shared by Andre, and does
not seem to break anything (or at least not any regression test).

The whole idea is that instead of bailing out for non-RestrictInfo case,
it calculates the necessary information for the clause from scratch.
This means relids and pseudoconstant flag, which are checked to decide
if the clause is compatible with extended stats.

But when inspecting how to calculate pseudoconstant, I realized that
maybe that's not really needed. Per distribute_qual_to_rels() we only
set it to 'true' when bms_is_empty(relids), and we already check that
relids is a singleton, so it can't be empty - which means pseudoconstant
can't be true either.


Andre, are you in position to test this fix with your application? Which
Postgres version are you using, actually?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachment

pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: using an end-of-recovery record in all cases
Next
From: "David G. Johnston"
Date:
Subject: Re: Odd off-by-one dirty buffers and checkpoint buffers written