Re: PATCH: add support for IN and @> in functional-dependencystatistics use - Mailing list pgsql-hackers

From	Tomas Vondra
Subject	Re: PATCH: add support for IN and @> in functional-dependencystatistics use
Date	March 17, 2020 15:37:06
Msg-id	20200317153706.z6pdtijr25iyr424@development
In response to	Re: PATCH: add support for IN and @> in functional-dependencystatistics use (Dean Rasheed <dean.a.rasheed@gmail.com>)
Responses	Re: PATCH: add support for IN and @> in functional-dependencystatistics use Re: PATCH: add support for IN and @> in functional-dependencystatistics use
List	pgsql-hackers

On Tue, Mar 17, 2020 at 12:42:52PM +0000, Dean Rasheed wrote:
>On Sat, 14 Mar 2020 at 18:45, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>>
>> I realized there's one more thing that probably needs discussing.
>> Essentially, these two clause types are the same:
>>
>>    a IN (1, 2, 3)
>>
>>    (a = 1 OR a = 2 OR a = 3)
>>
>> but with 8f321bd1 we only recognize the first one as compatible with
>> functional dependencies. It was always the case that we estimated those
>> two clauses a bit differently, but the differences were usually small.
>> But now that we recognize IN as compatible with dependencies, the
>> difference may be much larger, which bugs me a bit ...
>>
>> So I wonder if we should recognize the special form of an OR clause,
>> with all Vars referencing the same attribute etc., and treat this as
>> supported by functional dependencies - the attached patch does that.
>> For MCV lists there's already no difference, because OR clauses are
>> supported.
>>
>
>Makes sense, and the patch looks straightforward enough.
>
>> The question is whether we want to do this, and whether we should also
>> teach the per-column estimates to recognize this special case of IN
>> clause.
>
>I'm not convinced about that second part though. I'd say that
>recognising the OR clause for functional dependencies is sufficient to
>prevent the large differences in estimates relative to the equivalent
>IN clauses. The small differences between the way that OR and IN
>clauses are handled have always been there, and I think that changing
>that is out of scope for this work.
>

Not sure. I think the inconsistency between plain and extended stats may
be a bit surprising, but I agree that issue may be negligible.
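
To make the special case concrete, here is a minimal Python sketch of the check discussed above: an OR clause qualifies only if every arm is an equality against a constant on the same attribute, in which case it can be treated like the equivalent IN list. The tuple representation and the name or_as_in_list are hypothetical and chosen for illustration only; this is not how the attached patch or the planner represents clauses.

```python
from typing import List, Optional, Sequence, Tuple

# Toy clause: (attribute, operator, constant). This is a hypothetical
# stand-in for the planner's expression nodes, used only for illustration.
Clause = Tuple[str, str, object]


def or_as_in_list(or_args: Sequence[Clause]) -> Optional[Tuple[str, List[object]]]:
    """Return (attribute, values) if the OR arms are all "attr = const"
    on one attribute, i.e. equivalent to "attr IN (values)"; else None."""
    if not or_args:
        return None
    attr = or_args[0][0]
    values: List[object] = []
    for arm_attr, op, const in or_args:
        # Any non-equality arm, or an arm on another attribute,
        # disqualifies the whole OR clause.
        if op != "=" or arm_attr != attr:
            return None
        values.append(const)
    return attr, values


# (a = 1 OR a = 2 OR a = 3) is recognized as a IN (1, 2, 3)
print(or_as_in_list([("a", "=", 1), ("a", "=", 2), ("a", "=", 3)]))
# (a = 1 OR b = 2) mixes attributes, so it is not recognized
print(or_as_in_list([("a", "=", 1), ("b", "=", 2)]))
```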

>The other thing that I'm still concerned about is the possibility of
>returning estimates with P(a,b) > P(a) or P(b). I think that such a
>thing becomes much more likely with the new types of clause supported
>here, because they now allow multiple values from each column, where
>before we only allowed one. I took another look at the patch I posted
>on the other thread, and I've convinced myself that it's correct.
>Attached is an updated version, with some cosmetic tidying up and now
>with some additional regression tests.
>

Yeah, I agree that's something we need to fix. Do you plan to push the
fix, or do you want me to do it?
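
For readers following along, a small numeric sketch of the concern. It assumes the usual dependency formula for a => b with degree f, P(a,b) = P(a) * [f + (1-f) * P(b)], and shows how an IN list on a can push the result above P(b). The clamped variant is only one possible guard, included as an assumption for comparison; it is not taken from this message and may differ from what Dean's attached patch does. The function names are hypothetical.

```python
def dependency_estimate(p_a: float, p_b: float, f: float) -> float:
    """Combined selectivity for dependency a => b with degree f,
    assuming P(a,b) = P(a) * [f + (1 - f) * P(b)]."""
    return f * p_a + (1.0 - f) * p_a * p_b


def clamped_estimate(p_a: float, p_b: float, f: float) -> float:
    """Hypothetical guard (an assumption, not the patch from this thread):
    apply the dependency to min(P(a), P(b)) so that the result cannot
    exceed either per-column selectivity."""
    return f * min(p_a, p_b) + (1.0 - f) * p_a * p_b


# a IN (1, 2, 3) with each value matching about 10% of rows, b = 1
# matching about 1%, and a perfect dependency a => b (f = 1.0).
p_a, p_b, f = 0.3, 0.01, 1.0

plain = dependency_estimate(p_a, p_b, f)
guarded = clamped_estimate(p_a, p_b, f)

print(plain > p_b)               # the unguarded estimate exceeds P(b)
print(guarded <= min(p_a, p_b))  # the guarded one stays within both
```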


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services