Home > mailing lists

Re: PATCH: add support for IN and @> in functional-dependencystatistics use - Mailing list pgsql-hackers

From	Dean Rasheed
Subject	Re: PATCH: add support for IN and @> in functional-dependencystatistics use
Date	March 17, 2020 12:42:52
Msg-id	CAEZATCVV2NM0S=hZ1_qH-9B_psrC+5Tqx2xjmL9kHc9EsACSyA@mail.gmail.com Whole thread
In response to	Re: PATCH: add support for IN and @> in functional-dependencystatistics use (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses	Re: PATCH: add support for IN and @> in functional-dependencystatistics use
List	pgsql-hackers

Tree view

On Sat, 14 Mar 2020 at 18:45, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>
> I realized there's one more thing that probably needs discussing.
> Essentially, these two clause types are the same:
>
>    a IN (1, 2, 3)
>
>    (a = 1 OR a = 2 OR a = 3)
>
> but with 8f321bd1 we only recognize the first one as compatible with
> functional dependencies. It was always the case that we estimated those
> two clauses a bit differently, but the differences were usually small.
> But now that we recognize IN as compatible with dependencies, the
> difference may be much larger, which bugs me a bit ...
>
> So I wonder if we should recognize the special form of an OR clause,
> with all Vars referencing to the same attribute etc. and treat this as
> supported by functional dependencies - the attached patch does that.
> MCV lists there's already no difference because OR clauses are
> supported.
>

Makes sense, and the patch looks straightforward enough.

> The question is whether we want to do this, and whether we should also
> teach the per-column estimates to recognize this special case of IN
> clause.

I'm not convinced about that second part though. I'd say that
recognising the OR clause for functional dependencies is sufficient to
prevent the large differences in estimates relative to the equivalent
IN clauses. The small differences between the way that OR and IN
clauses are handled have always been there, and I think that changing
that is out of scope for this work.

The other thing that I'm still concerned about is the possibility of
returning estimates with P(a,b) > P(a) or P(b). I think that such a
thing becomes much more likely with the new types of clause supported
here, because they now allow multiple values from each column, where
before we only allowed one. I took another look at the patch I posted
on the other thread, and I've convinced myself that it's correct.
Attached is an updated version, with some cosmetic tidying up and now
with some additional regression tests.

Regards,
Dean

Attachment

dependencies_clauselist_selectivity-v2.patch

pgsql-hackers by date:

From: David Steele
Date: 17 March 2020, 12:10:37
Subject: Re: Commitfest 2020-03 Now in Progress

From: Kartyshov Ivan
Date: 17 March 2020, 12:47:54
Subject: Re: [HACKERS] make async slave to wait for lsn to be replayed

Re: PATCH: add support for IN and @> in functional-dependencystatistics use - Mailing list pgsql-hackers

Attachment

Previous

Next