Re: PATCH: add support for IN and @> in functional-dependencystatistics use - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: PATCH: add support for IN and @> in functional-dependencystatistics use |
Date | |
Msg-id | 20200317153706.z6pdtijr25iyr424@development Whole thread Raw |
In response to | Re: PATCH: add support for IN and @> in functional-dependencystatistics use (Dean Rasheed <dean.a.rasheed@gmail.com>) |
Responses |
Re: PATCH: add support for IN and @> in functional-dependencystatistics use
(Dean Rasheed <dean.a.rasheed@gmail.com>)
Re: PATCH: add support for IN and @> in functional-dependencystatistics use (Tomas Vondra <tomas.vondra@2ndquadrant.com>) |
List | pgsql-hackers |
On Tue, Mar 17, 2020 at 12:42:52PM +0000, Dean Rasheed wrote: >On Sat, 14 Mar 2020 at 18:45, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: >> >> I realized there's one more thing that probably needs discussing. >> Essentially, these two clause types are the same: >> >> a IN (1, 2, 3) >> >> (a = 1 OR a = 2 OR a = 3) >> >> but with 8f321bd1 we only recognize the first one as compatible with >> functional dependencies. It was always the case that we estimated those >> two clauses a bit differently, but the differences were usually small. >> But now that we recognize IN as compatible with dependencies, the >> difference may be much larger, which bugs me a bit ... >> >> So I wonder if we should recognize the special form of an OR clause, >> with all Vars referencing to the same attribute etc. and treat this as >> supported by functional dependencies - the attached patch does that. >> MCV lists there's already no difference because OR clauses are >> supported. >> > >Makes sense, and the patch looks straightforward enough. > >> The question is whether we want to do this, and whether we should also >> teach the per-column estimates to recognize this special case of IN >> clause. > >I'm not convinced about that second part though. I'd say that >recognising the OR clause for functional dependencies is sufficient to >prevent the large differences in estimates relative to the equivalent >IN clauses. The small differences between the way that OR and IN >clauses are handled have always been there, and I think that changing >that is out of scope for this work. > Not sure. I think the inconsistency between plan and extended stats may be a bit surprising, but I agree that issue may be negligible. >The other thing that I'm still concerned about is the possibility of >returning estimates with P(a,b) > P(a) or P(b). I think that such a >thing becomes much more likely with the new types of clause supported >here, because they now allow multiple values from each column, where >before we only allowed one. I took another look at the patch I posted >on the other thread, and I've convinced myself that it's correct. >Attached is an updated version, with some cosmetic tidying up and now >with some additional regression tests. > Yeah, I agree that's something we need to fix. Do you plan to push the fix, or do you want me to do it? regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
pgsql-hackers by date: