Re: PATCH: add support for IN and @> in functional-dependencystatistics use - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: PATCH: add support for IN and @> in functional-dependencystatistics use |
Date | |
Msg-id | 20200312173047.stp55sxoj4sipn7q@development Whole thread Raw |
In response to | Re: PATCH: add support for IN and @> in functional-dependencystatistics use (Dean Rasheed <dean.a.rasheed@gmail.com>) |
Responses |
Re: PATCH: add support for IN and @> in functional-dependencystatistics use
|
List | pgsql-hackers |
On Thu, Mar 12, 2020 at 10:25:41AM +0000, Dean Rasheed wrote: >[ For the sake of the archives, some of the discussion on the other >thread [1-3] should really have been on this thread. ] > >On Sun, 2 Feb 2020 at 18:41, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: >> >> I think the challenge here is in applying the functional dependency >> computed for the whole array to individual elements. I'm not sure we can >> do that. >> >> For example, with a table like this: >> >> CREATE TABLE t (a int, b int[]); >> CREATE STATISTICS s (dependencies) ON a, b FROM t; >> >> Let's say the functional dependency is "perfect" i.e. has strength 1.0. >> But that only tells us dependency for complete array values, we don't >> know how much information we gain by knowledge of subset of the values. >> > >The more I think about this example, the more I think this is really >just a special case of the more general problem of compatibility of >clauses. Once you add support for IN (...) clauses, any query of the >form > > SELECT ... WHERE (any clauses on col a) AND (any clauses on col b) > >can be recast as > > SELECT ... WHERE a IN (...) AND b IN (...) > >so any counter-example with bad estimates produced with a query in the >first form can also be written in the second form. > >I think we should really be thinking in terms of making a strong >functional dependency (a => b) applicable generally to queries in the >first form, which will work well if the clauses on b are compatible >with those on b, but not if they're incompatible. However, that's not >so very different from the current state without extended stats, which >assumes independence, and will return poor estimates if the >columns/clauses aren't independent. > I'm sorry, but I don't see how we could do this for arbitrary clauses. I think we could do that for clauses that have equality semantics and reference column values as a whole. So I think it's possible to do this for IN clauses (which is what the first part of the patch does), but I don't think we can do it for the containment operator. I.e. we can do that for WHERE a IN (...) AND b IN (...) but I don't see how we could do that for WHERE a @> (...) AND b @> (...) I don't think the dependency degree gives us any reliable insight into statistical dependency of elements of the values. Or maybe we're just talking about different things? You seem to be talking abotu IN clauses (which I think is doable), but my question was about using functional dependencies to estimate array containment clauses (which I think is not really doable). >So I'd be tempted to apply a tidied up version of the patch from [3], >and then lift all restrictions from dependency_is_compatible_clause(), >other than the requirement that the clause refer to a single variable. > I haven't looked at the patch from [3] closely yet, but you're right P(A & B) <= Min(P(A), P(B)) and the approach you proposed seems reasonable. I don't think how we can just remove all the restriction on clause type - the restriction that dependencies only handle equality-like clauses seems pretty much baked into the dependencies. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
pgsql-hackers by date: