Home > mailing lists

Re: PATCH: add support for IN and @> in functional-dependencystatistics use - Mailing list pgsql-hackers

From	Dean Rasheed
Subject	Re: PATCH: add support for IN and @> in functional-dependencystatistics use
Date	March 12, 2020 10:25:41
Msg-id	CAEZATCXnQtw-ZytE+rp1Uby9KMs0hWim-unMa-nMdmS5MMgtbQ@mail.gmail.com Whole thread
In response to	Re: PATCH: add support for IN and @> in functional-dependencystatistics use (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses	Re: PATCH: add support for IN and @> in functional-dependencystatistics use
List	pgsql-hackers

Tree view

[ For the sake of the archives, some of the discussion on the other
thread [1-3] should really have been on this thread. ]

On Sun, 2 Feb 2020 at 18:41, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>
> I think the challenge here is in applying the functional dependency
> computed for the whole array to individual elements. I'm not sure we can
> do that.
>
> For example, with a table like this:
>
>    CREATE TABLE t (a int, b int[]);
>    CREATE STATISTICS s (dependencies) ON a, b FROM t;
>
> Let's say the functional dependency is "perfect" i.e. has strength 1.0.
> But that only tells us dependency for complete array values, we don't
> know how much information we gain by knowledge of subset of the values.
>

The more I think about this example, the more I think this is really
just a special case of the more general problem of compatibility of
clauses. Once you add support for IN (...) clauses, any query of the
form

  SELECT ... WHERE (any clauses on col a) AND (any clauses on col b)

can be recast as

  SELECT ... WHERE a IN (...) AND b IN (...)

so any counter-example with bad estimates produced with a query in the
first form can also be written in the second form.

I think we should really be thinking in terms of making a strong
functional dependency (a => b) applicable generally to queries in the
first form, which will work well if the clauses on b are compatible
with those on b, but not if they're incompatible. However, that's not
so very different from the current state without extended stats, which
assumes independence, and will return poor estimates if the
columns/clauses aren't independent.

So I'd be tempted to apply a tidied up version of the patch from [3],
and then lift all restrictions from dependency_is_compatible_clause(),
other than the requirement that the clause refer to a single variable.

Regards,
Dean

[1] https://www.postgresql.org/message-id/CAEZATCXaNFZyOhR4XXAfkvj1tibRBEjje6ZbXwqWUB_tqbH%3Drw%40mail.gmail.com
[2] https://www.postgresql.org/message-id/20200309181915.5lxhuw2qxoihfoqo%40development
[3] https://www.postgresql.org/message-id/CAEZATCUic8PwhTnexC%2BUx-Z_e5MhWD-8jk%3DJ1MtnVW8TJD%2BVHw%40mail.gmail.com

pgsql-hackers by date:

From: "曾文旌(义从)"
Date: 12 March 2020, 10:06:19
Subject: Re: [Proposal] Global temporary tables

From: Julien Rouhaud
Date: 12 March 2020, 10:31:15
Subject: Re: Planning counters in pg_stat_statements (using pgss_store)

Re: PATCH: add support for IN and @> in functional-dependencystatistics use - Mailing list pgsql-hackers

Previous

Next