Re: PATCH: add support for IN and @> in functional-dependencystatistics use - Mailing list pgsql-hackers

From Dean Rasheed
Subject Re: PATCH: add support for IN and @> in functional-dependencystatistics use
Date
Msg-id CAEZATCXnQtw-ZytE+rp1Uby9KMs0hWim-unMa-nMdmS5MMgtbQ@mail.gmail.com
Whole thread Raw
In response to Re: PATCH: add support for IN and @> in functional-dependencystatistics use  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: PATCH: add support for IN and @> in functional-dependencystatistics use  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
[ For the sake of the archives, some of the discussion on the other
thread [1-3] should really have been on this thread. ]

On Sun, 2 Feb 2020 at 18:41, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>
> I think the challenge here is in applying the functional dependency
> computed for the whole array to individual elements. I'm not sure we can
> do that.
>
> For example, with a table like this:
>
>    CREATE TABLE t (a int, b int[]);
>    CREATE STATISTICS s (dependencies) ON a, b FROM t;
>
> Let's say the functional dependency is "perfect" i.e. has strength 1.0.
> But that only tells us dependency for complete array values, we don't
> know how much information we gain by knowledge of subset of the values.
>

The more I think about this example, the more I think this is really
just a special case of the more general problem of compatibility of
clauses. Once you add support for IN (...) clauses, any query of the
form

  SELECT ... WHERE (any clauses on col a) AND (any clauses on col b)

can be recast as

  SELECT ... WHERE a IN (...) AND b IN (...)

so any counter-example with bad estimates produced with a query in the
first form can also be written in the second form.

I think we should really be thinking in terms of making a strong
functional dependency (a => b) applicable generally to queries in the
first form, which will work well if the clauses on b are compatible
with those on b, but not if they're incompatible. However, that's not
so very different from the current state without extended stats, which
assumes independence, and will return poor estimates if the
columns/clauses aren't independent.

So I'd be tempted to apply a tidied up version of the patch from [3],
and then lift all restrictions from dependency_is_compatible_clause(),
other than the requirement that the clause refer to a single variable.

Regards,
Dean

[1] https://www.postgresql.org/message-id/CAEZATCXaNFZyOhR4XXAfkvj1tibRBEjje6ZbXwqWUB_tqbH%3Drw%40mail.gmail.com
[2] https://www.postgresql.org/message-id/20200309181915.5lxhuw2qxoihfoqo%40development
[3] https://www.postgresql.org/message-id/CAEZATCUic8PwhTnexC%2BUx-Z_e5MhWD-8jk%3DJ1MtnVW8TJD%2BVHw%40mail.gmail.com



pgsql-hackers by date:

Previous
From: "曾文旌(义从)"
Date:
Subject: Re: [Proposal] Global temporary tables
Next
From: Julien Rouhaud
Date:
Subject: Re: Planning counters in pg_stat_statements (using pgss_store)