extended statistics - functional dependencies vs. MCV lists - Mailing list pgsql-hackers

From Tomas Vondra
Subject extended statistics - functional dependencies vs. MCV lists
Date
Msg-id da3bb397-26df-8a0e-4ff4-0cfacb0ababf@enterprisedb.com
Whole thread Raw
List pgsql-hackers
Hi,

over in the pgsql-general channel, Michael Lewis reported [1] a bit
strange behavior switching between good/bad estimates with extended
statistics.

The crux of the issue is that with statistics containing both MCV and
functional dependencies, we prefer applying the MCV. And functional
dependencies are used only for the remaining clauses on columns not
covered by the MCV list.

This works perfectly fine when the clauses match a MCV item (or even
multiple of them). But if there's no matching MCV item, this may be
problematic - statext_mcv_clauselist_selectivity tries to be smart, but
when the MCV represents only a small fraction of the data set the
results may not be far from just a product of selectivities (as if the
clauses were independent).

So I'm wondering about two things:

1) Does it actually make sense to define extended statistics with both
MCV and functional dependencies? ISTM the MCV part will always filter
all the clauses, before we even try to apply the dependencies.

2) Could we consider the functional dependencies when estimating the
part not covered by the MCV list. Of course, this could only help with
equality clauses (as supported by functional dependencies).


regards


[1]
https://www.postgresql.org/message-id/CAMcsB%3Dy%3D3G_%2Bs_zFYPu2-O6OMWOvOkb3t1MU%3D907yk5RC_RaYw%40mail.gmail.com

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Autovacuum worker doesn't immediately exit on postmaster death
Next
From: David Rowley
Date:
Subject: Re: Hybrid Hash/Nested Loop joins and caching results from subplans