Re: Using multiple extended statistics for estimates - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: Using multiple extended statistics for estimates |
Date | |
Msg-id | 20191106195440.wtxsyhiwdidl37hv@development Whole thread Raw |
In response to | Using multiple extended statistics for estimates (Tomas Vondra <tomas.vondra@2ndquadrant.com>) |
Responses |
Re: Using multiple extended statistics for estimates
|
List | pgsql-hackers |
On Mon, Oct 28, 2019 at 04:20:48PM +0100, Tomas Vondra wrote: >Hi, > >PostgreSQL 10 introduced extended statistics, allowing us to consider >correlation between columns to improve estimates, and PostgreSQL 12 >added support for MCV statistics. But we still had the limitation that >we only allowed using a single extended statistics per relation, i.e. >given a table with two extended stats > > CREATE TABLE t (a int, b int, c int, d int); > CREATE STATISTICS s1 (mcv) ON a, b FROM t; > CREATE STATISTICS s2 (mcv) ON c, d FROM t; > >and a query > > SELECT * FROM t WHERE a = 1 AND b = 1 AND c = 1 AND d = 1; > >we only ever used one of the statistics (and we considered them in a not >particularly well determined order). > >This patch addresses this by using as many extended stats as possible, >by adding a loop to statext_mcv_clauselist_selectivity(). In each step >we pick the "best" applicable statistics (in the sense of covering the >most attributes) and factor it into the oveall estimate. > >All this happens where we'd originally consider applying a single MCV >list, i.e. before even considering the functional dependencies, so >roughly like this: > > while () > { > ... apply another MCV list ... > } > > ... apply functional dependencies ... > > >I've both in the loop, but I think that'd be wrong - the MCV list is >expected to contain more information about individual values (compared >to functional deps, which are column-level). > Here is a slightly polished v2 of the patch, the main difference being that computing clause_attnums was moved to a separate function. This is a fairly simple patch, and it's not entirely new functionality (applying multiple statistics was part of the very first patch seris, although of course in a very different form). So unless there are objections, I'd like to get this committed sometime next week. There's room for improvement, of course, for example when handling overlapping statistics. Consider a table with columns (a,b,c) and two extended statistics on (a,b) and (b,c), and query with one clause per column SELECT * FROM t WHERE a = 1 AND b = 1 AND c = 1 In this case the patch does not help, because we apply (a,b) and then we have just a single clause remaining. What we could do is still apply the (b,c) statistic, using the already-estimated clause on b as a condition. So essentially we'd compute P(a=1 && b=1) * P(c=1 | b=1) But that'll require larger changes, and I see it as an evolution of the current patch. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
pgsql-hackers by date: