Re: MCV lists for highly skewed distributions - Mailing list pgsql-hackers

From Robert Haas
Subject Re: MCV lists for highly skewed distributions
Date
Msg-id CA+TgmoaqvtPa+YwU6iHXS7LPk_Z1fFxYjmDdviunatq46w7EqA@mail.gmail.com
Whole thread Raw
In response to Re: MCV lists for highly skewed distributions  (Dean Rasheed <dean.a.rasheed@gmail.com>)
Responses Re: MCV lists for highly skewed distributions  (Dean Rasheed <dean.a.rasheed@gmail.com>)
List pgsql-hackers
On Wed, Feb 7, 2018 at 3:51 AM, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> Thanks for testing. I agree, this new algorithm seems to stand up
> pretty well in the testing I've done too. One thing about it that can
> be tuned is the cutoff threshold for the relative standard error -- I
> chose 10% completely arbitrarily, but it could just as easily be 20%,
> 30% or even higher (in the patch s/0.01/0.04/ or s/0.01/0.09).

Maybe it'd be worth having a separate GUC for this, and a reloption to
override the GUC.  It seems to me that it would be very reasonable to
want to separately control (a) how much sampling you want to do and
(b) how aggressively you want to be about including things in the MCV
list.  Of course, even if we do that, it doesn't excuse us from
needing to set a good default value.  And it might not be necessary to
do the extra work for this.

Looking at your data, it definitely seems like 10% would be too strict
-- if I'm reading this correctly, with a stats target in the 10-50
range, your normally-distributed table gets no MCVs at all, rather
than a number equal to the statistics target.  That doesn't seem good.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: [HACKERS] path toward faster partition pruning
Next
From: Ashutosh Bapat
Date:
Subject: Re: [HACKERS] path toward faster partition pruning