Re: More stable query plans via more predictable column statistics - Mailing list pgsql-hackers

From Tom Lane
Subject Re: More stable query plans via more predictable column statistics
Msg-id 31801.1459545281@sss.pgh.pa.us
In response to Re: More stable query plans via more predictable column statistics  ("Shulgin, Oleksandr" <oleksandr.shulgin@zalando.de>)
Responses Re: More stable query plans via more predictable column statistics
List pgsql-hackers
"Shulgin, Oleksandr" <oleksandr.shulgin@zalando.de> writes:
> Alright.  I'm attaching the latest version of this patch split in two
> parts: the first one is NULLs-related bugfix and the second is the
> "improvement" part, which applies on top of the first one.

I've applied the first of these patches, broken into two parts: first
because it seemed like there were two issues, and second because Tomas
deserved primary credit for one part, i.e., realizing we were using the
Haas-Stokes formula wrong.

As for the other part, I committed it with one non-cosmetic change:
I do not think it is right to omit "too wide" values when considering
the threshold for MCVs.  As submitted, the patch was inconsistent on
that point anyway since it did it differently in compute_distinct_stats
and compute_scalar_stats.  But the larger picture here is that we define
the MCV population to exclude nulls, so it's reasonable to consider a
value as an MCV even if it's greatly outnumbered by nulls.  There is
no such exclusion for "too wide" values; those things are just an
implementation limitation in analyze.c, not something that is part of
the pg_statistic definition.  If there are a lot of "too wide" values
in the sample, we don't know whether any of them are duplicates, but
we do know that the frequencies of the normal-width values have to be
discounted appropriately.

Haven't looked at 0002 yet.
        regards, tom lane


