Re: More stable query plans via more predictable column statistics - Mailing list pgsql-hackers

From Joel Jacobson
Subject Re: More stable query plans via more predictable column statistics
Date
Msg-id CAASwCXdX4yneNF-J_=7yeQb+YbWGtGSp=LX4nMFLkV9jxPoyHQ@mail.gmail.com
Whole thread Raw
In response to Re: More stable query plans via more predictable column statistics  ("Shulgin, Oleksandr" <oleksandr.shulgin@zalando.de>)
Responses Re: More stable query plans via more predictable column statistics  ("Shulgin, Oleksandr" <oleksandr.shulgin@zalando.de>)
List pgsql-hackers
Hi Alex,

Thanks for excellent research.

I've ran your queries against Trustly's production database and I can
confirm your findings, the results are similar:

WITH ...
SELECT count(1),      min(hist_ratio)::real,      avg(hist_ratio)::real,      max(hist_ratio)::real,
stddev(hist_ratio)::realFROM stats2WHERE histogram_bounds IS NOT NULL; 

-[ RECORD 1 ]----
count  | 2814
min    | 0.193548
avg    | 0.927357
max    | 1
stddev | 0.164134


WHERE distinct_hist < num_hist
-[ RECORD 1 ]----
count  | 624
min    | 0.193548
avg    | 0.672407
max    | 0.990099
stddev | 0.194901


WITH ..
SELECT schemaname ||'.'|| tablename ||'.'|| attname || (CASE inherited
WHEN TRUE THEN ' (inherited)' ELSE '' END) AS columnname,      n_distinct, null_frac,      num_mcv, most_common_vals,
most_common_freqs,     mcv_frac, (mcv_frac / (1 - null_frac))::real AS nonnull_mcv_frac,      distinct_hist, num_hist,
hist_ratio,     histogram_bounds FROM stats2ORDER BY hist_ratioLIMIT 1; 
-[ RECORD 1
]-----+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
columnname        | public.x.y
n_distinct        | 103
null_frac         | 0
num_mcv           | 10
most_common_vals  | {0,1,2,3,4,5,6,7,8,9}
most_common_freqs |
{0.4765,0.141733,0.1073,0.0830667,0.0559667,0.0373333,0.0251,0.0188,0.0141,0.0113667}
mcv_frac          | 0.971267
nonnull_mcv_frac  | 0.971267
distinct_hist     | 18
num_hist          | 93
hist_ratio        | 0.193548387096774
histogram_bounds  |

{10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,12,12,12,12,12,12,12,12,12,12,12,12,13,13,13,13,13,13,13,13,13,13,14,14,14,14,14,15,15,15,15,16,16,16,16,21,23,5074,5437,5830,6049,6496,7046,7784,14629,21285}



On Mon, Jan 18, 2016 at 4:46 PM, Shulgin, Oleksandr
<oleksandr.shulgin@zalando.de> wrote:
> On Wed, Dec 2, 2015 at 10:20 AM, Shulgin, Oleksandr
> <oleksandr.shulgin@zalando.de> wrote:
>>
>> On Tue, Dec 1, 2015 at 7:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>
>>> "Shulgin, Oleksandr" <oleksandr.shulgin@zalando.de> writes:
>>> > This post summarizes a few weeks of research of ANALYZE statistics
>>> > distribution on one of our bigger production databases with some
>>> > real-world
>>> > data and proposes a patch to rectify some of the oddities observed.
>>>
>>> Please add this to the 2016-01 commitfest ...
>>
>>
>> Added: https://commitfest.postgresql.org/8/434/
>
>
> It would be great if some folks could find a moment to run the queries I was
> showing on their data to confirm (or refute) my findings, or to contribute
> to the picture in general.
>
> As I was saying, the queries were designed in such a way that even
> unprivileged user can run them (the results will be limited to the stats
> data available to that user, obviously; and for custom-tailored statstarget
> one still needs superuser to join the pg_statistic table directly).  Also,
> on the scale of ~30k attribute statistics records, the queries take only a
> few seconds to finish.
>
> Cheers!
> --
> Alex
>



--
Joel Jacobson

Mobile: +46703603801
Trustly.com | Newsroom | LinkedIn | Twitter



pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: Allowing to run a buildfarm animal under valgrind
Next
From: Amit Kapila
Date:
Subject: Re: Relation extension scalability