Re: MCV lists for highly skewed distributions - Mailing list pgsql-hackers
From | John Naylor |
---|---|
Subject | Re: MCV lists for highly skewed distributions |
Date | |
Msg-id | CAJVSVGUOw4DWVjTK1eb5jOiYVQK8s9m8MejqpdPoRDmwk9icUg@mail.gmail.com Whole thread Raw |
In response to | Re: MCV lists for highly skewed distributions (Dean Rasheed <dean.a.rasheed@gmail.com>) |
Responses |
Re: MCV lists for highly skewed distributions
|
List | pgsql-hackers |
On 3/19/18, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > As promised, here is a new patch, with comment updates, per John and > Tomas' suggestions, plus the continuity correction, which seemed just > about worthwhile. Great. I'm happy with the behavior of the patch. I've marked it ready for committer. > I repeated these tests with a 2-SD continuity-corrected threshold and > the results fell somewhere between the 2-SD and 2.5-SD cases. My > inclination is to go with that, as something that strikes a reasonable > balance, and represents a distinct improvement over master in a number > of different areas. I also ran some tests with a hand-edited continuity correction last night, but just now got around to looking at the results (queries attached). I ran largely as before, but did 20 analyze runs with each file instead of 10, using the V2 patch, the CC patch, and the V2 patch with 2.5 threshold. I took out a bunch of the uniform tests to save time since they largely behave the same. ----------------- Non-uniform: To reduce noise for analysis, I tried filtering out the least and greatest distinct value before taking the average for the underestimation ratio for most common non-MCV. I also removed results where all 3 patches had a full MCV list. There was still enough noise that it's impossible to discern differences between patches that are very similar. It's not like against HEAD where there were clear differences in this test, so I won't post those numbers. Looking at the number of MCVs, it's a little clearer. With few exceptions, the number either stays the same or decreases slightly going left to right: title | v20_num_mcvs | cc05_num_mcvs | v25_num_mcvs ------------------------------------------+--------------+---------------+-------------- Exp. dist. (N=500k, sd=0.25 (beta)) | 3 | 3 | 3 Exp. dist. (N=500k, sd=0.50 (beta)) | 5 | 6 | 5 Exp. dist. (N=500k, sd=1.00 (beta)) | 9 | 9 | 9 Exp. dist. (N=500k, sd=2.00 (beta)) | 15 | 15 | 15 Exp. dist. (N=500k, sd=3.00 (beta)) | 22 | 21 | 21 Exp. dist. (N=500k, sd=4.00 (beta)) | 27 | 27 | 26 Exp. dist. (N=500k, sd=5.00 (beta)) | 33 | 32 | 30 Exp. dist. (N=500k, sd=10.00 (beta)) | 60 | 58 | 56 Exp. dist. (N=500k, sd=20.00 (beta)) | 100 | 99 | 97 Laplace dist. (N=500k, b=0.25 (lambda)) | 5 | 6 | 5 Laplace dist. (N=500k, b=0.50 (lambda)) | 9 | 8 | 7 Laplace dist. (N=500k, b=1.00 (lambda)) | 15 | 15 | 15 Laplace dist. (N=500k, b=2.00 (lambda)) | 27 | 26 | 26 Laplace dist. (N=500k, b=3.00 (lambda)) | 38 | 37 | 36 Laplace dist. (N=500k, b=4.00 (lambda)) | 48 | 47 | 45 Laplace dist. (N=500k, b=5.00 (lambda)) | 58 | 57 | 55 Laplace dist. (N=500k, b=10.00 (lambda)) | 100 | 99 | 97 MNM (N=2000, peaks=20, sep=20, sd=15) | 100 | 100 | 62 Normal dist. (N=500k, sd=1) | 7 | 7 | 7 Normal dist. (N=500k, sd=2) | 15 | 15 | 14 Normal dist. (N=500k, sd=3) | 21 | 21 | 20 Normal dist. (N=500k, sd=4) | 28 | 26 | 26 Normal dist. (N=500k, sd=5) | 33 | 33 | 33 Normal dist. (N=500k, sd=6) | 39 | 38 | 38 Normal dist. (N=500k, sd=7) | 45 | 45 | 44 Normal dist. (N=500k, sd=8) | 51 | 49 | 49 Normal dist. (N=500k, sd=9) | 56 | 56 | 54 Normal dist. (N=500k, sd=10) | 63 | 62 | 60 Normal dist. (N=500k, sd=15) | 89 | 89 | 86 Pareto dist. (N=500, a=1.00) | 70 | 65 | 56 Pareto dist. (N=500, a=2.00) | 20 | 19 | 17 Pareto dist. (N=500, a=3.00) | 10 | 9 | 9 Pareto dist. (N=500, a=4.00) | 6 | 6 | 6 Pareto dist. (N=500, a=5.00) | 5 | 5 | 4 (34 rows) ----------------- Uniform: Ideally, we want an empty MCV list unless we're sampling a large portion of the table. As Dean mentioned, this is not as important as getting the non-uniform case right. That said, it's nice to be confident we can keep out flotsam. The number generally gets smaller as we go left to right. title | v20_num_mcvs | cc05_num_mcvs | v25_num_mcvs ---------------------------------------+--------------+---------------+-------------- Corr. unif. dist. (N=1000k, Nd=1000) | 13 | 9 | 2 Corr. unif. dist. (N=1000k, Nd=10000) | 33 | 8 | 0 Corr. unif. dist. (N=1000k, Nd=200) | 4 | 3 | 1 Corr. unif. dist. (N=1000k, Nd=2000) | 21 | 10 | 1 Corr. unif. dist. (N=1000k, Nd=500) | 9 | 8 | 1 Corr. unif. dist. (N=1000k, Nd=5000) | 15 | 15 | 2 Corr. unif. dist. (N=100k, Nd=1000) | 12 | 7 | 3 Corr. unif. dist. (N=100k, Nd=10000) | 15 | 2 | 0 Corr. unif. dist. (N=100k, Nd=200) | 4 | 3 | 1 Corr. unif. dist. (N=100k, Nd=2000) | 24 | 11 | 2 Corr. unif. dist. (N=100k, Nd=500) | 6 | 7 | 2 Corr. unif. dist. (N=100k, Nd=5000) | 25 | 6 | 1 Corr. unif. dist. (N=30k, Nd=150) | 100 | 100 | 100 Corr. unif. dist. (N=60k, Nd=1000) | 14 | 6 | 1 Corr. unif. dist. (N=60k, Nd=10000) | 0 | 0 | 0 Corr. unif. dist. (N=60k, Nd=200) | 4 | 4 | 1 Corr. unif. dist. (N=60k, Nd=2000) | 16 | 5 | 1 Corr. unif. dist. (N=60k, Nd=500) | 9 | 5 | 1 Corr. unif. dist. (N=60k, Nd=5000) | 16 | 1 | 0 Ran. unif. dist. (N=1000k, Nd=1000) | 16 | 9 | 2 Ran. unif. dist. (N=1000k, Nd=10000) | 33 | 9 | 1 Ran. unif. dist. (N=1000k, Nd=200) | 4 | 4 | 1 Ran. unif. dist. (N=1000k, Nd=2000) | 19 | 11 | 2 Ran. unif. dist. (N=1000k, Nd=500) | 9 | 7 | 1 Ran. unif. dist. (N=1000k, Nd=5000) | 15 | 15 | 2 Ran. unif. dist. (N=100k, Nd=1000) | 12 | 8 | 3 Ran. unif. dist. (N=100k, Nd=10000) | 16 | 1 | 0 Ran. unif. dist. (N=100k, Nd=200) | 4 | 4 | 1 Ran. unif. dist. (N=100k, Nd=2000) | 25 | 12 | 1 Ran. unif. dist. (N=100k, Nd=500) | 7 | 7 | 1 Ran. unif. dist. (N=100k, Nd=5000) | 25 | 6 | 1 Ran. unif. dist. (N=30k, Nd=150) | 100 | 100 | 100 Ran. unif. dist. (N=60k, Nd=1000) | 14 | 7 | 1 Ran. unif. dist. (N=60k, Nd=10000) | 0 | 0 | 0 Ran. unif. dist. (N=60k, Nd=200) | 5 | 4 | 1 Ran. unif. dist. (N=60k, Nd=2000) | 16 | 5 | 1 Ran. unif. dist. (N=60k, Nd=500) | 9 | 6 | 1 Ran. unif. dist. (N=60k, Nd=5000) | 14 | 1 | 0 (38 rows) -John Naylor
Attachment
pgsql-hackers by date: