Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate. - Mailing list pgsql-hackers
From | David Gould
Subject | Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.
Date |
Msg-id | 20180313001440.581b6662@engels
In response to | Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate. (Tom Lane <tgl@sss.pgh.pa.us>)
Responses | Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.
List | pgsql-hackers
On Mon, 12 Mar 2018 12:21:34 -0400 Tom Lane <tgl@sss.pgh.pa.us> wrote:

> I wrote:
> > Maybe this type of situation is an argument for trusting an ANALYZE-based
> > estimate more than the VACUUM-based estimate. I remain uncomfortable with
> > that in cases where VACUUM looked at much more of the table than ANALYZE
> > did, though. Maybe we need some heuristic based on the number of pages
> > actually visited by each pass?
>
> I looked into doing something like that. It's possible, but it's fairly
> invasive; there's no clean way to compare those page counts without
> altering the API of acquire_sample_rows() to pass back the number of pages
> it visited. That would mean a change in FDW-visible APIs. We could do
> that, but I don't want to insist on it when there's nothing backing it up
> except a fear that *maybe* ANALYZE's estimate will be wrong often enough
> to worry about.
>
> So at this point I'm prepared to go forward with your patch, though not
> to risk back-patching it. Field experience will tell us if we need to
> do more. I propose the attached cosmetic refactoring, though.

I like the refactor. Making vac_estimate_reltuples() specific to the special
case of VACUUM, and keeping the normal ANALYZE case in the analyze code
itself, seems like an improvement overall.

If it helps, I have been experimenting with your thought experiment (update
the first 20% of the rows, then delete 50% of those) to try to trick ANALYZE.
I built test scripts and generated data, and found that the current system is
usually about 8% to 10% off on reltuples after the VACUUM. ANALYZE then moves
it toward the true count only very slowly. With the patch, ANALYZE nails it
immediately.

To see how this is affected by the relationship between table size and sample
size, I created another test setup just to iterate ANALYZE runs and compare
the results to the true count (a sketch of this kind of test case follows
below). After a couple thousand ANALYZE runs with various table mutations,
table sizes up to 100 M rows (1.8 M pages), and default_statistics_target
values ranging from 1 to 1000, I am pretty confident that we can get 2%
accuracy for any size table even with the old statistics target. The table
size does not really matter much; the error drops and clusters more tightly
as the sample size increases, but past a sample of 10000 or so it is well
into diminishing returns.
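For concreteness, the shape of these test cases is roughly the following. This
is only an illustrative sketch, not the actual scripts; the table, column
names, and row counts are made up, and the real runs vary the table size and
statistics target across many combinations.

  -- Build a table and establish a baseline estimate.
  CREATE TABLE reltuples_test (id bigint, pad text);
  INSERT INTO reltuples_test
      SELECT i, repeat('x', 100) FROM generate_series(1, 10000000) AS i;
  VACUUM ANALYZE reltuples_test;

  -- The "trick ANALYZE" mutation: update the first 20% of the rows,
  -- then delete 50% of those, and vacuum.
  UPDATE reltuples_test SET pad = pad WHERE id <= 2000000;
  DELETE FROM reltuples_test WHERE id <= 1000000;
  VACUUM reltuples_test;

  -- Re-analyze and compare the estimate against the true count.
  ANALYZE reltuples_test;
  SELECT c.reltuples::bigint AS estimated,
         (SELECT count(*) FROM reltuples_test) AS actual
    FROM pg_class c
   WHERE c.relname = 'reltuples_test';

Repeating the final ANALYZE-and-compare step over and over is what produces
error numbers like the ones summarized below.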
Summary of 100 ANALYZE runs for each line below. Errors and sample fraction
are in percent for easy reading.

                                                /------------ percent -------------/
  testcase  | Mrows | stats |  pages  | sample | fraction | maxerr | avgerr | stddev
 -----------+-------+-------+---------+--------+----------+--------+--------+--------
 first-last |    10 |     1 |  163935 |    300 | 0.001830 | 6.6663 | 2.3491 | 2.9310
 first-last |    10 |     3 |  163935 |    900 | 0.005490 | 3.8886 | 1.2451 | 1.5960
 first-last |    10 |    10 |  163935 |   3000 | 0.018300 | 2.8337 | 0.7539 | 0.9657
 first-last |    10 |    33 |  163935 |   9900 | 0.060390 | 1.4903 | 0.3723 | 0.4653
 first-last |    10 |   100 |  163935 |  30000 | 0.182999 | 0.6580 | 0.2221 | 0.2707
 first-last |    10 |   333 |  163935 |  99900 | 0.609388 | 0.1960 | 0.0758 | 0.0897
 first-last |   100 |     1 | 1639345 |    300 | 0.000183 | 8.7500 | 2.2292 | 2.8685
 first-last |   100 |     3 | 1639345 |    900 | 0.000549 | 5.4166 | 1.1695 | 1.5431
 first-last |   100 |    10 | 1639345 |   3000 | 0.001830 | 1.7916 | 0.6258 | 0.7593
 first-last |   100 |    33 | 1639345 |   9900 | 0.006039 | 1.8182 | 0.4141 | 0.5433
 first-last |   100 |   100 | 1639345 |  30000 | 0.018300 | 0.9417 | 0.2464 | 0.3098
 first-last |   100 |   333 | 1639345 |  99900 | 0.060939 | 0.4642 | 0.1206 | 0.1542
 first-last |   100 |  1000 | 1639345 | 300000 | 0.183000 | 0.2192 | 0.0626 | 0.0776
 un-updated |    10 |     1 |  180328 |    300 | 0.001664 | 7.9259 | 2.2845 | 2.7806
 un-updated |    10 |     3 |  180328 |    900 | 0.004991 | 4.2964 | 1.2923 | 1.5990
 un-updated |    10 |    10 |  180328 |   3000 | 0.016636 | 2.2593 | 0.6734 | 0.8271
 un-updated |    10 |    33 |  180328 |   9900 | 0.054900 | 0.9260 | 0.3305 | 0.3997
 un-updated |    10 |   100 |  180328 |  30000 | 0.166364 | 1.0162 | 0.2024 | 0.2691
 un-updated |    10 |   333 |  180328 |  99900 | 0.553991 | 0.2058 | 0.0683 | 0.0868
 un-updated |   100 |     1 | 1803279 |    300 | 0.000166 | 7.1111 | 1.8793 | 2.3820
 un-updated |   100 |     3 | 1803279 |    900 | 0.000499 | 3.8889 | 1.0586 | 1.3265
 un-updated |   100 |    10 | 1803279 |   3000 | 0.001664 | 2.1407 | 0.6710 | 0.8376
 un-updated |   100 |    33 | 1803279 |   9900 | 0.005490 | 1.1728 | 0.3779 | 0.4596
 un-updated |   100 |   100 | 1803279 |  30000 | 0.016636 | 0.6256 | 0.1983 | 0.2551
 un-updated |   100 |   333 | 1803279 |  99900 | 0.055399 | 0.3454 | 0.1181 | 0.1407
 un-updated |   100 |  1000 | 1803279 | 300000 | 0.166364 | 0.1738 | 0.0593 | 0.0724

I also thought about the theory, and I am confident that there really is no
way to trick it. Basically, if there are enough pages that are different to
affect the overall density, say 10% empty or so, there is no way a random
sample larger than a few hundred probes can miss them, no matter how big the
table is (with 10% of the pages different, the chance that 300 random probes
all miss them is roughly 0.9^300, i.e. effectively zero). If there are few
enough pages to "hide" from the sample, then they are so few they don't
matter anyway.

After all this my vote is for back-patching too. I don't see any case where
the patched ANALYZE is, or could be, worse than what we are doing now.

I'm happy to provide my test cases if anyone is interested.

Thanks

-dg

--
David Gould                 daveg@sonic.net
If simplicity worked, the world would be overrun with insects.