Hi,
Attached is a WIP/PoC fix addressing the O(N^2) behavior in ANALYZE with
high statistics target values. It needs more work, but it's good enough to
show some measurements.
For the benchmark, I've created a simple two-column table, with an MCV
statistics object on those two columns:
CREATE TABLE t (a int, b int);
CREATE STATISTICS s (mcv) ON a,b FROM t;
and then loaded data sets with different numbers of random combinations,
determined by the number of distinct values in each column. For example,
with 10 distinct values per column you get ~100 combinations:
INSERT INTO t
SELECT 10*random(), 10*random() FROM generate_series(1,3e6);
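The other rows of the results below scale the multiplier in the same way;
a hypothetical variant for the 1000-values case would be:

INSERT INTO t
SELECT 1000*random(), 1000*random() FROM generate_series(1,3e6);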
The 3M row count is picked because that's the ANALYZE sample size with
statistics target 10000 (ANALYZE samples 300 rows per target unit, so
300 * 10000 = 3M).
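Each cell of the results can be reproduced along these lines in psql (a
minimal sketch, assuming the target is set via default_statistics_target):

\timing on
SET default_statistics_target = 10000;
ANALYZE t;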
The results with different statistics targets look like this (ANALYZE
duration in milliseconds; rows are distinct values per column, columns
are the statistics target):
1) master
  values     100    1000    5000   10000
========================================
      10     103     586    2419    3041
     100     116     789    4778    8934
    1000     116     690    3162  499748
2) patched
  values     100    1000    5000   10000
========================================
      10     113     606    2460    3716
     100     143     711    3371    5231
    1000     156     994    3836    6002
3) comparison (patched / master)
  values     100    1000    5000   10000
========================================
      10    110%    103%    102%    122%
     100    123%     90%     71%     59%
    1000    134%    144%    121%      1%
So the issue with large statistics targets is clearly resolved (the
duration drops from ~500s to just ~6s, i.e. to about 1% of master), but
there is a measurable regression in the other cases. That needs more
investigation and a fix before commit.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services