Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuplesinaccurate. - Mailing list pgsql-hackers
From | David Gould |
---|---|
Subject | Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuplesinaccurate. |
Date | |
Msg-id | 20180301142726.0f276e7a@engels Whole thread Raw |
In response to | Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuplesinaccurate. (Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru>) |
List | pgsql-hackers |
On Thu, 1 Mar 2018 17:25:09 +0300 Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru> wrote: > Well, that sounds reasonable. But the problem with the moving average > calculation remains. Suppose you run vacuum and not analyze. If the > updates are random enough, vacuum won't be able to reclaim all the > pages, so the number of pages will grow. Again, we'll have the same > thing where the number of pages grows, the real number of live tuples > stays constant, and the estimated reltuples grows after each vacuum run. I agree VACUUM's moving average may be imperfect, but the rationale makes sense and I don't have a plan to improve it now. This patch only intends to improve the behavior of ANALYZE by using the estimated row density time relpages to get reltuples. It does not change VACUUM. The problem with the moving average for ANALYZE is that it prevents ANALYZE from changing the reltuples estimate enough for large tables. Consider this based on the test setup from the patch: create table big as select id*p, ten, hun, thou, tenk, lahk, meg, padding from reltuples_test, generate_series(0,9) g(p); -- SELECT 100000000 alter table big set (autovacuum_enabled=false); select count(*) from big; -- count -- 100000000 select reltuples::int, relpages from pg_class where relname = 'big'; -- reltuples | relpages -- 0 | 0 analyze verbose big; -- INFO: analyzing "public.big" -- INFO: "big": scanned 30000 of 1538462 pages, containing 1950000 live rows and 0 dead rows; -- 30000 rows in sample, 100000030 estimated total rows select reltuples::int, relpages from pg_class where relname = 'big'; -- reltuples | relpages -- 100000032 | 1538462 delete from big where ten > 1; -- DELETE 80000000 select count(*) from big; -- count -- 20000000 select reltuples::int, relpages from pg_class where relname = 'big'; -- reltuples | relpages -- 100000032 | 1538462 analyze verbose big; -- INFO: analyzing "public.big" -- INFO: "big": scanned 30000 of 1538462 pages, containing 388775 live rows and 1561225 dead rows; -- 30000 rows in sample, 98438807 estimated total rows select reltuples::int, relpages from pg_class where relname = 'big'; reltuples | relpages 98438808 | 1538462 select count(*) from big; -- count -- 20000000 analyze verbose big; -- INFO: analyzing "public.big" -- INFO: "big": scanned 30000 of 1538462 pages, containing 390885 live rows and 1559115 dead rows; -- 30000 rows in sample, 96910137 estimated total rows select reltuples::int, relpages from pg_class where relname = 'big'; reltuples | relpages 96910136 | 1538462 Table big has 1.5 million pages. ANALYZE samples 30 thousand. No matter how many rows we change in T, ANALYZE can only change the reltuples estimate by old_estimate + new_estimate * (30000/1538462), ie about 1.9 percent. With the patch on this same table we get: select count(*) from big; -- count -- 20000000 select reltuples::int, relpages from pg_class where relname = 'big'; reltuples | relpages 96910136 | 1538462 analyze verbose big; -- INFO: analyzing "public.big" -- INFO: "big": scanned 30000 of 1538462 pages, containing 390745 live rows and 1559255 dead rows; -- 30000 rows in sample, 20038211 estimated total rows select reltuples::int, relpages from pg_class where relname = 'big'; -- reltuples | relpages -- 20038212 | 1538462 -dg -- David Gould daveg@sonic.net If simplicity worked, the world would be overrun with insects.
pgsql-hackers by date: