Home > mailing lists

Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuplesinaccurate. - Mailing list pgsql-hackers

From	David Gould
Subject	Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuplesinaccurate.
Date	March 2, 2018 01:27:26
Msg-id	20180301142726.0f276e7a@engels Whole thread Raw
In response to	Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuplesinaccurate. (Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru>)
List	pgsql-hackers

Tree view

On Thu, 1 Mar 2018 17:25:09 +0300
Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru> wrote:

> Well, that sounds reasonable. But the problem with the moving average 
> calculation remains. Suppose you run vacuum and not analyze. If the 
> updates are random enough, vacuum won't be able to reclaim all the 
> pages, so the number of pages will grow. Again, we'll have the same 
> thing where the number of pages grows, the real number of live tuples 
> stays constant, and the estimated reltuples grows after each vacuum run.

I agree VACUUM's moving average may be imperfect, but the rationale makes
sense and I don't have a plan to improve it now. This patch only intends to
improve the behavior of ANALYZE by using the estimated row density time
relpages to get reltuples. It does not change VACUUM.

The problem with the moving average for ANALYZE is that it prevents ANALYZE
from changing the reltuples estimate enough for large tables.

Consider this based on the test setup from the patch:

create table big as select id*p, ten, hun, thou, tenk, lahk, meg, padding
  from reltuples_test,
  generate_series(0,9) g(p);
-- SELECT 100000000
alter table big set (autovacuum_enabled=false);

select count(*) from big;
--    count
--  100000000
select reltuples::int, relpages from pg_class where relname = 'big';
--  reltuples | relpages
--          0 |        0

analyze verbose big;
-- INFO:  analyzing "public.big"
-- INFO:  "big": scanned 30000 of 1538462 pages, containing 1950000 live rows and 0 dead rows;
--        30000 rows in sample, 100000030 estimated total rows

select reltuples::int, relpages from pg_class where relname = 'big';
--  reltuples | relpages 
--  100000032 |  1538462

delete from big where ten > 1;
-- DELETE 80000000
select count(*) from big;
--   count   
--  20000000
select reltuples::int, relpages from pg_class where relname = 'big';
--  reltuples | relpages 
--  100000032 |  1538462

analyze verbose big;
-- INFO:  analyzing "public.big"
-- INFO:  "big": scanned 30000 of 1538462 pages, containing 388775 live rows and 1561225 dead rows;
--        30000 rows in sample, 98438807 estimated total rows

select reltuples::int, relpages from pg_class where relname = 'big';
 reltuples | relpages 
  98438808 |  1538462
select count(*) from big;
--   count   
--  20000000

analyze verbose big;
-- INFO:  analyzing "public.big"
-- INFO:  "big": scanned 30000 of 1538462 pages, containing 390885 live rows and 1559115 dead rows;
--         30000 rows in sample, 96910137 estimated total rows

select reltuples::int, relpages from pg_class where relname = 'big';
 reltuples | relpages 
  96910136 |  1538462

Table big has 1.5 million pages. ANALYZE samples 30 thousand. No matter how
many rows we change in T, ANALYZE can only change the reltuples estimate
by old_estimate + new_estimate * (30000/1538462), ie about 1.9 percent.

With the patch on this same table we get:

select count(*) from big;
--   count
--  20000000
select reltuples::int, relpages from pg_class where relname = 'big';
 reltuples | relpages 
  96910136 |  1538462

analyze verbose big;
-- INFO:  analyzing "public.big"
-- INFO:  "big": scanned 30000 of 1538462 pages, containing 390745 live rows and 1559255 dead rows;
--        30000 rows in sample, 20038211 estimated total rows

select reltuples::int, relpages from pg_class where relname = 'big';
--  reltuples | relpages
--   20038212 |  1538462

-dg

-- 
David Gould                                   daveg@sonic.net
If simplicity worked, the world would be overrun with insects.

pgsql-hackers by date:

From: Dean Rasheed
Date: 02 March 2018, 01:26:48
Subject: Re: MCV lists for highly skewed distributions

From: Andres Freund
Date: 02 March 2018, 01:29:58
Subject: Re: [PATCH] GET DIAGNOSTICS FUNCTION_NAME

Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuplesinaccurate. - Mailing list pgsql-hackers

Previous

Next