Planner estimates and VACUUM/autovacuum - Mailing list pgsql-performance

From Artur Zając CFI
Subject Planner estimates and VACUUM/autovacuum
Date
Msg-id 033801cf2111$d3518750$79f495f0$@cfi.pl
Whole thread Raw
List pgsql-performance

Hi,

 

I am not sure if it is bug or not but I found some strange behaviour. Maybe it is the same as described on http://www.postgresql.org/message-id/14616.1244317490@sss.pgh.pa.us ?). If yes – I’m sorry for the trouble, but I think that my example is more obvious.

 

Tested on PostgreSQL 9.2.4 and 9.2.6.

 

Console 1:

BEGIN;

DECLARE a CURSOR FOR SELECT * FROM tab;

--- Keep cursor open for disallow full vacuum of tab

 

Console 2:

SELECT count(*) FROM tab;    

---- Result: 3588;

select reltuples from pg_class where relname='table’; 

--- Result: 3588

UPDATE tab SET id=id;

UPDATE tab SET id=id;

UPDATE tab SET id=id;

VACUUM ANALYZE tab;

select reltuples from pg_class where relname='table’;

--- Result: 3588

 

Now wait few seconds J

 

select reltuples from pg_class where relname='table’;

--- Result: 12560

 

VACUUM ANALYZE tab;

select reltuples from pg_class where relname='table’;

--- Result: 3588

 

There is 3588 live records and 12560 live+dead records in table.

That is strange for me. VACUUM updates pg_class.reltuples differently (only live roiws count) than autovacuum (live and dead rows). Why?

 

Also in planning:

 

explain SELECT id FROM tab;

                              QUERY PLAN

----------------------------------------------------------------------

Seq Scan on tab  (cost=0.00..1074.60 rows=12560 width=4)

 

Estimation is done with the use of current pg_class.reltuples value.  This value includes dead rows count after autovacuum so estimation is bad, especially in more complex planner tree, for example:

 

Explain SELECT a.id FROM tab AS a JOIN tab AS b USING (id);

 

                                               QUERY PLAN

---------------------------------------------------------------------------------------------------------

Nested Loop  (cost=0.00..6410.70 rows=12560 width=4)

   ->  Seq Scan on tab a  (cost=0.00..1074.60 rows=12560 width=8)

   ->  Index Only Scan using tab_pkey on tab b  (cost=0.00..0.41 rows=1 width=4)

         Index Cond: (id = a.id)

 

PostgreSQL estimates 12560 records in query result. This is wrong estimation if dead tuples are removed during seq scan or index scan (I suppose that it is).

 

I don’t think that AUTOVACUUM and VACUUM ANALYZE should behave differently L

 

--------------------------------------------------------------------------

Artur Zajac

 

pgsql-performance by date:

Previous
From: Dattaram Porob
Date:
Subject: Re: Increased memory utilization by pgsql backend after upgrade from 9.1.3 to 9.2.6
Next
From: "Pweaver (Paul Weaver)"
Date:
Subject: Postgres Query Plan Live Lock