Thread: Planner estimates and VACUUM/autovacuum

Planner estimates and VACUUM/autovacuum

From
Artur Zając
Date:

Hi,

 

I am not sure if it is bug or not but I found some strange behaviour. Maybe it is the same as described on http://www.postgresql.org/message-id/14616.1244317490@sss.pgh.pa.us ?). If yes – I’m sorry for the trouble, but I think that my example is more obvious.

 

Tested on PostgreSQL 9.2.4 and 9.2.6.

 

Console 1:

BEGIN;

DECLARE a CURSOR FOR SELECT * FROM tab;

--- Keep cursor open for disallow full vacuum of tab

 

Console 2:

SELECT count(*) FROM tab;    

---- Result: 3588;

select reltuples from pg_class where relname='table’; 

--- Result: 3588

UPDATE tab SET id=id;

UPDATE tab SET id=id;

UPDATE tab SET id=id;

VACUUM ANALYZE tab;

select reltuples from pg_class where relname='table’;

--- Result: 3588

 

Now wait few seconds J

 

select reltuples from pg_class where relname='table’;

--- Result: 12560

 

VACUUM ANALYZE tab;

select reltuples from pg_class where relname='table’;

--- Result: 3588

 

There is 3588 live records and 12560 live+dead records in table.

That is strange for me. VACUUM updates pg_class.reltuples differently (only live roiws count) than autovacuum (live and dead rows). Why?

 

Also in planning:

 

explain SELECT id FROM tab;

                              QUERY PLAN

----------------------------------------------------------------------

Seq Scan on tab  (cost=0.00..1074.60 rows=12560 width=4)

 

Estimation is done with the use of current pg_class.reltuples value.  This value includes dead rows count after autovacuum so estimation is bad, especially in more complex planner tree, for example:

 

Explain SELECT a.id FROM tab AS a JOIN tab AS b USING (id);

 

                                               QUERY PLAN

---------------------------------------------------------------------------------------------------------

Nested Loop  (cost=0.00..6410.70 rows=12560 width=4)

   ->  Seq Scan on tab a  (cost=0.00..1074.60 rows=12560 width=8)

   ->  Index Only Scan using tab_pkey on tab b  (cost=0.00..0.41 rows=1 width=4)

         Index Cond: (id = a.id)

 

PostgreSQL estimates 12560 records in query result. This is wrong estimation if dead tuples are removed during seq scan or index scan (I suppose that it is).

 

I don’t think that AUTOVACUUM and VACUUM ANALYZE should behave differently L

 

--------------------------------------------------------------------------

Artur Zajac

 

 

Re: Planner estimates and VACUUM/autovacuum

From
Tom Lane
Date:
=?iso-8859-2?Q?Artur_Zaj=B1c?= <azajac@ang.com.pl> writes:
> That is strange for me. VACUUM updates pg_class.reltuples differently (only
> live roiws count) than autovacuum (live and dead rows). Why?

I don't have time to poke into this in detail right now, but I think
that more likely the issue is that ANALYZE might have a different rule
than VACUUM.  autovacuum will fire those operations independently, which
doesn't match what you're doing by hand.

Which is not to say that we shouldn't think about fixing that, but that
it's important to understand the problem clearly first.

            regards, tom lane