After VACUUM, statistics become skewed - Mailing list pgsql-admin

From Robert.Farrugia@go.com.mt
Subject After VACUUM, statistics become skewed
Date
Msg-id 20030521150018.0CBB1924EC2@developer.postgresql.org
Whole thread Raw
Responses Re: After VACUUM, statistics become skewed  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-admin

I have noticed the following problem on various large tables for certain queries.

Given a table around 4GB in size containing millions of records, before vacuuming the following query used the correct index (mo_200302_called_idx) which is built on answertime and callednumber_type.

Query is as follows:
select answertime::date, count(*), sum(callduration) from mobileorig_200302 where answertime between '2003/2/3 00:00:00' and '2003/2/14 23:59:59' and normal(dialleddigits_value) = '50043992' and callednumber_type in ('P', 'M') group by answertime::date

After doing a database wide vacuum full analyze, the above query is starting to use an incorrect index i.e. the callingnumber_type (which uses answertime and callingnumber_type fields).  This eventually slows down the system since the query takes more time to finish.

NOTICE:  QUERY PLAN:

Aggregate  (cost=9218923.08..9218935.56 rows=166 width=12)
  ->  Group  (cost=9218923.08..9218927.24 rows=1664 width=12)
        ->  Sort  (cost=9218923.08..9218923.08 rows=1664 width=12)
              ->  Index Scan using mo_200302_calling_idx on mobileorig_200302  (cost=0.00..9218834.06 rows=1664 width=12)

EXPLAIN


I managed to track down the problem to this.  If I used only one callednumber_type, i.e. the query becomes
select answertime::date, count(*), sum(callduration) from mobileorig_200302 where answertime between '2003/2/3 00:00:00' and '2003/2/14 23:59:59' and normal(dialleddigits_value) = '50043992' and callednumber_type in ('M') group by answertime::date

the query planner uses the correct index, while using more than one type, it skews up.

NOTICE:  QUERY PLAN:
Aggregate  (cost=137870.25..137871.57 rows=18 width=12)
  ->  Group  (cost=137870.25..137870.69 rows=177 width=12)
        ->  Sort  (cost=137870.25..137870.25 rows=177 width=12)
              ->  Index Scan using mo_200302_called_idx on mobileorig_200302  (cost=0.00..137863.66 rows=177 width=12)
EXPLAIN

Anyone else encountered something similar ? Any ideas on what is happening and if is solvable ?  Usually by dropping the table and reloading it from disk may solve the problem (it may take more than one try), but this is becoming unpractical due to the amount of data to restore each time.  I'll try re-indexing the tables and see if something happens.  Unfortunately this also happens on tables which have not been changed for a very long time and have also been re-indexed so I'm not confident on this.

The system runs on postgres 7.2.3.

Regards
Robert

pgsql-admin by date:

Previous
From: Tom Lane
Date:
Subject: Re: Logfile removal
Next
From: Tom Lane
Date:
Subject: Re: After VACUUM, statistics become skewed