Re: Bad plan on a huge table query - Mailing list pgsql-general

From Jeff Janes
Subject Re: Bad plan on a huge table query
Date
Msg-id CAMkU=1w3LAxjq1SLsdCbhQEP4Y2jUyrQAuOBTvjuwaP8p6E16w@mail.gmail.com
Whole thread Raw
In response to Re: Bad plan on a huge table query  (Daniel Cristian Cruz <danielcristian@gmail.com>)
List pgsql-general
On Tue, Mar 26, 2013 at 5:08 AM, Daniel Cristian Cruz <danielcristian@gmail.com> wrote:
Well, I did it:

explain (analyze, buffers) 
select count(*) from turma.aula_confirmacao where 
inicio_aula::DATE BETWEEN DATE_TRUNC('YEAR', CURRENT_TIMESTAMP) AND CURRENT_TIMESTAMP; -- changed name because of a conflict in some queries


That looks reasonable now.  But what changed?


And just to update, this is the actual query and plan:

...
 

The part of this complex query that corresponds to that simple query now looks reasonable, while before it did not.  What changed in the mean time? Whatever it is that changed (a good vacuum analyze?), you probably need to do the same thing on  idx_presenca_3 .


But I see that the difference is not that the estimate moved, but rather that the number of actual rows moved, to be closer to the estimate.  I don't know what to make of that.


I guess that, there is something with estudante_periodo, because there is 24% with only one row and 50% with 5 or less rows on it:

That could be, but I don't really know how to tackle that.  So I'm starting at the bottom level queries that are simple (i.e. just an index scan, no complex joins or filters) and yet are poorly estimated, because those ones are tractable.  Maybe the horrible estimates at the top are just the cumulation of bad estimates at the bottom (but maybe they are not--but still it seems to make sense the tackle the easy ones first)  
 


> After the refactoring, idx_aula_confirmacao_2 became idx_aula_confirmacao_1:

select * from pg_stats where tablename = 'idx_aula_confirmacao_1';


Alas, this information is not so interesting anymore, because now the query is getting the estimate at this step pretty close.  It is back when the estimate was so horrible that this info would be interesting.


 
-[ RECORD 1 ]----------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
schemaname             | turma
tablename              | idx_aula_confirmacao_1
attname                | inicio_aula
inherited              | f
null_frac              | 0.996792

But, with the null_frac so high, a partial index for "where ... not null" might be a good idea.  I doubt it would make the query change the execution plan, but it would at least take up less disk space and so be more cacheable.

Cheers,

Jeff

pgsql-general by date:

Previous
From: Misa Simic
Date:
Subject: Re: Money casting too liberal?
Next
From: Tore Halvorsen
Date:
Subject: Problem with pg_basebackup and streaming replication. (9.2.3 / win64)