Re: Optimizer choosing smaller index instead of right one - Mailing list pgsql-general

From Tom Lane
Subject Re: Optimizer choosing smaller index instead of right one
Date
Msg-id 18452.1083284132@sss.pgh.pa.us
Whole thread Raw
In response to Optimizer choosing smaller index instead of right one  (Ian Ribas <ian.ribas@digitro.com.br>)
Responses Re: Optimizer choosing smaller index instead of right one
List pgsql-general
Ian Ribas <ian.ribas@digitro.com.br> writes:
> The index I created to try to optimize it is "xie3cham_servico", and
> has all the three columns of the table that are used in the query. But
> the index being used is actually xie2cham_servico (as can be seen
> below) which has only two columns, one of them completely useless for
> the query!  I'm probably missing some setting, but I couldn't figure
> out which.

The optimizer's cost equations do say that a smaller index should be
preferred over a larger one if the number of rows obtainable from the
index is comparable (which I trust you will agree is reasonable).
I think the real issue here is a datatype conflict.  Look closely
at the filter conditions:

>          ->  Index Scan using xie2cham_servico on cham_servico (cost=0.00..15663.15 rows=26 width=8) (actual
time=13604.37..13980.16rows=3173 loops=1) 
>                Index Cond: (id_chave_grupo = '7458'::character varying)
>                Filter: ((cod_bxs = 1) AND ((dt_finalizacao)::timestamp with time zone >
to_timestamp('2004-04-20'::text,'YYYY-MM-DD'::text)) AND ((dt_finalizacao)::timestamp with time zone < now())) 

The two comparisons on dt_finalizacao are unindexable because they
coerce it to a different datatype; the "<" and ">" are the operators on
timestamp with time zone and so they are not relevant to an index on
timestamp without time zone.

Very likely your column really ought to be of type timestamp with time
zone.  If you are convinced it should be timestamp without time zone
then you need to fix the values you are comparing to.

I'm also a bit disturbed by the fact that the rows estimates are off by
more than two orders of magnitude (26 est vs 3173 actual).  It would
perhaps help if you wrote the boundary date as a simple constant:

    ... AND dt_finalizacao > '2004-04-20' AND ...

so that the planner had some chance of comparing the constant to its
statistics about the distribution of dt_finalizacao.  to_timestamp is
not a constant-foldable function (because it depends on the current
TimeZone setting) and so the query as you've written it looks like

    ... AND dt_finalizacao > unpredictable_expression AND ...

as far as the planner knows.  (I'm not sure this will help though;
it may be that the big problem is that the three columns are not
independent, which is something the planner doesn't know because it has
no cross-column statistics.)

            regards, tom lane

pgsql-general by date:

Previous
From: Richard Gass
Date:
Subject: TCP only listening on localhost???
Next
From: Tom Lane
Date:
Subject: Re: locking question