Re: Optimizer choosing smaller index instead of right one - Mailing list pgsql-general

From Ian Ribas
Subject Re: Optimizer choosing smaller index instead of right one
Date
Msg-id 409287F8.5040209@digitro.com.br
Whole thread Raw
In response to Re: Optimizer choosing smaller index instead of right one  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
Effectively, I hadn't seen the cast being done on the field
dt_finalizacao! The query actually runs inside a pgplsql procedure, so
the parameters are unknown constants (variables), but the situation got
much much better when I forced a cast on the constant part of the
condition (as follows), so allowing the planner to correctly choose an
index :

explain analyze SELECT dt_finalizacao
FROM cham_servico
WHERE id_chave_grupo = '7458' AND cod_bxs = 1
AND dt_finalizacao > CAST(to_timestamp ('2004-04-20','YYYY-MM-DD') AS
TIMESTAMP(0) WITHOUT TIME ZONE)
AND dt_finalizacao < CAST(now() AS TIMESTAMP(0) WITHOUT TIME ZONE)
ORDER BY dt_finalizacao ASC LIMIT 1;

QUERY
PLAN

------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..364.98 rows=1 width=8) (actual time=52.73..52.90
rows=1 loops=1)
   ->  Index Scan using xie1cham_servico on cham_servico
(cost=0.00..9756.63 rows=27 width=8) (actual time=52.73..52.89 rows=2
loops=1)
         Index Cond: ((dt_finalizacao >
(to_timestamp('2004-04-20'::text, 'YYYY-MM-DD'::text))::timestamp(0)
without time zone) AND (dt_finalizacao < (now())::timestamp(0) without
time zone))
         Filter: ((id_chave_grupo = '7458'::character varying) AND
(cod_bxs = 1))
 Total runtime: 53.11 msec
(5 rows)

Unfortunately, I really must use timestamp without time zone for some
other unrelated reasons ...

Thank you very much for your attention

Best wishes,
Ian.

Tom Lane wrote:

>Ian Ribas <ian.ribas@digitro.com.br> writes:
>
>
>>The index I created to try to optimize it is "xie3cham_servico", and
>>has all the three columns of the table that are used in the query. But
>>the index being used is actually xie2cham_servico (as can be seen
>>below) which has only two columns, one of them completely useless for
>>the query!  I'm probably missing some setting, but I couldn't figure
>>out which.
>>
>>
>
>The optimizer's cost equations do say that a smaller index should be
>preferred over a larger one if the number of rows obtainable from the
>index is comparable (which I trust you will agree is reasonable).
>I think the real issue here is a datatype conflict.  Look closely
>at the filter conditions:
>
>
>
>>         ->  Index Scan using xie2cham_servico on cham_servico (cost=0.00..15663.15 rows=26 width=8) (actual
time=13604.37..13980.16rows=3173 loops=1) 
>>               Index Cond: (id_chave_grupo = '7458'::character varying)
>>               Filter: ((cod_bxs = 1) AND ((dt_finalizacao)::timestamp with time zone >
to_timestamp('2004-04-20'::text,'YYYY-MM-DD'::text)) AND ((dt_finalizacao)::timestamp with time zone < now())) 
>>
>>
>
>The two comparisons on dt_finalizacao are unindexable because they
>coerce it to a different datatype; the "<" and ">" are the operators on
>timestamp with time zone and so they are not relevant to an index on
>timestamp without time zone.
>
>Very likely your column really ought to be of type timestamp with time
>zone.  If you are convinced it should be timestamp without time zone
>then you need to fix the values you are comparing to.
>
>I'm also a bit disturbed by the fact that the rows estimates are off by
>more than two orders of magnitude (26 est vs 3173 actual).  It would
>perhaps help if you wrote the boundary date as a simple constant:
>
>    ... AND dt_finalizacao > '2004-04-20' AND ...
>
>so that the planner had some chance of comparing the constant to its
>statistics about the distribution of dt_finalizacao.  to_timestamp is
>not a constant-foldable function (because it depends on the current
>TimeZone setting) and so the query as you've written it looks like
>
>    ... AND dt_finalizacao > unpredictable_expression AND ...
>
>as far as the planner knows.  (I'm not sure this will help though;
>it may be that the big problem is that the three columns are not
>independent, which is something the planner doesn't know because it has
>no cross-column statistics.)
>
>            regards, tom lane
>
>
>



pgsql-general by date:

Previous
From: Kris Jurka
Date:
Subject: Re: "Idle in Transaction" and hung connections
Next
From: Bruno Wolff III
Date:
Subject: Re: Foreign Key ON DELETE CASCADE Performance