Thread: How index are running and how to optimise ?

How index are running and how to optimise ?

From
Hervé Piedvache
Date:
Hi,

This may be a stupid question, but I'm a little surprised by some of the EXPLAIN
output I get when using date fields.

I would like to understand exactly when indexes are used.
I'm using PostgreSQL 7.4.1.

I have a table with 351,000 records.
It gains about 300 to 600 new records per day.
It has an index like this:
ix_contracts_start_stop_date btree (start_date, stop_date)

I simply want to run something like this:

select o.id_contract
   from contracts o
 where o.start_date <= '2001-10-31'
    and (o.stop_date > '2001-11-06' or stop_date is null);

The EXPLAIN output looks like this:
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Seq Scan on contracts o  (cost=0.00..12021.80 rows=160823 width=4)
   Filter: ((start_date <= '2001-10-31'::date) AND ((stop_date > '2001-11-06'::date) OR (stop_date IS NULL)))

I understand that the OR could prevent the index from being used for stop_date,
but why isn't the index used for the start_date part?

The index is used only if I use an equality, like this:

select o.id_contract
   from contracts o
 where o.start_date = '2001-10-31'
    and o.stop_date = '2001-11-06';

                                          QUERY PLAN
------------------------------------------------------------------------------------------------
 Index Scan using ix_contracts_start_stop_date on contracts o  (cost=0.00..6.00 rows=1 width=4)
   Index Cond: ((start_date = '2001-10-31'::date) AND (stop_date = '2001-11-06'::date))

Could you please explain why indexes are not used with < and >, and how I can
optimise my query? I am using this query to insert into another table, and
this segmentation takes 13 hours to do the insert! :o((

Thanks for help,
--
Hervé Piedvache

Elma Ingénierie Informatique
6 rue du Faubourg Saint-Honoré
F-75008 - Paris - France
Pho. 33-144949901
Fax. 33-144949902


Re: How index are running and how to optimise ?

From
Nick Barr
Date:
Have you ANALYZEd recently? If not, you need to do that regularly. Try

VACUUM ANALYZE contracts;

to vacuum and analyze that specific table.

Could you also try

select
    o.id_contract
from
    contracts o
where
    o.start_date NOT BETWEEN '2001-10-31' AND '2001-11-06' OR
    o.stop_date IS NULL;
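
Note that the NOT BETWEEN form above does not select the same rows as the original query. If the goal is to keep the original condition while giving the planner simpler predicates, one common rewrite is to split the OR into two branches. This is only a sketch: the two branches cannot overlap (stop_date is either greater than the date or NULL, never both), so UNION ALL returns the same rows as the original. Whether either branch uses ix_contracts_start_stop_date still depends on the planner's row estimates.

```sql
-- Branch 1: contracts that stop after the window.
SELECT o.id_contract
  FROM contracts o
 WHERE o.start_date <= '2001-10-31'
   AND o.stop_date > '2001-11-06'
UNION ALL
-- Branch 2: contracts with no stop date at all.
SELECT o.id_contract
  FROM contracts o
 WHERE o.start_date <= '2001-10-31'
   AND o.stop_date IS NULL;
```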

Also, could you paste the output of EXPLAIN ANALYZE instead of EXPLAIN?
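
For reference, the two steps suggested here could be run like this against the query from the original post. EXPLAIN ANALYZE actually executes the statement and prints the real row counts and timings next to the planner's estimates, which is why it is more useful than plain EXPLAIN here.

```sql
-- Refresh the planner statistics for the table first.
VACUUM ANALYZE contracts;

-- Then run the query for real and compare estimated rows with actual rows.
EXPLAIN ANALYZE
SELECT o.id_contract
  FROM contracts o
 WHERE o.start_date <= '2001-10-31'
   AND (o.stop_date > '2001-11-06' OR o.stop_date IS NULL);
```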



Cheers

Nick




Re: How index are running and how to optimise ?

From
"scott.marlowe"
Date:
On Wed, 3 Mar 2004, Hervé Piedvache wrote:

> Hi,
>
> This may be a stupid question, but I'm a little surprised by some of the EXPLAIN
> output I get when using date fields.
>
> I would like to understand exactly when indexes are used.
> I'm using PostgreSQL 7.4.1.
>
> I have a table with 351,000 records.
> It gains about 300 to 600 new records per day.
> It has an index like this:
> ix_contracts_start_stop_date btree (start_date, stop_date)
>
> I simply want to run something like this:
>
> select o.id_contract
>    from contracts o
>  where o.start_date <= '2001-10-31'
>     and (o.stop_date > '2001-11-06' or stop_date is null);
>
> The EXPLAIN output looks like this:
>                                                   QUERY PLAN
> --------------------------------------------------------------------------------------------------------------
>  Seq Scan on contracts o  (cost=0.00..12021.80 rows=160823 width=4)
>    Filter: ((start_date <= '2001-10-31'::date) AND ((stop_date > '2001-11-06'::date) OR (stop_date IS NULL)))

Notice the planner is expecting to get back 160823 rows here.  How many
does it actually return?
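
A quick way to check is to count the matching rows and compare with the table size. The estimate of 160823 rows out of roughly 351,000 is about 46% of the table, and when a query is expected to fetch that large a fraction, a sequential scan is usually cheaper than an index scan. A minimal sketch, using the query from the original post:

```sql
-- Rows that actually match the condition.
SELECT count(*)
  FROM contracts o
 WHERE o.start_date <= '2001-10-31'
   AND (o.stop_date > '2001-11-06' OR o.stop_date IS NULL);

-- Total rows in the table, for comparison.
SELECT count(*) FROM contracts;
```

If the real count is far below the estimate, the statistics are probably stale and an ANALYZE is worth running before looking any further.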

> I understand that the OR could prevent the index from being used for stop_date,
> but why isn't the index used for the start_date part?
>
> The index is used only if I use an equality, like this:
>
> select o.id_contract
>    from contracts o
>  where o.start_date = '2001-10-31'
>     and o.stop_date = '2001-11-06';

No, you don't have to do that.  You should be able to use a range and get
an index scan IF said index scan will be faster (in the query planner's
estimate.)

explain select * from test where dt>'2004-01-01 00:00:00' and dt<'2004-01-02 00:00:00';
                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using test_dt on test  (cost=0.00..118628.84 rows=29793 width=51)
   Index Cond: ((dt > '2004-01-01 00:00:00'::timestamp without time zone) AND (dt < '2004-01-02 00:00:00'::timestamp without time zone))
(2 rows)

Notice the use of an index there.
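
To see what the planner thinks an index scan would cost for the contracts query, one option is to discourage sequential scans for the current session and compare the two plans. This is a diagnostic sketch only, using the query from the original post; enable_seqscan is meant for testing and should be turned back on afterwards.

```sql
-- Plan and timing with the planner's normal choice.
EXPLAIN ANALYZE
SELECT o.id_contract
  FROM contracts o
 WHERE o.start_date <= '2001-10-31'
   AND (o.stop_date > '2001-11-06' OR o.stop_date IS NULL);

-- Discourage sequential scans for this session only.
SET enable_seqscan = off;

-- Same query again: compare the estimated cost and the actual time.
EXPLAIN ANALYZE
SELECT o.id_contract
  FROM contracts o
 WHERE o.start_date <= '2001-10-31'
   AND (o.stop_date > '2001-11-06' OR o.stop_date IS NULL);

-- Restore the default behaviour.
RESET enable_seqscan;
```

If the second run is not actually faster, the sequential scan was the right choice for this query.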

>                                           QUERY PLAN
> ------------------------------------------------------------------------------------------------
>  Index Scan using ix_contracts_start_stop_date on contracts o  (cost=0.00..6.00 rows=1 width=4)
>    Index Cond: ((start_date = '2001-10-31'::date) AND (stop_date = '2001-11-06'::date))

Here the planner expects ONE row.  Of course it's using an index.

> Could you please explain why indexes are not used with < and >, and how I can
> optimise my query? I am using this query to insert into another table, and
> this segmentation takes 13 hours to do the insert! :o((

It may well be the inserts that are slow and not the selects.  How long
does the select, by itself, take to run?
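
One way to separate the two costs is to time the SELECT alone and then the full INSERT ... SELECT inside a transaction that is rolled back. This is a rough sketch: contract_segments is a hypothetical target table, since the thread does not name the real one, and EXPLAIN ANALYZE really executes the INSERT, hence the ROLLBACK.

```sql
-- Step 1: time the SELECT on its own (check the total runtime reported).
EXPLAIN ANALYZE
SELECT o.id_contract
  FROM contracts o
 WHERE o.start_date <= '2001-10-31'
   AND (o.stop_date > '2001-11-06' OR o.stop_date IS NULL);

-- Step 2: time the same SELECT feeding an INSERT, without keeping the rows.
-- contract_segments is a placeholder name, not taken from the thread.
BEGIN;
EXPLAIN ANALYZE
INSERT INTO contract_segments (id_contract)
SELECT o.id_contract
  FROM contracts o
 WHERE o.start_date <= '2001-10-31'
   AND (o.stop_date > '2001-11-06' OR o.stop_date IS NULL);
ROLLBACK;
```

If step 2 takes much longer than step 1, the time is probably going into the insert side (for example indexes, triggers or foreign keys on the target table) rather than into the scan of contracts.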