Thread: Doesn't use index, why?
Hello,

I have a question about indexes. Look at the following example:

naslog=# \d desconexao
                            Table "desconexao"
   Attribute    |    Type     |                    Modifier
----------------+-------------+--------------------------------------------------
 id             | integer     | not null default nextval('id_desconexao'::text)
 time           | timestamp   |
 client_user    | varchar(20) |
 client         | varchar(40) |
 ip_nas         | inet        |
 ip_client_user | inet        |
 disconnect     | smallint    |
 user_time      | interval    |
 data_rate      | integer     |
 called_number  | varchar(14) |
 calling_number | varchar(14) |
 filtrado       | boolean     | default 'f'::bool
Indices: client_desconexao_idx,
         desconexao_pkey,
         filtro_idx,
         time_idx

naslog=# \d time_idx
     Index "time_idx"
 Attribute |   Type
-----------+-----------
 time      | timestamp
btree

naslog=# explain select * from desconexao where time = '2000-12-01';
NOTICE:  QUERY PLAN:

Index Scan using time_idx on desconexao  (cost=0.00..20.18 rows=5 width=103)

EXPLAIN

Everything is fine so far, but when I execute the following query:

naslog=# explain select * from desconexao where time > '2000-12-01';
NOTICE:  QUERY PLAN:

Seq Scan on desconexao  (cost=0.00..19547.71 rows=657958 width=103)

EXPLAIN

Why doesn't PostgreSQL use the time_idx index? I only changed the
operator from "=" to ">" in the WHERE clause.

Thank you.
Regards
-- Marco Catunda
Marco Catunda wrote:
> naslog=# explain select * from desconexao where time = '2000-12-01';
> NOTICE:  QUERY PLAN:
>
> Index Scan using time_idx on desconexao  (cost=0.00..20.18 rows=5 width=103)
>
> naslog=# explain select * from desconexao where time > '2000-12-01';
> NOTICE:  QUERY PLAN:
>
> Seq Scan on desconexao  (cost=0.00..19547.71 rows=657958 width=103)
>
> Why doesn't PostgreSQL use the time_idx index? I only changed the
> operator from "=" to ">" in the WHERE clause.

First, you should include your PostgreSQL version along with the report.

Two thoughts: did you run VACUUM? And of the total number of records, how
many are greater than 2000-12-01? It could be that PostgreSQL figures a
large portion of the table is greater than this date, and that it is
therefore faster to do a sequential scan than to use the index.
Marco Catunda <catunda@pobox.com> writes:
> naslog=# explain select * from desconexao where time = '2000-12-01';
> NOTICE:  QUERY PLAN:
> Index Scan using time_idx on desconexao  (cost=0.00..20.18 rows=5 width=103)

> naslog=# explain select * from desconexao where time > '2000-12-01';
> NOTICE:  QUERY PLAN:
> Seq Scan on desconexao  (cost=0.00..19547.71 rows=657958 width=103)

> Why doesn't PostgreSQL use the time_idx index? I only changed the
> operator from "=" to ">" in the WHERE clause.

Because the '>' condition is much less restrictive (notice the difference
in estimated row counts: 5 versus 657958).  Unless the planner's row count
estimates are way off, it very likely *is* faster to do the second query
by sequential scan.

			regards, tom lane
On 04 Jan 2001 11:55:14 -0500, Dave Smith wrote:
> First, you should include your PostgreSQL version along with the report.
>
> Two thoughts: did you run VACUUM? And of the total number of records, how
> many are greater than 2000-12-01? It could be that PostgreSQL figures a
> large portion of the table is greater than this date, and that it is
> therefore faster to do a sequential scan than to use the index.

Sorry, the PostgreSQL version is 7.0.3.

This is the number of records in the database:

naslog=# select count(*) from desconexao;
 count
--------
 658617
(1 row)

I picked a bad date for the example, because all records are later than
'2000-12-01', sorry. But the same thing happens when I try '2000-12-10':

naslog=# explain select * from desconexao where time < '2000-12-10';
NOTICE:  QUERY PLAN:

Seq Scan on desconexao  (cost=0.00..19547.71 rows=231489 width=103)

EXPLAIN

naslog=# explain select * from desconexao where time >= '2000-12-10';
NOTICE:  QUERY PLAN:

Seq Scan on desconexao  (cost=0.00..19547.71 rows=427128 width=103)

EXPLAIN

The actual numbers of records are:

naslog=# select count(*) from desconexao where time >= '2000-12-10';
 count
--------
 585789
(1 row)

naslog=# select count(*) from desconexao where time < '2000-12-10';
 count
-------
 72828
(1 row)

I think the estimated row count (231489) for the query with < '2000-12-10'
is far from the real value (72828), so I ran vacuum analyze:

naslog=# vacuum verbose analyze desconexao;
NOTICE:  --Relation desconexao--
NOTICE:  Pages 11318: Changed 0, reaped 2616, Empty 0, New 0; Tup 658617:
Vac 200, Keep/VTL 0/0, Crash 0, UnUsed 8007, MinLen 105, MaxLen 166;
Re-using: Free/Avail. Space 150084/35492; EndEmpty/Avail. Pages 0/105.
CPU 3.08s/27.90u sec.
NOTICE:  Index client_desconexao_idx: Pages 9786; Tuples 658617: Deleted 200.
CPU 2.50s/3.33u sec.
NOTICE:  Index filtro_idx: Pages 2946; Tuples 658617: Deleted 200.
CPU 0.70s/3.20u sec.
NOTICE:  Index time_idx: Pages 5503; Tuples 658617: Deleted 200.
CPU 1.52s/3.04u sec.
NOTICE:  Index desconexao_pkey: Pages 5893; Tuples 658617: Deleted 200.
CPU 1.59s/3.12u sec.
NOTICE:  Rel desconexao: Pages: 11318 --> 11315; Tuple(s) moved: 199.
CPU 5.41s/1.03u sec.
NOTICE:  Index client_desconexao_idx: Pages 9786; Tuples 658617: Deleted 199.
CPU 2.85s/1.79u sec.
NOTICE:  Index filtro_idx: Pages 2946; Tuples 658617: Deleted 199.
CPU 0.71s/1.60u sec.
NOTICE:  Index time_idx: Pages 5503; Tuples 658617: Deleted 199.
CPU 1.51s/1.65u sec.
NOTICE:  Index desconexao_pkey: Pages 5893; Tuples 658617: Deleted 199.
CPU 1.59s/1.58u sec.
VACUUM

naslog=# explain select * from desconexao where time < '2000-12-10';
NOTICE:  QUERY PLAN:

Seq Scan on desconexao  (cost=0.00..19547.71 rows=231489 width=103)

EXPLAIN

naslog=# explain select * from desconexao where time >= '2000-12-10';
NOTICE:  QUERY PLAN:

Seq Scan on desconexao  (cost=0.00..19547.71 rows=427128 width=103)

EXPLAIN

The estimates stay the same. Is there a way to force use of the index?

Thank you
-- Marco Catunda
Marco Catunda <catunda@pobox.com> writes:
> naslog=# explain select * from desconexao where time < '2000-12-10';
> NOTICE:  QUERY PLAN:
> Seq Scan on desconexao  (cost=0.00..19547.71 rows=231489 width=103)

> naslog=# explain select * from desconexao where time >= '2000-12-10';
> NOTICE:  QUERY PLAN:
> Seq Scan on desconexao  (cost=0.00..19547.71 rows=427128 width=103)

> The actual numbers of records are:

> naslog=# select count(*) from desconexao where time >= '2000-12-10';
>  count
> --------
>  585789
> (1 row)

> naslog=# select count(*) from desconexao where time < '2000-12-10';
>  count
> -------
>  72828
> (1 row)

In this case the planner is doing *exactly* the right thing; it is
smarter than you are.  If you want to prove it, force the planner to
use an indexscan by doing SET ENABLE_SEQSCAN TO OFF.  Then time the
query, and compare the runtime against the seqscan version.

The bottom line here is that a query that needs to touch more than a
few percent of the rows in a table is better off being done as a
seqscan.

			regards, tom lane
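[A minimal sketch of the comparison Tom describes, assuming the naslog
database from this thread; enable_seqscan only affects the current session,
so it can simply be switched back on afterwards. The shell command for
timing is an illustration, not part of the original thread.]

-- Force the planner to avoid sequential scans for this session only.
SET ENABLE_SEQSCAN TO OFF;
EXPLAIN SELECT * FROM desconexao WHERE time >= '2000-12-10';
-- Run and time the real query from the shell, e.g.:
--   time psql naslog -c "select * from desconexao where time >= '2000-12-10'" > /dev/null

-- Re-enable sequential scans and repeat, so the two runtimes can be compared.
SET ENABLE_SEQSCAN TO ON;
EXPLAIN SELECT * FROM desconexao WHERE time >= '2000-12-10';
--   time psql naslog -c "select * from desconexao where time >= '2000-12-10'" > /dev/null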
> In this case the planner is doing *exactly* the right thing; it is > smarter than you are. If you want to prove it, force the planner to > use an indexscan by doing SET ENABLE_SEQSCAN TO OFF. Then time the > query, and compare the runtime against the seqscan version. > > The bottom line here is that a query that needs to touch more than a > few percent of the rows in a table is better off being done as a > seqscan. The only other workaround is to CLUSTER the table on an index, then force an index scan. That _may_ be faster. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
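[A hedged sketch of Bruce's suggestion, using the table and index names from
this thread. CLUSTER on 7.0 rewrites the whole table, and per the
documentation of that era other indexes and grants may need to be recreated
afterwards, so this should be tried on a copy first.]

-- Physically reorder the table in time_idx order, so rows with nearby
-- timestamps end up on nearby pages and an index scan touches fewer pages.
CLUSTER time_idx ON desconexao;

-- Then force the index scan again and compare against the seqscan timing.
SET ENABLE_SEQSCAN TO OFF;
EXPLAIN SELECT * FROM desconexao WHERE time < '2000-12-10';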