Thread: index use again and again

index use again and again

From
Holger Marzen
Date:
I read the Postgres boog, browsed the mailing list archives and have
still no clue.

I have a table with about 150.000 rows and put some indexes (not unique)
on it. If I use "=" in the where clause tha index is used, if I use ">"
or "between" then it is not used. It happens both with 7.1.3 and the new
7.2:

|db1=# select count(*) from verfuegbarkeiten;
| count
|--------
| 152428
|(1 row)
|
|db1=# explain select red from verfuegbarkeiten where datum =
|'2002-01-01';
|NOTICE:  QUERY PLAN:
|
|Index Scan using verfuegbarkeiten_datum_idx on verfuegbarkeiten
|(cost=0.00..1489.67 rows=566 width=4)
|
|EXPLAIN
|db1=# explain select red from verfuegbarkeiten where datum >
|'2002-01-01';
|NOTICE:  QUERY PLAN:
|
|Seq Scan on verfuegbarkeiten  (cost=0.00..3820.35 rows=22322 width=4)
|
|EXPLAIN

"vacuum analyze" has be done before (and is done daily). CASTs like
"where datum > '2002-01-01'::date" don't help.

Are 152428 rows not enough to use the index?


--
PGP/GPG Key-ID:
http://blackhole.pca.dfn.de:11371/pks/lookup?op=get&search=0xB5A1AFE1


Re: index use again and again

From
"Roderick A. Anderson"
Date:
On Tue, 12 Feb 2002, Holger Marzen wrote:

> I read the Postgres boog, browsed the mailing list archives and have
> still no clue.
>
> I have a table with about 150.000 rows and put some indexes (not unique)
> on it. If I use "=" in the where clause tha index is used, if I use ">"
> or "between" then it is not used. It happens both with 7.1.3 and the new
> 7.2:

Previous discussion on this topic give me the idea that index use is
determined by the optimizer and it will use and index or sequential
depending on some percentage of the records it thinks will be returned.
   I.e. is it more economical to just go through them all or spend time
jumping around getting the right records.  Of course one of the hackers
will be able to give you a much better and probably correct explanation.

Since the columns are not unique I'd guess there are duplicate values in
them.  An interesting thought - if there are only a few unique values
per column - is, would a clustering of the column/index get the
optimizer to use the index.
   I'm sure this has been discussed in previous threads also but my
brain is full so I have to keep only the current stuff around.


Regards,
Rod
--
                      Let Accuracy Triumph Over Victory

                                                       Zetetic Institute
                                                        "David's Sling"
                                                         Marc Stiegler


Re: index use again and again

From
Darren Ferguson
Date:
The reason i believe is that you are doing comparisions instead of direct
this is the value so the table would have to do a sequencial scan of the
table

Darren

On Tue, 12 Feb 2002, Holger Marzen wrote:

> I read the Postgres boog, browsed the mailing list archives and have
> still no clue.
>
> I have a table with about 150.000 rows and put some indexes (not unique)
> on it. If I use "=" in the where clause tha index is used, if I use ">"
> or "between" then it is not used. It happens both with 7.1.3 and the new
> 7.2:
>
> |db1=# select count(*) from verfuegbarkeiten;
> | count
> |--------
> | 152428
> |(1 row)
> |
> |db1=# explain select red from verfuegbarkeiten where datum =
> |'2002-01-01';
> |NOTICE:  QUERY PLAN:
> |
> |Index Scan using verfuegbarkeiten_datum_idx on verfuegbarkeiten
> |(cost=0.00..1489.67 rows=566 width=4)
> |
> |EXPLAIN
> |db1=# explain select red from verfuegbarkeiten where datum >
> |'2002-01-01';
> |NOTICE:  QUERY PLAN:
> |
> |Seq Scan on verfuegbarkeiten  (cost=0.00..3820.35 rows=22322 width=4)
> |
> |EXPLAIN
>
> "vacuum analyze" has be done before (and is done daily). CASTs like
> "where datum > '2002-01-01'::date" don't help.
>
> Are 152428 rows not enough to use the index?
>
>
> --
> PGP/GPG Key-ID:
> http://blackhole.pca.dfn.de:11371/pks/lookup?op=get&search=0xB5A1AFE1
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/users-lounge/docs/faq.html
>


Re: index use again and again

From
Holger Marzen
Date:
On Tue, 12 Feb 2002, Roderick A. Anderson wrote:

> On Tue, 12 Feb 2002, Holger Marzen wrote:
>
> > I read the Postgres boog, browsed the mailing list archives and have
> > still no clue.
> >
> > I have a table with about 150.000 rows and put some indexes (not unique)
> > on it. If I use "=" in the where clause tha index is used, if I use ">"
> > or "between" then it is not used. It happens both with 7.1.3 and the new
> > 7.2:
>
> Previous discussion on this topic give me the idea that index use is
> determined by the optimizer and it will use and index or sequential
> depending on some percentage of the records it thinks will be returned.
>    I.e. is it more economical to just go through them all or spend time
> jumping around getting the right records.  Of course one of the hackers
> will be able to give you a much better and probably correct explanation.

Yes. When I choose a ">" that returns only a few rows then EXPLAIN
reports an index scan. So it's depending on the number of returned rows.

--
PGP/GPG Key-ID:
http://blackhole.pca.dfn.de:11371/pks/lookup?op=get&search=0xB5A1AFE1


Re: index use again and again

From
Tom Lane
Date:
Holger Marzen <holger@marzen.de> writes:
> Are 152428 rows not enough to use the index?

More like "there are too many rows to use the index".

You've got a query that is estimated to hit 22322/152428 = 15% of the
table.  If the rows in question are uniformly scattered through the
table then the system will certainly have to read every block of the
table to get them all.  It may as well read the table sequentially,
rather than do the extra I/O to read the index too.

            regards, tom lane

Re: index use again and again

From
Holger Marzen
Date:
On Tue, 12 Feb 2002, Tom Lane wrote:

> Holger Marzen <holger@marzen.de> writes:
> > Are 152428 rows not enough to use the index?
>
> More like "there are too many rows to use the index".
>
> You've got a query that is estimated to hit 22322/152428 = 15% of the
> table.  If the rows in question are uniformly scattered through the
> table then the system will certainly have to read every block of the
> table to get them all.  It may as well read the table sequentially,
> rather than do the extra I/O to read the index too.

You're right. When the result is small, the index is used. When the
number of rows is a little bit higher and postgres uses a table scan,
the query is much slower. The table is not very unordered. Can I force
postgres to raise the limit where it starts scanning the whole table?

--
PGP/GPG Key-ID:
http://blackhole.pca.dfn.de:11371/pks/lookup?op=get&search=0xB5A1AFE1


Re: index use again and again

From
Tom Lane
Date:
Holger Marzen <holger@marzen.de> writes:
> You're right. When the result is small, the index is used. When the
> number of rows is a little bit higher and postgres uses a table scan,
> the query is much slower. The table is not very unordered. Can I force
> postgres to raise the limit where it starts scanning the whole table?

Not directly.  There is code in 7.2 to try to take account of the
effects of table ordering, but it probably needs to be improved.
See cost_index in costsize.c if you are interested.

            regards, tom lane