Thread: Indexes on Large Tables

Indexes on Large Tables

From

"Donny Drummonds"

Date:

06 February 2003, 13:47:00

I am running a dual athlon 1800 with an gig of ram.  I am running postgres
7.2.  I have a table that hs 15 million rows.  I have a query that has one
columns and the select cluase and one column in the where clause with no
joining or sub-queries.  If I do not index the column from the where clause
the query returns the 150,000 rows in 4 and a half minutes.  If in do index
the column from the where clause using a btree the 150,000 rows return in 11
and a half minutes.

Any insight would be greatly appreciated.

Donny L. Drummonds

Re: Indexes on Large Tables

From

Stephan Szabo

Date:

06 February 2003, 14:33:41

On Mon, 3 Feb 2003, Donny Drummonds wrote:

> I am running a dual athlon 1800 with an gig of ram.  I am running postgres
> 7.2.  I have a table that hs 15 million rows.  I have a query that has one
> columns and the select cluase and one column in the where clause with no
> joining or sub-queries.  If I do not index the column from the where clause
> the query returns the 150,000 rows in 4 and a half minutes.  If in do index
> the column from the where clause using a btree the 150,000 rows return in 11
> and a half minutes.
>
> Any insight would be greatly appreciated.

Hmm, have you analyzed recently?  In any case, the table definition, query
and explain analyze outpute would be useful to see.

Re: Indexes on Large Tables

From

Curt Sampson

Date:

07 February 2003, 00:13:14

On Mon, 3 Feb 2003, Donny Drummonds wrote:

> If I do not index the column from the where clause the query returns
> the 150,000 rows in 4 and a half minutes. If in do index the column
> from the where clause using a btree the 150,000 rows return in 11 and
> a half minutes.
>
> Any insight would be greatly appreciated.

Well, I'm not sure if this is the insight you're looking for, but....

The reason it takes longer if you use an index is that you change
from sequential I/O (which is relatively fast) to random I/O (which
is relatively slow). With the table scan (reading the entire table in
whatever order it's in on the disk) you're reading several times as much
data, but you're not doing head seeks all over the place to move the
head to the place where the next bit of data to be read is.

Obviously, in this case, even though an index was available, the planner
was wrong to chose to use it rather than just read the entire table.
That is, as someone else mentioned, likely due to bad statistics: the
planner thought you were going to select a very small part of the table,
rather than ten percent of it (which is a pretty large fraction, for
these purposes). Try doing an ANALYZE.

cjs
--
Curt Sampson  <cjs@cynic.net>   +81 90 7737 2974   http://www.netbsd.org
    Don't you know, in this new Dark Age, we're all light.  --XTC

Re: Indexes on Large Tables

From

"Donald Fraser"

Date:

07 February 2003, 05:05:52

----- Original Message -----
From: "Curt Sampson" <cjs@cynic.net>
To: "Donny Drummonds" <donny@cypresstg.com>
Cc: <pgsql-admin@postgresql.org>
Sent: Friday, February 07, 2003 5:13 AM
Subject: Re: [ADMIN] Indexes on Large Tables


> On Mon, 3 Feb 2003, Donny Drummonds wrote:
>
> > If I do not index the column from the where clause the query returns
> > the 150,000 rows in 4 and a half minutes. If in do index the column
> > from the where clause using a btree the 150,000 rows return in 11 and
> > a half minutes.
> >
> > Any insight would be greatly appreciated.
>
> Well, I'm not sure if this is the insight you're looking for, but....
>
> The reason it takes longer if you use an index is that you change
> from sequential I/O (which is relatively fast) to random I/O (which
> is relatively slow). With the table scan (reading the entire table in
> whatever order it's in on the disk) you're reading several times as much
> data, but you're not doing head seeks all over the place to move the
> head to the place where the next bit of data to be read is.
>
> Obviously, in this case, even though an index was available, the planner
> was wrong to chose to use it rather than just read the entire table.
> That is, as someone else mentioned, likely due to bad statistics: the
> planner thought you were going to select a very small part of the table,
> rather than ten percent of it (which is a pretty large fraction, for
> these purposes). Try doing an ANALYZE.
>
> cjs

If the index that you use in the WHERE clause is the most common method that
you are accessing the table then I recommend that you periodically run the
cluster command:
CLUSTER indexname ON tablename
When a table is clustered, it is physically reordered on disk based on the
index information.

Donald