Thread: Why are selects so slow on large tables, even when indexed?
To test PostgreSQL's scalability, I created a table with approximately 76M rows. The table had four columns: a bigint, a varchar(32), another bigint and a varchar(80). The first three columns were filled with values, the fourth was left null. After populating the table, I created an index on the first column (a non-unique index, as the column contains duplicate values) and then VACUUMed. Select statements involving only the indexed column are pathetically slow (tens of minutes). Some examples:
select count(*) from a where id < 0; /* returns 0 rows */
select * from a where id=5; /* returns a handful of rows */
76M rows is a lot, but it shouldn't be that bad when id is indexed.
Attached are two scripts. One creates the table, the other populates it. I typed "create index index_a on a(id)" and "vacuum" by hand. I see this behavior both on Windows and RedHat Linux using PostgreSQL version 7.1.3 in both cases. Any idea why the performance is so poor? Can this be corrected by tuning?
Attachment
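The attachment itself is not reproduced in the archive. A minimal sketch of the setup the post describes might look like the following — only the table name "a", the column name "id", and the index name "index_a" appear in the thread; the other column names here are placeholders:

```sql
-- Hypothetical reconstruction of the schema described above.
-- Only "a", "id", and "index_a" are named in the message;
-- the remaining column names are assumptions.
CREATE TABLE a (
    id    bigint,        -- filled, contains duplicate values
    name  varchar(32),   -- filled
    ref   bigint,        -- filled
    note  varchar(80)    -- left null
);

-- Non-unique index on the first column, as described:
CREATE INDEX index_a ON a (id);

-- Plain VACUUM, as the poster typed it by hand:
VACUUM;
```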
On 26 Mar 2002 at 16:28, Robert Wille wrote:

> Attached are two scripts. One creates the table, the other populates it. I
> typed "create index index_a on a(id)" and "vacuum" by hand.

What was the vacuum command you issued? I hope it was "vacuum analyze".

--
Dan Langille
The FreeBSD Diary - http://freebsddiary.org/ - practical examples
On Tue, 2002-03-26 at 18:28, Robert Wille wrote:

> To test PostgreSQL's scalability, I created a table with approximately 76M rows.
> The table had four columns: a bigint, a varchar(32), another bigint
> and a varchar(80).
>
> select count(*) from a where id < 0; /* returns 0 rows */
> select * from a where id=5; /* returns a handful of rows */
>
> 76M rows is a lot, but it shouldn't be that bad when id is indexed.

A couple things:

(1) You indicated that you ran VACUUM. You'll need to run VACUUM ANALYZE
(or just ANALYZE) to update the planner's statistics. For your
particular situation, this is essential.

(2) There is a long-standing bug with indexes on int8 columns: if you
use a numeric literal as a qualifier, it will be converted to an int4,
so the index won't be used. There is an easy work-around:

select * from a where id = 5;       /* won't use index if id is int8 */
select * from a where id = 5::int8; /* will use index, if appropriate */

(3) You can get more information on the decisions Postgres is making
when executing your query through the use of EXPLAIN. In this instance,
it will likely tell you that the index isn't being used at all, and a
sequential scan is being performed.

If you follow suggestions #1 and #2, you should see markedly improved
performance. Let us know the results...

Cheers,

Neil

--
Neil Conway <neilconway@rogers.com>
PGP Key ID: DB3C29FC
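Putting the three suggestions above together, a test session might look like the following (commands only — the actual plan output will depend on the installation, so none is shown here):

```sql
-- (1) Refresh the planner's statistics:
VACUUM ANALYZE a;

-- (2) Cast the literal so the int8 index can be considered:
-- (3) and use EXPLAIN to see whether the planner chose an
--     Index Scan or a Seq Scan for each query.
EXPLAIN SELECT count(*) FROM a WHERE id < 0::int8;
EXPLAIN SELECT * FROM a WHERE id = 5::int8;
```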
The suggested fixes have helped a lot, but it is still rather slow. The time varies and can be upwards of 10 to 20 seconds on a ~47M row table. Is this normal? Similar queries on an indexed varchar column in Oracle with about 1/2 as many rows execute at least a hundred times faster.

----- Original Message -----
From: "Neil Conway" <nconway@klamath.dyndns.org>
To: "Robert Wille" <rwille@iarchives.com>
Cc: <pgsql-general@postgresql.org>; "Russell Black" <russell.black@iarchives.com>
Sent: Tuesday, March 26, 2002 4:47 PM
Subject: Re: [GENERAL] Why are selects so slow on large tables, even when indexed?

> On Tue, 2002-03-26 at 18:28, Robert Wille wrote:
> > To test PostgreSQL's scalability, I created a table with approximately 76M rows.
> > The table had four columns: a bigint, a varchar(32), another bigint
> > and a varchar(80).
> >
> > select count(*) from a where id < 0; /* returns 0 rows */
> > select * from a where id=5; /* returns a handful of rows */
> >
> > 76M rows is a lot, but it shouldn't be that bad when id is indexed.
>
> A couple things:
>
> (1) You indicated that you ran VACUUM. You'll need to run VACUUM ANALYZE
> (or just ANALYZE) to update the planner's statistics. For your
> particular situation, this is essential.
>
> (2) There is a long-standing bug with indexes on int8 columns: if you
> use a numeric literal as a qualifier, it will be converted to an int4,
> so the index won't be used. There is an easy work-around:
>
> select * from a where id = 5; /* won't use index if id is int8 */
> select * from a where id = 5::int8; /* will use index, if appropriate */
>
> (3) You can get more information on the decisions Postgres is making
> when executing your query through the use of EXPLAIN. In this instance,
> it will likely tell you that the index isn't being used at all, and a
> sequential scan is being performed.
>
> If you follow suggestions #1 and #2, you should see markedly improved
> performance. Let us know the results...
>
> Cheers,
>
> Neil
>
> --
> Neil Conway <neilconway@rogers.com>
> PGP Key ID: DB3C29FC
On Wed, 2002-03-27 at 09:52, Robert Wille wrote:

> The suggested fixes have helped a lot, but it is still rather slow. The time
> varies and can be upwards of 10 to 20 seconds on a ~47M row table. Is this
> normal?

You haven't given us enough information to be able to tell. What
hardware is this running on? What (exact) queries are you executing?
What is the schema of any relevant database tables, indexes, views,
etc.? What does EXPLAIN produce for the query? What configuration
changes have you made? (e.g. increasing the size of the shared
buffers). What OS is this running on, and how has the OS been tuned?
Are you running a single query, multiple sequential queries, or
multiple concurrent queries?

For my own curiosity, what are the results if you execute the same
query using a hash index -- i.e.

CREATE INDEX your_index ON your_table USING hash (your_column);

(You'll need to drop the existing btree index -- once you've got the
results, I'd recommend removing the hash index and re-instating the
btree one for production use.)

> Similar queries on an indexed varchar column in Oracle with about
> 1/2 as many rows execute at least a hundred times faster.

Keep in mind that the time taken to return a single SELECT query is a
very incomplete measure of RDBMS performance; nevertheless, I doubt
that Postgres is, on average, 100x slower than Oracle. Can you post the
Oracle equivalent of EXPLAIN for the queries you're running to derive
this 100x figure?

Cheers,

Neil

--
Neil Conway <neilconway@rogers.com>
PGP Key ID: DB3C29FC
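Spelled out for the table in this thread, the index swap described above might look like the following (the hash index name is a placeholder; btree is the default index type, so no USING clause is needed when restoring it):

```sql
-- Replace the btree index with a hash index for the test:
DROP INDEX index_a;
CREATE INDEX index_a_hash ON a USING hash (id);

-- ... run the timing comparison, then restore the
-- btree index for production use:
DROP INDEX index_a_hash;
CREATE INDEX index_a ON a (id);
```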
"Robert Wille" <rwille@iarchives.com> writes: > The suggested fixes have helped a lot, but it is still rather > slow. The time varies and can be upwards of 10 to 20 seconds on a > ~47M row table. Is this normal? Similar queries on an indexed > varchar column in Oracle with about 1/2 as many rows execute at > least a hundred times faster. I realize that it has taken me quite a while to get back to you, on this particular case but I have been running some tests on your data (or 28 million lines of it anyway) and I have some stuff to share. First of all, your sample data set has the *opposite* problem of most queries that PostgreSQL users complain about. Most people complain about queries that do sequential scans when PostgreSQL should be doing an index scan. Your data, on the other hand, caused PostgreSQL to do an indexscan when it probably should have been doing a sequential scan. After all, there are only 1000 (or so) different unique values of "id" and the instances of each value are spread evenly throughout the table. Since you are going to touch most pages anyhow consulting the index is just an extra step. On my limited test machine I actually saw modest gains on queries like: SELECT count(*) FROM a WHERE id = 89::bigint; However, if you cluster the index on id then similar queries started to return *immediately*. Don't forget to vacuum after clustering. I hope this was helpful, I sured learned a lot. Jason