Thread: Why are selects so slow on large tables, even when indexed?

Why are selects so slow on large tables, even when indexed?

From
"Robert Wille"
Date:
To test PostgreSQL's scalability, I created a table with approximately 76M rows. The table had four columns: a bigint, a varchar(32), another bigint and a varchar(80). The first three columns were filled with values, the fourth was left null. After populating the table, I created an index on the first column (a non-unique index, as the column contains duplicate values) and then VACUUMed. Select statements involving only the indexed column are pathetically slow (tens of minutes). Some examples:
 
select count(*) from a where id < 0; /* returns 0 rows */
select * from a where id=5;    /* returns a handful of rows */
 
76M rows is a lot, but it shouldn't be that bad when id is indexed.
 
Attached are two scripts. One creates the table, the other populates it. I typed "create index index_a on a(id)" and "vacuum" by hand. I see this behavior both on Windows and RedHat Linux using PostgreSQL version 7.1.3 in both cases. Any idea why the performance is so poor? Can this be corrected by tuning?
 

Re: Why are selects so slow on large tables, even when

From
Stephan Szabo
Date:
On Tue, 26 Mar 2002, Robert Wille wrote:

> To test PostgreSQL's scalability, I created a table with approximately
> 76M rows. The table had four columns: a bigint, a varchar(32), another
> bigint and a varchar(80). The first three columns were filled with
> values, the fourth was left null. After populating the table, I
> created an index on the first column (a non-unique index, as the
> column contains duplicate values) and then VACUUMed. Select statements
> involving only the indexed column are pathetically slow (tens of
> minutes). Some examples:
>
> select count(*) from a where id < 0; /* returns 0 rows */
> select * from a where id=5;    /* returns a handful of rows */
>
> 76M rows is a lot, but it shouldn't be that bad when id is indexed.
>
> Attached are two scripts. One creates the table, the other populates
> it. I typed "create index index_a on a(id)" and "vacuum" by hand. I
> see this behavior both on Windows and RedHat Linux using PostgreSQL
> version 7.1.3 in both cases. Any idea why the performance is so poor?
> Can this be corrected by tuning?

If you did just a vacuum and not a vacuum analyze, you should go back and
do so to make sure the statistics are updated.