To test PostgreSQL's scalability, I created a table with approximately 76M rows. The table had four columns: a bigint, a varchar(32), another bigint and a varchar(80). The first three columns were filled with values, the fourth was left null. After populating the table, I created an index on the first column (a non-unique index, as the column contains duplicate values) and then VACUUMed. Select statements involving only the indexed column are pathetically slow (tens of minutes). Some examples:
select count(*) from a where id < 0; /* returns 0 rows */
select * from a where id=5; /* returns a handful of rows */
76M rows is a lot, but it shouldn't be that bad when id is indexed.
Attached are two scripts. One creates the table, the other populates it. I typed "create index index_a on a(id)" and "vacuum" by hand. I see this behavior both on Windows and RedHat Linux using PostgreSQL version 7.1.3 in both cases. Any idea why the performance is so poor? Can this be corrected by tuning?