Well, this message never made it through, but I managed to answer my own
question -- I never ran vacuum analyze which caused a table scan instead of
an index scan. After running vacuum analyze the query returns immediately. I
would still be interested in any hints to optimize performance for very
large tables (>10M tuples).
Thanks,
Joe
> Hello,
>
> I'm working on an application where I need to design for one table to grow
> to an extremely large size. I'm already planning to partition the data
into
> multiple tables, and even possibly multiple servers, but even so each
table
> may need to grow to the 10 - 15 million tuple range. This table will be
used
> for a keyed lookup and it is very important that the query return in well
> under a second. I've done a small test using a dual ppro 200 server with
512
> MB RAM, running RedHat Linux 6.1, and a late November 7.1 pre-beta
snapshot
> of PostgreSQL (PGOPTS="-D/opt/pgsql/data -i  -B 25000"). I used a test
table
> with about 5 million tuples.
>
> Details:
>
> CREATE TABLE foo(
>     guid varchar(20) not null,
>     ks varchar(20) not null
> );
>
> --> insert 5,000,000 rows, both guid and ks are 20 byte sha-1 hashes
> -- tried this first
> -- create index foo_idx1 on foo(guid);
> -- then tried
> create index foo_idx1 on foo using HASH (guid);
>
> SELECT ks FROM foo WHERE guid =
'f9ee1930f4010ba34cc5ca6efd27d988eb4f921d';
>
> The query currently takes in excess of 40 seconds. I would appreciate any
> suggestions for optimizing to bring this down substantially.
>
> Thanks in advance,
>
> Joe Conway
>