I have the following tables and indexes
CREATE TABLE cddb
(
id integer PRIMARY KEY,
/* artist name */
name text,
/* disk title */
title text,
/* the type of the artist group, person, orchestra*/
type smallint,
/* the creation date */
created integer
);
CREATE INDEX cddb1 on cddb(title);
CREATE INDEX cddb2 on cddb(name);
CREATE TABLE cddbentry
(
id integer PRIMARY KEY,
diskid text,
entry integer REFERENCES cddb
);
CREATE INDEX cddbentry1 on cddbentry (diskid);
CREATE INDEX cddbentry2 on cddbentry (entry);
I am using the following query:
select cddb.* from cddb,cddbentry where cddbentry.diskid = 'toto' and
cddbentry.entry = cddb.id;
On version 7.3.4 it produces this query plan
Nested Loop (cost=0.00..41.25 rows=5 width=78)
-> Index Scan using cddbentry1 on cddbentry (cost=0.00..17.07
rows=5 width=4)
Index Cond: (diskid = 'toto'::text)
-> Index Scan using cddb_pkey on cddb (cost=0.00..4.82 rows=1 width=74)
Index Cond: ("outer".entry = cddb.id)
I have upgraded to version 7.4.0 (compiling the software and migrating
the database using dump/restore)
Now the following plan is produced
Hash Join (cost=17.08..42.15 rows=7 width=74)
Hash Cond: ("outer".id = "inner".entry)
-> Seq Scan on cddb (cost=0.00..20.00 rows=1000 width=74)
-> Hash (cost=17.07..17.07 rows=6 width=4)
-> Index Scan using cddbentry1 on cddbentry (cost=0.00..17.07
rows=6 width=4)
Index Cond: (diskid = 'toto'::text)
which result in a VERY much slower query as the cddb table has more than
1 million entry.... and there is at most one entry in cddbentry
which matches the diskid !
The workaround is to disable the hash join capability using set
enable_hashjoin to false; resulting in the same query plan as in 7.3.x.
I think this can classified as a regression bug.
N.B: The cost value may be wrong, because to be able to send you this
email, they have been produced on empty tables; but
I can guarantee you they are the same on the full loaded database.
Hope this help to improve this great product
Bernard SNYERS