Chantal,
I'm short of time right now, so here are just a few quick notes and a
request for more information. Next round tomorrow ...
On Wed, 19 Feb 2003 10:38:54 +0100, Chantal Ackermann
<chantal.ackermann@biomax.de> wrote:
>yeah, I noticed the difference between the two queries. actually, I am
>afraid of dropping the distinct cause I had results with duplicate rows
>(though I shall recheck when this is really the case). These are the
>table declarations and constraints: [...]
AFAICS there's no way to get duplicates, so no need for DISTINCT.
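To illustrate why (on a hypothetical miniature schema, since the real
table declarations are elided above): when the join key is unique on one
side of the join, the join cannot multiply rows, so DISTINCT changes
nothing and only adds a sort/unique step. A quick sketch with Python's
built-in sqlite3 as a stand-in database:

```python
import sqlite3

# Hypothetical miniature of the situation, not the real schema:
# gene_id is a primary key, so each occurrence row matches at most
# one gene row and the join cannot produce duplicates.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, name TEXT)")
con.execute("CREATE TABLE occurrence (gene_id INTEGER, pos INTEGER)")
con.executemany("INSERT INTO gene VALUES (?, ?)", [(1, "g1"), (2, "g2")])
con.executemany("INSERT INTO occurrence VALUES (?, ?)",
                [(1, 10), (1, 20), (2, 30)])

plain = con.execute(
    "SELECT o.gene_id, o.pos FROM occurrence o "
    "JOIN gene g ON g.gene_id = o.gene_id").fetchall()
distinct = con.execute(
    "SELECT DISTINCT o.gene_id, o.pos FROM occurrence o "
    "JOIN gene g ON g.gene_id = o.gene_id").fetchall()

# Same result set with and without DISTINCT.
print(sorted(plain) == sorted(distinct))  # True
```

If you do see duplicate rows in the real data, that would point to the
join key not being unique, which is worth rechecking before dropping
DISTINCT.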
> we have to use bigint as type. is the join therefore
>such expensive?
No.
>we had a primary key occurrence_id on the occurrences tables but we
>noticed that we don't use it, so we didn't recreate it in the new
>database. is it possible that the postgres could work with it internally?
No.
>Nested Loop: Total runtime: 53508.86 msec
>Merge Join: Total runtime: 113066.81 msec
>Hash Join: Total runtime: 439344.44 msec
>I changed the gene name at every run to avoid the caching.
You can't compare the runtimes unless you query for the same data.
Either run each query twice to make sure everything is cached or do
something like
tar cf /dev/null /some/big/directory
before each query to empty your disk cache. BTW, what is your
shared_buffers setting?
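If flushing the cache before every run is impractical, the warm-cache
route can be scripted: time each variant twice and compare only the
second (warm) runs, so cache state is equal across variants. A minimal
sketch of that discipline, again with sqlite3 standing in for the real
database (table and query are placeholders, not your schema):

```python
import sqlite3
import time

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (x INTEGER)")
con.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(10000)])

def warm_runtime(query):
    """Run the query twice and keep only the second (warm-cache) timing."""
    runs = []
    for _ in range(2):
        t0 = time.perf_counter()
        con.execute(query).fetchall()
        runs.append(time.perf_counter() - t0)
    return runs[1]

warm = warm_runtime("SELECT x FROM t WHERE x % 7 = 0")
print(warm >= 0.0)  # True; compare this warm figure across variants
```

With Postgres the same idea applies to EXPLAIN ANALYZE: discard the
first run of each plan variant and compare only the repeated runs.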
>disabling the hash join results again in a Nested Loop with very high
>cost but low runtime - I'm not sure if the latter is the consequence of
>caching.
Please send EXPLAIN ANALYZE results for all queries. Send them to me
off-list if you think they're too long.
Servus
Manfred