Re: [SQL] Beginner's headache of joins - Mailing list pgsql-sql

From webmaster
Subject Re: [SQL] Beginner's headache of joins
Date
Msg-id 99070213263907.00441@localhost.localdomain
Whole thread Raw
List pgsql-sql
On Wed, 30 Jun 1999, you wrote:
>webmaster <webmaster@tony.cz> writes:
>> explain select catnum.catnum, kat.ident, kat.nazev where
>> catnum.catnum=kat.ident;
>
>> resulting as:
>
>> HASH JOIN ...
>>    -> SEQ SCAN ON catnum ...
>>    -> HASH ...
>>        -> SEQ SCAN ON kat...
>
>> So, I can see that it's performing two scans without using indexes. Why?
>
>Looks like a perfectly reasonable plan to me.  The nice thing about a
>hash join is that it doesn't need to examine the tables in sorted order,
>so there's no need for the expense of an index scan.  The system tries
>to estimate whether this will be cheaper than a merge join (which does
>need to scan the tables in sorted order), and evidently it thinks so.
>
>You haven't given us nearly enough info to tell whether that's a good
>decision or not, however.  How big are the tables, and what are the data
>types of the columns you're joining on?  It would help also to see the
>*full* output from EXPLAIN, including all the numeric values.
>
>                        regards, tom lane

I'm a bit surprised that there is some better way to solve such querry than
using index scan - I though it should be done by seq. scanning first table and
for each rec. make one index scan to find if there is "linked" record at second
table... OK, it's something far away my dbase experiences :) Where I can find
more information about HASH JOIN algorithm? And why it's performing so slow
(compared to similar job using old good .dbf tables) ?

After my post to this mail. list I read here discussion about joins and
something like LINK field type. I think it was discussion about my problem (or
my point of view) too. Maybe sometimes it would be better to have some
control of querry optimizer? Maybe I'm out of SQL philosophy, I don't know...

Anyway, here are some details about my tables. The first is about 60 000 recs,
second is about 80 000 recs, and there is about 8000 matching records -
records which have the same value for id keys (varchar(20) and varchar(13)). 
I'm just trying to link the information from some different source than
mine with my primary catalogue of music titles. OK, I believe you that it was
correct decision of the optimizer, if you say it. Evidently I should study more
the relational database concepts... Could somebody point me to some docs about
it?

--
Michal Samek, Tony distribuce s.r.o.
webmaster@tony.cz  (++420659/321350)
ICQ: 38607210


pgsql-sql by date:

Previous
From: Kyle Bateman
Date:
Subject: Re: [SQL] begin/end/abort work for sequences?
Next
From: Peter Eisentraut
Date:
Subject: Tricky SQL (?)