Thread: Beginner's headache of joins

Beginner's headache of joins

From
webmaster
Date:
Hi all,
I'm trying to deal with joins the first time and it looks like I'm stupid
enough to solve the problem without your help :)

My main problem is when I'm performing SELECT on 3 tables - all the keys are
indexed - Postgresql is working for so long time that I don't thing everything
is ok. My last try ended with disk full of files like HJ999... at the database
directory - so I know the postgresql is performing something called hash
join... I don't know exactly if it is ok or not. IMHO my select job is not so
complicated and tables are not so big to end with 300MB of temporary files (and
it didn't finished at all), so something IS wrong...

OK, I want to solve it myself... I was studying the docs so I know something
more about it and I started experimenting with easier join of two tables. But
I'm not sure what does mean following result of EXPLAIN SELECT:

explain select catnum.catnum, kat.ident, kat.nazev where
catnum.catnum=kat.ident;

resulting as:

HASH JOIN ... -> SEQ SCAN ON catnum ... -> HASH ...   -> SEQ SCAN ON kat...

So, I can see that it's performing two scans without using indexes. Why? both
catnum.catnum and kat.ident are indexed (default index type)... I've expected
one index scan, but there is not. What's wrong? I can't find the description of
such problem at docs...  I've tried VACUUMing both the tables but result
is the same. My postgresql is 6.4.2. Maybe i'm using wrong index type? Maybe
it's ok and I'm not understanding the whole join thing? 

Anyway thanks for any comments and sorry for my school English...

--
Michal Samek, Tony distribuce s.r.o.
webmaster@tony.cz  (++420659/321350)
ICQ: 38607210


Re: [SQL] Beginner's headache of joins

From
Tom Lane
Date:
webmaster <webmaster@tony.cz> writes:
> explain select catnum.catnum, kat.ident, kat.nazev where
> catnum.catnum=kat.ident;

> resulting as:

> HASH JOIN ...
>   -> SEQ SCAN ON catnum ...
>   -> HASH ...
>     -> SEQ SCAN ON kat...

> So, I can see that it's performing two scans without using indexes. Why?

Looks like a perfectly reasonable plan to me.  The nice thing about a
hash join is that it doesn't need to examine the tables in sorted order,
so there's no need for the expense of an index scan.  The system tries
to estimate whether this will be cheaper than a merge join (which does
need to scan the tables in sorted order), and evidently it thinks so.

You haven't given us nearly enough info to tell whether that's a good
decision or not, however.  How big are the tables, and what are the data
types of the columns you're joining on?  It would help also to see the
*full* output from EXPLAIN, including all the numeric values.
        regards, tom lane


Re: [SQL] Beginner's headache of joins

From
Chris Bitmead
Date:
Telling us the query you're doing might help.

webmaster wrote:
> 
> Hi all,
> I'm trying to deal with joins the first time and it looks like I'm stupid
> enough to solve the problem without your help :)
> 
> My main problem is when I'm performing SELECT on 3 tables - all the keys are
> indexed - Postgresql is working for so long time that I don't thing everything
> is ok. My last try ended with disk full of files like HJ999... at the database
> directory - so I know the postgresql is performing something called hash
> join... I don't know exactly if it is ok or not. IMHO my select job is not so
> complicated and tables are not so big to end with 300MB of temporary files (and
> it didn't finished at all), so something IS wrong...
> 
> OK, I want to solve it myself... I was studying the docs so I know something
> more about it and I started experimenting with easier join of two tables. But
> I'm not sure what does mean following result of EXPLAIN SELECT:
> 
> explain select catnum.catnum, kat.ident, kat.nazev where
> catnum.catnum=kat.ident;
> 
> resulting as:
> 
> HASH JOIN ...
>   -> SEQ SCAN ON catnum ...
>   -> HASH ...
>     -> SEQ SCAN ON kat...
> 
> So, I can see that it's performing two scans without using indexes. Why? both
> catnum.catnum and kat.ident are indexed (default index type)... I've expected
> one index scan, but there is not. What's wrong? I can't find the description of
> such problem at docs...  I've tried VACUUMing both the tables but result
> is the same. My postgresql is 6.4.2. Maybe i'm using wrong index type? Maybe
> it's ok and I'm not understanding the whole join thing?
> 
> Anyway thanks for any comments and sorry for my school English...
> 
> --
> Michal Samek, Tony distribuce s.r.o.
> webmaster@tony.cz  (++420659/321350)
> ICQ: 38