Thread: Beginner's headache of joins
Hi all, I'm trying to deal with joins the first time and it looks like I'm stupid enough to solve the problem without your help :) My main problem is when I'm performing SELECT on 3 tables - all the keys are indexed - Postgresql is working for so long time that I don't thing everything is ok. My last try ended with disk full of files like HJ999... at the database directory - so I know the postgresql is performing something called hash join... I don't know exactly if it is ok or not. IMHO my select job is not so complicated and tables are not so big to end with 300MB of temporary files (and it didn't finished at all), so something IS wrong... OK, I want to solve it myself... I was studying the docs so I know something more about it and I started experimenting with easier join of two tables. But I'm not sure what does mean following result of EXPLAIN SELECT: explain select catnum.catnum, kat.ident, kat.nazev where catnum.catnum=kat.ident; resulting as: HASH JOIN ... -> SEQ SCAN ON catnum ... -> HASH ... -> SEQ SCAN ON kat... So, I can see that it's performing two scans without using indexes. Why? both catnum.catnum and kat.ident are indexed (default index type)... I've expected one index scan, but there is not. What's wrong? I can't find the description of such problem at docs... I've tried VACUUMing both the tables but result is the same. My postgresql is 6.4.2. Maybe i'm using wrong index type? Maybe it's ok and I'm not understanding the whole join thing? Anyway thanks for any comments and sorry for my school English... -- Michal Samek, Tony distribuce s.r.o. webmaster@tony.cz (++420659/321350) ICQ: 38607210
webmaster <webmaster@tony.cz> writes: > explain select catnum.catnum, kat.ident, kat.nazev where > catnum.catnum=kat.ident; > resulting as: > HASH JOIN ... > -> SEQ SCAN ON catnum ... > -> HASH ... > -> SEQ SCAN ON kat... > So, I can see that it's performing two scans without using indexes. Why? Looks like a perfectly reasonable plan to me. The nice thing about a hash join is that it doesn't need to examine the tables in sorted order, so there's no need for the expense of an index scan. The system tries to estimate whether this will be cheaper than a merge join (which does need to scan the tables in sorted order), and evidently it thinks so. You haven't given us nearly enough info to tell whether that's a good decision or not, however. How big are the tables, and what are the data types of the columns you're joining on? It would help also to see the *full* output from EXPLAIN, including all the numeric values. regards, tom lane
Telling us the query you're doing might help. webmaster wrote: > > Hi all, > I'm trying to deal with joins the first time and it looks like I'm stupid > enough to solve the problem without your help :) > > My main problem is when I'm performing SELECT on 3 tables - all the keys are > indexed - Postgresql is working for so long time that I don't thing everything > is ok. My last try ended with disk full of files like HJ999... at the database > directory - so I know the postgresql is performing something called hash > join... I don't know exactly if it is ok or not. IMHO my select job is not so > complicated and tables are not so big to end with 300MB of temporary files (and > it didn't finished at all), so something IS wrong... > > OK, I want to solve it myself... I was studying the docs so I know something > more about it and I started experimenting with easier join of two tables. But > I'm not sure what does mean following result of EXPLAIN SELECT: > > explain select catnum.catnum, kat.ident, kat.nazev where > catnum.catnum=kat.ident; > > resulting as: > > HASH JOIN ... > -> SEQ SCAN ON catnum ... > -> HASH ... > -> SEQ SCAN ON kat... > > So, I can see that it's performing two scans without using indexes. Why? both > catnum.catnum and kat.ident are indexed (default index type)... I've expected > one index scan, but there is not. What's wrong? I can't find the description of > such problem at docs... I've tried VACUUMing both the tables but result > is the same. My postgresql is 6.4.2. Maybe i'm using wrong index type? Maybe > it's ok and I'm not understanding the whole join thing? > > Anyway thanks for any comments and sorry for my school English... > > -- > Michal Samek, Tony distribuce s.r.o. > webmaster@tony.cz (++420659/321350) > ICQ: 38