Re: JOIN vs. LEFT JOIN - Mailing list pgsql-novice
From | Andreas Wenk |
---|---|
Subject | Re: JOIN vs. LEFT JOIN |
Date | |
Msg-id | 4980528B.7040605@netzmeister-st-pauli.de Whole thread Raw |
In response to | Re: JOIN vs. LEFT JOIN ("Nico Callewaert" <callewaert.nico@telenet.be>) |
List | pgsql-novice |
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Nico Callewaert schrieb: >>>>>> Nico Callewaert schrieb: >>>>>>> Hi ! >>>>>>> >>>>>>> I heard that it is always better to use a full JOIN rather than a >>>>>>> LEFT >>>>>>> JOIN, for performance reasons. Is that correct ? >>>>>> >>>>>> please note that a JOIN and a LEFT JOIN are tow different things. >>>>>> Let's assume you have >>>>>> two tables (without any sens): >>>>>> >>>>>> books: >>>>>> >>>>>> headline | content >>>>>> - ----------+------------- >>>>>> politics | russia >>>>>> politics | middle east >>>>>> politics | germany >>>>>> >>>>>> interests: >>>>>> >>>>>> content >>>>>> - --------- >>>>>> germany >>>>>> >>>>>> Then fire two select statements like this: >>>>>> >>>>>> #1: >>>>>> SELECT a.headline,a.content,b.content as contentb >>>>>> FROM books a >>>>>> JOIN interests b ON a.content = b.content; >>>>>> >>>>>> headline | content | contentb >>>>>> - ----------+---------+---------- >>>>>> politics | germany | germany >>>>>> (1 row) >>>>>> >>>>>> #2: >>>>>> SELECT a.headline,a.content,b.content as contentb >>>>>> FROM books a >>>>>> LEFT JOIN interests b on a.content = b.content; >>>>>> headline | content | contentb >>>>>> - ----------+-------------+---------- >>>>>> politics | russia | >>>>>> politics | middle east | >>>>>> politics | germany | germany >>>>>> (3 rows) >>>>>> >>>>>>> But it's barely possible to use full JOINS all the time, since >>>>>>> most of >>>>>>> the lookup fields are not required. >>>>>>> Example : in the customer table, the customer type is a looup field >>>>>>> to a >>>>>>> customer_type table. But that is not required to post the record. >>>>>>> So I >>>>>>> was thinking to create a record in the customer_type table with ID = >>>>>>> -1. And in case the customer type is NULL, to assign it the >>>>>>> value -1. >>>>>>> That way, it will be always possible to do a full JOIN. I was >>>>>>> wondering >>>>>>> if that is good practice or rather nonsense ? >>>>>> >>>>>> Hm concerning the functionality of LEFT JOIN I do not really >>>>>> understand what you wanna do >>>>>> here. You created the customer_type table to have the possibility to >>>>>> give more types to >>>>>> one customer (1:n). But therefore you do not need a LEFT JOIN. The >>>>>> statement could be like: >>>>>> >>>>>> SELECT name a >>>>>> FROM customer a, cutomer_type b >>>>>> WHERE a.id = b.id >>>>>> AND b.type = 'super customer' >>>>>> >>>>>> Or not? >>>>>> >>>>>> Cheers >>>>>> >>>>>> Andy >>>>> >>>>> >>>>> Hi Andreas, >>>>> >>>>> Thanks for the reply. I know the difference between JOIN and LEFT >>>>> JOIN. >>>>> The question was just if there is a performance hit between the 2 ? >>>>> >>>>> I never use constructs like this : SELECT name a >>>>>> FROM customer a, cutomer_type b >>>>>> WHERE a.id = b.id >>>>>> AND b.type = 'super customer' >>>>> >>>>> Always Joins. >>>>> I will try to clarify. The current situation is : the >>>>> customer_type has >>>>> no value, so = NULL, no problem with LEFT JOIN. >>>> >>>> Why do you need an empty entry in this table? Your application should >>>> take care, that this >>>> does not happen ... >>>> >>>>> But I'm wondering with tables that has thousands of records, if LEFT >>>>> JOINS are performing well ? >>>> >>>> EXPLAIN ANALYZE with my examples: >>>> >>>> #1 >>>> test=# explain analyze select a.headline,a.content,b.content as >>>> contentb from books a join >>>> interests b on a.content = b.content; >>>> QUERY PLAN >>>> >>>> - >>>> ------------------------------------------------------------------------------------------------------------------ >>>> >>>> >>>> Hash Join (cost=10.88..23.67 rows=70 width=1548) (actual >>>> time=0.051..0.058 rows=1 loops=1) >>>> Hash Cond: (("outer".content)::text = ("inner".content)::text) >>>> -> Seq Scan on interests b (cost=0.00..11.40 rows=140 width=516) >>>> (actual >>>> time=0.007..0.010 rows=1 loops=1) >>>> -> Hash (cost=10.70..10.70 rows=70 width=1032) (actual >>>> time=0.026..0.026 rows=3 loops=1) >>>> -> Seq Scan on books a (cost=0.00..10.70 rows=70 width=1032) >>>> (actual >>>> time=0.004..0.013 rows=3 loops=1) >>>> Total runtime: 0.094 ms >>>> (6 rows) >>>> >>>> >>>> #2 >>>> test=# explain analyze select a.headline,a.content,b.content as >>>> contentb from books a >>>> left join interests b on a.content = b.content; >>>> QUERY PLAN >>>> >>>> - >>>> ---------------------------------------------------------------------------------------------------------------------- >>>> >>>> >>>> Hash Left Join (cost=11.75..23.72 rows=70 width=1548) (actual >>>> time=0.038..0.062 rows=3 >>>> loops=1) >>>> Hash Cond: (("outer".content)::text = ("inner".content)::text) >>>> -> Seq Scan on books a (cost=0.00..10.70 rows=70 width=1032) >>>> (actual >>>> time=0.007..0.014 rows=3 loops=1) >>>> -> Hash (cost=11.40..11.40 rows=140 width=516) (actual >>>> time=0.015..0.015 rows=1 loops=1) >>>> -> Seq Scan on interests b (cost=0.00..11.40 rows=140 >>>> width=516) (actual >>>> time=0.004..0.007 rows=1 loops=1) >>>> Total runtime: 0.102 ms >>>> (6 rows) >>>> >>>> I fired each statement ten times. But seriously - this is more or less >>>> just a hint that a >>>> LEFT JOIN is less performant tahn a JOIN. To get that exactly it >>>> depend on which execution >>>> plan the planner is using. So the best way ist to use EXPALIN ANALYZE >>>> with your statements. >>>> >>>> Sorry for the big post with much output ... >>>> >>>> Cheers >>>> >>>> Andy >>>> >>> >>> >>> Hi Andy, >>> >>> Thanks again for the fast reply ! >>> >>>> Why do you need an empty entry in this table? Your application should >>>> take care, that this >>>> does not happen ... >>> >>> That is the whole point of my doubts. When a user creates a new >>> customer, it's not an obligation to enter a customer_type, and still >>> many other fields, like tax, payment, yellow pages, payment method, >>> etc... related things. So I always have lots of LEFT JOIN's in my >>> queries, because of the NULL fields. You said the application should >>> handle it. So you mean, even the field is not obligatory, you would put >>> a value on it ? All lookup tables are of course linked through foreign >>> keys to enforce data integrity. So what value would you enter ? Like 0, >>> or -1 ?? And make sure there is a 0, or -1 record in the lookup table >>> ? Otherwise the foreign key will complain. >>> I think I almost get your point, just need a last little push :-) >> >> Ah ok - now we are more in database/application design ;-). Here's >> another question: are >> you really sure that the databse design is correct? Think about having >> all the info like >> customer_type, tax, payment, yellow pages, payment method in the >> customer table also. >> Menas - do you really need for these info more than one entry for one >> customer? So are you >> sure you need the 1:n relation in all cases? I think not, so that >> would mean you put the >> info about e.g tax, payment method and so on in the customer table as >> well and not in one >> (or more ...? you were writing "all referenced tables") referenced >> tables. As a result you >> will have less LEFT JOINS ... >> >> Cheers >> >> Andy >> > > Hi Andy, > > I don't know what are the group policies here ? In case I have to cut a > part of the message, just tell me, in case it gets too long. > And in case it gets off topic, just send me a warning :-) don't worry ;-) > Well, yes, we need all those 1:n relationships, because customer type > can be prospect, customer, not active. Payment methods can be visa, > cash, wire transfer, etc... Yellow pages are the category numbers. VAT > rules can be 21%, 6%, etc... All of these fields are feeded through Ok please don't get me wrong for asking again: one customer can have two tax rates? Why? Because he's living in the USA and has a taxrate of 25% and lives also in Germany and has 19%? Hm ... strange ... ;-) . And also the custumer is 'prospect' OR 'not active' ... but not both ... ;-) > drop downs in the application, so the customer can easily select a value. So all these dropdowns are multiselect ...? I am not sure if I can help you with this. My idea is really to think about the design of the application and / or the database ... again - please don't get me wrong ... > Nico > > -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFJgFKLVa7znmSP9AwRAlGkAJ9imdvkfk3iTw146Eru3BFB1llPZQCgqPXb fYhH9zDowHAtjAYjfAHlhcM= =Kpf5 -----END PGP SIGNATURE-----
pgsql-novice by date: