Re: JOIN vs. LEFT JOIN - Mailing list pgsql-novice

From Nico Callewaert
Subject Re: JOIN vs. LEFT JOIN
Date
Msg-id 5A5C5FB6FC174396A1B165B2B282526A@etsinformatics.local
Whole thread Raw
In response to JOIN vs. LEFT JOIN  ("Nico Callewaert" <callewaert.nico@telenet.be>)
Responses Re: JOIN vs. LEFT JOIN  (Andreas Wenk <a.wenk@netzmeister-st-pauli.de>)
List pgsql-novice
>>>>> Nico Callewaert schrieb:
>>>>>> Hi !
>>>>>>
>>>>>> I heard that it is always better to use a full JOIN rather than a
>>>>>> LEFT
>>>>>> JOIN, for performance reasons.  Is that correct ?
>>>>>
>>>>> please note that a JOIN and a LEFT JOIN are tow different things.
>>>>> Let's assume you have
>>>>> two tables (without any sens):
>>>>>
>>>>> books:
>>>>>
>>>>> headline |   content
>>>>> - ----------+-------------
>>>>> politics | russia
>>>>> politics | middle east
>>>>> politics | germany
>>>>>
>>>>> interests:
>>>>>
>>>>> content
>>>>> - ---------
>>>>> germany
>>>>>
>>>>> Then fire two select statements like this:
>>>>>
>>>>> #1:
>>>>> SELECT a.headline,a.content,b.content as contentb
>>>>> FROM books a
>>>>> JOIN interests b ON a.content = b.content;
>>>>>
>>>>>  headline | content | contentb
>>>>> - ----------+---------+----------
>>>>> politics | germany | germany
>>>>> (1 row)
>>>>>
>>>>> #2:
>>>>> SELECT a.headline,a.content,b.content as contentb
>>>>> FROM books a
>>>>> LEFT JOIN interests b on a.content = b.content;
>>>>> headline |   content   | contentb
>>>>> - ----------+-------------+----------
>>>>> politics | russia      |
>>>>> politics | middle east |
>>>>> politics | germany     | germany
>>>>> (3 rows)
>>>>>
>>>>>> But it's barely possible to use full JOINS all the time, since most
>>>>>> of
>>>>>> the lookup fields are not required.
>>>>>> Example : in the customer table, the customer type is a looup field
>>>>>> to a
>>>>>> customer_type table.  But that is not required to post the record.
>>>>>> So I
>>>>>> was thinking to create a record in the customer_type table with ID =
>>>>>> -1.  And in case the customer type is NULL, to assign it the
>>>>>> value -1.
>>>>>> That way, it will be always possible to do a full JOIN.  I was
>>>>>> wondering
>>>>>> if that is good practice or rather nonsense ?
>>>>>
>>>>> Hm concerning the functionality of LEFT JOIN I do not really
>>>>> understand what you wanna do
>>>>> here. You created the customer_type table to have the possibility to
>>>>> give more types to
>>>>> one customer (1:n). But therefore you do not need a LEFT JOIN. The
>>>>> statement could be like:
>>>>>
>>>>> SELECT name a
>>>>> FROM customer a, cutomer_type b
>>>>> WHERE a.id = b.id
>>>>> AND b.type = 'super customer'
>>>>>
>>>>> Or not?
>>>>>
>>>>> Cheers
>>>>>
>>>>> Andy
>>>>
>>>>
>>>> Hi Andreas,
>>>>
>>>> Thanks for the reply.  I know the difference between JOIN and LEFT
>>>> JOIN.
>>>> The question was just if there is a performance hit between the 2 ?
>>>>
>>>> I never use constructs like this : SELECT name a
>>>>> FROM customer a, cutomer_type b
>>>>> WHERE a.id = b.id
>>>>> AND b.type = 'super customer'
>>>>
>>>> Always Joins.
>>>> I will try to clarify.  The current situation is : the customer_type
>>>> has
>>>> no value, so = NULL, no problem with LEFT JOIN.
>>>
>>> Why do you need an empty entry in this table? Your application should
>>> take care, that this
>>> does not happen ...
>>>
>>>> But I'm wondering with tables that has thousands of records, if LEFT
>>>> JOINS are performing well ?
>>>
>>> EXPLAIN ANALYZE with my examples:
>>>
>>> #1
>>> test=# explain analyze select a.headline,a.content,b.content as
>>> contentb from books a join
>>> interests b on a.content = b.content;
>>>                                                    QUERY PLAN
>>>
>>> -
>>> ------------------------------------------------------------------------------------------------------------------
>>>
>>> Hash Join  (cost=10.88..23.67 rows=70 width=1548) (actual
>>> time=0.051..0.058 rows=1 loops=1)
>>>   Hash Cond: (("outer".content)::text = ("inner".content)::text)
>>>   ->  Seq Scan on interests b  (cost=0.00..11.40 rows=140 width=516)
>>> (actual
>>> time=0.007..0.010 rows=1 loops=1)
>>>   ->  Hash  (cost=10.70..10.70 rows=70 width=1032) (actual
>>> time=0.026..0.026 rows=3 loops=1)
>>>         ->  Seq Scan on books a  (cost=0.00..10.70 rows=70 width=1032)
>>> (actual
>>> time=0.004..0.013 rows=3 loops=1)
>>> Total runtime: 0.094 ms
>>> (6 rows)
>>>
>>>
>>> #2
>>> test=# explain analyze select a.headline,a.content,b.content as
>>> contentb from books a
>>> left join interests b on a.content = b.content;
>>>                                                      QUERY PLAN
>>>
>>> -
>>>
----------------------------------------------------------------------------------------------------------------------
>>>
>>> Hash Left Join  (cost=11.75..23.72 rows=70 width=1548) (actual
>>> time=0.038..0.062 rows=3
>>> loops=1)
>>>   Hash Cond: (("outer".content)::text = ("inner".content)::text)
>>>   ->  Seq Scan on books a  (cost=0.00..10.70 rows=70 width=1032) (actual
>>> time=0.007..0.014 rows=3 loops=1)
>>>   ->  Hash  (cost=11.40..11.40 rows=140 width=516) (actual
>>> time=0.015..0.015 rows=1 loops=1)
>>>         ->  Seq Scan on interests b  (cost=0.00..11.40 rows=140
>>> width=516) (actual
>>> time=0.004..0.007 rows=1 loops=1)
>>> Total runtime: 0.102 ms
>>> (6 rows)
>>>
>>> I fired each statement ten times. But seriously - this is more or less
>>> just a hint that a
>>> LEFT JOIN is less performant tahn a JOIN. To get that exactly it
>>> depend on which execution
>>> plan the planner is using. So the best way ist to use EXPALIN ANALYZE
>>> with your statements.
>>>
>>> Sorry for the big post with much output ...
>>>
>>> Cheers
>>>
>>> Andy
>>>
>>
>>
>> Hi Andy,
>>
>> Thanks again for the fast reply !
>>
>>> Why do you need an empty entry in this table? Your application should
>>> take care, that this
>>> does not happen ...
>>
>> That is the whole point of my doubts.  When a user creates a new
>> customer, it's not an obligation to enter a customer_type, and still
>> many other fields, like tax, payment, yellow pages, payment method,
>> etc... related things.  So I always have lots of LEFT JOIN's in my
>> queries, because of the NULL fields.  You said the application should
>> handle it.  So you mean, even the field is not obligatory, you would put
>> a value on it ?  All lookup tables are of course linked through foreign
>> keys to enforce data integrity. So what value would you enter ?  Like 0,
>> or -1 ??  And make sure there is a 0, or -1 record in the lookup table
>> ?  Otherwise the foreign key will complain.
>> I think I almost get your point, just need a last little push :-)
>
> Ah ok - now we are more in database/application design ;-). Here's another
> question: are
> you really sure that the databse design is correct? Think about having all
> the info like
> customer_type, tax, payment, yellow pages, payment method in the customer
> table also.
> Menas - do you really need for these info more than one entry for one
> customer? So are you
> sure you need the 1:n relation in all cases? I think not, so that would
> mean you put the
> info about e.g tax, payment method and so on in the customer table as well
> and not in one
> (or more ...? you were writing "all referenced tables") referenced tables.
> As a result you
> will have less LEFT JOINS ...
>
> Cheers
>
> Andy
>

Hi Andy,

I don't know what are the group policies here ? In case I have to cut a part
of the message, just tell me, in case it gets too long.
And in case it gets off topic, just send me a warning :-)

Well, yes, we need all those 1:n relationships, because customer type can be
prospect, customer, not active.  Payment methods can be visa, cash, wire
transfer, etc...  Yellow pages are the category numbers.  VAT rules can be
21%, 6%, etc...  All of these fields are feeded through drop downs in the
application, so the customer can easily select a value.

Nico



pgsql-novice by date:

Previous
From: Andreas Wenk
Date:
Subject: Re: JOIN vs. LEFT JOIN
Next
From: "Obe, Regina"
Date:
Subject: Re: JOIN vs. LEFT JOIN