Thread: Join efficiency

Join efficiency

From
"Russ Brown"
Date:
Hello all,

Recently a post on this list made me think a bit about the way in which I
write my queries.

I have always written queries with ordinary joins in this manner:

SELECT * FROM a, b WHERE a.x=b.x;

However I recently saw an laternative syntax:

SELECT * FROM a JOIN b ON a.x=b.x;

Is there any difference between these queries in terms of the speed of
planning or the quality of the plan untimately used? I'd imagine that the
second form provides more information that the planner may be able to use
to make a better plan (or make a good plan more easily), but I've never
had any problems with the first form.

It also seems to me that the second form is more self-documenting, which
is something I'm always in favour of.

I'd appreciate anyone's thought/insight.

Thanks.

--

Russell Brown

Re: Join efficiency

From
Date:
NOTE: The first way cannot support OUTER joins, the second way can.  Hence sometimes one has to use
the second way for at least some of the joins.

PREVIOUSLY: The second way can allow one to tell the planner a "better way" to join the tables.
Likewise it can also enable the programmer to force the planner into a worse way.  Oops!
NOW: I believe that the latest version of postgres (7.4.x) the planner will override the 2nd methods
requested join method if it knows of a better way and can do the better way.  (Outer joins need to
be done last, by the nature of them, and so cannot be changed much, there may be other cases where
the planner cannot change the requested plan).

I am not an expert, but this is what I recall from following the list.

Terry Fielder
Manager Software Development and Deployment
Great Gulf Homes / Ashton Woods Homes
terry@greatgulfhomes.com
Fax: (416) 441-9085


> -----Original Message-----
> From: pgsql-general-owner@postgresql.org
> [mailto:pgsql-general-owner@postgresql.org]On Behalf Of Russ Brown
> Sent: Wednesday, September 01, 2004 7:55 AM
> To: pgsql-general@postgresql.org
> Subject: [GENERAL] Join efficiency
>
>
> Hello all,
>
> Recently a post on this list made me think a bit about the
> way in which I
> write my queries.
>
> I have always written queries with ordinary joins in this manner:
>
> SELECT * FROM a, b WHERE a.x=b.x;
>
> However I recently saw an laternative syntax:
>
> SELECT * FROM a JOIN b ON a.x=b.x;
>
> Is there any difference between these queries in terms of the
> speed of
> planning or the quality of the plan untimately used? I'd
> imagine that the
> second form provides more information that the planner may be
> able to use
> to make a better plan (or make a good plan more easily), but
> I've never
> had any problems with the first form.
>
> It also seems to me that the second form is more
> self-documenting, which
> is something I'm always in favour of.
>
> I'd appreciate anyone's thought/insight.
>
> Thanks.
>
> --
>
> Russell Brown
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 9: the planner will ignore your desire to choose an index
> scan if your
>       joining column's datatypes do not match
>


...

From
"Russ Brown"
Date:
Hi, thanks for your reply,

On Wed, 1 Sep 2004 08:10:52 -0400, <terry@ashtonwoodshomes.com> wrote:

> NOTE: The first way cannot support OUTER joins, the second way can.
> Hence sometimes one has to use
> the second way for at least some of the joins.
>

Yes, I've always done OUTER joins the second way. I suppose it's just the
way I was taught SQL: I was initially taught now to do 'ordinary' joins
using the first syntax, and then taught 'LEFT' joins using the second
syntax when I came to need to use them (I very much leaned SQL 'on the
job', though I know of people who *always* use OUTER joins in their
queries). I'd never considered that there was another syntax!

> PREVIOUSLY: The second way can allow one to tell the planner a "better
> way" to join the tables.
> Likewise it can also enable the programmer to force the planner into a
> worse way.  Oops!
> NOW: I believe that the latest version of postgres (7.4.x) the planner
> will override the 2nd methods
> requested join method if it knows of a better way and can do the better
> way.  (Outer joins need to
> be done last, by the nature of them, and so cannot be changed much,
> there may be other cases where
> the planner cannot change the requested plan).
>

That being the case, would it be true to say that with recent versions of
PostgreSQL they both perform identically, meaning the second could be
considered preferable due to its self-documenting nature (and consistency
with the OUTER JOIN syntax)?

> I am not an expert, but this is what I recall from following the list.
>
> Terry Fielder
> Manager Software Development and Deployment
> Great Gulf Homes / Ashton Woods Homes
> terry@greatgulfhomes.com
> Fax: (416) 441-9085
>
>
>> -----Original Message-----
>> From: pgsql-general-owner@postgresql.org
>> [mailto:pgsql-general-owner@postgresql.org]On Behalf Of Russ Brown
>> Sent: Wednesday, September 01, 2004 7:55 AM
>> To: pgsql-general@postgresql.org
>> Subject: [GENERAL] Join efficiency
>>
>>
>> Hello all,
>>
>> Recently a post on this list made me think a bit about the
>> way in which I
>> write my queries.
>>
>> I have always written queries with ordinary joins in this manner:
>>
>> SELECT * FROM a, b WHERE a.x=b.x;
>>
>> However I recently saw an laternative syntax:
>>
>> SELECT * FROM a JOIN b ON a.x=b.x;
>>
>> Is there any difference between these queries in terms of the
>> speed of
>> planning or the quality of the plan untimately used? I'd
>> imagine that the
>> second form provides more information that the planner may be
>> able to use
>> to make a better plan (or make a good plan more easily), but
>> I've never
>> had any problems with the first form.
>>
>> It also seems to me that the second form is more
>> self-documenting, which
>> is something I'm always in favour of.
>>
>> I'd appreciate anyone's thought/insight.
>>
>> Thanks.
>>
>> --
>>
>> Russell Brown
>>
>> ---------------------------(end of
>> broadcast)---------------------------
>> TIP 9: the planner will ignore your desire to choose an index
>> scan if your
>>       joining column's datatypes do not match
>>
>



--

Russell Brown

Re: Join efficiency

From
Richard Huxton
Date:
Russ Brown wrote:
>
> I have always written queries with ordinary joins in this manner:
>
> SELECT * FROM a, b WHERE a.x=b.x;
>
> However I recently saw an laternative syntax:
>
> SELECT * FROM a JOIN b ON a.x=b.x;
>
> Is there any difference between these queries in terms of the speed of
> planning or the quality of the plan untimately used? I'd imagine that
> the  second form provides more information that the planner may be able
> to use  to make a better plan (or make a good plan more easily), but
> I've never  had any problems with the first form.

The first form allows PG to plan however it sees fit. The second will
force the join order to be the same as you specify in the query. This
doesn't matter here, but might with a more complicated query.

With v7.4 and higher, I believe this join forcing is configurable
(join_collapse_limit).

> It also seems to me that the second form is more self-documenting,
> which  is something I'm always in favour of.

I tend to prefer the WHERE form, but that might just be me.

--
   Richard Huxton
   Archonet Ltd

Re: Join efficiency

From
John Sidney-Woollett
Date:
Does anyone know if there is a postgres shorthand for Oracle's (+)
notation to denote an outer join?

eg

SELECT * from a, b where a.x = b.x (+)

John Sidney-Woollett

Richard Huxton wrote:
> Russ Brown wrote:
>
>>
>> I have always written queries with ordinary joins in this manner:
>>
>> SELECT * FROM a, b WHERE a.x=b.x;
>>
>> However I recently saw an laternative syntax:
>>
>> SELECT * FROM a JOIN b ON a.x=b.x;
>>
>> Is there any difference between these queries in terms of the speed
>> of  planning or the quality of the plan untimately used? I'd imagine
>> that the  second form provides more information that the planner may
>> be able to use  to make a better plan (or make a good plan more
>> easily), but I've never  had any problems with the first form.
>
>
> The first form allows PG to plan however it sees fit. The second will
> force the join order to be the same as you specify in the query. This
> doesn't matter here, but might with a more complicated query.
>
> With v7.4 and higher, I believe this join forcing is configurable
> (join_collapse_limit).
>
>> It also seems to me that the second form is more self-documenting,
>> which  is something I'm always in favour of.
>
>
> I tend to prefer the WHERE form, but that might just be me.
>

Re: Join efficiency

From
Richard Huxton
Date:
John Sidney-Woollett wrote:
> Does anyone know if there is a postgres shorthand for Oracle's (+)
> notation to denote an outer join?
>
> eg
>
> SELECT * from a, b where a.x = b.x (+)

Just the standard LEFT JOIN ... afaik

--
   Richard Huxton
   Archonet Ltd

Re: Join efficiency

From
"Michael Paesold"
Date:
Russ Brown wrote:

> >> SELECT * FROM a, b WHERE a.x=b.x;

> >> SELECT * FROM a JOIN b ON a.x=b.x;

> That being the case, would it be true to say that with recent versions of
> PostgreSQL they both perform identically, meaning the second could be
> considered preferable due to its self-documenting nature (and consistency
> with the OUTER JOIN syntax)?

Assuming join_collapse_limit is at it's default or set higher...

As far as I can say from reading the documentation, following the hackers
list and trying out myself: yes, both versions should yield the same
optimized query plan and are therefore equal performance wise.

You can just use the one you prefer.

Best Regards,
Michael Paesold


Re: Join efficiency

From
Tom Lane
Date:
"Russ Brown" <postgres@dot4dot.plus.com> writes:
> Is there any difference between these queries in terms of the speed of
> planning or the quality of the plan untimately used?

http://www.postgresql.org/docs/7.4/static/explicit-joins.html
http://www.postgresql.org/docs/7.3/static/explicit-joins.html
http://www.postgresql.org/docs/7.2/static/explicit-joins.html
http://www.postgresql.org/docs/7.1/static/explicit-joins.html

depending on which version you are using.  (I think 7.1-7.3
are essentially alike, but 7.4 is not.)

            regards, tom lane

Re: Join efficiency

From
"Russ Brown"
Date:
On Wed, 01 Sep 2004 10:31:07 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> "Russ Brown" <postgres@dot4dot.plus.com> writes:
>> Is there any difference between these queries in terms of the speed of
>> planning or the quality of the plan untimately used?
>
> http://www.postgresql.org/docs/7.4/static/explicit-joins.html
> http://www.postgresql.org/docs/7.3/static/explicit-joins.html
> http://www.postgresql.org/docs/7.2/static/explicit-joins.html
> http://www.postgresql.org/docs/7.1/static/explicit-joins.html
>
> depending on which version you are using.  (I think 7.1-7.3
> are essentially alike, but 7.4 is not.)
>
>             regards, tom lane
>

Thanks for that: very informative.

I should have spotted that in the manual myself, though it has been nice
reading other people's opinions on the subject too.

Regards.

--

Russell Brown

Re: Join efficiency

From
Laura Vance
Date:
This thread also brings up the question... what ever happened to the *
notation of the SQL2 standard for LEFT and RIGHT outer joins?

To pull all rows from table 'a' and only those from table 'b' that match
the column criteria.
SELECT * FROM a, b WHERE a.x*=b.x;

To pull all rows from table 'b' and only those from table 'a' that match
the column criteria.
SELECT * FROM a, b WHERE a.x=*b.x;

This notation was always easy to remember when you think of the asterisk
as being a wildcard (or 'all') for its side of the expression.

Russ Brown wrote:

> Hello all,
>
> Recently a post on this list made me think a bit about the way in
> which I  write my queries.
>
> I have always written queries with ordinary joins in this manner:
>
> SELECT * FROM a, b WHERE a.x=b.x;
>
> However I recently saw an laternative syntax:
>
> SELECT * FROM a JOIN b ON a.x=b.x;
>
> Is there any difference between these queries in terms of the speed
> of  planning or the quality of the plan untimately used? I'd imagine
> that the  second form provides more information that the planner may
> be able to use  to make a better plan (or make a good plan more
> easily), but I've never  had any problems with the first form.
>
> It also seems to me that the second form is more self-documenting,
> which  is something I'm always in favour of.
>
> I'd appreciate anyone's thought/insight.
>
> Thanks.
>

--
Thanks,
Laura Vance
Systems Engineer
Winfree Academy Charter Schools
6221 Riverside Dr. Suite 110
Irving, Tx  75039
Web: www.winfreeacademy.com



Re: Join efficiency

From
Jeff Boes
Date:
Russ Brown wrote:

> Is there any difference between these queries in terms of the speed of
> planning or the quality of the plan untimately used? I'd imagine that
> the  second form provides more information that the planner may be able
> to use  to make a better plan (or make a good plan more easily), but
> I've never  had any problems with the first form.

Use EXPLAIN:

http://www.postgresql.org/docs/7.4/static/sql-explain.html

--
(Posted from an account used as a SPAM dump. If you really want to get
in touch with me, dump the 'jboes' and substitute 'mur'.)
________
Jeffery Boes <>< jboes@qtm.net

Re: Join efficiency

From
"tanjunhua"
Date:
I'm sorry for my mistake, the postgre version is:

test_db=# SELECT version();
                                                    version
---------------------------------------------------------------------------------------------------------------
 PostgreSQL 8.1.8 on i486-pc-linux-gnu, compiled by GCC cc (GCC) 4.1.2
20061115 (prerelease) (Debian 4.1.1-21)
(1 row)

----- Original Message -----
From: "tanjunhua" <tanjh@riso.co.jp>
To: <pgsql-general@postgresql.org>
Sent: Wednesday, September 02, 2009 2:31 PM
Subject: [GENERAL] Join efficiency


> Hello, everybody.
> In my project, I have a select syntax to get record summary between three
> tables. one of them is tab_main consist of  46 columns(with 27797
> records), another is tab_user consist of 32 columns(with 3 records) and
> the last one is tab_property consist of 117 columns(with 30541 records). I
> have the trouble that it cost me a lot of time when execute the select
> syntax. the following is the select syntax and analyze result.
>
> table structure:
> tab_main(id, uid, status, bpassword, realdelflag, delflag, kind, ...)
> tab_user(uid, printauth, bprtpermit, ...)
> tab_property(id, mode, ...)
>
> 1. select syntax:
> EXPLAIN ANALYZE  SELECT count(Id) FROM (SELECT DISTINCT t1.Id AS Id FROM
> tab_main t1, tab_user t2, tab_property t3 WHERE (t1.uid = 2 AND t1.status
>  >= 21 AND t1.bpassword = 0 AND t1.realdelflag = 0 AND (t1.kind= 1  OR
> (t1.kind = 0 AND (t1.delflag <> 0 OR (t2.uid = 2 AND (t2.printauth = 2 OR
> t2.printauth = 3) AND t2.bprtpermit = 0 AND t3.id = t1.id AND (t3.mode = 0
> OR t3.mode = 1))))))) subt0;
>
> 2. analyze result:
>
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Aggregate  (cost=19129.93..19129.94 rows=1 width=4) (actual
> time=10661.656..10661.658 rows=1 loops=1)
>   ->  Unique  (cost=18672.11..19129.92 rows=1 width=4) (actual
> time=8288.446..10661.586 rows=5 loops=1)
>         ->  Sort  (cost=18672.11..18901.01 rows=91562 width=4) (actual
> time=8288.440..9532.507 rows=458115 loops=1)
>               Sort Key: t1.id
>               ->  Nested Loop  (cost=1550.00..10341.45 rows=91562 width=4)
> (actual time=5.002..4724.436 rows=458115 loops=1)
>                     Join Filter: (("inner".jobkind = 1) OR
> (("inner".jobkind = 0) AND (("inner".delflag <> 0) OR (("inner".uid = 2)
> AND (("inner".printright = 2) OR ("inner".printright = 3)) AND
> ("inner".print_permitcolorprint = 0) AND ("outer".id = "inner".id) AND
> (("outer".colormode = 0) OR ("outer".colormode = 1))))))
>                     ->  Seq Scan on job_p t3  (cost=0.00..4668.41
> rows=30541 width=8) (actual time=0.023..170.619 rows=30541 loops=1)
>                     ->  Materialize  (cost=1550.00..1550.03 rows=3
> width=24) (actual time=0.002..0.055 rows=21 loops=30541)
>                           ->  Nested Loop  (cost=0.00..1550.00 rows=3
> width=24) (actual time=4.949..149.081 rows=21 loops=1)
>                                 ->  Seq Scan on job_ctl t1
> (cost=0.00..1548.94 rows=1 width=12) (actual time=4.906..148.794 rows=7
> loops=1)
>                                       Filter: ((uid = 2) AND (jobsts >=
> 21) AND (pinflag = 0) AND (realdelflag = 0))
>                                 ->  Seq Scan on users t2  (cost=0.00..1.03
> rows=3 width=12) (actual time=0.007..0.016 rows=3 loops=7)
> Total runtime: 10696.630 ms
> (13 rows)
>
> could anyone explain the result of analyze and give me some idea to speed
> up the select?  looking forward your response.
> best wishes.
>
> winsea