Re: Q: Performance of join vs embedded query for simple queries? - Mailing list pgsql-performance

From Tom Lane
Subject Re: Q: Performance of join vs embedded query for simple queries?
Date
Msg-id 1484.1155864093@sss.pgh.pa.us
Whole thread Raw
In response to Q: Performance of join vs embedded query for simple queries?  (mark@mark.mielke.cc)
Responses Re: Q: Performance of join vs embedded query for simple queries?  (mark@mark.mielke.cc)
Re: Q: Performance of join vs embedded query for simple queries?  (mark@mark.mielke.cc)
List pgsql-performance
mark@mark.mielke.cc writes:
> I have two simple queries that do what I believe to be the exact same
> thing.

These are actually not equivalent per spec.

> neudb=> select uid, name from sm_change where system_dbid = (select system_dbid from sm_system where uid =
'2ff5942c.dd2911d5.ad56.08:00:09:fd:1b:da')and lower(name) = lower('markm-Q00855572'); 

> neudb=> select sm_change.uid, sm_change.name from sm_change join sm_system using (system_dbid) where sm_system.uid =
'2ff5942c.dd2911d5.ad56.08:00:09:fd:1b:da'and lower(sm_change.name) = lower('markm-Q00855572'); 

The subselect form constrains the sub-select to return at most one row
--- you'd have gotten an error if there were more than one sm_system row
with that uid.  The join form does not make this constraint.

Another related form is

neudb=> select uid, name from sm_change where system_dbid IN (select system_dbid from sm_system where uid =
'2ff5942c.dd2911d5.ad56.08:00:09:fd:1b:da')and lower(name) = lower('markm-Q00855572'); 

This still isn't equivalent to the join: it'll return at most one copy
of any sm_change row, whereas you can get multiple copies of the same
sm_change row from the join, if there were multiple matching sm_system
rows.  (Hm, given the unique index on (system_dbid, uid), I guess that
couldn't actually happen --- but you have to reason about it knowing
that that index is there, it's not obvious from the form of the query.)

Anyway: given the way that the planner works, the IN form and the join
form will probably take comparable amounts of time to plan.  The "=
subselect" form is much more constrained in terms of the number of
alternative implementations we have, so it doesn't surprise me that it
takes less time to plan.

            regards, tom lane

pgsql-performance by date:

Previous
From: mark@mark.mielke.cc
Date:
Subject: Q: Performance of join vs embedded query for simple queries?
Next
From: mark@mark.mielke.cc
Date:
Subject: Re: Q: Performance of join vs embedded query for simple queries?