Thread: Left Join Question

Left Join Question

From
"Ryan Wells"
Date:
<div class="Section1"><p class="MsoNormal">While looking through our data layer code today, I ran across this query:<p
class="MsoNormal"> <pclass="MsoNormal">SELECT <p class="MsoNormal">tasks.*,<p class="MsoNormal">clients.FirstName,<p
class="MsoNormal">clients.LastName,<pclass="MsoNormal">clients.MiddleInitial,<p class="MsoNormal">iteminfo.CreatedBy,<p
class="MsoNormal">iteminfo.StationId,<pclass="MsoNormal">iteminfo.CreatedDate,<p
class="MsoNormal">changelog.LastModified,<pclass="MsoNormal">changelog.LastModifiedBy,<p
class="MsoNormal">changelog.LastModifiedAt,<pclass="MsoNormal">ticklers.Due,<p class="MsoNormal">ticklers.Reminder<p
class="MsoNormal"> <pclass="MsoNormal">FROM tasks <p class="MsoNormal"> LEFT JOIN clients ON tasks.ClientId =
clients.ClientId<p class="MsoNormal">  LEFT JOIN iteminfo ON tasks.Id = iteminfo.ItemId <p class="MsoNormal">   LEFT
JOINchangelog ON tasks.Id = changelog.ItemId <p class="MsoNormal">    LEFT JOIN ticklers ON tasks.Id =
ticklers.RelatedId<p class="MsoNormal"> <p class="MsoNormal">WHERE tasks.Id = '123456';<p class="MsoNormal"> <p
class="MsoNormal">(I'vecleaned it up so it's  easier to read.)<p class="MsoNormal"> <p class="MsoNormal">The basic data
structureis that we have a todo list that contains a list of tasks which may or may not be associated with clients,
items,log entries, or ticklers (scheduled reminders).  <p class="MsoNormal"> <p class="MsoNormal">The query works as
intended: it returns a result-set with all the necessary data to display in the todo list.   The performance is not a
majorconcern, although it can be slow for large lists.<p class="MsoNormal"> <p class="MsoNormal">Since it works, my
questionis really more about principles:  Given that each of the tables in question will contain tens of thousands of
rows,is a nested join really the best way to approach this?<p class="MsoNormal"> <p class="MsoNormal">Thanks!<p
class="MsoNormal">Ryan</div>

Re: Left Join Question

From
ries van Twisk
Date:

On Nov 18, 2008, at 5:48 PM, Ryan Wells wrote:

While looking through our data layer code today, I ran across this query:
 
SELECT
tasks.*,
clients.FirstName,
clients.LastName,
clients.MiddleInitial,
iteminfo.CreatedBy,
iteminfo.StationId,
iteminfo.CreatedDate,
changelog.LastModified,
changelog.LastModifiedBy,
changelog.LastModifiedAt,
ticklers.Due,
ticklers.Reminder
 
FROM tasks
 LEFT JOIN clients ON tasks.ClientId = clients.ClientId
  LEFT JOIN iteminfo ON tasks.Id = iteminfo.ItemId
   LEFT JOIN changelog ON tasks.Id = changelog.ItemId
    LEFT JOIN ticklers ON tasks.Id = ticklers.RelatedId
 
WHERE tasks.Id = '123456';
 
(I've cleaned it up so it's  easier to read.)
 
The basic data structure is that we have a todo list that contains a list of tasks which may or may not be associated with clients, items, log entries, or ticklers (scheduled reminders). 
 
The query works as intended:  it returns a result-set with all the necessary data to display in the todo list.   The performance is not a major concern, although it can be slow for large lists.
 
Since it works, my question is really more about principles:  Given that each of the tables in question will contain tens of thousands of rows, is a nested join really the best way to approach this?
 
Thanks!
Ryan

To answer this better we need to know the exact relations between each table.
For example, does each task have 0 or more clients, or 1 or more clients?
If it's 1 or more, then you can properly use a JOIN instead of a LEFT JOIN, this would make the operation faster I believe.

I hope I did explain myself correctly.

Ries







Re: Left Join Question

From
Mark Roberts
Date:
On Tue, 2008-11-18 at 16:48 -0600, Ryan Wells wrote:
> Since it works, my question is really more about principles:  Given
> that each of the tables in question will contain tens of thousands of
> rows, is a nested join really the best way to approach this?

I don't see what's wrong with it.  The planner will likely pare down
tasks to its result set before joining to the other joins (which are all
keyed on various fields from task).

I know that we have lots of joins like this scattered all over our code,
but we usually use inner joins unless there's a specific reason not to.

-Mark




Re: Left Join Question

From
Richard Huxton
Date:
Ryan Wells wrote:
> 
> FROM tasks 
>  LEFT JOIN clients ON tasks.ClientId = clients.ClientId 
>   LEFT JOIN iteminfo ON tasks.Id = iteminfo.ItemId 
>    LEFT JOIN changelog ON tasks.Id = changelog.ItemId 
>     LEFT JOIN ticklers ON tasks.Id = ticklers.RelatedId 
> WHERE tasks.Id = '123456';
> 
> (I've cleaned it up so it's  easier to read.)

Thanks Ryan - that always makes it easier.

> The query works as intended:  it returns a result-set with all the
> necessary data to display in the todo list.   The performance is not a
> major concern, although it can be slow for large lists.
> 
> Since it works, my question is really more about principles:  Given that
> each of the tables in question will contain tens of thousands of rows,
> is a nested join really the best way to approach this?

For a single query, I don't think you've got any alternative.

For the case when you have large numbers of results from iteminfo etc.
it might well be quicker to do separate queries. That's simple enough
since you're joining straight to tasks.id, but is obviously more fiddly
and when there aren't many rows returned would probably be slower (you'd
have 5 lots of query parsing/execute overhead). That might be a
trade-off that makes sense to you though.

Some other systems allow you to define stored procedures that return
multiple record-sets, but the closest PG can do at the moment is to
return multiple cursors (see plpgsql and refcursor in the manuals) and
that's a bit fiddly to use.

HTH
--  Richard Huxton Archonet Ltd


Re: Left Join Question

From
"Ryan Wells"
Date:
Thanks for the feedback, everyone.

> > FROM tasks
> >  LEFT JOIN clients ON tasks.ClientId = clients.ClientId
> >   LEFT JOIN iteminfo ON tasks.Id = iteminfo.ItemId
> >    LEFT JOIN changelog ON tasks.Id = changelog.ItemId
> >     LEFT JOIN ticklers ON tasks.Id = ticklers.RelatedId
> > WHERE tasks.Id = '123456';
> >
> > (I've cleaned it up so it's  easier to read.)
>
> Thanks Ryan - that always makes it easier.

It actually looked so much less crazy after the cleaning that I nearly
decided not to ask about it, but my I was really interested in learning
if there was a dramatically better approach.  Looks like there probably
isn't, which is fine.

> For the case when you have large numbers of results from iteminfo etc.
> it might well be quicker to do separate queries. That's simple enough
> since you're joining straight to tasks.id, but is obviously more
fiddly
> and when there aren't many rows returned would probably be slower
(you'd
> have 5 lots of query parsing/execute overhead). That might be a
> trade-off that makes sense to you though.

Even though the tables are fairly big, the result-set should be small (a
few dozen maybe), so multiple queries probably wouldn't gain much.

Thanks again!
Ryan