Thread: Left Join Question
<div class="Section1"><p class="MsoNormal">While looking through our data layer code today, I ran across this query:<p class="MsoNormal"> <pclass="MsoNormal">SELECT <p class="MsoNormal">tasks.*,<p class="MsoNormal">clients.FirstName,<p class="MsoNormal">clients.LastName,<pclass="MsoNormal">clients.MiddleInitial,<p class="MsoNormal">iteminfo.CreatedBy,<p class="MsoNormal">iteminfo.StationId,<pclass="MsoNormal">iteminfo.CreatedDate,<p class="MsoNormal">changelog.LastModified,<pclass="MsoNormal">changelog.LastModifiedBy,<p class="MsoNormal">changelog.LastModifiedAt,<pclass="MsoNormal">ticklers.Due,<p class="MsoNormal">ticklers.Reminder<p class="MsoNormal"> <pclass="MsoNormal">FROM tasks <p class="MsoNormal"> LEFT JOIN clients ON tasks.ClientId = clients.ClientId<p class="MsoNormal"> LEFT JOIN iteminfo ON tasks.Id = iteminfo.ItemId <p class="MsoNormal"> LEFT JOINchangelog ON tasks.Id = changelog.ItemId <p class="MsoNormal"> LEFT JOIN ticklers ON tasks.Id = ticklers.RelatedId<p class="MsoNormal"> <p class="MsoNormal">WHERE tasks.Id = '123456';<p class="MsoNormal"> <p class="MsoNormal">(I'vecleaned it up so it's easier to read.)<p class="MsoNormal"> <p class="MsoNormal">The basic data structureis that we have a todo list that contains a list of tasks which may or may not be associated with clients, items,log entries, or ticklers (scheduled reminders). <p class="MsoNormal"> <p class="MsoNormal">The query works as intended: it returns a result-set with all the necessary data to display in the todo list. The performance is not a majorconcern, although it can be slow for large lists.<p class="MsoNormal"> <p class="MsoNormal">Since it works, my questionis really more about principles: Given that each of the tables in question will contain tens of thousands of rows,is a nested join really the best way to approach this?<p class="MsoNormal"> <p class="MsoNormal">Thanks!<p class="MsoNormal">Ryan</div>
On Nov 18, 2008, at 5:48 PM, Ryan Wells wrote:
While looking through our data layer code today, I ran across this query:SELECTtasks.*,clients.FirstName,clients.LastName,clients.MiddleInitial,iteminfo.CreatedBy,iteminfo.StationId,iteminfo.CreatedDate,changelog.LastModified,changelog.LastModifiedBy,changelog.LastModifiedAt,ticklers.Due,ticklers.ReminderFROM tasksLEFT JOIN clients ON tasks.ClientId = clients.ClientIdLEFT JOIN iteminfo ON tasks.Id = iteminfo.ItemIdLEFT JOIN changelog ON tasks.Id = changelog.ItemIdLEFT JOIN ticklers ON tasks.Id = ticklers.RelatedIdWHERE tasks.Id = '123456';(I've cleaned it up so it's easier to read.)The basic data structure is that we have a todo list that contains a list of tasks which may or may not be associated with clients, items, log entries, or ticklers (scheduled reminders).The query works as intended: it returns a result-set with all the necessary data to display in the todo list. The performance is not a major concern, although it can be slow for large lists.Since it works, my question is really more about principles: Given that each of the tables in question will contain tens of thousands of rows, is a nested join really the best way to approach this?Thanks!Ryan
To answer this better we need to know the exact relations between each table.
For example, does each task have 0 or more clients, or 1 or more clients?
If it's 1 or more, then you can properly use a JOIN instead of a LEFT JOIN, this would make the operation faster I believe.
I hope I did explain myself correctly.
Ries
On Tue, 2008-11-18 at 16:48 -0600, Ryan Wells wrote: > Since it works, my question is really more about principles: Given > that each of the tables in question will contain tens of thousands of > rows, is a nested join really the best way to approach this? I don't see what's wrong with it. The planner will likely pare down tasks to its result set before joining to the other joins (which are all keyed on various fields from task). I know that we have lots of joins like this scattered all over our code, but we usually use inner joins unless there's a specific reason not to. -Mark
Ryan Wells wrote: > > FROM tasks > LEFT JOIN clients ON tasks.ClientId = clients.ClientId > LEFT JOIN iteminfo ON tasks.Id = iteminfo.ItemId > LEFT JOIN changelog ON tasks.Id = changelog.ItemId > LEFT JOIN ticklers ON tasks.Id = ticklers.RelatedId > WHERE tasks.Id = '123456'; > > (I've cleaned it up so it's easier to read.) Thanks Ryan - that always makes it easier. > The query works as intended: it returns a result-set with all the > necessary data to display in the todo list. The performance is not a > major concern, although it can be slow for large lists. > > Since it works, my question is really more about principles: Given that > each of the tables in question will contain tens of thousands of rows, > is a nested join really the best way to approach this? For a single query, I don't think you've got any alternative. For the case when you have large numbers of results from iteminfo etc. it might well be quicker to do separate queries. That's simple enough since you're joining straight to tasks.id, but is obviously more fiddly and when there aren't many rows returned would probably be slower (you'd have 5 lots of query parsing/execute overhead). That might be a trade-off that makes sense to you though. Some other systems allow you to define stored procedures that return multiple record-sets, but the closest PG can do at the moment is to return multiple cursors (see plpgsql and refcursor in the manuals) and that's a bit fiddly to use. HTH -- Richard Huxton Archonet Ltd
Thanks for the feedback, everyone. > > FROM tasks > > LEFT JOIN clients ON tasks.ClientId = clients.ClientId > > LEFT JOIN iteminfo ON tasks.Id = iteminfo.ItemId > > LEFT JOIN changelog ON tasks.Id = changelog.ItemId > > LEFT JOIN ticklers ON tasks.Id = ticklers.RelatedId > > WHERE tasks.Id = '123456'; > > > > (I've cleaned it up so it's easier to read.) > > Thanks Ryan - that always makes it easier. It actually looked so much less crazy after the cleaning that I nearly decided not to ask about it, but my I was really interested in learning if there was a dramatically better approach. Looks like there probably isn't, which is fine. > For the case when you have large numbers of results from iteminfo etc. > it might well be quicker to do separate queries. That's simple enough > since you're joining straight to tasks.id, but is obviously more fiddly > and when there aren't many rows returned would probably be slower (you'd > have 5 lots of query parsing/execute overhead). That might be a > trade-off that makes sense to you though. Even though the tables are fairly big, the result-set should be small (a few dozen maybe), so multiple queries probably wouldn't gain much. Thanks again! Ryan