Thread: Badly planned queries with JOIN syntax
All (apologies if this gets posted twice - my outgoing email had changed since I last posted and Majordomo got confused), I have a requirement for some rather complex multi-table queries involving inner, outer and full joins. However, I'm running into some problems because the planner always JOINs in the order I give them (as documented) - which is not the optimal plan. The query is (very) dynamically generated, so it's not as simple as "order the JOINs right" because there are some 40,000 possible queries (and that's just with the current data and table set). What I would like to do is push all JOIN constraints down into a WHERE clause, and for INNER joins specified this way the planner seems to generate the optimal query each time (since it has freedom to re-order). However, under Postgresql, I'm not aware of any way of doing OUTER joins with a WHERE clause (I believe ANSI SQL92 had a "table.column *= otable.ocolumn" which equates to "table LEFT OUTER JOIN otable on column=ocolumn"). An example: create table a (id text, somedata text, somedata2 text, primary key (id)); create table b (id text, pid text not null, extradata text, primary key (id)); create index b_pid on b(pid); create table c (id text, pid textnot null, moredata text, primary key (id)); create index c_pid on c(pid); a, b, c contain tens of thousands of rows. The search function can search on any field, but if the user searches on "moredata", you can do: select * from a join b on b.pid = a.id join c on c.pid = b.id where moredata like 'blah%'; .This gives me a query plan that does a sequential scan over a and b (usually with a hash join) before joining to c, which it will index scan. However, reordering that: select * from c join b on c.pid = b.id join a on b.pid = a.id where moredata like 'blah%'; .or doing a query with a like on table a: select * from a join b on b.pid = a.id join c on c.pid = b.id where a.somedata like 'foo%'; .will do an indexed scan, which is the optimal plan (I have verified this and am aware that indexed scans are not always the optimal plan - they *are* in this case, I assure you). Put another way - I *don't* want to use the order of the JOINs as an explicit command to the planner, but *do* need to use the JOIN syntax since I need OUTER and FULL joins in some or all queries (which you can's specify with WHERE). I could re-order the JOINs such that the LIKEd tables come first, but that's really the job of the planner, and some of these queries involve large numbers of tables and very complex join conditions (as I said, 40,000+ possible query formats), so it's not obvious to me *how* to order them programatically - but of course, the planner knows. Suggestions? -- Regards, Phil +------------------------------------------+ | Phil Mayers | | Network & Infrastructure Group | | Information & Communication Technologies | | Imperial College | +------------------------------------------+ ------------------------------------------------- This mail sent through IMP: http://horde.org/imp/
On Fri, 4 Apr 2003, Phil Mayers wrote: > I have a requirement for some rather complex multi-table queries involving > inner, outer and full joins. However, I'm running into some problems because the > planner always JOINs in the order I give them (as documented) - which is not the > optimal plan. The query is (very) dynamically generated, so it's not as simple > as "order the JOINs right" because there are some 40,000 possible queries (and > that's just with the current data and table set). > > What I would like to do is push all JOIN constraints down into a WHERE clause, > and for INNER joins specified this way the planner seems to generate the optimal > query each time (since it has freedom to re-order). However, under Postgresql, > I'm not aware of any way of doing OUTER joins with a WHERE clause (I believe > ANSI SQL92 had a "table.column *= otable.ocolumn" which equates to "table LEFT > OUTER JOIN otable on column=ocolumn"). AFAICT SQL92 has no such syntax. It's hard to do without alot of limitations (because join order can be important to the results of the query with outer joins). > create table a (id text, somedata text, somedata2 text, primary key (id)); > create table b (id text, pid text not null, extradata text, primary key (id)); > create index b_pid on b(pid); > create table c (id text, pid textnot null, moredata text, primary key (id)); > create index c_pid on c(pid); > > a, b, c contain tens of thousands of rows. The search function can search on any > field, but if the user searches on "moredata", you can do: > > select * from a join b on b.pid = a.id join c on c.pid = b.id where > moredata like 'blah%'; > select * from a join b on b.pid = a.id join c on c.pid = b.id where > a.somedata like 'foo%'; > Put another way - I *don't* want to use the order of the JOINs as an explicit > command to the planner, but *do* need to use the JOIN syntax since I need OUTER > and FULL joins in some or all queries (which you can's specify with WHERE). As an example, where would the outer join go in the above? Would it be in place of one of the joins there or to another table entirely? If the latter, you might be able to do a subselct in from that contains the inner join portion using the from table,table where ... syntax and outer join that to the necessary tables. I think the planner is probably smart enough to do something reasonable with that.
Phil Mayers <p.mayers@imperial.ac.uk> writes: > Put another way - I *don't* want to use the order of the JOINs as an > explicit command to the planner, but *do* need to use the JOIN syntax > since I need OUTER and FULL joins in some or all queries (which you > can's specify with WHERE). This behavior has been changed for 7.4, but there isn't much to be done about it in 7.3, unless you want to try to back-port the relevant fixes. regards, tom lane
Stephan Szabo <sszabo@megazone23.bigpanda.com> writes: > On Fri, 4 Apr 2003, Phil Mayers wrote: > > > ANSI SQL92 had a "table.column *= otable.ocolumn" which equates to "table LEFT > > OUTER JOIN otable on column=ocolumn"). > > AFAICT SQL92 has no such syntax. It's hard to do without alot of > limitations (because join order can be important to the results of the > query with outer joins). Fwiw, this is MSSQL's (and presumably Sybase's?) non-standard outer join syntax. -- greg