Thread: Optomizing left outer joins
I'm writing an on-the-fly report program that generates and executes an SQL statement. The statement depends upon the choices users make when selecting from several hundred columns spread across > 90 tables. Since some of the data fields are optional and I won't be able to match across tables, I need to use left outer joins to make sure I don't drop rows. Unfortunately, execution time is somewhere around a minute (there's other stuff going on, but the majority of the time is in the execution). Way too long. For comparison, my queries run in about 2 seconds when I don't have to do outer joins. As a workaround, I'm doing something like this: Select (list of all fields) from ... where... join all Select (list of all fields except the ones from tables that I can't match) from... where key not in (select key from optional table) This runs in the 5-10 second range, but it's going to be a nightmare to code, especially the "where" clause. I never know which columns the users will pick so I can't really hardcode the queries. Everything is generated on the fly. I've been asked not to index any columns. Any ideas about how I can make this thing run faster so I can drop the workaround, or any alternative ideas? I think we're using version 7.3.2. Thanks a lot for your help. Lorraine
On Mon, 2003-04-21 at 11:14, Lorraine Dewey wrote: > I'm writing an on-the-fly report program that generates and executes > an SQL statement. The statement depends upon the choices users make > when selecting from several hundred columns spread across > 90 tables. > > Since some of the data fields are optional and I won't be able to > match across tables, I need to use left outer joins to make sure I > don't drop rows. Unfortunately, execution time is somewhere around a > minute (there's other stuff going on, but the majority of the time is > in the execution). Way too long. For comparison, my queries run in > about 2 seconds when I don't have to do outer joins. Have you tried (or are able to) change the order the tables are joined in? PostgreSQL runs outer joins in the order they are provided in, which not not necessarily the best order to do so. Try using EXPLAIN ANALYZE on your fast query for help on determining the best join order. Its up to you to ensure the results still apply. Switch the join order may change the results depending on what you're doing. -- Rod Taylor <rbt@rbt.ca> PGP Key: http://www.rbt.ca/rbtpub.asc
Lorraine, > Since some of the data fields are optional and I won't be able to > match across tables, I need to use left outer joins to make sure I > don't drop rows. Unfortunately, execution time is somewhere around a > minute (there's other stuff going on, but the majority of the time is > in the execution). Way too long. For comparison, my queries run in > about 2 seconds when I don't have to do outer joins. FWIW, outer joins are slower than regular joins on all RDBMSs I've tested -- often up to 5 times slower. I'm not sure whether this is just the planner restricitons inherent in an outer join, or whether this is something about the required join algorithm itself. The general solution is not to allow nulls in join columns. For example, I recently had to force one of my clients to add 0: Not Selected to all of their reference lists to make those columns NOT NULL. Their reports run 80% faster now. > I never know which columns the users will pick so I can't really > hardcode the queries. Everything is generated on the fly. I've been > asked not to index any columns. Any ideas about how I can make this > thing run faster so I can drop the workaround, or any alternative > ideas? I think we're using version 7.3.2. If the client is tying your hands, you'll have to use a workaround. Make sure you tell them so ;-> -- -Josh BerkusAglio Database SolutionsSan Francisco
Lorraine, > Perhaps I should have said that not only is the column optional, the entire > row is optional. I am trying to join a master table (contains insurance claim > information) to an error table. If the claim in the master table doesn't have > any errors, there won't be a row for it in the error table. When I select > columns from both tables, I only get the errored claims. That's why I was > trying to use the outer join. Well, an outer join is appropriate in that case. However, a single outer join shouldn't be killing your queries -- in the example I gave, I eliminated 19 outer joins to boost the client's performance. I think the lack of indexes is your real albatross. > I like your advice about telling the customer about the workaround. I'll > definitely do that! The workaround is pretty awkward: two select statements > joined by "union all," then some subselects to pick out the unmatched records. > It returns the correct data and it's faster than the left join, but it takes > a lot more code. Anyway, I guess I'm stuck with it. You probably are. But make sure you tell the client that the lack of indexes is killing performance; even if it doesn't get fixed, your keister is covered. -- -Josh BerkusAglio Database SolutionsSan Francisco
Josh Berkus <josh@agliodbs.com> writes: > FWIW, outer joins are slower than regular joins on all RDBMSs I've tested -- > often up to 5 times slower. I'm not sure whether this is just the planner > restricitons inherent in an outer join, or whether this is something about > the required join algorithm itself. There isn't anything fundamentally slower about left join vs plain join (unless, perhaps, the left join has to generate a really large number of null-extended rows that the plain join would not generate). My money is on the left join having prevented some optimization that could be applied in the plain-join case. But since we have not been shown any exact details of the tables or the query, it's impossible to do more than wager... regards, tom lane