Thread: Optomizing left outer joins

Optomizing left outer joins

From
Lorraine.Dewey@companiongroup.com (Lorraine Dewey)
Date:
I'm writing an on-the-fly report program that generates and executes
an SQL statement.  The statement depends upon the choices users make
when selecting from several hundred columns spread across > 90 tables.

Since some of the data fields are optional and I won't be able to
match across tables, I need to use left outer joins to make sure I
don't drop rows.  Unfortunately, execution time is somewhere around a
minute (there's other stuff going on, but the majority of the time is
in the execution). Way too long. For comparison, my queries run in
about 2 seconds when I don't have to do outer joins.

As a workaround, I'm doing something like this:
 Select (list of all fields) from ... where...   join all Select (list of all fields except the ones from tables that I
can't
match) from... where key not in (select key from optional table)

This runs in the 5-10 second range, but it's going to be a nightmare
to code, especially the "where" clause.

I never know which columns the users will pick so I can't really
hardcode the queries.  Everything is generated on the fly. I've been
asked not to index any columns.  Any ideas about how I can make this
thing run faster so I can drop the workaround, or any alternative
ideas?  I think we're using version 7.3.2.

Thanks a lot for your help.

Lorraine



Re: Optomizing left outer joins

From
Rod Taylor
Date:
On Mon, 2003-04-21 at 11:14, Lorraine Dewey wrote:
> I'm writing an on-the-fly report program that generates and executes
> an SQL statement.  The statement depends upon the choices users make
> when selecting from several hundred columns spread across > 90 tables.
>
> Since some of the data fields are optional and I won't be able to
> match across tables, I need to use left outer joins to make sure I
> don't drop rows.  Unfortunately, execution time is somewhere around a
> minute (there's other stuff going on, but the majority of the time is
> in the execution). Way too long. For comparison, my queries run in
> about 2 seconds when I don't have to do outer joins.

Have you tried (or are able to) change the order the tables are joined
in?  PostgreSQL runs outer joins in the order they are provided in,
which not not necessarily the best order to do so.

Try using EXPLAIN ANALYZE on your fast query for help on determining the
best join order.

Its up to you to ensure the results still apply.  Switch the join order
may change the results depending on what you're doing.

--
Rod Taylor <rbt@rbt.ca>

PGP Key: http://www.rbt.ca/rbtpub.asc

Re: Optomizing left outer joins

From
Josh Berkus
Date:
Lorraine,

> Since some of the data fields are optional and I won't be able to
> match across tables, I need to use left outer joins to make sure I
> don't drop rows.  Unfortunately, execution time is somewhere around a
> minute (there's other stuff going on, but the majority of the time is
> in the execution). Way too long. For comparison, my queries run in
> about 2 seconds when I don't have to do outer joins.

FWIW, outer joins are slower than regular joins on all RDBMSs I've tested --
often up to 5 times slower.   I'm not sure whether this is just the planner
restricitons inherent in an outer join, or whether this is something about
the required join algorithm itself.

The general solution is not to allow nulls in join columns.  For example, I
recently had to force one of my clients to add 0: Not Selected to all of
their reference lists to make those columns NOT NULL.   Their reports run 80%
faster now.

> I never know which columns the users will pick so I can't really
> hardcode the queries.  Everything is generated on the fly. I've been
> asked not to index any columns.  Any ideas about how I can make this
> thing run faster so I can drop the workaround, or any alternative
> ideas?  I think we're using version 7.3.2.

If the client is tying your hands, you'll have to use a workaround.  Make sure
you tell them so ;->

--
-Josh BerkusAglio Database SolutionsSan Francisco



Re: Optomizing left outer joins

From
Josh Berkus
Date:
Lorraine,

> Perhaps I should have said that not only is the column optional, the entire
> row is optional.  I am trying to join a master table (contains insurance
claim
> information) to an error table.  If the claim in the master table doesn't
have
> any errors, there won't be a row for it in the error table.  When I select
> columns from both tables, I only get the errored claims.  That's why I was
> trying to use the outer join.

Well, an outer join is appropriate in that case.   However, a single outer
join shouldn't be killing your queries -- in the example I gave, I eliminated
19 outer joins to boost the client's performance.

I think the lack of indexes is your real albatross.

> I like your advice about telling the customer about the workaround.  I'll
> definitely do that!  The workaround is pretty awkward: two select statements
> joined by "union all," then some subselects to pick out the unmatched
records.
>  It returns the correct data and it's faster than the left join, but it
takes
> a lot more code.  Anyway, I guess I'm stuck with it.

You probably are.   But make sure you tell the client that the lack of indexes
is killing performance; even if it doesn't get fixed, your keister is
covered.

--
-Josh BerkusAglio Database SolutionsSan Francisco



Re: Optomizing left outer joins

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
> FWIW, outer joins are slower than regular joins on all RDBMSs I've tested -- 
> often up to 5 times slower.   I'm not sure whether this is just the planner 
> restricitons inherent in an outer join, or whether this is something about 
> the required join algorithm itself.

There isn't anything fundamentally slower about left join vs plain join
(unless, perhaps, the left join has to generate a really large number
of null-extended rows that the plain join would not generate).  My money
is on the left join having prevented some optimization that could be
applied in the plain-join case.  But since we have not been shown any
exact details of the tables or the query, it's impossible to do more
than wager...
        regards, tom lane