Home > mailing lists

Re: horrendous query challenge :-) - Mailing list pgsql-general

From	Tom Lane
Subject	Re: horrendous query challenge :-)
Date	May 30, 2002 16:28:56
Msg-id	2935.1022790172@sss.pgh.pa.us Whole thread Raw
In response to	Re: horrendous query challenge :-) (Fran Fabrizio <ffabrizio@mmrd.com>)
List	pgsql-general

Tree view

Fran Fabrizio <ffabrizio@mmrd.com> writes:
> ... at which point the query runs pretty much instantly.

> That's an awful lot of overhead for that poor function.  findsite() is a
> key function that we use all over the place.  I thought it was fairly
> efficient but this demonstrates how it can quickly get out of hand.  I
> suppose if I could always ensure that findsite_cache was completely
> populated, we could always just hit that directly.  Since "what is the
> site id of the site that holds this entity?" is such a common question,
> we really should have it in a table as opposed to a function lookup,
> huh?  Does even the simplest plpgsql function have this kind of
> overhead?  Or is my function poorly written?

The problem is not that there's anything wrong with the function on its
own terms.  The problem is that the query planner has no idea what the
semantics of findsite(a.foo) = b.bar are, and so it has no alternative
but to execute the query as a nested loop: compare every row of A to
every row of B, computing findsite() again for each such comparison.
And each findsite call performs an independent probe into
findparent_cache, making the thing effectively equivalent to a
three-level nested loop.

You did not show the query plan being used after you converted this
to a join against findparent_cache, but obviously it's a lot better than
a 3-level nested loop...

It would have helped if the planner had computed findsite(a.foo) only
once per row of A.  In the current scheme of things I believe that would
happen if B were being scanned with an inner indexscan, but
unfortunately B was too small to justify an indexscan.  (The extra cost
of re-evaluating findsite isn't weighted sufficiently high in making
that choice, because the planner doesn't know that findsite() is any
more expensive than any other function.)

However, the real lesson is that pushing table lookups down into
functions prevents the planner from optimizing those lookups as joins.

            regards, tom lane

pgsql-general by date:

From: Kurt Gunderson
Date: 30 May 2002, 16:17:43
Subject: Re: Scaling with memory & disk planning

From: Antai Ning
Date: 30 May 2002, 16:29:49
Subject: Is there a way to check if a db connection handle is still valid?

Re: horrendous query challenge :-) - Mailing list pgsql-general

Previous

Next