Re: [HACKERS] Slow - grindingly slow - query - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [HACKERS] Slow - grindingly slow - query
Date
Msg-id 28763.942467455@sss.pgh.pa.us
Whole thread Raw
In response to Re: [HACKERS] Slow - grindingly slow - query  (Brian Hirt <bhirt@mobygames.com>)
List pgsql-hackers
Brian Hirt <bhirt@mobygames.com> writes:
> On Fri, Nov 12, 1999 at 09:58:14AM -0500, Tom Lane wrote:
>> If it's bugging you enough to go fix it now, contributions are always
>> welcome ;-)

> Okay, what would be the correct approach to solving the problem, 
> and where would be a good place to start?  I'v only been on this list
> for a few weeks, so I'm missed discussion on the approach to solving 
> this problem.  Should this change be localized to just the planner? 
> Should the rewrite system be creating a different query tree?  Will both 
> need to be changed?  If a lot of work is being done to this part of 
> the system, is now a bad time to try this work?

Well, actually, figuring out how & where to do it is the trickiest part
of the work.  Might not be the best project for a newbie backend-hacker
to start with :-(.

After a few moments' thought, it seems to me that this issue might be
closely intertwined with the OUTER JOIN stuff that Thomas is working on
and the querytree representation redesign that Jan and I have been
muttering about (but not yet actually doing anything about).  We want
to handle SELECT ... WHERE expr IN (SELECT ...) like a join, but the
semantics aren't exactly the same as a conventional join, so it might
be that the thing needs to be rewritten as a special join type.  In
that case it'd fit right in with OUTER JOIN, I suspect.

The Informix EXPLAIN results that Theo Kramer posted (a few messages
back in this thread) are pretty interesting too.  If I'm reading that
printout right, Informix is not any smarter than we are about choosing
the scan types for the outer and inner queries; and yet they have a much
faster runtime for the WHERE IN query.  I speculate that they are doing
the physical matching of outer and inner tuples in a smarter way than we
are --- perhaps they are doing one scan of the inner query and entering
all the values into a hashtable that's then probed for each outer tuple.
(As opposed to rescanning the inner query for each outer tuple, as we
currently do.)  If that's the answer, then it could probably be
implemented as a localized change: rewrite the SubPlan node executor to
look more like the HashJoin node executor.  This isn't perfect --- it
wouldn't pick up the possibility of a merge-style join --- but it would
be better than what we have for a lot less work than the "full" solution.

This is all shooting from the hip; I haven't spent time looking into it.
Has anyone else got insights to offer?
        regards, tom lane


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: [HACKERS] RFC: create/alter user extension
Next
From: Tim Holloway
Date:
Subject: Thread-safe queueing?