Re: a few crazy ideas about hash joins - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: a few crazy ideas about hash joins
Date
Msg-id 200904071355.n37DtS617785@momjian.us
Whole thread Raw
In response to Re: a few crazy ideas about hash joins  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: a few crazy ideas about hash joins  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Are there any TODOs here?

---------------------------------------------------------------------------

Robert Haas wrote:
> On Fri, Apr 3, 2009 at 5:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Robert Haas <robertmhaas@gmail.com> writes:
> >> On Fri, Apr 3, 2009 at 4:29 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >>> Correct, but you've got the details all wrong. ?The real problem is that
> >>> the planner might discard a join path hash(A,B) at level 2 because it
> >>> loses compared to, say, merge(A,B). ?But when we get to level three,
> >>> perhaps hash(hash(A,B),C) would've been the best plan due to synergy
> >>> of the two hashes. ?We'll never find that out unless we keep the
> >>> "inferior" hash path around. ?We can certainly do that; the question
> >>> is what's it going to cost us to allow more paths to survive to the
> >>> next join level. ?(And I'm afraid the answer may be "plenty"; it's a
> >>> combinatorial explosion we're looking at here.)
> >
> >> That would be crazy. ?I think doing it the way I suggested is correct,
> >> just not guaranteed to catch every case. ?The reality is that even if
> >> we took Greg Stark's suggestion of detecting this situation only in
> >> the executor, we'd still get some benefit out of this. ?If we take my
> >> intermediate approach, we'll catch more cases where this is a win.
> >> What you're suggesting here would catch every conceivable case, but at
> >> the expense of what I'm sure would be an unacceptable increase in
> >> planning time for very limit benefit.
> >
> > Maybe, maybe not. ?I've seen plenty of plans that have several
> > mergejoins stacked up on top of each other with no intervening sorts.
> > There is 0 chance that the planner would have produced that if it
> > thought that it had to re-sort at each level; something else would have
> > looked cheaper. ?I think that your proposals will end up getting very
> > little of the possible benefit, because the planner will fail to choose
> > plan trees in which the optimization can be exploited.
> 
> Well, I'm all ears if you have suggestions for improvement.  For
> sorts, we use PathKeys to represent the ordering of each path and keep
> the paths for each set of pathkeys.  By analogy, we could maintain a
> list of PathHash structures for each path representing the tables that
> had already been hashed.  add_path() would then have to consider both
> the PathHash structures and the PathKey structures before concluding
> that a path was definitely worse than some path previously found.  At
> each level of the join tree, we'd need to truncate PathHash structures
> that provably have no further use (e.g. on a base table that does not
> appear again above the level of the join already planned) to avoid
> keeping around paths that appeared to be better only because we didn't
> know that the paths they have hashed are worthless in practice.  Maybe
> that wouldn't even be that expensive, actually, because there will be
> lots of cases where you know the relevant table doesn't appear
> elsewhere in the query and not save any extra paths.  But I think we'd
> have to write the code and benchmark it to really know.
> 
> I guess the reason I'm not too worked up about this is because my
> experience is that the planner nearly always prefers hash joins on
> small tables, even when an index is present - the queries I'm worried
> about optimizing don't need any additional encouragement to use hash
> joins; they're doing it already.  But certainly it doesn't hurt to see
> how many cases we can pick up.
> 
> ...Robert
> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


pgsql-hackers by date:

Previous
From: Magnus Hagander
Date:
Subject: Re: Path separator
Next
From: Tom Lane
Date:
Subject: Re: More message encoding woes