Re: 9.3 Pre-proposal: Range Merge Join - Mailing list pgsql-hackers

From Jeff Davis
Subject Re: 9.3 Pre-proposal: Range Merge Join
Date
Msg-id 1334598230.19915.16.camel@jdavis
Whole thread Raw
In response to Re: 9.3 Pre-proposal: Range Merge Join  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: 9.3 Pre-proposal: Range Merge Join  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Mon, 2012-04-16 at 02:52 -0400, Tom Lane wrote:
> Jeff Davis <pgsql@j-davis.com> writes:
> >  1. Order the ranges on both sides by the lower bound, then upper bound.
> > Empty ranges can be excluded entirely.
> >  2. Left := first range on left, Right := first range on right
> >  3. If Left or Right is empty, terminate.
> >  4. If lower(Left) > upper(Right), discard Right, goto 2
> >  5. If lower(Right) > upper(Left), discard Left, goto 2
> >  6. return (Left, Right) as joined tuple
> >  7. Right := next range on right
> >  8. goto 3
> 
> This is surely not correct in detail.  As written, it will be impossible
> for any tuple on the right side to be joined to more than one left-side
> tuple.  You will need something analogous to the mark-and-restore
> rewinding logic in standard mergejoin to make this work.

Every time you discard a tuple on the left, you go to step 2, which
rewinds the right side back to the first non-discarded tuple.

So, implemented using mark and restore:
 * start off with the marks at the first tuple on each side * "discard" means move the mark down a tuple * setting it
backto the first range means restoring to the mark * going to the next range (step 7) just means getting another
tuple,without changing the mark
 

Only one side really needs the mark and restore logic, but it was easier
to write the pseudocode in a symmetrical way (except step 7).

> I will note that mergejoin is not only one of the more complicated
> executor modules, but it accounts for a huge percentage of the planner's
> complexity; which doesn't exactly make me sympathize with the notion of
> let's-copy-and-paste-all-that.  It'd be a lot better if we could build
> it as a small tweak to mergejoin rather than an independent thing.
> 
> (Having said that, I have no idea how the planner support could work
> at all, because its treatment of mergejoins is intimately tied to
> mergejoinable equality and EquivalenceClasses and PathKeys, which don't
> seem readily extensible to the range situation.)

I think this is the more important problem area. It's clear that I'll
need to do some more investigation into the planning.

Regards,Jeff Davis



pgsql-hackers by date:

Previous
From: Fujii Masao
Date:
Subject: Re: [BUG] Checkpointer on hot standby runs without looking checkpoint_segments
Next
From: Simon Riggs
Date:
Subject: Re: index-only scans vs. Hot Standby, round two