Thread: Merge Join with an Index Optimization

Merge Join with an Index Optimization

From

Michael Malis

Date:

11 September 2016, 03:03:12

Hi,

As I understand it, a merge join will currently read all tuples from both subqueries (besides early termination). I believe it should be possible to take advantages of the indexes on one or both of the tables being read from to skip a large number of tuples that would currently be read. As an example, let's say we are doing equi merge join between two tables with the following data:

(3, 4, 5, 9, 10, 11)

(0, 1, 2, 6, 7, 8)

Currently, even though no tuples would be returned from the merge join, all of the data will be read from both tables. If there are indexes on both tables, pg should be able to execute as follows. Get the tuple with value 3 from the first table and then look up the first value greater than 3 in the second table using the index. In this case, that value would be 6. Since 3 != 6, pg would then look up the lowest value greater than 6 in the first table. This process repeats, and tuples are output whenever the values are equal. This can be thought of as an alternating nested loop join, where the positions in the indexes are maintained. In the case where only one of the tables has an index, this can be thought of as a nested loop join where the inner relation is the table with the index, and the position in that index is maintained while iterating over the outer relation (the outer relation would need to be sorted for this to work).

I can't demonstrate in any benchmarks, but I imagine this would dramatically speed up cases of a merge join between two tables where the number of tuples that satisfy the join condition is small. I don't know the Postgres internals well enough to know if there is anything obvious that would prevent this optimization.

Thanks,

Michael

Re: Merge Join with an Index Optimization

From

Tom Lane

Date:

11 September 2016, 16:20:38

Michael Malis <michaelmalis2@gmail.com> writes:
> As I understand it, a merge join will currently read all tuples from both
> subqueries (besides early termination). I believe it should be possible to
> take advantages of the indexes on one or both of the tables being read from
> to skip a large number of tuples that would currently be read.

IIUC, what you're proposing is to replace "read past some tuples in the
index" with "re-descend the index btree".  Yes, that would win if it
allows skipping over a large number of index entries, but it could lose
big-time if not.  The difficulty is that I don't see how the merge join
level could predict whether it would win for any particular advance step.
You'd really need most of the smarts to be in the index AM.

You might want to troll the archives for past discussions of index skip
scan, which is more or less the same idea (could possibly be exactly the
same idea, depending on how it's implemented).  I think we had arrived
at the conclusion that re-descent from the root might be appropriate
when transitioning to another index page, but not intra-page.  Anyway
no one's produced a concrete patch yet.
        regards, tom lane

Re: Merge Join with an Index Optimization

From

Michael Malis

Date:

11 September 2016, 21:08:04

On Sun, Sep 11, 2016 at 9:20 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Michael Malis <michaelmalis2@gmail.com> writes:
>> As I understand it, a merge join will currently read all tuples from both
>> subqueries (besides early termination). I believe it should be possible to
>> take advantages of the indexes on one or both of the tables being read from
>> to skip a large number of tuples that would currently be read.
>>
> IIUC, what you're proposing is to replace "read past some tuples in the
> index" with "re-descend the index btree". Yes, that would win if it

Roughly yes, although is it necessary to rescan the btree from the top? Is it not possible to "resume" the scan from the previous location?

> You might want to troll the archives for past discussions of index skip
> scan, which is more or less the same idea (could possibly be exactly the
> same idea, depending on how it's implemented).

After searching a bit, all I was able to find was this. It looks like someone had a rough patch of applying a similar idea to DISTINCT. I can't tell what happened to it, but in the patch they mention that it should be possible to do a "local traversal ie from the current page".

A different, but similar optimization, would be to first check the join condition with the data in the index. Then only fetch the full row only if the data in the index passes the join condition. I imagine that this would be easier to implement, although less efficent than what I proposed above because it has to scan the entire index.

Thanks,

Michael

Re: Merge Join with an Index Optimization

From

Thomas Munro

Date:

11 September 2016, 23:20:26

On Sun, Sep 11, 2016 at 11:07 PM, Michael Malis <michaelmalis2@gmail.com> wrote:
> On Sun, Sep 11, 2016 at 9:20 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>
>> Michael Malis <michaelmalis2@gmail.com> writes:
>> >> As I understand it, a merge join will currently read all tuples from
>> >> both
>> >> subqueries (besides early termination). I believe it should be possible
>> >> to
>> >> take advantages of the indexes on one or both of the tables being read
>> >> from
>> >> to skip a large number of tuples that would currently be read.
>> >>
>> > IIUC, what you're proposing is to replace "read past some tuples in the
>> > index" with "re-descend the index btree".  Yes, that would win if it
>
>
> Roughly yes, although is it necessary to rescan the btree from the top? Is
> it not possible to "resume" the scan from the previous location?
>
>>
>> > You might want to troll the archives for past discussions of index skip
>> > scan, which is more or less the same idea (could possibly be exactly the
>> > same idea, depending on how it's implemented).
>
>
> After searching a bit, all I was able to find was this. It looks like
> someone had a rough patch of applying a similar idea to DISTINCT. I can't
> tell what happened to it, but in the patch they mention that it should be
> possible to do a "local traversal ie from the current page".

That was me.  Yeah, reserving the option for traversing other than
from the root was my reason for wanting to introduce a skip operation
to the IM interface as described in the later email in that thread[2],
rather than doing it via rescanning (though maybe it's not strictly
necessary, I don't know).  There are a whole lot of interesting
execution tricks that could be enabled by btree skipping (see Oracle,
DB2, MySQL, SQLite).  DISTINCT was the simplest thing I could think of
where literally every other RDBMS beats us because we don't have it.
But that was nowhere near a useful patch, just an experiment.  I hope
I can get back to looking at it for 10...

[2] https://www.postgresql.org/message-id/CADLWmXWALK8NPZqdnRQiPnrzAnic7NxYKynrkzO_vxYr8enWww%40mail.gmail.com

-- 
Thomas Munro
http://www.enterprisedb.com

Re: Merge Join with an Index Optimization

From

Michael Malis

Date:

11 October 2016, 00:28:23

<div dir="ltr">I discovered that the kind of join I proposed is called the leapfrog triejoin: <a
href="https://arxiv.org/pdf/1210.0481v5.pdf">https://arxiv.org/pdf/1210.0481v5.pdf</a></div>

Re: Merge Join with an Index Optimization

From

Claudio Freire

Date:

11 October 2016, 19:18:19

On Sun, Sep 11, 2016 at 1:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Michael Malis <michaelmalis2@gmail.com> writes:
>> As I understand it, a merge join will currently read all tuples from both
>> subqueries (besides early termination). I believe it should be possible to
>> take advantages of the indexes on one or both of the tables being read from
>> to skip a large number of tuples that would currently be read.
>
> IIUC, what you're proposing is to replace "read past some tuples in the
> index" with "re-descend the index btree".  Yes, that would win if it
> allows skipping over a large number of index entries, but it could lose
> big-time if not.  The difficulty is that I don't see how the merge join
> level could predict whether it would win for any particular advance step.
> You'd really need most of the smarts to be in the index AM.
>
> You might want to troll the archives for past discussions of index skip
> scan, which is more or less the same idea (could possibly be exactly the
> same idea, depending on how it's implemented).  I think we had arrived
> at the conclusion that re-descent from the root might be appropriate
> when transitioning to another index page, but not intra-page.  Anyway
> no one's produced a concrete patch yet.

Interesting it should be brought up.

I was considering the index skip optimization for vacuum at some
point, might as well consider it for regular scans too if I get to
that.

Basically, instead of a simple get next, I was considering adding a
"get next skip until". The caller would be the one responsible for
sending the hint, and the index am would be free to implement the skip
in a smart way if possible.

The interface for vacuum is a bit different in practice, but
conceptually the same.