Merge Join with an Index Optimization - Mailing list pgsql-hackers

From Michael Malis
Subject Merge Join with an Index Optimization
Date
Msg-id CAFQtOwp4e1hU6gKi1R_nHgOLTzUJko8EcaC1FdAV0rQFo1ARzA@mail.gmail.com
Whole thread Raw
Responses Re: Merge Join with an Index Optimization  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Hi,

As I understand it, a merge join will currently read all tuples from both subqueries (besides early termination). I believe it should be possible to take advantages of the indexes on one or both of the tables being read from to skip a large number of tuples that would currently be read. As an example, let's say we are doing equi merge join between two tables with the following data:

  (3, 4, 5, 9, 10, 11)
  (0, 1, 2, 6, 7, 8)

Currently, even though no tuples would be returned from the merge join, all of the data will be read from both tables. If there are indexes on both tables, pg should be able to execute as follows. Get the tuple with value 3 from the first table and then look up the first value greater than 3 in the second table using the index. In this case, that value would be 6. Since 3 != 6, pg would then look up the lowest value greater than 6 in the first table. This process repeats, and tuples are output whenever the values are equal. This can be thought of as an alternating nested loop join, where the positions in the indexes are maintained. In the case where only one of the tables has an index, this can be thought of as a nested loop join where the inner relation is the table with the index, and the position in that index is maintained while iterating over the outer relation (the outer relation would need to be sorted for this to work).

I can't demonstrate in any benchmarks, but I imagine this would dramatically speed up cases of a merge join between two tables where the number of tuples that satisfy the join condition is small. I don't know the Postgres internals well enough to know if there is anything obvious that would prevent this optimization.

Thanks,
Michael

pgsql-hackers by date:

Previous
From: Jim Nasby
Date:
Subject: Re: feature request: explain "with details" option
Next
From: Jim Nasby
Date:
Subject: Re: [PATCH] Alter or rename enum value