[HACKERS] PoC: full merge join on comparison clause - Mailing list pgsql-hackers

From Alexander Kuzmenkov
Subject [HACKERS] PoC: full merge join on comparison clause
Date
Msg-id b31e1a2d-5ed2-cbca-649e-136f1a7c4c31@postgrespro.ru
Whole thread Raw
Responses Re: [HACKERS] PoC: full merge join on comparison clause
Re: [HACKERS] PoC: full merge join on comparison clause
List pgsql-hackers
Hi hackers,

As you know, at this time Postgres cannot perform a full join on a 
comparison clause. For example, if we have two tables with numeric 
columns and run a query like 'select * from t1 full join t2 on t1.a > 
t2.a', we get an error: "FULL JOIN is only supported with merge-joinable 
or hash-joinable join conditions". Such queries are legitimate SQL and 
sometimes arise when porting applications from different DBMS, so it 
would be good to support them in Postgres. They can be rewritten as 
union of right and left joins, but it requires manual work and runs 
somewhat slower (see the benchmark at the end of the letter). This 
proof-of-concept patch explores the possibility of performing such 
queries as merge joins.

Consider the following example where outer and inner relations are in 
ascending order, and we are asked to return outer tuples that are 
greater than inner.
                 outer > inner
    outer tuple -  6       4  - marked tuple
                   7       5
                   8       6  - inner tuple
                   8       7

The main difference from normal merge join is that we do not need to 
advance the marked tuple. This behavior can be implemented with some 
simple changes to the function that compares inner and outer tuples. 
However, for the join clause 'outer < inner' we would have to advance 
the marked tuple, which would require adding a new state to the merge 
join executor node. We do not do this. Instead, at the path creation 
stage, we make sure that the particular combination of sorting order and 
join clause allows us to perform merge join the simple way.

The optimizer requires some other changes to support these joins. 
Currently, it uses the same clauses to generate equivalence classes and 
to perform merge joins. This patch has to separate these two uses. 
Clauses that correspond to a btree equality operator are used to 
construct equivalence classes; the operator families for these clauses 
are recorded in the 'equivopfamilies' field of RestrictInfo struct. 
Clauses that correspond to btree equality or comparison are used to 
perform merge joins, and have their operator families recorded in the 
'mergeopfamilies'.

The optimizer also has to check whether the particular join clause list 
can be used for merge join, and ensure that it is compatible with 
inner/outer path ordering. These checks are performed by 
'can_sort_for_mergejoin()' and 'outer_sort_suitable_for_mergejoin()'.

There is an important unsolved problem in this patch. When generating 
index paths for base relations, the optimizer tries to use only one scan 
direction to limit the number of paths. This direction might not be 
suitable for a given join clause, and the input path will have to be 
sorted. We could generate paths for both directions, but this was 
specifically removed for optimization (SHA 834ddc62 by Tom Lane, 
10/27/2007 09:45 AM).

For inner joins one would expect the merge join to be slower than the 
nested loop, because it has more complex code paths, and indeed this can 
be seen on simple benchmarks (see the end of the letter). Costs should 
be revised further to reflect this difference.

I would be glad to hear your opinion on this approach.

Some benchmarks:

===== Full join vs union of left and right joins 
========================================

test1=# explain analyze select * from t4 right join t1 on t4.a < t1.a 
union all select * from t4 left join t1 on t4.a < t1.a where t1.a is null;
                                                                QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------
  Append  (cost=809.69..70703.19 rows=3340000 width=8) (actual 
time=8.336..1195.534 rows=5007546 loops=1)
    ->  Merge Left Join  (cost=809.69..34230.49 rows=3333333 width=8) 
(actual time=8.335..920.442 rows=5007537 loops=1)
          Merge Cond: (t1.a > t4.a)
          ->  Index Only Scan using idx_t1_a on t1 (cost=0.28..35.27 
rows=1000 width=4) (actual time=0.027..0.395 rows=1001 loops=1)
                Heap Fetches: 97
          ->  Sort  (cost=809.39..834.39 rows=10000 width=4) (actual 
time=8.300..356.821 rows=5007538 loops=1)
                Sort Key: t4.a
                Sort Method: quicksort  Memory: 931kB
                ->  Seq Scan on t4  (cost=0.00..145.00 rows=10000 
width=4) (actual time=0.019..2.533 rows=10000 loops=1)
    ->  Nested Loop Anti Join  (cost=0.28..3072.71 rows=6667 width=8) 
(actual time=4.685..35.421 rows=9 loops=1)
          ->  Seq Scan on t4 t4_1  (cost=0.00..145.00 rows=10000 
width=4) (actual time=0.010..0.656 rows=10000 loops=1)
          ->  Index Only Scan using idx_t1_a on t1 t1_1 (cost=0.28..6.10 
rows=333 width=4) (actual time=0.003..0.003 rows=1 loops=10000)
                Index Cond: (a > t4_1.a)
                Heap Fetches: 971
  Planning time: 1.414 ms
  Execution time: 1324.985 ms
(16 rows)

test1=# explain analyze select * from t4 full join t1 on t4.a < t1.a;
                                                           QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------
  Merge Full Join  (cost=809.66..34230.49 rows=3333333 width=8) (actual 
time=8.351..914.590 rows=5007546 loops=1)
    Merge Cond: (t1.a > t4.a)
    ->  Index Only Scan using idx_t1_a on t1  (cost=0.28..35.27 
rows=1000 width=4) (actual time=0.035..0.368 rows=1001 loops=1)
          Heap Fetches: 97
    ->  Sort  (cost=809.39..834.39 rows=10000 width=4) (actual 
time=8.309..347.851 rows=5007546 loops=1)
          Sort Key: t4.a
          Sort Method: quicksort  Memory: 931kB
          ->  Seq Scan on t4  (cost=0.00..145.00 rows=10000 width=4) 
(actual time=0.020..2.563 rows=10000 loops=1)
  Planning time: 1.083 ms
  Execution time: 1044.869 ms
(10 rows)


=== Merge vs nested loop ===========================================

test1=# explain analyze select * from t5 join t1 on t5.a <= t1.a;
                                                            QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------
  Nested Loop  (cost=0.28..944713.00 rows=33333333 width=8) (actual 
time=0.055..8718.840 rows=50014145 loops=1)
    ->  Seq Scan on t5  (cost=0.00..1443.00 rows=100000 width=4) (actual 
time=0.019..6.428 rows=100000 loops=1)
    ->  Index Only Scan using idx_t1_a on t1  (cost=0.28..6.10 rows=333 
width=4) (actual time=0.003..0.050 rows=500 loops=100000)
          Index Cond: (a >= t5.a)
          Heap Fetches: 9147995
  Planning time: 2.209 ms
  Execution time: 9942.176 ms
(7 rows)

test1=# set enable_mergejoin TO on;
SET
test1=# explain analyze select * from t5 join t1 on t5.a <= t1.a;
                                                           QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------
  Merge Join  (cost=9748.54..343618.88 rows=33333333 width=8) (actual 
time=35.491..9281.482 rows=50014145 loops=1)
    Merge Cond: (t1.a >= t5.a)
    ->  Index Only Scan using idx_t1_a on t1  (cost=0.28..35.27 
rows=1000 width=4) (actual time=0.027..0.769 rows=1001 loops=1)
          Heap Fetches: 97
    ->  Sort  (cost=9747.82..9997.82 rows=100000 width=4) (actual 
time=35.458..3906.652 rows=50014145 loops=1)
          Sort Key: t5.a
          Sort Method: quicksort  Memory: 8541kB
          ->  Seq Scan on t5  (cost=0.00..1443.00 rows=100000 width=4) 
(actual time=0.013..8.570 rows=100000 loops=1)
  Planning time: 2.368 ms
  Execution time: 10530.356 ms
(10 rows)

-- 
Alexander Kuzmenkov
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

pgsql-hackers by date:

Previous
From: Beena Emerson
Date:
Subject: Re: [HACKERS] Adding support for Default partition in partitioning
Next
From: Tomas Vondra
Date:
Subject: Re: [HACKERS] WITH clause in CREATE STATISTICS