[HACKERS] PoC: full merge join on comparison clause - Mailing list pgsql-hackers
From | Alexander Kuzmenkov |
---|---|
Subject | [HACKERS] PoC: full merge join on comparison clause |
Date | |
Msg-id | b31e1a2d-5ed2-cbca-649e-136f1a7c4c31@postgrespro.ru Whole thread Raw |
Responses |
Re: [HACKERS] PoC: full merge join on comparison clause
Re: [HACKERS] PoC: full merge join on comparison clause |
List | pgsql-hackers |
Hi hackers, As you know, at this time Postgres cannot perform a full join on a comparison clause. For example, if we have two tables with numeric columns and run a query like 'select * from t1 full join t2 on t1.a > t2.a', we get an error: "FULL JOIN is only supported with merge-joinable or hash-joinable join conditions". Such queries are legitimate SQL and sometimes arise when porting applications from different DBMS, so it would be good to support them in Postgres. They can be rewritten as union of right and left joins, but it requires manual work and runs somewhat slower (see the benchmark at the end of the letter). This proof-of-concept patch explores the possibility of performing such queries as merge joins. Consider the following example where outer and inner relations are in ascending order, and we are asked to return outer tuples that are greater than inner. outer > inner outer tuple - 6 4 - marked tuple 7 5 8 6 - inner tuple 8 7 The main difference from normal merge join is that we do not need to advance the marked tuple. This behavior can be implemented with some simple changes to the function that compares inner and outer tuples. However, for the join clause 'outer < inner' we would have to advance the marked tuple, which would require adding a new state to the merge join executor node. We do not do this. Instead, at the path creation stage, we make sure that the particular combination of sorting order and join clause allows us to perform merge join the simple way. The optimizer requires some other changes to support these joins. Currently, it uses the same clauses to generate equivalence classes and to perform merge joins. This patch has to separate these two uses. Clauses that correspond to a btree equality operator are used to construct equivalence classes; the operator families for these clauses are recorded in the 'equivopfamilies' field of RestrictInfo struct. Clauses that correspond to btree equality or comparison are used to perform merge joins, and have their operator families recorded in the 'mergeopfamilies'. The optimizer also has to check whether the particular join clause list can be used for merge join, and ensure that it is compatible with inner/outer path ordering. These checks are performed by 'can_sort_for_mergejoin()' and 'outer_sort_suitable_for_mergejoin()'. There is an important unsolved problem in this patch. When generating index paths for base relations, the optimizer tries to use only one scan direction to limit the number of paths. This direction might not be suitable for a given join clause, and the input path will have to be sorted. We could generate paths for both directions, but this was specifically removed for optimization (SHA 834ddc62 by Tom Lane, 10/27/2007 09:45 AM). For inner joins one would expect the merge join to be slower than the nested loop, because it has more complex code paths, and indeed this can be seen on simple benchmarks (see the end of the letter). Costs should be revised further to reflect this difference. I would be glad to hear your opinion on this approach. Some benchmarks: ===== Full join vs union of left and right joins ======================================== test1=# explain analyze select * from t4 right join t1 on t4.a < t1.a union all select * from t4 left join t1 on t4.a < t1.a where t1.a is null; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------- Append (cost=809.69..70703.19 rows=3340000 width=8) (actual time=8.336..1195.534 rows=5007546 loops=1) -> Merge Left Join (cost=809.69..34230.49 rows=3333333 width=8) (actual time=8.335..920.442 rows=5007537 loops=1) Merge Cond: (t1.a > t4.a) -> Index Only Scan using idx_t1_a on t1 (cost=0.28..35.27 rows=1000 width=4) (actual time=0.027..0.395 rows=1001 loops=1) Heap Fetches: 97 -> Sort (cost=809.39..834.39 rows=10000 width=4) (actual time=8.300..356.821 rows=5007538 loops=1) Sort Key: t4.a Sort Method: quicksort Memory: 931kB -> Seq Scan on t4 (cost=0.00..145.00 rows=10000 width=4) (actual time=0.019..2.533 rows=10000 loops=1) -> Nested Loop Anti Join (cost=0.28..3072.71 rows=6667 width=8) (actual time=4.685..35.421 rows=9 loops=1) -> Seq Scan on t4 t4_1 (cost=0.00..145.00 rows=10000 width=4) (actual time=0.010..0.656 rows=10000 loops=1) -> Index Only Scan using idx_t1_a on t1 t1_1 (cost=0.28..6.10 rows=333 width=4) (actual time=0.003..0.003 rows=1 loops=10000) Index Cond: (a > t4_1.a) Heap Fetches: 971 Planning time: 1.414 ms Execution time: 1324.985 ms (16 rows) test1=# explain analyze select * from t4 full join t1 on t4.a < t1.a; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------- Merge Full Join (cost=809.66..34230.49 rows=3333333 width=8) (actual time=8.351..914.590 rows=5007546 loops=1) Merge Cond: (t1.a > t4.a) -> Index Only Scan using idx_t1_a on t1 (cost=0.28..35.27 rows=1000 width=4) (actual time=0.035..0.368 rows=1001 loops=1) Heap Fetches: 97 -> Sort (cost=809.39..834.39 rows=10000 width=4) (actual time=8.309..347.851 rows=5007546 loops=1) Sort Key: t4.a Sort Method: quicksort Memory: 931kB -> Seq Scan on t4 (cost=0.00..145.00 rows=10000 width=4) (actual time=0.020..2.563 rows=10000 loops=1) Planning time: 1.083 ms Execution time: 1044.869 ms (10 rows) === Merge vs nested loop =========================================== test1=# explain analyze select * from t5 join t1 on t5.a <= t1.a; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=0.28..944713.00 rows=33333333 width=8) (actual time=0.055..8718.840 rows=50014145 loops=1) -> Seq Scan on t5 (cost=0.00..1443.00 rows=100000 width=4) (actual time=0.019..6.428 rows=100000 loops=1) -> Index Only Scan using idx_t1_a on t1 (cost=0.28..6.10 rows=333 width=4) (actual time=0.003..0.050 rows=500 loops=100000) Index Cond: (a >= t5.a) Heap Fetches: 9147995 Planning time: 2.209 ms Execution time: 9942.176 ms (7 rows) test1=# set enable_mergejoin TO on; SET test1=# explain analyze select * from t5 join t1 on t5.a <= t1.a; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------- Merge Join (cost=9748.54..343618.88 rows=33333333 width=8) (actual time=35.491..9281.482 rows=50014145 loops=1) Merge Cond: (t1.a >= t5.a) -> Index Only Scan using idx_t1_a on t1 (cost=0.28..35.27 rows=1000 width=4) (actual time=0.027..0.769 rows=1001 loops=1) Heap Fetches: 97 -> Sort (cost=9747.82..9997.82 rows=100000 width=4) (actual time=35.458..3906.652 rows=50014145 loops=1) Sort Key: t5.a Sort Method: quicksort Memory: 8541kB -> Seq Scan on t5 (cost=0.00..1443.00 rows=100000 width=4) (actual time=0.013..8.570 rows=100000 loops=1) Planning time: 2.368 ms Execution time: 10530.356 ms (10 rows) -- Alexander Kuzmenkov Postgres Professional:http://www.postgrespro.com The Russian Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
pgsql-hackers by date: