unnesesary sorting after Merge Full Join - Mailing list pgsql-general

From Alexey Nalbat
Subject unnesesary sorting after Merge Full Join
Date
Msg-id 200802211308.00798.nalbat@price.ru
Whole thread Raw
Responses Re: unnesesary sorting after Merge Full Join
List pgsql-general
Hello.

I'd like to use ORDER BY in any specified order and LIMIT, OFFSET for paging query results.
The query is FULL OUTER JOIN of two tables by field id. I think the results of Merge Full Join
to be ordered by some "combined id". And there is no need in extra Sort if I specify ORDER BY
that "combined id". But unfortunately it is not so.

Here is a simple example:
-- BEGIN
create table t1 as select generate_series(1,1000000,2) as id;
create table t2 as select generate_series(1,1000000,3) as id;

create index i1 on t1 ( id );
create index i2 on t2 ( id );

analyze t1;
analyze t2;

explain analyze
select id, t1.*, t2.*
from t1 natural full join t2
order by 1 limit 10 offset 10;

drop table t1;
drop table t2;
-- END

Postgresql chooses such plan:
 Limit  (cost=44080.12..44080.15 rows=10 width=8) (actual time=6724.850..6724.906 rows=10 loops=1)
   ->  Sort  (cost=44080.10..45330.10 rows=500000 width=8) (actual time=6724.806..6724.845 rows=20 loops=1)
         Sort Key: (COALESCE(t1.id, t2.id))
         Sort Method:  top-N heapsort  Memory: 25kB
         ->  Merge Full Join  (cost=0.00..30775.28 rows=500000 width=8) (actual time=0.142..5237.289 rows=666667
loops=1)
               Merge Cond: (t1.id = t2.id)
               ->  Index Scan using i1 on t1  (cost=0.00..15212.30 rows=500000 width=4) (actual time=0.079..1188.601
rows=500000loops=1) 
               ->  Index Scan using i2 on t2  (cost=0.00..10146.30 rows=333334 width=4) (actual time=0.051..793.635
rows=333334loops=1) 

The desired plan is much faster:
 Limit  (cost=0.62..1.23 rows=10 width=8) (actual time=0.262..0.366 rows=10 loops=1)
   ->  Merge Full Join  (cost=0.00..30775.28 rows=500000 width=8) (actual time=0.156..0.303 rows=20 loops=1)
         Merge Cond: (t1.id = t2.id)
         ->  Index Scan using i1 on t1  (cost=0.00..15212.30 rows=500000 width=4) (actual time=0.088..0.120 rows=15
loops=1)
         ->  Index Scan using i2 on t2  (cost=0.00..10146.30 rows=333334 width=4) (actual time=0.056..0.078 rows=11
loops=1)

I found comment in src/backend/optimizer/path/pathkeys.c:
* EXCEPTION: in a FULL or RIGHT join, we cannot treat the result as
* having the outer path's path keys, because null lefthand rows may be
* inserted at random points. It must be treated as unsorted.

How can I get rid of this sorting? Or could this behavior of Merge Full Join be improved?

--
Alexey A. Nalbat

Price Express
http://www.price.ru/
http://www.tyndex.ru/

pgsql-general by date:

Previous
From: Mike Leahy
Date:
Subject: No pgxs.mk with win32 binaries
Next
From: Gordon
Date:
Subject: Re: How to make update rapidly?