Re: Pathify RHS unique-ification for semijoin planning - Mailing list pgsql-hackers
From | Álvaro Herrera |
---|---|
Subject | Re: Pathify RHS unique-ification for semijoin planning |
Date | |
Msg-id | 202507230811.ueghjzzloxkm@alvherre.pgsql Whole thread Raw |
In response to | Re: Pathify RHS unique-ification for semijoin planning (Richard Guo <guofenglinux@gmail.com>) |
Responses |
Re: Pathify RHS unique-ification for semijoin planning
|
List | pgsql-hackers |
Hello, As a very trivial test on this patch, I ran the query in your opening email, both with and without the patch, scaling up the size of the table a little bit. So I did this drop table if exists t; create table t(a int, b int); insert into t select i % 100000, i from generate_series(1,1e7) i; create index on t(a); vacuum analyze t; set enable_hashagg to off; explain (costs off, analyze, buffers) select * from t t1 where t1.a in (select a from t t2 where a < 10000) order by t1.a; This is the plan without the patch: QUERY PLAN ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────── Merge Join (actual time=289.262..700.761 rows=1000000.00 loops=1) Merge Cond: (t1.a = t2.a) Buffers: shared hit=1017728 read=3945 written=3361, temp read=1471 written=1476 -> Index Scan using t_a_idx on t t1 (actual time=0.011..320.747 rows=1000001.00 loops=1) Index Searches: 1 Buffers: shared hit=997725 read=3112 written=2664 -> Sort (actual time=219.273..219.771 rows=10000.00 loops=1) Sort Key: t2.a Sort Method: quicksort Memory: 385kB Buffers: shared hit=20003 read=833 written=697, temp read=1471 written=1476 -> Unique (actual time=128.173..218.708 rows=10000.00 loops=1) Buffers: shared hit=20003 read=833 written=697, temp read=1471 written=1476 -> Sort (actual time=128.170..185.461 rows=1000000.00 loops=1) Sort Key: t2.a Sort Method: external merge Disk: 11768kB Buffers: shared hit=20003 read=833 written=697, temp read=1471 written=1476 -> Index Only Scan using t_a_idx on t t2 (actual time=0.024..74.171 rows=1000000.00 loops=1) Index Cond: (a < 10000) Heap Fetches: 0 Index Searches: 1 Buffers: shared hit=20003 read=833 written=697 Planning: Buffers: shared hit=28 read=7 Planning Time: 0.212 ms Execution Time: 732.840 ms and this is the plan with the patch: QUERY PLAN ─────────────────────────────────────────────────────────────────────────────────────────────────────── Merge Join (actual time=70.310..595.116 rows=1000000.00 loops=1) Merge Cond: (t1.a = t2.a) Buffers: shared hit=1017750 read=3923 written=3586 -> Index Scan using t_a_idx on t t1 (actual time=0.020..341.257 rows=1000001.00 loops=1) Index Searches: 1 Buffers: shared hit=996914 read=3923 written=3586 -> Unique (actual time=0.028..99.074 rows=10000.00 loops=1) Buffers: shared hit=20836 -> Index Only Scan using t_a_idx on t t2 (actual time=0.026..66.219 rows=1000000.00 loops=1) Index Cond: (a < 10000) Heap Fetches: 0 Index Searches: 1 Buffers: shared hit=20836 Planning: Buffers: shared hit=55 read=15 written=14 Planning Time: 0.391 ms Execution Time: 621.377 ms This is a really nice improvement. I think we could find queries that are arbitrarily faster, by feeding enough tuples to the unnecessary Sort nodes. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "No necesitamos banderas No reconocemos fronteras" (Jorge González)
pgsql-hackers by date: