Re: Performance improvement for joins where outer side is unique - Mailing list pgsql-hackers
From | Pavel Stehule |
---|---|
Subject | Re: Performance improvement for joins where outer side is unique |
Date | |
Msg-id | CAFj8pRBDTGh9EJ1BnfTTGMXdb81z-+ec1Gt0fkSKtE0aPOG7yg@mail.gmail.com Whole thread Raw |
In response to | Re: Performance improvement for joins where outer side is unique (David Rowley <david.rowley@2ndquadrant.com>) |
Responses |
Re: Performance improvement for joins where outer side is unique
|
List | pgsql-hackers |
2015-10-13 23:28 GMT+02:00 David Rowley <david.rowley@2ndquadrant.com>:
On 4 September 2015 at 04:50, Robert Haas <robertmhaas@gmail.com> wrote:
Also: very nice performance results.Thanks.On following a thread in [General] [1] it occurred to me that this patch can give a massive improvement on Merge joins where the mark and restore causes an index scan to have to skip over many filtered rows again and again.I mocked up some tables and some data from the scenario on the [General] thread:create table customers (id bigint, group_id bigint not null);insert into customers select x.x,x.x%27724+1 from generate_series(1,473733) x(x);alter table customers add constraint customer_pkey primary key (id);create table balances (id bigint, balance int not null, tracking_number int not null, customer_id bigint not null);insert into balances select x.x, 100, 12345, x.x % 45 + 1 from generate_Series(1,16876) x(x);create index balance_customer_id_index on balances (customer_id);create index balances_customer_tracking_number_index on balances (customer_id,tracking_number);analyze;Unpatched I get:test=# explain analyze SELECT ac.* FROM balances ac join customers o ON o.id = ac.customer_id WHERE o.group_id = 45;QUERY PLAN--------------------------------------------------------------------------------------------------------------------------------------------------------Merge Join (cost=164.87..1868.70 rows=1 width=24) (actual time=6.110..1491.408 rows=375 loops=1)Merge Cond: (ac.customer_id = o.id)-> Index Scan using balance_customer_id_index on balances ac (cost=0.29..881.24 rows=16876 width=24) (actual time=0.009..5.206 rows=16876 loops=1)-> Index Scan using customer_pkey on customers o (cost=0.42..16062.75 rows=17 width=8) (actual time=0.014..1484.382 rows=376 loops=1)Filter: (group_id = 45)Rows Removed by Filter: 10396168Planning time: 0.207 msExecution time: 1491.469 ms(8 rows)Patched:test=# explain analyze SELECT ac.* FROM balances ac join customers o ON o.id = ac.customer_id WHERE o.group_id = 45;QUERY PLAN--------------------------------------------------------------------------------------------------------------------------------------------------------Merge Join (cost=164.87..1868.70 rows=1 width=24) (actual time=6.037..11.528 rows=375 loops=1)Merge Cond: (ac.customer_id = o.id)-> Index Scan using balance_customer_id_index on balances ac (cost=0.29..881.24 rows=16876 width=24) (actual time=0.009..4.978 rows=16876 loops=1)-> Index Scan using customer_pkey on customers o (cost=0.42..16062.75 rows=17 width=8) (actual time=0.015..5.141 rows=2 loops=1)Filter: (group_id = 45)Rows Removed by Filter: 27766Planning time: 0.204 msExecution time: 11.575 ms(8 rows)Now it could well be that the merge join costs need a bit more work to avoid a merge join in this case, but as it stands as of today, this is your performance gain.Regards
it is great
Pavel
David Rowley--
David Rowley http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
pgsql-hackers by date: