Thread: materialization blocks hash join
Hi
While I was talking with Silvio Moioli, I found a strange hash join: the hash was built from the bigger table.

https://www.postgresql.org/message-id/79dd683d-3296-1b21-ab4a-28fdc2d98807%40suse.de

Now it looks like a materialized CTE disallows a hash join:
create table bigger(a int);
create table smaller(a int);
insert into bigger select random()* 10000 from generate_series(1,100000);
insert into smaller select i from generate_series(1,100000) g(i);
analyze bigger, smaller;
-- no problem
explain analyze select * from bigger b join smaller s on b.a = s.a;
postgres=# explain analyze select * from bigger b join smaller s on b.a = s.a;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=3084.00..7075.00 rows=100000 width=8) (actual time=32.937..87.276 rows=99994 loops=1)
Hash Cond: (b.a = s.a)
-> Seq Scan on bigger b (cost=0.00..1443.00 rows=100000 width=4) (actual time=0.028..8.546 rows=100000 loops=1)
-> Hash (cost=1443.00..1443.00 rows=100000 width=4) (actual time=32.423..32.423 rows=100000 loops=1)
Buckets: 131072 Batches: 2 Memory Usage: 2785kB
-> Seq Scan on smaller s (cost=0.00..1443.00 rows=100000 width=4) (actual time=0.025..9.931 rows=100000 loops=1)
Planning Time: 0.438 ms
Execution Time: 91.193 ms
(8 rows)
But with a materialized CTE:
postgres=# explain analyze with b as materialized (select * from bigger), s as materialized (select * from smaller) select * from b join s on b.a = s.a;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Merge Join (cost=23495.64..773995.64 rows=50000000 width=8) (actual time=141.242..193.375 rows=99994 loops=1)
Merge Cond: (b.a = s.a)
CTE b
-> Seq Scan on bigger (cost=0.00..1443.00 rows=100000 width=4) (actual time=0.026..11.083 rows=100000 loops=1)
CTE s
-> Seq Scan on smaller (cost=0.00..1443.00 rows=100000 width=4) (actual time=0.015..9.161 rows=100000 loops=1)
-> Sort (cost=10304.82..10554.82 rows=100000 width=4) (actual time=78.775..90.953 rows=100000 loops=1)
Sort Key: b.a
Sort Method: external merge Disk: 1376kB
-> CTE Scan on b (cost=0.00..2000.00 rows=100000 width=4) (actual time=0.033..39.274 rows=100000 loops=1)
-> Sort (cost=10304.82..10554.82 rows=100000 width=4) (actual time=62.453..74.004 rows=99996 loops=1)
Sort Key: s.a
Sort Method: external sort Disk: 1768kB
-> CTE Scan on s (cost=0.00..2000.00 rows=100000 width=4) (actual time=0.018..31.669 rows=100000 loops=1)
Planning Time: 0.303 ms
Execution Time: 199.919 ms
(16 rows)
It doesn't use a hash join - the estimates are perfect, but the plan is suboptimal.
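For comparison, a minimal sketch (assuming PostgreSQL 12 or later, where NOT MATERIALIZED is available): without the MATERIALIZED keyword, or with NOT MATERIALIZED spelled out, the CTEs are inlined and planned like ordinary subqueries, so the planner should go back to the hash join from the first plan.

-- inlined CTEs are planned like plain subqueries, so stats on the base
-- tables remain visible to the join selectivity and costing code
explain analyze
with b as not materialized (select * from bigger),
     s as not materialized (select * from smaller)
select * from b join s on b.a = s.a;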
Regards
Pavel
On Mon, Mar 30, 2020 at 18:06, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> It doesn't use a hash join - the estimates are perfect, but the plan is suboptimal.
I was wrong - the estimates on the CTE scans are OK, but the join estimate is bad:
Merge Join (cost=23495.64..773995.64 rows=50000000 width=8) (actual time=141.242..193.375 rows=99994 loops=1)
Regards

Pavel
On Mon, Mar 30, 2020 at 06:14:42PM +0200, Pavel Stehule wrote:
> I was wrong - the estimates on the CTE scans are OK, but the join estimate is bad:
>
> Merge Join (cost=23495.64..773995.64 rows=50000000 width=8) (actual
> time=141.242..193.375 rows=99994 loops=1)

That's because eqjoinsel_inner won't have any statistics for either side of the join, so it'll use default ndistinct values (200), resulting in an estimate of 0.5% for the join condition.

But this should not affect the choice of join algorithm, I think, because that's only the output of the join.
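That 0.5% is exactly where the rows=50000000 estimate comes from: 100000 * 100000 * (1/200) = 50,000,000. A quick arithmetic check (just the numbers, assuming the default ndistinct of 200 on both sides):

-- 100000 rows per side, selectivity 1/200 under the default ndistinct;
-- the bigint cast avoids int4 overflow in the multiplication
select 100000::bigint * 100000 / 200 as estimated_join_rows;   -- 50000000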
regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> That's because eqjoinsel_inner won't have any statistics for either side
> of the join, so it'll use default ndistinct values (200), resulting in
> an estimate of 0.5% for the join condition.

Right.

> But this should not affect the choice of join algorithm, I think,
> because that's only the output of the join.

Lack of stats will also discourage use of a hash join, because the default assumption in the absence of stats is that the join column has a pretty non-flat distribution, risking clumping into a few hash buckets. Merge join is less sensitive to the data distribution, so it tends to come out as preferred in such cases.

			regards, tom lane
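One way to see the effect Tom describes is to take the merge join off the table and let the planner pick its next-best plan for the materialized-CTE query - a rough sketch using the standard planner GUC enable_mergejoin (output not shown; if a hash join is chosen, its higher estimated cost reflects the pessimistic no-stats assumption about bucket skew):

-- discourage the merge join so the planner costs the alternatives instead
set enable_mergejoin = off;
explain analyze
with b as materialized (select * from bigger),
     s as materialized (select * from smaller)
select * from b join s on b.a = s.a;
reset enable_mergejoin;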