Thread: materialization blocks hash join
Hi
While I was talking with Silvio Moioli, I found a strange hash join: the hash was built from the bigger table.

https://www.postgresql.org/message-id/79dd683d-3296-1b21-ab4a-28fdc2d98807%40suse.de

Now it looks like a materialized CTE disallows a hash join:
create table bigger(a int);
create table smaller(a int);
insert into bigger select random()* 10000 from generate_series(1,100000);
insert into smaller select i from generate_series(1,100000) g(i);
analyze bigger, smaller;
-- no problem
explain analyze select * from bigger b join smaller s on b.a = s.a;
postgres=# explain analyze select * from bigger b join smaller s on b.a = s.a;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=3084.00..7075.00 rows=100000 width=8) (actual time=32.937..87.276 rows=99994 loops=1)
Hash Cond: (b.a = s.a)
-> Seq Scan on bigger b (cost=0.00..1443.00 rows=100000 width=4) (actual time=0.028..8.546 rows=100000 loops=1)
-> Hash (cost=1443.00..1443.00 rows=100000 width=4) (actual time=32.423..32.423 rows=100000 loops=1)
Buckets: 131072 Batches: 2 Memory Usage: 2785kB
-> Seq Scan on smaller s (cost=0.00..1443.00 rows=100000 width=4) (actual time=0.025..9.931 rows=100000 loops=1)
Planning Time: 0.438 ms
Execution Time: 91.193 ms
(8 rows)
But with a materialized CTE:
postgres=# explain analyze with b as materialized (select * from bigger), s as materialized (select * from smaller) select * from b join s on b.a = s.a;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Merge Join (cost=23495.64..773995.64 rows=50000000 width=8) (actual time=141.242..193.375 rows=99994 loops=1)
Merge Cond: (b.a = s.a)
CTE b
-> Seq Scan on bigger (cost=0.00..1443.00 rows=100000 width=4) (actual time=0.026..11.083 rows=100000 loops=1)
CTE s
-> Seq Scan on smaller (cost=0.00..1443.00 rows=100000 width=4) (actual time=0.015..9.161 rows=100000 loops=1)
-> Sort (cost=10304.82..10554.82 rows=100000 width=4) (actual time=78.775..90.953 rows=100000 loops=1)
Sort Key: b.a
Sort Method: external merge Disk: 1376kB
-> CTE Scan on b (cost=0.00..2000.00 rows=100000 width=4) (actual time=0.033..39.274 rows=100000 loops=1)
-> Sort (cost=10304.82..10554.82 rows=100000 width=4) (actual time=62.453..74.004 rows=99996 loops=1)
Sort Key: s.a
Sort Method: external sort Disk: 1768kB
-> CTE Scan on s (cost=0.00..2000.00 rows=100000 width=4) (actual time=0.018..31.669 rows=100000 loops=1)
Planning Time: 0.303 ms
Execution Time: 199.919 ms
(16 rows)
It doesn't use a hash join - the estimates are perfect, but the plan is suboptimal.
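For comparison, a minimal sketch (assuming PostgreSQL 12 or later, where NOT MATERIALIZED is available): without the MATERIALIZED keyword, or with NOT MATERIALIZED spelled out, the CTEs are inlined and planned like ordinary subqueries, so the planner should go back to the hash join from the first plan.

-- inlined CTEs are planned like plain subqueries, so stats on the base
-- tables remain visible to the join selectivity and costing code
explain analyze
with b as not materialized (select * from bigger),
     s as not materialized (select * from smaller)
select * from b join s on b.a = s.a;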
Regards
Pavel
On Mon, Mar 30, 2020 at 18:06, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> It doesn't use a hash join - the estimates are perfect, but the plan is suboptimal.
I was wrong - the estimates on the CTE scans are OK, but the join estimate is bad:
Merge Join (cost=23495.64..773995.64 rows=50000000 width=8) (actual time=141.242..193.375 rows=99994 loops=1)
Regards

Pavel
On Mon, Mar 30, 2020 at 06:14:42PM +0200, Pavel Stehule wrote:
> I was wrong - the estimates on the CTE scans are OK, but the join estimate is bad:
>
> Merge Join (cost=23495.64..773995.64 rows=50000000 width=8) (actual
> time=141.242..193.375 rows=99994 loops=1)

That's because eqjoinsel_inner won't have any statistics for either side of the join, so it'll use default ndistinct values (200), resulting in an estimate of 0.5% for the join condition.

But this should not affect the choice of join algorithm, I think, because that's only the output of the join.
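That 0.5% is exactly where the rows=50000000 estimate comes from: 100000 * 100000 * (1/200) = 50,000,000. A quick arithmetic check (just the numbers, assuming the default ndistinct of 200 on both sides):

-- 100000 rows per side, selectivity 1/200 under the default ndistinct;
-- the bigint cast avoids int4 overflow in the multiplication
select 100000::bigint * 100000 / 200 as estimated_join_rows;   -- 50000000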
regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> That's because eqjoinsel_inner won't have any statistics for either side
> of the join, so it'll use default ndistinct values (200), resulting in
> an estimate of 0.5% for the join condition.

Right.

> But this should not affect the choice of join algorithm, I think,
> because that's only the output of the join.

Lack of stats will also discourage use of a hash join, because the default assumption in the absence of stats is that the join column has a pretty non-flat distribution, risking clumping into a few hash buckets. Merge join is less sensitive to the data distribution, so it tends to come out as preferred in such cases.

			regards, tom lane
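One way to see the effect Tom describes is to take the merge join off the table and let the planner pick its next-best plan for the materialized-CTE query - a rough sketch using the standard planner GUC enable_mergejoin (output not shown; if a hash join is chosen, its higher estimated cost reflects the pessimistic no-stats assumption about bucket skew):

-- discourage the merge join so the planner costs the alternatives instead
set enable_mergejoin = off;
explain analyze
with b as materialized (select * from bigger),
     s as materialized (select * from smaller)
select * from b join s on b.a = s.a;
reset enable_mergejoin;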