Thread: Re: BUG #18973: The default enable_material=ON affects the cost estimation of optimizer, resulting in 10968x slow
Re: BUG #18973: The default enable_material=ON affects the cost estimation of optimizer, resulting in 10968x slow
From
Tom Lane
Date:
PG Bug reporting form <noreply@postgresql.org> writes: > I may have found a performance issue. The parameter enable_material is set > to ON by default, and it affects the cost estimation of optimizer, resulting > in 10968x slow. You can reproduce it as follows: > CREATE TABLE t0(c0 INT8); > CREATE TABLE t1(c1 INT8); > INSERT INTO t1 SELECT i FROM generate_series(1, 100000000) AS i; > SELECT * FROM t0 INNER JOIN t1 ON t0.c0 != t1.c1; > c0 | c1 > ----+---- > (0 rows) > Time: 9794.016 ms (00:09.794) The problem with this example is that you didn't ANALYZE the tables. If you do, it switches to a plan without Materialize: regression=# CREATE TABLE t0(c0 INT8); CREATE TABLE regression=# CREATE TABLE t1(c1 INT8); CREATE TABLE regression=# INSERT INTO t1 SELECT i FROM generate_series(1, 100000000) AS i; INSERT 0 100000000 regression=# explain analyze SELECT * FROM t0 INNER JOIN t1 ON t0.c0 != t1.c1; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=0.00..3391443465.73 rows=224870062964 width=16) (actual time=19992.481..19992.483 rows=0.00 loops=1) Join Filter: (t0.c0 <> t1.c1) Buffers: shared read=442478 dirtied=442478 written=428541 -> Seq Scan on t1 (cost=0.00..1442478.28 rows=100000028 width=8) (actual time=0.136..11957.262 rows=100000000.00 loops=1) Buffers: shared read=442478 dirtied=442478 written=428541 -> Materialize (cost=0.00..43.90 rows=2260 width=8) (actual time=0.000..0.000 rows=0.00 loops=100000000) Storage: Memory Maximum Storage: 17kB -> Seq Scan on t0 (cost=0.00..32.60 rows=2260 width=8) (actual time=0.005..0.005 rows=0.00 loops=1) Planning: Buffers: shared hit=68 read=33 Planning Time: 4.135 ms Execution Time: 19992.525 ms (12 rows) regression=# vacuum analyze t0,t1; VACUUM regression=# explain analyze SELECT * FROM t0 INNER JOIN t1 ON t0.c0 != t1.c1; QUERY PLAN ----------------------------------------------------------------------------------------------------------- Nested Loop (cost=0.00..2692478.72 rows=100000031 width=16) (actual time=0.004..0.005 rows=0.00 loops=1) Join Filter: (t0.c0 <> t1.c1) -> Seq Scan on t0 (cost=0.00..0.00 rows=1 width=8) (actual time=0.004..0.004 rows=0.00 loops=1) -> Seq Scan on t1 (cost=0.00..1442478.32 rows=100000032 width=8) (never executed) Planning: Buffers: shared hit=9 Planning Time: 0.094 ms Execution Time: 0.017 ms (8 rows) But really that's kind of cheating, because it depends critically on t0 being completely empty. If we add a row there so that the join has to do some work, there is not so much value after all: regression=# insert into t0 values(1); INSERT 0 1 regression=# vacuum analyze t0; VACUUM regression=# explain analyze SELECT * FROM t0 INNER JOIN t1 ON t0.c0 != t1.c1; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------ Nested Loop (cost=0.00..2692479.73 rows=100000031 width=16) (actual time=0.051..11894.867 rows=99999999.00 loops=1) Join Filter: (t0.c0 <> t1.c1) Rows Removed by Join Filter: 1 Buffers: shared hit=15701 read=426778 -> Seq Scan on t0 (cost=0.00..1.01 rows=1 width=8) (actual time=0.003..0.005 rows=1.00 loops=1) Buffers: shared hit=1 -> Seq Scan on t1 (cost=0.00..1442478.32 rows=100000032 width=8) (actual time=0.044..3853.565 rows=100000000.00 loops=1) Buffers: shared hit=15700 read=426778 Planning: Buffers: shared hit=6 Planning Time: 0.068 ms Execution Time: 14050.387 ms (12 rows) We don't optimize for the case of tables being completely empty, because that's basically a zero-probability situation in real-world queries. So even though this don't-scan-the- inner-table-when-the-outer-one-is-empty short-circuit exists in the executor, the optimizer does not plan on the assumption of that happening. That's not a bug, it's intentional. We judge that a plan made on that assumption will be too brittle if the table turns out to not be empty after all. regards, tom lane