Thread: Wrong rows count estimation in query with simple UNION ALL leads to drammatically slow plan

Wrong rows count estimation in query with simple UNION ALL leads to drammatically slow plan

From

Michael Kolomeitsev

Date:

06 January 2014, 08:50:58

The problem was described here earlier but there is no answers unfortunately: http://www.postgresql.org/message-id/1387780366.910542228@f394.i.mail.ru

It looks like defect.

CREATE TABLE t1 (c1 INTEGER, id INTEGER PRIMARY KEY);

INSERT INTO t1 (c1, id) SELECT b, b FROM generate_series(1, 1000000) a (b);

REINDEX TABLE t1;

ANALYZE t1;

CREATE TABLE t2 (c1 INTEGER, id INTEGER PRIMARY KEY);

INSERT INTO t2 (c1, id) SELECT b, b FROM generate_series(1, 1000000) a (b);

REINDEX TABLE t2;

ANALYZE t2;

CREATE TEMP TABLE t3 (ref_id INTEGER);

INSERT INTO t3 (ref_id) VALUES (333333), (666666);

ANALYZE t3;

test=> EXPLAIN (ANALYZE) SELECT * FROM (SELECT * FROM t1 UNION ALL SELECT * FROM t2) t INNER JOIN t3 ON t3.ref_id = t.id;

QUERY PLAN

----------------------------------------------------------------------------------------------------

--------------------

Nested Loop (cost=0.42..34.84 rows=20000 width=12) (actual time=0.046..0.104 rows=4 loops=1)

-> Seq Scan on t3 (cost=0.00..1.02 rows=2 width=4) (actual time=0.008..0.009 rows=2 loops=1)

-> Append (cost=0.42..16.89 rows=2 width=8) (actual time=0.023..0.042 rows=2 loops=2)

-> Index Scan using t1_pkey on t1 (cost=0.42..8.45 rows=1 width=8) (actual time=0.020..0.022 rows=1 loops=2)

Index Cond: (id = t3.ref_id)

-> Index Scan using t2_pkey on t2 (cost=0.42..8.45 rows=1 width=8) (actual time=0.015..0.016 rows=1 loops=2)

Index Cond: (id = t3.ref_id)

Total runtime: 0.184 ms

(8 rows)

This plan is perfect. But the rows estimation is not: 20000 vs 4.

As I can see Pg is able to do correct rows estimation: inner append: rows = 2, outer seq scan: rows = 2. And nested loop has to know that one is able to produce 2 * 2 = 4 rows max.

Moreover the cost estimation is _correct_! It is corresponded to 'rows=4'.

Why it is important to make correct row estimation? 'Cause it does matter in more complex query.

Let's join another big table in that query:

CREATE TABLE links (c1 INTEGER PRIMARY KEY, descr TEXT);

INSERT INTO links (c1, descr) SELECT b, '2' FROM generate_series(1, 100000) a (b);

REINDEX TABLE links;

ANALYZE links;

test=> EXPLAIN (ANALYZE) SELECT * FROM (SELECT * FROM t1 UNION ALL SELECT * FROM t2) t INNER JOIN t3 ON t3.ref_id = t.id INNER JOIN links l ON (t.c1 = l.c1);

QUERY PLAN

----------------------------------------------------------------------------------------------------

--------------------------

Hash Join (cost=2693.43..3127.84 rows=20000 width=18) (actual time=33.619..33.619 rows=0 loops=1)

Hash Cond: (t1.c1 = l.c1)

-> Nested Loop (cost=0.42..34.84 rows=20000 width=12) (actual time=0.038..0.078 rows=4 loops=1)

-> Seq Scan on t3 (cost=0.00..1.02 rows=2 width=4) (actual time=0.006..0.007 rows=2 loops=1)

-> Append (cost=0.42..16.89 rows=2 width=8) (actual time=0.017..0.029 rows=2 loops=2)

-> Index Scan using t1_pkey on t1 (cost=0.42..8.45 rows=1 width=8) (actual time=0.015..0.017 rows=1 loops=2)

Index Cond: (id = t3.ref_id)

-> Index Scan using t2_pkey on t2 (cost=0.42..8.45 rows=1 width=8) (actual time=0.009..0.009 rows=1 loops=2)

Index Cond: (id = t3.ref_id)

-> Hash (cost=1443.00..1443.00 rows=100000 width=6) (actual time=33.479..33.479 rows=100000 loops=1)

Buckets: 16384 Batches: 1 Memory Usage: 3711kB

-> Seq Scan on links l (cost=0.00..1443.00 rows=100000 width=6) (actual time=0.017..14.853 rows=100000 loops=1)

Total runtime: 33.716 ms

(13 rows)

Planner thinks there'll be 20000 rows when join is performed between "t" and "t3". And that's why it makes a decision to use hash join with "links" table.

Let's prove it:

CREATE OR REPLACE FUNCTION public.f1()

RETURNS SETOF integer

LANGUAGE plpgsql

ROWS 20000

AS $function$

BEGIN

RETURN QUERY EXECUTE 'SELECT t.c1 FROM (SELECT * FROM t1 UNION ALL SELECT * FROM t2) t INNER JOIN t3 ON t3.ref_id = t.id';

END;

$function$

test=> explain select * from f1() t(c1) INNER JOIN links l ON (t.c1 = l.c1);

QUERY PLAN

---------------------------------------------------------------------------

Hash Join (cost=2693.25..3293.25 rows=20000 width=10)

Hash Cond: (t.c1 = l.c1)

-> Function Scan on f1 t (cost=0.25..200.25 rows=20000 width=4)

-> Hash (cost=1443.00..1443.00 rows=100000 width=6)

-> Seq Scan on links l (cost=0.00..1443.00 rows=100000 width=6)

(5 rows)

The same "defect" plan.

test=> ALTER FUNCTION f1() ROWS 4;

ALTER FUNCTION

test=> explain select * from f1() t(c1) INNER JOIN links l ON (t.c1 = l.c1);

QUERY PLAN

--------------------------------------------------------------------------------

Nested Loop (cost=0.54..33.58 rows=4 width=10)

-> Function Scan on f1 t (cost=0.25..0.29 rows=4 width=4)

-> Index Scan using links_pkey on links l (cost=0.29..8.31 rows=1 width=6)

Index Cond: (c1 = t.c1)

(4 rows)

The correct/perfect plan.

In real life I have bigger "links" table and wrong plan slows execution significantly.

I found several workarounds. And it is not a problem anymore for me.

I just want to report this "strange thing".

I tried to look into source code, found some interesting places there but I think it is useless: Pg developers know the situation much better than me.