Thread: BUG #7610: planner get wrong rows estimate with LIKE operator

BUG #7610: planner get wrong rows estimate with LIKE operator

From
sam@hellosam.net
Date:
17 October 2012, 09:35:54
The following bug has been logged on the website:

Bug reference:      7610
Logged by:          Sam Wong
Email address:      sam@hellosam.net
PostgreSQL version: 9.2.1
Operating system:   Windows 7 64-bit
Description:        =


Repro SQL:

CREATE TABLE a (id text, primary key (id));

INSERT INTO a SELECT to_char(generate_series, 'FM0000000') from
generate_series(1,1000000);

ANALYZE a;

Q1: EXPLAIN ANALYZE SELECT * from a where id like '0005000%';
-----------------
 Index Only Scan using a_pkey on a  (cost=3D0.00..9.37 rows=3D100 width=3D8)
(actual time=3D0.170..0.173 rows=3D1 loops=3D1)
   Index Cond: ((id >=3D '0005000'::text) AND (id < '0005001'::text))
   Filter: (id ~~ '0005000%'::text)
   Heap Fetches: 1
 Total runtime: 0.229 ms
(5 rows)

Q2: EXPLAIN ANALYZE SELECT * from a where id >=3D '0005000' and id <
'0005001';
-----------------
 Index Only Scan using a_pkey on a  (cost=3D0.00..9.37 rows=3D1 width=3D8) =
(actual
time=3D0.027..0.028 rows=3D1 loops=3D1)
   Index Cond: ((id >=3D '0005000'::text) AND (id < '0005001'::text))
   Heap Fetches: 1
 Total runtime: 0.072 ms
(4 rows)

Problems:
 * For Q1, the planner incorrectly estimates that there will be 100 rows.
For Q2, it gives a correct estimation.
 * My actual problem in the production is that - because of the much larger
estimation, it prefers a merge/hash join in the later stage of a complex
query, instead of a nested loop. The outcome is that query tooks 10 seconds
instead of 100ms.

Observations:
 * Q1 and Q2 should be the logically identical. The psql thinks the same
(refer to the Index Cond in the anazyle output)
 * The analyze output says that Q1 not only has the same index condition of
Q2, but with an additional filter, yet surprisingly it is estimated to have
more rows than Q2.