Thread: BUG #4170: Rows estimation which are cast from TEXT is inaccurate.
The following bug has been logged online: Bug reference: 4170 Logged by: Tashuhito Kasahara Email address: kasahara.tatsuhito@oss.ntt.co.jp PostgreSQL version: 8.3.1 Operating system: Linux Description: Rows estimation which are cast from TEXT is inaccurate. Details: I noticed that rows estimation is not accurate when we cast some datetype to TEXT. See the following example. (TEXT -> TIMESTAMP) ============================================================================ ==== test=# SELECT count(*) FROM test WHERE t < '2008-05-14 23:55:00'; count ------- 86099 (1 row) test=# EXPLAIN SELECT * FROM test WHERE t < '2008-05-14 23:55:00'; QUERY PLAN -------------------------------------------------------------------- Seq Scan on test (cost=0.00..1727.00 rows=85721 width=12) Filter: (t < '2008-05-14 23:55:00'::timestamp without time zone) (2 rows) test=# EXPLAIN SELECT * FROM test WHERE t < '2008-05-14 23:55:00'::text::timestamp; QUERY PLAN ---------------------------------------------------------------------------- Seq Scan on test (cost=0.00..2209.00 rows=32133 width=12) <- too little number of the estimates Filter: (t < ('2008-05-14 23:55:00'::text)::timestamp without time zone) (2 rows) test=# SELECT count(*) FROM test WHERE t < '2008-05-14 23:55:00'; count ------- 86099 (1 row) ============================================================================ ==== We can avoid this problem by setting appropriate cast-function. ============================================================================ ==== CREATE FUNCTION text2timestamp(text) RETURNS timestamp AS $$ SELECT timestamp_in(textout($1), 0, 0); $$ LANGUAGE sql STRICT STABLE; CREATE CAST (text AS timestamp) WITH FUNCTION text2timestamp(text) AS ASSIGNMENT; test=# EXPLAIN SELECT * FROM test WHERE t < '2008-05-14 23:55:00'::text::timestamp; QUERY PLAN ------------------------------------------------------------------------- Seq Scan on test (cost=0.00..1968.00 rows=85721 width=12) Filter: (t < timestamp_in('2008-05-14 23:55:00'::cstring, 0::oid, 0)) (2 rows) ============================================================================ ==== I think it's a bug and will be troubled at plan optimization. Best regards.
"Tashuhito Kasahara" <kasahara.tatsuhito@oss.ntt.co.jp> writes: > test=# EXPLAIN SELECT * FROM test WHERE t < '2008-05-14 > 23:55:00'::text::timestamp; > QUERY PLAN > ---------------------------------------------------------------------------- > Seq Scan on test (cost=0.00..2209.00 rows=32133 width=12) <- too little > number of the estimates > Filter: (t < ('2008-05-14 23:55:00'::text)::timestamp without time zone) > (2 rows) Hmm ... as of 8.3 this will generate a CoerceViaIO node, and it looks like I forgot to teach eval_const_expressions how to simplify those. regards, tom lane
Re: BUG #4170: Rows estimation which are cast from TEXT is inaccurate.
From
Tatsuhito Kasahara
Date:
Hi. Tom Lane wrote: > "Tashuhito Kasahara" <kasahara.tatsuhito@oss.ntt.co.jp> writes: >> test=# EXPLAIN SELECT * FROM test WHERE t < '2008-05-14 >> 23:55:00'::text::timestamp; >> QUERY PLAN >> ---------------------------------------------------------------------------- > >> Seq Scan on test (cost=0.00..2209.00 rows=32133 width=12) <- too little >> number of the estimates >> Filter: (t < ('2008-05-14 23:55:00'::text)::timestamp without time zone) >> (2 rows) > > Hmm ... as of 8.3 this will generate a CoerceViaIO node, and it looks > like I forgot to teach eval_const_expressions how to simplify those. Relevant issues ocurred on PostgreSQL versions 7.4 also. 8.2, 8.1 and 8.0 seemed to be accurate estimates on simple test. ============= 7.4 test=# EXPLAIN ANALYZE SELECT * FROM test WHERE id < '2008-5-14 00:01:00'::text::timestamp; QUERY PLAN ----------------------------------------------------------------------------------------------------- Seq Scan on test (cost=0.00..224.98 rows=3667 width=8) (actual time=0.026..30.987 rows=60 loops=1) Filter: (id < ('2008-5-14 00:01:00'::text)::timestamp without time zone) Total runtime: 31.074 ms (3 rows) ============= 8.2 test=# EXPLAIN ANALYZE SELECT * FROM test WHERE id < '2008-5-14 00:01:00'::text::timestamp; QUERY PLAN --------------------------------------------------------------------------------------------------- Seq Scan on test (cost=0.00..224.98 rows=55 width=8) (actual time=0.043..13.896 rows=60 loops=1) Filter: (id < ('2008-5-14 00:01:00'::text)::timestamp without time zone) Total runtime: 13.951 ms (3 rows) ============= 8.1 test=# EXPLAIN ANALYZE SELECT * FROM test WHERE id < '2008-5-14 00:01:00'::text::timestamp; QUERY PLAN --------------------------------------------------------------------------------------------------- Seq Scan on test (cost=0.00..224.98 rows=60 width=8) (actual time=0.034..15.409 rows=60 loops=1) Filter: (id < ('2008-5-14 00:01:00'::text)::timestamp without time zone) Total runtime: 15.464 ms (3 rows) ============= 8.0 test=# EXPLAIN ANALYZE SELECT * FROM test WHERE id < '2008-5-14 00:01:00'::text::timestamp; QUERY PLAN --------------------------------------------------------------------------------------------------- Seq Scan on test (cost=0.00..224.98 rows=63 width=8) (actual time=0.041..18.645 rows=60 loops=1) Filter: (id < ('2008-5-14 00:01:00'::text)::timestamp without time zone) Total runtime: 18.706 ms (3 rows) ============= Best regards. -- Tatsuhito Kasahara kasahara.tatsuhito@oss.ntt.co.jp