Thread: varchar does not work too well with IS NOT NULL partial indexes.
Hello, I guess I've stuck upon one problem with (I guess) implicit casting varchar to text... Basically it looks like planner makes better use of WHERE ... IS NOT NULL indexes if either you explicitly put "text" as a column type or that you cast the column to ::text when making index. Here's a self-contained example (this is 8.2.4 server) CREATE TABLE foo (i int); INSERT INTO foo SELECT CASE WHEN i%10=0 THEN NULL ELSE i END FROM generate_series(1,100000) AS n(i); CREATE INDEX foo_i_index ON foo (i) WHERE i IS NOT NULL; ANALYZE foo; EXPLAIN SELECT * FROM foo WHERE i=17; QUERY PLAN ----------------------------------------------------------------------- Index Scan using foo_i_index on foo (cost=0.00..8.28 rows=1 width=4) Index Cond: (i = 17) ALTER TABLE foo ALTER COLUMN i TYPE text; EXPLAIN SELECT * FROM foo WHERE i=17; QUERY PLAN ----------------------------------------------------------------------------- Bitmap Heap Scan on foo (cost=12.14..554.42 rows=500 width=32) Recheck Cond: (i = '17'::text) -> Bitmap Index Scan on foo_i_index (cost=0.00..12.01 rows=498 width=0) Index Cond: (i = '17'::text) EXPLAIN SELECT * FROM foo WHERE i=17; QUERY PLAN --------------------------------------------------------- Seq Scan on foo (cost=0.00..1772.00 rows=500 width=34) Filter: ((i)::text = '17'::text) CREATE INDEX foo_i2_index ON foo ((i::text)); EXPLAIN SELECT * FROM foo WHERE i='17'::text; QUERY PLAN ------------------------------------------------------------------------------ Bitmap Heap Scan on foo (cost=12.14..554.82 rows=500 width=34) Recheck Cond: ((i)::text = '17'::text) -> Bitmap Index Scan on foo_i2_index (cost=0.00..12.01 rows=500 width=0) Index Cond: ((i)::text = '17'::text) Regards, Dawid
"Dawid Kuroczko" <qnex42@gmail.com> writes: > ALTER TABLE foo ALTER COLUMN i TYPE text; > EXPLAIN SELECT * FROM foo WHERE i=17; > QUERY PLAN > ----------------------------------------------------------------------------- > Bitmap Heap Scan on foo (cost=12.14..554.42 rows=500 width=32) > Recheck Cond: (i = '17'::text) > -> Bitmap Index Scan on foo_i_index (cost=0.00..12.01 rows=498 width=0) > Index Cond: (i = '17'::text) I think you've lost some single-quotes around 17 in this query. With the single-quotes it works like this which seems like the correct result. You don't need the casts in the index definition if you write the query with single-quotes. > EXPLAIN SELECT * FROM foo WHERE i=17; > QUERY PLAN > --------------------------------------------------------- > Seq Scan on foo (cost=0.00..1772.00 rows=500 width=34) > Filter: ((i)::text = '17'::text) This is now an error: LINE 1: EXPLAIN SELECT * FROM foo WHERE i=17; ^ HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts. In fact it's not clear what you would want to happen here. Should it cast the text to an integer and use integer comparison or cast the integer to text and use text comparison? They don't necessarily generate the same results. (In fact I suspect they would for equals but consider the same situation for < or >) -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
On 7/24/07, Gregory Stark <stark@enterprisedb.com> wrote: > "Dawid Kuroczko" <qnex42@gmail.com> writes: > > > ALTER TABLE foo ALTER COLUMN i TYPE text; > > EXPLAIN SELECT * FROM foo WHERE i=17; > > QUERY PLAN > > ----------------------------------------------------------------------------- > > Bitmap Heap Scan on foo (cost=12.14..554.42 rows=500 width=32) > > Recheck Cond: (i = '17'::text) > > -> Bitmap Index Scan on foo_i_index (cost=0.00..12.01 rows=498 width=0) > > Index Cond: (i = '17'::text) > > I think you've lost some single-quotes around 17 in this query. With the > single-quotes it works like this which seems like the correct result. You > don't need the casts in the index definition if you write the query with > single-quotes. Well, maybe I used wrong example... CREATE TABLE foo (t varchar(100)); INSERT INTO foo SELECT CASE WHEN i % 10 = 0 THEN NULL ELSE 'X' || i END FROM generate_series(1,1000000) AS n(i); What we have here is a table with every 10th row NULL. CREATE INDEX foo_t_index ON foo (t) WHERE t IS NOT NULL; ...and an index which will contain only NOT NULL values. Now, if we: # EXPLAIN ANALYZE SELECT t FROM foo WHERE t='X17'; QUERY PLAN --------------------------------------------------------------------------------------------------- Seq Scan on foo (cost=0.00..18025.78 rows=1 width=8) (actual time=0.079..565.661 rows=1 loops=1) Filter: ((t)::text = 'X17'::text) Total runtime: 565.689 ms # EXPLAIN ANALYZE SELECT t FROM foo WHERE t='X17'; QUERY PLAN ------------------------------------------------------- Seq Scan on foo (cost=0.00..178.00 rows=50 width=68) Filter: ((t)::text = 'X17'::text) (2 rows) But if we: # ALTER TABLE foo ALTER COLUMN t TYPE text; # EXPLAIN ANALYZE SELECT t FROM foo WHERE t='X17'; QUERY PLAN ------------------------------------------------------------------------------------------------------------------ Index Scan using foo_t_index on foo (cost=0.00..8.39 rows=1 width=10) (actual time=0.051..0.052 rows=1 loops=1) Index Cond: (t = 'X17'::text) Total runtime: 0.077 ms ...so it does nothing to do with single quotes. Actually it works fine, so long as you use text instead of varchar2: # EXPLAIN ANALYZE SELECT t FROM foo WHERE t=17; QUERY PLAN ------------------------------------------------------------------------------------------------------------------ Index Scan using foo_t_index on foo (cost=0.00..8.39 rows=1 width=10) (actual time=0.014..0.014 rows=0 loops=1) Index Cond: (t = '17'::text) Total runtime: 0.034 ms I hope I have stated the problem clearly now. :-) Regards, Dawid
"Dawid Kuroczko" <qnex42@gmail.com> writes: > Now, if we: > > # EXPLAIN ANALYZE SELECT t FROM foo WHERE t='X17'; > QUERY PLAN > --------------------------------------------------------------------------------------------------- > Seq Scan on foo (cost=0.00..18025.78 rows=1 width=8) (actual > time=0.079..565.661 rows=1 loops=1) > Filter: ((t)::text = 'X17'::text) > Total runtime: 565.689 ms > > > # EXPLAIN ANALYZE SELECT t FROM foo WHERE t='X17'; > QUERY PLAN > ------------------------------------------------------- > Seq Scan on foo (cost=0.00..178.00 rows=50 width=68) > Filter: ((t)::text = 'X17'::text) > (2 rows) I still think you're playing games with the output. a) This is not an EXPLAIN ANALYZE at all, there are no "actual" values. And b) there's no explanation for why the estimates should be different for this query than the previous, identical, query. Send along the actual psql session, not an edited version. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
On 7/24/07, Gregory Stark <stark@enterprisedb.com> wrote: > "Dawid Kuroczko" <qnex42@gmail.com> writes: > > > Now, if we: > > > > # EXPLAIN ANALYZE SELECT t FROM foo WHERE t='X17'; > > QUERY PLAN > > --------------------------------------------------------------------------------------------------- > > Seq Scan on foo (cost=0.00..18025.78 rows=1 width=8) (actual > > time=0.079..565.661 rows=1 loops=1) > > Filter: ((t)::text = 'X17'::text) > > Total runtime: 565.689 ms > > > > > > # EXPLAIN ANALYZE SELECT t FROM foo WHERE t='X17'; > > QUERY PLAN > > ------------------------------------------------------- > > Seq Scan on foo (cost=0.00..178.00 rows=50 width=68) > > Filter: ((t)::text = 'X17'::text) > > (2 rows) > > I still think you're playing games with the output. a) This is not an EXPLAIN > ANALYZE at all, there are no "actual" values. And b) there's no explanation > for why the estimates should be different for this query than the previous, > identical, query. My mistake, copy&paste error. > Send along the actual psql session, not an edited version. Actual session is attached. If I may suggest it -- try to run the queries yourself. You will find the problem lies not in the statistics. Regards, Dawid
Attachment
"Dawid Kuroczko" <qnex42@gmail.com> writes: > If I may suggest it -- try to run the queries yourself. You will find the > problem lies not in the statistics. I was more concerned that there might be other discrepancies between the commands in the email and the actual commands you're running. Running it myself I do see the same behaviour in 8.3. I'm not sure whether this is something we expect to work or not though. Binary-compatible types are a bit of weirdness I still haven't quite absorbed. postgres=# alter table foo alter column i type text; ALTER TABLE postgres=# analyze foo; ANALYZE postgres=# explain analyze select * from foo where i='17'; QUERY PLAN ----------------------------------------------------------------------------------------------------------------- Index Scan using foo_i_index on foo (cost=0.00..8.28 rows=1 width=5) (actual time=0.132..0.138 rows=1 loops=1) Index Cond: (i = '17'::text) Total runtime: 0.235 ms (3 rows) postgres=# alter table foo alter column i type varchar(100); ALTER TABLE postgres=# analyze foo; ANALYZE postgres=# explain analyze select * from foo where i='17'; QUERY PLAN -------------------------------------------------------------------------------------------------- Seq Scan on foo (cost=0.00..1681.00 rows=1 width=5) (actual time=0.147..281.349 rows=1 loops=1) Filter: ((i)::text = '17'::text) Total runtime: 281.448 ms (3 rows) -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Gregory Stark <stark@enterprisedb.com> writes: > Running it myself I do see the same behaviour in 8.3. I'm not sure whether > this is something we expect to work or not though. It looks like it might be relatively easy to fix, but I haven't dug down to find exactly where things go wrong. Presumably there's something in predtest.c that's not quite smart enough. regards, tom lane
"Dawid Kuroczko" <qnex42@gmail.com> writes: > Basically it looks like planner makes better use of > WHERE ... IS NOT NULL indexes if either you explicitly > put "text" as a column type or that you cast the column > to ::text when making index. I've applied a patch for this. regards, tom lane