Thread: bpchar, text and indexes
Greetings.
--
I'd like to understand why Postgres behaves the way it does.
I was not able to find relevant mail thread myself, if one exists — please, point at it.
Test setup:
PostgreSQL 9.4.6 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-16), 64-bit
create table t(t_id int4, sn_c char(20));
insert into t select id, chr((random()*26)::int4+65)||chr((random()*26)::int4+65)||((random()*99999)::int4+1) from generate_series(1, 10000) id;
create index i_t_sn_c on t(sn_c);
vacuum analyze t;
Now, if I do a typical query, all is good:
postgres=# EXPLAIN (analyze, costs off) SELECT sn_c FROM t WHERE sn_c = 'AB1234';
QUERY PLAN
-------------------------------------------------------------------------------
Index Only Scan using i_t_sn_c on t (actual time=0.015..0.015 rows=0 loops=1)
Index Cond: (sn_c = 'AB1234'::bpchar)
Heap Fetches: 0
If I explicitly cast constant to `text`, then Postgres will add `(sn_c)::text` cast, which disables index:
postgres=# EXPLAIN (analyze, costs off) SELECT sn_c FROM t WHERE sn_c = 'AB1234'::text;
QUERY PLAN
---------------------------------------------------------
Seq Scan on t (actual time=5.729..5.729 rows=0 loops=1)
Filter: ((sn_c)::text = 'AB1234'::text)
Rows Removed by Filter: 10000
Although, if I will use LIKE instead of equality, then index is used:
postgres=# EXPLAIN (analyze, costs off) SELECT sn_c FROM t WHERE sn_c ~~ 'AB1234'::text;
QUERY PLAN
-------------------------------------------------------------------------------
Index Only Scan using i_t_sn_c on t (actual time=0.012..0.012 rows=0 loops=1)
Index Cond: (sn_c = 'AB1234'::bpchar)
Filter: (sn_c ~~ 'AB1234'::text)
Heap Fetches: 0
And what I also see is — `varchar` has no such effect:
postgres=# EXPLAIN (analyze, costs off) SELECT sn_c FROM t WHERE sn_c = 'AB1234'::varchar;
QUERY PLAN
-------------------------------------------------------------------------------
Index Only Scan using i_t_sn_c on t (actual time=0.041..0.041 rows=0 loops=1)
Index Cond: (sn_c = 'AB1234'::bpchar)
Heap Fetches: 0
My questions are:
1. according to `pg_cast`, `text` => `bpchar` is binary coercible. Why Postgres is casting `sn_c` to `text` here, disabling index usage?
2. as I can see in `pg_cast`, setup for `varchar` is pretty much the same: `varchar` => `bpchar` is also binary coercible. So why for `varchar` behaviour is different?
Thanks in advance.
Victor Y. Yegorov
Victor Yegorov schrieb am 22.02.2016 um 16:45: > Test setup: > > PostgreSQL 9.4.6 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-16), 64-bit > > create table t(t_id int4, sn_c char(20)); > insert into t select id, chr((random()*26)::int4+65)||chr((random()*26)::int4+65)||((random()*99999)::int4+1) from generate_series(1,10000) id; > create index i_t_sn_c on t(sn_c); > vacuum analyze t; > > Now, if I do a typical query, all is good: > > postgres=# EXPLAIN (analyze, costs off) SELECT sn_c FROM t WHERE sn_c = 'AB1234'; > QUERY PLAN > ------------------------------------------------------------------------------- > Index Only Scan using i_t_sn_c on t (actual time=0.015..0.015 rows=0 loops=1) > Index Cond: (sn_c = 'AB1234'::bpchar) > Heap Fetches: 0 > > > If I explicitly cast constant to `text`, then Postgres will add `(sn_c)::text` cast, which disables index: > > postgres=# EXPLAIN (analyze, costs off) SELECT sn_c FROM t WHERE sn_c = 'AB1234'::text; > QUERY PLAN > --------------------------------------------------------- > Seq Scan on t (actual time=5.729..5.729 rows=0 loops=1) > Filter: ((sn_c)::text = 'AB1234'::text) > Rows Removed by Filter: 10000 > I assume that this has to do with the fact that char(n) is blank padded to 20 character. To be able to correctly compare that to a text value, sn_c has to be casted to text and then the index (which contains blankpadded values) can not be used any more. Another very good example why the dreaded char() should not be used ;) If you use varchar(20) instead of char(20) both queries yield the same execution plan (at least on my local 9.5) Thomas
2016-02-22 18:00 GMT+02:00 Thomas Kellerer <spam_eater@gmx.net>:
I assume that this has to do with the fact that char(n) is blank padded to 20 character.
Yes, this was my thought too, but I do not understand what is going on in details.
To be able to correctly compare that to a text value, sn_c has to be casted to text and then the index (which contains blank padded values) can not be used any more.
Well, for `varchar` type Postgres is able to do `varchar` -> `bpchar` cast for my constant. I do not understand why for `text` it cannot and casts column instead.
Victor Y. Yegorov
Victor Yegorov <vyegorov@gmail.com> writes: > Well, for `varchar` type Postgres is able to do `varchar` -> `bpchar` cast > for my constant. I do not understand why for `text` it cannot and casts > column instead. In cross-type comparisons like these, the parser basically has a choice between whether to apply texteq or bpchareq. It's going to have to cast one side or the other either way. It prefers to cast to text, not away from text, because text is the preferred type in the string category. So "bpchar = text" is resolved as texteq. In the "bpchar = varchar" case, that doesn't happen because neither side is text to start with, so bpchareq is chosen on the grounds of requiring fewer casts. There's no varchareq operator (because varchar just relies on text's operators). If there were, you'd probably get an "ambiguous operator, please cast" error from "bpchar = varchar", because neither of those are preferred types so there would be no way to prefer one interpretation over the other. Similarly, if text weren't treated as a preferred type then "bpchar = text" would just fail, it would still not be resolved the way you want. See http://www.postgresql.org/docs/9.5/static/typeconv-oper.html for a more detailed explanation. regards, tom lane