Dubious usage of TYPCATEGORY_STRING - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Dubious usage of TYPCATEGORY_STRING |
Date | |
Msg-id | 2216388.1638480141@sss.pgh.pa.us Whole thread Raw |
Responses |
types reliant on encodings [was Re: Dubious usage of TYPCATEGORY_STRING]
Re: Dubious usage of TYPCATEGORY_STRING Re: Dubious usage of TYPCATEGORY_STRING TYPCATEGORY_{NETWORK,USER} [was Dubious usage of TYPCATEGORY_STRING] |
List | pgsql-hackers |
The parser's type-coercion heuristics include some special rules for types belonging to the STRING category, which are predicated on the assumption that such types are reasonably general-purpose string types. This assumption has been violated by a few types, one error being ancient and the rest not so much: 1. The "char" type is labeled as STRING category, even though it's (a) deprecated for general use and (b) unable to store more than one byte, making "string" quite a misnomer. 2. Various types we invented to store special catalog data, such as pg_node_tree and pg_ndistinct, are also labeled as STRING category. This seems like a fairly bad idea too. An example of the reasons not to treat these types as being general-purpose strings can be seen at [1], where the "char" type has acquired some never-intended cast behaviors. Taking that to an extreme, we currently accept regression=# select '(1,2)'::point::"char"; char ------ ( (1 row) My first thought about fixing point 1 was to put "char" into some other typcategory, but that turns out to break some of psql's catalog queries, with results like: regression=# \dp ERROR: operator is not unique: unknown || "char" LINE 16: E' (' || polcmd || E'):' ^ HINT: Could not choose a best candidate operator. You might need to add explicit type casts. I looked briefly at rejiggering the casting rules so that that would still work, but it looks like a mess. The problem is that unknown || "char" can match either text || text or text || anynonarray, and it's only the special preference for preferred types *of the same typcategory as the input type* that allows us to prefer one of those over the other. Hence, what 0001 below does is to leave "char" in the string category, but explicitly disable its access to the special cast-via-I/O rules. This is a hack for sure, but it won't have any surprising side-effects on other types, which changing the general operator-matching rules could. The only thing it breaks in check-world is that contrib/citext expects casting between "char" and citext to work. I think that's not a very reasonable expectation so I just took out the relevant test cases. (If anyone is hot about it, we could add explicit support for such casts in the citext module ... but it doesn't seem worth the trouble.) As for point 2, I haven't found any negative side-effects of taking the special types out of the string category, so 0002 attached invents a separate TYPCATEGORY_INTERNAL category to put them in. Thoughts? regards, tom lane [1] https://www.postgresql.org/message-id/flat/CAOC8YUcXymCMpC5d%3D7JvcwyjXPTT00WeebOM3UqTBreOD1N9hw%40mail.gmail.com diff --git a/contrib/citext/expected/citext.out b/contrib/citext/expected/citext.out index ec99aaed5d..307d292d56 100644 --- a/contrib/citext/expected/citext.out +++ b/contrib/citext/expected/citext.out @@ -721,18 +721,6 @@ SELECT 'f'::citext::char = 'f'::char AS t; t (1 row) -SELECT 'f'::"char"::citext = 'f' AS t; - t ---- - t -(1 row) - -SELECT 'f'::citext::"char" = 'f'::"char" AS t; - t ---- - t -(1 row) - SELECT '100'::money::citext = '$100.00' AS t; t --- @@ -1041,7 +1029,6 @@ CREATE TABLE caster ( varchar varchar, bpchar bpchar, char char, - chr "char", name name, bytea bytea, boolean boolean, @@ -1087,10 +1074,6 @@ INSERT INTO caster (char) VALUES ('f'::text); INSERT INTO caster (text) VALUES ('f'::char); INSERT INTO caster (char) VALUES ('f'::citext); INSERT INTO caster (citext) VALUES ('f'::char); -INSERT INTO caster (chr) VALUES ('f'::text); -INSERT INTO caster (text) VALUES ('f'::"char"); -INSERT INTO caster (chr) VALUES ('f'::citext); -INSERT INTO caster (citext) VALUES ('f'::"char"); INSERT INTO caster (name) VALUES ('foo'::text); INSERT INTO caster (text) VALUES ('foo'::name); INSERT INTO caster (name) VALUES ('foo'::citext); diff --git a/contrib/citext/expected/citext_1.out b/contrib/citext/expected/citext_1.out index 75fd08b7cc..9f423b7496 100644 --- a/contrib/citext/expected/citext_1.out +++ b/contrib/citext/expected/citext_1.out @@ -721,18 +721,6 @@ SELECT 'f'::citext::char = 'f'::char AS t; t (1 row) -SELECT 'f'::"char"::citext = 'f' AS t; - t ---- - t -(1 row) - -SELECT 'f'::citext::"char" = 'f'::"char" AS t; - t ---- - t -(1 row) - SELECT '100'::money::citext = '$100.00' AS t; t --- @@ -1041,7 +1029,6 @@ CREATE TABLE caster ( varchar varchar, bpchar bpchar, char char, - chr "char", name name, bytea bytea, boolean boolean, @@ -1087,10 +1074,6 @@ INSERT INTO caster (char) VALUES ('f'::text); INSERT INTO caster (text) VALUES ('f'::char); INSERT INTO caster (char) VALUES ('f'::citext); INSERT INTO caster (citext) VALUES ('f'::char); -INSERT INTO caster (chr) VALUES ('f'::text); -INSERT INTO caster (text) VALUES ('f'::"char"); -INSERT INTO caster (chr) VALUES ('f'::citext); -INSERT INTO caster (citext) VALUES ('f'::"char"); INSERT INTO caster (name) VALUES ('foo'::text); INSERT INTO caster (text) VALUES ('foo'::name); INSERT INTO caster (name) VALUES ('foo'::citext); diff --git a/contrib/citext/sql/citext.sql b/contrib/citext/sql/citext.sql index 10232f5a9f..30ce6b807d 100644 --- a/contrib/citext/sql/citext.sql +++ b/contrib/citext/sql/citext.sql @@ -230,9 +230,6 @@ SELECT 'foo'::citext::name = 'foo'::name AS t; SELECT 'f'::char::citext = 'f' AS t; SELECT 'f'::citext::char = 'f'::char AS t; -SELECT 'f'::"char"::citext = 'f' AS t; -SELECT 'f'::citext::"char" = 'f'::"char" AS t; - SELECT '100'::money::citext = '$100.00' AS t; SELECT '100'::citext::money = '100'::money AS t; @@ -308,7 +305,6 @@ CREATE TABLE caster ( varchar varchar, bpchar bpchar, char char, - chr "char", name name, bytea bytea, boolean boolean, @@ -359,11 +355,6 @@ INSERT INTO caster (text) VALUES ('f'::char); INSERT INTO caster (char) VALUES ('f'::citext); INSERT INTO caster (citext) VALUES ('f'::char); -INSERT INTO caster (chr) VALUES ('f'::text); -INSERT INTO caster (text) VALUES ('f'::"char"); -INSERT INTO caster (chr) VALUES ('f'::citext); -INSERT INTO caster (citext) VALUES ('f'::"char"); - INSERT INTO caster (name) VALUES ('foo'::text); INSERT INTO caster (text) VALUES ('foo'::name); INSERT INTO caster (name) VALUES ('foo'::citext); diff --git a/src/backend/parser/parse_coerce.c b/src/backend/parser/parse_coerce.c index 78194afedf..4ece7278f4 100644 --- a/src/backend/parser/parse_coerce.c +++ b/src/backend/parser/parse_coerce.c @@ -3139,13 +3139,19 @@ find_coercion_pathway(Oid targetTypeId, Oid sourceTypeId, * probably be better to insist on explicit casts in both directions, * but this is a compromise to preserve something of the pre-8.3 * behavior that many types had implicit (yipes!) casts to text. + * + * As a special case, don't treat type "char" as being a string type + * for this purpose. In principle we should give it some other + * typcategory, but doing so breaks cases that we want to work. */ if (result == COERCION_PATH_NONE) { if (ccontext >= COERCION_ASSIGNMENT && + targetTypeId != CHAROID && TypeCategory(targetTypeId) == TYPCATEGORY_STRING) result = COERCION_PATH_COERCEVIAIO; else if (ccontext >= COERCION_EXPLICIT && + sourceTypeId != CHAROID && TypeCategory(sourceTypeId) == TYPCATEGORY_STRING) result = COERCION_PATH_COERCEVIAIO; } diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml index c1d11be73f..216aa4510d 100644 --- a/doc/src/sgml/catalogs.sgml +++ b/doc/src/sgml/catalogs.sgml @@ -9305,6 +9305,10 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l <entry><literal>X</literal></entry> <entry><type>unknown</type> type</entry> </row> + <row> + <entry><literal>Z</literal></entry> + <entry>Internal-use types</entry> + </row> </tbody> </tgroup> </table> diff --git a/src/include/catalog/pg_type.dat b/src/include/catalog/pg_type.dat index 41074c994b..14d50376a1 100644 --- a/src/include/catalog/pg_type.dat +++ b/src/include/catalog/pg_type.dat @@ -145,24 +145,24 @@ typsend => 'xml_send', typalign => 'i', typstorage => 'x' }, { oid => '194', descr => 'string representing an internal node tree', typname => 'pg_node_tree', typlen => '-1', typbyval => 'f', - typcategory => 'S', typinput => 'pg_node_tree_in', + typcategory => 'Z', typinput => 'pg_node_tree_in', typoutput => 'pg_node_tree_out', typreceive => 'pg_node_tree_recv', typsend => 'pg_node_tree_send', typalign => 'i', typstorage => 'x', typcollation => 'default' }, { oid => '3361', descr => 'multivariate ndistinct coefficients', typname => 'pg_ndistinct', typlen => '-1', typbyval => 'f', - typcategory => 'S', typinput => 'pg_ndistinct_in', + typcategory => 'Z', typinput => 'pg_ndistinct_in', typoutput => 'pg_ndistinct_out', typreceive => 'pg_ndistinct_recv', typsend => 'pg_ndistinct_send', typalign => 'i', typstorage => 'x', typcollation => 'default' }, { oid => '3402', descr => 'multivariate dependencies', typname => 'pg_dependencies', typlen => '-1', typbyval => 'f', - typcategory => 'S', typinput => 'pg_dependencies_in', + typcategory => 'Z', typinput => 'pg_dependencies_in', typoutput => 'pg_dependencies_out', typreceive => 'pg_dependencies_recv', typsend => 'pg_dependencies_send', typalign => 'i', typstorage => 'x', typcollation => 'default' }, { oid => '5017', descr => 'multivariate MCV list', - typname => 'pg_mcv_list', typlen => '-1', typbyval => 'f', typcategory => 'S', + typname => 'pg_mcv_list', typlen => '-1', typbyval => 'f', typcategory => 'Z', typinput => 'pg_mcv_list_in', typoutput => 'pg_mcv_list_out', typreceive => 'pg_mcv_list_recv', typsend => 'pg_mcv_list_send', typalign => 'i', typstorage => 'x', typcollation => 'default' }, @@ -681,13 +681,13 @@ typalign => 'd', typstorage => 'x' }, { oid => '4600', descr => 'BRIN bloom summary', typname => 'pg_brin_bloom_summary', typlen => '-1', typbyval => 'f', - typcategory => 'S', typinput => 'brin_bloom_summary_in', + typcategory => 'Z', typinput => 'brin_bloom_summary_in', typoutput => 'brin_bloom_summary_out', typreceive => 'brin_bloom_summary_recv', typsend => 'brin_bloom_summary_send', typalign => 'i', typstorage => 'x', typcollation => 'default' }, { oid => '4601', descr => 'BRIN minmax-multi summary', typname => 'pg_brin_minmax_multi_summary', typlen => '-1', typbyval => 'f', - typcategory => 'S', typinput => 'brin_minmax_multi_summary_in', + typcategory => 'Z', typinput => 'brin_minmax_multi_summary_in', typoutput => 'brin_minmax_multi_summary_out', typreceive => 'brin_minmax_multi_summary_recv', typsend => 'brin_minmax_multi_summary_send', typalign => 'i', diff --git a/src/include/catalog/pg_type.h b/src/include/catalog/pg_type.h index e568e21dee..5e891a0596 100644 --- a/src/include/catalog/pg_type.h +++ b/src/include/catalog/pg_type.h @@ -294,6 +294,7 @@ DECLARE_UNIQUE_INDEX(pg_type_typname_nsp_index, 2704, TypeNameNspIndexId, on pg_ #define TYPCATEGORY_USER 'U' #define TYPCATEGORY_BITSTRING 'V' /* er ... "varbit"? */ #define TYPCATEGORY_UNKNOWN 'X' +#define TYPCATEGORY_INTERNAL 'Z' #define TYPALIGN_CHAR 'c' /* char alignment (i.e. unaligned) */ #define TYPALIGN_SHORT 's' /* short alignment (typically 2 bytes) */
pgsql-hackers by date: