Dubious usage of TYPCATEGORY_STRING - Mailing list pgsql-hackers

The parser's type-coercion heuristics include some special rules
for types belonging to the STRING category, which are predicated
on the assumption that such types are reasonably general-purpose
string types.  This assumption has been violated by a few types,
one error being ancient and the rest not so much:

1. The "char" type is labeled as STRING category, even though it's
(a) deprecated for general use and (b) unable to store more than
one byte, making "string" quite a misnomer.

2. Various types we invented to store special catalog data, such as
pg_node_tree and pg_ndistinct, are also labeled as STRING category.
This seems like a fairly bad idea too.

An example of the reasons not to treat these types as being
general-purpose strings can be seen at [1], where the "char"
type has acquired some never-intended cast behaviors.  Taking
that to an extreme, we currently accept

regression=# select '(1,2)'::point::"char";
 char
------
 (
(1 row)

My first thought about fixing point 1 was to put "char" into some
other typcategory, but that turns out to break some of psql's
catalog queries, with results like:

regression=# \dp
ERROR:  operator is not unique: unknown || "char"
LINE 16:            E' (' || polcmd || E'):'
                          ^
HINT:  Could not choose a best candidate operator. You might need to add explicit type casts.

I looked briefly at rejiggering the casting rules so that that
would still work, but it looks like a mess.  The problem is that
unknown || "char" can match either text || text or text || anynonarray,
and it's only the special preference for preferred types *of the
same typcategory as the input type* that allows us to prefer one
of those over the other.

Hence, what 0001 below does is to leave "char" in the string
category, but explicitly disable its access to the special
cast-via-I/O rules.  This is a hack for sure, but it won't have
any surprising side-effects on other types, which changing the
general operator-matching rules could.  The only thing it breaks
in check-world is that contrib/citext expects casting between
"char" and citext to work.  I think that's not a very reasonable
expectation so I just took out the relevant test cases.  (If anyone
is hot about it, we could add explicit support for such casts in
the citext module ... but it doesn't seem worth the trouble.)

As for point 2, I haven't found any negative side-effects of
taking the special types out of the string category, so 0002
attached invents a separate TYPCATEGORY_INTERNAL category to
put them in.

Thoughts?

            regards, tom lane

[1] https://www.postgresql.org/message-id/flat/CAOC8YUcXymCMpC5d%3D7JvcwyjXPTT00WeebOM3UqTBreOD1N9hw%40mail.gmail.com

diff --git a/contrib/citext/expected/citext.out b/contrib/citext/expected/citext.out
index ec99aaed5d..307d292d56 100644
--- a/contrib/citext/expected/citext.out
+++ b/contrib/citext/expected/citext.out
@@ -721,18 +721,6 @@ SELECT 'f'::citext::char = 'f'::char AS t;
  t
 (1 row)

-SELECT 'f'::"char"::citext = 'f' AS t;
- t
----
- t
-(1 row)
-
-SELECT 'f'::citext::"char" = 'f'::"char" AS t;
- t
----
- t
-(1 row)
-
 SELECT '100'::money::citext = '$100.00' AS t;
  t
 ---
@@ -1041,7 +1029,6 @@ CREATE TABLE caster (
     varchar     varchar,
     bpchar      bpchar,
     char        char,
-    chr         "char",
     name        name,
     bytea       bytea,
     boolean     boolean,
@@ -1087,10 +1074,6 @@ INSERT INTO caster (char)          VALUES ('f'::text);
 INSERT INTO caster (text)          VALUES ('f'::char);
 INSERT INTO caster (char)          VALUES ('f'::citext);
 INSERT INTO caster (citext)        VALUES ('f'::char);
-INSERT INTO caster (chr)           VALUES ('f'::text);
-INSERT INTO caster (text)          VALUES ('f'::"char");
-INSERT INTO caster (chr)           VALUES ('f'::citext);
-INSERT INTO caster (citext)        VALUES ('f'::"char");
 INSERT INTO caster (name)          VALUES ('foo'::text);
 INSERT INTO caster (text)          VALUES ('foo'::name);
 INSERT INTO caster (name)          VALUES ('foo'::citext);
diff --git a/contrib/citext/expected/citext_1.out b/contrib/citext/expected/citext_1.out
index 75fd08b7cc..9f423b7496 100644
--- a/contrib/citext/expected/citext_1.out
+++ b/contrib/citext/expected/citext_1.out
@@ -721,18 +721,6 @@ SELECT 'f'::citext::char = 'f'::char AS t;
  t
 (1 row)

-SELECT 'f'::"char"::citext = 'f' AS t;
- t
----
- t
-(1 row)
-
-SELECT 'f'::citext::"char" = 'f'::"char" AS t;
- t
----
- t
-(1 row)
-
 SELECT '100'::money::citext = '$100.00' AS t;
  t
 ---
@@ -1041,7 +1029,6 @@ CREATE TABLE caster (
     varchar     varchar,
     bpchar      bpchar,
     char        char,
-    chr         "char",
     name        name,
     bytea       bytea,
     boolean     boolean,
@@ -1087,10 +1074,6 @@ INSERT INTO caster (char)          VALUES ('f'::text);
 INSERT INTO caster (text)          VALUES ('f'::char);
 INSERT INTO caster (char)          VALUES ('f'::citext);
 INSERT INTO caster (citext)        VALUES ('f'::char);
-INSERT INTO caster (chr)           VALUES ('f'::text);
-INSERT INTO caster (text)          VALUES ('f'::"char");
-INSERT INTO caster (chr)           VALUES ('f'::citext);
-INSERT INTO caster (citext)        VALUES ('f'::"char");
 INSERT INTO caster (name)          VALUES ('foo'::text);
 INSERT INTO caster (text)          VALUES ('foo'::name);
 INSERT INTO caster (name)          VALUES ('foo'::citext);
diff --git a/contrib/citext/sql/citext.sql b/contrib/citext/sql/citext.sql
index 10232f5a9f..30ce6b807d 100644
--- a/contrib/citext/sql/citext.sql
+++ b/contrib/citext/sql/citext.sql
@@ -230,9 +230,6 @@ SELECT 'foo'::citext::name = 'foo'::name AS t;
 SELECT 'f'::char::citext = 'f' AS t;
 SELECT 'f'::citext::char = 'f'::char AS t;

-SELECT 'f'::"char"::citext = 'f' AS t;
-SELECT 'f'::citext::"char" = 'f'::"char" AS t;
-
 SELECT '100'::money::citext = '$100.00' AS t;
 SELECT '100'::citext::money = '100'::money AS t;

@@ -308,7 +305,6 @@ CREATE TABLE caster (
     varchar     varchar,
     bpchar      bpchar,
     char        char,
-    chr         "char",
     name        name,
     bytea       bytea,
     boolean     boolean,
@@ -359,11 +355,6 @@ INSERT INTO caster (text)          VALUES ('f'::char);
 INSERT INTO caster (char)          VALUES ('f'::citext);
 INSERT INTO caster (citext)        VALUES ('f'::char);

-INSERT INTO caster (chr)           VALUES ('f'::text);
-INSERT INTO caster (text)          VALUES ('f'::"char");
-INSERT INTO caster (chr)           VALUES ('f'::citext);
-INSERT INTO caster (citext)        VALUES ('f'::"char");
-
 INSERT INTO caster (name)          VALUES ('foo'::text);
 INSERT INTO caster (text)          VALUES ('foo'::name);
 INSERT INTO caster (name)          VALUES ('foo'::citext);
diff --git a/src/backend/parser/parse_coerce.c b/src/backend/parser/parse_coerce.c
index 78194afedf..4ece7278f4 100644
--- a/src/backend/parser/parse_coerce.c
+++ b/src/backend/parser/parse_coerce.c
@@ -3139,13 +3139,19 @@ find_coercion_pathway(Oid targetTypeId, Oid sourceTypeId,
          * probably be better to insist on explicit casts in both directions,
          * but this is a compromise to preserve something of the pre-8.3
          * behavior that many types had implicit (yipes!) casts to text.
+         *
+         * As a special case, don't treat type "char" as being a string type
+         * for this purpose.  In principle we should give it some other
+         * typcategory, but doing so breaks cases that we want to work.
          */
         if (result == COERCION_PATH_NONE)
         {
             if (ccontext >= COERCION_ASSIGNMENT &&
+                targetTypeId != CHAROID &&
                 TypeCategory(targetTypeId) == TYPCATEGORY_STRING)
                 result = COERCION_PATH_COERCEVIAIO;
             else if (ccontext >= COERCION_EXPLICIT &&
+                     sourceTypeId != CHAROID &&
                      TypeCategory(sourceTypeId) == TYPCATEGORY_STRING)
                 result = COERCION_PATH_COERCEVIAIO;
         }
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index c1d11be73f..216aa4510d 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -9305,6 +9305,10 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l
       <entry><literal>X</literal></entry>
       <entry><type>unknown</type> type</entry>
      </row>
+     <row>
+      <entry><literal>Z</literal></entry>
+      <entry>Internal-use types</entry>
+     </row>
     </tbody>
    </tgroup>
   </table>
diff --git a/src/include/catalog/pg_type.dat b/src/include/catalog/pg_type.dat
index 41074c994b..14d50376a1 100644
--- a/src/include/catalog/pg_type.dat
+++ b/src/include/catalog/pg_type.dat
@@ -145,24 +145,24 @@
   typsend => 'xml_send', typalign => 'i', typstorage => 'x' },
 { oid => '194', descr => 'string representing an internal node tree',
   typname => 'pg_node_tree', typlen => '-1', typbyval => 'f',
-  typcategory => 'S', typinput => 'pg_node_tree_in',
+  typcategory => 'Z', typinput => 'pg_node_tree_in',
   typoutput => 'pg_node_tree_out', typreceive => 'pg_node_tree_recv',
   typsend => 'pg_node_tree_send', typalign => 'i', typstorage => 'x',
   typcollation => 'default' },
 { oid => '3361', descr => 'multivariate ndistinct coefficients',
   typname => 'pg_ndistinct', typlen => '-1', typbyval => 'f',
-  typcategory => 'S', typinput => 'pg_ndistinct_in',
+  typcategory => 'Z', typinput => 'pg_ndistinct_in',
   typoutput => 'pg_ndistinct_out', typreceive => 'pg_ndistinct_recv',
   typsend => 'pg_ndistinct_send', typalign => 'i', typstorage => 'x',
   typcollation => 'default' },
 { oid => '3402', descr => 'multivariate dependencies',
   typname => 'pg_dependencies', typlen => '-1', typbyval => 'f',
-  typcategory => 'S', typinput => 'pg_dependencies_in',
+  typcategory => 'Z', typinput => 'pg_dependencies_in',
   typoutput => 'pg_dependencies_out', typreceive => 'pg_dependencies_recv',
   typsend => 'pg_dependencies_send', typalign => 'i', typstorage => 'x',
   typcollation => 'default' },
 { oid => '5017', descr => 'multivariate MCV list',
-  typname => 'pg_mcv_list', typlen => '-1', typbyval => 'f', typcategory => 'S',
+  typname => 'pg_mcv_list', typlen => '-1', typbyval => 'f', typcategory => 'Z',
   typinput => 'pg_mcv_list_in', typoutput => 'pg_mcv_list_out',
   typreceive => 'pg_mcv_list_recv', typsend => 'pg_mcv_list_send',
   typalign => 'i', typstorage => 'x', typcollation => 'default' },
@@ -681,13 +681,13 @@
   typalign => 'd', typstorage => 'x' },
 { oid => '4600', descr => 'BRIN bloom summary',
   typname => 'pg_brin_bloom_summary', typlen => '-1', typbyval => 'f',
-  typcategory => 'S', typinput => 'brin_bloom_summary_in',
+  typcategory => 'Z', typinput => 'brin_bloom_summary_in',
   typoutput => 'brin_bloom_summary_out',
   typreceive => 'brin_bloom_summary_recv', typsend => 'brin_bloom_summary_send',
   typalign => 'i', typstorage => 'x', typcollation => 'default' },
 { oid => '4601', descr => 'BRIN minmax-multi summary',
   typname => 'pg_brin_minmax_multi_summary', typlen => '-1', typbyval => 'f',
-  typcategory => 'S', typinput => 'brin_minmax_multi_summary_in',
+  typcategory => 'Z', typinput => 'brin_minmax_multi_summary_in',
   typoutput => 'brin_minmax_multi_summary_out',
   typreceive => 'brin_minmax_multi_summary_recv',
   typsend => 'brin_minmax_multi_summary_send', typalign => 'i',
diff --git a/src/include/catalog/pg_type.h b/src/include/catalog/pg_type.h
index e568e21dee..5e891a0596 100644
--- a/src/include/catalog/pg_type.h
+++ b/src/include/catalog/pg_type.h
@@ -294,6 +294,7 @@ DECLARE_UNIQUE_INDEX(pg_type_typname_nsp_index, 2704, TypeNameNspIndexId, on pg_
 #define  TYPCATEGORY_USER        'U'
 #define  TYPCATEGORY_BITSTRING    'V' /* er ... "varbit"? */
 #define  TYPCATEGORY_UNKNOWN    'X'
+#define  TYPCATEGORY_INTERNAL    'Z'

 #define  TYPALIGN_CHAR            'c' /* char alignment (i.e. unaligned) */
 #define  TYPALIGN_SHORT            's' /* short alignment (typically 2 bytes) */

pgsql-hackers by date:

Previous
From: "Bossart, Nathan"
Date:
Subject: Re: O(n) tasks cause lengthy startups and checkpoints
Next
From: "Bossart, Nathan"
Date:
Subject: Re: O(n) tasks cause lengthy startups and checkpoints