Thread: Dubious usage of TYPCATEGORY_STRING

Dubious usage of TYPCATEGORY_STRING

From
Tom Lane
Date:
The parser's type-coercion heuristics include some special rules
for types belonging to the STRING category, which are predicated
on the assumption that such types are reasonably general-purpose
string types.  This assumption has been violated by a few types,
one error being ancient and the rest not so much:

1. The "char" type is labeled as STRING category, even though it's
(a) deprecated for general use and (b) unable to store more than
one byte, making "string" quite a misnomer.

2. Various types we invented to store special catalog data, such as
pg_node_tree and pg_ndistinct, are also labeled as STRING category.
This seems like a fairly bad idea too.

An example of the reasons not to treat these types as being
general-purpose strings can be seen at [1], where the "char"
type has acquired some never-intended cast behaviors.  Taking
that to an extreme, we currently accept

regression=# select '(1,2)'::point::"char";
 char
------
 (
(1 row)

My first thought about fixing point 1 was to put "char" into some
other typcategory, but that turns out to break some of psql's
catalog queries, with results like:

regression=# \dp
ERROR:  operator is not unique: unknown || "char"
LINE 16:            E' (' || polcmd || E'):'
                          ^
HINT:  Could not choose a best candidate operator. You might need to add explicit type casts.

I looked briefly at rejiggering the casting rules so that that
would still work, but it looks like a mess.  The problem is that
unknown || "char" can match either text || text or text || anynonarray,
and it's only the special preference for preferred types *of the
same typcategory as the input type* that allows us to prefer one
of those over the other.

Hence, what 0001 below does is to leave "char" in the string
category, but explicitly disable its access to the special
cast-via-I/O rules.  This is a hack for sure, but it won't have
any surprising side-effects on other types, which changing the
general operator-matching rules could.  The only thing it breaks
in check-world is that contrib/citext expects casting between
"char" and citext to work.  I think that's not a very reasonable
expectation so I just took out the relevant test cases.  (If anyone
is hot about it, we could add explicit support for such casts in
the citext module ... but it doesn't seem worth the trouble.)

As for point 2, I haven't found any negative side-effects of
taking the special types out of the string category, so 0002
attached invents a separate TYPCATEGORY_INTERNAL category to
put them in.

Thoughts?

            regards, tom lane

[1] https://www.postgresql.org/message-id/flat/CAOC8YUcXymCMpC5d%3D7JvcwyjXPTT00WeebOM3UqTBreOD1N9hw%40mail.gmail.com

diff --git a/contrib/citext/expected/citext.out b/contrib/citext/expected/citext.out
index ec99aaed5d..307d292d56 100644
--- a/contrib/citext/expected/citext.out
+++ b/contrib/citext/expected/citext.out
@@ -721,18 +721,6 @@ SELECT 'f'::citext::char = 'f'::char AS t;
  t
 (1 row)

-SELECT 'f'::"char"::citext = 'f' AS t;
- t
----
- t
-(1 row)
-
-SELECT 'f'::citext::"char" = 'f'::"char" AS t;
- t
----
- t
-(1 row)
-
 SELECT '100'::money::citext = '$100.00' AS t;
  t
 ---
@@ -1041,7 +1029,6 @@ CREATE TABLE caster (
     varchar     varchar,
     bpchar      bpchar,
     char        char,
-    chr         "char",
     name        name,
     bytea       bytea,
     boolean     boolean,
@@ -1087,10 +1074,6 @@ INSERT INTO caster (char)          VALUES ('f'::text);
 INSERT INTO caster (text)          VALUES ('f'::char);
 INSERT INTO caster (char)          VALUES ('f'::citext);
 INSERT INTO caster (citext)        VALUES ('f'::char);
-INSERT INTO caster (chr)           VALUES ('f'::text);
-INSERT INTO caster (text)          VALUES ('f'::"char");
-INSERT INTO caster (chr)           VALUES ('f'::citext);
-INSERT INTO caster (citext)        VALUES ('f'::"char");
 INSERT INTO caster (name)          VALUES ('foo'::text);
 INSERT INTO caster (text)          VALUES ('foo'::name);
 INSERT INTO caster (name)          VALUES ('foo'::citext);
diff --git a/contrib/citext/expected/citext_1.out b/contrib/citext/expected/citext_1.out
index 75fd08b7cc..9f423b7496 100644
--- a/contrib/citext/expected/citext_1.out
+++ b/contrib/citext/expected/citext_1.out
@@ -721,18 +721,6 @@ SELECT 'f'::citext::char = 'f'::char AS t;
  t
 (1 row)

-SELECT 'f'::"char"::citext = 'f' AS t;
- t
----
- t
-(1 row)
-
-SELECT 'f'::citext::"char" = 'f'::"char" AS t;
- t
----
- t
-(1 row)
-
 SELECT '100'::money::citext = '$100.00' AS t;
  t
 ---
@@ -1041,7 +1029,6 @@ CREATE TABLE caster (
     varchar     varchar,
     bpchar      bpchar,
     char        char,
-    chr         "char",
     name        name,
     bytea       bytea,
     boolean     boolean,
@@ -1087,10 +1074,6 @@ INSERT INTO caster (char)          VALUES ('f'::text);
 INSERT INTO caster (text)          VALUES ('f'::char);
 INSERT INTO caster (char)          VALUES ('f'::citext);
 INSERT INTO caster (citext)        VALUES ('f'::char);
-INSERT INTO caster (chr)           VALUES ('f'::text);
-INSERT INTO caster (text)          VALUES ('f'::"char");
-INSERT INTO caster (chr)           VALUES ('f'::citext);
-INSERT INTO caster (citext)        VALUES ('f'::"char");
 INSERT INTO caster (name)          VALUES ('foo'::text);
 INSERT INTO caster (text)          VALUES ('foo'::name);
 INSERT INTO caster (name)          VALUES ('foo'::citext);
diff --git a/contrib/citext/sql/citext.sql b/contrib/citext/sql/citext.sql
index 10232f5a9f..30ce6b807d 100644
--- a/contrib/citext/sql/citext.sql
+++ b/contrib/citext/sql/citext.sql
@@ -230,9 +230,6 @@ SELECT 'foo'::citext::name = 'foo'::name AS t;
 SELECT 'f'::char::citext = 'f' AS t;
 SELECT 'f'::citext::char = 'f'::char AS t;

-SELECT 'f'::"char"::citext = 'f' AS t;
-SELECT 'f'::citext::"char" = 'f'::"char" AS t;
-
 SELECT '100'::money::citext = '$100.00' AS t;
 SELECT '100'::citext::money = '100'::money AS t;

@@ -308,7 +305,6 @@ CREATE TABLE caster (
     varchar     varchar,
     bpchar      bpchar,
     char        char,
-    chr         "char",
     name        name,
     bytea       bytea,
     boolean     boolean,
@@ -359,11 +355,6 @@ INSERT INTO caster (text)          VALUES ('f'::char);
 INSERT INTO caster (char)          VALUES ('f'::citext);
 INSERT INTO caster (citext)        VALUES ('f'::char);

-INSERT INTO caster (chr)           VALUES ('f'::text);
-INSERT INTO caster (text)          VALUES ('f'::"char");
-INSERT INTO caster (chr)           VALUES ('f'::citext);
-INSERT INTO caster (citext)        VALUES ('f'::"char");
-
 INSERT INTO caster (name)          VALUES ('foo'::text);
 INSERT INTO caster (text)          VALUES ('foo'::name);
 INSERT INTO caster (name)          VALUES ('foo'::citext);
diff --git a/src/backend/parser/parse_coerce.c b/src/backend/parser/parse_coerce.c
index 78194afedf..4ece7278f4 100644
--- a/src/backend/parser/parse_coerce.c
+++ b/src/backend/parser/parse_coerce.c
@@ -3139,13 +3139,19 @@ find_coercion_pathway(Oid targetTypeId, Oid sourceTypeId,
          * probably be better to insist on explicit casts in both directions,
          * but this is a compromise to preserve something of the pre-8.3
          * behavior that many types had implicit (yipes!) casts to text.
+         *
+         * As a special case, don't treat type "char" as being a string type
+         * for this purpose.  In principle we should give it some other
+         * typcategory, but doing so breaks cases that we want to work.
          */
         if (result == COERCION_PATH_NONE)
         {
             if (ccontext >= COERCION_ASSIGNMENT &&
+                targetTypeId != CHAROID &&
                 TypeCategory(targetTypeId) == TYPCATEGORY_STRING)
                 result = COERCION_PATH_COERCEVIAIO;
             else if (ccontext >= COERCION_EXPLICIT &&
+                     sourceTypeId != CHAROID &&
                      TypeCategory(sourceTypeId) == TYPCATEGORY_STRING)
                 result = COERCION_PATH_COERCEVIAIO;
         }
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index c1d11be73f..216aa4510d 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -9305,6 +9305,10 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l
       <entry><literal>X</literal></entry>
       <entry><type>unknown</type> type</entry>
      </row>
+     <row>
+      <entry><literal>Z</literal></entry>
+      <entry>Internal-use types</entry>
+     </row>
     </tbody>
    </tgroup>
   </table>
diff --git a/src/include/catalog/pg_type.dat b/src/include/catalog/pg_type.dat
index 41074c994b..14d50376a1 100644
--- a/src/include/catalog/pg_type.dat
+++ b/src/include/catalog/pg_type.dat
@@ -145,24 +145,24 @@
   typsend => 'xml_send', typalign => 'i', typstorage => 'x' },
 { oid => '194', descr => 'string representing an internal node tree',
   typname => 'pg_node_tree', typlen => '-1', typbyval => 'f',
-  typcategory => 'S', typinput => 'pg_node_tree_in',
+  typcategory => 'Z', typinput => 'pg_node_tree_in',
   typoutput => 'pg_node_tree_out', typreceive => 'pg_node_tree_recv',
   typsend => 'pg_node_tree_send', typalign => 'i', typstorage => 'x',
   typcollation => 'default' },
 { oid => '3361', descr => 'multivariate ndistinct coefficients',
   typname => 'pg_ndistinct', typlen => '-1', typbyval => 'f',
-  typcategory => 'S', typinput => 'pg_ndistinct_in',
+  typcategory => 'Z', typinput => 'pg_ndistinct_in',
   typoutput => 'pg_ndistinct_out', typreceive => 'pg_ndistinct_recv',
   typsend => 'pg_ndistinct_send', typalign => 'i', typstorage => 'x',
   typcollation => 'default' },
 { oid => '3402', descr => 'multivariate dependencies',
   typname => 'pg_dependencies', typlen => '-1', typbyval => 'f',
-  typcategory => 'S', typinput => 'pg_dependencies_in',
+  typcategory => 'Z', typinput => 'pg_dependencies_in',
   typoutput => 'pg_dependencies_out', typreceive => 'pg_dependencies_recv',
   typsend => 'pg_dependencies_send', typalign => 'i', typstorage => 'x',
   typcollation => 'default' },
 { oid => '5017', descr => 'multivariate MCV list',
-  typname => 'pg_mcv_list', typlen => '-1', typbyval => 'f', typcategory => 'S',
+  typname => 'pg_mcv_list', typlen => '-1', typbyval => 'f', typcategory => 'Z',
   typinput => 'pg_mcv_list_in', typoutput => 'pg_mcv_list_out',
   typreceive => 'pg_mcv_list_recv', typsend => 'pg_mcv_list_send',
   typalign => 'i', typstorage => 'x', typcollation => 'default' },
@@ -681,13 +681,13 @@
   typalign => 'd', typstorage => 'x' },
 { oid => '4600', descr => 'BRIN bloom summary',
   typname => 'pg_brin_bloom_summary', typlen => '-1', typbyval => 'f',
-  typcategory => 'S', typinput => 'brin_bloom_summary_in',
+  typcategory => 'Z', typinput => 'brin_bloom_summary_in',
   typoutput => 'brin_bloom_summary_out',
   typreceive => 'brin_bloom_summary_recv', typsend => 'brin_bloom_summary_send',
   typalign => 'i', typstorage => 'x', typcollation => 'default' },
 { oid => '4601', descr => 'BRIN minmax-multi summary',
   typname => 'pg_brin_minmax_multi_summary', typlen => '-1', typbyval => 'f',
-  typcategory => 'S', typinput => 'brin_minmax_multi_summary_in',
+  typcategory => 'Z', typinput => 'brin_minmax_multi_summary_in',
   typoutput => 'brin_minmax_multi_summary_out',
   typreceive => 'brin_minmax_multi_summary_recv',
   typsend => 'brin_minmax_multi_summary_send', typalign => 'i',
diff --git a/src/include/catalog/pg_type.h b/src/include/catalog/pg_type.h
index e568e21dee..5e891a0596 100644
--- a/src/include/catalog/pg_type.h
+++ b/src/include/catalog/pg_type.h
@@ -294,6 +294,7 @@ DECLARE_UNIQUE_INDEX(pg_type_typname_nsp_index, 2704, TypeNameNspIndexId, on pg_
 #define  TYPCATEGORY_USER        'U'
 #define  TYPCATEGORY_BITSTRING    'V' /* er ... "varbit"? */
 #define  TYPCATEGORY_UNKNOWN    'X'
+#define  TYPCATEGORY_INTERNAL    'Z'

 #define  TYPALIGN_CHAR            'c' /* char alignment (i.e. unaligned) */
 #define  TYPALIGN_SHORT            's' /* short alignment (typically 2 bytes) */

types reliant on encodings [was Re: Dubious usage of TYPCATEGORY_STRING]

From
Chapman Flack
Date:
On 12/02/21 16:22, Tom Lane wrote:
> ... types belonging to the STRING category, which are predicated
> on the assumption that such types are reasonably general-purpose
> string types.

This prods me to submit a question I've been incubating for a while.

Is there any way to find out, from the catalogs or in any automatable way,
which types are implemented with a dependence on the database encoding
(or on some encoding)?

You might think S category types, for a start: name, text, character,
varchar, all dependent on the server encoding, as you'd expect. The ones
Tom moves here to category Z were most of the ones I wondered about.

Then there's "char". It's category S, but does not apply the server
encoding. You could call it an 8-bit int type, but it's typically used
as a character, making it well-defined for ASCII values and not so
for others, just like SQL_ASCII encoding. You could as well say that
the "char" type has a defined encoding of SQL_ASCII at all times,
regardless of the database encoding.

U types are a mixed bag. Category U includes bytea (no character encoding)
and xml/json/jsonb (server encoding). Also tied to the server encoding
are cstring and unknown.

As an aside, I think it's unfortunate that the xml type has this implicit
dependency on the server encoding, when XML is by definition Unicode.
It means there are valid XML documents that PostgreSQL may not be able
to store, and which documents those are depends on what the database
encoding is. I think json and jsonb suffer in the same way.

Changing that would be disruptive at this point and I'm not suggesting it,
but there might be value in the thought experiment to see what the
alternate universe would look like.

In the alternate world, you would know that certain datatypes were
inherently encoding-oblivious (numbers, polygons, times, ...), certain
others are bound to the server encoding (text, varchar, name, ...), and
still others are bound to a known encoding other than the server encoding:
the ISO SQL NCHAR type (bound to an alternate configurable database
encoding), "char" (always SQL_ASCII), xml/json/jsonb (always with the full
Unicode repertoire, however they choose to represent it internally).

That last parenthetical reminded me that I'm really talking
about 'repertoire' here, which ISO SQL treats as a separate topic from
'encoding'. Exactly how an xml or jsonb type is represented internally
might be none of my business (unless I am developing a binary-capable
driver), but it's fair to ask what its repertoire is, and whether that's
full Unicode or not, and if not, whether the repertoire changes when some
server setting does.

I also think in that ideal world, or even this one, you could want
some way to query the catalogs to answer that basic question
about some given type.

Am I right that we simply don't have that? I currently answer such questions
by querying the catalog for the type's _send or _recv function name, then
going off to read the code, but that's hard to automate.

Regards,
-Chap



Re: Dubious usage of TYPCATEGORY_STRING

From
Peter Eisentraut
Date:
On 02.12.21 22:22, Tom Lane wrote:
> My first thought about fixing point 1 was to put "char" into some
> other typcategory, but that turns out to break some of psql's
> catalog queries, with results like:
> 
> regression=# \dp
> ERROR:  operator is not unique: unknown || "char"
> LINE 16:            E' (' || polcmd || E'):'
>                            ^
> HINT:  Could not choose a best candidate operator. You might need to add explicit type casts.

Could we add explicit casts (like polcmd::text) here?  Or would it break 
too much?




Re: Dubious usage of TYPCATEGORY_STRING

From
Tom Lane
Date:
Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
> On 02.12.21 22:22, Tom Lane wrote:
>> My first thought about fixing point 1 was to put "char" into some
>> other typcategory, but that turns out to break some of psql's
>> catalog queries, with results like:
>>
>> regression=# \dp
>> ERROR:  operator is not unique: unknown || "char"
>> LINE 16:            E' (' || polcmd || E'):'
>> ^
>> HINT:  Could not choose a best candidate operator. You might need to add explicit type casts.

> Could we add explicit casts (like polcmd::text) here?  Or would it break
> too much?

I assumed it'd break too much to consider doing that.  But I suppose
that since a typcategory change would be initdb-forcing anyway, maybe
it's not out of the question.  I'll investigate and see exactly how
many places would need an explicit cast.

            regards, tom lane



Re: types reliant on encodings [was Re: Dubious usage of TYPCATEGORY_STRING]

From
Peter Eisentraut
Date:
On 03.12.21 19:42, Chapman Flack wrote:
> Is there any way to find out, from the catalogs or in any automatable way,
> which types are implemented with a dependence on the database encoding
> (or on some encoding)?

What is this needed for?  C code can internally do whatever it wants, 
and the database encoding is effectively a constant, so there is no need 
for server-side code to be very much concerned about whether types do this.

Also, "types" is perhaps the wrong subject here.  Types only contain 
input and output functions and a few more bits.  Additional functions 
operating on the type could look at the server encoding without the type 
and its core functions knowing about it.




Re: Dubious usage of TYPCATEGORY_STRING

From
Robert Haas
Date:
On Thu, Dec 2, 2021 at 4:22 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> An example of the reasons not to treat these types as being
> general-purpose strings can be seen at [1], where the "char"
> type has acquired some never-intended cast behaviors.  Taking
> that to an extreme, we currently accept
>
> regression=# select '(1,2)'::point::"char";
>  char
> ------
>  (
> (1 row)

What's wrong with that?

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Dubious usage of TYPCATEGORY_STRING

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Dec 2, 2021 at 4:22 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> An example of the reasons not to treat these types as being
>> general-purpose strings can be seen at [1], where the "char"
>> type has acquired some never-intended cast behaviors.  Taking
>> that to an extreme, we currently accept
>> 
>> regression=# select '(1,2)'::point::"char";
>> char
>> ------
>> (
>> (1 row)

> What's wrong with that?

Well, we don't allow things like

regression=# select '(1,2)'::point::float8;
ERROR:  cannot cast type point to double precision
LINE 1: select '(1,2)'::point::float8;
                             ^

It's not very clear to me why "char" should get a pass on that.
We allow such cases when the target is text/varchar/etc, but
the assumption is that the textual representation is sufficient
for your purposes.  It's hard to claim that just the first
byte is a useful textual representation.

Worse, PG is actually treating this as an assignment-level cast,
so we accept this:

regression=# create table t(f1 "char");
CREATE TABLE
regression=# insert into t values ('(1,2)'::point);
INSERT 0 1
regression=# table t;
 f1 
----
 (
(1 row)

I definitely don't think that should have worked without
any complaint.

            regards, tom lane



Re: Dubious usage of TYPCATEGORY_STRING

From
Robert Haas
Date:
On Tue, Dec 7, 2021 at 12:19 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > What's wrong with that?
>
> Well, we don't allow things like
>
> regression=# select '(1,2)'::point::float8;
> ERROR:  cannot cast type point to double precision
> LINE 1: select '(1,2)'::point::float8;
>                              ^
>
> It's not very clear to me why "char" should get a pass on that.
> We allow such cases when the target is text/varchar/etc, but
> the assumption is that the textual representation is sufficient
> for your purposes.  It's hard to claim that just the first
> byte is a useful textual representation.

Fair enough, I guess. I am pretty skeptical of the merits of refusing
an explicit cast. If I ran the zoo, I would probably choose to allow
all such casts and make them coerce via IO when no other pathway is
available. But I get that's not our policy.

> Worse, PG is actually treating this as an assignment-level cast,
> so we accept this:
>
> regression=# create table t(f1 "char");
> CREATE TABLE
> regression=# insert into t values ('(1,2)'::point);
> INSERT 0 1
> regression=# table t;
>  f1
> ----
>  (
> (1 row)
>
> I definitely don't think that should have worked without
> any complaint.

Yes, that one's a bridge too far, even for me.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Dubious usage of TYPCATEGORY_STRING

From
Tom Lane
Date:
I wrote:
> Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
>> Could we add explicit casts (like polcmd::text) here?  Or would it break 
>> too much?

> I assumed it'd break too much to consider doing that.  But I suppose
> that since a typcategory change would be initdb-forcing anyway, maybe
> it's not out of the question.  I'll investigate and see exactly how
> many places would need an explicit cast.

Um, I definitely gave up too easily there.  The one usage in \dp
seems to be the *only* thing that breaks in describe.c, and pg_dump
doesn't need any changes so far as check-world reveals.  So let's
just move "char" to another category, as attached.

            regards, tom lane

diff --git a/contrib/citext/expected/citext.out b/contrib/citext/expected/citext.out
index ec99aaed5d..3bac0534fb 100644
--- a/contrib/citext/expected/citext.out
+++ b/contrib/citext/expected/citext.out
@@ -1089,7 +1089,12 @@ INSERT INTO caster (char)          VALUES ('f'::citext);
 INSERT INTO caster (citext)        VALUES ('f'::char);
 INSERT INTO caster (chr)           VALUES ('f'::text);
 INSERT INTO caster (text)          VALUES ('f'::"char");
-INSERT INTO caster (chr)           VALUES ('f'::citext);
+INSERT INTO caster (chr)           VALUES ('f'::citext);  -- requires cast
+ERROR:  column "chr" is of type "char" but expression is of type citext
+LINE 1: INSERT INTO caster (chr)           VALUES ('f'::citext);
+                                                   ^
+HINT:  You will need to rewrite or cast the expression.
+INSERT INTO caster (chr)           VALUES ('f'::citext::text);
 INSERT INTO caster (citext)        VALUES ('f'::"char");
 INSERT INTO caster (name)          VALUES ('foo'::text);
 INSERT INTO caster (text)          VALUES ('foo'::name);
diff --git a/contrib/citext/expected/citext_1.out b/contrib/citext/expected/citext_1.out
index 75fd08b7cc..57fc863f7a 100644
--- a/contrib/citext/expected/citext_1.out
+++ b/contrib/citext/expected/citext_1.out
@@ -1089,7 +1089,12 @@ INSERT INTO caster (char)          VALUES ('f'::citext);
 INSERT INTO caster (citext)        VALUES ('f'::char);
 INSERT INTO caster (chr)           VALUES ('f'::text);
 INSERT INTO caster (text)          VALUES ('f'::"char");
-INSERT INTO caster (chr)           VALUES ('f'::citext);
+INSERT INTO caster (chr)           VALUES ('f'::citext);  -- requires cast
+ERROR:  column "chr" is of type "char" but expression is of type citext
+LINE 1: INSERT INTO caster (chr)           VALUES ('f'::citext);
+                                                   ^
+HINT:  You will need to rewrite or cast the expression.
+INSERT INTO caster (chr)           VALUES ('f'::citext::text);
 INSERT INTO caster (citext)        VALUES ('f'::"char");
 INSERT INTO caster (name)          VALUES ('foo'::text);
 INSERT INTO caster (text)          VALUES ('foo'::name);
diff --git a/contrib/citext/sql/citext.sql b/contrib/citext/sql/citext.sql
index 10232f5a9f..55fb1d11a6 100644
--- a/contrib/citext/sql/citext.sql
+++ b/contrib/citext/sql/citext.sql
@@ -361,7 +361,8 @@ INSERT INTO caster (citext)        VALUES ('f'::char);

 INSERT INTO caster (chr)           VALUES ('f'::text);
 INSERT INTO caster (text)          VALUES ('f'::"char");
-INSERT INTO caster (chr)           VALUES ('f'::citext);
+INSERT INTO caster (chr)           VALUES ('f'::citext);  -- requires cast
+INSERT INTO caster (chr)           VALUES ('f'::citext::text);
 INSERT INTO caster (citext)        VALUES ('f'::"char");

 INSERT INTO caster (name)          VALUES ('foo'::text);
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index c1d11be73f..216aa4510d 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -9305,6 +9305,10 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l
       <entry><literal>X</literal></entry>
       <entry><type>unknown</type> type</entry>
      </row>
+     <row>
+      <entry><literal>Z</literal></entry>
+      <entry>Internal-use types</entry>
+     </row>
     </tbody>
    </tgroup>
   </table>
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index ea721d963a..72d8547628 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -1142,7 +1142,7 @@ permissionsList(const char *pattern)
                           ",\n  pg_catalog.array_to_string(ARRAY(\n"
                           "    SELECT polname\n"
                           "    || CASE WHEN polcmd != '*' THEN\n"
-                          "           E' (' || polcmd || E'):'\n"
+                          "           E' (' || polcmd::pg_catalog.text || E'):'\n"
                           "       ELSE E':'\n"
                           "       END\n"
                           "    || CASE WHEN polqual IS NOT NULL THEN\n"
@@ -1176,7 +1176,7 @@ permissionsList(const char *pattern)
                           "       E' (RESTRICTIVE)'\n"
                           "       ELSE '' END\n"
                           "    || CASE WHEN polcmd != '*' THEN\n"
-                          "           E' (' || polcmd || E'):'\n"
+                          "           E' (' || polcmd::pg_catalog.text || E'):'\n"
                           "       ELSE E':'\n"
                           "       END\n"
                           "    || CASE WHEN polqual IS NOT NULL THEN\n"
diff --git a/src/include/catalog/pg_type.dat b/src/include/catalog/pg_type.dat
index 41074c994b..f3d94f3cf5 100644
--- a/src/include/catalog/pg_type.dat
+++ b/src/include/catalog/pg_type.dat
@@ -42,7 +42,7 @@
   typinput => 'byteain', typoutput => 'byteaout', typreceive => 'bytearecv',
   typsend => 'byteasend', typalign => 'i', typstorage => 'x' },
 { oid => '18', array_type_oid => '1002', descr => 'single character',
-  typname => 'char', typlen => '1', typbyval => 't', typcategory => 'S',
+  typname => 'char', typlen => '1', typbyval => 't', typcategory => 'Z',
   typinput => 'charin', typoutput => 'charout', typreceive => 'charrecv',
   typsend => 'charsend', typalign => 'c' },
 { oid => '19', array_type_oid => '1003',
@@ -145,24 +145,24 @@
   typsend => 'xml_send', typalign => 'i', typstorage => 'x' },
 { oid => '194', descr => 'string representing an internal node tree',
   typname => 'pg_node_tree', typlen => '-1', typbyval => 'f',
-  typcategory => 'S', typinput => 'pg_node_tree_in',
+  typcategory => 'Z', typinput => 'pg_node_tree_in',
   typoutput => 'pg_node_tree_out', typreceive => 'pg_node_tree_recv',
   typsend => 'pg_node_tree_send', typalign => 'i', typstorage => 'x',
   typcollation => 'default' },
 { oid => '3361', descr => 'multivariate ndistinct coefficients',
   typname => 'pg_ndistinct', typlen => '-1', typbyval => 'f',
-  typcategory => 'S', typinput => 'pg_ndistinct_in',
+  typcategory => 'Z', typinput => 'pg_ndistinct_in',
   typoutput => 'pg_ndistinct_out', typreceive => 'pg_ndistinct_recv',
   typsend => 'pg_ndistinct_send', typalign => 'i', typstorage => 'x',
   typcollation => 'default' },
 { oid => '3402', descr => 'multivariate dependencies',
   typname => 'pg_dependencies', typlen => '-1', typbyval => 'f',
-  typcategory => 'S', typinput => 'pg_dependencies_in',
+  typcategory => 'Z', typinput => 'pg_dependencies_in',
   typoutput => 'pg_dependencies_out', typreceive => 'pg_dependencies_recv',
   typsend => 'pg_dependencies_send', typalign => 'i', typstorage => 'x',
   typcollation => 'default' },
 { oid => '5017', descr => 'multivariate MCV list',
-  typname => 'pg_mcv_list', typlen => '-1', typbyval => 'f', typcategory => 'S',
+  typname => 'pg_mcv_list', typlen => '-1', typbyval => 'f', typcategory => 'Z',
   typinput => 'pg_mcv_list_in', typoutput => 'pg_mcv_list_out',
   typreceive => 'pg_mcv_list_recv', typsend => 'pg_mcv_list_send',
   typalign => 'i', typstorage => 'x', typcollation => 'default' },
@@ -681,13 +681,13 @@
   typalign => 'd', typstorage => 'x' },
 { oid => '4600', descr => 'BRIN bloom summary',
   typname => 'pg_brin_bloom_summary', typlen => '-1', typbyval => 'f',
-  typcategory => 'S', typinput => 'brin_bloom_summary_in',
+  typcategory => 'Z', typinput => 'brin_bloom_summary_in',
   typoutput => 'brin_bloom_summary_out',
   typreceive => 'brin_bloom_summary_recv', typsend => 'brin_bloom_summary_send',
   typalign => 'i', typstorage => 'x', typcollation => 'default' },
 { oid => '4601', descr => 'BRIN minmax-multi summary',
   typname => 'pg_brin_minmax_multi_summary', typlen => '-1', typbyval => 'f',
-  typcategory => 'S', typinput => 'brin_minmax_multi_summary_in',
+  typcategory => 'Z', typinput => 'brin_minmax_multi_summary_in',
   typoutput => 'brin_minmax_multi_summary_out',
   typreceive => 'brin_minmax_multi_summary_recv',
   typsend => 'brin_minmax_multi_summary_send', typalign => 'i',
diff --git a/src/include/catalog/pg_type.h b/src/include/catalog/pg_type.h
index e568e21dee..5e891a0596 100644
--- a/src/include/catalog/pg_type.h
+++ b/src/include/catalog/pg_type.h
@@ -294,6 +294,7 @@ DECLARE_UNIQUE_INDEX(pg_type_typname_nsp_index, 2704, TypeNameNspIndexId, on pg_
 #define  TYPCATEGORY_USER        'U'
 #define  TYPCATEGORY_BITSTRING    'V' /* er ... "varbit"? */
 #define  TYPCATEGORY_UNKNOWN    'X'
+#define  TYPCATEGORY_INTERNAL    'Z'

 #define  TYPALIGN_CHAR            'c' /* char alignment (i.e. unaligned) */
 #define  TYPALIGN_SHORT            's' /* short alignment (typically 2 bytes) */

Re: Dubious usage of TYPCATEGORY_STRING

From
Peter Eisentraut
Date:
On 07.12.21 21:24, Tom Lane wrote:
> I wrote:
>> Peter Eisentraut<peter.eisentraut@enterprisedb.com>  writes:
>>> Could we add explicit casts (like polcmd::text) here?  Or would it break
>>> too much?
>> I assumed it'd break too much to consider doing that.  But I suppose
>> that since a typcategory change would be initdb-forcing anyway, maybe
>> it's not out of the question.  I'll investigate and see exactly how
>> many places would need an explicit cast.
> Um, I definitely gave up too easily there.  The one usage in \dp
> seems to be the*only*  thing that breaks in describe.c, and pg_dump
> doesn't need any changes so far as check-world reveals.  So let's
> just move "char" to another category, as attached.

Looks good to me.



Re: Dubious usage of TYPCATEGORY_STRING

From
Tom Lane
Date:
Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
> On 07.12.21 21:24, Tom Lane wrote:
>> Um, I definitely gave up too easily there.  The one usage in \dp
>> seems to be the*only*  thing that breaks in describe.c, and pg_dump
>> doesn't need any changes so far as check-world reveals.  So let's
>> just move "char" to another category, as attached.

> Looks good to me.

Pushed, thanks for reviewing.

            regards, tom lane



TYPCATEGORY_{NETWORK,USER} [was Dubious usage of TYPCATEGORY_STRING]

From
Chapman Flack
Date:
On 12/02/21 16:22, Tom Lane wrote:
> taking the special types out of the string category, so 0002
> attached invents a separate TYPCATEGORY_INTERNAL category to
> put them in.

On the same general topic, was there a deliberate choice to put
inet and cidr in TYPCATEGORY_NETWORK but macaddr and macaddr8
in TYPCATEGORY_USER?

It looks like macaddr was put in category U (macaddr8 didn't exist yet)
in bac3e83, the same commit that put inet and cidr into category I,
apparently in order to "hew exactly to the behavior of the previous
hardwired logic", on the principle that "any adjustment of the standard
set of categories should be done separately".

The birth of macaddr looks to have been back in 1998 in 2d69fd9, the
same commit that added 'ipaddr'. Neither was added at that time to
the hardcoded switch in TypeCategory(). The plot thickens....

ipaddr became inet in 8849655 (8 Oct 1998). cidr was added in 858a3b5
(21 Oct 1998).

Then ca2995 added NETWORK_TYPE to TypeCategory and put inet and cidr
in it (22 Oct 1998). Looks like that was done to reduce duplication
of pg_proc entries between inet and cidr by allowing implicit coercion.

And I guess you wouldn't want to suggest the existence of coercions
between MAC addresses and inet addresses.

But there aren't any such casts present in pg_cast anyway, so is that
a persuasive present-day rationale for the (otherwise odd-seeming) split
of these types across categories? They are grouped in a single
documentation "category".

Regards,
-Chap



Re: TYPCATEGORY_{NETWORK,USER} [was Dubious usage of TYPCATEGORY_STRING]

From
Tom Lane
Date:
Chapman Flack <chap@anastigmatix.net> writes:
> On the same general topic, was there a deliberate choice to put
> inet and cidr in TYPCATEGORY_NETWORK but macaddr and macaddr8
> in TYPCATEGORY_USER?

Hard to say how "deliberate" it was, at this remove of time.

I do see an argument against reclassifying macaddr[8] into
TYPCATEGORY_NETWORK now: we generally expect that if a
category has a preferred type, any member type of the category
can be cast to that preferred type.  (The fact that OID is
marked preferred breaks that rule, but it holds pretty well
otherwise.)  I think this is why type interval has its own
category rather than being within TYPCATEGORY_DATETIME.

            regards, tom lane



Re: TYPCATEGORY_{NETWORK,USER} [was Dubious usage of TYPCATEGORY_STRING]

From
Chapman Flack
Date:
On 01/03/22 13:55, Tom Lane wrote:
> I do see an argument against reclassifying macaddr[8] into
> TYPCATEGORY_NETWORK now: we generally expect that if a
> category has a preferred type, any member type of the category
> can be cast to that preferred type.

I was wondering about the details of how that information gets used.
It seems partly redundant with what you learn from pg_cast. The
CREATE TYPE documentation says:

  The category and preferred parameters can be used to help control
  which implicit cast will be applied in ambiguous situations. ...
  For types that have no implicit casts to or from any other types,
  it is sufficient to leave these settings at the defaults. However,
  for a group of related types that have implicit casts, it is often
  helpful ...

which would suggest (to me on a first reading, anyway) that one starts
in pg_cast to find out what implicit casts, if any, exist, and then
looks to category and preferred if needed to resolve any ambiguity
that remains.

If understood that way, it doesn't seem to imply any ill effect of
having types within a category that might be partitioned into a few
disjoint subsets by "implicit cast exists between". (Such subsets
might be regarded as autodiscovered mini-categories.) But I could be
off-base to understand it that way.

Are there spots in the code where the expectation "if a category has
a preferred type, any member type of the category can be cast to that
preferred type" really takes that stronger form?

Hmm, I guess I can see some spots in Chapter 10, in the rules for
finding best-match operators or functions, or resolving UNION/CASE
types.

The UNION/CASE rules look like the effect might be benign: you have
step 4, inputs not of the same category => fail, then step 5, where
discovery of a preferred type can foreclose consideration of other
inputs, then step 6, implicit cast doesn't exist => fail. At first
blush, maybe that only fails the same cases that (if you treated
implicit-cast-related subsets within a category as mini-categories)
you would have failed in step 4.

The operator and function resolution rules seem harder to reason about,
and yeah, I haven't convinced myself their "any candidate accepts a
preferred type => discard candidates accepting non-preferred types" rules
couldn't end up discarding the part of the solution space where the
solution is, if disjoint "mini-categories" exist.

Regards,
-Chap