Thread: Patch: Add parse_type Function

Patch: Add parse_type Function

From
"David E. Wheeler"
Date:
Hackers,

Attached is a patch to add a new function, `parse_type()`. It parses a type string and returns a record with the typid
andtypmod for the type, or raises an error if the type is invalid. It’s effectively a thin layer over the parser’s
parseTypeString()function. 

The purpose of this function is to allow uses to resolve any type name, including aliases, to its formal name as
renderedby, for example, pg_dump. An example: 

david=# WITH s(s) AS (
    SELECT * FROM unnest(ARRAY[
        'timestamp(4)',
        'interval(0)',
        'interval second(0)',
        'timestamptz',
        'timestamptz(6)',
        'varchar',
        'varchar(128)'
    ])
),
p(type, typid, typmod) AS (
    SELECT s, ((parse_type(s)).*)
      FROM s
)
SELECT type, format_type(typid, typmod) FROM p;
        type        |          format_type
--------------------+--------------------------------
 timestamp(4)       | timestamp(4) without time zone
 interval(0)        | interval(0)
 interval second(0) | interval second(0)
 timestamptz        | timestamp with time zone
 timestamptz(6)     | timestamp(6) with time zone
 varchar            | character varying
 varchar(128)       | character varying(128)
(7 rows)


The use case leading to this patch was pgTAP column data type assertions. pgTAP traditionally required full type names
fortesting data types, but recently added support for aliases. This broke for some types, because it was difficult to
extractthe typmod correctly, and even when we did, it failed for types such as `interval second(0)`, where `second (0)`
isthe typmod, and there was no sensible way to access the bit mask to generate the proper typmod to pass to
`format_type()`.

Erik Wienhold wrote this function to solve the problem, and I volunteered to submit a patch.

Potential issues and questions for this patch:

* Is `parse_type()` the right name to use? I can see arguments for `parse_type_string()`, `pg_parse_type()`, and
`pg_parse_type_string()`.Whatever the consensus is works for me. 

* The function returns a record, which is a little fussy. I could see adding a `format_type_string()` function that
justcontains `SELECT format_type(p.typmod, p.typid) FROM parse_type($1) p` and returns the formatted type. But I think
itmight also be useful to have access to the typmod and typid directly via this method, since they’re already exposed
as`format_type()`’s arguments. 

* Is format_type.c the right home for the function? I put it there because I closely associate it with format_type(),
buthappy to move it to a more appropriate location. 

* I tried to link to `format_type` from the docs entry for `parse_type`, and added an ID for the former, but I’m not
sureit resolves. 

Best,

David


Attachment

Re: Patch: Add parse_type Function

From
Pavel Stehule
Date:
Hi

ne 4. 2. 2024 v 18:51 odesílatel David E. Wheeler <david@justatheory.com> napsal:
Hackers,

Attached is a patch to add a new function, `parse_type()`. It parses a type string and returns a record with the typid and typmod for the type, or raises an error if the type is invalid. It’s effectively a thin layer over the parser’s parseTypeString() function.

The purpose of this function is to allow uses to resolve any type name, including aliases, to its formal name as rendered by, for example, pg_dump. An example:

david=# WITH s(s) AS (
    SELECT * FROM unnest(ARRAY[
        'timestamp(4)',
        'interval(0)',
        'interval second(0)',
        'timestamptz',
        'timestamptz(6)',
        'varchar',
        'varchar(128)'
    ])         
),
p(type, typid, typmod) AS (
    SELECT s, ((parse_type(s)).*)
      FROM s                     
)           
SELECT type, format_type(typid, typmod) FROM p;
        type        |          format_type           
--------------------+--------------------------------
 timestamp(4)       | timestamp(4) without time zone
 interval(0)        | interval(0)
 interval second(0) | interval second(0)
 timestamptz        | timestamp with time zone
 timestamptz(6)     | timestamp(6) with time zone
 varchar            | character varying
 varchar(128)       | character varying(128)
(7 rows)


The use case leading to this patch was pgTAP column data type assertions. pgTAP traditionally required full type names for testing data types, but recently added support for aliases. This broke for some types, because it was difficult to extract the typmod correctly, and even when we did, it failed for types such as `interval second(0)`, where `second (0)` is the typmod, and there was no sensible way to access the bit mask to generate the proper typmod to pass to `format_type()`.

Erik Wienhold wrote this function to solve the problem, and I volunteered to submit a patch.

Potential issues and questions for this patch:

* Is `parse_type()` the right name to use? I can see arguments for `parse_type_string()`, `pg_parse_type()`, and `pg_parse_type_string()`. Whatever the consensus is works for me.

there can be an analogy with parse_ident, so parse_type looks ok


* The function returns a record, which is a little fussy. I could see adding a `format_type_string()` function that just contains `SELECT format_type(p.typmod, p.typid) FROM parse_type($1) p` and returns the formatted type. But I think it might also be useful to have access to the typmod and typid directly via this method, since they’re already exposed as `format_type()`’s arguments.

I think so record has sense for this case
 

* Is format_type.c the right home for the function? I put it there because I closely associate it with format_type(), but happy to move it to a more appropriate location.

yes
 

* I tried to link to `format_type` from the docs entry for `parse_type`, and added an ID for the former, but I’m not sure it resolves.

I thinks so proposed functionality can be useful

Regards

Pavel
 

Best,

David

Re: Patch: Add parse_type Function

From
"David E. Wheeler"
Date:
On Feb 4, 2024, at 13:02, Pavel Stehule <pavel.stehule@gmail.com> wrote:

> I thinks so proposed functionality can be useful 

Great, thank you!

Is that a review? :-)

D





Re: Patch: Add parse_type Function

From
Pavel Stehule
Date:


ne 4. 2. 2024 v 19:30 odesílatel David E. Wheeler <david@justatheory.com> napsal:
On Feb 4, 2024, at 13:02, Pavel Stehule <pavel.stehule@gmail.com> wrote:

> I thinks so proposed functionality can be useful

Great, thank you!

Is that a review? :-)

not yet, but I'll do it

Pavel


 

D


Re: Patch: Add parse_type Function

From
"David E. Wheeler"
Date:
On Feb 4, 2024, at 13:52, Pavel Stehule <pavel.stehule@gmail.com> wrote:

> not yet, but I'll do it

Nice, thank you. I put it into the Commitfest.

  https://commitfest.postgresql.org/47/4807/

Best,

David




Re: Patch: Add parse_type Function

From
Jim Jones
Date:
Hi David,
Thanks for the patch.

On 04.02.24 18:51, David E. Wheeler wrote:
> Hackers,
>
> Attached is a patch to add a new function, `parse_type()`. It parses a type string and returns a record with the
typidand typmod for the type, or raises an error if the type is invalid. It’s effectively a thin layer over the
parser’sparseTypeString() function.
 
>
> The purpose of this function is to allow uses to resolve any type name, including aliases, to its formal name as
renderedby, for example, pg_dump. An example:
 
>
> david=# WITH s(s) AS (
>     SELECT * FROM unnest(ARRAY[
>         'timestamp(4)',
>         'interval(0)',
>         'interval second(0)',
>         'timestamptz',
>         'timestamptz(6)',
>         'varchar',
>         'varchar(128)' 
>     ])         
> ),
> p(type, typid, typmod) AS (
>     SELECT s, ((parse_type(s)).*)
>       FROM s                     
> )           
> SELECT type, format_type(typid, typmod) FROM p;
>         type        |          format_type           
> --------------------+--------------------------------
>  timestamp(4)       | timestamp(4) without time zone
>  interval(0)        | interval(0)
>  interval second(0) | interval second(0)
>  timestamptz        | timestamp with time zone
>  timestamptz(6)     | timestamp(6) with time zone
>  varchar            | character varying
>  varchar(128)       | character varying(128)
> (7 rows)
>
>
> The use case leading to this patch was pgTAP column data type assertions. pgTAP traditionally required full type
namesfor testing data types, but recently added support for aliases. This broke for some types, because it was
difficultto extract the typmod correctly, and even when we did, it failed for types such as `interval second(0)`, where
`second(0)` is the typmod, and there was no sensible way to access the bit mask to generate the proper typmod to pass
to`format_type()`.
 
>
> Erik Wienhold wrote this function to solve the problem, and I volunteered to submit a patch.
>
> Potential issues and questions for this patch:
>
> * Is `parse_type()` the right name to use? I can see arguments for `parse_type_string()`, `pg_parse_type()`, and
`pg_parse_type_string()`.Whatever the consensus is works for me.
 
>
> * The function returns a record, which is a little fussy. I could see adding a `format_type_string()` function that
justcontains `SELECT format_type(p.typmod, p.typid) FROM parse_type($1) p` and returns the formatted type. But I think
itmight also be useful to have access to the typmod and typid directly via this method, since they’re already exposed
as`format_type()`’s arguments.
 
>
> * Is format_type.c the right home for the function? I put it there because I closely associate it with format_type(),
buthappy to move it to a more appropriate location.
 
>
> * I tried to link to `format_type` from the docs entry for `parse_type`, and added an ID for the former, but I’m not
sureit resolves.
 
>
> Best,
>
> David
>
The patch applies cleanly

Checking patch doc/src/sgml/func.sgml...
Checking patch src/backend/utils/adt/format_type.c...
Checking patch src/include/catalog/pg_proc.dat...
Checking patch src/include/utils/builtins.h...
Checking patch src/test/regress/expected/create_type.out...
Checking patch src/test/regress/sql/create_type.sql...
Applied patch doc/src/sgml/func.sgml cleanly.
Applied patch src/backend/utils/adt/format_type.c cleanly.
Applied patch src/include/catalog/pg_proc.dat cleanly.
Applied patch src/include/utils/builtins.h cleanly.
Applied patch src/test/regress/expected/create_type.out cleanly.
Applied patch src/test/regress/sql/create_type.sql cleanly.

The docs render just fine and all tests pass (check and check-world).

There are a few issues though:

1) The function causes a crash when called with a NULL parameter:

SELECT * FROM parse_type(NULL);
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
The connection to the server was lost. Attempting reset: Failed.

You have to check it in the beginning of function and either return an
error message or just NULL, e.g

    if (PG_ARGISNULL(0))
        PG_RETURN_NULL();

2) Add a function call with NULL in the regression tests

SELECT * FROM parse_type(NULL);

3) Add the expected behaviour of calling the function with NULL in the
docs (error message or null)

4) The name of the returned columns is repeated in the docs
> Returns a record with two fields, <parameter>typid</parameter> and
<parameter>typid</parameter>

-- 
Jim




Re: Patch: Add parse_type Function

From
"David E. Wheeler"
Date:
On Feb 5, 2024, at 08:02, Jim Jones <jim.jones@uni-muenster.de> wrote:

> There are a few issues though:

Excellent, thank you very much! Updated patch attached.

Best,

David



Attachment

Re: Patch: Add parse_type Function

From
Dagfinn Ilmari Mannsåker
Date:
Jim Jones <jim.jones@uni-muenster.de> writes:

> 1) The function causes a crash when called with a NULL parameter:
>
> SELECT * FROM parse_type(NULL);
> server closed the connection unexpectedly
>         This probably means the server terminated abnormally
>         before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
> The connection to the server was lost. Attempting reset: Failed.
>
> You have to check it in the beginning of function and either return an
> error message or just NULL, e.g
>
>     if (PG_ARGISNULL(0))
>         PG_RETURN_NULL();

Instead of handling this in the function body, the function should be
declared as strict.  This is in fact the default in pg_proc.h,

    /* strict with respect to NULLs? */
    bool        proisstrict BKI_DEFAULT(t);

so removing the explicit proisstrict => 'f' from the pg_proc.dat entry
will fix it with no additional code.

> 2) Add a function call with NULL in the regression tests
>
> SELECT * FROM parse_type(NULL);
>
> 3) Add the expected behaviour of calling the function with NULL in the
> docs (error message or null)

Once the function is declared strict, I don't think either of these is
necessary: function strictness is tested elsewhere, and it's the default
behaviour.  The only functions that explicitly say they return NULL on
NULL inputs are quote_literal (because you might expect it to return the
string 'NULL', but there's qoute_nullable for that) and xmlexists (which
I don't see any particular reason for).

- ilmari



Re: Patch: Add parse_type Function

From
Jim Jones
Date:
On 05.02.24 15:10, David E. Wheeler wrote:
> Excellent, thank you very much! Updated patch attached.
>
> Best,
>
> David

v2 no longer crashes with a null parameter.
docs and regress tests were updated accordingly.

patch no longer applies cleanly (tiny little indentation issue):

/home/jim/Downloads/v2-0001-Add-parse_type-SQL-function.patch:140:
indent with spaces.
        PG_RETURN_NULL();
warning: 1 line adds whitespace errors.

I read the comments again, and something is not entirely clear to me.
Line 494 says "Raises an error on an invalid type." and 501 says
"Returns NULL for an invalid type."
Perhaps merging both comment blocks and rephrasing these sentences would
make things clearer?

-- 
Jim




Re: Patch: Add parse_type Function

From
Jim Jones
Date:
On 05.02.24 15:32, Dagfinn Ilmari Mannsåker wrote:
> Once the function is declared strict, I don't think either of these is
> necessary: function strictness is tested elsewhere, and it's the default
> behaviour.  The only functions that explicitly say they return NULL on
> NULL inputs are quote_literal (because you might expect it to return the
> string 'NULL', but there's qoute_nullable for that) and xmlexists (which
> I don't see any particular reason for).
>
> - ilmari
>
>
+1
Yes, if the function was strict (which in the current design is not the
case) there is no need to check for nulls.

-- 
Jim




Re: Patch: Add parse_type Function

From
"David E. Wheeler"
Date:
On Feb 5, 2024, at 09:49, Jim Jones <jim.jones@uni-muenster.de> wrote:

> Yes, if the function was strict (which in the current design is not the
> case) there is no need to check for nulls.

Right, done, and cleaned up the redundant comments.

Best,

David


Attachment

Re: Patch: Add parse_type Function

From
Jim Jones
Date:
On 05.02.24 16:14, David E. Wheeler wrote:
> Right, done, and cleaned up the redundant comments.
>
> Best,
>
> David

Nice.
The patch applies cleanly and it no longer crashes with NULL parameters.
Docs render fine and regression tests pass. I have nothing more to add.
Let's now wait for Pavel's review.

Thanks!

-- 
Jim




Re: Patch: Add parse_type Function

From
"David E. Wheeler"
Date:
On Feb 5, 2024, at 10:47 AM, Jim Jones <jim.jones@uni-muenster.de> wrote:

> The patch applies cleanly and it no longer crashes with NULL parameters.
> Docs render fine and regression tests pass. I have nothing more to add.
> Let's now wait for Pavel's review.

Much obliged!

D




Re: Patch: Add parse_type Function

From
jian he
Date:
+ /*
+ * Parse type-name argument to obtain type OID and encoded typmod. We don't
+ * need to check for parseTypeString failure, but just let the error be
+ * raised. The 0 arg works both as the `Node *escontext` arg in Postgres 16
+ * and the `bool missing_ok` arg in 9.4-15.
+ */
+ (void) parseTypeString(type, &typid, &typmod, 0);
if you are wondering around other code deal with NULL, ErrorSaveContext.
NULL would be more correct?
`(void) parseTypeString(type, &typid, &typmod, NULL);`

also parseTypeString already explained the third argument, our
comments can be simplified as:
/*
* Parse type-name argument to obtain type OID and encoded typmod. We don't
* need to handle parseTypeString failure, just let the error be
* raised.
*/

cosmetic issue. Looking around other error handling code, the format
should be something like:
`
if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
    ereport(ERROR,
                  (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                    errmsg("function returning record called in"
                                 "context that cannot accept type record")));
`

`PG_FUNCTION_INFO_V1(parse_type);`
is unnecessary?
based on the error information:  https://cirrus-ci.com/task/5899457502380032
not sure why it only fails on windows.

+#define PARSE_TYPE_STRING_COLS 2 /* Returns two columns. */
+#undef PARSE_TYPE_STRING_COLS
Are these necessary?
given that the comments already say return the OID and type modifier.


+        ( <parameter>typid</parameter> <type>oid</type>,
+          <parameter>typmod</parameter> <type>int4</type> )
here `int4` should be `integer`?

commit message:
`Also accounts for data typs that require the SQL grammar to be parsed:`
except the typo issue, this sentence is still hard for me to understand.

The `parse_type()` function uses the underlying `parseTypeString()` C
function to parse a string representing a data type into a type ID and
typmod suitabld for passing to `format_type()`.

suitabld should be suitable.


+       <para>
+        Parses a string representing an SQL type declaration as used in a
+        <command>CREATE TABLE</command> statement, optionally schema-qualified.
+        Returns a record with two fields, <parameter>typid</parameter> and
+        <parameter>typmod</parameter>, representing the OID and
modifier for the
+        type. These correspond to the parameters to pass to the
+        <link linkend="format_type"><function>format_type</function>
function.</link>
+       </para>

can be simplified:
+       <para>
+        Parses a string representing an SQL data type, optionally
schema-qualified.
+        Returns a record with two fields, <parameter>typid</parameter> and
+        <parameter>typmod</parameter>, representing the OID and
modifier for the
+        type. These correspond to the parameters to pass to the
+        <link linkend="format_type"><function>format_type</function>
function.</link>
+       </para>
(I don't have a strong opinion though)



Re: Patch: Add parse_type Function

From
Erik Wienhold
Date:
Let me comment on some issues since I wrote the very first version of
parse_type() on which David's patch is based.

> On 2024-02-01 01:00 +0100, jian he <jian.universality@gmail.com> wrote:
> 
> cosmetic issue. Looking around other error handling code, the format
> should be something like:
> `
> if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
>     ereport(ERROR,
>                   (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
>                     errmsg("function returning record called in"
>                                  "context that cannot accept type record")));
> `

+1

> `PG_FUNCTION_INFO_V1(parse_type);`
> is unnecessary?
> based on the error information:  https://cirrus-ci.com/task/5899457502380032
> not sure why it only fails on windows.

Yeah, it's not necessary for internal functions per [1].  It's a
leftover from when this function was part of the pgtap extension.

> +#define PARSE_TYPE_STRING_COLS 2 /* Returns two columns. */
> +#undef PARSE_TYPE_STRING_COLS
> Are these necessary?
> given that the comments already say return the OID and type modifier.

Not sure what the convention here is but I found this in a couple of
places, maybe even a tutorial on writing C functions.  See
`git grep '_COLS\s\+[1-9]'` for those instances.  It's in line with my
habit of avoiding magic constants.

> +        ( <parameter>typid</parameter> <type>oid</type>,
> +          <parameter>typmod</parameter> <type>int4</type> )
> here `int4` should be `integer`?

+1

> commit message:
> `Also accounts for data typs that require the SQL grammar to be parsed:`
> except the typo issue, this sentence is still hard for me to understand.

Yes, the sentence is rather handwavy.  What is meant here is that this
function also resolves types whose typmod is determined by the SQL
parser or some step after that.  There are types whose typmod is
whatever value is found inside the parenthesis, e.g. bit(13) has typmod
13.  Our first idea before coming up with that function was to do some
string manipulation and extract the typmod from inside the parenthesis.
But we soon found out that other typmods don't translate one to one,
e.g.  varchar(13) has typmod 17.  The interval type is also special
because the typmod is some bitmask encoding of e.g. 'second(0)'.  Hence
the mention of the SQL grammar.

> +       <para>
> +        Parses a string representing an SQL type declaration as used in a
> +        <command>CREATE TABLE</command> statement, optionally schema-qualified.
> +        Returns a record with two fields, <parameter>typid</parameter> and
> +        <parameter>typmod</parameter>, representing the OID and
> modifier for the
> +        type. These correspond to the parameters to pass to the
> +        <link linkend="format_type"><function>format_type</function>
> function.</link>
> +       </para>
> 
> can be simplified:
> +       <para>
> +        Parses a string representing an SQL data type, optionally
> schema-qualified.
> +        Returns a record with two fields, <parameter>typid</parameter> and
> +        <parameter>typmod</parameter>, representing the OID and
> modifier for the
> +        type. These correspond to the parameters to pass to the
> +        <link linkend="format_type"><function>format_type</function>
> function.</link>
> +       </para>
> (I don't have a strong opinion though)

Sounds better.  The CREATE TABLE reference is superfluous.

[1] https://www.postgresql.org/docs/current/xfunc-c.html#XFUNC-C-V1-CALL-CONV

-- 
Erik



Re: Patch: Add parse_type Function

From
"David E. Wheeler"
Date:
On Feb 10, 2024, at 20:52, Erik Wienhold <ewie@ewie.name> wrote:
>
> Let me comment on some issues since I wrote the very first version of
> parse_type() on which David's patch is based.

Thanks Erik.

>> On 2024-02-01 01:00 +0100, jian he <jian.universality@gmail.com> wrote:
>> if you are wondering around other code deal with NULL, ErrorSaveContext.
>> NULL would be more correct?
>> `(void) parseTypeString(type, &typid, &typmod, NULL);`

Fixed.

>> also parseTypeString already explained the third argument, our
>> comments can be simplified as:
>> /*
>> * Parse type-name argument to obtain type OID and encoded typmod. We don't
>> * need to handle parseTypeString failure, just let the error be
>> * raised.
>> */


Thanks, adopted that language.

>> cosmetic issue. Looking around other error handling code, the format
>> should be something like:
>> `
>> if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
>>    ereport(ERROR,
>>                  (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
>>                    errmsg("function returning record called in"
>>                                 "context that cannot accept type record")));
>> `
>
> +1

I poked around and found some other examples, yes. I went with a single long line for errmsg following the example in
adminpack.c[1]

>> `PG_FUNCTION_INFO_V1(parse_type);`
>> is unnecessary?
>> based on the error information:  https://cirrus-ci.com/task/5899457502380032
>> not sure why it only fails on windows.
>
> Yeah, it's not necessary for internal functions per [1].  It's a
> leftover from when this function was part of the pgtap extension.

Removed.

>> +#define PARSE_TYPE_STRING_COLS 2 /* Returns two columns. */
>> +#undef PARSE_TYPE_STRING_COLS
>> Are these necessary?
>> given that the comments already say return the OID and type modifier.
>
> Not sure what the convention here is but I found this in a couple of
> places, maybe even a tutorial on writing C functions.  See
> `git grep '_COLS\s\+[1-9]'` for those instances.  It's in line with my
> habit of avoiding magic constants.

Left in place for now.

>
>> +        ( <parameter>typid</parameter> <type>oid</type>,
>> +          <parameter>typmod</parameter> <type>int4</type> )
>> here `int4` should be `integer`?
>
> +1

Fixed.

> Yes, the sentence is rather handwavy.  What is meant here is that this
> function also resolves types whose typmod is determined by the SQL
> parser or some step after that.  There are types whose typmod is
> whatever value is found inside the parenthesis, e.g. bit(13) has typmod
> 13.  Our first idea before coming up with that function was to do some
> string manipulation and extract the typmod from inside the parenthesis.
> But we soon found out that other typmods don't translate one to one,
> e.g.  varchar(13) has typmod 17.  The interval type is also special
> because the typmod is some bitmask encoding of e.g. 'second(0)'.  Hence
> the mention of the SQL grammar.

I adopted some of your language here --- and fixed the spelling errors :-)

>> can be simplified:
>> +       <para>
>> +        Parses a string representing an SQL data type, optionally
>> schema-qualified.
>> +        Returns a record with two fields, <parameter>typid</parameter> and
>> +        <parameter>typmod</parameter>, representing the OID and
>> modifier for the
>> +        type. These correspond to the parameters to pass to the
>> +        <link linkend="format_type"><function>format_type</function>
>> function.</link>
>> +       </para>
>> (I don't have a strong opinion though)
>
> Sounds better.  The CREATE TABLE reference is superfluous.

Done.

Best,

David
[1]
https://github.com/postgres/postgres/blob/09eb633e1baa3b7cd7929f3cc77f9c46f63c20b1/contrib/adminpack/adminpack.c#L508-L511




Attachment

Re: Patch: Add parse_type Function

From
Tom Lane
Date:
"David E. Wheeler" <david@justatheory.com> writes:
> [ v4-0001-Add-parse_type-SQL-function.patch ]

It strikes me that this is basically to_regtype() with the additional
option to return the typmod.  That leads to some questions:

* Should we choose a name related to to_regtype?  I don't have any
immediate suggestions, but since you didn't seem entirely set on
parse_type, maybe it's worth thinking along those lines.  OTOH,
to the extent that people would use this with format_type, maybe
parse_type is fine.

* Perhaps the type OID output argument should be declared regtype
not plain OID?  It wouldn't make any difference for passing it to
format_type, but in other use-cases perhaps regtype would be more
readable.  It's not a deal-breaker either way of course, since
you can cast oid to regtype or vice versa.

* Maybe the code should be in adt/regproc.c not format_type.c.

* Experience with to_regtype suggests strongly that people would
prefer "return NULL" to failure for an unrecognized type name.


Skimming the patch, I notice that the manual addition to
builtins.h should be unnecessary: the pg_proc.dat entry
should be enough to create an extern in fmgrprotos.h.
Also, I'm pretty sure that reformat_dat_file.pl will
think your pg_proc.dat entry is overly verbose.  See
https://www.postgresql.org/docs/devel/system-catalog-initial-data.html

            regards, tom lane



Re: Patch: Add parse_type Function

From
Tom Lane
Date:
I wrote:
> It strikes me that this is basically to_regtype() with the additional
> option to return the typmod.  That leads to some questions:

BTW, another way that this problem could be approached is to use
to_regtype() as-is, with a separate function to obtain the typmod:

select format_type(to_regtype('timestamp(4)'), to_regtypmod('timestamp(4)'));

This is intellectually ugly, since it implies parsing the same
typename string twice.  But on the other hand it avoids the notational
pain and runtime overhead involved in using a record-returning
function.  So I think it might be roughly a wash for performance.
Question to think about is which way is easier to use.  I don't
have an opinion particularly; just throwing the idea out there.

            regards, tom lane



Re: Patch: Add parse_type Function

From
Pavel Stehule
Date:


po 12. 2. 2024 v 19:20 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
I wrote:
> It strikes me that this is basically to_regtype() with the additional
> option to return the typmod.  That leads to some questions:

BTW, another way that this problem could be approached is to use
to_regtype() as-is, with a separate function to obtain the typmod:

select format_type(to_regtype('timestamp(4)'), to_regtypmod('timestamp(4)'));

This is intellectually ugly, since it implies parsing the same
typename string twice.  But on the other hand it avoids the notational
pain and runtime overhead involved in using a record-returning
function.  So I think it might be roughly a wash for performance.
Question to think about is which way is easier to use.  I don't
have an opinion particularly; just throwing the idea out there.

+1

Pavel

                        regards, tom lane

Re: Patch: Add parse_type Function

From
"David E. Wheeler"
Date:
On Feb 12, 2024, at 12:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> "David E. Wheeler" <david@justatheory.com> writes:
>> [ v4-0001-Add-parse_type-SQL-function.patch ]
>
> It strikes me that this is basically to_regtype() with the additional
> option to return the typmod.  That leads to some questions:

Huh. I saw it more on a par with format_type(), at least in the output I expect.

> * Should we choose a name related to to_regtype?  I don't have any
> immediate suggestions, but since you didn't seem entirely set on
> parse_type, maybe it's worth thinking along those lines.  OTOH,
> to the extent that people would use this with format_type, maybe
> parse_type is fine.

Yeah, and the `to_*` functions return a single OID, not two fields.

> * Perhaps the type OID output argument should be declared regtype
> not plain OID?  It wouldn't make any difference for passing it to
> format_type, but in other use-cases perhaps regtype would be more
> readable.  It's not a deal-breaker either way of course, since
> you can cast oid to regtype or vice versa.

Sure. I used `oid` because that’s exactly what the argument to format_type() is called, so thought the parity there
moreappropriate. Maybe its argument should be renamed to regtype? 

> * Maybe the code should be in adt/regproc.c not format_type.c.

Happy to move it wherever makes the most sense.

> * Experience with to_regtype suggests strongly that people would
> prefer "return NULL" to failure for an unrecognized type name.

Could do, but as with to_regtype() there’s an issue with error handling when it descends into the SQL parser.

> Skimming the patch, I notice that the manual addition to
> builtins.h should be unnecessary: the pg_proc.dat entry
> should be enough to create an extern in fmgrprotos.h.

Confirmed, will remove in next patch.

> Also, I'm pretty sure that reformat_dat_file.pl will
> think your pg_proc.dat entry is overly verbose.  See
> https://www.postgresql.org/docs/devel/system-catalog-initial-data.html

There are a bunch of things it reformats:

❯ git diff --name-only
src/include/catalog/pg_aggregate.dat
src/include/catalog/pg_database.dat
src/include/catalog/pg_proc.dat
src/include/catalog/pg_type.dat
src/include/utils/builtins.h

And even in pg_proc.dat there are 13 blocks it reformats. I can submit with just my block formatted if you’d prefer.

> BTW, another way that this problem could be approached is to use
> to_regtype() as-is, with a separate function to obtain the typmod:
>
> select format_type(to_regtype('timestamp(4)'), to_regtypmod('timestamp(4)'));

Oh interesting.

> This is intellectually ugly, since it implies parsing the same
> typename string twice.  But on the other hand it avoids the notational
> pain and runtime overhead involved in using a record-returning
> function.  So I think it might be roughly a wash for performance.
> Question to think about is which way is easier to use.  I don't
> have an opinion particularly; just throwing the idea out there.

Honestly I haven’t cared for the fact that parse_type() returns a record; it makes it a bit clunky. But yeah, so does
thisto pass the same value to two function calls. 

Perhaps it makes sense to add to_regtypmod() as you suggest, but then also something like:

CREATE FUNCTION format_type_string(text) RETURNS TEXT AS $$
    SELECT format_type(to_regtype($1), to_regtypmod($1))
$$;

Best,

David




Re: Patch: Add parse_type Function

From
"David E. Wheeler"
Date:
On Feb 12, 2024, at 15:55, David E. Wheeler <david@justatheory.com> wrote:

> Happy to move it wherever makes the most sense.

Update with a bunch of the suggested changes. Some questions still open from the previous post, though.

Best,

David


Attachment

Re: Patch: Add parse_type Function

From
Erik Wienhold
Date:
On 2024-02-12 19:20 +0100, Tom Lane wrote:
> I wrote:
> > It strikes me that this is basically to_regtype() with the additional
> > option to return the typmod.  That leads to some questions:
> 
> BTW, another way that this problem could be approached is to use
> to_regtype() as-is, with a separate function to obtain the typmod:
> 
> select format_type(to_regtype('timestamp(4)'), to_regtypmod('timestamp(4)'));
> 
> This is intellectually ugly, since it implies parsing the same
> typename string twice.  But on the other hand it avoids the notational
> pain and runtime overhead involved in using a record-returning
> function.  So I think it might be roughly a wash for performance.
> Question to think about is which way is easier to use.  I don't
> have an opinion particularly; just throwing the idea out there.

Out of curiosity, I benchmarked this with the attached to_regtypmod()
patch based on David's v5 applied to a6c21887a9.  The script running
pgbench and its output are included at the end.

Just calling parse_type() vs to_regtype()/to_regtypmod() is a wash for
performance as you thought.  But format_type() performs better with
to_regtypmod() than with parse_type().  Accessing the record fields
returned by parse_type() adds some overhead.

to_regtypmod() is better for our use case in pgTAP which relies on
format_type() to normalize the type name.  The implementation of
to_regtypmod() is also simpler than parse_type().  Usage-wise, both are
clunky IMO.

Benchmark script:

    #!/usr/bin/env bash
    
    set -eu
    
    cat <<'SQL' > parse_type.sql
    SELECT parse_type('interval second(0)');
    SQL
    
    cat <<'SQL' > parse_type_and_format.sql
    SELECT format_type(p.typid, p.typmod) FROM parse_type('interval second(0)') p;
    SQL
    
    cat <<'SQL' > to_regtypmod.sql
    SELECT to_regtype('interval second(0)'), to_regtypmod('interval second(0)');
    SQL
    
    cat <<'SQL' > to_regtypmod_and_format.sql
    SELECT format_type(to_regtype('interval second(0)'), to_regtypmod('interval second(0)'));
    SQL
    
    for f in \
        parse_type.sql \
        parse_type_and_format.sql \
        to_regtypmod.sql \
        to_regtypmod_and_format.sql
    do
        pgbench -n -f "$f" -T10 postgres
        echo
    done

pgbench output:

    pgbench (17devel)
    transaction type: parse_type.sql
    scaling factor: 1
    query mode: simple
    number of clients: 1
    number of threads: 1
    maximum number of tries: 1
    duration: 10 s
    number of transactions actually processed: 277017
    number of failed transactions: 0 (0.000%)
    latency average = 0.036 ms
    initial connection time = 1.623 ms
    tps = 27706.005513 (without initial connection time)
    
    pgbench (17devel)
    transaction type: parse_type_and_format.sql
    scaling factor: 1
    query mode: simple
    number of clients: 1
    number of threads: 1
    maximum number of tries: 1
    duration: 10 s
    number of transactions actually processed: 222487
    number of failed transactions: 0 (0.000%)
    latency average = 0.045 ms
    initial connection time = 1.603 ms
    tps = 22252.095670 (without initial connection time)
    
    pgbench (17devel)
    transaction type: to_regtypmod.sql
    scaling factor: 1
    query mode: simple
    number of clients: 1
    number of threads: 1
    maximum number of tries: 1
    duration: 10 s
    number of transactions actually processed: 276134
    number of failed transactions: 0 (0.000%)
    latency average = 0.036 ms
    initial connection time = 1.570 ms
    tps = 27617.628259 (without initial connection time)
    
    pgbench (17devel)
    transaction type: to_regtypmod_and_format.sql
    scaling factor: 1
    query mode: simple
    number of clients: 1
    number of threads: 1
    maximum number of tries: 1
    duration: 10 s
    number of transactions actually processed: 270820
    number of failed transactions: 0 (0.000%)
    latency average = 0.037 ms
    initial connection time = 1.631 ms
    tps = 27086.331104 (without initial connection time)

-- 
Erik

Attachment

Re: Patch: Add parse_type Function

From
Pavel Stehule
Date:
Hi

ne 18. 2. 2024 v 19:50 odesílatel Erik Wienhold <ewie@ewie.name> napsal:
On 2024-02-12 19:20 +0100, Tom Lane wrote:
> I wrote:
> > It strikes me that this is basically to_regtype() with the additional
> > option to return the typmod.  That leads to some questions:
>
> BTW, another way that this problem could be approached is to use
> to_regtype() as-is, with a separate function to obtain the typmod:
>
> select format_type(to_regtype('timestamp(4)'), to_regtypmod('timestamp(4)'));
>
> This is intellectually ugly, since it implies parsing the same
> typename string twice.  But on the other hand it avoids the notational
> pain and runtime overhead involved in using a record-returning
> function.  So I think it might be roughly a wash for performance.
> Question to think about is which way is easier to use.  I don't
> have an opinion particularly; just throwing the idea out there.

Out of curiosity, I benchmarked this with the attached to_regtypmod()
patch based on David's v5 applied to a6c21887a9.  The script running
pgbench and its output are included at the end.

Just calling parse_type() vs to_regtype()/to_regtypmod() is a wash for
performance as you thought.  But format_type() performs better with
to_regtypmod() than with parse_type().  Accessing the record fields
returned by parse_type() adds some overhead.

to_regtypmod() is better for our use case in pgTAP which relies on
format_type() to normalize the type name.  The implementation of
to_regtypmod() is also simpler than parse_type().  Usage-wise, both are
clunky IMO.

Benchmark script:

        #!/usr/bin/env bash

        set -eu

        cat <<'SQL' > parse_type.sql
        SELECT parse_type('interval second(0)');
        SQL

        cat <<'SQL' > parse_type_and_format.sql
        SELECT format_type(p.typid, p.typmod) FROM parse_type('interval second(0)') p;
        SQL

        cat <<'SQL' > to_regtypmod.sql
        SELECT to_regtype('interval second(0)'), to_regtypmod('interval second(0)');
        SQL

        cat <<'SQL' > to_regtypmod_and_format.sql
        SELECT format_type(to_regtype('interval second(0)'), to_regtypmod('interval second(0)'));
        SQL

        for f in \
            parse_type.sql \
            parse_type_and_format.sql \
            to_regtypmod.sql \
            to_regtypmod_and_format.sql
        do
            pgbench -n -f "$f" -T10 postgres
            echo
        done

pgbench output:

        pgbench (17devel)
        transaction type: parse_type.sql
        scaling factor: 1
        query mode: simple
        number of clients: 1
        number of threads: 1
        maximum number of tries: 1
        duration: 10 s
        number of transactions actually processed: 277017
        number of failed transactions: 0 (0.000%)
        latency average = 0.036 ms
        initial connection time = 1.623 ms
        tps = 27706.005513 (without initial connection time)

        pgbench (17devel)
        transaction type: parse_type_and_format.sql
        scaling factor: 1
        query mode: simple
        number of clients: 1
        number of threads: 1
        maximum number of tries: 1
        duration: 10 s
        number of transactions actually processed: 222487
        number of failed transactions: 0 (0.000%)
        latency average = 0.045 ms
        initial connection time = 1.603 ms
        tps = 22252.095670 (without initial connection time)

        pgbench (17devel)
        transaction type: to_regtypmod.sql
        scaling factor: 1
        query mode: simple
        number of clients: 1
        number of threads: 1
        maximum number of tries: 1
        duration: 10 s
        number of transactions actually processed: 276134
        number of failed transactions: 0 (0.000%)
        latency average = 0.036 ms
        initial connection time = 1.570 ms
        tps = 27617.628259 (without initial connection time)

        pgbench (17devel)
        transaction type: to_regtypmod_and_format.sql
        scaling factor: 1
        query mode: simple
        number of clients: 1
        number of threads: 1
        maximum number of tries: 1
        duration: 10 s
        number of transactions actually processed: 270820
        number of failed transactions: 0 (0.000%)
        latency average = 0.037 ms
        initial connection time = 1.631 ms
        tps = 27086.331104 (without initial connection time)

The overhead of parse_type_and_format can be related to higher planning time. PL/pgSQL can assign composite without usage FROM clause.

Regards

Pavel


--
Erik

Re: Patch: Add parse_type Function

From
Erik Wienhold
Date:
On 2024-02-18 20:00 +0100, Pavel Stehule wrote:
> The overhead of parse_type_and_format can be related to higher planning
> time. PL/pgSQL can assign composite without usage FROM clause.

Thanks, didn't know that this makes a difference.  In that case both
variants are on par.

    BEGIN;
    
    CREATE FUNCTION format_with_parse_type(text)
    RETURNS text
    LANGUAGE plpgsql
    STABLE STRICT
    AS $$
    DECLARE
        p record := parse_type($1);
    BEGIN
        RETURN format_type(p.typid, p.typmod);
    END
    $$;
    
    CREATE FUNCTION format_with_to_regtypmod(text)
    RETURNS text
    LANGUAGE plpgsql
    STABLE STRICT
    AS $$
    BEGIN
        RETURN format_type(to_regtype($1), to_regtypmod($1));
    END
    $$;
    
    COMMIT;

Results:

    SELECT format_with_parse_type('interval second(0)');

    pgbench (17devel)
    transaction type: format_with_parse_type.sql
    scaling factor: 1
    query mode: simple
    number of clients: 1
    number of threads: 1
    maximum number of tries: 1
    duration: 10 s
    number of transactions actually processed: 253530
    number of failed transactions: 0 (0.000%)
    latency average = 0.039 ms
    initial connection time = 1.846 ms
    tps = 25357.551681 (without initial connection time)

    SELECT format_with_to_regtypmod('interval second(0)');

    pgbench (17devel)
    transaction type: format_with_to_regtypmod.sql
    scaling factor: 1
    query mode: simple
    number of clients: 1
    number of threads: 1
    maximum number of tries: 1
    duration: 10 s
    number of transactions actually processed: 257942
    number of failed transactions: 0 (0.000%)
    latency average = 0.039 ms
    initial connection time = 1.544 ms
    tps = 25798.015526 (without initial connection time)

-- 
Erik



Re: Patch: Add parse_type Function

From
"David E. Wheeler"
Date:
On Feb 18, 2024, at 15:55, Erik Wienhold <ewie@ewie.name> wrote:

>> The overhead of parse_type_and_format can be related to higher planning
>> time. PL/pgSQL can assign composite without usage FROM clause.
>
> Thanks, didn't know that this makes a difference.  In that case both
> variants are on par.

Presumably this is a side-effect of the use of a RECORD here, which requires a FROM clause in pure SQL, yes?

The only way I can think of to avoid that is to:

1. Add a to_regtypmod() for those who just want the typemod.

2. Add a format_type_string() function that returns a string, the equivalent of this function but in C:

CREATE FUNCTION format_type_string(text) RETURNS TEXT AS $$
   SELECT format_type(to_regtype($1), to_regtypmod($1))
$$;

We could also keep the record-returning function for users who want both the regtype and the regytypemod at once, but
withthe other two I consider it less essential. 

Thoughts?

Best,

David




Re: Patch: Add parse_type Function

From
Tom Lane
Date:
"David E. Wheeler" <david@justatheory.com> writes:
> The only way I can think of to avoid that is to:

> 1. Add a to_regtypmod() for those who just want the typemod.

Seems like there's a good case for doing that.

> 2. Add a format_type_string() function that returns a string, the equivalent of this function but in C:

> CREATE FUNCTION format_type_string(text) RETURNS TEXT AS $$
>    SELECT format_type(to_regtype($1), to_regtypmod($1))
> $$;

I'm less thrilled about that, mainly because I can't think of
a good name for it ("format_type_string" is certainly not that).
Is the use-case for this functionality really strong enough that
we need to provide it as a single function rather than something
assembled out of spare parts?

            regards, tom lane



Re: Patch: Add parse_type Function

From
Tom Lane
Date:
I wrote:
> I'm less thrilled about that, mainly because I can't think of
> a good name for it ("format_type_string" is certainly not that).

After some time ruminating, a couple of possibilities occurred to
me:
    reformat_type(text)
    canonical_type_name(text)

> Is the use-case for this functionality really strong enough that
> we need to provide it as a single function rather than something
> assembled out of spare parts?

I'm still unconvinced about that, though.

            regards, tom lane



Re: Patch: Add parse_type Function

From
"David E. Wheeler"
Date:
On Feb 19, 2024, at 15:47, Tom Lane <tgl@sss.pgh.pa.us> wrote:

>> 1. Add a to_regtypmod() for those who just want the typemod.
>
> Seems like there's a good case for doing that.

I’ll work on that.

> I'm less thrilled about that, mainly because I can't think of
> a good name for it ("format_type_string" is certainly not that).
> Is the use-case for this functionality really strong enough that
> we need to provide it as a single function rather than something
> assembled out of spare parts?

Not essential for pgTAP, no, as we can just use the parts. It was the typmod bit that was missing.

On Feb 19, 2024, at 17:13, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> After some time ruminating, a couple of possibilities occurred to
> me:
> reformat_type(text)
> canonical_type_name(text)

I was just thinking about hitting the thesaurus entry for “canonical”, but I quite like reformat_type. It’s super clear
anddraws the parallel to format_type() more clearly. Will likely steal the name for pgTAP if we don’t add it to core.
:-)

> I'm still unconvinced about that, though.

I guess the questions are:

* What are the other use cases for it?
* How obvious is it how to do it?

For the latter, it could easily be an example in the docs.

Best,

David




Re: Patch: Add parse_type Function

From
Erik Wienhold
Date:
On 2024-02-19 23:59 +0100, David E. Wheeler wrote:
> On Feb 19, 2024, at 15:47, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> 
> >> 1. Add a to_regtypmod() for those who just want the typemod.
> > 
> > Seems like there's a good case for doing that.
> 
> I’ll work on that.

See the patch I wrote for my benchmarks.  But it's pretty easy anyway to
cut down parse_type() ;)

> > I'm less thrilled about that, mainly because I can't think of a good
> > name for it ("format_type_string" is certainly not that).  Is the
> > use-case for this functionality really strong enough that we need to
> > provide it as a single function rather than something assembled out
> > of spare parts?
> 
> Not essential for pgTAP, no, as we can just use the parts. It was the
> typmod bit that was missing.

But you don't actually need reformat_type() in pgTAP.  You can just get
the type OID and modifier of the want_type and have_type and compare
those.  Then use format_type() for the error message.  Looks a bit
cleaner to me than doing the string comparison.

On second thought, I guess comparing the reformatted type names is
necessary in order to have a uniform API on older Postgres releases
where pgTAP has to provide its own to_regtypmod() based on typmodin
functions.

> On Feb 19, 2024, at 17:13, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> 
> > After some time ruminating, a couple of possibilities occurred to
> > me: reformat_type(text) canonical_type_name(text)
> 
> I was just thinking about hitting the thesaurus entry for “canonical”,
> but I quite like reformat_type. It’s super clear and draws the
> parallel to format_type() more clearly. Will likely steal the name for
> pgTAP if we don’t add it to core. :-)
> 
> > I'm still unconvinced about that, though.
> 
> I guess the questions are:
> 
> * What are the other use cases for it?

Can't think of any right now other than this are-type-names-the-same
check.  Perhaps also for pretty-printing user-provided type names where
we don't take the type info from e.g. pg_attribute.

> * How obvious is it how to do it?
> 
> For the latter, it could easily be an example in the docs.

Can be mentioned right under format_type().

-- 
Erik



Re: Patch: Add parse_type Function

From
"David E. Wheeler"
Date:
On Feb 19, 2024, at 21:58, Erik Wienhold <ewie@ewie.name> wrote:

> See the patch I wrote for my benchmarks.  But it's pretty easy anyway to
> cut down parse_type() ;)

LOL, I missed that, just wrote it myself in the last hour. :-) v6 attached.

> But you don't actually need reformat_type() in pgTAP.  You can just get
> the type OID and modifier of the want_type and have_type and compare
> those.  Then use format_type() for the error message.  Looks a bit
> cleaner to me than doing the string comparison.

Fair.

> On second thought, I guess comparing the reformatted type names is
> necessary in order to have a uniform API on older Postgres releases
> where pgTAP has to provide its own to_regtypmod() based on typmodin
> functions.

Maybe. Worth playing with.

>> For the latter, it could easily be an example in the docs.
> 
> Can be mentioned right under format_type().

Well I included it in the to_regtypemod docs here, but could so either.

Best,

David


Attachment

Re: Patch: Add parse_type Function

From
jian he
Date:
On Tue, Feb 20, 2024 at 11:06 AM David E. Wheeler <david@justatheory.com> wrote:
>
> LOL, I missed that, just wrote it myself in the last hour. :-) v6 attached.
>

+SELECT to_regtypemod('interval nonesuch'); -- grammar error expected
+ERROR:  syntax error at or near "nonesuch"
+LINE 1: SELECT to_regtypemod('interval nonesuch');
+                 ^
+CONTEXT:  invalid type name "interval nonesuch"
+SELECT to_regtypemod('year(4)'); -- grammar error expected
+ to_regtypemod
+---------------
+
+(1 row)
+

the second hint `-- grammar error expected` seems to contradict with
the results?



Re: Patch: Add parse_type Function

From
"David E. Wheeler"
Date:
On Feb 20, 2024, at 01:30, jian he <jian.universality@gmail.com> wrote:

> the second hint `-- grammar error expected` seems to contradict with
> the results?

Quite right, thank you, that’s actually a trapped error. I’ve tweaked the comments and their order in v7, attached.

This goes back to the discussion of the error raising of to_regtype[1], so I’ve incorporated the patch from that thread
intothis patch, and set up the docs for to_regtypemod() with similar information. The wording is still a little opaque
formy taste, though, written more for someone who knows a bit about the internals, but it’s a start. 

I’ve also fixed the wayward parameter in the function signature in the docs, and added a note about why I’ve also
patchedgenbki.pl. 

Best,

David

[1]
https://www.postgresql.org/message-id/flat/57E1FDDC-5A38-452D-82D7-A44DA2E13862%40justatheory.com#1ae0b11634bc33c7ad3cd728e43d504e


Attachment

Re: Patch: Add parse_type Function

From
Erik Wienhold
Date:
On 2024-02-20 15:44 +0100, David E. Wheeler wrote:
> I’ve tweaked the comments and their order in v7, attached.
> 
> This goes back to the discussion of the error raising of
> to_regtype[1], so I’ve incorporated the patch from that thread into
> this patch, and set up the docs for to_regtypemod() with similar
> information.

Makes sense.

> [1]
https://www.postgresql.org/message-id/flat/57E1FDDC-5A38-452D-82D7-A44DA2E13862%40justatheory.com#1ae0b11634bc33c7ad3cd728e43d504e

> -       <entry role="func_table_entry"><para role="func_signature">
> +       <entry id="format_type" role="func_table_entry"><para role="func_signature">
>          <indexterm>
>           <primary>format_type</primary>
>          </indexterm>
> @@ -25462,7 +25462,7 @@ SELECT collation for ('foo' COLLATE "de_DE");
>        </row>
>  
>        <row>
> -       <entry role="func_table_entry"><para role="func_signature">
> +       <entry id="to_regtype" role="func_table_entry"><para role="func_signature">

The references are printed as "???" right now.  Can be fixed with
xreflabel="format_type" and xreflabel="to_regtype" on those <entry>
elements.

> +        <function>to_regtypemod</function> ( <parameter>type</parameter> <type>text</type> )

The docs show parameter name "type" but pg_proc.data does not define
proargnames.  So the to_regtypemod() cannot be called using named
notation:

    test=> select to_regtypemod(type => 'int'::text);
    ERROR:  function to_regtypemod(type => text) does not exist

> +        Parses a string of text, extracts a potential type name from it, and
> +        translates its typmod, the modifier for the type, if any. Failure to

s/typmod, the modifier of the type/type modifier/

Because format_type() already uses "type modifier" and I think it helps
searchability to use the same wording.

-- 
Erik



Re: Patch: Add parse_type Function

From
"David E. Wheeler"
Date:
On Feb 20, 2024, at 21:09, Erik Wienhold <ewie@ewie.name> wrote:

> The references are printed as "???" right now.  Can be fixed with
> xreflabel="format_type" and xreflabel="to_regtype" on those <entry>
> elements.

Thanks.

> The docs show parameter name "type" but pg_proc.data does not define
> proargnames.  So the to_regtypemod() cannot be called using named
> notation:
>
> test=> select to_regtypemod(type => 'int'::text);
> ERROR:  function to_regtypemod(type => text) does not exist

Yes, this format is identical to that of to_regtype():

david=# select to_regtype(type => 'int'::text);
ERROR:  function to_regtype(type => text) does not exist

> +        Parses a string of text, extracts a potential type name from it, and
>> +        translates its typmod, the modifier for the type, if any. Failure to
>
> s/typmod, the modifier of the type/type modifier/
>
> Because format_type() already uses "type modifier" and I think it helps
> searchability to use the same wording.

Yes, definitely better wording, thanks. V8 attached.

Best,

David




Attachment

Re: Patch: Add parse_type Function

From
Erik Wienhold
Date:
On 2024-02-21 16:14 +0100, David E. Wheeler wrote:
>
> V8 attached.

Thanks.  But it's an applefile again, not a patch :P

-- 
Erik



Re: Patch: Add parse_type Function

From
"David E. Wheeler"
Date:
On Feb 21, 2024, at 11:18, Erik Wienhold <ewie@ewie.name> wrote:

> Thanks.  But it's an applefile again, not a patch :P

WTF. I still have that feature disabled.

Oh, I think I deleted the file after dragged it into Mail but before sending, because it’s empty everywhere I look.
🤦🏻‍♂️Let’s try that again. 

Best,

David

Attachment

Re: Patch: Add parse_type Function

From
Erik Wienhold
Date:
On 2024-02-21 17:51 +0100, David E. Wheeler wrote:
> On Feb 21, 2024, at 11:18, Erik Wienhold <ewie@ewie.name> wrote:
> 
> > Thanks.  But it's an applefile again, not a patch :P
> 
> Let’s try that again.

Thanks.

> +        <function>to_regtypemod</function> ( <parameter>type</parameter> <type>text</type> )

The docs still state that to_regtypemod() has a named parameter, which
is not the case per pg_proc.dat.

Besides that, LGTM.

-- 
Erik



Re: Patch: Add parse_type Function

From
"David E. Wheeler"
Date:
On Feb 21, 2024, at 16:43, Erik Wienhold <ewie@ewie.name> wrote:

> The docs still state that to_regtypemod() has a named parameter, which
> is not the case per pg_proc.dat.

Bah, I cleaned it up before but somehow put it back. Thanks for the catch. Fixed.

Best,

David


Attachment

Re: Patch: Add parse_type Function

From
Erik Wienhold
Date:
On 2024-02-21 22:49 +0100, David E. Wheeler wrote:
> On Feb 21, 2024, at 16:43, Erik Wienhold <ewie@ewie.name> wrote:
> 
> > The docs still state that to_regtypemod() has a named parameter, which
> > is not the case per pg_proc.dat.
> 
> Bah, I cleaned it up before but somehow put it back. Thanks for the
> catch. Fixed.

Thanks David!  LGTM.

-- 
Erik



Re: Patch: Add parse_type Function

From
"David E. Wheeler"
Date:
On Feb 21, 2024, at 17:19, Erik Wienhold <ewie@ewie.name> wrote:

> Thanks David!  LGTM.

Thanks. Anyone else? Or is it ready for committer?

Best,

David




Re: Patch: Add parse_type Function

From
"David E. Wheeler"
Date:
On Feb 21, 2024, at 19:13, David E. Wheeler <david@justatheory.com> wrote:

> Thanks. Anyone else? Or is it ready for committer?

What’s the protocol for marking a patch ready for committer?

Thanks,

David




Re: Patch: Add parse_type Function

From
Jim Jones
Date:

On 24.02.24 14:46, David E. Wheeler wrote:
> What’s the protocol for marking a patch ready for committer?
I guess after the review of the last assigned reviewer


v9 applies cleanly, all tests pass and documentation builds correctly.

Just a very small observation:

The fact that a completely invalid type returns NULL ..

SELECT to_regtypemod('foo');
 to_regtypemod
---------------
              
(1 row)


.. but a "partially" valid one returns an error might be confusing

postgres=# SELECT to_regtypemod('timestamp(-4)');
ERROR:  syntax error at or near "-"
LINE 1: SELECT to_regtypemod('timestamp(-4)');
                  ^
CONTEXT:  invalid type name "timestamp(-4)"

postgres=# SELECT to_regtypemod('text(-4)');
ERROR:  type modifier is not allowed for type "text"


This behaviour is mentioned in the documentation, so I'd say it is ok.

+        <xref linkend="datatype-oid"/>). Failure to extract a valid
potential
+        type name results in an error; however, if the extracted name
is not
+        known to the system, this function will return
<literal>NULL</literal>.

I would personally prefer either NULL or an error in both cases, but I
can totally live with the current design.

OTOH, format_type returns "???" and it seems to be fine, so don't take
this comment too seriously :)

SELECT format_type(-1,-1);
 format_type
-------------
 ???
(1 row)


Other than that, LGTM.

Thanks David!

Best,
Jim

-- 
Jim




Re: Patch: Add parse_type Function

From
"David E. Wheeler"
Date:
On Feb 24, 2024, at 19:11, Jim Jones <jim.jones@uni-muenster.de> wrote:

>> What’s the protocol for marking a patch ready for committer?
>
> I guess after the review of the last assigned reviewer

Oh, I didn’t realize someone was assigned. :-)

> The fact that a completely invalid type returns NULL ..
>
> SELECT to_regtypemod('foo');
>  to_regtypemod
> ---------------
>
> (1 row)
>
>
> .. but a "partially" valid one returns an error might be confusing
>
> postgres=# SELECT to_regtypemod('timestamp(-4)');
> ERROR:  syntax error at or near "-"
> LINE 1: SELECT to_regtypemod('timestamp(-4)');
>                   ^
> CONTEXT:  invalid type name "timestamp(-4)"
>
> postgres=# SELECT to_regtypemod('text(-4)');
> ERROR:  type modifier is not allowed for type "text"

Yeah, there was quite a bit of discussion of this issue back in September[1].

> This behaviour is mentioned in the documentation, so I'd say it is ok.

This is my attempt to make it clearer that it can return an error, but I don’t love the wording TBH.

> I would personally prefer either NULL or an error in both cases, but I
> can totally live with the current design.

SAME.

Best,

David

[1] https://www.postgresql.org/message-id/flat/09F9CAD6-5096-43CC-B6A7-685703E4714D@justatheory.com




Re: Patch: Add parse_type Function

From
"David E. Wheeler"
Date:
Hello Hackers,

On Feb 25, 2024, at 13:00, David E. Wheeler <david@justatheory.com> wrote:

>> postgres=# SELECT to_regtypemod('timestamp(-4)');
>> ERROR:  syntax error at or near "-"
>> LINE 1: SELECT to_regtypemod('timestamp(-4)');
>>                  ^
>> CONTEXT:  invalid type name "timestamp(-4)"
>>
>> postgres=# SELECT to_regtypemod('text(-4)');
>> ERROR:  type modifier is not allowed for type "text"
>
> Yeah, there was quite a bit of discussion of this issue back in September[1].
>
>> This behaviour is mentioned in the documentation, so I'd say it is ok.
>
> This is my attempt to make it clearer that it can return an error, but I don’t love the wording TBH.

I’ve rebased the patch and, in an attempt to clarify this behavior, added a couple of examples to the docs for
to_regtype.Updated patch attached. 

Best,

David


Attachment

Re: Patch: Add parse_type Function

From
Erik Wienhold
Date:
Hi David,

On 2024-03-08 02:37 +0100, David E. Wheeler wrote:
> I’ve rebased the patch and, in an attempt to clarify this behavior,
> added a couple of examples to the docs for to_regtype. Updated patch
> attached.

On your latest addition:

> +        <xref linkend="datatype-oid"/>). Failure to extract a valid potential
> +        type name results in an error. For example:
> +<programlisting>
> +SELECT to_regtype('party');
> + to_regtype
> +------------
> +
> +</programlisting>
> +        However, if the extracted name is not known to the system, this function
> +        will return <literal>NULL</literal>. For example:
> +<programlisting>
> +SELECT to_regtype('interval nonesuch');
> +ERROR:  syntax error at or near "nonesuch"
> +LINE 1: select to_regtype('interval nonesuch');
> +                 ^
> +CONTEXT:  invalid type name "interval nonesuch"
> +</programlisting>

I think you need to swap the examples.  The text mentions the error case
first and the NULL case second.

-- 
Erik



Re: Patch: Add parse_type Function

From
"David E. Wheeler"
Date:
On Mar 7, 2024, at 23:39, Erik Wienhold <ewie@ewie.name> wrote:

> I think you need to swap the examples.  The text mentions the error case
> first and the NULL case second.

🤦🏻‍♂️

Thanks, fixed in the attached patch.

David


Attachment

Re: Patch: Add parse_type Function

From
Erik Wienhold
Date:
On 2024-03-09 02:42 +0100, David E. Wheeler wrote:
> On Mar 7, 2024, at 23:39, Erik Wienhold <ewie@ewie.name> wrote:
> 
> > I think you need to swap the examples.  The text mentions the error case
> > first and the NULL case second.
> 
> 🤦🏻‍♂️
> 
> Thanks, fixed in the attached patch.

Thanks, LGTM.

-- 
Erik



Re: Patch: Add parse_type Function

From
Tom Lane
Date:
"David E. Wheeler" <david@justatheory.com> writes:
> Thanks, fixed in the attached patch.

Pushed with some editorialization.  Mostly, I whacked the
documentation around pretty heavily: we have a convention for what
examples in function descriptions should look like, and this wasn't
it.  Not entirely your fault, since some nearby entries in that
table hadn't gotten the word either.

Also, I made a point of adding tests for a corner case that
I initially thought the patch would not get right:

SELECT format_type(to_regtype('bit'), to_regtypemod('bit'));
 format_type 
-------------
 bit(1)
(1 row)

SELECT format_type(to_regtype('"bit"'), to_regtypemod('"bit"'));
 format_type 
-------------
 "bit"
(1 row)

This comes from the comment in format_type():

                /*
                 * bit with typmod -1 is not the same as BIT, which means
                 * BIT(1) per SQL spec.  Report it as the quoted typename so
                 * that parser will not assign a bogus typmod.
                 */

My initial fear was that we'd have to emit NULL for no typmod in order
to round-trip the second case, but it seems to work as-is, so that's good.
I left it emitting -1 for no typmod, and documented that explicitly.

            regards, tom lane



Re: Patch: Add parse_type Function

From
"David E. Wheeler"
Date:
On Mar 20, 2024, at 17:23, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Pushed with some editorialization.  Mostly, I whacked the
> documentation around pretty heavily: we have a convention for what
> examples in function descriptions should look like, and this wasn't
> it.  Not entirely your fault, since some nearby entries in that
> table hadn't gotten the word either.

Ah, great, and your wording on the parser error issue is much better, thank you!

Best,

David