Thread: Patch: Add parse_type Function
Hackers, Attached is a patch to add a new function, `parse_type()`. It parses a type string and returns a record with the typid andtypmod for the type, or raises an error if the type is invalid. It’s effectively a thin layer over the parser’s parseTypeString()function. The purpose of this function is to allow uses to resolve any type name, including aliases, to its formal name as renderedby, for example, pg_dump. An example: david=# WITH s(s) AS ( SELECT * FROM unnest(ARRAY[ 'timestamp(4)', 'interval(0)', 'interval second(0)', 'timestamptz', 'timestamptz(6)', 'varchar', 'varchar(128)' ]) ), p(type, typid, typmod) AS ( SELECT s, ((parse_type(s)).*) FROM s ) SELECT type, format_type(typid, typmod) FROM p; type | format_type --------------------+-------------------------------- timestamp(4) | timestamp(4) without time zone interval(0) | interval(0) interval second(0) | interval second(0) timestamptz | timestamp with time zone timestamptz(6) | timestamp(6) with time zone varchar | character varying varchar(128) | character varying(128) (7 rows) The use case leading to this patch was pgTAP column data type assertions. pgTAP traditionally required full type names fortesting data types, but recently added support for aliases. This broke for some types, because it was difficult to extractthe typmod correctly, and even when we did, it failed for types such as `interval second(0)`, where `second (0)` isthe typmod, and there was no sensible way to access the bit mask to generate the proper typmod to pass to `format_type()`. Erik Wienhold wrote this function to solve the problem, and I volunteered to submit a patch. Potential issues and questions for this patch: * Is `parse_type()` the right name to use? I can see arguments for `parse_type_string()`, `pg_parse_type()`, and `pg_parse_type_string()`.Whatever the consensus is works for me. * The function returns a record, which is a little fussy. I could see adding a `format_type_string()` function that justcontains `SELECT format_type(p.typmod, p.typid) FROM parse_type($1) p` and returns the formatted type. But I think itmight also be useful to have access to the typmod and typid directly via this method, since they’re already exposed as`format_type()`’s arguments. * Is format_type.c the right home for the function? I put it there because I closely associate it with format_type(), buthappy to move it to a more appropriate location. * I tried to link to `format_type` from the docs entry for `parse_type`, and added an ID for the former, but I’m not sureit resolves. Best, David
Attachment
Hi
ne 4. 2. 2024 v 18:51 odesílatel David E. Wheeler <david@justatheory.com> napsal:
Hackers,
Attached is a patch to add a new function, `parse_type()`. It parses a type string and returns a record with the typid and typmod for the type, or raises an error if the type is invalid. It’s effectively a thin layer over the parser’s parseTypeString() function.
The purpose of this function is to allow uses to resolve any type name, including aliases, to its formal name as rendered by, for example, pg_dump. An example:
david=# WITH s(s) AS (
SELECT * FROM unnest(ARRAY[
'timestamp(4)',
'interval(0)',
'interval second(0)',
'timestamptz',
'timestamptz(6)',
'varchar',
'varchar(128)'
])
),
p(type, typid, typmod) AS (
SELECT s, ((parse_type(s)).*)
FROM s
)
SELECT type, format_type(typid, typmod) FROM p;
type | format_type
--------------------+--------------------------------
timestamp(4) | timestamp(4) without time zone
interval(0) | interval(0)
interval second(0) | interval second(0)
timestamptz | timestamp with time zone
timestamptz(6) | timestamp(6) with time zone
varchar | character varying
varchar(128) | character varying(128)
(7 rows)
The use case leading to this patch was pgTAP column data type assertions. pgTAP traditionally required full type names for testing data types, but recently added support for aliases. This broke for some types, because it was difficult to extract the typmod correctly, and even when we did, it failed for types such as `interval second(0)`, where `second (0)` is the typmod, and there was no sensible way to access the bit mask to generate the proper typmod to pass to `format_type()`.
Erik Wienhold wrote this function to solve the problem, and I volunteered to submit a patch.
Potential issues and questions for this patch:
* Is `parse_type()` the right name to use? I can see arguments for `parse_type_string()`, `pg_parse_type()`, and `pg_parse_type_string()`. Whatever the consensus is works for me.
there can be an analogy with parse_ident, so parse_type looks ok
* The function returns a record, which is a little fussy. I could see adding a `format_type_string()` function that just contains `SELECT format_type(p.typmod, p.typid) FROM parse_type($1) p` and returns the formatted type. But I think it might also be useful to have access to the typmod and typid directly via this method, since they’re already exposed as `format_type()`’s arguments.
I think so record has sense for this case
* Is format_type.c the right home for the function? I put it there because I closely associate it with format_type(), but happy to move it to a more appropriate location.
yes
* I tried to link to `format_type` from the docs entry for `parse_type`, and added an ID for the former, but I’m not sure it resolves.
I thinks so proposed functionality can be useful
Regards
Pavel
Best,
David
On Feb 4, 2024, at 13:02, Pavel Stehule <pavel.stehule@gmail.com> wrote: > I thinks so proposed functionality can be useful Great, thank you! Is that a review? :-) D
ne 4. 2. 2024 v 19:30 odesílatel David E. Wheeler <david@justatheory.com> napsal:
On Feb 4, 2024, at 13:02, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> I thinks so proposed functionality can be useful
Great, thank you!
Is that a review? :-)
not yet, but I'll do it
Pavel
D
On Feb 4, 2024, at 13:52, Pavel Stehule <pavel.stehule@gmail.com> wrote: > not yet, but I'll do it Nice, thank you. I put it into the Commitfest. https://commitfest.postgresql.org/47/4807/ Best, David
Hi David, Thanks for the patch. On 04.02.24 18:51, David E. Wheeler wrote: > Hackers, > > Attached is a patch to add a new function, `parse_type()`. It parses a type string and returns a record with the typidand typmod for the type, or raises an error if the type is invalid. It’s effectively a thin layer over the parser’sparseTypeString() function. > > The purpose of this function is to allow uses to resolve any type name, including aliases, to its formal name as renderedby, for example, pg_dump. An example: > > david=# WITH s(s) AS ( > SELECT * FROM unnest(ARRAY[ > 'timestamp(4)', > 'interval(0)', > 'interval second(0)', > 'timestamptz', > 'timestamptz(6)', > 'varchar', > 'varchar(128)' > ]) > ), > p(type, typid, typmod) AS ( > SELECT s, ((parse_type(s)).*) > FROM s > ) > SELECT type, format_type(typid, typmod) FROM p; > type | format_type > --------------------+-------------------------------- > timestamp(4) | timestamp(4) without time zone > interval(0) | interval(0) > interval second(0) | interval second(0) > timestamptz | timestamp with time zone > timestamptz(6) | timestamp(6) with time zone > varchar | character varying > varchar(128) | character varying(128) > (7 rows) > > > The use case leading to this patch was pgTAP column data type assertions. pgTAP traditionally required full type namesfor testing data types, but recently added support for aliases. This broke for some types, because it was difficultto extract the typmod correctly, and even when we did, it failed for types such as `interval second(0)`, where `second(0)` is the typmod, and there was no sensible way to access the bit mask to generate the proper typmod to pass to`format_type()`. > > Erik Wienhold wrote this function to solve the problem, and I volunteered to submit a patch. > > Potential issues and questions for this patch: > > * Is `parse_type()` the right name to use? I can see arguments for `parse_type_string()`, `pg_parse_type()`, and `pg_parse_type_string()`.Whatever the consensus is works for me. > > * The function returns a record, which is a little fussy. I could see adding a `format_type_string()` function that justcontains `SELECT format_type(p.typmod, p.typid) FROM parse_type($1) p` and returns the formatted type. But I think itmight also be useful to have access to the typmod and typid directly via this method, since they’re already exposed as`format_type()`’s arguments. > > * Is format_type.c the right home for the function? I put it there because I closely associate it with format_type(), buthappy to move it to a more appropriate location. > > * I tried to link to `format_type` from the docs entry for `parse_type`, and added an ID for the former, but I’m not sureit resolves. > > Best, > > David > The patch applies cleanly Checking patch doc/src/sgml/func.sgml... Checking patch src/backend/utils/adt/format_type.c... Checking patch src/include/catalog/pg_proc.dat... Checking patch src/include/utils/builtins.h... Checking patch src/test/regress/expected/create_type.out... Checking patch src/test/regress/sql/create_type.sql... Applied patch doc/src/sgml/func.sgml cleanly. Applied patch src/backend/utils/adt/format_type.c cleanly. Applied patch src/include/catalog/pg_proc.dat cleanly. Applied patch src/include/utils/builtins.h cleanly. Applied patch src/test/regress/expected/create_type.out cleanly. Applied patch src/test/regress/sql/create_type.sql cleanly. The docs render just fine and all tests pass (check and check-world). There are a few issues though: 1) The function causes a crash when called with a NULL parameter: SELECT * FROM parse_type(NULL); server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Failed. The connection to the server was lost. Attempting reset: Failed. You have to check it in the beginning of function and either return an error message or just NULL, e.g if (PG_ARGISNULL(0)) PG_RETURN_NULL(); 2) Add a function call with NULL in the regression tests SELECT * FROM parse_type(NULL); 3) Add the expected behaviour of calling the function with NULL in the docs (error message or null) 4) The name of the returned columns is repeated in the docs > Returns a record with two fields, <parameter>typid</parameter> and <parameter>typid</parameter> -- Jim
On Feb 5, 2024, at 08:02, Jim Jones <jim.jones@uni-muenster.de> wrote: > There are a few issues though: Excellent, thank you very much! Updated patch attached. Best, David
Attachment
Jim Jones <jim.jones@uni-muenster.de> writes: > 1) The function causes a crash when called with a NULL parameter: > > SELECT * FROM parse_type(NULL); > server closed the connection unexpectedly > This probably means the server terminated abnormally > before or while processing the request. > The connection to the server was lost. Attempting reset: Failed. > The connection to the server was lost. Attempting reset: Failed. > > You have to check it in the beginning of function and either return an > error message or just NULL, e.g > > if (PG_ARGISNULL(0)) > PG_RETURN_NULL(); Instead of handling this in the function body, the function should be declared as strict. This is in fact the default in pg_proc.h, /* strict with respect to NULLs? */ bool proisstrict BKI_DEFAULT(t); so removing the explicit proisstrict => 'f' from the pg_proc.dat entry will fix it with no additional code. > 2) Add a function call with NULL in the regression tests > > SELECT * FROM parse_type(NULL); > > 3) Add the expected behaviour of calling the function with NULL in the > docs (error message or null) Once the function is declared strict, I don't think either of these is necessary: function strictness is tested elsewhere, and it's the default behaviour. The only functions that explicitly say they return NULL on NULL inputs are quote_literal (because you might expect it to return the string 'NULL', but there's qoute_nullable for that) and xmlexists (which I don't see any particular reason for). - ilmari
On 05.02.24 15:10, David E. Wheeler wrote: > Excellent, thank you very much! Updated patch attached. > > Best, > > David v2 no longer crashes with a null parameter. docs and regress tests were updated accordingly. patch no longer applies cleanly (tiny little indentation issue): /home/jim/Downloads/v2-0001-Add-parse_type-SQL-function.patch:140: indent with spaces. PG_RETURN_NULL(); warning: 1 line adds whitespace errors. I read the comments again, and something is not entirely clear to me. Line 494 says "Raises an error on an invalid type." and 501 says "Returns NULL for an invalid type." Perhaps merging both comment blocks and rephrasing these sentences would make things clearer? -- Jim
On 05.02.24 15:32, Dagfinn Ilmari Mannsåker wrote: > Once the function is declared strict, I don't think either of these is > necessary: function strictness is tested elsewhere, and it's the default > behaviour. The only functions that explicitly say they return NULL on > NULL inputs are quote_literal (because you might expect it to return the > string 'NULL', but there's qoute_nullable for that) and xmlexists (which > I don't see any particular reason for). > > - ilmari > > +1 Yes, if the function was strict (which in the current design is not the case) there is no need to check for nulls. -- Jim
On Feb 5, 2024, at 09:49, Jim Jones <jim.jones@uni-muenster.de> wrote: > Yes, if the function was strict (which in the current design is not the > case) there is no need to check for nulls. Right, done, and cleaned up the redundant comments. Best, David
Attachment
On 05.02.24 16:14, David E. Wheeler wrote: > Right, done, and cleaned up the redundant comments. > > Best, > > David Nice. The patch applies cleanly and it no longer crashes with NULL parameters. Docs render fine and regression tests pass. I have nothing more to add. Let's now wait for Pavel's review. Thanks! -- Jim
On Feb 5, 2024, at 10:47 AM, Jim Jones <jim.jones@uni-muenster.de> wrote: > The patch applies cleanly and it no longer crashes with NULL parameters. > Docs render fine and regression tests pass. I have nothing more to add. > Let's now wait for Pavel's review. Much obliged! D
+ /* + * Parse type-name argument to obtain type OID and encoded typmod. We don't + * need to check for parseTypeString failure, but just let the error be + * raised. The 0 arg works both as the `Node *escontext` arg in Postgres 16 + * and the `bool missing_ok` arg in 9.4-15. + */ + (void) parseTypeString(type, &typid, &typmod, 0); if you are wondering around other code deal with NULL, ErrorSaveContext. NULL would be more correct? `(void) parseTypeString(type, &typid, &typmod, NULL);` also parseTypeString already explained the third argument, our comments can be simplified as: /* * Parse type-name argument to obtain type OID and encoded typmod. We don't * need to handle parseTypeString failure, just let the error be * raised. */ cosmetic issue. Looking around other error handling code, the format should be something like: ` if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE) ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("function returning record called in" "context that cannot accept type record"))); ` `PG_FUNCTION_INFO_V1(parse_type);` is unnecessary? based on the error information: https://cirrus-ci.com/task/5899457502380032 not sure why it only fails on windows. +#define PARSE_TYPE_STRING_COLS 2 /* Returns two columns. */ +#undef PARSE_TYPE_STRING_COLS Are these necessary? given that the comments already say return the OID and type modifier. + ( <parameter>typid</parameter> <type>oid</type>, + <parameter>typmod</parameter> <type>int4</type> ) here `int4` should be `integer`? commit message: `Also accounts for data typs that require the SQL grammar to be parsed:` except the typo issue, this sentence is still hard for me to understand. The `parse_type()` function uses the underlying `parseTypeString()` C function to parse a string representing a data type into a type ID and typmod suitabld for passing to `format_type()`. suitabld should be suitable. + <para> + Parses a string representing an SQL type declaration as used in a + <command>CREATE TABLE</command> statement, optionally schema-qualified. + Returns a record with two fields, <parameter>typid</parameter> and + <parameter>typmod</parameter>, representing the OID and modifier for the + type. These correspond to the parameters to pass to the + <link linkend="format_type"><function>format_type</function> function.</link> + </para> can be simplified: + <para> + Parses a string representing an SQL data type, optionally schema-qualified. + Returns a record with two fields, <parameter>typid</parameter> and + <parameter>typmod</parameter>, representing the OID and modifier for the + type. These correspond to the parameters to pass to the + <link linkend="format_type"><function>format_type</function> function.</link> + </para> (I don't have a strong opinion though)
Let me comment on some issues since I wrote the very first version of parse_type() on which David's patch is based. > On 2024-02-01 01:00 +0100, jian he <jian.universality@gmail.com> wrote: > > cosmetic issue. Looking around other error handling code, the format > should be something like: > ` > if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE) > ereport(ERROR, > (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > errmsg("function returning record called in" > "context that cannot accept type record"))); > ` +1 > `PG_FUNCTION_INFO_V1(parse_type);` > is unnecessary? > based on the error information: https://cirrus-ci.com/task/5899457502380032 > not sure why it only fails on windows. Yeah, it's not necessary for internal functions per [1]. It's a leftover from when this function was part of the pgtap extension. > +#define PARSE_TYPE_STRING_COLS 2 /* Returns two columns. */ > +#undef PARSE_TYPE_STRING_COLS > Are these necessary? > given that the comments already say return the OID and type modifier. Not sure what the convention here is but I found this in a couple of places, maybe even a tutorial on writing C functions. See `git grep '_COLS\s\+[1-9]'` for those instances. It's in line with my habit of avoiding magic constants. > + ( <parameter>typid</parameter> <type>oid</type>, > + <parameter>typmod</parameter> <type>int4</type> ) > here `int4` should be `integer`? +1 > commit message: > `Also accounts for data typs that require the SQL grammar to be parsed:` > except the typo issue, this sentence is still hard for me to understand. Yes, the sentence is rather handwavy. What is meant here is that this function also resolves types whose typmod is determined by the SQL parser or some step after that. There are types whose typmod is whatever value is found inside the parenthesis, e.g. bit(13) has typmod 13. Our first idea before coming up with that function was to do some string manipulation and extract the typmod from inside the parenthesis. But we soon found out that other typmods don't translate one to one, e.g. varchar(13) has typmod 17. The interval type is also special because the typmod is some bitmask encoding of e.g. 'second(0)'. Hence the mention of the SQL grammar. > + <para> > + Parses a string representing an SQL type declaration as used in a > + <command>CREATE TABLE</command> statement, optionally schema-qualified. > + Returns a record with two fields, <parameter>typid</parameter> and > + <parameter>typmod</parameter>, representing the OID and > modifier for the > + type. These correspond to the parameters to pass to the > + <link linkend="format_type"><function>format_type</function> > function.</link> > + </para> > > can be simplified: > + <para> > + Parses a string representing an SQL data type, optionally > schema-qualified. > + Returns a record with two fields, <parameter>typid</parameter> and > + <parameter>typmod</parameter>, representing the OID and > modifier for the > + type. These correspond to the parameters to pass to the > + <link linkend="format_type"><function>format_type</function> > function.</link> > + </para> > (I don't have a strong opinion though) Sounds better. The CREATE TABLE reference is superfluous. [1] https://www.postgresql.org/docs/current/xfunc-c.html#XFUNC-C-V1-CALL-CONV -- Erik
On Feb 10, 2024, at 20:52, Erik Wienhold <ewie@ewie.name> wrote: > > Let me comment on some issues since I wrote the very first version of > parse_type() on which David's patch is based. Thanks Erik. >> On 2024-02-01 01:00 +0100, jian he <jian.universality@gmail.com> wrote: >> if you are wondering around other code deal with NULL, ErrorSaveContext. >> NULL would be more correct? >> `(void) parseTypeString(type, &typid, &typmod, NULL);` Fixed. >> also parseTypeString already explained the third argument, our >> comments can be simplified as: >> /* >> * Parse type-name argument to obtain type OID and encoded typmod. We don't >> * need to handle parseTypeString failure, just let the error be >> * raised. >> */ Thanks, adopted that language. >> cosmetic issue. Looking around other error handling code, the format >> should be something like: >> ` >> if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE) >> ereport(ERROR, >> (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), >> errmsg("function returning record called in" >> "context that cannot accept type record"))); >> ` > > +1 I poked around and found some other examples, yes. I went with a single long line for errmsg following the example in adminpack.c[1] >> `PG_FUNCTION_INFO_V1(parse_type);` >> is unnecessary? >> based on the error information: https://cirrus-ci.com/task/5899457502380032 >> not sure why it only fails on windows. > > Yeah, it's not necessary for internal functions per [1]. It's a > leftover from when this function was part of the pgtap extension. Removed. >> +#define PARSE_TYPE_STRING_COLS 2 /* Returns two columns. */ >> +#undef PARSE_TYPE_STRING_COLS >> Are these necessary? >> given that the comments already say return the OID and type modifier. > > Not sure what the convention here is but I found this in a couple of > places, maybe even a tutorial on writing C functions. See > `git grep '_COLS\s\+[1-9]'` for those instances. It's in line with my > habit of avoiding magic constants. Left in place for now. > >> + ( <parameter>typid</parameter> <type>oid</type>, >> + <parameter>typmod</parameter> <type>int4</type> ) >> here `int4` should be `integer`? > > +1 Fixed. > Yes, the sentence is rather handwavy. What is meant here is that this > function also resolves types whose typmod is determined by the SQL > parser or some step after that. There are types whose typmod is > whatever value is found inside the parenthesis, e.g. bit(13) has typmod > 13. Our first idea before coming up with that function was to do some > string manipulation and extract the typmod from inside the parenthesis. > But we soon found out that other typmods don't translate one to one, > e.g. varchar(13) has typmod 17. The interval type is also special > because the typmod is some bitmask encoding of e.g. 'second(0)'. Hence > the mention of the SQL grammar. I adopted some of your language here --- and fixed the spelling errors :-) >> can be simplified: >> + <para> >> + Parses a string representing an SQL data type, optionally >> schema-qualified. >> + Returns a record with two fields, <parameter>typid</parameter> and >> + <parameter>typmod</parameter>, representing the OID and >> modifier for the >> + type. These correspond to the parameters to pass to the >> + <link linkend="format_type"><function>format_type</function> >> function.</link> >> + </para> >> (I don't have a strong opinion though) > > Sounds better. The CREATE TABLE reference is superfluous. Done. Best, David [1] https://github.com/postgres/postgres/blob/09eb633e1baa3b7cd7929f3cc77f9c46f63c20b1/contrib/adminpack/adminpack.c#L508-L511
Attachment
"David E. Wheeler" <david@justatheory.com> writes: > [ v4-0001-Add-parse_type-SQL-function.patch ] It strikes me that this is basically to_regtype() with the additional option to return the typmod. That leads to some questions: * Should we choose a name related to to_regtype? I don't have any immediate suggestions, but since you didn't seem entirely set on parse_type, maybe it's worth thinking along those lines. OTOH, to the extent that people would use this with format_type, maybe parse_type is fine. * Perhaps the type OID output argument should be declared regtype not plain OID? It wouldn't make any difference for passing it to format_type, but in other use-cases perhaps regtype would be more readable. It's not a deal-breaker either way of course, since you can cast oid to regtype or vice versa. * Maybe the code should be in adt/regproc.c not format_type.c. * Experience with to_regtype suggests strongly that people would prefer "return NULL" to failure for an unrecognized type name. Skimming the patch, I notice that the manual addition to builtins.h should be unnecessary: the pg_proc.dat entry should be enough to create an extern in fmgrprotos.h. Also, I'm pretty sure that reformat_dat_file.pl will think your pg_proc.dat entry is overly verbose. See https://www.postgresql.org/docs/devel/system-catalog-initial-data.html regards, tom lane
I wrote: > It strikes me that this is basically to_regtype() with the additional > option to return the typmod. That leads to some questions: BTW, another way that this problem could be approached is to use to_regtype() as-is, with a separate function to obtain the typmod: select format_type(to_regtype('timestamp(4)'), to_regtypmod('timestamp(4)')); This is intellectually ugly, since it implies parsing the same typename string twice. But on the other hand it avoids the notational pain and runtime overhead involved in using a record-returning function. So I think it might be roughly a wash for performance. Question to think about is which way is easier to use. I don't have an opinion particularly; just throwing the idea out there. regards, tom lane
po 12. 2. 2024 v 19:20 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
I wrote:
> It strikes me that this is basically to_regtype() with the additional
> option to return the typmod. That leads to some questions:
BTW, another way that this problem could be approached is to use
to_regtype() as-is, with a separate function to obtain the typmod:
select format_type(to_regtype('timestamp(4)'), to_regtypmod('timestamp(4)'));
This is intellectually ugly, since it implies parsing the same
typename string twice. But on the other hand it avoids the notational
pain and runtime overhead involved in using a record-returning
function. So I think it might be roughly a wash for performance.
Question to think about is which way is easier to use. I don't
have an opinion particularly; just throwing the idea out there.
+1
Pavel
regards, tom lane
On Feb 12, 2024, at 12:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "David E. Wheeler" <david@justatheory.com> writes: >> [ v4-0001-Add-parse_type-SQL-function.patch ] > > It strikes me that this is basically to_regtype() with the additional > option to return the typmod. That leads to some questions: Huh. I saw it more on a par with format_type(), at least in the output I expect. > * Should we choose a name related to to_regtype? I don't have any > immediate suggestions, but since you didn't seem entirely set on > parse_type, maybe it's worth thinking along those lines. OTOH, > to the extent that people would use this with format_type, maybe > parse_type is fine. Yeah, and the `to_*` functions return a single OID, not two fields. > * Perhaps the type OID output argument should be declared regtype > not plain OID? It wouldn't make any difference for passing it to > format_type, but in other use-cases perhaps regtype would be more > readable. It's not a deal-breaker either way of course, since > you can cast oid to regtype or vice versa. Sure. I used `oid` because that’s exactly what the argument to format_type() is called, so thought the parity there moreappropriate. Maybe its argument should be renamed to regtype? > * Maybe the code should be in adt/regproc.c not format_type.c. Happy to move it wherever makes the most sense. > * Experience with to_regtype suggests strongly that people would > prefer "return NULL" to failure for an unrecognized type name. Could do, but as with to_regtype() there’s an issue with error handling when it descends into the SQL parser. > Skimming the patch, I notice that the manual addition to > builtins.h should be unnecessary: the pg_proc.dat entry > should be enough to create an extern in fmgrprotos.h. Confirmed, will remove in next patch. > Also, I'm pretty sure that reformat_dat_file.pl will > think your pg_proc.dat entry is overly verbose. See > https://www.postgresql.org/docs/devel/system-catalog-initial-data.html There are a bunch of things it reformats: ❯ git diff --name-only src/include/catalog/pg_aggregate.dat src/include/catalog/pg_database.dat src/include/catalog/pg_proc.dat src/include/catalog/pg_type.dat src/include/utils/builtins.h And even in pg_proc.dat there are 13 blocks it reformats. I can submit with just my block formatted if you’d prefer. > BTW, another way that this problem could be approached is to use > to_regtype() as-is, with a separate function to obtain the typmod: > > select format_type(to_regtype('timestamp(4)'), to_regtypmod('timestamp(4)')); Oh interesting. > This is intellectually ugly, since it implies parsing the same > typename string twice. But on the other hand it avoids the notational > pain and runtime overhead involved in using a record-returning > function. So I think it might be roughly a wash for performance. > Question to think about is which way is easier to use. I don't > have an opinion particularly; just throwing the idea out there. Honestly I haven’t cared for the fact that parse_type() returns a record; it makes it a bit clunky. But yeah, so does thisto pass the same value to two function calls. Perhaps it makes sense to add to_regtypmod() as you suggest, but then also something like: CREATE FUNCTION format_type_string(text) RETURNS TEXT AS $$ SELECT format_type(to_regtype($1), to_regtypmod($1)) $$; Best, David
On Feb 12, 2024, at 15:55, David E. Wheeler <david@justatheory.com> wrote: > Happy to move it wherever makes the most sense. Update with a bunch of the suggested changes. Some questions still open from the previous post, though. Best, David
Attachment
On 2024-02-12 19:20 +0100, Tom Lane wrote: > I wrote: > > It strikes me that this is basically to_regtype() with the additional > > option to return the typmod. That leads to some questions: > > BTW, another way that this problem could be approached is to use > to_regtype() as-is, with a separate function to obtain the typmod: > > select format_type(to_regtype('timestamp(4)'), to_regtypmod('timestamp(4)')); > > This is intellectually ugly, since it implies parsing the same > typename string twice. But on the other hand it avoids the notational > pain and runtime overhead involved in using a record-returning > function. So I think it might be roughly a wash for performance. > Question to think about is which way is easier to use. I don't > have an opinion particularly; just throwing the idea out there. Out of curiosity, I benchmarked this with the attached to_regtypmod() patch based on David's v5 applied to a6c21887a9. The script running pgbench and its output are included at the end. Just calling parse_type() vs to_regtype()/to_regtypmod() is a wash for performance as you thought. But format_type() performs better with to_regtypmod() than with parse_type(). Accessing the record fields returned by parse_type() adds some overhead. to_regtypmod() is better for our use case in pgTAP which relies on format_type() to normalize the type name. The implementation of to_regtypmod() is also simpler than parse_type(). Usage-wise, both are clunky IMO. Benchmark script: #!/usr/bin/env bash set -eu cat <<'SQL' > parse_type.sql SELECT parse_type('interval second(0)'); SQL cat <<'SQL' > parse_type_and_format.sql SELECT format_type(p.typid, p.typmod) FROM parse_type('interval second(0)') p; SQL cat <<'SQL' > to_regtypmod.sql SELECT to_regtype('interval second(0)'), to_regtypmod('interval second(0)'); SQL cat <<'SQL' > to_regtypmod_and_format.sql SELECT format_type(to_regtype('interval second(0)'), to_regtypmod('interval second(0)')); SQL for f in \ parse_type.sql \ parse_type_and_format.sql \ to_regtypmod.sql \ to_regtypmod_and_format.sql do pgbench -n -f "$f" -T10 postgres echo done pgbench output: pgbench (17devel) transaction type: parse_type.sql scaling factor: 1 query mode: simple number of clients: 1 number of threads: 1 maximum number of tries: 1 duration: 10 s number of transactions actually processed: 277017 number of failed transactions: 0 (0.000%) latency average = 0.036 ms initial connection time = 1.623 ms tps = 27706.005513 (without initial connection time) pgbench (17devel) transaction type: parse_type_and_format.sql scaling factor: 1 query mode: simple number of clients: 1 number of threads: 1 maximum number of tries: 1 duration: 10 s number of transactions actually processed: 222487 number of failed transactions: 0 (0.000%) latency average = 0.045 ms initial connection time = 1.603 ms tps = 22252.095670 (without initial connection time) pgbench (17devel) transaction type: to_regtypmod.sql scaling factor: 1 query mode: simple number of clients: 1 number of threads: 1 maximum number of tries: 1 duration: 10 s number of transactions actually processed: 276134 number of failed transactions: 0 (0.000%) latency average = 0.036 ms initial connection time = 1.570 ms tps = 27617.628259 (without initial connection time) pgbench (17devel) transaction type: to_regtypmod_and_format.sql scaling factor: 1 query mode: simple number of clients: 1 number of threads: 1 maximum number of tries: 1 duration: 10 s number of transactions actually processed: 270820 number of failed transactions: 0 (0.000%) latency average = 0.037 ms initial connection time = 1.631 ms tps = 27086.331104 (without initial connection time) -- Erik
Attachment
Hi
ne 18. 2. 2024 v 19:50 odesílatel Erik Wienhold <ewie@ewie.name> napsal:
On 2024-02-12 19:20 +0100, Tom Lane wrote:
> I wrote:
> > It strikes me that this is basically to_regtype() with the additional
> > option to return the typmod. That leads to some questions:
>
> BTW, another way that this problem could be approached is to use
> to_regtype() as-is, with a separate function to obtain the typmod:
>
> select format_type(to_regtype('timestamp(4)'), to_regtypmod('timestamp(4)'));
>
> This is intellectually ugly, since it implies parsing the same
> typename string twice. But on the other hand it avoids the notational
> pain and runtime overhead involved in using a record-returning
> function. So I think it might be roughly a wash for performance.
> Question to think about is which way is easier to use. I don't
> have an opinion particularly; just throwing the idea out there.
Out of curiosity, I benchmarked this with the attached to_regtypmod()
patch based on David's v5 applied to a6c21887a9. The script running
pgbench and its output are included at the end.
Just calling parse_type() vs to_regtype()/to_regtypmod() is a wash for
performance as you thought. But format_type() performs better with
to_regtypmod() than with parse_type(). Accessing the record fields
returned by parse_type() adds some overhead.
to_regtypmod() is better for our use case in pgTAP which relies on
format_type() to normalize the type name. The implementation of
to_regtypmod() is also simpler than parse_type(). Usage-wise, both are
clunky IMO.
Benchmark script:
#!/usr/bin/env bash
set -eu
cat <<'SQL' > parse_type.sql
SELECT parse_type('interval second(0)');
SQL
cat <<'SQL' > parse_type_and_format.sql
SELECT format_type(p.typid, p.typmod) FROM parse_type('interval second(0)') p;
SQL
cat <<'SQL' > to_regtypmod.sql
SELECT to_regtype('interval second(0)'), to_regtypmod('interval second(0)');
SQL
cat <<'SQL' > to_regtypmod_and_format.sql
SELECT format_type(to_regtype('interval second(0)'), to_regtypmod('interval second(0)'));
SQL
for f in \
parse_type.sql \
parse_type_and_format.sql \
to_regtypmod.sql \
to_regtypmod_and_format.sql
do
pgbench -n -f "$f" -T10 postgres
echo
done
pgbench output:
pgbench (17devel)
transaction type: parse_type.sql
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
maximum number of tries: 1
duration: 10 s
number of transactions actually processed: 277017
number of failed transactions: 0 (0.000%)
latency average = 0.036 ms
initial connection time = 1.623 ms
tps = 27706.005513 (without initial connection time)
pgbench (17devel)
transaction type: parse_type_and_format.sql
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
maximum number of tries: 1
duration: 10 s
number of transactions actually processed: 222487
number of failed transactions: 0 (0.000%)
latency average = 0.045 ms
initial connection time = 1.603 ms
tps = 22252.095670 (without initial connection time)
pgbench (17devel)
transaction type: to_regtypmod.sql
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
maximum number of tries: 1
duration: 10 s
number of transactions actually processed: 276134
number of failed transactions: 0 (0.000%)
latency average = 0.036 ms
initial connection time = 1.570 ms
tps = 27617.628259 (without initial connection time)
pgbench (17devel)
transaction type: to_regtypmod_and_format.sql
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
maximum number of tries: 1
duration: 10 s
number of transactions actually processed: 270820
number of failed transactions: 0 (0.000%)
latency average = 0.037 ms
initial connection time = 1.631 ms
tps = 27086.331104 (without initial connection time)
The overhead of parse_type_and_format can be related to higher planning time. PL/pgSQL can assign composite without usage FROM clause.
Regards
Pavel
--
Erik
On 2024-02-18 20:00 +0100, Pavel Stehule wrote: > The overhead of parse_type_and_format can be related to higher planning > time. PL/pgSQL can assign composite without usage FROM clause. Thanks, didn't know that this makes a difference. In that case both variants are on par. BEGIN; CREATE FUNCTION format_with_parse_type(text) RETURNS text LANGUAGE plpgsql STABLE STRICT AS $$ DECLARE p record := parse_type($1); BEGIN RETURN format_type(p.typid, p.typmod); END $$; CREATE FUNCTION format_with_to_regtypmod(text) RETURNS text LANGUAGE plpgsql STABLE STRICT AS $$ BEGIN RETURN format_type(to_regtype($1), to_regtypmod($1)); END $$; COMMIT; Results: SELECT format_with_parse_type('interval second(0)'); pgbench (17devel) transaction type: format_with_parse_type.sql scaling factor: 1 query mode: simple number of clients: 1 number of threads: 1 maximum number of tries: 1 duration: 10 s number of transactions actually processed: 253530 number of failed transactions: 0 (0.000%) latency average = 0.039 ms initial connection time = 1.846 ms tps = 25357.551681 (without initial connection time) SELECT format_with_to_regtypmod('interval second(0)'); pgbench (17devel) transaction type: format_with_to_regtypmod.sql scaling factor: 1 query mode: simple number of clients: 1 number of threads: 1 maximum number of tries: 1 duration: 10 s number of transactions actually processed: 257942 number of failed transactions: 0 (0.000%) latency average = 0.039 ms initial connection time = 1.544 ms tps = 25798.015526 (without initial connection time) -- Erik
On Feb 18, 2024, at 15:55, Erik Wienhold <ewie@ewie.name> wrote: >> The overhead of parse_type_and_format can be related to higher planning >> time. PL/pgSQL can assign composite without usage FROM clause. > > Thanks, didn't know that this makes a difference. In that case both > variants are on par. Presumably this is a side-effect of the use of a RECORD here, which requires a FROM clause in pure SQL, yes? The only way I can think of to avoid that is to: 1. Add a to_regtypmod() for those who just want the typemod. 2. Add a format_type_string() function that returns a string, the equivalent of this function but in C: CREATE FUNCTION format_type_string(text) RETURNS TEXT AS $$ SELECT format_type(to_regtype($1), to_regtypmod($1)) $$; We could also keep the record-returning function for users who want both the regtype and the regytypemod at once, but withthe other two I consider it less essential. Thoughts? Best, David
"David E. Wheeler" <david@justatheory.com> writes: > The only way I can think of to avoid that is to: > 1. Add a to_regtypmod() for those who just want the typemod. Seems like there's a good case for doing that. > 2. Add a format_type_string() function that returns a string, the equivalent of this function but in C: > CREATE FUNCTION format_type_string(text) RETURNS TEXT AS $$ > SELECT format_type(to_regtype($1), to_regtypmod($1)) > $$; I'm less thrilled about that, mainly because I can't think of a good name for it ("format_type_string" is certainly not that). Is the use-case for this functionality really strong enough that we need to provide it as a single function rather than something assembled out of spare parts? regards, tom lane
I wrote: > I'm less thrilled about that, mainly because I can't think of > a good name for it ("format_type_string" is certainly not that). After some time ruminating, a couple of possibilities occurred to me: reformat_type(text) canonical_type_name(text) > Is the use-case for this functionality really strong enough that > we need to provide it as a single function rather than something > assembled out of spare parts? I'm still unconvinced about that, though. regards, tom lane
On Feb 19, 2024, at 15:47, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> 1. Add a to_regtypmod() for those who just want the typemod. > > Seems like there's a good case for doing that. I’ll work on that. > I'm less thrilled about that, mainly because I can't think of > a good name for it ("format_type_string" is certainly not that). > Is the use-case for this functionality really strong enough that > we need to provide it as a single function rather than something > assembled out of spare parts? Not essential for pgTAP, no, as we can just use the parts. It was the typmod bit that was missing. On Feb 19, 2024, at 17:13, Tom Lane <tgl@sss.pgh.pa.us> wrote: > After some time ruminating, a couple of possibilities occurred to > me: > reformat_type(text) > canonical_type_name(text) I was just thinking about hitting the thesaurus entry for “canonical”, but I quite like reformat_type. It’s super clear anddraws the parallel to format_type() more clearly. Will likely steal the name for pgTAP if we don’t add it to core. :-) > I'm still unconvinced about that, though. I guess the questions are: * What are the other use cases for it? * How obvious is it how to do it? For the latter, it could easily be an example in the docs. Best, David
On 2024-02-19 23:59 +0100, David E. Wheeler wrote: > On Feb 19, 2024, at 15:47, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > >> 1. Add a to_regtypmod() for those who just want the typemod. > > > > Seems like there's a good case for doing that. > > I’ll work on that. See the patch I wrote for my benchmarks. But it's pretty easy anyway to cut down parse_type() ;) > > I'm less thrilled about that, mainly because I can't think of a good > > name for it ("format_type_string" is certainly not that). Is the > > use-case for this functionality really strong enough that we need to > > provide it as a single function rather than something assembled out > > of spare parts? > > Not essential for pgTAP, no, as we can just use the parts. It was the > typmod bit that was missing. But you don't actually need reformat_type() in pgTAP. You can just get the type OID and modifier of the want_type and have_type and compare those. Then use format_type() for the error message. Looks a bit cleaner to me than doing the string comparison. On second thought, I guess comparing the reformatted type names is necessary in order to have a uniform API on older Postgres releases where pgTAP has to provide its own to_regtypmod() based on typmodin functions. > On Feb 19, 2024, at 17:13, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > After some time ruminating, a couple of possibilities occurred to > > me: reformat_type(text) canonical_type_name(text) > > I was just thinking about hitting the thesaurus entry for “canonical”, > but I quite like reformat_type. It’s super clear and draws the > parallel to format_type() more clearly. Will likely steal the name for > pgTAP if we don’t add it to core. :-) > > > I'm still unconvinced about that, though. > > I guess the questions are: > > * What are the other use cases for it? Can't think of any right now other than this are-type-names-the-same check. Perhaps also for pretty-printing user-provided type names where we don't take the type info from e.g. pg_attribute. > * How obvious is it how to do it? > > For the latter, it could easily be an example in the docs. Can be mentioned right under format_type(). -- Erik
On Feb 19, 2024, at 21:58, Erik Wienhold <ewie@ewie.name> wrote: > See the patch I wrote for my benchmarks. But it's pretty easy anyway to > cut down parse_type() ;) LOL, I missed that, just wrote it myself in the last hour. :-) v6 attached. > But you don't actually need reformat_type() in pgTAP. You can just get > the type OID and modifier of the want_type and have_type and compare > those. Then use format_type() for the error message. Looks a bit > cleaner to me than doing the string comparison. Fair. > On second thought, I guess comparing the reformatted type names is > necessary in order to have a uniform API on older Postgres releases > where pgTAP has to provide its own to_regtypmod() based on typmodin > functions. Maybe. Worth playing with. >> For the latter, it could easily be an example in the docs. > > Can be mentioned right under format_type(). Well I included it in the to_regtypemod docs here, but could so either. Best, David
Attachment
On Tue, Feb 20, 2024 at 11:06 AM David E. Wheeler <david@justatheory.com> wrote: > > LOL, I missed that, just wrote it myself in the last hour. :-) v6 attached. > +SELECT to_regtypemod('interval nonesuch'); -- grammar error expected +ERROR: syntax error at or near "nonesuch" +LINE 1: SELECT to_regtypemod('interval nonesuch'); + ^ +CONTEXT: invalid type name "interval nonesuch" +SELECT to_regtypemod('year(4)'); -- grammar error expected + to_regtypemod +--------------- + +(1 row) + the second hint `-- grammar error expected` seems to contradict with the results?
On Feb 20, 2024, at 01:30, jian he <jian.universality@gmail.com> wrote: > the second hint `-- grammar error expected` seems to contradict with > the results? Quite right, thank you, that’s actually a trapped error. I’ve tweaked the comments and their order in v7, attached. This goes back to the discussion of the error raising of to_regtype[1], so I’ve incorporated the patch from that thread intothis patch, and set up the docs for to_regtypemod() with similar information. The wording is still a little opaque formy taste, though, written more for someone who knows a bit about the internals, but it’s a start. I’ve also fixed the wayward parameter in the function signature in the docs, and added a note about why I’ve also patchedgenbki.pl. Best, David [1] https://www.postgresql.org/message-id/flat/57E1FDDC-5A38-452D-82D7-A44DA2E13862%40justatheory.com#1ae0b11634bc33c7ad3cd728e43d504e
Attachment
On 2024-02-20 15:44 +0100, David E. Wheeler wrote: > I’ve tweaked the comments and their order in v7, attached. > > This goes back to the discussion of the error raising of > to_regtype[1], so I’ve incorporated the patch from that thread into > this patch, and set up the docs for to_regtypemod() with similar > information. Makes sense. > [1] https://www.postgresql.org/message-id/flat/57E1FDDC-5A38-452D-82D7-A44DA2E13862%40justatheory.com#1ae0b11634bc33c7ad3cd728e43d504e > - <entry role="func_table_entry"><para role="func_signature"> > + <entry id="format_type" role="func_table_entry"><para role="func_signature"> > <indexterm> > <primary>format_type</primary> > </indexterm> > @@ -25462,7 +25462,7 @@ SELECT collation for ('foo' COLLATE "de_DE"); > </row> > > <row> > - <entry role="func_table_entry"><para role="func_signature"> > + <entry id="to_regtype" role="func_table_entry"><para role="func_signature"> The references are printed as "???" right now. Can be fixed with xreflabel="format_type" and xreflabel="to_regtype" on those <entry> elements. > + <function>to_regtypemod</function> ( <parameter>type</parameter> <type>text</type> ) The docs show parameter name "type" but pg_proc.data does not define proargnames. So the to_regtypemod() cannot be called using named notation: test=> select to_regtypemod(type => 'int'::text); ERROR: function to_regtypemod(type => text) does not exist > + Parses a string of text, extracts a potential type name from it, and > + translates its typmod, the modifier for the type, if any. Failure to s/typmod, the modifier of the type/type modifier/ Because format_type() already uses "type modifier" and I think it helps searchability to use the same wording. -- Erik
On Feb 20, 2024, at 21:09, Erik Wienhold <ewie@ewie.name> wrote: > The references are printed as "???" right now. Can be fixed with > xreflabel="format_type" and xreflabel="to_regtype" on those <entry> > elements. Thanks. > The docs show parameter name "type" but pg_proc.data does not define > proargnames. So the to_regtypemod() cannot be called using named > notation: > > test=> select to_regtypemod(type => 'int'::text); > ERROR: function to_regtypemod(type => text) does not exist Yes, this format is identical to that of to_regtype(): david=# select to_regtype(type => 'int'::text); ERROR: function to_regtype(type => text) does not exist > + Parses a string of text, extracts a potential type name from it, and >> + translates its typmod, the modifier for the type, if any. Failure to > > s/typmod, the modifier of the type/type modifier/ > > Because format_type() already uses "type modifier" and I think it helps > searchability to use the same wording. Yes, definitely better wording, thanks. V8 attached. Best, David
Attachment
On 2024-02-21 16:14 +0100, David E. Wheeler wrote: > > V8 attached. Thanks. But it's an applefile again, not a patch :P -- Erik
On Feb 21, 2024, at 11:18, Erik Wienhold <ewie@ewie.name> wrote: > Thanks. But it's an applefile again, not a patch :P WTF. I still have that feature disabled. Oh, I think I deleted the file after dragged it into Mail but before sending, because it’s empty everywhere I look. 🤦🏻♂️Let’s try that again. Best, David
Attachment
On 2024-02-21 17:51 +0100, David E. Wheeler wrote: > On Feb 21, 2024, at 11:18, Erik Wienhold <ewie@ewie.name> wrote: > > > Thanks. But it's an applefile again, not a patch :P > > Let’s try that again. Thanks. > + <function>to_regtypemod</function> ( <parameter>type</parameter> <type>text</type> ) The docs still state that to_regtypemod() has a named parameter, which is not the case per pg_proc.dat. Besides that, LGTM. -- Erik
On Feb 21, 2024, at 16:43, Erik Wienhold <ewie@ewie.name> wrote: > The docs still state that to_regtypemod() has a named parameter, which > is not the case per pg_proc.dat. Bah, I cleaned it up before but somehow put it back. Thanks for the catch. Fixed. Best, David
Attachment
On 2024-02-21 22:49 +0100, David E. Wheeler wrote: > On Feb 21, 2024, at 16:43, Erik Wienhold <ewie@ewie.name> wrote: > > > The docs still state that to_regtypemod() has a named parameter, which > > is not the case per pg_proc.dat. > > Bah, I cleaned it up before but somehow put it back. Thanks for the > catch. Fixed. Thanks David! LGTM. -- Erik
On Feb 21, 2024, at 17:19, Erik Wienhold <ewie@ewie.name> wrote: > Thanks David! LGTM. Thanks. Anyone else? Or is it ready for committer? Best, David
On Feb 21, 2024, at 19:13, David E. Wheeler <david@justatheory.com> wrote: > Thanks. Anyone else? Or is it ready for committer? What’s the protocol for marking a patch ready for committer? Thanks, David
On 24.02.24 14:46, David E. Wheeler wrote: > What’s the protocol for marking a patch ready for committer? I guess after the review of the last assigned reviewer v9 applies cleanly, all tests pass and documentation builds correctly. Just a very small observation: The fact that a completely invalid type returns NULL .. SELECT to_regtypemod('foo'); to_regtypemod --------------- (1 row) .. but a "partially" valid one returns an error might be confusing postgres=# SELECT to_regtypemod('timestamp(-4)'); ERROR: syntax error at or near "-" LINE 1: SELECT to_regtypemod('timestamp(-4)'); ^ CONTEXT: invalid type name "timestamp(-4)" postgres=# SELECT to_regtypemod('text(-4)'); ERROR: type modifier is not allowed for type "text" This behaviour is mentioned in the documentation, so I'd say it is ok. + <xref linkend="datatype-oid"/>). Failure to extract a valid potential + type name results in an error; however, if the extracted name is not + known to the system, this function will return <literal>NULL</literal>. I would personally prefer either NULL or an error in both cases, but I can totally live with the current design. OTOH, format_type returns "???" and it seems to be fine, so don't take this comment too seriously :) SELECT format_type(-1,-1); format_type ------------- ??? (1 row) Other than that, LGTM. Thanks David! Best, Jim -- Jim
On Feb 24, 2024, at 19:11, Jim Jones <jim.jones@uni-muenster.de> wrote: >> What’s the protocol for marking a patch ready for committer? > > I guess after the review of the last assigned reviewer Oh, I didn’t realize someone was assigned. :-) > The fact that a completely invalid type returns NULL .. > > SELECT to_regtypemod('foo'); > to_regtypemod > --------------- > > (1 row) > > > .. but a "partially" valid one returns an error might be confusing > > postgres=# SELECT to_regtypemod('timestamp(-4)'); > ERROR: syntax error at or near "-" > LINE 1: SELECT to_regtypemod('timestamp(-4)'); > ^ > CONTEXT: invalid type name "timestamp(-4)" > > postgres=# SELECT to_regtypemod('text(-4)'); > ERROR: type modifier is not allowed for type "text" Yeah, there was quite a bit of discussion of this issue back in September[1]. > This behaviour is mentioned in the documentation, so I'd say it is ok. This is my attempt to make it clearer that it can return an error, but I don’t love the wording TBH. > I would personally prefer either NULL or an error in both cases, but I > can totally live with the current design. SAME. Best, David [1] https://www.postgresql.org/message-id/flat/09F9CAD6-5096-43CC-B6A7-685703E4714D@justatheory.com
Hello Hackers, On Feb 25, 2024, at 13:00, David E. Wheeler <david@justatheory.com> wrote: >> postgres=# SELECT to_regtypemod('timestamp(-4)'); >> ERROR: syntax error at or near "-" >> LINE 1: SELECT to_regtypemod('timestamp(-4)'); >> ^ >> CONTEXT: invalid type name "timestamp(-4)" >> >> postgres=# SELECT to_regtypemod('text(-4)'); >> ERROR: type modifier is not allowed for type "text" > > Yeah, there was quite a bit of discussion of this issue back in September[1]. > >> This behaviour is mentioned in the documentation, so I'd say it is ok. > > This is my attempt to make it clearer that it can return an error, but I don’t love the wording TBH. I’ve rebased the patch and, in an attempt to clarify this behavior, added a couple of examples to the docs for to_regtype.Updated patch attached. Best, David
Attachment
Hi David, On 2024-03-08 02:37 +0100, David E. Wheeler wrote: > I’ve rebased the patch and, in an attempt to clarify this behavior, > added a couple of examples to the docs for to_regtype. Updated patch > attached. On your latest addition: > + <xref linkend="datatype-oid"/>). Failure to extract a valid potential > + type name results in an error. For example: > +<programlisting> > +SELECT to_regtype('party'); > + to_regtype > +------------ > + > +</programlisting> > + However, if the extracted name is not known to the system, this function > + will return <literal>NULL</literal>. For example: > +<programlisting> > +SELECT to_regtype('interval nonesuch'); > +ERROR: syntax error at or near "nonesuch" > +LINE 1: select to_regtype('interval nonesuch'); > + ^ > +CONTEXT: invalid type name "interval nonesuch" > +</programlisting> I think you need to swap the examples. The text mentions the error case first and the NULL case second. -- Erik
On Mar 7, 2024, at 23:39, Erik Wienhold <ewie@ewie.name> wrote: > I think you need to swap the examples. The text mentions the error case > first and the NULL case second. 🤦🏻♂️ Thanks, fixed in the attached patch. David
Attachment
On 2024-03-09 02:42 +0100, David E. Wheeler wrote: > On Mar 7, 2024, at 23:39, Erik Wienhold <ewie@ewie.name> wrote: > > > I think you need to swap the examples. The text mentions the error case > > first and the NULL case second. > > 🤦🏻♂️ > > Thanks, fixed in the attached patch. Thanks, LGTM. -- Erik
"David E. Wheeler" <david@justatheory.com> writes: > Thanks, fixed in the attached patch. Pushed with some editorialization. Mostly, I whacked the documentation around pretty heavily: we have a convention for what examples in function descriptions should look like, and this wasn't it. Not entirely your fault, since some nearby entries in that table hadn't gotten the word either. Also, I made a point of adding tests for a corner case that I initially thought the patch would not get right: SELECT format_type(to_regtype('bit'), to_regtypemod('bit')); format_type ------------- bit(1) (1 row) SELECT format_type(to_regtype('"bit"'), to_regtypemod('"bit"')); format_type ------------- "bit" (1 row) This comes from the comment in format_type(): /* * bit with typmod -1 is not the same as BIT, which means * BIT(1) per SQL spec. Report it as the quoted typename so * that parser will not assign a bogus typmod. */ My initial fear was that we'd have to emit NULL for no typmod in order to round-trip the second case, but it seems to work as-is, so that's good. I left it emitting -1 for no typmod, and documented that explicitly. regards, tom lane
On Mar 20, 2024, at 17:23, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Pushed with some editorialization. Mostly, I whacked the > documentation around pretty heavily: we have a convention for what > examples in function descriptions should look like, and this wasn't > it. Not entirely your fault, since some nearby entries in that > table hadn't gotten the word either. Ah, great, and your wording on the parser error issue is much better, thank you! Best, David