Thread: Content for talk on Postgres Type System at PostgresConf
I am to talk about Postgres’s type system at PGConf:
I picked the issue because I think it’s poorly understood, greatly under-discussed, and an excellent way to empower postgres users.
I am reasonably conversant with the issue. I’m not looking for others to write the talk for me, but in order to make the best talk I can, I’m asking:
What would you want to see in such a talk?
I’m planning on covering:
- The built-in types that are underused and their advantages (eg inet)
- domains
- such things as details of arrays including multidimensional arrays
- user-defined types, their relationship to tables, and generally how to use them
I would spend most of the time discussing ways to make effective use of types. Some examples:
- defining functions of rows so the table can be used kind of like a set of objects (including the dot notation for invoking functions)
- using UDFs to make code clearer eg if you have an idiosyncratic functional index, define it using a function of the row, so it’s easy to get right when querying
- using UDFs as a kind of better domain. eg differentiating imperial from metric units by requiring an explicit constructor, not just accepting any old number
I would mention enumerated types, although I’m inclined to advise that their inflexibility (eg can’t delete or rearrange them) means that a related table is probably better (I’d be delighted to be proved wrong).
Custom Range Types are an interesting feature, but I can’t think of a good use case. Any good examples?
Automatic casting is a feature I’m aware of, but I’d be interested in any cool use cases.
Anything I’m missing? Any existing good discussions of the subject?
On Wed, 2024-02-28 at 17:08 -0800, Guyren Howe wrote: > I am to talk about Postgres’s type system at PGConf: > > https://postgresconf.org/conferences/2024/program/proposals/postgres-s-type-system > > I picked the issue because I think it’s poorly understood, greatly > under-discussed, and an excellent way to empower postgres users. > > What would you want to see in such a talk? It might be good to explain how "timestamp with time zone" works. That's often confusing for beginners, because it is different from other databases and arguably deviates from the SQL standard. Perhaps it would be good to warn people about using data types like "character", "time with time zone" and "money". Yours, Laurenz Albe
Re: Content for talk on Postgres Type System at PostgresConf
From
grimy.outshine830@aceecat.org
Date:
On Thu, Feb 29, 2024 at 10:11:03AM +0100, Laurenz Albe wrote: > It might be good to explain how "timestamp with time zone" works. > That's often confusing for beginners, because it is different from > other databases and arguably deviates from the SQL standard. The most confusing part is the name :-P It is natural for a beginner -- and I am not that far from being a beginner -- to assume a data type with this name must store a time zone somehow. -- Ian
In case I’m not correct on the issues with these types, myself, what are the misconceptions, or where might I find them for myself?
My current understanding:
My current understanding:
- character is fixed-length, blank-padded. Not sure when you’d want that, but it seems clear. Is the name just confusing?
- timestamptz is just converted to a timestamp in UTC. Folks might imagine that it stores the time zone but it doesn’t.
- time with time zone *does* store the time zone, but this isn’t actually useful and should be avoided (I’m not entirely sure why and the docs only gesture at the problems without stating them, IIRC)
- money is a fixed-point decimal value, the number of decimal places is locale determined. I’m not aware of any particular problems with that
On 29 Feb 2024 at 01:11 -0800, Laurenz Albe <laurenz.albe@cybertec.at>, wrote:
On Wed, 2024-02-28 at 17:08 -0800, Guyren Howe wrote:I am to talk about Postgres’s type system at PGConf:
https://postgresconf.org/conferences/2024/program/proposals/postgres-s-type-system
I picked the issue because I think it’s poorly understood, greatly
under-discussed, and an excellent way to empower postgres users.
What would you want to see in such a talk?
It might be good to explain how "timestamp with time zone" works.
That's often confusing for beginners, because it is different from
other databases and arguably deviates from the SQL standard.
Perhaps it would be good to warn people about using data types like
"character", "time with time zone" and "money".
Yours,
Laurenz Albe
On 2/29/24 1:38 PM, Guyren Howe wrote:
In case I’m not correct on the issues with these types, myself, what are the misconceptions, or where might I find them for myself?
My current understanding:
My current understanding:
- character is fixed-length, blank-padded. Not sure when you’d want that, but it seems clear. Is the name just confusing?
- timestamptz is just converted to a timestamp in UTC. Folks might imagine that it stores the time zone but it doesn’t.
- time with time zone *does* store the time zone, but this isn’t actually useful and should be avoided (I’m not entirely sure why and the docs only gesture at the problems without stating them, IIRC)
- money is a fixed-point decimal value, the number of decimal places is locale determined. I’m not aware of any particular problems with that
test=# show lc_monetary ;
lc_monetary
-------------
en_US.UTF8
(1 row)
test=# select 1000::money;
money
-----------
$1,000.00
(1 row)
test=# set lc_monetary = 'en_GB.utf8';
SET
test=# select 1000::money;
money
-----------
£1,000.00
On 29 Feb 2024 at 01:11 -0800, Laurenz Albe <laurenz.albe@cybertec.at>, wrote:On Wed, 2024-02-28 at 17:08 -0800, Guyren Howe wrote:I am to talk about Postgres’s type system at PGConf:
https://postgresconf.org/conferences/2024/program/proposals/postgres-s-type-system
I picked the issue because I think it’s poorly understood, greatly
under-discussed, and an excellent way to empower postgres users.
What would you want to see in such a talk?
It might be good to explain how "timestamp with time zone" works.
That's often confusing for beginners, because it is different from
other databases and arguably deviates from the SQL standard.
Perhaps it would be good to warn people about using data types like
"character", "time with time zone" and "money".
Yours,
Laurenz Albe
-- Adrian Klaver adrian.klaver@aklaver.com
On Thu, Feb 29, 2024 at 2:38 PM Guyren Howe <guyren@gmail.com> wrote:
In case I’m not correct on the issues with these types, myself, what are the misconceptions, or where might I find them for myself?
My current understanding:
- character is fixed-length, blank-padded. Not sure when you’d want that, but it seems clear. Is the name just confusing?
Character is not stored fixed length, it is stored variable width but it just so happens that the variable length is the same for all rows.
- timestamptz is just converted to a timestamp in UTC. Folks might imagine that it stores the time zone but it doesn’t.
You are missing the point then whenever converting a stored timestamptz to textual representation the action is stable because it involves looking up the session time zone setting. The same goes in reverse unless the textual representation includes the time zone specification to use.
And yes, people do believe that whatever timezone was in effect when the data was recorded is saved when indeed it is not; such information would need to be captured separately.
- time with time zone *does* store the time zone, but this isn’t actually useful and should be avoided (I’m not entirely sure why and the docs only gesture at the problems without stating them, IIRC)
No it doesn't store the time zone. Nor do the docs say they do. And clearly point out the issue that evaluating a time zone without both date and time inputs is basically pointless.
> money is a fixed-point decimal value, the number of decimal places is locale determined. I’m not aware of any particular problems with that
You forget about the currency symbol dynamic. Like with time zones the local session provides the context, not the stored data.
David J.
On 28 Feb 2024, at 17:08, Guyren Howe <guyren@gmail.com> wrote:
I am to talk about Postgres’s type system at PGConf:I picked the issue because I think it’s poorly understood, greatly under-discussed, and an excellent way to empower postgres users.I am reasonably conversant with the issue. I’m not looking for others to write the talk for me, but in order to make the best talk I can, I’m asking:What would you want to see in such a talk?I’m planning on covering:- The built-in types that are underused and their advantages (eg inet)- domains- such things as details of arrays including multidimensional arrays- user-defined types, their relationship to tables, and generally how to use themI would spend most of the time discussing ways to make effective use of types. Some examples:- defining functions of rows so the table can be used kind of like a set of objects (including the dot notation for invoking functions)- using UDFs to make code clearer eg if you have an idiosyncratic functional index, define it using a function of the row, so it’s easy to get right when querying- using UDFs as a kind of better domain. eg differentiating imperial from metric units by requiring an explicit constructor, not just accepting any old numberI would mention enumerated types, although I’m inclined to advise that their inflexibility (eg can’t delete or rearrange them) means that a related table is probably better (I’d be delighted to be proved wrong).Custom Range Types are an interesting feature, but I can’t think of a good use case. Any good examples?Automatic casting is a feature I’m aware of, but I’d be interested in any cool use cases.Anything I’m missing? Any existing good discussions of the subject?
Any gotchas about user defined types or arrays? I was only planning on discussing simple user defined types, without going into writing the optional features in C.
What is worth saying about custom range types? What even would be a good use case? A range of strings or ip address ranges, something like that, I suppose?
Is there a good case for enumerations versus a fk to a table? I guess if you’re sure you won’t need to change the values in the enumeration? Days of the week, that sort of thing.
"David G. Johnston" <david.g.johnston@gmail.com> writes: > On Thu, Feb 29, 2024 at 2:38 PM Guyren Howe <guyren@gmail.com> wrote: >> My current understanding: >> >> - character is fixed-length, blank-padded. Not sure when you’d want >> that, but it seems clear. Is the name just confusing? > Character is not stored fixed length, it is stored variable width but it > just so happens that the variable length is the same for all rows. It's nastier than that: the restriction is that the length in *characters* be the same in all rows. In a variable-width encoding like UTF8 the length in bytes can vary. This means that the optimization that CHARACTER was meant to enable (ie "the physical width of this column is fixed, so you don't have to store a length indication") doesn't work for us, making it just about totally useless. BTW, the SQL spec words the constraint the same way, making me wonder if any modern RDBMS can use this type in the way it was designed. >> - time with time zone *does* store the time zone, but this isn’t >> actually useful and should be avoided (I’m not entirely sure why and the >> docs only gesture at the problems without stating them, IIRC) > No it doesn't store the time zone. Nor do the docs say they do. And > clearly point out the issue that evaluating a time zone without both date > and time inputs is basically pointless. timetz *does* store a time zone, in the sense of storing a numeric offset from UTC (i.e., "so many minutes east or west of Greenwich"). The problem is that in most real-world applications your notion of "time zone" probably includes annual DST changes, which timetz can't represent. I don't say the type is completely useless, but its usefulness is a lot less than you might guess. >> money is a fixed-point decimal value, the number of decimal places is >> locale determined. I’m not aware of any particular problems with that > You forget about the currency symbol dynamic. Like with time zones the > local session provides the context, not the stored data. Yeah. The fact that the same stored value might look like 10.00 euros to one session and 1000 yen to another one is pretty catastrophic. The other nasty thing about money is that it's an int64 so it can't represent more than 2^63 pennies (for whatever a "penny" is). Now, that's still a Frickin Lot Of Money in any non-hyperinflated currency, but it's the sort of restriction that banks don't like to hear of. regards, tom lane
On 29 Feb 2024, at 14:51, Tom Lane <tgl@sss.pgh.pa.us> wrote:
- time with time zone *does* store the time zone, but this isn’t
actually useful and should be avoided (I’m not entirely sure why and the
docs only gesture at the problems without stating them, IIRC)No it doesn't store the time zone. Nor do the docs say they do. And
clearly point out the issue that evaluating a time zone without both date
and time inputs is basically pointless.
timetz *does* store a time zone, in the sense of storing a numeric
offset from UTC (i.e., "so many minutes east or west of Greenwich").
The problem is that in most real-world applications your notion of
"time zone" probably includes annual DST changes, which timetz can't
represent. I don't say the type is completely useless, but its
usefulness is a lot less than you might guess.
The closest I can come to this in the docs is:
"The appropriate time zone offset is recorded in the
time with time zone
value and is output as stored; it is not adjusted to the active time zone.”I expect to be submitting some documentation updates as part of this project, fwiw.
On 2/29/24 14:47, Guyren Howe wrote: > What is worth saying about custom range types? What even would be a good use case? A range of > strings or ip address ranges, something like that, I suppose? I wrote a blog post and an extension for an inetrange type. It's from 2016 so it might need some updates, but you are welcome to use anything you like for your talk: https://illuminatedcomputing.com/posts/2016/06/inet-range/ Yours, -- Paul ~{:-) pj@illuminatedcomputing.com
On Thu, 2024-02-29 at 13:38 -0800, Guyren Howe wrote: > what are the misconceptions, or where might I find them for myself? In addition to what was already said: > My current understanding: > * character is fixed-length, blank-padded. Not sure when you’d > want that, but it seems clear. Is the name just confusing? I find the semantics confusing: test=> SELECT 'a'::character(10); bpchar ════════════ a (1 row) Ok, it is 10 characters long. test=> SELECT length('a'::character(10)); length ════════ 1 (1 row) Or is it? test=> SELECT 'a'::character(10) || 'b'::character(10); ?column? ══════════ ab (1 row) And why is the result not 20 characters long, with spaces between "a" and "b"? Best avoid "character". > * timestamptz is just converted to a timestamp in UTC. Folks might > imagine that it stores the time zone but it doesn’t. Yes, and I find that lots of people are confused by that. You could talk about the interaction with the "timezone" parameter, and that it is not so much a timestamp with time zone, but an "absolute timestamp", and in combination with "timestamp" a great way to let the database handle the difficult task of time zone conversion for you. Yours, Laurenz Albe
On 3/1/24 01:18, Laurenz Albe wrote: > On Thu, 2024-02-29 at 13:38 -0800, Guyren Howe wrote: >> what are the misconceptions, or where might I find them for myself? > > In addition to what was already said: > >> My current understanding: >> * character is fixed-length, blank-padded. Not sure when you’d >> want that, but it seems clear. Is the name just confusing? > > I find the semantics confusing: > > test=> SELECT 'a'::character(10); > bpchar > ════════════ > a > (1 row) > > Ok, it is 10 characters long. > > test=> SELECT length('a'::character(10)); > length > ════════ > 1 > (1 row) > > Or is it? https://www.postgresql.org/docs/current/datatype-character.html "Values of type character are physically padded with spaces to the specified width n, and are stored and displayed that way. However, trailing spaces are treated as semantically insignificant and disregarded when comparing two values of type character. In collations where whitespace is significant, this behavior can produce unexpected results; for example SELECT 'a '::CHAR(2) collate "C" < E'a\n'::CHAR(2) returns true, even though C locale would consider a space to be greater than a newline. Trailing spaces are removed when converting a character value to one of the other string types. Note that trailing spaces are semantically significant in character varying and text values, and when using pattern matching, that is LIKE and regular expressions." > > test=> SELECT 'a'::character(10) || 'b'::character(10); > ?column? > ══════════ > ab > (1 row) > > And why is the result not 20 characters long, with spaces between "a" and "b"? SELECT pg_typeof('a'::character(10) || 'b'::character(10)); pg_typeof ----------- text This is covered by "Trailing spaces are removed when converting a character value to one of the other string types.". Though that still leaves you with: SELECT pg_typeof(('a'::character(10) || 'b'::character(10))::char(20)); pg_typeof ----------- character SELECT ('a'::character(10) || 'b'::character(10))::char(20); bpchar ---------------------- ab > > Best avoid "character". > >> * timestamptz is just converted to a timestamp in UTC. Folks might >> imagine that it stores the time zone but it doesn’t. > > Yes, and I find that lots of people are confused by that. > > You could talk about the interaction with the "timezone" parameter, and > that it is not so much a timestamp with time zone, but an "absolute timestamp", > and in combination with "timestamp" a great way to let the database handle > the difficult task of time zone conversion for you. > > Yours, > Laurenz Albe > > > -- Adrian Klaver adrian.klaver@aklaver.com
Re: Content for talk on Postgres Type System at PostgresConf
From
grimy.outshine830@aceecat.org
Date:
On Thu, Feb 29, 2024 at 05:51:11PM -0500, Tom Lane wrote: > >> money is a fixed-point decimal value, the number of decimal > >> places is locale determined. I’m not aware of any particular > >> problems with that > > You forget about the currency symbol dynamic. Like with time zones > > the local session provides the context, not the stored data. > Yeah. The fact that the same stored value might look like 10.00 > euros to one session and 1000 yen to another one is pretty > catastrophic. The other nasty thing about money is that it's an > int64 so it can't represent more than 2^63 pennies (for whatever a > "penny" is). Now, that's still a Frickin Lot Of Money in any > non-hyperinflated currency, but it's the sort of restriction that > banks don't like to hear of. Lame excuse first: I have never used the money type, probably because I overheard a word like "catastrophic" in my early development :-) But, doesn't what Tom says above contradict Adrian's example session? Either the stored value is re-interpreted according to the locale context, or it isn't. IMO it would be *more* catastrophic if it wasn't, as it looks from Adrian's example. -- Ian
On 3/1/24 11:27, grimy.outshine830@aceecat.org wrote: > On Thu, Feb 29, 2024 at 05:51:11PM -0500, Tom Lane wrote: > >>>> money is a fixed-point decimal value, the number of decimal >>>> places is locale determined. I’m not aware of any particular >>>> problems with that > >>> You forget about the currency symbol dynamic. Like with time zones >>> the local session provides the context, not the stored data. > >> Yeah. The fact that the same stored value might look like 10.00 >> euros to one session and 1000 yen to another one is pretty >> catastrophic. The other nasty thing about money is that it's an >> int64 so it can't represent more than 2^63 pennies (for whatever a >> "penny" is). Now, that's still a Frickin Lot Of Money in any >> non-hyperinflated currency, but it's the sort of restriction that >> banks don't like to hear of. > > Lame excuse first: I have never used the money type, probably because > I overheard a word like "catastrophic" in my early development :-) > > But, doesn't what Tom says above contradict Adrian's example session? > Either the stored value is re-interpreted according to the locale > context, or it isn't. IMO it would be *more* catastrophic if it wasn't, > as it looks from Adrian's example. > test=# set lc_monetary='de_DE.utf8'; SET test=# select 10::money; money --------- 10,00 € (1 row) test=# set lc_monetary='ja_JP.utf8'; SET test=# select 10::money; money ------- ¥10 (1 row) -- Adrian Klaver adrian.klaver@aklaver.com
grimy.outshine830@aceecat.org writes: > On Thu, Feb 29, 2024 at 05:51:11PM -0500, Tom Lane wrote: >> Yeah. The fact that the same stored value might look like 10.00 >> euros to one session and 1000 yen to another one is pretty >> catastrophic. > But, doesn't what Tom says above contradict Adrian's example session? No, what he showed was correct. I'm talking about a different facet of the problem: postgres=# show lc_monetary; lc_monetary ------------- en_US.utf8 (1 row) postgres=# create table t (m money); CREATE TABLE postgres=# insert into t values('$1000.00'); INSERT 0 1 postgres=# table t; m ----------- $1,000.00 (1 row) postgres=# set lc_monetary = 'ja_JP.utf8'; SET postgres=# table t; m ----------- ¥100,000 (1 row) Even if that took account of the exchange rate, it'd not be great. But it doesn't; it's just the same digits reinterpreted with a new currency sign and possibly a different number of fractional digits. This might be sort of tolerable if your database only ever deals in one currency, but even then you'd likely want to lock down what that currency is. Making it be controlled by a user-set GUC was probably not a great idea. regards, tom lane
On Fri, 2024-03-01 at 08:14 -0800, Adrian Klaver wrote: > [pointers to the documentation] I know. I was not searching for help with PostgreSQL, I was trying to point out the surprising behavior of "character" as a suggestion for the talk. Yours, Laurenz Albe
Re: Content for talk on Postgres Type System at PostgresConf
From
grimy.outshine830@aceecat.org
Date:
On Fri, Mar 01, 2024 at 03:25:35PM -0500, Tom Lane wrote: > No, what he showed was correct. I'm talking about a different facet > of the problem: > ... > Even if that took account of the exchange rate, it'd not be great. > But it doesn't; it's just the same digits reinterpreted with a new > currency sign and possibly a different number of fractional digits. > This might be sort of tolerable if your database only ever deals in > one currency, but even then you'd likely want to lock down what that > currency is. Making it be controlled by a user-set GUC was probably > not a great idea. Yes, I get it now, thanks. Not useful, I agree. -- Ian