Thread: Content for talk on Postgres Type System at PostgresConf

Content for talk on Postgres Type System at PostgresConf

From
Guyren Howe
Date:
I am to talk about Postgres’s type system at PGConf:


I picked the issue because I think it’s poorly understood, greatly under-discussed, and an excellent way to empower postgres users.

I am reasonably conversant with the issue. I’m not looking for others to write the talk for me, but in order to make the best talk I can, I’m asking:

What would you want to see in such a talk?

I’m planning on covering:

- The built-in types that are underused and their advantages (eg inet)
- domains
- such things as details of arrays including multidimensional arrays
- user-defined types, their relationship to tables, and generally how to use them

I would spend most of the time discussing ways to make effective use of types. Some examples:
- defining functions of rows so the table can be used kind of like a set of objects (including the dot notation for invoking functions)
- using UDFs to make code clearer eg if you have an idiosyncratic functional index, define it using a function of the row, so it’s easy to get right when querying
- using UDFs as a kind of better domain. eg differentiating imperial from metric units by requiring an explicit constructor, not just accepting any old number

I would mention enumerated types, although I’m inclined to advise that their inflexibility (eg can’t delete or rearrange them) means that a related table is probably better (I’d be delighted to be proved wrong).

Custom Range Types are an interesting feature, but I can’t think of a good use case. Any good examples?

Automatic casting is a feature I’m aware of, but I’d be interested in any cool use cases.


Anything I’m missing? Any existing good discussions of the subject?

Re: Content for talk on Postgres Type System at PostgresConf

From
Laurenz Albe
Date:
On Wed, 2024-02-28 at 17:08 -0800, Guyren Howe wrote:
> I am to talk about Postgres’s type system at PGConf:
>
> https://postgresconf.org/conferences/2024/program/proposals/postgres-s-type-system
>
> I picked the issue because I think it’s poorly understood, greatly
> under-discussed, and an excellent way to empower postgres users.
>
> What would you want to see in such a talk?

It might be good to explain how "timestamp with time zone" works.
That's often confusing for beginners, because it is different from
other databases and arguably deviates from the SQL standard.

Perhaps it would be good to warn people about using data types like
"character", "time with time zone" and "money".

Yours,
Laurenz Albe



Re: Content for talk on Postgres Type System at PostgresConf

From
grimy.outshine830@aceecat.org
Date:
On Thu, Feb 29, 2024 at 10:11:03AM +0100, Laurenz Albe wrote:

> It might be good to explain how "timestamp with time zone" works.
> That's often confusing for beginners, because it is different from
> other databases and arguably deviates from the SQL standard.

The most confusing part is the name :-P

It is natural for a beginner -- and I am not that far from being a
beginner -- to assume a data type with this name must store a time
zone somehow.

-- 
Ian



Re: Content for talk on Postgres Type System at PostgresConf

From
Guyren Howe
Date:
In case I’m not correct on the issues with these types, myself, what are the misconceptions, or where might I find them for  myself?

My current understanding:
  • character is fixed-length, blank-padded. Not sure when you’d want that, but it seems clear. Is the name just confusing?
  • timestamptz is just converted to a timestamp in UTC. Folks might imagine that it stores the time zone but it doesn’t.
  • time with time zone *does* store the time zone, but this isn’t actually useful and should be avoided (I’m not entirely sure why and the docs only gesture at the problems without stating them, IIRC)
  • money is a fixed-point decimal value, the number of decimal places is locale determined. I’m not aware of any particular problems with that
On 29 Feb 2024 at 01:11 -0800, Laurenz Albe <laurenz.albe@cybertec.at>, wrote:
On Wed, 2024-02-28 at 17:08 -0800, Guyren Howe wrote:
I am to talk about Postgres’s type system at PGConf:

https://postgresconf.org/conferences/2024/program/proposals/postgres-s-type-system

I picked the issue because I think it’s poorly understood, greatly
under-discussed, and an excellent way to empower postgres users.

What would you want to see in such a talk?

It might be good to explain how "timestamp with time zone" works.
That's often confusing for beginners, because it is different from
other databases and arguably deviates from the SQL standard.

Perhaps it would be good to warn people about using data types like
"character", "time with time zone" and "money".

Yours,
Laurenz Albe

Re: Content for talk on Postgres Type System at PostgresConf

From
Adrian Klaver
Date:


On 2/29/24 1:38 PM, Guyren Howe wrote:
In case I’m not correct on the issues with these types, myself, what are the misconceptions, or where might I find them for  myself?

My current understanding:
  • character is fixed-length, blank-padded. Not sure when you’d want that, but it seems clear. Is the name just confusing?
  • timestamptz is just converted to a timestamp in UTC. Folks might imagine that it stores the time zone but it doesn’t.
The issue I see is people getting confused when it come back rotated to whatever TimeZone is set to, which can change from client to client.
  • time with time zone *does* store the time zone, but this isn’t actually useful and should be avoided (I’m not entirely sure why and the docs only gesture at the problems without stating them, IIRC)
  • money is a fixed-point decimal value, the number of decimal places is locale determined. I’m not aware of any particular problems with that


test=# show lc_monetary ;
 lc_monetary
-------------
 en_US.UTF8
(1 row)

test=# select 1000::money;
   money   
-----------
 $1,000.00
(1 row)

test=# set lc_monetary = 'en_GB.utf8';
SET
test=# select 1000::money;
   money   
-----------
 £1,000.00


    On 29 Feb 2024 at 01:11 -0800, Laurenz Albe <laurenz.albe@cybertec.at>, wrote:
    On Wed, 2024-02-28 at 17:08 -0800, Guyren Howe wrote:
    I am to talk about Postgres’s type system at PGConf:

    https://postgresconf.org/conferences/2024/program/proposals/postgres-s-type-system

    I picked the issue because I think it’s poorly understood, greatly
    under-discussed, and an excellent way to empower postgres users.

    What would you want to see in such a talk?

    It might be good to explain how "timestamp with time zone" works.
    That's often confusing for beginners, because it is different from
    other databases and arguably deviates from the SQL standard.

    Perhaps it would be good to warn people about using data types like
    "character", "time with time zone" and "money".

    Yours,
    Laurenz Albe
    -- 
    Adrian Klaver
    adrian.klaver@aklaver.com

    Re: Content for talk on Postgres Type System at PostgresConf

    From
    "David G. Johnston"
    Date:
    On Thu, Feb 29, 2024 at 2:38 PM Guyren Howe <guyren@gmail.com> wrote:
    In case I’m not correct on the issues with these types, myself, what are the misconceptions, or where might I find them for  myself?

    My current understanding:
    • character is fixed-length, blank-padded. Not sure when you’d want that, but it seems clear. Is the name just confusing?
    Character is not stored fixed length, it is stored variable width but it just so happens that the variable length is the same for all rows.

     
    • timestamptz is just converted to a timestamp in UTC. Folks might imagine that it stores the time zone but it doesn’t.
    You are missing the point then whenever converting a stored timestamptz to textual representation the action is stable because it involves looking up the session time zone setting.  The same goes in reverse unless the textual representation includes the time zone specification to use.

    And yes, people do believe that whatever timezone was in effect when the data was recorded is saved when indeed it is not; such information would need to be captured separately.
    • time with time zone *does* store the time zone, but this isn’t actually useful and should be avoided (I’m not entirely sure why and the docs only gesture at the problems without stating them, IIRC)
    No it doesn't store the time zone.  Nor do the docs say they do.  And clearly point out the issue that evaluating a time zone without both date and time inputs is basically pointless.

     money is a fixed-point decimal value, the number of decimal places is locale determined. I’m not aware of any particular problems with that

    You forget about the currency symbol dynamic. Like with time zones the local session provides the context, not the stored data.

    David J.

    Re: Content for talk on Postgres Type System at PostgresConf

    From
    Guyren Howe
    Date:
    On 28 Feb 2024, at 17:08, Guyren Howe <guyren@gmail.com> wrote:

    I am to talk about Postgres’s type system at PGConf:


    I picked the issue because I think it’s poorly understood, greatly under-discussed, and an excellent way to empower postgres users.

    I am reasonably conversant with the issue. I’m not looking for others to write the talk for me, but in order to make the best talk I can, I’m asking:

    What would you want to see in such a talk?

    I’m planning on covering:

    - The built-in types that are underused and their advantages (eg inet)
    - domains
    - such things as details of arrays including multidimensional arrays
    - user-defined types, their relationship to tables, and generally how to use them

    I would spend most of the time discussing ways to make effective use of types. Some examples:
    - defining functions of rows so the table can be used kind of like a set of objects (including the dot notation for invoking functions)
    - using UDFs to make code clearer eg if you have an idiosyncratic functional index, define it using a function of the row, so it’s easy to get right when querying
    - using UDFs as a kind of better domain. eg differentiating imperial from metric units by requiring an explicit constructor, not just accepting any old number

    I would mention enumerated types, although I’m inclined to advise that their inflexibility (eg can’t delete or rearrange them) means that a related table is probably better (I’d be delighted to be proved wrong).

    Custom Range Types are an interesting feature, but I can’t think of a good use case. Any good examples?

    Automatic casting is a feature I’m aware of, but I’d be interested in any cool use cases.


    Anything I’m missing? Any existing good discussions of the subject?

    Any gotchas about user defined types or arrays? I was only planning on discussing simple user defined types, without going into writing the optional features in C.

    What is worth saying about custom range types? What even would be a good use case? A range of strings or ip address ranges, something like that, I suppose?

    Is there a good case for enumerations versus a fk to a table? I guess if you’re sure you won’t need to change the values in the enumeration? Days of the week, that sort of thing.

    Re: Content for talk on Postgres Type System at PostgresConf

    From
    Tom Lane
    Date:
    "David G. Johnston" <david.g.johnston@gmail.com> writes:
    > On Thu, Feb 29, 2024 at 2:38 PM Guyren Howe <guyren@gmail.com> wrote:
    >> My current understanding:
    >> 
    >> - character is fixed-length, blank-padded. Not sure when you’d want
    >> that, but it seems clear. Is the name just confusing?
    
    > Character is not stored fixed length, it is stored variable width but it
    > just so happens that the variable length is the same for all rows.
    
    It's nastier than that: the restriction is that the length in
    *characters* be the same in all rows.  In a variable-width encoding
    like UTF8 the length in bytes can vary.  This means that the
    optimization that CHARACTER was meant to enable (ie "the physical
    width of this column is fixed, so you don't have to store a length
    indication") doesn't work for us, making it just about totally
    useless.
    
    BTW, the SQL spec words the constraint the same way, making me wonder
    if any modern RDBMS can use this type in the way it was designed.
    
    >> - time with time zone *does* store the time zone, but this isn’t
    >> actually useful and should be avoided (I’m not entirely sure why and the
    >> docs only gesture at the problems without stating them, IIRC)
    
    > No it doesn't store the time zone.  Nor do the docs say they do.  And
    > clearly point out the issue that evaluating a time zone without both date
    > and time inputs is basically pointless.
    
    timetz *does* store a time zone, in the sense of storing a numeric
    offset from UTC (i.e., "so many minutes east or west of Greenwich").
    The problem is that in most real-world applications your notion of
    "time zone" probably includes annual DST changes, which timetz can't
    represent.  I don't say the type is completely useless, but its
    usefulness is a lot less than you might guess.
    
    >> money is a fixed-point decimal value, the number of decimal places is
    >> locale determined. I’m not aware of any particular problems with that
    
    > You forget about the currency symbol dynamic. Like with time zones the
    > local session provides the context, not the stored data.
    
    Yeah.  The fact that the same stored value might look like 10.00 euros
    to one session and 1000 yen to another one is pretty catastrophic.
    The other nasty thing about money is that it's an int64 so it can't
    represent more than 2^63 pennies (for whatever a "penny" is).  Now,
    that's still a Frickin Lot Of Money in any non-hyperinflated currency,
    but it's the sort of restriction that banks don't like to hear of.
    
                regards, tom lane
    
    
    
    

    Re: Content for talk on Postgres Type System at PostgresConf

    From
    Guyren Howe
    Date:
    On 29 Feb 2024, at 14:51, Tom Lane <tgl@sss.pgh.pa.us> wrote:

    - time with time zone *does* store the time zone, but this isn’t
    actually useful and should be avoided (I’m not entirely sure why and the
    docs only gesture at the problems without stating them, IIRC)

    No it doesn't store the time zone.  Nor do the docs say they do.  And
    clearly point out the issue that evaluating a time zone without both date
    and time inputs is basically pointless.

    timetz *does* store a time zone, in the sense of storing a numeric
    offset from UTC (i.e., "so many minutes east or west of Greenwich").
    The problem is that in most real-world applications your notion of
    "time zone" probably includes annual DST changes, which timetz can't
    represent.  I don't say the type is completely useless, but its
    usefulness is a lot less than you might guess.

    The closest I can come to this in the docs is:

    "The appropriate time zone offset is recorded in the time with time zone value and is output as stored; it is not adjusted to the active time zone.”

    I expect to be submitting some documentation updates as part of this project, fwiw.

    Re: Content for talk on Postgres Type System at PostgresConf

    From
    Paul Jungwirth
    Date:
    On 2/29/24 14:47, Guyren Howe wrote:
    > What is worth saying about custom range types? What even would be a good use case? A range of 
    > strings or ip address ranges, something like that, I suppose?
    
    I wrote a blog post and an extension for an inetrange type. It's from 2016 so it might need some 
    updates, but you are welcome to use anything you like for your talk:
    
    https://illuminatedcomputing.com/posts/2016/06/inet-range/
    
    Yours,
    
    -- 
    Paul              ~{:-)
    pj@illuminatedcomputing.com
    
    
    
    

    Re: Content for talk on Postgres Type System at PostgresConf

    From
    Laurenz Albe
    Date:
    On Thu, 2024-02-29 at 13:38 -0800, Guyren Howe wrote:
    > what are the misconceptions, or where might I find them for  myself?
    
    In addition to what was already said:
    
    > My current understanding:
    >  * character is fixed-length, blank-padded. Not sure when you’d
    >    want that, but it seems clear. Is the name just confusing?
    
    I find the semantics confusing:
    
      test=> SELECT 'a'::character(10);
         bpchar
      ════════════
       a
      (1 row)
    
    Ok, it is 10 characters long.
    
      test=> SELECT length('a'::character(10));
       length
      ════════
            1
      (1 row)
    
    Or is it?
    
      test=> SELECT 'a'::character(10) || 'b'::character(10);
       ?column?
      ══════════
       ab
      (1 row)
    
    And why is the result not 20 characters long, with spaces between "a" and "b"?
    
    Best avoid "character".
    
    >  * timestamptz is just converted to a timestamp in UTC. Folks might
    >    imagine that it stores the time zone but it doesn’t.
    
    Yes, and I find that lots of people are confused by that.
    
    You could talk about the interaction with the "timezone" parameter, and
    that it is not so much a timestamp with time zone, but an "absolute timestamp",
    and in combination with "timestamp" a great way to let the database handle
    the difficult task of time zone conversion for you.
    
    Yours,
    Laurenz Albe
    
    
    
    
    

    Re: Content for talk on Postgres Type System at PostgresConf

    From
    Adrian Klaver
    Date:
    On 3/1/24 01:18, Laurenz Albe wrote:
    > On Thu, 2024-02-29 at 13:38 -0800, Guyren Howe wrote:
    >> what are the misconceptions, or where might I find them for  myself?
    > 
    > In addition to what was already said:
    > 
    >> My current understanding:
    >>   * character is fixed-length, blank-padded. Not sure when you’d
    >>     want that, but it seems clear. Is the name just confusing?
    > 
    > I find the semantics confusing:
    > 
    >    test=> SELECT 'a'::character(10);
    >       bpchar
    >    ════════════
    >     a
    >    (1 row)
    > 
    > Ok, it is 10 characters long.
    > 
    >    test=> SELECT length('a'::character(10));
    >     length
    >    ════════
    >          1
    >    (1 row)
    > 
    > Or is it?
    
    https://www.postgresql.org/docs/current/datatype-character.html
    
    "Values of type character are physically padded with spaces to the 
    specified width n, and are stored and displayed that way. However, 
    trailing spaces are treated as semantically insignificant and 
    disregarded when comparing two values of type character. In collations 
    where whitespace is significant, this behavior can produce unexpected 
    results; for example SELECT 'a '::CHAR(2) collate "C" < E'a\n'::CHAR(2) 
    returns true, even though C locale would consider a space to be greater 
    than a newline. Trailing spaces are removed when converting a character 
    value to one of the other string types. Note that trailing spaces are 
    semantically significant in character varying and text values, and when 
    using pattern matching, that is LIKE and regular expressions."
    
    >
    >    test=> SELECT 'a'::character(10) || 'b'::character(10);
    >     ?column?
    >    ══════════
    >     ab
    >    (1 row)
    > 
    > And why is the result not 20 characters long, with spaces between "a" and "b"?
    
    SELECT pg_typeof('a'::character(10) || 'b'::character(10));
      pg_typeof
    -----------
      text
    
    This is covered by  "Trailing spaces are removed when converting a 
    character value to one of the other string types.".
    
    Though that still leaves you with:
    
    SELECT pg_typeof(('a'::character(10) || 'b'::character(10))::char(20));
      pg_typeof
    -----------
      character
    
    SELECT ('a'::character(10) || 'b'::character(10))::char(20);
             bpchar
    ----------------------
      ab
    
    
    
    > 
    > Best avoid "character".
    > 
    >>   * timestamptz is just converted to a timestamp in UTC. Folks might
    >>     imagine that it stores the time zone but it doesn’t.
    > 
    > Yes, and I find that lots of people are confused by that.
    > 
    > You could talk about the interaction with the "timezone" parameter, and
    > that it is not so much a timestamp with time zone, but an "absolute timestamp",
    > and in combination with "timestamp" a great way to let the database handle
    > the difficult task of time zone conversion for you.
    > 
    > Yours,
    > Laurenz Albe
    > 
    > 
    > 
    
    -- 
    Adrian Klaver
    adrian.klaver@aklaver.com
    
    
    
    
    

    Re: Content for talk on Postgres Type System at PostgresConf

    From
    grimy.outshine830@aceecat.org
    Date:
    On Thu, Feb 29, 2024 at 05:51:11PM -0500, Tom Lane wrote:
    
    > >> money is a fixed-point decimal value, the number of decimal
    > >> places is locale determined. I’m not aware of any particular
    > >> problems with that
    
    > > You forget about the currency symbol dynamic. Like with time zones
    > > the local session provides the context, not the stored data.
    
    > Yeah.  The fact that the same stored value might look like 10.00
    > euros to one session and 1000 yen to another one is pretty
    > catastrophic.  The other nasty thing about money is that it's an
    > int64 so it can't represent more than 2^63 pennies (for whatever a
    > "penny" is).  Now, that's still a Frickin Lot Of Money in any
    > non-hyperinflated currency, but it's the sort of restriction that
    > banks don't like to hear of.
    
    Lame excuse first: I have never used the money type, probably because
    I overheard a word like "catastrophic" in my early development :-)
    
    But, doesn't what Tom says above contradict Adrian's example session?
    Either the stored value is re-interpreted according to the locale
    context, or it isn't. IMO it would be *more* catastrophic if it wasn't,
    as it looks from Adrian's example.
    
    -- 
    Ian
    
    
    
    

    Re: Content for talk on Postgres Type System at PostgresConf

    From
    Adrian Klaver
    Date:
    On 3/1/24 11:27, grimy.outshine830@aceecat.org wrote:
    > On Thu, Feb 29, 2024 at 05:51:11PM -0500, Tom Lane wrote:
    > 
    >>>> money is a fixed-point decimal value, the number of decimal
    >>>> places is locale determined. I’m not aware of any particular
    >>>> problems with that
    > 
    >>> You forget about the currency symbol dynamic. Like with time zones
    >>> the local session provides the context, not the stored data.
    > 
    >> Yeah.  The fact that the same stored value might look like 10.00
    >> euros to one session and 1000 yen to another one is pretty
    >> catastrophic.  The other nasty thing about money is that it's an
    >> int64 so it can't represent more than 2^63 pennies (for whatever a
    >> "penny" is).  Now, that's still a Frickin Lot Of Money in any
    >> non-hyperinflated currency, but it's the sort of restriction that
    >> banks don't like to hear of.
    > 
    > Lame excuse first: I have never used the money type, probably because
    > I overheard a word like "catastrophic" in my early development :-)
    > 
    > But, doesn't what Tom says above contradict Adrian's example session?
    > Either the stored value is re-interpreted according to the locale
    > context, or it isn't. IMO it would be *more* catastrophic if it wasn't,
    > as it looks from Adrian's example.
    > 
    
    test=# set lc_monetary='de_DE.utf8';
    SET
    test=# select 10::money;
       money
    ---------
      10,00 €
    (1 row)
    
    test=# set lc_monetary='ja_JP.utf8';
    SET
    test=# select 10::money;
      money
    -------
       ¥10
    (1 row)
    
    
    
    -- 
    Adrian Klaver
    adrian.klaver@aklaver.com
    
    
    
    
    

    Re: Content for talk on Postgres Type System at PostgresConf

    From
    Tom Lane
    Date:
    grimy.outshine830@aceecat.org writes:
    > On Thu, Feb 29, 2024 at 05:51:11PM -0500, Tom Lane wrote:
    >> Yeah.  The fact that the same stored value might look like 10.00
    >> euros to one session and 1000 yen to another one is pretty
    >> catastrophic.
    
    > But, doesn't what Tom says above contradict Adrian's example session?
    
    No, what he showed was correct.  I'm talking about a different
    facet of the problem:
    
    postgres=# show lc_monetary;
     lc_monetary 
    -------------
     en_US.utf8
    (1 row)
    
    postgres=# create table t (m money);
    CREATE TABLE
    postgres=# insert into t values('$1000.00');
    INSERT 0 1
    postgres=# table t;
         m     
    -----------
     $1,000.00
    (1 row)
    
    postgres=# set lc_monetary = 'ja_JP.utf8';
    SET
    postgres=# table t;
         m     
    -----------
     ¥100,000
    (1 row)
    
    Even if that took account of the exchange rate, it'd not be great.
    But it doesn't; it's just the same digits reinterpreted with a new
    currency sign and possibly a different number of fractional digits.
    This might be sort of tolerable if your database only ever deals in
    one currency, but even then you'd likely want to lock down what
    that currency is.  Making it be controlled by a user-set GUC was
    probably not a great idea.
    
                regards, tom lane
    
    
    
    

    Re: Content for talk on Postgres Type System at PostgresConf

    From
    Laurenz Albe
    Date:
    On Fri, 2024-03-01 at 08:14 -0800, Adrian Klaver wrote:
    
    > [pointers to the documentation]
    
    I know.  I was not searching for help with PostgreSQL,
    I was trying to point out the surprising behavior of
    "character" as a suggestion for the talk.
    
    Yours,
    Laurenz Albe
    
    
    
    

    Re: Content for talk on Postgres Type System at PostgresConf

    From
    grimy.outshine830@aceecat.org
    Date:
    On Fri, Mar 01, 2024 at 03:25:35PM -0500, Tom Lane wrote:
    
    > No, what he showed was correct.  I'm talking about a different facet
    > of the problem:
    
    > ...
    
    > Even if that took account of the exchange rate, it'd not be great.
    > But it doesn't; it's just the same digits reinterpreted with a new
    > currency sign and possibly a different number of fractional digits.
    > This might be sort of tolerable if your database only ever deals in
    > one currency, but even then you'd likely want to lock down what that
    > currency is.  Making it be controlled by a user-set GUC was probably
    > not a great idea.
    
    Yes, I get it now, thanks. Not useful, I agree.
    
    -- 
    Ian