Thread: Thoughts on user-defined types for talk at Postgres conference?

Thoughts on user-defined types for talk at Postgres conference?

From
Guyren Howe
Date:
I am doing a talk at Postgres Conf about Postgres’s type system. I already asked about this and got some great responses: 

https://www.postgresql.org/message-id/flat/b82f8886db61a3395e6eab94981258274cdbacd3.camel%40cybertec.at#f13288b72bd2d564ee758c1a273652ad

Those responses discussed mostly gotchas with built-in types. Lots of good stuff, thanks, all!

But what *really* sets Postgres apart from comparable systems is user defined types. I would like to carefully lay out how to define and use a user-defined type (I don’t think I have time to dig into doing fancy stuff with C functions, so just the basic “user defined sum type”), but also any gotchas.

And I’d like to finish with some thoughts about when and how to use user-defined types. My feeling is that this feature is greatly under-used, mostly because it’s so non-standard. But AFAICT, user-defined types are fine and other than some ugliness due to SQL (mainly needing parentheses in some unexpected places), fields in a user defined type work perfectly well in Postgres’s SQL. I guess you’d need to pull them apart for values returned to clients, but that isn’t difficult.

So, any gotchas with user defined types? Any thoughts about designing with them?

Re: Thoughts on user-defined types for talk at Postgres conference?

From
Stephen Frost
Date:
Greetings,

* Guyren Howe (guyren@gmail.com) wrote:
> But what *really* sets Postgres apart from comparable systems is user defined types. I would like to carefully lay
outhow to define and use a user-defined type (I don’t think I have time to dig into doing fancy stuff with C functions,
sojust the basic “user defined sum type”), but also any gotchas. 

What I take it you're referring to here are most specifically called
"Composite Types", as discussed here:

https://www.postgresql.org/docs/current/sql-createtype.html

> And I’d like to finish with some thoughts about when and how to use user-defined types. My feeling is that this
featureis greatly under-used, mostly because it’s so non-standard. But AFAICT, user-defined types are fine and other
thansome ugliness due to SQL (mainly needing parentheses in some unexpected places), fields in a user defined type work
perfectlywell in Postgres’s SQL. I guess you’d need to pull them apart for values returned to clients, but that isn’t
difficult.

Yeah, it's a bit grotty when you're trying to reference them (as you
note with the comment about the extra parentheses) but it's also a bit
painful building them up to supply as input (though not impossible).

> So, any gotchas with user defined types? Any thoughts about designing with them?

The biggest gotcha (which is a bit different from the usability issues,
which we discuss above) from my perspective is the serialization of
composite types- we use up a lot more space to store a composite type
that looks like:

CREATE TYPE twoint AS (a int, b int);

then if we create a table as:

CREATE TABLE t1 (a int, b int);

Let's show this, follow the above two commands, do:

CREATE TABLE t2 (c1 twoint);

INSERT INTO t1 VALUES (1,2);
INSERT INTO t2 VALUES ('(1,2)');

=*# select pg_column_size(a) from t1;
 pg_column_size
----------------
              4
(1 row)

=*# select pg_column_size(b) from t1;
 pg_column_size
----------------
              4
(1 row)

=*# select pg_column_size(a) from t2;
 pg_column_size
----------------
             29
(1 row)

We can see it for the whole row too:

=*# select pg_column_size(t1.*) from t1;
 pg_column_size
----------------
             32
(1 row)

=*# select pg_column_size(t2.*) from t2;
 pg_column_size
----------------
             53
(1 row)

That's an additional 21 bytes, which is really quite a lot.  What's
included in those 21 bytes are the length (since it's now a
variable-length column to PG, unlike the case with the individual int
columns in the table where we know from the table structure the length
of them), the type information (typmod if there is one and the OID of
the composite type), some other info, and then the actual data of the
two int values inside of the composite type.

I've often wondered if there's a way to reduce this overhead, as I do
think that plus some improvements on the usability side would go a long
way to making composite types more appealing to users.  Still, they are
certainly handy in some instances, I just can't recommend heavy use of
them for large data sets where size is particularly important (such as
in heavy OLTP environments) due to their overhead.

Thanks!

Stephen

Attachment
Thanks for that!

It seems as though a Composite Type having only fixed-length fields should be able to be regarded as a fixed-length value.
On 6 Mar 2024 at 09:45 -0800, Stephen Frost <sfrost@snowman.net>, wrote:
Greetings,

* Guyren Howe (guyren@gmail.com) wrote:
But what *really* sets Postgres apart from comparable systems is user defined types. I would like to carefully lay out how to define and use a user-defined type (I don’t think I have time to dig into doing fancy stuff with C functions, so just the basic “user defined sum type”), but also any gotchas.

What I take it you're referring to here are most specifically called
"Composite Types", as discussed here:

https://www.postgresql.org/docs/current/sql-createtype.html

And I’d like to finish with some thoughts about when and how to use user-defined types. My feeling is that this feature is greatly under-used, mostly because it’s so non-standard. But AFAICT, user-defined types are fine and other than some ugliness due to SQL (mainly needing parentheses in some unexpected places), fields in a user defined type work perfectly well in Postgres’s SQL. I guess you’d need to pull them apart for values returned to clients, but that isn’t difficult.

Yeah, it's a bit grotty when you're trying to reference them (as you
note with the comment about the extra parentheses) but it's also a bit
painful building them up to supply as input (though not impossible).

So, any gotchas with user defined types? Any thoughts about designing with them?

The biggest gotcha (which is a bit different from the usability issues,
which we discuss above) from my perspective is the serialization of
composite types- we use up a lot more space to store a composite type
that looks like:

CREATE TYPE twoint AS (a int, b int);

then if we create a table as:

CREATE TABLE t1 (a int, b int);

Let's show this, follow the above two commands, do:

CREATE TABLE t2 (c1 twoint);

INSERT INTO t1 VALUES (1,2);
INSERT INTO t2 VALUES ('(1,2)');

=*# select pg_column_size(a) from t1;
pg_column_size
----------------
4
(1 row)

=*# select pg_column_size(b) from t1;
pg_column_size
----------------
4
(1 row)

=*# select pg_column_size(a) from t2;
pg_column_size
----------------
29
(1 row)

We can see it for the whole row too:

=*# select pg_column_size(t1.*) from t1;
pg_column_size
----------------
32
(1 row)

=*# select pg_column_size(t2.*) from t2;
pg_column_size
----------------
53
(1 row)

That's an additional 21 bytes, which is really quite a lot. What's
included in those 21 bytes are the length (since it's now a
variable-length column to PG, unlike the case with the individual int
columns in the table where we know from the table structure the length
of them), the type information (typmod if there is one and the OID of
the composite type), some other info, and then the actual data of the
two int values inside of the composite type.

I've often wondered if there's a way to reduce this overhead, as I do
think that plus some improvements on the usability side would go a long
way to making composite types more appealing to users. Still, they are
certainly handy in some instances, I just can't recommend heavy use of
them for large data sets where size is particularly important (such as
in heavy OLTP environments) due to their overhead.

Thanks!

Stephen

Re: Thoughts on user-defined types for talk at Postgres conference?

From
Stephen Frost
Date:
Greetings,

* Guyren Howe (guyren@gmail.com) wrote:
> Thanks for that!

We'd generally prefer that responses are in-line (such as I'm doing
here) and not top-posted (as you did on your response).  This helps
others reading the lists follow the discussion.

> It seems as though a Composite Type having only fixed-length fields should be able to be regarded as a fixed-length
value.

... and I nearly missed this comment due to the above.

While I can understand where you're coming from with that, it's
certainly not something the system is built to handle today and, at
least from my recent review of this area, looks like there would be a
fair bit of code involved to get to that point.

I do agree with you that this would be very nice to have and perhaps
someone will find interest and time to work on moving us in that
direction in the future.

Thanks!

Stephen

Attachment
Stephen Frost <sfrost@snowman.net> writes:
> * Guyren Howe (guyren@gmail.com) wrote:
>> It seems as though a Composite Type having only fixed-length fields should be able to be regarded as a fixed-length
value.

> While I can understand where you're coming from with that, it's
> certainly not something the system is built to handle today and, at
> least from my recent review of this area, looks like there would be a
> fair bit of code involved to get to that point.

There are some other fundamental problems here:

* To conclude that a tuple of only fixed-length column types is
itself fixed-length, you need an additional assumption that all
the columns are not-null.  This is pretty problematic because
we don't even have a way to enforce such a requirement against
generic composite types; we really only promise to enforce
column constraints against columns of physical tables.

* Even if all the semantic enforcement of that were bulletproof,
we'd still have a big problem with compatibility of the on-disk
representation between current releases and a future release
that thought it could elide some part of the tuple header for
all-not-null stored composite values.

I don't say that these problems are insoluble, but they do
look pretty difficult.

            regards, tom lane



Re: Thoughts on user-defined types for talk at Postgres conference?

From
"Peter J. Holzer"
Date:
On 2024-03-06 12:45:33 -0500, Stephen Frost wrote:
> =*# select pg_column_size(t2.*) from t2;
>  pg_column_size
> ----------------
>              53
> (1 row)
>
> That's an additional 21 bytes, which is really quite a lot.  What's
> included in those 21 bytes are
[...]
> the type information (typmod if there is one and the OID of the
> composite type),

Is it necessary to store this in every row? Can a column contain
different composite types?

        hp


--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Attachment
"Peter J. Holzer" <hjp-pgsql@hjp.at> writes:
> On 2024-03-06 12:45:33 -0500, Stephen Frost wrote:
>> That's an additional 21 bytes, which is really quite a lot.  What's
>> included in those 21 bytes are
>> [...]
>> the type information (typmod if there is one and the OID of the
>> composite type),

> Is it necessary to store this in every row? Can a column contain
> different composite types?

Operations on type RECORD need to be able to ascertain which concrete
rowtype they're dealing with, so yes the type info is critical in
that case.  Even without that, it wouldn't be terribly practical
to insist on creating a separate copy of record_out (and every other
function that accepts composite types) just so that it could have a
hard-wired notion of what rowtype it's going to deal with.

            regards, tom lane