Thread: Thoughts on user-defined types for talk at Postgres conference?
I am doing a talk at Postgres Conf about Postgres’s type system. I already asked about this and got some great responses:
https://www.postgresql.org/message-id/flat/b82f8886db61a3395e6eab94981258274cdbacd3.camel%40cybertec.at#f13288b72bd2d564ee758c1a273652ad
Those responses discussed mostly gotchas with built-in types. Lots of good stuff, thanks, all!
But what *really* sets Postgres apart from comparable systems is user defined types. I would like to carefully lay out how to define and use a user-defined type (I don’t think I have time to dig into doing fancy stuff with C functions, so just the basic “user defined sum type”), but also any gotchas.
And I’d like to finish with some thoughts about when and how to use user-defined types. My feeling is that this feature is greatly under-used, mostly because it’s so non-standard. But AFAICT, user-defined types are fine and other than some ugliness due to SQL (mainly needing parentheses in some unexpected places), fields in a user defined type work perfectly well in Postgres’s SQL. I guess you’d need to pull them apart for values returned to clients, but that isn’t difficult.
So, any gotchas with user defined types? Any thoughts about designing with them?
https://www.postgresql.org/message-id/flat/b82f8886db61a3395e6eab94981258274cdbacd3.camel%40cybertec.at#f13288b72bd2d564ee758c1a273652ad
Those responses discussed mostly gotchas with built-in types. Lots of good stuff, thanks, all!
But what *really* sets Postgres apart from comparable systems is user defined types. I would like to carefully lay out how to define and use a user-defined type (I don’t think I have time to dig into doing fancy stuff with C functions, so just the basic “user defined sum type”), but also any gotchas.
And I’d like to finish with some thoughts about when and how to use user-defined types. My feeling is that this feature is greatly under-used, mostly because it’s so non-standard. But AFAICT, user-defined types are fine and other than some ugliness due to SQL (mainly needing parentheses in some unexpected places), fields in a user defined type work perfectly well in Postgres’s SQL. I guess you’d need to pull them apart for values returned to clients, but that isn’t difficult.
So, any gotchas with user defined types? Any thoughts about designing with them?
Greetings, * Guyren Howe (guyren@gmail.com) wrote: > But what *really* sets Postgres apart from comparable systems is user defined types. I would like to carefully lay outhow to define and use a user-defined type (I don’t think I have time to dig into doing fancy stuff with C functions, sojust the basic “user defined sum type”), but also any gotchas. What I take it you're referring to here are most specifically called "Composite Types", as discussed here: https://www.postgresql.org/docs/current/sql-createtype.html > And I’d like to finish with some thoughts about when and how to use user-defined types. My feeling is that this featureis greatly under-used, mostly because it’s so non-standard. But AFAICT, user-defined types are fine and other thansome ugliness due to SQL (mainly needing parentheses in some unexpected places), fields in a user defined type work perfectlywell in Postgres’s SQL. I guess you’d need to pull them apart for values returned to clients, but that isn’t difficult. Yeah, it's a bit grotty when you're trying to reference them (as you note with the comment about the extra parentheses) but it's also a bit painful building them up to supply as input (though not impossible). > So, any gotchas with user defined types? Any thoughts about designing with them? The biggest gotcha (which is a bit different from the usability issues, which we discuss above) from my perspective is the serialization of composite types- we use up a lot more space to store a composite type that looks like: CREATE TYPE twoint AS (a int, b int); then if we create a table as: CREATE TABLE t1 (a int, b int); Let's show this, follow the above two commands, do: CREATE TABLE t2 (c1 twoint); INSERT INTO t1 VALUES (1,2); INSERT INTO t2 VALUES ('(1,2)'); =*# select pg_column_size(a) from t1; pg_column_size ---------------- 4 (1 row) =*# select pg_column_size(b) from t1; pg_column_size ---------------- 4 (1 row) =*# select pg_column_size(a) from t2; pg_column_size ---------------- 29 (1 row) We can see it for the whole row too: =*# select pg_column_size(t1.*) from t1; pg_column_size ---------------- 32 (1 row) =*# select pg_column_size(t2.*) from t2; pg_column_size ---------------- 53 (1 row) That's an additional 21 bytes, which is really quite a lot. What's included in those 21 bytes are the length (since it's now a variable-length column to PG, unlike the case with the individual int columns in the table where we know from the table structure the length of them), the type information (typmod if there is one and the OID of the composite type), some other info, and then the actual data of the two int values inside of the composite type. I've often wondered if there's a way to reduce this overhead, as I do think that plus some improvements on the usability side would go a long way to making composite types more appealing to users. Still, they are certainly handy in some instances, I just can't recommend heavy use of them for large data sets where size is particularly important (such as in heavy OLTP environments) due to their overhead. Thanks! Stephen
Attachment
Thanks for that!
It seems as though a Composite Type having only fixed-length fields should be able to be regarded as a fixed-length value.
It seems as though a Composite Type having only fixed-length fields should be able to be regarded as a fixed-length value.
On 6 Mar 2024 at 09:45 -0800, Stephen Frost <sfrost@snowman.net>, wrote:
Greetings,
* Guyren Howe (guyren@gmail.com) wrote:But what *really* sets Postgres apart from comparable systems is user defined types. I would like to carefully lay out how to define and use a user-defined type (I don’t think I have time to dig into doing fancy stuff with C functions, so just the basic “user defined sum type”), but also any gotchas.
What I take it you're referring to here are most specifically called
"Composite Types", as discussed here:
https://www.postgresql.org/docs/current/sql-createtype.htmlAnd I’d like to finish with some thoughts about when and how to use user-defined types. My feeling is that this feature is greatly under-used, mostly because it’s so non-standard. But AFAICT, user-defined types are fine and other than some ugliness due to SQL (mainly needing parentheses in some unexpected places), fields in a user defined type work perfectly well in Postgres’s SQL. I guess you’d need to pull them apart for values returned to clients, but that isn’t difficult.
Yeah, it's a bit grotty when you're trying to reference them (as you
note with the comment about the extra parentheses) but it's also a bit
painful building them up to supply as input (though not impossible).So, any gotchas with user defined types? Any thoughts about designing with them?
The biggest gotcha (which is a bit different from the usability issues,
which we discuss above) from my perspective is the serialization of
composite types- we use up a lot more space to store a composite type
that looks like:
CREATE TYPE twoint AS (a int, b int);
then if we create a table as:
CREATE TABLE t1 (a int, b int);
Let's show this, follow the above two commands, do:
CREATE TABLE t2 (c1 twoint);
INSERT INTO t1 VALUES (1,2);
INSERT INTO t2 VALUES ('(1,2)');
=*# select pg_column_size(a) from t1;
pg_column_size
----------------
4
(1 row)
=*# select pg_column_size(b) from t1;
pg_column_size
----------------
4
(1 row)
=*# select pg_column_size(a) from t2;
pg_column_size
----------------
29
(1 row)
We can see it for the whole row too:
=*# select pg_column_size(t1.*) from t1;
pg_column_size
----------------
32
(1 row)
=*# select pg_column_size(t2.*) from t2;
pg_column_size
----------------
53
(1 row)
That's an additional 21 bytes, which is really quite a lot. What's
included in those 21 bytes are the length (since it's now a
variable-length column to PG, unlike the case with the individual int
columns in the table where we know from the table structure the length
of them), the type information (typmod if there is one and the OID of
the composite type), some other info, and then the actual data of the
two int values inside of the composite type.
I've often wondered if there's a way to reduce this overhead, as I do
think that plus some improvements on the usability side would go a long
way to making composite types more appealing to users. Still, they are
certainly handy in some instances, I just can't recommend heavy use of
them for large data sets where size is particularly important (such as
in heavy OLTP environments) due to their overhead.
Thanks!
Stephen
Greetings, * Guyren Howe (guyren@gmail.com) wrote: > Thanks for that! We'd generally prefer that responses are in-line (such as I'm doing here) and not top-posted (as you did on your response). This helps others reading the lists follow the discussion. > It seems as though a Composite Type having only fixed-length fields should be able to be regarded as a fixed-length value. ... and I nearly missed this comment due to the above. While I can understand where you're coming from with that, it's certainly not something the system is built to handle today and, at least from my recent review of this area, looks like there would be a fair bit of code involved to get to that point. I do agree with you that this would be very nice to have and perhaps someone will find interest and time to work on moving us in that direction in the future. Thanks! Stephen
Attachment
Stephen Frost <sfrost@snowman.net> writes: > * Guyren Howe (guyren@gmail.com) wrote: >> It seems as though a Composite Type having only fixed-length fields should be able to be regarded as a fixed-length value. > While I can understand where you're coming from with that, it's > certainly not something the system is built to handle today and, at > least from my recent review of this area, looks like there would be a > fair bit of code involved to get to that point. There are some other fundamental problems here: * To conclude that a tuple of only fixed-length column types is itself fixed-length, you need an additional assumption that all the columns are not-null. This is pretty problematic because we don't even have a way to enforce such a requirement against generic composite types; we really only promise to enforce column constraints against columns of physical tables. * Even if all the semantic enforcement of that were bulletproof, we'd still have a big problem with compatibility of the on-disk representation between current releases and a future release that thought it could elide some part of the tuple header for all-not-null stored composite values. I don't say that these problems are insoluble, but they do look pretty difficult. regards, tom lane
On 2024-03-06 12:45:33 -0500, Stephen Frost wrote: > =*# select pg_column_size(t2.*) from t2; > pg_column_size > ---------------- > 53 > (1 row) > > That's an additional 21 bytes, which is really quite a lot. What's > included in those 21 bytes are [...] > the type information (typmod if there is one and the OID of the > composite type), Is it necessary to store this in every row? Can a column contain different composite types? hp -- _ | Peter J. Holzer | Story must make more sense than reality. |_|_) | | | | | hjp@hjp.at | -- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!"
Attachment
"Peter J. Holzer" <hjp-pgsql@hjp.at> writes: > On 2024-03-06 12:45:33 -0500, Stephen Frost wrote: >> That's an additional 21 bytes, which is really quite a lot. What's >> included in those 21 bytes are >> [...] >> the type information (typmod if there is one and the OID of the >> composite type), > Is it necessary to store this in every row? Can a column contain > different composite types? Operations on type RECORD need to be able to ascertain which concrete rowtype they're dealing with, so yes the type info is critical in that case. Even without that, it wouldn't be terribly practical to insist on creating a separate copy of record_out (and every other function that accepts composite types) just so that it could have a hard-wired notion of what rowtype it's going to deal with. regards, tom lane