Re: [HACKERS] generated columns - Mailing list pgsql-hackers
From | Peter Eisentraut |
---|---|
Subject | Re: [HACKERS] generated columns |
Date | |
Msg-id | 79e4de23-52e3-2c45-9ba0-5f6932f590aa@2ndquadrant.com Whole thread Raw |
In response to | Re: [HACKERS] generated columns (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>) |
Responses |
Re: [HACKERS] generated columns
Re: [HACKERS] generated columns Re: [HACKERS] generated columns Re: [HACKERS] generated columns Re: [HACKERS] generated columns Re: [HACKERS] generated columns Re: [HACKERS] generated columns |
List | pgsql-hackers |
On 05/03/2018 20:46, Peter Eisentraut wrote: > On 2/1/18 21:25, Michael Paquier wrote: >> On Thu, Feb 01, 2018 at 09:29:09AM -0500, Peter Eisentraut wrote: >>> That would be nice. I'm going to study this some more to see what can >>> be done. >> >> Thanks for the update. Note: Peter has moved the patch to next CF. > > I didn't get to updating this patch, so I'm closing it in this CF. Attached is a new version of this patch. Old news: This is a well-known SQL-standard feature, also available for instance in DB2, MySQL, Oracle. A quick example: CREATE TABLE t1 ( ..., height_cm numeric, height_in numeric GENERATED ALWAYS AS (height_cm * 2.54) ); It supports both computed-on-write and computed-on-read variants, using the keywords STORED and VIRTUAL respectively. New news: Everything works more or less. Both STORED and VIRTUAL fully work now. I've done some refactoring to reduce the surface area of the patch. I also added some caching in the tuple descriptor's "const" area to avoid some performance overhead if no generated columns are used. There are some open questions about which I'll start separate subthreads for discussion. One thing I'd like reviewed now is the catalog representation. There are a couple of possible options, but changing them would have fairly deep code impact so it would help to get that settled soon. The general idea is that a generation expression is similar to a default, just applied at different times. So the actual generation expression is stored in pg_attrdef. The actual question is the representation in pg_attribute. Options: 1. (current implementation) New column attgenerated contains 's' for STORED, 'v' for VIRTUAL, '\0' for nothing. atthasdef means "there is something in pg_attrdef for this column". So a generated column would have atthasdef = true, and attgenerated = s/v. A traditional default would have atthasdef = true and attgenerated = '\0'. The advantage is that this is easiest to implement and the internal representation is the most useful and straightforward. The disadvantage is that old client code that wants to detect whether a column has a default would need to be changed (otherwise it would interpret a generated column as having a default value instead). 2. Alternative: A generated column has attgenerated = s/v but atthasdef = false, so that atthasdef means specifically "column has a default". Then a column would have a pg_attrdef entry for either attgenerated != '\0' or atthasdef = true. (Both couldn't be the case at the same time.) The advantage is that client code wouldn't need to be changed. But it's also possible that there is client code that just does a left join of pg_attribute and pg_attrdef without looking at atthasdef, so that would still be broken. The disadvantage is that the internal implementation would get considerably ugly. Most notably, the tuple descriptor would probably still look like #1, so there would have to be a conversion somewhere between variant #1 and #2. Or we'd have to duplicate all the tuple descriptor access code to keep that separate. There would be a lot of redundancy. 3. Radical alternative: Collapse everything into one new column. We could combine atthasdef and attgenerated and even attidentity into a new column. (Only one of the three can be the case.) This would give client code a clean break, which may or may not be good. The implementation would be uglier than #1 but probably cleaner than #2. We could also get 4 bytes back per pg_attribute row. I'm happy with the current choice #1, but it's worth thinking about. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
pgsql-hackers by date: