Re: [HACKERS] generated columns - Mailing list pgsql-hackers

From Peter Eisentraut
Subject Re: [HACKERS] generated columns
Date
Msg-id 79e4de23-52e3-2c45-9ba0-5f6932f590aa@2ndquadrant.com
Whole thread Raw
In response to Re: [HACKERS] generated columns  (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>)
Responses Re: [HACKERS] generated columns
Re: [HACKERS] generated columns
Re: [HACKERS] generated columns
Re: [HACKERS] generated columns
Re: [HACKERS] generated columns
Re: [HACKERS] generated columns
Re: [HACKERS] generated columns
List pgsql-hackers
On 05/03/2018 20:46, Peter Eisentraut wrote:
> On 2/1/18 21:25, Michael Paquier wrote:
>> On Thu, Feb 01, 2018 at 09:29:09AM -0500, Peter Eisentraut wrote:
>>> That would be nice.  I'm going to study this some more to see what can
>>> be done.
>>
>> Thanks for the update.  Note: Peter has moved the patch to next CF.
> 
> I didn't get to updating this patch, so I'm closing it in this CF.

Attached is a new version of this patch.

Old news:

This is a well-known SQL-standard feature, also available for instance
in DB2, MySQL, Oracle.  A quick example:

  CREATE TABLE t1 (
    ...,
    height_cm numeric,
    height_in numeric GENERATED ALWAYS AS (height_cm * 2.54)
  );

It supports both computed-on-write and computed-on-read variants, using
the keywords STORED and VIRTUAL respectively.

New news:

Everything works more or less.  Both STORED and VIRTUAL fully work now.
I've done some refactoring to reduce the surface area of the patch.  I
also added some caching in the tuple descriptor's "const" area to avoid
some performance overhead if no generated columns are used.

There are some open questions about which I'll start separate subthreads
for discussion.

One thing I'd like reviewed now is the catalog representation.  There
are a couple of possible options, but changing them would have fairly
deep code impact so it would help to get that settled soon.

The general idea is that a generation expression is similar to a
default, just applied at different times.  So the actual generation
expression is stored in pg_attrdef.  The actual question is the
representation in pg_attribute.  Options:

1. (current implementation) New column attgenerated contains 's' for
STORED, 'v' for VIRTUAL, '\0' for nothing.  atthasdef means "there is
something in pg_attrdef for this column".  So a generated column would
have atthasdef = true, and attgenerated = s/v.  A traditional default
would have atthasdef = true and attgenerated = '\0'.  The advantage is
that this is easiest to implement and the internal representation is the
most useful and straightforward.  The disadvantage is that old client
code that wants to detect whether a column has a default would need to
be changed (otherwise it would interpret a generated column as having a
default value instead).

2. Alternative: A generated column has attgenerated = s/v but atthasdef
= false, so that atthasdef means specifically "column has a default".
Then a column would have a pg_attrdef entry for either attgenerated !=
'\0' or atthasdef = true.  (Both couldn't be the case at the same time.)
 The advantage is that client code wouldn't need to be changed.  But
it's also possible that there is client code that just does a left join
of pg_attribute and pg_attrdef without looking at atthasdef, so that
would still be broken.  The disadvantage is that the internal
implementation would get considerably ugly.  Most notably, the tuple
descriptor would probably still look like #1, so there would have to be
a conversion somewhere between variant #1 and #2.  Or we'd have to
duplicate all the tuple descriptor access code to keep that separate.
There would be a lot of redundancy.

3. Radical alternative: Collapse everything into one new column.  We
could combine atthasdef and attgenerated and even attidentity into a new
column.  (Only one of the three can be the case.)  This would give
client code a clean break, which may or may not be good.  The
implementation would be uglier than #1 but probably cleaner than #2.  We
could also get 4 bytes back per pg_attribute row.

I'm happy with the current choice #1, but it's worth thinking about.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

pgsql-hackers by date:

Previous
From: Masahiko Sawada
Date:
Subject: Re: [HACKERS] Block level parallel vacuum
Next
From: Fabien COELHO
Date:
Subject: Re: pgbench doc fix