Re: GENERATED ... AS IDENTITY, Was: Re: Feature Freeze - Mailing list pgsql-hackers

From Zoltan Boszormenyi
Subject Re: GENERATED ... AS IDENTITY, Was: Re: Feature Freeze
Date
Msg-id 44CFE09A.7070203@dunaweb.hu
Whole thread Raw
In response to Re: GENERATED ... AS IDENTITY, Was: Re: Feature Freeze  (Rod Taylor <pg@rbt.ca>)
Responses Re: GENERATED ... AS IDENTITY, Was: Re: Feature Freeze
List pgsql-hackers
Rod Taylor írta:
> On Tue, 2006-08-01 at 18:10 +0200, Zoltan Boszormenyi wrote:
>   
>> Hi,
>>
>> I have progressed a bit with my pet project, a.k.a $SUBJECT.
>>
>> Now GENERATED ALWAYS AS IDENTITY and
>> GENERATED ALWAYS AS ( expr ) work as
>> intended. Documentation was also extended.
>>     
>
> I'm only commenting because I debated trying to implement this feature a
> couple of times.

Thanks for commenting it.

>  The ugliness required for pg_dump put me off of doing
> it.
>   

I haven't looked into it yet.

> I did not see a test for enforcement during COPY.

That was sort of intended, COPY is expected
to pull back the same record it wrote out.
But see below.

>  UPDATE restrictions
> appear to have been missed as well:
>
>         4) If <set clause> SC specifies an <object column> that
>         references a column of which some underlying column is either a
>         generated column or an identity column whose descriptor
>         indicates that values are always generated, then the <update
>         source> specified in SC shall consist of a <default
>         specification>.
>
> <object column> is the <update target>, or the left hand side of the
> equation. In short, if a column marked GENERATED ALWAYS is updated then
> it must be to DEFAULT or not provided as an update target.
>
>         CREATE TABLE tab (col integer GENERATED ALWAYS AS IDENTITY);
>         UPDATE tab SET col = DEFAULT; -- ACCEPTED
>         UPDATE tab SET col = 1; -- ERROR
>   

Yes, I have also read that detail but not yet implemented it.
I was too happy that I found a straightforward way to make
GENERATED ALWAYS work.

> For db restoration (pg_dump), how do you restore to the same values as
> previously if it is always regenerated? By making ALWAYS a suggestion
> for some users instead of always enforced and providing an override
> mechanism for it. I assume it only works for relation owners but I've
> not figured out how the spec does permissions.
>
>         <override clause> ::=
>             OVERRIDING USER VALUE
>           | OVERRIDING SYSTEM VALUE
>
> In short, pg_dump should append OVERRIDING SYSTEM VALUE to any insert or
> copy for relations with an GENERATED ALWAYS identity column and the
> backend will need to respect that.
>   

Aren't INSERT and COPY distinguished in code paths?
(I don't have too deep knowledge about PostgreSQL internals, yet.)
If they are, OVERRIDING SYSTEM VALUE will be
needed only when pg_dump produces INSERTs.

> ALWAYS is really only enforced for anyone who doesn't have permission to
> specify otherwise.
>
>
> Another one that got me is what do you do if you do this:
>
>         CREATE TABLE tab (col integer);
>         INSERT INTO tab VALUES (10);
>         ALTER TABLE tab ALTER col GENERATED ALWAYS AS IDENTITY;
>
> What is the value for "tab"."col"? It would seem that the table should
> be rewritten with all values for "col" recalculated -- thus it would be
> '1'. But wait! Can we add the <override clause> here too to keep the old
> values and change the enforcement for new tuples only?
>   

I don't think we should rewrite existing rows because
when it was inserted, the stored value was valid
according to the rules at that time. What if you
have more than one rows in that table?
Which record gets which value? You cannot know
which record was inserted first because subsequent
updates may ruin that order before the ALTER TABLE.
And recalculating the max value of col isn't too reliable
if another session is also inserting records.

And what about non-unique columns?
Plain SERIALs aren't declared unique automatically, either.
Consider the following:

CREATE TABLE tab (col integer);
INSERT INTO tab VALUES (10);
ALTER TABLE tab ALTER col GENERATED ALWAYS AS IDENTITY( MINVALUE 1 MAXVALUE 12 CYCLE );


Here I expect equal values and I don't want
existing rows rewritten.

E.g. if you want a new start value, you will also need to issue
ALTER TABLE tab ALTER col RESTART WITH n;
which I started to implement.

Also, for a unique SERIAL column, you can still
insert a record with an out-of-order number and
one of the INSERTs that reach that number will
fail with unique violation. e.g. it's not a real
autoincrementer field. Or you can alter a
sequence that supports such a column.
PostgreSQL documents both behaviour and
I wanted to keep it.

Thanks for the comments,
Zoltán Böszörményi

>> Some test cases are also included, that shows
>> that ALTER TABLE ALTER TYPE keeps both
>> the sequence and the GENERATED ALWAYS
>> property. Gzipped patch is attached.
>>
>> Next steps are:
>> - pg_dump support
>> - more ALTER TABLE support for adding and
>>   dropping IDENTITY and GENERATED ALWAYS
>>   features
>> - more testing
>>
>> I still maintain that I don't see any standard
>> requirement between the GENERATED AS IDENTITY
>> and NEXT VALUE FOR but obviously both
>> require SEQUENCE as supported feature
>> in parallel. I can be proven wrong, though,
>> but please, quote section# and text where
>> it can be found in the standard.
>>
>> As for why GENERATED ALWAYS AS IDENTITY
>> is useful? Consider someone who is coming from
>> another DBMS (Informix, Access, etc.) where
>> "INSERT INTO table (id, ...) VALUES (0, ...);"
>> inserts the next value for the autoincrementer field
>> instead of 0. Leaving out fields from INSERT is
>> not allowed in the source because of documentation
>> reasons and writing DEFAULT is not handy or not
>> found in that legacy DBMS' features.
>> Multiply it with N applications that was written
>> that way over the years of the lifespan of a large
>> project, count in the human resistance to learn
>> something new (say 2.5x multiplier, but that may be
>> under-estimated :-) ) and a feature that help porting
>> easier will be a cheered feature. IIRC Bruce Momjian
>> himself wrote in this list that ease-of-use features
>> can boost PostgreSQL userbase pretty quickly.
>>
>> So, please, review my patch in it's current state
>> and decide whether it's a 8.2-worthy feature.
>>
>> BTW, is there anyone working on COPY FROM ( select ) feature?
>>
>> Thanks in advance and best regards,
>> Zoltán Böszörményi
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 9: In versions below 8.0, the planner will ignore your desire to
>>        choose an index scan if your joining column's datatypes do not
>>        match
>>     



pgsql-hackers by date:

Previous
From: Tatsuo Ishii
Date:
Subject: Re: [PATCHES] Replication Documentation
Next
From: Peter Eisentraut
Date:
Subject: Re: [PATCHES] Replication Documentation