IDENTITY/GENERATED patch - Mailing list pgsql-hackers
From | Zoltan Boszormenyi |
---|---|
Subject | IDENTITY/GENERATED patch |
Date | |
Msg-id | 47B985AC.2050907@cybertec.at Whole thread Raw |
Responses |
Re: IDENTITY/GENERATED patch
|
List | pgsql-hackers |
Hi, as the wishlist for PostgreSQL 8.4 still has my IDENTITY/GENERATED patches, I thought I refresh it. Before actually doing it though, I wanted to ask for opinions on implementation and ideas. Here are the general ideas that were done by my patch: 1. IDENTITY columns. The IDENTITY columns are similar to SERIALs as featured by current PostgreSQL with the differences below. They can be of almost any basic type, i.e. other than INTEGER or BIGINT. There are two types of them: GENERATED BY DEFAULT [ AS ( sequence_options ) ] and GENERATED ALWAYS [ AS ( sequence_options ) ] The first behaves more or less as SERIAL in PostgreSQL currently, i.e. upon INSERT and UPDATE such a field can be assigned a value explicitely, they can be updated, etc. So, this can be viewed as nothing more than a SERIAL pseudo-type column, followed by the statements below: ALTER TABLE ... ALTER COLUMN ... TYPE ...; ALTER SEQUENCE seq_name sequence_options; So, the above form is mostly a "syntax sugar" over already existing features. In fact, this was the reason the acceptance was shot down. The standard describes some behavioural difference that may make it worth to distinguish from plain SERIALs. However, IDENTITY GENERATED ALWAYS has some more constraints. This is more like the autoincrementing fields in other DBMSs, i.e. upon INSERT, usually the value "0" indicates the generation of the next value. With the standard behaviour, any value is ignored and the sequence next value is generated. UPDATE can only use the DEFAULT keyword, not an explicit value. It seems the last draft for SQL:2008 largely clarified details for IDENTITY columns that were either conflicting or unclear in SQL:2003. 2. GENERATED columns The GENERATED column is an automatic way of computing expressions over columns in the same row. This is a feature of DB2. Currently this can be implemented as a BEFORE TRIGGER in PostgreSQL but that requires being a little familiar writing PLPGSQL functions. The syntax of GENERATED columns allows using plain expressions, so it's an easy to use feature. The GENERATED columns are real, stored columns, not on-the-fly computed virtual columns. This means that the value computation cost is in the INSERT and UPDATE statements instead of the SELECTs. This provides a nice speedup which I tested but cannot currently find the mail in the archive. Despite this demonstrable speedup, exactly the non-virtual nature was the cause why Tom Lane didn't like it. There's a problem however with GENERATED columns that was pointed out, regarding BEFORE TRIGGERs. For speed, the GENERATED columns' values can be computed after all BEFORE TRIGGERs were executed. However, this causes that functions running as BEFORE TRIGGERs see the old value of the columns. This detail was mentioned in some of the previous drafts but not in the latest SQL:2008 draft. This means such columns MUST be recomputed before running every BEFORE TRIGGERs and after the last one, too. The performance hit in this case is large. A GUC variable can be introduced to make both the fast and the correct behaviour possible. Based on my last patch: http://archives.postgresql.org/pgsql-patches/2007-05/msg00076.php what problems can you see in it that needs fixing? I know, it extends on the standard. (e.g. DROP IDENTITY, etc.) However, would a strictly standard compliant IDENTITY/GENERATED get accepted into PostgreSQL? Which extensions in the patch are worth keeping? There are two possible problems that I know of. I need to fix the OVERRIDING USER VALUE case as it isn't a NOP. This convinced me that PostgreSQL SERIAL is not equivalent to the IDENTITY GENERATED BY DEFAULT column and the requirement that only one IDENTITY column may exists in a table is strongly assumed by the OVERRIDING clauses. And I just discovered an extension of TRUNCATE TABLE in SQL:2008 that make it possible to restart the sequence behind the identity column upon TRUNCATE. What else? -- ---------------------------------- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH http://www.postgresql.at/
pgsql-hackers by date: