Thread: Remove implicit unique index creation on SERIAL columns?
I'd like to propose dropping the auto-creation of UNIQUE indexes on serial columns for the following reasons: 1. Serials with indexes are quite difficult to handle in pg_dump. It means that the implicitly created unique index must be destroyed prior to loading the data, then re-created afterward else risk a performance hit. 2. SERIAL columns are usually used as the primary key of the table. As such one must specify PRIMARY KEY at creation time to override the implicitly created UNIQUE index. 3. Consistency with other databases. MySQL's AUTO_INCREMENT suggests quite heavily that an index be applied, but it doesn't appear to do it for you. Oracle has SEQUENCES, but it is up to the user to associate them with a column as per my understanding. At least, thats all I could find. SAP -> SERIAL and SERIAL(n). No index is created (creates sequence / default) A nice side effect is that analyze.c will become somewhat cleaner. Any thoughts? Does anyone create serial columns without manually specifying UNIQUE when wanting an index? Examples in our documentation do. See section 5.1.4 in the current docset: http://developer.postgresql.org/docs/postgres/datatype.html Removal of implicit UNIQUE index creation would not affect structure of current or prior db versions -- strictly new tables on 7.3.
Rod Taylor wrote: > I'd like to propose dropping the auto-creation of UNIQUE indexes on > serial columns for the following reasons: > > 1. Serials with indexes are quite difficult to handle in pg_dump. It > means that the implicitly created unique index must be destroyed prior > to loading the data, then re-created afterward else risk a performance > hit. > > > 2. SERIAL columns are usually used as the primary key of the table. As > such one must specify PRIMARY KEY at creation time to override the > implicitly created UNIQUE index. > > 3. Consistency with other databases. > > MySQL's AUTO_INCREMENT suggests quite heavily that an index be applied, > but it doesn't appear to do it for you. > > Oracle has SEQUENCES, but it is up to the user to associate them with a > column as per my understanding. At least, thats all I could find. > > SAP -> SERIAL and SERIAL(n). No index is created (creates sequence / > default) FWIW, also MSSQL. I agree 100%. If you want an index, unique constraint, or primary key on a SERIAL, I think you should explicitly add it. SERIAL should give me a column that automatically increments -- no more, no less. Joe
Joe Conway <mail@joeconway.com> writes: > I agree 100%. If you want an index, unique constraint, or primary key on > a SERIAL, I think you should explicitly add it. SERIAL should give me a > column that automatically increments -- no more, no less. Hmm, do you also want to eliminate the implicit NOT NULL constraint? I think that efficiency and orthogonality are adequate reasons for dissociating UNIQUE from SERIAL. The efficiency argument is pretty weak in the case of the NOT NULL part, though, so maybe backwards compatibility should win out there. Another line of thought: as near as I can tell, SQL92 allows defaults and CHECK constraints to be associated with domains. Taking the viewpoint that SERIAL is a domain would say that the DEFAULT nextval() is okay, and CHECK NOT NULL is okay, but UNIQUE is not okay to be implied by the domain type. Perhaps the SQL authors knew what they were doing when they made that restriction... (Note that although I just dissuaded Rod from actually turning SERIAL into a domain on compatibility grounds, I don't see any reason why we shouldn't use the spec's rules about domains to reason about how it should work. In a slightly longer timeframe we may decide that we do want to make it a domain.) regards, tom lane
Tom Lane wrote: > Joe Conway <mail@joeconway.com> writes: >>I agree 100%. If you want an index, unique constraint, or primary key on >>a SERIAL, I think you should explicitly add it. SERIAL should give me a >>column that automatically increments -- no more, no less. > > Hmm, do you also want to eliminate the implicit NOT NULL constraint? > > I think that efficiency and orthogonality are adequate reasons for > dissociating UNIQUE from SERIAL. The efficiency argument is pretty > weak in the case of the NOT NULL part, though, so maybe backwards > compatibility should win out there. To be honest I wasn't thinking about NOT NULL. I'd agree with leaving that in place. Maybe I should restate my comment above: SERIAL should give me a column that automatically increments -- no more, no less -- and it should not allow me to override the value that it gives. Hence an implicit NOT NULL, but also an implicit rejection of a manual insert/update of that field (how hard would this be to do?). I know this causes problems for dumped and reloaded data. In MSSQL this is gotten around by allowing the properties of the data type to be altered, e.g. in MSSQL you can turn the IDENTITY property on or off (in fact, I just checked and MSSQL won't allow you to turn on IDENTITY for a column unless you turn off ALLOW NULLS). You can also specify an exception to the rule when running BCP (the bulk loader command line program). The reason I think this behavior is good, is that it helps prevent toe loss from stray bullets. E.g. you manually add a row where you've specified some value that has not yet been reached by the sequence -- then when someday the sequence reaches said value, your insert fails on a duplicate primary key insertion attempt. If you really need to be able to insert or update a field with an explicit value *sometimes* (and you really know what you're doing), then use a plain sequence and a default, not a SERIAL. Anyway, just my thoughts. Joe
Hi -*- Joe Conway <mail@joeconway.com> [ 2002-08-18 06:36 ]: > Maybe I should restate my comment above: SERIAL should give me a column > that automatically increments -- no more, no less -- and it should not > allow me to override the value that it gives. Hence an implicit NOT > NULL, but also an implicit rejection of a manual insert/update of that > field (how hard would this be to do?). > > I know this causes problems for dumped and reloaded data. In MSSQL this > is gotten around by allowing the properties of the data type to be > altered, e.g. in MSSQL you can turn the IDENTITY property on or off (in > fact, I just checked and MSSQL won't allow you to turn on IDENTITY for a > column unless you turn off ALLOW NULLS). You can also specify an > exception to the rule when running BCP (the bulk loader command line > program). > > The reason I think this behavior is good, is that it helps prevent toe > loss from stray bullets. E.g. you manually add a row where you've > specified some value that has not yet been reached by the sequence -- > then when someday the sequence reaches said value, your insert fails on > a duplicate primary key insertion attempt. > > If you really need to be able to insert or update a field with an > explicit value *sometimes* (and you really know what you're doing), then > use a plain sequence and a default, not a SERIAL. I like the functionality you are suggesting. However, when I started thinking about what size this field should be, theidea of keeping the current SERIAL "data type" and having a SERIAL flag (CREATE TABLE foo (id int serial))... so it'spretty much the same as a seqence, but enforced so that you can't tamper with it. The best part about this is obviouslythat it doesn't change behaviour of the old SERIAL. Am I right about this, or am I missing something here? -- Regards, Tolli tolli@tol.li
On Sun, 2002-08-18 at 02:35, Joe Conway wrote: > Tom Lane wrote: > > Joe Conway <mail@joeconway.com> writes: > >>I agree 100%. If you want an index, unique constraint, or primary key on > >>a SERIAL, I think you should explicitly add it. SERIAL should give me a > >>column that automatically increments -- no more, no less. > > > > Hmm, do you also want to eliminate the implicit NOT NULL constraint? > > > > I think that efficiency and orthogonality are adequate reasons for > > dissociating UNIQUE from SERIAL. The efficiency argument is pretty > > weak in the case of the NOT NULL part, though, so maybe backwards > > compatibility should win out there. > > To be honest I wasn't thinking about NOT NULL. I'd agree with leaving > that in place. > > Maybe I should restate my comment above: SERIAL should give me a column > that automatically increments -- no more, no less -- and it should not > allow me to override the value that it gives. Hence an implicit NOT > NULL, but also an implicit rejection of a manual insert/update of that > field (how hard would this be to do?). I don't like not overriding the value. A dataload example is a perfect reason why you would want to. Anyway, this would require placing 2 triggers on the table in order to prevent changes of the value. Personally I prefer the method that SAPdb uses. They make the column a fixed() type (integer) and set the default. Nothing about NOT NULL or UNIQUE. Anyway, I think SERIAL is about assisting creation of a entry, not enforcing it. Enforcement is trivial for those who don't mind the additional overhead.
Joe Conway <mail@joeconway.com> writes: > I know this causes problems for dumped and reloaded data. Yup. > In MSSQL this > is gotten around by allowing the properties of the data type to be > altered, e.g. in MSSQL you can turn the IDENTITY property on or off Rather pointless if it can be turned off, wouldn't you say? What I would do if I wanted such a guarantee is to make insertions go through a view rule that prevents the serial column from being supplied directly. That's a general technique that applies to more than just serial columns, and it is also easy to control who has privileges to bypass the view. AFAICT this "IDENTITY" thingie is an extremely limited kluge. regards, tom lane
> > In MSSQL this > > is gotten around by allowing the properties of the data type to be > > altered, e.g. in MSSQL you can turn the IDENTITY property on or off > > Rather pointless if it can be turned off, wouldn't you say? > > What I would do if I wanted such a guarantee is to make insertions > go through a view rule that prevents the serial column from being In the future if we have per column privileges, we could disable insert and update on serial columns by default for everyone but the table owner and super-users (to accommodate dataloads). Till then a trigger or view will do the job for those who wish to add them.
Tom Lane wrote: > Joe Conway <mail@joeconway.com> writes: >>In MSSQL this >>is gotten around by allowing the properties of the data type to be >>altered, e.g. in MSSQL you can turn the IDENTITY property on or off > > Rather pointless if it can be turned off, wouldn't you say? Not really. Turning it off takes an explict act of the dba (ALTER TABLE) for a special circumstance such as importing data. In the normal case while it's turned on, it is the developers who need and get the foot teflon (INSERT/UPDATE). > What I would do if I wanted such a guarantee is to make insertions > go through a view rule that prevents the serial column from being > supplied directly. That's a general technique that applies to more than > just serial columns, and it is also easy to control who has privileges > to bypass the view. I thought about the view rule myself. Nice, but beyond the grasp of inexperienced users, who I think SERIAL is target for. > AFAICT this "IDENTITY" thingie is an extremely limited kluge. It is extremely limited -- that's why sequences are so much better (IMHO). But SERIAL is for those who want something simple to use, don't want to think about the implementation details, and don't mind it being limited. Otherwise why have it at all -- just make everyone use sequences. Joe
Rod Taylor wrote: >>>In MSSQL this >>>is gotten around by allowing the properties of the data type to be >>>altered, e.g. in MSSQL you can turn the IDENTITY property on or off >> >>Rather pointless if it can be turned off, wouldn't you say? >> >>What I would do if I wanted such a guarantee is to make insertions >>go through a view rule that prevents the serial column from being > > > In the future if we have per column privileges, we could disable insert > and update on serial columns by default for everyone but the table owner > and super-users (to accommodate dataloads). That sounds like the best way-to-go. > Till then a trigger or view will do the job for those who wish to add > them. Agreed. Joe
Since there didn't seem to be anyone objecting to the notion of decoupling UNIQUE from SERIAL, I'm going to go ahead with reviewing/applying Rod's recent patch that does that (and fixes pg_dump to dump 7.3 serials correctly). We can continue to debate about the merits of making additional changes in SERIAL behavior, of course, but this is the one that pg_dump needs. regards, tom lane
Joe Conway wrote: > Tom Lane wrote: > > Joe Conway <mail@joeconway.com> writes: > >>I agree 100%. If you want an index, unique constraint, or primary key on > >>a SERIAL, I think you should explicitly add it. SERIAL should give me a > >>column that automatically increments -- no more, no less. > > > > Hmm, do you also want to eliminate the implicit NOT NULL constraint? > > > > I think that efficiency and orthogonality are adequate reasons for > > dissociating UNIQUE from SERIAL. The efficiency argument is pretty > > weak in the case of the NOT NULL part, though, so maybe backwards > > compatibility should win out there. > > To be honest I wasn't thinking about NOT NULL. I'd agree with leaving > that in place. > > Maybe I should restate my comment above: SERIAL should give me a column > that automatically increments -- no more, no less -- and it should not > allow me to override the value that it gives. Hence an implicit NOT > NULL, but also an implicit rejection of a manual insert/update of that > field (how hard would this be to do?). If don't understand. We already have a unique index on the SERIAL column, so why bother rejecting an insert/update that supplies the value? We need the column to be unique, and that is forced, but why prevent _any_ unique value from being used. Clearly NULL is not a valid value and should be prevented with NOT NULL. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
On Mon, 19 Aug 2002, Bruce Momjian wrote: > Joe Conway wrote: > > Tom Lane wrote: > > > Joe Conway <mail@joeconway.com> writes: > > >>I agree 100%. If you want an index, unique constraint, or primary key on > > >>a SERIAL, I think you should explicitly add it. SERIAL should give me a > > >>column that automatically increments -- no more, no less. > > > > > > Hmm, do you also want to eliminate the implicit NOT NULL constraint? > > > > > > I think that efficiency and orthogonality are adequate reasons for > > > dissociating UNIQUE from SERIAL. The efficiency argument is pretty > > > weak in the case of the NOT NULL part, though, so maybe backwards > > > compatibility should win out there. > > > > To be honest I wasn't thinking about NOT NULL. I'd agree with leaving > > that in place. > > > > Maybe I should restate my comment above: SERIAL should give me a column > > that automatically increments -- no more, no less -- and it should not > > allow me to override the value that it gives. Hence an implicit NOT > > NULL, but also an implicit rejection of a manual insert/update of that > > field (how hard would this be to do?). > > If don't understand. We already have a unique index on the SERIAL > column, so why bother rejecting an insert/update that supplies the > value? We need the column to be unique, and that is forced, but why > prevent _any_ unique value from being used. One reason is that the sequence won't respect those inserted values and you'll get uniqueness errors on statements that don't give a value for the column where you'd expect to be getting a working autogenerated value.
Stephan Szabo wrote: > > If don't understand. We already have a unique index on the SERIAL > > column, so why bother rejecting an insert/update that supplies the > > value? We need the column to be unique, and that is forced, but why > > prevent _any_ unique value from being used. > > One reason is that the sequence won't respect those inserted values and > you'll get uniqueness errors on statements that don't give a value for the > column where you'd expect to be getting a working autogenerated value. Oh, it causes problems later on. Interesting. However, preventing INSERT/UPDATE seems quite extreme. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073