Thread: ALTER TABLE modifications
A general re-organization of Alter Table. Node wise, it is a AlterTableStmt with a list of AlterTableCmds. The Cmds are the individual actions to be completed (Add constraint, drop constraint, add column, etc.) Processing is done in 2 phases. The first phase updates the system catalogs and creates a work queue for the table scan. The second phase is to conduct the actual table scan evaluating all constraints and other per tuple processing simultaneously, as required. This has no effect on single step operations, but has a large benefit for combinational logic where multiple table scans would otherwise be required. Steps for the table scan include expression processing (default or transform expressions), not null constraint process, other constraints. The code has been simplified a little. Simple permission and recursion checks are conducted in the common area and an enum for the command type is used rather than a Char. ALTER TABLE ADD COLUMN col DEFAULT 3 NOT NULL; Allow creation of a column with a default (filled) and not null constraint. This is internally subdivided into 3 actions much like the below syntax. ALTER TABLE tab ADD COLUMN col DEFAULT 3, ADD CHECK (anothercol > 3); The above combinational syntax is commented out in gram.y. The support framework is used in both the above and below items, but arbitrary statements probably have some issues -- I've not tested enough to determine. If it is useful, it will be submitted at a later date. ALTER TABLE tab ALTER COLUMN col TYPE text TRANSFORM ...; Currently migrates indexes, check constraints, defaults, and the column definition to the new type with optional transform. If the tranform is not supplied, a standard assignment cast is attempted. One issue is that it detects dependencies way too late in the game (after the TRANFORM has been applied). I tried mucking up performDeletion to have a RESTRICT that did not throw the error at the end (or physically remove the items) but that didn't work out. Any bright ideas on how to achieve the NOTICES from a failed DROP COLUMN earlier? Otherwise, I can put in single object detection at that point. It does not migrate foreign keys yet (should be easy to add) and will be forwarded as an independent patch at a later time. Currently foreign keys are rejected like views, functions, etc. Comments appreciated. -- Rod Taylor <pg [at] rbt [dot] ca> Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
Attachment
Is this to be applied to CVS HEAD? --------------------------------------------------------------------------- Rod Taylor wrote: > A general re-organization of Alter Table. Node wise, it is a > AlterTableStmt with a list of AlterTableCmds. The Cmds are the > individual actions to be completed (Add constraint, drop constraint, add > column, etc.) > > Processing is done in 2 phases. The first phase updates the system > catalogs and creates a work queue for the table scan. The second phase > is to conduct the actual table scan evaluating all constraints and other > per tuple processing simultaneously, as required. This has no effect on > single step operations, but has a large benefit for combinational logic > where multiple table scans would otherwise be required. > > Steps for the table scan include expression processing (default or > transform expressions), not null constraint process, other constraints. > > > The code has been simplified a little. Simple permission and recursion > checks are conducted in the common area and an enum for the command type > is used rather than a Char. > > > ALTER TABLE ADD COLUMN col DEFAULT 3 NOT NULL; > Allow creation of a column with a default (filled) and not null > constraint. This is internally subdivided into 3 actions much > like the below syntax. > > ALTER TABLE tab ADD COLUMN col DEFAULT 3, ADD CHECK (anothercol > 3); > The above combinational syntax is commented out in gram.y. The > support framework is used in both the above and below items, but > arbitrary statements probably have some issues -- I've not > tested enough to determine. > > If it is useful, it will be submitted at a later date. > > ALTER TABLE tab ALTER COLUMN col TYPE text TRANSFORM ...; > Currently migrates indexes, check constraints, defaults, and the > column definition to the new type with optional transform. If > the tranform is not supplied, a standard assignment cast is > attempted. > > One issue is that it detects dependencies way too late in the > game (after the TRANFORM has been applied). I tried mucking up > performDeletion to have a RESTRICT that did not throw the error > at the end (or physically remove the items) but that didn't work > out. Any bright ideas on how to achieve the NOTICES from a > failed DROP COLUMN earlier? Otherwise, I can put in single > object detection at that point. > > It does not migrate foreign keys yet (should be easy to add) and > will be forwarded as an independent patch at a later time. > Currently foreign keys are rejected like views, functions, etc. > > > Comments appreciated. > -- > Rod Taylor <pg [at] rbt [dot] ca> > > Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL [ Attachment, skipping... ] > > ---------------------------(end of broadcast)--------------------------- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Bruce Momjian <pgman@candle.pha.pa.us> writes: > Is this to be applied to CVS HEAD? It sounded like large portions were still at the request-for-comment stage... regards, tom lane
On Tue, 2003-11-11 at 23:46, Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > Is this to be applied to CVS HEAD? > > It sounded like large portions were still at the request-for-comment > stage... It can be applied to -HEAD without breaking anything or backtracking in feature set (that I can see) but there is substantial change in the way things work so a pre-review is warranted and I'm sure changes will be requested. If I don't receive comments for corrections by the weekend, I'll finish off one of the missing components (foreign keys) and submit a replacement.
Rod Taylor writes: > ALTER TABLE tab ADD COLUMN col DEFAULT 3, ADD CHECK (anothercol > 3); > The above combinational syntax is commented out in gram.y. The > support framework is used in both the above and below items, but > arbitrary statements probably have some issues -- I've not > tested enough to determine. > > If it is useful, it will be submitted at a later date. I think it's perfectly fine to write two separate ALTER TABLE statements. No need to introduce this nonstandard syntax. > ALTER TABLE tab ALTER COLUMN col TYPE text TRANSFORM ...; > Currently migrates indexes, check constraints, defaults, and the > column definition to the new type with optional transform. If > the tranform is not supplied, a standard assignment cast is > attempted. Please don't use the term "transform". It is used by the SQL standard for other purposes. What kind of object would you put in place of the "..." anyway? A function? What syntax do other databases use? -- Peter Eisentraut peter_e@gmx.net
On Wed, 2003-11-12 at 14:02, Peter Eisentraut wrote: > Rod Taylor writes: > > > ALTER TABLE tab ADD COLUMN col DEFAULT 3, ADD CHECK (anothercol > 3); > I think it's perfectly fine to write two separate ALTER TABLE statements. > No need to introduce this nonstandard syntax. Yes, it is certainly fine to do so, but much faster to do the above. The command shown executes nearly 40% faster than 2 independent statements in a single transaction -- the difference is even more significant with additional sub-commands. > > ALTER TABLE tab ALTER COLUMN col TYPE text TRANSFORM ...; > > Currently migrates indexes, check constraints, defaults, and the > > column definition to the new type with optional transform. If > > the tranform is not supplied, a standard assignment cast is > > attempted. > > Please don't use the term "transform". It is used by the SQL standard for > other purposes. What kind of object would you put in place of the "..." > anyway? A function? What syntax do other databases use? I've not found another database which allows this syntax. The suggestion of TRANSFORM was Toms and was a result of using an assignment cast by default. Do you have a better term I can use? http://groups.google.ca/groups?hl=en&lr=&ie=UTF-8&oe=UTF-8&frame=right&th=266b02a270a164aa&seekm=1064805960.60248.24.camel%40jester#link4 The ... is an A_Expr which does not accept (among other things) subselects. CASE statements, equations, etc. work fine. CREATE TABLE tab (col int2); -- integer to boolean ALTER TABLE tab ALTER col TYPE boolean TRANSFORM CASE WHEN col >= 1 THEN true ELSE false END; -- or say Bytes to MBytes (original column is int8) ALTER TABLE tab ALTER col TYPE integer TRANSFORM col / (1024 * 1024);
Peter Eisentraut kirjutas K, 12.11.2003 kell 21:02: > Rod Taylor writes: > > > ALTER TABLE tab ADD COLUMN col DEFAULT 3, ADD CHECK (anothercol > 3); > > The above combinational syntax is commented out in gram.y. The > > support framework is used in both the above and below items, but > > arbitrary statements probably have some issues -- I've not > > tested enough to determine. > > > > If it is useful, it will be submitted at a later date. > > I think it's perfectly fine to write two separate ALTER TABLE statements. I guess the difference is that each pass (i.e. ALTER TABLE) needs to do another scan and copy of the table. Putting them in one statement allows all the alterations to be done in one pass. > No need to introduce this nonstandard syntax. > > > ALTER TABLE tab ALTER COLUMN col TYPE text TRANSFORM ...; > > Currently migrates indexes, check constraints, defaults, and the > > column definition to the new type with optional transform. If > > the tranform is not supplied, a standard assignment cast is > > attempted. > > Please don't use the term "transform". It is used by the SQL standard for > other purposes. Is the "other" use conflicting with this syntax ? I think we have preferred reusing existing keywords to adding new ones in the past. ----------------- Hannu
Hannu Krosing writes: > > Please don't use the term "transform". It is used by the SQL standard for > > other purposes. > > Is the "other" use conflicting with this syntax ? > > I think we have preferred reusing existing keywords to adding new ones > in the past. Maybe (although I don't agree). but we've never intentionally done terminology overloading. -- Peter Eisentraut peter_e@gmx.net
Rod Taylor writes: > Yes, it is certainly fine to do so, but much faster to do the above. Are table schema changes performance-sensitive operations, and are they usually done in bulk? I doubt it. > I've not found another database which allows this syntax. The suggestion > of TRANSFORM was Toms and was a result of using an assignment cast by > default. Do you have a better term I can use? I'm not sure I buy this whole concept in the first place. If there is no cast between type A and type B, then surely changing a table column from A to B is nonsensical. > -- or say Bytes to MBytes (original column is int8) > ALTER TABLE tab ALTER col TYPE integer TRANSFORM col / (1024 * 1024); You can do this using a plain column type change plus an UPDATE. I'd prefer to keep these operations independent. -- Peter Eisentraut peter_e@gmx.net
On Thu, 2003-11-13 at 09:18, Peter Eisentraut wrote: > Rod Taylor writes: > > > Yes, it is certainly fine to do so, but much faster to do the above. > > Are table schema changes performance-sensitive operations, and are they > usually done in bulk? I doubt it. Perhaps not for you. But I would disagree with both of those points for a majority of companies. If the system has high uptime requirements, the timeframe to accomplish maintenance decreases substantially. For example, I have two 1.5 hour maintenance windows per year for maintenance. This is a requirement given by our clients, nearly all of whom insist on it. Telcos and other enterprise level environments with 24 / 7 usage patterns have different demands. So.. Adding 2 new check constraints to a table with 500million entries takes way way too much time in my eyes. Toms request (when I initially brought up fixing ADD COLUMN) was that it would process the DEFAULT and other new constraints in a single pass. The above syntax simply uses the framework put in place for ADD COLUMN. I'll not push it if it does not function with complex combinations, which has not been tested, as such is not included in this patch. > > -- or say Bytes to MBytes (original column is int8) > > ALTER TABLE tab ALTER col TYPE integer TRANSFORM col / (1024 * 1024); > > You can do this using a plain column type change plus an UPDATE. I'd > prefer to keep these operations independent. Yes, if you can do the update first. Ok.. How about boolean to int2 as the user discovered they require a third or fouth state? Or vice-versa. An MSSQL, MySQL, etc. migration would leave a bunch of int2's (since they do not support boolean) in the PostgreSQL database which have a boolean meaning. rbt=# select '1'::int4::boolean rbt-# ; ERROR: cannot cast type integer to boolean rbt=# select '1'::boolean; bool ------ t (1 row) Are you going to tell these people to write down the Default, add a new column, copy the data, remove the old column, re-add the default, and rename the new column to the old column name? This is the exact reason many folks have asked for this ALTER form to be added. Considering there are already 2 sites out there running with this patch (I have been asked for advance copies) I do not believe the functionality is overkill in any respect. Can you please suggest a better term to use in place of TRANSFORM? Perhaps UPDATE WITH?
Rod Taylor kirjutas N, 13.11.2003 kell 16:59: > > Can you please suggest a better term to use in place of TRANSFORM? > Perhaps UPDATE WITH? or perhaps USING, based loosely on our use of USING in CREATE INDEX ? ---------- Hannu
Rod, I tried the current patch on a RC2 release, and I noticed one undesirable side affect. Modifying a column moves it to the end. In high availability situations this would not be desirable, I would imagine it would break lots of code. Dave On Thu, 2003-11-13 at 11:35, Hannu Krosing wrote: > Rod Taylor kirjutas N, 13.11.2003 kell 16:59: > > > > > Can you please suggest a better term to use in place of TRANSFORM? > > Perhaps UPDATE WITH? > > or perhaps USING, based loosely on our use of USING in CREATE INDEX ? > > ---------- > Hannu > > > ---------------------------(end of broadcast)--------------------------- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly > >
On Fri, Nov 14, 2003 at 08:59:05AM -0500, Dave Cramer wrote: > I tried the current patch on a RC2 release, and I noticed one > undesirable side affect. > > Modifying a column moves it to the end. In high availability situations > this would not be desirable, I would imagine it would break lots of > code. This is expected. Doing otherwise would incur into a much bigger performance hit. Anyway, IMHO no code should use SELECT * in any case, which is the only scenario where one would expect physical column order to matter, isn't it? -- Alvaro Herrera (<alvherre[a]dcc.uchile.cl>) "La primera ley de las demostraciones en vivo es: no trate de usar el sistema. Escriba un guión que no toque nada para no causar daños." (Jakob Nielsen)
Alvaro Herrera kirjutas R, 14.11.2003 kell 16:17: > On Fri, Nov 14, 2003 at 08:59:05AM -0500, Dave Cramer wrote: > > > I tried the current patch on a RC2 release, and I noticed one > > undesirable side affect. > > > > Modifying a column moves it to the end. In high availability situations > > this would not be desirable, I would imagine it would break lots of > > code. > > This is expected. Doing otherwise would incur into a much bigger > performance hit. Not neccessarily, but it would make the patch much bigger ;) IIRC there was discussion about splitting colum numbers into physical and logical numbers at the time when DROP COLUMN was done. > Anyway, IMHO no code should use SELECT * in any case, which is the only > scenario where one would expect physical column order to matter, isn't > it? and this could also break when just changing the column type. ------------ Hannu
On Fri, 2003-11-14 at 08:59, Dave Cramer wrote: > Rod, > > I tried the current patch on a RC2 release, and I noticed one > undesirable side affect. > > Modifying a column moves it to the end. In high availability situations > this would not be desirable, I would imagine it would break lots of > code. Yes, I've done that to myself a few times. The method is rename old column, add new column, move data across, move or reform dependencies, drop old column. Adding the new column puts it to the end. In order to avoid the repositioning we would need some kind of a position abstraction from the physical storage to what the user sees. It's on the TODO list, but not a part of this patch.
Rod Taylor writes: > The method is rename old column, add new column, move data across, move > or reform dependencies, drop old column. I can do this by hand. If we have an explicit command to do it, then it needs to preserve the table schema. Else, this feature would be mostly useless and a certain source of complaints. -- Peter Eisentraut peter_e@gmx.net
On Fri, 2003-11-14 at 09:57, Peter Eisentraut wrote: > Rod Taylor writes: > > > The method is rename old column, add new column, move data across, move > > or reform dependencies, drop old column. > > I can do this by hand. If we have an explicit command to do it, then it > needs to preserve the table schema. Else, this feature would be mostly > useless and a certain source of complaints. The method was agreed to on -hackers prior to any code having been written. Unless I'm mistaken, the method was suggested to me by other on -hackers. Can you give me an alternative on how to approach this problem without relying on another TODO item labelled ALTER TABLE .. POSITION? The point of the command isn't to accomplish anything magical, simply to make it easier. begin; Rename old column Add new column Copy data Recreate default Recreate NOT NULL constraint Recreate index 1 Recreate index 2 (multi-key) Recreate check constraint 1 Recreate check constraint 2 Recreate check constraint 3 Recreate foreign key 1 remove old column commit; vacuum full table (to get rid of 2x problem since the user was unable to do a fileswap) The above is by hand, the below is with this command: begin; Alter table .. type .. commit; If it becomes possible to change the order without having to goto an exceptional amount of work (that POSITION item) then it would only be obvious to attempt to preserve the position.
OK, Here is another approach, that would retain column order. It will require that the table be locked while this proceeds, but I think this is a good idea anyway. lock table create newtable as select c1, c2, c3::newtype modify pg_class to point to the new filename modify existing pg_attribute for the column in question recreate indexes that exist on the column unlock table Dave On Fri, 2003-11-14 at 09:57, Peter Eisentraut wrote: > Rod Taylor writes: > > > The method is rename old column, add new column, move data across, move > > or reform dependencies, drop old column. > > I can do this by hand. If we have an explicit command to do it, then it > needs to preserve the table schema. Else, this feature would be mostly > useless and a certain source of complaints.
Rod Taylor <pg@rbt.ca> writes: > On Fri, 2003-11-14 at 09:57, Peter Eisentraut wrote: >> I can do this by hand. If we have an explicit command to do it, then it >> needs to preserve the table schema. Else, this feature would be mostly >> useless and a certain source of complaints. > The method was agreed to on -hackers prior to any code having been > written. I believe the consensus was that automating what you could do by hand is still a step forward. It clearly would be better if we could relabel the logical column position after finishing the whole process, but I agree with Rod that that is an independent patch. Combining them into one mega-patch doesn't sound like good engineering. I guess the real question here is whether we would want to revert this capability if a patch to adjust logical column orderings doesn't appear before 7.5. My vote would be "no", but apparently Peter's is "yes". Any other opinions? regards, tom lane
> lock table > create newtable as select c1, c2, c3::newtype > modify pg_class to point to the new filename > modify existing pg_attribute for the column in question > recreate indexes that exist on the column > unlock table I actually tried this to start with an ran into several dead-ends in trying to rebuild constraints, defaults, etc. In order to do this I think you would need to create a second pg_class entry and do a full table swap. By far, the easiest method to preserve order without writing a ton of additional code is to do something that is on the TODO list already, separate the visible position from physical position. Doing the above has lots of added benefits, where spending a ton of time on TYPE change has very few benefits.
Tom Lane writes: > I believe the consensus was that automating what you could do by hand > is still a step forward. I don't recall that, but if so, I would like to revisit that consensus. AFAICT, this patch does not buy us anything at all. It's just a different spelling of existing functionality. We have never done that before. It just makes the system harder to maintain and use. All commands should be reasonably independent, or at least add some functionality of their own. > It clearly would be better if we could relabel the logical column > position after finishing the whole process, but I agree with Rod that > that is an independent patch. Combining them into one mega-patch > doesn't sound like good engineering. Good engineering would be if the logical column number patch comes first. We cannot possibly leave this patch as is. People expect in-place column changes. Things will break left and right, users will complain all over the place if we offer a way to change a column, but yeah, by the way it changes the structure of the table as well. We've had these kinds of good idea/right direction/better than nothing approaches in areas like DROP COLUMN and CLUSTER already, and they were no good. Except in this case, "better than nothing" doesn't even apply, because there is already something. -- Peter Eisentraut peter_e@gmx.net
Peter Eisentraut kirjutas R, 14.11.2003 kell 18:51: > Tom Lane writes: > > > I believe the consensus was that automating what you could do by hand > > is still a step forward. > > I don't recall that, but if so, I would like to revisit that consensus. > > AFAICT, this patch does not buy us anything at all. It's just a different > spelling of existing functionality. We have never done that before. what about DROP COLUMN - this is also just a different spelling for SELECT INTO, migrate all constraints, DROP OLD TABLE, RENAME. > It just makes the system harder to maintain and use. All commands should be > reasonably independent, or at least add some functionality of their own. > > > It clearly would be better if we could relabel the logical column > > position after finishing the whole process, but I agree with Rod that > > that is an independent patch. Combining them into one mega-patch > > doesn't sound like good engineering. > > Good engineering would be if the logical column number patch comes first. Agreed, it would be nice. > We cannot possibly leave this patch as is. People expect in-place column > changes. Does SQL spec even require that SELECT * always returns columns in the same order ? I don't think that relational model assigns any 'order' to columns. BTW, SELECT * is just a different spelling of existing functionality ;) > Things will break left and right, users will complain all over > the place if we offer a way to change a column, but yeah, by the way it > changes the structure of the table as well. ALTER TABLE ADD/DROP COLUMN would also break SELECT *. as would type change in many cases (query expects int but gets string) > We've had these kinds of good > idea/right direction/better than nothing approaches in areas like DROP > COLUMN and CLUSTER already, and they were no good. Except in this case, > "better than nothing" doesn't even apply, because there is already > something. We have always had DUMP -> sed -> LOAD too. This patch is *MUCH* better than nothing for performance. Instead of being N times SEQSCAN (+ intervening VACUUM FULLs to reclaim space), it can do it all in one pass. If that pass takes 1.5 hours, we have a very big win here. ----------------- Hannu
Hannu Krosing <hannu@tm.ee> writes: > Peter Eisentraut kirjutas R, 14.11.2003 kell 18:51: >> I don't recall that, but if so, I would like to revisit that consensus. > [ Hannu disagrees ] Please take this thread to pgsql-hackers; if the issue is going to be contentious then we should try to get a wider spectrum of opinion than just people who follow -patches. regards, tom lane
> This is expected. Doing otherwise would incur into a much bigger > performance hit. > > Anyway, IMHO no code should use SELECT * in any case, which is the only > scenario where one would expect physical column order to matter, isn't > it? Well, we can always bring back the old idea of a attlognum which is the logical order of the columns. Then we use that to expand * Chris
> I guess the real question here is whether we would want to revert this > capability if a patch to adjust logical column orderings doesn't appear > before 7.5. My vote would be "no", but apparently Peter's is "yes". > Any other opinions? The fact that it deals with the nightmare of dropping and recreating indexes and fk's makes it worth it to me. How about functions and views and rules that depend on it though? Chris
Rod Taylor kirjutas L, 08.11.2003 kell 18:55: > A general re-organization of Alter Table. Node wise, it is a > AlterTableStmt with a list of AlterTableCmds. The Cmds are the > individual actions to be completed (Add constraint, drop constraint, add > column, etc.) > > Processing is done in 2 phases. The first phase updates the system > catalogs and creates a work queue for the table scan. The second phase > is to conduct the actual table scan evaluating all constraints and other > per tuple processing simultaneously, as required. This has no effect on > single step operations, but has a large benefit for combinational logic > where multiple table scans would otherwise be required. ... > ALTER TABLE tab ALTER COLUMN col TYPE text TRANSFORM ...; > Currently migrates indexes, check constraints, defaults, and the > column definition to the new type with optional transform. If > the tranform is not supplied, a standard assignment cast is > attempted. Do you have special cases for type changes which don't need data transforms. I mean things like changing VARCHAR(10) to VARCHAR(20), dropping the NOT NULL constraint or changing CHECK A < 3 to CHECK A < 4. All these could be done with no data migration or extra checking. So how much of it should PG attemt to detect automatically and should there be NOSCAN option when progremmer knows better (changing CHECK ABS(A) < 3 into CHECK 9 > (A*A) ) ---------------- Hannu