Thread: feature request - adding columns with default value
hello, when doing an "alter table blub add column blah int2" and then issuing an "alter table blub alter column blah set default 0" I will always have to manually "update blub set blah=0" to initialize all existing records.... this is quite annoying while performing this update on a running database with 100 simultaneous clients accessing the database and about 443482 rows in the table. it takes about 10 minutes to perform this update, and for about 6 minutes all other updates to the table are in status "waiting", this leads to a very high load and all max_connection setting is reached soon (on a Dual-Xeon 2.4 GHz machine with 2 GB RAM) i would suggest to change the "alter table add column" function to adding a default value while adding the column if this is possible and if this minimizes "down-time". thank you -- Mit freundlichem Gruß Henrik Steffen Geschäftsführer top concepts Internetmarketing GmbH Am Steinkamp 7 - D-21684 Stade - Germany -------------------------------------------------------- http://www.topconcepts.com Tel. +49 4141 991230 mail: steffen@topconcepts.com Fax. +49 4141 991233 -------------------------------------------------------- 24h-Support Hotline: +49 1908 34697 (EUR 1.86/Min,topc) -------------------------------------------------------- Ihr SMS-Gateway: JETZT NEU unter: http://sms.city-map.de System-Partner gesucht: http://www.franchise.city-map.de -------------------------------------------------------- Handelsregister: AG Stade HRB 5811 - UstId: DE 213645563 --------------------------------------------------------
On Friday 04 April 2003 13:38, you wrote: > hello, > > when doing an "alter table blub add column blah int2" > and then issuing an "alter table blub alter column blah set default 0" > I will always have to manually "update blub set blah=0" to initialize > all existing records.... this is quite annoying while performing this > update on a running database with 100 simultaneous clients accessing > the database and about 443482 rows in the table. > > it takes about 10 minutes to perform this update, and for about 6 minutes > all other updates to the table are in status "waiting", this leads to > a very high load and all max_connection setting is reached soon > (on a Dual-Xeon 2.4 GHz machine with 2 GB RAM) > > i would suggest to change the "alter table add column" function to > adding a default value while adding the column if this is possible > and if this minimizes "down-time". May be I don't get it right, but altering defaults on live table, would yield inconsistent data. Some NULLS in past rows and zeros being inserted in new rows and that is not correct. Am I right here? Furthermore mass updating close to 450,000 row would give you a nice waste of dead tuples and your next vacuum will take loooong time unless you are vacuuming in tight loops while updating. I recommend you schedule a downtime, dump the table and reload it. This way there would be no wasted tuples. Any change in database schema should go thr. a scheduled maintenance , if you ask me. Shridhar
fortunately, the vacuum after the update does only take approx. 30 seconds thanks to the ultra-fast machine we are using... -- Mit freundlichem Gruß Henrik Steffen Geschäftsführer top concepts Internetmarketing GmbH Am Steinkamp 7 - D-21684 Stade - Germany -------------------------------------------------------- http://www.topconcepts.com Tel. +49 4141 991230 mail: steffen@topconcepts.com Fax. +49 4141 991233 -------------------------------------------------------- 24h-Support Hotline: +49 1908 34697 (EUR 1.86/Min,topc) -------------------------------------------------------- Ihr SMS-Gateway: JETZT NEU unter: http://sms.city-map.de System-Partner gesucht: http://www.franchise.city-map.de -------------------------------------------------------- Handelsregister: AG Stade HRB 5811 - UstId: DE 213645563 -------------------------------------------------------- ----- Original Message ----- From: "Shridhar Daithankar" <shridhar_daithankar@persistent.co.in> To: "pgsql" <pgsql-general@postgresql.org> Sent: Friday, April 04, 2003 10:24 AM Subject: Re: [GENERAL] feature request - adding columns with default value > On Friday 04 April 2003 13:38, you wrote: > > hello, > > > > when doing an "alter table blub add column blah int2" > > and then issuing an "alter table blub alter column blah set default 0" > > I will always have to manually "update blub set blah=0" to initialize > > all existing records.... this is quite annoying while performing this > > update on a running database with 100 simultaneous clients accessing > > the database and about 443482 rows in the table. > > > > it takes about 10 minutes to perform this update, and for about 6 minutes > > all other updates to the table are in status "waiting", this leads to > > a very high load and all max_connection setting is reached soon > > (on a Dual-Xeon 2.4 GHz machine with 2 GB RAM) > > > > i would suggest to change the "alter table add column" function to > > adding a default value while adding the column if this is possible > > and if this minimizes "down-time". > > May be I don't get it right, but altering defaults on live table, would yield > inconsistent data. Some NULLS in past rows and zeros being inserted in new > rows and that is not correct. Am I right here? > > Furthermore mass updating close to 450,000 row would give you a nice waste of > dead tuples and your next vacuum will take loooong time unless you are > vacuuming in tight loops while updating. > > I recommend you schedule a downtime, dump the table and reload it. This way > there would be no wasted tuples. Any change in database schema should go thr. > a scheduled maintenance , if you ask me. > > Shridhar > > > ---------------------------(end of broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org
On Fri, Apr 04, 2003 at 10:08:50AM +0200, Henrik Steffen wrote: > it takes about 10 minutes to perform this update, and for about 6 minutes > all other updates to the table are in status "waiting", this leads to > a very high load and all max_connection setting is reached soon > (on a Dual-Xeon 2.4 GHz machine with 2 GB RAM) > > i would suggest to change the "alter table add column" function to > adding a default value while adding the column if this is possible > and if this minimizes "down-time". Why do you think that having the back end set the default value on those other rows won't sugger the very same problem? What esle is it to do except implicitly iees the same UPDATE you do? One way around this problem is to write a little script (I use perl for this, but pick your favourite flavour) which does the updates 1000 at a time. This minimises the locking, so you don't have to have anything WAITING for 10 minutes. It also allows you to insert the occasional VACUUM so that your FSM settings don't have to be huge. A -- ---- Andrew Sullivan 204-4141 Yonge Street Liberty RMS Toronto, Ontario Canada <andrew@libertyrms.info> M2P 2A8 +1 416 646 3304 x110
Henrik Steffen wrote: >hello, > >when doing an "alter table blub add column blah int2" >and then issuing an "alter table blub alter column blah set default 0" >I will always have to manually "update blub set blah=0" to initialize >all existing records.... this is quite annoying while performing this >update on a running database with 100 simultaneous clients accessing >the database and about 443482 rows in the table. > >it takes about 10 minutes to perform this update, and for about 6 minutes >all other updates to the table are in status "waiting", this leads to >a very high load and all max_connection setting is reached soon >(on a Dual-Xeon 2.4 GHz machine with 2 GB RAM) > >i would suggest to change the "alter table add column" function to >adding a default value while adding the column if this is possible >and if this minimizes "down-time". > >thank you > > Maybe I'm wrong, but this feature was supported in 7.1, and it was removed in 7.2. If what I'm saying is true, I would like to know why was this feature removed? -- Diogo de Oliveira Biazus diogo@ikono.com.br Ikono Sistemas e Automação http://www.ikono.com.br
On Fri, 4 Apr 2003, Diogo de Oliveira Biazus wrote: > Henrik Steffen wrote: > > >hello, > > > >when doing an "alter table blub add column blah int2" > >and then issuing an "alter table blub alter column blah set default 0" > >I will always have to manually "update blub set blah=0" to initialize > >all existing records.... this is quite annoying while performing this > >update on a running database with 100 simultaneous clients accessing > >the database and about 443482 rows in the table. > > > >it takes about 10 minutes to perform this update, and for about 6 minutes > >all other updates to the table are in status "waiting", this leads to > >a very high load and all max_connection setting is reached soon > >(on a Dual-Xeon 2.4 GHz machine with 2 GB RAM) > > > >i would suggest to change the "alter table add column" function to > >adding a default value while adding the column if this is possible > >and if this minimizes "down-time". > > > >thank you > > > > > Maybe I'm wrong, but this feature was supported in 7.1, > and it was removed in 7.2. > If what I'm saying is true, I would like to know why was this feature > removed? AFAIR, some older versions accepted the syntax but then totally ignored the extra attributes (ie, you could say something like int2 default 0, but the default was just silently lost).
"Henrik Steffen" <steffen@city-map.de> writes: > when doing an "alter table blub add column blah int2" > and then issuing an "alter table blub alter column blah set default 0" > I will always have to manually "update blub set blah=0" to initialize > all existing records.... this is quite annoying while performing this > update on a running database with 100 simultaneous clients accessing > the database and about 443482 rows in the table. As someone else already pointed out, allowing "alter table add column" to specify a default would simply mean the system has to go through these same steps behind your back. It would *not* be any faster. What it would be is a lot worse from a concurrency standpoint, because the "alter table" transaction has to take an exclusive lock, which would then have to be held throughout the update of the data rows. Doing it in separate steps allows the exclusive lock to be held only while the critical catalog updates are performed. The UPDATE part is then just an ordinary writer that needs no exclusive lock. > it takes about 10 minutes to perform this update, and for about 6 minutes > all other updates to the table are in status "waiting", this leads to > a very high load and all max_connection setting is reached soon > (on a Dual-Xeon 2.4 GHz machine with 2 GB RAM) Other transactions should only need to block if they try to update a row already updated by the "set blah=0" transaction. You might consider breaking the big update into a series of smaller transactions, say updating 10% of the rows at a time. (If you vacuumed after each of these smaller updates, you could also minimize table bloat.) We will eventually support "add column" with default, but AFAICS it will only be a convenience feature; if you are concerned about keeping the table available for concurrent use, the multi-step manual approach will always win. regards, tom lane
On Fri, 4 Apr 2003, Henrik Steffen wrote: > when doing an "alter table blub add column blah int2" > and then issuing an "alter table blub alter column blah set default 0" > I will always have to manually "update blub set blah=0" to initialize > all existing records.... this is quite annoying while performing this > update on a running database with 100 simultaneous clients accessing > the database and about 443482 rows in the table. > > it takes about 10 minutes to perform this update, and for about 6 minutes > all other updates to the table are in status "waiting", this leads to > a very high load and all max_connection setting is reached soon > (on a Dual-Xeon 2.4 GHz machine with 2 GB RAM) > > i would suggest to change the "alter table add column" function to > adding a default value while adding the column if this is possible > and if this minimizes "down-time". IIRC, ADD COLUMN adds the column to the end of the list, but doesn't modify the actual rows stored on disk. I think when the column past the end is accessed on a row that doesn't have it a NULL is returned. SET DEFAULT just sets the default up but also doesn't modify the actual rows stored on disk. Presumably SET DEFAULT should probably do the update, but it's not likely to be appreciably faster than you doing the update. Other options would include things like updating the rows on select (that sounds like it could lead to wierd deadlocks between a select and an update), returning the default rather than NULL for the access (but that doesn't work for non-immutable defaults unless you update the row during the select) or building a new copy of the table (requires potentially alot of disk space). In addition, since the transaction in question probably has an exclusive lock to the table, the other transactions need to wait while the alter runs (so a new copy doesn't save you from locking reads while the table is building unless you drop the lock, which is probably dangerous). I think this was discussed in the past, so the archives can probably do a better job than I can (since I wasn't paying all that much attention).
Diogo de Oliveira Biazus <diogo@ikono.com.br> writes: > Henrik Steffen wrote: >> i would suggest to change the "alter table add column" function to >> adding a default value while adding the column if this is possible >> and if this minimizes "down-time". > Maybe I'm wrong, but this feature was supported in 7.1, > and it was removed in 7.2. THe old implementation was really equivalent to "alter table add column" followed by "set default", because it didn't touch the actual rows (the implied new column stayed NULL). That's not what the command should do per SQL spec, so we took it out until we get around to implementing the spec-compliant behavior. The current behavior conforms to spec as far as it goes. regards, tom lane
Stephan Szabo <sszabo@megazone23.bigpanda.com> writes: > SET DEFAULT just sets the default up but also doesn't modify the actual rows > stored on disk. Presumably SET DEFAULT should probably do the update, No, it shouldn't; the existing behavior is per spec. SET DEFAULT just changes the default for future insertions, it's not supposed to touch the present table contents. ALTER TABLE ADD COLUMN with a default clause *is* supposed to fill all rows of the table with the default. The reason we reject it is we don't have that behavior implemented. regards, tom lane
do ANY databases do this? How does it fit the SQL standard. And, whether you do it manually or the ALTER TABLE command does it, it's still going to block other updates. Henrik Steffen wrote: > hello, > > when doing an "alter table blub add column blah int2" > and then issuing an "alter table blub alter column blah set default 0" > I will always have to manually "update blub set blah=0" to initialize > all existing records.... this is quite annoying while performing this > update on a running database with 100 simultaneous clients accessing > the database and about 443482 rows in the table. > > it takes about 10 minutes to perform this update, and for about 6 minutes > all other updates to the table are in status "waiting", this leads to > a very high load and all max_connection setting is reached soon > (on a Dual-Xeon 2.4 GHz machine with 2 GB RAM) > > i would suggest to change the "alter table add column" function to > adding a default value while adding the column if this is possible > and if this minimizes "down-time". > > thank you > > -- > > Mit freundlichem Gruß > > Henrik Steffen > Geschäftsführer > > top concepts Internetmarketing GmbH > Am Steinkamp 7 - D-21684 Stade - Germany > -------------------------------------------------------- > http://www.topconcepts.com Tel. +49 4141 991230 > mail: steffen@topconcepts.com Fax. +49 4141 991233 > -------------------------------------------------------- > 24h-Support Hotline: +49 1908 34697 (EUR 1.86/Min,topc) > -------------------------------------------------------- > Ihr SMS-Gateway: JETZT NEU unter: http://sms.city-map.de > System-Partner gesucht: http://www.franchise.city-map.de > -------------------------------------------------------- > Handelsregister: AG Stade HRB 5811 - UstId: DE 213645563 > -------------------------------------------------------- > > > ---------------------------(end of broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org >
On Fri, 4 Apr 2003, Tom Lane wrote: > Stephan Szabo <sszabo@megazone23.bigpanda.com> writes: > > SET DEFAULT just sets the default up but also doesn't modify the actual rows > > stored on disk. Presumably SET DEFAULT should probably do the update, > > No, it shouldn't; the existing behavior is per spec. SET DEFAULT just > changes the default for future insertions, it's not supposed to touch > the present table contents. Yeah, realized that didn't make much sense afterwards. > ALTER TABLE ADD COLUMN with a default clause *is* supposed to fill all > rows of the table with the default. The reason we reject it is we don't > have that behavior implemented. Right. We should probably change the error text then since it doesn't actually mention the update being necessary (and what it suggests is just as wrong as the behavior it would have if we just did the default in one step).
** Reply to message from Dennis Gearon <gearond@cvc.net> on Fri, 04 Apr 2003 08:19:24 -0800 Yeah, Db2 does. The reall usefulness of this is the form :- Alter table add blah varchar(10) not null default 'No' The backend appears to initialise the new column to the default value (just about instantly - even for large tables), and you can add a not null column to the table in one statement. Regards, Wayne > do ANY databases do this? How does it fit the SQL standard. And, whether you do > it manually or the ALTER TABLE command does it, it's still going to block other > updates. > > Henrik Steffen wrote: > > hello, > > > > when doing an "alter table blub add column blah int2" > > and then issuing an "alter table blub alter column blah set default 0" > > I will always have to manually "update blub set blah=0" to initialize > > all existing records.... this is quite annoying while performing this > > update on a running database with 100 simultaneous clients accessing > > the database and about 443482 rows in the table. > > > > it takes about 10 minutes to perform this update, and for about 6 minutes > > all other updates to the table are in status "waiting", this leads to > > a very high load and all max_connection setting is reached soon > > (on a Dual-Xeon 2.4 GHz machine with 2 GB RAM) > > > > i would suggest to change the "alter table add column" function to > > adding a default value while adding the column if this is possible > > and if this minimizes "down-time". > > > > thank you > > > > -- > > > > Mit freundlichem Gruß > > > > Henrik Steffen > > Geschäftsführer > > > > top concepts Internetmarketing GmbH > > Am Steinkamp 7 - D-21684 Stade - Germany > > -------------------------------------------------------- > > http://www.topconcepts.com Tel. +49 4141 991230 > > mail: steffen@topconcepts.com Fax. +49 4141 991233 > > -------------------------------------------------------- > > 24h-Support Hotline: +49 1908 34697 (EUR 1.86/Min,topc) > > -------------------------------------------------------- > > Ihr SMS-Gateway: JETZT NEU unter: http://sms.city-map.de > > System-Partner gesucht: http://www.franchise.city-map.de > > -------------------------------------------------------- > > Handelsregister: AG Stade HRB 5811 - UstId: DE 213645563 > > -------------------------------------------------------- > > > > > > ---------------------------(end of broadcast)--------------------------- > > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org > > > > > ---------------------------(end of broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faqs/FAQ.html