Thread: Lazy constraints / defaults

Lazy constraints / defaults

From
"Michał Zaborowski"
Date:
Hello,I would like to be able to add CONSTRAINT and/or DEFAULT with out
affecting old rows. Yes, it sounds strange, but... Let's say I have
big table, I want to add new column, with DEFAULT and NOT NULL.
Normally it means long exclusive lock. So - right now I'm adding plain
new column, then DEFAULT, then UPDATE on all rows in chunks, then NOT
NULL... Can it be little simpler?

--
Regards, Michał Zaborowski (TeXXaS)

Re: Lazy constraints / defaults

From
Tom Lane
Date:
"Michał Zaborowski" <michal.zaborowski@gmail.com> writes:
>  I would like to be able to add CONSTRAINT and/or DEFAULT with out
> affecting old rows.

You mean without actually checking that the old rows satisfy the
constraint?  There's approximately zero chance that that proposal
will be accepted.

> Yes, it sounds strange, but... Let's say I have
> big table, I want to add new column, with DEFAULT and NOT NULL.
> Normally it means long exclusive lock. So - right now I'm adding plain
> new column, then DEFAULT, then UPDATE on all rows in chunks, then NOT
> NULL... Can it be little simpler?

Just do it all in one ALTER command.

alter table tab add column col integer not null default 42 check (col > 0);
        regards, tom lane


Re: Lazy constraints / defaults

From
"Dawid Kuroczko"
Date:
On Sun, Mar 9, 2008 at 7:25 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Michał Zaborowski" <michal.zaborowski@gmail.com> writes:
>  >  I would like to be able to add CONSTRAINT and/or DEFAULT with out
>  > affecting old rows.
>
>  You mean without actually checking that the old rows satisfy the
>  constraint?  There's approximately zero chance that that proposal
>  will be accepted.

I think the problem here is to minimize the time when table is held by
exclusive lock,
Something similar to the CREATE INDEX CONCURRENTLY (i.e. hold exclusive lock
for a jiffy, then do the actual work for the old tuples).

So, the proposal would read as to add the ability to perform:
 ALTER TABLE CONCURRENTLY ALTER COLUMN foo SET NOT NULL
...where exclusive lock would be held to place the constraint (so all new
tuples would satisfy it), lock would be released and the old tuples would
be checked to make sure the constraint is valid.

Should a NULL value be found or should the backend die, the constraint
should disappear or be marked invalid.

>  > Yes, it sounds strange, but... Let's say I have
>  > big table, I want to add new column, with DEFAULT and NOT NULL.
>  > Normally it means long exclusive lock. So - right now I'm adding plain
>  > new column, then DEFAULT, then UPDATE on all rows in chunks, then NOT
>  > NULL... Can it be little simpler?
>
>  Just do it all in one ALTER command.
>
>  alter table tab add column col integer not null default 42 check (col > 0);

I think this will not solve the OP's problem.  He wants to minimize the time
a table is under exclusive lock, and this ALTER command will effectively
rewrite the whole table (to add new not null column).

Probably a workable solution would be to play with inheritance:
-- Add the NULL col colum: ALTER TABLE tab ADD COLUMN col integer;
-- Create a table which will have col NOT NULL CREATE TABLE tab_new (LIKE tab INCLUDING DEFAULTS INCLUDING
CONSTRAINTS INCLUDING INDEXES ) INHERITS (tab); ALTER TABLE tab_new ALTER COLUMN col SET NOT NULL;
-- Make the new values go to tab_new, if simple enough same might be
done for UPDATEs CREATE RULE insert_new AS ON INSERT TO tab DO INSTEAD INSERT INTO
tab_new VALUES (NEW.*);

-- Now, make a job which will do something like this: START TRANSACTION ISOLATON LEVEL SERIALIZABLE; UPDATE ONLY tab
SETcol = 42 WHERE id BETWEEN n AND n + 1000; INSERT INTO tab_new SELECT * FROM ONLY tab WHERE id BETWEEN n AND n +
1000;-- or better: -- INSERT INTO tab_new SELECT a,b,c,42 AS col FROM ONLY tab WHERE id
 
BETWEEN n AND n + 1000 FOR UPDATE; DELETE FROM ONLY tab WHERE id BETWEEN n AND n + 1000; COMMIT;

-- Finally, exhange parti^W^W get rid of old tab: SELECT count(*) FROM ONLY tab; -- should be zero ALTER TABLE tab
RENAMETO tab_old; ALTER TABLE tab_new RENAME TO tab; ALTER TABLE tab NO INHERIT tab_old;
 

Of course each step should be done in transaction, probably starting
with explicit LOCK.  And extra care should be taken
with respect to the UNIQUE constraints.  In short: unless you are 100%
sure what you are doing, don't. :-)
  Regards,     Dawid


Re: Lazy constraints / defaults

From
Martijn van Oosterhout
Date:
On Sun, Mar 09, 2008 at 10:45:59PM +0100, Dawid Kuroczko wrote:
> >  alter table tab add column col integer not null default 42 check (col > 0);
>
> I think this will not solve the OP's problem.  He wants to minimize the time
> a table is under exclusive lock, and this ALTER command will effectively
> rewrite the whole table (to add new not null column).

ISTM a while back someone adding a patch that made the above almost
instantaneous. Probably remembering wrong though.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.

Re: Lazy constraints / defaults

From
"Michał Zaborowski"
Date:
Hello, Let me try again...

Here is simple example.
To do:
alter table users add column aaaa integer not null default 0;
Table is big, updated, referenced etc (big - means that alter lock the
table long enought
to kill the system). Note that it is not my design - I have to do
alter the table... but

Solution:
1. alter table users add column aaaa integer; -- short lock
2. alter table users alter column aaaa set default 0;
3. update users set aaaa = 0 where users.id between a and b; --
preparing for constraint - in small chunks
4. update users set aaaa = 0 where aaaa is null;
5. alter table users alter column aaaa set not null;

Works, but I hate it.
I would like to do:
alter table users add column aaaa integer not null default 0;
- with something like "concurrently" or "no check" - and let PG to do
the job. In that case I expect
PG to update meta data, and for updated rows set default - in other
case they can not satisfy check.
It would be great that step 3 has been done, but I understand it can
be a problem. I see that breaking
operation integrity is needed. I have a script with some parameters
that do it almost automatically.
What I want to point is that PG becomes more and more popular. People
use it for bigger and bigger
databases. In that case typical alter can be a PITA. If something can
be done by DB, I would like it
to be done in this way - as safer and faster way. In this particular
case - I expect DB to take care about
new and updated data. Correcting older rows is nice to have. That
parameter can be stored to inform
everybody - that some data may not satisfy check or null can be found
instead of default.

Look at commit_delay / commit_siblings. System is faster, but if
something go wrong - something (else)
will be lost. It is DBA decision what to choose. If DB keeps all
information in pg_class, pg_attribute
everyone can get how the changes ware made.

--
Regards, Michał Zaborowski (TeXXaS)

Re: Lazy constraints / defaults

From
"Dawid Kuroczko"
Date:
On Thu, Mar 20, 2008 at 4:56 PM, Decibel! <decibel@decibel.org> wrote:
> This would be very useful for me, and would satisfy the OP's request.
>
>  Can we get a TODO?

If you feel adventureous you may UPDATE catalog constrains directly,
which will work more or less as "enforce, don't validate, don't lock". ;-)

Worked for me. ;-)  I deliberately don't say where to update. ;-)

But for a TODO feature it should LOUDLY INDICATE that given
constraint is NOT VALIDATED,  After all, PostgreSQL is famous
for its high ACIDity standards. [ And event then... ;) ]
 Regards,    Dawid


Re: Lazy constraints / defaults

From
Decibel!
Date:
This would be very useful for me, and would satisfy the OP's request.

Can we get a TODO?

On Mar 9, 2008, at 4:45 PM, Dawid Kuroczko wrote:

> On Sun, Mar 9, 2008 at 7:25 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> "Michał Zaborowski"  
>> <michal.zaborowski@gmail.com> writes:
>>>  I would like to be able to add CONSTRAINT and/or DEFAULT with out
>>> affecting old rows.
>>
>>  You mean without actually checking that the old rows satisfy the
>>  constraint?  There's approximately zero chance that that proposal
>>  will be accepted.
>
> I think the problem here is to minimize the time when table is held by
> exclusive lock,
> Something similar to the CREATE INDEX CONCURRENTLY (i.e. hold  
> exclusive lock
> for a jiffy, then do the actual work for the old tuples).
>
> So, the proposal would read as to add the ability to perform:
>
>   ALTER TABLE CONCURRENTLY ALTER COLUMN foo SET NOT NULL
> ...where exclusive lock would be held to place the constraint (so  
> all new
> tuples would satisfy it), lock would be released and the old tuples  
> would
> be checked to make sure the constraint is valid.
>
> Should a NULL value be found or should the backend die, the constraint
> should disappear or be marked invalid.
>
>>> Yes, it sounds strange, but... Let's say I have
>>> big table, I want to add new column, with DEFAULT and NOT NULL.
>>> Normally it means long exclusive lock. So - right now I'm adding  
>>> plain
>>> new column, then DEFAULT, then UPDATE on all rows in chunks, then  
>>> NOT
>>> NULL... Can it be little simpler?
>>
>>  Just do it all in one ALTER command.
>>
>>  alter table tab add column col integer not null default 42 check  
>> (col > 0);
>
> I think this will not solve the OP's problem.  He wants to minimize  
> the time
> a table is under exclusive lock, and this ALTER command will  
> effectively
> rewrite the whole table (to add new not null column).
>
> Probably a workable solution would be to play with inheritance:
> -- Add the NULL col colum:
>   ALTER TABLE tab ADD COLUMN col integer;
> -- Create a table which will have col NOT NULL
>   CREATE TABLE tab_new (LIKE tab INCLUDING DEFAULTS INCLUDING
> CONSTRAINTS INCLUDING INDEXES ) INHERITS (tab);
>   ALTER TABLE tab_new ALTER COLUMN col SET NOT NULL;
> -- Make the new values go to tab_new, if simple enough same might be
> done for UPDATEs
>   CREATE RULE insert_new AS ON INSERT TO tab DO INSTEAD INSERT INTO
> tab_new VALUES (NEW.*);
>
> -- Now, make a job which will do something like this:
>   START TRANSACTION ISOLATON LEVEL SERIALIZABLE;
>   UPDATE ONLY tab SET col = 42 WHERE id BETWEEN n AND n + 1000;
>   INSERT INTO tab_new SELECT * FROM ONLY tab WHERE id BETWEEN n AND  
> n + 1000;
>   -- or better:
>   -- INSERT INTO tab_new SELECT a,b,c,42 AS col FROM ONLY tab WHERE id
> BETWEEN n AND n + 1000 FOR UPDATE;
>   DELETE FROM ONLY tab WHERE id BETWEEN n AND n + 1000;
>   COMMIT;
>
> -- Finally, exhange parti^W^W get rid of old tab:
>   SELECT count(*) FROM ONLY tab; -- should be zero
>   ALTER TABLE tab RENAME TO tab_old;
>   ALTER TABLE tab_new RENAME TO tab;
>   ALTER TABLE tab NO INHERIT tab_old;
>
> Of course each step should be done in transaction, probably starting
> with explicit LOCK.  And extra care should be taken
> with respect to the UNIQUE constraints.  In short: unless you are 100%
> sure what you are doing, don't. :-)
>
>    Regards,
>       Dawid
>
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

-- 
Decibel!, aka Jim C. Nasby, Database Architect  decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828



Re: Lazy constraints / defaults

From
Bruce Momjian
Date:
I am wondering whether people use ALTER TABLE ALTER COLUMN foo SET NOT
NULL enough to justify concurrency coding.

---------------------------------------------------------------------------

Decibel! wrote:
> This would be very useful for me, and would satisfy the OP's request.
> 
> Can we get a TODO?
> 
> On Mar 9, 2008, at 4:45 PM, Dawid Kuroczko wrote:
> 
> > On Sun, Mar 9, 2008 at 7:25 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> "Michał Zaborowski"  
> >> <michal.zaborowski@gmail.com> writes:
> >>>  I would like to be able to add CONSTRAINT and/or DEFAULT with out
> >>> affecting old rows.
> >>
> >>  You mean without actually checking that the old rows satisfy the
> >>  constraint?  There's approximately zero chance that that proposal
> >>  will be accepted.
> >
> > I think the problem here is to minimize the time when table is held by
> > exclusive lock,
> > Something similar to the CREATE INDEX CONCURRENTLY (i.e. hold  
> > exclusive lock
> > for a jiffy, then do the actual work for the old tuples).
> >
> > So, the proposal would read as to add the ability to perform:
> >
> >   ALTER TABLE CONCURRENTLY ALTER COLUMN foo SET NOT NULL
> > ...where exclusive lock would be held to place the constraint (so  
> > all new
> > tuples would satisfy it), lock would be released and the old tuples  
> > would
> > be checked to make sure the constraint is valid.
> >
> > Should a NULL value be found or should the backend die, the constraint
> > should disappear or be marked invalid.
> >
> >>> Yes, it sounds strange, but... Let's say I have
> >>> big table, I want to add new column, with DEFAULT and NOT NULL.
> >>> Normally it means long exclusive lock. So - right now I'm adding  
> >>> plain
> >>> new column, then DEFAULT, then UPDATE on all rows in chunks, then  
> >>> NOT
> >>> NULL... Can it be little simpler?
> >>
> >>  Just do it all in one ALTER command.
> >>
> >>  alter table tab add column col integer not null default 42 check  
> >> (col > 0);
> >
> > I think this will not solve the OP's problem.  He wants to minimize  
> > the time
> > a table is under exclusive lock, and this ALTER command will  
> > effectively
> > rewrite the whole table (to add new not null column).
> >
> > Probably a workable solution would be to play with inheritance:
> > -- Add the NULL col colum:
> >   ALTER TABLE tab ADD COLUMN col integer;
> > -- Create a table which will have col NOT NULL
> >   CREATE TABLE tab_new (LIKE tab INCLUDING DEFAULTS INCLUDING
> > CONSTRAINTS INCLUDING INDEXES ) INHERITS (tab);
> >   ALTER TABLE tab_new ALTER COLUMN col SET NOT NULL;
> > -- Make the new values go to tab_new, if simple enough same might be
> > done for UPDATEs
> >   CREATE RULE insert_new AS ON INSERT TO tab DO INSTEAD INSERT INTO
> > tab_new VALUES (NEW.*);
> >
> > -- Now, make a job which will do something like this:
> >   START TRANSACTION ISOLATON LEVEL SERIALIZABLE;
> >   UPDATE ONLY tab SET col = 42 WHERE id BETWEEN n AND n + 1000;
> >   INSERT INTO tab_new SELECT * FROM ONLY tab WHERE id BETWEEN n AND  
> > n + 1000;
> >   -- or better:
> >   -- INSERT INTO tab_new SELECT a,b,c,42 AS col FROM ONLY tab WHERE id
> > BETWEEN n AND n + 1000 FOR UPDATE;
> >   DELETE FROM ONLY tab WHERE id BETWEEN n AND n + 1000;
> >   COMMIT;
> >
> > -- Finally, exhange parti^W^W get rid of old tab:
> >   SELECT count(*) FROM ONLY tab; -- should be zero
> >   ALTER TABLE tab RENAME TO tab_old;
> >   ALTER TABLE tab_new RENAME TO tab;
> >   ALTER TABLE tab NO INHERIT tab_old;
> >
> > Of course each step should be done in transaction, probably starting
> > with explicit LOCK.  And extra care should be taken
> > with respect to the UNIQUE constraints.  In short: unless you are 100%
> > sure what you are doing, don't. :-)
> >
> >    Regards,
> >       Dawid
> >
> > -- 
> > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> > To make changes to your subscription:
> > http://www.postgresql.org/mailpref/pgsql-hackers
> >
> 
> -- 
> Decibel!, aka Jim C. Nasby, Database Architect  decibel@decibel.org
> Give your computer some brain candy! www.distributed.net Team #1828
> 
> 

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +