Thread: feature idea: use index when checking for NULLs before SET NOT NULL

feature idea: use index when checking for NULLs before SET NOT NULL

From
"John Bachir"
Date:
There's the age-old problem of SET NOT NULL being impossible on large actively used tables, because it needs to lock
thetable and do a table scan to check if there are any existing NULL values. I currently have a table that's not
particularlyhuge but a scan takes 70 seconds, which causes unacceptable downtime for my entire application.
 

Postgres is not able to use an index when doing this check: https://dba.stackexchange.com/questions/267947

Would it be possible to have Postgres use an index for this check? Given the right index, the check could be instant
andthe table would only need to be locked for milliseconds.
 

(I'm sure I'm not the first person to think of this, but I couldn't find any other discussion on this list or
elsewhere.)

Thanks for reading!
John



Re: feature idea: use index when checking for NULLs before SET NOT NULL

From
Sergei Kornilov
Date:
Hello

Correct index lookup is a difficult task. I tried to implement this previously...

But the answer in SO is a bit incomplete for recent postgresql releases. Seqscan is not the only possible way to set
notnull in pg12+. My patch was commited ( https://commitfest.postgresql.org/22/1389/ ) and now it's possible to do this
way:

alter table foos 
     add constraint foos_not_null 
     check (bar1 is not null) not valid; -- short-time exclusive lock

alter table foos validate constraint foos_not_null; -- still seqscan entire table but without exclusive lock

An then another short lock:
alter table foos alter column bar1 set not null;
alter table foos drop constraint foos_not_null;

regards, Sergei



Re: feature idea: use index when checking for NULLs before SET NOT NULL

From
Oleksandr Shulgin
Date:
On Fri, May 29, 2020 at 8:56 AM Sergei Kornilov <sk@zsrv.org> wrote:
Hello

Correct index lookup is a difficult task. I tried to implement this previously...

But the answer in SO is a bit incomplete for recent postgresql releases. Seqscan is not the only possible way to set not null in pg12+. My patch was commited ( https://commitfest.postgresql.org/22/1389/ ) and now it's possible to do this way:

alter table foos
     add constraint foos_not_null
     check (bar1 is not null) not valid; -- short-time exclusive lock

alter table foos validate constraint foos_not_null; -- still seqscan entire table but without exclusive lock

An then another short lock:
alter table foos alter column bar1 set not null;
alter table foos drop constraint foos_not_null;

That's really good to know, Sergei!

John, I think it's worth pointing out that Postgres most likely does a full table scan to validate a constraint by design and not in optimization oversight.  Think of what's gonna happen if the index used for checking is corrupted?

Cheers,
--
Alex

Re: feature idea: use index when checking for NULLs before SET NOT NULL

From
"John Bachir"
Date:
Wow! Thank you Sergei for working on this patch, for working for months/years to get it in, and for replying to my
email!

For others reading this later:
- the feature was introduced in 12
- the commit is here https://github.com/postgres/postgres/commit/bbb96c3704c041d139181c6601e5bc770e045d26

Sergei, a few questions:

- Just to be clear, your recipe does not require any indexes, right? Because the constraint check table scan is
inherentlyconcurrent?
 
- Was this new behavior mentioned in the release nose?
- Do you know if there are any blog posts etc. discussing this? (I'm definitely going to write one!!)

John


> 
> But the answer in SO is a bit incomplete for recent postgresql 
> releases. Seqscan is not the only possible way to set not null in 
> pg12+. My patch was commited ( 
> https://commitfest.postgresql.org/22/1389/ ) and now it's possible to 
> do this way:
> 
> alter table foos 
>      add constraint foos_not_null 
>      check (bar1 is not null) not valid; -- short-time exclusive lock
> 
> alter table foos validate constraint foos_not_null; -- still seqscan 
> entire table but without exclusive lock
> 
> An then another short lock:
> alter table foos alter column bar1 set not null;
> alter table foos drop constraint foos_not_null;
> 
> regards, Sergei
>



Re: feature idea: use index when checking for NULLs before SET NOT NULL

From
Darafei "Komяpa" Praliaskouski
Date:


John, I think it's worth pointing out that Postgres most likely does a full table scan to validate a constraint by design and not in optimization oversight.  Think of what's gonna happen if the index used for checking is corrupted?


This can't be true: a corrupted index is a failure mode, and failure modes are not expected in normal flow. Think of it this way: we must never use index scan, because if index is corrupted the results are going to be disastrous, so we will always do Seq Scans.

It's ok to assume index is not corrupted.

 

--
Darafei Praliaskouski

Re: feature idea: use index when checking for NULLs before SET NOT NULL

From
Sergei Kornilov
Date:
Hi

> Sergei, a few questions:
>
> - Just to be clear, your recipe does not require any indexes, right? Because the constraint check table scan is
inherentlyconcurrent?
 

Right. "alter table validate constraint" can not use indexes, but does not block concurrent read/write queries. Other
queriesin this scenario can not use indexes too, but should be fast.
 

> - Was this new behavior mentioned in the release nose?

Yes, small note in documentation and small note in release notes https://www.postgresql.org/docs/12/release-12.html
thisone:
 

> Allow ALTER TABLE ... SET NOT NULL to avoid unnecessary table scans (Sergei Kornilov)
> This can be optimized when the table's column constraints can be recognized as disallowing nulls.

> - Do you know if there are any blog posts etc. discussing this? (I'm definitely going to write one!!)

I do not know. Personally, I mentioned this feature in only a few Russian-speaking communities. And right now I wrote
answerin dba.SO.
 

regards, Sergei



Re: feature idea: use index when checking for NULLs before SET NOT NULL

From
"John Bachir"
Date:
Hi Sergei - I just used the recipe on my production database. I didn't observe all the expected benefits, I wonder if
therewere confounding factors or if I did something wrong. If you have time, I'd love to get your feedback. Let me know
ifyou need more info. I'd love to write a blog post informing the world about this potentially game-changing feature!
 

Here are the commands I did, with some notes. All the columns are boolean. The table has about 8,600,000 rows.

This (blocking operation) was not fast, perhaps 60-100 seconds. maybe running them individually
would have been proportionally faster. but even then, not near-instant as expected.
or, maybe running them together had some sort of aggregate negative effect, so running them individually
would have been instant? I don't have much experience with such constraints.

ALTER TABLE my_table
  ADD CONSTRAINT my_table_column1_not_null CHECK (column1 IS NOT NULL) NOT VALID,
  ADD CONSTRAINT my_table_column2_not_null CHECK (column2 IS NOT NULL) NOT VALID,
  ADD CONSTRAINT my_table_column3_not_null CHECK (column3 IS NOT NULL) NOT VALID,
  ADD CONSTRAINT my_table_column4_not_null CHECK (column4 IS NOT NULL) NOT VALID;


as expected these took as long as a table scan, and as expected they did not block.

ALTER TABLE my_table validate CONSTRAINT my_table_column1_not_null;
ALTER TABLE my_table validate CONSTRAINT my_table_column2_not_null;
ALTER TABLE my_table validate CONSTRAINT my_table_column3_not_null;
ALTER TABLE my_table validate CONSTRAINT my_table_column4_not_null;


SLOW (table scan speed) - didn't have timing on, but I think about same time as the next one.
ALTER TABLE my_table ALTER COLUMN column1 SET NOT NULL;

01:39 SLOW (table scan speed)
ALTER TABLE my_table ALTER COLUMN column2 SET NOT NULL;

00:22 - 1/4 time of table scan but still not instant like expected
ALTER TABLE my_table ALTER COLUMN column3 SET NOT NULL;

20.403 ms - instant, like expected
ALTER TABLE my_table ALTER COLUMN column4 SET NOT NULL;


all < 100ms
ALTER TABLE my_table DROP CONSTRAINT my_table_column1_not_null;
ALTER TABLE my_table DROP CONSTRAINT my_table_column2_not_null;
ALTER TABLE my_table DROP CONSTRAINT my_table_column3_not_null;
ALTER TABLE my_table DROP CONSTRAINT my_table_column4_not_null;



Re: feature idea: use index when checking for NULLs before SET NOTNULL

From
Justin Pryzby
Date:
On Fri, May 29, 2020 at 09:53:14PM -0400, John Bachir wrote:
> Hi Sergei - I just used the recipe on my production database. I didn't
> observe all the expected benefits, I wonder if there were confounding factors
> or if I did something wrong. If you have time, I'd love to get your feedback.
> Let me know if you need more info. I'd love to write a blog post informing
> the world about this potentially game-changing feature!

If you do it right, you can see a DEBUG:

postgres=# CREATE TABLE tn (i int);
postgres=# ALTER TABLE tn ADD CONSTRAINT nn CHECK (i IS NOT NULL) NOT VALID;
postgres=# ALTER TABLE tn VALIDATE CONSTRAINT nn;
postgres=# SET client_min_messages=debug;
postgres=# ALTER TABLE tn ALTER i SET NOT NULL ;
DEBUG:  existing constraints on column "tn"."i" are sufficient to prove that it does not contain nulls

> SLOW (table scan speed) - didn't have timing on, but I think about same time as the next one.
> ALTER TABLE my_table ALTER COLUMN column1 SET NOT NULL;
> 
> 01:39 SLOW (table scan speed)
> ALTER TABLE my_table ALTER COLUMN column2 SET NOT NULL;
> 
> 00:22 - 1/4 time of table scan but still not instant like expected
> ALTER TABLE my_table ALTER COLUMN column3 SET NOT NULL;
> 
> 20.403 ms - instant, like expected
> ALTER TABLE my_table ALTER COLUMN column4 SET NOT NULL;

That the duration decreased every time may have been due to caching?
How big is the table vs RAM ?
Do you know if the SET NOT NULL blocked or not ?

Maybe something else had a nontrivial lock on the table, and those commands
were waiting on lock.  If you "SET deadlock_timeout='1'; SET
log_lock_waits=on;", then you could see that.

-- 
Justin



Re: feature idea: use index when checking for NULLs before SET NOT NULL

From
"John Bachir"
Date:
On Fri, May 29, 2020, at 10:10 PM, Justin Pryzby wrote:

> If you do it right, you can see a DEBUG:

> postgres=# SET client_min_messages=debug;
> postgres=# ALTER TABLE tn ALTER i SET NOT NULL ;
> DEBUG:  existing constraints on column "tn"."i" are sufficient to prove 
> that it does not contain nulls

Thanks! I'll add that to my recipe for the future. Although by that time it would be too late, so to make use of this I
wouldhave to set up a cloned test environment and hope that all conditions are correctly cloned. Is there a way to
checksufficiency before running the command?
 


> That the duration decreased every time may have been due to caching?
> How big is the table vs RAM ?

Table is about 10 gigs, machine has 16gigs,  I'm hoping OS & PG did not decided to kick out everything else from ram
whendoing the operation. But even with caching, the final command being 20ms, and the first 2 commands being the same
timeas a table scan, seems like something other than caching is at play here? IDK!
 

> Do you know if the SET NOT NULL blocked or not ?
> Maybe something else had a nontrivial lock on the table, and those commands
> were waiting on lock.  If you "SET deadlock_timeout='1'; SET
> log_lock_waits=on;", then you could see that.

I don't know if it blocked. Great idea! I'll add that to my recipe as well.

John


p.s. current recipe: https://gist.github.com/jjb/fab5cc5f0e1b23af28694db4fc01c55a
p.p.s I think one of the biggest surprises was that setting the NOT NULL condition was slow. That's totally unrelated
tothis feature though and out of scope for this list though, I asked about it here
https://dba.stackexchange.com/questions/268301/why-is-add-constraint-not-valid-taking-a-long-time



Re: feature idea: use index when checking for NULLs before SET NOT NULL

From
"John Bachir"
Date:
> Maybe something else had a nontrivial lock on the table, and those commands
> were waiting on lock.  If you "SET deadlock_timeout='1'; SET
> log_lock_waits=on;", then you could see that.

Just checking - I think you mean lock_timeout? (although setting deadlock_timeout is also not a bad idea just in
case).



Re: feature idea: use index when checking for NULLs before SET NOTNULL

From
Justin Pryzby
Date:
On Mon, Jun 01, 2020 at 10:49:25AM -0400, John Bachir wrote:
> On Fri, May 29, 2020, at 10:10 PM, Justin Pryzby wrote:
> 
> > If you do it right, you can see a DEBUG:
> > postgres=# SET client_min_messages=debug;
> > postgres=# ALTER TABLE tn ALTER i SET NOT NULL ;
> > DEBUG:  existing constraints on column "tn"."i" are sufficient to prove 
> > that it does not contain nulls
> 
> Thanks! I'll add that to my recipe for the future. Although by that time it would be too late, so to make use of this
Iwould have to set up a cloned test environment and hope that all conditions are correctly cloned. Is there a way to
checksufficiency before running the command?
 

Yea, client_min_messages is there to demonstrate that the feature is working
and allow you to check whether it work using your own recipe.

If you want to avoid blocking the table for nontrivial time, maybe you'd add:
SET statement_timeout='1s';

On Mon, Jun 01, 2020 at 09:55:43PM -0400, John Bachir wrote:
> > Maybe something else had a nontrivial lock on the table, and those commands
> > were waiting on lock.  If you "SET deadlock_timeout='1'; SET
> > log_lock_waits=on;", then you could see that.
> 
> Just checking - I think you mean lock_timeout? (although setting deadlock_timeout is also not a bad idea just in
case).

No, actually (but I've had to double check):

https://www.postgresql.org/docs/current/runtime-config-locks.html
|When log_lock_waits is set, this parameter also determines the length of time
|to wait before a log message is issued about the lock wait. If you are trying
|to investigate locking delays you might want to set a shorter than normal
|deadlock_timeout.

-- 
Justin



Re: feature idea: use index when checking for NULLs before SET NOT NULL

From
"John Bachir"
Date:
Thank you Justin for all that useful info! A couple nitpicky questions, so I can get my recipe right.

On Mon, Jun 1, 2020, at 10:04 PM, Justin Pryzby wrote:
> On Mon, Jun 01, 2020 at 10:49:25AM -0400, John Bachir wrote:
> > Thanks! I'll add that to my recipe for the future. Although by that time it would be too late, so to make use of
thisI would have to set up a cloned test environment and hope that all conditions are correctly cloned. Is there a way
tocheck sufficiency before running the command?
 
> 
> Yea, client_min_messages is there to demonstrate that the feature is working
> and allow you to check whether it work using your own recipe.

Gotcha. Okay, just to double-check - you are saying there is _not_ a way to check before running the command, right?


> > Just checking - I think you mean lock_timeout? (although setting deadlock_timeout is also not a bad idea just in
case).
> 
> No, actually (but I've had to double check):
> 
> https://www.postgresql.org/docs/current/runtime-config-locks.html
> |When log_lock_waits is set, this parameter also determines the length of time
> |to wait before a log message is issued about the lock wait. If you are trying
> |to investigate locking delays you might want to set a shorter than normal
> |deadlock_timeout.

Hah! Unexpected and useful.

I think to avoid blocking my application activity, I should _also_ set lock_timeout, and retry if it fails. (maybe this
isorthogonal to what you were addressing before, but I'm just wondering what you think).
 

Thanks,
John