Thread: Domain check taking place unnecessarily?

Domain check taking place unnecessarily?

From
Mark Hills
Date:
I'm used to adding an empty column being instant in most cases, so my 
attention was drawn when it took a long lock.

The timings below imply that each row is running the CHECK?

I've come to expect addition of a NULL column to be fast, and what I'm 
seeing seems to contradict the docs [1]:

> PostgreSQL assumes that CHECK constraints' conditions are immutable, 
> that is, they will always give the same result for the same input value. 
> This assumption is what justifies examining CHECK constraints only when 
> a value is first converted to be of a domain type, and not at other 
> times.

I've ruled out waiting on a lock; nothing is reported with 
log_lock_waits=on. This is a test database with exclusive access (2.5 
million rows):

I don't think this is another index or constraint, as removing them does 
not affect performance. Also the "text" case below seems to prove this. 
Results are fully reproducable by repeatedly dropping and adding these 
columns.

Reporting in case something is not as expected. I can't even think of a 
workaround here...

This is PostgreSQL 14.5 on Alpine Linux. Thanks.

[1] https://www.postgresql.org/docs/current/sql-createdomain.html

---

CREATE DOMAIN hash AS text
    CHECK (VALUE ~ E'^[a-zA-Z0-9]{8,32}$');
 
devstats=> ALTER TABLE invite ADD COLUMN test text;
ALTER TABLE
Time: 8.988 ms
 
devstats=> ALTER TABLE invite ADD COLUMN test hash;
ALTER TABLE
Time: 30923.380 ms (00:30.923)
 
devstats=> ALTER TABLE invite ADD COLUMN test hash DEFAULT NULL;
ALTER TABLE
Time: 30344.272 ms (00:30.344)
 
devstats=> ALTER TABLE invite ADD COLUMN test hash DEFAULT '123abc123'::hash;
ALTER TABLE
Time: 67439.232 ms (01:07.439)

-- 
Mark



Re: Domain check taking place unnecessarily?

From
"David G. Johnston"
Date:
On Wed, Feb 8, 2023 at 11:01 AM Mark Hills <mark@xwax.org> wrote:

CREATE DOMAIN hash AS text
    CHECK (VALUE ~ E'^[a-zA-Z0-9]{8,32}$');

devstats=> ALTER TABLE invite ADD COLUMN test hash;
ALTER TABLE
Time: 30923.380 ms (00:30.923)

Necessarily, I presume because if you decided that the check on the domain should be "value is not null" (don't do this though...) the column addition would have to fail for existing rows (modulo defaults...).

David J.

Re: Domain check taking place unnecessarily?

From
Laurenz Albe
Date:
On Wed, 2023-02-08 at 18:01 +0000, Mark Hills wrote:
> I've ruled out waiting on a lock; nothing is reported with
> log_lock_waits=on. This is a test database with exclusive access (2.5
> million rows):
>
> This is PostgreSQL 14.5 on Alpine Linux. Thanks.
>
> CREATE DOMAIN hash AS text
>     CHECK (VALUE ~ E'^[a-zA-Z0-9]{8,32}$');
>  
> devstats=> ALTER TABLE invite ADD COLUMN test text;
> ALTER TABLE
> Time: 8.988 ms
>  
> devstats=> ALTER TABLE invite ADD COLUMN test hash;
> ALTER TABLE
> Time: 30923.380 ms (00:30.923)
>  
> devstats=> ALTER TABLE invite ADD COLUMN test hash DEFAULT NULL;
> ALTER TABLE
> Time: 30344.272 ms (00:30.344)
>  
> devstats=> ALTER TABLE invite ADD COLUMN test hash DEFAULT '123abc123'::hash;
> ALTER TABLE
> Time: 67439.232 ms (01:07.439)

It takes 30 seconds to schan the table and determine that all existing rows
satisky the constraint.

The last example is slower, because there is actually a non-NULL value to check.

If that were not a domain, but a normal check constraint, you could first add
the constraint as NOT VALID and later run ALTER TABLE ... VALIDATE CONSTRAINT ...,
which takes a while too, but does not lock the table quite that much.
But I don't think there is a way to do that with a domain.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com



Re: Domain check taking place unnecessarily?

From
Mark Hills
Date:
On Wed, 8 Feb 2023, Laurenz Albe wrote:

> On Wed, 2023-02-08 at 18:01 +0000, Mark Hills wrote:
> > I've ruled out waiting on a lock; nothing is reported with
> > log_lock_waits=on. This is a test database with exclusive access (2.5
> > million rows):
> >
> > This is PostgreSQL 14.5 on Alpine Linux. Thanks.
> >
> > CREATE DOMAIN hash AS text
> >     CHECK (VALUE ~ E'^[a-zA-Z0-9]{8,32}$');
> >  
> > devstats=> ALTER TABLE invite ADD COLUMN test text;
> > ALTER TABLE
> > Time: 8.988 ms
> >  
> > devstats=> ALTER TABLE invite ADD COLUMN test hash;
> > ALTER TABLE
> > Time: 30923.380 ms (00:30.923)
> >  
> > devstats=> ALTER TABLE invite ADD COLUMN test hash DEFAULT NULL;
> > ALTER TABLE
> > Time: 30344.272 ms (00:30.344)
> >  
> > devstats=> ALTER TABLE invite ADD COLUMN test hash DEFAULT '123abc123'::hash;
> > ALTER TABLE
> > Time: 67439.232 ms (01:07.439)
>
> It takes 30 seconds to schan the table and determine that all existing
> rows satisky the constraint.

But there's no existing data (note this is adding column, not constraint)

Existing rows are guaranteed to satisfy the domain check, because the
domain check is guaranteed to be immutable (per [1] in my original mail)

Of course, if it were a table constraint it may involve multiple columns,
requiring it to be evaluated per-row.

But the docs make it clear the domain check is expected to be evaluated on
input, precisely for this purpose.

So I wondered if this was a shortcoming or even a bug.

It seems that adding a column of NULL (or even default) values for a
domain can (should?) be as quick as a basic data type like text or
integer...?

--
Mark

Re: Domain check taking place unnecessarily?

From
Mark Hills
Date:
On Wed, 8 Feb 2023, David G. Johnston wrote:

> On Wed, Feb 8, 2023 at 11:01 AM Mark Hills <mark@xwax.org> wrote:
> 
> >
> > CREATE DOMAIN hash AS text
> >     CHECK (VALUE ~ E'^[a-zA-Z0-9]{8,32}$');
> >
> > devstats=> ALTER TABLE invite ADD COLUMN test hash;
> > ALTER TABLE
> > Time: 30923.380 ms (00:30.923)
> >
> 
> Necessarily, I presume because if you decided that the check on the domain
> should be "value is not null" (don't do this though...) the column addition
> would have to fail for existing rows (modulo defaults...).

I'm not sure I'm parsing this paragraph correctly, but the existing rows 
don't provide any data to the domain check. Perhaps you could clarify.

Many thanks

-- 
Mark



Re: Domain check taking place unnecessarily?

From
Tom Lane
Date:
Mark Hills <mark@xwax.org> writes:
> On Wed, 8 Feb 2023, Laurenz Albe wrote:
>> It takes 30 seconds to schan the table and determine that all existing 
>> rows satisky the constraint.

> But there's no existing data (note this is adding column, not constraint)

> Existing rows are guaranteed to satisfy the domain check, because the 
> domain check is guaranteed to be immutable (per [1] in my original mail)

immutable != "will accept null".

There could be some more optimizations here, perhaps, but there aren't.

            regards, tom lane



Re: Domain check taking place unnecessarily?

From
Mark Hills
Date:
On Thu, 9 Feb 2023, Tom Lane wrote:

> Mark Hills <mark@xwax.org> writes:
> > On Wed, 8 Feb 2023, Laurenz Albe wrote:
> >> It takes 30 seconds to schan the table and determine that all existing 
> >> rows satisky the constraint.
> 
> > But there's no existing data (note this is adding column, not constraint)
> 
> > Existing rows are guaranteed to satisfy the domain check, because the 
> > domain check is guaranteed to be immutable (per [1] in my original mail)
> 
> immutable != "will accept null".
> 
> There could be some more optimizations here, perhaps, but there aren't.

Well that's no problem at all. Thanks for the clarification.

I mentioned this case to a few people and they were also surprised by the 
outcome, to the point where we wondered if this might be misbehaving. 
Hence bringing it up in this forum.

We'll go ahead and deal with the pauses in production, as I don't think 
there's a workaround.

Thanks

-- 
Mark