Thread: Adding a nullable DOMAIN column w/ CHECK

Adding a nullable DOMAIN column w/ CHECK

From

Marko Tiikkaja

Date:

06 September 2014, 00:01:43

Hi,

First of all, sorry about breaking the thread; I don't subscribe to
-general so I can't copy the original email.  This is in response to the
problem here:
http://www.postgresql.org/message-id/CACfv+p+8dToaR7h06_M_YMjpV5Na-CQq7kN=KgJq_k84H7UHWA@mail.gmail.com

Attached is a very ugly proof-of-concept patch of how this could work.
With this applied, the ALTER TABLE becomes fast:

=# create domain test_domain numeric check (value > 0);
CREATE DOMAIN
Time: 2,443 ms
=# create table test_domain_table as select generate_series(1, 2000000);
SELECT 2000000
Time: 1802,681 ms
=# alter table test_domain_table add column t test_domain;
ALTER TABLE
Time: 4,991 ms

The patch is obviously a load of horse crap, but does anyone have any
objections to the general approach of making this pattern faster?  To do
this optimization we do have to assume that CHECKs in DOMAINs are at
least STABLE, but I don't see that as a problem; those should be
IMMUTABLE anyway, I think.


.marko

Attachment

alter_table_add_domain.patch

Re: Adding a nullable DOMAIN column w/ CHECK

From

Noah Misch

Date:

07 September 2014, 04:55:48

On Sat, Sep 06, 2014 at 02:01:32AM +0200, Marko Tiikkaja wrote:
> First of all, sorry about breaking the thread; I don't subscribe to
> -general so I can't copy the original email.  This is in response to
> the problem here:
http://www.postgresql.org/message-id/CACfv+p+8dToaR7h06_M_YMjpV5Na-CQq7kN=KgJq_k84H7UHWA@mail.gmail.com

You can download the message via "view raw" in the web archives, open it as an
mbox file, and reply there.

The old thread was fuzzy concerning how the system works today.  Adding a
domain-type column rewrites the table unless the domain has no constraints.
Simultaneously adding an all-NULL column and a CHECK constraint to a table
scans the table, even if the CHECK constraint references only the new column.

> The patch is obviously a load of horse crap, but does anyone have
> any objections to the general approach of making this pattern
> faster?

+1 in general.

> To do this optimization we do have to assume that CHECKs in
> DOMAINs are at least STABLE, but I don't see that as a problem;
> those should be IMMUTABLE anyway, I think.

The system has such assumptions already.

Re: Adding a nullable DOMAIN column w/ CHECK

From

Tom Lane

Date:

07 September 2014, 17:06:16

Noah Misch <noah@leadboat.com> writes:
> On Sat, Sep 06, 2014 at 02:01:32AM +0200, Marko Tiikkaja wrote:
>> To do this optimization we do have to assume that CHECKs in
>> DOMAINs are at least STABLE, but I don't see that as a problem;
>> those should be IMMUTABLE anyway, I think.

> The system has such assumptions already.

What bothers me about this general approach is that the check condition is
evaluated against a null whether or not there are any rows in the table.
This means that side-effects of the check condition would happen even when
they did not happen in the previous implementation.  Maybe that's all
right, but to say it's all right you must make a stronger form of the
"check conditions are immutable" assumption than we make elsewhere,
ie not just that its result won't change but that it has no visible
evaluation side-effects.  So I disagree with Noah's conclusion that we're
already assuming this.

As an example, if the check condition is such that it actually throws
an error (not just returns false) for null input, the ALTER command
would fail outright, whereas it would previously have succeeded as long
as the table is empty.  (BTW, should not the patch be checking for a false
result?)

This objection could be met by doing a precheck to verify that the table
contains at least one live row.  That's pretty ugly and personally I'm not
sure it's necessary, but I think there's room to argue that it is.
        regards, tom lane

Re: Adding a nullable DOMAIN column w/ CHECK

From

Noah Misch

Date:

07 September 2014, 19:23:45

On Sun, Sep 07, 2014 at 01:06:04PM -0400, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > On Sat, Sep 06, 2014 at 02:01:32AM +0200, Marko Tiikkaja wrote:
> >> To do this optimization we do have to assume that CHECKs in
> >> DOMAINs are at least STABLE, but I don't see that as a problem;
> >> those should be IMMUTABLE anyway, I think.
> 
> > The system has such assumptions already.
> 
> What bothers me about this general approach is that the check condition is
> evaluated against a null whether or not there are any rows in the table.
> This means that side-effects of the check condition would happen even when
> they did not happen in the previous implementation.  Maybe that's all
> right, but to say it's all right you must make a stronger form of the
> "check conditions are immutable" assumption than we make elsewhere,
> ie not just that its result won't change but that it has no visible
> evaluation side-effects.  So I disagree with Noah's conclusion that we're
> already assuming this.

Our assumption that domain CHECK constraints are STABLE doesn't grant
unlimited freedom to evaluate them, indeed.

> As an example, if the check condition is such that it actually throws
> an error (not just returns false) for null input, the ALTER command
> would fail outright, whereas it would previously have succeeded as long
> as the table is empty.  (BTW, should not the patch be checking for a false
> result?)
> 
> This objection could be met by doing a precheck to verify that the table
> contains at least one live row.  That's pretty ugly and personally I'm not
> sure it's necessary, but I think there's room to argue that it is.

Yes; I doubt one could justify failing on an empty table as though it had been
a one-row table.  I see a couple ways we could avoid the I/O and complexity:

1) If contain_leaky_functions() approves every constraint expression, test the  constraints once, and we're done.
Otherwise,proceed as we do today.
 

2) Test the constraints in a subtransaction.  If the subtransaction commits,  we're done.  Otherwise, proceed as we do
today.

The more complexity you accept, the more cases you optimize; where best to
draw the line is not clear to me at this point.

Thanks,
nm

Re: Adding a nullable DOMAIN column w/ CHECK

From

Tom Lane

Date:

07 September 2014, 19:31:49

Noah Misch <noah@leadboat.com> writes:
> On Sun, Sep 07, 2014 at 01:06:04PM -0400, Tom Lane wrote:
>> This objection could be met by doing a precheck to verify that the table
>> contains at least one live row.  That's pretty ugly and personally I'm not
>> sure it's necessary, but I think there's room to argue that it is.

> Yes; I doubt one could justify failing on an empty table as though it had been
> a one-row table.  I see a couple ways we could avoid the I/O and complexity:

> 1) If contain_leaky_functions() approves every constraint expression, test the
>    constraints once, and we're done.  Otherwise, proceed as we do today.

> 2) Test the constraints in a subtransaction.  If the subtransaction commits,
>    we're done.  Otherwise, proceed as we do today.

I'm not sure either of those is better than doing a single heap_getnext(),
which really should be pretty cheap except under pathological conditions.
It's the messiness I'm worried about more than the cost.
        regards, tom lane