Thread: Adding a nullable DOMAIN column w/ CHECK
Hi, First of all, sorry about breaking the thread; I don't subscribe to -general so I can't copy the original email. This is in response to the problem here: http://www.postgresql.org/message-id/CACfv+p+8dToaR7h06_M_YMjpV5Na-CQq7kN=KgJq_k84H7UHWA@mail.gmail.com Attached is a very ugly proof-of-concept patch of how this could work. With this applied, the ALTER TABLE becomes fast: =# create domain test_domain numeric check (value > 0); CREATE DOMAIN Time: 2,443 ms =# create table test_domain_table as select generate_series(1, 2000000); SELECT 2000000 Time: 1802,681 ms =# alter table test_domain_table add column t test_domain; ALTER TABLE Time: 4,991 ms The patch is obviously a load of horse crap, but does anyone have any objections to the general approach of making this pattern faster? To do this optimization we do have to assume that CHECKs in DOMAINs are at least STABLE, but I don't see that as a problem; those should be IMMUTABLE anyway, I think. .marko
Attachment
On Sat, Sep 06, 2014 at 02:01:32AM +0200, Marko Tiikkaja wrote: > First of all, sorry about breaking the thread; I don't subscribe to > -general so I can't copy the original email. This is in response to > the problem here: http://www.postgresql.org/message-id/CACfv+p+8dToaR7h06_M_YMjpV5Na-CQq7kN=KgJq_k84H7UHWA@mail.gmail.com You can download the message via "view raw" in the web archives, open it as an mbox file, and reply there. The old thread was fuzzy concerning how the system works today. Adding a domain-type column rewrites the table unless the domain has no constraints. Simultaneously adding an all-NULL column and a CHECK constraint to a table scans the table, even if the CHECK constraint references only the new column. > The patch is obviously a load of horse crap, but does anyone have > any objections to the general approach of making this pattern > faster? +1 in general. > To do this optimization we do have to assume that CHECKs in > DOMAINs are at least STABLE, but I don't see that as a problem; > those should be IMMUTABLE anyway, I think. The system has such assumptions already.
Noah Misch <noah@leadboat.com> writes: > On Sat, Sep 06, 2014 at 02:01:32AM +0200, Marko Tiikkaja wrote: >> To do this optimization we do have to assume that CHECKs in >> DOMAINs are at least STABLE, but I don't see that as a problem; >> those should be IMMUTABLE anyway, I think. > The system has such assumptions already. What bothers me about this general approach is that the check condition is evaluated against a null whether or not there are any rows in the table. This means that side-effects of the check condition would happen even when they did not happen in the previous implementation. Maybe that's all right, but to say it's all right you must make a stronger form of the "check conditions are immutable" assumption than we make elsewhere, ie not just that its result won't change but that it has no visible evaluation side-effects. So I disagree with Noah's conclusion that we're already assuming this. As an example, if the check condition is such that it actually throws an error (not just returns false) for null input, the ALTER command would fail outright, whereas it would previously have succeeded as long as the table is empty. (BTW, should not the patch be checking for a false result?) This objection could be met by doing a precheck to verify that the table contains at least one live row. That's pretty ugly and personally I'm not sure it's necessary, but I think there's room to argue that it is. regards, tom lane
On Sun, Sep 07, 2014 at 01:06:04PM -0400, Tom Lane wrote: > Noah Misch <noah@leadboat.com> writes: > > On Sat, Sep 06, 2014 at 02:01:32AM +0200, Marko Tiikkaja wrote: > >> To do this optimization we do have to assume that CHECKs in > >> DOMAINs are at least STABLE, but I don't see that as a problem; > >> those should be IMMUTABLE anyway, I think. > > > The system has such assumptions already. > > What bothers me about this general approach is that the check condition is > evaluated against a null whether or not there are any rows in the table. > This means that side-effects of the check condition would happen even when > they did not happen in the previous implementation. Maybe that's all > right, but to say it's all right you must make a stronger form of the > "check conditions are immutable" assumption than we make elsewhere, > ie not just that its result won't change but that it has no visible > evaluation side-effects. So I disagree with Noah's conclusion that we're > already assuming this. Our assumption that domain CHECK constraints are STABLE doesn't grant unlimited freedom to evaluate them, indeed. > As an example, if the check condition is such that it actually throws > an error (not just returns false) for null input, the ALTER command > would fail outright, whereas it would previously have succeeded as long > as the table is empty. (BTW, should not the patch be checking for a false > result?) > > This objection could be met by doing a precheck to verify that the table > contains at least one live row. That's pretty ugly and personally I'm not > sure it's necessary, but I think there's room to argue that it is. Yes; I doubt one could justify failing on an empty table as though it had been a one-row table. I see a couple ways we could avoid the I/O and complexity: 1) If contain_leaky_functions() approves every constraint expression, test the constraints once, and we're done. Otherwise,proceed as we do today. 2) Test the constraints in a subtransaction. If the subtransaction commits, we're done. Otherwise, proceed as we do today. The more complexity you accept, the more cases you optimize; where best to draw the line is not clear to me at this point. Thanks, nm
Noah Misch <noah@leadboat.com> writes: > On Sun, Sep 07, 2014 at 01:06:04PM -0400, Tom Lane wrote: >> This objection could be met by doing a precheck to verify that the table >> contains at least one live row. That's pretty ugly and personally I'm not >> sure it's necessary, but I think there's room to argue that it is. > Yes; I doubt one could justify failing on an empty table as though it had been > a one-row table. I see a couple ways we could avoid the I/O and complexity: > 1) If contain_leaky_functions() approves every constraint expression, test the > constraints once, and we're done. Otherwise, proceed as we do today. > 2) Test the constraints in a subtransaction. If the subtransaction commits, > we're done. Otherwise, proceed as we do today. I'm not sure either of those is better than doing a single heap_getnext(), which really should be pretty cheap except under pathological conditions. It's the messiness I'm worried about more than the cost. regards, tom lane