Re: NOT NULL constraints in foreign tables - Mailing list pgsql-hackers

From Robert Haas
Subject Re: NOT NULL constraints in foreign tables
Date
Msg-id CA+TgmobUChWmk2=Y60JYSUXKswnFGaJs-Xff9oJBJ6MZsk3XgA@mail.gmail.com
Whole thread Raw
In response to Re: NOT NULL constraints in foreign tables  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: NOT NULL constraints in foreign tables
List pgsql-hackers
On Mon, Aug 20, 2012 at 5:14 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> On Mon, 2012-08-20 at 16:50 -0400, Robert Haas wrote:
>> #3 for foreign tables.
>
> I'm skeptical of that approach for two reasons:
>
> (1) It will be hard to inform users which constraints are enforced and
> which aren't.

The thing to keep in mind here is that EVERY property of a foreign
table is subject to change at any arbitrary point in time, without our
knowledge.  The very existence of the table, the names of its columns,
the types of those columns, and any additional constraints on that
columns are all things that can vary between the declaration and
what's actually present, and can be subsequently changed on the remote
side at any time.  Why should CHECK constraints be any different than,
say, column types?

> (2) It will be hard for users to understand the planner benefits or the
> consequences when the constraint is not enforced.

Why should that be any worse with foreign tables than anything else?
I mean, lots of people, as things stand today, manage to set up
partitioned tables using CHECK constraints.  There are undoubtedly
people who don't understand the planner benefit of having an
appropriate CHECK constraint on each partition, but it's not exactly a
common cause of confusion.

> That being said, I can imagine good use cases (like when the foreign
> table is in postgres, and already has that constraint declared), so I'm
> not outright opposed to it.
>
>> #1 is not a reasonable alternative for foreign
>> tables because we lack enforcement power in that case,
>
> Right.
>
>>  and #2 is also
>> not reasonable, because the only point of allowing declarative
>> constraints is to get better performance, and if we go with #2 then
>> we've pretty much thrown that out the window.
>
> Declared constraints can improve the plans, while runtime-enforced
> constraints slow down execution of a given plan. I'm not really sure
> whether runtime enforcement is a good trade-off, but it doesn't seem
> like an obviously bad one.

It seems to me that the term runtime enforcement is a misnomer; you
can't really "enforce" anything about a foreign table.  You can throw
an error if the data doesn't meet expectations, but by that time the
cat is already out of the bag.  The purpose of a CHECK constraint on a
plain table is to prevent bad data from going into a table, not to
throw an error when retrieving previously-inserted bad data.  If we
were to propose changing the semantics from the former to the latter,
we'd be laughed out of town, and rightly so.

> Also, what did you mean by "the only point of allowing declarative
> constraints is to get better performance"? Maybe the user wants to get
> an error if some important assumption about the remote data source is
> not as true as when they declared the constraint.

I think that need is going to be best served by issuing a query to
validate whatever constraint you're interested in, i.e. SELECT * FROM
foreign_table WHERE NOT (whatever_i_am_assuming).  I mean, let's
suppose that we were to allow unique constraints on foreign tables.
This is clearly useful, because, for example, it would allow join
removal in a case like local_table LEFT JOIN foreign_table ON
local_table.id = foreign_table.id; and it would also allow a
selectivity estimate of -1 for that column.  But are you going to
validate that the column in question is still unique every time you
access the table?  Presumably not; you'd have to read and sort the
entire table to do that.  Now, if the query is something like "SELECT
* FROM foreign_table WHERE id = 1", you could fairly cheaply validate
that there is only one row with id = 1, but that's not the same thing
as validating that the assumption (namely, that foreign_table (id) is
unique) is still true.  And if the query is "SELECT
max(some_other_column) FROM foreign_table", you can't really validate
anything at all, or at least not without a lot of extra overhead.

Now, it's more feasible to think that you could validate a CHECK
constraint on each row that the query actually touches, but that's
still going to be pretty expensive, and it's still not the same thing
as validating that the assumption holds true for all rows in the
table.  I think if we go down this road of trying to validate
remote-side CHECK constraints, we're going to end up with a mishmash
of cases where constraints are checked and other cases where
constraints are not checked, and then that really is going to be
confusing.  We're definitely not going to be able to check everything
all the time, so checking nothing ever seems like the principled
alternative.

Also, for tightly-coupled databases under common control, there's
really no need to be constantly checking and rechecking that the other
guy hasn't pulled the rug out from under you.  We certainly need to
have that as an option.  For loosely-coupled databases under different
administrative control there might be some point, but it seems sort of
hopeless: if the other DBA doesn't care about breaking your stuff, he
may happen to rename the table instead of dropping the check
constraint, and then you're definitely hosed.  People keep proposing
the idea of being able to somehow validate remote check constraints,
so maybe that is evidence that there is some use case for it, but I
can't quite figure out what it is.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Boszormenyi Zoltan
Date:
Subject: Re: [PATCH] lock_timeout and common SIGALRM framework
Next
From: Robert Haas
Date:
Subject: Re: [PATCH]Tablesample Submission