Thread: NOT NULL constraints in foreign tables

NOT NULL constraints in foreign tables

From

Alvaro Herrera

Date:

17 August 2012, 18:58:34

Hi,

I noticed one more problem with NOT NULL constraints and foreign tables
-- which is that they are allowed at all (see also
http://archives.postgresql.org/message-id/1345214955-sup-3970@alvh.no-ip.org
earlier today).

Right now, with my patch, foreign table creation fails if you have a NOT
NULL column, because that NOT NULL will create a check constraint, and
those are disallowed for foreign tables.  So while HEAD allows you to
create the table, my patch causes that to fail.

There are two things I could do about this:

1. Avoid creating the CHECK constraint for NOT NULL declarations in
foreign tables.  This is the easiest to code, but it leaves us in the
situation that foreign tables will lack pg_constraint rows for NOT NULL
columns.  Not sure how undesirable this is.

2. Avoid having DefineRelation complain if it sees CHECK constraints
which correspond to some NOT NULL column declaration.  This is more
complex to implement, because basically we'd have to decompile the CHECK
constraint to find out whether it corresponds to something that had
previously been declared as just NOT NULL, and throw an error otherwise.
But this would mean we would have pg_constraint rows for those columns
... which I'm not really sure is a good thing.

I mean, what are NOT NULL in foreign tables for?  Are they harmed or
helped by having pg_constraint rows?

Thanks.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: NOT NULL constraints in foreign tables

From

Tom Lane

Date:

17 August 2012, 19:19:39

Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> I mean, what are NOT NULL in foreign tables for?

There was previous discussion about that, in the context of check
constraints in general, but I don't believe we reached consensus.
http://archives.postgresql.org/message-id/1038.1331738954@sss.pgh.pa.us

There's also an open commitfest entry for changing the behavior
of file_fdw, which IMO is completely premature since we haven't
got consensus on what it should do.
        regards, tom lane

Re: NOT NULL constraints in foreign tables

From

Robert Haas

Date:

17 August 2012, 19:44:33

On Fri, Aug 17, 2012 at 2:58 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> I mean, what are NOT NULL in foreign tables for?  Are they harmed or
> helped by having pg_constraint rows?

As I've mentioned when this has come up before, I think that
constraints on foreign tables should be viewed as declarative
statements about the contents of the foreign data that the DB will
assume true.  This could be useful for a variety of purposes:
constraint exclusion, query optimization, etc.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: NOT NULL constraints in foreign tables

From

Alvaro Herrera

Date:

17 August 2012, 20:08:37

Excerpts from Robert Haas's message of vie ago 17 15:44:29 -0400 2012:
> On Fri, Aug 17, 2012 at 2:58 PM, Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:
> > I mean, what are NOT NULL in foreign tables for?  Are they harmed or
> > helped by having pg_constraint rows?
>
> As I've mentioned when this has come up before, I think that
> constraints on foreign tables should be viewed as declarative
> statements about the contents of the foreign data that the DB will
> assume true.  This could be useful for a variety of purposes:
> constraint exclusion, query optimization, etc.

So pg_constraint rows for NOT NULLs are a good thing, right?

In general, it seems to me that you're saying we should just lift the
DefineRelation-enforced restriction that foreign tables ought not to
have constraints.  So if the user wants to specify

CREATE FOREIGN TABLE people (   who person CHECK (who IS OF TYPE 'human'),   ..
) server foobar;

we ought to let them.  Correct?

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: NOT NULL constraints in foreign tables

From

Euler Taveira

Date:

17 August 2012, 23:13:48

On 17-08-2012 16:44, Robert Haas wrote:
> On Fri, Aug 17, 2012 at 2:58 PM, Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:
>> I mean, what are NOT NULL in foreign tables for?  Are they harmed or
>> helped by having pg_constraint rows?
> 
> As I've mentioned when this has come up before, I think that
> constraints on foreign tables should be viewed as declarative
> statements about the contents of the foreign data that the DB will
> assume true.  This could be useful for a variety of purposes:
> constraint exclusion, query optimization, etc.
> 
+1. I don't see us providing a mechanism to cross-check changes between data
sources. Even if we do it for creation time, schema could be changed behind
the scenes. Let's use at least constraints (NOT NULL, CHECK, UNIQUE, PK --
UNIQUE + NOT NULL) to improve optimizer but warn (loudly) that those
constraints are merely for optimization.


--   Euler Taveira de Oliveira - Timbira       http://www.timbira.com.br/  PostgreSQL: Consultoria, Desenvolvimento,
Suporte24x7 e Treinamento

Re: NOT NULL constraints in foreign tables

From

Robert Haas

Date:

20 August 2012, 19:35:47

On Fri, Aug 17, 2012 at 4:08 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Excerpts from Robert Haas's message of vie ago 17 15:44:29 -0400 2012:
>> On Fri, Aug 17, 2012 at 2:58 PM, Alvaro Herrera
>> <alvherre@2ndquadrant.com> wrote:
>> > I mean, what are NOT NULL in foreign tables for?  Are they harmed or
>> > helped by having pg_constraint rows?
>>
>> As I've mentioned when this has come up before, I think that
>> constraints on foreign tables should be viewed as declarative
>> statements about the contents of the foreign data that the DB will
>> assume true.  This could be useful for a variety of purposes:
>> constraint exclusion, query optimization, etc.
>
> So pg_constraint rows for NOT NULLs are a good thing, right?

Well, they aren't a bad thing, anyway.  The query optimizer looks at
attisnull directly in the one case where this really matters (which
has to do with reordering left joins IIRC).  Allowing all check
constraints would certainly be a step forward, as it would allow
constraint exclusion.

> In general, it seems to me that you're saying we should just lift the
> DefineRelation-enforced restriction that foreign tables ought not to
> have constraints.  So if the user wants to specify
>
> CREATE FOREIGN TABLE people (
>     who person CHECK (who IS OF TYPE 'human'),
>     ..
> ) server foobar;
>
> we ought to let them.  Correct?

Yeah, that's my view.  Note that I excluded this from the initial
syntax commit of foreign tables out of some feeling that there were
some loose ends that weren't adequately handled by the original patch,
which did allow them.  I no longer remember what the deficiencies
were, unfortunately.  Obviously, at a minimum, we need to make sure
that they are dumped-and-restored properly, displayed by psql
properly, and documented.  But in general +1 for allowing this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: NOT NULL constraints in foreign tables

From

Jeff Davis

Date:

20 August 2012, 19:56:46

On Fri, 2012-08-17 at 15:44 -0400, Robert Haas wrote:
> On Fri, Aug 17, 2012 at 2:58 PM, Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:
> > I mean, what are NOT NULL in foreign tables for?  Are they harmed or
> > helped by having pg_constraint rows?
> 
> As I've mentioned when this has come up before, I think that
> constraints on foreign tables should be viewed as declarative
> statements about the contents of the foreign data that the DB will
> assume true.  This could be useful for a variety of purposes:
> constraint exclusion, query optimization, etc.

There are at least three kinds of constraint enforcement:

1. Enforced before the query runs (e.g. the current behavior on a normal
table).

2. Enforced when the query runs by validating the constraint as you go,
and then throwing an error when it turns out to be false.

3. Don't make any attempt to enforce, and silently produce wrong results
if it's false.

Which are you proposing, and how will the user know which kind of
constraint they've got?

Regards,Jeff Davis

Re: NOT NULL constraints in foreign tables

From

Robert Haas

Date:

20 August 2012, 20:50:57

On Mon, Aug 20, 2012 at 3:56 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> On Fri, 2012-08-17 at 15:44 -0400, Robert Haas wrote:
>> On Fri, Aug 17, 2012 at 2:58 PM, Alvaro Herrera
>> <alvherre@2ndquadrant.com> wrote:
>> > I mean, what are NOT NULL in foreign tables for?  Are they harmed or
>> > helped by having pg_constraint rows?
>>
>> As I've mentioned when this has come up before, I think that
>> constraints on foreign tables should be viewed as declarative
>> statements about the contents of the foreign data that the DB will
>> assume true.  This could be useful for a variety of purposes:
>> constraint exclusion, query optimization, etc.
>
> There are at least three kinds of constraint enforcement:
>
> 1. Enforced before the query runs (e.g. the current behavior on a normal
> table).
>
> 2. Enforced when the query runs by validating the constraint as you go,
> and then throwing an error when it turns out to be false.
>
> 3. Don't make any attempt to enforce, and silently produce wrong results
> if it's false.
>
> Which are you proposing, and how will the user know which kind of
> constraint they've got?

I'm proposing #1 for regular tables, as has always been the case, and
#3 for foreign tables.  #1 is not a reasonable alternative for foreign
tables because we lack enforcement power in that case, and #2 is also
not reasonable, because the only point of allowing declarative
constraints is to get better performance, and if we go with #2 then
we've pretty much thrown that out the window.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: NOT NULL constraints in foreign tables

From

Jeff Davis

Date:

21 August 2012, 05:15:14

On Mon, 2012-08-20 at 16:50 -0400, Robert Haas wrote:
> #3 for foreign tables.

I'm skeptical of that approach for two reasons:

(1) It will be hard to inform users which constraints are enforced and
which aren't.
(2) It will be hard for users to understand the planner benefits or the
consequences when the constraint is not enforced.

That being said, I can imagine good use cases (like when the foreign
table is in postgres, and already has that constraint declared), so I'm
not outright opposed to it.

> #1 is not a reasonable alternative for foreign
> tables because we lack enforcement power in that case,

Right.

>  and #2 is also
> not reasonable, because the only point of allowing declarative
> constraints is to get better performance, and if we go with #2 then
> we've pretty much thrown that out the window.

Declared constraints can improve the plans, while runtime-enforced
constraints slow down execution of a given plan. I'm not really sure
whether runtime enforcement is a good trade-off, but it doesn't seem
like an obviously bad one.

Also, what did you mean by "the only point of allowing declarative
constraints is to get better performance"? Maybe the user wants to get
an error if some important assumption about the remote data source is
not as true as when they declared the constraint.

Regards,Jeff Davis

Re: NOT NULL constraints in foreign tables

From

Robert Haas

Date:

21 August 2012, 14:41:56

On Mon, Aug 20, 2012 at 5:14 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> On Mon, 2012-08-20 at 16:50 -0400, Robert Haas wrote:
>> #3 for foreign tables.
>
> I'm skeptical of that approach for two reasons:
>
> (1) It will be hard to inform users which constraints are enforced and
> which aren't.

The thing to keep in mind here is that EVERY property of a foreign
table is subject to change at any arbitrary point in time, without our
knowledge.  The very existence of the table, the names of its columns,
the types of those columns, and any additional constraints on that
columns are all things that can vary between the declaration and
what's actually present, and can be subsequently changed on the remote
side at any time.  Why should CHECK constraints be any different than,
say, column types?

> (2) It will be hard for users to understand the planner benefits or the
> consequences when the constraint is not enforced.

Why should that be any worse with foreign tables than anything else?
I mean, lots of people, as things stand today, manage to set up
partitioned tables using CHECK constraints.  There are undoubtedly
people who don't understand the planner benefit of having an
appropriate CHECK constraint on each partition, but it's not exactly a
common cause of confusion.

> That being said, I can imagine good use cases (like when the foreign
> table is in postgres, and already has that constraint declared), so I'm
> not outright opposed to it.
>
>> #1 is not a reasonable alternative for foreign
>> tables because we lack enforcement power in that case,
>
> Right.
>
>>  and #2 is also
>> not reasonable, because the only point of allowing declarative
>> constraints is to get better performance, and if we go with #2 then
>> we've pretty much thrown that out the window.
>
> Declared constraints can improve the plans, while runtime-enforced
> constraints slow down execution of a given plan. I'm not really sure
> whether runtime enforcement is a good trade-off, but it doesn't seem
> like an obviously bad one.

It seems to me that the term runtime enforcement is a misnomer; you
can't really "enforce" anything about a foreign table.  You can throw
an error if the data doesn't meet expectations, but by that time the
cat is already out of the bag.  The purpose of a CHECK constraint on a
plain table is to prevent bad data from going into a table, not to
throw an error when retrieving previously-inserted bad data.  If we
were to propose changing the semantics from the former to the latter,
we'd be laughed out of town, and rightly so.

> Also, what did you mean by "the only point of allowing declarative
> constraints is to get better performance"? Maybe the user wants to get
> an error if some important assumption about the remote data source is
> not as true as when they declared the constraint.

I think that need is going to be best served by issuing a query to
validate whatever constraint you're interested in, i.e. SELECT * FROM
foreign_table WHERE NOT (whatever_i_am_assuming).  I mean, let's
suppose that we were to allow unique constraints on foreign tables.
This is clearly useful, because, for example, it would allow join
removal in a case like local_table LEFT JOIN foreign_table ON
local_table.id = foreign_table.id; and it would also allow a
selectivity estimate of -1 for that column.  But are you going to
validate that the column in question is still unique every time you
access the table?  Presumably not; you'd have to read and sort the
entire table to do that.  Now, if the query is something like "SELECT
* FROM foreign_table WHERE id = 1", you could fairly cheaply validate
that there is only one row with id = 1, but that's not the same thing
as validating that the assumption (namely, that foreign_table (id) is
unique) is still true.  And if the query is "SELECT
max(some_other_column) FROM foreign_table", you can't really validate
anything at all, or at least not without a lot of extra overhead.

Now, it's more feasible to think that you could validate a CHECK
constraint on each row that the query actually touches, but that's
still going to be pretty expensive, and it's still not the same thing
as validating that the assumption holds true for all rows in the
table.  I think if we go down this road of trying to validate
remote-side CHECK constraints, we're going to end up with a mishmash
of cases where constraints are checked and other cases where
constraints are not checked, and then that really is going to be
confusing.  We're definitely not going to be able to check everything
all the time, so checking nothing ever seems like the principled
alternative.

Also, for tightly-coupled databases under common control, there's
really no need to be constantly checking and rechecking that the other
guy hasn't pulled the rug out from under you.  We certainly need to
have that as an option.  For loosely-coupled databases under different
administrative control there might be some point, but it seems sort of
hopeless: if the other DBA doesn't care about breaking your stuff, he
may happen to rename the table instead of dropping the check
constraint, and then you're definitely hosed.  People keep proposing
the idea of being able to somehow validate remote check constraints,
so maybe that is evidence that there is some use case for it, but I
can't quite figure out what it is.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: NOT NULL constraints in foreign tables

From

Jeff Davis

Date:

22 August 2012, 17:06:20

On Tue, 2012-08-21 at 10:41 -0400, Robert Haas wrote: 
> The thing to keep in mind here is that EVERY property of a foreign
> table is subject to change at any arbitrary point in time, without our
> knowledge.  ... Why should CHECK constraints be any different than,
> say, column types?

So, let's say someone changes column types from int to bigint on the
remote side, and you still have int on the local side. It continues to
work and everything is fine until all of a sudden you get 2^33 back, and
that generates an error.

That sounds closer to the semantics of constraint enforcement mechanism
#2 than #3 to me. That is, everything is fine until you get something
that you know is wrong, and you throw an error.

> Why should that be any worse with foreign tables than anything else?
> I mean, lots of people, as things stand today, manage to set up
> partitioned tables using CHECK constraints.  There are undoubtedly
> people who don't understand the planner benefit of having an
> appropriate CHECK constraint on each partition, but it's not exactly a
> common cause of confusion.

But there are no consequences there other than performance. With
unenforced constraints, they may get correct results during development
and testing, and wrong results occasionally when in production. That's
hard to explain to a user.

> It seems to me that the term runtime enforcement is a misnomer; you
> can't really "enforce" anything about a foreign table.

Maybe I chose the wrong terms, but there are at least 3 semantically
different concepts. Feel free to suggest a better term.

> If we
> were to propose changing the semantics from the former to the latter,
> we'd be laughed out of town, and rightly so.

I'm not proposing changing the semantics, I'm saying that there are more
than just 2 semantic options available, and they offer different kinds
of guarantees. Users may be interested in all 3 for different use cases.

>   I mean, let's
> suppose that we were to allow unique constraints on foreign tables.

I'm sure there are cases where people will not want what I am
suggesting, but I think there are cases where it is plausibly useful.

> Now, if the query is something like "SELECT
> * FROM foreign_table WHERE id = 1", you could fairly cheaply validate
> that there is only one row with id = 1, but that's not the same thing
> as validating that the assumption (namely, that foreign_table (id) is
> unique) is still true.

And if you don't issue a query at all, the constraint might not still be
true; but I don't think that implies that checking it when you do run a
query is useless.

> I think if we go down this road of trying to validate
> remote-side CHECK constraints, we're going to end up with a mishmash
> of cases where constraints are checked and other cases where
> constraints are not checked, and then that really is going to be
> confusing.

If we use keywords to differentiate constraints that are different
semantically, then we can just say that some types of constraints are
allowed on foreign tables and some are not.

I guess what I'd like to avoid is saying that a check constraint on a
regular table means one thing, and the same check constraint on a
foreign table means something else. If we differentiate them by
requiring special keywords like "NOT ENFORCED", then it would be more
user-visible what's going on, and it would allow room for new semantics
later if we want. Normal constraints would be disallowed on foreign
tables, but NOT ENFORCED ones would be allowed.

That brings up another point: what if someone really, really, doesn't
want to pay the overhead of enforcing their constraint on a local table,
but wants the planner benefit? Would they have to make it a remote table
to bypass the constraint check?

Regards,Jeff Davis

Re: NOT NULL constraints in foreign tables

From

Robert Haas

Date:

25 August 2012, 01:49:09

On Wed, Aug 22, 2012 at 12:59 PM, Jeff Davis <davis.jeffrey@gmail.com> wrote:
> On Tue, 2012-08-21 at 10:41 -0400, Robert Haas wrote:
>> The thing to keep in mind here is that EVERY property of a foreign
>> table is subject to change at any arbitrary point in time, without our
>> knowledge.  ... Why should CHECK constraints be any different than,
>> say, column types?
>
> So, let's say someone changes column types from int to bigint on the
> remote side, and you still have int on the local side. It continues to
> work and everything is fine until all of a sudden you get 2^33 back, and
> that generates an error.
>
> That sounds closer to the semantics of constraint enforcement mechanism
> #2 than #3 to me. That is, everything is fine until you get something
> that you know is wrong, and you throw an error.

Sure, but in that case you're not paying anything extra for it.

>> Why should that be any worse with foreign tables than anything else?
>> I mean, lots of people, as things stand today, manage to set up
>> partitioned tables using CHECK constraints.  There are undoubtedly
>> people who don't understand the planner benefit of having an
>> appropriate CHECK constraint on each partition, but it's not exactly a
>> common cause of confusion.
>
> But there are no consequences there other than performance. With
> unenforced constraints, they may get correct results during development
> and testing, and wrong results occasionally when in production. That's
> hard to explain to a user.

Sure.  Of course, your example of a column that is bigserial on one
side and an integer on the other side is a perfect example of how that
could happen *anyway*.  I'm all in favor of building things in a way
that minimizes the possibility of user confusion.  But since foreign
tables inevitably carry large amounts of risk in that area anyway, I
can't get very excited about fixing 10% of the problem.  That seems
likely to create the perception of safety without the reality.

> And if you don't issue a query at all, the constraint might not still be
> true; but I don't think that implies that checking it when you do run a
> query is useless.

Well, it does to me, but your mileage may vary (and obviously does).

>> I think if we go down this road of trying to validate
>> remote-side CHECK constraints, we're going to end up with a mishmash
>> of cases where constraints are checked and other cases where
>> constraints are not checked, and then that really is going to be
>> confusing.
>
> If we use keywords to differentiate constraints that are different
> semantically, then we can just say that some types of constraints are
> allowed on foreign tables and some are not.
>
> I guess what I'd like to avoid is saying that a check constraint on a
> regular table means one thing, and the same check constraint on a
> foreign table means something else. If we differentiate them by
> requiring special keywords like "NOT ENFORCED", then it would be more
> user-visible what's going on, and it would allow room for new semantics
> later if we want. Normal constraints would be disallowed on foreign
> tables, but NOT ENFORCED ones would be allowed.

This, I could get behind.

> That brings up another point: what if someone really, really, doesn't
> want to pay the overhead of enforcing their constraint on a local table,
> but wants the planner benefit? Would they have to make it a remote table
> to bypass the constraint check?

This is also a good point.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company