Thread: PRIMARY KEY on a *group* of columns imply that each column is NOT NULL?

PRIMARY KEY on a *group* of columns imply that each column is NOT NULL?

From
Stephane Bortzmeyer
Date:
If I define a primary key:

       name TEXT NOT NULL,
       address INET,
       PRIMARY KEY(name, address)

the definition (seen by \d) becomes:

 name          | text                        | not null
 address       | inet                        | not null

"address" is now not null, which I do not want. It seems unnecessary:
I just want the tuple (name, address) to be unique, which seems
possible even if some 'address' values are NULL.

It does not appear to be documented in
http://www.postgresql.org/docs/7.4/interactive/ddl-constraints.html#AEN1975.
Is there a workaround?


Re: PRIMARY KEY on a *group* of columns imply that each column is NOT NULL?

From
"Guy Rouillier"
Date:
Stephane Bortzmeyer wrote:
> If I define a primary key:
>
>        name TEXT NOT NULL,
>        address INET,
>        PRIMARY KEY(name, address)
>
> the definition (seen by \d) becomes:
>
>  name          | text                        | not null
>  address       | inet                        | not null
>
> "address" is now not null, which I do not want. It seems unnecessary:
> I just want the tuple (name, address) to be unique, which seems
> possible even if some 'address' values are NULL.
>
> It does not appear to be documented in
>
http://www.postgresql.org/docs/7.4/interactive/ddl-constraints.html#AEN1
975.
> Is there a workaround?

Per the SQL Commands Reference, under CREATE TABLE:

"The primary key constraint specifies that a column or columns of a
table may contain only unique (non-duplicate), nonnull values.
Technically, PRIMARY KEY is merely a combination of UNIQUE and NOT NULL"

Primary key columns cannot contain null values.

--
Guy Rouillier


Re: PRIMARY KEY on a *group* of columns imply that each column is NOT

From
Stephane Bortzmeyer
Date:
On Tue, Apr 26, 2005 at 03:22:40PM -0500,
 Guy Rouillier <guyr@masergy.com> wrote
 a message of 37 lines which said:

> "The primary key constraint specifies that a column or columns of a
> table may contain only unique (non-duplicate), nonnull values.
> Technically, PRIMARY KEY is merely a combination of UNIQUE and NOT
> NULL"
>
> Primary key columns cannot contain null values.

I read the above also. It is perfectly clear for primary key on one
column.

But it does not apply to primary keys containing a group of
columns. In that case (my case), columns do not have to be UNIQUE. But
they have to be NOT NULL, which puzzles me.

Re: PRIMARY KEY on a *group* of columns imply that each

From
Scott Marlowe
Date:
On Tue, 2005-04-26 at 15:39, Stephane Bortzmeyer wrote:
> On Tue, Apr 26, 2005 at 03:22:40PM -0500,
>  Guy Rouillier <guyr@masergy.com> wrote
>  a message of 37 lines which said:
>
> > "The primary key constraint specifies that a column or columns of a
> > table may contain only unique (non-duplicate), nonnull values.
> > Technically, PRIMARY KEY is merely a combination of UNIQUE and NOT
> > NULL"
> >
> > Primary key columns cannot contain null values.
>
> I read the above also. It is perfectly clear for primary key on one
> column.
>
> But it does not apply to primary keys containing a group of
> columns. In that case (my case), columns do not have to be UNIQUE. But
> they have to be NOT NULL, which puzzles me.

Here's a quote from the SQL1992 spec that's VERY clear:

A unique constraint is satisfied if and only if no two rows in
a table have the same non-null values in the unique columns. In
addition, if the unique constraint was defined with PRIMARY KEY,
then it requires that none of the values in the specified column or
columns be the null value.

Re: PRIMARY KEY on a *group* of columns imply that each column is NOT

From
Stephane Bortzmeyer
Date:
On Tue, Apr 26, 2005 at 03:48:44PM -0500,
 Scott Marlowe <smarlowe@g2switchworks.com> wrote
 a message of 26 lines which said:

> Here's a quote from the SQL1992 spec that's VERY clear:

Yes, PostgreSQL is right and implement the standard. Now, what's the
rationale for the standard? I understand it for a single column but,
for several columns, it should be still possible to have different
tuples, such as (3, NULL) and (5, NULL) for instance.

Re: PRIMARY KEY on a *group* of columns imply that each

From
Richard Huxton
Date:
Stephane Bortzmeyer wrote:
> On Tue, Apr 26, 2005 at 03:48:44PM -0500,
>  Scott Marlowe <smarlowe@g2switchworks.com> wrote
>  a message of 26 lines which said:
>
>
>>Here's a quote from the SQL1992 spec that's VERY clear:
>
>
> Yes, PostgreSQL is right and implement the standard. Now, what's the
> rationale for the standard? I understand it for a single column but,
> for several columns, it should be still possible to have different
> tuples, such as (3, NULL) and (5, NULL) for instance.

The value of (3,NULL) isn't well-defined. In particular, you can't say
that (3,NULL) = (3,NULL) since NULL means not-known. The fact that part
of the value is not known means the value as a whole is not known.

--
   Richard Huxton
   Archonet Ltd

Re: PRIMARY KEY on a *group* of columns imply that each column is NOT

From
ptjm@interlog.com (Patrick TJ McPhee)
Date:
In article <20050426203938.GA18628@nic.fr>,
Stephane Bortzmeyer <bortzmeyer@nic.fr> wrote:

% But it does not apply to primary keys containing a group of
% columns. In that case (my case), columns do not have to be UNIQUE. But
% they have to be NOT NULL, which puzzles me.

It does apply to primary keys containing groups of columns.

You can get the table definition you want by using a unique constraint,
but you should know that in SQL, unique constraints don't apply to
rows containing null values in the constrained columns. If you
do this:

 create table x (
   name TEXT NOT NULL,
   address INET,
   CONSTRAINT na UNIQUE (name, address)
 );

your table definition will be as you want it, but the constraint you
want won't be there.

$ INSERT INTO x VALUES ('alpha');
INSERT 194224 1
$ INSERT INTO x VALUES ('alpha');
INSERT 194225 1
$ INSERT INTO x VALUES ('alpha');
INSERT 194226 1
$ INSERT INTO x VALUES ('alpha');
INSERT 194227 1

--

Patrick TJ McPhee
North York  Canada
ptjm@interlog.com

Re: PRIMARY KEY on a *group* of columns imply that each column is

From
Scott Marlowe
Date:
On Wed, 2005-04-27 at 02:12, Stephane Bortzmeyer wrote:
> On Tue, Apr 26, 2005 at 03:48:44PM -0500,
>  Scott Marlowe <smarlowe@g2switchworks.com> wrote
>  a message of 26 lines which said:
>
> > Here's a quote from the SQL1992 spec that's VERY clear:
>
> Yes, PostgreSQL is right and implement the standard. Now, what's the
> rationale for the standard? I understand it for a single column but,
> for several columns, it should be still possible to have different
> tuples, such as (3, NULL) and (5, NULL) for instance.

Since NULL <> NULL, that means you could then have

(5,NULL) and (5,NULL) since the two NULLS aren't equal.



Re: PRIMARY KEY on a *group* of columns imply that each

From
Stephan Szabo
Date:
On Wed, 27 Apr 2005, Stephane Bortzmeyer wrote:

> On Tue, Apr 26, 2005 at 03:48:44PM -0500,
>  Scott Marlowe <smarlowe@g2switchworks.com> wrote
>  a message of 26 lines which said:
>
> > Here's a quote from the SQL1992 spec that's VERY clear:
>
> Yes, PostgreSQL is right and implement the standard. Now, what's the
> rationale for the standard? I understand it for a single column but,
> for several columns, it should be still possible to have different
> tuples, such as (3, NULL) and (5, NULL) for instance.

The case that they're trying to prevent is two tuples like (3, NULL) and
(3,NULL) since uniqueness alone doesn't prevent them both from being
inserted.

Re: PRIMARY KEY on a *group* of columns imply that each column is NOT

From
Stephane Bortzmeyer
Date:
On Wed, Apr 27, 2005 at 05:19:32AM +0000,
 Patrick TJ McPhee <ptjm@interlog.com> wrote
 a message of 37 lines which said:

> but you should know that in SQL, unique constraints don't apply to
> rows containing null values

May be I should but I didn't.

> your table definition will be as you want it, but the constraint you
> want won't be there.

OK, I will try to write a custom trigger, then.



Re: PRIMARY KEY on a *group* of columns imply that each column is NOT

From
Tom Lane
Date:
Stephane Bortzmeyer <bortzmeyer@nic.fr> writes:
> Yes, PostgreSQL is right and implement the standard. Now, what's the
> rationale for the standard? I understand it for a single column but,
> for several columns, it should be still possible to have different
> tuples, such as (3, NULL) and (5, NULL) for instance.

If that's what you want, declare it as UNIQUE not PRIMARY KEY.

            regards, tom lane

Re: PRIMARY KEY on a *group* of columns imply that each

From
Scott Marlowe
Date:
On Wed, 2005-04-27 at 09:06, Stephane Bortzmeyer wrote:
> On Wed, Apr 27, 2005 at 05:19:32AM +0000,
>  Patrick TJ McPhee <ptjm@interlog.com> wrote
>  a message of 37 lines which said:
>
> > but you should know that in SQL, unique constraints don't apply to
> > rows containing null values
>
> May be I should but I didn't.

Actually, considering that many databases (at least in the past) have
ignored this and treated nulls as unique things, it's quite
understandable.

MSSQL, for instance, used to definitely allow only one null in a unique
field.  So, for that database, not null wasn't really necessary for a
primary key column.

I believe this problem exist(s)(ed) in several other supposedly
"enterprise" class databases as well.

Re: PRIMARY KEY on a *group* of columns imply that each column is NOT

From
Stephane Bortzmeyer
Date:
On Wed, Apr 27, 2005 at 10:26:30AM -0400,
 Tom Lane <tgl@sss.pgh.pa.us> wrote
 a message of 9 lines which said:

> If that's what you want, declare it as UNIQUE not PRIMARY KEY.

As shown by Patrick TJ McPhee, it does not work:

tests=>  create table x (
tests(>    name TEXT NOT NULL,
tests(>    address INET,
tests(>    CONSTRAINT na UNIQUE (name, address)
tests(>  );
NOTICE:  CREATE TABLE / UNIQUE will create implicit index "na" for table "x"
CREATE TABLE
tests=> INSERT INTO x (name) values ('foobar');
INSERT 45380 1
tests=> INSERT INTO x (name) values ('foobar');
INSERT 45381 1
tests=> INSERT INTO x (name) values ('foobar');
INSERT 45382 1
tests=> INSERT INTO x (name) values ('foobar');
INSERT 45383 1
tests=> select * from x;
  name  | address
--------+---------
 foobar |
 foobar |
 foobar |
 foobar |
(4 rows)

Re: PRIMARY KEY on a *group* of columns imply that each

From
Scott Marlowe
Date:
On Wed, 2005-04-27 at 09:06, Stephane Bortzmeyer wrote:
> On Wed, Apr 27, 2005 at 05:19:32AM +0000,
>  Patrick TJ McPhee <ptjm@interlog.com> wrote
>  a message of 37 lines which said:
>
> > but you should know that in SQL, unique constraints don't apply to
> > rows containing null values
>
> May be I should but I didn't.
>
> > your table definition will be as you want it, but the constraint you
> > want won't be there.
>
> OK, I will try to write a custom trigger, then.

Often the best bet here, btw, is to declare it not null then use
something other than null to represent null, like the text characters NA
or something.

Re: PRIMARY KEY on a *group* of columns imply that each

From
Sebastian Böck
Date:
Stephane Bortzmeyer wrote:
> On Wed, Apr 27, 2005 at 10:26:30AM -0400,
>  Tom Lane <tgl@sss.pgh.pa.us> wrote
>  a message of 9 lines which said:
>
>
>>If that's what you want, declare it as UNIQUE not PRIMARY KEY.
>
>
> As shown by Patrick TJ McPhee, it does not work:
>
> tests=>  create table x (
> tests(>    name TEXT NOT NULL,
> tests(>    address INET,
> tests(>    CONSTRAINT na UNIQUE (name, address)
> tests(>  );
> NOTICE:  CREATE TABLE / UNIQUE will create implicit index "na" for table "x"
> CREATE TABLE
> tests=> INSERT INTO x (name) values ('foobar');
> INSERT 45380 1
> tests=> INSERT INTO x (name) values ('foobar');
> INSERT 45381 1
> tests=> INSERT INTO x (name) values ('foobar');
> INSERT 45382 1
> tests=> INSERT INTO x (name) values ('foobar');
> INSERT 45383 1
> tests=> select * from x;
>   name  | address
> --------+---------
>  foobar |
>  foobar |
>  foobar |
>  foobar |
> (4 rows)
>

If i understand correctly, you want something like:

create table x (
   name TEXT NOT NULL PRIMARY KEY,
   address INET
);

CREATE UNIQUE INDEX na ON x (name, address) WHERE address IS NULL;

HTH

Sebastian


Re: PRIMARY KEY on a *group* of columns imply that each column is NOT

From
Stephane Bortzmeyer
Date:
On Wed, Apr 27, 2005 at 04:50:23PM +0200,
 Sebastian Böck <sebastianboeck@freenet.de> wrote
 a message of 48 lines which said:

> CREATE UNIQUE INDEX na ON x (name, address) WHERE address IS NULL;

No, because it prevents two tuples with the same value of "name".

Re: PRIMARY KEY on a *group* of columns imply that each column is NOT

From
Stephane Bortzmeyer
Date:
On Wed, Apr 27, 2005 at 09:36:57AM -0500,
 Scott Marlowe <smarlowe@g2switchworks.com> wrote
 a message of 18 lines which said:

> Often the best bet here, btw, is to declare it not null then use
> something other than null to represent null, like the text
> characters NA or something.

Yes, but it defeats the purpose of NULL. And what should I use as a
"pseudo-NULL" value for INET? 127.0.0.1? 0.0.0.0? Special values are
well-known for the problems they raise. That's why many languages have
NULL-like solutions (None in Python, undef in Perl, Maybe types in
Haskell, etc).


Re: PRIMARY KEY on a *group* of columns imply that each

From
Sebastian Böck
Date:
Stephane Bortzmeyer wrote:
> On Wed, Apr 27, 2005 at 04:50:23PM +0200,
>  Sebastian Böck <sebastianboeck@freenet.de> wrote
>  a message of 48 lines which said:
>
>
>>CREATE UNIQUE INDEX na ON x (name, address) WHERE address IS NULL;
>
>
> No, because it prevents two tuples with the same value of "name".

Ahh, sorry! Ment something more like:

CREATE TABLE table x (
   name TEXT NOT NULL,
   address INET
);

CREATE UNIQUE INDEX na ON x (name, address);
CREATE UNIQUE INDEX n ON x (name) WHERE address IS NULL;

HTH

Sebastian

Re: PRIMARY KEY on a *group* of columns imply that each column is NOT

From
Stephane Bortzmeyer
Date:
On Wed, Apr 27, 2005 at 05:04:07PM +0200,
 Sebastian Böck <sebastianboeck@freenet.de> wrote
 a message of 24 lines which said:

               One is enough :-)
               vvvvv
> CREATE TABLE table x (
>   name TEXT NOT NULL,
>   address INET
> );
>
> CREATE UNIQUE INDEX na ON x (name, address);
> CREATE UNIQUE INDEX n ON x (name) WHERE address IS NULL;

Great! It works fine. Many thanks.

tests=> select * from x;
 name | address
------+---------
 foo  |
 foo  | 1.2.3.4
 foo  | ::1
 bar  | ::1
 bar  |
(5 rows)
tests=> insert into x (name) values ('bar');
ERROR:  duplicate key violates unique constraint "n"
tests=> insert into x (name, address) values ('bar', '::1');
ERROR:  duplicate key violates unique constraint "na"
tests=> insert into x (name) values ('new');
INSERT 21128 1

Re: PRIMARY KEY on a *group* of columns imply that each column is

From
Marco Colombo
Date:
On Wed, 2005-04-27 at 17:00 +0200, Stephane Bortzmeyer wrote:
> On Wed, Apr 27, 2005 at 09:36:57AM -0500,
>  Scott Marlowe <smarlowe@g2switchworks.com> wrote
>  a message of 18 lines which said:
>
> > Often the best bet here, btw, is to declare it not null then use
> > something other than null to represent null, like the text
> > characters NA or something.
>
> Yes, but it defeats the purpose of NULL. And what should I use as a
> "pseudo-NULL" value for INET? 127.0.0.1? 0.0.0.0? Special values are
> well-known for the problems they raise. That's why many languages have
> NULL-like solutions (None in Python, undef in Perl, Maybe types in
> Haskell, etc).

No. NULL is NOT 'None', nor 'undef', and definitely not NULL as in C.
Those are perfectly defined values, although special ones. Only 'undef'
is quite misleading, but nevertheless it is just _one_ value.
That is, given a variable A, you can always write a boolean expression
that evaluates True or False to test if A is _equal_ to None/undef/NULL
(in C):

$ python -c "a = None; print a == None"
True

$ perl -e 'print a == undef, "\n"'
1

$ cat p.c
#include <stdio.h>

int
main(int argc, char *argv[])
{
        char *a = NULL;
        printf ("%d\n", a == NULL);
}

$ cc p.c
$ ./a.out
1

About Haskell, I don't know. For what I understand from quick reading
the manual, it'd say that Nothing is similar. You _can_ tell if
something is equal to Nothing.

In databases, NULL has a completely different meaning. It doesn't mean
_no value_, which is just a special (single) value, but it means
_unknown value_. You simply can't compare it with any single value
(even special ones) and expect a boolean answer. The only possible
answer is 'I don't know', which is NULL in boolean. Notice that the
boolean NULL is _not_ the same of False. So you get:

marco=# select 2 = 2;
 ?column?
----------
 t
(1 row)

marco=# select 2 = 3;
 ?column?
----------
 f
(1 row)

marco=# select 2 = NULL;
 ?column?
----------

(1 row)

that is, neither true nor false. Back to your example, you can compare
('a', 2) with ('a', 2), the result is 't' and thus you've managed
to identify the right row (it works as a primary key).
Also, ('a', 3') is different from ('a', '2'), so you can tell the two
rows are different. But what if you allow ('a', NULL)?

('a', NULL) is neither the same _nor different_ from ('a', 2). The
result of comparison is NULL, no matter how you're testing it:

marco=# select ('a', 2) = ('a', NULL);
 ?column?
----------

(1 row)

marco=# select ('a', 2) <> ('a', NULL);
 ?column?
----------

(1 row)

see? _Neither_ one is true. This would completely defeat the purpose of
the primary key. And of course, comparing ('a', NULL) with ('a', NULL)
results in exactly the same:

marco=# select ('a', NULL) = ('a', NULL);
 ?column?
----------

(1 row)

marco=# select ('a', NULL) <> ('a', NULL);
 ?column?
----------

(1 row)

That's why NULLs are not allowed in primary keys. The key simply won't
work. NULL in databases is not _one_ special value. It's _any_ value,
since it's unknown. The boolean expression:

    2 = NULL

might be true or might be false, since NULL could be _any_ integer in
this expression.

This is completely different from the semantic of None/undef/NULL in
most programming languages.

You wrote:

"Special values are well-known for the problems they raise."

then NULL is definitely _not_ the value you're looking for.
Everything can be said of NULL, but that it is "well-known".

In your case, by choosing (name, address) as the primary key, you're
saying 'I need to know both the name and the address to be able to
retrieve a datum in the table'. This implies that if you have partial
knowledge (you don't know the address), you can't "naturally" retrieve a
single datum (or insert it).

Depending on what you're trying to achieve, you may need to split
the table (normalization the theorists call it). I don't like theory
much, but its conclusions sometimes just make a lot of sense. :-)
Review your design, maybe either the table schema or the choice of the
primary key is not natural for your database.

.TM.
--
      ____/  ____/   /
     /      /       /                   Marco Colombo
    ___/  ___  /   /                  Technical Manager
   /          /   /                      ESI s.r.l.
 _____/ _____/  _/                      Colombo@ESI.it


Re: PRIMARY KEY on a *group* of columns imply that each column is NOT

From
Stephane Bortzmeyer
Date:
On Thu, Apr 28, 2005 at 02:04:29PM +0200,
 Marco Colombo <pgsql@esiway.net> wrote
 a message of 146 lines which said:

> No. NULL is NOT 'None', nor 'undef', and definitely not NULL as in
> C.

Thanks for the very good and detailed explanation of NULL in
SQL. Curious people may note that the strange semantics of NULL are
heavily criticized in C. J. Date and Hugh Darwen "A Guide to the SQL
Standard" (Addison-Wesley) [Side note: I was perfectly aware for NULL
in C, where it is just an ordinary zero, that's why I only mentioned
Perl and Python.]

> marco=# select 2 = NULL;
>  ?column?
> ----------
>
> (1 row)

Even better, you can write;

registry=> select NULL = NULL;
 ?column?
----------

(1 row)

> Depending on what you're trying to achieve, you may need to split
> the table (normalization the theorists call it).

Yes, I noticed in similar organizations that the Hosts table was split
in one table for names and one for addresses, may be for exactly that
reason.

> Review your design, maybe either the table schema or the choice of
> the primary key is not natural for your database.

At the present time, it seems that, practically speaking, the
technique proposed by Sebastian Böck (two index) is perfectly
fine. This technique can be summarized as "PostgreSQL, I tell you to
treat all NULL addresses as being the same value".


Re: PRIMARY KEY on a *group* of columns imply that each column is

From
Ron Mayer
Date:
Marco Colombo wrote:
> That is, given a variable A, you can always write a boolean expression
> that evaluates True or False to test if A is _equal_ to None/undef/NULL
> (in C):

And of course you can do this in SQL as well.

> ('a', NULL) is neither the same _nor different_ from ('a', 2).

Uh, I'm not sure what you mean by "the same"; but
certainly ('a',NULL) is distinct from ('a',2).

>The result of comparison is NULL, no matter how you're testing it:

Unless you're testing with the IS DISTINCT FROM operator.



fli=# select ('a',NULL) IS DISTINCT FROM ('a',2);
  ?column?
----------
  t
(1 row)

> marco=# select ('a', 2) = ('a', NULL);
>  ?column?
> ----------
>
> (1 row)
>
> marco=# select ('a', 2) <> ('a', NULL);
>  ?column?
> ----------
>
> (1 row)


fli=# select ('a',NULL) IS DISTINCT FROM ('a',NULL);
  ?column?
----------
  f
(1 row)


> That's why NULLs are not allowed in primary keys...

But in general I agree with most of the rest of what you said.