Thread: Referential integrity broken (8.0.3), sub-select help

Referential integrity broken (8.0.3), sub-select help

From

Date:

21 March 2006, 10:58:37

Hello,

I've got 2 tables, "url" (U), and "bookmark" (B), with "bookmark" pointing to "url" via FK.
Somehow I ended up with some rows in B referencing non-existent rows in U.
This sounds super strange and dangerous to me, and it's not clear to me how/why PG let this happen.
I'm using 8.0.3.

Here are the table references I just mentioned:

Table "bookmark":    id                  SERIAL                        CONSTRAINT pk_bookmark_id PRIMARY KEYTable
"url":  url_id              INTEGER                        CONSTRAINT fk_bookmark_id REFERENCES bookmark(id)
 


Problem #1: Strange that PG allowed this to happen.  Maybe my DDL above allows this to happen and needs to be
tightened? I thought the above would ensure referential integrity, but maybe I need to specify something else?
 

Problem #2: I'd like to find all rows in B that point to non-existent rows in U.  I can do it with the following
sub-select,I believe, but it's rather inefficient (EXPLAIN shows both tables would be sequentially scanned):
 
 SELECT * FROM bookmark WHERE id NOT IN (SELECT b.id FROM bookmark b, url u WHERE b.url_id=u.id);

Is there a more efficient way to get the rows from "bookmark"?

Thanks,
Otis

Re: Referential integrity broken (8.0.3), sub-select help

From

Stephan Szabo

Date:

21 March 2006, 11:08:38

On Tue, 21 Mar 2006 ogjunk-pgjedan@yahoo.com wrote:

> I've got 2 tables, "url" (U), and "bookmark" (B), with "bookmark" pointing to "url" via FK.

That's not what your schema below has.  Your fragment below has URL
pointing to bookmark.

> Somehow I ended up with some rows in B referencing non-existent rows in U.

With the below, this is entirely possible, since you're only guaranteeing
that URLs have valid bookmarks not the other way around.  Are you sure the
below is actually what you have?

> This sounds super strange and dangerous to me, and it's not clear to me how/why PG let this happen.
> I'm using 8.0.3.
>
> Here are the table references I just mentioned:
>
> Table "bookmark":
>      id                  SERIAL
>                          CONSTRAINT pk_bookmark_id PRIMARY KEY
>
>  Table "url":
>     url_id              INTEGER
>                          CONSTRAINT fk_bookmark_id REFERENCES bookmark(id)

Re: Referential integrity broken (8.0.3), sub-select help

From

"Aftab Alam"

Date:

21 March 2006, 11:16:30

delete my email from the list

Regards,





-----Original Message-----
From: pgsql-sql-owner@postgresql.org
[mailto:pgsql-sql-owner@postgresql.org]On Behalf Of
ogjunk-pgjedan@yahoo.com
Sent: Tuesday, March 21, 2006 8:29 PM
To: pgsql-sql@postgresql.org
Subject: [SQL] Referential integrity broken (8.0.3), sub-select help


Hello,

I've got 2 tables, "url" (U), and "bookmark" (B), with "bookmark" pointing
to "url" via FK.
Somehow I ended up with some rows in B referencing non-existent rows in U.
This sounds super strange and dangerous to me, and it's not clear to me
how/why PG let this happen.
I'm using 8.0.3.

Here are the table references I just mentioned:

Table "bookmark":    id                  SERIAL                        CONSTRAINT pk_bookmark_id PRIMARY KEY
Table "url":   url_id              INTEGER                        CONSTRAINT fk_bookmark_id REFERENCES bookmark(id)


Problem #1: Strange that PG allowed this to happen.  Maybe my DDL above
allows this to happen and needs to be tightened?  I thought the above would
ensure referential integrity, but maybe I need to specify something else?

Problem #2: I'd like to find all rows in B that point to non-existent rows
in U.  I can do it with the following sub-select, I believe, but it's rather
inefficient (EXPLAIN shows both tables would be sequentially scanned):
 SELECT * FROM bookmark WHERE id NOT IN (SELECT b.id FROM bookmark b, url u
WHERE b.url_id=u.id);

Is there a more efficient way to get the rows from "bookmark"?

Thanks,
Otis


---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Re: Referential integrity broken (8.0.3), sub-select help

From

Date:

21 March 2006, 14:11:45

Hi,

I mistakenly swapped the tables in my email.  Here they are, corrected:
Table "url":    id                  SERIAL                         CONSTRAINT pk_url_id PRIMARY KEY 
Table "bookmark":    url_id              INTEGER                          CONSTRAINT fk_url_id REFERENCES url(id)
I see my questions got chopped off from this email below, so let me restate them:

Problem #1: Strange that PG allowed this to happen.  Maybe my DDL above allows this to happen and needs to be
tightened? I thought the above would ensure referential integrity, but maybe I need to specify something else?

Problem #2: I'd like to find all rows in B that point to non-existent rows in U.  I can do it with the following
sub-select,I believe, but it's rather inefficient (EXPLAIN shows both tables would be sequentially scanned):

 SELECT * FROM bookmark WHERE id NOT IN (SELECT b.id FROM bookmark b, url u WHERE b.url_id=u.id);

Is there a more efficient way to get the rows from "bookmark"?

Thanks,
Otis

----- Original Message ----
From: Stephan Szabo <sszabo@megazone.bigpanda.com>
To: ogjunk-pgjedan@yahoo.com
Cc: pgsql-sql@postgresql.org
Sent: Tuesday, March 21, 2006 10:08:38 AM
Subject: Re: [SQL] Referential integrity broken (8.0.3), sub-select help

On Tue, 21 Mar 2006 ogjunk-pgjedan@yahoo.com wrote:

> I've got 2 tables, "url" (U), and "bookmark" (B), with "bookmark" pointing to "url" via FK.

That's not what your schema below has.  Your fragment below has URL
pointing to bookmark.

> Somehow I ended up with some rows in B referencing non-existent rows in U.

With the below, this is entirely possible, since you're only guaranteeing
that URLs have valid bookmarks not the other way around.  Are you sure the
below is actually what you have?

> This sounds super strange and dangerous to me, and it's not clear to me how/why PG let this happen.
> I'm using 8.0.3.
>
> Here are the table references I just mentioned:
>
> Table "bookmark":
>      id                  SERIAL
>                          CONSTRAINT pk_bookmark_id PRIMARY KEY
>
>  Table "url":
>     url_id              INTEGER
>                          CONSTRAINT fk_bookmark_id REFERENCES bookmark(id)

Re: Referential integrity broken (8.0.3), sub-select help

From

Stephan Szabo

Date:

21 March 2006, 14:44:15

On Tue, 21 Mar 2006 ogjunk-pgjedan@yahoo.com wrote:

> I mistakenly swapped the tables in my email.  Here they are, corrected:
>
>  Table "url":
>      id                  SERIAL
>                           CONSTRAINT pk_url_id PRIMARY KEY
>
> Table "bookmark":
>      url_id              INTEGER
>                            CONSTRAINT fk_url_id REFERENCES url(id)
>
> I see my questions got chopped off from this email below, so let me restate them:
>
>
> Problem #1: Strange that PG allowed this to happen.  Maybe my DDL above
> allows this to happen and needs to be tightened?  I thought the above
> would ensure referential integrity, but maybe I need to specify
> something else?

That seems like it should have worked. I don't know of any cases that'd
fail without referential actions (there are some cases with actions and
before triggers or rules), so if you have any leads, that'd be useful.

> Problem #2: I'd like to find all rows in B that point to non-existent
> rows in U.  I can do it with the following sub-select, I believe, but
> it's rather inefficient (EXPLAIN shows both tables would be sequentially
> scanned):
>
>   SELECT * FROM bookmark WHERE id NOT IN (SELECT b.id FROM bookmark b, url u WHERE b.url_id=u.id);
>
> Is there a more efficient way to get the rows from "bookmark"?

I think something like the following would work

SELECT * FROM bookmark WHERE url_id NOT IN (SELECT id FROM url u);

Raising work_mem may help get a better plan as well.

Re: Referential integrity broken (8.0.3), sub-select help

From

Scott Marlowe

Date:

21 March 2006, 15:08:13

On Tue, 2006-03-21 at 08:58, ogjunk-pgjedan@yahoo.com wrote:
> Hello,
> 
> I've got 2 tables, "url" (U), and "bookmark" (B), with "bookmark" pointing to "url" via FK.
> Somehow I ended up with some rows in B referencing non-existent rows in U.
> This sounds super strange and dangerous to me, and it's not clear to me how/why PG let this happen.
> I'm using 8.0.3.
> 
> Here are the table references I just mentioned:
> 
> Table "bookmark":
>      id                  SERIAL
>                          CONSTRAINT pk_bookmark_id PRIMARY KEY
>  
>  Table "url":
>     url_id              INTEGER
>                          CONSTRAINT fk_bookmark_id REFERENCES bookmark(id)
> 
> 
> Problem #1: Strange that PG allowed this to happen.  Maybe my DDL above allows this to happen and needs to be
tightened? I thought the above would ensure referential integrity, but maybe I need to specify something else?

Assuming you didn't do something like turning off all triggers at some
point, the other common cause of this kind of thing is bad hardware
(CPU, memory, hard drive, etc...)  so test your hardware.  Any machine
going into production as a database server should be heavily tested to
ensure that it has good hardware.  No database management program can be
expected to overcome broken hardware or OSes.  Good tools for testing
are memtest86 and doing a fdisk with the badblocks option (in linux, not
sure what the name is in bsd, but I'm sure it has some kind of block
tester in there somewhere.)

You can also write your own scripts to test a drive by writing the same
semi-random byte sequence to the drive, filling it up, then reading it
back and comparing them.  All zeros and all ones is a good test, and
there are patterns that tend to show problems.  Generally, most drives
that have problems will show them rather quickly in testing, with bad
blocks flying by by the hundreds.  But sometimes, it's just one block
causing a problem.

> Problem #2: I'd like to find all rows in B that point to non-existent rows in U.  I can do it with the following
sub-select,I believe, but it's rather inefficient (EXPLAIN shows both tables would be sequentially scanned):

> 
>   SELECT * FROM bookmark WHERE id NOT IN (SELECT b.id FROM bookmark b, url u WHERE b.url_id=u.id);

Generally a left join with not null in the where clause is more
efficient:

select a.id from table1 a left join table2 b on (a.id=b.aid) where b.aid
is null;

will show you all the rows in table1 that have no match in table2

Re: Referential integrity broken (8.0.3), sub-select help

From

Patrick JACQUOT

Date:

22 March 2006, 05:15:49

ogjunk-pgjedan@yahoo.com wrote:

>Hello,
>
>I've got 2 tables, "url" (U), and "bookmark" (B), with "bookmark" pointing to "url" via FK.
>Somehow I ended up with some rows in B referencing non-existent rows in U.
>This sounds super strange and dangerous to me, and it's not clear to me how/why PG let this happen.
>I'm using 8.0.3.
>
>Here are the table references I just mentioned:
>
>Table "bookmark":
>     id                  SERIAL
>                         CONSTRAINT pk_bookmark_id PRIMARY KEY
> 
> Table "url":
>    url_id              INTEGER
>                         CONSTRAINT fk_bookmark_id REFERENCES bookmark(id)
>
>  
>
Your DDL  doesn't say :  "B references U", but the contrary : "U 
references B".
So it's perfectly right that somes tuples in B are not referenced by 
tuples in U.
Please correct your constraints.

>Problem #1: Strange that PG allowed this to happen.  Maybe my DDL above allows this to happen and needs to be
tightened? I thought the above would ensure referential integrity, but maybe I need to specify something else?
 
>
>Problem #2: I'd like to find all rows in B that point to non-existent rows in U.  I can do it with the following
sub-select,I believe, but it's rather inefficient (EXPLAIN shows both tables would be sequentially scanned):
 
>
>  SELECT * FROM bookmark WHERE id NOT IN (SELECT b.id FROM bookmark b, url u WHERE b.url_id=u.id);
>
>Is there a more efficient way to get the rows from "bookmark"?
>
>Thanks,
>Otis
>
>
>---------------------------(end of broadcast)---------------------------
>TIP 6: explain analyze is your friend
>
>  
>
I think, for that one Scott's answer is OK
You could also try SELECT * FROM url U WHERE NOT EXISTS(SELECT * FROM 
bookmark B WHERE B.url-id=U.id)
and see wich one is faster