Thread: Referential integrity broken (8.0.3), sub-select help
Hello, I've got 2 tables, "url" (U), and "bookmark" (B), with "bookmark" pointing to "url" via FK. Somehow I ended up with some rows in B referencing non-existent rows in U. This sounds super strange and dangerous to me, and it's not clear to me how/why PG let this happen. I'm using 8.0.3. Here are the table references I just mentioned: Table "bookmark": id SERIAL CONSTRAINT pk_bookmark_id PRIMARY KEYTable "url": url_id INTEGER CONSTRAINT fk_bookmark_id REFERENCES bookmark(id) Problem #1: Strange that PG allowed this to happen. Maybe my DDL above allows this to happen and needs to be tightened? I thought the above would ensure referential integrity, but maybe I need to specify something else? Problem #2: I'd like to find all rows in B that point to non-existent rows in U. I can do it with the following sub-select,I believe, but it's rather inefficient (EXPLAIN shows both tables would be sequentially scanned): SELECT * FROM bookmark WHERE id NOT IN (SELECT b.id FROM bookmark b, url u WHERE b.url_id=u.id); Is there a more efficient way to get the rows from "bookmark"? Thanks, Otis
On Tue, 21 Mar 2006 ogjunk-pgjedan@yahoo.com wrote: > I've got 2 tables, "url" (U), and "bookmark" (B), with "bookmark" pointing to "url" via FK. That's not what your schema below has. Your fragment below has URL pointing to bookmark. > Somehow I ended up with some rows in B referencing non-existent rows in U. With the below, this is entirely possible, since you're only guaranteeing that URLs have valid bookmarks not the other way around. Are you sure the below is actually what you have? > This sounds super strange and dangerous to me, and it's not clear to me how/why PG let this happen. > I'm using 8.0.3. > > Here are the table references I just mentioned: > > Table "bookmark": > id SERIAL > CONSTRAINT pk_bookmark_id PRIMARY KEY > > Table "url": > url_id INTEGER > CONSTRAINT fk_bookmark_id REFERENCES bookmark(id)
delete my email from the list Regards, -----Original Message----- From: pgsql-sql-owner@postgresql.org [mailto:pgsql-sql-owner@postgresql.org]On Behalf Of ogjunk-pgjedan@yahoo.com Sent: Tuesday, March 21, 2006 8:29 PM To: pgsql-sql@postgresql.org Subject: [SQL] Referential integrity broken (8.0.3), sub-select help Hello, I've got 2 tables, "url" (U), and "bookmark" (B), with "bookmark" pointing to "url" via FK. Somehow I ended up with some rows in B referencing non-existent rows in U. This sounds super strange and dangerous to me, and it's not clear to me how/why PG let this happen. I'm using 8.0.3. Here are the table references I just mentioned: Table "bookmark": id SERIAL CONSTRAINT pk_bookmark_id PRIMARY KEY Table "url": url_id INTEGER CONSTRAINT fk_bookmark_id REFERENCES bookmark(id) Problem #1: Strange that PG allowed this to happen. Maybe my DDL above allows this to happen and needs to be tightened? I thought the above would ensure referential integrity, but maybe I need to specify something else? Problem #2: I'd like to find all rows in B that point to non-existent rows in U. I can do it with the following sub-select, I believe, but it's rather inefficient (EXPLAIN shows both tables would be sequentially scanned): SELECT * FROM bookmark WHERE id NOT IN (SELECT b.id FROM bookmark b, url u WHERE b.url_id=u.id); Is there a more efficient way to get the rows from "bookmark"? Thanks, Otis ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend
Hi, I mistakenly swapped the tables in my email. Here they are, corrected: Table "url": id SERIAL CONSTRAINT pk_url_id PRIMARY KEY Table "bookmark": url_id INTEGER CONSTRAINT fk_url_id REFERENCES url(id) I see my questions got chopped off from this email below, so let me restate them: Problem #1: Strange that PG allowed this to happen. Maybe my DDL above allows this to happen and needs to be tightened? I thought the above would ensure referential integrity, but maybe I need to specify something else? Problem #2: I'd like to find all rows in B that point to non-existent rows in U. I can do it with the following sub-select,I believe, but it's rather inefficient (EXPLAIN shows both tables would be sequentially scanned): SELECT * FROM bookmark WHERE id NOT IN (SELECT b.id FROM bookmark b, url u WHERE b.url_id=u.id); Is there a more efficient way to get the rows from "bookmark"? Thanks, Otis ----- Original Message ---- From: Stephan Szabo <sszabo@megazone.bigpanda.com> To: ogjunk-pgjedan@yahoo.com Cc: pgsql-sql@postgresql.org Sent: Tuesday, March 21, 2006 10:08:38 AM Subject: Re: [SQL] Referential integrity broken (8.0.3), sub-select help On Tue, 21 Mar 2006 ogjunk-pgjedan@yahoo.com wrote: > I've got 2 tables, "url" (U), and "bookmark" (B), with "bookmark" pointing to "url" via FK. That's not what your schema below has. Your fragment below has URL pointing to bookmark. > Somehow I ended up with some rows in B referencing non-existent rows in U. With the below, this is entirely possible, since you're only guaranteeing that URLs have valid bookmarks not the other way around. Are you sure the below is actually what you have? > This sounds super strange and dangerous to me, and it's not clear to me how/why PG let this happen. > I'm using 8.0.3. > > Here are the table references I just mentioned: > > Table "bookmark": > id SERIAL > CONSTRAINT pk_bookmark_id PRIMARY KEY > > Table "url": > url_id INTEGER > CONSTRAINT fk_bookmark_id REFERENCES bookmark(id)
On Tue, 21 Mar 2006 ogjunk-pgjedan@yahoo.com wrote: > I mistakenly swapped the tables in my email. Here they are, corrected: > > Table "url": > id SERIAL > CONSTRAINT pk_url_id PRIMARY KEY > > Table "bookmark": > url_id INTEGER > CONSTRAINT fk_url_id REFERENCES url(id) > > I see my questions got chopped off from this email below, so let me restate them: > > > Problem #1: Strange that PG allowed this to happen. Maybe my DDL above > allows this to happen and needs to be tightened? I thought the above > would ensure referential integrity, but maybe I need to specify > something else? That seems like it should have worked. I don't know of any cases that'd fail without referential actions (there are some cases with actions and before triggers or rules), so if you have any leads, that'd be useful. > Problem #2: I'd like to find all rows in B that point to non-existent > rows in U. I can do it with the following sub-select, I believe, but > it's rather inefficient (EXPLAIN shows both tables would be sequentially > scanned): > > SELECT * FROM bookmark WHERE id NOT IN (SELECT b.id FROM bookmark b, url u WHERE b.url_id=u.id); > > Is there a more efficient way to get the rows from "bookmark"? I think something like the following would work SELECT * FROM bookmark WHERE url_id NOT IN (SELECT id FROM url u); Raising work_mem may help get a better plan as well.
On Tue, 2006-03-21 at 08:58, ogjunk-pgjedan@yahoo.com wrote: > Hello, > > I've got 2 tables, "url" (U), and "bookmark" (B), with "bookmark" pointing to "url" via FK. > Somehow I ended up with some rows in B referencing non-existent rows in U. > This sounds super strange and dangerous to me, and it's not clear to me how/why PG let this happen. > I'm using 8.0.3. > > Here are the table references I just mentioned: > > Table "bookmark": > id SERIAL > CONSTRAINT pk_bookmark_id PRIMARY KEY > > Table "url": > url_id INTEGER > CONSTRAINT fk_bookmark_id REFERENCES bookmark(id) > > > Problem #1: Strange that PG allowed this to happen. Maybe my DDL above allows this to happen and needs to be tightened? I thought the above would ensure referential integrity, but maybe I need to specify something else? Assuming you didn't do something like turning off all triggers at some point, the other common cause of this kind of thing is bad hardware (CPU, memory, hard drive, etc...) so test your hardware. Any machine going into production as a database server should be heavily tested to ensure that it has good hardware. No database management program can be expected to overcome broken hardware or OSes. Good tools for testing are memtest86 and doing a fdisk with the badblocks option (in linux, not sure what the name is in bsd, but I'm sure it has some kind of block tester in there somewhere.) You can also write your own scripts to test a drive by writing the same semi-random byte sequence to the drive, filling it up, then reading it back and comparing them. All zeros and all ones is a good test, and there are patterns that tend to show problems. Generally, most drives that have problems will show them rather quickly in testing, with bad blocks flying by by the hundreds. But sometimes, it's just one block causing a problem. > Problem #2: I'd like to find all rows in B that point to non-existent rows in U. I can do it with the following sub-select,I believe, but it's rather inefficient (EXPLAIN shows both tables would be sequentially scanned): > > SELECT * FROM bookmark WHERE id NOT IN (SELECT b.id FROM bookmark b, url u WHERE b.url_id=u.id); Generally a left join with not null in the where clause is more efficient: select a.id from table1 a left join table2 b on (a.id=b.aid) where b.aid is null; will show you all the rows in table1 that have no match in table2
ogjunk-pgjedan@yahoo.com wrote: >Hello, > >I've got 2 tables, "url" (U), and "bookmark" (B), with "bookmark" pointing to "url" via FK. >Somehow I ended up with some rows in B referencing non-existent rows in U. >This sounds super strange and dangerous to me, and it's not clear to me how/why PG let this happen. >I'm using 8.0.3. > >Here are the table references I just mentioned: > >Table "bookmark": > id SERIAL > CONSTRAINT pk_bookmark_id PRIMARY KEY > > Table "url": > url_id INTEGER > CONSTRAINT fk_bookmark_id REFERENCES bookmark(id) > > > Your DDL doesn't say : "B references U", but the contrary : "U references B". So it's perfectly right that somes tuples in B are not referenced by tuples in U. Please correct your constraints. >Problem #1: Strange that PG allowed this to happen. Maybe my DDL above allows this to happen and needs to be tightened? I thought the above would ensure referential integrity, but maybe I need to specify something else? > >Problem #2: I'd like to find all rows in B that point to non-existent rows in U. I can do it with the following sub-select,I believe, but it's rather inefficient (EXPLAIN shows both tables would be sequentially scanned): > > SELECT * FROM bookmark WHERE id NOT IN (SELECT b.id FROM bookmark b, url u WHERE b.url_id=u.id); > >Is there a more efficient way to get the rows from "bookmark"? > >Thanks, >Otis > > >---------------------------(end of broadcast)--------------------------- >TIP 6: explain analyze is your friend > > > I think, for that one Scott's answer is OK You could also try SELECT * FROM url U WHERE NOT EXISTS(SELECT * FROM bookmark B WHERE B.url-id=U.id) and see wich one is faster