On Tue, 2006-03-21 at 08:58, ogjunk-pgjedan@yahoo.com wrote:
> Hello,
>
> I've got 2 tables, "url" (U), and "bookmark" (B), with "bookmark" pointing to "url" via FK.
> Somehow I ended up with some rows in B referencing non-existent rows in U.
> This sounds super strange and dangerous to me, and it's not clear to me how/why PG let this happen.
> I'm using 8.0.3.
>
> Here are the table references I just mentioned:
>
> Table "bookmark":
> id SERIAL
> CONSTRAINT pk_bookmark_id PRIMARY KEY
>
> Table "url":
> url_id INTEGER
> CONSTRAINT fk_bookmark_id REFERENCES bookmark(id)
>
>
> Problem #1: Strange that PG allowed this to happen. Maybe my DDL above allows this to happen and needs to be
tightened? I thought the above would ensure referential integrity, but maybe I need to specify something else?
Assuming you didn't do something like turning off all triggers at some
point, the other common cause of this kind of thing is bad hardware
(CPU, memory, hard drive, etc...) so test your hardware. Any machine
going into production as a database server should be heavily tested to
ensure that it has good hardware. No database management program can be
expected to overcome broken hardware or OSes. Good tools for testing
are memtest86 and doing a fdisk with the badblocks option (in linux, not
sure what the name is in bsd, but I'm sure it has some kind of block
tester in there somewhere.)
You can also write your own scripts to test a drive by writing the same
semi-random byte sequence to the drive, filling it up, then reading it
back and comparing them. All zeros and all ones is a good test, and
there are patterns that tend to show problems. Generally, most drives
that have problems will show them rather quickly in testing, with bad
blocks flying by by the hundreds. But sometimes, it's just one block
causing a problem.
> Problem #2: I'd like to find all rows in B that point to non-existent rows in U. I can do it with the following
sub-select,I believe, but it's rather inefficient (EXPLAIN shows both tables would be sequentially scanned):
>
> SELECT * FROM bookmark WHERE id NOT IN (SELECT b.id FROM bookmark b, url u WHERE b.url_id=u.id);
Generally a left join with not null in the where clause is more
efficient:
select a.id from table1 a left join table2 b on (a.id=b.aid) where b.aid
is null;
will show you all the rows in table1 that have no match in table2