Adding a foreign key constraint is extremely slow - Mailing list pgsql-performance

From bsamwel@xs4all.nl
Subject Adding a foreign key constraint is extremely slow
Date
Msg-id 20505.194.109.187.67.1048442304.squirrel@webmail.xs4all.nl
Whole thread Raw
Responses Re: Adding a foreign key constraint is extremely slow  (Stephan Szabo <sszabo@megazone23.bigpanda.com>)
Re: Adding a foreign key constraint is extremely slow  (Greg Stark <gsstark@mit.edu>)
List pgsql-performance
Hi guys,

I'm having another performance problem as well. I have two tables called
"wwwlog" (about 100 bytes per row, 1.2 million records) and table called
"triples" (about 20 bytes per row, 0.9 million records). Triples contains
an integer foreign key to wwwlog, but it was not marked as a foreign key
at the point of table creation. Now, when I do:

alter table triples add foreign key(id1) references wwwlog(id);

PostgreSQL starts doing heavy work for at least one and a half hour, and I
broke it off at that. It is not possible to "explain" a statement like
this! Probably what it does is that it will check the foreign key
constraint for every field in the table. This will make it completely
impossible to load my data, because:

(1) I cannot set the foreign key constraints BEFORE loading the 0.9
million records, because that would cause the checks to take place during
loading.
(2) I cannot set the foreign key constraints AFTER loading the 0.9 million
records because I've got no clue at all how long this operation is going
to take.
(3) Table "triples" contains two more foreign keys to the same wwwlog key.
This means I've got to do the same thing two more times after the first
one is finished.

I find this behaviour very annoying, because it is possible to optimize a
check like this very well, for instance by creating a temporary data set
containing the union of all foreign keys and all primary keys of the
original table, augmented with an extra field "pri" which is 1 if the
record comes from the primary keys and 0 otherwise. Say this data is
contained in a temporary table called "t" with columns "key" and "pri" for
the data. One would then be able to do the check like this:

NOT EXISTS(
  SELECT key,sum(pri)
  FROM t
  GROUP BY key
  HAVING sum(pri) = 0
);

This means that there must not exist a group of "key" values that does not
have a primary key somewhere in the set. This query is extremely easy to
execute and would be done in a few seconds.

Does anyone know of a way of adding foreign key constraints faster in
PostgreSQL? Or, if there is no solution, do you guys know of any reasons
why a solution like the one I described above would or would not work, and
could or could not be built into PostgreSQL at some point?

Regards,
Bart


pgsql-performance by date:

Previous
From: bsamwel@xs4all.nl
Date:
Subject: Slow update of indexed column with many nulls
Next
From: Stephan Szabo
Date:
Subject: Re: Adding a foreign key constraint is extremely slow