Re: ALTER TABLE speed (adding foreign keys) - Mailing list pgsql-general

From Stephan Szabo
Subject Re: ALTER TABLE speed (adding foreign keys)
Date
Msg-id Pine.BSF.4.21.0102150749170.6101-100000@megazone23.bigpanda.com
Whole thread Raw
In response to ALTER TABLE speed (adding foreign keys)  (Shaw Terwilliger <sterwill@sourcegear.com>)
List pgsql-general
On Thu, 15 Feb 2001, Shaw Terwilliger wrote:

> I have a table, contacts, with two fields, id (int), contactid (int).
> I have another table, users, with a bunch of fields, but one important
> one, id (SERIAL), which is the table key.
>
> I needed to test my database setup (and software) with 1.5 million users,
> each of whom will have (on average), 10 contacts.  So for each user in
> the users table, I'll need ten records in the contacts table.
>
> I wrote a little script to spew some sample data into a file for
> COPY into the tables.  Since 1.5 million * 10 contacts is 15 million
> rows, I created my tables without foreign keys (both id and contactid
> will eventually be foreign keys into users, since I often select on both
> of them) and constraints (CHECK id <> contactid).
>
> The COPY goes much, much faster with these constraints absent from my tables.
> So I go to ALTER TABLE contacts, to add these foreign keys.  One backend
> fires up, allocates a few MB of RAM, and takes a long, long time to check
> all the existing (15 million) rows.  At least, that's what I think it's doing.
>
> postgres   394 98.1  1.6  7752 4300 pts/0    R<   Feb14 632:41 /usr/lib/postgresql/bin/postgres localhost sterwill im
ALTER 
>
> The process has taken approximately 10 hours of CPU time so far.
> I understand placing a foreign key constraint on an existing table can be
> a very expensive operation, but I just wanted to make sure I won't be
> "waiting forever" on this process.  I've got more tables to ALTER, and if
> they all take this long, I'll just wipe the database, load the schema
> again (_with_ the constraints), and just let the COPY take a day or two.

Currently the alter table implementation effectively means you're just
deferring all the trigger calls, so it's basically going to be doing the
15 million trigger calls.  It should finish, but I'm really uncertain of
how long it might take.  I've considered trying to do it as one query
which should be faster but requires a second copy of the logic to figure
out what the fk constraint means so I haven't done it since we're not
quite done with specifying the constraints fully yet.


pgsql-general by date:

Previous
From: Michael Ansley
Date:
Subject: RE: DSN-less connection to Postgres-database
Next
From: Tom Lane
Date:
Subject: Re: regular expression substittion function?