Thread: Batch Insert Performance

Batch Insert Performance

From
Greg Stark
Date:
I'm trying to seed a database and I'm surprised by how slow it is. I'm doing a
query like:

 INSERT INTO source_other_xref (source_id, other_id) (
   SELECT source_id,1+trunc(1000*random())
     FROM source_table
   );

where source_table has 271,000 records. Table source_other_xref has two
columns both of which are foreign key references, one to source_table and the
other to another table.

It's been running for hours and hours.

The machine doesn't seem to be thrashing its disk writing or reading much. It
seems to be completely cpu-bound.

I'm thinking it's mostly wasting time checking the foreign key constraint. Is
there any equivalent to Oracle's ability to "disable" a constraint and then
reenable it later?

Even with the foreign key constraints I wouldn't have expected inserting 200k
records to take this long.

--
greg

Re: Batch Insert Performance

From
"Peter Alberer"
Date:
Hi greg,

here is an info page that could be useful for you: "Referential
Integrity Tutorial & Hacking" at
http://techdocs.postgresql.org/techdocs/hackingreferentialintegrity.php

But be careful, you have to deal with the system tables directly.

Have you tried selecting the data into a temp table and "copying" that
into the source_other_xref table? Could that be faster?

Ciao, peter



Re: Batch Insert Performance

From
Greg Stark
Date:
Ok, a weird thing happened. I gave up on the batch insert, went to a friend's
house to drink some scotch, came back and tried it again and the whole insert
took 5 minutes. That's _much_ more reasonable.

Now it was late at night and I wasn't necessarily completely sober but I'm
pretty sure _nothing_ had changed. I can only imagine somehow the query was
frozen trying to grab a lock on something. Only nothing else was running and
it was consuming an awful lot of cpu for something trying to acquire a lock.

For what it's worth I tried it with the constraints removed using alter table.
It took 10 seconds to do the entire insert, then a 1m15s to restore each
constraint. (Can I restore them together? would it be faster?) So that's
faster but only by a factor of 2.

--
greg