Re: COPY from question - Mailing list pgsql-hackers

From: Slavisa Garic
Subject: Re: COPY from question
Msg-id: Pine.GSO.4.10.10402051133020.8655-100000@bruce.csse.monash.edu.au
In response to: Re: COPY from question (Kevin Brown <kevin@sysexperts.com>)
List: pgsql-hackers

Hi Kevin,

On Tue, 3 Feb 2004, Kevin Brown wrote:

> Slavisa Garic wrote:
> > Using the pg module in Python I am trying to run the COPY command to
> > populate a large table. I am using this to replace the INSERTs, which
> > take a few hours to add 70000 entries, where COPY takes a minute and a
> > half.
> 
> That difference in speed seems quite large.  Too large.  Are you batching
> your INSERTs into transactions (you should be in order to get good
> performance)?  Do you have a ton of indexes on the table?  Does it have
> triggers on it or some other thing (if so then COPY may well wind up doing
> the wrong thing since the triggers won't fire for the rows it inserts)?
> 
> I don't know what kind of schema you're using, but it takes perhaps a
> couple of hours to insert 2.5 million rows on my system.  But the rows
> in my schema may be much smaller than yours.

You are right about the indexes. There are quite a few of them (5 or 6
without looking at the schema). The problem is that I do need those
indexes, as I run a lot of SELECTs on that table and the inserts only
happen once.
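
As an aside, since the inserts happen only once, one standard trick (which
I have not tried here) is to create the indexes after the bulk load rather
than before it, so the load does not pay the index-maintenance cost row by
row. A minimal sketch, assuming PyGreSQL's classic pg module; the database,
table, and index names are made up:

    import pg

    db = pg.connect(dbname='experiments')   # hypothetical database name
    # ... bulk-load the jobs table here (e.g. via COPY) ...
    # Build the index only once the data is in place, so each loaded row
    # does not have to update the index as it arrives.
    db.query('CREATE INDEX jobs_status_idx ON jobs (status)')
    db.close()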

You are also right about the rows (I think), as I have about 15-20 columns.
This could be split into a few other tables, and it used to be, but I
merged them because of the requirement for faster SELECTs. With the current
schema, most of my modules that access the database no longer need to do
the expensive JOINs they used to. Because faster SELECTs are more important
to me than faster INSERTs, I had to do this. It wasn't a problem until I
started creating experiments with more than 20 thousand jobs, which
translates to 20 thousand rows in this big table.

I do batch INSERTs into one big transaction (1000 rows at a time), roughly
as in the sketch below. While I did get some improvement compared to one
transaction per insert, it was still not fast enough (well, not for me :) ).
Could you please elaborate on the triggers? I have no idea what kinds of
triggers there are in PostgreSQL or in relational databases generally.
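
For reference, the batching looks roughly like this (an untested sketch,
assuming PyGreSQL's classic pg module; the database, table, and column
names are made up):

    import pg

    db = pg.connect(dbname='experiments')            # hypothetical database
    rows = [(i, 'job%d' % i) for i in range(70000)]  # dummy data
    BATCH = 1000

    # One transaction per batch of 1000 rows, rather than one per INSERT.
    for start in range(0, len(rows), BATCH):
        db.query('BEGIN')
        for job_id, name in rows[start:start + BATCH]:
            db.query("INSERT INTO jobs (job_id, name) "
                     "VALUES (%d, '%s')" % (job_id, name))
        db.query('COMMIT')
    db.close()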

With regards to my problem, I did solve it by piping the data into COPY's
stdin. Now I have about 75000 rows inserted in 40 seconds, which is
extremely good for me.
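
In case it helps anyone else, the COPY version looks roughly like this (a
sketch, assuming PyGreSQL's putline()/endcopy() wrappers around libpq's
COPY interface; the names are again made up):

    import pg

    db = pg.connect(dbname='experiments')            # hypothetical database
    rows = [(i, 'job%d' % i) for i in range(75000)]  # dummy data

    db.query('COPY jobs (job_id, name) FROM STDIN')
    for job_id, name in rows:
        # COPY text format: tab-separated columns, newline-terminated rows
        db.putline('%d\t%s\n' % (job_id, name))
    db.putline('\\.\n')                              # end-of-data marker
    db.endcopy()
    db.close()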

Thank you for your help,
Regards,
Slavisa

> -- 
> Kevin Brown                          kevin@sysexperts.com
> 


