Thread: bulk insert performance problem
Hi,

I have a performance problem with a script that does massive bulk inserts into 6 tables. When the script starts, the performance is really good, but it degrades minute after minute and the script takes almost a day to finish!

I have tried almost everything suggested on this list: changed our external RAID array from RAID 5 to RAID 10, tweaked postgresql.conf to the best of my knowledge, moved pg_xlog to a different array, dropped the tables before running the script. But the performance gain was negligible even after all these changes...

IMHO the hardware we use should be up to the task: Dell PowerEdge 6850, 4 x 3.0 GHz dual-core Xeon, 8 GB RAM, 3 x 300 GB SAS 10K in RAID 5 for / and 6 x 300 GB SAS 10K (MD1000) in RAID 10 for the PG data. The data filesystem is ext3 mounted with noatime and data=writeback. Running openSUSE 10.3 with PostgreSQL 8.2.7. The server is dedicated to PostgreSQL...

We tested the same script and schema with Oracle 10g on the same machine and it took only 2.5 hours to complete! What I don't understand is that with Oracle the performance seems consistent throughout, but with PG it deteriorates over time...

Any idea? Are there any other improvements I could make?

Thanks

Christian
Christian Bourque wrote:
> Hi,
>
> I have a performance problem with a script that does massive bulk
> insert in 6 tables. When the script starts the performance is really
> good but will degrade minute after minute and take almost a day to
> finish!

Would I be correct in guessing that there are foreign key relationships between those tables, and that there are significant numbers of indexes in use?

The foreign key checking costs will go up as the tables grow, and AFAIK the indexes get a bit more expensive to maintain too.

If possible you should probably drop your foreign key relationships and drop your indexes, insert your data, then re-create the indexes and foreign keys. The foreign keys will be rechecked when you recreate them, and it's *vastly* faster to do it that way. Similarly, building an index from scratch is quite a bit faster than progressively adding to it. Of course, dropping the indices is only useful if you aren't querying the tables as you build them.

Also, if you're loading data using stored procedures you should avoid the use of exception blocks. I had some major problems with my bulk data conversion code due to overuse of exception blocks creating large numbers of subtransactions behind the scenes and slowing everything to a crawl.

--
Craig Ringer
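A minimal sketch of the drop-then-recreate approach described above. The table, index, and constraint names (orders, customers, orders_customer_idx, orders_customer_fk) are hypothetical placeholders, not anything from the original schema:

    BEGIN;

    -- Drop the foreign key and the index before the bulk load
    -- (names are hypothetical).
    ALTER TABLE orders DROP CONSTRAINT orders_customer_fk;
    DROP INDEX orders_customer_idx;

    -- ... bulk load into orders here (COPY or batched INSERTs) ...

    -- Rebuild the index from scratch and re-check the foreign key once,
    -- instead of maintaining both row by row during the load.
    CREATE INDEX orders_customer_idx ON orders (customer_id);
    ALTER TABLE orders ADD CONSTRAINT orders_customer_fk
        FOREIGN KEY (customer_id) REFERENCES customers (id);

    COMMIT;

The point is that the index build and the foreign key re-check each happen once over the full data set, rather than once per inserted row.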
Craig Ringer wrote:
> Christian Bourque wrote:
>> Hi,
>>
>> I have a performance problem with a script that does massive bulk
>> insert in 6 tables. When the script starts the performance is really
>> good but will degrade minute after minute and take almost a day to
>> finish!
>>
> Would I be correct in guessing that there are foreign key relationships
> between those tables, and that there are significant numbers of indexes
> in use?
>
> The foreign key checking costs will go up as the tables grow, and AFAIK
> the indexes get a bit more expensive to maintain too.
>
> If possible you should probably drop your foreign key relationships and
> drop your indexes, insert your data, then re-create the indexes and
> foreign keys. The foreign keys will be rechecked when you recreate them,
> and it's *vastly* faster to do it that way. Similarly, building an index
> from scratch is quite a bit faster than progressively adding to it. Of
> course, dropping the indices is only useful if you aren't querying the
> tables as you build them.

If you are, add "analyze" commands through the import, e.g. every 10,000 rows. Then your checks should be a bit faster.

The other suggestion would be to do block commits:

    begin;
    do stuff for 5000 rows;
    commit;

repeat until finished.

--
Postgresql & php tutorials
http://www.designmagick.com/
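A rough sketch of that block-commit pattern with a periodic ANALYZE. The table, columns, and values are hypothetical placeholders:

    -- One block of rows per transaction instead of one row per transaction.
    BEGIN;
    INSERT INTO orders (customer_id, amount) VALUES (1, 10.00);
    INSERT INTO orders (customer_id, amount) VALUES (2, 25.50);
    -- ... roughly 5,000 rows per block ...
    COMMIT;

    -- Every 10,000 rows or so, refresh planner statistics so the
    -- foreign key lookups keep using the indexes as the tables grow.
    ANALYZE orders;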
I use blocks of 10,000 rows; my data has big blobs.
2008-04-08
bitaoxiao
On Mon, Apr 07, 2008 at 11:01:18PM -0400, Christian Bourque wrote:
> I have a performance problem with a script that does massive bulk
> insert in 6 tables. When the script starts the performance is really
> good but will degrade minute after minute and take almost a day to
> finish!

How do you do this bulk insert?

depesz

--
quicksil1er: "postgres is excellent, but like any DB it requires a
highly paid DBA. here's my CV!" :)
http://www.depesz.com/ - a blog for you (and my CV)
Christian Bourque wrote:
>
> Any idea? Is there any other improvements I could do?

Are you using the "COPY" syntax in the import script, or individual INSERT statements? Using COPY will always be *much* faster.

I believe COPY always appends to tables rather than replacing the contents, so you can combine this technique with splitting the task into multiple COPY statements. That has never been necessary in my case, though; switching from INSERTs to a COPY statement always provided the huge performance improvement I needed.

It's easy to confuse "pg_dump -d" with "psql -d"... it's too bad they mean very different things. For pg_dump, "-d" causes INSERT statements to be generated instead of a COPY statement. That's a mistake I've made in the past, because I expected it to work like "psql -d", where "-d" means "database name". I suppose the safe thing to do is to avoid using "-d" altogether!

Mark
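For illustration, the difference looks roughly like this. The table, columns, and file path are hypothetical, and the file must be readable by the server process (from psql, \copy reads the file on the client side instead):

    -- Row-at-a-time INSERTs: one statement per row, with per-statement
    -- overhead for every single row.
    INSERT INTO orders (customer_id, amount) VALUES (1, 10.00);
    INSERT INTO orders (customer_id, amount) VALUES (2, 25.50);

    -- COPY: a single statement streams the whole (tab-delimited by
    -- default) file and appends all rows to the table.
    COPY orders (customer_id, amount) FROM '/tmp/orders.tsv';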
On Tue, 8 Apr 2008, Mark Stosberg wrote:
>> Any idea? Is there any other improvements I could do?
>
> Are you using the "COPY" syntax in the import script or individual
> insert statements? Using COPY will always be *much* faster.

PostgreSQL (at least in recent versions) has an optimisation when you create a table in the same transaction as you load data into it. So, if you have a database dump, load it in using psql -1, which wraps the entire operation in a single transaction. Of course, a COPY dump will load a lot faster than an INSERT dump.

Matthew

--
What goes up must come down. Ask any system administrator.
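A sketch of what that looks like for one table (the table name, columns, and file path are hypothetical). Creating the table and loading it in the same transaction is what enables the optimisation; "psql -1 -f dump.sql yourdb" does the equivalent for a whole dump file:

    BEGIN;

    -- Table created in the same transaction as the load.
    CREATE TABLE orders_import (
        customer_id integer,
        amount      numeric
    );

    -- Because the table is brand new in this transaction, PostgreSQL can
    -- skip most of the WAL overhead for the COPY (assuming WAL archiving
    -- is not enabled).
    COPY orders_import FROM '/tmp/orders.tsv';

    COMMIT;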
> I have a performance problem with a script that does massive bulk
> insert in 6 tables. When the script starts the performance is really
> good but will degrade minute after minute and take almost a day to
> finish!

It looks like the foreign key checks are slowing you down.

- Batch the INSERTs in transactions (1,000-10,000 rows per transaction).
- Run ANALYZE once in a while so the FK checks use indexes.
- Are there any DELETEs in your script which might hit non-indexed REFERENCES... columns and cascade?
- Do you really need to check the FKs on the fly while inserting? i.e. do you handle FK violations? Or is your data already consistent? In that case, load the data without any constraints (and without any indexes), and add the indexes and foreign key constraints after the loading is finished.
- Use COPY instead of INSERT.

If you use your script to process data, perhaps you could import the raw, unprocessed data into a table (with COPY) and process it with SQL, as sketched below. This is usually much faster than doing a zillion inserts.
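A hedged sketch of that staging-table idea; every name and the file path here are hypothetical. COPY the raw data into a bare staging table, transform it with set-based SQL, and only then add indexes and constraints:

    -- 1. Load the raw, unprocessed rows into a staging table with no
    --    indexes and no foreign keys.
    CREATE TABLE staging_orders (
        customer_code text,
        amount_text   text
    );
    COPY staging_orders FROM '/tmp/raw_orders.tsv';

    -- 2. Process and move the data with one set-based statement instead
    --    of a zillion single-row INSERTs.
    INSERT INTO orders (customer_id, amount)
    SELECT c.id, s.amount_text::numeric
    FROM staging_orders s
    JOIN customers c ON c.code = s.customer_code;

    -- 3. Only now create the indexes and foreign key constraints on the
    --    target table.
    CREATE INDEX orders_customer_idx ON orders (customer_id);
    ALTER TABLE orders ADD CONSTRAINT orders_customer_fk
        FOREIGN KEY (customer_id) REFERENCES customers (id);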