Re: Improve BULK insertion - Mailing list pgsql-performance
| From | Christopher Browne |
|---|---|
| Subject | Re: Improve BULK insertion |
| Date | |
| Msg-id | m38y8ep1ao.fsf@knuth.knuth.cbbrowne.com |
| In response to | Improve BULK insertion (Grupos <grupos@carvalhaes.net>) |
| Responses | Re: Improve BULK insertion |
| List | pgsql-performance |
In the last exciting episode, grupos@carvalhaes.net (Grupos) wrote:

> Hi!
>
> I need to insert 500,000 records into a table frequently. It's a bulk
> insertion from my application, and I am getting very poor performance.
> PostgreSQL inserts very fast up to about tuple 200,000, and after that
> the insertion starts to be really slow. In the log I see a lot of
> transaction log activity, something like:
>
> 2004-12-04 11:08:59 LOG: recycled transaction log file "0000000600000012"
> 2004-12-04 11:08:59 LOG: recycled transaction log file "0000000600000013"
> 2004-12-04 11:08:59 LOG: recycled transaction log file "0000000600000011"
> 2004-12-04 11:14:04 LOG: recycled transaction log file "0000000600000015"
> 2004-12-04 11:14:04 LOG: recycled transaction log file "0000000600000014"
> 2004-12-04 11:19:08 LOG: recycled transaction log file "0000000600000016"
> 2004-12-04 11:19:08 LOG: recycled transaction log file "0000000600000017"
> 2004-12-04 11:24:10 LOG: recycled transaction log file "0000000600000018"

It is entirely normal to see a lot of transaction log file recycling while bulk inserts are taking place; bulk loads go through a lot of transaction logs.

> How can I configure PostgreSQL to have better performance on these
> bulk insertions? I have already increased the memory settings.

Memory is, as likely as not, NOT the issue.

Two questions:

1. How are you doing the inserts? Via INSERT statements, or via COPY? What sort of transaction grouping is involved?

   COPY is far faster than INSERT, and grouping plenty of updates into a single transaction is generally a "win."

2. What is the schema like? Does the table have a foreign key constraint? Does it have a bunch of indices?

   If there should eventually be lots of indices, it tends to be faster to create the table with no (or minimal) indices and add the indexes afterwards, as long as your "load" process can be trusted not to break "unique" constraints...

   If there is some secondary table with a foreign key constraint, and _that_ table is growing, it is possible that a sequential scan is being used to search the secondary table where, if you did an ANALYZE on that table, an index scan would be preferred once it grew larger...

There isn't any particular reason for PostgreSQL to "hit a wall" upon seeing 200K records; my coworkers and I routinely load database dumps that have millions of (sometimes pretty fat) records, and they don't "choke." That's true whether we're loading onto my (somewhat wimpy) desktop PC, an SMP Xeon system with a small RAID array, or higher-end gear involving high-end SMP and EMC disk arrays; the latter is obviously orders of magnitude faster than desktop equipment :-).

--
(format nil "~S@~S" "cbbrowne" "acm.org")
http://www3.sympatico.ca/cbbrowne/unix.html
Rules of the Evil Overlord #207. "Employees will have conjugal visit
trailers which they may use provided they call in a replacement and
sign out on the timesheet. Given this, anyone caught making out in a
closet while leaving their station unmonitored will be shot."
<http://www.eviloverlord.com/>
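To illustrate the first point above, here is a minimal sketch of the two approaches, assuming a hypothetical table bulk_target(id integer, payload text) and a data file /tmp/bulk_data.txt; neither name appears in the original thread.

    -- Fastest path: one COPY inside a single transaction.
    -- (COPY ... FROM a server-side file requires superuser privileges;
    -- from psql, the \copy meta-command reads the file on the client instead.)
    BEGIN;
    COPY bulk_target FROM '/tmp/bulk_data.txt';  -- default text format, tab-delimited
    COMMIT;

    -- If the application must use INSERT, group many rows into each
    -- transaction so that every row does not pay for its own commit.
    BEGIN;
    INSERT INTO bulk_target (id, payload) VALUES (1, 'first row');
    INSERT INTO bulk_target (id, payload) VALUES (2, 'second row');
    -- ... many more INSERTs ...
    COMMIT;

Either way, the design point is the same as in the message: fewer, larger transactions avoid paying a per-row commit cost.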
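And for the second point, a sketch of the load order described above: create the table with no (or minimal) indexes, load the data, build the indexes afterwards, and ANALYZE any growing table involved in a foreign key check so the planner can switch from a sequential scan to an index scan. Again, all object names here are placeholders.

    -- Create the table without indexes, load the data, then index afterwards.
    CREATE TABLE bulk_target (id integer, payload text);
    COPY bulk_target FROM '/tmp/bulk_data.txt';
    CREATE INDEX bulk_target_payload_idx ON bulk_target (payload);
    -- Only safe if the load process is trusted not to violate uniqueness:
    CREATE UNIQUE INDEX bulk_target_id_idx ON bulk_target (id);

    -- If a secondary table involved in a foreign key constraint is itself
    -- growing, refresh its statistics so the FK checks prefer an index scan.
    ANALYZE secondary_table;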