Re: Improve BULK insertion - Mailing list pgsql-performance

From Patrick Hatcher
Subject Re: Improve BULK insertion
Date
Msg-id OFB8DE37C9.38209B63-ON88256F60.00513B9D-88256F60.00513BD6@fds.com
In response to Improve BULK insertion  (Grupos <grupos@carvalhaes.net>)
List pgsql-performance
I do mass inserts daily into PG. I drop all the indexes except my primary
key and then use the COPY FROM command. This usually takes less than 30
seconds. I spend more time waiting for indexes to recreate.

Patrick Hatcher
Macys.Com

-----pgsql-performance-owner@postgresql.org wrote: -----

To: pgsql-performance@postgresql.org
From: Christopher Browne <cbbrowne@acm.org>
Sent by: pgsql-performance-owner@postgresql.org
Date: 2004-12-04 06:48AM
Subject: Re: [PERFORM] Improve BULK insertion

In the last exciting episode, grupos@carvalhaes.net (Grupos) wrote:
> Hi!
>
> I need to insert 500,000 records into a table frequently. It's a bulk
> insertion from my application.
> I am getting very poor performance. PostgreSQL inserts very fast until
> tuple 200,000, and after that the insertion starts to be really slow.
> I am looking at the log and there are a lot of transaction logs,
> something like:
>
> 2004-12-04 11:08:59 LOG:  recycled transaction log file "0000000600000012"
> 2004-12-04 11:08:59 LOG:  recycled transaction log file "0000000600000013"
> 2004-12-04 11:08:59 LOG:  recycled transaction log file "0000000600000011"
> 2004-12-04 11:14:04 LOG:  recycled transaction log file "0000000600000015"
> 2004-12-04 11:14:04 LOG:  recycled transaction log file "0000000600000014"
> 2004-12-04 11:19:08 LOG:  recycled transaction log file "0000000600000016"
> 2004-12-04 11:19:08 LOG:  recycled transaction log file "0000000600000017"
> 2004-12-04 11:24:10 LOG:  recycled transaction log file "0000000600000018"

It is entirely normal for there to be a lot of transaction log file
recycling when bulk inserts are taking place; that goes through a lot
of transaction logs.

> How can I configure PostgreSQL to have better performance on these
> bulk insertions? I have already increased the memory values.

Memory is, as likely as not, NOT the issue.

Two questions:

1.  How are you doing the inserts?  Via INSERT statements?  Or
    via COPY statements?  What sort of transaction grouping
    is involved?

    COPY is way faster than INSERT, and grouping plenty of updates
    into a single transaction is generally a "win."

2.  What is the schema like?  Does the table have a foreign key
    constraint?  Does it have a bunch of indices?

    If there should eventually be lots of indices, it tends to be
    faster to create the table with none/minimal indices, and add
    the indexes afterwards, as long as your "load" process can be
    trusted not to break "unique" constraints...

    If there is some secondary table with a foreign key constraint,
    and _that_ table is growing, it is possible that a sequential
    scan is being used to search the secondary table where, if you
    did an ANALYZE on that table, an index scan would be preferred
    once it grew to a larger size...

There isn't a particular reason for PostgreSQL to "hit a wall" upon
seeing 200K records; I and coworkers routinely load database dumps
that have millions of (sometimes pretty fat) records, and they don't
"choke."  That's true whether we're talking about loading things onto
my (somewhat wimpy) desktop PC, an SMP Xeon system with a small RAID
array, or higher-end stuff involving high-end SMP and EMC disk arrays,
the latter obviously being orders of magnitude faster than desktop
equipment :-).
--
(format nil "~S@~S" "cbbrowne" "acm.org")
http://www3.sympatico.ca/cbbrowne/unix.html
Rules of the Evil Overlord #207. "Employees will have conjugal visit
trailers which they may use provided they call in a replacement and
sign out on the timesheet. Given this, anyone caught making out in a
closet while leaving their station unmonitored will be shot."
<http://www.eviloverlord.com/>

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings
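The load pattern both replies converge on — minimal secondary indexes during the load, COPY for the data, rebuild and ANALYZE afterwards — can be sketched as below. The table, index, and file names are hypothetical, and COPY's exact options vary by PostgreSQL version:

```sql
-- Hypothetical table "orders" with one secondary index;
-- the primary key stays in place to catch duplicates.
DROP INDEX orders_customer_idx;

-- One COPY loads the whole file in a single command, avoiding
-- per-row parse, plan, and commit overhead.
COPY orders FROM '/path/to/orders.dat';

-- Rebuild the secondary index once, after the data is in.
CREATE INDEX orders_customer_idx ON orders (customer_id);

-- Refresh planner statistics so the grown table gets index scans
-- (this also helps foreign-key lookups against a growing table,
-- per Browne's second point).
ANALYZE orders;
```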

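Browne's first question, about transaction grouping, matters because each autocommitted INSERT is its own transaction and pays its own commit cost. A minimal sketch, using a hypothetical `items` table:

```sql
-- Slow: each statement is a separate transaction, so every row
-- pays a commit (transaction log flush) of its own.
INSERT INTO items VALUES (1, 'a');
INSERT INTO items VALUES (2, 'b');

-- Better: group many rows into one transaction so the commit
-- cost is paid once per batch instead of once per row.
BEGIN;
INSERT INTO items VALUES (1, 'a');
INSERT INTO items VALUES (2, 'b');
-- ... many more rows ...
COMMIT;
```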