Re: Improve BULK insertion - Mailing list pgsql-performance
From: Patrick Hatcher
Subject: Re: Improve BULK insertion
Msg-id: OFB8DE37C9.38209B63-ON88256F60.00513B9D-88256F60.00513BD6@fds.com
In response to: Improve BULK insertion (Grupos <grupos@carvalhaes.net>)
List: pgsql-performance
<font face="Default Sans Serif, Verdana, Arial, Helvetica, sans-serif" size="2"><div>I do mass inserts daily into PG. Idrop the all indexes except my primary key and then use the COPY FROM command. This usually takes less than 30 seconds. I spend more time waiting for indexes to recreate.<div><br /><br />Patrick Hatcher<br />Macys.Com<br /><div> </div><fontcolor="#990099">-----pgsql-performance-owner@postgresql.org wrote: -----<br /><br /></font>To: pgsql-performance@postgresql.org<br/>From: Christopher Browne <cbbrowne@acm.org><br />Sent by: pgsql-performance-owner@postgresql.org<br/>Date: 2004-12-04 06:48AM<br />Subject: Re: [PERFORM] Improve BULK insertion<br/><br /><font face="monospace" size="2">In the last exciting episode, grupos@carvalhaes.net (Grupos) wrote:<br/>> Hi !<br />><br />> I need to insert 500.000 records on a table frequently. It´s a bulk<br />> insertionfrom my applicatoin.<br />> I am with a very poor performance. PostgreSQL insert very fast until<br />> thetuple 200.000 and after it the insertion starts to be really slow.<br />> I am seeing on the log and there is a lotof transaction logs,<br />> something like :<br />><br />> 2004-12-04 11:08:59 LOG: recycled transaction logfile "0000000600000012"<br />> 2004-12-04 11:08:59 LOG: recycled transaction log file "0000000600000013"<br />>2004-12-04 11:08:59 LOG: recycled transaction log file "0000000600000011"<br />> 2004-12-04 11:14:04 LOG: recycledtransaction log file "0000000600000015"<br />> 2004-12-04 11:14:04 LOG: recycled transaction log file "0000000600000014"<br/>> 2004-12-04 11:19:08 LOG: recycled transaction log file "0000000600000016"<br />> 2004-12-0411:19:08 LOG: recycled transaction log file "0000000600000017"<br />> 2004-12-04 11:24:10 LOG: recycled transactionlog file "0000000600000018"<br /><br />It is entirely normal for there to be a lot of transaction log file<br/>recycling when bulk inserts are taking place; that goes through a lot<br />of transaction logs.<br /><br />>How 
can I configure PostgreSQL to have a better performance on this<br />> bulk insertions ? I already increasedthe memory values.<br /><br />Memory is, as likely as not, NOT the issue.<br /><br />Two questions:<br /><br />1. How are you doing the inserts? Via INSERT statements? Or<br /> via COPY statements? What sort of transactiongrouping<br /> is involved?<br /><br /> COPY is way faster than INSERT, and grouping plenty of updates<br/> into a single transaction is generally a "win."<br /><br /> 2. What is the schema like? Does the tablehave a foreign key<br /> constraint? Does it have a bunch of indices?<br /><br /> If there should eventuallybe lots of indices, it tends to be<br /> faster to create the table with none/minimal indices, and add<br /> indexes afterwards, as long as your "load" process can be trusted<br /> to not break "unique" constraints...<br/><br /> If there is some secondary table with a foreign key constraint,<br /> and _that_ tableis growing, it is possible that a sequential<br /> scan is being used to search the secondary table where, if you<br/> did an ANALYZE on that table, an index scan would be preferred<br /> once it grew to larger size...<br /><br/>There isn't a particular reason for PostgreSQL to "hit a wall" upon<br />seeing 200K records; I and coworkers routinelyload database dumps<br />that have millions of (sometimes pretty fat) records, and they don't<br />"choke." That'strue whether talking about loading things onto my<br />(somewhat wimpy) desktop PC, or a SMP Xeon system with a smallRAID<br />array, or higher end stuff involving high end SMP and EMC disk arrays.<br />The latter obviously being ordersof magnitude faster than desktop<br />equipment :-).<br />-- <br />(format nil "~S@~S" "cbbrowne" "acm.org")<br /><ahref="http://www3.sympatico.ca/cbbrowne/unix.html" target="blank">http://www3.sympatico.ca/cbbrowne/unix.html</a><br/>Rules of the Evil Overlord #207. 
"Employees will haveconjugal visit<br />trailers which they may use provided they call in a replacement and<br />sign out on the timesheet.Given this, anyone caught making out in a<br />closet while leaving their station unmonitored will be shot."<br /><<a href="http://www.eviloverlord.com/" target="blank">http://www.eviloverlord.com/</a>><br /><br />---------------------------(endof broadcast)---------------------------<br />TIP 7: don't forget to increase your freespace map settings<br /></font></div></div></font>