Re: Improve BULK insertion - Mailing list pgsql-performance

From Christopher Browne
Subject Re: Improve BULK insertion
Date
Msg-id m38y8ep1ao.fsf@knuth.knuth.cbbrowne.com
In response to Improve BULK insertion  (Grupos <grupos@carvalhaes.net>)
Responses Re: Improve BULK insertion
List pgsql-performance
In the last exciting episode, grupos@carvalhaes.net (Grupos) wrote:
> Hi !
>
> I need to insert 500,000 records into a table frequently. It's a bulk
> insertion from my application.
> I am getting very poor performance. PostgreSQL inserts very fast until
> about tuple 200,000, and after that the insertion becomes really slow.
> I am seeing a lot of transaction log messages in the log,
> something like:
>
> 2004-12-04 11:08:59 LOG:  recycled transaction log file "0000000600000012"
> 2004-12-04 11:08:59 LOG:  recycled transaction log file "0000000600000013"
> 2004-12-04 11:08:59 LOG:  recycled transaction log file "0000000600000011"
> 2004-12-04 11:14:04 LOG:  recycled transaction log file "0000000600000015"
> 2004-12-04 11:14:04 LOG:  recycled transaction log file "0000000600000014"
> 2004-12-04 11:19:08 LOG:  recycled transaction log file "0000000600000016"
> 2004-12-04 11:19:08 LOG:  recycled transaction log file "0000000600000017"
> 2004-12-04 11:24:10 LOG:  recycled transaction log file "0000000600000018"

It is entirely normal to see a lot of transaction log file recycling
while bulk inserts are taking place; loading that much data generates
a lot of transaction log traffic, so the log segments get recycled
quickly.

> How can I configure PostgreSQL to get better performance on these
> bulk insertions? I have already increased the memory settings.

Memory is, as likely as not, NOT the issue.

Two questions:

 1.  How are you doing the inserts?  Via INSERT statements?  Or
     via COPY statements?  What sort of transaction grouping
     is involved?

     COPY is way faster than INSERT, and grouping plenty of inserts
     into a single transaction is generally a "win" (see the first
     sketch after this list).

 2.  What is the schema like?  Does the table have a foreign key
     constraint?  Does it have a bunch of indices?

     If there should eventually be lots of indices, it tends to be
     faster to create the table with none/minimal indices, and add
     indexes afterwards, as long as your "load" process can be trusted
     to not break "unique" constraints...

     If there is some secondary table with a foreign key constraint,
     and _that_ table is growing, it is possible that a sequential
     scan is being used to search the secondary table where, once it
     has grown larger, an ANALYZE on that table would let an index
     scan be preferred instead (see the second sketch after this
     list).
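
To make point 1 concrete, here is a minimal sketch; the table, its
columns, and the data file path are made up for illustration, so
adjust them to your schema:

     -- One transaction per INSERT (autocommit): painful for 500,000 rows.
     INSERT INTO mytable (id, val) VALUES (1, 'a');
     INSERT INTO mytable (id, val) VALUES (2, 'b');
     -- ... one commit per row ...

     -- Better: group many INSERTs into a single transaction.
     BEGIN;
     INSERT INTO mytable (id, val) VALUES (1, 'a');
     INSERT INTO mytable (id, val) VALUES (2, 'b');
     -- ... thousands more ...
     COMMIT;

     -- Best: a single COPY from a flat file (tab-delimited by default).
     COPY mytable FROM '/path/to/data.txt';

From an application, the same effect comes from whatever COPY support
your client interface offers (psql has \copy, for instance) rather
than issuing row-at-a-time INSERTs.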
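
And for point 2, a sketch of the load-first, index-afterwards pattern,
again with made-up names:

     -- Load into a table that has no (or minimal) indices ...
     COPY mytable FROM '/path/to/data.txt';

     -- ... and build the indices in one pass each afterwards.
     CREATE UNIQUE INDEX mytable_id_idx ON mytable (id);
     CREATE INDEX mytable_val_idx ON mytable (val);

     -- If a foreign key check has to search some secondary table that
     -- is also growing, ANALYZE that table so the planner prefers an
     -- index scan over a sequential scan.
     ANALYZE secondary_table;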

There isn't any particular reason for PostgreSQL to "hit a wall" upon
seeing 200K records; my coworkers and I routinely load database dumps
that have millions of (sometimes pretty fat) records, and they don't
"choke."  That's true whether we're loading onto my (somewhat wimpy)
desktop PC, an SMP Xeon system with a small RAID array, or higher-end
stuff involving big SMP machines and EMC disk arrays.  The latter are,
obviously, orders of magnitude faster than desktop equipment :-).
--
(format nil "~S@~S" "cbbrowne" "acm.org")
http://www3.sympatico.ca/cbbrowne/unix.html
Rules of the  Evil Overlord #207. "Employees will  have conjugal visit
trailers which  they may use provided  they call in  a replacement and
sign out on  the timesheet. Given this, anyone caught  making out in a
closet  while  leaving  their   station  unmonitored  will  be  shot."
<http://www.eviloverlord.com/>
