Re: Improve BULK insertion - Mailing list pgsql-performance

From Christopher Browne
Subject Re: Improve BULK insertion
Date
Msg-id m38y8ep1ao.fsf@knuth.knuth.cbbrowne.com
In response to Improve BULK insertion  (Grupos <grupos@carvalhaes.net>)
Responses Re: Improve BULK insertion
List pgsql-performance
In the last exciting episode, grupos@carvalhaes.net (Grupos) wrote:
> Hi !
>
> I need to insert 500,000 records into a table frequently. It's a bulk
> insertion from my application.
> I am getting very poor performance. PostgreSQL inserts very fast until
> about tuple 200,000, and after that the insertion becomes really slow.
> I am seeing a lot of transaction log messages in the log,
> something like:
>
> 2004-12-04 11:08:59 LOG:  recycled transaction log file "0000000600000012"
> 2004-12-04 11:08:59 LOG:  recycled transaction log file "0000000600000013"
> 2004-12-04 11:08:59 LOG:  recycled transaction log file "0000000600000011"
> 2004-12-04 11:14:04 LOG:  recycled transaction log file "0000000600000015"
> 2004-12-04 11:14:04 LOG:  recycled transaction log file "0000000600000014"
> 2004-12-04 11:19:08 LOG:  recycled transaction log file "0000000600000016"
> 2004-12-04 11:19:08 LOG:  recycled transaction log file "0000000600000017"
> 2004-12-04 11:24:10 LOG:  recycled transaction log file "0000000600000018"

It is entirely normal to see a lot of transaction log file recycling
while bulk inserts are taking place; loading that much data generates
a lot of transaction log traffic, so the log segments get recycled
quickly.

> How can I configure PostgreSQL to get better performance on these
> bulk insertions? I have already increased the memory settings.

Memory is, as likely as not, NOT the issue.

Two questions:

 1.  How are you doing the inserts?  Via INSERT statements?  Or
     via COPY statements?  What sort of transaction grouping
     is involved?

     COPY is way faster than INSERT, and grouping plenty of inserts
     into a single transaction is generally a "win" (see the first
     sketch after this list).

 2.  What is the schema like?  Does the table have a foreign key
     constraint?  Does it have a bunch of indices?

     If there should eventually be lots of indices, it tends to be
     faster to create the table with none/minimal indices, and add
     indexes afterwards, as long as your "load" process can be trusted
     to not break "unique" constraints...

     If there is some secondary table with a foreign key constraint,
     and _that_ table is growing, it is possible that a sequential
     scan is being used to search the secondary table where, once it
     has grown larger, an ANALYZE on that table would let an index
     scan be preferred instead (see the second sketch after this
     list).
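
To make point 1 concrete, here is a minimal sketch; the table, its
columns, and the data file path are made up for illustration, so
adjust them to your schema:

     -- One transaction per INSERT (autocommit): painful for 500,000 rows.
     INSERT INTO mytable (id, val) VALUES (1, 'a');
     INSERT INTO mytable (id, val) VALUES (2, 'b');
     -- ... one commit per row ...

     -- Better: group many INSERTs into a single transaction.
     BEGIN;
     INSERT INTO mytable (id, val) VALUES (1, 'a');
     INSERT INTO mytable (id, val) VALUES (2, 'b');
     -- ... thousands more ...
     COMMIT;

     -- Best: a single COPY from a flat file (tab-delimited by default).
     COPY mytable FROM '/path/to/data.txt';

From an application, the same effect comes from whatever COPY support
your client interface offers (psql has \copy, for instance) rather
than issuing row-at-a-time INSERTs.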
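
And for point 2, a sketch of the load-first, index-afterwards pattern,
again with made-up names:

     -- Load into a table that has no (or minimal) indices ...
     COPY mytable FROM '/path/to/data.txt';

     -- ... and build the indices in one pass each afterwards.
     CREATE UNIQUE INDEX mytable_id_idx ON mytable (id);
     CREATE INDEX mytable_val_idx ON mytable (val);

     -- If a foreign key check has to search some secondary table that
     -- is also growing, ANALYZE that table so the planner prefers an
     -- index scan over a sequential scan.
     ANALYZE secondary_table;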

There isn't any particular reason for PostgreSQL to "hit a wall" upon
seeing 200K records; my coworkers and I routinely load database dumps
that have millions of (sometimes pretty fat) records, and they don't
"choke."  That's true whether we're loading onto my (somewhat wimpy)
desktop PC, an SMP Xeon system with a small RAID array, or higher-end
stuff involving big SMP machines and EMC disk arrays.  The latter are,
obviously, orders of magnitude faster than desktop equipment :-).
--
(format nil "~S@~S" "cbbrowne" "acm.org")
http://www3.sympatico.ca/cbbrowne/unix.html
Rules of the  Evil Overlord #207. "Employees will  have conjugal visit
trailers which  they may use provided  they call in  a replacement and
sign out on  the timesheet. Given this, anyone caught  making out in a
closet  while  leaving  their   station  unmonitored  will  be  shot."
<http://www.eviloverlord.com/>
