Thread: Improve BULK insertion
Hi ! I need to insert 500.000 records on a table frequently. It´s a bulk insertion from my applicatoin. I am with a very poor performance. PostgreSQL insert very fast until the tuple 200.000 and after it the insertion starts to be really slow. I am seeing on the log and there is a lot of transaction logs, something like : 2004-12-04 11:08:59 LOG: recycled transaction log file "0000000600000012" 2004-12-04 11:08:59 LOG: recycled transaction log file "0000000600000013" 2004-12-04 11:08:59 LOG: recycled transaction log file "0000000600000011" 2004-12-04 11:14:04 LOG: recycled transaction log file "0000000600000015" 2004-12-04 11:14:04 LOG: recycled transaction log file "0000000600000014" 2004-12-04 11:19:08 LOG: recycled transaction log file "0000000600000016" 2004-12-04 11:19:08 LOG: recycled transaction log file "0000000600000017" 2004-12-04 11:24:10 LOG: recycled transaction log file "0000000600000018" How can I configure PostgreSQL to have a better performance on this bulk insertions ? I already increased the memory values. My data: Conectiva linux kernel 2.6.9 PostgreSQL 7.4.6 - 1,5gb memory max_connections = 30 shared_buffers = 30000 sort_mem = 32768 vacuum_mem = 32768 max_fsm_pages = 30000 max_fsm_relations = 1500 The other configurations are default. Cheers, Rodrigo Carvalhaes
In the last exciting episode, grupos@carvalhaes.net (Grupos) wrote: > Hi ! > > I need to insert 500.000 records on a table frequently. It´s a bulk > insertion from my applicatoin. > I am with a very poor performance. PostgreSQL insert very fast until > the tuple 200.000 and after it the insertion starts to be really slow. > I am seeing on the log and there is a lot of transaction logs, > something like : > > 2004-12-04 11:08:59 LOG: recycled transaction log file "0000000600000012" > 2004-12-04 11:08:59 LOG: recycled transaction log file "0000000600000013" > 2004-12-04 11:08:59 LOG: recycled transaction log file "0000000600000011" > 2004-12-04 11:14:04 LOG: recycled transaction log file "0000000600000015" > 2004-12-04 11:14:04 LOG: recycled transaction log file "0000000600000014" > 2004-12-04 11:19:08 LOG: recycled transaction log file "0000000600000016" > 2004-12-04 11:19:08 LOG: recycled transaction log file "0000000600000017" > 2004-12-04 11:24:10 LOG: recycled transaction log file "0000000600000018" It is entirely normal for there to be a lot of transaction log file recycling when bulk inserts are taking place; that goes through a lot of transaction logs. > How can I configure PostgreSQL to have a better performance on this > bulk insertions ? I already increased the memory values. Memory is, as likely as not, NOT the issue. Two questions: 1. How are you doing the inserts? Via INSERT statements? Or via COPY statements? What sort of transaction grouping is involved? COPY is way faster than INSERT, and grouping plenty of updates into a single transaction is generally a "win." 2. What is the schema like? Does the table have a foreign key constraint? Does it have a bunch of indices? If there should eventually be lots of indices, it tends to be faster to create the table with none/minimal indices, and add indexes afterwards, as long as your "load" process can be trusted to not break "unique" constraints... If there is some secondary table with a foreign key constraint, and _that_ table is growing, it is possible that a sequential scan is being used to search the secondary table where, if you did an ANALYZE on that table, an index scan would be preferred once it grew to larger size... There isn't a particular reason for PostgreSQL to "hit a wall" upon seeing 200K records; I and coworkers routinely load database dumps that have millions of (sometimes pretty fat) records, and they don't "choke." That's true whether talking about loading things onto my (somewhat wimpy) desktop PC, or a SMP Xeon system with a small RAID array, or higher end stuff involving high end SMP and EMC disk arrays. The latter obviously being orders of magnitude faster than desktop equipment :-). -- (format nil "~S@~S" "cbbrowne" "acm.org") http://www3.sympatico.ca/cbbrowne/unix.html Rules of the Evil Overlord #207. "Employees will have conjugal visit trailers which they may use provided they call in a replacement and sign out on the timesheet. Given this, anyone caught making out in a closet while leaving their station unmonitored will be shot." <http://www.eviloverlord.com/>
<font face="Default Sans Serif, Verdana, Arial, Helvetica, sans-serif" size="2"><div>I do mass inserts daily into PG. Idrop the all indexes except my primary key and then use the COPY FROM command. This usually takes less than 30 seconds. I spend more time waiting for indexes to recreate.<div><br /><br />Patrick Hatcher<br />Macys.Com<br /><div> </div><fontcolor="#990099">-----pgsql-performance-owner@postgresql.org wrote: -----<br /><br /></font>To: pgsql-performance@postgresql.org<br/>From: Christopher Browne <cbbrowne@acm.org><br />Sent by: pgsql-performance-owner@postgresql.org<br/>Date: 2004-12-04 06:48AM<br />Subject: Re: [PERFORM] Improve BULK insertion<br/><br /><font face="monospace" size="2">In the last exciting episode, grupos@carvalhaes.net (Grupos) wrote:<br/>> Hi !<br />><br />> I need to insert 500.000 records on a table frequently. It´s a bulk<br />> insertionfrom my applicatoin.<br />> I am with a very poor performance. PostgreSQL insert very fast until<br />> thetuple 200.000 and after it the insertion starts to be really slow.<br />> I am seeing on the log and there is a lotof transaction logs,<br />> something like :<br />><br />> 2004-12-04 11:08:59 LOG: recycled transaction logfile "0000000600000012"<br />> 2004-12-04 11:08:59 LOG: recycled transaction log file "0000000600000013"<br />>2004-12-04 11:08:59 LOG: recycled transaction log file "0000000600000011"<br />> 2004-12-04 11:14:04 LOG: recycledtransaction log file "0000000600000015"<br />> 2004-12-04 11:14:04 LOG: recycled transaction log file "0000000600000014"<br/>> 2004-12-04 11:19:08 LOG: recycled transaction log file "0000000600000016"<br />> 2004-12-0411:19:08 LOG: recycled transaction log file "0000000600000017"<br />> 2004-12-04 11:24:10 LOG: recycled transactionlog file "0000000600000018"<br /><br />It is entirely normal for there to be a lot of transaction log file<br/>recycling when bulk inserts are taking place; that goes through a lot<br />of transaction logs.<br /><br />>How can I configure PostgreSQL to have a better performance on this<br />> bulk insertions ? I already increasedthe memory values.<br /><br />Memory is, as likely as not, NOT the issue.<br /><br />Two questions:<br /><br />1. How are you doing the inserts? Via INSERT statements? Or<br /> via COPY statements? What sort of transactiongrouping<br /> is involved?<br /><br /> COPY is way faster than INSERT, and grouping plenty of updates<br/> into a single transaction is generally a "win."<br /><br /> 2. What is the schema like? Does the tablehave a foreign key<br /> constraint? Does it have a bunch of indices?<br /><br /> If there should eventuallybe lots of indices, it tends to be<br /> faster to create the table with none/minimal indices, and add<br /> indexes afterwards, as long as your "load" process can be trusted<br /> to not break "unique" constraints...<br/><br /> If there is some secondary table with a foreign key constraint,<br /> and _that_ tableis growing, it is possible that a sequential<br /> scan is being used to search the secondary table where, if you<br/> did an ANALYZE on that table, an index scan would be preferred<br /> once it grew to larger size...<br /><br/>There isn't a particular reason for PostgreSQL to "hit a wall" upon<br />seeing 200K records; I and coworkers routinelyload database dumps<br />that have millions of (sometimes pretty fat) records, and they don't<br />"choke." That'strue whether talking about loading things onto my<br />(somewhat wimpy) desktop PC, or a SMP Xeon system with a smallRAID<br />array, or higher end stuff involving high end SMP and EMC disk arrays.<br />The latter obviously being ordersof magnitude faster than desktop<br />equipment :-).<br />-- <br />(format nil "~S@~S" "cbbrowne" "acm.org")<br /><ahref="http://www3.sympatico.ca/cbbrowne/unix.html" target="blank">http://www3.sympatico.ca/cbbrowne/unix.html</a><br/>Rules of the Evil Overlord #207. "Employees will haveconjugal visit<br />trailers which they may use provided they call in a replacement and<br />sign out on the timesheet.Given this, anyone caught making out in a<br />closet while leaving their station unmonitored will be shot."<br /><<a href="http://www.eviloverlord.com/" target="blank">http://www.eviloverlord.com/</a>><br /><br />---------------------------(endof broadcast)---------------------------<br />TIP 7: don't forget to increase your freespace map settings<br /></font></div></div></font>
Rodrigo, > I need to insert 500.000 records on a table frequently. It´s a bulk > insertion from my applicatoin. > I am with a very poor performance. PostgreSQL insert very fast until the > tuple 200.000 and after it the insertion starts to be really slow. > I am seeing on the log and there is a lot of transaction logs, something In addition to what Chris Browne asked: What's your transaction log setup? Are your database transaction logs on a seperate disk resource? What is checkpoint_segments and checkpoint_timeout set to? -- Josh Berkus Aglio Database Solutions San Francisco
Hi! 1. I am doing the inserts using pg_restore. The dump was created using pg_dump and the standard format (copy statements) 2. See below the table schema. There are only 7 indexes. 3. My transaction log configuration are : checkpoint_segments = 3 and checkpoint_timeout = 300 and my transaction logs are on the same disk . I know that I can increase the performance separating the transaction logs and making a RAID 5 array BUT I am really curious about WHY this performance is so poor and HOW can I try to improve on this actual machine because actualy this inserts are taking around 90 minutes!!! Cheers! Rodrigo dadosadv=# \d si2010 Table "public.si2010" Column | Type | Modifiers ------------+------------------+--------------------------------------------------------------------- i2_filial | character(2) | not null default ' '::bpchar i2_num | character(10) | not null default ' '::bpchar i2_linha | character(2) | not null default ' '::bpchar i2_data | character(8) | not null default ' '::bpchar i2_dc | character(1) | not null default ' '::bpchar i2_debito | character(20) | not null default ' '::bpchar i2_dcd | character(1) | not null default ' '::bpchar i2_credito | character(20) | not null default ' '::bpchar i2_dcc | character(1) | not null default ' '::bpchar i2_moedas | character(5) | not null default ' '::bpchar i2_valor | double precision | not null default 0.0 i2_hp | character(3) | not null default ' '::bpchar i2_hist | character(40) | not null default ' '::bpchar i2_ccd | character(9) | not null default ' '::bpchar i2_ccc | character(9) | not null default ' '::bpchar i2_ativdeb | character(6) | not null default ' '::bpchar i2_ativcrd | character(6) | not null default ' '::bpchar i2_vlmoed2 | double precision | not null default 0.0 i2_vlmoed3 | double precision | not null default 0.0 i2_vlmoed4 | double precision | not null default 0.0 i2_vlmoed5 | double precision | not null default 0.0 i2_dtvenc | character(8) | not null default ' '::bpchar i2_criter | character(4) | not null default ' '::bpchar i2_rotina | character(8) | not null default ' '::bpchar i2_periodo | character(6) | not null default ' '::bpchar i2_listado | character(1) | not null default ' '::bpchar i2_origem | character(40) | not null default ' '::bpchar i2_permat | character(4) | not null default ' '::bpchar i2_filorig | character(2) | not null default ' '::bpchar i2_intercp | character(1) | not null default ' '::bpchar i2_identcp | character(12) | not null default ' '::bpchar i2_lote | character(4) | not null default ' '::bpchar i2_doc | character(6) | not null default ' '::bpchar i2_emporig | character(2) | not null default ' '::bpchar i2_lp | character(3) | not null default ' '::bpchar i2_itemd | character(9) | not null default ' '::bpchar i2_itemc | character(9) | not null default ' '::bpchar i2_prelan | character(1) | not null default ' '::bpchar i2_tipo | character(2) | not null default ' '::bpchar i2_dcc | character(1) | not null default ' '::bpchar i2_moedas | character(5) | not null default ' '::bpchar i2_valor | double precision | not null default 0.0 i2_hp | character(3) | not null default ' '::bpchar i2_hist | character(40) | not null default ' '::bpchar i2_ccd | character(9) | not null default ' '::bpchar i2_ccc | character(9) | not null default ' '::bpchar i2_ativdeb | character(6) | not null default ' '::bpchar i2_ativcrd | character(6) | not null default ' '::bpchar i2_vlmoed2 | double precision | not null default 0.0 i2_vlmoed3 | double precision | not null default 0.0 i2_vlmoed4 | double precision | not null default 0.0 i2_vlmoed5 | double precision | not null default 0.0 i2_dtvenc | character(8) | not null default ' '::bpchar i2_criter | character(4) | not null default ' '::bpchar i2_rotina | character(8) | not null default ' '::bpchar i2_periodo | character(6) | not null default ' '::bpchar i2_listado | character(1) | not null default ' '::bpchar i2_origem | character(40) | not null default ' '::bpchar i2_permat | character(4) | not null default ' '::bpchar i2_filorig | character(2) | not null default ' '::bpchar i2_intercp | character(1) | not null default ' '::bpchar i2_identcp | character(12) | not null default ' '::bpchar i2_lote | character(4) | not null default ' '::bpchar i2_doc | character(6) | not null default ' '::bpchar i2_emporig | character(2) | not null default ' '::bpchar i2_lp | character(3) | not null default ' '::bpchar i2_itemd | character(9) | not null default ' '::bpchar i2_itemc | character(9) | not null default ' '::bpchar i2_prelan | character(1) | not null default ' '::bpchar i2_tipo | character(2) | not null default ' '::bpchar d_e_l_e_t_ | character(1) | not null default ' '::bpchar r_e_c_n_o_ | double precision | not null default 0.0 Indexes: "si2010_pkey" primary key, btree (r_e_c_n_o_) "si20101" btree (i2_filial, i2_num, i2_linha, i2_periodo, r_e_c_n_o_, d_e_l_e_t_) "si20102" btree (i2_filial, i2_periodo, i2_num, i2_linha, r_e_c_n_o_, d_e_l_e_t_) "si20103" btree (i2_filial, i2_data, i2_num, i2_linha, r_e_c_n_o_, d_e_l_e_t_) "si20104" btree (i2_filial, i2_debito, i2_data, i2_num, i2_linha, r_e_c_n_o_, d_e_l_e_t_) "si20105" btree (i2_filial, i2_credito, i2_data, i2_num, i2_linha, r_e_c_n_o_, d_e_l_e_t_) "si20106" btree (i2_filial, i2_doc, i2_periodo, r_e_c_n_o_, d_e_l_e_t_) "si20107" btree (i2_filial, i2_origem, r_e_c_n_o_, d_e_l_e_t_) Christopher Browne wrote: >In the last exciting episode, grupos@carvalhaes.net (Grupos) wrote: > > >>Hi ! >> >>I need to insert 500.000 records on a table frequently. It´s a bulk >>insertion from my applicatoin. >>I am with a very poor performance. PostgreSQL insert very fast until >>the tuple 200.000 and after it the insertion starts to be really slow. >>I am seeing on the log and there is a lot of transaction logs, >>something like : >> >>2004-12-04 11:08:59 LOG: recycled transaction log file "0000000600000012" >>2004-12-04 11:08:59 LOG: recycled transaction log file "0000000600000013" >>2004-12-04 11:08:59 LOG: recycled transaction log file "0000000600000011" >>2004-12-04 11:14:04 LOG: recycled transaction log file "0000000600000015" >>2004-12-04 11:14:04 LOG: recycled transaction log file "0000000600000014" >>2004-12-04 11:19:08 LOG: recycled transaction log file "0000000600000016" >>2004-12-04 11:19:08 LOG: recycled transaction log file "0000000600000017" >>2004-12-04 11:24:10 LOG: recycled transaction log file "0000000600000018" >> >> > >It is entirely normal for there to be a lot of transaction log file >recycling when bulk inserts are taking place; that goes through a lot >of transaction logs. > > > >>How can I configure PostgreSQL to have a better performance on this >>bulk insertions ? I already increased the memory values. >> >> > >Memory is, as likely as not, NOT the issue. > >Two questions: > > 1. How are you doing the inserts? Via INSERT statements? Or > via COPY statements? What sort of transaction grouping > is involved? > > COPY is way faster than INSERT, and grouping plenty of updates > into a single transaction is generally a "win." > > 2. What is the schema like? Does the table have a foreign key > constraint? Does it have a bunch of indices? > > If there should eventually be lots of indices, it tends to be > faster to create the table with none/minimal indices, and add > indexes afterwards, as long as your "load" process can be trusted > to not break "unique" constraints... > > If there is some secondary table with a foreign key constraint, > and _that_ table is growing, it is possible that a sequential > scan is being used to search the secondary table where, if you > did an ANALYZE on that table, an index scan would be preferred > once it grew to larger size... > >There isn't a particular reason for PostgreSQL to "hit a wall" upon >seeing 200K records; I and coworkers routinely load database dumps >that have millions of (sometimes pretty fat) records, and they don't >"choke." That's true whether talking about loading things onto my >(somewhat wimpy) desktop PC, or a SMP Xeon system with a small RAID >array, or higher end stuff involving high end SMP and EMC disk arrays. >The latter obviously being orders of magnitude faster than desktop >equipment :-). > >
Rodrigo -- You should definitely drop the indexes and any other FK constraints before loading and then rebuild them. Check your logsand see if there are warnings about checkpoint intervals -- only 3 logs seems like it might be small; if you have thedisk space I would definitely consider raising the number. If you haven't already posted your config settings you mightdo so -- this seems very slow. I regularly use COPY to load or unload data sets in the 200k-900k range and they don'ttake 90 minutes, even on slower hardware (and usually only a few minutes on our production servers; rebuilding the indexesusually takes longer. This unloading a 300k+ row data set on a dell linux box with not very good disks and 1 gig of RAM: Starting copy of parcel staging table parcels_12031 at Thu Dec 2 01:13:52 2004 Done with staging table copy at Thu Dec 2 01:15:16 2004 ... Starting compression of parcel file at Thu Dec 2 01:15:22 2004 gzip: /tmp/parcels_12031.unl.gz already exists; do you wish to overwrite (y or n )? y Done with compression of parcel file at Thu Dec 2 01:17:23 2004 ... And loading them on a rather faster server: Starting unzip of parcels at Thu Dec 2 01:29:15 2004 Finished with unzip at Thu Dec 2 01:29:22 2004 ... Target db detail table updated at Thu Dec 2 01:29:29 2004 Dropping indexes Dropping fk constraint on tracking id Dropping indexes Done dropping indexes on target parcels table at Thu Dec 2 01:29:30 2004 NOTICE: drop cascades to table f12031.parcel_pins NOTICE: drop cascades to table f12031.parcel_addresses NOTICE: drop cascades to table f12031.parcel_owner_fti NOTICE: drop cascades to table f12031.parcel_owners Removing old parcels entries starting at Thu Dec 2 01:29:30 2004 Done deleting schema and parcels for track_id 10163541 at Thu Dec 2 01:33:04 2004 Starting load of parcels at Thu Dec 2 01:33:04 2004 Done copying data into parcels at Thu Dec 2 01:35:18 2004 Deleting old v_detail reference for track_id 10163541 Done with delete of old v_detail reference Starting creation of foreign key constraint at Thu Dec 2 01:39:43 2004 Done with creation of foreign key constraint at Thu Dec 2 01:42:14 2004 Starting spatial index create at Thu Dec 2 01:42:14 2004 Done creating spatial index at Thu Dec 2 01:55:04 2004 Starting stats on geometry column now Done doing stats for spatial index at Thu Dec 2 02:03:47 2004 Starting index on PIN now Done creating pin index at Thu Dec 2 02:09:36 2004 Starting index on tracking id now Done creating trid index at Thu Dec 2 02:12:35 2004 Starting centroid index now Done creating centroid index at Thu Dec 2 02:24:11 2004 Starting stats on centroid column Done doing stats for spatial index at Thu Dec 2 02:29:55 2004 Doing City/Street Index on parcels table ...Done creating city/street index at Thu Dec 2 02:42:41 2004 with result <-1> Committing changes So this took about 70 minutes to delete 200000+ rows from a table with about 5 million rows, load a new set and reindex them(and do some statistics for spatial geometry). If the table had only this data the indexing would have been *much* faster.These are moderate size columns -- about 2 dozen columns and some spatial data (polygon and point). Both servers haverather more log files than your setup, but I am not familiar enough with postgres to know how much of an impact thatalone will have. The comment about it slowing down part way through a load makes me suspect indexing issues, somehow(not from postgres experience but it rings a bell with other DBs); if you explicitly drop the indexes first and thenload does it show the same performance behavior ? If you are doing the data read from, the database write and the WAL logging all on single disk drive, then I would guessthat that is your bottleneck. If you use vmstat and/or top or the like, is your I/O pegged ? HTH Greg WIlliamson DBA GlobeXplorer LLC -----Original Message----- From: Rodrigo Carvalhaes [mailto:grupos@carvalhaes.net] Sent: Sun 12/5/2004 11:52 AM To: Christopher Browne Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] Improve BULK insertion Hi! 1. I am doing the inserts using pg_restore. The dump was created using pg_dump and the standard format (copy statements) 2. See below the table schema. There are only 7 indexes. 3. My transaction log configuration are : checkpoint_segments = 3 and checkpoint_timeout = 300 and my transaction logs are on the same disk . I know that I can increase the performance separating the transaction logs and making a RAID 5 array BUT I am really curious about WHY this performance is so poor and HOW can I try to improve on this actual machine because actualy this inserts are taking around 90 minutes!!! Cheers! Rodrigo dadosadv=# \d si2010 Table "public.si2010" Column | Type | Modifiers ------------+------------------+--------------------------------------------------------------------- i2_filial | character(2) | not null default ' '::bpchar i2_num | character(10) | not null default ' '::bpchar i2_linha | character(2) | not null default ' '::bpchar i2_data | character(8) | not null default ' '::bpchar i2_dc | character(1) | not null default ' '::bpchar i2_debito | character(20) | not null default ' '::bpchar i2_dcd | character(1) | not null default ' '::bpchar i2_credito | character(20) | not null default ' '::bpchar i2_dcc | character(1) | not null default ' '::bpchar i2_moedas | character(5) | not null default ' '::bpchar i2_valor | double precision | not null default 0.0 i2_hp | character(3) | not null default ' '::bpchar i2_hist | character(40) | not null default ' '::bpchar i2_ccd | character(9) | not null default ' '::bpchar i2_ccc | character(9) | not null default ' '::bpchar i2_ativdeb | character(6) | not null default ' '::bpchar i2_ativcrd | character(6) | not null default ' '::bpchar i2_vlmoed2 | double precision | not null default 0.0 i2_vlmoed3 | double precision | not null default 0.0 i2_vlmoed4 | double precision | not null default 0.0 i2_vlmoed5 | double precision | not null default 0.0 i2_dtvenc | character(8) | not null default ' '::bpchar i2_criter | character(4) | not null default ' '::bpchar i2_rotina | character(8) | not null default ' '::bpchar i2_periodo | character(6) | not null default ' '::bpchar i2_listado | character(1) | not null default ' '::bpchar i2_origem | character(40) | not null default ' '::bpchar i2_permat | character(4) | not null default ' '::bpchar i2_filorig | character(2) | not null default ' '::bpchar i2_intercp | character(1) | not null default ' '::bpchar i2_identcp | character(12) | not null default ' '::bpchar i2_lote | character(4) | not null default ' '::bpchar i2_doc | character(6) | not null default ' '::bpchar i2_emporig | character(2) | not null default ' '::bpchar i2_lp | character(3) | not null default ' '::bpchar i2_itemd | character(9) | not null default ' '::bpchar i2_itemc | character(9) | not null default ' '::bpchar i2_prelan | character(1) | not null default ' '::bpchar i2_tipo | character(2) | not null default ' '::bpchar i2_dcc | character(1) | not null default ' '::bpchar i2_moedas | character(5) | not null default ' '::bpchar i2_valor | double precision | not null default 0.0 i2_hp | character(3) | not null default ' '::bpchar i2_hist | character(40) | not null default ' '::bpchar i2_ccd | character(9) | not null default ' '::bpchar i2_ccc | character(9) | not null default ' '::bpchar i2_ativdeb | character(6) | not null default ' '::bpchar i2_ativcrd | character(6) | not null default ' '::bpchar i2_vlmoed2 | double precision | not null default 0.0 i2_vlmoed3 | double precision | not null default 0.0 i2_vlmoed4 | double precision | not null default 0.0 i2_vlmoed5 | double precision | not null default 0.0 i2_dtvenc | character(8) | not null default ' '::bpchar i2_criter | character(4) | not null default ' '::bpchar i2_rotina | character(8) | not null default ' '::bpchar i2_periodo | character(6) | not null default ' '::bpchar i2_listado | character(1) | not null default ' '::bpchar i2_origem | character(40) | not null default ' '::bpchar i2_permat | character(4) | not null default ' '::bpchar i2_filorig | character(2) | not null default ' '::bpchar i2_intercp | character(1) | not null default ' '::bpchar i2_identcp | character(12) | not null default ' '::bpchar i2_lote | character(4) | not null default ' '::bpchar i2_doc | character(6) | not null default ' '::bpchar i2_emporig | character(2) | not null default ' '::bpchar i2_lp | character(3) | not null default ' '::bpchar i2_itemd | character(9) | not null default ' '::bpchar i2_itemc | character(9) | not null default ' '::bpchar i2_prelan | character(1) | not null default ' '::bpchar i2_tipo | character(2) | not null default ' '::bpchar d_e_l_e_t_ | character(1) | not null default ' '::bpchar r_e_c_n_o_ | double precision | not null default 0.0 Indexes: "si2010_pkey" primary key, btree (r_e_c_n_o_) "si20101" btree (i2_filial, i2_num, i2_linha, i2_periodo, r_e_c_n_o_, d_e_l_e_t_) "si20102" btree (i2_filial, i2_periodo, i2_num, i2_linha, r_e_c_n_o_, d_e_l_e_t_) "si20103" btree (i2_filial, i2_data, i2_num, i2_linha, r_e_c_n_o_, d_e_l_e_t_) "si20104" btree (i2_filial, i2_debito, i2_data, i2_num, i2_linha, r_e_c_n_o_, d_e_l_e_t_) "si20105" btree (i2_filial, i2_credito, i2_data, i2_num, i2_linha, r_e_c_n_o_, d_e_l_e_t_) "si20106" btree (i2_filial, i2_doc, i2_periodo, r_e_c_n_o_, d_e_l_e_t_) "si20107" btree (i2_filial, i2_origem, r_e_c_n_o_, d_e_l_e_t_) Christopher Browne wrote: >In the last exciting episode, grupos@carvalhaes.net (Grupos) wrote: > > >>Hi ! >> >>I need to insert 500.000 records on a table frequently. It´s a bulk >>insertion from my applicatoin. >>I am with a very poor performance. PostgreSQL insert very fast until >>the tuple 200.000 and after it the insertion starts to be really slow. >>I am seeing on the log and there is a lot of transaction logs, >>something like : >> >>2004-12-04 11:08:59 LOG: recycled transaction log file "0000000600000012" >>2004-12-04 11:08:59 LOG: recycled transaction log file "0000000600000013" >>2004-12-04 11:08:59 LOG: recycled transaction log file "0000000600000011" >>2004-12-04 11:14:04 LOG: recycled transaction log file "0000000600000015" >>2004-12-04 11:14:04 LOG: recycled transaction log file "0000000600000014" >>2004-12-04 11:19:08 LOG: recycled transaction log file "0000000600000016" >>2004-12-04 11:19:08 LOG: recycled transaction log file "0000000600000017" >>2004-12-04 11:24:10 LOG: recycled transaction log file "0000000600000018" >> >> > >It is entirely normal for there to be a lot of transaction log file >recycling when bulk inserts are taking place; that goes through a lot >of transaction logs. > > > >>How can I configure PostgreSQL to have a better performance on this >>bulk insertions ? I already increased the memory values. >> >> > >Memory is, as likely as not, NOT the issue. > >Two questions: > > 1. How are you doing the inserts? Via INSERT statements? Or > via COPY statements? What sort of transaction grouping > is involved? > > COPY is way faster than INSERT, and grouping plenty of updates > into a single transaction is generally a "win." > > 2. What is the schema like? Does the table have a foreign key > constraint? Does it have a bunch of indices? > > If there should eventually be lots of indices, it tends to be > faster to create the table with none/minimal indices, and add > indexes afterwards, as long as your "load" process can be trusted > to not break "unique" constraints... > > If there is some secondary table with a foreign key constraint, > and _that_ table is growing, it is possible that a sequential > scan is being used to search the secondary table where, if you > did an ANALYZE on that table, an index scan would be preferred > once it grew to larger size... > >There isn't a particular reason for PostgreSQL to "hit a wall" upon >seeing 200K records; I and coworkers routinely load database dumps >that have millions of (sometimes pretty fat) records, and they don't >"choke." That's true whether talking about loading things onto my >(somewhat wimpy) desktop PC, or a SMP Xeon system with a small RAID >array, or higher end stuff involving high end SMP and EMC disk arrays. >The latter obviously being orders of magnitude faster than desktop >equipment :-). > > ---------------------------(end of broadcast)--------------------------- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
Rodrigo, > 3. My transaction log configuration are : checkpoint_segments = 3 and > checkpoint_timeout = 300 and my transaction logs are on the same disk . Well, you need to move your transaction logs to another disk, and increase them to a large number ... like 128, which is about 1GB (you'll need that much disk space). Also, increase the checkpoint_timeout to minimize checkpointing during the load; like, 1500. > I know that I can increase the performance separating the transaction > logs and making a RAID 5 array Actually, RAID5, unless you're using > 5 disks, would make things slower. Speeding writes up through RAID would require at least 6 drives, and probably RAID 1+0. > BUT I am really curious about WHY this > performance is so poor and HOW can I try to improve on this actual > machine because actualy this inserts are taking around 90 minutes!!! Are you doing INSERTS and not COPY? If so, are you batching them in transactions? -- Josh Berkus Aglio Database Solutions San Francisco