Re: Improve BULK insertion - Mailing list pgsql-performance
From | Rodrigo Carvalhaes |
---|---|
Subject | Re: Improve BULK insertion |
Date | |
Msg-id | 41B366E3.7040109@carvalhaes.net Whole thread Raw |
In response to | Re: Improve BULK insertion (Christopher Browne <cbbrowne@acm.org>) |
Responses |
Re: Improve BULK insertion
|
List | pgsql-performance |
Hi! 1. I am doing the inserts using pg_restore. The dump was created using pg_dump and the standard format (copy statements) 2. See below the table schema. There are only 7 indexes. 3. My transaction log configuration are : checkpoint_segments = 3 and checkpoint_timeout = 300 and my transaction logs are on the same disk . I know that I can increase the performance separating the transaction logs and making a RAID 5 array BUT I am really curious about WHY this performance is so poor and HOW can I try to improve on this actual machine because actualy this inserts are taking around 90 minutes!!! Cheers! Rodrigo dadosadv=# \d si2010 Table "public.si2010" Column | Type | Modifiers ------------+------------------+--------------------------------------------------------------------- i2_filial | character(2) | not null default ' '::bpchar i2_num | character(10) | not null default ' '::bpchar i2_linha | character(2) | not null default ' '::bpchar i2_data | character(8) | not null default ' '::bpchar i2_dc | character(1) | not null default ' '::bpchar i2_debito | character(20) | not null default ' '::bpchar i2_dcd | character(1) | not null default ' '::bpchar i2_credito | character(20) | not null default ' '::bpchar i2_dcc | character(1) | not null default ' '::bpchar i2_moedas | character(5) | not null default ' '::bpchar i2_valor | double precision | not null default 0.0 i2_hp | character(3) | not null default ' '::bpchar i2_hist | character(40) | not null default ' '::bpchar i2_ccd | character(9) | not null default ' '::bpchar i2_ccc | character(9) | not null default ' '::bpchar i2_ativdeb | character(6) | not null default ' '::bpchar i2_ativcrd | character(6) | not null default ' '::bpchar i2_vlmoed2 | double precision | not null default 0.0 i2_vlmoed3 | double precision | not null default 0.0 i2_vlmoed4 | double precision | not null default 0.0 i2_vlmoed5 | double precision | not null default 0.0 i2_dtvenc | character(8) | not null default ' '::bpchar i2_criter | character(4) | not null default ' '::bpchar i2_rotina | character(8) | not null default ' '::bpchar i2_periodo | character(6) | not null default ' '::bpchar i2_listado | character(1) | not null default ' '::bpchar i2_origem | character(40) | not null default ' '::bpchar i2_permat | character(4) | not null default ' '::bpchar i2_filorig | character(2) | not null default ' '::bpchar i2_intercp | character(1) | not null default ' '::bpchar i2_identcp | character(12) | not null default ' '::bpchar i2_lote | character(4) | not null default ' '::bpchar i2_doc | character(6) | not null default ' '::bpchar i2_emporig | character(2) | not null default ' '::bpchar i2_lp | character(3) | not null default ' '::bpchar i2_itemd | character(9) | not null default ' '::bpchar i2_itemc | character(9) | not null default ' '::bpchar i2_prelan | character(1) | not null default ' '::bpchar i2_tipo | character(2) | not null default ' '::bpchar i2_dcc | character(1) | not null default ' '::bpchar i2_moedas | character(5) | not null default ' '::bpchar i2_valor | double precision | not null default 0.0 i2_hp | character(3) | not null default ' '::bpchar i2_hist | character(40) | not null default ' '::bpchar i2_ccd | character(9) | not null default ' '::bpchar i2_ccc | character(9) | not null default ' '::bpchar i2_ativdeb | character(6) | not null default ' '::bpchar i2_ativcrd | character(6) | not null default ' '::bpchar i2_vlmoed2 | double precision | not null default 0.0 i2_vlmoed3 | double precision | not null default 0.0 i2_vlmoed4 | double precision | not null default 0.0 i2_vlmoed5 | double precision | not null default 0.0 i2_dtvenc | character(8) | not null default ' '::bpchar i2_criter | character(4) | not null default ' '::bpchar i2_rotina | character(8) | not null default ' '::bpchar i2_periodo | character(6) | not null default ' '::bpchar i2_listado | character(1) | not null default ' '::bpchar i2_origem | character(40) | not null default ' '::bpchar i2_permat | character(4) | not null default ' '::bpchar i2_filorig | character(2) | not null default ' '::bpchar i2_intercp | character(1) | not null default ' '::bpchar i2_identcp | character(12) | not null default ' '::bpchar i2_lote | character(4) | not null default ' '::bpchar i2_doc | character(6) | not null default ' '::bpchar i2_emporig | character(2) | not null default ' '::bpchar i2_lp | character(3) | not null default ' '::bpchar i2_itemd | character(9) | not null default ' '::bpchar i2_itemc | character(9) | not null default ' '::bpchar i2_prelan | character(1) | not null default ' '::bpchar i2_tipo | character(2) | not null default ' '::bpchar d_e_l_e_t_ | character(1) | not null default ' '::bpchar r_e_c_n_o_ | double precision | not null default 0.0 Indexes: "si2010_pkey" primary key, btree (r_e_c_n_o_) "si20101" btree (i2_filial, i2_num, i2_linha, i2_periodo, r_e_c_n_o_, d_e_l_e_t_) "si20102" btree (i2_filial, i2_periodo, i2_num, i2_linha, r_e_c_n_o_, d_e_l_e_t_) "si20103" btree (i2_filial, i2_data, i2_num, i2_linha, r_e_c_n_o_, d_e_l_e_t_) "si20104" btree (i2_filial, i2_debito, i2_data, i2_num, i2_linha, r_e_c_n_o_, d_e_l_e_t_) "si20105" btree (i2_filial, i2_credito, i2_data, i2_num, i2_linha, r_e_c_n_o_, d_e_l_e_t_) "si20106" btree (i2_filial, i2_doc, i2_periodo, r_e_c_n_o_, d_e_l_e_t_) "si20107" btree (i2_filial, i2_origem, r_e_c_n_o_, d_e_l_e_t_) Christopher Browne wrote: >In the last exciting episode, grupos@carvalhaes.net (Grupos) wrote: > > >>Hi ! >> >>I need to insert 500.000 records on a table frequently. It´s a bulk >>insertion from my applicatoin. >>I am with a very poor performance. PostgreSQL insert very fast until >>the tuple 200.000 and after it the insertion starts to be really slow. >>I am seeing on the log and there is a lot of transaction logs, >>something like : >> >>2004-12-04 11:08:59 LOG: recycled transaction log file "0000000600000012" >>2004-12-04 11:08:59 LOG: recycled transaction log file "0000000600000013" >>2004-12-04 11:08:59 LOG: recycled transaction log file "0000000600000011" >>2004-12-04 11:14:04 LOG: recycled transaction log file "0000000600000015" >>2004-12-04 11:14:04 LOG: recycled transaction log file "0000000600000014" >>2004-12-04 11:19:08 LOG: recycled transaction log file "0000000600000016" >>2004-12-04 11:19:08 LOG: recycled transaction log file "0000000600000017" >>2004-12-04 11:24:10 LOG: recycled transaction log file "0000000600000018" >> >> > >It is entirely normal for there to be a lot of transaction log file >recycling when bulk inserts are taking place; that goes through a lot >of transaction logs. > > > >>How can I configure PostgreSQL to have a better performance on this >>bulk insertions ? I already increased the memory values. >> >> > >Memory is, as likely as not, NOT the issue. > >Two questions: > > 1. How are you doing the inserts? Via INSERT statements? Or > via COPY statements? What sort of transaction grouping > is involved? > > COPY is way faster than INSERT, and grouping plenty of updates > into a single transaction is generally a "win." > > 2. What is the schema like? Does the table have a foreign key > constraint? Does it have a bunch of indices? > > If there should eventually be lots of indices, it tends to be > faster to create the table with none/minimal indices, and add > indexes afterwards, as long as your "load" process can be trusted > to not break "unique" constraints... > > If there is some secondary table with a foreign key constraint, > and _that_ table is growing, it is possible that a sequential > scan is being used to search the secondary table where, if you > did an ANALYZE on that table, an index scan would be preferred > once it grew to larger size... > >There isn't a particular reason for PostgreSQL to "hit a wall" upon >seeing 200K records; I and coworkers routinely load database dumps >that have millions of (sometimes pretty fat) records, and they don't >"choke." That's true whether talking about loading things onto my >(somewhat wimpy) desktop PC, or a SMP Xeon system with a small RAID >array, or higher end stuff involving high end SMP and EMC disk arrays. >The latter obviously being orders of magnitude faster than desktop >equipment :-). > >
pgsql-performance by date: