Re: Inserting 26 million rows takes 8 hours, how to - Mailing list pgsql-admin

From Jose Vicente Nunez Z
Subject Re: Inserting 26 million rows takes 8 hours, how to
Date
Msg-id 1063726066.5407.46.camel@linux0037
Whole thread Raw
In response to Re: Inserting 26 million rows takes 8 hours, how to improve those times?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Inserting 26 million rows takes 8 hours, how to improve those times?
Re: Inserting 26 million rows takes 8 hours, how to
List pgsql-admin
Tom,


On Tue, 2003-09-16 at 10:44, Tom Lane wrote:
> Jose Vicente Nunez Z <josevnz@newbreak.com> writes:
> > For that i'm using the PostgreSQL copy tool, but the problem is that is
> > taking almost 9 hours to load the data:
>
> Hmm, 889 rows/second doesn't sound that bad, considering that you've
> given us no data about the hardware you're running on.  In particular,
> what's your disk setup?

My fault, i should provide more information about my setup.

I got two ATA disks (one for PostgreSQL, the other for the data files).
The box is a Pentium III 1.3Ghz, dual CPU with 2 GB of RAM. One disk is
30 GB, the other is 40GB.

As you can see is not state of the art harware, but for development
purposes i should get a decent performance (only one or two applications
will be accessing the server to do some data mining).

>
> But at any rate, the only bottlenecks I could think of for a COPY
> command are:
>
> 1. Data conversion.  This would depend on the column datatypes of the
> table, which you didn't tell us.
>
create table mytable
(
     a         varchar(20) not null,
     b         varchar(20) not null,
     c         varchar(20) not null,
     d         char(6),
     f         int null,
     g         float not null,
     h         float not null
)

Data conversions shouldn't be that complicated, isn't it?

> 2. Index updating.  This would depend on the indexes on the table,
> which you didn't tell us.

I had one index and dropped it for the load. Also i turn of the 'fsync'
flag (fsync=false) and now the load time is only 15 minutes!. But then
recreating the original index takes forever:

create  index myindex on mytable ( a, b, c );

For some reason i don't see much I/O but a lot of CPU ussage:

 11:18:05  up 3 days, 20:55,  3 users,  load average: 1.00, 1.00, 0.92
51 processes: 49 sleeping, 2 running, 0 zombie, 0 stopped
CPU0 states:  96.0% user   3.0% system    0.0% nice   0.0% iowait   0.0%
idle
CPU1 states:   0.1% user   0.0% system    0.0% nice   0.0% iowait  99.0%
idle
Mem:  2064404k av, 2049312k used,   15092k free,       0k shrd,  149544k
buff
                    406064k actv, 1076876k in_d,  345612k in_c
Swap: 2047744k av,    4252k used, 2043492k free                 1809160k
cached

       
  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU
COMMAND
 1882 postgres  25   0 41492  40M 36080 R    98.6  2.0  87:34   0
postmaster
16998 root      15   0     0    0     0 SW    1.9  0.0   4:30   0
kjournald
 1956 root      19   0  1152 1152   848 R     0.9  0.0   0:00   1 top
    1 root      15   0   476  448   420 S     0.0  0.0   0:07   1 init
    2 root      RT   0     0    0     0 SW    0.0  0.0   0:00   0
migration/0
    3 root      RT   0     0    0     0 SW    0.0  0.0   0:00   1
migration/1
    4 root      15   0     0    0     0 SW    0.0  0.0   0:00   1
keventd
    5 root      34  19     0    0     0 SWN   0.0  0.0   0:00   0
ksoftirqd_CPU0
    6 root      34  19     0    0     0 SWN   0.0  0.0   0:00   1
ksoftirqd_CPU1
   11 root      19   0     0    0     0 SW    0.0  0.0   0:00   0
bdflush
    7 root      15   0     0    0     0 SW    0.0  0.0   0:22   0 kswapd
    8 root      15   0     0    0     0 SW    0.0  0.0   0:00   1
kscand/DMA
    9 root      15   0     0    0     0 SW    0.0  0.0   1:57   0
kscand/Normal
   10 root      15   0     0    0     0 SW    0.0  0.0   2:28   0
kscand/HighMem
   12 root      15   0     0    0     0 SW    0.0  0.0   0:04   0
kupdated

Also seems to be than memory usage is ok, i don't see any process
blocked or waiting for resources, nor swapped out due lack of memory:

[root@linux0105 data]# vmstat 10 10
   procs                      memory      swap          io
system      cpu
 r  b  w   swpd   free   buff  cache   si   so    bi    bo   in    cs us
sy id
 1  0  0   4252  15092 149564 1809160    0    0    60    43    1    16
2  1 33
 1  0  0   4252  15092 149564 1809160    0    0     0  1820  134    12
49  1 50
 1  0  0   4252  15092 149564 1809160    0    0     0  1282  125     8
49  1 50
 1  0  0   4252  15092 149568 1809160    0    0     0  1202  124     9
50  1 50
 1  0  0   4252  15092 149572 1809160    0    0     0  1216  123     9
50  1 50
 1  0  0   4252  15092 149576 1809160    0    0     0  1224  125     8
50  0 50
 1  0  0   4252  15092 149580 1809160    0    0     0  1212  124    10
50  1 50
 1  0  0   4252  15092 149584 1809160    0    0     0  1206  123     8
50  1 50
 1  0  0   4252  15092 149588 1809160    0    0     0  1212  125     8
49  1 50
 1  0  0   4252  15092 149592 1809160    0    0     0  1208  125     9
49  1 50


Any ideas about what i can do to speed up the Index creation? Why is
using so much CPU (doesn't seem to be writting much information to disk
though).

>
> 3. Foreign-key constraint checking.  This would depend on the foreign
> keys the table has, which you didn't tell us.
>

None so far, is just one table.

> In short ... you'd better provide a lot more detail if you want
> useful commentary.
>
> > I apologize if this not the proper place to post this questions.
>
> pgsql-performance would be more appropriate, probably.

I'm reading some documentation (checking the list archives):

http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html
http://www.varlena.com/varlena/GeneralBits/Tidbits/annotated_conf_e.html

Hopefully this would help me to fix the problem.

Thanks for the help,

JV.

>
>             regards, tom lane
--
Jose Vicente Nunez Zuleta (josevnz at newbreak dot com)
Newbreak LLC System Administrator
http://www.newbreak.com
RHCE, SCJD, SCJP


pgsql-admin by date:

Previous
From: Tom Lane
Date:
Subject: Re: Inserting 26 million rows takes 8 hours, how to improve those times?
Next
From: Tom Lane
Date:
Subject: Re: Inserting 26 million rows takes 8 hours, how to improve those times?