... - Mailing list pgsql-general

From Anton de Wet
Subject ...
Date
Msg-id Pine.LNX.3.96.981019160311.7518G-100000@ra.obsidian.co.za
Whole thread Raw
Responses Re: your mail
List pgsql-general
Hi

I'm playing with postgresql and a few rather large datasets.

My biggest table I'm going to have to work with will be something like:

create table salesitems (storecode char(5), docid char(12),
    stockcode char(7), nrsold int, totalcost float8);

with data like:
BRM2,BRM2-001681,M01207,8,117.89

Now my full file will have 5-10 million rows but I'm playing with test sets of
6000000 rows. Its taking a VERY long time with COPY to get the data
into postgres. To try and find out what is going on, I've broken down the set
into 10000 row blocks and I'm timing the time it takes to import it.

The system is a PentiumII-300 with 96 MB of memory and its running Linux 2.0.35.
I've tried 6.3.2 and 6.4-beta2 with the same results (though 6.4 solves a few
other problems I've had).

I'm running the COPYs from a bash script which runs psql for each copy.

COPYing the blocks give the following times (time given per block, not
total time)

importing part 1 of      62, time : 0:02.35
importing part 2 of      62, time : 0:01.88
importing part 3 of      62, time : 0:01.99
importing part 4 of      62, time : 0:02.67
importing part 5 of      62, time : 0:02.09
importing part 6 of      62, time : 0:02.20
importing part 7 of      62, time : 0:02.35
importing part 8 of      62, time : 0:02.41
importing part 9 of      62, time : 0:02.62
importing part 10 of      62, time : 0:02.79
importing part 11 of      62, time : 0:03.10
importing part 12 of      62, time : 0:03.82
importing part 13 of      62, time : 0:03.98
importing part 14 of      62, time : 0:03.72
importing part 15 of      62, time : 0:03.81
importing part 16 of      62, time : 0:04.01
importing part 17 of      62, time : 0:04.29
importing part 18 of      62, time : 0:04.62
importing part 19 of      62, time : 0:05.35
importing part 20 of      62, time : 0:05.40
importing part 21 of      62, time : 0:05.51
importing part 22 of      62, time : 0:05.92
importing part 23 of      62, time : 0:06.04
importing part 24 of      62, time : 0:06.53
importing part 25 of      62, time : 0:06.79
importing part 26 of      62, time : 0:07.24
importing part 27 of      62, time : 0:07.56
importing part 28 of      62, time : 0:08.56
importing part 29 of      62, time : 0:08.85
importing part 30 of      62, time : 0:09.50
importing part 31 of      62, time : 0:09.50
importing part 32 of      62, time : 0:10.28
importing part 33 of      62, time : 0:10.10
importing part 34 of      62, time : 0:13.45
importing part 35 of      62, time : 0:11.23
importing part 36 of      62, time : 0:11.32
importing part 37 of      62, time : 0:11.65
importing part 38 of      62, time : 0:11.70
importing part 39 of      62, time : 0:11.97
importing part 40 of      62, time : 0:12.67
importing part 41 of      62, time : 0:12.42
importing part 42 of      62, time : 0:12.90
importing part 43 of      62, time : 0:13.26
importing part 44 of      62, time : 0:14.04
importing part 45 of      62, time : 0:13.52
importing part 46 of      62, time : 0:14.31
importing part 47 of      62, time : 0:14.48
importing part 48 of      62, time : 0:14.99
importing part 49 of      62, time : 0:15.22
importing part 50 of      62, time : 0:15.18
importing part 51 of      62, time : 0:16.00
importing part 52 of      62, time : 0:17.46
importing part 53 of      62, time : 0:16.55
importing part 54 of      62, time : 0:17.01
importing part 55 of      62, time : 0:17.30
importing part 56 of      62, time : 0:17.55
importing part 57 of      62, time : 0:19.36
importing part 58 of      62, time : 0:18.14
importing part 59 of      62, time : 0:17.86
importing part 60 of      62, time : 0:18.65
importing part 61 of      62, time : 0:18.95



Or nicely plotted it looks like this:


     20 ++--------+--------+---------+---------+---------+--------+--------++
        +         +        +         +         +         +    A  AA         +
     18 ++                                                     "times"  A  ++
        |                                                 AAAA              |
     16 ++                                               A A               ++
        |                                             AAA                   |
     14 ++                                        A AA                     ++
        |                               A       AA A                        |
     12 ++                                   AAA                           ++
        |                                AAAA                               |
     10 ++                            AA                                   ++
        |                           AA                                      |
        |                         AA                                        |
      8 ++                      AA                                         ++
        |                     AA                                            |
      6 ++               AAAAA                                             ++
        |               AA                                                  |
      4 ++         AAAAA                                                   ++
        |  A   AAAA                                                         |
      2 AAA AAA                                                            ++
        +         +        +         +         +         +        +         +
      0 ++--------+--------+---------+---------+---------+--------+A-------++
        0        10       20        30        40        50       60        70


Apparently a linear increase in time per data set.

This means that importing 1/2 million rows takes  7 minuites, but
importing 5 million takes more than 10 hours!

Is there a way around this? I'm currently creating indexes afterwards (I haven't
even tried creating indexes on the full set of data, will that show similar
linear time increases?


Any help will be appreciated.

Anton





"We must be the change we wish to see in the world."--Gandhi


pgsql-general by date:

Previous
From: Juan Carlos Castro y Castro
Date:
Subject: Access & pgsql
Next
From: Oleg Broytmann
Date:
Subject: Re: [GENERAL] Access & pgsql