Performance while loading data and indexing - Mailing list pgsql-general

Hello all,

Some time back I posted a query to build a site with 150GB of database. In last
couple of weeks, lots of things were tested at my place and there are some
results and again some concerns.

This is a long post. Please be patient and read thr. If we win this, I guess we
have a good marketing/advocacy  case here..;-)

First the problems (For those who do not read beyond first page)

1) Database load time from flat file using copy is very high
2) Creating index takes huge amount of time.
3) Any suggsestions for runtime as data load and query will be going in
parallel.

Now the details. Note that this is a test run only..

Platform:- 4x Xeon2.4GHz/4GB RAM/4x48 SCSI RAID5/72 GB SCSI
RedHat7.2/PostgreSQL7.1.3

Database in flat file:
125,000,000 records of around 100 bytes each.
Flat file size 12GB

Load time: 14581 sec/~8600 rows persec/~ an MB of data per sec.
Create unique composite index on 2 char and a timestamp field:  25226 sec.
Database size on disk: 26GB
Select query: 1.5 sec. for approx. 150 rows.

Important postgresql.conf settings

sort_mem = 12000
shared_buffers = 24000
fsync=true (Sad but true. Left untouched.. Will that make a difference on
SCSI?)
wal_buffers = 65536
wal_files = 64

Now the requirements

Initial flat data load: 250GB of data. This has gone up since last query. It
was 150GB earlier..
Ongoing inserts: 5000/sec.
Number of queries: 4800 queries/hour
Query response time: 10 sec.


Now questions.

1)  Instead of copying from a single 12GB data file,  will a parallel copy from
say 5 files will speed up the things?

Couple MB of data per sec. to disk is just not saturating it. It's a RAID 5
setup..

2) Sort mem.=12K i.e. 94MB, sounds good enough to me. Does this need further
addition to improve create index performance?

3) 5K concurrent inserts with an index on, will this need a additional CPU
power? Like deploying it on dual RISC CPUs etc?

4) Query performance is not a problem. Though 4.8K queries per sec. expected
response time from each query is 10 sec. But my guess is some serius CPU power
will be chewed there too..

5)Will upgrading to 7.2.2/7.3 beta help?

All in all, in the  test, we didn't see the performance where hardware is
saturated to it's limits. So effectively we are not able to get postgresql
making use of it. Just pushing WAL and shared buffers does not seem to be the
solution.

If you guys have any suggestions. let me know.  I need them all..

Mysql is almost out because it's creating index for last 17 hours. I don't
think it will keep up with 5K inserts per sec. with index. SAP DB is under
evaluation too. But postgresql is most favourite as of now because it works. So
I need to come up with solutions to problems that will occur in near future..
;-)

TIA..

Bye
 Shridhar

--
Law of Procrastination:    Procrastination avoids boredom; one never has    the
feeling that there is nothing important to do.


pgsql-general by date:

Previous
From: "Gaetano Mendola"
Date:
Subject: Re: rotatelog / logrotate with PostgreSQL
Next
From: "Shridhar Daithankar"
Date:
Subject: Re: [HACKERS] Performance while loading data and indexing