Performance while loading data and indexing - Mailing list pgsql-general
From | Shridhar Daithankar |
---|---|
Subject | Performance while loading data and indexing |
Date | |
Msg-id | 3D931438.22010.133ADAFA@localhost Whole thread Raw |
Responses |
Re: [HACKERS] Performance while loading data and indexing
Re: Performance while loading data and indexing Re: Performance while loading data and indexing Re: Performance while loading data and indexing |
List | pgsql-general |
Hello all, Some time back I posted a query to build a site with 150GB of database. In last couple of weeks, lots of things were tested at my place and there are some results and again some concerns. This is a long post. Please be patient and read thr. If we win this, I guess we have a good marketing/advocacy case here..;-) First the problems (For those who do not read beyond first page) 1) Database load time from flat file using copy is very high 2) Creating index takes huge amount of time. 3) Any suggsestions for runtime as data load and query will be going in parallel. Now the details. Note that this is a test run only.. Platform:- 4x Xeon2.4GHz/4GB RAM/4x48 SCSI RAID5/72 GB SCSI RedHat7.2/PostgreSQL7.1.3 Database in flat file: 125,000,000 records of around 100 bytes each. Flat file size 12GB Load time: 14581 sec/~8600 rows persec/~ an MB of data per sec. Create unique composite index on 2 char and a timestamp field: 25226 sec. Database size on disk: 26GB Select query: 1.5 sec. for approx. 150 rows. Important postgresql.conf settings sort_mem = 12000 shared_buffers = 24000 fsync=true (Sad but true. Left untouched.. Will that make a difference on SCSI?) wal_buffers = 65536 wal_files = 64 Now the requirements Initial flat data load: 250GB of data. This has gone up since last query. It was 150GB earlier.. Ongoing inserts: 5000/sec. Number of queries: 4800 queries/hour Query response time: 10 sec. Now questions. 1) Instead of copying from a single 12GB data file, will a parallel copy from say 5 files will speed up the things? Couple MB of data per sec. to disk is just not saturating it. It's a RAID 5 setup.. 2) Sort mem.=12K i.e. 94MB, sounds good enough to me. Does this need further addition to improve create index performance? 3) 5K concurrent inserts with an index on, will this need a additional CPU power? Like deploying it on dual RISC CPUs etc? 4) Query performance is not a problem. Though 4.8K queries per sec. expected response time from each query is 10 sec. But my guess is some serius CPU power will be chewed there too.. 5)Will upgrading to 7.2.2/7.3 beta help? All in all, in the test, we didn't see the performance where hardware is saturated to it's limits. So effectively we are not able to get postgresql making use of it. Just pushing WAL and shared buffers does not seem to be the solution. If you guys have any suggestions. let me know. I need them all.. Mysql is almost out because it's creating index for last 17 hours. I don't think it will keep up with 5K inserts per sec. with index. SAP DB is under evaluation too. But postgresql is most favourite as of now because it works. So I need to come up with solutions to problems that will occur in near future.. ;-) TIA.. Bye Shridhar -- Law of Procrastination: Procrastination avoids boredom; one never has the feeling that there is nothing important to do.
pgsql-general by date: