Re: High update activity, PostgreSQL vs BigDBMS - Mailing list pgsql-performance
From | Dave Cramer
---|---
Subject | Re: High update activity, PostgreSQL vs BigDBMS
Date |
Msg-id | BB4FC28F-BDC1-424F-B330-EE9204404F1A@fastcrypt.com
In response to | Re: High update activity, PostgreSQL vs BigDBMS (Guy Rouillier <guyr-ml1@burntmail.com>)
Responses | Re: High update activity, PostgreSQL vs BigDBMS
List | pgsql-performance
On 5-Jan-07, at 9:51 PM, Guy Rouillier wrote:

> I've got back access to my test system. I ran another test run with
> the same input data set. This time I put pg_xlog on a different RAID
> volume (the unused one that I suspect is a software RAID), and I
> turned fsync=off in postgresql.conf. I left the rest of the
> configuration alone (all foreign keys removed, etc.). Unfortunately,
> this only dropped elapsed time down to about 28000 seconds (from
> 30000), still significantly more than BigDBMS. Additional info
> inline below.
>
> Shoaib Mir wrote:
>> Here are my few recommendations that might help you:
>>
>> - You will need to do table partitioning
>> (http://www.postgresql.org/docs/current/static/ddl-partitioning.html),
>> as you are storing quite a lot of data in one table per day.
>
> I'm focusing on the detailed perspective for now. The 144 files I'm
> processing represent not even two hours of data, so that surely
> wouldn't be split up.
>
>> - You are using a RAID5 setup, which is something that can also
>> affect performance, so switching to RAID1 might help you there; but
>> then again, you have a RAID5 with 12 disks, so that shouldn't be
>> that much of a problem.
>
> Agreed.
>
>> - Have you done the tuning for postgresql.conf parameters? If not,
>> then you really need to do this for checkpoint segments, random
>> page cost, shared buffers, cache size, fsm pages, vacuum cost
>> delay, work_mem, bgwriter, etc. You can get good advice for tuning
>> these parameters at http://www.powerpostgresql.com/PerfList/
>
> The box has 3 GB of memory. I would think that BigDBMS would be hurt
> by this more than PG. Here are the settings I've modified in
> postgresql.conf:

As I said, you need to set shared_buffers to at least 750MB; that is
the starting point, and it can actually go higher. Additionally,
effective_cache_size should be set to 2.25 GB (see the postgresql.conf
sketch at the end of this message). Turning fsync off is not a
real-world situation. Additional tuning of the file systems can
provide some gain; however, as Craig pointed out, some queries may
need to be tweaked.

> autovacuum = on
> stats_row_level = on
> max_connections = 10
> listen_addresses = 'db01,localhost'
> shared_buffers = 128MB
> work_mem = 16MB
> maintenance_work_mem = 64MB
> temp_buffers = 32MB
> max_fsm_pages = 204800
> checkpoint_segments = 30
> redirect_stderr = on
> log_line_prefix = '%t %d'
>
>> - For autovacuuming you need to properly tune the thresholds so
>> that vacuum and analyze are done at the right time without
>> affecting database server performance. (You can find help for this
>> at http://www.postgresql.org/docs/current/static/routine-vacuuming.html
>> under "22.1.4. The auto-vacuum daemon".)
>
> The real-life load on this database would be fairly constant
> throughout the day. Stats from network devices are received every
> 15 minutes from each device, but they are staggered. As a result,
> the database is almost constantly being updated, so there is no
> dead time to do vacuums.
>
>> - You will need to separate your transaction logs, i.e. the pg_xlog
>> folder, onto a drive other than your database server drive. This
>> can be done by creating symlinks for the pg_xlog folder.
>
> Done, see opening remarks. Unfortunately minor impact.
>
>> - I hope you are doing proper connection pool management; good use
>> of database connections can really affect overall performance, as
>> connections can be expensive to create and consume memory if they
>> are not properly exited.
> I probably should have mentioned this originally but was afraid of
> information overload. The application runs on JBoss and uses JBoss
> connection pools. So connections are pooled, but I don't know how
> they would compare to native PG connection pools. Essentially, JBoss
> gets native JDBC connections, and the pools simply allow them to be
> re-used without opening and closing each time. So if the native PG
> connection pools provide any pooling optimizations beyond that,
> those advantages are not being realized.

The PG connection pools will not help; they do not currently provide
any extra optimization (see the pooling sketch at the end of this
message).

Dave

>> Hope that helps your tests...
>
> Thanks to everyone for providing suggestions, and I apologize for my
> delay in responding to each of them.
>
>> ----------------
>> Shoaib Mir
>> EnterpriseDB (www.enterprisedb.com)
>>
>> On 12/28/06, Guy Rouillier <guyr-ml1@burntmail.com> wrote:
>>
>> I don't want to violate any license agreement by discussing
>> performance, so I'll refer to a large, commercial
>> PostgreSQL-compatible DBMS only as BigDBMS here.
>>
>> I'm trying to convince my employer to replace BigDBMS with
>> PostgreSQL for at least some of our Java applications. As a proof
>> of concept, I started with a high-volume (but conceptually simple)
>> network data collection application. This application collects
>> files of 5-minute usage statistics from our network devices, and
>> stores a raw form of these stats into one table and a normalized
>> form into a second table. We are currently storing about 12 million
>> rows a day in the normalized table, and each month we start new
>> tables. For the normalized data, the app inserts rows initialized
>> to zero for the entire current day first thing in the morning, then
>> throughout the day, as stats are received, executes updates against
>> existing rows. So the app has very high update activity.
>>
>> In my test environment, I have a dual-x86 Linux platform running
>> the application, and an old 4-CPU Sun Enterprise 4500 running
>> BigDBMS and PostgreSQL 8.2.0 (only one at a time). The Sun box has
>> 4 disk arrays attached, each with 12 SCSI hard disks (a D1000 and
>> 3 A1000, for those familiar with these devices). The arrays are set
>> up with RAID5. So I'm working with a consistent hardware platform
>> for this comparison. I'm only processing a small subset of files
>> (144).
>>
>> BigDBMS processed this set of data in 20000 seconds, with all
>> foreign keys in place. With all foreign keys in place, PG took
>> 54000 seconds to complete the same job. I've tried various
>> approaches to autovacuum (none, 30 seconds) and it doesn't seem to
>> make much difference. What does seem to make a difference is
>> eliminating all the foreign keys; in that configuration, PG takes
>> about 30000 seconds. Better, but BigDBMS still has it beat
>> significantly.
>>
>> I've got PG configured so that the system database is on disk
>> array 2, as are the transaction log files. The default tablespace
>> for the test database is disk array 3. I've got all the reference
>> tables (the tables to which the foreign keys in the stats tables
>> refer) on this array. I also store the stats tables on this array.
>> Finally, I put the indexes for the stats tables on disk array 4. I
>> don't use disk array 1 because I believe it is a software array.
>> I'm out of ideas how to improve this picture any further. I'd
>> appreciate some suggestions. Thanks.
>>
>> --
>> Guy Rouillier
>
> --
> Guy Rouillier
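For reference, a minimal sketch of the postgresql.conf deltas
recommended above for this 3 GB box, relative to the settings Guy
posted. The values are the suggested starting points from this thread,
not measured optima; 8.2 takes integer memory units, so 2.25 GB is
written here as 2304MB:

    shared_buffers = 750MB          # suggested floor; can go higher
    effective_cache_size = 2304MB   # roughly 2.25 GB on a 3 GB machine
    fsync = on                      # leave at the default; fsync=off is not realistic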
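And a short sketch of the connection-pooling point: under JBoss the
application borrows an already-open JDBC connection from the pool, and
close() merely returns it for re-use, so a PG-specific pool layered on
top would add nothing. The JNDI name "java:/PostgresDS" is
hypothetical; the real binding would come from the JBoss datasource
descriptor:

    import java.sql.Connection;
    import javax.naming.InitialContext;
    import javax.sql.DataSource;

    public class PooledUpdate {
        public static void main(String[] args) throws Exception {
            // Hypothetical JNDI name, assumed bound by a JBoss *-ds.xml descriptor.
            DataSource ds = (DataSource) new InitialContext().lookup("java:/PostgresDS");

            Connection conn = ds.getConnection();  // borrowed from the pool, already open
            try {
                // ... run the stats inserts/updates here ...
            } finally {
                conn.close();  // returns the physical connection to the pool; not torn down
            }
        }
    }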