Re: High update activity, PostgreSQL vs BigDBMS - Mailing list pgsql-performance

From Shoaib Mir
Subject Re: High update activity, PostgreSQL vs BigDBMS
Date
Msg-id bf54be870612280106n1247520dtb48e10cb4bdc43e9@mail.gmail.com
In response to High update activity, PostgreSQL vs BigDBMS  (Guy Rouillier <guyr-ml1@burntmail.com>)
List pgsql-performance
Here are a few recommendations that might help you:

- You will need to do table partitioning (http://www.postgresql.org/docs/current/static/ddl-partitioning.html), as you are storing quite a lot of data in one table per day; a rough sketch follows below.
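
For example, a monthly layout like yours could look roughly like this. In 8.2 partitioning is inheritance-based, and the table and column names here are made up for illustration:

    CREATE TABLE stats (
        device_id   integer   NOT NULL,
        ts          timestamp NOT NULL,
        usage_count bigint
    );

    -- One child table per month; the CHECK constraint lets the planner
    -- skip irrelevant partitions once constraint_exclusion is enabled.
    CREATE TABLE stats_200612 (
        CHECK (ts >= '2006-12-01' AND ts < '2007-01-01')
    ) INHERITS (stats);

    CREATE INDEX stats_200612_ts_idx ON stats_200612 (ts);

    -- The application then inserts/updates the current month's child
    -- table directly, much as you already start new tables each month.
    -- And in postgresql.conf:
    --   constraint_exclusion = on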

- You are using a RAID5 setup, which can also hurt performance on a write-heavy workload, so switching to RAID1 might help you there; then again, you have RAID5 across 12 disks, so that shouldn't be that much of a problem.

- Have you tuned the postgresql.conf parameters? If not, you really should, starting with checkpoint_segments, random_page_cost, shared_buffers, effective_cache_size, max_fsm_pages, vacuum_cost_delay, work_mem, and the bgwriter settings. You can find good advice for tuning these parameters at http://www.powerpostgresql.com/PerfList/ (a starting-point excerpt is sketched below).
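
As a very rough starting point on a dedicated box, something like the following in postgresql.conf; the numbers are illustrative guesses, not tuned values, so adjust them to your RAM and workload:

    shared_buffers = 512MB         # a real chunk of RAM on a dedicated server
    effective_cache_size = 1GB     # roughly what the OS will cache on top
    checkpoint_segments = 32       # spread out checkpoint I/O under heavy writes
    work_mem = 16MB                # per sort/hash, so keep it modest
    max_fsm_pages = 2000000        # must cover dead rows between vacuums
    random_page_cost = 3           # lower it if the arrays seek well
    vacuum_cost_delay = 10         # ms; throttles vacuum I/O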

- For autovacuuming you need to tune the thresholds properly so that vacuum and analyze run at the right times without hurting database server performance. (You can find help for this at http://www.postgresql.org/docs/current/static/routine-vacuuming.html under "22.1.4. The auto-vacuum daemon"; a sample configuration is sketched below.)
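
In 8.2 the global autovacuum knobs live in postgresql.conf (per-table overrides go in the pg_autovacuum system catalog); the values below are only a sketch of the kind of thing to experiment with:

    autovacuum = on
    autovacuum_naptime = 1min                # how often to look for work
    autovacuum_vacuum_scale_factor = 0.1     # vacuum after ~10% of a table changes
    autovacuum_vacuum_threshold = 1000       # ...plus this base row count
    autovacuum_analyze_scale_factor = 0.05   # analyze more often than vacuum
    autovacuum_vacuum_cost_delay = 20ms      # throttle vacuum so queries don't suffer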

- You will need to separate your transaction logs, i.e. the pg_xlog directory, onto a different drive from the one holding your database files. In 8.2 this is done by stopping the server, moving pg_xlog, and symlinking it back into the data directory (example below).
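
Something along these lines, with the server stopped first; the paths here are made up for the example:

    pg_ctl stop -D /data/pgsql
    mv /data/pgsql/pg_xlog /logdrive/pg_xlog
    ln -s /logdrive/pg_xlog /data/pgsql/pg_xlog
    pg_ctl start -D /data/pgsql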

- I hope you are doing proper connection pool management, because good use of database connections can really affect overall performance: connections are expensive to create, and they consume memory if they are not closed properly.

Hope that helps your tests...

----------------
Shoaib Mir
EnterpriseDB (www.enterprisedb.com)

On 12/28/06, Guy Rouillier <guyr-ml1@burntmail.com> wrote:
I don't want to violate any license agreement by discussing performance,
so I'll refer to a large, commercial PostgreSQL-compatible DBMS only as
BigDBMS here.

I'm trying to convince my employer to replace BigDBMS with PostgreSQL
for at least some of our Java applications.  As a proof of concept, I
started with a high-volume (but conceptually simple) network data
collection application.  This application collects files of 5-minute
usage statistics from our network devices, and stores a raw form of
these stats into one table and a normalized form into a second table.
We are currently storing about 12 million rows a day in the normalized
table, and each month we start new tables.  For the normalized data, the
app inserts rows initialized to zero for the entire current day first
thing in the morning, then throughout the day as stats are received,
executes updates against existing rows.  So the app has very high update
activity.

In my test environment, I have a dual-x86 Linux platform running the
application, and an old 4-CPU Sun Enterprise 4500 running BigDBMS and
PostgreSQL 8.2.0 (only one at a time.)  The Sun box has 4 disk arrays
attached, each with 12 SCSI hard disks (a D1000 and 3 A1000, for those
familiar with these devices.)  The arrays are set up with RAID5.  So I'm
working with a consistent hardware platform for this comparison.  I'm
only processing a small subset of files (144.)

BigDBMS processed this set of data in 20000 seconds, with all foreign
keys in place.  With all foreign keys in place, PG took 54000 seconds to
complete the same job.  I've tried various approaches to autovacuum
(none, 30-seconds) and it doesn't seem to make much difference.  What
does seem to make a difference is eliminating all the foreign keys; in
that configuration, PG takes about 30000 seconds.  Better, but BigDBMS
still has it beat significantly.

I've got PG configured so that the system database is on disk array
2, as are the transaction log files.  The default table space for the
test database is disk array 3.  I've got all the reference tables (the
tables to which the foreign keys in the stats tables refer) on this
array.  I also store the stats tables on this array.  Finally, I put the
indexes for the stats tables on disk array 4.  I don't use disk array 1
because I believe it is a software array.

I'm out of ideas how to improve this picture any further.  I'd
appreciate some suggestions.  Thanks.

--
Guy Rouillier


