Re: High update activity, PostgreSQL vs BigDBMS - Mailing list pgsql-performance
From | Dave Cramer
---|---
Subject | Re: High update activity, PostgreSQL vs BigDBMS
Date |
Msg-id | BB4FC28F-BDC1-424F-B330-EE9204404F1A@fastcrypt.com
In response to | Re: High update activity, PostgreSQL vs BigDBMS (Guy Rouillier <guyr-ml1@burntmail.com>)
Responses | Re: High update activity, PostgreSQL vs BigDBMS
List | pgsql-performance
On 5-Jan-07, at 9:51 PM, Guy Rouillier wrote:

> I've got back access to my test system. I ran another test run with
> the same input data set. This time I put pg_xlog on a different RAID
> volume (the unused one that I suspect is a software RAID), and I
> turned fsync=off in postgresql.conf. I left the rest of the
> configuration alone (all foreign keys removed, etc.). Unfortunately,
> this only dropped elapsed time down to about 28000 seconds (from
> 30000), still significantly more than BigDBMS. Additional info
> inline below.
>
> Shoaib Mir wrote:
>> Here are my few recommendations that might help you:
>>
>> - You will need to do table partitioning
>> (http://www.postgresql.org/docs/current/static/ddl-partitioning.html),
>> as you are storing quite a lot of data in one table per day.
>
> I'm focusing on the detailed perspective for now. The 144 files I'm
> processing represent not even two hours of data, so that surely
> wouldn't be split up.
>
>> - You are using a RAID5 setup, which is something that can also
>> affect performance, so switching to RAID1 might help you there; but
>> then again, you have a RAID5 with 12 disks, so that shouldn't be
>> that much of a problem.
>
> Agreed.
>
>> - Have you done the tuning for postgresql.conf parameters? If not,
>> then you really need to do this for checkpoint segments, random
>> page cost, shared buffers, cache size, fsm pages, vacuum cost
>> delay, work_mem, bgwriter, etc. You can get good advice for tuning
>> these parameters at http://www.powerpostgresql.com/PerfList/
>
> The box has 3 GB of memory. I would think that BigDBMS would be hurt
> by this more than PG. Here are the settings I've modified in
> postgresql.conf:

As I said, you need to set shared_buffers to at least 750MB; that is
the starting point, and it can actually go higher. Additionally,
effective_cache_size should be set to 2.25 GB (see the postgresql.conf
sketch at the end of this message). Turning fsync off is not a
real-world situation. Additional tuning of the file systems can
provide some gain; however, as Craig pointed out, some queries may
need to be tweaked.

> autovacuum = on
> stats_row_level = on
> max_connections = 10
> listen_addresses = 'db01,localhost'
> shared_buffers = 128MB
> work_mem = 16MB
> maintenance_work_mem = 64MB
> temp_buffers = 32MB
> max_fsm_pages = 204800
> checkpoint_segments = 30
> redirect_stderr = on
> log_line_prefix = '%t %d'
>
>> - For autovacuuming you need to properly tune the thresholds so
>> that vacuum and analyze are done at the right time without
>> affecting database server performance. (You can find help for this
>> at http://www.postgresql.org/docs/current/static/routine-vacuuming.html
>> under "22.1.4. The auto-vacuum daemon".)
>
> The real-life load on this database would be fairly constant
> throughout the day. Stats from network devices are received every
> 15 minutes from each device, but they are staggered. As a result,
> the database is almost constantly being updated, so there is no
> dead time to do vacuums.
>
>> - You will need to separate your transaction logs, i.e. the pg_xlog
>> folder, onto a drive other than your database server drive. This
>> can be done by creating symlinks for the pg_xlog folder.
>
> Done, see opening remarks. Unfortunately minor impact.
>
>> - I hope you are doing proper connection pool management; good use
>> of database connections can really affect overall performance, as
>> connections can be expensive to create and consume memory if they
>> are not properly exited.
> I probably should have mentioned this originally but was afraid of
> information overload. The application runs on JBoss and uses JBoss
> connection pools. So connections are pooled, but I don't know how
> they would compare to native PG connection pools. Essentially, JBoss
> gets native JDBC connections, and the pools simply allow them to be
> re-used without opening and closing each time. So if the native PG
> connection pools provide any pooling optimizations beyond that,
> those advantages are not being realized.

The PG connection pools will not help; they do not currently provide
any extra optimization (see the pooling sketch at the end of this
message).

Dave

>> Hope that helps your tests...
>
> Thanks to everyone for providing suggestions, and I apologize for my
> delay in responding to each of them.
>
>> ----------------
>> Shoaib Mir
>> EnterpriseDB (www.enterprisedb.com)
>>
>> On 12/28/06, Guy Rouillier <guyr-ml1@burntmail.com> wrote:
>>
>> I don't want to violate any license agreement by discussing
>> performance, so I'll refer to a large, commercial
>> PostgreSQL-compatible DBMS only as BigDBMS here.
>>
>> I'm trying to convince my employer to replace BigDBMS with
>> PostgreSQL for at least some of our Java applications. As a proof
>> of concept, I started with a high-volume (but conceptually simple)
>> network data collection application. This application collects
>> files of 5-minute usage statistics from our network devices, and
>> stores a raw form of these stats into one table and a normalized
>> form into a second table. We are currently storing about 12 million
>> rows a day in the normalized table, and each month we start new
>> tables. For the normalized data, the app inserts rows initialized
>> to zero for the entire current day first thing in the morning, then
>> throughout the day, as stats are received, executes updates against
>> existing rows. So the app has very high update activity.
>>
>> In my test environment, I have a dual-x86 Linux platform running
>> the application, and an old 4-CPU Sun Enterprise 4500 running
>> BigDBMS and PostgreSQL 8.2.0 (only one at a time). The Sun box has
>> 4 disk arrays attached, each with 12 SCSI hard disks (a D1000 and
>> 3 A1000, for those familiar with these devices). The arrays are set
>> up with RAID5. So I'm working with a consistent hardware platform
>> for this comparison. I'm only processing a small subset of files
>> (144).
>>
>> BigDBMS processed this set of data in 20000 seconds, with all
>> foreign keys in place. With all foreign keys in place, PG took
>> 54000 seconds to complete the same job. I've tried various
>> approaches to autovacuum (none, 30 seconds) and it doesn't seem to
>> make much difference. What does seem to make a difference is
>> eliminating all the foreign keys; in that configuration, PG takes
>> about 30000 seconds. Better, but BigDBMS still has it beat
>> significantly.
>>
>> I've got PG configured so that the system database is on disk
>> array 2, as are the transaction log files. The default tablespace
>> for the test database is disk array 3. I've got all the reference
>> tables (the tables to which the foreign keys in the stats tables
>> refer) on this array. I also store the stats tables on this array.
>> Finally, I put the indexes for the stats tables on disk array 4. I
>> don't use disk array 1 because I believe it is a software array.
>> I'm out of ideas how to improve this picture any further. I'd
>> appreciate some suggestions. Thanks.
>>
>> --
>> Guy Rouillier
>
> --
> Guy Rouillier
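For reference, a minimal sketch of the postgresql.conf deltas
recommended above for this 3 GB box, relative to the settings Guy
posted. The values are the suggested starting points from this thread,
not measured optima; 8.2 takes integer memory units, so 2.25 GB is
written here as 2304MB:

    shared_buffers = 750MB          # suggested floor; can go higher
    effective_cache_size = 2304MB   # roughly 2.25 GB on a 3 GB machine
    fsync = on                      # leave at the default; fsync=off is not realistic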
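And a short sketch of the connection-pooling point: under JBoss the
application borrows an already-open JDBC connection from the pool, and
close() merely returns it for re-use, so a PG-specific pool layered on
top would add nothing. The JNDI name "java:/PostgresDS" is
hypothetical; the real binding would come from the JBoss datasource
descriptor:

    import java.sql.Connection;
    import javax.naming.InitialContext;
    import javax.sql.DataSource;

    public class PooledUpdate {
        public static void main(String[] args) throws Exception {
            // Hypothetical JNDI name, assumed bound by a JBoss *-ds.xml descriptor.
            DataSource ds = (DataSource) new InitialContext().lookup("java:/PostgresDS");

            Connection conn = ds.getConnection();  // borrowed from the pool, already open
            try {
                // ... run the stats inserts/updates here ...
            } finally {
                conn.close();  // returns the physical connection to the pool; not torn down
            }
        }
    }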