Re: High update activity, PostgreSQL vs BigDBMS - Mailing list pgsql-performance

From Guy Rouillier
Subject Re: High update activity, PostgreSQL vs BigDBMS
Date
Msg-id 46C625A5.5030107@burntmail.com
Whole thread Raw
In response to High update activity, PostgreSQL vs BigDBMS  (Guy Rouillier <guyr-ml1@burntmail.com>)
List pgsql-performance
I originally posted the question below back in Dec 2006, and many
helpful suggestions resulted.  Unfortunately, since this was a closet
effort, my official duties pushed further exploration to the back
burner, then I lost my original test environment.  So while I can no
longer compare to BigDBMS, I've just made some discoveries that I
thought others might find helpful.

The app (which I inherited) was implemented making exhaustive use of
stored procedures.  All inserts and updates are done using procs.  When
configuration changes produced no noticeable improvements in
performance, I turned to the application architecture.  In a new
environment, I updated an insert/update intensive part of the app to use
embedded insert and update statements instead of invoking stored
procedures that did the same work.  All the remaining code, database
implementation, hardware, etc remains the same.

The results were significant.  Running a repeatable test set of data
produced the following results:

With stored procs: 2595 seconds
With embedded inserts/updates: 991 seconds

So at least in this one scenario, it looks like the extensive use of
stored procs is contributing significantly to long run times.

Guy Rouillier wrote:
> I don't want to violate any license agreement by discussing performance,
> so I'll refer to a large, commercial PostgreSQL-compatible DBMS only as
> BigDBMS here.
>
> I'm trying to convince my employer to replace BigDBMS with PostgreSQL
> for at least some of our Java applications.  As a proof of concept, I
> started with a high-volume (but conceptually simple) network data
> collection application.  This application collects files of 5-minute
> usage statistics from our network devices, and stores a raw form of
> these stats into one table and a normalized form into a second table. We
> are currently storing about 12 million rows a day in the normalized
> table, and each month we start new tables.  For the normalized data, the
> app inserts rows initialized to zero for the entire current day first
> thing in the morning, then throughout the day as stats are received,
> executes updates against existing rows.  So the app has very high update
> activity.
>
> In my test environment, I have a dual-x86 Linux platform running the
> application, and an old 4-CPU Sun Enterprise 4500 running BigDBMS and
> PostgreSQL 8.2.0 (only one at a time.)  The Sun box has 4 disk arrays
> attached, each with 12 SCSI hard disks (a D1000 and 3 A1000, for those
> familiar with these devices.)  The arrays are set up with RAID5.  So I'm
> working with a consistent hardware platform for this comparison.  I'm
> only processing a small subset of files (144.)
>
> BigDBMS processed this set of data in 20000 seconds, with all foreign
> keys in place.  With all foreign keys in place, PG took 54000 seconds to
> complete the same job.  I've tried various approaches to autovacuum
> (none, 30-seconds) and it doesn't seem to make much difference.  What
> does seem to make a difference is eliminating all the foreign keys; in
> that configuration, PG takes about 30000 seconds.  Better, but BigDBMS
> still has it beat significantly.
>
> I've got PG configured so that that the system database is on disk array
> 2, as are the transaction log files.  The default table space for the
> test database is disk array 3.  I've got all the reference tables (the
> tables to which the foreign keys in the stats tables refer) on this
> array.  I also store the stats tables on this array.  Finally, I put the
> indexes for the stats tables on disk array 4.  I don't use disk array 1
> because I believe it is a software array.
>
> I'm out of ideas how to improve this picture any further.  I'd
> appreciate some suggestions.  Thanks.
>


--
Guy Rouillier

pgsql-performance by date:

Previous
From: Frank Schoep
Date:
Subject: Re: Bad planner decision - bitmap scan instead of index
Next
From: "Jozsef Szalay"
Date:
Subject: Re: Simple select hangs while CPU close to 100% - Analyze