Home > mailing lists

Re: High update activity, PostgreSQL vs BigDBMS - Mailing list pgsql-performance

From	Guy Rouillier
Subject	Re: High update activity, PostgreSQL vs BigDBMS
Date	January 5, 2007 22:51:44
Msg-id	459F0EA6.30403@burntmail.com Whole thread Raw
In response to	Re: High update activity, PostgreSQL vs BigDBMS ("Shoaib Mir" <shoaibmir@gmail.com>)
Responses	Re: High update activity, PostgreSQL vs BigDBMS Re: High update activity, PostgreSQL vs BigDBMS
List	pgsql-performance

Tree view

I've got back access to my test system.  I ran another test run with the
same input data set.  This time I put pg_xlog on a different RAID volume
(the unused one that I suspect is a software RAID), and I turned
fsync=off in postgresql.conf.  I left the rest of the configuration
alone (all foreign keys removed), etc.  Unfortunately, this only dropped
elapsed time down to about 28000 seconds (from 30000), still
significantly more than BigDBMS.  Additional info inline below.

Shoaib Mir wrote:
> Here are my few recommendations that might help you:
>
> - You will need to do table partitioning
> (http://www.postgresql.org/docs/current/static/ddl-partitioning.html
> <http://www.postgresql.org/docs/current/static/ddl-partitioning.html>)
> as you are storing quite a lot of data in one table per day.

I'm focusing on the detailed perspective for now.  The 144 files I'm
processing represent not even two hours of data, so that surely wouldn't
be split up.

>
> - You are using a RAID5 setup which is something that can also affect
> performance so switching to RAID1 might help you there, but again you
> have a RAID5 with 12 disks so hmm that shouldn't be that much of a problem.

Agreed.

>
> - Have you done the tuning for postgresql.conf parameters? if not then
> you really need to do this for like checkpoint segments, random page
> cost, shared buffers, cache size, fsm pages, vacuum cost delay,
> work_mem, bgwriter etc etc. You can get good advice for tuning these
> parameters at --> http://www.powerpostgresql.com/PerfList/

The box has 3 GB of memory.  I would think that BigDBMS would be hurt by
this more than PG.  Here are the settings I've modified in postgresql.conf:

autovacuum=on
stats_row_level = on
max_connections = 10
listen_addresses = 'db01,localhost'
shared_buffers = 128MB
work_mem = 16MB
maintenance_work_mem = 64MB
temp_buffers = 32MB
max_fsm_pages = 204800
checkpoint_segments = 30
redirect_stderr = on
log_line_prefix = '%t %d'

>
> - For autovacuuming you need to properly tune the thresholds so that the
> vacuum and analyze is done at the right time not affecting the database
> server performance. (You can find help for this at
> http://www.postgresql.org/docs/current/static/routine-vacuuming.html
> under "22.1.4. The auto-vacuum daemon")

The real-life load on this database would be fairly constant throughout
the day.  Stats from network devices are received every 15 minutes from
each device, but they are staggered.  As a result, the database is
almost constantly being updated, so there is no dead time to do vacuums.

>
> - You will need to separate your transactional logs i.e. pg_xlog folder
> to a different drive other then your database server drive. This can be
> done by creating symlinks for pg_xlog folder.

Done, see opening remarks.  Unfortunately minor impact.

>
> - I hope you are doing proper connection pool management, because good
> use of database connections can be really effect the overall
> performance, connections can be expensive to create, and consume memory
> if they are not properly exited.

I probably should have mentioned this originally but was afraid of
information overload.  The application runs on JBoss and uses JBoss
connection pools.  So connections are pooled, but I don't know how they
would compare to native PG connection pools.  Essentially, JBoss gets
native JDBC connections, and the pools simply allow them to be re-used
without opening and closing each time.  So if the native PG connection
pools provide any pooling optimizations beyond that, those advantages
are not being realized.

>
> Hope that helps your tests...

Thanks to everyone for providing suggestions, and I apologize for my
delay in responding to each of them.

>
> ----------------
> Shoaib Mir
> EnterpriseDB (www.enterprisedb.com <http://www.enterprisedb.com>)
>
> On 12/28/06, *Guy Rouillier* <guyr-ml1@burntmail.com
> <mailto:guyr-ml1@burntmail.com>> wrote:
>
>     I don't want to violate any license agreement by discussing
>     performance,
>     so I'll refer to a large, commercial PostgreSQL-compatible DBMS only as
>     BigDBMS here.
>
>     I'm trying to convince my employer to replace BigDBMS with PostgreSQL
>     for at least some of our Java applications.  As a proof of concept, I
>     started with a high-volume (but conceptually simple) network data
>     collection application.  This application collects files of 5-minute
>     usage statistics from our network devices, and stores a raw form of
>     these stats into one table and a normalized form into a second table.
>     We are currently storing about 12 million rows a day in the normalized
>     table, and each month we start new tables.  For the normalized data, the
>     app inserts rows initialized to zero for the entire current day first
>     thing in the morning, then throughout the day as stats are received,
>     executes updates against existing rows.  So the app has very high update
>     activity.
>
>     In my test environment, I have a dual-x86 Linux platform running the
>     application, and an old 4-CPU Sun Enterprise 4500 running BigDBMS and
>     PostgreSQL 8.2.0 (only one at a time.)  The Sun box has 4 disk arrays
>     attached, each with 12 SCSI hard disks (a D1000 and 3 A1000, for those
>     familiar with these devices.)  The arrays are set up with RAID5.  So I'm
>     working with a consistent hardware platform for this comparison.  I'm
>     only processing a small subset of files (144.)
>
>     BigDBMS processed this set of data in 20000 seconds, with all foreign
>     keys in place.  With all foreign keys in place, PG took 54000 seconds to
>     complete the same job.  I've tried various approaches to autovacuum
>     (none, 30-seconds) and it doesn't seem to make much difference.  What
>     does seem to make a difference is eliminating all the foreign keys; in
>     that configuration, PG takes about 30000 seconds.  Better, but BigDBMS
>     still has it beat significantly.
>
>     I've got PG configured so that that the system database is on disk
>     array
>     2, as are the transaction log files.  The default table space for the
>     test database is disk array 3.  I've got all the reference tables (the
>     tables to which the foreign keys in the stats tables refer) on this
>     array.  I also store the stats tables on this array.  Finally, I put the
>     indexes for the stats tables on disk array 4.  I don't use disk array 1
>     because I believe it is a software array.
>
>     I'm out of ideas how to improve this picture any further.  I'd
>     appreciate some suggestions.  Thanks.
>
>     --
>     Guy Rouillier
>
>
>     ---------------------------(end of broadcast)---------------------------
>     TIP 5: don't forget to increase your free space map settings
>
>


--
Guy Rouillier

pgsql-performance by date:

From: Reid Thompson
Date: 05 January 2007, 14:42:29
Subject: Re: PostgreSQL to host e-mail?

From: "Craig A. James"
Date: 05 January 2007, 23:13:52
Subject: Re: High update activity, PostgreSQL vs BigDBMS

Re: High update activity, PostgreSQL vs BigDBMS - Mailing list pgsql-performance

Previous

Next