Some Solaris notes, and an invitation - Mailing list pgsql-general

From Andrew Sullivan
Subject Some Solaris notes, and an invitation
Date
Msg-id 20020624134429.J20436@mail.libertyrms.com
Whole thread Raw
List pgsql-general
Hi,

I currently have on loan to me a Sun StorEdge A5200.  It's a 22-disk
array that I currently have configured with Veritas and RAID 1+0
(which is how our production environment is set).  The A5200 is
attached by fibre.

I wanted to do some tests to find out where there might be
bottle-necks in our current config.  Nothing too surprising, but I
thought I'd share these results as they might be useful to others.

I tested with 7.2.1 compiled as a 64-bit application with gcc 3.0.1.
We're using Solaris 7.  The host machine is a Sun E450, 4 ways, with
2 Gig of RAM.  I wasn't testing for memory settings, so I picked
(somewhat arbitrarily) a shared buffers setting of 16384.  (Most of
the tests I was working on were write-only tests, where the buffers
wouldn't matter anyway.)

One problem we have is that benchmarking is still tricky.  The OSDB
suite is nice, and now works without immediately dumping core on
64-bit Solaris (many, many thanks go to Andy Riebs for his work on
that project), but it does crash at the end of its run (I haven't had
a chance to dig into that any more).  Using it for repeated one-off
tests is therefore not practical, although it's extremely handy for
larger-scale checks.  Several people have noted that pgbench is not
terribly consistent, and so when you're dealing in sub-second trends,
its numbers aren't usually helpful.  I did note that it's quite good
for exercising locking problems: 100 clients doing 100 transactions
will cause all kinds of UPDATE WAITING conditions, which are nice to
be able to cause if you're looking for problems to do with
concurrency.

In order to discover straight differences between one change and
another, I ended up using a combination of truss and time.  This is a
way less-than-optimal way of testing, and I don't suggest it.  But at
least if you alter one option, and can get consistently better
numbers for one setting than for the other, you can be reasonbly
certain that the better one really is better.  (It also offers the
client's perspective, so you get to see whether a change in a setting
really helps.  That at least tells you that, if the change has made a
difference on server performance but not on client, you haven't found
the magic bottleneck.)

I disovered, for what it's worth, that as might be expected,
open_datasync is the fastest, although only on the order of a few
milliseconds compared to open_sync.  I was a little surprised at how
little the difference was, but I always expected open_datasync to be
the fastest.  This is consistent with other tests I've done on
Solaris, too, so I think the fsync method for Solaris should probably
be open_datasync.  It currently defaults to fsync, which is pretty
bad.

(Note that the A5200 doesn't have any intelligence, so turning off
fsync makes a difference still.  For us, the speed penalty is worth
the additional safety.)

I also discovered that this disk is so much faster than other systems
that it doesn't matter where you put the pg_xlog (I was sort of
surprised by this one).  I tried configuring another striped pair on
an internal SCSI controller, and could consistently get single-insert
times a few milliseconds slower than when the pg_xlog was in the main
database cluster.  I thought this might just be due to disk speed and
controller speed, so we tried configuring another stripe on the same
array.  That made no difference on average, but the resulting times
were not always consistent.  My best guess is that the inconsistency
correlates with the moving of data from WAL to the main data area,
but I haven't figured out a way to prove it.

For our purposes, increasing the WAL_BUFFERS was useful.  I have a
suspicion that this is one area which gets overlooked by a lot of
people, because I sometimes get blank looks when I ask about it.  We
have always set this in order to avoid the overhead of creating new
segments.  After the surprise of the xlog, though, I thought I'd
check to see whether this was a real problem.  It is.  With only 10
clients doing 10000 inserts each, I was able to cause new segments to
be created.  The creation added as much as a second to run times.
The difference was enough to "feel", which indicates that WAL tuning
is important.

I also did some other tests, but they were all related to specific
things that I'd identified as issues with our app I wanted to
investigate.

I'm done with our own tests, and no-one has shown up to collect the
machine yet.  I'm happy to do some other investigation if anyone has
anything they really want to know.  I have been thinking of trying to
do some optimiser tweak tests, but every time I've done any tests
with that, my results have been pretty inconclusive.  Any suggestions
are welcome.

A

--
----
Andrew Sullivan                               87 Mowat Avenue
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M6K 3E3
                                         +1 416 646 3304 x110




pgsql-general by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: Table name length (maximum identifier length)
Next
From: "William N. Zanatta"
Date:
Subject: Help with Arrays and References