Some Solaris notes, and an invitation - Mailing list pgsql-general
From: Andrew Sullivan
Subject: Some Solaris notes, and an invitation
Date:
Msg-id: 20020624134429.J20436@mail.libertyrms.com
List: pgsql-general
Hi,

I currently have on loan a Sun StorEdge A5200. It's a 22-disk array that I currently have configured with Veritas and RAID 1+0 (which is how our production environment is set up). The A5200 is attached by fibre. I wanted to do some tests to find out where there might be bottlenecks in our current config. Nothing too surprising turned up, but I thought I'd share these results, as they might be useful to others.

I tested with 7.2.1 compiled as a 64-bit application with gcc 3.0.1. We're using Solaris 7. The host machine is a 4-way Sun E450 with 2 Gig of RAM. I wasn't testing for memory settings, so I picked (somewhat arbitrarily) a shared_buffers setting of 16384. (Most of the tests I was working on were write-only tests, where the buffers wouldn't matter anyway.)

One problem we have is that benchmarking is still tricky. The OSDB suite is nice, and now works without immediately dumping core on 64-bit Solaris (many, many thanks go to Andy Riebs for his work on that project), but it does crash at the end of its run (I haven't had a chance to dig into that any more). Using it for repeated one-off tests is therefore not practical, although it's extremely handy for larger-scale checks.

Several people have noted that pgbench is not terribly consistent, and so when you're dealing in sub-second trends, its numbers aren't usually helpful. I did note that it's quite good for exercising locking problems: 100 clients doing 100 transactions each will cause all kinds of UPDATE WAITING conditions, which are nice to be able to cause if you're looking for problems to do with concurrency. (There's a sketch of the invocation below.)

In order to discover straight differences between one change and another, I ended up using a combination of truss and time. This is a less-than-optimal way of testing, and I don't suggest it. But at least if you alter one option and can get consistently better numbers for one setting than for the other, you can be reasonably certain that the better one really is better. (It also offers the client's perspective, so you get to see whether a change in a setting really helps. If a change makes a difference on server performance but not on the client, you haven't found the magic bottleneck.) A sketch of this, too, is below.

I discovered, for what it's worth, that open_datasync is the fastest sync method, as might be expected, although only on the order of a few milliseconds faster than open_sync. I was a little surprised at how small the difference was, but I always expected open_datasync to be the fastest. This is consistent with other tests I've done on Solaris, too, so I think the default WAL sync method for Solaris should probably be open_datasync. It currently defaults to fsync, which is pretty bad. (Note that the A5200 doesn't have any intelligence of its own, so turning off fsync still makes a difference. For us, the speed penalty is worth the additional safety.)

I also discovered that this disk is so much faster than our other systems that it doesn't matter where you put the pg_xlog (I was sort of surprised by this one). I tried configuring another striped pair on an internal SCSI controller, and consistently got single-insert times a few milliseconds slower than when the pg_xlog was in the main database cluster. I thought this might just be due to disk speed and controller speed, so we tried configuring another stripe on the same array. That made no difference on average, but the resulting times were not always consistent. My best guess is that the inconsistency correlates with the moving of data from WAL to the main data area, but I haven't figured out a way to prove it.

For our purposes, increasing WAL_BUFFERS was useful. I have a suspicion that this is one area which gets overlooked by a lot of people, because I sometimes get blank looks when I ask about it. We have always set this in order to avoid the overhead of creating new segments. After the surprise of the xlog, though, I thought I'd check whether this was a real problem. It is. With only 10 clients doing 10000 inserts each, I was able to cause new segments to be created, and the creation added as much as a second to run times. The difference was enough to "feel", which indicates that WAL tuning is important.
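For reference, here is roughly what the relevant postgresql.conf lines look like (7.2-era parameter names). The shared_buffers and wal_sync_method values are the ones I described above; the two WAL values are illustrative rather than the exact numbers we run with, and I believe wal_files is the 7.1/7.2 knob for preallocating segments:

    shared_buffers = 16384            # picked somewhat arbitrarily, as noted above
    wal_sync_method = open_datasync   # beat open_sync and fsync in my tests on Solaris
    wal_buffers = 32                  # illustrative value, raised from the small default
    wal_files = 32                    # illustrative; preallocate WAL segments instead
                                      # of creating them in the middle of a run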
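For anyone who wants to reproduce the locking exercise mentioned above, the pgbench run was along these lines (the database name and scale factor here are just illustrative):

    $ createdb bench
    $ pgbench -i -s 10 bench        # initialize the test tables
    $ pgbench -c 100 -t 100 bench   # 100 clients, 100 transactions each

With that many clients hammering the same small set of branch rows, you should see the UPDATE WAITING states almost immediately.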
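And the truss-and-time measurements were nothing fancier than this sort of thing (the table name is made up; on Solaris, truss -c prints a summary of system calls and the time spent in them):

    $ psql -d bench -c "CREATE TABLE t (val int);"   # scratch table for timing
    $ /usr/bin/time psql -d bench -c "INSERT INTO t (val) VALUES (1);"
    $ truss -c -o truss.out psql -d bench -c "INSERT INTO t (val) VALUES (1);"

Run the same statement repeatedly under each setting and compare the elapsed times and syscall summaries. It's crude, but consistent differences do show up.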
I also did some other tests, but they were all related to specific things that I'd identified as issues with our app and wanted to investigate.

I'm done with our own tests, and no-one has shown up to collect the machine yet. I'm happy to do some other investigation if anyone has anything they really want to know. I have been thinking of trying some optimiser tweak tests, but every time I've done tests of that kind, my results have been pretty inconclusive. Any suggestions are welcome.

A

--
----
Andrew Sullivan                               87 Mowat Avenue
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M6K 3E3
                                         +1 416 646 3304 x110