Thread: Planning a new server - help needed

Planning a new server - help needed

From
Laszlo Nagy
Date:
Hello,

I need to install a new server for postgresql 8.3. It will run two
databases, web server and some background programs. We already have a
server but it is becoming slow and we would like to have something that
is faster. It is a cost sensitive application, and I would like to get
your opinion in some questions.

The database itself is an OLTP system. There are many smaller tables,
and some bigger ones (biggest table with 1.2 million records, table size
966MB, indexes size 790MB). In the bigger tables there are only a few
records updated frequently, most of the other records are not changed.
The smaller tables are updated continuously.

Question 1. We are going to use PostgreSQL 3.1 with FreeBSD. The pg docs
say that it is better to use FreeBSD because it can alter the I/O
priority of processes dynamically. The latest legacy release is 6.3
which is probably more stable. However, folks say that 7.0 has superior
performance on the same hardware. Can I use 7.0 on a production server?

Question 2. SCSI or SATA? I plan to buy a RocketRAID 3520 controller
with 8 SATA 2 disks. The operating system would be on another disk pair,
connected to the motherboard's controller. I wonder if I can get more
performance with SCSI, for the same amount of money? (I can spend about
$1500 on the controller and the disks, that would cover 10 SATA 2 disks
and the controller.)

Question 3. FreeBSD 7.0 can use the ZFS file system. I suspect that UFS
2 + soft updates will be better, but I'm not sure. Which is better?

Question 4. How to make the partitions? This is the hardest question.
Here is my plan:

- the OS resides on 2 disks, RAID 1
- the databases should go on 8 disks, RAID 0 + 1

However, the transaction log file should be on a separate disk and maybe
I could gain more performance by putting indexes on a separate drive,
but I do not want to reduce the number of disks in the RAID 0+1 array.
Should I put indexes and transaction log on the RAID 1 array? Or should
I invest a bit more money, add an SATA RAID controller with 16 channels
and add more disks? Would it pay the bill? Another alternative is to put
the biggest tables on a separate array so that it will be faster when we
join these tables with other tables.

I know that it is hard to answer without knowing the structure of the
databases. :-( I can make tests with different configurations later, but
I would like to know your opinion first - what should I try?

Thanks,

   Laszlo


Re: Planning a new server - help needed

From
"Claus Guttesen"
Date:
>  I need to install a new server for postgresql 8.3. It will run two
>  databases, web server and some background programs. We already have a
>  server but it is becoming slow and we would like to have something that
>  is faster. It is a cost sensitive application, and I would like to get
>  your opinion in some questions.
>
>  The database itself is an OLTP system. There are many smaller tables,
>  and some bigger ones (biggest table with 1.2 million records, table size
>  966MB, indexes size 790MB). In the bigger tables there are only a few
>  records updated frequently, most of the other records are not changed.
>  The smaller tables are updated continuously.
>
>  Question 1. We are going to use PostgreSQL 3.1 with FreeBSD. The pg docs
>  say that it is better to use FreeBSD because it can alter the I/O
>  priority of processes dynamically. The latest legacy release is 6.3
>  which is probably more stable. However, folks say that 7.0 has superior
>  performance on the same hardware. Can I use 7.0 on a production server?

I guess you mean postgresql 8.3.1? :-)

I use FreeBSD 7 release on a 8-way HP DL360 G5 with a ciss controller.
Works out of the box and I haven't had any issue with 7.0 at all.

>  Question 2. SCSI or SATA? I plan to buy a RocketRAID 3520 controller
>  with 8 SATA 2 disks. The operating system would be on another disk pair,
>  connected to the motherboard's controller. I wonder if I can get more
>  performance with SCSI, for the same amount of money? (I can spend about
>  $1500 on the controller and the disks, that would cover 10 SATA 2 disks
>  and the controller.)

SAS would probably be the way to go. I haven't tried the
rocketraid-controller. I use the built-in p400i-controller on my
servers using the ciss-driver. I've heard many positive remarks about
areca.

>  Question 3. FreeBSD 7.0 can use the ZFS file system. I suspect that UFS
>  2 + soft updates will be better, but I'm not sure. Which is better?

I'd stick with ufs2 atm. There are some issues with zfs which probably
have been ironed out by now but ufs2 has been deployed for a longer
time. Performance-wise they are about the same.

>  Question 4. How to make the partitions? This is the hardest question.
>  Here is my plan:
>
>  - the OS resides on 2 disks, RAID 1
>  - the databases should go on 8 disks, RAID 0 + 1

If you have enough disks raid-6 should perform almost as good as raid
1+0. I've setup 11 disks in raid-6 plus one hotspare so I can get more
space out of it. "Enough disks" are approx. eight and up.

--
regards
Claus

When lenity and cruelty play for a kingdom,
the gentlest gamester is the soonest winner.

Shakespeare

Re: Planning a new server - help needed

From
"Reko Turja"
Date:
> Question 1. We are going to use PostgreSQL 3.1 with FreeBSD. The pg
> docs say that it is better to use FreeBSD because it can alter the
> I/O priority of processes dynamically. The latest legacy release is
> 6.3 which is probably more stable. However, folks say that 7.0 has
> superior performance on the same hardware. Can I use 7.0 on a
> production server?

FreeBSD 7.x is pretty stable, and it has the advantage of having the
new ULE and other things that can't be MFC'd to 6.x branch. And as a
long time FreeBSD enthusiast having Cisco sponsoring Dtrace, Nokia
sponsoring scheduler development etc. 7.x is definitely in my opinion
now the branch to install and start following for ease of upgrading
later. Of course, as always check that your intended hardware is
supported.

ULE which is pretty much the key for performance boost in 7.x branch
isn't yet the default scheduler, but will be in 7.1 and afterwards.
This means you have to roll custom kernel if you want to use ULE.

> Question 3. FreeBSD 7.0 can use the ZFS file system. I suspect that
> UFS 2 + soft updates will be better, but I'm not sure. Which is
> better?

For now I'd choose between UFS+gjournal or plain UFS, although with
bigger disks journaling is a boon compared to fsck'ing plain UFS
partition. ZFS isn't yet ready for production I think, but it seems to
be getting there. This is opinion based on bug reports and discussions
in stable and current mailing lists, not on personal testing though.
My experiences with gjournal have been positive so far.

On the drives and controller - I'm not sure whether SCSI/SAS will give
any noticeable boost over SATA, but based on personal experience SCSI
is still ahead on terms of drive reliability. Whatever technology I'd
choose, for production server getting decent battery backed controller
would be the start. And of course a controller that does the RAID's in
hardware.

-Reko


Re: Planning a new server - help needed

From
Laszlo Nagy
Date:
> I guess you mean postgresql 8.3.1? :-)
>
Yep. Sorry.
>>  Question 3. FreeBSD 7.0 can use the ZFS file system. I suspect that UFS
>>  2 + soft updates will be better, but I'm not sure. Which is better?
>>
>
> I'd stick with ufs2 atm. There are some issues with zfs which probably
> have been ironed out by now but ufs2 has been deployed for a longer
> time. Performance-wise they are about the same.
>
Thank you. I suspected the same but it was good to get positive
confirmation.
>>  Question 4. How to make the partitions? This is the hardest question.
>>  Here is my plan:
>>
>>  - the OS resides on 2 disks, RAID 1
>>  - the databases should go on 8 disks, RAID 0 + 1
>>
>
> If you have enough disks raid-6 should perform almost as good as raid
> 1+0.
Hmm, I have heard that RAID 1 or RAID 1 + 0 should be used for
databases,  never RAID 5. I know nothing about RAID 6. I guess I must
accept your suggestion since you have more experience than I have. :-)
Obviously, it would be easier to manage a single RAID 6 array.
> I've setup 11 disks in raid-6 plus one hotspare so I can get more
> space out of it. "Enough disks" are approx. eight and up.
>
The RAID controller that I have selected can only handle 8 disks. I
guess I need to find a different one with 16 channels and use more
disks. So are you saying that with all disks in a bigger RAID 6 array, I
will get the most out of the hardware? In that case, I'll try to get a
bit more money from the management and build RAID 6 with 12 disks.

I also feel that I need to use a separate RAID 1 array (I prefer
gmirror) for the base system.

Thanks,

   Laszlo


Re: Planning a new server - help needed

From
Greg Smith
Date:
On Fri, 28 Mar 2008, Laszlo Nagy wrote:

> We already have a server but it is becoming slow and we would like to
> have something that is faster.

What's it slow at?  Have you identified the bottlenecks and current
sources of sluggish behavior?  That sort of thing is much more informative
to look into in regards to redesigning for new hardware than trivia like
disk layout.  For all we know you're CPU bound.

> The database itself is an OLTP system. There are many smaller tables, and
> some bigger ones (biggest table with 1.2 million records, table size 966MB,
> indexes size 790MB).

The total database size is the interesting number you left out here.  And
you didn't mention how much RAM either.  That ratio has a lot of impact on
how hard you'll push the disks.

> Question 1. We are going to use PostgreSQL 3.1 with FreeBSD. The pg docs say
> that it is better to use FreeBSD because it can alter the I/O priority of
> processes dynamically.

You shouldn't make an OS decision based on a technical detail that small.
I won't knock FreeBSD because it's a completely reasonable choice, but
there's no credible evidence it's a better performer for the workload you
expect than, say, Linux or even Solaris x64.  (The benchmarks the FreeBSD
team posted as part of their 7.0 fanfare are not representative of real
PostgreSQL performance, and are read-only as well).

All the reasonable OS choices here are close enough to one another (as
long as you get FreeBSD 7, earlier versions are really slow) that you
should be thinking in terms of reliability, support, and features rather
than viewing this from a narrow performance perspective.  There's nothing
about what you've described that sounds like it needs bleeding-edge
performance to achieve. For reliability, I first look at how good the disk
controller and its matching driver in the OS used is, which brings us to:

> Question 2. SCSI or SATA? I plan to buy a RocketRAID 3520 controller with 8
> SATA 2 disks. The operating system would be on another disk pair, connected
> to the motherboard's controller. I wonder if I can get more performance with
> SCSI, for the same amount of money? (I can spend about $1500 on the
> controller and the disks, that would cover 10 SATA 2 disks and the
> controller.)

Highpoint has traditionally made disk controllers that were garbage.  The
3520 is from a relatively new series of products from them, and it seems
like a reasonable unit.  However:  do you want to be be deploying your
system on a new card with zero track record for reliability, and from a
company that has never done a good job before?  I can't think of any
reason at all why you should take that risk.

The standard SATA RAID controller choices people suggest here are 3ware,
Areca, and LSI Logic.  Again, unless you're really pushing what the
hardware is capable of these are all close to each other performance-wise
(see http://femme.tweakblogs.net/blog/196/highpoint-rocketraid-3220.html
for something that include the Highpoint card).  You should be thinking in
terms of known reliability and stability when you select a database
controller card, and Highpoint isn't even on the list of vendors to
consider yet by those standards.

As for SCSI vs. SATA, I collected up the usual arguments on both sides at
http://www.postgresqldocs.org/index.php/SCSI_vs._IDE/SATA_Disks

> However, the transaction log file should be on a separate disk and maybe I
> could gain more performance by putting indexes on a separate drive, but I do
> not want to reduce the number of disks in the RAID 0+1 array.

If you're looking at 8+ disks and have a caching controller with a battery
backup, which appears to be your target configuration, there little reason
to expect a big performance improvement from splitting the transaction log
out onto a seperate disk.  As you note, doing that will reduce the spread
of disk for the database which may cost you more in performance than
seperate transaction logs gain.

It is worth considering creating a seperate filesystem on the big array to
hold the xlog data through, because that gives you more flexibility in
terms of mount parameters there.  For example, you can always turn off
atime updates on the transaction log filesystem, and in many cases the
filesystem journal updates can be optimized more usefully (the xlog
doesn't require them).

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: Planning a new server - help needed

From
Dan Harris
Date:
Laszlo Nagy wrote:
>
> Question 4. How to make the partitions? This is the hardest question.
> Here is my plan:
>
> - the OS resides on 2 disks, RAID 1
> - the databases should go on 8 disks, RAID 0 + 1
Make sure you understand the difference between RAID 1+0 and RAID 0+1..
I suspect you'll end up going with 1+0 instead.

-Dan



Re: Planning a new server - help needed

From
Weinzierl Stefan
Date:
Laszlo Nagy schrieb:
[...]
> The RAID controller that I have selected can only handle 8 disks. I
> guess I need to find a different one with 16 channels and use more
> disks. So are you saying that with all disks in a bigger RAID 6 array, I
> will get the most out of the hardware? In that case, I'll try to get a
> bit more money from the management and build RAID 6 with 12 disks.

Here a good SATA-Controllers for 4/8/12/16-Disks:
http://www.tekram.com/product2/product_detail.asp?pid=51

Stefan

Re: Planning a new server - help needed

From
James Mansion
Date:
Greg Smith wrote:
> As for SCSI vs. SATA, I collected up the usual arguments on both sides
> at http://www.postgresqldocs.org/index.php/SCSI_vs._IDE/SATA_Disks
>
Why do you claim that 'More platters also means slower seeks
and generally slower performance.'?

On the face of it, it should mean that the number of track step
operations is reduced, even if the drive doesn't buffer a slice
of tracks aross all platters (which would help if it did).

I'm not entirely sure why the extra platters should really count
as more moving parts since I think the platter assembly and
head assembly are both single parts in effect, albeit they will
be more massive with more platters. I'm not sure how much
extra bearing friction that will mean, but it is reasonable that
some extra energy is going to be needed.

Recent figures I've seen suggest that the increased storage
density per platter, plus the extra number of platters, means
that the streaming speed of good 7200rpm SATA drives is
very close to that of good 15000rpm SAS drives - and you
can choose which bit of the disk to use to reduce seek time and
maximise transfer rate with the oversize drives. You can
get about 100MB/s out of both technologies, streaming.

It may be worth considering an alternative approach. I suspect
that a god RAID1 or RAID1+0 is worthwhile for WAL, but
you might consider a RAID1 of a pair of SSDs for data. They
will use a lot of your budget, but the seek time is negligible so the
effective usable performance is higher than you get with
spinning disks - so you might trade a fancy controller with
battery-backed cache for straight SSD.

I haven't done this, so YMMV.  But the prices are getting
interesting for OLTP where most disks are massively
oversized. The latest Samsung and SanDisk are expensive
in the UK but the Transcend 16GB TS16GSSD25S-S  SATA
is about $300 equiv - it can do 'only' 'up to' 28MB/s write and
you wouldn't want to put WAL on one, but sustaining
15-20MB/s through random access to a real disk isn't
trivial. If average access is 10ms, and you write 100MB/s
streaming, then you have to ask yourself if you going to do
80 or more seeks a second.

James

James





Re: Planning a new server - help needed

From
PFC
Date:
> Why do you claim that 'More platters also means slower seeks
> and generally slower performance.'?

    More platters -> more heads -> heavier head assembly -> slower seek time
    But..
    More platters -> higher density -> less seek distance (in mm of head
movement) -> faster seek time

    As usual, no clear-cut case, a real-life test would tell more interesting
things.

> I'm not entirely sure why the extra platters should really count
> as more moving parts since I think the platter assembly and
> head assembly are both single parts in effect, albeit they will
> be more massive with more platters. I'm not sure how much
> extra bearing friction that will mean, but it is reasonable that
> some extra energy is going to be needed.

    Since the bearings are only on one side of the axle (not both), a heavier
platter assembly would put more stress on the bearing if the disk is
subject to vibrations (like, all those RAID disks seeking together) which
would perhaps shorten its life. Everything with conditionals of course ;)
    I remember reading a paper on vibration from many RAID disks somewhere a
year or so ago, vibration from other disks seeking at the exact same time
and in the same direction would cause resonances in the housing chassis
and disturb the heads of disks, slightly worsening seek times and
reliability. But, on the other hand, the 7 disks raided in my home storage
server never complained, even though the $30 computer case vibrates all
over the place when they seek. Perhaps if they were subject to 24/7 heavy
torture, a heavier/better damped chassis would be a good investment.

> It may be worth considering an alternative approach. I suspect
> that a god RAID1 or RAID1+0 is worthwhile for WAL, but

    Actually, now that 8.3 can sync to disk every second instead of at every
commit, I wonder, did someone do some enlightening benchmarks ? I remember
benchmarking 8.2 on a forum style load and using a separate disk for WAL
(SATA, write cache off) made a huge difference (as expected) versus one
disk for everything (SATA, and write cache off). Postgres beat the crap
out of MyISAM, lol.
    Seems like Postgres is one of the rare apps which gets faster and meaner
with every release, instead of getting slower and more bloated like
everyone else.

    Also, there is a thing called write barriers, which supposedly could be
used to implement fsync-like behaviour without the penalty, if the disk,
the OS, the controller, and the filesystem support it (that's a lot of
ifs)...

> I haven't done this, so YMMV.  But the prices are getting
> interesting for OLTP where most disks are massively
> oversized. The latest Samsung and SanDisk are expensive
> in the UK but the Transcend 16GB TS16GSSD25S-S  SATA
> is about $300 equiv - it can do 'only' 'up to' 28MB/s write and

    Gigabyte should revamp their i-RAM to use ECC RAM of a larger capacity...
and longer lasting battery backup...
    I wonder, how many write cycles those Flash drives can take before
reliability becomes a problem...




Re: Planning a new server - help needed

From
James Mansion
Date:
PFC wrote:
>
>> Why do you claim that 'More platters also means slower seeks
>> and generally slower performance.'?
>
>     More platters -> more heads -> heavier head assembly -> slower
> seek time
Note sure I've sen a lot of evidence of that in drive specifications!

>     Gigabyte should revamp their i-RAM to use ECC RAM of a larger
> capacity... and longer lasting battery backup...
You would think a decent capacitor or rechargable button battery would
be enough to dump it to a flash memory.
No problem with flash wear then.
>     I wonder, how many write cycles those Flash drives can take before
> reliability becomes a problem...
Hard to get data isn't it? I believe its hundreds of thousands to
millions now. Now each record in most OLTP
tables is rewritten a few times unless its stuff that can go into temp
tables etc, which should be elsewhere.
Index pages clearly get rewritten often.

I suspect a mix of storage technologies will be handy for some time yet
- WAL on disk, and temp tables on
disk with no synchronous fsync requirement.

I think life is about to get interesting in DBMS storage.  All good for
us users.

James


Re: Planning a new server - help needed

From
Peter Eisentraut
Date:
Laszlo Nagy wrote:
> Question 1. We are going to use PostgreSQL 3.1 with FreeBSD. The pg docs
> say that it is better to use FreeBSD because it can alter the I/O
> priority of processes dynamically.

Where does it say that?

Re: Planning a new server - help needed

From
Greg Smith
Date:
On Sat, 29 Mar 2008, PFC wrote:

>> Why do you claim that 'More platters also means slower seeks
>> and generally slower performance.'?
>     More platters -> more heads -> heavier head assembly -> slower seek
> time

I recall seeing many designs with more platters that have slower seek
times in benchmarks, and I always presumed this was the reason.  That's
the basis for that comment.  I'll disclaim that section a bit.

> Actually, now that 8.3 can sync to disk every second instead of at every
> commit, I wonder, did someone do some enlightening benchmarks ?

I've seen some really heavy workloads where using async commit helped
group commits in a larger batches usefully, but I personally haven't found
it to be all that useful if you're already got a caching controller to
accelerate writes on the kinds of hardware most people have.  It's a great
solution for situations without a usable write cache though.

> Also, there is a thing called write barriers, which supposedly could be
> used to implement fsync-like behaviour without the penalty, if the disk,
> the OS, the controller, and the filesystem support it (that's a lot of
> ifs)...

The database can't use fsync-like behavior for the things it calls fsync
for; it needs the full semantics.  You're either doing the full operation,
or you're cheating and it doesn't do what it's supposed to.  Write
barriers aren't any improvement over a good direct I/O sync write setup
for the WAL.  There may be some limited value to that approach for the
database writes at checkpoint time, but again there's a real fsync coming
at the end of that and it's not satisfied until everything is on disk (or
in a good disk controller cache).

> Gigabyte should revamp their i-RAM to use ECC RAM of a larger
> capacity... and longer lasting battery backup...

I saw a rumor somewhere that they were close to having a new version of
that using DDR2 ready, which would make it pretty easy to have 8GB on
there.

> I wonder, how many write cycles those Flash drives can take before
> reliability becomes a problem...

The newer SSD drives with good write leveling should last at least as long
as you'd expect a mechanical drive to, even in a WAL application.  Lesser
grades of flash used as disk could be a problem though.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: Planning a new server - help needed

From
Craig Ringer
Date:
Greg Smith wrote:

>> Gigabyte should revamp their i-RAM to use ECC RAM of a larger
>> capacity... and longer lasting battery backup...
>
> I saw a rumor somewhere that they were close to having a new version of
> that using DDR2 ready, which would make it pretty easy to have 8GB on
> there.

I'm hoping - probably in vain - that they'll also include a CF/SD slot
or some onboard flash so it can dump its contents to flash using the
battery backup.

For anybody wondering what the devices in question are, see:

http://www.anandtech.com/storage/showdoc.aspx?i=2480
http://www.gigabyte.com.tw/Products/Storage/Products_Overview.aspx?ProductID=2180

--
Craig Ringer

Re: Planning a new server - help needed

From
Guy Rouillier
Date:
PFC wrote:
>
>> Why do you claim that 'More platters also means slower seeks
>> and generally slower performance.'?
>
>     More platters -> more heads -> heavier head assembly -> slower seek
> time
>     But..
>     More platters -> higher density -> less seek distance (in mm of head
> movement) -> faster seek time

More platters means more tracks under the read heads at a time, so
generally *better* performance.  All other things (like rotational
speed) being equal, of course.

--
Guy Rouillier