Re: How to improve db performance with $7K? - Mailing list pgsql-performance

From William Yu
Subject Re: How to improve db performance with $7K?
Msg-id d3vpnn$2iu$
Whole thread Raw
In response to Re: How to improve db performance with $7K?  (Steve Poe)
Responses Re: How to improve db performance with $7K?  (Greg Stark)
List pgsql-performance
Problem with this strategy. You want battery-backed write caching for
best performance & safety. (I've tried IDE for WAL before w/ write
caching off -- the DB got crippled whenever I had to copy files from/to
the drive on the WAL partition -- ended up just moving WAL back on the
same SCSI drive as the main DB.) That means in addition to a $$$ SCSI
caching controller, you also need a $$$ SATA caching controller. From my
glance at prices, advanced SATA controllers seem to cost nearly as their
SCSI counterparts.

This also looks to be the case for the drives themselves. Sure you can
get super cheap 7200RPM SATA drives but they absolutely suck for
database work. Believe me, I gave it a try once -- ugh. The highend WD
10K Raptors look pretty good though -- the benchmarks @ storagereview
seem to put these drives at about 90% of SCSI 10Ks for both single-user
and multi-user. However, they're also priced like SCSIs -- here's what I
found @ Mwave (going through pricewatch to find WD740GDs):

Seagate 7200 SATA -- 80GB    $59
WD 10K SATA       -- 72GB    $182
Seagate 10K U320  -- 72GB    $289

Using the above prices for a fixed budget for RAID-10, you could get:

SATA 7200 -- 680MB per $1000
SATA 10K  -- 200MB per $1000
SCSI 10K  -- 125MB per $1000

For a 99% read-only DB that required lots of disk space (say something
like Wikipedia or blog host), using consumer level SATA probably is ok.
For anything else, I'd consider SATA 10K if (1) I do not need 15K RPM
and (2) I don't have SCSI intrastructure already.

Steve Poe wrote:
> If SATA drives don't have the ability to replace SCSI for a multi-user
> Postgres apps, but you needed to save on cost (ALWAYS an issue),
> could/would you implement SATA for your logs (pg_xlog) and keep the rest
> on SCSI?
> Steve Poe
> Mohan, Ross wrote:
>> I've been doing some reading up on this, trying to keep up here, and
>> have found out that (experts, just yawn and cover your ears)
>> 1) some SATA drives (just type II, I think?) have a "Phase Zero"
>>    implementation of Tagged Command Queueing (the special sauce
>>    for SCSI).
>> 2) This SATA "TCQ" is called NCQ and I believe it basically
>>    allows the disk software itself to do the reordering
>>    (this is called "simple" in TCQ terminology) It does not
>>    yet allow the TCQ "head of queue" command, allowing the
>>    current tagged request to go to head of queue, which is
>>    a simple way of manifesting a "high priority" request.
>> 3) SATA drives are not yet multi-initiator?
>> Largely b/c of 2 and 3, multi-initiator SCSI RAID'ed drives
>> are likely to whomp SATA II drives for a while yet (read: a
>> year or two) in multiuser PostGres applications.
>> -----Original Message-----
>> From: 
>> [mailto:] On Behalf Of Greg Stark
>> Sent: Thursday, April 14, 2005 2:04 PM
>> To: Kevin Brown
>> Cc: 
>> Subject: Re: [PERFORM] How to improve db performance with $7K?
>> Kevin Brown <> writes:
>>> Greg Stark wrote:
>>>> I think you're being misled by analyzing the write case.
>>>> Consider the read case. When a user process requests a block and
>>>> that read makes its way down to the driver level, the driver can't
>>>> just put it aside and wait until it's convenient. It has to go ahead
>>>> and issue the read right away.
>>> Well, strictly speaking it doesn't *have* to.  It could delay for a
>>> couple of milliseconds to see if other requests come in, and then
>>> issue the read if none do.  If there are already other requests being
>>> fulfilled, then it'll schedule the request in question just like the
>>> rest.
>> But then the cure is worse than the disease. You're basically
>> describing exactly what does happen anyways, only you're delaying more
>> requests than necessary. That intervening time isn't really idle, it's
>> filled with all the requests that were delayed during the previous
>> large seek...
>>> Once the first request has been fulfilled, the driver can now
>>> schedule the rest of the queued-up requests in disk-layout order.
>>> I really don't see how this is any different between a system that
>>> has tagged queueing to the disks and one that doesn't.  The only
>>> difference is where the queueing happens.
>> And *when* it happens. Instead of being able to issue requests while a
>> large seek is happening and having some of them satisfied they have to
>> wait until that seek is finished and get acted on during the next
>> large seek.
>> If my theory is correct then I would expect bandwidth to be
>> essentially equivalent but the latency on SATA drives to be increased
>> by about 50% of the average seek time. Ie, while a busy SCSI drive can
>> satisfy most requests in about 10ms a busy SATA drive would satisfy
>> most requests in 15ms. (add to that that 10k RPM and 15kRPM SCSI
>> drives have even lower seek times and no such IDE/SATA drives exist...)
>> In reality higher latency feeds into a system feedback loop causing
>> your application to run slower causing bandwidth demands to be lower
>> as well. It's often hard to distinguish root causes from symptoms when
>> optimizing complex systems.
> ---------------------------(end of broadcast)---------------------------
> TIP 9: the planner will ignore your desire to choose an index scan if your
>      joining column's datatypes do not match

pgsql-performance by date:

From: Nick Trainor
From: Dawid Kuroczko
Subject: Re: immutable functions vs. join for lookups ?