Re: Reliability with RAID 10 SSD and Streaming Replication - Mailing list pgsql-performance

From Merlin Moncure
Subject Re: Reliability with RAID 10 SSD and Streaming Replication
Date
Msg-id CAHyXU0wUHDiGVWgejxsFNApiP0sUcf8eLOy1Dwy-1pboBuxLeQ@mail.gmail.com
In response to Re: Reliability with RAID 10 SSD and Streaming Replication  (Greg Smith <greg@2ndQuadrant.com>)
Responses Re: Reliability with RAID 10 SSD and Streaming Replication  (Shaun Thomas <sthomas@optionshouse.com>)
Re: Reliability with RAID 10 SSD and Streaming Replication  (Greg Smith <greg@2ndQuadrant.com>)
List pgsql-performance
On Wed, May 22, 2013 at 2:30 PM, Greg Smith <greg@2ndquadrant.com> wrote:
> On 5/22/13 3:06 PM, Joshua D. Drake wrote:
>>
>> Greg, can you elaborate on the SSD + Xlog issue? What type of burn
>> through are we talking about?
>
>
> You're burning through flash cells at a multiple of the total WAL write
> volume.  The system I gave iostat snapshots from upthread (with the Intel
> 710 hitting its limit) archives about 1TB of WAL each week.  The actual
> amount of WAL written in terms of erased flash blocks is even higher though,
> because sometimes the flash is hit with partial page writes.  The write
> amplification of WAL is much worse than the main database.
>
> I gave a rough intro to this on the Intel drives at
> http://blog.2ndquadrant.com/intel_ssds_lifetime_and_the_32/ and there's a
> nice "Write endurance" table at
> http://www.tomshardware.com/reviews/ssd-710-enterprise-x25-e,3038-2.html
>
> The cheapest of the Intel SSDs I have here only guarantees 15TB of total
> write endurance.  Eliminating >1TB of writes per week by moving the WAL off
> SSD is a pretty significant change, even though the burn rate isn't a simple
> linear thing--you won't burn the flash out in only 15 weeks.

Certainly, the Intel 320 is not designed for 1TB/week workloads.

> The production server is actually using the higher grade 710 drives that aim
> for 900TB instead.  But I do have standby servers using the low grade stuff,
> so anything I can do to decrease SSD burn rate without dropping performance
> is useful.  And only the top tier of transaction rates will outrun a RAID1
> pair of 15K drives dedicated to WAL.

The S3700 is rated for 10 drive writes/day for 5 years.  So for a 200GB drive,
that's 200GB * 10/day * 365 days * 5 = 3.65 million gigabytes, or ~3.5
petabytes in binary units.

At 1TB/week, *a single 200GB device* would take about 67 years to burn
through, divided by whatever you assume for write amplification and by
whatever extra penalty you apply if you are shooting for a > 5 year duty
cycle (flash degrades faster the older it is).  Write endurance is not a
problem for this drive; in fact, it's a very reasonable assumption that
its faster worst-case random performance is directly related to reduced
write amplification.  BTW, cost/PB of this drive is less than half that of
the 710 (which IMO was obsolete the day the S3700 hit the street).
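
For anyone who wants to plug in their own numbers, here is a rough sketch
of the arithmetic above.  The function names and the write_amplification
parameter are made up for illustration, and a year is taken as 52 weeks.
It uses decimal units (1TB = 1000GB), which gives about 70 years rather
than 67; the difference is just rounding the endurance down to ~3.5PB.

```python
def rated_endurance_gb(capacity_gb, drive_writes_per_day, years):
    """Total rated writes in GB: capacity * drive writes/day * 365 * years."""
    return capacity_gb * drive_writes_per_day * 365 * years


def years_to_wear_out(endurance_gb, host_writes_gb_per_week,
                      write_amplification=1.0):
    """Years until the rated endurance is used up at a steady weekly volume.

    write_amplification is a hypothetical fudge factor: flash writes per
    host write.  1.0 means no amplification is assumed.
    """
    flash_writes_per_week = host_writes_gb_per_week * write_amplification
    weeks = endurance_gb / flash_writes_per_week
    return weeks / 52


# 200GB S3700 at 10 drive writes/day for 5 years
endurance = rated_endurance_gb(200, 10, 5)

# 1TB of WAL per week, first with no amplification, then assuming 2x
print(endurance)
print(round(years_to_wear_out(endurance, 1000), 1))
print(round(years_to_wear_out(endurance, 1000, write_amplification=2.0), 1))
```

Even with a pessimistic amplification factor, the result stays far beyond
any realistic service life for the drive.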

merlin

