Thread: Tuning guidelines for server with 256GB of RAM and SSDs?

Tuning guidelines for server with 256GB of RAM and SSDs?

From: Kaixi Luo
Hello,

I've been reading Greg Smith's "PostgreSQL 9.0 High Performance" book and I have some questions regarding the guidelines in it, because I suspect some of them can't be followed blindly to the letter on a server with lots of RAM and SSDs.

Here are my server specs:

Intel Xeon E5-1650 v3 Hexa-Core Haswell 
256GB DDR4 ECC RAM
Battery backed hardware RAID with 512MB of WriteBack cache (LSI MegaRAID SAS 9260-4i)
RAID1 - 2x480GB Samsung SSD with power loss protection (will be used to store the PostgreSQL database)
RAID1 - 2x240GB Crucial SSD with power loss protection (will be used to store the PostgreSQL transaction logs)

First of all, the book suggests that I should enable the WriteBack cache of the HWRAID and disable the disk cache to increase performance and ensure data safety. Is it still advisable to do this on SSDs, specifically the step of disabling the disk cache? Wouldn't that increase the wear rate of the SSD?
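
For reference, I believe these are the relevant knobs on this controller;
the MegaCli flags below are from memory, so please treat this as a rough
sketch and double check against the MegaCli documentation:

    # show the current write policy and the physical disk cache setting
    MegaCli64 -LDGetProp -Cache -LAll -aAll
    MegaCli64 -LDGetProp -DskCache -LAll -aAll

    # enable the controller's write back cache, disable the drives' own cache
    MegaCli64 -LDSetProp WB -LAll -aAll
    MegaCli64 -LDSetProp -DisDskCache -LAll -aAll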

Secondly, the book suggests that we increase the device readahead from 256 to 4096. As far as I understand, this was done in order to reduce the number of seeks on a rotating hard drive, so again, is this still applicable to SSDs?
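
For reference, this is the setting I mean (assuming the data array shows
up as /dev/sda; the device name is just a placeholder):

    blockdev --getra /dev/sda        # current readahead, in 512-byte sectors
    blockdev --setra 4096 /dev/sda   # the book's suggestion
    blockdev --setra 256 /dev/sda    # back to the kernel default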

The other tunables I've been looking into are vm.dirty_ratio and vm.dirty_background_ratio. I reckon that the book's recommendation to lower vm.dirty_background_ratio to 5 and vm.dirty_ratio to 10 doesn't go far enough for a server with such a large amount of RAM. How much lower should I set these values, given that my RAID's WriteBack cache size is 512MB?
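
For what it's worth, my current plan (please correct me if this is the
wrong direction) is to switch to the byte-based variants so the thresholds
aren't percentages of 256GB of RAM; the numbers below are only illustrative:

    # setting the *_bytes variants automatically zeroes the *_ratio ones
    sysctl -w vm.dirty_background_bytes=268435456   # 256MB, comfortably below the 512MB controller cache
    sysctl -w vm.dirty_bytes=1073741824             # 1GB hard limit before writers block
    # persist the values in /etc/sysctl.d/ to survive reboots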

Thank you very much.

Kaixi Luo

Re: Tuning guidelines for server with 256GB of RAM and SSDs?

From: Merlin Moncure
On Tue, Jul 5, 2016 at 9:50 AM, Kaixi Luo <kaixiluo@gmail.com> wrote:
> Hello,
>
> I've been reading Greg Smith's "PostgreSQL 9.0 High Performance" book
> and I have some questions regarding the guidelines I found in the book,
> because I suspect some of them can't be followed blindly to the letter on a
> server with lots of RAM and SSDs.
>
> Here are my server specs:
>
> Intel Xeon E5-1650 v3 Hexa-Core Haswell
> 256GB DDR4 ECC RAM
> Battery backed hardware RAID with 512MB of WriteBack cache (LSI MegaRAID SAS
> 9260-4i)
> RAID1 - 2x480GB Samsung SSD with power loss protection (will be used to
> store the PostgreSQL database)
> RAID1 - 2x240GB Crucial SSD with power loss protection (will be used to
> store the PostgreSQL transaction logs)
>
> First of all, the book suggests that I should enable the WriteBack cache of
> the HWRAID and disable the disk cache to increase performance and ensure
> data safety. Is it still advisable to do this on SSDs, specifically the step
> of disabling the disk cache? Wouldn't that increase the wear rate of the
> SSD?

At the time that book was written, the majority of SSDs were known not
to be completely honest and/or reliable about data integrity in the
face of a power event.  Now it's a hit or miss situation (for example,
see here: http://blog.nordeus.com/dev-ops/power-failure-testing-with-ssds.htm).
The Intel S3500/S3700 drives and their descendants are the standard
against which other drives should be judged IMO. The S3500 family in
particular offers tremendous value for database usage.  Do your
research; the warning is still relevant but the blanket statement no
longer applies.  Spinning drives are completely obsolete for database
applications in my experience.

Disabling the write back cache for write heavy database loads will
destroy the drive in short order due to write amplification and will
generally cause it to underperform hard drives in my experience.

With good SSDs and a good motherboard, I do not recommend a caching
raid controller; software raid is a better choice for many reasons.

One parameter that needs to be analyzed with SSDs is
effective_io_concurrency; see
https://www.postgresql.org/message-id/CAHyXU0yiVvfQAnR9cyH%3DHWh1WbLRsioe%3DmzRJTHwtr%3D2azsTdQ%40mail.gmail.com
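
For example, something along these lines; the value is only a starting
point to benchmark, not a recommendation, and ALTER SYSTEM needs 9.4+
(otherwise set it in postgresql.conf):

    # effective_io_concurrency mainly helps bitmap heap scans; SSDs tolerate
    # much higher values than the default of 1
    psql -c "ALTER SYSTEM SET effective_io_concurrency = 200;"
    psql -c "SELECT pg_reload_conf();"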

merlin


Re: Tuning guidelines for server with 256GB of RAM and SSDs?

From: Scott Marlowe
On Wed, Jul 6, 2016 at 12:13 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> Disabling the write back cache for write heavy database loads will
> destroy the drive in short order due to write amplification and will
> generally cause it to underperform hard drives in my experience.

Interesting. We found our best performance with a RAID-5 of ten 800GB
SSDs (Intel 3500/3700 series): we got MUCH faster performance with all
write caching turned off on our LSI MegaRAID controllers. We went from
3-4k tps to 15-18k tps. And after a year of hard use we still show
~90% life left (these machines handle thousands of writes per second
in real use). It could be that the caching was getting in the way of
RAID calcs or some other issue. With RAID-1 I have no clue what the
performance will be with write cache on or off.

--
To understand recursion, one must first understand recursion.


Re: Tuning guidelines for server with 256GB of RAM and SSDs?

From: "Wes Vaske (wvaske)"
Regarding the Nordeus blog Merlin linked.

They say:
"This doesn't mean the data was really written to disk, it can still remain in the disk cache, but enterprise drives
usuallymake sure the data was really written to disk on fsync calls." 

This isn't actually true for enterprise drives (when I say enterprise
in the context of an SSD, I'm assuming full power loss protection via
capacitors on the drive, like the Intel DC S3x00 series). Most
enterprise SSDs will ignore calls to disable the disk cache or to
flush the disk cache, as doing so is entirely unnecessary.


Regarding write back cache:
Disabling the write back cache won't have a real large impact on the
endurance of the drive unless it reduces the total number of bytes
written (which it won't). I've seen drives that perform better with it
disabled and drives that perform better with it enabled. I would test
in your environment and make the decision based on performance.
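
For example, a rough way to compare the two settings (your real workload
is the better test, and the numbers here are only placeholders):

    # initialize a scale-100 pgbench database, then run a 5 minute write test
    pgbench -i -s 100 testdb
    pgbench -c 16 -j 4 -T 300 testdb
    # flip the cache setting, rerun, and compare the reported tps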


Regarding the Crucial drive for logs:
As far as I'm aware, none of the Crucial drives have power loss
protection. To use these drives you would want to disable the disk
cache, which would drop your performance a fair bit.


Write amplification:
I wouldn't expect write amplification to be a serious issue unless you
hit every LBA on the device early in its life and never execute TRIM.
This is one of the reasons software RAID can be a better solution for
something like this: MDADM supports TRIM in RAID devices. So unless
you run the drives above 90% full, the write amplification would be
minimal so long as you have a daily fstrim cron job.
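
For example (mount point and schedule are just placeholders):

    # check that discards are passed down through md to the SSDs
    lsblk --discard
    # manual one-off trim of the database filesystem
    fstrim -v /var/lib/pgsql
    # daily job, e.g. via a file in /etc/cron.d
    echo '30 3 * * * root /sbin/fstrim /var/lib/pgsql' > /etc/cron.d/fstrim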

Wes Vaske | Senior Storage Solutions Engineer
Micron Technology



Re: Tuning guidelines for server with 256GB of RAM and SSDs?

From: Merlin Moncure
On Wed, Jul 6, 2016 at 4:48 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Wed, Jul 6, 2016 at 12:13 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> Disabling the write back cache for write heavy database loads will
>> destroy the drive in short order due to write amplification and will
>> generally cause it to underperform hard drives in my experience.
>
> Interesting. We found our best performance with a RAID-5 of ten 800GB
> SSDs (Intel 3500/3700 series): we got MUCH faster performance with all
> write caching turned off on our LSI MegaRAID controllers. We went from
> 3-4k tps to 15-18k tps. And after a year of hard use we still show
> ~90% life left (these machines handle thousands of writes per second
> in real use). It could be that the caching was getting in the way of
> RAID calcs or some other issue. With RAID-1 I have no clue what the
> performance will be with write cache on or off.

Right -- by that I meant disabling the write back cache on the drive
itself, so that all writes are immediately flushed.  Disabling write
back on the raid controller should be the right choice; each of these
drives essentially is a 'caching raid controller' for all intents and
purposes.  Hardware raid controllers are engineered around performance
and reliability assumptions that are no longer correct in an SSD
world.  Personally I would have plugged the drives directly into the
motherboard (assuming it's got enough lanes), built the array with
mdadm, and compared.
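
i.e. something along these lines (device names and filesystem are just
examples):

    # mirror the two data SSDs in software and benchmark against the HW raid
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda /dev/sdb
    mkfs.xfs /dev/md0
    mount -o noatime /dev/md0 /var/lib/pgsql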

merlin


Re: Tuning guidelines for server with 256GB of RAM and SSDs?

From: Scott Marlowe
On Thu, Jul 7, 2016 at 10:27 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Wed, Jul 6, 2016 at 4:48 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
>> On Wed, Jul 6, 2016 at 12:13 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
>>> Disabling the write back cache for write heavy database loads will
>>> destroy the drive in short order due to write amplification and will
>>> generally cause it to underperform hard drives in my experience.
>>
>> Interesting. We found our best performance with a RAID-5 of ten 800GB
>> SSDs (Intel 3500/3700 series): we got MUCH faster performance with all
>> write caching turned off on our LSI MegaRAID controllers. We went from
>> 3-4k tps to 15-18k tps. And after a year of hard use we still show
>> ~90% life left (these machines handle thousands of writes per second
>> in real use). It could be that the caching was getting in the way of
>> RAID calcs or some other issue. With RAID-1 I have no clue what the
>> performance will be with write cache on or off.
>
> Right -- by that I meant disabling the write back cache on the drive
> itself, so that all writes are immediately flushed.  Disabling write
> back on the raid controller should be the right choice; each of these
> drives essentially is a 'caching raid controller' for all intents and
> purposes.  Hardware raid controllers are engineered around performance
> and reliability assumptions that are no longer correct in an SSD
> world.  Personally I would have plugged the drives directly into the
> motherboard (assuming it's got enough lanes), built the array with
> mdadm, and compared.

Oh yeah, definitely. And yeah, we've found that mdadm and raw HBAs work
better than most RAID controllers for SSDs.


Re: Tuning guidelines for server with 256GB of RAM and SSDs?

From: Kaixi Luo

> Regarding write back cache:
> Disabling the write back cache won't have a real large impact on the
> endurance of the drive unless it reduces the total number of bytes
> written (which it won't). I've seen drives that perform better with it
> disabled and drives that perform better with it enabled. I would test
> in your environment and make the decision based on performance.


Thanks. I assume you are referring to the write back cache on the RAID controller here and not the disk cache itself.

Kaixi