Re: Deploying PostgreSQL on CentOS with SSD and Hardware RAID - Mailing list pgsql-general

From Holger Hoffstaette
Subject Re: Deploying PostgreSQL on CentOS with SSD and Hardware RAID
Msg-id pan.2013.05.21.11.27.08.144987@googlemail.com
In response to Deploying PostgreSQL on CentOS with SSD and Hardware RAID  (Matt Brock <mb@mattbrock.co.uk>)
List pgsql-general
On Tue, 21 May 2013 11:40:55 +1000, Toby Corkindale wrote:

>>> While it is important to let the SSD know about space that can be
>>> reclaimed, I gather the operation does not perform well. I *think*
>>> current advice is to leave 'discard' off the mount options, and instead
>>> run a nightly cron job to call 'fstrim' on the mount point instead. (In
>>> really high write situations, you'd be looking at calling that every
>>> hour instead I suppose)

This is still a good idea - see below.
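
For what it's worth, the nightly job could look roughly like the sketch below. It is only an illustration: the mount point /var/lib/pgsql and the function names are my assumptions, not something from this thread. The only external thing it relies on is the fstrim tool from util-linux and its -v flag. It defaults to a dry run, because a real trim needs root.

```python
import subprocess
import sys

# Assumption: the PostgreSQL data volume is mounted here; adjust to your layout.
MOUNT_POINT = "/var/lib/pgsql"


def build_fstrim_command(mount_point, verbose=True):
    """Return the argv list for a batched TRIM of one mounted filesystem."""
    cmd = ["fstrim"]
    if verbose:
        cmd.append("-v")  # ask fstrim to report how many bytes were trimmed
    cmd.append(mount_point)
    return cmd


def trim(mount_point, dry_run=True):
    """Run fstrim (needs root), or just return the command when dry_run is set."""
    cmd = build_fstrim_command(mount_point)
    if dry_run:
        return " ".join(cmd)
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout.strip()


if __name__ == "__main__":
    # Pass --run to actually trim; the default only prints the command.
    print(trim(MOUNT_POINT, dry_run="--run" not in sys.argv))
```

Saved under some path of your choosing and called with --run from root's crontab once a night (or hourly for really write-heavy boxes), this keeps 'discard' out of the mount options while still telling the drive which blocks are free.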

>> The guy who blogged about this a couple of years ago was using a
>> Sandforce controller drive.

Btw that doesn't mean anything (in terms of either performance or
stability), since "the controller" also has to be paired with firmware,
which is often vendor-specific and matters much more. Since LSI acquired
Sandforce this situation has gotten much better (unified upstream).

>> I'm not sure there is a similar issue with other drives. Certainly we've

There is (now), because..

>> never noticed a problematic delay in file deletes. That said, our
>> applications don't delete files too often (log file purging is probably
>> the only place it happens regularly).
>>
>> Personally, in the absence of a clear and present issue, I'd prefer to
>> go the "kernel guys and drive firmware guys will take care of this"
>> route, and just enable discard on the mount.

Nope, wrong, because.. (..getting there :)

> That is from 2011 though, so you're right that things may have improved by
> now.. Has anyone seen benchmarks supporting that though?

Unfortunately, since kernel 3.8 discards are issued as synchronous
commands, which effectively disables any scheduling/merging etc. The
result is easy to see:

- mount drive without discard using kernel >= 3.8
- unpack kernel source
- time delete of entire tree

- remount with discard
- unpack kernel tree
- start delete of tree
- ...
- check it hasn't crashed
- ...
- go plant a tree or make babies while waiting for it to finish
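
If you don't want to wait for a kernel tarball, the same comparison can be sketched with a synthetic tree of small files. This is a rough stand-in under my own assumptions (file counts, sizes, and the function names are made up for illustration), not the exact test described above. Run it once against a directory on the filesystem mounted without discard, then again after remounting with discard, and compare the two timings.

```python
import os
import shutil
import sys
import tempfile
import time


def make_tree(root, dirs=50, files_per_dir=100, size=4096):
    """Create dirs * files_per_dir small files under root; return the file count."""
    payload = b"x" * size
    count = 0
    for d in range(dirs):
        subdir = os.path.join(root, "dir%04d" % d)
        os.makedirs(subdir)
        for f in range(files_per_dir):
            with open(os.path.join(subdir, "file%04d" % f), "wb") as fh:
                fh.write(payload)
            count += 1
    os.sync()  # flush to disk so the delete is not just undoing cached writes
    return count


def time_delete(root):
    """Delete the whole tree and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    shutil.rmtree(root)
    os.sync()  # include the time needed to push the deletes to the device
    return time.perf_counter() - start


if __name__ == "__main__":
    # Optional first argument: a directory on the filesystem under test.
    have_dir = len(sys.argv) > 1 and os.path.isdir(sys.argv[1])
    base = sys.argv[1] if have_dir else tempfile.gettempdir()
    root = tempfile.mkdtemp(dir=base)
    files = make_tree(root)
    print("deleted %d files in %.2f s" % (files, time_delete(root)))
```

The defaults are deliberately small. A kernel tree has tens of thousands of files, so raise dirs and files_per_dir if you want numbers closer to the real thing.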

Online discard has gotten so slow that it's now a good idea to turn it
off for anything but light write workloads. Metadata-heavy writes are
obviously the worst case.

I experienced this on Samsung, Intel & Sandforce-based drives, so "the
controller" is no longer the primary reason for the performance impact.
Extremely enterprisey drives *might* behave slightly better, but I doubt
it; flash erase cycles are what they are.

-h

