Thread: Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD

Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD

From
Michael March
Date:
If anyone is interested I just completed a series of benchmarks of stock Postgresql running on a normal HDD vs a SSD.  

If you don't want to read the post, the summary is that SSDs are 5 to 7 times faster than a 7200RPM HDD drive under a pgbench load.

http://it-blog.5amsolutions.com/2010/08/performance-of-postgresql-ssd-vs.html

Is this what everyone else is seeing?

Thanks!

Re: Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD

From
Scott Carey
Date:
SSD's actually vary quite a bit with typical postgres benchmark workloads.  Many of them also do not guarantee data that has been sync'd will not be lost if power fails (most hard drives with a sane OS and file system do).


On Aug 7, 2010, at 4:47 PM, Michael March wrote:

If anyone is interested I just completed a series of benchmarks of stock Postgresql running on a normal HDD vs a SSD.  

If you don't want to read the post, the summary is that SSDs are 5 to 7 times faster than a 7200RPM HDD drive under a pgbench load.


Is this what everyone else is seeing?

Thanks!


Re: Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD

From
Michael March
Date:

SSD's actually vary quite a bit with typical postgres benchmark workloads.

You mean various SSDs from different vendors? Or are you saying the same SSD model might vary in performance from drive to drive?
 
 Many of them also do not guarantee data that has been sync'd will not be lost if power fails (most hard drives with a sane OS and file system do).

What feature does an SSD need to have to ensure that sync'd data is indeed written to the SSD in the case of power loss?

 


On Aug 7, 2010, at 4:47 PM, Michael March wrote:

If anyone is interested I just completed a series of benchmarks of stock Postgresql running on a normal HDD vs a SSD.  

If you don't want to read the post, the summary is that SSDs are 5 to 7 times faster than a 7200RPM HDD drive under a pgbench load.


Is this what everyone else is seeing?

Thanks!

Re: Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD

From
Scott Marlowe
Date:
On Sat, Aug 7, 2010 at 5:47 PM, Michael March <mmarch@gmail.com> wrote:
> If anyone is interested I just completed a series of benchmarks of stock
> Postgresql running on a normal HDD vs a SSD.
> If you don't want to read the post, the summary is that SSDs are 5 to 7
> times faster than a 7200RPM HDD drive under a pgbench load.
> http://it-blog.5amsolutions.com/2010/08/performance-of-postgresql-ssd-vs.html
>
> Is this what everyone else is seeing?
> Thanks!

It's a good first swing, but I'd be interested in seeing how it runs
with various numbers of clients, and how it runs as the number of
clients goes past optimal.  I.e. a nice smooth downward trend or a
horrible drop-off for whichever drives.

--
To understand recursion, one must first understand recursion.

Re: Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD

From
Michael March
Date:
On Sat, Aug 7, 2010 at 5:47 PM, Michael March <mmarch@gmail.com> wrote:
> If anyone is interested I just completed a series of benchmarks of stock
> Postgresql running on a normal HDD vs a SSD.
> If you don't want to read the post, the summary is that SSDs are 5 to 7
> times faster than a 7200RPM HDD drive under a pgbench load.
> http://it-blog.5amsolutions.com/2010/08/performance-of-postgresql-ssd-vs.html
>
> Is this what everyone else is seeing?
> Thanks!

It's a good first swing, but I'd be interested in seeing how it runs
with various numbers of clients, and how it runs as the number of
clients goes past optimal.  I.e. a nice smooth downward trend or a
horrible drop-off for whichever drives.


Yeah. I was thinking the same thing.. 

I need to automate the tests so I can incrementally increase the scaling of the seed tables and the number of simultaneous clients over time. Put another way, I need to do A LOT more tests that will gently increment all the testable factors one small step at a time.
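
Roughly, I'm picturing a driver script along these lines (just a sketch -- the scale factors, client counts and database name here are placeholders, not what I actually ran):

    #!/bin/sh
    # sweep pgbench over several scale factors and client counts,
    # re-initializing the seed tables for each scale
    for scale in 10 100 1000; do
        pgbench -i -s $scale pgbench
        for clients in 1 2 4 8 16 32 64; do
            pgbench -c $clients -t 10000 pgbench >> results-s$scale.txt
        done
    done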

Re: Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD

From
Scott Marlowe
Date:
On Sun, Aug 8, 2010 at 12:49 AM, Michael March <mmarch@gmail.com> wrote:
>
>> SSD's actually vary quite a bit with typical postgres benchmark workloads.
>
> You mean various SSDs from different vendors? Or are you saying the same SSD
> model might vary in performance from drive to drive?
>
>>
>>  Many of them also do not guarantee data that has been sync'd will not be
>> lost if power fails (most hard drives with a sane OS and file system do).
>
> What feature does an SSD need to have to ensure that sync'd data is indeed
> written to the SSD in the case of power loss?

A big freaking capacitor and the ability to notice power's been cut
and start writing out the cache.  There are a few that have it that
are coming out right about now.  There was a post about one such drive
a few days ago; it was like 50G and $450 or so, so not cheap, but not
that bad compared to the $7000 drive bays with 16 15k6 drives I've
used in the past to get good performance (3 to 4k tps).

--
To understand recursion, one must first understand recursion.

Re: Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD

From
Yeb Havinga
Date:
Michael March wrote:
> If anyone is interested I just completed a series of benchmarks of
> stock Postgresql running on a normal HDD vs a SSD.
>
> If you don't want to read the post, the summary is that SSDs are 5 to
> 7 times faster than a 7200RPM HDD drive under a pgbench load.
>
> http://it-blog.5amsolutions.com/2010/08/performance-of-postgresql-ssd-vs.html
>
> Is this what everyone else is seeing?
I tested a SSD with a capacitor and posted conclusions here
http://archives.postgresql.org/pgsql-performance/2010-07/msg00449.php

regards,
Yeb Havinga


Re: Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD

From
Scott Carey
Date:

On Aug 7, 2010, at 11:49 PM, Michael March wrote:


SSD's actually vary quite a bit with typical postgres benchmark workloads.

You mean various SSDs from different vendors? Or are you saying the same SSD model might vary in performance from drive to drive?

Model to model (more specifically, controller chip to controller chip -- i.e. most 'Indilinx Barefoot' controller based SSD's perform similarly).

 
 Many of them also do not guarantee data that has been sync'd will not be lost if power fails (most hard drives with a sane OS and file system do).

What feature does an SSD need to have to ensure that sync'd data is indeed written to the SSD in the case of power loss?

 

Either properly flush to storage when the OS / file system asks for it (most SSD's don't, most hard drives do), or have a supercapacitor to flush data on power loss.

The former can be achieved by turning off the write cache on some drives (such as Intel's X25-M and -E), but hurts performance.

Also, the amount of data at risk in a power loss varies between drives.  For Intel's drives, it's a small chunk of data ( < 256K).  For some other drives, the cache can be over 30MB of outstanding writes.
For some workloads this is acceptable -- not every application is doing financial transactions.   Not every part of the system needs to be on an SSD either -- the WAL, and various table spaces can all have different data integrity and performance requirements.
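
On Linux the drive-level write cache can usually be toggled with hdparm -- a sketch only, the device name is a placeholder, and whether a given SSD actually honors the setting is a separate question:

    hdparm -W 0 /dev/sdb   # turn the on-drive write cache off
    hdparm -W /dev/sdb     # read back the current write-cache setting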



On Aug 7, 2010, at 4:47 PM, Michael March wrote:

If anyone is interested I just completed a series of benchmarks of stock Postgresql running on a normal HDD vs a SSD.  

If you don't want to read the post, the summary is that SSDs are 5 to 7 times faster than a 7200RPM HDD drive under a pgbench load.


Is this what everyone else is seeing?

Thanks!


Re: Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD

From
Jeff Davis
Date:
On Mon, 2010-08-09 at 09:49 -0700, Scott Carey wrote:
> Also, the amount of data at risk in a power loss varies between
> drives.  For Intel's drives, its a small chunk of data ( < 256K).  For
> some other drives, the cache can be over 30MB of outstanding writes.
> For some workloads this is acceptable -- not every application is
> doing financial transactions.   Not every part of the system needs to
> be on an SSD either -- the WAL, and various table spaces can all have
> different data integrity and performance requirements.

I don't think it makes sense to speak about the data integrity of a
drive in terms of the amount of data at risk, especially with a DBMS.
Depending on which 256K you lose, you might as well lose your entire
database.

That may be an exaggeration, but the point is that it's not as simple as
"this drive is only risking 256K data loss per outage".

Regards,
    Jeff Davis



Re: Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD

From
Greg Smith
Date:
Scott Carey wrote:
> Also, the amount of data at risk in a power loss varies between
> drives.  For Intel's drives, its a small chunk of data ( < 256K).  For
> some other drives, the cache can be over 30MB of outstanding writes.
> For some workloads this is acceptable

No, it isn't ever acceptable.  You can expect that the type of data loss
you get when a cache fails to honor write flush calls will result in
catastrophic database corruption.  It's not "I lost the last few
seconds"; it's "the database is corrupted and won't start" after a
crash.  This is why we pound on this topic on this list.  An SSD that
fails to honor flush requests is completely worthless for anything other
than toy databases.  You can expect significant work to recover any
portion of your data after the first unexpected power loss under heavy
write load in this environment, during which you're down.  We do
database corruption recovery at 2ndQuadrant; while I can't talk about
the details of some recent incidents, I am not speaking theoretically
when I warn about this.

Michael, I would suggest you read
http://www.postgresql.org/docs/current/static/wal-reliability.html and
link to it at the end of your article.  You are recommending that people
consider a configuration that will result in their data being lost.
That can be acceptable, if for example your data is possible to recreate
from backups or the like.  But people should be extremely clear that
trade-off is happening, and your blog post is not doing that yet.  Part
of the reason for the bang per buck you're seeing here is that cheap
SSDs are cheating.
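
As a starting point, the usual reliability-related parameters that page covers can be inspected on a running server with something like this (illustrative only; substitute your own database name):

    psql -d yourdb -c "SELECT name, setting FROM pg_settings
      WHERE name IN ('fsync', 'synchronous_commit', 'full_page_writes', 'wal_sync_method')"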

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us


Re: Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD

From
Greg Smith
Date:
Jeff Davis wrote:
> Depending on which 256K you lose, you might as well lose your entire
> database.
>

Let's be nice and assume that you only lose one 8K block because of the
SSD write cache; that's not so bad, right?  Guess what--you could easily
be the next lucky person who discovers the corrupted block is actually
in the middle of the pg_class system catalog, where the list of tables
in the database lives!  Enjoy getting your data back again with that
piece missing.  It's really a fun time, I'll tell you that.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us


Re: Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD

From
Brad Nicholson
Date:
  On 8/10/2010 12:21 PM, Greg Smith wrote:
> Scott Carey wrote:
>> Also, the amount of data at risk in a power loss varies between
>> drives.  For Intel's drives, its a small chunk of data ( < 256K).
>> For some other drives, the cache can be over 30MB of outstanding writes.
>> For some workloads this is acceptable
>
> No, it isn't ever acceptable.  You can expect the type of data loss
> you get when a cache fails to honor write flush calls results in
> catastrophic database corruption.  It's not "I lost the last few
> seconds"; it's "the database is corrupted and won't start" after a
> crash.  This is why we pound on this topic on this list.  A SSD that
> fails to honor flush requests is completely worthless for anything
> other than toy databases.  You can expect significant work to recover
> any portion of your data after the first unexpected power loss under
> heavy write load in this environment, during which you're down.  We do
> database corruption recovery at 2ndQuadrant; while I can't talk about
> the details of some recent incidents, I am not speaking theoretically
> when I warn about this.
>

What about putting indexes on them?  If the drive fails and drops writes
on those, they could be rebuilt - assuming your system can function
without the index(es) temporarily.


--
Brad Nicholson  416-673-4106
Database Administrator, Afilias Canada Corp.


Re: Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD

From
Karl Denninger
Date:
Brad Nicholson wrote:
>  On 8/10/2010 12:21 PM, Greg Smith wrote:
>> Scott Carey wrote:
>>> Also, the amount of data at risk in a power loss varies between
>>> drives.  For Intel's drives, its a small chunk of data ( < 256K).
>>> For some other drives, the cache can be over 30MB of outstanding
>>> writes.
>>> For some workloads this is acceptable
>>
>> No, it isn't ever acceptable.  You can expect the type of data loss
>> you get when a cache fails to honor write flush calls results in
>> catastrophic database corruption.  It's not "I lost the last few
>> seconds"; it's "the database is corrupted and won't start" after a
>> crash.  This is why we pound on this topic on this list.  A SSD that
>> fails to honor flush requests is completely worthless for anything
>> other than toy databases.  You can expect significant work to recover
>> any portion of your data after the first unexpected power loss under
>> heavy write load in this environment, during which you're down.  We
>> do database corruption recovery at 2ndQuadrant; while I can't talk
>> about the details of some recent incidents, I am not speaking
>> theoretically when I warn about this.
>
> What about putting indexes on them?  If the drive fails and drops
> writes on those, they could be rebuilt - assuming your system can
> function without the index(es) temporarily.
You could put indices on them but as noted by Scott, he's SPOT ON.

ANY disk that says "write is complete" when it really is not is entirely
unsuitable for ANY real database use.  It is simply a matter of time
before you have a failure of some sort that results in catastrophic data
loss.  If you're LUCKY the database won't start and you know you're in
trouble.  If you're UNLUCKY the database DOES start but there is
undetected and unrecoverable data corruption somewhere inside the data
tables, which you WILL discover at the most-inopportune moment (like
when you desperately NEED that business record for some reason.)  You
cannot put either the tables or the logs on such a drive without running
the risk of a data corruption problem that WILL lose data and MAY be
catastrophic, depending on exactly what fails when.

While it is possible to recover whatever is not damaged from a
database that has corruption like this, it simply is not possible to
recover data that never made it to the disk - no matter what you do -
and the time and effort (not to mention money, if you have to
bring in someone with specialized skills you do not possess) that result
from such "decisions" when things go wrong are extreme.

Don't do it.

-- Karl


Re: Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD

From
Scott Marlowe
Date:
On Tue, Aug 10, 2010 at 12:13 PM, Karl Denninger <karl@denninger.net> wrote:

> ANY disk that says "write is complete" when it really is not is entirely
> unsuitable for ANY real database use.  It is simply a matter of time

What about read-only slaves where there's a master with 100+ spinning
hard drives "getting it right" and you need a half dozen or so read
slaves?  I can imagine that being ok, as long as you don't restart a
server after a crash without checking on it.

Re: Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD

From
Greg Smith
Date:
Brad Nicholson wrote:
> What about putting indexes on them?  If the drive fails and drops
> writes on those, they could be rebuilt - assuming your system can
> function without the index(es) temporarily.

Dumping indexes on SSD is one of the better uses for them, presuming you
can survive what is likely to be an outage from a "can the site handle
full load?" perspective while they rebuild after a crash.  As I'm sure
Brad is painfully aware of already, index rebuilding in PostgreSQL can
take a while.  To spin my broken record here again, the main thing to
note when you consider that idea--relocating indexes onto SSD--is that
the ones you are most concerned about the performance of were likely to
be sitting in RAM already anyway, meaning the SSD speedup doesn't help
reads much.  So the giant performance boost just isn't there in that case.
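
If someone does want to try it, the mechanics are just a tablespace plus a rebuild after any suspect shutdown; a sketch, with made-up paths and index names:

    psql -d yourdb -c "CREATE TABLESPACE ssd_idx LOCATION '/ssd/pg_indexes'"
    psql -d yourdb -c "ALTER INDEX orders_created_idx SET TABLESPACE ssd_idx"
    # after an unclean shutdown, treat anything on the cheap SSD as suspect:
    psql -d yourdb -c "REINDEX INDEX orders_created_idx"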

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us


Re: Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD

From
Karl Denninger
Date:
Scott Marlowe wrote:
On Tue, Aug 10, 2010 at 12:13 PM, Karl Denninger <karl@denninger.net> wrote: 
ANY disk that says "write is complete" when it really is not is entirely
unsuitable for ANY real database use.  It is simply a matter of time   
What about read only slaves where there's a master with 100+spinning
hard drives "getting it right" and you need a half dozen or so read
slaves?  I can imagine that being ok, as long as you don't restart a
server after a crash without checking on it. 
A read-only slave isn't read-only, is it?

I mean, c'mon - how does the data get there?

IF you mean "a server that only accepts SELECTs, does not accept UPDATEs or INSERTs, and on a crash **reloads the entire database from the master**", then ok.

Most people who will do this won't reload it after a crash.  They'll "inspect" the database and say "ok", and put it back online.  Bad Karma will ensue in the future.

Incidentally, that risk is not theoretical either (I know about this one from hard experience.  Fortunately the master was still ok and I was able to force a full-table copy.... I didn't like it as the database was a few hundred GB, but I had no choice.)

-- Karl

Re: Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD

From
Brad Nicholson
Date:
  On 8/10/2010 2:28 PM, Greg Smith wrote:
> Brad Nicholson wrote:
>> What about putting indexes on them?  If the drive fails and drops
>> writes on those, they could be rebuilt - assuming your system can
>> function without the index(es) temporarily.
>
> Dumping indexes on SSD is one of the better uses for them, presuming
> you can survive what is likely to be an outage from a "can the site
> handle full load?" perspective while they rebuild after a crash.  As
> I'm sure Brad is painfully aware of already, index rebuilding in
> PostgreSQL can take a while.  To spin my broken record here again, the
> main thing to note when you consider that--relocate indexes onto
> SSD--is that the ones you are most concerned about the performance of
> were likely to be already sitting in RAM anyway, meaning the SSD
> speedup doesn't help reads much.  So the giant performance boost just
> isn't there in that case.
>
The case where I'm thinking they may be of use is for indexes you can
afford to lose.  I'm thinking of ones that are needed by nightly batch
jobs, downstream systems or reporting - the sorts of things that you
can turn off during a rebuild, and where the data sets are not likely
to be in cache.

We have a few such cases, but we don't need the speed of SSD's for them.

Personally, I wouldn't entertain any SSD without a capacitor backing it for
anything, even indexes.  Not worth the hassle to me.

--
Brad Nicholson  416-673-4106
Database Administrator, Afilias Canada Corp.


Re: Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD

From
Brad Nicholson
Date:
On 8/10/2010 2:38 PM, Karl Denninger wrote:
Scott Marlowe wrote:
On Tue, Aug 10, 2010 at 12:13 PM, Karl Denninger <karl@denninger.net> wrote: 
ANY disk that says "write is complete" when it really is not is entirely
unsuitable for ANY real database use.  It is simply a matter of time   
What about read only slaves where there's a master with 100+spinning
hard drives "getting it right" and you need a half dozen or so read
slaves?  I can imagine that being ok, as long as you don't restart a
server after a crash without checking on it. 
A read-only slave isn't read-only, is it?

I mean, c'mon - how does the data get there?


A valid case is a Slony replica if used for query offloading (not for DR).  It's considered a read-only subscriber from the perspective of Slony as only Slony can modify the data  (although you are technically correct, it is not read only - controlled write may be more accurate). 

 In case of failure, a rebuild + resubscribe gets you back to the same consistency.  If you have high IO requirements, and don't have the budget to rack up extra disk arrays to meet them, it could be an option.
-- 
Brad Nicholson  416-673-4106
Database Administrator, Afilias Canada Corp.

Re: Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD

From
Karl Denninger
Date:
Brad Nicholson wrote:
On 8/10/2010 2:38 PM, Karl Denninger wrote:
Scott Marlowe wrote:
On Tue, Aug 10, 2010 at 12:13 PM, Karl Denninger <karl@denninger.net> wrote: 
ANY disk that says "write is complete" when it really is not is entirely
unsuitable for ANY real database use.  It is simply a matter of time       
What about read only slaves where there's a master with 100+spinning
hard drives "getting it right" and you need a half dozen or so read
slaves?  I can imagine that being ok, as long as you don't restart a
server after a crash without checking on it. 
A read-only slave isn't read-only, is it?

I mean, c'mon - how does the data get there?

A valid case is a Slony replica if used for query offloading (not for DR).  It's considered a read-only subscriber from the perspective of Slony as only Slony can modify the data  (although you are technically correct, it is not read only - controlled write may be more accurate). 

 In case of failure, a rebuild + resubscribe gets you back to the same consistency.  If you have high IO requirements, and don't have the budget to rack up extra disk arrays to meet them, it could be an option.
CAREFUL with that model and beliefs.

Specifically, the following will hose you without warning:

1. SLONY gets a change on the master.
2. SLONY commits it to the (read-only) slave.
3. Confirmation comes back to the master that the change was propagated.
4. Slave CRASHES without actually committing the changed data to stable storage.

When the slave restarts it will not know that the transaction was lost.  Neither will the master, since it was told that it was committed.  Slony will happily go on its way and replicate forward, without any indication of a problem - except that on the slave, there are one or more transactions that are **missing**.

Some time later you issue an update that goes to the slave, but the change previously lost causes the slave commit to violate referential integrity.   SLONY will fail to propagate that change and all behind it - it effectively locks at that point in time.

You can recover from this by dropping the slave from replication and re-inserting it, but that forces a full-table copy of everything in the replication set.  The bad news is that the queries to the slave in question may have been returning erroneous data for some unknown period of time prior to the lockup in replication (which hopefully you detect reasonably quickly - you ARE watching SLONY queue depth with some automated process, right?)

I can both cause this in the lab and have had it happen in the field.  It's a nasty little problem that bit me on a series of disks that claimed to have write caching off, but in fact did not.  I was very happy that the data on the master was good at that point, as if I had needed to failover to the slave (thinking it was a "good" copy) I would have been in SERIOUS trouble.
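
For the automated queue-depth check, by the way, something as simple as polling sl_status from cron is enough; a sketch, where "_replication" stands in for whatever your actual cluster schema is called:

    psql -d yourdb -c "SELECT st_origin, st_received, st_lag_num_events, st_lag_time
      FROM _replication.sl_status"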

-- Karl

Re: Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD

From
Brad Nicholson
Date:
On 8/10/2010 3:28 PM, Karl Denninger wrote:
Brad Nicholson wrote:
On 8/10/2010 2:38 PM, Karl Denninger wrote:
Scott Marlowe wrote:
On Tue, Aug 10, 2010 at 12:13 PM, Karl Denninger <karl@denninger.net> wrote: 
ANY disk that says "write is complete" when it really is not is entirely
unsuitable for ANY real database use.  It is simply a matter of time       
What about read only slaves where there's a master with 100+spinning
hard drives "getting it right" and you need a half dozen or so read
slaves?  I can imagine that being ok, as long as you don't restart a
server after a crash without checking on it. 
A read-only slave isn't read-only, is it?

I mean, c'mon - how does the data get there?

A valid case is a Slony replica if used for query offloading (not for DR).  It's considered a read-only subscriber from the perspective of Slony as only Slony can modify the data  (although you are technically correct, it is not read only - controlled write may be more accurate). 

 In case of failure, a rebuild + resubscribe gets you back to the same consistency.  If you have high IO requirements, and don't have the budget to rack up extra disk arrays to meet them, it could be an option.
CAREFUL with that model and beliefs.

Specifically, the following will hose you without warning:

1. SLONY gets a change on the master.
2. SLONY commits it to the (read-only) slave.
3. Confirmation comes back to the master that the change was propagated.
4. Slave CRASHES without actually committing the changed data to stable storage.

What will hose you is assuming that your data will be okay in the case of a failure, which is a very bad assumption to make in the case of unreliable SSD's.  You are assuming I am implying that these should be treated like reliable media - I am not.

In case of failure, you need to assume data loss until proven otherwise.  If there is a problem, rebuild.

When the slave restarts it will not know that the transaction was lost.  Neither will the master, since it was told that it was committed.  Slony will happily go on its way and replicate forward, without any indication of a problem - except that on the slave, there are one or more transactions that are **missing**.


Correct.
Some time later you issue an update that goes to the slave, but the change previously lost causes the slave commit to violate referential integrity.   SLONY will fail to propagate that change and all behind it - it effectively locks at that point in time.

It will lock data flow to that subscriber, but not to others.

You can recover from this by dropping the slave from replication and re-inserting it, but that forces a full-table copy of everything in the replication set.  The bad news is that the queries to the slave in question may have been returning erroneous data for some unknown period of time prior to the lockup in replication (which hopefully you detect reasonably quickly - you ARE watching SLONY queue depth with some automated process, right?)

There are ways around that - run two subscribers and redirect your queries on failure.  Don't bring up the failed replica until it is verified or rebuilt.

I can both cause this in the lab and have had it happen in the field.  It's a nasty little problem that bit me on a series of disks that claimed to have write caching off, but in fact did not.  I was very happy that the data on the master was good at that point, as if I had needed to failover to the slave (thinking it was a "good" copy) I would have been in SERIOUS trouble.

It's very easy to cause those sorts of problems.

What I am saying is that the technology can have a use, if you are aware of the sharp edges, and can both work around them and live with them.  Everything you are citing is correct, but it implies that these drives are being blindly thrown in without understanding the risks and mitigating them.

I'm also not suggesting that this is a configuration I would endorse, but it could potentially save a lot of money in certain use cases.

-- 
Brad Nicholson  416-673-4106
Database Administrator, Afilias Canada Corp.

Re: Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD

From
Scott Marlowe
Date:
On Tue, Aug 10, 2010 at 12:38 PM, Karl Denninger <karl@denninger.net> wrote:
> Scott Marlowe wrote:
>
> On Tue, Aug 10, 2010 at 12:13 PM, Karl Denninger <karl@denninger.net> wrote:
>
>
> ANY disk that says "write is complete" when it really is not is entirely
> unsuitable for ANY real database use.  It is simply a matter of time
>
>
> What about read only slaves where there's a master with 100+spinning
> hard drives "getting it right" and you need a half dozen or so read
> slaves?  I can imagine that being ok, as long as you don't restart a
> server after a crash without checking on it.
>
>
> A read-only slave isn't read-only, is it?
>
> I mean, c'mon - how does the data get there?

Well, duh.  However, what I'm looking at is having two big servers in
failover running on solid reliable hardware, and then a small army of
read-only Slony slaves that are used for things like sending user RSS
feeds and creating weekly reports and such.  These 1U machines with 12
to 24 cores and a single SSD drive are "disposable" in the sense that if
they ever crash, there's a simple script to run that reinits the db
and then subscribes them to the set.
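
The script is nothing fancy -- roughly along these lines (a sketch: the cluster name, node IDs, database and conninfo strings are all placeholders, and I'm glossing over the store node / store path bookkeeping):

    #!/bin/sh
    # recreate the slave database, reload the schema, then let Slony copy the data
    dropdb -h slave1 mydb
    createdb -h slave1 mydb
    pg_dump -s -h master mydb | psql -h slave1 mydb
    slonik <<EOF
    cluster name = mycluster;
    node 1 admin conninfo = 'dbname=mydb host=master';
    node 2 admin conninfo = 'dbname=mydb host=slave1';
    subscribe set (id = 1, provider = 1, receiver = 2, forward = no);
    EOF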

My point being, no matter how terrible an idea a certain storage media
is, there's always a use case for it.  Even if it's very narrow.

Re: Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD

From
Christopher Browne
Date:
On Tue, Aug 10, 2010 at 3:52 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> My point being, no matter how terrible an idea a certain storage media
> is, there's always a use case for it.  Even if it's very narrow.

The trouble is, if extra subscribers induce load on the "master,"
which they presumably will, then that sliver of "use case" may very
well get obscured by the cost, such that the sliver should be treated
as not existing :-(.
--
http://linuxfinances.info/info/linuxdistributions.html

Re: Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD

From
Scott Marlowe
Date:
On Tue, Aug 10, 2010 at 2:00 PM, Christopher Browne <cbbrowne@gmail.com> wrote:
> On Tue, Aug 10, 2010 at 3:52 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
>> My point being, no matter how terrible an idea a certain storage media
>> is, there's always a use case for it.  Even if it's very narrow.
>
> The trouble is, if extra subscribers induce load on the "master,"
> which they presumably will, then that sliver of "use case" may very
> well get obscured by the cost, such that the sliver should be treated
> as not existing :-(.

One master, one slave, master handles all writes, slave handles all of
the other subscribers.  I've run a setup like this with as many as 8
or so slaves at the bottom of the pile with no problems at all.

Re: Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD

From
Scott Carey
Date:
On Aug 10, 2010, at 9:21 AM, Greg Smith wrote:

> Scott Carey wrote:
>> Also, the amount of data at risk in a power loss varies between
>> drives.  For Intel's drives, its a small chunk of data ( < 256K).  For
>> some other drives, the cache can be over 30MB of outstanding writes.
>> For some workloads this is acceptable
>
> No, it isn't ever acceptable.  You can expect the type of data loss you
> get when a cache fails to honor write flush calls results in
> catastrophic database corruption.  It's not "I lost the last few
> seconds";

I never said it was.

> it's "the database is corrupted and won't start" after a
> crash.

Which is sometimes acceptable.  There is NO GUARANTEE that you won't lose data, ever.  An increase in the likelihood is an acceptable tradeoff in some situations, especially when it is small.  On ANY power loss event, with or without battery-backed caches and such, you should do a consistency check on the system proactively.  With less reliable hardware, that task becomes much more of a burden, and is much more likely to require restoring data from somewhere.

What is the likelihood that your RAID card fails, or that the battery that reported 'good health' only lasts 5 minutes and you lose data before power is restored?   What is the likelihood of human error?
Not that far off from the likelihood of power failure in a datacenter with redundant power.  One MUST have a DR plan.  Never assume that your perfect hardware won't fail.

> This is why we pound on this topic on this list.  A SSD that
> fails to honor flush requests is completely worthless for anything other
> than toy databases.

Overblown.  Not every DB and use case is a financial application or business critical app.   Many are not toys at all.
Slave, read-only DB's (or simply subset tablespaces) ...

Indexes. (per application, schema)
Tables. (per application, schema)
System tables / indexes.
WAL.

Each has different reliability requirements and consequences from losing recently written data.  Less than 8K lost can be fatal to the WAL, or to table data.  Corrupting some tablespaces is not a big deal.  Corrupting others is catastrophic.  The problem with the assertion that this hardware is worthless is that it implies that every user, every use case, is at the far end of the reliability requirement spectrum.

Yes, that can be a critical requirement for many, perhaps most, DB's.  But there are many uses for slightly unsafe storage systems.

> You can expect significant work to recover any
> portion of your data after the first unexpected power loss under heavy
> write load in this environment, during which you're down.  We do
> database corruption recovery at 2ndQuadrant; while I can't talk about
> the details of some recent incidents, I am not speaking theoretically
> when I warn about this.

I've done the single-user mode recover-system-tables-by-hand thing myself at 4AM, on a system with battery-backed RAID 10, redundant power, etc.  RAID cards die, and 10TB recovery times from backup are long.

It's a game of balancing your data loss tolerance with the likelihood of power failure.  Both of these variables are highly variable, and not just with 'toy' dbs.  If you know what you are doing, you can use 'fast but not completely safe' storage for many things safely.  Chance of loss is NEVER zero; do not assume that 'good' hardware is flawless.

Imagine a common internet case where synchronous_commit=false is fine.  Recovery from backups is a pain (but a daily snapshot is taken of the important tables, and weekly for easily recoverable other stuff).  If you expect one power-related failure every 2 years, it might be perfectly reasonable to use 'unsafe' SSD's in order to support high transaction load, on the risk that that once-every-2-years downtime is 12 hours long instead of 30 minutes, and includes losing up to a day's information.  Applications like this exist all over the place.
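
For that kind of application the knob is already there, either globally in postgresql.conf or per transaction; a sketch (the table name is invented for illustration):

    psql -d yourdb <<'EOF'
    BEGIN;
    SET LOCAL synchronous_commit TO off;  -- this transaction alone skips the commit-time flush wait
    INSERT INTO page_hits (url) VALUES ('/');
    COMMIT;
    EOF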


> --
> Greg Smith  2ndQuadrant US  Baltimore, MD
> PostgreSQL Training, Services and Support
> greg@2ndQuadrant.com   www.2ndQuadrant.us
>


Re: Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD

From
Scott Carey
Date:
On Aug 10, 2010, at 11:28 AM, Greg Smith wrote:

> Brad Nicholson wrote:
>> What about putting indexes on them?  If the drive fails and drops
>> writes on those, they could be rebuilt - assuming your system can
>> function without the index(es) temporarily.
>
> Dumping indexes on SSD is one of the better uses for them, presuming you
> can survive what is likely to be an outage from a "can the site handle
> full load?" perspective while they rebuild after a crash.  As I'm sure
> Brad is painfully aware of already, index rebuilding in PostgreSQL can
> take a while.  To spin my broken record here again, the main thing to
> note when you consider that--relocate indexes onto SSD--is that the ones
> you are most concerned about the performance of were likely to be
> already sitting in RAM anyway, meaning the SSD speedup doesn't help
> reads much.  So the giant performance boost just isn't there in that case.
>

For an OLTP type system, yeah.  But for DW/OLAP and batch processing the gains are pretty big.  Those indexes get kicked out of RAM and then pulled back in a lot.  I'm talking about a server with 72GB of RAM that can't keep enough indexes in memory to avoid a lot of random access.  Putting the indexes on an SSD has lowered the random I/O load on the other drives a lot, letting them get through sequential scans a lot faster.

Estimated power failure, once every 18 months (mostly due to human error).  Rebuild indexes offline for 40 minutes every 18 months?  No problem.


> --
> Greg Smith  2ndQuadrant US  Baltimore, MD
> PostgreSQL Training, Services and Support
> greg@2ndQuadrant.com   www.2ndQuadrant.us
>


Re: Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD

From
Scott Carey
Date:

On Aug 10, 2010, at 11:38 AM, Karl Denninger wrote:

Scott Marlowe wrote:
On Tue, Aug 10, 2010 at 12:13 PM, Karl Denninger <karl@denninger.net> wrote: 
ANY disk that says "write is complete" when it really is not is entirely
unsuitable for ANY real database use.  It is simply a matter of time   
What about read only slaves where there's a master with 100+spinning
hard drives "getting it right" and you need a half dozen or so read
slaves?  I can imagine that being ok, as long as you don't restart a
server after a crash without checking on it. 
A read-only slave isn't read-only, is it?

I mean, c'mon - how does the data get there?

IF you mean "a server that only accepts SELECTs, does not accept UPDATEs or INSERTs, and on a crash **reloads the entire database from the master**", then ok.


"ENTIRE database"?

Depends on your tablespace setup and schema usage pattern.  
If:
* 90% of your data tables are partitioned by date, and untouched a week after insert.  Partitions are backed up incrementally.
* The remaining 10% of it is backed up daily, and of that 9% can be re-generated from data elsewhere if data is lost.
* System catalog and wal are on 'safest of safe' hardware.

Then your 'bulk' data on a slave can be on less than flawless hardware.  Simply restore the tables from the last week from the master or backup when the (rare) power failure occurs.  The remaining data is safe, since it is not written to.
Split up your 10% of non-date partitioned data into what needs to be on safe hardware and what does not (maybe some indexes, etc).

Most of the time, the incremental cost of getting a BBU is too small to not do it, so the above hardly applies.  But if you have data that is known to be read-only, you can do many unconventional things with it safely.
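
Mechanically that's just tablespaces; a sketch with invented names and mount points:

    psql -d yourdb -c "CREATE TABLESPACE bulk_ssd LOCATION '/cheap_ssd/pg_bulk'"
    # a closed date partition: already backed up, never written again
    psql -d yourdb -c "ALTER TABLE events_2010_07 SET TABLESPACE bulk_ssd"
    # pg_xlog and the system catalogs stay on the trustworthy array by simply
    # initdb'ing the cluster there and only pointing bulk tablespaces at the SSD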


Most people who will do this won't reload it after a crash.  They'll "inspect" the database and say "ok", and put it back online.  Bad Karma will ensue in the future.

Anyone going with something unconventional better know what they are doing and not just blindly plug it in and think everything will be OK.  I'd never recommend unconventional setups for a user that wasn't an expert and understood the tradeoff.


Incidentally, that risk is not theoretical either (I know about this one from hard experience.  Fortunately the master was still ok and I was able to force a full-table copy.... I didn't like it as the database was a few hundred GB, but I had no choice.)


Been there with 10TB with hardware that should have been perfectly safe.  5 days of copying, and wishing that pg_dump supported lzo compression so that the dump portion had a chance of keeping up with the much faster restore portion while still having some compression on to save the copy bandwidth.

-- Karl
<karl.vcf>

Re: Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD

From
Karl Denninger
Date:
Scott Carey wrote:
>
> On Aug 10, 2010, at 11:38 AM, Karl Denninger wrote:
>
> .....
>>
>> Most people who will do this won't reload it after a crash.  They'll
>> "inspect" the database and say "ok", and put it back online.  Bad
>> Karma will ensue in the future.
>
> Anyone going with something unconventional better know what they are
> doing and not just blindly plug it in and think everything will be OK.
>  I'd never recommend unconventional setups for a user that wasn't an
> expert and understood the tradeoff.
True.
>>
>> Incidentally, that risk is not theoretical either (I know about this
>> one from hard experience.  Fortunately the master was still ok and I
>> was able to force a full-table copy.... I didn't like it as the
>> database was a few hundred GB, but I had no choice.)
>
> Been there with 10TB with hardware that should have been perfectly
> safe.  5 days of copying, and wishing that pg_dump supported lzo
> compression so that the dump portion had a chance at keeping up with
> the much faster restore portion with some level of compression on to
> save the copy bandwidth.
Pipe it through ssh -C

PS: This works for SLONY and Bucardo too - set up a tunnel and then
change the port temporarily.    This is especially useful when the DB
being COPY'd across has big fat honking BYTEA fields in it, which
otherwise expand about 400% - or more - on the wire.
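
Something like either of these, with host and database names as placeholders:

    # pipe the copy itself through compressed ssh:
    pg_dump -h master bigdb | ssh -C user@slave "psql bigdb"
    # or run a compressed tunnel and temporarily point the replication conninfo at port 5433:
    ssh -C -N -L 5433:localhost:5432 user@slave &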

-- Karl



Re: Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD

From
Greg Smith
Date:
Scott Carey wrote:
> What is the likelihood that your RAID card fails, or that the battery that reported 'good health' only lasts 5 minutes and you lose data before power is restored?   What is the likelihood of human error?
>

These are all things that happen sometimes, sure.  The problem with the
cheap SSDs is that they happen downright often if you actually test for
it.  If someone is aware of the risk and makes an informed decision,
fine.  But most of the time I see articles like the one that started
this thread that are oblivious to the issue, and that's really bad.
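
The usual way to actually test for it is Brad Fitzpatrick's diskchecker.pl; from memory the procedure is roughly as below (check the script's own usage text):

    diskchecker.pl -l                                  # on a second machine, listening
    diskchecker.pl -s otherhost create test_file 500   # on the box under test, then pull the plug mid-run
    diskchecker.pl -s otherhost verify test_file       # after power comes back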

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us


Re: Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD

From
Scott Carey
Date:
On Aug 11, 2010, at 9:30 PM, Greg Smith wrote:

> Scott Carey wrote:
>> What is the likelihood that your RAID card fails, or that the battery that reported 'good health' only lasts 5 minutes and you lose data before power is restored?   What is the likelihood of human error?
>>
>
> These are all things that happen sometimes, sure.  The problem with the
> cheap SSDs is that they happen downright often if you actually test for
> it.  If someone is aware of the risk and makes an informed decision,
> fine.  But most of the time I see articles like the one that started
> this thread that are oblivious to the issue, and that's really bad.
>

Agreed.  There is a HUGE gap between "ooh ssd's are fast, look!" and engineering a solution that uses them properly with all their strengths and faults.  And as 'gnuoytr' points out, there is a big difference between an Intel SSD and, say, this thing: http://www.nimbusdata.com/products/s-class_overview.html

> --
> Greg Smith  2ndQuadrant US  Baltimore, MD
> PostgreSQL Training, Services and Support
> greg@2ndQuadrant.com   www.2ndQuadrant.us
>


Re: Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD

From
Arjen van der Meijden
Date:
On 13-8-2010 1:40 Scott Carey wrote:
> Agreed.  There is a HUGE gap between "ooh ssd's are fast, look!" and
> engineering a solution that uses them properly with all their
> strengths and faults.  And as 'gnuoytr' points out, there is a big
> difference between an Intel SSD and say, this thing:
> http://www.nimbusdata.com/products/s-class_overview.html

From the description it sounds as if it's either FreeBSD or OpenSolaris
with ZFS plus some web-interface layer. That's not a bad thing per se,
but as the site suggests it's 'only' $25k for the smallest (2.5TB?)
device. That makes it very likely that they are "off the shelf" MLC flash
drives. Given the design of the device and the pricing, they are probably
your average 2.5"-drives with 100, 200 or 400GB capacity (maybe OCZ
Vertex 2 Pro, which does have such a capacitor?), similar to the Intel SSD
you compared it to.
And then we're basically back to square one, unless the devices have a
capacitor, or ZFS works better with SSD drives to begin with (it will at
least know silent data corruption has occurred).

There are of course devices that are not built on top of normal disk
form factor SSD-drives like the Ramsan devices or Sun's F5100.

Best regards,

Arjen

Re: Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD

From
Michael March
Date:
As a postscript to these tests: I just tried the 500GB Momentus XT hybrid SSD/HDD drive. I had this fantasy that it would at least do better than the 7200 rpm desktop drive.

Oh lord, my gut was wrong.  The performance was inconsistent and never over 2/3rds the performance of the slowest desktop drive. 

On Sat, Aug 7, 2010 at 4:47 PM, Michael March <mmarch@gmail.com> wrote:
If anyone is interested I just completed a series of benchmarks of stock Postgresql running on a normal HDD vs a SSD.  

If you don't want to read the post, the summary is that SSDs are 5 to 7 times faster than a 7200RPM HDD drive under a pgbench load.


Is this what everyone else is seeing?

Thanks!




--
<admiral>

Michael F. March ----- mmarch@gmail.com