Thread: New server setup
Hi, I'm going to set up a new server for my PostgreSQL database, and I am considering one of these: http://www.hetzner.de/hosting/produkte_rootserver/poweredge-r720 with four SAS drives in a RAID 10 array. Do any of you have any particular comments/pitfalls/etc. to mention on the setup? My application is very write-heavy.
On Fri, Mar 1, 2013 at 3:43 AM, Niels Kristian Schjødt <nielskristian@autouncle.com> wrote:
I can only tell you our experience with Dell from several years ago. We bought two Dell servers similar to (somewhat larger than) the model you're looking at. We'll never buy from them again.
Advantages: They work. They haven't failed.
Disadvantages:
Performance sucks. Dell costs far more than "white box" servers we buy from a "white box" supplier (ASA Computers). ASA gives us roughly double the performance for the same price. We can buy exactly what we want from ASA.
Dell did a disk-drive "lock in." The RAID controller won't spin up a non-Dell disk. They wanted roughly four times the price for their disks compared to buying the exact same disks on Amazon. If a disk went out today, it would probably cost even more because that model is obsolete (luckily, we bought a couple spares). I think they abandoned this policy because it caused so many complaints, but you should check before you buy. This was an incredibly stupid RAID controller design.
Dell tech support doesn't know what they're talking about when it comes to RAID controllers and serious server support. You're better off with a white-box solution, where you can buy the exact parts recommended in this group and get technical advice from people who know what they're talking about. Dell basically doesn't understand Postgres.
They boast excellent on-site service, but for the price of their computers and their service contract, you can buy two servers from a white-box vendor. Our white-box servers have been just as reliable as the Dell servers -- no failures.
I'm sure someone in Europe can recommend a good vendor for you.
Craig James
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Please choose PCIe flash for a write-heavy app.

Wales

On 2013-3-1, at 8:43 PM, Niels Kristian Schjødt <nielskristian@autouncle.com> wrote:
> Hi, I'm going to set up a new server for my postgresql database, and I am considering one of these: http://www.hetzner.de/hosting/produkte_rootserver/poweredge-r720 with four SAS drives in a RAID 10 array. Has any of you any particular comments/pitfalls/etc. to mention on the setup? My application is very write heavy.
Thanks both of you for your input.
Earlier I have been discussing my extremely high I/O wait with you here on the mailing list, and I have tried a lot of tweaks to the postgresql config, the WAL directory location, and the kernel, but unfortunately my problem persists, and I think it eventually comes down to just bad hardware (currently two 7200 RPM disks in a software RAID 1). So changing to four 15,000 RPM SAS disks in a RAID 10 is probably going to change a lot - don't you think? However, we are running a lot of background processing, sometimes with 300 connections to the db. So my question is, should I also get something like pgpool2 set up at the same time? Is it, from your experience, likely to increase my throughput a lot more if I had a connection pool of e.g. 20 connections, instead of 300 concurrent ones directly?
On 01/03/2013 at 16.28, Craig James <cjames@emolecules.com> wrote:
Niels Kristian Schjødt <nielskristian@autouncle.com> wrote:

> So my question is, should I also get something like pgpool2 setup
> at the same time? Is it, from your experience, likely to increase
> my throughput a lot more, if I had a connection pool of eg. 20
> connections, instead of 300 concurrent ones directly?

In my experience, it can make a big difference. If you are just using the pooler for this reason, and don't need any of the other features of pgpool, I suggest pgbouncer. It is a simpler, more lightweight tool.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Mar 5, 2013 at 9:34 AM, Kevin Grittner <kgrittn@ymail.com> wrote:
> In my experience, it can make a big difference. If you are just
> using the pooler for this reason, and don't need any of the other
> features of pgpool, I suggest pgbouncer. It is a simpler, more
> lightweight tool.

I second the pgbouncer rec.
Thanks, that was actually what I just ended up doing yesterday. Any suggestion how to tune pgbouncer?

BTW, I have just bumped into an issue that caused me to disable pgbouncer again. My web application is querying the database with a per-request SEARCH_PATH. This is because I use schemas to provide country-based separation of my data (e.g. English, German, Danish data in different schemas). I have pgbouncer set up to have a transactional behavior (pool_mode = transaction) - however some of my colleagues complained that it sometimes didn't return data from the right schema set in the SEARCH_PATH - you wouldn't by chance have any idea what is going wrong, would you?

#################### pgbouncer.ini
[databases]
production =

[pgbouncer]
logfile = /var/log/pgbouncer/pgbouncer.log
pidfile = /var/run/pgbouncer/pgbouncer.pid
listen_addr = localhost
listen_port = 6432
unix_socket_dir = /var/run/postgresql
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
admin_users = postgres
pool_mode = transaction
server_reset_query = DISCARD ALL
max_client_conn = 500
default_pool_size = 20
reserve_pool_size = 5
reserve_pool_timeout = 10
#####################

On 05/03/2013 at 17.34, Kevin Grittner <kgrittn@ymail.com> wrote:
> In my experience, it can make a big difference. If you are just
> using the pooler for this reason, and don't need any of the other
> features of pgpool, I suggest pgbouncer. It is a simpler, more
> lightweight tool.
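[Editor's note: the symptom described here is characteristic of transaction pooling - a plain `SET search_path` is session state, and with pool_mode = transaction the next transaction may run on a different backend. One common workaround, sketched here rather than proposed in the thread (the schema and table names are hypothetical), is to scope the setting to each transaction with SET LOCAL:]

```sql
BEGIN;
-- SET LOCAL only lasts until COMMIT/ROLLBACK, so it behaves correctly
-- even when each transaction may land on a different pooled backend.
SET LOCAL search_path TO german, public;
SELECT * FROM cars;   -- resolves to german.cars within this transaction
COMMIT;
```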
Set it to use session. I had a similar issue having moved one of the components of our app to use transactions, which introduced an undesired behavior.
Okay, thanks - but hey - if I put it at session pooling, then the documentation says: "default_pool_size: In session pooling it needs to be the number of max clients you want to handle at any moment". So as I understand it, is it true that I then have to set default_pool_size to 300 if I have up to 300 client connections? And then how would the pooler help my performance - wouldn't that be exactly like having the 300 clients connect directly to the database?
On Tue, Mar 5, 2013 at 10:27 AM, Niels Kristian Schjødt <nielskristian@autouncle.com> wrote:
If those 300 client connections are all long-lived, then yes, you need that many in the pool. If they are short-lived connections, then you can have a lot fewer, as any over the default_pool_size will simply block until an existing connection is closed and can be re-assigned - which won't take long if they are short-lived connections.
It would probably be even worse than having 300 clients connected directly. There would be no point in using a pooler under those conditions.
Cheers,
Jeff
In my recent experience PgPool2 performs pretty badly as a pooler. I'd avoid it if possible, unless you depend on other features.
It simply doesn't scale.
On 5 March 2013 21:59, Jeff Janes <jeff.janes@gmail.com> wrote:
GJ
On 3/1/13 6:43 AM, Niels Kristian Schjødt wrote:
> Hi, I'm going to setup a new server for my postgresql database, and I am considering one of these: http://www.hetzner.de/hosting/produkte_rootserver/poweredge-r720 with four SAS drives in a RAID 10 array. Has any of you any particular comments/pitfalls/etc. to mention on the setup? My application is very write heavy.

The Dell PERC H710 (actually an LSI controller) works fine for write-heavy workloads on a RAID 10, as long as you order it with a battery backup unit module. Someone must install the controller management utility and do three things however:

1) Make sure the battery-backup unit is working.

2) Configure the controller so that the *disk* write cache is off.

3) Set the controller cache to "write-back when battery is available". That will use the cache when it is safe to do so, and if not it will bypass it. That will make the server slow down if the battery fails, but it won't ever become unsafe at writing.

See http://wiki.postgresql.org/wiki/Reliable_Writes for more information about this topic.

If you'd like some consulting help with making sure the server is working safely and as fast as it should be, 2ndQuadrant does offer a hardware benchmarking service to do that sort of thing: http://www.2ndquadrant.com/en/hardware-benchmarking/ I think we're even generating those reports in German now.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.com
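[Editor's note: on an LSI-based card like the H710, the three steps above map roughly onto the MegaCli utility as follows. This is a hedged sketch, not from the thread - the `-LAll -aAll` arguments assume you want to touch every logical drive on every adapter, and Dell's own OpenManage (`omconfig`) exposes the same settings; verify against your controller's documentation before running anything.]

```shell
# 1) Check that the battery-backup unit is present and healthy
MegaCli64 -AdpBbuCmd -GetBbuStatus -aALL

# 2) Turn the individual disks' own write caches off
MegaCli64 -LDSetProp -DisDskCache -LAll -aAll

# 3) Use the controller cache in write-back mode...
MegaCli64 -LDSetProp WB -LAll -aAll
# ...but fall back to write-through whenever the BBU is bad or charging
MegaCli64 -LDSetProp NoCachedBadBBU -LAll -aAll
```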
On 10 March 2013 15:58, Greg Smith <greg@2ndquadrant.com> wrote:
> On 3/1/13 6:43 AM, Niels Kristian Schjødt wrote:
>> Hi, I'm going to setup a new server for my postgresql database, and I am considering one of these: http://www.hetzner.de/hosting/produkte_rootserver/poweredge-r720 with four SAS drives in a RAID 10 array. Has any of you any particular comments/pitfalls/etc. to mention on the setup? My application is very write heavy.
>
> The Dell PERC H710 (actually a LSI controller) works fine for write-heavy workloads on a RAID 10, as long as you order it with a battery backup unit module. Someone must install the controller management utility and do three things however:
We're going to go with either HP or IBM (customer's preference, etc).
> 1) Make sure the battery-backup unit is working.
>
> 2) Configure the controller so that the *disk* write cache is off.
>
> 3) Set the controller cache to "write-back when battery is available". That will use the cache when it is safe to do so, and if not it will bypass it. That will make the server slow down if the battery fails, but it won't ever become unsafe at writing.
>
> See http://wiki.postgresql.org/wiki/Reliable_Writes for more information about this topic. If you'd like some consulting help with making sure the server is working safely and as fast as it should be, 2ndQuadrant does offer a hardware benchmarking service to do that sort of thing: http://www.2ndquadrant.com/en/hardware-benchmarking/ I think we're even generating those reports in German now.
Thanks Greg. I will follow the advice there, and also the advice in your book. I always make sure they order battery-backed cache (or flash-based, which seems to be what people use these days).

The subject of using external help with setting things up did come up, but more around connection pooling than the hardware itself (in short: pgpool2 is crap, so we will go with a DNS-based solution and the apps connecting directly to the nodes).

I will let my clients (I'm doing this on a contract) know that there's an option to get you guys to help us. Mind you, this database is rather small in the grand scheme of things (30-40GB) - just possibly a lot of occasional writes.

We wouldn't need German. But proper English (i.e. British English) would always be nice ;)

Whilst on the hardware subject: someone mentioned throwing SSDs into the mix, i.e. combining spinning HDs with SSDs - apparently some RAID cards can use small-ish (80GB+) SSDs as external caches. Any experiences with that?

Thanks!
GJ
On 12/03/2013 21:41, Gregg Jaskiewicz wrote:
> Whilst on the hardware subject, someone mentioned throwing ssd into
> the mix. I.e. combining spinning HDs with SSD, apparently some raid
> cards can use small-ish (80GB+) SSDs as external caches. Any
> experiences with that ?

The new LSI/Dell cards do this (e.g. the H710 mentioned in an earlier post). It is easy to set up, and it seems to be supported on all versions of Dell's cards even if the docs say it isn't. It worked well in the limited testing I did; I have since switched to pretty much all-SSD drives in my current setup.

These cards also supposedly support enhanced performance with just SSDs (CTIO) by playing with the cache settings, but to be honest I haven't noticed any difference, and I'm not entirely sure it is enabled, as there is no indication that CTIO is actually enabled and working.

John
On 13 Mar 2013, at 15:33, John Lister <john.lister@kickstone.com> wrote:
> The new LSI/Dell cards do this (eg H710 as mentioned in an earlier post). It is easy to set up and supported it seems on all versions of dells cards even if the docs say it isn't. Works well with the limited testing I did, switched to pretty much all SSDs drives in my current setup

SSDs have a much shorter life than spinning drives, so what do you do when one inevitably fails in your system?
On 13/03/2013 15:50, Greg Jaskiewicz wrote:
> SSDs have much shorter life then spinning drives, so what do you do when one inevitably fails in your system ?

Define much shorter? I accept they have a limited number of writes, but that depends on load. You can actively monitor the drive's "health" level in terms of wear using SMART, and it is relatively straightforward to calculate an estimate of life based on average use; for me that works out at well in excess of 5 years. Experience tells me that spinning drives have a habit of failing in that time frame as well :( and in 5 years I'll probably be replacing the server anyway. I also overprovisioned the drives by about an extra 13%, giving me 20% spare capacity when adding in the 7% manufacturer spare space. Currently my drives have written about 4TB of data each and show 0% wear (these are 160GB drives). I actively monitor the wear level and plan to replace the drives when it gets low.

For a comparison of write levels see http://www.xtremesystems.org/forums/showthread.php?271063-SSD-Write-Endurance-25nm-Vs-34nm; it shows that the 320 series was reported to have hit the wear limit at 190TB (for a drive 1/4 the size of mine) but actually managed nearer 700TB before the drive failed.

I've mixed 2 different manufacturers in my RAID 10 pairs to mitigate against both drives of a pair failing at the same time, either due to a firmware bug or being full. In addition, when I was setting the box up I did some performance testing against the drives, using different combinations for each test - the aim here is to pre-load each drive differently, to prevent them failing simultaneously when full.

If you do go for RAID 10, make sure the drives have power-fail endurance, i.e. a capacitor or battery on the drive.

John
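[Editor's note: the back-of-envelope lifetime estimate John describes can be sketched as below. The numbers are illustrative only - rated endurance, daily write volume, and the write-amplification factor all come from your own drives and workload, not from the thread.]

```python
def ssd_life_years(endurance_tb, gb_written_per_day, write_amplification=1.5):
    """Estimate years until a drive's rated write endurance is reached.

    endurance_tb:        total TB of writes the drive is rated for
    gb_written_per_day:  average host writes per day, in GB
    write_amplification: extra flash writes per host write (workload-dependent)
    """
    tb_per_year = gb_written_per_day * write_amplification * 365 / 1024
    return endurance_tb / tb_per_year

# Illustrative: a drive rated for 190 TB of writes, ~20 GB of DB writes/day
print(round(ssd_life_years(190, 20), 1))  # about 17.8 years
```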
On 03/13/2013 09:15 AM, John Lister wrote:
> Define much shorter? I accept they have a limited no of writes, but
> that depends on load. You can actively monitor the drives "health"
> level...

What concerns me more than wear is this:

InfoWorld Article:
http://www.infoworld.com/t/solid-state-drives/test-your-ssds-or-risk-massive-data-loss-researchers-warn-213715

Referenced research paper:
https://www.usenix.org/conference/fast13/understanding-robustness-ssds-under-power-fault

Kind of messes with the "D" in ACID.

Cheers,
Steve
On 3/13/2013 2:23 PM, Steve Crawford wrote:
One potential way around this is to run ZFS as the underlying filesystem and use the SSDs as cache drives. If they lose data due to a power problem it is non-destructive.
Short of that, you cannot use an SSD on a machine where silent corruption is unacceptable UNLESS you know it has a supercap or similar IN THE DISK that guarantees the on-drive cache can be flushed in the event of a power failure. A battery-backed controller cache DOES NOTHING to alleviate this risk. If you violate this rule and the power goes off, you must EXPECT silent and possibly catastrophic data corruption.
Only a few (and they're expensive!) SSD drives have said protection. If yours does not the only SAFE option is as I described up above using them as ZFS cache devices.
On Mar 13, 2013, at 3:23 PM, Steve Crawford wrote:
> What concerns me more than wear is this:
>
> InfoWorld Article:
> http://www.infoworld.com/t/solid-state-drives/test-your-ssds-or-risk-massive-data-loss-researchers-warn-213715
>
> Referenced research paper:
> https://www.usenix.org/conference/fast13/understanding-robustness-ssds-under-power-fault
>
> Kind of messes with the "D" in ACID.

Have a look at this: http://blog.2ndquadrant.com/intel_ssd_now_off_the_sherr_sh/

I'm not sure what other SSDs offer this, but Intel's newest entry will, and it's attractively priced.

Another way we leverage SSDs that can be more reliable in the face of total SSD meltdown is to use them as ZFS Intent Log caches. All the sync writes get handled on the SSDs. We deploy them as mirrored vdevs, so if one fails, we're OK. If both fail, we're really slow until someone can replace them. On modest hardware, I was able to get about 20K TPS out of pgbench with the SSDs configured as ZIL and 4 10K Raptors as the spinny disks.

In either case, the amount of money you'd have to spend on the two dozen or so SAS drives (and the controllers, enclosure, etc.) that would equal a few pairs of SSDs in random IO performance is non-trivial, even if you plan on proactively retiring your SSDs every year.

Just another take on the issue..

Charles
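[Editor's note: the two ZFS arrangements mentioned in the thread - SSDs as a read cache (L2ARC) and as a mirrored intent log (ZIL/SLOG) - are set up along these lines. A sketch only: the pool name `tank` and the device paths are placeholders, and losing an L2ARC device is non-destructive while the log should be mirrored as Charles describes.]

```shell
# Add an SSD as an L2ARC read cache (safe to lose)
zpool add tank cache /dev/disk/by-id/ssd-cache0

# Add a mirrored pair of SSDs as the intent log handling sync writes
zpool add tank log mirror /dev/disk/by-id/ssd-log0 /dev/disk/by-id/ssd-log1
```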
On 13/03/2013 19:23, Steve Crawford wrote:
> What concerns me more than wear is this:
>
> InfoWorld Article:
> http://www.infoworld.com/t/solid-state-drives/test-your-ssds-or-risk-massive-data-loss-researchers-warn-213715

When I read this they didn't name the drives that failed - or those that passed. But I'm assuming the failed ones were standard consumer SSDs, while the 2 good ones were either enterprise drives or had caps. The reason I say this is that SSD drives, by the nature of their operation, cache/store information in RAM while they write it to the flash and while they handle the mappings, etc. of real to virtual sectors; if they lose power it is this that is lost, causing at best corruption if not complete loss of the drive. Enterprise drives (and some consumer ones, such as the 320s) have either capacitors or battery backup to allow the drive to shut down safely. There have been various reports, both on this list and elsewhere, showing that these drives successfully survive repeated power failures.

A bigger concern is the state of the firmware in these drives, which until recently was more likely to trash your drive - fortunately things seem to be becoming more stable with age now.

John
On 3/13/2013 1:23 PM, Steve Crawford wrote:
> What concerns me more than wear is this:
> ...
> Kind of messes with the "D" in ACID.

It is somewhat surprising to discover that many SSD products are not durable under sudden power loss (what were they thinking!?, and... why doesn't anyone care??).

However, there is a set of SSD types known to be designed to address power loss events that have been tested by contributors to this list. Use only those devices and you won't see this problem. SSDs do have a wear-out mechanism, but wear can be monitored and devices replaced in advance of failure. In practice longevity is such that most machines will be in the dumpster long before the SSD wears out. We've had machines running with several hundred wps constantly for 18 months using Intel 710 drives, and the wear-level SMART value is still zero.

In addition, like any electronics module (CPU, memory, NIC), an SSD can fail, so you do need to arrange for valuable data to be replicated. As with old-school disk drives, firmware bugs are a concern, so you might want to consider what would happen if all the drives of a particular type decided to quit working at the same second in time (I've only seen this happen myself with magnetic drives, but in theory it could happen with SSD).
On 14/03/13 09:16, David Boreham wrote: > On 3/13/2013 1:23 PM, Steve Crawford wrote: >> >> What concerns me more than wear is this: >> >> InfoWorld Article: >> http://www.infoworld.com/t/solid-state-drives/test-your-ssds-or-risk-massive-data-loss-researchers-warn-213715 >> >> >> Referenced research paper: >> https://www.usenix.org/conference/fast13/understanding-robustness-ssds-under-power-fault >> >> >> Kind of messes with the "D" in ACID. > > It is somewhat surprising to discover that many SSD products are not > durable under sudden power loss (what where they thinking!?, and ...why > doesn't anyone care??). > > However, there is a set of SSD types known to be designed to address > power loss events that have been tested by contributors to this list. > Use only those devices and you won't see this problem. SSDs do have a > wear-out mechanism but wear can be monitored and devices replaced in > advance of failure. In practice longevity is such that most machines > will be in the dumpster long before the SSD wears out. We've had > machines running with several hundred wps constantly for 18 months using > Intel 710 drives and the wear level SMART value is still zero. > > In addition, like any electronics module (CPU, memory, NIC), an SSD can > fail so you do need to arrange for valuable data to be replicated. > As with old school disk drives, firmware bugs are a concern so you might > want to consider what would happen if all the drives of a particular > type all decided to quit working at the same second in time (I've only > seen this happen myself with magnetic drives, but in theory it could > happen with SSD). > > Just going through this now with a vendor. They initially assured us that the drives had "end to end protection" so we did not need to worry. I had to post stripdown pictures from Intel's s3700, showing obvious capacitors attached to the board before I was taken seriously and actually meaningful specifications were revealed. 
So now I'm demanding to know:

- chipset (and version)
- original manufacturer (for re-badged ones)
- power off protection *explicitly* mentioned
- show me the circuit board (and where are the capacitors)

Seems like you gotta push 'em!

Cheers

Mark
On 3/13/2013 9:29 PM, Mark Kirkwood wrote:
> Just going through this now with a vendor. They initially assured us
> that the drives had "end to end protection" so we did not need to
> worry. I had to post stripdown pictures from Intel's s3700, showing
> obvious capacitors attached to the board before I was taken seriously
> and actually meaningful specifications were revealed. So now I'm
> demanding to know:
>
> - chipset (and version)
> - original manufacturer (for re-badged ones)
> - power off protection *explicitly* mentioned
> - show me the circuit board (and where are the capacitors)
In addition to the above, I only use drives where I've seen compelling evidence that plug pull tests have been done and passed (e.g. done by someone on this list or in-house here). I also like to have a high level of confidence in the firmware development group. This results in a very small set of acceptable products :(
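For reference, the "plug pull tests" mentioned above are usually done with Brad Fitzpatrick's diskchecker.pl script. A rough sketch of the procedure follows; the hostname and mount path are placeholders, and you need a second machine (that will not lose power) to act as the observer:

```shell
# Sketch of a plug-pull durability test using diskchecker.pl
# (hostname and paths are placeholders).

# On a second machine that will NOT lose power, start the listener:
./diskchecker.pl -l

# On the machine under test, write test data to the filesystem on the
# drive being evaluated, then yank the power cord mid-run:
./diskchecker.pl -s observer.example.com create /mnt/ssd/test_file 500

# After rebooting the test machine, verify that every write the drive
# acknowledged as synced actually survived the power loss:
./diskchecker.pl -s observer.example.com verify /mnt/ssd/test_file
```

Any errors reported by the verify step mean the drive acknowledged fsyncs it had not made durable, which is exactly the failure mode this thread is about.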
On Tue, Mar 12, 2013 at 09:41:08PM +0000, Gregg Jaskiewicz wrote:
> On 10 March 2013 15:58, Greg Smith <greg@2ndquadrant.com> wrote:
>> The Dell PERC H710 (actually an LSI controller) works fine for
>> write-heavy workloads on a RAID 10, as long as you order it with a
>> battery backup unit module. Someone must install the controller
>> management utility and do three things however:
>>
>> 1) Make sure the battery-backup unit is working.
>> 2) Configure the controller so that the *disk* write cache is off.
>
> We're going to go with either HP or IBM (customer's preference, etc).

Only use SSDs with a BBU cache, and don't set SSD caches to write-through, because an SSD needs to cache the write to avoid wearing out the chips early; see:

	http://momjian.us/main/blogs/pgblog/2012.html#August_3_2012

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +
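For what it's worth, the controller-utility steps Greg describes map onto LSI's MegaCli tool roughly as below. This is a sketch only: the binary name (MegaCli vs. MegaCli64) and adapter/logical-drive numbering vary by system, and on Dell hardware the same settings are also exposed through OMSA:

```shell
# Sketch of the PERC/LSI cache settings discussed above, using LSI's
# MegaCli utility (binary name and numbering vary by install).

# 1) Check that the battery-backup unit is present and healthy:
MegaCli64 -AdpBbuCmd -GetBbuStatus -aAll

# Enable write-back caching on the controller, falling back to
# write-through automatically if the BBU fails:
MegaCli64 -LDSetProp WB -LAll -aAll
MegaCli64 -LDSetProp NoCachedBadBBU -LAll -aAll

# 2) Turn the *disk* write caches off -- the controller cache is
# battery-protected, but the individual drive caches are not:
MegaCli64 -LDSetProp -DisDskCache -LAll -aAll

# Confirm the resulting cache policy on all logical drives:
MegaCli64 -LDInfo -LAll -aAll
```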
On 15/03/13 07:54, Bruce Momjian wrote:
> Only use SSDs with a BBU cache, and don't set SSD caches to
> write-through because an SSD needs to cache the write to avoid wearing
> out the chips early, see:
>
> http://momjian.us/main/blogs/pgblog/2012.html#August_3_2012

I'm not convinced about the need for a BBU with SSD - you *can* use them without one; you just need to make sure about suitable longevity and also the presence of (proven) power off protection (as discussed previously). It is worth noting that using unproven SSDs, or SSDs known to be lacking power off protection, with a BBU will *not* save you from massive corruption (or device failure) upon unexpected power loss.

Also, in terms of performance, the faster PCIe SSDs do about as well by themselves as connected to a RAID card with BBU. In fact they will do better in some cases (the faster SSDs can get close to the max IOPS many RAID cards can handle... so more than a couple of 'em plugged into one card will be throttled by its limitations).

Cheers

Mark
On 15/03/13 10:37, Mark Kirkwood wrote:
> Also, in terms of performance, the faster PCIe SSD do about as well by
> themselves as connected to a RAID card with BBU.

Sorry - I meant to say "the faster **SAS** SSD do...", since you can't currently plug PCIe SSDs into RAID cards (confusingly, some of the PCIe guys actually have RAID card firmware on their boards... the Intel 910, I think).

Cheers

Mark
On Fri, Mar 15, 2013 at 10:37:55AM +1300, Mark Kirkwood wrote:
> I not convinced about the need for BBU with SSD - you *can* use them
> without one, just need to make sure about suitable longevity and
> also the presence of (proven) power off protection (as discussed
> previously). It is worth noting that using unproven or SSD known to
> be lacking power off protection with a BBU will *not* save you from
> massive corruption (or device failure) upon unexpected power loss.

I don't think any drive that corrupts on power-off is suitable for a database, but for non-db uses, sure, I guess they are OK, though you have to be pretty money-constrained to like that tradeoff.
On 15/03/13 11:34, Bruce Momjian wrote:
> I don't think any drive that corrupts on power-off is suitable for a
> database, but for non-db uses, sure, I guess they are OK, though you
> have to be pretty money-constrained to like that tradeoff.

Agreed - really *all* SSDs should have capacitor (or equivalent) power off protection... the fact that it's a feature present on only a handful of drives is... disappointing.
On 3/14/2013 3:37 PM, Mark Kirkwood wrote:
> I not convinced about the need for BBU with SSD - you *can* use them
> without one, just need to make sure about suitable longevity and also
> the presence of (proven) power off protection (as discussed
> previously). It is worth noting that using unproven or SSD known to be
> lacking power off protection with a BBU will *not* save you from
> massive corruption (or device failure) upon unexpected power loss.
I think it probably depends on the specifics of the deployment, but for us the fact that the BBU isn't required in order to achieve high write tps with SSDs is one of the key benefits -- the power, cooling and space savings over even a few servers are significant. In our case we only have one or two drives per server so no need for fancy drive string arrangements.
> Also, in terms of performance, the faster PCIe SSD do about as well by
> themselves as connected to a RAID card with BBU. In fact they will do
> better in some cases (the faster SSD can get close to the max IOPS
> many RAID cards can handle...so more than a couple of 'em plugged into
> one card will be throttled by its limitations).
You might want to evaluate the performance you can achieve with a single SSD (use several for capacity by all means) before considering a RAID card + SSD solution.
Again I bet it depends on the application but our experience with the older Intel 710 series is that their performance out-runs the CPU, at least under our PG workload.
>> I not convinced about the need for BBU with SSD - you *can* use them
>> without one, just need to make sure about suitable longevity and also
>> the presence of (proven) power off protection (as discussed
>> previously). It is worth noting that using unproven or SSD known to be
>> lacking power off protection with a BBU will *not* save you from
>> massive corruption (or device failure) upon unexpected power loss.
>
> I don't think any drive that corrupts on power-off is suitable for a
> database, but for non-db uses, sure, I guess they are OK, though you
> have to be pretty money-constrained to like that tradeoff.

Wouldn't mission critical databases normally be configured in a high availability cluster - presumably with replicas running on different power sources?

If you lose power to a member of the cluster (or even the master), you would have new data coming in and stuff to do long before it could come back online - corrupted disk or not.

I find it hard to imagine configuring something that is too critical to be able to be restored from periodic backup to NOT be in a (synchronous) cluster. I'm not sure what all the fuss over whether an SSD might come back after a hard server failure is really about. You should architect the solution so you can lose the server, throw it away, and never bring it back online again. Native streaming replication is fairly straightforward to configure. Asynchronous multimaster (albeit with some synchronization latency) is also fairly easy to configure using third-party tools such as SymmetricDS.

Agreed that adding a supercap doesn't sound like a hard thing for a hardware manufacturer to do, but I don't think it should necessarily be a showstopper for being able to take advantage of some awesome I/O performance opportunities.
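For reference, the streaming replication Rick mentions needs only a handful of settings in the 9.2-era releases current at the time. A minimal sketch, with placeholder hostnames and credentials:

```conf
# --- primary: postgresql.conf (PostgreSQL 9.2-era sketch) ---
wal_level = hot_standby         # WAL detailed enough for a standby
max_wal_senders = 3             # connection slots for standbys
wal_keep_segments = 128         # retain WAL so standbys can catch up

# --- primary: pg_hba.conf ---
# host  replication  replicator  192.0.2.10/32  md5

# --- standby: recovery.conf ---
# standby_mode = 'on'
# primary_conninfo = 'host=primary.example.com user=replicator'

# For synchronous replication, additionally set on the primary:
# synchronous_standby_names = 'standby1'
```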
On Fri, Mar 15, 2013 at 06:06:02PM +0000, Rick Otten wrote:
> Wouldn't mission critical databases normally be configured in a high
> availability cluster - presumably with replicas running on different
> power sources?
> [...]
> Agreed that adding a supercap doesn't sound like a hard thing for a
> hardware manufacturer to do, but I don't think it should necessarily
> be a showstopper for being able to take advantage of some awesome I/O
> performance opportunities.

Do you want to recreate the server if it loses power, over an extra $100 per drive?
On Fri, Mar 15, 2013 at 12:06 PM, Rick Otten <rotten@manta.com> wrote:
> Wouldn't mission critical databases normally be configured in a high
> availability cluster - presumably with replicas running on different
> power sources?

I've worked in high end data centers where certain failures resulted in ALL power being lost. More than once. Relying on never losing power to keep your data from getting corrupted is not a good idea. Now if they're geographically separate, you're maybe OK.
On 16/03/13 07:06, Rick Otten wrote:
> Wouldn't mission critical databases normally be configured in a high
> availability cluster - presumably with replicas running on different
> power sources?
> [...]
> You should architect the solution so you can lose the server and throw
> it away and never bring it back online again.

A somewhat extreme point of view.
I note that the MongoDB guys added journaling for single-server reliability a while ago - an admission that while in *theory* lots of semi-reliable nodes can be eventually consistent, it is a lot less hassle if individual nodes are as reliable as possible. That is what this discussion is about.

Regards

Mark
On Thu, Mar 14, 2013 at 4:37 PM, David Boreham <david_list@boreham.org> wrote:
> You might want to evaluate the performance you can achieve with a
> single-SSD (use several for capacity by all means) before considering
> a RAID card + SSD solution.
> Again I bet it depends on the application but our experience with the
> older Intel 710 series is that their performance out-runs the CPU, at
> least under our PG workload.

How many people are using a single enterprise grade SSD for production without RAID? I've had a few consumer grade SSDs brick themselves - but are the enterprise grade SSDs, like the new Intel S3700 which you can get in sizes up to 800GB, reliable enough to run as a single drive without RAID1? The performance of one is definitely good enough for most medium sized workloads without the complexity of a BBU RAID and multiple spinning disks...

-Dave
On 3/20/2013 6:44 PM, David Rees wrote:
> How many people are using a single enterprise grade SSD for production
> without RAID? I've had a few consumer grade SSDs brick themselves -
> but are the enterprise grade SSDs, like the new Intel S3700 which you
> can get in sizes up to 800GB, reliable enough to run as a single drive
> without RAID1? [...]

You're replying to my post, but I'll raise my hand again :)

We run a bunch of single-socket 1U, short-depth machines (Supermicro chassis) using 1x Intel 710 drives (we'd use the S3700 in new deployments today). The most recent of these have 128G and an E5-2620 hex-core CPU, and dissipate less than 150W at full load. Couldn't be happier with the setup. We have 18 months of uptime with no drive failures, running at several hundred wps 7x24. We also write tens of GB of log files every day that are rotated, so the drives are getting beaten up on bulk data overwrites too.

There is of course a non-zero probability of some unpleasant firmware bug afflicting the drives (as with regular spinning drives), and initially we deployed a "spare" 10k HD in the chassis, spun down, that would allow us to re-jigger the machines without SSD remotely (the data center is 1000 miles away). We never had to do that, and later deployments omitted the HD spare.
We've also considered mixing SSD from two vendors for firmware-bug-diversity, but so far we only have one approved vendor (Intel).
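As an aside, the wear level SMART value mentioned above is easy to watch from cron. On the Intel drives it is reported as the normalized Media_Wearout_Indicator attribute; a rough sketch of the check follows (device path and exact attribute layout vary by drive, so verify against your own smartctl output):

```shell
# Sketch: extract the normalized Media_Wearout_Indicator value. On
# Intel 710/S3700 drives this attribute starts at 100 and counts down
# toward 0 as the flash wears.
#
# Normally you would capture live output with:  smartctl -A /dev/sda
# Here we parse a captured sample line so the logic is visible:
sample='233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       -       0'

# Field 4 is the current normalized value (vendor layouts vary).
wear=$(echo "$sample" | awk '/Media_Wearout_Indicator/ {print $4 + 0}')
echo "normalized wear level: $wear"

# Alert well before the drive is actually worn out:
if [ "$wear" -lt 20 ]; then
    echo "WARNING: SSD wear level low, plan replacement" >&2
fi
```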
On 3/20/2013 7:44 PM, David Rees wrote:
> On Thu, Mar 14, 2013 at 4:37 PM, David Boreham <david_list@boreham.org> wrote:
>> You might want to evaluate the performance you can achieve with a
>> single-SSD (use several for capacity by all means) before considering
>> a RAID card + SSD solution.
>
> How many people are using a single enterprise grade SSD for production
> without RAID? I've had a few consumer grade SSDs brick themselves -
> but are the enterprise grade SSDs, like the new Intel S3700 which you
> can get in sizes up to 800GB, reliable enough to run as a single drive
> without RAID1? [...]

Two is one, one is none.
:-)
On Wed, Mar 20, 2013 at 6:44 PM, David Rees <drees76@gmail.com> wrote:
> How many people are using a single enterprise grade SSD for production
> without RAID? I've had a few consumer grade SSDs brick themselves -
> but are the enterprise grade SSDs, like the new Intel S3700 which you
> can get in sizes up to 800GB, reliable enough to run as a single drive
> without RAID1? [...]

I would still at least run two in software RAID-1 for reliability.
On 21/03/13 13:44, David Rees wrote:
> How many people are using a single enterprise grade SSD for production
> without RAID? I've had a few consumer grade SSDs brick themselves -
> but are the enterprise grade SSDs, like the new Intel S3700 which you
> can get in sizes up to 800GB, reliable enough to run as a single drive
> without RAID1? [...]

If you are using Intel S3700s or 710s you can certainly use a pair set up in software RAID1 (so avoiding the need for RAID cards, BBUs etc). I'd certainly feel happier with 2 drives :-)

However, a setup using replication across a number of hosts - each with a single SSD - is going to be OK.

Regards

Mark
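For anyone setting up the software RAID1 discussed here, the mdadm incantation is short. A sketch only; the device names are placeholders, and you should double-check them first, since mdadm will happily eat the wrong disks:

```shell
# Sketch: mirror two SSDs with Linux md RAID1 (device names are
# placeholders -- verify with lsblk or fdisk -l before running).
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda /dev/sdb

# Watch the initial resync and ongoing array health:
cat /proc/mdstat

# Persist the array configuration so it assembles at boot:
mdadm --detail --scan >> /etc/mdadm.conf

# Then make a filesystem and point PGDATA at it as usual:
mkfs.ext4 /dev/md0
```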