Thread: Amazon High I/O instances
Hello,
Since Amazon has added new high I/O instance types and EBS volumes, has anyone done any benchmarks of PostgreSQL on them?
I will be testing my app soon, but was curious to know if others have done some tests so I can compare / have a rough idea of what to expect. Looking on Google, I found an article about MySQL (http://palominodb.com/blog/2012/07/24/palomino-evaluates-amazon%E2%80%99s-new-high-io-ssd-instances), but nothing about PostgreSQL.
Thanks !
Sébastien
On Tuesday, August 21, 2012 at 01:33 -0400, Sébastien Lorion wrote:
> Since Amazon has added new high I/O instance types and EBS volumes,
> has anyone done any benchmarks of PostgreSQL on them?

I wonder: is there a reason why you have to go through the complexity of such a setup, rather than simply use bare metal and get good performance with simplicity?

For instance, the dedibox I use for my app (visible in sig) costs 14,00 euros/month, and sits at a 0.03 load average with 5 active users; you can admin it like a home pc.

--
Vincent Veyron
http://marica.fr/
IT management of insurance claims and litigation files for legal departments
On 21 Aug 2012, at 13:32, Vincent Veyron <vv.lists@wanadoo.fr> wrote:
Since Amazon has added new high I/O instance types and EBS volumes,
anyone has done some benchmark of PostgreSQL on them ?
I wonder : is there a reason why you have to go through the complexity
of such a setup, rather than simply use bare metal and get good
performance with simplicity?
For instance, the dedibox I use for my app (visible in sig) costs 14,00
euros/month, and sits at .03% load average with 5 active users; you can
admin it like a home pc.
This is a general 'cloud or dedicated' question, I won't go into it but I believe cloud proponents cite management ease, scalability etc. I'm sure there's a place for every type of hosting. However I would be interested in hearing some experiences of PostgreSQL on an Amazon high I/O instance, given a client has just proposed running on one. If there are none forthcoming in the short term I may be in a position to provide some results myself in a month or two.
Oliver Kohll
On 8/21/2012 7:10 AM, Oliver Kohll - Mailing Lists wrote:
> This is a general 'cloud or dedicated' question, I won't go into it
> but I believe cloud proponents cite management ease, scalability etc.
> I'm sure there's a place for every type of hosting. However I would be
> interested in hearing some experiences of PostgreSQL on an Amazon high
> I/O instance, given a client has just proposed running on one. If
> there are none forthcoming in the short term I may be in a position to
> provide some results myself in a month or two.

Amazon don't say what vendor's SSDs they are using, which is a little worrying to me -- when we deployed our SSD-based machines last year, much work was done to address the risk of write endurance problems. Now, an AWS instance and its ephemeral storage isn't expected to live forever (keep that in mind when storing data on one!), so perhaps one can ignore write endurance as a concern in this case, since we'd already be worried about (and have a plan to address) "entire machine endurance".

For sure, performance on these instances for any I/O-limited application is going to be great. I have a friend who is looking at them for his "big data" analytics application, which spends most of its time sorting TB-sized files.
On 8/21/2012 2:18 AM, Vincent Veyron wrote:
> I wonder : is there a reason why you have to go through the complexity
> of such a setup, rather than simply use bare metal and get good
> performance with simplicity?

In general I agree -- it is much (much!) cheaper to buy tin and deploy yourself vs any of the current cloud services.

However, there are plenty of counterexample use cases: for example, what if you want one of these machines for a week only? Another one: what if you are a venture capitalist funding 10 companies with questionable business models where you expect only one to succeed? AWS saves you from the headache of selling 500 machines on eBay...
On Tue, Aug 21, 2012 at 12:33 AM, Sébastien Lorion <sl@thestrangefactory.com> wrote:
> Hello,
>
> Since Amazon has added new high I/O instance types and EBS volumes, has
> anyone done any benchmarks of PostgreSQL on them?
>
> http://perspectives.mvdirona.com/2012/07/20/IOPerformanceNoLongerSucksInTheCloud.aspx
> http://perspectives.mvdirona.com/2012/08/01/EBSProvisionedIOPSOptimizedInstanceTypes.aspx
> http://aws.typepad.com/aws/2012/08/fast-forward-provisioned-iops-ebs.html
>
> I will be testing my app soon, but was curious to know if others have done
> some tests so I can compare / have a rough idea of what to expect. Looking
> on Google, I found an article about MySQL
> (http://palominodb.com/blog/2012/07/24/palomino-evaluates-amazon%E2%80%99s-new-high-io-ssd-instances),
> but nothing about PostgreSQL.

here's a datapoint, stock config:

pgbench -i -s 500
pgbench -c 16 -T 60
number of transactions actually processed: 418012
tps = 6962.607292 (including connections establishing)
tps = 6973.154593 (excluding connections establishing)

not too shabby. this was run by a friend who is evaluating high i/o instances for their high load db servers. we didn't have time to kick off a high scale read only test unfortunately.

Regarding 'AWS vs bare metal', I think high i/o instances fill a huge niche in their lineup. Dollar for dollar, I'm coming around to the point of view that dealing with aws is a cheaper/more effective solution than renting out space from a data center or (even worse) running your own data center unless you're very large or have other special requirements. Historically the problem with AWS is that you had no solution for highly transaction bound systems which forced you to split your environment which ruined most of the benefit, and they fixed that.

merlin
On Tuesday, August 21, 2012 at 09:36 -0500, Merlin Moncure wrote:
> here's a datapoint, stock config:
> pgbench -i -s 500
> pgbench -c 16 -T 60
> number of transactions actually processed: 418012
> tps = 6962.607292 (including connections establishing)
> tps = 6973.154593 (excluding connections establishing)
>
> not too shabby. this was run by a friend who is evaluating high i/o
> instances for their high load db servers. we didn't have time to
> kick off a high scale read only test unfortunately.
>
> Regarding 'AWS vs bare metal', I think high i/o instances fill a huge
> niche in their lineup. Dollar for dollar, I'm coming around to the
> point of view that dealing with aws is a cheaper/more effective
> solution than renting out space from a data center or (even worse)
> running your own data center unless you're very large or have other
> special requirements. Historically the problem with AWS is that you
> had no solution for highly transaction bound systems which forced you
> to split your environment which ruined most of the benefit, and they
> fixed that.

Hi Merlin,

I am sure you can get good performance with these. I simply focused on the part where seb said he was testing his app, and since you can get some really high data throughput (by my very modest standards anyway) with a good machine, I wondered why he did it. Maybe seb is planning for an application that already has hundreds of users; after all, I did overlook that option.

To Sébastien: please use 'reply all' to send your reply to the list.

On Tuesday, August 21, 2012 at 10:29 -0400, Sébastien Lorion wrote:
> Could you elaborate on the complexity you mention? Setting up a machine
> on Amazon, even with a script, is quite simple. As for the pricing you
> give, it can be matched on Amazon using Micro or small instances, which
> would be adequate given your load average.

Well, it _has_ to be more complicated to use AWS than a bare machine, because of the added layer?

--
Vincent Veyron
http://vincentveyron.com/
IT management of insurance claims and litigation files for legal departments
Oops sorry, I thought I did hit reply all.
I am not sure this mailing list is the right place to have this debate (assuming it is needed, there are plenty of articles stating the benefits of using the cloud), so I will simply answer that you pay the cost of the added layer up front (mostly scripting the Amazon API and batch configuration), but it saves you a ton of time even in the short term and even for a 2-3 machines setup. Besides, launching and shutting down 10's or 100's of new instances of a server to answer a burst of requests is hardly feasible on dedicated hardware, nor is it cheap to rent servers in different physical locations with some respectable SLA.
Sébastien
On Tue, Aug 21, 2012 at 1:16 PM, Vincent Veyron <vv.lists@wanadoo.fr> wrote:
On Tuesday, August 21, 2012 at 09:36 -0500, Merlin Moncure wrote:
> here's a datapoint, stock config:
> pgbench -i -s 500
> pgbench -c 16 -T 60
> number of transactions actually processed: 418012
> tps = 6962.607292 (including connections establishing)
> tps = 6973.154593 (excluding connections establishing)
>
> not too shabby. this was run by a friend who is evaluating high i/o
> instances for their high load db servers. we didn't have time to
> kick off a high scale read only test unfortunately.
>
> Regarding 'AWS vs bare metal', I think high i/o instances fill a huge
> niche in their lineup. Dollar for dollar, I'm coming around to the
> point of view that dealing with aws is a cheaper/more effective
> solution than renting out space from a data center or (even worse)
> running your own data center unless you're very large or have other
> special requirements. Historically the problem with AWS is that you
> had no solution for highly transaction bound systems which forced you
> to split your environment which ruined most of the benefit, and they
> fixed that.
>
Hi Merlin,

I am sure you can get good performance with these.
I simply focused on the part where seb said he was testing his app, and
since you can get some really high data throughput (by my very modest
standards anyway) with a good machine, I wondered why he did it.
Maybe seb is planning for an application that already has hundreds of
users; after all, I did overlook that option.
To Sébastien: please use 'reply all' to send your reply to the list.
On Tuesday, August 21, 2012 at 10:29 -0400, Sébastien Lorion wrote:
> Could you elaborate on the complexity you mention? Setting up a machine
> on Amazon, even with a script, is quite simple. As for the pricing you
> give, it can be matched on Amazon using Micro or small instances, which
> would be adequate given your load average.

Well, it _has_ to be more complicated to use AWS than a bare machine,
because of the added layer?
--
Vincent Veyron
http://vincentveyron.com/
IT management of insurance claims and litigation files for legal departments
On 08/21/2012 09:40 PM, David Boreham wrote:
> On 8/21/2012 2:18 AM, Vincent Veyron wrote:
>> I wonder : is there a reason why you have to go through the complexity
>> of such a setup, rather than simply use bare metal and get good
>> performance with simplicity?
>
> In general I agree -- it is much (much!) cheaper to buy tin and deploy
> yourself vs any of the current cloud services.
>
> However, there are plenty of counterexample use cases : for example what
> if you want one of these machines for a week only?
> Another one : what if you are a venture capitalist funding 10 companies
> with questionable business models where you expect only one to succeed?
> AWS saves you from the headache of selling 500 machines on eBay...

Dedibox appears to be a hosting company that offers dedicated machines. He appears to be suggesting that buying access to real hardware in a datacenter (if not buying the hardware yourself) is more cost effective and easier to manage than using "cloud" style services with more transient hosts like EC2 offers. At least that's how I understood it. Vincent?

I wasn't sure what Vincent meant until I did an `mtr` on his host address, either. http://dedibox.fr/ redirects to http://www.online.net/

A look at their product page suggests that they're at least claiming the machines are dedicated:

http://www.online.net/serveur-dedie/offre-dedibox-sc.xhtml

running Via Nano (Nano U2250) CPUs on Dell VX11-VS8 machines. The VS8 appears to be a blade:

http://en.community.dell.com/dell-blogs/direct2dell/b/direct2dell/archive/2009/05/19/dell-launches-quot-fortuna-quot-via-nano-based-server-for-hyperscale-customers.aspx
http://www.flickr.com/photos/netbooknews/3537912243/

so yeah, a dedicated server for €15/month. That's *AWESOME* when you mostly need storage and you don't care about performance or storage reliability; it's a local HDD so you get great gobs of storage w/o paying per GB.

--
Craig Ringer
On Wednesday, August 22, 2012 at 13:15 +0800, Craig Ringer wrote:
> He appears to be suggesting that buying access to real hardware in a
> datacenter (if not buying the hardware yourself) is more cost effective
> and easier to manage than using "cloud" style services with more
> transient hosts like EC2 offers. At least that's how I understood it.

Hi Craig,

Actually, my comments about costs were misleading: I simply reacted to the fact that the OP wanted to test his application for high performance, and thought that it would be easier with bare metal rather than with AWS, because you have fewer parameters to control that way.

Also, I'll admit that I jumped the gun without reading about the SSD offer by Amazon. Still, I would test first with a machine that I control, but it may be that Sébastien already did that.

I am curious to know what kind of application requires 10s to 100s of instances with a PostgreSQL database, because that could get unwieldy with big data (which I assumed from the high performance specification).

--
Vincent Veyron
http://marica.fr/
IT management of insurance claims and litigation files for legal departments
Vincent, I would appreciate it if you would stop assuming things based on zero information about what I am doing. I understand that you are trying to be helpful, but I can assure you that going bare-metal only does not make any sense in my context.
Sébastien
On Wed, Aug 22, 2012 at 12:44 PM, Vincent Veyron <vv.lists@wanadoo.fr> wrote:
On Wednesday, August 22, 2012 at 13:15 +0800, Craig Ringer wrote:
> He appears to be suggesting that buying access to real hardware in a
> datacenter (if not buying the hardware yourself) is more cost effective
> and easier to manage than using "cloud" style services with more
> transient hosts like EC2 offers. At least that's how I understood it.
Hi Craig,

Actually, my comments about costs were misleading: I simply reacted to
the fact that the OP wanted to test his application for high
performance, and thought that it would be easier with bare metal rather
than with AWS, because you have fewer parameters to control that way.
Also, I'll admit that I jumped the gun without reading about the SSD
offer by Amazon. Still, I would test first with a machine that I
control, but it may be that Sébastien already did that.
I am curious to know what kind of application requires 10s to 100s of
instances with a PostgreSQL database, because that could get unwieldy
with big data (which I assumed from the high performance specification)
--
Vincent Veyron
http://marica.fr/
IT management of insurance claims and litigation files for legal departments
Just looking into High IO instances for a DB deployment. In order to get past 1TB, we are looking at RAID-0. I have heard (http://hackerne.ws/item?id=4266119) there might be a problem if TRIM isn't supported. Does anyone know if it is and has anyone used RAID-0 on these instances? (Linux of course…)
On Aug 21, 2012, at 9:36 AM, Merlin Moncure wrote:
On Tue, Aug 21, 2012 at 12:33 AM, Sébastien Lorion
<sl@thestrangefactory.com> wrote:
> Hello,
>
> Since Amazon has added new high I/O instance types and EBS volumes, has
> anyone done any benchmarks of PostgreSQL on them?
>
> http://perspectives.mvdirona.com/2012/07/20/IOPerformanceNoLongerSucksInTheCloud.aspx
> http://perspectives.mvdirona.com/2012/08/01/EBSProvisionedIOPSOptimizedInstanceTypes.aspx
> http://aws.typepad.com/aws/2012/08/fast-forward-provisioned-iops-ebs.html
>
> I will be testing my app soon, but was curious to know if others have done
> some tests so I can compare / have a rough idea of what to expect. Looking
> on Google, I found an article about MySQL
> (http://palominodb.com/blog/2012/07/24/palomino-evaluates-amazon%E2%80%99s-new-high-io-ssd-instances),
> but nothing about PostgreSQL.
here's a datapoint, stock config:
pgbench -i -s 500
pgbench -c 16 -T 60
number of transactions actually processed: 418012
tps = 6962.607292 (including connections establishing)
tps = 6973.154593 (excluding connections establishing)
not too shabby. this was run by a friend who is evaluating high i/o
instances for their high load db servers. we didn't have time to
kick off a high scale read only test unfortunately.
Regarding 'AWS vs bare metal', I think high i/o instances fill a huge
niche in their lineup. Dollar for dollar, I'm coming around to the
point of view that dealing with aws is a cheaper/more effective
solution than renting out space from a data center or (even worse)
running your own data center unless you're very large or have other
special requirements. Historically the problem with AWS is that you
had no solution for highly transaction bound systems which forced you
to split your environment which ruined most of the benefit, and they
fixed that.
merlin
On Wednesday, August 22, 2012 04:10:01 PM Andrew Hannon wrote:
> Just looking into High IO instances for a DB deployment. In order to get
> past 1TB, we are looking at RAID-0. I have heard
> (http://hackerne.ws/item?id=4266119) there might be a problem if TRIM isn't
> supported. Does anyone know if it is and has anyone used RAID-0 on these
> instances? (Linux of course…)

Just use LVM striping. As for TRIM being an issue, that seems to be mostly conjecture.

I note that the SSDs are only instance storage. The data will be gone when the instance goes away. I have used instance storage in replicated setups but it always feels rather fragile unless your data really is transient or you can maintain 2 replicas.

Their other new service, provisioned IOPS for EBS, might be more useful for a persistent database. Although not nearly SSD speeds, of course.
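For illustration, LVM striping across the two ephemeral SSDs might look roughly like this. This is a sketch only: the /dev/xvdb and /dev/xvdc device names, the volume names and the stripe size are assumptions, not taken from the thread.

# create physical volumes on both ephemeral SSDs
pvcreate /dev/xvdb /dev/xvdc
# one volume group spanning both devices
vgcreate pgdata /dev/xvdb /dev/xvdc
# a logical volume striped across the 2 devices, 256KB stripe size
lvcreate --name pgvol --stripes 2 --stripesize 256 --extents 100%FREE pgdata
mkfs.ext4 /dev/pgdata/pgvol
mount -o noatime /dev/pgdata/pgvol /pgdata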
On Wednesday, August 22, 2012 at 13:30 -0400, Sébastien Lorion wrote:
> Vincent, I would appreciate that you stop assuming things based on
> zero information about what I am doing. I understand that you are
> trying to be helpful, but I can assure you that going bare-metal only
> does not make any sense in my context.

Dude,

I would appreciate you realize that approaching a newsgroup while providing zero information about what you are doing (in your own words) is not the best way to get relevant responses to your question. Ignoring repeated requests for information does not help; castigating people trying to help for not having said information at least shows a certain consistency on your part.

Lest we ridicule ourselves publicly, I suggest we leave the discussion at that and wish you luck in your endeavor.

Vincent Veyron
On Thu, Aug 23, 2012 at 7:39 AM, Vincent Veyron <vv.lists@wanadoo.fr> wrote:
> Lest we ridicule ourselves publicly, I suggest we leave the discussion
> at that and wish you luck in your endeavor.

If anyone has an answer to his question, I'd appreciate hearing it, despite any faux pas that the OP has committed.
Vincent,
The original question can be summed up as "how is the general performance of PostgreSQL on Amazon IOPS?". I fail to understand why that would require me to explain the specifics of my application and/or my market. The only one asking for that information is you, while others have provided useful answers, for which I am very grateful.
p.s. My name is not "dude" or "seb", we have not raised the pigs together ...
Sébastien
On Thu, Aug 23, 2012 at 7:39 AM, Vincent Veyron <vv.lists@wanadoo.fr> wrote:
On Wednesday, August 22, 2012 at 13:30 -0400, Sébastien Lorion wrote:
> Vincent, I would appreciate that you stop assuming things based on
> zero information about what I am doing. I understand that you are
> trying to be helpful, but I can assure you that going bare-metal only
> does not make any sense in my context.
Dude,

I would appreciate you realize that approaching a newsgroup while
providing zero information about what you are doing (in your own words)
is not the best way to get relevant responses to your question.
Ignoring repeated requests for information does not help, castigating
people trying to help for not having said information at least shows a
certain consistency on your part.
Lest we ridicule ourselves publicly, I suggest we leave the discussion
at that and wish you luck in your endeavor.
Vincent Veyron
On 08/23/2012 07:39 PM, Vincent Veyron wrote:
> On Wednesday, August 22, 2012 at 13:30 -0400, Sébastien Lorion wrote:
>> Vincent, I would appreciate that you stop assuming things based on
>> zero information about what I am doing. I understand that you are
>> trying to be helpful, but I can assure you that going bare-metal only
>> does not make any sense in my context.
>
> Dude,
>
> I would appreciate you realize that approaching a newsgroup while
> providing zero information about what you are doing

In this case, what he's doing is seeking generalized performance measurements. I don't think details were particularly necessary until it got pulled off-track.

I'll be interested to hear if you have any results Sébastien, or if anyone else does. It's good to have data on the increasingly popular cloud platforms out there.

--
Craig Ringer
On Wed, Aug 22, 2012 at 4:12 PM, Alan Hodgson <ahodgson@simkin.ca> wrote:
> On Wednesday, August 22, 2012 04:10:01 PM Andrew Hannon wrote:
>> Just looking into High IO instances for a DB deployment. In order to get
>> past 1TB, we are looking at RAID-0. I have heard
>> (http://hackerne.ws/item?id=4266119) there might be a problem if TRIM isn't
>> supported. Does anyone know if it is and has anyone used RAID-0 on these
>> instances? (Linux of course…)
>
> Just use LVM striping. As for TRIM being an issue, that seems to be mostly
> conjecture.
>
> I note that the SSDs are only instance storage. The data will be gone when the
> instance goes away. I have used instance storage in replicated setups but it
> always feels rather fragile unless your data really is transient or you can
> maintain 2 replicas.
>
> Their other new service, provisioned IOPS for EBS, might be more useful for a
> persistent database. Although not nearly SSD speeds, of course.

Yeah -- I should have mentioned that: you absolutely must run hs/sr or some other strategy that maintains your data. I guess you might as well turn off fsync, right?

merlin
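Since data on instance storage is transient, the hot standby / streaming replication ("hs/sr") setup Merlin refers to is the usual safety net. A minimal 9.1-era sketch follows; the host addresses, user and password are placeholders, not taken from the thread.

# on the primary, postgresql.conf
wal_level = hot_standby
max_wal_senders = 3
wal_keep_segments = 128   # keep enough WAL around for the standby to catch up

# on the primary, pg_hba.conf
host  replication  replicator  10.0.0.2/32  md5

# on the standby, postgresql.conf
hot_standby = on

# on the standby, recovery.conf
standby_mode = 'on'
primary_conninfo = 'host=10.0.0.1 user=replicator password=secret'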
On 08/23/12 6:49 AM, Craig Ringer wrote:
> In this case, what he's doing is seeking generalized performance
> measurements. I don't think details were particularly necessary until
> it got pulled off-track.

"42"

Performance measurements without a very narrow definition of 'performance' are useless. Depending on the nature of the application workload, postgres can stress completely different aspects of the system (cpu vs read IO performance vs write IO performance being the big three).

--
john r pierce                            N 37, W 122
santa cruz ca                         mid-left coast
I think both kinds of tests (general and app-specific) are complementary and useful in their own way. At a minimum, if the general ones fail, why go to the expense of doing the specific ones? Setting up a meaningful application test can take a lot of time, and it can be hard to pinpoint exactly where in the stack the performance drops occur. The way I see it, synthetic benchmarks isolate the layers somewhat and serve as a base to validate application tests done later on. It surprises me that asking for the general perf behavior of a platform is controversial.
Sébastien
On Thu, Aug 23, 2012 at 11:51 AM, John R Pierce <pierce@hogranch.com> wrote:
On 08/23/12 6:49 AM, Craig Ringer wrote:
> In this case, what he's doing is seeking generalized performance measurements. I don't think details were particularly necessary until it got pulled off-track.

"42"

Performance measurements without a very narrow definition of 'performance' are useless. Depending on the nature of the application workload, postgres can stress completely different aspects of the system (cpu vs read IO performance vs write IO performance being the big three).
--
john r pierce N 37, W 122
santa cruz ca mid-left coast
On 08/23/12 11:24 AM, Sébastien Lorion wrote:
> I think both kind of tests (general and app specific) are
> complementary and useful in their own way. At a minimum, if the
> general ones fail, why go to the expenses of doing the specific ones ?
> Setting up a meaningful application test can take a lot of time and it
> can be hard to pinpoint exactly where in the stack the performance
> drops occur. The way I see it, synthetic benchmarks allow to isolate
> somewhat the layers and serve as a base to validate application tests
> done later on. It surprises me that asking for the general perf
> behavior of a platform is controversial.

I don't use AWS at all. But it shouldn't take more than a couple of hours to spin up an instance, populate a pgbench database and run a series of pgbench runs against it, and do the same against any other sort of system you wish to use as your reference.

I like to test with a database about twice the size of the available memory if I'm testing IO, and I've found that with pgbench -i -s ####, for ####=10000, it generates a 1 billion row table and uses about 150GB (and an hour or so to initialize on fast IO hardware). I then run pgbench with -c of about 2-4X the cpu/thread count, and -j of about -c/16, and a -t of at least 10000 (so each client connection runs 10000 transactions).

on a modest but decent 2U class 2-socket dedicated server with a decent raid card and raid10 across enough spindles, I can see numbers as high as 5000 transactions/second with 15krpm rust, and 7000-8000 with a couple of MLC SSDs striped. trying to raid10 a bunch of SATA 7200 disks gives numbers more like 1000. using host based raid, without a write-back cache in the raid card, gives numbers about 1/2 the above. the IOPS during these tests hit around 12000 or 15000 small writes/second.

doing this level of IO on a midsized SAN can often cause the SAN CPU to run at 80%+, so if there's other activity on the SAN from other hosts, good luck.

in a heavily virtualized shared-everything environment, I'm guessing your numbers will be all over the place and difficult to achieve consistency.

--
john r pierce                            N 37, W 122
santa cruz ca                         mid-left coast
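Following that recipe, a run on a hypothetical 16-thread instance would look something like the commands below; the client and thread counts are just one point inside the ranges John gives, and the "bench" database name is an assumption.

# initialize at scale 10000 (roughly a 1 billion row accounts table, ~150GB)
pgbench -i -s 10000 bench
# clients at about 3x the thread count, -j about -c/16, 10000 transactions per client
pgbench -c 48 -j 3 -t 10000 bench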
I will be setting up an instance in the coming days and will post the results here.
While reading on the subject, I found this interesting discussion on YCombinator:
Sébastien
On Thu, Aug 23, 2012 at 2:41 PM, John R Pierce <pierce@hogranch.com> wrote:
On 08/23/12 11:24 AM, Sébastien Lorion wrote:
> I think both kind of tests (general and app specific) are complementary and useful in their own way. At a minimum, if the general ones fail, why go to the expenses of doing the specific ones ? Setting up a meaningful application test can take a lot of time and it can be hard to pinpoint exactly where in the stack the performance drops occur. The way I see it, synthetic benchmarks allow to isolate somewhat the layers and serve as a base to validate application tests done later on. It surprises me that asking for the general perf behavior of a platform is controversial.

I don't use AWS at all. But it shouldn't take more than a couple of hours to spin up an instance, populate a pgbench database and run a series of pgbench runs against it, and do the same against any other sort of system you wish to use as your reference.
I like to test with a database about twice the size of the available memory if I'm testing IO, and I've found that with pgbench -i -s ####, for ####=10000, it generates a 1 billion row table and uses about 150GB (and an hour or so to initialize on fast IO hardware). I then run pgbench with -c of about 2-4X the cpu/thread count, and -j of about -c/16, and a -t of at least 10000 (so each client connection runs 10000 transactions).
on a modest but decent 2U class 2-socket dedicated server with a decent raid card and raid10 across enough spindles, I can see numbers as high as 5000 transactions/second with 15krpm rust, and 7000-8000 with a couple MLC SSD's striped. trying to raid10 a bunch of SATA 7200 disks gives numbers more like 1000. using host based raid, without a write-back cache in the raid card, gives numbers about 1/2 the above. the IOPS during these tests hit around 12000 or 15000 small writes/second.
doing this level of IO on a midsized SAN can often cause the SAN CPU to run at 80%+ so if there's other activity on the SAN from other hosts, good luck.
in a heavily virtualized shared-everything environment, I'm guessing your numbers will be all over the place and difficult to achieve consistency.
--
john r pierce N 37, W 122
santa cruz ca mid-left coast
Finally I got time to setup an instance and do some tests.
Instance:
High-Mem 4x large (8 cores, 68 GB)
EBS-Optimized flag set (allow up to 1000 Mbits/s transfer)
10GB standard EBS for OS
8x100GB in RAID10 for data (max 1000 iops)
2x100GB in RAID0 for WAL (max 1000 iops)
FreeBSD 9.0
PostgreSQL 9.1.5
OS is on UFS
data is on ZFS with noatime, recordsize=8K, logbias=throughput
WAL is on ZFS with noatime, recordsize=8K, logbias=latency, primarycache=metadata
I have included config files used.
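For reference, pools and datasets matching the layout above could be created roughly as follows on FreeBSD. This is a sketch only: the xbd device names are taken from the iostat output later in the thread, and the pool names are assumptions.

# data: RAID10-style pool of four 2-way mirrors over 8 EBS volumes
zpool create db mirror xbd5 xbd6 mirror xbd7 xbd8 mirror xbd9 xbd10 mirror xbd11 xbd12
zfs set atime=off db
zfs set recordsize=8K db
zfs set logbias=throughput db

# WAL: striped pool over the remaining 2 volumes
zpool create wal xbd13 xbd14
zfs set atime=off wal
zfs set recordsize=8K wal
zfs set logbias=latency wal
zfs set primarycache=metadata wal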
Results:
I ran the same pgbench command 5 times in a row, with 5 min. sleep between each.
autovacuum was off
scale is 10000 and -t = 10000
read-write
clients/threads : result1, result2, result3, result4, result5
8/1: 3210, 2394, 2668, 1943, 2894
8/8: 3285, 2487, 2423, 2839, 3380
32/8: 4862, 3116, 2933, 3053, 3013
64/8: 2988, 1197, 867, 1159, 828
read-only (-S)
clients/threads : result1, result2, result3, result4, result5
8/1: 5978, 5983, 6110, 6091, 6158
8/8: 6081, 6022, 6109, 6027, 5479
32/8: 5169, 5144, 4762, 5293, 4936
64/8: 3916, 4203, 4199, 4261, 4070
I also let `pgbench -c 32 -j 8 -T 10800` run last night, with autovacuum turned on this time, and the result is 694 tps.
As you can see, I am nowhere near the results John mentioned for a 10,000 scale (about 8000 tps) and I am not sure why. My instance setup and configuration should be OK, but I am far from an expert (a startup founder has to wear many hats...); I simply followed advice found in Greg Smith's book and what I read on the net. So, if anyone can offer insight as to why the performance is not as good as expected, please let me know.
I did not terminate the AMI yet, so I can do more testing and/or try suggestions to improve the results. I will also try to run the benchmarks again on a pure RAID1 configuration with fsync off, which I will use for read-only databases.
Many thanks!
Sébastien
On Thu, Aug 23, 2012 at 2:41 PM, John R Pierce <pierce@hogranch.com> wrote:
On 08/23/12 11:24 AM, Sébastien Lorion wrote:
> I think both kind of tests (general and app specific) are complementary and useful in their own way. At a minimum, if the general ones fail, why go to the expenses of doing the specific ones ? Setting up a meaningful application test can take a lot of time and it can be hard to pinpoint exactly where in the stack the performance drops occur. The way I see it, synthetic benchmarks allow to isolate somewhat the layers and serve as a base to validate application tests done later on. It surprises me that asking for the general perf behavior of a platform is controversial.

I don't use AWS at all. But it shouldn't take more than a couple of hours to spin up an instance, populate a pgbench database and run a series of pgbench runs against it, and do the same against any other sort of system you wish to use as your reference.
I like to test with a database about twice the size of the available memory if I'm testing IO, and I've found that with pgbench -i -s ####, for ####=10000, it generates a 1 billion row table and uses about 150GB (and an hour or so to initialize on fast IO hardware). I then run pgbench with -c of about 2-4X the cpu/thread count, and -j of about -c/16, and a -t of at least 10000 (so each client connection runs 10000 transactions).
on a modest but decent 2U class 2-socket dedicated server with a decent raid card and raid10 across enough spindles, I can see numbers as high as 5000 transactions/second with 15krpm rust, and 7000-8000 with a couple MLC SSD's striped. trying to raid10 a bunch of SATA 7200 disks gives numbers more like 1000. using host based raid, without a write-back cache in the raid card, gives numbers about 1/2 the above. the IOPS during these tests hit around 12000 or 15000 small writes/second.
doing this level of IO on a midsized SAN can often cause the SAN CPU to run at 80%+ so if there's other activity on the SAN from other hosts, good luck.
in a heavily virtualized shared-everything environment, I'm guessing your numbers will be all over the place and difficult to achieve consistency.
--
john r pierce N 37, W 122
santa cruz ca mid-left coast
On 2012-09-12, at 17:08, Sébastien Lorion wrote:
> As you can see, I am nowhere near the results John mentioned for a 10,000 scale (about 8000 tps) and I am not sure why. My instance setup and configuration should be ok, but I am far from an expert (a startup founder has to wear many hats...), I simply followed advice found in Greg Smith book and what I read on the net. So, if anyone can offer insight as to why the performance is not as good as expected, please let me know ..
>
> I did not terminate the AMI yet, so I can do more testing and/or try suggestions to improve the results. I will also try to run the benchmarks again on a pure RAID1 configuration with fsync off, which I will use for read-only databases.

I wonder if work_mem is too high? That's 1 GB per connection (max), but still a lot of RAM. Might want to try more reasonable values, such as 16MB to 64MB.

What are the iostat / vmstat numbers during the test?

Bye,
François
On 09/12/12 3:17 PM, François Beausoleil wrote:
> What are the iostat / vmstat numbers during the test?

note you need to run iostat with -x <interval> and ignore the first sample, as it's the average since reboot. I usually use 5, 10, or 30 second intervals when analyzing IO performance problems. on a system with multiple storage devices, the device list can get long and confusing, so I'll often specify just the device(s) I'm investigating...

iostat -x 5 sdc sdd

ditto, vmstat needs an <interval> argument, or it shows the average since reboot.

--
john r pierce                            N 37, W 122
santa cruz ca                         mid-left coast
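On the FreeBSD instance used in this thread, the equivalent sampling would be roughly as below; the xbd device names come from the iostat output further down, and the FreeBSD flag spelling is an assumption, so check iostat(8) and vmstat(8) on the instance. As above, discard the first report, which is the average since boot.

iostat -x -w 5 xbd5 xbd13
vmstat -w 5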
I agree 1GB is a lot. I played around with that value, but it hardly makes a difference. Is there a plateau in how that value affects query performance? On a master DB, I would set it low and raise it as necessary, but what would be a good average value on a read-only DB with the same spec and max_connections?
I will run a test again and let you know how the I/O looks. I might also run bonnie++ to see if the RAID performs as expected...
Sébastien
On Wed, Sep 12, 2012 at 6:17 PM, François Beausoleil <francois@teksol.info> wrote:
On 2012-09-12, at 17:08, Sébastien Lorion wrote:
> As you can see, I am nowhere near the results John mentioned for a 10,000 scale (about 8000 tps) and I am not sure why. My instance setup and configuration should be ok, but I am far from an expert (a startup founder has to wear many hats...), I simply followed advice found in Greg Smith book and what I read on the net. So, if anyone can offer insight as to why the performance is not as good as expected, please let me know ..
>
> I did not terminate the AMI yet, so I can do more testing and/or try suggestions to improve the results. I will also try to run the benchmarks again on a pure RAID1 configuration with fsync off, which I will use for read-only databases.
I wonder if work_mem is too high? That's 1 GB per connection (max), but still a lot of RAM. Might want to try more reasonable values, such as 16MB to 64MB.

What are the iostat / vmstat numbers during the test?
Bye,
François
On 09/12/12 4:03 PM, Sébastien Lorion wrote:
> I agree 1GB is a lot, I played around with that value, but it hardly
> makes a difference. Is there a plateau in how that value affects query
> performance ? On a master DB, I would set it low and raise as
> necessary, but what would be a good average value on a read-only DB
> with same spec and max_connections ?

a complex query can require several times work_mem for sorts and hash merges. how many queries do you expect to ever be executing concurrently? I'll take 25% of my system memory and divide it by 'max_connections' and use that as work_mem for most cases.

on a large memory system doing dedicated transaction processing, I generally shoot for about 50% of the server memory as disk cache, 1-2GB as shared_buffers, 512MB-2GB as maintenance_work_mem, and 20-25% as work_mem (divided by max_connections).

--
john r pierce                            N 37, W 122
santa cruz ca                         mid-left coast
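Applied to the 68 GB instance being tested in this thread, that rule of thumb works out roughly as follows, assuming max_connections = 100 (which is not stated in the thread); the numbers only illustrate the arithmetic, not a recommendation.

shared_buffers = 2GB            # 1-2GB for a transactional workload
maintenance_work_mem = 2GB      # 512MB-2GB
work_mem = 170MB                # ~25% of 68GB is 17GB; 17GB / 100 connections is ~170MB
effective_cache_size = 34GB     # ~50% of RAM left to the OS / ZFS ARC as disk cache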
You set shared_buffers way below what is suggested in Greg Smith's book (25% or more of RAM) .. what is the rationale behind that rule of thumb? Other values are more or less what I set, though I could lower effective_cache_size and vfs.zfs.arc_max and see how it goes.
Sébastien
On Wed, Sep 12, 2012 at 7:24 PM, John R Pierce <pierce@hogranch.com> wrote:
On 09/12/12 4:03 PM, Sébastien Lorion wrote:
> I agree 1GB is a lot, I played around with that value, but it hardly makes a difference. Is there a plateau in how that value affects query performance ? On a master DB, I would set it low and raise as necessary, but what would be a good average value on a read-only DB with same spec and max_connections ?

a complex query can require several times work_mem for sorts and hash merges. how many queries do you expect to ever be executing concurrently? I'll take 25% of my system memory and divide it by 'max_connections' and use that as work_mem for most cases.
on a large memory system doing dedicated transaction processing, I generally shoot for about 50% of the server memory as disk cache, 1-2GB as shared_buffers, 512MB-2GB as maintenance_work_mem, and 20-25% as work_mem (divided by max_connections)
--
john r pierce N 37, W 122
santa cruz ca mid-left coast
On 09/12/12 4:49 PM, Sébastien Lorion wrote:
> You set shared_buffers way below what is suggested in Greg Smith book
> (25% or more of RAM) .. what is the rationale behind that rule of
> thumb ? Other values are more or less what I set, though I could lower
> the effective_cache_size and vfs.zfs.arc_max and see how it goes.

I think those 25% rules were typically created when ram was no more than 4-8GB.

for our highly transactional workload, at least, too large of a shared_buffers seems to slow us down, perhaps due to higher overhead of managing that many 8k buffers. I've heard other read-mostly workloads, such as data warehousing, can take advantage of larger buffer counts.

--
john r pierce                            N 37, W 122
santa cruz ca                         mid-left coast
The DB back-end of my application has 2 use cases:
- a normalized master DB, sharded by userid (based on their activity, not a formula such as modulo, because some users can be 1-2 orders of magnitude more active than others)
- many denormalized read-only slaves, with some different models depending on what kind of queries they are serving
All requests are queued in RabbitMQ, and most writes are fire-and-forget, with calculations done on the client, assuming the write worked fine and UI refreshed at most a couple of seconds later with real values.
At the moment, I have a total of 73 tables, 432 columns and 123 relations, so not overly complex, but not a key-value store either .. Most queries are localized to a sub-system, which on average has 5-10 tables.
So far, about 10% of traffic hits the master DB (a lot, I know; this is an interactive application), which is the one that really concerns me. Users make an average of 1 write request every 5 sec., so with, say, 100,000 concurrent users, that makes 20,000 tx/s. That said, that number could grow overnight, and I do not want to be one more startup that redoes its system under the worst of conditions, as I read about too often. I have time to at least prepare a bit, without overdoing it, of course..
Sébastien
On Wed, Sep 12, 2012 at 7:24 PM, John R Pierce <pierce@hogranch.com> wrote:
On 09/12/12 4:03 PM, Sébastien Lorion wrote:
> I agree 1GB is a lot, I played around with that value, but it hardly makes a difference. Is there a plateau in how that value affects query performance ? On a master DB, I would set it low and raise as necessary, but what would be a good average value on a read-only DB with same spec and max_connections ?

a complex query can require several times work_mem for sorts and hash merges. how many queries do you expect to ever be executing concurrently? I'll take 25% of my system memory and divide it by 'max_connections' and use that as work_mem for most cases.
on a large memory system doing dedicated transaction processing, I generally shoot for about 50% of the server memory as disk cache, 1-2GB as shared_buffers, 512MB-2GB as maintenance_work_mem, and 20-25% as work_mem (divided by max_connections)
--
john r pierce N 37, W 122
santa cruz ca mid-left coast
OK, that makes sense .. I will update that as well and report back. Thank you for your advice.
Sébastien
On Wed, Sep 12, 2012 at 8:04 PM, John R Pierce <pierce@hogranch.com> wrote:
On 09/12/12 4:49 PM, Sébastien Lorion wrote:
> You set shared_buffers way below what is suggested in Greg Smith book (25% or more of RAM) .. what is the rationale behind that rule of thumb ? Other values are more or less what I set, though I could lower the effective_cache_size and vfs.zfs.arc_max and see how it goes.

I think those 25% rules were typically created when ram was no more than 4-8GB.
for our highly transactional workload, at least, too large of a shared_buffers seems to slow us down, perhaps due to higher overhead of managing that many 8k buffers. I've heard other read-mostly workloads, such as data warehousing, can take advantage of larger buffer counts.
--
john r pierce N 37, W 122
santa cruz ca mid-left coast
Is dedicating 2 drives for WAL too much? Since my whole RAID is made up of SSD drives, should I just put the WAL in the main pool?
Sébastien
On Wed, Sep 12, 2012 at 8:28 PM, Sébastien Lorion <sl@thestrangefactory.com> wrote:
Ok, make sense .. I will update that as well and report back. Thank you for your advice.

Sébastien

On Wed, Sep 12, 2012 at 8:04 PM, John R Pierce <pierce@hogranch.com> wrote:

On 09/12/12 4:49 PM, Sébastien Lorion wrote:
You set shared_buffers way below what is suggested in Greg Smith book (25% or more of RAM) .. what is the rationale behind that rule of thumb ? Other values are more or less what I set, though I could lower the effective_cache_size and vfs.zfs.arc_max and see how it goes.

I think those 25% rules were typically created when ram was no more than 4-8GB.
for our highly transactional workload, at least, too large of a shared_buffers seems to slow us down, perhaps due to higher overhead of managing that many 8k buffers. I've heard other read-mostly workloads, such as data warehousing, can take advantage of larger buffer counts.
--
john r pierce N 37, W 122
santa cruz ca mid-left coast
One more question .. I could not set wal_sync_method to anything other than fsync .. is that expected, or should other choices also be available? I am not sure how SSD cache flushing is handled on EC2, but I hope it is flushing the whole cache on every sync .. As a side note, I got corrupted databases (errors about pg_xlog directories not found, etc.) at first when running my tests, and I suspect it was because of vfs.zfs.cache_flush_disable=1, though I cannot prove it for sure.
Sébastien
On Wed, Sep 12, 2012 at 8:49 PM, Sébastien Lorion <sl@thestrangefactory.com> wrote:
Is dedicating 2 drives for WAL too much ? Since my whole raid is comprised of SSD drives, should I just put it in the main pool ?

Sébastien

On Wed, Sep 12, 2012 at 8:28 PM, Sébastien Lorion <sl@thestrangefactory.com> wrote:

Ok, make sense .. I will update that as well and report back. Thank you for your advice.

Sébastien

On Wed, Sep 12, 2012 at 8:04 PM, John R Pierce <pierce@hogranch.com> wrote:

On 09/12/12 4:49 PM, Sébastien Lorion wrote:
You set shared_buffers way below what is suggested in Greg Smith book (25% or more of RAM) .. what is the rationale behind that rule of thumb ? Other values are more or less what I set, though I could lower the effective_cache_size and vfs.zfs.arc_max and see how it goes.

I think those 25% rules were typically created when ram was no more than 4-8GB.
for our highly transactional workload, at least, too large of a shared_buffers seems to slow us down, perhaps due to higher overhead of managing that many 8k buffers. I've heard other read-mostly workloads, such as data warehousing, can take advantage of larger buffer counts.
--
john r pierce N 37, W 122
santa cruz ca mid-left coast
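On the wal_sync_method question above: the set of accepted values depends on which sync primitives the operating system exposes, so seeing only fsync work on FreeBSD is plausible. One way to check what the platform supports, and how fast each method is, is contrib's pg_test_fsync, which ships with 9.1; the path below is an assumption, the point is to aim it at a file on the WAL pool.

# compare the sync method timings on the WAL volume
pg_test_fsync -f /wal/pg_test_fsync.tmp
# confirm what the running server actually accepted
psql -c "SHOW wal_sync_method;"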
Forgot to say that this was with the new suggested values (see included postgresql.conf) and the ARC cache size set to 32GB.
Sébastien
On Wed, Sep 12, 2012 at 9:16 PM, Sébastien Lorion <sl@thestrangefactory.com> wrote:
I recreated the DB and WAL pools, and launched pgbench -i -s 10000. Here are the stats during the load (still running):
iostat (xbd13-14 are WAL zpool)

device   r/s   w/s   kr/s    kw/s  qlen svc_t  %b
xbd8     0.0 471.5    0.0 14809.3    40  67.9  84
xbd7     0.0 448.1    0.0 14072.6    39  62.0  74
xbd6     0.0 472.3    0.0 14658.6    39  61.3  77
xbd5     0.0 464.7    0.0 14433.1    39  61.4  76
xbd14    0.0   0.0    0.0     0.0     0   0.0   0
xbd13    0.0   0.0    0.0     0.0     0   0.0   0
xbd12    0.0 460.1    0.0 14189.7    40  63.4  78
xbd11    0.0 462.9    0.0 14282.8    40  61.8  76
xbd10    0.0 477.0    0.0 14762.1    38  61.2  77
xbd9     0.0 477.6    0.0 14796.2    38  61.1  77
zpool iostat (db pool)
pool alloc free read write read write
db 11.1G 387G 0 6.62K 0 62.9M
vmstat

procs  memory        page                       disks    faults        cpu
r b w  avm    fre    flt re pi po    fr  sr ad0 xb8   in    sy    cs us sy id
0 0 0  3026M  35G    126  0  0  0 29555   0   0 478 2364 31201 26165 10  9 81

top

last pid: 1333;  load averages: 1.89, 1.65, 1.08    up 0+01:17:08  01:13:45
32 processes: 2 running, 30 sleeping
CPU: 10.3% user, 0.0% nice, 7.8% system, 1.2% interrupt, 80.7% idle
Mem: 26M Active, 19M Inact, 33G Wired, 16K Cache, 25M Buf, 33G Free
On Wed, Sep 12, 2012 at 9:02 PM, Sébastien Lorion <sl@thestrangefactory.com> wrote:
>
> One more question .. I could not set wal_sync_method to anything else but fsync .. is that expected or should other choices be also available ? I am not sure how the EC2 SSD cache flushing is handled on EC2, but I hope it is flushing the whole cache on every sync .. As a side note, I got corrupted databases (errors about pg_xlog directories not found, etc) at first when running my tests, and I suspect it was because of vfs.zfs.cache_flush_disable=1, though I cannot prove it for sure.
>
> Sébastien
>
>
> On Wed, Sep 12, 2012 at 8:49 PM, Sébastien Lorion <sl@thestrangefactory.com> wrote:
>>
>> Is dedicating 2 drives for WAL too much ? Since my whole raid is comprised of SSD drives, should I just put it in the main pool ?
>>
>> Sébastien
>>
>>
>> On Wed, Sep 12, 2012 at 8:28 PM, Sébastien Lorion <sl@thestrangefactory.com> wrote:
>>>
>>> Ok, make sense .. I will update that as well and report back. Thank you for your advice.
>>>
>>> Sébastien
>>>
>>>
>>> On Wed, Sep 12, 2012 at 8:04 PM, John R Pierce <pierce@hogranch.com> wrote:
>>>>
>>>> On 09/12/12 4:49 PM, Sébastien Lorion wrote:
>>>>>
>>>>> You set shared_buffers way below what is suggested in Greg Smith book (25% or more of RAM) .. what is the rationale behind that rule of thumb ? Other values are more or less what I set, though I could lower the effective_cache_size and vfs.zfs.arc_max and see how it goes.
>>>>
>>>>
>>>> I think those 25% rules were typically created when ram was no more than 4-8GB.
>>>>
>>>> for our highly transactional workload, at least, too large of a shared_buffers seems to slow us down, perhaps due to higher overhead of managing that many 8k buffers. I've heard other read-mostly workloads, such as data warehousing, can take advantage of larger buffer counts.
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> john r pierce N 37, W 122
>>>> santa cruz ca mid-left coast
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>
I recreated the DB and WAL pools, and launched pgbench -i -s 10000. Here are the stats during the load (still running):
iostat (xbd13-14 are WAL zpool)
device r/s w/s kr/s kw/s qlen svc_t %b
xbd8 0.0 471.5 0.0 14809.3 40 67.9 84
xbd7 0.0 448.1 0.0 14072.6 39 62.0 74
xbd6 0.0 472.3 0.0 14658.6 39 61.3 77
xbd5 0.0 464.7 0.0 14433.1 39 61.4 76
xbd14 0.0 0.0 0.0 0.0 0 0.0 0
xbd13 0.0 0.0 0.0 0.0 0 0.0 0
xbd12 0.0 460.1 0.0 14189.7 40 63.4 78
xbd11 0.0 462.9 0.0 14282.8 40 61.8 76
xbd10 0.0 477.0 0.0 14762.1 38 61.2 77
xbd9 0.0 477.6 0.0 14796.2 38 61.1 77
zpool iostat (db pool)
pool alloc free read write read write
db 11.1G 387G 0 6.62K 0 62.9M
vmstat
procs memory page disks faults cpu
r b w avm fre flt re pi po fr sr ad0 xb8 in sy cs us sy id
0 0 0 3026M 35G 126 0 0 0 29555 0 0 478 2364 31201 26165 10 9 81
last pid: 1333; load averages: 1.89, 1.65, 1.08 up 0+01:17:08 01:13:45
32 processes: 2 running, 30 sleeping
CPU: 10.3% user, 0.0% nice, 7.8% system, 1.2% interrupt, 80.7% idle
Mem: 26M Active, 19M Inact, 33G Wired, 16K Cache, 25M Buf, 33G Free
On Wed, Sep 12, 2012 at 9:02 PM, Sébastien Lorion <sl@thestrangefactory.com> wrote:
>
> One more question .. I could not set wal_sync_method to anything else but fsync .. is that expected or should other choices be also available ? I am not sure how the EC2 SSD cache flushing is handled on EC2, but I hope it is flushing the whole cache on every sync .. As a side note, I got corrupted databases (errors about pg_xlog directories not found, etc) at first when running my tests, and I suspect it was because of vfs.zfs.cache_flush_disable=1, though I cannot prove it for sure.
>
> Sébastien
>
>
> On Wed, Sep 12, 2012 at 8:49 PM, Sébastien Lorion <sl@thestrangefactory.com> wrote:
>>
>> Is dedicating 2 drives for WAL too much ? Since my whole raid is comprised of SSD drives, should I just put it in the main pool ?
>>
>> Sébastien
>>
>>
>> On Wed, Sep 12, 2012 at 8:28 PM, Sébastien Lorion <sl@thestrangefactory.com> wrote:
>>>
>>> Ok, make sense .. I will update that as well and report back. Thank you for your advice.
>>>
>>> Sébastien
>>>
>>>
>>> On Wed, Sep 12, 2012 at 8:04 PM, John R Pierce <pierce@hogranch.com> wrote:
>>>>
>>>> On 09/12/12 4:49 PM, Sébastien Lorion wrote:
>>>>>
>>>>> You set shared_buffers way below what is suggested in Greg Smith book (25% or more of RAM) .. what is the rationale behind that rule of thumb ? Other values are more or less what I set, though I could lower the effective_cache_size and vfs.zfs.arc_max and see how it goes.
>>>>
>>>>
>>>> I think those 25% rules were typically created when ram was no more than 4-8GB.
>>>>
>>>> for our highly transactional workload, at least, too large of a shared_buffers seems to slow us down, perhaps due to higher overhead of managing that many 8k buffers. I've heard other read-mostly workloads, such as data warehousing, can take advantage of larger buffer counts.
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> john r pierce N 37, W 122
>>>> santa cruz ca mid-left coast
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>
On Tue, Aug 21, 2012 at 1:18 AM, Vincent Veyron <vv.lists@wanadoo.fr> wrote:
On Tuesday, August 21, 2012 at 01:33 -0400, Sébastien Lorion wrote:
>
>
> Since Amazon has added new high I/O instance types and EBS volumes,
> anyone has done some benchmark of PostgreSQL on them ?
>
I wonder : is there a reason why you have to go through the complexity
of such a setup, rather than simply use bare metal and get good
performance with simplicity?
For instance, the dedibox I use for my app (visible in sig) costs 14,00
euros/month, and sits at .03% load average with 5 active users; you can
admin it like a home pc.
The main use cases I know of are relatively small instances where the web server and db server for an app may be on the same system.
--
Vincent Veyron
http://marica.fr/
IT management of insurance claims and litigation files for legal departments
On 09/12/12 10:01 PM, Sébastien Lorion wrote:
> pgbench initialization has been going on for almost 5 hours now and
> still stuck before vacuum starts .. something is definitely wrong as I
> don't remember it took so long first time I created the db....
pgbench initialization with a high scale factor, like the -s 10000 I frequently use, does take quite a few hours. You need a large maintenance_work_mem, or the create index phase will take a really long time to index the 150GB worth of tables it has created.
--
john r pierce N 37, W 122
santa cruz ca mid-left coast
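For a rough sense of what -s 10000 implies (numbers from pgbench's documented scaling rules, not measured in this thread): each unit of scale adds 100,000 rows to pgbench_accounts, and the whole load is a single COPY stream followed by the primary-key builds, which is where maintenance_work_mem comes into play. A back-of-the-envelope sketch:
    -s 10000  =>  pgbench_accounts: 10000 x 100,000 = 1,000,000,000 rows
                  pgbench_tellers:  10000 x 10      =       100,000 rows
                  pgbench_branches: 10000 x 1       =        10,000 rows
                  data size at roughly 13-15 MB per scale unit ~ 130-150 GB
                  plus one primary-key index built over the billion-row table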
pgbench initialization has been going on for almost 5 hours now and still stuck before vacuum starts .. something is definitely wrong as I don't remember it took so long first time I created the db. Here are the current stats now:
iostat (xbd13-14 are WAL zpool)
device r/s w/s kr/s kw/s qlen svc_t %b
xbd8 161.3 109.8 1285.4 3450.5 0 12.5 19
xbd7 159.5 110.6 1272.3 3450.5 0 11.4 14
xbd6 161.1 108.8 1284.4 3270.6 0 10.9 14
xbd5 159.5 109.0 1273.1 3270.6 0 11.6 15
xbd14 0.0 0.0 0.0 0.0 0 0.0 0
xbd13 0.0 0.0 0.0 0.0 0 0.0 0
xbd12 204.6 110.8 1631.3 3329.2 0 9.1 15
xbd11 216.0 111.2 1722.5 3329.2 1 8.6 16
xbd10 197.2 109.4 1573.5 3285.8 0 9.8 15
xbd9 195.0 109.4 1557.1 3285.8 0 9.9 15
zpool iostat (db pool)
pool alloc free read write read write
db 143G 255G 1.40K 1.53K 11.2M 12.0M
vmstat
procs memory page disks faults cpu
r b w avm fre flt re pi po fr sr ad0 xb8 in sy cs us sy id
0 0 0 5634M 28G 7 0 0 0 7339 0 0 245 2091 6358 20828 2 5 93
0 0 0 5634M 28G 10 0 0 0 6989 0 0 312 1993 6033 20090 1 4 95
0 0 0 5634M 28G 7 0 0 0 6803 0 0 292 1974 6111 22763 2 5 93
0 0 0 5634M 28G 10 0 0 0 7418 0 0 339 2041 6170 20838 2 4 94
0 0 0 5634M 28G 123 0 0 0 6980 0 0 282 1977 5906 19961 2 4 94
top
last pid: 2430; load averages: 0.72, 0.73, 0.69 up 0+04:56:16 04:52:53
32 processes: 1 running, 31 sleeping
CPU: 1.8% user, 0.0% nice, 5.3% system, 1.4% interrupt, 91.5% idle
Mem: 1817M Active, 25M Inact, 36G Wired, 24K Cache, 699M Buf, 28G Free
Swap:
PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
1283 pgsql 1 34 0 3967M 1896M zio->i 5 80:14 21.00% postgres
1282 pgsql 1 25 0 25740K 3088K select 2 10:34 0.00% pgbench
1274 pgsql 1 20 0 2151M 76876K select 1 0:09 0.00% postgres
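To see which phase that single backend is actually in (the COPY, the primary-key sort, or the vacuum), it can be watched from a second session; the pg_stat_activity column names below are the 9.1 ones (procpid, current_query), and on 9.2 they become pid, state and query. A small sketch, together with a per-device view of the db pool:
    # what the loading backend is doing right now (9.1 column names)
    psql -d postgres -c "SELECT procpid, waiting, current_query FROM pg_stat_activity"
    # per-device throughput and IOPS for the db pool, refreshed every 5 seconds
    zpool iostat -v db 5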
On Wed, Sep 12, 2012 at 9:16 PM, Sébastien Lorion <sl@thestrangefactory.com> wrote:
I recreated the DB and WAL pools, and launched pgbench -i -s 10000. Here are the stats during the load (still running):
iostat (xbd13-14 are WAL zpool)
device r/s w/s kr/s kw/s qlen svc_t %b
xbd8 0.0 471.5 0.0 14809.3 40 67.9 84
xbd7 0.0 448.1 0.0 14072.6 39 62.0 74
xbd6 0.0 472.3 0.0 14658.6 39 61.3 77
xbd5 0.0 464.7 0.0 14433.1 39 61.4 76
xbd14 0.0 0.0 0.0 0.0 0 0.0 0
xbd13 0.0 0.0 0.0 0.0 0 0.0 0
xbd12 0.0 460.1 0.0 14189.7 40 63.4 78
xbd11 0.0 462.9 0.0 14282.8 40 61.8 76
xbd10 0.0 477.0 0.0 14762.1 38 61.2 77
xbd9 0.0 477.6 0.0 14796.2 38 61.1 77
zpool iostat (db pool)
pool alloc free read write read write
db 11.1G 387G 0 6.62K 0 62.9M
vmstat
procs memory page disks faults cpu
r b w avm fre flt re pi po fr sr ad0 xb8 in sy cs us sy id
0 0 0 3026M 35G 126 0 0 0 29555 0 0 478 2364 31201 26165 10 9 81
top
last pid: 1333; load averages: 1.89, 1.65, 1.08 up 0+01:17:08 01:13:45
32 processes: 2 running, 30 sleeping
CPU: 10.3% user, 0.0% nice, 7.8% system, 1.2% interrupt, 80.7% idle
Mem: 26M Active, 19M Inact, 33G Wired, 16K Cache, 25M Buf, 33G Free
On Wed, Sep 12, 2012 at 9:02 PM, Sébastien Lorion <sl@thestrangefactory.com> wrote:
>
> One more question .. I could not set wal_sync_method to anything else but fsync .. is that expected or should other choices be also available ? I am not sure how the EC2 SSD cache flushing is handled on EC2, but I hope it is flushing the whole cache on every sync .. As a side note, I got corrupted databases (errors about pg_xlog directories not found, etc) at first when running my tests, and I suspect it was because of vfs.zfs.cache_flush_disable=1, though I cannot prove it for sure.
>
> Sébastien
>
>
> On Wed, Sep 12, 2012 at 8:49 PM, Sébastien Lorion <sl@thestrangefactory.com> wrote:
>>
>> Is dedicating 2 drives for WAL too much ? Since my whole raid is comprised of SSD drives, should I just put it in the main pool ?
>>
>> Sébastien
>>
>>
>> On Wed, Sep 12, 2012 at 8:28 PM, Sébastien Lorion <sl@thestrangefactory.com> wrote:
>>>
>>> Ok, make sense .. I will update that as well and report back. Thank you for your advice.
>>>
>>> Sébastien
>>>
>>>
>>> On Wed, Sep 12, 2012 at 8:04 PM, John R Pierce <pierce@hogranch.com> wrote:
>>>>
>>>> On 09/12/12 4:49 PM, Sébastien Lorion wrote:
>>>>>
>>>>> You set shared_buffers way below what is suggested in Greg Smith book (25% or more of RAM) .. what is the rationale behind that rule of thumb ? Other values are more or less what I set, though I could lower the effective_cache_size and vfs.zfs.arc_max and see how it goes.
>>>>
>>>>
>>>> I think those 25% rules were typically created when ram was no more than 4-8GB.
>>>>
>>>> for our highly transactional workload, at least, too large of a shared_buffers seems to slow us down, perhaps due to higher overhead of managing that many 8k buffers. I've heard other read-mostly workloads, such as data warehousing, can take advantage of larger buffer counts.
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> john r pierce N 37, W 122
>>>> santa cruz ca mid-left coast
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>
maintenance_work_mem is already 4GB. How large should it be during load then ?
Sébastien
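If the larger value is only wanted for the load itself, it does not have to go into postgresql.conf at all: pgbench talks to the server through libpq, so the PGOPTIONS environment variable can push the setting into just that one session. A sketch (the database name and the size are illustrative, not values recommended anywhere in this thread):
    # raise maintenance_work_mem for pgbench's connection only
    PGOPTIONS="-c maintenance_work_mem=4GB" pgbench -i -s 10000 bench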
On Thu, Sep 13, 2012 at 1:29 AM, John R Pierce <pierce@hogranch.com> wrote:
On 09/12/12 10:01 PM, Sébastien Lorion wrote:
> pgbench initialization has been going on for almost 5 hours now and
> still stuck before vacuum starts .. something is definitely wrong as I
> don't remember it took so long first time I created the db....
pgbench initialization with a high scale factor, like the -s 10000 I frequently use, does take quite a few hours. You need a large maintenance_work_mem, or the create index phase will take a really long time to index the 150GB worth of tables it has created.
--
john r pierce N 37, W 122
santa cruz ca mid-left coast
I started db creation over, this time with 16GB maintenance_work_mem and fsync=off and it does not seem to have had much effect. After another 5 hours, during index creation, disk and cpu are barely used: 95% idle and 2-3 MB/s writes (150 reads/s, 90 writes/s).
Sébastien
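With both the CPUs and the pool mostly idle during the index build, two quick checks are worth making before blaming the instance: the per-disk %busy and latency figures from GEOM, and the recordsize actually in effect on the datasets, since ZFS's default 128k records against PostgreSQL's 8k pages can turn the sort's small writes into read-modify-write cycles. A sketch, using the pool name from earlier in the thread:
    # per-disk %busy, queue depth and latency, refreshed every second
    gstat -I 1s
    # record size and sync-related properties currently set on the db pool
    zfs get recordsize,logbias,sync db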
On Thu, Sep 13, 2012 at 1:29 AM, John R Pierce <pierce@hogranch.com> wrote:
On 09/12/12 10:01 PM, Sébastien Lorion wrote:
> pgbench initialization has been going on for almost 5 hours now and
> still stuck before vacuum starts .. something is definitely wrong as I
> don't remember it took so long first time I created the db....
pgbench initialization with a high scale factor, like the -s 10000 I frequently use, does take quite a few hours. You need a large maintenance_work_mem, or the create index phase will take a really long time to index the 150GB worth of tables it has created.
--
john r pierce N 37, W 122
santa cruz ca mid-left coast
On 09/13/12 2:08 PM, Sébastien Lorion wrote:
> I started db creation over, this time with 16GB maintenance_work_mem
> and fsync=off and it does not seem to have had much effect. After another
> 5 hours, during index creation, disk and cpu are barely used: 95% idle
> and 2-3 MB/s writes (150 reads/s, 90 writes/s).
I've never had to set maintenance_work_mem any higher than 1GB for plenty good enough performance.
What's the %busy on the disk ? If you have a slow disk device (such as a shared virtual disk), 90 writes/sec may be all it's good for. MB/s is fairly meaningless when dealing with random committed writes.
--
john r pierce N 37, W 122
santa cruz ca mid-left coast
Disks are doing 150 read + 90 write ops/s when they should be able to do a total of 1000 iops each as currently configured (this is the max that can be set). Total bandwidth is 1000mb/s each too. So clearly, either there is something wrong with ZFS/FreeBSD on Amazon (either because of config or something deeper) or PostgreSQL is not fully utilizing the hardware, again because of config or some other issue.
I will make another test instance with pg 9.2 this time.
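Before concluding that ZFS/FreeBSD or PostgreSQL is leaving IOPS on the table, it may be worth measuring one EBS volume directly with a synthetic random-read test, taking the filesystem and the database out of the picture. A sketch using fio (available from FreeBSD ports as sysutils/fio; the device name is taken from the iostat output above, and random reads keep the test non-destructive):
    # 8k random reads against one provisioned-IOPS volume for 60 seconds
    fio --name=randread --filename=/dev/xbd5 --rw=randread --bs=8k \
        --ioengine=posixaio --iodepth=32 --runtime=60 --time_based
If that reports something close to the provisioned 1000 IOPS, the limit is above the device; if not, the volume itself or the instance's EBS path is the first place to look.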
Concerning shared_buffers and wal_buffers, I found this article interesting:
http://rhaas.blogspot.ca/2012/03/tuning-sharedbuffers-and-walbuffers.html
Sébastien
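For what it is worth, the kind of configuration that article argues for on a large-memory, write-heavy box would look roughly like the excerpt below. The numbers are purely illustrative for an instance of this size and were not benchmarked in this thread, so treat them as a starting point to test rather than a recommendation:
    # postgresql.conf excerpt (9.1/9.2-era parameter names; values are illustrative)
    shared_buffers = 8GB                 # deliberately well under 25% of RAM
    wal_buffers = 16MB                   # the usual ceiling; auto-tuned from shared_buffers since 9.1
    checkpoint_segments = 64             # spread checkpoint I/O over more WAL
    checkpoint_completion_target = 0.9
    effective_cache_size = 32GB          # leave room for the ZFS ARC
    maintenance_work_mem = 1GB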
On Thu, Sep 13, 2012 at 5:28 PM, John R Pierce <pierce@hogranch.com> wrote:
On 09/13/12 2:08 PM, Sébastien Lorion wrote:
> I started db creation over, this time with 16GB maintenance_work_mem
> and fsync=off and it does not seem to have had much effect. After another
> 5 hours, during index creation, disk and cpu are barely used: 95% idle
> and 2-3 MB/s writes (150 reads/s, 90 writes/s).
I've never had to set maintenance_work_mem any higher than 1GB for plenty good enough performance.
What's the %busy on the disk ? If you have a slow disk device (such as a shared virtual disk), 90 writes/sec may be all it's good for. MB/s is fairly meaningless when dealing with random committed writes.
--
john r pierce N 37, W 122
santa cruz ca mid-left coast
Just saw your email among all the others .. Pinterest, Instagram, Netflix, Shazam, NASDAQ, Cycle Computing (http://arstechnica.com/business/2011/09/30000-core-cluster-built-on-amazon-ec2-cloud/) .. that list could go on and on, see http://aws.amazon.com/solutions/case-studies/ for some more.
For a small all-in-one web server, any kind of web hosting is fine, and Amazon would most certainly be the pricier option.
Sébastien
On Thu, Sep 13, 2012 at 12:40 AM, Chris Travers <chris.travers@gmail.com> wrote:
On Tue, Aug 21, 2012 at 1:18 AM, Vincent Veyron <vv.lists@wanadoo.fr> wrote:
On Tuesday, 21 August 2012 at 01:33 -0400, Sébastien Lorion wrote:
>
>
> Since Amazon has added new high I/O instance types and EBS volumes,
> anyone has done some benchmark of PostgreSQL on them ?
>
I wonder : is there a reason why you have to go through the complexity
of such a setup, rather than simply use bare metal and get good
performance with simplicity?
For instance, the dedibox I use for my app (visible in sig) costs 14,00
euros/month, and sits at .03% load average with 5 active users; you can
admin it like a home pc.
The main use cases I know of are relatively small instances where the web server and db server for an app may be on the same system.
--
Vincent Veyron
http://marica.fr/
Gestion informatique des sinistres d'assurances et des dossiers contentieux pour le service juridique