Thread: Postgresql in a Virtual Machine
Hi,
Having attended a few PGCons, I've always heard the remark from a few presenters and attendees that Postgres shouldn't be run inside a VM. That bare metal is the only way to go.
Here at work we were entertaining the idea of running our Postgres database on our VM farm alongside our application vm's. We are planning to run a few Postgres synchronous replication nodes.
Why shouldn't we run Postgres in a VM? What are the downsides? Does anyone have any metrics or benchmarks with the latest Postgres?
Thanks!
Lee Nguyen
On 26/11/13 09:01, Lee Nguyen wrote:
> Why shouldn't we run Postgres in a VM? What are the downsides? Does
> anyone have any metrics or benchmarks with the latest Postgres?
I suspect that it is a performance and reliability issue that affects any ACID database. AFAIK, in a VM there is less certainty as to when a disk I/O is actually complete and safely on the disk.
I think VMs are probably fine for testing, but not for production.
Cheers,
Gavin
On 25.11.2013 22:01, Lee Nguyen wrote:
> Why shouldn't we run Postgres in a VM? What are the downsides? Does anyone
> have any metrics or benchmarks with the latest Postgres?
I've also heard people say that they've seen PostgreSQL perform worse in a VM. In the performance testing that we've done in VMware, though, we haven't seen any big impact. So I guess the answer is that it depends on the specific configuration of CPU, memory, disks and the software.
Synchronous replication is likely going to be the biggest bottleneck by far, unless the workload is mostly read-only. I don't know if virtualization will have a measurable impact on network latency, which is what matters for synchronous replication.
So, I'd suggest that you try it yourself and see how it performs. And please report back to the list, I'd also love to see some numbers!
- Heikki
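To put a rough number on the network-latency point, here is a minimal sketch of the ceiling that round-trip time puts on a single session that commits serially and waits for a synchronous standby. The model is an assumption made for illustration (one network round trip per commit, nothing else counted), and the round-trip times in the loop are hypothetical values, not measurements from this thread.

```python
def max_commits_per_second(round_trip_ms: float) -> float:
    """Upper bound on commits/sec for ONE session waiting on a synchronous standby.

    Assumption: every commit waits at least one network round trip, and the
    session issues commits one after another. Disk flush time is ignored, so
    real numbers will be lower.
    """
    if round_trip_ms <= 0:
        raise ValueError("round_trip_ms must be positive")
    return 1000.0 / round_trip_ms


if __name__ == "__main__":
    # Hypothetical round-trip times in milliseconds.
    for rtt in (0.1, 0.5, 2.0):
        bound = max_commits_per_second(rtt)
        print(f"RTT {rtt} ms -> at most {bound:.0f} commits/s per session")
```

Total throughput can still be much higher with many concurrent sessions, so this only shows why a small increase in latency between VMs matters for write-heavy workloads.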
On 11/25/2013 03:19 PM, Heikki Linnakangas wrote:
> So, I'd suggest that you try it yourself, and see how it performs. And
> please report back to the list, I'd also love to see some numbers!
Yeah, and there are large numbers of public and/or private cloud-based offerings out there (from Amazon RDS, Heroku, EnterpriseDB and VMware among others). Pretty much all of these are VM based, and can be suitable for many workloads.
Maybe the advice is a bit out of date.
cheers
andrew
Hi!
We have virtualized several hundred production databases, mostly Oracle and DB2 but a few Postgres as well, and we have seen a very positive effect from doing this. We might lose a little bit to virtualization overhead, but we have gained a lot in flexibility and manageability.
My tips are to optimize wherever you have I/O and not to over-provision CPU cores to VMs. Use paravirtualized drivers where you can, and use fast storage and network to win back what you lose to virtualization overhead in those areas. I would also check that the hypervisor really writes to permanent storage before it returns an acknowledgement to the VM.
And yes, the idea that databases and virtualization do not mix is no longer true for us. It works well for most use cases.
Best regards,
Martin
> 25 Nov 2013 at 21:30, "Andrew Dunstan" <andrew@dunslane.net> wrote:
> Maybe the advice is a bit out of date.
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
On 26/11/13 09:28, Andrew Dunstan wrote:
> Yeah, and there are large numbers of public and/or private cloud-based
> offerings out there (from Amazon RDS, Heroku, EnterpriseDB and VMware
> among others.) Pretty much all of these are VM based, and can be
> suitable for many workloads.
>
> Maybe the advice is a bit out of date.
Agreed. Possibly years ago the maturity of the various virtualization layers was such that the advice was sound. But these days it seems that, provided some reading is done (so you understand, for instance, how to make writes go to the hosting hardware), it should be fine.
We make use of many KVM guest VMs, usually on Ubuntu, and the I/O performance is pretty much indistinguishable from bare metal. In some tests we did notice that VMs with more than 8 CPUs tended to stop scaling, so we are using more, smaller VMs rather than fewer big ones [1].
regards
Mark
[1] This was with pgbench. Note this was over a year ago, so this effect may no longer be present (different kernels and KVM versions), or the magic number may be higher than 8 now...
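For anyone who wants to check where scaling flattens on their own VMs, the following is a minimal sketch that builds a sweep of pgbench invocations over increasing client counts. It only uses the standard pgbench options `-c` (clients), `-j` (worker threads) and `-T` (duration in seconds); the database name `bench`, the client counts and the 60-second duration are assumptions chosen for illustration, and the database is assumed to have been initialized beforehand with `pgbench -i`.

```python
import shlex


def pgbench_command(db: str, clients: int, seconds: int = 60) -> list[str]:
    """Build one pgbench run with one worker thread per client."""
    return ["pgbench", "-c", str(clients), "-j", str(clients), "-T", str(seconds), db]


def sweep(db: str, client_counts=(1, 2, 4, 8, 16, 32)) -> list[list[str]]:
    """Return one pgbench command per client count, in increasing order."""
    return [pgbench_command(db, n) for n in client_counts]


if __name__ == "__main__":
    # "bench" is a hypothetical database name; the commands are only printed
    # here so they can be reviewed before being run against a real server.
    for cmd in sweep("bench"):
        print(shlex.join(cmd))
```

Running the same sweep on a VM and on comparable bare metal, and comparing the reported tps at each client count, shows whether throughput stops growing past a certain number of vCPUs.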
On Mon, Nov 25, 2013 at 2:01 PM, Lee Nguyen <leemobile@gmail.com> wrote:
> Why shouldn't we run Postgres in a VM? What are the downsides? Does anyone
> have any metrics or benchmarks with the latest Postgres?
Unfortunately (and it really pains me to say this) we live in an increasingly virtualized world and we just have to go ahead and deal with it. I work at a mid cap company and we have a zero tolerance policy in terms of applications targeting hardware: in short, you can't. VMs have downsides: you get less performance per buck and have another thing to fail, but the administration advantages are compelling, especially for large environments. Furthermore, for any size company it makes less sense to run your own data center with each passing day; the cloud providers are really bringing up their game. This is economic specialization at work.
(but, as always, take regular backups of everything you do that is valuable)
merlin
On Mon, 25 Nov 2013, Merlin Moncure wrote:
> VMs have downsides: you get less performance per buck and have
> another thing to fail but the administration advantages are compelling
> especially for large environments. Furthermore, for any size company
> it makes less sense to run your own data center with each passing day;
> the cloud providers are really bringing up their game.
Being pedantic, you can get almost all the management benefits on bare metal, and you can rent bare metal from hosting providers; cloud VMs are not the only option. 'Cloud' makes sense if you have a very predictably spiky load and you can add/remove machines to meet that load, but if you end up needing to have the machines running a significant percentage of the time, dedicated boxes are cheaper (as well as faster).
David Lang
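The "significant percentage of the time" argument comes down to a break-even calculation, sketched below. All prices are hypothetical placeholders rather than quotes from any provider, and the model assumes the cloud VM is billed per hour of use while the dedicated box has a flat monthly price.

```python
def break_even_hours(dedicated_monthly: float, cloud_hourly: float) -> float:
    """Hours of use per month above which the dedicated box is cheaper."""
    if cloud_hourly <= 0:
        raise ValueError("cloud_hourly must be positive")
    return dedicated_monthly / cloud_hourly


def cheaper_option(hours_needed: float, dedicated_monthly: float, cloud_hourly: float) -> str:
    """Compare a flat monthly price against pay-per-hour usage."""
    cloud_cost = hours_needed * cloud_hourly
    return "dedicated" if dedicated_monthly < cloud_cost else "cloud"


if __name__ == "__main__":
    # Hypothetical prices: 200 per month dedicated, 0.50 per hour in the cloud.
    print(break_even_hours(200.0, 0.5))      # hours per month at which costs match
    print(cheaper_option(730, 200.0, 0.5))   # machine running all month
    print(cheaper_option(100, 200.0, 0.5))   # machine needed only for spikes
```

A database server normally runs around the clock (roughly 730 hours a month), which is why it tends to land on the dedicated side of this comparison unless the hourly price is very low.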
On Mon, Nov 25, 2013 at 4:57 PM, David Lang <david@lang.hm> wrote:
> being pedantic, you can get almost all the management benefits on bare
> metal, and you can rent bare metal from hosting providors, cloud VMs are not
> the only option. [...] if you end up
> needing to have the machines running a significant percentage of the time,
> dedicated boxes are cheaper (as well as faster)
Well, that depends on how you define 'most'. The thing for me is that for machines around the office (just like with people) about 10% of them do 90% of the work. Being able to slide them around based on that (sometimes changing) need is a tremendous time and cost saver. For application and infrastructure development, dealing with hardware is just a distraction. I'd rather click on some interface and say, 'this application needs 25k iops guaranteed' and then make a cost driven decision on software optimization. It's hard to let go after decades of hardware innovation (the SSD revolution was the final shoe to drop) but for me the time has finally come. As recently as a year ago I was arguing databases needed to be run against metal.
merlin
We have been running several Postgres databases on VMs for the last 9 months. The largest one currently has a few hundred million rows (~1.5T of data, ~100G of frequently queried data) and performs at ~1000 tps. Most of our transactions are part of a 2PC, which effectively results in high I/O as asynchronous commit is disabled.
Main benefits so far:
- ESXi HA makes high availability completely transparent and reduces the number of failover servers (we're running N+1 clusters)
- Our projects' load often differs from our expectations, and it changes over time. Scaling up/down has helped us cope.
- Live relocation of databases helps with hardware upgrades and spreading of load.
Main issues:
- We are not overprovisioning at all (using virtualization exclusively for the management benefits), so we don't know its impact on performance.
- I/O has often been a bottleneck. We are not certain whether this is due to the impact of virtualization or due to mistakes in our sizing and configuration. So far we have been coping by spreading the load across more spindles and by increasing the memory.
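As a rough way to see why 2PC with asynchronous commit disabled is I/O heavy, here is a back-of-the-envelope sketch of the WAL flush rate implied by a given transaction rate. It assumes two synchronous WAL flushes per transaction (one at PREPARE TRANSACTION, one at COMMIT PREPARED) and ignores group commit, which lets concurrent sessions share a flush; both the per-transaction flush count and the session count used below are assumptions, not figures measured on the system described above.

```python
def wal_flushes_per_second(tps: float, flushes_per_txn: int = 2) -> float:
    """WAL flushes per second if no flush is shared between transactions."""
    return tps * flushes_per_txn


def flush_budget_ms(tps: float, flushes_per_txn: int = 2, concurrent_sessions: int = 1) -> float:
    """Average time each flush may take if every session flushes serially."""
    return 1000.0 * concurrent_sessions / wal_flushes_per_second(tps, flushes_per_txn)


if __name__ == "__main__":
    # ~1000 tps is the figure quoted above; 20 sessions is a hypothetical value.
    print(wal_flushes_per_second(1000))
    print(flush_budget_ms(1000, concurrent_sessions=20))
```

If the storage cannot complete a flush within that budget, commit latency becomes the bottleneck regardless of whether the disk sits behind a hypervisor or not.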
On Tue, 26 Nov 2013, Xenofon Papadopoulos wrote:
> - Our projects' load can often miss our expectations, and it changes over
> the time. Scaling up/down has helped us cope.
how do you add another server without having to do a massive data copy in the process?
David Lang
Which scenario do you have in mind? We don't add servers to scale out; we only scale up within each single project. We use a SAN for storage, so if we need to increase disk space we provide more through LVM.
There is one case where we need to move data around: when we relocate projects to new storage (e.g. to reduce the load on the SAN). There is significant data copying involved, but as it's done in parallel with our live operations and doesn't cause a noticeable performance drop, it hasn't been an issue so far.
On 11/25/2013 09:01 PM, Lee Nguyen wrote:
> Hi,
>
> Having attended a few PGCons, I've always heard the remark from a
> few presenters and attendees that Postgres shouldn't be run inside
> a VM. That bare metal is the only way to go.
> [........]
Hello
This was true some years ago. In our experience it is no longer true, unless you are running a system so demanding that it would be a challenge even on bare metal. It should work well for most use cases if your infrastructure is configured correctly.
This year we have moved all our PostgreSQL servers (45+) to a VMware cluster running vSphere 5.1. We are also almost finished moving all our Oracle databases to this cluster. More than 100 virtual servers and several thousand databases are running without problems in our VM environment.
In our experience, VMware vSphere 5.1 makes a huge difference in I/O performance compared to older versions. Tests we ran last year against a storage solution connected to both VM servers and bare-metal servers did not show any particular difference in performance between them.
Some tips:
* We use a SAN via Fibre Channel to store our data. Be sure to have enough active FC channels for your load. Do not even think of using NFS to connect your physical nodes to your SAN.
* We are using 10GigE to interconnect the physical nodes in our cluster. This helps a lot when moving VM servers between nodes.
* Don't use the snapshot functionality of VM clusters in production.
* Don't over-provision resources, especially memory.
* Use paravirtualized drivers.
* As usual, your storage solution will define the performance limits of your VM cluster.
We have gained a lot in flexibility and manageability without losing performance; the benefits in these areas are many when you administer many servers/databases.
regards,
--
Rafael Martinez Guerrero
Center for Information Technology
University of Oslo, Norway
PGP Public Key: http://folk.uio.no/rafael/
On 2013-11-25 21:19, Heikki Linnakangas wrote:
> I've also heard people say that they've seen PostgreSQL to perform worse in a VM. In the
> performance testing that we've done in VMware, though, we haven't seen any big impact.
> So I guess the answer is that it depends on the specific configuration of CPU, memory,
> disks and the software.
We at Cybertec tested some configurations about 2 months ago. The performance drop comes from the disk given to the VM guest.
When a dedicated disk (pass-through) is given to the VM guest, PostgreSQL runs at around 98% of bare-metal speed.
When the virtual disk is a disk file on the host machine, we've measured 20% or lower. The host used Fedora 19/x86_64 with, IIRC, a 3.10.x Linux kernel and an EXT4 filesystem (the filesystem I am sure of, not IIRC). The effect was observed under both Qemu/KVM and Xen.
The virtual disk was not pre-allocated, since that was the default setting, i.e. space savings preferred over speed. The figure might be better with a pre-allocated disk, but the filesystem journalling being done twice (in both the host and the guest) will still have an effect.
PostgreSQL server versions 9.2.x and 9.3beta were tested with pgbench, standalone, without replication.
Best regards,
Zoltán Böszörményi
--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
http://www.postgresql.at/
Zoltan, * Boszormenyi Zoltan (zb@cybertec.at) wrote: > When the virtual disk is a disk file on the host machine, we've measured > 20% or lower. The host used Fedora 19/x86_64 with IIRC a 3.10.x Linux kernel > with EXT4 filesystem (this latter is sure, not IIRC). The effect was observed > both under Qemu/KVM and Xen. Interesting- that's far worse than I would have expected. Was this test done with paravirtualized drivers? If not, I can certainly understand the terrible performance. Independently of that, I'll add my own 2c that DB people tend to be pretty paranoid and the current round of VM technologies out there have caused more than one person to lose data because fsync wasn't honored all the way down to the disk. This is especially true of 'home-grown' setups, imv, but I'm sure you could configure the commercial offerings to lie to the guest OS too. Of course, there are similar concerns about a SAN or even local RAID cards, but there's a lot more general familiarity and history around those which reduces the risk there (or at least, that's the thought). Thanks, Stephen
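One rough way to check the fsync concern from inside a guest is to count how many synchronous writes per second the storage acknowledges. The sketch below is only an illustration (the function name `fsync_rate` and its parameters are made up here, and PostgreSQL's own `pg_test_fsync` tool is the more thorough option). A single 7200 rpm disk turns 120 times per second, so a rate in the thousands on plain spinning storage suggests that some cache in the stack is acknowledging writes early. That can be fine with a battery-backed cache or an SSD, so treat the number as a hint to investigate rather than as proof.

```python
import os
import tempfile
import time


def fsync_rate(directory, iterations=200, block_size=8192):
    """Return synchronous writes per second achieved in `directory`.

    Hypothetical helper for illustration; block_size defaults to 8 kB,
    the PostgreSQL page size.
    """
    block = b"\0" * block_size
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(iterations):
            os.pwrite(fd, block, 0)  # overwrite the same block each time
            os.fsync(fd)             # ask the OS to push it to stable storage
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)
    # Guard against a zero interval on very fast (e.g. memory-backed) storage.
    return iterations / max(elapsed, 1e-9)


if __name__ == "__main__":
    rate = fsync_rate(tempfile.gettempdir())
    print(f"{rate:.0f} fsyncs/second")
```

Point `directory` at the filesystem that will hold the PostgreSQL data directory; the system temp directory is often memory-backed and will report unrealistically high rates.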
On 25.11.2013 22:01, Lee Nguyen wrote:
Why shouldn't we run Postgres in a VM? What are the downsides? Does anyone
have any metrics or benchmarks with the latest Postgres?
For those of us with small deployments (a few to a dozen servers), we'd like to get out of server maintenance completely. Can anyone with experience on a cloud VM solution comment? Do the VM solutions provided by the major hosting companies have the same good performance as the VMs that several have described here?
Obviously there's Amazon's new Postgres solution available. What else is out there in the way of "instant on" solutions with Linux/Postgres/Apache preconfigured systems? Has anyone used them in production?
Thanks,
Craig
On 11/26/2013 09:26 AM, Craig James wrote: > > On 25.11.2013 22:01, Lee Nguyen wrote: > > > Why shouldn't we run Postgres in a VM? What are the > downsides? Does anyone > have any metrics or benchmarks with the latest Postgres? > > > For those of us with small (a few to a dozen servers), we'd like to > get out of server maintenance completely. Can anyone with experience > on a cloud VM solution comment? Do the VM solutions provided by the > major hosting companies have the same good performance as the VM's > that that several have described here? > > Obviously there's Amazon's new Postgres solution available. What else > is out there in the way of "instant on" solutions with > Linux/Postgres/Apache preconfigured systems? Has anyone used them in > production? > > If you want a full stack including Postgres, Heroku might be your best bet. Depends a bit on your application and your workload. And yes, I've used it. Full disclosure: I have done work paid for by Heroku. cheers andrew
On 11/26/2013 08:51 AM, Boszormenyi Zoltan wrote: > On 2013-11-25 21:19, Heikki Linnakangas wrote: >> On 25.11.2013 22:01, Lee Nguyen wrote: >>> Hi, >>> >>> Having attended a few PGCons, I've always heard the remark from a few >>> presenters and attendees that Postgres shouldn't be run inside a VM. >>> That >>> bare metal is the only way to go. >>> >>> Here at work we were entertaining the idea of running our Postgres >>> database >>> on our VM farm alongside our application vm's. We are planning to >>> run a >>> few Postgres synchronous replication nodes. >>> >>> Why shouldn't we run Postgres in a VM? What are the downsides? Does >>> anyone >>> have any metrics or benchmarks with the latest Postgres? >> >> I've also heard people say that they've seen PostgreSQL to perform >> worse in a VM. In the performance testing that we've done in VMware, >> though, we haven't seen any big impact. So I guess the answer is that >> it depends on the specific configuration of CPU, memory, disks and >> the software. > > We at Cybertec tested some configurations about 2 months ago. > The performance drop is coming from the disk given to the VM guest. > > When there is a dedicated disk (pass through) given to the VM guest, > PostgreSQL runs at a speed of around 98% of the bare metal. > > When the virtual disk is a disk file on the host machine, we've measured > 20% or lower. The host used Fedora 19/x86_64 with IIRC a 3.10.x Linux > kernel > with EXT4 filesystem (this latter is sure, not IIRC). The effect was > observed > both under Qemu/KVM and Xen. > > The virtual disk was not pre-allocated, since it was the default setting, > i.e. space savings preferred over speed. The figure might be better with > a pre-allocated disk but the filesystem journalling done twice (both > in the > host and the guest) will have an effect. A disk-file-backed virtual disk that is not pre-allocated is just about the worst case in my experience. 
Try pre-allocated VirtIO disks on an LVM volume group - you should get much better performance. cheers andrew
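To make the pre-allocation point concrete, here is a small sketch on an ordinary file rather than a real VM image. It assumes Linux (`os.posix_fallocate` is not available on every platform), and the helper names are invented for the example. A sparse file reports its full size but has almost no blocks behind it, so the host has to allocate blocks during the guest's first writes; a pre-allocated file has already paid that cost.

```python
import os
import tempfile


def allocated_bytes(path):
    """Bytes actually backed by storage (st_blocks counts 512-byte units)."""
    return os.stat(path).st_blocks * 512


def make_sparse(path, size):
    """Create a file of `size` bytes without reserving any blocks."""
    with open(path, "wb") as f:
        f.truncate(size)


def make_preallocated(path, size):
    """Create a file of `size` bytes and reserve its blocks up front."""
    with open(path, "wb") as f:
        os.posix_fallocate(f.fileno(), 0, size)


if __name__ == "__main__":
    size = 16 * 1024 * 1024  # 16 MB stand-in for a disk image
    with tempfile.TemporaryDirectory() as d:
        sparse = os.path.join(d, "sparse.img")
        full = os.path.join(d, "prealloc.img")
        make_sparse(sparse, size)
        make_preallocated(full, size)
        print("sparse      :", allocated_bytes(sparse), "bytes allocated")
        print("preallocated:", allocated_bytes(full), "bytes allocated")
```

An LVM logical volume, as suggested above, avoids the question entirely because the guest's disk is a block device and there is no host filesystem underneath it.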
On 11/26/2013 7:26 AM, Craig James wrote:
> For those of us with small (a few to a dozen servers), we'd like to
> get out of server maintenance completely. Can anyone with experience
> on a cloud VM solution comment? Do the VM solutions provided by the
> major hosting companies have the same good performance as the VM's
> that that several have described here?
>
> Obviously there's Amazon's new Postgres solution available. What else
> is out there in the way of "instant on" solutions with
> Linux/Postgres/Apache preconfigured systems? Has anyone used them in
> production?

I've done some work with Heroku and the MySQL flavor of AWS service. They work, and are convenient, but there are a couple of issues:

1. Random odd (and bad) things can happen from a performance perspective that you just need to cope with. E.g. I/O will become vastly slower for periods of tens of seconds, once or twice a day. If you don't like the idea of phenomena like this in your system, beware.

2. Your inability to connect with the bare metal may turn out to be a significant hassle when trying to understand some performance issue in the future. Tricks that we're used to using, such as looking at "iostat" (or even "top") output, are no longer usable because the hosting company will not give you a login on the host VM. This limitation extends to many techniques that have been commonly used in the past, and it can become a major headache, to the point where you need to reproduce the system on physical hardware just to understand what's going on with it (been there, done that...)

For the reasons above I would caution against deploying a production service (today) on a "SaaS" database service like Heroku or Amazon RDS. Running your own database inside a stock VM might be better, but it can be hard to get the right kind of I/O for that deployment scenario. In the case of self-hosted VMware or KVM, obviously you have much more control and observability.
Heroku had (at least when I last used it, a year ago or so) an additional issue in that they host on AWS VMs so if something goes wrong you are talking to one company that is using another company's virtual machine service. Not a recipe for clarity, good service and hair retention...
On Tue, Nov 26, 2013 at 8:29 AM, David Boreham <david_list@boreham.org> wrote:
On 11/26/2013 7:26 AM, Craig James wrote:
For those of us with small (a few to a dozen servers), we'd like to get out of server maintenance completely. Can anyone with experience on a cloud VM solution comment? ...
I've done some work with Heroku and the MySQL flavor of AWS service.
Thanks, I'll check Heroku out.
For the reasons above I would caution deploying a production service (today) on a "SaaS" database service like Heroku or Amazon RDS.
Running your own database inside a stock VM might be better, but it can be hard to get the right kind of I/O for that deployment scenario.
In the case of self-hosted VMWare or KVM obviously you have much more control and observability.
Well, the whole point of switching to a cloud provider is to get out of the business of buying hardware and hauling it down to the co-lo facility. Adding VMWare or KVM is just one more thing we'd have to add to our sysadmin skills. We'd rather focus on our core technology, the stuff we're better at than anyone else.
So far I'm impressed by what I've read about Amazon's Postgres instances. Maybe the reality will be disappointing, but (for example) the idea of setting up streaming replication with one click is pretty appealing.
Craig
On 11/25/2013 12:01 PM, Lee Nguyen wrote: > Hi, > > Having attended a few PGCons, I've always heard the remark from a few > presenters and attendees that Postgres shouldn't be run inside a VM. That > bare metal is the only way to go. This is pretty dated advice. Early VMs had horrible performance under load, which is mostly where this thinking comes from. It's not true anymore. It *is* true that getting good performance in a virtualized environment requires more tuning than bare metal, because you have to tune the VM system as well. > Here at work we were entertaining the idea of running our Postgres database > on our VM farm alongside our application vm's. We are planning to run a > few Postgres synchronous replication nodes. Biggest pitfall here is IO performance configuration. I can't give you specific advice without knowing the platform and the desired workload. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Tue, Nov 26, 2013 at 11:31 AM, Josh Berkus <josh@agliodbs.com> wrote: > On 11/25/2013 12:01 PM, Lee Nguyen wrote: >> Hi, >> >> Having attended a few PGCons, I've always heard the remark from a few >> presenters and attendees that Postgres shouldn't be run inside a VM. That >> bare metal is the only way to go. > > This is pretty dated advice. Early VMs had horrible performance under > load, which is mostly where this thinking comes from. It's not true > anymore. > > It *is* true that getting good performance in a virtualized environment > requires more tuning than bare metal, because you have to tune the VM > system as well. > >> Here at work we were entertaining the idea of running our Postgres database >> on our VM farm alongside our application vm's. We are planning to run a >> few Postgres synchronous replication nodes. > > Biggest pitfall here is IO performance configuration. I can't give you > specific advice without knowing the platform and the desired workload. Yeah. Seeing things like provisioned IOPS in the cloud services is a pretty big deal. I do think it's still fairly expensive for what you get, but SSDs and competition are going to force prices down quickly over time. For "in house" virtualized setups, you can get pretty far with SSDs using any number of options (direct attached to the host, iSCSI, SAN, etc.). For I/O constrained systems, I don't consider any spindle-based systems, in particular SANs, to be a good investment. Curious: I just read your article on iSCSI (http://it.toolbox.com/blogs/database-soup/the-problem-with-iscsi-30602). Do you still consider iSCSI to be a poor performer? merlin
On Nov 26, 2013, at 9:24 AM, Craig James wrote:
So far I'm impressed by what I've read about Amazon's Postgres instances. Maybe the reality will be disappointing, but (for example) the idea of setting up streaming replication with one click is pretty appealing.
Where did you hear this was an option? When we talked to AWS about their Postgres RDS offering, they were pretty clear that (currently) replication is hardware-based, the slave is not live, and you don't get access to the WALs that they use internally for PITR. Changing that is something they want to address, but isn't there today.
That said, we use AWS instances to run Postgres, and so long as you use their Provisioned IOPS service for i/o and size your instances appropriately, it's been pretty good. Maybe not the most cost-effective option, but you're paying for the service to not have to worry about stocking spare parts or making sure your hardware is burned in before use. And AWS makes it easy to add regional or even global redundancy, if that's what you want. (Of course that costs even more money, but if you need it, using AWS is a lot easier than finding colos around the world yourself.)
Like many have said, the problem of using VMs for databases is that a lot of VM systems try to over-subscribe the hardware for more savings. That works for a lot of loads but not a busy database. So just make sure your VM isn't doing that to you, and most of the performance argument for avoiding VMs goes away.
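One way a guest can look for CPU over-subscription is the "steal" counter in `/proc/stat`, which Linux fills in on hypervisors that report it (KVM and Xen do; others may leave it at zero). The sketch below is an assumption-laden illustration, not a monitoring tool: the function names are invented, and it only compares two snapshots of the aggregate `cpu` line.

```python
import os
import time


def steal_fraction(before, after):
    """Fraction of CPU time stolen between two aggregate `cpu` lines."""
    def fields(line):
        parts = line.split()
        if not parts or parts[0] != "cpu":
            raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
        return [int(x) for x in parts[1:]]

    deltas = [b - a for a, b in zip(fields(before), fields(after))]
    # Field order: user nice system idle iowait irq softirq steal guest guest_nice.
    # guest and guest_nice are already counted inside user and nice,
    # so only the first eight fields are summed.
    total = sum(deltas[:8])
    steal = deltas[7] if len(deltas) > 7 else 0
    return steal / total if total else 0.0


def read_cpu_line(path="/proc/stat"):
    """Return the first line of /proc/stat, which holds the aggregate counters."""
    with open(path) as f:
        return f.readline()


if __name__ == "__main__" and os.path.exists("/proc/stat"):
    first = read_cpu_line()
    time.sleep(1)
    second = read_cpu_line()
    print(f"steal over the last second: {steal_fraction(first, second):.1%}")
```

A steal figure that stays well above zero while the database is busy is a sign that the host's CPUs are shared more heavily than the workload tolerates. It says nothing about memory or I/O over-subscription, which need their own checks.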
On Tue, Nov 26, 2013 at 10:40 AM, Ben Chobot <bench@silentmedia.com> wrote:
On Nov 26, 2013, at 9:24 AM, Craig James wrote: So far I'm impressed by what I've read about Amazon's Postgres instances. Maybe the reality will be disappointing, but (for example) the idea of setting up streaming replication with one click is pretty appealing. Where did you hear this was an option? When we talked to AWS about their Postgres RDS offering, they were pretty clear that (currently) replication is hardware-based, the slave is not live, and you don't get access to the WALs that they use internally for PITR. Changing that is something they want to address, but isn't there today.
I was guessing from the description of their "High Availability" option ... but maybe it uses something like pg-pool, or as you say, maybe they do it at the hardware level.
http://aws.amazon.com/rds/postgresql/#High-Availability
"Multi-AZ Deployments – This deployment option for your production DB Instances enhances database availability while protecting your latest database updates against unplanned outages. When you create or modify your DB Instance to run as a Multi-AZ deployment, Amazon RDS will automatically provision and manage a “standby” replica in a different Availability Zone (independent infrastructure in a physically separate location). Database updates are made concurrently on the primary and standby resources to prevent replication lag. In the event of planned database maintenance, DB Instance failure, or an Availability Zone failure, Amazon RDS will automatically failover to the up-to-date standby so that database operations can resume quickly without administrative intervention. Prior to failover you cannot directly access the standby, and it cannot be used to serve read traffic."
Either way, if a cold standby is all you need, it's still a one-click option, lots simpler than setting it up yourself.
Craig
On Tue, Nov 26, 2013 at 11:18:41AM -0800, Craig James wrote: - On Tue, Nov 26, 2013 at 10:40 AM, Ben Chobot <bench@silentmedia.com> wrote: - - > On Nov 26, 2013, at 9:24 AM, Craig James wrote: - > - > So far I'm impressed by what I've read about Amazon's Postgres instances. - > Maybe the reality will be disappointing, but (for example) the idea of - > setting up streaming replication with one click is pretty appealing. - > - > - > Where did you hear this was an option? When we talked to AWS about their - > Postgres RDS offering, they were pretty clear that (currently) replication - > is hardware-based, the slave is not live, and you don't get access to the - > WALs that they use internally for PITR. Changing that is something they - > want to address, but isn't there today. - > - - I was guessing from the description of their "High Availability" option ... - but maybe it uses something like pg-pool, or as you say, maybe they do it - at the hardware level. - - http://aws.amazon.com/rds/postgresql/#High-Availability - - - "Multi-AZ Deployments This deployment option for your production DB - Instances enhances database availability while protecting your latest - database updates against unplanned outages. When you create or modify your - DB Instance to run as a Multi-AZ deployment, Amazon RDS will automatically - provision and manage a standby replica in a different Availability Zone - (independent infrastructure in a physically separate location). Database - updates are made concurrently on the primary and standby resources to - prevent replication lag. In the event of planned database maintenance, DB - Instance failure, or an Availability Zone failure, Amazon RDS will - automatically failover to the up-to-date standby so that database - operations can resume quickly without administrative intervention. Prior to - failover you cannot directly access the standby, and it cannot be used to - serve read traffic." 
- - Either way, if a cold standby is all you need, it's still a one-click - option, lots simpler than setting it up yourself. - - Craig The Multi-AZ deployments don't expose the replica to you unless there is a failover. (in which case it picks one and promotes it) There is an option for "Create Read Replica" but it's currently not available so we can assume that will eventually be an option.
On Mon, Nov 25, 2013 at 4:00 PM, Gudmundsson Martin (mg) <martin.mg.gudmundsson@volvo.com> wrote: > > I would also make sure to check that the hypervisor does write to permanent storage before returning to the VM with acknowledgement. > In the case of ESX, there is no such concern per http://kb.vmware.com/kb/1008542. As Heikki commented, VMware recently compared Postgres performance in an ESX (5.1) VM versus in a comparable native Linux. We saw 1. ESX-level locking causes no vertical scalability degradation, 2. Memory oversubscription can indeed be a performance hazard when consolidating multiple Postgres VMs on one host. Yet we found moderate memory oversubscription (up to 20%) might work out fine: we saw <5% degradation at 20% memory oversubscription in a conventional setup (where the Postgres server uses 25% of memory for shared_buffers and the VM uses out-of-the-box kernel-level memory ballooning.) Nitty-gritty details can be found in the whitepaper http://www.vmware.com/files/pdf/techpaper/vPostgres-perf.pdf (Disclaimer: I'm an author.) As many pointed out here, storage is most likely where extra care in capacity planning is warranted when weighing putting Postgres in a VM versus running it natively. Our tests (during the same period as those behind the above observations) read: pgbench default saw ~10% degradation at 28 pgbench clients on a 32-core Intel Sandy Bridge machine; and dbt2 with zero thinking/keying time saw ~30% degradation at 28 dbt2 terminals on the same machine. In both cases, the regression becomes gradually more pronounced as concurrency ramps up (starting from <5% degradation at 1 client/terminal in both cases.) Regards, Dong
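For capacity planning, the two percentages above can be turned into a quick calculation. The sketch below assumes that "20% memory oversubscription" means the VMs together are promised 20% more RAM than the host physically has; the whitepaper may define it differently, and the function names and sample sizes are made up for the example.

```python
def memory_oversubscription(host_ram_gb, vm_ram_gb):
    """Return how far the VMs' combined RAM exceeds host RAM, as a fraction.

    0.20 means the guests are promised 20% more memory than the host has.
    This definition is an assumption made for the example.
    """
    committed = sum(vm_ram_gb)
    return max(0.0, committed / host_ram_gb - 1.0)


def shared_buffers_gb(vm_ram_gb, fraction=0.25):
    """shared_buffers sized as a fraction of the VM's memory (25% as in the tests above)."""
    return vm_ram_gb * fraction


if __name__ == "__main__":
    host = 100              # GB of physical RAM, hypothetical
    vms = [40, 40, 40]      # GB promised to each Postgres VM, hypothetical
    over = memory_oversubscription(host, vms)
    print(f"oversubscription: {over:.0%}")
    for ram in vms:
        print(f"VM with {ram} GB -> shared_buffers = {shared_buffers_gb(ram):.0f} GB")
```

With these made-up numbers the host sits right at the 20% level that the tests describe as tolerable; adding a fourth VM of the same size would push it to 60%, well outside what was measured.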
> > > > I would also make sure to check that the hypervisor does write to > permanent storage before returning to the VM with acknowledgement. > > > In the case of ESX, there is no such concern per > http://kb.vmware.com/kb/1008542. Very useful info! > As Heikki commented, VMware recently compared Postgres performance in > an ESX (5.1) VM versus in a comparable native Linux. We saw 1. > ESX-level locking causes no vertical scalability degradation, 2. > Memory oversubscription can indeed be a performance hazard when > consolidating multiple Postgres VMs on one host. Yet we found moderate > memory oversubscription (up to 20%) might work out fine: we saw <5% > degradation at 20% memory oversubscription in a conventional setup > (where Postgres server uses 25% memory shared_buffers and VM uses > out-of-the-box kernel-level memory ballooning.) Nitty-gritty details > can be found in the whitepaper > http://www.vmware.com/files/pdf/techpaper/vPostgres-perf.pdf > (Disclaimer: I'm an author.) Interesting reading. There was some earlier comment in this discussion about not using NFS datastores for Postgres VMDKs. Would you think you'd see a difference in scalability behavior or performance in these tests if an NFS datastore were used instead? Provided the architecture is properly set up for that, with high-speed, low-latency networking and fast NAS storage. Thanks! > Regards, > Dong
> There was some earlier comment in this discussion about not using NFS datastores for Postgres VMDKs. Would you think you'd see a difference in scalability behavior or performance in these tests if an NFS datastore were used instead? Provided the architecture is properly set up for that, with high-speed, low-latency networking and fast NAS storage. > Though I have no first-hand experience, my understanding is that performance is not near the top of the list of considerations when weighing different storage protocols. You might find the following docs useful: http://www.vmware.com/files/pdf/techpaper/Storage_Protocol_Comparison.pdf http://media.netapp.com/documents/tr-3916.pdf Cheers, Dong
On Wed, Nov 27, 2013 at 7:58 PM, Dong Ye <yed@vmware.com> wrote: > As Heikki commented, VMware recently compared Postgres performance in > an ESX (5.1) VM versus in a comparable native Linux. We saw 1. > ESX-level locking causes no vertical scalability degradation, 2. FYI VMware has an optimized version of PostgreSQL for use on vSphere etc: http://www.vmware.com/products/vfabric-postgres/
On Fri, Nov 29, 2013 at 3:40 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote: > On Wed, Nov 27, 2013 at 7:58 PM, Dong Ye <yed@vmware.com> wrote: > >> As Heikki commented, VMware recently compared Postgres performance in >> an ESX (5.1) VM versus in a comparable native Linux. We saw 1. >> ESX-level locking causes no vertical scalability degradation, 2. > > FYI Vmware has an optimized version of Postgresql for use on VSphere > etc: http://www.vmware.com/products/vfabric-postgres/ There is actually no fork of the core in vFabric Postgres, Postgres core is unmodified as of release 9.3. Have a look at the release notes: https://www.vmware.com/support/vfabric-postgres/doc/vfabric-postgres-93-release-notes.html -- Michael