
Re: Postgresql in a Virtual Machine - Mailing list pgsql-performance

From	Xenofon Papadopoulos
Subject	Re: Postgresql in a Virtual Machine
Date	November 26, 2013 09:30:38
Msg-id	CANL7jASGw3TqpPHHJy_Wkq-r_=KymhnFGMm03ZqWShtOhP2-0g@mail.gmail.com
In response to	Re: Postgresql in a Virtual Machine (David Lang <david@lang.hm>)
List	pgsql-performance


Which scenario do you have in mind? We don't add servers to scale out; we only scale up within each project. We use a SAN for storage, so if we need more disk space we provision it through LVM.

There is one case where we need to move data around: when we relocate projects to new storage (e.g. to reduce the load on the SAN). A significant data copy is involved, but since it runs in parallel with our live operations and doesn't cause a noticeable performance drop, it hasn't been an issue so far.

On Tue, Nov 26, 2013 at 8:16 AM, David Lang <david@lang.hm> wrote:

On Tue, 26 Nov 2013, Xenofon Papadopoulos wrote:

We have been running several Postgres databases on VMs for the last 9
months. The largest one currently has a few hundred million rows
(~1.5T of data, ~100G of frequently queried data) and performs at ~1000
tps. Most of our transactions are part of a 2PC, which effectively results
in high I/O as asynchronous commit is disabled.
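
As a rough illustration of why 2PC at this rate is I/O heavy, here is a back-of-envelope sketch. It assumes each two-phase transaction forces two WAL flushes (one at PREPARE TRANSACTION, one at COMMIT PREPARED) and that some number of commits may share a flush. Both the function name and the default figures are illustrative assumptions, not measurements from this system.

```python
def wal_flushes_per_second(tps, flushes_per_txn=2, group_commit_factor=1.0):
    """Rough estimate of WAL flush (fsync) requests per second.

    tps: transactions per second (the post cites ~1000).
    flushes_per_txn: ASSUMED 2 for 2PC, one flush at PREPARE TRANSACTION
        and one at COMMIT PREPARED; not a measured value.
    group_commit_factor: ASSUMED average number of commits that share a
        single flush; 1.0 means no batching at all.
    """
    if group_commit_factor <= 0:
        raise ValueError("group_commit_factor must be positive")
    return tps * flushes_per_txn / group_commit_factor


# Worst case with no batching, then with four commits sharing each flush.
print(wal_flushes_per_second(1000))
print(wal_flushes_per_second(1000, group_commit_factor=4))
```

Under these assumptions the flush rate scales linearly with tps, so storage latency per flush, rather than raw throughput, is the figure worth comparing between a VM and bare metal.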

Main benefits so far:

- ESXi HA makes high availability completely transparent and reduces the
number of failover servers (we're running N+1 clusters)

- Our projects' load often misses our expectations, and it changes over
time. Scaling up/down has helped us cope.

how do you add another server without having to do a massive data copy in the process?

David Lang

- Live relocation of databases helps with hardware upgrades and spreading
of load.

Main issues:

- We are not overprovisioning at all (using virtualization exclusively for
the management benefits), so we don't know the impact of overprovisioning
on performance.

- I/O has often been a bottleneck. We are not certain whether this is due
to the impact of virtualization or due to mistakes in our sizing and
configuration. So far we have been coping by spreading the load across
more spindles and by increasing the memory.

On Tue, Nov 26, 2013 at 1:26 AM, Merlin Moncure <mmoncure@gmail.com> wrote:

On Mon, Nov 25, 2013 at 4:57 PM, David Lang <david@lang.hm> wrote:
On Mon, 25 Nov 2013, Merlin Moncure wrote:

On Mon, Nov 25, 2013 at 2:01 PM, Lee Nguyen <leemobile@gmail.com> wrote:

Hi,

Having attended a few PGCons, I've always heard the remark from a few
presenters and attendees that Postgres shouldn't be run inside a VM. That
bare metal is the only way to go.

Here at work we were entertaining the idea of running our Postgres database
on our VM farm alongside our application VMs. We are planning to run a few
Postgres synchronous replication nodes.

Why shouldn't we run Postgres in a VM? What are the downsides? Does anyone
have any metrics or benchmarks with the latest Postgres?

Unfortunately (and it really pains me to say this) we live in an
increasingly virtualized world and we just have to go ahead and deal
with it. I work at a mid cap company and we have a zero tolerance
policy in terms of applications targeting hardware: in short, you
can't. VMs have downsides: you get less performance per buck and have
another thing to fail but the administration advantages are compelling
especially for large environments. Furthermore, for any size company
it makes less sense to run your own data center with each passing day;
the cloud providers are really bringing up their game. This is
economic specialization at work.

Being pedantic, you can get almost all the management benefits on bare
metal, and you can rent bare metal from hosting providers; cloud VMs are
not the only option. 'Cloud' makes sense if you have a very predictably
spiky load and you can add/remove machines to meet that load, but if you
end up needing to have the machines running a significant percentage of
the time, dedicated boxes are cheaper (as well as faster).

Well, that depends on how you define 'most'. The thing for me is
that for machines around the office (just like with people) about 10%
of them do 90% of the work. Being able to slide them around based on
that (sometimes changing) need is a tremendous time and cost saver.
For application and infrastructure development dealing with hardware
is just a distraction. I'd rather click on some interface and say,
'this application needs 25k iops guaranteed' and then make a cost
driven decision on software optimization. It's hard to let go after
decades of hardware innovation (the SSD revolution was the final shoe
to drop) but for me the time has finally come. As recently as a year
ago I was arguing databases needed to be run against metal.

merlin

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
