Thread: Setting effective_io_concurrency in VM?

Setting effective_io_concurrency in VM?

From
Don Seiler
Date:
Good afternoon. We run Postgres (currently 9.2, upgrading to 9.6 shortly) in VMWare ESX machines. We currently have effective_io_concurrency set to the default of 1. I'm told that the data volume is a RAID 6 with 14 data drives and 2 parity drives. I know that RAID10 is recommended, just working with what I've inherited for now (storage is high-end HP 3Par and HP recommended RAID 6 for best performance). Anyway, I'm wondering if, in a virtualized environment with a VM datastore, it makes sense to set effective_io_concurrency closer to the number of data drives? I'd also be interested in hearing how others have configured their PostgreSQL instances for VMs (if there's anything special to think about). Don. -- Don Seiler www.seiler.us

Re: Setting effective_io_concurrency in VM?

From
Scott Marlowe
Date:
On Mon, Nov 27, 2017 at 11:23 AM, Don Seiler <don@seiler.us> wrote:
> Good afternoon.
>
> We run Postgres (currently 9.2, upgrading to 9.6 shortly) in VMWare ESX
> machines. We currently have effective_io_concurrency set to the default of
> 1. I'm told that the data volume is a RAID 6 with 14 data drives and 2
> parity drives. I know that RAID10 is recommended, just working with what
> I've inherited for now (storage is high-end HP 3Par and HP recommended RAID
> 6 for best performance).
>
> Anyway, I'm wondering if, in a virtualized environment with a VM datastore,
> it makes sense to set effective_io_concurrency closer to the number of data
> drives?
>
> I'd also be interested in hearing how others have configured their
> PostgreSQL instances for VMs (if there's anything special to think about).

Generally VMs are never going to be as fast as running on bare metal
etc. You can adjust it and test it with something simple like pgbench
with various settings for -c (concurrency) and see where it peaks etc
with the setting. This will at least get you into the ball park.

A while back we needed fast machines with LOTS of storage (7TB data
drives with 5TB of data on them) and the only way to stuff that many
800GB SSDs into a single machine was to use RAID-5 with a spare (I
lobbied for RAID6 but was overidden eh...) We were able to achieve
over 15k TPS in pgbench with a 400GB data store on those boxes. The
secret was to turn off the cache in the RAID controller and cranl up
effective io concurrency to something around 10 (not sure, it's been a
while).

tl;dr: Only way to know is to benchmark it. I'd guess that somewhere
between 10 and 20 is going to get the best throughput but that's just
a guess. Benchmark it and let us know!


Re: Setting effective_io_concurrency in VM?

From
Fernando Hevia
Date:
El 27 nov. 2017 15:24, "Don Seiler" escribió: Good afternoon. We run Postgres (currently 9.2, upgrading to 9.6 shortly) in VMWare ESX machines. We currently have effective_io_concurrency set to the default of 1. I'm told that the data volume is a RAID 6 with 14 data drives and 2 parity drives. I know that RAID10 is recommended, just working with what I've inherited for now (storage is high-end HP 3Par and HP recommended RAID 6 for best performance). Anyway, I'm wondering if, in a virtualized environment with a VM datastore, it makes sense to set effective_io_concurrency closer to the number of data drives? I'd also be interested in hearing how others have configured their PostgreSQL instances for VMs (if there's anything special to think about). If the storage was exclusively for the Postgres box I'd try effective_io_concurrency somewhere between 8 and 12. Since it is probably not, it will depend on the load the other VMs exert on the storage. Assuming the storage isnt already stressed and you need the extra IOPS, you could test values between 4 and 8. You can of course be a lousy team player and have PG paralelize as much as it can, but this eventually will piss off the storage or vmware manager, which is never good as they can limit your IO throughput at the virtualization or storage layers. Cheers.

Re: Setting effective_io_concurrency in VM?

From
Andrew Kerber
Date:
Whats the guest OS? I have been able to get Oracle to perform just as well on Virtuals as it does on Physicals. I suspect the settings are pretty similar. On Mon, Nov 27, 2017 at 3:06 PM, Fernando Hevia wrote: > > > El 27 nov. 2017 15:24, "Don Seiler" escribió: > > Good afternoon. > > We run Postgres (currently 9.2, upgrading to 9.6 shortly) in VMWare ESX > machines. We currently have effective_io_concurrency set to the default of > 1. I'm told that the data volume is a RAID 6 with 14 data drives and 2 > parity drives. I know that RAID10 is recommended, just working with what > I've inherited for now (storage is high-end HP 3Par and HP recommended RAID > 6 for best performance). > > Anyway, I'm wondering if, in a virtualized environment with a VM > datastore, it makes sense to set effective_io_concurrency closer to the > number of data drives? > > I'd also be interested in hearing how others have configured their > PostgreSQL instances for VMs (if there's anything special to think about). > > > > If the storage was exclusively for the Postgres box I'd try > effective_io_concurrency somewhere between 8 and 12. Since it is probably > not, it will depend on the load the other VMs exert on the storage. > Assuming the storage isnt already stressed and you need the extra IOPS, you > could test values between 4 and 8. You can of course be a lousy team player > and have PG paralelize as much as it can, but this eventually will piss > off the storage or vmware manager, which is never good as they can limit > your IO throughput at the virtualization or storage layers. > > Cheers. > > > > -- Andrew W. Kerber 'If at first you dont succeed, dont take up skydiving.'

Re: Setting effective_io_concurrency in VM?

From
Don Seiler
Date:
On Mon, Nov 27, 2017 at 3:44 PM, Andrew Kerber wrote: > Whats the guest OS? I have been able to get Oracle to perform just as > well on Virtuals as it does on Physicals. I suspect the settings are > pretty similar. > Guest OS is CentOS 6 and CentOS 7 depending on which DB host we're looking at. I'd be interested in learning for either case. Don. -- Don Seiler www.seiler.us

Re: Setting effective_io_concurrency in VM?

From
Jeff Janes
Date:
On Mon, Nov 27, 2017 at 10:40 AM, Scott Marlowe wrote: > > Generally VMs are never going to be as fast as running on bare metal > etc. You can adjust it and test it with something simple like pgbench > with various settings for -c (concurrency) and see where it peaks etc > with the setting. This will at least get you into the ball park. > None of the built-in workloads for pgbench cares a whit about effective_io_concurrency. He would have to come up with some custom transactions to exercise that feature. (Or use the tool people use to run the TPCH benchmark, rather than using pgbench's built in transactions) I think the best overall advice would be to configure it the same as you would if it were not a VM. There may be cases where you diverge from that, but I think each one would require extensive investigation and experimentation, so can't be turned into a rule of thumb. Cheers, Jeff

Re: Setting effective_io_concurrency in VM?

From
Andres Freund
Date:
Hi,

On 2017-11-27 11:40:19 -0700, Scott Marlowe wrote:
> tl;dr: Only way to know is to benchmark it. I'd guess that somewhere
> between 10 and 20 is going to get the best throughput but that's just
> a guess. Benchmark it and let us know!

FWIW, for SSDs my previous experiments suggest that the sweet spot is
more likely to be an order of magnitude or two bigger. Depends a bit on
your workload (including size of scans and concurrency) obviously.

Greetings,

Andres Freund


Re: Setting effective_io_concurrency in VM?

From
Mark Kirkwood
Date:
On 28/11/17 07:40, Scott Marlowe wrote:

> On Mon, Nov 27, 2017 at 11:23 AM, Don Seiler <don@seiler.us> wrote:
>> Good afternoon.
>>
>> We run Postgres (currently 9.2, upgrading to 9.6 shortly) in VMWare ESX
>> machines. We currently have effective_io_concurrency set to the default of
>> 1. I'm told that the data volume is a RAID 6 with 14 data drives and 2
>> parity drives. I know that RAID10 is recommended, just working with what
>> I've inherited for now (storage is high-end HP 3Par and HP recommended RAID
>> 6 for best performance).
>>
>> Anyway, I'm wondering if, in a virtualized environment with a VM datastore,
>> it makes sense to set effective_io_concurrency closer to the number of data
>> drives?
>>
>> I'd also be interested in hearing how others have configured their
>> PostgreSQL instances for VMs (if there's anything special to think about).
> Generally VMs are never going to be as fast as running on bare metal
> etc. You can adjust it and test it with something simple like pgbench
> with various settings for -c (concurrency) and see where it peaks etc
> with the setting. This will at least get you into the ball park.
>
> A while back we needed fast machines with LOTS of storage (7TB data
> drives with 5TB of data on them) and the only way to stuff that many
> 800GB SSDs into a single machine was to use RAID-5 with a spare (I
> lobbied for RAID6 but was overidden eh...) We were able to achieve
> over 15k TPS in pgbench with a 400GB data store on those boxes. The
> secret was to turn off the cache in the RAID controller and cranl up
> effective io concurrency to something around 10 (not sure, it's been a
> while).
>
> tl;dr: Only way to know is to benchmark it. I'd guess that somewhere
> between 10 and 20 is going to get the best throughput but that's just
> a guess. Benchmark it and let us know!

Reasonably modern Linux hosts with Linux guests using Libvirt/KVM should 
be able to get bare metal performance for moderate numbers of cpus (<=8 
last time we benchmarked). It certainly *used* to be the case that 
virtualization sucked for databases, but not so much now.

The advice to benhmark, however - is golden :-)

Cheers

Mark