Thread: [GENERAL] effective_io_concurrency increasing

[GENERAL] effective_io_concurrency increasing

From
Patrick B
Date:
Hi guys.

I just wanna understand the effective_io_concurrency value better.

My current Master database server has 16 vCPUS and I use effective_io_concurrency = 0.

What can be the benefits of increasing that number? Also, do you guys have any recommendations? 

I'm using PG 9.2 and the official doc does not say much about which value you should use.

If I put it to 1, does it mean I can have a query spread into 1 processor?

Thanks
P

Re: [GENERAL] effective_io_concurrency increasing

From
Melvin Davidson
Date:


On Sun, Jun 18, 2017 at 9:02 PM, Patrick B <patrickbakerbr@gmail.com> wrote:
Hi guys.

I just wanna understand the effective_io_concurrency value better.

My current Master database server has 16 vCPUS and I use effective_io_concurrency = 0.

What can be the benefits of increasing that number? Also, do you guys have any recommendations? 

I'm using PG 9.2 and the official doc does not say much about which value you should use.

If I put it to 1, does it mean I can have a query spread into 1 processor?

Thanks
P

18.4.6. Asynchronous Behavior

effective_io_concurrency (integer)
--
Melvin Davidson
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.

Re: [GENERAL] effective_io_concurrency increasing

From
Patrick B
Date:


2017-06-19 13:19 GMT+12:00 Melvin Davidson <melvin6925@gmail.com>:



18.4.6. Asynchronous Behavior

effective_io_concurrency (integer)


I've done that! But I'm looking for some personal experiences and suggestions!!

Re: [GENERAL] effective_io_concurrency increasing

From
Melvin Davidson
Date:
As per the docs:
1. This is dependent on whether or not you are using a RAID disk,
2. "Some experimentation may be needed to find the best value"

IOW, there is no general recommendation.

On Sun, Jun 18, 2017 at 9:24 PM, Patrick B <patrickbakerbr@gmail.com> wrote:


I've done that! But I'm looking for some personal experiences and suggestions!!



--
Melvin Davidson
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.

Re: [GENERAL] effective_io_concurrency increasing

From
"David G. Johnston"
Date:
On Sun, Jun 18, 2017 at 6:02 PM, Patrick B <patrickbakerbr@gmail.com> wrote:
Hi guys.

I just wanna understand the effective_io_concurrency value better.

My current Master database server has 16 vCPUS and I use effective_io_concurrency = 0.

It seems as though the number of virtual CPUs has little to no bearing on whether, or to what value, you should set this parameter.  Obviously with only one CPU parallelism wouldn't be possible (I'm assuming a single query does not make multiple parallel requests for data), but the value seems to strictly describe a characteristic of the I/O subsystem.  Whether you can fully leverage a properly set large value is another matter.

As general advice, even if you are using a soon-to-be-obsolete (or any non-current, really) version of PostgreSQL, checking the most recent docs can be helpful when you are learning about a new concept.  Generally only bugs in the docs get back-patched, but a number of doc contributions are not bug related and are helpful nonetheless.


In short, if you want any good advice you will need to figure out the specifics of your I/O subsystem (non-volatile memory and any associated hardware) and share that with the list.  Lacking rules of thumb, learning how to test your system and measure changes would help get you to the end goal.  Sadly, that's not a skill I've really picked up as of yet.

David J.

Re: [GENERAL] effective_io_concurrency increasing

From
Andreas Kretschmer
Date:

Am 19.06.2017 um 03:02 schrieb Patrick B:
> Hi guys.
>
> I just wanna understand the effective_io_concurrency value better.
>
> My current Master database server has 16 vCPUS and I
> use effective_io_concurrency = 0.
>
> What can be the benefits of increasing that number? Also, do you guys
> have any recommendations?
>
>
As far as I know, at the moment only bitmap index scans would benefit
from higher values. You can try 16 or 32 as a starting point
(if you have a proper I/O controller with cache).


(it has nothing to do with parallel execution of queries)
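
A minimal sketch of trying those starting points from a psql session (the table and column names are hypothetical); the parameter is user-settable, so it can be changed per session and the timings compared directly:

    -- Hypothetical table "orders" with an index on customer_id.
    SET effective_io_concurrency = 16;   -- or 32; 0 turns prefetching off

    -- Only plans that fetch heap pages through a bitmap are expected to benefit.
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT count(*) FROM orders WHERE customer_id BETWEEN 1000 AND 2000;

    -- Rerun with the previous value to compare timings (cold caches matter).
    SET effective_io_concurrency = 0;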


Regards, Andreas

--
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com



Re: [GENERAL] effective_io_concurrency increasing

From
Jeff Janes
Date:
On Sun, Jun 18, 2017 at 7:09 PM, David G. Johnston <david.g.johnston@gmail.com> wrote:
It seems as though the number of virtual CPUs has little to no bearing on whether, or to what value, you should set this parameter.  Obviously with only one CPU parallelism wouldn't be possible (I'm assuming a single query does not make multiple parallel requests for data)

Ah, but it does.  That is exactly what this parameter is for.

Unfortunately, it is only implemented in very narrow circumstances.  You have to be doing bitmap index scans of many widely scattered rows to make it useful.  I don't think that this is all that common of a situation.  The problem is that at every point in the scan, it has to be possible to know what data block it is going to want N iterations in the future, so you can inform the kernel to pre-fetch it.  That is only easy to know for bitmap scans.

If you have a RAID, set it to the number of spindles in your RAID and forget it. It is usually one of the less interesting knobs to play with.  (Unless your usage pattern of the database is unusual and exactly fits the above pattern.)
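
For illustration, the plan shape in question looks roughly like the hedged sketch below (hypothetical table and index); the prefetch requests are issued while the Bitmap Heap Scan node fetches the heap pages listed by the bitmap, whereas a plain Index Scan gets no such help:

    -- Hypothetical query whose matching rows are widely scattered in the heap.
    EXPLAIN
    SELECT * FROM events WHERE device_id = ANY ('{3,17,42,99}'::int[]);

    --  Bitmap Heap Scan on events                 <- heap pages prefetched here
    --    Recheck Cond: (device_id = ANY ('{3,17,42,99}'::integer[]))
    --    ->  Bitmap Index Scan on events_device_id_idx
    --          Index Cond: (device_id = ANY ('{3,17,42,99}'::integer[]))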


Cheers,

Jeff

Re: [GENERAL] effective_io_concurrency increasing

From
Merlin Moncure
Date:
On Mon, Jun 19, 2017 at 10:36 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
> If you have a RAID, set it to the number of spindles in your RAID and forget
> it. It is usually one of the less interesting knobs to play with.  (Unless
> your usage pattern of the database is unusual and exactly fits the above
> pattern.)

Isn't that advice obsolete in an SSD world, though?  I was able to show
that values up to 256 for a single device provided measurable gains on a
single S3500.  It's true, though, that the class of queries this would
help is pretty narrow.

merlin


Re: [GENERAL] effective_io_concurrency increasing

From
Jeff Janes
Date:
On Mon, Jun 19, 2017 at 8:49 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
On Mon, Jun 19, 2017 at 10:36 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
> If you have a RAID, set it to the number of spindles in your RAID and forget
> it. It is usually one of the less interesting knobs to play with.  (Unless
> your usage pattern of the database is unusual and exactly fits the above
> pattern.)

Isn't that advice obsolete in an SSD world, though?  I was able to show
that values up to 256 for a single device provided measurable gains on a
single S3500.  It's true, though, that the class of queries this would
help is pretty narrow.

I don't think it is obsolete; you just have to be creative with how you interpret 'spindle' :)

With a single laptop hard drive, I could get improvements of about 2-fold by setting it to very high numbers, like 50 or 80. By giving the hard drive the option of dozens of different possible sectors to read next, it could minimize head seeks.  But that is with just one query running at a time.  With multiple queries running simultaneously, all trying to take advantage of this, the performance gains quickly fell apart.  I would expect the SSD situation to be similar, where the improvements are measurable but also fragile, but I haven't tested it.
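
A rough, self-contained sketch of that kind of single-query test (synthetic data, illustrative numbers only) might look like the following; the gains only show up when the heap pages are not already cached, so results vary a great deal between runs:

    -- Build a table where the rows for any one "bucket" value are spread
    -- across many different heap pages.
    CREATE TABLE scatter_test AS
    SELECT g AS id, g % 1000 AS bucket, repeat('x', 300) AS pad
    FROM generate_series(1, 2000000) g;
    CREATE INDEX ON scatter_test (bucket);
    ANALYZE scatter_test;

    -- One query at a time, different settings; compare the run times.
    SET effective_io_concurrency = 0;
    EXPLAIN (ANALYZE, BUFFERS) SELECT count(pad) FROM scatter_test WHERE bucket = 7;

    SET effective_io_concurrency = 64;
    EXPLAIN (ANALYZE, BUFFERS) SELECT count(pad) FROM scatter_test WHERE bucket = 7;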

Cheers,

Jeff

Re: [GENERAL] effective_io_concurrency increasing

From
Bruce Momjian
Date:
On Mon, Jun 19, 2017 at 10:49:59AM -0500, Merlin Moncure wrote:
> On Mon, Jun 19, 2017 at 10:36 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
> > If you have a RAID, set it to the number of spindles in your RAID and forget
> > it. It is usually one of the less interesting knobs to play with.  (Unless
> > your usage pattern of the database is unusual and exactly fits the above
> > pattern.)
>
> Isn't that advice obsolete in an SSD world, though?  I was able to show
> that values up to 256 for a single device provided measurable gains on a
> single S3500.  It's true, though, that the class of queries this would
> help is pretty narrow.

Our developer docs are much clearer:

    https://www.postgresql.org/docs/10/static/runtime-config-resource.html#runtime-config-resource-disk

    For magnetic drives, a good starting point for this setting is the
    number of separate drives comprising a RAID 0 stripe or RAID 1 mirror
    being used for the database. (For RAID 5 the parity drive should not be
    counted.) However, if the database is often busy with multiple queries
    issued in concurrent sessions, lower values may be sufficient to keep
    the disk array busy. A value higher than needed to keep the disks busy
    will only result in extra CPU overhead. SSDs and other memory-based
    storage can often process many concurrent requests, so the best value
    might be in the hundreds.

I didn't backpatch this change since the original docs were not
incorrect.
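
Translated into settings, that guidance might look like the sketch below; the numbers are purely illustrative, and the per-tablespace form only exists from 9.6 onward, not in 9.2:

    -- Magnetic RAID with, say, four data spindles:
    SET effective_io_concurrency = 4;

    -- SSD-backed storage, which the newer docs say can service many
    -- concurrent requests:
    SET effective_io_concurrency = 200;

    -- From 9.6 onward the setting can also be attached to a tablespace
    -- (hypothetical tablespace name):
    -- ALTER TABLESPACE fast_ssd SET (effective_io_concurrency = 200);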

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +


Re: [GENERAL] effective_io_concurrency increasing

From
Peter Geoghegan
Date:
On Mon, Jun 19, 2017 at 8:36 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
> Unfortunately, it is only implemented in very narrow circumstances.  You
> have to be doing bitmap index scans of many widely scattered rows to make it
> useful.  I don't think that this is all that common of a situation.  The
> problem is that at every point in the scan, it has to be possible to know
> what data block it is going to want N iterations in the future, so you can
> inform the kernel to pre-fetch it.  That is only easy to know for bitmap
> scans.

I think that you could prefetch in index scans by using the
pointers/downlinks in the immediate parent page of the leaf page that
the index scan currently pins. The sibling pointer in the leaf itself
is no good for this, because there is only one block to prefetch
available at a time.

I think that this is the way index scan prefetch is normally
implemented. Index scans will on average have a much more random
access pattern than what is typical for bitmap heap scans, making this
optimization more compelling, so hopefully someone will get around to
this.

--
Peter Geoghegan


Re: [GENERAL] effective_io_concurrency increasing

From
Alvaro Herrera
Date:
Peter Geoghegan wrote:
> On Mon, Jun 19, 2017 at 8:36 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
> > Unfortunately, it is only implemented in very narrow circumstances.  You
> > have to be doing bitmap index scans of many widely scattered rows to make it
> > useful.  I don't think that this is all that common of a situation.  The
> > problem is that at every point in the scan, it has to be possible to know
> > what data block it is going to want N iterations in the future, so you can
> > inform the kernel to pre-fetch it.  That is only easy to know for bitmap
> > scans.
>
> I think that you could prefetch in index scans by using the
> pointers/downlinks in the immediate parent page of the leaf page that
> the index scan currently pins. The sibling pointer in the leaf itself
> is no good for this, because there is only one block to prefetch
> available at a time.

Surely you could prefetch all the heap pages pointed to by index items
in the current leaf index page ...

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [GENERAL] effective_io_concurrency increasing

From
Peter Geoghegan
Date:
On Mon, Jun 19, 2017 at 3:25 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Surely you could prefetch all the heap pages pointed to by index items
> in the current leaf index page ...

I'm sure that you could do that too. I'm not sure how valuable each
prefetching optimization is.

I can imagine prefetching heap pages mattering a lot less for a
primary key index, where there is a strong preexisting correlation
between physical and logical order, while also mattering a lot more
than what I describe in other cases. I suppose that you need both.

--
Peter Geoghegan


Re: [GENERAL] effective_io_concurrency increasing

From
Andres Freund
Date:
On 2017-06-19 15:21:20 -0700, Peter Geoghegan wrote:
> On Mon, Jun 19, 2017 at 8:36 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
> > Unfortunately, it is only implemented in very narrow circumstances.  You
> > have to be doing bitmap index scans of many widely scattered rows to make it
> > useful.  I don't think that this is all that common of a situation.  The
> > problem is that at every point in the scan, it has to be possible to know
> > what data block it is going to want N iterations in the future, so you can
> > inform the kernel to pre-fetch it.  That is only easy to know for bitmap
> > scans.
>
> I think that you could prefetch in index scans by using the
> pointers/downlinks in the immediate parent page of the leaf page that
> the index scan currently pins. The sibling pointer in the leaf itself
> is no good for this, because there is only one block to prefetch
> available at a time.
>
> I think that this is the way index scan prefetch is normally
> implemented. Index scans will on average have a much more random
> access pattern than what is typical for bitmap heap scans, making this
> optimization more compelling, so hopefully someone will get around to
> this.

I think for index-based merge and nestloop joins, it'd be hugely
beneficial to do prefetching at the index level, but more importantly at
the heap level.  Not entirely trivial to do, however.

- Andres


Re: [GENERAL] effective_io_concurrency increasing

From
Peter Geoghegan
Date:
On Mon, Jun 19, 2017 at 4:35 PM, Andres Freund <andres@anarazel.de> wrote:
>> I think that this is the way index scan prefetch is normally
>> implemented. Index scans will on average have a much more random
>> access pattern than what is typical for bitmap heap scans, making this
>> optimization more compelling, so hopefully someone will get around to
>> this.
>
> I think for index-based merge and nestloop joins, it'd be hugely
> beneficial to do prefetching at the index level, but more importantly at
> the heap level.  Not entirely trivial to do, however.

Speaking of nestloop join, and on a similar note, we could do some
caching on the inner side of a nestloop join.

We already track if the outer side access path of a nestloop join
preserves sort order within the optimizer. It might not be that hard
to teach the optimizer to generate a plan where, when we know that
this has happened, and we know that the outer side is not unique, the
final plan hints to the executor to opportunistically cache every
lookup on the inner side.

This would make only the first lookup for each distinct value on the
outer side actually do an index scan on the inner side. I can imagine
the optimization saving certain queries from consuming a lot of memory
bandwidth, as well as saving them from pinning and locking the same
buffers repeatedly.
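
For illustration, the query shape being described might look like this hypothetical example, where the outer side arrives sorted on the join key and is not unique, so consecutive lookups on the inner index frequently repeat the same key:

    -- Hypothetical schema: many orders per customer, index on customers.id.
    EXPLAIN
    SELECT o.id, c.name
    FROM orders o                                -- outer side, many rows per customer
    JOIN customers c ON c.id = o.customer_id     -- inner index lookups, keys repeat
    ORDER BY o.customer_id;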

--
Peter Geoghegan


Re: [GENERAL] effective_io_concurrency increasing

From
Peter Geoghegan
Date:
On Mon, Jun 19, 2017 at 4:51 PM, Peter Geoghegan <pg@bowt.ie> wrote:
> This would make only the first lookup for each distinct value on the
> outer side actually do an index scan on the inner side. I can imagine
> the optimization saving certain queries from consuming a lot of memory
> bandwidth, as well as saving them from pinning and locking the same
> buffers repeatedly.

Apparently this is sometimes called block nested loop join, and MySQL
has had it for a while now:

https://en.wikipedia.org/wiki/Block_nested_loop

It doesn't necessarily require that the outer side input be sorted,
because you might end up using a hash table, etc.

--
Peter Geoghegan