Thread: effective_io_concurrency

effective_io_concurrency

From
Jeff Janes
Date:
The bitmap heap scan can benefit quite a bit from
effective_io_concurrency on RAID systems (and to some extent even on
single-spindle systems).
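
For context, here is a minimal sketch of how the setting turns into a prefetch distance. It assumes the harmonic-series formula that, as far as I recall, PostgreSQL releases of this era used to convert effective_io_concurrency (treated as a number of drives) into a target number of pages to prefetch. The function name prefetch_target is made up for illustration.

```python
def prefetch_target(n_drives: int) -> float:
    """Pages to keep in flight so that, on average, n_drives stay busy.

    Assumed formula: n * H(n), where H(n) is the n-th harmonic number.
    This is a sketch of the idea, not PostgreSQL source code.
    """
    return sum(n_drives / i for i in range(1, n_drives + 1))


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(n, round(prefetch_target(n), 2))
```

Under this assumption the prefetch distance grows faster than the setting itself, because randomly scattered requests do not spread evenly across the drives.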

However, the planner isn't aware of this.  So you have to just be
lucky to have it choose the bitmap heap scan instead of something else
that can't benefit from effective_io_concurrency.

As far as I can tell, the only thing that drives the bitmap heap scan
down in cost is the estimation that you will end up getting multiple
tuples from the same block.  And because of the fuzz factor in
compare_path_costs_fuzzily, the estimate has to show at least 1%
redundant blocks before the bitmap scan will be considered, and I think
the benefits of effective_io_concurrency can kick in well before that
on very large data sets.
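
The 1% threshold can be illustrated with a simplified sketch of a fuzzy cost comparison. It compares a single total cost per path and assumes a fuzz factor of 1.01. The real compare_path_costs_fuzzily considers more than this, and the name fuzzy_compare and the example costs are hypothetical.

```python
FUZZ_FACTOR = 1.01  # assumed: costs within 1% of each other count as a tie


def fuzzy_compare(cost_a: float, cost_b: float, fuzz: float = FUZZ_FACTOR) -> int:
    """Return -1 if path a is clearly cheaper, +1 if path b is, 0 if tied."""
    if cost_a > cost_b * fuzz:
        return 1
    if cost_b > cost_a * fuzz:
        return -1
    return 0


if __name__ == "__main__":
    index_scan = 1000.0
    # A bitmap scan that is only 0.5% cheaper does not clearly win.
    print(fuzzy_compare(index_scan, 995.0))
    # One that is 1.5% cheaper does.
    print(fuzzy_compare(index_scan, 985.0))
```

In this sketch a bitmap scan whose estimate is less than about 1% below the index scan's is treated as a tie rather than as the cheaper path.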

Also, if there is some correlation in the table, then the situation is
worse because the index scan lowers its block-read estimates based on
the correlation, while the bitmap scan does not lower its estimate.  I
haven't witnessed such a case, but it seems like there must be
correlation levels small enough that most reading is still scattered,
but large enough to make a difference in the cost estimates between
the two competing access methods that favor the one that is not
actually faster.
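
A rough sketch of the effect described above, assuming the index scan's I/O cost is interpolated between a worst case (uncorrelated) and a best case (perfectly correlated) using the square of the correlation, which is my understanding of what cost_index does. The function name and the cost numbers are invented for illustration.

```python
def index_scan_io_cost(min_io_cost: float, max_io_cost: float,
                       correlation: float) -> float:
    """Interpolate I/O cost between max (correlation 0) and min (correlation +/-1).

    Assumed form: max + correlation^2 * (min - max).
    """
    csquared = correlation ** 2
    return max_io_cost + csquared * (min_io_cost - max_io_cost)


if __name__ == "__main__":
    min_cost, max_cost = 100.0, 10000.0  # hypothetical best/worst case costs
    for corr in (0.0, 0.1, 0.3, 1.0):
        print(corr, round(index_scan_io_cost(min_cost, max_cost, corr), 1))
```

With these made-up numbers, a correlation of only 0.1 already trims the index scan estimate by roughly 1%, while the bitmap scan estimate gets no such discount.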

From my attempted reading of the thread "posix_fadvise v22", it seems
like modification of the planner was never discussed, rather than
being discussed and rejected.  So, is there a reason not to make the
planner take account of effective_io_concurrency?

But it might be better yet to make ordinary index scans benefit from
effective_io_concurrency, but even if/when that gets done it would
probably still be worthwhile to make the planner understand the
benefit.

Cheers,

Jeff


Re: effective_io_concurrency

From
Robert Haas
Date:
On Sat, Jul 28, 2012 at 4:09 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> From my attempted reading of the thread "posix_fadvise v22", it seems
> like modification of the planner was never discussed, rather than
> being discussed and rejected.  So, is there a reason not to make the
> planner take account of effective_io_concurrency?

Not that I can see.

> But it might be better yet to make ordinary index scans benefit from
> effective_io_concurrency, but even if/when that gets done it would
> probably still be worthwhile to make the planner understand the
> benefit.

That sounds good too, but separate.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: effective_io_concurrency

From
Peter Geoghegan
Date:
On 30 August 2012 20:28, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sat, Jul 28, 2012 at 4:09 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
>> But it might be better yet to make ordinary index scans benefit from
>> effective_io_concurrency, but even if/when that gets done it would
>> probably still be worthwhile to make the planner understand the
>> benefit.
>
> That sounds good too, but separate.

Indeed. The original effective_io_concurrency commit message said:

"""
***SNIP***

(The best way to handle this for plain index scans is still under debate,
so that part is not applied yet --- tgl)
"""

...seems like a pity that this debate never reached a useful conclusion.

Just how helpful is effective_io_concurrency? Did someone produce a
benchmark at some point?

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: effective_io_concurrency

From
Jeff Janes
Date:
On Thu, Aug 30, 2012 at 1:25 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote:

> Just how helpful is effective_io_concurrency? Did someone produce a
> benchmark at some point?

Attached is a benchmark I put together a while ago.

I don't know how close to "real world" it might be.  I haven't seen it
in the wild, but I'm anticipating I will see something like it soon.

The benefit is of course reduced if you apply high levels of
-c $clients, but cases with -c 1 are frequent in data mining and
such.

Obviously it would be better to cluster the giant table on "abalance",
but that might be hard to do and then hard to maintain.

Size of generated database is about 70GB.

Server had 16GB of RAM.  It is a virtual server, and I don't know the
real hardware, but am told it has 8 spindles.
(If the range used in the query is increased from +100 to +1000, even
more speedup is available.)

effective_io_concurrency    tps
0    2.06273064
1    2.1693092
2    4.11726948
3    5.90785352
4    6.65748384
5    7.58297556
6    8.36130404
7    8.86561116
8    9.2673546
9    9.57076168
10    9.8558758
11    10.11641752
12    10.316673
13    10.46953468
14    10.65962516
15    10.76328636
16    10.86442376
17    10.96362168
18    11.04371008
19    11.19470171
20    11.30110867
21    11.39553967
22    11.45420263
23    11.54764725
24    11.61949146
25    11.65659225
26    11.68992392
27    11.75944667
28    11.7456135
29    11.80111779
30    11.72897188
31    11.7210945
32    11.73292504
33    11.734458
34    11.75195196
35    11.79079175
36    11.73687979
37    11.79583758
38    11.75879063
39    11.77868596
40    11.74685896
41    11.76294508
42    11.7213265
43    11.68458158
44    11.71036729
45    11.72728229
46    11.72063796
47    11.80322429
48    11.83563058
49    11.81916996
50    11.73395892
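
To read the table as speedups, here is a small sketch that divides a few of the tps figures above by the tps at effective_io_concurrency = 0. Only a subset of rows is copied in, and the variable names are arbitrary.

```python
# tps values copied from selected rows of the table above
tps = {
    0: 2.06273064,
    1: 2.1693092,
    2: 4.11726948,
    4: 6.65748384,
    8: 9.2673546,
    16: 10.86442376,
    32: 11.73292504,
    48: 11.83563058,
}

baseline = tps[0]
# Speedup of each setting relative to prefetching disabled (setting 0)
speedup = {eic: value / baseline for eic, value in tps.items()}

if __name__ == "__main__":
    for eic, ratio in speedup.items():
        print(f"{eic:>2}  {ratio:.2f}x")
```

On these numbers the speedup is about 2x at a setting of 2, about 4.5x at 8, and levels off a little under 6x from the high 20s onward.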

Cheers,

Jeff
