Thread: Does larger i/o size make sense?

Does larger i/o size make sense?

From: Kohei KaiGai
Date:
Hello,

A few days ago, the question in the subject line came up in a discussion
with a colleague of mine.

In general, a larger i/o size per system call gives us wider bandwidth on
sequential reads than multiple system calls with a smaller i/o size.
This heuristic is probably well known.
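(As a rough, standalone illustration of that heuristic -- this is not
PostgreSQL code, and the file name and the two request sizes are arbitrary --
one can compare the bandwidth of the same sequential read done with different
request sizes:)

/*
 * read_bench.c - compare sequential-read bandwidth at two request sizes.
 * Standalone sketch; "testfile" is a placeholder, and the OS page cache
 * should be dropped between runs for a fair comparison.
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <unistd.h>

static double
bandwidth(const char *path, size_t iosize)
{
	int			fd = open(path, O_RDONLY);
	char	   *buf = malloc(iosize);
	struct timeval t0, t1;
	long long	total = 0;
	ssize_t		n;

	if (fd < 0 || buf == NULL)
	{
		perror("setup");
		exit(1);
	}
	gettimeofday(&t0, NULL);
	while ((n = read(fd, buf, iosize)) > 0)
		total += n;				/* bytes actually transferred */
	gettimeofday(&t1, NULL);
	close(fd);
	free(buf);
	return total / ((t1.tv_sec - t0.tv_sec) +
					(t1.tv_usec - t0.tv_usec) / 1e6);
}

int
main(void)
{
	printf("  8KB reads: %.1f MB/s\n", bandwidth("testfile", 8192) / 1e6);
	printf("128KB reads: %.1f MB/s\n", bandwidth("testfile", 131072) / 1e6);
	return 0;
}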

On the other hand, PostgreSQL always reads database files in BLCKSZ units
(usually 8KB) when the referenced block is not in shared buffers, and it
doesn't seem to me that this can pull the maximum performance out of a
modern storage system.

I'm not certain whether we have discussed this kind of idea before.
So I'd like to know why we stick to a fixed-length i/o size, in case
similar ideas were already rejected.

An idea I'd like to investigate is this: PostgreSQL allocates a set of
contiguous buffers to fit a larger i/o size when a block is referenced by
a sequential scan, then issues a consolidated i/o request into those buffers.
It probably makes sense when we can expect upcoming block references to
fall on neighboring blocks, which is the typical sequential read workload.

Of course, we would need to solve some complicated issues, like preventing
fragmentation of shared buffers, or extending the storage manager's internal
APIs to accept larger i/o sizes.
Nevertheless, it seems to me this idea is worth investigating.

Any comments are welcome. Thanks,
-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>



Re: Does larger i/o size make sense?

From: Merlin Moncure
Date:
On Thu, Aug 22, 2013 at 2:53 PM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
> Hello,
>
> A few days ago, the question in the subject line came up in a discussion
> with a colleague of mine.
>
> In general, a larger i/o size per system call gives us wider bandwidth on
> sequential reads than multiple system calls with a smaller i/o size.
> This heuristic is probably well known.
>
> On the other hand, PostgreSQL always reads database files in BLCKSZ units
> (usually 8KB) when the referenced block is not in shared buffers, and it
> doesn't seem to me that this can pull the maximum performance out of a
> modern storage system.
>
> I'm not certain whether we have discussed this kind of idea before.
> So I'd like to know why we stick to a fixed-length i/o size, in case
> similar ideas were already rejected.
>
> An idea I'd like to investigate is this: PostgreSQL allocates a set of
> contiguous buffers to fit a larger i/o size when a block is referenced by
> a sequential scan, then issues a consolidated i/o request into those buffers.
> It probably makes sense when we can expect upcoming block references to
> fall on neighboring blocks, which is the typical sequential read workload.
>
> Of course, we would need to solve some complicated issues, like preventing
> fragmentation of shared buffers, or extending the storage manager's internal
> APIs to accept larger i/o sizes.
> Nevertheless, it seems to me this idea is worth investigating.
>
> Any comments are welcome. Thanks,

Isn't this dealt with at least in part by effective i/o concurrency
and o/s readahead?
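(At the OS level, the explicit-prefetch side of that -- essentially what
effective_io_concurrency drives -- boils down to posix_fadvise() hints; a
minimal sketch, assuming a POSIX system and an 8KB block size chosen purely
for illustration:)

/*
 * Sketch only: ask the kernel to start reading block 'blkno' in the
 * background, then fetch it later with an ordinary pread().
 * BLKSZ is an assumption for illustration, not taken from PostgreSQL.
 */
#define _XOPEN_SOURCE 600
#include <fcntl.h>
#include <unistd.h>

#define BLKSZ 8192

void
prefetch_block(int fd, long blkno)
{
	(void) posix_fadvise(fd, (off_t) blkno * BLKSZ, BLKSZ,
						 POSIX_FADV_WILLNEED);
}

ssize_t
read_block(int fd, long blkno, char *buf)
{
	return pread(fd, buf, BLKSZ, (off_t) blkno * BLKSZ);
}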

merlin



Re: Does larger i/o size make sense?

From: Tom Lane
Date:
Merlin Moncure <mmoncure@gmail.com> writes:
> On Thu, Aug 22, 2013 at 2:53 PM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
>> An idea I'd like to investigate is this: PostgreSQL allocates a set of
>> contiguous buffers to fit a larger i/o size when a block is referenced by
>> a sequential scan, then issues a consolidated i/o request into those buffers.

> Isn't this dealt with at least in part by effective i/o concurrency
> and o/s readahead?

I should think so.  It's very difficult to predict future block-access
requirements for anything except a seqscan, and for that, we expect the
OS will detect the access pattern and start reading ahead on its own.

Another point here is that you could get some of the hoped-for benefit
just by increasing BLCKSZ ... but nobody's ever demonstrated any
compelling benefit from larger BLCKSZ (except on specialized workloads,
if memory serves).

The big-picture problem with work in this area is that no matter how you
do it, any benefit is likely to be both platform- and workload-specific.
So the prospects for getting a patch accepted aren't all that bright.
        regards, tom lane



Re: Does larger i/o size make sense?

From: Kohei KaiGai
Date:
2013/8/23 Tom Lane <tgl@sss.pgh.pa.us>:
> Merlin Moncure <mmoncure@gmail.com> writes:
>> On Thu, Aug 22, 2013 at 2:53 PM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
>>> An idea I'd like to investigate is this: PostgreSQL allocates a set of
>>> contiguous buffers to fit a larger i/o size when a block is referenced by
>>> a sequential scan, then issues a consolidated i/o request into those buffers.
>
>> Isn't this dealt with at least in part by effective i/o concurrency
>> and o/s readahead?
>
> I should think so.  It's very difficult to predict future block-access
> requirements for anything except a seqscan, and for that, we expect the
> OS will detect the access pattern and start reading ahead on its own.
>
> Another point here is that you could get some of the hoped-for benefit
> just by increasing BLCKSZ ... but nobody's ever demonstrated any
> compelling benefit from larger BLCKSZ (except on specialized workloads,
> if memory serves).
>
> The big-picture problem with work in this area is that no matter how you
> do it, any benefit is likely to be both platform- and workload-specific.
> So the prospects for getting a patch accepted aren't all that bright.
>
Hmm. I might have overlooked the effect of readahead at the operating system
level. Indeed, a sequential scan is exactly the kind of workload that triggers
it easily, so the smaller i/o size at the application level will be hidden.

Thanks,
-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>



Re: Does larger i/o size make sense?

From: Fabien COELHO
Date:
> The big-picture problem with work in this area is that no matter how you
> do it, any benefit is likely to be both platform- and workload-specific.
> So the prospects for getting a patch accepted aren't all that bright.

Indeed.

Would it make sense to have something easier to configure than recompiling 
postgresql and managing a custom executable, say a block size that could 
be configured from initdb and/or postgresql.conf, or maybe per-object 
settings specified at creation time?

Note that the block size may also affect the cache behavior: for pure random 
accesses, for instance, more "recently accessed" tuples can be kept in 
memory if the pages are smaller. So there are reasons other than I/O access 
times to play with the blocksize, and an option to do that more easily 
would help.

-- 
Fabien.



Re: Does larger i/o size make sense?

From: Kohei KaiGai
Date:
2013/8/23 Fabien COELHO <coelho@cri.ensmp.fr>:
>
>> The big-picture problem with work in this area is that no matter how you
>> do it, any benefit is likely to be both platform- and workload-specific.
>> So the prospects for getting a patch accepted aren't all that bright.
>
>
> Indeed.
>
> Would it make sense to have something easier to configure than recompiling
> postgresql and managing a custom executable, say a block size that could be
> configured from initdb and/or postgresql.conf, or maybe per-object settings
> specified at creation time?
>
I love the idea of a per-object block size setting according to the expected
workload, perhaps configured by the DBA. When we have to run sequential scans
on large tables, a larger block size may hurt less than being interrupted at
every 8KB boundary to switch to the next block, even though random access via
index scans favors a smaller block size.

> Note that the block size may also affect the cache behavior: for pure random
> accesses, for instance, more "recently accessed" tuples can be kept in
> memory if the pages are smaller. So there are reasons other than I/O access
> times to play with the blocksize, and an option to do that more easily
> would help.
>
I see. A uniform block size would also simplify the implementation, since
there would be no need to worry about a scenario where contiguous buffer
allocation pushes out pages that ought to be kept in memory.

Thanks,
-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>



Re: Does larger i/o size make sense?

From: Fabien COELHO
Date:
>> Would it make sense to have something easier to configure than recompiling
>> postgresql and managing a custom executable, say a block size that could be
>> configured from initdb and/or postgresql.conf, or maybe per-object settings
>> specified at creation time?
>>
> I love the idea of a per-object block size setting according to the expected
> workload, perhaps configured by the DBA.

My 0.02€: wait to see whether the idea gets some positive feedback from core 
people before investing any time in it...

The per-object setting would be a lot of work. A per-initdb (so per-cluster) 
setting (block size, WAL size...) would be much easier to implement, but it 
affects the storage format.

> When we have to run sequential scans on large tables, a larger block size
> may hurt less than being interrupted at every 8KB boundary to switch to the
> next block, even though random access via index scans favors a smaller
> block size.

Yep, as Tom noted, this is really workload specific.

-- 
Fabien.

Re: Does larger i/o size make sense?

From: Greg Stark
Date:
On Thu, Aug 22, 2013 at 8:53 PM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
> An idea I'd like to investigate is this: PostgreSQL allocates a set of
> contiguous buffers to fit a larger i/o size when a block is referenced by
> a sequential scan, then issues a consolidated i/o request into those buffers.
> It probably makes sense when we can expect upcoming block references to
> fall on neighboring blocks, which is the typical sequential read workload.

I think it makes more sense to use scatter-gather i/o or async i/o to read
into regular-sized buffers scattered around memory than to restrict the
buffers to needing to be contiguous.

As others said, Postgres depends on the OS buffer cache to do readahead.
The scenario where the above becomes interesting is if it's paired with a
move to direct i/o or other ways of skipping the buffer cache. Double
caching is a huge waste and leads to lots of inefficiencies.

The blocking issue there is that Postgres doesn't understand much about the
underlying hardware storage. If there were APIs to find out more about it
from the kernel -- how much further before the end of the RAID chunk, how
much parallelism it has, how congested the i/o channel is, etc. -- then
Postgres might be on par with the kernel and able to eliminate the
double-buffering inefficiency, and might even be able to do better if it
understands its own workload better.

If Postgres did that, it would be necessary to be able to initiate i/o on
multiple buffers in parallel. That can be done using scatter-gather i/o
such as readv() and writev(), but that would mean blocking on reading
blocks that might not be needed until the future. Or it could be done
using libaio to initiate i/o and return control as soon as the needed data
is available while other i/o is still pending.

-- 
greg
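
(A minimal sketch of the scatter-gather variant described above, assuming
plain POSIX readv() and an 8KB block size chosen purely for illustration:
one vectored call fills several buffers that need not be adjacent in memory.)

/*
 * Sketch: fill NBLOCKS non-contiguous 8KB buffers with one vectored read.
 * BLKSZ and NBLOCKS are illustrative assumptions, not PostgreSQL values.
 */
#include <sys/uio.h>
#include <unistd.h>

#define BLKSZ	8192
#define NBLOCKS	16

ssize_t
read_blocks_scattered(int fd, char *bufs[NBLOCKS])
{
	struct iovec iov[NBLOCKS];
	int			i;

	for (i = 0; i < NBLOCKS; i++)
	{
		iov[i].iov_base = bufs[i];	/* buffers may live anywhere */
		iov[i].iov_len = BLKSZ;
	}
	/* one system call, up to NBLOCKS * BLKSZ bytes, sequential on disk */
	return readv(fd, iov, NBLOCKS);
}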

Re: Does larger i/o size make sense?

From: Kevin Grittner
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Another point here is that you could get some of the hoped-for
> benefit just by increasing BLCKSZ ... but nobody's ever
> demonstrated any compelling benefit from larger BLCKSZ (except on
> specialized workloads, if memory serves).

I think I've seen a handful of reports of performance differences
with different BLCKSZ builds (perhaps not all on community lists). 
My recollection is that some people sifting through data in data
warehouse environments see a performance benefit up to 32KB, but
that tests of GiST index performance with different sizes showed
better performance with smaller sizes down to around 2KB.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Does larger i/o size make sense?

From: Josh Berkus
Date:
Kevin,

> I think I've seen a handful of reports of performance differences
> with different BLCKSZ builds (perhaps not all on community lists). 
> My recollection is that some people sifting through data in data
> warehouse environments see a performance benefit up to 32KB, but
> that tests of GiST index performance with different sizes showed
> better performance with smaller sizes down to around 2KB.

I believe that Greenplum currently uses 128K.  There's a definite
benefit for the DW use-case.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: Does larger i/o size make sense?

From: Greg Smith
Date:
On 8/27/13 3:54 PM, Josh Berkus wrote:
> I believe that Greenplum currently uses 128K.  There's a definite
> benefit for the DW use-case.

Since Linux read-ahead can easily give big gains on fast storage, I 
normally set that to at least 4096 sectors = 2048KB.  That's a lot 
bigger than even this, and definitely necessary for reaching maximum 
storage speed.
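
(For reference, that setting is the block-device read-ahead, normally
adjusted with "blockdev --setra"; a minimal C sketch of the equivalent Linux
ioctls, assuming root privileges and using /dev/sdX as a placeholder device
path:)

/*
 * Sketch: query and set the Linux block-device read-ahead, measured in
 * 512-byte sectors -- roughly what "blockdev --getra/--setra" does.
 * Linux-only, needs root to change the value; /dev/sdX is a placeholder.
 */
#include <fcntl.h>
#include <linux/fs.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int
main(void)
{
	long	ra = 0;
	int		fd = open("/dev/sdX", O_RDONLY);

	if (fd < 0)
	{
		perror("open");
		return 1;
	}
	ioctl(fd, BLKRAGET, &ra);		/* current read-ahead, in sectors */
	printf("read-ahead: %ld sectors\n", ra);
	ioctl(fd, BLKRASET, 4096UL);	/* 4096 sectors = 2048KB */
	close(fd);
	return 0;
}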

I don't think that the block size change alone will necessarily 
duplicate the gains on seq scans that Greenplum gets though.  They've 
done a lot more performance optimization on that part of the read path 
than just the larger block size.

As far as quantifying whether this is worth chasing, the most useful 
thing to do here is find some fast storage and profile the code with 
different block sizes at a large read-ahead.  I wouldn't spend a minute 
on trying to come up with a more complicated management scheme until the 
potential gain is measured.

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com