Re: Does larger i/o size make sense? - Mailing list pgsql-hackers

From Greg Stark
Subject Re: Does larger i/o size make sense?
Date
Msg-id CAM-w4HOxZ71aG75n6ruRJaSM62CbFUjhHeNp8nsFC-M_sgVTHA@mail.gmail.com
In response to Does larger i/o size make sense?  (Kohei KaiGai <kaigai@kaigai.gr.jp>)
List pgsql-hackers

On Thu, Aug 22, 2013 at 8:53 PM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
> An idea that I'd like to investigate is, PostgreSQL allocates a set of
> continuous buffers to fit larger i/o size when block is referenced due to
> sequential scan, then invokes consolidated i/o request on the buffer.
> It probably make sense if we can expect upcoming block references
> shall be on the neighbor blocks; that is typical sequential read workload.

I think it makes more sense to use scatter-gather i/o or async i/o to read into regular-sized buffers scattered around memory than to require the buffers to be contiguous.
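To make the scatter-gather idea concrete, here is a minimal sketch of reading several consecutive file blocks into separate, non-adjacent buffers with a single readv() call. The function name read_blocks_scattered, the MAX_BLOCKS cap and the 8 kB BLOCK_SIZE (chosen to match Postgres's default block size) are assumptions made for illustration; none of this is taken from the Postgres source.

```c
#include <assert.h>
#include <fcntl.h>      /* open() flags, for callers that open the file */
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define BLOCK_SIZE 8192   /* assumed: matches Postgres's default block size */
#define MAX_BLOCKS 16     /* arbitrary cap for this sketch */

/*
 * Read nblocks consecutive blocks, starting at block number first_block,
 * into nblocks separate buffers that need not be adjacent in memory.
 * One readv() call fills them in order. Returns the number of bytes read
 * (possibly short), or -1 on error.
 */
static ssize_t
read_blocks_scattered(int fd, off_t first_block, void *bufs[], int nblocks)
{
    struct iovec iov[MAX_BLOCKS];

    assert(nblocks >= 1 && nblocks <= MAX_BLOCKS);

    for (int i = 0; i < nblocks; i++) {
        iov[i].iov_base = bufs[i];    /* any free buffer will do */
        iov[i].iov_len = BLOCK_SIZE;
    }

    /* Position the file at the first block, then issue one gathered read. */
    if (lseek(fd, first_block * BLOCK_SIZE, SEEK_SET) == (off_t) -1)
        return -1;
    return readv(fd, iov, nblocks);
}
```

The caller passes whatever free buffers it has, in any order, so nothing has to be evicted or shuffled to make room for a contiguous run. Note that the call still returns only when the whole request has been satisfied or has failed.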

As others said, Postgres depends on the OS buffer cache to do readahead. The scenario where the above becomes interesting is if it's paired with a move to direct i/o (O_DIRECT) or some other way of skipping the OS buffer cache. Double caching is a huge waste: the same block can sit in both Postgres's shared buffers and the OS cache, and every read costs an extra copy between the two.

The blocking issue there is that Postgres doesn't understand much about the underlying storage hardware. If there were APIs to find out more about it from the kernel -- how far it is to the end of the RAID chunk, how much parallelism the device has, how congested the i/o channel is, etc -- then Postgres might be on par with the kernel and able to eliminate the double buffering inefficiency. It might even be able to do better than the kernel, since it understands its own workload better.

If Postgres did that then it would need to be able to initiate i/o on multiple buffers in parallel. That can be done using scatter-gather i/o such as readv() and writev(), but that would mean blocking until every block in the request has been read, including blocks that might not be needed until later. Or it could be done using libaio to initiate the i/o and return control as soon as the needed data is available, while the rest of the i/o is still pending.
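The asynchronous alternative can be sketched as follows. This uses the portable POSIX AIO interface (aio_read, aio_suspend) rather than libaio, purely so the example builds without an extra library; the control flow is the same in spirit, but it is not the libaio API. The names start_block_reads and wait_for_block and the 8 kB BLOCK_SIZE are hypothetical. On older glibc versions the program may need to be linked with -lrt.

```c
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>      /* open() flags, for callers that open the file */
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE 8192   /* assumed: matches Postgres's default block size */

/*
 * Queue one asynchronous read per block, each into its own buffer, and
 * return without waiting for any of them. Returns the number of requests
 * queued, which is less than nblocks if aio_read() refused one.
 */
static int
start_block_reads(int fd, off_t first_block, void *bufs[],
                  struct aiocb cbs[], int nblocks)
{
    for (int i = 0; i < nblocks; i++) {
        memset(&cbs[i], 0, sizeof(cbs[i]));
        cbs[i].aio_fildes = fd;
        cbs[i].aio_buf = bufs[i];
        cbs[i].aio_nbytes = BLOCK_SIZE;
        cbs[i].aio_offset = (first_block + i) * BLOCK_SIZE;
        if (aio_read(&cbs[i]) != 0)
            return i;
    }
    return nblocks;
}

/*
 * Block until this one request has finished; the other requests stay in
 * flight. Returns the number of bytes read, or -1 with errno set.
 */
static ssize_t
wait_for_block(struct aiocb *cb)
{
    const struct aiocb *const list[1] = { cb };
    int err;

    while ((err = aio_error(cb)) == EINPROGRESS) {
        if (aio_suspend(list, 1, NULL) != 0 && errno != EINTR)
            return -1;
    }

    ssize_t nread = aio_return(cb);   /* also releases the request */
    if (err != 0) {
        errno = err;
        return -1;
    }
    return nread;
}
```

A scan would call start_block_reads() for the next few blocks, then wait_for_block() only on the block it needs right now, and process that block while the remaining reads are still pending.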


--
greg
