Re: Blocking I/O, async I/O and io_uring - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: Blocking I/O, async I/O and io_uring
Date
Msg-id b43db30b-8657-7b6d-cef0-fd5520f4b132@oss.nttdata.com
In response to Blocking I/O, async I/O and io_uring  (Craig Ringer <craig.ringer@enterprisedb.com>)
List pgsql-hackers

On 2020/12/08 11:55, Craig Ringer wrote:
> Hi all
> 
> A new kernel API called io_uring has recently come to my attention. I
> assume some of you (Andres?) have been following it for a while.
> 
> io_uring appears to offer a way to make system calls including reads,
> writes, fsync()s, and more in a non-blocking, batched and pipelined
> manner, with or without O_DIRECT. Basically async I/O with usable
> buffered I/O and fsync support. It has ordering support which is really
> important for us.
> 
> This should be on our radar. The main barriers to benefiting from
> linux-aio based async I/O in postgres in the past have been its reliance
> on direct I/O, the various kernel-version quirks, platform portability,
> and its maybe-async-except-when-it's-randomly-not nature.
> 
> The kernel version and portability remain an issue with io_uring so it's
> not like this is something we can pivot over to completely. But we should
> probably take a closer look at it.
> 
> PostgreSQL spends a huge amount of time waiting, doing nothing, for
> blocking I/O. If we can improve that then we could potentially realize
> some major increases in I/O utilization especially for bigger, less
> concurrent workloads. The most obvious candidates to benefit would be
> redo, logical apply, and bulk loading.
> 
> But I have no idea how to even begin to fit this into PostgreSQL's
> executor pipeline. Almost all PostgreSQL's code is
> synchronous-blocking-imperative in nature, with a push/pull executor
> pipeline. It seems to have been recognised for some time that this is
> increasingly hurting our performance and scalability as platforms become
> more and more parallel.
> 
> To benefit from AIO (be it POSIX, linux-aio, io_uring, Windows AIO, etc)
> we have to be able to dispatch I/O and do something else while we wait
> for the results. So we need the ability to pipeline the executor and
> pipeline redo.
> 
> I thought I'd start the discussion on this and see where we can go with
> it. What incremental steps can be done to move us toward parallelisable
> I/O without having to redesign everything?
> 
> I'm thinking that redo is probably a good first candidate. It doesn't
> depend on the guts of the executor. It is much less sensitive to ordering
> between operations in shmem and on disk since it runs in the startup
> process. And it hurts REALLY BADLY from its single-threaded blocking
> approach to I/O - as shown by an extension written by 2ndQuadrant that
> can double redo performance by doing read-ahead on btree pages that will
> soon be needed.
> 
> Thoughts anybody?

I was wondering whether async I/O might help improve the performance
of walreceiver. In physical replication, walreceiver receives, writes
and fsyncs WAL data. It also handles tasks such as keepalives. Since
walreceiver is a single process, it currently cannot do other tasks
while it is, for example, fsyncing WAL to disk.

OTOH, if walreceiver can do other tasks even while fsyncing WAL by
using async I/O, ISTM that it might improve the performance of walreceiver.
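
To make the idea concrete, here is a rough sketch of overlapping a flush
with other work. It is Python rather than walreceiver's C, and it hands
fsync to a helper thread instead of using io_uring or any kernel AIO
interface, so it only illustrates the control flow. The names
flush_while_working and demo are made up for this example, and the
appended list is just a stand-in for sending keepalives.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def flush_while_working(fd, other_work):
    """Start fsync(fd) in a helper thread and run other_work() until it ends.

    Hypothetical helper, not PostgreSQL code. Returns how many times
    other_work() ran while the flush was in flight.
    """
    rounds = 0
    with ThreadPoolExecutor(max_workers=1) as pool:
        flush = pool.submit(os.fsync, fd)  # the flush is now in flight
        while not flush.done():
            other_work()                   # e.g. send a keepalive
            rounds += 1
        flush.result()                     # re-raise any fsync error here
    return rounds


def demo():
    """Write stand-in WAL bytes, flush them, and count the overlapped work."""
    keepalives = []
    with tempfile.TemporaryFile() as f:
        f.write(b"stand-in WAL record" * 1000)
        f.flush()                          # hand Python's buffer to the kernel
        rounds = flush_while_working(f.fileno(), lambda: keepalives.append(1))
    return rounds, len(keepalives)
```

Calling demo() returns the number of loop iterations and the number of
stand-in keepalives, which are always equal. How many iterations fit in
depends on how long the flush takes. A real implementation would wait on
its sockets and on the completion event rather than spin in a loop.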

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION


