Re: Raw device on PostgreSQL - Mailing list pgsql-hackers

From Jonah H. Harris
Subject Re: Raw device on PostgreSQL
Msg-id CADUqk8WTDY476f4kL+dC18vWq3YgbYmeginvD4J8UXpR=L3v8w@mail.gmail.com
In response to Re: Raw device on PostgreSQL  ("Jonah H. Harris" <jonah.harris@gmail.com>)
Responses Re: Raw device on PostgreSQL
List pgsql-hackers
On Wed, Apr 29, 2020 at 8:34 PM Jonah H. Harris <jonah.harris@gmail.com> wrote:
> On Tue, Apr 28, 2020 at 8:10 AM Andreas Karlsson <andreas@proxel.se> wrote:
> > To get the performance benefits from using raw devices I think you would
> > want to add support for asynchronous IO to PostgreSQL rather than
> > implementing your own layer to emulate the kernel's buffered IO.
> >
> > Andres Freund did a talk on async IO in PostgreSQL earlier this year. It
> > was not recorded but the slides are available.
> >
> > https://www.postgresql.eu/events/fosdem2020/schedule/session/2959-asynchronous-io-for-postgresql/
>
> FWIW, in 2007/2008, when I was at EnterpriseDB, Inaam Rana and I implemented a benchmarkable proof-of-concept patch for direct I/O and asynchronous I/O (for libaio and POSIX). We made that patch public, so it should be on the list somewhere. But we began to run into performance issues related to buffer manager scaling in terms of locking and, specifically, replacement. We began prototyping alternate buffer managers (going back to the old MRU/LRU model with midpoint insertion and testing a 2Q variant), but that work wasn't public. I had also prototyped raw device support, which is a good amount of work and required implementing a custom filesystem (similar to Oracle's ASM) within the storage manager. It's probably a bit harder now than it was then, given the number of different types of file access.
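To make the "LRU with midpoint insertion" idea concrete, here is a toy sketch in C. It is not the EnterpriseDB prototype and not PostgreSQL's buffer manager (current PostgreSQL uses a clock-sweep algorithm); the names `MidpointLRU`, `mlru_access`, and `NBUFFERS` are hypothetical. A hit promotes a page to the MRU head, while a miss inserts the new page at the middle of the list rather than at the head, so the slots ahead of the midpoint are never displaced by misses and a one-pass scan churns only the tail.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NBUFFERS 8  /* hypothetical pool size, for illustration only */

/* Toy page pool: slots[0] is the MRU end, slots[count - 1] the LRU end. */
typedef struct {
    int slots[NBUFFERS];  /* page numbers currently held */
    int count;            /* number of occupied slots */
} MidpointLRU;

static void mlru_init(MidpointLRU *c) { c->count = 0; }

/* Return the slot index holding page, or -1 if it is not cached. */
static int mlru_find(const MidpointLRU *c, int page)
{
    for (int i = 0; i < c->count; i++)
        if (c->slots[i] == page)
            return i;
    return -1;
}

/* Remove slot i, shifting later entries one step toward the head. */
static void mlru_remove_at(MidpointLRU *c, int i)
{
    memmove(&c->slots[i], &c->slots[i + 1],
            (size_t) (c->count - i - 1) * sizeof(int));
    c->count--;
}

/* Insert page at position pos, shifting later entries toward the tail. */
static void mlru_insert_at(MidpointLRU *c, int pos, int page)
{
    assert(c->count < NBUFFERS && pos >= 0 && pos <= c->count);
    memmove(&c->slots[pos + 1], &c->slots[pos],
            (size_t) (c->count - pos) * sizeof(int));
    c->slots[pos] = page;
    c->count++;
}

/* Access a page; returns true on a hit. On a miss with a full pool the LRU
 * tail is evicted and reported via *evicted (otherwise *evicted = -1), and
 * the new page goes in at the midpoint instead of the MRU head. */
static bool mlru_access(MidpointLRU *c, int page, int *evicted)
{
    int i = mlru_find(c, page);

    *evicted = -1;
    if (i >= 0) {
        mlru_remove_at(c, i);      /* hit: promote to the MRU head */
        mlru_insert_at(c, 0, page);
        return true;
    }
    if (c->count == NBUFFERS) {    /* miss on a full pool: drop the LRU tail */
        *evicted = c->slots[c->count - 1];
        c->count--;
    }
    mlru_insert_at(c, c->count / 2, page);
    return false;
}
```

With plain LRU, a scan of more than `NBUFFERS` new pages would evict everything, including a page that was just hit; with midpoint insertion, a page promoted to the head survives an arbitrarily long scan. A 2Q-style policy pushes the same idea further by keeping first-time pages in a separate queue.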

Here's a hack-job merge of that preliminary PoC AIO/DIO patch against 13devel. It was designed to keep the buffer manager clean while using AIO, and it is write-only. I'll have to dig through some of my other old Postgres 8.x patches to find the AIO-based prefetching version, which modified aio_req_t to handle read vs. write in FileAIO. Also, this will likely have issues with O_DIRECT, as additional buffer manager alignment is needed and I haven't tracked that down in 13 yet. As my default development machine is a Mac, I have POSIX AIO only. As such, I can't natively play with the O_DIRECT or libaio paths to see if they work without going into Docker or VirtualBox - and I don't care that much right now :)
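For readers who haven't used POSIX AIO, here is a minimal sketch of queuing one write with `aio_write()` and waiting for it with `aio_suspend()`. It is not code from the attached patch; `aio_write_and_wait()` and `aio_demo()` are hypothetical helper names, and it uses ordinary buffered I/O rather than O_DIRECT. With O_DIRECT, the buffer, offset, and length would typically also have to be suitably aligned (for example, a buffer from `posix_memalign()`), which is the kind of alignment issue mentioned above. On glibc older than 2.34, link with `-lrt`.

```c
#define _POSIX_C_SOURCE 200809L
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Queue one asynchronous write of len bytes at offset, then block until it
 * completes. Returns the number of bytes written, or -1 with errno set. */
static ssize_t aio_write_and_wait(int fd, const void *buf, size_t len, off_t offset)
{
    struct aiocb cb;
    const struct aiocb *list[1] = { &cb };
    ssize_t n;
    int err;

    assert(buf != NULL && len > 0);
    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_buf = (volatile void *) buf;          /* only read by a write request */
    cb.aio_nbytes = len;
    cb.aio_offset = offset;
    cb.aio_sigevent.sigev_notify = SIGEV_NONE;   /* no signal; we poll instead */

    if (aio_write(&cb) != 0)
        return -1;

    /* A real caller would do other work here (e.g. fill the next buffer)
     * while the write is in flight; this sketch simply waits. */
    while ((err = aio_error(&cb)) == EINPROGRESS)
        (void) aio_suspend(list, 1, NULL);       /* NULL timeout: sleep until done or interrupted */

    n = aio_return(&cb);                         /* reap the request exactly once */
    if (err != 0) {
        if (err > 0)
            errno = err;
        return -1;
    }
    return n;
}

/* Usage: write one 8 KB block to an anonymous temporary file and read it
 * back. Returns 0 on success, -1 otherwise. */
static int aio_demo(void)
{
    char page[8192];
    char check[sizeof page];
    FILE *f = tmpfile();
    int rc = -1;

    if (f == NULL)
        return -1;
    memset(page, 'x', sizeof page);
    if (aio_write_and_wait(fileno(f), page, sizeof page, 0) == (ssize_t) sizeof page &&
        pread(fileno(f), check, sizeof check, 0) == (ssize_t) sizeof check &&
        memcmp(page, check, sizeof page) == 0)
        rc = 0;
    fclose(f);
    return rc;
}
```

Waiting immediately after submission, as `aio_write_and_wait()` does, gives up the benefit of AIO; the point of the split between `aio_write()` and `aio_suspend()` is that a buffer manager can keep several such requests in flight and reap them later.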

The code is nasty, but maybe it will give someone ideas. If I get some time to work on it, I'll rewrite it properly.

--
Jonah H. Harris

Attachment
