On Tue, Apr 28, 2020 at 02:10:51PM +0200, Andreas Karlsson wrote:
>On 4/28/20 10:43 AM, Benjamin Schaller wrote:
>>for a university project I'm currently doing some research on
>>PostgreSQL. I was wondering if hypothetically it would be possible
>>to implement a raw device system in PostgreSQL. I know that the
>>disadvantages would probably outweigh the advantages compared
>>to working with the file system. Just hypothetically: Would it be
>>possible to change the source code of PostgreSQL so a raw device
>>system could be implemented, or would that cause a chain reaction so
>>that basically one would have to rewrite almost the entire code,
>>because too many elements of PostgreSQL rely on the file system?
>
>It would require quite a bit of work since 1) PostgreSQL stores its
>data in multiple files and 2) PostgreSQL currently supports only
>synchronous buffered IO.
>
Not sure how that's related to raw devices, which is what Benjamin was
asking about. AFAICS most of the changes would be in smgr.c and md.c,
but I might be wrong.

I'd imagine supporting raw devices would require implementing some sort
of custom file system on the device, and I'd expect it to work with
relation segments just fine. So why would that be a problem?
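
To make that a bit more concrete, here is a rough sketch (in Python, purely
for illustration) of how a trivial "file system" on a raw device could map
a relation block to a byte offset. It assumes the default 8kB block size
and 1GB segments; the SegmentKey tuple, the RawDeviceLayout class and the
one-extent-per-segment scheme are all hypothetical and nothing like this
exists in md.c today.

```python
from typing import Dict, NamedTuple

BLCKSZ = 8192            # default PostgreSQL block size, in bytes
RELSEG_SIZE = 131072     # blocks per segment (1GB with 8kB blocks)
SEGMENT_BYTES = BLCKSZ * RELSEG_SIZE


class SegmentKey(NamedTuple):
    """Hypothetical identifier of one relation segment."""
    relfilenode: int
    fork: int
    segno: int


class RawDeviceLayout:
    """Toy allocator: each segment gets one fixed-size extent on the device."""

    def __init__(self, device_bytes: int) -> None:
        self.max_extents = device_bytes // SEGMENT_BYTES
        self.extents: Dict[SegmentKey, int] = {}

    def extent_for(self, key: SegmentKey) -> int:
        """Return the extent index of a segment, allocating it on first use."""
        if key not in self.extents:
            if len(self.extents) >= self.max_extents:
                raise OSError("raw device is full")
            self.extents[key] = len(self.extents)
        return self.extents[key]

    def block_offset(self, relfilenode: int, fork: int, blkno: int) -> int:
        """Map a relation block number to a byte offset on the device."""
        segno, blk_in_seg = divmod(blkno, RELSEG_SIZE)
        extent = self.extent_for(SegmentKey(relfilenode, fork, segno))
        return extent * SEGMENT_BYTES + blk_in_seg * BLCKSZ
```

A real implementation would also have to persist the extent table on the
device and handle truncation and unlinking, which this sketch ignores.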

The synchronous buffered I/O is a bigger challenge, I guess, but then
again - you could continue using synchronous I/O even with raw devices.

>To get the performance benefits from using raw devices I think you
>would want to add support for asynchronous IO to PostgreSQL rather
>than implementing your own layer to emulate the kernel's buffered IO.
>
>Andres Freund did a talk on async IO in PostgreSQL earlier this year.
>It was not recorded but the slides are available.
>
>https://www.postgresql.eu/events/fosdem2020/schedule/session/2959-asynchronous-io-for-postgresql/
>
Yeah, I think the question is what are the expected benefits of using
raw devices. It might be an interesting exercise / experiment, but my
understanding is that most of the benefits can be achieved by using file
systems but with direct I/O and async I/O, which would allow us to
continue reusing the existing filesystem code with much less disruption
to our code base.
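
For illustration, a minimal sketch of the direct I/O part (Python on Linux,
not PostgreSQL code): open a regular file with O_DIRECT and read one 8kB
block into a page-aligned buffer. The helper names are made up, and the
fallback to buffered I/O is my assumption for file systems that reject
O_DIRECT. Async I/O is not shown here.

```python
import mmap
import os

BLCKSZ = 8192  # default PostgreSQL block size, in bytes


def open_direct(path: str) -> int:
    """Open path read-only, bypassing the page cache where O_DIRECT works."""
    flags = os.O_RDONLY
    direct = getattr(os, "O_DIRECT", 0)   # 0 on platforms without O_DIRECT
    try:
        return os.open(path, flags | direct)
    except OSError:
        # Assumption: some file systems reject O_DIRECT, so retry buffered.
        return os.open(path, flags)


def read_block(fd: int, blkno: int) -> bytes:
    """Read one block at a block-aligned offset into a page-aligned buffer."""
    buf = mmap.mmap(-1, BLCKSZ)           # anonymous mapping, page-aligned
    try:
        nread = os.preadv(fd, [buf], blkno * BLCKSZ)
        return buf[:nread]
    finally:
        buf.close()
```

O_DIRECT needs the buffer address, the offset and the length to be aligned
to the device's logical block size, which is why the buffer comes from an
anonymous mmap rather than a plain bytearray.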

regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services