Re: Potential Large Performance Gain in WAL synching - Mailing list pgsql-hackers

From: Giles Lean
Subject: Re: Potential Large Performance Gain in WAL synching
Msg-id: 17858.1033778946@nemeton.com.au
In response to: Re: Potential Large Performance Gain in WAL synching ("Curtis Faith" <curtis@galtair.com>)
List: pgsql-hackers

Curtis Faith writes:

> I'm no Unix filesystem expert but I don't see how the OS can handle
> multiple writes and fsyncs to the same file descriptors without
> blocking other processes from writing at the same time.

Why not?  Other than the necessary synchronisation for attributes such
as file size and modification times, multiple processes can readily
write to different areas of the same file at the "same" time.

fsync() may not return until after the buffers it schedules are
written, but it doesn't have to block subsequent writes to different
buffers in the file either.  (Note too Tom Lane's responses about
when fsync() is used and not used.)
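
A minimal sketch of the point above, assuming a POSIX system: a parent
and a child process each pwrite() a different region of the same file,
and the parent then calls fsync() once. The names fill_region,
concurrent_write_demo and the REGION size are hypothetical, chosen only
for illustration. The sketch shows that both writes land where
expected; it does not measure whether either process was delayed by
the other.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define REGION 4096  /* hypothetical size of each writer's region */

/* Fill one REGION-sized slot of the file with a single byte value.
 * pwrite() takes an explicit offset, so the two writers do not share
 * or disturb a file position.  Returns 0 on success, -1 on error. */
static int fill_region(int fd, int slot, char value)
{
    char buf[REGION];
    memset(buf, value, sizeof buf);
    ssize_t n = pwrite(fd, buf, sizeof buf, (off_t)slot * REGION);
    return n == (ssize_t)sizeof buf ? 0 : -1;
}

/* Fork a child that writes slot 1 while the parent writes slot 0,
 * then fsync() once and read one byte back from each region.
 * Returns 0 if both writes landed where expected, -1 otherwise. */
int concurrent_write_demo(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0)                            /* child: second region */
        _exit(fill_region(fd, 1, 'B') == 0 ? 0 : 1);

    int ok = fill_region(fd, 0, 'A') == 0;   /* parent: first region */

    int status;
    if (waitpid(pid, &status, 0) != pid ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        ok = 0;

    if (fsync(fd) != 0)                      /* flush the file's dirty buffers */
        ok = 0;

    char a, b;
    if (pread(fd, &a, 1, 0) != 1 || pread(fd, &b, 1, REGION) != 1 ||
        a != 'A' || b != 'B')
        ok = 0;

    close(fd);
    unlink(path);
    return ok ? 0 : -1;
}
```

Calling concurrent_write_demo() with a scratch path on a writable
filesystem should return 0. A real test of the concern raised here
would also need to time the writes while another process sits in
fsync().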

> I'll have to write a test and see if there really is a problem.

Please do.  I expect you'll find things aren't as bad as you fear.

In another posting, you write:

> Hmm, I keep hearing that buffer block writes are asynchronous but I don't
> read that in the code at all. There are simple "write" calls with files
> that are not opened with O_NOBLOCK, so they'll be done synchronously. The
> code for this is relatively straightforward (once you get past the
> storage manager abstraction) so I don't see what I might be missing.

There is a confusion of terminology here: a regular write() is
synchronous from the application's point of view only in that the
data is copied into kernel buffers (or pages remapped, or whatever)
before the system call returns.  For files opened with O_DSYNC the
write() also waits for the data to be written to disk.  Thus O_DSYNC
is "synchronous" I/O, but there is no equally handy name for the
regular behaviour, where the kernel flushes the data to disk some
time after write() returns, which is what Unix kernels have done
~forever.
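
To make the two cases concrete, here is a small sketch assuming a
POSIX system that defines O_DSYNC. The helper names write_with_flags,
buffered_write and dsync_write are hypothetical. From the caller's
side the two calls look identical; the only difference is how long
write() takes to return and what has reached the disk by then.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write msg to a freshly created file opened with the given extra
 * open() flags, then remove the file again.
 * Returns the number of bytes written, or -1 on error. */
ssize_t write_with_flags(const char *path, int extra_flags, const char *msg)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | extra_flags, 0600);
    if (fd < 0)
        return -1;
    ssize_t n = write(fd, msg, strlen(msg));
    close(fd);
    unlink(path);
    return n;
}

/* Regular write(): returns once the data is in kernel buffers; the
 * kernel writes it to disk at some later time of its choosing. */
ssize_t buffered_write(const char *path, const char *msg)
{
    return write_with_flags(path, 0, msg);
}

/* O_DSYNC write(): does not return until the data has been written
 * to the underlying storage. */
ssize_t dsync_write(const char *path, const char *msg)
{
    return write_with_flags(path, O_DSYNC, msg);
}
```

Both functions return the same byte count for the same input, so the
difference only shows up in timing or after a crash.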

The asynchronous I/O that you mention ("aio") is a third thing,
different from both regular write() and write() with O_DSYNC. I
understand that with aio the data is not even transferred to the
kernel before the aio_write() call returns, but I've never programmed
with aio and am not 100% sure how it works.
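
For comparison, a hedged sketch of the POSIX aio interface, assuming
<aio.h> is available (on some systems this needs linking with -lrt).
The function name aio_write_demo and the 256-byte buffer are
hypothetical. aio_write() only queues the request, so the buffer must
stay valid and unmodified until the operation completes; the sketch
simply waits for completion with aio_suspend() and collects the
result with aio_return().

```c
#define _XOPEN_SOURCE 700
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Queue one asynchronous write of msg at offset 0, wait for it to
 * finish, and return the number of bytes written (-1 on error). */
ssize_t aio_write_demo(const char *path, const char *msg)
{
    char buf[256];                 /* must outlive the queued request */
    size_t len = strlen(msg);
    if (len > sizeof buf)
        return -1;
    memcpy(buf, msg, len);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    struct aiocb cb;
    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_buf = buf;
    cb.aio_nbytes = len;
    cb.aio_offset = 0;
    cb.aio_sigevent.sigev_notify = SIGEV_NONE;  /* no completion signal */

    ssize_t result = -1;
    if (aio_write(&cb) == 0) {
        /* The request is now queued; the call has already returned. */
        const struct aiocb *const list[1] = { &cb };
        while (aio_error(&cb) == EINPROGRESS)
            aio_suspend(list, 1, NULL);         /* block until it completes */
        if (aio_error(&cb) == 0)
            result = aio_return(&cb);           /* bytes actually written */
    }

    close(fd);
    unlink(path);
    return result;
}
```

A real user of aio would do other work between aio_write() and the
completion check instead of suspending immediately.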

Regards,

Giles



