On Sun, Aug 7, 2022 at 9:22 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Sun, Aug 7, 2022 at 3:19 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> >
> > > A second thing is that pg_pwritev_with_retry_and_write_zeros() is
> > > designed to work on WAL segments initialization and it uses
> > > XLOG_BLCKSZ and PGAlignedXLogBlock for the job, but there is nothing
> > > in its name that tells us so. This makes me question whether
> > > file_utils.c is a good location for this second thing. Could a new
> > > file be a better location? We have a xlogutils.c in the backend, and
> > > a name similar to that in src/common/ would be one possibility.
> >
> > Yeah, I think it should probably be disconnected from XLOG_BLCKSZ, or
> > maybe it's OK to use BLCKSZ with a comment to say that it's a bit
> > arbitrary, or maybe it's better to define a new zero buffer of some
> > arbitrary size just in this code if that is too strange. We could
> > experiment with different size buffers to see how it performs, bearing
> > in mind that every time we double it you halve the number of system
> > calls, but also bearing in mind that at some point it's too much for
> > the stack. I can tell you that the way that code works today was not
> > really written with performance in mind (unlike, say, the code
> > reverted from 9.4 that tried to do this with posix_fallocate()), it
> > was just finding an excuse to call pwritev(), to exercise new fallback
> > code being committed for use by later AIO stuff (more patches coming
> > soon). The retry support was added because it seemed plausible that
> > some system out there would start to do short writes as we cranked up
> > the sizes for some implementation reason other than ENOSPC, so we
> > should make a reusable retry routine.
>
> Yes, doubling the zero-buffer size to, say, 2 * XLOG_BLCKSZ or 2 * BLCKSZ
> halves the number of system calls (right now, pg_pwritev_with_retry()
> gets called 64 times per 16MB WAL file, since it writes in batches of
> 32 blocks per call).
>
> Is there a ready-to-use tool, script, or specific pgbench setup
> (command-line options or GUC settings) that I can play with to
> measure the performance?
I played with a simple insert use-case [1] that generates ~380 WAL
files, trying different block sizes. To my surprise, I have not seen
any improvement with larger block sizes. I may be doing something
wrong here; suggestions on how to test and see the benefits are
welcome.
> > I think this should also handle the remainder after processing whole
> > blocks, just for completeness. If I call the code as presented with size
> > 8193, I think this code will only write 8192 bytes.
>
> Hm, I will fix it.
Fixed.
I'm attaching v5 patch-set. I've addressed review comments received so
far and fixed a compiler warning that CF bot complained about.
Please review it further.
--
Bharath Rupireddy
RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/