Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes? - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
Date
Msg-id CALj2ACUccjR7KbKqWOsQmqH1ZGEDyJ7hH5Ef+DOhcv7+kOnjCQ@mail.gmail.com
Whole thread Raw
In response to Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
List pgsql-hackers
On Mon, Aug 8, 2022 at 6:10 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> I played with a simple insert use-case [1] that generates ~380 WAL
> files, with different block sizes. To my surprise, I have not seen any
> improvement with larger block sizes. I may be doing something wrong
> here, suggestions on to test and see the benefits are welcome.
>
> > > I think this should also handle the remainder after processing whole
> > > blocks, just for completeness.  If I call the code as presented with size
> > > 8193, I think this code will only write 8192 bytes.
> >
> > Hm, I will fix it.
>
> Fixed.
>
> I'm attaching v5 patch-set. I've addressed review comments received so
> far and fixed a compiler warning that CF bot complained about.
>
> Please review it further.

I tried to vary the zero buffer size to see if there's any huge
benefit for the WAL-generating queries. Unfortunately, I didn't see
any benefit on my dev system (16 vcore, 512GB SSD, 32GB RAM) . The use
case I've tried is at [1] and the results are at [2].

Having said that, the use of pg_pwritev_with_retry() in walmethods.c
will definitely reduce number of system calls - on HEAD the
dir_open_for_write() makes pad_to_size/XLOG_BLCKSZ i.e. 16MB/8KB =
2,048 write() calls and with patch it makes only 64
pg_pwritev_with_retry() calls with XLOG_BLCKSZ zero buffer size. The
proposed patches will provide straight 32x reduction in system calls
(for pg_receivewal and pg_basebackup) apart from the safety against
partial writes.

[1]
/* built source code with release flags */
./configure --with-zlib --enable-depend --prefix=$PWD/inst/
--with-openssl --with-readline --with-perl --with-libxml CFLAGS='-O2'
> install.log && make -j 8 install > install.log 2>&1 &

\q
./pg_ctl -D data -l logfile stop
rm -rf data

/* ensured that nothing exists in OS page cache */
free -m
sudo su
sync; echo 3 > /proc/sys/vm/drop_caches
exit
free -m

./initdb -D data
./pg_ctl -D data -l logfile start
./psql -d postgres -c 'ALTER SYSTEM SET max_wal_size = "64GB";'
./psql -d postgres -c 'ALTER SYSTEM SET shared_buffers = "8GB";'
./psql -d postgres -c 'ALTER SYSTEM SET work_mem = "16MB";'
./psql -d postgres -c 'ALTER SYSTEM SET checkpoint_timeout = "1d";'
./pg_ctl -D data -l logfile restart
./psql -d postgres -c 'create table foo(bar int);'
./psql -d postgres
\timing
insert into foo select * from generate_series(1, 100000000);   /* this
query generates about 385 WAL files, no checkpoint hence no recycle of
old WAL files, all new WAL files */

[2]
HEAD
Time: 84249.535 ms (01:24.250)

HEAD with wal_init_zero off
Time: 75086.300 ms (01:15.086)

#define    PWRITEV_BLCKSZ    XLOG_BLCKSZ
Time: 85254.302 ms (01:25.254)

#define    PWRITEV_BLCKSZ    (4 * XLOG_BLCKSZ)
Time: 83542.885 ms (01:23.543)

#define    PWRITEV_BLCKSZ    (16 * XLOG_BLCKSZ)
Time: 84035.770 ms (01:24.036)

#define    PWRITEV_BLCKSZ    (64 * XLOG_BLCKSZ)
Time: 84749.021 ms (01:24.749)

#define    PWRITEV_BLCKSZ    (256 * XLOG_BLCKSZ)
Time: 84273.466 ms (01:24.273)

#define    PWRITEV_BLCKSZ    (512 * XLOG_BLCKSZ)
Time: 84233.576 ms (01:24.234)

--
Bharath Rupireddy
RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/



pgsql-hackers by date:

Previous
From: Peter Smith
Date:
Subject: Re: Data is copied twice when specifying both child and parent table in publication
Next
From: Richard Guo
Date:
Subject: Re: Using each rel as both outer and inner for JOIN_ANTI