Thread: Re: RFC: Lock-free XLog Reservation from WAL

Re: RFC: Lock-free XLog Reservation from WAL

From
Japin Li
Date:
Hi, Zhigou

Thanks for the patch.

On Thu, 02 Jan 2025 at 09:14, "Zhou, Zhiguo" <zhiguo.zhou@intel.com> wrote:
> Hi all,
>
> I am reaching out to solicit your insights and comments on a recent
> proposal regarding the "Lock-free XLog Reservation from WAL." We have
> identified some challenges with the current WAL insertions, which
> require space reservations in the WAL buffer which involve updating
> two shared-memory statuses in XLogCtlInsert: CurrBytePos (the start
> position of the current XLog) and PrevBytePos (the prev-link to the
> previous XLog). Currently, the use of XLogCtlInsert.insertpos_lck
> ensures consistency but introduces lock contention, hindering the
> parallelism of XLog insertions.
>
> To address this issue, we propose the following changes:
>
> 1. Removal of PrevBytePos: This will allow increments of the CurrBytePos (a single uint64 field) to be implemented
withan atomic operation (fetch_add). 
> 2. Updating Prev-Link of next XLog: Based on the fact that the
> prev-link of the next XLog always points to the head of the current
> Xlog,we will slightly exceed the reserved memory range of the current
> XLog to update the prev-link of the next XLog, regardless of which
> backend acquires the next memory space. The next XLog inserter will
> wait until its prev-link is updated for CRC calculation before
> starting its own XLog copy into the WAL.
> 3. Breaking Sequential Write Convention: Each backend will update the
> prev-link of its next XLog first, then return to the header position
> for the current log insertion. This change will reduce the dependency
> of XLog writes on previous ones (compared with the sequential writes).
> 4. Revised GetXLogBuffer: To support #3, we need update this function to separate the LSN it intends to access from
theLSN it expects to update in the insertingAt field. 
> 5. Increase NUM_XLOGINSERT_LOCKS: With the above changes, increasing NUM_XLOGINSERT_LOCKS, for example to 128, could
effectivelyenhance the parallelism. 
>
> The attached patch could pass the regression tests (make check, make
> check-world), and in the performance test of this POC on SPR (480
> vCPU) shows that this optimization could help the TPCC benchmark
> better scale with the core count and as a result the performance with
> full cores enabled could be improved by 2.04x.
>

I tried it and found it cannot pass all test-cases.  Here is the output
comes from Rocky 9.5.

        mkdir build && cd build
        ../configure \
                --prefix=/tmp/pg \
                --enable-tap-tests \
                --enable-debug \
                --enable-cassert \
                --enable-depend \
                --enable-dtrace \
                --with-icu \
                --with-openssl \
                --with-python \
                --with-libxml \
                --with-libxslt \
                --with-lz4 \
                --with-pam \
                CFLAGS='-Wmissing-prototypes -Wincompatible-pointer-types'
        make -j $(nproc) -s && make install -s
        (cd contrib/ && make -j $(nproc) -s && make install -s)
        make check-world

The error as follows:

echo "# +++ tap check in src/test/recovery +++" && rm -rf '/home/japin/postgresql/build/src/test/recovery'/tmp_check &&
/usr/bin/mkdir-p '/home/japin/postgresql/build/src/test/recovery'/tmp_check && cd
/home/japin/postgresql/build/../src/test/recovery&&
TESTLOGDIR='/home/japin/postgresql/build/src/test/recovery/tmp_check/log'
TESTDATADIR='/home/japin/postgresql/build/src/test/recovery/tmp_check'
PATH="/home/japin/postgresql/build/tmp_install/home/japin/postgresql/build/pg/bin:/home/japin/postgresql/build/src/test/recovery:$PATH"
LD_LIBRARY_PATH="/home/japin/postgresql/build/tmp_install/home/japin/postgresql/build/pg/lib"
INITDB_TEMPLATE='/home/japin/postgresql/build'/tmp_install/initdb-template PGPORT='65432'
top_builddir='/home/japin/postgresql/build/src/test/recovery/../../..'
PG_REGRESS='/home/japin/postgresql/build/src/test/recovery/../../../src/test/regress/pg_regress'/usr/bin/prove -I
/home/japin/postgresql/build/../src/test/perl/-I /home/japin/postgresql/build/../src/test/recovery  t/*.pl 
# +++ tap check in src/test/recovery +++
t/001_stream_rep.pl ................... ok
t/002_archiving.pl .................... 1/? Bailout called.  Further testing stopped:  command "pg_ctl -D
/home/japin/postgresql/build/src/test/recovery/tmp_check/t_002_archiving_standby_data/pgdata-l
/home/japin/postgresql/build/src/test/recovery/tmp_check/log/002_archiving_standby.logpromote" exited with value 1 
FAILED--Further testing stopped: command "pg_ctl -D
/home/japin/postgresql/build/src/test/recovery/tmp_check/t_002_archiving_standby_data/pgdata-l
/home/japin/postgresql/build/src/test/recovery/tmp_check/log/002_archiving_standby.logpromote" exited with value 1 
make[2]: *** [Makefile:28: check] Error 255
make[2]: Leaving directory '/home/japin/postgresql/build/src/test/recovery'
make[1]: *** [Makefile:42: check-recovery-recurse] Error 2
make[1]: Leaving directory '/home/japin/postgresql/build/src/test'
make: *** [GNUmakefile:71: check-world-src/test-recurse] Error 2

--
Regrads,
Japin Li



Re: RFC: Lock-free XLog Reservation from WAL

From
"Zhou, Zhiguo"
Date:
Hi Japin,

Thanks so much for your test and review. As you may have noticed, this 
patch has implemented the initial optimization idea and has passed only 
the basic regression tests. We have planned to extend the validation to 
include TAP tests after aligning the expectations with the community, 
but we still haven't had the chance to do so yet.

We will try to address the issues raised and invite you to review the 
new version of the patch once it is completed.

On 1/6/2025 11:32 AM, Japin Li wrote:

> I tried it and found it cannot pass all test-cases.  Here is the output
> comes from Rocky 9.5.
> 
>          mkdir build && cd build
>          ../configure \
>                  --prefix=/tmp/pg \
>                  --enable-tap-tests \
>                  --enable-debug \
>                  --enable-cassert \
>                  --enable-depend \
>                  --enable-dtrace \
>                  --with-icu \
>                  --with-openssl \
>                  --with-python \
>                  --with-libxml \
>                  --with-libxslt \
>                  --with-lz4 \
>                  --with-pam \
>                  CFLAGS='-Wmissing-prototypes -Wincompatible-pointer-types'
>          make -j $(nproc) -s && make install -s
>          (cd contrib/ && make -j $(nproc) -s && make install -s)
>          make check-world
> 
> The error as follows:
> 
> echo "# +++ tap check in src/test/recovery +++" && rm -rf '/home/japin/postgresql/build/src/test/recovery'/tmp_check
&&/usr/bin/mkdir -p '/home/japin/postgresql/build/src/test/recovery'/tmp_check && cd
/home/japin/postgresql/build/../src/test/recovery&&
TESTLOGDIR='/home/japin/postgresql/build/src/test/recovery/tmp_check/log'
TESTDATADIR='/home/japin/postgresql/build/src/test/recovery/tmp_check'
PATH="/home/japin/postgresql/build/tmp_install/home/japin/postgresql/build/pg/bin:/home/japin/postgresql/build/src/test/recovery:$PATH"
LD_LIBRARY_PATH="/home/japin/postgresql/build/tmp_install/home/japin/postgresql/build/pg/lib"
INITDB_TEMPLATE='/home/japin/postgresql/build'/tmp_install/initdb-template PGPORT='65432'
top_builddir='/home/japin/postgresql/build/src/test/recovery/../../..'
PG_REGRESS='/home/japin/postgresql/build/src/test/recovery/../../../src/test/regress/pg_regress'/usr/bin/prove -I
/home/japin/postgresql/build/../src/test/perl/-I /home/japin/postgresql/build/../src/test/recovery  t/*.pl
 
> # +++ tap check in src/test/recovery +++
> t/001_stream_rep.pl ................... ok
> t/002_archiving.pl .................... 1/? Bailout called.  Further testing stopped:  command "pg_ctl -D
/home/japin/postgresql/build/src/test/recovery/tmp_check/t_002_archiving_standby_data/pgdata-l
/home/japin/postgresql/build/src/test/recovery/tmp_check/log/002_archiving_standby.logpromote" exited with value 1
 
> FAILED--Further testing stopped: command "pg_ctl -D
/home/japin/postgresql/build/src/test/recovery/tmp_check/t_002_archiving_standby_data/pgdata-l
/home/japin/postgresql/build/src/test/recovery/tmp_check/log/002_archiving_standby.logpromote" exited with value 1
 
> make[2]: *** [Makefile:28: check] Error 255
> make[2]: Leaving directory '/home/japin/postgresql/build/src/test/recovery'
> make[1]: *** [Makefile:42: check-recovery-recurse] Error 2
> make[1]: Leaving directory '/home/japin/postgresql/build/src/test'
> make: *** [GNUmakefile:71: check-world-src/test-recurse] Error 2
> 

This error report is really helpful for me to diagnose the problem. Thanks!

Regards,
Zhiguo