Hi, Zhigou
Thanks for the patch.
On Thu, 02 Jan 2025 at 09:14, "Zhou, Zhiguo" <zhiguo.zhou@intel.com> wrote:
> Hi all,
>
> I am reaching out to solicit your insights and comments on a recent
> proposal regarding the "Lock-free XLog Reservation from WAL." We have
> identified some challenges with the current WAL insertions, which
> require space reservations in the WAL buffer which involve updating
> two shared-memory statuses in XLogCtlInsert: CurrBytePos (the start
> position of the current XLog) and PrevBytePos (the prev-link to the
> previous XLog). Currently, the use of XLogCtlInsert.insertpos_lck
> ensures consistency but introduces lock contention, hindering the
> parallelism of XLog insertions.
>
> To address this issue, we propose the following changes:
>
> 1. Removal of PrevBytePos: This will allow increments of the CurrBytePos (a single uint64 field) to be implemented
withan atomic operation (fetch_add).
> 2. Updating Prev-Link of next XLog: Based on the fact that the
> prev-link of the next XLog always points to the head of the current
> Xlog,we will slightly exceed the reserved memory range of the current
> XLog to update the prev-link of the next XLog, regardless of which
> backend acquires the next memory space. The next XLog inserter will
> wait until its prev-link is updated for CRC calculation before
> starting its own XLog copy into the WAL.
> 3. Breaking Sequential Write Convention: Each backend will update the
> prev-link of its next XLog first, then return to the header position
> for the current log insertion. This change will reduce the dependency
> of XLog writes on previous ones (compared with the sequential writes).
> 4. Revised GetXLogBuffer: To support #3, we need update this function to separate the LSN it intends to access from
theLSN it expects to update in the insertingAt field.
> 5. Increase NUM_XLOGINSERT_LOCKS: With the above changes, increasing NUM_XLOGINSERT_LOCKS, for example to 128, could
effectivelyenhance the parallelism.
>
> The attached patch could pass the regression tests (make check, make
> check-world), and in the performance test of this POC on SPR (480
> vCPU) shows that this optimization could help the TPCC benchmark
> better scale with the core count and as a result the performance with
> full cores enabled could be improved by 2.04x.
>
I tried it and found it cannot pass all test-cases. Here is the output
comes from Rocky 9.5.
mkdir build && cd build
../configure \
--prefix=/tmp/pg \
--enable-tap-tests \
--enable-debug \
--enable-cassert \
--enable-depend \
--enable-dtrace \
--with-icu \
--with-openssl \
--with-python \
--with-libxml \
--with-libxslt \
--with-lz4 \
--with-pam \
CFLAGS='-Wmissing-prototypes -Wincompatible-pointer-types'
make -j $(nproc) -s && make install -s
(cd contrib/ && make -j $(nproc) -s && make install -s)
make check-world
The error as follows:
echo "# +++ tap check in src/test/recovery +++" && rm -rf '/home/japin/postgresql/build/src/test/recovery'/tmp_check &&
/usr/bin/mkdir-p '/home/japin/postgresql/build/src/test/recovery'/tmp_check && cd
/home/japin/postgresql/build/../src/test/recovery&&
TESTLOGDIR='/home/japin/postgresql/build/src/test/recovery/tmp_check/log'
TESTDATADIR='/home/japin/postgresql/build/src/test/recovery/tmp_check'
PATH="/home/japin/postgresql/build/tmp_install/home/japin/postgresql/build/pg/bin:/home/japin/postgresql/build/src/test/recovery:$PATH"
LD_LIBRARY_PATH="/home/japin/postgresql/build/tmp_install/home/japin/postgresql/build/pg/lib"
INITDB_TEMPLATE='/home/japin/postgresql/build'/tmp_install/initdb-template PGPORT='65432'
top_builddir='/home/japin/postgresql/build/src/test/recovery/../../..'
PG_REGRESS='/home/japin/postgresql/build/src/test/recovery/../../../src/test/regress/pg_regress'/usr/bin/prove -I
/home/japin/postgresql/build/../src/test/perl/-I /home/japin/postgresql/build/../src/test/recovery t/*.pl
# +++ tap check in src/test/recovery +++
t/001_stream_rep.pl ................... ok
t/002_archiving.pl .................... 1/? Bailout called. Further testing stopped: command "pg_ctl -D
/home/japin/postgresql/build/src/test/recovery/tmp_check/t_002_archiving_standby_data/pgdata-l
/home/japin/postgresql/build/src/test/recovery/tmp_check/log/002_archiving_standby.logpromote" exited with value 1
FAILED--Further testing stopped: command "pg_ctl -D
/home/japin/postgresql/build/src/test/recovery/tmp_check/t_002_archiving_standby_data/pgdata-l
/home/japin/postgresql/build/src/test/recovery/tmp_check/log/002_archiving_standby.logpromote" exited with value 1
make[2]: *** [Makefile:28: check] Error 255
make[2]: Leaving directory '/home/japin/postgresql/build/src/test/recovery'
make[1]: *** [Makefile:42: check-recovery-recurse] Error 2
make[1]: Leaving directory '/home/japin/postgresql/build/src/test'
make: *** [GNUmakefile:71: check-world-src/test-recurse] Error 2
--
Regrads,
Japin Li