Re: WAL Insertion Lock Improvements - Mailing list pgsql-hackers
From: Michael Paquier
Subject: Re: WAL Insertion Lock Improvements
Msg-id: ZDOLy4frwhnBisis@paquier.xyz
In response to: Re: WAL Insertion Lock Improvements (Nathan Bossart <nathandbossart@gmail.com>)
Responses: Re: WAL Insertion Lock Improvements
List: pgsql-hackers
On Mon, Feb 20, 2023 at 09:49:48PM -0800, Nathan Bossart wrote:
> I'm marking this as ready-for-committer.  I think a couple of the
> comments could use some small adjustments, but that probably doesn't
> need to hold up this patch.

Apologies.  I was planning to have a thorough look at this patch, but
life got in the way and I have not been able to study what's happening
on this thread this close to the feature freeze.

Anyway, I am attaching two modules I have written for the sake of this
thread while beginning my lookup of the patch:
- lwlock_test.tar.gz, a validation module for LWLocks with variable
  waits.  This module can be loaded with shared_preload_libraries to
  have two LWLocks and two variables in shmem, then have two backends
  play ping-pong with each other's locks.  An isolation test may be
  possible, though I have not thought hard about it.  Just use a SQL
  sequence like the following, with N > 1 (see README):
    Backend 1: SELECT lwlock_test_acquire();
    Backend 2: SELECT lwlock_test_wait(N);
    Backend 1: SELECT lwlock_test_update(N);
    Backend 1: SELECT lwlock_test_release();
- custom_wal.tar.gz, a thin wrapper around LogLogicalMessage() able to
  generate N records of M bytes each in a single SQL call.  This can
  be used to generate records of various sizes for benchmarking, while
  limiting the overhead of individual calls to
  pg_logical_emit_message_bytea().

I have begun gathering numbers with WAL records of various sizes,
using pgbench like this:
  $ cat script.sql
  \set record_size 1
  \set record_number 5000
  SELECT custom_wal(:record_size, :record_number);
  $ pgbench -n -c 500 -t 100 -f script.sql
This keeps most of the overhead of parsing, planning, and most of the
INSERT logic out of the picture.  I have been trying to get some
reproducible numbers, but I think that I am going to need a bigger
machine than what I have been using for the last few days, with up to
400 connections.  It is worth noting that 00d1e02b may influence the
results a bit, so we may want to gather more numbers with that in
place, particularly for INSERTs; one of the tests used upthread relies
on single-row INSERTs.

Another question I had: would it be worth having some tests with
pg_wal/ mounted on a tmpfs, so that I/O is not a bottleneck?  It
should be instructive to get more measurements with a fixed number of
transactions and a rather high number of concurrent connections (1k at
least?), where the contention would be on the variable waits.  My
first impression is that the records should not be too small if one
wants to see the effects of this patch more clearly, either.

Looking at the patch..

LWLockConflictsWithVar() and LWLockReleaseClearVar() are the trivial
bits.  These are OK.

+ * NB: Note the use of pg_atomic_exchange_u64 as opposed to just
+ * pg_atomic_write_u64 to update the value. Since pg_atomic_exchange_u64 is
+ * a full barrier, we're guaranteed that the subsequent shared memory
+ * reads/writes, if any, happen after we reset the lock variable.

This mentions that the subsequent read/write operations are safe, so
this refers to anything happening after the variable is reset.  As
this is a full barrier, should we also mention that it is ordered with
respect to anything that the caller did before clearing the variable?
From this perspective, using pg_atomic_exchange_u64() makes sense to
me in LWLockReleaseClearVar().
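To make sure we are discussing the same thing, here is the pattern I
have in mind, written down as a minimal sketch (simplified from my
reading of the patch, not the code as posted):

    /* sketch, in the shape of src/backend/storage/lmgr/lwlock.c */
    void
    LWLockReleaseClearVar(LWLock *lock, pg_atomic_uint64 *valptr,
                          uint64 val)
    {
        /*
         * pg_atomic_exchange_u64() is a full barrier, unlike a plain
         * pg_atomic_write_u64(): it orders the reset of the variable
         * with respect to the shared memory accesses done both before
         * and after this call.
         */
        (void) pg_atomic_exchange_u64(valptr, val);

        LWLockRelease(lock);
    }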
+ * XXX: Use of a spinlock at the beginning of this function to read
+ * current insert position implies memory ordering.  That means that
+ * the immediate loads and stores to shared memory (for instance,
+ * in LWLockUpdateVar called via LWLockWaitForVar) don't need an
+ * explicit memory barrier as far as the current usage is
+ * concerned.  But that might not be safe in general.
+ */

What's the part where this is not safe?  Based on what I see, this
code path is safe because of the previous spinlock.  This is the same
comment as at the beginning of LWLockConflictsWithVar().  Is that
something that we ought to document at the top of LWLockWaitForVar()
as well?  We have one caller of this function currently, but there may
be more in the future.

- * you're about to write out.
+ * you're about to write out.  Using an atomic variable for insertingAt avoids
+ * taking any explicit lock for reads and writes.

Hmm.  I am not sure that we need this comment at all.

-LWLockUpdateVar(LWLock *lock, uint64 *valptr, uint64 val)
+LWLockUpdateVar(LWLock *lock, pg_atomic_uint64 *valptr, uint64 val)
[...]
    Assert(pg_atomic_read_u32(&lock->state) & LW_VAL_EXCLUSIVE);
-   /* Update the lock's value */
-   *valptr = val;

The sensitive change is in LWLockUpdateVar().  I am not completely
sure I understand this removal, though.  Does that influence the case
where there are waiters?

Another thing I was wondering about: how much does the fast path used
in LWLockUpdateVar() influence the performance numbers?  Am I right to
guess that it accounts for most of the gain seen?  Or could it be that
the removal of the spinlock in
LWLockConflictsWithVar()/LWLockWaitForVar() is the point that has the
highest effect?
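For clarity, the fast path I am referring to is, as I understand the
patch, shaped like the following sketch (simplified, reusing
LW_FLAG_HAS_WAITERS as defined in lwlock.c; not the posted code):

    void
    LWLockUpdateVar(LWLock *lock, pg_atomic_uint64 *valptr, uint64 val)
    {
        Assert(pg_atomic_read_u32(&lock->state) & LW_VAL_EXCLUSIVE);

        /*
         * Update the variable with a full barrier, so that a waiter
         * doing unlocked reads sees the new value before we look at
         * the waiter flag below.
         */
        (void) pg_atomic_exchange_u64(valptr, val);

        /*
         * Fast path: if nobody is waiting, there is no need to take
         * the wait list lock at all.
         */
        if ((pg_atomic_read_u32(&lock->state) & LW_FLAG_HAS_WAITERS) == 0)
            return;

        /* ... slow path: take the wait list lock and wake waiters ... */
    }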
--
Michael