Re: WAL Insertion Lock Improvements - Mailing list pgsql-hackers
From           | Michael Paquier
Subject        | Re: WAL Insertion Lock Improvements
Date           |
Msg-id         | ZFuIViUI6xy6VegI@paquier.xyz
In response to | Re: WAL Insertion Lock Improvements (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Responses      | Re: WAL Insertion Lock Improvements
               | Re: WAL Insertion Lock Improvements
List           | pgsql-hackers
On Mon, May 08, 2023 at 05:57:09PM +0530, Bharath Rupireddy wrote:
> Note that I've used pg_logical_emit_message() for ease of
> understanding about the txns generating various amounts of WAL, but
> the pattern is the same if txns are generating various amounts of WAL
> say with inserts.

Sounds good to me to just rely on that for some comparison numbers.

+        * NB: LWLockConflictsWithVar (which is called from
+        * LWLockWaitForVar) relies on the spinlock used above in this
+        * function and doesn't use a memory barrier.

The patch adds the comment above in WaitXLogInsertionsToFinish()
because lwlock.c on HEAD mentions this:
        /*
         * Test first to see if it the slot is free right now.
         *
         * XXX: the caller uses a spinlock before this, so we don't need a memory
         * barrier here as far as the current usage is concerned.  But that might
         * not be safe in general.
         */

Should we be noisier about that at the top of LWLockWaitForVar()?  We
don't want to add a memory barrier at the beginning of
LWLockConflictsWithVar(), still it strikes me that somebody who aims
at using LWLockWaitForVar() may miss this point, because
LWLockWaitForVar() is the routine published in lwlock.h, not
LWLockConflictsWithVar().  This does not need to be really
complicated; say a note at the top of LWLockWaitForVar() along the
lines of:
"Be careful that LWLockConflictsWithVar() does not include a memory
barrier, hence the caller of this function may want to rely on an
explicit barrier or a spinlock to avoid memory ordering issues."

>> + * NB: Note the use of pg_atomic_exchange_u64 as opposed to just
>> + * pg_atomic_write_u64 to update the value. Since pg_atomic_exchange_u64 is
>> + * a full barrier, we're guaranteed that the subsequent shared memory
>> + * reads/writes, if any, happen after we reset the lock variable.
>>
>> This mentions that the subsequent read/write operations are safe, so
>> this refers to anything happening after the variable is reset.  As
>> a full barrier, should we also mention that this is also ordered with
>> respect to anything that the caller did before clearing the variable?
>> From this perspective using pg_atomic_exchange_u64() makes sense to me
>> in LWLockReleaseClearVar().
>
> Wordsmithed that comment a bit.

- * Set the variable's value before releasing the lock, that prevents race
- * a race condition wherein a new locker acquires the lock, but hasn't yet
- * set the variables value.
[...]
+ * NB: pg_atomic_exchange_u64 is used here as opposed to just
+ * pg_atomic_write_u64 to update the variable. Since pg_atomic_exchange_u64
+ * is a full barrier, we're guaranteed that all loads and stores issued
+ * prior to setting the variable are completed before any loads or stores
+ * issued after setting the variable.

This is the same explanation as in LWLockUpdateVar(), except that we
lose the details explaining why we are doing the update before
releasing the lock.

It took me some time, but I have been able to deploy a big box to see
the effect of this patch at a rather large scale (64 vCPU, 512GB of
memory), with the following test characteristics for HEAD and v6:
- TPS comparison with pgbench and pg_logical_emit_message() (an
  illustrative script is sketched below).
- Record sizes of 16, 64, 256, 1k, 4k and 16k.
- Clients and jobs equal at 4, 16, 64, 256, 512, 1024, 2048, 4096.
- Runs of 3 mins for each of the 48 combinations, meaning 96 runs in
  total.
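For reference, this kind of workload can be generated with a one-line
custom pgbench script along these lines; the file name, message prefix
and exact invocation below are illustrative, not necessarily the ones
used for these runs:

    -- emit_message.sql (hypothetical name): each transaction emits one
    -- transactional logical message of :msg_size bytes, generating WAL
    -- without touching any table.
    SELECT pg_logical_emit_message(true, 'pgbench', repeat('a', :msg_size));

For one combination, say 256 clients and 1k records, this would be run
as "pgbench -n -f emit_message.sql -D msg_size=1024 -c 256 -j 256 -T 180".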
And here are the results I got (numbers are TPS, columns are the
message size in bytes):

 message_size_b    |   16   |   64   |  256   |  1024  | 4096  |  16k
-------------------|--------|--------|--------|--------|-------|-------
 head_4_clients    |   3026 |   2965 |   2846 |   2880 |  2778 |  2412
 head_16_clients   |  12087 |  11287 |  11670 |  11100 |  9003 |  5608
 head_64_clients   |  42995 |  44005 |  43592 |  35437 | 21533 | 11273
 head_256_clients  | 106775 | 109233 | 104201 |  80759 | 42118 | 16254
 head_512_clients  | 153849 | 156950 | 142915 |  99288 | 57714 | 16198
 head_1024_clients | 122102 | 123895 | 114248 | 117317 | 62270 | 16261
 head_2048_clients | 126730 | 115594 | 109671 | 119564 | 62454 | 16298
 head_4096_clients | 111564 | 111697 | 119164 | 123483 | 62430 | 16140
 v6_4_clients      |   2893 |   2917 |   3087 |   2904 |  2786 |  2262
 v6_16_clients     |  12097 |  11387 |  11579 |  11242 |  9228 |  5661
 v6_64_clients     |  45124 |  46533 |  42275 |  36124 | 21696 | 11386
 v6_256_clients    | 121500 | 125732 | 104328 |  78989 | 41949 | 16254
 v6_512_clients    | 164120 | 174743 | 146677 |  98110 | 60228 | 16171
 v6_1024_clients   | 168990 | 180710 | 149894 | 117431 | 62271 | 16259
 v6_2048_clients   | 165426 | 162893 | 146322 | 132476 | 62468 | 16274
 v6_4096_clients   | 161283 | 158732 | 162474 | 135636 | 62461 | 16030

These tests are not showing me any degradation, and they show a
correlation between the record size and the number of clients at which
the TPS begins to differ between HEAD and v6 of the patch.  In short,
the shorter the record, the lower the client count at which the
performance improves; still, even with messages of 16 bytes this
required at least 256~512 clients.  At the end I'm cool with that.
--
Michael