Thread: increase size of pg_commit_ts buffers
I wrote this patch last year in response to a customer issue and I thought I had submitted it here, but evidently I didn't. So here it is.

The short story is: in commit 5364b357fb11 we increased the size of pg_commit (née pg_clog) but we didn't increase the size of pg_commit_ts to match. When commit_ts is in use, this can lead to significant buffer thrashing and thus poor performance.

Since commit_ts entries are larger than pg_commit, my proposed patch uses twice as many buffers. Suffice it to say, once we did this the customer problem went away.

(Andrey Borodin already has a patch to change the behavior for multixact, which is something we should perhaps also do. I now notice that they're also reporting a bug in that thread ... sigh)

-- 
Álvaro Herrera  39°49'30"S 73°17'W
"The problem with the future is that it keeps turning into the present" (Hobbes)
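(A rough back-of-envelope sketch of the imbalance described above, assuming the 128-buffer clog cap introduced by 5364b357fb11, the 16-buffer commit_ts cap, 8 kB SLRU pages, 2 status bits per transaction in pg_xact, and 10 bytes per transaction in pg_commit_ts:

    pg_xact:       128 buffers * 8192 bytes * 4 xacts/byte  ~ 4.2M xacts in cache
    pg_commit_ts:   16 buffers * 8192 bytes / 10 bytes/xact ~  13K xacts in cache

So a workload whose status lookups fit comfortably in the clog buffers can still evict and re-read commit_ts pages constantly.)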
On Fri, Jan 15, 2021 at 07:07:44PM -0300, Alvaro Herrera wrote:
> I wrote this patch last year in response to a customer issue and I
> thought I had submitted it here, but evidently I didn't. So here it is.
>
> The short story is: in commit 5364b357fb11 we increased the size of
> pg_commit (née pg_clog) but we didn't increase the size of pg_commit_ts
> to match. When commit_ts is in use, this can lead to significant buffer
> thrashing and thus poor performance.
>
> Since commit_ts entries are larger than pg_commit, my proposed patch uses
> twice as many buffers.

This is a step in the right direction. With commit_ts entries being forty times as large as pg_xact, it's not self-evident that just twice as many buffers is appropriate. Did you try other numbers? I'm fine with proceeding even if not, but the comment should then admit that the new number was a guess that solved problems for one site.

> --- a/src/backend/access/transam/commit_ts.c
> +++ b/src/backend/access/transam/commit_ts.c
> @@ -530,7 +530,7 @@ pg_xact_commit_timestamp_origin(PG_FUNCTION_ARGS)

The comment right above here is outdated.

> Size
> CommitTsShmemBuffers(void)
> {
> -    return Min(16, Max(4, NBuffers / 1024));
> +    return Min(256, Max(4, NBuffers / 512));
> }
>
> /*
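(The "forty times" figure follows from the per-entry sizes, assuming an 8 kB SLRU page: pg_xact stores 2 bits of status per transaction, so one page covers 8192 * 4 = 32768 transactions, while a commit_ts entry is a TimestampTz plus a RepOriginId, 8 + 2 = 10 bytes, so one page covers 8192 / 10 ~ 819 transactions, roughly a 40x difference per page.)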
> On 16 Jan 2021, at 03:07, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> Andrey Borodin already has a patch to change the behavior for
> multixact, which is something we should perhaps also do. I now notice
> that they're also reporting a bug in that thread ... sigh

In that thread [0] I tried a more intelligent optimisation than just increasing the number of buffers. In fact, though, more memory had a dramatically better effect than the locking tricks.

Maybe let's make all SLRU buffer sizes configurable?

Best regards, Andrey Borodin.

[0] https://www.postgresql.org/message-id/flat/b4911e88-9969-aaba-f6be-ed57bd5fec36%40darold.net#ecfdfc8a40af563a0f8b1211266b6fcc
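(Purely for illustration, a sketch of what the configurable-size idea could look like: commit_ts_buffers here is an invented GUC name, not something that exists at the time of this thread, while Min, Max, NBuffers and Size are the existing backend macros, globals and types:

    /* hypothetical GUC: -1 means "size from shared_buffers", as today */
    int     commit_ts_buffers = -1;

    Size
    CommitTsShmemBuffers(void)
    {
        if (commit_ts_buffers > 0)
            return commit_ts_buffers;               /* explicit setting wins */
        return Min(256, Max(4, NBuffers / 512));    /* otherwise keep the heuristic */
    }

With something like this, sites that know their commit_ts working set could simply override the heuristic.)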
On Mon, Feb 15, 2021 at 11:56 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> > On 16 Jan 2021, at 03:07, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > Andrey Borodin already has a patch to change the behavior for
> > multixact, which is something we should perhaps also do. I now notice
> > that they're also reporting a bug in that thread ... sigh
>
> In that thread [0] I tried a more intelligent optimisation than just
> increasing the number of buffers. In fact, though, more memory had a
> dramatically better effect than the locking tricks.
>
> Maybe let's make all SLRU buffer sizes configurable?

+1

I got interested in the SLRU sizing problem (the lock trick and CV stuff sounds interesting too, but I didn't have time to review that in this cycle). The main problem I'm aware of with it is the linear search, so I tried to fix that. See Andrey's thread for details.
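(For readers who haven't looked at slru.c: the linear search referred to above is the scan over the SLRU slot array performed for every page lookup, so lookup cost grows with the number of buffers. A simplified sketch of the pattern, with an invented helper name and details elided, so illustrative rather than the actual PostgreSQL code:

    /* Find the slot caching "pageno", if any: O(num_slots) on every lookup. */
    static int
    slru_find_slot(SlruShared shared, int pageno)
    {
        for (int slotno = 0; slotno < shared->num_slots; slotno++)
        {
            if (shared->page_number[slotno] == pageno &&
                shared->page_status[slotno] != SLRU_PAGE_EMPTY)
                return slotno;      /* hit */
        }
        return -1;                  /* miss: caller must pick a victim and read the page */
    }

This is why simply raising the buffer count has diminishing returns unless the lookup itself is made cheaper.)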
Hi,

I think this is ready to go.

I was wondering why it merely doubles the number of buffers, as described in earlier messages. That seemed like a very small increase; given how much hardware has grown over the last few years, it would probably fail to help some of the large boxes. But it turns out that's not what the patch does. The change is this:

> -    return Min(16, Max(4, NBuffers / 1024));
> +    return Min(256, Max(4, NBuffers / 512));

So it does two things: (a) it increases the maximum from 16 to 256 buffers (16x), and (b) it doubles the rate at which we get there. Until now we added 1 buffer per 1024 shared buffers, so the maximum was reached at 128MB of shared buffers. The patch lowers the step to 512 and moves the point where the maximum is reached to 1GB.

So this actually increases the number of commit_ts buffers 16x, not 2x. That seems reasonable, I guess. The increase is smaller for systems with less than 1GB of shared buffers, but IMO that's a tiny minority of the production systems busy enough for this patch to make a difference.

The other question is what overhead this change has on workloads that do not have issues with commit_ts buffers (i.e. they use commit_ts, but would be fine with just the 16 buffers). My guess is that it's negligible, based on how simple the SLRU code is and on my previous experiments with SLRU.

So +1 to just get this committed, as it is.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
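(A quick check of those crossover points, assuming the default 8 kB block size: the old formula Min(16, Max(4, NBuffers / 1024)) hits its cap of 16 at NBuffers = 16 * 1024 = 16384 buffers, i.e. 16384 * 8 kB = 128 MB of shared_buffers; the new formula hits its cap of 256 at NBuffers = 256 * 512 = 131072 buffers, i.e. 1 GB.)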
On Fri, 12 Nov 2021 at 17:39, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:

> So +1 to just get this committed, as it is.

This is a real issue affecting Postgres users. Please commit this soon.

-- 
Simon Riggs
http://www.EnterpriseDB.com/
On 2021-Nov-30, Simon Riggs wrote:
> On Fri, 12 Nov 2021 at 17:39, Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>
> > So +1 to just get this committed, as it is.
>
> This is a real issue affecting Postgres users. Please commit this soon.

Uh ouch, I had forgotten that this patch was mine (blush). Thanks for the ping, I pushed it yesterday. I added a comment.

-- 
Álvaro Herrera
PostgreSQL Developer
https://www.EnterpriseDB.com/