Re: Slow standby snapshot - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Slow standby snapshot
Date
Msg-id 20221116000653.shgrvzm2q4sucj56@awork3.anarazel.de
Whole thread Raw
In response to Re: Slow standby snapshot  (Simon Riggs <simon.riggs@enterprisedb.com>)
Responses Re: Slow standby snapshot
List pgsql-hackers
Hi,

On 2022-11-15 23:14:42 +0000, Simon Riggs wrote:
> * COMMITs: xids are removed from the array by performing a binary
> search - this gets more and more expensive as the array gets wider
> * SNAPSHOTs: scanning the array for snapshots becomes more expensive
> as the array gets wider
>
> Hence more frequent compression is effective at reducing the overhead.
> But too frequent compression slows down the startup process, which
> can't then keep up.

> So we're just looking for an optimal frequency of compression for any
> given workload.

What about making the behaviour adaptive based on the amount of wasted effort
during those two operations, rather than just a hardcoded "emptiness" factor?
It's common that basically no snapshots are taken, and constantly compressing
in that case is likely going to be wasted effort.


The heuristic the patch adds to KnownAssignedXidsRemoveTree() seems somewhat
misplaced to me. When called from ProcArrayApplyXidAssignment() we probably
should always compress - it'll only be issued when a substantial amount of
subxids have been assigned, so there'll be a bunch of cleanup work.  It makes
more sense from ExpireTreeKnownAssignedTransactionIds(), since it will very
commonly called for individual xids - but even then, we afaict should take
into account how many xids we've just expired.

I don't think the xids % KAX_COMPRESS_FREQUENCY == 0 filter is a good idea -
if you have a workload with plenty subxids you might end up never compressing
because xids divisible by KAX_COMPRESS_FREQUENCY will end up as a subxid
most/all of the time.


Re cost of processing at COMMITs: We do a fresh binary search for each subxid
right now. There's a lot of locality in the xids that can be expired. Perhaps
we could have a cache for the position of the latest value in
KnownAssignedXidsSearch() and search linearly if the distance from the last
looked up value isn't large?


Greetings,

Andres



pgsql-hackers by date:

Previous
From: Juan José Santamaría Flecha
Date:
Subject: Re: Meson doesn't define HAVE_LOCALE_T for mscv
Next
From: Tom Lane
Date:
Subject: Re: SUBTRANS: Minimizing calls to SubTransSetParent()