Re: Slow standby snapshot - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Slow standby snapshot
Msg-id 20221116004448.om7vtwcklpmnclpw@awork3.anarazel.de
In response to Re: Slow standby snapshot  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Hi,

On 2022-11-15 19:15:15 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2022-11-15 23:14:42 +0000, Simon Riggs wrote:
> >> Hence more frequent compression is effective at reducing the overhead.
> >> But too frequent compression slows down the startup process, which
> >> can't then keep up.
> >> So we're just looking for an optimal frequency of compression for any
> >> given workload.
> 
> > What about making the behaviour adaptive based on the amount of wasted effort
> > during those two operations, rather than just a hardcoded "emptiness" factor?
> 
> Not quite sure how we could do that, given that those things aren't even
> happening in the same process.

I'm not certain what the best approach is, but I don't think the
not-the-same-process part is a blocker.


Approach 1:

We could have an atomic variable in ProcArrayStruct that counts the amount of
wasted effort and have processes update it whenever they've wasted a
meaningful amount of effort.  Something like counting the skipped elements in
KnownAssignedXidsGetAndSetXmin in a function local static variable and
updating the shared counter whenever that reaches



Approach 2:

Perform conditional cleanup in non-startup processes - I think that'd actually
be ok, as long as ProcArrayLock is held exclusively.  We could count the number
of skipped elements in KnownAssignedXidsGetAndSetXmin() in a local variable,
and whenever that gets too high, conditionally acquire ProcArrayLock
exclusively at the end of GetSnapshotData() and compress KAX. Reset the local
variable regardless of whether the lock was acquired, to avoid causing a lot of
contention.

The nice part is that this would work even without the startup process making
progress. The not nice part is that it'd require a bit of code study to figure
out whether it's safe to modify KAX from outside the startup process.



> But yeah, it does feel like the proposed
> approach is only going to be optimal over a small range of conditions.

In particular, it doesn't adapt at all to workloads that don't replay all that
much, but do compute a lot of snapshots.

Greetings,

Andres Freund


