Re: Slow standby snapshot - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Slow standby snapshot
Date
Msg-id 1258157.1668558695@sss.pgh.pa.us
In response to Re: Slow standby snapshot  (Simon Riggs <simon.riggs@enterprisedb.com>)
List pgsql-hackers
Simon Riggs <simon.riggs@enterprisedb.com> writes:
> but that is not related to the main issues:

> * COMMITs: xids are removed from the array by performing a binary
> search - this gets more and more expensive as the array gets wider
> * SNAPSHOTs: scanning the array for snapshots becomes more expensive
> as the array gets wider

Right.  The case complained of in this thread is SNAPSHOT cost,
since that's what KnownAssignedXidsGetAndSetXmin is used for.
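To illustrate why snapshot cost tracks the width of the array rather than the number of live xids, here is a minimal sketch. It is not the PostgreSQL source: xid_window and collect_live_xids are hypothetical names, and the layout (a sorted xid array with a parallel validity flag per slot, bounded by tail and head) is an assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t xid_t;

/*
 * Hypothetical stand-in for the known-assigned-xids array.
 * xids[] is sorted; valid[i] is false once a commit has removed entry i.
 * Removed entries stay in place until the array is compressed.
 */
typedef struct
{
    const xid_t *xids;
    const bool  *valid;
    size_t       tail;   /* first slot that may hold a live xid */
    size_t       head;   /* one past the last slot in use */
} xid_window;

/*
 * Copy the live xids into out[] and return how many were found.
 * *visited reports how many slots the scan had to touch: always
 * head - tail, no matter how few of them are still valid.
 */
static size_t
collect_live_xids(const xid_window *w, xid_t *out, size_t *visited)
{
    size_t n = 0;

    assert(w->tail <= w->head);
    *visited = 0;
    for (size_t i = w->tail; i < w->head; i++)
    {
        (*visited)++;
        if (w->valid[i])
            out[n++] = w->xids[i];
    }
    return n;
}
```

In this sketch, a window holding two live xids among six slots still costs six slot visits per snapshot; compression would shrink head - tail to two.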

> Hence more frequent compression is effective at reducing the overhead.
> But too frequent compression slows down the startup process, which
> can't then keep up.
> So we're just looking for an optimal frequency of compression for any
> given workload.

Hmm.  I wonder if my inability to detect a problem is because the startup
process does keep ahead of the workload on my machine, while it fails
to do so on the OP's machine.  I've only got a 16-CPU machine at hand,
which probably limits the ability of the primary to saturate the standby's
startup process.  If that's accurate, reducing the frequency of
compression attempts could be counterproductive in my workload range.
It would help the startup process when that is the bottleneck --- but
that wasn't what the OP complained of, so I'm not sure it helps him
either.

It seems like maybe what we should do is just drop the "nelements < 4 *
PROCARRAY_MAXPROCS" part of the existing heuristic, which is clearly
dangerous with large max_connections settings, and in any case doesn't
have a clear connection to either the cost of scanning or the cost
of compressing.  Or we could replace that with a hardwired constant,
like "nelements < 400".
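The alternatives above can be compared with a small sketch. This is an illustration under assumptions, not the actual procarray.c code: should_compress and its parameters are hypothetical names, and the second test (skip compression unless fewer than half of the slots are still valid) is assumed to be the remaining part of the heuristic. The min_elements parameter stands in for the disputed term: pass 4 * PROCARRAY_MAXPROCS for the existing behavior, 0 to drop the term, or 400 for the hardwired constant.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical sketch of the non-forced compression decision.
 *
 * tail, head    bounds of the in-use part of the array
 * nvalid        number of entries between tail and head still valid
 * min_elements  lower bound on array width before compressing at all
 */
static bool
should_compress(int tail, int head, int nvalid, int min_elements)
{
    int nelements = head - tail;

    assert(tail <= head);

    /* The term under discussion: too narrow to be worth compressing. */
    if (nelements < min_elements)
        return false;

    /* Assumed remaining term: at least half the slots are still live. */
    if (nelements < 2 * nvalid)
        return false;

    return true;
}
```

For example, with a hypothetical PROCARRAY_MAXPROCS of 5000, the existing threshold is 20000 slots, so an array 10000 slots wide with only 100 live entries is never compressed in this sketch, while either min_elements = 0 or min_elements = 400 would compress it.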

            regards, tom lane


