Re: Slow standby snapshot - Mailing list pgsql-hackers

From: Simon Riggs
Subject: Re: Slow standby snapshot
Msg-id: CANbhV-HJ1vUh_tMO9ub8AZMjE9ekj2hEdTFsQO3=3A9qZCUpuQ@mail.gmail.com
In response to: Re: Slow standby snapshot (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers

On Tue, 22 Nov 2022 at 16:28, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Simon Riggs <simon.riggs@enterprisedb.com> writes:
> > We seem to have replaced one magic constant with another, so not sure
> > if this is autotuning, but I like it much better than what we had
> > before (i.e. better than my prev patch).
>
> Yeah, the magic constant is still magic, even if it looks like it's
> not terribly sensitive to the exact value.
>
> > 1. I was surprised that you removed the limits on size and just had
> > the wasted work limit. If there is no read traffic that will mean we
> > hardly ever compress, which means the removal of xids at commit will
> > get slower over time.  I would prefer that we forced compression on a
> > regular basis, such as every time we process an XLOG_RUNNING_XACTS
> > message (every 15s), as well as when we hit certain size limits.
>
> > 2. If there is lots of read traffic but no changes flowing, it would
> > also make sense to force compression when the startup process goes
> > idle rather than wait for the work to be wasted first.
>
> If we do those things, do we need a wasted-work counter at all?
>
> I still suspect that 90% of the problem is the max_connections
> dependency in the existing heuristic, because of the fact that
> you have to push max_connections to the moon before it becomes
> a measurable problem.  If we do
>
> -        if (nelements < 4 * PROCARRAY_MAXPROCS ||
> -            nelements < 2 * pArray->numKnownAssignedXids)
> +        if (nelements < 2 * pArray->numKnownAssignedXids)
>
> and then add the forced compressions you suggest, where
> does that put us?

The forced compressions I propose would happen
* when the startup process goes idle - since we have time to do it
then, which happens often because most workloads are bursty
* every 15s, whenever we process an XLOG_RUNNING_XACTS message - since
we already hold the lock at that point
which is overall much less often than every 64 commits, as benchmarked
by Michail.

I didn't mean to imply that this supersedes the wasted-work approach;
it was meant to be in addition to it.
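
Roughly what I have in mind for the forced path, as an untested sketch
(the bool parameter and the exact call sites are illustrative, not a
worked patch):

    static void
    KnownAssignedXidsCompress(bool force)
    {
        ProcArrayStruct *pArray = procArray;
        int         nelements = pArray->headKnownAssignedXids -
                                pArray->tailKnownAssignedXids;

        /* skip the size heuristic entirely when compression is forced */
        if (!force &&
            nelements < 2 * pArray->numKnownAssignedXids)
            return;

        /* ... existing compression loop ... */
    }

    /* e.g. once per XLOG_RUNNING_XACTS record, in
     * ProcArrayApplyRecoveryInfo(): */
    KnownAssignedXidsCompress(true);

    /* and from the startup process when it runs out of WAL to apply
     * (exact call site to be decided): */
    KnownAssignedXidsCompress(true);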

The wasted-work counter responds well to heavy read-only traffic and
also avoids pointless compressions under write-heavy workloads, so I
still like it best.
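
The counting itself is essentially free. A rough sketch of the idea as
I read Michail's patch (the field and constant names here are mine,
for illustration):

    /* In KnownAssignedXidsGetAndSetXmin(), while scanning the array: */
    for (i = tail; i < head; i++)
    {
        if (!KnownAssignedXidsValid[i])
        {
            wasted++;           /* an invalid slot scanned for nothing */
            continue;
        }
        /* ... copy the valid xid into the snapshot ... */
    }
    pArray->kaxWastedWork += wasted;    /* racy, but fine for a heuristic */

    /* In KnownAssignedXidsCompress(), alongside the other triggers: */
    if (pArray->kaxWastedWork >= KAX_WASTED_WORK_LIMIT)
    {
        force = true;
        pArray->kaxWastedWork = 0;
    }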

> Also, if we add more forced compressions, it seems like we should have
> a short-circuit for a forced compression where there's nothing to do.
> So more or less like
>
>     nelements = head - tail;
>     if (!force)
>     {
>         if (nelements < 2 * pArray->numKnownAssignedXids)
>             return;
>     }
>     else
>     {
>         if (nelements == pArray->numKnownAssignedXids)
>             return;
>     }

+1

> I'm also wondering why there's not an
>
>     Assert(compress_index == pArray->numKnownAssignedXids);
>
> after the loop, to make sure our numKnownAssignedXids tracking
> is sane.

+1
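
From memory of the loop in KnownAssignedXidsCompress(), the Assert
would land at the end, something like this (sketch):

        compress_index = 0;
        for (i = tail; i < head; i++)
        {
            if (KnownAssignedXidsValid[i])
            {
                KnownAssignedXids[compress_index] = KnownAssignedXids[i];
                KnownAssignedXidsValid[compress_index] = true;
                compress_index++;
            }
        }

        /* sanity-check our tracking of the number of valid entries */
        Assert(compress_index == pArray->numKnownAssignedXids);

        pArray->tailKnownAssignedXids = 0;
        pArray->headKnownAssignedXids = compress_index;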

-- 
Simon Riggs                http://www.EnterpriseDB.com/


