Re: Smoothing the subtrans performance catastrophe - Mailing list pgsql-hackers

From Dilip Kumar
Subject Re: Smoothing the subtrans performance catastrophe
Msg-id CAFiTN-s2Jh2jD1khT6suO1Yszcn0rbSggGOCd9cGQ9cnBDencQ@mail.gmail.com
In response to Smoothing the subtrans performance catastrophe  (Simon Riggs <simon.riggs@enterprisedb.com>)
List pgsql-hackers
On Mon, Aug 1, 2022 at 10:13 PM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:
>
> "A mathematical catastrophe is a point in a model of an input-output
> system, where a vanishingly small change in the input can produce a
> large change in the output."
>
> We have just such a change in Postgres: when a snapshot overflows. In
> this case it takes only one subxid over the subxid cache limit to slow
> down every request in XidInMVCCSnapshot(), which becomes painful when
> a long running transaction exists at the same time. This situation has
> been noted by various bloggers, but is illustrated clearly in the
> attached diagram, generated by test results from Julien Tachoires.
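A minimal C sketch of that cliff, under a simplified model: each backend caches at most 64 subxids (the value of PostgreSQL's `PGPROC_MAX_CACHED_SUBXIDS`). Any further subxid is recorded only in subtrans and marks that backend's cache as overflowed, and a snapshot taken while any backend is overflowed is itself treated as overflowed. The struct and function names are hypothetical; this is not PostgreSQL's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Same value as PostgreSQL's PGPROC_MAX_CACHED_SUBXIDS; the rest is a toy model. */
#define SKETCH_MAX_CACHED_SUBXIDS 64

/* Hypothetical per-backend subxid cache. */
typedef struct
{
    TransactionId subxids[SKETCH_MAX_CACHED_SUBXIDS];
    int count;
    bool overflowed;   /* true once a subxid could not be cached */
} SketchSubxidCache;

/* Record a newly assigned subxid; past the limit it lives only in subtrans. */
static void sketch_assign_subxid(SketchSubxidCache *cache, TransactionId xid)
{
    assert(cache->count <= SKETCH_MAX_CACHED_SUBXIDS);
    if (cache->count < SKETCH_MAX_CACHED_SUBXIDS)
        cache->subxids[cache->count++] = xid;
    else
        cache->overflowed = true;
}

/* Pre-patch rule: the snapshot overflows if any backend's cache has overflowed. */
static bool sketch_snapshot_overflowed(const SketchSubxidCache *caches, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (caches[i].overflowed)
            return true;
    return false;
}
```

In this model the 64th subxid still fits in the cache; the 65th flips the flag, and with it the path taken by every visibility check on snapshots that see this backend.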
>
> The reason for the slowdown is clear: when we overflow we check every
> xid against subtrans, producing a large stream of lookups. Some
> previous hackers have tried to speed up subtrans; this patch takes a
> different approach: remove as many subtrans lookups as possible. (So it
> is not competing with those other solutions.)
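For reference, a minimal C sketch of the pre-patch overflowed path described above, modelled loosely on the shape of `XidInMVCCSnapshot()`: once the snapshot is suboverflowed, every xid between xmin and xmax is first mapped to its top-level parent through subtrans, before the snapshot's arrays are consulted. Subtrans is modelled as a hypothetical in-memory parent array with a lookup counter, xid wraparound is ignored, and none of the names are PostgreSQL's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;   /* plain comparisons: wraparound is ignored */

#define SKETCH_MAX_XID 64

/* Hypothetical stand-in for pg_subtrans: parent xid of each xid, 0 = top-level. */
static TransactionId subtrans_parent[SKETCH_MAX_XID];
static unsigned long subtrans_lookups;   /* one per simulated subtrans read */

/* Hypothetical, simplified snapshot (not PostgreSQL's SnapshotData). */
typedef struct
{
    TransactionId xmin;             /* xids below this are finished */
    TransactionId xmax;             /* xids at or above this count as running */
    const TransactionId *xip;       /* running top-level xids */
    size_t xcnt;
    const TransactionId *subxip;    /* cached running subxids */
    size_t subxcnt;
    bool suboverflowed;             /* some backend's subxid cache overflowed */
} SketchSnapshot;

static bool sketch_contains(TransactionId xid, const TransactionId *arr, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (arr[i] == xid)
            return true;
    return false;
}

/* Follow parent links to the top-level xid, one subtrans read per step. */
static TransactionId sketch_topmost(TransactionId xid)
{
    for (;;)
    {
        assert(xid < SKETCH_MAX_XID);
        subtrans_lookups++;
        TransactionId parent = subtrans_parent[xid];
        if (parent == 0)
            return xid;
        xid = parent;
    }
}

/* Pre-patch shape: true if xid is still in progress as far as snap is concerned. */
static bool sketch_xid_in_snapshot_old(TransactionId xid, const SketchSnapshot *snap)
{
    if (xid < snap->xmin)
        return false;
    if (xid >= snap->xmax)
        return true;

    if (!snap->suboverflowed)
    {
        if (sketch_contains(xid, snap->subxip, snap->subxcnt))
            return true;
    }
    else
    {
        /* Overflowed: subtrans is read for every xid in [xmin, xmax). */
        xid = sketch_topmost(xid);
        if (xid < snap->xmin)
            return false;
    }
    return sketch_contains(xid, snap->xip, snap->xcnt);
}
```

Note that in this model even an xid listed directly in `xip` costs a subtrans read once the snapshot is overflowed, which is the "large stream of lookups" the message refers to.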
>
> Attached patch improves on the situation, as also shown in the attached diagram.
>
> The patch does these things:
>
> 1. Rework XidInMVCCSnapshot() so that it always checks the snapshot
> first, before attempting to look up subtrans. A related change means
> that we always keep full subxid info in the snapshot, even if one of
> the backends has overflowed.
>
> 2. Use binary search for standby snapshots, since the snapshot subxip
> is in sorted order.
>
> 3. Rework GetTopmostTransaction so that it a) checks xmin as it goes,
> b) only does one iteration on standby snapshots, both of which save
> subtrans lookups in appropriate cases.
> (This was newly added in v6)
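A minimal C sketch of the three changes above, under a simplified model (hypothetical names, an in-memory parent array standing in for subtrans, no xid wraparound). The snapshot's own arrays are searched first; `subxip` is assumed sorted (the message states this for standby snapshots) so it can be binary-searched with the standard `bsearch()`; and the walk to the top-level xid stops as soon as an ancestor precedes xmin. That early exit relies on the assumption that a parent's xid is always lower than its children's. The "one iteration on standby snapshots" part of item 3 is not modelled. This illustrates the idea; it is not the patch itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t TransactionId;   /* plain comparisons: wraparound is ignored */

#define SKETCH_MAX_XID 64

/* Hypothetical stand-in for pg_subtrans: parent xid of each xid, 0 = top-level. */
static TransactionId subtrans_parent[SKETCH_MAX_XID];
static unsigned long subtrans_lookups;   /* one per simulated subtrans read */

/* Hypothetical, simplified snapshot (not PostgreSQL's SnapshotData). */
typedef struct
{
    TransactionId xmin;
    TransactionId xmax;
    const TransactionId *xip;       /* running top-level xids, unsorted */
    size_t xcnt;
    const TransactionId *subxip;    /* cached subxids, kept even if overflowed; sorted */
    size_t subxcnt;
    bool suboverflowed;             /* some subxids are missing from subxip */
} SketchSnapshot;

static int sketch_cmp_xid(const void *a, const void *b)
{
    TransactionId x = *(const TransactionId *) a;
    TransactionId y = *(const TransactionId *) b;
    return (x > y) - (x < y);
}

static bool sketch_linear_search(TransactionId xid, const TransactionId *arr, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (arr[i] == xid)
            return true;
    return false;
}

/* Change 2: binary search, valid only because subxip is sorted. */
static bool sketch_binary_search(TransactionId xid, const TransactionId *arr, size_t n)
{
    return n > 0 &&
        bsearch(&xid, arr, n, sizeof(TransactionId), sketch_cmp_xid) != NULL;
}

/* Change 3a: walk to the top-level xid, but give up (return 0) as soon as an
 * ancestor precedes xmin, since the top-level xid can only be lower still. */
static TransactionId sketch_topmost_above_xmin(TransactionId xid, TransactionId xmin)
{
    for (;;)
    {
        assert(xid < SKETCH_MAX_XID);
        subtrans_lookups++;
        TransactionId parent = subtrans_parent[xid];
        if (parent == 0)
            return xid;
        if (parent < xmin)
            return 0;
        xid = parent;
    }
}

static bool sketch_xid_in_snapshot_new(TransactionId xid, const SketchSnapshot *snap)
{
    if (xid < snap->xmin)
        return false;
    if (xid >= snap->xmax)
        return true;

    /* Change 1: check the snapshot itself before touching subtrans. */
    if (sketch_linear_search(xid, snap->xip, snap->xcnt) ||
        sketch_binary_search(xid, snap->subxip, snap->subxcnt))
        return true;
    if (!snap->suboverflowed)
        return false;   /* arrays are complete, so xid is not running */

    /* Only an xid missing from an overflowed snapshot pays for subtrans. */
    TransactionId top = sketch_topmost_above_xmin(xid, snap->xmin);
    return top != 0 && sketch_linear_search(top, snap->xip, snap->xcnt);
}
```

The design point is the ordering: compared with consulting subtrans first, xids that the snapshot already lists never reach subtrans, so one overflowed backend only costs lookups for its own uncached subxids.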
>
> Now, is this a panacea? Not at all. What this patch does is smooth out
> the catastrophic effect so that a few overflowed subxids don't spoil
> everybody else's performance, but eventually, if many or all sessions
> have overflowed their subxid caches, then performance will degrade
> as before, albeit that the attached patch has some additional
> optimizations (2, 3 above). So what this gives is a better flight
> envelope in case of a small number of occasional overflows.
>
> Please review. Thank you.

+1.
I had a quick look at the patch to understand the idea, and I think
it looks really promising.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


