Thread: Smoothing the subtrans performance catastrophe
"A mathematical catastrophe is a point in a model of an input-output system, where a vanishingly small change in the input can produce a large change in the output." We have just such a change in Postgres: when a snapshot overflows. In this case it takes only one subxid over the subxid cache limit to slow down every request in XidInMVCCSnapshot(), which becomes painful when a long running transaction exists at the same time. This situation has been noted by various bloggers, but is illustrated clearly in the attached diagram, generated by test results from Julien Tachoires. The reason for the slowdown is clear: when we overflow we check every xid against subtrans, producing a large stream of lookups. Some previous hackers have tried to speed up subtrans - this patch takes a different approach: remove as many subtrans lookups as possible. (So is not competing with those other solutions). Attached patch improves on the situation, as also shown in the attached diagram. The patch does these things: 1. Rework XidInMVCCSnapshot() so that it always checks the snapshot first, before attempting to lookup subtrans. A related change means that we always keep full subxid info in the snapshot, even if one of the backends has overflowed. 2. Use binary search for standby snapshots, since the snapshot subxip is in sorted order. 3. Rework GetTopmostTransaction so that it a) checks xmin as it goes, b) only does one iteration on standby snapshots, both of which save subtrans lookups in appropriate cases. (This was newly added in v6) Now, is this a panacea? Not at all. What this patch does is smooth out the catastrophic effect so that a few overflowed subxids don't spoil everybody else's performance, but eventually, if many or all sessions have their overflowed subxid caches then the performance will descend as before, albeit that the attached patch has some additional optimizations (2, 3 above). So what this gives is a better flight envelope in case of a small number of occasional overflows. Please review. Thank you. -- Simon Riggs http://www.EnterpriseDB.com/
On Mon, Aug 1, 2022 at 10:13 PM Simon Riggs <simon.riggs@enterprisedb.com> wrote:
> Now, is this a panacea? Not at all. What this patch does is smooth out
> the catastrophic effect, so that a few overflowed subxids don't spoil
> everybody else's performance. [...]
>
> Please review. Thank you.

+1. I had a quick look into the patch to understand the idea, and it looks really promising to me.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Hi,

On 2022-08-01 17:42:49 +0100, Simon Riggs wrote:
> The reason for the slowdown is clear: when we overflow, we check every
> xid against subtrans, producing a large stream of lookups. Some
> previous hackers have tried to speed up subtrans; this patch takes a
> different approach and instead removes as many subtrans lookups as
> possible, so it is not competing with those other solutions.
>
> The attached patch improves on the situation, as also shown in the attached diagram.

I think we should consider redesigning subtrans more substantially - even with the changes you propose here, there are still plenty of ways to hit really bad performance. And there's only so much we can do about that without more fundamental design changes.

One way to fix a lot of the issues around pg_subtrans would be to remove the pg_subtrans SLRU and replace it with a purely in-memory hashtable. IMO there's really no good reason to use an SLRU for it (anymore).

In contrast to e.g. clog or multixact, we don't need to access a lot of old entries, and we don't need persistence etc. Nor is it a good use of memory and IO to have loads of pg_subtrans pages that don't point anywhere, because the xid is just a "normal" xid.

While we can't put a useful hard cap on the number of potential subtrans entries (we can only throw subxid->parent mappings away once no existing snapshot might need them), saying that there can't be more subxids "considered running" at a time than can fit in memory doesn't seem like a particularly problematic restriction.

So, why don't we use a dshash table with some amount of statically allocated memory for the mapping? In common cases that will *reduce* memory usage (because we don't need to reserve space for [as many] subxids in snapshots / procarray anymore) and IO (no mostly-zeroes pg_subtrans).

Greetings,

Andres Freund
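To make the dshash suggestion concrete, here is a rough sketch under those assumptions - the names SubxidMapEntry, subxid_map, and LWTRANCHE_SUBXID_MAP are invented for illustration, and creation of the backing dsa_area is omitted:

    #include "postgres.h"
    #include "lib/dshash.h"

    /* One mapping per subxid; dshash expects the key at the entry's start */
    typedef struct SubxidMapEntry
    {
        TransactionId xid;      /* subxid: the hash key */
        TransactionId parent;   /* immediate parent xid */
    } SubxidMapEntry;

    static const dshash_parameters subxid_map_params = {
        sizeof(TransactionId),  /* key size */
        sizeof(SubxidMapEntry), /* entry size */
        dshash_memcmp,
        dshash_memhash,
        LWTRANCHE_SUBXID_MAP    /* hypothetical lwlock tranche */
    };

    static dshash_table *subxid_map;    /* built at startup via dshash_create() */

    /* What SubTransSetParent() might become: no SLRU page, just an insert */
    static void
    SubxidMapSetParent(TransactionId xid, TransactionId parent)
    {
        bool        found;
        SubxidMapEntry *entry;

        entry = dshash_find_or_insert(subxid_map, &xid, &found);
        if (!found)
            entry->parent = parent;
        dshash_release_lock(subxid_map, entry);
    }

    /* What SubTransGetParent() might become */
    static TransactionId
    SubxidMapGetParent(TransactionId xid)
    {
        TransactionId parent = InvalidTransactionId;
        SubxidMapEntry *entry = dshash_find(subxid_map, &xid, false);

        if (entry != NULL)
        {
            parent = entry->parent;
            dshash_release_lock(subxid_map, entry);
        }
        return parent;
    }

The open question is the one noted above: an entry can only be reclaimed once no snapshot can still consider its subxid running, so the table needs an eviction rule tied to the oldest xmin, and a policy for when it fills up.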
On Wed, Aug 3, 2022 at 3:18 PM Andres Freund <andres@anarazel.de> wrote:
> While we can't put a useful hard cap on the number of potential subtrans
> entries (we can only throw subxid->parent mappings away once no existing
> snapshot might need them), saying that there can't be more subxids "considered
> running" at a time than can fit in memory doesn't seem like a particularly
> problematic restriction.

That sounds really problematic to me, unless I misunderstand what you're proposing. Say I have a plpgsql function containing a FOR loop, which in turn contains an EXCEPTION block, which in turn does DML. Right now, that loop could iterate millions of times and everything would still work. Sure, there might be performance impacts depending on what else is happening on the system, but it might also be totally fine. IIUC, you'd like to make that case fail outright. I think that's a non-starter.

I don't know whether Simon's ideas here are amazingly good, utterly terrible, or something in between, but I think we can evaluate the patch actually submitted rather than propose a complete redesign of the entire mechanism - especially one that seems like it would break stuff that currently works.

--
Robert Haas
EDB: http://www.enterprisedb.com
Hi,

On 2022-08-03 15:36:40 -0400, Robert Haas wrote:
> That sounds really problematic to me, unless I misunderstand what
> you're proposing. Say I have a plpgsql function containing a FOR loop,
> which in turn contains an EXCEPTION block, which in turn does DML.
> Right now, that loop could iterate millions of times and everything
> would still work. [...] IIUC, you'd like to make that case fail
> outright. I think that's a non-starter.

I don't think this scenario would fundamentally change - we already keep the set of subxids in backend local memory (e.g. either a dedicated TransactionStateData or an element in ->childXids) and in the locking table (via XactLockTableInsert()).

The problematic part isn't keeping "actually" running subxids in memory, but keeping subxids that might be "considered running" in memory (i.e. subxids that are considered running by an old snapshot in another backend).

A hashtable just containing the child->parent mapping for subxids doesn't actually need that much memory. It'd be approximately (2 * 4 bytes) * subxids * (2 - fillfactor) or such? So maybe ~10MB for 1 million subxids. Allocating that on demand doesn't strike me as prohibitive.

> I don't know whether Simon's ideas here are amazingly good, utterly
> terrible, or something in between, but I think we can evaluate the
> patch actually submitted rather than propose a complete redesign of
> the entire mechanism - especially one that seems like it would break
> stuff that currently works.

We've had quite a few patches that try to address issues around subxids, but only ever improve things on the margins. I'm doubtful that's a useful use of time.

Greetings,

Andres Freund
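For reference, the estimate works out as follows, assuming 4-byte xids and a hash load factor of around 0.75 (so the "(2 - fillfactor)" slack factor is 1.25):

    entry = child xid + parent xid = 2 * 4 bytes = 8 bytes
    1,000,000 subxids * 8 bytes * 1.25 ≈ 10 MB

Scaled to the ~2 billion xids Robert raises next, the same arithmetic gives roughly 20 GB.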
On Wed, 3 Aug 2022 at 20:18, Andres Freund <andres@anarazel.de> wrote:
> I think we should consider redesigning subtrans more substantially - even
> with the changes you propose here, there are still plenty of ways to hit
> really bad performance. And there's only so much we can do about that
> without more fundamental design changes.

I completely agree - you will be glad to hear that I've been working on a redesign of the subtrans module. But we should be clear that redesigning subtrans has nothing to do with this patch; they are separate ideas, and this patch relates to XidInMVCCSnapshot(), an important caller of subtrans. I will post my patch, when complete, in a different thread.

> One way to fix a lot of the issues around pg_subtrans would be to remove
> the pg_subtrans SLRU and replace it with a purely in-memory hashtable. IMO
> there's really no good reason to use an SLRU for it (anymore).
>
> In contrast to e.g. clog or multixact, we don't need to access a lot of old
> entries, and we don't need persistence etc. Nor is it a good use of memory
> and IO to have loads of pg_subtrans pages that don't point anywhere,
> because the xid is just a "normal" xid.
>
> While we can't put a useful hard cap on the number of potential subtrans
> entries (we can only throw subxid->parent mappings away once no existing
> snapshot might need them), saying that there can't be more subxids
> "considered running" at a time than can fit in memory doesn't seem like a
> particularly problematic restriction.

I do agree that sometimes it is easier to impose restrictions than to try to provide unbounded resources. Having said that, I can't see an easy way of making that work well in practice for this case. Making write transactions suddenly stop working at some point doesn't sound like it would be good for availability, especially when it happens sporadically and unpredictably, as it would whenever long-running transactions appear alongside users of subtransactions.

> So, why don't we use a dshash table with some amount of statically
> allocated memory for the mapping? In common cases that will *reduce*
> memory usage (because we don't need to reserve space for [as many] subxids
> in snapshots / procarray anymore) and IO (no mostly-zeroes pg_subtrans).

I considered this and have ruled it out, but as I said above, we can discuss that on a different thread.

--
Simon Riggs
http://www.EnterpriseDB.com/
On Wed, Aug 3, 2022 at 4:14 PM Andres Freund <andres@anarazel.de> wrote:
> I don't think this scenario would fundamentally change - we already keep the
> set of subxids in backend local memory (e.g. either a dedicated
> TransactionStateData or an element in ->childXids) and in the locking table
> (via XactLockTableInsert()).

Sure....

> The problematic part isn't keeping "actually" running subxids in memory, but
> keeping subxids that might be "considered running" in memory (i.e. subxids
> that are considered running by an old snapshot in another backend).
>
> A hashtable just containing the child->parent mapping for subxids doesn't
> actually need that much memory. It'd be approximately (2 * 4 bytes) *
> subxids * (2 - fillfactor) or such? So maybe ~10MB for 1 million subxids.
> Allocating that on demand doesn't strike me as prohibitive.

I mean, the worst case is ~2 billion, no?

--
Robert Haas
EDB: http://www.enterprisedb.com
On Thu, 4 Aug 2022 at 13:11, Simon Riggs <simon.riggs@enterprisedb.com> wrote:
> I will post my patch, when complete, in a different thread.

To avoid confusion, I will withdraw this patch from the CF, in favour of my other patch on a similar topic, "Minimizing calls to SubTransSetParent()": https://commitfest.postgresql.org/39/3806/

--
Simon Riggs
http://www.EnterpriseDB.com/
Hello Simon,

Can you please provide the test cases that you used to plot the performance graph you've attached? Also, do you know whether your optimization will be useful for a very large number of subtransactions per transaction (around 1000)?

--
Thanks,
Yaroslav