Re: should there be a hard-limit on the number of transactions pending undo? - Mailing list pgsql-hackers

From Andres Freund
Subject Re: should there be a hard-limit on the number of transactions pending undo?
Date
Msg-id 20190719180426.lrujuzbhkzhgg3ve@alap3.anarazel.de
In response to should there be a hard-limit on the number of transactions pending undo?  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: should there be a hard-limit on the number of transactions pending undo?
List pgsql-hackers
Hi,

On 2019-07-19 13:28:14 -0400, Robert Haas wrote:
> I want to consider three specific scenarios that could cause undo
> application to fail, and then offer some observations about them.
> 
> Scenario #1:
> 
> 1. Sessions 1..N each begin a transaction and write a bunch of data to
> a table (at least enough that we'll try to perform undo in the
> background).
> 2. Session N+1 begins a transaction and tries to lock the same table.
> It blocks.
> 3. Sessions 1..N abort, successfully pushing the undo work into the background.
> 4. Session N+1 now acquires the lock and sits on it.
> 5. Optionally, repeat steps 1-4 K times, each time for a different table.
> 
> Scenario #2:
> 
> 1. Any number of sessions begin a transaction, write a bunch of data,
> and then abort.
> 2. They all try to perform undo in the foreground.
> 3. They get killed using pg_terminate_backend().
> 
> Scenario #3:
> 
> 1. A transaction begins, does some work, and then aborts.
> 2. When undo processing occurs, 1% of such transactions fail during
> undo apply because of a bug in the table AM.
> 3. When undo processing retries after a failure, it fails again
> because the bug is triggered by something about the contents of the
> undo record, rather than by, say, concurrency.


> However, if prepared transactions are in use, we could have a variant
> of scenario #1 in which each transaction is first prepared, and then
> the prepared transaction is rolled back.  Unlike the ordinary case,
> this can lead to a nearly-unbounded growth in the number of
> transactions that are pending undo, because we don't have a way to
> transfer the locks held by the PGPROC used for the prepare to some running
> session that could perform the undo.

It doesn't seem that hard - and kind of required for robustness
independent of the decision around "completeness" - to find a way to use
the locks already held by the prepared transaction.
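
Very roughly - every function name below is made up, this is only to
sketch the shape of it - the backend or worker applying the undo could
adopt the locks that the prepared transaction's dummy PGPROC still
holds, instead of re-acquiring anything:

/*
 * Hypothetical sketch: apply undo for a rolled-back prepared
 * transaction while reusing the locks its dummy PGPROC still holds.
 * None of the Hypothetical* functions exist; they just mark the
 * pieces that would need to be built.
 */
static void
UndoApplyRolledBackPreparedXact(TransactionId xid)
{
	/* the dummy PGPROC kept around for the prepared transaction */
	PGPROC	   *proc = HypotheticalTwoPhaseGetProc(xid);

	/* treat its locks as ours for the duration of undo apply */
	HypotheticalAdoptLocksFrom(proc);

	HypotheticalApplyPendingUndo(xid);

	/* only now is it safe to drop the locks and the 2PC state */
	HypotheticalReleaseAdoptedLocks(proc);
	HypotheticalRemoveTwoPhaseState(xid);
}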


> It's not necessary to have a
> large value for max_prepared_transactions; it only has to be greater
> than 0, because we can keep reusing the same slots with different
> tables.  That is, let N = max_prepared_xacts, and let K be anything at
> all; session N+1 can just stay in the same transaction and keep on
> taking new locks one at a time until the lock table fills up; not sure
> exactly how long that will take, but it's probably a five digit number
> of transactions, or maybe six. In this case, we can't force undo into
> the foreground, so we can exceed the number of transactions that are
> supposed to be backgrounded.

I'm not following, unfortunately.

I don't understand exactly what scenario you're referring to. You say
"session N+1 can just stay in the same transaction", but then you also
reference something taking "probably a five digit number of
transactions". Are those transactions the prepared ones?

Also, if somebody fills up the entire lock table, then the system is
effectively down - independent of UNDO, and no meaningful amount of UNDO
is going to be written. Perhaps we need some better resource control,
but that's really independent of UNDO.

Perhaps you can just explain the scenario in a few more words? My
comments regarding it probably make no sense, given how little I
understand what the scenario is.


> In scenario #2, the undo work is going to have to be retried in the
> background, and perforce that means reacquiring locks that have been
> released, and so there is a chance of long lock waits and/or deadlock
> that cannot really be avoided. I think there is basically no way at
> all to avoid an unbounded accumulation of transactions requiring undo
> in this case, just as in the similar case where the cluster is
> repeatedly shut down or repeatedly crashes. Eventually, if you have a
> hard cap on the number of transactions requiring undo, you're going to
> hit it, and have to start refusing new undo-using transactions. As
> Thomas pointed out, that might still be better than some other systems
> which use undo, where the system doesn't open for any transactions at
> all after a restart until all undo is retired, and/or where undo is
> never processed in the background. But it's a possible concern. On the
> other hand, if you don't have a hard cap, the system may just get
> further and further behind until it eventually melts, and that's also
> a possible concern.

You could force new connections to complete the rollback processing of
the terminated connection, if there's too much pending UNDO. That'd be a
way of providing back-pressure against such crazy scenarios.  Seems
again that it'd be good to have that pressure, independent of the
decision on completeness.
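
Sketch of what I mean - every name is invented, including the GUC -
called somewhere around the start of a new write transaction:

/*
 * Hypothetical back-pressure: if too many transactions are pending
 * undo, make this backend apply some of it in the foreground before
 * it is allowed to generate new undo itself.
 */
static void
MaybeApplyUndoBackPressure(void)
{
	uint64		pending = HypotheticalPendingUndoXactCount();

	while (pending > hypothetical_undo_backpressure_limit)
	{
		/* pick one pending-undo transaction and roll it back */
		if (!HypotheticalApplyOnePendingUndoXact())
			break;		/* nothing left that this backend can process */

		pending = HypotheticalPendingUndoXactCount();
	}
}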



> One other thing that seems worth noting is that we have to consider
> what happens after a restart.  After a crash, and depending on exactly
> how we design it perhaps also after a non-crash restart, we won't
> immediately know how many outstanding transactions need undo; we'll
> have to grovel through the undo logs to find out. If we've got a hard
> cap, we can't allow new undo-using transactions to start until we
> finish that work.

Couldn't we record the outstanding transactions in the checkpoint, and
then recompute the changes to that record during WAL replay?
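
Something like the below could be written alongside the checkpoint,
and the in-memory copy kept current by WAL replay - struct and field
names are made up, UndoRecPtr being whatever the undo patch set ends
up using:

/*
 * Hypothetical: set of transactions whose undo is still pending,
 * persisted at checkpoint time so a restart doesn't have to scan the
 * undo logs to re-learn the count.
 */
typedef struct PendingUndoXact
{
	TransactionId	xid;
	UndoRecPtr	start_location;	/* where this xact's undo begins */
	UndoRecPtr	end_location;	/* where it ends */
} PendingUndoXact;

typedef struct CheckPointPendingUndo
{
	uint32		nxacts;
	PendingUndoXact	xacts[FLEXIBLE_ARRAY_MEMBER];
} CheckPointPendingUndo;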


> When I first thought about this, I was really concerned about the idea
> of a hard limit, but the more I think about it the less problematic it
> seems. I think in the end it boils down to a question of: when things
> break, what behavior would users prefer? You can either have a fairly
> quick, hard breakage which will definitely get your attention, or you
> can have a long, slow process of gradual degradation that doesn't
> actually stop the system until, say, the XIDs stuck in the undo
> processing queue become old enough to threaten wraparound, or the disk
> fills up.  Which is less evil?

Yea, I think that's what it boils down to... Would be good to have a few
more opinions on this.

Greetings,

Andres Freund


