Re: should there be a hard-limit on the number of transactions pending undo? - Mailing list pgsql-hackers

From: Robert Haas
Subject: Re: should there be a hard-limit on the number of transactions pending undo?
Msg-id: CA+TgmobMBgfH=2D7xKa10O_2aw-GUXTwja9OKabTQsm-BZiRyg@mail.gmail.com
In response to: Re: should there be a hard-limit on the number of transactions pending undo?  (Andres Freund <andres@anarazel.de>)
Responses: Re: should there be a hard-limit on the number of transactions pending undo?  (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers

On Fri, Jul 19, 2019 at 2:04 PM Andres Freund <andres@anarazel.de> wrote:
> It doesn't seem that hard - and kind of required for robustness
> independent of the decision around "completeness" - to find a way to use
> the locks already held by the prepared transaction.

I'm not wild about finding more subtasks to put on the must-do list,
but I agree it's doable.

> I'm not following, unfortunately.
>
> I don't understand what exactly the scenario is you refer to. You say
> "session N+1 can just stay in the same transaction", but then you also
> reference something taking "probably a five digit number of
> transactions". Are those transactions the prepared ones?

So you begin a bunch of sessions.  All but one of them begin a
transaction, insert data into a table, and then prepare.  The last one
begins a transaction and locks the table.  Now you roll back all the
prepared transactions.  Those sessions now begin new transactions,
insert data into a second table, and prepare the second set of
transactions.  The last session, which still has the first table
locked, now locks the second table in addition.  Now you again roll
back all the prepared transactions.  At this point you have 2 *
max_prepared_transactions transactions waiting for undo, all blocked on
that last session that holds locks on both tables.  So now you go have
all of those sessions begin a third transaction, and they all insert
into a third table, and prepare.  The last session now attempts AEL on
that third table, and once it's waiting, you roll back all the
prepared transactions, after which that last session successfully
picks up its third table lock.

You can keep repeating this, locking a new table each time, until you
run out of lock table space, by which time you will have roughly
max_prepared_transactions * size_of_lock_table transactions waiting
for undo processing.
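
To make the arithmetic concrete, here is a toy model of the loop described above. It is only a sketch: the function and parameter names are made up for illustration, it is not PostgreSQL code, and it ignores the lock table entries consumed by the prepared transactions themselves, which is why the estimate above says "roughly".

```python
def pending_undo_after_attack(max_prepared_transactions: int,
                              lock_table_slots: int) -> int:
    """Count transactions left waiting for undo in the toy scenario.

    Both parameters are hypothetical stand-ins: the first for the
    max_prepared_transactions setting, the second for how many table
    locks the one long-lived session can take before the lock table
    is full.
    """
    pending_undo = 0
    locks_held_by_blocker = 0
    while locks_held_by_blocker < lock_table_slots:
        # One round: every prepared transaction inserts into a fresh
        # table and is rolled back. Its undo cannot be applied while
        # the long-lived session holds a lock on that table.
        pending_undo += max_prepared_transactions
        # The long-lived session picks up the lock on this round's table.
        locks_held_by_blocker += 1
    return pending_undo
```

Under these assumptions the result is simply the product of the two inputs, so the backlog grows linearly with every additional table the last session manages to lock.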

> You could force new connections to complete the rollback processing of
> the terminated connection, if there's too much pending UNDO. That'd be a
> way of providing back-pressure against such crazy scenarios.  Seems
> again that it'd be good to have that pressure, independent of the
> decision on completeness.

That would definitely provide a whole lot of back-pressure, but it
would also make the system unusable if the undo handler finds a way to
FATAL, or just hangs for some stupid reason (stuck I/O?). It would be
a shame if the administrative action needed to fix the problem were
prevented by the back-pressure mechanism.

One thing I've thought about, which I think would be helpful for a
variety of scenarios, is to have a facility that forces a computed
delay at each write transaction (when it first writes WAL, or when
an XID is assigned), or we could adapt that to this case and say the
beginning of each undo-using transaction. So for example if you are
about to run out of space in pg_wal, you can slow things down to let
the checkpoint complete, or if you are about to run out of XIDs, you
can slow things down to let autovacuum complete, or if you are about
to run out of undo slots, you can slow things down to let some undo
complete.  The trick is to make sure that you only wait when it's
likely to do some good; if you wait because you're running out of XIDs
and the reason you're running out of XIDs is because somebody left a
replication slot or a prepared transaction around, the back-pressure
is useless.
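
A minimal sketch of what such a computed delay might look like, purely as an illustration. Every name here (computed_delay_ms, start_fraction, max_delay_ms, consumer_can_progress) is an assumption invented for this example, not an existing PostgreSQL facility. The delay ramps up linearly once a resource is mostly used up, and is skipped entirely when waiting cannot help.

```python
def computed_delay_ms(used: int, capacity: int, consumer_can_progress: bool,
                      start_fraction: float = 0.8,
                      max_delay_ms: float = 1000.0) -> float:
    """Return a delay to impose before a new write transaction starts.

    Hypothetical helper: 'used' and 'capacity' describe some scarce
    resource (pg_wal space, XIDs, undo slots), and
    'consumer_can_progress' says whether the process that frees the
    resource (checkpoint, autovacuum, undo worker) is able to advance.
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    if not 0.0 <= start_fraction < 1.0:
        raise ValueError("start_fraction must be in [0, 1)")
    # Waiting only helps if something can free the resource meanwhile,
    # e.g. not when an abandoned replication slot holds back XIDs.
    if not consumer_can_progress:
        return 0.0
    fraction = used / capacity
    if fraction <= start_fraction:
        return 0.0
    # Scale linearly from 0 at start_fraction to max_delay_ms when full.
    pressure = min((fraction - start_fraction) / (1.0 - start_fraction), 1.0)
    return max_delay_ms * pressure
```

The linear ramp is an arbitrary choice for the sketch; the hard part in practice is deciding consumer_can_progress correctly, which is the "only wait when it's likely to do some good" condition.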

> Couldn't we record the outstanding transactions in the checkpoint, and
> then recompute the changes to that record during WAL replay?

Hmm, that's not a bad idea. So the transactions would have to "count"
the moment they insert their first undo record, which is exactly the
right thing anyway.

Hmm, but what about transactions that are only touching unlogged tables?
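
As a rough illustration of the checkpoint-plus-replay idea, here is a toy model. The record kinds "first_undo" and "undo_complete" are hypothetical names invented for this sketch and do not correspond to actual WAL record types. The set of outstanding transactions starts from what the checkpoint recorded and is adjusted as replay proceeds.

```python
from typing import Iterable, Set, Tuple


def replay_outstanding(checkpoint_outstanding: Set[int],
                       wal_records: Iterable[Tuple[str, int]]) -> Set[int]:
    """Recompute which transactions still have undo pending after replay.

    'checkpoint_outstanding' is the set of XIDs stored in the
    checkpoint; 'wal_records' is a simplified stream of (kind, xid)
    pairs written after that checkpoint.
    """
    outstanding = set(checkpoint_outstanding)  # do not mutate the input
    for kind, xid in wal_records:
        if kind == "first_undo":
            # A transaction starts to count when it inserts its first
            # undo record.
            outstanding.add(xid)
        elif kind == "undo_complete":
            # It stops counting once its undo is fully processed (or
            # no longer needed, e.g. after commit).
            outstanding.discard(xid)
    return outstanding
```

This model only sees changes that show up in the replayed stream, so it does not by itself answer the question about transactions that touch only unlogged tables.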

> Yea, I think that's what it boils down to... Would be good to have a few
> more opinions on this.

+1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


