Re: POC: Cleaning up orphaned files using undo logs - Mailing list pgsql-hackers

From Robert Haas
Subject Re: POC: Cleaning up orphaned files using undo logs
Date
Msg-id CA+TgmoYxsrXTLR=ANv4cXLdd8VV4GURqATtSwsgKG=asQRVHWQ@mail.gmail.com
In response to Re: POC: Cleaning up orphaned files using undo logs  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: POC: Cleaning up orphaned files using undo logs  (Dilip Kumar <dilipbalaut@gmail.com>)
Re: POC: Cleaning up orphaned files using undo logs  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Fri, Jul 19, 2019 at 12:10 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> We are doing exactly what you have written in the last line of the
> next paragraph "stop the transaction from writing undo when the hash
> table is already too full.".  So we will
> never face the problems related to repeated crash recovery.  The
> definition of too full is that we stop allowing new transactions
> that can write undo once the hash table already has entries
> equivalent to (UndoRollbackHashTableSize() - MaxBackends).  Does this
> make sense?

Oops, I was looking in the wrong place.  Yes, that makes sense, but:

1. It looks like the test you added to PrepareUndoInsert has no
locking, and I don't see how that can be right.

2. It seems like this would result in a test for each new undo
insertion that gets prepared, whereas surely we would only want to
test when first attaching to an undo log.  If you've already attached
to the undo log, there's no reason not to continue inserting into it,
because doing so doesn't increase the number of transactions (or
transaction-persistence level combinations) that need undo.

3. I don't think the test itself is correct. It can fire even when
there's no problem. It is correct (or would be if it said 2 *
MaxBackends) if every other backend in the system is already attached
to an undo log (or two). But if they are not, it will block
transactions from being started for no reason. For instance, suppose
max_connections = 100 and there are another 100 slots for background
rollbacks. Now suppose that the system crashes when 101 slots are in
use -- 100 pushed into the background plus 1 that was aborted by the
crash. On recovery, this test will refuse to let any new transaction
start. Actually it is OK for up to 99 transactions to write undo, just
not 100.  Or, given that you have a slot per persistence level, it's
OK to have up to 199 transaction-persistence-level combinations in
flight, just not 200. And that is the difference between the system
being unusable after the crash until a rollback succeeds and being
almost fully usable immediately.
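
Concretely, I'm imagining the check looking something like the sketch
below.  This is only an illustration of the shape of the thing --
UndoSlotCheckLock, RollbackHTNumEntries(), and
CountBackendsAttachedToUndo() are invented names, not anything in the
patch -- but it shows the two properties I'm arguing for: it runs only
when a backend first attaches to an undo log, and it counts slots that
are actually spoken for rather than assuming the worst case for every
backend.

/*
 * Sketch only: run when a backend first attaches to an undo log, not on
 * every PrepareUndoInsert.  The lock and the helper functions are
 * hypothetical names used for illustration.
 */
static bool
UndoSlotAvailableForAttach(void)
{
    bool    ok;
    int     reserved;

    LWLockAcquire(UndoSlotCheckLock, LW_EXCLUSIVE);

    /*
     * Slots already spoken for: hash table entries for aborted transactions
     * whose undo is still pending, plus one (or two, if the key keeps the
     * persistence level) for each backend currently attached to an undo log
     * that has no entry yet, since each of those could still abort.
     */
    reserved = RollbackHTNumEntries() + CountBackendsAttachedToUndo();

    /* Let this backend attach only if a slot is left over for it. */
    ok = (reserved < UndoRollbackHashTableSize());

    LWLockRelease(UndoSlotCheckLock);

    return ok;
}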

> > I don't follow this.  If you have a hash table where the key is XID,
> > there is no need to delete and reinsert anything just because you
> > discover that the XID has not only permanent undo but also unlogged
> > undo, or something of that sort.
>
> The size of the total undo to be processed will be changed if we find
> anyone (permanent or unlogged) later.  Based on the size, the entry
> location should be changed in size queue.

OK, true. But that's not a significant cost, either in runtime or code
complexity.
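
For the record, what I mean by "not a significant cost" is that the
adjustment is just a remove-and-reinsert, along the lines of the
fragment below; SizeQueueDelete and SizeQueueInsert are stand-ins for
whatever the size queue's actual interface turns out to be.

/*
 * Illustration only: when undo for a second persistence level turns up,
 * grow the entry's total and reposition it in the size-ordered queue by
 * deleting and re-inserting it.  The queue functions named here are
 * placeholders, not the patch's real API.
 */
entry->total_undo_size += extra_undo_size;
SizeQueueDelete(size_queue, entry);
SizeQueueInsert(size_queue, entry);     /* ordered by the new total size */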

I still don't really see any good reason for the hash table key to be
anything other than XID, or really, FXID. I mean, sure, the data
structure manipulations are a little different, but not in any way
that really matters. And it seems to me that there are some benefits,
the biggest of which is that the system becomes easier for users to
understand.  We can simply say that there is a limit on the number of
transactions that either (1) are in progress and have written undo or
(2) have aborted and not all of the undo has been processed. If the
key is XID + persistence level, then it's a limit on the number of
transaction-and-persistence-level combinations, which I feel is not so
easy to understand. In most but not all scenarios, it means that the
limit is about double what you think the limit is, and as the mistake
in the current version of the patch makes clear, even the people
writing the code can forget about that factor of two.
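
To make that concrete, the kind of entry layout I have in mind is
roughly the following; the field names are illustrative rather than a
proposal, and UndoRecPtr is the patch's own type:

#include "access/transam.h"     /* FullTransactionId */

/*
 * Illustrative layout for a rollback hash table entry keyed by FXID alone.
 * The per-persistence-level ranges live inside the entry instead of being
 * part of the key, so discovering unlogged undo later just fills in another
 * slot in the arrays rather than requiring a second entry.
 */
typedef struct RollbackHashEntry
{
    FullTransactionId fxid;             /* the hash key */

    /* One range per persistence level (permanent, unlogged); invalid if unused. */
    UndoRecPtr  start_urec_ptr[2];
    UndoRecPtr  end_urec_ptr[2];

    Size        total_undo_size;        /* drives position in the size queue */
    Oid         dbid;
} RollbackHashEntry;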

It affects a few other things, too.  If you made the key XID and fixed
problems (2) and (3) from above, then you'd have a situation where a
transaction could fail at only one point: either it bombs the first
time it tries to write undo, or it works.  As it is, there is a second
failure scenario: you do a bunch of work on permanent (or unlogged)
tables and then try to write to an unlogged (or permanent) table and
it fails because there are not enough slots.  Is that the end of the
world? No, certainly not. The situation should be rare. But if we have
to fail transactions, it's best to fail them before they've started
doing any work, because that minimizes the amount of work we waste by
having to retry. Of course, a transaction that fails midway through
when it tries to write at a second persistence level is also consuming
an undo slot in a situation where we're short of undo slots.

Another thing which Andres pointed out to me off-list is that we might
want to have a function that takes a transaction ID as an argument and
tells you the status of that transaction from the point of view of the
undo machinery: does it have any undo, and if so how much? As you have
it now, such a function would require searching the whole hash table,
because the user won't be able to provide an UndoRecPtr to go with the
XID.  If the hash table key were in fact <XID, undo persistence level>
rather than <XID, UndoRecPtr>, then you could do it with two lookups; if
it were XID alone, you could do it with one lookup. The difference
between one lookup and two is not significant, but having to search
the whole hash table is.
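
Sketching it out (again with made-up names -- RollbackHashTable,
RollbackHTLock, and the function itself are illustrations, and I'm
assuming an entry shaped like the one above), the XID-keyed version is
just:

/*
 * Illustration: with the hash table keyed by FXID alone, answering "does
 * this transaction have undo, and how much?" is a single hash_search().
 * With <XID, UndoRecPtr> as the key, the UndoRecPtr isn't known to the
 * caller, so the only option is to walk the entire table.
 */
static bool
TransactionHasUndo(FullTransactionId fxid, Size *undo_size)
{
    RollbackHashEntry *entry;
    bool        found;

    LWLockAcquire(RollbackHTLock, LW_SHARED);
    entry = hash_search(RollbackHashTable, &fxid, HASH_FIND, &found);
    if (found)
        *undo_size = entry->total_undo_size;
    LWLockRelease(RollbackHTLock);

    return found;
}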

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


