
From: Robert Haas
Subject: Re: POC: Cleaning up orphaned files using undo logs
Date:
Msg-id: CA+Tgmobhzbc+5n4oU1QU0nCwd8L2x_TyjTWSBH+rH2HAdS_LAQ@mail.gmail.com
In response to: Re: POC: Cleaning up orphaned files using undo logs (Amit Kapila <amit.kapila16@gmail.com>)
Responses: Re: POC: Cleaning up orphaned files using undo logs (Amit Kapila <amit.kapila16@gmail.com>)
List: pgsql-hackers
On Wed, Jul 17, 2019 at 2:13 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> We add entries to the queues only when we want them to be processed by
> background workers, whereas the hash table contains entries for all
> the pending undo requests irrespective of whether they are executed by
> the foreground transaction or by background workers.  Once a request is
> processed, we remove it from the hash table.  The reasons for keeping
> all the pending abort requests in the hash table are, first, that it
> allows us to compute oldestXidHavingUnappliedUndo, and second, that it
> avoids duplicate undo requests from backends and the discard worker.  In
> short, there is no reason to keep all the entries in the queues, but
> there are reasons to keep all the aborted-xact entries in the hash table.

I think we're drifting off on a tangent here.  That does make sense,
but my original comment that led to this discussion was
"PerformUndoActions() also thinks that there is a possibility of
failing to insert a failed request into the error queue, and makes
reference to such requests being rediscovered by the discard worker,
..." and none of what you've written explains why there is or should
be a possibility of failing to insert a request into the error queue.
I feel like we've discussed this point to death. You just make the
maximum size of the queue equal to the maximum size of the hash table,
and it can't ever fail to have room for a new entry.  If you remove
entries lazily, then it can appear full, but any time it does, you can
just go and clean out all of the dead entries and you're guaranteed to
then have enough room.  And if we switch to an rbtree then we won't do
lazy removal any more, and it won't matter anyway.
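
To make that concrete, here's a rough standalone sketch of what I have in
mind.  The names here (RollbackRequest, ErrorQueue, and so on) are invented
for illustration and a plain array stands in for the real data structures;
the point is only the invariant: the queue is sized to match the hash table,
removal is lazy, and a sweep of dead entries always frees up room because
the number of live requests can never exceed the hash table's capacity.

#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define HASH_TABLE_CAPACITY 64              /* max pending undo requests */
#define QUEUE_CAPACITY HASH_TABLE_CAPACITY  /* queue sized to match the hash table */

/* Stand-in for a rollback request tracked by the hash table. */
typedef struct RollbackRequest
{
    bool        in_use;         /* still present in the hash table? */
} RollbackRequest;

typedef struct ErrorQueue
{
    RollbackRequest *items[QUEUE_CAPACITY];
    size_t          nitems;
} ErrorQueue;

/* Lazy removal: drop queue entries whose request has already been processed. */
static void
queue_sweep_dead_entries(ErrorQueue *q)
{
    size_t      keep = 0;

    for (size_t i = 0; i < q->nitems; i++)
    {
        if (q->items[i]->in_use)
            q->items[keep++] = q->items[i];
    }
    q->nitems = keep;
}

/*
 * Inserting a live, not-yet-queued request can never fail: at most
 * HASH_TABLE_CAPACITY requests are live at any moment, so if the queue looks
 * full it must contain at least one dead entry, and the sweep makes room.
 */
static void
queue_insert(ErrorQueue *q, RollbackRequest *req)
{
    if (q->nitems == QUEUE_CAPACITY)
        queue_sweep_dead_entries(q);
    assert(q->nitems < QUEUE_CAPACITY);
    q->items[q->nitems++] = req;
}

With an rbtree, or with eager removal, the sweep never even runs; either
way there is no failure case left for PerformUndoActions() to worry about.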

> > Anyway, if you don't like this solution, propose something else. It's
> > impossible to correctly implement a hard limit unless the number of
> > aborted-but-not-yet-undone transactions is bounded by (HARD_LIMIT -
> > ENTRIES_THAT_WOULD_BE_ADDED_AFTER_RECOVERY_IF_THE_SYSTEM_CRASHED_NOW).
> > If there are 100 transactions each bound to 2 undo logs, and you
> > crash, you will need to (as you have it designed now) add another 200
> > transactions to the hash table upon recovery, and that will make you
> > exceed the hard limit unless you were at least 200 transactions below
> > the limit before the crash.  Have you handled that somehow?  If so,
> > how?
>
> Yeah, we have handled it by reserving space for MaxBackends.  It is
> UndoRollbackHashTableSize() - MaxBackends.  There is a bug in the
> current patch in that it should reserve space for 2 * MaxBackends so
> that we are safe after recovery, but that can be fixed.

One of us is REALLY confused here.  Nothing you do in
UndoRollbackHashTableSize() can possibly fix the problem that I'm
talking about. Suppose the system gets to a point where all of the
rollback hash table entries are in use - there are some entries that
are used because work was pushed into the background, and then there
are other entries that are present because those transactions are
being rolled back in the foreground. Now at this point you crash.  Now
when you start up, all the hash table entries, including the reserved
ones, are already in use before any running transactions start.  Now
if you allow transactions to start before some of the rollbacks
complete, you have got big problems.  The system might crash again,
and if it does, when it restarts, the total number of outstanding
requests will no longer fit in the hash table, which was the whole
premise of this design.

Maybe that doesn't make sense, so think about it this way.  Suppose
the following happens repeatedly: the system starts, someone begins a
transaction that writes an undo record, the rollback workers start up
but don't make very much progress because the system is heavily loaded
or for whatever other reason, the system crashes, rinse, repeat.  Since
no transactions got successfully rolled back and one new transaction
that needs rollback got added, the number of transactions pending rollback
has increased by one.  Now, however big you made the hash table, just
repeat this process that number of times plus one, and the hash table
overflows.  The only way you can prevent that is if you stop the
transaction from writing undo when the hash table is already too full.
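
To spell out what enforcing the limit at undo-generation time might look
like, here's a minimal sketch with made-up names (a plain counter stands in
for whatever shared-memory bookkeeping would actually be needed):

#include <stdbool.h>

#define ROLLBACK_HASH_CAPACITY 1024 /* hard limit on pending undo requests */

/* In reality this would live in shared memory, under a lock or atomics. */
static int  rollback_slots_in_use = 0;

/*
 * Called before a backend writes its first undo record.  If this fails, the
 * transaction must error out instead of generating undo, so that even after
 * an arbitrary sequence of crashes every request discovered during recovery
 * is guaranteed to fit in the hash table.
 */
static bool
reserve_rollback_slot(void)
{
    if (rollback_slots_in_use >= ROLLBACK_HASH_CAPACITY)
        return false;
    rollback_slots_in_use++;
    return true;
}

/*
 * Released only when the transaction commits, or when its undo has been
 * fully applied and its hash table entry removed -- not merely on abort.
 */
static void
release_rollback_slot(void)
{
    rollback_slots_in_use--;
}

The crucial part is that the reservation happens before the first undo
record is written, not after the abort.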

> I have responded to it in a separate email, but let's discuss it here.
> So, you are right that the only time we need to scan the undo logs to
> find all pending aborted xacts is immediately after startup.  But we
> can't create a fully up-to-date entry from both the logs unless we make
> the undo launcher also wait to process anything until we are done.  We
> are not doing this in the current patch, but we can do it if we want.
> This would be an additional restriction we have to impose which is not
> required for the current approach.

I mean, that is just not true. There's no fundamental difference
between having two possible entries each of which looks like this:

struct entry { txn_details d; };

And having a single entry that looks like this:

struct entry { txn_details permanent; txn_details unlogged;
               bool using_permanent; bool using_unlogged; };

I mean, I'm not saying you would actually want to do exactly the
second thing, but arguing that something cannot be done with one
design or the other is just not correct.

> Another related thing is that to update the existing entry for the
> queues, we need to delete and re-insert the entry after we find the
> request in a different log category.  Then again, if we point queue
> entries at the hash table, we might not have this additional work, but
> that has its own set of complexities.

I don't follow this.  If you have a hash table where the key is XID,
there is no need to delete and reinsert anything just because you
discover that the XID has not only permanent undo but also unlogged
undo, or something of that sort.
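
Roughly, I'm imagining a single entry per XID along these lines.  This is
just an illustration with stand-in types (the real patch obviously has its
own structures); the point is that discovering unlogged undo for an XID
that already has permanent undo is an in-place update, with the key
untouched:

#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the real PostgreSQL types. */
typedef uint32_t TransactionId;
typedef uint64_t UndoRecPtr;

typedef struct UndoPersistenceDetails
{
    bool        valid;          /* does this persistence level have undo? */
    UndoRecPtr  start_urecptr;  /* first undo record to apply */
    UndoRecPtr  end_urecptr;    /* last undo record to apply */
} UndoPersistenceDetails;

typedef struct RollbackHashEntry
{
    TransactionId xid;          /* hash key -- never changes */
    UndoPersistenceDetails permanent;
    UndoPersistenceDetails unlogged;
} RollbackHashEntry;

/*
 * Record undo discovered in the unlogged log for an existing entry: no
 * delete, no re-insert, just fill in the other half of the entry.
 */
static void
rollback_entry_add_unlogged(RollbackHashEntry *entry,
                            UndoRecPtr start, UndoRecPtr end)
{
    entry->unlogged.valid = true;
    entry->unlogged.start_urecptr = start;
    entry->unlogged.end_urecptr = end;
}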

> I think it is implementation-wise simpler to have one entry per
> persistence level.  It is not that we can't deal with all the
> problems being discussed.

It's possible that it's simpler, but I'm not finding the arguments
you're making very convincing.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


