Re: POC: Cleaning up orphaned files using undo logs - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: POC: Cleaning up orphaned files using undo logs
Msg-id CAA4eK1J6wqz=dxoUTS3oQitUy7b-3FEiySSvp3qzzrLnriMfSg@mail.gmail.com
In response to Re: POC: Cleaning up orphaned files using undo logs  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: POC: Cleaning up orphaned files using undo logs  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Tue, Jul 16, 2019 at 9:44 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Tue, Jul 16, 2019 at 12:32 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > The idea is that the queues can get full, but not rollback hash table.
> > In the case where the error queue gets full, we mark the entry as
> > Invalid in the hash table and later when discard worker again
> > encounters this request, it adds it to the queue if there is a space
> > available and marks the entry in the hash table as valid.  This allows
> > us to keep the information of all xacts having pending undo in shared
> > memory.
>
> I don't understand.  How is it OK to have entries in the hash table
> but not the queues?  And why would that ever happen, anyway?
>

We add entries to the queues only when we want them to be processed by
background workers, whereas the hash table contains entries for all
pending undo requests, irrespective of whether they are executed by a
foreground transaction or by background workers.  Once a request is
processed, we remove it from the hash table.  The reasons for keeping
all pending abort requests in the hash table are that (a) it allows us
to compute oldestXidHavingUnappliedUndo, and (b) it prevents backends
and the discard worker from registering duplicate undo requests.  In
short, there is no reason to keep all the entries in the queues, but
there are reasons to keep all the aborted-xact entries in the hash table.

There is some more explanation about the queues and the hash table in
README.UndoProcessing, which again might not cover all the details, but
it can still help.

>  If you
> make the queues as big as the hash table is, then they should never
> fill up (or, if using binary heaps with lazy removal rather than
> rbtrees, they might fill up, but if they do, you can always make space
> by cleaning out the stale entries).
>
> > I think this can regress the performance when there are many
> > concurrent sessions unless there is a way to add/remove request
> > without a lock.  As of now, we don't enter any request or block any
> > space in shared memory related to pending undo till there is an error
> > or user explicitly Rollback the transaction.  We can surely do some
> > other way as well, but this way we won't have any overhead in the
> > commit or successful transaction's path.
>
> Well, we're already incurring some overhead to attach to an undo log,
> and that probably involves some locking.  I don't see why this would
> be any worse, and maybe it could piggyback on the existing work.
>

We attach to the undo log only once per backend (unless the user
changes the undo tablespace in-between, or the current log runs out of
space) and then use it for all transactions in that backend.  For each
transaction, we don't take any global lock for undo, so here we would
need something different.  Also, we would need it at commit time as
well.

> Anyway, if you don't like this solution, propose something else. It's
> impossible to correctly implement a hard limit unless the number of
> aborted-but-not-yet-undone transaction is bounded to (HARD_LIMIT -
> ENTRIES_THAT_WOULD_BE_ADDED_AFTER_RECOVERY_IF_THE_SYSTEM_CRASHED_NOW).
> If there are 100 transactions each bound to 2 undo logs, and you
> crash, you will need to (as you have it designed now) add another 200
> transactions to the hash table upon recovery, and that will make you
> exceed the hard limit unless you were at least 200 transactions below
> the limit before the crash.  Have you handled that somehow?  If so,
> how?
>

Yeah, we have handled it by reserving space for MaxBackends entries:
the usable size is UndoRollbackHashTableSize() - MaxBackends.  There is
a bug in the current patch: it should reserve space for 2 * MaxBackends
so that we are safe after recovery, but that can be fixed.
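The reservation arithmetic can be sketched as follows (hypothetical
names and an illustrative hard limit, not values from the patch): each
backend can be attached to undo for up to two persistence levels, so a
crash can add up to 2 * MaxBackends recovered entries, and ordinary
rollback requests may only consume what remains.

```c
/* Illustrative constants; the real values come from GUCs/shmem sizing. */
#define MAX_BACKENDS        100
#define PENDING_UNDO_SLOTS  1024    /* hypothetical hard limit */

/*
 * Slots that live rollback requests may occupy.  The rest stays free so
 * that recovery can re-register one request per backend per persistence
 * level (logged and unlogged) without exceeding the hard limit.
 */
static inline int
usable_rollback_slots(void)
{
    return PENDING_UNDO_SLOTS - 2 * MAX_BACKENDS;
}
```

With these illustrative numbers, 200 of the 1024 slots stay reserved
for post-crash re-registration.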

>  It seems to me that you MUST - at a minimum - keep a count of
> undo logs attached to in-progress transactions, if not the actual hash
> table entries.
>
> > Again coming to question of whether we need single or multiple entries
> > for one-request-per-persistence level, the reason for the same we have
> > discussed so far is that discard worker can register the requests for
> > them while scanning undo logs at different times.
>
> Yeah, but why do we need that in the first place?  I wrote something
> about that in a previous email, but you haven't responded to it here.
>

I have responded to it in a separate email, but let's discuss it here.
So, you are right that the only time we need to scan the undo logs to
find all pending aborted xacts is immediately after startup.  But we
can't create a fully up-to-date entry from both the logs unless we make
the undo launcher also wait, before processing anything, until we are
done.  We are not doing this in the current patch, but we can do it if
we want.  This would be an additional restriction which is not required
for the current approach.

Another related thing is that to update an existing entry in the
queues, we need to delete and re-insert the entry after we find the
request in a different log category.  Again, if we make the queue
entries point into the hash table, we might avoid this additional work,
but that has its own set of complexities.
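The delete-and-re-insert requirement comes from the queues being
ordered structures: changing an entry's ordering key in place would
corrupt the queue.  A minimal sketch with a sorted array standing in
for the real queue (hypothetical; the patch uses binary heaps / rbtrees
and real undo locations as keys):

```c
#include <assert.h>

#define QCAP 8
static int q[QCAP];             /* queue kept sorted ascending by key */
static int qlen = 0;

/* Insert a key, preserving ascending order. */
static void
q_insert(int key)
{
    int i = qlen++;
    while (i > 0 && q[i - 1] > key)
    {
        q[i] = q[i - 1];
        i--;
    }
    q[i] = key;
}

/* Remove an existing key, shifting later entries down. */
static void
q_remove(int key)
{
    int i = 0;
    while (i < qlen && q[i] != key)
        i++;
    for (; i + 1 < qlen; i++)
        q[i] = q[i + 1];
    qlen--;
}

/* "Updating" an entry's key = remove the old entry, insert the new one;
 * overwriting the key in place would break the ordering invariant. */
static void
q_update(int old_key, int new_key)
{
    q_remove(old_key);
    q_insert(new_key);
}
```

The same reasoning applies whatever the concrete queue structure is, as
long as position depends on the key.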


> > However, there are
> > few more things like what if while applying the actions, the actions
> > for logged are successful and unlogged fails, keeping them separate
> > allows better processing.  If one fails, register its request in error
> > queue and try to process the request for another persistence level.  I
> > think the requests for the different persistence levels are kept in a
> > separate log which makes their processing separately easier.
>
> I don't find this convincing. It's not really an argument, just a
> vague list of issues. If you want to convince me, you'll need to be
> much more precise.
>

I think it is implementation-wise simpler to have one entry per
persistence level.  It is not that we can't deal with all the problems
being discussed.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


