
From: Andres Freund
Subject: Re: POC: Cleaning up orphaned files using undo logs
Date:
Msg-id: 20190716222338.wodqjohw4eaiiycc@alap3.anarazel.de
In response to: Re: POC: Cleaning up orphaned files using undo logs (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: POC: Cleaning up orphaned files using undo logs (Amit Kapila <amit.kapila16@gmail.com>)
           Re: POC: Cleaning up orphaned files using undo logs (Amit Kapila <amit.kapila16@gmail.com>)
           should there be a hard-limit on the number of transactions pending undo? (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
Hi,

On 2019-07-15 12:26:21 -0400, Robert Haas wrote:
> Yeah. I didn't understand that explanation.  It seems to me that one
> of the fundamental design questions for this system is whether we
> should allow there to be an unbounded number of transactions that are
> pending undo application, or whether it's OK to enforce a hard limit.
> Either way, there should certainly be pressure applied to try to keep
> the number low, like forcing undo application into the foreground when
> a backlog is accumulating, but the question is what to do when that's
> insufficient.  My original idea was that we should not have a hard
> limit, in which case the shared memory data on what is pending might
> be incomplete, in which case we would need the discard workers to
> discover transactions needing undo and add them to the shared memory
> data structures, and if those structures are full, then we'd just skip
> adding those details and rediscover those transactions again at some
> future point.
>
> But, my understanding of the current design being implemented is that
> there is a hard limit on the number of transactions that can be
> pending undo and the in-memory data structures are sized accordingly.

My understanding is that that's really just an outcome of needing to
maintain oldestXidHavingUndo accurately, right? I know I asked this
before, but I didn't feel like the answer was that clear (probably due
to my own haziness). To me it seems very important to understand whether,
and to what extent, we can separate the queuing/worker logic from the
question of how to maintain oldestXidHavingUndo.
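
To make that concrete, here's roughly the dependency I have in mind (a
minimal sketch, all names invented, not the patch's actual structures):
if oldestXidHavingUndo is derived as the minimum xid over the rollback
requests tracked in shared memory, that derivation is only accurate as
long as the shared memory state is guaranteed to be complete, i.e. with
a hard limit.

#include "postgres.h"
#include "access/transam.h"

/*
 * Hypothetical sketch only: pending rollback requests in a fixed-size
 * shared array, oldestXidHavingUndo computed as the minimum xid over
 * the in-use slots.  If entries could be skipped when the table is
 * full, the value could no longer be computed this way.
 */
typedef struct PendingUndoSlot
{
    bool        in_use;
    TransactionId xid;      /* transaction whose undo isn't applied yet */
} PendingUndoSlot;

static TransactionId
ComputeOldestXidHavingUndo(PendingUndoSlot *slots, int nslots)
{
    TransactionId oldest = InvalidTransactionId;

    for (int i = 0; i < nslots; i++)
    {
        if (!slots[i].in_use)
            continue;
        if (!TransactionIdIsValid(oldest) ||
            TransactionIdPrecedes(slots[i].xid, oldest))
            oldest = slots[i].xid;
    }

    return oldest;          /* invalid if nothing is pending */
}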


> In such a system, we cannot rely on the discard worker(s) to
> (re)discover transactions that need undo, because if there can be
> transactions that need undo that we don't know about, then we can't
> enforce a hard limit correctly.  The exception, I suppose, is that
> after a crash, we'll need to scan all the undo logs and figure out
> which transactions are pending, but that doesn't preclude using a
> single queue entry covering both the logged and the unlogged portion
> of a transaction that has written undo of both kinds.  We've got to
> scan all of the undo logs before we allow any new undo-using
> transactions to start, and so we can create one fully-up-to-date entry
> that reflects the data for both persistence levels before any
> concurrent activity happens.

Yea, that seems like a question independent of the "completeness"
requirement. If desirable, it seems trivial either to give
RollbackHashEntry per-persistence-level status (one entry per xid), or
not (one entry per persistence level).
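
For instance (again just a sketch; field names invented, UndoRecPtr /
UndoPersistence as in the undo patch set, and the [2] indexing over
persistence levels is purely illustrative):

/* one entry per xid, with per-persistence-level state */
typedef struct RollbackHashEntry
{
    TransactionId xid;               /* hash key */
    UndoRecPtr    start_urec_ptr[2]; /* permanent / unlogged undo */
    UndoRecPtr    end_urec_ptr[2];
    bool          applied[2];        /* per-persistence-level progress */
} RollbackHashEntry;

/* or: one entry per (xid, persistence level) */
typedef struct RollbackHashKey
{
    TransactionId   xid;
    UndoPersistence persistence;
} RollbackHashKey;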

Greetings,

Andres Freund


