Re: POC: Cleaning up orphaned files using undo logs - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: POC: Cleaning up orphaned files using undo logs
Date
Msg-id CAA4eK1JKPVUDTCnJwuv=btJHhpccmRXA=rQt=N-xVRcsGwfiJA@mail.gmail.com
Whole thread Raw
In response to Re: POC: Cleaning up orphaned files using undo logs  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Mon, Jul 15, 2019 at 9:56 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Sat, Jul 13, 2019 at 6:26 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > I think then I am missing something because what I am talking about is
> > below code in rbt_insert:
>
> What you're saying here is that, with an rbtree, an exact match will
> result in a merging of requests which we don't want, so we have to
> make them always unique.  That's fine, but even if you used a binary
> heap where it wouldn't be absolutely required that you break the ties,
> you'd still want to think at least a little bit about what behavior is
> best in case of a tie, just from the point of view of making the
> system efficient.
>

Okay.

> > I think it would be better to use start_urec_ptr because XID can be
> > non-unique in our case.  As I explained in one of the emails above [1]
> > that we register the requests for logged and unlogged relations
> > separately, so XID can be non-unique.
>
> Yeah. I didn't understand that explanation.  It seems to me that one
> of the fundamental design questions for this system is whether we
> should allow there to be an unbounded number of transactions that are
> pending undo application, or whether it's OK to enforce a hard limit.
> Either way, there should certainly be pressure applied to try to keep
> the number low, like forcing undo application into the foreground when
> a backlog is accumulating, but the question is what to do when that's
> insufficient.  My original idea was that we should not have a hard
> limit, in which case the shared memory data on what is pending might
> be incomplete, in which case we would need the discard workers to
> discover transactions needing undo and add them to the shared memory
> data structures, and if those structures are full, then we'd just skip
> adding those details and rediscover those transactions again at some
> future point.
>
> But, my understanding of the current design being implemented is that
> there is a hard limit on the number of transactions that can be
> pending undo and the in-memory data structures are sized accordingly.
>

Yes, that is correct.

> In such a system, we cannot rely on the discard worker(s) to
> (re)discover transactions that need undo, because if there can be
> transactions that need undo that we don't know about, then we can't
> enforce a hard limit correctly.
>

I have responded in the email above about this point.

>  The exception, I suppose, is that
> after a crash, we'll need to scan all the undo logs and figure out
> which transactions are pending, but that doesn't preclude using a
> single queue entry covering both the logged and the unlogged portion
> of a transaction that has written undo of both kinds.  We've got to
> scan all of the undo logs before we allow any new undo-using
> transactions to start, and so we can create one fully-up-to-date entry
> that reflects the data for both persistence levels before any
> concurrent activity happens.
>

It is correct that no new undo using transaction can start, but
nothing prevents undo launcher to start the undo workers to process
the already registered requests which can lead to some concurrent
activity.

> I am wondering (and would love to hear other opinions on) the question
> of which kind of design we ought to be pursuing, but it's got to be
> one or the other, not something in the middle.
>

I agree that it should not be in the middle.  It is possible that I am
missing or misunderstanding something here, but AFAIU, the current
design, and implementation allows us to maintain the pending undo
state in-memory.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Justin Pryzby
Date:
Subject: Re: make \d pg_toast.foo show its indices ; and, \d toast show itsmain table ; and \d relkind=I show its partitions (and tablespace)
Next
From: Michael Paquier
Date:
Subject: Re: pg_receivewal documentation