From: Robert Haas
Subject: Re: POC: Cleaning up orphaned files using undo logs
Msg-id: CA+TgmoZi_XxDscp5S-8yokma7pYd-4VvQ2oCvBtQzfcCdncJ9g@mail.gmail.com
In response to: Re: POC: Cleaning up orphaned files using undo logs  (Amit Kapila <amit.kapila16@gmail.com>)
Responses: Re: POC: Cleaning up orphaned files using undo logs  (Amit Kapila <amit.kapila16@gmail.com>)
           Re: POC: Cleaning up orphaned files using undo logs  (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers

On Sat, Jul 13, 2019 at 6:26 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> I think then I am missing something because what I am talking about is
> below code in rbt_insert:

What you're saying here is that, with an rbtree, an exact match will
result in a merging of requests, which we don't want, so we have to
make the keys always unique.  That's fine, but even if you used a
binary heap, where breaking ties wouldn't be absolutely required,
you'd still want to think at least a little bit about what behavior is
best in case of a tie, just from the point of view of making the
system efficient.
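
For illustration, here is a minimal sketch of what such a tie-breaking
comparator could look like.  The struct and field names (UndoRequest,
full_xid, start_urec_ptr) are invented for the example, not taken from
the patch set; the point is only that two distinct requests never
compare equal, so an rbtree won't merge them and a binary heap orders
ties deterministically.

#include <stdint.h>

/* Hypothetical request descriptor; not the patch's actual struct. */
typedef struct UndoRequest
{
    uint64_t    full_xid;        /* may repeat across persistence levels */
    uint64_t    start_urec_ptr;  /* unique per request */
} UndoRequest;

static int
undo_request_cmp(const UndoRequest *a, const UndoRequest *b)
{
    /* Primary ordering key: transaction id. */
    if (a->full_xid != b->full_xid)
        return (a->full_xid < b->full_xid) ? -1 : 1;

    /* Tie-break on start_urec_ptr so distinct requests never compare equal. */
    if (a->start_urec_ptr != b->start_urec_ptr)
        return (a->start_urec_ptr < b->start_urec_ptr) ? -1 : 1;

    return 0;
}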

> I think it would be better to use start_urec_ptr because XID can be
> non-unique in our case.  As I explained in one of the emails above [1]
> that we register the requests for logged and unlogged relations
> separately, so XID can be non-unique.

Yeah, though I didn't fully understand that explanation.  It seems to
me that one of the fundamental design questions for this system is
whether we should allow there to be an unbounded number of
transactions that are pending undo application, or whether it's OK to
enforce a hard limit.  Either way, there should certainly be pressure
applied to try to keep the number low, such as forcing undo
application into the foreground when a backlog is accumulating, but
the question is what to do when that's insufficient.  My original idea
was that we should not have a hard limit.  In that case, the shared
memory data on what is pending might be incomplete, so we would need
the discard workers to discover transactions needing undo and add them
to the shared memory data structures; if those structures are full,
we'd just skip adding those details and rediscover those transactions
again at some future point.
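
To make that concrete, a rough sketch of that best-effort registration
(all names here, such as PendingUndoTable and pending_undo_try_insert,
are hypothetical, not from the patch set) might look like:

#include <stdbool.h>
#include <stdint.h>

#define MAX_TRACKED_REQUESTS 1024   /* arbitrary size for the example */

/* Hypothetical bounded shared-memory table of pending undo requests. */
typedef struct PendingUndoTable
{
    uint32_t    nused;
    uint64_t    full_xids[MAX_TRACKED_REQUESTS];
} PendingUndoTable;

/*
 * Best-effort registration: if the table is full, skip the entry and
 * rely on the discard worker to rediscover the transaction later by
 * scanning the undo logs.
 */
static bool
pending_undo_try_insert(PendingUndoTable *tab, uint64_t full_xid)
{
    if (tab->nused >= MAX_TRACKED_REQUESTS)
        return false;       /* no room: caller just moves on */
    tab->full_xids[tab->nused++] = full_xid;
    return true;
}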

But my understanding of the design currently being implemented is that
there is a hard limit on the number of transactions that can be
pending undo, and the in-memory data structures are sized accordingly.
In such a system, we cannot rely on the discard worker(s) to
(re)discover transactions that need undo, because if there can be
transactions that need undo that we don't know about, then we can't
enforce a hard limit correctly.  The exception, I suppose, is that
after a crash, we'll need to scan all the undo logs and figure out
which transactions are pending, but that doesn't preclude using a
single queue entry covering both the logged and the unlogged portion
of a transaction that has written undo of both kinds.  We've got to
scan all of the undo logs before we allow any new undo-using
transactions to start, and so we can create one fully-up-to-date entry
that reflects the data for both persistence levels before any
concurrent activity happens.
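
As a sketch of that idea (again with names invented for the example),
the startup scan could fold both persistence levels into a single
per-transaction entry before any new undo-writing transactions are
admitted:

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical single queue entry covering all of a transaction's undo. */
typedef struct PendingUndoXact
{
    uint64_t    full_xid;
    bool        has_logged_undo;
    bool        has_unlogged_undo;
    uint64_t    logged_start_urec_ptr;
    uint64_t    unlogged_start_urec_ptr;
} PendingUndoXact;

/*
 * Called for each chunk of undo found for this transaction during the
 * startup scan; merges both persistence levels into the one entry.
 */
static void
pending_undo_merge(PendingUndoXact *entry, bool logged,
                   uint64_t start_urec_ptr)
{
    if (logged)
    {
        entry->has_logged_undo = true;
        entry->logged_start_urec_ptr = start_urec_ptr;
    }
    else
    {
        entry->has_unlogged_undo = true;
        entry->unlogged_start_urec_ptr = start_urec_ptr;
    }
}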

I am wondering about (and would love to hear other opinions on) the
question of which kind of design we ought to be pursuing, but it's got
to be one or the other, not something in the middle.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


