Re: POC: Cleaning up orphaned files using undo logs - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: POC: Cleaning up orphaned files using undo logs
Date
Msg-id CAA4eK1KKAFBCJuPnFtgdc89djv4xO=ZkAdXvKQinqN4hWiRbvA@mail.gmail.com
Whole thread Raw
In response to Re: POC: Cleaning up orphaned files using undo logs  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: POC: Cleaning up orphaned files using undo logs  (Heikki Linnakangas <hlinnaka@iki.fi>)
List pgsql-hackers
On Tue, Jul 16, 2019 at 2:09 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Mon, Jul 1, 2019 at 3:54 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> > 2.  Introduced a new RMGR callback rm_undo_status.  It is used to
> > decide when record sets in the UNDO_SHARED category should be
> > discarded (instead of the usual single xid-based rules).  The possible
> > answers are "discard me now!", "ask me again when a given XID is all
> > visible", and "ask me again when a given XID is no longer running".
>
> From the minor nitpicking department, the patches from this stack that
> are updating rmgrlist.h are consistently failing to update the comment
> line preceding the list of PG_RMGR() lines.  This looks to be patches
> 0014 and 0015 in this stack; 0015 seems to need to be squashed into
> 0014.
>

Fixed.  You can verify in patch
0011-Infrastructure-to-execute-pending-undo-actions.  The 'mask' was
missing in the list as well which I have added here, we might want to
commit that separately.

> Reviewing Amit's 0016:
>
> performUndoActions appears to be badly-designed.  For starters, it's
> sometimes wrong: the only place it gets set to true is in
> UndoActionsRequired (which is badly named, because from the name you
> expect it to return a Boolean and to not have side effects, but
> instead it doesn't return anything and does have side effects).
> UndoActionsRequired() only gets called from selected places, like
> AbortCurrentTransaction(), so the rest of the time it just returns a
> wrong answer.  Now maybe it's never called at those times, but there's
> no guard to prevent a function like CanPerformUndoActions() (which is
> also badly named, because performUndoActions tells you whether you
> need to perform undo actions, not whether it's possible to perform
> undo actions) from being called before the flag is set.  I think that
> this flag should be either (1) maintained eagerly - so that wherever
> we set start_urec_ptr we also set the flag right away or (2) removed -
> so when we need to know, we just loop over all of the undo categories
> on the spot, which is not that expensive because there aren't that
> many of them.
>

I have taken approach-2 to fix this.

> It seems pointless to make PrepareTransaction() take undo pointers as
> arguments, because those pointers are just extracted from the
> transaction state, to which PrepareTransaction() has a pointer.
>

Fixed.

> Thomas has already objected to another proposal to add functions that
> turn 32-bit XIDs into 64-bit XIDs.  Therefore, I feel confident in
> predicting that he will likewise object to GetEpochForXid. I think
> this needs to be changed somehow, maybe by doing what the XXX comment
> you added suggests.
>

I will fix this later.  I think we can separately write a patch to
extend Two-phase file to use fulltransactionid and then use it here.

> This patch has some problems with naming consistency.  There's a
> function called PushUndoRequest() which calls a function called
> RegisterRollbackReq() to do the heart of the work.  So, is it undo or
> rollback?  Are we pushing or registering?  Is it a request or a req?
> For bonus points, the flag that the function sets is called
> undo_req_pushed, which is halfway in between the two competing
> terminologies.  Other gripes about PushUndoRequest: push is vague and
> doesn't really explain what's happening, "apllying" is a typo,
> per_level is a poor variable name and shouldn't be declared volatile.
> This function has problems with naming in other places, too; please go
> through all of the names carefully and make them consistent and
> adequately descriptive.
>

I have changed the namings to make them consistent.  If you see
anything else, then do let me know.

> I am not a fan of applying_subxact_undo.  I think we should look for a
> better design there.  A couple of things occur to me.  One is that we
> don't necessarily need to go to FATAL; we could just force the current
> transaction and all of its subtransactions fail all the way out to the
> top level, but then perhaps allow new transactions to be started
> afterwards.  I'm not sure that's worth it, but it would work, and I
> think it has precedent in SxactIsDoomed. Assuming we're going to stick
> with the current FATAL plan, I think we should do something like
> invent a new kind of critical section that forces ERROR to be promoted
> to FATAL and then use it here.  We could call it a semi-critical or
> locally-critical section, and the undo machinery could use it, but
> then also so could other things.  I've wanted that sort of concept
> before, so I think it's a good idea to try to have something general
> and independent of undo.  The same concept could be used in
> PerformUndoActions() instead of having to invent
> pg_rethrow_as_fatal(), so we'd have two uses for this mechanism right
> away.
>

Okay, I have developed the concept of semi-critical section and used
it for sub-transactions and temp tables.  Kindly check if this is
something that you have in mind?

> FinishPreparedTransactions() tries to apply undo actions while
> interrupts are still held.  Is that necessary?  Can we avoid it?
>

Fixed.

> It seems highly likely that the logic added to the TBLOCK_SUBCOMMIT
> case inside CommitTransactionCommand and also into
> ReleaseCurrentSubTransaction should have been added to
> CommitSubTransaction instead.  If that's not true, then we have to
> believe that the TBLOCK_SUBRELEASE call to CommitSubTransaction needs
> different treatment from the other two cases, which sounds unlikely;
> we also have to explain why undo is somehow different from all of
> these other releases that are already handled in that function, not in
> its callers.  I also strongly suspect it is altogether wrong to do
> this before CommitSubTransaction sets s->state to TRANS_COMMIT; what
> if a subxact callback throws an error?
>
> For related reasons, I don't think that the change ReleaseSavepoint()
> are right either.  Notice the header comment: "As above, we don't
> actually do anything here except change blockState."  The "as above"
> part of the comment probably didn't originally refer to
> DefineSavepoint(), which definitely does do other stuff, but to
> something like EndImplicitTransactionBlock() or EndTransactionBlock(),
> and DefineSavepoint() got stuck in the middle later.  Anyway, your
> patch makes the comment false by doing actual state changes in this
> function, rather than just marking the subtransactions for commit.
> But why should that be right?  If none of the many other bits of state
> are manipulated here rather than in CommitSubTransaction(), why is
> undo the one thing that is different?  I guess this is basically just
> compensation for the lack of any of this code in the TBLOCK_SUBRELEASE
> path which I noted in the previous paragraph, but I still think the
> right answer is to put it all in CommitSubTransaction() *after* we set
> TRANS_COMMIT.
>

Changed as per suggestion.

> There are a number of things I either don't like or don't understand
> about PerformUndoActions.  One is that undo_req_pushed gets passed to
> this function.  That just looks really odd from an abstraction point
> of view.  Basically, we have a function whose job is to "perform undo
> actions," and it gets a flag as an argument that tells it to not
> actually perform some of the undo actions: that's odd. I think the
> reason it's like that is because of the issue we've been discussing
> elsewhere that there's a separate undo request for each category.  If
> you didn't have that, you wouldn't need to do this here.  I'm not
> saying that proves that the one-request-per-persistence-level design
> is definitely wrong, but this is certainly not a point in its favor,
> at least IMHO.
>

I think we have discussed in detail about
one-request-per-persistence-level design and I will investigate it to
see if we can make it one-request-per-transaction and if not what are
the challenges and can we overcome them without significantly more
work and complexity.  So for now, I have not changed anything related
to this point.

Apart from these comments, I have changed a few more things:
a. Changed TWOPHASE_MAGIC define as we are changing TwoPhaseFileHeader.
b. Fixed comments by Dilip on same patch [1].  I will respond to them
separately.
c. Fixed the problem reported by Thomas [2] and one similar problem in
an error queue noticed by me.

I have still not addressed all the comments raised.  This is mainly to
unblock Thomas's test and share whatever is done until now.  I am
posting all the patches, but have not modified anything related to
undo-log and undo-interface patches (aka from 0001 to 0008).

[1] - https://www.postgresql.org/message-id/CAFiTN-tObs5BQZETqK12QuOz7nPSXb90PdG49AzK2ZJ4ts1c5g%40mail.gmail.com
[2] -
https://www.postgresql.org/message-id/CA%2BhUKGLv016-1y%3DCwx%2Bmme%2BcFRD5Bn03%3D2JVFnRB7JMLsA35%3Dw%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment

pgsql-hackers by date:

Previous
From: Rafia Sabih
Date:
Subject: Re: proposal - patch: psql - sort_by_size
Next
From: Jeremy Finzel
Date:
Subject: Re: proposal - patch: psql - sort_by_size