Re: [Patch] ALTER SYSTEM READ ONLY - Mailing list pgsql-hackers

From Amul Sul
Subject Re: [Patch] ALTER SYSTEM READ ONLY
Date
Msg-id CAAJ_b95EBZfKfgG10p5b4hFj7J8QgP0gJQFN46A0jsLJJXam5g@mail.gmail.com
Whole thread Raw
In response to Re: [Patch] ALTER SYSTEM READ ONLY  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [Patch] ALTER SYSTEM READ ONLY
Re: [Patch] ALTER SYSTEM READ ONLY
List pgsql-hackers
On Wed, Oct 7, 2020 at 11:19 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Wed, Sep 16, 2020 at 3:33 PM Robert Haas <robertmhaas@gmail.com> wrote:
> > I don't mind a loop, but that one looks broken. We have to clear the
> > bit before we call the function that processes that type of barrier.
> > Otherwise, if we succeed in absorbing the barrier but a new instance
> > of the same barrier arrives meanwhile, we'll fail to realize that we
> > need to absorb the new one.
>
> Here's a new version of the patch for allowing errors in
> barrier-handling functions and/or rejection of barriers by those
> functions. I think this responds to all of the previous review
> comments from Andres. Also, here is an 0002 which is a handy bit of
> test code that I wrote. It's not for commit, but it is useful for
> finding bugs.
>
> In addition to improving 0001 based on the review comments, I also
> tried to write a better commit message for it, but it might still be
> possible to do better there. It's a bit hard to explain the idea in
> the abstract. For ALTER SYSTEM READ ONLY, the idea is that a process
> with an XID -- and possibly a bunch of sub-XIDs, and possibly while
> idle-in-transaction -- can elect to FATAL rather than absorbing the
> barrier. I suspect for other barrier types we might have certain
> (hopefully short) stretches of code where a barrier of a particular
> type can't be absorbed because we're in the middle of doing something
> that relies on the previous value of whatever state is protected by
> the barrier. Holding off interrupts in those stretches of code would
> prevent the barrier from being absorbed, but would also prevent query
> cancel, backend termination, and absorption of other barrier types, so
> it seems possible that just allowing the barrier-absorption function
> for a barrier of that type to just refuse the barrier until after the
> backend exits the critical section of code will work out better.
>
> Just for kicks, I tried running 'make installcheck-parallel' while
> emitting placeholder barriers every 0.05 s after altering the
> barrier-absorption function to always return false, just to see how
> ugly that was. In round figures, it made it take 24 s vs. 21 s, so
> it's actually not that bad. However, it all depends on how many times
> you hit CHECK_FOR_INTERRUPTS() how quickly, so it's easy to imagine
> that the effect might be very non-uniform. That is, if you can get the
> code to be running a tight loop that does little real work but does
> CHECK_FOR_INTERRUPTS() while refusing to absorb outstanding type of
> barrier, it will probably suck. Therefore, I'm inclined to think that
> the fairly strong cautionary logic in the patch is reasonable, but
> perhaps it can be better worded somehow. Thoughts welcome.
>
> I have not rebased the remainder of the patch series over these two.
>
That I'll do.

On a quick look at the latest 0001 patch, the following hunk to reset leftover
flags seems to be unnecessary:

+ /*
+ * If some barrier types were not successfully absorbed, we will have
+ * to try again later.
+ */
+ if (!success)
+ {
+ ResetProcSignalBarrierBits(flags);
+ return;
+ }

When the ProcessBarrierPlaceholder() function returns false without an error,
that barrier flag gets reset within the while loop.  The case when it has an
error, the rest of the flags get reset in the catch block.  Correct me if I am
missing something here.

Regards,
Amul



pgsql-hackers by date:

Previous
From: Andy Fan
Date:
Subject: Wired if-statement in gen_partprune_steps_internal
Next
From: Dmitry Dolgov
Date:
Subject: Re: [PATCH] Keeps tracking the uniqueness with UniqueKey