Re: [Patch] ALTER SYSTEM READ ONLY - Mailing list pgsql-hackers

From Amul Sul
Subject Re: [Patch] ALTER SYSTEM READ ONLY
Date
Msg-id CAAJ_b97pC5MRwG42c9StkZSNYXh9Lq8YittY8NZLcgUcB1LfjA@mail.gmail.com
Whole thread Raw
In response to Re: [Patch] ALTER SYSTEM READ ONLY  (Amul Sul <sulamul@gmail.com>)
List pgsql-hackers
On Thu, Oct 8, 2020 at 3:52 PM Amul Sul <sulamul@gmail.com> wrote:
>
> On Wed, Oct 7, 2020 at 11:19 PM Robert Haas <robertmhaas@gmail.com> wrote:
> >
> > On Wed, Sep 16, 2020 at 3:33 PM Robert Haas <robertmhaas@gmail.com> wrote:
> > > I don't mind a loop, but that one looks broken. We have to clear the
> > > bit before we call the function that processes that type of barrier.
> > > Otherwise, if we succeed in absorbing the barrier but a new instance
> > > of the same barrier arrives meanwhile, we'll fail to realize that we
> > > need to absorb the new one.
> >
> > Here's a new version of the patch for allowing errors in
> > barrier-handling functions and/or rejection of barriers by those
> > functions. I think this responds to all of the previous review
> > comments from Andres. Also, here is an 0002 which is a handy bit of
> > test code that I wrote. It's not for commit, but it is useful for
> > finding bugs.
> >
> > In addition to improving 0001 based on the review comments, I also
> > tried to write a better commit message for it, but it might still be
> > possible to do better there. It's a bit hard to explain the idea in
> > the abstract. For ALTER SYSTEM READ ONLY, the idea is that a process
> > with an XID -- and possibly a bunch of sub-XIDs, and possibly while
> > idle-in-transaction -- can elect to FATAL rather than absorbing the
> > barrier. I suspect for other barrier types we might have certain
> > (hopefully short) stretches of code where a barrier of a particular
> > type can't be absorbed because we're in the middle of doing something
> > that relies on the previous value of whatever state is protected by
> > the barrier. Holding off interrupts in those stretches of code would
> > prevent the barrier from being absorbed, but would also prevent query
> > cancel, backend termination, and absorption of other barrier types, so
> > it seems possible that just allowing the barrier-absorption function
> > for a barrier of that type to just refuse the barrier until after the
> > backend exits the critical section of code will work out better.
> >
> > Just for kicks, I tried running 'make installcheck-parallel' while
> > emitting placeholder barriers every 0.05 s after altering the
> > barrier-absorption function to always return false, just to see how
> > ugly that was. In round figures, it made it take 24 s vs. 21 s, so
> > it's actually not that bad. However, it all depends on how many times
> > you hit CHECK_FOR_INTERRUPTS() how quickly, so it's easy to imagine
> > that the effect might be very non-uniform. That is, if you can get the
> > code to be running a tight loop that does little real work but does
> > CHECK_FOR_INTERRUPTS() while refusing to absorb outstanding type of
> > barrier, it will probably suck. Therefore, I'm inclined to think that
> > the fairly strong cautionary logic in the patch is reasonable, but
> > perhaps it can be better worded somehow. Thoughts welcome.
> >
> > I have not rebased the remainder of the patch series over these two.
> >
> That I'll do.
>

Attaching a rebased version includes Robert's patches for the latest master
head.

> On a quick look at the latest 0001 patch, the following hunk to reset leftover
> flags seems to be unnecessary:
>
> + /*
> + * If some barrier types were not successfully absorbed, we will have
> + * to try again later.
> + */
> + if (!success)
> + {
> + ResetProcSignalBarrierBits(flags);
> + return;
> + }
>
> When the ProcessBarrierPlaceholder() function returns false without an error,
> that barrier flag gets reset within the while loop.  The case when it has an
> error, the rest of the flags get reset in the catch block.  Correct me if I am
> missing something here.
>

Robert, could you please confirm this?

Regards,
Amul

Attachment

pgsql-hackers by date:

Previous
From: Pavel Borisov
Date:
Subject: Valgrind run error with Postgres on OSX
Next
From: Bharath Rupireddy
Date:
Subject: Re: A new function to wait for the backend exit after termination