Thread: global / super barriers (for checksums)

global / super barriers (for checksums)

From
Andres Freund
Date:
Hi,

Magnus cornered me at pgconf.eu and asked me whether I could prototype
the "barriers" I'd been talking about in the online checksumming thread.

The problem there was to make sure that all processes, backends and
auxiliary processes have seen the new state of checksums being enabled,
and aren't currently in the process of writing a new page out.

The current prototype solves that by requiring a restart, but that
strikes me as a far too large hammer.

The attached patch introduces "global barriers" (name was invented in a
overcrowded hotel lounge, so ...), which allow to wait for such a change
to be absorbed by all backends.

I've only tested the code with gdb, but that seems to work:

p WaitForGlobalBarrier(EmitGlobalBarrier(GLOBBAR_CHECKSUM))

waits until all backends (including bgwriter, checkpointers, walwriters,
bgworkers, ...) have accepted interrupts at least once.  Multiple such
requests are coalesced.

I decided to wait until interrupts are actually process, rather than
just the signal received, because that means the system is in a well
defined state. E.g. there's no pages currently being written out.

For the checksum enablement patch you'd do something like;

EnableChecksumsInShmemWithLock();
WaitForGlobalBarrier(EmitGlobalBarrier(GLOBBAR_CHECKSUM));

and after that you should be able to set it to a perstistent mode.


I chose to use procsignals to send the signals, a global uint64
globalBarrierGen, and per-backend barrierGen, barrierFlags, with the
latter keeping track which barriers have been requested. There likely
seem to be other usecases.


The patch definitely is in a prototype stage. At the very least it needs
a high-level comment somewhere, and some of the lower-level code needs
to be cleaned up.

One thing I wasn't happy about is how checksum internals have to absorb
barrier requests - that seems unavoidable, but I'd hope for something
more global than just BufferSync().


Comments?


Greetings,

Andres Freund

Attachment

Re: global / super barriers (for checksums)

From
Magnus Hagander
Date:


On Tue, Oct 30, 2018 at 6:16 AM Andres Freund <andres@anarazel.de> wrote:
Hi,

Magnus cornered me at pgconf.eu and asked me whether I could prototype
the "barriers" I'd been talking about in the online checksumming thread.

The problem there was to make sure that all processes, backends and
auxiliary processes have seen the new state of checksums being enabled,
and aren't currently in the process of writing a new page out.

The current prototype solves that by requiring a restart, but that
strikes me as a far too large hammer.

The attached patch introduces "global barriers" (name was invented in a
overcrowded hotel lounge, so ...), which allow to wait for such a change
to be absorbed by all backends.

I've only tested the code with gdb, but that seems to work:

p WaitForGlobalBarrier(EmitGlobalBarrier(GLOBBAR_CHECKSUM))

waits until all backends (including bgwriter, checkpointers, walwriters,
bgworkers, ...) have accepted interrupts at least once.  Multiple such
requests are coalesced.

I decided to wait until interrupts are actually process, rather than
just the signal received, because that means the system is in a well
defined state. E.g. there's no pages currently being written out.

For the checksum enablement patch you'd do something like;

EnableChecksumsInShmemWithLock();
WaitForGlobalBarrier(EmitGlobalBarrier(GLOBBAR_CHECKSUM));

and after that you should be able to set it to a perstistent mode.


I chose to use procsignals to send the signals, a global uint64
globalBarrierGen, and per-backend barrierGen, barrierFlags, with the
latter keeping track which barriers have been requested. There likely
seem to be other usecases.


The patch definitely is in a prototype stage. At the very least it needs
a high-level comment somewhere, and some of the lower-level code needs
to be cleaned up.

One thing I wasn't happy about is how checksum internals have to absorb
barrier requests - that seems unavoidable, but I'd hope for something
more global than just BufferSync().


Comments?

Finally getting around to playing with this one and it unfortunately doesn't apply anymore (0003).

I think it's just a matter of adding those two rows though, right? That is, it's not an actual conflict it's just something else added in the same place? 

--

Re: global / super barriers (for checksums)

From
Andres Freund
Date:
Hi,

On 2018-12-27 13:54:34 +0100, Magnus Hagander wrote:
> Finally getting around to playing with this one and it unfortunately
> doesn't apply anymore (0003).
> 
> I think it's just a matter of adding those two rows though, right? That is,
> it's not an actual conflict it's just something else added in the same
> place?

What do you mean with "rows" here? I see a bunch of trivial conflicts
due to changes in atomics initialization but nothing else?  And yes, I'd
not expect any meaningful conflicts.

Greetings,

Andres Freund


Re: global / super barriers (for checksums)

From
Magnus Hagander
Date:


On Thu, Dec 27, 2018 at 2:22 PM Andres Freund <andres@anarazel.de> wrote:
Hi,

On 2018-12-27 13:54:34 +0100, Magnus Hagander wrote:
> Finally getting around to playing with this one and it unfortunately
> doesn't apply anymore (0003).
>
> I think it's just a matter of adding those two rows though, right? That is,
> it's not an actual conflict it's just something else added in the same
> place?

What do you mean with "rows" here? I see a bunch of trivial conflicts
due to changes in atomics initialization but nothing else?  And yes, I'd
not expect any meaningful conflicts.


Sorry, lack of caffeine. The conflict I saw was:

        /* Initialize lockGroupMembers list. */
        dlist_init(&procs[i].lockGroupMembers);
+
+       pg_atomic_init_u32(&procs[i].barrierFlags, 0);
+       pg_atomic_init_u64(&procs[i].barrierGen, PG_UINT64_MAX);
    }
 

So yes, I'm pretty sure we're talking about the same thing.

--

Re: global / super barriers (for checksums)

From
Magnus Hagander
Date:
On Tue, Oct 30, 2018 at 6:16 AM Andres Freund <andres@anarazel.de> wrote:
Hi,

Magnus cornered me at pgconf.eu and asked me whether I could prototype
the "barriers" I'd been talking about in the online checksumming thread.

The problem there was to make sure that all processes, backends and
auxiliary processes have seen the new state of checksums being enabled,
and aren't currently in the process of writing a new page out.

The current prototype solves that by requiring a restart, but that
strikes me as a far too large hammer.

The attached patch introduces "global barriers" (name was invented in a
overcrowded hotel lounge, so ...), which allow to wait for such a change
to be absorbed by all backends.

I've only tested the code with gdb, but that seems to work:

p WaitForGlobalBarrier(EmitGlobalBarrier(GLOBBAR_CHECKSUM))

waits until all backends (including bgwriter, checkpointers, walwriters,
bgworkers, ...) have accepted interrupts at least once.  Multiple such
requests are coalesced.

I decided to wait until interrupts are actually process, rather than
just the signal received, because that means the system is in a well
defined state. E.g. there's no pages currently being written out.

For the checksum enablement patch you'd do something like;

EnableChecksumsInShmemWithLock();
WaitForGlobalBarrier(EmitGlobalBarrier(GLOBBAR_CHECKSUM));

and after that you should be able to set it to a perstistent mode.


I chose to use procsignals to send the signals, a global uint64
globalBarrierGen, and per-backend barrierGen, barrierFlags, with the
latter keeping track which barriers have been requested. There likely
seem to be other usecases.


The patch definitely is in a prototype stage. At the very least it needs
a high-level comment somewhere, and some of the lower-level code needs
to be cleaned up.

One thing I wasn't happy about is how checksum internals have to absorb
barrier requests - that seems unavoidable, but I'd hope for something
more global than just BufferSync().


Comments?



Finally getting back to this one.

In re-reading this, I notice there are a lot of references to Intterrupt (with two t). I'm guessing this is just a spelling error, and not something that actually conveys some meaning?

Can you elaborate on what you mean with:
+       /* XXX: need a more principled approach here */

Is that the thing you refer to above about "checksum internals"?

Also in checking we figured it'd be nice to have a wait event for this, since a process can potentially get stuck in an infinite loop waiting for some other process if it's misbehaving. Kind of like the attached?

//Magnus
 
Attachment

Re: global / super barriers (for checksums)

From
Andres Freund
Date:
Hi,

On 2019-07-10 15:31:11 +0200, Magnus Hagander wrote:
> In re-reading this, I notice there are a lot of references to Intterrupt
> (with two t). I'm guessing this is just a spelling error, and not something
> that actually conveys some meaning?

Just a spelling error. I think I wrote the patch in a night after
pgconf.eu, to allow you to quickly make progress :P


> Can you elaborate on what you mean with:
> +       /* XXX: need a more principled approach here */

> Is that the thing you refer to above about "checksum internals"?

I think I didn't actually mean "checksum" but instead "checkpoint".  It
does bother me that we have an operation as long-running as BufferSync()
commonly is, without a proper way to accept event.  There's a hack for
doing something similar-ish in CheckpointWriteDelay(), for absorbing
fsync requests, but it doesn't trigger for checkpoints not done in
checkpointer, nor is it really extensible.


> Also in checking we figured it'd be nice to have a wait event for this,
> since a process can potentially get stuck in an infinite loop waiting for
> some other process if it's misbehaving. Kind of like the attached?

Yea, that makes sense.

Greetings,

Andres Freund



Re: global / super barriers (for checksums)

From
Robert Haas
Date:
On Tue, Oct 30, 2018 at 1:17 AM Andres Freund <andres@anarazel.de> wrote:
> The patch definitely is in a prototype stage. At the very least it needs
> a high-level comment somewhere, and some of the lower-level code needs
> to be cleaned up.
>
> One thing I wasn't happy about is how checksum internals have to absorb
> barrier requests - that seems unavoidable, but I'd hope for something
> more global than just BufferSync().

Hi,

TL;DR: I'm not sure that we need 0001; I propose to commit 0002; and I
have some concerns about 0003 and am interested in working further on
it.

0001 changes the StartBackgroundWorker so that the SIGINT handler is
contingent on BGWORKER_SHMEM_ACCESS rather than
BGWORKER_BACKEND_DATABASE_CONNECTION. It seems to me that the goal
here should be to use procsignal_sigusr1_handler(), or something that
calls it, in any process where ProcSignalInit() is called, but a
backend that only requests BGWORKER_SHMEM_ACCESS probably won't,
because the normal way for a background worker to call
ProcSignalInit() would be to indirectly call InitPostgres() by means
of BackgroundWorkerInitializeConnection() or
BackgroundWorkerInitializeConnectionByOid(), and a worker that didn't
ask for a database connection presumably shouldn't be doing that. So
I'm not sure that I understand why we need this. (Is it legal for a
worker to call one of these functions as long as it passes InvalidOid
for the database OID, or something? Do we have examples of workers
that do that?)

On the other hand, 0002 seems like it's pretty clearly a good idea. It
makes a whole bunch of auxiliary processes use
procsignal_sigusr1_handler() and those things all get called from
AuxiliaryProcessMain(), which does ProcSignalInit(), and it seems like
clearly the right idea that processes which register to participate in
the procsignal mechanism should also register to get notified if they
receive a procsignal. I think that the reason we haven't bothered with
this up until now is because I think that it's presently impossible
for any of the kind of procsignals that we have to get sent to any of
those processes. But, global barriers would require us to do so, so it
seems like it's time to tighten that up, and it doesn't really cost
anything. So I propose to commit this part soon, unless somebody
objects.

Regarding 0003:

- The proc->pid == 0 check in EmitGlobalBarrier() doesn't really do
what it appears to do, because regular backends don't clear proc->pid
when they exit; only auxiliary processes do. (We could, and perhaps
should, change that.)

- It seems  to me that it would be generally better to insert
CHECK_FOR_INTERRUPTS() in places where the patch just inserts an
ad-hoc if (GlobalBarrierInterruptPending)
ProcessGlobalBarrierIntterupt(). There might be someplace where we
have to do it the latter way because we unavoidably hold an LWLock or
something, but I think we should avoid that whenever possible. If it's
a good place to check for one kind of interrupt, it's probably a good
place to check for all of them.

- I don't think BufferSync() is doing this in the right place. Unless
I am confused, it's doing it inside a loop that just allocates stuff
and copies data around, which probably does not need this kind of
decoration. It wouldn't hurt to check here, and it might be a good
idea for safety, but I think the place that really needs this
treatment is the following loop where we are actually doing I/O and
sleeping. Possibly we should even be doctoring CheckpointWriteDelay to
use a latch-wait loop that can CHECK_FOR_INTERRUPTS() before
re-sleeping, although for 100ms (curiously described as a Big Sleep)
it might be overkill.

- I think it would be nice to make this system more extensible. IIUC,
the idea is that ProcessGlobalBarrierIntterupt() will call bespoke
code for each GLOBBAR_* flag that set, and that code will then update
process-local state from global state before returning. But, instead
of hard-coding constants, how about just having a list of callbacks,
and we call them all here, and each one is responsible for figuring
out whether anything's been changed that it cares about, and if so
updating the appropriate local state? Then GlobalBarrierKind goes away
altogether. Granted, this could be a bit expensive if we were using
global barriers for lots of different things and at least some of
those things occurred frequently, but I don't think we're very near
the point where we need that kind of optimization. As long as the
callbacks have tests that exit quickly if there's nothing to do, I
think it should be fine. (As an intermediate position, we could
consider keeping barrierFlags but assign the bits dynamically; but
unless and until we get more users of the facility, I think it might
be simplest to just forget about barrierFlags altogether. Then even
extensions can use this, if they want.)

- The patch needs some general tidying up, like comments and naming
consistency and stuff like that.

Andres, Magnus, if neither of you are planning to work on this soon, I
might like to jump in and run with this. Please let me know your
thoughts.

Thanks,

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: global / super barriers (for checksums)

From
Andres Freund
Date:
Hi,

On 2019-11-13 12:26:34 -0500, Robert Haas wrote:
> TL;DR: I'm not sure that we need 0001; I propose to commit 0002; and I
> have some concerns about 0003 and am interested in working further on
> it.

Thanks for looking at the patch!


> 0001 changes the StartBackgroundWorker so that the SIGINT handler is
> contingent on BGWORKER_SHMEM_ACCESS rather than
> BGWORKER_BACKEND_DATABASE_CONNECTION. It seems to me that the goal
> here should be to use procsignal_sigusr1_handler(), or something that
> calls it, in any process where ProcSignalInit() is called, but a
> backend that only requests BGWORKER_SHMEM_ACCESS probably won't,
> because the normal way for a background worker to call
> ProcSignalInit() would be to indirectly call InitPostgres() by means
> of BackgroundWorkerInitializeConnection() or
> BackgroundWorkerInitializeConnectionByOid(), and a worker that didn't
> ask for a database connection presumably shouldn't be doing that. So
> I'm not sure that I understand why we need this. (Is it legal for a
> worker to call one of these functions as long as it passes InvalidOid
> for the database OID, or something? Do we have examples of workers
> that do that?)

Hm. Well, it seems useful for non-database connected processes to be
able to partake in global barriers. Without that, if there's any chance
they could e.g. generate WAL, it'd e.g. break the checksum enablement
patch. Note that auxiliary processes already do call ProcSignalInit().

You're right that we ought to make it easier (or automatic) to call
ProcSignalInit() for such processes.  Perhaps we ought to do so in
ProcessInit()?

But perhaps we don't strictly need this - I'm not sure how many examples
of BGWORKER_SHMEM_ACCESS bgworkers that don't also use
BGWORKER_BACKEND_DATABASE_CONNECTION there are.



> Regarding 0003:
> 
> - The proc->pid == 0 check in EmitGlobalBarrier() doesn't really do
> what it appears to do, because regular backends don't clear proc->pid
> when they exit; only auxiliary processes do. (We could, and perhaps
> should, change that.)

Ick.


> - It seems  to me that it would be generally better to insert
> CHECK_FOR_INTERRUPTS() in places where the patch just inserts an
> ad-hoc if (GlobalBarrierInterruptPending)
> ProcessGlobalBarrierIntterupt(). There might be someplace where we
> have to do it the latter way because we unavoidably hold an LWLock or
> something, but I think we should avoid that whenever possible. If it's
> a good place to check for one kind of interrupt, it's probably a good
> place to check for all of them.

I might be missing something. Aren't all of the places where those
checks are places where we currently can't do a CHECK_FOR_INTERRUPTS()?
I've swapped this thoroughly out of my mind, but going through them:

1) AutoVacLauncherMain() - doesn't do CFI()
2) BackgroundWriterMain() - dito
3) CheckpointerMain() - dito
4) HandleStartupProcInterrupts() - dito
5) WalWriterMain() - dito
6) BufferSync() - dito, called from CheckpointerMain(), and startup process
7) ProcessClientWriteInterrupt() - can't do generic CFI, don't want to
   process all interrupts while writing out data, to avoid corrupting
   the output stream or loosing messages

Which one do you think we should convert to CFI()? As far as I can tell
we can't make the non-backend cases use the postgres.c
ProcessInterrupts(), and the ProcessClientWriteInterrupt() one can't do
so either.


> - I don't think BufferSync() is doing this in the right place. Unless
> I am confused, it's doing it inside a loop that just allocates stuff
> and copies data around, which probably does not need this kind of
> decoration. It wouldn't hurt to check here, and it might be a good
> idea for safety, but I think the place that really needs this
> treatment is the following loop where we are actually doing I/O and
> sleeping. Possibly we should even be doctoring CheckpointWriteDelay to
> use a latch-wait loop that can CHECK_FOR_INTERRUPTS() before
> re-sleeping, although for 100ms (curiously described as a Big Sleep)
> it might be overkill.

Yea. Not sure what happened there. I think it's good to have such a
check in all the buffer loops in BufferSync(), but clearly it's most
important to have one in the write case.


> - I think it would be nice to make this system more extensible. IIUC,
> the idea is that ProcessGlobalBarrierIntterupt() will call bespoke
> code for each GLOBBAR_* flag that set, and that code will then update
> process-local state from global state before returning. But, instead
> of hard-coding constants, how about just having a list of callbacks,
> and we call them all here, and each one is responsible for figuring
> out whether anything's been changed that it cares about, and if so
> updating the appropriate local state?

I don't think that's a good idea. This stuff happens at an extremely low
level, in different types of processes, with various locks held
etc. Running arbitrary code in those circumstances strikes me as a
seriously bad idea.


> Then GlobalBarrierKind goes away altogether. Granted, this could be a
> bit expensive if we were using global barriers for lots of different
> things and at least some of those things occurred frequently, but I
> don't think we're very near the point where we need that kind of
> optimization. As long as the callbacks have tests that exit quickly if
> there's nothing to do, I think it should be fine. (As an intermediate
> position, we could consider keeping barrierFlags but assign the bits
> dynamically; but unless and until we get more users of the facility, I
> think it might be simplest to just forget about barrierFlags
> altogether. Then even extensions can use this, if they want.)

I think we need the option to *not* "accept" the barrier (as in, not
update MyProc->barrierGen), because we cannot guarantee the option has
taken effect yet. Without something identifying which types of events
being waited for, that seems hard.


> - The patch needs some general tidying up, like comments and naming
> consistency and stuff like that.

Yea. It really was a prototype to allow Magnus to continue...


> Andres, Magnus, if neither of you are planning to work on this soon, I
> might like to jump in and run with this. Please let me know your
> thoughts.

I'm not planning to do so in the near term - so I'd more than welcome
you to do so.

Greetings,

Andres Freund



Re: global / super barriers (for checksums)

From
Robert Haas
Date:
On Wed, Nov 13, 2019 at 2:45 PM Andres Freund <andres@anarazel.de> wrote:
> I might be missing something. Aren't all of the places where those
> checks are places where we currently can't do a CHECK_FOR_INTERRUPTS()?
> I've swapped this thoroughly out of my mind, but going through them:
>
> 1) AutoVacLauncherMain() - doesn't do CFI()
> 2) BackgroundWriterMain() - dito
> 3) CheckpointerMain() - dito
> 4) HandleStartupProcInterrupts() - dito
> 5) WalWriterMain() - dito
> 6) BufferSync() - dito, called from CheckpointerMain(), and startup process
> 7) ProcessClientWriteInterrupt() - can't do generic CFI, don't want to
>    process all interrupts while writing out data, to avoid corrupting
>    the output stream or loosing messages
>
> Which one do you think we should convert to CFI()? As far as I can tell
> we can't make the non-backend cases use the postgres.c
> ProcessInterrupts(), and the ProcessClientWriteInterrupt() one can't do
> so either.

I haven't looked through all of these, but in AutoVacLauncherMain, a
trivial conversion to CFI doesn't seem to break anything horribly (see
attached). It does change the error message when we exit, making it
chattier, but I think we could find our way around that problem. Note
that AutoVacLauncherMain, and some of the others, do a
HOLD_INTERRUPTS() and a RESUME_INTERRUPTS() in the error-recovery
block, so somebody evidently thought some of that code might call
CHECK_FOR_INTERRUPTS(), and I can't prove off-hand that none of the
other logic which isn't so protected doesn't have some way to reach
CHECK_FOR_INTERRUPTS(). It seems to me that there are so many places
where PostgreSQL calls CHECK_FOR_INTERRUPTS() that it's somewhat
unwise to assume that just "can't happen."

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

Re: global / super barriers (for checksums)

From
Magnus Hagander
Date:


On Wed, Nov 13, 2019 at 8:45 PM Andres Freund <andres@anarazel.de> wrote:
Hi,

On 2019-11-13 12:26:34 -0500, Robert Haas wrote:
> TL;DR: I'm not sure that we need 0001; I propose to commit 0002; and I
> have some concerns about 0003 and am interested in working further on
> it.

Thanks for looking at the patch!

+1 (well, +<more than one>, but there is a quota)


> - The patch needs some general tidying up, like comments and naming
> consistency and stuff like that.

Yea. It really was a prototype to allow Magnus to continue...

And a very useful one! :) So thanks for that one as well.


> Andres, Magnus, if neither of you are planning to work on this soon, I
> might like to jump in and run with this. Please let me know your
> thoughts.

I'm not planning to do so in the near term - so I'd more than welcome
you to do so.

I'm definitely happy to work with it, but I did not and do not feel I have the skills for doing the "proper review" needed for it. So I am also very happy for you to pick it up and run with it.

--

Re: global / super barriers (for checksums)

From
Robert Haas
Date:
On Wed, Nov 13, 2019 at 12:26 PM Robert Haas <robertmhaas@gmail.com> wrote:
> On the other hand, 0002 seems like it's pretty clearly a good idea. It
> makes a whole bunch of auxiliary processes use
> procsignal_sigusr1_handler() and those things all get called from
> AuxiliaryProcessMain(), which does ProcSignalInit(), and it seems like
> clearly the right idea that processes which register to participate in
> the procsignal mechanism should also register to get notified if they
> receive a procsignal. I think that the reason we haven't bothered with
> this up until now is because I think that it's presently impossible
> for any of the kind of procsignals that we have to get sent to any of
> those processes. But, global barriers would require us to do so, so it
> seems like it's time to tighten that up, and it doesn't really cost
> anything. So I propose to commit this part soon, unless somebody
> objects.

Done.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: global / super barriers (for checksums)

From
Robert Haas
Date:
On Sun, Nov 17, 2019 at 8:38 AM Magnus Hagander <magnus@hagander.net> wrote:
> I'm definitely happy to work with it, but I did not and do not feel I have the skills for doing the "proper review"
neededfor it. So I am also very happy for you to pick it up and run with it.
 

OK, here's what I came up with.

0001 is just a code movement patch. It puts the interrupt handling for
each type of background process into a subroutine, instead of having
it all inline in the main loop. I feel this makes things more clear.
Hopefully it's uncontroversial.

0002 and 0003 are optional and slightly off-topic as far as the matter
at hand, but they are not entirely unrelated and I believe that they
are good cleanups.  0002 changes assorted background processes to use
PostgresSigHupHandler and ConfigReloadPending rather than having a
bunch of separate global variables and a bunch of separate signal
handlers that basically do the same stuff. 0003 takes this a step
further by trying to unify a bunch more error-handling across
different background process types. Together, 0002 and 0003 save >300
lines of code and as far as I can see there's basically no downside. I
think in general it would be good to strive towards a future where
everybody used the same signal handling and interrupt handling, but
these patches have the much less ambitious goal of just removing
overtly duplicated code. I think they might act as an inducement to
future patch authors to try to make things look more alike, though.

0004 is the main patch. It is substantially revised from Andres's
version, but the basic principle is along similar lines. Changes:

- I felt that it was imprudent to put some of the machinery into
PGPROC and the rest into the ProcSignal mechanism, because there is
nothing that says that everything with a PGPROC must also be a
ProcSignal participant. So I moved everything over into the ProcSignal
mechanism.  This seems a lot safer to me.

- I accordingly renamed this thing from GlobalBarrier to
ProcSignalBarrier. This is a lot more arguable, but as of now I favor
it.

- There details of the synchronization are different: there's no
ProcSignalKind for this new kind of barrier any more, and there's now
only one pg_memory_barrier() and it's in a different place than Andres
had it. It is quite possible that I have screwed things up here, but I
couldn't make heads or tails of the way Andres had it. I don't know
whether that's because it was a POC or because I'm dumb. Also, moving
everything over into ProcSignal seemed to me to suggest some
rejiggering here that might've made less sense in the old scheme.

- I changed the code that waits for a barrier to use a latch wait,
because I think that's our usual convention. So wait-for-barrier gets
woken up early if the latch is set, and also itself manipulates the
latch.  I also gave it a variable-length timeout, because when testing
it irked me that the same test randomly took either 0s or 1s. Now it
takes either 0s or 125ms, which is a lot less noticeable.

- I changed the kind of barrier provided by the patch from "checksum"
to "sample".

- Otherwise hacked on comments and naming somewhat.

This patch probably needs some more changes before commit. One likely
candidate: the use of a sample barrier type here probably needs to be
replaced with something else. But the way this is set up doesn't
really lend itself to starting with 0 types of barrier and then adding
them later, so I am not sure exactly what to do to make this patch
independent of what follows. The callback proposal I made before
could've accommodated that (as well as, perhaps, some automated
testing by dynamically loading up a barrier type) but Andres didn't
like that. Perhaps he'll relent, or have a counter-proposal, or
someone else will feel differently.

0005 is test code that sends and waits for a sample barrier.

Comments?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

Re: global / super barriers (for checksums)

From
Andres Freund
Date:
Hi,

Thanks for the updated version!

On 2019-12-02 13:06:24 -0500, Robert Haas wrote:
> On Sun, Nov 17, 2019 at 8:38 AM Magnus Hagander <magnus@hagander.net> wrote:
> > I'm definitely happy to work with it, but I did not and do not feel I have the skills for doing the "proper review"
neededfor it. So I am also very happy for you to pick it up and run with it.
 
> 
> OK, here's what I came up with.
> 
> 0001 is just a code movement patch. It puts the interrupt handling for
> each type of background process into a subroutine, instead of having
> it all inline in the main loop. I feel this makes things more clear.
> Hopefully it's uncontroversial.

I'd suggest marking the shutdown functions as noreturn. Might prevent
some compiler warnings in the future.


> 0002 and 0003 are optional and slightly off-topic as far as the matter
> at hand, but they are not entirely unrelated and I believe that they
> are good cleanups.  0002 changes assorted background processes to use
> PostgresSigHupHandler and ConfigReloadPending rather than having a
> bunch of separate global variables and a bunch of separate signal
> handlers that basically do the same stuff.

Hm. Why "Postgres*"? I think that's confusing. So far only backend like
things are named like that, and I'd e.g. not call checkpointer that.

It's probably fine, but it's worthwhile to consider that that checking an
external variable like ConfigReloadPending is more expensive than a
static var like got_SIGHUP. The compiler will most of the time
understand the code flow to understand when something like got_SIGHUP
can change, but it can't assume anything like that with
ConfigReloadPending. But as we're forcing the compilers hand, by marking
the variable as volatile, so it's not likely to be a meaningful difference.


> 0003 takes this a step
> further by trying to unify a bunch more error-handling across
> different background process types. Together, 0002 and 0003 save >300
> lines of code and as far as I can see there's basically no downside. I
> think in general it would be good to strive towards a future where
> everybody used the same signal handling and interrupt handling, but
> these patches have the much less ambitious goal of just removing
> overtly duplicated code. I think they might act as an inducement to
> future patch authors to try to make things look more alike, though.

I do agree that this is a good goal. The amount of redundant code we
have across different types of processes is utterly ridiculous. While
obviously not a significant problem, there's also


> 0004 is the main patch. It is substantially revised from Andres's
> version, but the basic principle is along similar lines. Changes:
> 
> - I felt that it was imprudent to put some of the machinery into
> PGPROC and the rest into the ProcSignal mechanism, because there is
> nothing that says that everything with a PGPROC must also be a
> ProcSignal participant. So I moved everything over into the ProcSignal
> mechanism.  This seems a lot safer to me.

I don't fully remember how I ended up with that split. But I think there
is an argument that putting the generation in PGPROC is good: That way
there are fewer changes to the shared cachelines in the ProcSignal. As I
think there'll be more modifications to barrierGen/barrierFlags (as I'd
named them) than the number of times they're read, that could be
beneficial in keeping the overhead low.

It's probably still worthwhile to go the way you have.


> - There details of the synchronization are different: there's no
> ProcSignalKind for this new kind of barrier any more

This I do not get. What's the point in making this unconditional?


> , and there's now
> only one pg_memory_barrier() and it's in a different place than Andres
> had it. It is quite possible that I have screwed things up here, but I
> couldn't make heads or tails of the way Andres had it. I don't know
> whether that's because it was a POC or because I'm dumb. Also, moving
> everything over into ProcSignal seemed to me to suggest some
> rejiggering here that might've made less sense in the old scheme.

I don't remember enough to comment usefully on why things were the way
they are.


> - I changed the code that waits for a barrier to use a latch wait,
> because I think that's our usual convention. So wait-for-barrier gets
> woken up early if the latch is set, and also itself manipulates the
> latch.  I also gave it a variable-length timeout, because when testing
> it irked me that the same test randomly took either 0s or 1s. Now it
> takes either 0s or 125ms, which is a lot less noticeable.

Sounds good.


> From e9bfcbc3a1b1aa4a01a5a281bc71c6f019f5d46b Mon Sep 17 00:00:00 2001
> From: Robert Haas <rhaas@postgresql.org>
> Date: Wed, 27 Nov 2019 11:29:25 -0500
> Subject: [PATCH v2 4/5] Extend the ProcSignal mechanism to support barriers.



> +uint64
> +EmitProcSignalBarrier(ProcSignalBarrierType type)
> +{
> +    uint64    flagbit = UINT64CONST(1) << (uint64) type;
> +    uint64    generation;
> +
> +    /*
> +     * Set all the flags.
> +     *
> +     * Note that pg_atomic_fetch_or_u32 has full barrier semantics, so this
> +     * is totally ordered with respect to anything the caller did before, and
> +     * anything that we do afterwards. (This is also true of the later call
> +     * to pg_atomic_add_fetch_u64.)
> +     */
> +    for (int i = NumProcSignalSlots - 1; i >= 0; i--)
> +    {
> +        volatile ProcSignalSlot *slot = &ProcSignal->psh_slot[i];
> +
> +        pg_atomic_fetch_or_u32(&slot->pss_barrierCheckMask, flagbit);
> +    }

What's the reason for doing this in reverse? SendProcSignal() does it
because short circuits the search once a match is found, and it
theorizes they're at the end, but that doesn't apply here?

It could potentially be worthwhile to check whether all backends already
have the flag set, and in that case not increment the barrier
generation? Probably not worth it for now.


> +/*
> + * WaitForProcSignalBarrier - wait until it is guaranteed that all changes
> + * requested by a specific call to EmitProcSignalBarrier() have taken effect.
> + *
> + * We expect that the barrier will normally be absorbed very quickly by other
> + * backends, so we start by waiting just 1/8 of a second and then back off
> + * by a factor of two every time we time out, to a maximum wait time of
> + * 1 second.
> + */
> +void
> +WaitForProcSignalBarrier(uint64 generation)
> +{
> +    long    timeout = 125L;
> +
> +    for (int i = NumProcSignalSlots - 1; i >= 0; i--)
> +    {
> +        volatile ProcSignalSlot *slot = &ProcSignal->psh_slot[i];
> +        uint64    oldval;
> +
> +        oldval = pg_atomic_read_u64(&slot->pss_barrierGeneration);
> +        while (oldval < generation)

Hm. without a barrier here, is it guaranteed that we can rely on the
state we need? IOW, could it be that we see an update to
pss_barrierGeneration to our desired generation, but still see
"published" state by some other backend that isn't yet in accordance to
the the generation? Seems we ought to have at least a read barrier
somewhere?



> +/*
> + * Perform global barrier related interrupt checking.
> + *
> + * Any backend that participates in ProcSignal signalling must arrange to
> + * call this function periodically. It is called from CHECK_FOR_INTERRUPTS(),
> + * which is enough for normal backends, but not necessarily for all types of
> + * background processes.
> + */
> +void
> +ProcessProcSignalBarrier(void)
> +{
> +    uint64 generation;
> +    uint32 flags;
> +
> +    /* Exit quickly if there's no work to do. */
> +    if (!ProcSignalBarrierPending)
> +        return;
> +    ProcSignalBarrierPending = false;
> +
> +    /*
> +     * Read the current barrier generation, and then get the flags that
> +     * are set for this backend. Note that pg_atomic_exchange_u32 is a full
> +     * barrier, so we're guaranteed that the read of the barrier generation
> +     * happens before we atomically extract the flags, and that any subsequent
> +     * state changes happen afterward.
> +     */
> +    generation = pg_atomic_read_u64(&ProcSignal->psh_barrierGeneration);
> +    flags = pg_atomic_exchange_u32(&MyProcSignalSlot->pss_barrierCheckMask, 0);
> +
> +    /*
> +     * Process each type of barrier. It's important that nothing we call from
> +     * here throws an error, because pss_barrierCheckMask has already been
> +     * cleared. If we jumped out of here before processing all barrier types,
> +     * then we'd forget about the need to do so later.
> +     *
> +     * NB: It ought to be OK to call the barrier-processing functions
> +     * unconditionally, but it's more efficient to call only the ones that
> +     * might need us to do something based on the flags.
> +     */
> +    if (BARRIER_SHOULD_CHECK(flags, PROCSIGNAL_BARRIER_SAMPLE))
> +        ProcessBarrierSample();

Hm. In my mental model it would be useful for barrier "processors" to
not acknowledge the state change at certain points. Imagine e.g. needing
to efficiently wait till all backends have processed a config file
reload - since we don't reload while a query is being processed, we
should be able to not ack the barrier at that point.



Greetings,

Andres Freund



Re: global / super barriers (for checksums)

From
Robert Haas
Date:
On Fri, Dec 6, 2019 at 12:17 PM Andres Freund <andres@anarazel.de> wrote:
> > 0001 is just a code movement patch. It puts the interrupt handling for
> > each type of background process into a subroutine, instead of having
> > it all inline in the main loop. I feel this makes things more clear.
> > Hopefully it's uncontroversial.
>
> I'd suggest marking the shutdown functions as noreturn. Might prevent
> some compiler warnings in the future.

That patch only adds one shutdown function, and it's already marked as
noreturn, because it prevents a compiler warning in the present. So I
don't know what you're asking for here.

> > 0002 and 0003 are optional and slightly off-topic as far as the matter
> > at hand, but they are not entirely unrelated and I believe that they
> > are good cleanups.  0002 changes assorted background processes to use
> > PostgresSigHupHandler and ConfigReloadPending rather than having a
> > bunch of separate global variables and a bunch of separate signal
> > handlers that basically do the same stuff.
>
> Hm. Why "Postgres*"? I think that's confusing. So far only backend like
> things are named like that, and I'd e.g. not call checkpointer that.

Well, that's the existing function name, and 0003 changes it anyway.

> It's probably fine, but it's worthwhile to consider that that checking an
> external variable like ConfigReloadPending is more expensive than a
> static var like got_SIGHUP. The compiler will most of the time
> understand the code flow to understand when something like got_SIGHUP
> can change, but it can't assume anything like that with
> ConfigReloadPending. But as we're forcing the compilers hand, by marking
> the variable as volatile, so it's not likely to be a meaningful difference.

Yeah, I think it's pretty much certain to be in the noise.

> > 0003 takes this a step
> > further by trying to unify a bunch more error-handling across
> > different background process types. Together, 0002 and 0003 save >300
> > lines of code and as far as I can see there's basically no downside. I
> > think in general it would be good to strive towards a future where
> > everybody used the same signal handling and interrupt handling, but
> > these patches have the much less ambitious goal of just removing
> > overtly duplicated code. I think they might act as an inducement to
> > future patch authors to try to make things look more alike, though.
>
> I do agree that this is a good goal. The amount of redundant code we
> have across different types of processes is utterly ridiculous. While
> obviously not a significant problem, there's also

You seem to have omitted the end of this sentence.

> > 0004 is the main patch. It is substantially revised from Andres's
> > version, but the basic principle is along similar lines. Changes:
> >
> > - I felt that it was imprudent to put some of the machinery into
> > PGPROC and the rest into the ProcSignal mechanism, because there is
> > nothing that says that everything with a PGPROC must also be a
> > ProcSignal participant. So I moved everything over into the ProcSignal
> > mechanism.  This seems a lot safer to me.
>
> I don't fully remember how I ended up with that split. But I think there
> is an argument that putting the generation in PGPROC is good: That way
> there are fewer changes to the shared cachelines in the ProcSignal. As I
> think there'll be more modifications to barrierGen/barrierFlags (as I'd
> named them) than the number of times they're read, that could be
> beneficial in keeping the overhead low.
>
> It's probably still worthwhile to go the way you have.

I think so. I think there's just too much chance for error/frustration
in having it in two separate mechanisms that might end up out of sync
with each other. We could pad the stuff within ProcSignal separately,
if required, but I'm hoping we're not going to be using this for
properties that change frequently.

> > - There details of the synchronization are different: there's no
> > ProcSignalKind for this new kind of barrier any more
>
> This I do not get. What's the point in making this unconditional?

I wasn't sure about this part, but Occam's razor argues for fewer
moving parts. As I have it, you need to make sure that the signal
handler is going to see our proc's generation as lower than the global
generation, and that when we go to process the interrupt (no longer in
the signal handler) we're going to be able to see the state changes
that we need to absorb. Now, if you also have a ProcSignal thing, then
you have three things to synchronize. When you see the flag set, you
need to be able to see the generations as unequal, and when you see
the generations as unequal, then you need to be able to see the state
changes that you need to absorb. That seems more complex, but it might
be worth it if the cost of comparing the barrier generations is too
much.

> > +uint64
> > +EmitProcSignalBarrier(ProcSignalBarrierType type)
> > +{
> > +     uint64  flagbit = UINT64CONST(1) << (uint64) type;
> > +     uint64  generation;
> > +
> > +     /*
> > +      * Set all the flags.
> > +      *
> > +      * Note that pg_atomic_fetch_or_u32 has full barrier semantics, so this
> > +      * is totally ordered with respect to anything the caller did before, and
> > +      * anything that we do afterwards. (This is also true of the later call
> > +      * to pg_atomic_add_fetch_u64.)
> > +      */
> > +     for (int i = NumProcSignalSlots - 1; i >= 0; i--)
> > +     {
> > +             volatile ProcSignalSlot *slot = &ProcSignal->psh_slot[i];
> > +
> > +             pg_atomic_fetch_or_u32(&slot->pss_barrierCheckMask, flagbit);
> > +     }
>
> What's the reason for doing this in reverse? SendProcSignal() does it
> because short circuits the search once a match is found, and it
> theorizes they're at the end, but that doesn't apply here?

Mmm, this was just copy and paste. I can switch it around.

> It could potentially be worthwhile to check whether all backends already
> have the flag set, and in that case not increment the barrier
> generation? Probably not worth it for now.

I don't think it's worth it now, or maybe ever. It seems unlikely to
come up, and it makes the synchronization more complex.

> > +void
> > +WaitForProcSignalBarrier(uint64 generation)
> > +{
> > +     long    timeout = 125L;
> > +
> > +     for (int i = NumProcSignalSlots - 1; i >= 0; i--)
> > +     {
> > +             volatile ProcSignalSlot *slot = &ProcSignal->psh_slot[i];
> > +             uint64  oldval;
> > +
> > +             oldval = pg_atomic_read_u64(&slot->pss_barrierGeneration);
> > +             while (oldval < generation)
>
> Hm. without a barrier here, is it guaranteed that we can rely on the
> state we need? IOW, could it be that we see an update to
> pss_barrierGeneration to our desired generation, but still see
> "published" state by some other backend that isn't yet in accordance to
> the the generation? Seems we ought to have at least a read barrier
> somewhere?

Yeah, I think now that you mention it it should be a full barrier. The
caller might see that the barrier has been absorbed and immediately
perform either a read or a write.

> Hm. In my mental model it would be useful for barrier "processors" to
> not acknowledge the state change at certain points. Imagine e.g. needing
> to efficiently wait till all backends have processed a config file
> reload - since we don't reload while a query is being processed, we
> should be able to not ack the barrier at that point.

Yeah, you mentioned this before, but I think we could leave it for
later. I think it's going to be complex. I suppose the way to do it
would be to add an argument to ProcessProcSignalBarrier() that lets
you specify which barrier events you're OK with processing from the
current point in the code. However, that will mean that whenever
somebody adds a new barrier type, they have got to go through all of
those callers and think about whether they should add their new
barrier type into the flags or not. If we try to do the specific thing
you're talking about with config-file processing, it will also mean
that we could be waiting MUCH longer for acknowledgements, which I
think might have pretty far-reaching ramifications. You might get
stuck waiting for that barrier to be absorbed for hours, and that
would also impact later barriers, since you won't see the generation
bump to 42 until it goes through 40 and 41. That sounds fairly awful.
You could try to work around it by looking at which barrier flags are
set, but I think that has ABA problems.

It seems sufficient to me for now to come up with something that can
be used for enabling checksums online and for the thing that got me
motivated to work on this, which is ALTER SYSTEM READ ONLY. If we can
meet the requirements of those two patches with this, I think we
should be happy. And I don't think we need the feature you're talking
about here for either of those use cases. If there's some simple
change you want to propose, we can certainly talk about that, but it
looks to me like you're talking about adding a fairly complex
mechanism that wouldn't actually get used for anything right away, and
I'd rather not go there.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

Re: global / super barriers (for checksums)

From
Daniel Gustafsson
Date:
> On 9 Dec 2019, at 16:42, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Dec 6, 2019 at 12:17 PM Andres Freund <andres@anarazel.de> wrote:

I've read through the patchset and played around with it to try and break it
and understand it (not in that order).  Being a bit out of my comfort zone, I
can't offer the deep insights that Andres has done; but in reading the code it
all makes sense, and it works as I expect it to.

>> Hm. Why "Postgres*"? I think that's confusing. So far only backend like
>> things are named like that, and I'd e.g. not call checkpointer that.
>
> Well, that's the existing function name, and 0003 changes it anyway.

Reading the patches in sequence I was irked by Postgres* as well, and was going
to suggest PostgresProcess*, but since 0003 changes it immediately it's less of
an issue.

>>> 0003 takes this a step
>>> further by trying to unify a bunch more error-handling across
>>> different background process types. Together, 0002 and 0003 save >300
>>> lines of code and as far as I can see there's basically no downside.

Agreed, I cannot see any negative impacts.  The cleanup of common code makes
the remaining process code more readable so +1.

The patch has a tiny typo though, s/procesess/processes/:

+ * Typically, this handler would be used for SIGTERM, but some procesess use

> This patch probably needs some more changes before commit. One likely
> candidate: the use of a sample barrier type here probably needs to be
> replaced with something else. But the way this is set up doesn't
> really lend itself to starting with 0 types of barrier and then adding
> them later, so I am not sure exactly what to do to make this patch
> independent of what follows.

Something which can be observed in order to hook up a test for it, but which
has no sideeffects?  A NOOP barrier which only does a debug elog?

> The callback proposal I made before
> could've accommodated that (as well as, perhaps, some automated
> testing by dynamically loading up a barrier type) but Andres didn't
> like that. Perhaps he'll relent, or have a counter-proposal, or
> someone else will feel differently.


I sort of like the callback idea conceptually, but Andres is making a good
point about the extensibility actually making it harder to reason about.

> It seems sufficient to me for now to come up with something that can
> be used for enabling checksums online

FWIW I've rebased the online checksums patch on top of this patchset instead of
Andres' previous patch and everything seems to work fine.  All tests pass, and
I've been unable to trigger anything unexpected.  I'll post the new version of
said patch shortly.

> and for the thing that got me
> motivated to work on this, which is ALTER SYSTEM READ ONLY.

Aha, thats a pretty neat usecase.

In order to play around with and try to understand the patchset I wrote a PoC
PGC_PROCSIGNALBARRIER class for GUCs and converted a few PGC_SIGHUP GUCs to use
a barrier rather than a HUP.  The idea was to test interacting with the API a
bit more than the online checksum patch does, as its needs are pretty basic.
In hacking this up I didn't really come across anything in particular that I
lacked, and the result worked fine (insofar as a PoC can be considered
working).

cheers ./daniel


Re: global / super barriers (for checksums)

From
Robert Haas
Date:
On Mon, Dec 9, 2019 at 7:37 PM Daniel Gustafsson <daniel@yesql.se> wrote:
> I've read through the patchset and played around with it to try and break it
> and understand it (not in that order).  Being a bit out of my comfort zone, I
> can't offer the deep insights that Andres has done; but in reading the code it
> all makes sense, and it works as I expect it to.

Stellar. If nobody objects in the meantime, I plan to commit 0001-0003
next week.

> The patch has a tiny typo though, s/procesess/processes/:

But I'll fix that first. Thanks for the review.

> Something which can be observed in order to hook up a test for it, but which
> has no sideeffects?  A NOOP barrier which only does a debug elog?

It would be a bit hard to hook up a test for that, because a normal
test doesn't see the logs, and a TAP test could, but in this case what
you want to check for is that there are no stragglers who should have
gotten the memo and didn't, which supposes you have a list of everyone
who ought to get the memo. You could manually list out all the
background processes but that seems like it would be at risk of
becoming obsolete. You could check that everything in pg_stat_activity
emitted a message except for a known list of exceptions, which would
be less likely to become obsolete, but would also be complex. Also, I
think it's a crock to have something like that in there long term just
for testing purposes.

> I sort of like the callback idea conceptually, but Andres is making a good
> point about the extensibility actually making it harder to reason about.

That objection doesn't hold any water for me, because this is open
source. People can always patch the core. If we don't add hooks, then
we make life easier for fork maintainers (including my employer) and
harder for extension authors. I think that's *definitely* the wrong
bet for the community to be making; we should be trying very hard to
help extension authors and minimize the need for forks. If you install
an extension that uses a hook, any hook, and it breaks things, then
you get to keep both pieces.

And that approach would let us do spiffy testing without having to
leave cruft in core.

> FWIW I've rebased the online checksums patch on top of this patchset instead of
> Andres' previous patch and everything seems to work fine.  All tests pass, and
> I've been unable to trigger anything unexpected.  I'll post the new version of
> said patch shortly.

Cool.

> > and for the thing that got me
> > motivated to work on this, which is ALTER SYSTEM READ ONLY.
>
> Aha, thats a pretty neat usecase.

Thanks.

> In order to play around with and try to understand the patchset I wrote a PoC
> PGC_PROCSIGNALBARRIER class for GUCs and converted a few PGC_SIGHUP GUCs to use
> a barrier rather than a HUP.  The idea was to test interacting with the API a
> bit more than the online checksum patch does, as its needs are pretty basic.
> In hacking this up I didn't really come across anything in particular that I
> lacked, and the result worked fine (insofar as a PoC can be considered
> working).

Cool.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: global / super barriers (for checksums)

From
Andres Freund
Date:
Hi,

On 2019-12-11 09:12:49 -0500, Robert Haas wrote:
> On Mon, Dec 9, 2019 at 7:37 PM Daniel Gustafsson <daniel@yesql.se> wrote:
> > I sort of like the callback idea conceptually, but Andres is making a good
> > point about the extensibility actually making it harder to reason about.
> 
> That objection doesn't hold any water for me, because this is open
> source. People can always patch the core.

I just don't buy this argument. There's a difference in between an
unpatched version of postgres suddenly potentially running hooks
everywhere CFI() etc is called, and some user patching postgres to
behave differently. In the former case we'll have to ask to reproduce
problems without extension in a lot more cases.  For me code like this
that runs in pretty low level situations that we've gotten wrong more
than once, doesn't benefit by being extensible. We just make things more
fragile, and provide traps for extension authors.


> If we don't add hooks, then we make life easier for fork maintainers
> (including my employer) and harder for extension authors. I think
> that's *definitely* the wrong bet for the community to be making; we
> should be trying very hard to help extension authors and minimize the
> need for forks. If you install an extension that uses a hook, any
> hook, and it breaks things, then you get to keep both pieces.

But that's just not how it ends up working in a lot of cases? People
still report bugs to the list, and the debugging experience of problems
where an extension causes crashes-at-a-distance is pretty bad.

Greetings,

Andres Freund



Re: global / super barriers (for checksums)

From
Robert Haas
Date:
On Wed, Dec 11, 2019 at 12:38 PM Andres Freund <andres@anarazel.de> wrote:
> I just don't buy this argument. There's a difference in between an
> unpatched version of postgres suddenly potentially running hooks
> everywhere CFI() etc is called, and some user patching postgres to
> behave differently. In the former case we'll have to ask to reproduce
> problems without extension in a lot more cases.  For me code like this
> that runs in pretty low level situations that we've gotten wrong more
> than once, doesn't benefit by being extensible. We just make things more
> fragile, and provide traps for extension authors.

It seems to me that this amounts to an argument that (a) core
developers are smarter than extension developers, and are thus the
only ones entitled to play with sharp tools, and/or (b) core
developers are more important than extension developers, and thus
inconveniencing extension developers is OK if it makes life better for
core developers. Both propositions seem pretty arrogant to me.

> But that's just not how it ends up working in a lot of cases? People
> still report bugs to the list, and the debugging experience of problems
> where an extension causes crashes-at-a-distance is pretty bad.

I agree that crashes at a distance are bad, and that those caused by
extensions are more annoying than those caused by core code, because
in the latter case there is a silent contributor to the problem about
which we may have little information. However, I disagree with the
idea that the right solution is to try to lock things down so that
extension authors can't do stuff. Extensibility is an important part
of what has made PostgreSQL successful, and I think we need more of
it, not less.

We can to some extent mitigate these kinds of problems by adding
assertions to our code that will catch usage errors, which has the
advantage that those things also get caught when somebody accidentally
introduces such bugs into core. To the extent that we can't mitigate
them, I think we should live with them, because I think extensibility
is far too valuable to cast aside over such concerns.

While I have passionate philosophical feelings about this topic, for
purposes of the present thread the really important question (IMV,
anyway) is whether there's any way of getting a patch for global
barriers committed in advance of the first user of such barriers. Both
your version and mine add an enum, and I think that you can't have an
enum with no elements. If you have a suggestion as to how we can
structure things so that we can start out with 0 users of this
mechanism rather than needing to start out with 1, I'd like to hear
them. If not, then I guess we'll need to decide which of checksum
enable/disable and ALTER SYSTEM READ ONLY is going to go first and
commit this only when that patch is ready to go as well. Or, I
suppose, commit it with a dummy placeholder that then gets replaced by
whichever patch goes first, but I'm not sure whether people would be
OK with that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: global / super barriers (for checksums)

From
Andres Freund
Date:
Hi,

On 2019-12-11 13:35:26 -0500, Robert Haas wrote:
> While I have passionate philosophical feelings about this topic, for
> purposes of the present thread the really important question (IMV,
> anyway) is whether there's any way of getting a patch for global
> barriers committed in advance of the first user of such barriers.

Right. I think there is.


> If not, then I guess we'll need to decide which of checksum
> enable/disable and ALTER SYSTEM READ ONLY is going to go first and
> commit this only when that patch is ready to go as well. Or, I
> suppose, commit it with a dummy placeholder that then gets replaced by
> whichever patch goes first, but I'm not sure whether people would be
> OK with that.

I'd either add a test (if we have some) or placeholder kind
initially. But I'd also be ok with going for either of the other
versions directly - but it seems harder to tackle the patches together.

Greetings,

Andres Freund



Re: global / super barriers (for checksums)

From
Robert Haas
Date:
On Thu, Dec 12, 2019 at 2:54 PM Andres Freund <andres@anarazel.de> wrote:
> I'd either add a test (if we have some) or placeholder kind
> initially. But I'd also be ok with going for either of the other
> versions directly - but it seems harder to tackle the patches together.

OK. I have committed 0001-0003 as I had mentioned last week that I
intended to do. For 0004, I have replaced "sample" with "placeholder"
and added comments explaining that this is intended to be replaced by
the first real user of the mechanism. If there are no further
comments/objections I'll go ahead with this one as well.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

Re: global / super barriers (for checksums)

From
Sergei Kornilov
Date:
Hello

> Stellar. If nobody objects in the meantime, I plan to commit 0001-0003
> next week.

My compiler (gcc 8.3.0) is not happy with recent 5910d6c7e311f0b14e3d3cb9ce3597c01d3a3cde commit:

autovacuum.c:831:1: error: ‘AutoVacLauncherShutdown’ was used with no prototype before its definition
[-Werror=missing-prototypes]
checkpointer.c:524:1: error: ‘HandleCheckpointerInterrupts’ was used with no prototype before its definition
[-Werror=missing-prototypes]

I think definition should looks as in attached patch. With this change build is clean

regards, Sergei
Attachment

Re: global / super barriers (for checksums)

From
Robert Haas
Date:
On Tue, Dec 17, 2019 at 1:44 PM Sergei Kornilov <sk@zsrv.org> wrote:
> > Stellar. If nobody objects in the meantime, I plan to commit 0001-0003
> > next week.
>
> My compiler (gcc 8.3.0) is not happy with recent 5910d6c7e311f0b14e3d3cb9ce3597c01d3a3cde commit:
>
> autovacuum.c:831:1: error: ‘AutoVacLauncherShutdown’ was used with no prototype before its definition
[-Werror=missing-prototypes]
> checkpointer.c:524:1: error: ‘HandleCheckpointerInterrupts’ was used with no prototype before its definition
[-Werror=missing-prototypes]
>
> I think definition should looks as in attached patch. With this change build is clean

Andrew Gierth complained about this too over on -committers, and I saw
his message first and pushed a fix. It includes the first and third
hunks from your proposed patch, but not the second one.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: global / super barriers (for checksums)

From
Sergei Kornilov
Date:
Hi

> Andrew Gierth complained about this too over on -committers, and I saw
> his message first and pushed a fix. It includes the first and third
> hunks from your proposed patch, but not the second one.

Yep, I received his email just after sending mine. Thanks, my build is clean now.

regards, Sergei



Re: global / super barriers (for checksums)

From
Robert Haas
Date:
On Tue, Dec 17, 2019 at 1:38 PM Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Dec 12, 2019 at 2:54 PM Andres Freund <andres@anarazel.de> wrote:
> > I'd either add a test (if we have some) or placeholder kind
> > initially. But I'd also be ok with going for either of the other
> > versions directly - but it seems harder to tackle the patches together.
>
> OK. I have committed 0001-0003 as I had mentioned last week that I
> intended to do. For 0004, I have replaced "sample" with "placeholder"
> and added comments explaining that this is intended to be replaced by
> the first real user of the mechanism. If there are no further
> comments/objections I'll go ahead with this one as well.

And, done.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: global / super barriers (for checksums)

From
Magnus Hagander
Date:
On Thu, Dec 19, 2019 at 8:57 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Dec 17, 2019 at 1:38 PM Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Dec 12, 2019 at 2:54 PM Andres Freund <andres@anarazel.de> wrote:
> > I'd either add a test (if we have some) or placeholder kind
> > initially. But I'd also be ok with going for either of the other
> > versions directly - but it seems harder to tackle the patches together.
>
> OK. I have committed 0001-0003 as I had mentioned last week that I
> intended to do. For 0004, I have replaced "sample" with "placeholder"
> and added comments explaining that this is intended to be replaced by
> the first real user of the mechanism. If there are no further
> comments/objections I'll go ahead with this one as well.

And, done.

\o/

//Magnus
 

Re: global / super barriers (for checksums)

From
Robert Haas
Date:
On Mon, Dec 9, 2019 at 10:42 AM Robert Haas <robertmhaas@gmail.com> wrote:
> > Hm. In my mental model it would be useful for barrier "processors" to
> > not acknowledge the state change at certain points. Imagine e.g. needing
> > to efficiently wait till all backends have processed a config file
> > reload - since we don't reload while a query is being processed, we
> > should be able to not ack the barrier at that point.
>
> Yeah, you mentioned this before, but I think we could leave it for
> later. I think it's going to be complex. I suppose the way to do it
> would be to add an argument to ProcessProcSignalBarrier() that lets
> you specify which barrier events you're OK with processing from the
> current point in the code. However, that will mean that whenever
> somebody adds a new barrier type, they have got to go through all of
> those callers and think about whether they should add their new
> barrier type into the flags or not. If we try to do the specific thing
> you're talking about with config-file processing, it will also mean
> that we could be waiting MUCH longer for acknowledgements, which I
> think might have pretty far-reaching ramifications. You might get
> stuck waiting for that barrier to be absorbed for hours, and that
> would also impact later barriers, since you won't see the generation
> bump to 42 until it goes through 40 and 41. That sounds fairly awful.
> You could try to work around it by looking at which barrier flags are
> set, but I think that has ABA problems.

I might have rejected this idea too hastily. At any rate, I have a new
view on it having studied the issue a bit more.

In the case of ALTER SYSTEM READ ONLY, the big problem is that we
can't afford to find out that we're unable to emit WAL after we've
already entered a critical section, because "ERROR: system is read
only" or similar will get promoted to PANIC due to the critical
section and that will be sad. (Before someone asks, not promoting the
ERROR to PANIC would be unsafe, even for this specific case.) Some WAL
records don't use a critical section, though it's recently been
proposed[1] that we add a critical section in at least some of the
cases that currently lack one. For those, I think we can just upgrade
the existing test-and-elog cases in XLogInsert() and XLogBeginInsert()
to ereport() and call it good. But the cases that do have a critical
section are harder.

Currently, I see only the following two options:

(1) Check just before entering a critical section whether or not the
system is read only, and if so, error out. Once we've entered a
critical section, ProcessInterrupts() will not do anything, so at that
point we're safe. This could be done either by making
START_CRIT_SECTION() itself do it or by finding every critical section
that is used for inserting WAL and adding a call just beforehand. The
latter is probably better, as some critical sections appear to be used
for other purposes; see PGSTAT_BEGIN_WRITE_ACTIVITY in particular. But
it means changing things in a bunch of places, adding an if-test. That
wouldn't add *much* overhead, but it's not quite zero.

(2) Accept barrier events for read-only/read-write changes only at
command boundaries. In that case, we only need to check for a read
only state around the places where we currently do
PreventCommandIfReadOnly and similar, and the additional overhead
should be basically nil. However, this seems pretty unappealing to me,
because it means that if you try to make the system READ ONLY, it
might run for hours before actually going read only. If you're trying
to make the system read only because the master has become isolated
from the rest of your network, that's not a fast enough response.

In general, I think the problem here is to avoid TOCTTOU problems. I
think the checksums patch has this problem too. For instance, the
changes to basebackup.c in that function check
DataChecksumsNeedVerify() only once per file, but what is to say that
a call to ProcessInterrupts() couldn't happen in the middle of sending
a file, changing prevailing value? If someone sets checksums off, and
everyone acknowledges that they're off, then backends may begin
writing blocks without setting checksums, and then this code will
potentially try to verify checksums in a block written without them.
That patch also modifies the definition of XLogHintBitIsNeeded(), so
e.g. log_heap_visible is wrong if a CHECK_FOR_INTERRUPTS() can happen
in XLogRegisterBuffer(). I don't think it can currently, but it seems
like a heck of a fragile assumption, and I don't see how we can be
sure that there's no test for XLogHintBitIsNeeded() that does
CHECK_FOR_INTERRUPTS() between where it's tested and where it does
something critical with the result of that test.

At the moment, the ALTER SYSTEM READ ONLY problem appears to me to be
the lesser of the two (although I may be wrong). Assuming people are
OK with inserting an additional check before each xlog-writing
critical section, we can just go do that and then I think the problem
is pretty much solved. The only way there would be a residual problem,
I think, is if something could change between the time XLogInsert()
checks whether xlog writing is allowed and the time when it does the
insert. Even if that's an issue, it can be fixed with a
HOLD_INTERRUPTS/RESUME_INTERRUPTS in just that function. The meaning
of READ ONLY is just that xlog insertion is prohibited, so the only
use of the value is for allowing XLogInsert() to go ahead or not. But
for checksums, there's not such a clear definition of what it means to
use the value, nor does it get used in only one place, so the
situation seems less clear.

In short, I am not still not sure that we need the facility Andres was
asking for here, but I do now understand better why Andres was worried
about having this capability, and I do think that patches using it are
going to need to be very carefully studied for TOCTTOU problems.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1] http://postgr.es/m/20191207001232.klidxnm756wqxvwx@alap3.anarazel.de