Thread: global / super barriers (for checksums)
Hi,

Magnus cornered me at pgconf.eu and asked me whether I could prototype the "barriers" I'd been talking about in the online checksumming thread. The problem there was to make sure that all processes, backends and auxiliary processes, have seen the new state of checksums being enabled, and aren't currently in the process of writing a new page out. The current prototype solves that by requiring a restart, but that strikes me as far too large a hammer.

The attached patch introduces "global barriers" (name was invented in an overcrowded hotel lounge, so ...), which allow waiting for such a change to be absorbed by all backends.

I've only tested the code with gdb, but that seems to work:

p WaitForGlobalBarrier(EmitGlobalBarrier(GLOBBAR_CHECKSUM))

waits until all backends (including bgwriter, checkpointer, walwriter, bgworkers, ...) have accepted interrupts at least once. Multiple such requests are coalesced.

I decided to wait until interrupts are actually processed, rather than just the signal being received, because that means the system is in a well-defined state. E.g. there are no pages currently being written out.

For the checksum enablement patch you'd do something like:

EnableChecksumsInShmemWithLock();
WaitForGlobalBarrier(EmitGlobalBarrier(GLOBBAR_CHECKSUM));

and after that you should be able to set it to a persistent mode.

I chose to use procsignals to send the signals, a global uint64 globalBarrierGen, and per-backend barrierGen and barrierFlags, with the latter keeping track of which barriers have been requested. There likely are other use cases.

The patch definitely is in a prototype stage. At the very least it needs a high-level comment somewhere, and some of the lower-level code needs to be cleaned up.

One thing I wasn't happy about is how the checksum internals have to absorb barrier requests - that seems unavoidable, but I'd hope for something more global than just BufferSync().

Comments?

Greetings,

Andres Freund
Attachment
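The generation-counting scheme described above can be sketched as a single-process toy model. This is a hypothetical simplification: the real patch uses procsignals and shared memory, and only `EmitGlobalBarrier()`, `WaitForGlobalBarrier()`, and `GLOBBAR_CHECKSUM` are names taken from the mail; the array-based state and the `AbsorbGlobalBarrier()`/`GlobalBarrierAbsorbed()` helpers are invented here for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define NBACKENDS 4

/* One bit per barrier kind; GLOBBAR_CHECKSUM mirrors the patch's enum. */
#define GLOBBAR_CHECKSUM (UINT32_C(1) << 0)

/* Stand-ins for the shared state described in the mail: a global
 * generation, plus a per-backend generation and pending-flags word. */
static uint64_t globalBarrierGen;
static uint64_t backendGen[NBACKENDS];
static uint32_t backendFlags[NBACKENDS];

/* Request a barrier: bump the generation and flag every backend.
 * Returns the generation the caller should wait for. */
static uint64_t
EmitGlobalBarrier(uint32_t kind)
{
	globalBarrierGen++;
	for (int i = 0; i < NBACKENDS; i++)
		backendFlags[i] |= kind;
	return globalBarrierGen;
}

/* Called by each backend when it next processes interrupts: act on the
 * requested changes, then publish the generation it has caught up to. */
static void
AbsorbGlobalBarrier(int backend)
{
	backendFlags[backend] = 0;	/* apply the flagged changes here */
	backendGen[backend] = globalBarrierGen;
}

/* True once every backend has absorbed at least generation 'gen'. */
static int
GlobalBarrierAbsorbed(uint64_t gen)
{
	for (int i = 0; i < NBACKENDS; i++)
		if (backendGen[i] < gen)
			return 0;
	return 1;
}
```

The key invariant is that a waiter only returns once every backend's published generation has caught up to the value handed back by the emit step, which is what guarantees that every process has processed interrupts at least once since the change.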
On Tue, Oct 30, 2018 at 6:16 AM Andres Freund <andres@anarazel.de> wrote:
Hi,
Magnus cornered me at pgconf.eu and asked me whether I could prototype
the "barriers" I'd been talking about in the online checksumming thread.
The problem there was to make sure that all processes, backends and
auxiliary processes have seen the new state of checksums being enabled,
and aren't currently in the process of writing a new page out.
The current prototype solves that by requiring a restart, but that
strikes me as a far too large hammer.
The attached patch introduces "global barriers" (name was invented in an
overcrowded hotel lounge, so ...), which allow waiting for such a change
to be absorbed by all backends.
I've only tested the code with gdb, but that seems to work:
p WaitForGlobalBarrier(EmitGlobalBarrier(GLOBBAR_CHECKSUM))
waits until all backends (including bgwriter, checkpointers, walwriters,
bgworkers, ...) have accepted interrupts at least once. Multiple such
requests are coalesced.
I decided to wait until interrupts are actually processed, rather than
just the signal being received, because that means the system is in a
well-defined state. E.g. there are no pages currently being written out.
For the checksum enablement patch you'd do something like:
EnableChecksumsInShmemWithLock();
WaitForGlobalBarrier(EmitGlobalBarrier(GLOBBAR_CHECKSUM));
and after that you should be able to set it to a persistent mode.
I chose to use procsignals to send the signals, a global uint64
globalBarrierGen, and per-backend barrierGen and barrierFlags, with the
latter keeping track of which barriers have been requested. There likely
are other use cases.
The patch definitely is in a prototype stage. At the very least it needs
a high-level comment somewhere, and some of the lower-level code needs
to be cleaned up.
One thing I wasn't happy about is how checksum internals have to absorb
barrier requests - that seems unavoidable, but I'd hope for something
more global than just BufferSync().
Comments?
Finally getting around to playing with this one and it unfortunately doesn't apply anymore (0003).
I think it's just a matter of adding those two rows though, right? That is, it's not an actual conflict, it's just something else added in the same place?
Hi,

On 2018-12-27 13:54:34 +0100, Magnus Hagander wrote:
> Finally getting around to playing with this one and it unfortunately
> doesn't apply anymore (0003).
>
> I think it's just a matter of adding those two rows though, right? That is,
> it's not an actual conflict, it's just something else added in the same
> place?

What do you mean with "rows" here? I see a bunch of trivial conflicts due to changes in atomics initialization but nothing else? And yes, I'd not expect any meaningful conflicts.

Greetings,

Andres Freund
On Thu, Dec 27, 2018 at 2:22 PM Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2018-12-27 13:54:34 +0100, Magnus Hagander wrote:
> Finally getting around to playing with this one and it unfortunately
> doesn't apply anymore (0003).
>
> I think it's just a matter of adding those two rows though, right? That is,
> it's not an actual conflict it's just something else added in the same
> place?
What do you mean with "rows" here? I see a bunch of trivial conflicts
due to changes in atomics initialization but nothing else? And yes, I'd
not expect any meaningful conflicts.
Sorry, lack of caffeine. The conflict I saw was:
/* Initialize lockGroupMembers list. */
dlist_init(&procs[i].lockGroupMembers);
+
+ pg_atomic_init_u32(&procs[i].barrierFlags, 0);
+ pg_atomic_init_u64(&procs[i].barrierGen, PG_UINT64_MAX);
}
So yes, I'm pretty sure we're talking about the same thing.
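One detail of the hunk above worth noting: barrierGen is initialized to PG_UINT64_MAX. My reading (an inference, not something stated in the thread) is that this makes an unused slot compare as already caught up with any real generation, so a slot that never belongs to a running backend can never stall a waiter. A tiny sketch of that comparison:

```c
#include <assert.h>
#include <stdint.h>

#define PG_UINT64_MAX UINT64_MAX

/* A waiter considers a slot "behind" only if its published generation is
 * older than the generation being waited for. A slot initialized to
 * PG_UINT64_MAX therefore always looks caught up. */
static int
SlotBlocksWaiter(uint64_t slotGen, uint64_t waitedGen)
{
	return slotGen < waitedGen;
}
```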
On Tue, Oct 30, 2018 at 6:16 AM Andres Freund <andres@anarazel.de> wrote:
Hi,
Magnus cornered me at pgconf.eu and asked me whether I could prototype
the "barriers" I'd been talking about in the online checksumming thread.
The problem there was to make sure that all processes, backends and
auxiliary processes have seen the new state of checksums being enabled,
and aren't currently in the process of writing a new page out.
The current prototype solves that by requiring a restart, but that
strikes me as a far too large hammer.
The attached patch introduces "global barriers" (name was invented in an
overcrowded hotel lounge, so ...), which allow waiting for such a change
to be absorbed by all backends.
I've only tested the code with gdb, but that seems to work:
p WaitForGlobalBarrier(EmitGlobalBarrier(GLOBBAR_CHECKSUM))
waits until all backends (including bgwriter, checkpointers, walwriters,
bgworkers, ...) have accepted interrupts at least once. Multiple such
requests are coalesced.
I decided to wait until interrupts are actually processed, rather than
just the signal being received, because that means the system is in a
well-defined state. E.g. there are no pages currently being written out.
For the checksum enablement patch you'd do something like:
EnableChecksumsInShmemWithLock();
WaitForGlobalBarrier(EmitGlobalBarrier(GLOBBAR_CHECKSUM));
and after that you should be able to set it to a persistent mode.
I chose to use procsignals to send the signals, a global uint64
globalBarrierGen, and per-backend barrierGen and barrierFlags, with the
latter keeping track of which barriers have been requested. There likely
are other use cases.
The patch definitely is in a prototype stage. At the very least it needs
a high-level comment somewhere, and some of the lower-level code needs
to be cleaned up.
One thing I wasn't happy about is how checksum internals have to absorb
barrier requests - that seems unavoidable, but I'd hope for something
more global than just BufferSync().
Comments?
Finally getting back to this one.
In re-reading this, I notice there are a lot of references to Intterrupt (with two t). I'm guessing this is just a spelling error, and not something that actually conveys some meaning?
Can you elaborate on what you mean with:
+ /* XXX: need a more principled approach here */
Is that the thing you refer to above about "checksum internals"?
Also in checking we figured it'd be nice to have a wait event for this, since a process can potentially get stuck in an infinite loop waiting for some other process if it's misbehaving. Kind of like the attached?
//Magnus
Attachment
Hi,

On 2019-07-10 15:31:11 +0200, Magnus Hagander wrote:
> In re-reading this, I notice there are a lot of references to Intterrupt
> (with two t). I'm guessing this is just a spelling error, and not something
> that actually conveys some meaning?

Just a spelling error. I think I wrote the patch in a night after pgconf.eu, to allow you to quickly make progress :P

> Can you elaborate on what you mean with:
> + /* XXX: need a more principled approach here */
> Is that the thing you refer to above about "checksum internals"?

I think I didn't actually mean "checksum" but instead "checkpoint". It does bother me that we have an operation as long-running as BufferSync() commonly is, without a proper way to accept events. There's a hack for doing something similar-ish in CheckpointWriteDelay(), for absorbing fsync requests, but it doesn't trigger for checkpoints not done in the checkpointer, nor is it really extensible.

> Also in checking we figured it'd be nice to have a wait event for this,
> since a process can potentially get stuck in an infinite loop waiting for
> some other process if it's misbehaving. Kind of like the attached?

Yea, that makes sense.

Greetings,

Andres Freund
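The concern about BufferSync(), absorbing barrier requests in the middle of a long-running write loop, can be sketched like this. The helper names are hypothetical stand-ins; the real loop writes buffers through the storage manager and the pending flag is set by the procsignal handler.

```c
#include <assert.h>

/* Hypothetical pending flag, as set by a signal handler in the patch. */
static volatile int GlobalBarrierInterruptPending;
static int barriers_absorbed;

static void
ProcessGlobalBarrierInterrupt(void)
{
	GlobalBarrierInterruptPending = 0;
	barriers_absorbed++;		/* act on the requested change here */
}

static void
WriteOneBuffer(int buf)
{
	(void) buf;					/* stand-in for the actual write path */
}

/* A long-running flush loop in the spirit of the discussion: check for
 * pending barrier requests between buffer writes, so a checkpoint can't
 * ignore a barrier for its entire (possibly long) duration. */
static void
FlushBuffers(int nbuffers)
{
	for (int buf = 0; buf < nbuffers; buf++)
	{
		WriteOneBuffer(buf);
		if (GlobalBarrierInterruptPending)
			ProcessGlobalBarrierInterrupt();
	}
}
```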
On Tue, Oct 30, 2018 at 1:17 AM Andres Freund <andres@anarazel.de> wrote:
> The patch definitely is in a prototype stage. At the very least it needs
> a high-level comment somewhere, and some of the lower-level code needs
> to be cleaned up.
>
> One thing I wasn't happy about is how checksum internals have to absorb
> barrier requests - that seems unavoidable, but I'd hope for something
> more global than just BufferSync().

Hi,

TL;DR: I'm not sure that we need 0001; I propose to commit 0002; and I have some concerns about 0003 and am interested in working further on it.

0001 changes StartBackgroundWorker so that the SIGINT handler is contingent on BGWORKER_SHMEM_ACCESS rather than BGWORKER_BACKEND_DATABASE_CONNECTION. It seems to me that the goal here should be to use procsignal_sigusr1_handler(), or something that calls it, in any process where ProcSignalInit() is called, but a backend that only requests BGWORKER_SHMEM_ACCESS probably won't, because the normal way for a background worker to call ProcSignalInit() would be to indirectly call InitPostgres() by means of BackgroundWorkerInitializeConnection() or BackgroundWorkerInitializeConnectionByOid(), and a worker that didn't ask for a database connection presumably shouldn't be doing that. So I'm not sure that I understand why we need this. (Is it legal for a worker to call one of these functions as long as it passes InvalidOid for the database OID, or something? Do we have examples of workers that do that?)

On the other hand, 0002 seems like it's pretty clearly a good idea. It makes a whole bunch of auxiliary processes use procsignal_sigusr1_handler(), and those things all get called from AuxiliaryProcessMain(), which does ProcSignalInit(), and it seems clearly the right idea that processes which register to participate in the procsignal mechanism should also register to get notified if they receive a procsignal. I think that the reason we haven't bothered with this up until now is that it's presently impossible for any of the kinds of procsignals we have to get sent to any of those processes. But global barriers would require us to do so, so it seems like it's time to tighten that up, and it doesn't really cost anything. So I propose to commit this part soon, unless somebody objects.

Regarding 0003:

- The proc->pid == 0 check in EmitGlobalBarrier() doesn't really do what it appears to do, because regular backends don't clear proc->pid when they exit; only auxiliary processes do. (We could, and perhaps should, change that.)

- It seems to me that it would be generally better to insert CHECK_FOR_INTERRUPTS() in places where the patch just inserts an ad-hoc if (GlobalBarrierInterruptPending) ProcessGlobalBarrierIntterupt(). There might be someplace where we have to do it the latter way because we unavoidably hold an LWLock or something, but I think we should avoid that whenever possible. If it's a good place to check for one kind of interrupt, it's probably a good place to check for all of them.

- I don't think BufferSync() is doing this in the right place. Unless I am confused, it's doing it inside a loop that just allocates stuff and copies data around, which probably does not need this kind of decoration. It wouldn't hurt to check here, and it might be a good idea for safety, but I think the place that really needs this treatment is the following loop where we are actually doing I/O and sleeping. Possibly we should even be doctoring CheckpointWriteDelay to use a latch-wait loop that can CHECK_FOR_INTERRUPTS() before re-sleeping, although for 100ms (curiously described as a Big Sleep) it might be overkill.

- I think it would be nice to make this system more extensible. IIUC, the idea is that ProcessGlobalBarrierIntterupt() will call bespoke code for each GLOBBAR_* flag that is set, and that code will then update process-local state from global state before returning. But instead of hard-coding constants, how about just having a list of callbacks, and we call them all here, with each one responsible for figuring out whether anything it cares about has changed, and if so updating the appropriate local state? Then GlobalBarrierKind goes away altogether. Granted, this could be a bit expensive if we were using global barriers for lots of different things and at least some of those things occurred frequently, but I don't think we're very near the point where we need that kind of optimization. As long as the callbacks have tests that exit quickly when there's nothing to do, I think it should be fine. (As an intermediate position, we could consider keeping barrierFlags but assigning the bits dynamically; but unless and until we get more users of the facility, I think it might be simplest to just forget about barrierFlags altogether. Then even extensions can use this, if they want.)

- The patch needs some general tidying up, like comments and naming consistency and stuff like that.

Andres, Magnus, if neither of you are planning to work on this soon, I might like to jump in and run with this. Please let me know your thoughts.

Thanks,

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
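Robert's callback-registry idea could look roughly like the following. This is a sketch under the assumption of a simple fixed-size array; the names are invented for illustration and are not from the patch.

```c
#include <assert.h>

#define MAX_BARRIER_CALLBACKS 8

typedef void (*BarrierCallback) (void);

static BarrierCallback barrier_callbacks[MAX_BARRIER_CALLBACKS];
static int n_barrier_callbacks;

/* Core code and extensions alike register a callback; no fixed enum of
 * barrier kinds is needed. */
static void
RegisterBarrierCallback(BarrierCallback cb)
{
	assert(n_barrier_callbacks < MAX_BARRIER_CALLBACKS);
	barrier_callbacks[n_barrier_callbacks++] = cb;
}

/* On absorbing a barrier, run every callback; each one is responsible
 * for noticing whether the global state it cares about changed and, if
 * so, updating its process-local copy. */
static void
RunBarrierCallbacks(void)
{
	for (int i = 0; i < n_barrier_callbacks; i++)
		barrier_callbacks[i] ();
}

/* Example callback: just counts invocations. */
static int sample_callback_runs;

static void
sample_callback(void)
{
	sample_callback_runs++;
}
```

The design trade-off is visible even in the sketch: every registered callback runs on every barrier, so each one must cheaply detect the no-change case.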
Hi,

On 2019-11-13 12:26:34 -0500, Robert Haas wrote:
> TL;DR: I'm not sure that we need 0001; I propose to commit 0002; and I
> have some concerns about 0003 and am interested in working further on
> it.

Thanks for looking at the patch!

> 0001 changes the StartBackgroundWorker so that the SIGINT handler is
> contingent on BGWORKER_SHMEM_ACCESS rather than
> BGWORKER_BACKEND_DATABASE_CONNECTION. It seems to me that the goal
> here should be to use procsignal_sigusr1_handler(), or something that
> calls it, in any process where ProcSignalInit() is called, but a
> backend that only requests BGWORKER_SHMEM_ACCESS probably won't,
> because the normal way for a background worker to call
> ProcSignalInit() would be to indirectly call InitPostgres() by means
> of BackgroundWorkerInitializeConnection() or
> BackgroundWorkerInitializeConnectionByOid(), and a worker that didn't
> ask for a database connection presumably shouldn't be doing that. So
> I'm not sure that I understand why we need this. (Is it legal for a
> worker to call one of these functions as long as it passes InvalidOid
> for the database OID, or something? Do we have examples of workers
> that do that?)

Hm. Well, it seems useful for processes without a database connection to be able to partake in global barriers. Without that, if there's any chance they could e.g. generate WAL, it'd e.g. break the checksum enablement patch. Note that auxiliary processes already do call ProcSignalInit().

You're right that we ought to make it easier (or automatic) to call ProcSignalInit() for such processes. Perhaps we ought to do so in ProcessInit()? But perhaps we don't strictly need this - I'm not sure how many examples there are of BGWORKER_SHMEM_ACCESS bgworkers that don't also use BGWORKER_BACKEND_DATABASE_CONNECTION.

> Regarding 0003:
>
> - The proc->pid == 0 check in EmitGlobalBarrier() doesn't really do
> what it appears to do, because regular backends don't clear proc->pid
> when they exit; only auxiliary processes do. (We could, and perhaps
> should, change that.)

Ick.

> - It seems to me that it would be generally better to insert
> CHECK_FOR_INTERRUPTS() in places where the patch just inserts an
> ad-hoc if (GlobalBarrierInterruptPending)
> ProcessGlobalBarrierIntterupt(). There might be someplace where we
> have to do it the latter way because we unavoidably hold an LWLock or
> something, but I think we should avoid that whenever possible. If it's
> a good place to check for one kind of interrupt, it's probably a good
> place to check for all of them.

I might be missing something. Aren't all of the places where those checks were added places where we currently can't do a CHECK_FOR_INTERRUPTS()? I've swapped this thoroughly out of my mind, but going through them:

1) AutoVacLauncherMain() - doesn't do CFI()
2) BackgroundWriterMain() - dito
3) CheckpointerMain() - dito
4) HandleStartupProcInterrupts() - dito
5) WalWriterMain() - dito
6) BufferSync() - dito, called from CheckpointerMain() and the startup process
7) ProcessClientWriteInterrupt() - can't do generic CFI; we don't want to process all interrupts while writing out data, to avoid corrupting the output stream or losing messages

Which one do you think we should convert to CFI()? As far as I can tell we can't make the non-backend cases use the postgres.c ProcessInterrupts(), and the ProcessClientWriteInterrupt() one can't do so either.

> - I don't think BufferSync() is doing this in the right place. Unless
> I am confused, it's doing it inside a loop that just allocates stuff
> and copies data around, which probably does not need this kind of
> decoration. It wouldn't hurt to check here, and it might be a good
> idea for safety, but I think the place that really needs this
> treatment is the following loop where we are actually doing I/O and
> sleeping. Possibly we should even be doctoring CheckpointWriteDelay to
> use a latch-wait loop that can CHECK_FOR_INTERRUPTS() before
> re-sleeping, although for 100ms (curiously described as a Big Sleep)
> it might be overkill.

Yea. Not sure what happened there. I think it's good to have such a check in all the buffer loops in BufferSync(), but clearly it's most important to have one in the write loop.

> - I think it would be nice to make this system more extensible. IIUC,
> the idea is that ProcessGlobalBarrierIntterupt() will call bespoke
> code for each GLOBBAR_* flag that is set, and that code will then update
> process-local state from global state before returning. But, instead
> of hard-coding constants, how about just having a list of callbacks,
> and we call them all here, and each one is responsible for figuring
> out whether anything's been changed that it cares about, and if so
> updating the appropriate local state?

I don't think that's a good idea. This stuff happens at an extremely low level, in different types of processes, with various locks held etc. Running arbitrary code in those circumstances strikes me as a seriously bad idea.

> Then GlobalBarrierKind goes away altogether. Granted, this could be a
> bit expensive if we were using global barriers for lots of different
> things and at least some of those things occurred frequently, but I
> don't think we're very near the point where we need that kind of
> optimization. As long as the callbacks have tests that exit quickly if
> there's nothing to do, I think it should be fine. (As an intermediate
> position, we could consider keeping barrierFlags but assign the bits
> dynamically; but unless and until we get more users of the facility, I
> think it might be simplest to just forget about barrierFlags
> altogether. Then even extensions can use this, if they want.)

I think we need the option to *not* "accept" the barrier (as in, not update MyProc->barrierGen), because we cannot guarantee the option has taken effect yet. Without something identifying which types of events are being waited for, that seems hard.

> - The patch needs some general tidying up, like comments and naming
> consistency and stuff like that.

Yea. It really was a prototype to allow Magnus to continue...

> Andres, Magnus, if neither of you are planning to work on this soon, I
> might like to jump in and run with this. Please let me know your
> thoughts.

I'm not planning to do so in the near term - so I'd more than welcome you to do so.

Greetings,

Andres Freund
On Wed, Nov 13, 2019 at 2:45 PM Andres Freund <andres@anarazel.de> wrote:
> I might be missing something. Aren't all of the places where those
> checks are places where we currently can't do a CHECK_FOR_INTERRUPTS()?
> I've swapped this thoroughly out of my mind, but going through them:
>
> 1) AutoVacLauncherMain() - doesn't do CFI()
> 2) BackgroundWriterMain() - dito
> 3) CheckpointerMain() - dito
> 4) HandleStartupProcInterrupts() - dito
> 5) WalWriterMain() - dito
> 6) BufferSync() - dito, called from CheckpointerMain(), and startup process
> 7) ProcessClientWriteInterrupt() - can't do generic CFI, don't want to
> process all interrupts while writing out data, to avoid corrupting
> the output stream or losing messages
>
> Which one do you think we should convert to CFI()? As far as I can tell
> we can't make the non-backend cases use the postgres.c
> ProcessInterrupts(), and the ProcessClientWriteInterrupt() one can't do
> so either.

I haven't looked through all of these, but in AutoVacLauncherMain a trivial conversion to CFI doesn't seem to break anything horribly (see attached). It does change the error message when we exit, making it chattier, but I think we could find our way around that problem.

Note that AutoVacLauncherMain, and some of the others, do a HOLD_INTERRUPTS() and a RESUME_INTERRUPTS() in the error-recovery block, so somebody evidently thought some of that code might call CHECK_FOR_INTERRUPTS(), and I can't prove off-hand that the other logic, which isn't so protected, has no way to reach CHECK_FOR_INTERRUPTS(). It seems to me that there are so many places where PostgreSQL calls CHECK_FOR_INTERRUPTS() that it's somewhat unwise to assume that it just "can't happen."

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachment
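The HOLD_INTERRUPTS()/RESUME_INTERRUPTS() point above can be illustrated with a stripped-down model of the interrupt machinery. This is a simplification for illustration only: the real CHECK_FOR_INTERRUPTS() calls ProcessInterrupts(), which does far more than bump a counter, but the holdoff behavior modeled here matches the reason wrapping recovery code in hold/resume makes stray checks safe.

```c
#include <assert.h>

/* Minimal model: a check only services a pending interrupt when the
 * holdoff count is zero. */
static volatile int InterruptPending;
static volatile int InterruptHoldoffCount;
static int interrupts_serviced;

#define HOLD_INTERRUPTS()	(InterruptHoldoffCount++)
#define RESUME_INTERRUPTS()	(InterruptHoldoffCount--)
#define CHECK_FOR_INTERRUPTS() \
	do { \
		if (InterruptPending && InterruptHoldoffCount == 0) \
		{ \
			InterruptPending = 0; \
			interrupts_serviced++;	/* stand-in for ProcessInterrupts() */ \
		} \
	} while (0)
```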
On Wed, Nov 13, 2019 at 8:45 PM Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2019-11-13 12:26:34 -0500, Robert Haas wrote:
> TL;DR: I'm not sure that we need 0001; I propose to commit 0002; and I
> have some concerns about 0003 and am interested in working further on
> it.
Thanks for looking at the patch!
+1 (well, +<more than one>, but there is a quota)
> - The patch needs some general tidying up, like comments and naming
> consistency and stuff like that.
Yea. It really was a prototype to allow Magnus to continue...
And a very useful one! :) So thanks for that one as well.
> Andres, Magnus, if neither of you are planning to work on this soon, I
> might like to jump in and run with this. Please let me know your
> thoughts.
I'm not planning to do so in the near term - so I'd more than welcome
you to do so.
I'm definitely happy to work with it, but I did not and do not feel I have the skills for doing the "proper review" needed for it. So I am also very happy for you to pick it up and run with it.
On Wed, Nov 13, 2019 at 12:26 PM Robert Haas <robertmhaas@gmail.com> wrote:
> On the other hand, 0002 seems like it's pretty clearly a good idea. It
> makes a whole bunch of auxiliary processes use
> procsignal_sigusr1_handler() and those things all get called from
> AuxiliaryProcessMain(), which does ProcSignalInit(), and it seems like
> clearly the right idea that processes which register to participate in
> the procsignal mechanism should also register to get notified if they
> receive a procsignal. I think that the reason we haven't bothered with
> this up until now is because I think that it's presently impossible
> for any of the kind of procsignals that we have to get sent to any of
> those processes. But, global barriers would require us to do so, so it
> seems like it's time to tighten that up, and it doesn't really cost
> anything. So I propose to commit this part soon, unless somebody
> objects.

Done.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Sun, Nov 17, 2019 at 8:38 AM Magnus Hagander <magnus@hagander.net> wrote:
> I'm definitely happy to work with it, but I did not and do not feel I have
> the skills for doing the "proper review" needed for it. So I am also very
> happy for you to pick it up and run with it.

OK, here's what I came up with.

0001 is just a code movement patch. It puts the interrupt handling for each type of background process into a subroutine, instead of having it all inline in the main loop. I feel this makes things more clear. Hopefully it's uncontroversial.

0002 and 0003 are optional and slightly off-topic as far as the matter at hand, but they are not entirely unrelated and I believe that they are good cleanups. 0002 changes assorted background processes to use PostgresSigHupHandler and ConfigReloadPending rather than having a bunch of separate global variables and a bunch of separate signal handlers that basically do the same stuff. 0003 takes this a step further by trying to unify a bunch more error-handling across different background process types. Together, 0002 and 0003 save >300 lines of code and as far as I can see there's basically no downside. I think in general it would be good to strive towards a future where everybody used the same signal handling and interrupt handling, but these patches have the much less ambitious goal of just removing overtly duplicated code. I think they might act as an inducement to future patch authors to try to make things look more alike, though.

0004 is the main patch. It is substantially revised from Andres's version, but the basic principle is along similar lines. Changes:

- I felt that it was imprudent to put some of the machinery into PGPROC and the rest into the ProcSignal mechanism, because there is nothing that says that everything with a PGPROC must also be a ProcSignal participant. So I moved everything over into the ProcSignal mechanism. This seems a lot safer to me.

- I accordingly renamed this thing from GlobalBarrier to ProcSignalBarrier. This is a lot more arguable, but as of now I favor it.

- The details of the synchronization are different: there's no ProcSignalKind for this new kind of barrier any more, and there's now only one pg_memory_barrier() and it's in a different place than Andres had it. It is quite possible that I have screwed things up here, but I couldn't make heads or tails of the way Andres had it. I don't know whether that's because it was a POC or because I'm dumb. Also, moving everything over into ProcSignal seemed to me to suggest some rejiggering here that might've made less sense in the old scheme.

- I changed the code that waits for a barrier to use a latch wait, because I think that's our usual convention. So wait-for-barrier gets woken up early if the latch is set, and also itself manipulates the latch. I also gave it a variable-length timeout, because when testing it irked me that the same test randomly took either 0s or 1s. Now it takes either 0s or 125ms, which is a lot less noticeable.

- I changed the kind of barrier provided by the patch from "checksum" to "sample".

- Otherwise hacked on comments and naming somewhat.

This patch probably needs some more changes before commit. One likely candidate: the use of a sample barrier type here probably needs to be replaced with something else. But the way this is set up doesn't really lend itself to starting with 0 types of barrier and then adding them later, so I am not sure exactly what to do to make this patch independent of what follows. The callback proposal I made before could've accommodated that (as well as, perhaps, some automated testing by dynamically loading up a barrier type) but Andres didn't like that. Perhaps he'll relent, or have a counter-proposal, or someone else will feel differently.

0005 is test code that sends and waits for a sample barrier.

Comments?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachment
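The variable-length timeout described above (0s or 125ms rather than 0s or 1s) amounts to a doubling backoff with a cap; a minimal sketch of the timeout progression:

```c
#include <assert.h>

/* Next timeout for the barrier wait loop: start at 125ms, double on
 * each timeout, and cap at one second. */
static long
NextBarrierTimeout(long timeout_ms)
{
	timeout_ms *= 2;
	if (timeout_ms > 1000)
		timeout_ms = 1000;
	return timeout_ms;
}
```

The effect is that a barrier absorbed quickly costs at most one short sleep, while a slow one degrades gracefully toward the old 1s polling interval.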
Hi, Thanks for the updated version! On 2019-12-02 13:06:24 -0500, Robert Haas wrote: > On Sun, Nov 17, 2019 at 8:38 AM Magnus Hagander <magnus@hagander.net> wrote: > > I'm definitely happy to work with it, but I did not and do not feel I have the skills for doing the "proper review" neededfor it. So I am also very happy for you to pick it up and run with it. > > OK, here's what I came up with. > > 0001 is just a code movement patch. It puts the interrupt handling for > each type of background process into a subroutine, instead of having > it all inline in the main loop. I feel this makes things more clear. > Hopefully it's uncontroversial. I'd suggest marking the shutdown functions as noreturn. Might prevent some compiler warnings in the future. > 0002 and 0003 are optional and slightly off-topic as far as the matter > at hand, but they are not entirely unrelated and I believe that they > are good cleanups. 0002 changes assorted background processes to use > PostgresSigHupHandler and ConfigReloadPending rather than having a > bunch of separate global variables and a bunch of separate signal > handlers that basically do the same stuff. Hm. Why "Postgres*"? I think that's confusing. So far only backend like things are named like that, and I'd e.g. not call checkpointer that. It's probably fine, but it's worthwhile to consider that that checking an external variable like ConfigReloadPending is more expensive than a static var like got_SIGHUP. The compiler will most of the time understand the code flow to understand when something like got_SIGHUP can change, but it can't assume anything like that with ConfigReloadPending. But as we're forcing the compilers hand, by marking the variable as volatile, so it's not likely to be a meaningful difference. > 0003 takes this a step > further by trying to unify a bunch more error-handling across > different background process types. 
Together, 0002 and 0003 save >300 > lines of code and as far as I can see there's basically no downside. I > think in general it would be good to strive towards a future where > everybody used the same signal handling and interrupt handling, but > these patches have the much less ambitious goal of just removing > overtly duplicated code. I think they might act as an inducement to > future patch authors to try to make things look more alike, though. I do agree that this is a good goal. The amount of redundant code we have across different types of processes is utterly ridiculous. While obviously not a significant problem, there's also > 0004 is the main patch. It is substantially revised from Andres's > version, but the basic principle is along similar lines. Changes: > > - I felt that it was imprudent to put some of the machinery into > PGPROC and the rest into the ProcSignal mechanism, because there is > nothing that says that everything with a PGPROC must also be a > ProcSignal participant. So I moved everything over into the ProcSignal > mechanism. This seems a lot safer to me. I don't fully remember how I ended up with that split. But I think there is an argument that putting the generation in PGPROC is good: That way there are fewer changes to the shared cachelines in the ProcSignal. As I think there'll be more modifications to barrierGen/barrierFlags (as I'd named them) than the number of times they're read, that could be beneficial in keeping the overhead low. It's probably still worthwhile to go the way you have. > - There details of the synchronization are different: there's no > ProcSignalKind for this new kind of barrier any more This I do not get. What's the point in making this unconditional? > , and there's now > only one pg_memory_barrier() and it's in a different place than Andres > had it. It is quite possible that I have screwed things up here, but I > couldn't make heads or tails of the way Andres had it. 
I don't know > whether that's because it was a POC or because I'm dumb. Also, moving > everything over into ProcSignal seemed to me to suggest some > rejiggering here that might've made less sense in the old scheme. I don't remember enough to comment usefully on why things were the way they are. > - I changed the code that waits for a barrier to use a latch wait, > because I think that's our usual convention. So wait-for-barrier gets > woken up early if the latch is set, and also itself manipulates the > latch. I also gave it a variable-length timeout, because when testing > it irked me that the same test randomly took either 0s or 1s. Now it > takes either 0s or 125ms, which is a lot less noticeable. Sounds good. > From e9bfcbc3a1b1aa4a01a5a281bc71c6f019f5d46b Mon Sep 17 00:00:00 2001 > From: Robert Haas <rhaas@postgresql.org> > Date: Wed, 27 Nov 2019 11:29:25 -0500 > Subject: [PATCH v2 4/5] Extend the ProcSignal mechanism to support barriers. > +uint64 > +EmitProcSignalBarrier(ProcSignalBarrierType type) > +{ > + uint64 flagbit = UINT64CONST(1) << (uint64) type; > + uint64 generation; > + > + /* > + * Set all the flags. > + * > + * Note that pg_atomic_fetch_or_u32 has full barrier semantics, so this > + * is totally ordered with respect to anything the caller did before, and > + * anything that we do afterwards. (This is also true of the later call > + * to pg_atomic_add_fetch_u64.) > + */ > + for (int i = NumProcSignalSlots - 1; i >= 0; i--) > + { > + volatile ProcSignalSlot *slot = &ProcSignal->psh_slot[i]; > + > + pg_atomic_fetch_or_u32(&slot->pss_barrierCheckMask, flagbit); > + } What's the reason for doing this in reverse? SendProcSignal() does it because it short-circuits the search once a match is found, and it theorizes they're at the end, but that doesn't apply here? It could potentially be worthwhile to check whether all backends already have the flag set, and in that case not increment the barrier generation? Probably not worth it for now. 
> +/* > + * WaitForProcSignalBarrier - wait until it is guaranteed that all changes > + * requested by a specific call to EmitProcSignalBarrier() have taken effect. > + * > + * We expect that the barrier will normally be absorbed very quickly by other > + * backends, so we start by waiting just 1/8 of a second and then back off > + * by a factor of two every time we time out, to a maximum wait time of > + * 1 second. > + */ > +void > +WaitForProcSignalBarrier(uint64 generation) > +{ > + long timeout = 125L; > + > + for (int i = NumProcSignalSlots - 1; i >= 0; i--) > + { > + volatile ProcSignalSlot *slot = &ProcSignal->psh_slot[i]; > + uint64 oldval; > + > + oldval = pg_atomic_read_u64(&slot->pss_barrierGeneration); > + while (oldval < generation) Hm. Without a barrier here, is it guaranteed that we can rely on the state we need? IOW, could it be that we see an update to pss_barrierGeneration to our desired generation, but still see "published" state by some other backend that isn't yet in accordance with the generation? Seems we ought to have at least a read barrier somewhere? > +/* > + * Perform global barrier related interrupt checking. > + * > + * Any backend that participates in ProcSignal signalling must arrange to > + * call this function periodically. It is called from CHECK_FOR_INTERRUPTS(), > + * which is enough for normal backends, but not necessarily for all types of > + * background processes. > + */ > +void > +ProcessProcSignalBarrier(void) > +{ > + uint64 generation; > + uint32 flags; > + > + /* Exit quickly if there's no work to do. */ > + if (!ProcSignalBarrierPending) > + return; > + ProcSignalBarrierPending = false; > + > + /* > + * Read the current barrier generation, and then get the flags that > + * are set for this backend. 
Note that pg_atomic_exchange_u32 is a full > + * barrier, so we're guaranteed that the read of the barrier generation > + * happens before we atomically extract the flags, and that any subsequent > + * state changes happen afterward. > + */ > + generation = pg_atomic_read_u64(&ProcSignal->psh_barrierGeneration); > + flags = pg_atomic_exchange_u32(&MyProcSignalSlot->pss_barrierCheckMask, 0); > + > + /* > + * Process each type of barrier. It's important that nothing we call from > + * here throws an error, because pss_barrierCheckMask has already been > + * cleared. If we jumped out of here before processing all barrier types, > + * then we'd forget about the need to do so later. > + * > + * NB: It ought to be OK to call the barrier-processing functions > + * unconditionally, but it's more efficient to call only the ones that > + * might need us to do something based on the flags. > + */ > + if (BARRIER_SHOULD_CHECK(flags, PROCSIGNAL_BARRIER_SAMPLE)) > + ProcessBarrierSample(); Hm. In my mental model it would be useful for barrier "processors" to not acknowledge the state change at certain points. Imagine e.g. needing to efficiently wait till all backends have processed a config file reload - since we don't reload while a query is being processed, we should be able to not ack the barrier at that point. Greetings, Andres Freund
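For readers following the quoted patch, the flag bookkeeping can be illustrated with a toy single-process model (illustrative only: the function names here are invented, and the real code operates on shared memory with pg_atomic operations; `BARRIER_SHOULD_CHECK` mirrors the macro in the patch):

```c
#include <assert.h>
#include <stdint.h>

/* Bit N of a backend's pending mask corresponds to ProcSignalBarrierType N,
 * as in the quoted EmitProcSignalBarrier(). */
#define BARRIER_SHOULD_CHECK(flags, type) \
	(((flags) & (((uint32_t) 1) << (uint32_t) (type))) != 0)

static uint32_t pss_barrierCheckMask;	/* stand-in for one slot's mask */

/* Emit side: OR the barrier type's bit into the slot's mask. */
static void
emit_barrier_flag(int type)
{
	pss_barrierCheckMask |= ((uint32_t) 1) << (uint32_t) type;
}

/* Process side: grab-and-clear the mask (pg_atomic_exchange_u32 in the
 * patch); the caller then dispatches once per set bit. */
static uint32_t
absorb_barrier_flags(void)
{
	uint32_t	flags = pss_barrierCheckMask;

	pss_barrierCheckMask = 0;
	return flags;
}
```

Because the mask is grabbed and cleared in one step, a barrier emitted twice before the backend next processes interrupts is seen as a single pending bit, which is part of why nothing called from the dispatch loop may throw an error.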
On Fri, Dec 6, 2019 at 12:17 PM Andres Freund <andres@anarazel.de> wrote: > > 0001 is just a code movement patch. It puts the interrupt handling for > > each type of background process into a subroutine, instead of having > > it all inline in the main loop. I feel this makes things more clear. > > Hopefully it's uncontroversial. > > I'd suggest marking the shutdown functions as noreturn. Might prevent > some compiler warnings in the future. That patch only adds one shutdown function, and it's already marked as noreturn, because it prevents a compiler warning in the present. So I don't know what you're asking for here. > > 0002 and 0003 are optional and slightly off-topic as far as the matter > > at hand, but they are not entirely unrelated and I believe that they > > are good cleanups. 0002 changes assorted background processes to use > > PostgresSigHupHandler and ConfigReloadPending rather than having a > > bunch of separate global variables and a bunch of separate signal > > handlers that basically do the same stuff. > > Hm. Why "Postgres*"? I think that's confusing. So far only backend-like > things are named like that, and I'd e.g. not call checkpointer that. Well, that's the existing function name, and 0003 changes it anyway. > It's probably fine, but it's worthwhile to consider that checking an > external variable like ConfigReloadPending is more expensive than a > static var like got_SIGHUP. The compiler will most of the time > understand the code flow well enough to know when something like got_SIGHUP > can change, but it can't assume anything like that with > ConfigReloadPending. But as we're forcing the compiler's hand by marking > the variable as volatile, it's not likely to be a meaningful difference. Yeah, I think it's pretty much certain to be in the noise. > > 0003 takes this a step > > further by trying to unify a bunch more error-handling across > > different background process types. 
Together, 0002 and 0003 save >300 > > lines of code and as far as I can see there's basically no downside. I > > think in general it would be good to strive towards a future where > > everybody used the same signal handling and interrupt handling, but > > these patches have the much less ambitious goal of just removing > > overtly duplicated code. I think they might act as an inducement to > > future patch authors to try to make things look more alike, though. > > I do agree that this is a good goal. The amount of redundant code we > have across different types of processes is utterly ridiculous. While > obviously not a significant problem, there's also You seem to have omitted the end of this sentence. > > 0004 is the main patch. It is substantially revised from Andres's > > version, but the basic principle is along similar lines. Changes: > > > > - I felt that it was imprudent to put some of the machinery into > > PGPROC and the rest into the ProcSignal mechanism, because there is > > nothing that says that everything with a PGPROC must also be a > > ProcSignal participant. So I moved everything over into the ProcSignal > > mechanism. This seems a lot safer to me. > > I don't fully remember how I ended up with that split. But I think there > is an argument that putting the generation in PGPROC is good: That way > there are fewer changes to the shared cachelines in the ProcSignal. As I > think there'll be more modifications to barrierGen/barrierFlags (as I'd > named them) than the number of times they're read, that could be > beneficial in keeping the overhead low. > > It's probably still worthwhile to go the way you have. I think so. I think there's just too much chance for error/frustration in having it in two separate mechanisms that might end up out of sync with each other. We could pad the stuff within ProcSignal separately, if required, but I'm hoping we're not going to be using this for properties that change frequently. 
> > - The details of the synchronization are different: there's no > > ProcSignalKind for this new kind of barrier any more > > This I do not get. What's the point in making this unconditional? I wasn't sure about this part, but Occam's razor argues for fewer moving parts. As I have it, you need to make sure that the signal handler is going to see our proc's generation as lower than the global generation, and that when we go to process the interrupt (no longer in the signal handler) we're going to be able to see the state changes that we need to absorb. Now, if you also have a ProcSignal thing, then you have three things to synchronize. When you see the flag set, you need to be able to see the generations as unequal, and when you see the generations as unequal, then you need to be able to see the state changes that you need to absorb. That seems more complex, but it might be worth it if the cost of comparing the barrier generations is too much. > > +uint64 > > +EmitProcSignalBarrier(ProcSignalBarrierType type) > > +{ > > + uint64 flagbit = UINT64CONST(1) << (uint64) type; > > + uint64 generation; > > + > > + /* > > + * Set all the flags. > > + * > > + * Note that pg_atomic_fetch_or_u32 has full barrier semantics, so this > > + * is totally ordered with respect to anything the caller did before, and > > + * anything that we do afterwards. (This is also true of the later call > > + * to pg_atomic_add_fetch_u64.) > > + */ > > + for (int i = NumProcSignalSlots - 1; i >= 0; i--) > > + { > > + volatile ProcSignalSlot *slot = &ProcSignal->psh_slot[i]; > > + > > + pg_atomic_fetch_or_u32(&slot->pss_barrierCheckMask, flagbit); > > + } > > What's the reason for doing this in reverse? SendProcSignal() does it > because it short-circuits the search once a match is found, and it > theorizes they're at the end, but that doesn't apply here? Mmm, this was just copy and paste. I can switch it around. 
> It could potentially be worthwhile to check whether all backends already > have the flag set, and in that case not increment the barrier > generation? Probably not worth it for now. I don't think it's worth it now, or maybe ever. It seems unlikely to come up, and it makes the synchronization more complex. > > +void > > +WaitForProcSignalBarrier(uint64 generation) > > +{ > > + long timeout = 125L; > > + > > + for (int i = NumProcSignalSlots - 1; i >= 0; i--) > > + { > > + volatile ProcSignalSlot *slot = &ProcSignal->psh_slot[i]; > > + uint64 oldval; > > + > > + oldval = pg_atomic_read_u64(&slot->pss_barrierGeneration); > > + while (oldval < generation) > > Hm. Without a barrier here, is it guaranteed that we can rely on the > state we need? IOW, could it be that we see an update to > pss_barrierGeneration to our desired generation, but still see > "published" state by some other backend that isn't yet in accordance with > the generation? Seems we ought to have at least a read barrier > somewhere? Yeah, I think now that you mention it, it should be a full barrier. The caller might see that the barrier has been absorbed and immediately perform either a read or a write. > Hm. In my mental model it would be useful for barrier "processors" to > not acknowledge the state change at certain points. Imagine e.g. needing > to efficiently wait till all backends have processed a config file > reload - since we don't reload while a query is being processed, we > should be able to not ack the barrier at that point. Yeah, you mentioned this before, but I think we could leave it for later. I think it's going to be complex. I suppose the way to do it would be to add an argument to ProcessProcSignalBarrier() that lets you specify which barrier events you're OK with processing from the current point in the code. 
However, that will mean that whenever somebody adds a new barrier type, they have got to go through all of those callers and think about whether they should add their new barrier type into the flags or not. If we try to do the specific thing you're talking about with config-file processing, it will also mean that we could be waiting MUCH longer for acknowledgements, which I think might have pretty far-reaching ramifications. You might get stuck waiting for that barrier to be absorbed for hours, and that would also impact later barriers, since you won't see the generation bump to 42 until it goes through 40 and 41. That sounds fairly awful. You could try to work around it by looking at which barrier flags are set, but I think that has ABA problems. It seems sufficient to me for now to come up with something that can be used for enabling checksums online and for the thing that got me motivated to work on this, which is ALTER SYSTEM READ ONLY. If we can meet the requirements of those two patches with this, I think we should be happy. And I don't think we need the feature you're talking about here for either of those use cases. If there's some simple change you want to propose, we can certainly talk about that, but it looks to me like you're talking about adding a fairly complex mechanism that wouldn't actually get used for anything right away, and I'd rather not go there. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
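The variable-length timeout discussed above is simple arithmetic; as a sketch of just that schedule from the quoted WaitForProcSignalBarrier() comment (125 ms, doubling on every timeout, capped at 1 second; the latch wait and the barrier logic itself are elided, and `next_timeout` is an invented name):

```c
#include <assert.h>

/* Backoff schedule from WaitForProcSignalBarrier()'s comment: start at
 * 125 ms and double on every timeout, up to a 1 second ceiling. */
static long
next_timeout(long timeout)
{
	timeout *= 2;
	if (timeout > 1000)
		timeout = 1000;
	return timeout;
}
```

In the patch this runs only on the timeout path; a latch set by the absorbing backend wakes the waiter immediately, which is why the common case still completes in roughly 0s.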
> On 9 Dec 2019, at 16:42, Robert Haas <robertmhaas@gmail.com> wrote: > On Fri, Dec 6, 2019 at 12:17 PM Andres Freund <andres@anarazel.de> wrote: I've read through the patchset and played around with it to try and break it and understand it (not in that order). Being a bit out of my comfort zone, I can't offer the deep insights that Andres has offered; but in reading the code it all makes sense, and it works as I expect it to. >> Hm. Why "Postgres*"? I think that's confusing. So far only backend-like >> things are named like that, and I'd e.g. not call checkpointer that. > > Well, that's the existing function name, and 0003 changes it anyway. Reading the patches in sequence I was irked by Postgres* as well, and was going to suggest PostgresProcess*, but since 0003 changes it immediately it's less of an issue. >>> 0003 takes this a step >>> further by trying to unify a bunch more error-handling across >>> different background process types. Together, 0002 and 0003 save >300 >>> lines of code and as far as I can see there's basically no downside. Agreed, I cannot see any negative impacts. The cleanup of common code makes the remaining process code more readable, so +1. The patch has a tiny typo though, s/procesess/processes/: + * Typically, this handler would be used for SIGTERM, but some procesess use > This patch probably needs some more changes before commit. One likely > candidate: the use of a sample barrier type here probably needs to be > replaced with something else. But the way this is set up doesn't > really lend itself to starting with 0 types of barrier and then adding > them later, so I am not sure exactly what to do to make this patch > independent of what follows. Something which can be observed in order to hook up a test for it, but which has no side effects? A NOOP barrier which only does a debug elog? 
> The callback proposal I made before > could've accommodated that (as well as, perhaps, some automated > testing by dynamically loading up a barrier type) but Andres didn't > like that. Perhaps he'll relent, or have a counter-proposal, or > someone else will feel differently. I sort of like the callback idea conceptually, but Andres is making a good point about the extensibility actually making it harder to reason about. > It seems sufficient to me for now to come up with something that can > be used for enabling checksums online FWIW I've rebased the online checksums patch on top of this patchset instead of Andres' previous patch and everything seems to work fine. All tests pass, and I've been unable to trigger anything unexpected. I'll post the new version of said patch shortly. > and for the thing that got me > motivated to work on this, which is ALTER SYSTEM READ ONLY. Aha, that's a pretty neat use case. In order to play around with and try to understand the patchset I wrote a PoC PGC_PROCSIGNALBARRIER class for GUCs and converted a few PGC_SIGHUP GUCs to use a barrier rather than a HUP. The idea was to test interacting with the API a bit more than the online checksum patch does, as its needs are pretty basic. In hacking this up I didn't really come across anything in particular that I lacked, and the result worked fine (insofar as a PoC can be considered working). cheers ./daniel
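As a toy model of the generation handshake these patches exercise (single-process, no atomics, latches, or signals; all names here are invented for illustration, not the patch's API): one bump of a global generation fans out to every backend, each backend catches up when it next processes interrupts, and the waiter polls until every backend's generation has reached the target. Note how two emitted barriers are coalesced into a single absorption, matching the behaviour described at the start of the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_BACKENDS 4

static uint64_t barrier_generation;					/* global generation counter */
static uint64_t backend_generation[NUM_BACKENDS];	/* last generation absorbed */

/* Emit: bump the global generation; returns the value to wait for. */
static uint64_t
emit_barrier(void)
{
	return ++barrier_generation;
}

/* Absorb: a backend processing interrupts catches up to the current
 * generation, acknowledging every barrier emitted so far at once. */
static void
absorb_barrier(int backend)
{
	backend_generation[backend] = barrier_generation;
}

/* Wait side: the barrier identified by gen has taken effect once every
 * backend's generation has reached it. */
static bool
barrier_absorbed(uint64_t gen)
{
	for (int i = 0; i < NUM_BACKENDS; i++)
		if (backend_generation[i] < gen)
			return false;
	return true;
}
```

The real implementation layers memory barriers, a signal to kick each backend, and the latch-based wait loop on top of exactly this generation comparison.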
On Mon, Dec 9, 2019 at 7:37 PM Daniel Gustafsson <daniel@yesql.se> wrote: > I've read through the patchset and played around with it to try and break it > and understand it (not in that order). Being a bit out of my comfort zone, I > can't offer the deep insights that Andres has done; but in reading the code it > all makes sense, and it works as I expect it to. Stellar. If nobody objects in the meantime, I plan to commit 0001-0003 next week. > The patch has a tiny typo though, s/procesess/processes/: But I'll fix that first. Thanks for the review. > Something which can be observed in order to hook up a test for it, but which > has no sideeffects? A NOOP barrier which only does a debug elog? It would be a bit hard to hook up a test for that, because a normal test doesn't see the logs, and a TAP test could, but in this case what you want to check for is that there are no stragglers who should have gotten the memo and didn't, which supposes you have a list of everyone who ought to get the memo. You could manually list out all the background processes but that seems like it would be at risk of becoming obsolete. You could check that everything in pg_stat_activity emitted a message except for a known list of exceptions, which would be less likely to become obsolete, but would also be complex. Also, I think it's a crock to have something like that in there long term just for testing purposes. > I sort of like the callback idea conceptually, but Andres is making a good > point about the extensibility actually making it harder to reason about. That objection doesn't hold any water for me, because this is open source. People can always patch the core. If we don't add hooks, then we make life easier for fork maintainers (including my employer) and harder for extension authors. I think that's *definitely* the wrong bet for the community to be making; we should be trying very hard to help extension authors and minimize the need for forks. 
If you install an extension that uses a hook, any hook, and it breaks things, then you get to keep both pieces. And that approach would let us do spiffy testing without having to leave cruft in core. > FWIW I've rebased the online checksums patch on top of this patchset instead of > Andres' previous patch and everything seems to work fine. All tests pass, and > I've been unable to trigger anything unexpected. I'll post the new version of > said patch shortly. Cool. > > and for the thing that got me > > motivated to work on this, which is ALTER SYSTEM READ ONLY. > > Aha, thats a pretty neat usecase. Thanks. > In order to play around with and try to understand the patchset I wrote a PoC > PGC_PROCSIGNALBARRIER class for GUCs and converted a few PGC_SIGHUP GUCs to use > a barrier rather than a HUP. The idea was to test interacting with the API a > bit more than the online checksum patch does, as its needs are pretty basic. > In hacking this up I didn't really come across anything in particular that I > lacked, and the result worked fine (insofar as a PoC can be considered > working). Cool. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hi, On 2019-12-11 09:12:49 -0500, Robert Haas wrote: > On Mon, Dec 9, 2019 at 7:37 PM Daniel Gustafsson <daniel@yesql.se> wrote: > > I sort of like the callback idea conceptually, but Andres is making a good > > point about the extensibility actually making it harder to reason about. > > That objection doesn't hold any water for me, because this is open > source. People can always patch the core. I just don't buy this argument. There's a difference between an unpatched version of postgres suddenly potentially running hooks everywhere CFI() etc is called, and some user patching postgres to behave differently. In the former case we'll have to ask people to reproduce problems without extensions in a lot more cases. For me, code like this, which runs in pretty low-level situations that we've gotten wrong more than once, doesn't benefit from being extensible. We just make things more fragile, and provide traps for extension authors. > If we don't add hooks, then we make life easier for fork maintainers > (including my employer) and harder for extension authors. I think > that's *definitely* the wrong bet for the community to be making; we > should be trying very hard to help extension authors and minimize the > need for forks. But that's just not how it ends up working in a lot of cases? People still report bugs to the list, and the debugging experience of problems where an extension causes crashes-at-a-distance is pretty bad. Greetings, Andres Freund
On Wed, Dec 11, 2019 at 12:38 PM Andres Freund <andres@anarazel.de> wrote: > I just don't buy this argument. There's a difference between an > unpatched version of postgres suddenly potentially running hooks > everywhere CFI() etc is called, and some user patching postgres to > behave differently. In the former case we'll have to ask people to reproduce > problems without extensions in a lot more cases. For me, code like this, > which runs in pretty low-level situations that we've gotten wrong more > than once, doesn't benefit from being extensible. We just make things more > fragile, and provide traps for extension authors. It seems to me that this amounts to an argument that (a) core developers are smarter than extension developers, and are thus the only ones entitled to play with sharp tools, and/or (b) core developers are more important than extension developers, and thus inconveniencing extension developers is OK if it makes life better for core developers. Both propositions seem pretty arrogant to me. > But that's just not how it ends up working in a lot of cases? People > still report bugs to the list, and the debugging experience of problems > where an extension causes crashes-at-a-distance is pretty bad. I agree that crashes at a distance are bad, and that those caused by extensions are more annoying than those caused by core code, because in the former case there is a silent contributor to the problem about which we may have little information. However, I disagree with the idea that the right solution is to try to lock things down so that extension authors can't do stuff. Extensibility is an important part of what has made PostgreSQL successful, and I think we need more of it, not less. We can to some extent mitigate these kinds of problems by adding assertions to our code that will catch usage errors, which has the advantage that those things also get caught when somebody accidentally introduces such bugs into core. 
To the extent that we can't mitigate them, I think we should live with them, because I think extensibility is far too valuable to cast aside over such concerns. While I have passionate philosophical feelings about this topic, for purposes of the present thread the really important question (IMV, anyway) is whether there's any way of getting a patch for global barriers committed in advance of the first user of such barriers. Both your version and mine add an enum, and I think that you can't have an enum with no elements. If you have a suggestion as to how we can structure things so that we can start out with 0 users of this mechanism rather than needing to start out with 1, I'd like to hear it. If not, then I guess we'll need to decide which of checksum enable/disable and ALTER SYSTEM READ ONLY is going to go first and commit this only when that patch is ready to go as well. Or, I suppose, commit it with a dummy placeholder that then gets replaced by whichever patch goes first, but I'm not sure whether people would be OK with that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hi, On 2019-12-11 13:35:26 -0500, Robert Haas wrote: > While I have passionate philosophical feelings about this topic, for > purposes of the present thread the really important question (IMV, > anyway) is whether there's any way of getting a patch for global > barriers committed in advance of the first user of such barriers. Right. I think there is. > If not, then I guess we'll need to decide which of checksum > enable/disable and ALTER SYSTEM READ ONLY is going to go first and > commit this only when that patch is ready to go as well. Or, I > suppose, commit it with a dummy placeholder that then gets replaced by > whichever patch goes first, but I'm not sure whether people would be > OK with that. I'd either add a test (if we have some) or a placeholder kind initially. But I'd also be ok with going for either of the other versions directly - but it seems harder to tackle the patches together. Greetings, Andres Freund
On Thu, Dec 12, 2019 at 2:54 PM Andres Freund <andres@anarazel.de> wrote: > I'd either add a test (if we have some) or a placeholder kind > initially. But I'd also be ok with going for either of the other > versions directly - but it seems harder to tackle the patches together. OK. I have committed 0001-0003 as I had mentioned last week that I intended to do. For 0004, I have replaced "sample" with "placeholder" and added comments explaining that this is intended to be replaced by the first real user of the mechanism. If there are no further comments/objections I'll go ahead with this one as well. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
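The committed placeholder is presumably along these lines (a sketch; the exact comment wording is the committer's): a C enum cannot be empty, so the mechanism ships with one dummy member for the first real user to replace.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the placeholder barrier type: the mechanism starts with one
 * dummy member, to be replaced by its first real user (online checksums
 * or ALTER SYSTEM READ ONLY). */
typedef enum
{
	PROCSIGNAL_BARRIER_PLACEHOLDER	/* replace with the first real barrier */
} ProcSignalBarrierType;
```

A later patch replacing the placeholder simply renames the member; since it still occupies flag bit 0, nothing else in the barrier machinery has to change.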
Hello > Stellar. If nobody objects in the meantime, I plan to commit 0001-0003 > next week. My compiler (gcc 8.3.0) is not happy with the recent 5910d6c7e311f0b14e3d3cb9ce3597c01d3a3cde commit: autovacuum.c:831:1: error: ‘AutoVacLauncherShutdown’ was used with no prototype before its definition [-Werror=missing-prototypes] checkpointer.c:524:1: error: ‘HandleCheckpointerInterrupts’ was used with no prototype before its definition [-Werror=missing-prototypes] I think the definitions should look as in the attached patch. With this change the build is clean regards, Sergei
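For anyone chasing the same warning: -Wmissing-prototypes fires when a function is defined without a prototype having been seen first; giving the function internal linkage with a matching forward declaration is one common shape of the fix (illustrative name below, not the actual PostgreSQL function):

```c
#include <assert.h>

/* Illustrative only.  -Wmissing-prototypes complains when a function is
 * defined with no previous prototype; declaring it before the definition
 * (here as static, with internal linkage) silences the warning. */
static int HandleToyInterrupts(void);	/* forward declaration */

static int
HandleToyInterrupts(void)
{
	return 1;					/* pretend one pending interrupt was handled */
}
```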
On Tue, Dec 17, 2019 at 1:44 PM Sergei Kornilov <sk@zsrv.org> wrote: > > Stellar. If nobody objects in the meantime, I plan to commit 0001-0003 > > next week. > > My compiler (gcc 8.3.0) is not happy with recent 5910d6c7e311f0b14e3d3cb9ce3597c01d3a3cde commit: > > autovacuum.c:831:1: error: ‘AutoVacLauncherShutdown’ was used with no prototype before its definition [-Werror=missing-prototypes] > checkpointer.c:524:1: error: ‘HandleCheckpointerInterrupts’ was used with no prototype before its definition [-Werror=missing-prototypes] > > I think definition should looks as in attached patch. With this change build is clean Andrew Gierth complained about this too over on -committers, and I saw his message first and pushed a fix. It includes the first and third hunks from your proposed patch, but not the second one. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hi > Andrew Gierth complained about this too over on -committers, and I saw > his message first and pushed a fix. It includes the first and third > hunks from your proposed patch, but not the second one. Yep, I received his email just after sending mine. Thanks, my build is clean now. regards, Sergei
On Tue, Dec 17, 2019 at 1:38 PM Robert Haas <robertmhaas@gmail.com> wrote: > On Thu, Dec 12, 2019 at 2:54 PM Andres Freund <andres@anarazel.de> wrote: > > I'd either add a test (if we have some) or a placeholder kind > > initially. But I'd also be ok with going for either of the other > > versions directly - but it seems harder to tackle the patches together. > > OK. I have committed 0001-0003 as I had mentioned last week that I > intended to do. For 0004, I have replaced "sample" with "placeholder" > and added comments explaining that this is intended to be replaced by > the first real user of the mechanism. If there are no further > comments/objections I'll go ahead with this one as well. And, done. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Dec 19, 2019 at 8:57 PM Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Dec 17, 2019 at 1:38 PM Robert Haas <robertmhaas@gmail.com> wrote:
> > On Thu, Dec 12, 2019 at 2:54 PM Andres Freund <andres@anarazel.de> wrote:
> > > I'd either add a test (if we have some) or a placeholder kind
> > > initially. But I'd also be ok with going for either of the other
> > > versions directly - but it seems harder to tackle the patches together.
> >
> > OK. I have committed 0001-0003 as I had mentioned last week that I
> > intended to do. For 0004, I have replaced "sample" with "placeholder"
> > and added comments explaining that this is intended to be replaced by
> > the first real user of the mechanism. If there are no further
> > comments/objections I'll go ahead with this one as well.
> And, done.
\o/
//Magnus
On Mon, Dec 9, 2019 at 10:42 AM Robert Haas <robertmhaas@gmail.com> wrote: > > Hm. In my mental model it would be useful for barrier "processors" to > > not acknowledge the state change at certain points. Imagine e.g. needing > > to efficiently wait till all backends have processed a config file > > reload - since we don't reload while a query is being processed, we > > should be able to not ack the barrier at that point. > > Yeah, you mentioned this before, but I think we could leave it for > later. I think it's going to be complex. I suppose the way to do it > would be to add an argument to ProcessProcSignalBarrier() that lets > you specify which barrier events you're OK with processing from the > current point in the code. However, that will mean that whenever > somebody adds a new barrier type, they have got to go through all of > those callers and think about whether they should add their new > barrier type into the flags or not. If we try to do the specific thing > you're talking about with config-file processing, it will also mean > that we could be waiting MUCH longer for acknowledgements, which I > think might have pretty far-reaching ramifications. You might get > stuck waiting for that barrier to be absorbed for hours, and that > would also impact later barriers, since you won't see the generation > bump to 42 until it goes through 40 and 41. That sounds fairly awful. > You could try to work around it by looking at which barrier flags are > set, but I think that has ABA problems. I might have rejected this idea too hastily. At any rate, I have a new view on it having studied the issue a bit more. In the case of ALTER SYSTEM READ ONLY, the big problem is that we can't afford to find out that we're unable to emit WAL after we've already entered a critical section, because "ERROR: system is read only" or similar will get promoted to PANIC due to the critical section and that will be sad. 
(Before someone asks, not promoting the ERROR to PANIC would be
unsafe, even for this specific case.)

Some WAL records don't use a critical section, though it's recently
been proposed[1] that we add a critical section in at least some of
the cases that currently lack one. For those, I think we can just
upgrade the existing test-and-elog cases in XLogInsert() and
XLogBeginInsert() to ereport() and call it good. But the cases that do
have a critical section are harder. Currently, I see only the
following two options:

(1) Check just before entering a critical section whether or not the
system is read only, and if so, error out. Once we've entered a
critical section, ProcessInterrupts() will not do anything, so at that
point we're safe. This could be done either by making
START_CRIT_SECTION() itself do it or by finding every critical section
that is used for inserting WAL and adding a call just beforehand. The
latter is probably better, as some critical sections appear to be used
for other purposes; see PGSTAT_BEGIN_WRITE_ACTIVITY in particular. But
it means changing things in a bunch of places, adding an if-test. That
wouldn't add *much* overhead, but it's not quite zero.

(2) Accept barrier events for read-only/read-write changes only at
command boundaries. In that case, we only need to check for a read
only state around the places where we currently do
PreventCommandIfReadOnly and similar, and the additional overhead
should be basically nil. However, this seems pretty unappealing to me,
because it means that if you try to make the system READ ONLY, it
might run for hours before actually going read only. If you're trying
to make the system read only because the master has become isolated
from the rest of your network, that's not a fast enough response.

In general, I think the problem here is to avoid TOCTTOU problems. I
think the checksums patch has this problem too.
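Option (1) can be illustrated with a toy model. Everything below is invented for illustration (it is not the real START_CRIT_SECTION()/CritSectionCount machinery); the point is only that the read-only test happens before entering the critical section, after which interrupts are no longer processed and the state can't change under us:

```c
#include <assert.h>

static int system_read_only;   /* stand-in for the shared read-only flag */
static int crit_section_depth; /* stand-in for CritSectionCount */

static void start_crit_section(void) { crit_section_depth++; }
static void end_crit_section(void)   { crit_section_depth--; }

/* Returns 0 (skipping the WAL write) if the system went read only; a
 * real backend would ereport(ERROR) here instead. */
static int write_wal_record(void)
{
    /* The check happens *before* the critical section... */
    if (system_read_only)
        return 0;

    start_crit_section();
    /* ...so no ERROR need be raised in here, where it would be
     * promoted to PANIC. */
    end_crit_section();
    return 1;
}
```

This works cleanly for read-only mode because there is a single choke point: the only consumer of the state is the decision of whether WAL insertion may proceed. The checksums patch has no such single use site.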
For instance, the changes to basebackup.c in that patch check
DataChecksumsNeedVerify() only once per file, but what is to say that
a call to ProcessInterrupts() couldn't happen in the middle of sending
a file, changing the prevailing value? If someone sets checksums off,
and everyone acknowledges that they're off, then backends may begin
writing blocks without setting checksums, and then this code will
potentially try to verify checksums in a block written without them.
That patch also modifies the definition of XLogHintBitIsNeeded(), so
e.g. log_heap_visible is wrong if a CHECK_FOR_INTERRUPTS() can happen
in XLogRegisterBuffer(). I don't think it can currently, but it seems
like a heck of a fragile assumption, and I don't see how we can be
sure that there's no test for XLogHintBitIsNeeded() that does a
CHECK_FOR_INTERRUPTS() between where it's tested and where it does
something critical with the result of that test.

At the moment, the ALTER SYSTEM READ ONLY problem appears to me to be
the lesser of the two (although I may be wrong). Assuming people are
OK with inserting an additional check before each xlog-writing
critical section, we can just go do that, and then I think the problem
is pretty much solved. The only way there would be a residual problem,
I think, is if something could change between the time XLogInsert()
checks whether xlog writing is allowed and the time when it does the
insert. Even if that's an issue, it can be fixed with a
HOLD_INTERRUPTS/RESUME_INTERRUPTS pair in just that function. The
meaning of READ ONLY is just that xlog insertion is prohibited, so the
only use of the value is for allowing XLogInsert() to go ahead or not.
But for checksums, there's not such a clear definition of what it
means to use the value, nor does it get used in only one place, so the
situation seems less clear.
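The TOCTTOU shape being described can be reduced to a toy. The names below are hypothetical stand-ins for DataChecksumsNeedVerify() and CHECK_FOR_INTERRUPTS(); the simulated interrupt always flips the flag so the race is deterministic. The hazard is re-reading mutable shared state across an interrupt point; the mitigation sketched is reading it once and using the cached value throughout (a HOLD_INTERRUPTS/RESUME_INTERRUPTS pair around both uses would be the other option):

```c
#include <assert.h>

static int checksums_enabled; /* stand-in for the shared checksum state */

/* Simulated CHECK_FOR_INTERRUPTS(): the shared state may change here. */
static void process_interrupts(void) { checksums_enabled = 0; }

/* Racy shape: two reads of the flag straddle an interrupt point, so
 * they can disagree.  Returns 1 if the two reads agreed. */
static int racy_check(void)
{
    int first = checksums_enabled;
    process_interrupts();           /* state changes in between */
    int second = checksums_enabled;
    return first == second;
}

/* Stable shape: decide once, up front, and use only the cached value
 * afterward.  The decision may be stale, but it is self-consistent. */
static int stable_check(void)
{
    int cached = checksums_enabled; /* the single read */
    process_interrupts();
    /* ...all later logic consults 'cached', never the flag itself */
    return cached;
}
```

Caching only makes a routine internally consistent; as the basebackup.c example above shows, it can still act on a value the rest of the system has moved past, which is the harder part of the checksums problem.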
In short, I am still not sure that we need the facility Andres was
asking for here, but I do now understand better why Andres was worried
about having this capability, and I do think that patches using it are
going to need to be very carefully studied for TOCTTOU problems.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1] http://postgr.es/m/20191207001232.klidxnm756wqxvwx@alap3.anarazel.de