On Tue, Oct 30, 2018 at 6:16 AM Andres Freund <andres@anarazel.de> wrote:
Hi,
Magnus cornered me at pgconf.eu and asked me whether I could prototype the "barriers" I'd been talking about in the online checksumming thread.
The problem there was to make sure that all processes (backends and auxiliary processes alike) have seen the new state of checksums being enabled, and aren't currently in the process of writing a new page out.
The current prototype solves that by requiring a restart, but that strikes me as far too large a hammer.
The attached patch introduces "global barriers" (name was invented in an overcrowded hotel lounge, so ...), which allow waiting for such a change to be absorbed by all backends.
I've only tested the code with gdb, but that seems to work:
p WaitForGlobalBarrier(EmitGlobalBarrier(GLOBBAR_CHECKSUM))
waits until all backends (including bgwriter, checkpointers, walwriters, bgworkers, ...) have accepted interrupts at least once. Multiple such requests are coalesced.
I decided to wait until interrupts are actually processed, rather than just the signal received, because that means the system is in a well-defined state. E.g. there are no pages currently being written out.
For the checksum enablement patch you'd do something like:
and after that you should be able to set it to a persistent mode.
I chose to use procsignals to send the signals, a global uint64 globalBarrierGen, and per-backend barrierGen, barrierFlags, with the latter keeping track of which barriers have been requested. There are likely other use cases.
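
For readers following along, the generation-counter scheme described above can be modelled roughly as below. This is a single-process sketch, not code from the patch: the names globalBarrierGen, barrierGen and barrierFlags come from the description, while SimBackend, NBACKENDS, BARRIER_CHECKSUM, the signalled field and the sim_* functions are hypothetical stand-ins. A real implementation would presumably keep this state in shared memory, use atomics, and deliver the request via procsignal rather than a boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NBACKENDS 4             /* hypothetical: number of simulated backends */
#define BARRIER_CHECKSUM 0x1u   /* hypothetical flag bit, stands in for GLOBBAR_CHECKSUM */

typedef struct SimBackend
{
    uint64_t barrierGen;    /* last generation this backend has absorbed */
    uint32_t barrierFlags;  /* barrier kinds requested but not yet absorbed */
    bool     signalled;     /* stands in for a pending procsignal */
} SimBackend;

static uint64_t   globalBarrierGen = 0;
static SimBackend backends[NBACKENDS];

/* Request a barrier: bump the global generation and flag every backend. */
static uint64_t
sim_emit_barrier(uint32_t flags)
{
    uint64_t gen = ++globalBarrierGen;

    for (int i = 0; i < NBACKENDS; i++)
    {
        backends[i].barrierFlags |= flags;
        backends[i].signalled = true;
    }
    return gen;
}

/* Called at the point where backend i processes interrupts. */
static void
sim_process_interrupts(int i)
{
    assert(i >= 0 && i < NBACKENDS);

    if (!backends[i].signalled)
        return;
    backends[i].signalled = false;

    /* A real backend would act on each requested flag here. */
    backends[i].barrierFlags = 0;

    /* Catching up to the current generation absorbs all coalesced requests. */
    backends[i].barrierGen = globalBarrierGen;
}

/* True once every backend has absorbed generation gen or a later one. */
static bool
sim_barrier_reached(uint64_t gen)
{
    for (int i = 0; i < NBACKENDS; i++)
    {
        if (backends[i].barrierGen < gen)
            return false;
    }
    return true;
}
```

In this model a waiter would call sim_emit_barrier() and then poll sim_barrier_reached() with the returned generation. Two requests emitted before a backend next processes interrupts are absorbed in a single pass, which is one way to get the coalescing mentioned earlier.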
The patch definitely is in a prototype stage. At the very least it needs a high-level comment somewhere, and some of the lower-level code needs to be cleaned up.
One thing I wasn't happy about is how checksum internals have to absorb barrier requests - that seems unavoidable, but I'd hope for something more global than just BufferSync().
Comments?
Finally getting around to playing with this one and it unfortunately doesn't apply anymore (0003).
I think it's just a matter of adding those two rows though, right? That is, it's not an actual conflict, it's just something else added in the same place?