Re: global / super barriers (for checksums) - Mailing list pgsql-hackers

From Magnus Hagander
Subject Re: global / super barriers (for checksums)
Date
Msg-id CABUevEwy4LUFqePC5YzanwtzyDDpYvgrj6R5WNznwrO5ouVg1w@mail.gmail.com
Whole thread Raw
In response to global / super barriers (for checksums)  (Andres Freund <andres@anarazel.de>)
Responses Re: global / super barriers (for checksums)  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Tue, Oct 30, 2018 at 6:16 AM Andres Freund <andres@anarazel.de> wrote:
Hi,

Magnus cornered me at pgconf.eu and asked me whether I could prototype
the "barriers" I'd been talking about in the online checksumming thread.

The problem there was to make sure that all processes, backends and
auxiliary processes have seen the new state of checksums being enabled,
and aren't currently in the process of writing a new page out.

The current prototype solves that by requiring a restart, but that
strikes me as a far too large hammer.

The attached patch introduces "global barriers" (name was invented in a
overcrowded hotel lounge, so ...), which allow to wait for such a change
to be absorbed by all backends.

I've only tested the code with gdb, but that seems to work:

p WaitForGlobalBarrier(EmitGlobalBarrier(GLOBBAR_CHECKSUM))

waits until all backends (including bgwriter, checkpointers, walwriters,
bgworkers, ...) have accepted interrupts at least once.  Multiple such
requests are coalesced.

I decided to wait until interrupts are actually process, rather than
just the signal received, because that means the system is in a well
defined state. E.g. there's no pages currently being written out.

For the checksum enablement patch you'd do something like;

EnableChecksumsInShmemWithLock();
WaitForGlobalBarrier(EmitGlobalBarrier(GLOBBAR_CHECKSUM));

and after that you should be able to set it to a perstistent mode.


I chose to use procsignals to send the signals, a global uint64
globalBarrierGen, and per-backend barrierGen, barrierFlags, with the
latter keeping track which barriers have been requested. There likely
seem to be other usecases.


The patch definitely is in a prototype stage. At the very least it needs
a high-level comment somewhere, and some of the lower-level code needs
to be cleaned up.

One thing I wasn't happy about is how checksum internals have to absorb
barrier requests - that seems unavoidable, but I'd hope for something
more global than just BufferSync().


Comments?



Finally getting back to this one.

In re-reading this, I notice there are a lot of references to Intterrupt (with two t). I'm guessing this is just a spelling error, and not something that actually conveys some meaning?

Can you elaborate on what you mean with:
+       /* XXX: need a more principled approach here */

Is that the thing you refer to above about "checksum internals"?

Also in checking we figured it'd be nice to have a wait event for this, since a process can potentially get stuck in an infinite loop waiting for some other process if it's misbehaving. Kind of like the attached?

//Magnus
 
Attachment

pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: progress report for ANALYZE
Next
From: Alvaro Herrera
Date:
Subject: Re: Contribution to Perldoc for TestLib module in Postgres