Re: global / super barriers (for checksums) - Mailing list pgsql-hackers

From Andres Freund
Subject Re: global / super barriers (for checksums)
Date
Msg-id 20191113194458.4me5a7dsspdldtwh@alap3.anarazel.de
Whole thread Raw
In response to Re: global / super barriers (for checksums)  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: global / super barriers (for checksums)  (Robert Haas <robertmhaas@gmail.com>)
Re: global / super barriers (for checksums)  (Magnus Hagander <magnus@hagander.net>)
List pgsql-hackers
Hi,

On 2019-11-13 12:26:34 -0500, Robert Haas wrote:
> TL;DR: I'm not sure that we need 0001; I propose to commit 0002; and I
> have some concerns about 0003 and am interested in working further on
> it.

Thanks for looking at the patch!


> 0001 changes the StartBackgroundWorker so that the SIGINT handler is
> contingent on BGWORKER_SHMEM_ACCESS rather than
> BGWORKER_BACKEND_DATABASE_CONNECTION. It seems to me that the goal
> here should be to use procsignal_sigusr1_handler(), or something that
> calls it, in any process where ProcSignalInit() is called, but a
> backend that only requests BGWORKER_SHMEM_ACCESS probably won't,
> because the normal way for a background worker to call
> ProcSignalInit() would be to indirectly call InitPostgres() by means
> of BackgroundWorkerInitializeConnection() or
> BackgroundWorkerInitializeConnectionByOid(), and a worker that didn't
> ask for a database connection presumably shouldn't be doing that. So
> I'm not sure that I understand why we need this. (Is it legal for a
> worker to call one of these functions as long as it passes InvalidOid
> for the database OID, or something? Do we have examples of workers
> that do that?)

Hm. Well, it seems useful for non-database connected processes to be
able to partake in global barriers. Without that, if there's any chance
they could e.g. generate WAL, it'd e.g. break the checksum enablement
patch. Note that auxiliary processes already do call ProcSignalInit().

You're right that we ought to make it easier (or automatic) to call
ProcSignalInit() for such processes.  Perhaps we ought to do so in
ProcessInit()?

But perhaps we don't strictly need this - I'm not sure how many examples
of BGWORKER_SHMEM_ACCESS bgworkers that don't also use
BGWORKER_BACKEND_DATABASE_CONNECTION there are.



> Regarding 0003:
> 
> - The proc->pid == 0 check in EmitGlobalBarrier() doesn't really do
> what it appears to do, because regular backends don't clear proc->pid
> when they exit; only auxiliary processes do. (We could, and perhaps
> should, change that.)

Ick.


> - It seems  to me that it would be generally better to insert
> CHECK_FOR_INTERRUPTS() in places where the patch just inserts an
> ad-hoc if (GlobalBarrierInterruptPending)
> ProcessGlobalBarrierIntterupt(). There might be someplace where we
> have to do it the latter way because we unavoidably hold an LWLock or
> something, but I think we should avoid that whenever possible. If it's
> a good place to check for one kind of interrupt, it's probably a good
> place to check for all of them.

I might be missing something. Aren't all of the places where those
checks are places where we currently can't do a CHECK_FOR_INTERRUPTS()?
I've swapped this thoroughly out of my mind, but going through them:

1) AutoVacLauncherMain() - doesn't do CFI()
2) BackgroundWriterMain() - dito
3) CheckpointerMain() - dito
4) HandleStartupProcInterrupts() - dito
5) WalWriterMain() - dito
6) BufferSync() - dito, called from CheckpointerMain(), and startup process
7) ProcessClientWriteInterrupt() - can't do generic CFI, don't want to
   process all interrupts while writing out data, to avoid corrupting
   the output stream or loosing messages

Which one do you think we should convert to CFI()? As far as I can tell
we can't make the non-backend cases use the postgres.c
ProcessInterrupts(), and the ProcessClientWriteInterrupt() one can't do
so either.


> - I don't think BufferSync() is doing this in the right place. Unless
> I am confused, it's doing it inside a loop that just allocates stuff
> and copies data around, which probably does not need this kind of
> decoration. It wouldn't hurt to check here, and it might be a good
> idea for safety, but I think the place that really needs this
> treatment is the following loop where we are actually doing I/O and
> sleeping. Possibly we should even be doctoring CheckpointWriteDelay to
> use a latch-wait loop that can CHECK_FOR_INTERRUPTS() before
> re-sleeping, although for 100ms (curiously described as a Big Sleep)
> it might be overkill.

Yea. Not sure what happened there. I think it's good to have such a
check in all the buffer loops in BufferSync(), but clearly it's most
important to have one in the write case.


> - I think it would be nice to make this system more extensible. IIUC,
> the idea is that ProcessGlobalBarrierIntterupt() will call bespoke
> code for each GLOBBAR_* flag that set, and that code will then update
> process-local state from global state before returning. But, instead
> of hard-coding constants, how about just having a list of callbacks,
> and we call them all here, and each one is responsible for figuring
> out whether anything's been changed that it cares about, and if so
> updating the appropriate local state?

I don't think that's a good idea. This stuff happens at an extremely low
level, in different types of processes, with various locks held
etc. Running arbitrary code in those circumstances strikes me as a
seriously bad idea.


> Then GlobalBarrierKind goes away altogether. Granted, this could be a
> bit expensive if we were using global barriers for lots of different
> things and at least some of those things occurred frequently, but I
> don't think we're very near the point where we need that kind of
> optimization. As long as the callbacks have tests that exit quickly if
> there's nothing to do, I think it should be fine. (As an intermediate
> position, we could consider keeping barrierFlags but assign the bits
> dynamically; but unless and until we get more users of the facility, I
> think it might be simplest to just forget about barrierFlags
> altogether. Then even extensions can use this, if they want.)

I think we need the option to *not* "accept" the barrier (as in, not
update MyProc->barrierGen), because we cannot guarantee the option has
taken effect yet. Without something identifying which types of events
being waited for, that seems hard.


> - The patch needs some general tidying up, like comments and naming
> consistency and stuff like that.

Yea. It really was a prototype to allow Magnus to continue...


> Andres, Magnus, if neither of you are planning to work on this soon, I
> might like to jump in and run with this. Please let me know your
> thoughts.

I'm not planning to do so in the near term - so I'd more than welcome
you to do so.

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.
Next
From: Andrew Dunstan
Date:
Subject: Re: ssl passphrase callback