Re: Postgres-R: internal messaging - Mailing list pgsql-hackers

From Markus Wanner
Subject Re: Postgres-R: internal messaging
Msg-id 4887940C.4090305@bluegap.ch
In response to Re: Postgres-R: internal messaging  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Postgres-R: internal messaging  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Hi,

What follows are some comments from trying to understand how the 
autovacuum launcher works, plus thoughts on how to apply this to the 
replication manager in Postgres-R.

The initial comments in autovacuum.c say:

> If the fork() call fails in the postmaster, it sets a flag in the shared
> memory area, and sends a signal to the launcher.  

I note that the shmem area that the postmaster is writing to is pretty 
static and not dependent on any other state stored in shmem. That 
certainly makes a difference compared to my imessages approach, where a 
corruption in the shmem for imessages could also confuse the postmaster.
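
To make the "pretty static" point concrete, here is a minimal sketch of such a flag-plus-signal handoff. All names (LauncherSignalArea, postmaster_report_fork_failure, and so on) are hypothetical and not taken from autovacuum.c, and raise() stands in for the kill() that a separate postmaster process would send to the launcher. The point it tries to show is that the postmaster side only does a blind store into a fixed-layout struct and never reads or follows anything in shmem.

```c
#include <signal.h>
#include <stdbool.h>

/* Hypothetical fixed-layout area: no pointers, lengths or offsets,
 * so the writer never has to interpret existing shmem contents. */
typedef struct LauncherSignalArea
{
    volatile sig_atomic_t fork_failed;  /* written by the postmaster */
} LauncherSignalArea;

/* Set by the signal handler, polled by the launcher's main loop. */
static volatile sig_atomic_t got_wakeup = 0;

static void
wakeup_handler(int signo)
{
    (void) signo;
    got_wakeup = 1;
}

/* Launcher side: install the handler once at startup. */
static void
launcher_init(void)
{
    signal(SIGUSR2, wakeup_handler);
}

/* Postmaster side: a blind store followed by a signal. */
static void
postmaster_report_fork_failure(LauncherSignalArea *area)
{
    area->fork_failed = 1;
    raise(SIGUSR2);     /* stand-in for kill(launcher_pid, SIGUSR2) */
}

/* Launcher side: returns true once per reported failure and clears it. */
static bool
launcher_consume_fork_failure(LauncherSignalArea *area)
{
    if (!got_wakeup)
        return false;
    got_wakeup = 0;
    if (area->fork_failed)
    {
        area->fork_failed = 0;
        return true;
    }
    return false;
}
```

In this sketch, garbage in the rest of shared memory cannot affect the postmaster, because the only thing it ever does with the area is write a constant to a known offset.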

Reading on, the 'can_launch' flag in the launcher's main loop makes sure 
that only one worker is requested concurrently, so that the launcher 
doesn't miss a failure or success notice from either the postmaster or 
the newly started worker. The replication manager currently shamelessly 
requests as many helper backends as it wants. I think I can change that 
without much trouble, and it would certainly make sense.
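
A rough sketch of what the same one-request-at-a-time gating might look like for the replication manager follows. The struct and function names are made up for illustration. Only the idea of a can_launch flag comes from the launcher's main loop, and how the manager actually learns about success or failure is left out.

```c
#include <stdbool.h>

/* Hypothetical bookkeeping kept privately by the replication manager. */
typedef struct HelperLauncher
{
    bool can_launch;    /* false while one request is still unanswered */
    int  wanted;        /* helper backends still to be started */
    int  running;       /* helper backends known to be running */
} HelperLauncher;

/* Returns true if a start request should be sent to the postmaster now. */
static bool
maybe_request_helper(HelperLauncher *l)
{
    if (!l->can_launch || l->wanted == 0)
        return false;
    l->can_launch = false;      /* exactly one request in flight */
    return true;
}

/* The newly started helper has reported in. */
static void
helper_started(HelperLauncher *l)
{
    l->wanted--;
    l->running++;
    l->can_launch = true;
}

/* The postmaster reported that fork() failed. */
static void
helper_fork_failed(HelperLauncher *l)
{
    /* wanted is unchanged, so the next loop iteration retries */
    l->can_launch = true;
}
```

With at most one request outstanding, every success or failure notice can be matched to exactly one request, which is what keeps a notice from being missed.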

What remains is notifying the replication manager after a helper 
backend terminates or crashes. Upon normal errors (i.e. elog(ERROR... ), the 
backend processes themselves should take care of notifying the 
replication manager. But crashes are more difficult. IMO the replication 
manager needs to stay alive during this reinitialization, to keep the 
GCS connection. It can easily detach from shared memory 
temporarily (the imessages stuff is the only shmem place it touches, 
IIRC). The more difficult aspect is that it must be able to tell whether a 
backend applied its transaction *before* it died. Thus, after 
all backends have been killed, the postmaster needs to hold off on 
reinitializing shared memory until the replication manager has consumed 
all its messages. (Otherwise we would risk "losing" local transactions, 
probably also remote ones).
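
The ordering constraint in the paragraph above could be modelled roughly as below. This is only a sketch under assumptions: the names are invented, and it assumes the replication manager tells the postmaster by some out-of-band means (a signal, say) that it has drained its messages and detached, so that the postmaster never has to read the possibly corrupted queue itself.

```c
#include <stdbool.h>

/* Hypothetical state kept in postmaster-private memory, not in shmem. */
typedef struct CrashRecovery
{
    int  live_backends;     /* backends not yet reaped after the crash */
    bool manager_drained;   /* manager consumed all messages and detached */
} CrashRecovery;

/* Postmaster reaped one backend. */
static void
backend_reaped(CrashRecovery *r)
{
    if (r->live_backends > 0)
        r->live_backends--;
}

/*
 * Manager reports "queue drained, detached from shmem".  A report that
 * arrives while backends are still alive is ignored, since those
 * backends could still queue further messages.
 */
static bool
manager_reported_drained(CrashRecovery *r)
{
    if (r->live_backends > 0)
        return false;
    r->manager_drained = true;
    return true;
}

/* Shared memory may be reinitialized only when both conditions hold. */
static bool
may_reinit_shmem(const CrashRecovery *r)
{
    return r->live_backends == 0 && r->manager_drained;
}
```

After reinitialization the manager would re-attach, still holding its GCS connection and knowing which transactions were applied before the crash.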

So, yes, after thinking about it, detaching the postmaster from shared 
memory seems doable for Postgres-R (in the sense of "the postmaster does 
not rely on possibly corrupted data in shared memory"). Reinitialization 
needs some more thought, but in general that seems like the way to go.

Regards

Markus Wanner


