Re: Postgres-R: internal messaging - Mailing list pgsql-hackers

From Markus Wanner
Subject Re: Postgres-R: internal messaging
Date
Msg-id 48870883.9040807@bluegap.ch
In response to Re: Postgres-R: internal messaging  (Alexey Klyukin <alexk@commandprompt.com>)
List pgsql-hackers
Hi Alexey,

thanks for your feedback, these are interesting points.

Alexey Klyukin wrote:
> In Replicator we avoided the need for postmaster to read/write backend's
> shmem data by using it as a signal forwarder. When a backend wants to
> inform a special process (i.e. queue monitor) about replication-related
> event (such as commit) it sends SIGUSR1 to Postmaster with a related
> "reason" flag and the postmaster upon receiving this signal forwards
> it to the destination process. Termination of backends and special
> processes are handled by the postmaster itself.
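The forwarding scheme quoted above can be sketched roughly as follows. This is a toy single-process illustration in Python, not Replicator's actual C code; the reason codes and function names are invented. The sender publishes a "reason" flag, then raises SIGUSR1; in Replicator the signal goes to the postmaster, which re-raises it at the destination process.

```python
import os
import signal

# Invented reason codes standing in for Replicator's "reason" flags.
REASON_NONE, REASON_COMMIT = 0, 1

state = {"reason": REASON_NONE, "received": []}

def target_handler(signum, frame):
    # The destination process inspects the shared "reason" flag
    # when SIGUSR1 arrives.
    state["received"].append(state["reason"])

def postmaster_forward():
    # The real postmaster would os.kill() the destination pid after
    # receiving SIGUSR1 from a backend; signalling ourselves keeps
    # this demo in a single process.
    os.kill(os.getpid(), signal.SIGUSR1)

signal.signal(signal.SIGUSR1, target_handler)

state["reason"] = REASON_COMMIT   # backend: publish the reason...
postmaster_forward()              # ...and have the postmaster forward it
```

Note that this only moves a small flag plus a signal; it says nothing about moving bulk data like change sets, which is the question raised below.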

Hm.. how about larger data chunks, like change sets? In Postgres-R, 
those need to travel between the backends and the replication manager, 
which then sends them to the GCS.

> Hm...what would happen with the new data under heavy load when the queue 
> would eventually be filled with messages, the relevant transactions
> would be aborted or they would wait for the manager to release the queue
> space occupied by already processed messages? ISTM that having a fixed
> size buffer limits the maximum transaction rate.

That's why the replication manager is a very simple forwarder, which 
does not block messages, but consumes them immediately from shared 
memory. It already features a message cache, which holds messages it 
cannot currently forward to a backend, because all backends are busy.
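The manager's role as a non-blocking forwarder could be modelled like this. A minimal Python sketch, not Postgres-R's code; the slot count, queue names, and helper naming are all invented. The key property is that the manager drains the fixed-size shared queue completely on every pass, parking anything it cannot deliver in an unbounded local cache, so the shared buffer never stays full just because helpers are busy.

```python
from collections import deque

SHMEM_QUEUE_SLOTS = 4        # fixed-size shared buffer, as in the thread

shmem_queue = deque()        # stands in for the shared memory queue
cache = deque()              # manager's local message cache
idle_helpers = ["helper-1"]  # only one helper backend is free right now
delivered = []

def backend_send(cset):
    # A backend can only append while there is a free slot.
    if len(shmem_queue) >= SHMEM_QUEUE_SLOTS:
        raise BufferError("queue full: sender must wait")
    shmem_queue.append(cset)

def manager_poll():
    # Consume *everything* from shared memory immediately, so backends
    # are never blocked on the manager being slow to dispatch.
    while shmem_queue:
        cache.append(shmem_queue.popleft())
    # Forward cached change sets only to helpers that are not busy.
    while cache and idle_helpers:
        delivered.append((idle_helpers.pop(), cache.popleft()))

for i in range(3):
    backend_send(f"cset-{i}")
manager_poll()
```

After one poll the shared queue is empty again, one change set went to the idle helper, and the remaining two sit in the manager's cache awaiting a free helper.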

And it takes care to only send change sets to helper backends which are 
not busy and can process the remote transaction immediately. That way, 
I don't think the limit on shared memory is the bottleneck. However, I 
didn't measure.

WRT waiting vs aborting: I think at the moment I don't handle this 
situation gracefully. I've never encountered it. ;-)  But I think the 
simpler option is letting the sender wait until there is enough room in 
the queue for its message. To avoid deadlocks, each process should 
consume its messages, before trying to send one. (Which is done 
correctly only for the replication manager ATM, not for the backends, IIRC).
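The consume-before-send rule can be shown with a toy model (all names invented, single-process Python rather than two real backends): two processes whose inbound queues are both full, where one has a pending send. If each drained its own queue only after sending, both would block forever; consuming first lets the peer's send complete and breaks the cycle.

```python
from collections import deque

SLOTS = 1                                              # tiny queues
inbox = {"A": deque(["to-A"]), "B": deque(["to-B"])}   # both queues full
outgoing = {"A": deque(["new-to-B"]), "B": deque()}    # A wants to send
log = []

def step(proc, peer):
    """One scheduler step: consume own messages first, then try to send."""
    if inbox[proc]:
        log.append((proc, "consumed", inbox[proc].popleft()))
    if outgoing[proc] and len(inbox[peer]) < SLOTS:
        inbox[peer].append(outgoing[proc].popleft())
        log.append((proc, "sent"))

# Round-robin the two processes; because B drains its own full inbox
# before anything else, room appears and A's pending send succeeds.
for _ in range(3):
    step("A", "B")
    step("B", "A")
```

With "send first, consume later" instead, both sides would sit on full queues waiting for the other, which is exactly the deadlock the rule avoids.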

> What about keeping the per-process message queue in the local memory of
> the process, and exporting only the queue head to the shmem, thus having
> only one message per-process there.

The replication manager already does that with its cache. No other 
process needs to send (large enough) messages which cannot be consumed 
immediately. So such a local cache does not make much sense for any 
other process.

Even for the replication manager, I find it dubious to require such a 
cache, because it introduces unnecessary copying of data within memory.

> When the queue manager gets a
> message from the process it may signal that process to copy the next
> message from the process local memory into the shmem. To keep a
> correct ordering of queue messages an additional shared memory queue of
> pid_t can be maintained, containing one pid per each message.
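The proposal quoted above might look like this in a rough sketch (Python standing in for shared-memory C, with invented names): each process keeps its backlog locally and exports only its head message into a one-slot shared area, while a shared queue of pids preserves the global message order.

```python
from collections import deque

# Per-process local backlogs; only the head of each is exported.
local_backlog = {101: deque(["m1", "m3"]), 102: deque(["m2"])}
shmem_head = {}                       # one exported slot per process
pid_order = deque([101, 102, 101])    # global order: one pid per message

def export_head(pid):
    # Copy the next local message into the process's shared slot,
    # if the slot is free and a message is pending.
    if pid not in shmem_head and local_backlog[pid]:
        shmem_head[pid] = local_backlog[pid].popleft()

def manager_consume():
    """Take messages in global order; after each, the sending process
    is 'signalled' to export its next message (a direct call here)."""
    out = []
    while pid_order:
        pid = pid_order.popleft()
        out.append(shmem_head.pop(pid))
        export_head(pid)
    return out

for pid in local_backlog:
    export_head(pid)
ordered = manager_consume()
```

The pid queue reconstructs the interleaved order m1, m2, m3 even though m1 and m3 share one slot, at the cost of an extra copy from local memory into shared memory per message.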

The replication manager takes care of the ordering for cached messages.

Regards

Markus Wanner


