Thread: Synch Rep: communication between backends and walsender

Synch Rep: communication between backends and walsender

From
Fujii Masao
Date:
http://archives.postgresql.org/pgsql-hackers/2008-12/msg00448.php

One of the major complaints about the current synch rep patch is that
signals are used for communication between backends and walsender.
On some platforms, a signal doesn't interrupt sleep (i.e. poll or select
system call), which would increase the performance overhead of
replication.

So I'd like to propose using a UDP socket and semaphores
instead of signals for communication from backends to walsender
and vice versa, respectively.

The UDP socket is used for backends to request walsender to send
WAL records. Semaphores cannot be used for this purpose because
walsender must wait for the request from backends and the reply from
the standby server concurrently. Some UDP packets might get lost,
but that doesn't matter because the important data is communicated
via the shared memory and walsender wakes up periodically even without
receiving that request. This UDP socket can be created in the same way
as the one for the statistics collector.
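
A minimal sketch of the wait loop described above, written in Python only for brevity (the patch itself would be C). The helper names make_wakeup_socket, request_send and wait_for_work are hypothetical, and a local socket pair stands in for the connection to the standby. It only shows how one select() call can wait for a backend request and a standby reply at the same time, with a timeout covering lost datagrams.

```python
import select
import socket


def make_wakeup_socket():
    """Create a non-blocking UDP socket on an ephemeral loopback port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("127.0.0.1", 0))
    sock.setblocking(False)
    return sock


def request_send(wakeup_addr):
    """Backend side: send a one-byte datagram and do not wait for anything.

    The datagram carries no data of its own; the real state is assumed
    to live in shared memory, so losing the packet is harmless.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(b"x", wakeup_addr)


def wait_for_work(wakeup_sock, standby_sock, timeout):
    """Walsender side: wait for a request, a standby reply, or the timeout.

    Returns a set of event names; the set is empty on timeout.
    """
    readable, _, _ = select.select([wakeup_sock, standby_sock], [], [], timeout)
    events = set()
    if wakeup_sock in readable:
        # Drain every queued wakeup so that many requests cost one pass.
        while True:
            try:
                wakeup_sock.recv(16)
            except BlockingIOError:
                break
        events.add("request")
    if standby_sock in readable:
        events.add("reply")
    return events
```

The timeout plays the role of the periodic wakeup: if a datagram is dropped, the loop still runs again after at most that interval.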

On the other hand, the semaphores are used for backends to wait
for the reply from walsender. The backend registers its semaphore
on the shared memory before sleeping, then walsender wakes it up
by using that semaphore.
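
The register-then-sleep handshake could look roughly like the sketch below. It is written in Python for brevity, with threading primitives standing in for shared memory and for the per-backend semaphores; the names WaitQueue, register, wake_all and backend_wait are hypothetical and not taken from the patch.

```python
import threading


class WaitQueue:
    """Stand-in for the shared-memory area where backends register."""

    def __init__(self):
        self._lock = threading.Lock()
        self._waiting = []

    def register(self):
        """Backend side: publish a semaphore before going to sleep."""
        sem = threading.Semaphore(0)
        with self._lock:
            self._waiting.append(sem)
        return sem

    def wake_all(self):
        """Walsender side: release every registered backend; return the count."""
        with self._lock:
            sems, self._waiting = self._waiting, []
        for sem in sems:
            sem.release()
        return len(sems)


def backend_wait(queue, timeout=None):
    """Register, then sleep until walsender releases the semaphore.

    Returns True if woken by walsender, False if the timeout expired.
    """
    sem = queue.register()
    return sem.acquire(timeout=timeout)
```

Because a semaphore counts releases, a wakeup that arrives after registration but before the backend actually sleeps is not lost, which is the property signals do not give on every platform.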

Comments? Does anyone have a better approach?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Synch Rep: communication between backends and walsender

From
Greg Stark
Date:
On Tue, Jun 16, 2009 at 12:50 PM, Fujii Masao<masao.fujii@gmail.com> wrote:
> On some platforms, a signal doesn't interrupt sleep (i.e. poll or select
> system call)

say what?

-- 
Gregory Stark
http://mit.edu/~gsstark/resume.pdf


Re: Synch Rep: communication between backends and walsender

From
Tom Lane
Date:
Greg Stark <stark@enterprisedb.com> writes:
> On Tue, Jun 16, 2009 at 12:50 PM, Fujii Masao<masao.fujii@gmail.com> wrote:
>> On some platforms, a signal doesn't interrupt sleep (i.e. poll or select
>> system call)

> say what?

Yup, what he said.
        regards, tom lane


Re: Synch Rep: communication between backends and walsender

From
Markus Wanner
Date:
Hi,

Fujii Masao wrote:
> One of the major complaints about the current synch rep patch is that
> signals are used for communication between backends and walsender.
> On some platforms, a signal doesn't interrupt sleep (i.e. poll or select
> system call), which would increase the performance overhead of
> replication.

Reading the past messages on this topic, I realized that this problem so
far only affects HPUX. I fear the proposed UDP/semaphores approach might
have a similar gotcha on at least one of the supported platforms, too.
Limits of open file descriptors come to mind, for example. Or kernel
packet filtering rules, as mentioned in pgstat.c.

If I understand correctly, even Postgres itself suffers from that
problem on HPUX (even though the consequences aren't dramatic, as
pointed out by Tom). Plus we are not completely safe from syscalls
returning EINTR due to SA_RESTART not being set for SIGALRM.

So, does it really make sense to take care of this issue as part of the
sync rep patch?

Regards

Markus Wanner


Re: Synch Rep: communication between backends and walsender

From
Fujii Masao
Date:
Hi,

On Sat, Jun 20, 2009 at 6:05 PM, Markus Wanner<markus@bluegap.ch> wrote:
> Reading the past messages on this topic, I realized that this problem so
> far only affects HPUX. I fear the proposed UDP/semaphores approach might
> have a similar gotcha on at least one of the supported platforms, too.
> Limits of open file descriptors come to mind, for example. Or kernel
> packet filtering rules, as mentioned in pgstat.c.

You're right. The UDP approach, at least, would introduce new problems of
its own, as you pointed out.

> If I understand correctly, even Postgres itself suffers from that
> problem on HPUX (even though the consequences aren't dramatic, as
> pointed out by Tom). Plus we are not completely safe from syscalls
> returning EINTR due to SA_RESTART not being set for SIGALRM.
>
> So, does it really make sense to take care of this issue as part of the
> sync rep patch?

The perfect solution seems to be to remove SA_RESTART and handle EINTR
appropriately after every syscall. But this is a very tough job that would
affect the whole source tree, so I don't think it should be done
as part of synch rep.

On the other hand, I think that a semaphore should be used instead of a signal,
at least for backends to wait for walsender. This would improve the situation
on HPUX to some degree. In this case, the remaining problem is that walsender
cannot wake up immediately. But since walsender wakes up periodically and
that period can be tuned (by the new GUC wal_sender_delay), the overhead on
HPUX might not actually be so big.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Synch Rep: communication between backends and walsender

From
Markus Wanner
Date:
Hi,

Fujii Masao wrote:
> The perfect solution seems to be to remove SA_RESTART and handle EINTR
> appropriately after every syscall. But this is a very tough job that would
> affect the whole source tree, so I don't think it should be done
> as part of synch rep.

Especially note Tom's valid concerns about third party code. OTOH
checking against error codes should be common practice. And the failure
can then clearly be located in the third party module, and not in Postgres.

> On the other hand, I think that a semaphore should be used instead of a signal,
> at least for backends to wait for walsender. This would improve the situation
> on HPUX to some degree. In this case, the remaining problem is that walsender
> cannot wake up immediately. But since walsender wakes up periodically and
> that period can be tuned (by the new GUC wal_sender_delay), the overhead on
> HPUX might not actually be so big.

Hm.. does the walsender really wake up periodically? IIRC the
misbehavior discovered on HPUX is that select() gets restarted with its
full timeout when interrupted by a signal whose handler was installed with
SA_RESTART, so that a steady rate of signals would lock the walsender
process within select() completely. Or what else do you use for the
walsender to wake up periodically?

However, my point is that I think you don't have to solve this problem.
It should rather be taken care of by core. We can then pick up whatever
solution is decided on.

Just my 2c.

Regards

Markus Wanner


Re: Synch Rep: communication between backends and walsender

From
Fujii Masao
Date:
Hi,

On Tue, Jun 23, 2009 at 1:55 AM, Markus Wanner<markus@bluegap.ch> wrote:
>> On the other hand, I think that a semaphore should be used instead of a signal,
>> at least for backends to wait for walsender. This would improve the situation
>> on HPUX to some degree. In this case, the remaining problem is that walsender
>> cannot wake up immediately. But since walsender wakes up periodically and
>> that period can be tuned (by the new GUC wal_sender_delay), the overhead on
>> HPUX might not actually be so big.
>
> Hm.. does the walsender really wake up periodically? IIRC the
> misbehavior discovered on HPUX is that select() gets restarted with its
> full timeout when interrupted by a signal whose handler was installed with
> SA_RESTART, so that a steady rate of signals would lock the walsender
> process within select() completely. Or what else do you use for the
> walsender to wake up periodically?

I was thinking of reducing the number of signals sent by backends by using
a shared flag that indicates whether walsender has already received a
signal. If the flag is true, the backend skips sending the signal because
walsender is going to wake up soon. Otherwise, the signal is sent, and
the signal handler of walsender then sets the flag to true. I still need to
examine this further, though.
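
The flag logic can be sketched as below, in Python and single-threaded for clarity. The class SignalGate and its method names are hypothetical. Signal delivery is simulated by a direct call, and resetting the flag just before walsender goes back to sleep is an assumption, since the description above does not say where the reset happens.

```python
class SignalGate:
    """Simulates the shared flag that suppresses redundant signals."""

    def __init__(self):
        self.signal_received = False  # the shared flag
        self.signals_sent = 0         # counter kept only for illustration

    def _deliver_signal(self):
        # Stands in for sending the signal plus walsender's handler,
        # which sets the flag to true.
        self.signals_sent += 1
        self.signal_received = True

    def backend_request(self):
        """Backend side: signal only if no wakeup is already pending.

        Returns True if a signal was sent, False if it was skipped.
        """
        if self.signal_received:
            return False
        self._deliver_signal()
        return True

    def walsender_before_sleep(self):
        """Walsender side: clear the flag before sleeping (assumed step)."""
        self.signal_received = False
```

With this gate, a burst of backend requests between two walsender sleeps produces a single signal. A real implementation would still have to close the window between a backend reading the flag and the handler setting it, which this sketch ignores.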

> However, my point is that I think you don't have to solve this problem.
> It should rather be taken care of by core. We can then pick up whatever
> solution is decided on.

Fine with me. Does anyone have other thoughts? If not, I will
remove this task from the TODO list of synch rep.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center