Thread: Synch Rep: communication between backends and walsender
http://archives.postgresql.org/pgsql-hackers/2008-12/msg00448.php

One of the major complaints about the current synch rep patch is that signals are used for communication between backends and walsender. On some platforms, a signal doesn't interrupt sleep (i.e. a poll or select system call), which would increase the performance overhead of replication.

So I'd like to propose using a UDP socket and semaphores instead of signals: the UDP socket for communication from backends to walsender, and semaphores for communication from walsender to backends.

The UDP socket is used by backends to request that walsender send WAL records. Semaphores cannot be used for this purpose because walsender must wait for the request from backends and the reply from the standby server concurrently. Some UDP packets might get lost, but that doesn't matter because the important data is communicated via shared memory, and walsender wakes up periodically even without receiving a request. This UDP socket can be created like the one for the statistics collector.

On the other hand, semaphores are used by backends to wait for the reply from walsender. The backend registers its semaphore in shared memory before sleeping, then walsender wakes it up by using that semaphore.

Comments? Do you have a better approach?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
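The backend-to-walsender wake-up path proposed here can be sketched roughly as follows. This is a minimal illustration, not code from the patch: the function names (wakeup_socket_create, wakeup_request, wakeup_wait) are hypothetical, and the self-connected loopback socket is an assumption modelled on the statistics collector's setup. The datagram carries no data; it only wakes the waiter, which would then read the actual request from shared memory.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a UDP socket bound to an ephemeral loopback
 * port and connected to itself. Processes forked afterwards inherit the
 * descriptor, so any of them can send() a wake-up to whoever poll()s it.
 */
int
wakeup_socket_create(void)
{
    struct sockaddr_in addr;
    socklen_t   len = sizeof(addr);
    int         fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;          /* let the kernel pick a free port */

    if (bind(fd, (struct sockaddr *) &addr, sizeof(addr)) < 0 ||
        getsockname(fd, (struct sockaddr *) &addr, &len) < 0 ||
        connect(fd, (struct sockaddr *) &addr, sizeof(addr)) < 0)
    {
        close(fd);
        return -1;
    }
    return fd;
}

/* Backend side: ask for a wake-up. A lost datagram is tolerated. */
void
wakeup_request(int fd)
{
    char        c = 'w';

    (void) send(fd, &c, 1, 0);
}

/*
 * Walsender side: sleep until a request arrives or timeout_ms elapses.
 * Returns 1 when woken by a request, 0 on timeout, -1 on error.
 */
int
wakeup_wait(int fd, int timeout_ms)
{
    struct pollfd pfd;
    char        buf[16];
    int         rc;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    rc = poll(&pfd, 1, timeout_ms);
    if (rc <= 0)
        return rc;

    /* Drain every queued datagram so one wake-up serves all requests. */
    while (recv(fd, buf, sizeof(buf), MSG_DONTWAIT) > 0)
        ;
    return 1;
}
```

A real walsender would add the standby connection as a second entry in the pollfd array, which is the reason given above for preferring a socket over a semaphore on this side. The timeout provides the periodic wake-up that covers lost packets.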
On Tue, Jun 16, 2009 at 12:50 PM, Fujii Masao<masao.fujii@gmail.com> wrote:
> On some platforms, a signal doesn't interrupt sleep (i.e. poll or select
> system call)

say what?

-- 
Gregory Stark
http://mit.edu/~gsstark/resume.pdf
Greg Stark <stark@enterprisedb.com> writes:
> On Tue, Jun 16, 2009 at 12:50 PM, Fujii Masao<masao.fujii@gmail.com> wrote:
>> On some platforms, a signal doesn't interrupt sleep (i.e. poll or select
>> system call)

> say what?

Yup, what he said.

regards, tom lane
Hi,

Fujii Masao wrote:
> One of the major complaints about the current synch rep patch is that
> signals are used for communication between backends and walsender.
> On some platforms, a signal doesn't interrupt sleep (i.e. poll or select
> system call), which would increase the performance overhead of
> replication.

Reading the past messages on this topic, I realized that this problem so far only affects HPUX. I fear the proposed UDP/semaphores approach might have a similar gotcha on at least one of the supported platforms, too. Limits of open file descriptors come to mind, for example. Or kernel packet filtering rules, as mentioned in pgstat.c.

If I understand correctly, even Postgres itself suffers from that problem on HPUX (even though the consequences aren't dramatic, as pointed out by Tom). Plus we are not completely safe from syscalls returning EINTR due to SA_RESTART not being set for SIGALRM.

So, does it really make sense to take care of this issue as part of the sync rep patch?

Regards

Markus Wanner
Hi,

On Sat, Jun 20, 2009 at 6:05 PM, Markus Wanner<markus@bluegap.ch> wrote:
> Reading the past messages on this topic, I realized that this problem so
> far only affects HPUX. I fear the proposed UDP/semaphores approach might
> have a similar gotcha on at least one of the supported platforms, too.
> Limits of open file descriptors come to mind, for example. Or kernel
> packet filtering rules, as mentioned in pgstat.c.

You're right. The UDP approach, at least, would cause other problems, as you pointed out.

> If I understand correctly, even Postgres itself suffers from that
> problem on HPUX (even though the consequences aren't dramatic, as
> pointed out by Tom). Plus we are not completely safe from syscalls
> returning EINTR due to SA_RESTART not being set for SIGALRM.
>
> So, does it really make sense to take care of this issue as part of the
> sync rep patch?

The perfect solution seems to be to remove SA_RESTART and handle EINTR appropriately after every syscall. But this is a very tough job and would affect the whole source code, so I don't think it should be done as part of synch rep.

On the other hand, I think that a semaphore should be used instead of a signal, at least for backends to wait for walsender. This would improve the situation on HPUX to some degree. In this case, the remaining problem is that walsender cannot wake up immediately. But since walsender wakes up periodically and that period can be tuned (by the new GUC wal_sender_delay), the overhead on HPUX might not actually be so big.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Hi,

Fujii Masao wrote:
> The perfect solution seems to be to remove SA_RESTART and handle EINTR
> appropriately after every syscall. But this is a very tough job and would
> affect the whole source code, so I don't think it should be done as part
> of synch rep.

Especially note Tom's valid concerns about third party code. OTOH checking against error codes should be common practice. And the failure can then clearly be located in the third party module, and not in Postgres.

> On the other hand, I think that a semaphore should be used instead of a
> signal, at least for backends to wait for walsender. This would improve
> the situation on HPUX to some degree. In this case, the remaining problem
> is that walsender cannot wake up immediately. But since walsender wakes
> up periodically and that period can be tuned (by the new GUC
> wal_sender_delay), the overhead on HPUX might not actually be so big.

Hm.. does the walsender really wake up periodically? IIRC the misbehavior discovered on HPUX is that select() gets restarted upon being signaled with SA_RESTART - with its full timeout, so that a steady rate of signals would lock the walsender process within select() completely. Or what else do you use for the walsender to wake up periodically?

However, my point is that I think you don't have to solve this problem. It should rather be taken care of by core. We can then pick up whatever solution is decided on.

Just my 2c.

Regards

Markus Wanner
Hi,

On Tue, Jun 23, 2009 at 1:55 AM, Markus Wanner<markus@bluegap.ch> wrote:
>> On the other hand, I think that a semaphore should be used instead of a
>> signal, at least for backends to wait for walsender. This would improve
>> the situation on HPUX to some degree. In this case, the remaining problem
>> is that walsender cannot wake up immediately. But since walsender wakes
>> up periodically and that period can be tuned (by the new GUC
>> wal_sender_delay), the overhead on HPUX might not actually be so big.
>
> Hm.. does the walsender really wake up periodically? IIRC the
> misbehavior discovered on HPUX is that select() gets restarted upon
> being signaled with SA_RESTART - with its full timeout, so that a steady
> rate of signals would lock the walsender process within select()
> completely. Or what else do you use for the walsender to wake up
> periodically?

I was thinking of reducing the number of signals sent by backends by using a shared flag which indicates whether walsender has already received one or not. If the flag is true, the backend skips sending the signal because walsender is going to wake up soon. Otherwise, the signal is sent, then the signal handler of walsender sets the flag to true. Though I need to examine it further.

> However, my point is that I think you don't have to solve this problem.
> It should rather be taken care of by core. We can then pick up whatever
> solution is decided on.

Fine with me. Does anyone have another thought? If not, I will remove this task from the TODO list of synch rep.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center