On Sep 12, 2011, at 14:54, Peter Eisentraut wrote:
> On mån, 2011-09-12 at 16:46 +1000, George Barnett wrote:
>> On 12/09/2011, at 3:59 PM, Florian Pflug wrote:
>>> Still, I agree with Noah and Kevin that we ought to deal more gracefully with this, i.e. resubmit after a partial
>>> read() or write(). AFAICS there's nothing to be gained by not doing that, and the increase in code complexity should be
>>> negligible. If we do that, however, I believe we might as well handle EINTR correctly, even if SA_RESTART should prevent
>>> us from ever seeing that.
>>
>> It does still concern me that pgsql did not deal with this as gracefully as other software. I hope the list will
>> consider a patch to resolve that.
>
> We have signal handling configured so that system calls are not
> interrupted. So there is ordinarily no reason to do anything more
> graceful. The problem is that NFS is in this case not observing that
> setting. It's debatable whether it's worth supporting that; just saying
> that the code is correct as it stands.
SA_RESTART doesn't protect against partial reads/writes due to signal delivery;
it only removes the need to check for EINTR. In other words, the kernel restarts
the call until at least one byte has been transferred, not until all bytes have
been transferred.
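
For reference, arranging for restartable system calls is just a matter of passing
SA_RESTART to sigaction() when installing a handler. This is a generic sketch, not
PostgreSQL's actual signal-setup code, and SIGUSR1 is merely a placeholder:

    #include <signal.h>
    #include <stdio.h>

    static void
    handle_sigusr1(int signo)
    {
        /* nothing interesting here; the point is the flag below */
        (void) signo;
    }

    static void
    install_restarting_handler(void)
    {
        struct sigaction sa;

        sa.sa_handler = handle_sigusr1;
        sigemptyset(&sa.sa_mask);
        sa.sa_flags = SA_RESTART;   /* restart syscalls instead of failing with EINTR */

        if (sigaction(SIGUSR1, &sa, NULL) < 0)
            perror("sigaction");
    }

The restart only happens if the interrupted call hadn't transferred any data yet;
once some bytes have moved, the call simply returns the short count.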
The GNU LibC documentation has this to say on the subject:
"There is one situation where resumption never happens no matter which choice you make: when a data-transfer function
such as read or write is interrupted by a signal after transferring part of the data. In this case, the function
returns the number of bytes already transferred, indicating partial success."[1]
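
To make the "resubmit" idea concrete, a retry loop along the following lines handles
both partial writes and a stray EINTR. This is just a sketch to show how little code
is involved, not a proposed patch; the name write_all is made up:

    #include <errno.h>
    #include <unistd.h>

    /* Write the whole buffer, resubmitting after partial writes and EINTR. */
    static ssize_t
    write_all(int fd, const void *buf, size_t count)
    {
        const char *p = buf;
        size_t      remaining = count;

        while (remaining > 0)
        {
            ssize_t     written = write(fd, p, remaining);

            if (written < 0)
            {
                if (errno == EINTR)
                    continue;       /* interrupted before any data moved: retry */
                return -1;          /* genuine error, errno is set */
            }

            /* partial write: advance past what got written and resubmit */
            p += written;
            remaining -= (size_t) written;
        }

        return (ssize_t) count;
    }

The read() side would look the same, except that a return value of zero (EOF) has to
terminate the loop.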
While it's true that reads and writes are by tradition non-interruptible, I
personally wouldn't bet that they'll stay that way forever. It all depends on
whether the timeouts involved in communicating with a disk are short enough
to mask the difference - once they get too long (or even infinite, as in the
case of "hard" NFS mounts), you pay for non-interruptible primitives with
un-killable stuck processes. Since the current trend is to move storage further
away from processing, and to put non-deterministic networks like Ethernet between
the two, I expect situations in which read/write primitives are interruptible
to increase, not decrease.
And BTW, the GNU LibC documentation doesn't seem to mention anything about
local reads and writes being non-interruptible. In fact, it even says:
"A signal can arrive and be handled while an I/O primitive such as open or read is waiting for an I/O device. If the
signal handler returns, the system faces the question: what should happen next?"[1]
If the GNU people had faith in local reads and writes being non-interruptible, they'd
probably have said "network device" or "remote device", not "I/O device".
best regards,
Florian Pflug
[1] http://www.gnu.org/s/hello/manual/libc/Interrupted-Primitives.html#Interrupted-Primitives