Re: Patch to improve reliability of postgresql on linux nfs - Mailing list pgsql-hackers
From | Florian Pflug |
---|---|
Subject | Re: Patch to improve reliability of postgresql on linux nfs |
Date | |
Msg-id | AD4A13A5-8778-4D94-BBB5-AB6C13BF752A@phlo.org Whole thread Raw |
In response to | Patch to improve reliability of postgresql on linux nfs (George Barnett <gbarnett@atlassian.com>) |
Responses |
Re: Patch to improve reliability of postgresql on linux nfs
|
List | pgsql-hackers |
[CC'ing to the list again - I assume you omitted pgsql-hackers from the recipient list by accident] On Sep13, 2011, at 03:00 , George Barnett wrote: > On 12/09/2011, at 11:39 PM, Florian Pflug wrote: >> Also, non-interruptible IO primitives are by no means "right". At best, they're >> a compromise between complexity and functionality for I/O devices with rather >> short (and bounded) communication timeouts - because in that case, processes are >> only blocked un-interruptibly for a short while. > > Just to expand on that - I'm now in the situation where I can run my nfs mounts > 'nointr' and postgres will work, but that means if I lose a storage unit I have > a number of stuck processes, effectively meaning I need to reboot all my frontend > servers before I can fail over to backup nfs stores. > > However, if I run the mounts with intr, then if a storage unit fails, I can fail > over to a backup node (taking a minor loss of data hit I'm willing to accept) but > postgres breaks under a moderate insert load. > > With the patch I supplied though, I'm able to have most of my cake and eat it. > > I'd be very interested in moving this forward - is there something I can change > in the patch to make it more acceptable for a merge? Here are a few comments Tom already remarked that if we do that for write()s, we ought to do it for read()s also which I agree with. All other primitives like lseek, close, ... should be taken care of by SA_RESTART, but I'd be a good idea to verify that. Also, I don't think that POSIX mandates that errno be reset to 0 if a function returns successfully, making that "returnCode == 0 && errno == 0" check pretty dubious. I'm not sure of this was what Tom was getting at with his remark about the ENOSPC handling being wrong in the retry case. And I also think that if we do this, we might as well handle EINTR correctly, even if our use of SA_RESTART should prevent us from ever seeing that. The rules surrounding EINTR and SA_RESTART for read/write are quite subtle... If we retry, shouldn't be do CHECK_FOR_INTERRUPTS? Otherwise, processes waiting for a vanished NFS server would be killable only with SIGKILL, not SIGTERM or SIGINT. But I'm not sure if it's safe to put that into a generic function like pg_write_nointr. Finally, WriteAll() seems like a poor name for that function. How about pg_write_nointr()? Here's my suggested implementation for pg_write_nointr. pg_read_nointr should be similar (but obviously without the ENOSPC handling) int pg_write_nointr(int fd, const void *bytes, Size amount) { int written = 0; while (amount > 0) { int ret; ret = write(fd, bytes, amount); if ((ret < 0) && (errno == EINTR)) { /* interrupted by signal before first bytewas written. Retry */ /* XXX: Is it safe to call CHECK_FOR_INTERRUPTS here? */ CHECK_FOR_INTERRUPTS(); continue; } else if (ret < 0) { /* error occurred. Abort */ return -1; } else if (ret == 0) { /* outof disk space. Abort */ return written; } /* made progress */ /* XXX: Is it safe to call CHECK_FOR_INTERRUPTS here? */ CHECK_FOR_INTERRUPTS(); written += ret; amount -= ret; bytes = (const char *) bytes + ret; } } best regards, Florian Pflug
pgsql-hackers by date: