Re: Patch to improve reliability of postgresql on linux nfs - Mailing list pgsql-hackers

From Florian Pflug
Subject Re: Patch to improve reliability of postgresql on linux nfs
Date
Msg-id 8CB04209-C5B3-48B1-A2E5-F41FAA6BC367@phlo.org
Whole thread Raw
In response to Re: Patch to improve reliability of postgresql on linux nfs  (Aidan Van Dyk <aidan@highrise.ca>)
Responses Re: Patch to improve reliability of postgresql on linux nfs
Re: Patch to improve reliability of postgresql on linux nfs
List pgsql-hackers
On Sep13, 2011, at 15:05 , Aidan Van Dyk wrote:
> On Tue, Sep 13, 2011 at 7:30 AM, Florian Pflug <fgp@phlo.org> wrote:
>> Sorry for the self-reply. I realized only after hitting send that I
>> got the ENOSPC handling wrong again - we probably ought to check for
>> ENOSPC as well as ret == 0. Also, it seems preferable to return the
>> number of bytes actually written instead of -1 if we hit an error during
>> retry.
>>
>> With this version, any return value other than <amount> signals an
>> error, the number of actually written bytes is reported even in the
>> case of an error (to the best of pg_write_nointr's knowledge), and
>> errno always indicates the kind of error.
>
> Personally, I'ld think that's ripe for bugs.   If the contract is that
> ret != amount is the "error" case, then don't return -1 for an error
> *sometimes*.

Hm, but isn't that how write() works also? AFAIK (non-interruptible) write()
will return the number of bytes written, which may be less than the requested
number if there's not enough free space, or -1 in case of an error like
an invalid fd being passed.

> If you sometimes return -1 for an error, even though ret != amount is
> the *real* test, I'm going to guess there will be lots of chance for
> code to do:
>  if (pg_write_no_intr(...) < 0)
>   ...
>
> which will only catch some of the errors, and happily continue with the rest...

Yeah, but that's equally wrong for plain write(), so I'm not sure I share
your concern there. Also, I'm not sure how to improve that. We could always
return -1 in case of an error, and "amount" in case of success, but that makes
it impossible to determine how many bytes where actually written (and also feel
awkward). Or we could return 0 instead of -1 if there was an error and zero
bytes where written. But that feels awkward also...

One additional possibility would be to make the signature
 boolean pg_write_nointr(int fd, const void *bytes, int len, int *written)

and simply return true on success and false on error. Callers who're interested
in the number of bytes actually written (in the case of an error) would need to
pass some non-NULL pointer for <written>, while all others would just pass NULL.

Thoughts?

best regards,
Florian Pflug




pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: augmenting MultiXacts to improve foreign keys
Next
From: Tom Lane
Date:
Subject: Re: SSL key with passphrase