Home > mailing lists

Re: Patch to improve reliability of postgresql on linux nfs - Mailing list pgsql-hackers

From	Florian Pflug
Subject	Re: Patch to improve reliability of postgresql on linux nfs
Date	September 13, 2011 11:14:33
Msg-id	8CB04209-C5B3-48B1-A2E5-F41FAA6BC367@phlo.org Whole thread Raw
In response to	Re: Patch to improve reliability of postgresql on linux nfs (Aidan Van Dyk <aidan@highrise.ca>)
Responses	Re: Patch to improve reliability of postgresql on linux nfs Re: Patch to improve reliability of postgresql on linux nfs
List	pgsql-hackers

Tree view

On Sep13, 2011, at 15:05 , Aidan Van Dyk wrote:
> On Tue, Sep 13, 2011 at 7:30 AM, Florian Pflug <fgp@phlo.org> wrote:
>> Sorry for the self-reply. I realized only after hitting send that I
>> got the ENOSPC handling wrong again - we probably ought to check for
>> ENOSPC as well as ret == 0. Also, it seems preferable to return the
>> number of bytes actually written instead of -1 if we hit an error during
>> retry.
>>
>> With this version, any return value other than <amount> signals an
>> error, the number of actually written bytes is reported even in the
>> case of an error (to the best of pg_write_nointr's knowledge), and
>> errno always indicates the kind of error.
>
> Personally, I'ld think that's ripe for bugs.   If the contract is that
> ret != amount is the "error" case, then don't return -1 for an error
> *sometimes*.

Hm, but isn't that how write() works also? AFAIK (non-interruptible) write()
will return the number of bytes written, which may be less than the requested
number if there's not enough free space, or -1 in case of an error like
an invalid fd being passed.

> If you sometimes return -1 for an error, even though ret != amount is
> the *real* test, I'm going to guess there will be lots of chance for
> code to do:
>  if (pg_write_no_intr(...) < 0)
>   ...
>
> which will only catch some of the errors, and happily continue with the rest...

Yeah, but that's equally wrong for plain write(), so I'm not sure I share
your concern there. Also, I'm not sure how to improve that. We could always
return -1 in case of an error, and "amount" in case of success, but that makes
it impossible to determine how many bytes where actually written (and also feel
awkward). Or we could return 0 instead of -1 if there was an error and zero
bytes where written. But that feels awkward also...

One additional possibility would be to make the signature
 boolean pg_write_nointr(int fd, const void *bytes, int len, int *written)

and simply return true on success and false on error. Callers who're interested
in the number of bytes actually written (in the case of an error) would need to
pass some non-NULL pointer for <written>, while all others would just pass NULL.

Thoughts?

best regards,
Florian Pflug

pgsql-hackers by date:

From: Robert Haas
Date: 13 September 2011, 11:02:58
Subject: Re: augmenting MultiXacts to improve foreign keys

From: Tom Lane
Date: 13 September 2011, 11:17:30
Subject: Re: SSL key with passphrase

Re: Patch to improve reliability of postgresql on linux nfs - Mailing list pgsql-hackers

Previous

Next