Thread: Fix some error handling for read() and errno

Fix some error handling for read() and errno

From

Michael Paquier

Date:

20 May 2018, 03:05:22

Hi all,

This is basically a new thread after what has been discussed for
pg_controldata with its error handling for read():
https://www.postgresql.org/message-id/CABUevEx8ZRV5Ut_FvP2etXiQppx3xVzm7oOaV3AcdHxX81Yt8Q%40mail.gmail.com

While reviewing the core code, I have noticed similar weird error
handling for read().  At the same time, some of those places may use an
incorrect errno, as an error is invoked using an errno which may be
overwritten by another system call.  I found a funny one in slru.c,
for which I have added a note in the patch.  I don't think that this is
worth addressing with more facility, thoughts are welcome.

Attached is a patch addressing the issues I found.

Thanks,
--
Michael

Attachment

Re: Fix some error handling for read() and errno

From

Kyotaro HORIGUCHI

Date:

22 May 2018, 10:51:00

Hello.

At Sun, 20 May 2018 09:05:22 +0900, Michael Paquier <michael@paquier.xyz> wrote in <20180520000522.GB1603@paquier.xyz>
> Hi all,
> 
> This is basically a new thread after what has been discussed for
> pg_controldata with its error handling for read():
> https://www.postgresql.org/message-id/CABUevEx8ZRV5Ut_FvP2etXiQppx3xVzm7oOaV3AcdHxX81Yt8Q%40mail.gmail.com
> 
> While reviewing the core code, I have noticed similar weird error
> handling for read().  At the same time, some of those places may use an
> incorrect errno, as an error is invoked using an errno which may be
> overwritten by another system call.  I found a funny one in slru.c,
> for which I have added a note in the patch.  I don't think that this is
> worth addressing with more facility, thoughts are welcome.
> 
> Attached is a patch addressing the issues I found.

I see the same issue in snapbuild.c(4 places).

| readBytes = read(fd, &ondisk, SnapBuildOnDiskConstantSize);
| pgstat_report_wait_end();
| if (readBytes != SnapBuildOnDiskConstantSize)
| {
|   CloseTransientFile(fd);
|   ereport(ERROR,
|       (errcode_for_file_access(),
|        errmsg("could not read file \"%s\", read %d of %d: %m",
|           path, readBytes, (int) SnapBuildOnDiskConstantSize)));
| }

and walsender.c (2 places)

|   if (nread <= 0)
|     ereport(ERROR,
|         (errcode_for_file_access(),
|          errmsg("could not read file \"%s\": %m",
|             path)));

and pg_receivewal.c

| if (read(fd, (char *) buf, sizeof(buf)) != sizeof(buf))
| {
|   fprintf(stderr, _("%s: could not read compressed file \"%s\": %s\n"),
|       progname, fullpath, strerror(errno));

pg_waldump.c

| if (readbytes <= 0)
...
|   fatal_error("could not read from log file %s, offset %u, length %d: %s",
|         fname, sendOff, segbytes, strerror(err));


A bit differnt issue, but in pg_waldump.c, search_directory can
check uninitialized errno when read returns a non-zero value.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Fix some error handling for read() and errno

From

Michael Paquier

Date:

23 May 2018, 03:00:40

On Tue, May 22, 2018 at 04:51:00PM +0900, Kyotaro HORIGUCHI wrote:
> I see the same issue in snapbuild.c(4 places).
>
> | readBytes = read(fd, &ondisk, SnapBuildOnDiskConstantSize);
> | pgstat_report_wait_end();
> | if (readBytes != SnapBuildOnDiskConstantSize)
> | {
> |   CloseTransientFile(fd);
> |   ereport(ERROR,
> |       (errcode_for_file_access(),
> |        errmsg("could not read file \"%s\", read %d of %d: %m",
> |           path, readBytes, (int) SnapBuildOnDiskConstantSize)));
> | }

Four times the same pattern, which also bloat errno when closing the
file descriptor.  I did not catch those.

> and walsender.c (2 places)
>
> |   if (nread <= 0)
> |     ereport(ERROR,
> |         (errcode_for_file_access(),
> |          errmsg("could not read file \"%s\": %m",
> |             path)));

Those two ones I saw, but I was not sure if it is worth the complication
to error on an empty file.  We could do something like the attached which
would be an improvement in readability?

> and pg_receivewal.c
>
> | if (read(fd, (char *) buf, sizeof(buf)) != sizeof(buf))
> | {
> |   fprintf(stderr, _("%s: could not read compressed file \"%s\": %s\n"),
> |       progname, fullpath, strerror(errno));

Okay.

> pg_waldump.c
>
> | if (readbytes <= 0)
> ...
> |   fatal_error("could not read from log file %s, offset %u, length %d: %s",
> |         fname, sendOff, segbytes, strerror(err));
>
>
> A bit different issue, but in pg_waldump.c, search_directory can
> check uninitialized errno when read returns a non-zero value.

Yeah, the error message could be improved as well if the result is an
empty file.

Updated patch is attached.  Thanks for your review.
--
Michael

Attachment

Re: Fix some error handling for read() and errno

From

Kyotaro HORIGUCHI

Date:

25 May 2018, 07:19:58

At Wed, 23 May 2018 09:00:40 +0900, Michael Paquier <michael@paquier.xyz> wrote in <20180523000040.GA3461@paquier.xyz>
> On Tue, May 22, 2018 at 04:51:00PM +0900, Kyotaro HORIGUCHI wrote:
> > I see the same issue in snapbuild.c(4 places).
> > 
> > | readBytes = read(fd, &ondisk, SnapBuildOnDiskConstantSize);
> > | pgstat_report_wait_end();
> > | if (readBytes != SnapBuildOnDiskConstantSize)
> > | {
> > |   CloseTransientFile(fd);
> > |   ereport(ERROR,
> > |       (errcode_for_file_access(),
> > |        errmsg("could not read file \"%s\", read %d of %d: %m",
> > |           path, readBytes, (int) SnapBuildOnDiskConstantSize)));
> > | }
> 
> Four times the same pattern, which also bloat errno when closing the
> file descriptor.  I did not catch those.
> 
> > and walsender.c (2 places)
> > 
> > |   if (nread <= 0)
> > |     ereport(ERROR,
> > |         (errcode_for_file_access(),
> > |          errmsg("could not read file \"%s\": %m",
> > |             path)));
> 
> Those two ones I saw, but I was not sure if it is worth the complication
> to error on an empty file.  We could do something like the attached which
> would be an improvement in readability?

The case is not of an empty file. read() reads 0 bytes without
error while lseek have told that the file has *more* data. I
don't think that can happen. How about just commenting with
something like the following?

> nread = read(fd, rbuf, sizeof(rbuf));
> /*
>  *  errno is E_OK in the case where nread == 0, but that can
>  *  scarecely happen so we don't separate the case.
>  */
> if (nread <= 0)
>     ereport(ERROR,

If we ereport(%m) for the nread == 0 case, we need to initialize
errno.

> > and pg_receivewal.c
> > 
> > | if (read(fd, (char *) buf, sizeof(buf)) != sizeof(buf))
> > | {
> > |   fprintf(stderr, _("%s: could not read compressed file \"%s\": %s\n"),
> > |       progname, fullpath, strerror(errno));
> 
> Okay.
> 
> > pg_waldump.c
> > 
> > | if (readbytes <= 0)
> > ...
> > |   fatal_error("could not read from log file %s, offset %u, length %d: %s",
> > |         fname, sendOff, segbytes, strerror(err));
> > 
> > 
> > A bit different issue, but in pg_waldump.c, search_directory can
> > check uninitialized errno when read returns a non-zero value.
> 
> Yeah, the error message could be improved as well if the result is an
> empty file.
> 
> Updated patch is attached.  Thanks for your review.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Fix some error handling for read() and errno

From

Michael Paquier

Date:

25 May 2018, 10:04:58

On Fri, May 25, 2018 at 01:19:58PM +0900, Kyotaro HORIGUCHI wrote:
> The case is not of an empty file. read() reads 0 bytes without
> error while lseek have told that the file has *more* data. I
> don't think that can happen. How about just commenting with
> something like the following?

Actually it can be useful to report that no data has been read and that
more data was expected, like that:
+       else if (nread == 0)
+           ereport(ERROR,
+                   (errmsg("no data read from file \"%s\": expected %zu more bytes",
+                           path, bytesleft)));
--
Michael

Attachment

Re: Fix some error handling for read() and errno

From

Robbie Harwood

Date:

12 June 2018, 00:42:32

Michael Paquier <michael@paquier.xyz> writes:

> diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
> index 87942b4cca..d487347cc6 100644
> --- a/src/backend/access/transam/slru.c
> +++ b/src/backend/access/transam/slru.c
> @@ -683,6 +683,11 @@ SlruPhysicalReadPage(SlruCtl ctl, int pageno, int slotno)
>      pgstat_report_wait_start(WAIT_EVENT_SLRU_READ);
>      if (read(fd, shared->page_buffer[slotno], BLCKSZ) != BLCKSZ)
>      {
> +        /*
> +         * XXX: Note that this may actually report sucess if the number
> +         * of bytes read is positive, but lacking data so that errno is
> +         * not set.
> +         */
>          pgstat_report_wait_end();
>          slru_errcause = SLRU_READ_FAILED;
>          slru_errno = errno;

It might be less confusing to just set errno if it's not set already
(e.g., to EIO, or something).  Up to you though - this is a bit of a
niche case.

The rest of the patch looks good to me.

Thanks,
--Robbie

Attachment

signature.asc

Re: Fix some error handling for read() and errno

From

Alvaro Herrera

Date:

12 June 2018, 01:11:05

On 2018-Jun-11, Robbie Harwood wrote:

> It might be less confusing to just set errno if it's not set already
> (e.g., to EIO, or something).  Up to you though - this is a bit of a
> niche case.

I think that would be very confusing, if we receive a report that really
is a short read but looks like EIO.  I'm for keeping the code as
proposed.

As for the messages, I propose to make them more regular, i.e. always
use the wording "could not read from file "%s": read %d, expected %zu", 
avoiding variations such as
    not enough data in file \"%s\": %d read, %d expected"
    could not read compressed file \"%s\": read %d out of %zu
    could not read any data from log segment %s, offset %u, length %lu
and others that appear in other places.  (In the last case, I even go as
far as proposing "read %d, expected %zu" where the the first %d is
constant zero.  Extra points if the sprintf format specifiers are always
the same (say %zu) instead of, say, %d in odd places.

I would go as far as suggesting to remove qualifiers that indicate what
the file is for (such as "relation mapping file"); relying on the path
as indicator of what's going wrong should be sufficient, since it's an
error that affects internals anyway, not anything that users can do much
about.  Keep variations to a minimum, to ease translator's work;
sometimes it's hard enough to come up with good translations for things
like "relation mapping file" in the first place, and they don't help the
end user.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Fix some error handling for read() and errno

From

Andres Freund

Date:

12 June 2018, 01:17:15

On 2018-06-11 18:11:05 -0400, Alvaro Herrera wrote:
> As for the messages, I propose to make them more regular, i.e. always
> use the wording "could not read from file "%s": read %d, expected %zu", 
> avoiding variations such as
>     not enough data in file \"%s\": %d read, %d expected"
>     could not read compressed file \"%s\": read %d out of %zu
>     could not read any data from log segment %s, offset %u, length %lu
> and others that appear in other places.  (In the last case, I even go as
> far as proposing "read %d, expected %zu" where the the first %d is
> constant zero.  Extra points if the sprintf format specifiers are always
> the same (say %zu) instead of, say, %d in odd places.

> I would go as far as suggesting to remove qualifiers that indicate what
> the file is for (such as "relation mapping file"); relying on the path
> as indicator of what's going wrong should be sufficient, since it's an
> error that affects internals anyway, not anything that users can do much
> about.  Keep variations to a minimum, to ease translator's work;
> sometimes it's hard enough to come up with good translations for things
> like "relation mapping file" in the first place, and they don't help the
> end user.

If we go there, why not wrap the read/write/etc calls into a wrapper,
including the error handling?

I'm not quite convinced that "relation mapping file" doesn't provide any
information. It's likley to be easier to search for than a specific
filename, particularly if there's oids or such in the name...

Greetings,

Andres Freund

Re: Fix some error handling for read() and errno

From

Michael Paquier

Date:

12 June 2018, 07:19:54

On Mon, Jun 11, 2018 at 03:17:15PM -0700, Andres Freund wrote:
> On 2018-06-11 18:11:05 -0400, Alvaro Herrera wrote:
>> As for the messages, I propose to make them more regular, i.e. always
>> use the wording "could not read from file "%s": read %d, expected %zu",
>> avoiding variations such as
>>     not enough data in file \"%s\": %d read, %d expected"
>>     could not read compressed file \"%s\": read %d out of %zu
>>     could not read any data from log segment %s, offset %u, length %lu
>> and others that appear in other places.  (In the last case, I even go as
>> far as proposing "read %d, expected %zu" where the the first %d is
>> constant zero.  Extra points if the sprintf format specifiers are always
>> the same (say %zu) instead of, say, %d in odd places.
>
>> I would go as far as suggesting to remove qualifiers that indicate what
>> the file is for (such as "relation mapping file"); relying on the path
>> as indicator of what's going wrong should be sufficient, since it's an
>> error that affects internals anyway, not anything that users can do much
>> about.  Keep variations to a minimum, to ease translator's work;
>> sometimes it's hard enough to come up with good translations for things
>> like "relation mapping file" in the first place, and they don't help the
>> end user.
>
> If we go there, why not wrap the read/write/etc calls into a wrapper,
> including the error handling?

On this thread have been spotted 17 code paths involved: 13 for the
backend and 4 for src/bin.  For the frontend part the error handling of
each of of them is slightly different (pg_rewind uses pg_fatal,
pg_waldump uses fatal_error) so I don't think that for this part another
wrapper would bring more clarity.  For the backend part, 7 are working
on transient files and 6 are not, which would make the wrapper a tad
more complex if we would need for example a set of flags in input to
control if the fd passed down by the caller should use
CloseTransientFile() before emitting an error or not.  If the balance
was way more in favor of one or the other though...

> I'm not quite convinced that "relation mapping file" doesn't provide any
> information. It's likley to be easier to search for than a specific
> filename, particularly if there's oids or such in the name...

Agreed.  I also quite like the message mentioning directly 2PC files as
well.  I think that we could gain by making all end messages more
consistent, as the markers used and the style of each message is
slightly different, so I would suggest something like that instead to
gain in consistency:
if (readBytes < 0)
    ereport(elevel, "could not blah: %m");
else
    ereport(elevel, "could not blah: %d read, expected %zu");

My point is that if we use the same markers and the same end messages,
then those are easier to grep for, and callers are still free to provide
the head of error messages the way they want depending on the situation.
--
Michael

Attachment

signature.asc

Re: Fix some error handling for read() and errno

From

Michael Paquier

Date:

13 June 2018, 06:52:35

On Tue, Jun 12, 2018 at 01:19:54PM +0900, Michael Paquier wrote:
> Agreed.  I also quite like the message mentioning directly 2PC files as
> well.  I think that we could gain by making all end messages more
> consistent, as the markers used and the style of each message is
> slightly different, so I would suggest something like that instead to
> gain in consistency:
> if (readBytes < 0)
>     ereport(elevel, "could not blah: %m");
> else
>     ereport(elevel, "could not blah: %d read, expected %zu");
>
> My point is that if we use the same markers and the same end messages,
> then those are easier to grep for, and callers are still free to provide
> the head of error messages the way they want depending on the situation.

I have dug again into this stuff, and I have finished with the attached
which uses mainly "could not read file %s: read %d bytes, expected
%zu".  The markers are harder to make consistent without being more
invasive so I stopped on that.

There is also this bit in slru.c which I'd like to discuss:
+       /*
+        * Note that this would report success if the number of bytes read is
+        * positive, but lacking data so that errno is not set, which would be
+        * confusing, so set errno to EIO in this case.
+        */
+       if (errno == 0)
+           errno = EIO;
Please note that I don't necessarily propose to add this in the final
patch, and I think that at least an XXX comment should be added here to
mention that errno may not be set.

Thoughts?
--
Michael

Attachment

Re: Fix some error handling for read() and errno

From

Robert Haas

Date:

14 June 2018, 16:50:33

On Mon, Jun 11, 2018 at 6:11 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> I would go as far as suggesting to remove qualifiers that indicate what
> the file is for (such as "relation mapping file"); relying on the path
> as indicator of what's going wrong should be sufficient, since it's an
> error that affects internals anyway, not anything that users can do much
> about.  Keep variations to a minimum, to ease translator's work;
> sometimes it's hard enough to come up with good translations for things
> like "relation mapping file" in the first place, and they don't help the
> end user.

+1.  I think those things are often hard to phrase even in English.
It makes the messages long, awkward, and almost invariably the style
differs from one message to the next depending on the author and how
easy it is to describe the type of file involved.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Fix some error handling for read() and errno

From

Michael Paquier

Date:

18 June 2018, 07:42:03

On Thu, Jun 14, 2018 at 09:50:33AM -0400, Robert Haas wrote:
> On Mon, Jun 11, 2018 at 6:11 PM, Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:
>> I would go as far as suggesting to remove qualifiers that indicate what
>> the file is for (such as "relation mapping file"); relying on the path
>> as indicator of what's going wrong should be sufficient, since it's an
>> error that affects internals anyway, not anything that users can do much
>> about.  Keep variations to a minimum, to ease translator's work;
>> sometimes it's hard enough to come up with good translations for things
>> like "relation mapping file" in the first place, and they don't help the
>> end user.
>
> +1.  I think those things are often hard to phrase even in English.
> It makes the messages long, awkward, and almost invariably the style
> differs from one message to the next depending on the author and how
> easy it is to describe the type of file involved.

Okay, so this makes two votes in favor of keeping the messages simple
without context, shaped like "could not read file %s", with Robert and
Alvaro, and two votes for keeping some context with Andres and I.
Anybody else has an opinion to offer?

Please note that I think that some messages should keep some context
anyway, for example the WAL-related information is useful.  An example
is this one where the offset is good to know about:
+   ereport(emode_for_corrupt_record(emode, targetPagePtr + reqLen),
+           (errmsg("could not read from log segment %s, offset %u: read %d bytes, expected %d",
+                   fname, readOff, r, XLOG_BLCKSZ)));
--
Michael

Attachment

signature.asc

Re: Fix some error handling for read() and errno

From

Magnus Hagander

Date:

20 June 2018, 15:52:23

On Mon, Jun 18, 2018 at 6:42 AM, Michael Paquier <michael@paquier.xyz> wrote:

On Thu, Jun 14, 2018 at 09:50:33AM -0400, Robert Haas wrote:
> On Mon, Jun 11, 2018 at 6:11 PM, Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:
>> I would go as far as suggesting to remove qualifiers that indicate what
>> the file is for (such as "relation mapping file"); relying on the path
>> as indicator of what's going wrong should be sufficient, since it's an
>> error that affects internals anyway, not anything that users can do much
>> about. Keep variations to a minimum, to ease translator's work;
>> sometimes it's hard enough to come up with good translations for things
>> like "relation mapping file" in the first place, and they don't help the
>> end user.
>
> +1. I think those things are often hard to phrase even in English.
> It makes the messages long, awkward, and almost invariably the style
> differs from one message to the next depending on the author and how
> easy it is to describe the type of file involved.

Okay, so this makes two votes in favor of keeping the messages simple
without context, shaped like "could not read file %s", with Robert and
Alvaro, and two votes for keeping some context with Andres and I.
Anybody else has an opinion to offer?

Count me in with Robert and Alvaro with a +0.5 :)

Please note that I think that some messages should keep some context
anyway, for example the WAL-related information is useful. An example
is this one where the offset is good to know about:
+ ereport(emode_for_corrupt_record(emode, targetPagePtr + reqLen),
+ (errmsg("could not read from log segment %s, offset %u: read %d bytes, expected %d",
+ fname, readOff, r, XLOG_BLCKSZ)));

Yeah, I think you'd need to make a call for the individual message to see how much it helps. In this one the context definitely does, in some others it doesn't.

Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

Re: Fix some error handling for read() and errno

From

Michael Paquier

Date:

21 June 2018, 09:32:01

On Wed, Jun 20, 2018 at 02:52:23PM +0200, Magnus Hagander wrote:
> On Mon, Jun 18, 2018 at 6:42 AM, Michael Paquier <michael@paquier.xyz>
> wrote:
>> Okay, so this makes two votes in favor of keeping the messages simple
>> without context, shaped like "could not read file %s", with Robert and
>> Alvaro, and two votes for keeping some context with Andres and I.
>> Anybody else has an opinion to offer?
>>
>
> Count me in with Robert and Alvaro with a +0.5 :)

Thanks for your opinion.

>> Please note that I think that some messages should keep some context
>> anyway, for example the WAL-related information is useful.  An example
>> is this one where the offset is good to know about:
>> +   ereport(emode_for_corrupt_record(emode, targetPagePtr + reqLen),
>> +           (errmsg("could not read from log segment %s, offset %u: read
>> %d bytes, expected %d",
>> +                   fname, readOff, r, XLOG_BLCKSZ)));
>
> Yeah, I think you'd need to make a call for the individual message to see
> how much it helps. In this one the context definitely does, in some others
> it doesn't.

Sure.  I have looked at that and I am finishing with the updated version
attached.

I removed the portion about slru in the code, but I would like to add at
least a mention about it in the commit message.  The simplifications are
also applied for the control file messages you changed as well in
cfb758b.  Also in ReadControlFile(), we use "could not open control
file", so for consistency this should be replaced with a more simple
message.  I noticed as well that the 2PC file handling loses errno a
couple of times, and that we had better keep the messages also
consistent if we do the simplification move...  relmapper.c also gains
simplifications.

All the incorrect errno handling I found should be back-patched in my
opinion as we did so in the past with 2a11b188 for example.  I am not
really willing to bother about the error message improvements on
anything else than HEAD (not v11, but v12), but...  Opinions about all
that?
--
Michael

Attachment

Re: Fix some error handling for read() and errno

From

Robert Haas

Date:

22 June 2018, 15:46:31

On Thu, Jun 21, 2018 at 2:32 AM, Michael Paquier <michael@paquier.xyz> wrote:
> Sure.  I have looked at that and I am finishing with the updated version
> attached.
>
> I removed the portion about slru in the code, but I would like to add at
> least a mention about it in the commit message.  The simplifications are
> also applied for the control file messages you changed as well in
> cfb758b.  Also in ReadControlFile(), we use "could not open control
> file", so for consistency this should be replaced with a more simple
> message.  I noticed as well that the 2PC file handling loses errno a
> couple of times, and that we had better keep the messages also
> consistent if we do the simplification move...  relmapper.c also gains
> simplifications.
>
> All the incorrect errno handling I found should be back-patched in my
> opinion as we did so in the past with 2a11b188 for example.  I am not
> really willing to bother about the error message improvements on
> anything else than HEAD (not v11, but v12), but...  Opinions about all
> that?

I think this should be split into two patches.  Fixing failure to save
and restore errno is a different issue from cleaning up short write
messages.  I agree that the former should be back-patched and the
latter not.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Fix some error handling for read() and errno

From

Alvaro Herrera

Date:

22 June 2018, 16:51:30

On 2018-Jun-22, Robert Haas wrote:

> I think this should be split into two patches.  Fixing failure to save
> and restore errno is a different issue from cleaning up short write
> messages.  I agree that the former should be back-patched and the
> latter not.

Hmm, yeah, good thought, +1.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Fix some error handling for read() and errno

From

Michael Paquier

Date:

23 June 2018, 02:48:03

On Fri, Jun 22, 2018 at 09:51:30AM -0400, Alvaro Herrera wrote:
> On 2018-Jun-22, Robert Haas wrote:
>> I think this should be split into two patches.  Fixing failure to save
>> and restore errno is a different issue from cleaning up short write
>> messages.  I agree that the former should be back-patched and the
>> latter not.
>
> Hmm, yeah, good thought, +1.

That's exactly why I have started this thread so as both problems are
addressed separately:
https://www.postgresql.org/message-id/20180622061535.GD5215@paquier.xyz
And back-patching the errno patch while only bothering about messages on
HEAD matches also what I got in mind.  I'll come back to this thread
once the errno issues are all addressed.
--
Michael

Attachment

signature.asc

Re: Fix some error handling for read() and errno

From

Michael Paquier

Date:

25 June 2018, 10:18:18

On Sat, Jun 23, 2018 at 08:48:03AM +0900, Michael Paquier wrote:
> That's exactly why I have started this thread so as both problems are
> addressed separately:
> https://www.postgresql.org/message-id/20180622061535.GD5215@paquier.xyz
> And back-patching the errno patch while only bothering about messages on
> HEAD matches also what I got in mind.  I'll come back to this thread
> once the errno issues are all addressed.

As this one is done, I have been looking at that this thread again.
Peter Eisentraut has pushed as e5d11b9 something which does not need to
worry about pluralization of error messages.  So I have moved to this
message style for all messages.  All of this is done as 0001.

I have been thinking as well about a common interface which could be
used to read/write/fsync transient files:
void WriteTransientFile(int fd, char *buf, Size count, int elevel,
    const char *filename, uint32 wait_event_info);
bool ReadTransientFile(int fd, char *buf, Size count, int elevel,
    const char *filename, uint32 wait_event_info);
void SyncTransientFile(int fd, int elevel, const char *filename
    uint32 wait_event_info);

There are rather equivalent things with FileRead and FileWrite but the
purpose is different as well.

If you look at 0002, this shaves a bit of code:
6 files changed, 128 insertions(+), 200 deletions(-)

There are also a couple of advantages here:
- Centralize errno handling for transient files with ENOSPC for write(2)
and read count for read(2)
- Wait events have to be defined, so those would unlikely get forgotten
in the future.
- Error handling for CloseTransientFile in code paths is centralized.

ReadTransientFile could be redefined to return the number of bytes read
as result with caller checking for errno, but that feels a bit duplicate
work for twophase.c.  WriteTransientFile and SyncTransientFile could
also have the same treatment for consistency but they would not really
be used now.  Do you guys think that this is worth pursuing?  Merging
0001 and 0002 together may make sense then.
--
Michael

Attachment

Re: Fix some error handling for read() and errno

From

Michael Paquier

Date:

13 July 2018, 16:38:46

On Mon, Jun 25, 2018 at 04:18:18PM +0900, Michael Paquier wrote:
> As this one is done, I have been looking at that this thread again.
> Peter Eisentraut has pushed as e5d11b9 something which does not need to
> worry about pluralization of error messages.  So I have moved to this
> message style for all messages.  All of this is done as 0001.

Mr. Robot has been complaining about this patch set, so attached is a
rebased version.  Thinking about it, I would tend to just merge 0001 and
give up on 0002 as that may not justify future backpatch pain.  Thoughts
are welcome.
--
Michael

Attachment

Re: Fix some error handling for read() and errno

From

Alvaro Herrera

Date:

13 July 2018, 18:31:31

On 2018-Jul-13, Michael Paquier wrote:

> Mr. Robot has been complaining about this patch set, so attached is a
> rebased version.  Thinking about it, I would tend to just merge 0001 and
> give up on 0002 as that may not justify future backpatch pain.  Thoughts
> are welcome.

I vote to push both.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Fix some error handling for read() and errno

From

Michael Paquier

Date:

14 July 2018, 02:47:12

On Fri, Jul 13, 2018 at 11:31:31AM -0400, Alvaro Herrera wrote:
>> Mr. Robot has been complaining about this patch set, so attached is a
>> rebased version.  Thinking about it, I would tend to just merge 0001 and
>> give up on 0002 as that may not justify future backpatch pain.  Thoughts
>> are welcome.
>
> I vote to push both.

Thanks!  Did you look at the code?  The first patch is just some
cleanup, while the second could have adjustments?  For the second I went
with the minimal amount of work, and actually there is no need to make
ReadTransientFile() return a status per my study of ReadTwoPhaseFile()
in https://postgr.es/m/20180709050309.GM1467@paquier.xyz which must fail
when reading the file.  So patch 0002 depends on the other 2PC patch.
--
Michael

Attachment

signature.asc

Re: Fix some error handling for read() and errno

From

Alvaro Herrera

Date:

14 July 2018, 07:50:02

On 2018-Jul-14, Michael Paquier wrote:

> On Fri, Jul 13, 2018 at 11:31:31AM -0400, Alvaro Herrera wrote:
> >> Mr. Robot has been complaining about this patch set, so attached is a
> >> rebased version.  Thinking about it, I would tend to just merge 0001 and
> >> give up on 0002 as that may not justify future backpatch pain.  Thoughts
> >> are welcome.
> > 
> > I vote to push both.
> 
> Thanks!  Did you look at the code?  The first patch is just some
> cleanup, while the second could have adjustments?  For the second I went
> with the minimal amount of work, and actually there is no need to make
> ReadTransientFile() return a status per my study of ReadTwoPhaseFile()
> in https://postgr.es/m/20180709050309.GM1467@paquier.xyz which must fail
> when reading the file.  So patch 0002 depends on the other 2PC patch.

I did read them, though not in minute detail.  0002 seems to result in
code easier to read.  If there are particular places that deviate from
the obvious patterns, I didn't notice them.

In 0001 one thing I wasn't terribly in love with was random deviations
in sprintf format specifiers for things that should be identical, ie.
%lu in some places and %zu in others, for "read only %d of %d".  It
seems you should pick the more general one (%zu) and use casts to Size
(or is it size_t?) in the places that have other types.  That way you
*really* decrease translator effort to the bare minimum :-)

Ah, in 0001 you have one case of "could not read _from_" (in
SimpleXLogPageRead).  The "from" is not present in the other places.
Looks odd.

I'm not sure about putting the wait event stuff inside the new
functions.  It looks odd, though I understand why you did it.

No opinion on the 2PC stuff -- didn't look at that.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Fix some error handling for read() and errno

From

Michael Paquier

Date:

14 July 2018, 09:37:56

On Sat, Jul 14, 2018 at 12:50:02AM -0400, Alvaro Herrera wrote:
> I did read them, though not in minute detail.  0002 seems to result in
> code easier to read.  If there are particular places that deviate from
> the obvious patterns, I didn't notice them.
>
> In 0001 one thing I wasn't terribly in love with was random deviations
> in sprintf format specifiers for things that should be identical, ie.
> %lu in some places and %zu in others, for "read only %d of %d".  It
> seems you should pick the more general one (%zu) and use casts to Size
> (or is it size_t?) in the places that have other types.  That way you
> *really* decrease translator effort to the bare minimum :-)

Yeah, that would be really the point, hence let's use "could not read
file \"%s\": read %d of %zu" everywhere with a cast to Size.  That's
used in some other places as well.

> Ah, in 0001 you have one case of "could not read _from_" (in
> SimpleXLogPageRead).  The "from" is not present in the other places.
> Looks odd.

Right.

> I'm not sure about putting the wait event stuff inside the new
> functions.  It looks odd, though I understand why you did it.

I don't really find that strange as any transient files should be
tracked by I/O wait events :)

> No opinion on the 2PC stuff -- didn't look at that.

For now, I think that just moving forward with 0001, and then revisit
0002 once the other 2PC patch is settled makes the most sense.  On the
other thread, the current 2PC behavior can create silent data loss so
I would like to back-patch it, so that would be less work.
--
Michael

Attachment

Re: Fix some error handling for read() and errno

From

Michael Paquier

Date:

16 July 2018, 14:09:19

On Sat, Jul 14, 2018 at 03:37:56PM +0900, Michael Paquier wrote:
> For now, I think that just moving forward with 0001, and then revisit
> 0002 once the other 2PC patch is settled makes the most sense.  On the
> other thread, the current 2PC behavior can create silent data loss so
> I would like to back-patch it, so that would be less work.

Are there any objections with this plan?  If none, then I would like to
move on with 0001 as there is clearly a consensus to simplify the work
of translators and to clean up the error code paths for read() calls.
Let's sort of the rest after the 2PC code paths are addressed.
--
Michael

Attachment

signature.asc

Re: Fix some error handling for read() and errno

From

Alvaro Herrera

Date:

16 July 2018, 23:04:12

On 2018-Jul-16, Michael Paquier wrote:

> On Sat, Jul 14, 2018 at 03:37:56PM +0900, Michael Paquier wrote:
> > For now, I think that just moving forward with 0001, and then revisit
> > 0002 once the other 2PC patch is settled makes the most sense.  On the
> > other thread, the current 2PC behavior can create silent data loss so
> > I would like to back-patch it, so that would be less work.
> 
> Are there any objections with this plan?  If none, then I would like to
> move on with 0001 as there is clearly a consensus to simplify the work
> of translators and to clean up the error code paths for read() calls.
> Let's sort of the rest after the 2PC code paths are addressed.

No objection here -- incremental progress is better than none.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Fix some error handling for read() and errno

From

Michael Paquier

Date:

18 July 2018, 02:07:48

On Mon, Jul 16, 2018 at 04:04:12PM -0400, Alvaro Herrera wrote:
> No objection here -- incremental progress is better than none.

Thanks.  I have pushed 0001 now.  I have found some more work which
could be done, for which I'll spawn a new thread.
--
Michael

Attachment

signature.asc