Thread: Fix some error handling for read() and errno
Hi all, This is basically a new thread after what has been discussed for pg_controldata with its error handling for read(): https://www.postgresql.org/message-id/CABUevEx8ZRV5Ut_FvP2etXiQppx3xVzm7oOaV3AcdHxX81Yt8Q%40mail.gmail.com While reviewing the core code, I have noticed similar weird error handling for read(). At the same time, some of those places may use an incorrect errno, as an error is invoked using an errno which may be overwritten by another system call. I found a funny one in slru.c, for which I have added a note in the patch. I don't think that this is worth addressing with more facility, thoughts are welcome. Attached is a patch addressing the issues I found. Thanks, -- Michael
Attachment
Hello. At Sun, 20 May 2018 09:05:22 +0900, Michael Paquier <michael@paquier.xyz> wrote in <20180520000522.GB1603@paquier.xyz> > Hi all, > > This is basically a new thread after what has been discussed for > pg_controldata with its error handling for read(): > https://www.postgresql.org/message-id/CABUevEx8ZRV5Ut_FvP2etXiQppx3xVzm7oOaV3AcdHxX81Yt8Q%40mail.gmail.com > > While reviewing the core code, I have noticed similar weird error > handling for read(). At the same time, some of those places may use an > incorrect errno, as an error is invoked using an errno which may be > overwritten by another system call. I found a funny one in slru.c, > for which I have added a note in the patch. I don't think that this is > worth addressing with more facility, thoughts are welcome. > > Attached is a patch addressing the issues I found. I see the same issue in snapbuild.c(4 places). | readBytes = read(fd, &ondisk, SnapBuildOnDiskConstantSize); | pgstat_report_wait_end(); | if (readBytes != SnapBuildOnDiskConstantSize) | { | CloseTransientFile(fd); | ereport(ERROR, | (errcode_for_file_access(), | errmsg("could not read file \"%s\", read %d of %d: %m", | path, readBytes, (int) SnapBuildOnDiskConstantSize))); | } and walsender.c (2 places) | if (nread <= 0) | ereport(ERROR, | (errcode_for_file_access(), | errmsg("could not read file \"%s\": %m", | path))); and pg_receivewal.c | if (read(fd, (char *) buf, sizeof(buf)) != sizeof(buf)) | { | fprintf(stderr, _("%s: could not read compressed file \"%s\": %s\n"), | progname, fullpath, strerror(errno)); pg_waldump.c | if (readbytes <= 0) ... | fatal_error("could not read from log file %s, offset %u, length %d: %s", | fname, sendOff, segbytes, strerror(err)); A bit differnt issue, but in pg_waldump.c, search_directory can check uninitialized errno when read returns a non-zero value. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On Tue, May 22, 2018 at 04:51:00PM +0900, Kyotaro HORIGUCHI wrote: > I see the same issue in snapbuild.c(4 places). > > | readBytes = read(fd, &ondisk, SnapBuildOnDiskConstantSize); > | pgstat_report_wait_end(); > | if (readBytes != SnapBuildOnDiskConstantSize) > | { > | CloseTransientFile(fd); > | ereport(ERROR, > | (errcode_for_file_access(), > | errmsg("could not read file \"%s\", read %d of %d: %m", > | path, readBytes, (int) SnapBuildOnDiskConstantSize))); > | } Four times the same pattern, which also bloat errno when closing the file descriptor. I did not catch those. > and walsender.c (2 places) > > | if (nread <= 0) > | ereport(ERROR, > | (errcode_for_file_access(), > | errmsg("could not read file \"%s\": %m", > | path))); Those two ones I saw, but I was not sure if it is worth the complication to error on an empty file. We could do something like the attached which would be an improvement in readability? > and pg_receivewal.c > > | if (read(fd, (char *) buf, sizeof(buf)) != sizeof(buf)) > | { > | fprintf(stderr, _("%s: could not read compressed file \"%s\": %s\n"), > | progname, fullpath, strerror(errno)); Okay. > pg_waldump.c > > | if (readbytes <= 0) > ... > | fatal_error("could not read from log file %s, offset %u, length %d: %s", > | fname, sendOff, segbytes, strerror(err)); > > > A bit different issue, but in pg_waldump.c, search_directory can > check uninitialized errno when read returns a non-zero value. Yeah, the error message could be improved as well if the result is an empty file. Updated patch is attached. Thanks for your review. -- Michael
Attachment
At Wed, 23 May 2018 09:00:40 +0900, Michael Paquier <michael@paquier.xyz> wrote in <20180523000040.GA3461@paquier.xyz> > On Tue, May 22, 2018 at 04:51:00PM +0900, Kyotaro HORIGUCHI wrote: > > I see the same issue in snapbuild.c(4 places). > > > > | readBytes = read(fd, &ondisk, SnapBuildOnDiskConstantSize); > > | pgstat_report_wait_end(); > > | if (readBytes != SnapBuildOnDiskConstantSize) > > | { > > | CloseTransientFile(fd); > > | ereport(ERROR, > > | (errcode_for_file_access(), > > | errmsg("could not read file \"%s\", read %d of %d: %m", > > | path, readBytes, (int) SnapBuildOnDiskConstantSize))); > > | } > > Four times the same pattern, which also bloat errno when closing the > file descriptor. I did not catch those. > > > and walsender.c (2 places) > > > > | if (nread <= 0) > > | ereport(ERROR, > > | (errcode_for_file_access(), > > | errmsg("could not read file \"%s\": %m", > > | path))); > > Those two ones I saw, but I was not sure if it is worth the complication > to error on an empty file. We could do something like the attached which > would be an improvement in readability? The case is not of an empty file. read() reads 0 bytes without error while lseek have told that the file has *more* data. I don't think that can happen. How about just commenting with something like the following? > nread = read(fd, rbuf, sizeof(rbuf)); > /* > * errno is E_OK in the case where nread == 0, but that can > * scarecely happen so we don't separate the case. > */ > if (nread <= 0) > ereport(ERROR, If we ereport(%m) for the nread == 0 case, we need to initialize errno. > > and pg_receivewal.c > > > > | if (read(fd, (char *) buf, sizeof(buf)) != sizeof(buf)) > > | { > > | fprintf(stderr, _("%s: could not read compressed file \"%s\": %s\n"), > > | progname, fullpath, strerror(errno)); > > Okay. > > > pg_waldump.c > > > > | if (readbytes <= 0) > > ... > > | fatal_error("could not read from log file %s, offset %u, length %d: %s", > > | fname, sendOff, segbytes, strerror(err)); > > > > > > A bit different issue, but in pg_waldump.c, search_directory can > > check uninitialized errno when read returns a non-zero value. > > Yeah, the error message could be improved as well if the result is an > empty file. > > Updated patch is attached. Thanks for your review. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On Fri, May 25, 2018 at 01:19:58PM +0900, Kyotaro HORIGUCHI wrote: > The case is not of an empty file. read() reads 0 bytes without > error while lseek have told that the file has *more* data. I > don't think that can happen. How about just commenting with > something like the following? Actually it can be useful to report that no data has been read and that more data was expected, like that: + else if (nread == 0) + ereport(ERROR, + (errmsg("no data read from file \"%s\": expected %zu more bytes", + path, bytesleft))); -- Michael
Attachment
Michael Paquier <michael@paquier.xyz> writes: > diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c > index 87942b4cca..d487347cc6 100644 > --- a/src/backend/access/transam/slru.c > +++ b/src/backend/access/transam/slru.c > @@ -683,6 +683,11 @@ SlruPhysicalReadPage(SlruCtl ctl, int pageno, int slotno) > pgstat_report_wait_start(WAIT_EVENT_SLRU_READ); > if (read(fd, shared->page_buffer[slotno], BLCKSZ) != BLCKSZ) > { > + /* > + * XXX: Note that this may actually report sucess if the number > + * of bytes read is positive, but lacking data so that errno is > + * not set. > + */ > pgstat_report_wait_end(); > slru_errcause = SLRU_READ_FAILED; > slru_errno = errno; It might be less confusing to just set errno if it's not set already (e.g., to EIO, or something). Up to you though - this is a bit of a niche case. The rest of the patch looks good to me. Thanks, --Robbie
Attachment
On 2018-Jun-11, Robbie Harwood wrote: > It might be less confusing to just set errno if it's not set already > (e.g., to EIO, or something). Up to you though - this is a bit of a > niche case. I think that would be very confusing, if we receive a report that really is a short read but looks like EIO. I'm for keeping the code as proposed. As for the messages, I propose to make them more regular, i.e. always use the wording "could not read from file "%s": read %d, expected %zu", avoiding variations such as not enough data in file \"%s\": %d read, %d expected" could not read compressed file \"%s\": read %d out of %zu could not read any data from log segment %s, offset %u, length %lu and others that appear in other places. (In the last case, I even go as far as proposing "read %d, expected %zu" where the the first %d is constant zero. Extra points if the sprintf format specifiers are always the same (say %zu) instead of, say, %d in odd places. I would go as far as suggesting to remove qualifiers that indicate what the file is for (such as "relation mapping file"); relying on the path as indicator of what's going wrong should be sufficient, since it's an error that affects internals anyway, not anything that users can do much about. Keep variations to a minimum, to ease translator's work; sometimes it's hard enough to come up with good translations for things like "relation mapping file" in the first place, and they don't help the end user. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2018-06-11 18:11:05 -0400, Alvaro Herrera wrote: > As for the messages, I propose to make them more regular, i.e. always > use the wording "could not read from file "%s": read %d, expected %zu", > avoiding variations such as > not enough data in file \"%s\": %d read, %d expected" > could not read compressed file \"%s\": read %d out of %zu > could not read any data from log segment %s, offset %u, length %lu > and others that appear in other places. (In the last case, I even go as > far as proposing "read %d, expected %zu" where the the first %d is > constant zero. Extra points if the sprintf format specifiers are always > the same (say %zu) instead of, say, %d in odd places. > I would go as far as suggesting to remove qualifiers that indicate what > the file is for (such as "relation mapping file"); relying on the path > as indicator of what's going wrong should be sufficient, since it's an > error that affects internals anyway, not anything that users can do much > about. Keep variations to a minimum, to ease translator's work; > sometimes it's hard enough to come up with good translations for things > like "relation mapping file" in the first place, and they don't help the > end user. If we go there, why not wrap the read/write/etc calls into a wrapper, including the error handling? I'm not quite convinced that "relation mapping file" doesn't provide any information. It's likley to be easier to search for than a specific filename, particularly if there's oids or such in the name... Greetings, Andres Freund
On Mon, Jun 11, 2018 at 03:17:15PM -0700, Andres Freund wrote: > On 2018-06-11 18:11:05 -0400, Alvaro Herrera wrote: >> As for the messages, I propose to make them more regular, i.e. always >> use the wording "could not read from file "%s": read %d, expected %zu", >> avoiding variations such as >> not enough data in file \"%s\": %d read, %d expected" >> could not read compressed file \"%s\": read %d out of %zu >> could not read any data from log segment %s, offset %u, length %lu >> and others that appear in other places. (In the last case, I even go as >> far as proposing "read %d, expected %zu" where the the first %d is >> constant zero. Extra points if the sprintf format specifiers are always >> the same (say %zu) instead of, say, %d in odd places. > >> I would go as far as suggesting to remove qualifiers that indicate what >> the file is for (such as "relation mapping file"); relying on the path >> as indicator of what's going wrong should be sufficient, since it's an >> error that affects internals anyway, not anything that users can do much >> about. Keep variations to a minimum, to ease translator's work; >> sometimes it's hard enough to come up with good translations for things >> like "relation mapping file" in the first place, and they don't help the >> end user. > > If we go there, why not wrap the read/write/etc calls into a wrapper, > including the error handling? On this thread have been spotted 17 code paths involved: 13 for the backend and 4 for src/bin. For the frontend part the error handling of each of of them is slightly different (pg_rewind uses pg_fatal, pg_waldump uses fatal_error) so I don't think that for this part another wrapper would bring more clarity. For the backend part, 7 are working on transient files and 6 are not, which would make the wrapper a tad more complex if we would need for example a set of flags in input to control if the fd passed down by the caller should use CloseTransientFile() before emitting an error or not. If the balance was way more in favor of one or the other though... > I'm not quite convinced that "relation mapping file" doesn't provide any > information. It's likley to be easier to search for than a specific > filename, particularly if there's oids or such in the name... Agreed. I also quite like the message mentioning directly 2PC files as well. I think that we could gain by making all end messages more consistent, as the markers used and the style of each message is slightly different, so I would suggest something like that instead to gain in consistency: if (readBytes < 0) ereport(elevel, "could not blah: %m"); else ereport(elevel, "could not blah: %d read, expected %zu"); My point is that if we use the same markers and the same end messages, then those are easier to grep for, and callers are still free to provide the head of error messages the way they want depending on the situation. -- Michael
Attachment
On Tue, Jun 12, 2018 at 01:19:54PM +0900, Michael Paquier wrote: > Agreed. I also quite like the message mentioning directly 2PC files as > well. I think that we could gain by making all end messages more > consistent, as the markers used and the style of each message is > slightly different, so I would suggest something like that instead to > gain in consistency: > if (readBytes < 0) > ereport(elevel, "could not blah: %m"); > else > ereport(elevel, "could not blah: %d read, expected %zu"); > > My point is that if we use the same markers and the same end messages, > then those are easier to grep for, and callers are still free to provide > the head of error messages the way they want depending on the situation. I have dug again into this stuff, and I have finished with the attached which uses mainly "could not read file %s: read %d bytes, expected %zu". The markers are harder to make consistent without being more invasive so I stopped on that. There is also this bit in slru.c which I'd like to discuss: + /* + * Note that this would report success if the number of bytes read is + * positive, but lacking data so that errno is not set, which would be + * confusing, so set errno to EIO in this case. + */ + if (errno == 0) + errno = EIO; Please note that I don't necessarily propose to add this in the final patch, and I think that at least an XXX comment should be added here to mention that errno may not be set. Thoughts? -- Michael
Attachment
On Mon, Jun 11, 2018 at 6:11 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > I would go as far as suggesting to remove qualifiers that indicate what > the file is for (such as "relation mapping file"); relying on the path > as indicator of what's going wrong should be sufficient, since it's an > error that affects internals anyway, not anything that users can do much > about. Keep variations to a minimum, to ease translator's work; > sometimes it's hard enough to come up with good translations for things > like "relation mapping file" in the first place, and they don't help the > end user. +1. I think those things are often hard to phrase even in English. It makes the messages long, awkward, and almost invariably the style differs from one message to the next depending on the author and how easy it is to describe the type of file involved. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Jun 14, 2018 at 09:50:33AM -0400, Robert Haas wrote: > On Mon, Jun 11, 2018 at 6:11 PM, Alvaro Herrera > <alvherre@2ndquadrant.com> wrote: >> I would go as far as suggesting to remove qualifiers that indicate what >> the file is for (such as "relation mapping file"); relying on the path >> as indicator of what's going wrong should be sufficient, since it's an >> error that affects internals anyway, not anything that users can do much >> about. Keep variations to a minimum, to ease translator's work; >> sometimes it's hard enough to come up with good translations for things >> like "relation mapping file" in the first place, and they don't help the >> end user. > > +1. I think those things are often hard to phrase even in English. > It makes the messages long, awkward, and almost invariably the style > differs from one message to the next depending on the author and how > easy it is to describe the type of file involved. Okay, so this makes two votes in favor of keeping the messages simple without context, shaped like "could not read file %s", with Robert and Alvaro, and two votes for keeping some context with Andres and I. Anybody else has an opinion to offer? Please note that I think that some messages should keep some context anyway, for example the WAL-related information is useful. An example is this one where the offset is good to know about: + ereport(emode_for_corrupt_record(emode, targetPagePtr + reqLen), + (errmsg("could not read from log segment %s, offset %u: read %d bytes, expected %d", + fname, readOff, r, XLOG_BLCKSZ))); -- Michael
Attachment
On Mon, Jun 18, 2018 at 6:42 AM, Michael Paquier <michael@paquier.xyz> wrote:
On Thu, Jun 14, 2018 at 09:50:33AM -0400, Robert Haas wrote:
> On Mon, Jun 11, 2018 at 6:11 PM, Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:
>> I would go as far as suggesting to remove qualifiers that indicate what
>> the file is for (such as "relation mapping file"); relying on the path
>> as indicator of what's going wrong should be sufficient, since it's an
>> error that affects internals anyway, not anything that users can do much
>> about. Keep variations to a minimum, to ease translator's work;
>> sometimes it's hard enough to come up with good translations for things
>> like "relation mapping file" in the first place, and they don't help the
>> end user.
>
> +1. I think those things are often hard to phrase even in English.
> It makes the messages long, awkward, and almost invariably the style
> differs from one message to the next depending on the author and how
> easy it is to describe the type of file involved.
Okay, so this makes two votes in favor of keeping the messages simple
without context, shaped like "could not read file %s", with Robert and
Alvaro, and two votes for keeping some context with Andres and I.
Anybody else has an opinion to offer?
Count me in with Robert and Alvaro with a +0.5 :)
Please note that I think that some messages should keep some context
anyway, for example the WAL-related information is useful. An example
is this one where the offset is good to know about:
+ ereport(emode_for_corrupt_record(emode, targetPagePtr + reqLen),
+ (errmsg("could not read from log segment %s, offset %u: read %d bytes, expected %d",
+ fname, readOff, r, XLOG_BLCKSZ)));
Yeah, I think you'd need to make a call for the individual message to see how much it helps. In this one the context definitely does, in some others it doesn't.
On Wed, Jun 20, 2018 at 02:52:23PM +0200, Magnus Hagander wrote: > On Mon, Jun 18, 2018 at 6:42 AM, Michael Paquier <michael@paquier.xyz> > wrote: >> Okay, so this makes two votes in favor of keeping the messages simple >> without context, shaped like "could not read file %s", with Robert and >> Alvaro, and two votes for keeping some context with Andres and I. >> Anybody else has an opinion to offer? >> > > Count me in with Robert and Alvaro with a +0.5 :) Thanks for your opinion. >> Please note that I think that some messages should keep some context >> anyway, for example the WAL-related information is useful. An example >> is this one where the offset is good to know about: >> + ereport(emode_for_corrupt_record(emode, targetPagePtr + reqLen), >> + (errmsg("could not read from log segment %s, offset %u: read >> %d bytes, expected %d", >> + fname, readOff, r, XLOG_BLCKSZ))); > > Yeah, I think you'd need to make a call for the individual message to see > how much it helps. In this one the context definitely does, in some others > it doesn't. Sure. I have looked at that and I am finishing with the updated version attached. I removed the portion about slru in the code, but I would like to add at least a mention about it in the commit message. The simplifications are also applied for the control file messages you changed as well in cfb758b. Also in ReadControlFile(), we use "could not open control file", so for consistency this should be replaced with a more simple message. I noticed as well that the 2PC file handling loses errno a couple of times, and that we had better keep the messages also consistent if we do the simplification move... relmapper.c also gains simplifications. All the incorrect errno handling I found should be back-patched in my opinion as we did so in the past with 2a11b188 for example. I am not really willing to bother about the error message improvements on anything else than HEAD (not v11, but v12), but... Opinions about all that? -- Michael
Attachment
On Thu, Jun 21, 2018 at 2:32 AM, Michael Paquier <michael@paquier.xyz> wrote: > Sure. I have looked at that and I am finishing with the updated version > attached. > > I removed the portion about slru in the code, but I would like to add at > least a mention about it in the commit message. The simplifications are > also applied for the control file messages you changed as well in > cfb758b. Also in ReadControlFile(), we use "could not open control > file", so for consistency this should be replaced with a more simple > message. I noticed as well that the 2PC file handling loses errno a > couple of times, and that we had better keep the messages also > consistent if we do the simplification move... relmapper.c also gains > simplifications. > > All the incorrect errno handling I found should be back-patched in my > opinion as we did so in the past with 2a11b188 for example. I am not > really willing to bother about the error message improvements on > anything else than HEAD (not v11, but v12), but... Opinions about all > that? I think this should be split into two patches. Fixing failure to save and restore errno is a different issue from cleaning up short write messages. I agree that the former should be back-patched and the latter not. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2018-Jun-22, Robert Haas wrote: > I think this should be split into two patches. Fixing failure to save > and restore errno is a different issue from cleaning up short write > messages. I agree that the former should be back-patched and the > latter not. Hmm, yeah, good thought, +1. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Jun 22, 2018 at 09:51:30AM -0400, Alvaro Herrera wrote: > On 2018-Jun-22, Robert Haas wrote: >> I think this should be split into two patches. Fixing failure to save >> and restore errno is a different issue from cleaning up short write >> messages. I agree that the former should be back-patched and the >> latter not. > > Hmm, yeah, good thought, +1. That's exactly why I have started this thread so as both problems are addressed separately: https://www.postgresql.org/message-id/20180622061535.GD5215@paquier.xyz And back-patching the errno patch while only bothering about messages on HEAD matches also what I got in mind. I'll come back to this thread once the errno issues are all addressed. -- Michael
Attachment
On Sat, Jun 23, 2018 at 08:48:03AM +0900, Michael Paquier wrote: > That's exactly why I have started this thread so as both problems are > addressed separately: > https://www.postgresql.org/message-id/20180622061535.GD5215@paquier.xyz > And back-patching the errno patch while only bothering about messages on > HEAD matches also what I got in mind. I'll come back to this thread > once the errno issues are all addressed. As this one is done, I have been looking at that this thread again. Peter Eisentraut has pushed as e5d11b9 something which does not need to worry about pluralization of error messages. So I have moved to this message style for all messages. All of this is done as 0001. I have been thinking as well about a common interface which could be used to read/write/fsync transient files: void WriteTransientFile(int fd, char *buf, Size count, int elevel, const char *filename, uint32 wait_event_info); bool ReadTransientFile(int fd, char *buf, Size count, int elevel, const char *filename, uint32 wait_event_info); void SyncTransientFile(int fd, int elevel, const char *filename uint32 wait_event_info); There are rather equivalent things with FileRead and FileWrite but the purpose is different as well. If you look at 0002, this shaves a bit of code: 6 files changed, 128 insertions(+), 200 deletions(-) There are also a couple of advantages here: - Centralize errno handling for transient files with ENOSPC for write(2) and read count for read(2) - Wait events have to be defined, so those would unlikely get forgotten in the future. - Error handling for CloseTransientFile in code paths is centralized. ReadTransientFile could be redefined to return the number of bytes read as result with caller checking for errno, but that feels a bit duplicate work for twophase.c. WriteTransientFile and SyncTransientFile could also have the same treatment for consistency but they would not really be used now. Do you guys think that this is worth pursuing? Merging 0001 and 0002 together may make sense then. -- Michael
Attachment
On Mon, Jun 25, 2018 at 04:18:18PM +0900, Michael Paquier wrote: > As this one is done, I have been looking at that this thread again. > Peter Eisentraut has pushed as e5d11b9 something which does not need to > worry about pluralization of error messages. So I have moved to this > message style for all messages. All of this is done as 0001. Mr. Robot has been complaining about this patch set, so attached is a rebased version. Thinking about it, I would tend to just merge 0001 and give up on 0002 as that may not justify future backpatch pain. Thoughts are welcome. -- Michael
Attachment
On 2018-Jul-13, Michael Paquier wrote: > Mr. Robot has been complaining about this patch set, so attached is a > rebased version. Thinking about it, I would tend to just merge 0001 and > give up on 0002 as that may not justify future backpatch pain. Thoughts > are welcome. I vote to push both. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Jul 13, 2018 at 11:31:31AM -0400, Alvaro Herrera wrote: >> Mr. Robot has been complaining about this patch set, so attached is a >> rebased version. Thinking about it, I would tend to just merge 0001 and >> give up on 0002 as that may not justify future backpatch pain. Thoughts >> are welcome. > > I vote to push both. Thanks! Did you look at the code? The first patch is just some cleanup, while the second could have adjustments? For the second I went with the minimal amount of work, and actually there is no need to make ReadTransientFile() return a status per my study of ReadTwoPhaseFile() in https://postgr.es/m/20180709050309.GM1467@paquier.xyz which must fail when reading the file. So patch 0002 depends on the other 2PC patch. -- Michael
Attachment
On 2018-Jul-14, Michael Paquier wrote: > On Fri, Jul 13, 2018 at 11:31:31AM -0400, Alvaro Herrera wrote: > >> Mr. Robot has been complaining about this patch set, so attached is a > >> rebased version. Thinking about it, I would tend to just merge 0001 and > >> give up on 0002 as that may not justify future backpatch pain. Thoughts > >> are welcome. > > > > I vote to push both. > > Thanks! Did you look at the code? The first patch is just some > cleanup, while the second could have adjustments? For the second I went > with the minimal amount of work, and actually there is no need to make > ReadTransientFile() return a status per my study of ReadTwoPhaseFile() > in https://postgr.es/m/20180709050309.GM1467@paquier.xyz which must fail > when reading the file. So patch 0002 depends on the other 2PC patch. I did read them, though not in minute detail. 0002 seems to result in code easier to read. If there are particular places that deviate from the obvious patterns, I didn't notice them. In 0001 one thing I wasn't terribly in love with was random deviations in sprintf format specifiers for things that should be identical, ie. %lu in some places and %zu in others, for "read only %d of %d". It seems you should pick the more general one (%zu) and use casts to Size (or is it size_t?) in the places that have other types. That way you *really* decrease translator effort to the bare minimum :-) Ah, in 0001 you have one case of "could not read _from_" (in SimpleXLogPageRead). The "from" is not present in the other places. Looks odd. I'm not sure about putting the wait event stuff inside the new functions. It looks odd, though I understand why you did it. No opinion on the 2PC stuff -- didn't look at that. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sat, Jul 14, 2018 at 12:50:02AM -0400, Alvaro Herrera wrote: > I did read them, though not in minute detail. 0002 seems to result in > code easier to read. If there are particular places that deviate from > the obvious patterns, I didn't notice them. > > In 0001 one thing I wasn't terribly in love with was random deviations > in sprintf format specifiers for things that should be identical, ie. > %lu in some places and %zu in others, for "read only %d of %d". It > seems you should pick the more general one (%zu) and use casts to Size > (or is it size_t?) in the places that have other types. That way you > *really* decrease translator effort to the bare minimum :-) Yeah, that would be really the point, hence let's use "could not read file \"%s\": read %d of %zu" everywhere with a cast to Size. That's used in some other places as well. > Ah, in 0001 you have one case of "could not read _from_" (in > SimpleXLogPageRead). The "from" is not present in the other places. > Looks odd. Right. > I'm not sure about putting the wait event stuff inside the new > functions. It looks odd, though I understand why you did it. I don't really find that strange as any transient files should be tracked by I/O wait events :) > No opinion on the 2PC stuff -- didn't look at that. For now, I think that just moving forward with 0001, and then revisit 0002 once the other 2PC patch is settled makes the most sense. On the other thread, the current 2PC behavior can create silent data loss so I would like to back-patch it, so that would be less work. -- Michael
Attachment
On Sat, Jul 14, 2018 at 03:37:56PM +0900, Michael Paquier wrote: > For now, I think that just moving forward with 0001, and then revisit > 0002 once the other 2PC patch is settled makes the most sense. On the > other thread, the current 2PC behavior can create silent data loss so > I would like to back-patch it, so that would be less work. Are there any objections with this plan? If none, then I would like to move on with 0001 as there is clearly a consensus to simplify the work of translators and to clean up the error code paths for read() calls. Let's sort of the rest after the 2PC code paths are addressed. -- Michael
Attachment
On 2018-Jul-16, Michael Paquier wrote: > On Sat, Jul 14, 2018 at 03:37:56PM +0900, Michael Paquier wrote: > > For now, I think that just moving forward with 0001, and then revisit > > 0002 once the other 2PC patch is settled makes the most sense. On the > > other thread, the current 2PC behavior can create silent data loss so > > I would like to back-patch it, so that would be less work. > > Are there any objections with this plan? If none, then I would like to > move on with 0001 as there is clearly a consensus to simplify the work > of translators and to clean up the error code paths for read() calls. > Let's sort of the rest after the 2PC code paths are addressed. No objection here -- incremental progress is better than none. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Jul 16, 2018 at 04:04:12PM -0400, Alvaro Herrera wrote: > No objection here -- incremental progress is better than none. Thanks. I have pushed 0001 now. I have found some more work which could be done, for which I'll spawn a new thread. -- Michael