Thread: Re: Is this a bug in pg_current_logfile() on Windows?
[ redirecting to pghackers ] Thomas Kellerer <shammat@gmx.net> writes: > Tom Lane schrieb am 08.07.2020 um 18:41: >> Somehow, the reading file is being left in binary mode and thus it's >> failing to convert \r\n back to plain \n. >> Now the weird thing about that is I'd have expected "r" and "w" modes >> to imply Windows text mode already, so that I'd have figured that >> _setmode call to be a useless no-op. Apparently on some Windows libc >> implementations, it's not. How was your installation built exactly? > That's the build from EnterpriseDB What I'd momentarily forgotten is that we don't use Windows' native fopen(). On that platform, we use pgwin32_fopen which defaults to binary mode (because _open_osfhandle does). So the _setmode calls in syslogger.c are *not* no-ops, and the failure in pg_current_logfile() is clearly explained by the fact that it's doing nothing to strip carriage returns. However ... I put in a test case to try to expose this failure, and our Windows buildfarm critters remain perfectly happy. So what's up with that? After some digging around, I believe the reason is that PostgresNode::psql is stripping the \r from pg_current_logfile()'s result, here: $$stdout =~ s/\r//g if $TestLib::windows_os; I'm slightly tempted to extend the test case by verifying on the server side that the result ends in ".log" with no extra characters. More generally, I wonder if the above behavior is really a good idea. It seems to have been added in commit 33f3bbc6d as a hack to avoid having to think too hard about mingw's behavior, but now I wonder if it isn't masking other bugs too. At the very least I think we ought to tighten the coding to $$stdout =~ s/\r\n/\n/g if $TestLib::windows_os; so that it won't strip carriage returns at random. Meanwhile, back at the ranch, how shall we fix pg_current_logfile()? I see two credible alternatives: 1. Insert #ifdef WIN32 _setmode(_fileno(fd), _O_TEXT); #endif to make this function match the coding in syslogger.c. 2. Manually strip '\r' if present, independent of platform. The second alternative would conform to the policy we established in commit b654714f9, that newline-chomping code should uniformly drop \r. However, that policy is mainly intended to allow non-Windows builds to cope with text files that might have been made with a Windows text editor. Surely we don't need to worry about a cross-platform source for the log metafile. So I'm leaning a bit to the first alternative, so as not to add useless overhead and complexity on non-Windows builds. Thoughts? regards, tom lane
On 7/8/20 5:26 PM, Tom Lane wrote: > > However ... I put in a test case to try to expose this failure, and > our Windows buildfarm critters remain perfectly happy. So what's up > with that? After some digging around, I believe the reason is that > PostgresNode::psql is stripping the \r from pg_current_logfile()'s > result, here: > > $$stdout =~ s/\r//g if $TestLib::windows_os; > > I'm slightly tempted to extend the test case by verifying on the > server side that the result ends in ".log" with no extra characters. > More generally, I wonder if the above behavior is really a good idea. > It seems to have been added in commit 33f3bbc6d as a hack to avoid > having to think too hard about mingw's behavior, but now I wonder if > it isn't masking other bugs too. At the very least I think we ought > to tighten the coding to > > $$stdout =~ s/\r\n/\n/g if $TestLib::windows_os; > > so that it won't strip carriage returns at random. > Seems reasonable. If we rip it out completely we'll have to find all the places it breaks and fix them. And we'll almost certainly get new breakage. If it's hiding a real bug we'll have to do that, but I'd be reluctant unless there's actual proof. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes: > On 7/8/20 5:26 PM, Tom Lane wrote: >> However ... I put in a test case to try to expose this failure, and >> our Windows buildfarm critters remain perfectly happy. So what's up >> with that? After some digging around, I believe the reason is that >> PostgresNode::psql is stripping the \r from pg_current_logfile()'s >> result, here: >> $$stdout =~ s/\r//g if $TestLib::windows_os; >> I'm slightly tempted to extend the test case by verifying on the >> server side that the result ends in ".log" with no extra characters. >> More generally, I wonder if the above behavior is really a good idea. >> It seems to have been added in commit 33f3bbc6d as a hack to avoid >> having to think too hard about mingw's behavior, but now I wonder if >> it isn't masking other bugs too. At the very least I think we ought >> to tighten the coding to >> $$stdout =~ s/\r\n/\n/g if $TestLib::windows_os; >> so that it won't strip carriage returns at random. > Seems reasonable. If we rip it out completely we'll have to find all the > places it breaks and fix them. And we'll almost certainly get new > breakage. If it's hiding a real bug we'll have to do that, but I'd be > reluctant unless there's actual proof. Hard to tell. What I propose to do right now is change the \r filters as shown above, and see if the test I added in 004_logrotate.pl starts to show failures on Windows. If it does, and no other place does, I'm willing to be satisfied with that. If we see *other* failures then that'd prove that the problem is real, no? regards, tom lane
I wrote: > Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes: >> Seems reasonable. If we rip it out completely we'll have to find all the >> places it breaks and fix them. And we'll almost certainly get new >> breakage. If it's hiding a real bug we'll have to do that, but I'd be >> reluctant unless there's actual proof. > Hard to tell. What I propose to do right now is change the \r filters > as shown above, and see if the test I added in 004_logrotate.pl starts > to show failures on Windows. If it does, and no other place does, > I'm willing to be satisfied with that. If we see *other* failures then > that'd prove that the problem is real, no? So I did that, and the first report is from bowerbird and it's still green. Unless I'm completely misinterpreting what's happening (always a possibility), that means we're still managing to remove "data" occurrences of \r. The most likely theory about that, I think, is that IPC::Run::run already translated any \r\n occurrences in the psql command's output to plain \n. Then, the \r generated by pg_current_logfile() would butt up against the line-ending \n, allowing the "fix" in sub psql to remove valid data. One thing I noticed while making 91bdf499b is that some of these substitutions are conditioned on "if $TestLib::windows_os" while others are conditioned on "if $Config{osname} eq 'msys'". What is the reason for this difference? Is it possible that we only really need to do it in the latter case? regards, tom lane
On 7/8/20 10:40 PM, Tom Lane wrote: > I wrote: >> Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes: >>> Seems reasonable. If we rip it out completely we'll have to find all the >>> places it breaks and fix them. And we'll almost certainly get new >>> breakage. If it's hiding a real bug we'll have to do that, but I'd be >>> reluctant unless there's actual proof. >> Hard to tell. What I propose to do right now is change the \r filters >> as shown above, and see if the test I added in 004_logrotate.pl starts >> to show failures on Windows. If it does, and no other place does, >> I'm willing to be satisfied with that. If we see *other* failures then >> that'd prove that the problem is real, no? > So I did that, and the first report is from bowerbird and it's still > green. Unless I'm completely misinterpreting what's happening (always > a possibility), that means we're still managing to remove "data" > occurrences of \r. > > The most likely theory about that, I think, is that IPC::Run::run already > translated any \r\n occurrences in the psql command's output to plain \n. > Then, the \r generated by pg_current_logfile() would butt up against the > line-ending \n, allowing the "fix" in sub psql to remove valid data. It's possible. I do see some mangling of that kind in IPC::Run's Win32IO.pm and Win32Pump.pm. Attached for reference is the IPC::Run package I usually use on Windows. > > One thing I noticed while making 91bdf499b is that some of these > substitutions are conditioned on "if $TestLib::windows_os" while others > are conditioned on "if $Config{osname} eq 'msys'". What is the reason > for this difference? Is it possible that we only really need to do it > in the latter case? > > In general I make the condition for such hacks as restrictive as possible. I don't guarantee that I have been perfectly consistent about that, though. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes: > On 7/8/20 10:40 PM, Tom Lane wrote: >> So I did that, and the first report is from bowerbird and it's still >> green. Unless I'm completely misinterpreting what's happening (always >> a possibility), that means we're still managing to remove "data" >> occurrences of \r. >> The most likely theory about that, I think, is that IPC::Run::run already >> translated any \r\n occurrences in the psql command's output to plain \n. >> Then, the \r generated by pg_current_logfile() would butt up against the >> line-ending \n, allowing the "fix" in sub psql to remove valid data. > It's possible. I do see some mangling of that kind in IPC::Run's > Win32IO.pm and Win32Pump.pm. The plot thickens: as of this morning, fairywren and jacana are showing the failure I expected, while drongo and bowerbird are not. (Our other Windows animals are not running the TAP tests, so they're no help here.) It's not hard to believe that the latter two are using a different libc implementation, but how would that affect the behavior of the TAP infrastructure? Are they also using different Perls? (By hypothesis, the pg_current_logfile bug exists across all Windows builds, so we have to explain why the TAP tests only reveal it on some of these animals.) regards, tom lane
On 7/9/20 10:44 AM, Tom Lane wrote: > Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes: >> On 7/8/20 10:40 PM, Tom Lane wrote: >>> So I did that, and the first report is from bowerbird and it's still >>> green. Unless I'm completely misinterpreting what's happening (always >>> a possibility), that means we're still managing to remove "data" >>> occurrences of \r. >>> The most likely theory about that, I think, is that IPC::Run::run already >>> translated any \r\n occurrences in the psql command's output to plain \n. >>> Then, the \r generated by pg_current_logfile() would butt up against the >>> line-ending \n, allowing the "fix" in sub psql to remove valid data. >> It's possible. I do see some mangling of that kind in IPC::Run's >> Win32IO.pm and Win32Pump.pm. > The plot thickens: as of this morning, fairywren and jacana are showing > the failure I expected, while drongo and bowerbird are not. (Our other > Windows animals are not running the TAP tests, so they're no help here.) > > It's not hard to believe that the latter two are using a different libc > implementation, but how would that affect the behavior of the TAP > infrastructure? Are they also using different Perls? (By hypothesis, > the pg_current_logfile bug exists across all Windows builds, so we have > to explain why the TAP tests only reveal it on some of these animals.) > > They should use the same libc implementation (msvcrt.dll). But the perls they are using are indeed different - msys animals have to use msys' perl for TAP tests because native perl doesn't understand msys file paths. Conversely, MSVC animals must use native perl (AS or Strawberry) to run TAP tests. So jacana and fairywren, the two msys animals, are doing what you expect5ed and drongo and bowerbird, the two MSVC animals, are not. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes: > On 7/9/20 10:44 AM, Tom Lane wrote: >> Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes: >>> On 7/8/20 10:40 PM, Tom Lane wrote: >>>> The most likely theory about that, I think, is that IPC::Run::run already >>>> translated any \r\n occurrences in the psql command's output to plain \n. >> It's not hard to believe that the latter two are using a different libc >> implementation, but how would that affect the behavior of the TAP >> infrastructure? Are they also using different Perls? (By hypothesis, >> the pg_current_logfile bug exists across all Windows builds, so we have >> to explain why the TAP tests only reveal it on some of these animals.) > But the perls they are using are indeed different - msys animals have to > use msys' perl for TAP tests because native perl doesn't understand msys > file paths. Conversely, MSVC animals must use native perl (AS or > Strawberry) to run TAP tests. So jacana and fairywren, the two msys > animals, are doing what you expect5ed and drongo and bowerbird, the two > MSVC animals, are not. Ah-hah. So this leads to the conclusion that in native perl, IPC::Run is doing \r\n conversion for us while in msys perl it is not. Therefore, we either should figure out how to get msys perl to do that conversion (and remove it from our code altogether), or make the conversions conditional on "is it msys perl?". I am not quite sure if the existing tests "if $Config{osname} eq 'msys'" are a legitimate implementation of that condition or not --- it seems like nominally they are checking the OS not the Perl, but maybe it's close enough. regards, tom lane
On 7/9/20 11:04 AM, Tom Lane wrote: > Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes: >> On 7/9/20 10:44 AM, Tom Lane wrote: >>> Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes: >>>> On 7/8/20 10:40 PM, Tom Lane wrote: >>>>> The most likely theory about that, I think, is that IPC::Run::run already >>>>> translated any \r\n occurrences in the psql command's output to plain \n. >>> It's not hard to believe that the latter two are using a different libc >>> implementation, but how would that affect the behavior of the TAP >>> infrastructure? Are they also using different Perls? (By hypothesis, >>> the pg_current_logfile bug exists across all Windows builds, so we have >>> to explain why the TAP tests only reveal it on some of these animals.) >> But the perls they are using are indeed different - msys animals have to >> use msys' perl for TAP tests because native perl doesn't understand msys >> file paths. Conversely, MSVC animals must use native perl (AS or >> Strawberry) to run TAP tests. So jacana and fairywren, the two msys >> animals, are doing what you expect5ed and drongo and bowerbird, the two >> MSVC animals, are not. > Ah-hah. So this leads to the conclusion that in native perl, IPC::Run > is doing \r\n conversion for us while in msys perl it is not. > > Therefore, we either should figure out how to get msys perl to do > that conversion (and remove it from our code altogether), or make the > conversions conditional on "is it msys perl?". I am not quite sure > if the existing tests "if $Config{osname} eq 'msys'" are a legitimate > implementation of that condition or not --- it seems like nominally > they are checking the OS not the Perl, but maybe it's close enough. > > If the reported OS is msys (it's a pseudo OS in effect) then the perl must be msys' perl. Even when called from msys, native perl reports the OS as MSWin32. So yes, close enough. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes: > On 7/9/20 11:04 AM, Tom Lane wrote: >> Therefore, we either should figure out how to get msys perl to do >> that conversion (and remove it from our code altogether), or make the >> conversions conditional on "is it msys perl?". I am not quite sure >> if the existing tests "if $Config{osname} eq 'msys'" are a legitimate >> implementation of that condition or not --- it seems like nominally >> they are checking the OS not the Perl, but maybe it's close enough. > If the reported OS is msys (it's a pseudo OS in effect) then the perl > must be msys' perl. Even when called from msys, native perl reports the > OS as MSWin32. So yes, close enough. Cool, I'll go try changing all those conditions to use the msys test. regards, tom lane
I wrote: > Cool, I'll go try changing all those conditions to use the msys test. OK, that worked: all four relevant buildfarm members are now showing the expected test failure. So I'll go fix the original bug. Should we consider back-patching the CRLF filtering changes, ie 91bdf499b + ffb4cee43? It's not really necessary perhaps, but I dislike situations where the "same" test on different branches is testing different things. Seems like a recipe for future surprises. regards, tom lane
On 7/9/20 3:36 PM, Tom Lane wrote: > I wrote: >> Cool, I'll go try changing all those conditions to use the msys test. > OK, that worked: all four relevant buildfarm members are now showing > the expected test failure. So I'll go fix the original bug. > > Should we consider back-patching the CRLF filtering changes, ie > 91bdf499b + ffb4cee43? It's not really necessary perhaps, but > I dislike situations where the "same" test on different branches is > testing different things. Seems like a recipe for future surprises. Yes please. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes: > On 7/9/20 3:36 PM, Tom Lane wrote: >> Should we consider back-patching the CRLF filtering changes, ie >> 91bdf499b + ffb4cee43? It's not really necessary perhaps, but >> I dislike situations where the "same" test on different branches is >> testing different things. Seems like a recipe for future surprises. > Yes please. Done. regards, tom lane