Thread: pgsql: Re-enable TAP tests of pg_receivewal for ZLIB on Windows

pgsql: Re-enable TAP tests of pg_receivewal for ZLIB on Windows

From
Michael Paquier
Date:
Re-enable TAP tests of pg_receivewal for ZLIB on Windows

This is a revert of 6cea447, that disabled those tests temporarily on
Windows due to failures with bowerbird where gzflush() would fail when
executed on a freshly-opened compressed and partial segment.  This
problem should be taken care of now thanks to 7fbe0c8, so let's see what
the buildfarm has to say on Windows for those tests.
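
For readers unfamiliar with the call sequence at issue, here is a rough sketch of "flush a freshly opened compressed file before anything has been written to it". It is an analogy only: Python's gzip module stands in for zlib's C gz* API, the function name and the file name are made up for illustration, and it does not reproduce the Windows failure.

```python
import gzip
import os
import tempfile
import zlib


def write_partial_segment(path: str, payload: bytes) -> bytes:
    """Open a gzip file, flush before any data is written, then write.

    Returns the decompressed contents read back from disk.
    """
    with gzip.open(path, "wb", compresslevel=1) as gz:
        # Flush right after opening, while the stream is still empty.
        gz.flush(zlib.Z_SYNC_FLUSH)
        gz.write(payload)
        # Flush again once there is real data in the stream.
        gz.flush(zlib.Z_SYNC_FLUSH)
    with gzip.open(path, "rb") as gz:
        return gz.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Illustrative name only; it merely resembles a partial WAL segment.
        segment = os.path.join(tmp, "000000010000000000000002.gz.partial")
        data = b"\x00" * 8192
        assert write_partial_segment(segment, data) == data
```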

Discussion: https://postgr.es/m/YPDLz2x3o1aX2wRh@paquier.xyz

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/91d395f47aa92849b2556b1a4d6bc1ff34121a30

Modified Files
--------------
src/bin/pg_basebackup/t/020_pg_receivewal.pl | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)


Re: pgsql: Re-enable TAP tests of pg_receivewal for ZLIB on Windows

From
Andrew Dunstan
Date:
On 7/19/21 11:23 PM, Michael Paquier wrote:
> Re-enable TAP tests of pg_receivewal for ZLIB on Windows
>
> This is a revert of 6cea447, that disabled those tests temporarily on
> Windows due to failures with bowerbird where gzflush() would fail when
> executed on a freshly-opened compressed and partial segment.  This
> problem should be taken care of now thanks to 7fbe0c8, so let's see what
> the buildfarm has to say on Windows for those tests.
>

fairywren doesn't seem to like this.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: pgsql: Re-enable TAP tests of pg_receivewal for ZLIB on Windows

From
Michael Paquier
Date:
On Tue, Jul 20, 2021 at 05:33:52PM -0400, Andrew Dunstan wrote:
> fairywren doesn't seem to like this.

Indeed, it doesn't.  The logs don't offer much, and I am not sure if
this is a problem that happens when opening the first segment, or if
that's the first write.  The starting position is good:
pg_receivewal: starting log streaming at 0/2000000 (timeline 1)

fairywren did not run the first round of those tests in ffc9dda,
perhaps that's a different issue than what bowerbird has reported.

The backend logs don't tell much either, but that certainly looks like
a crash:
2021-07-20 15:33:30.684 UTC [59784:13] 020_pg_receivewal.pl STATEMENT:
START_REPLICATION 0/2000000 TIMELINE 1
2021-07-20 15:33:30.793 UTC [59784:14] 020_pg_receivewal.pl LOG:
could not receive data from client: An existing connection was
forcibly closed by the remote host.
2021-07-20 15:33:30.793 UTC [59784:15] 020_pg_receivewal.pl STATEMENT:
START_REPLICATION 0/2000000 TIMELINE 1
2021-07-20 15:33:30.793 UTC [59784:16] 020_pg_receivewal.pl LOG:
unexpected EOF on standby connection

Could it be possible to get a backtrace?  I won't leave this enabled
much longer, just waiting for bowerbird to tell more.
--
Michael


Re: pgsql: Re-enable TAP tests of pg_receivewal for ZLIB on Windows

From
Andrew Dunstan
Date:
On 7/20/21 7:43 PM, Michael Paquier wrote:
> On Tue, Jul 20, 2021 at 05:33:52PM -0400, Andrew Dunstan wrote:
>> fairywren doesn't seem to like this.
> Indeed, it doesn't.  The logs don't offer much, and I am not sure if
> this is a problem that happens when opening the first segment, or if
> that's the first write.  The starting position is good:
> pg_receivewal: starting log streaming at 0/2000000 (timeline 1)
>
> fairywren did not run the first round of those tests in ffc9dda,
> perhaps that's a different issue than what bowerbird has reported.
>
> The backend logs don't tell much either, but that certainly looks like
> a crash:
> 2021-07-20 15:33:30.684 UTC [59784:13] 020_pg_receivewal.pl STATEMENT:
> START_REPLICATION 0/2000000 TIMELINE 1
> 2021-07-20 15:33:30.793 UTC [59784:14] 020_pg_receivewal.pl LOG:
> could not receive data from client: An existing connection was
> forcibly closed by the remote host.
> 2021-07-20 15:33:30.793 UTC [59784:15] 020_pg_receivewal.pl STATEMENT:
> START_REPLICATION 0/2000000 TIMELINE 1
> 2021-07-20 15:33:30.793 UTC [59784:16] 020_pg_receivewal.pl LOG:
> unexpected EOF on standby connection
>
> Could it be possible to get a backtrace?  I won't leave this enabled
> much longer, just waiting for bowerbird to tell more.


drongo, an MSVC animal running on the same machine, doesn't seem to have
an issue, nor jacana which runs msys1, so this seems possibly
msys2-specific. I'll investigate on a similar instance I have.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: pgsql: Re-enable TAP tests of pg_receivewal for ZLIB on Windows

From
Michael Paquier
Date:
On Wed, Jul 21, 2021 at 10:05:19AM -0400, Andrew Dunstan wrote:
> drongo, an MSVC animal running on the same machine, doesn't seem to have
> an issue, nor jacana which runs msys1, so this seems possibly
> msys2-specific. I'll investigate on a similar instance I have.

How is bowerbird doing?  I was waiting for it to send an update before
doing anything but there is no report yet.  Is it getting stuck?  I
would feel honestly less sad about those tests if this proves to
require temporarily a $is_msys2 rather than a $windows_os.
--
Michael


Re: pgsql: Re-enable TAP tests of pg_receivewal for ZLIB on Windows

From
Andrew Dunstan
Date:
On 7/21/21 8:00 PM, Michael Paquier wrote:
> On Wed, Jul 21, 2021 at 10:05:19AM -0400, Andrew Dunstan wrote:
>> drongo, an MSVC animal running on the same machine, doesn't seem to have
>> an issue, nor jacana which runs msys1, so this seems possibly
>> msys2-specific. I'll investigate on a similar instance I have.
> How is bowerbird doing?  I was waiting for it to send an update before
> doing anything but there is no report yet.  Is it getting stuck?  I
> would feel honestly less sad about those tests if this proves to
> require temporarily a $is_msys2 rather than a $windows_os.



Yeah, bowerbird got a SIGBREAK that caused the whole buildfarm run to
die, so it's even worse than fairywren.


Let's skip for $windows_os and do some more thorough investigation.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: pgsql: Re-enable TAP tests of pg_receivewal for ZLIB on Windows

From
Michael Paquier
Date:
On Wed, Jul 21, 2021 at 09:08:28PM -0400, Andrew Dunstan wrote:
> Yeah, bowerbird got a SIGBREAK that caused the whole buildfarm run to
> die, so it's even worse than fairywren.

That's impressive.  I went through the code of zlib and its
documentation, and I don't see anything wrong in what we are doing
with those APIs, even for Windows.  It is a bit strange that other
members with very similar environment characteristics are able to run
the tests.

> Let's skip for $windows_os and do some more thorough investigation.

Okay, done for now.  There is enough data gathered.
--
Michael


Re: pgsql: Re-enable TAP tests of pg_receivewal for ZLIB on Windows

From
Andrew Dunstan
Date:
On 7/21/21 9:08 PM, Andrew Dunstan wrote:
> On 7/21/21 8:00 PM, Michael Paquier wrote:
>> On Wed, Jul 21, 2021 at 10:05:19AM -0400, Andrew Dunstan wrote:
>>> drongo, an MSVC animal running on the same machine, doesn't seem to have
>>> an issue, nor jacana which runs msys1, so this seems possibly
>>> msys2-specific. I'll investigate on a similar instance I have.
>> How is bowerbird doing?  I was waiting for it to send an update before
>> doing anything but there is no report yet.  Is it getting stuck?  I
>> would feel honestly less sad about those tests if this proves to
>> require temporarily a $is_msys2 rather than a $windows_os.
>
>
> Yeah, bowerbird got a SIGBREAK that caused the whole buildfarm run to
> die, so it's even worse than fairywren.
>
>
> Let's skip for $windows_os and do some more thorough investigation.
>
>


I have got to the bottom of the issue on fairywren - it was caused by
using a zlib I had built (and which is used on drongo) rather than the
mingw64 zlib package. So I've switched that.


While doing that I uncovered some more things that need to be fixed for
portability both in TestLib and the pg_basebackup tests.


Meanwhile I will now go and investigate what's happening with bowerbird.


cheers


andrew


-- 

Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: pgsql: Re-enable TAP tests of pg_receivewal for ZLIB on Windows

From
Michael Paquier
Date:
On Tue, Jul 27, 2021 at 02:24:12PM -0400, Andrew Dunstan wrote:
> I have got to the bottom of the issue on fairywren - it was caused by
> using a zlib I had built (and which is used on drongo) rather than the
> mingw64 zlib package. So I've switched that.

Thanks!

> While doing that I uncovered some more things that need to be fixed for
> portability both in TestLib and the pg_basebackup tests,

What did you uncover here?
--
Michael


Re: pgsql: Re-enable TAP tests of pg_receivewal for ZLIB on Windows

From
Andrew Dunstan
Date:
On 7/27/21 2:24 PM, Andrew Dunstan wrote:
> On 7/21/21 9:08 PM, Andrew Dunstan wrote:
>> On 7/21/21 8:00 PM, Michael Paquier wrote:
>>> On Wed, Jul 21, 2021 at 10:05:19AM -0400, Andrew Dunstan wrote:
>>>> drongo, an MSVC animal running on the same machine, doesn't seem to have
>>>> an issue, nor jacana which runs msys1, so this seems possibly
>>>> msys2-specific. I'll investigate on a similar instance I have.
>>> How is bowerbird doing?  I was waiting for it to send an update before
>>> doing anything but there is no report yet.  Is it getting stuck?  I
>>> would feel honestly less sad about those tests if this proves to
>>> require temporarily a $is_msys2 rather than a $windows_os.
>>
>> Yeah, bowerbird got a SIGBREAK that caused the whole buildfarm run to
>> die, so it's even worse than fairywren.
>>
>>
>> Let's skip for $windows_os and do some more thorough investigation.
>>
>>
>
> I have got to the bottom of the issue on fairywren - it was caused by
> using a zlib I had built (and which is used on drongo) rather than the
> mingw64 zlib package. So I've switched that.
>
>
> While doing that I uncovered some more things that need to be fixed for
> portability both in TestLib and the pg_basebackup tests,
>
>
> Meanwhile I will now go and investigate what's happening with bowerbird.



It gets stuck in a loop like this:


ok 19 - one partial WAL segment was created
0/2001968
# Running: pg_receivewal -D H:/prog/bf/root/HEAD/pgsql.build/src/bin/pg_basebackup/tmp_check/t_020_pg_receivewal_primary_data/archive_wal --verbose --endpos 0/3000028 --compress 1
pg_receivewal: starting log streaming at 0/2000000 (timeline 1)
pg_receivewal: error: could not write 131072 bytes to WAL file "000000010000000000000002": Permission denied
pg_receivewal: error: could not close file "000000010000000000000002": Permission denied
pg_receivewal: disconnected; waiting 5 seconds to try again
pg_receivewal: starting log streaming at 0/2000000 (timeline 1)
pg_receivewal: error: could not write 131072 bytes to WAL file "000000010000000000000002": Permission denied
pg_receivewal: error: could not close file "000000010000000000000002": Permission denied
pg_receivewal: disconnected; waiting 5 seconds to try again
pg_receivewal: starting log streaming at 0/2000000 (timeline 1)
pg_receivewal: error: could not write 131072 bytes to WAL file "000000010000000000000002": Permission denied
pg_receivewal: error: could not close file "000000010000000000000002": Permission denied
pg_receivewal: disconnected; waiting 5 seconds to try again
pg_receivewal: starting log streaming at 0/2000000 (timeline 1)
pg_receivewal: error: could not write 131072 bytes to WAL file "000000010000000000000002": Permission denied
pg_receivewal: error: could not close file "000000010000000000000002": Permission denied
pg_receivewal: disconnected; waiting 5 seconds to try again
pg_receivewal: starting log streaming at 0/2000000 (timeline 1)
pg_receivewal: error: could not write 131072 bytes to WAL file "000000010000000000000002": Permission denied
pg_receivewal: error: could not close file "000000010000000000000002": Permission denied
pg_receivewal: disconnected; waiting 5 seconds to try again


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: pgsql: Re-enable TAP tests of pg_receivewal for ZLIB on Windows

From
Michael Paquier
Date:
On Thu, Jul 29, 2021 at 01:26:22PM -0400, Andrew Dunstan wrote:
> It gets stuck in a loop like this:
>
> ok 19 - one partial WAL segment was created
> 0/2001968
> # Running: pg_receivewal -D H:/prog/bf/root/HEAD/pgsql.build/src/bin/pg_basebackup/tmp_check/t_020_pg_receivewal_primary_data/archive_wal --verbose --endpos 0/3000028 --compress 1
> pg_receivewal: starting log streaming at 0/2000000 (timeline 1)
> pg_receivewal: error: could not write 131072 bytes to WAL file "000000010000000000000002": Permission denied
> pg_receivewal: error: could not close file "000000010000000000000002": Permission denied
> pg_receivewal: disconnected; waiting 5 seconds to try again

Hmm.  This error is strange.  Still, it is an oversight of ffc9ddae to
not use --no-loop here for the two new commands of pg_receivewal
--endpos.  Please see the attached.
--
Michael


Re: pgsql: Re-enable TAP tests of pg_receivewal for ZLIB on Windows

From
Andrew Dunstan
Date:
On 7/29/21 8:29 PM, Michael Paquier wrote:
> On Thu, Jul 29, 2021 at 01:26:22PM -0400, Andrew Dunstan wrote:
>> It gets stuck in a loop like this:
>>
>> ok 19 - one partial WAL segment was created
>> 0/2001968
>> # Running: pg_receivewal -D H:/prog/bf/root/HEAD/pgsql.build/src/bin/pg_basebackup/tmp_check/t_020_pg_receivewal_primary_data/archive_wal --verbose --endpos 0/3000028 --compress 1
>> pg_receivewal: starting log streaming at 0/2000000 (timeline 1)
>> pg_receivewal: error: could not write 131072 bytes to WAL file "000000010000000000000002": Permission denied
>> pg_receivewal: error: could not close file "000000010000000000000002": Permission denied
>> pg_receivewal: disconnected; waiting 5 seconds to try again
> Hmm.  This error is strange.  Still, it is an oversight of ffc9ddae to
> not use --no-loop here for the two new commands of pg_receivewal
> --endpos.  Please see the attached.



Well that unsticks the test, so now it fails in an ordinary way, so +1
for doing that. I must say that --no-loop seems a little ill-designed.
Effectively we have to choose between zero loops and infinite loops. 
But that seems to be a matter of ancient (or at least medieval) history.


I will see if a more modern zlib helps matters.
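
When the suspicion falls on which zlib a binary actually picked up, one quick sanity check is to compare the version a program was built against with the version loaded at run time. The sketch below does this for the Python interpreter's own zlib binding, as an analogy to comparing ZLIB_VERSION with zlibVersion() in C; the helper names are made up, and it says nothing about the zlib that pg_receivewal itself links against.

```python
import zlib


def zlib_versions() -> dict:
    """Report the zlib version compiled against and the one loaded at run time."""
    return {
        "compiled": zlib.ZLIB_VERSION,
        "runtime": zlib.ZLIB_RUNTIME_VERSION,
    }


def versions_match(versions: dict) -> bool:
    """True when the build-time and run-time zlib versions are identical."""
    return versions["compiled"] == versions["runtime"]


if __name__ == "__main__":
    found = zlib_versions()
    print(f"compiled against zlib {found['compiled']}, running with {found['runtime']}")
    if not versions_match(found):
        print("warning: build-time and run-time zlib differ")
```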


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: pgsql: Re-enable TAP tests of pg_receivewal for ZLIB on Windows

From
Michael Paquier
Date:
On Fri, Jul 30, 2021 at 07:42:45AM -0400, Andrew Dunstan wrote:
> Well that unsticks the test, so now it fails in an ordinary way, so +1
> for doing that. I must say that --no-loop seems a little ill-designed.
> Effectively we have to choose between zero loops and infinite loops. 
> But that seems to be a matter of ancient (or at least medieval) history.

Okay, done.  We could add an option that defines a number of failures,
though I have yet to see a use for that in the field.
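
To make the trade-off concrete, here is a small sketch of the three retry policies being discussed: give up on the first failure (what --no-loop does), retry forever (the default seen in the log above, with a 5-second wait), and a bounded number of failures. The max_failures parameter is purely hypothetical and corresponds to no existing pg_receivewal option; the function and its arguments are invented for illustration.

```python
import time
from typing import Callable, Optional

RECONNECT_SLEEP_SECONDS = 5.0  # matches "waiting 5 seconds to try again"


def stream_with_retries(
    attempt: Callable[[], bool],
    no_loop: bool = False,
    max_failures: Optional[int] = None,  # hypothetical, not a real option
    wait: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once an attempt succeeds, False once retries are exhausted."""
    failures = 0
    while True:
        if attempt():
            return True
        failures += 1
        if no_loop:
            # Zero loops: the first failure is final.
            return False
        if max_failures is not None and failures >= max_failures:
            # Bounded loops: stop after the configured number of failures.
            return False
        # Otherwise keep looping, possibly forever.
        wait(RECONNECT_SLEEP_SECONDS)


if __name__ == "__main__":
    def always_fails() -> bool:
        return False

    def skip_wait(seconds: float) -> None:
        pass  # avoid real sleeping in this demonstration

    assert stream_with_retries(always_fails, no_loop=True) is False
    assert stream_with_retries(always_fails, max_failures=3, wait=skip_wait) is False
```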

> I will see if a more modern zlib helps matters.

Thanks!
--
Michael


Re: pgsql: Re-enable TAP tests of pg_receivewal for ZLIB on Windows

From
Andrew Dunstan
Date:
On 7/30/21 8:40 AM, Michael Paquier wrote:
>
>> I will see if a more modern zlib helps matters.


It's working. You can re-re-enable the tests.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: pgsql: Re-enable TAP tests of pg_receivewal for ZLIB on Windows

From
Michael Paquier
Date:
On Fri, Jul 30, 2021 at 02:03:32PM -0400, Andrew Dunstan wrote:
> It's working. You can re-re-enable the tests.

Thanks!  I have just re-enabled the tests.  I wasn't really useful
here at the end :)
--
Michael
