Thread: pgsql: Re-enable TAP tests of pg_receivewal for ZLIB on Windows
Re-enable TAP tests of pg_receivewal for ZLIB on Windows

This is a revert of 6cea447, which temporarily disabled those tests on Windows due to failures on bowerbird, where gzflush() would fail when executed on a freshly-opened compressed and partial segment. This problem should be taken care of now thanks to 7fbe0c8, so let's see what the buildfarm has to say on Windows for those tests.

Discussion: https://postgr.es/m/YPDLz2x3o1aX2wRh@paquier.xyz

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/91d395f47aa92849b2556b1a4d6bc1ff34121a30

Modified Files
--------------
src/bin/pg_basebackup/t/020_pg_receivewal.pl | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)
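The commit message turns on calling gzflush() against a freshly-opened, compressed, partial segment. As a rough illustration only, the sketch below uses Python's gzip module (a wrapper around zlib, not pg_receivewal's C code or zlib's gz* API) to show what a sync flush on a still-open gzip file provides: everything flushed so far can be decompressed from disk even though the gzip trailer has not been written yet. The file name and the function name readable_after_sync_flush are hypothetical.

```python
import gzip
import os
import tempfile
import zlib
from typing import Iterable


def readable_after_sync_flush(chunks: Iterable[bytes]) -> bytes:
    """Write chunks to a still-open gzip file and return what a reader
    can already decompress from disk, before the stream is closed."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical name modelled on a compressed partial WAL segment.
        path = os.path.join(tmp, "000000010000000000000002.gz.partial")
        # Level 1, matching the --compress 1 used by the TAP test.
        gz = gzip.open(path, "wb", compresslevel=1)
        try:
            # Flush a freshly-opened stream before any data is written,
            # the situation the commit message describes for gzflush().
            gz.flush(zlib_mode=zlib.Z_SYNC_FLUSH)
            for chunk in chunks:
                gz.write(chunk)
                # Z_SYNC_FLUSH emits all pending compressed output on a byte
                # boundary without ending the gzip member.
                gz.flush(zlib_mode=zlib.Z_SYNC_FLUSH)
            with open(path, "rb") as raw:
                on_disk = raw.read()
        finally:
            gz.close()
    # wbits=31 tells zlib to expect a gzip header; the trailer is still
    # missing, so decompress() returns only the data flushed so far.
    return zlib.decompressobj(wbits=31).decompress(on_disk)


if __name__ == "__main__":
    print(readable_after_sync_flush([b"first record ", b"second record"]))
```

Z_SYNC_FLUSH is already the default mode of Python's GzipFile.flush(); it is passed explicitly here only to make the flush mode visible.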
On 7/19/21 11:23 PM, Michael Paquier wrote: > Re-enable TAP tests of pg_receivewal for ZLIB on Windows > > This is a revert of 6cea447, that disabled those tests temporarily on > Windows due to failures with bowerbird where gzflush() would fail when > executed on a freshly-opened compressed and partial segment. This > problem should be taken care of now thanks to 7fbe0c8, so let's see what > the buildfarm has to say on Windows for those tests. > fairywren doesn't seem to like this. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
On Tue, Jul 20, 2021 at 05:33:52PM -0400, Andrew Dunstan wrote:
> fairywren doesn't seem to like this.

Indeed, it doesn't. The logs don't offer much, and I am not sure whether the problem happens when opening the first segment or at the first write. The starting position is good:

pg_receivewal: starting log streaming at 0/2000000 (timeline 1)

fairywren did not run the first round of those tests in ffc9dda, so perhaps this is a different issue from the one bowerbird reported. The backend logs don't tell much either, but that certainly looks like a crash:

2021-07-20 15:33:30.684 UTC [59784:13] 020_pg_receivewal.pl STATEMENT: START_REPLICATION 0/2000000 TIMELINE 1
2021-07-20 15:33:30.793 UTC [59784:14] 020_pg_receivewal.pl LOG: could not receive data from client: An existing connection was forcibly closed by the remote host.
2021-07-20 15:33:30.793 UTC [59784:15] 020_pg_receivewal.pl STATEMENT: START_REPLICATION 0/2000000 TIMELINE 1
2021-07-20 15:33:30.793 UTC [59784:16] 020_pg_receivewal.pl LOG: unexpected EOF on standby connection

Would it be possible to get a backtrace? I won't leave this enabled much longer; I'm just waiting for bowerbird to tell us more.
-- Michael
On 7/20/21 7:43 PM, Michael Paquier wrote: > On Tue, Jul 20, 2021 at 05:33:52PM -0400, Andrew Dunstan wrote: >> fairywren doesn't seem to like this. > Indeed, it doesn't. The logs don't offer much, and I am not sure if > this is a problem that happens when opening the first segment, or if > that's the first write. The starting position is good: > pg_receivewal: starting log streaming at 0/2000000 (timeline 1) > > fairywren did not run the first round of those tests in ffc9dda, > perhaps that's a different issue than what bowerbird has reported. > > The backend logs don't tell much either, but that certainly looks like > a crash: > 2021-07-20 15:33:30.684 UTC [59784:13] 020_pg_receivewal.pl STATEMENT: > START_REPLICATION 0/2000000 TIMELINE 1 > 2021-07-20 15:33:30.793 UTC [59784:14] 020_pg_receivewal.pl LOG: > could not receive data from client: An existing connection was > forcibly closed by the remote host. > 2021-07-20 15:33:30.793 UTC [59784:15] 020_pg_receivewal.pl STATEMENT: > START_REPLICATION 0/2000000 TIMELINE 1 > 2021-07-20 15:33:30.793 UTC [59784:16] 020_pg_receivewal.pl LOG: > unexpected EOF on standby connection > > Could it be possible to get a backtrace? I won't let this enabled > much longer, just waiting for bowerbird to tell more. Neither drongo, an MSVC animal running on the same machine, nor jacana, which runs msys1, seems to have an issue, so this may be msys2-specific. I'll investigate on a similar instance I have. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
On Wed, Jul 21, 2021 at 10:05:19AM -0400, Andrew Dunstan wrote: > drongo, an MSVC animal running on the same machine, doesn't seem to have > an issue, nor jacana which runs msys1, so this seems possibly > msys2-specific. I'll investigate on a similar instance I have. How is bowerbird doing? I was waiting for it to send an update before doing anything, but there is no report yet. Is it getting stuck? Honestly, I would feel less sad about those tests if this turns out to require a temporary $is_msys2 check rather than a $windows_os one. -- Michael
On 7/21/21 8:00 PM, Michael Paquier wrote: > On Wed, Jul 21, 2021 at 10:05:19AM -0400, Andrew Dunstan wrote: >> drongo, an MSVC animal running on the same machine, doesn't seem to have >> an issue, nor jacana which runs msys1, so this seems possibly >> msys2-specific. I'll investigate on a similar instance I have. > How is doing bowerbird? I was waiting for it to send an update before > doing anything but there is no report yet. Is it getting stuck? I > would feel honestly less sad about those tests if this proves to > require temporarily a $is_msys2 rather than a $windows_os. Yeah, bowerbird got a SIGBREAK that caused the whole buildfarm run to die, so it's even worse than fairywren. Let's skip for $windows_os and do some more thorough investigation. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
On Wed, Jul 21, 2021 at 09:08:28PM -0400, Andrew Dunstan wrote: > Yeah, bowerbird got a SIGBREAK that caused the whole buildfarm run to > die, so it's even worse than fairywren. That's impressive. I went through the zlib code and its documentation, and I don't see anything wrong in how we use those APIs, even on Windows. That other members with very similar environments are able to run the tests is a bit strange. > Let's skip for $windows_os and do some more thorough investigation. Okay, done for now. We have gathered enough data. -- Michael
On 7/21/21 9:08 PM, Andrew Dunstan wrote: > Let's skip for $windows_os and do some more thorough investigation. I have got to the bottom of the issue on fairywren: it was caused by using a zlib I had built myself (and which is used on drongo) rather than the mingw64 zlib package. So I've switched that. While doing that I uncovered some more things that need to be fixed for portability, both in TestLib and in the pg_basebackup tests. Meanwhile, I will now go and investigate what's happening with bowerbird. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
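Since fairywren's failure came down to which zlib build was being picked up, a quick look at what a process is actually linked against can help narrow things down. In C, zlib exposes the header version as the ZLIB_VERSION macro and the loaded library's version through zlibVersion(), and zlib.h suggests comparing the two (a differing first character means the library is incompatible with the header). The Python sketch below does the analogous comparison with zlib.ZLIB_VERSION and zlib.ZLIB_RUNTIME_VERSION. The helper names and the major/minor comparison rule are assumptions for illustration, not anything PostgreSQL or the buildfarm does, and such a check only reveals a version difference, not two different builds of the same version.

```python
import zlib


def parse_version(text: str) -> tuple:
    """Turn "1.2.11" or "1.2.11.1-motley" into a tuple of ints."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at suffixes such as "-motley"
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def zlib_build_report() -> dict:
    """Compare the zlib headers this interpreter was built against with
    the zlib library actually loaded at run time."""
    compiled = zlib.ZLIB_VERSION
    runtime = zlib.ZLIB_RUNTIME_VERSION
    return {
        "compiled": compiled,
        "runtime": runtime,
        # Assumed rule: a major/minor mismatch suggests a different zlib
        # was picked up at run time than the one used at build time.
        "same_major_minor": parse_version(compiled)[:2] == parse_version(runtime)[:2],
    }


if __name__ == "__main__":
    print(zlib_build_report())
```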
On Tue, Jul 27, 2021 at 02:24:12PM -0400, Andrew Dunstan wrote: > I have got to the bottom of the issue on fairywren - it was caused by > using a zlib I had build (and which is used on drongo) rather than the > mingw64 zlib package. So I've switched that. Thanks! > While doing that I uncovered some more things that need to be fixed for > portability both in TestLib and the pg_basebackup tests, What did you uncover here? -- Michael
On 7/27/21 2:24 PM, Andrew Dunstan wrote: > Meanwhile I will now go and investigate what's happening with bowerbird.
It gets stuck in a loop like this:

ok 19 - one partial WAL segment was created
0/2001968
# Running: pg_receivewal -D H:/prog/bf/root/HEAD/pgsql.build/src/bin/pg_basebackup/tmp_check/t_020_pg_receivewal_primary_data/archive_wal --verbose --endpos 0/3000028 --compress 1
pg_receivewal: starting log streaming at 0/2000000 (timeline 1)
pg_receivewal: error: could not write 131072 bytes to WAL file "000000010000000000000002": Permission denied
pg_receivewal: error: could not close file "000000010000000000000002": Permission denied
pg_receivewal: disconnected; waiting 5 seconds to try again
pg_receivewal: starting log streaming at 0/2000000 (timeline 1)
pg_receivewal: error: could not write 131072 bytes to WAL file "000000010000000000000002": Permission denied
pg_receivewal: error: could not close file "000000010000000000000002": Permission denied
pg_receivewal: disconnected; waiting 5 seconds to try again
pg_receivewal: starting log streaming at 0/2000000 (timeline 1)
pg_receivewal: error: could not write 131072 bytes to WAL file "000000010000000000000002": Permission denied
pg_receivewal: error: could not close file "000000010000000000000002": Permission denied
pg_receivewal: disconnected; waiting 5 seconds to try again
pg_receivewal: starting log streaming at 0/2000000 (timeline 1)
pg_receivewal: error: could not write 131072 bytes to WAL file "000000010000000000000002": Permission denied
pg_receivewal: error: could not close file "000000010000000000000002": Permission denied
pg_receivewal: disconnected; waiting 5 seconds to try again
pg_receivewal: starting log streaming at 0/2000000 (timeline 1)
pg_receivewal: error: could not write 131072 bytes to WAL file "000000010000000000000002": Permission denied
pg_receivewal: error: could not close file "000000010000000000000002": Permission denied
pg_receivewal: disconnected; waiting 5 seconds to try again

cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
On Thu, Jul 29, 2021 at 01:26:22PM -0400, Andrew Dunstan wrote: > It gets stuck in a loop like this: > > ok 19 - one partial WAL segment was created > 0/2001968 > # Running: pg_receivewal -D H:/prog/bf/root/HEAD/pgsql.build/src/bin/pg_basebackup/tmp_check/t_020_pg_receivewal_primary_data/archive_wal --verbose --endpos 0/3000028 --compress 1 > pg_receivewal: starting log streaming at 0/2000000 (timeline 1) > pg_receivewal: error: could not write 131072 bytes to WAL file "000000010000000000000002": Permission denied > pg_receivewal: error: could not close file "000000010000000000000002": Permission denied > pg_receivewal: disconnected; waiting 5 seconds to try again Hmm. This error is strange. Still, it was an oversight in ffc9ddae not to use --no-loop for the two new pg_receivewal --endpos commands. Please see the attached. -- Michael
On 7/29/21 8:29 PM, Michael Paquier wrote: > On Thu, Jul 29, 2021 at 01:26:22PM -0400, Andrew Dunstan wrote: >> It gets stuck in a loop like this: >> >> ok 19 - one partial WAL segment was created >> 0/2001968 >> # Running: pg_receivewal -D H:/prog/bf/root/HEAD/pgsql.build/src/bin/pg_basebackup/tmp_check/t_020_pg_receivewal_primary_data/archive_wal --verbose --endpos 0/3000028 --compress 1 >> pg_receivewal: starting log streaming at 0/2000000 (timeline 1) >> pg_receivewal: error: could not write 131072 bytes to WAL file "000000010000000000000002": Permission denied >> pg_receivewal: error: could not close file "000000010000000000000002": Permission denied >> pg_receivewal: disconnected; waiting 5 seconds to try again > Hmm. This error is strange. Still, it is an oversight of ffc9ddae to > not use --no-loop here for the two new commands of pg_receivewal > --endpos. Please see the attached. Well, that unsticks the test, so now it fails in an ordinary way. +1 for doing that. I must say that --no-loop seems a little ill-designed: effectively we have to choose between zero loops and infinite loops. But that seems to be a matter of ancient (or at least medieval) history. I will see if a more modern zlib helps matters. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
On Fri, Jul 30, 2021 at 07:42:45AM -0400, Andrew Dunstan wrote: > Well that unsticks the test, so now it fails in an ordinary way, so +1 > for doing that. I must say that --no-loop seems a little ill-designed. > Effectively we have to choose between zero loops and infinite loops. > But that seems to be a matter of ancient (or at least medieval) history. Okay, done. We could add an option that defines a number of failures, though I have yet to see a use for that in the field. > I will see if a more modern zlib helps matters. Thanks! -- Michael
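The --no-loop trade-off discussed here (retry forever by default, or exit on the first connection error with --no-loop), together with the idea of an option that bounds the number of failures, can be modelled as a simple retry loop. This is a hypothetical Python sketch: pg_receivewal has no bounded-retry option, and stream_with_retries, stream_once and max_retries are names made up for illustration.

```python
import time
from typing import Callable, Optional


def stream_with_retries(
    stream_once: Callable[[], bool],
    max_retries: Optional[int],
    delay_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call stream_once() until it succeeds; return the number of failures.

    max_retries=None models the default behaviour (retry forever),
    max_retries=0 models --no-loop (give up on the first failure),
    and a positive value is the hypothetical bounded middle ground.
    """
    failures = 0
    while True:
        if stream_once():
            return failures
        failures += 1
        if max_retries is not None and failures > max_retries:
            raise RuntimeError(f"giving up after {failures} failed attempt(s)")
        # Mirrors "disconnected; waiting 5 seconds to try again".
        sleep(delay_seconds)
```

Injecting sleep keeps the model testable. With max_retries=None, a stream_once that always fails never returns, which is how the bowerbird run got stuck on the repeated "Permission denied" errors.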
On 7/30/21 8:40 AM, Michael Paquier wrote: > >> I will see if a more modern zlib helps matters. It's working. You can re-re-enable the tests. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
On Fri, Jul 30, 2021 at 02:03:32PM -0400, Andrew Dunstan wrote: > It's working. You can re-re-enable the tests. Thanks! I have just re-enabled the tests. I wasn't really useful here at the end :) -- Michael