Re: Why is src/test/modules/committs/t/002_standby.pl flaky? - Mailing list pgsql-hackers

From Alexander Lakhin
Subject Re: Why is src/test/modules/committs/t/002_standby.pl flaky?
Date
Msg-id d6633d6c-e547-52d1-0798-a7848583b501@gmail.com
Whole thread Raw
In response to Re: Why is src/test/modules/committs/t/002_standby.pl flaky?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Why is src/test/modules/committs/t/002_standby.pl flaky?
List pgsql-hackers
> 11.01.2022 23:16, Tom Lane wrote:
>> Thomas Munro <thomas.munro@gmail.com> writes:
>>> Ouch.  I think our options at this point are:
>>> 1.  Revert 6051857fc (and put it back when we have a working
>>> long-lived WES as I showed).  This is not very satisfying, now that we
>>> understand the bug, because even without that change I guess you must
>>> be able to reach the hanging condition by using Windows postgres_fdw
>>> to talk to a non-Windows server (ie a normal TCP stack with graceful
>>> shutdown/linger on process exit).
>> It'd be worth checking, perhaps.  One thing I've been wondering all
>> along is how much of this behavior is specific to the local-loopback
>> case where Windows can see both ends of the connection.  You'd think
>> that they couldn't long get away with such blatant violations of the
>> TCP specs when talking to external servers, because the failures
>> would be visible to everyone with a web browser.
I've split my test (both parts attached) and run it on two virtual
machines with clean builds from master (ac7c8075) on both (just the
debugging output added to connection.c). I provide probably redundant
info (also see attached screenshot) just to make sure that I didn't make
a mistake.
The excerpt from 001_disconnection1_local.log:
...
2022-01-12 09:29:48.099 MSK|Administrator|postgres|61de755a.a54|LOG: 
pgfdw_get_result: before WaitLatchOrSocket
2022-01-12 09:29:48.099
MSK|Administrator|postgres|61de755a.a54|STATEMENT:  SELECT * FROM large
WHERE a = fx2(a)
2022-01-12 09:29:48.100 MSK|Administrator|postgres|61de755a.a54|LOG: 
pgfdw_get_result: after WaitLatchOrSocket
2022-01-12 09:29:48.100
MSK|Administrator|postgres|61de755a.a54|STATEMENT:  SELECT * FROM large
WHERE a = fx2(a)
2022-01-12 09:29:48.100 MSK|Administrator|postgres|61de755a.a54|LOG: 
pgfdw_get_result: before WaitLatchOrSocket
2022-01-12 09:29:48.100
MSK|Administrator|postgres|61de755a.a54|STATEMENT:  SELECT * FROM large
WHERE a = fx2(a)
2022-01-12 09:29:48.100 MSK|Administrator|postgres|61de755a.a54|LOG: 
pgfdw_get_result: after WaitLatchOrSocket
2022-01-12 09:29:48.100
MSK|Administrator|postgres|61de755a.a54|STATEMENT:  SELECT * FROM large
WHERE a = fx2(a)
2022-01-12 09:29:48.100 MSK|Administrator|postgres|61de755a.a54|ERROR: 
FATAL:  terminating connection due to administrator command
    server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
2022-01-12 09:29:48.100
MSK|Administrator|postgres|61de755a.a54|CONTEXT:  remote SQL command:
FETCH 100 FROM c1
2022-01-12 09:29:48.100
MSK|Administrator|postgres|61de755a.a54|WARNING:  no connection to the
server
2022-01-12 09:29:48.100
MSK|Administrator|postgres|61de755a.a54|CONTEXT:  remote SQL command:
ABORT TRANSACTION
2022-01-12 09:29:48.107 MSK|Administrator|postgres|61de755a.a54|LOG: 
disconnection: session time: 0:00:01.577 user=Administrator
database=postgres host=127.0.0.1 port=49752
2022-01-12 09:29:48.257 MSK|[unknown]|[unknown]|61de755c.a4c|LOG: 
connection received: host=127.0.0.1 port=49754
2022-01-12 09:29:48.261 MSK|Administrator|postgres|61de755c.a4c|LOG: 
connection authenticated: identity="WIN-FCPSOVMM1JC\Administrator"
method=sspi
(C:/src/postgrespro/contrib/postgres_fdw/tmp_check/t_001_disconnection1_local_data/pgdata/pg_hba.conf:2)
2022-01-12 09:29:48.261 MSK|Administrator|postgres|61de755c.a4c|LOG: 
connection authorized: user=Administrator database=postgres
application_name=001_disconnection1.pl
2022-01-12 09:29:48.263 MSK|Administrator|postgres|61de755c.a4c|LOG: 
statement: SELECT * FROM large WHERE a = fx2(a)
2022-01-12 09:29:48.285 MSK|Administrator|postgres|61de755c.a4c|LOG: 
pgfdw_get_result: before WaitLatchOrSocket
2022-01-12 09:29:48.285
MSK|Administrator|postgres|61de755c.a4c|STATEMENT:  SELECT * FROM large
WHERE a = fx2(a)
...

By the look of things, you are right and this is the localhost-only issue.
I've rechecked that the test 001_disconnection.pl (local-loopback
version) hangs on both machines while 001_disconnection1.pl performs
successfully in both directions. I'm not sure whether the Windows client
and non-Windows server or reverse combinations are of interest in light
of the above.

Best regards,
Alexander

Attachment

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: cfbot wrangling (was Re: Add checkpoint and redo LSN to LogCheckpointEnd log message)
Next
From: Julien Rouhaud
Date:
Subject: Re: Proposal: allow database-specific role memberships