Re: pgsql: Add tests for '-f' option in dropdb utility. - Mailing list pgsql-committers

From Amit Kapila
Subject Re: pgsql: Add tests for '-f' option in dropdb utility.
Msg-id CAA4eK1+qV07HzeVi2iw=ccu+NNN220-=+56TBJ3LrKF69LCfGA@mail.gmail.com
In response to Re: pgsql: Add tests for '-f' option in dropdb utility.  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: pgsql: Add tests for '-f' option in dropdb utility.  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-committers
On Fri, Nov 29, 2019 at 2:25 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Amit Kapila <amit.kapila16@gmail.com> writes:
> > On Thu, Nov 28, 2019 at 8:16 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> I think the correct question to ask is "why not all cases"?
>
> > As of now, it seems to me that this happens only on Windows.  I am not
> > sure why so?  I will investigate this further and share my findings.
>
> A couple of interesting things stand out from looking at the buildfarm
> failures:
>
> * On some of the machines, it seems like "chomp" is failing to get
> rid of all the trailing whitespace in $pid:
>
> ok 4 - acquired pid for SIGTERM
> not ok 5 - database foobar1 is used
>
> #   Failed test 'database foobar1 is used'
> #   at t/051_dropdb_force.pl line 71.
> #          got: '212024'
> #     expected: '212024
> '
>
> How can that be?  It somewhat-accidentally doesn't seem to be
> causing any additional problems, but still we need this test
> step to work (or else remove it, it's not really essential).
>

Yeah, we need to do something about this.  If nothing works, we can
remove this step from the test, but let us first decide what to do
about the next point.
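
One possible explanation for the first point, and this is only an
assumption that the buildfarm output does not confirm, is that $pid
ends in "\r\n" on those machines, so a chomp that removes only "\n"
leaves a "\r" behind.  A minimal C sketch of the difference, with
hypothetical helper names that are not taken from the test:

```c
#include <ctype.h>
#include <string.h>

/* Remove at most one trailing '\n', roughly what Perl's chomp does by default. */
static void
chomp_newline(char *s)
{
    size_t      len = strlen(s);

    if (len > 0 && s[len - 1] == '\n')
        s[len - 1] = '\0';
}

/* Remove all trailing whitespace, including the '\r' of a CRLF line ending. */
static void
strip_trailing_whitespace(char *s)
{
    size_t      len = strlen(s);

    while (len > 0 && isspace((unsigned char) s[len - 1]))
        s[--len] = '\0';
}
```

If that assumption holds, stripping all trailing whitespace in the
test rather than relying on chomp alone would make the comparison
independent of the line-ending convention.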

> * On all the failing machines, it's very clear from the postmaster
> log that the backend knows why it's being terminated:
>
> 2019-11-28 13:47:56.320 UTC [212024:4] 051_dropdb_force.pl FATAL:  terminating connection due to administrator command
>

Yeah, I can confirm this behavior on the Windows machines.  I have
also seen that we already expose this behavior in more than one way,
and all of them behave similarly.  If I use pg_terminate_backend(<pid>)
or pg_ctl kill TERM <pid>, the behavior is exactly the same: the
client of the terminated backend doesn't get the message (FATAL:
terminating connection due to administrator command), but it is
present in the postmaster log.

> So the question seems to be why libpq isn't reporting that
> message before it detects connection-closed.
>
> This triggered a vague memory, and after a bit of archives-digging,
> I found this thread from a few months back:
>
> https://www.postgresql.org/message-id/flat/CA%2BhUKGJowQypXSKsjws9A%2BnEQDD0-mExHZqFXtJ09N209rCO5A%40mail.gmail.com#0629f079bc59ecdaa0d6ac9f8f2c18ac
>
> in which it's alleged that Windows' TCP stack is flat-out
> broken and will drop not-yet-delivered data when the server
> closes the connection.
>
> If that's true, it's pretty nasty.  Windows is about the
> last platform where I'd want us to have behavior like this,
> because we *will* get bug reports about it from novices.
>
> If there's no other workaround, I'm tempted to propose
>
> #ifdef WIN32
>         pg_sleep(1 second);
> #endif
>
> or something close to that, before we close the socket.
>

I can experiment with this, or with something else if it occurs to me.
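
To make the idea concrete, here is a rough sketch of "send the final
message, wait a bit, then close".  It is not backend code: the helper
name and the delay_ms parameter are hypothetical, and it uses plain
POSIX sockets so that it can be tried outside the server.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper, not PostgreSQL code: send a final message on fd,
 * optionally wait delay_ms milliseconds so the peer has a chance to read
 * it, then close fd.  Returns 0 if the whole message was sent, else -1.
 */
static int
send_final_message_and_close(int fd, const char *msg, long delay_ms)
{
    size_t      len = strlen(msg);
    size_t      done = 0;
    int         result = 0;

    /* send() may accept fewer bytes than requested, so loop until done. */
    while (done < len)
    {
        ssize_t     n = send(fd, msg + done, len - done, 0);

        if (n <= 0)
        {
            result = -1;
            break;
        }
        done += (size_t) n;
    }

    /* The proposed workaround: delay before closing the socket. */
    if (delay_ms > 0)
    {
        struct timespec ts;

        ts.tv_sec = delay_ms / 1000;
        ts.tv_nsec = (delay_ms % 1000) * 1000000L;
        nanosleep(&ts, NULL);
    }

    close(fd);
    return result;
}
```

In the backend the delay would presumably be compiled only under
#ifdef WIN32 and use the existing pg_usleep() rather than nanosleep();
whether a fixed delay is actually enough on Windows is exactly what
needs to be tested.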

> Or we could revert the whole feature.
>

Yeah, that is also one possibility, but I think given we already have
this behavior in existing features, it is better to either come up
with some solution or mention in the docs that in such cases users
need to check the postmaster log to learn the actual reason.

I think we can further explore this, but for now, we might want to (a)
revert this test, or (b) change the expected output to match.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


