Re: Some 9.5beta2 backend processes not terminating properly? - Mailing list pgsql-hackers

From Shay Rojansky
Subject Re: Some 9.5beta2 backend processes not terminating properly?
Msg-id CADT4RqBMPE_V=7DCtqkdQdzWyF-E-uV-jWTpuP8u7eOfXziOmA@mail.gmail.com
In response to Re: Some 9.5beta2 backend processes not terminating properly?  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
OK, I finally found some time to dive into this.

The backends seem to hang when the client closes a socket without first sending a Terminate message - some of the tests make this happen. I've confirmed this happens with 9.5rc1 running on Windows (versions 10 and 7), but this does not occur on Ubuntu 15.10. The client runs on Windows as well (although I doubt that's important).
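
For readers unfamiliar with the wire protocol, here is a rough Python sketch of the difference between a clean and an abrupt disconnect. It is not the Npgsql repro, only an illustration under some assumptions: protocol version 3.0, a server on localhost:5432, and hypothetical helper names (startup_message, terminate_message, connect_and_drop). It also does not complete authentication, so it only shows where the Terminate message would go before the socket is closed.

```python
import socket
import struct

PROTOCOL_3_0 = 196608  # (3 << 16) | 0, i.e. protocol version 3.0


def startup_message(user: str, database: str) -> bytes:
    """Build a StartupMessage: Int32 length, Int32 version, key/value pairs, final NUL."""
    params = b"".join(
        key.encode() + b"\x00" + value.encode() + b"\x00"
        for key, value in (("user", user), ("database", database))
    ) + b"\x00"
    body = struct.pack("!I", PROTOCOL_3_0) + params
    # The length field counts itself (4 bytes) plus the body.
    return struct.pack("!I", len(body) + 4) + body


def terminate_message() -> bytes:
    """Build a Terminate message: type byte 'X' followed by Int32 length 4."""
    return b"X" + struct.pack("!I", 4)


def connect_and_drop(host="127.0.0.1", port=5432, user="postgres",
                     database="postgres", polite=False):
    """Open a connection, then close it with or without sending Terminate.

    Host, port, user and database are placeholder assumptions.
    """
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(startup_message(user, database))
        sock.recv(4096)  # read whatever the server sends first (auth request or error)
        if polite:
            sock.sendall(terminate_message())
    # Leaving the with-block closes the socket; with polite=False the server
    # never sees a Terminate message, which is the case described above.
```

Calling connect_and_drop(polite=False) corresponds to the abrupt close; polite=True sends Terminate first.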

In case it helps, here's a gist with some .NET code that uses Npgsql 3.0.4 to reproduce this.

If there's anything else I can do please let me know.

Shay

On Wed, Dec 30, 2015 at 5:32 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:


On Tue, Dec 29, 2015 at 7:04 PM, Shay Rojansky <roji@roji.org> wrote:
Could you describe the workload a bit more? Is this rather concurrent? Do
you use optimized or debug builds? How long did you wait for the
backends to die? Is this all over localhost, external ip but local,
remotely?

The workload is a rather diverse set of integration tests executed with Npgsql. There's no concurrency whatsoever - tests are executed serially. The backends stay alive indefinitely, until they are killed. All this is over localhost with TCP. I can try other scenarios if that'll help.
 

What procedure do you use to kill backends?  Normally, if we kill
one via task manager using "End Process", it is treated as a backend
crash: the server restarts and all other backends get
disconnected.
 
> Note that the number of backends that stay stuck after the tests is
> constant (always 12).

Can you increase the number of backends used in the test? And check
whether it's still 12?

Well, I ran the testsuite twice in parallel, and got... 23 backends stuck at the end.
 
How are your clients disconnecting? Possibly without properly
disconnecting?

That's possible, definitely in some of the test cases.

What I can do is try to isolate things further by playing around with the tests and trying to see if a more minimal repro can be done - I'll try doing this later today or tomorrow. If anyone has any other specific tests or checks I should do let me know.

I think we should first isolate whether the hung backends are
caused by clients not disconnecting properly or whether some
other factor is involved as well. You can try to kill/disconnect
sessions connected via psql in the same way as you do for
connections with Npgsql and see if you can reproduce the same
behaviour.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
