Thread: Connection terminated but client didn't realise
Hello wise postgresians,
We have a query that our system runs nightly to refresh materialised views. This takes some time to execute (~25 minutes) and then it will usually return the results to the application and everything is golden. However occasionally we see something like the below, where the query finishes, but the connection gets unexpectedly closed from Postgres’ perspective. From the application’s perspective the connection is still alive, and it sits there forever waiting for the result.
postgresql-9.5-sdb4.log_2019-11-11_224900.gz:2019-11-11 22:59:04.202 AEDT [4034]: [4693187-1] user=server@usn2082000510,db=usn2082000510,app=PostgreSQL JDBC Driver,client=172.30.3.21 CONTEXT: SQL statement "refresh materialized view cube02.fact_sales"
postgresql-9.5-sdb4.log_2019-11-11_224900.gz:2019-11-11 22:59:04.202 AEDT [4034]: [4693188-1] user=server@usn2082000510,db=usn2082000510,app=PostgreSQL JDBC Driver,client=172.30.3.21 STATEMENT: select analytics.refresh($1)
postgresql-9.5-sdb4.log_2019-11-11_224900.gz:2019-11-11 22:59:04.217 AEDT [4034]: [4693189-1] user=server@usn2082000510,db=usn2082000510,app=PostgreSQL JDBC Driver,client=172.30.3.21 LOG: duration: 1444211.603 ms execute S_126: select analytics.refresh($1)
postgresql-9.5-sdb4.log_2019-11-11_224900.gz:2019-11-11 22:59:04.217 AEDT [4034]: [4693190-1] user=server@usn2082000510,db=usn2082000510,app=PostgreSQL JDBC Driver,client=172.30.3.21 DETAIL: parameters: $1 = 'f'
postgresql-9.5-sdb4.log_2019-11-11_224900.gz:2019-11-11 22:59:04.217 AEDT [4034]: [4693191-1] user=server@usn2082000510,db=usn2082000510,app=PostgreSQL JDBC Driver,client=172.30.3.21 LOG: could not receive data from client: Connection reset by peer
postgresql-9.5-sdb4.log_2019-11-11_224900.gz:2019-11-11 22:59:04.217 AEDT [4034]: [4693192-1] user=server@usn2082000510,db=usn2082000510,app=PostgreSQL JDBC Driver,client=172.30.3.21 LOG: unexpected EOF on client connection with an open transaction
postgresql-9.5-sdb4.log_2019-11-11_224900.gz:2019-11-11 22:59:04.845 AEDT [4034]: [4693193-1] user=server@usn2082000510,db=usn2082000510,app=PostgreSQL JDBC Driver,client=172.30.3.21 LOG: disconnection: session time: 82:44:13.962 user=server@usn2082000510 database=usn2082000510 host=172.30.3.21 port=52722
From the application side, this is the thread that executes the query
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at org.postgresql.core.VisibleBufferedInputStream.readMore(VisibleBufferedInputStream.java:140)
at org.postgresql.core.VisibleBufferedInputStream.ensureBytes(VisibleBufferedInputStream.java:109)
at org.postgresql.core.VisibleBufferedInputStream.read(VisibleBufferedInputStream.java:67)
at org.postgresql.core.PGStream.receiveChar(PGStream.java:288)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1963)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:300)
- locked <0x0000000683a1fe70> (a org.postgresql.core.v3.QueryExecutorImpl)
at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:428)
at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:354)
at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:169)
at org.postgresql.jdbc.PgPreparedStatement.executeQuery(PgPreparedStatement.java:117)
Possibly relevant is that there’s another thread attempting to close this connection, and is waiting for lock on 0x0000000683a1fe70 (this is Glassfish’s connection leak reclaiming)
at org.postgresql.core.QueryExecutorBase.getTransactionState(QueryExecutorBase.java:242)
- waiting to lock <0x0000000683a1fe70> (a org.postgresql.core.v3.QueryExecutorImpl)
at org.postgresql.jdbc.PgConnection.rollback(PgConnection.java:780)
at org.postgresql.ds.PGPooledConnection.close(PGPooledConnection.java:86)
at com.sun.gjc.spi.ManagedConnectionImpl.destroy(ManagedConnectionImpl.java:433)
I haven’t confirmed this but my guess is that this second thread doesn’t kick in until much later in the events, well after 2019-11-11 22:59:04.845.
postgres version: PostgreSQL 9.5.19 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609, 64-bit
JDBC driver postgresql-42.1.4
Does anyone have an idea what might be causing the connection to close? Or perhaps why the JDBC thread hasn’t detected that the connection is closed?
TIA
Best regards,
David Wheeler
General Manager Bali Office
Bali T +62 361 475 2333 M +62 819 3660 9180
Jl. Pura Mertasari No. 7, Sunset Road Abian Base
Kuta, Badung – Bali 80361, Indonesia
Attachment
David Wheeler <dwheeler@dgitsystems.com> writes: > We have a query that our system runs nightly to refresh materialised views. This takes some time to execute (~25 minutes)and then it will usually return the results to the application and everything is golden. However occasionally wesee something like the below, where the query finishes, but the connection gets unexpectedly closed from Postgres’ perspective.From the application’s perspective the connection is still alive, and it sits there forever waiting for the result. Is the application remote from the database server? My gut reaction to this type of report is "something timed out the network connection", but there would have to be a router or firewall or the like in between to make that a tenable explanation. If that is the issue, you should be able to fix it by making the server's TCP keepalive settings more aggressive. regards, tom lane
> Is the application remote from the database server? My gut reaction to this type of report is "something timed out thenetwork connection", but there would have to be a router or firewall or the like in between to make that a tenable explanation. > If that is the issue, you should be able to fix it by making the server's TCP keepalive settings more aggressive. Yes the application server is separate from the database server, and the application is running within docker which I suspectadds some complexity too. I had suspicions about something in the middle closing the connection too, but your emailhas clarified my thinking a bit. TCP Keepalive appears to be enabled on the application server and within docker, and the client holds the allegedly deadconnection for much longer (24h) than the keepalive should take to kill it (<3h), so I think the next step is to tryto identify the connection at the OS level with netstat to see what state it's in. Thanks for your help. Regards, David On 2/12/19, 11:17 pm, "Tom Lane" <tgl@sss.pgh.pa.us> wrote: David Wheeler <dwheeler@dgitsystems.com> writes: > We have a query that our system runs nightly to refresh materialised views. This takes some time to execute (~25 minutes)and then it will usually return the results to the application and everything is golden. However occasionally wesee something like the below, where the query finishes, but the connection gets unexpectedly closed from Postgres’ perspective.From the application’s perspective the connection is still alive, and it sits there forever waiting for the result. Is the application remote from the database server? My gut reaction to this type of report is "something timed out the network connection", but there would have to be a router or firewall or the like in between to make that a tenable explanation. If that is the issue, you should be able to fix it by making the server's TCP keepalive settings more aggressive. regards, tom lane