Thread: TCP Resets when closing connection opened via SSL
Hello,
I'm trying to do a simple health check for keepalived and other services via a python script and psycopg2 library. All seems to be all right, until I close the connection, at which point a packet with TCP reset is produced.
This has become very problematic and creates extensive noise in monitoring.
It also happens with IntelliJ DataGrip provided Java driver and PHP drivers. And hence I do not think this is the drivers problem, but PG instead.
Script
import psycopg2
import sys
def check(): # Relying on .pgpass for password con = psycopg2.connect('user=monitoring dbname=postgres host=127.0.0.1') cur = con.cursor() cur.execute("select 'keepalived healthcheck'") cur.close() con.close()
try: check() print("ok")
except Exception as e: print("not ok") print(str(e)) sys.exit(1)
tcpdump tcpdump -v 'tcp[tcpflags] & (tcp-rst) != 0' -ilo
16:27:45.307006 IP (tos 0x0, ttl 64, id 8123, offset 0, flags [DF], proto TCP (6), length 40) localhost.40797 > localhost.postgres: Flags [R], cksum 0x0cca (correct), seq 3830516781, win 0, length 0
From PG side of things, all seems to be OK
2019-04-23 16:27:45.300 CEST process=15615 c= t=0 s=5cbf20e1.3cff [unknown]@127.0.0.1:[unknown] app=[unknown] LOG: connection received: host=127.0.0.1 port=40797
2019-04-23 16:27:45.304 CEST process=15615 c=authentication t=0 s=5cbf20e1.3cff monitoring@127.0.0.1:postgres app=[unknown] LOG: connection authorized: user=monitoring database=postgres SSL enabled (protocol=TLSv1.2, cipher=ECDHE-RSA-AES256-GCM-SHA384, compression=off)
2019-04-23 16:27:45.306 CEST process=15615 c=BEGIN t=0 s=5cbf20e1.3cff monitoring@127.0.0.1:postgres app=[unknown] LOG: duration: 0.095 ms
2019-04-23 16:27:45.306 CEST process=15615 c=SELECT t=0 s=5cbf20e1.3cff monitoring@127.0.0.1:postgres app=[unknown] LOG: duration: 0.234 ms
2019-04-23 16:27:45.306 CEST process=15615 c=idle in transaction t=0 s=5cbf20e1.3cff monitoring@127.0.0.1:postgres app=[unknown] LOG: disconnection: session time: 0:00:00.006 user=monitoring database=postgres host=127.0.0.1 port=40797
I've tried various combinations of OS, psycopg2 and PG versions to no avail. My focus has been to reproduce with this driver only.
OS: Centos 7 and Ubuntu 18.04
psycopg2: From latest down to 2.7.x
PG: Latest 11 down to 9.4 (we use 9.4-bdr currently in production)
I can not reproduce this, when SSL has been disabled on PG. However this is not a solution, as we are committed to use SSL for all our connections to DB.
Has anyone experienced this and could share any advice how I could prevent this from producing network packets with TCP [R] flag ?
Thank you very much in advance and any advice will be much appreciated.
P.S. I've also posted this on following channels:
https://dba.stackexchange.com/questions/235497/tcp-resets-when-using-psycopg2
https://github.com/psycopg/psycopg2/issues/906
Hello,
I'm trying to do a simple health check for keepalived and other services via a python script and psycopg2 library. All seems to be all right, until I close the connection, at which point a packet with TCP reset is produced.
This has become very problematic and creates extensive noise in monitoring.
It also happens with IntelliJ DataGrip provided Java driver and PHP drivers. And hence I do not think this is the drivers problem, but PG instead.
Script
import psycopg2 import sys def check(): # Relying on .pgpass for password con = psycopg2.connect('user=monitoring dbname=postgres host=127.0.0.1') cur = con.cursor() cur.execute("select 'keepalived healthcheck'") cur.close() con.close() try: check() print("ok") except Exception as e: print("not ok") print(str(e)) sys.exit(1)
tcpdump
tcpdump -v 'tcp[tcpflags] & (tcp-rst) != 0' -ilo
16:27:45.307006 IP (tos 0x0, ttl 64, id 8123, offset 0, flags [DF], proto TCP (6), length 40) localhost.40797 > localhost.postgres: Flags [R], cksum 0x0cca (correct), seq 3830516781, win 0, length 0
From PG side of things, all seems to be OK
2019-04-23 16:27:45.300 CEST process=15615 c= t=0 s=5cbf20e1.3cff [unknown]@127.0.0.1:[unknown] app=[unknown] LOG: connection received: host=127.0.0.1 port=40797 2019-04-23 16:27:45.304 CEST process=15615 c=authentication t=0 s=5cbf20e1.3cff monitoring@127.0.0.1:postgres app=[unknown] LOG: connection authorized: user=monitoring database=postgres SSL enabled (protocol=TLSv1.2, cipher=ECDHE-RSA-AES256-GCM-SHA384, compression=off) 2019-04-23 16:27:45.306 CEST process=15615 c=BEGIN t=0 s=5cbf20e1.3cff monitoring@127.0.0.1:postgres app=[unknown] LOG: duration: 0.095 ms 2019-04-23 16:27:45.306 CEST process=15615 c=SELECT t=0 s=5cbf20e1.3cff monitoring@127.0.0.1:postgres app=[unknown] LOG: duration: 0.234 ms 2019-04-23 16:27:45.306 CEST process=15615 c=idle in transaction t=0 s=5cbf20e1.3cff monitoring@127.0.0.1:postgres app=[unknown] LOG: disconnection: session time: 0:00:00.006 user=monitoring database=postgres host=127.0.0.1 port=40797
I've tried various combinations of OS, psycopg2 and PG versions to no avail. My focus has been to reproduce with this driver only.
OS: Centos 7 and Ubuntu 18.04
psycopg2: From latest down to 2.7.x
PG: Latest 11 down to 9.4 (we use 9.4-bdr currently in production)
I can not reproduce this, when SSL has been disabled on PG. However this is not a solution, as we are committed to use SSL for all our connections to DB.
Has anyone experienced this and could share any advice how I could prevent this from producing network packets with TCP [R] flag ?
Thank you very much in advance and any advice will be much appreciated.
P.S. I've also posted this on following channels:
https://dba.stackexchange.com/questions/235497/tcp-resets-when-using-psycopg2
https://github.com/psycopg/psycopg2/issues/906
Med vennlig hilsen.Best regards, Janis Puris.
From PG side of things, all seems to be OK
What actual problem you are experiencing?
Usually, getting a TCP reset is a normal event on TCP connections not necessarily related to any application level activity.
A TCP reset is triggered as soon as a packet is being received on a "closed" connection.
Such "late orphaned packet" may e.g. result from an ack for a packet in transit by the time of close.
For more detailed information you might look at the exact sequence of packets being exchanged.
This way, you might get better insight in what is happening and better information on wether something is going wrong.
That you do experience TCP reset only with SSL being involved might just be accidentally or be a
consequence of SSL stack activities. It could even be caused by the way the (SSL) connection
is being configured/set-up in the specific case.
But nevertheless, it might be perfectly OK for such packets to be generated.
So, please re-think why you are considering them problematic in the first place.
Regards
Rainer
=?utf-8?Q?J=C4=81nis_P=C5=ABris?= <janis@puris.lv> writes: > I'm trying to do a simple health check for keepalived and other services via a python script and psycopg2 library. Allseems to be all right, until I close the connection, at which point a packet with TCP reset is produced. Hm. It's fairly obvious from the postmaster log that the client side is not bothering to close the transaction it started: > 2019-04-23 16:27:45.306 CEST process=15615 c=BEGIN t=0 s=5cbf20e1.3cff monitoring@127.0.0.1:postgres app=[unknown] LOG: duration: 0.095 ms > 2019-04-23 16:27:45.306 CEST process=15615 c=SELECT t=0 s=5cbf20e1.3cff monitoring@127.0.0.1:postgres app=[unknown] LOG: duration: 0.234 ms > 2019-04-23 16:27:45.306 CEST process=15615 c=idle in transaction t=0 s=5cbf20e1.3cff monitoring@127.0.0.1:postgres app=[unknown]LOG: disconnection: session time: 0:00:00.006 user=monitoring database=postgres host=127.0.0.1 port=40797 There's no COMMIT or ROLLBACK to go with the BEGIN, and that's reflected in the fact that the disconnection message shows c=idle in transaction. Now, I doubt that would have any impact on the TCP-level session behavior, but it suggests that maybe the client isn't bothering to close the session cleanly either. The RST rather than FIN would then likely be caused by SSL having to do an unclean shutdown. regards, tom lane
It's fairly obvious from the postmaster log that the client side
is not bothering to close the transaction it started
So, please re-think why you are considering them problematic in the first place.
Jānis Pūris <janis@puris.lv> writes:I'm trying to do a simple health check for keepalived and other services via a python script and psycopg2 library. All seems to be all right, until I close the connection, at which point a packet with TCP reset is produced.
Hm. It's fairly obvious from the postmaster log that the client side
is not bothering to close the transaction it started:2019-04-23 16:27:45.306 CEST process=15615 c=BEGIN t=0 s=5cbf20e1.3cff monitoring@127.0.0.1:postgres app=[unknown] LOG: duration: 0.095 ms
2019-04-23 16:27:45.306 CEST process=15615 c=SELECT t=0 s=5cbf20e1.3cff monitoring@127.0.0.1:postgres app=[unknown] LOG: duration: 0.234 ms
2019-04-23 16:27:45.306 CEST process=15615 c=idle in transaction t=0 s=5cbf20e1.3cff monitoring@127.0.0.1:postgres app=[unknown] LOG: disconnection: session time: 0:00:00.006 user=monitoring database=postgres host=127.0.0.1 port=40797
There's no COMMIT or ROLLBACK to go with the BEGIN, and that's reflected
in the fact that the disconnection message shows c=idle in transaction.
Now, I doubt that would have any impact on the TCP-level session behavior,
but it suggests that maybe the client isn't bothering to close the session
cleanly either. The RST rather than FIN would then likely be caused by
SSL having to do an unclean shutdown.
regards, tom lane
=?utf-8?Q?J=C4=81nis_P=C5=ABris?= <janis@puris.lv> writes: > This is only reproducible with SSL connection. Seems like you might get more insight from OpenSSL experts, then. (No idea where you find some, but they must exist.) regards, tom lane
On 26/04/2019 18:19, Tom Lane wrote: > =?utf-8?Q?J=C4=81nis_P=C5=ABris?= <janis@puris.lv> writes: >> This is only reproducible with SSL connection. > > Seems like you might get more insight from OpenSSL experts, then. > (No idea where you find some, but they must exist.) List-Id: <openssl-users.openssl.org> -- Cheers, Jeremy
On 2019-04-26 18:36:12 +0200, Jānis Pūris wrote: > Thanks for the insight, Tom. > > > It's fairly obvious from the postmaster log that the client side > is not bothering to close the transaction it started > > > Thats what I was also thinking, but I've managed to reproduce it with > autocommit or commit before closing connection as well. > This is only reproducible with SSL connection. > > I've reproduced it with ubuntu package bundles certs, my own created, generated > with CA etc hence I do not think it is a problem with certificates themselves. > This happens with various clients, php driver, java driver and also libpq used > by psycopg2. We are running a lot of different kind of microservices and thus > far I can tell, almost (if not all) of them have this problem manifest. > > I can not however reproduce this when doing something like > $ psql "user=postgres sslmode=require host=localhost dbname=postgres" > postgres=# \q I can reproduce it in this case also (kernel 4.9, openssl 1.1.0j, posgres 9.5.16). stracing psql shows this: send(3, "\27\3\3\0\35fV\354\31v\332\10\22\202-#8\5\267\5\202\346\3748\353\326&\374\rf\2776"..., 34, MSG_NOSIGNAL) = 34 send(3, "\25\3\3\0\32fV\354\31v\332\10\23\307\207\335V\355Lhh\"\214\331\25l\352\27\32\332\357", 31, MSG_NOSIGNAL) = 31 close(3) = 0 I.e., psql sends some data (presumably a TLS close_notify alert) and then closes the file descriptor. tcpdump shows this: 12:30:02.276188 IP6 localhost.47712 > localhost.postgresql: Flags [P.], seq 414:445, ack 1239, win 360, options [nop,nop,TSval 337416186 ecr 337416186], length 31 12:30:02.276243 IP6 localhost.47712 > localhost.postgresql: Flags [F.], seq 445, ack 1239, win 360, options [nop,nop,TS val337416186 ecr 337416186], length 0 12:30:02.276306 IP6 localhost.postgresql > localhost.47712: Flags [P.], seq 1239:1270, ack 446, win 350, options [nop,nop,TSval 337416186 ecr 337416186], length 31 12:30:02.276315 IP6 localhost.47712 > localhost.postgresql: Flags [R], seq 2477517213, win 0, length 0 The first packet shown are the 31 bytes of the second send, then the cliens sends a FIN packet. Then the server replies with a packet of the same length (so presumably also a close_notify alert). Since the client has already closed the connection, it replies with a RST. One might argue that openssl should wait for the other side to reply before closing the connection. But waiting doesn't really accomplish anything from the application's point of view and introduces and unnecessary delay. > This is an issue because we have a lot of connections being initiated and > closed. And this creates a lot of TCP resets. Why are those resets a problem? (If the answer is "our monitoring software complains about them" then the question becomes "why does your monitoring software think they are a problem?") hp -- _ | Peter J. Holzer | we build much bigger, better disasters now |_|_) | | because we have much more sophisticated | | | hjp@hjp.at | management tools. __/ | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/>
Attachment
On 2019-04-26 18:36:12 +0200, Jānis Pūris wrote:Thanks for the insight, Tom.
It's fairly obvious from the postmaster log that the client side
is not bothering to close the transaction it started
Thats what I was also thinking, but I've managed to reproduce it with
autocommit or commit before closing connection as well.
This is only reproducible with SSL connection.
I've reproduced it with ubuntu package bundles certs, my own created, generated
with CA etc hence I do not think it is a problem with certificates themselves.
This happens with various clients, php driver, java driver and also libpq used
by psycopg2. We are running a lot of different kind of microservices and thus
far I can tell, almost (if not all) of them have this problem manifest.
I can not however reproduce this when doing something like
$ psql "user=postgres sslmode=require host=localhost dbname=postgres"
postgres=# \q
I can reproduce it in this case also (kernel 4.9, openssl 1.1.0j,
posgres 9.5.16).
stracing psql shows this:
send(3, "\27\3\3\0\35fV\354\31v\332\10\22\202-#8\5\267\5\202\346\3748\353\326&\374\rf\2776"..., 34, MSG_NOSIGNAL) = 34
send(3, "\25\3\3\0\32fV\354\31v\332\10\23\307\207\335V\355Lhh\"\214\331\25l\352\27\32\332\357", 31, MSG_NOSIGNAL) = 31
close(3) = 0
I.e., psql sends some data (presumably a TLS close_notify alert) and
then closes the file descriptor.
tcpdump shows this:
12:30:02.276188 IP6 localhost.47712 > localhost.postgresql: Flags [P.], seq 414:445, ack 1239, win 360, options [nop,nop,TS val 337416186 ecr 337416186], length 31
12:30:02.276243 IP6 localhost.47712 > localhost.postgresql: Flags [F.], seq 445, ack 1239, win 360, options [nop,nop,TS val 337416186 ecr 337416186], length 0
12:30:02.276306 IP6 localhost.postgresql > localhost.47712: Flags [P.], seq 1239:1270, ack 446, win 350, options [nop,nop,TS val 337416186 ecr 337416186], length 31
12:30:02.276315 IP6 localhost.47712 > localhost.postgresql: Flags [R], seq 2477517213, win 0, length 0
The first packet shown are the 31 bytes of the second send, then the
cliens sends a FIN packet. Then the server replies with a packet of the
same length (so presumably also a close_notify alert). Since the client
has already closed the connection, it replies with a RST.
One might argue that openssl should wait for the other side to reply
before closing the connection. But waiting doesn't really accomplish
anything from the application's point of view and introduces and
unnecessary delay.This is an issue because we have a lot of connections being initiated and
closed. And this creates a lot of TCP resets.
Why are those resets a problem? (If the answer is "our monitoring
software complains about them" then the question becomes "why does your
monitoring software think they are a problem?")
hp
--
_ | Peter J. Holzer | we build much bigger, better disasters now
|_|_) | | because we have much more sophisticated
| | | hjp@hjp.at | management tools.
__/ | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/>