Re: Add client connection check during the execution of the query - Mailing list pgsql-hackers

From Tatsuo Ishii
Subject Re: Add client connection check during the execution of the query
Date
Msg-id 20190718.121906.432767475415875714.t-ishii@sraoss.co.jp
Whole thread Raw
In response to Re: Add client connection check during the execution of the query  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: Add client connection check during the execution of the query  (Thomas Munro <thomas.munro@gmail.com>)
Re: Add client connection check during the execution of the query  (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>)
List pgsql-hackers
> Yeah.
> 
> +1 for this patch, with a few adjustments including making the test
> use pg_sleep() as mentioned.  It does something useful, namely
> cancelling very long running queries sooner if the client has gone
> away instead of discovering that potentially much later when sending a
> response.  It does so with a portable kernel interface (though we
> haven't heard from a Windows tester), and I think it's using it in a
> safe way (we're not doing the various bad things you can do with
> MSG_PEEK, and the fd is expected to be valid for the process's
> lifetime, and the socket is always in non-blocking mode*, so I don't
> think there is any bad time for pg_check_client_connection() to run).
> It reuses the existing timer infrastructure so there isn't really any
> new overhead.  One syscall every 10 seconds or whatever at the next
> available CFI is basically nothing.  On its own, this patch will
> reliably detect clients that closed abruptly or exited/crashed (so
> they client kernel sends a FIN packet).  In combination with TCP
> keepalive, it'll also detect clients that went away because the
> network or client kernel ceased to exist.
> 
> *There are apparently no callers of pg_set_block(), so if you survived
> pq_init() you have a non-blocking socket.  If you're on Windows, the
> code always sets the magic pgwin32_noblock global flag before trying
> to peek.  I wondered if it's OK that the CFI would effectively clobber
> that with 0 on its way out, but that seems to be OK because every
> place in the code that sets that flag does so immediately before an IO
> operationg without a CFI in between.  As the comment in pgstat.c says
> "This is extremely broken and should be fixed someday.".  I wonder if
> we even need that flag at all now that all socket IO is non-blocking.

I have looked into the patch and tested a little bit.

First of all, I had to grab February snapshot to test the patch
because it does not apply to the current HEAD. I noticed that there
are some confusions in the doc and code regarding what the new
configuration parameter means. According to the doc:

+        Default value is <literal>zero</literal> - it disables connection
+        checks, so the backend will detect client disconnection only when trying
+        to send a response to the query.

But guc.c comment says:

+            gettext_noop("Sets the time interval for checking connection with the client."),
+            gettext_noop("A value of -1 disables this feature. Zero selects a suitable default value."),

Probably the doc is correct since the actual code does so.

Also I found this in postgresql.conf default:

#client_connection_check_interval = 1000    # set time interval between

So here the default value seems to be be 1000. If so, guc.c should be
adjusted and the doc should be changed accordingly. I am not sure.

Next I have tested the patch using standard pgbench.

With the feature enabled with 1000ms check interval:
$ pgbench -c 10 -T 300 -S test
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 1
duration: 300 s
number of transactions actually processed: 19347995
latency average = 0.155 ms
tps = 64493.278581 (including connections establishing)
tps = 64493.811945 (excluding connections establishing)

Without the feature (client-connection-check-interval = 0)
$ pgbench -c 10 -T 300 -S test
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 1
duration: 300 s
number of transactions actually processed: 20314812
latency average = 0.148 ms
tps = 67715.993428 (including connections establishing)
tps = 67717.251843 (excluding connections establishing)

So the performance is about 5% down with the feature enabled in this
case.  For me, 5% down is not subtle. Probably we should warn this in
the doc.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp



pgsql-hackers by date:

Previous
From: Masahiko Sawada
Date:
Subject: Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS)
Next
From: Ryan Lambert
Date:
Subject: Re: dropdb --force