Hi.
I have database users who need to run ad-hoc queries on the
database. These queries are sometimes quite long-running, especially
when a user messes up a join condition between two 1G-row tables and
effectively sets out to count the number of quarks in the universe or
something like that.
When a query fails to return a result for a while, the user sometimes
just closes the laptop lid, dropping the VPN connection, and goes home
to watch some cats on the Internet.
The TCP keepalives functionality on the database server's operating
system would figure out that the connection is not working anymore and
close it.
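For reference, PostgreSQL exposes these OS knobs as GUCs, so they can
be tightened without touching the system configuration - the values
below are purely illustrative (and on RDS they would have to go into a
parameter group, since ALTER SYSTEM is not available there):

  -- illustrative values; the default of 0 means "use the OS default"
  ALTER SYSTEM SET tcp_keepalives_idle = 60;      -- idle seconds before the first probe
  ALTER SYSTEM SET tcp_keepalives_interval = 10;  -- seconds between probes
  ALTER SYSTEM SET tcp_keepalives_count = 6;      -- lost probes before the peer is declared dead
  SELECT pg_reload_conf();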
You'd assume that the database would get the info and cancel the query.
Except - it does not.
The database only gets an error on the connection socket the next time
it tries to read from or write to it. And it doesn't try - it's busy
counting those quarks until statement_timeout is reached.
But the default statement_timeout has to be quite large - the databases
are large (up to several TB) and are sometimes exported with pg_dump,
restored, etc. - so 24 hours is maybe a reasonable setting.
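We could of course give the ad-hoc users a tighter per-role limit than
the global one needed for pg_dump - something like:

  -- "analyst" is just an example role name for the ad-hoc users
  ALTER ROLE analyst SET statement_timeout = '1h';

- but their queries can also legitimately run for hours, so this only
narrows the window, it doesn't close it.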
Is there a way to configure the database so it would notice that the
connection is closed faster than statement_timeout?
I'm thinking of maybe a periodic signal to the backend process, asking
it to somehow validate its connection whenever it has been running a
query for longer than
tcp_keepalives_idle + tcp_keepalives_count * tcp_keepalives_interval?
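For example, with tcp_keepalives_idle = 60, tcp_keepalives_interval =
10 and tcp_keepalives_count = 6 the operating system gives up on the
peer after at most 60 + 6 * 10 = 120 seconds, so a check every couple
of minutes would already be enough.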
I think it would be possible to work around this with some external
process checking /proc/[PID]/net/tcp[6] for closed connections and
calling pg_cancel_backend() on them. But it's managed PostgreSQL
(AWS RDS), so shell access to the host is not allowed.
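The SQL half of such a watchdog would be trivial - something like the
query below, with 12345 standing in for a backend PID whose socket the
/proc scan found in CLOSE_WAIT. It's the /proc half that needs the
shell access we don't have.

  -- 12345 is a placeholder for a PID flagged by the /proc scan
  SELECT pg_cancel_backend(pid)
  FROM pg_stat_activity
  WHERE pid = 12345
    AND state = 'active';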
Is there any chance of having something like this added to PostgreSQL?
I believe we're not the only ones with this problem.
--
Tomasz "Tometzky" Ostrowski