Hi,
I am not sure how to explain this, but I believe this is something usually referred to in the JDBC world as a hung thread.
This is an Aurora RDS PostgreSQL database that was patched in a supposed to be zero downtime kind of thing.
After the patching, I have a user complaining that he has a job that normally gets run in the OS cron session that has been failing.
When I check the database, it
=> SELECT pg_postmaster_start_time(),
-> pg_postmaster_start_time() at time zone 'utc' at time zone 'Pacific/Auckland' as "Pacific/Auckland TIMEZONE",
-> date_trunc( 'second', current_timestamp - pg_postmaster_start_time() ) as uptime;
-[ RECORD 1 ]-------------+------------------------------
pg_postmaster_start_time | 2024-11-06 10:44:51.832663+00
Pacific/Auckland TIMEZONE | 2024-11-05 21:44:51.832663+00
uptime | 3 days 10:29:08
So, in theory it really wasn't zero downtime, not sure Aurora RDS takes care of it but from the output above, it was re-started although pg_stat_activity shows it is still there.
Checking pg_stat_activity
select pid as process_id,
usename as username,
datname as database_name,
client_addr as client_address,
application_name,
now() - pg_stat_activity.query_start AS duration,
backend_start,
backend_start at time zone 'Pacific/Auckland' as backend_start_localtime,
state,
state_change,
state_change at time zone 'Pacific/Auckland' as state_change_localtime
from pg_stat_activity
where usename = '[blah]'
order by username asc
I have the output as below:
process_id | username | database_name | client_address | application_name | duration | backend_start | backend_start_localtime | state | state_change | state_change_localtime
------------+-----------+---------------+----------------+------------------------+-----------------+-------------------------------+----------------------------+-------+-------------------------------+----------------------------
31947 | [blah] | [blah] | [blah] | PostgreSQL JDBC Driver | 00:47:21.838892 | 2024-11-06 10:44:53.309388+00 | 2024-11-06 23:44:53.309388 | idle | 2024-11-08 02:00:06.005173+00 | 2024-11-08 15:00:06.005173
(1 row)
From the pg_stat_output, the backend_start has not changed for several days since the patching, so I am 'guessing' it may have gone lost/rogue already. Is there any way to check that this is the case before I kill it.
The state_change date is getting updated though, I don't know if this is proof that it is not a hung thread.
Any advice much appreciated. Thanks in advance.
Regards,
Ed