Thread: How to check if session is a hung thread/session
select pg_postmaster_start_time(),
       pg_postmaster_start_time() at time zone 'utc' at time zone 'Pacific/Auckland' as "Pacific/Auckland TIMEZONE",
       date_trunc('second', current_timestamp - pg_postmaster_start_time()) as uptime;
-[ RECORD 1 ]-------------+------------------------------
pg_postmaster_start_time  | 2024-11-06 10:44:51.832663+00
Pacific/Auckland TIMEZONE | 2024-11-05 21:44:51.832663+00
uptime                    | 3 days 10:29:08
select pid as process_id,
       usename as username,
       datname as database_name,
       client_addr as client_address,
       application_name,
       now() - pg_stat_activity.query_start AS duration,
       backend_start,
       backend_start at time zone 'Pacific/Auckland' as backend_start_localtime,
       state,
       state_change,
       state_change at time zone 'Pacific/Auckland' as state_change_localtime
from pg_stat_activity
where usename = '[blah]'
order by username asc;

 process_id | username | database_name | client_address | application_name       | duration        | backend_start                 | backend_start_localtime    | state | state_change                  | state_change_localtime
------------+----------+---------------+----------------+------------------------+-----------------+-------------------------------+----------------------------+-------+-------------------------------+----------------------------
      31947 | [blah]   | [blah]        | [blah]         | PostgreSQL JDBC Driver | 00:47:21.838892 | 2024-11-06 10:44:53.309388+00 | 2024-11-06 23:44:53.309388 | idle  | 2024-11-08 02:00:06.005173+00 | 2024-11-08 15:00:06.005173
(1 row)
On Sun, 2024-11-10 at 10:25 +1300, Edwin UY wrote:
> I am not sure how to explain this, but I believe this is something usually referred to in the JDBC world as a hung thread.
>
> This is an Aurora RDS PostgreSQL database that was patched in a supposed to be zero downtime kind of thing.
> After the patching, I have a user complaining that he has a job that normally gets run in the OS cron session that has been failing.
>
> Checking pg_stat_activity
>
> select pid as process_id,
> usename as username,
> datname as database_name,
> client_addr as client_address,
> application_name,
> now() - pg_stat_activity.query_start AS duration,
> backend_start,
> backend_start at time zone 'Pacific/Auckland' as backend_start_localtime,
> state,
> state_change,
> state_change at time zone 'Pacific/Auckland' as state_change_localtime
> from pg_stat_activity
> where usename = '[blah]'
> order by username asc
>
> I have the output as below:
>
> process_id | username | database_name | client_address | application_name | duration | backend_start | backend_start_localtime | state | state_change | state_change_localtime
> ------------+-----------+---------------+----------------+------------------------+-----------------+-------------------------------+----------------------------+-------+-------------------------------+----------------------------
> 31947 | [blah] | [blah] | [blah] | PostgreSQL JDBC Driver | 00:47:21.838892 | 2024-11-06 10:44:53.309388+00 | 2024-11-06 23:44:53.309388 | idle | 2024-11-08 02:00:06.005173+00 | 2024-11-08 15:00:06.005173
> (1 row)
>
> From the pg_stat_output, the backend_start has not changed for several days since the patching, so I am 'guessing'
> it may have gone lost/rogue already. Is there any way to check that this is the case before I kill it.
> The state_change date is getting updated though, I don't know if this is proof that it is not a hung thread.
You should look at the "state_change" column for the session to see how long it has been idle.
At any rate, an idle session is not hanging - at least not in the database. Perhaps you got the
wrong session, or perhaps the client has got a problem.
I'd say that killing the session won't resolve any hanging thread on the client side.
Still, if you kill it, you'd at worst cause an error on the client side.
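If it helps, here is a rough sketch (untested, and the role name is just a placeholder) that shows how long each of that user's sessions has been sitting in its current state, and how you could terminate one if you decide to:

select pid,
       usename,
       state,
       now() - state_change as time_in_current_state
from pg_stat_activity
where usename = '[blah]'
order by time_in_current_state desc;

-- and, only if you decide to get rid of the session:
-- select pg_terminate_backend(31947);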
Yours,
Laurenz Albe
On Saturday, November 9, 2024, Edwin UY <edwin.uy@gmail.com> wrote:
> The state_change date is getting updated though, I don't know if this is proof that it is not a hung thread.
The definition of a “hung process” is a process whose state is not changing.
David J.
Hi David,
Thanks for the clarification.
I thought it could be that the backend had sent something back to the client, but the client never received it and just kept doing the same at some intervals.
On Mon, Nov 11, 2024 at 4:44 PM David G. Johnston <david.g.johnston@gmail.com> wrote:
> On Saturday, November 9, 2024, Edwin UY <edwin.uy@gmail.com> wrote:
> > The state_change date is getting updated though, I don't know if this is proof that it is not a hung thread.
> The definition of a “hung process” is a process whose state is not changing.
> David J.
Edwin UY <edwin.uy@gmail.com> writes:
> I thought it could be the backend has sent something back to the client but
> it never received it and it just kept on doing the same at some intervals.
Your pg_stat_activity output shows the backend is idle, meaning it's
waiting for a client command. While the session has been around for
days, we can see the last client command was about 47 minutes ago,
from your "now() - pg_stat_activity.query_start AS duration" column.
I see no reason to think there is anything interesting here at all,
except for a client that is sitting doing nothing for long periods.
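If you want to see both numbers side by side, something along these lines would do (just a sketch, with the role name as a placeholder):

select pid,
       state,
       now() - query_start  as since_last_query_started,
       now() - state_change as time_in_current_state
from pg_stat_activity
where usename = '[blah]';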
regards, tom lane
Hi,
The log you show doesn’t indicate a deadlock. It comes from the deadlock_timeout parameter, whose default value is 1 second. When a session has been waiting for a lock for longer than this threshold, a check is triggered to see whether we are in a deadlock condition; the message simply means the session waited longer than deadlock_timeout to obtain the lock. It could still be read as a performance problem if sessions are frequently waiting more than 1 second for a lock…
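If you want to see who is blocking whom while such a wait is in progress, a query along these lines can help (just a sketch; pg_blocking_pids() is available since PostgreSQL 9.6):

show deadlock_timeout;

select w.pid                   as waiting_pid,
       pg_blocking_pids(w.pid) as blocked_by,
       now() - w.query_start   as waiting_for,
       w.query                 as waiting_query
from pg_stat_activity w
where cardinality(pg_blocking_pids(w.pid)) > 0;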
Regards
Michel SALAIS
From: Edwin UY <edwin.uy@gmail.com>
Sent: Monday, November 11, 2024 07:45
To: Tom Lane <tgl@sss.pgh.pa.us>
Cc: David G. Johnston <david.g.johnston@gmail.com>; pgsql-admin@lists.postgresql.org
Subject: Re: How to check if session is a hung thread/session
Thanks Tom,
OK, I've decided to 'painfully' look through the PostgreSQL RDS logs, and they show something like the lines below.
There seems to be a locking/deadlock issue of some sort somewhere.
I checked the logs from the days before the patching, and these messages appear to be a 'normal' occurrence for this database; they weren't affecting the application then.
After the patching, it started affecting the application. Not sure what else I can check on the Aurora PostgreSQL RDS end. I may ask them to restart the app server.
[25751]:LOG: process 25751 still waiting for ShareLock on transaction 114953443 after 1000.054 ms
[25751]:DETAIL: Process holding the lock: 22297. Wait queue: 25751.
[25751]:CONTEXT: while locking tuple (1,17) in relation "[table_name]"
[25751]:STATEMENT: [SQL_STATEMENT] for update
[25751]:LOG: process 25751 acquired ShareLock on transaction 114953443 after 4756.967 ms
[25751]:CONTEXT: while locking tuple (1,17) in relation " [table_name] "
On Mon, Nov 11, 2024 at 5:45 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Edwin UY <edwin.uy@gmail.com> writes:
> I thought it could be the backend has sent something back to the client but
> it never received it and it just kept on doing the same at some intervals.
Your pg_stat_activity output shows the backend is idle, meaning it's
waiting for a client command. While the session has been around for
days, we can see the last client command was about 47 minutes ago,
from your "now() - pg_stat_activity.query_start AS duration" column.
I see no reason to think there is anything interesting here at all,
except for a client that is sitting doing nothing for long periods.
regards, tom lane