Thread: How to check if session is a hung thread/session

How to check if session is a hung thread/session

From
Edwin UY
Date:
Hi,

I am not sure how to explain this, but I believe this is something usually referred to in the JDBC world as a hung thread.

This is an Aurora RDS PostgreSQL database that was patched in what was supposed to be a zero-downtime operation.
After the patching, a user is complaining that a job he normally runs from an OS cron session has been failing.

When I check the database, it shows:

=> SELECT pg_postmaster_start_time(),
->        pg_postmaster_start_time() at time zone 'utc' at time zone 'Pacific/Auckland' as "Pacific/Auckland TIMEZONE",
->        date_trunc( 'second',  current_timestamp - pg_postmaster_start_time() ) as uptime;
-[ RECORD 1 ]-------------+------------------------------
pg_postmaster_start_time  | 2024-11-06 10:44:51.832663+00
Pacific/Auckland TIMEZONE | 2024-11-05 21:44:51.832663+00
uptime                    | 3 days 10:29:08

So in theory it really wasn't zero downtime. I'm not sure whether Aurora RDS takes care of this transparently, but from the output above the database was restarted, although pg_stat_activity shows the session is still there.
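
As an aside, the double AT TIME ZONE conversion above is, I believe, why the "Pacific/Auckland TIMEZONE" column still carries a +00 suffix; if I have it right, a single conversion of the timestamptz gives the local wall-clock time directly, e.g.:

SELECT pg_postmaster_start_time(),
       pg_postmaster_start_time() AT TIME ZONE 'Pacific/Auckland' AS "Pacific/Auckland local time",
       date_trunc('second', current_timestamp - pg_postmaster_start_time()) AS uptime;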

Checking pg_stat_activity

select pid as process_id,
       usename as username,
       datname as database_name,
       client_addr as client_address,
       application_name,
       now() - pg_stat_activity.query_start AS duration,
       backend_start,
       backend_start at time zone 'Pacific/Auckland' as backend_start_localtime,
       state,
       state_change,
       state_change at time zone 'Pacific/Auckland' as state_change_localtime
from pg_stat_activity
where usename = '[blah]'
order by username asc

I have the output as below:

 process_id | username  | database_name | client_address |    application_name    |    duration     |         backend_start         |  backend_start_localtime   | state |         state_change          |   state_change_localtime
------------+-----------+---------------+----------------+------------------------+-----------------+-------------------------------+----------------------------+-------+-------------------------------+----------------------------
      31947 | [blah] | [blah]         | [blah]    | PostgreSQL JDBC Driver | 00:47:21.838892 | 2024-11-06 10:44:53.309388+00 | 2024-11-06 23:44:53.309388 | idle  | 2024-11-08 02:00:06.005173+00 | 2024-11-08 15:00:06.005173
(1 row)

From the pg_stat_activity output, the backend_start has not changed for several days since the patching, so I am 'guessing' it may have gone lost/rogue already. Is there any way to check that this is the case before I kill it?
The state_change date is getting updated though, I don't know if this is proof that it is not a hung thread.

Any advice much appreciated. Thanks in advance.



Regards,
Ed

Re: How to check if session is a hung thread/session

From
Laurenz Albe
Date:
On Sun, 2024-11-10 at 10:25 +1300, Edwin UY wrote:
> I am not sure how to explain this, but I believe this is something usually referred to in the JDBC world as a hung thread.
>
> This is an Aurora RDS PostgreSQL database that was patched in what was supposed to be a zero-downtime operation.
> After the patching, a user is complaining that a job he normally runs from an OS cron session has been failing.
>
> Checking pg_stat_activity
>
> select pid as process_id,
>        usename as username,
>        datname as database_name,
>        client_addr as client_address,
>        application_name,
>        now() - pg_stat_activity.query_start AS duration,
>        backend_start,
>        backend_start at time zone 'Pacific/Auckland' as backend_start_localtime,
>        state,
>        state_change,
>        state_change at time zone 'Pacific/Auckland' as state_change_localtime
> from pg_stat_activity
> where usename = '[blah]'
> order by username asc
>
> I have the output as below:
>
>  process_id | username  | database_name | client_address |    application_name    |    duration     |         backend_start         |  backend_start_localtime   | state |         state_change          |   state_change_localtime
> ------------+-----------+---------------+----------------+------------------------+-----------------+-------------------------------+----------------------------+-------+-------------------------------+----------------------------
>       31947 | [blah] | [blah]         | [blah]    | PostgreSQL JDBC Driver | 00:47:21.838892 | 2024-11-06 10:44:53.309388+00 | 2024-11-06 23:44:53.309388 | idle  | 2024-11-08 02:00:06.005173+00 | 2024-11-08 15:00:06.005173
> (1 row)
>
> From the pg_stat_activity output, the backend_start has not changed for several days since the patching, so I am 'guessing'
> it may have gone lost/rogue already. Is there any way to check that this is the case before I kill it?
> The state_change date is getting updated though, I don't know if this is proof that it is not a hung thread.

You should look at the "state_change" column for the session to see how long it has been idle.

At any rate, an idle session is not hanging - at least not in the database.  Perhaps you got the
wrong session, or perhaps the client has got a problem.

I'd say that killing the session won't resolve any hanging thread on the client side.
Still, if you kill it, you'd at worst cause an error on the client side.
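
For example, with the pid from your pg_stat_activity output (pg_cancel_backend() would only cancel a running query, which does nothing for an idle session; on RDS you may also need the rds_superuser role to terminate sessions that are not your own):

-- 31947 is the pid shown in the pg_stat_activity output above
SELECT pg_terminate_backend(31947);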

Yours,
Laurenz Albe



Re: How to check if session is a hung thread/session

From
Edwin UY
Date:
Hi Laurenz,

Thanks for your reply. 

When you say "You should look at the "state_change" column for the session to see how long it has been idle",
do you mean comparing state_change values over time to see whether they change? I will have to script that somehow.
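
Or do you just mean now() minus state_change, i.e. a rough sketch like:

select pid,
       usename,
       state,
       now() - state_change AS time_in_current_state,
       now() - query_start  AS time_since_last_query_started
from pg_stat_activity
where usename = '[blah]';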


On Sun, Nov 10, 2024 at 6:29 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Sun, 2024-11-10 at 10:25 +1300, Edwin UY wrote:
> I am not sure how to explain this, but I believe this is something usually referred to in the JDBC world as a hung thread.
>
> This is an Aurora RDS PostgreSQL database that was patched in what was supposed to be a zero-downtime operation.
> After the patching, a user is complaining that a job he normally runs from an OS cron session has been failing.
>
> Checking pg_stat_activity
>
> select pid as process_id,
>        usename as username,
>        datname as database_name,
>        client_addr as client_address,
>        application_name,
>        now() - pg_stat_activity.query_start AS duration,
>        backend_start,
>        backend_start at time zone 'Pacific/Auckland' as backend_start_localtime,
>        state,
>        state_change,
>        state_change at time zone 'Pacific/Auckland' as state_change_localtime
> from pg_stat_activity
> where usename = '[blah]'
> order by username asc
>
> I have the output as below:
>
>  process_id | username  | database_name | client_address |    application_name    |    duration     |         backend_start         |  backend_start_localtime   | state |         state_change          |   state_change_localtime
> ------------+-----------+---------------+----------------+------------------------+-----------------+-------------------------------+----------------------------+-------+-------------------------------+----------------------------
>       31947 | [blah] | [blah]         | [blah]    | PostgreSQL JDBC Driver | 00:47:21.838892 | 2024-11-06 10:44:53.309388+00 | 2024-11-06 23:44:53.309388 | idle  | 2024-11-08 02:00:06.005173+00 | 2024-11-08 15:00:06.005173
> (1 row)
>
> From the pg_stat_activity output, the backend_start has not changed for several days since the patching, so I am 'guessing'
> it may have gone lost/rogue already. Is there any way to check that this is the case before I kill it?
> The state_change date is getting updated though, I don't know if this is proof that it is not a hung thread.

You should look at the "state_change" column for the session to see how long it has been idle.

At any rate, an idle session is not hanging - at least not in the database.  Perhaps you got the
wrong session, or perhaps the client has got a problem.

I'd say that killing the session won't resolve any hanging thread on the client side.
Still, if you kill it, you'd at worst cause an error on the client side.

Yours,
Laurenz Albe

Re: How to check if session is a hung thread/session

From
"David G. Johnston"
Date:


On Saturday, November 9, 2024, Edwin UY <edwin.uy@gmail.com> wrote:

The state_change date is getting updated though, I don't know if this is proof that it is not a hung thread.

The definition of a “hung process” is a process whose state is not changing.

David J.
 

Re: How to check if session is a hung thread/session

From
Edwin UY
Date:
Hi David,

Thanks for the clarification. 
I thought it could be the backend has sent something back to the client but it never received it and it just kept on doing the same at some intervals.


On Mon, Nov 11, 2024 at 4:44 PM David G. Johnston <david.g.johnston@gmail.com> wrote:


On Saturday, November 9, 2024, Edwin UY <edwin.uy@gmail.com> wrote:

The state_change date is getting updated though, I don't know if this is proof that it is not a hung thread.

The definition of a “hung process” is a process whose state is not changing.

David J.
 

Re: How to check if session is a hung thread/session

From
Edwin UY
Date:
Need to Google how to trace an RDS session then :( and see what SQL is getting run; the clue may be something in there.
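
If I understand it correctly, the query column in pg_stat_activity keeps the last statement even for an idle session, so something like this should at least show what that session last ran (rough sketch, using the pid from the earlier output):

select pid,
       state,
       wait_event_type,
       wait_event,
       xact_start,
       query_start,
       query
from pg_stat_activity
where pid = 31947;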

On Mon, Nov 11, 2024 at 5:08 PM Edwin UY <edwin.uy@gmail.com> wrote:
Hi David,

Thanks for the clarification. 
I thought it could be the backend has sent something back to the client but it never received it and it just kept on doing the same at some intervals.


On Mon, Nov 11, 2024 at 4:44 PM David G. Johnston <david.g.johnston@gmail.com> wrote:


On Saturday, November 9, 2024, Edwin UY <edwin.uy@gmail.com> wrote:

The state_change date is getting updated though, I don't know if this is proof that it is not a hung thread.

The definition of a “hung process” is a process whose state is not changing.

David J.
 

Re: How to check if session is a hung thread/session

From
Tom Lane
Date:
Edwin UY <edwin.uy@gmail.com> writes:
> I thought it could be the backend has sent something back to the client but
> it never received it and it just kept on doing the same at some intervals.

Your pg_stat_activity output shows the backend is idle, meaning it's
waiting for a client command.  While the session has been around for
days, we can see the last client command was about 47 minutes ago,
from your "now() - pg_stat_activity.query_start AS duration" column.
I see no reason to think there is anything interesting here at all,
except for a client that is sitting doing nothing for long periods.

            regards, tom lane



Re: How to check if session is a hung thread/session

From
Edwin UY
Date:
Thanks Tom,

OK, I've decided to 'painfully' look at the PostgreSQL RDS logs, and they show something like the below.
There seems to be a locking/deadlock issue of some sort somewhere.
I have checked the logs from the days prior to the patching, and these messages appear to be a 'normal' occurrence for this database; they weren't affecting the application.
After the patching, it started affecting the application. Not sure what else I can check on the Aurora PostgreSQL RDS end. I may request them to restart the app server.

[25751]:LOG:  process 25751 still waiting for ShareLock on transaction 114953443 after 1000.054 ms
[25751]:DETAIL:  Process holding the lock: 22297. Wait queue: 25751.
[25751]:CONTEXT:  while locking tuple (1,17) in relation "[table_name]"
[25751]:STATEMENT:  [SQL_STATEMENT] for update
[25751]:LOG:  process 25751 acquired ShareLock on transaction 114953443 after 4756.967 ms
[25751]:CONTEXT:  while locking tuple (1,17) in relation "[table_name]"


On Mon, Nov 11, 2024 at 5:45 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Edwin UY <edwin.uy@gmail.com> writes:
> I thought it could be the backend has sent something back to the client but
> it never received it and it just kept on doing the same at some intervals.

Your pg_stat_activity output shows the backend is idle, meaning it's
waiting for a client command.  While the session has been around for
days, we can see the last client command was about 47 minutes ago,
from your "now() - pg_stat_activity.query_start AS duration" column.
I see no reason to think there is anything interesting here at all,
except for a client that is sitting doing nothing for long periods.

                        regards, tom lane

RE: How to check if session is a hung thread/session

From
Michel SALAIS
Date:

Hi,

 

The log you show doesn't mean a deadlock! It comes from the deadlock_timeout parameter, whose value is 1 second by default. When a session has been waiting for a lock for longer than this threshold, a check for a deadlock condition is triggered (and, with log_lock_waits enabled, the wait is logged, which is what you see here). It simply means that the session waited longer than deadlock_timeout to obtain a lock. Still, it can be read as a performance problem if you are frequently waiting more than 1 second for a lock…
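
If you want to see who is blocking whom while it is happening, something like this should work (pg_blocking_pids() exists since PostgreSQL 9.6; just a sketch):

SELECT pid,
       pg_blocking_pids(pid) AS blocked_by,
       wait_event_type,
       wait_event,
       state,
       left(query, 60) AS query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;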

 

Regards

 

Michel SALAIS

From: Edwin UY <edwin.uy@gmail.com>
Sent: Monday, 11 November 2024 07:45
To: Tom Lane <tgl@sss.pgh.pa.us>
Cc: David G. Johnston <david.g.johnston@gmail.com>; pgsql-admin@lists.postgresql.org
Subject: Re: How to check if session is a hung thread/session

 

Thanks Tom,

 

OK, I've decided to 'painfully' look at the PostgreSQL RDS logs, and they show something like the below.

There seems to be a locking/deadlock issue of some sort somewhere.

I have checked the logs from the days prior to the patching, and these messages appear to be a 'normal' occurrence for this database; they weren't affecting the application.

After the patching, it started affecting the application. Not sure what else I can check on the Aurora PostgreSQL RDS end. I may request them to restart the app server.

 

[25751]:LOG:  process 25751 still waiting for ShareLock on transaction 114953443 after 1000.054 ms

[25751]:DETAIL:  Process holding the lock: 22297. Wait queue: 25751.

[25751]:CONTEXT:  while locking tuple (1,17) in relation "[table_name]"

[25751]:STATEMENT:  [SQL_STATEMENT] for update
[25751]:LOG:  process 25751 acquired ShareLock on transaction 114953443 after 4756.967 ms
[25751]:CONTEXT:  while locking tuple (1,17) in relation "[table_name]"

 

 

On Mon, Nov 11, 2024 at 5:45 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Edwin UY <edwin.uy@gmail.com> writes:
> I thought it could be the backend has sent something back to the client but
> it never received it and it just kept on doing the same at some intervals.

Your pg_stat_activity output shows the backend is idle, meaning it's
waiting for a client command.  While the session has been around for
days, we can see the last client command was about 47 minutes ago,
from your "now() - pg_stat_activity.query_start AS duration" column.
I see no reason to think there is anything interesting here at all,
except for a client that is sitting doing nothing for long periods.

                        regards, tom lane

Re: How to check if session is a hung thread/session

From
Edwin UY
Date:
Hi Michel,

Thanks for the explanation. Didn't know about this default setting.
It doesn't happen all the time: it is reported in the log every 2-3 hours, once or twice at most during that time, and it doesn't necessarily affect the whole application.

This table is a Mule flow sort of table, where three flows check it for things to process and then continue on. Two of them go fine and only one of them is failing intermittently.

[25751]:DETAIL:  Process holding the lock: 22297. Wait queue: 25751.
[25751]:CONTEXT:  while locking tuple (1,17) in relation "[table_name]"
[25751]:STATEMENT:  [select SQL statement] for update
[25751]:LOG:  process 25751 acquired ShareLock on transaction 114953443 after 4756.967 ms
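
One thing I may look into: since all three flows poll the same table with SELECT ... FOR UPDATE, I gather the usual pattern to stop them queuing behind each other is FOR UPDATE SKIP LOCKED, roughly like this (the table and column names below are made up for illustration):

-- hypothetical table/column names, for illustration only
select id, payload
from   mule_flow_queue
where  status = 'READY'
order  by id
limit  10
for update skip locked;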


On Tue, Nov 12, 2024 at 5:09 AM <msalais@msym.fr> wrote:

Hi,

 

The log you show doesn't mean a deadlock! It comes from the deadlock_timeout parameter, whose value is 1 second by default. When a session has been waiting for a lock for longer than this threshold, a check for a deadlock condition is triggered (and, with log_lock_waits enabled, the wait is logged, which is what you see here). It simply means that the session waited longer than deadlock_timeout to obtain a lock. Still, it can be read as a performance problem if you are frequently waiting more than 1 second for a lock…

 

Regards

 

Michel SALAIS

From: Edwin UY <edwin.uy@gmail.com>
Sent: Monday, 11 November 2024 07:45
To: Tom Lane <tgl@sss.pgh.pa.us>
Cc: David G. Johnston <david.g.johnston@gmail.com>; pgsql-admin@lists.postgresql.org
Subject: Re: How to check if session is a hung thread/session

 

Thanks Tom,

 

OK, I've decided to 'painfully' look at the PostgreSQL RDS logs, and they show something like the below.

There seems to be a locking/deadlock issue of some sort somewhere.

I have checked the logs from the days prior to the patching, and these messages appear to be a 'normal' occurrence for this database; they weren't affecting the application.

After the patching, it started affecting the application. Not sure what else I can check on the Aurora PostgreSQL RDS end. I may request them to restart the app server.

 

[25751]:LOG:  process 25751 still waiting for ShareLock on transaction 114953443 after 1000.054 ms

[25751]:DETAIL:  Process holding the lock: 22297. Wait queue: 25751.

[25751]:CONTEXT:  while locking tuple (1,17) in relation "[table_name]"

[25751]:STATEMENT:  [SQL_STATEMENT] for update
[25751]:LOG:  process 25751 acquired ShareLock on transaction 114953443 after 4756.967 ms
[25751]:CONTEXT:  while locking tuple (1,17) in relation "[table_name]"

 

 

On Mon, Nov 11, 2024 at 5:45 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Edwin UY <edwin.uy@gmail.com> writes:
> I thought it could be the backend has sent something back to the client but
> it never received it and it just kept on doing the same at some intervals.

Your pg_stat_activity output shows the backend is idle, meaning it's
waiting for a client command.  While the session has been around for
days, we can see the last client command was about 47 minutes ago,
from your "now() - pg_stat_activity.query_start AS duration" column.
I see no reason to think there is anything interesting here at all,
except for a client that is sitting doing nothing for long periods.

                        regards, tom lane