Thread: BUG #10123: Weird entries in pg_stat_activity
The following bug has been logged on the website: Bug reference: 10123 Logged by: Maxim Boguk Email address: maxim.boguk@gmail.com PostgreSQL version: 9.2.8 Operating system: Linux Description: Hi everyone, During debug of the weird nagios database alerts about long transactions I found that sometime pg_stat_activity contains very weird entries with xact_start NOT NULL and state='idle'... what's more they happened during backend idle state. To check what's going on I performed the following actions on production database: 1)set log_min_duration_statement=0 to see ALL database queries 2)run single line bash script to check for weird entries: while true; do psql -x -X -c "select now() as ts,now()-xact_start as tx_age, now()-state_change as change_age,* from pg_stat_activity where state='idle' and xact_start is not null" | grep -vE '^$|No rows'; sleep 0.1; done And yes, once-twice-trice per minute script output some interesting results such as: -[ RECORD 1 ]----+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ts | 2014-04-24 13:49:03.498354+04 tx_age | 00:04:01.283106 change_age | 00:04:01.279373 datid | 16385 datname | ******* pid | 685 usesysid | 16384 usename | ******* application_name | client_addr | 127.0.0.1 client_hostname | client_port | 37461 backend_start | 2014-04-24 13:30:03.275711+04 xact_start | 2014-04-24 13:45:02.215248+04 query_start | 2014-04-24 13:45:02.215248+04 state_change | 2014-04-24 13:45:02.218981+04 waiting | f state | idle query | | SELECT cache.save_response( | ***** | ); State=idle xact_start=4 min ago Now what I found in the full database log for this pid: ... 2014-04-24 13:45:02 MSK 685 ****@**** from 127.0.0.1 [vxid:24/0 txid:0] [SELECT] LOG: duration: 3.703 ms statement: SELECT cache.save_response(...) -- no queries between 13:45 and 13:50 at all! 2014-04-24 13:50:01 MSK 685 ****@**** from 127.0.0.1 [vxid:24/0 txid:0] [SELECT] LOG: duration: 0.962 ms statement: select 1 2014-04-24 13:50:01 MSK 685 ****@**** from 127.0.0.1 [vxid:24/0 txid:0] [SELECT] LOG: duration: 0.415 ms statement: SELECT * FROM o_queries_queue ... So it seems that sometime pg_stat_activity entry become alive with the old information (it shows xact_start from the last executed query in backend, even if that query finished long time ago). I know about asynchronous behavior of pg_stat_activity but getting xact_start from deep past seems wrong for me. What's more: i tested that problem on another databases - and found they have same effect (so it doesn't look like a problem local to that single exact database). Kind Regards, Maksym
On Thu, Apr 24, 2014 at 12:18 PM, <maxim.boguk@gmail.com> wrote: > The following bug has been logged on the website: > > Bug reference: 10123 > Logged by: Maxim Boguk > Email address: maxim.boguk@gmail.com > PostgreSQL version: 9.2.8 > Operating system: Linux > Description: > > Hi everyone, > > During debug of the weird nagios database alerts about long transactions = I > found that sometime pg_stat_activity contains very weird entries with > xact_start NOT NULL and state=3D'idle'... what's more they happened durin= g > backend idle state. > > To check what's going on I performed the following actions on production > database: > 1)set log_min_duration_statement=3D0 to see ALL database queries > 2)run single line bash script to check for weird entries: > while true; do psql -x -X -c "select now() as ts,now()-xact_start as > tx_age, now()-state_change as change_age,* from pg_stat_activity where > state=3D'idle' and xact_start is not null" | grep -vE '^$|No rows'; slee= p > 0.1; done > > And yes, once-twice-trice per minute script output some interesting resul= ts > such as: > > > -[ RECORD 1 > > ]----+-------------------------------------------------------------------= ---------------------------------------------------------------------------= ---------------------------------------------------------------------------= --------------------------------------------------- > ts | 2014-04-24 13:49:03.498354+04 > tx_age | 00:04:01.283106 > change_age | 00:04:01.279373 > datid | 16385 > datname | ******* > pid | 685 > usesysid | 16384 > usename | ******* > application_name | > client_addr | 127.0.0.1 > client_hostname | > client_port | 37461 > backend_start | 2014-04-24 13:30:03.275711+04 > xact_start | 2014-04-24 13:45:02.215248+04 > query_start | 2014-04-24 13:45:02.215248+04 > state_change | 2014-04-24 13:45:02.218981+04 > waiting | f > state | idle > query | > | SELECT cache.save_response( > | ***** > | ); > > > State=3Didle > xact_start=3D4 min ago > > Now what I found in the full database log for this pid: > > ... > > 2014-04-24 13:45:02 MSK 685 ****@**** from 127.0.0.1 [vxid:24/0 txid:0] > [SELECT] LOG: duration: 3.703 ms statement: > SELECT cache.save_response(...) > > -- no queries between 13:45 and 13:50 at all! > > 2014-04-24 13:50:01 MSK 685 ****@**** from 127.0.0.1 [vxid:24/0 txid:0] > [SELECT] LOG: duration: 0.962 ms statement: select 1 > 2014-04-24 13:50:01 MSK 685 ****@**** from 127.0.0.1 [vxid:24/0 txid:0] > [SELECT] LOG: duration: 0.415 ms statement: SELECT * FROM o_queries_que= ue > ... > > So it seems that sometime pg_stat_activity entry become alive with the ol= d > information (it shows xact_start from the last executed query in backend, > even if that query finished long time ago). > I know about asynchronous behavior of pg_stat_activity but getting > xact_start from deep past seems wrong for me. > > What's more: i tested that problem on another databases - and found they > have same effect (so it doesn't look like a problem local to that single > exact database). > This sounds like the expected behavior. When state=3DIdle, the query field shows the value of the *last* query that ran. See http://www.postgresql.org/docs/9.3/static/monitoring-stats.html#PG-STAT-ACT= IVITY-VIEW . Note that this was changed in 9.2, and at the same time the current_query field was renamed to query to highlight this change. The fact that xact_start shows something far in the past just means that there is an open transaction that has been open since then. --=20 Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
On 2014-04-24 12:26:09 +0200, Magnus Hagander wrote: > On Thu, Apr 24, 2014 at 12:18 PM, <maxim.boguk@gmail.com> wrote: > > ]----+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > > ts | 2014-04-24 13:49:03.498354+04 > > tx_age | 00:04:01.283106 > > change_age | 00:04:01.279373 > > datid | 16385 > > datname | ******* > > pid | 685 > > usesysid | 16384 > > usename | ******* > > application_name | > > client_addr | 127.0.0.1 > > client_hostname | > > client_port | 37461 > > backend_start | 2014-04-24 13:30:03.275711+04 > > xact_start | 2014-04-24 13:45:02.215248+04 > > query_start | 2014-04-24 13:45:02.215248+04 > > state_change | 2014-04-24 13:45:02.218981+04 > > waiting | f > > state | idle > > query | > > | SELECT cache.save_response( > > | ***** > > | ); > This sounds like the expected behavior. When state=Idle, the query field > shows the value of the *last* query that ran. See > http://www.postgresql.org/docs/9.3/static/monitoring-stats.html#PG-STAT-ACTIVITY-VIEW > . > > Note that this was changed in 9.2, and at the same time the current_query > field was renamed to query to highlight this change. > > The fact that xact_start shows something far in the past just means that > there is an open transaction that has been open since then. I don't think that's true. In that case the state should have been idle in transaction, no? I think there's a pgstat_report_xact_timestamp(0) missing from xact.c:PrepareTransaction(). There's no point in reporting a prepared xact's starttime in a backend that's not associated with it anymore. Maxim, are you using prepared transactions? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Apr 24, 2014 at 12:44 PM, Andres Freund <andres@2ndquadrant.com>wro= te: > On 2014-04-24 12:26:09 +0200, Magnus Hagander wrote: > > On Thu, Apr 24, 2014 at 12:18 PM, <maxim.boguk@gmail.com> wrote: > > > > ]----+-------------------------------------------------------------------= ---------------------------------------------------------------------------= ---------------------------------------------------------------------------= --------------------------------------------------- > > > ts | 2014-04-24 13:49:03.498354+04 > > > tx_age | 00:04:01.283106 > > > change_age | 00:04:01.279373 > > > datid | 16385 > > > datname | ******* > > > pid | 685 > > > usesysid | 16384 > > > usename | ******* > > > application_name | > > > client_addr | 127.0.0.1 > > > client_hostname | > > > client_port | 37461 > > > backend_start | 2014-04-24 13:30:03.275711+04 > > > xact_start | 2014-04-24 13:45:02.215248+04 > > > query_start | 2014-04-24 13:45:02.215248+04 > > > state_change | 2014-04-24 13:45:02.218981+04 > > > waiting | f > > > state | idle > > > query | > > > | SELECT cache.save_response( > > > | ***** > > > | ); > > > This sounds like the expected behavior. When state=3DIdle, the query fi= eld > > shows the value of the *last* query that ran. See > > > http://www.postgresql.org/docs/9.3/static/monitoring-stats.html#PG-STAT-A= CTIVITY-VIEW > > . > > > > Note that this was changed in 9.2, and at the same time the current_que= ry > > field was renamed to query to highlight this change. > > > > The fact that xact_start shows something far in the past just means tha= t > > there is an open transaction that has been open since then. > > I don't think that's true. In that case the state should have been idle > in transaction, no? > Oh, good point. Clearly I was too cofused on the "query" field. Sorry about that, and thanks for spotting it! I think there's a pgstat_report_xact_timestamp(0) missing from > xact.c:PrepareTransaction(). There's no point in reporting a prepared > xact's starttime in a backend that's not associated with it anymore. > Maxim, are you using prepared transactions? > --=20 Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
On Thu, Apr 24, 2014 at 8:44 PM, Andres Freund <andres@2ndquadrant.com>wrot= e: > On 2014-04-24 12:26:09 +0200, Magnus Hagander wrote: > > On Thu, Apr 24, 2014 at 12:18 PM, <maxim.boguk@gmail.com> wrote: > > > > ]----+-------------------------------------------------------------------= ---------------------------------------------------------------------------= ---------------------------------------------------------------------------= --------------------------------------------------- > > > ts | 2014-04-24 13:49:03.498354+04 > > > tx_age | 00:04:01.283106 > > > change_age | 00:04:01.279373 > > > datid | 16385 > > > datname | ******* > > > pid | 685 > > > usesysid | 16384 > > > usename | ******* > > > application_name | > > > client_addr | 127.0.0.1 > > > client_hostname | > > > client_port | 37461 > > > backend_start | 2014-04-24 13:30:03.275711+04 > > > xact_start | 2014-04-24 13:45:02.215248+04 > > > query_start | 2014-04-24 13:45:02.215248+04 > > > state_change | 2014-04-24 13:45:02.218981+04 > > > waiting | f > > > state | idle > > > query | > > > | SELECT cache.save_response( > > > | ***** > > > | ); > > > This sounds like the expected behavior. When state=3DIdle, the query fi= eld > > shows the value of the *last* query that ran. See > > > http://www.postgresql.org/docs/9.3/static/monitoring-stats.html#PG-STAT-A= CTIVITY-VIEW > > . > > > > Note that this was changed in 9.2, and at the same time the current_que= ry > > field was renamed to query to highlight this change. > > > > The fact that xact_start shows something far in the past just means tha= t > > there is an open transaction that has been open since then. > > I don't think that's true. In that case the state should have been idle > in transaction, no? > > I think there's a pgstat_report_xact_timestamp(0) missing from > xact.c:PrepareTransaction(). There's no point in reporting a prepared > xact's starttime in a backend that's not associated with it anymore. > > Maxim, are you using prepared transactions? > > No, project isn't using prepared transactions. I absolutely sure. postgres=3D# show max_prepared_transactions; max_prepared_transactions --------------------------- 0 What strange - that exact pg_stat_activity entry stayed with xact_start NULL for 4 minute after the last query for this pid finished, and after that somehow got the live xact_start again (and in the next check iteration after 0.1s xact_start become NULL again for that pid). tx_age can vary, i seen anything between 1 second and 30 minutes... all related to idle backends (so state=3D'idle' is correct). Kind Regards, Maksym
Andres Freund <andres@2ndquadrant.com> writes: > I think there's a pgstat_report_xact_timestamp(0) missing from > xact.c:PrepareTransaction(). There's no point in reporting a prepared > xact's starttime in a backend that's not associated with it anymore. > Maxim, are you using prepared transactions? Even if he were, that wouldn't explain this report, because the displayed latest-query would have to have been a PREPARE TRANSACTION, no? However, regardless of exactly what's going on here, I think you're probably right that PrepareTransaction ought to do pgstat_report_xact_timestamp(0) once it exits the transaction state locally. The definition of that field is that it's null whenever the session isn't inside a transaction, and once we do PREPARE, we aren't. regards, tom lane
On 2014-04-24 12:37:15 -0400, Tom Lane wrote: > Andres Freund <andres@2ndquadrant.com> writes: > > I think there's a pgstat_report_xact_timestamp(0) missing from > > xact.c:PrepareTransaction(). There's no point in reporting a prepared > > xact's starttime in a backend that's not associated with it anymore. > > > Maxim, are you using prepared transactions? > > Even if he were, that wouldn't explain this report, because the displayed > latest-query would have to have been a PREPARE TRANSACTION, no? Right. I think there's some other contents that could potentially end up there, but surely not the SELECT Maxim is reporting. Especially not as there seem to be several minutes inbetween the pg_stat_get_activity call and the last state change... > However, regardless of exactly what's going on here, I think you're > probably right that PrepareTransaction ought to do > pgstat_report_xact_timestamp(0) once it exits the transaction state > locally. The definition of that field is that it's null whenever the > session isn't inside a transaction, and once we do PREPARE, we aren't. Thanks for fixing it. I don't really have a clue what's going on here. I don't believe pgstat.c's changecount thing is 100% safe, but that doesn't explain the problem here. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
maxim.boguk@gmail.com writes: > During debug of the weird nagios database alerts about long transactions I > found that sometime pg_stat_activity contains very weird entries with > xact_start NOT NULL and state='idle'... > ... > What's more: i tested that problem on another databases - and found they > have same effect (so it doesn't look like a problem local to that single > exact database). So can you provide a self-contained test case for this? I tried to reproduce it here without much success, and I don't see a code path other than PREPARE TRANSACTION that could lead to this. regards, tom lane
On Fri, Apr 25, 2014 at 4:38 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > maxim.boguk@gmail.com writes: > > During debug of the weird nagios database alerts about long transactions > I > > found that sometime pg_stat_activity contains very weird entries with > > xact_start NOT NULL and state='idle'... > > ... > > What's more: i tested that problem on another databases - and found they > > have same effect (so it doesn't look like a problem local to that single > > exact database). > > So can you provide a self-contained test case for this? I tried to > reproduce it here without much success, and I don't see a code path > other than PREPARE TRANSACTION that could lead to this. > > regards, tom lane > I going to work on test case over weekend. Initial quick and dirty attempts to reproduce outside of complicated production environments provided no results yet. What I almost sure now that reproducing of the bug requires a mix of very active connections to the database (with a lot of requests per second) with semi-idle connections. And that transient problem affects only these semi-idle connections. I will provide an update when I get more information. Kind Regards, Maksym
On Fri, Apr 25, 2014 at 11:04 AM, Maxim Boguk <maxim.boguk@gmail.com> wrote: > > > > On Fri, Apr 25, 2014 at 4:38 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> maxim.boguk@gmail.com writes: >> > During debug of the weird nagios database alerts about long >> transactions I >> > found that sometime pg_stat_activity contains very weird entries with >> > xact_start NOT NULL and state='idle'... >> > ... >> > What's more: i tested that problem on another databases - and found they >> > have same effect (so it doesn't look like a problem local to that single >> > exact database). >> >> So can you provide a self-contained test case for this? I tried to >> reproduce it here without much success, and I don't see a code path >> other than PREPARE TRANSACTION that could lead to this. >> >> regards, tom lane >> > > I going to work on test case over weekend. > Initial quick and dirty attempts to reproduce outside of complicated > production environments provided no results yet. > > What I almost sure now that reproducing of the bug requires a mix of very > active connections to the database (with a lot of requests per second) with > semi-idle connections. > And that transient problem affects only these semi-idle connections. > > I will provide an update when I get more information. > > Kind Regards, > Maksym > Unfortunately even if problem can be seen in at least 10 different project and db-servers, creating reproducible test case had not been successful. So I decided work with gdb on the real database. At this moment I dig only to the pgstat.c : pgstat_read_current_status(void) function. gdb: set print pretty on b pgstat.c:2796 if ((localentry->st_procpid>0) && (localentry->st_xact_start_timestamp>0) && (localentry->st_state==STATE_IDLE)) cont 2796 line in 9.3.4 source is pgstat_read_current_status function just after memcpy(localentry, (char *) beentry, sizeof(PgBackendStatus)); After short time running: select pg_sleep(0.01); select now() as ts,now()-xact_start as tx_age, now()-state_change as change_age,* from pg_stat_activity where state='idle' and xact_start is not null; in loop in process under gdb I got conditional breakpoint fired with the next results: (gdb) cont Continuing. Breakpoint 1, pgstat_read_current_status () at /tmp/buildd/postgresql-9.3-9.3.4/build/../src/backend/postmaster/pgstat.c:2796 2796 strcpy(localappname, (char *) beentry->st_appname); (gdb) p *localentry $1 = { st_changecount = 1222944, st_procpid = 16215, st_proc_start_timestamp = 454324143169490, st_xact_start_timestamp = 454324206165363, st_activity_start_timestamp = 454324206165363, st_state_start_timestamp = 454324206165650, st_databaseid = 16423, st_userid = 25078444, st_clientaddr = { addr = { ss_family = 2, __ss_align = 0, __ss_padding = '\000' <repeats 111 times> }, salen = 16 }, st_clienthostname = 0x7fac4344fbe0 "", st_waiting = 0 '\000', st_state = STATE_IDLE, st_appname = 0x7fac624d8ff0 "", st_activity = 0x7fac624e4bb0 "SELECT \"last_host_hour_accumulation_flag\".\"id\", \"last_host_hour_accumulation_flag\".\"day\", \"last_host_hour_accumulation_flag\".\"hour\", \"last_host_hour_accumulation_flag\".\"type_id\", \"last_host_hour_accum"... } Please note that the: st_xact_start_timestamp = 454324206165363, st_activity_start_timestamp = 454324206165363, and in the same time: st_state = STATE_IDLE On the next iteration after 0.01 second st_proc_start_timestamp and st_activity_start_timestamp returned to 0 again. Now I have no good idea how to perform the future debug because BackendStatusArray is shared memory structure with every backend writing information into it (so I can't set a conditional breakpoint in specific process) Ideas welcome. Kind Regards, Maksym
On Thu, Apr 24, 2014 at 7:19 PM <maxim.boguk@gmail.com> wrote: > > The following bug has been logged on the website: > > Bug reference: 10123 > Logged by: Maxim Boguk > Email address: maxim.boguk@gmail.com > PostgreSQL version: 9.2.8 > Operating system: Linux > Description: > > Hi everyone, > > During debug of the weird nagios database alerts about long transactions I > found that sometime pg_stat_activity contains very weird entries with > xact_start NOT NULL and state='idle'... what's more they happened during > backend idle state. > > To check what's going on I performed the following actions on production > database: > 1)set log_min_duration_statement=0 to see ALL database queries > 2)run single line bash script to check for weird entries: > while true; do psql -x -X -c "select now() as ts,now()-xact_start as > tx_age, now()-state_change as change_age,* from pg_stat_activity where > state='idle' and xact_start is not null" | grep -vE '^$|No rows'; sleep > 0.1; done > > And yes, once-twice-trice per minute script output some interesting results > such as: > > > -[ RECORD 1 > ]----+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > ts | 2014-04-24 13:49:03.498354+04 > tx_age | 00:04:01.283106 > change_age | 00:04:01.279373 > datid | 16385 > datname | ******* > pid | 685 > usesysid | 16384 > usename | ******* > application_name | > client_addr | 127.0.0.1 > client_hostname | > client_port | 37461 > backend_start | 2014-04-24 13:30:03.275711+04 > xact_start | 2014-04-24 13:45:02.215248+04 > query_start | 2014-04-24 13:45:02.215248+04 > state_change | 2014-04-24 13:45:02.218981+04 > waiting | f > state | idle > query | > | SELECT cache.save_response( > | ***** > | ); > > > State=idle > xact_start=4 min ago > > Now what I found in the full database log for this pid: > > ... > > 2014-04-24 13:45:02 MSK 685 ****@**** from 127.0.0.1 [vxid:24/0 txid:0] > [SELECT] LOG: duration: 3.703 ms statement: > SELECT cache.save_response(...) > > -- no queries between 13:45 and 13:50 at all! > > 2014-04-24 13:50:01 MSK 685 ****@**** from 127.0.0.1 [vxid:24/0 txid:0] > [SELECT] LOG: duration: 0.962 ms statement: select 1 > 2014-04-24 13:50:01 MSK 685 ****@**** from 127.0.0.1 [vxid:24/0 txid:0] > [SELECT] LOG: duration: 0.415 ms statement: SELECT * FROM o_queries_queue > ... > > So it seems that sometime pg_stat_activity entry become alive with the old > information (it shows xact_start from the last executed query in backend, > even if that query finished long time ago). > I know about asynchronous behavior of pg_stat_activity but getting > xact_start from deep past seems wrong for me. > > What's more: i tested that problem on another databases - and found they > have same effect (so it doesn't look like a problem local to that single > exact database). Sorry for replying to a very old thread. My customer also observed weird entries in pg_stat_activity, with state="idle" and xact_start showing an old timestamp. I figured out this issue occurs if catchup interrupts arrive in "idle" state and ProcessCatchupInterrupt() is called. ProcessCatchupInterrupt() starts and completes the transaction, as follows, without changing the state (it remains "idle"). However, StartTransactionCommand() sets xact_start to the timestamp of the last query executed, not the current timestamp. If pg_stat_activity is read during this brief transaction, the weird entries (state="idle" and xact_start="old timestamp") can appear. else { elog(DEBUG4, "ProcessCatchupEvent outside transaction"); StartTransactionCommand(); CommitTransactionCommand(); } Similarly, RemoveTempRelationsCallback() can cause the same issue. When a backend exits and RemoveTempRelationsCallback() is called, it also starts and completes a transaction, as follows, without changing the state. This sets xact_start to the timestamp of the last query executed, too. if (OidIsValid(myTempNamespace)) /* should always be true */ { /* Need to ensure we have a usable transaction. */ AbortOutOfAnyTransaction(); StartTransactionCommand(); PushActiveSnapshot(GetTransactionSnapshot()); RemoveTempRelations(myTempNamespace); PopActiveSnapshot(); CommitTransactionCommand(); } Additionally, exec_describe_statement_message() and exec_describe_portal_message() can cause pg_stat_activity records with state="idle" but xact_start!=NULL. However, these cases call SetCurrentStatementStartTimestamp(), setting xact_start to the current timestamp instead of an old one. I'm not sure if these are bugs. However, to prevent confusion from these odd pg_stat_activity entries, I suggest the following changes: 1. Modify ProcessCatchupInterrupt() and RemoveTempRelationsCallback() to reset stmtStartTimestamp (timestamp of the last query executed) to 0 before starting the transaction. This way, during the transaction, records with state="idle" and xact_start=NULL would be observed. 2. Update exec_describe_statement_message() and exec_describe_portal_message() to change the state to "active," similar to exec_simple_query(). This would result in records with state="active" and xact_start="current timestamp" during those operations. Thoughts? Regards, -- Fujii Masao
On 2024/07/04 0:14, Fujii Masao wrote: > I'm not sure if these are bugs. However, to prevent confusion from > these odd pg_stat_activity entries, I suggest the following changes: > > 1. Modify ProcessCatchupInterrupt() and RemoveTempRelationsCallback() > to reset stmtStartTimestamp (timestamp of the last query > executed) to 0 > before starting the transaction. This way, during the transaction, > records with state="idle" and xact_start=NULL would be observed. > > 2. Update exec_describe_statement_message() and > exec_describe_portal_message() > to change the state to "active," similar to exec_simple_query(). > This would result in records with state="active" and > xact_start="current timestamp" > during those operations. Attached patch does these things. I will register it into the next CF. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Attachment
Hi, On 2024-07-04 14:54:01 +0900, Fujii Masao wrote: > @@ -515,6 +515,17 @@ ProcessClientReadInterrupt(bool blocked) > /* Check for general interrupts that arrived before/while reading */ > CHECK_FOR_INTERRUPTS(); > > + /* > + * Reset statement_timestamp() to 0 so that ProcessCatchupInterrupt() > + * and ProcessNotifyInterrupt() will set pg_stat_activity.xact_start > + * to NULL when they start new transaction. Otherwise, xact_start > + * could be set to statement_timestamp() that may indicate the > + * timestamp of the last query executed, while the backend's state > + * remains "idle," leading to incorrect "idle" entries with non-NULL > + * xact_start in pg_stat_activity. > + */ > + ResetCurrentStatementStartTimestamp(); > + > /* Process sinval catchup interrupts, if any */ > if (catchupInterruptPending) > ProcessCatchupInterrupt(); This seems odd to me. ProcessClientReadInterrupt() can be called a lot, why do we need to reset the statement start timestamp even if nothing is happening here? > @@ -2598,6 +2609,8 @@ exec_describe_statement_message(const char *stmt_name) > { > CachedPlanSource *psrc; > > + pgstat_report_activity(STATE_RUNNING, NULL); > + > /* > * Start up a transaction command. (Note that this will normally change > * current memory context.) Nothing happens if we are already in one. > @@ -2692,6 +2705,8 @@ exec_describe_portal_message(const char *portal_name) > { > Portal portal; > > + pgstat_report_activity(STATE_RUNNING, NULL); > + > /* > * Start up a transaction command. (Note that this will normally change > * current memory context.) Nothing happens if we are already in one. Calls to pgstat_report_activity() aren't exactly free. This substantially increases the number of calls to it for common workloads. There got to be a more targeted way of dealing with this. Greetings, Andres Freund
On 2024/07/05 7:03, Andres Freund wrote: > Calls to pgstat_report_activity() aren't exactly free. This substantially > increases the number of calls to it for common workloads. There got to be a > more targeted way of dealing with this. Yes. So, as an alternative approach, how about setting xact_start and query_start to NULL whenever the state is idle, since non-NULL values aren't expected in that case? This would prevent the weired pg_stat_activity entries I mentioned earlier, with minimal performance impact. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On Fri, 22 Nov 2024 at 17:30, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
Patch attached.
+1 I think, this is much better approach than try to reset timestamp.
* Don't expose transaction time for walsenders; it confuses
* monitoring, particularly because we don't keep the time up-to-
* date.
+ *
+ * Also, don't show transaction time for backends in the "idle"
+ * state. There are cases, like during "Describe" message
+ * handling, removing temporary relations at exit, or processing
+ * client read interrupts, where the backend remains "idle" but
+ * still sets transaction time. This can lead to incorrect "idle"
+ * entries with non-NULL transaction times in pg_stat_activity. To
+ * prevent these misleading entries, avoid exposing transaction
+ * time for idle backends.
*/
if (beentry->st_xact_start_timestamp != 0 &&
- beentry->st_backendType != B_WAL_SENDER)
+ beentry->st_backendType != B_WAL_SENDER &&
+ (beentry->st_state != STATE_IDLE ||
+ beentry->st_backendType != B_BACKEND))
* monitoring, particularly because we don't keep the time up-to-
* date.
+ *
+ * Also, don't show transaction time for backends in the "idle"
+ * state. There are cases, like during "Describe" message
+ * handling, removing temporary relations at exit, or processing
+ * client read interrupts, where the backend remains "idle" but
+ * still sets transaction time. This can lead to incorrect "idle"
+ * entries with non-NULL transaction times in pg_stat_activity. To
+ * prevent these misleading entries, avoid exposing transaction
+ * time for idle backends.
*/
if (beentry->st_xact_start_timestamp != 0 &&
- beentry->st_backendType != B_WAL_SENDER)
+ beentry->st_backendType != B_WAL_SENDER &&
+ (beentry->st_state != STATE_IDLE ||
+ beentry->st_backendType != B_BACKEND))
One thing here is not 100% clear for me. As comment above stated, we don't expose transaction time for walsenders, since it confuses monitoring. That is clear.
- if (beentry->st_activity_start_timestamp != 0)
+ /*
+ * Don't expose query start time for idle backends for the same
+ * reasons mentioned above regarding transaction time.
+ */
+ if (beentry->st_activity_start_timestamp != 0 &&
+ (beentry->st_state != STATE_IDLE ||
+ beentry->st_backendType != B_BACKEND))
+ /*
+ * Don't expose query start time for idle backends for the same
+ * reasons mentioned above regarding transaction time.
+ */
+ if (beentry->st_activity_start_timestamp != 0 &&
+ (beentry->st_state != STATE_IDLE ||
+ beentry->st_backendType != B_BACKEND))
But here the comment stated "for the same reasons mentioned above" but the condition is different. AFAICS, the walsender backend is the only one to be affected. It's quite possible I miss something, but I expected to have, let's say, alike conditions in this particular cases.
--
Best regards,
Maxim Orlov.