Thread: elog.c query_id support vs shutdown
Hi, In the course of [1] I again looked at all pgstat_* callsites and noticed the new calls to pgstat_get_my_query_id() in elog.c. As currently implemented those pgstat_get_my_query_id() calls are not safe. It's fine during backend startup because MyBEEntry is not set, but during shutdown that's not ok, because we never unset MyBEEntry. andres@awork3:~/src/postgresql$ /home/andres/build/postgres/dev-assert/vpath/src/backend/postgres --single postgres -D /srv/dev/pgdev-dev/-c 'log_line_prefix=%Q' -c log_min_messages=debug1 0NOTICE: database system was shut down at 2021-08-07 16:42:26 PDT 0DEBUG: checkpoint record is at 1/1C000508 0DEBUG: redo record is at 1/1C000508; shutdown true 0DEBUG: next transaction ID: 213067; next OID: 16434 0DEBUG: next MultiXactId: 1; next MultiXactOffset: 0 0DEBUG: oldest unfrozen transaction ID: 701, in database 1 0DEBUG: oldest MultiXactId: 1, in database 1 0DEBUG: commit timestamp Xid oldest/newest: 0/0 0DEBUG: transaction ID wrap limit is 2147484348, limited by database with OID 1 0DEBUG: MultiXactId wrap limit is 2147483648, limited by database with OID 1 0DEBUG: starting up replication slots 0DEBUG: MultiXactId wrap limit is 2147483648, limited by database with OID 1 0DEBUG: MultiXact member stop limit is now 4294914944 based on MultiXact 1 PostgreSQL stand-alone backend 15devel backend> 0NOTICE: shutting down 0DEBUG: performing replication slot checkpoint Segmentation fault #0 0x000055cb9b871632 in pgstat_get_my_query_id () at /home/andres/src/postgresql/src/backend/utils/activity/backend_status.c:1042 #1 0x000055cb9ba00398 in log_line_prefix (buf=0x7ffc751092e0, edata=0x55cb9bd6a900 <errordata>) at /home/andres/src/postgresql/src/backend/utils/error/elog.c:2723 #2 0x000055cb9ba00ee6 in send_message_to_server_log (edata=0x55cb9bd6a900 <errordata>) at /home/andres/src/postgresql/src/backend/utils/error/elog.c:3017 #3 0x000055cb9b9fe1b3 in EmitErrorReport () at /home/andres/src/postgresql/src/backend/utils/error/elog.c:1542 #4 0x000055cb9b9fb600 in errfinish (filename=0x55cb9bc1a0eb "miscinit.c", lineno=974, funcname=0x55cb9bc1aea0 <__func__.5>"UnlinkLockFiles") at /home/andres/src/postgresql/src/backend/utils/error/elog.c:597 #5 0x000055cb9ba0f2b4 in UnlinkLockFiles (status=0, arg=0) at /home/andres/src/postgresql/src/backend/utils/init/miscinit.c:974 #6 0x000055cb9b8148da in proc_exit_prepare (code=0) at /home/andres/src/postgresql/src/backend/storage/ipc/ipc.c:209 #7 0x000055cb9b814792 in proc_exit (code=0) at /home/andres/src/postgresql/src/backend/storage/ipc/ipc.c:107 #8 0x000055cb9b851568 in PostgresMain (argc=9, argv=0x55cb9daddec0, dbname=0x55cb9dac4f70 "postgres", username=0x55cb9dac3580"andres") at /home/andres/src/postgresql/src/backend/tcop/postgres.c:4702 #9 0x000055cb9b672c5b in main (argc=9, argv=0x55cb9daddec0) at /home/andres/src/postgresql/src/backend/main/main.c:195 I suspect that to make the elog.c usage safe, we'll have to clear MyBEEntry in pgstat_beshutdown_hook(). Greetings, Andres Freund [1] https://www.postgresql.org/message-id/20210807210349.bby5ta2xrbnte6ht%40alap3.anarazel.de
On Sat, Aug 07, 2021 at 04:44:07PM -0700, Andres Freund wrote: > > As currently implemented those pgstat_get_my_query_id() calls are not > safe. It's fine during backend startup because MyBEEntry is not set, but > during shutdown that's not ok, because we never unset MyBEEntry. > > andres@awork3:~/src/postgresql$ /home/andres/build/postgres/dev-assert/vpath/src/backend/postgres --single postgres -D/srv/dev/pgdev-dev/ -c 'log_line_prefix=%Q' -c log_min_messages=debug1 > [...] > PostgreSQL stand-alone backend 15devel > backend> 0NOTICE: shutting down > 0DEBUG: performing replication slot checkpoint > Segmentation fault Ouch > I suspect that to make the elog.c usage safe, we'll have to clear MyBEEntry in > pgstat_beshutdown_hook(). I agree, and a quick test indeed fix your scenario. It also seems like a good thing to do overall. I didn't find any other problematic corner cases, but I'm not that familiar with pgstat, especially after the recent activity.
On Sun, Aug 08, 2021 at 01:46:39PM +0800, Julien Rouhaud wrote: > On Sat, Aug 07, 2021 at 04:44:07PM -0700, Andres Freund wrote: > > > > As currently implemented those pgstat_get_my_query_id() calls are not > > safe. It's fine during backend startup because MyBEEntry is not set, but > > during shutdown that's not ok, because we never unset MyBEEntry. > > > > andres@awork3:~/src/postgresql$ /home/andres/build/postgres/dev-assert/vpath/src/backend/postgres --single postgres -D/srv/dev/pgdev-dev/ -c 'log_line_prefix=%Q' -c log_min_messages=debug1 > > [...] > > PostgreSQL stand-alone backend 15devel > > backend> 0NOTICE: shutting down > > 0DEBUG: performing replication slot checkpoint > > Segmentation fault > FTR I just added an open item for that.
Hi, On 2021-08-08 13:46:39 +0800, Julien Rouhaud wrote: > On Sat, Aug 07, 2021 at 04:44:07PM -0700, Andres Freund wrote: > > > > As currently implemented those pgstat_get_my_query_id() calls are not > > safe. It's fine during backend startup because MyBEEntry is not set, but > > during shutdown that's not ok, because we never unset MyBEEntry. > > > > andres@awork3:~/src/postgresql$ /home/andres/build/postgres/dev-assert/vpath/src/backend/postgres --single postgres -D/srv/dev/pgdev-dev/ -c 'log_line_prefix=%Q' -c log_min_messages=debug1 > > [...] > > PostgreSQL stand-alone backend 15devel > > backend> 0NOTICE: shutting down > > 0DEBUG: performing replication slot checkpoint > > Segmentation fault > > Ouch > > > I suspect that to make the elog.c usage safe, we'll have to clear MyBEEntry in > > pgstat_beshutdown_hook(). > > I agree, and a quick test indeed fix your scenario. It also seems like a good > thing to do overall. Yea, it does seem like a good thing. But we should do a search for the problems it could cause... > I didn't find any other problematic corner cases, but I'm not that familiar > with pgstat, especially after the recent activity. I don't think anything relevant to this issue has changed so far... And there shouldn't be a meaningful amount of change to backend_status.c anyway - the "what is currently happening" stuff that backend_status.c implements is mostly independent from the "what has happened so far" that pgstats.c implements. It probably would be a good idea to separate out the two namespaces more clearly. Even with things like pgstat_report_activity() not being entirely clear cut (because of the pgstat_count_conn_* calls) it still seems like it would be an improvement. But I don't want to do that before the shared memory stuff is in. Greetings, Andres Freund
Hi, On 2021-08-08 11:53:39 -0700, Andres Freund wrote: > On 2021-08-08 13:46:39 +0800, Julien Rouhaud wrote: > > > I suspect that to make the elog.c usage safe, we'll have to clear MyBEEntry in > > > pgstat_beshutdown_hook(). > > > > I agree, and a quick test indeed fix your scenario. It also seems like a good > > thing to do overall. > > Yea, it does seem like a good thing. But we should do a search for the > problems it could cause... Not a problem with unsetting MyBEEntry. But the search for problems made me reread the following comment: /* * There's no need for a lock around pgstat_begin_read_activity / * pgstat_end_read_activity here as it's only called from * pg_stat_get_activity which is already protected, or from the same * backend which means that there won't be concurrent writes. */ I don't understand the pg_stat_get_activity() part of this comment? pgstat_get_my_query_id() hardcodes MyBEEntry->st_query_id, so it can't be useful to pg_stat_get_activity(), nor is it used there? I assume it's just a remnant from an earlier iteration of the code? Greetings, Andres Freund
On Mon, Aug 9, 2021 at 3:06 AM Andres Freund <andres@anarazel.de> wrote: > > Hi, > > On 2021-08-08 11:53:39 -0700, Andres Freund wrote: > > On 2021-08-08 13:46:39 +0800, Julien Rouhaud wrote: > > > > I suspect that to make the elog.c usage safe, we'll have to clear MyBEEntry in > > > > pgstat_beshutdown_hook(). > > > > > > I agree, and a quick test indeed fix your scenario. It also seems like a good > > > thing to do overall. > > > > Yea, it does seem like a good thing. But we should do a search for the > > problems it could cause... Agreed, and I'm also looking into it. > Not a problem with unsetting MyBEEntry. But the search for problems made me > reread the following comment: > > /* > * There's no need for a lock around pgstat_begin_read_activity / > * pgstat_end_read_activity here as it's only called from > * pg_stat_get_activity which is already protected, or from the same > * backend which means that there won't be concurrent writes. > */ > > I don't understand the pg_stat_get_activity() part of this comment? > pgstat_get_my_query_id() hardcodes MyBEEntry->st_query_id, so it can't be > useful to pg_stat_get_activity(), nor is it used there? > > I assume it's just a remnant from an earlier iteration of the code? Ah indeed! This was quite a long thread so I didn't try to see when that changed. I also now realize that I made a typo in the patch where I s/loop/look/ which was then changed to s/look/lock/. The comment should be something like: /* * There's no need for a loop around pgstat_begin_read_activity / * pgstat_end_read_activity here as it's only called from the same backend * which means that there won't be concurrent writes. */
On Mon, Aug 09, 2021 at 11:29:24AM +0800, Julien Rouhaud wrote: > Ah indeed! This was quite a long thread so I didn't try to see when > that changed. I also now realize that I made a typo in the patch > where I s/loop/look/ which was then changed to s/look/lock/. The > comment should be something like: The thread of this open item is now idle for 10 days and there does not seem to be a lot of progress. Bruce, this is assigned to you. Are you planning to look at it? -- Michael
Attachment
On Thu, Aug 19, 2021 at 3:05 PM Michael Paquier <michael@paquier.xyz> wrote: > > The thread of this open item is now idle for 10 days and there does > not seem to be a lot of progress. Bruce, this is assigned to you. > Are you planning to look at it? I'm really sorry for the lack of answer on my side, I had too many duties at work to thoroughly look at the implication of clearing MyBEEntry in the pgstat_beshutdown_hook :( With what I've seen so far I didn't find any evidence that it could lead to any new problem (it should actually make things slightly safer) and couldn't hit any issue with some testing. Opinion from someone more familiar with pgstats is of course welcome. While reading the various code, I also found this incorrect comment in backend_status.c: /* exposed so that progress.c can access it */ PgBackendStatus *MyBEEntry = NULL; progress.c was apparently renamed to backend_progress.c sometime during e1025044 development (Split backend status and progress related functionality out of pgstat.c.). So I'm +1 for the fix originally suggested by Andres.
On Thu, Aug 19, 2021 at 03:29:40PM +0800, Julien Rouhaud wrote: > On Thu, Aug 19, 2021 at 3:05 PM Michael Paquier <michael@paquier.xyz> wrote: > > > > The thread of this open item is now idle for 10 days and there does > > not seem to be a lot of progress. Bruce, this is assigned to you. > > Are you planning to look at it? > > I'm really sorry for the lack of answer on my side, I had too many > duties at work to thoroughly look at the implication of clearing > MyBEEntry in the pgstat_beshutdown_hook :( > > With what I've seen so far I didn't find any evidence that it could > lead to any new problem (it should actually make things slightly > safer) and couldn't hit any issue with some testing. Opinion from > someone more familiar with pgstats is of course welcome. Okay. Would you like to write a patch then? -- Michael
Attachment
On 2021-08-19 17:05:01 +0900, Michael Paquier wrote: > Okay. Would you like to write a patch then? I'll push something in a bit... - Andres
On 2021-08-19 01:10:55 -0700, Andres Freund wrote: > On 2021-08-19 17:05:01 +0900, Michael Paquier wrote: > > Okay. Would you like to write a patch then? > > I'll push something in a bit... Done.
On Thu, Aug 19, 2021 at 8:20 PM Andres Freund <andres@anarazel.de> wrote: > > On 2021-08-19 01:10:55 -0700, Andres Freund wrote: > > On 2021-08-19 17:05:01 +0900, Michael Paquier wrote: > > > Okay. Would you like to write a patch then? > > > > I'll push something in a bit... > > Done. Thanks!
On Thu, Aug 19, 2021 at 05:20:52AM -0700, Andres Freund wrote: > On 2021-08-19 01:10:55 -0700, Andres Freund wrote: > > On 2021-08-19 17:05:01 +0900, Michael Paquier wrote: > > > Okay. Would you like to write a patch then? > > > > I'll push something in a bit... > > Done. Thanks. I was kind of wondering since there had not been a patch posted yet. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
On Thu, Aug 19, 2021 at 05:20:52AM -0700, Andres Freund wrote: > Done. Thanks. There does not seem to be anything left here, so I have marked this open item as fixed. -- Michael