Thread: Re: Change log level for notifying hot standby is waiting non-overflowed snapshot
Re: Change log level for notifying hot standby is waiting non-overflowed snapshot
From
Fujii Masao
Date:
On 2025/02/03 22:35, torikoshia wrote: > Hi, > > When a hot standby is restarted in a state where subtransactions have overflowed, it may become inaccessible: > > $ psql: error: connection to server at "localhost" (::1), port 5433 failed: FATAL: the database system is not yet acceptingconnections > DETAIL: Consistent recovery state has not been yet reached. Could you share the steps to reproduce this situation? > However, the log message that indicates the cause of this issue seems to be only output at the DEBUG1 level: > > elog(DEBUG1, > "recovery snapshot waiting for non-overflowed snapshot or " > "until oldest active xid on standby is at least %u (now %u)", > standbySnapshotPendingXmin, > running->oldestRunningXid); > > I believe this message would be useful not only for developers but also for users. Isn't this log message too difficult for most users? It seems to describe PostgreSQL's internal mechanisms, making it hard for users to understand the issue and what actions to take. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: Change log level for notifying hot standby is waiting non-overflowed snapshot
From
torikoshia
Date:
On 2025-03-03 13:10, Fujii Masao wrote: Thanks for your comments! > On 2025/02/03 22:35, torikoshia wrote: >> Hi, >> >> When a hot standby is restarted in a state where subtransactions have >> overflowed, it may become inaccessible: >> >> $ psql: error: connection to server at "localhost" (::1), port 5433 >> failed: FATAL: the database system is not yet accepting connections >> DETAIL: Consistent recovery state has not been yet >> reached. > > Could you share the steps to reproduce this situation? We can reproduce this situation using the following procedure. I performed this test with one asynchronous standby server. -- overflow subtransaction (primary)=# create table t1 (i int); (primary)=# select 'insert into t1 values (1); savepoint s_' || generate_series(1, 70) ; \gexec (primary)=# checkpoint; -- restart standby $ pg_ctl restart -D data_stb/ waiting for server to shut down.... done server stopped waiting for server to start.... LOG: redirecting log output to logging collector process ........................................................... stopped waiting pg_ctl: server did not start in time -- standby log DEBUG: recovery snapshot waiting for non-overflowed snapshot or until oldest active xid on standby is at least 887 (now 818) >> However, the log message that indicates the cause of this issue seems >> to be only output at the DEBUG1 level: >> >> elog(DEBUG1, >> "recovery snapshot waiting for non-overflowed snapshot or " >> "until oldest active xid on standby is at least %u (now %u)", >> standbySnapshotPendingXmin, >> running->oldestRunningXid); >> >> I believe this message would be useful not only for developers but >> also for users. > > Isn't this log message too difficult for most users? It seems to > describe PostgreSQL's internal mechanisms, making it hard > for users to understand the issue and what actions to take. Agreed and I feel that a message suggesting something like "check if there are any overflowing transactions on the primary side" would make it useful. On the other hand, the manual's explanation of pg_stat_get_backend_subxact() does not mention subtransaction overflow, so I am not sure how much detail should be included. -- Regards, -- Atsushi Torikoshi Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.
Re: Change log level for notifying hot standby is waiting non-overflowed snapshot
From
Fujii Masao
Date:
On 2025/03/04 0:20, torikoshia wrote: > On 2025-03-03 13:10, Fujii Masao wrote: > > Thanks for your comments! > >> On 2025/02/03 22:35, torikoshia wrote: >>> Hi, >>> >>> When a hot standby is restarted in a state where subtransactions have overflowed, it may become inaccessible: >>> >>> $ psql: error: connection to server at "localhost" (::1), port 5433 failed: FATAL: the database system is not yetaccepting connections >>> DETAIL: Consistent recovery state has not been yet reached. >> >> Could you share the steps to reproduce this situation? > > We can reproduce this situation using the following procedure. Thanks! I was able to reproduce the issue. > Agreed and I feel that a message suggesting something like "check if there are any overflowing transactions on the primaryside" would make it useful. I’m wondering if this message might still be confusing for users. Would they immediately understand what "overflowing transactions" means? Even after reading this message, it seems also unclear what actions they should take to resolve the issue. Plus, this message can appear multiple times if there are multiple overflowing transactions before starting accepting read-only connections - which could be even more confusing. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: Change log level for notifying hot standby is waiting non-overflowed snapshot
From
torikoshia
Date:
On 2025-03-04 03:17, Fujii Masao wrote: >> Agreed and I feel that a message suggesting something like "check if >> there are any overflowing transactions on the primary side" would make >> it useful. > > I’m wondering if this message might still be confusing for users. > Would they immediately understand what "overflowing transactions" > means? > Even after reading this message, it seems also unclear what actions > they should take to resolve the issue. Plus, this message can appear > multiple times if there are multiple overflowing transactions before > starting accepting read-only connections - which could be even more > confusing. It seems better to reconsider the content and timing of this message output. I personally think that logging information about this situation where subtransaction overflowed and it prevents hot standby connections would be helpful for users and support providers to understand the cause of the issue. Do you think such logging is unnecessary? -- Regards, -- Atsushi Torikoshi Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.