Thread: Re: Change log level for notifying hot standby is waiting non-overflowed snapshot
Re: Change log level for notifying hot standby is waiting non-overflowed snapshot
From
Fujii Masao
Date:
On 2025/02/03 22:35, torikoshia wrote: > Hi, > > When a hot standby is restarted in a state where subtransactions have overflowed, it may become inaccessible: > > $ psql: error: connection to server at "localhost" (::1), port 5433 failed: FATAL: the database system is not yet acceptingconnections > DETAIL: Consistent recovery state has not been yet reached. Could you share the steps to reproduce this situation? > However, the log message that indicates the cause of this issue seems to be only output at the DEBUG1 level: > > elog(DEBUG1, > "recovery snapshot waiting for non-overflowed snapshot or " > "until oldest active xid on standby is at least %u (now %u)", > standbySnapshotPendingXmin, > running->oldestRunningXid); > > I believe this message would be useful not only for developers but also for users. Isn't this log message too difficult for most users? It seems to describe PostgreSQL's internal mechanisms, making it hard for users to understand the issue and what actions to take. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: Change log level for notifying hot standby is waiting non-overflowed snapshot
From
torikoshia
Date:
On 2025-03-03 13:10, Fujii Masao wrote: Thanks for your comments! > On 2025/02/03 22:35, torikoshia wrote: >> Hi, >> >> When a hot standby is restarted in a state where subtransactions have >> overflowed, it may become inaccessible: >> >> $ psql: error: connection to server at "localhost" (::1), port 5433 >> failed: FATAL: the database system is not yet accepting connections >> DETAIL: Consistent recovery state has not been yet >> reached. > > Could you share the steps to reproduce this situation? We can reproduce this situation using the following procedure. I performed this test with one asynchronous standby server. -- overflow subtransaction (primary)=# create table t1 (i int); (primary)=# select 'insert into t1 values (1); savepoint s_' || generate_series(1, 70) ; \gexec (primary)=# checkpoint; -- restart standby $ pg_ctl restart -D data_stb/ waiting for server to shut down.... done server stopped waiting for server to start.... LOG: redirecting log output to logging collector process ........................................................... stopped waiting pg_ctl: server did not start in time -- standby log DEBUG: recovery snapshot waiting for non-overflowed snapshot or until oldest active xid on standby is at least 887 (now 818) >> However, the log message that indicates the cause of this issue seems >> to be only output at the DEBUG1 level: >> >> elog(DEBUG1, >> "recovery snapshot waiting for non-overflowed snapshot or " >> "until oldest active xid on standby is at least %u (now %u)", >> standbySnapshotPendingXmin, >> running->oldestRunningXid); >> >> I believe this message would be useful not only for developers but >> also for users. > > Isn't this log message too difficult for most users? It seems to > describe PostgreSQL's internal mechanisms, making it hard > for users to understand the issue and what actions to take. Agreed and I feel that a message suggesting something like "check if there are any overflowing transactions on the primary side" would make it useful. On the other hand, the manual's explanation of pg_stat_get_backend_subxact() does not mention subtransaction overflow, so I am not sure how much detail should be included. -- Regards, -- Atsushi Torikoshi Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.
Re: Change log level for notifying hot standby is waiting non-overflowed snapshot
From
Fujii Masao
Date:
On 2025/03/04 0:20, torikoshia wrote: > On 2025-03-03 13:10, Fujii Masao wrote: > > Thanks for your comments! > >> On 2025/02/03 22:35, torikoshia wrote: >>> Hi, >>> >>> When a hot standby is restarted in a state where subtransactions have overflowed, it may become inaccessible: >>> >>> $ psql: error: connection to server at "localhost" (::1), port 5433 failed: FATAL: the database system is not yetaccepting connections >>> DETAIL: Consistent recovery state has not been yet reached. >> >> Could you share the steps to reproduce this situation? > > We can reproduce this situation using the following procedure. Thanks! I was able to reproduce the issue. > Agreed and I feel that a message suggesting something like "check if there are any overflowing transactions on the primaryside" would make it useful. I’m wondering if this message might still be confusing for users. Would they immediately understand what "overflowing transactions" means? Even after reading this message, it seems also unclear what actions they should take to resolve the issue. Plus, this message can appear multiple times if there are multiple overflowing transactions before starting accepting read-only connections - which could be even more confusing. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: Change log level for notifying hot standby is waiting non-overflowed snapshot
From
torikoshia
Date:
On 2025-03-04 03:17, Fujii Masao wrote: >> Agreed and I feel that a message suggesting something like "check if >> there are any overflowing transactions on the primary side" would make >> it useful. > > I’m wondering if this message might still be confusing for users. > Would they immediately understand what "overflowing transactions" > means? > Even after reading this message, it seems also unclear what actions > they should take to resolve the issue. Plus, this message can appear > multiple times if there are multiple overflowing transactions before > starting accepting read-only connections - which could be even more > confusing. It seems better to reconsider the content and timing of this message output. I personally think that logging information about this situation where subtransaction overflowed and it prevents hot standby connections would be helpful for users and support providers to understand the cause of the issue. Do you think such logging is unnecessary? -- Regards, -- Atsushi Torikoshi Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.
Re: Change log level for notifying hot standby is waiting non-overflowed snapshot
From
torikoshia
Date:
Hi, After an off-list discussion with Fujii-san, I'm now trying to modify the following message that is output when a client attempts to connect instead of changing the log level as the original proposal: $ psql: error: connection to server at "localhost" (::1), port 5433 failed: FATAL: the database system is not yet accepting connections DETAIL: Consistent recovery state has not been yet reached. I have now 2 candidates to do this. The 1st one(v1-0001-Change-log-message-when-hot-standby-is-not-access.patch) is a simple update to the existing log messages, explicitly mentioning that snapshot overflow could be a possible cause. The 2nd(v1-0001-Make-it-clear-when-hot-standby-is-inaccessible-du.patch) one introduces new states for pmState and CAC_state (which manages whether connections can be accepted) to represent waiting for a non-overflowed snapshot. The advantage of the 2nd one is that it makes it clear whether the connection failure is due to not reaching a consistent recovery state or a snapshot overflow. However, I haven't found other significant benefits, and I feel it might be overkill. Personally, I feel 1st patch may be sufficient, but I would appreciate any feedback. -- Atsushi Torikoshi Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.
Attachment
Re: Change log level for notifying hot standby is waiting non-overflowed snapshot
From
Fujii Masao
Date:
On 2025/03/12 21:57, torikoshia wrote: > Hi, > > After an off-list discussion with Fujii-san, I'm now trying to modify the following message that is output when a clientattempts to connect instead of changing the log level as the original proposal: > > $ psql: error: connection to server at "localhost" (::1), port 5433 failed: FATAL: the database system is not yet acceptingconnections > DETAIL: Consistent recovery state has not been yet reached. > > I have now 2 candidates to do this. Thanks for the patches! > The 1st one(v1-0001-Change-log-message-when-hot-standby-is-not-access.patch) is a simple update to the existing log messages,explicitly mentioning that snapshot overflow could be a possible cause. > The 2nd(v1-0001-Make-it-clear-when-hot-standby-is-inaccessible-du.patch) one introduces new states for pmState and CAC_state(which manages whether connections can be accepted) to represent waiting for a non-overflowed snapshot. > > The advantage of the 2nd one is that it makes it clear whether the connection failure is due to not reaching a consistentrecovery state or a snapshot overflow. However, I haven't found other significant benefits, and I feel it mightbe overkill. I agree that adding a new postmaster signal and state for this minor issue seems unnecessary. > Personally, I feel 1st patch may be sufficient, but I would appreciate any feedback. Agreed. - errdetail("Consistent recovery state has not been yet reached."))); + errdetail("Consistent recovery state has not been yet reached, or snappshot is pending becausesubtransaction is overflowed."), + errhint("In the latter case, find and close the transaction with more than %d subtransactions",PGPROC_MAX_CACHED_SUBXIDS))); This message might be too detailed. Instead, how about simplifying it to something like: "Consistent recovery state has not been reached, or snapshot is not ready for hot standby." We can then update the documentation to clarify that overflowed subtransactions may delay snapshot readiness for hot standby and explain how to address it. For example, the current description - "it will begin accepting connections once the recovery has brought the system to a consistent state." - should be updated to reflect this condition. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: Change log level for notifying hot standby is waiting non-overflowed snapshot
From
torikoshia
Date:
Hi, On 2025-03-21 02:15, Fujii Masao wrote: Thanks for your review! >> Personally, I feel 1st patch may be sufficient, but I would appreciate >> any feedback. > > Agreed. > > - errdetail("Consistent recovery state has not been yet > reached."))); > + errdetail("Consistent recovery state has not been yet > reached, or snappshot is pending because subtransaction is > overflowed."), > + errhint("In the latter case, find and close the transaction > with more than %d subtransactions", PGPROC_MAX_CACHED_SUBXIDS))); > > This message might be too detailed. Instead, how about simplifying it > to something like: "Consistent recovery state has not been reached, > or snapshot is not ready for hot standby." Agreed. Do you also think the errhint message is unnecessary? I agree with your idea to add a description of the overflowed subtransaction in the manual, but I'm not sure all users will be able to find it. Some people may not understand what needs to be done to make the snapshot ready for hot standby. I think adding an errhint may help those users. > We can then update the documentation to clarify that overflowed > subtransactions > may delay snapshot readiness for hot standby and explain how to address > it. > For example, the current description - "it will begin accepting > connections once > the recovery has brought the system to a consistent state." - should be > updated > to reflect this condition. -- Regards, -- Atsushi Torikoshi Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.
Re: Change log level for notifying hot standby is waiting non-overflowed snapshot
From
Fujii Masao
Date:
On 2025/03/21 21:29, torikoshia wrote: > Hi, > > On 2025-03-21 02:15, Fujii Masao wrote: > Thanks for your review! > >>> Personally, I feel 1st patch may be sufficient, but I would appreciate any feedback. >> >> Agreed. >> >> - errdetail("Consistent recovery state has not been yet reached."))); >> + errdetail("Consistent recovery state has not been yet >> reached, or snappshot is pending because subtransaction is >> overflowed."), >> + errhint("In the latter case, find and close the transaction >> with more than %d subtransactions", PGPROC_MAX_CACHED_SUBXIDS))); >> >> This message might be too detailed. Instead, how about simplifying it >> to something like: "Consistent recovery state has not been reached, >> or snapshot is not ready for hot standby." > > Agreed. > > Do you also think the errhint message is unnecessary? > I agree with your idea to add a description of the overflowed subtransaction in the manual, but I'm not sure all userswill be able to find it. > Some people may not understand what needs to be done to make the snapshot ready for hot standby. > I think adding an errhint may help those users. I see your concern that users might overlook the documentation and struggle to find a solution. However, I still believe it's better to include this information in the documentation rather than logging it as a hint. Since the scenario where the hint would be useful is relatively rare, logging it every time might be more confusing than helpful. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: Change log level for notifying hot standby is waiting non-overflowed snapshot
From
torikoshia
Date:
On 2025-03-24 00:08, Fujii Masao wrote: >> >> Do you also think the errhint message is unnecessary? >> I agree with your idea to add a description of the overflowed >> subtransaction in the manual, but I'm not sure all users will be able >> to find it. >> Some people may not understand what needs to be done to make the >> snapshot ready for hot standby. >> I think adding an errhint may help those users. > > I see your concern that users might overlook the documentation and > struggle to find a solution. However, I still believe it's better to > include this information in the documentation rather than logging it > as a hint. Since the scenario where the hint would be useful is > relatively rare, logging it every time might be more confusing than > helpful. Thanks for your opinion and it sounds reasonable. Attached an updated patch. -- Regards, -- Atsushi Torikoshi Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.
Attachment
Re: Change log level for notifying hot standby is waiting non-overflowed snapshot
From
Fujii Masao
Date:
On 2025/03/24 23:18, torikoshia wrote: > On 2025-03-24 00:08, Fujii Masao wrote: > >>> >>> Do you also think the errhint message is unnecessary? >>> I agree with your idea to add a description of the overflowed subtransaction in the manual, but I'm not sure all userswill be able to find it. >>> Some people may not understand what needs to be done to make the snapshot ready for hot standby. >>> I think adding an errhint may help those users. >> >> I see your concern that users might overlook the documentation and >> struggle to find a solution. However, I still believe it's better to >> include this information in the documentation rather than logging it >> as a hint. Since the scenario where the hint would be useful is >> relatively rare, logging it every time might be more confusing than helpful. > > Thanks for your opinion and it sounds reasonable. > > Attached an updated patch. Thanks for updating the patch! In high-availability.sgml, the "Administrator's Overview" section already describes the conditions for accepting hot standby connections. This section should also be updated accordingly. + brought the system to a consistent state. However, overflowed + subtransactions may also delay snapshot readiness for hot standby. In such + case, the issue can be resolved by closing the transaction containing the + overflowed subtransactions. All connections accepted by the hot standby + are strictly read-only; not even temporary tables may be written. It would be better to move this explanation about overflowed subtransactions to the "Administrator's Overview" section. - case CAC_NOTCONSISTENT: + case CAC_NOTCONSISTENT_OR_OVERFLOWED: This new name seems a bit too long. I'm OK to leave the name as it is. Or, something like CAC_NOTHOTSTANDBY seems simpler and better to me. Thought? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: Change log level for notifying hot standby is waiting non-overflowed snapshot
From
torikoshia
Date:
Hi, I had another off-list discussion with Fujii-san, and according to the following manual[1], it seems that a transaction with an overflowed subtransaction is already considered inconsistent: Reaching a consistent state can also be delayed in the presence of both of these conditions: - A write transaction has more than 64 subtransactions - Very long-lived write transactions IIUC, the manual suggests that both conditions must be met -- recovery reaching at least minRecoveryPoint and no overflowed subtransactions —- for the standby to be considered consistent. OTOH, the following log message is emitted even when subtransactions have overflowed, which appears to contradict the definition of consistency mentioned above: LOG: consistent recovery state reached This log message is triggered when recovery progresses beyond minRecoveryPoint(according to CheckRecoveryConsistency()). However, since this state does not satisfy 'consistency' defined in the manual, I think it would be more accurate to log that it has merely reached the "minimum recovery point". Furthermore, it may be better to emit the above log message only when recovery has progressed beyond minRecoveryPoint and there are no overflowed subtransactions. Attached patch does this. Additionally, renaming variables such as reachedConsistency in CheckRecoveryConsistency might also be appropriate. However, in the attached patch, I have left them unchanged for now. On 2025-03-25 00:55, Fujii Masao wrote: > - case CAC_NOTCONSISTENT: > + case CAC_NOTCONSISTENT_OR_OVERFLOWED: > > This new name seems a bit too long. I'm OK to leave the name as it is. > Or, something like CAC_NOTHOTSTANDBY seems simpler and better to me. Beyond just the length issue, given the understanding outlined above, I now think CAC_NOTCONSISTENT does not actually need to be changed. > In high-availability.sgml, the "Administrator's Overview" section > already > describes the conditions for accepting hot standby connections. > This section should also be updated accordingly. Agreed. I have updated this section to mention that the resolution is to close the problematic transaction. OTOH the changes made in v2 patch seem unnecessary, since the concept of 'consistent' is already explained in the "Administrator's Overview." - errdetail("Consistent recovery state has not been yet reached."))); + errdetail("Consistent recovery state has not been yet reached, or snappshot is pending because subtransaction is overflowed."), Given the above understanding, "or" is not appropriate in this context, so I left this message unchanged. Instead, I have added an errhint. The phrasing in the hint message aligns with the manual, allowing users to search for this hint and find the newly added resolution instructions. What do you think? [1] https://www.postgresql.org/docs/devel/hot-standby.html -- Regards, -- Atsushi Torikoshi Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.
Attachment
Re: Change log level for notifying hot standby is waiting non-overflowed snapshot
From
torikoshia
Date:
On 2025-03-27 11:06, torikoshia wrote: > Hi, > > I had another off-list discussion with Fujii-san, and according to the > following manual[1], it seems that a transaction with an overflowed > subtransaction is already considered inconsistent: > > Reaching a consistent state can also be delayed in the presence of > both of these conditions: > > - A write transaction has more than 64 subtransactions > - Very long-lived write transactions > > IIUC, the manual suggests that both conditions must be met -- recovery > reaching at least minRecoveryPoint and no overflowed subtransactions > —- for the standby to be considered consistent. > > OTOH, the following log message is emitted even when subtransactions > have overflowed, which appears to contradict the definition of > consistency mentioned above: > > LOG: consistent recovery state reached > > This log message is triggered when recovery progresses beyond > minRecoveryPoint(according to CheckRecoveryConsistency()). > However, since this state does not satisfy 'consistency' defined in > the manual, I think it would be more accurate to log that it has > merely reached the "minimum recovery point". > Furthermore, it may be better to emit the above log message only when > recovery has progressed beyond minRecoveryPoint and there are no > overflowed subtransactions. > > Attached patch does this. > > Additionally, renaming variables such as reachedConsistency in > CheckRecoveryConsistency might also be appropriate. > However, in the attached patch, I have left them unchanged for now. > > > On 2025-03-25 00:55, Fujii Masao wrote: >> - case CAC_NOTCONSISTENT: >> + case CAC_NOTCONSISTENT_OR_OVERFLOWED: >> >> This new name seems a bit too long. I'm OK to leave the name as it is. >> Or, something like CAC_NOTHOTSTANDBY seems simpler and better to me. > > Beyond just the length issue, given the understanding outlined above, > I now think CAC_NOTCONSISTENT does not actually need to be changed. > > >> In high-availability.sgml, the "Administrator's Overview" section >> already >> describes the conditions for accepting hot standby connections. >> This section should also be updated accordingly. > > Agreed. > I have updated this section to mention that the resolution is to close > the problematic transaction. > OTOH the changes made in v2 patch seem unnecessary, since the concept > of 'consistent' is already explained in the "Administrator's > Overview." > > > - errdetail("Consistent recovery state has not been yet > reached."))); > + errdetail("Consistent recovery state has not been yet > reached, or snappshot is pending because subtransaction is > overflowed."), > > Given the above understanding, "or" is not appropriate in this > context, so I left this message unchanged. > Instead, I have added an errhint. The phrasing in the hint message > aligns with the manual, allowing users to search for this hint and > find the newly added resolution instructions. On second thought, it may not be appropriate to show this output to clients attempting to connect. This message should be notified not to clients but to administrators. From this point of view, it'd be better to output a message indicating the status inside ProcArrayApplyRecoveryInfo(). However, a straightforward implementation would result in the same message being logged every time an XLOG_RUNNING_XACTS WAL is received, making it noisy. Instead of directly outputting a log indicating that a hot standby connection cannot be established due to subtransaction overflow, the attached patch updates the manual so that administrators can determine whether a subtransaction overflow has occurred based on the modified log output. What do you think? -- Regards, -- Atsushi Torikoshi Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.
Attachment
Re: Change log level for notifying hot standby is waiting non-overflowed snapshot
From
Fujii Masao
Date:
On 2025/03/28 0:13, torikoshia wrote: > On 2025-03-27 11:06, torikoshia wrote: >> Hi, >> >> I had another off-list discussion with Fujii-san, and according to the >> following manual[1], it seems that a transaction with an overflowed >> subtransaction is already considered inconsistent: >> >> Reaching a consistent state can also be delayed in the presence of >> both of these conditions: >> >> - A write transaction has more than 64 subtransactions >> - Very long-lived write transactions >> >> IIUC, the manual suggests that both conditions must be met -- recovery >> reaching at least minRecoveryPoint and no overflowed subtransactions >> —- for the standby to be considered consistent. >> >> OTOH, the following log message is emitted even when subtransactions >> have overflowed, which appears to contradict the definition of >> consistency mentioned above: >> >> LOG: consistent recovery state reached >> >> This log message is triggered when recovery progresses beyond >> minRecoveryPoint(according to CheckRecoveryConsistency()). >> However, since this state does not satisfy 'consistency' defined in >> the manual, I think it would be more accurate to log that it has >> merely reached the "minimum recovery point". >> Furthermore, it may be better to emit the above log message only when >> recovery has progressed beyond minRecoveryPoint and there are no >> overflowed subtransactions. >> >> Attached patch does this. >> >> Additionally, renaming variables such as reachedConsistency in >> CheckRecoveryConsistency might also be appropriate. >> However, in the attached patch, I have left them unchanged for now. >> >> >> On 2025-03-25 00:55, Fujii Masao wrote: >>> - case CAC_NOTCONSISTENT: >>> + case CAC_NOTCONSISTENT_OR_OVERFLOWED: >>> >>> This new name seems a bit too long. I'm OK to leave the name as it is. >>> Or, something like CAC_NOTHOTSTANDBY seems simpler and better to me. >> >> Beyond just the length issue, given the understanding outlined above, >> I now think CAC_NOTCONSISTENT does not actually need to be changed. >> >> >>> In high-availability.sgml, the "Administrator's Overview" section already >>> describes the conditions for accepting hot standby connections. >>> This section should also be updated accordingly. >> >> Agreed. >> I have updated this section to mention that the resolution is to close >> the problematic transaction. >> OTOH the changes made in v2 patch seem unnecessary, since the concept >> of 'consistent' is already explained in the "Administrator's >> Overview." >> >> >> - errdetail("Consistent recovery state has not been yet reached."))); >> + errdetail("Consistent recovery state has not been yet >> reached, or snappshot is pending because subtransaction is >> overflowed."), >> >> Given the above understanding, "or" is not appropriate in this >> context, so I left this message unchanged. >> Instead, I have added an errhint. The phrasing in the hint message >> aligns with the manual, allowing users to search for this hint and >> find the newly added resolution instructions. > > On second thought, it may not be appropriate to show this output to clients attempting to connect. This message shouldbe notified not to clients but to administrators. > > From this point of view, it'd be better to output a message indicating the status inside ProcArrayApplyRecoveryInfo().However, a straightforward implementation would result in the same message being logged everytime an XLOG_RUNNING_XACTS WAL is received, making it noisy. > > Instead of directly outputting a log indicating that a hot standby connection cannot be established due to subtransactionoverflow, the attached patch updates the manual so that administrators can determine whether a subtransactionoverflow has occurred based on the modified log output. > > > What do you think? I had the same thought during our off-list discussion. However, after reviewing the recovery code - such as recoveryStopsBefore(), which checks whether a consistent state is reached - I now believe the manual’s definition of a consistent state may be incorrect. A consistent state should be defined as the point where recovery has reached minRecoveryPoint. If we were to change the definition to match the manual, we would also need to update various recovery checks, which wouldn't be a trivial task. Given that, I now think it's better to revive your v1 patch, which introduces a new postmaster signal and improves the error message when connections are not accepted during hot standby. I've attached a revised version of the patch based on your v1. Thought? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Attachment
Re: Change log level for notifying hot standby is waiting non-overflowed snapshot
From
torikoshia
Date:
On 2025-03-31 12:51, Fujii Masao wrote: > I had the same thought during our off-list discussion. However, > after reviewing the recovery code - such as recoveryStopsBefore(), > which checks whether a consistent state is reached - I now believe > the manual’s definition of a consistent state may be incorrect. > A consistent state should be defined as the point where recovery > has reached minRecoveryPoint. I now agree with you. > If we were to change the definition to match the manual, > we would also need to update various recovery checks, > which wouldn't be a trivial task. > > Given that, I now think it's better to revive your v1 patch, > which introduces a new postmaster signal and improves the error message > when connections are not accepted during hot standby. I've attached > a revised version of the patch based on your v1. Thought? Thank you for writing the patch! Here are some comments on the documentation. The following description in high-availability.sgml also seems to misuse the word 'consistent': When the <xref linkend="guc-hot-standby"/> parameter is set to true on a standby server, it will begin accepting connections once the recovery has brought the system to a consistent state. Since this is part of the "User's Overview" section, it may not be appropriate to include too much detail. How about rewording it to avoid using 'consistent', for example: When the <xref linkend="guc-hot-standby"/> parameter is set to true on a standby server, it will begin accepting connections once it is ready. + delaying accepting read-only connections. To enable hot standby, + a long-lived write transaction with more than 64 subtransactions + needs to be closed on the primary. Is it better to use 'transactions' in the plural form rather than as a nominal? - There may be more than one such transaction. - The <itemizedlist> below also uses the plural form. - The newly added message also uses the plural form: + errhint("To enable hot standby, close write transactions with more than %d subtransactions on the primary server." What do you think? -- Regards, -- Atsushi Torikoshi Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.
Re: Change log level for notifying hot standby is waiting non-overflowed snapshot
From
Yugo Nagata
Date:
On Mon, 31 Mar 2025 12:51:06 +0900 Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > > On 2025/03/28 0:13, torikoshia wrote: > > On 2025-03-27 11:06, torikoshia wrote: > >> Hi, > >> > >> I had another off-list discussion with Fujii-san, and according to the > >> following manual[1], it seems that a transaction with an overflowed > >> subtransaction is already considered inconsistent: > >> > >> Reaching a consistent state can also be delayed in the presence of > >> both of these conditions: > >> > >> - A write transaction has more than 64 subtransactions > >> - Very long-lived write transactions > >> > >> IIUC, the manual suggests that both conditions must be met -- recovery > >> reaching at least minRecoveryPoint and no overflowed subtransactions > >> —- for the standby to be considered consistent. > >> > >> OTOH, the following log message is emitted even when subtransactions > >> have overflowed, which appears to contradict the definition of > >> consistency mentioned above: > >> > >> LOG: consistent recovery state reached > >> > >> This log message is triggered when recovery progresses beyond > >> minRecoveryPoint(according to CheckRecoveryConsistency()). > >> However, since this state does not satisfy 'consistency' defined in > >> the manual, I think it would be more accurate to log that it has > >> merely reached the "minimum recovery point". > >> Furthermore, it may be better to emit the above log message only when > >> recovery has progressed beyond minRecoveryPoint and there are no > >> overflowed subtransactions. > >> > >> Attached patch does this. > >> > >> Additionally, renaming variables such as reachedConsistency in > >> CheckRecoveryConsistency might also be appropriate. > >> However, in the attached patch, I have left them unchanged for now. > >> > >> > >> On 2025-03-25 00:55, Fujii Masao wrote: > >>> - case CAC_NOTCONSISTENT: > >>> + case CAC_NOTCONSISTENT_OR_OVERFLOWED: > >>> > >>> This new name seems a bit too long. I'm OK to leave the name as it is. > >>> Or, something like CAC_NOTHOTSTANDBY seems simpler and better to me. > >> > >> Beyond just the length issue, given the understanding outlined above, > >> I now think CAC_NOTCONSISTENT does not actually need to be changed. > >> > >> > >>> In high-availability.sgml, the "Administrator's Overview" section already > >>> describes the conditions for accepting hot standby connections. > >>> This section should also be updated accordingly. > >> > >> Agreed. > >> I have updated this section to mention that the resolution is to close > >> the problematic transaction. > >> OTOH the changes made in v2 patch seem unnecessary, since the concept > >> of 'consistent' is already explained in the "Administrator's > >> Overview." > >> > >> > >> - errdetail("Consistent recovery state has not been yet reached."))); > >> + errdetail("Consistent recovery state has not been yet > >> reached, or snappshot is pending because subtransaction is > >> overflowed."), > >> > >> Given the above understanding, "or" is not appropriate in this > >> context, so I left this message unchanged. > >> Instead, I have added an errhint. The phrasing in the hint message > >> aligns with the manual, allowing users to search for this hint and > >> find the newly added resolution instructions. > > > > On second thought, it may not be appropriate to show this output to clients attempting to connect. This message shouldbe notified not to clients but to administrators. > > > > From this point of view, it'd be better to output a message indicating the status inside ProcArrayApplyRecoveryInfo().However, a straightforward implementation would result in the same message being logged everytime an XLOG_RUNNING_XACTS WAL is received, making it noisy. > > > > Instead of directly outputting a log indicating that a hot standby connection cannot be established due to subtransactionoverflow, the attached patch updates the manual so that administrators can determine whether a subtransactionoverflow has occurred based on the modified log output. > > > > > > What do you think? > > I had the same thought during our off-list discussion. However, > after reviewing the recovery code - such as recoveryStopsBefore(), > which checks whether a consistent state is reached - I now believe > the manual’s definition of a consistent state may be incorrect. > A consistent state should be defined as the point where recovery > has reached minRecoveryPoint. > > If we were to change the definition to match the manual, > we would also need to update various recovery checks, > which wouldn't be a trivial task. > > Given that, I now think it's better to revive your v1 patch, > which introduces a new postmaster signal and improves the error message > when connections are not accepted during hot standby. I've attached > a revised version of the patch based on your v1. Thought? I prefer this approach clarifying that consistency and subtransaction overflow are separate concepts in the documentation. Here are minor comments on the patch: - case CAC_NOTCONSISTENT: - if (EnableHotStandby) + case CAC_NOTHOTSTANDBY: + if (!EnableHotStandby) ereport(FATAL, (errcode(ERRCODE_CANNOT_CONNECT_NOW), errmsg("the database system is not yet accepting connections"), - errdetail("Consistent recovery state has not been yet reached."))); + errdetail("Hot standby mode is disabled."))); + else if (reachedConsistency) + ereport(FATAL, + (errcode(ERRCODE_CANNOT_CONNECT_NOW), + errmsg("the database system is not accepting connections"), + errdetail("Recovery snapshot is not yet ready for hot standby."), + errhint("To enable hot standby, close write transactions with more than %d subtransactions onthe primary server.", + PGPROC_MAX_CACHED_SUBXIDS))); else ereport(FATAL, (errcode(ERRCODE_CANNOT_CONNECT_NOW), errmsg("the database system is not accepting connections"), - errdetail("Hot standby mode is disabled."))); + errdetail("Consistent recovery state has not been yet reached."))); The message says "the database system is not yet accepting connections" when "Hot standby mode is disabled". I think "yet" is not necessary in this case. Otherwise, when "Recovery snapshot is not yet ready for hot standby" or "Consistent recovery state has not been yet reached", it seems better to use "yet" Regards, Yugo Nagata -- Yugo Nagata <nagata@sraoss.co.jp>
Re: Change log level for notifying hot standby is waiting non-overflowed snapshot
From
Fujii Masao
Date:
On 2025/03/31 22:44, torikoshia wrote: > Here are some comments on the documentation. Thanks for the review! > The following description in high-availability.sgml also seems to misuse the word 'consistent': > > When the <xref linkend="guc-hot-standby"/> parameter is set to true on a > standby server, it will begin accepting connections once the recovery has > brought the system to a consistent state. > > Since this is part of the "User's Overview" section, it may not be appropriate to include too much detail. > How about rewording it to avoid using 'consistent', for example: > > When the <xref linkend="guc-hot-standby"/> parameter is set to true on a > standby server, it will begin accepting connections once it is ready. "once it is ready." feels too vague to me. How about using "once recovery has brought the system to a consistent state and be ready for hot standby." instead? > + delaying accepting read-only connections. To enable hot standby, > + a long-lived write transaction with more than 64 subtransactions > + needs to be closed on the primary. > > Is it better to use 'transactions' in the plural form rather than as a nominal? > > - There may be more than one such transaction. > - The <itemizedlist> below also uses the plural form. > - The newly added message also uses the plural form: > + errhint("To enable hot standby, close write transactions with more than %d subtransactions on the primary server." > > > What do you think? I'm not sure whether the plural form is better here, but I've updated the patch as suggested. Attached is the revised version. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Attachment
Re: Change log level for notifying hot standby is waiting non-overflowed snapshot
From
Fujii Masao
Date:
On 2025/03/31 22:45, Yugo Nagata wrote: > I prefer this approach clarifying that consistency and subtransaction overflow > are separate concepts in the documentation. > > Here are minor comments on the patch: Thanks for the review! > - case CAC_NOTCONSISTENT: > - if (EnableHotStandby) > + case CAC_NOTHOTSTANDBY: > + if (!EnableHotStandby) > ereport(FATAL, > (errcode(ERRCODE_CANNOT_CONNECT_NOW), > errmsg("the database system is not yet accepting connections"), > - errdetail("Consistent recovery state has not been yet reached."))); > + errdetail("Hot standby mode is disabled."))); > + else if (reachedConsistency) > + ereport(FATAL, > + (errcode(ERRCODE_CANNOT_CONNECT_NOW), > + errmsg("the database system is not accepting connections"), > + errdetail("Recovery snapshot is not yet ready for hot standby."), > + errhint("To enable hot standby, close write transactions with more than %d subtransactionson the primary server.", > + PGPROC_MAX_CACHED_SUBXIDS))); > else > ereport(FATAL, > (errcode(ERRCODE_CANNOT_CONNECT_NOW), > errmsg("the database system is not accepting connections"), > - errdetail("Hot standby mode is disabled."))); > + errdetail("Consistent recovery state has not been yet reached."))); > > The message says "the database system is not yet accepting connections" when "Hot standby mode is disabled". > I think "yet" is not necessary in this case. Otherwise, when "Recovery snapshot is not yet ready for hot standby" > or "Consistent recovery state has not been yet reached", it seems better to use "yet" I may have unintentionally modified the error message. I fixed the patch as suggested. Please check the latest patch I posted earlier in response to Torikoshi-san. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: Change log level for notifying hot standby is waiting non-overflowed snapshot
From
torikoshia
Date:
On 2025-04-01 01:12, Fujii Masao wrote: > On 2025/03/31 22:45, Yugo Nagata wrote: >> I prefer this approach clarifying that consistency and subtransaction >> overflow >> are separate concepts in the documentation. >> >> Here are minor comments on the patch: > > Thanks for the review! > > >> - case CAC_NOTCONSISTENT: >> - if (EnableHotStandby) >> + case CAC_NOTHOTSTANDBY: >> + if (!EnableHotStandby) >> ereport(FATAL, >> (errcode(ERRCODE_CANNOT_CONNECT_NOW), >> errmsg("the database system is not yet >> accepting connections"), >> - errdetail("Consistent recovery state has >> not been yet reached."))); >> + errdetail("Hot standby mode is >> disabled."))); >> + else if (reachedConsistency) >> + ereport(FATAL, >> + (errcode(ERRCODE_CANNOT_CONNECT_NOW), >> + errmsg("the database system is not >> accepting connections"), >> + errdetail("Recovery snapshot is not yet >> ready for hot standby."), >> + errhint("To enable hot standby, close >> write transactions with more than %d subtransactions on the primary >> server.", >> + PGPROC_MAX_CACHED_SUBXIDS))); >> else >> ereport(FATAL, >> (errcode(ERRCODE_CANNOT_CONNECT_NOW), >> errmsg("the database system is not >> accepting connections"), >> - errdetail("Hot standby mode is >> disabled."))); >> + errdetail("Consistent recovery state has >> not been yet reached."))); >> >> The message says "the database system is not yet accepting >> connections" when "Hot standby mode is disabled". >> I think "yet" is not necessary in this case. Otherwise, when "Recovery >> snapshot is not yet ready for hot standby" >> or "Consistent recovery state has not been yet reached", it seems >> better to use "yet" > > I may have unintentionally modified the error message. > I fixed the patch as suggested. Please check the latest patch > I posted earlier in response to Torikoshi-san. Thank you for updating the patch! LGTM. I feel like changing the status to 'Ready for Committer', but since Nagata-san may have additional comments, I'm leaving it as 'Needs Review'. -- Regards, -- Atsushi Torikoshi Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.
Re: Change log level for notifying hot standby is waiting non-overflowed snapshot
From
Fujii Masao
Date:
On 2025/04/01 20:54, torikoshia wrote: > Thank you for updating the patch! > > LGTM. I've pushed the patch. Thanks! > I feel like changing the status to 'Ready for Committer', but since Nagata-san may have additional comments, I'm leavingit as 'Needs Review'. If any issues arise, let's continue to address them. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: Change log level for notifying hot standby is waiting non-overflowed snapshot
From
torikoshia
Date:
On 2025-04-02 15:21, Fujii Masao wrote: > On 2025/04/01 20:54, torikoshia wrote: >> Thank you for updating the patch! >> >> LGTM. > > I've pushed the patch. Thanks! Thanks a lot! >> I feel like changing the status to 'Ready for Committer', but since >> Nagata-san may have additional comments, I'm leaving it as 'Needs >> Review'. > > If any issues arise, let's continue to address them. OK. -- Regards, -- Atsushi Torikoshi Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.