Thread: Re: Change log level for notifying hot standby is waiting non-overflowed snapshot


On 2025/02/03 22:35, torikoshia wrote:
> Hi,
> 
> When a hot standby is restarted in a state where subtransactions have overflowed, it may become inaccessible:
> 
>    $ psql: error: connection to server at "localhost" (::1), port 5433 failed: FATAL:  the database system is not yet
acceptingconnections
 
>            DETAIL:  Consistent recovery state has not been yet reached.

Could you share the steps to reproduce this situation?


> However, the log message that indicates the cause of this issue seems to be only output at the DEBUG1 level:
> 
>    elog(DEBUG1,
>         "recovery snapshot waiting for non-overflowed snapshot or "
>         "until oldest active xid on standby is at least %u (now %u)",
>         standbySnapshotPendingXmin,
>         running->oldestRunningXid);
> 
> I believe this message would be useful not only for developers but also for users.

Isn't this log message too difficult for most users? It seems to
describe PostgreSQL's internal mechanisms, making it hard
for users to understand the issue and what actions to take.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




On 2025-03-03 13:10, Fujii Masao wrote:

Thanks for your comments!

> On 2025/02/03 22:35, torikoshia wrote:
>> Hi,
>> 
>> When a hot standby is restarted in a state where subtransactions have 
>> overflowed, it may become inaccessible:
>> 
>>    $ psql: error: connection to server at "localhost" (::1), port 5433 
>> failed: FATAL:  the database system is not yet accepting connections
>>            DETAIL:  Consistent recovery state has not been yet 
>> reached.
> 
> Could you share the steps to reproduce this situation?

We can reproduce this situation using the following procedure.
I performed this test with one asynchronous standby server.

-- overflow subtransaction
(primary)=# create table t1 (i int);
(primary)=# select 'insert into t1 values (1); savepoint s_' || 
generate_series(1, 70) ; \gexec
(primary)=# checkpoint;

-- restart standby
$ pg_ctl restart -D data_stb/
waiting for server to shut down.... done
server stopped
waiting for server to start.... LOG:  redirecting log output to logging 
collector process
........................................................... stopped 
waiting
pg_ctl: server did not start in time

-- standby log
DEBUG:  recovery snapshot waiting for non-overflowed snapshot or until 
oldest active xid on standby is at least 887 (now 818)


>> However, the log message that indicates the cause of this issue seems 
>> to be only output at the DEBUG1 level:
>> 
>>    elog(DEBUG1,
>>         "recovery snapshot waiting for non-overflowed snapshot or "
>>         "until oldest active xid on standby is at least %u (now %u)",
>>         standbySnapshotPendingXmin,
>>         running->oldestRunningXid);
>> 
>> I believe this message would be useful not only for developers but 
>> also for users.
> 
> Isn't this log message too difficult for most users? It seems to
> describe PostgreSQL's internal mechanisms, making it hard
> for users to understand the issue and what actions to take.

Agreed and I feel that a message suggesting something like "check if 
there are any overflowing transactions on the primary side" would make 
it useful.
On the other hand, the manual's explanation of 
pg_stat_get_backend_subxact() does not mention subtransaction overflow, 
so I am not sure how much detail should be included.


-- 
Regards,

--
Atsushi Torikoshi
Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.




On 2025/03/04 0:20, torikoshia wrote:
> On 2025-03-03 13:10, Fujii Masao wrote:
> 
> Thanks for your comments!
> 
>> On 2025/02/03 22:35, torikoshia wrote:
>>> Hi,
>>>
>>> When a hot standby is restarted in a state where subtransactions have overflowed, it may become inaccessible:
>>>
>>>    $ psql: error: connection to server at "localhost" (::1), port 5433 failed: FATAL:  the database system is not
yetaccepting connections
 
>>>            DETAIL:  Consistent recovery state has not been yet reached.
>>
>> Could you share the steps to reproduce this situation?
> 
> We can reproduce this situation using the following procedure.

Thanks! I was able to reproduce the issue.


> Agreed and I feel that a message suggesting something like "check if there are any overflowing transactions on the
primaryside" would make it useful.
 

I’m wondering if this message might still be confusing for users.
Would they immediately understand what "overflowing transactions" means?
Even after reading this message, it seems also unclear what actions
they should take to resolve the issue. Plus, this message can appear
multiple times if there are multiple overflowing transactions before
starting accepting read-only connections - which could be even more confusing.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




On 2025-03-04 03:17, Fujii Masao wrote:

>> Agreed and I feel that a message suggesting something like "check if 
>> there are any overflowing transactions on the primary side" would make 
>> it useful.
> 
> I’m wondering if this message might still be confusing for users.
> Would they immediately understand what "overflowing transactions" 
> means?
> Even after reading this message, it seems also unclear what actions
> they should take to resolve the issue. Plus, this message can appear
> multiple times if there are multiple overflowing transactions before
> starting accepting read-only connections - which could be even more 
> confusing.

It seems better to reconsider the content and timing of this message 
output.

I personally think that logging information about this situation where 
subtransaction overflowed and it prevents hot standby connections would 
be helpful for users and support providers to understand the cause of 
the issue.
Do you think such logging is unnecessary?


-- 
Regards,

--
Atsushi Torikoshi
Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.



Hi,

After an off-list discussion with Fujii-san, I'm now trying to modify 
the following message that is output when a client attempts to connect 
instead of changing the log level as the original proposal:

   $ psql: error: connection to server at "localhost" (::1), port 5433 
failed: FATAL:  the database system is not yet accepting connections
     DETAIL:  Consistent recovery state has not been yet reached.

I have now 2 candidates to do this.

The 1st 
one(v1-0001-Change-log-message-when-hot-standby-is-not-access.patch) is 
a simple update to the existing log messages, explicitly mentioning that 
snapshot overflow could be a possible cause.
The 2nd(v1-0001-Make-it-clear-when-hot-standby-is-inaccessible-du.patch) 
one introduces new states for pmState and CAC_state (which manages 
whether connections can be accepted) to represent waiting for a 
non-overflowed snapshot.

The advantage of the 2nd one is that it makes it clear whether the 
connection failure is due to not reaching a consistent recovery state or 
a snapshot overflow. However, I haven't found other significant 
benefits, and I feel it might be overkill.

Personally, I feel 1st patch may be sufficient, but I would appreciate 
any feedback.



--
Atsushi Torikoshi
Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.
Attachment

On 2025/03/12 21:57, torikoshia wrote:
> Hi,
> 
> After an off-list discussion with Fujii-san, I'm now trying to modify the following message that is output when a
clientattempts to connect instead of changing the log level as the original proposal:
 
> 
>    $ psql: error: connection to server at "localhost" (::1), port 5433 failed: FATAL:  the database system is not yet
acceptingconnections
 
>      DETAIL:  Consistent recovery state has not been yet reached.
> 
> I have now 2 candidates to do this.

Thanks for the patches!


> The 1st one(v1-0001-Change-log-message-when-hot-standby-is-not-access.patch) is a simple update to the existing log
messages,explicitly mentioning that snapshot overflow could be a possible cause.
 
> The 2nd(v1-0001-Make-it-clear-when-hot-standby-is-inaccessible-du.patch) one introduces new states for pmState and
CAC_state(which manages whether connections can be accepted) to represent waiting for a non-overflowed snapshot.
 
> 
> The advantage of the 2nd one is that it makes it clear whether the connection failure is due to not reaching a
consistentrecovery state or a snapshot overflow. However, I haven't found other significant benefits, and I feel it
mightbe overkill.
 

I agree that adding a new postmaster signal and state for
this minor issue seems unnecessary.


> Personally, I feel 1st patch may be sufficient, but I would appreciate any feedback.

Agreed.

-                             errdetail("Consistent recovery state has not been yet reached.")));
+                             errdetail("Consistent recovery state has not been yet reached, or snappshot is pending
becausesubtransaction is overflowed."),
 
+                             errhint("In the latter case, find and close the transaction with more than %d
subtransactions",PGPROC_MAX_CACHED_SUBXIDS)));
 

This message might be too detailed. Instead, how about simplifying it
to something like: "Consistent recovery state has not been reached,
or snapshot is not ready for hot standby."

We can then update the documentation to clarify that overflowed subtransactions
may delay snapshot readiness for hot standby and explain how to address it.
For example, the current description - "it will begin accepting connections once
the recovery has brought the system to a consistent state." - should be updated
to reflect this condition.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Hi,

On 2025-03-21 02:15, Fujii Masao wrote:
Thanks for your review!

>> Personally, I feel 1st patch may be sufficient, but I would appreciate 
>> any feedback.
> 
> Agreed.
> 
> -                             errdetail("Consistent recovery state has not been yet 
> reached.")));
> +                             errdetail("Consistent recovery state has not been yet
> reached, or snappshot is pending because subtransaction is
> overflowed."),
> +                             errhint("In the latter case, find and close the transaction
> with more than %d subtransactions", PGPROC_MAX_CACHED_SUBXIDS)));
> 
> This message might be too detailed. Instead, how about simplifying it
> to something like: "Consistent recovery state has not been reached,
> or snapshot is not ready for hot standby."

Agreed.

Do you also think the errhint message is unnecessary?
I agree with your idea to add a description of the overflowed 
subtransaction in the manual, but I'm not sure all users will be able to 
find it.
Some people may not understand what needs to be done to make the 
snapshot ready for hot standby.
I think adding an errhint may help those users.

> We can then update the documentation to clarify that overflowed 
> subtransactions
> may delay snapshot readiness for hot standby and explain how to address 
> it.
> For example, the current description - "it will begin accepting 
> connections once
> the recovery has brought the system to a consistent state." - should be 
> updated
> to reflect this condition.


-- 
Regards,

--
Atsushi Torikoshi
Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.




On 2025/03/21 21:29, torikoshia wrote:
> Hi,
> 
> On 2025-03-21 02:15, Fujii Masao wrote:
> Thanks for your review!
> 
>>> Personally, I feel 1st patch may be sufficient, but I would appreciate any feedback.
>>
>> Agreed.
>>
>> -                             errdetail("Consistent recovery state has not been yet reached.")));
>> +                             errdetail("Consistent recovery state has not been yet
>> reached, or snappshot is pending because subtransaction is
>> overflowed."),
>> +                             errhint("In the latter case, find and close the transaction
>> with more than %d subtransactions", PGPROC_MAX_CACHED_SUBXIDS)));
>>
>> This message might be too detailed. Instead, how about simplifying it
>> to something like: "Consistent recovery state has not been reached,
>> or snapshot is not ready for hot standby."
> 
> Agreed.
> 
> Do you also think the errhint message is unnecessary?
> I agree with your idea to add a description of the overflowed subtransaction in the manual, but I'm not sure all
userswill be able to find it.
 
> Some people may not understand what needs to be done to make the snapshot ready for hot standby.
> I think adding an errhint may help those users.

I see your concern that users might overlook the documentation and
struggle to find a solution. However, I still believe it's better to
include this information in the documentation rather than logging it
as a hint. Since the scenario where the hint would be useful is
relatively rare, logging it every time might be more confusing than helpful.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




On 2025-03-24 00:08, Fujii Masao wrote:

>> 
>> Do you also think the errhint message is unnecessary?
>> I agree with your idea to add a description of the overflowed 
>> subtransaction in the manual, but I'm not sure all users will be able 
>> to find it.
>> Some people may not understand what needs to be done to make the 
>> snapshot ready for hot standby.
>> I think adding an errhint may help those users.
> 
> I see your concern that users might overlook the documentation and
> struggle to find a solution. However, I still believe it's better to
> include this information in the documentation rather than logging it
> as a hint. Since the scenario where the hint would be useful is
> relatively rare, logging it every time might be more confusing than 
> helpful.

Thanks for your opinion and it sounds reasonable.

Attached an updated patch.

-- 
Regards,

--
Atsushi Torikoshi
Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.
Attachment

On 2025/03/24 23:18, torikoshia wrote:
> On 2025-03-24 00:08, Fujii Masao wrote:
> 
>>>
>>> Do you also think the errhint message is unnecessary?
>>> I agree with your idea to add a description of the overflowed subtransaction in the manual, but I'm not sure all
userswill be able to find it.
 
>>> Some people may not understand what needs to be done to make the snapshot ready for hot standby.
>>> I think adding an errhint may help those users.
>>
>> I see your concern that users might overlook the documentation and
>> struggle to find a solution. However, I still believe it's better to
>> include this information in the documentation rather than logging it
>> as a hint. Since the scenario where the hint would be useful is
>> relatively rare, logging it every time might be more confusing than helpful.
> 
> Thanks for your opinion and it sounds reasonable.
> 
> Attached an updated patch.

Thanks for updating the patch!

In high-availability.sgml, the "Administrator's Overview" section already
describes the conditions for accepting hot standby connections.
This section should also be updated accordingly.

+    brought the system to a consistent state.  However, overflowed
+    subtransactions may also delay snapshot readiness for hot standby. In such
+    case, the issue can be resolved by closing the transaction containing the
+    overflowed subtransactions.  All connections accepted by the hot standby
+    are strictly read-only; not even temporary tables may be written.

It would be better to move this explanation about overflowed subtransactions
to the "Administrator's Overview" section.

-            case CAC_NOTCONSISTENT:
+            case CAC_NOTCONSISTENT_OR_OVERFLOWED:

This new name seems a bit too long. I'm OK to leave the name as it is.
Or, something like CAC_NOTHOTSTANDBY seems simpler and better to me.
Thought?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Hi,

I had another off-list discussion with Fujii-san, and according to the 
following manual[1], it seems that a transaction with an overflowed 
subtransaction is already considered inconsistent:

   Reaching a consistent state can also be delayed in the presence of 
both of these conditions:

   - A write transaction has more than 64 subtransactions
   - Very long-lived write transactions

IIUC, the manual suggests that both conditions must be met -- recovery 
reaching at least minRecoveryPoint and no overflowed subtransactions —- 
for the standby to be considered consistent.

OTOH, the following log message is emitted even when subtransactions 
have overflowed, which appears to contradict the definition of 
consistency mentioned above:

   LOG:  consistent recovery state reached

This log message is triggered when recovery progresses beyond 
minRecoveryPoint(according to CheckRecoveryConsistency()).
However, since this state does not satisfy 'consistency' defined in the 
manual, I think it would be more accurate to log that it has merely 
reached the "minimum recovery point".
Furthermore, it may be better to emit the above log message only when 
recovery has progressed beyond minRecoveryPoint and there are no 
overflowed subtransactions.

Attached patch does this.

Additionally, renaming variables such as reachedConsistency in 
CheckRecoveryConsistency might also be appropriate.
However, in the attached patch, I have left them unchanged for now.


On 2025-03-25 00:55, Fujii Masao wrote:
> -                  case CAC_NOTCONSISTENT:
> +                  case CAC_NOTCONSISTENT_OR_OVERFLOWED:
> 
> This new name seems a bit too long. I'm OK to leave the name as it is.
> Or, something like CAC_NOTHOTSTANDBY seems simpler and better to me.

Beyond just the length issue, given the understanding outlined above, I 
now think CAC_NOTCONSISTENT does not actually need to be changed.


> In high-availability.sgml, the "Administrator's Overview" section 
> already
> describes the conditions for accepting hot standby connections.
> This section should also be updated accordingly.

Agreed.
I have updated this section to mention that the resolution is to close 
the problematic transaction.
OTOH the changes made in v2 patch seem unnecessary, since the concept of 
'consistent' is already explained in the "Administrator's Overview."


-                             errdetail("Consistent recovery state has not been yet 
reached.")));
+                             errdetail("Consistent recovery state has not been yet reached, 
or snappshot is pending because subtransaction is overflowed."),

Given the above understanding, "or" is not appropriate in this context, 
so I left this message unchanged.
Instead, I have added an errhint. The phrasing in the hint message 
aligns with the manual, allowing users to search for this hint and find 
the newly added resolution instructions.


What do you think?

[1] https://www.postgresql.org/docs/devel/hot-standby.html

-- 
Regards,

--
Atsushi Torikoshi
Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.
Attachment
On 2025-03-27 11:06, torikoshia wrote:
> Hi,
> 
> I had another off-list discussion with Fujii-san, and according to the
> following manual[1], it seems that a transaction with an overflowed
> subtransaction is already considered inconsistent:
> 
>   Reaching a consistent state can also be delayed in the presence of
> both of these conditions:
> 
>   - A write transaction has more than 64 subtransactions
>   - Very long-lived write transactions
> 
> IIUC, the manual suggests that both conditions must be met -- recovery
> reaching at least minRecoveryPoint and no overflowed subtransactions
> —- for the standby to be considered consistent.
> 
> OTOH, the following log message is emitted even when subtransactions
> have overflowed, which appears to contradict the definition of
> consistency mentioned above:
> 
>   LOG:  consistent recovery state reached
> 
> This log message is triggered when recovery progresses beyond
> minRecoveryPoint(according to CheckRecoveryConsistency()).
> However, since this state does not satisfy 'consistency' defined in
> the manual, I think it would be more accurate to log that it has
> merely reached the "minimum recovery point".
> Furthermore, it may be better to emit the above log message only when
> recovery has progressed beyond minRecoveryPoint and there are no
> overflowed subtransactions.
> 
> Attached patch does this.
> 
> Additionally, renaming variables such as reachedConsistency in
> CheckRecoveryConsistency might also be appropriate.
> However, in the attached patch, I have left them unchanged for now.
> 
> 
> On 2025-03-25 00:55, Fujii Masao wrote:
>> -                  case CAC_NOTCONSISTENT:
>> +                  case CAC_NOTCONSISTENT_OR_OVERFLOWED:
>> 
>> This new name seems a bit too long. I'm OK to leave the name as it is.
>> Or, something like CAC_NOTHOTSTANDBY seems simpler and better to me.
> 
> Beyond just the length issue, given the understanding outlined above,
> I now think CAC_NOTCONSISTENT does not actually need to be changed.
> 
> 
>> In high-availability.sgml, the "Administrator's Overview" section 
>> already
>> describes the conditions for accepting hot standby connections.
>> This section should also be updated accordingly.
> 
> Agreed.
> I have updated this section to mention that the resolution is to close
> the problematic transaction.
> OTOH the changes made in v2 patch seem unnecessary, since the concept
> of 'consistent' is already explained in the "Administrator's
> Overview."
> 
> 
> -                             errdetail("Consistent recovery state has not been yet 
> reached.")));
> +                             errdetail("Consistent recovery state has not been yet
> reached, or snappshot is pending because subtransaction is
> overflowed."),
> 
> Given the above understanding, "or" is not appropriate in this
> context, so I left this message unchanged.
> Instead, I have added an errhint. The phrasing in the hint message
> aligns with the manual, allowing users to search for this hint and
> find the newly added resolution instructions.

On second thought, it may not be appropriate to show this output to 
clients attempting to connect. This message should be notified not to 
clients but to administrators.

 From this point of view, it'd be better to output a message indicating 
the status inside ProcArrayApplyRecoveryInfo(). However, a 
straightforward implementation would result in the same message being 
logged every time an XLOG_RUNNING_XACTS WAL is received, making it 
noisy.

Instead of directly outputting a log indicating that a hot standby 
connection cannot be established due to subtransaction overflow, the 
attached patch updates the manual so that administrators can determine 
whether a subtransaction overflow has occurred based on the modified log 
output.


What do you think?


-- 
Regards,

--
Atsushi Torikoshi
Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.
Attachment

On 2025/03/28 0:13, torikoshia wrote:
> On 2025-03-27 11:06, torikoshia wrote:
>> Hi,
>>
>> I had another off-list discussion with Fujii-san, and according to the
>> following manual[1], it seems that a transaction with an overflowed
>> subtransaction is already considered inconsistent:
>>
>>   Reaching a consistent state can also be delayed in the presence of
>> both of these conditions:
>>
>>   - A write transaction has more than 64 subtransactions
>>   - Very long-lived write transactions
>>
>> IIUC, the manual suggests that both conditions must be met -- recovery
>> reaching at least minRecoveryPoint and no overflowed subtransactions
>> —- for the standby to be considered consistent.
>>
>> OTOH, the following log message is emitted even when subtransactions
>> have overflowed, which appears to contradict the definition of
>> consistency mentioned above:
>>
>>   LOG:  consistent recovery state reached
>>
>> This log message is triggered when recovery progresses beyond
>> minRecoveryPoint(according to CheckRecoveryConsistency()).
>> However, since this state does not satisfy 'consistency' defined in
>> the manual, I think it would be more accurate to log that it has
>> merely reached the "minimum recovery point".
>> Furthermore, it may be better to emit the above log message only when
>> recovery has progressed beyond minRecoveryPoint and there are no
>> overflowed subtransactions.
>>
>> Attached patch does this.
>>
>> Additionally, renaming variables such as reachedConsistency in
>> CheckRecoveryConsistency might also be appropriate.
>> However, in the attached patch, I have left them unchanged for now.
>>
>>
>> On 2025-03-25 00:55, Fujii Masao wrote:
>>> -                  case CAC_NOTCONSISTENT:
>>> +                  case CAC_NOTCONSISTENT_OR_OVERFLOWED:
>>>
>>> This new name seems a bit too long. I'm OK to leave the name as it is.
>>> Or, something like CAC_NOTHOTSTANDBY seems simpler and better to me.
>>
>> Beyond just the length issue, given the understanding outlined above,
>> I now think CAC_NOTCONSISTENT does not actually need to be changed.
>>
>>
>>> In high-availability.sgml, the "Administrator's Overview" section already
>>> describes the conditions for accepting hot standby connections.
>>> This section should also be updated accordingly.
>>
>> Agreed.
>> I have updated this section to mention that the resolution is to close
>> the problematic transaction.
>> OTOH the changes made in v2 patch seem unnecessary, since the concept
>> of 'consistent' is already explained in the "Administrator's
>> Overview."
>>
>>
>> -                             errdetail("Consistent recovery state has not been yet reached.")));
>> +                             errdetail("Consistent recovery state has not been yet
>> reached, or snappshot is pending because subtransaction is
>> overflowed."),
>>
>> Given the above understanding, "or" is not appropriate in this
>> context, so I left this message unchanged.
>> Instead, I have added an errhint. The phrasing in the hint message
>> aligns with the manual, allowing users to search for this hint and
>> find the newly added resolution instructions.
> 
> On second thought, it may not be appropriate to show this output to clients attempting to connect. This message
shouldbe notified not to clients but to administrators.
 
> 
>  From this point of view, it'd be better to output a message indicating the status inside
ProcArrayApplyRecoveryInfo().However, a straightforward implementation would result in the same message being logged
everytime an XLOG_RUNNING_XACTS WAL is received, making it noisy.
 
> 
> Instead of directly outputting a log indicating that a hot standby connection cannot be established due to
subtransactionoverflow, the attached patch updates the manual so that administrators can determine whether a
subtransactionoverflow has occurred based on the modified log output.
 
> 
> 
> What do you think?

I had the same thought during our off-list discussion. However,
after reviewing the recovery code - such as recoveryStopsBefore(),
which checks whether a consistent state is reached - I now believe
the manual’s definition of a consistent state may be incorrect.
A consistent state should be defined as the point where recovery
has reached minRecoveryPoint.

If we were to change the definition to match the manual,
we would also need to update various recovery checks,
which wouldn't be a trivial task.

Given that, I now think it's better to revive your v1 patch,
which introduces a new postmaster signal and improves the error message
when connections are not accepted during hot standby. I've attached
a revised version of the patch based on your v1. Thought?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Attachment
On 2025-03-31 12:51, Fujii Masao wrote:
> I had the same thought during our off-list discussion. However,
> after reviewing the recovery code - such as recoveryStopsBefore(),
> which checks whether a consistent state is reached - I now believe
> the manual’s definition of a consistent state may be incorrect.
> A consistent state should be defined as the point where recovery
> has reached minRecoveryPoint.

I now agree with you.

> If we were to change the definition to match the manual,
> we would also need to update various recovery checks,
> which wouldn't be a trivial task.
> 
> Given that, I now think it's better to revive your v1 patch,
> which introduces a new postmaster signal and improves the error message
> when connections are not accepted during hot standby. I've attached
> a revised version of the patch based on your v1. Thought?

Thank you for writing the patch!


Here are some comments on the documentation.

The following description in high-availability.sgml also seems to misuse 
the word 'consistent':

   When the <xref linkend="guc-hot-standby"/> parameter is set to true on 
a
   standby server, it will begin accepting connections once the recovery 
has
   brought the system to a consistent state.

Since this is part of the "User's Overview" section, it may not be 
appropriate to include too much detail.
How about rewording it to avoid using 'consistent', for example:

   When the <xref linkend="guc-hot-standby"/> parameter is set to true on 
a
   standby server, it will begin accepting connections once it is ready.


+    delaying accepting read-only connections.  To enable hot standby,
+    a long-lived write transaction with more than 64 subtransactions
+    needs to be closed on the primary.

Is it better to use 'transactions' in the plural form rather than as a 
nominal?

- There may be more than one such transaction.
- The <itemizedlist> below also uses the plural form.
- The newly added message also uses the plural form:
   + errhint("To enable hot standby, close write transactions with more 
than %d subtransactions on the primary server."


What do you think?

-- 
Regards,

--
Atsushi Torikoshi
Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.



On Mon, 31 Mar 2025 12:51:06 +0900
Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

> 
> 
> On 2025/03/28 0:13, torikoshia wrote:
> > On 2025-03-27 11:06, torikoshia wrote:
> >> Hi,
> >>
> >> I had another off-list discussion with Fujii-san, and according to the
> >> following manual[1], it seems that a transaction with an overflowed
> >> subtransaction is already considered inconsistent:
> >>
> >>   Reaching a consistent state can also be delayed in the presence of
> >> both of these conditions:
> >>
> >>   - A write transaction has more than 64 subtransactions
> >>   - Very long-lived write transactions
> >>
> >> IIUC, the manual suggests that both conditions must be met -- recovery
> >> reaching at least minRecoveryPoint and no overflowed subtransactions
> >> —- for the standby to be considered consistent.
> >>
> >> OTOH, the following log message is emitted even when subtransactions
> >> have overflowed, which appears to contradict the definition of
> >> consistency mentioned above:
> >>
> >>   LOG:  consistent recovery state reached
> >>
> >> This log message is triggered when recovery progresses beyond
> >> minRecoveryPoint(according to CheckRecoveryConsistency()).
> >> However, since this state does not satisfy 'consistency' defined in
> >> the manual, I think it would be more accurate to log that it has
> >> merely reached the "minimum recovery point".
> >> Furthermore, it may be better to emit the above log message only when
> >> recovery has progressed beyond minRecoveryPoint and there are no
> >> overflowed subtransactions.
> >>
> >> Attached patch does this.
> >>
> >> Additionally, renaming variables such as reachedConsistency in
> >> CheckRecoveryConsistency might also be appropriate.
> >> However, in the attached patch, I have left them unchanged for now.
> >>
> >>
> >> On 2025-03-25 00:55, Fujii Masao wrote:
> >>> -                  case CAC_NOTCONSISTENT:
> >>> +                  case CAC_NOTCONSISTENT_OR_OVERFLOWED:
> >>>
> >>> This new name seems a bit too long. I'm OK to leave the name as it is.
> >>> Or, something like CAC_NOTHOTSTANDBY seems simpler and better to me.
> >>
> >> Beyond just the length issue, given the understanding outlined above,
> >> I now think CAC_NOTCONSISTENT does not actually need to be changed.
> >>
> >>
> >>> In high-availability.sgml, the "Administrator's Overview" section already
> >>> describes the conditions for accepting hot standby connections.
> >>> This section should also be updated accordingly.
> >>
> >> Agreed.
> >> I have updated this section to mention that the resolution is to close
> >> the problematic transaction.
> >> OTOH the changes made in v2 patch seem unnecessary, since the concept
> >> of 'consistent' is already explained in the "Administrator's
> >> Overview."
> >>
> >>
> >> -                             errdetail("Consistent recovery state has not been yet reached.")));
> >> +                             errdetail("Consistent recovery state has not been yet
> >> reached, or snappshot is pending because subtransaction is
> >> overflowed."),
> >>
> >> Given the above understanding, "or" is not appropriate in this
> >> context, so I left this message unchanged.
> >> Instead, I have added an errhint. The phrasing in the hint message
> >> aligns with the manual, allowing users to search for this hint and
> >> find the newly added resolution instructions.
> > 
> > On second thought, it may not be appropriate to show this output to clients attempting to connect. This message
shouldbe notified not to clients but to administrators.
 
> > 
> >  From this point of view, it'd be better to output a message indicating the status inside
ProcArrayApplyRecoveryInfo().However, a straightforward implementation would result in the same message being logged
everytime an XLOG_RUNNING_XACTS WAL is received, making it noisy.
 
> > 
> > Instead of directly outputting a log indicating that a hot standby connection cannot be established due to
subtransactionoverflow, the attached patch updates the manual so that administrators can determine whether a
subtransactionoverflow has occurred based on the modified log output.
 
> > 
> > 
> > What do you think?
> 
> I had the same thought during our off-list discussion. However,
> after reviewing the recovery code - such as recoveryStopsBefore(),
> which checks whether a consistent state is reached - I now believe
> the manual’s definition of a consistent state may be incorrect.
> A consistent state should be defined as the point where recovery
> has reached minRecoveryPoint.
> 
> If we were to change the definition to match the manual,
> we would also need to update various recovery checks,
> which wouldn't be a trivial task.
> 
> Given that, I now think it's better to revive your v1 patch,
> which introduces a new postmaster signal and improves the error message
> when connections are not accepted during hot standby. I've attached
> a revised version of the patch based on your v1. Thought?

I prefer this approach clarifying that consistency and subtransaction overflow
are separate concepts in the documentation.

Here are minor comments on the patch:

-           case CAC_NOTCONSISTENT:
-               if (EnableHotStandby)
+           case CAC_NOTHOTSTANDBY:
+               if (!EnableHotStandby)
                    ereport(FATAL,
                            (errcode(ERRCODE_CANNOT_CONNECT_NOW),
                             errmsg("the database system is not yet accepting connections"),
-                            errdetail("Consistent recovery state has not been yet reached.")));
+                            errdetail("Hot standby mode is disabled.")));
+               else if (reachedConsistency)
+                   ereport(FATAL,
+                           (errcode(ERRCODE_CANNOT_CONNECT_NOW),
+                            errmsg("the database system is not accepting connections"),
+                            errdetail("Recovery snapshot is not yet ready for hot standby."),
+                            errhint("To enable hot standby, close write transactions with more than %d subtransactions
onthe primary server.",
 
+                                    PGPROC_MAX_CACHED_SUBXIDS)));
                else
                    ereport(FATAL,
                            (errcode(ERRCODE_CANNOT_CONNECT_NOW),
                             errmsg("the database system is not accepting connections"),
-                            errdetail("Hot standby mode is disabled.")));
+                            errdetail("Consistent recovery state has not been yet reached.")));

The message says "the database system is not yet accepting connections" when "Hot standby mode is disabled".
I think "yet" is not necessary in this case. Otherwise, when "Recovery snapshot is not yet ready for hot standby"
or "Consistent recovery state has not been yet reached", it seems better to use "yet" 


Regards,
Yugo Nagata

-- 
Yugo Nagata <nagata@sraoss.co.jp>




On 2025/03/31 22:44, torikoshia wrote:
> Here are some comments on the documentation.

Thanks for the review!


> The following description in high-availability.sgml also seems to misuse the word 'consistent':
> 
>    When the <xref linkend="guc-hot-standby"/> parameter is set to true on a
>    standby server, it will begin accepting connections once the recovery has
>    brought the system to a consistent state.
> 
> Since this is part of the "User's Overview" section, it may not be appropriate to include too much detail.
> How about rewording it to avoid using 'consistent', for example:
> 
>    When the <xref linkend="guc-hot-standby"/> parameter is set to true on a
>    standby server, it will begin accepting connections once it is ready.

"once it is ready." feels too vague to me. How about using
"once recovery has brought the system to a consistent state and
be ready for hot standby." instead?


> +    delaying accepting read-only connections.  To enable hot standby,
> +    a long-lived write transaction with more than 64 subtransactions
> +    needs to be closed on the primary.
> 
> Is it better to use 'transactions' in the plural form rather than as a nominal?
> 
> - There may be more than one such transaction.
> - The <itemizedlist> below also uses the plural form.
> - The newly added message also uses the plural form:
>    + errhint("To enable hot standby, close write transactions with more than %d subtransactions on the primary
server."
> 
> 
> What do you think?

I'm not sure whether the plural form is better here, but I've updated
the patch as suggested. Attached is the revised version.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Attachment

On 2025/03/31 22:45, Yugo Nagata wrote:
> I prefer this approach clarifying that consistency and subtransaction overflow
> are separate concepts in the documentation.
> 
> Here are minor comments on the patch:

Thanks for the review!


> -           case CAC_NOTCONSISTENT:
> -               if (EnableHotStandby)
> +           case CAC_NOTHOTSTANDBY:
> +               if (!EnableHotStandby)
>                      ereport(FATAL,
>                              (errcode(ERRCODE_CANNOT_CONNECT_NOW),
>                               errmsg("the database system is not yet accepting connections"),
> -                            errdetail("Consistent recovery state has not been yet reached.")));
> +                            errdetail("Hot standby mode is disabled.")));
> +               else if (reachedConsistency)
> +                   ereport(FATAL,
> +                           (errcode(ERRCODE_CANNOT_CONNECT_NOW),
> +                            errmsg("the database system is not accepting connections"),
> +                            errdetail("Recovery snapshot is not yet ready for hot standby."),
> +                            errhint("To enable hot standby, close write transactions with more than %d
subtransactionson the primary server.",
 
> +                                    PGPROC_MAX_CACHED_SUBXIDS)));
>                  else
>                      ereport(FATAL,
>                              (errcode(ERRCODE_CANNOT_CONNECT_NOW),
>                               errmsg("the database system is not accepting connections"),
> -                            errdetail("Hot standby mode is disabled.")));
> +                            errdetail("Consistent recovery state has not been yet reached.")));
> 
> The message says "the database system is not yet accepting connections" when "Hot standby mode is disabled".
> I think "yet" is not necessary in this case. Otherwise, when "Recovery snapshot is not yet ready for hot standby"
> or "Consistent recovery state has not been yet reached", it seems better to use "yet"

I may have unintentionally modified the error message.
I fixed the patch as suggested. Please check the latest patch
I posted earlier in response to Torikoshi-san.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




On 2025-04-01 01:12, Fujii Masao wrote:
> On 2025/03/31 22:45, Yugo Nagata wrote:
>> I prefer this approach clarifying that consistency and subtransaction 
>> overflow
>> are separate concepts in the documentation.
>> 
>> Here are minor comments on the patch:
> 
> Thanks for the review!
> 
> 
>> -           case CAC_NOTCONSISTENT:
>> -               if (EnableHotStandby)
>> +           case CAC_NOTHOTSTANDBY:
>> +               if (!EnableHotStandby)
>>                      ereport(FATAL,
>>                              (errcode(ERRCODE_CANNOT_CONNECT_NOW),
>>                               errmsg("the database system is not yet 
>> accepting connections"),
>> -                            errdetail("Consistent recovery state has 
>> not been yet reached.")));
>> +                            errdetail("Hot standby mode is 
>> disabled.")));
>> +               else if (reachedConsistency)
>> +                   ereport(FATAL,
>> +                           (errcode(ERRCODE_CANNOT_CONNECT_NOW),
>> +                            errmsg("the database system is not 
>> accepting connections"),
>> +                            errdetail("Recovery snapshot is not yet 
>> ready for hot standby."),
>> +                            errhint("To enable hot standby, close 
>> write transactions with more than %d subtransactions on the primary 
>> server.",
>> +                                    PGPROC_MAX_CACHED_SUBXIDS)));
>>                  else
>>                      ereport(FATAL,
>>                              (errcode(ERRCODE_CANNOT_CONNECT_NOW),
>>                               errmsg("the database system is not 
>> accepting connections"),
>> -                            errdetail("Hot standby mode is 
>> disabled.")));
>> +                            errdetail("Consistent recovery state has 
>> not been yet reached.")));
>> 
>> The message says "the database system is not yet accepting 
>> connections" when "Hot standby mode is disabled".
>> I think "yet" is not necessary in this case. Otherwise, when "Recovery 
>> snapshot is not yet ready for hot standby"
>> or "Consistent recovery state has not been yet reached", it seems 
>> better to use "yet"
> 
> I may have unintentionally modified the error message.
> I fixed the patch as suggested. Please check the latest patch
> I posted earlier in response to Torikoshi-san.

Thank you for updating the patch!

LGTM.
I feel like changing the status to 'Ready for Committer', but since 
Nagata-san may have additional comments, I'm leaving it as 'Needs 
Review'.

-- 
Regards,

--
Atsushi Torikoshi
Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.




On 2025/04/01 20:54, torikoshia wrote:
> Thank you for updating the patch!
> 
> LGTM.

I've pushed the patch. Thanks!

> I feel like changing the status to 'Ready for Committer', but since Nagata-san may have additional comments, I'm
leavingit as 'Needs Review'.
 

If any issues arise, let's continue to address them.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




On 2025-04-02 15:21, Fujii Masao wrote:
> On 2025/04/01 20:54, torikoshia wrote:
>> Thank you for updating the patch!
>> 
>> LGTM.
> 
> I've pushed the patch. Thanks!

Thanks a lot!

>> I feel like changing the status to 'Ready for Committer', but since 
>> Nagata-san may have additional comments, I'm leaving it as 'Needs 
>> Review'.
> 
> If any issues arise, let's continue to address them.

OK.

-- 
Regards,

--
Atsushi Torikoshi
Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.