Thread: Re: Change log level for notifying hot standby is waiting non-overflowed snapshot


On 2025/02/03 22:35, torikoshia wrote:
> Hi,
> 
> When a hot standby is restarted in a state where subtransactions have overflowed, it may become inaccessible:
> 
>    $ psql: error: connection to server at "localhost" (::1), port 5433 failed: FATAL:  the database system is not yet
acceptingconnections
 
>            DETAIL:  Consistent recovery state has not been yet reached.

Could you share the steps to reproduce this situation?


> However, the log message that indicates the cause of this issue seems to be only output at the DEBUG1 level:
> 
>    elog(DEBUG1,
>         "recovery snapshot waiting for non-overflowed snapshot or "
>         "until oldest active xid on standby is at least %u (now %u)",
>         standbySnapshotPendingXmin,
>         running->oldestRunningXid);
> 
> I believe this message would be useful not only for developers but also for users.

Isn't this log message too difficult for most users? It seems to
describe PostgreSQL's internal mechanisms, making it hard
for users to understand the issue and what actions to take.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




On 2025-03-03 13:10, Fujii Masao wrote:

Thanks for your comments!

> On 2025/02/03 22:35, torikoshia wrote:
>> Hi,
>> 
>> When a hot standby is restarted in a state where subtransactions have 
>> overflowed, it may become inaccessible:
>> 
>>    $ psql: error: connection to server at "localhost" (::1), port 5433 
>> failed: FATAL:  the database system is not yet accepting connections
>>            DETAIL:  Consistent recovery state has not been yet 
>> reached.
> 
> Could you share the steps to reproduce this situation?

We can reproduce this situation using the following procedure.
I performed this test with one asynchronous standby server.

-- overflow subtransaction
(primary)=# create table t1 (i int);
(primary)=# select 'insert into t1 values (1); savepoint s_' || 
generate_series(1, 70) ; \gexec
(primary)=# checkpoint;

-- restart standby
$ pg_ctl restart -D data_stb/
waiting for server to shut down.... done
server stopped
waiting for server to start.... LOG:  redirecting log output to logging 
collector process
........................................................... stopped 
waiting
pg_ctl: server did not start in time

-- standby log
DEBUG:  recovery snapshot waiting for non-overflowed snapshot or until 
oldest active xid on standby is at least 887 (now 818)


>> However, the log message that indicates the cause of this issue seems 
>> to be only output at the DEBUG1 level:
>> 
>>    elog(DEBUG1,
>>         "recovery snapshot waiting for non-overflowed snapshot or "
>>         "until oldest active xid on standby is at least %u (now %u)",
>>         standbySnapshotPendingXmin,
>>         running->oldestRunningXid);
>> 
>> I believe this message would be useful not only for developers but 
>> also for users.
> 
> Isn't this log message too difficult for most users? It seems to
> describe PostgreSQL's internal mechanisms, making it hard
> for users to understand the issue and what actions to take.

Agreed and I feel that a message suggesting something like "check if 
there are any overflowing transactions on the primary side" would make 
it useful.
On the other hand, the manual's explanation of 
pg_stat_get_backend_subxact() does not mention subtransaction overflow, 
so I am not sure how much detail should be included.


-- 
Regards,

--
Atsushi Torikoshi
Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.




On 2025/03/04 0:20, torikoshia wrote:
> On 2025-03-03 13:10, Fujii Masao wrote:
> 
> Thanks for your comments!
> 
>> On 2025/02/03 22:35, torikoshia wrote:
>>> Hi,
>>>
>>> When a hot standby is restarted in a state where subtransactions have overflowed, it may become inaccessible:
>>>
>>>    $ psql: error: connection to server at "localhost" (::1), port 5433 failed: FATAL:  the database system is not
yetaccepting connections
 
>>>            DETAIL:  Consistent recovery state has not been yet reached.
>>
>> Could you share the steps to reproduce this situation?
> 
> We can reproduce this situation using the following procedure.

Thanks! I was able to reproduce the issue.


> Agreed and I feel that a message suggesting something like "check if there are any overflowing transactions on the
primaryside" would make it useful.
 

I’m wondering if this message might still be confusing for users.
Would they immediately understand what "overflowing transactions" means?
Even after reading this message, it seems also unclear what actions
they should take to resolve the issue. Plus, this message can appear
multiple times if there are multiple overflowing transactions before
starting accepting read-only connections - which could be even more confusing.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




On 2025-03-04 03:17, Fujii Masao wrote:

>> Agreed and I feel that a message suggesting something like "check if 
>> there are any overflowing transactions on the primary side" would make 
>> it useful.
> 
> I’m wondering if this message might still be confusing for users.
> Would they immediately understand what "overflowing transactions" 
> means?
> Even after reading this message, it seems also unclear what actions
> they should take to resolve the issue. Plus, this message can appear
> multiple times if there are multiple overflowing transactions before
> starting accepting read-only connections - which could be even more 
> confusing.

It seems better to reconsider the content and timing of this message 
output.

I personally think that logging information about this situation where 
subtransaction overflowed and it prevents hot standby connections would 
be helpful for users and support providers to understand the cause of 
the issue.
Do you think such logging is unnecessary?


-- 
Regards,

--
Atsushi Torikoshi
Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.