Re:Re:Re: [BUG] standby node can not provide service even it replays all log files - Mailing list pgsql-hackers

From Thunder
Subject Re:Re:Re: [BUG] standby node can not provide service even it replays all log files
Date
Msg-id 78a38648.8cd4.16e12a5e053.Coremail.thunder1@126.com
In response to Re:Re: [BUG] standby node can not provide service even it replays all log files  (Thunder <thunder1@126.com>)
List pgsql-hackers
Hi,
In our usage scenario the standby node can be OOM-killed, after which we have to create a new standby node.
If the master node has a long-running uncommitted transaction, the new standby node cannot provide service.
So for us this is a critical issue.

I would appreciate any suggestions on this issue.
Could anyone help review the attached patch?
Thanks.





At 2019-10-22 20:42:21, "Thunder" <thunder1@126.com> wrote:
Updated the patch.
1. The STANDBY_SNAPSHOT_PENDING state is set when we replay the first XLOG_RUNNING_XACTS record and the subtransaction IDs have overflowed.
2. When we log XLOG_RUNNING_XACTS on the master node, can we assume that all xact IDs < oldestRunningXid are finished?
3. If we can assume this, then when we replay XLOG_RUNNING_XACTS and change standbyState to STANDBY_SNAPSHOT_PENDING, can we record oldestRunningXid in a shared variable, such as procArray->oldest_running_xid?
4. On the standby node, when GetSnapshotData is called and procArray->oldest_running_xid is valid, can we set xmin to procArray->oldest_running_xid?
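
The idea in points 3 and 4 can be sketched as a small standalone model. This is only an illustration under stated assumptions, not PostgreSQL code and not the attached patch: the Toy* types, the toy_* functions and the oldest_running_xid field are hypothetical names, and real transaction ID comparison (wraparound) and locking are ignored. It also does not answer whether the assumption in point 2 is safe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-ins; names mirror the discussion but are NOT the real definitions. */
typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

typedef enum
{
    TOY_STANDBY_INITIALIZED,
    TOY_STANDBY_SNAPSHOT_PENDING,
    TOY_STANDBY_SNAPSHOT_READY
} ToyStandbyState;

typedef struct
{
    ToyStandbyState state;
    /* Hypothetical shared field proposed in point 3. */
    TransactionId   oldest_running_xid;
} ToyProcArray;

/*
 * Model of replaying a running-xacts record (points 1 and 3): if the
 * subtransaction IDs overflowed, go to PENDING and remember
 * oldestRunningXid; otherwise the snapshot is READY and the remembered
 * value is cleared.
 */
void
toy_replay_running_xacts(ToyProcArray *arr,
                         TransactionId oldestRunningXid,
                         bool subxid_overflow)
{
    assert(oldestRunningXid != InvalidTransactionId);

    if (subxid_overflow)
    {
        arr->state = TOY_STANDBY_SNAPSHOT_PENDING;
        arr->oldest_running_xid = oldestRunningXid;
    }
    else
    {
        arr->state = TOY_STANDBY_SNAPSHOT_READY;
        arr->oldest_running_xid = InvalidTransactionId;
    }
}

/*
 * Model of point 4: when the remembered xid is valid, use it as the
 * snapshot xmin; otherwise keep the xmin computed the usual way.
 */
TransactionId
toy_snapshot_xmin(const ToyProcArray *arr, TransactionId computed_xmin)
{
    if (arr->oldest_running_xid != InvalidTransactionId)
        return arr->oldest_running_xid;
    return computed_xmin;
}
```

For example, after toy_replay_running_xacts(&arr, 100, true) the model is in the PENDING state and toy_snapshot_xmin(&arr, 500) returns 100, so every xid at or above 100 would be treated as possibly still running.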

I would appreciate any suggestions on this issue.



At 2019-10-22 01:27:58, "Robert Haas" <robertmhaas@gmail.com> wrote:
>On Mon, Oct 21, 2019 at 4:13 AM Thunder <thunder1@126.com> wrote:
>> Can we fix this issue like the following patch?
>>
>> $git diff src/backend/access/transam/xlog.c
>> diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
>> index 49ae97d4459..0fbdf6fd64a 100644
>> --- a/src/backend/access/transam/xlog.c
>> +++ b/src/backend/access/transam/xlog.c
>> @@ -8365,7 +8365,7 @@ CheckRecoveryConsistency(void)
>>  * run? If so, we can tell postmaster that the database is consistent now,
>>  * enabling connections.
>>  */
>> - if (standbyState == STANDBY_SNAPSHOT_READY &&
>> + if ((standbyState == STANDBY_SNAPSHOT_READY || standbyState == STANDBY_SNAPSHOT_PENDING) &&
>>  !LocalHotStandbyActive &&
>>  reachedConsistency &&
>>  IsUnderPostmaster)
>
>I think that the issue you've encountered is design behavior. In
>other words, it's intended to work that way.
>
>The comments for the code you propose to change say that we can allow
>connections once we've got a valid snapshot. So presumably the effect
>of your change would be to allow connections even though we don't have
>a valid snapshot.
>
>That seems bad.
>
>--
>Robert Haas
>EnterpriseDB: http://www.enterprisedb.com
>The Enterprise PostgreSQL Company


 



 

Attachment
