Thread: Hot Standby remaining issues
I've put up a wiki page with the issues I see with the patch as it stands. They're roughly categorized by seriousness. http://wiki.postgresql.org/wiki/Hot_Standby_TODO New issues can and probably will still pop up, let's add them to the list as they're found so that we know what still needs to be done. You had a list of work items at the hot standby main page, but I believe it's badly out-of-date. Please move any still relevant items to the above list, if any. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Wed, 2009-11-25 at 13:00 +0200, Heikki Linnakangas wrote: > I've put up a wiki page with the issues I see with the patch as it > stands. They're roughly categorized by seriousness. > > http://wiki.postgresql.org/wiki/Hot_Standby_TODO > > New issues can and probably will still pop up, let's add them to the > list as they're found so that we know what still needs to be done. > > You had a list of work items at the hot standby main page, but I believe > it's badly out-of-date. Please move any still relevant items to the > above list, if any. I've linked the two pages together and identified the ones I'm currently working on, plus added a few comments. -- Simon Riggs www.2ndQuadrant.com
On Wed, 2009-11-25 at 13:00 +0200, Heikki Linnakangas wrote: > I've put up a wiki page with the issues I see with the patch as it > stands. They're roughly categorized by seriousness. > > http://wiki.postgresql.org/wiki/Hot_Standby_TODO > > New issues can and probably will still pop up, let's add them to the > list as they're found so that we know what still needs to be done. > > You had a list of work items at the hot standby main page, but I believe > it's badly out-of-date. Please move any still relevant items to the > above list, if any. 4 changes on TODO included here, including all must-fix issues. This is a combined patch. Will push changes to git also, so each commit is visible separately. Lots of wiki comments added. -- Simon Riggs www.2ndQuadrant.com
Attachment
Simon Riggs wrote: > @@ -654,10 +656,13 @@ LockAcquire(const LOCKTAG *locktag, > elog(PANIC, "lock table corrupted"); > } > LWLockRelease(partitionLock); > - ereport(ERROR, > - (errcode(ERRCODE_OUT_OF_MEMORY), > - errmsg("out of shared memory"), > - errhint("You might need to increase max_locks_per_transaction."))); > + if (reportLockTableError) > + ereport(ERROR, > + (errcode(ERRCODE_OUT_OF_MEMORY), > + errmsg("out of shared memory"), > + errhint("You might need to increase max_locks_per_transaction."))); > + else > + return LOCKACQUIRE_NOT_AVAIL; > } > locallock->proclock = proclock; > That seems dangerous when dontWait==false. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Simon Riggs wrote: > commit 02c3eadb766201db084b668daa271db4a900adc9 > Author: Simon Riggs <sriggs@ebony.(none)> > Date: Sat Nov 28 06:23:33 2009 +0000 > > Added wal_standby_info GUC to turn RM_STANDBY_ID messages on/off. > Various comments added also. > This patch makes it unsafe to start hot standby mode from a shutdown checkpoint, because we don't know if wal_standby_info was enabled in the master. I still don't think we need the GUC. But for future-proofing, perhaps we should add a flag to shutdown checkpoint records, indicating whether it's safe to start hot standby from it. That way, if we decide to add a GUC like that at a later stage, we don't need to change the on-disk format. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas wrote: > Simon Riggs wrote: >> @@ -654,10 +656,13 @@ LockAcquire(const LOCKTAG *locktag, >> elog(PANIC, "lock table corrupted"); >> } >> LWLockRelease(partitionLock); >> - ereport(ERROR, >> - (errcode(ERRCODE_OUT_OF_MEMORY), >> - errmsg("out of shared memory"), >> - errhint("You might need to increase max_locks_per_transaction."))); >> + if (reportLockTableError) >> + ereport(ERROR, >> + (errcode(ERRCODE_OUT_OF_MEMORY), >> + errmsg("out of shared memory"), >> + errhint("You might need to increase max_locks_per_transaction."))); >> + else >> + return LOCKACQUIRE_NOT_AVAIL; >> } >> locallock->proclock = proclock; >> > > That seems dangerous when dontWait==false. Ah, I see now that you're only setting reportLockTableError just before you call LockAcquire, and reset it afterwards. It's safe then, but it should rather be another argument to the function, as how the global variable is really being used. The patch doesn't actually fix the issue it was supposed to fix. If a read-only transaction holds a lot of locks, consuming so much lock space that there's none left for the startup process to hold the lock it wants, it will abort and bring down postmaster. The patch attempts to kill any conflicting lockers, but those are handled fine already (if there's any conflicting locks, LockAcquire will return LOCKACQUIRE_NOT_AVAIL anyway). The problem is with non-conflicting locks using up the lock space. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Wed, 2009-12-02 at 12:49 +0200, Heikki Linnakangas wrote: > If a read-only transaction holds a lot of locks, consuming so much > lock space that there's none left for the startup process to hold the > lock it wants, it will abort and bring down postmaster. The patch > attempts to kill any conflicting lockers, but those are handled fine > already (if there's any conflicting locks, LockAcquire will return > LOCKACQUIRE_NOT_AVAIL anyway). The problem is with non-conflicting > locks using up the lock space. Oh dear, another "nuke 'em all from orbit" scenario. Will do. -- Simon Riggs www.2ndQuadrant.com
Simon Riggs wrote: > On Wed, 2009-12-02 at 12:49 +0200, Heikki Linnakangas wrote: > >> If a read-only transaction holds a lot of locks, consuming so much >> lock space that there's none left for the startup process to hold the >> lock it wants, it will abort and bring down postmaster. The patch >> attempts to kill any conflicting lockers, but those are handled fine >> already (if there's any conflicting locks, LockAcquire will return >> LOCKACQUIRE_NOT_AVAIL anyway). The problem is with non-conflicting >> locks using up the lock space. > > Oh dear, another "nuke 'em all from orbit" scenario. Will do. Yeah. This case is much like the OOM killer on Linux. Not really "nuke 'em all" but "nuke someone, don't care who".. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Tue, 2009-12-01 at 20:26 +0200, Heikki Linnakangas wrote: > Simon Riggs wrote: > > commit 02c3eadb766201db084b668daa271db4a900adc9 > > Author: Simon Riggs <sriggs@ebony.(none)> > > Date: Sat Nov 28 06:23:33 2009 +0000 > > > > Added wal_standby_info GUC to turn RM_STANDBY_ID messages on/off. > > Various comments added also. > > > > This patch makes it unsafe to start hot standby mode from a shutdown > checkpoint, because we don't know if wal_standby_info was enabled in the > master. > I still don't think we need the GUC. If that's a good plan we can remove it in late beta. Let's keep it there for now. > But for future-proofing, perhaps we > should add a flag to shutdown checkpoint records, indicating whether > it's safe to start hot standby from it. That way, if we decide to add a > GUC like that at a later stage, we don't need to change the on-disk format. OK, I understand this now. Taken me a while, even though obvious now I see. -- Simon Riggs www.2ndQuadrant.com
On Wed, 2009-12-02 at 16:41 +0000, Simon Riggs wrote: > On Tue, 2009-12-01 at 20:26 +0200, Heikki Linnakangas wrote: > > Simon Riggs wrote: > > > commit 02c3eadb766201db084b668daa271db4a900adc9 > > > Author: Simon Riggs <sriggs@ebony.(none)> > > > Date: Sat Nov 28 06:23:33 2009 +0000 > > > > > > Added wal_standby_info GUC to turn RM_STANDBY_ID messages on/off. > > > Various comments added also. > > > > > > > This patch makes it unsafe to start hot standby mode from a shutdown > > checkpoint, because we don't know if wal_standby_info was enabled in the > > master. Hmm, what happens if someone enables wal_standby_info in postgresql.conf while the server is shutdown. It would still be a valid starting point in that case. I'll just make a note, I think. -- Simon Riggs www.2ndQuadrant.com
Simon Riggs wrote: > Hmm, what happens if someone enables wal_standby_info in postgresql.conf > while the server is shutdown. It would still be a valid starting point > in that case. Yeah, true. > I'll just make a note, I think. Yeah, a manual (or automatic, if you just wait) checkpoint will produce a new checkpoint record showing that it's safe to start standby again. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Regarding this item from the wiki page: > The "standby delay" is measured as current timestamp - timestamp of last replayed commit record. If there's little activityin the master, that can lead to surprising results. For example, imagine that max_standby_delay is set to 8 hours.The standby is fully up-to-date with the master, and there's no write activity in master. After 10 hours, a long reportingquery is started in the standby. Ten minutes later, a small transaction is executed in the master that conflictswith the reporting query. I would expect the reporting query to be canceled 8 hours after the conflicting transactionbegan, but it is in fact canceled immediately, because it's over 8 hours since the last commit record was replayed. > > * Simon says... changed to allow checkpoints to update recoveryLastXTime (Simon DONE) Update recoveryLastXTime at checkpoints doesn't help when the master is completely idle, because we skip checkpoints in that case. It's better than nothing, of course. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Fri, 2009-12-04 at 10:37 +0200, Heikki Linnakangas wrote: > Regarding this item from the wiki page: > > The "standby delay" is measured as current timestamp - timestamp of last replayed commit record. If there's little activityin the master, that can lead to surprising results. For example, imagine that max_standby_delay is set to 8 hours.The standby is fully up-to-date with the master, and there's no write activity in master. After 10 hours, a long reportingquery is started in the standby. Ten minutes later, a small transaction is executed in the master that conflictswith the reporting query. I would expect the reporting query to be canceled 8 hours after the conflicting transactionbegan, but it is in fact canceled immediately, because it's over 8 hours since the last commit record was replayed. > > > > * Simon says... changed to allow checkpoints to update recoveryLastXTime (Simon DONE) > > Update recoveryLastXTime at checkpoints doesn't help when the master is > completely idle, because we skip checkpoints in that case. It's better > than nothing, of course. Not if archive_timeout is set, which it would be in warm standby case. We can do even better than this with SR. -- Simon Riggs www.2ndQuadrant.com
Simon Riggs wrote: > On Fri, 2009-12-04 at 10:37 +0200, Heikki Linnakangas wrote: >> Regarding this item from the wiki page: >>> The "standby delay" is measured as current timestamp - timestamp of last replayed commit record. If there's little activityin the master, that can lead to surprising results. For example, imagine that max_standby_delay is set to 8 hours.The standby is fully up-to-date with the master, and there's no write activity in master. After 10 hours, a long reportingquery is started in the standby. Ten minutes later, a small transaction is executed in the master that conflictswith the reporting query. I would expect the reporting query to be canceled 8 hours after the conflicting transactionbegan, but it is in fact canceled immediately, because it's over 8 hours since the last commit record was replayed. >>> >>> * Simon says... changed to allow checkpoints to update recoveryLastXTime (Simon DONE) >> Update recoveryLastXTime at checkpoints doesn't help when the master is >> completely idle, because we skip checkpoints in that case. It's better >> than nothing, of course. > > Not if archive_timeout is set, which it would be in warm standby case. > We can do even better than this with SR. If the system is completely idle, and no WAL is written, we skip xlog switches too, even if archive_timeout is set . It would be pointless to create a stream of WAL files with no content except for the XLOG_SWITCH records. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > If the system is completely idle, and no WAL is written, we skip > xlog switches too, even if archive_timeout is set . It would be > pointless to create a stream of WAL files with no content except > for the XLOG_SWITCH records. It's not pointless if you want to monitor that your backup system is healthy. This was previously mentioned a couple years ago: http://archives.postgresql.org/pgsql-general/2007-10/msg01448.php It turns out that it's been working fine under 8.3. Of course, we can always add a crontab job to do some small bogus update to force WAL switches, so it's not the end of the world if we lose the 8.3 behavior; but my preference would be that if a WAL switch interval is specified, the WAL files switch at least that often. -Kevin