Thread: Hot Standby remaining issues

Hot Standby remaining issues

From
Heikki Linnakangas
Date:
I've put up a wiki page with the issues I see with the patch as it
stands. They're roughly categorized by seriousness.

http://wiki.postgresql.org/wiki/Hot_Standby_TODO

New issues can and probably will still pop up, let's add them to the
list as they're found so that we know what still needs to be done.

You had a list of work items at the hot standby main page, but I believe
it's badly out-of-date. Please move any still relevant items to the
above list, if any.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Hot Standby remaining issues

From
Simon Riggs
Date:
On Wed, 2009-11-25 at 13:00 +0200, Heikki Linnakangas wrote:
> I've put up a wiki page with the issues I see with the patch as it
> stands. They're roughly categorized by seriousness.
> 
> http://wiki.postgresql.org/wiki/Hot_Standby_TODO
> 
> New issues can and probably will still pop up, let's add them to the
> list as they're found so that we know what still needs to be done.
> 
> You had a list of work items at the hot standby main page, but I believe
> it's badly out-of-date. Please move any still relevant items to the
> above list, if any.

I've linked the two pages together and identified the ones I'm currently
working on, plus added a few comments.

-- Simon Riggs           www.2ndQuadrant.com



Re: Hot Standby remaining issues

From
Simon Riggs
Date:
On Wed, 2009-11-25 at 13:00 +0200, Heikki Linnakangas wrote:
> I've put up a wiki page with the issues I see with the patch as it
> stands. They're roughly categorized by seriousness.
>
> http://wiki.postgresql.org/wiki/Hot_Standby_TODO
>
> New issues can and probably will still pop up, let's add them to the
> list as they're found so that we know what still needs to be done.
>
> You had a list of work items at the hot standby main page, but I believe
> it's badly out-of-date. Please move any still relevant items to the
> above list, if any.

4 changes on TODO included here, including all must-fix issues. This is
a combined patch. Will push changes to git also, so each commit is
visible separately.

Lots of wiki comments added.

--
 Simon Riggs           www.2ndQuadrant.com

Attachment

Re: Hot Standby remaining issues

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> @@ -654,10 +656,13 @@ LockAcquire(const LOCKTAG *locktag,
>                  elog(PANIC, "lock table corrupted");
>          }
>          LWLockRelease(partitionLock);
> -        ereport(ERROR,
> -                (errcode(ERRCODE_OUT_OF_MEMORY),
> -                 errmsg("out of shared memory"),
> -          errhint("You might need to increase max_locks_per_transaction.")));
> +        if (reportLockTableError)
> +            ereport(ERROR,
> +                    (errcode(ERRCODE_OUT_OF_MEMORY),
> +                     errmsg("out of shared memory"),
> +              errhint("You might need to increase max_locks_per_transaction.")));
> +        else
> +            return LOCKACQUIRE_NOT_AVAIL;
>      }
>      locallock->proclock = proclock;
>  

That seems dangerous when dontWait==false.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Hot Standby remaining issues

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> commit 02c3eadb766201db084b668daa271db4a900adc9
> Author: Simon Riggs <sriggs@ebony.(none)>
> Date:   Sat Nov 28 06:23:33 2009 +0000
> 
>     Added wal_standby_info GUC to turn RM_STANDBY_ID messages on/off.
>     Various comments added also.
> 

This patch makes it unsafe to start hot standby mode from a shutdown
checkpoint, because we don't know if wal_standby_info was enabled in the
master.

I still don't think we need the GUC. But for future-proofing, perhaps we
should add a flag to shutdown checkpoint records, indicating whether
it's safe to start hot standby from it. That way, if we decide to add a
GUC like that at a later stage, we don't need to change the on-disk format.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Hot Standby remaining issues

From
Heikki Linnakangas
Date:
Heikki Linnakangas wrote:
> Simon Riggs wrote:
>> @@ -654,10 +656,13 @@ LockAcquire(const LOCKTAG *locktag,
>>                  elog(PANIC, "lock table corrupted");
>>          }
>>          LWLockRelease(partitionLock);
>> -        ereport(ERROR,
>> -                (errcode(ERRCODE_OUT_OF_MEMORY),
>> -                 errmsg("out of shared memory"),
>> -          errhint("You might need to increase max_locks_per_transaction.")));
>> +        if (reportLockTableError)
>> +            ereport(ERROR,
>> +                    (errcode(ERRCODE_OUT_OF_MEMORY),
>> +                     errmsg("out of shared memory"),
>> +              errhint("You might need to increase max_locks_per_transaction.")));
>> +        else
>> +            return LOCKACQUIRE_NOT_AVAIL;
>>      }
>>      locallock->proclock = proclock;
>>  
> 
> That seems dangerous when dontWait==false.

Ah, I see now that you're only setting reportLockTableError just before
you call LockAcquire, and reset it afterwards. It's safe then, but it
should rather be another argument to the function, as how the global
variable is really being used.

The patch doesn't actually fix the issue it was supposed to fix. If a
read-only transaction holds a lot of locks, consuming so much lock space
that there's none left for the startup process to hold the lock it
wants, it will abort and bring down postmaster. The patch attempts to
kill any conflicting lockers, but those are handled fine already (if
there's any conflicting locks, LockAcquire will return
LOCKACQUIRE_NOT_AVAIL anyway). The problem is with non-conflicting locks
using up the lock space.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Hot Standby remaining issues

From
Simon Riggs
Date:
On Wed, 2009-12-02 at 12:49 +0200, Heikki Linnakangas wrote:

> If a read-only transaction holds a lot of locks, consuming so much
> lock space that there's none left for the startup process to hold the
> lock it wants, it will abort and bring down postmaster. The patch
> attempts to kill any conflicting lockers, but those are handled fine
> already (if there's any conflicting locks, LockAcquire will return
> LOCKACQUIRE_NOT_AVAIL anyway). The problem is with non-conflicting
> locks using up the lock space.

Oh dear, another "nuke 'em all from orbit" scenario. Will do.

-- Simon Riggs           www.2ndQuadrant.com



Re: Hot Standby remaining issues

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> On Wed, 2009-12-02 at 12:49 +0200, Heikki Linnakangas wrote:
> 
>> If a read-only transaction holds a lot of locks, consuming so much
>> lock space that there's none left for the startup process to hold the
>> lock it wants, it will abort and bring down postmaster. The patch
>> attempts to kill any conflicting lockers, but those are handled fine
>> already (if there's any conflicting locks, LockAcquire will return
>> LOCKACQUIRE_NOT_AVAIL anyway). The problem is with non-conflicting
>> locks using up the lock space.
> 
> Oh dear, another "nuke 'em all from orbit" scenario. Will do.

Yeah. This case is much like the OOM killer on Linux. Not really "nuke
'em all" but "nuke someone, don't care who"..

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Hot Standby remaining issues

From
Simon Riggs
Date:
On Tue, 2009-12-01 at 20:26 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > commit 02c3eadb766201db084b668daa271db4a900adc9
> > Author: Simon Riggs <sriggs@ebony.(none)>
> > Date:   Sat Nov 28 06:23:33 2009 +0000
> > 
> >     Added wal_standby_info GUC to turn RM_STANDBY_ID messages on/off.
> >     Various comments added also.
> > 
> 
> This patch makes it unsafe to start hot standby mode from a shutdown
> checkpoint, because we don't know if wal_standby_info was enabled in the
> master.

> I still don't think we need the GUC. 

If that's a good plan we can remove it in late beta. Let's keep it there
for now.

> But for future-proofing, perhaps we
> should add a flag to shutdown checkpoint records, indicating whether
> it's safe to start hot standby from it. That way, if we decide to add a
> GUC like that at a later stage, we don't need to change the on-disk format.

OK, I understand this now. Taken me a while, even though obvious now I
see.

-- Simon Riggs           www.2ndQuadrant.com



Re: Hot Standby remaining issues

From
Simon Riggs
Date:
On Wed, 2009-12-02 at 16:41 +0000, Simon Riggs wrote:
> On Tue, 2009-12-01 at 20:26 +0200, Heikki Linnakangas wrote:
> > Simon Riggs wrote:
> > > commit 02c3eadb766201db084b668daa271db4a900adc9
> > > Author: Simon Riggs <sriggs@ebony.(none)>
> > > Date:   Sat Nov 28 06:23:33 2009 +0000
> > > 
> > >     Added wal_standby_info GUC to turn RM_STANDBY_ID messages on/off.
> > >     Various comments added also.
> > > 
> > 
> > This patch makes it unsafe to start hot standby mode from a shutdown
> > checkpoint, because we don't know if wal_standby_info was enabled in the
> > master.

Hmm, what happens if someone enables wal_standby_info in postgresql.conf
while the server is shutdown. It would still be a valid starting point
in that case. I'll just make a note, I think.

-- Simon Riggs           www.2ndQuadrant.com



Re: Hot Standby remaining issues

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> Hmm, what happens if someone enables wal_standby_info in postgresql.conf
> while the server is shutdown. It would still be a valid starting point
> in that case.

Yeah, true.

> I'll just make a note, I think.

Yeah, a manual (or automatic, if you just wait) checkpoint will produce
a new checkpoint record showing that it's safe to start standby again.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Hot Standby remaining issues

From
Heikki Linnakangas
Date:
Regarding this item from the wiki page:
> The "standby delay" is measured as current timestamp - timestamp of last replayed commit record. If there's little
activityin the master, that can lead to surprising results. For example, imagine that max_standby_delay is set to 8
hours.The standby is fully up-to-date with the master, and there's no write activity in master. After 10 hours, a long
reportingquery is started in the standby. Ten minutes later, a small transaction is executed in the master that
conflictswith the reporting query. I would expect the reporting query to be canceled 8 hours after the conflicting
transactionbegan, but it is in fact canceled immediately, because it's over 8 hours since the last commit record was
replayed.
> 
>     * Simon says... changed to allow checkpoints to update recoveryLastXTime (Simon DONE) 

Update recoveryLastXTime at checkpoints doesn't help when the master is
completely idle, because we skip checkpoints in that case. It's better
than nothing, of course.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Hot Standby remaining issues

From
Simon Riggs
Date:
On Fri, 2009-12-04 at 10:37 +0200, Heikki Linnakangas wrote:
> Regarding this item from the wiki page:
> > The "standby delay" is measured as current timestamp - timestamp of last replayed commit record. If there's little
activityin the master, that can lead to surprising results. For example, imagine that max_standby_delay is set to 8
hours.The standby is fully up-to-date with the master, and there's no write activity in master. After 10 hours, a long
reportingquery is started in the standby. Ten minutes later, a small transaction is executed in the master that
conflictswith the reporting query. I would expect the reporting query to be canceled 8 hours after the conflicting
transactionbegan, but it is in fact canceled immediately, because it's over 8 hours since the last commit record was
replayed.
> > 
> >     * Simon says... changed to allow checkpoints to update recoveryLastXTime (Simon DONE) 
> 
> Update recoveryLastXTime at checkpoints doesn't help when the master is
> completely idle, because we skip checkpoints in that case. It's better
> than nothing, of course.

Not if archive_timeout is set, which it would be in warm standby case.
We can do even better than this with SR.

-- Simon Riggs           www.2ndQuadrant.com



Re: Hot Standby remaining issues

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> On Fri, 2009-12-04 at 10:37 +0200, Heikki Linnakangas wrote:
>> Regarding this item from the wiki page:
>>> The "standby delay" is measured as current timestamp - timestamp of last replayed commit record. If there's little
activityin the master, that can lead to surprising results. For example, imagine that max_standby_delay is set to 8
hours.The standby is fully up-to-date with the master, and there's no write activity in master. After 10 hours, a long
reportingquery is started in the standby. Ten minutes later, a small transaction is executed in the master that
conflictswith the reporting query. I would expect the reporting query to be canceled 8 hours after the conflicting
transactionbegan, but it is in fact canceled immediately, because it's over 8 hours since the last commit record was
replayed.
>>>
>>>     * Simon says... changed to allow checkpoints to update recoveryLastXTime (Simon DONE) 
>> Update recoveryLastXTime at checkpoints doesn't help when the master is
>> completely idle, because we skip checkpoints in that case. It's better
>> than nothing, of course.
> 
> Not if archive_timeout is set, which it would be in warm standby case.
> We can do even better than this with SR.

If the system is completely idle, and no WAL is written, we skip xlog
switches too, even if archive_timeout is set . It would be pointless to
create a stream of WAL files with no content except for the XLOG_SWITCH
records.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Hot Standby remaining issues

From
"Kevin Grittner"
Date:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
> If the system is completely idle, and no WAL is written, we skip
> xlog switches too, even if archive_timeout is set . It would be
> pointless to create a stream of WAL files with no content except
> for the XLOG_SWITCH records.
It's not pointless if you want to monitor that your backup system is
healthy.  This was previously mentioned a couple years ago:
http://archives.postgresql.org/pgsql-general/2007-10/msg01448.php
It turns out that it's been working fine under 8.3.  Of course, we
can always add a crontab job to do some small bogus update to force
WAL switches, so it's not the end of the world if we lose the 8.3
behavior; but my preference would be that if a WAL switch interval
is specified, the WAL files switch at least that often.
-Kevin