Thread: XMin Hot Standby Feedback patch

XMin Hot Standby Feedback patch

From
Daniel Farina
Date:
This is another bit of the syncrep patch split out.

I will revisit the replication timeout one Real Soon, I promise -- but
I have a couple things to do today that may delay that until the
evening.

https://github.com/fdr/postgres/commit/ad3ce9ac62f0e128d7d1fd20d47184f867056af1

Context diff supplied here.

Note that this information is not exposed via catalog in the original
syncrep patch, and is not here. Do we want that kind of reporting?

--
fdr

Attachment

Re: XMin Hot Standby Feedback patch

From
Simon Riggs
Date:
On Sat, 2011-02-12 at 14:11 -0800, Daniel Farina wrote:
> This is another bit of the syncrep patch split out.
>
> I will revisit the replication timeout one Real Soon, I promise -- but
> I have a couple things to do today that may delay that until the
> evening.
>
> https://github.com/fdr/postgres/commit/ad3ce9ac62f0e128d7d1fd20d47184f867056af1
>
> Context diff supplied here.

Greg just tipped me off to this thread a few hours ago. I saw your other
work on timeouts which looks good.

I've reworked this feature myself, and its roughly the same thing you
have posted, so I will just add on to this thread. The major change from
my earlier patch is that the logic around setting xmin on the master is
considerably tighter, and correctly uses locking.

Patch attached, no docs yet, but the patch is clear.

I'm looking to commit this in next 24 hours barring objections and/or
test failures.

> Note that this information is not exposed via catalog in the original
> syncrep patch, and is not here. Do we want that kind of reporting?

Probably, but its a small change and will conflict with other work for
now.

--
 Simon Riggs           http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services


Attachment

Re: XMin Hot Standby Feedback patch

From
Heikki Linnakangas
Date:
On 15.02.2011 18:42, Simon Riggs wrote:
> On Sat, 2011-02-12 at 14:11 -0800, Daniel Farina wrote:
>> This is another bit of the syncrep patch split out.
>>
>> I will revisit the replication timeout one Real Soon, I promise -- but
>> I have a couple things to do today that may delay that until the
>> evening.
>>
>> https://github.com/fdr/postgres/commit/ad3ce9ac62f0e128d7d1fd20d47184f867056af1
>>
>> Context diff supplied here.
>
> Greg just tipped me off to this thread a few hours ago. I saw your other
> work on timeouts which looks good.
>
> I've reworked this feature myself, and its roughly the same thing you
> have posted, so I will just add on to this thread. The major change from
> my earlier patch is that the logic around setting xmin on the master is
> considerably tighter, and correctly uses locking.

It would be wise to also transmit the epoch in addition to xmin, to 
avoid confusion if the standby is > 2 billion transactions behind.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: XMin Hot Standby Feedback patch

From
Robert Haas
Date:
On Tue, Feb 15, 2011 at 11:49 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> On 15.02.2011 18:42, Simon Riggs wrote:
>>
>> On Sat, 2011-02-12 at 14:11 -0800, Daniel Farina wrote:
>>>
>>> This is another bit of the syncrep patch split out.
>>>
>>> I will revisit the replication timeout one Real Soon, I promise -- but
>>> I have a couple things to do today that may delay that until the
>>> evening.
>>>
>>>
>>> https://github.com/fdr/postgres/commit/ad3ce9ac62f0e128d7d1fd20d47184f867056af1
>>>
>>> Context diff supplied here.
>>
>> Greg just tipped me off to this thread a few hours ago. I saw your other
>> work on timeouts which looks good.
>>
>> I've reworked this feature myself, and its roughly the same thing you
>> have posted, so I will just add on to this thread. The major change from
>> my earlier patch is that the logic around setting xmin on the master is
>> considerably tighter, and correctly uses locking.
>
> It would be wise to also transmit the epoch in addition to xmin, to avoid
> confusion if the standby is > 2 billion transactions behind.

That case is probably hopelessly broken anyway.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: XMin Hot Standby Feedback patch

From
Heikki Linnakangas
Date:
On 15.02.2011 18:52, Robert Haas wrote:
> On Tue, Feb 15, 2011 at 11:49 AM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com>  wrote:
>> It would be wise to also transmit the epoch in addition to xmin, to avoid
>> confusion if the standby is>  2 billion transactions behind.
>
> That case is probably hopelessly broken anyway.

I don't expect the feedback to do anything too useful in case of xid 
wraparound, but at least the master would recognize the situation and 
know to ignore the bogus xmin from that standby.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: XMin Hot Standby Feedback patch

From
Simon Riggs
Date:
On Tue, 2011-02-15 at 18:49 +0200, Heikki Linnakangas wrote:

> It would be wise to also transmit the epoch in addition to xmin, to 
> avoid confusion if the standby is > 2 billion transactions behind.

Yes, good idea, thanks.

That has to be the record for the fastest patch review. ;-)

-- Simon Riggs           http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services



Re: XMin Hot Standby Feedback patch

From
Robert Haas
Date:
On Tue, Feb 15, 2011 at 11:42 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Patch attached, no docs yet, but the patch is clear.
>
> I'm looking to commit this in next 24 hours barring objections and/or
> test failures.

Looks pretty good to me, though I haven't tested it.  I like some of
the safety valves you put in there, but I don't understand this part:

+                       /*
+                        * Feedback from standby should move us
forwards, but not too far.
+                        * Avoid grabbing XidGenLock here in case of
waits, so use
+                        * a heuristic instead.
+                        */

What's XIDGenLock got to do with it?

On another disk, I think that those warning messages are a bad idea.
That could fill up someone's disk really quickly.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: XMin Hot Standby Feedback patch

From
Robert Haas
Date:
On Tue, Feb 15, 2011 at 12:15 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On another disk, I think that those warning messages are a bad idea.
> That could fill up someone's disk really quickly.

On another disk?  What the heck am I talking about?

On another point?  On another note?  Anyway, you get the idea... hopefully.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: XMin Hot Standby Feedback patch

From
Simon Riggs
Date:
On Tue, 2011-02-15 at 12:20 -0500, Robert Haas wrote:
> On Tue, Feb 15, 2011 at 12:15 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> > On another disk, I think that those warning messages are a bad idea.
> > That could fill up someone's disk really quickly.
> 
> On another disk?  What the heck am I talking about?
> 
> On another point?  On another note?  Anyway, you get the idea... hopefully.

Yeh. I quite liked it as a metaphorical turn of phrase.

-- Simon Riggs           http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services



Re: XMin Hot Standby Feedback patch

From
Simon Riggs
Date:
On Tue, 2011-02-15 at 12:15 -0500, Robert Haas wrote:
> Looks pretty good to me, though I haven't tested it.  I like some of
> the safety valves you put in there, but I don't understand this part

Reworked logic covering all feedback, plus tests, plus docs.

Last comments before commit please.

--
 Simon Riggs           http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services


Attachment

Re: XMin Hot Standby Feedback patch

From
Robert Haas
Date:
On Tue, Feb 15, 2011 at 4:15 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Tue, 2011-02-15 at 12:15 -0500, Robert Haas wrote:
>> Looks pretty good to me, though I haven't tested it.  I like some of
>> the safety valves you put in there, but I don't understand this part
>
> Reworked logic covering all feedback, plus tests, plus docs.
>
> Last comments before commit please.

What happens if someone has hot_standby_feedback on and then turns it
off?  I think in XLogWalRcvSendReply() you need

if (hot_standby_feedback)
{stuff
}
else
{   reply_message.xmin = InvaidXID;   reply_message.epoch = 0;  /* or something */
}

Also this part looks kludgy to me:

+        GetNextXidAndEpoch(&nextXid, &nextEpoch);
+        if (nextXid < reply_message.xmin)
+            nextEpoch--;

How about introducing a GetOldestXminAndEpoch function instead?

Would it make sense to avoid grabbing the ProcArrayLock except when we
truly need to update MyProc->xmin?  ProcessStandbyReplyMessage gets
called a lot...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: XMin Hot Standby Feedback patch

From
Fujii Masao
Date:
On Wed, Feb 16, 2011 at 6:15 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Tue, 2011-02-15 at 12:15 -0500, Robert Haas wrote:
>> Looks pretty good to me, though I haven't tested it.  I like some of
>> the safety valves you put in there, but I don't understand this part
>
> Reworked logic covering all feedback, plus tests, plus docs.
>
> Last comments before commit please.

When I started the standby with hot_standby = off and hot_standby_feedback = on,
I got the following assertion error.

-----------------
sby LOG:  streaming replication successfully connected to primary
TRAP: FailedAssertion("!(((result) >= ((TransactionId) 3)))", File:
"procarray.c", Line: 1027)
act LOG:  unexpected EOF on standby connection
sby LOG:  WAL receiver process (PID 17572) was terminated by signal 6: Aborted
sby LOG:  terminating any other active server processes
-----------------

vacuum_defer_cleanup_age on the *standby* should not affect the
feedback xid.

VACUUM TABLE on the *primary* doesn't use the feedback xid at all.
Is this intentional?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: XMin Hot Standby Feedback patch

From
Simon Riggs
Date:
On Wed, 2011-02-16 at 11:25 +0900, Fujii Masao wrote:
> On Wed, Feb 16, 2011 at 6:15 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> > On Tue, 2011-02-15 at 12:15 -0500, Robert Haas wrote:
> >> Looks pretty good to me, though I haven't tested it.  I like some of
> >> the safety valves you put in there, but I don't understand this part
> >
> > Reworked logic covering all feedback, plus tests, plus docs.
> >
> > Last comments before commit please.
> 
> When I started the standby with hot_standby = off and hot_standby_feedback = on,
> I got the following assertion error.
> 
> -----------------
> sby LOG:  streaming replication successfully connected to primary
> TRAP: FailedAssertion("!(((result) >= ((TransactionId) 3)))", File:
> "procarray.c", Line: 1027)
> act LOG:  unexpected EOF on standby connection
> sby LOG:  WAL receiver process (PID 17572) was terminated by signal 6: Aborted
> sby LOG:  terminating any other active server processes
> -----------------

Thanks

> vacuum_defer_cleanup_age on the *standby* should not affect the
> feedback xid.

Not sure, will think some more.

> VACUUM TABLE on the *primary* doesn't use the feedback xid at all.
> Is this intentional?

Yes, I was in the middle of fixing that.

-- Simon Riggs           http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services