Thread: Hot Standby on git

Hot Standby on git

From
Simon Riggs
Date:
Just a note to say that Hot Standby patch is now on git repository
 git://git.postgresql.org/git/users/simon/postgres
Branch name: hot_standby

The complete contents of that repository are BSD licenced contributions
to the PostgreSQL project.

Any further changes to that will be by agreement here on hackers. From
now, I will be submitting each individual change as patch-on-patch to
allow people to see and discuss them and to confirm them as open source
contributions. I request anybody else interested to do the same to allow
us to work together. All contributions welcome.

My record of agreed changes is here
http://wiki.postgresql.org/wiki/Hot_Standby#Remaining_Work_Items

You'll notice that I've already completed 8 changes (10 commits); those
are all fairly minor changes, so submitted here as a combined patch.
There are 9 pending changes, so far, none of which appear to be major
obstacles to resolve. Many thanks to Heikki for a thorough review which
has identified nearly all of those change requests.

I estimate that making the remaining changes noted on the Wiki and fully
testing them will take at least 2 weeks. Gabriele Bartolini is assisting
in this area, though neither of us are able to work full time on this.
We still have ample time to complete the project in this release.

Many thanks to Magnus and Aidan for helping me resolve my git-wrestling
contest and apologies for the delay while that bout happened.

--
 Simon Riggs           www.2ndQuadrant.com

Attachment

Re: Hot Standby on git

From
Robert Haas
Date:
On Sat, Sep 26, 2009 at 5:49 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>
> Just a note to say that Hot Standby patch is now on git repository
>  git://git.postgresql.org/git/users/simon/postgres
> Branch name: hot_standby

Awesome!  Thanks for taking the time to get this set up.

> The complete contents of that repository are BSD licenced contributions
> to the PostgreSQL project.

Excellent...

> Any further changes to that will be by agreement here on hackers. From
> now, I will be submitting each individual change as patch-on-patch to
> allow people to see and discuss them and to confirm them as open source
> contributions.

Sounds good.

> I request anybody else interested to do the same to allow
> us to work together. All contributions welcome.
>
> My record of agreed changes is here
> http://wiki.postgresql.org/wiki/Hot_Standby#Remaining_Work_Items
>
> You'll notice that I've already completed 8 changes (10 commits); those
> are all fairly minor changes, so submitted here as a combined patch.
> There are 9 pending changes, so far, none of which appear to be major
> obstacles to resolve. Many thanks to Heikki for a thorough review which
> has identified nearly all of those change requests.
>
> I estimate that making the remaining changes noted on the Wiki and fully
> testing them will take at least 2 weeks. Gabriele Bartolini is assisting
> in this area, though neither of us are able to work full time on this.
> We still have ample time to complete the project in this release.

Sounds like you are making rapid progress.  If you think there's
something useful I could do, let me know and I'll take a look.

...Robert


Re: Hot Standby on git

From
Simon Riggs
Date:
On Sat, 2009-09-26 at 09:29 -0400, Robert Haas wrote:
> > I estimate that making the remaining changes noted on the Wiki and
> fully
> > testing them will take at least 2 weeks. Gabriele Bartolini is assisting
> > in this area, though neither of us are able to work full time on this.
> > We still have ample time to complete the project in this release.
> 
> Sounds like you are making rapid progress.  

Only because Heikki has put so much effort into review and because most
of the remaining issues are coding related, rather than theoretical.

We moving along and are in no way threatened by time.

> If you think there's
> something useful I could do, let me know and I'll take a look.

I feel like I need a better way of unit testing new code. Some of the
code in the patch is to handle corner cases, so recreating them is
fairly hard. It is a nagging feeling that I am missing some knowledge
here and would welcome some insight, or research, into better ways of
doing general case unit testing.

-- Simon Riggs           www.2ndQuadrant.com



Re: Hot Standby on git

From
Mark Mielke
Date:
On 09/26/2009 10:04 AM, Simon Riggs wrote:
>> If you think there's
>> something useful I could do, let me know and I'll take a look.
>>      
> I feel like I need a better way of unit testing new code. Some of the
> code in the patch is to handle corner cases, so recreating them is
> fairly hard. It is a nagging feeling that I am missing some knowledge
> here and would welcome some insight, or research, into better ways of
> doing general case unit testing.
>    

You might try and steal ideas from EasyMock / PowerMock - but not sure 
how well the ideas map to C.

Generally it means allowing the functions to be called from a "mock" 
environment, where subroutine calls that might be called are stubbed out 
to return sample data that would simulate your scenario. Object oriented 
languages that require every object to provide an interface where most 
object methods can be overridden are more ideal for performing this sort 
of test.

I rarely ever see this sort of stuff in FOSS projects, and never that I 
can remember in FOSS C projects. It's not easy, though.

I assume you are doing it through code changing right now. Commenting 
out lines, replacing them with others, etc?

Cheers,
mark

-- 
Mark Mielke<mark@mielke.cc>



Re: Hot Standby on git

From
Dan Colish
Date:
On Sat, Sep 26, 2009 at 10:45:17AM -0400, Mark Mielke wrote:
> On 09/26/2009 10:04 AM, Simon Riggs wrote:
> >>If you think there's
> >>something useful I could do, let me know and I'll take a look.
> >I feel like I need a better way of unit testing new code. Some of the
> >code in the patch is to handle corner cases, so recreating them is
> >fairly hard. It is a nagging feeling that I am missing some knowledge
> >here and would welcome some insight, or research, into better ways of
> >doing general case unit testing.
> 
> You might try and steal ideas from EasyMock / PowerMock - but not
> sure how well the ideas map to C.
> 
> Generally it means allowing the functions to be called from a "mock"
> environment, where subroutine calls that might be called are stubbed
> out to return sample data that would simulate your scenario. Object
> oriented languages that require every object to provide an interface
> where most object methods can be overridden are more ideal for
> performing this sort of test.
> 
> I rarely ever see this sort of stuff in FOSS projects, and never
> that I can remember in FOSS C projects. It's not easy, though.
> 
> I assume you are doing it through code changing right now.
> Commenting out lines, replacing them with others, etc?
> 
> Cheers,
> mark
> 
> -- 
> Mark Mielke<mark@mielke.cc>
> 
> 

There are a variety of projects dedicated to creating C unit test
frameworks. I don't have a lot of experience with them, but I have heard
good things about check and cunit. Here's a link I found with a longer
list of frameworks. http://www.opensourcetesting.org/unit_c.php

--
--Dan


Re: Hot Standby on git

From
Josh Berkus
Date:
> I feel like I need a better way of unit testing new code. Some of the
> code in the patch is to handle corner cases, so recreating them is
> fairly hard. It is a nagging feeling that I am missing some knowledge
> here and would welcome some insight, or research, into better ways of
> doing general case unit testing.

There's always pgtap.  Whenever we find a new corner case, we add it to
the development test suite.

-- 
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com


Re: Hot Standby on git

From
Mark Mielke
Date:
On 09/26/2009 02:28 PM, Dan Colish wrote:
> There are a variety of projects dedicated to creating C unit test
> frameworks. I don't have a lot of experience with them, but I have heard
> good things about check and cunit. Here's a link I found with a longer
> list of frameworks. http://www.opensourcetesting.org/unit_c.php
>    

Looking at check and cunit - I don't see what sort of mock function 
facility they would provide? One part of unit testing is arranging for 
functions to be called, tested, and results reported on. This can take 
you a certain amount of the way. "Pure" functions, for example, that 
always generate the same output for the same input parameters, are 
perfect for this situation. Perhaps test how a qsort() or bsearch() 
method works under various scenarios?

Most real life code gets a little more complicated. For example, what if 
we want to simulate a network failure or "out of disk space" condition? 
What if we want to test out what happens when the Y2038 date is reached? 
This requires either complex test case setup that is difficult to run 
reproducibly, or another approach - "mock". It means doing things like 
overriding the write() method, and making it return successful N times, 
and then failing on the (N+1)th time with ENOSPC. It means overriding 
the gettimeofday() method to return a time in the future. A major 
benefit of this sort of testing is that it should not require source 
changes in order to perform the test. This sort of stuff is a LOT easier 
to do in OO languages. I see it done in Java a lot. I can't remember 
ever having seen it done in C. I think it's just too hard compared to 
the value obtained from the effort.

In your list above, it does show a few attempts - CMock sticks out as a 
for example. It looks more complicated, though. It takes a .h file and 
generates stubs for you to fill in? That could be difficult to manage 
for a large project with thousands or many times more unit tests. OO is 
easier because you can override *only* particular methods, and you can 
safely call the super method that it overrides to provide the underlying 
behaviour in the success cases.

Cheers,
mark

-- 
Mark Mielke<mark@mielke.cc>



Re: Hot Standby on git

From
"David E. Wheeler"
Date:
On Sep 26, 2009, at 12:33 PM, Josh Berkus wrote:

> There's always pgtap.  Whenever we find a new corner case, we add it  
> to
> the development test suite.

Also, for C TAP, there's [libtap](http://jc.ngo.org.uk/trac-bin/trac.cgi/wiki/LibTap 
). You can then use `prove` which you likely already have on your  
system, to harness the test output.

Best,

David


Re: Hot Standby on git

From
Alvaro Herrera
Date:
Mark Mielke escribió:

> Most real life code gets a little more complicated. For example,
> what if we want to simulate a network failure or "out of disk space"
> condition? What if we want to test out what happens when the Y2038
> date is reached? This requires either complex test case setup that
> is difficult to run reproducibly, or another approach - "mock". It
> means doing things like overriding the write() method, and making it
> return successful N times, and then failing on the (N+1)th time with
> ENOSPC.

I remember a kernel simulator based on User-Mode Linux called UMLsim,
with which you could stuff like this.  It's dead at this point though,
and I don't know if there's any replacement.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: Hot Standby on git

From
Heikki Linnakangas
Date:
Per Simon's request, for the benefit of the archive, here's all the
changes I've done on the patch since he posted the initial version for
review for this commitfest as incremental patches. This is extracted
from my git repository at
git://git.postgresql.org/git/users/heikki/postgres.git.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

Attachment

Re: Hot Standby on git

From
Heikki Linnakangas
Date:
Heikki Linnakangas wrote:
> Per Simon's request, for the benefit of the archive, here's all the
> changes I've done on the patch since he posted the initial version for
> review for this commitfest as incremental patches. This is extracted
> from my git repository at
> git://git.postgresql.org/git/users/heikki/postgres.git.

Further fixes extracted from above repository attached..

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

Attachment

Re: Hot Standby on git

From
Heikki Linnakangas
Date:
Looking at the changes to StartupMultiXact, you're changing the locking
so that both MultiXactOffsetControlLock and MultiXactMemberControlLock
are acquire first before changing anything. Why? Looking at the other
functions in that file, all others that access both files are happy to
acquire one lock at a time, like StartupMultiXact does without the patch.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Hot Standby on git

From
Heikki Linnakangas
Date:
Looking at the changes to StartupMultiXact, you're changing the locking
so that both MultiXactOffsetControlLock and MultiXactMemberControlLock
are acquired first before changing anything. Why? Looking at the other
functions in that file, all others that access both files are happy to
acquire one lock at a time, like StartupMultiXact does without the patch.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Hot Standby on git

From
Heikki Linnakangas
Date:
Regarding this in InitStandbyDelayTimers:
+   /*
+    * If replication delay is enormously huge, just treat that as
+    * zero and work up from there. This prevents us from acting
+    * foolishly when replaying old log files.
+    */
+   if (*currentDelay_ms < 0)
+       *currentDelay_ms = 0;
+

So we're treating restoring from an old backup the same as an up-to-datestandby server. If you're restoring from say a
monthold base backup
 
with WAL archive up to present day, and have max_standby_delay set to
say 5 seconds, the server will wait for that 5 seconds on each
conflicting query before killing it. Until it reaches the point in the
archive where the delay is less than INT_MAX/1000 seconds old: at that
point it switches into "oh my goodness, we've fallen badly behind, let's
try to catch up ASAP and kill any queries that get into the way" mode.
That's pretty surprising behavior, and not documented either. I propose
we simply remove the above check (fixing the rest of the code so that
you don't hit integer overflows), and always respect max_standby_delay.

BTW, I wonder if should warn or something if we find that the timestamps
in the archive are in the future? IOW, if either the master's or the
standby's clock is not set correctly.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Hot Standby on git

From
Heikki Linnakangas
Date:
+   /*
+    * If our initial RunningXactData had an overflowed snapshot then we
+    * knew we were missing some subxids from our snapshot. We can use
+    * this data as an initial snapshot, but we cannot yet mark it valid.
+    * We know that the missing subxids are equal to or earlier than
+    * LatestRunningXid. After we initialise we continue to apply changes
+    * during recovery, so once the oldestRunningXid is later than the
+    * initLatestRunningXid we can now prove that we no longer have
+    * missing information and can mark the snapshot as valid.
+    */
+   if (initRunningXactData && !recoverySnapshotValid)
+   {
+       if (TransactionIdPrecedes(initLatestRunningXid,
xlrec->oldestRunningXid)
+       {
+           recoverySnapshotValid = true;
+           elog(trace_recovery(DEBUG2),
+                   "running xact data now proven complete");
+           elog(trace_recovery(DEBUG2),
+                   "recovery snapshots are now enabled");
+       }
+       return;
+   }
+

When GetRunningXactData() calculates latestRunningXid in the master,
which is stored in initLatestRunningXid in the standby, it only looks at
xids and subxids present in the procarray. It doesn't take into account
overflowed subxids. I think we could declare a recovery snapshot "proven
complete" too early because of that.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Hot Standby on git

From
Simon Riggs
Date:
On Wed, 2009-09-30 at 18:45 +0300, Heikki Linnakangas wrote:
> Regarding this in InitStandbyDelayTimers:
> +   /*
> +    * If replication delay is enormously huge, just treat that as
> +    * zero and work up from there. This prevents us from acting
> +    * foolishly when replaying old log files.
> +    */
> +   if (*currentDelay_ms < 0)
> +       *currentDelay_ms = 0;
> +
> 
> So we're treating restoring from an old backup the same as an up-to-date
>  standby server. If you're restoring from say a month old base backup
> with WAL archive up to present day, and have max_standby_delay set to
> say 5 seconds, the server will wait for that 5 seconds on each
> conflicting query before killing it. Until it reaches the point in the
> archive where the delay is less than INT_MAX/1000 seconds old: at that
> point it switches into "oh my goodness, we've fallen badly behind, let's
> try to catch up ASAP and kill any queries that get into the way" mode.
> That's pretty surprising behavior, and not documented either. I propose
> we simply remove the above check (fixing the rest of the code so that
> you don't hit integer overflows), and always respect max_standby_delay.

Agreed.

I will docuemnt the recommendation to set max_standby_delay = 0 if
performing an archive recovery (and explain why).

> BTW, I wonder if should warn or something if we find that the timestamps
> in the archive are in the future? IOW, if either the master's or the
> standby's clock is not set correctly.

Something similar was just spotted by a client. You can set a
recovery_target_timestamp that is before the pg_stop_recovery()
timestamp and it doesn't complain. Will fix.

Not sure if I like the sound of a system moaning at me about the clock
settings. Perhaps just once when it starts, when we read control file.

-- Simon Riggs           www.2ndQuadrant.com



Re: Hot Standby on git

From
Simon Riggs
Date:
On Thu, 2009-10-01 at 14:29 +0300, Heikki Linnakangas wrote:
> +   /*
> +    * If our initial RunningXactData had an overflowed snapshot then we
> +    * knew we were missing some subxids from our snapshot. We can use
> +    * this data as an initial snapshot, but we cannot yet mark it valid.
> +    * We know that the missing subxids are equal to or earlier than
> +    * LatestRunningXid. After we initialise we continue to apply changes
> +    * during recovery, so once the oldestRunningXid is later than the
> +    * initLatestRunningXid we can now prove that we no longer have
> +    * missing information and can mark the snapshot as valid.
> +    */
> +   if (initRunningXactData && !recoverySnapshotValid)
> +   {
> +       if (TransactionIdPrecedes(initLatestRunningXid,
> xlrec->oldestRunningXid)
> +       {
> +           recoverySnapshotValid = true;
> +           elog(trace_recovery(DEBUG2),
> +                   "running xact data now proven complete");
> +           elog(trace_recovery(DEBUG2),
> +                   "recovery snapshots are now enabled");
> +       }
> +       return;
> +   }
> +
> 
> When GetRunningXactData() calculates latestRunningXid in the master,
> which is stored in initLatestRunningXid in the standby, it only looks at
> xids and subxids present in the procarray. It doesn't take into account
> overflowed subxids. I think we could declare a recovery snapshot "proven
> complete" too early because of that.

Hmm, yes. ISTM that I'm still calculating latestRunningXid the old way
while assuming it is calculated the new way. The new way is just to grab
nextXid since we have XidGenLock and do TransactionIdRetreat() on it.

-- Simon Riggs           www.2ndQuadrant.com



Re: Hot Standby on git

From
Simon Riggs
Date:
On Wed, 2009-09-30 at 09:33 +0300, Heikki Linnakangas wrote:

> Looking at the changes to StartupMultiXact, you're changing the locking
> so that both MultiXactOffsetControlLock and MultiXactMemberControlLock
> are acquire first before changing anything. Why? Looking at the other
> functions in that file, all others that access both files are happy to
> acquire one lock at a time, like StartupMultiXact does without the patch.

I think those changes are just paranoia from early versions of patch.

We now know for certain that MultiXact isn't used at the time
StartupMultiXact() is called. The same isn't true for StartupClog() and
StartupSubtrans() which need to cope with concurrent callers.

Will remove changes and document that nothing touching it when it runs.

-- Simon Riggs           www.2ndQuadrant.com



Re: Hot Standby on git

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> On Wed, 2009-09-30 at 09:33 +0300, Heikki Linnakangas wrote:
>
>> Looking at the changes to StartupMultiXact, you're changing the locking
>> so that both MultiXactOffsetControlLock and MultiXactMemberControlLock
>> are acquire first before changing anything. Why? Looking at the other
>> functions in that file, all others that access both files are happy to
>> acquire one lock at a time, like StartupMultiXact does without the patch.
>
> I think those changes are just paranoia from early versions of patch.
>
> We now know for certain that MultiXact isn't used at the time
> StartupMultiXact() is called. The same isn't true for StartupClog() and
> StartupSubtrans() which need to cope with concurrent callers.
>
> Will remove changes and document that nothing touching it when it runs.

Thanks, I reverted that in my working version already. Comment patch
welcome if you feel it's needed.

Attached is a new batch of changes I've been doing since last batch.
These are again extracted from my git repository. It includes some of
the add-on patches from your repository, the rest I believe have I had
already did myself earlier, or are not necessary anymore for other reasons.

Could you look into these two TODO items you listed on the wiki page:
- Correct SET default_transaction_read_only and SET
transaction_read_only (Heikki 21/9 Hackers)
- Shutdown checkpoints must not clear locks for prepared transactions
(Heikki 23/9)

And if you could please review the changes I've been doing, just to make
sure I haven't inadvertently introduced new bugs. That has happened
before, as you've rightfully reminded me :-).

There's also the issue that you can't go into hot standby mode after a
shutdown checkpoint. I think that really should be fixed, it's just
weird from a usability point of view if it doesn't work.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

Attachment

Re: Hot Standby on git

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> Hmm, yes. ISTM that I'm still calculating latestRunningXid the old way
> while assuming it is calculated the new way. The new way is just to grab
> nextXid since we have XidGenLock and do TransactionIdRetreat() on it.

Ok, good, that's what I thought too. I'll fix that.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Hot Standby on git

From
Simon Riggs
Date:
On Thu, 2009-10-01 at 18:47 +0300, Heikki Linnakangas wrote:
> Could you look into these two TODO items you listed on the wiki page:

Unless we agree otherwise, if its listed on the Wiki page then I will
work on it. 

Maybe not as when you might like it, but I am working through the list.
5 new changes pushed just minutes ago, sans full testing.

Yes, will review your changes also.

-- Simon Riggs           www.2ndQuadrant.com



Re: Hot Standby on git

From
Simon Riggs
Date:
On Thu, 2009-10-01 at 18:48 +0300, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > Hmm, yes. ISTM that I'm still calculating latestRunningXid the old way
> > while assuming it is calculated the new way. The new way is just to grab
> > nextXid since we have XidGenLock and do TransactionIdRetreat() on it.
> 
> Ok, good, that's what I thought too. I'll fix that.

OK, not working on that, so go ahead. Thanks.

-- Simon Riggs           www.2ndQuadrant.com



Re: Hot Standby on git

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> @@ -7061,6 +7061,15 @@ ShutdownXLOG(int code, Datum arg)
>     else
>     {
>         /*
> +        * Take a snapshot of running transactions and write this to WAL.
> +        * This allows us to reconstruct the state of running transactions
> +        * during archive recovery, if required. We do this even if we are
> +        * not archiving, to allow a cold physical backup of the server to
> +        * be useful as a read only standby.
> +        */
> +       GetRunningTransactionData();
> +
> +       /*
>          * If archiving is enabled, rotate the last XLOG file so that all the
>          * remaining records are archived (postmaster wakes up the archiver
>          * process one more time at the end of shutdown). The checkpoint
> 

I don't think this will do any good where it's placed. The checkpoint
that follows will have its redo-pointer beyond the running-xacts record,
so WAL replay will never see it.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Hot Standby on git

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> I will docuemnt the recommendation to set max_standby_delay = 0 if
> performing an archive recovery (and explain why).

Hmm, not sure if that's such a good piece of advice either. It will mean
waiting for queries forever, which probably isn't what you want if
you're performing archive recovery. Or maybe it is? Maybe -1? I guess it
depends on the situation...

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Hot Standby on git

From
Simon Riggs
Date:
On Fri, 2009-10-02 at 10:04 +0300, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > @@ -7061,6 +7061,15 @@ ShutdownXLOG(int code, Datum arg)
> >     else
> >     {
> >         /*
> > +        * Take a snapshot of running transactions and write this to WAL.
> > +        * This allows us to reconstruct the state of running transactions
> > +        * during archive recovery, if required. We do this even if we are
> > +        * not archiving, to allow a cold physical backup of the server to
> > +        * be useful as a read only standby.
> > +        */
> > +       GetRunningTransactionData();
> > +
> > +       /*
> >          * If archiving is enabled, rotate the last XLOG file so that all the
> >          * remaining records are archived (postmaster wakes up the archiver
> >          * process one more time at the end of shutdown). The checkpoint
> > 
> 
> I don't think this will do any good where it's placed. The checkpoint
> that follows will have its redo-pointer beyond the running-xacts record,
> so WAL replay will never see it.

Perhaps we need two entries then to cover multiple use cases?

The placement of this was specifically chosen so that it is the last
entry before the log switch, so that the runningxact record would be
archived. 

Yes, we also need one after the shutdown checkpoint to cover the case
where the whole data directory is copied after shutdown. The comments
matched the latter case but the position addressed the first case, so it
looks like I was confused as to which case I was addressing.

Have updated code to do both. See what you think. Thanks.

-- Simon Riggs           www.2ndQuadrant.com



Re: Hot Standby on git

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> On Fri, 2009-10-02 at 10:04 +0300, Heikki Linnakangas wrote:
>> Simon Riggs wrote:
>>> @@ -7061,6 +7061,15 @@ ShutdownXLOG(int code, Datum arg)
>>>     else
>>>     {
>>>         /*
>>> +        * Take a snapshot of running transactions and write this to WAL.
>>> +        * This allows us to reconstruct the state of running transactions
>>> +        * during archive recovery, if required. We do this even if we are
>>> +        * not archiving, to allow a cold physical backup of the server to
>>> +        * be useful as a read only standby.
>>> +        */
>>> +       GetRunningTransactionData();
>>> +
>>> +       /*
>>>          * If archiving is enabled, rotate the last XLOG file so that all the
>>>          * remaining records are archived (postmaster wakes up the archiver
>>>          * process one more time at the end of shutdown). The checkpoint
>>>
>> I don't think this will do any good where it's placed. The checkpoint
>> that follows will have its redo-pointer beyond the running-xacts record,
>> so WAL replay will never see it.
> 
> Perhaps we need two entries then to cover multiple use cases?
> 
> The placement of this was specifically chosen so that it is the last
> entry before the log switch, so that the runningxact record would be
> archived. 
> 
> Yes, we also need one after the shutdown checkpoint to cover the case
> where the whole data directory is copied after shutdown. The comments
> matched the latter case but the position addressed the first case, so it
> looks like I was confused as to which case I was addressing.
> 
> Have updated code to do both. See what you think. Thanks.

It seems dangerous to write a WAL record after the shutdown checkpoint.
It will be overwritten by subsequent startup, which is a recipe for trouble.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Hot Standby on git

From
Simon Riggs
Date:
On Fri, 2009-10-02 at 10:32 +0300, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > I will docuemnt the recommendation to set max_standby_delay = 0 if
> > performing an archive recovery (and explain why).
> 
> Hmm, not sure if that's such a good piece of advice either. It will mean
> waiting for queries forever, which probably isn't what you want if
> you're performing archive recovery. Or maybe it is? Maybe -1? I guess it
> depends on the situation...

That assumes that the purpose of the archive recovery is more important
than running queries. As you say, it would mean always waiting. But the
beauty is that you *can* run queries to determine when to stop, so
having them cancelled defeats that purpose.

-- Simon Riggs           www.2ndQuadrant.com



Re: Hot Standby on git

From
Simon Riggs
Date:
On Fri, 2009-10-02 at 10:43 +0300, Heikki Linnakangas wrote:

> It seems dangerous to write a WAL record after the shutdown checkpoint.
> It will be overwritten by subsequent startup, which is a recipe for trouble.

I've said its a corner case and not worth spending time on. I'm putting
it in at your request. If it's not correct before and not correct after,
where exactly do you want it?

-- Simon Riggs           www.2ndQuadrant.com



Re: Hot Standby on git

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> On Fri, 2009-10-02 at 10:43 +0300, Heikki Linnakangas wrote:
> 
>> It seems dangerous to write a WAL record after the shutdown checkpoint.
>> It will be overwritten by subsequent startup, which is a recipe for trouble.
> 
> I've said its a corner case and not worth spending time on. I'm putting
> it in at your request. If it's not correct before and not correct after,
> where exactly do you want it?

I don't know. Perhaps it should go between the REDO pointer of the
shutdown checkpoint and the checkpoint record itself. Or maybe the
information should be included in the checkpoint record itself.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Hot Standby on git

From
Simon Riggs
Date:
On Fri, 2009-10-02 at 11:26 +0300, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > On Fri, 2009-10-02 at 10:43 +0300, Heikki Linnakangas wrote:
> > 
> >> It seems dangerous to write a WAL record after the shutdown checkpoint.
> >> It will be overwritten by subsequent startup, which is a recipe for trouble.
> > 
> > I've said its a corner case and not worth spending time on. I'm putting
> > it in at your request. If it's not correct before and not correct after,
> > where exactly do you want it?
> 
> I don't know. Perhaps it should go between the REDO pointer of the
> shutdown checkpoint and the checkpoint record itself. 

That would seem minimally invasive approach and would appear to work for
both cases. Will do.

-- Simon Riggs           www.2ndQuadrant.com



Re: Hot Standby on git

From
Simon Riggs
Date:
On Fri, 2009-10-02 at 11:26 +0300, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > On Fri, 2009-10-02 at 10:43 +0300, Heikki Linnakangas wrote:
> > 
> >> It seems dangerous to write a WAL record after the shutdown checkpoint.
> >> It will be overwritten by subsequent startup, which is a recipe for trouble.
> > 
> > I've said its a corner case and not worth spending time on. I'm putting
> > it in at your request. If it's not correct before and not correct after,
> > where exactly do you want it?
> 
> I don't know. Perhaps it should go between the REDO pointer of the
> shutdown checkpoint and the checkpoint record itself. Or maybe the
> information should be included in the checkpoint record itself.

I've implemented this but it requires us to remove two checks - one at
shutdown and one at startup on a shutdown checkpoint. I'm not happy
doing that and would like to put them back.

I'd rather just skip this for now. It's a minor case anyway and there's
nothing stopping writing their own RunningXactData records with a
function, if it is needed. I can add a function for that.

-- Simon Riggs           www.2ndQuadrant.com



Re: Hot Standby on git

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> I'd rather just skip this for now. It's a minor case anyway and there's
> nothing stopping writing their own RunningXactData records with a
> function, if it is needed. I can add a function for that.

That won't help. There's no way to have it in a right place wrt. the
shutdown checkpoint if it's triggered by a user-callable function.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Hot Standby on git

From
Simon Riggs
Date:
On Fri, 2009-10-02 at 13:52 +0300, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > I'd rather just skip this for now. It's a minor case anyway and there's
> > nothing stopping writing their own RunningXactData records with a
> > function, if it is needed. I can add a function for that.
> 
> That won't help. There's no way to have it in a right place wrt. the
> shutdown checkpoint if it's triggered by a user-callable function.

I notice that you avoid saying "yes, I agree we should remove the two
checks".

I will add code to make a shutdown checkpoint be a valid starting place
for Hot Standby, as long as there are no in-doubt prepared transactions.
That way we know there are no xids still running and no locks, without
needing to write a record to say so.

-- Simon Riggs           www.2ndQuadrant.com



Re: Hot Standby on git

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> I will add code to make a shutdown checkpoint be a valid starting place
> for Hot Standby, as long as there are no in-doubt prepared transactions.
> That way we know there are no xids still running and no locks, without
> needing to write a record to say so.

Ok, I can live with that, and should be dead simple to implement.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Hot Standby on git

From
Simon Riggs
Date:
On Fri, 2009-10-02 at 18:04 +0300, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > I will add code to make a shutdown checkpoint be a valid starting place
> > for Hot Standby, as long as there are no in-doubt prepared transactions.
> > That way we know there are no xids still running and no locks, without
> > needing to write a record to say so.
> 
> Ok, I can live with that, and should be dead simple to implement.

First cut changes made, for discussion.

-- Simon Riggs           www.2ndQuadrant.com



Re: Hot Standby on git

From
Simon Riggs
Date:
On Mon, 2009-09-28 at 11:25 +0300, Heikki Linnakangas wrote:
> Heikki Linnakangas wrote:
> > Per Simon's request, for the benefit of the archive, here's all the
> > changes I've done on the patch since he posted the initial version for
> > review for this commitfest as incremental patches. This is extracted
> > from my git repository at
> > git://git.postgresql.org/git/users/heikki/postgres.git.
> 
> Further fixes extracted from above repository attached..

I've applied changes on all these patches apart from 0006-... which has
some dependencies on earlier work I'm still looking at.

-- Simon Riggs           www.2ndQuadrant.com



Re: Hot Standby on git

From
Simon Riggs
Date:
On Sun, 2009-09-27 at 13:57 +0300, Heikki Linnakangas wrote:
> Per Simon's request, for the benefit of the archive, here's all the
> changes I've done on the patch since he posted the initial version for
> review for this commitfest as incremental patches. This is extracted
> from my git repository at
> git://git.postgresql.org/git/users/heikki/postgres.git.

I'm working my way through these changes now. 1, 2, 15 and 16 applied.

We discussed briefly your change 
0011-Replace-per-proc-counters-of-loggable-locks-with-per.patch.

I don't see how that helps at all. The objective of lock counters was to
know if we can skip acquiring an LWlock on all lock partitions. This
change keeps the lock counters yet acquires the locks we were trying to
avoid. This change needs some justification since it is not a bug fix.

-- Simon Riggs           www.2ndQuadrant.com



Re: Hot Standby on git

From
Tom Lane
Date:
Simon Riggs <simon@2ndQuadrant.com> writes:
> I don't see how that helps at all. The objective of lock counters was to
> know if we can skip acquiring an LWlock on all lock partitions. This
> change keeps the lock counters yet acquires the locks we were trying to
> avoid. This change needs some justification since it is not a bug fix.

[ scratches head ... ]  Why is hot standby messing with this sort of
thing at all?  It sounds like a performance optimization that should
be considered separately, and *later*.
        regards, tom lane


Re: Hot Standby on git

From
Simon Riggs
Date:
On Mon, 2009-10-05 at 10:19 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
> > I don't see how that helps at all. The objective of lock counters was to
> > know if we can skip acquiring an LWlock on all lock partitions. This
> > change keeps the lock counters yet acquires the locks we were trying to
> > avoid. This change needs some justification since it is not a bug fix.
> 
> [ scratches head ... ]  Why is hot standby messing with this sort of
> thing at all?  It sounds like a performance optimization that should
> be considered separately, and *later*.

Possibly.

We have 3 suggested approaches:
* Avoid taking LockPartition locks while we get info for Hot Standby
during normal running, by means of a ref counting scheme (Simon)
* Take the locks and implement a ref counting scheme (Heikki)
* Take the locks, worry later (Tom)

The middle ground seems pointless to me.

I'm happy to go with simple lock-everything-for-now but it's pretty
clear its going to be a annoying performance hit. If we do that we
should put in a parameter to turn on/off so that those who will never
use Hot Standby can avoid this completely.

I'll wait for Heikki's thoughts before implementing anything.

-- Simon Riggs           www.2ndQuadrant.com



Re: Hot Standby on git

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> We discussed briefly your change 
> 0011-Replace-per-proc-counters-of-loggable-locks-with-per.patch.
> 
> I don't see how that helps at all. The objective of lock counters was to
> know if we can skip acquiring an LWlock on all lock partitions. This
> change keeps the lock counters yet acquires the locks we were trying to
> avoid. This change needs some justification since it is not a bug fix.

Well, the original code was buggy. But more to the point, it's a lot
simpler this way, I don't see any reason why the counters should be
per-process, meaning that they need to be exposed in the pgproc structs
or procarray.c.

The point is to avoid the seqscan of the lock hash table. I presumed
that's the expensive part in GetRunningTransactionLocks().

Tom Lane wrote:
> [ scratches head ... ]  Why is hot standby messing with this sort of
> thing at all?  It sounds like a performance optimization that should
> be considered separately, and *later*.

Yeah, I too considered just ripping it out. Simon is worried that
locking all the lock partitions and scanning the locks table can take a
long time. We do that in the master, while holding both ProcArrayLock
and XidGenLock in exclusive mode (hmm, why is shared not enough?), so
there is some grounds for worry. OTOH, it's only done once per checkpoint.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Hot Standby on git

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> On Mon, 2009-09-28 at 11:25 +0300, Heikki Linnakangas wrote:
>> Heikki Linnakangas wrote:
>>> Per Simon's request, for the benefit of the archive, here's all the
>>> changes I've done on the patch since he posted the initial version for
>>> review for this commitfest as incremental patches. This is extracted
>>> from my git repository at
>>> git://git.postgresql.org/git/users/heikki/postgres.git.
>> Further fixes extracted from above repository attached..
> 
> I've applied changes on all these patches apart from 0006-... which has
> some dependencies on earlier work I'm still looking at.

Simon, you don't need to apply those patches. Just review them, and post
comments or subsequent patches on top of the repository.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Hot Standby on git

From
Simon Riggs
Date:
On Mon, 2009-10-05 at 18:30 -0400, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > On Mon, 2009-09-28 at 11:25 +0300, Heikki Linnakangas wrote:
> >> Heikki Linnakangas wrote:
> >>> Per Simon's request, for the benefit of the archive, here's all the
> >>> changes I've done on the patch since he posted the initial version for
> >>> review for this commitfest as incremental patches. This is extracted
> >>> from my git repository at
> >>> git://git.postgresql.org/git/users/heikki/postgres.git.
> >> Further fixes extracted from above repository attached..
> > 
> > I've applied changes on all these patches apart from 0006-... which has
> > some dependencies on earlier work I'm still looking at.
> 
> Simon, you don't need to apply those patches. Just review them, and post
> comments or subsequent patches on top of the repository.

I've applied them to the repository.

-- Simon Riggs           www.2ndQuadrant.com



Re: Hot Standby on git

From
Simon Riggs
Date:
On Tue, 2009-10-06 at 01:10 +0300, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > We discussed briefly your change 
> > 0011-Replace-per-proc-counters-of-loggable-locks-with-per.patch.
> > 
> > I don't see how that helps at all. The objective of lock counters was to
> > know if we can skip acquiring an LWlock on all lock partitions. This
> > change keeps the lock counters yet acquires the locks we were trying to
> > avoid. This change needs some justification since it is not a bug fix.
> 
> Well, the original code was buggy. 

One bug was found, yes, but the submitted changes are not just a bug
fix, they alter the whole basic meaning of that code.

> But more to the point, it's a lot
> simpler this way, I don't see any reason why the counters should be
> per-process, meaning that they need to be exposed in the pgproc structs
> or procarray.c.

There was a very clear reason for doing it that way. By putting the
counters on the PGPROC structs it allows us to check the counts while we
are performing the main sweep of the procarray *and* if the count is
zero it allows us to *completely avoid* taking the locks. By making the
lock counters private to each backend the counters themselves do not
need to be protected by a lock when updated, and the fetch can be done
atomically, as we do with xid.

With respect, I have explained this on list at least twice previously,
as well as in code comments.

> The point is to avoid the seqscan of the lock hash table. I presumed
> that's the expensive part in GetRunningTransactionLocks().

> Tom Lane wrote:
> > [ scratches head ... ]  Why is hot standby messing with this sort of
> > thing at all?  It sounds like a performance optimization that should
> > be considered separately, and *later*.
> 
> Yeah, I too considered just ripping it out. Simon is worried that
> locking all the lock partitions and scanning the locks table can take a
> long time. We do that in the master, while holding both ProcArrayLock
> and XidGenLock in exclusive mode (hmm, why is shared not enough?), so
> there is some grounds for worry. OTOH, it's only done once per checkpoint.

I could live with ripping it out, but what we have now doesn't make
sense, to me.

-- Simon Riggs           www.2ndQuadrant.com



Re: Hot Standby on git

From
Simon Riggs
Date:
On Sun, 2009-09-27 at 13:57 +0300, Heikki Linnakangas wrote:

> Per Simon's request, for the benefit of the archive, here's all the
> changes I've done on the patch since he posted the initial version for
> review for this commitfest as incremental patches. This is extracted
> from my git repository at
> git://git.postgresql.org/git/users/heikki/postgres.git.

There were 16 change patches included in this post. I have applied 14 of
them, almost all without any changes to comment or code. I've fixed 2
bugs and made changes where XXX comments were left in code. That leaves
0011-locks... discussed on another part of this thread and patch
0005-Include-information... which I'm discussing here.

It's a huge set of changes, all of which is just refactoring, none of
which is a bug fix of any kind. The refactoring does sound reasonable,
but is really fairly minor. I feel we should defer this change for the
future to allow us to stabilise the current patch. I do understand the
need for refactoring, but if we refactor everything touched by Hot
Standby then we will simply touch more code and trigger more bugs etc..
This is especially true with prepared transactions, which aren't well
understood and not covered by as many tests.

---src/backend/access/transam/twophase.c      |   96
+++++++++++++++++-----------src/backend/access/transam/twophase_rmgr.c |   13 ----src/backend/access/transam/xact.c
    |   64 +++++++------------src/backend/access/transam/xlog.c          |    5 +-src/backend/commands/discard.c
    |    3 +-src/backend/commands/sequence.c            |    3 +src/backend/storage/lmgr/lock.c            |    5
+-src/backend/tcop/postgres.c               |    1 +src/backend/utils/cache/inval.c            |   90
 
++------------------------src/include/access/subtrans.h              |    3 -src/include/access/twophase.h
|   2 -src/include/access/twophase_rmgr.h         |    6 +-src/include/access/xact.h                  |    6
--src/include/miscadmin.h                   |    7 --src/include/storage/proc.h                 |    7
+-src/include/utils/inval.h                 |    7 ++16 files changed, 108 insertions(+), 210 deletions(-)
 

-- Simon Riggs           www.2ndQuadrant.com



Re: Hot Standby on git

From
Simon Riggs
Date:
On Thu, 2009-10-01 at 18:47 +0300, Heikki Linnakangas wrote:

> And if you could please review the changes I've been doing, just to
> make sure I haven't inadvertently introduced new bugs. That has
> happened before, as you've rightfully reminded me :-).

You posted 17 patches here.

I've reviewed/applied patches 1,3,4,5,6,7,9,10,13,14,15.
That leaves me with some form of issue on 2, 5, 8, 11, 12, 16 and 17.
Sounds a lot, but out of 41 patches in total to date, I have as yet
unresolved issues with 9.

Patch 0017 has significant changes to it, so I'm posting patches here
for further discussion. Main line thought is that I agree with the
changes you wanted to make and I've added a few extra things.

Commit message from repo:
Apply 0017-Revert-changes-to-subtrans.c-and-slru.c.-Instead-cal.patch
but with heavy modifications to fix a number of bugs and make associated
changes. First, StartupSubtrans() positioned itself at oldestXid, so
that when later running transactions complete they could find no page
for them to update and crash. Second, ExtendClog() expected to be able
to write WAL during recovery and so crashed after 32768 xids. This patch
also extends the patch to cover the recently added support for starting
Hot Standby from a shutdown checkpoint, which causes some refactoring.
Various comments reworded, including allowing a lock overflow to cause a
PENDING state, just as we do with subxid overflow. Another bug was also
found, in that failing to make subtrans entries from the initial
snapshot could lead to later abort records hanging because the topxid
was not set. Code is now similar in all code paths. Sounds like a lot of
changes, but mostly subtle changes rather than lengthy ones.

It seems highly likely that you'll find an error in my changes to your
changes also, but they do pass initial testing.

--
 Simon Riggs           www.2ndQuadrant.com

Attachment

Re: Hot Standby on git

From
Simon Riggs
Date:
On Fri, 2009-10-02 at 10:32 +0300, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > I will docuemnt the recommendation to set max_standby_delay = 0 if
> > performing an archive recovery (and explain why).
> 
> Hmm, not sure if that's such a good piece of advice either. It will mean
> waiting for queries forever, which probably isn't what you want if
> you're performing archive recovery. Or maybe it is? Maybe -1? I guess it
> depends on the situation...

I've changed that to -1 now, which was what I meant. 0 is likely to be a
very wrong setting during archive recovery.

-- Simon Riggs           www.2ndQuadrant.com



Re: Hot Standby on git

From
Heikki Linnakangas
Date:
While playing with conflict resolution, I bumped into this:

postgres=# begin ISOLATION LEVEL SERIALIZABLE;
BEGIN
postgres=# SELECT * FROM foo;id | data
----+------12 |
(1 row)

postgres=# SELECT * FROM foo;id | data
----+------12 |
(1 row)

postgres=# SELECT * FROM foo;id | data
----+------12 |
(1 row)

postgres=# SELECT * FROM foo;id | data
----+------12 |
(1 row)

postgres=# SELECT * FROM foo;id | data
----+------12 |
(1 row)

postgres=# SELECT * FROM foo;
ERROR:  canceling statement due to conflict with recovery
postgres=# SELECT * FROM foo;id | data
----+------13 |
(1 row)

postgres=# SELECT * FROM foo;id | data
----+------13 |
(1 row)

postgres=# begin ISOLATION LEVEL SERIALIZABLE;id | data
----+------13 |
(1 row)

postgres=# SELECT * FROM foo;
BEGIN
postgres=# SELECT * FROM foo;id | data
----+------13 |
(1 row)

The backend and the frontend seem to go out of sync, when a conflict
happens in idle-in-transaction mode.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com



Re: Hot Standby on git

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
>> Tom Lane wrote:
>>> [ scratches head ... ]  Why is hot standby messing with this sort of
>>> thing at all?  It sounds like a performance optimization that should
>>> be considered separately, and *later*.
>> Yeah, I too considered just ripping it out. Simon is worried that
>> locking all the lock partitions and scanning the locks table can take a
>> long time. We do that in the master, while holding both ProcArrayLock
>> and XidGenLock in exclusive mode (hmm, why is shared not enough?), so
>> there is some grounds for worry. OTOH, it's only done once per checkpoint.
> 
> I could live with ripping it out, but what we have now doesn't make
> sense, to me.

Ok, let's just rip it out for now.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com