Thread: hot standby - merged up to CVS HEAD

hot standby - merged up to CVS HEAD

From
Robert Haas
Date:
For reasons that can only be described as masochistic, I took it upon
myself to merge the Hot Standby patch up to CVS HEAD during my
vacation last week.  Here's what I did:

1. Downloaded norecoveryprocs-1.patch from
http://archives.postgresql.org/message-id/49A64D73.6090302@enterprisedb.com
2. Found the commit in my git clone to which it applied (for some
reason, I wasn't able to locate the commit mentioned in Heiki's git
diff output).
3. Incrementally merged in the changes from my master branch,
resolving conflicts as I went.
4. Fixed all the whitespace errors about which git diff --check
complained (there were many of these).
5. Made a quick copy-editing pass over the docs and fixed a few
obvious errors (stray words, missing punctuation).
6. Went through the comments and replaced a few replaced references to
8.4 with references to 8.5 where appropriate.
7. Published it here:
http://git.postgresql.org/gitweb?p=postgresql-rhaas.git;a=shortlog;h=refs/heads/hs

I have tested that this version of the patch:

1. Compiles and passes regression tests.
2. Works for the simplest possible test cases (select * from table on
standby, insert one row on master, checkpoint, select * from table on
standby again).

I don't expect this to be reviewed for CommitFest 2009-07.  For one
thing, I'm submitting it an hour-plus after the deadline; for another
thing, it needs a lot more testing than the above; for a third thing,
I haven't addressed any of the substantive issues with the patch that
Heikki mentioned here:

http://archives.postgresql.org/message-id/4A4DBF8F.8040007@enterprisedb.com

I would have liked to reach some of those issues, but I ran out of
time, and I don't understand the history of the development of this
patch well enough to separate the "crud" to which Heikki refers from
the non-crud (yet, anyway).  However, I'm hopeful that the work I've
done so far will be helpful to someone else in taking this forward,
and I thought it made sense to publish this now, before turning my
full attention to the CommitFest, in case someone else was
contemplating doing something similar.

A few other comments based on a preliminary reading of this patch:

- pg_last_recovered_xact_timestamp is documented to return "a default
value" when recovery is not in progress; the default value seems to be
2000-01-01 00:00:00 UTC, but the actual displayed value depends on the
time zone.  I think this is bogus; it should return NULL instead of
this "default value".
- Why do some of the functions in xlog.c have apparently totally
superfluous inner blocks?
- The documentation states that the fact that a pause request made
while waiting for a WAL file does not take effect until that WAL file
arrives, and that this is not a bug.  Why not?
- It appears that this code has been added (almost, but not quite,
verbatim) to the top of most (but I think not quite all) of the *_redo
functions.  Can/should this be centralized in StartupXLOG where
rm_redo is invoked?
- ProcArrayRemove() contains a commented-out hunk that should
presumably go away, unless of course it shouldn't be commented out.
- ProcArrayInitRecoveryEnvironment() does nothing except call
PublishStartupProcessInformation(), and therefore seems quite
unnecessary.

...Robert

Attachment

Re: hot standby - merged up to CVS HEAD

From
Simon Riggs
Date:
On Tue, 2009-07-14 at 21:12 -0400, Robert Haas wrote:
> 
> 1. Downloaded norecoveryprocs-1.patch from
> http://archives.postgresql.org/message-id/49A64D73.6090302@enterprisedb.com

http://archives.postgresql.org/message-id/4A4DBF8F.8040007@enterprisedb.com

I have to confess that I had no idea that the above discussion had taken
place. (The title wasn't anything to do with Hot Standby, nor did anyone
copy me in; I don't read every email).

I've said very clearly that I would work on this for 8.5 [at the
developer meeting] and also that it wouldn't be ready for the first
commit fest, when asked. I was told recently that someone heard the
patch was dead; I've never said that, but I would like a summer holiday.

It's going to be very confusing if people submit their own versions of
it. So now we have mine, Heikki's and Robert's. I'd like this to stop
please, have a little faith and a little patience. Presumably Robert's
rebasing patch is best place to start from now for later work.

Welcome to add notes here
http://wiki.postgresql.org/wiki/Hot_Standby

On other points: there was a bug related to subtransaction handling in
the initial startup info of Hot Standby. It could not have been
committed without that being fixed.

In my own recent review, I've noted two design flaw bugs:
* AccessExclusiveLocks held at startup are not properly initialised
* AccessExclusiveLocks held by prepared transactions are not handled
correctly at termination of recovery - the lock owner needs to be
transferred or we redesign somehow

I'm sure it needs much work yet. Not least of which will be re-checking
all of the previous bugs to ensure no regressions in translation.

Thanks,

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: hot standby - merged up to CVS HEAD

From
"Joshua D. Drake"
Date:
On Wed, 2009-07-15 at 17:27 +0100, Simon Riggs wrote:
> On Tue, 2009-07-14 at 21:12 -0400, Robert Haas wrote:
> > 

> It's going to be very confusing if people submit their own versions of
> it. So now we have mine, Heikki's and Robert's. I'd like this to stop
> please, have a little faith and a little patience. Presumably Robert's
> rebasing patch is best place to start from now for later work.
> 

Robert, thank you for your work.

Simon you need to realize that a lot of people really want this patch. I
for one applaud Robert's work (and Heikki's). If you want a summer
holiday, go for it. I certainly haven't been working that hard this
summer. 

However, I certainly don't think that is any reason for people who are
showing initiation and drive should stop. If Robert wants to work on
this patch, more power to him. Perhaps he can solve something you can't.
Perhaps it will be done before you are done with holiday. If not, then
at least we have moved a little further in the process and in theory
taken some workload off of you.

Joshua D. Drake

-- 
PostgreSQL - XMPP: jdrake@jabber.postgresql.org  Consulting, Development, Support, Training  503-667-4564 -
http://www.commandprompt.com/ The PostgreSQL Company, serving since 1997
 



Re: hot standby - merged up to CVS HEAD

From
Bruce Momjian
Date:
Joshua D. Drake wrote:
> On Wed, 2009-07-15 at 17:27 +0100, Simon Riggs wrote:
> > On Tue, 2009-07-14 at 21:12 -0400, Robert Haas wrote:
> > > 
> 
> > It's going to be very confusing if people submit their own versions of
> > it. So now we have mine, Heikki's and Robert's. I'd like this to stop
> > please, have a little faith and a little patience. Presumably Robert's
> > rebasing patch is best place to start from now for later work.
> > 
> 
> Robert, thank you for your work.
> 
> Simon you need to realize that a lot of people really want this patch. I
> for one applaud Robert's work (and Heikki's). If you want a summer
> holiday, go for it. I certainly haven't been working that hard this
> summer. 
> 
> However, I certainly don't think that is any reason for people who are
> showing initiation and drive should stop. If Robert wants to work on
> this patch, more power to him. Perhaps he can solve something you can't.
> Perhaps it will be done before you are done with holiday. If not, then
> at least we have moved a little further in the process and in theory
> taken some workload off of you.

Sorry to be chiming in weeks late on this, but there are some procedural
issues I want to address.

First, I agree with everything Joshua Drake said, and thank you for
chiming in on this.

Second, Simon, you seem disappointed that Robert Haas helped with the
hot standby patch.  By your own admission you weren't working on it, so
I would think you would be grateful that someone moved it forward.  This
is not a question of 'faith' and 'patience', but getting the patch
completed.  The goal is to get the patch completed, not for a single
individual to complete it.

Third, Robert, you should have communicated to the list that you were
going to work on the patch, so that there would not be duplicate effort
if someone else was also working on it.  As I understood it, Heikki was
in control of the patch, but it doesn't hurt to send out a short email
stating you wanted to work on it now.  In this case no one was working
on it, but if someone had been, there would have been duplicate effort
and that is disappointing to everyone.  Odds are when you started on the
patch you didn't realize you would be overhauling it, but once that
became clear, an email to hackers, ideally CC'ing the original patch
authors, would have been a good idea.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: hot standby - merged up to CVS HEAD

From
Bruce Momjian
Date:
Robert Haas wrote:
> On Fri, Aug 7, 2009 at 11:33 PM, Bruce Momjian<bruce@momjian.us> wrote:
> > Third, Robert, you should have communicated to the list that you were
> > going to work on the patch, so that there would not be duplicate effort
> > if someone else was also working on it. ?As I understood it, Heikki was
> > in control of the patch, but it doesn't hurt to send out a short email
> > stating you wanted to work on it now. ?In this case no one was working
> > on it, but if someone had been, there would have been duplicate effort
> > and that is disappointing to everyone. ?Odds are when you started on the
> > patch you didn't realize you would be overhauling it, but once that
> > became clear, an email to hackers, ideally CC'ing the original patch
> > authors, would have been a good idea.
> 
> Simon asked me about this offlist as well: I'll repeat the gist of
> what I said to him here.  I really wasn't sure how far I was going to
> be able to get with this, and I didn't put more work into it before
> sending it than the amount of time I was willing to waste on it.  I
> figured that I was fairly safe because there had been no activity for
> 5 months, but if I had been wrong, I was prepared to accept that.  I
> thought it would be a little arrogant to say I was going to work on
> the patch without having any idea whether I was going to be able to do
> anything useful with it; since I ended up getting something that may
> be useful done, it now seems like I should've said something, but that
> was a little less obvious at the time.  At any rate, I am sensitive to
> the issue and will try to handle it better the next time.

Yea, it is a learning experience.

> Also, to my knowledge, nobody has really looked through the results to
> see if they are any good, so the success of the endeavor remains in
> doubt from my point of view.  That's a bit of a shame because I am
> interested in putting some more time into this, but I don't have the
> knowledge or experience to "fly solo" here.

Well, Simon stated that your version should now be used as the most
recent one, so I would call that a success.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: hot standby - merged up to CVS HEAD

From
Robert Haas
Date:
On Fri, Aug 7, 2009 at 11:33 PM, Bruce Momjian<bruce@momjian.us> wrote:
> Third, Robert, you should have communicated to the list that you were
> going to work on the patch, so that there would not be duplicate effort
> if someone else was also working on it.  As I understood it, Heikki was
> in control of the patch, but it doesn't hurt to send out a short email
> stating you wanted to work on it now.  In this case no one was working
> on it, but if someone had been, there would have been duplicate effort
> and that is disappointing to everyone.  Odds are when you started on the
> patch you didn't realize you would be overhauling it, but once that
> became clear, an email to hackers, ideally CC'ing the original patch
> authors, would have been a good idea.

Simon asked me about this offlist as well: I'll repeat the gist of
what I said to him here.  I really wasn't sure how far I was going to
be able to get with this, and I didn't put more work into it before
sending it than the amount of time I was willing to waste on it.  I
figured that I was fairly safe because there had been no activity for
5 months, but if I had been wrong, I was prepared to accept that.  I
thought it would be a little arrogant to say I was going to work on
the patch without having any idea whether I was going to be able to do
anything useful with it; since I ended up getting something that may
be useful done, it now seems like I should've said something, but that
was a little less obvious at the time.  At any rate, I am sensitive to
the issue and will try to handle it better the next time.

Also, to my knowledge, nobody has really looked through the results to
see if they are any good, so the success of the endeavor remains in
doubt from my point of view.  That's a bit of a shame because I am
interested in putting some more time into this, but I don't have the
knowledge or experience to "fly solo" here.

...Robert


Re: hot standby - merged up to CVS HEAD

From
Robert Haas
Date:
On Sat, Aug 8, 2009 at 12:02 AM, Bruce Momjian<bruce@momjian.us> wrote:
> Well, Simon stated that your version should now be used as the most
> recent one, so I would call that a success.

Fair enough, but it still needs more work.  I had some review comments
I was hoping to get responses to, in the section beginning with "A few
other comments based on a preliminary reading of this patch":

http://archives.postgresql.org/pgsql-hackers/2009-07/msg00854.php

...Robert


Re: hot standby - merged up to CVS HEAD

From
Simon Riggs
Date:
On Sat, 2009-08-08 at 00:02 -0400, Bruce Momjian wrote:

> > Also, to my knowledge, nobody has really looked through the results to
> > see if they are any good, so the success of the endeavor remains in
> > doubt from my point of view.  That's a bit of a shame because I am
> > interested in putting some more time into this, but I don't have the
> > knowledge or experience to "fly solo" here.
> 
> Well, Simon stated that your version should now be used as the most
> recent one, so I would call that a success.

I'm not sure why you're stirring this up again.

Simon didn't state that the above. You can re-read my words and we can
debate their meaning, but that's just a waste of time.

I shouldn't have to publicly justify why I haven't finished working on a
patch, when a) we have time, b) it's summer and c) I've already said I
would finish the patch, very very clearly in a big loud voice. I expect
to finish and commit comfortably in 2009, leaving many months before
next release.

So, as I said before, I expect to be left in peace to finish my own
work. There wouldn't be anything to finish if it wasn't for me. I
specifically don't want to review other people's versions of work when
I'm trying to do my own, nor do I expect others to encourage multiple
authors on the same piece of work.

-- Simon Riggs           www.2ndQuadrant.com



Re: hot standby - merged up to CVS HEAD

From
Bruce Momjian
Date:
Simon Riggs wrote:
> 
> On Sat, 2009-08-08 at 00:02 -0400, Bruce Momjian wrote:
> 
> > > Also, to my knowledge, nobody has really looked through the results to
> > > see if they are any good, so the success of the endeavor remains in
> > > doubt from my point of view.  That's a bit of a shame because I am
> > > interested in putting some more time into this, but I don't have the
> > > knowledge or experience to "fly solo" here.
> > 
> > Well, Simon stated that your version should now be used as the most
> > recent one, so I would call that a success.
> 
> I'm not sure why you're stirring this up again.
> 
> Simon didn't state that the above. You can re-read my words and we can
> debate their meaning, but that's just a waste of time.

You stated:

- It's going to be very confusing if people submit their own versions of
- it. So now we have mine, Heikki's and Robert's. I'd like this to stop
- please, have a little faith and a little patience. Presumably Robert's
- rebasing patch is best place to start from now for later work.

I assume your last sentence is saying exactly that Robert's version
should be used as the most current reprsentation of this feature patch.

> I shouldn't have to publicly justify why I haven't finished working on a
> patch, when a) we have time, b) it's summer and c) I've already said I
> would finish the patch, very very clearly in a big loud voice. I expect
> to finish and commit comfortably in 2009, leaving many months before
> next release.
> 
> So, as I said before, I expect to be left in peace to finish my own
> work. There wouldn't be anything to finish if it wasn't for me. I
> specifically don't want to review other people's versions of work when
> I'm trying to do my own, nor do I expect others to encourage multiple
> authors on the same piece of work.

The bottom line is that you think you have ownership of the patch and
the feature --- you do not.

You are right you don't have to justify anything, but neither can you
claim ownership of the patch/feature and complain that others are
working on it too.  This is a community project --- if you want your
patches to remain your property, I suggest you no longer post them to
our community lists.  If you are actively working on patches, I assume
others will not duplicate your work, but if you are idle, others are
encouraged to keep improving the patch.  Again, if you don't like that,
then perhaps the community-development process isn't for you.

And your misunderstanding in this area is exactly why I am bringing this
up.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: hot standby - merged up to CVS HEAD

From
Robert Haas
Date:
On Sat, Aug 8, 2009 at 1:12 PM, Bruce Momjian<bruce@momjian.us> wrote:
> You are right you don't have to justify anything, but neither can you
> claim ownership of the patch/feature and complain that others are
> working on it too.  This is a community project --- if you want your
> patches to remain your property, I suggest you no longer post them to
> our community lists.  If you are actively working on patches, I assume
> others will not duplicate your work, but if you are idle, others are
> encouraged to keep improving the patch.  Again, if you don't like that,
> then perhaps the community-development process isn't for you.

Simon,

I think it would also be fair to point out that you keep saying that
you're going to deliver this patch for 8.5, but you haven't provided
any real timetable as to when you're going to start working on it or
when it'll be completed.  Because this patch IS so important to the
community, people want to know the answers to those questions.  That
is exactly why you were asked about your schedule at PGcon; and you
demurred.  I understand that your #1 priority needs to be the work for
which you get paid the most money, but I think it's unfair to ask
other people to wait for you to work on something when you haven't
committed to a timetable for working on it.  It might be unfair to ask
it even if you had committed to a timetable and that timetable was
well out in the future, but it's certainly unfair when there is no
timetable at all.

The most recent discussion of the timing of this patch was that you
opined it should go after Streaming Rep.  Based on the review of
Streaming Rep this CommitFest, I would say that there is an awful lot
of work left to be done to make that patch committable.  I think we
will be lucky if it makes it into 8.5.  Call me a pessimist but I
think we'll be doing pretty well if it makes it into 8.6.  I think the
chances that we are going to get streaming rep committed and still
have enough CommitFests left to get Hot Standby committed too are just
about zero, so waiting for Streaming Rep to be committed first does
not seem like a very realistic plan to me.  Note that Streaming Rep
got moved to returned with feedback on *the first day* of this
CommitFest; that's how much work it took to see that it was not
committable.  Note also that the resistance to committing large
patches is going to grow and grow as we get closer to the end of this
development cycle.  I am very much afraid that if we don't have a
version of Hot Standby that is reviewable for the next CommitFest we
are going to be out of luck for 8.5.

I do not think that I have the juice to make Hot Standby happen.  It's
possible that I don't know my own strength, but I'm not prepared to
bet on it.  At least, it looks like I do have the juice to dust of the
bitrot, and maybe fix some of the more superficial problems with it.
I would like to think that is something helpful.

...Robert


Re: hot standby - merged up to CVS HEAD

From
Simon Riggs
Date:
On Sat, 2009-08-08 at 22:02 -0400, Robert Haas wrote:

> I think it would also be fair to point out that you keep saying that
> you're going to deliver this patch for 8.5, but you haven't provided
> any real timetable as to when you're going to start working on it or
> when it'll be completed.  Because this patch IS so important to the
> community, people want to know the answers to those questions.  That
> is exactly why you were asked about your schedule at PGcon; and you
> demurred.  

I'm not sure I have ever demurred on anything, a failing I'm sure.

I mentioned that HS would make sense to go in *after* Synch Rep and I
stand by that, since we only have so many people that understand that
area and we cannot do everything at once.

HS is important. That is why I have put so much time and money into
being in a position where the end of the tunnel is in sight. Had I not
done so, we wouldn't even be discussing it.

> I think it's unfair to ask
> other people to wait for you to work on something when you haven't
> committed to a timetable for working on it

I've not asked anybody to wait. I tried very, very hard to get HS into
8.4 and many people were opposed to that. The next release of Postgres
isn't released until next year. If it matters as to which month it goes
into Postgres, I've not heard anybody explain why the exact month is
important. I don't see anything there myself of concern to the
community. 

I'm working on HS; I've said so clearly and say it again now. To my
knowledge, no other Postgres project has committed to a timetable for
delivery, so I'm not clear why you think one should have been given
here, or why the absence of such a timetable implies anything. Dev tree
only opened again about a month ago, the dates of which were not
published in advance, so no detailed planning was possible for people
contributing to the beta and release of 8.4.

Want an HS Timetable? Well, I will try to complete it for next
commit-fest, but there may be issues that mean it comes in the next one
after that. So Sept 15, or maybe Nov 15. My understanding is that
community wants quality and so that is my #1 priority. I'll make code
available on Sept 15, so that we either have a WIP patch or a
request-for-commit patch, not sure which it will be.

> I understand that your #1 priority needs to be the work for
> which you get paid the most money, but I think it's unfair to ask
> other people to wait for you to work on something when you haven't
> committed to a timetable for working on it.  It might be unfair to ask
> it even if you had committed to a timetable and that timetable was
> well out in the future, but it's certainly unfair when there is no
> timetable at all.

I'm not embarrassed by discussing money but that doesn't make it my
personal priority. I'm sure you didn't mean to imply I was mercenary.

I've contributed to the community for years, mostly unpaid. Which means
I do at some point have to take work that pays. If I have ever got paid
for working on Postgres, it has always been at a much lower rate than I
would have otherwise received, so if anything I've lost money by working
on Postgres. My choice. I parted with EDB specifically to allow me to
spend more time working on software for Postgres, which would otherwise
have certainly been denied me. My choice, nobody else's and one that has
worked well for me. I've chosen contribution to this community over
money many, many times.

The current team will continue working on HS; assistance from any and
every other hacker is welcome in producing that. Not all effort is
productive teamwork, however, and I encourage anyone that wishes to help
on a project prior to patch submission to contact the patch author to
discuss that first, to coordinate and avoid wasted effort. People
interested in review and test need not make contact, since they'll have
access to the code in the normal way.

-- Simon Riggs           www.2ndQuadrant.com



Re: hot standby - merged up to CVS HEAD

From
Robert Haas
Date:
On Sun, Aug 9, 2009 at 6:11 AM, Simon Riggs<simon@2ndquadrant.com> wrote:
> I'm working on HS; I've said so clearly and say it again now. To my
> knowledge, no other Postgres project has committed to a timetable for
> delivery, so I'm not clear why you think one should have been given
> here, or why the absence of such a timetable implies anything. Dev tree

Well, basically because otherwise nobody except you can do anything.
In your last email you wrote:

> assistance from any and every other hacker is welcome in producing that.

What I need is some help figuring out when and how I can provide that
assistance and what I can do.

At the moment, the version of the patch that I last posted does not
apply due to a conflict in sinval.h, I believe due to conflicts
introduced by Fujii Masao's signal multiplexing patch.  I haven't had
time to look at that in any detail yet, but I'd like to do do so soon.There are some other things that look like easy
cleanupsthat I think
 
I could knock out as well; see my original email on this thread.  Of
course, your input on those items would be invaluable.  Of course, if
you've already done some of this work, that would be great, but then
it seems like you ought to have let us know that you were doing it
before you did it, and posted the updated patch to -hackers
afterwards, just as you asked me to do.

Working disconnected from everyone else until September 15th (or
November 15th) and then posting the patch will make it very, very
difficult for anyone else to do anything useful.   When I was working
the explain patches, I posted them regularly to -hackers, and
published a git repository on git.postgresql.org, which meant that
anyone could follow along at home if they wished.  Since Hot Standby
is more interesting before breakfast than machine-readable explain
output is all day, the same approach seems desirable here.  At the
very least, I think you should post your progress weekly so that we
can read, review, comment, propose changes...

...Robert


Re: hot standby - merged up to CVS HEAD

From
Simon Riggs
Date:
On Sat, 2009-08-08 at 13:12 -0400, Bruce Momjian wrote:
> Simon Riggs wrote:
> > 
> > I'm not sure why you're stirring this up again.
> > 

> You stated:
> 
> - It's going to be very confusing if people submit their own versions of
> - it. So now we have mine, Heikki's and Robert's. I'd like this to stop
> - please, have a little faith and a little patience. Presumably Robert's
> - rebasing patch is best place to start from now for later work.
> 
> I assume your last sentence is saying exactly that Robert's version
> should be used as the most current reprsentation of this feature patch.

That isn't what I meant then and isn't what I think now: that patch is
not verified.

The reason for my objection was that accepting patches had already
caused significant setbacks on this complex patch. I won't be ignoring
Robert's work, which would be petty, but I won't be picking it up
wholesale either, nor will I be providing a review of it. Nor Heikki's,
nor anyone elses.

I am moving forward the parts of the patch that I consider worth
submitting. I need to be happy with every single line of code before I
submit it; it's too easy to make a mistake otherwise. I'm not going to
submit something that I can't verify, any more than I would expect any
committer to commit code they can't verify either. The current dev team
(Simon, Gianni, Gabriele) only has time to spend on testing one patch,
not various ones. I do hope to receive comments from reviewers and will
include consensus changes into the code. And as I mentioned elsewhere,
there are still changes/features to add to the code itself.

As you point out, people can do anything they want with submitted code,
so they may make any judgement they wish about that patch. If anybody
thinks any good will come from ignoring the opinion of the original
author, go right ahead.

> The bottom line is that you think you have ownership of the patch and
> the feature --- you do not.
> 
> You are right you don't have to justify anything, but neither can you
> claim ownership of the patch/feature and complain that others are
> working on it too.  This is a community project --- if you want your
> patches to remain your property, I suggest you no longer post them to
> our community lists.  If you are actively working on patches, I assume
> others will not duplicate your work, but if you are idle, others are
> encouraged to keep improving the patch.  Again, if you don't like
> that,
> then perhaps the community-development process isn't for you.

I've *never* spoken of code or feature ownership. But this is a
community project and I can request teamwork from other contributors,
which is what I did.

I've said very clearly that I am working on this and it's fairly
laughable to suggest that anybody thought I wasn't. What more should I
do to prove something is "active" if you won't accept my clearly spoken
word? How did you decide I was idle exactly? 

I'll make sure to do regular blogs on what I'm working on.

I have no problem with Robert. I have no problem with Robert completing
my inactive patches - he is doing exactly that with join removal and I
haven't uttered a word. If I felt as you think I do, then surely I would
have objected to both. Yet I have only objected on the one patch that
I've said clearly I'm working on, with specific reasons. If Robert
hadn't been present when I said it, I might have reacted differently.

To everybody and anybody: please don't submit alternative versions of a
patch that other hackers have said they are working on, and don't have
conversations about those projects on diverse threads. That's not a
claim of code or feature ownership, it's just common sense teamwork on
an important development project. 

-- Simon Riggs           www.2ndQuadrant.com



Re: hot standby - merged up to CVS HEAD

From
Bruce Momjian
Date:
Simon Riggs wrote:
> On Sat, 2009-08-08 at 13:12 -0400, Bruce Momjian wrote:
> > Simon Riggs wrote:
> > > 
> > > I'm not sure why you're stirring this up again.
> > > 
> 
> > You stated:
> > 
> > - It's going to be very confusing if people submit their own versions of
> > - it. So now we have mine, Heikki's and Robert's. I'd like this to stop
> > - please, have a little faith and a little patience. Presumably Robert's
> > - rebasing patch is best place to start from now for later work.
> > 
> > I assume your last sentence is saying exactly that Robert's version
> > should be used as the most current reprsentation of this feature patch.
> 
> That isn't what I meant then and isn't what I think now: that patch is
> not verified.

I am not sure how to respond to you when I can't even interpret what you
say in emails, e.g. "Presumably Robert's rebasing patch is best place to
start from now for later work."

> As you point out, people can do anything they want with submitted code,
> so they may make any judgement they wish about that patch. If anybody
> thinks any good will come from ignoring the opinion of the original
> author, go right ahead.

Right.  At some point more people are going to get involved and complete
the patch --- historically this is the way complex patches have evolved,
and I think many of your patches are in that group.

> > The bottom line is that you think you have ownership of the patch and
> > the feature --- you do not.
> > 
> > You are right you don't have to justify anything, but neither can you
> > claim ownership of the patch/feature and complain that others are
> > working on it too.  This is a community project --- if you want your
> > patches to remain your property, I suggest you no longer post them to
> > our community lists.  If you are actively working on patches, I assume
> > others will not duplicate your work, but if you are idle, others are
> > encouraged to keep improving the patch.  Again, if you don't like
> > that,
> > then perhaps the community-development process isn't for you.
> 
> I've *never* spoken of code or feature ownership. But this is a
> community project and I can request teamwork from other contributors,
> which is what I did.
> 
> I've said very clearly that I am working on this and it's fairly
> laughable to suggest that anybody thought I wasn't. What more should I
> do to prove something is "active" if you won't accept my clearly spoken
> word? How did you decide I was idle exactly? 

Your statement of 15 Jul 2009 stated:

- I've said very clearly that I would work on this for 8.5 [at the
- developer meeting] and also that it wouldn't be ready for the first
- commit fest, when asked. I was told recently that someone heard the
- patch was dead; I've never said that, but I would like a summer holiday.

I assume that means you were not actively working on it, hence my
conclusion, which is probably wrong because I can't manage to interpret
your emails.  :-(

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: hot standby - merged up to CVS HEAD

From
Robert Haas
Date:
On Sun, Aug 9, 2009 at 2:43 PM, Simon Riggs<simon@2ndquadrant.com> wrote:
> I've said very clearly that I am working on this and it's fairly
> laughable to suggest that anybody thought I wasn't. What more should I
> do to prove something is "active" if you won't accept my clearly spoken
> word? How did you decide I was idle exactly?

I think we looked at the fact that you haven't posted an updated
version of this patch in almost 6 months.

...Robert


Re: hot standby - merged up to CVS HEAD

From
"Joshua D. Drake"
Date:
On Sun, 2009-08-09 at 22:15 -0400, Robert Haas wrote:
> On Sun, Aug 9, 2009 at 2:43 PM, Simon Riggs<simon@2ndquadrant.com> wrote:
> > I've said very clearly that I am working on this and it's fairly
> > laughable to suggest that anybody thought I wasn't. What more should I
> > do to prove something is "active" if you won't accept my clearly spoken
> > word? How did you decide I was idle exactly?
>
> I think we looked at the fact that you haven't posted an updated
> version of this patch in almost 6 months.

That pretty much covers it.  We practice open development, we always
have. Those who don't generally run into problems just like this one.
Robert has taken the path of being open about the work that is being
performed and thus he is the one that appears to be making progress.

Simon, regardless of your "words" you have shown nothing for 6 months.
Does that mean you aren't working on it? Of course not but it certainly
shows a lack of transparency to the community with the work. You know
that doesn't work. The community assumes by default that no patch (or
active communication which you also haven't done) means no work. It
always has.

So instead of all of us bickering, how about we start actively working
"together" on the feature again.

Sincerely,

Joshua D. Drake
--
PostgreSQL - XMPP: jdrake@jabber.postgresql.org
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997

Re: hot standby - merged up to CVS HEAD

From
Josh Berkus
Date:
All,

Can we stop arguing about a patch everyone wants?

Simon:  you have people offering to help with the patch.  Offering to
help *right now*.  Might I suggest that you establish a GIT branch for
Hot Standby so that more people can collaborate?  Working on it until
you get it "perfect" offsite doesn't work; it's going to require
adjustment/debugging once it gets to commitfest anyway.  Might as well
start that now, or it'll just delay application.

Everyone Else: Simon is working hard on this, please get off his back.

-- 
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com


Re: hot standby - merged up to CVS HEAD

From
"Joshua D. Drake"
Date:
On Mon, 2009-08-10 at 10:20 -0700, Josh Berkus wrote:
> All,
>
> Can we stop arguing about a patch everyone wants?
>
> Simon:  you have people offering to help with the patch.  Offering to
> help *right now*.  Might I suggest that you establish a GIT branch for
> Hot Standby so that more people can collaborate?  Working on it until
> you get it "perfect" offsite doesn't work; it's going to require
> adjustment/debugging once it gets to commitfest anyway.  Might as well
> start that now, or it'll just delay application.
>
> Everyone Else: Simon is working hard on this, please get off his back.

I believe that all anyone is asking is that Simon communicate and
collaborate.

Joshua D. Drake

--
PostgreSQL - XMPP: jdrake@jabber.postgresql.org
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997

Re: hot standby - merged up to CVS HEAD

From
Simon Riggs
Date:
On Mon, 2009-08-10 at 10:20 -0700, Josh Berkus wrote:

> Simon:  you have people offering to help with the patch.  Offering to
> help *right now*.  Might I suggest that you establish a GIT branch for
> Hot Standby so that more people can collaborate?  Working on it until
> you get it "perfect" offsite doesn't work; it's going to require
> adjustment/debugging once it gets to commitfest anyway.  Might as well
> start that now, or it'll just delay application.

Agreed, but there will be some time before that is possible. I'm happy
to commit to Sept 15 *latest* to do the above. I know what has to be
done and that's my timescale for doing it.

> Everyone Else: Simon is working hard on this, please get off his back.

Thanks, good plan.

There is absolutely no danger this patch is going to be delayed and
there is really no call for haste. I near killed myself trying to get it
into 8.4 and I would like to avoid a tension-fest this time around. We
have time and intend to take it at a reasonable pace, and spend time
thinking first, then talking later. Over and out, for now.

-- Simon Riggs           www.2ndQuadrant.com



Re: hot standby - merged up to CVS HEAD

From
David Fetter
Date:
On Mon, Aug 10, 2009 at 11:15:51PM +0100, Simon Riggs wrote:
> On Mon, 2009-08-10 at 10:20 -0700, Josh Berkus wrote:
> 
> > Simon:  you have people offering to help with the patch.  Offering
> > to help *right now*.  Might I suggest that you establish a GIT
> > branch for Hot Standby so that more people can collaborate?
> > Working on it until you get it "perfect" offsite doesn't work;
> > it's going to require adjustment/debugging once it gets to
> > commitfest anyway.  Might as well start that now, or it'll just
> > delay application.
> 
> Agreed, but there will be some time before that is possible. I'm
> happy to commit to Sept 15 *latest* to do the above. I know what has
> to be done and that's my timescale for doing it.

With all due respect, Simon, you've missed the point completely.  If
you have done any work on this whatsoever, *NOW* is the time to share
it and going forward, immediately publishing any change, is the way to
keep it shared.

Working off in splendid isolation, while it may appeal to you, is the
wrong move.  Guaranteed tears.

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: hot standby - merged up to CVS HEAD

From
Bruce Momjian
Date:
Simon Riggs wrote:
> 
> On Mon, 2009-08-10 at 10:20 -0700, Josh Berkus wrote:
> 
> > Simon:  you have people offering to help with the patch.  Offering to
> > help *right now*.  Might I suggest that you establish a GIT branch for
> > Hot Standby so that more people can collaborate?  Working on it until
> > you get it "perfect" offsite doesn't work; it's going to require
> > adjustment/debugging once it gets to commitfest anyway.  Might as well
> > start that now, or it'll just delay application.
> 
> Agreed, but there will be some time before that is possible. I'm happy
> to commit to Sept 15 *latest* to do the above. I know what has to be
> done and that's my timescale for doing it.
> 
> > Everyone Else: Simon is working hard on this, please get off his back.
> 
> Thanks, good plan.
> 
> There is absolutely no danger this patch is going to be delayed and
> there is really no call for haste. I near killed myself trying to get it
> into 8.4 and I would like to avoid a tension-fest this time around. We
> have time and intend to take it at a reasonable pace, and spend time
> thinking first, then talking later. Over and out, for now.

Simon, I am sure you worked very hard trying to get hot standby into
8.4, and worked under great pressure, I am sure it is hard to get
motivated to continue that work.

I regret not vocally expressing caution about trying to get hot standby
into 8.4.  It would have required Herculean effort with everything going
perfectly, so it was very unlikely to be possible, but rather than say
something, I didn't want to be the bearer of bad news, so said nothing.

I want to try to avoid a big push at the end of 8.5 development to get
hot standby completed, which is why I am trying to motivate anyone to
continue working on it, and I want to get more people to help Simon with
it.

Simon, I am not sure how happy you are about this, but the community
needs your help with hot standby _and_ synchronous replication in 8.5,
so I would like to get hot standby completed soon (hopefully by getting
you help and not killing you), so we leave time for you to help on
synchronous replication as well.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: hot standby - merged up to CVS HEAD

From
"Joshua D. Drake"
Date:
On Sun, 2009-08-09 at 22:15 -0400, Robert Haas wrote:
> On Sun, Aug 9, 2009 at 2:43 PM, Simon Riggs<simon@2ndquadrant.com> wrote:
> > I've said very clearly that I am working on this and it's fairly
> > laughable to suggest that anybody thought I wasn't. What more should I
> > do to prove something is "active" if you won't accept my clearly spoken
> > word? How did you decide I was idle exactly?
> 
> I think we looked at the fact that you haven't posted an updated
> version of this patch in almost 6 months.

That pretty much covers it.  We practice open development, we always
have. Those who don't generally run into problems just like this one.
Robert has taken the path of being open about the work that is being
performed and thus he is the one that appears to be making progress.

Simon, regardless of your "words" you have shown nothing for 6 months.
Does that mean you aren't working on it? Of course not but it certainly
shows a lack of transparency to the community with the work. You know
that doesn't work. The community assumes by default that no patch (or
active communication which you also haven't done) means no work. It
always has.

So instead of all of us bickering, how about we start actively working
"together" on the feature again.

Sincerely,

Joshua D. Drake
-- 
PostgreSQL - XMPP: jdrake@jabber.postgresql.org  Consulting, Development, Support, Training  503-667-4564 -
http://www.commandprompt.com/ The PostgreSQL Company, serving since 1997
 



Re: hot standby - merged up to CVS HEAD

From
"Joshua D. Drake"
Date:
On Mon, 2009-08-10 at 10:20 -0700, Josh Berkus wrote:
> All,
> 
> Can we stop arguing about a patch everyone wants?
> 
> Simon:  you have people offering to help with the patch.  Offering to
> help *right now*.  Might I suggest that you establish a GIT branch for
> Hot Standby so that more people can collaborate?  Working on it until
> you get it "perfect" offsite doesn't work; it's going to require
> adjustment/debugging once it gets to commitfest anyway.  Might as well
> start that now, or it'll just delay application.
> 
> Everyone Else: Simon is working hard on this, please get off his back.

I believe that all anyone is asking is that Simon communicate and
collaborate.

Joshua D. Drake

-- 
PostgreSQL - XMPP: jdrake@jabber.postgresql.org  Consulting, Development, Support, Training  503-667-4564 -
http://www.commandprompt.com/ The PostgreSQL Company, serving since 1997
 



Re: hot standby - merged up to CVS HEAD

From
Heikki Linnakangas
Date:
Robert Haas wrote:
> I had some review comments
> I was hoping to get responses to, in the section beginning with "A few
> other comments based on a preliminary reading of this patch":
> 
> http://archives.postgresql.org/pgsql-hackers/2009-07/msg00854.php

Having read the patch now, here's a one issue in addition to the remarks
you made in mail linked above, and all the things already marked with
XXX comments:

I think there's a race condition in the way LogCurrentRunningXacts() is
called at the end of checkpoint. This can happen in the master:

1. Checkpoint starts
2. Transaction 123 begins, and does some updates
3. Checkpoint ends. LogCurrentRunningXacts() is called.
4. LogCurrentRunningXacts() gets the list of currently running
transactions by calling GetCurrentTransactionData().
5. Transaction 123 ends, writing commit record to WAL
6. LogCurrentRunningXacts() writes the list of running XIDs to WAL. This
includes XID 123, since that was still running at step 4.

When that is replayed, ProcArrayUpdateTransactions() will zap the
unobserved xids array with the list that includes XID 123, even though
we already saw a commit record for it.


I removed some Recovery Proc related crud that was still in the patch
but unused. Merge from the "hs" branch at
git://git.postgresql.org/git/users/heikki/postgres.git to get that change.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: hot standby - merged up to CVS HEAD

From
Robert Haas
Date:
On Mon, Aug 17, 2009 at 4:19 AM, Heikki
Linnakangas<heikki.linnakangas@enterprisedb.com> wrote:
> Robert Haas wrote:
>> I had some review comments
>> I was hoping to get responses to, in the section beginning with "A few
>> other comments based on a preliminary reading of this patch":
>>
>> http://archives.postgresql.org/pgsql-hackers/2009-07/msg00854.php
>
> Having read the patch now, here's a one issue in addition to the remarks
> you made in mail linked above, and all the things already marked with
> XXX comments:

I'll work on cleaning some of that up.

> I think there's a race condition in the way LogCurrentRunningXacts() is
> called at the end of checkpoint. This can happen in the master:
>
> 1. Checkpoint starts
> 2. Transaction 123 begins, and does some updates
> 3. Checkpoint ends. LogCurrentRunningXacts() is called.
> 4. LogCurrentRunningXacts() gets the list of currently running
> transactions by calling GetCurrentTransactionData().
> 5. Transaction 123 ends, writing commit record to WAL
> 6. LogCurrentRunningXacts() writes the list of running XIDs to WAL. This
> includes XID 123, since that was still running at step 4.
>
> When that is replayed, ProcArrayUpdateTransactions() will zap the
> unobserved xids array with the list that includes XID 123, even though
> we already saw a commit record for it.

Sounds like we need some locking there, then.  This exceeds my current
depth of understanding of the patch, but I'll see if I can figure it
out.

> I removed some Recovery Proc related crud that was still in the patch
> but unused. Merge from the "hs" branch at
> git://git.postgresql.org/git/users/heikki/postgres.git to get that change.

Thanks, merged.

...Robert


Re: hot standby - merged up to CVS HEAD

From
Robert Haas
Date:
On Mon, Aug 17, 2009 at 6:50 AM, Robert Haas<robertmhaas@gmail.com> wrote:
>> I think there's a race condition in the way LogCurrentRunningXacts() is
>> called at the end of checkpoint. This can happen in the master:
>>
>> 1. Checkpoint starts
>> 2. Transaction 123 begins, and does some updates
>> 3. Checkpoint ends. LogCurrentRunningXacts() is called.
>> 4. LogCurrentRunningXacts() gets the list of currently running
>> transactions by calling GetCurrentTransactionData().
>> 5. Transaction 123 ends, writing commit record to WAL
>> 6. LogCurrentRunningXacts() writes the list of running XIDs to WAL. This
>> includes XID 123, since that was still running at step 4.
>>
>> When that is replayed, ProcArrayUpdateTransactions() will zap the
>> unobserved xids array with the list that includes XID 123, even though
>> we already saw a commit record for it.
>
> Sounds like we need some locking there, then.  This exceeds my current
> depth of understanding of the patch, but I'll see if I can figure it
> out.

I looked at this a little more.  I'm wondering if we can fix this by
making ProcArrayUpdateRecoveryTransactions() smarter.  Can we just
refuse to add an Xid to the UnobservedXids() array if in fact we've
already observed it?  (Not sure how to check that.) Fixing this on the
master would seem to require acquiring the WALInsertLock before
calling GetRunningTransactionData() and holding it until we finish
writing that data to WAL, which I suspect someone's going to complain
about...

For the archives, I'm also attaching a copy of this patch with my
latest (very minor) cleanups, and Heikki's.  This should apply to CVS
HEAD, if anyone wants to test.  git repo is available at:

http://git.postgresql.org/gitweb?p=users/rhaas/postgres.git;a=shortlog;h=refs/heads/hs

This is useful: git log --no-merges master...hs

...Robert

Attachment

Re: hot standby - merged up to CVS HEAD

From
Heikki Linnakangas
Date:
Robert Haas wrote:
> On Mon, Aug 17, 2009 at 6:50 AM, Robert Haas<robertmhaas@gmail.com> wrote:
>>> I think there's a race condition in the way LogCurrentRunningXacts() is
>>> called at the end of checkpoint. This can happen in the master:
>>>
>>> 1. Checkpoint starts
>>> 2. Transaction 123 begins, and does some updates
>>> 3. Checkpoint ends. LogCurrentRunningXacts() is called.
>>> 4. LogCurrentRunningXacts() gets the list of currently running
>>> transactions by calling GetCurrentTransactionData().
>>> 5. Transaction 123 ends, writing commit record to WAL
>>> 6. LogCurrentRunningXacts() writes the list of running XIDs to WAL. This
>>> includes XID 123, since that was still running at step 4.
>>>
>>> When that is replayed, ProcArrayUpdateTransactions() will zap the
>>> unobserved xids array with the list that includes XID 123, even though
>>> we already saw a commit record for it.
>> Sounds like we need some locking there, then.  This exceeds my current
>> depth of understanding of the patch, but I'll see if I can figure it
>> out.
> 
> I looked at this a little more.  I'm wondering if we can fix this by
> making ProcArrayUpdateRecoveryTransactions() smarter.  Can we just
> refuse to add an Xid to the UnobservedXids() array if in fact we've
> already observed it?  (Not sure how to check that.) 

There's also the opposite problem: If a transaction starts (and writes a
WAL record) between LogCurrentRunningXacts() and XLogInstrt(), it is not
present in the RunningXacts record. When the standby replays the
RunningXacts record, it removes the XID of that transaction from the
array, even though it's still running.

> Fixing this on the
> master would seem to require acquiring the WALInsertLock before
> calling GetRunningTransactionData() and holding it until we finish
> writing that data to WAL, which I suspect someone's going to complain
> about...

Yeah, it's hard to get that locking right without risking deadlock. As
the patch stands, we only write a RunningXacts record once per
checkpoint, so it's not performance critical, but we must avoid deadlocks.

If there's a way, I would prefer a solution where the RunningXacts
snapshot represents the situation the moment it appears in WAL, not some
moment before it. It makes the logic easier to understand.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: hot standby - merged up to CVS HEAD

From
Robert Haas
Date:
On Thu, Aug 20, 2009 at 1:55 AM, Heikki
Linnakangas<heikki.linnakangas@enterprisedb.com> wrote:
>>>> When that is replayed, ProcArrayUpdateTransactions() will zap the
>>>> unobserved xids array with the list that includes XID 123, even though
>>>> we already saw a commit record for it.
>>
>> I looked at this a little more.  I'm wondering if we can fix this by
>> making ProcArrayUpdateRecoveryTransactions() smarter.  Can we just
>> refuse to add an Xid to the UnobservedXids() array if in fact we've
>> already observed it?  (Not sure how to check that.)
>
> There's also the opposite problem: If a transaction starts (and writes a
> WAL record) between LogCurrentRunningXacts() and XLogInstrt(), it is not
> present in the RunningXacts record. When the standby replays the
> RunningXacts record, it removes the XID of that transaction from the
> array, even though it's still running.

Yep, Simon appears to have contemplated that problem - see comments in
ProcArrayUpdateRecoveryTransactions().

>> Fixing this on the
>> master would seem to require acquiring the WALInsertLock before
>> calling GetRunningTransactionData() and holding it until we finish
>> writing that data to WAL, which I suspect someone's going to complain
>> about...
>
> Yeah, it's hard to get that locking right without risking deadlock. As
> the patch stands, we only write a RunningXacts record once per
> checkpoint, so it's not performance critical, but we must avoid deadlocks.
>
> If there's a way, I would prefer a solution where the RunningXacts
> snapshot represents the situation the moment it appears in WAL, not some
> moment before it. It makes the logic easier to understand.

I think this is going to be difficult.  At a preliminary look, it
seems to require taking a sledgehammer to the abstraction layer
encapsulated by XLogInsert().  It's also going to require holding both
ProcArrayLock and WALInsertLock simultaneously.  I'm not sure where
the risk of deadlock comes in - we just have to define a rule (or
maintain the existing rule) about which order to acquire those two
locks in. But I'm guessing the existing rule is along the lines of
"Don't do that, or Tom Lane will reject your patch and I'll you'll get
is this stupid T-shirt."

http://img199.yfrog.com/i/b9w.jpg/

...Robert


Re: hot standby - merged up to CVS HEAD

From
Simon Riggs
Date:
On Mon, 2009-08-17 at 11:19 +0300, Heikki Linnakangas wrote:

> I think there's a race condition in the way LogCurrentRunningXacts() is
> called at the end of checkpoint. This can happen in the master:
> 
> 1. Checkpoint starts
> 2. Transaction 123 begins, and does some updates
> 3. Checkpoint ends. LogCurrentRunningXacts() is called.
> 4. LogCurrentRunningXacts() gets the list of currently running
> transactions by calling GetCurrentTransactionData().
> 5. Transaction 123 ends, writing commit record to WAL
> 6. LogCurrentRunningXacts() writes the list of running XIDs to WAL. This
> includes XID 123, since that was still running at step 4.
> 
> When that is replayed, ProcArrayUpdateTransactions() will zap the
> unobserved xids array with the list that includes XID 123, even though
> we already saw a commit record for it.

That's not a race condition, but it does make the code more complex. The
issue has been long understood.

I don't think it's acceptable to take and hold both ProcArray and
WALInsertLock. Those are now the two most heavily contended locks on the
system. We have evidence that there are burst delays associated with
various operations on just one of those locks, let alone two.

If you're still doubtful, the problem I've been working on recently is
the point that I overlooked the initial state of the lock table in my
earlier patch. GetRunningTransactionData() also needs to have initial
lock data.

There is no way in hell that I could personally condone holding
ProcArrayLock, WALInsertLock and all of the LockMgrLock partitions at
same time. So we just have to eat the complexity. (No doubt someone will
disagree with my strong language here, but please take it as an
indication of exactly how bad an idea holding multiple locks will be).

Slight timing issues are not too bad really. We just have to be careful
to assume that there is a mismatch in the data and must have code to
handle that.

Anyway, I've been working on this problem for some time and continue to
do so.

-- Simon Riggs           www.2ndQuadrant.com



Re: hot standby - merged up to CVS HEAD

From
David Fetter
Date:
On Thu, Aug 27, 2009 at 07:08:28PM +0100, Simon Riggs wrote:
> 
> On Mon, 2009-08-17 at 11:19 +0300, Heikki Linnakangas wrote:
> 
> > I think there's a race condition in the way LogCurrentRunningXacts() is
> > called at the end of checkpoint. This can happen in the master:
> > 
> > 1. Checkpoint starts
> > 2. Transaction 123 begins, and does some updates
> > 3. Checkpoint ends. LogCurrentRunningXacts() is called.
> > 4. LogCurrentRunningXacts() gets the list of currently running
> > transactions by calling GetCurrentTransactionData().
> > 5. Transaction 123 ends, writing commit record to WAL
> > 6. LogCurrentRunningXacts() writes the list of running XIDs to WAL. This
> > includes XID 123, since that was still running at step 4.
> > 
> > When that is replayed, ProcArrayUpdateTransactions() will zap the
> > unobserved xids array with the list that includes XID 123, even though
> > we already saw a commit record for it.
> 
> That's not a race condition, but it does make the code more complex. The
> issue has been long understood.
> 
> I don't think it's acceptable to take and hold both ProcArray and
> WALInsertLock. Those are now the two most heavily contended locks on the
> system. We have evidence that there are burst delays associated with
> various operations on just one of those locks, let alone two.
> 
> If you're still doubtful, the problem I've been working on recently is
> the point that I overlooked the initial state of the lock table in my
> earlier patch. GetRunningTransactionData() also needs to have initial
> lock data.
> 
> There is no way in hell that I could personally condone holding
> ProcArrayLock, WALInsertLock and all of the LockMgrLock partitions at
> same time. So we just have to eat the complexity. (No doubt someone will
> disagree with my strong language here, but please take it as an
> indication of exactly how bad an idea holding multiple locks will be).
> 
> Slight timing issues are not too bad really. We just have to be careful
> to assume that there is a mismatch in the data and must have code to
> handle that.
> 
> Anyway, I've been working on this problem for some time and continue to
> do so.

Great!  Where's the git repository?

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate