Thread: Streaming replication and postmaster signaling

Streaming replication and postmaster signaling

From
Heikki Linnakangas
Date:
Looking at the latest streaming replication patch, I don't much like the
signaling between WAL sender and postmaster. It seems complicated, and
as a rule of thumb postmaster shouldn't be accessing shared memory. The
current signaling is:

1. A new connection arrives. A new backend process is forked like
for a normal connection.
2. When the new process is done with the initialization, it allocates
itself a slot from WalSndCtlData shared memory array. It marks its pid
there, sets registered = false, and signals postmaster with
PMSIGNAL_REGISTER_WALSENDER.
3. Upon receiving that signal, postmaster scans the WalSndCtlData array
looking for entries with registered==false. For such entries, it scans
the postmaster-private backend list for a matching entry with the same
pid, marks the entry in the list as a walsender, and sets
registered=true in the shared memory entry.

This way postmaster knows which child processes are walsenders, when
it's time to signal them.
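
For illustration, the registration handshake in steps 2 and 3 can be modelled roughly as below. This is a simplified sketch, not code from the patch: the struct layouts, array sizes, and the function names walsnd_claim_slot and postmaster_register_walsenders are hypothetical, and plain global arrays stand in for shared memory and for the postmaster-private backend list.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_WALSENDERS 4   /* assumed size, for illustration only */
#define MAX_BACKENDS   8   /* assumed size, for illustration only */

/* Stand-in for one entry of the shared WalSndCtlData array. */
typedef struct { int pid; bool registered; } WalSndSlot;

/* Stand-in for one entry of the postmaster-private backend list. */
typedef struct { int pid; bool is_walsender; } BackendEntry;

WalSndSlot   walsnd_slots[MAX_WALSENDERS];
BackendEntry backend_list[MAX_BACKENDS];

/* Step 2 (walsender side): claim a free slot, record our pid, and leave
 * registered = false. The real process would then signal postmaster.
 * Returns the slot index, or -1 if no slot is free. */
int walsnd_claim_slot(int pid)
{
    assert(pid > 0);
    for (int i = 0; i < MAX_WALSENDERS; i++) {
        if (walsnd_slots[i].pid == 0) {
            walsnd_slots[i].pid = pid;
            walsnd_slots[i].registered = false;
            return i;
        }
    }
    return -1;
}

/* Step 3 (postmaster side): scan the shared array for unregistered
 * entries, find the backend-list entry with the same pid, and mark both.
 * Returns how many walsenders were newly registered. */
int postmaster_register_walsenders(void)
{
    int newly_registered = 0;

    for (int i = 0; i < MAX_WALSENDERS; i++) {
        if (walsnd_slots[i].pid == 0 || walsnd_slots[i].registered)
            continue;
        for (int j = 0; j < MAX_BACKENDS; j++) {
            if (backend_list[j].pid == walsnd_slots[i].pid) {
                backend_list[j].is_walsender = true;
                walsnd_slots[i].registered = true;
                newly_registered++;
                break;
            }
        }
    }
    return newly_registered;
}
```

The point to notice is that the postmaster-side function both reads and writes the shared array, which is the shared-memory access the message objects to.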

I think it would be better to utilize the existing array of child
processes in pmsignal.c. Instead of having postmaster peek into
WalSndCtlData, let's add a new state to PMChildFlags,
PM_CHILD_WALSENDER, which is just like PM_CHILD_ACTIVE but tells
postmaster that the child is not a normal backend but a walsender.
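
A rough sketch of that idea follows, assuming a per-child state array along the lines of the one in pmsignal.c. The enum values, array size, and function names here are hypothetical simplifications rather than the actual code in the branch; the sketch only shows that a child can announce itself as a walsender by writing its own slot, so postmaster needs nothing beyond the array it already maintains.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CHILDREN 8   /* assumed size, for illustration only */

/* Hypothetical per-child states; CHILD_WALSENDER plays the role of the
 * proposed PM_CHILD_WALSENDER: like "active", but not a normal backend. */
typedef enum {
    CHILD_UNUSED = 0,
    CHILD_ASSIGNED,
    CHILD_ACTIVE,
    CHILD_WALSENDER
} ChildState;

volatile ChildState child_flags[MAX_CHILDREN];

/* Postmaster: reserve a slot before forking. Returns -1 if none is free. */
int assign_child_slot(void)
{
    for (int i = 0; i < MAX_CHILDREN; i++) {
        if (child_flags[i] == CHILD_UNUSED) {
            child_flags[i] = CHILD_ASSIGNED;
            return i;
        }
    }
    return -1;
}

/* Child: mark our own slot as a normal, active backend. */
void mark_child_active(int slot)
{
    assert(slot >= 0 && slot < MAX_CHILDREN);
    child_flags[slot] = CHILD_ACTIVE;
}

/* Child: mark our own slot as a walsender instead of a normal backend. */
void mark_child_walsender(int slot)
{
    assert(slot >= 0 && slot < MAX_CHILDREN);
    child_flags[slot] = CHILD_WALSENDER;
}

/* Postmaster: a single read of the slot tells the two kinds apart. */
bool is_child_walsender(int slot)
{
    assert(slot >= 0 && slot < MAX_CHILDREN);
    return child_flags[slot] == CHILD_WALSENDER;
}

/* Postmaster: free the slot after the child exits. */
void release_child_slot(int slot)
{
    assert(slot >= 0 && slot < MAX_CHILDREN);
    child_flags[slot] = CHILD_UNUSED;
}

/* Postmaster: count walsenders, e.g. when deciding whom to signal. */
int count_walsenders(void)
{
    int n = 0;
    for (int i = 0; i < MAX_CHILDREN; i++)
        if (child_flags[i] == CHILD_WALSENDER)
            n++;
    return n;
}
```

Compared with the current scheme, there is no pid matching and no separate registration signal in this model: the slot index is known to both sides from the moment of the fork.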

I've done that in my git branch.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Streaming replication and postmaster signaling

From
Alvaro Herrera
Date:
Heikki Linnakangas escribió:
> Looking at the latest streaming replication patch, I don't much like the
> signaling between WAL sender and postmaster. It seems complicated, and
> as a rule of thumb postmaster shouldn't be accessing shared memory. The
> current signaling is:
> 
> 1. A new connection arrives. A new backend process is forked like
> for a normal connection.

This was probably discussed to death earlier, but: why was it decided to
not simply use a different port for listening for walsender
connections?


-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: Streaming replication and postmaster signaling

From
Fujii Masao
Date:
On Tue, Jan 5, 2010 at 11:07 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> I think it would be better to utilize the existing array of child
> processes in pmsignal.c. Instead of having postmaster peek into
> WalSndCtlData, let's add a new state to PMChildFlags,
> PM_CHILD_WALSENDER, which is just like PM_CHILD_ACTIVE but tells
> postmaster that the child is not a normal backend but a walsender.

Seems good.

> I've done that in my git branch.

Could you push that git branch to a public place?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Streaming replication and postmaster signaling

From
Fujii Masao
Date:
On Tue, Jan 5, 2010 at 11:29 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
> This was probably discussed to death earlier, but: why was it decided to
> not simply use a different port for listening for walsender
> connections?

I believe that using a different port would make the setup
of replication messier; look for the unused port number,
open that port for replication in the firewall, etc.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Streaming replication and postmaster signaling

From
Heikki Linnakangas
Date:
Fujii Masao wrote:
>> I've done that in my git branch.
> 
> Could you push that git branch to a public place?

Ahh, sorry, forgot that again. It's there now, at
git://git.postgresql.org/git/users/heikki/postgres.git, branch
'replication'.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Streaming replication and postmaster signaling

From
Craig Ringer
Date:
Fujii Masao wrote:
> On Tue, Jan 5, 2010 at 11:29 PM, Alvaro Herrera
> <alvherre@commandprompt.com> wrote:
>> This was probably discussed to death earlier, but: why was it decided to
>> not simply use a different port for listening for walsender
>> connections?
> 
> I believe that using a different port would make the setup
> of replication messier; look for the unused port number,
> open that port for replication in the firewall, etc.

Actually, being able to firewall walsender traffic separately might be
rather handy.

Having to assign a different port wouldn't be fun for packagers, though,
especially those (like the Debian-derived Linux distros) who already try
to support more than one Pg version installed in parallel.

--
Craig Ringer


Re: Streaming replication and postmaster signaling

From
Tom Lane
Date:
Craig Ringer <craig@postnewspapers.com.au> writes:
> Fujii Masao wrote:
>> On Tue, Jan 5, 2010 at 11:29 PM, Alvaro Herrera
>> <alvherre@commandprompt.com> wrote:
>>> This was probably discussed to death earlier, but: why was it decided to
>>> not simply use a different port for listening for walsender
>>> connections?
>> 
>> I believe that using a different port would make the setup
>> of replication messier; look for the unused port number,
>> open that port for replication in the firewall, etc.

> Actually, being able to firewall walsender traffic separately might be
> rather handy.

> Having to assign a different port wouldn't be fun for packagers, though,

Well, we'd have to get a port number officially assigned by IANA.

I tend to agree that the management overhead of a second port isn't
worth it.
        regards, tom lane


Re: Streaming replication and postmaster signaling

From
Robert Haas
Date:
On Wed, Jan 6, 2010 at 3:03 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> Fujii Masao wrote:
>>> I've done that in my git branch.
>>
>> Could you push that git branch to a public place?
>
> Ahh, sorry, forgot that again. It's there now, at
> git://git.postgresql.org/git/users/heikki/postgres.git, branch
> 'replication'.

I'm feeling like we're running out of time to get this committed.
Committing large patches late in the release cycle is a recipe for a
buggy beta, possibly a long beta, and a buggy release, and we're now
down to 8 days before the start of the final CommitFest, after which
our schedule indicates that we expect to put out an alpha and a beta
relatively quickly.  If this isn't ready to go, maybe we need to
postpone it to 8.6.  We've already had a bunch of bug reports (some of
which have been fixed) as a result of HS, and I don't see any reason
to believe that this isn't going to have the same problem.

Personally, I would rather have a release without SR in June or July
than a release with SR in August or September.  We already have too
many good features in the tree to hold up the whole process for
patches that aren't ready yet - though like everyone else, I think
this is a killer feature.

Thoughts?

...Robert


Re: Streaming replication and postmaster signaling

From
Devrim GÜNDÜZ
Date:
On Thu, 2010-01-07 at 11:55 -0500, Robert Haas wrote:

> Personally, I would rather have a release without SR in June or July
> than a release with SR in August or September.

If SR will be ready until then, I'd like to see a release in September
which has SR in it. We already postponed SR a lot. Many of advocacy
people including me already mentioned about SR, and many people are
looking after it. BTW, July probably won't be a good time for a new
release, because of people's holidays.

...and maybe then we can start 8.5 -> 9.0 thread.

Regards,
--
Devrim GÜNDÜZ, RHCE
Command Prompt - http://www.CommandPrompt.com
devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
http://www.gunduz.org  Twitter: http://twitter.com/devrimgunduz

Re: Streaming replication and postmaster signaling

From
Magnus Hagander
Date:
2010/1/7 Devrim GÜNDÜZ <devrim@gunduz.org>:
> On Thu, 2010-01-07 at 11:55 -0500, Robert Haas wrote:
>
>> Personally, I would rather have a release without SR in June or July
>> than a release with SR in August or September.

June, yes. July, frankly, no, because July == September, when it comes
to any such scheduling. At least in the countries where my clients are
:)


> If SR will be ready until then, I'd like to see a release in September
> which has SR in it. We already postponed SR a lot. Many of advocacy
> people including me already mentioned about SR, and many people are
> looking after it. BTW, July probably won't be a good time for a new
> release, because of people's holidays.

-1. Frankly, if advocacy people said it would be there, they didn't
tell the truth, and that's their problem. If they said "hopefully it
will be there, but we don't know yet", then they don't have a problem
either way.

Not having our release schedule driven by marketing is a *strength* of
our project!

We made the mistake last time to delay the release significantly for a
single feature. It turned out said feature didn't make it *anyway*.
Let's not repeat that mistake.


> ...and maybe then we can start 8.5 -> 9.0 thread.

....


--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: Streaming replication and postmaster signaling

From
Andres Freund
Date:
On Thursday 07 January 2010 18:10:43 Magnus Hagander wrote:
> Not having our release schedule driven by marketing is a *strength* of
> our project!
Yes.

> We made the mistake last time to delay the release significantly for a
> single feature. It turned out said feature didn't make it *anyway*.
> Let's not repeat that mistake.
I would consider SR to be significantly less complex than HS though.
What about giving it two weeks from now on to be in a committable state? Last 
time the main discussion started a good while *after* the last commitfest...

Andres


Re: Streaming replication and postmaster signaling

From
Robert Haas
Date:
2010/1/7 Magnus Hagander <magnus@hagander.net>:
> 2010/1/7 Devrim GÜNDÜZ <devrim@gunduz.org>:
>> On Thu, 2010-01-07 at 11:55 -0500, Robert Haas wrote:
>>
>>> Personally, I would rather have a release without SR in June or July
>>> than a release with SR in August or September.
>
> June, yes. July, frankly, no, because July == September, when it comes
> to any such scheduling. At least in the countries where my clients are
> :)

In terms of when the release comes out, maybe.  In terms of the NEXT
release, it still matters.  If the release is delayed, the first
CommitFest of the next release will be that much later.  If we put out
a release by July 1 of this year, we can repeat the same schedule for
the next release that we are using for this release and I will be
happy with that.  If we don't put out a release until September, our
first CommitFest will be at least 2 months later than it was for the
last one, which means that (1) we will have a gap of 8 months without
a CommitFest and (2) 8.6 will have no chance of coming out before
September 2011, and may end up being more like Thanksgiving if that
one also slips.

I really don't want to go 8 months with no CommitFest.  That leads to
too many patches in the queue, too many merge conflicts, too many
patch authors who just plain give up, and no feedback to anyone for a
very, very long time.

>> If SR will be ready until then, I'd like to see a release in September
>> which has SR in it. We already postponed SR a lot. Many of advocacy
>> people including me already mentioned about SR, and many people are
>> looking after it. BTW, July probably won't be a good time for a new
>> release, because of people's holidays.
>
> -1. Frankly, if advocacy people said it would be there, they didn't
> tell the truth, and that's their problem. If they said "hopefully it
> will be there, but we don't know yet", then they don't have a problem
> either way.
>
> Not having our release schedule driven by marketing is a *strength* of
> our project!
>
> We made the mistake last time to delay the release significantly for a
> single feature. It turned out said feature didn't make it *anyway*.
> Let's not repeat that mistake.

Indeed.

...Robert


Re: Streaming replication and postmaster signaling

From
Tom Lane
Date:
Magnus Hagander <magnus@hagander.net> writes:
> We made the mistake last time to delay the release significantly for a
> single feature. It turned out said feature didn't make it *anyway*.
> Let's not repeat that mistake.

Yeah, we've certainly learned that lesson often enough, or should I say
failed to learn that lesson?

However, HS is already in the tree, and HS without SR is a whole lot
less compelling than HS with SR.  So it's going to be pretty
unsatisfying if we can't get SR in there.

I read Robert's original question not so much as a proposal to slip the
schedule to accommodate SR as a question about whether SR could still
meet the current schedule.  I think we ought to get that answered before
we start debating schedule changes.
        regards, tom lane


Re: Streaming replication and postmaster signaling

From
"Greg Sabino Mullane"
Date:
> However, HS is already in the tree, and HS without SR is a whole lot
> less compelling than HS with SR.  So it's going to be pretty
> unsatisfying if we can't get SR in there.

I don't think that's the case. Having HS alone would be a huge win,
and the sooner we can get it out there the better. Those that are
waiting for SR might have to wait one more version, but my intuition
tells me that's a small minority compared to those waiting for HS.

--
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 201001071231
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8


Re: Streaming replication and postmaster signaling

From
Stefan Kaltenbrunner
Date:
Greg Sabino Mullane wrote:
>> However, HS is already in the tree, and HS without SR is a whole lot
>> less compelling than HS with SR.  So it's going to be pretty
>> unsatisfying if we can't get SR in there.
> 
> I don't think that's the case. Having HS alone would be a huge win,
> and the sooner we can get it out there the better. Those that are
> waiting for SR might have to wait one more version, but my intuition
> tells me that's a small minority compared to those waiting for HS.

while I agree that HS is very useful without SR, I think that it's 
mostly the well known powerusers in the community are actively waiting 
for HS and not so much for SR. For the typical user outside of -hackers 
or even -general I'm not so sure about that...


Stefan


Re: Streaming replication and postmaster signaling

From
Robert Haas
Date:
On Thu, Jan 7, 2010 at 12:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Magnus Hagander <magnus@hagander.net> writes:
>> We made the mistake last time to delay the release significantly for a
>> single feature. It turned out said feature didn't make it *anyway*.
>> Let's not repeat that mistake.
>
> Yeah, we've certainly learned that lesson often enough, or should I say
> failed to learn that lesson?

I think the latter phrasing is more accurate.

> However, HS is already in the tree, and HS without SR is a whole lot
> less compelling than HS with SR.  So it's going to be pretty
> unsatisfying if we can't get SR in there.
>
>
> I read Robert's original question not so much as a proposal to slip the
> schedule to accommodate SR as a question about whether SR could still
> meet the current schedule.  I think we ought to get that answered before
> we start debating schedule changes.

Unfortunately, we've also discovered from hard experience that the
timing of commits is difficult to predict unless the answer is
something like "today" or "tomorrow".  I'm not terribly interested in
an estimate of when this will be committed if it's much more distant
than that because experience indicates that such estimates are
typically inaccurate, usually on the optimistic side.  I seem to
recall Heikki estimating two weeks for SR about this time last year,
and of course it took a lot longer than that, even if you subtract out
the breaks in the action.  That's not because Heikki is a bad
estimator; it's just that estimating how long a particular piece of
code will take to finish is extremely difficult and almost no one can
do it with any degree of accuracy.  It is the things the programmer
can't foresee that push out the end date, and of course you can't know
how many of those there will be.

I like Andres' suggestion upthread of setting a deadline and
determining to bounce the patch if it's not committed by that date.
If it turns out we have to bounce it, that stinks, but I don't think
it makes sense to go to beta with a huge, barely-tested pile of code
in the tree.  Not that the testing Heikki and Fujii Masao have been
doing until now hasn't been good, but it's not nearly as rigorous as
what we will get when all of our users start banging on it.

The problem with even TALKING about changing the schedule is that we
will have no idea what to change it TO.  If we add two months to the
schedule today, that will probably increase the chances of SR getting
committed within that time frame (unless, of course, Heikki's employer
uses that as an excuse to take him off the project for two months...)
but we don't know how much because we can't predict how long it's
going to take to be ready.  If someone could show us a curve with
probability on one axis and commit date on the other axis we could
probably make a good decision about where to slice it off, but that
isn't possible.

...Robert


Re: Streaming replication and postmaster signaling

From
"Greg Sabino Mullane"
Date:
> while I agree that HS is very useful without SR, I think that it's
> mostly the well known powerusers in the community are actively waiting
> for HS and not so much for SR. For the typical user outside of -hackers
> or even -general I'm not so sure about that...

Well, I can state that we have plenty of clients that would be very
interested in HS, but none that would really care if it came without
SR. This power user knows a lot of people outside of -hackers and
-general and they are what I'm basing my opinion on. :)

--
Greg Sabino Mullane greg@turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201001071303
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8


Re: Streaming replication and postmaster signaling

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> I like Andres' suggestion upthread of setting a deadline and
> determining to bounce the patch if it's not committed by that date.
> If it turns out we have to bounce it, that stinks, but I don't think
> it makes sense to go to beta with a huge, barely-tested pile of code
> in the tree.  Not that the testing Heikki and Fujii Masao have been
> doing until now hasn't been good, but it's not nearly as rigorous as
> what we will get when all of our users start banging on it.

This argument would hold more water if there weren't *already* a huge,
barely-tested pile of code in the tree, namely HS.  If you think that's
anywhere near ready to go to beta, I'm afraid I'd better disillusion
you immediately.
        regards, tom lane


Re: Streaming replication and postmaster signaling

From
Robert Haas
Date:
On Thu, Jan 7, 2010 at 1:21 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> I like Andres' suggestion upthread of setting a deadline and
>> determining to bounce the patch if it's not committed by that date.
>> If it turns out we have to bounce it, that stinks, but I don't think
>> it makes sense to go to beta with a huge, barely-tested pile of code
>> in the tree.  Not that the testing Heikki and Fujii Masao have been
>> doing until now hasn't been good, but it's not nearly as rigorous as
>> what we will get when all of our users start banging on it.
>
> This argument would hold more water if there weren't *already* a huge,
> barely-tested pile of code in the tree, namely HS.  If you think that's
> anywhere near ready to go to beta, I'm afraid I'd better disillusion
> you immediately.

That may well be so, but adding another one is not going to improve
the situation even a little bit.  I don't think what you're saying
weakens in the slightest the argument that I was making, namely, that
if this isn't committed RSN it should be postponed to 8.6.  Do you
disagree?

...Robert


Re: Streaming replication and postmaster signaling

From
Bruce Momjian
Date:
Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > I like Andres' suggestion upthread of setting a deadline and
> > determining to bounce the patch if it's not committed by that date.
> > If it turns out we have to bounce it, that stinks, but I don't think
> > it makes sense to go to beta with a huge, barely-tested pile of code
> > in the tree.  Not that the testing Heikki and Fujii Masao have been
> > doing until now hasn't been good, but it's not nearly as rigorous as
> > what we will get when all of our users start banging on it.
> 
> This argument would hold more water if there weren't *already* a huge,
> barely-tested pile of code in the tree, namely HS.  If you think that's
> anywhere near ready to go to beta, I'm afraid I'd better disillusion
> you immediately.

I agree with Tom's analysis.  HS is very complex, while SR is more
mechanical.  We might find that in the end SR was stable before HS.
I think we should stay on course and see where we are when Heikki is
ready for a commit of SR.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: Streaming replication and postmaster signaling

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> That may well be so, but adding another one is not going to improve
> the situation even a little bit.  I don't think what you're saying
> weakens in the slightest the argument that I was making, namely, that
> if this isn't committed RSN it should be postponed to 8.6.  Do you
> disagree?

Well, the argument to my mind is about a suitable value of "RSN".
I think you were stating that we should bounce SR if it's not committed
before the final commitfest starts (ie, next week).  I think we can give
it more slack than that.  Maybe the end of the fest (where the length of
the fest is determined by the other open patches)?
        regards, tom lane


Re: Streaming replication and postmaster signaling

From
Josh Berkus
Date:
> Well, the argument to my mind is about a suitable value of "RSN".
> I think you were stating that we should bounce SR if it's not committed
> before the final commitfest starts (ie, next week).  I think we can give
> it more slack than that.  Maybe the end of the fest (where the length of
> the fest is determined by the other open patches)?

Yes.  I think there's tremendous value to PG if we could get HS+SR into
8.5.  And I know that SR is what Heikki is working on exclusively.

If we find that working on SR is causing us not to have time for other
patch review, then we can revisit the decision.

--Josh Berkus


Re: Streaming replication and postmaster signaling

From
Robert Haas
Date:
On Thu, Jan 7, 2010 at 2:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> That may well be so, but adding another one is not going to improve
>> the situation even a little bit.  I don't think what you're saying
>> weakens in the slightest the argument that I was making, namely, that
>> if this isn't committed RSN it should be postponed to 8.6.  Do you
>> disagree?
>
> Well, the argument to my mind is about a suitable value of "RSN".
> I think you were stating that we should bounce SR if it's not committed
> before the final commitfest starts (ie, next week).  I think we can give
> it more slack than that.  Maybe the end of the fest (where the length of
> the fest is determined by the other open patches)?

We did that for 8.4 and I don't think it worked very well.  I think we
should have a HARD cutoff of one month for this CommitFest, just as we
have had for the other ones this cycle.  Anything we can't get through
by February 15th and isn't a release-blocker should just be pushed out
to 8.6.  It is not as if we have a big backlog of patches carried over
from previous CommitFests; we have only two of any size, that I am
aware of: SR and listen/notify.  Anything else that is adversely
affected by this policy is something that was submitted for only the
last CommitFest, and as long as we give those patches a fair review
and good feedback before bouncing them, I don't think we should feel
bad about postponing the ones that are not ready.

If you accept that principle then the question is whether SR should
have a different cut-off than the other patches in the CommitFest.  I
was being a bit coy about my position on that topic in my original
message and I'm still of two minds about it, but your mind-reading is
basically correct.  On the one hand, imposing an earlier deadline
might be viewed as moving the goalposts and I'm generally a big
opponent of that.  On the other hand, assuming that the amount of
community time Heikki gets is independent of what he's working on,
shooting SR through the head sooner means more time to stabilize HS,
which means an earlier release, and since there seems to be a
substantial danger that the release will be late no matter what we do,
I am tempted to say we should clamp down and go into damage control
mode sooner rather than later.

I am really reluctant to go through another cycle of giving a big
feature as much time as humanly possible before bouncing it, and then
bouncing it anyway, and I fear that is what will happen.  I don't
believe this patch has had a major rewrite since it was submitted for
the September CommitFest, and if 4 months hasn't been enough time to
get it committed, then why do we think the magic number is 5 rather
than 6 or 8 or anything else?  If someone has a concrete answer to
that question, I am all ears, but I feel like we're operating mostly
on hope.

...Robert


Re: Streaming replication and postmaster signaling

From
Josh Berkus
Date:
> I am really reluctant to go through another cycle of giving a big
> feature as much time as humanly possible before bouncing it, and then
> bouncing it anyway, and I fear that is what will happen.  I don't
> believe this patch has had a major rewrite since it was submitted for
> the September CommitFest, and if 4 months hasn't been enough time to
> get it committed, then why do we think the magic number is 5 rather
> than 6 or 8 or anything else?  If someone has a concrete answer to
> that question, I am all ears, but I feel like we're operating mostly
> on hope.

I think Heikki needs to speak to this.  I know that he's now working
hard on SR, but I have no idea what his timeline is.

--Josh Berkus


Re: Streaming replication and postmaster signaling

From
Robert Haas
Date:
On Thu, Jan 7, 2010 at 1:53 PM, Bruce Momjian <bruce@momjian.us> wrote:
> Tom Lane wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>> > I like Andres' suggestion upthread of setting a deadline and
>> > determining to bounce the patch if it's not committed by that date.
>> > If it turns out we have to bounce it, that stinks, but I don't think
>> > it makes sense to go to beta with a huge, barely-tested pile of code
>> > in the tree.  Not that the testing Heikki and Fujii Masao have been
>> > doing until now hasn't been good, but it's not nearly as rigorous as
>> > what we will get when all of our users start banging on it.
>>
>> This argument would hold more water if there weren't *already* a huge,
>> barely-tested pile of code in the tree, namely HS.  If you think that's
>> anywhere near ready to go to beta, I'm afraid I'd better disillusion
>> you immediately.
>
> I agree with Tom's analysis.  HS is very complex, while SR is more
> mechanical.  We might find that in the end SR was stable before HS.
> I think we should stay on course and see where we are when Heikki is
> ready for a commit of SR.

I think you're saying that you think we should hold up the 8.5 release
for as long as it takes to get SR committed.  If that's the case, -1
from me.

It also does not seem right to me to suppose that the amount of time
that it will take to stabilize HS is independent of when we give up on
SR.  Surely Heikki, Simon, and Fujii Masao are all key people for both
projects, no?  Tom as well, come to think of it.

...Robert


Re: Streaming replication and postmaster signaling

From
Heikki Linnakangas
Date:
Josh Berkus wrote:
> Yes.  I think there's tremendous value to PG if we could get HS+SR into
> 8.5.  And I know that SR is what Heikki is working on exclusively.

That hasn't been true for some time, I haven't spent very much time on
SR recently. Not enough, really.

But FWIW I have dedicated today and tomorrow for SR, and plan to
dedicate 2-3 days next week as well.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Streaming replication and postmaster signaling

From
"David E. Wheeler"
Date:
On Jan 7, 2010, at 12:10 PM, Heikki Linnakangas wrote:

> But FWIW I have dedicated today and tomorrow for SR, and plan to
> dedicate 2-3 days next week as well.

Should we then await what you determine over the next week?

Best,

David


Re: Streaming replication and postmaster signaling

From
Tom Lane
Date:
"Greg Sabino Mullane" <greg@turnstep.com> writes:
>> However, HS is already in the tree, and HS without SR is a whole lot
>> less compelling than HS with SR.  So it's going to be pretty
>> unsatisfying if we can't get SR in there.

> I don't think that's the case. Having HS alone would be a huge win,
> and the sooner we can get it out there the better. Those that are
> waiting for SR might have to wait one more version, but my intuition
> tells me that's a small minority compared to those waiting for HS.

No, I don't think so.  HS without SR means you still have to fool with
setting up WAL-file-based replication, which despite the existence of
pg_standby is a PITA.  And you have to make a tradeoff of how often to
flush WAL files to the standby.  To be a real candidate for "it just
works" replication, we've *got* to have SR.
        regards, tom lane


Re: Streaming replication and postmaster signaling

From
Magnus Hagander
Date:
On Thu, Jan 7, 2010 at 21:22, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Greg Sabino Mullane" <greg@turnstep.com> writes:
>>> However, HS is already in the tree, and HS without SR is a whole lot
>>> less compelling than HS with SR.  So it's going to be pretty
>>> unsatisfying if we can't get SR in there.
>
>> I don't think that's the case. Having HS alone would be a huge win,
>> and the sooner we can get it out there the better. Those that are
>> waiting for SR might have to wait one more version, but my intuition
>> tells me that's a small minority compared to those waiting for HS.
>
> No, I don't think so.  HS without SR means you still have to fool with
> setting up WAL-file-based replication, which despite the existence of
> pg_standby is a PITA.  And you have to make a tradeoff of how often to
> flush WAL files to the standby.  To be a real candidate for "it just
> works" replication, we've *got* to have SR.

Yes, but HS without SR certainly solves all the "need to offload my
reporting" kind of situations, which is still a very big thing. Yes,
it'll be much nicer with SR, but it will be *very* useful without it
as well.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


Re: Streaming replication and postmaster signaling

From
Tom Lane
Date:
Magnus Hagander <magnus@hagander.net> writes:
> On Thu, Jan 7, 2010 at 21:22, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> No, I don't think so.  HS without SR means you still have to fool with
>> setting up WAL-file-based replication, which despite the existence of
>> pg_standby is a PITA.  And you have to make a tradeoff of how often to
>> flush WAL files to the standby.  To be a real candidate for "it just
>> works" replication, we've *got* to have SR.

> Yes, but HS without SR certainly solves all the "need to offload my
> reporting" kind of situations, which is still a very big thing. Yes,
> it'll be much nicer with SR, but it will be *very* useful without it
> as well.

[ shrug... ]  To me, HS+SR is actual replication, which would justify
tagging this release 9.0.  With only one of them, it's 8.5.  I
understand that there are power users who would find HS alone to be
tremendously useful, but in terms of what the average user sees, there's
a quantum difference.
        regards, tom lane


Re: Streaming replication and postmaster signaling

From
Tom Lane
Date:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> Josh Berkus wrote:
>> Yes.  I think there's tremendous value to PG if we could get HS+SR into
>> 8.5.  And I know that SR is what Heikki is working on exclusively.

> That hasn't been true for some time, I haven't spent very much time on
> SR recently. Not enough, really.

> But FWIW I have dedicated today and tomorrow for SR, and plan to
> dedicate 2-3 days next week as well.

So you carefully avoided answering the question: when do you think it
might be committable?
        regards, tom lane


Re: Streaming replication and postmaster signaling

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> I am tempted to say we should clamp down and go into damage control
> mode sooner rather than later.

The more I see of the HS patch, the more I think the same.  But my
proposal for "damage control mode" would be to immediately punt
everything else to the next release and focus our energies exclusively
on HS *and* SR.  In terms of the "big picture" for the project, those
are headline items, and everything else is just trivia that the average
user won't even notice.
        regards, tom lane


Re: Streaming replication and postmaster signaling

From
Andrew Dunstan
Date:

Tom Lane wrote:
>
> [ shrug... ]  To me, HS+SR is actual replication, which would justify
> tagging this release 9.0.  With only one of them, it's 8.5.  I
> understand that there are power users who would find HS alone to be
> tremendously useful, but in terms of what the average user sees, there's
> a quantum difference.
>         
>   

Right. As someone engaged in the marketplace, I can tell you that IMNSHO 
it is almost impossible to overstate the importance of getting both of 
these features. We will suffer an enormous loss of face and respect if 
we don't.

cheers

andrew


Re: Streaming replication and postmaster signaling

From
Dave Page
Date:
On Thu, Jan 7, 2010 at 7:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> That may well be so, but adding another one is not going to improve
>> the situation even a little bit.  I don't think what you're saying
>> weakens in the slightest the argument that I was making, namely, that
>> if this isn't committed RSN it should be postponed to 8.6.  Do you
>> disagree?
>
> Well, the argument to my mind is about a suitable value of "RSN".
> I think you were stating that we should bounce SR if it's not committed
> before the final commitfest starts (ie, next week).  I think we can give
> it more slack than that.  Maybe the end of the fest (where the length of
> the fest is determined by the other open patches)?

Absolutely agree. Stretching the freeze to accommodate SR is what we
should avoid, but bumping it before it's even started is completely
over the top and would be the complete opposite of our past errors.
Especially given that Heikki is spending significant time on it right
now...

--
Dave Page
EnterpriseDB UK: http://www.enterprisedb.com


Re: Streaming replication and postmaster signaling

From
Dave Page
Date:
On Thu, Jan 7, 2010 at 8:36 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> I am tempted to say we should clamp down and go into damage control
>> mode sooner rather than later.
>
> The more I see of the HS patch, the more I think the same.  But my
> proposal for "damage control mode" would be to immediately punt
> everything else to the next release and focus our energies exclusively
> on HS *and* SR.  In terms of the "big picture" for the project, those
> are headline items, and everything else is just trivia that the average
> user won't even notice.

100% agree. Whether or not individual users or their clients on this
list need both features, as a project we absolutely need to get them
out there to further boost our reputation as a credible alternative to
Oracle, SQL Server, DB2 and friends - which seem particularly
important to do whilst MySQL is in so much flux and people are
watching us closely.

--
Dave Page
EnterpriseDB UK: http://www.enterprisedb.com


Re: Streaming replication and postmaster signaling

From
Bruce Momjian
Date:
Tom Lane wrote:
> Magnus Hagander <magnus@hagander.net> writes:
> > On Thu, Jan 7, 2010 at 21:22, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> No, I don't think so.  HS without SR means you still have to fool with
> >> setting up WAL-file-based replication, which despite the existence of
> >> pg_standby is a PITA.  And you have to make a tradeoff of how often to
> >> flush WAL files to the standby.  To be a real candidate for "it just
> >> works" replication, we've *got* to have SR.
> 
> > Yes, but HS without SR certainly solves all the "need to offload my
> > reporting" kind of situations, which is still a very big thing. Yes,
> > it'll be much nicer with SR, but it will be *very* useful without it
> > as well.
> 
> [ shrug... ]  To me, HS+SR is actual replication, which would justify
> tagging this release 9.0.  With only one of them, it's 8.5.  I
> understand that there are power users who would find HS alone to be
> tremendously useful, but in terms of what the average user sees, there's
> a quantum difference.

No question.  We have to think of the average user when considering the
impact of these features, meaning not what _we_ are capable of doing,
but what the average user is capable of easily setting up.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: Streaming replication and postmaster signaling

From
Devrim GÜNDÜZ
Date:
On Thu, 2010-01-07 at 15:49 -0500, Andrew Dunstan wrote:
> Right. As someone engaged in the marketplace, I can tell you that
> IMNSHO  it is almost impossible to overstate the importance of getting
> both of  these features. We will suffer an enormous loss of face and
> respect if we don't.

+1. That was what I was trying to say.
--
Devrim GÜNDÜZ, RHCE
Command Prompt - http://www.CommandPrompt.com
devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
http://www.gunduz.org  Twitter: http://twitter.com/devrimgunduz

Re: Streaming replication and postmaster signaling

From
Heikki Linnakangas
Date:
Tom Lane wrote:
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>> But FWIW I have dedicated today and tomorrow for SR, and plan to
>> dedicate 2-3 days next week as well.
> 
> So you carefully avoided answering the question: when do you think it
> might be committable?

:-). I was hoping to commit it by the end of this week, but I just
bumped into issues with a broken postmaster state machine. That's
probably going to take tomorrow to clear out. So I'm now thinking next
week. I'm quite familiar with the whole patch, so I'm not expecting to
uncover any new major issues.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Streaming replication and postmaster signaling

From
Robert Haas
Date:
On Thu, Jan 7, 2010 at 3:36 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> I am tempted to say we should clamp down and go into damage control
>> mode sooner rather than later.
>
> The more I see of the HS patch, the more I think the same.  But my
> proposal for "damage control mode" would be to immediately punt
> everything else to the next release and focus our energies exclusively
> on HS *and* SR.  In terms of the "big picture" for the project, those
> are headline items, and everything else is just trivia that the average
> user won't even notice.

Hmm.  There's something to what you say, but what about the people who
were expecting their patches to be reviewed and perhaps committed in
the forthcoming CommitFest?  I proposed a schedule for this release
that involved only three CommitFests and it was rejected, so it seems
a bit unfair to pull the rug out from under people at the eleventh
hour.  Will we lose developers if we do this?

One thing we can certainly do if we decide to still have the
CommitFest is try to shift more of the committing work from you to
other committers who are not involved in the HS/SR work; I can
volunteer myself.  That would hopefully free you up to spend more time
on HS/SR.  I can also try to handle more of the minor bug-fixes, like
the bit-substring bug.  I couldn't have fixed that as quickly as you
did, but I could have fixed it, and your time is more valuable than
mine.

Unfortunately, there are some patches that I probably will not feel
confident to commit without your input - in particular, writeable
CTEs, listen/notify, more frame options in window functions - and I
venture to say there may not be too many other takers either.  So
we're going to have to confront the question of whether it's fair to
make those people wait a year.  Maybe that is the right decision and
maybe it's not, but I want to make sure we are thinking about our
developer community as well as our user community, because without
them we are dead.

...Robert


Re: Streaming replication and postmaster signaling

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Jan 7, 2010 at 3:36 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> I am tempted to say we should clamp down and go into damage control
>>> mode sooner rather than later.
>> 
>> The more I see of the HS patch, the more I think the same.  But my
>> proposal for "damage control mode" would be to immediately punt
>> everything else to the next release and focus our energies exclusively
>> on HS *and* SR.

> Hmm.  There's something to what you say, but what about the people who
> were expecting their patches to be reviewed and perhaps committed in
> the forthcoming CommitFest?  I proposed a schedule for this release
> that involved only three CommitFests and it was rejected, so it seems
> a bit unfair to pull the rug out from under people at the eleventh
> hour.  Will we lose developers if we do this?

Well, I think we should put at least some effort into clearing out
the underbrush.  There's a lot of pretty small stuff in the January CF
list that I think we could go through in a short time.  The biggies,
IMO, are the ones you noted:

> Unfortunately, there are some patches that I probably will not feel
> confident to commit without your input - in particular, writeable
> CTEs, listen/notify, more frame options in window functions -

and Teodor's GIN/GIST stuff.  If we feel that we are in schedule
trouble I think that bumping these to the next release is by far
the sanest response.  Bumping SR so we can get these in would be
completely misguided.
        regards, tom lane


Re: Streaming replication and postmaster signaling

From
Dimitri Fontaine
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> Hmm.  There's something to what you say, but what about the people who
> were expecting their patches to be reviewed and perhaps committed in
> the forthcoming CommitFest?  I proposed a schedule for this release
> that involved only three CommitFests and it was rejected, so it seems
> a bit unfair to pull the rug out from under people at the eleventh
> hour.  Will we lose developers if we do this?

Well the RRR people will not be able to help much with SR, will we?

So I'm not sure about what you say, but running the commitfest as usual
seems entirely feasible while continuing the efforts on SR. Now, only
the last action item of the commitfest is to be spoken of, namely the
one we always struggle with: finding committer time to finish up the
work. I guess the 4 new committers will help, even if I guess Simon will
be exclusively focused on HS+SR issues and review.

> Unfortunately, there are some patches that I probably will not feel
> confident to commit without your input - in particular, writeable
> CTEs, listen/notify, more frame options in window functions - and I
> venture to say there may not be too many other takers either.  So
> we're going to have to confront the question of whether it's fair to
> make those people wait a year.  Maybe that is the right decision and
> maybe it's not, but I want to make sure we are thinking about our
> developer community as well as our user community, because without
> them we are dead.

What about asking for input from authors themselves? Like would you be
really upset if we had SR in 8.5, surely meaning lots of new users (and
development contract opportunities), at the cost of not being able to
properly review your work and postponing it to 8.6?

That's a hard attitude, but it's not clear to me how to avoid it, or
whether as a project there's a better way to face the issue. I've been
bitten by cultural issues before, so if you find this so harsh as to be
shocking, please accept my apologies; I'm not able to propose anything
better on the practical front.

Regards,
-- 
dim


Re: Streaming replication and postmaster signaling

From
Dimitri Fontaine
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:
> No, I don't think so.  HS without SR means you still have to fool with
> setting up WAL-file-based replication, which despite the existence of
> pg_standby is a PITA.  And you have to make a tradeoff of how often to
> flush WAL files to the standby.  To be a real candidate for "it just
> works" replication, we've *got* to have SR.

There are also walmgr.py from Skytools and pitrtools from CMD, both of
which are simpler to install and get working. In my view the big ticket
with SR would be the synchronous part for a full HA setup; without that I
guess walmgr+HS is plenty good enough. But as a project that still means
having to get an external piece of software to operate replication, so
that's still not "PostgreSQL 8.5 comes with replication support."

Regards,
-- 
dim


Re: Streaming replication and postmaster signaling

From
David Fetter
Date:
On Thu, Jan 07, 2010 at 03:32:28PM -0500, Tom Lane wrote:
> Magnus Hagander <magnus@hagander.net> writes:
> > On Thu, Jan 7, 2010 at 21:22, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> No, I don't think so.  HS without SR means you still have to fool
> >> with setting up WAL-file-based replication, which despite the
> >> existence of pg_standby is a PITA.  And you have to make a
> >> tradeoff of how often to flush WAL files to the standby.  To be a
> >> real candidate for "it just works" replication, we've *got* to
> >> have SR.
> >> have SR.
> 
> > Yes, but HS without SR certainly solves all the "need to offload
> > my reporting" kind of situations, which is still a very big thing.
> > Yes, it'll be much nicer with SR, but it will be *very* useful
> > without it as well.
> 
> [ shrug... ]  To me, HS+SR is actual replication, which would
> justify tagging this release 9.0.  With only one of them, it's 8.5.
> I understand that there are power users who would find HS alone to
> be tremendously useful, but in terms of what the average user sees,
> there's a quantum difference.

By, "quantum," do you mean, "the smallest possible?"

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: Streaming replication and postmaster signaling

From
Stefan Kaltenbrunner
Date:
Greg Sabino Mullane wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: RIPEMD160
> 
> 
>> while I agree that HS is very useful without SR, I think that it's
>> mostly the well known powerusers in the community who are actively waiting
>> for HS and not so much for SR. For the typical user outside of -hackers
>> or even -general I'm not so sure about that...
> 
> Well, I can state that we have plenty of clients that would be very
> interested in HS, but none that would really care if it came without
> SR. This power user knows a lot of people outside of -hackers and
> - -general and they are what I'm basing my opinion on. :)

That is more or less my point, but maybe I phrased it badly :) Real
database powerusers are mostly interested in HS. The people that are
coming from a different background or maybe MySQL are likely going to
find something that requires doing a base backup, implementing WAL
shipping and installing pg_standby (or some other custom scripts) awkward...


Stefan


Re: Streaming replication and postmaster signaling

From
Alvaro Herrera
Date:
David Fetter wrote:
> On Thu, Jan 07, 2010 at 03:32:28PM -0500, Tom Lane wrote:

> > [ shrug... ]  To me, HS+SR is actual replication, which would
> > justify tagging this release 9.0.  With only one of them, it's 8.5.
> > I understand that there are power users who would find HS alone to
> > be tremendously useful, but in terms of what the average user sees,
> > there's a quantum difference.
> 
> By, "quantum," do you mean, "the smallest possible?"

When you're an electron, a quantum leap is enormous ...

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.