Thread: alpha3 release schedule?

alpha3 release schedule?

From
Peter Eisentraut
Date:
Do people want more time to play with hot standby?  Otherwise alpha3
should go out on Monday or Tuesday.



Re: alpha3 release schedule?

From
Hiroyuki Yamada
Date:
>Do people want more time to play with hot standby?  Otherwise alpha3
>should go out on Monday or Tuesday.
>

Well, I would like to know whether the problem I referred to
in http://archives.postgresql.org/pgsql-hackers/2009-12/msg01641.php
is a must-fix or not.

This problem is a corollary of the deadlock problem. It is less catastrophic
but more likely to happen.

If this problem is left unfixed then, for example, any long-running
transaction holding a cursor on any table could freeze the whole
recovery work on the Hot Standby node until that transaction commits.


regards,

-- Hiroyuki YAMADA Kokolink Corporation yamada@kokolink.net


Re: alpha3 release schedule?

From
Robert Haas
Date:
On Sat, Dec 19, 2009 at 7:20 AM, Peter Eisentraut <peter_e@gmx.net> wrote:
> Do people want more time to play with hot standby?  Otherwise alpha3
> should go out on Monday or Tuesday.

I think we should try to wrap it promptly.  It's true that Hot Standby
almost certainly has bugs and/or annoying limitations, as one would
expect with a feature of this magnitude, but I think we'll get a
better idea what they are and which ones are the most important by
getting something out there for people to test.  AIUI, the reason why
Simon has been busting ass to get this committed is precisely so that
it could go into alpha3 and get more testing, and speaking in my
capacity as a guy who is anal about the schedule, I couldn't be
happier about that! Postponing alpha3 would seem to defeat the purpose
of all that hard work.

...Robert


Re: alpha3 release schedule?

From
Tom Lane
Date:
Hiroyuki Yamada <yamada@kokolink.net> writes:
> Well, I would like to know whether the problem I referred to
> in http://archives.postgresql.org/pgsql-hackers/2009-12/msg01641.php
> is a must-fix or not.

> This problem is a corollary of the deadlock problem. It is less catastrophic
> but more likely to happen.

> If this problem is left unfixed then, for example, any long-running
> transaction holding a cursor on any table could freeze the whole
> recovery work on the Hot Standby node until that transaction commits.

Seems like something we should fix ASAP, but I do not see why it need
hold up an alpha release.  Alpha releases are expected to have bugs,
and this one doesn't look like it would stop people from finding
other bugs.
        regards, tom lane


Re: alpha3 release schedule?

From
Stefan Kaltenbrunner
Date:
Tom Lane wrote:
> Hiroyuki Yamada <yamada@kokolink.net> writes:
>> Well, I would like to know whether the problem I referred to
>> in http://archives.postgresql.org/pgsql-hackers/2009-12/msg01641.php
>> is a must-fix or not.
> 
>> This problem is a corollary of the deadlock problem. It is less catastrophic
>> but more likely to happen.
> 
>> If this problem is left unfixed then, for example, any long-running
>> transaction holding a cursor on any table could freeze the whole
>> recovery work on the Hot Standby node until that transaction commits.
> 
> Seems like something we should fix ASAP, but I do not see why it need
> hold up an alpha release.  Alpha releases are expected to have bugs,
> and this one doesn't look like it would stop people from finding
> other bugs.

yeah afaik alpha tarballs are a form of checkpoint at the end of a
commitfest to get people a reasonable testing target. Every feature (not
only HS) deserves serious testing, so I vote for getting alpha3
out as soon as possible.


Stefan


Re: alpha3 release schedule?

From
Devrim GÜNDÜZ
Date:
On Sat, 2009-12-19 at 18:12 +0100, Stefan Kaltenbrunner wrote:
> > Seems like something we should fix ASAP, but I do not see why it need
> > hold up an alpha release.  Alpha releases are expected to have bugs,
> > and this one doesn't look like it would stop people from finding
> > other bugs.
>
> yeah afaik alpha tarballs are a form of checkpoint at the end of a
> commitfest to get people a reasonable testing target. Every feature (not
> only HS) deserves serious testing, so I vote for getting alpha3
> out as soon as possible.

+1 for both.
--
Devrim GÜNDÜZ, RHCE
Command Prompt - http://www.CommandPrompt.com
devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
http://www.gunduz.org  Twitter: http://twitter.com/devrimgunduz

Re: alpha3 release schedule?

From
Hiroyuki Yamada
Date:
>Hiroyuki Yamada <yamada@kokolink.net> writes:
>> Well, I would like to know whether the problem I referred to
>> in http://archives.postgresql.org/pgsql-hackers/2009-12/msg01641.php
>> is a must-fix or not.
>
>> This problem is a corollary of the deadlock problem. It is less catastrophic
>> but more likely to happen.
>
>> If this problem is left unfixed then, for example, any long-running
>> transaction holding a cursor on any table could freeze the whole
>> recovery work on the Hot Standby node until that transaction commits.
>
>Seems like something we should fix ASAP, but I do not see why it need
>hold up an alpha release.  Alpha releases are expected to have bugs,
>and this one doesn't look like it would stop people from finding
>other bugs.
>

At the beginning of this commit fest, Heikki said in
http://archives.postgresql.org/pgsql-hackers/2009-11/msg00914.php

>Of course there should be several phases! We've *already* punted a lot
>of stuff from this first increment we're currently working on. The
>criteria for getting this first phase committed is: could we release
>with no further changes?

And other patches seem to be checked against similar criteria, as far as
I can tell from the mails on this list. So I wanted to know whether the
problem is a must-fix, and if it is, why the criteria have been changed
during the commit fest.

Anyway, thanks for answering my question.


regards,

-- Hiroyuki YAMADA Kokolink Corporation yamada@kokolink.net


Re: alpha3 release schedule?

From
Heikki Linnakangas
Date:
Hiroyuki Yamada wrote:
>> Hiroyuki Yamada <yamada@kokolink.net> writes:
>>> Well, I would like to know whether the problem I referred to
>>> in http://archives.postgresql.org/pgsql-hackers/2009-12/msg01641.php
>>> is a must-fix or not.
>>> This problem is a corollary of the deadlock problem. It is less catastrophic
>>> but more likely to happen.
>>> If this problem is left unfixed then, for example, any long-running
>>> transaction holding a cursor on any table could freeze the whole
>>> recovery work on the Hot Standby node until that transaction commits.
>> Seems like something we should fix ASAP, but I do not see why it need
>> hold up an alpha release.  Alpha releases are expected to have bugs,
>> and this one doesn't look like it would stop people from finding
>> other bugs.
> 
> At the beginning of this commit fest, Heikki said in
> http://archives.postgresql.org/pgsql-hackers/2009-11/msg00914.php
> 
>> Of course there should be several phases! We've *already* punted a lot
>> of stuff from this first increment we're currently working on. The
>> criteria for getting this first phase committed is: could we release
>> with no further changes?
> 
> And other patches seem to be checked against similar criteria, as far as
> I can tell from the mails on this list. So I wanted to know whether the
> problem is a must-fix, and if it is, why the criteria have been changed
> during the commit fest.

Well, that was the criterion I used to decide whether to commit or not.
Not everyone agreed to begin with, and the reason I used that criterion
was a selfish one: I didn't want to be forced to fix loose ends after
the commitfest myself. The big reason for that was that I didn't know
how much time I would have for that. I have no complaints about Simon's
commit. Knowing that I'm not on the hook to close the loose ends, I'm
very happy that it's finally in. (That doesn't mean that I'll stop
paying attention to this patch; I will do as much as I have time to.)

Regarding the bugs you found, I put them on the TODO list at
https://wiki.postgresql.org/wiki/Hot_Standby_TODO, under the must-fix
category. I think they need to be fixed before final release, but
there's no need to delay the alpha release for them.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: alpha3 release schedule?

From
Hiroyuki Yamada
Date:
>Well, that was the criterion I used to decide whether to commit or not.
>Not everyone agreed to begin with, and the reason I used that criterion
>was a selfish one: I didn't want to be forced to fix loose ends after
>the commitfest myself. The big reason for that was that I didn't know
>how much time I would have for that. I have no complaints about Simon's
>commit. Knowing that I'm not on the hook to close the loose ends, I'm
>very happy that it's finally in. (That doesn't mean that I'll stop
>paying attention to this patch; I will do as much as I have time to.)
>
>Regarding the bugs you found, I put them on the TODO list at
>https://wiki.postgresql.org/wiki/Hot_Standby_TODO, under the must-fix
>category. I think they need to be fixed before final release, but
>there's no need to delay the alpha release for them.
>

I don't think it's selfish at all. But I see. Thanks for your kind reply.


regards,

-- Hiroyuki YAMADA Kokolink Corporation yamada@kokolink.net


Re: alpha3 release schedule?

From
Simon Riggs
Date:
On Sat, 2009-12-19 at 20:59 +0200, Heikki Linnakangas wrote:

> Well, that was the criterion I used to decide whether to commit or not.
> Not everyone agreed to begin with, and the reason I used that criterion
> was a selfish one: I didn't want to be forced to fix loose ends after
> the commitfest myself. The big reason for that was that I didn't know
> how much time I would have for that. I have no complaints about Simon's
> commit. Knowing that I'm not on the hook to close the loose ends, I'm
> very happy that it's finally in. (That doesn't mean that I'll stop
> paying attention to this patch; I will do as much as I have time to.)
> 
> Regarding the bugs you found, I put them on the TODO list at
> https://wiki.postgresql.org/wiki/Hot_Standby_TODO, under the must-fix
> category. I think they need to be fixed before final release, but
> there's no need to delay the alpha release for them.

Hmmm, well, if you are still paying attention you'll know that neither
of those issues are bugs in the code that was committed. One of them has
been fixed and the deadlock problem has a workaround applied. 

That workaround, err... works, but I accept it's not ideal. But then a
few things are not ideal and it does seem unlikely that every one of
them will have a perfect fix in the next two months.

I'll change the TODO page.

-- Simon Riggs           www.2ndQuadrant.com



Re: alpha3 release schedule?

From
Simon Riggs
Date:
On Sat, 2009-12-19 at 14:20 +0200, Peter Eisentraut wrote:

> Do people want more time to play with hot standby?  Otherwise alpha3
> should go out on Monday or Tuesday.

No thanks. There were no known bugs in the code I committed, excepting
the need to address VACUUM FULL. That will take longer than two days and
isn't sufficient reason to halt Alpha3, IMHO.

If others wish to revoke, then maybe we should consider a specific Alpha
version just for Hot Standby.

-- Simon Riggs           www.2ndQuadrant.com



Re: alpha3 release schedule?

From
Simon Riggs
Date:
On Sat, 2009-12-19 at 20:59 +0200, Heikki Linnakangas wrote:
> I put them on the TODO list at
> https://wiki.postgresql.org/wiki/Hot_Standby_TODO, under the must-fix
> category.

I notice you also re-arranged other items on there, specifically the
notion that starting from a shutdown checkpoint is somehow important.
It's definitely not any kind of bug.

We've discussed this on-list and I've requested that you justify this.
So far, nothing you've said on that issue has been at all convincing for
me or others. The topic is already mentioned on the HS todo, since if
one person requests something we should track that, just in case others
eventually agree. But having said that, it clearly isn't a priority, so
rearranging the item like that was not appropriate, unless you were
thinking of doing it yourself, though that wasn't marked.

-- Simon Riggs           www.2ndQuadrant.com



Re: alpha3 release schedule?

From
Simon Riggs
Date:
On Sat, 2009-12-19 at 23:22 +0900, Hiroyuki Yamada wrote:
> >Do people want more time to play with hot standby?  Otherwise alpha3
> >should go out on Monday or Tuesday.
> >
> 
> Well, I would like to know whether the problem I referred to
> in http://archives.postgresql.org/pgsql-hackers/2009-12/msg01641.php
> is a must-fix or not.
> 
> This problem is a corollary of the deadlock problem. It is less catastrophic
> but more likely to happen.
> 
> If this problem is left unfixed then, for example, any long-running
> transaction holding a cursor on any table could freeze the whole
> recovery work on the Hot Standby node until that transaction commits.

You seem very insistent on bringing up problems just before release.
Almost as if you have a reason to back some technology other than
this one.

The problem you mention here has been documented and very accessible for
months and not a single person mentioned it up to now. What's more, the
equivalent problem happens in the latest production version of Postgres
- users can delay VACUUM endlessly in just the same way, yet I've not
seen this raised as an issue in many years of using Postgres. Similarly,
there are some ways that Postgres can deadlock that it need not, yet
those negative behaviours are accepted and nobody is rushing to fix
them, nor demanding that they should be. Few things are theoretically
perfect on their first release.

-- Simon Riggs           www.2ndQuadrant.com



Re: alpha3 release schedule?

From
Robert Haas
Date:
On Sun, Dec 20, 2009 at 3:42 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Sat, 2009-12-19 at 20:59 +0200, Heikki Linnakangas wrote:
>> I put them on the TODO list at
>> https://wiki.postgresql.org/wiki/Hot_Standby_TODO, under the must-fix
>> category.
>
> I notice you also re-arranged other items on there, specifically the
> notion that starting from a shutdown checkpoint is somehow important.
> It's definitely not any kind of bug.
>
> We've discussed this on-list and I've requested that you justify this.
> So far, nothing you've said on that issue has been at all convincing for
> me or others. The topic is already mentioned on the HS todo, since if
> one person requests something we should track that, just in case others
> eventually agree. But having said that, it clearly isn't a priority, so
> rearranging the item like that was not appropriate, unless you were
> thinking of doing it yourself, though that wasn't marked.

This doesn't match my recollection of the previous discussion on this
topic.  I am not sure that I'd call it a bug, but I'd definitely like
to see it fixed, and I think I mentioned that previously, though I
don't have the email in front of me ATM.  I am also not aware that anyone
other than yourself has opined that we should not worry about fixing
it, although I might be wrong about that too.  At any rate, "clearly
not a priority" seems like an overstatement relative to my memory of
that conversation.

...Robert


Re: alpha3 release schedule?

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> On Sat, 2009-12-19 at 20:59 +0200, Heikki Linnakangas wrote:
>> I put them on the TODO list at
>> https://wiki.postgresql.org/wiki/Hot_Standby_TODO, under the must-fix
>> category.
> 
> I notice you also re-arranged other items on there, specifically the
> notion that starting from a shutdown checkpoint is somehow important.

I didn't rearrange anything. I added that item because it was missing.

Yes, it is important in my opinion.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: alpha3 release schedule?

From
Hiroyuki Yamada
Date:
>The problem you mention here has been documented and very accessible for
>months and not a single person mentioned it up to now. What's more, the
>equivalent problem happens in the latest production version of Postgres
>- users can delay VACUUM endlessly in just the same way, yet I've not
>seen this raised as an issue in many years of using Postgres. Similarly,
>there are some ways that Postgres can deadlock that it need not, yet
>those negative behaviours are accepted and nobody is rushing to fix
>them, nor demanding that they should be. Few things are theoretically
>perfect on their first release.
>


First of all, sorry for annoying you.

Well, this is certainly a well-known problem, but the cursor example
(or the deadlock example) reveals that the problem is more severe than
was previously thought, I guess.


The following comments in backup.sgml (which have now been replaced by the deadlock example)

>       Waits for buffer cleanup locks do not currently result in query
>       cancellation. Long waits are uncommon, though can happen in some cases
>       with long running nested loop joins.

...referred only to the case where the startup process has to wait
until the end of a single query, and assumed that long waits are uncommon.

The cursor example shows, however, that the waits can last as long as a
whole transaction, and can occur in ordinary use cases. FYI, I described a
typical freeze scenario in a mail posted to the original deadlock-example thread.

So the startup process may have to wait until the end of a transaction,
and we cannot predict when the pin-holding transaction will end.


Also, you mentioned the VACUUM case in the production version, but the
following two problems have different impacts:

* One VACUUM process freezes until the end of a certain transaction.
* The startup process (and the whole recovery work) freezes until the end
  of a certain transaction.

The startup process is the last process that should ever freeze, so I guess
this problem may need to be a must-fix.


Anyway, the patch has been committed and alpha3 is about to be released.
Do you think this problem is a must-fix for the final release?


regards,

-- Hiroyuki YAMADA Kokolink Corporation yamada@kokolink.net


Re: alpha3 release schedule?

From
Simon Riggs
Date:
On Mon, 2009-12-21 at 18:42 +0900, Hiroyuki Yamada wrote:

> Do you think this problem is a must-fix for the final release?

We should be clear that this is a behaviour I told you about, not a
shock discovery by yourself. There is no permanent freeze, just a wait,
from which the Startup process wakes up at the appropriate time. There
is no crash or hang as is usually implied by the word freeze.

It remains to be seen whether this is a priority for usability
enhancement in this release. There are other issues as well and it is
doubtful that every user will be fully happy with the functionality in
this release. I will work on things in the order in which I understand
them to be important for the majority, given my time and budget
constraints and the resolvability of the issues.

When you report bugs, I say thanks. When you start agitating about
already-documented restrictions and I see which other software you
promote, I think you may have other motives. Regrettably that reduces
the weight I give your claims, in relation to other potential users.

If you genuinely care about this topic then I hope and expect that you
would start thinking about improvements, or even writing some.

I am already in touch with many potential users and will be engaging
more widely to understand users' reactions to the Alpha release.

-- Simon Riggs           www.2ndQuadrant.com



Re: alpha3 release schedule?

From
Simon Riggs
Date:
On Sun, 2009-12-20 at 19:11 -0500, Robert Haas wrote:
> On Sun, Dec 20, 2009 at 3:42 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> > On Sat, 2009-12-19 at 20:59 +0200, Heikki Linnakangas wrote:
> >> I put them on the TODO list at
> >> https://wiki.postgresql.org/wiki/Hot_Standby_TODO, under the must-fix
> >> category.
> >
> > I notice you also re-arranged other items on there, specifically the
> > notion that starting from a shutdown checkpoint is somehow important.
> > It's definitely not any kind of bug.
> >
> > We've discussed this on-list and I've requested that you justify this.
> > So far, nothing you've said on that issue has been at all convincing for
> > me or others. The topic is already mentioned on the HS todo, since if
> > one person requests something we should track that, just in case others
> > eventually agree. But having said that, it clearly isn't a priority, so
> > rearranging the item like that was not appropriate, unless you were
> > thinking of doing it yourself, though that wasn't marked.
> 
> This doesn't match my recollection of the previous discussion on this
> topic.  I am not sure that I'd call it a bug, but I'd definitely like
> to see it fixed, and I think I mentioned that previously, though I
> don't have the email in front of me ATM.  I am also not aware that anyone
> other than yourself has opined that we should not worry about fixing
> it, although I might be wrong about that too.  At any rate, "clearly
> not a priority" seems like an overstatement relative to my memory of
> that conversation.

Please check the thread then. Nobody but me has "opined that we should
not worry about fixing it", but then nobody else other than Heikki has
suggested it is even a feature worthy of inclusion, ever. One person
agreed with my position, nobody has spoken in favour of Heikki's
position. However, I had already included the feature on the todo; it
was further down the todo before a second copy was added, second copy
now removed.

If you are saying being able to start Hot Standby from a shutdown
checkpoint is an important feature for you, then say so, and why.

Please also be careful that you don't mix this up with other
improvements, nor say "they all need fixing". This isn't a general
discussion on those points. There are other important things.

-- Simon Riggs           www.2ndQuadrant.com



Re: alpha3 release schedule?

From
Florian Pflug
Date:
On 22.12.09 9:34 , Simon Riggs wrote:
> If you are saying being able to start Hot Standby from a shutdown
> checkpoint is an important feature for you, then say so, and why.

I think it's not so much an important feature but more the removal of a
footgun.

Imagine a reporting database where all transactions but a few daily bulk
imports are read-only. To spread the load, you do your bulk loads on the
master, but run the reporting queries against a read-only HS slave. Now
you take the master down for maintenance. Since all clients but the bulk
loader use the slave already, and since the bulk loads can be deferred
until after the maintenance window closes again, you don't actually do a
fail-over.

Now you're already pointing at your foot with the gun. All it takes to
ruin your day is *some* reason for the slave to restart. Maybe due to a
junior DBA's typo, or maybe due to a bug in postgres. Anyway, once the
slave is down, it won't come up until you manage to get the master up
and running again. And this limitation is pretty surprising, since one
would assume that if the slave survives a *crash* of the master, it'd
certainly survive a simple *shutdown*.

best regards,
Florian Pflug


Re: alpha3 release schedule?

From
Greg Stark
Date:
On Tue, Dec 22, 2009 at 8:34 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> If you are saying being able to start Hot Standby from a shutdown
> checkpoint is an important feature for you, then say so, and why.

Can you explain the consequences of missing this? It sounds to me like
if I lose my master and it happened to be while it was shut down for
whatever reason then I'll be stuck and won't be able to use my
standby. If that's true it seems like it's a major problem. Or does it
just mean I would have to follow a different procedure when failing
over?

I'm not sure if it's relevant but one thing to realize is that a lot
of MySQL people are used to doing failovers to do regular maintenance
tasks like creating indexes or making schema changes. Besides, a lot of
sites build in regular failovers to ensure that their failover
procedure works. In both cases they usually want to do a clean shut
down of the master to ensure they don't lose any transactions during
the failover.

-- 
greg


Re: alpha3 release schedule?

From
Simon Riggs
Date:
On Tue, 2009-12-22 at 12:32 +0100, Florian Pflug wrote:
> On 22.12.09 9:34 , Simon Riggs wrote:
> > If you are saying being able to start Hot Standby from a shutdown
> > checkpoint is an important feature for you, then say so, and why.
> 
> I think it's not so much an important feature but more the removal of a
> footgun.
> 
> Imagine a reporting database where all transactions but a few daily bulk
> imports are read-only. To spread the load, you do your bulk loads on the
> master, but run the reporting queries against a read-only HS slave. Now
> you take the master down for maintenance. Since all clients but the bulk
> loader use the slave already, and since the bulk loads can be deferred
> until after the maintenance window closes again, you don't actually do a
> fail-over.
> 
> Now you're already pointing at your foot with the gun. All it takes to
> ruin your day is *some* reason for the slave to restart. Maybe due to a
> junior DBA's typo, or maybe due to a bug in postgres. Anyway, once the
> slave is down, it won't come up until you manage to get the master up
> and running again. And this limitation is pretty surprising, since one
> would assume that if the slave survives a *crash* of the master, it'd
> certainly survive a simple *shutdown*.

Well, you either wait for master to come up again and restart, or you
flip into normal mode and keep running queries from there. You aren't
prevented from using the server, except by your own refusal to failover.

That's not enough for me to raise the priority for this feature.

But it was already on the list and remains there now. If someone does
add this, it will require careful thought about how to avoid introducing
further subtle ways to break HS, all of which will need testing and
re-testing to avoid regression.

So I'm not personally going to be working on it, for this release and
likely the next also, nor will I encourage others to do so, for anyone
looking to assist. There are more important things for us to do, IMHO.

-- Simon Riggs           www.2ndQuadrant.com



Re: alpha3 release schedule?

From
Simon Riggs
Date:
On Tue, 2009-12-22 at 11:41 +0000, Greg Stark wrote:
> On Tue, Dec 22, 2009 at 8:34 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> > If you are saying being able to start Hot Standby from a shutdown
> > checkpoint is an important feature for you, then say so, and why.
> 
> Can you explain the consequences of missing this? It sounds to me like
> if I lose my master and it happened to be while it was shut down for
> whatever reason then I'll be stuck and won't be able to use my
> standby. If that's true it seems like it's a major problem. Or does it
> just mean I would have to follow a different procedure when failing
> over?

Failover isn't prevented in this case.

If we were going to spend time on anything it would be to make failover
and switchback easier so that people aren't afraid of it. I've spent a
few weeks trying to remove the shutdown checkpoint, but no luck so far.

Switchback optimization is probably something for next release now,
unless you're looking for a project?

-- Simon Riggs           www.2ndQuadrant.com



Re: alpha3 release schedule?

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> If someone does
> add this, it will require careful thought about how to avoid introducing
> further subtle ways to break HS, all of which will need testing and
> re-testing to avoid regression.

Well, I *did* add that, but you removed it...

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: alpha3 release schedule?

From
Simon Riggs
Date:
On Tue, 2009-12-22 at 16:09 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > If someone does
> > add this, it will require careful thought about how to avoid introducing
> > further subtle ways to break HS, all of which will need testing and
> > re-testing to avoid regression.
> 
> Well, I *did* add that, but you removed it...

It was already on there when you added the second one. It is still there
now, even after I removed the duplicate entry.

By "add" I meant to write the feature, test and then support it
afterwards, not to re-discuss editing the Wiki.

-- Simon Riggs           www.2ndQuadrant.com



Re: alpha3 release schedule?

From
Florian Pflug
Date:
On 22.12.09 13:21 , Simon Riggs wrote:
> On Tue, 2009-12-22 at 12:32 +0100, Florian Pflug wrote:
>> Imagine a reporting database where all transactions but a few daily
>> bulk imports are read-only. To spread the load, you do your bulk
>> loads on the master, but run the reporting queries against a
>> read-only HS slave. Now you take the master down for maintenance.
>> Since all clients but the bulk loader use the slave already, and
>> since the bulk loads can be deferred until after the maintenance
>> window closes again, you don't actually do a fail-over.
>>
>> Now you're already pointing at your foot with the gun. All it
>> takes to ruin your day is *some* reason for the slave to restart.
>> Maybe due to a junior DBA's typo, or maybe due to a bug in
>> postgres. Anyway, once the slave is down, it won't come up until you
>> manage to get the master up and running again. And this limitation
>> is pretty surprising, since one would assume that if the slave
>> survives a *crash* of the master, it'd certainly survive a simple
>> *shutdown*.
>
> Well, you either wait for master to come up again and restart, or you
> flip into normal mode and keep running queries from there. You aren't
> prevented from using the server, except by your own refusal to
> failover.

Very true. However, that "refusal" as you put it might actually be the
most sensible thing to do in a lot of setups. Not everyone needs extreme
up-time guarantees, and for those people setting up, testing and
*continuously* exercising fail-over is just not worth the effort.
Especially since fail-over with asynchronous replication is tricky to
get right if you want to avoid data loss.

So I still believe that there are very real use-cases for HS where this
limitation can be quite a PITA.

But you are of course free to work on whatever you feel like, and
probably need to satisfy your client's needs first. So I'm in no way
implying that this issue is a must-fix issue, or that you're in any way
obliged to take care of it. I merely wanted to make the point that there
*are* valid use-cases where this behavior is not ideal.

best regards,
Florian Pflug


Re: alpha3 release schedule?

From
Greg Stark
Date:
On Tue, Dec 22, 2009 at 3:32 PM, Florian Pflug <fgp.phlo.org@gmail.com> wrote:
>> Well, you either wait for master to come up again and restart, or you
>> flip into normal mode and keep running queries from there. You aren't
>> prevented from using the server, except by your own refusal to
>> failover.
>
> Very true. However, that "refusal" as you put it might actually be the
> most sensible thing to do in a lot of setups. Not everyone needs extreme
> up-time guarantees, and for those people setting up, testing and
> *continuously* exercising fail-over is just not worth the effort.
> Especially since fail-over with asynchronous replication is tricky to
> get right if you want to avoid data loss.

To say nothing of the fact that the replica might not be a suitable master at all.
It could be running on inferior hardware or be on a separate network
perhaps too slow to reach from production services.

HA is not the only use case for HS, or even the main one, in my experience.


-- 
greg


Re: alpha3 release schedule?

From
Simon Riggs
Date:
On Tue, 2009-12-22 at 16:32 +0100, Florian Pflug wrote:

> But you are of course free to work on whatever you feel like, and
> probably need to satisfy your client's needs first.

Alluding to me as whimsical or mercenary isn't likely to change my mind.

IMHO this isn't one of the more important features, for the majority, in
this release. I do intend to check that.

If there are people that believe otherwise, knock yourselves out.

-- Simon Riggs           www.2ndQuadrant.com



Re: alpha3 release schedule?

From
Simon Riggs
Date:
On Tue, 2009-12-22 at 15:38 +0000, Greg Stark wrote:
> On Tue, Dec 22, 2009 at 3:32 PM, Florian Pflug <fgp.phlo.org@gmail.com> wrote:
> >> Well, you either wait for master to come up again and restart, or you
> >> flip into normal mode and keep running queries from there. You aren't
> >> prevented from using the server, except by your own refusal to
> >> failover.
> >
> > Very true. However, that "refusal" as you put it might actually be the
> > most sensible thing to do in a lot of setups. Not everyone needs extreme
> > up-time guarantees, and for those people setting up, testing and
> > *continuously* exercising fail-over is just not worth the effort.
> > Especially since fail-over with asynchronous replication is tricky to
> > get right if you want to avoid data loss.
> 
> To say nothing of the fact that the replica might not be a suitable master at all.
> It could be running on inferior hardware or be on a separate network
> perhaps too slow to reach from production services.
> 
> HA is not the only use case for HS, or even the main one, in my experience.

I can invent scenarios in which all the outstanding issues give
problems. What I have to do is balance which of those is more likely and
which have useful workarounds. This is about priority and in particular,
my priority. IMHO my time would be misspent working on this issue,
though I will check that other users feel that way also.

-- Simon Riggs           www.2ndQuadrant.com



Re: alpha3 release schedule?

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> By "add" I meant to write the feature, test and then support it
> afterwards, not to re-discuss editing the Wiki.

That's exactly what I meant too. I *did* write the feature, but you
removed it before committing.

I can extract the removed parts from the git repository and send you as
a new patch for review, if you'd like.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: alpha3 release schedule?

From
Simon Riggs
Date:
On Tue, 2009-12-22 at 18:17 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > By "add" I meant to write the feature, test and then support it
> > afterwards, not to re-discuss editing the Wiki.
> 
> That's exactly what I meant too. I *did* write the feature, but you
> removed it before committing.

I removed it because you showed it wouldn't work. If you want to fix
that problem, test, commit and support it, go right ahead.

-- Simon Riggs           www.2ndQuadrant.com



Re: alpha3 release schedule?

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> On Tue, 2009-12-22 at 18:17 +0200, Heikki Linnakangas wrote:
>> Simon Riggs wrote:
>>> By "add" I meant to write the feature, test and then support it
>>> afterwards, not to re-discuss editing the Wiki.
>> That's exactly what I meant too. I *did* write the feature, but you
>> removed it before committing.
> 
> I removed it because you showed it wouldn't work.

I did?

I believe this is the discussion that led to your removing it (6th of
December, thread "Hot Standby, recent changes"):

Simon Riggs wrote:
> On Sun, 2009-12-06 at 12:32 +0200, Heikki Linnakangas wrote:
>> 4. Need to handle the case where master is started up with
>> wal_standby_info=true, shut down, and restarted with
>> wal_standby_info=false, while the standby server runs continuously. And
>> the code in StartupXLog() to initialize recovery snapshot from a
>> shutdown checkpoint needs to check that too.
> 
> I don't really understand the use case for shutting down the server and
> then using it as a HS base backup. Why would anyone do that? Why would
> they have their server down for potentially hours, when they can take
> the backup while the server is up? If the server is idle, it can be
> up-and-idle just as easily as down-and-idle, in which case we wouldn't
> need to support this at all. Adding yards of code for this capability
> isn't important to me. I'd rather strip the whole lot out than keep
> fiddling with a low priority area. Please justify this as a real world
> solution before we continue trying to support it.

The issue I mentioned had nothing to do with starting from a shutdown
checkpoint - it's still a problem if you keep the standby running
through the restart cycle in the master - but maybe you thought it was?
Or was there something else?

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: alpha3 release schedule?

From
Simon Riggs
Date:
On Tue, 2009-12-22 at 18:40 +0200, Heikki Linnakangas wrote:

> The issue I mentioned had nothing to do with starting from a shutdown
> checkpoint - it's still a problem if you keep the standby running
> through the restart cycle in the master - but maybe you thought it was?
> Or was there something else?

Strangely enough that exact same problem already happens with
archive_mode, and we see a fix coming for that soon also.

That fix takes the same approach as HS already takes. HS will flip out
when it sees the next record (checkpoint). The only way out is to
re-take base backup, just the same.

Even after that fix is applied, HS will still work as well as
archive-mode, so if anything HS is ahead of other functionality.

Fixing obscure cases where people actively try to get past configuration
options is not a priority. I'm not sure why you see it as important,
especially when you've argued we don't even need the parameter in the
first place.

You've been perfectly happy for *years* with the situation that recovery
would fail if max_prepared_transactions was not set correctly. You're not
going to tell me you never noticed? Why is avoidance of obvious
misconfiguration of HS such a heavy priority when nothing else ever was?

I'm going to concentrate on fixing important issues. I'd rather you
helped with those.

-- Simon Riggs           www.2ndQuadrant.com



Re: alpha3 release schedule?

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> You've been perfectly happy for *years* with the situation that recovery
> would fail if max_prepared_transactions was not set correctly. You're not
> going to tell me you never noticed? Why is avoidance of obvious
> misconfiguration of HS such a heavy priority when nothing else ever was?

That's not a priority, and I never said it was.

It almost sounds like we're in violent agreement: this issue of
flipping wal_standby_info in the master has nothing to do with the
removal of the capability to start standby from a shutdown checkpoint.

So what *was* the reason? Was there something wrong with it? If not,
please put it back.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: alpha3 release schedule?

From
Bruce Momjian
Date:
Simon Riggs wrote:
> On Mon, 2009-12-21 at 18:42 +0900, Hiroyuki Yamada wrote:
> 
> > Do you think this problem is must-fix for the final release ?
> 
> We should be clear that this is a behaviour I told you about, not a
> shock discovery by yourself. There is no permanent freeze, just a wait,
> from which the Startup process wakes up at the appropriate time. There
> is no crash or hang as is usually implied by the word freeze.
> 
> It remains to be seen whether this is a priority for usability
> enhancement in this release. There are other issues as well and it is
> doubtful that every user will be fully happy with the functionality in
> this release. I will work on things in the order in which I understand
> them to be important for the majority, given my time and budget
> constraints and the resolvability of the issues.
> 
> When you report bugs, I say thanks. When you start agitating about
> already-documented restrictions and I see which other software you
> promote, I think you may have other motives. Regrettably that reduces
> the weight I give your claims, in relation to other potential users.

Simon, where did this come from?  "Other software?"  

I think Simon's comments are way off base here and only serve to
increase tension in this discussion.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: alpha3 release schedule?

From
Simon Riggs
Date:
On Tue, 2009-12-22 at 19:30 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > You've been perfectly happy for *years* with the situation that recovery
> > would fail if max_prepared_transactions was not set correctly. You're not
> > going to tell me you never noticed? Why is avoidance of obvious
> > misconfiguration of HS such a heavy priority when nothing else ever was?
> 
> That's not a priority, and I never said it was.
> 
> It almost sounds like we're in violent agreement: this issue of
> flipping wal_standby_info in the master has nothing to do with the
> removal of the capability to start standby from a shutdown checkpoint.

I removed the capability to start at shutdown checkpoints because you
said it would cause this bug. That gives two choices: fix the bug,
remove the feature. I don't think it is a priority to support that
feature, so I removed it in favour of other work.

I will work on issues in priority order and this was already on the list
and remains so. I don't have endless time, so realistically, given its
current priority it is unlikely to be addressed in this release.

-- Simon Riggs           www.2ndQuadrant.com



Re: alpha3 release schedule?

From
Florian Pflug
Date:
On 22.12.09 16:45 , Simon Riggs wrote:
> On Tue, 2009-12-22 at 16:32 +0100, Florian Pflug wrote:
>> But you are of course free to work on whatever you feel like, and
>> probably need to satisfy your client's needs first.
>
> Alluding to me as whimsical or mercenary isn't likely to change my
> mind.

Simon, you *completely* misunderstood my last paragraph!

I never intended to call you whimsical or mercenary, and I honestly
don't believe I did. The only thing I "alluded" to you was seeing HS
mostly as a solution for HA setups, whereas I felt that it has quite a
few use-cases besides that. Plus, that your view of what the important
use-cases are is influenced by the projects you usually work on, and
that it's perfectly reasonable for your priorities to reflect that view.

None of this was meant as an insult of any kind.

best regards,
Florian Pflug


Re: alpha3 release schedule?

From
Simon Riggs
Date:
On Tue, 2009-12-22 at 19:53 +0100, Florian Pflug wrote:

> None of this was meant as an insult of any kind.

Then I apologise completely.

I've clearly been working too hard and will retire for some rest (even
though that is not listed as a task on the Wiki).

-- Simon Riggs           www.2ndQuadrant.com



Re: alpha3 release schedule?

From
"David E. Wheeler"
Date:
On Dec 22, 2009, at 11:02 AM, Simon Riggs wrote:

> I've clearly been working too hard and will retire for some rest (even
> though that is not listed as a task on the Wiki).

Someone add it!

David


Re: alpha3 release schedule?

From
David Fetter
Date:
On Tue, Dec 22, 2009 at 11:04:29AM -0800, David Wheeler wrote:
> On Dec 22, 2009, at 11:02 AM, Simon Riggs wrote:
> 
> > I've clearly been working too hard and will retire for some rest (even
> > though that is not listed as a task on the Wiki).
> 
> Someone add it!

Done! :)

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate