Thread: Hot Standby: Relation-specific deferred conflict resolution

Hot Standby: Relation-specific deferred conflict resolution

From
Simon Riggs
Date:
Conflict resolution improvements are important to include in this
release, as discussed many times. Proposal given here
http://archives.postgresql.org/pgsql-hackers/2009-12/msg01175.php
presents a viable design to improve this.

Following patch is a complete working implementation of that design.
I'm still testing it, but its worth publishing as early as possible to
allow discussion. Not for commit, just yet, but soon.

standby.c changes are to decide whether to defer recovery based upon
relfilenode of WAL record. If resolution deferred, re-check for conflict
during LockAcquire() and fail with snapshot error, just as if resolution
had never been deferred. Also, an optimisation of conflict processing to
avoid continual re-evaluation of conflicts since some are now deferred.

API changes in heapam and nbtxlog to pass thru RelFileNode
API changes in indexcmds, no behaviour changes
procarray changes to implement LatestRemovedXid cache

 backend/access/heap/heapam.c    |    6 -
 backend/access/nbtree/nbtxlog.c |    2
 backend/commands/indexcmds.c    |    4 -
 backend/storage/ipc/procarray.c |   55 +++++++++++-
 backend/storage/ipc/standby.c   |  112 +++++++++++++++++++++++------
 backend/storage/lmgr/lock.c     |  124 ++++++++++++++++++++++++++++++--
 include/storage/lock.h          |    8 ++
 include/storage/proc.h          |    5 +
 include/storage/standby.h       |    9 ++
 9 files changed, 292 insertions(+), 33 deletions(-)

--
 Simon Riggs           www.2ndQuadrant.com

Attachment

Re: Hot Standby: Relation-specific deferred conflict resolution

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> Conflict resolution improvements are important to include in this
> release, as discussed many times. Proposal given here
> http://archives.postgresql.org/pgsql-hackers/2009-12/msg01175.php
> presents a viable design to improve this.
> 
> Following patch is a complete working implementation of that design.
> I'm still testing it, but its worth publishing as early as possible to
> allow discussion. Not for commit, just yet, but soon.

Um, you're not considering this for 9.0, are you? I think it's time to
concentrate on the must-fix issues and fix the rough edges in what we have.

For example, the "can't start hot standby mode from a shutdown
checkpoint" issue is a must-fix issue in my opinion, about 10x as
important as this. When that was last discussed, many others agreed. I
run into that all the time when testing streaming replication, and every
time I go "Huh, why isn't the standby opening up for connections?", and
then, "Ahh, it's this stupid shutdown checkpoint issue again".

And the VACUUM FULL issue is still hanging too. And maybe you could help
with some other things on the open items or commitfest list.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Hot Standby: Relation-specific deferred conflict resolution

From
Simon Riggs
Date:
On Fri, 2010-01-29 at 08:26 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > Conflict resolution improvements are important to include in this
> > release, as discussed many times. Proposal given here
> > http://archives.postgresql.org/pgsql-hackers/2009-12/msg01175.php
> > presents a viable design to improve this.
> > 
> > Following patch is a complete working implementation of that design.
> > I'm still testing it, but its worth publishing as early as possible to
> > allow discussion. Not for commit, just yet, but soon.
> 
> Um, you're not considering this for 9.0, are you? I think it's time to
> concentrate on the must-fix issues and fix the rough edges in what we have.

Yes, it is important.

> For example, the "can't start hot standby mode from a shutdown
> checkpoint" issue is a must-fix issue in my opinion, about 10x as
> important as this. When that was last discussed, many others agreed. I
> run into that all the time when testing streaming replication, and every
> time I go "Huh, why isn't the standby opening up for connections?", and
> then, "Ahh, it's this stupid shutdown checkpoint issue again".

That was not the feedback I have received. Nobody has commented on that
to me, though many have commented on the need for the current patch. As
mentioned, I went to the trouble of running a meeting to gain additional
feedback and the result was very clear.

Of course, if we ignore any feature, then someone will say "its that
stupid issue again", but that won't imply we got our priority wrong.

> And the VACUUM FULL issue is still hanging too. 

Yes, that is a must fix.

-- Simon Riggs           www.2ndQuadrant.com



Re: Hot Standby: Relation-specific deferred conflict resolution

From
Guillaume Smet
Date:
On Fri, Jan 29, 2010 at 9:03 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> That was not the feedback I have received. Nobody has commented on that
> to me, though many have commented on the need for the current patch. As
> mentioned, I went to the trouble of running a meeting to gain additional
> feedback and the result was very clear.

I don't have a technical opinion about this problem yet as I haven't
tested HS+SR yet but I'm not sure it's a good idea to base technical
decisions and priorities on user polls (I'm pretty sure most of them
don't use HS+SR as much as Heikki these days).
If you ask people what they want in their future cars, they won't
answer they want wheels or an engine: it's something obvious for them.
AFAICS (but I might be wrong), you asked this question to people who
are interested in HS+SR but don't have any idea of what it's like to
use HS+SR daily with or without this limitation.

There are perhaps better arguments for not doing it but this one seems
a bit weird to me.

-- 
Guillaume


Re: Hot Standby: Relation-specific deferred conflict resolution

From
Simon Riggs
Date:
On Fri, 2010-01-29 at 09:20 +0100, Guillaume Smet wrote:
> On Fri, Jan 29, 2010 at 9:03 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> > That was not the feedback I have received. Nobody has commented on that
> > to me, though many have commented on the need for the current patch. As
> > mentioned, I went to the trouble of running a meeting to gain additional
> > feedback and the result was very clear.
> 
> I don't have a technical opinion about this problem yet as I haven't
> tested HS+SR yet but I'm not sure it's a good idea to base technical
> decisions and priorities on user polls (I'm pretty sure most of them
> don't use HS+SR as much as Heikki these days).
> If you ask people what they want in their future cars, they won't
> answer they want wheels or an engine: it's something obvious for them.
> AFAICS (but I might be wrong), you asked this question to people who
> are interested in HS+SR but don't have any idea of what it's like to
> use HS+SR daily with or without this limitation.

Well, you are correct that a larger group of users *could* have avoided
an obvious and important issue. Though if you deploy that argument it
can be applied both ways: Heikki may also be missing an obvious and
important issue. Where does that leave us?

I am not against putting both into this release. If I am forced to
choose just one, I've at least given reasons why that should be so.

-- Simon Riggs           www.2ndQuadrant.com



Re: Hot Standby: Relation-specific deferred conflict resolution

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> On Fri, 2010-01-29 at 08:26 +0200, Heikki Linnakangas wrote:
>> Simon Riggs wrote:
>>> Conflict resolution improvements are important to include in this
>>> release, as discussed many times. Proposal given here
>>> http://archives.postgresql.org/pgsql-hackers/2009-12/msg01175.php
>>> presents a viable design to improve this.
>>>
>>> Following patch is a complete working implementation of that design.
>>> I'm still testing it, but its worth publishing as early as possible to
>>> allow discussion. Not for commit, just yet, but soon.
>> Um, you're not considering this for 9.0, are you? I think it's time to
>> concentrate on the must-fix issues and fix the rough edges in what we have.
> 
> Yes, it is important.
> 
>> For example, the "can't start hot standby mode from a shutdown
>> checkpoint" issue is a must-fix issue in my opinion, about 10x as
>> important as this. When that was last discussed, many others agreed. I
>> run into that all the time when testing streaming replication, and every
>> time I go "Huh, why isn't the standby opening up for connections?", and
>> then, "Ahh, it's this stupid shutdown checkpoint issue again".
> 
> That was not the feedback I have received. Nobody has commented on that
> to me, 

Yes they have. I have on several occasions, as have other people on this
mailing list:

http://archives.postgresql.org/message-id/603c8f070912201611h4951087craa080ff6b48a97cd@mail.gmail.com
http://archives.postgresql.org/message-id/4B30AE53.6020202@gmail.com
http://archives.postgresql.org/message-id/407d949e0912220738je1e0141m87d7b688dd4ba27f@mail.gmail.com

I even *fixed* that already, but you decided to take it out before
committing. I then added it to the list of must-fix items in the TODO
list, but you took that out too. I have no objection to doing things in
smaller steps, but this *is* a must-fix item before release. I still
don't understand why you took it out, nor why you're objecting to that.

> though many have commented on the need for the current patch.

Who?

>  As
> mentioned, I went to the trouble of running a meeting to gain additional
> feedback and the result was very clear.

So what was the clear result?

If you're looking for things to do, I agree with Greg Stark that the
removal of max_standby_delay=-1 option is not good. That should be fixed
too.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Hot Standby: Relation-specific deferred conflict resolution

From
Simon Riggs
Date:
On Fri, 2010-01-29 at 11:33 +0200, Heikki Linnakangas wrote:

> So what was the clear result?

I have spoken clearly enough. You were welcome to attend the Hot Standby
User Group. The fact that you did not expresses your own priorities
quite well, ISTM. Your protestations to know more about the wishes of
users than they do themselves isn't hugely impressive.

There are many features we should add. I will add them in priority order
until forced to stop.

If you or anyone else believes features are essential, then either add
them yourselves or have the courage to stand up and say the release
should be delayed so that I can. To do otherwise is to admit you do not
actually consider them essential. It cannot be both ways.

-- Simon Riggs           www.2ndQuadrant.com



Re: Hot Standby: Relation-specific deferred conflict resolution

From
Stefan Kaltenbrunner
Date:
Heikki Linnakangas wrote:
> Simon Riggs wrote:
>> Conflict resolution improvements are important to include in this
>> release, as discussed many times. Proposal given here
>> http://archives.postgresql.org/pgsql-hackers/2009-12/msg01175.php
>> presents a viable design to improve this.
>>
>> Following patch is a complete working implementation of that design.
>> I'm still testing it, but its worth publishing as early as possible to
>> allow discussion. Not for commit, just yet, but soon.
> 
> Um, you're not considering this for 9.0, are you? I think it's time to
> concentrate on the must-fix issues and fix the rough edges in what we have.
 I agree  - this looks like a completely new feature development to me 
that tries to move the goalpost quite far for 9.0.

> 
> For example, the "can't start hot standby mode from a shutdown
> checkpoint" issue is a must-fix issue in my opinion, about 10x as
> important as this. When that was last discussed, many others agreed. I
> run into that all the time when testing streaming replication, and every
> time I go "Huh, why isn't the standby opening up for connections?", and
> then, "Ahh, it's this stupid shutdown checkpoint issue again".

Yeah I ran into that one during testing as well - and I consider it a 
serious issue

> 
> And the VACUUM FULL issue is still hanging too. And maybe you could help
> with some other things on the open items or commitfest list.

yeah and we keep finding major bugs nearly daily so I don't think we 
should add features that way now.
First is stability and reliability, optimization and new features are 
imho clearly 9.1+ material. Just calling the release 9.0 and saying "we 
do that because it is a radical change and we expect some instability" 
should NOT mean we are free to put in every feature at the last minute 
with "yeah thats fine because people expect instability anyways".


Stefan


Re: Hot Standby: Relation-specific deferred conflict resolution

From
Stefan Kaltenbrunner
Date:
Simon Riggs wrote:
> On Fri, 2010-01-29 at 11:33 +0200, Heikki Linnakangas wrote:
> 
>> So what was the clear result?
> 
> I have spoken clearly enough. You were welcome to attend the Hot Standby
> User Group. The fact that you did not expresses your own priorities
> quite well, ISTM. Your protestations to know more about the wishes of
> users than they do themselves isn't hugely impressive.

huh? traditionally discussions of that kind had to happen on -hackers 
and not in some online place some unnamed people attended.

> 
> There are many features we should add. I will add them in priority order
> until forced to stop.

we are past the point of adding new features for 9.0 imho

> 
> If you or anyone else believes features are essential, then either add
> them yourselves or have the courage to stand up and say the release
> should be delayed so that I can. To do otherwise is to admit you do not
> actually consider them essential. It cannot be both ways.

bugfix and stabilization mode is what we are in now (except for the 
stuff that already made it into the commitfest).


Stefan


Re: Hot Standby: Relation-specific deferred conflict resolution

From
Simon Riggs
Date:
On Fri, 2010-01-29 at 11:10 +0100, Stefan Kaltenbrunner wrote:

> yeah and we keep finding major bugs nearly daily

Facts, please?

-- Simon Riggs           www.2ndQuadrant.com



Re: Hot Standby: Relation-specific deferred conflict resolution

From
Magnus Hagander
Date:
2010/1/29 Stefan Kaltenbrunner <stefan@kaltenbrunner.cc>:
> Simon Riggs wrote:
>>
>> On Fri, 2010-01-29 at 11:33 +0200, Heikki Linnakangas wrote:
>>
>>> So what was the clear result?
>>
>> I have spoken clearly enough. You were welcome to attend the Hot Standby
>> User Group. The fact that you did not expresses your own priorities
>> quite well, ISTM. Your protestations to know more about the wishes of
>> users than they do themselves isn't hugely impressive.
>
> huh? traditionally discussions of that kind had to happen on -hackers and not in some online place some unnamed
peopleattended.
 

+1.

Haven't we had enough communications failures with off-hackers groups
trying to come up with something only to have it not be agreed upon by
hackers later on?

(win32 would be the biggest thing so far, but it's not like we haven't
done it before in more cases)



>> There are many features we should add. I will add them in priority order
>> until forced to stop.
>
> we are past the point of adding new features for 9.0 imho
>
>>
>> If you or anyone else believes features are essential, then either add
>> them yourselves or have the courage to stand up and say the release
>> should be delayed so that I can. To do otherwise is to admit you do not
>> actually consider them essential. It cannot be both ways.
>
> bugfix and stabilization mode is what we are in now (except for the stuff that already made it into the commitfest).

Well, per some recent discussions, it seems small features are still
ok. But I doubt this qualifies as such.


-- Magnus HaganderMe: http://www.hagander.net/Work: http://www.redpill-linpro.com/


Re: Hot Standby: Relation-specific deferred conflict resolution

From
Simon Riggs
Date:
On Fri, 2010-01-29 at 11:12 +0100, Stefan Kaltenbrunner wrote:

> > 
> > There are many features we should add. I will add them in priority order
> > until forced to stop.
> 
> we are past the point of adding new features for 9.0 imho

So presumably we cannot add the new feature to start hot standby at
shutdown checkpoints then either?
> > If you or anyone else believes features are essential, then either add
> > them yourselves or have the courage to stand up and say the release
> > should be delayed so that I can. To do otherwise is to admit you do not
> > actually consider them essential. It cannot be both ways.
> 
> bugfix and stabilization mode is what we are in now (except for the 
> stuff that already made it into the commitfest).

That's where we'd like to be, but these new features have not been in
the tree long enough for what you say to be the actual position. We can
pretend it is, but that doesn't make it so.

-- Simon Riggs           www.2ndQuadrant.com



Re: Hot Standby: Relation-specific deferred conflict resolution

From
Stefan Kaltenbrunner
Date:
Simon Riggs wrote:
> On Fri, 2010-01-29 at 11:10 +0100, Stefan Kaltenbrunner wrote:
> 
>> yeah and we keep finding major bugs nearly daily
> 
> Facts, please?
> 

5 seconds of time spent on archives.postgresql.org show at least the 
following SR/HS related bugs in the last 7 days or so:

http://archives.postgresql.org/pgsql-committers/2010-01/msg00400.php
http://archives.postgresql.org/pgsql-committers/2010-01/msg00410.php
http://archives.postgresql.org/pgsql-committers/2010-01/msg00396.php
http://archives.postgresql.org/pgsql-committers/2010-01/msg00323.php

some of those you might call "minor" but they are bugs and given the 
current rate we are seeing them is imho a clear sign of "code by far not 
stable enough to consider new features that late in the cycle"

Stefan


Re: Hot Standby: Relation-specific deferred conflict resolution

From
Stefan Kaltenbrunner
Date:
Simon Riggs wrote:
> On Fri, 2010-01-29 at 11:12 +0100, Stefan Kaltenbrunner wrote:
> 
>>> There are many features we should add. I will add them in priority order
>>> until forced to stop.
>> we are past the point of adding new features for 9.0 imho
> 
> So presumably we cannot add the new feature to start hot standby at
> shutdown checkpoints then either?

well as far as I recall that ones has been proposed much earlier than 
mid of december(like the patch in discussion here) and I agree with 
heikki that I'm not clear on what your actual objections to that patch 
were - care to provide a link to the archives please?

>  
>>> If you or anyone else believes features are essential, then either add
>>> them yourselves or have the courage to stand up and say the release
>>> should be delayed so that I can. To do otherwise is to admit you do not
>>> actually consider them essential. It cannot be both ways.
>> bugfix and stabilization mode is what we are in now (except for the 
>> stuff that already made it into the commitfest).
> 
> That's where we'd like to be, but these new features have not been in
> the tree long enough for what you say to be the actual position. We can
> pretend it is, but that doesn't make it so.

Not sure I follow (maybe because I'm not a native speaker) but are you 
trying to say that we can simply add new features late in the release 
cycle to stuff commited because it is not long in the tree instead of 
focusing stabilizing what we have?
If you are so sure that we NEED those features to be releaseworthy - 
maybe it was premature to commit HS and SR for this cycle?


Stefan


Re: Hot Standby: Relation-specific deferred conflict resolution

From
Simon Riggs
Date:
On Fri, 2010-01-29 at 11:20 +0100, Stefan Kaltenbrunner wrote:
> Simon Riggs wrote:
> > On Fri, 2010-01-29 at 11:10 +0100, Stefan Kaltenbrunner wrote:
> > 
> >> yeah and we keep finding major bugs nearly daily
> > 
> > Facts, please?
> > 
> 
> 5 seconds of time spent on archives.postgresql.org show at least the 
> following SR/HS related bugs in the last 7 days or so:
> 
> http://archives.postgresql.org/pgsql-committers/2010-01/msg00400.php
> http://archives.postgresql.org/pgsql-committers/2010-01/msg00410.php
> http://archives.postgresql.org/pgsql-committers/2010-01/msg00396.php
> http://archives.postgresql.org/pgsql-committers/2010-01/msg00323.php
> 
> some of those you might call "minor" but they are bugs and given the 
> current rate we are seeing them is imho a clear sign of "code by far not 
> stable enough to consider new features that late in the cycle"

I don't think two very minor bugs in Hot Standby, reported and fixed 7
days apart is any indication of instability. It isn't the "daily bugs
reported" you suggested. Personally, I think it indicates quite the
opposite - if those are the only bugs I can find now, I'm ecstatic.

I think your argument does apply to Streaming Rep, at this point. We
should consider releasing Alpha4 and then later going to Beta.

My point of view expressed here is not built in a few seconds, it is
built on discussion and feedback over 18 months. The conflict issue was
discussed by me with hackers at the May 2008 dev meeting. It should be
improved upon in this release and it has been the main issue concerning
the full range of people I have discussed HS with.

-- Simon Riggs           www.2ndQuadrant.com



Re: Hot Standby: Relation-specific deferred conflict resolution

From
Robert Haas
Date:
On Fri, Jan 29, 2010 at 5:08 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Fri, 2010-01-29 at 11:33 +0200, Heikki Linnakangas wrote:
>
>> So what was the clear result?
>
> I have spoken clearly enough. You were welcome to attend the Hot Standby
> User Group. The fact that you did not expresses your own priorities
> quite well, ISTM. Your protestations to know more about the wishes of
> users than they do themselves isn't hugely impressive.

This doesn't make any sense at all.  It is not the case that people
who attended the Hot Standby user group are the only ones who are
entitled to provide any feedback, and that people who read
pgsql-hackers are not (unless they also happen to have attended the
Hot Standby user group).  If anything, that's 100% backwards, but at
any rate Heikki has probably spent more time over the last year on
this feature than anyone in the world with the exception of yourself;
I think that counts for a lot more than one two-hour meeting.

...Robert


Re: Hot Standby: Relation-specific deferred conflict resolution

From
Simon Riggs
Date:
On Fri, 2010-01-29 at 07:01 -0500, Robert Haas wrote:
> On Fri, Jan 29, 2010 at 5:08 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> > On Fri, 2010-01-29 at 11:33 +0200, Heikki Linnakangas wrote:
> >
> >> So what was the clear result?
> >
> > I have spoken clearly enough. You were welcome to attend the Hot Standby
> > User Group. The fact that you did not expresses your own priorities
> > quite well, ISTM. Your protestations to know more about the wishes of
> > users than they do themselves isn't hugely impressive.
> 
> This doesn't make any sense at all.  

I am busy working on the right features for the most number of people,
as expressed to me. I accept there are people that disagree and I am
sorry for that. Others are welcome to add code to do the things I will
not be doing through lack of time, if they are not satisfied with my
priority.

I think we should extend the time available to make sure we have a
sensible set of features for 9.0. The heat of this discussion tells me
that we are going to be lacking features that are must-have to someone,
whether or not they are in the majority.

-- Simon Riggs           www.2ndQuadrant.com



Re: Hot Standby: Relation-specific deferred conflict resolution

From
Simon Riggs
Date:
On Fri, 2010-01-29 at 11:33 +0200, Heikki Linnakangas wrote:

> I even *fixed* that already, but you decided to take it out before
> committing. I then added it to the list of must-fix items in the TODO
> list, but you took that out too. I have no objection to doing things
> in smaller steps, but this *is* a must-fix item before release. I
> still don't understand why you took it out, nor why you're objecting
> to that.

This is a misrepresentation of what has happened. The item you mention
was added to the TODO by me in response to your comments and has never
been removed at any point; it is still there now. You added a duplicate
and the duplicate was removed. I removed code that you mentioned was
buggy because I don't have time to fix it and it is not high enough up
the priority list. We have discussed all of these things before yet you
raise them again as if those things have never been said.

I am working on TODO in a priority order. The priority list has changed
over time in response to comments received from both you and other
people. I understand you don't like my current sequence of actions and
I'm sorry for that. I am trying to be rational and balance what has been
said to me for the benefit of the community from all stakeholders and
have consulted widely to gather thoughts.

I am more than happy for other people to work on items on the list, as
Joachim and Andres have done.

-- Simon Riggs           www.2ndQuadrant.com



Re: Hot Standby: Relation-specific deferred conflict resolution

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> I removed code that you mentioned was
> buggy because I don't have time to fix it and it is not high enough up
> the priority list. We have discussed all of these things before yet you
> raise them again as if those things have never been said.

*sigh*. Yeah, we've been through this. As I've said before, I never said
the code was buggy, that was just a misunderstanding at your end.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Hot Standby: Relation-specific deferred conflict resolution

From
Greg Stark
Date:
The fundamental disagreement here is over what qualifies as a
"wishlist" item, aka a feature or added functionality. And what
qualifies as a must-fix bug.

Priorities are context sensitive. If this were early in the cycle then
working on bigger impact features like conflict resolution code might
be more important because it's important to get them into the code
base where other people can see if it solves their problems and the
behaviour can be tweaked. Fixing rare but outright broken things might
not be high priority because while they have to be done by release
nobody is blocking on on the solution before then.

On the other hand near release we stop trying to incorporate new
features and focus on basic bug features. The new features don't
become any less important to the users, it's just that they'll make it
into the next release. There will always be more features that can
help users and we'll always think of cool new things to knock off the
rough edges off what we have now and get it out so we can go back to
the playground for the next release.

You said "I think we should extend the time available to make sure we
have a sensible set of features for 9.0."  Perhaps part of the problem
is that I couldn't understand what your patch did from the description
you posted and can't evaluate whether it's fixing a problem that makes
the current feature set incoherent. Can you explain what it does in
more detail so we can understand why it's necessary for a sensible set
of features?


Re: Hot Standby: Relation-specific deferred conflict resolution

From
Simon Riggs
Date:
On Fri, 2010-01-29 at 16:44 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > I removed code that you mentioned was
> > buggy because I don't have time to fix it and it is not high enough up
> > the priority list. We have discussed all of these things before yet you
> > raise them again as if those things have never been said.
> 
> *sigh*. Yeah, we've been through this. As I've said before, I never said
> the code was buggy, that was just a misunderstanding at your end.

OK, if you say I misunderstood, then I accept that. 

The deed is done and the code removed. Putting it back takes time and
given enough of that rare cloth, it will eventually be put back.

-- Simon Riggs           www.2ndQuadrant.com



Re: Hot Standby: Relation-specific deferred conflict resolution

From
"Joshua D. Drake"
Date:
On Fri, 2010-01-29 at 12:56 +0000, Simon Riggs wrote:

> I think we should extend the time available to make sure we have a
> sensible set of features for 9.0. The heat of this discussion tells me
> that we are going to be lacking features that are must-have to someone,
> whether or not they are in the majority.

-1

Missing release dates because of some patch that isn't done is something
the community has been trying to get away from, aggressively. The way
this is supposed to work is:

We have a release date
  Features that aren't going to make that date, don't.
We release


Sincerely,

Joshua D. Drake


--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering
Respect is earned, not gained through arbitrary and repetitive use or Mr. or Sir.

Re: Hot Standby: Relation-specific deferred conflict resolution

From
Simon Riggs
Date:
On Fri, 2010-01-29 at 14:52 +0000, Greg Stark wrote:

> You said "I think we should extend the time available to make sure we
> have a sensible set of features for 9.0."  Perhaps part of the problem
> is that I couldn't understand what your patch did from the description
> you posted and can't evaluate whether it's fixing a problem that makes
> the current feature set incoherent. Can you explain what it does in
> more detail so we can understand why it's necessary for a sensible set
> of features?

I'll break down the patch into two pieces to make it easier to review,
and add more description, as you suggest.

-- Simon Riggs           www.2ndQuadrant.com



Re: Hot Standby: Relation-specific deferred conflict resolution

From
Stefan Kaltenbrunner
Date:
Simon Riggs wrote:
> On Fri, 2010-01-29 at 11:20 +0100, Stefan Kaltenbrunner wrote:
>> Simon Riggs wrote:
>>> On Fri, 2010-01-29 at 11:10 +0100, Stefan Kaltenbrunner wrote:
>>>
>>>> yeah and we keep finding major bugs nearly daily
>>> Facts, please?
>>>
>> 5 seconds of time spent on archives.postgresql.org show at least the 
>> following SR/HS related bugs in the last 7 days or so:
>>
>> http://archives.postgresql.org/pgsql-committers/2010-01/msg00400.php
>> http://archives.postgresql.org/pgsql-committers/2010-01/msg00410.php
>> http://archives.postgresql.org/pgsql-committers/2010-01/msg00396.php
>> http://archives.postgresql.org/pgsql-committers/2010-01/msg00323.php
>>
>> some of those you might call "minor" but they are bugs and given the 
>> current rate we are seeing them is imho a clear sign of "code by far not 
>> stable enough to consider new features that late in the cycle"
> 
> I don't think two very minor bugs in Hot Standby, reported and fixed 7
> days apart is any indication of instability. It isn't the "daily bugs
> reported" you suggested. Personally, I think it indicates quite the
> opposite - if those are the only bugs I can find now, I'm ecstatic.

well we have not even made a realistic release (not even an alpha!) with 
the current HS/SR code, we still have "must fix" issues on the table(for 
both SR and HS) AND we find/fix more than a bug every two days in those 
two features that are the cause for moving to 9.0.
If we want to release in anya realistic timeframe (and I recall you 
advocating for doing that in the past) we really need to wrap up what we 
have now, make it robust and see what we have left for all the further 
releases.

> 
> I think your argument does apply to Streaming Rep, at this point. We
> should consider releasing Alpha4 and then later going to Beta.

so you basically say that the current codebase(as a whole) is in need of 
stabilisation and we need to make the cut off and release alpha4 now?
Not sure how that fits into proposing new features for other parts of 
the system...

> 
> My point of view expressed here is not built in a few seconds, it is
> built on discussion and feedback over 18 months. The conflict issue was
> discussed by me with hackers at the May 2008 dev meeting. It should be
> improved upon in this release and it has been the main issue concerning
> the full range of people I have discussed HS with.

"in this release" refers to the current patch I guess - because there 
was no HS in older versions of pg :)


Stefan


Re: Hot Standby: Relation-specific deferred conflict resolution

From
Robert Haas
Date:
On Fri, Jan 29, 2010 at 11:32 AM, Joshua D. Drake <jd@commandprompt.com> wrote:
> On Fri, 2010-01-29 at 12:56 +0000, Simon Riggs wrote:
>
>> I think we should extend the time available to make sure we have a
>> sensible set of features for 9.0. The heat of this discussion tells me
>> that we are going to be lacking features that are must-have to someone,
>> whether or not they are in the majority.
>
> -1
>
> Missing release dates because of some patch that isn't done is something
> the community has been trying to get away from, aggressively. The way
> this is supposed to work is:
>
> We have a release date
>  Features that aren't going to make that date, don't.
> We release

Exactly.  It would be nice to see 9.0 come out in 2010, and we're not
going to get there unless we start fixing the issues that are actually
release-blockers, rather than adding new features.  Hot Standby was
committed with at least one known release blocker (VACUUM FULL) on the
assumption that that release blocker would be fixed by the committer
who introduced it (isn't that the rule?).  Two months on, there is
zero sign of any activity on that front, and instead we're now being
bombarded with a series of other patches that fix issues that are not
release-blockers under the theory that the feature isn't good enough
to be used without them.  If that's really true, it wasn't ready for
commit in the first place.

If this were any other patch, I'd propose reverting it.

...Robert


Re: Hot Standby: Relation-specific deferred conflict resolution

From
"Joshua D. Drake"
Date:
On Fri, 2010-01-29 at 12:23 -0500, Robert Haas wrote:

> Exactly.  It would be nice to see 9.0 come out in 2010, and we're not
> going to get there unless we start fixing the issues that are actually
> release-blockers, rather than adding new features.  Hot Standby was
> committed with at least one known release blocker (VACUUM FULL) on the
> assumption that that release blocker would be fixed by the committer
> who introduced it (isn't that the rule?).  Two months on, there is
> zero sign of any activity on that front, and instead we're now being
> bombarded with a series of other patches that fix issues that are not
> release-blockers under the theory that the feature isn't good enough
> to be used without them.  If that's really true, it wasn't ready for
> commit in the first place.
>
> If this were any other patch, I'd propose reverting it.
>

I would suggest that if we don't see activity on the release blockers,
pretty much stat... we revert it.

Joshua D. Drake


> ...Robert
>


--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering
Respect is earned, not gained through arbitrary and repetitive use or Mr. or Sir.

Re: Hot Standby: Relation-specific deferred conflict resolution

From
Simon Riggs
Date:
On Fri, 2010-01-29 at 12:23 -0500, Robert Haas wrote:
> Two months on, there is
> zero sign of any activity on that front

I'm surprised that you call 14 commits in 28 days following a publicly
available priority list: "zero sign of activity".

Further discussion seems pointless.

-- Simon Riggs           www.2ndQuadrant.com



Re: Hot Standby: Relation-specific deferred conflict resolution

From
"Joshua D. Drake"
Date:
On Fri, 2010-01-29 at 18:08 +0000, Simon Riggs wrote:
> On Fri, 2010-01-29 at 12:23 -0500, Robert Haas wrote:
> > Two months on, there is
> > zero sign of any activity on that front
>
> I'm surprised that you call 14 commits in 28 days following a publicly
> available priority list: "zero sign of activity".
>
> Further discussion seems pointless.

Let's be clear. Robert is discussing release blockers. He is not
suggesting that no work has been done. I believe the community considers
release-blocks "the priority".

Joshua D. Drake


--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering
Respect is earned, not gained through arbitrary and repetitive use or Mr. or Sir.

Re: Hot Standby: Relation-specific deferred conflict resolution

From
Robert Haas
Date:
On Fri, Jan 29, 2010 at 1:08 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Fri, 2010-01-29 at 12:23 -0500, Robert Haas wrote:
>> Two months on, there is
>> zero sign of any activity on that front
>
> I'm surprised that you call 14 commits in 28 days following a publicly
> available priority list: "zero sign of activity".
>
> Further discussion seems pointless.

Wow, that was an awesome way to quote what I said out of context and
make it sound like I said something ridiculous.  The problem I and
others have is not with the quantity of your commits but with the
issues you are choosing (not) to address.

...Robert


Re: Hot Standby: Relation-specific deferred conflict resolution

From
Josh Berkus
Date:
All,

Is there a working list of HS must-fix items somewhere which people
agree on?  Or are we still lacking consensus?

--Josh Berkus


Re: Hot Standby: Relation-specific deferred conflict resolution

From
"Joshua D. Drake"
Date:
On Fri, 2010-01-29 at 11:41 -0800, Josh Berkus wrote:
> All,
>
> Is there a working list of HS must-fix items somewhere which people
> agree on?  Or are we still lacking consensus?

VACUUM FULL, I believe is one.

Joshua D. Drake

>
> --Josh Berkus
>


--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering
Respect is earned, not gained through arbitrary and repetitive use or Mr. or Sir.

Re: Hot Standby: Relation-specific deferred conflict resolution

From
"Joshua D. Drake"
Date:
On Fri, 2010-01-29 at 12:56 +0000, Simon Riggs wrote:

> I think we should extend the time available to make sure we have a
> sensible set of features for 9.0. The heat of this discussion tells me
> that we are going to be lacking features that are must-have to someone,
> whether or not they are in the majority.

-1

Missing release dates because of some patch that isn't done is something
the community has been trying to get away from, aggressively. The way
this is supposed to work is:

We have a release date Features that aren't going to make that date, don't.
We release


Sincerely,

Joshua D. Drake


-- 
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering
Respect is earned, not gained through arbitrary and repetitive use or Mr. or Sir.



Re: Hot Standby: Relation-specific deferred conflict resolution

From
"Joshua D. Drake"
Date:
On Fri, 2010-01-29 at 12:23 -0500, Robert Haas wrote:

> Exactly.  It would be nice to see 9.0 come out in 2010, and we're not
> going to get there unless we start fixing the issues that are actually
> release-blockers, rather than adding new features.  Hot Standby was
> committed with at least one known release blocker (VACUUM FULL) on the
> assumption that that release blocker would be fixed by the committer
> who introduced it (isn't that the rule?).  Two months on, there is
> zero sign of any activity on that front, and instead we're now being
> bombarded with a series of other patches that fix issues that are not
> release-blockers under the theory that the feature isn't good enough
> to be used without them.  If that's really true, it wasn't ready for
> commit in the first place.
> 
> If this were any other patch, I'd propose reverting it.
> 

I would suggest that if we don't see activity on the release blockers,
pretty much stat... we revert it.

Joshua D. Drake


> ...Robert
> 


-- 
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering
Respect is earned, not gained through arbitrary and repetitive use or Mr. or Sir.



Re: Hot Standby: Relation-specific deferred conflict resolution

From
"Joshua D. Drake"
Date:
On Fri, 2010-01-29 at 18:08 +0000, Simon Riggs wrote:
> On Fri, 2010-01-29 at 12:23 -0500, Robert Haas wrote:
> > Two months on, there is
> > zero sign of any activity on that front
> 
> I'm surprised that you call 14 commits in 28 days following a publicly
> available priority list: "zero sign of activity".
> 
> Further discussion seems pointless.

Let's be clear. Robert is discussing release blockers. He is not
suggesting that no work has been done. I believe the community considers
release-blocks "the priority".

Joshua D. Drake


-- 
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering
Respect is earned, not gained through arbitrary and repetitive use or Mr. or Sir.



Re: Hot Standby: Relation-specific deferred conflictresolution

From
"Joshua D. Drake"
Date:
On Fri, 2010-01-29 at 11:41 -0800, Josh Berkus wrote:
> All,
> 
> Is there a working list of HS must-fix items somewhere which people
> agree on?  Or are we still lacking consensus?

VACUUM FULL, I believe is one.

Joshua D. Drake

> 
> --Josh Berkus
> 


-- 
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering
Respect is earned, not gained through arbitrary and repetitive use or Mr. or Sir.



Re: Hot Standby: Relation-specific deferred conflict resolution

From
Simon Riggs
Date:
On Fri, 2010-01-29 at 14:52 +0000, Greg Stark wrote:

> Can you explain what it does in
> more detail so we can understand why it's necessary for a sensible set
> of features?

I've slimmed down the patch to make it clearer what it does, having
committed some refactoring.

Problem: Currently when we perform conflict resolution we do not use the
relid from the WAL record, we target all users regardless of which
relations they have accessed or intend to access. So changes to table X
can cause cancelation of someone accessing table Y because **they might
later in the transaction access table X**. That is too heavy handed and
is most often overkill. This is the same problem you and I have
discussed many times, over the last 14 months, though the problem itself
has been discussed on hackers many times over last 20 months and many
potential solutions offered by me.

An example of current behaviour, using tables A, B and C
T0: An AccessExclusiveLock is applied to B
T1: Q1 takes snapshot, takes lock on A and begins query
T2: Q2 takes snapshot, queues for lock on B behind AccessExclusiveLock
T3: Cleanup on table C is handled that will conflict with both snapshots
T4: Q3 takes snapshot, takes lock on C and begins query (if possible)
T5: Cleanup on table C is handled that will conflict with Q3

Current: At T3, current conflict resolution will wait for
max_standby_delay and then cancel Q1 and Q2. Q3 can begin processing
immediately because the snapshot it takes will always be same or later
than the xmin that generated the cleanup at T3. At T5, Q3 will be
quickly cancelled because all the standby delay was used up at T3 and
there is none left to spend on delaying for Q3.

Proposed Resolution:
as presented to hackers in 12/2009
http://archives.postgresql.org/pgsql-hackers/2009-12/msg01175.php

Let's look at the effect first, then return to the detail.

In this proposal, the above sequence of actions will look like this:
Conflict resolution will wait at T3 until we hit max_standby_delay, at
which point we learn that Q1 and Q2 do not conflict and we let them
continue on their way. At T5, Q3 will be cancelled without much delay,
because we have now used up most of max_standby_delay.

So in both approaches, Q3 that accessed table C will be canceled fairly
quickly. The key to this is that in the new proposal, Q1 and Q2 will not
be canceled: they will continue to completion.

How it works: When we process a snapshot conflict we check which queries
have snapshots that conflict. We then wait for max_standby_delay and the
check lock conflicts. (We do it this way because of a timing issue
described on the link above, pointed out by Greg). When we check for
lock conflicts we also set a latestRemovedXid on "that relation", so
that we capture all current lockers *and* allow all future lockers to
check the latestRemovedXid against their snapshot. In either case, if a
lock conflict occurs then we will cancel the query.

I mention "that relation" because *where* we record the xid limit for
each relation is an important aspect of the design. In the current patch
we take a simple approach, others are possible. If there is already a
lock in the shared lock table, then we add the latestRemovedXid to that.
If not, we keep track of the latestRemovedXid for the whole lock
partition. So we aren't tracking each relation separately in most cases,
except for when a table is being frequently accessed, or access for a
long period.

There is also an optimization added here. When we defer cancelation of
queries the same query keeps re-appearing in the conflict list for later
WAL records. As a result there is a mechanism to avoid constant
re-listing of a conflict.

The attached patch is for review and discussion only at this stage.

I'm working on other areas now while discussion takes place, or not.

--
 Simon Riggs           www.2ndQuadrant.com

Attachment

Re: Hot Standby: Relation-specific deferred conflict resolution

From
Simon Riggs
Date:
On Fri, 2010-01-29 at 15:01 +0000, Simon Riggs wrote:

> Putting it back takes time and
> given enough of that rare cloth, it will eventually be put back.

Looks like I'll have time to add the starts-at-shutdown-checkpoint item
back in after all.

I'd appreciate it if you could review the relation-specific conflict
patch, 'cos it's still important.

-- Simon Riggs           www.2ndQuadrant.com



Re: Hot Standby: Relation-specific deferred conflict resolution

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> On Fri, 2010-01-29 at 15:01 +0000, Simon Riggs wrote:
> 
>> Putting it back takes time and
>> given enough of that rare cloth, it will eventually be put back.
> 
> Looks like I'll have time to add the starts-at-shutdown-checkpoint item
> back in after all.

Great! Thank you, much appreciated.

> I'd appreciate it if you could review the relation-specific conflict
> patch, 'cos it's still important.

One fundamental gripe I have about that approach is that it's hard to
predict when you will be saved by the cache and when your query will be
canceled. For example, the patch stores only one "latestRemovedXid"
value per lock partition. So if you have two tables that hash to
different lock partitions, and are never both accessed in a single
transaction, the cache will save your query every time. So far so good,
but then you do a dump/restore, and the tables happen to be assigned to
the same lock partition. Oops, a system that used to work fine starts to
get "snapshot too old" errors.

It's often better to be consistent and predictable, even if it means
cancelling more queries. I think wë́'d need to have a much more
fine-grained system before it's worthwhile to do deferred resolution.
There's just too much "false sharing" otherwise.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Hot Standby: Relation-specific deferred conflict resolution

From
Simon Riggs
Date:
On Tue, 2010-02-02 at 20:27 +0200, Heikki Linnakangas wrote:

> > I'd appreciate it if you could review the relation-specific conflict
> > patch, 'cos it's still important.
> 
> One fundamental gripe I have about that approach is that it's hard to
> predict when you will be saved by the cache and when your query will be
> canceled. For example, the patch stores only one "latestRemovedXid"
> value per lock partition. So if you have two tables that hash to
> different lock partitions, and are never both accessed in a single
> transaction, the cache will save your query every time. So far so good,
> but then you do a dump/restore, and the tables happen to be assigned to
> the same lock partition. Oops, a system that used to work fine starts to
> get "snapshot too old" errors.
> 
> It's often better to be consistent and predictable, even if it means
> cancelling more queries. I think wë́'d need to have a much more
> fine-grained system before it's worthwhile to do deferred resolution.
> There's just too much "false sharing" otherwise.

ISTM that this is exactly backwards. There is already way too many false
positives and this patch would reduce them very significantly. Plus the
cancelation is hardly predictable since it relies on whether or not a
btree delete takes place during execution and the arrival time and rate
of those is sporadic. There is no safe, predicatable behaviour in the
current code.

The gripe about the cache cannot be a fundamental one, since we can
easily change the size and mechanism by which the cache operates without
changing the patch very much at all.

I am being told this area is a must-fix issue for this release. Tom's
reaction to this issue (on other thread) illustrates that beautifully:

On Sun, 2010-01-31 at 15:41 -0500, Tom Lane wrote: 
> Simon Riggs <simon@2ndQuadrant.com> writes:
(snip) 
> > 2. no matter if they haven't accessed the index being cleaned (they
> > might later, is the thinking...)
> 
> That seems seriously horrid.  What is the rationale for #2 in
> particular?  I would hope that at worst this would affect sessions
> that are actively competing for the index being cleaned.

-- Simon Riggs           www.2ndQuadrant.com