Thread: Recovery Test Framework

Recovery Test Framework

From

Simon Riggs

Date:

11 January 2009, 12:34:09

Recovery doesn't have a test framework as yet. I would like to add one
for this release, especially since we have so much recovery-related code
being added to the release (and manual testing is so time consuming).
Testing Hot Standby will also test sync rep, PITR etc, and could easily
uncover a few problems hiding in the background that have lain dormant.

The current regression tests are all self-contained tests that create
objects, insert data, run tests and then cleanup again. Almost every
single test case is read-write.

This creates a few problems for recovery and Hot Standby:
* tests cannot easily be split so that read/write happens on master and
test execution happens on standby (or on both master and standby)
* there is no easy way to synchronise object creation on master and test
execution on standby

So I propose to set up two test schedules
* rep_master_schedule
* rep_standby_schedule
to be executed using pg_regress concurrently on separate database
servers running on different ports on the same system.

A test table would keep track of which tables have had their
prerequisites met, and rep_standby_schedule would wait until a test was
correctly set up before running the test. This would be achieved using
the attached test framework code.
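
A minimal sketch of the waiting idea described above, not the attached
hs.testframework.v1.sql itself. The table name test_status, its columns,
and the helper names are all hypothetical. A single in-memory SQLite
database stands in for the master/standby pair; in the real setup the
master would write the row and the standby would see it only after WAL
replay.

```python
import sqlite3
import time


def make_tracking_table(conn):
    """Create the (hypothetical) table that records which tests are set up."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS test_status ("
        "test_name TEXT PRIMARY KEY, ready INTEGER NOT NULL DEFAULT 0)"
    )
    conn.commit()


def mark_ready(conn, test_name):
    """Master side: record that a test's objects and data now exist."""
    conn.execute(
        "INSERT OR REPLACE INTO test_status (test_name, ready) VALUES (?, 1)",
        (test_name,),
    )
    conn.commit()


def wait_until_ready(conn, test_name, timeout=5.0, poll_interval=0.1):
    """Standby side: poll until the test is marked ready or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        row = conn.execute(
            "SELECT ready FROM test_status WHERE test_name = ?", (test_name,)
        ).fetchone()
        if row is not None and row[0] == 1:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
```

Polling with a timeout is only one possible design; it keeps the standby
schedule from hanging forever if the master schedule fails before a
test's prerequisites are created.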

We would then include newly written tests, rather than attempt to use
existing tests - but use the same framework of /sql /out /expected. Some
of these have already been written for HS.

Is this something the community would like to see included within the
distribution, or should we just keep it private and publish test results
using it? I would prefer the former since it would allow us to prove
that the code works and to be able to check for specific regressions as
bugs appear. It may also help the community to work together on the
various aspects of recovery code that are being included in 8.4.

It would be massively cool to be able to add this to the build farm.
There would be few blockers because with two servers running on same
system there are few OS specific aspects to this.

If people can discuss what would be required we should be able to get it
done in the near term, i.e. over next 2-3 weeks.

Comments?

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support

Attachment

hs.testframework.v1.sql

Re: Recovery Test Framework

From

Tom Lane

Date:

11 January 2009, 13:08:00

Simon Riggs <simon@2ndQuadrant.com> writes:
> Recovery doesn't have a test framework as yet. I would like to add one
> for this release, especially since we have so much recovery-related code
> being added to the release (and manual testing is so time consuming).

I've been thinking for some time that putting replication into 8.4
has proven to be an unreasonably optimistic goal.  Seeing new
requirements like this one pop up two months after feature freeze
kind of drives the point home.

I think it's time to back off and agree that we should target all this
stuff for 8.5.  I don't want our first release of replication to be
flaky, but I hardly see how it will be anything else if it ships in 8.4.
        regards, tom lane

Re: Recovery Test Framework

From

"Robert Haas"

Date:

11 January 2009, 15:32:16

On Sun, Jan 11, 2009 at 12:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
>> Recovery doesn't have a test framework as yet. I would like to add one
>> for this release, especially since we have so much recovery-related code
>> being added to the release (and manual testing is so time consuming).
>
> I've been thinking for some time that putting replication into 8.4
> has proven to be an unreasonably optimistic goal.  Seeing new
> requirements like this one pop up two months after feature freeze
> kind of drives the point home.

I don't have a strong feeling as to which of the replication-related
patches are ready to commit, but I don't think that the fact that
Simon has an idea for improving the test framework is a reason for
rejecting them.  Referring to an idea for a new test framework for
recovery as "a new requirement for replication" is quite a stretch.

It might also be pointed out that the "Infrastructure Changes for
Recovery" patch was originally submitted for the September CommitFest,
but since review stopped for about 6 weeks beginning just after the
first of October, picked up briefly again on November 17th with a few
messages from Heikki, and then died again until late December, it's
perhaps not surprising that not a lot of progress has been made....
particularly since no reviewers were assigned for a ridiculously long
time.  "Infrastructure Changes for Recovery" was moved from the
September Commitfest with your name and Heikki's name already on it,
and no one else was ever assigned (nor did you provide any more
review, at least as far as I can remember seeing on -hackers).

"Hot Standby" finally had a reviewer assigned on November 26th, when
Pavan Deolasee was added - but I'm not even sure that was an official
RRR assignment, I think he may have just picked it up.  At any rate,
when a reviewer isn't assigned for almost four weeks, and it's at that
point the day before Thanksgiving, well, don't expect a lot to get
done before Christmas.  SE-Postgresql was even more egregious - it
certainly never had a round robin reviewer and was listed as being
reviewed by "Nobody" for over a month.

All of this is particularly mysterious to me in light of the fact that
it was you yourself who suggested that we should make sure to get
feedback out to the authors of major patches early.

http://archives.postgresql.org/message-id/3555.1225336370@sss.pgh.pa.us

I personally reviewed at least 10 patches for this CommitFest.  I
thought the point of that was to take some of the load off of the
committers so that they could focus on major new features like
replication.  Otherwise, what is the point of having round robin
reviewers?  And what is the point of saying that we want replication?

...Robert

Re: Recovery Test Framework

From

Simon Riggs

Date:

12 January 2009, 09:22:22

On Sun, 2009-01-11 at 12:07 -0500, Tom Lane wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
> > Recovery doesn't have a test framework as yet. I would like to add one
> > for this release, especially since we have so much recovery-related code
> > being added to the release (and manual testing is so time consuming).
> 
> I've been thinking for some time that putting replication into 8.4
> has proven to be an unreasonably optimistic goal.  Seeing new
> requirements like this one pop up two months after feature freeze
> kind of drives the point home.
> 
> I think it's time to back off and agree that we should target all this
> stuff for 8.5.  I don't want our first release of replication to be
> flaky, but I hardly see how it will be anything else if it ships in 8.4.

I understand that. As you know, I have been concerned and disappointed
by the progress of sync rep in particular, though I salute Fujii-san's
personal effort and skill.

Which patches were you thinking of when you say "all this stuff"?

As a main reviewer of Sync Rep, I can say it's shaping up nicely. I
don't have any reason now to veto it for architectural reasons and it
covers many subtle, second level issues very well that would be easily
missed in re-designs. It has some innovative features that make it best
in class. Is it flaky? Not fundamentally; code wise I see it more as a
question of time. Does it do everything? No, some advanced features
(multiple streaming standbys, single command setup for small installs)
have been deferred to later releases. 

Now that it's time to discuss such things, I personally think we have run
out of time for WAL I/O read-ahead ("Proposal of PITR performance
improvement"), especially since tests show it has little advantage with
FPW enabled. If we really need to we could lose most of my
Infrastructure patch, since that adds fast failover and additional
performance with bgwriter.

If we insist upon cuts, we can lose some patches and code and yet still
maintain the popular headline features of both Sync Rep and Hot Standby.
Realistically, we need your attention if we are to include them. I can
list points where your attention would be especially welcome since the
patches are relatively large. Let's look at the detail of what we need
to do rather than the broad brush.

Back to the test framework: this is not relevant to replication. PITR
and crash recovery are all manually tested and have been for years.
Testing Hot Standby *revealed* a bug in visibility maps that went
through otherwise unnoticed and I think there are others, new and old.
Even if we reject replication entirely, a recovery test framework is
going to increase the robustness of what we *currently* have. It's not a
new requirement; what is new is I now have some sponsorship money
explicitly earmarked for this, but as part of the HS project and indeed
the test framework relies upon HS to operate at all. 

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support

Re: Recovery Test Framework

From

Gregory Stark

Date:

12 January 2009, 10:01:55

Simon Riggs <simon@2ndQuadrant.com> writes:

> Is it flaky? Not fundamentally; code wise I see it more as a
> question of time. 

"a question of time" indeed.

> If we insist upon cuts, ...

> Even if we reject replication entirely ...

There's a clear difference between how you're thinking about this and how I am.
The way I see it nobody suggested cutting or rejecting anything, just
committing it into a different branch for a different release date. It would
give us a year of experience seeing the code in action before releasing it on
the world.

I'm not sure whether it's too immature to commit, I haven't read the patch;
from what I see in the mailing list it seems about as ready as other large
patches in the past which were committed. But from my point of view it would
just always be better to commit large patches immediately after forking a
release instead of just before the beta to give them a whole release cycle of
exposure to developers before beta testers.

I'm not sure if this is the right release cycle to start this policy, but I
would like to convince people of this at some point so we can start having a
flood of big commits at the *beginning* of the release cycle and then a whole
release cycle of incremental polishing to those features rather than always
having freshly committed features in our releases that none of us has much
experience with.

> a recovery test framework is going to increase the robustness of what we
> *currently* have. It's not a new requirement; what is new is I now have some
> sponsorship money explicitly earmarked for this, but as part of the HS
> project and indeed the test framework relies upon HS to operate at all.

I agree with you that additional tests don't represent any immaturity in the
patch. They don't affect the run-time behaviour and I love the fact that they
might turn up any problems with our existing recovery process.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's RemoteDBA services!

Re: Recovery Test Framework

From

Tom Lane

Date:

12 January 2009, 10:04:53

Gregory Stark <stark@enterprisedb.com> writes:
> ... But from my point of view it would
> just always be better to commit large patches immediately after forking a
> release instead of just before the beta to give them a whole release cycle of
> exposure to developers before beta testers.

I'm in favor of such an approach for this work, but it'll never fly as a
general project policy.  People already dislike the fact that it takes
up to a year before their work gets reflected into a public release.
With such a policy we'd be telling developers "whatever you submit won't
see the light of day for one to two years".  Not good for a project that
depends on the willingness of developers to scratch their own itches.

However, we are getting off onto a tangent.  I wasn't trying to start
a discussion about general project policies, but about the specific
status of this particular group of patches.
        regards, tom lane

Re: Recovery Test Framework

From

Simon Riggs

Date:

12 January 2009, 10:35:30

On Mon, 2009-01-12 at 09:04 -0500, Tom Lane wrote:

> I wasn't trying to start
> a discussion about general project policies, but about the specific
> status of this particular group of patches.

Which ones exactly? 

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support

Re: Recovery Test Framework

From

Tom Lane

Date:

12 January 2009, 10:56:51

Simon Riggs <simon@2ndQuadrant.com> writes:
> On Mon, 2009-01-12 at 09:04 -0500, Tom Lane wrote:
>> I wasn't trying to start
>> a discussion about general project policies, but about the specific
>> status of this particular group of patches.

> Which ones exactly? 

Well, one of the things that makes me uncomfortable is that it's not
even clear exactly which set of patches is currently proposed for
inclusion.  We've seen a whole lot of URLs fly back and forth, many
of them pointing at pages that aren't there a few days later.
I've been too busy with non-replication-related patches to pay really
close attention, but I certainly don't get the impression that there's
a stable set of patches waiting to be applied.  (And for the record,
there is nothing I like even a little bit about the practice of posting
a URL instead of an actual patch.)
        regards, tom lane

Re: Recovery Test Framework

From

Simon Riggs

Date:

12 January 2009, 11:19:59

On Mon, 2009-01-12 at 09:55 -0500, Tom Lane wrote:
> (And for the record,
> there is nothing I like even a little bit about the practice of posting
> a URL instead of an actual patch.)

I don't like it either.

The patchsets are too big to post to the list directly, at least that is
the reason in my case and with Fujii-san and KaiGai's cases.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support

Re: Recovery Test Framework

From

Stefan Kaltenbrunner

Date:

12 January 2009, 11:30:13

Simon Riggs wrote:
> On Mon, 2009-01-12 at 09:55 -0500, Tom Lane wrote:
>> (And for the record,
>> there is nothing I like even a little bit about the practice of posting
>> a URL instead of an actual patch.)
> 
> I don't like it either.
> 
> The patchsets are too big to post to the list directly, at least that is
> the reason in my case and with Fujii-san and KaiGai's cases.

yeah - afaik we still have a 100k limit on -hackers.


Stefan

Re: Recovery Test Framework

From

"Guillaume Smet"

Date:

12 January 2009, 11:37:39

On Mon, Jan 12, 2009 at 3:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> However, we are getting off onto a tangent.  I wasn't trying to start
> a discussion about general project policies, but about the specific
> status of this particular group of patches.

I concur with Gregory on this one.

IM(Very)HO, it's really too late in the cycle to commit these features
(ie sync rep and hot standby). They are supposed to guarantee high
availability and data security and they must be rock solid. Having
them committed just before the release seems to me like a very
dangerous way to publish them.

A lot of users are waiting for these features so they really should be
usable and rock solid before they get released to the public. One more
year without them is perhaps better than causing problems on critical
databases.

Apart from the features themselves, what people expect the most (at
least the ones I met) is a replication feature which is simple to set
up and integrated. A polished "user interface" is probably what is the
most important from the user point of view (correctness and stability
are a minimum). That's what is going to make a difference with what
already existed (for the users I know).

I'm just handwaving but I think there's probably need for at least one
more month to get these patches reviewed and ready to commit
(considering there are very few people able to review them and to fix
problem in this set of patches).

Note that I don't question the quality of the patches, just that there
will be very little time to test the final code committed before the
release.

-- 
Guillaume

Re: Recovery Test Framework

From

Peter Eisentraut

Date:

12 January 2009, 11:49:14

Simon Riggs wrote:
> Recovery doesn't have a test framework as yet.

I have been having these concerns as well.  In fact, I recall 
discussions at least 8 years back about how pg_dump doesn't really have 
any organized testing, and we also have little regular testing of PITR 
aside from specific exercises that users or developers occasionally run.

The question remains how to do it.  Running read-only queries on a slave 
doesn't show anything about how well the write-relevant parts of WAL 
archiving work.  That's not to say it's not interesting to test that, 
but there is really a lot more to having a full test suite for our 
backup and recovery facilities.

Re: Recovery Test Framework

From

"Merlin Moncure"

Date:

12 January 2009, 11:56:49

On 1/12/09, Guillaume Smet <guillaume.smet@gmail.com> wrote:
> On Mon, Jan 12, 2009 at 3:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>  > However, we are getting off onto a tangent.  I wasn't trying to start
>  > a discussion about general project policies, but about the specific
>  > status of this particular group of patches.
>
> I concur with Gregory on this one.
>
>  IM(Very)HO, it's really too late in the cycle to commit these features
>  (ie sync rep and hot standby). They are supposed to guarantee high
>  availability and data security and they must be rock solid. Having
>  them committed just before the release seems to me like a very
>  dangerous way to publish them.

I disagree at least with hot standby.  I've been using/testing (as
have others) it under a variety of workloads for several months now
with no issues outside of corrected issues in the very early patches.
Also, relatively few people update/build from cvs
frequently so being committed late in the release cycle isn't as
important as you are claiming...the real 'wider net' testing happens
when the beta period begins.

IMO, Simon needs to produce a patch (quickly), have it be reviewed,
and get it included/excluded based on its merits.

merlin

Re: Recovery Test Framework

From

"Guillaume Smet"

Date:

12 January 2009, 12:07:23

On Mon, Jan 12, 2009 at 4:56 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> I disagree at least with hot standby.  I've been using/testing (as
> have others) it under a variety of workloads for several months now
> with no issues outside of corrected issues in the very early patches.
> Also, relatively few people update/build from cvs
> frequently so being committed late in the release cycle isn't as
> important as you are claiming...the real 'wider net' testing happens
> when the beta period begins.

Update/build from CVS != Update/build from CVS + apply the replication
patches + test them explicitly.

That said, I didn't have the time to test them myself so I feel also
responsible for that.

My point is that what Simon currently has (and so what you tested) is
different from what is going to be committed (note the "final" in what
I wrote) and I suspect there will be a certain number of non
negligible adjustments (see the last discussions between Simon and
Heikki and I don't think Tom has taken a look at these patches yet).

I'm not sure that the beta/rc testing cycle is sufficient for such a
critical feature and that we probably need some time to polish it.

But once again, it's just MVHO.

-- 
Guillaume

Re: Recovery Test Framework

From

"Greg Stark"

Date:

12 January 2009, 12:11:55

On Mon, Jan 12, 2009 at 9:55 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
>> On Mon, 2009-01-12 at 09:04 -0500, Tom Lane wrote:
>
> Well, one of the things that makes me uncomfortable is that it's not
> even clear exactly which set of patches is currently proposed for
> inclusion.  We've seen a whole lot of URLs fly back and forth, many
> of them pointing at pages that aren't there a few days later.
> I've been too busy with non-replication-related patches to pay really
> close attention, but I certainly don't get the impression that there's
> a stable set of patches waiting to be applied.

See this is one of the things which bothers me. I don't see any
advantage in forcing Simon to stop making improvements -- and there
are always improvements to be made -- just to make his code seem more
stable.

Obviously we want to avoid having people actively stepping on each
others' toes, but as long as the code isn't actively being worked on
by anyone else it would be silly to ask Simon to just sit on his
hands when he sees further things that can be done.

-- 
greg

Re: Recovery Test Framework

From

Tom Lane

Date:

12 January 2009, 12:18:55

"Guillaume Smet" <guillaume.smet@gmail.com> writes:
> On Mon, Jan 12, 2009 at 4:56 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> I disagree at least with hot standby.  I've been using/testing (as
>> have others) it under a variety of workloads for several months now
>> with no issues outside of corrected issues in the very early patches.

> My point is that what Simon currently has (and so what you tested) is
> different from what is going to be committed (note the "final" in what
> I wrote) and I suspect there will be a certain number of non
> negligible adjustments (see the last discussions between Simon and
> Heikki and I don't think Tom has taken a look at these patches yet).

The thing that's disturbing me is that (to judge by what I've been
seeing on the mailing list) there's been a steady stream of "non
negligible adjustments" for the past two months.  That's good from
the standpoint that problems are getting found and fixed, but it's
not giving me any warm fuzzies about the code being ready to go.

Basically I think we are up against the same type of project management
decision we've had several times before: are we willing to slip the
8.4 release schedule for however long it will take for hot standby
and the other replication-related features to be ready?  At this point
I think there can be no question that it will not be a small slip;
in fact I'm not even prepared to guess at how long it will take.
        regards, tom lane

Re: Recovery Test Framework

From

"Greg Stark"

Date:

12 January 2009, 12:20:08

On Mon, Jan 12, 2009 at 11:07 AM, Guillaume Smet
<guillaume.smet@gmail.com> wrote:
> On Mon, Jan 12, 2009 at 4:56 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> I disagree at least with hot standby.  I've been using/testing (as
>> have others) it under a variety of workloads for several months now
>> with no issues outside of corrected issues in the very early patches.
>> Also, relatively few people update/build from cvs
>> frequently so being committed late in the release cycle isn't as
>> important as you are claiming...the real 'wider net' testing happens
>> when the beta period begins.
>
> Update/build from CVS != Update/build from CVS + apply the replication
> patches + test them explicitly.
>
> That said, I didn't have the time to test them myself so I feel also
> responsible for that.

In the general case I think plenty of people update and build from CVS
regularly. It's great that the FSM has been in for a couple months
before the beta, we've uncovered a couple problems which could easily
have slipped through the betas for example.

In the case of hot standby and replication I'm not really sure that
logic applies. It takes quite a lot of work to test these features and
they don't turn up problems in other areas when you're not running
them. So I doubt it would really have helped in this case.

-- 
greg

Re: Recovery Test Framework

From

David Fetter

Date:

12 January 2009, 12:27:19

On Mon, Jan 12, 2009 at 11:11:20AM -0500, Greg Stark wrote:
> On Mon, Jan 12, 2009 at 9:55 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Simon Riggs <simon@2ndQuadrant.com> writes:
> >> On Mon, 2009-01-12 at 09:04 -0500, Tom Lane wrote:
> >
> > Well, one of the things that makes me uncomfortable is that it's
> > not even clear exactly which set of patches is currently proposed
> > for inclusion.  We've seen a whole lot of URLs fly back and forth,
> > many of them pointing at pages that aren't there a few days later.
> > I've been too busy with non-replication-related patches to pay
> > really close attention, but I certainly don't get the impression
> > that there's a stable set of patches waiting to be applied.
> 
> See this is one of the things which bothers me. I don't see any
> advantage in forcing Simon to stop making improvements -- and there
> are always improvements to be made -- just to make his code seem
> more stable.

Two things to fix this, and several other problems:

1.  Remove the message size limits on -hackers.  They serve no useful
purpose, and they interfere with our development process.  If -hackers
isn't already subscriber-only, now would be the time to make it so.

2.  Start using more git, as many hackers and committers have already
started to do.  This is the kind of situation where CVS just plain
falls down because branching and merging are unmanageably difficult in
it, where in git, they're many-times-a-day operations.

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: Recovery Test Framework

From

Tom Lane

Date:

12 January 2009, 12:34:44

David Fetter <david@fetter.org> writes:
> Two things to fix this, and several other problems:

> 1.  Remove the message size limits on -hackers.  They serve no useful
> purpose, and they interfere with our development process.

Agreed, or at least boost it up a good bit more.

> If -hackers
> isn't already subscriber-only, now would be the time to make it so.

Not sure how that's relevant?

> 2.  Start using more git, as many hackers and committers have already
> started to do.  This is the kind of situation where CVS just plain
> falls down because branching and merging are unmanageably difficult in
> it, where in git, they're many-times-a-day operations.

This is a red herring, unless your proposal also includes making the
master CVS^H^H^Hgit repository world-writable.  The complaint I have
about people posting URLs is that there's no stable archive of what the
patches really were, and just because it came out of someone's local git
repository doesn't help that.
        regards, tom lane

Re: Recovery Test Framework

From

"Joshua D. Drake"

Date:

12 January 2009, 12:38:29

On Mon, 2009-01-12 at 11:18 -0500, Tom Lane wrote:

> > My point is that what Simon currently has (and so what you tested) is
> > different from what is going to be committed (note the "final" in what
> > I wrote) and I suspect there will be a certain number of non
> > negligible adjustments (see the last discussions between Simon and
> > Heikki and I don't think Tom has taken a look at these patches yet).
> 
> The thing that's disturbing me is that (to judge by what I've been
> seeing on the mailing list) there's been a steady stream of "non
> negligible adjustments" for the past two months.  That's good from
> the standpoint that problems are getting found and fixed, but it's
> not giving me any warm fuzzies about the code being ready to go.

This is the same thing that makes me nervous. The feature appears to be
"Under heavy development". As I understand the development model the
heavy development is supposed to happen before commit fest.
> 
> Basically I think we are up against the same type of project management
> decision we've had several times before: are we willing to slip the
> 8.4 release schedule for however long it will take for hot standby
> and the other replication-related features to be ready? 

I would certainly not like to see 8.4 slip.

>  At this point
> I think there can be no question that it will not be a small slip;
> in fact I'm not even prepared to guess at how long it will take.

Not a comforting thought.

Sincerely,

Joshua D Drake


> 
>             regards, tom lane
> 
-- 
PostgreSQL
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997

Re: Recovery Test Framework

From

"Joshua D. Drake"

Date:

12 January 2009, 12:40:01

On Mon, 2009-01-12 at 11:33 -0500, Tom Lane wrote:

> > If -hackers
> > isn't already subscriber-only, now would be the time to make it so.
> 
> Not sure how that's relevant?

So we don't get spam patches. 

Joshua D. Drake


> 
>             regards, tom lane
> 
-- 
PostgreSQL
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997

Re: Recovery Test Framework

From

David Fetter

Date:

12 January 2009, 12:42:51

On Mon, Jan 12, 2009 at 11:33:43AM -0500, Tom Lane wrote:
> David Fetter <david@fetter.org> writes:
> > Two things to fix this, and several other problems:
> 
> > 1.  Remove the message size limits on -hackers.  They serve no
> > useful purpose, and they interfere with our development process.
> 
> Agreed, or at least boost it up a good bit more.
> 
> > If -hackers isn't already subscriber-only, now would be the time
> > to make it so.
> 
> Not sure how that's relevant?

Spam and wackiness.  Consider what Dmitry Turin would do with an
unlimited ability to send his "specs" to -hackers.

> > 2.  Start using more git, as many hackers and committers have
> > already started to do.  This is the kind of situation where CVS
> > just plain falls down because branching and merging are
> > unmanageably difficult in it, where in git, they're
> > many-times-a-day operations.
> 
> This is a red herring, unless your proposal also includes making the
> master CVS^H^H^Hgit repository world-writable.  The complaint I have
> about people posting URLs is that there's no stable archive of what
> the patches really were, and just because it came out of someone's
> local git repository doesn't help that.

The master repository need not be world-writeable, but as many public
ones as needed for development should be.  I'd love for people to use
our infrastructure, but github, etc., would also work.

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: Recovery Test Framework

From

"Guillaume Smet"

Date:

12 January 2009, 12:43:28

On Mon, Jan 12, 2009 at 5:18 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Basically I think we are up against the same type of project management
> decision we've had several times before: are we willing to slip the
> 8.4 release schedule for however long it will take for hot standby
> and the other replication-related features to be ready?  At this point
> I think there can be no question that it will not be a small slip;
> in fact I'm not even prepared to guess at how long it will take.

What I wouldn't like to see is the replication patches becoming
another "Bitmap index on disk" patch. If we release 8.4 and postpone
replication to 8.5, we really need a plan to concentrate the efforts
of the few people capable of making it happen in the very few months
of the 8.5 cycle.

-- 
Guillaume

Re: Recovery Test Framework

From

"Merlin Moncure"

Date:

12 January 2009, 12:47:45

On 1/12/09, Joshua D. Drake <jd@commandprompt.com> wrote:
>
>  > Basically I think we are up against the same type of project management
>  > decision we've had several times before: are we willing to slip the
>  > 8.4 release schedule for however long it will take for hot standby
>  > and the other replication-related features to be ready?
>
>
> I would certainly not like to see 8.4 slip.

8.4 has already slipped.  From my perspective (basically a user's),
the thing is ready to drop in right now...it works wonderfully.  So,

*) Does Simon intend it for the 8.4 release?
*) Is the thing up to production standards?
*) If so, is a patch ready?

merlin

Re: Recovery Test Framework

From

"Dave Page"

Date:

12 January 2009, 12:49:31

On Mon, Jan 12, 2009 at 4:37 PM, Joshua D. Drake <jd@commandprompt.com> wrote:
> On Mon, 2009-01-12 at 11:18 -0500, Tom Lane wrote:
>
>> Basically I think we are up against the same type of project management
>> decision we've had several times before: are we willing to slip the
>> 8.4 release schedule for however long it will take for hot standby
>> and the other replication-related features to be ready?
>
> I would certainly not like to see 8.4 slip.

I would. PostgreSQL is not a commercial application which has to be
released on schedule to satisfy shareholders - it's an Open Source
project that aims to provide its users with useful features. We have
two extremely useful features here (hot standby and sync replication)
which together will make this a killer release for many people - we
can delay a month or two as required to polish and get them ready for
release, or decide we're willing to wait another 12 - 14 months for
them to be available for end users.

I'd much rather see them included than deferred (particularly hot
standby, parts of which have been awaiting review for months now
anyway, through no fault of Simon's).

-- 
Dave Page
EnterpriseDB UK:   http://www.enterprisedb.com

Re: Recovery Test Framework

From

Magnus Hagander

Date:

12 January 2009, 12:51:22


On 12 jan 2009, at 17.42, David Fetter <david@fetter.org> wrote:

> On Mon, Jan 12, 2009 at 11:33:43AM -0500, Tom Lane wrote:
>> David Fetter <david@fetter.org> writes:
>>> Two things to fix this, and several other problems:
>>
>>> 1.  Remove the messages size limits on -hackers.  They serve no
>>> useful purpose, and they interfere with our development process.
>>
>> Agreed, or at least boost it up a good bit more.
>>
>>> If -hackers isn't already subscriber-only, now would be the time
>>> to make it so.
>>
>> Not sure how that's relevant?
>
> Spam and wackiness.  Consider what Dmitry Turin would do with an
> unlimited ability to send his "specs" to -hackers.

It is. Anything submitted from addresses not subscribed is moderated.  
I tested this by mistake today by sending from the wrong address.


>
>>> 2.  Start using more git, as many hackers and committers have
>>> already started to do.  This is the kind of situation where CVS
>>> just plain falls down because branching and merging are
>>> unmanageably difficult in it, where in git, they're
>>> many-times-a-day operations.
>>
>> This is a red herring, unless your proposal also includes making the
>> master CVS^H^H^Hgit repository world-writable.  The complaint I have
>> about people posting URLs is that there's no stable archive of what
>> the patches really were, and just because it came out of someone's
>> local git repository doesn't help that.
>
> The master repository need not be world-writeable, but as many public
> ones as needed for development should be.  I'd love for people to use
> our infrastructure, but github, etc., would also work.

As much as I'm starting to join the "let's move the main repo to git"  
crowd, all you need for what you're suggesting here is a stable git  
mirror on git.postgresql.org.

/Magnus

Re: Recovery Test Framework

From

David Fetter

Date:

12 January 2009, 12:52:48

On Mon, Jan 12, 2009 at 05:50:19PM +0100, Magnus Hagander wrote:
>>>> 2.  Start using more git, as many hackers and committers have
>>>> already started to do.  This is the kind of situation where CVS
>>>> just plain falls down because branching and merging are
>>>> unmanageably difficult in it, where in git, they're
>>>> many-times-a-day operations.
>>>
>>> This is a red herring, unless your proposal also includes making
>>> the master CVS^H^H^Hgit repository world-writable.  The complaint
>>> I have about people posting URLs is that there's no stable archive
>>> of what the patches really were, and just because it came out of
>>> someone's local git repository doesn't help that.
>>
>> The master repository need not be world-writeable, but as many
>> public ones as needed for development should be.  I'd love for
>> people to use our infrastructure, but github, etc., would also
>> work.
>
> As much as I'm starting to join the "let's move the main repo to
> git"  crowd, all you need for what you're suggesting here is a
> stable git  mirror on git.postgresql.org.

Agreed :)

Cheers,
David (happy to help by setting people up on git.postgresql.org).
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: Recovery Test Framework

From

Grzegorz Jaskiewicz

Date:

12 January 2009, 13:03:15

On 2009-01-12, at 16:48, Dave Page wrote:

> On Mon, Jan 12, 2009 at 4:37 PM, Joshua D. Drake  
> <jd@commandprompt.com> wrote:
>> On Mon, 2009-01-12 at 11:18 -0500, Tom Lane wrote:
>>
>>> Basically I think we are up against the same type of project  
>>> management
>>> decision we've had several times before: are we willing to slip the
>>> 8.4 release schedule for however long it will take for hot standby
>>> and the other replication-related features to be ready?
>>
>> I would certainly not like to see 8.4 slip.
>
> I would. PostgreSQL is not a commercial application which has to be
> released on schedule to satisfy shareholders - it's an Open Source
> project that aims to provide its users with useful features. We have
> two extremely useful features here (hot standby and sync replication)
> which together will make this a killer release for many people - we
> can delay a month or two as required to polish and get them ready for
> release, or decide we're willing to wait another 12 - 14 months for
> them to be available for end users.
>
> I'd much rather see them included than deferred (particularly hot
> standby, parts of which have been awaiting review for months now
> anyway, through no fault of Simon's).

+1

Re: Recovery Test Framework

From

"Guillaume Smet"

Date:

12 January 2009, 13:17:36

On Mon, Jan 12, 2009 at 5:48 PM, Dave Page <dpage@pgadmin.org> wrote:
> I would. PostgreSQL is not a commercial application which has to be
> released on schedule to satisfy shareholders - it's an Open Source
> project that aims to provide its users with useful features.

It has nothing to do with commercial/non commercial. It's basically a
decision of time-based releases vs feature-based releases.

For many years now, new versions of PostgreSQL have been released on a
time-based schedule (one version/year in December) even if it was not
a strong decision.

> We have
> two extremely useful features here (hot standby and sync replication)
> which together will make this a killer release for many people - we
> can delay a month or two as required to polish and get them ready for
> release, or decide we're willing to wait another 12 - 14 months for
> them to be available for end users.

IMHO, if it takes 4 months to have these patches in the tree, it's not
worth it: do we accept other patches during this period or not? If so,
on what basis? If not, how are we going to deal with 3-4 months of
patches waiting for review?

Note that delaying 8.4 is also delaying the other features of 8.4
which are ready (new FSM, CTE, windowing functions). Personally,
integrated replication is by far the feature I'm most waiting for, but
I'm not sure it's the case for everyone, especially if they have to
wait 3-4 more months.

It's really a matter of how far we are from having these patches in
their final form (and I mean after reviewing). And as Tom stated, it's
currently hard to know.

IMHO, Simon's proposal to identify which parts especially need
attention is a very good idea. I really think these patches need a
thorough review sooner rather than later: we won't make it happen by
letting Simon write code alone without feedback.

After this first review, we should be able to know if it's a matter of
1 month or 3. Without knowing this, it's hard to make a good decision.

-- 
Guillaume

Re: Recovery Test Framework

From

Tom Lane

Date:

12 January 2009, 13:20:13

"Dave Page" <dpage@pgadmin.org> writes:
> On Mon, Jan 12, 2009 at 4:37 PM, Joshua D. Drake <jd@commandprompt.com> wrote:
>> I would certainly not like to see 8.4 slip.

> I would. PostgreSQL is not a commercial application which has to be
> released on schedule to satisfy shareholders - it's an Open Source
> project that aims to provide its users with useful features. We have
> two extremely useful features here (hot standby and sync replication)
> which together will make this a killer release for many people

Yeah, but there are already a number of things in 8.4 that are killer
features for various applications --- window functions and WITH to take
two recently-committed examples.  Should we sit on those for however
long it will take to make replication release-worthy?

In general, we have always regretted it in the past when we chose to
slip a release waiting for a specific feature...
        regards, tom lane

Re: Recovery Test Framework

From

Alvaro Herrera

Date:

12 January 2009, 13:21:48

Peter Eisentraut wrote:
> Simon Riggs wrote:
>> Recovery doesn't have a test framework as yet.
>
> I have been having these concerns as well.  In fact, I recall  
> discussions at least 8 years back about how pg_dump doesn't really have  
> any organized testing, and we also have little regular testing of PITR  
> aside from specific exercises that users or developers occasionally run.
>
> The question remains how to do it.

A very simple way to start would be to stop emitting shutdown
checkpoints.  This would mean that any server restart would cause
recovery.  This wouldn't be exhaustive, but enough people doing that
should cause at least some bugs to emerge.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: Recovery Test Framework

From

"Joshua D. Drake"

Date:

12 January 2009, 13:22:09

On Mon, 2009-01-12 at 16:48 +0000, Dave Page wrote:
> On Mon, Jan 12, 2009 at 4:37 PM, Joshua D. Drake <jd@commandprompt.com> wrote:
> > On Mon, 2009-01-12 at 11:18 -0500, Tom Lane wrote:
> >
> >> Basically I think we are up against the same type of project management
> >> decision we've had several times before: are we willing to slip the
> >> 8.4 release schedule for however long it will take for hot standby
> >> and the other replication-related features to be ready?
> >
> > I would certainly not like to see 8.4 slip.
> 
> I would. PostgreSQL is not a commercial application which has to be
> released on schedule to satisfy shareholders - it's an Open Source
> project that aims to provide its users with useful features.

The community are our shareholders.

>  We have
> two extremely useful features here (hot standby and sync replication)
> which together will make this a killer release for many people - we

And for others we already have a killer release. 

> can delay a month or two as required to polish and get them ready for
> release, or decide we're willing to wait another 12 - 14 months for
> them to be available for end users.

Right. Except that isn't really the question at hand, is it? The above is
just a potential result of the question at hand. The low level question
is, "do we feel comfortable from a technical (not a whiz bang) level
with the diligence that has been provided this code?"

If we don't, it should be pushed to 8.5. These are "features" not core
requirements of the product. They can wait until another release if need
be. We already have a gargantuan list of whiz bang features in this
release.

IMO, the reasons to delay a release:

We broke autovacuum
MVCC is no longer MVCC
Our grammar looks like MySQL
Constraints no longer constrain

Not:

I want super duper feature.

> 
> I'd much rather see them included than deferred (particularly hot
> standby, parts of which have been awaiting review for months now
> anyway, through no fault of Simon's).
> 

Well it's really nobody's fault except the hacker that didn't step up to
do the work. I believe all hackers have already been working diligently.

Sincerely,

Joshua D. Drake

-- 
PostgreSQL
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997

Re: Recovery Test Framework

From

"Dave Page"

Date:

12 January 2009, 13:27:59

On Mon, Jan 12, 2009 at 5:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Dave Page" <dpage@pgadmin.org> writes:
> Yeah, but there are already a number of things in 8.4 that are killer
> features for various applications --- window functions and WITH to take
> two recently-committed examples.  Should we sit on those for however
> long it will take to make replication release-worthy?

When chatting with 'real' users (as opposed to us lot) at LinuxLive
late last year, sync rep and hot standby were the features they all
seemed to want. The others are excellent additions as well of course,
but most people didn't seem as interested.

> In general, we have always regretted it in the past when we chose to
> slip a release waiting for a specific feature...

I don't recall such a time - though perhaps the last time it happened
was before I was so heavily involved in the release process (ie. 7.x).
What were the reasons for regretting it?

-- 
Dave Page
EnterpriseDB UK:   http://www.enterprisedb.com

Re: Recovery Test Framework

From

"Joshua D. Drake"

Date:

12 January 2009, 13:31:22

On Mon, 2009-01-12 at 17:27 +0000, Dave Page wrote:

> > In general, we have always regretted it in the past when we chose to
> > slip a release waiting for a specific feature...
> 
> I don't recall such a time - though perhaps the last time it happened
> was before I was so heavily involved in the release process (ie. 7.x).
> What were the reasons for regretting it?

8.2 suffered from a horrendous slip.

Joshua D. Drake


-- 
PostgreSQL
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997

Re: Recovery Test Framework

From

"Dave Page"

Date:

12 January 2009, 13:34:38

On Mon, Jan 12, 2009 at 5:20 PM, Joshua D. Drake <jd@commandprompt.com> wrote:

> The community are our shareholders.

Exactly - and their dividends are the features we release, not a share
of profits we make from pushing out something a few weeks earlier.

> Right. Except that isn't really the question at hand is it? The above is
> just a potential result of the question at hand. The low level question
> is, "do we feel comfortable from a technical (not a whiz bang) level
> with the diligence that has been provided this code.

I always feel very confident knowing that it won't be committed until
it's right.

> Well it's really nobody's fault except the hacker that didn't step up to
> do the work. I believe all hackers have already been working diligently.

They have - but I see no reason why an imperfect process should delay
the hard work of developers getting into the hands of users that want
it for 12 months or more. It'll annoy users and potentially alienate
important developers - and there are few enough of them able to work
on features of this complexity as it is.

-- 
Dave Page
EnterpriseDB UK:   http://www.enterprisedb.com

Re: Recovery Test Framework

From

"Robert Haas"

Date:

12 January 2009, 13:37:50

>> 2.  Start using more git, as many hackers and committers have already
>> started to do.  This is the kind of situation where CVS just plain
>> falls down because branching and merging are unmanageably difficult in
>> it, where in git, they're many-times-a-day operations.
>
> This is a red herring, unless your proposal also includes making the
> master CVS^H^H^Hgit repository world-writable.  The complaint I have
> about people posting URLs is that there's no stable archive of what the
> patches really were, and just because it came out of someone's local git
> repository doesn't help that.

No, git really does help with this.  If Simon were making his changes
in git and pushing them to a git branch on git.postgresql.org, you
would be able to see exactly what he changed and when he changed it.
You would therefore be able to assess whether the changes over the
last several months were or were not substantial.  The whole point of
git is that you don't just publish the master branch - everyone can
have their own branches, and they can all be published, and everyone
can see the whole development history of every project for whatever
purpose they care to see it.

git IS a stable archive of what the patches really were.

...Robert

Re: Recovery Test Framework

From

Gregory Stark

Date:

12 January 2009, 13:38:20

Tom Lane <tgl@sss.pgh.pa.us> writes:

> Yeah, but there are already a number of things in 8.4 that are killer
> features for various applications --- window functions and WITH to take
> two recently-committed examples.  Should we sit on those for however
> long it will take to make replication release-worthy?

Do we know it's not release-worthy now? From what I see Heikki is proposing
refactorings which improve the code but hasn't found anything actually broken.
I'm all for cleaner simpler code -- especially in critical backup processes
since simpler means safer -- but just because there are better ways to do
things doesn't mean the current code isn't acceptable.

-- 
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training!

Re: Recovery Test Framework

From

"Christopher Browne"

Date:

12 January 2009, 13:44:11

On Mon, Jan 12, 2009 at 12:27 PM, Dave Page <dpage@pgadmin.org> wrote:
> On Mon, Jan 12, 2009 at 5:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> In general, we have always regretted it in the past when we chose to
>> slip a release waiting for a specific feature...
>
> I don't recall such a time - though perhaps the last time it happened
> was before I was so heavily involved in the release process (ie. 7.x).
> What were the reasons for regretting it?

I seem to recall us deferring 8.1 (was it 8.1?) for a while on the
basis that we were waiting for [something I don't recall offhand].

The feature that we were *hoping* to get wound up dropped on the floor
because it just wasn't ready, even *with* the extra time.

A good reason to regret the slippage would be if that slippage didn't
allow bringing in the new feature.

That does seem pretty relevant here: it sure would be a shame if we
put off 8.4 because we *hoped* extra time would allow us to get hot
standby into place, only to find that it wasn't ready because it
*really* Wasn't Ready.  We would have several things to regret:

Irrespective of the state of Hot Standby, we would have the following regrets:
- 8.4 would be put off, which is of nonzero cost
- 8.5 would also be put off
- Features deferred for 8.5 would be put off further
- Developers working on features other than Hot Standby have the
  fruits of their efforts deferred

If Hot Standby turns out to be "8.4-worthy", then there is an anti-regret:
- We get Hot Standby in 8.4

On the other hand, if Hot Standby turns out to *not* be ready enough,
then add the further regret:
- We waited, and deferred more functionality to 8.5, without any attendant gain

Each one of the "regrets" is fairly material.  I'm not arguing (at the
moment) against taking time to put in Hot Standby, but there is
certainly grist available to make that argument.
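
The cases above can be written down as a small sketch. This is only an illustration of the argument in this message: the function name `outcome_of_slipping` and the wording of the strings are made up for the example, and nothing here weighs how costly each regret actually is.

```python
# Regrets that apply whenever 8.4 is slipped, whatever happens to Hot Standby.
ALWAYS = [
    "8.4 is put off",
    "8.5 is also put off",
    "features deferred for 8.5 are put off further",
    "work on features other than Hot Standby ships later",
]


def outcome_of_slipping(hot_standby_ready: bool) -> dict[str, list[str]]:
    """Return the regrets and gains of slipping 8.4 (hypothetical helper)."""
    regrets = list(ALWAYS)
    gains = []
    if hot_standby_ready:
        # The one "anti-regret": the feature makes the release.
        gains.append("Hot Standby ships in 8.4")
    else:
        # The extra regret: the wait bought nothing.
        regrets.append("waited and deferred more to 8.5 without any gain")
    return {"regrets": regrets, "gains": gains}


for ready in (True, False):
    result = outcome_of_slipping(ready)
    print(ready, len(result["regrets"]), len(result["gains"]))
```

Either way the four baseline regrets are paid; the only thing that varies is whether they buy Hot Standby or one more regret.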
-- 
http://linuxfinances.info/info/linuxdistributions.html
Eddie Izzard  - "I grew up in Europe, where the history comes from."

Re: Recovery Test Framework

From

Simon Riggs

Date:

12 January 2009, 13:45:02

On Mon, 2009-01-12 at 08:37 -0800, Joshua D. Drake wrote:
> On Mon, 2009-01-12 at 11:18 -0500, Tom Lane wrote:
> 
> > > My point is that what Simon currently has (and so what you tested) is
> > > different from what is going to be committed (note the "final" in what
> > > I wrote) and I suspect there will be a certain number of non
> > > negligible adjustments (see the last discussions between Simon and
> > > Heikki and I don't think Tom has taken a look at these patches yet).
> > 
> > The thing that's disturbing me is that (to judge by what I've been
> > seeing on the mailing list) there's been a steady stream of "non
> > negligible adjustments" for the past two months.  That's good from
> > the standpoint that problems are getting found and fixed, but it's
> > not giving me any warm fuzzies about the code being ready to go.
> 
> This is the same thing that makes me nervous. The feature appears to be
> "Under heavy development". As I understand the development model the
> heavy development is supposed to happen before commit fest.

That's simply not true.

So far there have been 15 resolved issues, shown on the wiki:
2 correctness issues, critical if not resolvable, but easily done
1 trivial porting issue
1 minor issue related to disallowed read-write modes
1 minor refactoring
3 minor issues resolved to no action
6 bugs, all easily diagnosed and fixed

plus

1 major refactoring, that required significant examination of internals
to achieve, but has not changed anything that we now agree is
significant (that also was a discussion point). Much of the difficulty
arose because the design needs to be both correct and performant, which
I believe it is.

So far this patch has had much less refactoring than async commit, which
reached version > 22 before it was committed. 

There are 322 separate code chunks touching 69 files, with more than
9000 lines of code (>11000 including patch diffs). In percentage terms,
that's pretty good "code quality".
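
As a quick cross-check of the figures above, here is a small sketch. It assumes the 15 resolved issues include the major refactoring listed under "plus", and it takes 9000 lines as the patch size, which is a lower bound given "more than 9000". The dictionary keys are paraphrases and the variable names are made up for the example; reading "percentage terms" as issues per line of code is one interpretation, not something the message states.

```python
# Resolved-issue categories as itemized in the message.
resolved = {
    "correctness issues": 2,
    "trivial porting issue": 1,
    "minor issue (disallowed read-write modes)": 1,
    "minor refactoring": 1,
    "minor issues resolved to no action": 3,
    "bugs": 6,
}
major_refactorings = 1  # listed separately under "plus"

itemized = sum(resolved.values())
total = itemized + major_refactorings

# Assumed reading of "in percentage terms": resolved issues per line of code.
lines_of_code = 9000  # "more than 9000", so the true ratio is a bit lower
issues_per_100_lines = round(total / lines_of_code * 100, 2)

print(itemized, total, issues_per_100_lines)  # 14 15 0.17
```

On that reading, the tally is consistent (14 itemized plus 1 major refactoring gives 15), and the rate works out to at most roughly 0.17 resolved issues per 100 lines.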

There are two main outstanding tasks, which will be complete over the
next couple of days.

* prepared transactions - left because I knew Heikki wanted to refactor
things in a way that would rip out any code written for that part

* usability issues related to how/when queries get cancelled. That came
out of discussion during review and was not an outstanding item before
the deadline.

There have been no changes to the main mechanisms in the patch, nor in
the user interface related parts. To be honest we might reasonably have
expected *more* change to have taken place than has done.

Some dev might sound like heavy lifting, because the code often involves
the internal mechanics of the transaction system, but this is hardly
either my or Heikki's first foray into that section of code and we're
used to hitting problems and solving them.

Anyway, I'm actually going to turn off my email for a few days. Talking
won't get it done and that way I can pretend you're all sending me
positive waves (not to pick specifically on you Josh).

-- 
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support

Re: Recovery Test Framework

From

Ron Mayer

Date:

12 January 2009, 14:01:36

Robert Haas wrote:
>>> 2.  Start using more git...
>> This is a red herring, unless your proposal also includes making the
>> master CVS^H^H^Hgit repository world-writable.  The complaint I have
>> about people posting URLs is that there's no stable archive of what the
>> patches really were, and just because it came out of someone's local git
>> repository doesn't help that.
> 
> No, git really does help with this.  ... 
> git IS a stable archive of what the patches really were.

Sorry to re-ignite the flame war, but this is the
*perfect* example of the single most compelling advantage of git over CVS.

All of Simon's history remains visible in git on his branch.

Better - any patches submitted to Simon by code reviewers that
Simon accepts (pulls) into his branch - can also be seen on
branches off of Simon's branch with the complete history of where
they came from.

When/if the patch eventually gets accepted into the master, as
much (or as little, thanks to git-rebase) of the history of
that branch can be pulled along with it, as can be seen with the
major merges of linux branches here:
http://repo.or.cz/git-browser/by-commit.html?r=linux-2.6.git

There's no need for the master git to be world-writable.   The
few with write access choose exactly how much history from Simon's
branch (and from the code reviewers' branches) they want to merge in
when they pull&merge from his branch.

Re: Recovery Test Framework

From

Tom Lane

Date:

12 January 2009, 14:05:18

"Dave Page" <dpage@pgadmin.org> writes:
> On Mon, Jan 12, 2009 at 5:20 PM, Joshua D. Drake <jd@commandprompt.com> wrote:
>> Well it's really nobody's fault except the hacker that didn't step up to
>> do the work. I believe all hackers have already been working diligently.

> They have - but I see no reason why an imperfect process should delay
> the hard work of developers getting into the hands of users that want
> it for 12 months or more.

How is it that this argument applies only to work not yet done, as
opposed to work that was already done and committed over the past 12
months?

> It'll annoy users and potentially alienate
> important developers - and there are few enough of them able to work
> on features of this complexity as it is.

Well, we can alienate developers who get annoyed because we won't slip
the schedule for their convenience, or we can alienate the ones who met
the agreed-on schedule and don't get to see their work shipped in a
timely fashion.

Really it was possible to foresee this coming months ago.
We knew when we posted
http://archives.postgresql.org/pgsql-hackers/2008-05/msg00913.php
that it was very ambitious to hope for working replication in 8.4.
Then basically nothing happened all summer; Simon didn't ramp up
his effort until around September IIRC.  He's done yeoman work
since then, but it can hardly be surprising that we're faced with
a slip-or-cut-the-feature decision now.
        regards, tom lane

Re: Recovery Test Framework

From

"Dave Page"

Date:

12 January 2009, 14:07:16

On Mon, Jan 12, 2009 at 5:30 PM, Joshua D. Drake <jd@commandprompt.com> wrote:
> On Mon, 2009-01-12 at 17:27 +0000, Dave Page wrote:
>
>> > In general, we have always regretted it in the past when we chose to
>> > slip a release waiting for a specific feature...
>>
>> I don't recall such a time - though perhaps the last time it happened
>> was before I was so heavily involved in the release process (ie. 7.x).
>> What were the reasons for regretting it?
>
> 8.2 suffered from horrendous slip.

It wasn't delayed for a specific feature though was it? Just because
there was so much in the queue it took far longer than planned. Plus
it was originally intended as a half-length cycle which clearly didn't
work for a number of reasons.


-- 
Dave Page
EnterpriseDB UK:   http://www.enterprisedb.com

Re: Recovery Test Framework

From

Tom Lane

Date:

12 January 2009, 14:08:55

"Robert Haas" <robertmhaas@gmail.com> writes:
>> This is a red herring, unless your proposal also includes making the
>> master CVS^H^H^Hgit repository world-writable.  The complaint I have
>> about people posting URLs is that there's no stable archive of what the
>> patches really were, and just because it came out of someone's local git
>> repository doesn't help that.

> No, git really does help with this.  If Simon were making his changes
> in git and pushing them to a git branch on git.postgresql.org, you
> would be able to see exactly what he changed and when he changed it.

Well, if that's actually an archival repository then it would work.
But wasn't I just reading something about having to wipe that repository
and re-import the CVS history to fix various problems?

(In any case, the URLs I'm complaining of weren't pointing at
git.postgresql.org, but various private servers or wiki pages.)
        regards, tom lane

Re: Recovery Test Framework

From

Tom Lane

Date:

12 January 2009, 14:16:19

"Christopher Browne" <cbbrowne@gmail.com> writes:
> On Mon, Jan 12, 2009 at 12:27 PM, Dave Page <dpage@pgadmin.org> wrote:
>> On Mon, Jan 12, 2009 at 5:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> In general, we have always regretted it in the past when we chose to
>>> slip a release waiting for a specific feature...
>> 
>> I don't recall such a time - though perhaps the last time it happened
>> was before I was so heavily involved in the release process (ie. 7.x).
>> What were the reasons for regretting it?

> I seem to recall us deferring 8.1 (was it 8.1?) for a while on the
> basis that we were waiting for [something I don't recall offhand].
> The feature that we were *hoping* to get wound up dropped on the floor
> because it just wasn't ready, even *with* the extra time.

That's happened more than once, though my memory of details is fuzzy
and I don't have time to troll the archives for them right now.
Maybe Bruce can remember without a lot of searching.  But our current
policy of time-based releases (ie deadlines) is born of hard experience
with the negative consequences of saying "we'll release when feature X
is ready".  The real killer disadvantage is that all other development
tends to stop until X is ready, because no one can plan anything.
        regards, tom lane

Re: Recovery Test Framework

From

"Robert Haas"

Date:

12 January 2009, 14:23:46

>> No, git really does help with this.  If Simon were making his changes
>> in git and pushing them to a git branch on git.postgresql.org, you
>> would be able to see exactly what he changed and when he changed it.
>
> Well, if that's actually an archival repository then it would work.
> But wasn't I just reading something about having to wipe that repository
> and re-import the CVS history to fix various problems?

Not sure; I hope not.  I think we'd be well-served by getting rid of
CVS permanently and using git for the master branch.  That would
eliminate the possibility of git reading a partial commit from CVS and
any other possible issues of needing to go back and reconstruct git
things based on unexpected wankage in the CVS repository.  We could
keep the list of committers exactly the same as what it is now; they'd
just be people with rights to push the master git branch rather than
rights to commit to CVS.

I am sure this would involve a fair amount of work but I think it
would be worth it, and I'd be willing to help with the doing of it.  I
have resisted learning git for a while but I've come around.  I'm even
switching to git for development work I do for my employer, where I'm
the only developer, because it's just so much easier to do work on a
branch and then merge it than it is with CVS or SVN.

> (In any case, the URLs I'm complaining of weren't pointing at
> git.postgresql.org, but various private servers or wiki pages.)

Agreed.

...Robert

Re: Recovery Test Framework

From

"Robert Haas"

Date:

12 January 2009, 14:32:13

> That's happened more than once, though my memory of details is fuzzy
> and I don't have time to troll the archives for them right now.
> Maybe Bruce can remember without a lot of searching.  But our current
> policy of time-based releases (ie deadlines) is born of hard experience
> with the negative consequences of saying "we'll release when feature X
> is ready".  The real killer disadvantage is that all other development
> tends to stop until X is ready, because no one can plan anything.

This is a very reasonable concern, and a good policy.  But I would
feel better about the application of it to this particular case if
you, personally, spent a couple of hours reviewing the patches at
issue and expressed an opinion about how close they are to being ready
to commit.  I doubt that many of us would care to substitute our
judgment for yours - but it would be a shame to bump them to 8.5
needlessly.

One thing I find interesting is that the "Infrastructure Changes for
Recovery" patch became the foundation for both "Hot Standby" and
"Synchronous Replication".  That implies that those changes might be
of somewhat more general interest, at least as the foundation for
further work.  If HS and/or SR are out of reach, it might be worth
at least looking to see if any of that infrastructure work can
reasonably be committed for 8.4.

...Robert

Re: Recovery Test Framework

From

"Joshua D. Drake"

Date:

12 January 2009, 14:33:51

On Mon, 2009-01-12 at 13:23 -0500, Robert Haas wrote:
> >> No, git really does help with this.  If Simon were making his changes
> >> in git and pushing them to a git branch on git.postgresql.org, you
> >> would be able to see exactly what he changed and when he changed it.
> >
> > Well, if that's actually an archival repository then it would work.
> > But wasn't I just reading something about having to wipe that repository
> > and re-import the CVS history to fix various problems?
> 
> Not sure; I hope not. 

Actually yes we did. There was a bug in git-cvs that we fixed. It is
talked about here:

http://archives.postgresql.org/pgsql-www/2008-12/msg00182.php

But... that wasn't really the fault of git.

>  I think we'd be well-served by getting rid of
> CVS permanently and using git for the master branch.  That would
> eliminate the possibility of git reading a partial commit from CVS and
> any other possible issues of needing to go back and reconstruct git
> things based on unexpected wankage in the CVS repository.  We could
> keep the list of committers exactly the same as what it is now; they'd
> just be people with rights to push the master git branch rather than
> rights to commit to CVS.
> 

There are specific problems with git that people should be aware of
before we start the idea of migrating fully to it. The most bothersome to
me is that you must check out the ENTIRE repo. It isn't possible to say:

git clone https://git.postgresql.org/postgresql/7.3

It is all or nothing. I know why this is but that doesn't mean I like
it :)

> I am sure this would involve a fair amount of work but I think it

Actually the work is relatively minimal as we have git infrastructure in
place. The larger problem is:

What is the problem we are trying to solve?
Does git actually solve it?

Sincerely,

Joshua D. Drake

-- 
PostgreSQL
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997

Re: Recovery Test Framework

From

Simon Riggs

Date:

12 January 2009, 14:41:03

On Mon, 2009-01-12 at 13:04 -0500, Tom Lane wrote:

> Simon didn't ramp up
> his effort until around September IIRC.  

The main topic of snapshot creation was being discussed at PGcon in May
and other sponsors got serious then. I started working on a coherent
detailed design in July, but didn't publish for another month while I
waited for sponsors to confirm. The shape of the patch was coming
together over previous months.

I published the "infrastructure" patch on 1 Sept, which is about 35% of
current patch, deliberately to allow Sync Rep to integrate.

> He's done yeoman work since then, 

Thanks.

> but it can hardly be surprising that we're faced with
> a slip-or-cut-the-feature decision now.

It was always going to be tight; I said as much to Bruce last February
and reconfirmed that in May after talking with Suzuki-san's team. I
could have ignored Sync Rep completely and made it less tight for myself
and more difficult for the project.

How much are we going to slip by? Async Commit was committed to CVS on 
1 Aug 2007, exactly 4 months after deadline for 8.3. At the moment we are
2.4 months past deadline on 8.4. We hoped 8.4 would be different: it has
been - we have got much much more done in the same time, just we have a
long tail again.

I think we should just deal with things as they are now and put in a
rule for next time that we auto-chop/no discussion any big patches
submitted on last commit fest that haven't already had feedback from
earlier fests.

-- 
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support

Re: Recovery Test Framework

From

Heikki Linnakangas

Date:

12 January 2009, 15:06:09

Robert Haas wrote:
>> That's happened more than once, though my memory of details is fuzzy
>> and I don't have time to troll the archives for them right now.
>> Maybe Bruce can remember without a lot of searching.  But our current
>> policy of time-based releases (ie deadlines) is born of hard experience
>> with the negative consequences of saying "we'll release when feature X
>> is ready".  The real killer disadvantage is that all other development
>> tends to stop until X is ready, because no one can plan anything.
> 
> This is a very reasonable concern, and a good policy.  But I would
> feel better about the application of it to this particular case if
> you, personally, spent a couple of hours reviewing the patches at
> issue and expressed an opinion about how close they are to being ready
> to commit.  I doubt that many of us would care to substitute our
> judgment for yours - but it would be a shame to bump them to 8.5
> needlessly.

Well, I've been keeping an eye on both Hot Standby and Synchronous 
Replication patches. IMHO the Hot Standby patch is architecturally 
sound, and while I suggested some pretty big changes just recently 
(which Simon picked up and did already), it's in pretty good shape. No 
doubt there's still some issues that haven't been uncovered, comments to 
be fixed, documentation to be written, but no showstoppers or anything 
that requires a major rewrite. There's one todo item left: prepared 
transactions, but I don't think there's anything fundamentally hard 
about them, just needs to be fixed. Simon mentioned usability issues 
related to who/when queries get cancelled, but I think we've discussed 
that to death already and the patch handles it quite nicely.

IMHO, the synchronous replication isn't in such good shape, I'm afraid. 
I've said this before, but I'm not happy with the "built from spare 
parts" nature of it. You shouldn't have to configure an archive, 
file-based log shipping using rsync or whatever, and pg_standby. All 
that is in addition to the direct connection between master and slave. 
The slave really should be able to just connect to the master, and 
download all the WAL it needs directly. That's a huge usability issue if 
left as is, but requires very large architectural changes to fix.

> One thing I find interesting is that the "Infrastructure Changes for
> Recovery" patch became the foundation for both "Hot Standby" and
> "Synchronous Replication".  That implies that those changes might be
> of somewhat more general interest, at least as the foundation for
> further work.  If HS and/or SR are out of reach, it might be worth
> at least looking to see if any of that infrastructure work can
> reasonably be committed for 8.4.

Yeah, being able to do an online checkpoint after recovery has some 
value of its own.

-- 
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: Recovery Test Framework

From

Heikki Linnakangas

Date:

12 January 2009, 15:12:52

Robert Haas wrote:
> git IS a stable archive of what the patches really were.

No. A developer can delete, move and rebase branches in his own 
repository as he likes, and all of those operations "modify history". In 
fact, a developer can completely destroy or take offline his published 
repository. It's *not* an archive.

There's other reasons why I like git very much over cvs, but archiving 
is not one of them.

-- 
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: Recovery Test Framework

From

"Joshua D. Drake"

Date:

12 January 2009, 15:18:26

On Mon, 2009-01-12 at 20:43 +0200, Heikki Linnakangas wrote:
> Robert Haas wrote:
> > git IS a stable archive of what the patches really were.
> 
> No. A developer can delete, move and rebase branches in his own 
> repository as he likes, and all of those operations "modify history". In 
> fact, a developer can completely destroy or take offline his published 
> repository. It's *not* an archive.

Yes but I have to pull the whole repo to do it is my point. I can't just
pull down the 8.3 branch. I have to pull down the whole tree and then
work on 8.3.

SVN on the other hand, if I only want to work on trunk, I can check out
trunk and only work (and commit) into trunk.

> 
> There's other reasons why I like git very much over cvs, but archiving 
> is not one of them.

Oh don't get me wrong. I am not a CVS user on any level except with
PostgreSQL.

Joshua D. Drake

> 
> -- 
>    Heikki Linnakangas
>    EnterpriseDB   http://www.enterprisedb.com
> 
-- 
PostgreSQL
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997

Re: Recovery Test Framework

From

"Dave Page"

Date:

12 January 2009, 15:21:19

On Mon, Jan 12, 2009 at 6:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Dave Page" <dpage@pgadmin.org> writes:
>> On Mon, Jan 12, 2009 at 5:20 PM, Joshua D. Drake <jd@commandprompt.com> wrote:
>>> Well its really nobody's fault except the hacker that didn't step up to
>>> do the work. I believe all hackers have already been working diligently.
>
>> They have - but I see no reason why an imperfect process should delay
>> the hard work of developers getting into the hands of users that want
>> it for 12 months or more.
>
> How is it that this argument applies only to work not yet done, as
> opposed to work that was already done and committed over the past 12
> months?

It doesn't - but those whose work has been committed haven't suffered
due to the process.

> Really it was possible to foresee this coming months ago.
> We knew when we posted
> http://archives.postgresql.org/pgsql-hackers/2008-05/msg00913.php
> that it was very ambitious to hope for working replication in 8.4.
> Then basically nothing happened all summer; Simon didn't ramp up
> his effort until around September IIRC.  He's done yeoman work
> since then, but it can hardly be surprising that we're faced with
> a slip-or-cut-the-feature decision now.

Simon wasn't working on replication. He's been doing hot standby which
has been feature-complete (bar the 2PC stuff which I believe Heikki
wanted to hack about in some way) since some time before feature
freeze. At this time it's being reviewed and refactored/debugged as a
result of the feedback he's received which is precisely what feature
freeze is for.

The async replication I believe is not in such good shape, having been
submitted in a working, but primitive form immediately prior to
feature freeze. Although I'd love to see it included in 8.4 (in a form
meeting our normal quality requirements of course), I can appreciate
it should be bumped if it's not practical to bring it up to par in a
reasonable timeframe. I don't believe that decision should be made
until it has had a good first review by a couple of committers who can
assess what might be required. If it's felt it can then be whipped
into shape with a minor delay to the release, then I think it's worth
the wait.

-- 
Dave Page
EnterpriseDB UK:   http://www.enterprisedb.com

Re: Recovery Test Framework

From

Aidan Van Dyk

Date:

12 January 2009, 15:31:36

* Joshua D. Drake <jd@commandprompt.com> [090112 14:22]:
> > No. A developer can delete, move and rebase branches in his own 
> > repository as he likes, and all of those operations "modify history". In 
> > fact, a developer can completely destroy or take offline his published 
> > repository. It's *not* an archive.
> 
> Yes but I have to pull the whole repo to do it is my point. I can't just
> pull down the 8.3 branch. I have to pull down the whole tree and then
> work on 8.3.

Not correct.  Please, if you're going to say what git "does", please
make sure it's correct.  I'm sure people would scream if I said that the
SVN forced you to check out all of /trunk /branches and /tags (i.e. the
"root" of your SVN repo) into a directory structure simultaneously.

With git, you pull down the complete *history* of whatever branch, tag,
or reference you want to pull down.  The *default* "clone" options are
setup to pull down the history of all available branches and tags, but
that's not mandatory.
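
As a rough illustration of what "the history of whatever branch you want" means, here is a toy model in Python. It is only a sketch and not how git actually stores or transfers objects; the commit names, the PARENTS and BRANCHES maps, and both helper functions are invented for the example. The point is that asking for one branch only needs the commits reachable from that branch's tip.

```python
# Toy commit graph: each commit maps to its list of parent commits.
# All names here are hypothetical; this is not git's real object store.
PARENTS = {
    "c1": [],
    "c2": ["c1"],
    "c3": ["c2"],  # tip of a hypothetical "REL8_3" branch
    "c4": ["c2"],
    "c5": ["c4"],  # tip of a hypothetical "master" branch
}
BRANCHES = {"REL8_3": "c3", "master": "c5"}


def history(tip, parents):
    """Return the set of commits reachable from tip (the branch's history)."""
    seen = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents[commit])
    return seen


def commits_to_fetch(wanted_branches):
    """Commits needed when only the named branches are requested."""
    needed = set()
    for name in wanted_branches:
        needed |= history(BRANCHES[name], PARENTS)
    return needed
```

In this toy graph, commits_to_fetch(["REL8_3"]) leaves out c4 and c5, which exist only on the other branch, while asking for both branches returns every commit.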

And the benefit of having the whole history of the branch available, is
that you can work on the branch *and history* locally, committing,
inspecting, reviewing, without needing to go back over the net.

And to top it off, the history in git is usually *smaller* (takes up
less space) than the .svn of a SVN checkout...

> SVN on the other hand, if I only want to work on trunk, I can check out
> trunk and only work (and commit) into trunk.

Except when you're busy waiting for the slow SVN operations to do stuff
over the network, you can't do anything... Not to mention merging
repeatedly, rebasing, etc ;-)

-- 
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.

Re: Recovery Test Framework

From

"Robert Haas"

Date:

12 January 2009, 15:31:38

> Actually yes we did. There was a bug in git-cvs that we fixed. It is
> talked about here:
>
> http://archives.postgresql.org/pgsql-www/2008-12/msg00182.php
>
> But... that wasn't really the fault of git.

OK, but that's in the past now - good.  I thought Tom was saying that
it might need to be done again.

> There are specific problems with git that people should be aware of
> before we start the idea of migrating fully to it. The most bothersome to
> me is that you must check out the ENTIRE repo. It isn't possible to say:

I agree.  It's possible that this might change in the future - git has
come a long way in a short time.  But I'm not betting on it.

> Actually the work is relatively minimal as we have git infrastructure in
> place. The larger problem is:
>
> What is the problem we are trying to solve?
> Does git actually solve it?

I think the problems it would solve for us are (1) emailing huge
patches around sucks (it sucks unnecessarily because of the
mailing-list size limit, but even if someone fixes that, it still
sucks), (2) no need for a CVS-to-GIT conversion that may incur dirty
reads; (3) retention of history and authorship when merging patches
into core.  It's possible that it might change our workflow in other
ways too, but even if we got only those three things I think that
would be pretty nice.

...Robert

Re: Recovery Test Framework

From

"Joshua D. Drake"

Date:

12 January 2009, 15:35:11

On Mon, 2009-01-12 at 14:31 -0500, Robert Haas wrote:
> > Actually yes we did. There was a bug in git-cvs that we fixed. It is
> > talked about here:

> > Actually the work is relatively minimal as we have git infrastructure in
> > place. The larger problem is:
> >
> > What is the problem we are trying to solve?
> > Does git actually solve it?
> 
> I think the problems it would solve for us are (1) emailing huge
> patches around sucks (it sucks unnecessarily because of the
> mailing-list size limit, but even if someone fixes that, it still
> sucks), (2) no need for a CVS-to-GIT conversion that may incur dirty
> reads; (3) retention of history and authorship when merging patches
> into core.  It's possible that it might change our workflow in other
> ways too, but even if we got only those three things I think that

O.k. now the second part :)

Do bzr, Mercurial or monotone offer the same or better solution? Bzr in
particular is in very wide use and I run into Mercurial all the time.

Sincerely,
Joshua D. Drake

-- 
PostgreSQL
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997

Re: Recovery Test Framework

From

"Robert Haas"

Date:

12 January 2009, 15:36:19

On Mon, Jan 12, 2009 at 1:43 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> Robert Haas wrote:
>> git IS a stable archive of what the patches really were.
>
> No. A developer can delete, move and rebase branches in his own repository
> as he likes, and all of those operations "modify history". In fact, a
> developer can completely destroy or take offline his published repository.
> It's *not* an archive.
>
> There's other reasons why I like git very much over cvs, but archiving is
> not one of them.

s/IS/CAN BE/, then.

CVS history can be rewritten, too; it's just harder.  We can make a
policy that branches once pushed to git.postgresql.org are not to be
rebased; that's recommended practice with git anyway.  I'm not sure
off the top of my head how hard it would be to enforce this in code;
you'd just need to enforce that 'git push' only ever did a
fast-forward.
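
To spell out what "only ever did a fast-forward" would mean as a check, here is a minimal sketch in Python. It is a toy model and not git's implementation: the `parents` map, the commit names, and the `accept_push` helper are all hypothetical. A push is a fast-forward when the branch's current tip is an ancestor of the proposed new tip, so nothing already published gets dropped.

```python
def is_ancestor(ancestor, descendant, parents):
    """True if `ancestor` is reachable from `descendant` via parent links."""
    seen = set()
    stack = [descendant]
    while stack:
        commit = stack.pop()
        if commit == ancestor:
            return True
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, []))
    return False


def accept_push(old_tip, new_tip, parents):
    """Accept a push only if it is a fast-forward of the existing branch."""
    if old_tip is None:  # brand-new branch: there is no history to rewrite
        return True
    return is_ancestor(old_tip, new_tip, parents)


# Hypothetical history: a -> b -> c, plus b2 as a rebased rewrite of b.
parents = {"a": [], "b": ["a"], "c": ["b"], "b2": ["a"]}
```

With this history, accept_push("b", "c", parents) passes because c builds on b, while accept_push("b", "b2", parents) is refused because b2 no longer contains b.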

...Robert

Re: Recovery Test Framework

From

"Joshua D. Drake"

Date:

12 January 2009, 15:39:08

On Mon, 2009-01-12 at 14:33 -0500, Aidan Van Dyk wrote:
> * Joshua D. Drake <jd@commandprompt.com> [090112 14:22]:
> > > No. A developer can delete, move and rebase branches in his own 
> > > repository as he likes, and all of those operations "modify history". In 
> > > fact, a developer can completely destroy or take offline his published 
> > > repository. It's *not* an archive.
> > 
> > Yes but I have to pull the whole repo to do it is my point. I can't just
> > pull down the 8.3 branch. I have to pull down the whole tree and then
> > work on 8.3.
> 
> Not correct.  Please, if you're going to say what git "does", please
> make sure it's correct.  I'm sure people would scream if I said that the
> > SVN forced you to check out all of /trunk /branches and /tags (i.e. the
> "root" of your SVN repo) into a directory structure simultaneously.

They would fall on deaf ears or perhaps on their own flame thrower if
they did.

> 
> With git, you pull down the complete *history* of whatever branch, tag,
> or reference you want to pull down.  The *default* "clone" options are
> setup to pull down the history of all available branches and tags, but
> that's not mandatory.

Oh! O.k. glad to hear it. Then I was misinformed and I am glad that I
now know better.

> And to top it off, the history in git is usually *smaller* (takes up
> less space) than the .svn of a SVN checkout...
> 
> > SVN on the other hand, if I only want to work on trunk, I can check out
> > trunk and only work (and commit) into trunk.
> 
> Except when you're busy waiting for the slow SVN operations to do stuff
> over the network, you can't do anything... Not to mention merging
> repeatedly, rebasing,
> 

I am not suggesting that we move to SVN; you don't have to start a "git is
holier than SVN" argument.

Sincerely,

Joshua D. Drake

-- 
PostgreSQL
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997

Re: Recovery Test Framework

From

David Fetter

Date:

12 January 2009, 15:46:32

On Mon, Jan 12, 2009 at 02:36:08PM -0500, Robert Haas wrote:
> On Mon, Jan 12, 2009 at 1:43 PM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
> > Robert Haas wrote:
> >> git IS a stable archive of what the patches really were.
> >
> > No. A developer can delete, move and rebase branches in his own
> > repository as he likes, and all of those operations "modify
> > history". In fact, a developer can completely destroy or take
> > offline his published repository.  It's *not* an archive.
> >
> > There's other reasons why I like git very much over cvs, but
> > archiving is not one of them.
> 
> s/IS/CAN BE/, then.
> 
> CVS history can be rewritten, too; it's just harder.  We can make a
> policy that branches once pushed to git.postgresql.org are not to be
> rebased; that's recommended practice with git anyway.  I'm not sure
> off the top of my head how hard it would be to enforce this in code;
> you'd just need to enforce that 'git push' only ever did a
> fast-forward.

We could do this using git's configuration:

http://www.kernel.org/pub/software/scm/git/docs/git-config.html

See receive.denyNonFastForwards, which is built for just this purpose :)

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: Recovery Test Framework

From

"Robert Haas"

Date:

12 January 2009, 15:49:18

>> I think the problems it would solve for us are (1) emailing huge
>> patches around sucks (it sucks unnecessarily because of the
>> mailing-list size limit, but even if someone fixes that, it still
>> sucks), (2) no need for a CVS-to-GIT conversion that may incur dirty
>> reads; (3) retention of history and authorship when merging patches
>> into core.  It's possible that it might change our workflow in other
>> ways too, but even if we got only those three things I think that
>
> O.k. now the second part :)
>
> Do bzr, Mercurial or monotone offer the same or better solution? Bzr in
> particular is in very wide use and I run into Mercurial all the time.

I'm sure we could make any of them work, but the fact that
git.postgresql.org already exists and hg.postgresql.org,
bzr.postgresql.org, etc. do not may be suggestive of something - that
it would be less work to finish the job, if nothing else.  I also
think there is some evidence to suggest that git is evolving very
rapidly.  You could view that as a negative thing, but I don't mean
they're fixing bugs: I mean they're adding features.  That suggests
both that (1) if git doesn't have feature X that you want now, it's
likely to have it in the not-too-distant future and (2) a lot of other
people like git well enough that they're both using it themselves and
contributing changes back to the community, which is an endorsement of
the product generally.

git definitely has some downsides.  The initial repository clone is
kind of expensive, and there's a learning curve (though it's much
flatter than it was in the past).  So I don't think it's the greatest
thing ever.  But I think it's pretty good, and based on the spike in
a/... and b/... paths in recent patches, I'm not the only one.

...Robert

Re: Recovery Test Framework

From

"Jaime Casanova"

Date:

12 January 2009, 16:29:50

On Mon, Jan 12, 2009 at 12:20 PM, Joshua D. Drake <jd@commandprompt.com> wrote:
>
> IMO, the reasons to delay a release:
>
> Our grammar looks like MySQL
>

mmm... you mean if we add things like VALUES statement, lastval() and
things like that? ;)

--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157

Re: Recovery Test Framework

From

Stefan Kaltenbrunner

Date:

12 January 2009, 16:35:32

Tom Lane wrote:
> David Fetter <david@fetter.org> writes:
>> Two things to fix this, and several other problems:
> 
>> 1.  Remove the messages size limits on -hackers.  They serve no useful
>> purpose, and they interfere with our development process.
> 
> Agreed, or at least boost it up a good bit more.

the question really is how much "a bit more" is - right now the limit is
100000 characters which limits us to ~70KB of attachments (around the
size of the Hot-standby patch if bzip2 compressed).

The SE-Postgres patch for example is ~650KB uncompressed - if we want to
cope with uncompressed patches that large we would have to increase
the current limit by a factor of 10 at least.
I wonder if there are people on the list that might not want to receive
mails that large (like users with mobile phones)?
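
A back-of-the-envelope check of those numbers, as a sketch in Python. It assumes the gap between 100000 characters and ~70KB comes mostly from MIME base64 encoding of attachments; that is an assumption made for this example, as the list's exact accounting is not stated here. The function names and the LIMIT_CHARS constant are made up for the sketch.

```python
import base64

LIMIT_CHARS = 100_000  # list limit quoted above


def encoded_chars(raw_bytes):
    """Characters a raw attachment of raw_bytes occupies after MIME base64.

    base64.encodebytes emits lines of at most 76 characters, each
    followed by a newline, which is counted here as one character.
    """
    return len(base64.encodebytes(b"\0" * raw_bytes))


def max_attachment_bytes(limit_chars):
    """Largest raw attachment whose base64 form fits in limit_chars.

    Ignores headers and the message text, so real-world room is smaller.
    """
    lo, hi = 0, limit_chars
    while lo < hi:  # binary search; encoded size grows with input size
        mid = (lo + hi + 1) // 2
        if encoded_chars(mid) <= limit_chars:
            lo = mid
        else:
            hi = mid - 1
    return lo
```

Under that assumption, max_attachment_bytes(LIMIT_CHARS) comes out a little over 70KB before headers and body text, and a 650KB patch (taken as 650*1024 bytes) encodes to roughly 900,000 characters, about nine times the current limit.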


Stefan

Re: Recovery Test Framework

From

"Joshua D. Drake"

Date:

12 January 2009, 16:39:58

On Mon, 2009-01-12 at 21:35 +0100, Stefan Kaltenbrunner wrote:
> Tom Lane wrote:
> > David Fetter <david@fetter.org> writes:
> >> Two things to fix this, and several other problems:
> > 
> >> 1.  Remove the messages size limits on -hackers.  They serve no useful
> >> purpose, and they interfere with our development process.
> > 
> > Agreed, or at least boost it up a good bit more.
> 
> the question really is how much "a bit more" is - right now the limit is 
>    100000 characters which limits us to ~70KB of attachments (around the 
> size of the Hot-standby patch if bzip2 compressed).
> 
> The SE-Postgres patch for example is ~650KB uncompressed - if we want to 
>   cope with uncompressed patches that large we would have to increase 
> the current limit by a factor of 10 at least.
> I wonder if there are people on the list that might not want to receive 
> mails that large (like users with mobile phones)?

Smart mobile phones are not going to pull down the attachment unless the
user explicitly says, pull down attachment.

However I can say I would be fairly annoyed if every time I checked
hackers I was pulling down 5 megs in various patches.

Joshua D. Drake


> 
> 
> Stefan
> 
-- 
PostgreSQL
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997

Re: Recovery Test Framework

From

"Robert Haas"

Date:

12 January 2009, 17:02:48

> However I can say I would be fairly annoyed if every time I checked
> hackers I was pulling down 5 megs in various patches.

Oh... really?  I thought we were past the day when anyone cared how
large the attachments were.

At any rate, if we increased the limit from 100k to 1M, you could
conceivably get 5M if 5 huge patches had just been posted, but I doubt
it would happen every time you checked -hackers.

...Robert

Re: Recovery Test Framework

From

"Kevin Grittner"

Date:

12 January 2009, 17:07:47

>>> "Dave Page" <dpage@pgadmin.org> wrote: 
> project that aims to provide its users with useful features. We
> have two extremely useful features here (hot standby and sync
> replication) which together will make this a killer release for many
> people

Without taking any particular position on this one way or the other, I
think it's worth noting that when I gave a brief talk about my
experiences with PostgreSQL at Milwaukee BarCamp recently, and
coordinated discussion afterward, the biggest unanswered concern of
people thinking about making the move to PostgreSQL was lack of
integrated "multi-master replication".  I'm not entirely sure what
features they would need to make them feel comfortable, but it does
seem to currently be a barrier to migration for some.

To quantify, there were probably about five or six people out of about
30 in attendance who seemed particularly concerned about this issue.

-Kevin

Re: Recovery Test Framework

From

"Joshua D. Drake"

Date:

12 January 2009, 17:08:39

On Mon, 2009-01-12 at 16:02 -0500, Robert Haas wrote:
> > However I can say I would be fairly annoyed if every time I checked
> > hackers I was pulling down 5 megs in various patches.
> 
> Oh... really?  I thought we were past the day when anyone cared how
> large the attachments were.

IMO the fact that we email patches at all represents a broken system :P
but yes I care how big my attachments are. If -hackers was the only list
I was on, I wouldn't care but it obviously isn't.

Joshua D. Drake

-- 
PostgreSQL
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997

Re: Recovery Test Framework

From

"Joshua D. Drake"

Date:

12 January 2009, 17:16:11

On Mon, 2009-01-12 at 15:07 -0600, Kevin Grittner wrote:
> >>> "Dave Page" <dpage@pgadmin.org> wrote: 
> > project that aims to provide its users with useful features. We
> > have two extremely useful features here (hot standby and sync
> > replication) which together will make this a killer release for many
> > people
>  
> Without taking any particular position on this one way or the other, I
> think it's worth noting that when I gave a brief talk about my
> experiences with PostgreSQL at Milwaukee BarCamp recently, and
> coordinated discussion afterward, the biggest unanswered concern of
> people thinking about making the move to PostgreSQL was lack of
> integrated "multi-master replication".  I'm not entirely sure what
> features they would need to make them feel comfortable, but it does
> seem to currently be a barrier to migration for some.
>  
> To quantify, there were probably about five or six people out of about
> 30 in attendance who seemed particularly concerned about this issue.
>  

Who has integrated multi-master (transaction and power outage safe)
replication now?

Joshua D. Drake



> -Kevin
> 
-- 
PostgreSQL
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997

Re: Recovery Test Framework

From

"Kevin Grittner"

Date:

12 January 2009, 18:49:50

>>> "Joshua D. Drake" <jd@commandprompt.com> wrote: 
> Who has integrated multi-master (transaction and power outage safe)
> replication now?

As far as I recall, nobody there was that specific about the form of
it.  PostgreSQL arguably has non-integrated multi-master replication
now, and I've seen log-based implementations which operated through
daily dial-up connectivity as far back as 1984.  (Statewide in Alaska,
and city-wide in New York City, neither of which could afford to keep
full-time communications up for all the relevant sites.)

I did point out the options mentioned here for PostgreSQL:

http://en.wikipedia.org/wiki/Multi-master_replication

-Kevin

Re: Recovery Test Framework

From

"Robert Haas"

Date:

12 January 2009, 20:10:51

>> > No. A developer can delete, move and rebase branches in his own
>> > repository as he likes, and all of those operations "modify
>> > history". In fact, a developer can completely destroy or take
>> > offline his published repository.  It's *not* an archive.
>
> We could do this using git's configuration:
>
> http://www.kernel.org/pub/software/scm/git/docs/git-config.html
>
> See receive.denyNonFastForwards, which is built for just this purpose :)

Spiffy.

...Robert

Re: Recovery Test Framework

From

Bruce Momjian

Date:

12 January 2009, 20:30:11

Tom Lane wrote:
> "Christopher Browne" <cbbrowne@gmail.com> writes:
> > On Mon, Jan 12, 2009 at 12:27 PM, Dave Page <dpage@pgadmin.org> wrote:
> >> On Mon, Jan 12, 2009 at 5:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >>> In general, we have always regretted it in the past when we chose to
> >>> slip a release waiting for a specific feature...
> >> 
> >> I don't recall such a time - though perhaps the last time it happened
> >> was before I was so heavily involved in the release process (ie. 7.x).
> >> What were the reasons for regretting it?
> 
> > I seem to recall us deferring 8.1 (was it 8.1?) for a while on the
> > basis that we were waiting for [something I don't recall offhand].
> > The feature that we were *hoping* to get wound up dropped on the floor
> > because it just wasn't ready, even *with* the extra time.
> 
> That's happened more than once, though my memory of details is fuzzy
> and I don't have time to troll the archives for them right now.
> Maybe Bruce can remember without a lot of searching.  But our current
> policy of time-based releases (ie deadlines) is born of hard experience
> with the negative consequences of saying "we'll release when feature X
> is ready".  The real killer disadvantage is that all other development
> tends to stop until X is ready, because no one can plan anything.

OK, I had to think about this one, and I didn't want to fan the flames
in the discussion either.

Basically, I have given up trying to track the many patches around
recovery, replication, and hot standby, and I have stated that to
several people privately.  I have kept an archive of the active emails
about the topic:
http://momjian.us/cgi-bin/pgsql/pitr

Looking at the list on the commit fest wiki:
http://wiki.postgresql.org/wiki/CommitFestInProgress#Recovery.2C_Replication.2C_Hot_Standby

I think we should focus on the two simplest patches first,
"Infrastructure changes for recovery", and "rmgr hooks and
contrib/rmgr_hook" because those are probably the easiest to get
committed.  

Based on comments from Heikki, I think "Hot Standby - queries during
archive recovery" can be committed, and in fact perhaps Heikki can do
the commit.  

As far as "Synchronous log-shipping replication", there was only a hope
that would be completed in time for 8.4, and in fact trying to complete
it probably made completing the other patches harder.  I think it is
time to focus on the first three patches I listed and accept that we are
not going to be able to complete synchronous log-shipping in time.  I
think the code is just too much in flux at this point.  Even trying to
get it into 8.4, given its late start in the development process, just
reflects wishful thinking and not the kind of hard discipline we need to
keep our release process organized.  Optimism is nice and all, but with
so many people and companies relying on us, we don't have the luxury of
optimism.  If people want to be optimistic going into the development
cycle, fine, but at the end we have to be practical, because failure
will lead to a disappointment with the community which will be palpable.
(Think back to the frustration we have felt about delayed releases, and
features we thought we were going to get, but didn't.)

As for the process used, I think it is useful to understand how
committers choose what to work on next.  One criterion is that the patch
has stabilized;  if a patch is still being modified regularly, the
committer might as well work on another patch that has stabilized.  Now,
a committer could ask for the patch to stabilize to work on it, but if
he has other patches that are stable, there is no point in asking for a
stable version;  he might as well work on just stable ones until only
unstable ones are left.

Now, maybe this is unfair to patches that are frequently updated, but
this is the typical process we follow, and it explains why the patches
above have not gotten near commit status yet.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +

Re: Recovery Test Framework

From

Tom Lane

Date:

12 January 2009, 21:13:43

Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
> Tom Lane wrote:
>> David Fetter <david@fetter.org> writes:
>>> 1.  Remove the message size limits on -hackers.  They serve no useful
>>> purpose, and they interfere with our development process.
>> 
>> Agreed, or at least boost it up a good bit more.

> the question really is how much "a bit more" is - right now the limit is
> 100000 characters which limits us to ~70KB of attachments (around the
> size of the Hot-standby patch if bzip2 compressed).

> The SE-Postgres patch for example is ~650KB uncompressed - if we want to
> cope with uncompressed patches that large we would have to increase
> the current limit by a factor of 10 at least.

I feel no need to encourage people to send huge patches uncompressed ;-)

gzip normally gets at least 3x or 4x on large diffs.  So a limit around
250K ought to be enough.
        regards, tom lane
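
The figures in this exchange can be checked with a little arithmetic. The sketch below is not from the thread: it assumes attachments are base64-encoded with 76-character lines and two-character line endings (the usual MIME layout), and that the list limit counts every encoded character. The function names are made up for illustration.

```python
def max_attachment_bytes(char_limit, line_len=76, eol_len=2):
    """Approximate binary payload that fits in char_limit base64 characters.

    Assumption: MIME-style base64, line_len encoded characters per line
    followed by eol_len line-ending characters. A partial last line and the
    space taken by headers and body text are ignored.
    """
    bytes_per_line = line_len // 4 * 3   # every 4 base64 chars carry 3 bytes
    chars_per_line = line_len + eol_len
    return (char_limit // chars_per_line) * bytes_per_line


def compressed_size(raw_bytes, ratio):
    """Size after compression at the given ratio (3 means 3x smaller)."""
    return raw_bytes / ratio


if __name__ == "__main__":
    # Limits discussed in the thread, in characters.
    for limit in (100_000, 250_000):
        print(limit, "chars ->", max_attachment_bytes(limit), "attachment bytes")
    # The ~650KB patch at the compression ratios Tom mentions.
    for ratio in (3, 4):
        print("650KB at", ratio, "x ->", round(compressed_size(650_000, ratio)), "bytes")
```

Under these assumptions a 100000-character limit leaves roughly 73KB for the attachment before headers and body text are counted, which is in line with the ~70KB quoted above.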

Re: Recovery Test Framework

From

David Fetter

Date:

12 January 2009, 21:15:58

On Mon, Jan 12, 2009 at 08:12:46PM -0500, Tom Lane wrote:
> 250K ought to be enough.

...for anybody ;)

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: Recovery Test Framework

From

Gregory Stark

Date:

12 January 2009, 21:32:16

Bruce Momjian <bruce@momjian.us> writes:

> As for the process used, I think it is useful to understand how
> committers choose what to work on next.  One criterion is that the patch
> has stabilized;  if a patch is still being modified regularly, the
> committer might as well work on another patch that has stabilized.  Now,
> a committer could ask for the patch to stabilize to work on it, but if
> he has other patches that are stable, there is no point in asking for a
> stable version;  he might as well work on just stable ones until only
> unstable ones are left.
>
> Now, maybe this is unfair to patches that are frequently updated, but
> this is the typical process we follow, and it explains why the patches
> above have not gotten near commit status yet.

It's not just "unfair". It's counter-productive. It means you're ignoring the
very patches whose authors are most likely to be responsive to requests to
change them. And who would be most likely to be fertile ground for further
improvements.

Perhaps it would be useful for you to understand how it looks from a
submitter's point of view. As long as the patch sits in limbo only minor
tweaks and refinements are worth bothering with. Any thoughts of continuing on
any subsequent phases of development are all crushed since all that work might
go down the drain when the committer makes changes to the code it's based on.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training!

Re: Recovery Test Framework

From

"Robert Haas"

Date:

12 January 2009, 21:34:38

> I feel no need to encourage people to send huge patches uncompressed ;-)
> gzip normally gets at least 3x or 4x on large diffs.  So a limit around
> 250K ought to be enough.

To paraphrase a leading authority on PostgreSQL development, and with
tongue firmly in cheek, there's something to what you say, but
consider that we have pretty much unanimous agreement that 100k is too
small.  I think we should try to fix the problem, not just gradually
ratchet up the value until people start complaining in the other
direction.  :-)

http://archives.postgresql.org/pgsql-hackers/2008-12/msg00809.php

...Robert

Re: Recovery Test Framework

From

Bruce Momjian

Date:

12 January 2009, 21:43:00

Gregory Stark wrote:
> > Now, maybe this is unfair to patches that are frequently updated, but
> > this is the typical process we follow, and it explains why the patches
> > above have not gotten near commit status yet.
> 
> It's not just "unfair". It's counter-productive. It means you're ignoring the
> very patches whose authors are most likely to be responsive to requests to
> change them. And who would be most likely to be fertile ground for further
> improvements.
> 
> Perhaps it would be useful for you to understand how it looks from a
> submitter's point of view. As long as the patch sits in limbo only minor
> tweaks and refinements are worth bothering with. Any thoughts of continuing on
> any subsequent phases of development are all crushed since all that work might
> go down the drain when the committer makes changes to the code it's based on.

I am just explaining how it works in practice.  If the patch is still
being improved, the feeling is that the author wants more time to adjust
things, and with other things on our plate, we are glad to leave their
patch until last.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +

Re: Recovery Test Framework

From

Tom Lane

Date:

12 January 2009, 21:47:29

Gregory Stark <stark@enterprisedb.com> writes:
> Bruce Momjian <bruce@momjian.us> writes:
>> As for the process used, I think it is useful to understand how
>> committers choose what to work on next. ...

> It's not just "unfair". It's counter-productive. It means you're ignoring the
> very patches whose authors are most likely to be responsive to requests to
> change them. And who would be most likely to be fertile ground for further
> improvements.

I don't think you can honestly argue that the replication-related
patches are getting ignored.  AFAICT there's quite a lot of review
effort going on around them.  KaiGai-san probably has a legitimate
beef about lack of review on his patch, but the replication patches
do not.

It's true that stuff isn't going to get *committed* until it seems
reasonably stable, but I hope you weren't arguing for that.
        regards, tom lane

Re: Recovery Test Framework

From

"Robert Haas"

Date:

12 January 2009, 21:52:58

On Mon, Jan 12, 2009 at 2:05 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> Well, I've been keeping an eye on both Hot Standby and Synchronous
> Replication patches. IMHO the Hot Standby patch is architecturally sound,
> and while I suggested some pretty big changes just recently (which Simon
> picked up and did already), it's in pretty good shape. No doubt there's
> still some issues that haven't been uncovered, comments to be fixed,
> documentation to be written, but no showstoppers or anything that requires a
> major rewrite. There's one todo item left: prepared transactions, but I
> don't think there's anything fundamentally hard about them, just needs to be
> fixed. Simon mentioned usability issues related to who/when queries get
> cancelled, but I think we've discussed that to death already and the patch
> handles it quite nicely.

Cool - that's good to hear.

> IMHO, the synchronous replication isn't in such good shape, I'm afraid. I've
> said this before, but I'm not happy with the "built from spare parts" nature
> of it. You shouldn't have to configure an archive, file-based log shipping
> using rsync or whatever, and pg_standby. All that is in addition to the
> direct connection between master and slave. The slave really should be able
> to just connect to the master, and download all the WAL it needs directly.
> That's a huge usability issue if left as is, but requires very large
> architectural changes to fix.

Yeah, I wasn't thinking about this, but you had mentioned it before,
and I thought (and think) it's a pretty fair criticism.  I think the
base backup should be integrated into the mechanism as well.  I want
to just be able to configure the master and slave for replication,
fire up the slave, and walk away.  Without that, I agree that it's
likely to be too cumbersome for any actual use.

>> One thing I find interesting is that the "Infrastructure Changes for
>> Recovery" patch became the foundation for both "Hot Standby" and
>> "Synchronous Replication".  That implies that those changes might be
>> of somewhat more general interest, at least as the foundation for
>> further work.  If HS and/or SR are out of reach, it might be worth
>> at least looking to see if any of that infrastructure work can
>> reasonably be committed for 8.4.
>
> Yeah, being able to do an online checkpoint after recovery has some value of
> its own.

Is there anything standing in the way of committing that patch?  I
don't think I've seen anything mentioned on -hackers.

...Robert

Re: Recovery Test Framework

From

"Robert Haas"

Date:

13 January 2009, 00:14:14

> I am just explaining how it works in practice.  If the patch is still
> being improved, the feeling is that the author wants more time to adjust
> things, and with other things on our plate, we are glad to leave their
> patch until last.

Well, it's good that you have an explanation, but I'm not sure it
helps much.  :-)  Surely the patches that are most likely to change
substantially are the big ones, and leaving those until last results
in them not making the time-based cutoff.  Someone who submitted a
20-line patch isn't likely to revise it substantially; someone who is
being paid $20k to write a patch is likely to spend a lot of time
working on it.

I think the fundamental problem here is the number and bandwidth of
the committers, which seems to be pretty limited.  Most of the
committers are either inactive, or essentially maintainers for a
particular subsystem.  With the exception of patches authored by the
committers themselves, I think the vast majority of patches for this
'fest were committed by Tom and Peter - and I think really mostly Tom.
And many of those were significantly modified in the process of being
committed, which suggests that efforts to take the load off committers
by having non-committers do reviews have not been entirely successful.
(It would be interesting to hear how much value people think it has
added, and get suggestions on how to do things better next time.)

I'm not sure what to do about it, though.  More committers could be
added, but I presume if there were obvious candidates it would have
been done already.  It's complicated by the fact that you need people
who both (1) know what they're doing and (2) have time to review and
commit *other people's patches*.  In reality, a pretty significant
fraction of the current committers are either mostly inactive, or
essentially maintainers for one small area of the code.

...Robert

Re: Recovery Test Framework

From

"Fujii Masao"

Date:

13 January 2009, 02:37:57

Hi,

On Tue, Jan 13, 2009 at 3:32 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> One thing I find interesting is that the "Infrastructure Changes for
> Recovery" patch became the foundation for both "Hot Standby" and
> "Synchronous Replication".  That implies that those changes might be
> of somewhat more general interest, at least as the foundation for
> further work.  If HS and/or SR are out of reach, it might be worth
> at least looking to see if any of that infrastructure work can
> reasonably be committed for 8.4.

+1

If the community decides to postpone replication to 8.5, I'll give up
working on it (though I'll work it up to a good place to leave off, of course)
and focus on reviewing the infra-patch instead. I don't think that *all*
recovery-related patches are too late for 8.4. We should make a choice
for the patches and focus on the high-priority one.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Recovery Test Framework

From

"Fujii Masao"

Date:

13 January 2009, 03:12:51

Hi,

On Tue, Jan 13, 2009 at 10:52 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> IMHO, the synchronous replication isn't in such good shape, I'm afraid. I've
>> said this before, but I'm not happy with the "built from spare parts" nature
>> of it. You shouldn't have to configure an archive, file-based log shipping
>> using rsync or whatever, and pg_standby. All that is in addition to the
>> direct connection between master and slave. The slave really should be able
>> to just connect to the master, and download all the WAL it needs directly.
>> That's a huge usability issue if left as is, but requires very large
>> architectural changes to fix.
>
> Yeah, I wasn't thinking about this, but you had mentioned it before,
> and I thought (and think) it's a pretty fair criticism.  I think the
> base backup should be integrated into the mechanism as well.  I want
> to just be able to configure the master and slave for replication,
> fire up the slave, and walk away.  Without that, I agree that it's
> likely to be too cumbersome for any actual use.

I don't think this is essential for replication. It's an optimization, and
synch-rep still works fine without it. And, I'm not sure that this should
be the job of postgres, from the beginning. Do you think we should
develop "rsync" again for postgres?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Recovery Test Framework

From

Heikki Linnakangas

Date:

13 January 2009, 04:25:02

Joshua D. Drake wrote:
> On Mon, 2009-01-12 at 13:23 -0500, Robert Haas wrote:
>>> But wasn't I just reading something about having to wipe that repository
>>> and re-import the CVS history to fix various problems?
>> Not sure; I hope not. 
> 
> Actually yes we did. There was a bug in git-cvs that we fixed. It is
> talked about here:
> 
> http://archives.postgresql.org/pgsql-www/2008-12/msg00182.php
> 
> But... that wasn't really the fault of git.

FWIW, we didn't "wipe the repository". For some reason, the CVS->GIT 
script decided to duplicate the whole history three times in the GIT 
repository. We only wiped the extra copies, and the commits done after 
the screw up.

So the screwed-up repository looked like this:

A->B->C->A'->B'->C'->A''->B''->C''->D->E

Where A, B, C are commits made before the screwup, A' etc. are extra 
copies of the same commits, and D and E are commits that were imported 
after the screwup.

The repository was fixed into:

A->B->C->D'->E'

So the history up to C remained the same, but commits D and E were 
re-imported, and therefore had their commit ids changed.
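
Why re-importing D and E changes their ids, even though their content is unchanged, can be illustrated with a toy model. The sketch below is a deliberate simplification, not git's actual object format: it derives each commit id from the parent id plus a content string, whereas real git hashes the tree, parent ids, author, committer and message. The names commit_id and build_chain are hypothetical.

```python
import hashlib


def commit_id(parent_id, content):
    """Toy commit id: SHA-1 over the parent id and the commit content."""
    data = f"parent {parent_id}\n{content}".encode()
    return hashlib.sha1(data).hexdigest()


def build_chain(contents):
    """Return the list of ids for a linear history with the given contents."""
    ids = []
    parent = ""  # the first commit has no parent
    for content in contents:
        parent = commit_id(parent, content)
        ids.append(parent)
    return ids


# Screwed-up history: A, B, C imported three times, then D and E.
broken = build_chain(["A", "B", "C", "A", "B", "C", "A", "B", "C", "D", "E"])
# Fixed history: the extra copies are gone and D, E are re-imported on top of C.
fixed = build_chain(["A", "B", "C", "D", "E"])

if __name__ == "__main__":
    print("A..C unchanged:", broken[:3] == fixed[:3])
    print("D changed:", broken[9] != fixed[3])
    print("E changed:", broken[10] != fixed[4])
```

Because each id depends on the parent's id, moving D from the end of the duplicated chain onto C gives it a new id, and E changes in turn. The same dependency explains why the duplicate A' does not share an id with A in this model.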

We don't know why this happened, so there's no guarantee that it won't 
happen again. At least we have a procedure to fix it now.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com

Re: Recovery Test Framework

From

Heikki Linnakangas

Date:

13 January 2009, 05:09:38

Aidan Van Dyk wrote:
> With git, you pull down the complete *history* of whatever branch, tag,
> or reference you want to pull down.

You can do a so-called shallow clone, pulling only X most recent 
commits, with "git clone --depth=X". There are some limitations on what 
you can do with a shallow clone, but it's good enough for most purposes: 
you can create your own branches, merge and rebase them with upstream 
and create diffs.

>  The *default* "clone" options are
> setup to pull down the history of all available branches and tags, but
> that's not mandatory.

Right.

Here's how to create a shallow clone with just the five most recent 
commits, with master-branch only:

~$ mkdir pgsql-shallow
~$ cd pgsql-shallow/
~/pgsql-shallow$ git-init
Initialized empty Git repository in /home/hlinnaka/pgsql-shallow/.git/
~/pgsql-shallow$ git-remote add origin -t master git://git.postgresql.org/git/postgresql.git
~/pgsql-shallow$ git-fetch origin --depth=5
remote: Counting objects: 3646, done.
remote: Compressing objects: 100% (2247/2247), done.
remote: Total 3646 (delta 1567), reused 2334 (delta 1317)
Receiving objects: 100% (3646/3646), 15.77 MiB | 508 KiB/s, done.
Resolving deltas: 100% (1567/1567), done.
From git://git.postgresql.org/git/postgresql
 * [new branch]      master     -> origin/master

Not as straightforward as a plain git-clone, but it's possible. The 
resulting repository is ~16 MB, which isn't very much even across a 
crappy Internet connection.

> And the benefit of having the whole history of the branch available, is
> that you can work on the branch *and history* locally, committing,
> inspecting, reviewing, without needing to go back over the net.

Yeah. Before switching to git, I kept an rsync'd copy of the CVS 
repository on my laptop anyway for those reasons.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com

Re: Recovery Test Framework

From

Simon Riggs

Date:

13 January 2009, 05:16:33

On Mon, 2009-01-12 at 20:52 -0500, Robert Haas wrote:

> I think the
> base backup should be integrated into the mechanism as well.  I want
> to just be able to configure the master and slave for replication,
> fire up the slave, and walk away.  Without that, I agree that it's
> likely to be too cumbersome for any actual use.

If you want integrated base backup, I would ask that we add it in the
next release and make it optional. It isn't necessary for sync rep and
is not a reason to slip that project; it's just icing. Many users have
been doing base backups for 2 releases now and I've never had a single
comment that it is cumbersome. The reverse actually, people say it is
flexible.

The flexibility of the current system is important for another reason.
"Integrated" will definitely mean single threaded because you just
aren't going to make it so complex. Single threaded has huge negative
implications in practice and we should not forget that the ability to do
a multi-threaded base backup is a critical user requirement.

Slony provides automated "base backup" transfer but does so using only a
single thread. So large databases take a long time to transfer. I have
spent time this year working with Jan, following up on two separate
ideas to improve this. The last one of those was looking at ways to
allow Slony to start via a base backup, just as warm standby allows.

Many users have found Warm Standby simple to configure and one part of
that is the ability to use parallel utilities to achieve the base
backup. 

Restricting the way bulk copying happens only prevents innovative
solutions such as split mirrors, snapshot copies or whatever. We cannot
judge what the best way to ship giga or even terabytes of data to
another site will be for any user.

-- 
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support

Re: Recovery Test Framework

From

Magnus Hagander

Date:

13 January 2009, 05:44:24

Tom Lane wrote:
> Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
>> Tom Lane wrote:
>>> David Fetter <david@fetter.org> writes:
>>>> 1.  Remove the message size limits on -hackers.  They serve no useful
>>>> purpose, and they interfere with our development process.
>>> Agreed, or at least boost it up a good bit more.
> 
>> the question really is how much "a bit more" is - right now the limit is
>> 100000 characters which limits us to ~70KB of attachments (around the
>> size of the Hot-standby patch if bzip2 compressed).
> 
>> The SE-Postgres patch for example is ~650KB uncompressed - if we want to
>> cope with uncompressed patches that large we would have to increase
>> the current limit by a factor of 10 at least.
> 
> I feel no need to encourage people to send huge patches uncompressed ;-)
> 
> gzip normally gets at least 3x or 4x on large diffs.  So a limit around
> 250K ought to be enough.

Given this, I've increased the size to 1MB. Let's see how that works out.

//Magnus

Re: Recovery Test Framework

From

Peter Eisentraut

Date:

13 January 2009, 06:00:52

Joshua D. Drake wrote:
> Does bzr, mercurial or monotone offer the same or better solution? Bzr in
> particular is in very wide use and I run into mercurial all the time.

I have found that mercurial is pretty much feature-equivalent to git, at 
least until you get to the really wizard-like use cases.  Either one is 
a fine choice, but we just have momentum going one way now.

bzr is, in my experience, inferior to either of the above, and appears 
to be a strategic dead end at this point.

Re: Recovery Test Framework

From

Heikki Linnakangas

Date:

13 January 2009, 06:33:52

Robert Haas wrote:
>> I am just explaining how it works in practice.  If the patch is still
>> being improved, the feeling is that the author wants more time to adjust
>> things, and with other things on our plate, we are glad to leave their
>> patch until last.
> 
> Well, it's good that you have an explanation, but I'm not sure it
> helps much.  :-)  Surely the patches that are most likely to change
> substantially are the big ones, and leaving those until last results
> in them not making the time-based cutoff.  Someone who submitted a
> 20-line patch isn't likely to revise it substantially; someone who is
> being paid $20k to write a patch is likely to spend a lot of time
> working on it.

Agreed. I've tried to do a quick review and give early feedback on the 
big patches, concentrating on high-level, architectural issues, so that 
authors of big patches don't need to twiddle their thumbs waiting for 
review.

OTOH, more detailed review at an early phase is not a very good use of time 
if there are design issues to be resolved, and the author is still working on it.

> And many of those were significantly modified in the process of being
> committed, which suggests that efforts to take the load off committers
> by having non-committers do reviews have not been entirely successful.
> (It would be interesting to hear how much value people think it has
> added, and get suggestions on how to do things better next time.)

I don't have suggestions, but I'd just like to say that you, Robert, have 
given extremely valuable feedback. And on many patches too, I have been 
very impressed throughout the commitfest. Thank you!

I'm not sure how much round-robin-review has taken load off committers, 
you have to read and understand a patch before committing anyway. It has 
helped, for sure, but not dramatically. However, I think that it has 
made a big difference from the authors' point of view; you get feedback earlier.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com

Re: Recovery Test Framework

From

Tom Lane

Date:

13 January 2009, 10:53:54

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> Robert Haas wrote:
>> (It would be interesting to hear how much value people think it has
>> added, and get suggestions on how to do things better next time.)

> I'm not sure how much round-robin-review has taken load off committers, 
> you have to read and understand a patch before committing anyway. It has 
> helped, for sure, but not dramatically. However, I think that it has 
> made a big difference from the authors' point of view; you get feedback earlier.

I think it's helped from the committers' standpoint too, in the form of
taking care of some issues that would otherwise have had to be dealt
with by the committer.  (Which was all we asked for anyway.)

In my mind though, the real benefit of the system and the reason we
should keep it up is to get more people looking at the code.  New
committers don't grow on trees, they come from people getting involved.
        regards, tom lane

Re: Recovery Test Framework

From

Gregory Stark

Date:

13 January 2009, 11:03:29

Tom Lane <tgl@sss.pgh.pa.us> writes:

> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>> Robert Haas wrote:
>>> (It would be interesting to hear how much value people think it has
>>> added, and get suggestions on how to do things better next time.)
>
>> I'm not sure how much round-robin-review has taken load off committers, 
>> you have to read and understand a patch before committing anyway. It has 
>> helped, for sure, but not dramatically. However, I think that it has 
>> made a big difference from the authors' point of view; you get feedback earlier.
>
> I think it's helped from the committers' standpoint too, in the form of
> taking care of some issues that would otherwise have had to be dealt
> with by the committer.  (Which was all we asked for anyway.)

I was pleasantly surprised by how helpful the feedback was on posix_fadvise. I
don't know how much real work it removed from Tom's plate but I suspect it did
reduce the little annoyances significantly.

> In my mind though, the real benefit of the system and the reason we
> should keep it up is to get more people looking at the code.  New
> committers don't grow on trees, they come from people getting involved.

Good point.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!