Re: Streaming replication status - Mailing list pgsql-hackers
From | Stefan Kaltenbrunner |
---|---|
Subject | Re: Streaming replication status |
Date | |
Msg-id | 4B4CE36E.3010603@kaltenbrunner.cc |
In response to | Re: Streaming replication status (Simon Riggs <simon@2ndQuadrant.com>) |
Responses | Re: Streaming replication status |
List | pgsql-hackers |
Simon Riggs wrote:
> On Tue, 2010-01-12 at 15:11 -0500, Bruce Momjian wrote:
>> Stefan Kaltenbrunner wrote:
>>> Simon Riggs wrote:
>>>> On Tue, 2010-01-12 at 08:24 +0100, Stefan Kaltenbrunner wrote:
>>>>> Fujii Masao wrote:
>>>>>> On Tue, Jan 12, 2010 at 1:21 PM, Greg Smith <greg@2ndquadrant.com> wrote:
>>>>>>> I don't think anybody can deploy this feature without at least some very
>>>>>>> basic monitoring here. I like the basic proposal you made back in September
>>>>>>> for adding a pg_standbys_xlog_location to replace what you have to get from
>>>>>>> ps right now:
>>>>>>> http://archives.postgresql.org/pgsql-hackers/2009-09/msg00889.php
>>>>>>>
>>>>>>> That's basic, but enough that people could get by for a V1.
>>>>>> Yeah, I have no objection to add such simple capability which monitors
>>>>>> the lag into the first release. But I guess that, in addition to that,
>>>>>> Simon wanted the capability to collect the statistical information about
>>>>>> replication activity (e.g., a transfer time, a write time, replay time).
>>>>>> So I'd like to postpone it.
>>>>> yeah getting that would all be nice and handy but we have to remember
>>>>> that this is really our first cut at integrated replication. Being able
>>>>> to monitor lag is what is needed as a minimum, more advanced stuff can
>>>>> and will emerge once we get some actual feedback from the field.
>>>> Though there won't be any feedback from the field because there won't be
>>>> any numbers to discuss. Just "it appears to be working". Then we will go
>>>> into production and the problems will begin to be reported. We will be
>>>> able to do nothing to resolve them because we won't know how many people
>>>> are affected.
>>> field is also production usage in my pov, and I'm not sure how we would
>>> know how many people are affected by some imaginary issue just because
>>> there is a column that has some numbers in it.
>>> All of the large features we added in the past got fine-tuned and
>>> improved in the following releases, and I expect SR to be one of them
>>> that will see a lot of improvement in 8.5+n.
>>> Adding detailed monitoring of some random stuff (I don't think there was
>>> a clear proposal of what kind of stuff you would like to see) while we
>>> don't really know what the performance characteristics are might easily
>>> lead to us providing a ton of data and nothing relevant :(
>>> What I really think we should do for this first cut is to make it as
>>> foolproof and easy to set up as possible and add the minimum required
>>> monitoring knobs but not going overboard with doing too many stats.
>> I totally agree. If SR isn't going to be useful without being
>> feature-complete, we might as well just drop it for 8.5 right now.
>>
>> Let's get a reasonable feature set implemented and then come back in 8.6
>> to improve it. For example, there is no need for a special
>> 'replication' user (just use super-user), and monitoring should be
>> minimal until we have field experience of exactly what monitoring we
>> need.
>>
>> The final commit-fest is in 5 days --- this is not the time for design
>> discussion and feature additions. If we wait for SR to be feature
>> complete, with design discussions, etc, we will hopelessly delay 8.5 and
>> people will get frustrated. I am not saying we can't talk about design,
>> but none of this should be a requirement for 8.5.
>
> We can't add monitoring until we know what the performance
> characteristics are. Hmmm. And how will we know what the performance
> characteristics are, I wonder?

well I would say we do exactly how we have done in the past with other
features - by debugging the stuff with low level tools until we fully
understand what it really is and then we can always add more
"accessible" stats.

Stefan