Re: Streaming replication status - Mailing list pgsql-hackers

From Stefan Kaltenbrunner
Subject Re: Streaming replication status
Msg-id 4B4CE36E.3010603@kaltenbrunner.cc
In response to Re: Streaming replication status  (Simon Riggs <simon@2ndQuadrant.com>)
Responses Re: Streaming replication status
List pgsql-hackers
Simon Riggs wrote:
> On Tue, 2010-01-12 at 15:11 -0500, Bruce Momjian wrote:
>> Stefan Kaltenbrunner wrote:
>>> Simon Riggs wrote:
>>>> On Tue, 2010-01-12 at 08:24 +0100, Stefan Kaltenbrunner wrote:
>>>>> Fujii Masao wrote:
>>>>>> On Tue, Jan 12, 2010 at 1:21 PM, Greg Smith <greg@2ndquadrant.com> wrote:
>>>>>>> I don't think anybody can deploy this feature without at least some very
>>>>>>> basic monitoring here.  I like the basic proposal you made back in September
>>>>>>> for adding a pg_standbys_xlog_location to replace what you have to get from
>>>>>>> ps right now:
>>>>>>> http://archives.postgresql.org/pgsql-hackers/2009-09/msg00889.php
>>>>>>>
>>>>>>> That's basic, but enough that people could get by for a V1.
>>>>>> Yeah, I have no objection to add such simple capability which monitors
>>>>>> the lag into the first release. But I guess that, in addition to that,
>>>>>> Simon wanted the capability to collect the statistical information about
>>>>>> replication activity (e.g., a transfer time, a write time, replay time).
>>>>>> So I'd like to postpone it.
>>>>> yeah getting that would all be nice and handy but we have to remember 
>>>>> that this is really our first cut at integrated replication. Being able 
>>>>> to monitor lag is what is needed as a minimum, more advanced stuff can 
>>>>> and will emerge once we get some actual feedback from the field.
>>>> Though there won't be any feedback from the field because there won't be
>>>> any numbers to discuss. Just "it appears to be working". Then we will go
>>>> into production and the problems will begin to be reported. We will be
>>>> able to do nothing to resolve them because we won't know how many people
>>>> are affected.
>>> field is also production usage in my pov, and I'm not sure how we would 
>>> know how many people are affected by some imaginary issue just because 
>>> there is a column that has some numbers in it.
>>> All of the large features we added in the past got fine-tuned and 
>>> improved in the following releases, and I expect SR to be one of them 
>>> that will see a lot of improvement in 8.5+n.
>>> Adding detailed monitoring of some random stuff (I don't think there was 
>>> a clear proposal of what kind of stuff you would like to see) while we 
>>> don't really know what the performance characteristics are might easily 
>>> lead to us providing a ton of data and nothing relevant :(
>>> What I really think we should do for this first cut is to make it as 
>>> foolproof and easy to set up as possible and add the minimum required 
>>> monitoring knobs but not going overboard with doing too many stats.
>> I totally agree.  If SR isn't going to be useful without being
>> feature-complete, we might as well just drop it for 8.5 right now. 
>>
>> Let's get a reasonable feature set implemented and then come back in 8.6
>> to improve it.  For example, there is no need for a special
>> 'replication' user (just use super-user), and monitoring should be
>> minimal until we have field experience of exactly what monitoring we
>> need.  
>>
>> The final commit-fest is in 5 days --- this is not the time for design
>> discussion and feature additions.  If we wait for SR to be feature
>> complete, with design discussions, etc, we will hopelessly delay 8.5 and
>> people will get frustrated.  I am not saying we can't talk about design,
>> but none of this should be a requirement for 8.5.
> 
> We can't add monitoring until we know what the performance
> characteristics are. Hmmm. And how will we know what the performance
> characteristics are, I wonder?

Well, I would say we do exactly what we have done in the past with other
features: debug the thing with low-level tools until we fully understand
how it really behaves, and then we can always add more "accessible" stats.


Stefan

