Re: Core team statement on replication in PostgreSQL - Mailing list pgsql-hackers

From Andrew Dunstan
Subject Re: Core team statement on replication in PostgreSQL
Date
Msg-id 4840AAAA.7010109@dunslane.net
In response to Re: Core team statement on replication in PostgreSQL  (Tatsuo Ishii <ishii@postgresql.org>)
List pgsql-hackers

Tatsuo Ishii wrote:
>> Andreas 'ads' Scherbaum wrote:
>>> On Thu, 29 May 2008 23:02:56 -0400 Andrew Dunstan wrote:
>>>
>>>> Well, yes, but you do know about archive_timeout, right? No need to
>>>> wait 2 hours.
>>>
>>> Then you ship 16 MB of binary data every 30 seconds or every minute,
>>> but you only have a few kilobytes of real data in the logfile. This
>>> must be taken into account, especially if you ship the logfile over
>>> the internet (meaning: no high-speed connection, maybe even
>>> pay-per-traffic) to the slave.
>>
>> Sure, there's a price to pay. But that doesn't mean the facility doesn't
>> exist. And I rather suspect that most of Josh's customers aren't too
>> concerned about traffic charges or affected by such bandwidth
>> restrictions. Certainly, none of my clients are, and they aren't in the
>> giant class. Shipping a 16 MB file, particularly if compressed, every
>> minute or so, is not such a huge problem for a great many commercial
>> users, and even many domestic users.
>
> Sumitomo Electric Co., Ltd., a company in Japan with 20 billion dollars
> in annual sales (parent company of Sumitomo Electric Information Systems
> Co., Ltd., which is one of the Recursive SQL development support
> companies), uses 100 PostgreSQL servers. They do backups by log shipping
> to another data center and have problems with the amount of log data
> being transferred. They said this is one of the big problems they have
> with PostgreSQL and hope it will be solved in the near future.
>

Excellent data point. Now, what I'd like to know is whether they are 
getting into trouble simply because of the volume of log data generated, 
or because they have a short archive_timeout set. If it's the former 
(which seems more likely), then none of the ideas I have seen so far in 
this discussion seems likely to help, and that would indeed be a major 
issue we should look at. Another question is this: are they being 
overwhelmed by the amount of network traffic generated, or by the 
difficulty postgres producers and/or consumers have in keeping up? If 
it's network traffic, then perhaps compression would help us.
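
If compression is the answer, note that archive_command can already do
it today, without any server changes. A minimal sketch, assuming gzip
and scp as stand-ins for whatever compression and transport you actually
use (the ship_wal.sh name and the paths are made up):

    # postgresql.conf: force a segment switch at most every minute
    archive_timeout = 60
    archive_command = '/usr/local/bin/ship_wal.sh %p %f'  # %p = path, %f = name

    #!/bin/sh
    # ship_wal.sh: compress a completed WAL segment and copy it to the
    # standby. A nonzero exit tells the server the segment wasn't archived.
    set -e
    gzip -c "$1" > "/var/tmp/$2.gz"
    scp "/var/tmp/$2.gz" standby:/wal_archive/
    rm -f "/var/tmp/$2.gz"

One caveat: recycled segments keep old data in their unused tail, so
gzip won't shrink a mostly-idle segment as much as you'd hope unless the
tail is zeroed first.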

Maybe we need to set some goals for the level of log volumes we expect 
to be able to create/send/consume.
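
For perspective: with archive_timeout = 30, a server with even trivial
activity can force up to 2880 segment switches a day, i.e. roughly 45 GB
of WAL shipped daily before compression, however little real data it holds.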

cheers

andrew

