Re: Sync Rep: First Thoughts on Code - Mailing list pgsql-hackers
From | Simon Riggs |
---|---|
Subject | Re: Sync Rep: First Thoughts on Code |
Date | |
Msg-id | 1228223386.20796.358.camel@hp_dx2400_1 |
In response to | Re: Sync Rep: First Thoughts on Code ("Fujii Masao" <masao.fujii@gmail.com>) |
Responses | Re: Sync Rep: First Thoughts on Code, Re: Sync Rep: First Thoughts on Code, Re: Sync Rep: First Thoughts on Code |
List | pgsql-hackers |
On Tue, 2008-12-02 at 21:37 +0900, Fujii Masao wrote:
> Thanks for taking many hours to review the code!!
>
> On Mon, Dec 1, 2008 at 8:42 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> > Can you confirm that all the docs on the Wiki page are up to date? There
> > are a few minor discrepancies that make me think it isn't.
>
> Documentation is ongoing. Sorry for my slow progress.
>
> BTW, I'm going to add and change the sgml files listed on wiki.
> http://wiki.postgresql.org/wiki/NTT%27s_Development_Projects#Documentation_Plan

I'm patient; I know it takes time. Happy to spend hours on the review,
but I want to do that knowing I agree with the higher-level features and
architecture first.

This was just a first review; I expect to spend more time on it yet.

> > The reaction to replication_timeout may need to be configurable. I might
> > not want to keep on processing if the information didn't reach the
> > standby.
>
> OK. I will add new GUC variable (PGC_SIGHUP) to specify the reaction for
> the timeout.
>
> > I would prefer in many cases that the transactions that were
> > waiting for walsender would abort, but the walsender kept processing.
>
> Is it dangerous to abort the transaction with replication continued when
> the timeout occurs? I think that the WAL consistency between two servers
> might be broken. Because the WAL writing and sending are done concurrently,
> and the backend might already write the WAL to disk on the primary when
> waiting for walsender.

The issue I see is that we might want to keep wal_sender_delay small so
that transaction times are not increased. But we also want
wal_sender_delay high so that replication never breaks. It seems better
to have the action on wal_sender_delay configurable if we have an
unsteady network (like the internet). Marcus made some comments on line
dropping that seem relevant here; we should listen to his experience.

Hmmm, dangerous? Well, assuming we're linking commits with replication
sends, then it sounds like it.
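To make the configurable reaction to replication_timeout concrete, here is a minimal C sketch. Everything in it is hypothetical: the setting values `continue` and `abort`, the enums, and both functions are illustrative names, not code from the patch or the GUC Fujii proposes (which would be PGC_SIGHUP). It only models the decision taken when a backend's wait for walsender times out; whether aborting after the commit record is already flushed locally is safe is exactly the open question in this thread.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical values for a "reaction to replication_timeout" setting. */
typedef enum ReplTimeoutAction
{
    REPL_TIMEOUT_CONTINUE,      /* stop waiting, commit locally, keep processing */
    REPL_TIMEOUT_ABORT_WAITERS  /* abort the waiting xact; walsender keeps going */
} ReplTimeoutAction;

typedef enum XactOutcome
{
    XACT_COMMITTED,
    XACT_ABORTED
} XactOutcome;

/* Parse the setting's string form; returns false for unknown values. */
static bool
parse_repl_timeout_action(const char *value, ReplTimeoutAction *out)
{
    if (strcmp(value, "continue") == 0)
        *out = REPL_TIMEOUT_CONTINUE;
    else if (strcmp(value, "abort") == 0)
        *out = REPL_TIMEOUT_ABORT_WAITERS;
    else
        return false;
    return true;
}

/*
 * Outcome for a backend that waited for walsender after flushing its
 * commit record locally. Choosing abort here assumes (as the reply notes)
 * that clog and procarray are not yet updated at this point; it does not
 * settle whether that is safe.
 */
static XactOutcome
finish_commit(bool standby_acked, ReplTimeoutAction action)
{
    if (standby_acked)
        return XACT_COMMITTED;
    return action == REPL_TIMEOUT_ABORT_WAITERS ? XACT_ABORTED : XACT_COMMITTED;
}
```

With `abort`, a timed-out waiter fails while walsender keeps streaming; with `continue`, the transaction commits locally and processing goes on without confirmation from the standby.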
We might end up committing to disk and then deciding to abort instead.
But remember we don't remove the xid from procarray or mark the result
in clog until the flush is over, so it is possible. But I think we
should discuss this in more detail when the main patch is committed.

> > Do we report
> > stats on how long the replication has been taking? If the average rep
> > time is close to rep timeout then we will be fragile, so we need some
> > way to notice this and produce warnings. Or at least provide info to an
> > external monitoring system.
>
> Sounds good. How about log_min_duration_replication? If the rep time
> is greater than it, we produce warning (or log) like log_min_duration_xx.

Maybe; let's put in something that logs if replication takes >50% (?) of
the timeout. Make that configurable with a #define and see if we need
that to be configurable with a GUC later.

> > Do we need to worry about periodic
> > renegotiation of keys in be-secure.c?
>
> What is "keys" you mean?

See the notes in that file for explanation. I wondered whether it might
be a perf problem for us?

--
Simon Riggs           www.2ndQuadrant.com
PostgreSQL Training, Services and Support
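A minimal C sketch of the monitoring idea discussed above: keep running stats on replication wait times (count, average, max) so they could be exposed to an external monitoring system, and log when a single wait uses more than 50% of replication_timeout, with the threshold as a compile-time #define as suggested. The names (`REPL_WARN_PERCENT`, `ReplStats`, `repl_observe` and friends) are hypothetical, and `fprintf` stands in for what would be an `ereport(WARNING, ...)` inside the backend.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical compile-time threshold; could become a GUC later. */
#define REPL_WARN_PERCENT 50

/* Running statistics on replication wait times (hypothetical). */
typedef struct ReplStats
{
    long   count;
    double total_ms;
    double max_ms;
} ReplStats;

static void
repl_stats_record(ReplStats *st, double elapsed_ms)
{
    st->count++;
    st->total_ms += elapsed_ms;
    if (elapsed_ms > st->max_ms)
        st->max_ms = elapsed_ms;
}

static double
repl_stats_avg(const ReplStats *st)
{
    return st->count > 0 ? st->total_ms / st->count : 0.0;
}

/* True when one wait used more than REPL_WARN_PERCENT of the timeout. */
static bool
repl_time_needs_warning(double elapsed_ms, double timeout_ms)
{
    return timeout_ms > 0 &&
        elapsed_ms * 100.0 > timeout_ms * REPL_WARN_PERCENT;
}

/* Record one sample; print a warning and return true if over threshold. */
static bool
repl_observe(ReplStats *st, double elapsed_ms, double timeout_ms)
{
    repl_stats_record(st, elapsed_ms);
    if (!repl_time_needs_warning(elapsed_ms, timeout_ms))
        return false;
    fprintf(stderr,
            "WARNING: replication took %.0f ms, over %d%% of replication_timeout (%.0f ms)\n",
            elapsed_ms, REPL_WARN_PERCENT, timeout_ms);
    return true;
}
```

Checking each wait individually catches spikes; comparing `repl_stats_avg()` against the same threshold would match the "average rep time close to rep timeout" concern more directly. Promoting the #define to a GUC, along the lines of the proposed log_min_duration_replication, would be a later step.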