Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc. - Mailing list pgsql-hackers
From | Michael Paquier |
---|---|
Subject | Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc. |
Date | |
Msg-id | CAB7nPqQJ7X4Q+hDPxvVHY5Ucic3E7pGb335u7k_qq-yqCdSGaw@mail.gmail.com |
In response to | Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc. (Amir Rohan <amir.rohan@zoho.com>) |
Responses | Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc. (Amir Rohan <amir.rohan@zoho.com>) |
List | pgsql-hackers |
On Thu, Oct 8, 2015 at 6:03 PM, Amir Rohan wrote:
> On 10/08/2015 10:39 AM, Michael Paquier wrote:
>>> Someone mentioned a daisy chain setup which sounds fun. Anything else in
>>> particular? Also, it would be nice to have some canned way to measure
>>> end-to-end replication latency for variable number of nodes.
>>
>> Hm. Do you mean comparing the LSN position between two nodes even if
>> both nodes are not connected to each other? What would you use it for?
>>
>
> In a cascading replication setup, the typical _time_ it takes for a
> COMMIT on master to reach the slave (assuming constant WAL generation
> rate) is an important operational metric.

Hm. You mean the exact amount of time it takes to be sure that a given
WAL position has been flushed on a cascading standby, be it across
multiple layers. Er, that's a bit tough without patching the backend,
where I guess we would need to keep track of when an LSN position has
been flushed. And calls of gettimeofday are expensive, so that does not
sound like a plausible alternative here to me...

> It would be useful to catch future regressions for that metric,
> which may happen even when a patch doesn't outright break cascading
> replication. Just automating the measurement could be useful if
> there's no pg facility that tracks performance over time in
> a regimented fashion. I've seen multiple projects which consider
> a "benchmark suite" to be part of its testing strategy.

Ah, OK. I see. That's a bit out of scope of this patch, and that's
really OS-dependent, but as long as the comparisons can be done on the
same OS it would make sense.

> As for the "daisy chain" thing, it was (IIRC) mentioned in a Josh Berkus
> talk I caught on YouTube. It's possible to set up cascading replication,
> take down the master, and then reinsert it as a replicating slave, so that
> you end up with *all* servers replicating from the
> ancestor in the chain, and no master. I think it was more
> a fun hack than anything, but also an interesting corner case to
> investigate.

Ah, yes. I recall this one. I am sure it made the audience smile. All
the nodes link to each other in a closed circle.

>>> What about going back through the commit log and writing some regression
>>> tests for the real stinkers, if someone cares to volunteer some candidate
>>> bugs
>>
>> I have drafted a list with a couple of items upthread:
>> http://www.postgresql.org/message-id/CAB7nPqSgffSPhOcrhFoAsDAnipvn6WsH2nYkf1KayRm+9_MTGw@mail.gmail.com
>> So based on the existing patch the list becomes as follows:
>> - wal_retrieve_retry_interval with a high value, say setting it to for
>> example 2/3s and loop until it is applied by checking that it has
>> been received by the standby every second.
>> - recovery_target_action
>> - archive_cleanup_command
>> - recovery_end_command
>> - pg_xlog_replay_pause and pg_xlog_replay_resume
>> In the list of things that could have a test, I recall that we should
>> test as well 2PC with the recovery delay, look at a1105c3d. This could
>> be included in 005.
>
> a1105c3 Mar 23 Fix copy & paste error in 4f1b890b137. Andres Freund
> 4f1b890 Mar 15 Merge the various forms of transaction commit & abort
> records. Andres Freund
>
> Is that the right commit?

That's this one. a1105c3 was actually rather tricky... The idea is to
simply check the WAL replay delay with COMMIT PREPARED.

>> The advantage of implementing that now is that we could see if the
>> existing routines are solid enough or not.
>
> I can do this if you point me at a self-contained thread/#issue.

Hm. This patch is already 900 lines, perhaps it would be wiser not to
make it more complicated for now..
--
Michael
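
Editorial illustration, not part of the original message: the wal_retrieve_retry_interval item in the list above describes looping until the standby has received a given WAL position, checking every second. A minimal sketch of such a polling loop might look like the following. It is written in Python rather than the Perl of the actual test suite, and the names `lsn_to_int`, `wait_for_lsn` and `get_standby_lsn` are hypothetical helpers invented here, not routines from the patch. In a real test, `get_standby_lsn` would presumably run a query such as `SELECT pg_last_xlog_receive_location()` on the standby; here it is any callable returning an LSN string.

```python
import time
from typing import Callable


def lsn_to_int(lsn: str) -> int:
    """Convert an LSN string such as '0/3000060' into a 64-bit byte position.

    The text form is two hexadecimal numbers: high 32 bits / low 32 bits.
    """
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def wait_for_lsn(
    get_standby_lsn: Callable[[], str],  # hypothetical: returns the standby's current LSN
    target_lsn: str,
    timeout: float = 30.0,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until the standby has reached target_lsn, or until timeout expires.

    Returns True if the target position was reached, False on timeout.
    """
    target = lsn_to_int(target_lsn)
    deadline = clock() + timeout
    while True:
        if lsn_to_int(get_standby_lsn()) >= target:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


if __name__ == "__main__":
    # Simulated standby that catches up over three polls (made-up positions).
    positions = iter(["0/3000000", "0/3000028", "0/3000060"])
    reached = wait_for_lsn(lambda: next(positions), "0/3000060", interval=0.0)
    print("standby caught up:", reached)
```

Comparing positions as integers rather than as strings avoids wrong results when the hexadecimal parts have different lengths. The `sleep` and `clock` parameters are there only so the loop can be exercised without real waiting.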