Sync Rep: First Thoughts on Code - Mailing list pgsql-hackers
From | Simon Riggs |
---|---|
Subject | Sync Rep: First Thoughts on Code |
Date | |
Msg-id | 1228131740.20796.294.camel@hp_dx2400_1 Whole thread Raw |
Responses |
Re: Sync Rep: First Thoughts on Code
Re: Sync Rep: First Thoughts on Code |
List | pgsql-hackers |
Breaking down of patch into sections works very well for review. Should allow us to get different reviewers on different parts of the code - review wranglers please take note: Dave, Josh. Can you confirm that all the docs on the Wiki page are up to date? There are a few minor discrepancies that make me think it isn't. Examples: "For example, to make a single multi-statement transaction replication asynchronously when the default is the opposite, issue SET LOCAL synchronous_commit TO OFF within the transaction." Do we mean synchronous_replication in this sentence? I think you've copied the text and not changed all of the necessary parts - please re-read the whole section (probably the whole Wiki, actually). "wal_writer_delay" - do we mean wal_sender_delay? Is there some ability to measure the amount of data to be sent and avoid the delay altogether, when the server is sufficiently busy? The reaction to replication_timeout may need to be configurable. I might not want to keep on processing if the information didn't reach the standby. I would prefer in many cases that the transactions that were waiting for walsender would abort, but the walsender kept processing. How can we restart the walsender if it shuts down? Do we want a maximum wait for a transaction and a maximum wait for the server? Do we report stats on how long the replication has been taking? If the average rep time is close to rep timeout then we will be fragile, so we need some way to notice this and produce warnings. Or at least provide info to an external monitoring system. How do we specify the user we use to connect to primary? Definitely need more explanatory comments/README-style docs. For example, 03_libpq seems simple and self-contained. I'm not sure why we have a state called PGASYNC_REPLICATION; I was hoping that would be dynamic, but I'm not sure where to look for that. It would be useful to have a very long comment within the code to explain how the replication messages work, and note on each function who the intended client and server is. 02_pqcomm: What does HAVE_POLL mean? Do we need to worry about periodic renegotiation of keys in be-secure.c? Not sure I understand why so many new functions in there. 04_recovery_conf is a change I agree with, though I think it may not work with EXEC_BACKEND for Windows. 05... I need dome commentary to explain this better. 06 and 07 are large and will take substantial review time. So we must get the overall architecture done first and then check the code that implements that. 08 - I think I get this, but some docs will help to confirm. 09 pg_standby changes: so more changes are coming there? OK. Can we refer to those two options as failover and switchover? There's no need to change definitions that many Postgres people already use. This change can be done without making any change to server behaviour, so this change can have benefit to 8.2 and 8,3 people also. 01_signal_handling: I've looked at the LWlock acquires and releases in the patch and am fairly happy, except for the ProcArrayLock acquire during this sub-patch. Do we really need to do things this way? Is the actual state important? Could we just do this with a counter which cycles? So callers increment counter atomically and the reader just polls to see if anybody has incremented? Or could we protect that part of the proc with a different lock? Touching ProcArrayLock is bad news. Anyway, feeling very positive about this. Hope we can get this reviewed and committed in next 3-4 weeks. I have many clues as to how to structure my own work also. Thanks. -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
pgsql-hackers by date: