Thread: Synch Rep v5
Hi,

I attached the updated version of the Synch Rep patch (v5) on the wiki.
The description of "User Overview" has also been updated.
http://wiki.postgresql.org/wiki/NTT%27s_Development_Projects#Synch_Rep

On Thu, Dec 18, 2008 at 9:55 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> 4. sleeping
>> http://archives.postgresql.org/pgsql-hackers/2008-12/msg00438.php
>>
>> I'm looking for the better idea. How should we resolve that problem?
>> Only reduce the timeout of pq_wait to 100ms? Get rid of
>> SA_RESTART only during pq_wait as follows?
>>
>> remove SA_RESTART
>> pq_wait()
>> add SA_RESTART
>
> Not sure, will consider. Ask others as well.

I don't have a better idea yet. For now (v5), I have only reduced the
pq_wait timeout to 100ms. Is this sufficient? Do you have a better
approach in mind?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Sat, 2009-01-10 at 19:16 +0900, Fujii Masao wrote:
> I attached the updated version of Synch Rep patch (v5) on wiki.
> The description of "User Overview" is also already updated.
> http://wiki.postgresql.org/wiki/NTT%27s_Development_Projects#Synch_Rep

Looks good on an initial read of the wiki. A few minor comments:

The advantage of doing things this way is that the base backup can take
place any way the user chooses, so potentially much faster than using a
single session.

I notice we use the same settings for keepalives. We may need that to be
a second set of parameters.

Progress reporting will be easier with HS, so you are right to leave
that alone.

Don't understand: "Completely automated catching up; User has to carry
out some procedure manually for making the standby catch up."

Multiple standby is still possible, but just using old file based
mechanisms. We would need to be careful about use of %R in that case.

I believe the max delay is 2 * wal_sender_delay.

I like the way recovery_trigger_file avoids changing pg_standby, but I
guess we still need to plug that gap also one day. But does patch 10
also have the other mechanism?

--
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
On Sat, 2009-01-10 at 19:16 +0900, Fujii Masao wrote:
> On Thu, Dec 18, 2008 at 9:55 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> >> 4. sleeping
> >> http://archives.postgresql.org/pgsql-hackers/2008-12/msg00438.php
> >>
> >> I'm looking for the better idea. How should we resolve that problem?
> >> Only reduce the timeout of pq_wait to 100ms? Get rid of
> >> SA_RESTART only during pq_wait as follows?
> >>
> >> remove SA_RESTART
> >> pq_wait()
> >> add SA_RESTART
> >
> > Not sure, will consider. Ask others as well.
>
> I've not got an idea yet. Now (v5), I only reduce the timeout of
> pq_wait to 100ms. Is this sufficient? Do you have any good idea?

To be honest I didn't follow that part of the discussion.

My preferred approach, mentioned earlier in the summer, was to use a
mechanism very similar to LWLocks: a proc queue with semaphores. Minimum
delay, no need for signals. The process doing the wakeup can walk up the
queue until it finds somebody whose wait-for-LSN is higher than what has
just been sent/written. Doing it this way also gives us group commit
when synch rep is not enabled.

--
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Hi,

On Sat, Jan 10, 2009 at 10:42 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Sat, 2009-01-10 at 19:16 +0900, Fujii Masao wrote:
>> I've not got an idea yet. Now (v5), I only reduce the timeout of
>> pq_wait to 100ms. Is this sufficient? Do you have any good idea?
>
> To be honest I didn't follow that part of the discussion.
>
> My preferred approach, mentioned earlier in the summer, was to use a
> mechanism very similar to LWlocks. A proc queue with semaphores. Minimum
> delay, no need for signals. The process doing the wakeup can walk up the
> queue until it finds somebody whose wait-for-LSN is higher than has just
> been sent/written. Doing it this way also gives us group commit when
> synch rep is not enabled.

Yes, using semaphores for the communication was also my first approach.
The problem with that approach is that walsender cannot wait for both
the signal from backends and the response from walreceiver concurrently,
because waiting on a semaphore blocks. So I use a signal for the
communication instead.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Hi,

Thanks for your comments!

On Sat, Jan 10, 2009 at 10:36 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> I notice we use the same settings for keepalives. We may need that to be
> a second set of parameters.

Alternatively, should we make walreceiver execute "SET tcp_keepalives_xxx
TO yyy" before starting replication, if such settings are specified in
recovery.conf?

> Don't understand: "Completely automated catching up; User has to carry
> out some procedure manually for making the standby catch up."

Oh, sorry, this description is not correct; the standby can catch up with
the primary automatically if the archive area is shared between the two
servers. In fact, xlogs generated before / during replication are shipped
by the archiver / walsender, respectively.

I also updated the figures about the flow of xlogs. Please check them.
http://wiki.postgresql.org/wiki/NTT%27s_Development_Projects#Architecture_Design

> Multiple standby is still possible, but just using old file based
> mechanisms. We would need to be careful about use of %R in that case.

Yes. Synch Rep works fine with the existing warm-standby mechanism.

> I believe the max delay is 2 * wal_sender_delay.

In the async replication case, walsender tries to send the xlogs once per
wal_sender_delay, and receives the response from the standby on demand.
So, I think that the max delay is wal_sender_delay. Am I missing
something?

> I like the way recovery_trigger_file avoids changing pg_standby, but I
> guess we still need to plug that gap also one day. But does patch 10
> also have the other mechanism?

As you imply, the current synch-rep no longer needs the change to
pg_standby, so I'll remove that patch from the synch-rep patchset. Of
course, the patch is still useful for the existing warm-standby setup.
Should I add it to the commitfest for 8.5?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
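For context, tcp_keepalives_idle, tcp_keepalives_interval and tcp_keepalives_count are existing server GUCs, so the suggestion above might amount to walreceiver issuing something like the following on its connection to the primary. Driving these from recovery.conf is the proposal under discussion, not an existing feature, and the values are purely illustrative:

```
-- Hypothetical sketch: walreceiver runs these before starting
-- replication, using values taken from recovery.conf.
SET tcp_keepalives_idle TO 60;      -- seconds of idle before first probe
SET tcp_keepalives_interval TO 10;  -- seconds between probes
SET tcp_keepalives_count TO 5;      -- lost probes before dropping the link
```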
On Sun, 2009-01-11 at 17:19 +0900, Fujii Masao wrote:
> Or, we should make walreceiver execute "SET tcp_keepalives_xxx TO yyy"
> before starting replication if such settings are specified in
> recovery.conf?

Sounds reasonable, if maybe not ideal.

> Oh sorry, this description is not correct; the standby can catch up with
> the primary automatically if archive area is shared between those two
> servers. In fact, xlogs generated before / during replication are shipped
> by archiver / walsender, respectively.
>
> I also updated the figures about flow of xlogs. Please check it.
> http://wiki.postgresql.org/wiki/NTT%27s_Development_Projects#Architecture_Design

Can't see anything different!

> Yes. Synch Rep can work fine with existing warm-standby mechanism.

If we want multiple standby servers they wouldn't both be able to trim
files from the archive. So we would need to change pg_standby so it
records the %R from multiple servers on the archive and only trims the
max of those %R values.

> In async replication case, walsender tries to send the xlogs once per
> wal_sender_delay, and receives the response from the standby on
> demand. So, I think that max delay is wal_sender_delay. Am I missing
> something?

Sending takes time as well, so it is send_time + delay at least.

> As you imply, current synch-rep has already not needed the change
> of pg_standby, so I'll get rid of the patch from synch-rep patchset.
> Of course, this patch is still useful for existing warm-standby. I should
> add this patch to commitfest for 8.5?

May as well leave it in, so people can use it with 8.3.

--
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
On Sun, 2009-01-11 at 15:11 +0900, Fujii Masao wrote:
> Yes, using semaphores for the communication is also my first approach.
> The problem of this approach is that walsender cannot wait for both
> signal from backends and the response from walreceiver concurrently,
> because wait-for-semaphore is blocking at least. So, I use signal for
> the communication.

IIUC: In sync mode the backend sends a signal to walsender, then adds
itself to the wait queue on a semaphore. walsender responds to the
signal, sends more WAL, then waits for the response. When the response
comes it wakes the backends on the semaphore. In async mode, no signal
is sent and we do not wait; we just allow walsender to wake up
periodically and send.

Does it release waiters as soon as possible, or does it always respond
to new requests for sending? i.e. which has priority - responding to
waiters who need to be woken, or responding to new requests to send?

--
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Hi,

On Mon, Jan 12, 2009 at 1:10 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> > Multiple standby is still possible, but just using old file based
>> > mechanisms. We would need to be careful about use of %R in that case.
>>
>> Yes. Synch Rep can work fine with existing warm-standby mechanism.
>
> If we want multiple standby servers they wouldn't both be able to trim
> files from the archive. So we would need to change pg_standby so it
> records the %R from multiple servers on the archive and only trims the
> max of those %R values.

s/max/min ?

>> > I believe the max delay is 2 * wal_sender_delay.
>>
>> In async replication case, walsender tries to send the xlogs once per
>> wal_sender_delay, and receives the response from the standby on
>> demand. So, I think that max delay is wal_sender_delay. Am I missing
>> something?
>
> Sending takes time as well, so it is send_time + delay at least.

Right. I'll update the doc with a more precise description.

Similarly, although the documentation of "synchronous_commit" says

-----------
When off, there can be a delay between when success is reported to the
client and when the transaction is really guaranteed to be safe against
a server crash. (The maximum delay is three times wal_writer_delay.)
-----------

the write time should also be added to that maximum delay.

>> > I like the way recovery_trigger_file avoids changing pg_standby, but I
>> > guess we still need to plug that gap also one day. But does patch 10
>> > also have the other mechanism?
>>
>> As you imply, current synch-rep has already not needed the change
>> of pg_standby, so I'll get rid of the patch from synch-rep patchset.
>> Of course, this patch is still useful for existing warm-standby. I should
>> add this patch to commitfest for 8.5?
>
> May as well leave it in, so people can use it with 8.3.

I'd like this patch to be reviewed and committed for 8.4 even if
synch-rep is postponed to 8.5, because it is very useful for the
existing warm-standby mechanism, and warm standby would still be the
only built-in replication solution in 8.4.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Tue, 2009-01-13 at 16:39 +0900, Fujii Masao wrote:
> > May as well leave it in, so people can use it with 8.3.
>
> I'd like this patch to be reviewed and committed for 8.4 even if
> synch-rep is postponed to 8.5. Because this is very useful for existing
> warm-standby mechanism, and only warm-standby would be still a built-in
> replication solution in 8.4.

Yes, no worries.

--
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Hi,

On Mon, Jan 12, 2009 at 1:16 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> IIUC: In sync mode backend sends signal to walsender, then adds itself
> to wait queue on semaphore. walsender responds to signal, sends more WAL
> then waits for response. When response comes it then wakes backends on
> the semaphore. In async mode, no signal is sent and we do not wait, we
> just allow the walsender to wake up periodically and send.

I'd like walsender not to wait for the response in blocking mode,
because then the network delay would directly interfere with transaction
processing. In my design, walsender waits for the response in
non-blocking mode and tries to send the next requested WAL if no
response has arrived yet. So, walsender needs to wait for the signal
from a backend and the response from the standby concurrently.

The problem is how walsender should wait for the signal and the response
concurrently. If we use select/poll for this, walsender cannot respond
to the signal immediately on some platforms. If we don't use them (which
means looping without sleeping), CPU utilization would jump.

> Does it release waiters as soon as possible, or does it always respond
> to new requests for sending? i.e. which has priority - responding to
> waiters who need to be woken or responding to new requests to send?

Responding to waiters should have higher priority, for stability of
response time. In fact, if both the signal from a backend and the
response from the standby have arrived, walsender reads the response and
wakes the waiters up first.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Hi,

On Sat, Jan 10, 2009 at 7:16 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> I attached the updated version of Synch Rep patch (v5) on wiki.
> The description of "User Overview" is also already updated.
> http://wiki.postgresql.org/wiki/NTT%27s_Development_Projects#Synch_Rep

Attached is the updated version of the synch-rep patch (v0114). I
adjusted the patch against the latest head, fixed some bugs and wrote
some documentation. I also attached this patch on the wiki; the README
of the patch and the description of the user overview are there as well.
http://wiki.postgresql.org/wiki/NTT%27s_Development_Projects#Version_History

If you are interested in synch-rep, please try testing or reviewing it.
Any comments and feedback are welcome!

Though I'm not sure whether synch-rep will be postponed to 8.5, I'll at
least bring it to a good place to leave off. Next, I'll try to remove
the file-based log-shipping part from synch-rep, which was requested by
some hackers.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center