Re: Should wal receiver reply to wal sender more aggressively? - Mailing list pgsql-hackers
From | Paul Guo |
---|---|
Subject | Re: Should wal receiver reply to wal sender more aggressively? |
Date | |
Msg-id | CABQrizdRLAYNXNx9AnKeWCSJvvRrzBLJWJoNX3WaDJbusU8BxA@mail.gmail.com |
In response to | Should wal receiver reply to wal sender more aggressively? (Paul Guo <paulguo@gmail.com>) |
List | pgsql-hackers |
[ Resending the mail since my previous email was badly formatted and hard to read. ]

While working on some related issues I found that the wal receiver runs the walrcv_receive() loop before replying with the write/flush/apply LSN to the wal sender in XLogWalRcvSendReply(). It is possible that the walrcv_receive() loop receives and writes a lot of xlog, so it does not reply with that LSN information in time, which ultimately slows down transactions due to the syncrep wait (assuming the default synchronous_commit).

During TPCB testing, I found that in the worst case 10,466,469 bytes were consumed in the walrcv_receive() loop.

More seriously, we call XLogWalRcvSendReply(false, false) after handling those bytes. The first argument, false, means no force, i.e. it does not notify unless the wal_receiver_status_interval GUC timeout (10s by default) has been reached, so we may have to wait for other calls of XLogWalRcvSendReply() to notify the wal sender.

I thought about and tried improving this by flushing and force-replying each time a maximum number of bytes (e.g. 128K) has been received, but several things confused me:

- What's the purpose of the wal_receiver_status_interval GUC? The OS kernel is usually not efficient at handling small packets, but we are not replying that aggressively, so why is this GUC there?

- I ran a simple TPCB test (scale 1000, 200 connections, shared_buffers and max_connections tuned) but found no obvious performance difference with and without the code change. I did not see an obvious system (IO/CPU/network) bottleneck - probably the bottleneck is in PG itself? I did not investigate further at this moment, but the change should in theory help, right?

I may continue investigating, but probably won't unless I have some clear answers to these questions.

Another thing that came to my mind is the wal receiver logic: currently the wal receiver process does network IO, wal write, and wal flush in one process.
Network IO is async, blocking at epoll/poll, etc.; wal write is mostly non-blocking; but for wal flush, probably we could decouple it into a dedicated process? And maybe use sync_file_range instead of fsyncing the wal file in issue_xlog_fsync()? We should sync the wal contents with lower LSNs first and reply to the wal sender in time, right?

Below is the related code:

    /* See if we can read data immediately */
    len = walrcv_receive(wrconn, &buf, &wait_fd);
    if (len != 0)
    {
        /*
         * Process the received data, and any subsequent data we
         * can read without blocking.
         */
        for (;;)
        {
            if (len > 0)
            {
                /*
                 * Something was received from primary, so reset
                 * timeout
                 */
                last_recv_timestamp = GetCurrentTimestamp();
                ping_sent = false;
                XLogWalRcvProcessMsg(buf[0], &buf[1], len - 1);
            }
            else if (len == 0)
                break;
            else if (len < 0)
            {
                ereport(LOG,
                        (errmsg("replication terminated by primary server"),
                         errdetail("End of WAL reached on timeline %u at %X/%X.",
                                   startpointTLI,
                                   LSN_FORMAT_ARGS(LogstreamResult.Write))));
                endofwal = true;
                break;
            }
            len = walrcv_receive(wrconn, &buf, &wait_fd);
        }

        /* Let the primary know that we received some data. */
        XLogWalRcvSendReply(false, false);

        /*
         * If we've written some records, flush them to disk and
         * let the startup process and primary server know about
         * them.
         */
        XLogWalRcvFlush(false);
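
For illustration, the "force a reply every N bytes" idea mentioned earlier could be sketched roughly as below. This is only a minimal sketch under stated assumptions, not the actual patch: ReplyTracker, reply_tracker_init(), reply_tracker_add() and REPLY_THRESHOLD_BYTES are hypothetical names that do not exist in PostgreSQL, and 128K is simply the example value from the mail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical threshold, taken from the "e.g. 128K" example in the mail. */
#define REPLY_THRESHOLD_BYTES ((size_t) 128 * 1024)

/* Hypothetical bookkeeping for the receive loop; not a PostgreSQL struct. */
typedef struct ReplyTracker
{
    size_t      bytes_since_reply;  /* bytes processed since last forced reply */
    size_t      threshold;          /* force a reply once this much accumulates */
    unsigned    forced_replies;     /* number of forced replies requested so far */
} ReplyTracker;

static void
reply_tracker_init(ReplyTracker *t, size_t threshold)
{
    assert(threshold > 0);
    t->bytes_since_reply = 0;
    t->threshold = threshold;
    t->forced_replies = 0;
}

/*
 * Account for one received message of 'len' bytes.
 *
 * Returns true when the accumulated byte count has reached the threshold,
 * i.e. when the caller should flush and send a forced reply. The counter
 * is reset whenever true is returned.
 */
static bool
reply_tracker_add(ReplyTracker *t, size_t len)
{
    t->bytes_since_reply += len;
    if (t->bytes_since_reply < t->threshold)
        return false;

    t->bytes_since_reply = 0;
    t->forced_replies++;
    return true;
}
```

Inside the for (;;) loop quoted above, the assumed usage would be to call reply_tracker_add(&tracker, len) right after XLogWalRcvProcessMsg(), and, when it returns true, call XLogWalRcvFlush(false) followed by XLogWalRcvSendReply(true, false) so that the wal sender sees the new LSNs without waiting for the loop to drain.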