Thread: [HACKERS] Receive buffer size for the statistics socket
So I put in the patch I'd proposed to reduce sleep delays in the stats
regression test, and I see that frogmouth has now failed that test twice,
with symptoms suggesting that it's dropping the last stats report ---
but not all of the stats reports --- from the test's first session.
I considered reverting the test patch, but on closer inspection that
seems like it would be shooting the messenger, because this is indicating
a real and reproducible loss of stats data.

I put in some debug elog's to see how much data is getting shoved at the
stats collector during this test, and it seems that it can be as much as
about 12K, between what the first session can send at exit and what the
second session will send immediately after startup. (I think this value
should be pretty platform-independent, but be it noted that I'm measuring
on a 64-bit Linux system while frogmouth is 32-bit Windows.)

Now, what's significant about that is that frogmouth is a pretty old
Windows version, and what I read on the net is that Windows versions
before 2012 have only 8KB socket receive buffer size by default. So this
behavior is plausibly explained by the theory that the stats collector's
receive buffer is overflowing, causing loss of stats messages. This could
well explain the stats-test failures we've seen in the past too, which
if memory serves were mostly on Windows.

Also, it's clear that a session could easily shove much more than 8KB at
a time out to the stats collector, because what we're doing in the stats
test does not involve touching any very large number of tables. So I
think this is not just a test failure but is telling us about a plausible
mechanism for real-world statistics drops.

I observe a default receive buffer size around 124K on my Linux box,
which seems like it'd be far less prone to overflow than 8K.

I propose that it'd be a good idea to try to set the stats socket's
receive buffer size to be a minimum of, say, 100K on all platforms.
Code for this would be analogous to what we already have in pqcomm.c
(circa line 760) for forcing up the send buffer size, but SO_RCVBUF
not SO_SNDBUF.

A further idea is that maybe backends should be tweaked to avoid
blasting large amounts of data at the stats collector in one go.
That would require more thought to design, though.

Thoughts?

			regards, tom lane
On 05/14/2017 09:54 PM, Tom Lane wrote:
> Also, it's clear that a session could easily shove much more than 8KB at
> a time out to the stats collector, because what we're doing in the stats
> test does not involve touching any very large number of tables. So I
> think this is not just a test failure but is telling us about a plausible
> mechanism for real-world statistics drops.
>
> I observe a default receive buffer size around 124K on my Linux box,
> which seems like it'd be far less prone to overflow than 8K.
>
> I propose that it'd be a good idea to try to set the stats socket's
> receive buffer size to be a minimum of, say, 100K on all platforms.
> Code for this would be analogous to what we already have in pqcomm.c
> (circa line 760) for forcing up the send buffer size, but SO_RCVBUF
> not SO_SNDBUF.

Seems reasonable.

> A further idea is that maybe backends should be tweaked to avoid
> blasting large amounts of data at the stats collector in one go.
> That would require more thought to design, though.

The data is already sent in small < 1 kB messages; I don't see what more
we can do on the sender side to avoid overwhelming the receiver, except
reduce the amount of data sent overall. But that only goes so far; we
cannot eliminate the problem altogether unless we also lose some detail.

It might nevertheless be worthwhile to reduce the overall volume. It
would avoid some overhead, even if the buffer is large enough, although
I don't remember pgstat being significant in any profiling I've done.

One thing that caught my eye at a quick glance is that we are sending
the # of tuples inserted/deleted/updated counters, even for read-only
cases. It seems straightforward to detect that special case, and use an
abbreviated version of PgStat_MsgTabstat without those counters, when we
haven't updated anything. But again, might not be worth the trouble.

- Heikki
Heikki Linnakangas <hlinnaka@iki.fi> writes:
> On 05/14/2017 09:54 PM, Tom Lane wrote:
>> A further idea is that maybe backends should be tweaked to avoid
>> blasting large amounts of data at the stats collector in one go.
>> That would require more thought to design, though.

> The data is already sent in small < 1 kB messages; I don't see what more
> we can do on the sender side to avoid overwhelming the receiver, except
> reduce the amount of data sent overall. But that only goes so far; we
> cannot eliminate the problem altogether unless we also lose some detail.

I was wondering about some sort of rate-throttling on the messages.
For instance, stop after sending X kilobytes, leaving remaining counts
to be sent at the next opportunity. (Although it's not clear if you'd
ever catch up :-(.) Or just a short sleep after every X kilobytes, to
make it more probable that the stats collector gets to run and collect
the data.

			regards, tom lane
I wrote:
> I propose that it'd be a good idea to try to set the stats socket's
> receive buffer size to be a minimum of, say, 100K on all platforms.
> Code for this would be analogous to what we already have in pqcomm.c
> (circa line 760) for forcing up the send buffer size, but SO_RCVBUF
> not SO_SNDBUF.

I experimented with the attached patch. Modern platforms such as recent
Linux and macOS seem to have default receive buffer sizes in the
100K-200K range. The default is less in older systems, but even my very
oldest dinosaurs will let you set it to at least 256K. (Don't have
Windows to try, though.)

I propose committing this (less the debug logging part) to HEAD once the
beta is out, and then back-patching if it doesn't break anything and
seems to improve matters on frogmouth.

			regards, tom lane

diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index d4feed1..d868976 100644
*** a/src/backend/postmaster/pgstat.c
--- b/src/backend/postmaster/pgstat.c
***************
*** 93,98 ****
--- 93,101 ----
  #define PGSTAT_POLL_LOOP_COUNT (PGSTAT_MAX_WAIT_TIME / PGSTAT_RETRY_DELAY)
  #define PGSTAT_INQ_LOOP_COUNT (PGSTAT_INQ_INTERVAL / PGSTAT_RETRY_DELAY)
  
+ /* Minimum receive buffer size for the collector's socket. */
+ #define PGSTAT_MIN_RCVBUF (100 * 1024)
+ 
  /* ----------
   * The initial size hints for the hash tables used in the collector.
*************** retry2:
*** 574,579 ****
--- 577,620 ----
  		goto startup_failed;
  	}
  
+ 	/*
+ 	 * Try to ensure that the socket's receive buffer is at least
+ 	 * PGSTAT_MIN_RCVBUF bytes, so that it won't easily overflow and lose
+ 	 * data.  Use of UDP protocol means that we are willing to lose data under
+ 	 * heavy load, but we don't want it to happen just because of ridiculously
+ 	 * small default buffer sizes (such as 8KB on older Windows versions).
+ 	 */
+ 	{
+ 		int			old_rcvbuf;
+ 		int			new_rcvbuf;
+ 		ACCEPT_TYPE_ARG3 rcvbufsize = sizeof(old_rcvbuf);
+ 
+ 		if (getsockopt(pgStatSock, SOL_SOCKET, SO_RCVBUF,
+ 					   (char *) &old_rcvbuf, &rcvbufsize) < 0)
+ 		{
+ 			elog(LOG, "getsockopt(SO_RCVBUF) failed: %m");
+ 			/* if we can't get existing size, always try to set it */
+ 			old_rcvbuf = 0;
+ 		}
+ 
+ 		new_rcvbuf = PGSTAT_MIN_RCVBUF;
+ 		if (old_rcvbuf < new_rcvbuf)
+ 		{
+ 			if (setsockopt(pgStatSock, SOL_SOCKET, SO_RCVBUF,
+ 						   (char *) &new_rcvbuf, sizeof(new_rcvbuf)) < 0)
+ 				elog(LOG, "setsockopt(SO_RCVBUF) failed: %m");
+ 		}
+ 
+ 		/* this part is just for debugging, not needed at commit: */
+ 		rcvbufsize = sizeof(new_rcvbuf);
+ 		if (getsockopt(pgStatSock, SOL_SOCKET, SO_RCVBUF,
+ 					   (char *) &new_rcvbuf, &rcvbufsize) < 0)
+ 			elog(LOG, "getsockopt(SO_RCVBUF) failed: %m");
+ 		else
+ 			elog(LOG, "getsockopt(SO_RCVBUF) was %d, now %d",
+ 				 old_rcvbuf, new_rcvbuf);
+ 	}
+ 
  	pg_freeaddrinfo_all(hints.ai_family, addrs);
  
  	return;

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
I wrote:
>> I propose that it'd be a good idea to try to set the stats socket's
>> receive buffer size to be a minimum of, say, 100K on all platforms.
>> Code for this would be analogous to what we already have in pqcomm.c
>> (circa line 760) for forcing up the send buffer size, but SO_RCVBUF
>> not SO_SNDBUF.

> I propose committing this (less the debug logging part) to HEAD
> once the beta is out, and then back-patching if it doesn't break
> anything and seems to improve matters on frogmouth.

That went in in 8b0b6303e. frogmouth had failed in 5 of the 23 HEAD runs
it made between 4e37b3e15 and 8b0b6303e. Since then, it has shown zero
failures in 50 runs. I don't know what confidence a statistician would
assign to the proposition that 8b0b6303e improved things, but this is
good enough for me. I'm going to go back-patch it.

			regards, tom lane