Thread: [HACKERS] Receive buffer size for the statistics socket
So I put in the patch I'd proposed to reduce sleep delays in the stats
regression test, and I see that frogmouth has now failed that test twice,
with symptoms suggesting that it's dropping the last stats report ---
but not all of the stats reports --- from the test's first session.
I considered reverting the test patch, but on closer inspection that
seems like it would be shooting the messenger, because this is indicating
a real and reproducible loss of stats data.

I put in some debug elog's to see how much data is getting shoved at the
stats collector during this test, and it seems that it can be as much as
about 12K, between what the first session can send at exit and what the
second session will send immediately after startup. (I think this value
should be pretty platform-independent, but be it noted that I'm measuring
on a 64-bit Linux system while frogmouth is 32-bit Windows.)

Now, what's significant about that is that frogmouth is a pretty old
Windows version, and what I read on the net is that Windows versions
before 2012 have only 8KB socket receive buffer size by default. So this
behavior is plausibly explained by the theory that the stats collector's
receive buffer is overflowing, causing loss of stats messages. This could
well explain the stats-test failures we've seen in the past too, which
if memory serves were mostly on Windows.

Also, it's clear that a session could easily shove much more than 8KB at
a time out to the stats collector, because what we're doing in the stats
test does not involve touching any very large number of tables. So I
think this is not just a test failure but is telling us about a plausible
mechanism for real-world statistics drops.

I observe a default receive buffer size around 124K on my Linux box,
which seems like it'd be far less prone to overflow than 8K.

I propose that it'd be a good idea to try to set the stats socket's
receive buffer size to be a minimum of, say, 100K on all platforms.
Code for this would be analogous to what we already have in pqcomm.c
(circa line 760) for forcing up the send buffer size, but SO_RCVBUF
not SO_SNDBUF.

A further idea is that maybe backends should be tweaked to avoid
blasting large amounts of data at the stats collector in one go.
That would require more thought to design, though.

Thoughts?

			regards, tom lane
On 05/14/2017 09:54 PM, Tom Lane wrote:
> Also, it's clear that a session could easily shove much more than 8KB at
> a time out to the stats collector, because what we're doing in the stats
> test does not involve touching any very large number of tables. So I
> think this is not just a test failure but is telling us about a plausible
> mechanism for real-world statistics drops.
>
> I observe a default receive buffer size around 124K on my Linux box,
> which seems like it'd be far less prone to overflow than 8K.
>
> I propose that it'd be a good idea to try to set the stats socket's
> receive buffer size to be a minimum of, say, 100K on all platforms.
> Code for this would be analogous to what we already have in pqcomm.c
> (circa line 760) for forcing up the send buffer size, but SO_RCVBUF
> not SO_SNDBUF.

Seems reasonable.

> A further idea is that maybe backends should be tweaked to avoid
> blasting large amounts of data at the stats collector in one go.
> That would require more thought to design, though.

The data is already sent in small < 1 kB messages; I don't see what more
we can do on the sender side to avoid overwhelming the receiver, except
reduce the amount of data sent overall. But that only goes so far; we
cannot eliminate the problem altogether unless we also lose some detail.

It might nevertheless be worthwhile to reduce the overall volume. It
would avoid some overhead, even if the buffer is large enough, although
I don't remember pgstat being significant in any profiling I've done.

One thing that caught my eye at a quick glance is that we are sending
the # of tuples inserted/deleted/updated counters, even for read-only
cases. It seems straightforward to detect that special case, and use an
abbreviated version of PgStat_MsgTabstat without those counters, when we
haven't updated anything. But again, might not be worth the trouble.

- Heikki
Heikki Linnakangas <hlinnaka@iki.fi> writes:
> On 05/14/2017 09:54 PM, Tom Lane wrote:
>> A further idea is that maybe backends should be tweaked to avoid
>> blasting large amounts of data at the stats collector in one go.
>> That would require more thought to design, though.

> The data is already sent in small < 1 kB messages; I don't see what more
> we can do on the sender side to avoid overwhelming the receiver, except
> reduce the amount of data sent overall. But that only goes so far; we
> cannot eliminate the problem altogether unless we also lose some detail.

I was wondering about some sort of rate-throttling on the messages.
For instance, stop after sending X kilobytes, leaving remaining counts
to be sent at the next opportunity. (Although it's not clear if you'd
ever catch up :-(.) Or just a short sleep after every X kilobytes, to
make it more probable that the stats collector gets to run and collect
the data.

			regards, tom lane
I wrote:
> I propose that it'd be a good idea to try to set the stats socket's
> receive buffer size to be a minimum of, say, 100K on all platforms.
> Code for this would be analogous to what we already have in pqcomm.c
> (circa line 760) for forcing up the send buffer size, but SO_RCVBUF
> not SO_SNDBUF.

I experimented with the attached patch. Modern platforms such as recent
Linux and macOS seem to have default receive buffer sizes in the
100K-200K range. The default is less in older systems, but even my very
oldest dinosaurs will let you set it to at least 256K. (Don't have
Windows to try, though.)

I propose committing this (less the debug logging part) to HEAD once the
beta is out, and then back-patching if it doesn't break anything and
seems to improve matters on frogmouth.

			regards, tom lane

diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index d4feed1..d868976 100644
*** a/src/backend/postmaster/pgstat.c
--- b/src/backend/postmaster/pgstat.c
***************
*** 93,98 ****
--- 93,101 ----
  #define PGSTAT_POLL_LOOP_COUNT (PGSTAT_MAX_WAIT_TIME / PGSTAT_RETRY_DELAY)
  #define PGSTAT_INQ_LOOP_COUNT (PGSTAT_INQ_INTERVAL / PGSTAT_RETRY_DELAY)
  
+ /* Minimum receive buffer size for the collector's socket. */
+ #define PGSTAT_MIN_RCVBUF (100 * 1024)
+ 
  /* ----------
   * The initial size hints for the hash tables used in the collector.
*************** retry2:
*** 574,579 ****
--- 577,620 ----
  		goto startup_failed;
  	}
  
+ 	/*
+ 	 * Try to ensure that the socket's receive buffer is at least
+ 	 * PGSTAT_MIN_RCVBUF bytes, so that it won't easily overflow and lose
+ 	 * data.  Use of UDP protocol means that we are willing to lose data under
+ 	 * heavy load, but we don't want it to happen just because of ridiculously
+ 	 * small default buffer sizes (such as 8KB on older Windows versions).
+ 	 */
+ 	{
+ 		int			old_rcvbuf;
+ 		int			new_rcvbuf;
+ 		ACCEPT_TYPE_ARG3 rcvbufsize = sizeof(old_rcvbuf);
+ 
+ 		if (getsockopt(pgStatSock, SOL_SOCKET, SO_RCVBUF,
+ 					   (char *) &old_rcvbuf, &rcvbufsize) < 0)
+ 		{
+ 			elog(LOG, "getsockopt(SO_RCVBUF) failed: %m");
+ 			/* if we can't get existing size, always try to set it */
+ 			old_rcvbuf = 0;
+ 		}
+ 
+ 		new_rcvbuf = PGSTAT_MIN_RCVBUF;
+ 		if (old_rcvbuf < new_rcvbuf)
+ 		{
+ 			if (setsockopt(pgStatSock, SOL_SOCKET, SO_RCVBUF,
+ 						   (char *) &new_rcvbuf, sizeof(new_rcvbuf)) < 0)
+ 				elog(LOG, "setsockopt(SO_RCVBUF) failed: %m");
+ 		}
+ 
+ 		/* this part is just for debugging, not needed at commit: */
+ 		rcvbufsize = sizeof(new_rcvbuf);
+ 		if (getsockopt(pgStatSock, SOL_SOCKET, SO_RCVBUF,
+ 					   (char *) &new_rcvbuf, &rcvbufsize) < 0)
+ 			elog(LOG, "getsockopt(SO_RCVBUF) failed: %m");
+ 		else
+ 			elog(LOG, "getsockopt(SO_RCVBUF) was %d, now %d",
+ 				 old_rcvbuf, new_rcvbuf);
+ 	}
+ 
  	pg_freeaddrinfo_all(hints.ai_family, addrs);
  
  	return;

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
I wrote:
>> I propose that it'd be a good idea to try to set the stats socket's
>> receive buffer size to be a minimum of, say, 100K on all platforms.
>> Code for this would be analogous to what we already have in pqcomm.c
>> (circa line 760) for forcing up the send buffer size, but SO_RCVBUF
>> not SO_SNDBUF.

> I propose committing this (less the debug logging part) to HEAD
> once the beta is out, and then back-patching if it doesn't break
> anything and seems to improve matters on frogmouth.

That went in in 8b0b6303e. frogmouth had failed in 5 of the 23 HEAD runs
it made between 4e37b3e15 and 8b0b6303e. Since then, it has shown zero
failures in 50 runs. I don't know what confidence a statistician would
assign to the proposition that 8b0b6303e improved things, but this is
good enough for me. I'm going to go back-patch it.

			regards, tom lane