Thomas Munro <thomas.munro@gmail.com> writes:
> On Wed, Jul 24, 2019 at 11:59 AM Thomas Munro <thomas.munro@gmail.com> wrote:
>> On Tue, Jul 16, 2019 at 12:21 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> In the meantime, we've had *lots* of buildfarm failures in the
>>> added pg_stat_all_tables query, which indicate that indeed the
>>> stats collector mechanism isn't terribly reliable.  But that
>>> doesn't directly prove anything about the original problem,
>>> since the planner doesn't look at stats collector data.
>> I noticed that if you look at the list of failures of this type, there
>> are often pairs of animals belonging to Andres that failed at the same
>> time.  I wonder if he might be running a bunch of animals on one
>> kernel, and need to increase net.core.rmem_max and
>> net.core.rmem_default (or maybe the write side variants, or both, or
>> something like that).
> Andres's animals report the same hostname and run at the same time, so
> it'd be interesting to know what net.core.rmem_max is set to and
> whether these problems go away if it's cranked up 10x higher or
> something.  In a quick test I can see that make installcheck is
> capable of sending a *lot* of 936-byte messages in the same
> millisecond.
Yeah.  I think we've had quite enough of the stats-transmission-related
failures, and they're no longer proving anything about the original
problem.  So I will go do what I proposed in mid-July and revert the
stats queries, while keeping the reltuples/relpages check.  (I'd kind
of like to get more confirmation that the plan shape change is associated
with those fields reading as zeroes, before we decide what to do about the
underlying instability.)
            regards, tom lane