Re: Stats collector frozen? - Mailing list pgsql-general

From Tom Lane
Subject Re: Stats collector frozen?
Msg-id 27665.1169952700@sss.pgh.pa.us
In response to Re: Stats collector frozen?  (Magnus Hagander <magnus@hagander.net>)
List pgsql-general
Magnus Hagander <magnus@hagander.net> writes:
> On Fri, Jan 26, 2007 at 09:55:39AM -0500, Tom Lane wrote:
>> Keep in mind also that we have seen the stats-test failure on
>> non-Windows machines, so we still need to explain that ...

> Yeah. But it *could* be two different stats issues lurking. Perhaps the
> issue we've seen on non-windows can be fixed by the settings Alvaro had
> me try (increasing autovacuum_vacuum_cost_delay or the delay in the
> regression test).

I had a sudden thought about that: the stats machinery is designed to be
unreliable, i.e., to drop messages under load.  Maybe the occasional stats
failures we see are just an artifact of that happening.  It would be
pretty unfortunate if the stats test and autovacuum together were
sufficient load to cause message drops, but I doubt that's the
explanation.  I think the important change here has been the default
enablement of stats_row_level.  That means that some of the tests
terminating just before the stats test starts may still be trying to
dump statistics out to the collector at the same time the stats test is.
(Keep in mind that psql does not wait around for the backend to be
actually gone before it exits, hence backend-exit cleanup is very likely
to happen in parallel with the start of the next test.)  This idea
explains why we mostly see the failure in the parallel tests, not the serial ones:
in the serial schedule there's no opportunity to have a gang of backends
all exiting at the critical time.

If this theory is correct, then we can improve the reliability of the
stats test a good deal if we put a sleep() at the *start* of the test,
to let any old backends get out of the way.  It seems worth a try
anyway.  I'll add this to HEAD and if the stats failure noise seems to
go down, we can back-port it.
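
As a rough illustration of the theory above (not the actual collector
code), the following Python sketch models the collector as a fixed-size
buffer that silently drops messages when it is full. Every name and
number here (simulate, capacity, drain_per_tick, the burst sizes) is
made up for illustration; the real collector receives its messages over
a UDP socket and its limits are different. The only point is to show why
delaying the start of the test could help if a gang of exiting backends
floods the collector first.

```python
from collections import deque


def simulate(test_start_tick, exiting_backends=20, msgs_per_backend=10,
             test_msgs=5, capacity=64, drain_per_tick=50, ticks=20):
    """Return how many of the stats test's messages reach the collector.

    Hypothetical toy model: time advances in ticks, the buffer holds at
    most `capacity` messages, and anything sent while it is full is lost.
    """
    queue = deque()
    delivered_test = 0
    for tick in range(ticks):
        # Assumption: backends left over from the previous tests all
        # flush their statistics in one burst at tick 0.
        if tick == 0:
            for _ in range(exiting_backends * msgs_per_backend):
                if len(queue) < capacity:
                    queue.append("old")
                # otherwise the message is dropped, with no retry

        # The stats test sends its own messages when it starts.
        if tick == test_start_tick:
            for _ in range(test_msgs):
                if len(queue) < capacity:
                    queue.append("test")

        # The collector drains a bounded number of messages per tick.
        for _ in range(min(drain_per_tick, len(queue))):
            if queue.popleft() == "test":
                delivered_test += 1
    return delivered_test


no_sleep = simulate(test_start_tick=0)    # test starts during the burst
with_sleep = simulate(test_start_tick=2)  # test waits for the burst to drain
print(no_sleep, with_sleep)
```

In this toy model the test loses all of its messages when it starts at
the same tick as the burst, and loses none when it starts two ticks
later. That says nothing about whether the real failures have this
cause; it only shows the mechanism the sleep() is meant to work around.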

            regards, tom lane
