Re: Better way of dealing with pgstat wait timeout during buildfarm runs? - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Better way of dealing with pgstat wait timeout during buildfarm runs?
Msg-id CA+TgmoZHEg1aHAewYpV9yCbFFj8sDOFufMnPgyT_2jkj2nU89A@mail.gmail.com
In response to Re: Better way of dealing with pgstat wait timeout during buildfarm runs?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Better way of dealing with pgstat wait timeout during buildfarm runs?  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Sat, Dec 27, 2014 at 7:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Heikki Linnakangas <hlinnakangas@vmware.com> writes:
>> On 12/27/2014 12:16 AM, Alvaro Herrera wrote:
>>> Tom Lane wrote:
>>>> The argument that autovac workers need fresher stats than anything else
>>>> seems pretty dubious to start with.  Why shouldn't we simplify that down
>>>> to "they use PGSTAT_STAT_INTERVAL like everybody else"?
>
>>> The point of wanting fresher stats than that, eons ago, was to avoid a
>>> worker vacuuming a table that some other worker vacuumed more recently
>>> than PGSTAT_STAT_INTERVAL. ...
>>> Nowadays we can probably disregard the whole issue, since starting a new
>>> vacuum just after the prior one finished should not cause much stress to
>>> the system thanks to the visibility map.
>
>> Vacuuming is far from free, even if the visibility map says that most
>> pages are visible to all: you still scan all indexes, if you remove any
>> dead tuples at all.
>
> With typical autovacuum settings, I kinda doubt that there's much value in
> reducing the window for this problem from 500ms to 10ms.  As Alvaro says,
> this was just a partial, kluge solution from the start --- if we're
> worried about such duplicate vacuuming, we should undertake a real
> solution that closes the window altogether.  In any case, timeouts
> occurring inside autovacuum are not directly causing the buildfarm
> failures, since autovacuum's log entries don't reflect into regression
> outputs.  (It's possible that autovacuum's tight tolerance is contributing
> to the failures by increasing the load on the stats collector, but I'm
> not sure I believe that.)
>
> To get back to that original complaint about buildfarm runs failing,
> I notice that essentially all of those failures are coming from "wait
> timeout" warnings reported by manual VACUUM commands.  Now, VACUUM itself
> has no need to read the stats files.  What's actually causing these
> messages is failure to get a timely response in pgstat_vacuum_stat().
> So let me propose a drastic solution: let's dike out this bit in vacuum.c:
>
>     /*
>      * Send info about dead objects to the statistics collector, unless we are
>      * in autovacuum --- autovacuum.c does this for itself.
>      */
>     if ((vacstmt->options & VACOPT_VACUUM) && !IsAutoVacuumWorkerProcess())
>         pgstat_vacuum_stat();
>
> This would have the effect of transferring all responsibility for
> dead-stats-entry cleanup to autovacuum.  For ordinary users, I think
> that'd be just fine.  It might be less fine though for people who
> disable autovacuum, if there still are any.

-1.  I don't think it's a good idea to inflict pain on people who want
to schedule their vacuums manually (and yes, there are some) to get
clean buildfarm runs.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company