Re: BUG #12071: Stat collector went crasy (50MB/s constant writes) - Mailing list pgsql-bugs

From Maxim Boguk
Subject Re: BUG #12071: Stat collector went crasy (50MB/s constant writes)
Date
Msg-id CAK-MWwTmJV4VprhAfCyTKRLndYbwbToF-Z0JYv6_o=qq3VYxyQ@mail.gmail.com
Whole thread Raw
In response to Re: BUG #12071: Stat collector went crasy (50MB/s constant writes)  (Maxim Boguk <maxim.boguk@gmail.com>)
Responses Re: BUG #12071: Stat collector went crasy (50MB/s constant writes)  (Tomas Vondra <tv@fuzzy.cz>)
List pgsql-bugs
> On one of my servers stat collector went crasy on idle system.
>>> It constantly write around 50MB/s on idle database (no activity at all).
>>>
>>
> OK there are results of some entry-level gdb kung-fu:
>
> problem point is:
>             /*
>              * Write the stats file if a new request has arrived that is
> not
>              * satisfied by existing file.
>              */
>             if (last_statwrite < last_statrequest)
>                 pgstat_write_statsfile(false);
>
> on some loop iteration difference between last_statrequest and
> last_statwrite is around 23000-29000 (e.g. 20-30ms).
>
> Hm may be there some clock drift between CPU cores?
>

What's made this situation interesting that the fact the code never got
inside
        /*
         * If there is clock skew between backends and the collector, we
could
         * receive a stats request time that's in the future.  If so,
complain
         * and reset last_statrequest.  Resetting ensures that no inquiry
         * message can cause more than one stats file write to occur.
         */
        if (last_statrequest > last_statwrite)
...
block.

Watchpoint on the (last_statwrite < last_statrequest) condition repetable
fire ONLY inside:
pgstat_recv_inquiry(PgStat_MsgInquiry *msg, int len)
{
    if (msg->inquiry_time > last_statrequest)
        last_statrequest = msg->inquiry_time;
}
function.

Now just some musings/theory (I might be dead wrong of course).
Lets suppose there are no clock skew actually and check execution flow.

Main loop executed:
            if (last_statwrite < last_statrequest)
                pgstat_write_statsfile(false);
and updated last_statwrite value.
However, it took some time because of file size.

During that period (while pgstat busy with writes) some backend issued
PGSTAT_MTYPE_INQUIRY request which will be succesfuly accepted on the next
code lines at
            len = recv(pgStatSock, (char *) &msg,...
And as a result the last_statrequest will be updated to the
msg->inquiry_time value.
And this value will be definitely > than last_statwrite (because this
request issued after pgstat worker stared writing new copy of the stat
file).

So these could be case when this situation could happen without time
travel/clock drift.

Kind Regards,
Maksym

pgsql-bugs by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: BUG #12071: Stat collector went crasy (50MB/s constant writes)
Next
From: Jeff Janes
Date:
Subject: Re: BUG #12071: Stat collector went crasy (50MB/s constant writes)