From Tomas Vondra
Subject Re: Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system
Date
Msg-id 7552def8-c6bf-88b7-b028-549038a24798@2ndquadrant.com
In response to Re: Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system  (Noah Misch <noah@leadboat.com>)
List pgsql-hackers
Hi,

On 03/14/2016 07:14 AM, Noah Misch wrote:
> [Aside: your new mail editor is rewrapping lines in quoted material, and the
> result is messy.  I have rerewrapped one paragraph below.]

Thanks, I've noticed that too. I've been testing Evolution in the past 
few days, and apparently the line wrapping algorithm is broken. I've 
switched back to Thunderbird, so hopefully that'll fix it.

>
> On Mon, Mar 14, 2016 at 02:00:03AM +0100, Tomas Vondra wrote:
>> On Sun, 2016-03-13 at 18:46 -0400, Noah Misch wrote:
>>> I've not attempted to study the behavior on slow hardware.  Instead, my
>>> report used stat-coalesce-v1.patch[1] to simulate slow writes.  (That
>>> diagnostic patch no longer applies cleanly, so I'm attaching a rebased
>>> version.  I've changed the patch name from "stat-coalesce" to
>>> "slow-stat-simulate" to more-clearly distinguish it from the
>>> "pgstat-coalesce" patch.)  Problems remain after applying your patch;
>>> consider "VACUUM pg_am" behavior:
>>>
>>> 9.2 w/ stat-coalesce-v1.patch:
>>>   VACUUM returns in 3s, stats collector writes each file 1x over 3s
>>> HEAD w/ slow-stat-simulate-v2.patch:
>>>   VACUUM returns in 3s, stats collector writes each file 5x over 15s
>>> HEAD w/ slow-stat-simulate-v2.patch and your patch:
>>>   VACUUM returns in 10s, stats collector writes no files over 10s
>>
>> Oh damn, the timestamp comparison in pgstat_recv_inquiry should be in
>> the opposite direction. After fixing that "VACUUM pg_am" completes in 3
>> seconds and writes each file just once.
>
> That fixes things.  "make check" passes under an 8s stats write time.

OK, good.
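
For the archives, the corrected check is essentially the following 
(a simplified sketch of the fix, ignoring the clock-reset issue 
discussed below, not the exact hunk from the patch):

    /*
     * Ignore the inquiry if the stats file for this database was
     * already written at or after the requested cutoff time.
     */
    dbentry = pgstat_get_db_entry(msg->databaseid, false);
    if (dbentry != NULL &&
        msg->cutoff_time <= dbentry->stats_timestamp)
        return;     /* already resolved by the last write */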

>
>> --- a/src/backend/postmaster/pgstat.c
>> +++ b/src/backend/postmaster/pgstat.c
>> @@ -4836,6 +4836,20 @@ pgstat_recv_inquiry(PgStat_MsgInquiry *msg, int len)
>>      }
>>
>>      /*
>> +     * Ignore requests that are already resolved by the last write.
>> +     *
>> +     * We discard the queue of requests at the end of pgstat_write_statsfiles(),
>> +     * so the requests already waiting on the UDP socket at that moment can't
>> +     * be discarded in the previous loop.
>> +     *
>> +     * XXX Maybe this should also care about the clock skew, just like the
>> +     *     block a few lines down.
>
> Yes, it should.  (The problem is large (>~100s), backward clock resets, not
> skew.)  A clock reset causing "msg->clock_time < dbentry->stats_timestamp"
> will usually also cause "msg->cutoff_time < dbentry->stats_timestamp".  Such
> cases need the correction a few lines down.

I'll look into that. I have to admit I have a hard time reasoning about 
the code that handles clock resets, so it might take me some time.
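
If I understand the suggestion correctly, the new check would need the 
same correction as the block a few lines down, i.e. when the inquirer's 
clock_time is older than stats_timestamp, compare against our own clock 
and pull stats_timestamp back if time really went backwards. Roughly 
(just a sketch, untested):

    if (msg->clock_time < dbentry->stats_timestamp)
    {
        TimestampTz cur_ts = GetCurrentTimestamp();

        if (cur_ts < dbentry->stats_timestamp)
        {
            /*
             * Time apparently went backwards (a large clock reset).
             * Reset the last-write timestamp so that subsequent
             * cutoff comparisons behave sanely again.
             */
            dbentry->stats_timestamp = cur_ts - 1;
        }
    }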

>
> The other thing needed here is to look through and update comments about
> last_statrequests.  For example, this loop is dead code due to us writing
> files as soon as we receive one inquiry:
>
>     /*
>      * Find the last write request for this DB.  If it's older than the
>      * request's cutoff time, update it; otherwise there's nothing to do.
>      *
>      * Note that if a request is found, we return early and skip the below
>      * check for clock skew.  This is okay, since the only way for a DB
>      * request to be present in the list is that we have been here since the
>      * last write round.
>      */
>     slist_foreach(iter, &last_statrequests) ...
>
> I'm okay keeping the dead code for future flexibility, but the comment should
> reflect that.

Yes, that's another thing that I'd like to look into. Essentially the 
problem is that we always process the inquiries one by one, so we never 
actually see a list with more than a single element. Correct?

I think the best way to address that is to first check how much data is 
waiting in the UDP queue, and then fetch all of it before actually doing 
the writes. Peeking at the number of pending requests first (or applying 
some reasonable hard-coded limit) should prevent starving the inquirers 
in case of a steady stream of inquiries.
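
Roughly what I have in mind (only a sketch to illustrate the idea; the 
helper name and the limit are made up): before writing, drain whatever 
inquiries are already sitting on the socket, so that a single write 
round can satisfy all of them and the loop over last_statrequests 
actually sees more than one entry.

    /* arbitrary cap, so a steady stream of inquiries can't starve the write */
    #define PGSTAT_DRAIN_LIMIT  100

    /* Hypothetical helper: absorb inquiries already queued on the socket. */
    static void
    pgstat_drain_pending_inquiries(void)
    {
        PgStat_Msg  msg;
        int         len;
        int         drained = 0;

        while (drained++ < PGSTAT_DRAIN_LIMIT)
        {
            /* the collector's socket is non-blocking, so this never waits */
            len = recv(pgStatSock, (char *) &msg, sizeof(msg), 0);
            if (len < 0)
                break;          /* EAGAIN/EWOULDBLOCK: queue is empty */

            if (msg.msg_hdr.m_type == PGSTAT_MTYPE_INQUIRY)
                pgstat_recv_inquiry(&msg.msg_inquiry, len);
            /* other message types would have to be dispatched as usual */
        }
    }

pgstat_write_statsfiles() would then be called only after this drain.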

regards
Tomas

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


