Re: Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system
Date
Msg-id 56AF9E01.8010405@2ndquadrant.com
Whole thread Raw
In response to Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system  (Noah Misch <noah@leadboat.com>)
Responses Re: Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system
List pgsql-hackers
Hi,

On 12/22/2015 03:49 PM, Noah Misch wrote:
> On Mon, Feb 18, 2013 at 06:19:12PM -0300, Alvaro Herrera wrote:
>> I have pushed it now.  Further testing, of course, is always welcome.
>
> While investigating stats.sql buildfarm failures, mostly on animals axolotl
> and shearwater, I found that this patch (commit 187492b) inadvertently removed
> the collector's ability to coalesce inquiries.  Every PGSTAT_MTYPE_INQUIRY
> received now causes one stats file write.  Before, pgstat_recv_inquiry() did:
>
>     if (msg->inquiry_time > last_statrequest)
>         last_statrequest = msg->inquiry_time;
>
> and pgstat_write_statsfile() did:
>
>     globalStats.stats_timestamp = GetCurrentTimestamp();
> ... (work of writing a stats file) ...
>         last_statwrite = globalStats.stats_timestamp;
>         last_statrequest = last_statwrite;
>
> If the collector entered pgstat_write_statsfile() with more inquiries waiting
> in its socket receive buffer, it would ignore them as being too old once it
> finished the write and resumed message processing.  Commit 187492b converted
> last_statrequest to a "last_statrequests" list that we wipe after each write.

So essentially we remove the list of requests, and thus on the next 
round we don't know the timestamp of the last request and write the file 
again unnecessarily. Do I get that right?

What if we instead kept the list but marked the requests as 'invalid' so 
that we know the timestamp? In that case we'd be able to do pretty much 
exactly what the original code did (but at per-db granularity).

We'd have to cleanup the list once in a while not to grow excessively 
large, but something like removing entries older than 
PGSTAT_STAT_INTERVAL should be enough.

Actually, I think that was the idea when I wrote the patch, but 
apparently I got distracted and it did not make it into the code.

>
> I modeled a machine with slow stats writes using the attached diagnostic patch
> (not for commit).  It has pgstat_write_statsfiles() sleep just before renaming
> the temporary file, and it logs each stats message received.  A three second
> delay causes stats.sql to fail the way it did on shearwater[1] and on
> axolotl[2].  Inquiries accumulate until the socket receive buffer overflows,
> at which point the socket drops stats messages whose effects we later test
> for.  The 3s delay makes us drop some 85% of installcheck stats messages.
> Logs from a single VACUUM show receipt of five inquiries ("stats 1:") with 3s
> between them:
>
> 24239 2015-12-10 04:21:03.439 GMT LOG:  connection authorized: user=nmisch database=test
> 24236 2015-12-10 04:21:03.454 GMT LOG:  stats 2: 1888 + 936 = 2824
> 24236 2015-12-10 04:21:03.454 GMT LOG:  stats 2: 2824 + 376 = 3200
> 24236 2015-12-10 04:21:03.454 GMT LOG:  stats 2: 3200 + 824 = 4024
> 24239 2015-12-10 04:21:03.455 GMT LOG:  statement: vacuum pg_class
> 24236 2015-12-10 04:21:03.455 GMT LOG:  stats 1: 4024 + 32 = 4056
> 24236 2015-12-10 04:21:06.458 GMT LOG:  stats 12: 4056 + 88 = 4144
> 24236 2015-12-10 04:21:06.458 GMT LOG:  stats 1: 4144 + 32 = 4176
> 24239 2015-12-10 04:21:06.463 GMT LOG:  disconnection: session time: 0:00:03.025 user=nmisch database=test
host=[local]
> 24236 2015-12-10 04:21:09.486 GMT LOG:  stats 1: 4176 + 32 = 4208
> 24236 2015-12-10 04:21:12.503 GMT LOG:  stats 1: 4208 + 32 = 4240
> 24236 2015-12-10 04:21:15.519 GMT LOG:  stats 1: 4240 + 32 = 4272
> 24236 2015-12-10 04:21:18.535 GMT LOG:  stats 9: 4272 + 48 = 4320
> 24236 2015-12-10 04:21:18.535 GMT LOG:  stats 2: 4320 + 936 = 5256
> 24236 2015-12-10 04:21:18.535 GMT LOG:  stats 2: 5256 + 376 = 5632
> 24236 2015-12-10 04:21:18.535 GMT LOG:  stats 2: 5632 + 264 = 5896
> 24236 2015-12-10 04:21:18.535 GMT LOG:  stats 12: 5896 + 88 = 5984
>
>
> As for how to fix this, the most direct translation of the old logic is to
> retain last_statrequests entries that could help coalesce inquiries.I lean
> toward that for an initial, back-patched fix.

That seems reasonable and I believe it's pretty much the idea I came up 
with above, right? Depending on how you define "entries that could help 
coalesce inquiries".
> It would be good, though, to
> process two closely-spaced, different-database inquiries in one
> pgstat_write_statsfiles() call.  We do one-database writes and all-databases
> writes, but we never write "1 < N < all" databases despite the code prepared
> to do so.  I tried calling pgstat_write_statsfiles() only when the socket
> receive buffer empties.  That's dead simple to implement and aggressively
> coalesces inquiries (even a 45s sleep did not break stats.sql), but it starves
> inquirers if the socket receive buffer stays full persistently.  Ideally, I'd
> want to process inquiries when the buffer empties _or_ when the oldest inquiry
> is X seconds old.  I don't have a more-specific design in mind, though.

That's a nice idea, but I agree that binding the coalescing to buffer 
like this seems like a pretty bad idea exactly because of the starving. 
What might work though is if we could look at how much data is there in 
the buffer, process only those requests and then write the stats files.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



pgsql-hackers by date:

Previous
From: Artur Zakirov
Date:
Subject: Re: Fuzzy substring searching with the pg_trgm extension
Next
From: David Steele
Date:
Subject: Re: PostgreSQL Audit Extension