Re: Tracing down buildfarm "postmaster does not shut down" failures - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Tracing down buildfarm "postmaster does not shut down" failures
Date
Msg-id 20160210160635.vm26bxzxqpogxbbs@alap3.anarazel.de
Whole thread Raw
In response to Re: Tracing down buildfarm "postmaster does not shut down" failures  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On 2016-02-09 22:27:07 -0500, Tom Lane wrote:
> The idea I was toying with is that previous filesystem activity (making
> the temp install, the server's never-fsync'd writes, etc) has built up a
> bunch of dirty kernel buffers, and at some point the kernel goes nuts
> writing all that data.  So the issues we're seeing would come and go
> depending on the timing of that I/O spike.  I'm not sure how to prove
> such a theory from here.

It'd be interesting to monitor
$ grep -E '^(Dirty|Writeback):' /proc/meminfo
output. At least on linux. It's terribly easy to get the kernel into a
state where it has so much data needing to be written back that an
immediate checkpoint takes pretty much forever.

If I understand the code correctly, once a buffer has been placed into
'writeback', it'll be more-or-less processed in order. That can e.g. be
because these buffers have been written to more than 30s ago. If there
then are buffers later that also need to be written back (e.g. due to an
fsync()), you'll often wait for the earlier ones.

Andres



pgsql-hackers by date:

Previous
From: David Steele
Date:
Subject: Re: Updated backup APIs for non-exclusive backups
Next
From: David Steele
Date:
Subject: Re: Updated backup APIs for non-exclusive backups