Re: Tracing down buildfarm "postmaster does not shut down" failures - Mailing list pgsql-hackers

From	Andres Freund
Subject	Re: Tracing down buildfarm "postmaster does not shut down" failures
Date	February 10, 2016 16:06:41
Msg-id	20160210160635.vm26bxzxqpogxbbs@alap3.anarazel.de
In response to	Re: Tracing down buildfarm "postmaster does not shut down" failures (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-hackers

On 2016-02-09 22:27:07 -0500, Tom Lane wrote:
> The idea I was toying with is that previous filesystem activity (making
> the temp install, the server's never-fsync'd writes, etc) has built up a
> bunch of dirty kernel buffers, and at some point the kernel goes nuts
> writing all that data.  So the issues we're seeing would come and go
> depending on the timing of that I/O spike.  I'm not sure how to prove
> such a theory from here.

It'd be interesting to monitor the output of
$ grep -E '^(Dirty|Writeback):' /proc/meminfo
at least on Linux. It's terribly easy to get the kernel into a
state where it has so much data needing to be written back that an
immediate checkpoint takes pretty much forever.
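
To watch those two counters over time rather than as a single snapshot, something like the following could be used. This is only a minimal sketch: the helper names `parse_meminfo` and `monitor`, the one-second interval and the sample count are all assumptions for illustration, and it relies on the usual "Name:   value kB" layout of /proc/meminfo on Linux.

```python
import time


def parse_meminfo(text, fields=("Dirty", "Writeback")):
    """Return {field: value_in_kB} for the requested /proc/meminfo fields.

    Names are matched exactly, so "WritebackTmp" is not picked up,
    mirroring the anchored grep pattern '^(Dirty|Writeback):'.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name in fields:
            values[name] = int(rest.split()[0])
    return values


def monitor(path="/proc/meminfo", interval=1.0, samples=10):
    """Print the selected counters once per interval (hypothetical defaults)."""
    for _ in range(samples):
        with open(path) as f:
            values = parse_meminfo(f.read())
        stamp = time.strftime("%H:%M:%S")
        print(stamp, " ".join(f"{k}={v}kB" for k, v in values.items()))
        time.sleep(interval)
```

Calling `monitor()` while the test run is in progress would show whether a spike in Dirty or Writeback lines up with the slow shutdown.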

If I understand the code correctly, once a buffer has been placed into
'writeback', it'll be processed more or less in order. A buffer can end
up there e.g. because it was dirtied more than 30s ago. If buffers
queued later also need to be written back (e.g. due to an fsync()),
you'll often wait for the earlier ones first.
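
A rough way to probe for that kind of wait is to time an fsync() of a tiny, freshly written file while the other I/O is going on. The sketch below is an illustration under assumptions, not a measurement tool: `time_fsync` is a hypothetical helper, the 8 kB payload is arbitrary, and whether the delay actually shows up depends on the filesystem and kernel in use.

```python
import os
import tempfile
import time


def time_fsync(directory=None, payload=b"x" * 8192):
    """Write a small temp file in `directory`, fsync it, return the fsync time in seconds."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        os.write(fd, payload)
        start = time.monotonic()
        os.fsync(fd)  # blocks until the kernel reports the data as flushed
        return time.monotonic() - start
    finally:
        os.close(fd)
        os.unlink(path)
```

Pointing `directory` at the filesystem holding the data directory and comparing the result on an idle machine against one with a large Dirty value would give some indication of whether the small fsync is stuck behind earlier writeback.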

Andres
