Re: Maximum number of WAL files in the pg_xlog directory - Mailing list pgsql-hackers

From Guillaume Lelarge
Subject Re: Maximum number of WAL files in the pg_xlog directory
Date
Msg-id CAECtzeXTJFdAC4S8FKP+FVrFtfiOPq8iHMJior7v4xdLQEC=Bg@mail.gmail.com
In response to Maximum number of WAL files in the pg_xlog directory  (Guillaume Lelarge <guillaume@lelarge.info>)
Responses Re: Maximum number of WAL files in the pg_xlog directory
List pgsql-hackers
On 8 August 2014 at 09:08, "Guillaume Lelarge" <guillaume@lelarge.info> wrote:
>
> Hi,
>
> As part of our monitoring work for our customers, we stumbled upon an
> issue with our customers' servers that have a wal_keep_segments setting
> higher than 0.
>
> We have a monitoring script that checks the number of WAL files in the
> pg_xlog directory, according to the setting of three parameters
> (checkpoint_completion_target, checkpoint_segments, and
> wal_keep_segments). We usually add a percentage to the usual formula:
>
> greatest(
>   (2 + checkpoint_completion_target) * checkpoint_segments + 1,
>   checkpoint_segments + wal_keep_segments + 1
> )
>
> And we have lots of alerts from the script for customers who set their
> wal_keep_segments setting higher than 0.
>
> So we started to question this sentence of the documentation:
>
> There will always be at least one WAL segment file, and will normally not
> be more than (2 + checkpoint_completion_target) * checkpoint_segments + 1
> or checkpoint_segments + wal_keep_segments + 1 files.
>
> (http://www.postgresql.org/docs/9.3/static/wal-configuration.html)
>
> While doing some tests, it appears it would be more something like:
>
> wal_keep_segments + (2 + checkpoint_completion_target) * checkpoint_segments + 1
>
> But after reading the source code (src/backend/access/transam/xlog.c),
> the right formula seems to be:
>
> wal_keep_segments + 2 * checkpoint_segments + 1
>
> Here is how we arrived at this formula...
>
> CreateCheckPoint(...) is responsible, among other things, for deleting
> and recycling old WAL files. From src/backend/access/transam/xlog.c,
> master branch, line 8363:
>
> /*
>  * Delete old log files (those no longer needed even for previous
>  * checkpoint or the standbys in XLOG streaming).
>  */
> if (_logSegNo)
> {
>     KeepLogSeg(recptr, &_logSegNo);
>     _logSegNo--;
>     RemoveOldXlogFiles(_logSegNo, recptr);
> }
>
> The KeepLogSeg(...) function takes care of wal_keep_segments. From
> src/backend/access/transam/xlog.c, master branch, line 8792:
>
> /* compute limit for wal_keep_segments first */
> if (wal_keep_segments > 0)
> {
>     /* avoid underflow, don't go below 1 */
>     if (segno <= wal_keep_segments)
>         segno = 1;
>     else
>         segno = segno - wal_keep_segments;
> }
>
> IOW, the segment number (segno) is decremented according to the setting
> of wal_keep_segments. segno is then sent back to CreateCheckPoint(...)
> via _logSegNo. RemoveOldXlogFiles() gets this segment number so that it
> can remove or recycle all files before this segment number. This function
> gets the number of WAL files to recycle with the XLOGfileslop constant,
> which is defined as:
>
> /*
>  * XLOGfileslop is the maximum number of preallocated future XLOG segments.
>  * When we are done with an old XLOG segment file, we will recycle it as a
>  * future XLOG segment as long as there aren't already XLOGfileslop future
>  * segments; else we'll delete it.  This could be made a separate GUC
>  * variable, but at present I think it's sufficient to hardwire it as
>  * 2*CheckPointSegments+1.  Under normal conditions, a checkpoint will free
>  * no more than 2*CheckPointSegments log segments, and we want to recycle all
>  * of them; the +1 allows boundary cases to happen without wasting a
>  * delete/create-segment cycle.
>  */
> #define XLOGfileslop    (2*CheckPointSegments + 1)
>
> (in src/backend/access/transam/xlog.c, master branch, line 100)
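
For readers who prefer to experiment, the two excerpts quoted above can be modeled roughly as follows. This is only a sketch in Python, not PostgreSQL code: the function names keep_log_seg and xlog_file_slop are made up for illustration, and the model covers just the quoted lines, not any other logic the real KeepLogSeg() may contain.

```python
def keep_log_seg(segno: int, wal_keep_segments: int) -> int:
    """Model of the quoted KeepLogSeg() excerpt (hypothetical helper).

    Steps the segment number back by wal_keep_segments and never goes
    below segment 1, mirroring the underflow guard in the C code.
    """
    if wal_keep_segments > 0:
        if segno <= wal_keep_segments:
            return 1
        return segno - wal_keep_segments
    return segno


def xlog_file_slop(checkpoint_segments: int) -> int:
    """Model of the XLOGfileslop macro: 2*CheckPointSegments + 1."""
    return 2 * checkpoint_segments + 1
```

With assumed values, keep_log_seg(500, 100) gives 400, keep_log_seg(50, 100) is clamped to 1, and xlog_file_slop(10) gives 21.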
>
> IOW, PostgreSQL will keep wal_keep_segments WAL files before the current
> WAL file, and then there may be 2*CheckPointSegments + 1 recycled ones.
> Hence the formula:
>
> wal_keep_segments + 2 * checkpoint_segments + 1
>
> And this is what we usually find on our customers' servers. We may find
> more WAL files, depending on the write activity of the cluster, but on
> average, we get this number of WAL files.
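
To see how far apart the two estimates can be, here is a small sketch that evaluates the documented formula and the source-derived formula side by side. It is an illustration only: the function names are invented for this example, and the parameter values in the note below are assumed, not taken from any real server.

```python
def documented_limit(checkpoint_segments, checkpoint_completion_target,
                     wal_keep_segments):
    """Upper bound as stated in the 9.3 documentation sentence quoted above."""
    return max(
        (2 + checkpoint_completion_target) * checkpoint_segments + 1,
        checkpoint_segments + wal_keep_segments + 1,
    )


def source_based_estimate(checkpoint_segments, wal_keep_segments):
    """Estimate derived from the xlog.c reading:
    wal_keep_segments + 2 * checkpoint_segments + 1."""
    return wal_keep_segments + 2 * checkpoint_segments + 1
```

Assuming checkpoint_segments = 10, checkpoint_completion_target = 0.5 and wal_keep_segments = 100, the documented formula gives 111 while the source-based estimate gives 121, so a monitoring threshold built on the documented formula would fire in that configuration. With wal_keep_segments = 0 the documented formula gives 26 and the source-based estimate 21, which would explain why the alerts only show up for servers with wal_keep_segments above 0.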
>
> AFAICT, the documentation is wrong about the usual number of WAL files in
> the pg_xlog directory. But I may be wrong, in which case the
> documentation isn't clear enough for me, and should be fixed so that
> others can't misinterpret it like I may have done.
>
> Any comments? Did I miss something, or should we fix the documentation?
>
> Thanks.
>

Ping?
