Thread: Re: [COMMITTERS] pgsql: Reduce checkpoints and WAL traffic on low activity database serv

On Wed, Nov 2, 2011 at 12:34 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Wed, Nov 2, 2011 at 4:15 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Wed, Nov 2, 2011 at 11:39 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> Reduce checkpoints and WAL traffic on low activity database server
>>> Previously, we skipped a checkpoint if no WAL had been written since
>>> last checkpoint, though this does not appear in user documentation.
>>> As of now, we skip a checkpoint until we have written at least one
>>> enough WAL to switch the next WAL file. This greatly reduces the
>>> level of activity and number of WAL messages generated by a very
>>> low activity server. This is safe because the purpose of a checkpoint
>>> is to act as a starting place for a recovery, in case of crash.
>>> This patch maintains minimal WAL volume for replay in case of crash,
>>> thus maintaining very low crash recovery time.
>>
>> I think you need to update the docs, for checkpoint_timeout if nothing else.
>
> Checkpoints have always been skipped, if no activity. So the docs
> don't need changing.

Eh, well, I don't agree.  The section on WAL configuration says:

>> A checkpoint is created every checkpoint_segments log segments, or every checkpoint_timeout seconds, whichever comes
first.

Now, admittedly, that doesn't mention that a checkpoint won't be
created if there's absolutely no activity, but the fact that we might
now not do it even if there has been some minimal activity seems worth
adding.

Similarly, the discussion of checkpoint_segments itself says:

Maximum time between automatic WAL checkpoints, in seconds. The
default is five minutes (5min). Increasing this parameter can increase
the amount of time needed for crash recovery. This parameter can only
be set in the postgresql.conf file or on the server command line.

Again, it's no longer the maximum time between automatic checkpoints.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


On Wed, Nov 2, 2011 at 6:27 PM, Robert Haas <robertmhaas@gmail.com> wrote:

> Again, it's no longer the maximum time between automatic checkpoints.

You're missing the point that it never was like that. I've not altered
the situation.

And it doesn't matter either, so I will ignore.

If you see a need to correct those docs, feel free.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


On 11/02/2011 05:48 PM, Simon Riggs wrote:
> On Wed, Nov 2, 2011 at 6:27 PM, Robert Haas<robertmhaas@gmail.com>  wrote:
>
>    
>> Again, it's no longer the maximum time between automatic checkpoints.
>>      
> You're missing the point that it never was like that. I've not altered
> the situation.
>    

Robert's point is more that the existing docs are already broken; this 
new patch can just increase how bad the drift between reality and 
documentation can be.  Before, the only people who ran into this had 
zero activity on the server, which meant there wasn't any data to be 
lost, either.  Now it's potentially broader than that.

With some trivial checkpoints containing a small amount of data skipped 
now, aren't there some cases where less WAL data will be written than 
before?  In that case, the user visible behavior here would be 
different.  I'd be most concerned about file-based log shipping case.

I don't think there's any change needed to the "Write Ahead Log" section 
of the "Server Configuration" chapter.  In the "Reliability and the 
Write-Ahead Log" chapter, this text in "WAL Configuration" was already 
highlighted as the problem here:

The server's background writer process automatically performs a 
checkpoint every so often. A checkpoint is created every 
checkpoint_segments log segments, or every checkpoint_timeout seconds, 
whichever comes first. The default settings are 3 segments and 300 
seconds (5 minutes), respectively. It is also possible to force a 
checkpoint by using the SQL command CHECKPOINT.

I think this needs a change like this, to address the hole that was 
already in the docs and cover the new behavior too; this goes just 
before " It is also possible to force..."

In cases where there are little or no writes to the WAL, checkpoints 
will be skipped even if checkpoint_timeout has passed.  At least one new 
WAL segment must have been created before an automatic checkpoint 
occurs.  The time between checkpoints and when new WAL segments are 
created are not related in any other way.  If file-based WAL shipping is 
being used and you want to bound how often files are sent to standby 
server, to reduce potential data loss you should adjust archive_timeout 
parameter rather than the checkpoint ones.

This area is a confusing one, so some clarification of the relation 
between checkpoints and replication is a useful docs improvement.

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



Greg Smith <greg@2ndQuadrant.com> writes:
> On 11/02/2011 05:48 PM, Simon Riggs wrote:
>> You're missing the point that it never was like that. I've not altered
>> the situation.

> Robert's point is more that the existing docs are already broken; this 
> new patch can just increase how bad the drift between reality and 
> documentation can be.

Yeah.  I agree that we need to adjust the docs.  Anybody object
to Greg's proposed text?
        regards, tom lane


On Thu, Nov 3, 2011 at 12:21 AM, Greg Smith <greg@2ndquadrant.com> wrote:

> With some trivial checkpoints containing a small amount of data skipped now,
> aren't there some cases where less WAL data will be written than before?  In
> that case, the user visible behavior here would be different.  I'd be most
> concerned about file-based log shipping case.

Typical settings are checkpoint_timeout = 300 and archive_timeout =
30. So file-based replication users won't notice any difference.

The only people who will see a difference are people with
archive_timeout = 0 and who either store or stream WAL. For those
people, a keepalive message will be available to ensure we can check
the link is up, even if no WAL data flows, which I am working on next.

The change also increases durability, since the secondary checkpoint
is usually in a separate file.

> In cases where there are little or no writes to the WAL, checkpoints will be
> skipped even if checkpoint_timeout has passed.  At least one new WAL segment
> must have been created before an automatic checkpoint occurs.  The time
> between checkpoints and when new WAL segments are created are not related in
> any other way.  If file-based WAL shipping is being used and you want to
> bound how often files are sent to standby server, to reduce potential data
> loss you should adjust archive_timeout parameter rather than the checkpoint
> ones.

Committed, thanks.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


On Wed, Nov 2, 2011 at 8:21 PM, Greg Smith <greg@2ndquadrant.com> wrote:
> In cases where there are little or no writes to the WAL, checkpoints will be
> skipped even if checkpoint_timeout has passed.  At least one new WAL segment
> must have been created before an automatic checkpoint occurs.  The time
> between checkpoints and when new WAL segments are created are not related in
> any other way.  If file-based WAL shipping is being used and you want to
> bound how often files are sent to standby server, to reduce potential data
> loss you should adjust archive_timeout parameter rather than the checkpoint
> ones.

I think this is good, although "where there are little or no writes to
the WAL" seems a bit awkward to me - how about "where little or no WAL
has been written"?

I would probably delete "to reduce potential data loss" from the last
sentence, since I think that sentence has a few too many clauses to be
easily parseable.

Should we also put a similar sentence into the documentation for
checkpoint_timeout?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company