Re: should we enable log_checkpoints out of the box? - Mailing list pgsql-hackers

From Jan Wieck
Subject Re: should we enable log_checkpoints out of the box?
Msg-id 4cdc6a8d-cc7f-a9b6-c5de-361be048ce72@wi3ck.info
In response to Re: should we enable log_checkpoints out of the box?  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On 10/31/21 16:16, Andres Freund wrote:
> Hi,
> 
> On 2021-10-31 15:43:57 -0400, Tom Lane wrote:
>> Andres Freund <andres@anarazel.de> writes:
>> > On 2021-10-31 10:59:19 -0400, Tom Lane wrote:
>> >> No DBA would be likely to consider it as anything but log spam.
>> 
>> > I don't agree at all. No postgres instance should be run without
>> > log_checkpoints enabled. Performance is poor if checkpoints are
>> > triggered by anything but time, and that can only be diagnosed if
>> > log_checkpoints is on.
>> 
>> This is complete nonsense.
> 
> Shrug. It's based on many years of doing or being around people doing
> postgres support escalation shifts. And it's not like log_checkpoints
> incurs meaningful overhead or causes that much log volume.

I agree with Andres 100%. Whenever I am called in to diagnose any type of
problem, this is on the usual checklist, and very few customers have it
turned on. The usefulness of this information far outweighs the tiny
amount of extra log volume it creates.

>> If we think that's a generic problem, we should be fixing the problem
>> (ie, making the checkpointer smarter);
> 
> We've made it less bad (checkpoint_segments -> max_wal_size, sorting IO
> for checkpoints, forcing the OS to flush writes earlier). But it's still
> a significant issue. It's not that easy to make it better.

And we kept the default for max_wal_size at 1GB. While it is a "soft"
limit, it is the main reason why instances run full bore with a huge
percentage of full page writes: the default is far too small for their
throughput, and nothing in the logs warns them about it. I can run a
certain TPC-C workload on an 8-core machine quite comfortably when
max_wal_size is configured at 100GB. The exact same TPC-C configuration
will spiral the machine down if left with the default max_wal_size, and
there is zero hint in the logs as to why.
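
To make the diagnostic value concrete, here is a minimal Python sketch of
the kind of check log_checkpoints makes possible: tallying how many
checkpoints were started by the timeout versus by WAL volume. It assumes
the server log contains "checkpoint starting: ..." messages whose flags
include "time" or "wal" ("xlog" in older releases). The function names and
the sample lines are hypothetical and only for illustration; real lines
carry whatever log_line_prefix is configured.

```python
import re
from collections import Counter

# Matches the "checkpoint starting: <flags>" message. Whatever precedes it
# (timestamp, pid, ...) depends on log_line_prefix and is ignored here.
START_RE = re.compile(r"checkpoint starting:\s*(?P<flags>.*)$")


def count_checkpoint_triggers(lines):
    """Count checkpoint starts by trigger: 'wal', 'time', or 'other'."""
    counts = Counter()
    for line in lines:
        match = START_RE.search(line.rstrip())
        if not match:
            continue  # not a checkpoint-start message
        flags = match.group("flags").split()
        # Check WAL first: a WAL-triggered checkpoint is the symptom of
        # max_wal_size being too small for the write volume.
        if "wal" in flags or "xlog" in flags:
            counts["wal"] += 1
        elif "time" in flags:
            counts["time"] += 1
        else:
            counts["other"] += 1  # e.g. explicitly requested checkpoints
    return counts


def wal_triggered_fraction(lines):
    """Fraction of logged checkpoint starts that were triggered by WAL."""
    counts = count_checkpoint_triggers(lines)
    total = sum(counts.values())
    return counts["wal"] / total if total else 0.0


# Illustrative sample lines (made up, not taken from a real server).
SAMPLE = [
    "LOG:  checkpoint starting: time",
    "LOG:  checkpoint starting: wal",
    "LOG:  database system is ready to accept connections",
    "LOG:  checkpoint starting: wal",
    "LOG:  checkpoint starting: immediate force wait",
]

if __name__ == "__main__":
    print(dict(count_checkpoint_triggers(SAMPLE)))
    print(wal_triggered_fraction(SAMPLE))
```

A high WAL-triggered fraction would be the hint that is missing today when
log_checkpoints is off: the instance is checkpointing because it ran out of
WAL headroom, not because the timeout expired.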




-- 
Jan Wieck


