Re: Redesigning checkpoint_segments - Mailing list pgsql-hackers

From: Robert Haas
Subject: Re: Redesigning checkpoint_segments
Msg-id: CA+TgmobVkrcex_g+b5LkwVA=Yzz=vuWaxvbNe4o0-kpUcoy35w@mail.gmail.com
In response to: Re: Redesigning checkpoint_segments (Josh Berkus <josh@agliodbs.com>)
List: pgsql-hackers
On Tue, Feb 3, 2015 at 4:18 PM, Josh Berkus <josh@agliodbs.com> wrote:
>>> That's different from our current checkpoint_segments setting. With
>>> checkpoint_segments, the documented formula for calculating the disk usage
>>> is (2 + checkpoint_completion_target) * checkpoint_segments * 16 MB. That's
>>> a lot less intuitive to set.
>>
>> Hmm, that's different from what I was thinking.  We probably shouldn't
>> call that max_checkpoint_segments, then.  I got confused and thought
>> you were just trying to decouple the number of segments that it takes
>> to trigger a checkpoint from the number we keep preallocated.
>
> Wait, what?  Because the new setting is an actual soft maximum, we
> *shouldn't* call it a maximum?  Or are you saying something else?

I am saying that I proposed calling it max_checkpoint_segments because
I thought it was the maximum number of segments between checkpoints.
But it's not.
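
To spell out the arithmetic behind the formula Heikki quotes: with the
stock defaults of checkpoint_segments = 3 and
checkpoint_completion_target = 0.5, the documented expected peak disk
usage works out to

    (2 + 0.5) * 3 * 16 MB = 120 MB

whereas the new setting, being an actual soft maximum, would simply
*be* the cap on disk usage, with no formula required.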

>> I wouldn't object to raising it a little bit, but that's way too high.
>> It's entirely possible to have a small database that generates a lot of
>> WAL. A table that has only a few rows, but is updated very very
>> frequently, for example. And checkpointing such a database is quick too,
>> so frequent checkpoints are not a problem. You don't want to end up with
>> 1.5 GB of WAL on a 100 MB database.
>
> I suggest 192MB instead (12 segments).  That almost doubles our current
> real default, without requiring huge disk space which might surprise
> some users.
>
> In practice, checkpoint_segments is impossible to automatically tune
> correctly.  So let's be conservative.

We are too often far too conservative about these things.  If we make
the default 192MB, it will only ever get tuned in one direction: up.
It is not a bad thing to set the defaults high enough that, once in a
great while, someone finds them too high rather than too low.

I find it amazing that anyone here thinks that a user would be OK with
using 192MB of space for WAL, but 384MB would break the bank.  The
hard drive in my laptop is 456GB.  The point is, with Heikki's work
here, you're only going to use the maximum amount of space if you have
massive write activity.  And if you have massive write activity, it's
extremely likely that you will be OK with using a very modest amount
of disk space to have that be fast.  Right now, we have to be really
conservative because we're going to use the full allocation all the
time, but this fixes that.  I think.

If somebody were to propose limiting the size of the database to
192MB, and requiring a configuration setting to make it larger,
everybody would say that's a terrible idea.  Heck, if I were to
propose limiting the database to 19.2GB, and require a configuration
setting to make it larger, everybody would say that's a terrible idea.
But what we actually have is not far off from that.  Sure, you can
create a 20GB database with an out-of-the-box configuration, but you'd
better get out your pillow before starting the data load, because with
checkpoint_segments=3 that's going to be fantastically slow.  And
you'd better hope that the update rate is pretty low, too, because if
it's anything even slightly interesting you're going to be spewing
checkpoint warnings into the log.  So our settings need to *support*
creating a 20GB database out of the box, but it's OK if it performs
absolutely terribly.
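
To make that concrete: the standard advice today is to hand-edit
postgresql.conf before a big data load, along these lines (the values
here are illustrative, not recommendations):

    checkpoint_segments = 64            # default is 3
    checkpoint_completion_target = 0.9  # spread checkpoint I/O out

Skip that step and the load writes through 3 segments' worth of WAL
(48 MB) almost instantly, forcing a checkpoint each time, and the log
fills with warnings roughly of the form:

    LOG:  checkpoints are occurring too frequently (2 seconds apart)
    HINT:  Consider increasing the configuration parameter "checkpoint_segments".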

I really have a hard time believing that there are many people who are
going to complain about WAL utilization peaking at 1.6GB (my initial
proposal).  Your database is probably rapidly expanding, and the WAL
utilization will drop when it stops.  And if it isn't rapidly
expanding, because you're doing a ton of updates in place, you'll
probably still be happier to spend a little extra disk space than to
have it be cripplingly slow.  And if you're not, then, first, what is
wrong with you, and second, well then you can turn down the setting.
That's why we have settings.  I enjoy getting paid to tell people to
increase checkpoint_segments by two orders of magnitude as much as the
next PostgreSQL consultant, but I don't enjoy the fact that people
benchmark the default configuration and get terrible results because
we haven't updated the default value for this parameter since it was
added in 2001.
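
(For scale: 1.6 GB of WAL works out to 100 of the 16 MB segments,
versus the 3 segments we ship with today.)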

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


