Re: proposal: CREATE DATABASE vs. (partial) CHECKPOINT - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: proposal: CREATE DATABASE vs. (partial) CHECKPOINT
Msg-id 54513397.8060509@fuzzy.cz
In response to Re: proposal: CREATE DATABASE vs. (partial) CHECKPOINT  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: proposal: CREATE DATABASE vs. (partial) CHECKPOINT
List pgsql-hackers

On 29.10.2014 18:31, Robert Haas wrote:
> On Mon, Oct 27, 2014 at 8:01 PM, Tomas Vondra <tv@fuzzy.cz> wrote:
>> (3) write-heavy workloads / large template database
>>
>>     Current approach wins, for two reasons: (a) for large databases the
>>     WAL-logging overhead may generate much more I/O than a checkpoint,
>>     and (b) it may generate so many WAL segments it eventually triggers
>>     a checkpoint anyway (even repeatedly).
> 
> I would tend not to worry too much about this case. I'm skeptical 
> that there are a lot of people using large template databases. But
> if there are, or if some particular one of those people hits this 
> problem, then they can raise checkpoint_segments to avoid it. The 
> reverse problem, which you are encountering, cannot be fixed by 
> adjusting settings.

That, however, solves "only" the checkpoint problem, not the doubled I/O
from writing both the data files and the WAL, no? But maybe that's OK.
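
A back-of-the-envelope sketch of that point, under loudly stated
assumptions: 16 MB WAL segments (the default), roughly one byte of WAL per
byte of template data when the copy is WAL-logged, and a checkpoint
requested each time checkpoint_segments segments fill up. The function name
is made up for illustration; this is not PostgreSQL code.

```python
WAL_SEGMENT_BYTES = 16 * 1024 * 1024  # default WAL segment size


def estimate_wal_logged_copy(template_bytes, checkpoint_segments):
    """Rough cost of WAL-logging a template copy (illustrative model only)."""
    # Assumption: WAL volume is about the same as the copied data volume.
    wal_bytes = template_bytes
    # Data files plus WAL: the "double amount of I/O".
    total_io_bytes = template_bytes + wal_bytes
    # Number of WAL segments filled, rounded up.
    segments = -(-wal_bytes // WAL_SEGMENT_BYTES)
    # Assumption: one checkpoint per checkpoint_segments filled segments.
    checkpoints = segments // checkpoint_segments
    return total_io_bytes, segments, checkpoints
```

In this model, raising checkpoint_segments drives the third value to zero,
but the first value (total I/O) does not change at all.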

Also, all this is a concern only with 'wal_level != minimal', but ISTM
'with wal_level=minimal it's fast' is a rather poor argument.

> 
> (This reminds me, yet again, that it would be really nice to have something
> smarter than checkpoint_segments.  If there is little WAL activity
> between one checkpoint and the next, we should reduce the number of
> segments we're keeping around to free up disk space and ensure that
> we're recycling a file new enough that it's likely to still be in
> cache.  Recycling files long-since evicted from cache is poor.  But
> then we should also let the number of WAL files ratchet back up if the
> system again becomes busy.  Isn't this more or less what Heikki's
> soft-WAL-limit patch did?  Why did we reject that, again?)

What about simply reusing the files in a different order? Instead of
looping through the files in a round-robin manner, couldn't we just reuse
the most recently used file, instead of going all the way back to the
oldest one? This won't free the disk space, but IMHO that's not a problem
because no one is going to use that space anyway (it would be a risk
once all the segments are needed again).
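
To illustrate the idea, here is a toy simulation, not PostgreSQL code. It
assumes a fixed pool of recyclable segments, a quiet system that writes only
a few segments between checkpoints, and that written segments become
recyclable again at the next checkpoint. All names are hypothetical. For
each reused segment it records how many segment-writes ago that file was
last written, as a crude proxy for "is it still in cache".

```python
from collections import deque


def reuse_ages(pool_size, per_cycle, cycles, most_recent_first):
    """Return, per reused segment, the number of writes since it was last written."""
    # Free list: left end = oldest segment, right end = most recently used.
    free = deque(range(pool_size))
    # Pretend the pool was filled once before the simulation starts.
    last_written = {seg: seg - pool_size for seg in free}
    ages = []
    tick = 0
    for _ in range(cycles):
        in_use = []
        for _ in range(per_cycle):
            # Round robin takes the oldest file; the alternative takes the newest.
            seg = free.pop() if most_recent_first else free.popleft()
            ages.append(tick - last_written[seg])
            last_written[seg] = tick
            in_use.append(seg)
            tick += 1
        # "Checkpoint": the segments just written become recyclable again.
        free.extend(in_use)
    return ages


round_robin = reuse_ages(pool_size=10, per_cycle=2, cycles=5, most_recent_first=False)
most_recent = reuse_ages(pool_size=10, per_cycle=2, cycles=5, most_recent_first=True)
```

In this model round robin always reuses a file last touched pool_size writes
ago, while the most-recently-used order keeps cycling through the same few
files, so the age stays small no matter how large the pool is.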

Tomas