proposal: CREATE DATABASE vs. (partial) CHECKPOINT - Mailing list pgsql-hackers

From Tomas Vondra
Subject proposal: CREATE DATABASE vs. (partial) CHECKPOINT
Date
Msg-id 544D6BFB.4060906@fuzzy.cz
Whole thread Raw
Responses Re: proposal: CREATE DATABASE vs. (partial) CHECKPOINT
List pgsql-hackers
Hi all,

currently, CREATE DATABASE forces an immediate checkpoint (actually, it
forces two, but the time interval is usually rather small). For
traditional deployments this is not a big deal, because creating a
database is a rare event, and may be planned to off-peak times.

However for shared (cloud-like) deployments, this is not the case. E.g.
we're hosting hundreds (or even thousands) of customer databases on some
clusters, and creating a new database is quite common.

This turns the checkpoints into a significant pain point for us, because
it forces a write of all the dirty buffers from all the databases. No
matter how well we tune the spread checkpoints, this makes it
inefficient and causes significant I/O spikes (especially with larger
shared_buffer values). It also leads to high duration of the CREATE
DATABASE command, making it rather inpractical for 'interactive' use (a
user hitting a button in a UI or something).

Based on the talks from pgconf.eu, where I've seen this mentioned in at
least two talks (and in the hallway track), we're not alone. I'd like to
address this, if possible.

The usual workaround for this is "create the databases in advance" but
that's not always possible (e.g. when having more than handful of
templates, or when the template evolves over time).

After eyeballing the code for an hour or two, I think CREATE DATABASE
should be fine with performing only a 'partial checkpoint' on the
template database - calling FlushDatabaseBuffers and processing unlink
requests, as suggested by the comment in createdb().

It's not exactly trivial change, but it does not seem frighteningly
difficult coding either.

The templates are usually static, so this would minimize both the CREATE
DATABASE duration and disruption to the cluster it causes.

My fear however is that this while the code will work, it will break the
recovery in some subtle way (as illustrated by the comments about 8.0
PITR bugs in createdb).

Am I missing something that makes this dead in the water?

regards
Tomas



pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: strip nulls functions for json and jsonb
Next
From: Noah Misch
Date:
Subject: Re: narwhal and PGDLLIMPORT