Re: proposal: CREATE DATABASE vs. (partial) CHECKPOINT - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: proposal: CREATE DATABASE vs. (partial) CHECKPOINT
Date
Msg-id 9ca60241895f4cede3269ea33a70d9bd.squirrel@2.emaily.eu
In response to Re: proposal: CREATE DATABASE vs. (partial) CHECKPOINT  (Atri Sharma <atri.jiit@gmail.com>)
List pgsql-hackers
On 27 October 2014, 13:50, Atri Sharma wrote:
>>
>> IMHO writing all the data into a WAL would be the cleanest solution.
>>
>> Also, what is a small database? I don't think a static value will work,
>> because the sweet spot between the current approach (forcing two
>> checkpoints) and writing everything in WAL depends on the amount of
>> dirty
>> buffers that need to be checkpointed. Which is mostly driven by the size
>> of shared buffers and write activity - for small shared buffers and/or
>> mostly-read workload, checkpoints are cheap, so the 'small database'
>> threshold (when choosing the WAL approach) is much higher than for large
>> shared buffers or write-heavy workloads.
>>
>
> So are you proposing having a heuristic based on the amount of data in
> shared buffers and write activity? Do you have something in mind that
> works
> for general workloads as well?
>
>
>> So maybe if we could determine the amount of data to be checkpointed,
>> and
>> then base the decision on that, that'd work better? This would also have
>> to take into account that writing into WAL is sequential, while
>> checkpoints usually cause random writes all over the datafiles (which is
>> more expensive).
>>
>> Another option might be forcing just a "spread" checkpoint, not the
>> immediate one (which is what we do now). That would not fix the CREATE
>> DATABASE duration (actually, it would make it longer), but it would
>> lower
>> the impact on other activity on the machine.
>>
> I believe this to be the cleanest way to reduce the amount of I/O
> generated. If I understand correctly, the original problem you mentioned
> was not the time CREATE DATABASE is taking but rather the amount of I/O
> each one is generating.

Not exactly. There are two related issues, both caused by the I/O activity
from the CHECKPOINT.

(a) I/O spike, because of checkpointing everything
(b) long CREATE DATABASE durations

This "spread CREATE DATABASE" only fixes (a), and makes (b) a bit worse.

It does, however, make the 'create databases in advance' approach a bit
more flexible, because it allows the cron job that actually creates them to
run more frequently. Right now we're forced to schedule it rather rarely
(say, once a day) to minimize the impact, which has downsides (higher
probability of running out of spare DBs, slow response to an updated
template, etc.).

If we can fix both (a) and (b), that'd be great. If we can fix only (a),
so be it.

> This also leads me to think whether it makes sense to explore group
> commits around the creation of files for a new database (within the same
> backend, of course). This could be opt-in: if the user knows he/she is
> going to create a lot of databases in the near future and is fine with a
> large spike in I/O at one go. Again, it might be even more broken than the
> current scenario, but it depends on what the user wants...

That seems much more complex than what I proposed to do, but maybe I'm
wrong. If we could get rid of the "checkpoint everything" altogether, we
don't really need the group commit, no?

regards
Tomas



