Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
Date
Msg-id 20943f82-4a21-d7b0-3336-ca0872062d84@enterprisedb.com
Whole thread Raw
In response to Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints  (Andrew Dunstan <andrew@dunslane.net>)
List pgsql-hackers

On 6/15/21 3:31 PM, Andrew Dunstan wrote:
> 
> On 6/15/21 8:04 AM, Heikki Linnakangas wrote:
>>
>> Yeah, WAL-logging the contents of the source database would certainly
>> be less weird than the current system. As Julien also pointed out, the
>> question is, are there people using on "CREATE DATABASE foo TEMPLATE
>> bar" to copy a large source database, on the premise that it's fast
>> because it skips WAL-logging?
> 
> 
> I'm 100% certain there are. It's not even a niche case.
> 
> 
>>
>> In principle, we could have both mechanisms, and use the new
>> WAL-logged system if the database is small, and the old system with
>> checkpoints if it's large. But I don't like idea of having to maintain
>> both.
>>
>>
> 
> Rather than use size, I'd be inclined to say use this if the source
> database is marked as a template, and use the copydir approach for
> anything that isn't.
> 


I think we should be asking what is the benefit of that use case, and 
perhaps try addressing that without having to maintain two entirely 
different ways to do CREATE DATABASE. It's not like we're sure the 
current code is 100% reliable in various corner cases, I doubt having 
two separate approaches will improve the situation :-/

I can see three reasons why people want to skip the WAL logging:

1) it's faster, because there's no CPU and I/O for building the WAL

   I wonder if some optimization / batching could help with (1), as
   suggested by Andres elsewhere in this thread.

2) it saves the amount of WAL (could matter with large template 
databases and WAL archiving, etc.)

   We can't really do much about this - we need to log all the data. But
   the batching from (1) might help a bit too, I guess.

3) saves the amount of WAL that needs to be copied to standby, so that 
there's no increase of replication lag, etc. particularly when the 
network link has limited bandwidth

   I think this is a more general issue - some operations that may
   generate a lot of WAL, and we generally assume it's better to do
   that rather than hold exclusive locks for long time. But maybe we
   could have some throttling, to limit the amount of WAL per second,
   similarly to what we have to plain vacuum.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
Next
From: Alvaro Herrera
Date:
Subject: Re: Unresolved repliaction hang and stop problem.