Re: Pre-allocating WAL files - Mailing list pgsql-hackers

From: Bossart, Nathan
Subject: Re: Pre-allocating WAL files
Msg-id: 265B06BA-7B16-4C1A-BE1A-1451D22A1F83@amazon.com
In response to: Pre-allocating WAL files (Andres Freund <andres@anarazel.de>)
Responses: Re: Pre-allocating WAL files
List: pgsql-hackers

On 12/25/20, 12:09 PM, "Andres Freund" <andres@anarazel.de> wrote:
> When running write heavy transactional workloads I've many times
> observed that one needs to run the benchmarks for quite a while till
> they get to their steady state performance. The most significant reason
> for that is that initially WAL files will not get recycled, but need to
> be freshly initialized. That's 16MB of writes that need to synchronously
> finish before a small write transaction can even start to be written
> out...
>
> I think there's two useful things we could do:
>
> 1) Add pg_wal_preallocate(uint64 bytes) that ensures (bytes +
>    segment_size - 1) / segment_size WAL segments exist from the current
>    point in the WAL. Perhaps with the number of bytes defaulting to
>    min_wal_size if not explicitly specified?
>
> 2) Have checkpointer (we want walwriter to run with low latency to flush
>    out async commits etc) occasionally check if WAL files need to be
>    pre-allocated.
>
>    Checkpointer already tracks the amount of WAL that's expected to be
>    generated till the end of the checkpoint, so it seems like it's a
>    pretty good candidate to do so.
>
>    To keep checkpointer pre-allocating when idle we could signal it
>    whenever a record has crossed a segment boundary.
>
>
> With a plain pgbench run I see a 2.5x reduction in throughput in the
> periods where we initialize WAL files.
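
For reference, the segment count in proposal (1) above is a plain ceiling division. The following is only an illustrative sketch: the function name wal_segments_needed is hypothetical and does not come from the quoted message or from any patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of PostgreSQL): number of WAL segments
 * needed to cover `bytes`, rounded up to whole segments.
 *
 * Uses the formula from the quoted proposal. Note that the addition can
 * overflow when `bytes` is within segment_size of UINT64_MAX; a real
 * implementation would need to guard against that.
 */
uint64_t
wal_segments_needed(uint64_t bytes, uint64_t segment_size)
{
    assert(segment_size > 0);
    return (bytes + segment_size - 1) / segment_size;
}
```

With 16MB segments, a request for 80MB works out to 5 segments, and a request for 16MB plus one byte works out to 2.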

I've been exploring this independently a bit and noticed this message.
Attached is a proof-of-concept patch for a separate "WAL allocator"
process that maintains a pool of WAL-segment-sized files that can be
claimed whenever a new segment file is needed.  An early version of
this patch attempted to spread the I/O like non-immediate checkpoints
do, but I couldn't point to any real benefit from doing so, and it
complicated things quite a bit.
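
To illustrate the general idea of a pool of pre-initialized files that can be claimed later, here is a minimal POSIX sketch. It is not the attached patch: the names pool_create_file and pool_claim_file are hypothetical, and the sketch assumes that a zero-filled, fsync'd file which is then renamed into place is an acceptable stand-in for a freshly initialized segment.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical: create a zero-filled file of `size` bytes at `path` and
 * flush it to disk, so the expensive writes happen ahead of time.
 * Returns 0 on success, -1 on failure (the partial file is removed).
 */
int
pool_create_file(const char *path, size_t size)
{
    char    zeros[8192];
    size_t  written = 0;
    int     fd;

    assert(path != NULL);
    memset(zeros, 0, sizeof(zeros));

    /* O_EXCL: refuse to reuse a file that already exists in the pool */
    fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return -1;

    while (written < size)
    {
        size_t  left = size - written;
        size_t  chunk = left < sizeof(zeros) ? left : sizeof(zeros);
        ssize_t n = write(fd, zeros, chunk);

        if (n <= 0)
        {
            close(fd);
            unlink(path);
            return -1;
        }
        written += (size_t) n;
    }

    if (fsync(fd) != 0)
    {
        close(fd);
        unlink(path);
        return -1;
    }
    return close(fd);
}

/*
 * Hypothetical: claim a pooled file by renaming it to the path of the
 * segment that is needed. Note that rename() silently replaces an
 * existing target, which a real implementation would have to rule out.
 */
int
pool_claim_file(const char *pool_path, const char *segment_path)
{
    return rename(pool_path, segment_path);
}
```

The point of the split is that the process needing a new segment only pays for a rename, while the zero-filling and fsync happen in the background process.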

I like the idea of trying to bake this into an existing process such
as the checkpointer.  I'll admit that creating a new process just for
WAL pre-allocation feels a bit heavy-handed, but it was a nice way to
keep this stuff modularized.  I can look into moving this
functionality into the checkpointer process if this is something that
folks are interested in.

Nathan


