Re: Pre-allocating WAL files - Mailing list pgsql-hackers

From vignesh C
Subject Re: Pre-allocating WAL files
Date
Msg-id CALDaNm1ANbtjNowYKG8uwt4Af85Yt2Q0+d7G4YVmjDSuyXAc4g@mail.gmail.com
In response to Re: Pre-allocating WAL files  ("Bossart, Nathan" <bossartn@amazon.com>)
Responses Re: Pre-allocating WAL files  ("Bossart, Nathan" <bossartn@amazon.com>)
List pgsql-hackers
On Mon, Jun 7, 2021 at 8:48 PM Bossart, Nathan <bossartn@amazon.com> wrote:
>
> On 12/25/20, 12:09 PM, "Andres Freund" <andres@anarazel.de> wrote:
> > When running write heavy transactional workloads I've many times
> > observed that one needs to run the benchmarks for quite a while till
> > they get to their steady state performance. The most significant reason
> > for that is that initially WAL files will not get recycled, but need to
> > be freshly initialized. That's 16MB of writes that need to synchronously
> > finish before a small write transaction can even start to be written
> > out...
> >
> > I think there's two useful things we could do:
> >
> > 1) Add pg_wal_preallocate(uint64 bytes) that ensures (bytes +
> >    segment_size - 1) / segment_size WAL segments exist from the current
> >    point in the WAL. Perhaps with the number of bytes defaulting to
> >    min_wal_size if not explicitly specified?
> >
> > 2) Have checkpointer (we want walwriter to run with low latency to flush
> >    out async commits etc) occasionally check if WAL files need to be
> >    pre-allocated.
> >
> >    Checkpointer already tracks the amount of WAL that's expected to be
> >    generated till the end of the checkpoint, so it seems like it's a
> >    pretty good candidate to do so.
> >
> >    To keep checkpointer pre-allocating when idle we could signal it
> >    whenever a record has crossed a segment boundary.
> >
> >
> > With a plain pgbench run I see a 2.5x reduction in throughput in the
> > periods where we initialize WAL files.
>
> I've been exploring this independently a bit and noticed this message.
> Attached is a proof-of-concept patch for a separate "WAL allocator"
> process that maintains a pool of WAL-segment-sized files that can be
> claimed whenever a new segment file is needed.  An early version of
> this patch attempted to spread the I/O like non-immediate checkpoints
> do, but I couldn't point to any real benefit from doing so, and it
> complicated things quite a bit.
>
> I like the idea of trying to bake this into an existing process such
> as the checkpointer.  I'll admit that creating a new process just for
> WAL pre-allocation feels a bit heavy-handed, but it was a nice way to
> keep this stuff modularized.  I can look into moving this
> functionality into the checkpointer process if this is something that
> folks are interested in.

Thanks for posting the patch. It no longer applies on HEAD:
Applying: wal segment pre-allocation
error: patch failed: src/backend/access/transam/xlog.c:3283
error: src/backend/access/transam/xlog.c: patch does not apply

Could you rebase the patch and post an updated version? That would help
anyone who picks it up for review.
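
For anyone skimming the thread, the segment count in the quoted
pg_wal_preallocate() proposal is a plain ceiling division. A minimal
sketch follows; the function name wal_segments_needed is hypothetical
and is not taken from the patch or from the PostgreSQL source, and the
16MB figure used below is the segment size mentioned in the quoted text.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): number of WAL segments
 * needed to cover `bytes`, rounded up to whole segments.
 * Uses the formula quoted above: (bytes + segment_size - 1) / segment_size.
 * Assumes segment_size > 0 and that the addition does not overflow.
 */
uint64_t
wal_segments_needed(uint64_t bytes, uint64_t segment_size)
{
    return (bytes + segment_size - 1) / segment_size;
}
```

With 16MB segments, a request for 16MB plus one byte rounds up to two
segments, while a request for exactly 16MB needs one.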

Regards,
Vignesh


