Thread: Is there a way to speed up WAL replay?

Is there a way to speed up WAL replay?

From: Torsten Förtsch
Hi,

I am working on restoring a database from a base backup + WAL. With the default settings the database replays about 3-4 WAL files per second. The startup process takes about 65% of a CPU and writes data at something between 50 and 100 MB/sec.

Is there a way to speed that up? The disk can easily sustain 400-500 MB/sec.

Thanks,
Torsten

Re: Is there a way to speed up WAL replay?

From: Jeff Janes
On Wed, Oct 31, 2018 at 1:38 AM Torsten Förtsch <tfoertsch123@gmail.com> wrote:
> Hi,
>
> I am working on restoring a database from a base backup + WAL. With the default settings the database replays about 3-4 WAL files per second. The startup process takes about 65% of a CPU and writes data at something between 50 and 100 MB/sec.
>
> Is there a way to speed that up? The disk can easily sustain 400-500 MB/sec.

WAL replay is single-threaded, so the most you would be able to speed it up is 50%, to where it would be taking 100% CPU.

When it's not on the CPU, is the time spent waiting for WAL files to arrive from the restore_command, waiting for the blocks it needs to replay into to be read from disk, or waiting for dirty pages to get written to disk so they can be evicted to make way for new ones?
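
If the server is far enough along to accept hot-standby connections, one quick way to check is to look at the startup process's wait events (PG 10 or later; the query below is just a sketch, not something from this thread):

    psql -xc "SELECT wait_event_type, wait_event
                FROM pg_stat_activity
               WHERE backend_type = 'startup';"

If you can't connect yet, watching the startup PID with top/iostat/strace gives a rough picture of the same thing.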

One way I found to speed up restore_command is to have another program run a few WAL files ahead of it, copying the WAL from the real archive into a scratch space which is on the same filesystem as pg_xlog/pg_wal.  Then have restore_command simply move (not copy) the requested files from the scratch space onto %p.  The intra-filesystem move completes much faster than a copy.
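
A rough sketch of that arrangement, with all paths and the look-ahead count invented (a real version would also need to track which files recovery has already consumed, since the mv removes them from the scratch space):

    # recovery.conf: rename from the scratch dir, fall back to the archive
    restore_command = 'mv /pgdata/wal_scratch/%f "%p" || cp /mnt/wal_archive/%f "%p"'

    # companion prefetcher, run alongside recovery; the scratch dir lives on
    # the same filesystem as pg_wal, so the mv above is only a rename
    while sleep 1; do
        for f in $(ls /mnt/wal_archive | sort | head -n 10); do
            [ -e "/pgdata/wal_scratch/$f" ] && continue
            cp "/mnt/wal_archive/$f" "/pgdata/wal_scratch/.$f.tmp" &&
                mv "/pgdata/wal_scratch/.$f.tmp" "/pgdata/wal_scratch/$f"
        done
    done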

If it spends time waiting for the blocks it's recovering into to be read from disk, and you have enough RAM, you could speed it up by pre-warming the file system cache.  Something like:

tar -cf - $PGDATA | wc -c

That reads the whole data directory sequentially and throws the bytes away; the only point is to pull the data files into the OS page cache before replay needs them.

Cheers,

Jeff

Re: Is there a way to speed up WAL replay?

From: Stephen Frost
Greetings,

* Torsten Förtsch (tfoertsch123@gmail.com) wrote:
> I am working on restoring a database from a base backup + WAL. With the
> default settings the database replays about 3-4 WAL files per second. The
> startup process takes about 65% of a CPU and writes data at something
> between 50 and 100 MB/sec.

What are you using for a restore_command?  You can typically get some
improvement by using a restore_command that's faster or that pre-stages
files locally to minimize the time it takes to run, or by dumping all the
WAL straight into pg_xlog/pg_wal, though that last isn't something I
typically recommend.
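
The "dump it all into pg_wal" variant is just a bulk copy before starting the server, so recovery never has to shell out per file; the paths and WAL name prefix here are invented, and you need enough space for the whole stretch of WAL:

    cp /mnt/wal_archive/00000001* "$PGDATA/pg_wal/"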

> Is there a way to speed that up? The disk can easily sustain 400-500 MB/sec.

Kind of depends what you're ultimately going for...  If you've made
everything else faster and you're still only seeing 65% CPU utilization
on the one CPU, then maybe there's just enough latency to the disks to
be an issue, in which case you could possibly restore onto a ramdisk if
you've got enough memory/space and don't mind that.
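
If the memory is there, that can be as simple as something like this (size and mount point invented); just remember tmpfs disappears on reboot, so the finished cluster has to be copied back to durable storage:

    mkdir -p /mnt/pg_ram
    mount -t tmpfs -o size=128G tmpfs /mnt/pg_ram
    # restore the base backup under /mnt/pg_ram and run recovery there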

There are other things that can be done too, like adjusting the amount of
shared_buffers; depending on what you're doing, that can also make a
difference in replay speed (we've been working to improve that though).
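
That's just a postgresql.conf setting; the value below is purely illustrative, the idea being that a larger buffer pool lets replay keep re-dirtying hot pages instead of constantly writing out and evicting them:

    shared_buffers = 16GB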

I tend to find that it's better to just reduce the amount of WAL that
needs to be replayed by taking incremental backups more frequently and
using things like pgbackrest's delta restore ability for doing repeated
restores to the same location.  Of course, that's not likely to help you
out much here.
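
For reference, both of those are plain pgbackrest invocations (the stanza name is made up):

    # a faster restore_command, fetching WAL via pgbackrest
    restore_command = 'pgbackrest --stanza=main archive-get %f "%p"'

    # delta restore: only rewrites the files that differ from the backup
    pgbackrest --stanza=main --delta restore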

Thanks!

Stephen

Re: Is there a way to speed up WAL replay?

From: Stephen Frost
Greetings,

* Jeff Janes (jeff.janes@gmail.com) wrote:
> One way I found to speed up restore_command is to have another program run
> a few WAL files ahead of it, copying the WAL from the real archive into a
> scratch space which is on the same filesystem as pg_xlog/pg_wal.  Then have
> restore_command simply move (not copy) the requested files from the scratch
> space onto %p.  The intra-filesystem move completes much faster than a copy.

Right, that can definitely be a big help and is more-or-less what
pgbackrest does too.

> If it spends time waiting for the blocks it's recovering into to be read
> from disk, and you have enough RAM, you could speed it up by pre-warming
> the file system cache.  Something like:
>
> tar -cf - $PGDATA | wc -c

Yeah, that's also a good idea.  We've discussed having something in
pgbackrest to pre-load things from disk in advance of WAL replay, and I
seem to recall someone had also written an external tool to do that.

Thanks!

Stephen

Re: Is there a way to speed up WAL replay?

From: Nicolas Grilly
This tool may be useful:

    Faults pages into PostgreSQL shared_buffers or filesystem caches in advance of WAL apply

Nicolas

On Wed, Oct 31, 2018 at 6:38 AM Torsten Förtsch <tfoertsch123@gmail.com> wrote:
> Hi,
>
> I am working on restoring a database from a base backup + WAL. With the default settings the database replays about 3-4 WAL files per second. The startup process takes about 65% of a CPU and writes data at something between 50 and 100 MB/sec.
>
> Is there a way to speed that up? The disk can easily sustain 400-500 MB/sec.
>
> Thanks,
> Torsten

Re: Is there a way to speed up WAL replay?

From: Thomas Munro
On Thu, Nov 1, 2018 at 4:25 AM Jeff Janes <jeff.janes@gmail.com> wrote:
> On Wed, Oct 31, 2018 at 1:38 AM Torsten Förtsch <tfoertsch123@gmail.com> wrote:
>> I am working on restoring a database from a base backup + WAL. With the default settings the database replays about 3-4 WAL files per second. The startup process takes about 65% of a CPU and writes data at something between 50 and 100 MB/sec.
>>
>> Is there a way to speed that up? The disk can easily sustain 400-500 MB/sec.
>
> WAL replay is single-threaded, so the most you would be able to speed it up is 50%, to where it would be taking 100% CPU.
>
> When it's not on the CPU, is the time spent waiting for WAL files to arrive from the restore_command, waiting for the blocks it needs to replay into to be read from disk, or waiting for dirty pages to get written to disk so they can be evicted to make way for new ones?
>
> One way I found to speed up restore_command is to have another program run a few WAL files ahead of it, copying the WAL from the real archive into a scratch space which is on the same filesystem as pg_xlog/pg_wal.  Then have restore_command simply move (not copy) the requested files from the scratch space onto %p.  The intra-filesystem move completes much faster than a copy.
>
> If it spends time waiting for the blocks it's recovering into to be read from disk, and you have enough RAM, you could speed it up by pre-warming the file system cache.  Something like:
>
> tar -cf - $PGDATA | wc -c

For more targeted prewarming of large systems that don't fit in RAM
and to get all the way into PostgreSQL's buffer pool, I suppose you
could write a small Python/whatever script that extracts the
relfilenode + block references from the output of pg_waldump (one file
ahead, or whatever), sorts and uniques them, merges them into block
ranges, converts the relfilenode reference to relation OID, and then
calls pg_prewarm() for each range.
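
An untested shell-pipeline version of that idea, skipping the range merging and paying one psql round-trip per block (it assumes pg_prewarm is installed, the server is accepting hot-standby connections, psql's default database is the one the relfilenodes belong to, and the WAL file name is made up):

    pg_waldump /mnt/wal_archive/000000010000000100000042 |
        grep -oE 'rel [0-9]+/[0-9]+/[0-9]+ blk [0-9]+' |
        sort -u |
        while read -r _ rel _ blk; do
            tbs=${rel%%/*}; node=${rel##*/}
            [ "$tbs" = 1663 ] && tbs=0   # pg_filenode_relation takes 0 for the default tablespace
            psql -qc "SELECT pg_prewarm(pg_filenode_relation($tbs, $node), 'buffer', 'main', $blk, $blk)"
        done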

--
Thomas Munro
http://www.enterprisedb.com