Re: Replaying 48 WAL files takes 80 minutes - Mailing list pgsql-performance

From Heikki Linnakangas
Subject Re: Replaying 48 WAL files takes 80 minutes
Msg-id 508FA6F4.8040905@vmware.com
In response to Re: Replaying 48 WAL files takes 80 minutes  ("Albe Laurenz" <laurenz.albe@wien.gv.at>)
Responses Re: Replaying 48 WAL files takes 80 minutes
List pgsql-performance
On 30.10.2012 10:50, Albe Laurenz wrote:
> Why does WAL replay read much more than it writes?
> I thought that pretty much every block read during WAL
> replay would also get dirtied and hence written out.

Not necessarily. If a block is modified and then written out of the buffer
cache before the next checkpoint, the latest version of the block is already
on disk. On replay, the redo routine reads the block, sees from the page's
LSN that the change was already applied, and does nothing.
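
To illustrate the point above, here is a minimal toy model of that check,
not PostgreSQL's actual redo code. It assumes each page carries the LSN of
the last WAL record applied to it; the names Page, WalRecord and redo are
made up for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Page:
    lsn: int = 0      # LSN of the last change applied to this page
    data: str = ""


@dataclass
class WalRecord:
    lsn: int          # position of this record in the WAL
    block: int        # block number the record modifies
    new_data: str


def redo(record, disk, stats):
    """Apply one WAL record to the toy 'disk', skipping it if already applied."""
    page = disk[record.block]          # replay always has to read the block
    stats["reads"] += 1
    if page.lsn >= record.lsn:
        # The page on disk is already at least as new as this record.
        stats["skipped"] += 1
        return
    page.data = record.new_data
    page.lsn = record.lsn
    stats["writes"] += 1               # the page is dirtied, written out later


def replay_all(records, disk):
    stats = {"reads": 0, "writes": 0, "skipped": 0}
    for record in records:
        redo(record, disk, stats)
    return stats
```

With one block that was flushed after its change and one that was not,
replay_all reports two reads but only one write, which is the "reads more
than it writes" pattern.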

> I wonder why the performance is good in the first few seconds.
> Why should exactly the pages that I need in the beginning
> happen to be in cache?

This is probably because of full_page_writes=on. When replay has a full
page image of a block, it doesn't need to read the old contents from
disk. It can just blindly write the image to disk. Writing a block to
disk also puts that block in the OS cache, so this also efficiently
warms the cache from the WAL. Hence at the beginning of replay, you just
write a lot of full page images to the OS cache, which is fast, and you
only start reading from disk after you've filled up the OS cache. If
this theory is true, you should see a pattern in the I/O stats, where in
the first seconds there is no disk I/O, but the CPU is 100% busy while
replay reads from WAL and writes out the pages to the OS cache. After the OS
cache fills up with the dirty pages (up to dirty_ratio, on Linux), you
will start to see a lot of writes. As the replay progresses, you will
see more and more reads, as you start to get cache misses.
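
The cache-warming theory above can be sketched with a toy simulation. This
is only an illustration under stated assumptions: the OS cache is modelled
as a plain LRU of fixed size, writeback of dirty pages is ignored, and the
function name replay_reads is made up for this sketch.

```python
from collections import OrderedDict


def replay_reads(records, cache_size):
    """Return one boolean per record: True if replay had to read from disk.

    records is a list of (block, has_full_page_image) tuples.
    """
    cache = OrderedDict()              # toy LRU standing in for the OS cache
    read_flags = []
    for block, has_fpi in records:
        if block in cache:
            cache.move_to_end(block)   # cache hit, no disk read
            read_flags.append(False)
            continue
        # Cache miss: a full page image can be written blindly, otherwise
        # the old contents of the block must be read from disk first.
        read_flags.append(not has_fpi)
        cache[block] = True            # reading or writing caches the block
        if len(cache) > cache_size:
            cache.popitem(last=False)  # evict the least recently used block
    return read_flags


# First touch of each block carries a full page image, later touches do not.
first_pass = [(b, True) for b in range(6)]
second_pass = [(b, False) for b in range(6)]
flags = replay_reads(first_pass + second_pass, cache_size=3)
```

In this example the first six records cause no reads, and every later record
misses the (too small) cache and has to read from disk. With cache_size=6
there would be no reads at all.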

- Heikki

