Minimizing Recovery Time (wal replication) - Mailing list pgsql-general

From	Bryan Murphy
Subject	Minimizing Recovery Time (wal replication)
Date	April 9, 2009 19:27:46
Msg-id	7fd310d10904091227g1566c81i3a89abc80dca3d9f@mail.gmail.com
Responses	Re: Minimizing Recovery Time (wal replication); Re: Minimizing Recovery Time (wal replication)
List	pgsql-general

I have two hot-spare databases that use WAL archiving and continuous
recovery mode.  I want to minimize recovery time when we have to fail
over to one of our hot spares.  Right now, I'm seeing the following
behavior, which makes a quick recovery seem problematic:

(1) hot spare applies 70 to 75 WAL files (~1.1GB) in a 2 to 3 minute period

(2) hot spare pauses for 15 to 20 minutes; during this period pdflush
consumes 99% of IO (per iotop).  Dirty (from /proc/meminfo) spikes to
~760MB, remains at that level for the first 10 minutes, and then slowly
ticks down to 0 over the next 10 minutes.

(3) goto 1
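The Dirty figure in (2) is read from /proc/meminfo. A minimal Python sketch for sampling it through one replay/flush cycle, assuming a Linux host; the function names and default interval are hypothetical choices, not part of the original setup. It also reports Writeback, which shows how much dirty data is actively being written out at each sample.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> numeric value.

    Most fields are in kB (e.g. "Dirty:  778240 kB"); the unit suffix is ignored.
    """
    fields = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        name, rest = line.split(":", 1)
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


def sample_dirty(path="/proc/meminfo", interval=10.0, count=6):
    """Print Dirty and Writeback (converted to MB) every `interval` seconds, `count` times."""
    for _ in range(count):
        with open(path) as f:
            info = parse_meminfo(f.read())
        dirty_mb = info.get("Dirty", 0) / 1024
        writeback_mb = info.get("Writeback", 0) / 1024
        stamp = time.strftime("%H:%M:%S")
        print(f"{stamp} Dirty={dirty_mb:.0f}MB Writeback={writeback_mb:.0f}MB")
        time.sleep(interval)
```

On the standby, calling `sample_dirty(interval=30, count=40)` would cover about 20 minutes, roughly one pause as described in (2).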

My concern is that if the database has been in recovery mode for some
time, even if it's caught up, and I go live sometime during (1), I can
face a recovery time of upwards of 20 minutes.  We've experienced delays
during failover in the past (not 20 minutes, but long enough to make
me second-guess what we are doing).

I want to better understand what is going on so that I can determine
what I can do (if anything) to minimize downtime when we fail over to
one of our hot spares.

Here are my current settings:

postgres (v8.3.7):

shared_buffers = 2GB (of 15GB total RAM)
effective_cache_size = 12GB (of 15GB total RAM)
checkpoint_segments = 10
checkpoint_completion_target = 0.7
(other checkpoint/bgwriter settings left at default values)
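A rough cross-check relating the batch in (1) to checkpoint_segments, assuming the default 16MB WAL segment size (in 8.3 it is fixed when PostgreSQL is built). 70 to 75 segments works out to 1120 to 1200MB, in line with the ~1.1GB reported. With checkpoint_segments = 10, the master starts a checkpoint about every 10 segments (sooner if checkpoint_timeout fires first), so each batch should carry on the order of seven or more checkpoint records. The helper names are hypothetical and this is only an estimate, not a model of how 8.3 replays those records.

```python
# Assumption: default WAL segment size; 8.3 fixes it at build time.
WAL_SEGMENT_MB = 16


def batch_size_mb(segments, segment_mb=WAL_SEGMENT_MB):
    """Total WAL volume, in MB, for a batch of archived segments."""
    return segments * segment_mb


def approx_checkpoints_in_batch(segments, checkpoint_segments):
    """Rough count of checkpoints in a batch if the master checkpoints
    about once every `checkpoint_segments` segments."""
    return segments // checkpoint_segments


for n in (70, 75):
    print(f"{n} segments = {batch_size_mb(n)} MB, "
          f"~{approx_checkpoints_in_batch(n, 10)} checkpoints")
```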

sysctl:

kernel.shmmax = 2684354560
vm.dirty_background_ratio = 1
vm.dirty_ratio = 5
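The two sysctl ratios are percentages. The kernel applies them to "dirtyable" memory (free plus reclaimable pages), which is less than total RAM, so the sketch below, which assumes the 15GB figure above is the machine's total RAM, gives upper-bound approximations only. 5% of 15GB is about 768MB, in the same range as the ~760MB Dirty peak described in (2); that is consistent with writers reaching the dirty_ratio limit, though this arithmetic alone does not prove it. The function name is hypothetical.

```python
def dirty_thresholds_mb(memory_mb, background_ratio, dirty_ratio):
    """Approximate Linux writeback thresholds, in MB.

    Assumption: the ratios are applied to `memory_mb`. The kernel really uses
    dirtyable memory (free + reclaimable pages), which is smaller than total
    RAM, so these values are upper bounds.
    """
    # Above this much dirty data, background writeback (pdflush) kicks in.
    background = memory_mb * background_ratio / 100
    # Above this, a process generating writes must write out dirty data itself.
    blocking = memory_mb * dirty_ratio / 100
    return background, blocking


bg, hard = dirty_thresholds_mb(15 * 1024, background_ratio=1, dirty_ratio=5)
print(f"background writeback above ~{bg:.0f} MB, writers throttled above ~{hard:.0f} MB")
```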

Thanks,
Bryan