Re: should crash recovery ignore checkpoint_flush_after ? - Mailing list pgsql-hackers

From	Thomas Munro
Subject	Re: should crash recovery ignore checkpoint_flush_after ?
Date	January 18, 2020 20:52:21
Msg-id	CA+hUKGLSx52vsSkEMN68hTf=ZKp_CJ0JuaduQXNG7L4RF9Ameg@mail.gmail.com
In response to	should crash recovery ignore checkpoint_flush_after ? (Justin Pryzby <pryzby@telsasoft.com>)
Responses	Re: should crash recovery ignore checkpoint_flush_after ?
List	pgsql-hackers

On Sun, Jan 19, 2020 at 3:08 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> As I understand, the first thing that happens is syncing every file in the data
> dir, like in initdb --sync.  These instances were both 5+TB on zfs, with
> compression, so that's slow, but tolerable, and at least understandable, and
> with visible progress in ps.
>
> The 2nd stage replays WAL.  strace shows it's occasionally running
> sync_file_range, and I think recovery might've been several times faster if
> we'd just dumped the data at the OS ASAP, fsync once per file.  In fact, I've
> just kill -9'd the recovery process and edited the config to disable this lest it
> spend all night in recovery.

Does sync_file_range() even do anything for non-mmap'd files on ZFS?
Non-mmap'd ZFS data is not in the Linux page cache, and I think
sync_file_range() works at that level.  At a guess, there'd need to be
a new VFS file_operation so that ZFS could get a callback to handle
data in its ARC.
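
For readers unfamiliar with the call under discussion, here is a minimal sketch of the pattern: write some data, ask the kernel to start writeback for that range with sync_file_range(), then make it durable with fsync().  This is not PostgreSQL's code; the helper name write_with_flush_hint and its parameters are hypothetical, it is Linux-only, and it treats a failed hint as non-fatal, which is an assumption made for illustration.

```c
#define _GNU_SOURCE             /* exposes sync_file_range() in <fcntl.h> */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only): write len bytes to path, hint the
 * kernel to start writeback for that byte range, then fsync the file.
 *
 * Returns 0 if the data was written and fsync'd, -1 otherwise.
 * *hint_errno is set to 0 if the hint was accepted, else to the errno that
 * sync_file_range() reported.  A rejected hint does not fail the call.
 */
int
write_with_flush_hint(const char *path, const char *buf, size_t len,
                      int *hint_errno)
{
    assert(path != NULL && buf != NULL && hint_errno != NULL);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* Write the whole buffer, retrying on short writes and EINTR. */
    size_t done = 0;
    while (done < len)
    {
        ssize_t n = write(fd, buf + done, len - done);
        if (n < 0)
        {
            if (errno == EINTR)
                continue;
            close(fd);
            return -1;
        }
        done += (size_t) n;
    }

    /*
     * The hint: ask the kernel to begin writing out dirty pages in
     * [0, len).  This acts on pages in the Linux page cache and gives no
     * durability guarantee by itself.
     */
    *hint_errno = 0;
    if (sync_file_range(fd, 0, (off_t) len, SYNC_FILE_RANGE_WRITE) != 0)
        *hint_errno = errno;

    /* Durability still comes from fsync(), whatever the hint did. */
    int rc = fsync(fd);
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

If the question above is right, then on a filesystem that keeps file data outside the page cache the hint step would have nothing to act on, and only the final fsync() would do real work.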
