Reduce the time required for a database recovery from archive. - Mailing list pgsql-hackers

From	Dmitry Shulga
Subject	Reduce the time required for a database recovery from archive.
Date	September 8, 2020 04:51:48
Msg-id	601EE1F5-0B78-47E1-9AAE-C15F74A1C21D@postgrespro.ru
List	pgsql-hackers

Hello hackers,

Currently, database recovery from archive is performed sequentially,
by reading archived WAL files and applying their records to the database.

Archived WAL files are processed one at a time, and this can
create a performance bottleneck if they are delivered slowly,
because the database server has to wait for the next
WAL segment to arrive before applying its records.

To address this issue, it is proposed to fetch archived WAL files in parallel,
so that the next WAL segment file is already available by the time it is
needed for processing redo log records.

The implementation runs several background processes (bgworkers),
each of which runs the shell command specified by the restore_command
parameter to deliver an archived WAL file. The number of parallel
processes is limited by the new parameter max_restore_command_workers.

If this parameter is 0, WAL files are delivered using the original
algorithm, that is, one by one. If it is greater than 0, the database
server starts bgworker processes, up to the limit specified by
max_restore_command_workers, and passes each process the name of a
WAL file to deliver. Active processes prefetch the specified WAL files
and store the received files in the directory pg_wal/pgsql_tmp. After a
bgworker process finishes receiving a file, it marks itself as free
and waits for a request to receive the next WAL file.

The main process performing database recovery still handles WAL files
one by one, but instead of waiting for the next required WAL file to
become available, it checks for that file in the prefetch directory.
If the file is present there, the main process starts processing it.
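
To illustrate the idea outside of PostgreSQL, here is a minimal Python
sketch of in-order apply with bounded parallel prefetch. It is not the
patch's code: fetch_segment() is a hypothetical stand-in for
restore_command, threads stand in for bgworkers, and the prefetch directory
is whatever directory the caller passes in. Only the max_workers argument
mirrors the proposed max_restore_command_workers semantics (0 means the
original one-by-one mode).

```python
import collections
import concurrent.futures
import itertools
import os
import tempfile
import time


def fetch_segment(name, prefetch_dir, delay=0.05):
    """Hypothetical stand-in for restore_command: deliver one segment."""
    time.sleep(delay)  # simulated archive latency
    partial = os.path.join(prefetch_dir, name + ".partial")
    final = os.path.join(prefetch_dir, name)
    with open(partial, "w") as f:
        f.write("records of " + name)
    os.replace(partial, final)  # expose the file only once it is complete
    return final


def recover(segments, max_workers, prefetch_dir, delay=0.05):
    """Apply segments strictly in order, fetching ahead with max_workers."""
    applied = []
    if max_workers == 0:
        # Original behaviour: fetch one file, apply it, fetch the next.
        for name in segments:
            path = fetch_segment(name, prefetch_dir, delay)
            applied.append(name)  # "apply" the segment
            os.remove(path)
        return applied

    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        remaining = iter(segments)
        pending = collections.deque()
        # Hand the first max_workers segment names to the workers.
        for name in itertools.islice(remaining, max_workers):
            pending.append((name, pool.submit(fetch_segment, name, prefetch_dir, delay)))
        while pending:
            name, future = pending.popleft()
            path = future.result()  # blocks only if this file is not here yet
            applied.append(name)    # "apply" the segment
            os.remove(path)
            # A worker is free again: request the next segment, if any.
            nxt = next(remaining, None)
            if nxt is not None:
                pending.append((nxt, pool.submit(fetch_segment, nxt, prefetch_dir, delay)))
    return applied


if __name__ == "__main__":
    # Illustrative segment names in the usual 24-hex-digit WAL file format.
    segs = [f"{1:08X}{0:08X}{i:08X}" for i in range(1, 9)]
    with tempfile.TemporaryDirectory() as d:
        for workers in (0, 4):
            start = time.monotonic()
            recover(segs, workers, d)
            print(workers, "workers:", round(time.monotonic() - start, 2), "s")
```

In this sketch the apply step is instantaneous, so the gain comes purely
from overlapping fetch latency. The window of outstanding requests is kept
at max_workers so that the prefetch directory cannot grow without bound;
whether the patch bounds it the same way is an assumption here.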

The patch implementing the described approach is attached to this email.
The patch contains a test in the file src/test/recovery/t/021_xlogrestore.pl.
Although the test result depends on real execution time and could hardly be
accepted for inclusion in the repository, it was added to show
the positive effect of the new algorithm. In my environment, restoring
from archive with parallel prefetching is twice as fast as in the original
mode.

Regards,
Dmitry.

Attachment

archive_recovery_speedup.patch
