pg_dump, pg_dumpall and data durability - Mailing list pgsql-hackers

From Michael Paquier
Subject pg_dump, pg_dumpall and data durability
Date
Msg-id CAB7nPqS1uZ=Ov+UruW6jr3vB-S_DLVMPc0dQpV-fTDjmm0ZQMg@mail.gmail.com
Responses Re: pg_dump, pg_dumpall and data durability  (Michael Paquier <michael.paquier@gmail.com>)
Re: pg_dump, pg_dumpall and data durability  (Michael Paquier <michael.paquier@gmail.com>)
List pgsql-hackers
Hi all,

In my quest to make the backup tools more compliant with data
durability, here is a thread for pg_dump and pg_dumpall. Here is my
proposal in a few lines:
- Addition in _archiveHandle of a field to track if the dump generated
should be synced or not.
- This is effective for all modes, when the user specifies an output
file. In short that's when fileSpec is not NULL.
- Actually do the sync in _EndData and _EndBlob[s] if appropriate.
There is for example nothing to do for pg_backup_null.c
- Addition of --nosync option to allow users to disable it. By default
it is enabled.
Note that to make the data durable, the file needs to be sync'ed as
well as its parent folder. So with pg_dump we can only make that
really durable with -Fd. I think that in the case where the user
specifies an output file for the other modes we should sync it; that's
the best we can do. This last statement applies to pg_dumpall as
well.
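
To make the "file plus parent folder" point concrete, here is a minimal
sketch in C of what syncing one output file durably involves on a POSIX
system. This is not the proposed patch: fsync_one and durable_sync_file
are hypothetical names made up for illustration, and tolerating
EINVAL/EBADF on the directory is an assumption about filesystems that
cannot fsync a directory.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <libgen.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * fsync a single path. Files are opened read-write, directories
 * read-only. Returns 0 on success, -1 on failure.
 * (Hypothetical helper, not an existing pg_dump function.)
 */
int
fsync_one(const char *path, int is_dir)
{
    int fd;
    int rc;

    assert(path != NULL);

    fd = open(path, is_dir ? O_RDONLY : O_RDWR);
    if (fd < 0)
        return -1;

    rc = fsync(fd);

    /* Assumption: a directory that cannot be fsync'd is not fatal. */
    if (rc != 0 && is_dir && (errno == EINVAL || errno == EBADF))
        rc = 0;

    if (close(fd) != 0)
        rc = -1;

    return rc;
}

/*
 * Sync the file itself, then the directory that contains it, so that
 * both the contents and the directory entry reach stable storage.
 */
int
durable_sync_file(const char *path)
{
    char *copy;
    int rc;

    if (fsync_one(path, 0) != 0)
        return -1;

    /* dirname() may modify its argument, so work on a copy. */
    copy = strdup(path);
    if (copy == NULL)
        return -1;

    rc = fsync_one(dirname(copy), 1);
    free(copy);

    return rc;
}
```

A caller would invoke durable_sync_file() once on the output file after
it has been fully written and closed; for -Fd the same would be done for
each file in the dump directory.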

Of course, if no output file is specified, it is up to the user to
deal with the sync phases.

Or more simply, as true durability can only be achieved if each file
and its parent directory are fsync'd, we could support the operation
only for -Fd. Still, I'd like to think that we had better do the best
we can here and do things for the other modes as well.

Thoughts? I'd like to prepare a patch along those lines for the next CF.
-- 
Michael