Thread: pg_dump, pg_dumpall and data durability

pg_dump, pg_dumpall and data durability

From
Michael Paquier
Date:
Hi all,

In my quest of making the backup tools more compliant to data
durability, here is a thread for pg_dump and pg_dumpall. Here is in a
couple of lines my proposal:
- Addition in _archiveHandle of a field to track if the dump generated
should be synced or not.
- This is effective for all modes, when the user specifies an output
file. In short that's when fileSpec is not NULL.
- Actually do the the sync in _EndData and _EndBlob[s] if appropriate.
There is for example nothing to do for pg_backup_null.c
- Addition of --nosync option to allow users to disable it. By default
it is enabled.
Note that to make the data durable, the file need to be sync'ed as
well as its parent folder. So with pg_dump we can only make that
really durable with -Fd. I think that in the case where the user
specifies an output file for the other modes we should sync it, that's
the best we can do. This last statement applies as well for
pg_dumpall.

Of course, if no output file is specified, that's up to the user to
deal with the sync phases.

Or more simply, as truly durability can just be achieved if each file
and their parent directory are fsync'd, we just support the operation
for -Fd. Still I'd like to think htat we had better do the best we can
here and do things as well for the other modes.

Thoughts? I'd like to prepare a patch according to those lines for the next CF.
-- 
Michael



Re: pg_dump, pg_dumpall and data durability

From
Michael Paquier
Date:
On Thu, Oct 13, 2016 at 2:49 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> - This is effective for all modes, when the user specifies an output
> file. In short that's when fileSpec is not NULL.

I have sent the email too early here... In this case the sync is a
no-op. It is missing a sentence.
-- 
Michael



Re: pg_dump, pg_dumpall and data durability

From
Michael Paquier
Date:
On Thu, Oct 13, 2016 at 2:49 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> In my quest of making the backup tools more compliant to data
> durability, here is a thread for pg_dump and pg_dumpall. Here is in a
> couple of lines my proposal:
> - Addition in _archiveHandle of a field to track if the dump generated
> should be synced or not.
> - This is effective for all modes, when the user specifies an output
> file. In short that's when fileSpec is not NULL.
> - Actually do the the sync in _EndData and _EndBlob[s] if appropriate.
> There is for example nothing to do for pg_backup_null.c
> - Addition of --nosync option to allow users to disable it. By default
> it is enabled.
> Note that to make the data durable, the file need to be sync'ed as
> well as its parent folder. So with pg_dump we can only make that
> really durable with -Fd. I think that in the case where the user
> specifies an output file for the other modes we should sync it, that's
> the best we can do. This last statement applies as well for
> pg_dumpall.
>
> Thoughts? I'd like to prepare a patch according to those lines for the next CF.

Okay, here is a patch doing the above. I have added a new --nosync
option to pg_dump and pg_dumpall to switch to the pre-10 behavior. I
have arrived at the conclusion that it is better not to touch at
_EndData and _EndBlob, and just issue the fsync in CloseArchive when
all the write operations are done. In the case of the directory
format, the fsync is done on all the entries recursively. This makes
as well the patch more simple. The regression tests calling pg_dump
don't use --nosync yet in this patch, that's a move that could be done
afterwards.

I have added that to next CF:
https://commitfest.postgresql.org/11/823/
--
Michael

Attachment

Re: pg_dump, pg_dumpall and data durability

From
Albe Laurenz
Date:
Michael Paquier wrote:
>> In my quest of making the backup tools more compliant to data
>> durability, here is a thread for pg_dump and pg_dumpall.
> 
> Okay, here is a patch doing the above. I have added a new --nosync
> option to pg_dump and pg_dumpall to switch to the pre-10 behavior. I
> have arrived at the conclusion that it is better not to touch at
> _EndData and _EndBlob, and just issue the fsync in CloseArchive when
> all the write operations are done. In the case of the directory
> format, the fsync is done on all the entries recursively. This makes
> as well the patch more simple. The regression tests calling pg_dump
> don't use --nosync yet in this patch, that's a move that could be done
> afterwards.

The patch does not apply, I had to change the hunk for
src/include/common/file_utils.h.

Also, compilation fails because function "pre_fsync_fname" cannot be
resolved during linking. Is that a typo for "pre_fsync_fname" or is
the patch incomplete?

Yours,
Laurenz Albe

Re: pg_dump, pg_dumpall and data durability

From
Michael Paquier
Date:
On Tue, Nov 8, 2016 at 1:24 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
> The patch does not apply, I had to change the hunk for
> src/include/common/file_utils.h.

Yes, the patch has rotten a bit because of f82ec32a. 5d58c07a has also
made the --noxxx option names appearing as --no-xxx.

> Also, compilation fails because function "pre_fsync_fname" cannot be
> resolved during linking. Is that a typo for "pre_fsync_fname" or is
> the patch incomplete?

A typo s/pre_fsync_fname/pre_sync_fname, and a mistake from me because
I did not compile with -DPG_FLUSH_DATA_WORKS to check this code.

v2 is attached, fixing those issues.
--
Michael

Attachment

Re: pg_dump, pg_dumpall and data durability

From
Robert Haas
Date:
On Mon, Nov 7, 2016 at 7:52 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Tue, Nov 8, 2016 at 1:24 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
>> The patch does not apply, I had to change the hunk for
>> src/include/common/file_utils.h.
>
> Yes, the patch has rotten a bit because of f82ec32a. 5d58c07a has also
> made the --noxxx option names appearing as --no-xxx.
>
>> Also, compilation fails because function "pre_fsync_fname" cannot be
>> resolved during linking. Is that a typo for "pre_fsync_fname" or is
>> the patch incomplete?
>
> A typo s/pre_fsync_fname/pre_sync_fname, and a mistake from me because
> I did not compile with -DPG_FLUSH_DATA_WORKS to check this code.
>
> v2 is attached, fixing those issues.

Kyotaro Horiguchi set this patch to "Ready for Committer" in the
CommitFest application, but that seems entirely premature to me
considering that v2 has had no review at all, or at least not that I
see on this thread.  I'm setting it back to "Needs Review".

First question: Do we even want this?  Generally, when a program
writes to a file, we rely on the operating system to decide when that
data should be written back to disk.  We have to override that
distinction for things internal to PostgreSQL because we need certain
bits of data to reach the disk in a certain order, but it's unclear to
me how far outside the core database system we want to extend that.
Are we going to have psql fsync() data it writes to a file when \o is
used, for example?  That would seem to me to be beyond insane, because
we have no idea whether the user actually needs that file to be
durable.  It is a better bet that a pg_dump command's output needs
durability, of course, but I feel that we shouldn't just go disabling
the filesystem cache one program at a time without some guiding
principle.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: pg_dump, pg_dumpall and data durability

From
Michael Paquier
Date:
On Wed, Nov 9, 2016 at 8:18 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> First question: Do we even want this?  Generally, when a program
>> writes to a file, we rely on the operating system to decide when that
>> data should be written back to disk.  We have to override that
>> distinction for things internal to PostgreSQL because we need certain
>> bits of data to reach the disk in a certain order, but it's unclear to
>> me how far outside the core database system we want to extend that.
>> Are we going to have psql fsync() data it writes to a file when \o is
>> used, for example?  That would seem to me to be beyond insane, because
>> we have no idea whether the user actually needs that file to be
>> durable.  It is a better bet that a pg_dump command's output needs
>> durability, of course, but I feel that we shouldn't just go disabling
>> the filesystem cache one program at a time without some guiding
>> principle.
>
> FWIW, I find the premise pretty dubious.  People don't normally expect
> programs to fsync their standard output, and the argument that pg_dump's
> output is always critical data doesn't withstand inspection.  Also,
> I don't understand what pg_dump should do if it fails to fsync.  There
> are too many cases where that would fail (eg, output piped to a program)
> for it to be treated as an error condition.  But if it isn't reported as
> an error, then how much durability guarantee are we really adding?

If the output is piped to a program, there is no way to guarantee that
the data will be flushed and the user is responsible for that. We
cannot control all the use cases. The same applies for example with
pg_basebackup where the data is sent to stdout. IMO, the limit set is
that tools aimed at taking physical and logical backups should do a
better effort in the cases where they can do it. That's a cheap
insurance.

Based on this past thread, it seems to me that Magnus, Andres and Jim
Nasby are of the opinion that making things is useful:
https://www.postgresql.org/message-id/20160327233033.GD20662@awork2.anarazel.de
And so do I.

> I think this might be better addressed by adding something to backup.sgml
> pointing out that you'd better fsync or sync your backups before assuming
> that they can't be lost.

Perhaps. That would be better than nothing at least, but that won't
help for cases where we can help a bit.
-- 
Michael



Re: pg_dump, pg_dumpall and data durability

From
Michael Paquier
Date:
On Wed, Nov 9, 2016 at 5:48 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> Kyotaro Horiguchi set this patch to "Ready for Committer" in the
> CommitFest application, but that seems entirely premature to me
> considering that v2 has had no review at all, or at least not that I
> see on this thread.  I'm setting it back to "Needs Review".

That's indeed too early. Thanks for correcting.
-- 
Michael



Re: pg_dump, pg_dumpall and data durability

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> First question: Do we even want this?  Generally, when a program
> writes to a file, we rely on the operating system to decide when that
> data should be written back to disk.  We have to override that
> distinction for things internal to PostgreSQL because we need certain
> bits of data to reach the disk in a certain order, but it's unclear to
> me how far outside the core database system we want to extend that.
> Are we going to have psql fsync() data it writes to a file when \o is
> used, for example?  That would seem to me to be beyond insane, because
> we have no idea whether the user actually needs that file to be
> durable.  It is a better bet that a pg_dump command's output needs
> durability, of course, but I feel that we shouldn't just go disabling
> the filesystem cache one program at a time without some guiding
> principle.

FWIW, I find the premise pretty dubious.  People don't normally expect
programs to fsync their standard output, and the argument that pg_dump's
output is always critical data doesn't withstand inspection.  Also,
I don't understand what pg_dump should do if it fails to fsync.  There
are too many cases where that would fail (eg, output piped to a program)
for it to be treated as an error condition.  But if it isn't reported as
an error, then how much durability guarantee are we really adding?

I think this might be better addressed by adding something to backup.sgml
pointing out that you'd better fsync or sync your backups before assuming
that they can't be lost.
        regards, tom lane



Re: pg_dump, pg_dumpall and data durability

From
Albe Laurenz
Date:
Michael Paquier wrote:
> A typo s/pre_fsync_fname/pre_sync_fname, and a mistake from me because
> I did not compile with -DPG_FLUSH_DATA_WORKS to check this code.
> 
> v2 is attached, fixing those issues.

The patch applies and compiles fine.

I have tested it on Linux and MinGW and could see the fsync(2) and
FlushFileBuffers calls I expected.
This adds crash safety for a reasonable price, and I think we should have that.

The documentation additions are sufficient.

Looking through the patch, I had two questions that are more about
style and consistency than anything else:

- In pg_dumpall.c, the result of fsync_fname() is cast to "void" to show that the return code is ignored, but not
anywhereelse.  Is that by design?
 

- For pg_dumpall, a short option "-N" is added for "--no-sync", but not for pg_dump (because -N is already taken
there).I'd opt for either using the same short option for both or (IMO better) only offering a long option for both.
Thiswould avoid confusion, and we expect that few people will want to use this option anyway, right?
 

Yours,
Laurenz Albe

Re: pg_dump, pg_dumpall and data durability

From
Michael Paquier
Date:
On Fri, Nov 11, 2016 at 11:03 PM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
> - In pg_dumpall.c, the result of fsync_fname() is cast to "void" to show that
>   the return code is ignored, but not anywhere else.  Is that by design?

Right. The patch is lacking consistency in this area. The main thought
regarding this design is to not consider a fsync failure as critical,
and just issue a warning in stderr. I have updated the two other call
sites with a (void) cast.

> - For pg_dumpall, a short option "-N" is added for "--no-sync", but not for
>   pg_dump (because -N is already taken there).
>   I'd opt for either using the same short option for both or (IMO better)
>   only offering a long option for both.

Okay. For consistency's sake let's do that. I was a bit hesitant
regarding that to be honest.

>   This would avoid confusion, and we expect that few people will want to use
>   this option anyway, right?

Definitely a good point.
--
Michael

Attachment

Re: pg_dump, pg_dumpall and data durability

From
Michael Paquier
Date:
On Sat, Nov 12, 2016 at 1:46 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Fri, Nov 11, 2016 at 11:03 PM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
>>   This would avoid confusion, and we expect that few people will want to use
>>   this option anyway, right?
>
> Definitely a good point.

Meh. I forgot docs and --help output updates.
--
Michael

Attachment

Re: pg_dump, pg_dumpall and data durability

From
Dilip Kumar
Date:
On Sat, Nov 12, 2016 at 10:16 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
>> - For pg_dumpall, a short option "-N" is added for "--no-sync", but not for
>>   pg_dump (because -N is already taken there).
>>   I'd opt for either using the same short option for both or (IMO better)
>>   only offering a long option for both.
>
> Okay. For consistency's sake let's do that. I was a bit hesitant
> regarding that to be honest.

Seems like you have missed to remove -N at some places, as mentioned below.

1.
--- a/doc/src/sgml/ref/pg_dumpall.sgml
+++ b/doc/src/sgml/ref/pg_dumpall.sgml
@@ -365,6 +365,21 @@ PostgreSQL documentation     </varlistentry>
     <varlistentry>
+      <term><option>-N</option></term>

@@ -543,6 +557,7 @@ help(void)


2. printf(_("\nGeneral options:\n")); printf(_("  -f, --file=FILENAME          output file name\n"));
+ printf(_("  -N, --no-sync                do not wait for changes to
be written safely to disk\n"));

3.
- while ((c = getopt_long(argc, argv, "acd:f:gh:l:oOp:rsS:tU:vwWx",
long_options, &optindex)) != -1)
+ while ((c = getopt_long(argc, argv, "acd:f:gh:l:NoOp:rsS:tU:vwWx",
long_options, &optindex)) != -1)

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: pg_dump, pg_dumpall and data durability

From
Michael Paquier
Date:
On Sun, Nov 13, 2016 at 12:17 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> On Sat, Nov 12, 2016 at 10:16 AM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>>> - For pg_dumpall, a short option "-N" is added for "--no-sync", but not for
>>>   pg_dump (because -N is already taken there).
>>>   I'd opt for either using the same short option for both or (IMO better)
>>>   only offering a long option for both.
>>
>> Okay. For consistency's sake let's do that. I was a bit hesitant
>> regarding that to be honest.
>
> Seems like you have missed to remove -N at some places, as mentioned below.
>
> 1.
> --- a/doc/src/sgml/ref/pg_dumpall.sgml
> +++ b/doc/src/sgml/ref/pg_dumpall.sgml
> @@ -365,6 +365,21 @@ PostgreSQL documentation
>       </varlistentry>
>
>       <varlistentry>
> +      <term><option>-N</option></term>
>
> @@ -543,6 +557,7 @@ help(void)
>
>
> 2.
>   printf(_("\nGeneral options:\n"));
>   printf(_("  -f, --file=FILENAME          output file name\n"));
> + printf(_("  -N, --no-sync                do not wait for changes to
> be written safely to disk\n"));

v4 fixed those two places.

> 3.
> - while ((c = getopt_long(argc, argv, "acd:f:gh:l:oOp:rsS:tU:vwWx",
> long_options, &optindex)) != -1)
> + while ((c = getopt_long(argc, argv, "acd:f:gh:l:NoOp:rsS:tU:vwWx",
> long_options, &optindex)) != -1)

But not this one...
--
Michael

Attachment

Re: pg_dump, pg_dumpall and data durability

From
Peter Eisentraut
Date:
On 11/8/16 3:48 PM, Robert Haas wrote:
> First question: Do we even want this?  Generally, when a program
> writes to a file, we rely on the operating system to decide when that
> data should be written back to disk.  We have to override that
> distinction for things internal to PostgreSQL because we need certain
> bits of data to reach the disk in a certain order, but it's unclear to
> me how far outside the core database system we want to extend that.

I had voiced a similar concern previously:
https://www.postgresql.org/message-id/f8dff810-f5f4-77c3-933b-127df4ed94e5@2ndquadrant.com

At the time, there were no other comments, so we went ahead with it,
which presumably gave encouragement to pursue the current patch.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pg_dump, pg_dumpall and data durability

From
Andres Freund
Date:
Hi,

On 2016-11-08 18:18:01 -0500, Tom Lane wrote:
> I think this might be better addressed by adding something to backup.sgml
> pointing out that you'd better fsync or sync your backups before assuming
> that they can't be lost.

How does a normal user do that? I don't think there's a cross-platform
advice we can give, and even on *nix the answer basically is 'sync;
sync;' which is a pretty big hammer, and might be completely
unacceptable on a busy server.

Regards,

Andres



Re: pg_dump, pg_dumpall and data durability

From
Albe Laurenz
Date:
Michael Paquier wrote:
> Meh. I forgot docs and --help output updates.

Looks good, except that you left the "N" option in the getopt_long
call for pg_dumpall.  I fixed that in the attached patch.

I'll mark the patch "ready for committer".

Yours,
Laurenz Albe

Attachment

Re: pg_dump, pg_dumpall and data durability

From
Michael Paquier
Date:
On Mon, Nov 14, 2016 at 6:04 PM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
> Michael Paquier wrote:
>> Meh. I forgot docs and --help output updates.
>
> Looks good, except that you left the "N" option in the getopt_long
> call for pg_dumpall.  I fixed that in the attached patch.

No, v5 has removed it, but it does not matter much now...

> I'll mark the patch "ready for committer".

Thanks!
-- 
Michael



Re: pg_dump, pg_dumpall and data durability

From
Robert Haas
Date:
On Sun, Nov 13, 2016 at 4:18 AM, Andres Freund <andres@anarazel.de> wrote:
> On 2016-11-08 18:18:01 -0500, Tom Lane wrote:
>> I think this might be better addressed by adding something to backup.sgml
>> pointing out that you'd better fsync or sync your backups before assuming
>> that they can't be lost.
>
> How does a normal user do that? I don't think there's a cross-platform
> advice we can give, and even on *nix the answer basically is 'sync;
> sync;' which is a pretty big hammer, and might be completely
> unacceptable on a busy server.

Yeah, that's a pretty fair point.  I see the point of this patch
pretty clearly but somehow it makes me nervous anyway.  I'm not sure
there's any better alternative to what's being proposed, though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: pg_dump, pg_dumpall and data durability

From
Michael Paquier
Date:
On Mon, Nov 14, 2016 at 6:07 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Mon, Nov 14, 2016 at 6:04 PM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
>> Michael Paquier wrote:
>>> Meh. I forgot docs and --help output updates.
>>
>> Looks good, except that you left the "N" option in the getopt_long
>> call for pg_dumpall.  I fixed that in the attached patch.
>
> No, v5 has removed it, but it does not matter much now...
>
>> I'll mark the patch "ready for committer".
>
> Thanks!

Moved to CF 2017-01.
-- 
Michael



Re: pg_dump, pg_dumpall and data durability

From
Fujii Masao
Date:
On Wed, Nov 16, 2016 at 1:27 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sun, Nov 13, 2016 at 4:18 AM, Andres Freund <andres@anarazel.de> wrote:
>> On 2016-11-08 18:18:01 -0500, Tom Lane wrote:
>>> I think this might be better addressed by adding something to backup.sgml
>>> pointing out that you'd better fsync or sync your backups before assuming
>>> that they can't be lost.
>>
>> How does a normal user do that? I don't think there's a cross-platform
>> advice we can give, and even on *nix the answer basically is 'sync;
>> sync;' which is a pretty big hammer, and might be completely
>> unacceptable on a busy server.
>
> Yeah, that's a pretty fair point.  I see the point of this patch
> pretty clearly but somehow it makes me nervous anyway.  I'm not sure
> there's any better alternative to what's being proposed, though.

One idea is to provide the utility or extension which fsync's the specified
files and directories (including all the files under them). It may be overkill,
though. But it would be useful for other purposes, for example, we can
improve the durability of archived WAL files by setting this utility in
archive_command, and we can flush any output files generated by psql
by using it.

Regards,

-- 
Fujii Masao



Re: pg_dump, pg_dumpall and data durability

From
Michael Paquier
Date:
On Tue, Dec 6, 2016 at 6:00 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> One idea is to provide the utility or extension which fsync's the specified
> files and directories (including all the files under them). It may be overkill,
> though. But it would be useful for other purposes, for example, we can
> improve the durability of archived WAL files by setting this utility in
> archive_command, and we can flush any output files generated by psql
> by using it.

That's the pg_copy utility that we talked a couple of months
(year(s)?) back, which would really be useful for archiving, and the
main advantage is that it would be cross-platform. Still the patch
discussed here is more targeted, this makes sure that once pg_dump
leaves, things created are on disk, facilitating the life of users by
not involving an extra step and by making the safe behavior the
default.
-- 
Michael



Re: [HACKERS] pg_dump, pg_dumpall and data durability

From
Michael Paquier
Date:
On Tue, Nov 29, 2016 at 1:30 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Mon, Nov 14, 2016 at 6:07 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> On Mon, Nov 14, 2016 at 6:04 PM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
>>> Michael Paquier wrote:
>>>> Meh. I forgot docs and --help output updates.
>>>
>>> Looks good, except that you left the "N" option in the getopt_long
>>> call for pg_dumpall.  I fixed that in the attached patch.
>>
>> No, v5 has removed it, but it does not matter much now...
>>
>>> I'll mark the patch "ready for committer".
>>
>> Thanks!
>
> Moved to CF 2017-01.

Moved to CF 2017-03 with the same status, ready for committer. Perhaps
there is some interest in this feature? v5 of the patch still applies,
with a couple of hunks, so v6 is attached.
-- 
Michael

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] pg_dump, pg_dumpall and data durability

From
David Steele
Date:
On 1/22/17 11:02 PM, Michael Paquier wrote:
> On Tue, Nov 29, 2016 at 1:30 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> On Mon, Nov 14, 2016 at 6:07 PM, Michael Paquier
>> <michael.paquier@gmail.com> wrote:
>>> On Mon, Nov 14, 2016 at 6:04 PM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
>>>> Michael Paquier wrote:
>>>>> Meh. I forgot docs and --help output updates.
>>>>
>>>> Looks good, except that you left the "N" option in the getopt_long
>>>> call for pg_dumpall.  I fixed that in the attached patch.
>>>
>>> No, v5 has removed it, but it does not matter much now...
>>>
>>>> I'll mark the patch "ready for committer".
>>>
>>> Thanks!
>>
>> Moved to CF 2017-01.
> 
> Moved to CF 2017-03 with the same status, ready for committer. Perhaps
> there is some interest in this feature? v5 of the patch still applies,
> with a couple of hunks, so v6 is attached.

This patch is in need of a committer.  Any takers?

I didn't see a lot of enthusiasm from committers on the thread so if
nobody picks it up by the end of the CF I'm going to mark the patch RWF.

-- 
-David
david@pgmasters.net



Re: [HACKERS] pg_dump, pg_dumpall and data durability

From
Michael Paquier
Date:
On Thu, Mar 2, 2017 at 2:26 AM, David Steele <david@pgmasters.net> wrote:
> This patch is in need of a committer.  Any takers?
> I didn't see a lot of enthusiasm from committers on the thread

Stephen at least has showed interest.

> so if nobody picks it up by the end of the CF I'm going to mark the patch RWF.

Yes, that makes sense. If no committer is willing to have a look at
code-level and/or has room for this patch then it is as good as
doomed. Except for bug fixes, I have a couple of other patches that
are piling up so they would likely get the same treatment. There is
nothing really we can do about that.
-- 
Michael



Re: [HACKERS] pg_dump, pg_dumpall and data durability

From
Robert Haas
Date:
On Thu, Mar 2, 2017 at 5:02 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Thu, Mar 2, 2017 at 2:26 AM, David Steele <david@pgmasters.net> wrote:
>> This patch is in need of a committer.  Any takers?
>> I didn't see a lot of enthusiasm from committers on the thread
>
> Stephen at least has showed interest.
>
>> so if nobody picks it up by the end of the CF I'm going to mark the patch RWF.
>
> Yes, that makes sense. If no committer is willing to have a look at
> code-level and/or has room for this patch then it is as good as
> doomed. Except for bug fixes, I have a couple of other patches that
> are piling up so they would likely get the same treatment. There is
> nothing really we can do about that.

Before we reach the question of whether committers have time to look
at this, we should first consider the question of whether it has
achieved consensus.  I'll try to summarize the votes:

Tom Lane: premise pretty dubious
Robert Haas: do we even want this?
Peter Eisentraut: I had voiced a similar concern [to Robert's] previously
Albe Laurenz: I think we should have that
Andres Freund: [Tom's counterproposal won't work]
Robert Haas: [Andres has a good point, still nervous] I'm not sure
there's any better alternative to what's being proposed, though.
Fujii Masao: One idea is to provide the utility or extension which
fsync's the specified files and directories

Here's an attempt to translate those words into numerical votes.  As
per my usual practice, I will count the patch author as +1:

Michael Paquier: +1
Tom Lane: -1
Peter Eisentraut: -1
Albe Laurenz: +1
Andres Freund: +1
Robert Haas: +0.5
Fujii Masao: -0.5

So, my interpretation is that, out of 7 votes, we have -2.5 and +3.5.
Perhaps that is a consensus for proceeding, but if so it's a pretty
marginal one.  I think the next step for this patch is to consider why
we shouldn't, in lieu of what's proposed here, add a pg_fsync utility
that fsyncs a file or recursively fsyncs a directory, ship that, and
let people use it on their pg_dump files and/or base backups if they
wish.  I am not altogether convinced that's a better option, but I am
also not altogether convinced that it's worse.  Also, if anyone else
wishes to vote, or if anyone to whom I've attributed a vote wishes to
assign a numerical value to their vote other than the one I've
assigned, please feel free.

The issue with this patch isn't that nobody is interested, but that
not everybody thinks it's a good idea.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] pg_dump, pg_dumpall and data durability

From
Michael Paquier
Date:
On Sat, Mar 4, 2017 at 3:08 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Mar 2, 2017 at 5:02 AM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> On Thu, Mar 2, 2017 at 2:26 AM, David Steele <david@pgmasters.net> wrote:
>>> This patch is in need of a committer.  Any takers?
>>> I didn't see a lot of enthusiasm from committers on the thread
>>
>> Stephen at least has showed interest.
>>
>>> so if nobody picks it up by the end of the CF I'm going to mark the patch RWF.
>>
>> Yes, that makes sense. If no committer is willing to have a look at
>> code-level and/or has room for this patch then it is as good as
>> doomed. Except for bug fixes, I have a couple of other patches that
>> are piling up so they would likely get the same treatment. There is
>> nothing really we can do about that.
>
> Before we reach the question of whether committers have time to look
> at this, we should first consider the question of whether it has
> achieved consensus.  I'll try to summarize the votes:
>
> Tom Lane: premise pretty dubious
> Robert Haas: do we even want this?
> Peter Eisentraut: I had voiced a similar concern [to Robert's] previously
> Albe Laurenz: I think we should have that
> Andres Freund: [Tom's counterproposal won't work]
> Robert Haas: [Andres has a good point, still nervous] I'm not sure
> there's any better alternative to what's being proposed, though.
> Fujii Masao: One idea is to provide the utility or extension which
> fsync's the specified files and directories
>
> Here's an attempt to translate those words into numerical votes.  As
> per my usual practice, I will count the patch author as +1:
>
> Michael Paquier: +1
> Tom Lane: -1
> Peter Eisentraut: -1
> Albe Laurenz: +1
> Andres Freund: +1
> Robert Haas: +0.5
> Fujii Masao: -0.5
>
> So, my interpretation is that, out of 7 votes, we have -2.5 and +3.5.
> Perhaps that is a consensus for proceeding, but if so it's a pretty
> marginal one.  I think the next step for this patch is to consider why
> we shouldn't, in lieu of what's proposed here, add a pg_fsync utility
> that fsyncs a file or recursively fsyncs a directory, ship that, and
> let people use it on their pg_dump files and/or base backups if they
> wish.  I am not altogether convinced that's a better option, but I am
> also not altogether convinced that it's worse.  Also, if anyone else
> wishes to vote, or if anyone to whom I've attributed a vote wishes to
> assign a numerical value to their vote other than the one I've
> assigned, please feel free.

Not completely exact by including this message:
https://www.postgresql.org/message-id/20170123050248.GO18360@tamriel.snowman.net
If I interpret this message correctly Stephen Frost can be counted
with either +1 or +0.5.
-- 
Michael



Re: [HACKERS] pg_dump, pg_dumpall and data durability

From
Andrew Dunstan
Date:

On 03/04/2017 01:08 AM, Robert Haas wrote:
> On Thu, Mar 2, 2017 at 5:02 AM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> On Thu, Mar 2, 2017 at 2:26 AM, David Steele <david@pgmasters.net> wrote:
>>> This patch is in need of a committer.  Any takers?
>>> I didn't see a lot of enthusiasm from committers on the thread
>> Stephen at least has showed interest.
>>
>>> so if nobody picks it up by the end of the CF I'm going to mark the patch RWF.
>> Yes, that makes sense. If no committer is willing to have a look at
>> code-level and/or has room for this patch then it is as good as
>> doomed. Except for bug fixes, I have a couple of other patches that
>> are piling up so they would likely get the same treatment. There is
>> nothing really we can do about that.
> Before we reach the question of whether committers have time to look
> at this, we should first consider the question of whether it has
> achieved consensus.  I'll try to summarize the votes:
>
> Tom Lane: premise pretty dubious
> Robert Haas: do we even want this?
> Peter Eisentraut: I had voiced a similar concern [to Robert's] previously
> Albe Laurenz: I think we should have that
> Andres Freund: [Tom's counterproposal won't work]
> Robert Haas: [Andres has a good point, still nervous] I'm not sure
> there's any better alternative to what's being proposed, though.
> Fujii Masao: One idea is to provide the utility or extension which
> fsync's the specified files and directories
>
> Here's an attempt to translate those words into numerical votes.  As
> per my usual practice, I will count the patch author as +1:
>
> Michael Paquier: +1
> Tom Lane: -1
> Peter Eisentraut: -1
> Albe Laurenz: +1
> Andres Freund: +1
> Robert Haas: +0.5
> Fujii Masao: -0.5
>
> So, my interpretation is that, out of 7 votes, we have -2.5 and +3.5.
> Perhaps that is a consensus for proceeding, but if so it's a pretty
> marginal one.  I think the next step for this patch is to consider why
> we shouldn't, in lieu of what's proposed here, add a pg_fsync utility
> that fsyncs a file or recursively fsyncs a directory, ship that, and
> let people use it on their pg_dump files and/or base backups if they
> wish.  I am not altogether convinced that's a better option, but I am
> also not altogether convinced that it's worse.  Also, if anyone else
> wishes to vote, or if anyone to whom I've attributed a vote wishes to
> assign a numerical value to their vote other than the one I've
> assigned, please feel free.
>
> The issue with this patch isn't that nobody is interested, but that
> not everybody thinks it's a good idea.
>


This is really a pretty small patch all things considered, and pretty
low-risk (although I haven;t been threough the code in fine detail yet).
In the end I'm persuaded by Andres' point that there's actually no
practical alternative way to make sure the data is actually synced to disk.

If nobody else wants to pick it up I will, unless there is a strong
objection.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] pg_dump, pg_dumpall and data durability

From
Michael Paquier
Date:
On Wed, Mar 22, 2017 at 6:24 AM, Andrew Dunstan
<andrew.dunstan@2ndquadrant.com> wrote:
> This is really a pretty small patch all things considered, and pretty
> low-risk (although I haven;t been threough the code in fine detail yet).
> In the end I'm persuaded by Andres' point that there's actually no
> practical alternative way to make sure the data is actually synced to disk.
>
> If nobody else wants to pick it up I will, unless there is a strong
> objection.

Thanks!
-- 
Michael



Re: [HACKERS] pg_dump, pg_dumpall and data durability

From
Michael Paquier
Date:
On Wed, Mar 22, 2017 at 7:00 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Wed, Mar 22, 2017 at 6:24 AM, Andrew Dunstan
> <andrew.dunstan@2ndquadrant.com> wrote:
>> This is really a pretty small patch all things considered, and pretty
>> low-risk (although I haven;t been threough the code in fine detail yet).
>> In the end I'm persuaded by Andres' point that there's actually no
>> practical alternative way to make sure the data is actually synced to disk.
>>
>> If nobody else wants to pick it up I will, unless there is a strong
>> objection.
>
> Thanks!

Thanks Andrew, I can see that this has been committed as 96a7128b.

I also saw that:
https://www.postgresql.org/message-id/75e1b6ff-c3d5-9a26-e38b-3cb22a099ff0@2ndQuadrant.com
I'll send a patch in a bit for the regression tests.
-- 
Michael