Thread: ON CONFLICT DO NOTHING on pg_dump

ON CONFLICT DO NOTHING on pg_dump

From
Surafel Temesgen
Date:

Hello,

Sometimes I have to maintain two similar database and I have to update one from the other and notice having the option to add ON CONFLICT DO NOTHING clause to INSERT command in the dump data will allows pg_restore to be done with free of ignore error.

The attache patch add --on-conflect-do-nothing option to pg_dump in order to do the above.

regards

Surafel

Attachment

RE: ON CONFLICT DO NOTHING on pg_dump

From
"Ideriha, Takeshi"
Date:
>From: Surafel Temesgen [mailto:surafel3000@gmail.com] 
>Subject: ON CONFLICT DO NOTHING on pg_dump

>Sometimes I have to maintain two similar database and I have to update one from the other and notice having the option
toadd ON CONFLICT DO NOTHING clause to >INSERT command in the dump data will allows pg_restore to be done with free of
ignoreerror.
 

Hi,
I feel like that on-conflict-do-nothing support is useful especially coupled with --data-only option.
Only the difference of data can be restored.

>The attache patch add --on-conflect-do-nothing option to pg_dump in order to do the above. 

The followings are some comments.

+      <term><option>--on-conflect-do-nothing</option></term>
Here's a typo: conflect -> conflict. This typo also applies to pg_dump.c

        printf(_("  --inserts                    dump data as INSERT commands, rather than COPY\n"));
+       printf(_("  --on-conflect-do-nothing     dump data as INSERT commands with on conflect do nothing\n"));
        printf(_("  --no-comments                do not dump comments\n"));

The output of help should be in alphabetical order according to the convention. So changing the order seems logical. 
Please apply my review to the documentation as well.
By the way, 4d6a854 breaks the patch on this point.

+        This option is not valid unless <option>--inserts</option> is also specified.
+       </para>
       
+       if (dopt.do_nothing && !dopt.dump_inserts)
+               exit_horribly(NULL, "option --on-conflect-do-nothing requires option --inserts\n");

How about mentioning --column-inserts? --on-conflict-do-nothing with --column-inserts should work.

Do you have any plan to support on-conlict-do-update? Supporting this seems to me complicated and take much time so I
don'tmind not implementing this.
 

What do you think about adding some test cases? 
command_fails_like() at 001_basic.pl checks command fail pattern with invalid comnibation of option.
And 002_pg_dump.pl checks the feature iteself.

Regards,
Takeshi Ideriha

Re: ON CONFLICT DO NOTHING on pg_dump

From
Nico Williams
Date:
On Tue, Jun 12, 2018 at 09:05:23AM +0000, Ideriha, Takeshi wrote:
> >From: Surafel Temesgen [mailto:surafel3000@gmail.com] 
> >Subject: ON CONFLICT DO NOTHING on pg_dump
> 
> >Sometimes I have to maintain two similar database and I have to update one from the other and notice having the
optionto add ON CONFLICT DO NOTHING clause to >INSERT command in the dump data will allows pg_restore to be done with
freeof ignore error.
 
> 
> Hi,
> I feel like that on-conflict-do-nothing support is useful especially coupled with --data-only option.
> Only the difference of data can be restored.

But that's additive-only.  Only missing rows are restored this way, and
differences are not addressed.

If you want restore to restore data properly and concurrently (as
opposed to renaming a new database into place or whatever) then you'd
want a) MERGE, b) dump to generate MERGE statements.  A concurrent data
restore operation would be rather neat.

Nico
-- 


Re: ON CONFLICT DO NOTHING on pg_dump

From
Surafel Temesgen
Date:


On Tue, Jun 12, 2018 at 12:05 PM, Ideriha, Takeshi <ideriha.takeshi@jp.fujitsu.com> wrote:
thank you for the review
Hi,
I feel like that on-conflict-do-nothing support is useful especially coupled with --data-only option.
Only the difference of data can be restored.

>The attache patch add --on-conflect-do-nothing option to pg_dump in order to do the above.

The followings are some comments.

+      <term><option>--on-conflect-do-nothing</option></term>
Here's a typo: conflect -> conflict. This typo also applies to pg_dump.c

        printf(_("  --inserts                    dump data as INSERT commands, rather than COPY\n"));
+       printf(_("  --on-conflect-do-nothing     dump data as INSERT commands with on conflect do nothing\n"));
        printf(_("  --no-comments                do not dump comments\n"));

The output of help should be in alphabetical order according to the convention. So changing the order seems logical.
Please apply my review to the documentation as well.
By the way, 4d6a854 breaks the patch on this point.

+        This option is not valid unless <option>--inserts</option> is also specified.
+       </para>

+       if (dopt.do_nothing && !dopt.dump_inserts)
+               exit_horribly(NULL, "option --on-conflect-do-nothing requires option --inserts\n");

How about mentioning --column-inserts? --on-conflict-do-nothing with --column-inserts should work.
fixed 

Do you have any plan to support on-conlict-do-update? Supporting this seems to me complicated and take much time so I don't mind not implementing this.
i agree its complicated and i don't have a plan to implementing it.  
What do you think about adding some test cases?
command_fails_like() at 001_basic.pl checks command fail pattern with invalid comnibation of option.
And 002_pg_dump.pl checks the feature iteself.
thank you for pointing me that i add basic test and it seems to me the rest of the test is covered by column_inserts test
Regards,
Takeshi Ideriha

Attachment

RE: ON CONFLICT DO NOTHING on pg_dump

From
"Ideriha, Takeshi"
Date:
>-----Original Message-----
>From: Nico Williams [mailto:nico@cryptonector.com]
>On Tue, Jun 12, 2018 at 09:05:23AM +0000, Ideriha, Takeshi wrote:
>> >From: Surafel Temesgen [mailto:surafel3000@gmail.com]
>> >Subject: ON CONFLICT DO NOTHING on pg_dump
>>
>> >Sometimes I have to maintain two similar database and I have to update one from
>the other and notice having the option to add ON CONFLICT DO NOTHING clause to
>>INSERT command in the dump data will allows pg_restore to be done with free of
>ignore error.
>>
>> Hi,
>> I feel like that on-conflict-do-nothing support is useful especially coupled with
>--data-only option.
>> Only the difference of data can be restored.
>
>But that's additive-only.  Only missing rows are restored this way, and differences are
>not addressed.
>
>If you want restore to restore data properly and concurrently (as opposed to renaming
>a new database into place or whatever) then you'd want a) MERGE, b) dump to
>generate MERGE statements.  A concurrent data restore operation would be rather
>neat.

I agree with you though supporting MERGE or ON-CONFLICT-DO-UPDATE seems hard work.
Only ON-CONCLICT-DO-NOTHING use case may be narrow.

--
Takeshi 



RE: ON CONFLICT DO NOTHING on pg_dump

From
"Ideriha, Takeshi"
Date:
Hi,

>-----Original Message-----
>From: Surafel Temesgen [mailto:surafel3000@gmail.com]
>thank you for the review
>
>    Do you have any plan to support on-conlict-do-update? Supporting this seems
>to me complicated and take much time so I don't mind not implementing this.
>
>
>i agree its complicated and i don't have a plan to implementing it.

Sure.

>thank you for pointing me that i add basic test and it seems to me the rest of the test
>is covered by column_inserts test

Thank you for updating. 
Just one comment for the code.

+       if (dopt.do_nothing && !(dopt.dump_inserts || dopt.column_inserts))
I think you can remove dopt.column_inserts here because at this point dopt.dump_inserts has
already turned true if --column-inserts is specified. 

But I'm just inclined to think that use case of this feature is vague.
Could you specify more concrete use case that is practical for many users?
(It would lead to more attention to this patch.)
For example, is it useful to backup just before making a big change to DB like delete tables?

===
Takeshi

Re: ON CONFLICT DO NOTHING on pg_dump

From
Dilip Kumar
Date:
On Thu, Jun 14, 2018 at 4:09 PM, Surafel Temesgen <surafel3000@gmail.com> wrote:
>
>
> thank you for pointing me that i add basic test and it seems to me the rest
> of the test is covered by column_inserts test

@@ -172,6 +172,7 @@ typedef struct _dumpOptions
  char    *outputSuperuser;

  int sequence_data; /* dump sequence data even in schema-only mode */
+ int do_nothing;
 } DumpOptions;

The new structure member appears out of place, can you move up along
with other "command-line long options" ?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Re: ON CONFLICT DO NOTHING on pg_dump

From
Surafel Temesgen
Date:


On Sat, Jun 16, 2018 at 11:36 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote:

@@ -172,6 +172,7 @@ typedef struct _dumpOptions
  char    *outputSuperuser;

  int sequence_data; /* dump sequence data even in schema-only mode */
+ int do_nothing;
 } DumpOptions;

The new structure member appears out of place, can you move up along
with other "command-line long options" ?

Done

regards Surafel

Attachment

Re: ON CONFLICT DO NOTHING on pg_dump

From
Nico Williams
Date:
On Fri, Jun 15, 2018 at 02:20:21AM +0000, Ideriha, Takeshi wrote:
> >From: Nico Williams [mailto:nico@cryptonector.com]
> >On Tue, Jun 12, 2018 at 09:05:23AM +0000, Ideriha, Takeshi wrote:
> >> Only the difference of data can be restored.
> >
> >But that's additive-only.  Only missing rows are restored this way, and differences are
> >not addressed.
> >
> >If you want restore to restore data properly and concurrently (as opposed to renaming
> >a new database into place or whatever) then you'd want a) MERGE, b) dump to
> >generate MERGE statements.  A concurrent data restore operation would be rather
> >neat.
> 
> I agree with you though supporting MERGE or ON-CONFLICT-DO-UPDATE seems hard work.
> Only ON-CONCLICT-DO-NOTHING use case may be narrow.

Is it narrow, or is it just easy enough to add quickly?

And by the way, you don't need MERGE.  You can just generate INSERT/
UPDATE/DELETE statements -- MERGE is mainly an optimization on that, and
could wait until PG has a MERGE.

Nico
-- 


RE: ON CONFLICT DO NOTHING on pg_dump

From
"Ideriha, Takeshi"
Date:
>> I agree with you though supporting MERGE or ON-CONFLICT-DO-UPDATE seems
>hard work.
>> Only ON-CONCLICT-DO-NOTHING use case may be narrow.
>
>Is it narrow, or is it just easy enough to add quickly?

Sorry for late replay.
I read your comment and rethought about it.
What I meant by "narrow" is that the number of people who use this new feature seems limited to me.
And I had a wrong impression that small improvements are always rejected by hackers.
But that's only the case for small improvements with huge source code modification. In short, cost/benefit ratio is low
case.

This patch have some benefits with source code change is small.
So I just wait for other people reviews.

>And by the way, you don't need MERGE.  You can just generate INSERT/
>UPDATE/DELETE statements -- MERGE is mainly an optimization on that, and could
>wait until PG has a MERGE.
Oh, thank you for clarifying this. Now I understand it.

Best regards,
Takeshi Ideriha




RE: ON CONFLICT DO NOTHING on pg_dump

From
"Ideriha, Takeshi"
Date:
Hi,

>    The new structure member appears out of place, can you move up along
>    with other "command-line long options" ?
>
>
>
>Done
>

I did regression tests (make check-world) and 
checked manually pg_dump --on-conflict-do-nothing works properly.
Also it seems to me the code has no problem.
This feature has advantage to some users with small code change.

So I marked it as 'Ready for committer'.
I'd like to wait and see committers opinions.

Regards,
Takeshi Ideriha


Re: ON CONFLICT DO NOTHING on pg_dump

From
Thomas Munro
Date:
On Wed, Jul 11, 2018 at 2:20 PM, Ideriha, Takeshi
<ideriha.takeshi@jp.fujitsu.com> wrote:
> I did regression tests (make check-world) and
> checked manually pg_dump --on-conflict-do-nothing works properly.
> Also it seems to me the code has no problem.
> This feature has advantage to some users with small code change.
>
> So I marked it as 'Ready for committer'.
> I'd like to wait and see committers opinions.

Yeah, it's not an earth-shattering feature, but it might occasionally
be useful and isn't complicated.

+        Add ON CONFLICT DO NOTHING clause in the INSERT commands.

I think this would be better as: Add <literal>ON CONFLICT DO
NOTHING</literal> to <command>INSERT</command> commands.

+    printf(_("  --on-conflict-do-nothing     dump data as INSERT
commands with ON CONFLICT DO NOTHING \n"));

That's slightly misleading... let's just use the same wording again,
eg "add ON CONFLICT DO NOTHING to INSERT commands".

                {"no-unlogged-table-data", no_argument,
&no_unlogged_table_data, 1},
+               {"on-conflict-do-nothing", no_argument,
&on_conflict_do_nothing, 1},

I was tempted to say that this should be in alphabetical order, but
then I noticed that these are only *almost* in alphabetical order, not
quite.  Oh well.

Here's the version I'd like to commit, if there are no objections.

-- 
Thomas Munro
http://www.enterprisedb.com

Attachment

RE: ON CONFLICT DO NOTHING on pg_dump

From
"Ideriha, Takeshi"
Date:
Hi, thanks for the revision.

>
>+        Add ON CONFLICT DO NOTHING clause in the INSERT commands.
>
>I think this would be better as: Add <literal>ON CONFLICT DO NOTHING</literal> to
><command>INSERT</command> commands.

Agreed.

>+    printf(_("  --on-conflict-do-nothing     dump data as INSERT
>commands with ON CONFLICT DO NOTHING \n"));
>
>That's slightly misleading... let's just use the same wording again, eg "add ON
>CONFLICT DO NOTHING to INSERT commands".

Agreed. But you forgot fixing it at pg_dump.c.
So could you please fix this and commit it?

Regards,
Takeshi Ideriha


Re: ON CONFLICT DO NOTHING on pg_dump

From
Thomas Munro
Date:
On Fri, Jul 13, 2018 at 12:33 PM, Ideriha, Takeshi
<ideriha.takeshi@jp.fujitsu.com> wrote:
>>+        Add ON CONFLICT DO NOTHING clause in the INSERT commands.
>>
>>I think this would be better as: Add <literal>ON CONFLICT DO NOTHING</literal> to
>><command>INSERT</command> commands.
>
> Agreed.
>
>>+    printf(_("  --on-conflict-do-nothing     dump data as INSERT
>>commands with ON CONFLICT DO NOTHING \n"));
>>
>>That's slightly misleading... let's just use the same wording again, eg "add ON
>>CONFLICT DO NOTHING to INSERT commands".
>
> Agreed. But you forgot fixing it at pg_dump.c.
> So could you please fix this and commit it?

Right, thanks.

I noticed one more thing: pg_dumpall.c doesn't really need to prohibit
--on-conflict-do-nothing without --insert.  Its existing validation
rejects illegal combinations of the settings that are *not* passed on
to pg_dump.  It seems OK to just pass those on and let pg_dump
complain.  For example, if you say "pg_dumpall --data-only
--schema-only", it's pg_dump that complains, not pg_dumpall.  I think
we should do the same thing here.

Pushed, with those changes.

Thanks for the patch and the reviews!

-- 
Thomas Munro
http://www.enterprisedb.com


RE: ON CONFLICT DO NOTHING on pg_dump

From
"Ideriha, Takeshi"
Date:
>I noticed one more thing: pg_dumpall.c doesn't really need to prohibit
>--on-conflict-do-nothing without --insert.  Its existing validation rejects illegal
>combinations of the settings that are *not* passed on to pg_dump.  It seems OK to
>just pass those on and let pg_dump complain.  For example, if you say "pg_dumpall
>--data-only --schema-only", it's pg_dump that complains, not pg_dumpall.  I think we
>should do the same thing here.

Thank you for the clarification. I didn't give thought to pg_dumpall internally running pg_dump.

>Pushed, with those changes.
Thanks!