pg_dump multi VALUES INSERT

From
Surafel Temesgen
Date:

Hello,

According to the documentation, the --inserts option is mainly useful for making dumps that can be loaded into non-PostgreSQL databases and for reducing the number of rows that might be lost when an error occurs during reload. But multi-row VALUES INSERT commands are equally portable and compact, and also faster to reload than single-row statements. I think this deserves an option of its own.

The attached patch adds an option for multi-row VALUES INSERT statements, with a default of 100 rows per statement, so that at most 100 rows are lost on an error rather than the entire table.

Comments?

Regards

Surafel

Attachment
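
For concreteness, a sketch of the difference at issue (hypothetical table and
values, not output from the actual patch):

    -- one row per statement, as --inserts currently emits:
    INSERT INTO public.t VALUES (1);
    INSERT INTO public.t VALUES (2);
    INSERT INTO public.t VALUES (3);

    -- multi-row VALUES as proposed, up to 100 rows per statement:
    INSERT INTO public.t VALUES (1), (2), (3);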

Re: pg_dump multi VALUES INSERT

From
Tom Lane
Date:
Surafel Temesgen <surafel3000@gmail.com> writes:
> According to the documentation, the --inserts option is mainly useful for
> making dumps that can be loaded into non-PostgreSQL databases and for
> reducing the number of rows that might be lost when an error occurs during
> reload. But multi-row VALUES INSERT commands are equally portable and
> compact, and also faster to reload than single-row statements. I think this
> deserves an option of its own.

I don't actually see the point of this.  If you want dump/reload speed
you should be using COPY.  If that isn't your first priority, it's
unlikely that using an option like this would be a good idea.  It makes
the effects of a bad row much harder to predict, and it increases your
odds of OOM problems with wide rows substantially.

I grant that COPY might not be an option if you're trying to transfer
data to a different DBMS, but the other problems seem likely to apply
anywhere.  The bad-data hazard, in particular, is probably a much larger
concern than it is for Postgres-to-Postgres transfers.

            regards, tom lane
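
A sketch of the bad-row point above: on reload, an error aborts the whole
statement it occurs in, so batching multiplies the loss (hypothetical table t
with a primary key; the duplicate key stands in for any bad row):

    -- single-row statements: only the bad row is lost
    INSERT INTO t VALUES (1);            -- ok
    INSERT INTO t VALUES (2);            -- fails: id 2 already exists
    INSERT INTO t VALUES (3);            -- ok

    -- one multi-row statement: the bad row takes the others with it
    INSERT INTO t VALUES (1), (2), (3);  -- fails: all 3 rows lost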


Re: pg_dump multi VALUES INSERT

From
Stephen Frost
Date:
Greetings,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Surafel Temesgen <surafel3000@gmail.com> writes:
> > According to the documentation, the --inserts option is mainly useful for
> > making dumps that can be loaded into non-PostgreSQL databases and for
> > reducing the number of rows that might be lost when an error occurs during
> > reload. But multi-row VALUES INSERT commands are equally portable and
> > compact, and also faster to reload than single-row statements. I think this
> > deserves an option of its own.
>
> I don't actually see the point of this.  If you want dump/reload speed
> you should be using COPY.  If that isn't your first priority, it's
> unlikely that using an option like this would be a good idea.  It makes
> the effects of a bad row much harder to predict, and it increases your
> odds of OOM problems with wide rows substantially.
>
> I grant that COPY might not be an option if you're trying to transfer
> data to a different DBMS, but the other problems seem likely to apply
> anywhere.  The bad-data hazard, in particular, is probably a much larger
> concern than it is for Postgres-to-Postgres transfers.

I can't say that I can really buy off on this argument.

The point of it is that it makes loading into other RDBMS faster.  Yes,
it has many of the same issues as our COPY does, but we support it
because it's much faster.  The same is true here, just for other
databases, so I'm +1 on the general idea.

I've not looked at the patch itself at all, yet.

Thanks!

Stephen

Attachment

Re: pg_dump multi VALUES INSERT

From
Fabien COELHO
Date:
> The attached patch adds an option for multi-row VALUES INSERT statements,
> with a default of 100 rows per statement, so that at most 100 rows are lost
> on an error rather than the entire table.

Patch does not seem to apply anymore, could you rebase?

-- 
Fabien.


Re: pg_dump multi VALUES INSERT

From
Michael Paquier
Date:
On Wed, Oct 17, 2018 at 03:05:28PM -0400, Stephen Frost wrote:
> The point of it is that it makes loading into other RDBMS faster.  Yes,
> it has many of the same issues as our COPY does, but we support it
> because it's much faster.  The same is true here, just for other
> databases, so I'm +1 on the general idea.

Well, the patch author has mentioned that he also cares about being able
to detect errors when processing the dump, which multi-row inserts make
easier to check for.  However, even if you specify --data-only you still
need to worry about the SET commands at the top of the dump, which requires
manual handling...

I am honestly not convinced that it is worth complicating pg_dump for
that, as there is no guarantee either that the DDLs generated by pg_dump
will be compatible with what other systems expect.  This kind of
compatibility for fetch and reload can also be kind of tricky with
portability issues, so I'd rather let this stuff be handled correctly
by other tools like pgloader rather than expecting to get it
half-baked within Postgres core tools.
--
Michael

Attachment

Re: pg_dump multi VALUES INSERT

From
Stephen Frost
Date:
Greetings,

* Michael Paquier (michael@paquier.xyz) wrote:
> On Wed, Oct 17, 2018 at 03:05:28PM -0400, Stephen Frost wrote:
> > The point of it is that it makes loading into other RDBMS faster.  Yes,
> > it has many of the same issues as our COPY does, but we support it
> > because it's much faster.  The same is true here, just for other
> > databases, so I'm +1 on the general idea.
>
> Well, the patch author has mentioned that he also cares about being able
> to detect errors when processing the dump, which multi-row inserts make
> easier to check for.  However, even if you specify --data-only you still
> need to worry about the SET commands at the top of the dump, which requires
> manual handling...

That's hardly a serious complication..

> I am honestly not convinced that it is worth complicating pg_dump for
> that, as there is no guarantee either that the DDLs generated by pg_dump
> will be compatible with what other systems expect.  This kind of
> compatibility for fetch and reload can also be kind of tricky with
> portability issues, so I'd rather let this stuff be handled correctly
> by other tools like pgloader rather than expecting to get it
> half-baked within Postgres core tools.

I can see an argument for not wanting to complicate pg_dump, but we've
explicitly stated that the purpose of --inserts is to facilitate
restoring into other database systems and I don't agree that we should
just punt on that entirely.  For better or worse, there's an awful lot
of weight put behind things which are in core and we should take care to
do what we can to make those things better, especially when someone is
proposing a patch to improve the situation.

Sure, the patch might need work or have other issues, but that's an
opportunity for us to provide feedback to the author and encourage them
to improve the patch.

As for the other things that make it difficult to use pg_dump to get a
result out that can be loaded into other database systems- let's try to
improve on that too.  Having an option to skip the 'setup' bits, such as
the SET commands, certainly wouldn't be hard.

I certainly don't see us adding code to pg_dump to handle 'fetch and
reload' into some non-PG system, or, really, even into a PG system, and
that certainly isn't something the patch does, so I don't think it's a
particularly interesting argument.  Users can handle that as needed
themselves.

In other words, none of the arguments put forth really seem to be a
reason to reject the general idea of this patch, so I'm still +1 on
that.  Having just glanced over the patch quickly, I think I would have
done something like '--inserts=100' as the way to enable it instead of
adding a new option though.  Not that I feel very strongly about it.

Thanks!

Stephen

Attachment

Re: pg_dump multi VALUES INSERT

From
Surafel Temesgen
Date:
hi,

On Sun, Nov 4, 2018 at 1:18 PM Fabien COELHO <coelho@cri.ensmp.fr> wrote:
Patch does not seem to apply anymore, could you rebase?


The attached patch is a rebased version and works via '--inserts=100', as Stephen suggested.


regards

Surafel
Attachment

Re: pg_dump multi VALUES INSERT

From
Alvaro Herrera
Date:
On 2018-Nov-06, Surafel Temesgen wrote:

> hi,
> 
> On Sun, Nov 4, 2018 at 1:18 PM Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> 
> > Patch does not seem to apply anymore, could you rebase?
> >
> The attached patch is a rebased version and works via '--inserts=100', as
> Stephen suggested

I thought the suggestion was that the number could be any positive
integer, not hardcoded 100.  It shouldn't take much more code to handle
it that way, which makes more sense to me.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: pg_dump multi VALUES INSERT

From
Pavel Stehule
Date:


út 6. 11. 2018 v 18:18 odesílatel Alvaro Herrera <alvherre@2ndquadrant.com> napsal:
On 2018-Nov-06, Surafel Temesgen wrote:

> hi,
>
> On Sun, Nov 4, 2018 at 1:18 PM Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
> > Patch does not seem to apply anymore, could you rebase?
> >
> The attached patch is a rebased version and works via '--inserts=100', as
> Stephen suggested

I thought the suggestion was that the number could be any positive
integer, not hardcoded 100.  It shouldn't take much more code to handle
it that way, which makes more sense to me.


+1

100 looks really strange

Pavel

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: pg_dump multi VALUES INSERT

From
Surafel Temesgen
Date:


On Tue, Nov 6, 2018 at 8:18 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
On 2018-Nov-06, Surafel Temesgen wrote:

> hi,
>
> On Sun, Nov 4, 2018 at 1:18 PM Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
> > Patch does not seem to apply anymore, could you rebase?
> >
> The attached patch is a rebased version and works via '--inserts=100', as
> Stephen suggested

I thought the suggestion was that the number could be any positive
integer, not hardcoded 100. 
ok
It shouldn't take much more code to handle
it that way, which makes more sense to me

Yes, it's not many lines of code. Attached is a patch that optionally accepts the number of rows in a single INSERT statement; if it is not specified, one row per statement is used.

regards

Surafel

Attachment

Re: pg_dump multi VALUES INSERT

From
Dmitry Dolgov
Date:
> On Thu, Nov 8, 2018 at 2:03 PM Surafel Temesgen <surafel3000@gmail.com> wrote:
>
> Yes, it's not many lines of code. Attached is a patch that optionally accepts
> the number of rows in a single INSERT statement; if it is not specified, one
> row per statement is used.

Hi,

Unfortunately, patch needs to be rebased, could you please post an updated
version?


Re: pg_dump multi VALUES INSERT

From
Surafel Temesgen
Date:


On Fri, Nov 30, 2018 at 7:16 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote:

Unfortunately, patch needs to be rebased, could you please post an updated
version?
 
Thank you for letting me know. Here is an updated patch against current master.
Regards
Surafel
 
Attachment

Re: pg_dump multi VALUES INSERT

From
Fabien COELHO
Date:
Hello Surafel,

> Thank you for letting me know. Here is an updated patch against current master

Patch applies cleanly, compiles, "make check" is okay, but the feature
itself is not tested...

The feature should be tested somewhere.

ISTM that command-line switches with optional arguments should be avoided:
this facility is seldom used (hmmm... 2 existing instances), because it
interferes with argument processing if such a switch is used as the last
one. It is only okay with commands which do not expect arguments. For
backward compatibility, this suggests adding another switch, eg
--insert-multi=100 or whatever, which would possibly default to 100. The
alternative is to break compatibility by adding a mandatory argument,
but I guess that would not be admissible to committers.

Function "atoi" parses "1zzz" as 1, which is debatable, so I'd suggest
avoiding it, using some stricter parsing, and erroring out on malformed
integers.

Neither the --help output nor the documentation describes the new
argument to --inserts.

There is an indentation issue within the while loop.

Given that the implementation is largely a copy-paste of the preceding
function, I'd suggest simply extending it so that it takes the
"multi insert" setting into account and defaults to the previous
behavior if not set.

-- 
Fabien.
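
A minimal sketch of the stricter parsing suggested above, using strtol so
that input such as "1zzz" is rejected instead of being read as 1 (the
function name is illustrative, not pg_dump's actual code):

    #include <errno.h>
    #include <limits.h>
    #include <stdbool.h>
    #include <stdlib.h>

    static bool
    parse_positive_int(const char *arg, int *result)
    {
        char   *endptr;
        long    val;

        errno = 0;
        val = strtol(arg, &endptr, 10);

        /* reject empty input, trailing junk, and out-of-range values */
        if (errno != 0 || endptr == arg || *endptr != '\0' ||
            val <= 0 || val > INT_MAX)
            return false;

        *result = (int) val;
        return true;
    }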


Re: pg_dump multi VALUES INSERT

From
Surafel Temesgen
Date:


On Tue, Dec 25, 2018 at 2:47 PM Fabien COELHO <coelho@cri.ensmp.fr> wrote:

Thank you for looking into it

Hello Surafel,

> Thank you for letting me know. Here is an updated patch against current master

Patch applies cleanly, compiles, "make check" is okay, but the feature
itself is not tested...

The feature should be tested somewhere.

ISTM that command-line switches with optional arguments should be avoided:
this facility is seldom used (hmmm... 2 existing instances), because it
interferes with argument processing if such a switch is used as the last
one. It is only okay with commands which do not expect arguments. For
backward compatibility, this suggests adding another switch, eg
--insert-multi=100 or whatever, which would possibly default to 100. The
alternative is to break compatibility by adding a mandatory argument,
but I guess that would not be admissible to committers.

Function "atoi" parses "1zzz" as 1, which is debatable, so I'd suggest
avoiding it, using some stricter parsing, and erroring out on malformed
integers.

Neither the --help output nor the documentation describes the new
argument to --inserts.


done

There is an indentation issue within the while loop.

Given that the implementation is largely a copy-paste of the preceding
function, I'd suggest simply extending it so that it takes the
"multi insert" setting into account and defaults to the previous
behavior if not set.
 

At first I also tried to do it like that, but it seemed to me the function would become long and more complex.

Regards

Surafel

Attachment

Re: pg_dump multi VALUES INSERT

From
Fabien COELHO
Date:
> At first I also tried to do it like that, but it seemed to me the function
> would become long and more complex.

Probably. But calling it with size 100 should result in the same behavior,
so it is really just an extension of the preceding one? Or am I missing
something?

-- 
Fabien.


Re: pg_dump multi VALUES INSERT

From
Surafel Temesgen
Date:


On Fri, Dec 28, 2018 at 6:46 PM Fabien COELHO <coelho@cri.ensmp.fr> wrote:

> At first I also tried to do it like that, but it seemed to me the function
> would become long and more complex.

Probably. But calling it with size 100 should result in the same behavior,
so it is really just an extension of the preceding one? Or am I missing
something?


Dumping table data using single-row INSERT statements and using multi-row
VALUES INSERT statements of a user-specified size differ enough to demand
separate functions; they are not the same thing and should not be implemented
in the same function. Regarding code duplication, I think the solution is to
move the shared code into separate functions and call them in the appropriate
places.

Regards
Surafel

Re: pg_dump multi VALUES INSERT

From
David Rowley
Date:
Just looking at the v5 patch, it seems not to handle 0 column tables correctly.

For example:

# create table t();
# insert into t default values;
# insert into t default values;

$ pg_dump --table t --inserts --insert-multi=100 postgres > dump.sql

# \i dump.sql
[...]
INSERT 0 1
psql:dump.sql:35: ERROR:  syntax error at or near ")"
LINE 1: );
        ^

I'm not aware of a valid way to insert multiple 0 column rows in a
single INSERT statement, so not sure how you're going to handle that
case.


--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
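
The problem in SQL terms, as a sketch: PostgreSQL has no spelling for a
multi-row VALUES list with zero columns, so each zero-column row needs its
own statement:

    CREATE TABLE t ();
    INSERT INTO t DEFAULT VALUES;  -- valid, but one row at a time
    INSERT INTO t VALUES (), ();   -- syntax error: empty VALUES lists
                                   -- are not allowed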


Re: pg_dump multi VALUES INSERT

From
Surafel Temesgen
Date:
Hi,
Thank you for looking at it
On Mon, Dec 31, 2018 at 12:38 PM David Rowley <david.rowley@2ndquadrant.com> wrote:
Just looking at the v5 patch, it seems not to handle 0 column tables correctly.

For example:

# create table t();
# insert into t default values;
# insert into t default values;

$ pg_dump --table t --inserts --insert-multi=100 postgres > dump.sql

# \i dump.sql
[...]
INSERT 0 1
psql:dump.sql:35: ERROR:  syntax error at or near ")"
LINE 1: );
        ^

The attached patch contains a fix for it.
Regards
Surafel
 
Attachment

Re: pg_dump multi VALUES INSERT

From
David Rowley
Date:
On Thu, 3 Jan 2019 at 01:50, Surafel Temesgen <surafel3000@gmail.com> wrote:
> On Mon, Dec 31, 2018 at 12:38 PM David Rowley <david.rowley@2ndquadrant.com> wrote:
>> Just looking at the v5 patch, it seems not to handle 0 column tables correctly.

> The attached patch contains a fix for it.

+ /* if it is zero-column table then we're done */
+ if (nfields == 0)
+ {
+ archputs(insertStmt->data, fout);
+ continue;
+ }

So it looks like you're falling back on one INSERT per row for this case.
Given that this function is meant to be doing 'dump_inserts_multiple'
rows per INSERT, I think the comment should give some details of why
we can't do multi-inserts, and explain the reason for it. "we're done"
is just not enough detail.

I've not looked at the rest of the patch.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: pg_dump multi VALUES INSERT

From
Surafel Temesgen
Date:


On Thu, Jan 3, 2019 at 1:38 AM David Rowley <david.rowley@2ndquadrant.com> wrote:
On Thu, 3 Jan 2019 at 01:50, Surafel Temesgen <surafel3000@gmail.com> wrote:
> On Mon, Dec 31, 2018 at 12:38 PM David Rowley <david.rowley@2ndquadrant.com> wrote:
>> Just looking at the v5 patch, it seems not to handle 0 column tables correctly.

> The attach patch contain a fix for it

+ /* if it is zero-column table then we're done */
+ if (nfields == 0)
+ {
+ archputs(insertStmt->data, fout);
+ continue;
+ }

So it looks like you're falling back on one INSERT per row for this case.
Given that this function is meant to be doing 'dump_inserts_multiple'
rows per INSERT, I think the comment should give some details of why
we can't do multi-inserts, and explain the reason for it. "we're done"
is just not enough detail.


Right, the attached patch adds a more detailed comment.

regards
Surafel
Attachment

Re: pg_dump multi VALUES INSERT

From
David Rowley
Date:
On Mon, 31 Dec 2018 at 18:58, Surafel Temesgen <surafel3000@gmail.com> wrote:
> On Fri, Dec 28, 2018 at 6:46 PM Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>> > At first I also tried to do it like that, but it seemed to me the function
>> > would become long and more complex.
>>
>> Probably. But calling it with size 100 should result in the same behavior,
>> so it is really just an extension of the preceding one? Or am I missing
>> something?
>>
>
> Dumping table data using single-row INSERT statements and using multi-row
> VALUES INSERT statements of a user-specified size differ enough to demand
> separate functions; they are not the same thing and should not be implemented
> in the same function. Regarding code duplication, I think the solution is to
> move the shared code into separate functions and call them in the appropriate
> places.

I don't really buy this. I've just hacked up a version of
dumpTableData_insert() which supports a variable number of rows per
statement. It seems fairly clean and easy to me. Likely the fact that
this is very possible greatly increases the chances of this getting in,
since it gets rid of the code duplication. I did also happen to move
the row-building code out into its own function, but
that's not really required. I just did that so I could see all the
code in charge of terminating each statement on my screen without
having to scroll.  I've not touched any of the plumbing work to plug
the rows_per_statement variable into the command line argument, so
it'll need a bit of merge work with the existing patch.  There will
need to be some code that ensures that the user does not attempt to
have 0 rows per statement. The code I wrote won't behave well if that
happens.

... Checks existing patch ...

I see you added a test, but missed checking for 0. That needs to be fixed.

+ if (dopt.dump_inserts_multiple < 0)
+ {
+ write_msg(NULL, "argument of --insert-multi must be positive number\n");
+ exit_nicely(1);
+ }

I also didn't adopt passing the rows-per-statement into the FETCH.  I
think that's a very bad idea and we should keep that strictly at 100.
I don't see any reason to tie the two together. If a user wants 10
rows per statement, do we really want to FETCH 10 times more often?
And what happens when they want 1 million rows per statement? We've no
reason to run out of memory from this since we're just dumping the
rows out to the archive on each row.

1.

+        Specify the number of values per <command>INSERT</command> command.
+        This will make the dump file smaller than <option>--inserts</option>
+        and it is faster to reload but lack per row data lost on error
+        instead entire affected insert statement data lost.

Unsure what you mean about "data lost".  It also controls the number
of "rows" per <command>INSERT</command> statement, not the number of
values.

I think it would be fine just to say:

+        When using <option>--inserts</option>, this allows the maximum number
+        of rows per <command>INSERT</command> statement to be specified.
+        This setting defaults to 1.

I've used "maximum" there as the final statement on each table can
have less and also 0-column tables will always be 1 row per statement.

2. Is --insert-multi a good name? What if they do --insert-multi=1?
That's not very "multi".  Is --rows-per-insert better?

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment
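
On the FETCH point above: keeping the two settings decoupled means the
server-side pattern stays the same whatever --rows-per-insert is set to. A
sketch of that pattern (cursor and table names are illustrative, not
pg_dump's actual identifiers):

    BEGIN;
    DECLARE dump_cur NO SCROLL CURSOR FOR SELECT * FROM public.t;
    FETCH 100 FROM dump_cur;  -- repeated until it returns no rows
    CLOSE dump_cur;
    COMMIT;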

Re: pg_dump multi VALUES INSERT

From
Alvaro Herrera
Date:
FWIW you can insert multiple zero-column rows with "insert into ..
select union all select union all select".

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
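
A sketch of that workaround for a zero-column table t (PostgreSQL accepts
empty SELECT lists since 9.4, a point taken up downthread):

    -- inserts three zero-column rows in a single statement:
    INSERT INTO t
    SELECT UNION ALL SELECT UNION ALL SELECT;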


Re: pg_dump multi VALUES INSERT

From
Surafel Temesgen
Date:


On Fri, Jan 4, 2019 at 3:08 PM David Rowley <david.rowley@2ndquadrant.com> wrote:
On Mon, 31 Dec 2018 at 18:58, Surafel Temesgen <surafel3000@gmail.com> wrote:
> On Fri, Dec 28, 2018 at 6:46 PM Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>> > At first I also tried to do it like that, but it seemed to me the function
>> > would become long and more complex.
>>
>> Probably. But calling it with size 100 should result in the same behavior,
>> so it is really just an extension of the preceding one? Or am I missing
>> something?
>>
>
> Dumping table data using single-row INSERT statements and using multi-row
> VALUES INSERT statements of a user-specified size differ enough to demand
> separate functions; they are not the same thing and should not be implemented
> in the same function. Regarding code duplication, I think the solution is to
> move the shared code into separate functions and call them in the appropriate
> places.

I don't really buy this. I've just hacked up a version of
dumpTableData_insert() which supports a variable number of rows per
statement. It seems fairly clean and easy to me. Likely the fact that
this is very possible greatly increases the chances of this getting in,
since it gets rid of the code duplication. I did also happen to move
the row-building code out into its own function, but
that's not really required. I just did that so I could see all the
code in charge of terminating each statement on my screen without
having to scroll.  I've not touched any of the plumbing work to plug
the rows_per_statement variable into the command line argument, so
it'll need a bit of merge work with the existing patch.  There will
need to be some code that ensures that the user does not attempt to
have 0 rows per statement. The code I wrote won't behave well if that
happens.


The attached patch mostly uses your method.

... Checks existing patch ...

I see you added a test, but missed checking for 0. That needs to be fixed.

+ if (dopt.dump_inserts_multiple < 0)
+ {
+ write_msg(NULL, "argument of --insert-multi must be positive number\n");
+ exit_nicely(1);
+ }


fixed

I also didn't adopt passing the rows-per-statement into the FETCH.  I
think that's a very bad idea and we should keep that strictly at 100.
I don't see any reason to tie the two together. If a user wants 10
rows per statement, do we really want to FETCH 10 times more often?
And what happens when they want 1 million rows per statement? We've no
reason to run out of memory from this since we're just dumping the
rows out to the archive on each row.


okay 

+        Specify the number of values per <command>INSERT</command> command.
+        This will make the dump file smaller than <option>--inserts</option>
+        and it is faster to reload but lack per row data lost on error
+        instead entire affected insert statement data lost.

Unsure what you mean about "data lost".  It also controls the number
of "rows" per <command>INSERT</command> statement, not the number of
values.

I think it would be fine just to say:

+        When using <option>--inserts</option>, this allows the maximum number
+        of rows per <command>INSERT</command> statement to be specified.
+        This setting defaults to 1.



I changed it too, except for "This setting defaults to 1", because it doesn't have a default value.
One row per statement means the --inserts option.
   

2. Is --insert-multi a good name? What if they do --insert-multi=1?
That's not very "multi".  Is --rows-per-insert better?


--rows-per-insert is better for me.

regards
Surafel
Attachment

Re: pg_dump multi VALUES INSERT

From
David Rowley
Date:
On Fri, 18 Jan 2019 at 01:15, Surafel Temesgen <surafel3000@gmail.com> wrote:
> The attached patch mostly uses your method.

I disagree with the "mostly" part.  As far as I can see, you took the
idea and then made a series of changes to completely break it.  For
bonus points, you put back my comment change to make it incorrect
again.

Here's what I got after applying your latest patch:

$ pg_dump --table=t --inserts --rows-per-insert=4 postgres

[...]
INSERT INTO public.t VALUES (1);
)
INSERT INTO public.t VALUES (, ( 2);
)
INSERT INTO public.t VALUES (, ( 3);
)
INSERT INTO public.t VALUES (, ( 4);
);
INSERT INTO public.t VALUES (5);
)
INSERT INTO public.t VALUES (, ( 6);
)
INSERT INTO public.t VALUES (, ( 7);
)
INSERT INTO public.t VALUES (, ( 8);
);
INSERT INTO public.t VALUES (9);
)
;

I didn't test, but I'm pretty sure that's not valid INSERT syntax.

I'd suggest taking my changes and doing the plumbing work to tie the
rows_per_statement into the command line arg instead of how I left it
hardcoded as 3.

>> +        When using <option>--inserts</option>, this allows the maximum number
>> +        of rows per <command>INSERT</command> statement to be specified.
>> +        This setting defaults to 1.
>>
> I changed it too, except for "This setting defaults to 1", because it doesn't have a default value.
> One row per statement means the --inserts option.

If it does not default to 1 then what happens when the option is not
specified?

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: pg_dump multi VALUES INSERT

From
"David G. Johnston"
Date:
On Thu, Jan 17, 2019 at 5:15 AM Surafel Temesgen <surafel3000@gmail.com> wrote:
> On Fri, Jan 4, 2019 at 3:08 PM David Rowley <david.rowley@2ndquadrant.com> wrote:
>>
>> On Mon, 31 Dec 2018 at 18:58, Surafel Temesgen <surafel3000@gmail.com> wrote:
>>
>>
>> 2. Is --insert-multi a good name? What if they do --insert-multi=1?
>> That's not very "multi".  Is --rows-per-insert better?
>>
>
> --rows-per-insert is better for me .

Some thoughts/suggestions:

+ int dump_inserts_multiple;

The option name uses rows, seems like this should mirror that and be
named "dump_inserts_max_rows"

+     <varlistentry>
+      <term><option>--rows-per-insert</option></term>
+      <listitem>
+       <para>
+        When using <option>--rows-per-insert</option>, this allows
the maximum number
+        of rows per <command>INSERT</command> statement to be specified.
+       </para>
+      </listitem>
+     </varlistentry>

"When using <repeat option name from 20 characters ago>..." - no other
option description uses this redundant language and this should not
either.  Just say what it does.

This specifies the maximum number of rows (default 1) that will be
attached to each <command>INSERT</command> command generated by the
<option>--inserts</option> or <option>--column-inserts</option>
options; exactly one of which must be specified as well. (see my note
at the end)

+ printf(_("  --rows-per-insert            number of row per INSERT command\n"));

(maximum?) number of row[s] per INSERT command

+ qr/\Qpg_dump: option --on-conflict-do-nothing requires option --inserts , --rows-per-insert or --column-inserts\E/,
+ 'pg_dump: option --on-conflict-do-nothing requires option --inserts , --rows-per-insert or --column-inserts');

You don't put spaces on both sides of the comma, just after; and add a
comma before the "or" (I think... notwithstanding the below)

I suggest we require that --rows-per-insert be dependent upon exactly
one of --inserts or --column-inserts being set and not let it be set
by itself (in which case the original message for
--on-conflict-do-nothing is OK).

David J.


Re: pg_dump multi VALUES INSERT

From
Surafel Temesgen
Date:


On Fri, Jan 18, 2019 at 7:14 AM David Rowley <david.rowley@2ndquadrant.com> wrote:
On Fri, 18 Jan 2019 at 01:15, Surafel Temesgen <surafel3000@gmail.com> wrote:
> The attached patch mostly uses your method.

I disagree with the "mostly" part.  As far as I can see, you took the
idea and then made a series of changes to completely break it.  For
bonus points, you put back my comment change to make it incorrect
again.

Here's what I got after applying your latest patch:

$ pg_dump --table=t --inserts --rows-per-insert=4 postgres

[...]
INSERT INTO public.t VALUES (1);
)
INSERT INTO public.t VALUES (, ( 2);
)
INSERT INTO public.t VALUES (, ( 3);
)
INSERT INTO public.t VALUES (, ( 4);
);
INSERT INTO public.t VALUES (5);
)
INSERT INTO public.t VALUES (, ( 6);
)
INSERT INTO public.t VALUES (, ( 7);
)
INSERT INTO public.t VALUES (, ( 8);
);
INSERT INTO public.t VALUES (9);
)
;

I didn't test, but I'm pretty sure that's not valid INSERT syntax.

This happened because I didn't disallow using the --inserts and --rows-per-insert
options together; it should error out in that case. I corrected it in the attached patch.


I'd suggest taking my changes and doing the plumbing work to tie the
rows_per_statement into the command line arg instead of how I left it
hardcoded as 3.

>> +        When using <option>--inserts</option>, this allows the maximum number
>> +        of rows per <command>INSERT</command> statement to be specified.
>> +        This setting defaults to 1.
>>
> I changed it too, except for "This setting defaults to 1", because it doesn't have a default value.
> One row per statement means the --inserts option.

If it does not default to 1 then what happens when the option is not
specified?

If the --inserts option is specified, it uses single-row INSERT statements;
otherwise it uses the COPY command.

regards
Surafel
Attachment

Re: pg_dump multi VALUES INSERT

From
David Rowley
Date:
On Fri, 18 Jan 2019 at 19:29, Surafel Temesgen <surafel3000@gmail.com> wrote:
> This happened because I didn't disallow using the --inserts and --rows-per-insert
> options together; it should error out in that case. I corrected it in the attached patch.

I don't think it should be an error. It's not like the two options
conflict. I imagined that you'd specify that you want --inserts and
optionally could control how many rows would be put in each of those
statements. I'd be surprised to be confronted with an error for
asking for that.

It might be worth doing the same as what we do if --column-inserts is
specified without --inserts. In this case we just do:

/* --column-inserts implies --inserts */
if (dopt.column_inserts)
dopt.dump_inserts = 1;

If you do it that way you'll not need to modify the code much from how
I wrote it. We can likely debate if we want --rows-per-insert to imply
--inserts once there's a working patch.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: pg_dump multi VALUES INSERT

From
Surafel Temesgen
Date:


On Fri, Jan 18, 2019 at 2:29 PM David Rowley <david.rowley@2ndquadrant.com> wrote:
On Fri, 18 Jan 2019 at 19:29, Surafel Temesgen <surafel3000@gmail.com> wrote:
> This happened because I didn't disallow using the --inserts and --rows-per-insert
> options together; it should error out in that case. I corrected it in the attached patch.

I don't think it should be an error. It's not like the two options
conflict. I imagined that you'd specify that you want --inserts and
optionally could control how many rows would be put in each of those
statements. I'd be surprised to be confronted with an error for
asking for that.


If you specified the --inserts option, you already specified the number of rows per statement, which is 1.
If more than one row per statement is needed, it must be specified using --rows-per-insert,
and specifying one row per statement using the --inserts option while at the same time specifying a
different number of rows per statement with the --rows-per-insert option seems conflicting to me.

It might be worth doing the same as what we do if --column-inserts is
specified without --inserts. In this case we just do:

/* --column-inserts implies --inserts */
if (dopt.column_inserts)
dopt.dump_inserts = 1;

If you do it that way you'll not need to modify the code much from how
I wrote it. We can likely debate if we want --rows-per-insert to imply
--inserts once there's a working patch.
 

Version 3 of the patch works in a similar way, except it doesn't have the two options.

regards
Surafel

Re: pg_dump multi VALUES INSERT

From
"David G. Johnston"
Date:
On Fri, Jan 18, 2019 at 5:02 AM Surafel Temesgen <surafel3000@gmail.com> wrote:
> On Fri, Jan 18, 2019 at 2:29 PM David Rowley <david.rowley@2ndquadrant.com> wrote:
>>
>> On Fri, 18 Jan 2019 at 19:29, Surafel Temesgen <surafel3000@gmail.com> wrote:
>> > This happened because I didn't disallow using the --inserts and --rows-per-insert
>> > options together; it should error out in that case. I corrected it in the attached patch.
>>
>> I don't think it should be an error. It's not like the two options
>> conflict. I imagined that you'd specify that you want --inserts and
>> optionally could control how many rows would be put in each of those
>> statements. I'd be surprised to be confronted with an error for
>> asking for that.
>>
>
> If you specified the --inserts option, you already specified the number of rows per statement, which is 1.
> If more than one row per statement is needed, it must be specified using --rows-per-insert,
> and specifying one row per statement using the --inserts option while at the same time specifying a
> different number of rows per statement with the --rows-per-insert option seems conflicting to me.

So, the other way of looking at it: why do we even need an entirely
new option?  Modify --inserts to accept an optional integer value that
defaults to 1 (I'm not sure how tricky dealing with optional option
values is though...).

--column-inserts implies --inserts, but if you want to change the
number of rows you need to specify both (or add the same optional
integer to --column-inserts)

David J.


Re: pg_dump multi VALUES INSERT

From
"David G. Johnston"
Date:
On Tue, Dec 25, 2018 at 4:47 AM Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> ISTM that command-line switches with optional arguments should be avoided:
> This feature is seldom used (hmmm... 2 existing instances), because it
> interferes with argument processing if such switches are used as the last
> one.

Excellent point; though avoiding adding yet another limited-use option
seems like a fair trade-off here.  Though maybe we also need to add
the traditional "--" option as well. I'm not married to the idea
though; but it's also not like mis-interpreting the final argument as
an integer instead of a database is going to be a silent error.

David J.


Re: pg_dump multi VALUES INSERT

From
David Rowley
Date:
On Sat, 19 Jan 2019 at 01:01, Surafel Temesgen <surafel3000@gmail.com> wrote:
> If you specified the --inserts option, you already specified the number of rows per statement, which is 1.
> If more than one row per statement is needed, it must be specified using --rows-per-insert,
> and specifying one row per statement using the --inserts option while at the same time specifying a
> different number of rows per statement with the --rows-per-insert option seems conflicting to me.

So you're saying an INSERT where you insert multiple rows in a single
statement is not an insert? That logic surprises me.  --inserts makes
pg_dump use INSERTs rather than COPY. --rows-per-insert still uses
INSERTs. I'd love to know why you think there's some conflict with
that.

By your logic, you could say --column-inserts and --inserts should
also conflict, but they don't. --column-inserts happens to be coded to
imply --inserts. I really suggest we follow the lead from that. Doing
it this way reduces the complexity of the code where we build the
INSERT statement.  Remember that a patch that is overly complex has
much less chance of making it.  I'd really suggest you keep this as
simple as possible.

It also seems perfectly logical to me to default --rows-per-insert to
1 so that when --inserts is specified we do 1 row per INSERT. If the
user changes that value to something higher then nothing special needs
to happen as the function building the INSERT statement will always be
paying attention to whatever the --rows-per-insert value is set to.
That simplifies the logic meaning you don't need to check if --inserts
was specified.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: pg_dump multi VALUES INSERT

From
Surafel Temesgen
Date:


On Tue, Jan 22, 2019 at 3:35 PM David Rowley <david.rowley@2ndquadrant.com> wrote:
On Sat, 19 Jan 2019 at 01:01, Surafel Temesgen <surafel3000@gmail.com> wrote:
> If you specified the --inserts option, you already specified the number of rows per statement, which is 1.
> If more than one row per statement is needed, it must be specified using --rows-per-insert,
> and specifying one row per statement using the --inserts option while at the same time specifying a
> different number of rows per statement with the --rows-per-insert option seems conflicting to me.

So you're saying an INSERT where you insert multiple rows in a single
statement is not an insert? That logic surprises me.  --inserts makes
pg_dump use INSERTs rather than COPY. --rows-per-insert still uses
INSERTs. I'd love to know why you think there's some conflict with
that.

By your logic, you could say --column-inserts and --inserts should
also conflict, but they don't. --column-inserts happens to be coded to
imply --inserts. I really suggest we follow the lead from that. Doing
it this way reduces the complexity of the code where we build the
INSERT statement.  Remember that a patch that is overly complex has
much less chance of making it.  I'd really suggest you keep this as
simple as possible.


Okay, I understand it now. Fabien also commented about this upthread; I
misunderstood it as using a separate new option.

It also seems perfectly logical to me to default --rows-per-insert to
1 so that when --inserts is specified we do 1 row per INSERT. If the
user changes that value to something higher then nothing special needs
to happen as the function building the INSERT statement will always be
paying attention to whatever the --rows-per-insert value is set to.
That simplifies the logic meaning you don't need to check if --inserts
was specified.

 
Okay, thank you for explaining. I attached a patch corrected as such.
 
Attachment

Re: pg_dump multi VALUES INSERT

From
Alvaro Herrera
Date:
Nice stuff.

Is it possible to avoid the special case for 0 columns by using the
UNION ALL syntax I showed?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: pg_dump multi VALUES INSERT

From
Fabien COELHO
Date:
Hello Surafel,

> Okay, thank you for explaining. I attached a patch corrected as such.

About this v9: applies cleanly, compiles, global and local "make check" 
ok.

The option is not exercised in the TAP tests. I'd suggest that it should be
tested on a small table with zero rows, one row, and more rows than the value
set. Maybe use -t and other options to reduce the output to the minimum.

About the documentation:

  +   When using <option>--rows-per-insert</option>, this allows the maximum number
  +   of rows per <command>INSERT</command> statement to be specified.

I'd suggest a more direct and simple style, something like:

   Set the maximum number of rows per INSERT statement.
   This option implies --inserts.
   Default to 1.

About the help message, the new option expects an argument, but it does 
not show:

  +  printf(_("  --rows-per-insert    number of row per INSERT command\n"));

About the code, maybe avoid using an int as a bool, eg:

     ... && !dopt.dump_inserts_multiple)
  -> ... && dopt.dump_inserts_multiple == 0)

Spacing around operators, eg: "!=1)" -> "!= 1)"

ISTM that the "dump_inserts_multiple" field is useless, you can reuse
"dump_inserts" instead, i.e. --inserts sets it to 1 *if zero*, and
--rows-per-insert=XXX sets it to XXX. That would simplify the code
significantly.

ISTM that there are indentation issues, eg on added

   if (dopt->dump_inserts_multiple == 1) {

The old code is not indented properly.

-- 
Fabien.


Re: pg_dump multi VALUES INSERT

From
David Rowley
Date:
On Wed, 23 Jan 2019 at 04:08, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Is it possible to avoid the special case for 0 columns by using the
> UNION ALL syntax I showed?

It would be possible, but my thoughts are that we're moving away from
the SQL standard by doing so.

Looking at the standard I see:

<row value constructor element list> ::=
<row value constructor element> [ { <comma> <row value constructor
element> }... ]

so it appears that multirow VALUES clauses are allowed.

INSERT INTO ... DEFAULT VALUES; is standard too, but looking at
SELECT, neither the target list or FROM clause is optional.

You could maybe argue that 0-column tables are not standard anyway.
Going by DROP COLUMN I see "4) C shall be a column of T and C shall
not be the only column of T.". Are we the only database to break that?

I think since pg_dump --inserts is meant to be for importing data into
other databases then we should likely keep it as standard as possible.

Another argument against is that we've only supported empty SELECT
clauses since 9.4, so it may not help anyone who mysteriously wanted
to import data into an old version. Maybe that's a corner case, but
I'm sure 0 column tables are too.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: pg_dump multi VALUES INSERT

From
David Rowley
Date:
On Wed, 23 Jan 2019 at 02:13, Surafel Temesgen <surafel3000@gmail.com> wrote:
> Okay, thank you for explaining. I attached a patch corrected as such.

I did a bit of work to this to fix a bunch of things:

1. Docs for --rows-per-insert didn't mention anything about a parameter.
2. You'd not followed the alphabetical order of how the parameters are
documented.
3. Various parts of the docs claimed that --inserts just inserted 1
row per statement. Those needed to be updated.
4. New options out of order in --help. The rest were in alphabetical order.
5. DumpOptions struct variable was not in the right place. It was
grouped in with some parameterless options.
6. Code in dumpTableData_insert() was convoluted. Not sure what you
had added end_of_statement for or why you were checking PQntuples(res)
== 0.  You'd also made the number_of_row variable 1-based and set it
to 1 when we had added 0 rows. You then checked for the existence of 1
row by checking the variable was > 1... That made very little sense to
me. I've pretty much put back the code that I had sent to you
previously, just without the part where I split the row building code
out into another function.
7. A comment in dumpTableData_insert() claimed that the insertStmt
would end in "VALUES(", but it'll end in "VALUES ". I had updated that
in my last version, but you must have missed that.
8. I've made it so --rows-per-insert implies --inserts. This is
aligned with how --column-inserts behaves.

I changed a few other things. I simplified the condition that raises
an error when someone does --on-conflict-do-nothing without the
--inserts option. There was no need to check for the other options
that imply --inserts since that will already be enabled if one of the
other options has.

I also removed most of the error checking you'd added to ensure that
the --rows-per-insert parameter was a number.  I'd have kept this but
I saw that we do nothing that special for the compression option. I
didn't see why --rows-per-insert was any more special. It was quite a
bit of code for very little reward.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment
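
For reference, a sketch of the behavior as of this version (invocations
only; the comments are illustrative):

    $ pg_dump --table=t postgres                        # COPY, the default
    $ pg_dump --table=t --inserts postgres              # one row per INSERT
    $ pg_dump --table=t --rows-per-insert=100 postgres  # implies --inserts;
                                                        # up to 100 rows each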

Re: pg_dump multi VALUES INSERT

From
Fabien COELHO
Date:
Hello David & Surafel,

About this v10:

Patch applies and compiles cleanly, local & global "make check" ok.

A few comments, possibly redundant with some already in the thread.

The rows-per-insert option is out of alphabetical order in the getopt struct.

help string does not specify the expected argument.

I still think that the added rows_per_insert field is useless, ISTM that
"int dump_inserts" can be reused, and you could also drop the boolean
got_rows_per_insert.

The feature is not tested anywhere. I still think that there should be a 
test on empty/small/larger-than-rows-per-insert tables, possibly added to 
existing TAP-tests.

-- 
Fabien.


Re: pg_dump multi VALUES INSERT

From
David Rowley
Date:
On Wed, 23 Jan 2019 at 22:08, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> The rows-per-insert option is out of alphabetical order in the getopt struct.

I missed that. Thanks. Fixed in attached.

> help string does not specify the expected argument.

Also fixed.

> I still think that the added rows_per_insert field is useless, ISTM that
> "int dump_inserts" can be reused, and you could also drop the boolean
> got_rows_per_insert.

I thought about this and looked into it, but I decided it didn't look
like a smart thing to do.  The reason is that if --inserts sets
dump_inserts to 1 then --rows-per-insert sets it to something else
that's likely fine, but if that happens in the opposite order then the
--rows-per-insert gets overwritten with 1. The bad news is the order
that happens is defined by the order of the command line args.  It
might be possible to make it work by having --inserts set some other
variable, then set dump_inserts to 1 if it's set to 0 and the other
variable is set to >= 1... but that requires another variable,  which
is what you want to avoid... I think it's best to have a variable per
argument.

I could get rid of the got_rows_per_insert variable, but it would
require setting the default value for rows_per_insert in the main()
function rather than in InitDumpOptions().   I thought
InitDumpOptions() looked like just the place to do this, so went with
that option.  To make it work without got_rows_per_insert,
rows_per_insert would have to be 0 by default and we'd know we saw a
--rows-per-insert command line arg by the fact that rows_per_insert
was non-zero.  Would you rather have it that way?

> The feature is not tested anywhere. I still think that there should be a
> test on empty/small/larger-than-rows-per-insert tables, possibly added to
> existing TAP-tests.

I was hoping to get away with not having to do that... mainly because
I've no idea how.   Please have at it if you know how it's done.

FWIW I looked at 002_pg_dump.pl and did add a test, but I was unable
to make it pass because I couldn't really figure out how the regex
matching is meant to work. It does not seem to be explained very well
in the comments and I lack patience for Perl.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: pg_dump multi VALUES INSERT

From
Fabien COELHO
Date:
Hello David,

> I thought about this and looked into it, but I decided it didn't look
> like a smart thing to do.  The reason is that if --inserts sets
> dump_inserts to 1 then --rows-per-insert sets it to something else
> that's likely fine, but if that happens in the opposite order then the
> --rows-per-insert gets overwritten with 1.

You can test before doing that!

   case X:
     if (opt.dump_inserts == 0)
       opt.dump_inserts = 1;
     // otherwise option is already set

> The bad news is the order that happens is defined by the order of the 
> command line args.

> It might be possible to make it work by having --inserts set some other 
> variable,

ISTM that it is enough to test whether the variable is zero.

> then set dump_inserts to 1 if it's set to 0 and the other variable is 
> set to >= 1... but that requires another variable, which is what you 
> want to avoid...

I still do not understand the need for another variable.

   int ninserts = 0; // default is to use copy
   while (getopt...)
   {
     switch (...) {
       case "--inserts":
         if (ninserts == 0) ninserts = 1;
         break;
       case "--rows-per-insert":
         ninserts = arg_value;
         checks...
         break;
      ...

> I think it's best to have a variable per argument.

I disagree, because it adds complexity where none is needed: here the new 
option is an extension of a previous one, thus the previous one just 
becomes a particular case, so it seems simpler to manage it as the 
particular case it is rather than a special case, creating the need for 
checking the consistency and so if two variables are used.

> I could get rid of the got_rows_per_insert variable, but it would
> require setting the default value for rows_per_insert in the main()
> function rather than in InitDumpOptions().   I thought
> InitDumpOptions() looked like just the place to do this, so went with
> that option.  To make it work without got_rows_per_insert,
> rows_per_insert would have to be 0 by default and we'd know we saw a
> --rows-per-insert command line arg by the fact that rows_per_insert
> was non-zero.  Would you rather have it that way?

Yep, esp as rows_per_insert & dump_inserts could be the same.

>> The feature is not tested anywhere. I still think that there should be a
>> test on empty/small/larger-than-rows-per-insert tables, possibly added to
>> existing TAP-tests.
>
> I was hoping to get away with not having to do that... mainly because
> I've no idea how.

Hmmm. That is another question! Maybe someone will help.

-- 
Fabien.


Re: pg_dump multi VALUES INSERT

From
David Rowley
Date:
On Thu, 24 Jan 2019 at 04:45, Fabien COELHO <coelho@cri.ensmp.fr> wrote:

> I still do not understand the need for another variable.
>
>    int ninserts = 0; // default is to use copy
>    while (getopt...)
>    {
>      switch (...) {
>        case "--inserts":
>          if (ninserts == 0) ninserts = 1;
>          break;
>        case "--rows-per-insert":
>          ninserts = arg_value;
>          checks...
>          break;
>       ...

I didn't think of that. Attached is a version that changes it to work
along those lines.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment
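
A self-contained sketch of the single-variable scheme arrived at here (the
hand-rolled argument loop and all names are illustrative, not pg_dump's
actual getopt_long handling):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int
    main(int argc, char **argv)
    {
        int     rows_per_insert = 0;    /* 0 means use COPY */

        for (int i = 1; i < argc; i++)
        {
            if (strcmp(argv[i], "--inserts") == 0)
            {
                /* a default only; never clobber an explicit row count */
                if (rows_per_insert == 0)
                    rows_per_insert = 1;
            }
            else if (strncmp(argv[i], "--rows-per-insert=", 18) == 0)
            {
                rows_per_insert = atoi(argv[i] + 18);  /* see strtol note upthread */
                if (rows_per_insert <= 0)
                {
                    fprintf(stderr, "rows-per-insert must be a positive number\n");
                    return 1;
                }
            }
        }
        printf("rows per INSERT statement: %d\n", rows_per_insert);
        return 0;
    }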

Re: pg_dump multi VALUES INSERT

From
Alvaro Herrera
Date:
On 2019-Jan-23, David Rowley wrote:

> On Wed, 23 Jan 2019 at 04:08, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> > Is it possible to avoid the special case for 0 columns by using the
> > UNION ALL syntax I showed?
> 
> It would be possible, but my thoughts are that we're moving away from
> the SQL standard by doing so.

Ah, that's a good point that I missed -- I agree with your reasoning.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: pg_dump multi VALUES INSERT

From
David Rowley
Date:
On Thu, 24 Jan 2019 at 04:45, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> >> The feature is not tested anywhere. I still think that there should be a
> >> test on empty/small/larger-than-rows-per-insert tables, possibly added to
> >> existing TAP-tests.
> >
> > I was hoping to get away with not having to do that... mainly because
> > I've no idea how.
>
> Hmmm. That is another question! Maybe someone will help.

Here's another version, same as before but with tests this time.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: pg_dump multi VALUES INSERT

From
David Rowley
Date:
On Thu, 31 Jan 2019 at 11:49, David Rowley <david.rowley@2ndquadrant.com> wrote:
> Here's another version, same as before but with tests this time.

Hi Fabien,

Wondering if you have anything else here? I'm happy for the v13
version to be marked as ready for committer.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: pg_dump multi VALUES INSERT

From
Fabien COELHO
Date:
Hello David,

> Wondering if you have anything else here? I'm happy for the v13
> version to be marked as ready for committer.

I still have a few comments.

Patch applies cleanly, compiles, global & local make check are ok.

Typos and style in the doc:

     "However, since, by default this option generates ..."
     "However, since this option, by default, generates ..."

I'd suggest a more straightforward phrasing, to my mind and ear, something
like: "However, since by default the option generates ..., ....", although
beware that I'm not a native English speaker.

I do not understand why dump_inserts declaration has left the "flags for 
options" section.

I'd suggest not to rely on "atoi" because it does not check the argument 
syntax, so basically anything is accepted, eg "1O" is 1;

On "if (!dopt->do_nothing) $1 else $2;", I'd rather use a straight 
condition "if (dopt->do_nothing) $2 else $1;" (two instances).

There is a test, that is good! Character "." should be backslashed in the
regexpr. I'd also consider introducing limit cases: empty table, empty
columns, by creating corresponding tables and using -t repeatedly.

-- 
Fabien.


Re: pg_dump multi VALUES INSERT

From
David Rowley
Date:
On Sat, 2 Feb 2019 at 21:26, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> I do not understand why dump_inserts declaration has left the "flags for
> options" section.

I moved that because it's no longer just a flag. It now stores an int value.

> I'd suggest not to rely on "atoi" because it does not check the argument
> syntax, so basically anything is accepted, eg "1O" is 1;

Seems like it's good enough for --jobs and --compress.   Do you think
those should be changed too? or what's the reason to hold
--rows-per-insert to a different standard?

> There is a test, that is good! Character "." should be backslashed in the
> regexpr.

Yeah, you're right.   I wonder if we should fix the test of them in
another patch.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: pg_dump multi VALUES INSERT

From
Fabien COELHO
Date:
Hello David,

>> I do not understand why dump_inserts declaration has left the "flags for
>> options" section.
>
> I moved that because it's no longer just a flag. It now stores an int value.

Hmmm. Indeed, all the "int"s of this section should be "bool" instead. Now,
some "flags" do not appear although they could (clear, createdb, blobs), so
the logic is kinda fuzzy anyway. Do as you wish.

>> I'd suggest not to rely on "atoi" because it does not check the argument
>> syntax, so basically anything is accepted, eg "1O" is 1;
>
> Seems like it's good enough for --jobs and --compress.   Do you think
> those should be changed too? or what's the reason to hold
> --rows-per-insert to a different standard?

I think that there is a case for avoiding sloppy "good enough" programming
practices:-) Alas, as you point out, "atoi" is widely used. I'm campaigning
to avoid adding more of them. There has been some push to actually remove
"atoi" when not appropriate, eg from "libpq". I'd suggest considering
starting to do the right thing, and leaving the fixing of old patterns to
another patch.

>> There is a test, that is good! Character "." should be backslashed in the
>> regexpr.
>
> Yeah, you're right.   I wonder if we should fix the test of them in
> another patch.

From a software engineering perspective, I'd say that a feature and its 
tests really belong to the same patch.

-- 
Fabien.


Re: pg_dump multi VALUES INSERT

From
David Rowley
Date:
On Sun, 3 Feb 2019 at 21:00, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> >> There is a test, that is good! Character "." should be backslashed in the
> >> regexpr.
> >
> > Yeah, you're right.   I wonder if we should fix the test of them in
> > another patch.
>
> From a software engineering perspective, I'd say that a feature and its
> tests really belong to the same patch.

I meant to say "fix the rest of them", not "the test of them".


-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: pg_dump multi VALUES INSERT

From
Surafel Temesgen
Date:


On Sun, Feb 3, 2019 at 11:00 AM Fabien COELHO <coelho@cri.ensmp.fr> wrote:

Hello David,

>> I do not understand why dump_inserts declaration has left the "flags for
>> options" section.
>
> I moved that because it's no longer just a flag. It now stores an int value.

Hmmm. Indeed, all the "int"s of this section should be "bool" instead. Now,
some "flags" do not appear although they could (clear, createdb, blobs), so
the logic is kinda fuzzy anyway. Do as you wish.

>> I'd suggest not to rely on "atoi" because it does not check the argument
>> syntax, so basically anything is accepted, eg "1O" is 1;
>
> Seems like it's good enough for --jobs and --compress.   Do you think
> those should be changed too? or what's the reason to hold
> --rows-per-insert to a different standard?

I think that there is a case for avoiding sloppy "good enough" programming
practices:-) Alas, as you point out, "atoi" is widely used. I'm campaigning
to avoid adding more of them. There has been some push to actually remove
"atoi" where not appropriate, e.g. from "libpq". I'd suggest starting to do
the right thing here, and leaving the fixing of old patterns to another
patch.


 
At least for processing user arguments, I think it is better to use strtol or
another function that has better error handling. I can make a patch that
changes the usage of atoi for user argument processing after getting feedback
here, or I will do it simultaneously.

regards
Surafel
 

Re: pg_dump multi VALUES INSERT

From
Michael Paquier
Date:
On Sun, Feb 03, 2019 at 01:21:45PM +0300, Surafel Temesgen wrote:
> At least for processing user arguments, I think it is better to use strtol
> or another function that has better error handling. I can make a patch
> that changes the usage of atoi for user argument processing after getting
> feedback here, or I will do it simultaneously.

Moved the patch to the next CF for now, as the discussion is still going on.
--
Michael

Attachment

Re: pg_dump multi VALUES INSERT

From
Surafel Temesgen
Date:


On Sat, Feb 2, 2019 at 11:26 AM Fabien COELHO <coelho@cri.ensmp.fr> wrote:

Hello David,

> Wondering if you have anything else here? I'm happy for the v13
> version to be marked as ready for committer.

I still have a few comments.

Patch applies cleanly, compiles, global & local make check are ok.

Typos and style in the doc:

        "However, since, by default this option generates ..."
        "However, since this option, by default, generates ..."

I'd suggest something more straightforward, to my mind and ear: "However,
since by default the option generates ..., ....", although beware that I'm
not a native English speaker.


fixed

I'd suggest not to rely on "atoi" because it does not check the argument
syntax, so basically anything is accepted, eg "1O" is 1;

I changed it to strtol.

On "if (!dopt->do_nothing) $1 else $2;", I'd rather use a straight
condition "if (dopt->do_nothing) $2 else $1;" (two instances).

fixed

regards
Surafel
Attachment

Re: pg_dump multi VALUES INSERT

From
David Rowley
Date:
Reviewing pg_dump-rows-per-insert-option-v14.

Mostly going back over things that Fabien mentioned:

On Sat, 2 Feb 2019 at 21:26, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> There is a test, that is good! Character "." should be backslashed in the
> regexpr. I'd consider also introducing limit cases: empty table, empty
> columns by creating corresponding tables and using -t repeatedly.

+ (?:INSERT\ INTO\ dump_test.test_table\ VALUES\ \(\d,\ NULL,\ NULL,\
NULL\),\ \(\d,\ NULL,\ NULL,\ NULL\),\ \(\d,\ NULL,\ NULL,\
NULL\);\n){3}

The "." here before the table name needs to be escaped. The ones missing
in the existing tests should have been fixed by d07fb6810e.

There's also the additional tests that Fabien mentions.

Also, maybe one for Fabien (because he seems keen on keeping the
--rows-per-insert validation code)

strtol() returns a long. dump_inserts is an int, so on machines where
sizeof(long) == 8 and sizeof(int) == 4 (most machines, these days) the
validation is not bulletproof.  This could lead to:

$ pg_dump --rows-per-insert=2147483648
pg_dump: rows-per-insert must be a positive number

For me, I was fine with the atoi() code that the other options use,
but maybe Fabien has a problem with the long vs int?

It would be simple to work around by assigning the strtol() result to a
long and making the ERANGE test check for ERANGE or  ... > PG_INT_MAX
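
For instance, a minimal sketch of what I mean (using INT_MAX from
<limits.h>; the function name and the error message are placeholders,
not the actual patch code):

    #include <errno.h>
    #include <limits.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Parse a positive int option; bail out with a message on bad input. */
    static int
    parse_positive_int(const char *arg, const char *optname)
    {
        char   *endptr;
        long    val;

        errno = 0;
        val = strtol(arg, &endptr, 10);

        /* reject empty input, trailing junk, overflow, zero and negatives */
        if (endptr == arg || *endptr != '\0' ||
            errno == ERANGE || val < 1 || val > INT_MAX)
        {
            fprintf(stderr, "%s must be in range 1..%d\n", optname, INT_MAX);
            exit(1);
        }
        return (int) val;
    }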

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: pg_dump multi VALUES INSERT

From
Surafel Temesgen
Date:


On Mon, Feb 11, 2019 at 10:20 AM David Rowley <david.rowley@2ndquadrant.com> wrote:
Reviewing pg_dump-rows-per-insert-option-v14.
Also, maybe one for Fabien (because he seems keen on keeping the
--rows-per-insert validation code)

strtol() returns a long. dump_inserts is an int, so on machines where
sizeof(long) == 8 and sizeof(int) == 4 (most machines, these days) the
validation is not bulletproof.  This could lead to:

$ pg_dump --rows-per-insert=2147483648
pg_dump: rows-per-insert must be a positive number

fixed


For me, I was fine with the atoi() code that the other options use,
but maybe Fabien has a problem with the long vs int?


The main issue with atoi() is that it does not detect errors: it returns 0
for both invalid input and an input value of 0. In our case that doesn't
cause a problem, because we error out for the value 0; but for the
compression level, for example, if invalid input is supplied it silently
proceeds as though an input value of 0 had been supplied.
 
regards
Surafel
Attachment

Re: pg_dump multi VALUES INSERT

From
David Rowley
Date:
On Wed, 13 Feb 2019 at 19:36, Surafel Temesgen <surafel3000@gmail.com> wrote:
> On Mon, Feb 11, 2019 at 10:20 AM David Rowley <david.rowley@2ndquadrant.com> wrote:
>>
>> Reviewing pg_dump-rows-per-insert-option-v14.
>>
>> Also, maybe one for Fabien (because he seems keen on keeping the
>> --rows-per-insert validation code)
>>
>> strtol() returns a long. dump_inserts is an int, so on machines where
>> sizeof(long) == 8 and sizeof(int) == 4 (most machines, these days) the
>> validation is not bulletproof.  This could lead to:
>>
>> $ pg_dump --rows-per-insert=2147483648
>> pg_dump: rows-per-insert must be a positive number
>
>
> fixed

Thanks.

I see you didn't touch the tests yet, so I'll set this back to waiting
on author.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: pg_dump multi VALUES INSERT

From
Surafel Temesgen
Date:


On Sat, Feb 2, 2019 at 11:26 AM Fabien COELHO <coelho@cri.ensmp.fr> wrote:
 
There is a test, that is good! Character "." should be backslashed in the
regexpr. I'd consider also introducing limit cases: empty table, empty
columns by creating corresponding tables and using -t repeatedly

I see that there is already a test for a zero-column table in test_fourth_table_zero_col,
and if I am not wrong, table_index_stats is an empty table.

regards
Surafel

Re: pg_dump multi VALUES INSERT

From
David Rowley
Date:
On Tue, 19 Feb 2019 at 02:34, Surafel Temesgen <surafel3000@gmail.com> wrote:
> I see that there is already a test for a zero-column table in test_fourth_table_zero_col,
> and if I am not wrong, table_index_stats is an empty table.

Maybe Fabien would like to see a test that dumps that table with
--rows-per-insert=<something above one> to ensure the output remains
the same as in the other test.  I think it might be a good idea too.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: pg_dump multi VALUES INSERT

From
David Rowley
Date:
On Fri, 22 Feb 2019 at 14:40, David Rowley <david.rowley@2ndquadrant.com> wrote:
>
> On Tue, 19 Feb 2019 at 02:34, Surafel Temesgen <surafel3000@gmail.com> wrote:
> > I see that there is already a test for a zero-column table in test_fourth_table_zero_col,
> > and if I am not wrong, table_index_stats is an empty table.
>
> Maybe Fabien would like to see a test that dumps that table with
> --rows-per-insert=<something above one> to ensure the output remains
> the same as in the other test.  I think it might be a good idea too.

This patch was failing to build due to the new extra_float_digits
option that's been added to pg_dump; that option added an additional
case for 8 in the getopt_long switch statement. In the attached, I've
changed this patch to use values 9 and 10 for the new options.

I also went ahead and added the zero-column test that Fabien
mentioned, and added the missing backslash to the other test that
had been added.

Fabien also complained about some wording in the docs. I ended up
changing it a bit, as I thought the wording was uninformative about
which rows won't be restored when an INSERT fails. It now mentions
that all rows which are part of the same INSERT command will fail
in the restore.

I think this can be marked as ready for committer now, but I'll defer
to Fabien to see if he's any other comments.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: pg_dump multi VALUES INSERT

From
Fabien COELHO
Date:
Hello David & Surafel,

> I think this can be marked as ready for committer now, but I'll defer
> to Fabien to see if he's any other comments.

Patch v16 applies and compiles cleanly, local and global "make check" 
are ok. Doc build is ok.

I did some manual testing with limit cases which did work. Good.

Although I'm all in favor of checking the int associated with the option, I 
do not think that it warrants three checks and messages. I would suggest 
factoring them all into just one check and one (terse) message.

Option "--help" line: number of row*s* ?

About the output: I'd suggest to indent one line per row, something like:

   INSERT INTO foo VALUES
     (..., ..., ..., ...),
     (..., ..., ..., ...),
     (..., ..., ..., ...);

so as to avoid very very very very very very very very very very very very 
very very very very long lines in the output.

I'd suggest adding test tables with (1) no rows and (2) no columns but a 
few rows, with multiple --table options.

-- 
Fabien.


Re: pg_dump multi VALUES INSERT

From
Michael Paquier
Date:
On Sat, Mar 02, 2019 at 08:01:50AM +0100, Fabien COELHO wrote:
> About the output: I'd suggest to indent one line per row, something like:
>
>   INSERT INTO foo VALUES
>     (..., ..., ..., ...),
>     (..., ..., ..., ...),
>     (..., ..., ..., ...);
>
> so as to avoid very very very very very very very very very very very very
> very very very very long lines in the output.

Note: folks sometimes manually edit the generated dump file, so
having one row/SQL query/VALUES item per line really brings a lot of value.
--
Michael

Attachment

Re: pg_dump multi VALUES INSERT

From
Alvaro Herrera
Date:
On 2019-Mar-02, Fabien COELHO wrote:

> Although I'm all in favor of checking the int associated with the option, I
> do not think that it warrants three checks and messages. I would suggest
> factoring them all into just one check and one (terse) message.

I suggest ("rows-per-insert must be in range 1..%d", INT_MAX), like
extra_float_digits and compression level.

> About the output: I'd suggest to indent one line per row, something like:
> 
>   INSERT INTO foo VALUES
>     (..., ..., ..., ...),
>     (..., ..., ..., ...),
>     (..., ..., ..., ...);

+1.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: pg_dump multi VALUES INSERT

From
David Rowley
Date:
Thanks for looking at this again.

On Sat, 2 Mar 2019 at 20:01, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> Although I'm all in favor of checking the int associated with the option, I
> do not think that it warrants three checks and messages. I would suggest
> factoring them all into just one check and one (terse) message.

Yeah. I've been trying to keep that area sane for a while, so I agree
that one message is fine. Done that way in the attached and put back
the missing ERANGE check.

> Option "--help" line: number of row*s* ?

Oops. Fixed.

> About the output: I'd suggest to indent one line per row, something like:
>
>    INSERT INTO foo VALUES
>      (..., ..., ..., ...),
>      (..., ..., ..., ...),
>      (..., ..., ..., ...);

Reasonable. Changed it to that; I put a tab at the start of those
lines.  There's still the possibility that just one final row makes up
the final INSERT, and that one will still span multiple lines. I don't
think there's anything that can reasonably be done about that.

> I'd suggest adding test tables with (1) no rows and (2) no columns but a
> few rows, with multiple --table options.

I didn't do that, partly because I think you're asking for tests of
existing behaviour, and partly because Perl gives me a sore head.
Maybe Surafel wants to do that?

v17 attached.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: pg_dump multi VALUES INSERT

From
Alvaro Herrera
Date:
Pushed, thanks!

If anyone is feeling inspired, one additional test we could use is
--rows-per-insert together with --on-conflict-do-nothing.
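
For example, something along these lines (table and database names are
placeholders):

    pg_dump --table=some_table --rows-per-insert=4 --on-conflict-do-nothing somedb

checking that each multi-row INSERT still carries the ON CONFLICT DO
NOTHING clause before the semicolon.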

I made a couple of edits to v17 before pushing:

* renamed the strtol endptr variable so that it can be used by other strtol
calls, if we ever have them

* used a pre-increment in the if() test rather than a separate line with
a post-increment; reduces the line count by 2 (see the sketch after this list)

* reworded --help output to: "number of rows per INSERT; implies --inserts"

* added one row-ending archputs(")") which makes the repeated
statement-ending archputs() match exactly.  (Negligible slowdown, I
expect)

* moved DUMP_DEFAULT_ROWS_PER_INSERT to pg_dump.c from pg_dump.h

* there was a space-before-comma in an error message, even immortalized
in the test expected output.

* removed the rows_per_insert_zero_col dump output file; the test can be
done by adding the table to the rows_per_insert file.  Added one more row
to that zero-column table, so that the INSERT .. DEFAULT VALUES test
verifies the case with more than one row.

* changed the rows_per_insert test to use 4 rows per insert rather than
3; that improves coverage (the table had 9 rows, so it was always hitting
the case where a full statement is emitted).
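
For the pre-increment point above, a toy illustration (variable names are
made up, not the actual pg_dump ones):

    #include <stdio.h>

    int
    main(void)
    {
        int     rows_per_statement = 4;
        int     rows_this_statement = 0;
        int     row;

        for (row = 0; row < 10; row++)
        {
            /* increment folded into the test, instead of a separate
             * rows_this_statement++; line followed by the if() */
            if (++rows_this_statement >= rows_per_statement)
            {
                printf("end statement after %d rows\n", rows_this_statement);
                rows_this_statement = 0;
            }
        }
        return 0;
    }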

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: pg_dump multi VALUES INSERT

From
David Rowley
Date:
On Fri, 8 Mar 2019 at 01:46, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Pushed, thanks!

> I made a couple of edits to v17 before pushing,

Thank you for making those changes and for pushing it.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: pg_dump multi VALUES INSERT

From
Peter Eisentraut
Date:
Shouldn't the --rows-per-insert option also be available via pg_dumpall?
 All the other options for switching between COPY and INSERT are
settable in pg_dumpall.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pg_dump multi VALUES INSERT

From
Alvaro Herrera
Date:
On 2019-Jun-14, Peter Eisentraut wrote:

> Shouldn't the --rows-per-insert option also be available via pg_dumpall?
>  All the other options for switching between COPY and INSERT are
> settable in pg_dumpall.

Uh, yeah, absolutely.

Surafel, are you in a position to provide a patch for that quickly?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pg_dump multi VALUES INSERT

From
Fabien COELHO
Date:
Hello Alvaro,

>> Shouldn't the --rows-per-insert option also be available via pg_dumpall?
>>  All the other options for switching between COPY and INSERT are
>> settable in pg_dumpall.
>
> Uh, yeah, absolutely.
>
> Surafel, are you in a position to provide a patch for that quickly?

It's the end of the week and I have more time; this was easy enough, and I 
should have seen the issue while reviewing. Patch attached.

BTW, is the libpq hostaddr fix ok?

-- 
Fabien.
Attachment

Re: pg_dump multi VALUES INSERT

From
Alvaro Herrera
Date:
On 2019-Jun-14, Fabien COELHO wrote:

> 
> Hello Alvaro,
> 
> > > Shouldn't the --rows-per-insert option also be available via pg_dumpall?
> > >  All the other options for switching between COPY and INSERT are
> > > settable in pg_dumpall.
> > 
> > Uh, yeah, absolutely.
> > 
> > Surafel, are you in a position to provide a patch for that quickly?
> 
> End of the week, more time, easy enough and I should have seen the issue
> while reviewing. Patch attached.

Hello Fabien

Thanks for producing a fix so quickly.  David Rowley mentioned a couple
of issues out of band (stanzas were put in the wrong places when considering
alphabetical ordering and such).  It's pushed now.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services