Thread: pg_dump multi VALUES INSERT
Hello,
According to the documentation, the --inserts option is mainly useful for making dumps that can be loaded into non-PostgreSQL databases and for reducing the number of rows that might be lost if an error occurs during reload, but multi-row INSERT commands are equally portable and compact, and also faster to reload than single-row statements. I think this deserves an option of its own.
The attached patch adds an option for multi-row INSERT statements, with a default of 100 rows per statement, so that at most 100 rows are lost on error rather than the entire table.
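For example, with 3 rows per statement the dump would contain statements of roughly this shape (table and values invented for illustration):

  INSERT INTO public.t VALUES
      (1, 'one'),
      (2, 'two'),
      (3, 'three');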
Comments?
Regards
Surafel
Attachment
Surafel Temesgen <surafel3000@gmail.com> writes:
> According to the documentation, the --inserts option is mainly useful
> for making dumps that can be loaded into non-PostgreSQL databases and
> for reducing the number of rows that might be lost if an error occurs
> during reload, but multi-row INSERT commands are equally portable and
> compact, and also faster to reload than single-row statements. I think
> this deserves an option of its own.

I don't actually see the point of this. If you want dump/reload speed
you should be using COPY. If that isn't your first priority, it's
unlikely that using an option like this would be a good idea. It makes
the effects of a bad row much harder to predict, and it increases your
odds of OOM problems with wide rows substantially.

I grant that COPY might not be an option if you're trying to transfer
data to a different DBMS, but the other problems seem likely to apply
anywhere. The bad-data hazard, in particular, is probably a much larger
concern than it is for Postgres-to-Postgres transfers.

regards, tom lane
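For context, the COPY-based format that Tom refers to looks roughly like this in a dump (table and data invented for illustration), and restores a whole table in a single command:

  COPY public.t (a, b) FROM stdin;
  1	one
  2	two
  \.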
Greetings,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Surafel Temesgen <surafel3000@gmail.com> writes:
> > According to the documentation, the --inserts option is mainly useful
> > for making dumps that can be loaded into non-PostgreSQL databases and
> > for reducing the number of rows that might be lost if an error occurs
> > during reload, but multi-row INSERT commands are equally portable and
> > compact, and also faster to reload than single-row statements. I think
> > this deserves an option of its own.
>
> I don't actually see the point of this. If you want dump/reload speed
> you should be using COPY. If that isn't your first priority, it's
> unlikely that using an option like this would be a good idea. It makes
> the effects of a bad row much harder to predict, and it increases your
> odds of OOM problems with wide rows substantially.
>
> I grant that COPY might not be an option if you're trying to transfer
> data to a different DBMS, but the other problems seem likely to apply
> anywhere. The bad-data hazard, in particular, is probably a much larger
> concern than it is for Postgres-to-Postgres transfers.

I can't say that I can really buy off on this argument. The point of it
is that it makes loading into other RDBMS faster. Yes, it has many of
the same issues as our COPY does, but we support it because it's much
faster. The same is true here, just for other databases, so I'm +1 on
the general idea.

I've not looked at the patch itself at all, yet.

Thanks!

Stephen
Attachment
> The attached patch adds an option for multi-row INSERT statements, with
> a default of 100 rows per statement, so that at most 100 rows are lost
> on error rather than the entire table.

Patch does not seem to apply anymore, could you rebase?

-- Fabien.
On Wed, Oct 17, 2018 at 03:05:28PM -0400, Stephen Frost wrote:
> The point of it is that it makes loading into other RDBMS faster. Yes,
> it has many of the same issues as our COPY does, but we support it
> because it's much faster. The same is true here, just for other
> databases, so I'm +1 on the general idea.

Well, the patch author has mentioned that he cares about also being able
to detect errors when processing the dump, which multi-row inserts make
easier to check for. However, even if you specify --data-only you still
need to worry about the SET commands at the start, which requires manual
handling of the dump...

I am honestly not convinced that it is worth complicating pg_dump for
that, as there is no guarantee either that the DDLs generated by pg_dump
will be compatible with what other systems expect. This kind of
compatibility for fetch and reload can also be kind of tricky with
portability issues, so I'd rather let this stuff be handled correctly by
other tools like pgloader or others rather than expecting to get this
stuff half-baked within Postgres core tools.

-- Michael
Attachment
Greetings,

* Michael Paquier (michael@paquier.xyz) wrote:
> Well, the patch author has mentioned that he cares about also being able
> to detect errors when processing the dump, which multi-row inserts make
> easier to check for. However, even if you specify --data-only you still
> need to worry about the SET commands at the start, which requires manual
> handling of the dump...

That's hardly a serious complication..

> I am honestly not convinced that it is worth complicating pg_dump for
> that, as there is no guarantee either that the DDLs generated by pg_dump
> will be compatible with what other systems expect. This kind of
> compatibility for fetch and reload can also be kind of tricky with
> portability issues, so I'd rather let this stuff be handled correctly by
> other tools like pgloader or others rather than expecting to get this
> stuff half-baked within Postgres core tools.

I can see an argument for not wanting to complicate pg_dump, but we've
explicitly stated that the purpose of --inserts is to facilitate
restoring into other database systems, and I don't agree that we should
just punt on that entirely. For better or worse, there's an awful lot of
weight put behind things which are in core, and we should take care to do
what we can to make those things better, especially when someone is
proposing a patch to improve the situation. Sure, the patch might need
work or have other issues, but that's an opportunity for us to provide
feedback to the author and encourage them to improve the patch.

As for the other things that make it difficult to use pg_dump to get a
result out that can be loaded into other database systems, let's try to
improve on that too. Having an option to skip the 'setup' bits, such as
the SET commands, certainly wouldn't be hard.

I certainly don't see us adding code to pg_dump to handle 'fetch and
reload' into some non-PG system, or, really, even into a PG system, and
that certainly isn't something the patch does, so I don't think it's a
particularly interesting argument. Users can handle that as needed
themselves.

In other words, none of the arguments put forth really seem to be a
reason to reject the general idea of this patch, so I'm still +1 on that.
Having just glanced over the patch quickly, I think I would have done
something like '--inserts=100' as the way to enable it instead of adding
a new option, though. Not that I feel very strongly about it.

Thanks!

Stephen
Attachment
The attached patch is a rebased version that works via '--inserts=100' as Stephen suggested.
regards
Attachment
On 2018-Nov-06, Surafel Temesgen wrote:
> hi,
>
> On Sun, Nov 4, 2018 at 1:18 PM Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
> > Patch does not seem to apply anymore, could you rebase?
> >
> The attached patch is a rebased version that works via '--inserts=100'
> as Stephen suggested.
I thought the suggestion was that the number could be any positive
integer, not hardcoded 100. It shouldn't take much more code to handle
it that way, which makes more sense to me.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Yes, it's not many lines of code. Attached is a patch that optionally accepts the number of rows per INSERT statement; if it is not specified, one row per statement is used.
regards
Surafel
Attachment
Unfortunately, patch needs to be rebased, could you please post an updated
version?
Attachment
Hello Surafel,
> Thank you for the reminder. Here is an updated patch against current master.
Patch applies cleanly, compiles, "make check" is okay, but given that the
feature is not tested...
Feature should be tested somewhere.
ISTM that command-line switches with optional arguments should be avoided:
This feature is seldom used (hmmm... 2 existing instances), because it
interferes with argument processing if such switches are used as the last
one. It is only okay with commands which do not expect arguments. For
backward compatibility, this suggests to add another switch, eg
--insert-multi=100 or whatever, which would possibly default to 100. The
alternative is to break compatibility with adding a mandatory argument,
but I guess it would not be admissible to committers.
Function "atoi" parses "1zzz" as 1, which is debatable, so I'd suggest to
avoid it and use some stricter option and error out on malformed integers.
Neither the --help output nor the documentation describes the new
argument to --inserts.
There is an indentation issue within the while loop.
Given that the implementation is largely a copy-paste of the preceding
function, I'd suggest to simply extend it so that it takes into account
the "multi insert" setting and default to the previous behavior if not
set.
At first I also tried to do it like that, but it seemed to me the function would become long and more complex.
Regards
Surafel
Attachment
> At first I also tried to do it like that, but it seemed to me the
> function would become long and more complex.

Probably. But calling it with size 100 should result in the same
behavior, so it is really just an extension of the preceding one? Or am
I missing something?

-- Fabien.
Just looking at the v5 patch, it seems not to handle 0 column tables correctly.
For example:
# create table t();
# insert into t default values;
# insert into t default values;
$ pg_dump --table t --inserts --insert-multi=100 postgres > dump.sql
# \i dump.sql
[...]
INSERT 0 1
psql:dump.sql:35: ERROR: syntax error at or near ")"
LINE 1: );
^

I'm not aware of a valid way to insert multiple 0 column rows in a
single INSERT statement, so not sure how you're going to handle that
case.

-- David Rowley
Attachment
On Thu, 3 Jan 2019 at 01:50, Surafel Temesgen <surafel3000@gmail.com> wrote:
> On Mon, Dec 31, 2018 at 12:38 PM David Rowley <david.rowley@2ndquadrant.com> wrote:
>> Just looking at the v5 patch, it seems not to handle 0 column tables correctly.
> The attached patch contains a fix for it
+ /* if it is zero-column table then we're done */
+ if (nfields == 0)
+ {
+ archputs(insertStmt->data, fout);
+ continue;
+ }
So looks like you're falling back on one INSERT per row for this case.
Given that this function is meant to be doing 'dump_inserts_multiple'
INSERTs per row, I think the comment should give some details of why
we can't do multi-inserts, and explain the reason for it. "we're done"
is just not enough detail.

I've not looked at the rest of the patch.
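For a zero-column table, the fallback presumably keeps emitting the whole statement once per row, along these lines:

  INSERT INTO public.t DEFAULT VALUES;
  INSERT INTO public.t DEFAULT VALUES;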
Attachment
FWIW you can insert multiple zero-column rows with "insert into .. select union all select union all select". -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
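Concretely, that trick would insert three zero-column rows in one statement, something like:

  INSERT INTO public.t SELECT UNION ALL SELECT UNION ALL SELECT;

(This relies on empty SELECT lists, which PostgreSQL allows since 9.4.)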
On Mon, 31 Dec 2018 at 18:58, Surafel Temesgen <surafel3000@gmail.com> wrote:
> On Fri, Dec 28, 2018 at 6:46 PM Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>> > At first i also try to do it like that but it seems the function will
>> > became long and more complex to me
>>
>> Probably. But calling it with size 100 should result in the same behavior,
>> so it is really just an extension of the preceeding one? Or am I missing
>> something?
>>
>
> Specifying table data using single-row INSERT statements and using
> user-specified multi-row INSERT statements are different enough to
> demand separate functions; they are not the same thing to implement in
> one function. Regarding code duplication, I think the solution is to
> move that code into a separate function called from the appropriate
> places.
I don't really buy this. I've just hacked up a version of
dumpTableData_insert() which supports a variable number rows per
statement. It seems fairly clean and easy to me. Likely the fact that
this is very possible greatly increases the chances of this getting in
since it gets rid of the code duplication. I did also happen to move
the row building code out of the function into its own function, but
that's not really required. I just did that so I could see all the
code in charge of terminating each statement on my screen without
having to scroll. I've not touched any of the plumbing work to plug
the rows_per_statement variable into the command line argument. So
it'll need a bit of merge work with the existing patch. There will
need to be some code that ensures that the user does not attempt to
have 0 rows per statement. The code I wrote won't behave well if that
happens.
... Checks existing patch ...
I see you added a test, but missed checking for 0. That needs to be fixed.
+ if (dopt.dump_inserts_multiple < 0)
+ {
+ write_msg(NULL, "argument of --insert-multi must be positive number\n");
+ exit_nicely(1);
+ }
I also didn't adopt passing the rows-per-statement into the FETCH. I
think that's a very bad idea and we should keep that strictly at 100.
I don't see any reason to tie the two together. If a user wants 10
rows per statement, do we really want to FETCH 10 times more often?
And what happens when they want 1 million rows per statement? We've no
reason to run out of memory from this since we're just dumping the
rows out to the archive on each row.
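For reference, the INSERT-mode data fetch in pg_dump is cursor-based, roughly along these lines:

  DECLARE _pg_dump_cursor CURSOR FOR SELECT * FROM ONLY public.t;
  FETCH 100 FROM _pg_dump_cursor;  -- batch size stays 100, independent of --rows-per-insert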
1.
+ Specify the number of values per <command>INSERT</command> command.
+ This will make the dump file smaller than <option>--inserts</option>
+ and it is faster to reload but lack per row data lost on error
+ instead entire affected insert statement data lost.
Unsure what you mean about "data lost". It also controls the number
of "rows" per <command>INSERT</command> statement, not the number of
values.
I think it would be fine just to say:
+ When using <option>--inserts</option>, this allows the maximum number
+ of rows per <command>INSERT</command> statement to be specified.
+ This setting defaults to 1.
2. Is --insert-multi a good name? What if they do --insert-multi=1?
That's not very "multi". Is --rows-per-insert better?
Attachment
On Thu, Jan 17, 2019 at 5:15 AM Surafel Temesgen <surafel3000@gmail.com> wrote:
> On Fri, Jan 4, 2019 at 3:08 PM David Rowley <david.rowley@2ndquadrant.com> wrote:
>> 2. Is --insert-multi a good name? What if they do --insert-multi=1?
>> That's not very "multi". Is --rows-per-insert better?
>
> --rows-per-insert is better for me.

Some thoughts/suggestions:

+ int dump_inserts_multiple;

The option name uses rows; seems like this should mirror that and be
named "dump_inserts_max_rows".

+ <varlistentry>
+  <term><option>--rows-per-insert</option></term>
+  <listitem>
+   <para>
+    When using <option>--rows-per-insert</option>, this allows the maximum number
+    of rows per <command>INSERT</command> statement to be specified.
+   </para>
+  </listitem>
+ </varlistentry>

"When using <repeat option name from 20 characters ago>..." - no other
option description uses this redundant language and this one should not
either. Just say what it does: This specifies the maximum number of rows
(default 1) that will be attached to each <command>INSERT</command>
command generated by the <option>--inserts</option> or
<option>--column-inserts</option> options; exactly one of which must be
specified as well. (see my note at the end)

+ printf(_("  --rows-per-insert            number of row per INSERT command\n"));

(maximum?) number of row[s] per INSERT command

+ qr/\Qpg_dump: option --on-conflict-do-nothing requires option --inserts , --rows-per-insert or --column-inserts\E/,
+ 'pg_dump: option --on-conflict-do-nothing requires option --inserts , --rows-per-insert or --column-inserts');

You don't put spaces on both sides of the comma, just after; and add a
comma before the "or" (I think... notwithstanding the below).

I suggest we require that --rows-per-insert be dependent upon exactly one
of --inserts or --column-inserts being set and not let it be set by
itself (in which case the original message for --on-conflict-do-nothing
is OK).

David J.
On Fri, 18 Jan 2019 at 01:15, Surafel Temesgen <surafel3000@gmail.com> wrote:
> The attached patch mostly uses your method
I disagree with the "mostly" part. As far as I can see, you took the
idea and then made a series of changes to completely break it. For
bonus points, you put back my comment change to make it incorrect
again.
Here's what I got after applying your latest patch:
$ pg_dump --table=t --inserts --rows-per-insert=4 postgres
[...]
INSERT INTO public.t VALUES (1);
)
INSERT INTO public.t VALUES (, ( 2);
)
INSERT INTO public.t VALUES (, ( 3);
)
INSERT INTO public.t VALUES (, ( 4);
);
INSERT INTO public.t VALUES (5);
)
INSERT INTO public.t VALUES (, ( 6);
)
INSERT INTO public.t VALUES (, ( 7);
)
INSERT INTO public.t VALUES (, ( 8);
);
INSERT INTO public.t VALUES (9);
)
;
I didn't test, but I'm pretty sure that's not valid INSERT syntax.
I'd suggest taking my changes and doing the plumbing work to tie the
rows_per_statement into the command line arg instead of how I left it
hardcoded as 3.
>> + When using <option>--inserts</option>, this allows the maximum number
>> + of rows per <command>INSERT</command> statement to be specified.
>> + This setting defaults to 1.
>>
> I changed it too, except for "This setting defaults to 1", because it
> doesn't have a default value. One row per statement means the --inserts
> option.
If it does not default to 1 then what happens when the option is not
specified?
Attachment
On Fri, 18 Jan 2019 at 19:29, Surafel Temesgen <surafel3000@gmail.com> wrote:
> This happens because I don't disallow using --inserts and
> --rows-per-insert together. It should error out in that case; I
> corrected it in the attached patch.
I don't think it should be an error. It's not like the two options
conflict. I imagined that you'd need to specify you want --inserts and
optionally could control how many rows per statement that would be put
in those commands. I'd be surprised to be confronted with an error for
asking for that.
It might be worth doing the same as what we do if --column-inserts is
specified without --inserts. In this case we just do:
/* --column-inserts implies --inserts */
if (dopt.column_inserts)
dopt.dump_inserts = 1;
If you do it that way you'll not need to modify the code much from how
I wrote it. We can likely debate if we want --rows-per-insert to imply
--inserts once there's a working patch.
On Fri, Jan 18, 2019 at 5:02 AM Surafel Temesgen <surafel3000@gmail.com> wrote:
> On Fri, Jan 18, 2019 at 2:29 PM David Rowley <david.rowley@2ndquadrant.com> wrote:
>> I don't think it should be an error. It's not like the two options
>> conflict. I imagined that you'd need to specify you want --inserts and
>> optionally could control how many rows per statement that would be put
>> in those commands. I'd be surprised to be confronted with an error for
>> asking for that.
>
> If you specify the --inserts option you have already specified the
> number of rows per statement, which is 1. If more than one row per
> statement is needed, it must be specified using --rows-per-insert, and
> specifying one row per statement using --inserts while at the same time
> specifying a different number of rows with --rows-per-insert seems
> conflicting to me.

So, the other way of looking at it - why do we even need an entirely new
option? Modify --inserts to accept an optional integer value that
defaults to 1 (I'm not sure how tricky dealing with optional option
values is though...). --column-inserts implies --inserts, but if you want
to change the number of rows you need to specify both (or add the same
optional integer to --column-inserts).

David J.
On Tue, Dec 25, 2018 at 4:47 AM Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> ISTM that command-line switches with optional arguments should be avoided:
> This feature is seldom used (hmmm... 2 existing instances), because it
> interferes with argument processing if such switches are used as the last
> one.

Excellent point; though avoiding adding yet another limited-use option
seems like a fair trade-off here. Though maybe we also need to add the
traditional "--" option as well. I'm not married to the idea though; but
it's also not like mis-interpreting the final argument as an integer
instead of a database is going to be a silent error.

David J.
On Sat, 19 Jan 2019 at 01:01, Surafel Temesgen <surafel3000@gmail.com> wrote:
> If you specify the --inserts option you have already specified the
> number of rows per statement, which is 1. If more than one row per
> statement is needed, it must be specified using --rows-per-insert, and
> specifying one row per statement using --inserts while at the same time
> specifying a different number of rows with --rows-per-insert seems
> conflicting to me.
So you're saying an INSERT, where you insert multiple rows in a single
statement is not an insert? That logic surprises me. --inserts makes
pg_dump use INSERTs rather than COPY. --rows-per-inserts still uses
INSERTs. I'd love to know why you think there's some conflict with
that.
By your logic, you could say --column-inserts and --inserts should
also conflict, but they don't. --column-inserts happens to be coded to
imply --inserts. I really suggest we follow the lead from that. Doing
it this way reduces the complexity of the code where we build the
INSERT statement. Remember that a patch that is overly complex has
much less chance of making it. I'd really suggest you keep this as
simple as possible.
It also seems perfectly logical to me to default --rows-per-insert to
1 so that when --inserts is specified we do 1 row per INSERT. If the
user changes that value to something higher then nothing special needs
to happen as the function building the INSERT statement will always be
paying attention to whatever the --rows-per-insert value is set to.
That simplifies the logic meaning you don't need to check if --inserts
was specified.
Attachment
Nice stuff.

Is it possible to avoid the special case for 0 columns by using the
UNION ALL syntax I showed?

-- Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hello Surafel,

> Okay, thank you for explaining. I attach a patch corrected as such.

About this v9: applies cleanly, compiles, global and local "make check"
ok.

The option is not exercised in the TAP tests. I'd suggest that it should
be tested on a small table with zero, 1, and more than the set number of
rows. Maybe use -t and other options to reduce the output to the minimum.

About the documentation:

+ When using <option>--rows-per-insert</option>, this allows the maximum number
+ of rows per <command>INSERT</command> statement to be specified.

I'd suggest a more direct and simple style, something like: Set the
maximum number of rows per INSERT statement. This option implies
--inserts. Defaults to 1.

About the help message, the new option expects an argument, but it does
not show:

+ printf(_("  --rows-per-insert            number of row per INSERT command\n"));

About the code, maybe avoid using an int as a bool, eg:

  ... && !dopt.dump_inserts_multiple
->
  ... && dopt.dump_inserts_multiple == 0

Spacing around operators, eg: "!=1)" -> "!= 1)"

ISTM that the "dump_inserts_multiple" field is useless, you can reuse
"dump_inserts" instead, i.e. --inserts sets it to 1 *if zero*, and
--rows-per-inserts=XXX sets it to XXX. That would simplify the code
significantly.

ISTM that there are indentation issues, eg on the added

  if (dopt->dump_inserts_multiple == 1)
  {

The old code is not indented properly.

-- Fabien.
On Wed, 23 Jan 2019 at 04:08, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Is it possible to avoid the special case for 0 columns by using the
> UNION ALL syntax I showed?

It would be possible, but my thoughts are that we're moving away from
the SQL standard by doing so. Looking at the standard I see:

<row value constructor element list> ::=
  <row value constructor element>
  [ { <comma> <row value constructor element> }... ]

so it appears that multirow VALUES clauses are allowed. INSERT INTO ...
DEFAULT VALUES; is standard too, but looking at SELECT, neither the
target list nor the FROM clause is optional. You could maybe argue that
0-column tables are not standard anyway. Going by DROP COLUMN I see
"4) C shall be a column of T and C shall not be the only column of T.".
Are we the only database to break that?

I think since pg_dump --inserts is meant to be for importing data into
other databases then we should likely keep it as standard as possible.

Another argument against is that we've only supported empty SELECT
clauses since 9.4, so it may not help anyone who mysteriously wanted to
import data into an old version. Maybe that's a corner case, but I'm
sure 0 column tables are too.

-- David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wed, 23 Jan 2019 at 02:13, Surafel Temesgen <surafel3000@gmail.com> wrote:
> Okay, thank you for explaining. I attach a patch corrected as such.

I did a bit of work on this to fix a bunch of things:

1. Docs for --rows-per-insert didn't mention anything about a parameter.
2. You'd not followed the alphabetical order of how the parameters are
documented.
3. Various parts of the docs claimed that --inserts just inserted 1 row
per statement. Those needed to be updated.
4. New options out of order in --help. The rest were in alphabetical
order.
5. DumpOptions struct variable was not in the right place. It was
grouped in with some parameterless options.
6. Code in dumpTableData_insert() was convoluted. Not sure what you had
added end_of_statement for, or why you were checking PQntuples(res) == 0.
You'd also made the number_of_row variable 1-based and set it to 1 when
we had added 0 rows. You then checked for the existence of 1 row by
checking the variable was > 1... That made very little sense to me. I've
pretty much put back the code that I had sent to you previously, just
without the part where I split the row building code out into another
function.
7. A comment in dumpTableData_insert() claimed that the insertStmt would
end in "VALUES(", but it'll end in "VALUES ". I had updated that in my
last version, but you must have missed that.
8. I've made it so --rows-per-insert implies --inserts. This is aligned
with how --column-inserts behaves.

I changed a few other things. I simplified the condition that raises an
error when someone does --on-conflict-do-nothing without the --inserts
option. There was no need to check for the other options that imply
--inserts, since --inserts will already be enabled if one of the other
options has been given.

I also removed most of the error checking you'd added to ensure that the
--rows-per-insert parameter was a number. I'd have kept this, but I saw
that we do nothing that special for the compression option, and I didn't
see why --rows-per-insert was any more special. It was quite a bit of
code for very little reward.

-- David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
Hello David & Surafel, About this v10: Patch applies and compiles cleanly, local & global "make check" ok. A few comments, possibly redundant with some already in the thread. Out of abc-order rows-per-inserts option in getopt struct. help string does not specify the expected argument. I still think that the added rows_per_insert field is useless, ISTM That "int dump_inserts" can be reused, and you could also drop boolean got_rows_per_insert. The feature is not tested anywhere. I still think that there should be a test on empty/small/larger-than-rows-per-insert tables, possibly added to existing TAP-tests. -- Fabien.
On Wed, 23 Jan 2019 at 22:08, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> Out-of-abc-order rows-per-insert option in the getopt struct.

I missed that. Thanks. Fixed in attached.

> The help string does not specify the expected argument.

Also fixed.

> I still think that the added rows_per_insert field is useless: ISTM that
> "int dump_inserts" can be reused, and you could also drop the boolean
> got_rows_per_insert.

I thought about this and looked into it, but I decided it didn't look
like a smart thing to do. The reason is that if --inserts sets
dump_inserts to 1 then --rows-per-insert sets it to something else,
that's likely fine, but if that happens in the opposite order then the
--rows-per-insert value gets overwritten with 1. The bad news is that
the order that happens in is defined by the order of the command line
args.

It might be possible to make it work by having --inserts set some other
variable, then set dump_inserts to 1 if it's set to 0 and the other
variable is set to >= 1... but that requires another variable, which is
what you want to avoid... I think it's best to have a variable per
argument.

I could get rid of the got_rows_per_insert variable, but it would
require setting the default value for rows_per_insert in the main()
function rather than in InitDumpOptions(). I thought InitDumpOptions()
looked like just the place to do this, so went with that option. To make
it work without got_rows_per_insert, rows_per_insert would have to be 0
by default and we'd know we saw a --rows-per-insert command line arg by
the fact that rows_per_insert was non-zero. Would you rather have it
that way?

> The feature is not tested anywhere. I still think that there should be a
> test on empty/small/larger-than-rows-per-insert tables, possibly added
> to existing TAP tests.

I was hoping to get away with not having to do that... mainly because
I've no idea how. Please have at it if you know how it's done. FWIW I
looked at 002_pg_dump.pl and did add a test, but I was unable to make it
pass because I couldn't really figure out how the regex matching is
meant to work. It does not seem to be explained very well in the
comments and I lack patience for Perl.

-- David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
Hello David,

> I thought about this and looked into it, but I decided it didn't look
> like a smart thing to do. The reason is that if --inserts sets
> dump_inserts to 1 then --rows-per-insert sets it to something else,
> that's likely fine, but if that happens in the opposite order then the
> --rows-per-insert value gets overwritten with 1.

You can test before doing that!

  case X:
    if (opt.dump_inserts == 0)
      opt.dump_inserts = 1;
    /* otherwise the option is already set */

> The bad news is that the order that happens in is defined by the order
> of the command line args. It might be possible to make it work by
> having --inserts set some other variable,

ISTM that it is enough to test whether the variable is zero.

> then set dump_inserts to 1 if it's set to 0 and the other variable is
> set to >= 1... but that requires another variable, which is what you
> want to avoid...

I still do not understand the need for another variable:

  int ninserts = 0;  /* default is to use copy */
  while (getopt...)
  {
    switch (...)
    {
      case "--inserts":
        if (ninserts == 0)
          ninserts = 1;
        break;
      case "--rows-per-insert":
        ninserts = arg_value;
        checks...
        break;
      ...

> I think it's best to have a variable per argument.

I disagree, because it adds complexity where none is needed: here the
new option is an extension of a previous one, thus the previous one just
becomes a particular case, so it seems simpler to manage it as the
particular case it is, rather than creating the need to check the
consistency of two variables.

> I could get rid of the got_rows_per_insert variable, but it would
> require setting the default value for rows_per_insert in the main()
> function rather than in InitDumpOptions(). To make it work without
> got_rows_per_insert, rows_per_insert would have to be 0 by default and
> we'd know we saw a --rows-per-insert command line arg by the fact that
> rows_per_insert was non-zero. Would you rather have it that way?

Yep, esp. as rows_per_insert & dump_inserts could be the same.

>> The feature is not tested anywhere. I still think that there should be a
>> test on empty/small/larger-than-rows-per-insert tables, possibly added
>> to existing TAP tests.
>
> I was hoping to get away with not having to do that... mainly because
> I've no idea how.

Hmmm. That is another question! Maybe someone will help.

-- Fabien.
On Thu, 24 Jan 2019 at 04:45, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> I still do not understand the need for another variable:
>
>   int ninserts = 0;  /* default is to use copy */
>   while (getopt...)
>   {
>     switch (...)
>     {
>       case "--inserts":
>         if (ninserts == 0)
>           ninserts = 1;
>         break;
>       case "--rows-per-insert":
>         ninserts = arg_value;
>         checks...
>         break;
>       ...

I didn't think of that. Attached is a version that changes it to work
along those lines.

-- David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On 2019-Jan-23, David Rowley wrote:
> On Wed, 23 Jan 2019 at 04:08, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> > Is it possible to avoid the special case for 0 columns by using the
> > UNION ALL syntax I showed?
>
> It would be possible, but my thoughts are that we're moving away from
> the SQL standard by doing so.

Ah, that's a good point that I missed -- I agree with your reasoning.

-- Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, 24 Jan 2019 at 04:45, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>>> The feature is not tested anywhere. I still think that there should be a
>>> test on empty/small/larger-than-rows-per-insert tables, possibly added
>>> to existing TAP tests.
>>
>> I was hoping to get away with not having to do that... mainly because
>> I've no idea how.
>
> Hmmm. That is another question! Maybe someone will help.

Here's another version, same as before but with tests this time.

-- David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On Thu, 31 Jan 2019 at 11:49, David Rowley <david.rowley@2ndquadrant.com> wrote:
> Here's another version, same as before but with tests this time.

Hi Fabien,

Wondering if you have anything else here? I'm happy for the v13 version
to be marked as ready for committer.

-- David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hello David,

> Wondering if you have anything else here? I'm happy for the v13 version
> to be marked as ready for committer.

I still have a few comments.

Patch applies cleanly, compiles, global & local make check are ok.

Typos and style in the doc:

  "However, since, by default this option generates ..."
  "However, since this option, by default, generates ..."

I'd suggest something more straightforward to my mind and ear: "However,
since by default the option generates ..., ....", although beware that
I'm not a native English speaker.

I do not understand why the dump_inserts declaration has left the "flags
for options" section.

I'd suggest not relying on "atoi" because it does not check the argument
syntax, so basically anything is accepted, eg "1O" is 1.

On "if (!dopt->do_nothing) $1 else $2;", I'd rather use a straight
condition "if (dopt->do_nothing) $2 else $1;" (two instances).

There is a test, that is good! Character "." should be backslashed in the
regexp. I'd consider also introducing limit cases: empty table, empty
columns, by creating corresponding tables and using -t repeatedly.

-- Fabien.
On Sat, 2 Feb 2019 at 21:26, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> I do not understand why the dump_inserts declaration has left the "flags
> for options" section.

I moved that because it's no longer just a flag. It now stores an int
value.

> I'd suggest not relying on "atoi" because it does not check the argument
> syntax, so basically anything is accepted, eg "1O" is 1.

Seems like it's good enough for --jobs and --compress. Do you think
those should be changed too? Or what's the reason to hold
--rows-per-insert to a different standard?

> There is a test, that is good! Character "." should be backslashed in the
> regexp.

Yeah, you're right. I wonder if we should fix the test of them in
another patch.

-- David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hello David,

>> I do not understand why the dump_inserts declaration has left the "flags
>> for options" section.
>
> I moved that because it's no longer just a flag. It now stores an int
> value.

Hmmm. Indeed, all the "int"s of this section should be "bool" instead.
Now, some "flags" do not appear although they could (clear, createdb,
blobs), so the logic is kinda fuzzy anyway. Do as you wish.

>> I'd suggest not relying on "atoi" because it does not check the argument
>> syntax, so basically anything is accepted, eg "1O" is 1.
>
> Seems like it's good enough for --jobs and --compress. Do you think
> those should be changed too? Or what's the reason to hold
> --rows-per-insert to a different standard?

I think that there is a case for avoiding sloppy "good enough"
programming practices :-) Alas, as you point out, "atoi" is widely used.
I'm campaigning to avoid adding more of them. There has been some push
to actually remove "atoi" when not appropriate, eg from "libpq". I'd
suggest to consider starting to do the right thing, and leave fixing old
patterns to another patch.

>> There is a test, that is good! Character "." should be backslashed in the
>> regexp.
>
> Yeah, you're right. I wonder if we should fix the test of them in
> another patch.

From a software engineering perspective, I'd say that a feature and its
tests really belong to the same patch.

-- Fabien.
On Sun, 3 Feb 2019 at 21:00, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>>> There is a test, that is good! Character "." should be backslashed in
>>> the regexp.
>>
>> Yeah, you're right. I wonder if we should fix the test of them in
>> another patch.
>
> From a software engineering perspective, I'd say that a feature and its
> tests really belong to the same patch.

I meant to say "fix the rest" of them, not "the test of them".

-- David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Sun, Feb 03, 2019 at 01:21:45PM +0300, Surafel Temesgen wrote:
> At least for processing user arguments I think it is better to use
> strtol or another function that has better error handling. I can make a
> patch that changes the usage of atoi for user argument processing after
> getting feedback here, or I will do it simultaneously.

Moved the patch to next CF for now, the discussion is going on.

-- Michael
Attachment
Attachment
Reviewing pg_dump-rows-per-insert-option-v14. Mostly going back over
things that Fabien mentioned:

On Sat, 2 Feb 2019 at 21:26, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> There is a test, that is good! Character "." should be backslashed in the
> regexp. I'd consider also introducing limit cases: empty table, empty
> columns, by creating corresponding tables and using -t repeatedly.

+ (?:INSERT\ INTO\ dump_test.test_table\ VALUES\ \(\d,\ NULL,\ NULL,\ NULL\),\ \(\d,\ NULL,\ NULL,\ NULL\),\ \(\d,\ NULL,\ NULL,\ NULL\);\n){3}

The . here before the table name needs to be escaped. The ones missing
in the existing tests should have been fixed by d07fb6810e. There's also
the additional tests that Fabien mentions.

Also, maybe one for Fabien (because he seems keen on keeping the
--rows-per-insert validation code):

strtol() returns a long. dump_inserts is an int, so on machines where
sizeof(long) == 8 and sizeof(int) == 4 (most machines, these days) the
validation is not bulletproof. This could lead to:

$ pg_dump --rows-per-insert=2147483648
pg_dump: rows-per-insert must be a positive number

For me, I was fine with the atoi() code that the other options use, but
maybe Fabien has a problem with the long vs int? It would be simple to
work around by assigning the strtol() result to a long and making the
ERANGE test check for ERANGE or ... > PG_INT_MAX.

-- David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On Wed, 13 Feb 2019 at 19:36, Surafel Temesgen <surafel3000@gmail.com> wrote:
> On Mon, Feb 11, 2019 at 10:20 AM David Rowley <david.rowley@2ndquadrant.com> wrote:
>> strtol() returns a long. dump_inserts is an int, so on machines where
>> sizeof(long) == 8 and sizeof(int) == 4 (most machines, these days) the
>> validation is not bulletproof. This could lead to:
>>
>> $ pg_dump --rows-per-insert=2147483648
>> pg_dump: rows-per-insert must be a positive number
>
> fixed

Thanks. I see you didn't touch the tests yet, so I'll set this back to
waiting on author.

-- David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, 19 Feb 2019 at 02:34, Surafel Temesgen <surafel3000@gmail.com> wrote:
> I see that there is already a test for a zero-column table in
> test_fourth_table_zero_col, and if I am not wrong, table_index_stats is
> an empty table.

Maybe Fabien would like to see a test that dumps that table with
--rows-per-insert=<something above one> to ensure the output matches the
other test. I think it might be a good idea too.

-- David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Fri, 22 Feb 2019 at 14:40, David Rowley <david.rowley@2ndquadrant.com> wrote:
> Maybe Fabien would like to see a test that dumps that table with
> --rows-per-insert=<something above one> to ensure the output matches
> the other test. I think it might be a good idea too.

This patch was failing to build due to the new extra_float_digits option
that's been added to pg_dump. It was adding an additional case for 8 in
the getopt_long switch statement. In the attached, I've changed it to
use values 9 and 10 for the new options.

I also went ahead and added the zero-column test that Fabien mentioned.
Also added the missing backslash to the other test that had been added.

Fabien also complained about some wording in the docs. I ended up
changing this a little bit, as I thought the change was a little
uninformative about which rows won't be restored when an INSERT fails.
I've changed this so that it mentions that all rows which are part of
the same INSERT command will fail in the restore.

I think this can be marked as ready for committer now, but I'll defer to
Fabien to see if he has any other comments.

-- David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
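For context on the case-value collision mentioned above: pg_dump's
long-only options are distinguished by small integer codes in the
getopt_long() table, so two options must not share a code. A toy,
self-contained illustration of the mechanism (the option names match
the thread, but the program itself is hypothetical):

#include <getopt.h>
#include <stdio.h>

int
main(int argc, char **argv)
{
    /* Each long-only option gets a distinct small code. */
    static const struct option long_options[] = {
        {"extra-float-digits", required_argument, NULL, 8},
        {"rows-per-insert", required_argument, NULL, 9},
        {NULL, 0, NULL, 0}
    };
    int c;

    while ((c = getopt_long(argc, argv, "", long_options, NULL)) != -1)
    {
        switch (c)
        {
            case 8:
                printf("extra-float-digits = %s\n", optarg);
                break;
            case 9:
                printf("rows-per-insert = %s\n", optarg);
                break;
            default:
                return 1;
        }
    }
    return 0;
}

If the patch had kept 8 for its own options, both switch arms would
have claimed the same code, hence the move to 9 and 10.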
Hello David & Surafel,

> I think this can be marked as ready for committer now, but I'll defer
> to Fabien to see if he has any other comments.

Patch v16 applies and compiles cleanly, and local and global "make
check" pass. The doc build is ok. I did some manual testing with limit
cases, which worked. Good.

Although I'm all in favor of checking the int associated to the option,
I do not think that it warrants three checks and messages. I would
suggest to factor them all as just one check and one (terse) message.

Option "--help" line: number of row*s*?

About the output: I'd suggest to indent one line per row, something like:

INSERT INTO foo VALUES
  (..., ..., ..., ...),
  (..., ..., ..., ...),
  (..., ..., ..., ...);

so as to avoid very very very very very very very very very very very
very very very very very long lines in the output.

I'd suggest to add test tables with (1) no rows and (2) no columns but
a few rows, with multiple --table options.

--
Fabien.
On Sat, Mar 02, 2019 at 08:01:50AM +0100, Fabien COELHO wrote:
> About the output: I'd suggest to indent one line per row, something like:
>
> INSERT INTO foo VALUES
>   (..., ..., ..., ...),
>   (..., ..., ..., ...),
>   (..., ..., ..., ...);
>
> so as to avoid very very very very very very very very very very very
> very very very very very long lines in the output.

Note: folks sometimes manually edit the generated dump file, so having
one row/SQL query/VALUE per line really brings a lot of value.
--
Michael
Attachment
On 2019-Mar-02, Fabien COELHO wrote:

> Although I'm all in favor of checking the int associated to the option,
> I do not think that it warrants three checks and messages. I would
> suggest to factor them all as just one check and one (terse) message.

I suggest ("rows-per-insert must be in range 1..%d", INT_MAX), like
extra_float_digits and the compression level.

> About the output: I'd suggest to indent one line per row, something like:
>
> INSERT INTO foo VALUES
>   (..., ..., ..., ...),
>   (..., ..., ..., ...),
>   (..., ..., ..., ...);

+1.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Thanks for looking at this again.

On Sat, 2 Mar 2019 at 20:01, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> Although I'm all in favor of checking the int associated to the option,
> I do not think that it warrants three checks and messages. I would
> suggest to factor them all as just one check and one (terse) message.

Yeah. I've been trying to keep that area sane for a while, so I agree
that one message is fine. Done that way in the attached, and I put back
the missing ERANGE check.

> Option "--help" line: number of row*s*?

Oops. Fixed.

> About the output: I'd suggest to indent one line per row, something like:
>
> INSERT INTO foo VALUES
>   (..., ..., ..., ...),
>   (..., ..., ..., ...),
>   (..., ..., ..., ...);

Reasonable. Changed it to that. I put a tab at the start of those
lines. There's still the possibility that just one final row makes up
the final INSERT; it will still span multiple lines. I don't think
there's anything that can reasonably be done about that.

> I'd suggest to add test tables with (1) no rows and (2) no columns but
> a few rows, with multiple --table options.

I didn't do that, partly because I think you're asking for tests of
existing behaviour, and partly because perl gives me a sore head.
Maybe Surafel wants to do that?

v17 attached.

--
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
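To make the agreed layout concrete, here is a sketch of the batching
logic, one row per line with a leading tab, as described above. It is
illustrative only: it writes through stdio rather than pg_dump's
archive routines, and the function and its signature are invented:

#include <stdio.h>

/*
 * Hypothetical sketch: emit "nrows" pre-formatted row bodies as
 * multi-row INSERTs of at most rows_per_insert rows each.
 */
static void
emit_insert_batches(FILE *out, const char *table,
                    const char *const *rows, int nrows,
                    int rows_per_insert)
{
    for (int i = 0; i < nrows; i++)
    {
        if (i % rows_per_insert == 0)
            fprintf(out, "INSERT INTO %s VALUES\n", table);

        /* A comma continues the batch; a semicolon ends it. */
        fprintf(out, "\t(%s)%s\n", rows[i],
                (i % rows_per_insert == rows_per_insert - 1 ||
                 i == nrows - 1) ? ";" : ",");
    }
}

Note that a final batch of a single row still occupies two lines (the
INSERT line plus one row line), matching David's point that little can
be done about that. A zero-column table takes a different path
entirely: with no VALUES list to batch, the dump falls back to
INSERT ... DEFAULT VALUES statements, which is why the commit below
makes sure that case is tested with more than one row.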
Pushed, thanks!

If anyone is feeling inspired, one additional test we could use is
--rows-per-insert together with --on-conflict-do-nothing.

I made a couple of edits to v17 before pushing:

* renamed the strtol endptr variable so that it can be used by other
  strtol calls, if we ever have them
* used pre-increment in the if() test rather than a separate line with
  post-increment; reduces the line count by 2
* reworded the --help output to: "number of rows per INSERT; implies
  --inserts"
* added one row-ending archputs(")") which makes the repeated
  statement-ending archputs() match exactly (negligible slowdown, I
  expect)
* moved DUMP_DEFAULT_ROWS_PER_INSERT to pg_dump.c from pg_dump.h
* fixed a space-before-comma in an error message, which was even
  immortalized in the test expected output
* removed the rows_per_insert_zero_col dump output file; the test can
  be done by adding the table to the rows_per_insert file. Added one
  more row to that zero-column table, so that the INSERT .. DEFAULT
  VALUES test verifies the case with more than one row
* changed rows_per_insert to use 4 rows per insert rather than three;
  that improves coverage (the table had 9 rows, so it was always
  hitting the case where a full statement is emitted)

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, 8 Mar 2019 at 01:46, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Pushed, thanks!
>
> I made a couple of edits to v17 before pushing

Thank you for making those changes and for pushing it.

--
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Shouldn't the --rows-per-insert option also be available via pg_dumpall?
All the other options for switching between COPY and INSERT are
settable in pg_dumpall.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2019-Jun-14, Peter Eisentraut wrote:

> Shouldn't the --rows-per-insert option also be available via pg_dumpall?
> All the other options for switching between COPY and INSERT are
> settable in pg_dumpall.

Uh, yeah, absolutely.

Surafel, are you in a position to provide a patch for that quickly?

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hello Alvaro,

>> Shouldn't the --rows-per-insert option also be available via pg_dumpall?
>> All the other options for switching between COPY and INSERT are
>> settable in pg_dumpall.
>
> Uh, yeah, absolutely.
>
> Surafel, are you in a position to provide a patch for that quickly?

It is the end of the week, so I have more time; it was easy enough, and
I should have seen the issue while reviewing. Patch attached.

BTW, is the libpq hostaddr fix ok?

--
Fabien.
Attachment
On 2019-Jun-14, Fabien COELHO wrote:

> Hello Alvaro,
>
> > > Shouldn't the --rows-per-insert option also be available via pg_dumpall?
> > > All the other options for switching between COPY and INSERT are
> > > settable in pg_dumpall.
> >
> > Uh, yeah, absolutely.
> >
> > Surafel, are you in a position to provide a patch for that quickly?
>
> It is the end of the week, so I have more time; it was easy enough, and
> I should have seen the issue while reviewing. Patch attached.

Hello Fabien,

Thanks for producing a fix so quickly. David Rowley mentioned a couple
of issues out of band (stanzas were put in the wrong places, when
considering alphabetical ordering and such). It's pushed now.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services