Re: pg_dump multi VALUES INSERT - Mailing list pgsql-hackers

From David Rowley
Subject Re: pg_dump multi VALUES INSERT
Msg-id CAKJS1f_Dxvqjeqo0MbWDGYEfPuNS5gqGVqj2++k=x1iFCnGBZQ@mail.gmail.com
In response to Re: pg_dump multi VALUES INSERT  (Surafel Temesgen <surafel3000@gmail.com>)
On Mon, 31 Dec 2018 at 18:58, Surafel Temesgen <surafel3000@gmail.com> wrote:
> On Fri, Dec 28, 2018 at 6:46 PM Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>> > At first I also tried to do it like that, but it seemed to me the
>> > function would become long and more complex
>>
>> Probably. But calling it with size 100 should result in the same behavior,
>> so it is really just an extension of the preceding one? Or am I missing
>> something?
>>
>
> Dumping table data with single-value INSERT statements and dumping it with
> INSERT statements holding a user-specified number of values differ enough to
> demand separate functions; they are not the same thing and shouldn't be
> implemented in the same function. Regarding code duplication, I think the
> solution is to move that code into separate functions and call them in the
> appropriate places.

I don't really buy this. I've just hacked up a version of
dumpTableData_insert() which supports a variable number of rows per
statement. It seems fairly clean and easy to me. The fact that this is
quite doable likely increases the chances of this getting in, since it
gets rid of the code duplication. I did also happen to move the
row-building code out into its own function, but that's not really
required. I just did that so I could see all the code in charge of
terminating each statement on my screen without having to scroll.  I've
not touched any of the plumbing work to plug the rows_per_statement
variable into the command line argument, so it'll need a bit of merge
work with the existing patch.   There will need to be some code that
ensures that the user does not attempt to have 0 rows per statement.
The code I wrote won't behave well if that happens.
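
For illustration only, the per-statement bookkeeping boils down to
something like the standalone sketch below (made-up names, not the
attached patch; in pg_dump the row text would come from PQgetvalue()
and be written to the archive rather than stdout):

/*
 * Standalone sketch of grouping rows into at most rows_per_statement
 * rows per INSERT.
 */
#include <stdio.h>

int
main(void)
{
    const char *rows[] = {"(1, 'one')", "(2, 'two')", "(3, 'three')",
                          "(4, 'four')", "(5, 'five')"};
    int         nrows = 5;
    int         rows_per_statement = 2;     /* must be >= 1 */
    int         rows_this_statement = 0;
    int         i;

    for (i = 0; i < nrows; i++)
    {
        /* open a new statement if none is in progress */
        if (rows_this_statement == 0)
            printf("INSERT INTO tab VALUES\n");
        else
            printf(",\n");

        printf("    %s", rows[i]);

        /* terminate the statement once it holds enough rows */
        if (++rows_this_statement >= rows_per_statement)
        {
            printf(";\n");
            rows_this_statement = 0;
        }
    }

    /* the final statement may hold fewer rows; close it if still open */
    if (rows_this_statement > 0)
        printf(";\n");

    return 0;
}

With 5 rows and 2 rows per statement, that prints two 2-row INSERTs and
then a 1-row one; the counter simply carries over from one FETCH batch
to the next, so the FETCH size never needs to change.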

... Checks existing patch ...

I see you added a test, but missed checking for 0. That needs to be fixed.

+ if (dopt.dump_inserts_multiple < 0)
+ {
+ write_msg(NULL, "argument of --insert-multi must be positive number\n");
+ exit_nicely(1);
+ }
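
Since 0 also has to be rejected, the condition needs to be something
like (untested; keeping the patch's message style):

+ if (dopt.dump_inserts_multiple < 1)
+ {
+ write_msg(NULL, "argument of --insert-multi must be at least 1\n");
+ exit_nicely(1);
+ }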

I also didn't adopt passing the rows-per-statement into the FETCH.  I
think that's a very bad idea and we should keep that strictly at 100.
I don't see any reason to tie the two together. If a user wants 10
rows per statement, do we really want to FETCH 10 times more often?
And what happens when they want 1 million rows per statement? There's no
risk of running out of memory from this, since we write each row out to
the archive as soon as it's read.

1.

+        Specify the number of values per <command>INSERT</command> command.
+        This will make the dump file smaller than <option>--inserts</option>
+        and it is faster to reload but lack per row data lost on error
+        instead entire affected insert statement data lost.

Unsure what you mean about "data lost".  Also, the option controls the
number of "rows" per <command>INSERT</command> statement, not the number
of values.

I think it would be fine just to say:

+        When using <option>--inserts</option>, this allows the maximum number
+        of rows per <command>INSERT</command> statement to be specified.
+        This setting defaults to 1.

I've used "maximum" there as the final statement on each table can
have less and also 0-column tables will always be 1 row per statement.

2. Is --insert-multi a good name? What if they do --insert-multi=1?
That's not very "multi".  Is --rows-per-insert better?

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

