Re: pg_dump multi VALUES INSERT - Mailing list pgsql-hackers
From:           David Rowley
Subject:        Re: pg_dump multi VALUES INSERT
Msg-id:         CAKJS1f_Dxvqjeqo0MbWDGYEfPuNS5gqGVqj2++k=x1iFCnGBZQ@mail.gmail.com
In response to: Re: pg_dump multi VALUES INSERT (Surafel Temesgen <surafel3000@gmail.com>)
Responses:      Re: pg_dump multi VALUES INSERT
                Re: pg_dump multi VALUES INSERT
List:           pgsql-hackers
On Mon, 31 Dec 2018 at 18:58, Surafel Temesgen <surafel3000@gmail.com> wrote:
> On Fri, Dec 28, 2018 at 6:46 PM Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>> > At first i also try to do it like that but it seems the function will
>> > became long and more complex to me
>>
>> Probably. But calling it with size 100 should result in the same behavior,
>> so it is really just an extension of the preceeding one? Or am I missing
>> something?
>>
>
> Specifying table data using single value insert statement and user
> specified values insert statement have enough deference that demand to be
> separate function and they are not the same thing that should implement
> with the same function. Regarding code duplication i think the solution is
> making those code separate function and call at appropriate place.

I don't really buy this. I've just hacked up a version of
dumpTableData_insert() which supports a variable number of rows per
statement. It seems fairly clean and easy to me. Likely the fact that this
is very possible greatly increases the chances of this getting in, since it
gets rid of the code duplication.

I did also happen to move the row-building code out of the function into
its own function, but that's not really required. I just did that so I
could see all the code in charge of terminating each statement on my screen
without having to scroll.

I've not touched any of the plumbing work to plug the rows_per_statement
variable into the command line argument, so it'll need a bit of merge work
with the existing patch. There will need to be some code that ensures that
the user does not attempt to have 0 rows per statement. The code I wrote
won't behave well if that happens.

... Checks existing patch ...

I see you added a test, but missed checking for 0. That needs to be fixed.

+    if (dopt.dump_inserts_multiple < 0)
+    {
+        write_msg(NULL, "argument of --insert-multi must be positive number\n");
+        exit_nicely(1);
+    }

I also didn't adopt passing the rows-per-statement into the FETCH. I think
that's a very bad idea and we should keep that strictly at 100. I don't see
any reason to tie the two together. If a user wants 10 rows per statement,
do we really want to FETCH 10 times more often? And what happens when they
want 1 million rows per statement? We've no reason to run out of memory
from this since we're just dumping the rows out to the archive on each row.

1.

+        Specify the number of values per <command>INSERT</command> command.
+        This will make the dump file smaller than <option>--inserts</option>
+        and it is faster to reload but lack per row data lost on error
+        instead entire affected insert statement data lost.

Unsure what you mean about "data lost". It also controls the number of
"rows" per <command>INSERT</command> statement, not the number of values.
I think it would be fine just to say:

+        When using <option>--inserts</option>, this allows the maximum number
+        of rows per <command>INSERT</command> statement to be specified.
+        This setting defaults to 1.

I've used "maximum" there as the final statement on each table can have
fewer rows, and 0-column tables will always be 1 row per statement.

2. Is --insert-multi a good name? What if they do --insert-multi=1? That's
not very "multi". Is --rows-per-insert better?

-- 
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
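To make the statement-termination rule concrete, here is a minimal
standalone sketch (not the attached patch; the dump_rows() helper, the
hard-coded row strings, and printf() stand in for the real row building and
archive output done inside dumpTableData_insert()):

#include <stdio.h>

/*
 * Standalone illustration only: the real row text comes from the query
 * result in dumpTableData_insert().  rows_per_statement must be >= 1,
 * which is why an option value of 0 has to be rejected up front.
 */
static void
dump_rows(const char *tablename, const char *rows[], int nrows,
          int rows_per_statement)
{
    int         i;

    for (i = 0; i < nrows; i++)
    {
        /* Start a new INSERT at the beginning of each batch */
        if (i % rows_per_statement == 0)
            printf("INSERT INTO %s VALUES\n", tablename);

        printf("    (%s)", rows[i]);

        /*
         * Terminate the statement when the batch is full or we've hit the
         * last row of the table; otherwise continue the VALUES list.
         */
        if ((i + 1) % rows_per_statement == 0 || i + 1 == nrows)
            printf(";\n");
        else
            printf(",\n");
    }
}

int
main(void)
{
    const char *rows[] = {"1, 'a'", "2, 'b'", "3, 'c'", "4, 'd'", "5, 'e'"};

    dump_rows("public.t", rows, 5, 3);
    return 0;
}

With 5 rows and 3 rows per statement this prints two INSERTs, the second
with only 2 rows, which is the reason for saying "maximum" in the docs.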