Thread: pg_dump insert with column names speedup
Here's a small patch which greatly increases the speed of pg_dump --column-inserts.
Previous to this patch a string was being build for the names of the columns for each row in the table... I've now changed this to only do this once for the table.
I also did some clean ups to remove calling the va_args function to write to the backup file when it was not needed, it instead now uses the puts type function which should be faster.
Here are some benchmark results on how it performs:
*** patched ***
D:\9.4\bin>timer pg_dump --column-inserts -f d:\postgres_backup.sql postgres
Ran for 5850 ms
D:\9.4\bin>timer pg_dump --column-inserts -f d:\postgres_backup.sql postgres
Ran for 5790 ms
D:\9.4\bin>timer pg_dump --column-inserts -f d:\postgres_backup.sql postgres
Ran for 5700 ms
D:\9.4\bin>timer pg_dump --column-inserts -f d:\postgres_backup.sql postgres
Ran for 5760 ms
D:\9.4\bin>timer pg_dump --column-inserts -f d:\postgres_backup.sql postgres
Ran for 5730 ms
*** unpatched ***
D:\9.4\bin>timer pg_dump --column-inserts -f d:\postgres_backup.sql postgres
Ran for 9580 ms
D:\9.4\bin>timer pg_dump --column-inserts -f d:\postgres_backup.sql postgres
Ran for 9330 ms
D:\9.4\bin>timer pg_dump --column-inserts -f d:\postgres_backup.sql postgres
Ran for 9250 ms
D:\9.4\bin>timer pg_dump --column-inserts -f d:\postgres_backup.sql postgres
Ran for 9230 ms
D:\9.4\bin>timer pg_dump --column-inserts -f d:\postgres_backup.sql postgres
Ran for 9240 ms
The database contained a single 1 million record table with 6 columns.
Regards
David Rowley
Attachment
David Rowley <dgrowleyml@gmail.com> writes: > Here's a small patch which greatly increases the speed of > pg_dump --column-inserts. The reason why no one's paid any attention to the speed of that code path is that if you care about dump/restore speed, you should be using the COPY code paths instead. Is it really worth adding code and complexity to pg_dump for this? regards, tom lane
On Fri, Oct 4, 2013 at 11:28 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > David Rowley <dgrowleyml@gmail.com> writes: >> Here's a small patch which greatly increases the speed of >> pg_dump --column-inserts. > > The reason why no one's paid any attention to the speed of that code path > is that if you care about dump/restore speed, you should be using the COPY > code paths instead. Is it really worth adding code and complexity to > pg_dump for this? One possible reason to care about this is if you're trying to move data to another database. The INSERT format is more portable. Also, this isn't really adding any net code or complexity AFAICS. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
* Robert Haas (robertmhaas@gmail.com) wrote: > On Fri, Oct 4, 2013 at 11:28 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > David Rowley <dgrowleyml@gmail.com> writes: > >> Here's a small patch which greatly increases the speed of > >> pg_dump --column-inserts. > > > > The reason why no one's paid any attention to the speed of that code path > > is that if you care about dump/restore speed, you should be using the COPY > > code paths instead. Is it really worth adding code and complexity to > > pg_dump for this? > > One possible reason to care about this is if you're trying to move > data to another database. The INSERT format is more portable. > > Also, this isn't really adding any net code or complexity AFAICS. Agreed- this looks more like a "gee, that makes a lot of sense" than a "wow, that's way more complicated". Not a whole lot of point in building up a known-to-be-constant string on every iteration of the loop. Thanks, Stephen
On Sat, Oct 5, 2013 at 6:39 AM, Stephen Frost <sfrost@snowman.net> wrote:
* Robert Haas (robertmhaas@gmail.com) wrote:Agreed- this looks more like a "gee, that makes a lot of sense" than a
> On Fri, Oct 4, 2013 at 11:28 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > David Rowley <dgrowleyml@gmail.com> writes:
> >> Here's a small patch which greatly increases the speed of
> >> pg_dump --column-inserts.
> >
> > The reason why no one's paid any attention to the speed of that code path
> > is that if you care about dump/restore speed, you should be using the COPY
> > code paths instead. Is it really worth adding code and complexity to
> > pg_dump for this?
>
> One possible reason to care about this is if you're trying to move
> data to another database. The INSERT format is more portable.
>
> Also, this isn't really adding any net code or complexity AFAICS.
"wow, that's way more complicated". Not a whole lot of point in
building up a known-to-be-constant string on every iteration of the
loop.
These words made me think that the changes I made were not quite enough to satisfy this what you said.
I understand that most people won't use the --column-inserts feature, but that's not really a great reason to have not very clever and wasteful code in there. This fruit was so low hanging it was pretty much touching the ground, so I couldn't resist fixing it when I saw it.
The attached revised patch goes a little further and prepares everything that is constant on processing the first row, this now includes the "INSERT INTO tablename " part. I don't think the changes make the code any harder to read, the code which builds the staticStmt fits into my small laptop screen.
The timings with my benchmark look something like:
Unpatched: 9200 ms
Version 0.1: 5700 ms
Version 0.2: 5250 ms
So it does shave off a bit more, for what it's worth.
David
Thanks,
Stephen
Attachment
src/bin/pg_dump/pg_dump.c:1604: trailing whitespace. + staticStmt = createPQExpBuffer(); src/bin/pg_dump/pg_dump.c:1612: trailing whitespace. + else src/bin/pg_dump/pg_dump.c:1614: trailing whitespace. + if (column_inserts)
On Sat, Nov 16, 2013 at 9:04 AM, Peter Eisentraut <peter_e@gmx.net> wrote:
src/bin/pg_dump/pg_dump.c:1604: trailing whitespace.
+ staticStmt = createPQExpBuffer();
src/bin/pg_dump/pg_dump.c:1612: trailing whitespace.
+ else
src/bin/pg_dump/pg_dump.c:1614: trailing whitespace.
+ if (column_inserts)
The tailing white space is fixed in the attached patch.
Regards
David Rowley
Attachment
David Rowley <dgrowleyml@gmail.com> writes: > The tailing white space is fixed in the attached patch. Applied with minor cosmetic adjustments. regards, tom lane