Thread: pg_dump insert with column names speedup

pg_dump insert with column names speedup

From
David Rowley
Date:
Here's a small patch which greatly increases the speed of pg_dump --column-inserts.

Previous to this patch a string was being build for the names of the columns for each row in the table... I've now changed this to only do this once for the table.

I also did some clean ups to remove calling the va_args function to write to the backup file when it was not needed, it instead now uses the puts type function which should be faster.

Here are some benchmark results on how it performs:

*** patched ***
D:\9.4\bin>timer pg_dump --column-inserts -f d:\postgres_backup.sql postgres
Ran for 5850 ms

D:\9.4\bin>timer pg_dump --column-inserts -f d:\postgres_backup.sql postgres
Ran for 5790 ms

D:\9.4\bin>timer pg_dump --column-inserts -f d:\postgres_backup.sql postgres
Ran for 5700 ms

D:\9.4\bin>timer pg_dump --column-inserts -f d:\postgres_backup.sql postgres
Ran for 5760 ms

D:\9.4\bin>timer pg_dump --column-inserts -f d:\postgres_backup.sql postgres
Ran for 5730 ms

*** unpatched ***

D:\9.4\bin>timer pg_dump --column-inserts -f d:\postgres_backup.sql postgres
Ran for 9580 ms

D:\9.4\bin>timer pg_dump --column-inserts -f d:\postgres_backup.sql postgres
Ran for 9330 ms

D:\9.4\bin>timer pg_dump --column-inserts -f d:\postgres_backup.sql postgres
Ran for 9250 ms

D:\9.4\bin>timer pg_dump --column-inserts -f d:\postgres_backup.sql postgres
Ran for 9230 ms

D:\9.4\bin>timer pg_dump --column-inserts -f d:\postgres_backup.sql postgres
Ran for 9240 ms

The database contained a single 1 million record table with 6 columns.

Regards

David Rowley

Attachment

Re: pg_dump insert with column names speedup

From
Tom Lane
Date:
David Rowley <dgrowleyml@gmail.com> writes:
> Here's a small patch which greatly increases the speed of
> pg_dump --column-inserts.

The reason why no one's paid any attention to the speed of that code path
is that if you care about dump/restore speed, you should be using the COPY
code paths instead.  Is it really worth adding code and complexity to
pg_dump for this?
        regards, tom lane



Re: pg_dump insert with column names speedup

From
Robert Haas
Date:
On Fri, Oct 4, 2013 at 11:28 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> David Rowley <dgrowleyml@gmail.com> writes:
>> Here's a small patch which greatly increases the speed of
>> pg_dump --column-inserts.
>
> The reason why no one's paid any attention to the speed of that code path
> is that if you care about dump/restore speed, you should be using the COPY
> code paths instead.  Is it really worth adding code and complexity to
> pg_dump for this?

One possible reason to care about this is if you're trying to move
data to another database.  The INSERT format is more portable.

Also, this isn't really adding any net code or complexity AFAICS.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: pg_dump insert with column names speedup

From
Stephen Frost
Date:
* Robert Haas (robertmhaas@gmail.com) wrote:
> On Fri, Oct 4, 2013 at 11:28 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > David Rowley <dgrowleyml@gmail.com> writes:
> >> Here's a small patch which greatly increases the speed of
> >> pg_dump --column-inserts.
> >
> > The reason why no one's paid any attention to the speed of that code path
> > is that if you care about dump/restore speed, you should be using the COPY
> > code paths instead.  Is it really worth adding code and complexity to
> > pg_dump for this?
>
> One possible reason to care about this is if you're trying to move
> data to another database.  The INSERT format is more portable.
>
> Also, this isn't really adding any net code or complexity AFAICS.

Agreed- this looks more like a "gee, that makes a lot of sense" than a
"wow, that's way more complicated".  Not a whole lot of point in
building up a known-to-be-constant string on every iteration of the
loop.
Thanks,
    Stephen

Re: pg_dump insert with column names speedup

From
David Rowley
Date:
On Sat, Oct 5, 2013 at 6:39 AM, Stephen Frost <sfrost@snowman.net> wrote:
* Robert Haas (robertmhaas@gmail.com) wrote:
> On Fri, Oct 4, 2013 at 11:28 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > David Rowley <dgrowleyml@gmail.com> writes:
> >> Here's a small patch which greatly increases the speed of
> >> pg_dump --column-inserts.
> >
> > The reason why no one's paid any attention to the speed of that code path
> > is that if you care about dump/restore speed, you should be using the COPY
> > code paths instead.  Is it really worth adding code and complexity to
> > pg_dump for this?
>
> One possible reason to care about this is if you're trying to move
> data to another database.  The INSERT format is more portable.
>
> Also, this isn't really adding any net code or complexity AFAICS.

Agreed- this looks more like a "gee, that makes a lot of sense" than a
"wow, that's way more complicated".  Not a whole lot of point in
building up a known-to-be-constant string on every iteration of the
loop.


These words made me think that the changes I made were not quite enough to satisfy this what you said.
I understand that most people won't use the --column-inserts feature, but that's not really a great reason to have not very clever and wasteful code in there. This fruit was so low hanging it was pretty much touching the ground, so I couldn't resist fixing it when I saw it.

The attached revised patch goes a little further and prepares everything that is constant on processing the first row, this now includes the "INSERT INTO tablename " part. I don't think the changes make the code any harder to read, the code which builds the staticStmt fits into my small laptop screen.

The timings with my benchmark look something like:

Unpatched: 9200 ms
Version 0.1:  5700 ms
Version 0.2: 5250 ms

So it does shave off a bit more, for what it's worth.

David


 
        Thanks,

                Stephen

Attachment

Re: pg_dump insert with column names speedup

From
Peter Eisentraut
Date:
src/bin/pg_dump/pg_dump.c:1604: trailing whitespace.
+                staticStmt = createPQExpBuffer(); 
src/bin/pg_dump/pg_dump.c:1612: trailing whitespace.
+                else 
src/bin/pg_dump/pg_dump.c:1614: trailing whitespace.
+                    if (column_inserts) 



Re: pg_dump insert with column names speedup

From
David Rowley
Date:
On Sat, Nov 16, 2013 at 9:04 AM, Peter Eisentraut <peter_e@gmx.net> wrote:
src/bin/pg_dump/pg_dump.c:1604: trailing whitespace.
+                               staticStmt = createPQExpBuffer();
src/bin/pg_dump/pg_dump.c:1612: trailing whitespace.
+                               else
src/bin/pg_dump/pg_dump.c:1614: trailing whitespace.
+                                       if (column_inserts)

The tailing white space is fixed in the attached patch.

Regards

David Rowley
Attachment

Re: pg_dump insert with column names speedup

From
Tom Lane
Date:
David Rowley <dgrowleyml@gmail.com> writes:
> The tailing white space is fixed in the attached patch.

Applied with minor cosmetic adjustments.
        regards, tom lane