Thread: parallel pg_dump

parallel pg_dump

From
Andrew Dunstan
Date:
I haven't finished reviewing this yet - but there are some things that 
need to be fixed.

First, either the creation of the destination directory needs to be 
delayed until all the sanity checks have passed and we're sure we're 
actually going to write something there, or it needs to be removed if we 
error exit before anything gets written there. Example: if there's an 
error because I am dumping a 9.1 server and so should have specified 
--no-synchronized-snapshots then getting the directory as a by-product 
which I need to remove is annoying. Maybe pg_dump -F d should be 
prepared to accept an empty directory as well as a non-existent 
directory, just as initdb can. Maybe this isn't directly related to this 
patch, but I have noticed it more when reviewing this patch.

Second, all the PrintStatus traces are annoying and need to be removed, 
or perhaps better only output in debugging mode (using ahlog() instead 
of just printf())

cheers

andrew




Re: parallel pg_dump

From
Joachim Wieland
Date:
On Tue, Apr 3, 2012 at 9:26 AM, Andrew Dunstan <andrew@dunslane.net> wrote:
> First, either the creation of the destination directory needs to be delayed
> until all the sanity checks have passed and we're sure we're actually going
> to write something there, or it needs to be removed if we error exit before
> anything gets written there.

pg_dump also creates empty files, which is the analogous case here.
Just try to dump a nonexistent database, for example (this also shows
that delaying won't help...).

> Maybe pg_dump -F d should be prepared to accept an empty directory as well as a
> non-existent directory, just as initdb can.

That sounds like a good compromise. I'll implement that.
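
The compromise discussed above (accept a missing directory or an existing empty one, refuse anything else) can be sketched roughly as follows. This is only an illustration in Python, not the code from the patch; the names check_dump_dir and prepare_dump_dir are hypothetical, and pg_dump itself is written in C.

```python
import os


def check_dump_dir(path):
    """Classify a target directory as 'missing', 'empty' or 'not_empty'.

    Hypothetical helper for illustration only; it is not part of pg_dump.
    """
    try:
        entries = os.listdir(path)
    except FileNotFoundError:
        return "missing"
    return "empty" if not entries else "not_empty"


def prepare_dump_dir(path):
    """Create the directory if it is missing, reuse it if it is empty,
    and refuse to write into a directory that already has content."""
    state = check_dump_dir(path)
    if state == "missing":
        os.mkdir(path)
    elif state == "not_empty":
        raise FileExistsError(f'directory "{path}" exists and is not empty')
    return path
```

With a check like this, a directory left behind by a failed run no longer has to be removed by hand before retrying, as long as nothing was written into it.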


> Second, all the PrintStatus traces are annoying and need to be removed, or
> perhaps better only output in debugging mode (using ahlog() instead of just
> printf())

Sure, PrintStatus is just there for now to see what's going on. My
plan was to remove it entirely in the final patch.


Joachim


Re: parallel pg_dump

From
Andrew Dunstan
Date:

On 04/04/2012 05:03 AM, Joachim Wieland wrote:
>> Second, all the PrintStatus traces are annoying and need to be removed, or
>> perhaps better only output in debugging mode (using ahlog() instead of just
>> printf())
> Sure, PrintStatus is just there for now to see what's going on. My
> plan was to remove it entirely in the final patch.
>
>


We need that final patch NOW, I think. There is very little time for 
this before it will be too late for 9.2.

cheers

andrew


Re: parallel pg_dump

From
Joachim Wieland
Date:
On Wed, Apr 4, 2012 at 8:27 AM, Andrew Dunstan <andrew@dunslane.net> wrote:
>> Sure, PrintStatus is just there for now to see what's going on. My
>> plan was to remove it entirely in the final patch.
>
> We need that final patch NOW, I think. There is very little time for this
> before it will be too late for 9.2.

Here are updated patches:

- An empty directory for the directory archive format is okay now.
- Removed PrintStatus().

Let me know if you need anything else.

Attachment

Re: parallel pg_dump

From
Alvaro Herrera
Date:
Excerpts from Joachim Wieland's message of Wed Apr 04 15:43:53 -0300 2012:
> On Wed, Apr 4, 2012 at 8:27 AM, Andrew Dunstan <andrew@dunslane.net> wrote:
> >> Sure, PrintStatus is just there for now to see what's going on. My
> >> plan was to remove it entirely in the final patch.
> >
> > We need that final patch NOW, I think. There is very little time for this
> > before it will be too late for 9.2.
>
> Here are updated patches:
>
> - An empty directory for the directory archive format is okay now.
> - Removed PrintStatus().

In general I'm not so sure that removing debugging printouts is the best
thing to do.  They might be helpful if in the future we continue to
rework this code.  How about a #define that turns them into empty
statements instead, for example?  I didn't read carefully to see if the
PrintStatus() calls are reasonable to keep, though.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: parallel pg_dump

From
Joachim Wieland
Date:
So here's a pg_dump benchmark from a real world database as requested
earlier. This is a ~750 GB large 9.0.6 database, and the backup has
been done over the internal network from a different machine. Both
machines run Linux.

I am attaching a chart that shows the table size distribution of the
largest tables and the overall pg_dump runtime. The resulting (zlib
compressed) dump directory was 28 GB.

Here are the raw numbers:

-Fc dump
real    168m58.005s
user    146m29.175s
sys     7m1.113s

-j 2
real    90m6.152s
user    155m23.887s
sys     15m15.521s

-j 3
real    61m5.787s
user    155m33.118s
sys     13m24.618s

-j 4
real    44m16.757s
user    155m25.917s
sys     13m13.599s

-j 6
real    36m11.743s
user    156m30.794s
sys     12m39.029s

-j 8
real    36m16.662s
user    154m37.495s
sys     11m47.141s
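
One way to read the raw numbers above is as wall-clock speedup relative to the single-process -Fc run. The sketch below only recomputes ratios from the "real" times listed in this message; the helper names are made up for illustration, and the comparison is not strictly like-for-like because the baseline uses the custom format while the -j runs write a dump directory.

```python
import re

# "real" times copied from the benchmark above.
RAW_REAL_TIMES = {
    "-Fc": "168m58.005s",
    "-j 2": "90m6.152s",
    "-j 3": "61m5.787s",
    "-j 4": "44m16.757s",
    "-j 6": "36m11.743s",
    "-j 8": "36m16.662s",
}


def to_seconds(stamp):
    """Convert a time(1)-style value such as '90m6.152s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognized time value: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def speedups(times, baseline="-Fc"):
    """Return the wall-clock speedup of every run relative to the baseline."""
    base = to_seconds(times[baseline])
    return {label: base / to_seconds(stamp) for label, stamp in times.items()}


if __name__ == "__main__":
    for label, factor in speedups(RAW_REAL_TIMES).items():
        print(f"{label:>5}: {factor:.2f}x")
```

The "user" times stay close to 155 minutes for every -j run, so the total CPU work is roughly constant; the gain in wall-clock time flattens out between -j 6 and -j 8.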

Attachment

Re: parallel pg_dump

From
Stefan Kaltenbrunner
Date:
On 04/05/2012 12:32 PM, Joachim Wieland wrote:
> So here's a pg_dump benchmark from a real world database as requested
> earlier. This is a ~750 GB large 9.0.6 database, and the backup has
> been done over the internal network from a different machine. Both
> machines run Linux.
> 
> I am attaching a chart that shows the table size distribution of the
> largest tables and the overall pg_dump runtime. The resulting (zlib
> compressed) dump directory was 28 GB.
> 
> Here are the raw numbers:
> 
> -Fc dump
> real    168m58.005s
> user    146m29.175s
> sys     7m1.113s
> 
> -j 2
> real    90m6.152s
> user    155m23.887s
> sys     15m15.521s
> 
> -j 3
> real    61m5.787s
> user    155m33.118s
> sys     13m24.618s
> 
> -j 4
> real    44m16.757s
> user    155m25.917s
> sys     13m13.599s
> 
> -j 6
> real    36m11.743s
> user    156m30.794s
> sys     12m39.029s
> 
> -j 8
> real    36m16.662s
> user    154m37.495s
> sys     11m47.141s


Interesting numbers. Any details on the network speed between the boxes,
the number of cores, the size of the uncompressed dump, and what the
apparent bottleneck was?


Stefan


Re: parallel pg_dump

From
Joachim Wieland
Date:
On Wed, Apr 4, 2012 at 2:43 PM, Joachim Wieland <joe@mcknight.de> wrote:
> Here are updated patches:
>
> - An empty directory for the directory archive format is okay now.
> - Removed PrintStatus().

Attached is a rebased version of the parallel pg_dump patch.

Attachment

Re: parallel pg_dump

From
Joachim Wieland
Date:
On Mon, Jun 18, 2012 at 10:05 PM, Joachim Wieland <joe@mcknight.de> wrote:
> Attached is a rebased version of the parallel pg_dump patch.

Attached is another rebased version for the current commitfest.

Attachment

Re: parallel pg_dump

From
Andrew Dunstan
Date:
On 09/17/2012 10:01 PM, Joachim Wieland wrote:
> On Mon, Jun 18, 2012 at 10:05 PM, Joachim Wieland <joe@mcknight.de> wrote:
>> Attached is a rebased version of the parallel pg_dump patch.
> Attached is another rebased version for the current commitfest.

These did not apply cleanly, but I have fixed them up. The combined diff
against git tip is attached. It can also be pulled from my parallel_dump
branch on <https://github.com/adunstan/postgresql-dev.git>. This builds
and runs OK on Linux, which is a start ...

cheers

andrew

Attachment

Re: parallel pg_dump

From
Andrew Dunstan
Date:
On 10/13/2012 10:46 PM, Andrew Dunstan wrote:
>
> On 09/17/2012 10:01 PM, Joachim Wieland wrote:
>> On Mon, Jun 18, 2012 at 10:05 PM, Joachim Wieland <joe@mcknight.de>
>> wrote:
>>> Attached is a rebased version of the parallel pg_dump patch.
>> Attached is another rebased version for the current commitfest.
>
> These did not apply cleanly, but I have fixed them up. The combined
> diff against git tip is attached. It can also be pulled from my
> parallel_dump branch on
> <https://github.com/adunstan/postgresql-dev.git> This builds and runs
> OK on Linux, which is a start ...
>

Well, you would also need this piece if you're applying the patch
(sometimes I forget to do git add ...)

cheers


andrew


Attachment

Re: parallel pg_dump

From
Andres Freund
Date:
Hi,

On 2012-10-15 17:13:10 -0400, Andrew Dunstan wrote:
>
> On 10/13/2012 10:46 PM, Andrew Dunstan wrote:
> >
> >On 09/17/2012 10:01 PM, Joachim Wieland wrote:
> >>On Mon, Jun 18, 2012 at 10:05 PM, Joachim Wieland <joe@mcknight.de>
> >>wrote:
> >>>Attached is a rebased version of the parallel pg_dump patch.
> >>Attached is another rebased version for the current commitfest.
> >
> >These did not apply cleanly, but I have fixed them up. The combined diff
> >against git tip is attached. It can also be pulled from my parallel_dump
> >branch on <https://github.com/adunstan/postgresql-dev.git> This builds and
> >runs OK on Linux, which is a start ...
>
> Well, you would also need this piece if you're applying the patch (sometimes
> I forget to do git add ...)

The patch is marked as Ready for Committer in the CF app, but at least
the whole Windows situation still seems to be unresolved?

Is anybody working on this? I would *love* to get this...

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: parallel pg_dump

From
Andrew Dunstan
Date:
On 12/08/2012 11:01 AM, Andres Freund wrote:
> Hi,
>
> On 2012-10-15 17:13:10 -0400, Andrew Dunstan wrote:
>> On 10/13/2012 10:46 PM, Andrew Dunstan wrote:
>>> On 09/17/2012 10:01 PM, Joachim Wieland wrote:
>>>> On Mon, Jun 18, 2012 at 10:05 PM, Joachim Wieland <joe@mcknight.de>
>>>> wrote:
>>>>> Attached is a rebased version of the parallel pg_dump patch.
>>>> Attached is another rebased version for the current commitfest.
>>> These did not apply cleanly, but I have fixed them up. The combined diff
>>> against git tip is attached. It can also be pulled from my parallel_dump
>>> branch on <https://github.com/adunstan/postgresql-dev.git> This builds and
>>> runs OK on Linux, which is a start ...
>> Well, you would also need this piece if you're applying the patch (sometimes
>> I forget to do git add ...)
> The patch is marked as Ready for Committer in the CF app, but at least
> the whole windows situation seems to be unresolved as of yet?
>
> Is anybody working on this? I would *love* to get this...
>
>


I am working on it when I get a chance, but keep getting hammered. I'd 
love somebody else to review it too.

cheers

andrew



Re: parallel pg_dump

From
Bruce Momjian
Date:
On Sat, Dec  8, 2012 at 11:13:30AM -0500, Andrew Dunstan wrote:
> 
> On 12/08/2012 11:01 AM, Andres Freund wrote:
> >Hi,
> >
> >On 2012-10-15 17:13:10 -0400, Andrew Dunstan wrote:
> >>On 10/13/2012 10:46 PM, Andrew Dunstan wrote:
> >>>On 09/17/2012 10:01 PM, Joachim Wieland wrote:
> >>>>On Mon, Jun 18, 2012 at 10:05 PM, Joachim Wieland <joe@mcknight.de>
> >>>>wrote:
> >>>>>Attached is a rebased version of the parallel pg_dump patch.
> >>>>Attached is another rebased version for the current commitfest.
> >>>These did not apply cleanly, but I have fixed them up. The combined diff
> >>>against git tip is attached. It can also be pulled from my parallel_dump
> >>>branch on <https://github.com/adunstan/postgresql-dev.git> This builds and
> >>>runs OK on Linux, which is a start ...
> >>Well, you would also need this piece if you're applying the patch (sometimes
> >>I forget to do git add ...)
> >The patch is marked as Ready for Committer in the CF app, but at least
> >the whole windows situation seems to be unresolved as of yet?
> >
> >Is anybody working on this? I would *love* to get this...
> >
> >
> 
> 
> I am working on it when I get a chance, but keep getting hammered.
> I'd love somebody else to review it too.

FYI, I will be posting pg_upgrade performance numbers using Unix
processes.  I will try to get the Windows code working but will also
need help.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: parallel pg_dump

From
Joachim Wieland
Date:
On Sat, Dec 8, 2012 at 3:05 PM, Bruce Momjian <bruce@momjian.us> wrote:
> On Sat, Dec  8, 2012 at 11:13:30AM -0500, Andrew Dunstan wrote:
> > I am working on it when I get a chance, but keep getting hammered.
> > I'd love somebody else to review it too.
>
> FYI, I will be posting pg_upgrade performance numbers using Unix
> processes.  I will try to get the Windows code working but will also
> need help.

Just let me know if there's anything I can help you guys with.


Joachim

Re: parallel pg_dump

From
Craig Ringer
Date:
On 12/09/2012 04:05 AM, Bruce Momjian wrote:
>
> FYI, I will be posting pg_upgrade performance numbers using Unix
> processes.  I will try to get the Windows code working but will also
> need help.
I'm interested ... or at least willing to help ... re the Windows side.
Let me know if I can be of any assistance as I have build environments
set up for a variety of Windows compiler variants.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: parallel pg_dump

From
Craig Ringer
Date:
On 01/21/2013 06:02 PM, Craig Ringer wrote:
> On 12/09/2012 04:05 AM, Bruce Momjian wrote:
>> FYI, I will be posting pg_upgrade performance numbers using Unix
>> processes.  I will try to get the Windows code working but will also
>> need help.
> I'm interested ... or at least willing to help ... re the Windows side.
> Let me know if I can be of any assistance as I have build environments
> set up for a variety of Windows compiler variants.

Andrew's git branch has a squashed copy of HEAD on top of it, so I've
tidied it up and pushed it to git://github.com/ringerc/postgres.git in
the branch parallel_pg_dump
(https://github.com/ringerc/postgres/tree/parallel_pg_dump).

It builds and passes "vcregress check" on VS 2010 / WinSDK 7.1 on Win7.
I haven't had a chance to test the actual parallel dump feature yet;
pending.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: parallel pg_dump

From
Joachim Wieland
Date:
On Mon, Oct 15, 2012 at 5:13 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
>> These did not apply cleanly, but I have fixed them up. The combined diff
>> against git tip is attached. It can also be pulled from my parallel_dump
>> branch on <https://github.com/adunstan/postgresql-dev.git> This builds and
>> runs OK on Linux, which is a start ...
>
> Well, you would also need this piece if you're applying the patch (sometimes
> I forget to do git add ...)

I am attaching rebased versions of Andrew's latest patches for the
parallel pg_dump feature and a separate doc patch.

In the past I used to post two versions of the patch: one that just
prepared the code and moved stuff around without any real functional
change, and one that then added the parallel dump feature on top of the
first, so that the code changes were minimal. Since Andrew's patch is
combined now and that's what I rebased, there is only one patch now.
If anyone wants the two patches again, please let me know.


Joachim

Attachment