Thread: parallel pg_dump
I haven't finished reviewing this yet, but there are some things that need to be fixed.

First, either the creation of the destination directory needs to be delayed until all the sanity checks have passed and we're sure we're actually going to write something there, or the directory needs to be removed if we exit with an error before anything gets written there. Example: if there's an error because I am dumping a 9.1 server and so should have specified --no-synchronized-snapshots, then getting the directory as a by-product, which I then need to remove, is annoying.

Maybe pg_dump -F d should be prepared to accept an empty directory as well as a non-existent directory, just as initdb can. Maybe this isn't directly related to this patch, but I have noticed it more while reviewing this patch.

Second, all the PrintStatus traces are annoying and need to be removed, or perhaps better, only output in debugging mode (using ahlog() instead of just printf()).

cheers

andrew
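The rule suggested here is to accept a target directory that either does not exist or exists but is empty. A minimal POSIX sketch of that check, assuming opendir()/readdir() are available; the helper name check_output_dir() and its return codes are hypothetical, and this is not the code from the patch or from initdb.

```c
#include <sys/types.h>
#include <dirent.h>
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper: classify a prospective -F d output directory.
 *   0  -> path does not exist (pg_dump may create it)
 *   1  -> path exists and is empty (acceptable, as initdb allows)
 *   2  -> path exists and has entries (refuse to dump into it)
 *  -1  -> any other error (not a directory, permission denied, ...)
 */
static int
check_output_dir(const char *path)
{
    DIR           *dir = opendir(path);
    struct dirent *de;
    int            result = 1;      /* assume empty until an entry is seen */

    if (dir == NULL)
        return (errno == ENOENT) ? 0 : -1;

    for (;;)
    {
        errno = 0;                  /* readdir() reports errors only via errno */
        de = readdir(dir);
        if (de == NULL)
        {
            if (errno != 0)
                result = -1;        /* read error, not end of directory */
            break;
        }
        /* "." and ".." are always present and do not count as contents */
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        result = 2;                 /* found a real entry */
        break;
    }

    closedir(dir);
    return result;
}
```

Under this classification, pg_dump -F d would create the directory only for result 0, reuse it for 1, and stop for 2 or -1. Running such a check only after the other sanity checks, as suggested above, would also avoid leaving an empty directory behind on early errors.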
On Tue, Apr 3, 2012 at 9:26 AM, Andrew Dunstan <andrew@dunslane.net> wrote:
> First, either the creation of the destination directory needs to be delayed
> until all the sanity checks have passed and we're sure we're actually going
> to write something there, or it needs to be removed if we error exit before
> anything gets written there.

pg_dump also creates empty files, which is the analogous case here. Just try to dump a nonexistent database, for example (this also shows that delaying won't help...).

> Maybe pg_dump -F d should be prepared to accept an empty directory as well as a
> non-existent directory, just as initdb can.

That sounds like a good compromise. I'll implement that.

> Second, all the PrintStatus traces are annoying and need to be removed, or
> perhaps better only output in debugging mode (using ahlog() instead of just
> printf())

Sure, PrintStatus is just there for now to see what's going on. My plan was to remove it entirely in the final patch.

Joachim
On 04/04/2012 05:03 AM, Joachim Wieland wrote:
>> Second, all the PrintStatus traces are annoying and need to be removed, or
>> perhaps better only output in debugging mode (using ahlog() instead of just
>> printf())
> Sure, PrintStatus is just there for now to see what's going on. My
> plan was to remove it entirely in the final patch.
>

We need that final patch NOW, I think. There is very little time for this before it will be too late for 9.2.

cheers

andrew
On Wed, Apr 4, 2012 at 8:27 AM, Andrew Dunstan <andrew@dunslane.net> wrote:
>> Sure, PrintStatus is just there for now to see what's going on. My
>> plan was to remove it entirely in the final patch.
>
> We need that final patch NOW, I think. There is very little time for this
> before it will be too late for 9.2.

Here are updated patches:

- An empty directory for the directory archive format is okay now.
- Removed PrintStatus().

Let me know if you need anything else.
Excerpts from Joachim Wieland's message of Wed Apr 04 15:43:53 -0300 2012:
> On Wed, Apr 4, 2012 at 8:27 AM, Andrew Dunstan <andrew@dunslane.net> wrote:
> >> Sure, PrintStatus is just there for now to see what's going on. My
> >> plan was to remove it entirely in the final patch.
> >
> > We need that final patch NOW, I think. There is very little time for this
> > before it will be too late for 9.2.
>
> Here are updated patches:
>
> - An empty directory for the directory archive format is okay now.
> - Removed PrintStatus().

In general I'm not so sure that removing debugging printouts is the best thing to do. They might be helpful if we continue to rework this code in the future. How about a #define that turns them into empty statements instead, for example?

I didn't read carefully to see if the PrintStatus() calls are reasonable to keep, though.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
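The #define idea could look like the following C89-compatible sketch, in which the traces compile away unless explicitly enabled. PRINT_STATUS, PARALLEL_DUMP_DEBUG and report_entry_done() are hypothetical names, and printf() is used instead of pg_dump's ahlog() so as not to assume anything about ahlog()'s signature. The double parentheses pass a whole printf() argument list as a single macro argument, which avoids needing C99 variadic macros.

```c
#include <stdio.h>

/*
 * Hypothetical debug switch: build with -DPARALLEL_DUMP_DEBUG to get the
 * status traces.  Otherwise the macro expands to an empty expression and
 * its arguments are never evaluated.
 *
 * Call sites need double parentheses: PRINT_STATUS(("worker %d done\n", id));
 */
#ifdef PARALLEL_DUMP_DEBUG
#define PRINT_STATUS(args) ((void) printf args)
#else
#define PRINT_STATUS(args) ((void) 0)
#endif

/* Hypothetical call site: a worker reporting that it finished one entry. */
static void
report_entry_done(int worker_id, const char *entry_name)
{
    PRINT_STATUS(("worker %d finished %s\n", worker_id, entry_name));
}
```

One consequence of this design: in a non-debug build the arguments are not evaluated at all, so they must not contain side effects the surrounding code relies on.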
So here's a pg_dump benchmark from a real-world database, as requested earlier. This is a ~750 GB 9.0.6 database, and the backup was taken over the internal network from a different machine. Both machines run Linux.

I am attaching a chart that shows the table size distribution of the largest tables and the overall pg_dump runtime. The resulting (zlib-compressed) dump directory was 28 GB.

Here are the raw numbers:

           real          user          sys
-Fc dump   168m58.005s   146m29.175s   7m1.113s
-j 2       90m6.152s     155m23.887s   15m15.521s
-j 3       61m5.787s     155m33.118s   13m24.618s
-j 4       44m16.757s    155m25.917s   13m13.599s
-j 6       36m11.743s    156m30.794s   12m39.029s
-j 8       36m16.662s    154m37.495s   11m47.141s
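To read these numbers as speedups, each real (wall-clock) time can be converted to seconds and divided into the serial -Fc time. A small sketch; parse_time() and speedup() are hypothetical helpers that assume the "<minutes>m<seconds>s" format used in the numbers above.

```c
#include <stdio.h>

/*
 * Convert a time reading such as "168m58.005s" to seconds.
 * Returns -1.0 if the string does not match the "<min>m<sec>s" format.
 */
static double
parse_time(const char *s)
{
    int    minutes;
    double seconds;

    if (sscanf(s, "%dm%lfs", &minutes, &seconds) != 2)
        return -1.0;
    return minutes * 60.0 + seconds;
}

/* Wall-clock speedup of a parallel run over the serial run. */
static double
speedup(const char *serial_real, const char *parallel_real)
{
    return parse_time(serial_real) / parse_time(parallel_real);
}
```

Applied to the numbers above, this gives about 1.9x for -j 2, 2.8x for -j 3, 3.8x for -j 4 and about 4.7x for both -j 6 and -j 8, so the gain levels off between 6 and 8 jobs while the parallel runs' user time stays near 155 minutes.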
On 04/05/2012 12:32 PM, Joachim Wieland wrote:
> So here's a pg_dump benchmark from a real world database as requested
> earlier. This is a ~750 GB large 9.0.6 database, and the backup has
> been done over the internal network from a different machine. Both
> machines run Linux.
>
> I am attaching a chart that shows the table size distribution of the
> largest tables and the overall pg_dump runtime. The resulting (zlib
> compressed) dump directory was 28 GB.
>
> Here are the raw numbers:
>
> -Fc dump
> real 168m58.005s
> user 146m29.175s
> sys 7m1.113s
>
> -j 2
> real 90m6.152s
> user 155m23.887s
> sys 15m15.521s
>
> -j 3
> real 61m5.787s
> user 155m33.118s
> sys 13m24.618s
>
> -j 4
> real 44m16.757s
> user 155m25.917s
> sys 13m13.599s
>
> -j 6
> real 36m11.743s
> user 156m30.794s
> sys 12m39.029s
>
> -j 8
> real 36m16.662s
> user 154m37.495s
> sys 11m47.141s

Interesting numbers. Any details on the network speed between the boxes, the number of cores, the uncompressed size of the dump, and what the apparent bottleneck was?

Stefan
On Wed, Apr 4, 2012 at 2:43 PM, Joachim Wieland <joe@mcknight.de> wrote:
> Here are updated patches:
>
> - An empty directory for the directory archive format is okay now.
> - Removed PrintStatus().

Attached is a rebased version of the parallel pg_dump patch.
On Mon, Jun 18, 2012 at 10:05 PM, Joachim Wieland <joe@mcknight.de> wrote:
> Attached is a rebased version of the parallel pg_dump patch.

Attached is another rebased version for the current commitfest.
On 09/17/2012 10:01 PM, Joachim Wieland wrote:
> On Mon, Jun 18, 2012 at 10:05 PM, Joachim Wieland <joe@mcknight.de> wrote:
>> Attached is a rebased version of the parallel pg_dump patch.
> Attached is another rebased version for the current commitfest.

These did not apply cleanly, but I have fixed them up. The combined diff against git tip is attached. It can also be pulled from my parallel_dump branch on <https://github.com/adunstan/postgresql-dev.git>. This builds and runs OK on Linux, which is a start ...

cheers

andrew
On 10/13/2012 10:46 PM, Andrew Dunstan wrote:
>
> On 09/17/2012 10:01 PM, Joachim Wieland wrote:
>> On Mon, Jun 18, 2012 at 10:05 PM, Joachim Wieland <joe@mcknight.de>
>> wrote:
>>> Attached is a rebased version of the parallel pg_dump patch.
>> Attached is another rebased version for the current commitfest.
>
> These did not apply cleanly, but I have fixed them up. The combined
> diff against git tip is attached. It can also be pulled from my
> parallel_dump branch on
> <https://github.com/adunstan/postgresql-dev.git> This builds and runs
> OK on Linux, which is a start ...
>

Well, you would also need this piece if you're applying the patch (sometimes I forget to do git add ...)

cheers

andrew
Hi,

On 2012-10-15 17:13:10 -0400, Andrew Dunstan wrote:
>
> On 10/13/2012 10:46 PM, Andrew Dunstan wrote:
> >
> >On 09/17/2012 10:01 PM, Joachim Wieland wrote:
> >>On Mon, Jun 18, 2012 at 10:05 PM, Joachim Wieland <joe@mcknight.de>
> >>wrote:
> >>>Attached is a rebased version of the parallel pg_dump patch.
> >>Attached is another rebased version for the current commitfest.
> >
> >These did not apply cleanly, but I have fixed them up. The combined diff
> >against git tip is attached. It can also be pulled from my parallel_dump
> >branch on <https://github.com/adunstan/postgresql-dev.git> This builds and
> >runs OK on Linux, which is a start ...
>
> Well, you would also need this piece if you're applying the patch (sometimes
> I forget to do git add ...)

The patch is marked as Ready for Committer in the CF app, but at least the whole Windows situation seems to be unresolved as of yet?

Is anybody working on this? I would *love* to get this...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 12/08/2012 11:01 AM, Andres Freund wrote:
> Hi,
>
> On 2012-10-15 17:13:10 -0400, Andrew Dunstan wrote:
>> On 10/13/2012 10:46 PM, Andrew Dunstan wrote:
>>> On 09/17/2012 10:01 PM, Joachim Wieland wrote:
>>>> On Mon, Jun 18, 2012 at 10:05 PM, Joachim Wieland <joe@mcknight.de>
>>>> wrote:
>>>>> Attached is a rebased version of the parallel pg_dump patch.
>>>> Attached is another rebased version for the current commitfest.
>>> These did not apply cleanly, but I have fixed them up. The combined diff
>>> against git tip is attached. It can also be pulled from my parallel_dump
>>> branch on <https://github.com/adunstan/postgresql-dev.git> This builds and
>>> runs OK on Linux, which is a start ...
>> Well, you would also need this piece if you're applying the patch (sometimes
>> I forget to do git add ...)
> The patch is marked as Ready for Committer in the CF app, but at least
> the whole windows situation seems to be unresolved as of yet?
>
> Is anybody working on this? I would *love* to get this...
>

I am working on it when I get a chance, but keep getting hammered. I'd love somebody else to review it too.

cheers

andrew
On Sat, Dec 8, 2012 at 11:13:30AM -0500, Andrew Dunstan wrote:
>
> On 12/08/2012 11:01 AM, Andres Freund wrote:
> >Hi,
> >
> >On 2012-10-15 17:13:10 -0400, Andrew Dunstan wrote:
> >>On 10/13/2012 10:46 PM, Andrew Dunstan wrote:
> >>>On 09/17/2012 10:01 PM, Joachim Wieland wrote:
> >>>>On Mon, Jun 18, 2012 at 10:05 PM, Joachim Wieland <joe@mcknight.de>
> >>>>wrote:
> >>>>>Attached is a rebased version of the parallel pg_dump patch.
> >>>>Attached is another rebased version for the current commitfest.
> >>>These did not apply cleanly, but I have fixed them up. The combined diff
> >>>against git tip is attached. It can also be pulled from my parallel_dump
> >>>branch on <https://github.com/adunstan/postgresql-dev.git> This builds and
> >>>runs OK on Linux, which is a start ...
> >>Well, you would also need this piece if you're applying the patch (sometimes
> >>I forget to do git add ...)
> >The patch is marked as Ready for Committer in the CF app, but at least
> >the whole windows situation seems to be unresolved as of yet?
> >
> >Is anybody working on this? I would *love* to get this...
> >
>
> I am working on it when I get a chance, but keep getting hammered.
> I'd love somebody else to review it too.

FYI, I will be posting pg_upgrade performance numbers using Unix processes. I will try to get the Windows code working but will also need help.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +
On Sat, Dec 8, 2012 at 3:05 PM, Bruce Momjian wrote:
> On Sat, Dec 8, 2012 at 11:13:30AM -0500, Andrew Dunstan wrote:
> > I am working on it when I get a chance, but keep getting hammered.
> > I'd love somebody else to review it too.
>
> FYI, I will be posting pg_upgrade performance numbers using Unix
> processes. I will try to get the Windows code working but will also
> need help.

Just let me know if there's anything I can help you guys with.

Joachim
On 12/09/2012 04:05 AM, Bruce Momjian wrote:
>
> FYI, I will be posting pg_upgrade performance numbers using Unix
> processes. I will try to get the Windows code working but will also
> need help.

I'm interested ... or at least willing to help ... re the Windows side. Let me know if I can be of any assistance, as I have build environments set up for a variety of Windows compiler variants.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 01/21/2013 06:02 PM, Craig Ringer wrote:
> On 12/09/2012 04:05 AM, Bruce Momjian wrote:
>>
>> FYI, I will be posting pg_upgrade performance numbers using Unix
>> processes. I will try to get the Windows code working but will also
>> need help.
> I'm interested ... or at least willing to help ... re the Windows side.
> Let me know if I can be of any assistance as I have build environments
> set up for a variety of Windows compiler variants.

Andrew's git branch has a squashed copy of HEAD on top of it, so I've tidied it up and pushed it to git://github.com/ringerc/postgres.git in the branch parallel_pg_dump (https://github.com/ringerc/postgres/tree/parallel_pg_dump).

It builds and passes "vcregress check" on VS 2010 / WinSDK 7.1 on Win7. I haven't had a chance to test the actual parallel dump feature yet; pending.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Oct 15, 2012 at 5:13 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
>> These did not apply cleanly, but I have fixed them up. The combined diff
>> against git tip is attached. It can also be pulled from my parallel_dump
>> branch on <https://github.com/adunstan/postgresql-dev.git> This builds and
>> runs OK on Linux, which is a start ...
>
> Well, you would also need this piece if you're applying the patch (sometimes
> I forget to do git add ...)

I am attaching rebased versions of Andrew's latest patches for the parallel pg_dump feature and a separate doc patch.

In the past I used to post two versions of the patch: one that just prepared the code and moved stuff around without any real functional change, and one that then added the parallel dump feature on top of the first, so that the code changes were minimal. As Andrew's patch is combined now, and since that's what I rebased, it's only one part now. If anyone wants the two patches again, please let me know.

Joachim