Thread: pg_dump directory archive format / parallel pg_dump
Here's a new series of patches for the parallel dump/restore. They need to be applied on top of each other. The parallel pg_dump patch does not yet use the synchronized snapshot functionality from my other patch, so as not to create more dependencies than necessary. (1) pg_dump directory archive format (without checks as requested by Heikki) (2) parallel pg_dump (3) checks for the directory archive format Joachim
Attachment
On Fri, Jan 7, 2011 at 3:18 PM, Joachim Wieland <joe@mcknight.de> wrote: > Here's a new series of patches for the parallel dump/restore. They need to be > applied on top of each other. > This one is the last version of this patch? if so, commitfest app should be updated to reflect that -- Jaime Casanova www.2ndQuadrant.com Professional PostgreSQL: Soporte y capacitación de PostgreSQL
On Mon, Jan 17, 2011 at 5:38 PM, Jaime Casanova <jaime@2ndquadrant.com> wrote: > This one is the last version of this patch? if so, commitfest app > should be updated to reflect that Here are the latest patches all of them also rebased to current HEAD. Will update the commitfest app as well. Joachim
Attachment
On 19.01.2011 07:45, Joachim Wieland wrote: > On Mon, Jan 17, 2011 at 5:38 PM, Jaime Casanova<jaime@2ndquadrant.com> wrote: >> This one is the last version of this patch? if so, commitfest app >> should be updated to reflect that > > Here are the latest patches all of them also rebased to current HEAD. > Will update the commitfest app as well. What's the idea of storing the file sizes in the toc file? It looks like it's not used for anything. It would be nice to have this format match the tar format. At the moment, there's a couple of cosmetic differences: * TOC file is called "TOC", instead of "toc.dat" * blobs TOC file is called "BLOBS.TOC" instead of "blobs.toc" * each blob is stored as "blobs/<oid>.dat", instead of "blob_<oid>.dat" The only significant difference is that in the directory archive format, each data file has a header in the beginning. What are the benefits of the data file header? Would it be better to leave it out, so that the format would be identical to the tar format? You could then just tar up the directory to get a tar archive, or vice versa. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Wed, Jan 19, 2011 at 7:47 AM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: >> Here are the latest patches all of them also rebased to current HEAD. >> Will update the commitfest app as well. > > What's the idea of storing the file sizes in the toc file? It looks like > it's not used for anything. It's part of the overall idea to make sure files are not inadvertently exchanged between different backups and that a file is not truncated. In the future I'd also like to add a checksum to the TOC so that a backup can be checked for integrity. This will cost performance but with the parallel backup it can be distributed to several processors. > It would be nice to have this format match the tar format. At the moment, > there's a couple of cosmetic differences: > > * TOC file is called "TOC", instead of "toc.dat" > > * blobs TOC file is called "BLOBS.TOC" instead of "blobs.toc" > > * each blob is stored as "blobs/<oid>.dat", instead of "blob_<oid>.dat" That can be done easily... > The only significant difference is that in the directory archive format, > each data file has a header in the beginning. > What are the benefits of the data file header? Would it be better to leave > it out, so that the format would be identical to the tar format? You could > then just tar up the directory to get a tar archive, or vice versa. The header is there to identify a file, it contains the header that every other pgdump file contains, including the internal version number and the unique backup id. The tar format doesn't support compression so going from one to the other would only work for an uncompressed archive and special care must be taken to get the order of the tar file right. If you want to drop the header altogether, fine with me but if it's just for the tar <-> directory conversion, then I am failing to see what the use case of that would be. A tar archive has the advantage that you can postprocess the dump data with other tools but for this we could also add an option that gives you only the data part of a dump file (and uncompresses it at the same time if compressed). Once we have that however, the question is what anybody would then still want to use the tar format for... Joachim
On 19.01.2011 16:01, Joachim Wieland wrote: > On Wed, Jan 19, 2011 at 7:47 AM, Heikki Linnakangas > <heikki.linnakangas@enterprisedb.com> wrote: >>> Here are the latest patches all of them also rebased to current HEAD. >>> Will update the commitfest app as well. >> >> What's the idea of storing the file sizes in the toc file? It looks like >> it's not used for anything. > > It's part of the overall idea to make sure files are not inadvertently > exchanged between different backups and that a file is not truncated. > In the future I'd also like to add a checksum to the TOC so that a > backup can be checked for integrity. This will cost performance but > with the parallel backup it can be distributed to several processors. Ok. I'm going to leave out the filesize. I can see some value in that, and the CRC, but I don't want to add stuff that's not used at this point. >> It would be nice to have this format match the tar format. At the moment, >> there's a couple of cosmetic differences: >> >> * TOC file is called "TOC", instead of "toc.dat" >> >> * blobs TOC file is called "BLOBS.TOC" instead of "blobs.toc" >> >> * each blob is stored as "blobs/<oid>.dat", instead of "blob_<oid>.dat" > > That can be done easily... > >> The only significant difference is that in the directory archive format, >> each data file has a header in the beginning. > >> What are the benefits of the data file header? Would it be better to leave >> it out, so that the format would be identical to the tar format? You could >> then just tar up the directory to get a tar archive, or vice versa. > > The header is there to identify a file, it contains the header that > every other pgdump file contains, including the internal version > number and the unique backup id. > > The tar format doesn't support compression so going from one to the > other would only work for an uncompressed archive and special care > must be taken to get the order of the tar file right. Hmm, tar format doesn't support compression, but looks like the file format issue has been thought of already: there's still code there to add .gz suffix for compressed files. How about adopting that convention in the directory format too? That would make an uncompressed directory format compatible with the tar format. That seems pretty attractive anyway, because you can then dump to a directory, and manually gzip the data files later. Now that we have an API for compression in compress_io.c, it probably wouldn't be very hard to implement the missing compression support to tar format either. > If you want to drop the header altogether, fine with me but if it's > just for the tar<-> directory conversion, then I am failing to see > what the use case of that would be. > > A tar archive has the advantage that you can postprocess the dump data > with other tools but for this we could also add an option that gives > you only the data part of a dump file (and uncompresses it at the same > time if compressed). Once we have that however, the question is what > anybody would then still want to use the tar format for... I don't know how popular it'll be in practice, but it seems very nice to me if you can do things like parallel pg_dump in directory format first, and then tar it up to a file for archival. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Thu, Jan 20, 2011 at 6:07 AM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: >> It's part of the overall idea to make sure files are not inadvertently >> exchanged between different backups and that a file is not truncated. >> In the future I'd also like to add a checksum to the TOC so that a >> backup can be checked for integrity. This will cost performance but >> with the parallel backup it can be distributed to several processors. > > Ok. I'm going to leave out the filesize. I can see some value in that, and > the CRC, but I don't want to add stuff that's not used at this point. Okay. >> The header is there to identify a file, it contains the header that >> every other pgdump file contains, including the internal version >> number and the unique backup id. >> >> The tar format doesn't support compression so going from one to the >> other would only work for an uncompressed archive and special care >> must be taken to get the order of the tar file right. > > Hmm, tar format doesn't support compression, but looks like the file format > issue has been thought of already: there's still code there to add .gz > suffix for compressed files. How about adopting that convention in the > directory format too? That would make an uncompressed directory format > compatible with the tar format. So what you could do is dump in the tar format, untar and restore in the directory format. I see that this sounds nice but still I am not sure why someone would dump to the tar format in the first place. But you still cannot go back from the directory archive to the tar archive because the standard command line tar will not respect the order of the objects that pg_restore expects in a tar format, right? > That seems pretty attractive anyway, because you can then dump to a > directory, and manually gzip the data files later. The command line gzip will probably add its own header to the file that pg_restore would need to strip off... This is a valid use case for people who are concerned with a fast dump, usually they would dump uncompressed and later compress the archive. However once we have parallel pg_dump, this advantage vanishes. > Now that we have an API for compression in compress_io.c, it probably > wouldn't be very hard to implement the missing compression support to tar > format either. True, but the question to the advantage of the tar format remains :-) >> A tar archive has the advantage that you can postprocess the dump data >> with other tools but for this we could also add an option that gives >> you only the data part of a dump file (and uncompresses it at the same >> time if compressed). Once we have that however, the question is what >> anybody would then still want to use the tar format for... > > I don't know how popular it'll be in practice, but it seems very nice to me > if you can do things like parallel pg_dump in directory format first, and > then tar it up to a file for archival. Yes, but you cannot pg_restore the archive then if it was created with standard tar, right? Joachim
On 20.01.2011 15:46, Joachim Wieland wrote: > On Thu, Jan 20, 2011 at 6:07 AM, Heikki Linnakangas > <heikki.linnakangas@enterprisedb.com> wrote: >>> The header is there to identify a file, it contains the header that >>> every other pgdump file contains, including the internal version >>> number and the unique backup id. >>> >>> The tar format doesn't support compression so going from one to the >>> other would only work for an uncompressed archive and special care >>> must be taken to get the order of the tar file right. >> >> Hmm, tar format doesn't support compression, but looks like the file format >> issue has been thought of already: there's still code there to add .gz >> suffix for compressed files. How about adopting that convention in the >> directory format too? That would make an uncompressed directory format >> compatible with the tar format. > > So what you could do is dump in the tar format, untar and restore in > the directory format. I see that this sounds nice but still I am not > sure why someone would dump to the tar format in the first place. I'm not sure either. Maybe you want to pipe the output of "pg_dump -F t" via an ssh tunnel to another host, where you untar it, producing a directory format dump. You can then edit the directory format dump, and restore it back to the database without having to tar it again. It gives you a lot of flexibility if the formats are compatible, which is generally good. > But you still cannot go back from the directory archive to the tar > archive because the standard command line tar will not respect the > order of the objects that pg_restore expects in a tar format, right? Hmm, I didn't realize pg_restore requires the files to be in a certain order in the tar file. There's no mention of that in the docs either, we should add that. It doesn't actually require that if you read from a file, but from stdin it does. You can put files in the archive in a certain order if you list them explicitly in the tar command line, like "tar cf backup.tar toc.dat ...". It's hard to know the right order, though. In practice you would need to do "tar tf backup.tar >files" before untarring, and use "files" to tar them again in the right order. >> That seems pretty attractive anyway, because you can then dump to a >> directory, and manually gzip the data files later. > > The command line gzip will probably add its own header to the file > that pg_restore would need to strip off... Yeah, we should write the header too. That's not hard, e.g. gzopen will do that automatically, or you can pass a flag to deflateInit2. >>> A tar archive has the advantage that you can postprocess the dump data >>> with other tools but for this we could also add an option that gives >>> you only the data part of a dump file (and uncompresses it at the same >>> time if compressed). Once we have that however, the question is what >>> anybody would then still want to use the tar format for... >> >> I don't know how popular it'll be in practice, but it seems very nice to me >> if you can do things like parallel pg_dump in directory format first, and >> then tar it up to a file for archival. > > Yes, but you cannot pg_restore the archive then if it was created with > standard tar, right? See above, you can unless you try to pipe it to pg_restore. In fact, that's listed as an advantage of the tar format over other formats in the pg_dump documentation. (I'm working on this, no need to submit a new patch) -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
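For reference, both zlib routes mentioned above come down to something like this (a minimal sketch; the file name, data and helper names are made up for illustration):

#include <string.h>
#include <zlib.h>

/* Option 1: gzopen() writes the gzip header (and trailer) automatically. */
static void
write_gz_file(const char *path, const char *data)
{
    gzFile      fp = gzopen(path, "wb9");   /* e.g. "3456.dat.gz" */

    if (fp == NULL)
        return;                             /* error handling omitted */
    gzwrite(fp, data, (unsigned int) strlen(data));
    gzclose(fp);
}

/* Option 2: deflateInit2() with windowBits = 15 + 16 makes zlib emit
 * gzip framing instead of a raw zlib stream. */
static int
init_gzip_deflate(z_stream *zp, int level)
{
    memset(zp, 0, sizeof(*zp));             /* zalloc/zfree/opaque = defaults */
    return deflateInit2(zp, level, Z_DEFLATED, 15 + 16, 8, Z_DEFAULT_STRATEGY);
}

Either way the files carry a regular gzip header, so the command line gzip/gunzip tools can read them directly.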
On Jan 20, 2011, at 16:22, Heikki Linnakangas wrote: > You can put files in the archive in a certain order if you list them explicitly in the tar command line, like "tar cf backup.tar toc.dat ...". It's hard to know the right order, though. In practice you would need to do "tar tf backup.tar >files" before untarring, and use "files" to tar them again in the right order. Hm, could we create a file in the backup directory which lists the files in the right order? best regards, Florian Pflug
On 20.01.2011 17:22, Heikki Linnakangas wrote: > (I'm working on this, no need to submit a new patch) Ok, here's a heavily refactored version of this (also available at git://git.postgresql.org/git/users/heikki/postgres.git, branch pg_dump_directory). The directory format is now identical to the tar format, except that in the directory format the files can be compressed. Also we don't write the restore.sql file - it would be nice to have, but pg_restore doesn't require it. We can leave that as a TODO. I ended up writing another compression abstraction layer in compress_io.c. It wraps fopen / gzopen etc. in a common API, so that the caller doesn't need to care if the file is compressed or not. In hindsight, the compression API we put in earlier didn't suit us very well. But I guess it wasn't a complete waste, as it moved the gory details of zlib out of the custom format code. If compression is used, the files are created with the .gz suffix, and include the gzip header so that you can manipulate them easily with gzip/gunzip utilities. When reading, we accept files with or without the .gz suffix, and you can have some files compressed and others uncompressed. I haven't updated the documentation yet. There's one UI thing that bothers me. The option to specify the target directory is called --file. But it's clearly not a file. OTOH, I'd hate to introduce a parallel --dir option just for this. Any thoughts on this? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
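To sketch the idea of such a common API (hypothetical names and simplified error handling, not the actual compress_io.c interface):

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <zlib.h>

/* one handle type for both plain and gzip files, so callers don't have
 * to care whether compression is in use */
typedef struct
{
    bool        compressed;
    FILE       *plainfp;
    gzFile      gzfp;
} cfp;

static cfp *
cfopen_write(const char *path, int compression)
{
    cfp        *fp = (cfp *) calloc(1, sizeof(cfp));

    if (fp == NULL)
        return NULL;
    if (compression > 0)
    {
        char        mode[8];

        snprintf(mode, sizeof(mode), "wb%d", compression);
        fp->compressed = true;
        fp->gzfp = gzopen(path, mode);      /* path carries the .gz suffix */
    }
    else
        fp->plainfp = fopen(path, "wb");
    return fp;
}

static int
cfwrite(const void *ptr, int size, cfp *fp)
{
    if (fp->compressed)
        return gzwrite(fp->gzfp, ptr, (unsigned int) size);
    return (int) fwrite(ptr, 1, (size_t) size, fp->plainfp);
}

A matching cfopen_read() could try the file name as given and fall back to the .gz variant, which is how a directory containing a mix of compressed and uncompressed files could be accepted when restoring.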
Attachment
On Fri, Jan 21, 2011 at 4:41 AM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > There's one UI thing that bothers me. The option to specify the target > directory is called --file. But it's clearly not a file. OTOH, I'd hate to > introduce a parallel --dir option just for this. Any thoughts on this? If we were starting over, I'd probably suggest calling the option -o, --output. But since -o is already taken (for --oids) I'd be inclined to just make the help text read: -f, --file=FILENAME output file (or directory) name -F, --format=c|t|p|d output file format (custom, tar, text, dir) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 21.01.2011 15:35, Robert Haas wrote: > On Fri, Jan 21, 2011 at 4:41 AM, Heikki Linnakangas > <heikki.linnakangas@enterprisedb.com> wrote: >> There's one UI thing that bothers me. The option to specify the target >> directory is called --file. But it's clearly not a file. OTOH, I'd hate to >> introduce a parallel --dir option just for this. Any thoughts on this? > > If we were starting over, I'd probably suggest calling the option -o, > --output. But since -o is already taken (for --oids) I'd be inclined > to just make the help text read: > > -f, --file=FILENAME output file (or directory) name > -F, --format=c|t|p|d output file format (custom, tar, text, dir) Ok, that's exactly what the patch does now. I guess it's fine then. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On 01/21/2011 10:34 AM, Heikki Linnakangas wrote: > On 21.01.2011 15:35, Robert Haas wrote: >> On Fri, Jan 21, 2011 at 4:41 AM, Heikki Linnakangas >> <heikki.linnakangas@enterprisedb.com> wrote: >>> There's one UI thing that bothers me. The option to specify the target >>> directory is called --file. But it's clearly not a file. OTOH, I'd >>> hate to >>> introduce a parallel --dir option just for this. Any thoughts on this? >> >> If we were starting over, I'd probably suggest calling the option -o, >> --output. But since -o is already taken (for --oids) I'd be inclined >> to just make the help text read: >> >> -f, --file=FILENAME output file (or directory) name >> -F, --format=c|t|p|d output file format (custom, tar, text, >> dir) > > Ok, that's exactly what the patch does now. I guess it's fine then. > Maybe we could change the hint to say "--file=DESTINATION" or "--file=FILENAME|DIRNAME" ? Just a thought. cheers andrew
Em 21-01-2011 12:47, Andrew Dunstan escreveu: > Maybe we could change the hint to say "--file=DESTINATION" or > "--file=FILENAME|DIRNAME" ? > ... "--file=OUTPUT" or "--file=OUTPUTNAME". -- Euler Taveira de Oliveira http://www.timbira.com/
On 21.01.2011 19:11, Euler Taveira de Oliveira wrote: > Em 21-01-2011 12:47, Andrew Dunstan escreveu: >> Maybe we could change the hint to say "--file=DESTINATION" or >> "--file=FILENAME|DIRNAME" ? >> > ... "--file=OUTPUT" or "--file=OUTPUTNAME". Ok, works for me. I've committed this patch now, with a whole bunch of further fixes. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Wed, Jan 19, 2011 at 12:45 AM, Joachim Wieland <joe@mcknight.de> wrote: > On Mon, Jan 17, 2011 at 5:38 PM, Jaime Casanova <jaime@2ndquadrant.com> wrote: >> This one is the last version of this patch? if so, commitfest app >> should be updated to reflect that > > Here are the latest patches all of them also rebased to current HEAD. > Will update the commitfest app as well. The parallel pg_dump portion of this patch (i.e. the still-uncommitted part) no longer applies. Please rebase. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sun, Jan 30, 2011 at 5:26 PM, Robert Haas <robertmhaas@gmail.com> wrote: > The parallel pg_dump portion of this patch (i.e. the still-uncommitted > part) no longer applies. Please rebase. Here is a rebased version with some minor changes as well. I haven't tested it on Windows now but will do so as soon as the Unix part has been reviewed. Joachim
Attachment
On Wed, Feb 2, 2011 at 13:32, Joachim Wieland <joe@mcknight.de> wrote: > Here is a rebased version with some minor changes as well. I read the patch works as below. Am I understanding correctly? 1. Open all connections in a parent process. 2. Start transactions for each connection in the parent. 3. Spawn child processes with fork(). 4. Each child process uses one of the inherited connections. I think we have 2 important technical issues here: * The consistency is not perfect. Each transaction is started with small delays in step 1, but we cannot guarantee no other transaction between them. * Can we inherit connections to child processes with fork() ? Moreover, we also need to pass running transactions to children. I wonder libpq is designed for such usage. To solve both issues, we might want a way to control visibility in a database server instead of client programs. Don't we need server-side support like [1] before developing parallel dump? [1] http://wiki.postgresql.org/wiki/ClusterFeatures#Export_snapshots_to_other_sessions > I haven't > tested it on Windows now but will do so as soon as the Unix part has > been reviewed. It might be better to remove Windows-specific codes from the first try. I doubt Windows message queue is the best API in such console-based application. I hope we could use the same implementation for all platforms for inter-process/thread communication. -- Itagaki Takahiro
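If that reading is right, the pattern in question boils down to something like this sketch (illustrative only, not the patch's actual code; error handling, result clean-up and waiting for the children are omitted):

#include <stdlib.h>
#include <unistd.h>
#include <libpq-fe.h>

static void
dump_in_parallel(const char *conninfo, int nworkers)
{
    PGconn    **conns = (PGconn **) calloc(nworkers, sizeof(PGconn *));
    int         i;

    /* steps 1 and 2: the parent opens all connections and starts the
     * transactions, one after the other */
    for (i = 0; i < nworkers; i++)
    {
        conns[i] = PQconnectdb(conninfo);
        PQexec(conns[i], "BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE");
    }

    /* steps 3 and 4: fork the workers; each child keeps using "its"
     * inherited connection and the parent must not touch it anymore */
    for (i = 0; i < nworkers; i++)
    {
        if (fork() == 0)
        {
            /* ... the child dumps its share of the objects on conns[i] ... */
            _exit(0);
        }
    }
}

The gap between the BEGINs in the first loop is exactly the consistency window mentioned above.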
On Thu, Feb 3, 2011 at 11:46 PM, Itagaki Takahiro <itagaki.takahiro@gmail.com> wrote: > I think we have 2 important technical issues here: > * The consistency is not perfect. Each transaction is started > with small delays in step 1, but we cannot guarantee no other > transaction between them. This is exactly where the patch for synchronized snapshots comes into play. See https://commitfest.postgresql.org/action/patch_view?id=480 > * Can we inherit connections to child processes with fork() ? > Moreover, we also need to pass running transactions to children. > I wonder libpq is designed for such usage. As far as I know you can inherit sockets to a child process, as long as you make sure that after the fork only one of them, parent or child, uses the socket; the other one should close it. But this wouldn't be an issue with the above-mentioned patch anyway. > It might be better to remove Windows-specific codes from the first try. > I doubt Windows message queue is the best API in such console-based > application. I hope we could use the same implementation for all > platforms for inter-process/thread communication. Windows doesn't support pipes, but offers the message queues to exchange messages. Parallel pg_dump only exchanges messages in the form of "DUMP 39209" or "RESTORE OK 48 23 93", it doesn't exchange any large chunks of binary data, just these small textual messages. The messages also stay within the same process, they are just sent between the different threads. The windows part worked just fine when I tested it last time. Do you have any other technology in mind that you think is better suited? Joachim
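To make that message traffic concrete, a minimal sketch of such a protocol over a plain pipe (message texts and function names are illustrative, not the patch's actual ones):

#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* master side: hand a job to a worker as a short text message */
static void
master_send_job(int fd_to_worker, int dumpid)
{
    char        msg[64];

    snprintf(msg, sizeof(msg), "DUMP %d\n", dumpid);
    write(fd_to_worker, msg, strlen(msg));
}

/* worker side: read commands, do the work, report a status line back */
static void
worker_loop(int fd_from_master, int fd_to_master)
{
    char        cmd[64];
    ssize_t     n;

    while ((n = read(fd_from_master, cmd, sizeof(cmd) - 1)) > 0)
    {
        cmd[n] = '\0';
        if (strncmp(cmd, "DUMP ", 5) == 0)
        {
            /* ... dump the object with that dump id over this worker's
             * own database connection ... */
            const char *status = "OK 0\n";

            write(fd_to_master, status, strlen(status));
        }
        else if (strncmp(cmd, "TERMINATE", 9) == 0)
            break;
    }
}

Nothing in such a protocol depends on the transport, so on Windows the same short strings could travel over a message queue between threads, or over whatever pipe emulation is available.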
On Sat, Feb 5, 2011 at 04:50, Joachim Wieland <joe@mcknight.de> wrote: > On Thu, Feb 3, 2011 at 11:46 PM, Itagaki Takahiro > <itagaki.takahiro@gmail.com> wrote: >> It might be better to remove Windows-specific codes from the first try. >> I doubt Windows message queue is the best API in such console-based >> application. I hope we could use the same implementation for all >> platforms for inter-process/thread communication. > > Windows doesn't support pipes, but offers the message queues to > exchange messages. Parallel pg_dump only exchanges messages in the > form of "DUMP 39209" or "RESTORE OK 48 23 93", it doesn't exchange any > large chunks of binary data, just these small textual messages. The > messages also stay within the same process, they are just sent between > the different threads. The windows part worked just fine when I tested > it last time. Do you have any other technology in mind that you think > is better suited? Haven't been following this thread in detail or read the code... But our /port directory contains a pipe() implementation for Windows, that's used for the syslogger at least. Look in the code for pgpipe(). If using that one works, then that should probably be used rather than something completely custom. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
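For reference, the kind of portability wrapper being suggested could look like this (a sketch; the real pgpipe() in src/port may differ in its details):

#ifndef WIN32
#include <unistd.h>
#define pgpipe(fds) pipe(fds)
#else
/* on Windows this is emulated, e.g. with a local socket pair */
extern int pgpipe(int handles[2]);
#endif

With a wrapper like that, the Unix and Windows sides could share the same read/write logic for the worker messages.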
On Tue, Feb 1, 2011 at 11:32 PM, Joachim Wieland <joe@mcknight.de> wrote: > On Sun, Jan 30, 2011 at 5:26 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> The parallel pg_dump portion of this patch (i.e. the still-uncommitted >> part) no longer applies. Please rebase. > > Here is a rebased version with some minor changes as well. I haven't > tested it on Windows now but will do so as soon as the Unix part has > been reviewed. > code review: something i found, and is a very simple one, is this warning (there's a similar issue in _StartMasterParallel with the buf variable) """ pg_backup_directory.c: In function ‘_EndMasterParallel’: pg_backup_directory.c:856: warning: ‘status’ may be used uninitialized in this function """ i guess the huge amount of info the patch is showing is just for debugging and will be removed before commit, right? functional review: it works good most of the time, just a few points: - if i interrupt the process the connections stay, i guess it could catch the signal and finish the connections - if i have an exclusive lock on a table and a worker starts dumping it, it fails because it can't take the lock but it just says "it was ok" and i would prefer an error -- Jaime Casanova www.2ndQuadrant.com Professional PostgreSQL: Soporte y capacitación de PostgreSQL
On Sun, Feb 6, 2011 at 2:12 PM, Jaime Casanova <jaime@2ndquadrant.com> wrote: > On Tue, Feb 1, 2011 at 11:32 PM, Joachim Wieland <joe@mcknight.de> wrote: >> On Sun, Jan 30, 2011 at 5:26 PM, Robert Haas <robertmhaas@gmail.com> wrote: >>> The parallel pg_dump portion of this patch (i.e. the still-uncommitted >>> part) no longer applies. Please rebase. >> >> Here is a rebased version with some minor changes as well. I haven't >> tested it on Windows now but will do so as soon as the Unix part has >> been reviewed. >> > > code review: > ah! two other things i forgot: - there is no docs - pg_dump and pg_restore are inconsistent: pg_dump requires the directory to be provided with the -f option: pg_dump -Fd -f dir_dump pg_restore pass the directory as an argument for -Fd: pg_restore -Fd dir_dump -- Jaime Casanova www.2ndQuadrant.com Professional PostgreSQL: Soporte y capacitación de PostgreSQL
Hi Jaime, thanks for your review! On Sun, Feb 6, 2011 at 2:12 PM, Jaime Casanova <jaime@2ndquadrant.com> wrote: > code review: > > something i found, and is a very simple one, is this warning (there's > a similar issue in _StartMasterParallel with the buf variable) > """ > pg_backup_directory.c: In function ‘_EndMasterParallel’: > pg_backup_directory.c:856: warning: ‘status’ may be used uninitialized > in this function > """ Cool. My compiler didn't tell me about this. > i guess the huge amount of info the patch is showing is just for > debugging and will be removed before commit, right? That's right. > functional review: > > it works good most of the time, just a few points: > - if i interrupt the process the connections stay, i guess it could > catch the signal and finish the connections Hm, well, recovering gracefully from errors could be improved. In your example you would signal the children implicitly because the parent process dies and the pipes to the children would get broken as well. Of course the parent could more actively terminate the children but it might not be the best option to just kill them, as then there will be a lot of "unexpected EOF" connections in the log. So if an error condition comes up in the parent (as in your example, because you canceled the process), then ideally the parent should signal the children with a non-lethal signal and the children should catch this "please terminate" signal and exit cleanly but as soon as possible. If the error case comes up at the child however, then we'd need to make sure that the user sees the error message from the child. This should work well as-is but currently it could happen that the parent exits before all of the children have exited. I'll investigate this a bit... > - if i have an exclusive lock on a table and a worker starts dumping > it, it fails because it can't take the lock but it just says "it was > ok" and i would prefer an error I'm getting a clear pg_dump: [Archivierer] could not lock table public.c: ERROR: could not obtain lock on relation "c" but I'll look into this as well. Regarding your other post: > - there is no docs True... > - pg_dump and pg_restore are inconsistent: > pg_dump requires the directory to be provided with the -f option: > pg_dump -Fd -f dir_dump > pg_restore pass the directory as an argument for -Fd: pg_restore -Fd dir_dump Well, this is there with pg_dump and pg_restore currently as well. -F is the switch for the format and it just takes "d" as the format. The dir_dump is an argument without any switch. See the output for the --help switches: Usage: pg_dump [OPTION]... [DBNAME] Usage: pg_restore [OPTION]... [FILE] So in either case you don't need to give a switch for what you have. If you run pg_dump you don't give the switch for the database but you need to give it for the output (-f) and with pg_restore you don't give a switch for the file that you're restoring but you'd need to give -d for restoring to a database. Joachim
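A sketch of that "please terminate" handshake might look like this (signal choice and function names are illustrative, not the patch's code):

#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* worker side: only remember the request and act on it at a safe point */
static volatile sig_atomic_t wantAbort = 0;

static void
sigterm_handler(int signum)
{
    (void) signum;
    wantAbort = 1;
}

static void
worker_setup_signals(void)
{
    signal(SIGTERM, sigterm_handler);
}

/* to be called between work items or inside long-running loops */
static void
checkAborting(void)
{
    if (wantAbort)
    {
        /* close this worker's database connection cleanly here */
        exit(1);
    }
}

/* parent side: on error, ask all workers to stop and wait for them,
 * so the parent never exits before its children */
static void
shutdown_workers(pid_t *pids, int nworkers)
{
    int         i;

    for (i = 0; i < nworkers; i++)
        kill(pids[i], SIGTERM);
    for (i = 0; i < nworkers; i++)
        waitpid(pids[i], NULL, 0);
}

The same structure should also cover the error-in-a-child case: the parent collects the exit status and the child's last status message before it exits itself.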
On Mon, Feb 7, 2011 at 10:42 PM, Joachim Wieland <joe@mcknight.de> wrote: >> i guess the huge amount of info is showing the patch is just for >> debugging and will be removed before commit, right? > > That's right. So how close are we to having a committable version of this? Should we push this out to 9.2? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Feb 8, 2011 at 13:34, Robert Haas <robertmhaas@gmail.com> wrote: > So how close are we to having a committable version of this? Should > we push this out to 9.2? I think so. The feature is pretty attractive, but more works are required: * Re-base on synchronized snapshots patch * Consider to use pipe also on Windows. * Research libpq + fork() issue. We have a warning in docs: http://developer.postgresql.org/pgdocs/postgres/libpq-connect.html | On Unix, forking a process with open libpq connections can lead to unpredictable results -- Itagaki Takahiro
On Tue, Feb 8, 2011 at 8:31 PM, Itagaki Takahiro <itagaki.takahiro@gmail.com> wrote: > On Tue, Feb 8, 2011 at 13:34, Robert Haas <robertmhaas@gmail.com> wrote: >> So how close are we to having a committable version of this? Should >> we push this out to 9.2? > > I think so. The feature is pretty attractive, but more works are required: > * Re-base on synchronized snapshots patch > * Consider to use pipe also on Windows. > * Research libpq + fork() issue. We have a warning in docs: > http://developer.postgresql.org/pgdocs/postgres/libpq-connect.html > | On Unix, forking a process with open libpq connections can lead to > unpredictable results Just for the records, once the sync snapshot patch is committed, there is no need to do fancy libpq + fork() combinations anyway. Unfortunately, so far no committer has commented on the synchronized snapshot patch at all. I am not fighting for getting parallel pg_dump done in 9.1, as I don't really have a personal use case for the patch. However it would be the irony of the year if we shipped 9.1 with a synchronized snapshot patch but no parallel dump :-) Joachim
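For illustration, the kind of flow the synchronized snapshot patch is meant to enable, with plain libpq and no fork() tricks (the function name and SET syntax below are placeholders taken from that discussion, not a committed interface; error handling omitted):

#include <stdio.h>
#include <libpq-fe.h>

static void
dump_with_shared_snapshot(const char *conninfo)
{
    PGconn     *master = PQconnectdb(conninfo);
    PGconn     *worker = PQconnectdb(conninfo);     /* opened independently */
    PGresult   *res;
    char        sql[256];

    PQexec(master, "BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ");
    /* hypothetical call exporting an identifier for the master's snapshot */
    res = PQexec(master, "SELECT pg_export_snapshot()");
    if (PQresultStatus(res) == PGRES_TUPLES_OK)
    {
        const char *snapshot_id = PQgetvalue(res, 0, 0);

        PQexec(worker, "BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ");
        snprintf(sql, sizeof(sql), "SET TRANSACTION SNAPSHOT '%s'", snapshot_id);
        PQexec(worker, sql);
        /* both connections now see the same snapshot and can dump in parallel */
    }
    PQclear(res);
    PQfinish(worker);
    PQfinish(master);
}

Each worker gets its own ordinary connection this way, which also sidesteps the libpq-after-fork warning quoted above.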
On Tue, Feb 8, 2011 at 10:54 PM, Joachim Wieland <joe@mcknight.de> wrote: > On Tue, Feb 8, 2011 at 8:31 PM, Itagaki Takahiro > <itagaki.takahiro@gmail.com> wrote: >> On Tue, Feb 8, 2011 at 13:34, Robert Haas <robertmhaas@gmail.com> wrote: >>> So how close are we to having a committable version of this? Should >>> we push this out to 9.2? >> >> I think so. The feature is pretty attractive, but more works are required: >> * Re-base on synchronized snapshots patch >> * Consider to use pipe also on Windows. >> * Research libpq + fork() issue. We have a warning in docs: >> http://developer.postgresql.org/pgdocs/postgres/libpq-connect.html >> | On Unix, forking a process with open libpq connections can lead to >> unpredictable results > > Just for the records, once the sync snapshot patch is committed, there > is no need to do fancy libpq + fork() combinations anyway. > Unfortunately, so far no committer has commented on the synchronized > snapshot patch at all. > > I am not fighting for getting parallel pg_dump done in 9.1, as I don't > really have a personal use case for the patch. However it would be the > irony of the year if we shipped 9.1 with a synchronized snapshot patch > but no parallel dump :-) True. But it looks like there are some outstanding items from previous reviews that you've yet to address, which makes pushing it out seem fairly reasonable... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company