Thread: Non-text mode for pg_dumpall
Tom and Nathan opined recently that providing for non-text mode for pg_dumpall would be a Good Thing (TM). Not having it has been a long-standing complaint, so I've decided to give it a go. I think we would need to restrict it to directory mode, at least to begin with. I would have a toc.dat with a different magic block (say "PGGLO" instead of "PGDMP") containing the global entries (roles, tablespaces, databases). Then for each database there would be a subdirectory (named for its toc entry) with a standard directory mode dump for that database. These could be generated in parallel (possibly by pg_dumpall calling pg_dump for each database). pg_restore, on detecting a global type toc.dat, would restore the globals and then each of the databases (again possibly in parallel). I'm sure there are many wrinkles I haven't thought of, but I don't see any insurmountable obstacles, just a significant amount of code. Barring the unforeseen, my aim is to have a preliminary patch by the September CF. Following that I would turn my attention to using it in pg_upgrade. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
On Mon, Jun 10, 2024 at 08:58:49AM -0400, Andrew Dunstan wrote: > Tom and Nathan opined recently that providing for non-text mode for > pg_dumpall would be a Good Thing (TM). Not having it has been a > long-standing complaint, so I've decided to give it a go. Thank you! > I think we would need to restrict it to directory mode, at least to begin > with. I would have a toc.dat with a different magic block (say "PGGLO" > instead of "PGDMP") containing the global entries (roles, tablespaces, > databases). Then for each database there would be a subdirectory (named for > its toc entry) with a standard directory mode dump for that database. These > could be generated in parallel (possibly by pg_dumpall calling pg_dump for > each database). pg_restore on detecting a global type toc.data would restore > the globals and then each of the databases (again possibly in parallel). I'm curious why we couldn't also support the "custom" format. > Following that I would turn my attention to using it in pg_upgrade. +1 -- nathan
On 2024-06-10 Mo 10:14, Nathan Bossart wrote: > On Mon, Jun 10, 2024 at 08:58:49AM -0400, Andrew Dunstan wrote: >> Tom and Nathan opined recently that providing for non-text mode for >> pg_dumpall would be a Good Thing (TM). Not having it has been a >> long-standing complaint, so I've decided to give it a go. > Thank you! > >> I think we would need to restrict it to directory mode, at least to begin >> with. I would have a toc.dat with a different magic block (say "PGGLO" >> instead of "PGDMP") containing the global entries (roles, tablespaces, >> databases). Then for each database there would be a subdirectory (named for >> its toc entry) with a standard directory mode dump for that database. These >> could be generated in parallel (possibly by pg_dumpall calling pg_dump for >> each database). pg_restore on detecting a global type toc.data would restore >> the globals and then each of the databases (again possibly in parallel). > I'm curious why we couldn't also support the "custom" format. We could, but the housekeeping would be a bit harder. We'd need to keep pointers to the offsets of the per-database TOCs (I don't want to have a single per-cluster TOC). And we can't produce it in parallel, so I'd rather start with something we can produce in parallel. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
On Mon, Jun 10, 2024 at 4:14 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
On Mon, Jun 10, 2024 at 08:58:49AM -0400, Andrew Dunstan wrote:
> Tom and Nathan opined recently that providing for non-text mode for
> pg_dumpall would be a Good Thing (TM). Not having it has been a
> long-standing complaint, so I've decided to give it a go.
Thank you!
Indeed, this has been quite annoying!
> I think we would need to restrict it to directory mode, at least to begin
> with. I would have a toc.dat with a different magic block (say "PGGLO"
> instead of "PGDMP") containing the global entries (roles, tablespaces,
> databases). Then for each database there would be a subdirectory (named for
> its toc entry) with a standard directory mode dump for that database. These
> could be generated in parallel (possibly by pg_dumpall calling pg_dump for
> each database). pg_restore on detecting a global type toc.data would restore
> the globals and then each of the databases (again possibly in parallel).
I'm curious why we couldn't also support the "custom" format.
Or maybe even a combo - a directory of custom format files? Plus that one special file being globals? I'd say that's what most use cases I've seen would prefer.
On Mon, Jun 10, 2024 at 10:51:42AM -0400, Andrew Dunstan wrote: > On 2024-06-10 Mo 10:14, Nathan Bossart wrote: >> I'm curious why we couldn't also support the "custom" format. > > We could, but the housekeeping would be a bit harder. We'd need to keep > pointers to the offsets of the per-database TOCs (I don't want to have a > single per-cluster TOC). And we can't produce it in parallel, so I'd rather > start with something we can produce in parallel. Got it. -- nathan
On Mon, Jun 10, 2024 at 04:52:06PM +0200, Magnus Hagander wrote: > On Mon, Jun 10, 2024 at 4:14 PM Nathan Bossart <nathandbossart@gmail.com> > wrote: >> I'm curious why we couldn't also support the "custom" format. > > Or maybe even a combo - a directory of custom format files? Plus that one > special file being globals? I'd say that's what most use cases I've seen > would prefer. Is there a particular advantage to that approach as opposed to just using "directory" mode for everything? I know pg_upgrade uses "custom" mode for each of the databases, so a combo approach would be a closer match to the existing behavior, but that doesn't strike me as an especially strong reason to keep doing it that way. -- nathan
On Mon, Jun 10, 2024 at 5:03 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
On Mon, Jun 10, 2024 at 04:52:06PM +0200, Magnus Hagander wrote:
> On Mon, Jun 10, 2024 at 4:14 PM Nathan Bossart <nathandbossart@gmail.com>
> wrote:
>> I'm curious why we couldn't also support the "custom" format.
>
> Or maybe even a combo - a directory of custom format files? Plus that one
> special file being globals? I'd say that's what most use cases I've seen
> would prefer.
Is there a particular advantage to that approach as opposed to just using
"directory" mode for everything? I know pg_upgrade uses "custom" mode for
each of the databases, so a combo approach would be a closer match to the
existing behavior, but that doesn't strike me as an especially strong
reason to keep doing it that way.
A gazillion files to deal with? Much easier to work with individual custom files if you're moving databases around and things like that. Much easier to monitor eg sizes/dates if you're using it for backups.
It's not things that are make-it-or-break-it or anything, but there are some smaller things that definitely can be useful.
On Mon, Jun 10, 2024 at 05:45:19PM +0200, Magnus Hagander wrote: > On Mon, Jun 10, 2024 at 5:03 PM Nathan Bossart <nathandbossart@gmail.com> > wrote: >> Is there a particular advantage to that approach as opposed to just using >> "directory" mode for everything? I know pg_upgrade uses "custom" mode for >> each of the databases, so a combo approach would be a closer match to the >> existing behavior, but that doesn't strike me as an especially strong >> reason to keep doing it that way. > > A gazillion files to deal with? Much easier to work with individual custom > files if you're moving databases around and things like that. > Much easier to monitor eg sizes/dates if you're using it for backups. > > It's not things that are make-it-or-break-it or anything, but there are > some smaller things that definitely can be useful. Makes sense, thanks for elaborating. -- nathan
Magnus Hagander <magnus@hagander.net> writes: > On Mon, Jun 10, 2024 at 5:03 PM Nathan Bossart <nathandbossart@gmail.com> > wrote: >> Is there a particular advantage to that approach as opposed to just using >> "directory" mode for everything? > A gazillion files to deal with? Much easier to work with individual custom > files if you're moving databases around and things like that. > Much easier to monitor eg sizes/dates if you're using it for backups. You can always tar up the directory tree after-the-fact if you want one file. Sure, that step's not parallelized, but I think we'd need some non-parallelized copying to create such a file anyway. regards, tom lane
On 2024-06-10 Mo 12:21, Tom Lane wrote: > Magnus Hagander <magnus@hagander.net> writes: >> On Mon, Jun 10, 2024 at 5:03 PM Nathan Bossart <nathandbossart@gmail.com> >> wrote: >>> Is there a particular advantage to that approach as opposed to just using >>> "directory" mode for everything? >> A gazillion files to deal with? Much easier to work with individual custom >> files if you're moving databases around and things like that. >> Much easier to monitor eg sizes/dates if you're using it for backups. > You can always tar up the directory tree after-the-fact if you want > one file. Sure, that step's not parallelized, but I think we'd need > some non-parallelized copying to create such a file anyway. > > Yeah. I think I can probably allow for Magnus' suggestion fairly easily, but if I have to choose I'm going to go for the format that can be produced with the maximum parallelism. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
On Mon, Jun 10, 2024 at 6:21 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Magnus Hagander <magnus@hagander.net> writes:
> On Mon, Jun 10, 2024 at 5:03 PM Nathan Bossart <nathandbossart@gmail.com>
> wrote:
>> Is there a particular advantage to that approach as opposed to just using
>> "directory" mode for everything?
> A gazillion files to deal with? Much easier to work with individual custom
> files if you're moving databases around and things like that.
> Much easier to monitor eg sizes/dates if you're using it for backups.
You can always tar up the directory tree after-the-fact if you want
one file. Sure, that step's not parallelized, but I think we'd need
some non-parallelized copying to create such a file anyway.
That would require double the disk space.
But you can also just run pg_dump manually on each database and a pg_dumpall -g like people are doing today -- I thought this whole thing was about making it more convenient :)
On Thu, Jan 02, 2025 at 02:05:13AM +0530, Mahendra Singh Thalor wrote: > Here, I am attaching an updated patch. I fixed some bugs of v01 patch and > did some code cleanup also. Thank you for picking this up! I started to review it, but the documentation changes didn't build, and a few tests in check-world are failing. Would you mind resolving those issues? Also, if you haven't already, please add an entry to the next commitfest [0] to ensure that 1) this feature is tracked and 2) the automated tests will run. + if (dbfile) + { + printfPQExpBuffer(&cmd, "\"%s\" %s %s", pg_dump_bin, + dbfile, create_opts); + appendPQExpBufferStr(&cmd, " -F d "); + } Have you given any thought to allowing a directory of custom format files, as discussed upthread [1]? Perhaps that is better handled as a follow-up patch, but it'd be good to understand the plan, anyway. [0] https://commitfest.postgresql.org [1] https://postgr.es/m/CABUevExoQ26jo%2BaQ9QZq%2BUMA1aD6gfpm9xBnh_t5e0DhaCeRYA%40mail.gmail.com -- nathan
Hi,
On Wed, 8 Jan 2025 at 17:41, Mahendra Singh Thalor <mahi6run@gmail.com> wrote:
On Wed, 8 Jan 2025 at 20:07, Mahendra Singh Thalor <mahi6run@gmail.com> wrote:
>
> Hi all,
>
> On Wed, 8 Jan 2025 at 00:34, Mahendra Singh Thalor <mahi6run@gmail.com> wrote:
> >
> > On Mon, 6 Jan 2025 at 23:05, Nathan Bossart <nathandbossart@gmail.com> wrote:
> > >
> > > On Thu, Jan 02, 2025 at 02:05:13AM +0530, Mahendra Singh Thalor wrote:
> > > > Here, I am attaching an updated patch. I fixed some bugs of v01 patch and
> > > > did some code cleanup also.
> > >
> > > Thank you for picking this up! I started to review it, but the
> > > documentation changes didn't build, and a few tests in check-world are
> > > failing. Would you mind resolving those issues? Also, if you haven't
> > > already, please add an entry to the next commitfest [0] to ensure that 1)
> > > this feature is tracked and 2) the automated tests will run.
> >
> > Thanks Nathan for the quick response.
> >
> > I fixed bugs of documentation changes and check-world in the latest patch. Now docs are building and check-world is passing.
> >
> > I added entry into commitfest for this patch.[0]
> >
> > >
> > > + if (dbfile)
> > > + {
> > > + printfPQExpBuffer(&cmd, "\"%s\" %s %s", pg_dump_bin,
> > > + dbfile, create_opts);
> > > + appendPQExpBufferStr(&cmd, " -F d ");
> > > + }
> > >
> > > Have you given any thought to allowing a directory of custom format files,
> > > as discussed upthread [1]? Perhaps that is better handled as a follow-up
> > > patch, but it'd be good to understand the plan, anyway.
> >
> > I will make these changes and will test. I will update my findings after doing some testing.
>
> In the latest patch, I added dump and restoring for directory/custom/tar/plain formats. Please consider this patch for review and testing.
>
> Design:
> When --format=d|c|t is given, we dump all global SQL commands into global.dat in plain SQL format, and we write a map.dat file mapping dbname to dboid. For each database, we create a separate subdirectory named after its dboid under the databases directory and dump it in the requested archive format (d|c|t).
> While restoring, we first restore all global SQL commands from global.dat and then restore the databases one by one. As we support --exclude-database with pg_dumpall, we support the same option with pg_restore as well, to skip restoring databases matching the specified patterns.
> If we want to restore a single database, we can specify its particular subdirectory from the databases folder. To find the file name, we look up the dbname in map.dat.
>
> TODO: Now I will work on test cases for these newly added options to pg_dumpall and pg_restore.
>
> Here, I am attaching the v04 patch for testing and review.
Sorry. My mistake.
v04 was the delta patch on the top of v03.
Here, I am attaching the v05 patch for testing and review.
Just FWIW, I did a quick test tonight. It applies cleanly, compiles OK. I did a dump:
$ pg_dumpall -Fd -f dir
and then a restore (after dropping the databases I had):
$ pg_restore -Cd postgres -v dir
It worked really well. That's great.
Quick thing to fix: you've got this error message:
pg_restore: error: -d/--dbanme should be given when using archive dump of pg_dumpall
I guess it is --dbname, rather than --dbanme.
Of course, it needs much more testing, but this feature would be great to have. Thanks for working on this!
Guillaume.
> hi. > After some tests and thinking about your reply, I admit that using > expand_dbname_patterns > in pg_restore will not work. > We need to do pattern matching against the map.dat file. > Please check the attached v12 series based on your > v11_pg_dumpall-with-directory-tar-custom-format-21-jan.patch > > v12-0001 cosmetic change. > v12-0002 implement pg_restore --exclude-database=PATTERN. > main gist of implementation: > for each database name in map.dat file, > check if this database name pattern matches with PATTERN or not. > pattern matching is using processSQLNamePattern. > > your substring will not work. > some of the test cases. > $BIN10/pg_restore --exclude-database=* -Cd template1 --verbose dir10 > > dir_format 2>&1 Hi, As per discussion with Robert Haas and Dilip Kumar, we thought that we can't assume there will always be a db connection while doing pg_restore, but in the attached patch we are assuming that we have one. I already mentioned this problem in my previous updates. I think we should not use a connection for --exclude-database; rather, we should use direct functions to validate the patterns, or we should restrict the option to plain names only. On Sun, 26 Jan 2025 at 20:17, jian he <jian.universality@gmail.com> wrote: > > hi. > attached patch trying to refactor ReadOneStatement > for properly handling the single and double quotes. > the commit message also has some tests on it. > > it is based on your > v11_pg_dumpall-with-directory-tar-custom-format-21-jan.patch. I think if we read line by line instead of character by character, we don't need that much code and need not worry about double quotes. In the next version, I will merge some patches and change it to read line by line. -- Thanks and Regards Mahendra Singh Thalor EnterpriseDB: http://www.enterprisedb.com
Hi mahendra,
I have reviewed the code in the v11 patch and it looks good to me.
But in common_dumpall_restore.c there's parseDumpFormat, which is common between pg_dumpall and pg_restore. As per the discussion in the [1] thread, I don't think we should create a common API: as discussed there, there is a chance that in the future we might decide that some format is obsolete and desupport it in pg_dumpall while still supporting it in pg_restore for compatibility reasons.
[1] https://www.postgresql.org/message-id/flat/CAFC%2Bb6pfK-BGcWW1kQmtxVrCh-JGjB2X02rLPQs_ZFaDGjZDsQ%40mail.gmail.com
Regards,
Srinath Reddy Sadipiralla,
EDB: http://www.enterprisedb.com
On Tue, 28 Jan 2025 at 10:19, Srinath Reddy <srinath2133@gmail.com> wrote: > > > Hi mahendra, > > I have reviewed the code in the v11 patch and it looks good to me. > > But in common_dumpall_restore.c there's parseDumpFormat, which is common between pg_dumpall and pg_restore. As per the discussion in the [1] thread, I don't think we should create a common API: as discussed there, there is a chance that in the future we might decide that some format is obsolete and desupport it in pg_dumpall while still supporting it in pg_restore for compatibility reasons. > Okay. Thanks for the review. I will make changes as per the discussion in the other thread. On Tue, 28 Jan 2025 at 11:52, Srinath Reddy <srinath2133@gmail.com> wrote: > > make check-world fails; I think we don't need $port and $filename, instead we can use something like 'xxx', so I fixed it in the below patch. In an offline discussion, Andrew already reported this test case. I will fix this in the next version. > > Regards, > Srinath Reddy Sadipiralla, > EDB: http://www.enterprisedb.com > -- Thanks and Regards Mahendra Singh Thalor EnterpriseDB: http://www.enterprisedb.com