Thread: Non-text mode for pg_dumpall

Non-text mode for pg_dumpall

From
Andrew Dunstan
Date:
Tom and Nathan opined recently that providing for non-text mode for 
pg_dumpall would be a Good Thing (TM). Not having it has been a 
long-standing complaint, so I've decided to give it a go.

I think we would need to restrict it to directory mode, at least to 
begin with. I would have a toc.dat with a different magic block (say 
"PGGLO" instead of "PGDMP") containing the global entries (roles, 
tablespaces, databases). Then for each database there would be a 
subdirectory (named for its toc entry) with a standard directory mode 
dump for that database. These could be generated in parallel (possibly 
by pg_dumpall calling pg_dump for each database). pg_restore on 
detecting a global type toc.data would restore the globals and then each 
of the databases (again possibly in parallel).

I'm sure there are many wrinkles I haven't thought of, but I don't see 
any insurmountable obstacles, just a significant amount of code.

Barring the unforeseen my main is to have a preliminary patch by the 
September CF.

Following that I would turn my attention to using it in pg_upgrade.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: Non-text mode for pg_dumpall

From
Nathan Bossart
Date:
On Mon, Jun 10, 2024 at 08:58:49AM -0400, Andrew Dunstan wrote:
> Tom and Nathan opined recently that providing for non-text mode for
> pg_dumpall would be a Good Thing (TM). Not having it has been a
> long-standing complaint, so I've decided to give it a go.

Thank you!

> I think we would need to restrict it to directory mode, at least to begin
> with. I would have a toc.dat with a different magic block (say "PGGLO"
> instead of "PGDMP") containing the global entries (roles, tablespaces,
> databases). Then for each database there would be a subdirectory (named for
> its toc entry) with a standard directory mode dump for that database. These
> could be generated in parallel (possibly by pg_dumpall calling pg_dump for
> each database). pg_restore on detecting a global type toc.data would restore
> the globals and then each of the databases (again possibly in parallel).

I'm curious why we couldn't also support the "custom" format.

> Following that I would turn my attention to using it in pg_upgrade.

+1

-- 
nathan



Re: Non-text mode for pg_dumpall

From
Andrew Dunstan
Date:
On 2024-06-10 Mo 10:14, Nathan Bossart wrote:
> On Mon, Jun 10, 2024 at 08:58:49AM -0400, Andrew Dunstan wrote:
>> Tom and Nathan opined recently that providing for non-text mode for
>> pg_dumpall would be a Good Thing (TM). Not having it has been a
>> long-standing complaint, so I've decided to give it a go.
> Thank you!
>
>> I think we would need to restrict it to directory mode, at least to begin
>> with. I would have a toc.dat with a different magic block (say "PGGLO"
>> instead of "PGDMP") containing the global entries (roles, tablespaces,
>> databases). Then for each database there would be a subdirectory (named for
>> its toc entry) with a standard directory mode dump for that database. These
>> could be generated in parallel (possibly by pg_dumpall calling pg_dump for
>> each database). pg_restore on detecting a global type toc.data would restore
>> the globals and then each of the databases (again possibly in parallel).
> I'm curious why we couldn't also support the "custom" format.


We could, but the housekeeping would be a bit harder. We'd need to keep 
pointers to the offsets of the per-database TOCs (I don't want to have a 
single per-cluster TOC). And we can't produce it in parallel, so I'd 
rather start with something we can produce in parallel.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: Non-text mode for pg_dumpall

From
Magnus Hagander
Date:


On Mon, Jun 10, 2024 at 4:14 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
On Mon, Jun 10, 2024 at 08:58:49AM -0400, Andrew Dunstan wrote:
> Tom and Nathan opined recently that providing for non-text mode for
> pg_dumpall would be a Good Thing (TM). Not having it has been a
> long-standing complaint, so I've decided to give it a go.

Thank you!

Indeed, this has been quite annoying!


> I think we would need to restrict it to directory mode, at least to begin
> with. I would have a toc.dat with a different magic block (say "PGGLO"
> instead of "PGDMP") containing the global entries (roles, tablespaces,
> databases). Then for each database there would be a subdirectory (named for
> its toc entry) with a standard directory mode dump for that database. These
> could be generated in parallel (possibly by pg_dumpall calling pg_dump for
> each database). pg_restore on detecting a global type toc.data would restore
> the globals and then each of the databases (again possibly in parallel).

I'm curious why we couldn't also support the "custom" format.

Or maybe even a combo - a directory of custom format files? Plus that one special file being globals? I'd say that's what most use cases I've seen would prefer.

--

Re: Non-text mode for pg_dumpall

From
Nathan Bossart
Date:
On Mon, Jun 10, 2024 at 10:51:42AM -0400, Andrew Dunstan wrote:
> On 2024-06-10 Mo 10:14, Nathan Bossart wrote:
>> I'm curious why we couldn't also support the "custom" format.
> 
> We could, but the housekeeping would be a bit harder. We'd need to keep
> pointers to the offsets of the per-database TOCs (I don't want to have a
> single per-cluster TOC). And we can't produce it in parallel, so I'd rather
> start with something we can produce in parallel.

Got it.

-- 
nathan



Re: Non-text mode for pg_dumpall

From
Nathan Bossart
Date:
On Mon, Jun 10, 2024 at 04:52:06PM +0200, Magnus Hagander wrote:
> On Mon, Jun 10, 2024 at 4:14 PM Nathan Bossart <nathandbossart@gmail.com>
> wrote:
>> I'm curious why we couldn't also support the "custom" format.
> 
> Or maybe even a combo - a directory of custom format files? Plus that one
> special file being globals? I'd say that's what most use cases I've seen
> would prefer.

Is there a particular advantage to that approach as opposed to just using
"directory" mode for everything?  I know pg_upgrade uses "custom" mode for
each of the databases, so a combo approach would be a closer match to the
existing behavior, but that doesn't strike me as an especially strong
reason to keep doing it that way.

-- 
nathan



Re: Non-text mode for pg_dumpall

From
Magnus Hagander
Date:


On Mon, Jun 10, 2024 at 5:03 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
On Mon, Jun 10, 2024 at 04:52:06PM +0200, Magnus Hagander wrote:
> On Mon, Jun 10, 2024 at 4:14 PM Nathan Bossart <nathandbossart@gmail.com>
> wrote:
>> I'm curious why we couldn't also support the "custom" format.
>
> Or maybe even a combo - a directory of custom format files? Plus that one
> special file being globals? I'd say that's what most use cases I've seen
> would prefer.

Is there a particular advantage to that approach as opposed to just using
"directory" mode for everything?  I know pg_upgrade uses "custom" mode for
each of the databases, so a combo approach would be a closer match to the
existing behavior, but that doesn't strike me as an especially strong
reason to keep doing it that way.

A gazillion files to deal with? Much easier to work with individual custom files if you're moving databases around and things like that. Much easier to monitor eg sizes/dates if you're using it for backups.

It's not things that are make-it-or-break-it or anything, but there are some smaller things that definitely can be useful.
 
--

Re: Non-text mode for pg_dumpall

From
Nathan Bossart
Date:
On Mon, Jun 10, 2024 at 05:45:19PM +0200, Magnus Hagander wrote:
> On Mon, Jun 10, 2024 at 5:03 PM Nathan Bossart <nathandbossart@gmail.com>
> wrote:
>> Is there a particular advantage to that approach as opposed to just using
>> "directory" mode for everything?  I know pg_upgrade uses "custom" mode for
>> each of the databases, so a combo approach would be a closer match to the
>> existing behavior, but that doesn't strike me as an especially strong
>> reason to keep doing it that way.
> 
> A gazillion files to deal with? Much easier to work with individual custom
> files if you're moving databases around and things like that.
> Much easier to monitor eg sizes/dates if you're using it for backups.
> 
> It's not things that are make-it-or-break-it or anything, but there are
> some smaller things that definitely can be useful.

Makes sense, thanks for elaborating.

-- 
nathan



Re: Non-text mode for pg_dumpall

From
Tom Lane
Date:
Magnus Hagander <magnus@hagander.net> writes:
> On Mon, Jun 10, 2024 at 5:03 PM Nathan Bossart <nathandbossart@gmail.com>
> wrote:
>> Is there a particular advantage to that approach as opposed to just using
>> "directory" mode for everything?

> A gazillion files to deal with? Much easier to work with individual custom
> files if you're moving databases around and things like that.
> Much easier to monitor eg sizes/dates if you're using it for backups.

You can always tar up the directory tree after-the-fact if you want
one file.  Sure, that step's not parallelized, but I think we'd need
some non-parallelized copying to create such a file anyway.

            regards, tom lane



Re: Non-text mode for pg_dumpall

From
Andrew Dunstan
Date:
On 2024-06-10 Mo 12:21, Tom Lane wrote:
> Magnus Hagander <magnus@hagander.net> writes:
>> On Mon, Jun 10, 2024 at 5:03 PM Nathan Bossart <nathandbossart@gmail.com>
>> wrote:
>>> Is there a particular advantage to that approach as opposed to just using
>>> "directory" mode for everything?
>> A gazillion files to deal with? Much easier to work with individual custom
>> files if you're moving databases around and things like that.
>> Much easier to monitor eg sizes/dates if you're using it for backups.
> You can always tar up the directory tree after-the-fact if you want
> one file.  Sure, that step's not parallelized, but I think we'd need
> some non-parallelized copying to create such a file anyway.
>
>             


Yeah.

I think I can probably allow for Magnus' suggestion fairly easily, but 
if I have to choose I'm going to go for the format that can be produced 
with the maximum parallelism.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: Non-text mode for pg_dumpall

From
Magnus Hagander
Date:
On Mon, Jun 10, 2024 at 6:21 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Magnus Hagander <magnus@hagander.net> writes:
> On Mon, Jun 10, 2024 at 5:03 PM Nathan Bossart <nathandbossart@gmail.com>
> wrote:
>> Is there a particular advantage to that approach as opposed to just using
>> "directory" mode for everything?

> A gazillion files to deal with? Much easier to work with individual custom
> files if you're moving databases around and things like that.
> Much easier to monitor eg sizes/dates if you're using it for backups.

You can always tar up the directory tree after-the-fact if you want
one file.  Sure, that step's not parallelized, but I think we'd need
some non-parallelized copying to create such a file anyway.

That would require double the disk space.

But you can also just run pg_dump manually on each database and a pg_dumpall -g like people are doing today -- I thought this whole thing was about making it more convenient :)
 
--

Re: Non-text mode for pg_dumpall

From
Nathan Bossart
Date:
On Thu, Jan 02, 2025 at 02:05:13AM +0530, Mahendra Singh Thalor wrote:
> Here, I am attaching an updated patch. I fixed some bugs of v01 patch and
> did some code cleanup also.

Thank you for picking this up!  I started to review it, but the
documentation changes didn't build, and a few tests in check-world are
failing.  Would you mind resolving those issues?  Also, if you haven't
already, please add an entry to the next commitfest [0] to ensure that 1)
this feature is tracked and 2) the automated tests will run.

+    if (dbfile)
+    {
+        printfPQExpBuffer(&cmd, "\"%s\" %s %s", pg_dump_bin,
+                          dbfile, create_opts);
+        appendPQExpBufferStr(&cmd, " -F d ");
+    }

Have you given any thought to allowing a directory of custom format files,
as discussed upthread [1]?  Perhaps that is better handled as a follow-up
patch, but it'd be good to understand the plan, anyway.

[0] https://commitfest.postgresql.org
[1] https://postgr.es/m/CABUevExoQ26jo%2BaQ9QZq%2BUMA1aD6gfpm9xBnh_t5e0DhaCeRYA%40mail.gmail.com

-- 
nathan



Re: Non-text mode for pg_dumpall

From
Guillaume Lelarge
Date:
Hi,

Le mer. 8 janv. 2025 à 17:41, Mahendra Singh Thalor <mahi6run@gmail.com> a écrit :
On Wed, 8 Jan 2025 at 20:07, Mahendra Singh Thalor <mahi6run@gmail.com> wrote:
>
> Hi all,
>
> On Wed, 8 Jan 2025 at 00:34, Mahendra Singh Thalor <mahi6run@gmail.com> wrote:
> >
> > On Mon, 6 Jan 2025 at 23:05, Nathan Bossart <nathandbossart@gmail.com> wrote:
> > >
> > > On Thu, Jan 02, 2025 at 02:05:13AM +0530, Mahendra Singh Thalor wrote:
> > > > Here, I am attaching an updated patch. I fixed some bugs of v01 patch and
> > > > did some code cleanup also.
> > >
> > > Thank you for picking this up!  I started to review it, but the
> > > documentation changes didn't build, and a few tests in check-world are
> > > failing.  Would you mind resolving those issues?  Also, if you haven't
> > > already, please add an entry to the next commitfest [0] to ensure that 1)
> > > this feature is tracked and 2) the automated tests will run.
> >
> > Thanks Nathan for the quick response.
> >
> > I fixed bugs of documentation changes and check-world in the latest patch. Now docs are building and check-world is passing.
> >
> > I added entry into commitfest for this patch.[0]
> >
> > >
> > > +       if (dbfile)
> > > +       {
> > > +               printfPQExpBuffer(&cmd, "\"%s\" %s %s", pg_dump_bin,
> > > +                                                 dbfile, create_opts);
> > > +               appendPQExpBufferStr(&cmd, " -F d ");
> > > +       }
> > >
> > > Have you given any thought to allowing a directory of custom format files,
> > > as discussed upthread [1]?  Perhaps that is better handled as a follow-up
> > > patch, but it'd be good to understand the plan, anyway.
> >
> > I will make these changes and will test. I will update my findings after doing some testing.
>
> In the latest patch, I added dump and restoring for directory/custom/tar/plain formats. Please consider this patch for review and testing.
>
> Design:
> When we give --format=d|c|t then we are dumping all global sql commands in global.dat in plain sql format and we are making a map.dat file with dbname and dboid. For each database, we are making separate subdirectory with dboid under databases directory and dumping as per archive format(d|c|t).
> While restoring, first we are restoring all global sql commands from global.dat and then we are restoring one by one all databases.  As we are supporting --exclude-database with pg_dumpall, the same we are supporting with pg_restore also to skip restoring on some specified database patterns.
> If we want to restore a single database, then we can specided particular subdirectory from the databases folder. To get file name, we refer dbname into map.file.
>
> TODO: Now I will work on test cases for these new added options to the pg_dumpall and pg_restore.
>
> Here, I am attaching the v04 patch for testing and review.

Sorry. My mistake.
v04 was the delta patch on the top of v03.

Here, I am attaching the v05 patch for testing and review.


Just FWIW, I did a quick test tonight. It applies cleanly, compiles OK. I did a dump:

$ pg_dumpall -Fd -f dir

and then a restore (after dropping the databases I had):

$ pg_restore -Cd postgres -v dir

It worked really well. That's great.

Quick thing to fix: you've got this error message:
pg_restore: error:  -d/--dbanme should be given when using archive dump of pg_dumpall

I guess it is --dbname, rather than --dbanme.

Of course, it needs much more testing, but this feature would be great to have. Thanks for working on this!


--
Guillaume.

Re: Non-text mode for pg_dumpall

From
Mahendra Singh Thalor
Date:
> hi.
> After some tests and thinking about your reply, I admit that using
> expand_dbname_patterns
> in pg_restore will not work.
> We need to do pattern matching against the map.dat file.
> Please check the attached v12 series based on your
> v11_pg_dumpall-with-directory-tar-custom-format-21-jan.patch
>
> v12-0001 cosmetic change.
> v12-0002 implement pg_resore --exclude-database=PATTERN.
> main gist of implementation:
> for each database name in map.dat file,
> check if this database name pattern matches with PATTERN or not.
> pattern matching is using processSQLNamePattern.
>
> your substring will not work.
> some of the test cases.
> $BIN10/pg_restore --exclude-database=* -Cd template1 --verbose dir10 >
> dir_format 2>&1

Hi,
As per discussion with Robert Haas and Dilip Kumar, we thought that we
can't assume that
there will be a db connection every time while doing pg_restore but in
attached patch, we are
assuming that we have a db connection.
In my previous updates, I already mentioned this problem. I think, we
should not use connection
for --exclude-database, rather we should use direct functions to
validate patterns or we should
restrict as NAME only.

On Sun, 26 Jan 2025 at 20:17, jian he <jian.universality@gmail.com> wrote:
>
> hi.
> attached patching trying to refactor ReadOneStatement
> for properly handling the single and double quotes.
> the commit message also has some tests on it.
>
> it is based on your
> v11_pg_dumpall-with-directory-tar-custom-format-21-jan.patch.

I think, instead of char, if we read line by line, then we don't need
that much code and need not to worry about double quotes.
In the next version, I will merge some patches and will change it to
read line by line.

-- 
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com



Re: Non-text mode for pg_dumpall

From
Srinath Reddy
Date:

Hi mahendra,

I have reviewed the code in the v11 patch and it looks good to me.

But in common_dumpall_restore.c there's  parseDumpFormat which is common between pg_dumpall and pg_restore ,as per the discussion in [1] thread i don't think we should create a common api ,as discussed in the thread there might chances in the future we might decide that some format is obsolete and desupport it in pg_dumpall ,while support in pg_restore for compatibility reasons.

[1] https://www.postgresql.org/message-id/flat/CAFC%2Bb6pfK-BGcWW1kQmtxVrCh-JGjB2X02rLPQs_ZFaDGjZDsQ%40mail.gmail.com

Regards,
Srinath Reddy Sadipiralla,
EDB: http://www.enterprisedb.com

Re: Non-text mode for pg_dumpall

From
Mahendra Singh Thalor
Date:
On Tue, 28 Jan 2025 at 10:19, Srinath Reddy <srinath2133@gmail.com> wrote:
>
>
> Hi mahendra,
>
> I have reviewed the code in the v11 patch and it looks good to me.
>
> But in common_dumpall_restore.c there's  parseDumpFormat which is common between pg_dumpall and pg_restore ,as per
thediscussion in [1] thread i don't think we should create a common api ,as discussed in the thread there might chances
inthe future we might decide that some format is obsolete and desupport it in pg_dumpall ,while support in pg_restore
forcompatibility reasons. 
>

Oaky. Thanks for review. I will make changes as per discussion in
another thread.


On Tue, 28 Jan 2025 at 11:52, Srinath Reddy <srinath2133@gmail.com> wrote:
>
> make check-world fails,i think we don't need $port and $filename instead we can use something like 'xxx'.so fixed it
inthe below patch. 

In offline discussion, Andew already reported this test case. I will
fix this in the next version.


>
> Regards,
> Srinath Reddy Sadipiralla,
> EDB: http://www.enterprisedb.com
>


--
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com