Thread: Purpose of pg_dump tar archive format?

Purpose of pg_dump tar archive format?

From
Ron Johnson
Date:
It doesn't support compression nor restore reordering like the custom format, so I'm having trouble seeing why it still exists (at least without a doc warning that it's obsolete).

Re: Purpose of pg_dump tar archive format?

From
Erik Wienhold
Date:
On 2024-06-04 14:13 +0200, Ron Johnson wrote:
> It doesn't support compression nor restore reordering like the custom
> format, so I'm having trouble seeing why it still exists (at least without
> a doc warning that it's obsolete).

Maybe because of this:

> *    The tar format also includes a 'restore.sql' script which is there for
> *    the benefit of humans. This script is never used by pg_restore.

I've never used that restore.sql, so I can't say how useful it is,
especially since it requires editing $$PATH$$.  Should be mentioned in
the docs if that file is still relevant.

-- 
Erik



Re: Purpose of pg_dump tar archive format?

From
Adrian Klaver
Date:
On 6/4/24 05:13, Ron Johnson wrote:
> It doesn't support compression nor restore reordering like the custom 
> format, so I'm having trouble seeing why it still exists (at least 
> without a doc warning that it's obsolete).

pg_dump -d test -U postgres -Ft  | gzip --stdout > test.tgz

-- 
Adrian Klaver
adrian.klaver@aklaver.com




Re: Purpose of pg_dump tar archive format?

From
Ron Johnson
Date:
On Tue, Jun 4, 2024 at 10:43 AM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 6/4/24 05:13, Ron Johnson wrote:
> It doesn't support compression nor restore reordering like the custom
> format, so I'm having trouble seeing why it still exists (at least
> without a doc warning that it's obsolete).

pg_dump -d test -U postgres -Ft  | gzip --stdout > test.tgz

Who's got meaningful databases that small anymore?

And if you've got meaningfully sized databases, open port 5432 and move them using pg_dump.

Re: Purpose of pg_dump tar archive format?

From
Adrian Klaver
Date:
On 6/4/24 08:25, Ron Johnson wrote:
> On Tue, Jun 4, 2024 at 10:43 AM Adrian Klaver <adrian.klaver@aklaver.com 
> <mailto:adrian.klaver@aklaver.com>> wrote:
> 
>     On 6/4/24 05:13, Ron Johnson wrote:
>      > It doesn't support compression nor restore reordering like the
>     custom
>      > format, so I'm having trouble seeing why it still exists (at least
>      > without a doc warning that it's obsolete).
> 
>     pg_dump -d test -U postgres -Ft  | gzip --stdout > test.tgz
> 
> 
> Who's got meaningful databases that small anymore?
> 
> And if you've got meaningfully sized databases, open port 5432 and move 
> them using pg_dump.

If you don't need the tar format then don't use it.

-- 
Adrian Klaver
adrian.klaver@aklaver.com




Re: Purpose of pg_dump tar archive format?

From
Tom Lane
Date:
Adrian Klaver <adrian.klaver@aklaver.com> writes:
> If you don't need the tar format then don't use it.

Indeed.  FTR, I think the original motivation for tar format was to
have a "standard" structured output format that could be manipulated
with tools other than pg_restore.  Years later, we added the directory
format which could be argued to serve the same purpose.  But nobody
particularly wanted to remove the tar functionality, especially since
the two were made to be compatible:

commit 7f508f1c6b515df66d27f860b2faa7b5761fa55d
Author: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date:   Sun Jan 23 23:10:15 2011 +0200

    Add 'directory' format to pg_dump. The new directory format is compatible
    with the 'tar' format, in that untarring a tar format archive produces a
    valid directory format archive.

            regards, tom lane



Re: Purpose of pg_dump tar archive format?

From
Adrian Klaver
Date:
Reply to list also.
Ccing list

On 6/4/24 10:03 AM, Ron Johnson wrote:

> 
>     If you don't need the tar format then don't use it.
> 
> 
> That's neither the purpose nor the point of my question.
> 
> I think that a note in the docs mentioning that it's obsolete would be 
> helpful for new users who recognize "tar" so choose it.

You are assuming facts not in evidence, namely that the format is obsolete.

-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: Purpose of pg_dump tar archive format?

From
Shaheed Haque
Date:

We use it. I bet lots of others do too.

On Tue, 4 Jun 2024, 18:06 Adrian Klaver, <adrian.klaver@aklaver.com> wrote:
Reply to list also.
Ccing list

On 6/4/24 10:03 AM, Ron Johnson wrote:

>
>     If you don't need the tar format then don't use it.
>
>
> That's neither the purpose nor the point of my question.
>
> I think that a note in the docs mentioning that it's obsolete would be
> helpful for new users who recognize "tar" so choose it.

You are assuming facts not in evidence, namely that the format is obsolete.

--
Adrian Klaver
adrian.klaver@aklaver.com


Re: Purpose of pg_dump tar archive format?

From
Rob Sargent
Date:

On 6/4/24 11:40, Shaheed Haque wrote:
>
> We use it. I bet lots of others do too.
>
>

Of course.  There are lots of small, real, useful databases in the wild.




Re: Purpose of pg_dump tar archive format?

From
Ron Johnson
Date:
On Tue, Jun 4, 2024 at 2:55 PM Rob Sargent <robjsargent@gmail.com> wrote:


On 6/4/24 11:40, Shaheed Haque wrote:
>
> We use it. I bet lots of others do too.
>
>

Of course.  There are lots of small, real, useful databases in the wild.
 
But why tar instead of custom? That was part of my original question.

Re: Purpose of pg_dump tar archive format?

From
Rob Sargent
Date:


On 6/4/24 13:15, Ron Johnson wrote:
On Tue, Jun 4, 2024 at 2:55 PM Rob Sargent <robjsargent@gmail.com> wrote:


On 6/4/24 11:40, Shaheed Haque wrote:
>
> We use it. I bet lots of others do too.
>
>

Of course.  There are lots of small, real, useful databases in the wild.
 
But why tar instead of custom? That was part of my original question.

That I can't say.  Familiarity?  Fewer keystrokes?


Re: Purpose of pg_dump tar archive format?

From
Gavin Roy
Date:

On Tue, Jun 4, 2024 at 3:15 PM Ron Johnson <ronljohnsonjr@gmail.com> wrote:

But why tar instead of custom? That was part of my original question.

I've found it pretty useful for programmatically accessing data in a dump for large databases outside of the normal pg_dump/pg_restore workflow. You don't have to seek through one large binary file to get to the data section to get at the data.

Re: Purpose of pg_dump tar archive format?

From
Shaheed Haque
Date:


On Tue, 4 Jun 2024 at 20:47, Gavin Roy <gavinr@aweber.com> wrote:

On Tue, Jun 4, 2024 at 3:15 PM Ron Johnson <ronljohnsonjr@gmail.com> wrote:

But why tar instead of custom? That was part of my original question.

I've found it pretty useful for programmatically accessing data in a dump for large databases outside of the normal pg_dump/pg_restore workflow. You don't have to seek through one large binary file to get to the data section to get at the data.

This is true for us too; specifically, tar, including with compression, is very convenient for both CLI and Python ecosystems.

Re: Purpose of pg_dump tar archive format?

From
Ron Johnson
Date:
On Tue, Jun 4, 2024 at 3:47 PM Gavin Roy <gavinr@aweber.com> wrote:

On Tue, Jun 4, 2024 at 3:15 PM Ron Johnson <ronljohnsonjr@gmail.com> wrote:

But why tar instead of custom? That was part of my original question.

I've found it pretty useful for programmatically accessing data in a dump for large databases outside of the normal pg_dump/pg_restore workflow. You don't have to seek through one large binary file to get to the data section to get at the data.

Interesting.  Please explain, though, since a big tarball _is_ "one large binary file" that you have to sequentially scan.  (I don't know the internal structure of custom format files, and whether they have file pointers to each table.)

Is it because you need individual .dat "COPY" files for something other than loading into PG tables (since pg_restore --table=xxxx does that, too), and directory format archives can be inconvenient?

Re: Purpose of pg_dump tar archive format?

From
Gavin Roy
Date:

On Tue, Jun 4, 2024 at 7:36 PM Ron Johnson <ronljohnsonjr@gmail.com> wrote:
On Tue, Jun 4, 2024 at 3:47 PM Gavin Roy <gavinr@aweber.com> wrote:

On Tue, Jun 4, 2024 at 3:15 PM Ron Johnson <ronljohnsonjr@gmail.com> wrote:

But why tar instead of custom? That was part of my original question.

I've found it pretty useful for programmatically accessing data in a dump for large databases outside of the normal pg_dump/pg_restore workflow. You don't have to seek through one large binary file to get to the data section to get at the data.

Interesting.  Please explain, though, since a big tarball _is_ "one large binary file" that you have to sequentially scan.  (I don't know the internal structure of custom format files, and whether they have file pointers to each table.)

Not if you untar it first.
 
Is it because you need individual .dat "COPY" files for something other than loading into PG tables (since pg_restore --table=xxxx does that, too), and directory format archives can be inconvenient?

In the past I've used it for data analysis outside of Postgres.
--
Gavin M. Roy
CTO
AWeber