Thread: RE: RE: [HACKERS] pg_dump & blobs - editable dump?

RE: RE: [HACKERS] pg_dump & blobs - editable dump?

From
Philip Warner
Date:
At 15:25 12/07/00 +0100, Peter Mount wrote:
>No he didn't, just I've been sort of lurking on this subject ;-)
>
>Actually, tar files are simply a small header, followed by the file's
>contents. To add another file, you simply write another header, and contents
>(which is why you can cat two tar files together and get a working file).
>
>http://www.goice.co.jp/member/mo/formats/tar.html has a nice brief
>description of the header.
>

Damn! I knew someone would call my bluff.

As you say, it looks remarkably simple.

A couple of questions:


    136     12 bytes  Modify time (in octal ascii)

    ...do you know the format of the date (seconds since 1970?).


    157    100 bytes  Linkname ('\0' terminated, 99 maximum length)

    ...what's this? Is it the target for symlinks?


    329      8 bytes  Major device ID (in octal ascii)
    337      8 bytes  Minor device ID (in octal ascii)
    345    167 bytes  Padding

    ...and what should I set these to?

>As for a C api with a compatible licence, if needs must I'll write one to
>your spec (maidast should be back online in a couple of days, so I'll be
>back in business development wise).

If you're serious about the offer, I'd be happy. But, given how simple the
format is, I can probably tack it into place myself.

There is a minor problem. Currently I compress the output stream as I
receive it from PG, and send it to the archive. I don't know how big it
will be until it is written. The custom output format can handle this, but
in streaming a tar file to tape, I have to know the file size first. This
means writing to /tmp. I suppose that's OK, but I've been trying to avoid it.
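
For what it's worth, the spooling approach might look roughly like the
sketch below: write the compressed data to a temporary file, measure it,
then copy it into the archive padded out to tar's 512-byte blocks. The
helper names (spool_finish, spool_copy_padded) are made up for
illustration, and a real version would emit the member's header, with the
measured size in it, between the two calls.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TAR_BLOCK 512

/* Hypothetical helper: return the number of bytes spooled so far and
 * rewind the spool for reading; -1 on error. ftell() returns a long,
 * so this sketch is limited to sizes that fit in one. */
static long spool_finish(FILE *spool)
{
    long size;

    if (fflush(spool) != 0 || fseek(spool, 0L, SEEK_END) != 0)
        return -1;
    size = ftell(spool);
    if (size < 0 || fseek(spool, 0L, SEEK_SET) != 0)
        return -1;
    return size;
}

/* Hypothetical helper: copy 'size' bytes from 'spool' to 'out' in
 * whole blocks, padding the last one with '\0' up to TAR_BLOCK bytes.
 * Returns 0 on success, -1 on error. */
static int spool_copy_padded(FILE *spool, FILE *out, long size)
{
    char buf[TAR_BLOCK];
    long left = size;

    assert(size >= 0);
    while (left > 0) {
        size_t want = left < TAR_BLOCK ? (size_t)left : (size_t)TAR_BLOCK;

        memset(buf, 0, sizeof buf);          /* zero fill gives the padding */
        if (fread(buf, 1, want, spool) != want)
            return -1;
        if (fwrite(buf, 1, sizeof buf, out) != sizeof buf)
            return -1;
        left -= (long)want;
    }
    return 0;
}
```

The spool itself could come from tmpfile(), which at least cleans up after
itself, but it is still a second pass over the data.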


----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.C.N. 008 659 498)             |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|
                                 |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/

Re: RE: [HACKERS] pg_dump & blobs - editable dump?

From
Giles Lean
Date:
> >http://www.goice.co.jp/member/mo/formats/tar.html has a nice brief

Best is to look at one of the actual standards, accessible via:

http://www.opengroup.org

The tar and cpio formats are in the pax specification.

>     136     12 bytes  Modify time (in octal ascii)
>
>     ...do you know the format of the date (seconds since 1970?).

Yes, seconds since 1970. It's just 11 bytes plus \0 in tar's usual
encode-this-as-octal format:

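#include <stddef.h>

/* Write 'value' as n zero-padded octal digits starting at p. The
 * caller adds the trailing \0 (so n is 11 for the 12-byte mtime). */
void
encode_octal(unsigned char *p, size_t n, unsigned long value)
{
    static const unsigned char octal[] = "01234567";

    while (n) {
        *(p + --n) = octal[value & 07];
        value >>= 3;
    }
}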

Warning: 11 octal digits hold up to 33 bits, so some values allowed by
tar exceed the range of 'long' on a 32 bit platform.
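
One way around the 'long' problem, as a sketch only: take the value as an
unsigned long long (assuming the compiler has one) and report values that
do not fit in the field. The name tar_put_octal and its return convention
are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: write 'value' as zero-padded octal into a tar
 * header field of 'width' bytes, the last byte being the '\0'.
 * Returns 0 on success, -1 if the value needs more digits than fit. */
static int tar_put_octal(char *field, size_t width, unsigned long long value)
{
    size_t n;

    assert(width >= 2);
    n = width - 1;              /* digits available before the '\0' */
    field[n] = '\0';
    while (n) {
        field[--n] = (char)('0' + (value & 07));
        value >>= 3;
    }
    return value == 0 ? 0 : -1; /* leftover bits mean it did not fit */
}
```

For the 12-byte size and mtime fields, width is 12, which gives the 11
digits plus \0 mentioned above.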

>     157    100 bytes  Linkname ('\0' terminated, 99 maximum length)
>
>     ...what's this? Is it the target for symlinks?

Yes, linkname holds the target of a link. Long pathnames are a separate
matter: they get split into two pieces on a '/' as I recall.

The code I offered you previously has code to do this too; I
appreciate that the code is quite likely not what you want, but you
might consider looking at it or other tar/pax code to help you
interpret the standard.
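
As a rough illustration of the split, assuming the POSIX ustar layout
(a 100-byte name field at offset 0 and a 155-byte prefix field at offset
345, inside what the listing above calls padding): the path is cut at a
'/' so that both halves fit, and the '/' itself is not stored. The helper
name and the buffer conventions below are made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TAR_NAME_LEN   100  /* name field, offset 0 */
#define TAR_PREFIX_LEN 155  /* prefix field, offset 345 */

/* Hypothetical helper: split 'path' into ustar prefix and name.
 * 'prefix' must hold TAR_PREFIX_LEN + 1 bytes and 'name' must hold
 * TAR_NAME_LEN + 1 bytes; both come back '\0' terminated.
 * Returns 0 on success, -1 if no suitable '/' exists. */
static int tar_split_path(const char *path, char *prefix, char *name)
{
    size_t len = strlen(path);
    size_t i;

    assert(prefix != NULL && name != NULL);
    prefix[0] = '\0';
    name[0] = '\0';
    if (len <= TAR_NAME_LEN) {          /* short path: no split needed */
        strcpy(name, path);
        return 0;
    }
    /* Start at the earliest '/' position that leaves a name of at most
     * TAR_NAME_LEN bytes; stop once the prefix would be too long or
     * the name would be empty. */
    for (i = len - TAR_NAME_LEN - 1; i < len - 1 && i <= TAR_PREFIX_LEN; i++) {
        if (path[i] == '/' && i > 0) {
            memcpy(prefix, path, i);
            prefix[i] = '\0';
            strcpy(name, path + i + 1);
            return 0;
        }
    }
    return -1;
}
```

A path with no '/' in the right place simply cannot be stored this way,
which is one reason the GNU and POSIX formats handle long names
differently.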

>     329      8 bytes  Major device ID (in octal ascii)
>     337      8 bytes  Minor device ID (in octal ascii)
>     345    167 bytes  Padding
>
>     ...and what should I set these to?

Zero.
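
For a plain file that might look like the sketch below, which assumes the
POSIX ustar layout (the offsets match the ones quoted above, with the
first 155 bytes of the "padding" being the prefix field). The struct and
function names are made up, and the name, mode, size, mtime and checksum
fields are left for the caller to fill in.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed ustar header layout; every member is a char array, so the
 * compiler inserts no padding and the offsets are as commented. */
struct ustar_header {
    char name[100];      /* offset   0 */
    char mode[8];        /* offset 100 */
    char uid[8];         /* offset 108 */
    char gid[8];         /* offset 116 */
    char size[12];       /* offset 124 */
    char mtime[12];      /* offset 136 */
    char chksum[8];      /* offset 148 */
    char typeflag;       /* offset 156 */
    char linkname[100];  /* offset 157 */
    char magic[6];       /* offset 257 */
    char version[2];     /* offset 263 */
    char uname[32];      /* offset 265 */
    char gname[32];      /* offset 297 */
    char devmajor[8];    /* offset 329 */
    char devminor[8];    /* offset 337 */
    char prefix[155];    /* offset 345 */
    char pad[12];        /* offset 500 */
};

/* Hypothetical helper: clear a header and fill in the fields that are
 * the same for every regular file, including zero device numbers. */
static void ustar_init_regular(struct ustar_header *h)
{
    assert(sizeof *h == 512);
    memset(h, 0, sizeof *h);
    h->typeflag = '0';                   /* regular file */
    memcpy(h->magic, "ustar", 6);        /* copies the '\0' too */
    memcpy(h->version, "00", 2);         /* no terminator in this field */
    memcpy(h->devmajor, "0000000", 8);   /* zero, in octal ascii */
    memcpy(h->devminor, "0000000", 8);
}
```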

> If you're serious about the offer, I'd be happy. But, given how simple the
> format is, I can probably tack it into place myself.

For the very limited formats you want to create, that's probably
the easiest way.  You don't care about unpacking, GNU v. POSIX format,
device files, etc etc.

> There is a minor problem. Currently I compress the output stream as I
> receive it from PG, and send it to the archive. I don't know how big it
> will be until it is written. The custom output format can handle this, but
> in streaming a tar file to tape, I have to know the file size first. This
> means writing to /tmp. I suppose that's OK, but I've been trying to
> avoid it.

I recommend you compress the whole stream, not the pieces.  Presumably
you can determine the size of the pieces you're backing up, and ending
with a .tar.gz (or whatever) file is more convenient to manage than a
.tar file of compressed pieces unless you really expect people to be
extracting individual files from the backup very often.

Having to pass everything through /tmp would be really unfortunate.

Regards,

Giles

Re: RE: [HACKERS] pg_dump & blobs - editable dump?

From
Philip Warner
Date:
At 07:58 13/07/00 +1000, Giles Lean wrote:
>
>I recommend you compress the whole stream, not the pieces.  Presumably
>you can determine the size of the pieces you're backing up, and ending
>with a .tar.gz (or whatever) file is more convenient to manage than a
>.tar file of compressed pieces unless you really expect people to be
>extracting individual files from the backup very often.
>
>Having to pass everything through /tmp would be really unfortunate.
>

The only things I compress are the table data and the blobs (ie. the big
things); unfortunately, the table data is of unknown uncompressed size. I
*could* do two 'COPY TO STDOUT' calls, just to get the size, but that seems
like a very bad idea.

