Thread: pg_dump -Fd must create directory
Hi all!

Why must pg_dump create a fresh new directory every time? I'm running some tests where I dump a database to a directory, git init and git add --all, then dump again. When I did that after making some modifications (specifically, creating a new table and adding a few hundred thousand records), git status told me some files had been renamed and others removed. I worked around this by dumping to a new directory, then moving .git manually.

My real-world use case is a largish database (46 GiB gzip'd dump) that I dump every few hours, and some tables are essentially static. I was thinking of saving some network traffic by transferring only the delta.

Any thoughts on this? Is this behaviour something that can change, or be made to change? Where do the IDs used in the dump come from? Can they be made stable?

Thanks!
François Beausoleil

$ cat a.sh
PGDATABASE=${USER}
rm -rf thedump thedump2
psql -c "select * into a from generate_series(1, 100000, 1) as t1(a)"
pg_dump -Fd --file=thedump
cd thedump && git init && git add --all . && git commit -m "initial" && cd ..
psql -c "select * into b from generate_series(1, 100000, 1) as t1(b)"
pg_dump -Fd --file=thedump2
mv thedump/.git thedump2/
cd thedump2
git status --short

$ sh a.sh
SELECT 100000
Initialized empty Git repository in /root/tmp/thedump/.git/
[master (root-commit) 9f8dc9f] initial
 2 files changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 1882.dat.gz
 create mode 100644 toc.dat
SELECT 100000
 D 1882.dat.gz
 M toc.dat
?? 1886.dat.gz
?? 1887.dat.gz
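P.S. The numeric file names appear to match the dump IDs in the archive's table of contents, which pg_restore can list. The output line below is illustrative, trimmed to the data entries from my run:

$ pg_restore --list thedump | grep "TABLE DATA"
1882; 0 16386 TABLE DATA public a root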
It looks to me like you're misusing git. You should only git init once, and always reuse that directory. Then pg_dump, which should create one file per database with the file name you've specified. Not sure of the flags, but I'd recommend the plain text format; a rough sketch of what I mean is below the quote.

I'm also unsure what you mean by network traffic, as you don't mention a remote repository, but there are nice visual tools for you to see the changes to files between your committed objects. Re-running git init will more than likely lose all changes to files.

-ds

On 9/12/12 5:12 AM, François Beausoleil wrote:
> Hi all!
>
> Why must pg_dump create a fresh new directory every time? I'm running some tests where I dump a database to a directory, git init and git add --all, then dump again. When I did that after making some modifications (specifically, creating a new table and adding a few hundred thousand records), git status told me some files had been renamed and others removed. I worked around this by dumping to a new directory, then moving .git manually.
>
> My real-world use case is a largish database (46 GiB gzip'd dump) that I dump every few hours, and some tables are essentially static. I was thinking of saving some network traffic by transferring only the delta.
>
> Any thoughts on this? Is this behaviour something that can change, or be made to change? Where do the IDs used in the dump come from? Can they be made stable?
>
> Thanks!
> François Beausoleil
>
> $ cat a.sh
> PGDATABASE=${USER}
> rm -rf thedump thedump2
> psql -c "select * into a from generate_series(1, 100000, 1) as t1(a)"
> pg_dump -Fd --file=thedump
> cd thedump && git init && git add --all . && git commit -m "initial" && cd ..
> psql -c "select * into b from generate_series(1, 100000, 1) as t1(b)"
> pg_dump -Fd --file=thedump2
> mv thedump/.git thedump2/
> cd thedump2
> git status --short
>
> $ sh a.sh
> SELECT 100000
> Initialized empty Git repository in /root/tmp/thedump/.git/
> [master (root-commit) 9f8dc9f] initial
>  2 files changed, 0 insertions(+), 0 deletions(-)
>  create mode 100644 1882.dat.gz
>  create mode 100644 toc.dat
> SELECT 100000
>  D 1882.dat.gz
>  M toc.dat
> ?? 1886.dat.gz
> ?? 1887.dat.gz
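Roughly the plain-text workflow I have in mind (the paths, file name, and commit message are just illustrative):

$ mkdir dbdump && cd dbdump && git init    # one-time setup
$ pg_dump -Fp --file=dump.sql              # each run overwrites the same file
$ git add dump.sql && git commit -m "dump $(date -u +%Y-%m-%dT%H:%M:%SZ)"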
On 2012-09-13, at 16:51, David Salisbury wrote:
>
> It looks to me like you're misusing git. You should only git init
> once, and always reuse that directory. Then pg_dump, which should
> create one file per database with the file name you've specified.
> Not sure of the flags, but I'd recommend the plain text format.
>
> I'm also unsure what you mean by network traffic, as you don't
> mention a remote repository, but there are nice visual tools
> for you to see the changes to files between your committed
> objects. Re-running git init will more than likely lose all
> changes to files.

I was just running a test: looking at a way to transfer large amounts of data for backup purposes with a tool that's especially suited for deltas. I know about rsync, but this was a thought experiment. I was only surprised by pg_dump's restriction that it must create a new directory every time, and was looking for a rationale.

Also, git init is a safe operation: within an existing repository, git init says it reinitialized, but does not lose files. Haven't tried with local changes, or a dirty index.

Finally, when NOT using the plain text format, pg_restore can restore more than one table at a time, using the --jobs flag. On a multi-core, multi-spindle machine, this can cut down the restore time tremendously. A quick sketch follows my signature.

Bye,
François
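P.S. What the parallel restore looks like, assuming a directory-format dump in thedump and an empty target database (the name restored_db is illustrative):

$ createdb restored_db
$ pg_restore --jobs=4 --dbname=restored_db thedump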