Thread: pg_dump -Fd must create directory
Hi all!

Why must pg_dump create a fresh new directory every time? I'm running some tests where I dump a database to a directory, git init and git add --all, then dump again. When I did that after making some modifications (specifically, creating a new table and adding a few hundred thousand records), git status told me some files had been renamed and others removed. I worked around this by dumping to a new directory, then moving .git manually.

My real-world use case is a largish database (46 GiB gzip'd dump) that I dump every few hours, and some tables are essentially static. I was thinking of saving some network traffic by transferring only the delta.

Any thoughts on this? Is this behaviour something that can change, or be made to change? Where do the IDs used in the dump come from? Can they be made stable?

Thanks!
François Beausoleil

$ cat a.sh
PGDATABASE=${USER}
rm -rf thedump thedump2
psql -c "select * into a from generate_series(1, 100000, 1) as t1(a)"
pg_dump -Fd --file=thedump
cd thedump && git init && git add --all . && git commit -m "initial" && cd ..
psql -c "select * into b from generate_series(1, 100000, 1) as t1(b)"
pg_dump -Fd --file=thedump2
mv thedump/.git thedump2/
cd thedump2
git status --short

$ sh a.sh
SELECT 100000
Initialized empty Git repository in /root/tmp/thedump/.git/
[master (root-commit) 9f8dc9f] initial
 2 files changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 1882.dat.gz
 create mode 100644 toc.dat
SELECT 100000
 D 1882.dat.gz
 M toc.dat
?? 1886.dat.gz
?? 1887.dat.gz
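P.S. The numeric file names appear to match the dump IDs in the archive's table of contents, which pg_restore can list. The output line below is illustrative, trimmed to the data entries from my run:

$ pg_restore --list thedump | grep "TABLE DATA"
1882; 0 16386 TABLE DATA public a root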
It looks to me like you're misusing git. You should only git init once, and always reuse that directory. Then pg_dump, which should create one file per database with the file name you've specified. Not sure of the flags, but I'd recommend the plain text format; a rough sketch of what I mean is below the quote.

I'm also unsure what you mean by network traffic, as you don't mention a remote repository, but there are nice visual tools for you to see the changes to files between your committed objects. Re-running git init will more than likely lose all changes to files.

-ds

On 9/12/12 5:12 AM, François Beausoleil wrote:
> Hi all!
>
> Why must pg_dump create a fresh new directory every time? I'm running some tests where I dump a database to a directory, git init and git add --all, then dump again. When I did that after making some modifications (specifically, creating a new table and adding a few hundred thousand records), git status told me some files had been renamed and others removed. I worked around this by dumping to a new directory, then moving .git manually.
>
> My real-world use case is a largish database (46 GiB gzip'd dump) that I dump every few hours, and some tables are essentially static. I was thinking of saving some network traffic by transferring only the delta.
>
> Any thoughts on this? Is this behaviour something that can change, or be made to change? Where do the IDs used in the dump come from? Can they be made stable?
>
> Thanks!
> François Beausoleil
>
> $ cat a.sh
> PGDATABASE=${USER}
> rm -rf thedump thedump2
> psql -c "select * into a from generate_series(1, 100000, 1) as t1(a)"
> pg_dump -Fd --file=thedump
> cd thedump && git init && git add --all . && git commit -m "initial" && cd ..
> psql -c "select * into b from generate_series(1, 100000, 1) as t1(b)"
> pg_dump -Fd --file=thedump2
> mv thedump/.git thedump2/
> cd thedump2
> git status --short
>
> $ sh a.sh
> SELECT 100000
> Initialized empty Git repository in /root/tmp/thedump/.git/
> [master (root-commit) 9f8dc9f] initial
>  2 files changed, 0 insertions(+), 0 deletions(-)
>  create mode 100644 1882.dat.gz
>  create mode 100644 toc.dat
> SELECT 100000
>  D 1882.dat.gz
>  M toc.dat
> ?? 1886.dat.gz
> ?? 1887.dat.gz
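Roughly the plain-text workflow I have in mind (the paths, file name, and commit message are just illustrative):

$ mkdir dbdump && cd dbdump && git init    # one-time setup
$ pg_dump -Fp --file=dump.sql              # each run overwrites the same file
$ git add dump.sql && git commit -m "dump $(date -u +%Y-%m-%dT%H:%M:%SZ)"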
On 2012-09-13, at 16:51, David Salisbury wrote:
>
> It looks to me like you're misusing git. You should only git init
> once, and always reuse that directory. Then pg_dump, which should
> create one file per database with the file name you've specified.
> Not sure of the flags, but I'd recommend the plain text format.
>
> I'm also unsure what you mean by network traffic, as you don't
> mention a remote repository, but there are nice visual tools
> for you to see the changes to files between your committed
> objects. Re-running git init will more than likely lose all
> changes to files.

I was just running a test: looking at a way to transfer large amounts of data for backup purposes with a tool that's especially suited for deltas. I know about rsync, but this was a thought experiment. I was only surprised by pg_dump's restriction that it must create a new directory every time, and was looking for a rationale.

Also, git init is a safe operation: within an existing repository, git init says it reinitialized, but does not lose files. Haven't tried with local changes, or a dirty index.

Finally, when NOT using the plain text format, pg_restore can restore more than one table at a time, using the --jobs flag. On a multi-core, multi-spindle machine, this can cut down the restore time tremendously. A quick sketch follows my signature.

Bye,
François
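P.S. What the parallel restore looks like, assuming a directory-format dump in thedump and an empty target database (the name restored_db is illustrative):

$ createdb restored_db
$ pg_restore --jobs=4 --dbname=restored_db thedump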