Thread: Reload only specific databases from pg_dumpall

Reload only specific databases from pg_dumpall

From
Guido Neitzer
Date:
Hi.

Is it possible to reload only specific databases from a file created
by pg_dumpall?

Thanks,
cug

Re: Reload only specific databases from pg_dumpall

From
"Gurjeet Singh"
Date:
What was the output format option used (-f option) ? Was it the plain-text (SQL) or custom format?

Best regards

On Feb 4, 2008 2:45 AM, Guido Neitzer <lists@event-s.net> wrote:
Hi.

Is it possible to reload only specific databases from a file created
by pg_dumpall?

Thanks,
cug

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
      choose an index scan if your joining column's datatypes do not
      match



--
gurjeet[.singh]@EnterpriseDB.com
singh.gurjeet@{ gmail | hotmail | indiatimes | yahoo }.com

EnterpriseDB      http://www.enterprisedb.com

17° 29' 34.37"N,   78° 30' 59.76"E - Hyderabad
18° 32' 57.25"N,   73° 56' 25.42"E - Pune
37° 47' 19.72"N, 122° 24' 1.69" W - San Francisco *

http://gurjeet.frihost.net

Mail sent from my BlackLaptop device

Re: Reload only specific databases from pg_dumpall

From
Guido Neitzer
Date:
On 04.02.2008, at 10:41, Gurjeet Singh wrote:

> What was the output format option used (-f option) ? Was it the
> plain-text (SQL) or custom format?

I cannot see a -f option on pg_dumpall. This is the command:

pg_dumpall > `date "+%y-%m-%d"`.sql

I just want to use an older file from a dump to restore a server, but
I don't want to reload all databases (because that will take way
longer).

cug

Re: Reload only specific databases from pg_dumpall

From
"Gurjeet Singh"
Date:
Sorry couldn't respond earlier...

Yeah, there's no -f option to pg_dumpall, I confused it with pg_dump's -F option.

Since the output of dumpall is plain SQL, since and you would use psql to restore the DB, there's no command line option to execute only a part of the script.

Long story short: you have to manually extract the contents of your DB from the dump file.

Here's what I did: created 3 databases test{1,2,3}. Created single table in each of them. And here's what I see in the head of the dump:

REVOKE ALL ON DATABASE template1 FROM PUBLIC;
REVOKE ALL ON DATABASE template1 FROM gsingh;
GRANT ALL ON DATABASE template1 TO gsingh;
GRANT CONNECT ON DATABASE template1 TO PUBLIC;
CREATE DATABASE test1 WITH TEMPLATE = template0 OWNER = gsingh ENCODING = 'UTF8';
CREATE DATABASE test2 WITH TEMPLATE = template0 OWNER = gsingh ENCODING = 'UTF8';
CREATE DATABASE test3 WITH TEMPLATE = template0 OWNER = gsingh ENCODING = 'UTF8';

So lets say we want to restore DB test2... here's how I would go about it:

Take that dump, remove all other 'CREATE DATABASE' commands except for the one for test2. Search for string 'test2'; I get to the following line:

\connect test2

delete everything between the a.m 'CREATE DATABASE' command and this \connect command.

Since there's another DB after test2 (we saw the order in 'CREATE DATABASE' commands, remember ), so I search for the next '\connect' command. I find this:

\connect test3

Form this line on, I delete everything from the file. And I am done.

Now I run:

 psql -p 5444 -f ~/08-02-04.sql -d postgres

And my test2 DB is resored.

psql -p 5444 -f ~/08-02-04.sql -d test2 -c "select count(*) from test;"
 count
-------
   100
(1 row)

HTH,

Best regards,

On Feb 4, 2008 10:54 AM, Guido Neitzer <lists@event-s.net> wrote:
On 04.02.2008, at 10:41, Gurjeet Singh wrote:

> What was the output format option used (-f option) ? Was it the
> plain-text (SQL) or custom format?

I cannot see a -f option on pg_dumpall. This is the command:

pg_dumpall > `date "+%y-%m-%d"`.sql

I just want to use an older file from a dump to restore a server, but
I don't want to reload all databases (because that will take way
longer).

cug



--
gurjeet[.singh]@EnterpriseDB.com
singh.gurjeet@{ gmail | hotmail | indiatimes | yahoo }.com

EnterpriseDB      http://www.enterprisedb.com

17° 29' 34.37"N,   78° 30' 59.76"E - Hyderabad
18° 32' 57.25"N,   73° 56' 25.42"E - Pune
37° 47' 19.72"N, 122° 24' 1.69" W - San Francisco *

http://gurjeet.frihost.net

Mail sent from my BlackLaptop device

Re: Reload only specific databases from pg_dumpall

From
"Gurjeet Singh"
Date:
I understand it all needs a little bit of 'vi' wizardry, (or whichever editor you are using). Also, as with all open-source suggestions, do not rely on this procedure until you understand what and how it does what it does.

Best regards,

On Feb 4, 2008 4:39 PM, Gurjeet Singh <singh.gurjeet@gmail.com> wrote:
Sorry couldn't respond earlier...

Yeah, there's no -f option to pg_dumpall, I confused it with pg_dump's -F option.

Since the output of dumpall is plain SQL, since and you would use psql to restore the DB, there's no command line option to execute only a part of the script.

Long story short: you have to manually extract the contents of your DB from the dump file.

Here's what I did: created 3 databases test{1,2,3}. Created single table in each of them. And here's what I see in the head of the dump:

REVOKE ALL ON DATABASE template1 FROM PUBLIC;
REVOKE ALL ON DATABASE template1 FROM gsingh;
GRANT ALL ON DATABASE template1 TO gsingh;
GRANT CONNECT ON DATABASE template1 TO PUBLIC;
CREATE DATABASE test1 WITH TEMPLATE = template0 OWNER = gsingh ENCODING = 'UTF8';
CREATE DATABASE test2 WITH TEMPLATE = template0 OWNER = gsingh ENCODING = 'UTF8';
CREATE DATABASE test3 WITH TEMPLATE = template0 OWNER = gsingh ENCODING = 'UTF8';

So lets say we want to restore DB test2... here's how I would go about it:

Take that dump, remove all other 'CREATE DATABASE' commands except for the one for test2. Search for string 'test2'; I get to the following line:

\connect test2

delete everything between the a.m 'CREATE DATABASE' command and this \connect command.

Since there's another DB after test2 (we saw the order in 'CREATE DATABASE' commands, remember ), so I search for the next '\connect' command. I find this:

\connect test3

Form this line on, I delete everything from the file. And I am done.

Now I run:

 psql -p 5444 -f ~/08-02-04.sql -d postgres

And my test2 DB is resored.

psql -p 5444 -f ~/08-02-04.sql -d test2 -c "select count(*) from test;"
 count
-------
   100
(1 row)

HTH,

Best regards,


On Feb 4, 2008 10:54 AM, Guido Neitzer <lists@event-s.net> wrote:
On 04.02.2008, at 10:41, Gurjeet Singh wrote:

> What was the output format option used (-f option) ? Was it the
> plain-text (SQL) or custom format?

I cannot see a -f option on pg_dumpall. This is the command:

pg_dumpall > `date "+%y-%m-%d"`.sql

I just want to use an older file from a dump to restore a server, but
I don't want to reload all databases (because that will take way
longer).

cug



--
gurjeet[.singh]@EnterpriseDB.com
singh.gurjeet@{ gmail | hotmail | indiatimes | yahoo }.com

EnterpriseDB      http://www.enterprisedb.com

17° 29' 34.37"N,   78° 30' 59.76"E - Hyderabad
18° 32' 57.25"N,   73° 56' 25.42"E - Pune
37° 47' 19.72"N, 122° 24' 1.69" W - San Francisco *

http://gurjeet.frihost.net

Mail sent from my BlackLaptop device



--
gurjeet[.singh]@EnterpriseDB.com
singh.gurjeet@{ gmail | hotmail | indiatimes | yahoo }.com

EnterpriseDB      http://www.enterprisedb.com

17° 29' 34.37"N,   78° 30' 59.76"E - Hyderabad
18° 32' 57.25"N,   73° 56' 25.42"E - Pune
37° 47' 19.72"N, 122° 24' 1.69" W - San Francisco *

http://gurjeet.frihost.net

Mail sent from my BlackLaptop device

Re: Reload only specific databases from pg_dumpall

From
Guido Neitzer
Date:
On 04.02.2008, at 18:00, Gurjeet Singh wrote:

> I understand it all needs a little bit of 'vi' wizardry, (or
> whichever editor you are using). Also, as with all open-source
> suggestions, do not rely on this procedure until you understand what
> and how it does what it does.

Sure, the only problem is, I'm talking about a laaaarge file. Maybe I
just import the whole bunch and drop some of the databases after that.
Seems to be less hazzle, but takes probably an hour or two longer, so
I'll take a service window next night ...

Thanks for the hints.

cug

Re: Reload only specific databases from pg_dumpall

From
"Gurjeet Singh"
Date:
Well... I assumed that'd be the case; production dumps are not small... this calls for some scripting stuff; the script should basically the steps mentioned above, and you will have a trimmed down version of the dump file... :)

But again, the script should be fast enough to make this additional step look smaller than the time required to restore+drop unwanted DBs.

Best regards,

On Feb 4, 2008 5:10 PM, Guido Neitzer <lists@event-s.net> wrote:
On 04.02.2008, at 18:00, Gurjeet Singh wrote:

> I understand it all needs a little bit of 'vi' wizardry, (or
> whichever editor you are using). Also, as with all open-source
> suggestions, do not rely on this procedure until you understand what
> and how it does what it does.

Sure, the only problem is, I'm talking about a laaaarge file. Maybe I
just import the whole bunch and drop some of the databases after that.
Seems to be less hazzle, but takes probably an hour or two longer, so
I'll take a service window next night ...

Thanks for the hints.

cug



--
gurjeet[.singh]@EnterpriseDB.com
singh.gurjeet@{ gmail | hotmail | indiatimes | yahoo }.com

EnterpriseDB      http://www.enterprisedb.com

17° 29' 34.37"N,   78° 30' 59.76"E - Hyderabad
18° 32' 57.25"N,   73° 56' 25.42"E - Pune
37° 47' 19.72"N, 122° 24' 1.69" W - San Francisco *

http://gurjeet.frihost.net

Mail sent from my BlackLaptop device

Re: Reload only specific databases from pg_dumpall

From
"Scott Marlowe"
Date:
On Feb 4, 2008 7:10 PM, Guido Neitzer <lists@event-s.net> wrote:
> On 04.02.2008, at 18:00, Gurjeet Singh wrote:
>
> > I understand it all needs a little bit of 'vi' wizardry, (or
> > whichever editor you are using). Also, as with all open-source
> > suggestions, do not rely on this procedure until you understand what
> > and how it does what it does.
>
> Sure, the only problem is, I'm talking about a laaaarge file. Maybe I
> just import the whole bunch and drop some of the databases after that.
> Seems to be less hazzle, but takes probably an hour or two longer, so
> I'll take a service window next night ...
>
> Thanks for the hints.

Here's what I'd do.  First, use head, tail, and grep to find the lines
you need to cut at...

for instance, the create database statements will come up first, so
something like this:

For a test from dumpall I did this:

$ grep -in "Create database"  test.sql
34:CREATE DATABASE smarlowe WITH TEMPLATE = template0 OWNER = postgres
ENCODING = 'LATIN1';

That pulls the create database line out.  Then, I can find the dump
data with this:

$ grep -in \\connect test.sql
5:\connect postgres
38:GRANT CONNECT ON DATABASE template1 TO PUBLIC;
41:\connect postgres
141:\connect smarlowe
335:\connect template1

So, for the smarlowe database I need to get lines 141 to 334.  Quick test:

head -n 141 test.sql |tail -n 1
\connect smarlowe
 head -n 335 test.sql|tail -n 1
\connect template1

So, the lines are where They should be and head / tail lets me grab
them.  Now, to grab the bits I need:

echo $((335-141))
194  -- Number of lines to grab

So, to grab the dump, I can use

$ head -n 334 test.sql|tail -n 194 > smarlowe.sql

Note I'm grabbing 335-1 since I don't actually want the \connect template1 bit.

Quick test:
$ head -n 334 test.sql|tail -n 194|head -n 1
\connect smarlowe
$ head -n 334 test.sql|tail -n 194|tail -n 4
--
-- PostgreSQL database dump complete
--

Re: Reload only specific databases from pg_dumpall

From
"Gurjeet Singh"
Date:
Nice...

Can this be reliably put into a script? The script would take only the DB name as parameter. And one gotcha I woud look out for is to see the CREATE DB's template= is nothing but template0, else warn the user that the DB may not be reliably restored.

Best regards,

On Feb 4, 2008 6:03 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
On Feb 4, 2008 7:10 PM, Guido Neitzer <lists@event-s.net> wrote:
> On 04.02.2008, at 18:00, Gurjeet Singh wrote:
>
> > I understand it all needs a little bit of 'vi' wizardry, (or
> > whichever editor you are using). Also, as with all open-source
> > suggestions, do not rely on this procedure until you understand what
> > and how it does what it does.
>
> Sure, the only problem is, I'm talking about a laaaarge file. Maybe I
> just import the whole bunch and drop some of the databases after that.
> Seems to be less hazzle, but takes probably an hour or two longer, so
> I'll take a service window next night ...
>
> Thanks for the hints.

Here's what I'd do.  First, use head, tail, and grep to find the lines
you need to cut at...

for instance, the create database statements will come up first, so
something like this:

For a test from dumpall I did this:

$ grep -in "Create database"  test.sql
34:CREATE DATABASE smarlowe WITH TEMPLATE = template0 OWNER = postgres
ENCODING = 'LATIN1';

That pulls the create database line out.  Then, I can find the dump
data with this:

$ grep -in \\connect test.sql
5:\connect postgres
38:GRANT CONNECT ON DATABASE template1 TO PUBLIC;
41:\connect postgres
141:\connect smarlowe
335:\connect template1

So, for the smarlowe database I need to get lines 141 to 334.  Quick test:

head -n 141 test.sql |tail -n 1
\connect smarlowe
 head -n 335 test.sql|tail -n 1
\connect template1

So, the lines are where They should be and head / tail lets me grab
them.  Now, to grab the bits I need:

echo $((335-141))
194  -- Number of lines to grab

So, to grab the dump, I can use

$ head -n 334 test.sql|tail -n 194 > smarlowe.sql

Note I'm grabbing 335-1 since I don't actually want the \connect template1 bit.

Quick test:
$ head -n 334 test.sql|tail -n 194|head -n 1
\connect smarlowe
$ head -n 334 test.sql|tail -n 194|tail -n 4
--
-- PostgreSQL database dump complete
--



--
gurjeet[.singh]@EnterpriseDB.com
singh.gurjeet@{ gmail | hotmail | indiatimes | yahoo }.com

EnterpriseDB      http://www.enterprisedb.com

17° 29' 34.37"N,   78° 30' 59.76"E - Hyderabad
18° 32' 57.25"N,   73° 56' 25.42"E - Pune
37° 47' 19.72"N, 122° 24' 1.69" W - San Francisco *

http://gurjeet.frihost.net

Mail sent from my BlackLaptop device

Re: Reload only specific databases from pg_dumpall

From
"Andrej Ricnik-Bay"
Date:
Or, for the heck of it:
sed -n -e '/DATABASE smarlowe/p' -e  '141,334p'  test.sql > clean.sql

;}



Cheers,
Andrej

--
Please don't top post, and don't use HTML e-Mail :}  Make your quotes concise.

http://www.american.edu/econ/notes/htmlmail.htm

Re: Reload only specific databases from pg_dumpall

From
"Andrej Ricnik-Bay"
Date:
On 05/02/2008, Scott Marlowe <scott.marlowe@gmail.com> wrote:
And a more generic version :}

--------8<--------8<--------8<--------8<--------8<--------8<--------8<--------8<--------
#!/bin/bash
# split.sh:  a shell script and wrapper for some (g)awk to extract a single
#            database out of a dumpall file - a quick and ugly hack, as
#            usual no other warranty as that it worked with my test-case. :}
#
# It relies on GNU awk > 3, may work with nawk/mawk, but I didn't test that.
# The old traditional awk as shipped with Solaris as the default will most
# definitely barf on this one.
#
# The reason for the writing out of the awk-script to /tmp is the invocation;
# I couldn't figure out a way to pass the filename to the BEGIN part other
# than via a variable, and I couldn't stand the thought of having to enter
# it manually on the command line twice. :} And I didn't like the idea of
# writing/saving two separate scripts - shoot me :D
#
# It creates two temporary files that it hopefully wipes after a
# successful run.

# hacked up by andrej
function usage {
echo "Usage: $0 databasename inputfile outputfile"
echo ""
echo "       where database is the name of the database you want to isolate"
echo "       out of the dump-file, inputfile is the file generated by
pg_dumpall"
echo "       from which you want to extract a single database, and
outputfile is"
echo "       the target file you want to write the extracted data to"
echo ""
}

if [ $# -ne 3 ]; then
        usage
        exit 1
fi


database=$1
input=$2
output=$3
pid=$$
temp=/tmp/awk.prog.$pid
cat > $temp <<\END
BEGIN{
  system( "fgrep -in \"\\connect \" " file "> /tmp/outPut" )
  while( getline line < "/tmp/outPut" > 0 ){
    count++
    numbers[count]=line
  }
  for (i=1; i<=count;i++ ){
    if ( numbers[i] ~ db ){
      start = gensub(/([0-9]+):.+/, "\\1", "g",  numbers[i])
      stop = gensub(/([0-9]+):.+/, "\\1", "g",  numbers[i+1]) - 1
    }
  }
  matchdb="CREATE DATABASE "db".+;"
}
{
  if( $0 ~ matchdb  ){
    print
  }
  if(( NR >= int(start) ) &&( NR <= int( stop ) ) ){
    print
  }
}
END

sed -i "s/outPut/outPut.$pid/" /tmp/awk.prog.$pid
awk -v file=$input -v db=$database -f /tmp/awk.prog.$pid $input > $output
rm /tmp/awk.prog.$pid /tmp/outPut.$pid
--------8<--------8<--------8<--------8<--------8<--------8<--------8<--------8<--------



Cheers,
Andrej

Re: Reload only specific databases from pg_dumpall

From
Guido Neitzer
Date:
On 06.02.2008, at 19:41, Andrej Ricnik-Bay wrote:

> And a more generic version :}

Thanks for that!

I had solved my problem as I saw that actually importing all database
from the dump file was faster than compressing the dump file (on a 4
core Woodcrest). So it wasn't that big of a deal that I thought it
would be.

Nevertheless good to have that.

cug