Thread: Schemas, databases, and backups

Schemas, databases, and backups

From: Thomas Harold
According to my reading of the pgsql documentation, the two basic backup
scripts are pg_dump and pg_dumpall.  pg_dump allows you to dump a single
database to a file, while pg_dumpall dumps all of the databases to a
single file.

Currently, we use MSSQL's built-in backup facility.  That allows us,
with a single command, to dump every database to separate files on a
daily basis (and we keep 14-days online).  That makes recovering from a
glitch in one of the databases very easy, and it's rather simple to go
back to a particular day.

Also, schemas are new to us, so I'm still thinking about how they will
affect our processes and databases.

(I'm betting that the ultimate answer is going to be to look for some
3rd party tool in pgFoundry.)

So, now for the questions:

1) Is there a tool (or is this easily scripted in bash?) that would
iterate through the databases in pgsql and dump them to individual
files?  I'm guessing that we would query pg_database and dump the
database names to a file (how?) and then parse that to feed to pg_dump
(I can figure this bit out myself).

2) What if I wanted to dump individual schemas?  Is this dangerous / not
recommended?  (Probably not... if I can have relationships between
tables in different schemas?)

Re: Schemas, databases, and backups

From: Jaime Casanova
On 11/24/05, Thomas Harold <tgh@tgharold.com> wrote:
> According to my reading of the pgsql documentation, the two basic backup
> scripts are pg_dump and pg_dumpall.  pg_dump allows you to dump a single
> database to a file, while pg_dumpall dumps all of the databases to a
> single file.
>
> Currently, we use MSSQL's built-in backup facility.  That allows us,
> with a single command, to dump every database to separate files on a
> daily basis (and we keep 14-days online).  That makes recovering from a
> glitch in one of the databases very easy, and it's rather simple to go
> back to a particular day.
>
> Also, schemas are new to us, so I'm still thinking about how they will
> affect our processes and databases.
>

If you want to run queries that retrieve info from more than one
database at the same time, think about using schemas instead of
separate databases.
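
For example (the database and schema names here are made up), two
schemas in one database can be joined in a single query, which you
can't do across two separate databases:

psql -d mydb -U postgres -c "
    select c.name, o.total
    from   sales.orders o
    join   crm.customers c on c.id = o.customer_id;"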

> (I'm betting that the ultimate answer is going to be to look for some
> 3rd party tool in pgFoundry.)
>
> So, now for the questions:
>
> 1) Is there a tool (or is this easily scripted in bash?) that would
> iterate through the databases in pgsql and dump them to individual
> files?  I'm guessing that we would query pg_database and dump the
> database names to a file (how?) and then parse that to feed to pg_dump
> (I can figure this bit out myself).
>

psql -At -d template1 -U postgres -c "select datname from pg_database
where datname not in ('template1', 'template0', 'postgres');" | while
read D; do ... ; done

or something like that in a shell script, and then simply pg_dump $D...
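
Fleshed out, a minimal standalone version might look like this (the
/backup path is just an example, and it assumes the postgres user can
connect without a password prompt):

#!/bin/bash
# Sketch: dump every user database to its own datestamped file.
STAMP=`date +%Y%m%d`
psql -At -d template1 -U postgres -c \
    "select datname from pg_database where datname not in ('template0', 'template1', 'postgres');" |
while read D; do
    pg_dump -U postgres "$D" > "/backup/$D-$STAMP.sql"
done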

> 2) What if I wanted to dump individual schemas?  Is this dangerous / not
> recommended?  (Probably not... if I can have relationships between
> tables in different schemas?)
>

dunno


--
Sincerely,
Jaime Casanova
(DBA: DataBase Aniquilator ;)

Re: Schemas, databases, and backups

From: Martin Foster
Thomas Harold wrote:
> According to my reading of the pgsql documentation, the two basic backup
> scripts are pg_dump and pg_dumpall.  pg_dump allows you to dump a single
> database to a file, while pg_dumpall dumps all of the databases to a
> single file.
>
> Currently, we use MSSQL's built-in backup facility.  That allows us,
> with a single command, to dump every database to separate files on a
> daily basis (and we keep 14-days online).  That makes recovering from a
> glitch in one of the databases very easy, and it's rather simple to go
> back to a particular day.
>
> Also, schemas are new to us, so I'm still thinking about how they will
> affect our processes and databases.
>
> (I'm betting that the ultimate answer is going to be to look for some
> 3rd party tool in pgFoundry.)
>
> So, now for the questions:
>
> 1) Is there a tool (or is this easily scripted in bash?) that would
> iterate through the databases in pgsql and dump them to individual
> files?  I'm guessing that we would query pg_database and dump the
> database names to a file (how?) and then parse that to feed to pg_dump
> (I can figure this bit out myself).
>
> 2) What if I wanted to dump individual schemas?  Is this dangerous / not
> recommended?  (Probably not... if I can have relationships between
> tables in different schemas?)
>

It's pretty trivial to code in a bash script, and I've done it for a
while now.  Since these are primarily command-line tools, you can
easily pipe the output to compress the files, sort them by date, and
make your life a bit easier overall.

I've included a copy of mine, which was used to back up from another
server.  It automatically stores the dumps in monthly folders, with
every day inside, for ease of retrieval.  It's commented, makes use of
variables to speed things along, and to my knowledge is still set up
to back up the databases listed.

Please note, however, that this script has not been used in a while;
the servers it was intended to work on are no longer operational, so
any passwords or addresses in it are of no consequence!

    Martin Foster
    Creator/Designer Ethereal Realms
    martin@ethereal-realms.org

#!/bin/bash
#######################################################################
# PG_BACKUP.SHL
#
# Created       : Martin Foster
# Modified      : 24-Nov-2004
#######################################################################
#
# The PostgreSQL backup tool does not accept passwords on the command
# line, so automating the process directly would require loosening
# security.  This script instead calls the backup procedure locally on
# the server and, through SSH/RSH, sends the results back over the
# network to the appropriate workstation.
#
# The advantages of using SSH are encryption and certificate-based
# authentication, which allow for security as well as ease of use.
# Running the backup on the database server also allows a local copy
# to be generated simultaneously if so desired.

#################
# Data Members

#
# Site specific configuration
BACKUP='/var/archives/postgres'                # Backup path
DB='ethereal'                        # Databases to backup
CREDENTIALS='martin@io'                    # Username/host combination

#
# Date-based hierarchy
DAY=`date +%d`
MONTH=`date +%m`
YEAR=`date +%y`

#
# Location of utilities
ARCHIVER="/usr/bin/bzip2"                # Archive to compress stream
DUMP="/usr/local/bin/pg_dump -cDU postgres"        # Database dumping utility
FILTER="/usr/bin/grep -v 'INSERT INTO param'"        # Filter used with dump
REMOTE="/usr/bin/ssh"                    # Remote access capability

#
# File settings
PERM='0644'                        # Permissions to take on
OWNER='root'                        # Owner of file
GROUP='wheel'                        # Owner group
EXT='.bz2'                        # Extension to use

#
# Verbosity
VERBOSE='NO'                        # Display information to screen


#################
# Program area

  # Explain what happens
  if [ "$VERBOSE" = "YES" ]
   then
    # Go change into the log dir
    if [ -d "$BACKUP" ]
      then
        # see if month dir exists in log dir - if not create one
        if [ ! -d "$BACKUP/$YEAR$MONTH" ]
         then
            # date dir does not exist in the log dir - create one
            /bin/mkdir "$BACKUP/$YEAR$MONTH" >/dev/null 2>&1

            if [ $? -ne 0 ]
             then
                 # could not create log date dir
                echo -e "  Could not create: $BACKUP/$YEAR$MONTH - exiting!\n"
                exit 1
              else
                # created dir
                echo -e "  Created: $BACKUP/$YEAR$MONTH\n"
            fi
         fi

        # Change working directory
        cd "$BACKUP/$YEAR$MONTH"


        # Cycle through list
        for SINGLE in $DB
         do
            echo -e "  Initiating backup of database: $SINGLE"
            $REMOTE $CREDENTIALS "$DUMP $SINGLE | $FILTER | $ARCHIVER" > $SINGLE-$MONTH$DAY$EXT

            echo -e "    Setting permissions..."
            chmod $PERM $SINGLE-$MONTH$DAY$EXT

            echo -e "    Setting permissions..."
            chown $OWNER $SINGLE-$MONTH$DAY$EXT

            echo -e "    Setting permissions..."
            chgrp $GROUP $SINGLE-$MONTH$DAY$EXT

            # Verify file exists
            if [ -f "$SINGLE-$MONTH$DAY.bz2" ]
             then
                echo -e "  Backup succesful!"
             else
                echo -e "  Backup failed!"
            fi
         done

     else
        # Warning
        echo -e "  Working directory not found: $BACKUP\n"
    fi

    # Lack of detail
    else
    # Go change into the log dir
    if [ -d "$BACKUP" ]
      then
        # see if month dir exists in log dir - if not create one
        if [ ! -d "$BACKUP/$YEAR$MONTH" ]
         then
            # date dir does not exist in the log dir - create one
            /bin/mkdir "$BACKUP/$YEAR$MONTH" >/dev/null 2>&1

            if [ $? -ne 0 ]
             then
                 # could not create log date dir
                echo -e "Could not create: $BACKUP/$YEAR$MONTH - exiting!\n"
                exit 1
            fi
         fi

        # Change working directory
        cd "$BACKUP/$YEAR$MONTH"

        # Cycle through list
        for SINGLE in $DB
         do
            $REMOTE $CREDENTIALS "$DUMP $SINGLE | $FILTER | $ARCHIVER" > $SINGLE-$MONTH$DAY$EXT

            chmod $PERM $SINGLE-$MONTH$DAY$EXT
            chown $OWNER $SINGLE-$MONTH$DAY$EXT
            chgrp $GROUP $SINGLE-$MONTH$DAY$EXT

            # Verify file exists
            if [ ! -f "$SINGLE-$MONTH$DAY$EXT" ]
             then
                echo -e "  Backup of $SINGLE failed!"
            fi
         done

     else
        # Warning
        echo -e "  Working directory not found: $BACKUP\n"
    fi
   fi
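
A script like this would typically be driven from cron.  A hypothetical
crontab entry (the paths are examples, not from the original setup)
could be:

# Run the backup nightly at 02:30, appending any output to a log
30 2 * * * /usr/local/sbin/pg_backup.shl >> /var/log/pg_backup.log 2>&1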

Re: Schemas, databases, and backups

From: Thomas Harold
Jaime Casanova wrote:

> psql -At -d template1 -U postgres -c "select datname from pg_database
> where datname not in ('template1', 'template0', 'postgres');" | while
> read D; do ... ; done
>
> or something like that in a shell script, and then simply pg_dump $D...
>

I found the following snippet of code, which roughly matches yours.  It
was over in the Red Hat mailing lists and was used to vacuum databases.

# su postgres -c 'psql -t -c "select datname from pg_database order by
datname;" template1' | xargs -n 1 echo

template0
template1
test1
test2

After some mucking about, I came up with the following single-line shell
command (suitable for adding to root's crontab).

# su postgres -c 'psql -t -c "select datname from pg_database where not
datistemplate and datallowconn order by datname;" template1' | xargs -n 1 \
-i pg_dump -Ft -b -U postgres -f /backup/postgresql/pgsql.`date +%Y%m%d.%H%M`.{}.tar {}

I couldn't figure out how to add the "not in ('template0',
'template1', 'postgres')" clause to the single-line shell command.  It
seemed to confuse the shell.

Issues with the above command:

1) The date gets reevaluated for each new execution of pg_dump, which
is not necessarily ideal if you want filenames that group easily.
Converting this to a shell script would allow finer control.

2) The output is not compressed.  I guess I could switch to using "-Fc"
in conjunction with "-Z 9".
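
For instance, an untested variant of the same one-liner using the
compressed custom format (restorable with pg_restore) might be:

# DT=`date +%Y%m%d.%H%M` ; su postgres -c 'psql -t -c "select datname
from pg_database where not datistemplate and datallowconn order by
datname;" template1' | xargs -n 1 \
-i pg_dump -Fc -Z 9 -b -U postgres -f /backup/postgresql/pgsql.$DT.{}.dump {}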

Additional questions and notes:

A) pg_dump takes a "--schema=schema" argument, which looks like it
allows me to dump just the contents of a particular schema within a
database.  So if I wanted, I could iterate through the list of schemas
and go that route (a sketch of this follows these notes).

B) There's also a "--table=table" argument, which dumps a single table.
The man page for pg_dump warns me that pg_dump will not output any
objects that the table depends on, so it may not be possible to restore
the dump.  (The same warning applies to --schema=schema.)

C) I'm not sure whether I can get away with using "where not
datistemplate and datallowconn".  For backing up user databases, does it
matter?  I can't figure out how to quote the commands properly to keep
bash from getting confused.  (Doubled-up quotes? Escaped quotes?)

D) After more mucking, I figured out how to set a static datestamp value
for the entire command and compress the tar files using gzip.  I'm not
sure whether I should use "export" or "set" (both worked).

# export DTSTAMP=`date +%Y%m%d.%H%M` ; su postgres -c 'psql -t -c "select
datname from pg_database where not datistemplate and datallowconn order
by datname;" template1' | xargs -n 1 -i bash -c \
"pg_dump -Ft -b -U postgres {} | gzip -c > /backup/postgresql/pgsql.${DTSTAMP}.{}.tgz"
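
And as a sketch for note (A) above, the same pattern seems to carry
over to schemas via pg_namespace ("mydb" and the backup path are
placeholders):

#!/bin/bash
# Sketch: dump each user schema of one database to its own file.
DB=mydb
DT=`date +%Y%m%d.%H%M`
psql -At -U postgres -d "$DB" -c \
    "select nspname from pg_namespace where nspname !~ '^pg_' and nspname <> 'information_schema';" |
while read S; do
    pg_dump -U postgres --schema="$S" "$DB" | gzip -c > "/backup/postgresql/$DB.$DT.$S.gz"
done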

Links:
http://archives.postgresql.org/pgsql-general/2000-01/msg00593.php
http://postgis.refractions.net/pipermail/postgis-users/2005-November/009925.html

Re: Schemas, databases, and backups

From: Jaime Casanova
>
> # su postgres -c 'psql -t -c "select datname from pg_database where not
> datistemplate and datallowconn order by datname;" template1' | xargs -n 1 \
> -i pg_dump -Ft -b -U postgres -f /backup/postgresql/pgsql.`date +%Y%m%d.%H%M`.{}.tar {}
>
> I couldn't figure out how to add the "not in ('template0',
> 'template1', 'postgres')" clause to the single-line shell command.  It
> seemed to confuse the shell.
>
> Issues with the above command:
>
> 1) The date gets reevaluated for each new execution of pg_dump.  Which
> is not necessarily ideal if you want filenames that group easily.
> Converting to a shell script would allow finer control.
>
> 2) The output is not compressed.  I guess I could switch to using "-Fc"
> in conjunction with "-Z 9".
>
> Additional questions and notes:
>
> A) pg_dump takes an argument "--schema=schema", which looks like it
> allows me to just dump the contents of a particular schema within a
> database.  So if I wanted, I could iterate through the list of schemas
> and go that route.
>
> B) There's also a "--table=table" argument, which dumps a single table.
>  The man page for pg_dump warns me that pg_dump will not output any
> objects that the table depends on, so it may not be possible to restore.
>  (The same warning applied to --schema=schema.)
>

Sure you can; just like pg_database, there's pg_namespace for schemas
and pg_tables for tables...
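
For example (the database name is a placeholder), the analogous
listing queries would be something like:

psql -At -U postgres -d mydb -c "select nspname from pg_namespace;"
psql -At -U postgres -d mydb -c "select schemaname, tablename from pg_tables;"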

> C) I'm not sure whether I can get away with using "where not
> datistemplate and datallowconn".  For backing up user databases, does it
> matter?  I can't figure out how to quote the commands properly to keep
> bash from getting confused.  (Doubled-up quotes? Escaped quotes?)
>

"where not datistemplate and datallowconn" it's better than "datname
not in (values)"

> D) After more mucking, I figured out how to set a static datestamp value
> for the entire command and compress the tar files using gzip.  I'm not
> sure whether I should use "export" or "set" (both worked).
>
> # export DTSTAMP=`date +%Y%m%d.%H%M` ; su postgres -c 'psql -t -c
> "select datname from pg_database where not datistemplate and
> datallowconn order by datname;" template1' | xargs -n 1 -i bash -c
> "pg_dump -Ft -b -U postgres {} | gzip -c >
> /backup/postgresql/pgsql.${DTSTAMP}.{}.tgz"
>

Maybe it's a good idea to put all that in a script... it's getting
bigger and bigger (and uglier and uglier).
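
Something along these lines, say (untested, with example paths, and
assuming passwordless access for the postgres user):

#!/bin/bash
# Sketch: per-database, datestamped, gzip-compressed dumps.
DEST=/backup/postgresql                  # where the dumps land (example)
DTSTAMP=`date +%Y%m%d.%H%M`              # one timestamp for the whole run
psql -At -U postgres -d template1 -c \
    "select datname from pg_database where not datistemplate and datallowconn order by datname;" |
while read DB; do
    pg_dump -Ft -b -U postgres "$DB" | gzip -c > "$DEST/pgsql.$DTSTAMP.$DB.tgz"
done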

> Links:
> http://archives.postgresql.org/pgsql-general/2000-01/msg00593.php
> http://postgis.refractions.net/pipermail/postgis-users/2005-November/009925.html
>


--
regards,
Jaime Casanova
(DBA: DataBase Aniquilator ;)

Re: Schemas, databases, and backups

From: John Koller
On Thu, 24 Nov 2005 07:51:03 -0500, Thomas Harold wrote:

> According to my reading of the pgsql documentation, the two basic backup
> scripts are pg_dump and pg_dumpall.  pg_dump allows you to dump a single
> database to a file, while pg_dumpall dumps all of the databases to a
> single file.
>
> Currently, we use MSSQL's built-in backup facility.  That allows us,
> with a single command, to dump every database to separate files on a
> daily basis (and we keep 14-days online).  That makes recovering from a
> glitch in one of the databases very easy, and it's rather simple to go
> back to a particular day.
> So, now for the questions:
[SNIP]
>
> 1) Is there a tool (or is this easily scripted in bash?) that would
> iterate through the databases in pgsql and dump them to individual
> files?  I'm guessing that we would query pg_database and dump the
> database names to a file (how?) and then parse that to feed to pg_dump
> (I can figure this bit out myself).
[SNIP]

The FreeBSD port installs a script called 502.pgsql that dumps each
database to a separate file. It looks like it is close to what you
are looking for.

http://www.freebsd.org/cgi/cvsweb.cgi/ports/databases/postgresql81-server/files/502.pgsql?rev=1.7&content-type=text/x-cvsweb-markup