Thread: Restoring a database from a file system snapshot

Restoring a database from a file system snapshot

From

William Garrison

Date:

27 August 2008, 14:53:59

I have a PostgreSQL database on Windows Server 2003, and the database is
kept on a SAN that has the ability to make instantaneous snapshots.
Once I have made such a snapshot, I am unclear how to re-attach it to
another postgres database on another machine.  Postgres seems to create
a directory structure that looks like this:
    Z:\MyDatabase
    Z:\MyDatabase\PG_VERSION
    Z:\MyDatabase\1234567
    Z:\MyDatabase\lots of files...
The "1234567" number above changes with each new database I create.

So to restore a snapshot, I create a new database, which makes a
X:\NewDatabase\98765 directory.  Do I then drop the contents of the
Z:\MyDatabase\1234567\* inside of the X:\NewDatabase\98765?  I attempted
this some time ago, and postgres refused to start.  (I haven't tried it
again so I don't know the exact error).  I got the impression that this
is not the correct procedure, since the number (which I think
corresponds to the hard link inside C:\Program
Files\PostgreSQL\8.3\data\pg_tblspc) did not match.  I am not sure what
else must be altered in order for the snapshot to "attach"

Any suggestions?  Is my procedure correct?  Would I need to also copy
the transaction logs or something like that?

Re: Restoring a database from a file system snapshot

From

Joao Ferreira gmail

Date:

27 August 2008, 15:08:40

> Any suggestions?  Is my procedure correct?  Would I need to also copy
> the transaction logs or something like that?

the 'by the book' procedure for this operation is to use

pg_dumpall ......... > dump_file.sql

and later

psql -f dump_file.sql postgres



pg_dumpall gives you a transaction aware dump.

I'm not sure how you'dd do about that old snapshot, sorry.

joao


>

Re: Restoring a database from a file system snapshot

From

Alvaro Herrera

Date:

27 August 2008, 15:38:00

William Garrison wrote:
> I have a PostgreSQL database on Windows Server 2003, and the database is
> kept on a SAN that has the ability to make instantaneous snapshots.
> Once I have made such a snapshot, I am unclear how to re-attach it to
> another postgres database on another machine.  Postgres seems to create
> a directory structure that looks like this:
>    Z:\MyDatabase
>    Z:\MyDatabase\PG_VERSION
>    Z:\MyDatabase\1234567
>    Z:\MyDatabase\lots of files...
> The "1234567" number above changes with each new database I create.

It doesn't work.  There's a procedure for restoring files, but you need
to also save the pg_xlog files as a stream, for which you need to set up
an archive_command in postgresql.conf beforehand.  If you're interested
in this, see the "Point in time recovery" chapter in the documentation.

pg_dump/pg_restore is the easiest combination to use anyway.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: Restoring a database from a file system snapshot

From

"postgres Emanuel CALVO FRANCO"

Date:

27 August 2008, 16:32:51

Sorry Alvaro, i sent to you the message ;P

 Is the same way (from snapshot) that use Mysql on ZFS.
 IF you don't change anything in the database, why it don't works?
 Then you restart the service with the same path.
 The problem it will be that you need to stop postgres BEFORE snapshot.

Re: Restoring a database from a file system snapshot

From

William Garrison

Date:

27 August 2008, 17:32:37

Alvaro Herrera wrote:

William Garrison wrote:

I have a PostgreSQL database on Windows Server 2003, and the database is  
kept on a SAN that has the ability to make instantaneous snapshots.   
Once I have made such a snapshot, I am unclear how to re-attach it to  
another postgres database on another machine.  Postgres seems to create  
a directory structure that looks like this:  Z:\MyDatabase  Z:\MyDatabase\PG_VERSION  Z:\MyDatabase\1234567  Z:\MyDatabase\lots of files...
The "1234567" number above changes with each new database I create.

It doesn't work.  There's a procedure for restoring files, but you need
to also save the pg_xlog files as a stream, for which you need to set up
an archive_command in postgresql.conf beforehand.  If you're interested
in this, see the "Point in time recovery" chapter in the documentation.

pg_dump/pg_restore is the easiest combination to use anyway.

I've read (and I am reading it again now) that chapter, and I'm making this post because that documentation doesn't seem to address the scenario I am in. The PITR article (http://www.postgresql.org/docs/8.2/static/continuous-archiving.html) shows you how to use the WAL archives after a file system backup that is not consistent. But it doesn't address two vital things:
1) I have a file system backup that *IS* consistent. So I should not need any WAL files at all right?
2) It doesn't explain exactly what folders you should be creating a backup of, or how exactly to restore those files onto another system, or how to munge the tablespaces to work. Specifically, it says
"Restore the database files from your backup dump... If you are using tablespaces, you should verify that the symbolic links in pg_tblspc/ were correctly restored."
That's exactly what I'm asking how to do.

**update**
I got it working. Here's how
1) I have a file system snapshot. No WAL files were required.
2) Using my example from my first post, here is how I manually rearranged the files so that postgres saw the new database.
- Create a new tablesspace on the new server
- Create a new database on the new server, using that tablespace. I placed it into X:\NewDatabase\
- PostgreSQL will create a folder X:\NewDatabase\98765. Notice that the number postgres created is NOT the same as your old one.
- Stop PostgreSQL
- Move your files from Z:\MyDatabase\1234567 into the X:\NewDatabase folder
- Delete the 98765 directory.
- Start PostgreSql

What I was doing before was moving the files from Z:\MyDatabase\1234567 into X:\NewDatabase\98765. The log file would then log a message like
2008-08-27 13:24:23 FATAL: database "Your Database Name" does not exist
2008-08-27 13:24:23 DETAIL: The database subdirectory "pg_tblspc/32789/12345" is missing.
It specifically wants the old folder name, not the new folder name of the new database you are attaching into.

--- The referenced article is lacking an explanation for how postgres arranges the files and how the symlinks for the tablespaces are made. Without that knowledge, attaching to another database is guesswork. It really only took me a few tries to get the folder correct, but I think there needs to be an attach procedure for attaching an existing database, or the article should describe the folder structure used by postgres. It isn't as simple as just copying the folder. You have to also rename the directory structure to match. That's what I needed to know. meh, it was easy to do, just unclear.

Re: Restoring a database from a file system snapshot

From

Alvaro Herrera

Date:

27 August 2008, 17:54:28

William Garrison wrote:

> 1) I have a file system backup that *IS* consistent.  So I should not
> need any WAL files at all right?

It is consistent only if it was taken when the postmaster was down.

> **update**
> I got it working.  Here's how
> 1) I have a file system snapshot.  No WAL files were required.
> 2) Using my example from my first post, here is how I manually
> rearranged the files so that postgres saw the new database.
> - Create a new tablesspace on the new server
> - Create a new database on the new server, using that tablespace.  I
> placed it into X:\NewDatabase\
> - PostgreSQL will create a folder X:\NewDatabase\98765.  Notice that the
> number postgres created is NOT the same as your old one.
> - Stop PostgreSQL
> - Move your files from Z:\MyDatabase\1234567 into the X:\NewDatabase folder
> - Delete the 98765 directory.
> - Start PostgreSql

This does not really work, because you're missing the necessay pg_clog
files.  You can make it sort-of-work by doing a VACUUM FREEZE and a
CHECKPOINT on the database before taking the snapshot.  However, you'd
still be missing the entries in shared catalogs.  The only one you've
recreated is the one on pg_database, but there are more.

> --- The referenced article is lacking an explanation for how postgres
> arranges the files and how the symlinks for the tablespaces are made.
> Without that knowledge, attaching to another database is guesswork.

Attaching to another database is not supported at all.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: Restoring a database from a file system snapshot

From

William Garrison

Date:

27 August 2008, 21:46:47

I have just come to a horrible realization about PostgreSQL that I'm sure is supposed to be pre-requisite knowledge even going into this. So everyone may laugh at me now.

We have a SAN volume, and we created a tablespace that that points to that SAN volume (Z: drive). This put all the table files on Z:. It was our assumption that the table files + the archived transaction would now be on the Z: drive, and that was enough to restore the database. It is shocking to me that I told PostgreSQL to put the database on Z:, but it is only putting a subset of the necessary files on that drive. That is just frightening. A database is not just tables - it is tables and transaction logs. Why on earth would PostgreSQL put the tables separately from the transaction logs? This is having a chilling effect on me as I realize that the transaction log files are not separated by database. So when I have multiple database systems, I have one single shared set of transaction logs. Even though those databases are on completely separate SANs. I'm used to things like MS SQL Server where I say "database Foo goes on Z: and this database Bar goes on X:" and you can take it for granted that the transaction logs for database Foo also go on Z: and the transaction logs for database Bar go on X:. I'm still reeling from the thought that there can somehow be a single transaction log for multiple databases. How is that even possible? Are the transaction ID numbers shared across databases too?

I need to educate our IT group about this. They setup the SAN volumes based on my incorrect assumptions about how PostgreSQL worked. It sounds like, on Windows, we need to just flat-out reinstall postgres and install it into the Z: drive so that the entire data directory is on the SAN volume. Installing it to C: and having only parts of the database on the SAN is not good.

(Thanks to everyone who is replying - this is clearing-up a lot of misconceptions for me)

P.S. I guess on Unix, you guys all just setup the data directory to be a hard-link to some other location?

Alvaro Herrera wrote:

William Garrison wrote:

1) I have a file system backup that *IS* consistent.  So I should not  
need any WAL files at all right?

It is consistent only if it was taken when the postmaster was down.

**update**
I got it working.  Here's how
1) I have a file system snapshot.  No WAL files were required.
2) Using my example from my first post, here is how I manually  
rearranged the files so that postgres saw the new database.
- Create a new tablesspace on the new server
- Create a new database on the new server, using that tablespace.  I  
placed it into X:\NewDatabase\
- PostgreSQL will create a folder X:\NewDatabase\98765.  Notice that the  
number postgres created is NOT the same as your old one.
- Stop PostgreSQL
- Move your files from Z:\MyDatabase\1234567 into the X:\NewDatabase folder
- Delete the 98765 directory.
- Start PostgreSql

This does not really work, because you're missing the necessay pg_clog
files.  You can make it sort-of-work by doing a VACUUM FREEZE and a
CHECKPOINT on the database before taking the snapshot.  However, you'd
still be missing the entries in shared catalogs.  The only one you've
recreated is the one on pg_database, but there are more.

--- The referenced article is lacking an explanation for how postgres  
arranges the files and how the symlinks for the tablespaces are made.   
Without that knowledge, attaching to another database is guesswork.

Attaching to another database is not supported at all.

Re: Restoring a database from a file system snapshot

From

Richard Huxton

Date:

27 August 2008, 22:27:07

William Garrison wrote:
[snip]
> A database is not just tables - it is tables and
> transaction logs.  Why on earth would PostgreSQL put the tables
> separately from the transaction logs?

Because you told it to. If you want everything on Z:\postgresql you just
initdb that location and point PG at that location (or just install
there). Tablespaces let you store sets of tables/indexes on specific
disks (well, filesystem mount-points).

> How is that even possible?  Are the
> transaction ID numbers shared across databases too?

Yes. The PG term for this is a database "cluster" - an installation that
shares transaction logs, ids, users.

> I need to educate our IT group about this.  They setup the SAN volumes
> based on my incorrect assumptions about how PostgreSQL worked.  It
> sounds like, on Windows, we need to just flat-out reinstall postgres and
> install it into the Z: drive so that the entire data directory is on the
> SAN volume.  Installing it to C: and having only parts of the database
> on the SAN is not good.

Yes. A dump/restore is probably the simplest way of doing this.

> P.S.  I guess on Unix, you guys all just setup the data directory to be
> a hard-link to some other location?

Mount a filesystem at the desired point in the directory tree, or just
use soft-links. Which is how tablespaces are implemented, as it happens.

--
  Richard Huxton
  Archonet Ltd

Re: Restoring a database from a file system snapshot

From

Craig Ringer

Date:

28 August 2008, 03:35:11

Alvaro Herrera wrote:

> It is consistent only if it was taken when the postmaster was down.

OR if you are able to take a point-in-time snapshot at the volume level,
say using LVM or some SAN management tool. The effect is the same as if
the server is hardware reset unexpectedly, in that it requires recovery
when next started but is otherwise fine.

Of course, you still need all of your database including the logs etc.

--
Craig Ringer

Re: Restoring a database from a file system snapshot

From

"Scott Marlowe"

Date:

28 August 2008, 04:11:26

On Wed, Aug 27, 2008 at 6:16 PM, William Garrison <postgres@mobydisk.com> wrote:
>
>  I'm still reeling from the thought that there can
> somehow be a single transaction log for multiple databases.  How is that
> even possible?  Are the transaction ID numbers shared across databases too?

Yes it's possible, and it's how pgsql does it.  Yes txid are based on
a common count amongst the dbs in a cluster.

> I need to educate our IT group about this.  They setup the SAN volumes based
> on my incorrect assumptions about how PostgreSQL worked.  It sounds like, on
> Windows, we need to just flat-out reinstall postgres and install it into the
> Z: drive so that the entire data directory is on the SAN volume.  Installing
> it to C: and having only parts of the database on the SAN is not good.

Really?  I'm not so sure.  By having a local mirror set host the db
pg_xlog type files you can get good performance in bursts even if the
SAN is lagging behind a bit.

> (Thanks to everyone who is replying - this is clearing-up a lot of
> misconceptions for me)
>
> P.S.  I guess on Unix, you guys all just setup the data directory to be a
> hard-link to some other location?

Soft link, but yeah that's exactly what I did tonight.  We have a
mirror set where the main pg database lives, then stop the db, copy
out base to another drive, softlink it into the place of the old base
and restart the db.  Now the pg_xlog and such are on the mirror set
and the main data store is on a separate array (Tonight it was a
RAID-5 for a reporting server, last week it was a 12 disk RAID-10 for
the big storage / transactional server)

By putting the pg_xlog, which is written sequentially on the mirror
set, I ensure that it's not competing with my random writes on the big
RAID-10.

Re: Restoring a database from a file system snapshot

From

Magnus Hagander

Date:

28 August 2008, 04:40:57

William Garrison wrote:
>  I have just come to a horrible realization about PostgreSQL that I'm
> sure is supposed to be pre-requisite knowledge even going into this.  So
> everyone may laugh at me now.
>
> We have a SAN volume, and we created a tablespace that that points to
> that SAN volume (Z: drive).  This put all the table files on Z:.  It was
> our assumption that the table files + the archived transaction would now
> be on the Z: drive, and that was enough to restore the database.  It is
> shocking to me that I told PostgreSQL to put the database on Z:, but it
> is only putting a subset of the necessary files on that drive.  That is
> just frightening.  A database is not just tables - it is tables and
> transaction logs.  Why on earth would PostgreSQL put the tables
> separately from the transaction logs?

The common use-case for tablespaces is performance. You want to move
your hot tables off to faster storage, or move your cold tables off to
slower (and cheaper!) storage.

> This is having a chilling effect
> on me as I realize that the transaction log files are not separated by
> database.  So when I have multiple database systems, I have one single
> shared set of transaction logs.  Even though those databases are on
> completely separate SANs.  I'm used to things like MS SQL Server where I
> say "database Foo goes on Z: and this database Bar goes on X:" and you
> can take it for granted that the transaction logs for database Foo also
> go on Z: and the transaction logs for database Bar go on X:.  I'm still

Um, you might need to read up on your SQL Server docs there. Because you
specify the location for the transaction log completely separate from
the data files in SQL Server as well.

You are correct, however, in that SQL Server has one set of transaction
logs for each database, whereas PostgreSQL has one for the whole cluster.

> reeling from the thought that there can somehow be a single transaction
> log for multiple databases.  How is that even possible?  Are the
> transaction ID numbers shared across databases too?

Yes.

> I need to educate our IT group about this.  They setup the SAN volumes
> based on my incorrect assumptions about how PostgreSQL worked.  It
> sounds like, on Windows, we need to just flat-out reinstall postgres and
> install it into the Z: drive so that the entire data directory is on the
> SAN volume.  Installing it to C: and having only parts of the database
> on the SAN is not good.

You should install PostgreSQL on C:, and have the complete data
directory on the SAN. You can set this from the MSI installer, or you
can manually move the directory and then change the path in the service
startup command (may require the use of regedit). (obviously with
postgresql stopped)

> (Thanks to everyone who is replying - this is clearing-up a lot of
> misconceptions for me)
>
> P.S.  I guess on Unix, you guys all just setup the data directory to be
> a hard-link to some other location?

Soft-link, not hard-link. But you can do this on Windows as well, if
that is easier for you. See http://support.microsoft.com/kb/205524 for
some hints. There are also tools from sysinternals, iirc.

//Magnus

Re: Restoring a database from a file system snapshot

From

Shane Ambler

Date:

28 August 2008, 13:57:49

William Garrison wrote:

> We have a SAN volume, and we created a tablespace that that points to
> that SAN volume (Z: drive).  This put all the table files on Z:.  It
> was our assumption that the table files + the archived transaction
> would now be on the Z: drive, and that was enough to restore the
> database.

One instance of postgresql running on a machine constitutes a cluster,
which may contain more than one database. All the data and logs for the
cluster are normally contained in the one folder. Some things like logs
can be configured to be stored outside of this folder. System wide data
is also stored in this folder, such as list of databases, users and
tablespace definitions.

You can configure the postgresql cluster to use a data folder located
anywhere you wish.

You can use tablespaces to store specific items (database, tables,
indexes) outside of the specified data directory - usually this is to
locate them on separate disks for performance reasons.

Only items created to use the tablespace are located within the folder
specified for that tablespace, everything else goes into the data directory.

So CREATE DATABASE mydb WITH TABLESPACE myzdrive; will create a database
and all of it's associated contents in your z drive folder - outside of
the cluster's data folder where the logs, transaction logs and system
wide data is located.

--

Shane Ambler
pgSQL (at) Sheeky (dot) Biz

Get Sheeky @ http://Sheeky.Biz

Re: Restoring a database from a file system snapshot

From

William Garrison

Date:

28 August 2008, 14:47:07

Thanks to everyone for for the myriad of informative replies on this.