Thread: Restoring a database from a file system snapshot
I have a PostgreSQL database on Windows Server 2003, and the database is kept on a SAN that has the ability to make instantaneous snapshots. Once I have made such a snapshot, I am unclear how to re-attach it to another postgres database on another machine. Postgres seems to create a directory structure that looks like this:

Z:\MyDatabase
Z:\MyDatabase\PG_VERSION
Z:\MyDatabase\1234567
Z:\MyDatabase\lots of files...

The "1234567" number above changes with each new database I create. So to restore a snapshot, I create a new database, which makes an X:\NewDatabase\98765 directory. Do I then drop the contents of Z:\MyDatabase\1234567\* inside X:\NewDatabase\98765? I attempted this some time ago, and postgres refused to start. (I haven't tried it again, so I don't know the exact error.) I got the impression that this is not the correct procedure, since the number (which I think corresponds to the hard link inside C:\Program Files\PostgreSQL\8.3\data\pg_tblspc) did not match. I am not sure what else must be altered in order for the snapshot to "attach".

Any suggestions? Is my procedure correct? Would I need to also copy the transaction logs or something like that?
> Any suggestions? Is my procedure correct? Would I need to also copy
> the transaction logs or something like that?

The "by the book" procedure for this operation is to use

pg_dumpall ... > dump_file.sql

and later

psql -f dump_file.sql postgres

pg_dumpall gives you a transaction-aware dump. I'm not sure how you'd go about that old snapshot, sorry.

joao
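Spelled out, that dump-and-reload route is roughly the following; the file name and the -U user are placeholders, and on Windows the commands live in the PostgreSQL bin directory:

  rem dump every database in the cluster (plus roles) from the source server
  pg_dumpall -U postgres > dump_file.sql
  rem replay it on the target server, connecting first to the postgres database
  psql -U postgres -f dump_file.sql postgres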
William Garrison wrote:
> I have a PostgreSQL database on Windows Server 2003, and the database is
> kept on a SAN that has the ability to make instantaneous snapshots.
> Once I have made such a snapshot, I am unclear how to re-attach it to
> another postgres database on another machine. Postgres seems to create
> a directory structure that looks like this:
> Z:\MyDatabase
> Z:\MyDatabase\PG_VERSION
> Z:\MyDatabase\1234567
> Z:\MyDatabase\lots of files...
> The "1234567" number above changes with each new database I create.

It doesn't work. There's a procedure for restoring files, but you need to also save the pg_xlog files as a stream, for which you need to set up an archive_command in postgresql.conf beforehand. If you're interested in this, see the "Point in time recovery" chapter in the documentation.

pg_dump/pg_restore is the easiest combination to use anyway.

-- 
Alvaro Herrera                          http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
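To make those pointers concrete, here is a rough sketch of both options; the archive directory, dump file name, and -U user are placeholders, and archive_mode assumes 8.3:

  # postgresql.conf, set up before the base backup is taken
  archive_mode = on
  archive_command = 'copy "%p" "Z:\\wal_archive\\%f"'

  rem per-database dump/restore alternative; the target database is created empty first
  pg_dump -U postgres -Fc -f MyDatabase.dump MyDatabase
  pg_restore -U postgres -d MyDatabase_copy MyDatabase.dump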
Sorry Alvaro, I sent the message directly to you ;P This is the same approach (restore from snapshot) that MySQL uses on ZFS. If you don't change anything in the database, why wouldn't it work? You just restart the service with the same path. The catch is that you need to stop postgres BEFORE taking the snapshot.
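In outline, that cold-snapshot approach would be something like this on Windows (the service name here is only a guess; check what your installer registered):

  net stop pgsql-8.3
  rem take the SAN snapshot while the server is down
  net start pgsql-8.3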
Alvaro Herrera wrote:
> It doesn't work. There's a procedure for restoring files, but you need to
> also save the pg_xlog files as a stream, for which you need to set up an
> archive_command in postgresql.conf beforehand. [...]
> pg_dump/pg_restore is the easiest combination to use anyway.

I've read (and I am reading it again now) that chapter, and I'm making this post because that documentation doesn't seem to address the scenario I am in. The PITR article (http://www.postgresql.org/docs/8.2/static/continuous-archiving.html) shows you how to use the WAL archives after a file system backup that is not consistent. But it doesn't address two vital things:
1) I have a file system backup that *IS* consistent. So I should not need any WAL files at all, right?
2) It doesn't explain exactly what folders you should be creating a backup of, or how exactly to restore those files onto another system, or how to munge the tablespaces to work. Specifically, it says
"Restore the database files from your backup dump... If you are using tablespaces, you should verify that the symbolic links in pg_tblspc/ were correctly restored."
That's exactly what I'm asking how to do.
**update**
I got it working. Here's how:
1) I have a file system snapshot. No WAL files were required.
2) Using my example from my first post, here is how I manually rearranged the files so that postgres saw the new database (a command-level sketch follows the list).
- Create a new tablespace on the new server
- Create a new database on the new server, using that tablespace. I placed it into X:\NewDatabase\
- PostgreSQL will create a folder X:\NewDatabase\98765. Notice that the number postgres created is NOT the same as your old one.
- Stop PostgreSQL
- Move your files from Z:\MyDatabase\1234567 into the X:\NewDatabase folder
- Delete the 98765 directory.
- Start PostgreSQL
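In command form, the above comes out roughly as follows. The tablespace, database, and service names are only placeholders, the directory given to CREATE TABLESPACE must already exist and be empty, and (as noted further down the thread) this is not an officially supported way to attach a database:

  psql -U postgres -c "CREATE TABLESPACE newspace LOCATION 'X:/NewDatabase'"
  psql -U postgres -c "CREATE DATABASE mydb TABLESPACE newspace"
  net stop pgsql-8.3
  rem copy the snapshot's database directory in under its OLD name
  xcopy /e /i Z:\MyDatabase\1234567 X:\NewDatabase\1234567
  rem remove the directory postgres created for the new, empty database
  rmdir /s /q X:\NewDatabase\98765
  net start pgsql-8.3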
What I was doing before was moving the files from Z:\MyDatabase\1234567 into X:\NewDatabase\98765. The log would then show a message like:
2008-08-27 13:24:23 FATAL: database "Your Database Name" does not exist
2008-08-27 13:24:23 DETAIL: The database subdirectory "pg_tblspc/32789/12345" is missing.
It specifically wants the old folder name, not the new folder name of the new database you are attaching into.
---

The referenced article lacks an explanation of how postgres arranges the files and how the symlinks for the tablespaces are made. Without that knowledge, attaching to another database is guesswork. It really only took me a few tries to get the folder correct, but I think there needs to be an attach procedure for attaching an existing database, or the article should describe the folder structure used by postgres. It isn't as simple as just copying the folder. You also have to rename the directory structure to match. That's what I needed to know. Meh, it was easy to do, just unclear.
William Garrison wrote:
> 1) I have a file system backup that *IS* consistent. So I should not
> need any WAL files at all, right?

It is consistent only if it was taken when the postmaster was down.

> **update**
> I got it working. Here's how:
> 1) I have a file system snapshot. No WAL files were required.
> 2) Using my example from my first post, here is how I manually
> rearranged the files so that postgres saw the new database.
> - Create a new tablespace on the new server
> - Create a new database on the new server, using that tablespace. I
> placed it into X:\NewDatabase\
> - PostgreSQL will create a folder X:\NewDatabase\98765. Notice that the
> number postgres created is NOT the same as your old one.
> - Stop PostgreSQL
> - Move your files from Z:\MyDatabase\1234567 into the X:\NewDatabase folder
> - Delete the 98765 directory.
> - Start PostgreSQL

This does not really work, because you're missing the necessary pg_clog files. You can make it sort-of-work by doing a VACUUM FREEZE and a CHECKPOINT on the database before taking the snapshot. However, you'd still be missing the entries in shared catalogs. The only one you've recreated is the one in pg_database, but there are more.

> --- The referenced article lacks an explanation of how postgres
> arranges the files and how the symlinks for the tablespaces are made.
> Without that knowledge, attaching to another database is guesswork.

Attaching to another database is not supported at all.

-- 
Alvaro Herrera                          http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
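For reference, the pre-snapshot step Alvaro mentions would be run as something like the following (the database name is a placeholder, and per his caveat this still leaves the shared-catalog problem):

  psql -U postgres -d mydb -c "VACUUM FREEZE"
  psql -U postgres -d mydb -c "CHECKPOINT"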
I have just come to a horrible realization about PostgreSQL that I'm sure is supposed to be pre-requisite knowledge even going into this. So everyone may laugh at me now.
We have a SAN volume, and we created a tablespace that points to that SAN volume (the Z: drive). This put all the table files on Z:. It was our assumption that the table files plus the archived transaction logs would now be on the Z: drive, and that was enough to restore the database. It is shocking to me that I told PostgreSQL to put the database on Z:, but it is only putting a subset of the necessary files on that drive. That is just frightening. A database is not just tables - it is tables and transaction logs. Why on earth would PostgreSQL put the tables separately from the transaction logs?

This is having a chilling effect on me as I realize that the transaction log files are not separated by database. So when I have multiple database systems, I have one single shared set of transaction logs, even though those databases are on completely separate SANs. I'm used to things like MS SQL Server, where I say "database Foo goes on Z: and database Bar goes on X:" and you can take it for granted that the transaction logs for Foo also go on Z: and the transaction logs for Bar go on X:. I'm still reeling from the thought that there can somehow be a single transaction log for multiple databases. How is that even possible? Are the transaction ID numbers shared across databases too?
I need to educate our IT group about this. They set up the SAN volumes based on my incorrect assumptions about how PostgreSQL worked. It sounds like, on Windows, we need to just flat-out reinstall postgres and install it onto the Z: drive so that the entire data directory is on the SAN volume. Installing it to C: and having only parts of the database on the SAN is not good.

(Thanks to everyone who is replying - this is clearing up a lot of misconceptions for me)

P.S. I guess on Unix, you guys all just set up the data directory to be a hard-link to some other location?
William Garrison wrote:
[snip]
> A database is not just tables - it is tables and
> transaction logs. Why on earth would PostgreSQL put the tables
> separately from the transaction logs?

Because you told it to. If you want everything on Z:\postgresql you just initdb that location and point PG at that location (or just install there). Tablespaces let you store sets of tables/indexes on specific disks (well, filesystem mount-points).

> How is that even possible? Are the
> transaction ID numbers shared across databases too?

Yes. The PG term for this is a database "cluster" - an installation that shares transaction logs, ids, users.

> I need to educate our IT group about this. They set up the SAN volumes
> based on my incorrect assumptions about how PostgreSQL worked. It
> sounds like, on Windows, we need to just flat-out reinstall postgres and
> install it onto the Z: drive so that the entire data directory is on the
> SAN volume. Installing it to C: and having only parts of the database
> on the SAN is not good.

Yes. A dump/restore is probably the simplest way of doing this.

> P.S. I guess on Unix, you guys all just set up the data directory to be
> a hard-link to some other location?

Mount a filesystem at the desired point in the directory tree, or just use soft-links. Which is how tablespaces are implemented, as it happens.

-- 
  Richard Huxton
  Archonet Ltd
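A rough sketch of the "just initdb that location" route on Windows, assuming the 8.3 binaries are already installed; the paths and service name are made up, and pg_ctl register/unregister is one way to repoint the service entry:

  rem run initdb as the service account, not as an administrator
  initdb -D Z:\postgresql\data
  pg_ctl unregister -N pgsql-8.3
  pg_ctl register -N pgsql-8.3 -D Z:\postgresql\data
  net start pgsql-8.3

A dump from the old cluster and a restore into the new one (pg_dumpall/psql as above) then moves the existing databases over.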
Alvaro Herrera wrote:
> It is consistent only if it was taken when the postmaster was down.

OR if you are able to take a point-in-time snapshot at the volume level, say using LVM or some SAN management tool. The effect is the same as if the server is hardware reset unexpectedly, in that it requires recovery when next started but is otherwise fine. Of course, you still need all of your database, including the logs etc.

-- 
Craig Ringer
On Wed, Aug 27, 2008 at 6:16 PM, William Garrison <postgres@mobydisk.com> wrote:
> I'm still reeling from the thought that there can
> somehow be a single transaction log for multiple databases. How is that
> even possible? Are the transaction ID numbers shared across databases too?

Yes, it's possible, and it's how pgsql does it. Yes, txids are based on a common count amongst the dbs in a cluster.

> I need to educate our IT group about this. They set up the SAN volumes based
> on my incorrect assumptions about how PostgreSQL worked. It sounds like, on
> Windows, we need to just flat-out reinstall postgres and install it onto the
> Z: drive so that the entire data directory is on the SAN volume. Installing
> it to C: and having only parts of the database on the SAN is not good.

Really? I'm not so sure. By having a local mirror set host the pg_xlog-type files, you can get good performance in bursts even if the SAN is lagging behind a bit.

> (Thanks to everyone who is replying - this is clearing up a lot of
> misconceptions for me)
>
> P.S. I guess on Unix, you guys all just set up the data directory to be a
> hard-link to some other location?

Soft link, but yeah, that's exactly what I did tonight. We have a mirror set where the main pg database lives; then we stop the db, copy base out to another drive, softlink it into the place of the old base, and restart the db. Now pg_xlog and such are on the mirror set and the main data store is on a separate array. (Tonight it was a RAID-5 for a reporting server, last week it was a 12 disk RAID-10 for the big storage / transactional server.) By putting pg_xlog, which is written sequentially, on the mirror set, I ensure that it's not competing with my random writes on the big RAID-10.
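Sketched out, that move-and-softlink step looks something like the following on a Unix box; the paths are made up, so substitute your own data directory and target array:

  pg_ctl -D /var/lib/pgsql/data stop
  cp -a /var/lib/pgsql/data/base /bigarray/base
  mv /var/lib/pgsql/data/base /var/lib/pgsql/data/base.old
  ln -s /bigarray/base /var/lib/pgsql/data/base
  pg_ctl -D /var/lib/pgsql/data start

pg_xlog, pg_clog and the rest of the cluster stay on the mirror set; only base/ (the heap and index files) moves to the big array.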
William Garrison wrote:
> I have just come to a horrible realization about PostgreSQL that I'm
> sure is supposed to be pre-requisite knowledge even going into this. So
> everyone may laugh at me now.
>
> We have a SAN volume, and we created a tablespace that points to
> that SAN volume (Z: drive). This put all the table files on Z:. It was
> our assumption that the table files plus the archived transaction logs
> would now be on the Z: drive, and that was enough to restore the
> database. It is shocking to me that I told PostgreSQL to put the
> database on Z:, but it is only putting a subset of the necessary files
> on that drive. That is just frightening. A database is not just tables
> - it is tables and transaction logs. Why on earth would PostgreSQL put
> the tables separately from the transaction logs?

The common use-case for tablespaces is performance. You want to move your hot tables off to faster storage, or move your cold tables off to slower (and cheaper!) storage.

> This is having a chilling effect
> on me as I realize that the transaction log files are not separated by
> database. So when I have multiple database systems, I have one single
> shared set of transaction logs, even though those databases are on
> completely separate SANs. I'm used to things like MS SQL Server, where I
> say "database Foo goes on Z: and database Bar goes on X:" and you
> can take it for granted that the transaction logs for Foo also
> go on Z: and the transaction logs for Bar go on X:.

Um, you might need to read up on your SQL Server docs there, because you specify the location for the transaction log completely separately from the data files in SQL Server as well. You are correct, however, in that SQL Server has one set of transaction logs for each database, whereas PostgreSQL has one for the whole cluster.

> I'm still reeling from the thought that there can somehow be a single
> transaction log for multiple databases. How is that even possible? Are
> the transaction ID numbers shared across databases too?

Yes.

> I need to educate our IT group about this. They set up the SAN volumes
> based on my incorrect assumptions about how PostgreSQL worked. It
> sounds like, on Windows, we need to just flat-out reinstall postgres and
> install it onto the Z: drive so that the entire data directory is on the
> SAN volume. Installing it to C: and having only parts of the database
> on the SAN is not good.

You should install PostgreSQL on C:, and have the complete data directory on the SAN. You can set this from the MSI installer, or you can manually move the directory and then change the path in the service startup command (may require the use of regedit). (Obviously with postgresql stopped.)

> (Thanks to everyone who is replying - this is clearing up a lot of
> misconceptions for me)
>
> P.S. I guess on Unix, you guys all just set up the data directory to be
> a hard-link to some other location?

Soft-link, not hard-link. But you can do this on Windows as well, if that is easier for you. See http://support.microsoft.com/kb/205524 for some hints. There are also tools from sysinternals, iirc.

//Magnus
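One way to do the "move the directory" variant Magnus describes is with the Sysinternals junction tool rather than regedit; the service name and install path below are assumptions for a default 8.3 install:

  net stop pgsql-8.3
  xcopy /e /i "C:\Program Files\PostgreSQL\8.3\data" Z:\pgdata
  ren "C:\Program Files\PostgreSQL\8.3\data" data.old
  rem junction.exe (Sysinternals) makes the old path point at the copy on the SAN
  junction "C:\Program Files\PostgreSQL\8.3\data" Z:\pgdata
  net start pgsql-8.3

You may also need to grant the postgres service account the same NTFS permissions on Z:\pgdata that it had on the old data directory.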
William Garrison wrote:
> We have a SAN volume, and we created a tablespace that points to
> that SAN volume (Z: drive). This put all the table files on Z:. It
> was our assumption that the table files plus the archived transaction logs
> would now be on the Z: drive, and that was enough to restore the
> database.

One instance of postgresql running on a machine constitutes a cluster, which may contain more than one database. All the data and logs for the cluster are normally contained in the one folder. Some things, like logs, can be configured to be stored outside of this folder. System-wide data is also stored in this folder, such as the list of databases, users and tablespace definitions.

You can configure the postgresql cluster to use a data folder located anywhere you wish. You can use tablespaces to store specific items (databases, tables, indexes) outside of the specified data directory - usually this is to locate them on separate disks for performance reasons. Only items created to use the tablespace are located within the folder specified for that tablespace; everything else goes into the data directory.

So

CREATE DATABASE mydb WITH TABLESPACE myzdrive;

will create a database and all of its associated contents in your Z: drive folder - outside of the cluster's data folder where the logs, transaction logs and system-wide data are located.

-- 
Shane Ambler
pgSQL (at) Sheeky (dot) Biz
Get Sheeky @ http://Sheeky.Biz
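For completeness, the tablespace itself has to be created first, pointing at a directory that already exists and is empty; a minimal sketch with placeholder names and paths:

  psql -U postgres -c "CREATE TABLESPACE myzdrive LOCATION 'Z:/MyTablespace'"
  psql -U postgres -c "CREATE DATABASE mydb WITH TABLESPACE myzdrive"

Even then, pg_xlog, pg_clog and the shared catalogs still live under the cluster's data directory, which is the point being made above.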
Thanks to everyone for the myriad of informative replies on this.