Thread: Copying database directory
We are trying to select a database engine for a new product we plan to start developing. One of the requirements is to be able to ship the data on physical media and install it on the user's hard drive.
One of the issues we've run into is that pg_restore takes a lot of time to restore a large database.
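For concreteness, the dump and restore we have been timing are roughly equivalent to the following (the database and file names are just placeholders):

    pg_dump -Fc -f productdb.dump productdb   # custom-format dump made on our build machine
    pg_restore -d productdb productdb.dump    # restore on the target machine; this is the slow part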
Also, moving the entire data directory is not an option, since we will have multiple databases that the user can choose to install.
We've tried to copy the database directory and it seems to work. Unfortunately, we are not sure whether there is any hidden damage to the data.
So, here is my question: Is it possible to:
1. Create a database
2. Stop postgres
3. Empty out the directory created by the new database
4. Copy files from another database (created on a different server) into that directory
5. Start postgres and use the database
Once this is completed we may need to repeat the process for another database.
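Concretely, the steps above amount to something like this (the data directory path and database OIDs below are only illustrative):

    pg_ctl -D /var/lib/pgsql/data stop
    # find the base/ subdirectory belonging to the freshly created database,
    # e.g. with:  SELECT oid FROM pg_database WHERE datname = 'productdb';
    rm /var/lib/pgsql/data/base/16400/*
    # copy in the files taken from the same database on our build server
    cp /media/dvd/base/16399/* /var/lib/pgsql/data/base/16400/
    pg_ctl -D /var/lib/pgsql/data start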
Is this possible? If not, what are other options?
Any help or suggestions are greatly appreciated.
Thank you.
"Tsing, Vadim" wrote: > We are trying to select a database engine for a new product we plan > to start developing. One of the requirements is to be able to ship > the data on the media and install it on the user's hard-drive. > > One of the issues we've run into is that pg_restore takes a lot of > time to restore large database. > > Also, moving entire data directory is not an option since we will > have multiple databases that user can choose to install. > > We've tried to copy the database directory and it seems to work. > Unfortunately we are not sure if there is any, not visible, damage > to the data. > > So, here is my question: Is it possible to: > > 1. Create a database > > 2. Stop postgres > > 3. Empty out the directory created by the new database > > 4. Copy files from another database (created on a different > server) into that directory > > 5. Start postgres and use the database > > Once this is completed we may need to repeat the process for > another database. This is not supported; you are playing with fire. > what are other options? A couple with come to mind: (1) Create a separate PostgreSQL cluster for each database, so that each has its own data directory -- this is safe at that level, provided the runtime environments are compatible. (2) Again, assuming a compatible runtime environment, copy in the data directory for all of them, but with tablespaces pointing back to the removable medium. Of course, this likely only works if the large part of the data is read-only and fits on one something which can be mounted at run time. The other thing to look at is whether there is some way to adjust your schema to allow faster restore. Which phase is causing you a problem? Have you profiled the restore while it is running? -Kevin
Kevin,

I realize that copying the database directory is not supported, since OID sequencing will be broken. I also verified that Postgres makes sure it creates unique file names when writing pages to the disk, which is why I had kind of hoped that Postgres might somehow deal with it. :-)

As far as the suggested options:

(1) Using a separate cluster: I believe we would have to launch a separate instance of Postgres to handle each cluster, right? Unfortunately our target hardware requirements do not leave us much RAM to work with.

(2) I believe this is similar to (1), except we would not have to copy the data to the hard drive.

Our databases vary in size, but many of them have tables with millions of rows. Those tables are indexed. One of the databases that we've timed took about 12 min to load the data and another 10-12 min to re-create indexes. The dump file is about 400 MB. This is by far not the largest one we will have to deal with.

Right now our approach is to copy the dump file to the hard drive using MSI and then use custom code to call pg_restore. The entire process takes a lot of time and, for a big part of the process, we cannot even show progress to the user.

What we are looking for is a way to create the database in our environment, copy it to removable media (DVD), and make that data available to the Postgres running on the user's computer. What makes this complex is the need to create multiple databases, only some of which will ever be running on the user's PC.

For example: we create the following databases: one for BOLTS, one for NUTS, and one for SCREWS. Each is packaged on a separate DVD. Customers order any combination of those (e.g. NUTS and BOLTS). Ideally we would like to quickly load both of those databases onto the customer's PC (not at the same time, since one can be ordered separately from another).

We also plan to have a mechanism to periodically ship new versions of those databases, as well as to update some of the data over the Internet (we already have a mechanism for updating installed data in the background).

We would appreciate any approach to moving the content of a database from PC to PC that does not require running another instance of Postgres.

Thank you.

P.S. One of our largest databases (during POC) took 8 hours to restore.
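For concreteness, the restore step our installer drives is roughly the equivalent of the first command below; the parallel mode in the second is something we have not tried yet (it needs pg_restore 8.4 or later and a custom-format dump), and the names and job count are placeholders:

    pg_restore -d bolts /path/to/bolts.dump        # what we effectively do today
    pg_restore -j 4 -d bolts /path/to/bolts.dump   # parallel data load and index rebuild
    # raising maintenance_work_mem in postgresql.conf before the restore can
    # also shorten the index-rebuild phase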
"Tsing, Vadim" wrote: > Out databases vary in size, bat many of the have tables with > millions of rows. Those tables are indexed. One of the databases > that we've timed, took about 12 min to load the data and another > 10-12 min to re-create indexes. The dump file is about 400 MB. This > is by far not the largest one we will have to deal with. Any chance that you can just have the app make requests to a database on your server, and not ship the database with the software? Maybe even do this as a web app? -Kevin
Actually, that same app we ship on the media is also available as a web app. The media is offered to customers who request local access or who have slow or unreliable Internet access. These are the requirements we have to deal with. :-(