Re: Copying database directory - Mailing list pgsql-admin

From: Tsing, Vadim
Subject: Re: Copying database directory
Msg-id: 035B18EFA3CCA540AF17AD5D3DFD9847E09B55@LISL-XBCK-C1-V2.snaponglobal.com
In response to: Re: Copying database directory ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
List: pgsql-admin
Kevin,

I realize that copying the database directory is not supported, since
OID sequencing will be broken. I also verified that Postgres makes sure
it creates unique file names when writing pages to disk, which is why
I had rather hoped that Postgres might somehow deal with it. :-)

As for the suggested options:

(1) Using a separate cluster: I believe we would have to launch a
separate instance of Postgres to handle each cluster, right?
Unfortunately, our target hardware requirements do not leave us much
RAM to work with.

(2) I believe this is similar to (1), except we would not have to
copy the data to the hard drive.

Our databases vary in size, but many of them have tables with millions
of rows. Those tables are indexed. One of the databases that we timed
took about 12 minutes to load the data and another 10-12 minutes to
re-create the indexes. The dump file is about 400 MB, and it is by far
not the largest one we will have to deal with.

Right now our approach is to copy the dump file to the hard drive using
MSI and then use custom code to call pg_restore. The entire process
takes a long time and, for a large part of it, we cannot even show
progress to the user.
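
For reference, our restore step boils down to something like the
sketch below (database name, path, and job count are illustrative).
pg_restore's -j option (PostgreSQL 8.4 and later, custom-format dumps)
can load data and build indexes in parallel, and -v prints per-object
output that we could surface as progress:

    # Illustrative invocation; actual names and paths differ.
    # -j 4: four parallel jobs for the data load and index builds
    # -v:   verbose per-object output, usable as a crude progress feed
    pg_restore -j 4 -v -d parts_db /tmp/parts_db.dump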

What we are looking for is a way to create the database in our
environment, copy it to removable media (DVD), and make that data
available to the Postgres instance running on the user's computer.
What makes this complex is the need to create multiple databases, only
some of which will ever be running on a given user's PC.

For example:
We create the following databases: one for BOLTS, one for NUTS, and one
for SCREWS. Each is packaged on a separate DVD. Customers order any
combination of those (e.g. NUTS and BOLTS). Ideally we would like to
load both of those databases onto the customer's PC quickly (not
necessarily at the same time, since one can be ordered separately from
the other). We also plan to periodically ship new versions of those
databases, as well as update some of the data over the Internet (we
already have a mechanism for updating installed data in the background).


We would appreciate any approach to moving the contents of a database
from PC to PC that does not require running another instance of
Postgres.

Thank you.

P.S. During our proof of concept, one of our largest databases took
8 hours to restore.

-----Original Message-----
From: Kevin Grittner [mailto:Kevin.Grittner@wicourts.gov]
Sent: Saturday, December 18, 2010 1:14 PM
To: pgsql-admin@postgresql.org; Tsing, Vadim
Subject: Re: [ADMIN] Copying database directory

"Tsing, Vadim"  wrote:

> We are trying to select a database engine for a new product we plan
> to start developing. One of the requirements is to be able to ship
> the data on the media and install it on the user's hard-drive.
>
> One of the issues we've run into is that pg_restore takes a lot of
> time to restore a large database.
>
> Also, moving entire data directory is not an option since we will
> have multiple databases that user can choose to install.
>
> We've tried to copy the database directory and it seems to work.
> Unfortunately, we are not sure whether there is any damage to the
> data that is not immediately visible.
>
> So, here is my question: Is it possible to:
>
> 1. Create a database
>
> 2. Stop postgres
>
> 3. Empty out the directory created by the new database
>
> 4. Copy files from another database (created on a different
> server) into that directory
>
> 5. Start postgres and use the database
>
> Once this is completed we may need to repeat the process for
> another database.

This is not supported; you are playing with fire.

> What are the other options?

A couple come to mind:

(1) Create a separate PostgreSQL cluster for each database, so that
each has its own data directory -- this is safe at that level,
provided the runtime environments are compatible.
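
Roughly like this, untested, with the data directory, port, and log
file invented for illustration:

    # One self-contained cluster per shipped database; each cluster
    # needs its own data directory, its own port, and its own
    # postgres process (which is where the RAM cost comes from).
    initdb -D /opt/pg/bolts_cluster
    pg_ctl -D /opt/pg/bolts_cluster -o "-p 5433" -l bolts.log start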

(2) Again, assuming a compatible runtime environment, copy in the
data directory for all of them, but with tablespaces pointing back to
the removable medium.  Of course, this likely only works if the large
part of the data is read-only and fits on something which can be
mounted at run time.
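
Sketching the build-time side (untested; the mount point and the
bolts table are invented for illustration), after which you would
ship the cluster's data directory plus the medium and mount the
medium at the same path on the user's machine.  Note that PostgreSQL
expects to be able to write inside a tablespace directory, so a
strictly read-only mount needs careful testing:

    # Build time: root a tablespace at what will later become the
    # removable medium's mount point, and move the big read-mostly
    # tables onto it.
    psql -c "CREATE TABLESPACE parts_ts LOCATION '/mnt/dvd/pgdata'"
    psql -d parts_db -c "ALTER TABLE bolts SET TABLESPACE parts_ts"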

The other thing to look at is whether there is some way to adjust
your schema to allow faster restore.  Which phase is causing you a
problem?  Have you profiled the restore while it is running?
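
One rough way to see where the time goes is to split the restore by
TOC entries and time each phase separately, e.g. (file names are
examples and the grep patterns are only approximate):

    # Dump the table of contents, split out the index builds, and
    # run the two phases separately so each can be timed.
    pg_restore -l parts_db.dump > toc.all
    grep -v ' INDEX ' toc.all > toc.data
    grep ' INDEX ' toc.all > toc.index
    time pg_restore -L toc.data -d parts_db parts_db.dump
    time pg_restore -L toc.index -d parts_db parts_db.dump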

-Kevin
