Thread: PG_RESTORE/DUMP Question

PG_RESTORE/DUMP Question

From
Alex
Date:
Hi,

I have a test system that is set up the same as a production system and
would like to frequently copy the database over.
pg_dump takes a few hours and sometimes even hangs.

Are there any reasons not to simply copy the entire data directory
over to the test system? I could not find any postings on the net
suggesting otherwise. Is there anything to pay attention to?

Thanks for any advice
Alex



Re: PG_RESTORE/DUMP Question

From
Doug McNaught
Date:
Alex <alex@meerkatsoft.com> writes:

> Hi,
>
> I have a test system that is set up the same as a production system and
> would like to frequently copy the database over.
> pg_dump takes a few hours and sometimes even hangs.
>
> Are there any reasons not to simply copy the entire data
> directory over to the test system? I could not find any postings on
> the net suggesting otherwise. Is there anything to pay attention to?

If the two systems are the same architecture and OS, this can work,
but in order to get a consistent copy, you need to either:

a) Stop (completely shut down) the source database while the copy
runs, or
b) Use volume management and take a snapshot of the source database,
then copy the snapshot over.  This will lose open transactions but
will be otherwise consistent.

-Doug
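
For option (a), a minimal sketch of the procedure (the data directory
path, the hostname, and the use of rsync here are assumptions for
illustration, not details from the thread):

# Stop both clusters so the files on disk are consistent.
ssh test-host 'pg_ctl stop -D /var/lib/pgsql/data -m fast'
pg_ctl stop -D /var/lib/pgsql/data -m fast

# Copy the whole data directory over (same architecture, OS and
# PostgreSQL version assumed on both machines).
rsync -a --delete /var/lib/pgsql/data/ test-host:/var/lib/pgsql/data/

# Restart production, then start the copy on the test machine.
pg_ctl start -D /var/lib/pgsql/data -l /tmp/pgprod.log
ssh test-host 'pg_ctl start -D /var/lib/pgsql/data -l /tmp/pgtest.log'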

Re: PG_RESTORE/DUMP Question

From
Shridhar Daithankar
Date:
Alex wrote:

> Hi,
>
> I have a test system that is set up the same as a production system and
> would like to frequently copy the database over.
> pg_dump takes a few hours and sometimes even hangs.
>
> Are there any reasons not to simply copy the entire data directory
> over to the test system? I could not find any postings on the net
> suggesting otherwise. Is there anything to pay attention to?

Yes. Just shut down the production postmaster, then copy the entire data
directory over to the test system.

The two systems should be absolutely identical: same architecture, preferably the
same OS, same PostgreSQL client and server versions, etc.

Or investigate some of the asynchronous replication systems. That would save you
some time but will affect production performance a bit.

HTH

  Shridhar
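
As a quick sanity check that the two systems really do match, something
along these lines could be run (hostnames are examples only; it assumes
ssh access and pg_config on the PATH of both machines):

for host in prod-host test-host; do
    echo "== $host =="
    ssh "$host" 'uname -sm; pg_config --version'
done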


Re: PG_RESTORE/DUMP Question

From
Ken Godee
Date:
Alex wrote:
> Hi,
>
> I have a test system that is set up the same as a production system and
> would like to frequently copy the database over.
> pg_dump takes a few hours and sometimes even hangs.
>
> Are there any reasons not to simply copy the entire data directory
> over to the test system? I could not find any postings on the net
> suggesting otherwise. Is there anything to pay attention to?
>
> Thanks for any advice
> Alex

Probably a point for debate, but sure, why not.
I would create the database in its own directory so as not to
mix things up on both machines, i.e. export PGDATA2=/usr/local/database
Then just make sure you stop postgres when copying from or to on each
machine.
If someone doesn't think this will work, I'd like to know.
One of my backup routines depends on this kind of procedure.
Of course I've got pg_dumps as well. :)
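
A rough sketch of that separate-data-directory idea (the path, port, and
production hostname are made up; it assumes both postmasters are stopped
while the files are copied):

export PGDATA2=/usr/local/database

# With the production postmaster on the other machine shut down,
# stop the local copy, pull the files over, then start it back up
# on its own port so it cannot be confused with anything else.
pg_ctl stop -D "$PGDATA2" -m fast
rsync -a --delete prod-host:/var/lib/pgsql/data/ "$PGDATA2"/
pg_ctl start -D "$PGDATA2" -o "-p 5433" -l /tmp/pgdata2.log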








Transaction Performance Question

From
"Rick Gigger"
Date:
In the following situation:

You do a large transaction where lots of rows are updated
All of your tables/indexes are cached in memory

When are the updated rows written out to disk?  When they are updated inside
the transaction, or when the transaction is completed?


Re: Transaction Performance Question

From
"scott.marlowe"
Date:
On Wed, 29 Oct 2003, Rick Gigger wrote:

> In the following situation:
>
> You do a large transaction where lots of rows are updated
> All of your tables/indexes are cached in memory
>
> When are the updated rows written out to disk?  When they are updated inside
> the transaction, or when the transaction is completed?

The data is written out but not made real, so to speak, during each
update.  I.e. the updates individually add all these rows.  At the end of
the transaction, if we roll back, all the tuples that were written out are
simply not committed, and therefore the last version of that record
remains the last one in the chain.

If the transaction is committed then each tuple becomes the last in its
chain (could it be second to last because of other transactions?  I'm not
sure.)
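
The tuple versions described above can be observed through PostgreSQL's
hidden xmin/xmax system columns; a small illustration, with an invented
database and table:

psql test <<'SQL'
SELECT xmin, xmax, id, balance FROM accounts WHERE id = 1;
UPDATE accounts SET balance = balance + 1 WHERE id = 1;
-- The visible row now carries a new xmin, i.e. it is a new tuple version;
-- the old version stays on disk until VACUUM removes it.
SELECT xmin, xmax, id, balance FROM accounts WHERE id = 1;
SQL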


Re: Transaction Performance Question

From
"Rick Gigger"
Date:
> > In the following situation:
> >
> > You do a large transaction where lots of rows are updated
> > All of your tables/indexes are cached in memory
> >
> > When are the updated rows written out to disk?  When they are updated inside
> > the transaction, or when the transaction is completed?
>
> The data is written out but not made real, so to speak, during each
> update.  I.e. the updates individually add all these rows.  At the end of
> the transaction, if we roll back, all the tuples that were written out are
> simply not committed, and therefore the last version of that record
> remains the last one in the chain.
>
> If the transaction is committed then each tuple becomes the last in its
> chain (could it be second to last because of other transactions?  I'm not
> sure.)

I realize how committing the transaction works from the user's perspective; I
am thinking here about the internal implementation.  For instance, if I do an
update inside a transaction, postgres could, in order to make sure data was
not lost, make sure that the data was flushed out to disk and fsynced.  That
way it could tell me if there was a problem writing that data out to disk.
But if it is in the middle of a transaction, I would think that you could
update the tuples cached in memory and return, then start sending the tuples
out to disk in the background.  When you issue the commit, of course,
everything would need to be flushed out to disk and fsynced, and any errors
could be reported before the transaction was finished, so it could
still be rolled back.

It seems like if I had to update, say, 39,000 rows with separate update
statements, it would be a lot faster if each update statement could just
update memory and then return, flushing out to disk in the background while
I continue processing the other updates.  Maybe it does this already, or
maybe it is a bad idea for some reason.  I don't understand the inner
workings of postgres well enough to say.  That is why I'm asking.

Also, is there any way to issue a whole bunch of updates together like this
that is faster than just issuing 39,000 individual update statements?
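
Two approaches that commonly help with bulk updates like this (the
database, table, column, and file names below are invented for
illustration): run all the statements inside a single transaction so
there is only one commit, or load the new values in bulk with COPY and
apply them in one UPDATE ... FROM.

# One transaction instead of 39,000 separate commits:
psql test <<'SQL'
BEGIN;
UPDATE items SET price = 10 WHERE id = 1;
UPDATE items SET price = 20 WHERE id = 2;
-- ... thousands more ...
COMMIT;
SQL

# Or load the new values in bulk (tab-delimited file) and update once:
psql test <<'SQL'
CREATE TEMP TABLE new_prices (id integer, price numeric);
\copy new_prices from /tmp/new_prices.dat
UPDATE items SET price = new_prices.price
  FROM new_prices
 WHERE items.id = new_prices.id;
SQL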


Re: PG_RESTORE/DUMP Question

From
Alex
Date:
Is it enough to just copy the global and base directories?
Is there any reason the db would not come up if the data is copied from
Solaris to Linux or vice versa, as long as the db version is the same?

Shridhar Daithankar wrote:

> Alex wrote:
>
>> Hi,
>>
>> I have a test system that is set up the same as a production system
>> and would like to frequently copy the database over.
>> pg_dump takes a few hours and sometimes even hangs.
>>
>> Are there any reasons not to simply copy the entire data
>> directory over to the test system? I could not find any postings on
>> the net suggesting otherwise. Is there anything to pay attention to?
>
>
> Yes. Just shut down the production postmaster, then copy the entire data
> directory over to the test system.
>
> The two systems should be absolutely identical: same architecture,
> preferably the same OS, same PostgreSQL client and server versions, etc.
>
> Or investigate some of the asynchronous replication systems. That
> would save you some time but will affect production performance a bit.
>
> HTH
>
>  Shridhar



Re: PG_RESTORE/DUMP Question

From
Martijn van Oosterhout
Date:
The data in the data directory is binary data and is not intended to work
even across different installations on the same machine. To copy the binary
data you'd need at least global, base, pg_xlog, pg_clog and a few other
directories. The only thing you might be able to skip are the subdirectories
under base belonging to databases you don't need. And even then you're risking
a lot. Between Linux and Solaris I'd expect various byte boundaries to differ,
so forget about transportability.

pg_dump is the only supported way of transporting data around.
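
For a cross-platform copy (e.g. Solaris to Linux), the dump can be piped
straight from one machine to the other; the hostnames and database name
here are examples only:

# Recreate the test database and stream the dump straight into it.
dropdb -h test-host mydb
createdb -h test-host mydb
pg_dump -h prod-host mydb | psql -h test-host mydb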

On Sat, Nov 01, 2003 at 10:20:42PM +0900, Alex wrote:
> Is it enough to just copy the global and base directories?
> Is there any reason the db would not come up if the data is copied from
> Solaris to Linux or vice versa, as long as the db version is the same?
>
> Shridhar Daithankar wrote:
>
> >Alex wrote:
> >
> >>Hi,
> >>
> >>I have a test system that is set up the same as a production system
> >>and would like to frequently copy the database over.
> >>pg_dump takes a few hours and sometimes even hangs.
> >>
> >>Are there any reasons not to simply copy the entire data
> >>directory over to the test system? I could not find any postings on
> >>the net suggesting otherwise. Is there anything to pay attention to?
> >
> >
> >Yes. Just shut down the production postmaster, then copy the entire data
> >directory over to the test system.
> >
> >The two systems should be absolutely identical: same architecture,
> >preferably the same OS, same PostgreSQL client and server versions, etc.
> >
> >Or investigate some of the asynchronous replication systems. That
> >would save you some time but will affect production performance a bit.
> >
> >HTH
> >
> > Shridhar

--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> "All that is needed for the forces of evil to triumph is for enough good
> men to do nothing." - Edmond Burke
> "The penalty good people pay for not being interested in politics is to be
> governed by people worse than themselves." - Plato
