Re: Using RSYNC for replication? - Mailing list pgsql-general

From Tom Lane
Subject Re: Using RSYNC for replication?
Msg-id 13051.1043765929@sss.pgh.pa.us
In response to Using RSYNC for replication?  (Jason Hihn <jhihn1@umbc.edu>)
List pgsql-general
Jason Hihn <jhihn1@umbc.edu> writes:
> A sequence of events occurred to me today that left me wondering if I can
> rsync the raw files as a form of replication.

In general, you can't.  There are very precise synchronization
requirements among the files making up the data directory, and there's
no way that a separate process like tar or rsync is going to capture a
consistent snapshot of all the files.

As an example: one of the recent reports of duplicate rows (in a table
with a unique index) seems to have arisen because someone tried to take
a tar dump of $PGDATA while the postmaster was running.  When he
restored the tar, two different versions of a recently-updated row both
looked to be valid, because the table's data file was out of sync with
pg_clog.
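
A toy sketch of how a table file and a commit log that disagree can make two versions of one row look valid at once. This is not PostgreSQL's real on-disk format, and it is not a reconstruction of the report mentioned above: the `RowVersion` fields, the cached "known committed" flag, and the transaction numbers are all assumptions chosen for illustration.

```python
# Toy model only: NOT PostgreSQL's actual storage format.
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class RowVersion:
    value: str
    xmin: int                           # transaction that created this version
    xmax: Optional[int] = None          # transaction that superseded it, if any
    xmin_known_committed: bool = False  # flag cached in the table file (assumption)


def is_valid(row: RowVersion, clog: Dict[int, str]) -> bool:
    """A version is valid if its creator committed and no committed
    transaction has superseded it."""
    created = row.xmin_known_committed or clog.get(row.xmin) == "committed"
    superseded = row.xmax is not None and clog.get(row.xmax) == "committed"
    return created and not superseded


def valid_values(table: List[RowVersion], clog: Dict[int, str]) -> List[str]:
    return [row.value for row in table if is_valid(row, clog)]


# Table file as copied AFTER transaction 101 updated the row and committed.
table_file = [
    RowVersion("old", xmin=100, xmax=101, xmin_known_committed=True),
    RowVersion("new", xmin=101, xmin_known_committed=True),
]

# Commit log as the running server sees it.
live_clog = {100: "committed", 101: "committed"}

# Commit log as copied BEFORE transaction 101 committed.
stale_clog = {100: "committed"}

print(valid_values(table_file, live_clog))   # ['new']
print(valid_values(table_file, stale_clog))  # ['old', 'new']
```

With the matching commit log only the new version is valid. With the stale one, the old version is never seen as superseded, so both versions pass the check, which is the kind of duplicate a unique index is supposed to prevent.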

If you had a dump utility that was aware of the synchronization
requirements, it *might* be possible to dump the files in an order that
would work reliably (I'm not totally sure about it, but certainly data
files before WAL would be one essential part of the rules).  But
out-of-the-box tar or rsync won't get it right.
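
A minimal sketch of what "data files before WAL" could mean as a copy order. It only sorts a list of paths. It does not make a copy of a live data directory safe, and the message leaves open whether any ordering is sufficient. `pg_xlog` is the WAL directory name in PostgreSQL releases of this era; the function name `copy_order` and the sample file names are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Iterable, List

# WAL directory name in PostgreSQL releases of this era.
WAL_DIR = "pg_xlog"


def copy_order(relative_paths: Iterable[str]) -> List[str]:
    """Return paths (relative to the data directory) with WAL files last.

    sorted() is stable, so the relative order of everything else
    is left as given.
    """
    def rank(path: str) -> int:
        top_level = PurePosixPath(path).parts[0]
        return 1 if top_level == WAL_DIR else 0

    return sorted(relative_paths, key=rank)


# Hypothetical file names, for illustration only.
files = [
    "pg_xlog/0000000000000001",
    "base/16384/16385",
    "global/1262",
    "pg_clog/0000",
]

for path in copy_order(files):
    print(path)
```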

> I'd like to keep postmaster running, but flush and lock everything,
> then perform the copy via rsync so only the new data is propagated,
> all while postmaster is running.
> In general, data is only added to a few tables in the database, with
> updates occurring infrequently to the rest. Rarely are deletes ever done.
>    During the sync neither DB will change except as part of the rsync.

If you checkpoint before the rsync, and guarantee that no updates occur
between that and the conclusion of the rsync, and *take down the
destination postmaster* while it runs, then it might possibly work.
But I'd never trust it.  I'd also kinda wonder what's the point, if you
have to prevent updates; you might as well shut down the postmaster and
avoid the risk of problems.
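
The sequence described above can be written down as an ordered plan. This is a sketch under stated assumptions, not a recommended procedure, and the message itself says the approach should not be trusted. The host name, paths, use of ssh, and the function name `plan_sync` are hypothetical. The script only prints commands and runs nothing, and it cannot enforce the "no updates" condition, which has to be guaranteed by other means.

```python
import shlex
from typing import List


def plan_sync(src_data: str, dest_host: str, dest_data: str) -> List[List[str]]:
    """Build the command sequence; nothing is executed here."""
    src = src_data.rstrip("/") + "/"    # trailing slash: copy directory contents
    dest = dest_data.rstrip("/") + "/"
    return [
        # 1. Take down the destination postmaster before touching its files.
        ["ssh", dest_host, "pg_ctl", "stop", "-D", dest_data],
        # 2. Checkpoint the source (assumes psql can connect with enough privilege).
        ["psql", "-c", "CHECKPOINT"],
        # 3. Copy while no updates occur on the source (not enforced here).
        ["rsync", "-a", "--delete", src, f"{dest_host}:{dest}"],
        # 4. Bring the destination back up.
        ["ssh", dest_host, "pg_ctl", "start", "-D", dest_data],
    ]


# Hypothetical host and paths.
plan = plan_sync("/var/lib/pgsql/data", "standby.example", "/var/lib/pgsql/data")
for command in plan:
    print(shlex.join(command))
```

The ordering is the only point being made: the destination is stopped first, and the checkpoint comes immediately before the copy.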

A final note is that I doubt this would be very efficient: wouldn't
rsync have to ship entire table files (and entire WAL log files) for
even the most piddling change?

            regards, tom lane
