Thread: Using RSYNC for replication?

Using RSYNC for replication?

From
Jason Hihn
Date:
A sequence of events ocurred to me today that left me wondering if I can
rsync the raw files as a form of replication. I'd like to keep
postmaster running, but flush and lock everything, then perform the copy
  via rsync so only the new data is propigated, all while postmaster is
running.

In general, data is only added to a few tables in the database, with
updates occuring infrequently to the rest. Rarely are deletes ever done.
   During the sync neither DB will change except as part of the rsync.

I think this would be a quick, dirty, safe and efficient way to
accomplish this without having to take down postmaster and send the
whole shebang over.

Are people using this? Will it work? Is it feasible? Thoughts?

Many thanks in advance,
-Jason


Re: Using RSYNC for replication?

From
Michael Carmack
Date:
On Mon, Jan 27, 2003 at 11:56:09PM -0500, Jason Hihn wrote:
> A sequence of events ocurred to me today that left me wondering if I can
> rsync the raw files as a form of replication. I'd like to keep
> postmaster running, but flush and lock everything, then perform the copy
>  via rsync so only the new data is propigated, all while postmaster is
> running.
>
> In general, data is only added to a few tables in the database, with
> updates occuring infrequently to the rest. Rarely are deletes ever done.
>   During the sync neither DB will change except as part of the rsync.
>
> I think this would be a quick, dirty, safe and efficient way to
> accomplish this without having to take down postmaster and send the
> whole shebang over.
>
> Are people using this? Will it work? Is it feasible? Thoughts?

I've done this, and it seems to work fine. But why not dump the database
to a file and rsync that instead? Then you wouldn't have to ensure that
the database is static, and you will probably get a faster transfer.

m.



Re: Using RSYNC for replication?

From
"Christopher Kings-Lynne"
Date:
I don't know abou the 'safe' in that list there.  You will definitely need
to stop both postmasters from running while you are rsyncing and then start
them again afterwards, since memory contents will be different.  You would
also need to rsync all the WAL files, the pg_controldata file, the
architecture must be identical, etc.

ie.  What you are proposing is a very, very bad idea.

I know that Gavin is tinkering with a log shipping solution based on WAL at
the moment, that would be much better for your needs.  Don't except it any
time soon tho :)  It won't work at all until PITR goes in...

Chris

> -----Original Message-----
> From: pgsql-general-owner@postgresql.org
> [mailto:pgsql-general-owner@postgresql.org]On Behalf Of Jason Hihn
> Sent: Tuesday, 28 January 2003 12:56 PM
> To: pgsql-general@postgresql.org
> Subject: [GENERAL] Using RSYNC for replication?
>
>
> A sequence of events ocurred to me today that left me wondering if I can
> rsync the raw files as a form of replication. I'd like to keep
> postmaster running, but flush and lock everything, then perform the copy
>   via rsync so only the new data is propigated, all while postmaster is
> running.
>
> In general, data is only added to a few tables in the database, with
> updates occuring infrequently to the rest. Rarely are deletes ever done.
>    During the sync neither DB will change except as part of the rsync.
>
> I think this would be a quick, dirty, safe and efficient way to
> accomplish this without having to take down postmaster and send the
> whole shebang over.
>
> Are people using this? Will it work? Is it feasible? Thoughts?
>
> Many thanks in advance,
> -Jason
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
>


Re: Using RSYNC for replication?

From
Lincoln Yeoh
Date:
What about existing open transactions during the sync?

Link.

At 11:56 PM 1/27/03 -0500, Jason Hihn wrote:

>In general, data is only added to a few tables in the database, with
>updates occuring infrequently to the rest. Rarely are deletes ever
>done.   During the sync neither DB will change except as part of the rsync.
>
>I think this would be a quick, dirty, safe and efficient way to accomplish
>this without having to take down postmaster and send the whole shebang over.
>
>Are people using this? Will it work? Is it feasible? Thoughts?
>
>Many thanks in advance,
>-Jason
>
>
>---------------------------(end of broadcast)---------------------------
>TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
>



Re: Using RSYNC for replication?

From
Jason Hihn
Date:
Well this email is for everyone:

1) I don't want to PG dump because of 1) the time and 2) the space. If I
did it that way, I might as well keep the old copy around and run diff
on that, and ship that over the wire. Since this DB can get quie large,
I'd rather not have to have the binary copy AND the bigger text copy
around.

2)I can assure myself that there are no existing open transactions.
Also, this is the reason why I'd lock all the tables. My call for locks
would block until all other transactions were complete, ensuring I get a
  consistant copy.

I'm trying to do this from a user-invoked script where the user knows
nothing about starting/stoping services. Also, as things currently are,
we can get the modified/added records from our database and ship those
over the wire. As things are now, between script invocations, there's
only 5-15k byte differences. If we were to dump a lot more, our current
remote sites would get upset. Most of these remote sites are via modem.

Thanks for th einput so far. Keep it coming!





Lincoln Yeoh wrote:
> What about existing open transactions during the sync?
>
> Link.
>
> At 11:56 PM 1/27/03 -0500, Jason Hihn wrote:
>
>> In general, data is only added to a few tables in the database, with
>> updates occuring infrequently to the rest. Rarely are deletes ever
>> done.   During the sync neither DB will change except as part of the
>> rsync.
>>
>> I think this would be a quick, dirty, safe and efficient way to
>> accomplish this without having to take down postmaster and send the
>> whole shebang over.
>>
>> Are people using this? Will it work? Is it feasible? Thoughts?
>>
>> Many thanks in advance,
>> -Jason
>>
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
>>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster
>



Re: Using RSYNC for replication?

From
Andrew Sullivan
Date:
On Tue, Jan 28, 2003 at 08:58:09AM -0500, Jason Hihn wrote:
> 2)I can assure myself that there are no existing open transactions.
> Also, this is the reason why I'd lock all the tables. My call for locks
> would block until all other transactions were complete, ensuring I get a
>   consistant copy.

No, it won't.
<http://www.ca.postgresql.org/users-lounge/docs/7.3/postgres/backup-file.html>:

"The database server _must_ be shut down in order to get a usable
backup. Half-way measures such as disallowing all connections will
not work as there is always some buffering going on."

If you want to minimise the amount of traffic you send to your
backup, I suggest looking into some form of replication.  There is a
pile of information on replication at
<http://gborg.postgresql.org/genpage?replication_research>

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110


Re: Using RSYNC for replication?

From
"Shridhar Daithankar"
Date:
On 28 Jan 2003 at 8:58, Jason Hihn wrote:

> I'm trying to do this from a user-invoked script where the user knows
> nothing about starting/stoping services. Also, as things currently are,
> we can get the modified/added records from our database and ship those
> over the wire. As things are now, between script invocations, there's
> only 5-15k byte differences. If we were to dump a lot more, our current
> remote sites would get upset. Most of these remote sites are via modem.

Dumb question. Would http://pgreplicator.sourceforge.net/ help you?

Bye
 Shridhar

--
Iron Law of Distribution:    Them that has, gets.


Re: Using RSYNC for replication?

From
Tom Lane
Date:
Jason Hihn <jhihn1@umbc.edu> writes:
> A sequence of events ocurred to me today that left me wondering if I can
> rsync the raw files as a form of replication.

In general, you can't.  There are very precise synchronization
requirements among the files making up the data directory, and there's
no way that a separate process like tar or rsync is going to capture a
consistent snapshot of all the files.

As an example: one of the recent reports of duplicate rows (in a table
with a unique index) seems to have arisen because someone tried to take
a tar dump of $PGDATA while the postmaster was running.  When he
restored the tar, two different versions of a recently-updated row both
looked to be valid, because the table's data file was out of sync with
pg_clog.

If you had a dump utility that was aware of the synchronization
requirements, it *might* be possible to dump the files in an order that
would work reliably (I'm not totally sure about it, but certainly data
files before WAL would be one essential part of the rules).  But out-of-
the-box tar or rsync won't get it right.

> I'd like to keep postmaster running, but flush and lock everything,
> then perform the copy via rsync so only the new data is propigated,
> all while postmaster is running.
> In general, data is only added to a few tables in the database, with
> updates occuring infrequently to the rest. Rarely are deletes ever done.
>    During the sync neither DB will change except as part of the rsync.

If you checkpoint before the rsync, and guarantee that no updates occur
between that and the conclusion of the rsync, and *take down the
destination postmaster* while it runs, then it might possibly work.
But I'd never trust it.  I'd also kinda wonder what's the point, if you
have to prevent updates; you might as well shut down the postmaster and
avoid the risk of problems.

A final note is that I doubt this would be very efficient: wouldn't
rsync have to ship entire table files (and entire WAL log files) for
even the most piddling change?

            regards, tom lane

Re: Using RSYNC for replication?

From
Ron Johnson
Date:
On Mon, 2003-01-27 at 22:56, Jason Hihn wrote:
> A sequence of events ocurred to me today that left me wondering if I can
> rsync the raw files as a form of replication. I'd like to keep
> postmaster running, but flush and lock everything, then perform the copy
>   via rsync so only the new data is propigated, all while postmaster is
> running.
>
> In general, data is only added to a few tables in the database, with
> updates occuring infrequently to the rest. Rarely are deletes ever done.
>    During the sync neither DB will change except as part of the rsync.
>
> I think this would be a quick, dirty, safe and efficient way to
> accomplish this without having to take down postmaster and send the
> whole shebang over.
>
> Are people using this? Will it work? Is it feasible? Thoughts?
>
> Many thanks in advance,
> -Jason

You could create a set of "modification log" tables that mirror the
"regular" tables, and are populated via triggers.

Then, when the main DB is quiesced, COPY out those mod log tables that
have rows in them, and then delete all from them.

Next, at your liesure, send over and apply these log table extracts.

Yes, this is not practical for a large number of tables, but, depending
on your h/w, can sustain even an extremely high-volume aplication.

--
+---------------------------------------------------------------+
| Ron Johnson, Jr.        mailto:ron.l.johnson@cox.net          |
| Jefferson, LA  USA      http://members.cox.net/ron.l.johnson  |
|                                                               |
| "Fear the Penguin!!"                                          |
+---------------------------------------------------------------+


Re: Using RSYNC for replication?

From
Dennis Gearon
Date:
What's WAL?

1/27/2003 10:05:22 PM, "Christopher Kings-Lynne" <chriskl@familyhealth.com.au> wrote:

>I don't know abou the 'safe' in that list there.  You will definitely need
>to stop both postmasters from running while you are rsyncing and then start
>them again afterwards, since memory contents will be different.  You would
>also need to rsync all the WAL files, the pg_controldata file, the
>architecture must be identical, etc.
>
>




Re: Using RSYNC for replication?

From
jhihn1
Date:
Why isn't database data and system data seperate? I realize I'm stretching
ACID here, but  I think isolation also applies to databases themselves, and
from the system as well. At any given time, given an idle database, I should
be able to rip out the data and put a new in. Something as simple as
remounting the data directory to a [NFS] backup copy.

In my idea clog and WAL would be flushed and empty for the given databases.
Even if there is one set of logs for all the databases, as long as there are
no entries for the database being updated (and there won't be, because we
flushed them out) changing out the data should be simple. True, indexes should
be dumped and reloaded, but that is acceptible for me.

I don't understand what is so hard about doing it this way. It would make
replication so simple and fast. I'm attempting to do what amounts to a
file-system level backup, while still running. Normally a bad idea, but I am
in the situation that I can ensure that the clog and WAL are empty (for this
database anyway) and nothing is coming in.

If it can't do this, then it damn well should. Move clogs and WALs into each
database's directory so each is isolated. Put a call into Postgres (psql
function) to catch up on the logs. Then lock every table from writes (JIC),
perform the backup, unlock every table.

PS. Sorry my mailer dropped the subject. Fixed now.
-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org]On Behalf Of Tom Lane
Sent: Tuesday, January 28, 2003 10:51 AM
To: jhihn1
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL]


jhihn1 <jhihn1@umbc.edu> writes:
> Another problem, and this is a big one, is I can't shutdown my master
> postmaster here. I have queries running all the time on multiple databases
on
> 1 postmaster. I can make sure that nothing is running in the database that
is
> being updated, but I can't lock people out of the 600-some other databases
> under this postmaster. No way, uh-uh, never going to happen!

<blink>  You're expecting to rsync just one database out of an
installation?  Not a chance.  The WAL and clog files must be replicated
too, and those are shared across the whole installation.  And the
guaranteed-no-update condition must hold across the whole installation,
too.

Forget rsync; there is no way on earth that you have a prayer of making
that work.  Go look at the various existing replication projects (see
the link someone already provided).  One of them might suit your needs.

            regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster


Re: Using RSYNC for replication?

From
Vivek Khera
Date:
>>>>> "JH" == Jason Hihn <jhihn1@umbc.edu> writes:

JH> Well this email is for everyone:
JH> 1) I don't want to PG dump because of 1) the time and 2) the space. If I
JH> did it that way, I might as well keep the old copy around and run diff

I don't know what you're doing, but my DB expanded out is about 18Gb.
The compressed dump is about 1.4Gb.  To rsync the data takes about 4
hours over a private 100baseT switch not doing anything else.  To
pg_dump takes about 40 minutes over the same wire.

The advantage is that the pg_dump is guaranteed consistent, no matter
how long it takes.

Here's how I pg_dump:

pg_dump -h dbhost -Fc DBNAME > DBNAME.`date +%d-%b-%Y`.dump



--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.                Khera Communications, Inc.
Internet: khera@kciLink.com       Rockville, MD       +1-240-453-8497
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/

Re: Using RSYNC for replication?

From
Vivek Khera
Date:
>>>>> "TL" == Tom Lane <tgl@sss.pgh.pa.us> writes:

TL> A final note is that I doubt this would be very efficient: wouldn't
TL> rsync have to ship entire table files (and entire WAL log files) for
TL> even the most piddling change?

No, rsync is smart that way.  It checksums hunks of the files, and
sends only the parts that change.  I did a test and the first rsync
took me 4 hours.  The second one the next day took about 1.5 hours.

I was moving the data to another box, and the time to
dump/restore/analyze the tables was enormous.  Using rsync made it
faster.  First I did the rsync live (4 hours), then shut down the
source postmaster, did the rsync again (only 1 hour) and brought up
the new server.  Downtime was 1 hour rather than 8.

However for backup purposes, it makes no sense.

--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.                Khera Communications, Inc.
Internet: khera@kciLink.com       Rockville, MD       +1-240-453-8497
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/

Re: Using RSYNC for replication?

From
Arjen van der Meijden
Date:
You can do without the temporal filestorage, using unix/linux' pipe
functionality:

pg_dump whatever_commands -you -want | psql -h remotehost -etc
Or, when dumping with the binary format, pipe it to pg_restore (afaik
that is possible)

Arjen

> -----Oorspronkelijk bericht-----
> Van: pgsql-general-owner@postgresql.org
> [mailto:pgsql-general-owner@postgresql.org] Namens Vivek Khera
> Verzonden: dinsdag 28 januari 2003 18:39
> Aan: pgsql-general@postgresql.org
> Onderwerp: Re: [GENERAL] Using RSYNC for replication?
>
>
> >>>>> "JH" == Jason Hihn <jhihn1@umbc.edu> writes:
>
> JH> Well this email is for everyone:
> JH> 1) I don't want to PG dump because of 1) the time and 2)
> the space.
> JH> If I
> JH> did it that way, I might as well keep the old copy around
> and run diff
>
> I don't know what you're doing, but my DB expanded out is
> about 18Gb. The compressed dump is about 1.4Gb.  To rsync the
> data takes about 4 hours over a private 100baseT switch not
> doing anything else.  To pg_dump takes about 40 minutes over
> the same wire.
>
> The advantage is that the pg_dump is guaranteed consistent,
> no matter how long it takes.
>
> Here's how I pg_dump:
>
> pg_dump -h dbhost -Fc DBNAME > DBNAME.`date +%d-%b-%Y`.dump
>
>
>
> --
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> Vivek Khera, Ph.D.                Khera Communications, Inc.
> Internet: khera@kciLink.com       Rockville, MD       +1-240-453-8497
> AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
http://www.postgresql.org/users-lounge/docs/faq.html



Re: Using RSYNC for replication?

From
Tom Lane
Date:
jhihn1 <jhihn1@umbc.edu> writes:
> I don't understand what is so hard about doing it this way.

If you want separate installations, make separate installations.  Don't
expect multiple databases in a single installation to be implemented
with the same amount of overhead as separate installations would be.
If we did it that way, we'd legitimately get complaints.

> It would make replication so simple and fast.

No it wouldn't; as I've been trying to explain to you, there are a lot
of reasons why rsync'ing a database won't work.  Fixing a few of them
doesn't produce a working solution.  Nor are we going to contort the
system design to make a fundamentally wrongheaded approach to
replication work.  rsync is just not the basis of a workable solution,
because it doesn't and can't know anything about the database state or
the semantics of the different files in the database.

            regards, tom lane

Re: Using RSYNC for replication?

From
Richard Welty
Date:
On Tue, 28 Jan 2003 19:39:23 +0530 Shridhar Daithankar <shridhar_daithankar@persistent.co.in> wrote:

> Dumb question. Would http://pgreplicator.sourceforge.net/ help you?

well, i'm in dire need of pgreplicator or something like it, and quite
frankly it's a bastard to install, and the discussion on the sourceforge
forum suggests that it's not as reliable as it might be.

DBmirror in the 7.3 contrib directory looks like it might be 90% or so
of what i personally need. i'm going through the code right now to see if i
can figure out how to make up the gap.

richard
--
Richard Welty                                         rwelty@averillpark.net
Averill Park Networking                                         518-573-7592
              Unix, Linux, IP Network Engineering, Security

Re: Using RSYNC for replication?

From
"Christopher Kings-Lynne"
Date:
From the manual:

http://www.au.postgresql.org/users-lounge/docs/7.3/postgres/wal.html

Chris

> -----Original Message-----
> From: Dennis Gearon [mailto:gearond@cvc.net]
> Sent: Wednesday, 29 January 2003 12:32 AM
> To: Jason Hihn; pgsql-general@postgresql.org; Christopher Kings-Lynne
> Subject: Re: [GENERAL] Using RSYNC for replication?
>
>
> What's WAL?
>
> 1/27/2003 10:05:22 PM, "Christopher Kings-Lynne"
> <chriskl@familyhealth.com.au> wrote:
>
> >I don't know abou the 'safe' in that list there.  You will
> definitely need
> >to stop both postmasters from running while you are rsyncing and
> then start
> >them again afterwards, since memory contents will be different.
> You would
> >also need to rsync all the WAL files, the pg_controldata file, the
> >architecture must be identical, etc.
> >
> >
>
>
>


Re: Using RSYNC for replication?

From
"Shridhar Daithankar"
Date:
On Wednesday 29 Jan 2003 12:09 am, you wrote:
> jhihn1 <jhihn1@umbc.edu> writes:
> > It would make replication so simple and fast.
>
> No it wouldn't; as I've been trying to explain to you, there are a lot
> of reasons why rsync'ing a database won't work.  Fixing a few of them
> doesn't produce a working solution.  Nor are we going to contort the
> system design to make a fundamentally wrongheaded approach to
> replication work.  rsync is just not the basis of a workable solution,
> because it doesn't and can't know anything about the database state or
> the semantics of the different files in the database.

That makes me wonder. How hard it is to create async replication based on WAL,
assuming there is not one already.

Create a daemon that watches WAL dir. As soon as a new file is created, old
file is copied at someplace else, which can also be dropped into WAL dir. of
another installation and it will sync up.

Assumming WAL naming notation/names can be tweaked by external program, is it
correct idea of async notation? It should not be that hard to come up with
such a daemon.

 Shridhar

P.S. Tom, sorry for personal mail. I booted into BSD and KMail is bit
different than pegasus.. Sorry!

Re: Using RSYNC for replication?

From
"Denis A. Doroshenko"
Date:
Do not like stepping in with thoughts on a subject that can already
cause anger. Anyway, just a thought.

What if there would be such a method: the database is "frozen", which
means, all commited work is flushed to the database, ongoing changes
are going to an alternative destination (kind of journal), while the
original database becomes read-only with the "journal" on top of it
(i.e. all inserts, updates and deletes made to journal are visible for
the clients); later once the database is unfrozen, the jorunal is joined
into the database (i.e. database is synched).

Well, may be at least the above paragraph made you laughing. Doctors say
it is very healthy to laugh... :-)

This is how I understand FFS (fast file system) snapshots work for
background fsck and online backups. It is said to work in FreeBSD 5.0
(though I did not try it)...

On Tue, Jan 28, 2003 at 01:39:43PM -0500, Tom Lane wrote:
> jhihn1 <jhihn1@umbc.edu> writes:
> > I don't understand what is so hard about doing it this way.
>
> If you want separate installations, make separate installations.  Don't
> expect multiple databases in a single installation to be implemented
> with the same amount of overhead as separate installations would be.
> If we did it that way, we'd legitimately get complaints.
>
> > It would make replication so simple and fast.
>
> No it wouldn't; as I've been trying to explain to you, there are a lot
> of reasons why rsync'ing a database won't work.  Fixing a few of them
> doesn't produce a working solution.  Nor are we going to contort the
> system design to make a fundamentally wrongheaded approach to
> replication work.  rsync is just not the basis of a workable solution,
> because it doesn't and can't know anything about the database state or
> the semantics of the different files in the database.
>
>             regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

--
Denis A. Doroshenko, GPRS engineer, d.doroshenko@omnitel.net, +37069863486
Omnitel Ltd., Muitines Str. 35, LT-2600 Vilnius, Lithuania; www.omnitel.lt

Re: Using RSYNC for replication?

From
"Rick Gigger"
Date:
> Why isn't database data and system data seperate? I realize I'm stretching
> ACID here, but  I think isolation also applies to databases themselves,
and
> from the system as well. At any given time, given an idle database, I
should
> be able to rip out the data and put a new in. Something as simple as
> remounting the data directory to a [NFS] backup copy.
>
> In my idea clog and WAL would be flushed and empty for the given
databases.
> Even if there is one set of logs for all the databases, as long as there
are
> no entries for the database being updated (and there won't be, because we
> flushed them out) changing out the data should be simple. True, indexes
should
> be dumped and reloaded, but that is acceptible for me.
>
> I don't understand what is so hard about doing it this way. It would make
> replication so simple and fast. I'm attempting to do what amounts to a
> file-system level backup, while still running. Normally a bad idea, but I
am
> in the situation that I can ensure that the clog and WAL are empty (for
this
> database anyway) and nothing is coming in.
>
> If it can't do this, then it damn well should. Move clogs and WALs into
each
> database's directory so each is isolated. Put a call into Postgres (psql
> function) to catch up on the logs. Then lock every table from writes
(JIC),
> perform the backup, unlock every table.

Maybe you should consider using mysql if that is what you want.  Mysql works
that way.  Each database is entirely encapsulated in it's own directory.


Re: Using RSYNC for replication?

From
"Shridhar Daithankar"
Date:
On Wednesday 29 Jan 2003 2:35 pm, you wrote:
> > If it can't do this, then it damn well should. Move clogs and WALs into
> each
> > database's directory so each is isolated. Put a call into Postgres (psql
> > function) to catch up on the logs. Then lock every table from writes
> (JIC),
> > perform the backup, unlock every table.
>
> Maybe you should consider using mysql if that is what you want.  Mysql
> works that way.  Each database is entirely encapsulated in it's own
> directory.

Well, postgresql has each database in it's own directories but WAL and clog
are shared. If you want them distint, run different database site on same
machine.

That would be like oracle. Minimum 4 processes per database. To put it
politely, I don't like it.

 Shridhar

Re: Using RSYNC for replication?

From
"Reid Thompson"
Date:
PostgreSQL Database Replication Options
http://conferences.oreillynet.com/presentations/os2002/johnson_darren.pdf

-----Original Message-----
From: Richard Welty [mailto:rwelty@averillpark.net]
Sent: Tuesday, January 28, 2003 6:21 PM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Using RSYNC for replication?


On Tue, 28 Jan 2003 19:39:23 +0530 Shridhar Daithankar <shridhar_daithankar@persistent.co.in> wrote:

> Dumb question. Would http://pgreplicator.sourceforge.net/ help you?

well, i'm in dire need of pgreplicator or something like it, and quite
frankly it's a bastard to install, and the discussion on the sourceforge
forum suggests that it's not as reliable as it might be.

DBmirror in the 7.3 contrib directory looks like it might be 90% or so
of what i personally need. i'm going through the code right now to see if i
can figure out how to make up the gap.

richard
--
Richard Welty                                         rwelty@averillpark.net
Averill Park Networking                                         518-573-7592
              Unix, Linux, IP Network Engineering, Security



---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

Re: Using RSYNC for replication?

From
Andrew Sullivan
Date:
On Wed, Jan 29, 2003 at 10:29:17AM +0200, Denis A. Doroshenko wrote:
> What if there would be such a method: the database is "frozen", which
> means, all commited work is flushed to the database, ongoing changes
> are going to an alternative destination (kind of journal), while the
> original database becomes read-only with the "journal" on top of it
> (i.e. all inserts, updates and deletes made to journal are visible for
> the clients); later once the database is unfrozen, the jorunal is joined
> into the database (i.e. database is synched).
>
> Well, may be at least the above paragraph made you laughing. Doctors say

Actually, there is nothing to laugh about in the proposal.  Some
commercial systems do it more or less like that.

This is really a way of making point-in-time backups, and using them
as a basis for replication, too.  I think it may be a mistake to try
to overload point-in-time recovery and replication, however.
PostgreSQL stays remarkably close to the traditional UNIX approach of
small, specialised tools that can be stuck together to do something
larger.  I like this approach, because it tends to localise bugs: a
problem in one area doesn't cause you pain in another.

That said, there is work going on with point-in-time for 7.4, so I
suspect you'll be able to take this approach in the near-ish future.

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110


Re: Using RSYNC for replication?

From
Richard Welty
Date:
On Wed, 29 Jan 2003 08:37:25 -0500 Reid Thompson <Reid.Thompson@ateb.com> wrote:

> PostgreSQL Database Replication Options
> http://conferences.oreillynet.com/presentations/os2002/johnson_darren.pdf

that's a helpful link for initial research. i'm kind of past that point. of
the outlined solutions, pgreplicator is the only one that is relevant to my
situation, and i'm already quite sure i'm not going to use it.

DBMirror (as in the 7.3 contrib directory) is looking like the way to go.
the default config doesn't do what i need exactly, but it looks like i can
get there. i'll probably put up a howto webpage on my variant once i get it
going.

richard
--
Richard Welty                                         rwelty@averillpark.net
Averill Park Networking                                         518-573-7592
              Unix, Linux, IP Network Engineering, Security

Re: Using RSYNC for replication?

From
Justin Clift
Date:
Richard Welty wrote:
> On Wed, 29 Jan 2003 08:37:25 -0500 Reid Thompson <Reid.Thompson@ateb.com> wrote:
>
>
>>PostgreSQL Database Replication Options
>>http://conferences.oreillynet.com/presentations/os2002/johnson_darren.pdf
>
> that's a helpful link for initial research. i'm kind of past that point. of
> the outlined solutions, pgreplicator is the only one that is relevant to my
> situation, and i'm already quite sure i'm not going to use it.
>
> DBMirror (as in the 7.3 contrib directory) is looking like the way to go.
> the default config doesn't do what i need exactly, but it looks like i can
> get there. i'll probably put up a howto webpage on my variant once i get it
> going.

That would be very cool, and very welcome.  :)

If you'd like the page on the PostgreSQL Technical Documentation site,
we'd be glad to have it there, or at least link to if it's elsewhere.

Regards and best wishes,

Justin Clift


> richard
> --
> Richard Welty                                         rwelty@averillpark.net
> Averill Park Networking                                         518-573-7592
>               Unix, Linux, IP Network Engineering, Security

--
"My grandfather once told me that there are two kinds of people: those
who work and those who take the credit. He told me to try to be in the
first group; there was less competition there."
- Indira Gandhi


Re: Using RSYNC for replication?

From
Greg Stark
Date:
"Shridhar Daithankar<shridhar_daithankar@persistent.co.in>" <shridhar_daithankar@persistent.co.in> writes:

> That makes me wonder. How hard it is to create async replication based on WAL,
> assuming there is not one already.

What you describe below sounds like what Oracle calls a warm standby database.

It's not replication because you can't be doing separate operations in the
standby database and expect the logs from the original database to still apply
cleanly.

Even so it's an extremely useful configuration. It's allows instantaneous
fail-over to the standby database without having to restore from backups. And
Oracle lets you open the standby database read-only which is extremely useful
for doing large batch queries without slowing down your main OLTP database.

> Create a daemon that watches WAL dir. As soon as a new file is created, old
> file is copied at someplace else, which can also be dropped into WAL dir. of
> another installation and it will sync up.
>
> Assumming WAL naming notation/names can be tweaked by external program, is it
> correct idea of async notation? It should not be that hard to come up with
> such a daemon.

--
greg