database synchronization - Mailing list pgsql-general

From Chris Jones
Subject database synchronization
Date
Msg-id a5fg0lb5tpt.fsf@merry.mt.sri.com
Whole thread Raw
List pgsql-general
After browsing through the online archive of the thread on database
replication, I feel somewhat enlightened.  I'm wondering if anybody
has any bright suggestions to this problem, which is a subset of the
one discussed in the thread:

We are looking at deploying a database which will need to be
synchronized, but not replicated.  In other words, we want several
database servers to have copies of the data, with all except the
master copy being effectively read-only.  If the data on a particular
slave is a little bit out of date, it's no big loss.  Most clients
will read data, but a few will also make changes.

Here's one possible low-tech approach:

* A script will periodically dump the database on the master.  This
  happens inside a transaction, so we should get a consistent dump.

* In some kind of round-robin fashion, each slave will check that a
  sufficient number of other slaves are running, then bring itself
  down, fetch the dump, and reload it.

* A read-only client will grab a database server out of a pool of them
  and connect to it.

* A read-write client will only accept a connection to the master
  server.

The biggest problem I see with such a solution is that it doesn't
scale well:  As the database grows, so does the time to dump/restore
it.  Also, it's reasonable to assume that the number of slave servers
will grow at roughly the same rate.

An alternative solution would be more like this:

* On the master, create a sync_log table, which would contain:

  log_id serial not null primary key,
  cmd text not null

* On the master, create a set of triggers:  on insert/update/delete,
  insert an SQL statement that will perform that operation.

* On each slave, keep a record of the log_id of the most recent
  sync_log entry which was imported from the master.

* Periodically, each slave will connect to the master and grab all the
  new sync_log entries, and then execute them.

This scenario has two problems:  The first is writing a pl/pgsql
procedure that will do all the right quoting and logic to build that
SQL statement.  The second problem is that this table is going to get
large.  Realistically, though, I could keep track of the most recent
sync_log entry for each slave, and delete old log entries.

I've tried to come up with a more graceful, more space-efficient
method of doing this, but it turns out to be a fairly complicated
problem.  Has somebody else done the homework for this?

Chris

--
----------------------------------------------------- chris@mt.sri.com
Chris Jones                                    SRI International, Inc.

pgsql-general by date:

Previous
From: Marc SCHAEFER
Date:
Subject: Re: Increasing Table Column Size in 7.0 Syntax
Next
From:
Date:
Subject: Is cycle references?