Synchronization Toolkit - Mailing list pgsql-general

From rob
Subject Synchronization Toolkit
Msg-id 006501c05447$fb9aa0c0$4100fd0a@cabrion.org
List pgsql-general

Not to be confused with replication, my concept of synchronization is to
manage changes between a server table (or tables) and one or more mobile,
disconnected databases (e.g. PalmPilot, laptop, etc.).

I read through the notes in the TODO for this topic and devised a toolkit
for doing synchronization.  I hope that the PostgreSQL development community
will find this useful and will help me refine this concept by offering
insight, experience and some good old-fashioned hacking if you are so
inclined.

The bottom of this message describes how to use the attached files.

I look forward to your feedback.

--rob


Methodology:

I devised a concept that I call "session versioning".  This means that every
time a row changes it does NOT get a new version of its own.  Rather, it gets
stamped with the current session version, which is common to all published
tables.  Clients, when they connect for synchronization, immediately
increment this common version number, reserve the result as a "post
version", and then increment the session version again.  This version
number, implemented as a sequence, is common to all synchronized tables and
rows.

Any time the server makes changes, the affected row gets stamped with the
current session version; when the client posts its changes it uses the
reserved "post version".  The client makes all of its changes stamping the
changed rows with its reserved "post version" rather than the current
version; the reason why is explained later.  It is important that the client
post all of its own changes first so that it does not end up receiving
records which changed since its last session but that it is about to update
anyway.
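
To make the stamping concrete, here is a rough sketch of how a server-side
stamping trigger could be written (installed via Perl/DBI, the way SynKit.pm
installs its own).  The table, column and sequence names below are
placeholders, and the real triggers must also let the Synchronization API
stamp client posts with the reserved "post version" instead, which this
sketch glosses over:

    use DBI;

    my $dbh = DBI->connect('dbi:Pg:dbname=synctest', 'postgres', '',
                           { RaiseError => 1, AutoCommit => 1 });

    $dbh->do(<<'SQL');
    CREATE OR REPLACE FUNCTION stamp_session_version() RETURNS trigger AS $$
    BEGIN
        -- Stamp the row with the current session version: the latest value
        -- allocated from the common sequence.
        NEW.____row_version____ :=
            (SELECT last_value FROM ____session_version____);
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;
    SQL

    $dbh->do(<<'SQL');
    CREATE TRIGGER contacts_stamp_version
        BEFORE INSERT OR UPDATE ON contacts
        FOR EACH ROW EXECUTE PROCEDURE stamp_session_version();
    SQL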

Reserving the post version is a two-step process.  First, the number is
simply stored in a variable for later use.  Second, the value is added to a
lock table (last_stable) to indicate to any concurrent sessions that rows
with higher version numbers are to be considered "unstable" at the moment
and should not be retrieved.  Each client, upon connection, will use the
lowest value in this lock table (call it "last_stable_version") to determine
the upper boundary for the versions it should retrieve.  The lower boundary
is the "max_version" recorded at the end of its previous session, plus one.
Thus, when the client retrieves changes it uses the following SQL "where"
expression:

    WHERE row_version > max_version
      AND row_version <= last_stable_version
      AND row_version <> this_post_version

The point of reserving and locking a post version is that it allows
concurrent synchronization by multiple clients.  The first of many clients
to connect essentially dictates to all future clients that they must not
take any rows with versions equal to or greater than the one it has just
reserved and locked.  The reason the session version is incremented a second
time is so that the server may continue to post changes concurrently with
any client changes and be certain that these concurrent server changes will
not taint rows the client is about to retrieve.  Once the client is finished
with its session, it removes the lock on its post version.
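
Putting the above together, a client session looks roughly like this in DBI
terms.  The identifiers (____session_version____, ____last_stable____,
____subscribed____, contacts) are illustrative only; the actual names and
bookkeeping live inside SynKit.pm:

    use DBI;

    my ($user, $node) = ('alice', 'laptop1');      # hypothetical client identity
    my $dbh = DBI->connect('dbi:Pg:dbname=synctest', 'postgres', '',
                           { RaiseError => 1, AutoCommit => 1 });

    # 1. Reserve a post version: bump the common sequence and keep the result.
    my ($post_version) =
        $dbh->selectrow_array(q{SELECT nextval('____session_version____')});

    # 2. Lock it so concurrent sessions treat this version and anything
    #    above it as unstable.
    $dbh->do(q{INSERT INTO ____last_stable____ (version) VALUES (?)},
             undef, $post_version);

    # 3. Bump the sequence a second time so concurrent server changes land
    #    above the range this client is about to read.
    $dbh->do(q{SELECT nextval('____session_version____')});

    # 4. Upper boundary: the lowest version currently locked by any session.
    my ($last_stable_version) =
        $dbh->selectrow_array(q{SELECT min(version) FROM ____last_stable____});

    # 5. Lower boundary: the "max_version" recorded for this client last time.
    my ($max_version) = $dbh->selectrow_array(
        q{SELECT last_version FROM ____subscribed____
           WHERE username = ? AND node = ?}, undef, $user, $node);

    # 6. The client posts its own changes first (stamped with $post_version),
    #    then retrieves everything that changed in between.
    my $sth = $dbh->prepare(q{
        SELECT * FROM contacts
         WHERE ____row_version____ >  ?
           AND ____row_version____ <= ?
           AND ____row_version____ <> ?
    });
    $sth->execute($max_version, $last_stable_version, $post_version);

    # 7. When the session ends, release the lock on the post version.
    $dbh->do(q{DELETE FROM ____last_stable____ WHERE version = ?},
             undef, $post_version);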

Partitioning data for use by each node is the next challenge we face.  How
can we control which "slice" of data each client receives?  A slice can be
horizontal or vertical within a table.  Horizontal slices are easy: they are
just the where clause of an SQL statement that says "give me the rows that
match X criteria".  We handle this by storing a where clause and appending
it to each client's retrieval statement, in addition to the where clause
described above.  Actually, two where clauses are stored and appended: one
per client and one per publication (table).
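
For example, the final retrieval statement might be composed along these
lines (the column list and the two stored clauses are made-up values):

    # Hypothetical composition of the retrieval statement: the version
    # filter plus the stored per-publication and per-client horizontal slices.
    my @subscribed_cols         = ('row_id', 'name', 'address');
    my $publication_whereclause = "region = 'EU'";     # stored per publication
    my $client_whereclause      = "owner = 'alice'";   # stored per client

    my @where = (
        '____row_version____ >  ?',
        '____row_version____ <= ?',
        '____row_version____ <> ?',
    );
    push @where, $publication_whereclause if $publication_whereclause;
    push @where, $client_whereclause      if $client_whereclause;

    my $sql = 'SELECT ' . join(', ', @subscribed_cols)
            . ' FROM contacts WHERE ' . join(' AND ', @where);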

We define horizontal slices by filtering rows; vertical slices are limited
by column.  The toolkit does provide a mechanism for pseudo vertical
partitioning.  When a client is "subscribed" to a publication, the toolkit
stores which columns that node is to receive during a session.  These are
stored in the subscribed_cols table.  While this does limit the number of
columns transmitted, the insert/update/delete triggers do not recognize
changes on a per-column basis.  The "pseudo" nature of our vertical
partitioning is evident by example:

Say you have a table with name, address and phone number as columns.  You
restrict a client to see only name and address.  This means that phone
number information will not be sent to the client during synchronization,
and the client can't attempt to alter the phone number of a given entry.
Great, but . . . if, on the server, the phone number (but not the name or
address) is changed, the entire row gets marked with a new version.  This
means that the name and address will get sent to the client even though they
didn't change.

Well, there's the flaw in vertical partitioning.  Other than wasting
bandwidth, the extra row does no harm to the process.  The workaround for
this is to highly normalize your schema when possible.

Collisions are the next crux one encounters with synchronization.  When two
clients retrieve the same row and both make (different) changes, which one
is correct?  So far the system operates totally independently of time.  This
is good because it doesn't rely on the server or client to keep accurate
time.  We could just ignore time altogether, but then we would force our
clients to synchronize on a strict schedule in order to avoid (or reduce)
collisions.  If every node synchronized immediately after making changes we
could just stop here.  Unfortunately this isn't reality.  Reality dictates
something like this: clients A and B each pick up the same record on Monday.
A makes changes on Monday, then leaves for vacation.  B makes changes on
Wednesday because new information was gathered in A's absence, and posts
those changes on Wednesday.  Meanwhile, client A returns from vacation on
Friday and synchronizes his changes.  A overwrites B's changes even though A
made his changes before the most recent information was posted by B.

It is clear that we need some form of timestamp to cope with the above
example.  While clocks aren't the most reliable, they are the only common
version control available to solve this problem.  The system is set up to
accept (but not require) timestamps from clients, and changes on the server
are timestamped.  The system, when presented with a timestamp on a row, will
compare the client and server timestamps to figure out who wins a tie.  The
system makes certain "sanity" checks with regard to these timestamps.  A
client may not attempt to post a change with a timestamp that is more than
one hour in the future (according to what the server thinks "now" is) nor
more than one hour before its last synchronization date/time.  The client
row will be immediately placed into the collision table if the timestamp is
that far out of whack.  Implementations of the toolkit should take care to
ensure that client and server agree on what "now" is before attempting to
submit changes with timestamps.

Timestamps are not required.  Should a client be incapable of tracking
timestamps, the system will assume that any server row which has been
changed since the client's last session wins a tie.  This is quite error
prone, so timestamps are encouraged wherever possible.
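
In rough pseudo-code, the tie-breaking and sanity checks described above
amount to something like the following (the function name and the
epoch-seconds representation are illustrative):

    # Decide what to do with a client change that collides with a server
    # change.  Returns 'client_wins', 'server_wins', or 'collision'
    # (meaning: put the client row in the collision table).
    sub resolve_collision {
        my (%arg) = @_;
        my $now       = $arg{server_now};        # server's idea of "now"
        my $last_sync = $arg{client_last_sync};  # client's previous session
        my $client_ts = $arg{client_ts};         # may be undef: no client clock
        my $server_ts = $arg{server_ts};         # when the server row changed

        # Without a client timestamp, a server row changed since the
        # client's last session wins the tie.
        return 'server_wins' unless defined $client_ts;

        # Sanity checks: reject timestamps more than an hour in the future
        # or more than an hour before the client's last synchronization.
        return 'collision' if $client_ts > $now + 3600;
        return 'collision' if $client_ts < $last_sync - 3600;

        # Otherwise the most recent change wins.
        return $client_ts > $server_ts ? 'client_wins' : 'server_wins';
    }

    # Example: A's change is from Monday, B posted on Wednesday, so B wins.
    print resolve_collision(
        server_now       => time(),
        client_last_sync => time() - 5 * 86400,
        client_ts        => time() - 4 * 86400,   # Monday
        server_ts        => time() - 2 * 86400,   # Wednesday
    ), "\n";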

Inserts pose an interesting challenge.  Since multiple clients cannot share
a sequence (often used as a primary key) while disconnected, they are
responsible for generating their own unique "row_id" when inserting records.
Inserts accept any arbitrary key, and the server writes back to the client a
special kind of update that gives the server's row_id.  The client is
responsible for making sure that this update takes place locally.
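
Sketched on the server side, the exchange might look like this (table,
column and sequence names are placeholders):

    use DBI;
    my $dbh = DBI->connect('dbi:Pg:dbname=synctest', 'postgres', '',
                           { RaiseError => 1, AutoCommit => 1 });

    # The client sent an insert identified only by its own arbitrary key.
    my %client_row = (
        client_row_id => 'laptop1-000042',   # client-generated key
        name          => 'Jane Doe',
        address       => '1 Main St',
    );

    # Allocate the server's own row_id and store the row under it.
    my ($server_row_id) =
        $dbh->selectrow_array(q{SELECT nextval('contacts_row_id_seq')});

    $dbh->do(q{INSERT INTO contacts (row_id, name, address) VALUES (?, ?, ?)},
             undef, $server_row_id, $client_row{name}, $client_row{address});

    # Queue the special update that tells the client to replace its
    # temporary key with the server's row_id; the client applies it locally.
    my %key_update = (
        type          => 'row_id_update',
        client_row_id => $client_row{client_row_id},
        server_row_id => $server_row_id,
    );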

Deletes are the last portion of the process.  When deletes occur, the
row_id, version, etc. are stored in a "deleted" table.  These entries are
retrieved by the client using the same version filter as described above.
The table is pruned at the end of each session by deleting all records
whose versions are less than the lowest 'last_version' stored across all
clients.
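
The prune step itself can be as simple as (again, placeholder names):

    use DBI;
    my $dbh = DBI->connect('dbi:Pg:dbname=synctest', 'postgres', '',
                           { RaiseError => 1, AutoCommit => 1 });

    # Drop tombstones that every subscribed client has already retrieved.
    $dbh->do(q{
        DELETE FROM ____deleted____
         WHERE version < (SELECT min(last_version) FROM ____subscribed____)
    });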

Having wrapped up the synchronization process, I'll move on to describe some
points about managing clients, publications and the like.

The toolkit is split into two objects: SyncManagement and Synchronization.
The Synchronization object exposes an API that client implementations use to
communicate and receive changes.  The management functions handle system
install and uninstall in addition to publication of tables and client
subscriptions.

Installation and uninstallation are handled by their corresponding functions
in the API.  All system tables are prefixed and suffixed with four
underscores, in the hope that this avoids conflicts with any existing
tables.  Calling the install function more than once will generate an error
message.  Uninstall will remove all related tables, sequences, functions and
triggers from the system.

The first step, after installing the system, is to publish a table.  A table
can be published more than once under different names; simply provide a
unique name as the second argument to the publish function.  Since object
names are restricted to 32 characters in Postgres, each table is given a
unique id and this id is used to create the trigger and sequence names.
Since one table can be published multiple times but only needs one set of
triggers and one sequence for change management, a reference count is kept
so that we know when to add/drop triggers and functions.  By default, all
columns are published, but the third argument to the publish function
accepts an array reference of column names that allows you to specify a
limited set.  Information about the table is stored in the "tables" table,
info about the publication is in the "publications" table, and column names
are stored in the "subscribed_cols" table.
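
Pulling the management calls together, usage might look something like the
sketch below.  The constructor is a guess (see the attached sample scripts
for the real thing); the publish arguments and the undef-on-success error
convention follow the description in this message:

    use SynKit;   # the attached module

    # Assumed constructor; the real SynKit.pm may take its DBI parameters
    # differently, and the package name may not match exactly.
    my $mgr = SyncManagement->new(
        dsn      => 'dbi:Pg:dbname=synctest',
        user     => 'postgres',
        password => '',
    );

    # Install the system tables, sequences, functions and triggers (once only).
    my $err = $mgr->install();
    die "install failed: $err" if defined $err;

    # Publish a table under a unique publication name, limited to two columns.
    $err = $mgr->publish('contacts', 'contacts_pub', ['name', 'address']);
    die "publish failed: $err" if defined $err;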

The next step is to subscribe a client to a table.  A client is identified
by a user name and a node name.  The subscribe function takes three
arguments: user, node & publication.  The subscription process writes an
entry into the "subscribed" table with default values.  Of note, the
"RefreshOnce" attribute is set to true whenever a table is published.  This
indicates to the system that a full table refresh should be sent the next
time the client connects even if the client requests synchronization rather
than refresh.
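
Continuing the same sketch, subscribing a client might look like:

    # Subscribe user "alice" on node "laptop1" to the publication created
    # above (argument order per the description: user, node, publication).
    $err = $mgr->subscribe('alice', 'laptop1', 'contacts_pub');
    die "subscribe failed: $err" if defined $err;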

The toolkit does not yet provide a way to manage the whereclause stored at
either the publication or client level.  To use or test this feature, you
will need to set the whereclause attributes manually.

Tables and users can be unpublished and unsubscribed using the corresponding
functions within the toolkit's management interface.  Because PostgreSQL
lacks an "ALTER TABLE DROP COLUMN" command, the unpublish function only
removes default values and indexes for those columns.

The API isn't the most robust thing in the world right now.  All functions
return undef on success and an error string otherwise (like DBD).  I hope to
clean up the API considerably over the next month.  The code has not been
field tested at this time.


The files attached are:

1) SynKit.pm (a Perl module that contains install/uninstall functions and a
simple API for synchronization & management)

2) sync_install.pl (Sample code to demonstrate the installation, publishing
and subscribe process)

3) sync_uninstall.pl (Sample code to demonstrate the uninstallation,
unpublishing and unsubscribe process)


To use them on Linux (don't know about Win32 but should work fine):

 - set up a test database and make SURE plpgsql is installed

 - install perl 5.05 along with the Date::Parse (TimeDate-1.1), DBI and
DBD::Pg modules [www.cpan.org]

 - copy all three attached files to a test directory

 - cd to your test directory

 - edit all three files and change the three DBI variables to suit your
system (they are clearly marked)

 - % perl sync_install.pl

 - check out the tables, functions & triggers installed

 - % perl sync.pl

 - check out the 'sync_test' table, do some updates/inserts/deletes and run
sync.pl again
        NOTE: Sanity checks default to allow no more than 50% of the table
        to be changed by the client in a single session.  If you delete all
        (or most of) the rows you will get errors when you run sync.pl
        again! (by design)

 - % perl sync_uninstall.pl  (when you are done)

 - check out  the sample scripts and the perl module code (commented, but
not documented)





