Thread: Syncing an application cache with xmin

Syncing an application cache with xmin

From
Jason Dusek
Date:
Hello List,

Imagine a table of records describing the up-to-date state of
some objects:

   id  |      t      | data columns ...
 ------+-------------+------------------
  uuid | timestamptz | ...

The `id' column is a PRIMARY KEY. When an object is updated, the
old record is archived and a new record, with `t' set to
transaction_timestamp(), is added. So the table really contains
just the up-to-date state of objects.

An application would like to cache this up to date state,
synchronizing at regular intervals. The application might not
maintain a persistent connection to the database, so
LISTEN/NOTIFY is not to be preferred. What are some algorithms
by which one can retrieve new rows since the last sync?

Since the sync is made at regular intervals -- let's call it
once a minute -- one could query for all records with `t' that
is less than a minute old. For safety's sake, one can query for
*two* minutes of data.

This would seem to solve the problem. It works pretty well.
However, a new challenger appears: imagine an inserting
transaction that runs for five minutes. None of rows are visible
and then they are added -- with a transaction_timestamp that is
five minutes in the past! These rows are never synced.

Now you could argue that writes shouldn't occur in such long
transactions and you would be right; but it happens that one-off
tools exhibit bad behaviour one wouldn't accept in a production
application, and fixing the tools can be hard to make happen.

I was reading today about the `xmin' column, `txid_current()`
and `pg_export_snapshot()` and I wonder if there is not a better
way to sync, using transaction IDs instead of time.

  http://www.postgresql.org/docs/9.2/static/ddl-system-columns.html
  http://www.postgresql.org/docs/9.2/static/functions-info.html#FUNCTIONS-TXID-SNAPSHOT
  http://www.postgresql.org/docs/9.2/static/functions-admin.html#FUNCTIONS-SNAPSHOT-SYNCHRONIZATION

The idea would be, to store information about the last XID in
the last sync and search for XIDs committed since then upon
reconnecting for sync. Perhaps `txid_current_snapshot()'
preserves enough information. Is this a plausible technique?
Would it be a misuse of XIDs?

--
Jason Dusek
pgp // solidsnack // C1EBC57DC55144F35460C8DF1FD4C6C1FED18A2B

Re: Syncing an application cache with xmin

From
Tom Lane
Date:
Jason Dusek <jason.dusek@gmail.com> writes:
> The idea would be, to store information about the last XID in
> the last sync and search for XIDs committed since then upon
> reconnecting for sync. Perhaps `txid_current_snapshot()'
> preserves enough information. Is this a plausible technique?

Perfectly plausible, and often done in one guise or another.  You can't
expect row XIDs to survive forever --- they'll be replaced by FrozenXID
after awhile to avoid problems due to transaction counter wraparound.
But for delays of a few minutes, in a database with an unremarkable
transaction rate, that's not an issue.

            regards, tom lane

Re: Syncing an application cache with xmin

From
Jason Dusek
Date:
2013/2/3 Tom Lane <tgl@sss.pgh.pa.us>:
> Jason Dusek <jason.dusek@gmail.com> writes:
>> The idea would be, to store information about the last XID in
>> the last sync and search for XIDs committed since then upon
>> reconnecting for sync. Perhaps `txid_current_snapshot()'
>> preserves enough information. Is this a plausible technique?
>
> Perfectly plausible, and often done in one guise or another.
> You can't expect row XIDs to survive forever --- they'll be
> replaced by FrozenXID after awhile to avoid problems due to
> transaction counter wraparound.  But for delays of a few
> minutes, in a database with an unremarkable transaction rate,
> that's not an issue.

What is the relationship of the epoch-extended XID returned by
`txid_current()' to the XIDs in rows? Do all rows from a
previous epoch always have the FrozenXID?

--
Jason Dusek
pgp // solidsnack // C1EBC57DC55144F35460C8DF1FD4C6C1FED18A2B