Re: Exposing the Xact commit order to the user - Mailing list pgsql-hackers

From Dan Ports
Subject Re: Exposing the Xact commit order to the user
Date
Msg-id 20100524191021.GA53044@csail.mit.edu
Whole thread Raw
In response to Exposing the Xact commit order to the user  (Jan Wieck <JanWieck@Yahoo.com>)
Responses Re: Exposing the Xact commit order to the user
List pgsql-hackers
On Sun, May 23, 2010 at 04:21:58PM -0400, Jan Wieck wrote:
> In some systems (data warehousing, replication), the order of commits is
> important, since that is the order in which changes have become visible.
> This information could theoretically be extracted from the WAL, but
> scanning the entire WAL just to extract this tidbit of information would
> be excruciatingly painful.

This is very interesting to me as I've been doing some (research --
nowhere near production-level) work on building a transactional
application-level (i.e. memcached-like) cache atop Postgres. One of the
features I needed to support it was basically what you describe.

Without getting too far into the details of what I'm doing, I needed to
make it clear to a higher layer which commits were visible to a given
query. That is, I wanted to know both the order of commits and where
particular snapshots fit into this ordering. (A SnapshotData struct
obviously contains the visibility information, but a representation in
terms of the commit ordering is both more succinct and allows for easy
ordering comparisons).

Something you might want to consider, then, is adding an interface to
find out the timestamp of the current transaction's snapshot, i.e. the
timestamp of the most recent committed transaction visible to it. I
wouldn't expect this to be difficult to implement as transaction
completion/visibility is already synchronized via ProcArrayLock.

> Each record of the Transaction Commit Info consists of
> 
>       txid          xci_transaction_id
>       timestamptz   xci_begin_timestamp
>       timestamptz   xci_commit_timestamp
>       int64         xci_total_rowcount

Another piece of information that seems useful to provide here would be
the logical timestamp of the transaction, i.e. a counter that's
incremented by one for each transaction. But maybe that's implicit in
the log ordering?

I'm not clear on why the total rowcount is useful, but perhaps I'm
missing something obvious.


I've actually implemented some semblance of this on Postgres 8.2, but it
sounds like what you're interested in is more sophisticated. In
particular, I wasn't at all concerned with durability or WAL stuff, and
I had some specific requirements about when it was OK to purge the
data. Because of this (and very limited development time), I just
threw something together with a simple shared buffer.

I don't think I have any useful code to offer, but let me know if
there's some way I can help out.

Dan

-- 
Dan R. K. Ports              MIT CSAIL                http://drkp.net/


pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Exposing the Xact commit order to the user
Next
From: Robert Haas
Date:
Subject: Re: pg_upgrade docs