Intro
-----
The following patch exports 8-byte txids and snapshots to user level,
allowing their use in regular SQL. It is based on the Slony-I xxid
module. It provides a special 'snapshot' type for snapshots but
uses regular int8 for transaction IDs.
Exported API
------------
Type: snapshot
Functions:
current_txid() returns int8
current_snapshot() returns snapshot
snapshot_xmin(snapshot) returns int8
snapshot_xmax(snapshot) returns int8
snapshot_active_list(snapshot) returns setof int8
snapshot_contains(snapshot, int8) returns bool
pg_sync_txid(int8) returns int8
Operation
---------
Extension to 8 bytes is done by keeping track of the wraparound count
(the epoch) in pg_control. On every checkpoint, nextxid is compared to
the one stored in pg_control. If the new value is smaller, a wraparound
has happened and the epoch is increased.
When a long txid or snapshot is requested, pg_control is locked with
LW_SHARED to retrieve the epoch value from it. The patch does not
affect core functionality in any other way.
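
The bookkeeping described above can be sketched roughly as follows. This
is a minimal illustration, not the patch's code: the struct and function
names are hypothetical, and combining epoch and xid by shifting the
epoch into the high 32 bits is an assumption. It also ignores the case
where the xid counter has wrapped since the last checkpoint.

```c
#include <stdint.h>

/* Hypothetical stand-in for the fields the patch keeps in pg_control. */
typedef struct {
    uint32_t epoch;     /* wraparound count */
    uint32_t last_xid;  /* nextxid as recorded at the previous checkpoint */
} txid_epoch_state;

/* Checkpoint step: a smaller nextxid than last time means a wraparound. */
static void checkpoint_update(txid_epoch_state *st, uint32_t next_xid)
{
    if (next_xid < st->last_xid)
        st->epoch++;
    st->last_xid = next_xid;
}

/* Assumed layout of the 8-byte txid: epoch in the high half, xid in the low. */
static uint64_t make_txid(const txid_epoch_state *st, uint32_t xid)
{
    return ((uint64_t) st->epoch << 32) | xid;
}
```

In the patch itself the read of the epoch would happen under the
LW_SHARED lock on pg_control; locking is left out of the sketch.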
Backup/restore of txid data
---------------------------
Currently I made pg_dumpall output the following statement:
"SELECT pg_sync_txid(%d)", current_txid()
Then, on the target database, pg_sync_txid checks whether its current
(epoch + GetTopTransactionId()) is larger than the given argument.
If not, it bumps the epoch until it is, thus guaranteeing that
newly issued txids are larger than in the source database. If restored
into the same database instance, nothing will happen.
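
The epoch bump can be sketched like this. It is only an illustration
under assumptions: sync_epoch and join_txid are hypothetical names, the
8-byte value is assumed to be the epoch in the high 32 bits plus the
xid in the low 32 bits, and returning the resulting local txid is a
guess at what pg_sync_txid returns.

```c
#include <stdint.h>

/* Assumed layout: epoch in the high 32 bits, xid in the low 32 bits. */
static uint64_t join_txid(uint32_t epoch, uint32_t xid)
{
    return ((uint64_t) epoch << 32) | xid;
}

/*
 * Bump *epoch until the local 8-byte txid is strictly larger than the
 * value taken from the source database. If it already is larger, the
 * epoch is left untouched.
 */
static uint64_t sync_epoch(uint32_t *epoch, uint32_t xid, uint64_t source_txid)
{
    uint64_t cur = join_txid(*epoch, xid);

    while (cur <= source_txid) {
        (*epoch)++;
        cur = join_txid(*epoch, xid);
    }
    return cur;
}
```

Because the loop only ever increases the epoch, calling it with a value
that came from the same instance is a no-op.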
Advantages of 8-byte txids
--------------------------
* Indexes won't break silently. No need for a mandatory periodic
truncate, which may not happen for various reasons.
* Allows keeping values from different databases in one table/index.
* Ability to bring data onto a different server and continue there.
Advantages of being in core
---------------------------
* Core code can guarantee that the wraparound check happens at least
once every 2G transactions.
* Core code can update pg_control non-transactionally. A module
needs to operate inside a user transaction when updating the epoch
row, which brings various problems (READ COMMITTED vs. SERIALIZABLE,
long transactions, locking, etc).
* Core code has only one place that it needs to update; a module
needs to have an epoch table in each database.
Todo, tothink
-------------
* Flesh out the documentation. Probably needs some background.
* Better names for some functions?
* pg_sync_txid allows use of pg_dump for moving a database,
but also adds the possibility to shoot yourself in the foot by allowing
epoch wraparound to happen. Is "Don't do it then" enough?
* Currently txid keeps its own copy of nextxid in pg_control;
this makes data dependencies clear. It's possible to drop it
and use ->checkPointCopy->nextXid directly, thus saving 4 bytes.
* Should pg_sync_txid() be issued by pg_dump instead of pg_dumpall?
--
marko