Thread: [PATCH] Provide 8-byte transaction IDs to user level

[PATCH] Provide 8-byte transaction IDs to user level

From
Marko Kreen
Date:
Intro
-----

The following patch exports 8-byte txids and snapshots to user level,
allowing their use in regular SQL.  It is based on the Slony-I xxid
module.  It provides a special 'snapshot' type for snapshots but
uses regular int8 for transaction IDs.

Exported API
------------

Type: snapshot

Functions:

  current_txid()                     returns int8
  current_snapshot()                 returns snapshot
  snapshot_xmin(snapshot)            returns int8
  snapshot_xmax(snapshot)            returns int8
  snapshot_active_list(snapshot)     returns setof int8
  snapshot_contains(snapshot, int8)  returns bool
  pg_sync_txid(int8)                 returns int8

Operation
---------

Extension to 8 bytes is done by keeping track of the wraparound count
in pg_control.  On every checkpoint, nextxid is compared to the one
stored in pg_control.  If the new value is smaller, a wraparound has
happened and the epoch is increased.

When a long txid or snapshot is requested, pg_control is locked with
LW_SHARED to retrieve the epoch value from it.  The patch does not
affect core functionality in any other way.

Backup/restore of txid data
---------------------------

Currently I made pg_dumpall output the following statement:

  "SELECT pg_sync_txid(%d)", current_txid()

Then, on the target database, pg_sync_txid checks whether its current
(epoch + GetTopTransactionId()) is larger than the given argument.
If not, it bumps the epoch until it is, thus guaranteeing that
newly issued txids are larger than in the source database.  If restored
into the same database instance, nothing will happen.
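
A minimal sketch of that bumping rule, under stated assumptions: sync_epoch()
is a hypothetical pure function, not the patch's pg_sync_txid(), the epoch is
assumed to be the upper 32 bits of the txid, and unsigned 64-bit values are
used for simplicity where the SQL interface uses int8.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Given the local epoch, the local 32-bit XID and the txid recorded in the
 * dump, return the epoch the target database should continue with.
 */
static uint32_t
sync_epoch(uint32_t epoch, uint32_t local_xid, uint64_t dumped_txid)
{
    /* Bump until the local 8-byte value is larger than the dumped one. */
    while ((((uint64_t) epoch << 32) | local_xid) <= dumped_txid)
    {
        /* The epoch itself must not wrap; this is the new failure mode. */
        assert(epoch != UINT32_MAX);
        epoch++;
    }
    return epoch;
}
```

When the dump is restored into the instance it came from, the local value is
already larger, the loop body never runs, and the epoch is returned unchanged.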


Advantages of 8-byte txids
--------------------------

* Indexes won't break silently.  No need for mandatory periodic
  truncate which may not happen for various reasons.
* Allows keeping values from different databases in one table/index.
* Ability to bring data to a different server and continue there.

Advantages in being in core
---------------------------

* Core code can guarantee that the wraparound check happens within 2G
  transactions.
* Core code can update pg_control non-transactionally.  A module
  needs to operate inside a user transaction when updating the epoch
  row, which brings various problems (READ COMMITTED vs. SERIALIZABLE,
  long transactions, locking, etc).
* Core code has only one place that it needs to update; a module
  needs to have an epoch table in each database.

Todo, tothink
-------------

* Flesh out the documentation.  Probably needs some background.
* Better names for some functions?
* pg_sync_txid allows use of pg_dump for moving a database,
  but also adds the possibility of shooting yourself in the foot by
  allowing epoch wraparound to happen.  Is "Don't do it then" enough?
* Currently txid keeps its own copy of nextxid in pg_control;
  this makes the data dependencies clear.  It's possible to drop it
  and use ->checkPointCopy->nextXid directly, thus saving 4 bytes.
* Should pg_sync_txid() be issued by pg_dump instead of pg_dumpall?

--
marko


Attachment

Re: [PATCH] Provide 8-byte transaction IDs to user level

From
Bruce Momjian
Date:
I am sure you worked hard on this, but I don't see the use case, nor
have I heard people in the community requesting such functionality.
Perhaps pgfoundry would be a better place for this.

--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: [HACKERS] [PATCH] Provide 8-byte transaction IDs to user level

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> I am sure you worked hard on this, but I don't see the use case, nor
> have I heard people in the community requesting such functionality.
> Perhaps pgfoundry would be a better place for this.

The part of this that would actually be useful to put in core is
maintaining a 64-bit XID counter, ie, keep an additional counter that
bumps every time XID wraps around.  This cannot be done very well from
outside core but it would be nearly trivial, and nearly free, to add
inside.  Everything else in the patch could be done just as well as an
extension datatype.

(I wouldn't do it like this though --- TransactionIdAdvance itself is
the place to bump the secondary counter.)

The question though is if we did that, would Slony actually use it?

            regards, tom lane

Re: [HACKERS] [PATCH] Provide 8-byte transaction IDs to user level

From
Darcy Buskermolen
Date:
On Wednesday 26 July 2006 13:04, Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > I am sure you worked hard on this, but I don't see the use case, nor
> > have I heard people in the community requesting such functionality.
> > Perhaps pgfoundry would be a better place for this.
>
> The part of this that would actually be useful to put in core is
> maintaining a 64-bit XID counter, ie, keep an additional counter that
> bumps every time XID wraps around.  This cannot be done very well from
> outside core but it would be nearly trivial, and nearly free, to add
> inside.  Everything else in the patch could be done just as well as an
> extension datatype.
>
> (I wouldn't do it like this though --- TransactionIdAdvance itself is
> the place to bump the secondary counter.)
>
> The question though is if we did that, would Slony actually use it?

If it made sense to do it, then yes we would do it. The problem ends up being
Slony is designed to work across a multitude of versions of PG, and unless
this was backported to at least 7.4, it would take a while (ie when we
stopped supporting versions older than it was ported into)  before we would
make use of it.

--
Darcy Buskermolen
CommandPrompt, Inc.
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
http://www.commandprompt.com


Re: [HACKERS] [PATCH] Provide 8-byte transaction IDs to user level

From
Tom Lane
Date:
Darcy Buskermolen <darcyb@commandprompt.com> writes:
>> The question though is if we did that, would Slony actually use it?

> If it made sense to do it, then yes we would do it. The problem ends up being
> Slony is designed to work across a multitude of versions of PG, and unless
> this was backported to at least 7.4, it would take a while (ie when we
> stopped supporting versions older than it was ported into)  before we would
> make use of it.

[ shrug... ]  That's not happening; for one thing the change requires a
layout change in pg_control and we have no mechanism to do that without
initdb.

            regards, tom lane

Re: [HACKERS] [PATCH] Provide 8-byte transaction IDs to

From
Hannu Krosing
Date:
On a fine day, Wed, 2006-07-26 at 13:41, Darcy Buskermolen wrote:
> On Wednesday 26 July 2006 13:04, Tom Lane wrote:
> > Bruce Momjian <bruce@momjian.us> writes:
> > > I am sure you worked hard on this, but I don't see the use case, nor
> > > have I heard people in the community requesting such functionality.
> > > Perhaps pgfoundry would be a better place for this.
> >
> > The part of this that would actually be useful to put in core is
> > maintaining a 64-bit XID counter, ie, keep an additional counter that
> > bumps every time XID wraps around.  This cannot be done very well from
> > outside core but it would be nearly trivial, and nearly free, to add
> > inside.  Everything else in the patch could be done just as well as an
> > extension datatype.
> >
> > (I wouldn't do it like this though --- TransactionIdAdvance itself is
> > the place to bump the secondary counter.)
> >
> > The question though is if we did that, would Slony actually use it?

It seems that Slony people still hope to circumvent the known brokenness
of xxid btree indexes by dropping and creating them often enough and/or
trying other workarounds.

> If it made sense to do it, then yes we would do it. The problem ends up being
> Slony is designed to work across a multitude of versions of PG, and unless
> this was backported to at least 7.4, it would take a while (ie when we
> stopped supporting versions older than it was ported into)  before we would
> make use of it.

We already have an external implementation, which requires a function
call to be executed at an interval of a few hundred million
transactions to bump the upper int4 when needed.

It would probably be easy to backport it to any version of postgres
which is supported by slony.

Being in core just makes the overflow accounting part more robust.

The function to retrieve the 8-byte trx id will look exactly the same
from userland in both cases.

--
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com


Re: [HACKERS] [PATCH] Provide 8-byte transaction IDs to

From
Hannu Krosing
Date:
On a fine day, Wed, 2006-07-26 at 13:35, Bruce Momjian wrote:
> I am sure you worked hard on this, but I don't see the use case,

The use case is any slony-like replication system or queueing system
which needs a consistent means of identifying batches of transactions
which have finished during some period.

You can think of this as a core component for building slony that does
*not* break at 2G trx.

> nor
> have I heard people in the community requesting such functionality.

You will, once more Slony users reach the 2 billion trx limit and start
silently losing data, and find out a few weeks later.

> Perhaps pgfoundry would be a better place for this.

At least the part that manages epoch should be in core.

The rest can actually be on pgfoundry as a separate project, or inside
skytools/pgQ.


Re: [HACKERS] [PATCH] Provide 8-byte transaction IDs to

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > I am sure you worked hard on this, but I don't see the use case, nor
> > have I heard people in the community requesting such functionality.
> > Perhaps pgfoundry would be a better place for this.
>
> The part of this that would actually be useful to put in core is
> maintaining a 64-bit XID counter, ie, keep an additional counter that
> bumps every time XID wraps around.  This cannot be done very well from
> outside core but it would be nearly trivial, and nearly free, to add
> inside.  Everything else in the patch could be done just as well as an
> extension datatype.
>
> (I wouldn't do it like this though --- TransactionIdAdvance itself is
> the place to bump the secondary counter.)

Agreed.

Re: [HACKERS] [PATCH] Provide 8-byte transaction IDs to user level

From
Darcy Buskermolen
Date:
On Wednesday 26 July 2006 14:03, Tom Lane wrote:
> Darcy Buskermolen <darcyb@commandprompt.com> writes:
> >> The question though is if we did that, would Slony actually use it?
> >
> > If it made sense to do it, then yes we would do it. The problem ends up
> > being Slony is designed to work across a multitude of versions of PG, and
> > unless this was backported to at least 7.4, it would take a while (ie
> > when we stopped supporting versions older than it was ported into)
> > before we would make use of it.
>
> [ shrug... ]  That's not happening; for one thing the change requires a
> layout change in pg_control and we have no mechanism to do that without
> initdb.

I'll take a bit more of a look through the patch and see if it is a real boon
to use it on those platforms that support it, and that we have a suitable way
around it on those that don't.   But at this point I wouldn't hold my breath
on that.


Re: [HACKERS] [PATCH] Provide 8-byte transaction IDs to user level

From
Alvaro Herrera
Date:
Darcy Buskermolen wrote:
> I'll take a bit more of a look through the patch and see if it is a real boon
> to use it on those platforms that support it, and that we have a suitable way
> around it on those that don't.   But at this point I wouldn't hold my breath
> on that

The alternative seems to be that the Slony-I team doesn't feel they have
a need for it, nobody else pushes hard enough for the feature to be in
core, and thus Slony-I and all the rest stays broken forever.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: [HACKERS] [PATCH] Provide 8-byte transaction IDs to

From
Hannu Krosing
Date:
On a fine day, Wed, 2006-07-26 at 14:27, Darcy Buskermolen wrote:
> I'll take a bit more of a look through the patch and see if it is a real boon
> to use it on those platforms that support it, and that we have a suitable way
> around it on those that don't.

This patch is actually 2 things together:

1) fixing the xid wraparound and related btree brokenness by moving to
8-byte txids represented as int8

2) cleaning up and exposing slony's snapshot usage.

Slony stored snapshots in tables as separate xmin, xmax and
list-of-running-transactions and then constructed the snapshot struct
and used it internally.

This patch exposes the snapshot by providing a single snapshot type
and operators for telling if any int8 trx is committed before or after
this snapshot.

This makes it possible to use txids and snapshots in a query that does

SELECT records FROM logtable WHERE txid BETWEEN snap1 AND snap2;

that is, it gets all records which were committed between two snapshots.
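
The "between two snapshots" test can be sketched in C as follows.  This is
an illustration under assumptions, not the patch's implementation: the
TxidSnapshot layout and both function names are hypothetical, and a
snapshot is taken to "see" a txid when that transaction had already
finished at the time the snapshot was taken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory form of the 'snapshot' type. */
typedef struct TxidSnapshot
{
    int64_t        xmin;     /* every txid below this had finished */
    int64_t        xmax;     /* every txid at or above this had not started */
    size_t         nactive;  /* number of entries in active[] */
    const int64_t *active;   /* txids in [xmin, xmax) that were still running */
} TxidSnapshot;

/* True if txid had already finished when the snapshot was taken. */
static bool
snapshot_sees(const TxidSnapshot *snap, int64_t txid)
{
    assert(snap != NULL);
    if (txid < snap->xmin)
        return true;
    if (txid >= snap->xmax)
        return false;
    for (size_t i = 0; i < snap->nactive; i++)
        if (snap->active[i] == txid)
            return false;
    return true;
}

/* True if txid finished after snap1 was taken but before snap2 was taken. */
static bool
finished_between(const TxidSnapshot *snap1, const TxidSnapshot *snap2,
                 int64_t txid)
{
    return !snapshot_sees(snap1, txid) && snapshot_sees(snap2, txid);
}
```

A queue consumer would take snap1 at the end of the previous batch and snap2
now, and select the log rows whose txid satisfies finished_between().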

>  But at this point I wouldn't hold my breath on that

Well, switching to using stuff from this patch would fix the
data-corruption-after-2G problem for slony.

That is, unless there are some bugs or thinkos of its own in this
patch :)


Re: [HACKERS] [PATCH] Provide 8-byte transaction IDs to user level

From
Darcy Buskermolen
Date:
On Wednesday 26 July 2006 14:27, Darcy Buskermolen wrote:
> I'll take a bit more of a look through the patch and see if it is a real
> boon to use it on those platforms that support it, and that we have a
> suitable way around it on those that don't.   But at this point I wouldn't
> hold my breath on that

In one of those 3am lightbulb moments I believe I have found a way to make
use of the 64-bit XID counter and still maintain backwards compatibility.
Is there any chance you could break this patch up into the 2 separate
components that Hannu mentions, and rework the XID stuff into
TransactionIdAdvance as per Tom's recommendation?  In the meantime I'll
pencil out the slony stuff to utilize this.


Re: [HACKERS] [PATCH] Provide 8-byte transaction IDs to user level

From
"Marko Kreen"
Date:
On 7/27/06, Darcy Buskermolen <darcyb@commandprompt.com> wrote:
> In one of those 3am lightbulb moments I believe I have found a way to make
> use of the 64-bit XID counter and still maintain backwards compatibility.
> Is there any chance you could break this patch up into the 2 separate
> components that Hannu mentions, and rework the XID stuff into
> TransactionIdAdvance as per Tom's recommendation?  In the meantime I'll
> pencil out the slony stuff to utilize this.

Yes, I can.  As I am on vacation right now, my computer time is rather
unstable; hopefully I can do it on the weekend.

--
marko

Re: [HACKERS] [PATCH] Provide 8-byte transaction IDs to

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> Tom Lane wrote:
>> The part of this that would actually be useful to put in core is
>> maintaining a 64-bit XID counter, ie, keep an additional counter that
>> bumps every time XID wraps around.  This cannot be done very well from
>> outside core but it would be nearly trivial, and nearly free, to add
>> inside.  Everything else in the patch could be done just as well as an
>> extension datatype.
>>
>> (I wouldn't do it like this though --- TransactionIdAdvance itself is
>> the place to bump the secondary counter.)

> Agreed.

I reconsidered after trying to do it that way --- although fixing
TransactionIdAdvance itself to maintain a 2-word counter isn't hard,
there are a whole lot of other places that can advance nextXid,
mostly bits like this in WAL recovery:

    /* Make sure nextXid is beyond any XID mentioned in the record */
    max_xid = xid;
    for (i = 0; i < xlrec->nsubxacts; i++)
    {
        if (TransactionIdPrecedes(max_xid, sub_xids[i]))
            max_xid = sub_xids[i];
    }
    if (TransactionIdFollowsOrEquals(max_xid,
                                     ShmemVariableCache->nextXid))
    {
        ShmemVariableCache->nextXid = max_xid;
        TransactionIdAdvance(ShmemVariableCache->nextXid);
    }

We could hack all these places to know about maintaining an XID-epoch
value, but it's not looking like a simple single-place-to-touch fix :-(

So I'm now agreeing that the approach of maintaining an epoch counter
in checkpoints is best after all.  That will work so long as the system
doesn't exceed 4G transactions between checkpoints ... and you'd have a
ton of other problems before that, so this restriction does not bother
me.  Putting this in the core code still beats the alternatives
available to non-core code because of the impossibility of being sure
you get control on any fixed schedule, not to mention considerations of
what happens during WAL replay and PITR.

There's still a lot more cruft in the submitted patch than I think
belongs in core, but I'll work on extracting something we can apply.

There was some worry upthread about whether Slony would actually use
this in the near future, but certainly if we don't put it in then
they'll *never* be able to use it.

            regards, tom lane

Re: [PATCH] Provide 8-byte transaction IDs to user level

From
Tom Lane
Date:
Marko Kreen <markokr@gmail.com> writes:
> Following patch exports 8 byte txid and snapshot to user level
> allowing its use in regular SQL.  It is based on Slony-I xxid
> module.  It provides special 'snapshot' type for snapshot but
> uses regular int8 for transaction ID's.

Per discussion, I've applied a patch that just implements tracking of
"XID epoch" in checkpoints.  This should be sufficient to let xxid be
handled as an external module.

            regards, tom lane

Re: [HACKERS] [PATCH] Provide 8-byte transaction IDs to

From
"Marko Kreen"
Date:
On 8/21/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > Tom Lane wrote:
> >> (I wouldn't do it like this though --- TransactionIdAdvance itself is
> >> the place to bump the secondary counter.)
>
> > Agreed.
>
> I reconsidered after trying to do it that way --- although fixing
> TransactionIdAdvance itself to maintain a 2-word counter isn't hard,
> there are a whole lot of other places that can advance nextXid,
> mostly bits like this in WAL recovery:
>
>     /* Make sure nextXid is beyond any XID mentioned in the record */
>     max_xid = xid;
>     for (i = 0; i < xlrec->nsubxacts; i++)
>     {
>         if (TransactionIdPrecedes(max_xid, sub_xids[i]))
>             max_xid = sub_xids[i];
>     }
>     if (TransactionIdFollowsOrEquals(max_xid,
>                                      ShmemVariableCache->nextXid))
>     {
>         ShmemVariableCache->nextXid = max_xid;
>         TransactionIdAdvance(ShmemVariableCache->nextXid);
>     }
>
> We could hack all these places to know about maintaining an XID-epoch
> value, but it's not looking like a simple single-place-to-touch fix :-(

As I was asked to rework the patch, I planned to use
TransactionIdAdvance(ShmemVariableCache), although that would
be conceptually ugly.  The Right Thing for this approach would be
to have a special struct, but that would touch half the codebase.

That was also the reason I did not want to go down that path.

> There's still a lot more cruft in the submitted patch than I think
> belongs in core, but I'll work on extracting something we can apply.

The only cruft I see is the snapshot on-disk "compression" and maybe
the pg_sync_txid() functionality.  Dropping the compression would not
matter much; snapshots would waste space, but at least for our
usage it would not be a problem.  The rest of the functions are all
required for efficient handling.

Dropping pg_sync_txid() would be a loss, because it means that
the user cannot just dump and restore the data and continue where
it left off.  Maybe it's not a problem for replication, but for generic
queueing it would need delicate juggling when restoring a backup.

Although I must admit that pg_sync_txid() is indeed an ugly part
of the patch, and it creates a new failure mode: wrapping the
epoch.  So I can kind of agree to removing it.

I hope you don't mean that none of the user-level functions belong
in core.  It's not like there are several ways to expose the info.
And it's not like there are many more interesting ways of using
the long xid at the C level.  Having the long xid available at the SQL
level means that efficient async replication can be done without any
use of C.

Now that I am back from vacation I can do some coding myself,
if you give hints what needs rework.

--
marko