Thread: Re: [PATCHES] [PATCH] Provide 8-byte transaction IDs to user level
I am sure you worked hard on this, but I don't see the use case, nor have
I heard people in the community requesting such functionality. Perhaps
pgfoundry would be a better place for this.

---------------------------------------------------------------------------

Marko Kreen wrote:
>
> Intro
> -----
>
> The following patch exports 8-byte txids and snapshots to user level,
> allowing their use in regular SQL. It is based on the Slony-I xxid
> module. It provides a special 'snapshot' type for snapshots but
> uses regular int8 for transaction IDs.
>
> Exported API
> ------------
>
> Type: snapshot
>
> Functions:
>
>   current_txid() returns int8
>   current_snapshot() returns snapshot
>   snapshot_xmin(snapshot) returns int8
>   snapshot_xmax(snapshot) returns int8
>   snapshot_active_list(snapshot) returns setof int8
>   snapshot_contains(snapshot, int8) returns bool
>   pg_sync_txid(int8) returns int8
>
> Operation
> ---------
>
> Extension to 8 bytes is done by keeping track of the wraparound count
> in pg_control. On every checkpoint, nextxid is compared to the one
> stored in pg_control. If the value is smaller, wraparound happened
> and the epoch is increased.
>
> When a long txid or snapshot is requested, pg_control is locked with
> LW_SHARED to retrieve the epoch value from it. The patch does not
> affect core functionality in any other way.
>
> Backup/restore of txid data
> ---------------------------
>
> Currently I made pg_dumpall output the following statement:
>
>   "SELECT pg_sync_txid(%d)", current_txid()
>
> Then, on the target database, pg_sync_txid checks whether its current
> (epoch + GetTopTransactionId()) is larger than the given argument.
> If not, it bumps the epoch until it is, thus guaranteeing that newly
> issued txids are larger than those in the source database. If
> restored into the same database instance, nothing will happen.
>
>
> Advantages of 8-byte txids
> --------------------------
>
> * Indexes won't break silently. No need for a mandatory periodic
>   truncate, which may not happen for various reasons.
> * Allows keeping values from different databases in one table/index.
> * Ability to bring data into a different server and continue there.
>
> Advantages of being in core
> ---------------------------
>
> * Core code can guarantee that the wraparound check happens within
>   every 2G transactions.
> * Core code can update pg_control non-transactionally. A module
>   needs to operate inside a user transaction when updating the epoch
>   row, which brings various problems (READ COMMITTED vs. SERIALIZABLE,
>   long transactions, locking, etc).
> * Core code has only one place where it needs to update; a module
>   needs to have an epoch table in each database.
>
> Todo, to think about
> --------------------
>
> * Flesh out the documentation. Probably needs some background.
> * Better names for some functions?
> * pg_sync_txid allows use of pg_dump for moving a database,
>   but also adds the possibility to shoot yourself in the foot by
>   allowing epoch wraparound to happen. Is "Don't do it then" enough?
> * Currently txid keeps its own copy of nextxid in pg_control;
>   this makes the data dependencies clear. It's possible to drop it
>   and use ->checkPointCopy->nextXid directly, thus saving 4 bytes.
> * Should pg_sync_txid() be issued by pg_dump instead of pg_dumpall?
>
> --
> marko
>

[ Attachment, skipping... ]

>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster

--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <bruce@momjian.us> writes:
> I am sure you worked hard on this, but I don't see the use case, nor
> have I heard people in the community requesting such functionality.
> Perhaps pgfoundry would be a better place for this.

The part of this that would actually be useful to put in core is
maintaining a 64-bit XID counter, ie, keep an additional counter that
bumps every time XID wraps around.  This cannot be done very well from
outside core but it would be nearly trivial, and nearly free, to add
inside.  Everything else in the patch could be done just as well as an
extension datatype.

(I wouldn't do it like this though --- TransactionIdAdvance itself is
the place to bump the secondary counter.)

The question though is if we did that, would Slony actually use it?

			regards, tom lane
On Wednesday 26 July 2006 13:04, Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > I am sure you worked hard on this, but I don't see the use case, nor
> > have I heard people in the community requesting such functionality.
> > Perhaps pgfoundry would be a better place for this.
>
> The part of this that would actually be useful to put in core is
> maintaining a 64-bit XID counter, ie, keep an additional counter that
> bumps every time XID wraps around.  This cannot be done very well from
> outside core but it would be nearly trivial, and nearly free, to add
> inside.  Everything else in the patch could be done just as well as an
> extension datatype.
>
> (I wouldn't do it like this though --- TransactionIdAdvance itself is
> the place to bump the secondary counter.)
>
> The question though is if we did that, would Slony actually use it?

If it made sense to do it, then yes we would do it. The problem ends up being
Slony is designed to work across a multitude of versions of PG, and unless
this was backported to at least 7.4, it would take a while (ie when we
stopped supporting versions older than it was ported into) before we would
make use of it.

>
> 			regards, tom lane

--
Darcy Buskermolen
CommandPrompt, Inc.
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
http://www.commandprompt.com
Darcy Buskermolen <darcyb@commandprompt.com> writes:
>> The question though is if we did that, would Slony actually use it?

> If it made sense to do it, then yes we would do it. The problem ends up being
> Slony is designed to work across a multitude of versions of PG, and unless
> this was backported to at least 7.4, it would take a while (ie when we
> stopped supporting versions older than it was ported into) before we would
> make use of it.

[ shrug... ]  That's not happening; for one thing the change requires a
layout change in pg_control and we have no mechanism to do that without
initdb.

			regards, tom lane
On a fine day, Wed, 2006-07-26 at 13:41, Darcy Buskermolen wrote:
> On Wednesday 26 July 2006 13:04, Tom Lane wrote:
> > Bruce Momjian <bruce@momjian.us> writes:
> > > I am sure you worked hard on this, but I don't see the use case, nor
> > > have I heard people in the community requesting such functionality.
> > > Perhaps pgfoundry would be a better place for this.
> >
> > The part of this that would actually be useful to put in core is
> > maintaining a 64-bit XID counter, ie, keep an additional counter that
> > bumps every time XID wraps around.  This cannot be done very well from
> > outside core but it would be nearly trivial, and nearly free, to add
> > inside.  Everything else in the patch could be done just as well as an
> > extension datatype.
> >
> > (I wouldn't do it like this though --- TransactionIdAdvance itself is
> > the place to bump the secondary counter.)
> >
> > The question though is if we did that, would Slony actually use it?

It seems that the Slony people still hope to circumvent the known
brokenness of xxid btree indexes by dropping and recreating them often
enough and/or trying other workarounds.

> If it made sense to do it, then yes we would do it. The problem ends up being
> Slony is designed to work across a multitude of versions of PG, and unless
> this was backported to at least 7.4, it would take a while (ie when we
> stopped supporting versions older than it was ported into) before we would
> make use of it.

We already have an external implementation, which requires a function
call to be executed at an interval of a few hundred million transactions
to pump up the higher int4 when needed.

It would probably be easy to backport it to any version of postgres
which is supported by slony.

Being in core just makes the overflow accounting part more robust.

The function to retrieve the 8-byte trx id will look exactly the same
from userland in both cases.

> >
> > 			regards, tom lane
>

--
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com
On a fine day, Wed, 2006-07-26 at 13:35, Bruce Momjian wrote:
> I am sure you worked hard on this, but I don't see the use case,

The use case is any Slony-like replication system or queueing system
which needs a consistent means of knowing which batches of transactions
have finished during some period.

You can think of this as a core component for building a Slony that
does *not* break at 2G transactions.

> nor
> have I heard people in the community requesting such functionality.

You will, once more Slony users reach the 2 billion transaction limit
and start silently losing data. And find out a few weeks later.

> Perhaps pgfoundry would be a better place for this.

At least the part that manages the epoch should be in core. The rest
can actually be on pgfoundry as a separate project, or inside
skytools/pgQ.

> ---------------------------------------------------------------------------
>
> Marko Kreen wrote:
> > [ quoted patch description snipped ]

--
Hannu Krosing
Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > I am sure you worked hard on this, but I don't see the use case, nor
> > have I heard people in the community requesting such functionality.
> > Perhaps pgfoundry would be a better place for this.
>
> The part of this that would actually be useful to put in core is
> maintaining a 64-bit XID counter, ie, keep an additional counter that
> bumps every time XID wraps around.  This cannot be done very well from
> outside core but it would be nearly trivial, and nearly free, to add
> inside.  Everything else in the patch could be done just as well as an
> extension datatype.
>
> (I wouldn't do it like this though --- TransactionIdAdvance itself is
> the place to bump the secondary counter.)

Agreed.

--
  Bruce Momjian   bruce@momjian.us
On Wednesday 26 July 2006 14:03, Tom Lane wrote:
> Darcy Buskermolen <darcyb@commandprompt.com> writes:
> >> The question though is if we did that, would Slony actually use it?
> >
> > If it made sense to do it, then yes we would do it. The problem ends up
> > being Slony is designed to work across a multitude of versions of PG, and
> > unless this was backported to at least 7.4, it would take a while (ie
> > when we stopped supporting versions older than it was ported into)
> > before we would make use of it.
>
> [ shrug... ]  That's not happening; for one thing the change requires a
> layout change in pg_control and we have no mechanism to do that without
> initdb.

I'll take a bit more of a look through the patch and see if it is a real boon
to use it on those platforms that support it, and whether we have a suitable
way around it on those that don't. But at this point I wouldn't hold my
breath on that.

>
> 			regards, tom lane

--
Darcy Buskermolen
Darcy Buskermolen wrote:
> On Wednesday 26 July 2006 14:03, Tom Lane wrote:
> > Darcy Buskermolen <darcyb@commandprompt.com> writes:
> > >> The question though is if we did that, would Slony actually use it?
> > >
> > > If it made sense to do it, then yes we would do it. The problem ends up
> > > being Slony is designed to work across a multitude of versions of PG, and
> > > unless this was backported to at least 7.4, it would take a while (ie
> > > when we stopped supporting versions older than it was ported into)
> > > before we would make use of it.
> >
> > [ shrug... ]  That's not happening; for one thing the change requires a
> > layout change in pg_control and we have no mechanism to do that without
> > initdb.
>
> I'll take a bit more of a look through the patch and see if it is a real boon
> to use it on those platforms that support it, and whether we have a suitable
> way around it on those that don't. But at this point I wouldn't hold my
> breath on that.

The alternative seems to be that the Slony-I team doesn't feel they have
a need for it, nobody else pushes hard enough for the feature to be in
core, and thus Slony-I and all the rest stay broken forever.

--
Alvaro Herrera                         http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On a fine day, Wed, 2006-07-26 at 14:27, Darcy Buskermolen wrote:
> On Wednesday 26 July 2006 14:03, Tom Lane wrote:
> > Darcy Buskermolen <darcyb@commandprompt.com> writes:
> > >> The question though is if we did that, would Slony actually use it?
> > >
> > > If it made sense to do it, then yes we would do it. The problem ends up
> > > being Slony is designed to work across a multitude of versions of PG, and
> > > unless this was backported to at least 7.4, it would take a while (ie
> > > when we stopped supporting versions older than it was ported into)
> > > before we would make use of it.
> >
> > [ shrug... ]  That's not happening; for one thing the change requires a
> > layout change in pg_control and we have no mechanism to do that without
> > initdb.
>
> I'll take a bit more of a look through the patch and see if it is a real boon
> to use it on those platforms that support it, and whether we have a suitable
> way around it on those that don't.

This patch is actually 2 things together:

1) fixing the xid wraparound and related btree brokenness by moving to
   8-byte txids represented as int8

2) cleaning up and exposing Slony's snapshot usage.

Slony stored snapshots in tables as separate xmin, xmax and
list-of-running-transactions, and then constructed the snapshot struct
and used it internally.

This patch exposes the snapshot by providing a single snapshot type
and operators for telling whether any int8 trx committed before or
after this snapshot.

This makes it possible to use txids and snapshots in a query that does

SELECT records FROM logtable WHERE txid BETWEEN snap1 AND snap2;

that is, it gets all records which were committed between two snapshots.

> But at this point I wouldn't hold my breath on that.

Well, switching to using stuff from this patch would fix the
data-corruption-after-2G problem for Slony.

That is, unless there are some bugs or thinkos of its own in this
patch :)

--
Hannu Krosing
Alvaro Herrera wrote:
> Darcy Buskermolen wrote:
>
>> I'll take a bit more of a look through the patch and see if it is a real boon
>> to use it on those platforms that support it, and whether we have a suitable
>> way around it on those that don't. But at this point I wouldn't hold my
>> breath on that.
>>
>
> The alternative seems to be that the Slony-I team doesn't feel they have
> a need for it, nobody else pushes hard enough for the feature to be in
> core, and thus Slony-I and all the rest stay broken forever.
>

Some things are going to take a few generations to be generally useful,
ISTM. At least let's go with the bit that Tom says should be in core.

cheers

andrew
On Wednesday 26 July 2006 14:27, Darcy Buskermolen wrote:
> On Wednesday 26 July 2006 14:03, Tom Lane wrote:
> > Darcy Buskermolen <darcyb@commandprompt.com> writes:
> > >> The question though is if we did that, would Slony actually use it?
> > >
> > > If it made sense to do it, then yes we would do it. The problem ends up
> > > being Slony is designed to work across a multitude of versions of PG,
> > > and unless this was backported to at least 7.4, it would take a while
> > > (ie when we stopped supporting versions older than it was ported into)
> > > before we would make use of it.
> >
> > [ shrug... ]  That's not happening; for one thing the change requires a
> > layout change in pg_control and we have no mechanism to do that without
> > initdb.
>
> I'll take a bit more of a look through the patch and see if it is a real
> boon to use it on those platforms that support it, and whether we have a
> suitable way around it on those that don't. But at this point I wouldn't
> hold my breath on that.

In one of those 3am lightbulbs I believe I have found a way to make use of
the 64-bit XID counter and still maintain backwards compatibility.
Is there any chance you could break this patch up into the 2 separate
components that Hannu mentions, and rework the XID stuff into
TransactionIdAdvance as per Tom's recommendation? In the meantime I'll
pencil out the Slony stuff to utilize this.

>
> > 			regards, tom lane

--
Darcy Buskermolen
On 7/27/06, Darcy Buskermolen <darcyb@commandprompt.com> wrote:
> In one of those 3am lightbulbs I believe I have found a way to make use of
> the 64-bit XID counter and still maintain backwards compatibility.
> Is there any chance you could break this patch up into the 2 separate
> components that Hannu mentions, and rework the XID stuff into
> TransactionIdAdvance as per Tom's recommendation? In the meantime I'll
> pencil out the Slony stuff to utilize this.

Yes, I can. As I am on vacation right now, my computer time is rather
irregular; hopefully I can do it on the weekend.

--
marko
Bruce Momjian <bruce@momjian.us> writes:
> Tom Lane wrote:
>> The part of this that would actually be useful to put in core is
>> maintaining a 64-bit XID counter, ie, keep an additional counter that
>> bumps every time XID wraps around.  This cannot be done very well from
>> outside core but it would be nearly trivial, and nearly free, to add
>> inside.  Everything else in the patch could be done just as well as an
>> extension datatype.
>>
>> (I wouldn't do it like this though --- TransactionIdAdvance itself is
>> the place to bump the secondary counter.)

> Agreed.

I reconsidered after trying to do it that way --- although fixing
TransactionIdAdvance itself to maintain a 2-word counter isn't hard,
there are a whole lot of other places that can advance nextXid,
mostly bits like this in WAL recovery:

    /* Make sure nextXid is beyond any XID mentioned in the record */
    max_xid = xid;
    for (i = 0; i < xlrec->nsubxacts; i++)
    {
        if (TransactionIdPrecedes(max_xid, sub_xids[i]))
            max_xid = sub_xids[i];
    }
    if (TransactionIdFollowsOrEquals(max_xid,
                                     ShmemVariableCache->nextXid))
    {
        ShmemVariableCache->nextXid = max_xid;
        TransactionIdAdvance(ShmemVariableCache->nextXid);
    }

We could hack all these places to know about maintaining an XID-epoch
value, but it's not looking like a simple single-place-to-touch fix :-(

So I'm now agreeing that the approach of maintaining an epoch counter
in checkpoints is best after all.  That will work so long as the system
doesn't exceed 4G transactions between checkpoints ... and you'd have a
ton of other problems before that, so this restriction does not bother
me.  Putting this in the core code still beats the alternatives
available to non-core code because of the impossibility of being sure
you get control on any fixed schedule, not to mention considerations of
what happens during WAL replay and PITR.

There's still a lot more cruft in the submitted patch than I think
belongs in core, but I'll work on extracting something we can apply.

There was some worry upthread about whether Slony would actually use
this in the near future, but certainly if we don't put it in then
they'll *never* be able to use it.

			regards, tom lane
On 8/21/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > Tom Lane wrote:
> >> (I wouldn't do it like this though --- TransactionIdAdvance itself is
> >> the place to bump the secondary counter.)
>
> > Agreed.
>
> I reconsidered after trying to do it that way --- although fixing
> TransactionIdAdvance itself to maintain a 2-word counter isn't hard,
> there are a whole lot of other places that can advance nextXid,
> mostly bits like this in WAL recovery:
>
>     /* Make sure nextXid is beyond any XID mentioned in the record */
>     max_xid = xid;
>     for (i = 0; i < xlrec->nsubxacts; i++)
>     {
>         if (TransactionIdPrecedes(max_xid, sub_xids[i]))
>             max_xid = sub_xids[i];
>     }
>     if (TransactionIdFollowsOrEquals(max_xid,
>                                      ShmemVariableCache->nextXid))
>     {
>         ShmemVariableCache->nextXid = max_xid;
>         TransactionIdAdvance(ShmemVariableCache->nextXid);
>     }
>
> We could hack all these places to know about maintaining an XID-epoch
> value, but it's not looking like a simple single-place-to-touch fix :-(

As I was asked to rework the patch, I planned to use
TransactionIdAdvance(ShmemVariableCache), although that would be
conceptually ugly. The Right Thing for this approach would be to have
a special struct, but that would touch half the codebase. That was also
the reason I did not want to go down that path.

> There's still a lot more cruft in the submitted patch than I think
> belongs in core, but I'll work on extracting something we can apply.

The only cruft I see is the snapshot on-disk "compression" and maybe
the pg_sync_txid() functionality.

Dropping the compression would not matter much: snapshots would waste
space, but at least for our usage that would not be a problem. The rest
of the functions are all required for efficient handling.

Dropping pg_sync_txid() would be a loss, because that means the
user cannot just dump and restore the data and continue where
it left off. Maybe it's not a problem for replication, but for generic
queueing it would need delicate juggling when restoring a backup.

Although I must admit pg_sync_txid() is indeed an ugly part of the
patch, and it creates a new failure mode: epoch wraparound. So I can
kind of agree with removing it.

I hope you don't mean that none of the user-level functions belong in
core. It's not like there are several ways to expose the info. And it's
not like there are many more interesting ways of using the long xid at
the C level. Having the long xid available at the SQL level means that
efficient async replication can be done without any use of C.

Now that I am back from vacation I can do some coding myself, if you
give hints on what needs rework.

--
marko
"Marko Kreen" <markokr@gmail.com> writes:
> Dropping pg_sync_txid() would be a loss, because that means the
> user cannot just dump and restore the data and continue where
> it left off. Maybe it's not a problem for replication, but for generic
> queueing it would need delicate juggling when restoring a backup.

I'm not following the point here.  Dump and restore has never intended
to preserve the transaction counter, so why should it preserve
high-order bits of the transaction counter?

There is another problem with pg_sync_txid, too: because it is willing
to advance the extended XID counter in multiples of 4G XIDs, it turns
wraparound of the extended counter from a never-will-happen scenario
into something that could happen in a poorly-managed installation.
If you've got to be prepared to cope with wraparound of the extended
counter, then what the heck is the point at all?  You might as well just
work with XIDs as they stand.

So I think pg_sync_txid is a bad idea.  In the patch as committed,
anyone who's really intent on munging the epoch can do it with
pg_resetxlog, but there's not a provision for doing it short of that.

			regards, tom lane
On 8/21/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Marko Kreen" <markokr@gmail.com> writes:
> > Dropping pg_sync_txid() would be a loss, because that means the
> > user cannot just dump and restore the data and continue where
> > it left off. Maybe it's not a problem for replication, but for generic
> > queueing it would need delicate juggling when restoring a backup.
>
> I'm not following the point here.  Dump and restore has never intended
> to preserve the transaction counter, so why should it preserve
> high-order bits of the transaction counter?

It guarantees that any newly issued large txids will be larger
than existing ones in tables, so code can depend on monotonic
growth.

> There is another problem with pg_sync_txid, too: because it is willing
> to advance the extended XID counter in multiples of 4G XIDs, it turns
> wraparound of the extended counter from a never-will-happen scenario
> into something that could happen in a poorly-managed installation.
> If you've got to be prepared to cope with wraparound of the extended
> counter, then what the heck is the point at all?  You might as well just
> work with XIDs as they stand.

Indeed. I also don't like that scenario.

> So I think pg_sync_txid is a bad idea.  In the patch as committed,
> anyone who's really intent on munging the epoch can do it with
> pg_resetxlog, but there's not a provision for doing it short of that.

I like it. It is indeed better than having pg_dump issue a function
call. This is fully satisfactory.

--
marko
"Marko Kreen" <markokr@gmail.com> writes:
> On 8/21/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I'm not following the point here.  Dump and restore has never intended
>> to preserve the transaction counter, so why should it preserve
>> high-order bits of the transaction counter?

> It guarantees that any newly issued large txids will be larger
> than existing ones in tables, so code can depend on monotonic
> growth.

Within a single installation, sure, but I don't buy that we ought to try
to preserve XIDs across installations.

			regards, tom lane
On 8/21/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Marko Kreen" <markokr@gmail.com> writes:
> > On 8/21/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> I'm not following the point here.  Dump and restore has never intended
> >> to preserve the transaction counter, so why should it preserve
> >> high-order bits of the transaction counter?
>
> > It guarantees that any newly issued large txids will be larger
> > than existing ones in tables, so code can depend on monotonic
> > growth.
>
> Within a single installation, sure, but I don't buy that we ought to try
> to preserve XIDs across installations.

I think you are right in the sense that we should not do it
automatically. But now that long xids may end up in data tables,
users may need to dump/restore them into another installation.

If the application is e.g. a Slony-like queue that depends on xid
growth, the user needs either the ability to bump the epoch or
application-level support for migration. If he has neither, he
basically needs to extract the old contents by hand (as the app would
not work reliably) and reset everything.

Probably the right thing would be for the application to have a
"we moved, fix everything" function. But bumping the epoch is such a
simple fix that it should still be available.

And pg_resetxlog is fine for that. Especially as using it signals
"What you are doing is dangerous!"

--
marko