Thread: Set new system identifier using pg_resetxlog
Hello, attached is a simple patch which makes it possible to change the system identifier of the cluster in pg_control. This is useful for individualization of the instance that is started on top of data directory produced by pg_basebackup - something that's helpful for logical replication setup where you need to easily identify each node (it's used by Bidirectional Replication for example). -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On Fri, Jun 13, 2014 at 8:31 PM, Petr Jelinek <petr@2ndquadrant.com> wrote: > attached is a simple patch which makes it possible to change the system > identifier of the cluster in pg_control. This is useful for > individualization of the instance that is started on top of data directory > produced by pg_basebackup - something that's helpful for logical replication > setup where you need to easily identify each node (it's used by > Bidirectional Replication for example). I can clearly understand the utility of being able to reset the system ID to a new, randomly-generated system ID - but giving the user the ability to set a particular value of their own choosing seems like a pretty sharp tool. What is the use case for that? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 17/06/14 16:18, Robert Haas wrote: > On Fri, Jun 13, 2014 at 8:31 PM, Petr Jelinek <petr@2ndquadrant.com> wrote: >> attached is a simple patch which makes it possible to change the system >> identifier of the cluster in pg_control. This is useful for >> individualization of the instance that is started on top of data directory >> produced by pg_basebackup - something that's helpful for logical replication >> setup where you need to easily identify each node (it's used by >> Bidirectional Replication for example). > > I can clearly understand the utility of being able to reset the system > ID to a new, randomly-generated system ID - but giving the user the > ability to set a particular value of their own choosing seems like a > pretty sharp tool. What is the use case for that? > Let's say you want to initialize new logical replication node via pg_basebackup and you want your replication slots to be easily identifiable so you use your local system id as part of the slot name. In that case you need to know the future system id of the node because you need to register the slot before consistent point to which you replay via streaming replication (and you can't replay anymore once you changed the system id). Which means you need to generate your system id in advance and be able to change it in pg_control later. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Jun 17, 2014 at 10:33 AM, Petr Jelinek <petr@2ndquadrant.com> wrote: > On 17/06/14 16:18, Robert Haas wrote: >> On Fri, Jun 13, 2014 at 8:31 PM, Petr Jelinek <petr@2ndquadrant.com> >> wrote: >>> attached is a simple patch which makes it possible to change the system >>> identifier of the cluster in pg_control. This is useful for >>> individualization of the instance that is started on top of data >>> directory >>> produced by pg_basebackup - something that's helpful for logical >>> replication >>> setup where you need to easily identify each node (it's used by >>> Bidirectional Replication for example). >> >> >> I can clearly understand the utility of being able to reset the system >> ID to a new, randomly-generated system ID - but giving the user the >> ability to set a particular value of their own choosing seems like a >> pretty sharp tool. What is the use case for that? > > Let's say you want to initialize new logical replication node via > pg_basebackup and you want your replication slots to be easily identifiable > so you use your local system id as part of the slot name. > > In that case you need to know the future system id of the node because you > need to register the slot before consistent point to which you replay via > streaming replication (and you can't replay anymore once you changed the > system id). Which means you need to generate your system id in advance and > be able to change it in pg_control later. Hmm. I guess that makes sense. But it seems to me that we might need to have a process discussion here, because, while I'm all in favor of incremental feature proposals that build towards a larger goal, it currently appears that the larger goal toward which you are building is not something that's been publicly discussed and debated on this list. And I really think we need to have that conversation. Obviously, individual patches will still need to be debated, but I feel like 2ndQuadrant is trying to construct a castle without showing the community the floor plan. I believe that there is relatively broad agreement that we would all like a castle, but different people may have legitimately different ideas about how it should be constructed. If the work arrives as a series of disconnected pieces (user-specified system ID, event triggers for CREATE, etc.), then everyone outside of 2ndQuadrant has to take it on faith that those pieces are going to eventually fit together in a way that we'll all be happy with. In some cases, that's fine, because the feature is useful on its own merits whether it ends up being part of the castle or not. But in other cases, like this one, if the premise that the slot name should match the system identifier isn't something the community wants to accept, then taking a patch that lets people do that is probably a bad idea, because at least one person will use it to set the system identifier of a system to a value that enables physical replication to take place when that is actually totally unsafe, and we don't want to enable that for no reason. Maybe the slot name should match the replication identifier rather than the standby system ID, for example. There are conflicting proposals for how replication identifiers should work, but one of those proposals limits it to 16 bits. If we're going to have multiple identifiers floating around anyway, I'd rather have a slot called 7 than one called 6024402925054484590. On the other hand, maybe there's going to be a new proposal to use the database system identifier as a replication identifier, which might be a fine idea and which would demolish that argument. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2014-06-17 12:07:04 -0400, Robert Haas wrote: > On Tue, Jun 17, 2014 at 10:33 AM, Petr Jelinek <petr@2ndquadrant.com> wrote: > > On 17/06/14 16:18, Robert Haas wrote: > >> On Fri, Jun 13, 2014 at 8:31 PM, Petr Jelinek <petr@2ndquadrant.com> > >> wrote: > >>> attached is a simple patch which makes it possible to change the system > >>> identifier of the cluster in pg_control. This is useful for > >>> individualization of the instance that is started on top of data > >>> directory > >>> produced by pg_basebackup - something that's helpful for logical > >>> replication > >>> setup where you need to easily identify each node (it's used by > >>> Bidirectional Replication for example). > >> > >> > >> I can clearly understand the utility of being able to reset the system > >> ID to a new, randomly-generated system ID - but giving the user the > >> ability to set a particular value of their own choosing seems like a > >> pretty sharp tool. What is the use case for that? I've previously hacked this up adhoc during data recovery when I needed to make another cluster similar enough that I could replay WAL. Another usecase is to mark a database as independent from its origin. Imagine a database that gets sharded across several servers. It's not uncommon to do that by initially basebackup'ing the database to several nodes and then use them separately from thereon. It's quite useful to actually mark them as being distinct. Especially as several of them right now would end up with the same timeline id... > But it seems to me that we might need to have a process discussion > here, because, while I'm all in favor of incremental feature proposals > that build towards a larger goal, it currently appears that the larger > goal toward which you are building is not something that's been > publicly discussed and debated on this list. And I really think we > need to have that conversation. Obviously, individual patches will > still need to be debated, but I feel like 2ndQuadrant is trying to > construct a castle without showing the community the floor plan. I > believe that there is relatively broad agreement that we would all > like a castle, but different people may have legitimately different > ideas about how it should be constructed. If the work arrives as a > series of disconnected pieces (user-specified system ID, event > triggers for CREATE, etc.), then everyone outside of 2ndQuadrant has > to take it on faith that those pieces are going to eventually fit > together in a way that we'll all be happy with. In some cases, that's > fine, because the feature is useful on its own merits whether it ends > up being part of the castle or not. > Uh. Right now this patch has been written because it's needed for a out of core replication solution. That's what BDR is at this point. The patch is unobtrusive, has other usecases than just our internal one and doesn't make pg_resetxlog even more dangerous than it already is. I don't see much problem with considering it on it's own cost/benefit? So this seems to be a concern that's relatively independent of this patch. Am I seing that right? I think one very important point here is that BDR is *not* the proposed in core solution. I think a reasonable community perspective - besides also being useful on it's own - is to view it as a *prototype* for a in core solution. And e.g. logical decoding would have looked much worse - and likely not have been integrated - without externally already being used for BDR. I'm not sure how we can ease or even resolve your conerns when talking about pretty independent and general pieces of functionality like the DDL even trigger stuff. We needed to actually *write* those to see how BDR will look like. And the communities feedback heavily influenced how BDR looks like by accepting some pieces, demanding others, and outright rejecting the remainder. I think there's some pieces that need to consider them on their own merit. Logical decoding is useful on it's own. The ability for out of core systems to do DDL replication is another piece (that you referred to above). I think the likelihood of success if we were to try to design a in-core system from ground up first and then follow through prety exactly along those lines is minimal. So, what I think we can do is to continue trying to build independent, generally useful bits. Which imo all the stuff that's been integrated is. Then, somewhat soon I think, we'll have to come up with a proposal how the parts that are *not* necessarily useful outside of in-core logical rep. might look like. Which will likely trigger some long long discussions that turn that design around a couple of times. Which is fine. I *don't* think that's going to be a trimmed down version of todays BDR. > But in other cases, like this one, if the premise that the slot name > should match the system identifier isn't something the community wants > to accept, then taking a patch that lets people do that is probably a > bad idea, because at least one person will use it to set the system > identifier of a system to a value that enables physical replication to > take place when that is actually totally unsafe, and we don't want to > enable that for no reason. It also allows many other dangerous things. Many of which are much more dangerous than changing the system identifier. Resetting an independent cluster is also not very likely to work - the LSNs would still not match. But it wouldn't corrupt the copy of the database that's been changed... > Maybe the slot name should match the > replication identifier rather than the standby system ID, for example. > There are conflicting proposals for how replication identifiers should > work, but one of those proposals limits it to 16 bits. I actually don't think any of the discussions I was involved in had the externally visible version of replication identifiers limited to 16bits? If you are referring to my patch, 16bits was just the width of the *internal* name that should basically never be looked at. User visible replication identifiers are always identified by an arbitrary string - whose format is determined by the user of the replication identifier facility. *BDR* currently stores the system identifer, the database id and a name in there - but that's nothing core needs to concern itself with. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Jun 17, 2014 at 12:50 PM, Andres Freund <andres@2ndquadrant.com> wrote: >> >> I can clearly understand the utility of being able to reset the system >> >> ID to a new, randomly-generated system ID - but giving the user the >> >> ability to set a particular value of their own choosing seems like a >> >> pretty sharp tool. What is the use case for that? > > I've previously hacked this up adhoc during data recovery when I needed > to make another cluster similar enough that I could replay WAL. > > Another usecase is to mark a database as independent from its > origin. Imagine a database that gets sharded across several > servers. It's not uncommon to do that by initially basebackup'ing the > database to several nodes and then use them separately from > thereon. It's quite useful to actually mark them as being > distinct. Especially as several of them right now would end up with the > same timeline id... Sure, but that only requires being able to reset the ID randomly, not to a particular value. >> But it seems to me that we might need to have a process discussion >> here, because, while I'm all in favor of incremental feature proposals >> that build towards a larger goal, it currently appears that the larger >> goal toward which you are building is not something that's been >> publicly discussed and debated on this list. And I really think we >> need to have that conversation. Obviously, individual patches will >> still need to be debated, but I feel like 2ndQuadrant is trying to >> construct a castle without showing the community the floor plan. I >> believe that there is relatively broad agreement that we would all >> like a castle, but different people may have legitimately different >> ideas about how it should be constructed. If the work arrives as a >> series of disconnected pieces (user-specified system ID, event >> triggers for CREATE, etc.), then everyone outside of 2ndQuadrant has >> to take it on faith that those pieces are going to eventually fit >> together in a way that we'll all be happy with. In some cases, that's >> fine, because the feature is useful on its own merits whether it ends >> up being part of the castle or not. >> > > Uh. Right now this patch has been written because it's needed for a out > of core replication solution. That's what BDR is at this point. The > patch is unobtrusive, has other usecases than just our internal one and > doesn't make pg_resetxlog even more dangerous than it already is. I > don't see much problem with considering it on it's own cost/benefit? Well, I think it *does* make pg_resetxlog more dangerous; see previous discussion of pg_computemaxlsn. > So this seems to be a concern that's relatively independent of this > patch. Am I seing that right? Partially; not completely. > I actually don't think any of the discussions I was involved in had the > externally visible version of replication identifiers limited to 16bits? > If you are referring to my patch, 16bits was just the width of the > *internal* name that should basically never be looked at. User visible > replication identifiers are always identified by an arbitrary string - > whose format is determined by the user of the replication identifier > facility. *BDR* currently stores the system identifer, the database id > and a name in there - but that's nothing core needs to concern itself > with. I don't think you're going to be able to avoid users needing to know about those IDs. The configuration table is going to have to be the same on all nodes, and how are you going to get that set up without those IDs being user-visible? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2014-06-18 12:36:13 -0400, Robert Haas wrote: > > I actually don't think any of the discussions I was involved in had the > > externally visible version of replication identifiers limited to 16bits? > > If you are referring to my patch, 16bits was just the width of the > > *internal* name that should basically never be looked at. User visible > > replication identifiers are always identified by an arbitrary string - > > whose format is determined by the user of the replication identifier > > facility. *BDR* currently stores the system identifer, the database id > > and a name in there - but that's nothing core needs to concern itself > > with. > > I don't think you're going to be able to avoid users needing to know > about those IDs. The configuration table is going to have to be the > same on all nodes, and how are you going to get that set up without > those IDs being user-visible? Why? Users and other systems only ever see the external ID. Everything leaving the system is converted to the external form. The short id basically is only used in shared memory and in wal records. For both using longer strings would be problematic. In the patch I have the user can actually see them as they're stored in pg_replication_identifier, but there should never be a need for that. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 2014-06-18 12:36:13 -0400, Robert Haas wrote: > On Tue, Jun 17, 2014 at 12:50 PM, Andres Freund <andres@2ndquadrant.com> wrote: > >> >> I can clearly understand the utility of being able to reset the system > >> >> ID to a new, randomly-generated system ID - but giving the user the > >> >> ability to set a particular value of their own choosing seems like a > >> >> pretty sharp tool. What is the use case for that? > > > > I've previously hacked this up adhoc during data recovery when I needed > > to make another cluster similar enough that I could replay WAL. > > > > Another usecase is to mark a database as independent from its > > origin. Imagine a database that gets sharded across several > > servers. It's not uncommon to do that by initially basebackup'ing the > > database to several nodes and then use them separately from > > thereon. It's quite useful to actually mark them as being > > distinct. Especially as several of them right now would end up with the > > same timeline id... > > Sure, but that only requires being able to reset the ID randomly, not > to a particular value. I can definitely see a point in a version of the option that generates the id randomly. But the use case one up actually does require setting it to a s specific value... > > Uh. Right now this patch has been written because it's needed for a out > > of core replication solution. That's what BDR is at this point. The > > patch is unobtrusive, has other usecases than just our internal one and > > doesn't make pg_resetxlog even more dangerous than it already is. I > > don't see much problem with considering it on it's own cost/benefit? > > Well, I think it *does* make pg_resetxlog more dangerous; see previous > discussion of pg_computemaxlsn. Wasn't the thing around pg_computemaxlsn that there's actually no case where it could be used safely? And that experienced people suggested to use it an unsafe fashion? I don't see how the proposed ability makes it more dangerous. It *already* has the ability to reset the timelineid. That's the case where users are much more likely to think about resetting it because that's much more plausible than taking a unrelated cluster and resetting its sysid, timeline and LSN. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 2014-06-18 18:54:05 +0200, Andres Freund wrote: > On 2014-06-18 12:36:13 -0400, Robert Haas wrote: > > Sure, but that only requires being able to reset the ID randomly, not > > to a particular value. > > I can definitely see a point in a version of the option that generates > the id randomly. That's actually included in the patch btw (thanks Abhijit)... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Jun 18, 2014 at 12:54 PM, Andres Freund <andres@2ndquadrant.com> wrote: >> Well, I think it *does* make pg_resetxlog more dangerous; see previous >> discussion of pg_computemaxlsn. > > Wasn't the thing around pg_computemaxlsn that there's actually no case > where it could be used safely? And that experienced people suggested to > use it an unsafe fashion? There is a use case - to determine whether WAL has been replayed "from the future" relative to the WAL stream and control file you have on disk. But the rest is true enough. > I don't see how the proposed ability makes it more dangerous. It > *already* has the ability to reset the timelineid. That's the case where > users are much more likely to think about resetting it because that's > much more plausible than taking a unrelated cluster and resetting its > sysid, timeline and LSN. All right, well, I've said my piece. I don't have anything to add to that that isn't sheer repetition. My vote is to hold off on this until we've talked about replication identifiers and other related topics in more depth. But if that position doesn't garner majority support ... so be it! -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 18/06/14 19:26, Robert Haas wrote: > On Wed, Jun 18, 2014 at 12:54 PM, Andres Freund <andres@2ndquadrant.com> wrote: >> I don't see how the proposed ability makes it more dangerous. It >> *already* has the ability to reset the timelineid. That's the case where >> users are much more likely to think about resetting it because that's >> much more plausible than taking a unrelated cluster and resetting its >> sysid, timeline and LSN. > > All right, well, I've said my piece. I don't have anything to add to > that that isn't sheer repetition. My vote is to hold off on this > until we've talked about replication identifiers and other related > topics in more depth. But if that position doesn't garner majority > support ... so be it! > I am not sure I get what does this have to do with replication identifiers. The patch has several use-cases, one of them has to do that you can know the future system id before you set it, which is useful for automating some things... -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 2014-06-18 13:26:37 -0400, Robert Haas wrote: > My vote is to hold off on this until we've talked about replication > identifiers and other related topics in more depth. I really don't understand this. We're *NOT* proposing this patch as an underhanded way of preempting the discussion of whether/how replication identifiers are going to be used. We're proposing it because we currently have a need for the facility and this will reduce the number of patches we have to keep around after 9.5. And more importantly because there's several other use cases than our internal one for it. To settle one more point: I am not planning to propose BDR's generation of replication identifier names for integration. It works well enough for BDR but I think we can come up with something better for core. If I had my current knowledge two years back I'd not have chosen the current scheme. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 06/13/2014 05:31 PM, Petr Jelinek wrote: > Hello, > > attached is a simple patch which makes it possible to change the system > identifier of the cluster in pg_control. This is useful for > individualization of the instance that is started on top of data > directory produced by pg_basebackup - something that's helpful for > logical replication setup where you need to easily identify each node > (it's used by Bidirectional Replication for example). I'm unclear on why we would overload pg_resetxlog for this. Wouldn't it be better design to have an independant function, "pg_set_system_identifier"? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Josh Berkus wrote: > On 06/13/2014 05:31 PM, Petr Jelinek wrote: > > Hello, > > > > attached is a simple patch which makes it possible to change the system > > identifier of the cluster in pg_control. This is useful for > > individualization of the instance that is started on top of data > > directory produced by pg_basebackup - something that's helpful for > > logical replication setup where you need to easily identify each node > > (it's used by Bidirectional Replication for example). > > I'm unclear on why we would overload pg_resetxlog for this. Wouldn't it > be better design to have an independant function, > "pg_set_system_identifier"? We have overloaded pg_resetxlog for all pg_control-tweaking needs. I don't see any reason to do differently here. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 2014-06-18 10:44:56 -0700, Josh Berkus wrote: > On 06/13/2014 05:31 PM, Petr Jelinek wrote: > > Hello, > > > > attached is a simple patch which makes it possible to change the system > > identifier of the cluster in pg_control. This is useful for > > individualization of the instance that is started on top of data > > directory produced by pg_basebackup - something that's helpful for > > logical replication setup where you need to easily identify each node > > (it's used by Bidirectional Replication for example). > > I'm unclear on why we would overload pg_resetxlog for this. Wouldn't it > be better design to have an independant function, > "pg_set_system_identifier"? You mean an independent binary? Because it's not possible to change this at runtime. If so, it's because pg_resetxlog already has the option to change many related things (e.g. the timeline id). And it'd require another copy of several hundred lines of code. It's all stored in the control file/checkpoints. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
At 2014-06-18 10:44:56 -0700, josh@agliodbs.com wrote: > > I'm unclear on why we would overload pg_resetxlog for this. Because pg_resetxlog already does something very similar, so the patch is small. If it were independent, it would have to copy quite some code from pg_resetxlog. > Wouldn't it be better design to have an independant function, > "pg_set_system_identifier"? A *function*? Why? -- Abhijit
On 06/18/2014 10:48 AM, Abhijit Menon-Sen wrote: > At 2014-06-18 10:44:56 -0700, josh@agliodbs.com wrote: >> >> I'm unclear on why we would overload pg_resetxlog for this. > > Because pg_resetxlog already does something very similar, so the patch > is small. If it were independent, it would have to copy quite some code > from pg_resetxlog. Aha. In that case, it seems like it's time to rename pg_resetxlog, if it does a bunch of things that aren't resetting the xlog. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 2014-06-18 10:59:59 -0700, Josh Berkus wrote: > On 06/18/2014 10:48 AM, Abhijit Menon-Sen wrote: > > At 2014-06-18 10:44:56 -0700, josh@agliodbs.com wrote: > >> > >> I'm unclear on why we would overload pg_resetxlog for this. > > > > Because pg_resetxlog already does something very similar, so the patch > > is small. If it were independent, it would have to copy quite some code > > from pg_resetxlog. > > Aha. In that case, it seems like it's time to rename pg_resetxlog, if > it does a bunch of things that aren't resetting the xlog. Well, all those actually do write to the xlog (to write a new checkpoint, containing the updated control file). Since pg_resetxlog has done all this pretty much since forever renaming it now seems to be a big hassle for users for pretty much no benefit? This isn't a tool the average user should ever touch. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 06/18/2014 11:03 AM, Andres Freund wrote: > Well, all those actually do write to the xlog (to write a new > checkpoint, containing the updated control file). Since pg_resetxlog has > done all this pretty much since forever renaming it now seems to be a > big hassle for users for pretty much no benefit? This isn't a tool the > average user should ever touch. If we're using it to create a unique system ID which can be used to orchestrate replication and clustering systems, a lot more people are going to be touching it than ever did before -- and not just for BDR. Or are you saying that we have to destroy the data by resetting the xlog before we can change the system identifier? If so, this feature is less than completely useful ... -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Wed, Jun 18, 2014 at 12:46 PM, Andres Freund <andres@2ndquadrant.com> wrote: > On 2014-06-18 12:36:13 -0400, Robert Haas wrote: >> > I actually don't think any of the discussions I was involved in had the >> > externally visible version of replication identifiers limited to 16bits? >> > If you are referring to my patch, 16bits was just the width of the >> > *internal* name that should basically never be looked at. User visible >> > replication identifiers are always identified by an arbitrary string - >> > whose format is determined by the user of the replication identifier >> > facility. *BDR* currently stores the system identifer, the database id >> > and a name in there - but that's nothing core needs to concern itself >> > with. >> >> I don't think you're going to be able to avoid users needing to know >> about those IDs. The configuration table is going to have to be the >> same on all nodes, and how are you going to get that set up without >> those IDs being user-visible? > > Why? Users and other systems only ever see the external ID. Everything > leaving the system is converted to the external form. The short id > basically is only used in shared memory and in wal records. For both > using longer strings would be problematic. > > In the patch I have the user can actually see them as they're stored in > pg_replication_identifier, but there should never be a need for that. Hmm, so there's no requirement that the short IDs are consistent across different clusters that are replication to each other? If that's the case, that might address my concern, but I'd probably want to go back through the latest patch and think about it a bit more. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2014-06-23 10:09:49 -0400, Robert Haas wrote: > On Wed, Jun 18, 2014 at 12:46 PM, Andres Freund <andres@2ndquadrant.com> wrote: > > On 2014-06-18 12:36:13 -0400, Robert Haas wrote: > >> > I actually don't think any of the discussions I was involved in had the > >> > externally visible version of replication identifiers limited to 16bits? > >> > If you are referring to my patch, 16bits was just the width of the > >> > *internal* name that should basically never be looked at. User visible > >> > replication identifiers are always identified by an arbitrary string - > >> > whose format is determined by the user of the replication identifier > >> > facility. *BDR* currently stores the system identifer, the database id > >> > and a name in there - but that's nothing core needs to concern itself > >> > with. > >> > >> I don't think you're going to be able to avoid users needing to know > >> about those IDs. The configuration table is going to have to be the > >> same on all nodes, and how are you going to get that set up without > >> those IDs being user-visible? > > > > Why? Users and other systems only ever see the external ID. Everything > > leaving the system is converted to the external form. The short id > > basically is only used in shared memory and in wal records. For both > > using longer strings would be problematic. > > > > In the patch I have the user can actually see them as they're stored in > > pg_replication_identifier, but there should never be a need for that. > > Hmm, so there's no requirement that the short IDs are consistent > across different clusters that are replication to each other? Nope. That seemed to be a hard requirement in the earlier discussions we had (~2 years ago). > If > that's the case, that might address my concern, but I'd probably want > to go back through the latest patch and think about it a bit more. I'll send out a new version after I'm finished with the newest atomic ops patch. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Jun 23, 2014 at 10:11 AM, Andres Freund <andres@2ndquadrant.com> wrote: >> > Why? Users and other systems only ever see the external ID. Everything >> > leaving the system is converted to the external form. The short id >> > basically is only used in shared memory and in wal records. For both >> > using longer strings would be problematic. >> > >> > In the patch I have the user can actually see them as they're stored in >> > pg_replication_identifier, but there should never be a need for that. >> >> Hmm, so there's no requirement that the short IDs are consistent >> across different clusters that are replication to each other? > > Nope. That seemed to be a hard requirement in the earlier discussions we > had (~2 years ago). Oh, great. Somehow I missed the fact that that had been addressed. I had assumed that we still needed global identifiers in which case I think they'd need to be 64+ bits (preferably more like 128). If they only need to be locally significant that makes things much better. Is there any real reason to add a pg_replication_identifier table, or should we just let individual replication solutions manage the identifiers within their own configuration tables? I guess one question is: What happens if there are multiple replication solutions in use on a single server? How do they coordinate? >> If >> that's the case, that might address my concern, but I'd probably want >> to go back through the latest patch and think about it a bit more. > > I'll send out a new version after I'm finished with the newest atomic > ops patch. Sweet. I'm a little backed up right now, but will look at it when able. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2014-06-23 10:45:51 -0400, Robert Haas wrote: > On Mon, Jun 23, 2014 at 10:11 AM, Andres Freund <andres@2ndquadrant.com> wrote: > >> > Why? Users and other systems only ever see the external ID. Everything > >> > leaving the system is converted to the external form. The short id > >> > basically is only used in shared memory and in wal records. For both > >> > using longer strings would be problematic. > >> > > >> > In the patch I have the user can actually see them as they're stored in > >> > pg_replication_identifier, but there should never be a need for that. > >> > >> Hmm, so there's no requirement that the short IDs are consistent > >> across different clusters that are replication to each other? > > > > Nope. That seemed to be a hard requirement in the earlier discussions we > > had (~2 years ago). > > Oh, great. Somehow I missed the fact that that had been addressed. I > had assumed that we still needed global identifiers in which case I > think they'd need to be 64+ bits (preferably more like 128). If they > only need to be locally significant that makes things much better. Well, I was just talking about the 'short ids' here and how they are used in crash recovery/shmem et al. Those indeed don't need to be coordinated. If you ever use logical decoding on a system that receives changes from other systems (cascading replication, multimaster) you'll likely want to add the *long* form of that identifier to the output in the output plugin so the downstream nodes can identify the source. How one specific replication solution deals with coordinating this between systems is essentially that suite's problem. The external identifier currently is a 'text' column, so essentially unlimited. (Well, I just noticed that the table currently doesn't have a toast table assigned, so it's only a couple kb right now, but ...) > Is there any real reason to add a pg_replication_identifier table, or > should we just let individual replication solutions manage the > identifiers within their own configuration tables? I don't think that'd work. During crash recovery the short/internal IDs are read from WAL records and need to be unique across *all* databases. Since there's no way for different replication solutions or even the same to coordinate this across databases (as there's no way to add shared relations) it has to be builtin. It's also useful so we can have stuff like the 'pg_replication_identifier_progress' view which tells you internal_id, external_id, remote_lsn, local_lsn. Just showing the internal ID would imo be bad. > I guess one > question is: What happens if there are multiple replication solutions > in use on a single server? How do they coordinate? What's your concern here? You're wondering how they can make sure the identifiers they create are non-overlapping? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Jun 23, 2014 at 11:28 AM, Andres Freund <andres@2ndquadrant.com> wrote: >> Oh, great. Somehow I missed the fact that that had been addressed. I >> had assumed that we still needed global identifiers in which case I >> think they'd need to be 64+ bits (preferably more like 128). If they >> only need to be locally significant that makes things much better. > > Well, I was just talking about the 'short ids' here and how they are > used in crash recovery/shmem et al. Those indeed don't need to be > coordinated. > If you ever use logical decoding on a system that receives changes from > other systems (cascading replication, multimaster) you'll likely want to > add the *long* form of that identifier to the output in the output > plugin so the downstream nodes can identify the source. How one > specific replication solution deals with coordinating this between > systems is essentially that suite's problem. OK. > The external identifier currently is a 'text' column, so essentially > unlimited. (Well, I just noticed that the table currently doesn't have a > toast table assigned, so it's only a couple kb right now, but ...) OK. I have no clear reason to dislike that. >> Is there any real reason to add a pg_replication_identifier table, or >> should we just let individual replication solutions manage the >> identifiers within their own configuration tables? > > I don't think that'd work. During crash recovery the short/internal IDs > are read from WAL records and need to be unique across *all* > databases. Since there's no way for different replication solutions or > even the same to coordinate this across databases (as there's no way to > add shared relations) it has to be builtin. That makes sense. > It's also useful so we can have stuff like the > 'pg_replication_identifier_progress' view which tells you internal_id, > external_id, remote_lsn, local_lsn. Just showing the internal ID would > imo be bad. OK. >> I guess one >> question is: What happens if there are multiple replication solutions >> in use on a single server? How do they coordinate? > > What's your concern here? You're wondering how they can make sure the > identifiers they create are non-overlapping? Yeah, I was just thinking that might be why you installed a catalog table for this, but now I see that there are several other reasons also. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
<font><span style="background-color:rgba(255,255,255,0)">Hi,<br /><br />I send you review comment about thie patch.<br /><br/>I found no error/warning with compling and installation.<br />I have executed pg_resetxlog with some input pattern.<br/><br />$ initdb -D data -E UTF8 --no-locale<br />$ pg_controldata data | grep "Database system identifier"<br/>Database system identifier: 6028907917695471865<br /><br />--<br />$ pg_resetxlog -s -n data |grep "Database system identifier"<br /> Database system identifier: 6028907917695471865<br /><br />The -s optiondoes not works<span></span> fine with -n option.<br /><br />--<br />$ pg_resetxlog -s6028907917695471865111111111111111111111111111111111111111111111111111111data<br /> Transaction log reset<br />$ pg_controldatadata | grep "Database system identifier"<br />Database system identifier: 18446744073709551615<br/><br />pg_resetxlog is finished successfully, but system identifier was not changed.<br/> Also I think that checking data about number of digits is needed.<br /><br />regards<br />--<br />Sawada Masahiko</span></font><br/><br />On Thursday, June 19, 2014, Petr Jelinek <<a href="mailto:petr@2ndquadrant.com">petr@2ndquadrant.com</a>>wrote:<br /><blockquote class="gmail_quote" style="margin:00 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On 18/06/14 19:26, Robert Haas wrote:<br /><blockquoteclass="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> On Wed, Jun 18, 2014at 12:54 PM, Andres Freund <<a>andres@2ndquadrant.com</a>> wrote:<br /><blockquote class="gmail_quote" style="margin:00 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> I don't see how the proposed ability makes it moredangerous. It<br /> *already* has the ability to reset the timelineid. That's the case where<br /> users are much morelikely to think about resetting it because that's<br /> much more plausible than taking a unrelated cluster and resettingits<br /> sysid, timeline and LSN.<br /></blockquote><br /> All right, well, I've said my piece. I don't have anythingto add to<br /> that that isn't sheer repetition. My vote is to hold off on this<br /> until we've talked aboutreplication identifiers and other related<br /> topics in more depth. But if that position doesn't garner majority<br/> support ... so be it!<br /><br /></blockquote><br /> I am not sure I get what does this have to do with replicationidentifiers. The patch has several use-cases, one of them has to do that you can know the future system id beforeyou set it, which is useful for automating some things...<br /><br /> -- <br /> Petr Jelinek <a href="http://www.2ndQuadrant.com/"target="_blank">http://www.2ndQuadrant.com/</a><br /> PostgreSQL Development, 24x7 Support,Training & Services<br /><br /><br /> -- <br /> Sent via pgsql-hackers mailing list (<a>pgsql-hackers@postgresql.org</a>)<br/> To make changes to your subscription:<br /><a href="http://www.postgresql.org/mailpref/pgsql-hackers" target="_blank">http://www.postgresql.org/<u></u>mailpref/pgsql-hackers</a><br/></blockquote><br /><br />-- <br />Regards,<br/><br />-------<br />Sawada Masahiko<br /><br />
<div dir="ltr"><br /><div class="gmail_extra"><br /><br /><div class="gmail_quote">On Thu, Jun 26, 2014 at 2:43 AM, SawadaMasahiko <span dir="ltr"><<a href="mailto:sawada.mshk@gmail.com" target="_blank">sawada.mshk@gmail.com</a>></span>wrote:<br /><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px#ccc solid;padding-left:1ex"><font><span style="background-color:rgba(255,255,255,0)">Hi,<br /><br />Isend you review comment about thie patch.<br /><br /> I found no error/warning with compling and installation.<br />Ihave executed pg_resetxlog with some input pattern.<br /><br />$ initdb -D data -E UTF8 --no-locale<br />$ pg_controldatadata | grep "Database system identifier"<br />Database system identifier: 6028907917695471865<br/><br />--<br />$ pg_resetxlog -s -n data | grep "Database system identifier"<br /> Databasesystem identifier: 6028907917695471865<br /><br />The -s option does not works<span></span> fine with -noption.<br /><br />--<br />$ pg_resetxlog -s6028907917695471865111111111111111111111111111111111111111111111111111111 data<br/> Transaction log reset<br />$ pg_controldata data | grep "Database system identifier"<br />Database system identifier: 18446744073709551615<br /><br />pg_resetxlog is finished successfully, but system identifier was notchanged.<br /> Also I think that checking data about number of digits is needed.<br /></span></font></blockquote><spanclass="HOEnZb"><font color="#888888"><br /></font></span></div><div class="gmail_quote"><spanclass="HOEnZb"><font color="#888888"> Yep, system_identifier is a uint64, and the input you aregiving here is incompatible with that.<br /></font></span></div>-- <br />Michael<br /></div></div>
On 25/06/14 19:43, Sawada Masahiko wrote: > Hi, > > I send you review comment about thie patch. Hello, thanks. > -- > $ pg_resetxlog -s -n data | grep "Database system identifier" > Database system identifier: 6028907917695471865 > > The -s option does not worksfine with -n option. Fixed. > -- > $ pg_resetxlog > -s6028907917695471865111111111111111111111111111111111111111111111111111111 > data > Transaction log reset > $ pg_controldata data | grep "Database system identifier" > Database system identifier: 18446744073709551615 > > pg_resetxlog is finished successfully, but system identifier was not > changed. > Also I think that checking data about number of digits is needed. > It actually did change the identifier, just to ULONG_MAX, since that's the maximum value that can be set (scanf does that conversion), I added check that limits the maximum value of system identifier input to ULONG_MAX-1 and reports error if it's bigger. I also added some additional input validation. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
<font><span style="background-color:rgba(255,255,255,0)">Thank you for updating the patch.<br />I think that the followingbehaviour of pg_resetxlog is bug.<br /><br />$ pg_controldata data | grep "Database system identifier"<br /> Databasesystem identifier: 6029284919152642525<br /><br />--<br />$ pg_resetxlog -s0 -n data<br />Current pg_controlvalues:<br /><br />pg_control version number: 942<br />Catalog version number: <a href="tel:201406181">201406181</a><br/> Database system identifier: 6029284919152642525<br />Latest checkpoint'sTimeLineID: 1<br />Latest checkpoint's full_page_writes: on<br />Latest checkpoint's NextXID: 0/1810<br/>Latest checkpoint's NextOID: 13004<br /> Latest checkpoint's NextMultiXactId: 1<br />Latestcheckpoint's NextMultiOffset: 0<br />Latest checkpoint's oldestXID: 1800<br />Latest checkpoint's oldestXID'sDB: 1<br />Latest checkpoint's oldestActiveXID: 0<br /> Latest checkpoint's oldestMultiXid: 1<br />Latestcheckpoint's oldestMulti's DB: 1<br />Maximum data alignment: 8<br />Database block size: 8192<br/>Blocks per segment of large relation: 131072<br /> WAL block size: 8192<br/>Bytes per WAL segment: <a href="tel:16777216">16777216</a><br />Maximum lengthof identifiers: 64<br />Maximum columns in an index: 32<br />Maximum size of a TOAST chunk: 1996<br/> Size of a large-object chunk: 2048<br />Date/time type storage: 64-bit integers<br/>Float4 argument passing: by value<br />Float8 argument passing: by value<br />Datapage checksum version: 0<br /><br /><br />Values to be changed:<br /><br />First log segment after reset: 000000010000000000000002<br /><br />--<br />$ pg_resetxlog -s0 data<br />Transaction log reset<br /><span></span>$pg_controldata data | grep "Database system identifier"<br /> Database system identifier: 6029284919152642525<br/><br />this patch dose not works fine with -s0.<br /><br />Regards,<br />--<br />SawadaMasahiko</span></font><br /><br />On Thursday, June 26, 2014, Petr Jelinek <<a href="mailto:petr@2ndquadrant.com">petr@2ndquadrant.com</a>>wrote:<br /><blockquote class="gmail_quote" style="margin:00 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On 25/06/14 19:43, Sawada Masahiko wrote:<br /><blockquoteclass="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> Hi,<br /><br /> Isend you review comment about thie patch.<br /></blockquote><br /> Hello, thanks.<br /><br /><blockquote class="gmail_quote"style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> --<br /> $ pg_resetxlog -s -n data| grep "Database system identifier"<br /> Database system identifier: 6028907917695471865<br /><br /> The-s option does not worksfine with -n option.<br /></blockquote><br /> Fixed.<br /><br /><blockquote class="gmail_quote"style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> --<br /> $ pg_resetxlog<br /> -<u></u>s60289079176954718651111111111<u></u>111111111111111111111111111111<u></u>11111111111111<br/> data<br /> Transactionlog reset<br /> $ pg_controldata data | grep "Database system identifier"<br /> Database system identifier: 18446744073709551615<br /><br /> pg_resetxlog is finished successfully, but system identifier was not<br /> changed.<br/> Also I think that checking data about number of digits is needed.<br /><br /></blockquote><br /> It actuallydid change the identifier, just to ULONG_MAX, since that's the maximum value that can be set (scanf does that conversion),I added check that limits the maximum value of system identifier input to ULONG_MAX-1 and reports error if it'sbigger. I also added some additional input validation.<br /><br /> -- <br /> Petr Jelinek <a href="http://www.2ndQuadrant.com/"target="_blank">http://www.2ndQuadrant.com/</a><br /> PostgreSQL Development, 24x7 Support,Training & Services<br /></blockquote><br /><br />-- <br />Regards,<br /><br />-------<br />Sawada Masahiko<br/><br />
On 26/06/14 19:57, Sawada Masahiko wrote: > $ pg_resetxlog -s0 data > Transaction log reset > $ pg_controldata data | grep "Database system identifier" > Database system identifier: 6029284919152642525 > > this patch dose not works fine with -s0. > Yes, this is a bug, 0 input should throw error, which it does now. Also based on Alvaro's comment, I replaced the scanf parsing code with strtoul(l) function. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
At 2014-06-27 00:51:02 +0200, petr@2ndquadrant.com wrote: > > Also based on Alvaro's comment, I replaced the scanf parsing code with > strtoul(l) function. As far as I can tell (from the thread and a quick look at the patch), your latest version of the patch addresses all the review comments. Should I mark this ready for committer now? -- Abhijit
On 2014-06-29 19:44:21 +0530, Abhijit Menon-Sen wrote: > At 2014-06-27 00:51:02 +0200, petr@2ndquadrant.com wrote: > > > > Also based on Alvaro's comment, I replaced the scanf parsing code with > > strtoul(l) function. > > As far as I can tell (from the thread and a quick look at the patch), > your latest version of the patch addresses all the review comments. > > Should I mark this ready for committer now? Well, we haven't really agreed on a way forward yet. Robert disagrees that the feature is independently useful and thinks it might be dangerous for some users. I don't agree with either of those points, but that doesn't absolve the patch from the objection. I think tentatively have been more people in favor, but it's not clear cut imo. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Sun, Jun 29, 2014 at 11:18 PM, Andres Freund <andres@2ndquadrant.com> wrote: > On 2014-06-29 19:44:21 +0530, Abhijit Menon-Sen wrote: >> At 2014-06-27 00:51:02 +0200, petr@2ndquadrant.com wrote: >> > >> > Also based on Alvaro's comment, I replaced the scanf parsing code with >> > strtoul(l) function. >> >> As far as I can tell (from the thread and a quick look at the patch), >> your latest version of the patch addresses all the review comments. >> >> Should I mark this ready for committer now? > > Well, we haven't really agreed on a way forward yet. Robert disagrees > that the feature is independently useful and thinks it might be > dangerous for some users. I don't agree with either of those points, but > that doesn't absolve the patch from the objection. I think tentatively > have been more people in favor, but it's not clear cut imo. So what's the usecase of this feature? If it's for "normal operation", using pg_resetxlog for that is a bit dangerous because it can corrupt the database easily. Regards, -- Fujii Masao
On 2014-06-30 19:57:58 +0900, Fujii Masao wrote: > On Sun, Jun 29, 2014 at 11:18 PM, Andres Freund <andres@2ndquadrant.com> wrote: > > On 2014-06-29 19:44:21 +0530, Abhijit Menon-Sen wrote: > >> At 2014-06-27 00:51:02 +0200, petr@2ndquadrant.com wrote: > >> > > >> > Also based on Alvaro's comment, I replaced the scanf parsing code with > >> > strtoul(l) function. > >> > >> As far as I can tell (from the thread and a quick look at the patch), > >> your latest version of the patch addresses all the review comments. > >> > >> Should I mark this ready for committer now? > > > > Well, we haven't really agreed on a way forward yet. Robert disagrees > > that the feature is independently useful and thinks it might be > > dangerous for some users. I don't agree with either of those points, but > > that doesn't absolve the patch from the objection. I think tentatively > > have been more people in favor, but it's not clear cut imo. > > So what's the usecase of this feature? If it's for "normal operation", > using pg_resetxlog for that is a bit dangerous because it can corrupt > the database easily. a) Mark a database as not being the same. Currently if you promote two databases, e.g. to shard out, they'll continue tohave same system identifier. Which really sucks, especially because timeline ids will often increase synchronously. b) For data recovery it's sometimes useful to create a new database (with the same catalog state) and replay all WAL. Forthat you need to reset the system identifier. I've done so hacking up resetxlog before. c) We already allow to set pretty much all aspects of the control file via resetxlog - there seems little point of not havingthe ability to change the system identifier. d) In a logical replication scenario one way to identify individual nodes is via the system identifier. If you want to converta basebackup into logical standby one sensible way to do so is to create a logical replication slots *before* promotinga physical backup to guarantee that slot is able to stream out all changes. If the slot names contain the consumer'ssystem identifier you need to know the new system identifier beforehand. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund wrote: > c) We already allow to set pretty much all aspects of the control file > via resetxlog - there seems little point of not having the ability to > change the system identifier. I think it's pretty much a given that pg_resetxlog is a tool that can have disastrous effects if used lightly. If people changes their sysid wrongly, they're not any worse than if they change their multixact counters and start getting failures because the old values stored in data cannot be resolved anymore ("it's already been wrapped around"). Or if they remove all the XLOG they have since the latest crash. From that POV, I don't think the objection that "but this can be used to corrupt data!" has any value. Also on the other hand pg_resetxlog is already a tool for PG hackers to fool around and test things. In a way, being able to change values in pg_control is useful to many of us; this is only an extension of that. If we only had bricks and mortar, I think we would have a tool to display and tweak pg_control separately from emptying pg_xlog, rather than this odd separation between pg_controldata and pg_resetxlog, each of which do a mixture of those things. But we have a wall two thirds done already, so it seems to make more sense to me to continue forward rather than tear it down and start afresh. This patch is a natural extension of what we already have. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Jun 30, 2014 at 12:46 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > I think it's pretty much a given that pg_resetxlog is a tool that can > have disastrous effects if used lightly. If people changes their sysid > wrongly, they're not any worse than if they change their multixact > counters and start getting failures because the old values stored in > data cannot be resolved anymore ("it's already been wrapped around"). > Or if they remove all the XLOG they have since the latest crash. From > that POV, I don't think the objection that "but this can be used to > corrupt data!" has any value. After thinking about this a little more, I guess I don't really think it's a bit problem either - so consider my objection withdrawn. I am, however, kind of frustrated, still, that the pg_computemaxlsn patch, which I thought was rather a good idea, was scuttled by the essentially that same objection: let's not extend pg_resetxlog & friends because people might use the new functionality to do bad things and then blame us. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas wrote: > On Mon, Jun 30, 2014 at 12:46 PM, Alvaro Herrera > <alvherre@2ndquadrant.com> wrote: > > I think it's pretty much a given that pg_resetxlog is a tool that can > > have disastrous effects if used lightly. If people changes their sysid > > wrongly, they're not any worse than if they change their multixact > > counters and start getting failures because the old values stored in > > data cannot be resolved anymore ("it's already been wrapped around"). > > Or if they remove all the XLOG they have since the latest crash. From > > that POV, I don't think the objection that "but this can be used to > > corrupt data!" has any value. > > After thinking about this a little more, I guess I don't really think > it's a bit problem either - so consider my objection withdrawn. Great. > I am, however, kind of frustrated, still, that the pg_computemaxlsn > patch, which I thought was rather a good idea, was scuttled by the > essentially that same objection: let's not extend pg_resetxlog & > friends because people might use the new functionality to do bad > things and then blame us. Uh, I thought you killed that patch yourself: http://www.postgresql.org/message-id/CA+TgmoZqbYHWYbi18FUk-+dGHig=icV+pj-NNav3miy4dajgrQ@mail.gmail.com -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 2014-07-01 11:11:12 -0400, Robert Haas wrote: > On Mon, Jun 30, 2014 at 12:46 PM, Alvaro Herrera > <alvherre@2ndquadrant.com> wrote: > > I think it's pretty much a given that pg_resetxlog is a tool that can > > have disastrous effects if used lightly. If people changes their sysid > > wrongly, they're not any worse than if they change their multixact > > counters and start getting failures because the old values stored in > > data cannot be resolved anymore ("it's already been wrapped around"). > > Or if they remove all the XLOG they have since the latest crash. From > > that POV, I don't think the objection that "but this can be used to > > corrupt data!" has any value. > > After thinking about this a little more, I guess I don't really think > it's a bit problem either - so consider my objection withdrawn. Thanks! > I am, however, kind of frustrated, still, that the pg_computemaxlsn > patch, which I thought was rather a good idea, was scuttled by the > essentially that same objection: let's not extend pg_resetxlog & > friends because people might use the new functionality to do bad > things and then blame us. Well, the reasons were a bit different. Senior community members repeatedly suggested that it'd be usable for faillback - and it's not a unreasonable to think it is. Even though it'd fail subtly because of hint bit and related reasons. In contrast you have to be pretty desperate to think that you could make two clusters replicate from each other by just fudging pg_control long enough, even if the clusters aren't actually related. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Jul 1, 2014 at 11:19 AM, Andres Freund <andres@2ndquadrant.com> wrote: >> I am, however, kind of frustrated, still, that the pg_computemaxlsn >> patch, which I thought was rather a good idea, was scuttled by the >> essentially that same objection: let's not extend pg_resetxlog & >> friends because people might use the new functionality to do bad >> things and then blame us. > > Well, the reasons were a bit different. Senior community members > repeatedly suggested that it'd be usable for faillback - and it's not a > unreasonable to think it is. Even though it'd fail subtly because of > hint bit and related reasons. > In contrast you have to be pretty desperate to think that you could make > two clusters replicate from each other by just fudging pg_control long > enough, even if the clusters aren't actually related. Well, I still think it's pretty likely someone could make that mistake here, too. Maybe not a senior community member, but somebody. However, I think the right answer in that case and this one is to tell the person who has made the mistake "you screwed up" and move on. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2014-06-27 00:51:02 +0200, Petr Jelinek wrote: > - while ((c = getopt(argc, argv, "fl:m:no:O:x:e:")) != -1) > + while ((c = getopt(argc, argv, "fl:m:no:O:x:e:s::")) != -1) Why two :? > { > switch (c) > { > @@ -227,6 +229,33 @@ main(int argc, char *argv[]) > XLogFromFileName(optarg, &minXlogTli, &minXlogSegNo); > break; > > + case 's': > + if (optarg) > + { > +#ifdef HAVE_STRTOULL > + set_sysid = strtoull(optarg, &endptr, 10); > +#else > + set_sysid = strtoul(optarg, &endptr, 10); > +#endif Wouldn't it be better to use sscanf()? That should work with large inputs across platforms. > + /* > + * Validate input, we use strspn because strtoul(l) accepts > + * negative numbers and silently converts them to unsigned > + */ > + if (set_sysid == 0 || errno != 0 || > + endptr == optarg || *endptr != '\0' || > + strspn(optarg, "0123456789") != strlen(optarg)) > + { > + fprintf(stderr, _("%s: invalid argument for option -s\n"), progname); > + fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname); > + exit(1); > + } Maybe: 'invalid argument \"%s\" ...'? > @@ -1087,6 +1147,7 @@ usage(void) > printf(_(" -o OID set next OID\n")); > printf(_(" -O OFFSET set next multitransaction offset\n")); > printf(_(" -V, --version output version information, then exit\n")); > + printf(_(" -s [SYSID] set system identifier (or generate one)\n")); > printf(_(" -x XID set next transaction ID\n")); > printf(_(" -?, --help show this help, then exit\n")); > printf(_("\nReport bugs to <pgsql-bugs@postgresql.org>.\n")); I think we usually try to keep the options roughly alphabetically ordered. Same in the getopt() above. Greetings, Andres Freund
On 2014-07-18 00:41:05 +0200, Andres Freund wrote: > On 2014-06-27 00:51:02 +0200, Petr Jelinek wrote: > > - while ((c = getopt(argc, argv, "fl:m:no:O:x:e:")) != -1) > > + while ((c = getopt(argc, argv, "fl:m:no:O:x:e:s::")) != -1) > > Why two :? Obviously strike that, wanted to delete the paragraph, but forgot about it. I hadn't remembered the autogeneration of sysids at the beginning of the patch. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 06/30/2014 09:46 AM, Alvaro Herrera wrote: > If we only had bricks and mortar, I think we would have a tool to > display and tweak pg_control separately from emptying pg_xlog, rather > than this odd separation between pg_controldata and pg_resetxlog, each > of which do a mixture of those things. But we have a wall two thirds > done already, so it seems to make more sense to me to continue forward > rather than tear it down and start afresh. This patch is a natural > extension of what we already have. As I said previously, I disagree. Being able to set a system identifier at runtime is useful for a variety of scenarios which do not involve database surgery or the risk of wiping out your data; in fact, the only bad thing you can do by changing the system id is break the replication connection. As such, tying a change in system id to pg_resetxlog is kind of like having a combination dental floss dispenser and bazooka. "No, not *that* trigger!" It pretty much guarantees that the ability to change the systemid, which could be a generally useful feature with all kinds of application for HA, clusters and sharding, would remain an internal feature of BDR only, because everyone else will be afraid to touch it. If this means that we need to finally create pg_editcontroldata, then so be it. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 18/07/14 00:41, Andres Freund wrote: > On 2014-06-27 00:51:02 +0200, Petr Jelinek wrote: >> { >> switch (c) >> { >> @@ -227,6 +229,33 @@ main(int argc, char *argv[]) >> XLogFromFileName(optarg, &minXlogTli, &minXlogSegNo); >> break; >> >> + case 's': >> + if (optarg) >> + { >> +#ifdef HAVE_STRTOULL >> + set_sysid = strtoull(optarg, &endptr, 10); >> +#else >> + set_sysid = strtoul(optarg, &endptr, 10); >> +#endif > > Wouldn't it be better to use sscanf()? That should work with large > inputs across platforms. > Well, sscanf does not do much validation and silently truncates too big input (which is why I replaced it with strtoull, original patch did have sscanf), also I think the portability of sscanf might not be as good, see 9d7ded0f4277f5c0063eca8e871a34e2355a8371 commit. >> + /* >> + * Validate input, we use strspn because strtoul(l) accepts >> + * negative numbers and silently converts them to unsigned >> + */ >> + if (set_sysid == 0 || errno != 0 || >> + endptr == optarg || *endptr != '\0' || >> + strspn(optarg, "0123456789") != strlen(optarg)) >> + { >> + fprintf(stderr, _("%s: invalid argument for option -s\n"), progname); >> + fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname); >> + exit(1); >> + } > > Maybe: 'invalid argument \"%s\" ...'? > Ok >> @@ -1087,6 +1147,7 @@ usage(void) >> printf(_(" -o OID set next OID\n")); >> printf(_(" -O OFFSET set next multitransaction offset\n")); >> printf(_(" -V, --version output version information, then exit\n")); >> + printf(_(" -s [SYSID] set system identifier (or generate one)\n")); >> printf(_(" -x XID set next transaction ID\n")); >> printf(_(" -?, --help show this help, then exit\n")); >> printf(_("\nReport bugs to <pgsql-bugs@postgresql.org>.\n")); > > I think we usually try to keep the options roughly alphabetically > ordered. Same in the getopt() above. > Ok, attached v4 with those changes. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On 2014-07-18 13:08:24 +0200, Petr Jelinek wrote: > On 18/07/14 00:41, Andres Freund wrote: > >On 2014-06-27 00:51:02 +0200, Petr Jelinek wrote: > >> { > >> switch (c) > >> { > >>@@ -227,6 +229,33 @@ main(int argc, char *argv[]) > >> XLogFromFileName(optarg, &minXlogTli, &minXlogSegNo); > >> break; > >> > >>+ case 's': > >>+ if (optarg) > >>+ { > >>+#ifdef HAVE_STRTOULL > >>+ set_sysid = strtoull(optarg, &endptr, 10); > >>+#else > >>+ set_sysid = strtoul(optarg, &endptr, 10); > >>+#endif > > > >Wouldn't it be better to use sscanf()? That should work with large > >inputs across platforms. > > > > Well, sscanf does not do much validation and silently truncates too big > input (which is why I replaced it with strtoull, original patch did have > sscanf) Well, the checks on length you've added should catch that when adapted properly. >, also I think the portability of sscanf might not be as good, see > 9d7ded0f4277f5c0063eca8e871a34e2355a8371 commit. Fair point. But I don't think the arguments why using strtoul instead of strtoull is safe hold much water here. In the pg_stat_statement case the worst that could happen is that the rowcount isn't accurate. Here it's setting a wrong system identifier. Not really the same. Maybe it's time to pursue http://archives.postgresql.org/message-id/20140603144654.GM24145%40awork2.anarazel.de further :( Greetings, Andres Freund
On 18/07/14 13:18, Andres Freund wrote: > On 2014-07-18 13:08:24 +0200, Petr Jelinek wrote: >> On 18/07/14 00:41, Andres Freund wrote: >>> Wouldn't it be better to use sscanf()? That should work with large >>> inputs across platforms. >>> >> >> Well, sscanf does not do much validation and silently truncates too big >> input (which is why I replaced it with strtoull, original patch did have >> sscanf) > > Well, the checks on length you've added should catch that when adapted > properly. I don't see a portable way how to validate too big input values when using sscanf. > > Maybe it's time to pursue > http://archives.postgresql.org/message-id/20140603144654.GM24145%40awork2.anarazel.de > further :( > Yes, I've seen that patch and was quite sad that it didn't make it in core. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 06/18/2014 09:17 PM, Josh Berkus wrote: > On 06/18/2014 11:03 AM, Andres Freund wrote: >> Well, all those actually do write to the xlog (to write a new >> checkpoint, containing the updated control file). Since pg_resetxlog has >> done all this pretty much since forever renaming it now seems to be a >> big hassle for users for pretty much no benefit? This isn't a tool the >> average user should ever touch. > > If we're using it to create a unique system ID which can be used to > orchestrate replication and clustering systems, a lot more people are > going to be touching it than ever did before -- and not just for BDR. I think pg_resetxlog is still appropriate: changing the system ID will reset the WAL. In particular, any archived WAL will become useless. But yeah, this does change the target audience of pg_resetxlog significantly. Now that we'll have admins running pg_resetxlog as part of normal operations, we have to document it carefully. We also have to design the user interface carefully, to make it more clear that while resetting the system ID won't eat your data, some of the other settings will. The proposed "pg_resetxlog --help" output looks like this: > pg_resetxlog resets the PostgreSQL transaction log. > > Usage: > pg_resetxlog [OPTION]... DATADIR > > Options: > -e XIDEPOCH set next transaction ID epoch > -f force update to be done > -l XLOGFILE force minimum WAL starting location for new transaction log > -m MXID,MXID set next and oldest multitransaction ID > -n no update, just show what would be done (for testing) > -o OID set next OID > -O OFFSET set next multitransaction offset > -s [SYSID] set system identifier (or generate one) > -V, --version output version information, then exit > -x XID set next transaction ID > -?, --help show this help, then exit > > Report bugs to <pgsql-bugs@postgresql.org>. I don't think that's good enough. The -s option should be displayed more prominently, and the dangerous options like -l and -x should be more clearly labeled as such. I would de-emphasize setting the system ID to a particular value. It might be useful for disaster recovery, like -x, but in general you should always reset it to a new unique value. If you make it too easy to set it to a particular value, someone will try initialize a streaming standby server using initdb+pg_dump, and changing the system ID to match the master's. The user manual for pg_resetxlog says: > pg_resetxlog clears the write-ahead log (WAL) and optionally resets > some other control information stored in the pg_control file. This > function is sometimes needed if these files have become corrupted. It > should be used only as a last resort, when the server will not start > due to such corruption. That's clearly not true for the -s option. In summary, I think we want this feature in some form, but we'll somehow need to be make the distinction to the dangerous pg_resetxlog usage. It might be best, after all, to make this a separate utility, pg_resetsystemid. It would not need to have the capability to set the system ID to a particular value, only a randomly assigned one (setting it to a particular value could be added to pg_resetxlog, where other dangerous options are). - Heikki
Heikki Linnakangas <hlinnakangas@vmware.com> writes: > In summary, I think we want this feature in some form, but we'll somehow > need to be make the distinction to the dangerous pg_resetxlog usage. It > might be best, after all, to make this a separate utility, > pg_resetsystemid. That sounds fairly reasonable given your point about not wanting people to confuse this with the can-eat-your-data aspects of pg_resetxlog. (OTOH, won't this result in a lot of code duplication? We'd still need to erase and refill the WAL area.) > It would not need to have the capability to set the > system ID to a particular value, only a randomly assigned one (setting > it to a particular value could be added to pg_resetxlog, where other > dangerous options are). I'm less convinced about that. While you can shoot yourself in the foot by assigning the same system ID to two installations that share WAL archive or something like that, this feels a bit different than the ways you can shoot yourself in the foot with pg_resetxlog. If we do what you say here then I think we'll be right back to the discussion of how to separate the assign-a-sysID option from pg_resetxlog's other, more dangerous options. regards, tom lane
On August 25, 2014 9:45:50 PM CEST, Tom Lane <tgl@sss.pgh.pa.us> wrote: >Heikki Linnakangas <hlinnakangas@vmware.com> writes: >> In summary, I think we want this feature in some form, but we'll >somehow >> need to be make the distinction to the dangerous pg_resetxlog usage. >It >> might be best, after all, to make this a separate utility, >> pg_resetsystemid. > >That sounds fairly reasonable given your point about not wanting people >to >confuse this with the can-eat-your-data aspects of pg_resetxlog. >(OTOH, >won't this result in a lot of code duplication? We'd still need to >erase >and refill the WAL area.) Let's move pg-controldata, resetxlog, resetsysid into a common src/bin directory. Then we can unify the relevant code betweenall three. -- Please excuse brevity and formatting - I am writing this on my mobile phone. Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 08/25/2014 10:45 PM, Tom Lane wrote: > Heikki Linnakangas <hlinnakangas@vmware.com> writes: >> It would not need to have the capability to set the >> system ID to a particular value, only a randomly assigned one (setting >> it to a particular value could be added to pg_resetxlog, where other >> dangerous options are). > > I'm less convinced about that. While you can shoot yourself in the foot > by assigning the same system ID to two installations that share WAL > archive or something like that, this feels a bit different than the ways > you can shoot yourself in the foot with pg_resetxlog. If we do what you > say here then I think we'll be right back to the discussion of how to > separate the assign-a-sysID option from pg_resetxlog's other, more > dangerous options. I don't see the use case for setting system id to a particular value. Andres listed four use cases upthread: > a) Mark a database as not being the same. Currently if you promote two > databases, e.g. to shard out, they'll continue to have same system > identifier. Which really sucks, especially because timeline ids will > often increase synchronously. Yes, this is the legitimate use case a DBA would use this feature for. Resetting the system ID to a random value suffices. > b) For data recovery it's sometimes useful to create a new database > (with the same catalog state) and replay all WAL. For that you need to > reset the system identifier. I've done so hacking up resetxlog > before. This falls squarely in the "dangerous" category, and you'll have to reset other things than the system ID to make it work. Having the option in pg_resetxlog is fine for this. > c) We already allow to set pretty much all aspects of the control file > via resetxlog - there seems little point of not having the ability to > change the system identifier. Ok, but it's not something a regular admin would ever use. > d) In a logical replication scenario one way to identify individual > nodes is via the system identifier. If you want to convert a > basebackup into logical standby one sensible way to do so is to > create a logical replication slots *before* promoting a physical > backup to guarantee that slot is able to stream out all changes. If > the slot names contain the consumer's system identifier you need to > know the new system identifier beforehand. I didn't understand this one. But it seems like the obvious solution is to not use the consumer's system identifier as the slot name. Or rename it afterwards. - Heikki
On Mon, Aug 25, 2014 at 4:06 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote: > I didn't understand this one. But it seems like the obvious solution is to > not use the consumer's system identifier as the slot name. Or rename it > afterwards. You can't use the consumer's system identifier as the slot name, because you have to create the slot before you create the consumer. But you could rename it afterwards, or just use some other naming convention entirely, which is why I'm -0.25 on this whole proposal. What the 2ndQuadrant folks are proposing is not unreasonable (which is why I'm only -0.25) but it opens an (admittedly small) can of worms that I see no real need to open. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company