Thread: [HACKERS] Using non-sequential timelines in order to help with possible collisions
[HACKERS] Using non-sequential timelines in order to help with possible collisions
From
Brian Faherty
Date:
Hey hackers,
I was working with replication and recovery the other day and noticed that there were scenarios where I could cause multiple servers to enter the same timeline while possibly having divergent data. One such scenario is Master A and Replica B are both on timeline 1. There is an event that causes Replica B to become promoted which changes it to timeline 2. Following this, you perform a restore on Master A to a point before the event happened. Once Postgres completes this recovery on Master A, it will switch over to timeline 2. There are now WAL files that have been written to timeline 2 from both servers.
From this scenario, I would like to suggest considering using non-sequential timelines. From what I have investigated so far, I believe the *.history files in the WAL directory already have all the timeline IDs in them and are in order. If we could make those timeline IDs a bit more unique/random, and still rely on the ordering in the *.history file, I think this would help prevent multiple servers on the same timeline with divergent data.
I was hoping to begin a conversation on whether or not non-sequential timelines are a good idea before I looked at the code around timelines.
--
Brian Faherty
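As background for the *.history files mentioned above: a timeline history file (for example 00000003.history) lists the ancestors of a timeline, one per line, as a parent timeline ID, a switch point, and a reason. The following is a minimal Python sketch of reading that ordering. The sample contents and the name parse_history are made up for illustration, and this is not PostgreSQL's own parser.

```python
from typing import List, Tuple

# Made-up contents of a hypothetical 00000003.history file.
# Each non-comment line: parent timeline ID, switch-point LSN, reason.
SAMPLE_HISTORY = (
    "1\t0/3000098\tno recovery target specified\n"
    "\n"
    "2\t0/5000060\tno recovery target specified\n"
)


def parse_history(text: str) -> List[Tuple[int, str, str]]:
    """Return (parent_tli, switch_lsn, reason) tuples in file order."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        tli, lsn, reason = line.split("\t", 2)
        entries.append((int(tli), lsn, reason))
    return entries


# The ancestry is whatever order the file lists, not the numeric order,
# so it would survive a switch to non-sequential IDs.
ancestors = [tli for tli, _, _ in parse_history(SAMPLE_HISTORY)]
print(ancestors)
```

Because the ancestry comes from the order of the lines rather than from the numeric values, a reader of the file would not need the IDs themselves to be sequential.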
Re: [HACKERS] Using non-sequential timelines in order to help with possible collisions
From
Robert Haas
Date:
On Wed, Jul 19, 2017 at 11:23 AM, Brian Faherty <anothergenericuser@gmail.com> wrote:
> Hey hackers,
> I was working with replication and recovery the other day and noticed that
> there were scenarios where I could cause multiple servers to enter the same
> timeline while possibly having divergent data. One such scenario is Master A
> and Replica B are both on timeline 1. There is an event that causes Replica
> B to become promoted which changes it to timeline 2. Following this, you
> perform a restore on Master A to a point before the event happened. Once
> Postgres completes this recovery on Master A, it will switch over to
> timeline 2. There are now WAL files that have been written to timeline 2
> from both servers.
>
> From this scenario, I would like to suggest considering using non-sequential
> timelines. From what I have investigated so far, I believe the *.history
> files in the WAL directory already have all the timeline IDs in them and
> are in order. If we could make those timeline IDs a bit more
> unique/random, and still rely on the ordering in the *.history file, I think
> this would help prevent multiple servers on the same timeline with divergent
> data.
>
> I was hoping to begin a conversation on whether or not non-sequential
> timelines are a good idea before I looked at the code around timelines.

It's interesting that you bring this up. I've also wondered why we
don't use random TLIs. I suppose I'm internally assuming that it's
because the people who wrote the code are far more brilliant and
knowledgeable of this area than I could ever be and that doing
anything else would create some kind of awful problem, but maybe
that's not so.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Using non-sequential timelines in order to help with possible collisions
From
Michael Paquier
Date:
On Wed, Jul 19, 2017 at 7:00 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Jul 19, 2017 at 11:23 AM, Brian Faherty
> <anothergenericuser@gmail.com> wrote:
>> I was working with replication and recovery the other day and noticed that
>> there were scenarios where I could cause multiple servers to enter the same
>> timeline while possibly having divergent data. One such scenario is Master A
>> and Replica B are both on timeline 1. There is an event that causes Replica
>> B to become promoted which changes it to timeline 2. Following this, you
>> perform a restore on Master A to a point before the event happened. Once
>> Postgres completes this recovery on Master A, it will switch over to
>> timeline 2. There are now WAL files that have been written to timeline 2
>> from both servers.
>>
>> From this scenario, I would like to suggest considering using non-sequential
>> timelines. From what I have investigated so far, I believe the *.history
>> files in the WAL directory already have all the timeline IDs in them and
>> are in order. If we could make those timeline IDs a bit more
>> unique/random, and still rely on the ordering in the *.history file, I think
>> this would help prevent multiple servers on the same timeline with divergent
>> data.

It seems to me that you are missing one piece here: the history files
generated at the moment of the timeline bump. When recovery finishes,
an instance scans the archives, or the instances it is streaming from,
for history files, and chooses a timeline number that does not match
any existing one. So you are trying to avoid a problem that can easily
be solved with a proper archive, for example.

>> I was hoping to begin a conversation on whether or not non-sequential
>> timelines are a good idea before I looked at the code around timelines.
>
> It's interesting that you bring this up. I've also wondered why we
> don't use random TLIs. I suppose I'm internally assuming that it's
> because the people who wrote the code are far more brilliant and
> knowledgeable of this area than I could ever be and that doing
> anything else would create some kind of awful problem, but maybe
> that's not so.

I am not the only one who worked on that, but the resulting code is a
tad simpler, as it is possible to guess more easily some hierarchy for
the timelines, of course with the history files at hand.
--
Michael
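The selection rule described in this message can be sketched roughly as follows. This is a simplified Python model, not the actual PostgreSQL code: the function names and the representation of "visible history files" as a set of timeline IDs are assumptions made for illustration.

```python
from typing import Set


def newest_timeline(start_tli: int, visible_history: Set[int]) -> int:
    """Probe upward from start_tli while a history file for the next ID is visible."""
    newest = start_tli
    probe = start_tli + 1
    while probe in visible_history:
        newest = probe
        probe += 1
    return newest


def choose_new_timeline(current_tli: int, visible_history: Set[int]) -> int:
    """Pick the first ID above every timeline whose history file can be seen."""
    return newest_timeline(current_tli, visible_history) + 1


# Replica B is promoted from timeline 1 and writes a history file for its new timeline.
b_tli = choose_new_timeline(1, visible_history=set())
shared_archive = {b_tli}

# Master A is restored to a point on timeline 1, once without access to
# B's history file and once with a shared archive that contains it.
a_without_archive = choose_new_timeline(1, visible_history=set())
a_with_archive = choose_new_timeline(1, visible_history=shared_archive)

print(b_tli, a_without_archive, a_with_archive)
```

In this model the collision from the original scenario appears only when Master A cannot see the history file that Replica B produced; with a shared archive it moves on to the next free ID.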
Re: [HACKERS] Using non-sequential timelines in order to help with possible collisions
From
Tom Lane
Date:
Michael Paquier <michael.paquier@gmail.com> writes:
> On Wed, Jul 19, 2017 at 7:00 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> It's interesting that you bring this up. I've also wondered why we
>> don't use random TLIs. I suppose I'm internally assuming that it's
>> because the people who wrote the code are far more brilliant and
>> knowledgeable of this area than I could ever be and that doing
>> anything else would create some kind of awful problem, but maybe
>> that's not so.

> I am not the only one who worked on that, but the resulting code is a
> tad simpler, as it is possible to guess more easily some hierarchy for
> the timelines, of course with the history files at hand.

Yeah, right now you have the ability to guess that, say, timeline 42
is a descendant of 41, which you couldn't assume with random TLIs.
Also, the values are only 32 bits, which is not wide enough to allow
imagining that random() could be relied on to produce non-duplicate
values.

If we had separate database identifiers for slave installations, which
AFAIR we don't, it'd be possible to consider incorporating part of
the server ID into timeline IDs it creates, which would alleviate
Brian's issue I think. That is, instead of 1, 2, 3, ..., a server
might create 1xyz, 2xyz, 3xyz, ... where "xyz" are random digits
associated with the particular installation. This is obviously
not bulletproof since you could have collisions of the xyz's, but
it would help. Also you could imagine allowing DBAs to assign
distinct xyz codes to every slave in a given community.

			regards, tom lane
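Both points in this message can be illustrated with a short Python sketch. The first function uses the standard birthday approximation to estimate how likely a duplicate is among uniformly random 32-bit values; the second builds IDs of the "1xyz, 2xyz" shape. The function names, the three-digit suffix width, and the sample node codes are assumptions for illustration only.

```python
import math


def collision_probability(n_ids: int, bits: int = 32) -> float:
    """Birthday approximation: chance that n_ids uniform random IDs are not all distinct."""
    space = 2 ** bits
    return 1.0 - math.exp(-n_ids * (n_ids - 1) / (2.0 * space))


def suffixed_tli(sequence: int, node_code: int, digits: int = 3) -> int:
    """Build an ID like '2xyz': a sequence number followed by a fixed per-node code."""
    if not 0 <= node_code < 10 ** digits:
        raise ValueError("node_code does not fit in the suffix")
    tli = sequence * 10 ** digits + node_code
    if tli >= 2 ** 32:
        raise ValueError("timeline ID no longer fits in 32 bits")
    return tli


# Estimated duplicate probability for a few population sizes.
for n in (100, 10_000, 100_000):
    print(n, round(collision_probability(n), 4))

# Two installations with hypothetical codes 417 and 985, both on their second timeline.
print(suffixed_tli(2, 417), suffixed_tli(2, 985))
```

Under this approximation, roughly 77,000 random 32-bit values already give even odds of at least one duplicate. With the suffix scheme the leading digits still order the timelines created by one installation, while two installations collide only if they share the same code.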
Re: [HACKERS] Using non-sequential timelines in order to help with possible collisions
From
Michael Paquier
Date:
On Wed, Jul 19, 2017 at 8:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Michael Paquier <michael.paquier@gmail.com> writes:
>> On Wed, Jul 19, 2017 at 7:00 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> It's interesting that you bring this up. I've also wondered why we
>>> don't use random TLIs. I suppose I'm internally assuming that it's
>>> because the people who wrote the code are far more brilliant and
>>> knowledgeable of this area than I could ever be and that doing
>>> anything else would create some kind of awful problem, but maybe
>>> that's not so.
>
>> I am not the only one who worked on that, but the resulting code is a
>> tad simpler, as it is possible to guess more easily some hierarchy for
>> the timelines, of course with the history files at hand.
>
> Yeah, right now you have the ability to guess that, say, timeline 42
> is a descendant of 41, which you couldn't assume with random TLIs.
> Also, the values are only 32 bits, which is not wide enough to allow
> imagining that random() could be relied on to produce non-duplicate
> values.

pg_backend_random() perhaps? Any new code using random() would be
slashed quickly at review.

> If we had separate database identifiers for slave installations, which
> AFAIR we don't, it'd be possible to consider incorporating part of
> the server ID into timeline IDs it creates, which would alleviate
> Brian's issue I think. That is, instead of 1, 2, 3, ..., a server
> might create 1xyz, 2xyz, 3xyz, ... where "xyz" are random digits
> associated with the particular installation. This is obviously
> not bulletproof since you could have collisions of the xyz's, but
> it would help. Also you could imagine allowing DBAs to assign
> distinct xyz codes to every slave in a given community.

To be honest, I am not much in favor of complicating the timeline
name :) Having a unique identifier per node has value for other
purposes, like clustering, and we would get the same information by
adding to the history file the ID of the node that generated the new
timeline.
--
Michael
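The alternative suggested here, recording the generating node in the history file rather than in the timeline ID, might look like the sketch below. The extra "node=" field, the function name history_line, the node names, and the LSN values are all hypothetical; no such field exists in the history file format discussed in this thread.

```python
def history_line(parent_tli: int, switch_lsn: str, reason: str, node_id: str = "") -> str:
    """Format one tab-separated history entry; node_id is a hypothetical extra field."""
    fields = [str(parent_tli), switch_lsn, reason]
    if node_id:
        fields.append(f"node={node_id}")
    return "\t".join(fields)


# Two servers that both branched off timeline 1 and both call the result timeline 2.
line_b = history_line(1, "0/3000098", "promoted", node_id="replica-b")
line_a = history_line(1, "0/2000028", "restored", node_id="master-a")

print(line_b)
print(line_a)

# The timeline IDs stay sequential, but the two histories can be told apart.
print(line_a.split("\t")[-1] != line_b.split("\t")[-1])
```

The timeline numbering is left untouched in this variant; the node field only makes it possible to detect after the fact that two same-numbered timelines came from different servers.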