Thread: [HACKERS] Using non-sequential timelines in order to help with possible collisions

Hey hackers,
  I was working with replication and recovery the other day and noticed that there were scenarios where I could cause multiple servers to enter the same timeline while possibly having divergent data. One such scenario is Master A and Replica B are both on timeline 1. There is an event that causes Replica B to be promoted, which moves it to timeline 2. Following this, you perform a restore on Master A to a point before the event happened. Once Postgres completes this recovery on Master A, it will switch over to timeline 2. There are now WAL files that have been written to timeline 2 from both servers.
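
The collision can be sketched as a small simulation. It assumes each server picks its new timeline as one more than the highest timeline it knows about, and that neither server can see the other's history files. The `Server` class and its `end_recovery` method are hypothetical, for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical model of a server: its current timeline and the
    timeline IDs whose history it can see (locally or via an archive)."""
    name: str
    timeline: int = 1
    known_timelines: set = field(default_factory=lambda: {1})

    def end_recovery(self) -> int:
        """Switch to a new timeline: one more than the highest known ID."""
        new_tli = max(self.known_timelines) + 1
        self.known_timelines.add(new_tli)
        self.timeline = new_tli
        return new_tli


master_a = Server("A")
replica_b = Server("B")

# Replica B is promoted: it moves from timeline 1 to timeline 2.
replica_b.end_recovery()

# Master A is restored to a point before the promotion. It never saw
# B's history file, so it also ends recovery on timeline 2.
master_a.end_recovery()

print(master_a.timeline, replica_b.timeline)  # 2 2
```

Both servers end up writing WAL on timeline 2 even though their data has diverged since the promotion.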

From this scenario, I would like to suggest considering using non-sequential timelines. From what I have investigated so far, I believe the *.history files in the WAL directory already have all the timeline IDs in them and are in order. If we could make those timeline IDs a bit more unique/random, and still rely on the ordering in the *.history file, I think this would help prevent multiple servers on the same timeline with divergent data.
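
To illustrate why the ordering in the history file, rather than the numeric value of the IDs, is what carries the ancestry, here is a sketch that parses history-file content. It assumes the `parentTLI<TAB>switchpoint<TAB>reason` line layout with `#` comment lines; the `parse_history` name and the random-looking IDs in the example are hypothetical.

```python
from typing import List, Optional, Tuple


def parse_history(text: str, own_tli: int) -> List[Tuple[int, Optional[str]]]:
    """Return (timeline, switchpoint) pairs from the oldest ancestor to own_tli.

    Each non-comment line is assumed to be
    'parentTLI<TAB>switchpoint<TAB>reason', listed oldest first. The
    line order, not the numeric value of the IDs, gives the ancestry.
    """
    chain: List[Tuple[int, Optional[str]]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parent, switchpoint = line.split("\t")[:2]
        chain.append((int(parent), switchpoint))  # parent ended at switchpoint
    chain.append((own_tli, None))  # the file's own timeline is still open
    return chain


# Hypothetical history for a timeline whose IDs are not sequential.
history = (
    "1\t0/3000060\tno recovery target specified\n"
    "3735928559\t0/5000060\tno recovery target specified\n"
)
chain = parse_history(history, own_tli=193490)
print([tli for tli, _ in chain])  # [1, 3735928559, 193490]
```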

I was hoping to begin a conversation on whether or not non-sequential timelines are a good idea before I looked at the code around timelines.

--
Brian Faherty
On Wed, Jul 19, 2017 at 11:23 AM, Brian Faherty
<anothergenericuser@gmail.com> wrote:
> Hey hackers,
>   I was working with replication and recovery the other day and noticed that
> there were scenarios where I could cause multiple servers to enter the same
> timeline while possibly having divergent data. One such scenario is Master A
> and Replica B are both on timeline 1. There is an event that causes Replica
> B to be promoted, which moves it to timeline 2. Following this, you
> perform a restore on Master A to a point before the event happened. Once
> Postgres completes this recovery on Master A, it will switch over to
> timeline 2. There are now WAL files that have been written to timeline 2
> from both servers.
>
> From this scenario, I would like to suggest considering using non-sequential
> timelines. From what I have investigated so far, I believe the *.history
> files in the WAL directory already have all the timeline IDs in them and
> are in order. If we could make those timeline IDs a bit more
> unique/random, and still rely on the ordering in the *.history file, I think
> this would help prevent multiple servers on the same timeline with divergent
> data.
>
> I was hoping to begin a conversation on whether or not non-sequential
> timelines are a good idea before I looked at the code around timelines.

It's interesting that you bring this up.  I've also wondered why we
don't use random TLIs.  I suppose I'm internally assuming that it's
because the people who wrote the code are far more brilliant and
knowledgeable of this area than I could ever be and that doing
anything else would create some kind of awful problem, but maybe
that's not so.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



On Wed, Jul 19, 2017 at 7:00 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Jul 19, 2017 at 11:23 AM, Brian Faherty
> <anothergenericuser@gmail.com> wrote:
>>   I was working with replication and recovery the other day and noticed that
>> there were scenarios where I could cause multiple servers to enter the same
>> timeline while possibly having divergent data. One such scenario is Master A
>> and Replica B are both on timeline 1. There is an event that causes Replica
>> B to be promoted, which moves it to timeline 2. Following this, you
>> perform a restore on Master A to a point before the event happened. Once
>> Postgres completes this recovery on Master A, it will switch over to
>> timeline 2. There are now WAL files that have been written to timeline 2
>> from both servers.
>>
>> From this scenario, I would like to suggest considering using non-sequential
>> timelines. From what I have investigated so far, I believe the *.history
>> files in the WAL directory already have all the timeline IDs in them and
>> are in order. If we could make those timeline IDs a bit more
>> unique/random, and still rely on the ordering in the *.history file, I think
>> this would help prevent multiple servers on the same timeline with divergent
>> data.

It seems to me that you are missing one piece here: the history files
generated at the moment of the timeline bump. When recovery finishes,
an instance scans the archive, or the instance it is streaming from,
for history files, and chooses a timeline number that does not match
any existing one. So the problem you are trying to avoid can easily be
solved with a proper archive, for example.

>> I was hoping to begin a conversation on whether or not non-sequential
>> timelines are a good idea before I looked at the code around timelines.
>
> It's interesting that you bring this up.  I've also wondered why we
> don't use random TLIs.  I suppose I'm internally assuming that it's
> because the people who wrote the code are far more brilliant and
> knowledgeable of this area than I could ever be and that doing
> anything else would create some kind of awful problem, but maybe
> that's not so.

I am not the only one who worked on that, but the resulting code is a
tad simpler, as it makes the timeline hierarchy easier to infer, with
the history files at hand of course.
-- 
Michael



Michael Paquier <michael.paquier@gmail.com> writes:
> On Wed, Jul 19, 2017 at 7:00 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> It's interesting that you bring this up.  I've also wondered why we
>> don't use random TLIs.  I suppose I'm internally assuming that it's
>> because the people who wrote the code are far more brilliant and
>> knowledgeable of this area than I could ever be and that doing
>> anything else would create some kind of awful problem, but maybe
>> that's not so.

> I am not the only one who worked on that, but the resulting code is a
> tad simpler, as it makes the timeline hierarchy easier to infer, with
> the history files at hand of course.

Yeah, right now you have the ability to guess that, say, timeline 42
is a descendant of 41, which you couldn't assume with random TLIs.
Also, the values are only 32 bits, which is not wide enough to allow
imagining that random() could be relied on to produce non-duplicate
values.
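
The 32-bit point can be quantified with the birthday problem. This sketch computes the probability that at least two of `n` uniformly random 32-bit IDs coincide, exactly and with the usual approximation 1 - exp(-n(n-1) / 2^33); the function names are illustrative. Roughly 77,000 random IDs already give about even odds of a duplicate.

```python
import math


def collision_probability(n: int, bits: int = 32) -> float:
    """Exact chance that at least two of n uniformly random
    `bits`-bit values are equal (birthday problem)."""
    space = 2 ** bits
    p_unique = 1.0
    for i in range(n):
        p_unique *= (space - i) / space
    return 1.0 - p_unique


def approx_collision_probability(n: int, bits: int = 32) -> float:
    """Standard approximation: 1 - exp(-n(n-1) / (2 * 2**bits))."""
    return 1.0 - math.exp(-n * (n - 1) / (2 * 2 ** bits))


for n in (100, 1000, 77163):
    print(n, collision_probability(n), approx_collision_probability(n))
```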

If we had separate database identifiers for slave installations, which
AFAIR we don't, it'd be possible to consider incorporating part of
the server ID into timeline IDs it creates, which would alleviate
Brian's issue I think.  That is, instead of 1, 2, 3, ..., a server
might create 1xyz, 2xyz, 3xyz, ... where "xyz" are random digits
associated with the particular installation.  This is obviously
not bulletproof since you could have collisions of the xyz's, but
it would help.  Also you could imagine allowing DBAs to assign
distinct xyz codes to every slave in a given community.
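
The "1xyz, 2xyz, 3xyz" idea can be sketched as a decimal composition of a generation counter and a per-installation code, checked against the 32-bit limit. The three-digit width and the `make_tli`/`split_tli` names are hypothetical choices for illustration; as noted above, two installations with the same code would still collide.

```python
TLI_MAX = 2 ** 32 - 1  # timeline IDs are unsigned 32-bit values


def make_tli(generation: int, node_code: int, digits: int = 3) -> int:
    """Compose '<generation><node_code>' in decimal, e.g. (2, 417) -> 2417.

    node_code is a hypothetical per-installation code of `digits`
    decimal digits, either random or assigned by the DBA.
    """
    if not 0 <= node_code < 10 ** digits:
        raise ValueError("node_code does not fit in the given digits")
    tli = generation * 10 ** digits + node_code
    if not 0 < tli <= TLI_MAX:
        raise ValueError("timeline ID out of 32-bit range")
    return tli


def split_tli(tli: int, digits: int = 3) -> tuple:
    """Recover (generation, node_code) from a composed timeline ID."""
    return divmod(tli, 10 ** digits)


# Master A (code 417) and Replica B (code 802) both reach generation 2
# but pick different timeline IDs.
print(make_tli(2, 417), make_tli(2, 802))  # 2417 2802
print(split_tli(2417))                     # (2, 417)
```

With three digits, about 4.29 million generations still fit in 32 bits.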
        regards, tom lane



On Wed, Jul 19, 2017 at 8:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Michael Paquier <michael.paquier@gmail.com> writes:
>> On Wed, Jul 19, 2017 at 7:00 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> It's interesting that you bring this up.  I've also wondered why we
>>> don't use random TLIs.  I suppose I'm internally assuming that it's
>>> because the people who wrote the code are far more brilliant and
>>> knowledgeable of this area than I could ever be and that doing
>>> anything else would create some kind of awful problem, but maybe
>>> that's not so.
>
>> I am not the only one who worked on that, but the resulting code is a
>> tad simpler, as it makes the timeline hierarchy easier to infer, with
>> the history files at hand of course.
>
> Yeah, right now you have the ability to guess that, say, timeline 42
> is a descendant of 41, which you couldn't assume with random TLIs.
> Also, the values are only 32 bits, which is not wide enough to allow
> imagining that random() could be relied on to produce non-duplicate
> values.

pg_backend_random() perhaps? If any new code used random(), it would be
shot down quickly in review.
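
A random TLI drawn from a strong source could be sketched like this. Python's `secrets` module (a CSPRNG) stands in here for a server-side strong random function; the `random_tli` name and the set of already-visible TLIs are assumptions for illustration. A better generator does not lift the 32-bit collision bound, and it still only avoids the IDs the server can actually see.

```python
import secrets


def random_tli(existing: set) -> int:
    """Pick a random 32-bit timeline ID not already in use.

    `existing` stands for the TLIs whose history files are visible.
    Timelines start at 1, so 0 is skipped here.
    """
    while True:
        candidate = secrets.randbits(32)
        if candidate != 0 and candidate not in existing:
            return candidate


known = {1, 2}
tli = random_tli(known)
print(f"{tli:08X}.history")
```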

> If we had separate database identifiers for slave installations, which
> AFAIR we don't, it'd be possible to consider incorporating part of
> the server ID into timeline IDs it creates, which would alleviate
> Brian's issue I think.  That is, instead of 1, 2, 3, ..., a server
> might create 1xyz, 2xyz, 3xyz, ... where "xyz" are random digits
> associated with the particular installation.  This is obviously
> not bulletproof since you could have collisions of the xyz's, but
> it would help.  Also you could imagine allowing DBAs to assign
> distinct xyz codes to every slave in a given community.

To be honest, I am not keen on any concept that complicates the timeline name :)

Having a unique identifier per node has value for other purposes, like
clustering, and we could get the same information by recording in the
history file the ID of the node that generated the new timeline.
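
Recording the node ID in the history file could look like the sketch below: a history entry extended with a fourth, tab-separated field. The extra `node_id` field, its position, and the node names are hypothetical; the point is that two history files claiming the same timeline can then be told apart.

```python
from typing import NamedTuple


class HistoryEntry(NamedTuple):
    parent_tli: int
    switchpoint: str
    reason: str
    node_id: str  # hypothetical: node that created the child timeline


def format_entry(e: HistoryEntry) -> str:
    """Serialize one entry as a tab-separated history line."""
    return f"{e.parent_tli}\t{e.switchpoint}\t{e.reason}\t{e.node_id}\n"


def parse_entry(line: str) -> HistoryEntry:
    """Parse a line written by format_entry."""
    parent, switchpoint, reason, node_id = line.rstrip("\n").split("\t")
    return HistoryEntry(int(parent), switchpoint, reason, node_id)


def same_origin(line_x: str, line_y: str) -> bool:
    """True if both entries were written by the same node."""
    return parse_entry(line_x).node_id == parse_entry(line_y).node_id


# Two versions of the history for timeline 2, one from each server.
from_b = format_entry(HistoryEntry(1, "0/5000060", "no recovery target specified", "node-b"))
from_a = format_entry(HistoryEntry(1, "0/3000060", "no recovery target specified", "node-a"))

print(same_origin(from_a, from_b))  # False: timeline 2 was created twice
```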
-- 
Michael