Thread: pg_basebackup from 9.4-bdr to 9.4 results in corrupt index
Hello,
I'm working with a 9.4-bdr cluster and want to move away from BDR tech all together. So my idea was to follow instructions on http://bdr-project.org/docs/stable/ to first strip the node from BDR making it into "regular" node and then moving the data from this node to official 9.4 instance.
Environment:
BDR_Node_1
BDR_Node_2
Regular_Node_1
Steps done on BDR_Node_2
1. Part node from BDR cluster http://bdr-project.org/docs/stable/node-management-removing.html
2. Remove BDR from node http://bdr-project.org/docs/stable/node-management-disabling.html
3. Create SR slot select pg_create_physical_replication_slot('new_cluster_node');
After this. Steps done on Regular_Node_1, which is 9.4.22
1. Init the new node with pg_basebackup with source being BDR_Node_1
2. Update recovery.conf and postgres.conf
3. Start it with pg_ctl
All seems to be working well, but for this:
select * from information_schema.sequences;
produces "[XX000] ERROR: cache lookup failed for index 899807" for all of the DBs that were configured with BDR with various OID.
I can not reproduce this error, when running query on any of the DBs on 9.4-bdr node.
Does anyone have any idea what may be causing this ?
Due to this I'm also unable to run pg_dump on new_cluster_node - it also results in the same error.
That said. For more context - If it worked, next steps would have been to
1. Deactivate virtual IP on BDR_Node_2
2. Check for no replication lag on BDR_Node_2 slot "new_cluster_node"
3. Promote Regular_Node_1
4. Activate virtual IP on BDR_Node_2
Build the new 9.4 cluster from there on.
Any ideas / comments will be much appreciated!
Thank you in advance.
Best regards, Janis Puris
On 5/25/19 11:49 AM, Jānis Pūris wrote:
1. Are you sure that you removed all BDR from the node?
2. Is the corruption there in BDR_Node_1?
3. Can you rebuild the indexes?
body{font-family:Helvetica,Arial;font-size:13px}Hello, I'm working with a 9.4-bdr cluster and want to move away from BDR tech all together. So my idea was to follow instructions on http://bdr-project.org/docs/stable/ to first strip the node from BDR making it into "regular" node and then moving the data from this node to official 9.4 instance.Environment:BDR_Node_1BDR_Node_2Regular_Node_1Steps done on BDR_Node_21. Part node from BDR cluster http://bdr-project.org/docs/stable/node-management-removing.html2. Remove BDR from node http://bdr-project.org/docs/stable/node-management-disabling.html3. Create SR slot select pg_create_physical_replication_slot('new_cluster_node');After this. Steps done on Regular_Node_1, which is 9.4.221. Init the new node with pg_basebackup with source being BDR_Node_12. Update recovery.conf and postgres.conf3. Start it with pg_ctlAll seems to be working well, but for this:select * from information_schema.sequences;produces "[XX000] ERROR: cache lookup failed for index 899807" for all of the DBs that were configured with BDR with various OID.I can not reproduce this error, when running query on any of the DBs on 9.4-bdr node.Does anyone have any idea what may be causing this ?Due to this I'm also unable to run pg_dump on new_cluster_node - it also results in the same error.That said. For more context - If it worked, next steps would have been to1. Deactivate virtual IP on BDR_Node_22. Check for no replication lag on BDR_Node_2 slot "new_cluster_node"3. Promote Regular_Node_14. Activate virtual IP on BDR_Node_2Build the new 9.4 cluster from there on.Any ideas / comments will be much appreciated!
1. Are you sure that you removed all BDR from the node?
2. Is the corruption there in BDR_Node_1?
3. Can you rebuild the indexes?
--
Angular momentum makes the world go 'round.
Angular momentum makes the world go 'round.
On 5/25/19 9:49 AM, Jānis Pūris wrote: > Hello, > > I'm working with a 9.4-bdr cluster and want to move away from BDR tech > all together. So my idea was to follow instructions on > http://bdr-project.org/docs/stable/ to first strip the node from BDR > making it into "regular" node and then moving the data from this node to > official 9.4 instance. > > Environment: > > BDR_Node_1 > BDR_Node_2 > > Regular_Node_1 > > Steps done on BDR_Node_2 > 1. Part node from BDR cluster > http://bdr-project.org/docs/stable/node-management-removing.html > 2. Remove BDR from node > http://bdr-project.org/docs/stable/node-management-disabling.html > 3. Create SR slot select > pg_create_physical_replication_slot('new_cluster_node'); > > After this. Steps done on Regular_Node_1, which is 9.4.22 > 1. Init the new node with pg_basebackup with source being BDR_Node_1 I am not clear about above: 1) You removed BDR_Node_2 from cluster 2) You took pg_basebackup from BDR_Node_1 to create a new regular cluster. 3) 1) & 2) seem to be at odds with each other. > 2. Update recovery.conf and postgres.conf > 3. Start it with pg_ctl > > All seems to be working well, but for this: > > select * from information_schema.sequences; > > produces "[XX000] ERROR: cache lookup failed for index 899807" for all > of the DBs that were configured with BDR with various OID. > > I can not reproduce this error, when running query on any of the DBs on > 9.4-bdr node. > > Does anyone have any idea what may be causing this ? > Due to this I'm also unable to run pg_dump on new_cluster_node - it also > results in the same error. > > That said. For more context - If it worked, next steps would have been to > 1. Deactivate virtual IP on BDR_Node_2 > 2. Check for no replication lag on BDR_Node_2 slot "new_cluster_node" > 3. Promote Regular_Node_1 > 4. Activate virtual IP on BDR_Node_2 > > Build the new 9.4 cluster from there on. > > Any ideas / comments will be much appreciated! > > Thank you in advance. > Best regards, Janis Puris > -- Adrian Klaver adrian.klaver@aklaver.com
Hi, Adrian
Apologies, it is a typo.
pg_basebackup was taken from BDR_Node_2, not BDR_Node_1
Thank you in advance.
Best regards, Janis Puris
On 25 May 2019 at 19:27:42, Adrian Klaver (adrian.klaver@aklaver.com) wrote:
I am not clear about above:
1) You removed BDR_Node_2 from cluster
2) You took pg_basebackup from BDR_Node_1 to create a new regular cluster.
3) 1) & 2) seem to be at odds with each other.
Hi Ron,
I can not reproduce this error on BDR_Node_2 (it is not BDR_Node_1 as stated before. Typo)
I've been successful in transferring the data with pg_dump on BDR_Node_2 and then restoring it on Regular_Node_1. Then running "select * from information_schema.sequences;" all is OK.
The problem with this approach is that I'm required to have minimal downtime in this transition and we have a lot of data to transfer, which would be lengthy process.
Thank you in advance.
Best regards, Janis Puris
On 25 May 2019 at 19:16:11, Ron (ronljohnsonjr@gmail.com) wrote:
1. Are you sure that you removed all BDR from the node?
2. Is the corruption there in BDR_Node_1?
3. Can you rebuild the indexes?
On 5/25/19 11:46 AM, Jānis Pūris wrote: > Hi Ron, > > I can not reproduce this error on BDR_Node_2 (it is not BDR_Node_1 as > stated before. Typo) > > I've been successful in transferring the data with pg_dump on BDR_Node_2 > and then restoring it on Regular_Node_1. Then running "select * from > information_schema.sequences;" all is OK. So the issue is that a binary backup/restore via pg_basebackup fails but a logical backup/restore via pg_dump/pg_restore works, correct? You might want to take a look here: https://github.com/2ndQuadrant/bdr/issues/140 It might make sense to you, it does not to me. Looks to me something is being done on the binary level that makes this difficult. I guessing you are going to have to talk to the BDR folks. > > The problem with this approach is that I'm required to have minimal > downtime in this transition and we have a lot of data to transfer, which > would be lengthy process. > > Thank you in advance. > Best regards, Janis Puris > > On 25 May 2019 at 19:16:11, Ron (ronljohnsonjr@gmail.com > <mailto:ronljohnsonjr@gmail.com>) wrote: > >> 1. Are you sure that you removed all BDR from the node? >> 2. Is the corruption there in BDR_Node_1? >> 3. Can you rebuild the indexes? -- Adrian Klaver adrian.klaver@aklaver.com
Managed to find following in the announcement of BDR on 9.6:
"BDR has always been an extension, but on 9.4 it required a heavily patched PostgreSQL, one that isn’t fully on-disk-format compatible with stock community PostgreSQL 9.4. The goal all along has been to allow it to run as an extension on an unmodified PostgreSQL … and now we’re there."
Source: https://www.2ndquadrant.com/en/blog/bdr-is-coming-to-postgresql-9-6/
I suppose this answers why logical dump / restore works, but pg_basebackup fails - the stock 9.4 and BDR 9.4 are not compatible on disk level.
On 25 May 2019 at 23:26:26, Adrian Klaver (adrian.klaver@aklaver.com) wrote:
On 5/25/19 11:46 AM, Jānis Pūris wrote:
> Hi Ron,
>
> I can not reproduce this error on BDR_Node_2 (it is not BDR_Node_1 as
> stated before. Typo)
>
> I've been successful in transferring the data with pg_dump on BDR_Node_2
> and then restoring it on Regular_Node_1. Then running "select * from
> information_schema.sequences;" all is OK.
So the issue is that a binary backup/restore via pg_basebackup fails but
a logical backup/restore via pg_dump/pg_restore works, correct?
You might want to take a look here:
https://github.com/2ndQuadrant/bdr/issues/140
It might make sense to you, it does not to me. Looks to me something is
being done on the binary level that makes this difficult. I guessing you
are going to have to talk to the BDR folks.
>
> The problem with this approach is that I'm required to have minimal
> downtime in this transition and we have a lot of data to transfer, which
> would be lengthy process.
>
> Thank you in advance.
> Best regards, Janis Puris
>
> On 25 May 2019 at 19:16:11, Ron (ronljohnsonjr@gmail.com
> <mailto:ronljohnsonjr@gmail.com>) wrote:
>
>> 1. Are you sure that you removed all BDR from the node?
>> 2. Is the corruption there in BDR_Node_1?
>> 3. Can you rebuild the indexes?
--
Adrian Klaver
adrian.klaver@aklaver.com