Thread: pg_upgrade and publication/subscription problem

pg_upgrade and publication/subscription problem

From
Marcos Pegoraro
Date:
A publication for all tables was running fine, Master is a PostgreSQL 11.11. Replica was running version 13 (don´t remember minor version).

Then we tried to update only subscriber server, nothing was done on master side. 

Then we did ...
- installed postgresql-14.
- configured postgresql.conf to be similar to previous.
- on version 13 disabled subscription - alter subscription disable.
- changed both port to run pg_upgrade.
- stop services for both 13 e 14.
/usr/lib/postgresql/14/bin/pg_upgrade -b /usr/lib/postgresql/13/bin -B /usr/lib/postgresql/14/bin -d /etc/postgresql/13/main/ -D /etc/postgresql/14/main/ -j 2 --link -p 9999 -P 9998 -U postgres -v
- when finished upgrade process, we removed version 13 and ran vacuumdb -p 9998 -U postgres --all --analyze-in-stages
- last step was to enable that subscription. 
- just wait for the subscriber to get the data changed, pg_upgrade ran for 15 minutes, this should be synced in a few seconds ...
- few seconds later we remembered that some other tables were created on publication server, so we did a refresh publication.

Then, some minutes later we got lots of log entries "duplicate key value violates unique constraint pk..." because it was trying to COPY that table from master.

We disable subscription again until we solve, as remains. 

Selecting from pg_subscription_rel all old tables are with srsubstate i for initialize, not s for synchronized or r for ready, as they should. And all srsublsn of these records were null, so it lost synchronization coordination for all tables which existed before this upgrade process.

So, my first question is, as our publication server continues running, lots of updates were processed, so how can I synchronize both sides without recreating that publication ? 
And my second question is, is this problem documented ? Is this problem expected to happen ?

regards,
Marcos

Re: pg_upgrade and publication/subscription problem

From
Amit Kapila
Date:
On Thu, Nov 25, 2021 at 5:13 PM Marcos Pegoraro <marcos@f10.com.br> wrote:
>
> A publication for all tables was running fine, Master is a PostgreSQL 11.11. Replica was running version 13 (don´t
rememberminor version). 
>
> Then we tried to update only subscriber server, nothing was done on master side.
>
> Then we did ...
> - installed postgresql-14.
> - configured postgresql.conf to be similar to previous.
> - on version 13 disabled subscription - alter subscription disable.
> - changed both port to run pg_upgrade.
> - stop services for both 13 e 14.
> - /usr/lib/postgresql/14/bin/pg_upgrade -b /usr/lib/postgresql/13/bin -B /usr/lib/postgresql/14/bin -d
/etc/postgresql/13/main/-D /etc/postgresql/14/main/ -j 2 --link -p 9999 -P 9998 -U postgres -v 
> - when finished upgrade process, we removed version 13 and ran vacuumdb -p 9998 -U postgres --all --analyze-in-stages
> - last step was to enable that subscription.
> - just wait for the subscriber to get the data changed, pg_upgrade ran for 15 minutes, this should be synced in a few
seconds... 
> - few seconds later we remembered that some other tables were created on publication server, so we did a refresh
publication.
>
> Then, some minutes later we got lots of log entries "duplicate key value violates unique constraint pk..." because it
wastrying to COPY that table from master. 
>
> We disable subscription again until we solve, as remains.
>
> Selecting from pg_subscription_rel all old tables are with srsubstate i for initialize, not s for synchronized or r
forready, as they should. And all srsublsn of these records were null, so it lost synchronization coordination for all
tableswhich existed before this upgrade process. 
>

The reason is after an upgrade, there won't be any data in
pg_subscription_rel, and only when you tried to refresh it is trying
to sync again which leads to the "duplicate key value ..." problem you
are seeing.

> So, my first question is, as our publication server continues running, lots of updates were processed, so how can I
synchronizeboth sides without recreating that publication ? 
>

Don't you want to eventually upgrade the publisher node as well? You
can refer to blog [1] for the detailed steps.

> And my second question is, is this problem documented ? Is this problem expected to happen ?
>

Yes, the way you are doing I think it is bound to happen. There is
some discussion about why this is happening in email [2]. AFAIK, it is
not documented and if so, I think it will be a good idea to document
it.

[1] - https://elephanttamer.net/?p=58
[2] - https://www.postgresql.org/message-id/CALDaNm2-SRGHK0rqJQu7rGiS4hDAb7Nib5HbojEN5ubaXGs2CA%40mail.gmail.com

--
With Regards,
Amit Kapila.



Re: pg_upgrade and publication/subscription problem

From
Marcos Pegoraro
Date:
The reason is after an upgrade, there won't be any data in
pg_subscription_rel, and only when you tried to refresh it is trying
to sync again which leads to the "duplicate key value ..." problem you
are seeing.

So, is pg_upgrade populating pg_subscription and not pg_subscription_rel ? It is doing 50% of his job ?

Don't you want to eventually upgrade the publisher node as well? You
can refer to blog [1] for the detailed steps.

It is possible but I don´t think changing publisher will solve anything, will ?

Yes, the way you are doing I think it is bound to happen. There is
some discussion about why this is happening in email [2]. AFAIK, it is
not documented and if so, I think it will be a good idea to document

And my problem remains the same, how to solve it ? All records on pg_subscription_rel are initialize with srsubstate null. How can I replay only updates since yesterday. This replication is a auditing database, so I cannot loose all things happened since that pg_upgrade. [1] points me how to upgrade but if I did the wrong way, how to solve that ?

Re: pg_upgrade and publication/subscription problem

From
Amit Kapila
Date:
On Thu, Nov 25, 2021 at 8:00 PM Marcos Pegoraro <marcos@f10.com.br> wrote:
>>
>> Yes, the way you are doing I think it is bound to happen. There is
>> some discussion about why this is happening in email [2]. AFAIK, it is
>> not documented and if so, I think it will be a good idea to document
>>
> And my problem remains the same, how to solve it ? All records on pg_subscription_rel are initialize with srsubstate
null.How can I replay only updates since yesterday. This replication is a auditing database, so I cannot loose all
thingshappened since that pg_upgrade. [1] points me how to upgrade but if I did the wrong way, how to solve that ? 
>

AFAIU the main problem in your case is that you didn't block the write
traffic on the publisher side. Let me try to understand the situation.
After the upgrade is finished, there are some new tables with data on
the publisher, and did old tables have any additional data?

Are the contents in pg_replication_origin intact after the upgrade?

So, in short, I think what we need to solve is to get the data from
new tables and newly performed writes on old tables. I could think of
the following two approaches:

Approach-1:
1. Drop subscription and Truncate all tables corresponding to subscription.
2. Create a new subscription for the publication.

I think this will be quite neat and there would be no risk of data
loss but it could be time-consuming since all the data from previous
tables needs to be synced again.

Approach-2:
Here, I am assuming pg_replication_origin is intact.
1. Block new writes on the publisher-side.
2. Disable the existing subscription (say the name of the subscription
is old_sub).
3. Drop the existing all tables publication.
4. Create two new publications, one for old tables (old_pub), and one
for new tables (new_pub).
5. Create a new subscription corresponding to new_pub.
6. Remove the existing publication from old_sub and add the old_pub.
7. Enable the subscription.
8. Now, perform a refresh on old_sub.

The benefit of Approach-1 is that you don't need to change anything on
the publisher-side and it has very few steps. OTOH, in Approach-2, we
can save the effort/time to re-sync the initial data for old tables
but as there are a lot of things to be taken care there is always a
chance of mistake and if that happens you might lose some data.

In any case, before following any of these, I suggest creating a dummy
setup that mimics your original setup, perform the above steps and
ensure everything is fine, then only try the same steps in your main
setup.

--
With Regards,
Amit Kapila.



Re: pg_upgrade and publication/subscription problem

From
Marcos Pegoraro
Date:
AFAIU the main problem in your case is that you didn't block the write
traffic on the publisher side. Let me try to understand the situation.
After the upgrade is finished, there are some new tables with data on
the publisher, and did old tables have any additional data?
Correct.

Are the contents in pg_replication_origin intact after the upgrade?
Yes 

So, in short, I think what we need to solve is to get the data from
new tables and newly performed writes on old tables. I could think of
the following two approaches:

Approach-1:
1. Drop subscription and Truncate all tables corresponding to subscription.
2. Create a new subscription for the publication.
If I drop subscription it will drop WAL ou publication side and I lost all changed data between the starting of pg_upgrade process and now.
My problem is not related with new tables, they will be copied fine because doesn´t exists any record on subscriber.
But old tables had records modified since that pg_upgrade process, that is my problem, only that.

My question remains the same, why pg_subscription_rel was not copied from previous version ?

If pg_upgrade would copy pg_replication_origin (it did) and these pg_subscription_rel (it didn´t) records from version 13 to 14, when I enable subscription it would start copying data from that point on, correct ?

Re: pg_upgrade and publication/subscription problem

From
Amit Kapila
Date:
On Fri, Nov 26, 2021 at 5:47 PM Marcos Pegoraro <marcos@f10.com.br> wrote:
>>
>> AFAIU the main problem in your case is that you didn't block the write
>> traffic on the publisher side. Let me try to understand the situation.
>> After the upgrade is finished, there are some new tables with data on
>> the publisher, and did old tables have any additional data?
>
> Correct.
>>
>>
>> Are the contents in pg_replication_origin intact after the upgrade?
>
> Yes
>>
>>
>> So, in short, I think what we need to solve is to get the data from
>> new tables and newly performed writes on old tables. I could think of
>> the following two approaches:
>>
>> Approach-1:
>> 1. Drop subscription and Truncate all tables corresponding to subscription.
>>
>> 2. Create a new subscription for the publication.
>
> If I drop subscription it will drop WAL ou publication side and I lost all changed data between the starting of
pg_upgradeprocess and now. 
>

I think you can disable the subscription as well or before dropping
disassociate the slot from subscription.

> My problem is not related with new tables, they will be copied fine because doesn´t exists any record on subscriber.
> But old tables had records modified since that pg_upgrade process, that is my problem, only that.
>

Yeah, I understand that point. Here, the problem is that both old and
new tables belong to the same publication, and you can't refresh some
tables from the publication.

> My question remains the same, why pg_subscription_rel was not copied from previous version ?
>
> If pg_upgrade would copy pg_replication_origin (it did) and these pg_subscription_rel (it didn´t) records from
version13 to 14, when I enable subscription it would start copying data from that point on, correct ? 
>

I think we don't want to make assumptions about the remote end being
the same after the upgrade and we let users reactivate the
subscriptions in a suitable way. See [1] (Start reading from "..When
dumping logical replication subscriptions..") In your case, if you
wouldn't have allowed new tables in the publication then a simple
Alter Subscription <sub_name> Refresh Publication with (copy_data =
false) would have served the purpose.

BTW, just for records, this problem has nothing to do with any changes
in PG-14, the same behavior exists in the previous versions as well.

[1] - https://www.postgresql.org/docs/devel/app-pgdump.html
--
With Regards,
Amit Kapila.



Re: pg_upgrade and publication/subscription problem

From
Marcos Pegoraro
Date:
I think we don't want to make assumptions about the remote end being
the same after the upgrade and we let users reactivate the
subscriptions in a suitable way. See [1] (Start reading from "..When
dumping logical replication subscriptions..") In your case, if you
wouldn't have allowed new tables in the publication then a simple
Alter Subscription <sub_name> Refresh Publication with (copy_data =
false) would have served the purpose.
 
I understand that this is not related with version 14, pg_upgrade would do the same steps on previous versions too.
Additionally it would be interesting to document that pg_upgrade does not upgrade completely if the server is a subscriber of logical replication, so it´ll have pre and post steps to do if the server has this kind of replication.

Re: pg_upgrade and publication/subscription problem

From
Amit Kapila
Date:
On Fri, Nov 26, 2021 at 5:47 PM Marcos Pegoraro <marcos@f10.com.br> wrote:
>>
>> So, in short, I think what we need to solve is to get the data from
>> new tables and newly performed writes on old tables. I could think of
>> the following two approaches:
>>
>> Approach-1:
>> 1. Drop subscription and Truncate all tables corresponding to subscription.
>>
>> 2. Create a new subscription for the publication.
>
> If I drop subscription it will drop WAL ou publication side and I lost all changed data between the starting of
pg_upgradeprocess and now.
 
>

On thinking about this point again, it is not clear to me why that
would matter for this particular use case? Basically, when you create
a new subscription, it should copy the entire existing data from the
table directly and then will decode changes from WAL. So, I think in
your case all the changes between pg_upgrade and now should be
directly copied from tables, so probably older WAL won't be required.

-- 
With Regards,
Amit Kapila.



Re: pg_upgrade and publication/subscription problem

From
Marcos Pegoraro
Date:
On thinking about this point again, it is not clear to me why that
would matter for this particular use case? Basically, when you create
a new subscription, it should copy the entire existing data from the
table directly and then will decode changes from WAL. So, I think in
your case all the changes between pg_upgrade and now should be
directly copied from tables, so probably older WAL won't be required.

Maybe you did not understand
Production server cannot stop while I  upgrade my subscriber server, so it will be creating WAL continuously.

Subscriber server has trigger functions for auditing on all tables, something like ...
insert into auditable(schemaname, tablename, primarykey, operation, olddata, newdata) values(tg_table_schema, tg_table_name, getpk(new), tg_op, row_to_json(old), row_to_json(new))

Then, all changes between pg_upgrade and now will not be inserted into auditable.

Re: pg_upgrade and publication/subscription problem

From
Amit Kapila
Date:
On Mon, Nov 29, 2021 at 5:04 PM Marcos Pegoraro <marcos@f10.com.br> wrote:
>>
>> On thinking about this point again, it is not clear to me why that
>> would matter for this particular use case? Basically, when you create
>> a new subscription, it should copy the entire existing data from the
>> table directly and then will decode changes from WAL. So, I think in
>> your case all the changes between pg_upgrade and now should be
>> directly copied from tables, so probably older WAL won't be required.
>
>
> Maybe you did not understand
>

Yeah, because some information like trigger functions was not there in
your previous emails.

-- 
With Regards,
Amit Kapila.



Re: pg_upgrade and publication/subscription problem

From
Marcos Pegoraro
Date:
Sorry, I didn´t explain exactly what I was doing, I just wrote "This replication is a auditing database" on my second email.

Atenciosamente, 




Em seg., 29 de nov. de 2021 às 09:20, Amit Kapila <amit.kapila16@gmail.com> escreveu:
On Mon, Nov 29, 2021 at 5:04 PM Marcos Pegoraro <marcos@f10.com.br> wrote:
>>
>> On thinking about this point again, it is not clear to me why that
>> would matter for this particular use case? Basically, when you create
>> a new subscription, it should copy the entire existing data from the
>> table directly and then will decode changes from WAL. So, I think in
>> your case all the changes between pg_upgrade and now should be
>> directly copied from tables, so probably older WAL won't be required.
>
>
> Maybe you did not understand
>

Yeah, because some information like trigger functions was not there in
your previous emails.

--
With Regards,
Amit Kapila.