Thread: pg_upgrade and publication/subscription problem
A publication for all tables was running fine, Master is a PostgreSQL 11.11. Replica was running version 13 (don´t remember minor version).
Then we tried to update only subscriber server, nothing was done on master side.
Then we did ...
- installed postgresql-14.
- configured postgresql.conf to be similar to previous.
- on version 13 disabled subscription - alter subscription disable.
- changed both port to run pg_upgrade.
- stop services for both 13 e 14.
- /usr/lib/postgresql/14/bin/pg_upgrade -b /usr/lib/postgresql/13/bin -B /usr/lib/postgresql/14/bin -d /etc/postgresql/13/main/ -D /etc/postgresql/14/main/ -j 2 --link -p 9999 -P 9998 -U postgres -v
- when finished upgrade process, we removed version 13 and ran vacuumdb -p 9998 -U postgres --all --analyze-in-stages
- last step was to enable that subscription.
- just wait for the subscriber to get the data changed, pg_upgrade ran for 15 minutes, this should be synced in a few seconds ...
- few seconds later we remembered that some other tables were created on publication server, so we did a refresh publication.
Then, some minutes later we got lots of log entries "duplicate key value violates unique constraint pk..." because it was trying to COPY that table from master.
We disable subscription again until we solve, as remains.
Selecting from pg_subscription_rel all old tables are with srsubstate i for initialize, not s for synchronized or r for ready, as they should. And all srsublsn of these records were null, so it lost synchronization coordination for all tables which existed before this upgrade process.
So, my first question is, as our publication server continues running, lots of updates were processed, so how can I synchronize both sides without recreating that publication ?
And my second question is, is this problem documented ? Is this problem expected to happen ?
regards,
Marcos
On Thu, Nov 25, 2021 at 5:13 PM Marcos Pegoraro <marcos@f10.com.br> wrote: > > A publication for all tables was running fine, Master is a PostgreSQL 11.11. Replica was running version 13 (don´t rememberminor version). > > Then we tried to update only subscriber server, nothing was done on master side. > > Then we did ... > - installed postgresql-14. > - configured postgresql.conf to be similar to previous. > - on version 13 disabled subscription - alter subscription disable. > - changed both port to run pg_upgrade. > - stop services for both 13 e 14. > - /usr/lib/postgresql/14/bin/pg_upgrade -b /usr/lib/postgresql/13/bin -B /usr/lib/postgresql/14/bin -d /etc/postgresql/13/main/-D /etc/postgresql/14/main/ -j 2 --link -p 9999 -P 9998 -U postgres -v > - when finished upgrade process, we removed version 13 and ran vacuumdb -p 9998 -U postgres --all --analyze-in-stages > - last step was to enable that subscription. > - just wait for the subscriber to get the data changed, pg_upgrade ran for 15 minutes, this should be synced in a few seconds... > - few seconds later we remembered that some other tables were created on publication server, so we did a refresh publication. > > Then, some minutes later we got lots of log entries "duplicate key value violates unique constraint pk..." because it wastrying to COPY that table from master. > > We disable subscription again until we solve, as remains. > > Selecting from pg_subscription_rel all old tables are with srsubstate i for initialize, not s for synchronized or r forready, as they should. And all srsublsn of these records were null, so it lost synchronization coordination for all tableswhich existed before this upgrade process. > The reason is after an upgrade, there won't be any data in pg_subscription_rel, and only when you tried to refresh it is trying to sync again which leads to the "duplicate key value ..." problem you are seeing. > So, my first question is, as our publication server continues running, lots of updates were processed, so how can I synchronizeboth sides without recreating that publication ? > Don't you want to eventually upgrade the publisher node as well? You can refer to blog [1] for the detailed steps. > And my second question is, is this problem documented ? Is this problem expected to happen ? > Yes, the way you are doing I think it is bound to happen. There is some discussion about why this is happening in email [2]. AFAIK, it is not documented and if so, I think it will be a good idea to document it. [1] - https://elephanttamer.net/?p=58 [2] - https://www.postgresql.org/message-id/CALDaNm2-SRGHK0rqJQu7rGiS4hDAb7Nib5HbojEN5ubaXGs2CA%40mail.gmail.com -- With Regards, Amit Kapila.
The reason is after an upgrade, there won't be any data in
pg_subscription_rel, and only when you tried to refresh it is trying
to sync again which leads to the "duplicate key value ..." problem you
are seeing.
So, is pg_upgrade populating pg_subscription and not pg_subscription_rel ? It is doing 50% of his job ?
Don't you want to eventually upgrade the publisher node as well? You
can refer to blog [1] for the detailed steps.
It is possible but I don´t think changing publisher will solve anything, will ?
Yes, the way you are doing I think it is bound to happen. There is
some discussion about why this is happening in email [2]. AFAIK, it is
not documented and if so, I think it will be a good idea to document
And my problem remains the same, how to solve it ? All records on pg_subscription_rel are initialize with srsubstate null. How can I replay only updates since yesterday. This replication is a auditing database, so I cannot loose all things happened since that pg_upgrade. [1] points me how to upgrade but if I did the wrong way, how to solve that ?
On Thu, Nov 25, 2021 at 8:00 PM Marcos Pegoraro <marcos@f10.com.br> wrote: >> >> Yes, the way you are doing I think it is bound to happen. There is >> some discussion about why this is happening in email [2]. AFAIK, it is >> not documented and if so, I think it will be a good idea to document >> > And my problem remains the same, how to solve it ? All records on pg_subscription_rel are initialize with srsubstate null.How can I replay only updates since yesterday. This replication is a auditing database, so I cannot loose all thingshappened since that pg_upgrade. [1] points me how to upgrade but if I did the wrong way, how to solve that ? > AFAIU the main problem in your case is that you didn't block the write traffic on the publisher side. Let me try to understand the situation. After the upgrade is finished, there are some new tables with data on the publisher, and did old tables have any additional data? Are the contents in pg_replication_origin intact after the upgrade? So, in short, I think what we need to solve is to get the data from new tables and newly performed writes on old tables. I could think of the following two approaches: Approach-1: 1. Drop subscription and Truncate all tables corresponding to subscription. 2. Create a new subscription for the publication. I think this will be quite neat and there would be no risk of data loss but it could be time-consuming since all the data from previous tables needs to be synced again. Approach-2: Here, I am assuming pg_replication_origin is intact. 1. Block new writes on the publisher-side. 2. Disable the existing subscription (say the name of the subscription is old_sub). 3. Drop the existing all tables publication. 4. Create two new publications, one for old tables (old_pub), and one for new tables (new_pub). 5. Create a new subscription corresponding to new_pub. 6. Remove the existing publication from old_sub and add the old_pub. 7. Enable the subscription. 8. Now, perform a refresh on old_sub. The benefit of Approach-1 is that you don't need to change anything on the publisher-side and it has very few steps. OTOH, in Approach-2, we can save the effort/time to re-sync the initial data for old tables but as there are a lot of things to be taken care there is always a chance of mistake and if that happens you might lose some data. In any case, before following any of these, I suggest creating a dummy setup that mimics your original setup, perform the above steps and ensure everything is fine, then only try the same steps in your main setup. -- With Regards, Amit Kapila.
AFAIU the main problem in your case is that you didn't block the write
traffic on the publisher side. Let me try to understand the situation.
After the upgrade is finished, there are some new tables with data on
the publisher, and did old tables have any additional data?
Correct.
Are the contents in pg_replication_origin intact after the upgrade?
Yes
So, in short, I think what we need to solve is to get the data from
new tables and newly performed writes on old tables. I could think of
the following two approaches:
Approach-1:
1. Drop subscription and Truncate all tables corresponding to subscription.
2. Create a new subscription for the publication.
If I drop subscription it will drop WAL ou publication side and I lost all changed data between the starting of pg_upgrade process and now.
My problem is not related with new tables, they will be copied fine because doesn´t exists any record on subscriber.
But old tables had records modified since that pg_upgrade process, that is my problem, only that.
My question remains the same, why pg_subscription_rel was not copied from previous version ?
If pg_upgrade would copy pg_replication_origin (it did) and these pg_subscription_rel (it didn´t) records from version 13 to 14, when I enable subscription it would start copying data from that point on, correct ?
On Fri, Nov 26, 2021 at 5:47 PM Marcos Pegoraro <marcos@f10.com.br> wrote: >> >> AFAIU the main problem in your case is that you didn't block the write >> traffic on the publisher side. Let me try to understand the situation. >> After the upgrade is finished, there are some new tables with data on >> the publisher, and did old tables have any additional data? > > Correct. >> >> >> Are the contents in pg_replication_origin intact after the upgrade? > > Yes >> >> >> So, in short, I think what we need to solve is to get the data from >> new tables and newly performed writes on old tables. I could think of >> the following two approaches: >> >> Approach-1: >> 1. Drop subscription and Truncate all tables corresponding to subscription. >> >> 2. Create a new subscription for the publication. > > If I drop subscription it will drop WAL ou publication side and I lost all changed data between the starting of pg_upgradeprocess and now. > I think you can disable the subscription as well or before dropping disassociate the slot from subscription. > My problem is not related with new tables, they will be copied fine because doesn´t exists any record on subscriber. > But old tables had records modified since that pg_upgrade process, that is my problem, only that. > Yeah, I understand that point. Here, the problem is that both old and new tables belong to the same publication, and you can't refresh some tables from the publication. > My question remains the same, why pg_subscription_rel was not copied from previous version ? > > If pg_upgrade would copy pg_replication_origin (it did) and these pg_subscription_rel (it didn´t) records from version13 to 14, when I enable subscription it would start copying data from that point on, correct ? > I think we don't want to make assumptions about the remote end being the same after the upgrade and we let users reactivate the subscriptions in a suitable way. See [1] (Start reading from "..When dumping logical replication subscriptions..") In your case, if you wouldn't have allowed new tables in the publication then a simple Alter Subscription <sub_name> Refresh Publication with (copy_data = false) would have served the purpose. BTW, just for records, this problem has nothing to do with any changes in PG-14, the same behavior exists in the previous versions as well. [1] - https://www.postgresql.org/docs/devel/app-pgdump.html -- With Regards, Amit Kapila.
I think we don't want to make assumptions about the remote end being
the same after the upgrade and we let users reactivate the
subscriptions in a suitable way. See [1] (Start reading from "..When
dumping logical replication subscriptions..") In your case, if you
wouldn't have allowed new tables in the publication then a simple
Alter Subscription <sub_name> Refresh Publication with (copy_data =
false) would have served the purpose.
I understand that this is not related with version 14, pg_upgrade would do the same steps on previous versions too.
Additionally it would be interesting to document that pg_upgrade does not upgrade completely if the server is a subscriber of logical replication, so it´ll have pre and post steps to do if the server has this kind of replication.
On Fri, Nov 26, 2021 at 5:47 PM Marcos Pegoraro <marcos@f10.com.br> wrote: >> >> So, in short, I think what we need to solve is to get the data from >> new tables and newly performed writes on old tables. I could think of >> the following two approaches: >> >> Approach-1: >> 1. Drop subscription and Truncate all tables corresponding to subscription. >> >> 2. Create a new subscription for the publication. > > If I drop subscription it will drop WAL ou publication side and I lost all changed data between the starting of pg_upgradeprocess and now. > On thinking about this point again, it is not clear to me why that would matter for this particular use case? Basically, when you create a new subscription, it should copy the entire existing data from the table directly and then will decode changes from WAL. So, I think in your case all the changes between pg_upgrade and now should be directly copied from tables, so probably older WAL won't be required. -- With Regards, Amit Kapila.
On thinking about this point again, it is not clear to me why that
would matter for this particular use case? Basically, when you create
a new subscription, it should copy the entire existing data from the
table directly and then will decode changes from WAL. So, I think in
your case all the changes between pg_upgrade and now should be
directly copied from tables, so probably older WAL won't be required.
Maybe you did not understand
Production server cannot stop while I upgrade my subscriber server, so it will be creating WAL continuously.
Subscriber server has trigger functions for auditing on all tables, something like ...
insert into auditable(schemaname, tablename, primarykey, operation, olddata, newdata) values(tg_table_schema, tg_table_name, getpk(new), tg_op, row_to_json(old), row_to_json(new))
Then, all changes between pg_upgrade and now will not be inserted into auditable.
On Mon, Nov 29, 2021 at 5:04 PM Marcos Pegoraro <marcos@f10.com.br> wrote: >> >> On thinking about this point again, it is not clear to me why that >> would matter for this particular use case? Basically, when you create >> a new subscription, it should copy the entire existing data from the >> table directly and then will decode changes from WAL. So, I think in >> your case all the changes between pg_upgrade and now should be >> directly copied from tables, so probably older WAL won't be required. > > > Maybe you did not understand > Yeah, because some information like trigger functions was not there in your previous emails. -- With Regards, Amit Kapila.
Sorry, I didn´t explain exactly what I was doing, I just wrote "This replication is a auditing database" on my second email.
Atenciosamente,
Em seg., 29 de nov. de 2021 às 09:20, Amit Kapila <amit.kapila16@gmail.com> escreveu:
On Mon, Nov 29, 2021 at 5:04 PM Marcos Pegoraro <marcos@f10.com.br> wrote:
>>
>> On thinking about this point again, it is not clear to me why that
>> would matter for this particular use case? Basically, when you create
>> a new subscription, it should copy the entire existing data from the
>> table directly and then will decode changes from WAL. So, I think in
>> your case all the changes between pg_upgrade and now should be
>> directly copied from tables, so probably older WAL won't be required.
>
>
> Maybe you did not understand
>
Yeah, because some information like trigger functions was not there in
your previous emails.
--
With Regards,
Amit Kapila.