Thread: Single transaction in the tablesync worker?
Thread: Single transaction in the tablesync worker?

The tablesync worker in logical replication performs the table data sync in a single transaction, which means it will copy the initial data and then catch up with the apply worker in the same transaction. There is a comment in LogicalRepSyncTableStart ("We want to do the table data sync in a single transaction.") saying so, but I can't find the concrete reasoning behind it. Is there any fundamental problem if we commit the transaction after the initial copy and slot creation in LogicalRepSyncTableStart and then allow the apply of transactions as it happens in the apply worker? I have tried doing so in the attached (a quick prototype to test) and didn't find any problems with the regression tests. I have tried a few manual tests as well and didn't find any problem there either. Now, it is quite possible that it is mandatory to do it the way we are doing currently, or maybe something else is required to remove this requirement, but I think we can do better with respect to the comments in this area. (A sketch of the current sequence follows below.)

The reason why I am looking into this area is to support the logical decoding of prepared transactions. See the problem [1] reported by Peter Smith. Basically, when we stream prepared transactions in the tablesync worker, it will simply commit them due to the requirement of maintaining a single transaction for the entire duration of the copy and the streaming of transactions. Now, we can fix that problem by disabling the decoding of prepared xacts in the tablesync worker. But that will give rise to a different kind of problem: the prepare will not be sent by the publisher, but a later commit might move the LSN forward and allow the tablesync worker to catch up with the apply worker, so the prepared transaction will then be skipped by both the tablesync and apply workers.

I think that apart from unblocking the development of 'logical decoding of prepared xacts', this will make the code consistent between the apply and tablesync workers and reduce the chances of future bugs in this area. Basically, it will reduce the checks related to am_tablesync_worker() at various places in the code.

I see that this code was added as part of commit 7c4f52409a8c7d85ed169bbbc1f6092274d03920 (Logical replication support for initial data copy).

Thoughts?

[1] - https://www.postgresql.org/message-id/CAHut+PuEMk4SO8oGzxc_ftzPkGA8uC-y5qi-KRqHSy_P0i30DA@mail.gmail.com

--
With Regards,
Amit Kapila.
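To make the sequence under discussion concrete, this is roughly what the tablesync worker drives on its publisher connection today (a simplified sketch; slot and table names are illustrative, and the worker actually goes through the walreceiver API rather than issuing these commands by hand):

    BEGIN READ ONLY ISOLATION LEVEL REPEATABLE READ;
    -- temporary slot; USE_SNAPSHOT makes the slot's snapshot the
    -- transaction snapshot, so the COPY sees exactly the slot's start state
    CREATE_REPLICATION_SLOT "tbl_sync_slot" TEMPORARY LOGICAL pgoutput USE_SNAPSHOT;
    COPY public.mytbl TO STDOUT;
    COMMIT;
    -- catch-up then streams changes from the slot's consistent point
    START_REPLICATION SLOT "tbl_sync_slot" LOGICAL 0/0
        (proto_version '1', publication_names '"mypub"');

(The 0/0 stands in for the consistent point returned by CREATE_REPLICATION_SLOT.) On the subscriber side, the copy and the entire catch-up apply currently happen inside one local transaction, which is the behavior being questioned above.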
On Thu, Dec 3, 2020 at 2:55 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> The tablesync worker in logical replication performs the table data sync in a single transaction, which means it will copy the initial data and then catch up with the apply worker in the same transaction. There is a comment in LogicalRepSyncTableStart ("We want to do the table data sync in a single transaction.") saying so, but I can't find the concrete reasoning behind it. Is there any fundamental problem if we commit the transaction after the initial copy and slot creation in LogicalRepSyncTableStart and then allow the apply of transactions as it happens in the apply worker? I have tried doing so in the attached (a quick prototype to test) and didn't find any problems with the regression tests. I have tried a few manual tests as well and didn't find any problem there either. Now, it is quite possible that it is mandatory to do it the way we are doing currently, or maybe something else is required to remove this requirement, but I think we can do better with respect to the comments in this area.

If we commit the initial copy, the data up to the initial copy's snapshot will be visible downstream. If we apply the changes by committing changes per transaction, the data visible to the other transactions will differ as the apply progresses. You haven't clarified whether we will respect the transaction boundaries in the apply log or not. I assume we will. Whereas if we apply all the changes in one go, other transactions either see the data before the resync or after it, without any intermediate states. That will not violate consistency, I think.

That's all I can think of as the reason behind doing a whole resync as a single transaction.

--
Best Wishes,
Ashutosh Bapat
On Thu, Dec 3, 2020 at 7:04 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
>
> If we commit the initial copy, the data up to the initial copy's snapshot will be visible downstream. If we apply the changes by committing changes per transaction, the data visible to the other transactions will differ as the apply progresses.

It is not clear what you mean by the above. The way you have written it, it appears you are saying that instead of copying the initial data, I am proposing to copy it transaction-by-transaction. But that is not the case. I am saying copy the initial data by using the REPEATABLE READ isolation level as we are doing now, commit it, and then process transaction-by-transaction till we reach the sync point (the point up to which the apply worker has already received the data). (A sketch follows below.)

> You haven't clarified whether we will respect the transaction boundaries in the apply log or not. I assume we will.

It will be transaction-by-transaction.

> Whereas if we apply all the changes in one go, other transactions either see the data before the resync or after it, without any intermediate states.

What is the problem even if the user is able to see the data after the initial copy?

> That will not violate consistency, I think.

I am not sure how consistency will be broken.

> That's all I can think of as the reason behind doing a whole resync as a single transaction.

Thanks for sharing your thoughts.

--
With Regards,
Amit Kapila.
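Restated as a sketch, the clarified proposal changes only where the local commits happen (illustrative, not the prototype itself):

    -- publisher connection, unchanged from today:
    BEGIN READ ONLY ISOLATION LEVEL REPEATABLE READ;
    CREATE_REPLICATION_SLOT "tbl_sync_slot" TEMPORARY LOGICAL pgoutput USE_SNAPSHOT;
    COPY public.mytbl TO STDOUT;
    COMMIT;

    -- subscriber side, proposed:
    -- 1. commit the local transaction right after the copy (the new part), then
    -- 2. apply each remote transaction in its own local transaction until the
    --    sync point (the apply worker's position) is reached.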
On Thu, 3 Dec 2020 at 17:25, Amit Kapila <amit.kapila16@gmail.com> wrote:

> Is there any fundamental problem if we commit the transaction after the initial copy and slot creation in LogicalRepSyncTableStart and then allow the apply of transactions as it happens in the apply worker?

No fundamental problem. Both approaches are fine. Committing the initial copy and then doing the rest in individual txns means an incomplete sync state for the table becomes visible, which may not be ideal. Ideally we'd do something like sync the data into a clone of the table, then swap the table relfilenodes out once we're synced up.

IMO the main advantage of committing as we go is that it would let us use a non-temporary slot and support recovering an incomplete sync and finishing it after interruption by connection loss, crash, etc. That would be advantageous for big table syncs or where the sync has lots of lag to replay. But it means we have to remember sync states, and give users a way to cancel/abort them. Otherwise forgotten temp slots for syncs will cause a mess on the upstream.

It also allows the sync slot to advance, freeing any held upstream resources before the whole sync is done, which is good if the upstream is busy and generating lots of WAL.

Finally, committing as we go means we won't exceed the cid increment limit in a single txn.

> The reason why I am looking into this area is to support the logical decoding of prepared transactions. See the problem [1] reported by Peter Smith. Basically, when we stream prepared transactions in the tablesync worker, it will simply commit them due to the requirement of maintaining a single transaction for the entire duration of the copy and the streaming of transactions. Now, we can fix that problem by disabling the decoding of prepared xacts in the tablesync worker.

Tablesync should indeed only receive a txn when the commit arrives; it should not attempt to handle uncommitted prepared xacts.

> But that will give rise to a different kind of problem: the prepare will not be sent by the publisher, but a later commit might move the LSN forward and allow the tablesync worker to catch up with the apply worker, so the prepared transaction will then be skipped by both the tablesync and apply workers.

I'm not sure I understand. If what you describe is possible then there's already a bug in prepared xact handling. Prepared xact commit progress should be tracked by commit lsn, not by prepare lsn.

Can you set out the ordering of events in more detail?

> I think that apart from unblocking the development of 'logical decoding of prepared xacts', this will make the code consistent between the apply and tablesync workers and reduce the chances of future bugs in this area. Basically, it will reduce the checks related to am_tablesync_worker() at various places in the code.

I think we made similar changes in pglogical to switch to applying sync work in individual txns.
On Fri, Dec 4, 2020 at 7:53 AM Craig Ringer <craig.ringer@enterprisedb.com> wrote:
>
> Tablesync should indeed only receive a txn when the commit arrives; it should not attempt to handle uncommitted prepared xacts.

Why? If we go with the approach of committing as we go for individual transactions in the tablesync worker, then this shouldn't be a problem.

> I'm not sure I understand. If what you describe is possible then there's already a bug in prepared xact handling. Prepared xact commit progress should be tracked by commit lsn, not by prepare lsn.

Oh no, I am talking about the commit of some other transaction.

> Can you set out the ordering of events in more detail?

Sure. It will be something like the following, where the apply worker is ahead of the sync worker. Assume t1 has some data which the tablesync worker has to first copy.

tx1
BEGIN;
INSERT INTO t1 ...;
PREPARE TRANSACTION 'foo';

tx2
BEGIN;
INSERT INTO t1 ...;
COMMIT;

apply worker
• tx1: replays; does not apply anything because should_apply_changes_for_rel thinks the relation is not ready
• tx2: replays; does not apply anything because should_apply_changes_for_rel thinks the relation is not ready

tablesync worker
• tx1: handles BEGIN - INSERT - PREPARE TRANSACTION 'foo' (but tablesync gets nothing because, say, we disable 2PC for it)
• tx2: handles BEGIN - INSERT - COMMIT
• tablesync exits

Now the situation is that the apply worker has skipped the prepared xact's data and the tablesync worker has not received it, so it has not applied it. Next, when we get COMMIT PREPARED for tx1, it will silently commit the prepared transaction without any data being updated. The COMMIT PREPARED won't error out on the subscriber because the prepare would have been successful even though the data was skipped via should_apply_changes_for_rel.

> I think we made similar changes in pglogical to switch to applying sync work in individual txns.

oh, cool. Did you make some additional changes as you have mentioned in the earlier part of the email?

--
With Regards,
Amit Kapila.
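If it helps, the end state of the above scenario can be observed on the subscriber like this (a hypothetical illustration, assuming two-phase decoding stays enabled for the apply worker):

    -- subscriber, after tablesync exits and tx1's prepare has been replayed:
    SELECT gid FROM pg_prepared_xacts;   -- 'foo' is there, but carries no changes

    -- publisher: COMMIT PREPARED 'foo';

    -- subscriber, after the commit prepared is replayed:
    SELECT * FROM t1;                    -- tx1's row never shows up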
On Fri, Dec 4, 2020 at 7:53 AM Craig Ringer <craig.ringer@enterprisedb.com> wrote:
>
> No fundamental problem. Both approaches are fine. Committing the initial copy and then doing the rest in individual txns means an incomplete sync state for the table becomes visible, which may not be ideal. Ideally we'd do something like sync the data into a clone of the table, then swap the table relfilenodes out once we're synced up.
>
> IMO the main advantage of committing as we go is that it would let us use a non-temporary slot and support recovering an incomplete sync and finishing it after interruption by connection loss, crash, etc. That would be advantageous for big table syncs or where the sync has lots of lag to replay. But it means we have to remember sync states, and give users a way to cancel/abort them. Otherwise forgotten temp slots for syncs will cause a mess on the upstream.
>
> It also allows the sync slot to advance, freeing any held upstream resources before the whole sync is done, which is good if the upstream is busy and generating lots of WAL.
>
> Finally, committing as we go means we won't exceed the cid increment limit in a single txn.

Yeah, all these are advantages of processing transaction-by-transaction. IIUC, we primarily need to do two things to achieve it: one is to have an additional state in the catalog (say 'catchup') which will say that the initial copy is done; then we need to use a permanent slot, with which we can track the progress, so that after a restart (due to crash, connection break, etc.) we can start from the appropriate position.

Apart from the above, I think with the current design of tablesync we can see partial data of transactions, because we allow all the tablesync workers to run in parallel. Consider the below scenario:

CREATE TABLE mytbl1(id SERIAL PRIMARY KEY, somedata int, text varchar(120));
CREATE TABLE mytbl2(id SERIAL PRIMARY KEY, somedata int, text varchar(120));

Tx1
BEGIN;
INSERT INTO mytbl1(somedata, text) VALUES (1, 1);
INSERT INTO mytbl2(somedata, text) VALUES (1, 1);
COMMIT;

CREATE PUBLICATION mypublication FOR TABLE mytbl;

CREATE SUBSCRIPTION mysub
       CONNECTION 'host=localhost port=5432 dbname=postgres'
       PUBLICATION mypublication;

Tx2
BEGIN;
INSERT INTO mytbl1(somedata, text) VALUES (1, 2);
INSERT INTO mytbl2(somedata, text) VALUES (1, 2);
COMMIT;

Tx3
BEGIN;
INSERT INTO mytbl1(somedata, text) VALUES (1, 3);
INSERT INTO mytbl2(somedata, text) VALUES (1, 3);
COMMIT;

Now, I could see the below results on the subscriber:

postgres=# select * from mytbl1;
 id | somedata | text
----+----------+------
(0 rows)

postgres=# select * from mytbl2;
 id | somedata | text
----+----------+------
  1 |        1 | 1
  2 |        1 | 2
  3 |        1 | 3
(3 rows)

Basically, the results for Tx1, Tx2, and Tx3 are visible for mytbl2 but not for mytbl1. To reproduce this I have stopped the tablesync workers (via debugger) for mytbl1 and mytbl2 in LogicalRepSyncTableStart before it changes the relstate to SUBREL_STATE_SYNCWAIT. Then I allowed Tx2 and Tx3 to be processed by the apply worker, and then allowed the tablesync worker for mytbl2 to proceed. After that, I can see the above state.
Now, won't this behavior be considered a transaction inconsistency, where partial transaction data or later transaction data is visible? I don't think we can have such a situation on the master (publisher) node or on a physical standby.

--
With Regards,
Amit Kapila.
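For anyone reproducing this, the per-relation sync states can be watched on the subscriber while the workers are held in the debugger (the persistent states in pg_subscription_rel are 'i' = init, 'd' = data copy, 's' = sync done, 'r' = ready):

    SELECT srrelid::regclass AS relation, srsubstate
      FROM pg_subscription_rel
     ORDER BY 1;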
On Fri, Dec 4, 2020 at 10:29 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> Yeah, all these are advantages of processing transaction-by-transaction. IIUC, we primarily need to do two things to achieve it: one is to have an additional state in the catalog (say 'catchup') which will say that the initial copy is done; then we need to use a permanent slot, with which we can track the progress, so that after a restart (due to crash, connection break, etc.) we can start from the appropriate position.
>
> Apart from the above, I think with the current design of tablesync we can see partial data of transactions, because we allow all the tablesync workers to run in parallel. Consider the below scenario:
>
> CREATE TABLE mytbl1(id SERIAL PRIMARY KEY, somedata int, text varchar(120));
> CREATE TABLE mytbl2(id SERIAL PRIMARY KEY, somedata int, text varchar(120));
>
> Tx1
> BEGIN;
> INSERT INTO mytbl1(somedata, text) VALUES (1, 1);
> INSERT INTO mytbl2(somedata, text) VALUES (1, 1);
> COMMIT;
>
> CREATE PUBLICATION mypublication FOR TABLE mytbl;

oops, the above statement should be

CREATE PUBLICATION mypublication FOR TABLE mytbl1, mytbl2;

--
With Regards,
Amit Kapila.
On Fri, Dec 4, 2020 at 10:29 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> Apart from the above, I think with the current design of tablesync we can see partial data of transactions, because we allow all the tablesync workers to run in parallel. Consider the below scenario:
> ..
> ..
> Basically, the results for Tx1, Tx2, and Tx3 are visible for mytbl2 but not for mytbl1. To reproduce this I have stopped the tablesync workers (via debugger) for mytbl1 and mytbl2 in LogicalRepSyncTableStart before it changes the relstate to SUBREL_STATE_SYNCWAIT. Then I allowed Tx2 and Tx3 to be processed by the apply worker, and then allowed the tablesync worker for mytbl2 to proceed. After that, I can see the above state.
>
> Now, won't this behavior be considered a transaction inconsistency, where partial transaction data or later transaction data is visible? I don't think we can have such a situation on the master (publisher) node or on a physical standby.

On briefly checking the pglogical code [1], it seems this problem won't be there in pglogical, because it seems to first copy all the tables (via pglogical_sync_table) in one process and then catch up with the apply worker in a transaction-by-transaction manner. Am I reading it correctly? If so, then why did we follow a different approach for the in-core solution? Or is it that pglogical has improved over time and those improvements couldn't all be implemented in-core because of some missing features?

[1] - https://github.com/2ndQuadrant/pglogical

--
With Regards,
Amit Kapila.
On Thu, Dec 3, 2020 at 7:24 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Dec 3, 2020 at 7:04 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
> >
> > If we commit the initial copy, the data up to the initial copy's snapshot will be visible downstream. If we apply the changes by committing changes per transaction, the data visible to the other transactions will differ as the apply progresses.
>
> It is not clear what you mean by the above. The way you have written it, it appears you are saying that instead of copying the initial data, I am proposing to copy it transaction-by-transaction. But that is not the case. I am saying copy the initial data by using the REPEATABLE READ isolation level as we are doing now, commit it, and then process transaction-by-transaction till we reach the sync point (the point up to which the apply worker has already received the data).

Craig in his mail has clarified this. The changes after the initial COPY will be visible before the table sync catches up.

> > You haven't clarified whether we will respect the transaction boundaries in the apply log or not. I assume we will.
>
> It will be transaction-by-transaction.
>
> > Whereas if we apply all the changes in one go, other transactions either see the data before the resync or after it, without any intermediate states.
>
> What is the problem even if the user is able to see the data after the initial copy?
>
> > That will not violate consistency, I think.
>
> I am not sure how consistency will be broken.

Some of the transactions applied by the apply worker may not have been applied by the resync, and vice versa. If the intermediate states of the table resync worker are visible, this difference in applied transactions will result in a loss of consistency whenever those transactions change the table being resynced and some other table in the same transaction: the changes won't be atomically visible. Thinking more about this, this problem exists today for a table being resynced, but at least it's only the table being resynced that is behind the other tables, so it's predictable.

--
Best Wishes,
Ashutosh Bapat
On Fri, Dec 4, 2020 at 7:12 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
>
> Craig in his mail has clarified this. The changes after the initial COPY will be visible before the table sync catches up.

I think the problem is not that the changes are visible after COPY; rather, it is that we don't have a mechanism to restart if it crashes after COPY, unless we do the whole sync in one transaction. Assume we commit after COPY and then process transaction-by-transaction, and it errors out (due to connection loss) or crashes in between one of the transactions that follow the COPY; after the restart we won't know where to start from for that relation. This is because the catalog (pg_subscription_rel) will show the state as 'd' (data is being copied) and the slot would be gone, as it was a temporary slot. But as mentioned in one of my emails above [1], we can solve these problems, which Craig also seems to be advocating for, as there are many advantages of not doing the entire sync (initial copy + stream changes for that relation) in one single transaction. It will allow us to support decoding of prepared xacts in the subscriber. Also, it seems pglogical already does the processing transaction-by-transaction after the initial copy.
The only thing which is not clear to me is why we didn't decide to go this way initially, and it would probably be better if the original authors also chimed in to clarify it.

> Some of the transactions applied by the apply worker may not have been applied by the resync, and vice versa. If the intermediate states of the table resync worker are visible, this difference in applied transactions will result in a loss of consistency whenever those transactions change the table being resynced and some other table in the same transaction: the changes won't be atomically visible. Thinking more about this, this problem exists today for a table being resynced, but at least it's only the table being resynced that is behind the other tables, so it's predictable.

Yeah, I have already shown [1] that this problem exists today, and it won't be predictable when more tables need to be synced. I am not sure why, but it seems it was acceptable to the original authors that the data of transactions is partially visible during the initial synchronization phase for a subscription. I don't see it documented clearly either.

[1] - https://www.postgresql.org/message-id/CAA4eK1Ld9XaLoTZCoKF_gET7kc1fDf8CPR3CM48MQb1N1jDLYg%40mail.gmail.com

--
With Regards,
Amit Kapila.
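To spell out the restart gap, this is roughly what the two sides would look like after such a crash (the slot naming pattern is illustrative):

    -- subscriber: the relation is still marked 'd' (data is being copied)
    SELECT srsubstate FROM pg_subscription_rel
     WHERE srrelid = 'mytbl1'::regclass;

    -- publisher: the temporary sync slot died with the connection,
    -- so nothing records how far the catch-up had progressed
    SELECT slot_name FROM pg_replication_slots
     WHERE slot_name LIKE 'pg\_%\_sync\_%';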
On Sat, 5 Dec 2020, 10:03 Amit Kapila, <amit.kapila16@gmail.com> wrote:
> On Fri, Dec 4, 2020 at 7:12 PM Ashutosh Bapat
> <ashutosh.bapat.oss@gmail.com> wrote:
> >
> > On Thu, Dec 3, 2020 at 7:24 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Thu, Dec 3, 2020 at 7:04 PM Ashutosh Bapat
> > > <ashutosh.bapat.oss@gmail.com> wrote:
> > > >
> > > > On Thu, Dec 3, 2020 at 2:55 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > >
> > > > > The tablesync worker in logical replication performs the table data
> > > > > sync in a single transaction which means it will copy the initial data
> > > > > and then catch up with apply worker in the same transaction. There is
> > > > > a comment in LogicalRepSyncTableStart ("We want to do the table data
> > > > > sync in a single transaction.") saying so but I can't find the
> > > > > concrete theory behind the same. Is there any fundamental problem if
> > > > > we commit the transaction after initial copy and slot creation in
> > > > > LogicalRepSyncTableStart and then allow the apply of transactions as
> > > > > it happens in apply worker? I have tried doing so in the attached (a
> > > > > quick prototype to test) and didn't find any problems with regression
> > > > > tests. I have tried a few manual tests as well to see if it works and
> > > > > didn't find any problem. Now, it is quite possible that it is
> > > > > mandatory to do the way we are doing currently, or maybe something
> > > > > else is required to remove this requirement but I think we can do
> > > > > better with respect to comments in this area.
> > > >
> > > > If we commit the initial copy, the data up to the initial copy's
> > > > snapshot will be visible downstream. If we apply the changes by
> > > > committing changes per transaction, the data visible to the other
> > > > transactions will differ as the apply progresses.
> > > >
> > >
> > > It is not clear what you mean by the above. The way you have written
> > > appears that you are saying that instead of copying the initial data,
> > > I am saying to copy it transaction-by-transaction. But that is not the
> > > case. I am saying copy the initial data by using REPEATABLE READ
> > > isolation level as we are doing now, commit it and then process
> > > transaction-by-transaction till we reach sync-point (point till where
> > > apply worker has already received the data).
> >
> > Craig in his mail has clarified this. The changes after the initial
> > COPY will be visible before the table sync catches up.
> >
> I think the problem is not that the changes are visible after COPY
> rather it is that we don't have a mechanism to restart if it crashes
> after COPY unless we do all the sync up in one transaction. Assume we
> commit after COPY and then process transaction-by-transaction and it
> errors out (due to connection loss) or crashes, in-between one of the
> following transactions after COPY then after the restart we won't know
> from where to start for that relation. This is because the catalog
> (pg_subscription_rel) will show the state as 'd' (data is being
> copied) and the slot would have gone as it was a temporary slot. But
> as mentioned in one of my emails above [1] we can solve these problems
> which Craig also seems to be advocating for as there are many
> advantages of not doing the entire sync (initial copy + stream changes
> for that relation) in one single transaction. It will allow us to
> support decode of prepared xacts in the subscriber. Also, it seems
> pglogical already does processing transaction-by-transaction after the
> initial copy. The only thing which is not clear to me is why we
> haven't decided to go ahead initially and it would be probably better
> if the original authors would also chime-in to at least clarify the
> same.
It's partly a resource management issue.
Replication origins are a limited resource. We need to use a replication origin for any sync we want to be durable across restarts.
Then again, so are slots, and we use temp slots for each sync.

If a sync fails, cleanup on the upstream side is simple with a temp slot. With persistent slots we have more risk of creating upstream issues. But then, so long as the subscriber exists, it can deal with that. And if the subscriber no longer exists, its primary slot is an issue too.
It'd help if we could register pg_shdepend entries between catalog entries and slots, and from a main subscription slot to any extra slots used for resynchronization.
And I should write a patch for a resource retention summarisation view.
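Until then, the pieces such a view would summarise can at least be queried by hand, roughly like this (a sketch, not the proposed view itself):

    -- slots, and how much WAL each one pins on this server:
    SELECT slot_name, active, restart_lsn,
           pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn))
               AS retained_wal
      FROM pg_replication_slots;

    -- replication origins in use (also a limited resource):
    SELECT external_id, remote_lsn, local_lsn
      FROM pg_replication_origin_status;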
> I am not sure why but it seems acceptable to original authors that the
> data of transactions are visibly partially during the initial
> synchronization phase for a subscription.
I don't think there's much alternative there.
Pg would need some kind of cross-commit visibility control mechanism that separates durable commit from visibility.
Hi, I wanted to float another idea to solve these tablesync/apply worker problems. This idea may or may not have merit. Please consider it.

~

Basically, I was wondering why the "tablesync" worker can't just gather messages in a similar way to how the current streaming feature gathers messages into a "changes" file, so that they can be replayed later.

e.g. Imagine if:

A) The "tablesync" worker (after the COPY) does not ever apply any of the incoming messages, but instead just gobbles them into a "changes" file until it decides it has reached SYNCDONE state and exits.

B) Then, when the "apply" worker proceeds, if it detects the existence of the "changes" file, it will replay/apply_dispatch all those gobbled messages before just continuing as normal.

So:
- IIUC this kind of replay is like how the current code applies the streamed "changes" file at stream commit.
- The "tablesync" worker would only be doing table sync (COPY), as its name suggests. Any detected "changes" are recorded and left for the "apply" worker to handle.
- The "tablesync" worker would just operate in a single tx with a temporary slot, as per the current code.
- The "apply" worker would then be the *only* worker that actually applies anything (as its name suggests).

Thoughts?

---
Kind Regards,
Peter Smith.
Fujitsu Australia
On Mon, Dec 7, 2020 at 6:20 AM Craig Ringer <craig.ringer@enterprisedb.com> wrote:
>
> It's partly a resource management issue.
>
> Replication origins are a limited resource. We need to use a replication origin for any sync we want to be durable across restarts.
>
> Then again, so are slots, and we use temp slots for each sync.
>
> If a sync fails, cleanup on the upstream side is simple with a temp slot. With persistent slots we have more risk of creating upstream issues. But then, so long as the subscriber exists, it can deal with that. And if the subscriber no longer exists, its primary slot is an issue too.

I think if the only issue is slot cleanup, then the same exists today for the slot created by the apply worker (or what I think you are referring to as the primary slot). This can only happen if the subscriber goes away without dropping the subscription. Also, if we are worried about using up too many slots, then the slots used by tablesync workers will probably be freed sooner.

> It'd help if we could register pg_shdepend entries between catalog entries and slots, and from a main subscription slot to any extra slots used for resynchronization.

Which catalog entries are you referring to here?

> And I should write a patch for a resource retention summarisation view.

That would be great.

> > I am not sure why but it seems acceptable to original authors that the data of transactions are visibly partially during the initial synchronization phase for a subscription.
>
> I don't think there's much alternative there.

I am not sure about this. I think it is primarily to allow some more parallelism among the apply and sync workers. One primitive way to achieve parallelism and not have this problem is to allow the apply worker to wait till all the tablesync workers are in DONE state. Then we will never have the inconsistency problem or the prepared xact problem.
Now, surely, if large copies are required for multiple relations, then we would delay the replay of transactions by the apply worker a bit, but I don't know how much that matters as compared to the transaction visibility issue, and anyway we would have achieved the maximum parallelism by allowing the copy via multiple workers.

--
With Regards,
Amit Kapila.
On Mon, 7 Dec 2020 at 11:44, Peter Smith <smithpb2250@gmail.com> wrote:
> Basically, I was wondering why can't the "tablesync" worker just
> gather messages in a similar way to how the current streaming feature
> gathers messages into a "changes" file, so that they can be replayed
> later.
See the related thread "Logical archiving"

https://www.postgresql.org/message-id/20D9328B-A189-43D1-80E2-EB25B9284AD6@yandex-team.ru

where I addressed some parts of this topic in detail earlier today.
> A) The "tablesync" worker (after the COPY) does not ever apply any of
> the incoming messages, but instead it just gobbles them into a
> "changes" file until it decides it has reached SYNCDONE state and
> exits.
This has a few issues.
Most importantly, the sync worker must cooperate with the main apply worker to achieve a consistent end-of-sync cutover. The sync worker must have replayed the pending changes in order to make this cut-over, because the non-sync apply worker will need to start applying changes on top of the resync'd table potentially as soon as the next transaction it starts applying, so it needs to see the rows there.
Doing this would also add another round of write multiplication since the data would get spooled then applied to WAL then heap. Write multiplication is already an issue for logical replication so adding to it isn't particularly desirable without a really compelling reason. With the write multiplication comes disk space management issues for big transactions as well as the obvious performance/throughput impact.
It adds even more latency between upstream commit and downstream apply, something that is again already an issue for logical replication.
Right now we don't have any concept of a durable and locally flushed spool.
It's not impossible to do as you suggest but the cutover requirement makes it far from simple. As discussed in the logical archiving thread I think it'd be good to have something like this, and there are times the write multiplication price would be well worth paying. But it's not easy.
> B) Then, when the "apply" worker proceeds, if it detects the existence
> of the "changes" file it will replay/apply_dispatch all those gobbled
> messages before just continuing as normal.
That's going to introduce a really big stall in the apply worker's progress in many cases. During that time it won't be receiving from upstream (since we don't spool logical changes to disk at this time) so the upstream lag will grow. That will impact synchronous replication, pg_wal size management, catalog bloat, etc. It'll also leave the upstream logical decoding session idle, so when it resumes it may create a spike of I/O and CPU load as it catches up, as well as a spike of network traffic. And depending on how close the upstream write rate is to the max decode speed, network throughput max, and downstream apply speed max, it may take some time to catch up over the resulting lag.
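For a sense of scale, the lag each logical slot's consumer holds upstream is the kind of thing you can already watch on the publisher (illustrative):

    SELECT slot_name,
           pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(),
                                          confirmed_flush_lsn)) AS unapplied
      FROM pg_replication_slots
     WHERE slot_type = 'logical';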
Not a big fan of that approach.
On Mon, Dec 7, 2020 at 10:02 AM Craig Ringer <craig.ringer@enterprisedb.com> wrote:
>
> On Mon, 7 Dec 2020 at 11:44, Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > Basically, I was wondering why the "tablesync" worker can't just gather messages in a similar way to how the current streaming feature gathers messages into a "changes" file, so that they can be replayed later.
>
> See the related thread "Logical archiving"
>
> https://www.postgresql.org/message-id/20D9328B-A189-43D1-80E2-EB25B9284AD6@yandex-team.ru
>
> where I addressed some parts of this topic in detail earlier today.
>
> > A) The "tablesync" worker (after the COPY) does not ever apply any of the incoming messages, but instead just gobbles them into a "changes" file until it decides it has reached SYNCDONE state and exits.
>
> This has a few issues.
>
> Most importantly, the sync worker must cooperate with the main apply worker to achieve a consistent end-of-sync cutover.

In this idea, there is no need to change the end-of-sync cutover. It will work as it does now. I am not sure what makes you think otherwise.

> The sync worker must have replayed the pending changes in order to make this cut-over, because the non-sync apply worker will need to start applying changes on top of the resync'd table potentially as soon as the next transaction it starts applying, so it needs to see the rows there.

The change here would be that the apply worker will check for the changes file, and if it exists, apply the changes in it before it changes the relstate to SUBREL_STATE_READY in process_syncing_tables_for_apply(). So, it will not miss seeing any rows.

> Doing this would also add another round of write multiplication since the data would get spooled then applied to WAL then heap. Write multiplication is already an issue for logical replication so adding to it isn't particularly desirable without a really compelling reason.

It will solve our problem of allowing decoding of prepared xacts in pgoutput. I have explained the problem above [1]. The other idea which we discussed is to allow an additional state in pg_subscription_rel, make the slot permanent in the tablesync worker, and then process transaction-by-transaction in the apply worker. Does that approach sound better? Is there any bigger change involved in this approach (making the tablesync slot permanent) which I am missing?

> With the write multiplication comes disk space management issues for big transactions as well as the obvious performance/throughput impact.
>
> It adds even more latency between upstream commit and downstream apply, something that is again already an issue for logical replication.
>
> Right now we don't have any concept of a durable and locally flushed spool.

I think we have a concept quite close to it for writing the changes of in-progress xacts, as done in PG-14. It is not durable, but that shouldn't be a big problem if we allow syncing the changes file.

> It's not impossible to do as you suggest but the cutover requirement makes it far from simple. As discussed in the logical archiving thread I think it'd be good to have something like this, and there are times the write multiplication price would be well worth paying. But it's not easy.
>
> > B) Then, when the "apply" worker proceeds, if it detects the existence of the "changes" file, it will replay/apply_dispatch all those gobbled messages before just continuing as normal.
>
> That's going to introduce a really big stall in the apply worker's progress in many cases.
> During that time it won't be receiving from upstream (since we don't spool logical changes to disk at this time) so the upstream lag will grow. That will impact synchronous replication, pg_wal size management, catalog bloat, etc. It'll also leave the upstream logical decoding session idle, so when it resumes it may create a spike of I/O and CPU load as it catches up, as well as a spike of network traffic. And depending on how close the upstream write rate is to the max decode speed, network throughput max, and downstream apply speed max, it may take some time to catch up over the resulting lag.

This is just for the initial tablesync phase. I think it is equivalent to saying that during basebackup, we need to start physical replication in parallel. I agree that sometimes it can take a lot of time to copy large tables, but it will be just one time and no worse than other situations like basebackup.

[1] - https://www.postgresql.org/message-id/CAA4eK1KFsjf6x-S7b0dJLvEL3tcn9x-voBJiFoGsccyH5xgDzQ%40mail.gmail.com

--
With Regards,
Amit Kapila.
On Mon, Dec 7, 2020 at 9:21 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Dec 7, 2020 at 6:20 AM Craig Ringer <craig.ringer@enterprisedb.com> wrote:
> >
> > > I am not sure why but it seems acceptable to original authors that the data of transactions are visibly partially during the initial synchronization phase for a subscription.
> >
> > I don't think there's much alternative there.
>
> I am not sure about this. I think it is primarily to allow some more parallelism among the apply and sync workers. One primitive way to achieve parallelism and not have this problem is to allow the apply worker to wait till all the tablesync workers are in DONE state.

As the slot of the apply worker is created before all the tablesync workers, it should never miss any LSN which the tablesync workers would have processed. Also, the tablesync workers should not process any xact if the apply worker has not processed anything. I think tablesync currently always processes one transaction (because we call process_sync_tables at the commit of a txn) even if that is not required to be in sync with the apply worker. This should solve both problems: (a) visibility of partial transactions, and (b) allowing prepared transactions, because the tablesync worker no longer needs to combine multiple transactions' data.

I think the other advantage of this would be that it would reduce the load (both CPU and I/O) on the publisher side by decoding the data only once, instead of once for each tablesync worker and separately for the apply worker. It will use fewer resources to finish the work.

Is there any flaw in this idea which I am missing?

--
With Regards,
Amit Kapila.
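In that primitive scheme, the apply worker's gate would amount to waiting until something like this returns zero (a sketch only; the real check would use the in-memory sync states, not a query):

    SELECT count(*)
      FROM pg_subscription_rel
     WHERE srsubstate NOT IN ('s', 'r');   -- not yet SYNCDONE or READY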
On Mon, Dec 7, 2020 at 2:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Dec 7, 2020 at 9:21 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > I am not sure about this. I think it is primarily to allow some more parallelism among the apply and sync workers. One primitive way to achieve parallelism and not have this problem is to allow the apply worker to wait till all the tablesync workers are in DONE state.
>
> As the slot of the apply worker is created before all the tablesync workers, it should never miss any LSN which the tablesync workers would have processed. Also, the tablesync workers should not process any xact if the apply worker has not processed anything. I think tablesync currently always processes one transaction (because we call process_sync_tables at the commit of a txn) even if that is not required to be in sync with the apply worker.

One more thing to consider here is that currently in the tablesync worker we create a slot with the CRS_USE_SNAPSHOT option, which creates a transaction snapshot on the publisher, and then we use the same snapshot for the copy from the publisher. After this, when we try to receive the data from the publisher using the same slot, it will be in sync with the COPY. I think to keep the same consistency between the COPY and the data we receive from the publisher in this approach, we need to export the snapshot while creating the slot in the apply worker by using CRS_EXPORT_SNAPSHOT, and then use that same snapshot in all the tablesync workers doing the copy. In the tablesync workers, we can use the SET TRANSACTION SNAPSHOT command after "BEGIN READ ONLY ISOLATION LEVEL REPEATABLE READ" to achieve it. That way the COPY will use the same snapshot as is used for receiving the changes in the apply worker, and the data will be in sync. (A sketch follows below.)

--
With Regards,
Amit Kapila.
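For illustration, the handover could look roughly like this (slot name and snapshot identifier are made up; the workers would drive this through the walreceiver API):

    -- apply worker, on its replication connection:
    CREATE_REPLICATION_SLOT "mysub_slot" LOGICAL pgoutput EXPORT_SNAPSHOT;
    -- the command returns a snapshot name, say '00000005-000A1B2C-1'

    -- each tablesync worker, on its own connection to the publisher:
    BEGIN READ ONLY ISOLATION LEVEL REPEATABLE READ;
    SET TRANSACTION SNAPSHOT '00000005-000A1B2C-1';
    COPY public.mytbl1 TO STDOUT;
    COMMIT;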
On Mon, Dec 7, 2020 at 7:49 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> As the slot of the apply worker is created before all the tablesync workers, it should never miss any LSN which the tablesync workers would have processed. Also, the tablesync workers should not process any xact if the apply worker has not processed anything. I think tablesync currently always processes one transaction (because we call process_sync_tables at the commit of a txn) even if that is not required to be in sync with the apply worker. This should solve both problems: (a) visibility of partial transactions, and (b) allowing prepared transactions, because the tablesync worker no longer needs to combine multiple transactions' data.
>
> I think the other advantage of this would be that it would reduce the load (both CPU and I/O) on the publisher side by decoding the data only once, instead of once for each tablesync worker and separately for the apply worker. It will use fewer resources to finish the work.

Yes, I observed this same behavior.

IIUC the only way for the tablesync worker to go from CATCHUP mode to SYNCDONE is via the call to process_sync_tables. But a side-effect of this is that when messages arrive during this CATCHUP phase, one tx will get handled by the tablesync worker before process_sync_tables() is ever encountered.

I have created and attached a simple patch which allows the tablesync worker to detect if there is anything to do *before* it enters the apply main loop. Calling process_sync_tables() before the apply main loop offers a quick way out, so the message handling will not be split unnecessarily between the workers.

~

The result of the patch is demonstrated by the following tests/logs, which are also attached. Note: I added more logging (not in this patch) to make it easier to see what is going on.

LOGS1. Current code.
Test: 10 x INSERTs done at CATCHUP time.
Result: the tablesync worker does 1 x INSERT, then the apply worker skips 1 and does the remaining 9 x INSERTs.

LOGS2. Patched code.
Test: the same 10 x INSERTs done at CATCHUP time.
Result: the tablesync worker can exit early; the apply worker handles all 10 x INSERTs.

LOGS3. Patched code.
Test: 2PC PREPARE then COMMIT PREPARED [1] done at CATCHUP time.

psql -d test_pub -c "BEGIN;INSERT INTO test_tab VALUES(1, 'foo');PREPARE TRANSACTION 'test_prepared_tab';"
psql -d test_pub -c "COMMIT PREPARED 'test_prepared_tab';"

Result: the PREPARE and COMMIT PREPARED are both handled by the apply worker. This avoids the complications which the split otherwise causes.

[1] The 2PC prepare test requires the v29 patch from https://www.postgresql.org/message-id/flat/CAMGcDxeqEpWj3fTXwqhSwBdXd2RS9jzwWscO-XbeCfso6ts3%2BQ%40mail.gmail.com

---
Kind Regards,
Peter Smith.
Fujitsu Australia
On Tue, Dec 8, 2020 at 11:53 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Yes, I observed this same behavior.
>
> IIUC the only way for the tablesync worker to go from CATCHUP mode to SYNCDONE is via the call to process_sync_tables. But a side-effect of this is that when messages arrive during this CATCHUP phase, one tx will get handled by the tablesync worker before process_sync_tables() is ever encountered.
>
> I have created and attached a simple patch which allows the tablesync worker to detect if there is anything to do *before* it enters the apply main loop. Calling process_sync_tables() before the apply main loop offers a quick way out, so the message handling will not be split unnecessarily between the workers.

Yeah, this demonstrates that the idea can work, but as mentioned in my previous email [1], much more work is needed to make the COPY and the later fetching of changes from the publisher consistent.

So, let me summarize the discussion so far. We wanted to enhance the tablesync phase of Logical Replication to enable decoding of prepared transactions [2]. The problem was that when we stream prepared transactions in the tablesync worker, it will simply commit them due to the requirement of maintaining a single transaction for the entire duration of the copy and the streaming of transactions afterward. We can't simply disable the decoding of prepared xacts for tablesync workers because that can skip some of the prepared xacts forever on the subscriber, as explained in one of the emails above [3]. Now, while investigating the solutions to enhance tablesync to support decoding at prepare time, I found that due to the current design of tablesync we can see partial data of transactions on the subscriber, which is also explained in an email above with an example [4]. This problem of visibility has been there since Logical Replication was introduced in PostgreSQL, and the only answer I have got till now is that there doesn't seem to be any other alternative, which I think is not true, and I have provided one alternative as well.

Next, we have discussed three different solutions, all of which will solve the first problem (allow the tablesync worker to decode transactions at prepare time) and one of which solves both the first and the second problem (partial transaction data visibility).

Solution-1: Allow the tablesync worker to use multiple transactions. The reason for doing it in a single transaction is that if we commit after the initial COPY and then crash while streaming the changes of other transactions, the state of the table won't be known after the restart: as we are using a temporary slot, we don't know from where to restart syncing the table. IIUC, we primarily need to do two things to achieve multiple transactions: one is to have an additional state in the catalog (say 'catchup') which will say that the initial copy is done; then we need a permanent slot, using which we can track the progress so that after a restart (due to crash, connection break, etc.) we can start from the appropriate position. Now, this will allow us to do less work after recovering from a crash, because we will know the restart point. As Craig mentioned, it also allows the sync slot to advance, freeing any held upstream resources before the whole sync is done, which is good if the upstream is busy and generating lots of WAL. Finally, committing as we go means we won't exceed the cid increment limit in a single txn.
Solution-2: The next solution we discussed is to make the "tablesync" worker just gather messages after COPY in a similar way to how the current streaming of in-progress transactions feature gathers messages into a "changes" file, so that they can be replayed later by the apply worker. Now, as here we don't need to replay the individual transactions in the tablesync worker in a single transaction, it will allow us to decode the transactions at prepare time and send them to the subscriber. This has some disadvantages: each transaction processed by the tablesync worker needs to be durably written to a file, and it can also lead to some apply lag later when we process the same by the apply worker.

Solution-3: Allow the table-sync workers to just perform the initial COPY and then, once the COPY is done for all relations, the apply worker will stream all the future changes. Now, surely if large copies are required for multiple relations then we would delay a bit to replay transactions partially by the apply worker, but I don't know how much that matters as compared to the transaction visibility issue, and anyway we would have achieved the maximum parallelism by allowing copy via multiple workers. This would reduce the load (both CPU and I/O) on the publisher-side by allowing to decode the data only once instead of once for each table sync worker and separately for the apply worker. I think it will use fewer resources to finish the work.

Currently, in the tablesync worker, we create a slot with the CRS_USE_SNAPSHOT option which creates a transaction snapshot on the publisher, and then we use the same snapshot for COPY from the publisher. After this, when we try to receive the data from the publisher using the same slot, it will be in sync with the COPY. I think to keep the same consistency between COPY and the data we receive from the publisher in this approach, we need to export the snapshot while creating a slot in the apply worker by using CRS_EXPORT_SNAPSHOT and then use the same snapshot in all the tablesync workers doing the copy. In the tablesync workers, we can use the SET TRANSACTION SNAPSHOT command after "BEGIN READ ONLY ISOLATION LEVEL REPEATABLE READ" to use the exported snapshot (a sketch follows below). That way the COPY will use the same snapshot as is used for receiving the changes in the apply worker, and the data will be in sync.

Then we also need a way to export a snapshot while the apply worker is already receiving the changes, because users can use 'ALTER SUBSCRIPTION name REFRESH PUBLICATION' which allows new tables to be synced. I think we need to introduce a new command in exec_replication_command() to export the snapshot from the existing slot and then use it in the new tablesync worker.

Among the above three solutions, the first two will solve the first problem (allow the tablesync worker to decode transactions at prepare time) and the third solution will solve both the first and the second problem (partial transaction data visibility). The third solution requires quite some redesign of how the Logical Replication work is synchronized between the apply and tablesync workers and might turn out to be a bigger implementation effort. I am tentatively thinking to go with the first or second solution at this stage, and anyway if later people feel that we need some bigger redesign then we can go with something on the lines of Solution-3.

Thoughts?
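To illustrate the Solution-3 snapshot handling: the commands below exist today, but the slot name, snapshot name, and table are only examples:

    -- 1. The apply worker creates its slot on a replication connection and
    --    exports the snapshot instead of consuming it directly:
    CREATE_REPLICATION_SLOT sub_slot LOGICAL pgoutput EXPORT_SNAPSHOT
    --    (the result includes a snapshot name, e.g. '00000003-00000002-1')

    -- 2. Each tablesync worker then runs its COPY under exactly that snapshot:
    BEGIN READ ONLY ISOLATION LEVEL REPEATABLE READ;
    SET TRANSACTION SNAPSHOT '00000003-00000002-1';
    COPY public.test_tab TO STDOUT;
    COMMIT;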
[1] - https://www.postgresql.org/message-id/CAA4eK1%2BQC74wRQmbYT%2BMmOs%3DYbdUjuq0_A9CBbVoQMB1Ryi-OA%40mail.gmail.com [2] - https://www.postgresql.org/message-id/CAHut+PuEMk4SO8oGzxc_ftzPkGA8uC-y5qi-KRqHSy_P0i30DA@mail.gmail.com [3] - https://www.postgresql.org/message-id/CAA4eK1KFsjf6x-S7b0dJLvEL3tcn9x-voBJiFoGsccyH5xgDzQ%40mail.gmail.com [4] - https://www.postgresql.org/message-id/CAA4eK1Ld9XaLoTZCoKF_gET7kc1fDf8CPR3CM48MQb1N1jDLYg%40mail.gmail.com -- With Regards, Amit Kapila.
On Tue, Dec 8, 2020 at 9:14 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> Among the above three solutions, the first two will solve the first problem (allow the tablesync worker to decode transactions at prepare time) and the third solution will solve both the first and the second problem (partial transaction data visibility). The third solution requires quite some redesign of how the Logical Replication work is synchronized between the apply and tablesync workers and might turn out to be a bigger implementation effort. I am tentatively thinking to go with the first or second solution at this stage, and anyway if later people feel that we need some bigger redesign then we can go with something on the lines of Solution-3.
>
> Thoughts?

Hi Amit,

- Solution-3 has become too complicated to be attempted by me. Anyway, we may be better off just focusing on eliminating the new problems exposed by the 2PC work [1], rather than burning too much effort to fix some other quirk which has apparently existed for years.
[1] https://www.postgresql.org/message-id/CAHut%2BPtm7E5Jj92tJWPtnnjbNjJN60_%3DaGGKYW3h23b7J%3DqeDg%40mail.gmail.com

- Solution-2 has some potential lag problems, and maybe file resource problems as well. This idea did not get a very favourable response when I first proposed it.

- This leaves Solution-1 as the best viable option to fix the currently known 2PC trouble.

~~

So I will try to write a patch for the proposed Solution-1.

---
Kind Regards,
Peter Smith.
Fujitsu Australia
On Thu, Dec 10, 2020 at 3:19 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> - This leaves Solution-1 as the best viable option to fix the currently known 2PC trouble.
>
> ~~
>
> So I will try to write a patch for the proposed Solution-1.

Yeah, even I think that Solution-1 is best for solving the problem for 2PC.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Thu, Dec 10, 2020 at 8:49 PM Peter Smith <smithpb2250@gmail.com> wrote:
> So I will try to write a patch for the proposed Solution-1.
>

Hi Amit.

FYI, here is my v3 WIP patch for Solution-1. This patch applies onto the v30 patch set [1] from the other 2PC thread:
[1] https://www.postgresql.org/message-id/CAFPTHDYA8yE6tEmQ2USYS68kNt%2BkM%3DSwKgj%3Djy4AvFD5e9-UTQ%40mail.gmail.com

Although incomplete, it does continue to pass all of make check and the src/test/subscription TAP tests.

====

Coded / WIP:
* tablesync slot is now permanent instead of temporary
* the tablesync slot cleanup (drop) code is added for the DropSubscription and finish_sync_worker functions
* tablesync worker now allows multiple transactions instead of a single transaction
* a new state (SUBREL_STATE_COPYDONE) is persisted after a successful copy_table in LogicalRepSyncTableStart
* if a relaunched tablesync finds the state is SUBREL_STATE_COPYDONE then it will bypass the initial copy_table phase

TODO / Known Issues:
* The tablesync replication origin/lsn logic all needs to be updated so that tablesync knows where to restart based on information held by the now-permanent slot.
* The current implementation of tablesync drop slot (e.g. from DROP SUBSCRIPTION or finish_sync_worker) regenerates the tablesync slot name so it knows what slot to drop. The current code may be ok for normal use cases, but if there is an ALTER SUBSCRIPTION ... SET (slot_name = newname) it would fail to find the tablesync slot. Some redesign may be needed for this part.
* help / comments / cleanup
* There is temporary "!!>>" excessive logging of mine scattered around which I added to help my testing during development

---
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachment
Hi Amit.

PSA my v4 WIP patch for Solution-1. This patch applies onto the v30 patch set [1] from the other 2PC thread:
[1] https://www.postgresql.org/message-id/CAFPTHDYA8yE6tEmQ2USYS68kNt%2BkM%3DSwKgj%3Djy4AvFD5e9-UTQ%40mail.gmail.com

Although incomplete, it does still pass all of make check and the src/test/subscription TAP tests.

====

Coded / WIP:
* tablesync slot is now permanent instead of temporary
* the tablesync slot cleanup (drop) code is added for the DropSubscription and finish_sync_worker functions
* tablesync worker now allows multiple transactions instead of a single transaction
* a new state (SUBREL_STATE_COPYDONE) is persisted after a successful copy_table in LogicalRepSyncTableStart
* if a relaunched tablesync finds the state is SUBREL_STATE_COPYDONE then it will bypass the initial copy_table phase
* tablesync now sets up replication origin tracking in LogicalRepSyncTableStart (similar to what is done for the apply worker)
* tablesync replication origin tracking is cleaned up during DropSubscription and/or process_syncing_tables_for_apply

TODO / Known Issues:
* The current implementation of tablesync drop slot (e.g. from DropSubscription or finish_sync_worker) regenerates the tablesync slot name so it knows what slot to drop. The current code might be ok for normal use cases, but if there is an ALTER SUBSCRIPTION ... SET (slot_name = newname) it would fail to find the tablesync slot.
* I think if there are crashed tablesync workers then they are not known to DropSubscription. So this might be a problem for cleaning up slots and/or origin tracking belonging to those unknown workers.
* help / comments / cleanup
* There is temporary "!!>>" excessive logging of mine scattered around which I added to help my testing during development

---
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachment
On Fri, Dec 18, 2020 at 6:41 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> TODO / Known Issues:
>
> * The current implementation of tablesync drop slot (e.g. from DropSubscription or finish_sync_worker) regenerates the tablesync slot name so it knows what slot to drop.
>

If you always drop the slot at finish_sync_worker, then in which case do you need to drop it during DropSubscription? Is it for when the table sync workers have crashed?

> The current code might be ok for normal use cases, but if there is an ALTER SUBSCRIPTION ... SET (slot_name = newname) it would fail to find the tablesync slot.
>

Sure, but the same will be true for the apply worker slot as well. I agree the problem would be more for table sync workers, but I think we can solve it; see below.

> * I think if there are crashed tablesync workers then they are not known to DropSubscription. So this might be a problem for cleaning up slots and/or origin tracking belonging to those unknown workers.
>

Yeah, I think we can do two things to avoid this and the previous problem. (a) We can generate the slot_name for the table sync worker based on only the subscription_id and rel_id. (b) Immediately after creating the slot, advance the replication origin with the position (origin_startpos) we get from walrcv_create_slot; this will help us to start from the right location. (A sketch of both follows at the end of this mail.) Do you see anything which will still not be addressed after doing the above?

I understand why you are trying to create this patch atop the logical decoding of 2PC patch, but I think it is better to create this as an independent patch and then use it to test the 2PC problem. Also, please explain what kind of testing you did to ensure that it works properly after the table sync worker restarts after a crash.
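For concreteness, a rough sketch of (a) and (b); the functions used here exist in the backend, but the slot-name format and variable names are only illustrative:

    /* (a) Derive the slot name purely from stable identifiers, so any
     * process can recompute it later (format illustrative): */
    snprintf(syncslotname, NAMEDATALEN, "pg_%u_sync_%u",
             MyLogicalRepWorker->subid, MyLogicalRepWorker->relid);

    /* (b) Right after walrcv_create_slot() returns origin_startpos,
     * record it in the origin so a restart resumes from there: */
    originid = replorigin_create(originname);
    replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
                       true /* go backward */ , true /* WAL log */ );

--
With Regards,
Amit Kapila.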
On Sat, Dec 19, 2020 at 12:10 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Dec 18, 2020 at 6:41 PM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> I understand why you are trying to create this patch atop the logical decoding of 2PC patch, but I think it is better to create this as an independent patch and then use it to test the 2PC problem. Also, please explain what kind of testing you did to ensure that it works properly after the table sync worker restarts after a crash.
>

Few other comments:
==================

1.
+ * FIXME 3 - Crashed tablesync workers may also have remaining slots because I don't think
+ * such workers are even iterated by this loop, and nobody else is removing them.
+ */
+ if (slotname)
+ {

The above FIXME is not clear to me. Actually, the crashed workers should restart, finish their work, and drop the slots. So I am not sure what exactly this FIXME refers to.

2.
DropSubscription()
{
..
ReplicationSlotDropAtPubNode(
+ NULL,
+ conninfo, /* use conninfo to make a new connection. */
+ subname,
+ syncslotname);
..
}

With the above call, it will form a connection with the publisher and drop the required slots. I think we need to save the connection info so that we don't need to connect/disconnect for each slot to be dropped. Later in this function, we again connect and drop the apply worker slot. I think we should connect just once and drop the apply and table sync slots, if any.

3.
ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn_given, char *conninfo, char *subname, char *slotname)
{
..
+ PG_TRY();
..
+ PG_CATCH();
+ {
+ /* NOP. Just gobble any ERROR. */
+ }
+ PG_END_TRY();

Why are we suppressing the error instead of handling it in the same way as we do while dropping the apply worker slot in DropSubscription?

4.
@@ -139,6 +141,28 @@ finish_sync_worker(void)
get_rel_name(MyLogicalRepWorker->relid))));
CommitTransactionCommand();

+ /*
+ * Cleanup the tablesync slot.
+ */
+ {
+ extern void ReplicationSlotDropAtPubNode(
+ WalReceiverConn *wrconn_given, char *conninfo, char *subname, char *slotname);

This is not how we export functions elsewhere.
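For comment 4, the conventional way is to declare the function once in a header instead of using an inline extern; roughly as below, where the header location is just a suggestion and the signature is taken from your patch:

    /* e.g. in src/include/replication/worker_internal.h (location illustrative): */
    extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn_given,
                                             char *conninfo, char *subname,
                                             char *slotname);

--
With Regards,
Amit Kapila.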
Hi Amit.

PSA my v5 WIP patch for Solution-1. This patch still applies onto the v30 patch set [1] from the other 2PC thread:
[1] https://www.postgresql.org/message-id/CAFPTHDYA8yE6tEmQ2USYS68kNt%2BkM%3DSwKgj%3Djy4AvFD5e9-UTQ%40mail.gmail.com
(I understand you would like this to be delivered as a separate patch independent of v30. I will convert it ASAP.)

====

Coded / WIP:
* tablesync slot is now permanent instead of temporary. The tablesync slot name is no longer tied to the Subscription slot name.
* the tablesync slot cleanup (drop) code is added for the DropSubscription and finish_sync_worker functions
* tablesync worker now allows multiple transactions instead of a single transaction
* a new state (SUBREL_STATE_COPYDONE) is persisted after a successful copy_table in LogicalRepSyncTableStart
* if a relaunched tablesync finds the state is SUBREL_STATE_COPYDONE then it will bypass the initial copy_table phase
* tablesync sets up replication origin tracking in LogicalRepSyncTableStart (similar to what is done for the apply worker). The origin is advanced when first created.
* tablesync replication origin tracking is cleaned up during DropSubscription and/or process_syncing_tables_for_apply

TODO / Known Issues:
* I think if there are crashed tablesync workers they may not be known to the current DropSubscription code. This might be a problem for cleaning up slots and/or origin tracking belonging to those unknown workers.
* Help / comments / cleanup
* There is temporary "!!>>" excessive logging of mine scattered around which I added to help my testing during development
* Address review comments

---
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachment
On Sat, Dec 19, 2020 at 5:38 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Dec 18, 2020 at 6:41 PM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > TODO / Known Issues:
> >
> > * The current implementation of tablesync drop slot (e.g. from DropSubscription or finish_sync_worker) regenerates the tablesync slot name so it knows what slot to drop.
> >
>
> If you always drop the slot at finish_sync_worker, then in which case do you need to drop it during DropSubscription? Is it for when the table sync workers have crashed?

Yes. It is not the normal case. But if the tablesync never yet got to the SYNCDONE state (maybe it crashed) then finish_sync_worker may not be called. So I think a rogue tablesync slot might still exist during DropSubscription.

> > The current code might be ok for normal use cases, but if there is an ALTER SUBSCRIPTION ... SET (slot_name = newname) it would fail to find the tablesync slot.
> >
> Sure, but the same will be true for the apply worker slot as well. I agree the problem would be more for table sync workers, but I think we can solve it; see below.
>
> > * I think if there are crashed tablesync workers then they are not known to DropSubscription. So this might be a problem for cleaning up slots and/or origin tracking belonging to those unknown workers.
> >
> Yeah, I think we can do two things to avoid this and the previous problem. (a) We can generate the slot_name for the table sync worker based on only the subscription_id and rel_id. (b) Immediately after creating the slot, advance the replication origin with the position (origin_startpos) we get from walrcv_create_slot; this will help us to start from the right location.
>
> Do you see anything which will still not be addressed after doing the above?

(a) The V5 patch is updated as suggested.
(b) The V5 patch is updated as suggested. Now calling replorigin_advance. No problems seen so far. All TAP tests pass, but more testing is needed for the origin stuff.

> I understand why you are trying to create this patch atop the logical decoding of 2PC patch, but I think it is better to create this as an independent patch and then use it to test the 2PC problem.

OK. The latest patch still applies to v30 just for my convenience today, but I will head towards converting this to an independent patch ASAP.

> Also, please explain what kind of testing you did to ensure that it works properly after the table sync worker restarts after a crash.

So far I tested like this: I caused the tablesync to crash after COPYDONE (but before SYNCDONE) by sending a row to cause a PK violation while holding the tablesync at the CATCHUP state in the debugger. The tablesync then handles the insert, encounters the PK violation error, and re-launches. Then I can remove the extra row so the PK violation does not happen, and the (re-launched) tablesync can complete and finish normally. The apply worker then takes over. (A rough recipe is at the end of this mail.)

I have attached some captured/annotated logging of my test scenario which I ran using the V4 patch (the log has a lot of extra temporary output to help see what is going on).
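In outline, the recipe was roughly as follows, assuming the thread's test_tab has a primary key on its first column; the exact rows are illustrative, and the crash point relies on holding the tablesync worker at CATCHUP in the debugger:

    -- Subscriber: pre-insert a conflicting key while the tablesync worker
    -- is held at CATCHUP:
    INSERT INTO test_tab VALUES (1, 'conflict');
    -- Publisher: insert the same key; applying it makes the tablesync
    -- worker hit a PK violation, error out, and relaunch:
    INSERT INTO test_tab VALUES (1, 'foo');
    -- Subscriber: remove the conflicting row so the relaunched tablesync
    -- worker can apply the change and finish normally:
    DELETE FROM test_tab WHERE id = 1;

---
Kind Regards,
Peter Smith.
Fujitsu Australia.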
Attachment
On Mon, Dec 21, 2020 at 4:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> Few other comments:
> ==================

Thanks for your feedback.

> 1.
> + * FIXME 3 - Crashed tablesync workers may also have remaining slots because I don't think
> + * such workers are even iterated by this loop, and nobody else is removing them.
> + */
> + if (slotname)
> + {
>
> The above FIXME is not clear to me. Actually, the crashed workers should restart, finish their work, and drop the slots. So I am not sure what exactly this FIXME refers to.

Yes, normally if the tablesync can complete, it should behave like that. But I think there are other scenarios where it may be unable to clean up after itself. For example:

i) Maybe the crashed tablesync worker cannot finish. e.g. A row insert handled by tablesync can give a PK violation, which will also crash again and again for each re-launched/replacement tablesync worker. This can be reproduced in the debugger. If the DropSubscription doesn't clean up the tablesync's slot then nobody will.

ii) Also, the DROP SUBSCRIPTION code has locking (see the code comment) "to ensure that the launcher doesn't restart new worker during dropping the subscription". So executing DROP SUBSCRIPTION will prevent a newly crashed tablesync from re-launching, so it won't be able to take care of its own slot. If the DropSubscription doesn't clean up that tablesync's slot then nobody will.

> 2.
> DropSubscription()
> {
> ..
> ReplicationSlotDropAtPubNode(
> + NULL,
> + conninfo, /* use conninfo to make a new connection. */
> + subname,
> + syncslotname);
> ..
> }
>
> With the above call, it will form a connection with the publisher and drop the required slots. I think we need to save the connection info so that we don't need to connect/disconnect for each slot to be dropped. Later in this function, we again connect and drop the apply worker slot. I think we should connect just once and drop the apply and table sync slots, if any.

OK. IIUC this is a suggestion for more efficient connection usage, rather than an actual bug, right? I have added this suggestion to my TODO list.

> 3.
> ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn_given, char *conninfo, char *subname, char *slotname)
> {
> ..
> + PG_TRY();
> ..
> + PG_CATCH();
> + {
> + /* NOP. Just gobble any ERROR. */
> + }
> + PG_END_TRY();
>
> Why are we suppressing the error instead of handling it in the same way as we do while dropping the apply worker slot in DropSubscription?

This function is common - it is also called from the tablesync finish_sync_worker. But in the finish_sync_worker case I wanted to avoid throwing an ERROR, which would cause the tablesync to crash and relaunch (and crash/relaunch/repeat...) when all it was trying to do in the first place was just clean up and exit the process. Perhaps the error suppression should be conditional, depending on where this function is called from?

> 4.
> @@ -139,6 +141,28 @@ finish_sync_worker(void)
> get_rel_name(MyLogicalRepWorker->relid))));
> CommitTransactionCommand();
>
> + /*
> + * Cleanup the tablesync slot.
> + */
> + {
> + extern void ReplicationSlotDropAtPubNode(
> + WalReceiverConn *wrconn_given, char *conninfo, char *subname, char *slotname);
>
> This is not how we export functions elsewhere.

Fixed in the latest v5 patch - https://www.postgresql.org/message-id/CAHut%2BPvmDJ_EO11_up%3D_cRbOjhdWCMG-n7kF-mdRhjtCHcjHRA%40mail.gmail.com

----
Kind Regards,
Peter Smith.
Fujitsu Australia.
On Mon, Dec 21, 2020 at 3:17 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Yes, normally if the tablesync can complete, it should behave like that. But I think there are other scenarios where it may be unable to clean up after itself. For example:
>
> i) Maybe the crashed tablesync worker cannot finish. e.g. A row insert handled by tablesync can give a PK violation, which will also crash again and again for each re-launched/replacement tablesync worker. This can be reproduced in the debugger. If the DropSubscription doesn't clean up the tablesync's slot then nobody will.
>
> ii) Also, the DROP SUBSCRIPTION code has locking (see the code comment) "to ensure that the launcher doesn't restart new worker during dropping the subscription".
>

Yeah, I have also read that comment, but do you know how it is preventing relaunch? How does the subscription lock help?

> So executing DROP SUBSCRIPTION will prevent a newly crashed tablesync from re-launching, so it won't be able to take care of its own slot. If the DropSubscription doesn't clean up that tablesync's slot then nobody will.
>
> > 2.
> > With the above call, it will form a connection with the publisher and drop the required slots. I think we need to save the connection info so that we don't need to connect/disconnect for each slot to be dropped. Later in this function, we again connect and drop the apply worker slot. I think we should connect just once and drop the apply and table sync slots, if any.
>
> OK. IIUC this is a suggestion for more efficient connection usage, rather than an actual bug, right?
>

Yes, it is for effective connection usage.

> I have added this suggestion to my TODO list.
>
> > 3.
> > Why are we suppressing the error instead of handling it in the same way as we do while dropping the apply worker slot in DropSubscription?
>
> This function is common - it is also called from the tablesync finish_sync_worker. But in the finish_sync_worker case I wanted to avoid throwing an ERROR, which would cause the tablesync to crash and relaunch (and crash/relaunch/repeat...) when all it was trying to do in the first place was just clean up and exit the process. Perhaps the error suppression should be conditional, depending on where this function is called from?
>

Yeah, that could be one way, and if you follow my previous suggestion this function might change a bit more.

--
With Regards,
Amit Kapila.
Hi Amit.

PSA my v6 WIP patch for Solution-1. This patch still applies onto the v30 patch set [1] from the other 2PC thread:
[1] https://www.postgresql.org/message-id/CAFPTHDYA8yE6tEmQ2USYS68kNt%2BkM%3DSwKgj%3Djy4AvFD5e9-UTQ%40mail.gmail.com
(I understand you would like this to be delivered as a separate patch independent of v30. I will convert it ASAP.)

====

Coded / WIP:
* tablesync slot is now permanent instead of temporary. The tablesync slot name is no longer tied to the Subscription slot name.
* the tablesync slot cleanup (drop) code is added for the DropSubscription and finish_sync_worker functions
* tablesync worker now allows multiple transactions instead of a single transaction
* a new state (SUBREL_STATE_COPYDONE) is persisted after a successful copy_table in LogicalRepSyncTableStart
* if a relaunched tablesync finds the state is SUBREL_STATE_COPYDONE then it will bypass the initial copy_table phase
* tablesync sets up replication origin tracking in LogicalRepSyncTableStart (similar to what is done for the apply worker). The origin is advanced when first created.
* tablesync replication origin tracking is cleaned up during DropSubscription and/or process_syncing_tables_for_apply

TODO / Known Issues:
* Crashed tablesync workers may not be known to the current DropSubscription code. This might be a problem for cleaning up slots and/or origin tracking belonging to those unknown workers.
* There seems to be a race condition during DROP SUBSCRIPTION. It manifests as the TAP test 007 hanging. Logging shows it seems to happen during replorigin_drop when called from DropSubscription (see the sketch below). It is timing-related and quite rare - e.g. it happens only about once in every ten runs of the subscription TAP tests.
* Help / comments / cleanup
* There is temporary "!!>>" excessive logging of mine scattered around which I added to help my testing during development
* Address review comments
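For context, the origin cleanup involved is roughly as follows; the functions exist in the backend, but the origin-name format and variable names here are only illustrative of my WIP patch:

    /* Recompute the tablesync origin name and drop the origin, if any: */
    snprintf(originname, sizeof(originname), "pg_%u_%u", subid, relid);
    originid = replorigin_by_name(originname, true /* missing_ok */ );
    if (originid != InvalidRepOriginId)
        replorigin_drop(originid, false /* nowait */ );

---
Kind Regards,
Peter Smith.
Fujitsu Australia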
Attachment
On Mon, Dec 21, 2020 at 11:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > ii) Also, the DROP SUBSCRIPTION code has locking (see the code comment) "to ensure that the launcher doesn't restart new worker during dropping the subscription".
> >
> Yeah, I have also read that comment, but do you know how it is preventing relaunch? How does the subscription lock help?

Hmmm. I did see there is a matching lock in get_subscription_list of launcher.c, which may be what that code comment was referring to. But I am also currently unsure how this lock prevents anybody (e.g. process_syncing_tables_for_apply) from executing another logicalrep_worker_launch.

> > OK. IIUC this is a suggestion for more efficient connection usage, rather than an actual bug, right?
> >
> Yes, it is for effective connection usage.

I have addressed this in the latest patch [v6].

> > This function is common - it is also called from the tablesync finish_sync_worker. But in the finish_sync_worker case I wanted to avoid throwing an ERROR, which would cause the tablesync to crash and relaunch (and crash/relaunch/repeat...) when all it was trying to do in the first place was just clean up and exit the process. Perhaps the error suppression should be conditional, depending on where this function is called from?
> >
> Yeah, that could be one way, and if you follow my previous suggestion this function might change a bit more.

I have addressed this in the latest patch [v6].

---
[v6] https://www.postgresql.org/message-id/CAHut%2BPuCLty2HGNT6neyOcUmBNxOLo%3DybQ2Yv-nTR4kFY-8QLw%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia.
Hi Amit.

PSA my v7 WIP patch for Solution-1. This patch still applies onto the v30 patch set [1] from the other 2PC thread:
[1] https://www.postgresql.org/message-id/CAFPTHDYA8yE6tEmQ2USYS68kNt%2BkM%3DSwKgj%3Djy4AvFD5e9-UTQ%40mail.gmail.com
(I understand you would like this to be delivered as a separate patch independent of v30. I will convert it ASAP.)

====

Coded / WIP:
* tablesync slot is now permanent instead of temporary. The tablesync slot name is no longer tied to the Subscription slot name.
* the tablesync slot cleanup (drop) code is added for the DropSubscription and finish_sync_worker functions
* tablesync worker now allows multiple transactions instead of a single transaction
* a new state (SUBREL_STATE_COPYDONE) is persisted after a successful copy_table in LogicalRepSyncTableStart
* if a relaunched tablesync finds the state is SUBREL_STATE_COPYDONE then it will bypass the initial copy_table phase
* tablesync sets up replication origin tracking in LogicalRepSyncTableStart (similar to what is done for the apply worker). The origin is advanced when first created.
* tablesync replication origin tracking is cleaned up during DropSubscription and/or process_syncing_tables_for_apply
* The v7 DropSubscription cleanup code has been rewritten since v6. The subscription TAP tests have now been executed many (7) times without observing any of the race problems that I previously reported seeing when using the v6 patch.

TODO / Known Issues:
* Help / comments / cleanup
* There is temporary "!!>>" excessive logging scattered around which I added to help my testing during development
* Address review comments

---
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachment
Hi Amit.

PSA my v8 WIP patch for Solution-1. This has the same code changes as the v7 patch, but the v8 patch can be applied to the current PG OSS master code base.

====

Coded / WIP:
* tablesync slot is now permanent instead of temporary. The tablesync slot name is no longer tied to the Subscription slot name.
* the tablesync slot cleanup (drop) code is added for the DropSubscription and finish_sync_worker functions
* tablesync worker now allows multiple transactions instead of a single transaction
* a new state (SUBREL_STATE_COPYDONE) is persisted after a successful copy_table in LogicalRepSyncTableStart
* if a relaunched tablesync finds the state is SUBREL_STATE_COPYDONE then it will bypass the initial copy_table phase
* tablesync sets up replication origin tracking in LogicalRepSyncTableStart (similar to what is done for the apply worker). The origin is advanced when first created.
* tablesync replication origin tracking is cleaned up during DropSubscription and/or process_syncing_tables_for_apply
* The DropSubscription cleanup code was changed a lot in v7. The subscription TAP tests have now been executed six times without observing any of the race problems that were sometimes seen to happen with the v6 patch.

TODO / Known Issues:
* Help / comments
* There is temporary "!!>>" excessive logging scattered around which I added to help my testing during development
* Address review comments

---
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachment
On Wed, Dec 23, 2020 at 11:49 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi Amit.
>
> PSA my v7 WIP patch for Solution-1.
>

Few comments:
================

1.
+ * Rarely, the DropSubscription may be issued when a tablesync still
+ * is in SYNCDONE but not yet in READY state. If this happens then
+ * the drop slot could fail because it is already dropped.
+ * In this case suppress and drop slot error.
+ *
+ * FIXME - Is there a better way than this?
+ */
+ if (rstate->state != SUBREL_STATE_SYNCDONE)
+ PG_RE_THROW();

So, does this situation happen when we try to drop the subscription after the state is changed to SYNCDONE but not yet READY? If so, then can't we write a function GetSubscriptionNotDoneRelations, similar to GetSubscriptionNotReadyRelations, where we get a list of relations that are not yet in the done state? I think this should be safe because once we are here we shouldn't be allowed to start a new worker, and the old workers have already been stopped by this function.

2. Your changes in LogicalRepSyncTableStart() don't seem to be right. IIUC, you are copying the table in one transaction, then updating the state to SUBREL_STATE_COPYDONE in another transaction, and after that doing replorigin_advance. Consider what happens if we error out after the first txn is committed, in which we have copied the table: after the restart, it will again try to copy and lead to an error. Similarly, consider if we error out after the second transaction: we won't know where to start decoding from. I think all these should be done in a single transaction. (A sketch follows at the end of this mail.)
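In outline, I mean something like the following ordering; the functions named exist in the backend, but treat this as a sketch of the suggestion, not the final code:

    /* Do the copy, the state change, and the origin advance atomically: */
    StartTransactionCommand();
    copy_table(rel);
    UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
                               MyLogicalRepWorker->relid,
                               SUBREL_STATE_COPYDONE,
                               MyLogicalRepWorker->relstate_lsn);
    replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
                       true /* go backward */ , true /* WAL log */ );
    CommitTransactionCommand();

--
With Regards,
Amit Kapila.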
On Tue, Dec 22, 2020 at 4:58 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hmmm. I did see there is a matching lock in get_subscription_list of launcher.c, which may be what that code comment was referring to. But I am also currently unsure how this lock prevents anybody (e.g. process_syncing_tables_for_apply) from executing another logicalrep_worker_launch.
>

process_syncing_tables_for_apply will be called by the apply worker, and we are stopping the apply worker. So, after that the launcher won't start a new apply worker because of get_subscription_list(), and if the apply worker is not started then it won't be able to start a tablesync worker. So, we need the handling of crashed tablesync workers here such that we need to drop any new sync slots.

--
With Regards,
Amit Kapila.
Hi Amit.

PSA my v9 WIP patch for Solution-1, which addresses some recent review comments, plus other minor changes.

====

Features:
* tablesync slot is now permanent instead of temporary. The tablesync slot name is no longer tied to the Subscription slot name.
* the tablesync slot cleanup (drop) code is added for the DropSubscription and finish_sync_worker functions
* tablesync worker now allows multiple transactions instead of a single transaction
* a new state (SUBREL_STATE_COPYDONE) is persisted after a successful copy_table in LogicalRepSyncTableStart
* if a relaunched tablesync finds the state is SUBREL_STATE_COPYDONE then it will bypass the initial copy_table phase
* tablesync sets up replication origin tracking in LogicalRepSyncTableStart (similar to what is done for the apply worker). The origin is advanced when first created.
* tablesync replication origin tracking is cleaned up during DropSubscription and/or process_syncing_tables_for_apply
* The DropSubscription cleanup code was enhanced in v7 to take care of crashed sync workers.
* Minor updates to the PG docs

TODO / Known Issues:
* Source includes temporary "!!>>" excessive logging which I added to help testing during development
* Address review comments

---
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachment
On Wed, Dec 23, 2020 at 9:07 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> process_syncing_tables_for_apply will be called by the apply worker, and we are stopping the apply worker. So, after that the launcher won't start a new apply worker because of get_subscription_list(), and if the apply worker is not started then it won't be able to start a tablesync worker. So, we need the handling of crashed tablesync workers here such that we need to drop any new sync slots.

Yes, in the v6 patch code this was a problem in need of handling. But since the v7 patch, the DropSubscription code is now using a separate GetSubscriptionNotReadyRelations loop to handle the cleanup of potentially leftover slots from crashed tablesync workers (i.e. workers that never got to a READY state).

Kind Regards,
Peter Smith.
Fujitsu Australia
On Wed, Dec 23, 2020 at 8:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > 1. > + * Rarely, the DropSubscription may be issued when a tablesync still > + * is in SYNCDONE but not yet in READY state. If this happens then > + * the drop slot could fail because it is already dropped. > + * In this case suppress and drop slot error. > + * > + * FIXME - Is there a better way than this? > + */ > + if (rstate->state != SUBREL_STATE_SYNCDONE) > + PG_RE_THROW(); > > So, does this situation happens when we try to drop subscription after > the state is changed to syncdone but not syncready. If so, then can't > we write a function GetSubscriptionNotDoneRelations similar to > GetSubscriptionNotReadyRelations where we get a list of relations that > are not in done stage. I think this should be safe because once we are > here we shouldn't be allowed to start a new worker and old workers are > already stopped by this function. Yes, but I don't see how adding such a function is an improvement over the existing code: e.g.1. GetSubscriptionNotDoneRelations will include the READY state (which we don't want) just like GetSubscriptionNotReadyRelations includes the SYNCDONE state. e.g.2. Or, something like GetSubscriptionNotDoneAndNotReadyRelations would be an unnecessary overkill replacement for the current simple "if". AFAIK the code is OK as-is. That "FIXME" comment was really meant only to highlight this for review, rather than to imply something needed to be fixed. I have removed that "FIXME" comment in the latest patch [v9]. > > 2. Your changes in LogicalRepSyncTableStart() doesn't seem to be > right. IIUC, you are copying the table in one transaction, then > updating the state to SUBREL_STATE_COPYDONE in another transaction, > and after that doing replorigin_advance. Consider what happens if we > error out after the first txn is committed in which we have copied the > table. After the restart, it will again try to copy and lead to an > error. Similarly, consider if we error out after the second > transaction, we won't know where to start decoding from. I think all these > should be done in a single transaction. Fixed as suggested. Please see latest patch [v9] --- [v9] https://www.postgresql.org/message-id/CAHut%2BPv8ShLmrjCriVU%2Btprk_9b2kwBxYK2oGSn5Eb__kWVc7A%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
On Wed, Dec 30, 2020 at 11:51 AM Peter Smith <smithpb2250@gmail.com> wrote: > > On Wed, Dec 23, 2020 at 8:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > 1. > > + * Rarely, the DropSubscription may be issued when a tablesync still > > + * is in SYNCDONE but not yet in READY state. If this happens then > > + * the drop slot could fail because it is already dropped. > > + * In this case suppress and drop slot error. > > + * > > + * FIXME - Is there a better way than this? > > + */ > > + if (rstate->state != SUBREL_STATE_SYNCDONE) > > + PG_RE_THROW(); > > > > So, does this situation happens when we try to drop subscription after > > the state is changed to syncdone but not syncready. If so, then can't > > we write a function GetSubscriptionNotDoneRelations similar to > > GetSubscriptionNotReadyRelations where we get a list of relations that > > are not in done stage. I think this should be safe because once we are > > here we shouldn't be allowed to start a new worker and old workers are > > already stopped by this function. > > Yes, but I don't see how adding such a function is an improvement over > the existing code: > The advantage is that we don't need to use try..catch to deal with such conditions which I don't think is a good way to deal with such cases. Also, even after using try...catch, still, we can leak the slots because the patch drops the slot after changing the state to syncdone and if there is any error while dropping the slot, it simply skips it. So, it is possible that the rel state is syncdone but the slot still exists and we get an error due to some different reason, and then we will silently skip it again and allow the subscription to be dropped. I think instead what we should do is to drop the slot before we change the rel state to syncdone. Also, if the apply workers fail to drop the slot, it should try to again drop it after restart. In DropSubscription, we can then check if the rel state is not SYNC or READY, we can drop the corresponding slots. > e.g.1. GetSubscriptionNotDoneRelations will include the READY state > (which we don't want) just like GetSubscriptionNotReadyRelations > includes the SYNCDONE state. > e.g.2. Or, something like GetSubscriptionNotDoneAndNotReadyRelations > would be an unnecessary overkill replacement for the current simple > "if". > or we can probably modify the function as GetSubscriptionRelationsNotInStates and pass it an array of states which we don't want. > AFAIK the code is OK as-is. > As described above, there are still race conditions where we can leak slots and also this doesn't look clean. Few other comments: ================= 1. + elog(LOG, "!!>> DropSubscription: dropping the tablesync slot \"%s\".", syncslotname); + ReplicationSlotDropAtPubNode(wrconn, syncslotname); + elog(LOG, "!!>> DropSubscription: dropped the tablesync slot \"%s\".", syncslotname); ... ... + elog(LOG, "!!>> finish_sync_worker: dropping the tablesync slot \"%s\".", syncslotname); + ReplicationSlotDropAtPubNode(wrconn, syncslotname); + elog(LOG, "!!>> finish_sync_worker: dropped the tablesync slot \"%s\".", syncslotname); Remove these and other elogs added to aid debugging or testing. If you need these for development purposes then move these to separate patch. 2. Remove WIP from the commit message and patch name. -- With Regards, Amit Kapila.
Hi Amit. PSA my v10 patch for the Solution1. v10 is essentially the same as v9, except now all the temporary "!!>>" logging has been isolated to a separate (optional) patch 0002.

====

Features:

* tablesync slot is now permanent instead of temporary. The tablesync slot name is no longer tied to the Subscription slot name.
* the tablesync slot cleanup (drop) code is added for DropSubscription and for finish_sync_worker functions
* tablesync worker now allows multiple tx instead of a single tx
* a new state (SUBREL_STATE_COPYDONE) is persisted after a successful copy_table in LogicalRepSyncTableStart.
* if a re-launched tablesync finds the state is SUBREL_STATE_COPYDONE then it will bypass the initial copy_table phase.
* tablesync sets up replication origin tracking in LogicalRepSyncTableStart (similar to what is done for the apply worker). The origin is advanced when first created.
* tablesync replication origin tracking is cleaned up during DropSubscription and/or process_syncing_tables_for_apply.
* the DropSubscription cleanup code was enhanced (v7+) to take care of crashed sync workers.
* minor updates to PG docs

TODO / Known Issues:

* address review comments

---

Kind Regards,
Peter Smith.
Fujitsu Australia
Attachment
On Mon, Jan 4, 2021 at 8:06 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > Few other comments: > ================= > 1. > + elog(LOG, "!!>> DropSubscription: dropping the tablesync slot > \"%s\".", syncslotname); > + ReplicationSlotDropAtPubNode(wrconn, syncslotname); > + elog(LOG, "!!>> DropSubscription: dropped the tablesync slot > \"%s\".", syncslotname); > > ... > ... > > + elog(LOG, "!!>> finish_sync_worker: dropping the tablesync slot > \"%s\".", syncslotname); > + ReplicationSlotDropAtPubNode(wrconn, syncslotname); > + elog(LOG, "!!>> finish_sync_worker: dropped the tablesync slot > \"%s\".", syncslotname); > > Remove these and other elogs added to aid debugging or testing. If you > need these for development purposes then move these to separate patch. Fixed in latest patch (v10). > > 2. Remove WIP from the commit message and patch name. > > -- Fixed in latest patch (v10) --- v10 = https://www.postgresql.org/message-id/CAHut%2BPuzPmFzk3p4oL9H3nkiY6utFryV9c5dW6kRhCe_RY%3DgnA%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
On Mon, Jan 4, 2021 at 2:38 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > Few other comments: > ================= > Few more comments on v9: ====================== 1. + /* Drop the tablesync slot. */ + { + char *syncslotname = ReplicationSlotNameForTablesync(subid, relid); + + /* + * If the subscription slotname is NONE/NULL and the connection to publisher is + * broken, but the DropSubscription should still be allowed to complete. + * But without a connection it is not possible to drop any tablesync slots. + */ + if (!wrconn) + { + /* FIXME - OK to just log a warning? */ + elog(WARNING, "!!>> DropSubscription: no connection. Cannot drop tablesync slot \"%s\".", + syncslotname); + } Why is this not an ERROR? We don't want to keep the table slots lingering after DropSubscription. If there is any tablesync slot that needs to be dropped and the publisher is not available then we should raise an error. 2. + /* + * Tablesync resource cleanup (slots and origins). + * + * Any READY-state relations would already have dealt with clean-ups. + */ + { There is no need to start a separate block '{' here. 3. +#define SUBREL_STATE_COPYDONE 'C' /* tablesync copy phase is completed */ You can mention in the comments that sublsn will be NULL for this state as it is mentioned for other similar states. Can we think of using any letter in lower case for this as all other states are in lower-case except for this which makes it look a bit odd? We can use 'f' or 'e' and describe it as 'copy finished' or 'copy end'. I am fine if you have any better ideas. 4. LogicalRepSyncTableStart() { .. .. +copy_table_done: + + /* Setup replication origin tracking. */ + { + char originname[NAMEDATALEN]; + RepOriginId originid; + + snprintf(originname, sizeof(originname), "pg_%u_%u", MySubscription->oid, MyLogicalRepWorker->relid); + originid = replorigin_by_name(originname, true); + if (!OidIsValid(originid)) + { + /* + * Origin tracking does not exist. Create it now, and advance to LSN got from walrcv_create_slot. + */ + elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_create \"%s\".", originname); + originid = replorigin_create(originname); + elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_session_setup \"%s\".", originname); + replorigin_session_setup(originid); + replorigin_session_origin = originid; + elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_advance \"%s\".", originname); + replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr, + true /* go backward */ , true /* WAL log */ ); + } + else + { + /* + * Origin tracking already exists. + */ + elog(LOG, "!!>> LogicalRepSyncTableStart: 2 replorigin_session_setup \"%s\".", originname); + replorigin_session_setup(originid); + replorigin_session_origin = originid; + elog(LOG, "!!>> LogicalRepSyncTableStart: 2 replorigin_session_get_progress \"%s\".", originname); + *origin_startpos = replorigin_session_get_progress(false); + } .. .. } I am not sure if this code is correct because, for the very first time when the copy is done, we don't expect replication origin to exist whereas this code will silently use already existing replication origin which can lead to a wrong start position for the slot. In such a case it should error out. I guess we should create the replication origin before setting the state to copydone. I feel we should even have a test case for this as it is not difficult to have a pre-existing replication origin. 5.
Is it possible to write a testcase where we fail (say due to pk violation or some other error) after the initial copy is done, then remove the conflicting row and allow a copy to be completed? If we already have any such test then it is fine. 6. +/* + * Drop the replication slot at the publisher node + * using the replication connection. + */ This comment looks a bit odd. The first line appears to be too short. We have a limit of 80 chars but this is much less than that. 7. @@ -905,7 +905,7 @@ replorigin_advance(RepOriginId node, LWLockAcquire(&replication_state->lock, LW_EXCLUSIVE); /* Make sure it's not used by somebody else */ - if (replication_state->acquired_by != 0) + if (replication_state->acquired_by != 0 && replication_state->acquired_by != MyProcPid) { I think you won't need this change if you do replorigin_advance before replorigin_session_setup in your patch. 8. - * that ensures we won't loose knowledge about that after a crash if the + * that ensures we won't lose knowledge about that after a crash if the It is better to submit this as a separate patch. -- With Regards, Amit Kapila.
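(Putting together comment 2 from the earlier mail and comment 4 above, the suggested single-transaction ordering might look roughly like this. A sketch built from the calls quoted in this thread, with rel, originname, originid, and origin_startpos assumed to be in scope; not the actual patch:)

    /* Sketch: copy, origin creation, and the COPYDONE state change all in one
     * transaction, so an error cannot leave COPYDONE set without usable
     * origin tracking (or vice versa). */
    StartTransactionCommand();
    copy_table(rel);                          /* initial data copy */
    originid = replorigin_create(originname); /* errors if one already exists */
    replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
                       true /* go backward */ , true /* WAL log */ );
    replorigin_session_setup(originid);       /* per comment 7: set up after advancing */
    replorigin_session_origin = originid;
    UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
                               MyLogicalRepWorker->relid,
                               SUBREL_STATE_COPYDONE,
                               MyLogicalRepWorker->relstate_lsn);
    CommitTransactionCommand();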
On Wed, Dec 30, 2020 at 5:08 PM Peter Smith <smithpb2250@gmail.com> wrote: > > PSA my v9 WIP patch for the Solution1 which addresses some recent > review comments, and other minor changes. I did some tests using the test suite prepared by Erik Rijkers in [1] during the initial design of tablesync. Back then, they had seen some errors while doing multiple commits in initial tablesync. So I've rerun the test script on the v9 patch applied on HEAD and found no errors. The script runs pgbench, creates a pub/sub on a standby server, and all of the pgbench tables are replicated to the standby. The contents of the tables are compared at the end of each run to make sure they are identical. I have run it for around 12 hours, and it worked without any errors. Attaching the script I used. regards, Ajin Cherian Fujitsu Australia [1]- https://www.postgresql.org/message-id/93d02794068482f96d31b002e0eb248d%40xs4all.nl
Attachment
Hi Amit. PSA the v11 patch for the Tablesync Solution1.

Differences from v10:
- Addresses several recent review comments.
- pg_indent has been run

====

Features:

* tablesync slot is now permanent instead of temporary. The tablesync slot name is no longer tied to the Subscription slot name.
* the tablesync slot cleanup (drop) code is added for DropSubscription and for finish_sync_worker functions
* tablesync worker now allows multiple tx instead of a single tx
* a new state (SUBREL_STATE_COPYDONE) is persisted after a successful copy_table in LogicalRepSyncTableStart.
* if a re-launched tablesync finds the state is SUBREL_STATE_COPYDONE then it will bypass the initial copy_table phase.
* tablesync sets up replication origin tracking in LogicalRepSyncTableStart (similar to what is done for the apply worker). The origin is advanced when first created.
* tablesync replication origin tracking is cleaned up during DropSubscription and/or process_syncing_tables_for_apply.
* the DropSubscription cleanup code was enhanced (v7+) to take care of crashed sync workers.
* minor updates to PG docs

TODO / Known Issues:

* address review comments

---

Kind Regards,
Peter Smith.
Fujitsu Australia
Attachment
On Mon, Jan 4, 2021 at 10:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > Few more comments on v9: > ====================== > 1. > + /* Drop the tablesync slot. */ > + { > + char *syncslotname = ReplicationSlotNameForTablesync(subid, relid); > + > + /* > + * If the subscription slotname is NONE/NULL and the connection to publisher is > + * broken, but the DropSubscription should still be allowed to complete. > + * But without a connection it is not possible to drop any tablesync slots. > + */ > + if (!wrconn) > + { > + /* FIXME - OK to just log a warning? */ > + elog(WARNING, "!!>> DropSubscription: no connection. Cannot drop > tablesync slot \"%s\".", > + syncslotname); > + } > > Why is this not an ERROR? We don't want to keep the table slots > lingering after DropSubscription. If there is any tablesync slot that > needs to be dropped and the publisher is not available then we should > raise an error. Previously there was only the subscription slot. If the connection was broken and caused an error then it was still possible for the user to disassociate the subscription from the slot using ALTER SUBSCRIPTION ... SET (slot_name = NONE). And then (when the slotname is NULL) the DropSubscription could complete OK. I expect in that case the Admin still had some slot clean-up they would need to do on the Publisher machine. But now we have the tablesync slots so if I caused them to give ERROR when the connection is broken then the subscription would become un-droppable. If you think that having ERROR and an undroppable subscription is better than the current WARNING then please let me know - there is no problem to change it. > 2. > + /* > + * Tablesync resource cleanup (slots and origins). > + * > + * Any READY-state relations would already have dealt with clean-ups. > + */ > + { > > There is no need to start a separate block '{' here. Written this way so I can declare variables only at the scope they are needed. I didn’t see anything in the PG code conventions discouraging this practice: https://www.postgresql.org/docs/devel/source.html > 3. > +#define SUBREL_STATE_COPYDONE 'C' /* tablesync copy phase is completed */ > > You can mention in the comments that sublsn will be NULL for this > state as it is mentioned for other similar states. Can we think of > using any letter in lower case for this as all other states are in > lower-case except for this which makes it look a bit odd? We can use > 'f' or 'e' and describe it as 'copy finished' or 'copy end'. I am fine > if you have any better ideas. > > Fixed in latest patch [v11] > 4. > LogicalRepSyncTableStart() > { > .. > .. > +copy_table_done: > + > + /* Setup replication origin tracking. */ > + { > + char originname[NAMEDATALEN]; > + RepOriginId originid; > + > + snprintf(originname, sizeof(originname), "pg_%u_%u", > MySubscription->oid, MyLogicalRepWorker->relid); > + originid = replorigin_by_name(originname, true); > + if (!OidIsValid(originid)) > + { > + /* > + * Origin tracking does not exist. Create it now, and advance to LSN > got from walrcv_create_slot.
> + */ > + elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_create > \"%s\".", originname); > + originid = replorigin_create(originname); > + elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_session_setup > \"%s\".", originname); > + replorigin_session_setup(originid); > + replorigin_session_origin = originid; > + elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_advance > \"%s\".", originname); > + replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr, > + true /* go backward */ , true /* WAL log */ ); > + } > + else > + { > + /* > + * Origin tracking already exists. > + */ > + elog(LOG, "!!>> LogicalRepSyncTableStart: 2 replorigin_session_setup > \"%s\".", originname); > + replorigin_session_setup(originid); > + replorigin_session_origin = originid; > + elog(LOG, "!!>> LogicalRepSyncTableStart: 2 > replorigin_session_get_progress \"%s\".", originname); > + *origin_startpos = replorigin_session_get_progress(false); > + } > .. > .. > } > > I am not sure if this code is correct because, for the very first time > when the copy is done, we don't expect replication origin to exist > whereas this code will silently use already existing replication > origin which can lead to a wrong start position for the slot. In such > a case it should error out. I guess we should create the replication > origin before setting the state to copydone. I feel we should even have > a test case for this as it is not difficult to have a pre-existing > replication origin. > Fixed as suggested in latest patch [v11] > 5. Is it possible to write a testcase where we fail (say due to pk > violation or some other error) after the initial copy is done, then > remove the conflicting row and allow a copy to be completed? If we > already have any such test then it is fine. > Causing a PK violation during the initial copy is not a problem to test, but doing it after the initial copy is difficult. I have done exactly this test scenario before but I thought it cannot be automated. E.g. To cause a PK violation error somewhere between COPYDONE and SYNCDONE means that the offending insert (the one which tablesync will fail to replicate) has to be sent while the tablesync is in CATCHUP mode. But AFAIK that can only be achieved using the debugger to get the timing right. > 6. > +/* > + * Drop the replication slot at the publisher node > + * using the replication connection. > + */ > > This comment looks a bit odd. The first line appears to be too short. > We have a limit of 80 chars but this is much less than that. > Fixed in latest patch [v11] > 7. > @@ -905,7 +905,7 @@ replorigin_advance(RepOriginId node, > LWLockAcquire(&replication_state->lock, LW_EXCLUSIVE); > > /* Make sure it's not used by somebody else */ > - if (replication_state->acquired_by != 0) > + if (replication_state->acquired_by != 0 && > replication_state->acquired_by != MyProcPid) > { > TODO > I think you won't need this change if you do replorigin_advance before > replorigin_session_setup in your patch. > > 8. > - * that ensures we won't loose knowledge about that after a crash if the > + * that ensures we won't lose knowledge about that after a crash if the > > It is better to submit this as a separate patch. > Done. Please see CF entry. https://commitfest.postgresql.org/32/2926/ ---- [v11] = https://www.postgresql.org/message-id/CAHut%2BPu0A6TUPgYC-L3BKYQfa_ScL31kOV_3RsB3ActdkL1iBQ%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia.
On Tue, Jan 5, 2021 at 3:32 PM Peter Smith <smithpb2250@gmail.com> wrote: > > On Mon, Jan 4, 2021 at 10:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > Few more comments on v9: > > ====================== > > 1. > > + /* Drop the tablesync slot. */ > > + { > > + char *syncslotname = ReplicationSlotNameForTablesync(subid, relid); > > + > > + /* > > + * If the subscription slotname is NONE/NULL and the connection to publisher is > > + * broken, but the DropSubscription should still be allowed to complete. > > + * But without a connection it is not possible to drop any tablesync slots. > > + */ > > + if (!wrconn) > > + { > > + /* FIXME - OK to just log a warning? */ > > + elog(WARNING, "!!>> DropSubscription: no connection. Cannot drop > > tablesync slot \"%s\".", > > + syncslotname); > > + } > > > > Why is this not an ERROR? We don't want to keep the table slots > > lingering after DropSubscription. If there is any tablesync slot that > > needs to be dropped and the publisher is not available then we should > > raise an error. > > Previously there was only the subscription slot. If the connection was > broken and caused an error then it was still possible for the user to > disassociate the subscription from the slot using ALTER SUBSCRIPTION > ... SET (slot_name = NONE). And then (when the slotname is NULL) the > DropSubscription could complete OK. I expect in that case the Admin > still had some slot clean-up they would need to do on the Publisher > machine. > I think such an option could probably be used for user-created slots but it would be difficult for even Admin to know about these internally created slots associated with the particular subscription. I would say it is better to ERROR out. > > > 2. > > + /* > > + * Tablesync resource cleanup (slots and origins). > > + * > > + * Any READY-state relations would already have dealt with clean-ups. > > + */ > > + { > > > > There is no need to start a separate block '{' here. > > Written this way so I can declare variables only at the scope they are > needed. I didn’t see anything in the PG code conventions discouraging > this practice: https://www.postgresql.org/docs/devel/source.html > But, do we encourage such a coding convention to declare variables? I find it difficult to read such code. I guess as a one-off we can do this but I don't see a compelling need here. > > 3. > > +#define SUBREL_STATE_COPYDONE 'C' /* tablesync copy phase is completed */ > > > > You can mention in the comments that sublsn will be NULL for this > > state as it is mentioned for other similar states. Can we think of > > using any letter in lower case for this as all other states are in > > lower-case except for this which makes it look a bit odd? We can use > > 'f' or 'e' and describe it as 'copy finished' or 'copy end'. I am fine > > if you have any better ideas. > > > > Fixed in latest patch [v11] > It is still not reflected in the docs. See below: --- a/doc/src/sgml/catalogs.sgml +++ b/doc/src/sgml/catalogs.sgml @@ -7651,6 +7651,7 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l State code: <literal>i</literal> = initialize, <literal>d</literal> = data is being copied, + <literal>C</literal> = table data has been copied, <literal>s</literal> = synchronized, -- With Regards, Amit Kapila.
On Tue, Jan 5, 2021 at 10:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > 1. > > > + /* Drop the tablesync slot. */ > > > + { > > > + char *syncslotname = ReplicationSlotNameForTablesync(subid, relid); > > > + > > > + /* > > > + * If the subscription slotname is NONE/NULL and the connection to publisher is > > > + * broken, but the DropSubscription should still be allowed to complete. > > > + * But without a connection it is not possible to drop any tablesync slots. > > > + */ > > > + if (!wrconn) > > > + { > > > + /* FIXME - OK to just log a warning? */ > > > + elog(WARNING, "!!>> DropSubscription: no connection. Cannot drop > > > tablesync slot \"%s\".", > > > + syncslotname); > > > + } > > > > > > Why is this not an ERROR? We don't want to keep the table slots > > > lingering after DropSubscription. If there is any tablesync slot that > > > needs to be dropped and the publisher is not available then we should > > > raise an error. > > > > Previously there was only the subscription slot. If the connection was > > broken and caused an error then it was still possible for the user to > > disassociate the subscription from the slot using ALTER SUBSCRIPTION > > ... SET (slot_name = NONE). And then (when the slotname is NULL) the > > DropSubscription could complete OK. I expect in that case the Admin > > still had some slot clean-up they would need to do on the Publisher > > machine. > > > > I think such an option could probably be used for user-created slots > but it would be difficult for even Admin to know about these > internally created slots associated with the particular subscription. > I would say it is better to ERROR out. I am having doubts that ERROR is the best choice here. There is a long note in https://www.postgresql.org/docs/devel/sql-dropsubscription.html which describes this problem for the subscription slot and how to disassociate the name to give a workaround “To proceed in this situation”. OTOH if we make the tablesync slot unconditionally ERROR for a broken connection then there is no way to proceed, and the whole (slot_name = NONE) workaround becomes ineffectual. Note - the current patch code is only logging when the user has already disassociated the slot name; of course normally (when the slot name was not disassociated) table slots will give ERROR for broken connections. IMO, if the user has disassociated the slot name then they have already made their decision that they REALLY DO want to “proceed in this situation”. So I thought we should let them proceed. What do you think? ---- Kind Regards, Peter Smith. Fujitsu Australia.
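(The behaviour being debated, reduced to its core. A sketch of the v12-era logic using the names quoted above; not the actual patch:)

    /* Sketch: in DropSubscription, when slot_name = NONE was set and the
     * publisher is unreachable, a WARNING lets the drop proceed (the user
     * must then drop the tablesync slot manually on the publisher); an
     * unconditional ERROR would leave the subscription undroppable. */
    if (!wrconn)
        ereport(WARNING,
                (errmsg("could not connect to publisher; tablesync slot \"%s\" should be dropped manually",
                        syncslotname)));
    else
        ReplicationSlotDropAtPubNode(wrconn, syncslotname);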
On Wed, Jan 6, 2021 at 4:32 AM Peter Smith <smithpb2250@gmail.com> wrote: > > On Tue, Jan 5, 2021 at 10:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > 1. > > > > + /* Drop the tablesync slot. */ > > > > + { > > > > + char *syncslotname = ReplicationSlotNameForTablesync(subid, relid); > > > > + > > > > + /* > > > > + * If the subscription slotname is NONE/NULL and the connection to publisher is > > > > + * broken, but the DropSubscription should still be allowed to complete. > > > > + * But without a connection it is not possible to drop any tablesync slots. > > > > + */ > > > > + if (!wrconn) > > > > + { > > > > + /* FIXME - OK to just log a warning? */ > > > > + elog(WARNING, "!!>> DropSubscription: no connection. Cannot drop > > > > tablesync slot \"%s\".", > > > > + syncslotname); > > > > + } > > > > > > > > Why is this not an ERROR? We don't want to keep the table slots > > > > lingering after DropSubscription. If there is any tablesync slot that > > > > needs to be dropped and the publisher is not available then we should > > > > raise an error. > > > > > > Previously there was only the subscription slot. If the connection was > > > broken and caused an error then it was still possible for the user to > > > disassociate the subscription from the slot using ALTER SUBSCRIPTION > > > ... SET (slot_name = NONE). And then (when the slotname is NULL) the > > > DropSubscription could complete OK. I expect in that case the Admin > > > still had some slot clean-up they would need to do on the Publisher > > > machine. > > > > > > > I think such an option could probably be used for user-created slots > > but it would be difficult for even Admin to know about these > > internally created slots associated with the particular subscription. > > I would say it is better to ERROR out. > > I am having doubts that ERROR is the best choice here. There is a long > note in https://www.postgresql.org/docs/devel/sql-dropsubscription.html > which describes this problem for the subscription slot and how to > disassociate the name to give a workaround “To proceed in this > situation”. > > OTOH if we make the tablesync slot unconditionally ERROR for a broken > connection then there is no way to proceed, and the whole (slot_name = > NONE) workaround becomes ineffectual. Note - the current patch code is > only logging when the user has already disassociated the slot name; of > course normally (when the slot name was not disassociated) table slots > will give ERROR for broken connections. > > IMO, if the user has disassociated the slot name then they have > already made their decision that they REALLY DO want to “proceed in > this situation”. So I thought we should let them proceed. > Okay, if we want to go that way then we should add some documentation about it. Currently, the slot name used by apply worker is known to the user because either it is specified by the user or the default is subscription name, so the user can manually remove that slot later but that is not true for tablesync slots. I think we need to update both the Drop Subscription page [1] and logical-replication-subscription page [2] where we have mentioned temporary slots and in the end "Here are some scenarios: .." to mention about these slots and probably how their names are generated so that in such special situations users can drop them manually. [1] - https://www.postgresql.org/docs/devel/sql-dropsubscription.html [2] - https://www.postgresql.org/docs/devel/logical-replication-subscription.html -- With Regards, Amit Kapila.
On Tue, Jan 5, 2021 at 3:32 PM Peter Smith <smithpb2250@gmail.com> wrote: > > On Mon, Jan 4, 2021 at 10:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > 5. Is it possible to write a testcase where we fail (say due to pk > > violation or some other error) after the initial copy is done, then > > remove the conflicting row and allow a copy to be completed? If we > > already have any such test then it is fine. > > > > Causing a PK violation during the initial copy is not a problem to > test, but doing it after the initial copy is difficult. I have done > exactly this test scenario before but I thought it cannot be > automated. E.g. To cause a PK violation error somewhere between > COPYDONE and SYNCDONE means that the offending insert (the one which > tablesync will fail to replicate) has to be sent while the tablesync > is in CATCHUP mode. But AFAIK that can only be achieved using the > debugger to get the timing right. > Yeah, I am also not able to think of any way to automate such a test. I was thinking about what could go wrong if we error out in that stage. The only thing that could be problematic is if we somehow make the slot and replication origin used during copy dangling. I think if tablesync is restarted after error then we will clean up those, which will normally be the case, but what if the tablesync worker is not started again? I think the only possibility of the tablesync worker not being started again is if during Alter Subscription ... Refresh Publication, we remove the corresponding subscription rel (see AlterSubscription_refresh, I guess it could happen if one has dropped the relation from publication). I haven't tested this with your patch but if such a possibility exists then we need to think of cleaning up slot and origin when we remove subscription rel. What do you think? -- With Regards, Amit Kapila.
On Wed, Jan 6, 2021 at 4:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Jan 5, 2021 at 3:32 PM Peter Smith <smithpb2250@gmail.com> wrote: > > > > On Mon, Jan 4, 2021 at 10:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > 5. Is it possible to write a testcase where we fail (say due to pk > > > violation or some other error) after the initial copy is done, then > > > remove the conflicting row and allow a copy to be completed? If we > > > already have any such test then it is fine. > > > > > > > Causing a PK violation during the initial copy is not a problem to > > test, but doing it after the initial copy is difficult. I have done > > exactly this test scenario before but I thought it cannot be > > automated. E.g. To cause an PK violation error somewhere between > > COPYDONE and SYNDONE means that the offending insert (the one which > > tablesync will fail to replicate) has to be sent while the tablesync > > is in CATCHUP mode. But AFAIK that can only be achieved using the > > debugger to get the timing right. > > > > Yeah, I am also not able to think of any way to automate such a test. > I was thinking about what could go wrong if we error out in that > stage. The only thing that could be problematic is if we somehow make > the slot and replication origin used during copy dangling. I think if > tablesync is restarted after error then we will clean up those which > will be normally the case but what if the tablesync worker is not > started again? I think the only possibility of tablesync worker not > started again is if during Alter Subscription ... Refresh Publication, > we remove the corresponding subscription rel (see > AlterSubscription_refresh, I guess it could happen if one has dropped > the relation from publication). I haven't tested this with your patch > but if such a possibility exists then we need to think of cleaning up > slot and origin when we remove subscription rel. What do you think? > I think it makes sense. If there can be a race between the tablesync re-launching (after error), and the AlterSubscription_refresh removing some table’s relid from the subscription then there could be lurking slot/origin tablesync resources (of the removed table) which a subsequent DROP SUBSCRIPTION cannot discover. I will think more about how/if it is possible to make this happen. Anyway, I suppose I ought to refactor/isolate some of the tablesync cleanup code in case it needs to be commonly called from DropSubscription and/or from AlterSubscription_refresh. ---- Kind Regards, Peter Smith. Fujitsu Australia.
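(One possible shape for that refactoring. A sketch only: the helper name is hypothetical, the replorigin_drop signature is an assumption, and the origin-name format is the pg_%u_%u one quoted earlier in the thread:)

    /* Sketch: tablesync resource cleanup isolated into one helper, callable
     * from DropSubscription and AlterSubscription_refresh alike. */
    static void
    tablesync_cleanup_for_rel(WalReceiverConn *wrconn, Oid subid, Oid relid)
    {
        char        originname[NAMEDATALEN];
        RepOriginId originid;
        char       *syncslotname = ReplicationSlotNameForTablesync(subid, relid);

        /* drop the tablesync slot on the publisher */
        ReplicationSlotDropAtPubNode(wrconn, syncslotname);

        /* drop the matching replication origin, if it exists */
        snprintf(originname, sizeof(originname), "pg_%u_%u", subid, relid);
        originid = replorigin_by_name(originname, true);
        if (OidIsValid(originid))
            replorigin_drop(originid, false);

        pfree(syncslotname);
    }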
On Wed, Jan 6, 2021 at 2:13 PM Peter Smith <smithpb2250@gmail.com> wrote: > > I think it makes sense. If there can be a race between the tablesync > re-launching (after error), and the AlterSubscription_refresh removing > some table’s relid from the subscription then there could be lurking > slot/origin tablesync resources (of the removed table) which a > subsequent DROP SUBSCRIPTION cannot discover. I will think more about > how/if it is possible to make this happen. Anyway, I suppose I ought > to refactor/isolate some of the tablesync cleanup code in case it > needs to be commonly called from DropSubscription and/or from > AlterSubscription_refresh. > Fair enough. BTW, I have analyzed whether we need any modifications to pg_dump/restore for this patch as this changes the state of one of the fields in the system table and concluded that we don't need any change. For subscriptions, we don't dump any of the information from pg_subscription_rel, rather we just dump subscriptions with the connect option as false which means users need to enable the subscription and refresh publication after restore. I have checked this in the code and tested it as well. The related information is present in pg_dump doc page [1], see from "When dumping logical replication subscriptions ....". [1] - https://www.postgresql.org/docs/devel/app-pgdump.html -- With Regards, Amit Kapila.
> PSA the v11 patch for the Tablesync Solution1. > > Difference from v10: > - Addresses several recent review comments. > - pg_indent has been run > Hi. I took a look into the patch and have some comments. 1. * So the state progression is always: INIT -> DATASYNC -> SYNCWAIT -> - * CATCHUP -> SYNCDONE -> READY. + * CATCHUP -> (sync worker TCOPYDONE) -> SYNCDONE -> READY. I noticed the new state TCOPYDONE is commented between CATCHUP and SYNCDONE, but it seems the SUBREL_STATE_TCOPYDONE is actually set before SUBREL_STATE_SYNCWAIT[1]. Did I miss something here? [1]----------------- + UpdateSubscriptionRelState(MyLogicalRepWorker->subid, + MyLogicalRepWorker->relid, + SUBREL_STATE_TCOPYDONE, + MyLogicalRepWorker->relstate_lsn); ... /* * We are done with the initial data synchronization, update the state. */ SpinLockAcquire(&MyLogicalRepWorker->relmutex); MyLogicalRepWorker->relstate = SUBREL_STATE_SYNCWAIT; ------------------ 2. <literal>i</literal> = initialize, <literal>d</literal> = data is being copied, + <literal>C</literal> = table data has been copied, <literal>s</literal> = synchronized, <literal>r</literal> = ready (normal replication) +#define SUBREL_STATE_TCOPYDONE 't' /* tablesync copy phase is completed + * (sublsn NULL) */ The character representing 'data has been copied' in the catalog seems different from the macro definition. Best regards, houzj
Thank you for the feedback. On Thu, Jan 7, 2021 at 12:45 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote: > > > PSA the v11 patch for the Tablesync Solution1. > > > > Difference from v10: > > - Addresses several recent review comments. > > - pg_indent has been run > > > Hi > > I took a look into the patch and have some comments. > > 1. > * So the state progression is always: INIT -> DATASYNC -> SYNCWAIT -> > - * CATCHUP -> SYNCDONE -> READY. > + * CATCHUP -> (sync worker TCOPYDONE) -> SYNCDONE -> READY. > > I noticed the new state TCOPYDONE is commented between CATCHUP and SYNCDONE, > but it seems the SUBREL_STATE_TCOPYDONE is actually set before SUBREL_STATE_SYNCWAIT[1]. > Did I miss something here? > > [1]----------------- > + UpdateSubscriptionRelState(MyLogicalRepWorker->subid, > + MyLogicalRepWorker->relid, > + SUBREL_STATE_TCOPYDONE, > + MyLogicalRepWorker->relstate_lsn); > ... > /* > * We are done with the initial data synchronization, update the state. > */ > SpinLockAcquire(&MyLogicalRepWorker->relmutex); > MyLogicalRepWorker->relstate = SUBREL_STATE_SYNCWAIT; > ------------------ > Thanks for reporting this mistake. I will correct the comment for the next patch (v12) > > 2. > <literal>i</literal> = initialize, > <literal>d</literal> = data is being copied, > + <literal>C</literal> = table data has been copied, > <literal>s</literal> = synchronized, > <literal>r</literal> = ready (normal replication) > > +#define SUBREL_STATE_TCOPYDONE 't' /* tablesync copy phase is completed > + * (sublsn NULL) */ > The character representing 'data has been copied' in the catalog seems different from the macro definition. > Yes, the same was already reported previously [1] [1] https://www.postgresql.org/message-id/CAA4eK1Kyi037XZzyrLE71MS2KoMmNSqa6RrQLdSCeeL27gnL%2BA%40mail.gmail.com It will be fixed in the next patch (v12) ---- Kind Regards, Peter Smith. Fujitsu Australia.
On Wed, Jan 6, 2021 at 3:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Jan 6, 2021 at 2:13 PM Peter Smith <smithpb2250@gmail.com> wrote: > > > > I think it makes sense. If there can be a race between the tablesync > > re-launching (after error), and the AlterSubscription_refresh removing > > some table’s relid from the subscription then there could be lurking > > slot/origin tablesync resources (of the removed table) which a > > subsequent DROP SUBSCRIPTION cannot discover. I will think more about > > how/if it is possible to make this happen. Anyway, I suppose I ought > > to refactor/isolate some of the tablesync cleanup code in case it > > needs to be commonly called from DropSubscription and/or from > > AlterSubscription_refresh. > > > > Fair enough. > I think before implementing, we should once try to reproduce this case. I understand this is a timing issue and can be reproduced only with the help of debugger but we should do that. > BTW, I have analyzed whether we need any modifications to > pg_dump/restore for this patch as this changes the state of one of the > fields in the system table and concluded that we don't need any > change. For subscriptions, we don't dump any of the information from > pg_subscription_rel, rather we just dump subscriptions with the > connect option as false which means users need to enable the > subscription and refresh publication after restore. I have checked > this in the code and tested it as well. The related information is > present in pg_dump doc page [1], see from "When dumping logical > replication subscriptions ....". > I have further analyzed that we don't need to do anything w.r.t pg_upgrade as well because it uses pg_dump/pg_dumpall to dump the schema info of the old cluster and then restore it to the new cluster. And, we know that pg_dump ignores the info in pg_subscription_rel, so we don't need to change anything as our changes are specific to the state of one of the columns in pg_subscription_rel. I have not tested this but we should test it by having some relations in not_ready state and then allow the old cluster (<=PG13) to be upgraded to new (pg14) both with and without this patch and see if there is any change in behavior. -- With Regards, Amit Kapila.
Hi Amit. PSA the v12 patch for the Tablesync Solution1.

Differences from v11:
+ Added PG docs to mention the tablesync slot
+ Refactored tablesync slot drop (done by DropSubscription/process_syncing_tables_for_sync)
+ Fixed PG docs mentioning wrong state code
+ Fixed wrong code comment describing TCOPYDONE state

====

Features:

* The tablesync slot is now permanent instead of temporary. The tablesync slot name is no longer tied to the Subscription slot name.
* The tablesync slot cleanup (drop) code is added for DropSubscription and for process_syncing_tables_for_sync functions
* The tablesync worker now allows multiple tx instead of a single tx
* A new state (SUBREL_STATE_TCOPYDONE) is persisted after a successful copy_table in LogicalRepSyncTableStart.
* If a re-launched tablesync finds state SUBREL_STATE_TCOPYDONE then it will bypass the initial copy_table phase.
* Now tablesync sets up replication origin tracking in LogicalRepSyncTableStart (similar to what is done for the apply worker). The origin is advanced when first created.
* The tablesync replication origin tracking is cleaned up during DropSubscription and/or process_syncing_tables_for_apply.
* The DropSubscription cleanup code was enhanced (v7+) to take care of any crashed tablesync workers.
* Updates to PG docs

TODO / Known Issues:

* Address review comments
* Patch applies with whitespace warning

---

Kind Regards,
Peter Smith.
Fujitsu Australia
Attachment
On Mon, Jan 4, 2021 at 8:06 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Dec 30, 2020 at 11:51 AM Peter Smith <smithpb2250@gmail.com> wrote: > > > > On Wed, Dec 23, 2020 at 8:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > 1. > > > + * Rarely, the DropSubscription may be issued when a tablesync still > > > + * is in SYNCDONE but not yet in READY state. If this happens then > > > + * the drop slot could fail because it is already dropped. > > > + * In this case suppress and drop slot error. > > > + * > > > + * FIXME - Is there a better way than this? > > > + */ > > > + if (rstate->state != SUBREL_STATE_SYNCDONE) > > > + PG_RE_THROW(); > > > > > > So, does this situation happens when we try to drop subscription after > > > the state is changed to syncdone but not syncready. If so, then can't > > > we write a function GetSubscriptionNotDoneRelations similar to > > > GetSubscriptionNotReadyRelations where we get a list of relations that > > > are not in done stage. I think this should be safe because once we are > > > here we shouldn't be allowed to start a new worker and old workers are > > > already stopped by this function. > > > > Yes, but I don't see how adding such a function is an improvement over > > the existing code: > > > > The advantage is that we don't need to use try..catch to deal with > such conditions which I don't think is a good way to deal with such > cases. Also, even after using try...catch, still, we can leak the > slots because the patch drops the slot after changing the state to > syncdone and if there is any error while dropping the slot, it simply > skips it. So, it is possible that the rel state is syncdone but the > slot still exists and we get an error due to some different reason, > and then we will silently skip it again and allow the subscription to > be dropped. > > I think instead what we should do is to drop the slot before we change > the rel state to syncdone. Also, if the apply workers fail to drop the > slot, it should try to again drop it after restart. In > DropSubscription, we can then check if the rel state is not SYNC or > READY, we can drop the corresponding slots. > Fixed as suggested in latest patch [v12] ---- [v12] = https://www.postgresql.org/message-id/CAHut%2BPsonJzarxSBWkOM%3DMjoEpaq53ShBJoTT9LHJskwP3OvZA%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
On Tue, Jan 5, 2021 at 10:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > 3. > > > +#define SUBREL_STATE_COPYDONE 'C' /* tablesync copy phase is completed */ > > > > > > You can mention in the comments that sublsn will be NULL for this > > > state as it is mentioned for other similar states. Can we think of > > > using any letter in lower case for this as all other states are in > > > lower-case except for this which makes it look a bit odd? We can use > > > 'f' or 'e' and describe it as 'copy finished' or 'copy end'. I am fine > > > if you have any better ideas. > > > > > > > Fixed in latest patch [v11] > > > > It is still not reflected in the docs. See below: > --- a/doc/src/sgml/catalogs.sgml > +++ b/doc/src/sgml/catalogs.sgml > @@ -7651,6 +7651,7 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l > State code: > <literal>i</literal> = initialize, > <literal>d</literal> = data is being copied, > + <literal>C</literal> = table data has been copied, > <literal>s</literal> = synchronized, > Fixed in latest patch [v12] ---- [v12] = https://www.postgresql.org/message-id/CAHut%2BPsonJzarxSBWkOM%3DMjoEpaq53ShBJoTT9LHJskwP3OvZA%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
On Wed, Jan 6, 2021 at 2:07 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > Okay, if we want to go that way then we should add some documentation > about it. Currently, the slot name used by apply worker is known to > the user because either it is specified by the user or the default is > subscription name, so the user can manually remove that slot later but > that is not true for tablesync slots. I think we need to update both > the Drop Subscription page [1] and logical-replication-subscription > page [2] where we have mentioned temporary slots and in the end "Here > are some scenarios: .." to mention about these slots and probably how > their names are generated so that in such special situations users can > drop them manually. > > [1] - https://www.postgresql.org/docs/devel/sql-dropsubscription.html > [2] - https://www.postgresql.org/docs/devel/logical-replication-subscription.html > PG docs updated in latest patch [v12] ---- [v12] = https://www.postgresql.org/message-id/CAHut%2BPsonJzarxSBWkOM%3DMjoEpaq53ShBJoTT9LHJskwP3OvZA%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
On Thu, Jan 7, 2021 at 3:20 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Jan 6, 2021 at 3:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Wed, Jan 6, 2021 at 2:13 PM Peter Smith <smithpb2250@gmail.com> wrote: > > > > > > I think it makes sense. If there can be a race between the tablesync > > > re-launching (after error), and the AlterSubscription_refresh removing > > > some table’s relid from the subscription then there could be lurking > > > slot/origin tablesync resources (of the removed table) which a > > > subsequent DROP SUBSCRIPTION cannot discover. I will think more about > > > how/if it is possible to make this happen. Anyway, I suppose I ought > > > to refactor/isolate some of the tablesync cleanup code in case it > > > needs to be commonly called from DropSubscription and/or from > > > AlterSubscription_refresh. > > > > > > > Fair enough. > > > > I think before implementing, we should once try to reproduce this > case. I understand this is a timing issue and can be reproduced only > with the help of debugger but we should do that. FYI, I was able to reproduce this case in debugger. PSA logs showing details. ---- Kind Regards, Peter Smith. Fujitsu Australia
Attachment
> PSA the v12 patch for the Tablesync Solution1. > > Differences from v11: > + Added PG docs to mention the tablesync slot > + Refactored tablesync slot drop (done by > DropSubscription/process_syncing_tables_for_sync) > + Fixed PG docs mentioning wrong state code > + Fixed wrong code comment describing TCOPYDONE state > Hi. I looked into the new patch and have some comments. 1. + /* Setup replication origin tracking. */ ①+ originid = replorigin_by_name(originname, true); + if (!OidIsValid(originid)) + { ②+ originid = replorigin_by_name(originname, true); + if (originid != InvalidRepOriginId) + { There are two different styles of code which check whether originid is valid. Both are fine, but do you think it's better to have the same style here? 2. * state to SYNCDONE. There might be zero changes applied between * CATCHUP and SYNCDONE, because the sync worker might be ahead of the * apply worker. + * - The sync worker has a intermediary state TCOPYDONE which comes after + * DATASYNC and before SYNCWAIT. This state indicates that the initial This comment about TCOPYDONE is better to be placed at [1]*, which is between DATASYNC and SYNCWAIT. * - Tablesync worker starts; changes table state from INIT to DATASYNC while * copying. [1]* * - Tablesync worker finishes the copy and sets table state to SYNCWAIT; * waits for state change. 3. + /* + * To build a slot name for the sync work, we are limited to NAMEDATALEN - + * 1 characters. + * + * The name is calculated as pg_%u_sync_%u (3 + 10 + 6 + 10 + '\0'). (It's + * actually the NAMEDATALEN on the remote that matters, but this scheme + * will also work reasonably if that is different.) + */ + StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */ + + syncslotname = psprintf("pg_%u_sync_%u", suboid, relid); The comment says syncslotname is limited to NAMEDATALEN - 1 characters. But the actual size of it is (3 + 10 + 6 + 10 + '\0') = 30, which seems not to be NAMEDATALEN - 1. Should we change the comment here? Best regards, houzj
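(The arithmetic in comment 3 can be sanity-checked standalone, outside the server, UINT_MAX being the largest possible OID value:)

    #include <stdio.h>
    #include <limits.h>

    int
    main(void)
    {
        char buf[64];
        /* worst case: both OIDs at their 10-digit maximum */
        int  len = snprintf(buf, sizeof(buf), "pg_%u_sync_%u", UINT_MAX, UINT_MAX);

        /* prints: pg_4294967295_sync_4294967295 = 29 chars (30 bytes with NUL) */
        printf("%s = %d chars (%d bytes with NUL)\n", buf, len, len + 1);
        return 0;
    }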
On Fri, Jan 8, 2021 at 7:14 AM Peter Smith <smithpb2250@gmail.com> wrote: > > On Thu, Jan 7, 2021 at 3:20 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Wed, Jan 6, 2021 at 3:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Wed, Jan 6, 2021 at 2:13 PM Peter Smith <smithpb2250@gmail.com> wrote: > > > > > > > > I think it makes sense. If there can be a race between the tablesync > > > > re-launching (after error), and the AlterSubscription_refresh removing > > > > some table’s relid from the subscription then there could be lurking > > > > slot/origin tablesync resources (of the removed table) which a > > > > subsequent DROP SUBSCRIPTION cannot discover. I will think more about > > > > how/if it is possible to make this happen. Anyway, I suppose I ought > > > > to refactor/isolate some of the tablesync cleanup code in case it > > > > needs to be commonly called from DropSubscription and/or from > > > > AlterSubscription_refresh. > > > > > > > > > > Fair enough. > > > > > > > I think before implementing, we should once try to reproduce this > > case. I understand this is a timing issue and can be reproduced only > > with the help of debugger but we should do that. > > FYI, I was able to reproduce this case in debugger. PSA logs showing details. > Thanks for reproducing as I was worried about exactly this case. I have one question related to logs: ## ## ALTER SUBSCRIPTION to REFRESH the publication ## This blocks on some latch until the tablesync worker dies, then it continues ## Did you check which exact latch or lock blocks this? It is important to retain this interlock as otherwise even if we decide to drop the slot (and/or origin) the tablesync worker might continue. -- With Regards, Amit Kapila.
Hi Amit. PSA the v13 patch for the Tablesync Solution1.

Differences from v12:
+ Fixed whitespace errors of v12-0001
+ Modified TCOPYDONE state comment (houzj feedback)
+ WIP fix for AlterSubscription_refresh (Amit feedback)

====

Features:

* The tablesync slot is now permanent instead of temporary. The tablesync slot name is no longer tied to the Subscription slot name.
* The tablesync slot cleanup (drop) code is added for DropSubscription and for process_syncing_tables_for_sync functions
* The tablesync worker now allows multiple tx instead of a single tx
* A new state (SUBREL_STATE_TCOPYDONE) is persisted after a successful copy_table in LogicalRepSyncTableStart.
* If a re-launched tablesync finds state SUBREL_STATE_TCOPYDONE then it will bypass the initial copy_table phase.
* Now tablesync sets up replication origin tracking in LogicalRepSyncTableStart (similar to what is done for the apply worker). The origin is advanced when first created.
* The tablesync replication origin tracking is cleaned up during DropSubscription and/or process_syncing_tables_for_apply.
* The DropSubscription cleanup code was enhanced (v7+) to take care of any crashed tablesync workers.
* Updates to PG docs

TODO / Known Issues:

* Address review comments
* ALTER PUBLICATION DROP TABLE can mean knowledge of tablesyncs gets lost, causing resource cleanup to be missed. There is a WIP fix for this in AlterSubscription_refresh, however it is not entirely correct; there are known race conditions. See FIXME comments.

---

Kind Regards,
Peter Smith.
Fujitsu Australia

On Thu, Jan 7, 2021 at 6:52 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi Amit.
>
> PSA the v12 patch for the Tablesync Solution1.
>
> Differences from v11:
> + Added PG docs to mention the tablesync slot
> + Refactored tablesync slot drop (done by DropSubscription/process_syncing_tables_for_sync)
> + Fixed PG docs mentioning wrong state code
> + Fixed wrong code comment describing TCOPYDONE state
>
> ====
>
> Features:
>
> * The tablesync slot is now permanent instead of temporary. The tablesync slot name is no longer tied to the Subscription slot name.
> * The tablesync slot cleanup (drop) code is added for DropSubscription and for process_syncing_tables_for_sync functions
> * The tablesync worker now allows multiple tx instead of a single tx
> * A new state (SUBREL_STATE_TCOPYDONE) is persisted after a successful copy_table in LogicalRepSyncTableStart.
> * If a re-launched tablesync finds state SUBREL_STATE_TCOPYDONE then it will bypass the initial copy_table phase.
> * Now tablesync sets up replication origin tracking in LogicalRepSyncTableStart (similar to what is done for the apply worker). The origin is advanced when first created.
> * The tablesync replication origin tracking is cleaned up during DropSubscription and/or process_syncing_tables_for_apply.
> * The DropSubscription cleanup code was enhanced (v7+) to take care of any crashed tablesync workers.
> * Updates to PG docs
>
> TODO / Known Issues:
>
> * Address review comments
> * Patch applies with whitespace warning
>
> ---
>
> Kind Regards,
> Peter Smith.
> Fujitsu Australia
Attachment
On Fri, Jan 8, 2021 at 1:02 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote: > > > PSA the v12 patch for the Tablesync Solution1. > > > > Differences from v11: > > + Added PG docs to mention the tablesync slot > > + Refactored tablesync slot drop (done by > > DropSubscription/process_syncing_tables_for_sync) > > + Fixed PG docs mentioning wrong state code > > + Fixed wrong code comment describing TCOPYDONE state > > > Hi > > I looked into the new patch and have some comments. > > 1. > + /* Setup replication origin tracking. */ > ①+ originid = replorigin_by_name(originname, true); > + if (!OidIsValid(originid)) > + { > > ②+ originid = replorigin_by_name(originname, true); > + if (originid != InvalidRepOriginId) > + { > > There are two different styles of code which check whether originid is valid. > Both are fine, but do you think it's better to have the same style here? Yes. I think the 1st style is better, so I used the OidIsValid for all the new code of the patch. But the check in DropSubscription is an exception; there I used the 2nd style but ONLY to be consistent with another originid check which already existed in that same function. > > > 2. > * state to SYNCDONE. There might be zero changes applied between > * CATCHUP and SYNCDONE, because the sync worker might be ahead of the > * apply worker. > + * - The sync worker has a intermediary state TCOPYDONE which comes after > + * DATASYNC and before SYNCWAIT. This state indicates that the initial > > This comment about TCOPYDONE is better to be placed at [1]*, which is between DATASYNC and SYNCWAIT. > > * - Tablesync worker starts; changes table state from INIT to DATASYNC while > * copying. > [1]* > * - Tablesync worker finishes the copy and sets table state to SYNCWAIT; > * waits for state change. > Agreed. I have moved the comment per your suggestion (and I also re-worded it again). Fixed in latest patch [v13] > 3. > + /* > + * To build a slot name for the sync work, we are limited to NAMEDATALEN - > + * 1 characters. > + * > + * The name is calculated as pg_%u_sync_%u (3 + 10 + 6 + 10 + '\0'). (It's > + * actually the NAMEDATALEN on the remote that matters, but this scheme > + * will also work reasonably if that is different.) > + */ > + StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */ > + > + syncslotname = psprintf("pg_%u_sync_%u", suboid, relid); > > The comment says syncslotname is limited to NAMEDATALEN - 1 characters. > But the actual size of it is (3 + 10 + 6 + 10 + '\0') = 30, which seems not to be NAMEDATALEN - 1. > Should we change the comment here? > The comment wording is a remnant from older code which had a differently formatted slot name. I think the comment is still valid, albeit maybe unnecessary since in the current code the tablesync slot name length is fixed. But I left the older comment here as a safety reminder in case some future change would want to modify the slot name. What do you think? ---- [v13] = https://www.postgresql.org/message-id/CAHut%2BPvby4zg6kM1RoGd_j-xs9OtPqZPPVhbiC53gCCRWdNSrw%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia.
On Fri, Jan 8, 2021 at 2:55 PM Peter Smith <smithpb2250@gmail.com> wrote: > > On Fri, Jan 8, 2021 at 1:02 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote: > > > > > 3. > > + /* > > + * To build a slot name for the sync work, we are limited to NAMEDATALEN - > > + * 1 characters. > > + * > > + * The name is calculated as pg_%u_sync_%u (3 + 10 + 6 + 10 + '\0'). (It's > > + * actually the NAMEDATALEN on the remote that matters, but this scheme > > + * will also work reasonably if that is different.) > > + */ > > + StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */ > > + > > + syncslotname = psprintf("pg_%u_sync_%u", suboid, relid); > > > > The comments says syncslotname is limit to NAMEDATALEN - 1 characters. > > But the actual size of it is (3 + 10 + 6 + 10 + '\0') = 30,which seems not NAMEDATALEN - 1. > > Should we change the comment here? > > > > The comment wording is a remnant from older code which had a > differently format slot name. > I think the comment is still valid, albeit maybe unnecessary since in > the current code the tablesync slot > name length is fixed. But I left the older comment here as a safety reminder > in case some future change would want to modify the slot name. What do > you think? > I find it quite confusing. The comments should reflect the latest code. You can probably say in some form that the length of slotname shouldn't exceed NAMEDATALEN because of remote node constraints on slot name length. Also, probably the StaticAssert on NAMEDATALEN is not required. 1. + <para> + Additional table synchronization slots are normally transient, created + internally and dropped automatically when they are no longer needed. + These table synchronization slots have generated names: + <quote><literal>pg_%u_sync_%u</literal></quote> (parameters: Subscription <parameter>oid</parameter>, Table <parameter>relid</parameter>) + </para> The last line seems too long. I think we are not strict for 80 char limit in docs but it is good to be close to that, however, this appears quite long. 2. + /* + * Cleanup any remaining tablesync resources. + */ + { + char originname[NAMEDATALEN]; + RepOriginId originid; + char state; + XLogRecPtr statelsn; I have already mentioned previously that let's not use this new style of code (start using { to localize the scope of variables). I don't know about others but I find it difficult to read such a code. You might want to consider moving this whole block to a separate function. 3. /* + * XXX - Should optimize this to avoid multiple + * connect/disconnect. + */ + wrconn = walrcv_connect(sub->conninfo, true, sub->name, &err); I think it is better to avoid multiple connect/disconnect here. In this same function, we have connected to the publisher, we should be able to use the same connection. 4. process_syncing_tables_for_sync() { .. + /* + * Cleanup the tablesync slot. + */ + syncslotname = ReplicationSlotNameForTablesync( + MySubscription->oid, + MyLogicalRepWorker->relid); + PG_TRY(); + { + elog(DEBUG1, "process_syncing_tables_for_sync: dropping the tablesync slot \"%s\".", syncslotname); + ReplicationSlotDropAtPubNode(wrconn, syncslotname); + } + PG_FINALLY(); + { + pfree(syncslotname); + } + PG_END_TRY(); .. } Both here and in DropSubscription(), it seems we are using PG_TRY..PG_FINALLY just to free the memory even though ReplicationSlotDropAtPubNode already has try..finally. Can we arrange code to move allocation of syncslotname inside ReplicationSlotDropAtPubNode to avoid additional try..finaly? 
BTW, if the usage of try..finally here is only to free the memory, I am not sure if it is required because I think we will anyway Reset the memory context where this memory is allocated as part of error handling. 5. #define SUBREL_STATE_DATASYNC 'd' /* data is being synchronized (sublsn * NULL) */ +#define SUBREL_STATE_TCOPYDONE 't' /* tablesync copy phase is completed + * (sublsn NULL) */ #define SUBREL_STATE_SYNCDONE 's' /* synchronization finished in front of * apply (sublsn set) */ I am not very happy with the new state name SUBREL_STATE_TCOPYDONE as it is quite different from other adjoining state names and somehow not going well with the code. How about SUBREL_STATE_ENDCOPY 'e' or SUBREL_STATE_FINISHEDCOPY 'f'? -- With Regards, Amit Kapila.
On Fri, Jan 8, 2021 at 8:20 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Jan 8, 2021 at 7:14 AM Peter Smith <smithpb2250@gmail.com> wrote: > > > > FYI, I was able to reproduce this case in debugger. PSA logs showing details. > > > > Thanks for reproducing as I was worried about exactly this case. I > have one question related to logs: > > ## > ## ALTER SUBSCRIPTION to REFRESH the publication > > ## This blocks on some latch until the tablesync worker dies, then it continues > ## > > Did you check which exact latch or lock blocks this? > I have checked this myself and the command is waiting on the drop of origin till the tablesync worker is finished because replorigin_drop() requires state->acquired_by to be 0 which will only be true once the tablesync worker exits. I think this is the reason you might have noticed that the command can't be finished until the tablesync worker died. So this can't be an interlock between ALTER SUBSCRIPTION .. REFRESH command and tablesync worker and to that end it seems you have below Fixme's in the patch: + * FIXME - Usually this cleanup would be OK, but will not + * always be OK because the logicalrep_worker_stop_at_commit + * only "flags" the worker to be stopped in the near future + * but meanwhile it may still be running. In this case there + * could be a race between the tablesync worker and this code + * to see who will succeed with the tablesync drop (and the + * loser will ERROR). + * + * FIXME - Also, checking the state is also not guaranteed + * correct because state might be TCOPYDONE when we checked + * but has since progressed to SYNDONE + */ + + if (state == SUBREL_STATE_TCOPYDONE) + { I feel this was okay for an earlier code but now we need to stop the tablesync workers before trying to drop the slot as we do in DropSubscription. Now, if we do that then that would fix the race conditions mentioned in Fixme but still, there are few more things I am worried about: (a) What if the launcher again starts the tablesync worker? One idea could be to acquire AccessExclusiveLock on SubscriptionRelationId as we do in DropSubscription which is not a very good idea but I can't think of any other good way. (b) the patch is just checking SUBREL_STATE_TCOPYDONE before dropping the replication slot but the slot could be created even before that (in SUBREL_STATE_DATASYNC state). One idea could be we can try to drop the slot and if we are not able to drop then we can simply continue assuming it didn't exist. One minor comment: 1. + SpinLockAcquire(&MyLogicalRepWorker->relmutex); MyLogicalRepWorker->relstate = SUBREL_STATE_SYNCDONE; MyLogicalRepWorker->relstate_lsn = current_lsn; - Spurious line removal. -- With Regards, Amit Kapila.
On Thu, Jan 7, 2021 at 3:20 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > BTW, I have analyzed whether we need any modifications to > > pg_dump/restore for this patch as this changes the state of one of the > > fields in the system table and concluded that we don't need any > > change. For subscriptions, we don't dump any of the information from > > pg_subscription_rel, rather we just dump subscriptions with the > > connect option as false which means users need to enable the > > subscription and refresh publication after restore. I have checked > > this in the code and tested it as well. The related information is > > present in pg_dump doc page [1], see from "When dumping logical > > replication subscriptions ....". > > > > I have further analyzed that we don't need to do anything w.r.t > pg_upgrade as well because it uses pg_dump/pg_dumpall to dump the > schema info of the old cluster and then restore it to the new cluster. > And, we know that pg_dump ignores the info in pg_subscription_rel, so > we don't need to change anything as our changes are specific to the > state of one of the columns in pg_subscription_rel. I have not tested > this but we should test it by having some relations in not_ready state > and then allow the old cluster (<=PG13) to be upgraded to new (pg14) > both with and without this patch and see if there is any change in > behavior. I have tested this scenario, stopped a server running PG_13 when subscription table sync was in progress. One of the tables in pg_subscription_rel was still in 'd' state (DATASYNC) postgres=# select * from pg_subscription_rel; srsubid | srrelid | srsubstate | srsublsn ---------+---------+------------+------------ 16424 | 16384 | d | 16424 | 16390 | r | 0/247A63D8 16424 | 16395 | r | 0/247A6410 16424 | 16387 | r | 0/247A6448 (4 rows) then initiated the pg_upgrade to PG_14 with the patch and without the patch: I see that the subscription exists but is not enabled: postgres=# select * from pg_subscription; oid | subdbid | subname | subowner | subenabled | subbinary | substream | subconninfo | subslotname | subsynccommit | subpublications -------+---------+---------+----------+------------+-----------+-----------+------------------------------------------+-------------+---------------+----------------- 16407 | 16401 | tap_sub | 10 | f | f | f | host=localhost port=6972 dbname=postgres | tap_sub | off | {tap_pub} (1 row) and looking at the pg_subscription_rel: postgres=# select * from pg_subscription_rel; srsubid | srrelid | srsubstate | srsublsn ---------+---------+------------+---------- (0 rows) As can be seen, none of the data in the pg_subscription_rel has been copied over. Same behaviour is seen with the patch and without the patch. regards, Ajin Cherian Fujitsu Australia
On Mon, Jan 11, 2021 at 3:53 PM Ajin Cherian <itsajin@gmail.com> wrote: > > On Thu, Jan 7, 2021 at 3:20 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > BTW, I have analyzed whether we need any modifications to > > > pg_dump/restore for this patch as this changes the state of one of the > > > fields in the system table and concluded that we don't need any > > > change. For subscriptions, we don't dump any of the information from > > > pg_subscription_rel, rather we just dump subscriptions with the > > > connect option as false which means users need to enable the > > > subscription and refresh publication after restore. I have checked > > > this in the code and tested it as well. The related information is > > > present in pg_dump doc page [1], see from "When dumping logical > > > replication subscriptions ....". > > > > > > > I have further analyzed that we don't need to do anything w.r.t > > pg_upgrade as well because it uses pg_dump/pg_dumpall to dump the > > schema info of the old cluster and then restore it to the new cluster. > > And, we know that pg_dump ignores the info in pg_subscription_rel, so > > we don't need to change anything as our changes are specific to the > > state of one of the columns in pg_subscription_rel. I have not tested > > this but we should test it by having some relations in not_ready state > > and then allow the old cluster (<=PG13) to be upgraded to new (pg14) > > both with and without this patch and see if there is any change in > > behavior. > > I have tested this scenario, stopped a server running PG_13 when > subscription table sync was in progress. > Thanks for the test. This confirms my analysis and we don't need any change in pg_dump or pg_upgrade for this patch. -- With Regards, Amit Kapila.
Hi Amit. PSA the v14 patch for the Tablesync Solution1. Main differences from v13: + Addresses all review comments 1-5, posted 9/Jan [ak9] + Addresses review comment 1, posted 11/Jan [ak11] + Modifications per suggestion [ak11] to handle race scenarios during Drop/AlterSubscription + Changed LOG to WARNING if DropSubscription unable to drop tablesync slot [ak9] = https://www.postgresql.org/message-id/CAA4eK1%2BgUBxKcYWg%2BMCC6Qbw-My%2B2wKUct%2BiFtr-_HgundUUBQ%40mail.gmail.com [ak11] = https://www.postgresql.org/message-id/CAA4eK1KGUt86A7CfuQW6OeDvAhEbVk8VOBJmcoZjrYBn965kOA%40mail.gmail.com ==== Features: * The tablesync slot is now permanent instead of temporary. * The tablesync slot name is no longer tied to the Subscription slot name. * The tablesync slot cleanup (drop) code is added for DropSubscription, AlterSubscription_refresh and for process_syncing_tables_for_sync functions. Drop/AlterSubscription will issue WARNING instead of ERROR in case the slot drop fails. * The tablesync worker is now allowing multiple tx instead of single tx * A new state (SUBREL_STATE_FINISHEDCOPY) is persisted after a successful copy_table in tablesync's LogicalRepSyncTableStart. * If a re-launched tablesync finds state SUBREL_STATE_FINISHEDCOPY then it will bypass the initial copy_table phase. * Now tablesync sets up replication origin tracking in LogicalRepSyncTableStart (similar as done for the apply worker). The origin is advanced when first created. * The tablesync replication origin tracking is cleaned up during DropSubscription and/or process_syncing_tables_for_apply. * The DropSubscription cleanup code was enhanced (v7+) to take care of any crashed tablesync workers. * The AlterSubscription_refresh (v14+) is now more similar to DropSubscription w.r.t to stopping workers for any "removed" tables. * Updates to PG docs. TODO / Known Issues: * Minor review comments === Also PSA some detailed logging evidence of some test scenarios involving Drop/AlterSubscription: + Test-20210112-AlterSubscriptionRefresh-ok.txt = AlterSubscription_refresh which successfully drops a tablesync slot + Test-20210112-AlterSubscriptionRefresh-warning.txt = AlterSubscription_refresh gives WARNING that it cannot drop the tablesync slot (which no longer exists) + Test-20210112-DropSubscription-warning.txt = DropSubscription with a disassociated slot_name gives a WARNING that it cannot drop the tablesync slot (due to broken connection) --- Kind Regards, Peter Smith. Fujitsu Australia
Attachment
On Sat, Jan 9, 2021 at 5:44 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Jan 8, 2021 at 2:55 PM Peter Smith <smithpb2250@gmail.com> wrote: > > > > On Fri, Jan 8, 2021 at 1:02 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote: > > > > > > > > 3. > > > + /* > > > + * To build a slot name for the sync work, we are limited to NAMEDATALEN - > > > + * 1 characters. > > > + * > > > + * The name is calculated as pg_%u_sync_%u (3 + 10 + 6 + 10 + '\0'). (It's > > > + * actually the NAMEDATALEN on the remote that matters, but this scheme > > > + * will also work reasonably if that is different.) > > > + */ > > > + StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */ > > > + > > > + syncslotname = psprintf("pg_%u_sync_%u", suboid, relid); > > > > > > The comments says syncslotname is limit to NAMEDATALEN - 1 characters. > > > But the actual size of it is (3 + 10 + 6 + 10 + '\0') = 30,which seems not NAMEDATALEN - 1. > > > Should we change the comment here? > > > > > > > The comment wording is a remnant from older code which had a > > differently format slot name. > > I think the comment is still valid, albeit maybe unnecessary since in > > the current code the tablesync slot > > name length is fixed. But I left the older comment here as a safety reminder > > in case some future change would want to modify the slot name. What do > > you think? > > > > I find it quite confusing. The comments should reflect the latest > code. You can probably say in some form that the length of slotname > shouldn't exceed NAMEDATALEN because of remote node constraints on > slot name length. Also, probably the StaticAssert on NAMEDATALEN is > not required. Modified comment in latest patch [v14] > > 1. > + <para> > + Additional table synchronization slots are normally transient, created > + internally and dropped automatically when they are no longer needed. > + These table synchronization slots have generated names: > + <quote><literal>pg_%u_sync_%u</literal></quote> (parameters: > Subscription <parameter>oid</parameter>, Table > <parameter>relid</parameter>) > + </para> > > The last line seems too long. I think we are not strict for 80 char > limit in docs but it is good to be close to that, however, this > appears quite long. Fixed in latest patch [v14] > > 2. > + /* > + * Cleanup any remaining tablesync resources. > + */ > + { > + char originname[NAMEDATALEN]; > + RepOriginId originid; > + char state; > + XLogRecPtr statelsn; > > I have already mentioned previously that let's not use this new style > of code (start using { to localize the scope of variables). I don't > know about others but I find it difficult to read such a code. You > might want to consider moving this whole block to a separate function. > Removed extra code block in latest patch [v14] > 3. > /* > + * XXX - Should optimize this to avoid multiple > + * connect/disconnect. > + */ > + wrconn = walrcv_connect(sub->conninfo, true, sub->name, &err); > > I think it is better to avoid multiple connect/disconnect here. In > this same function, we have connected to the publisher, we should be > able to use the same connection. > Fixed in latest patch [v14] > 4. > process_syncing_tables_for_sync() > { > .. > + /* > + * Cleanup the tablesync slot. 
> + */ > + syncslotname = ReplicationSlotNameForTablesync( > + MySubscription->oid, > + MyLogicalRepWorker->relid); > + PG_TRY(); > + { > + elog(DEBUG1, "process_syncing_tables_for_sync: dropping the > tablesync slot \"%s\".", syncslotname); > + ReplicationSlotDropAtPubNode(wrconn, syncslotname); > + } > + PG_FINALLY(); > + { > + pfree(syncslotname); > + } > + PG_END_TRY(); > .. > } > > Both here and in DropSubscription(), it seems we are using > PG_TRY..PG_FINALLY just to free the memory even though > ReplicationSlotDropAtPubNode already has try..finally. Can we arrange > code to move allocation of syncslotname inside > ReplicationSlotDropAtPubNode to avoid additional try..finaly? BTW, if > the usage of try..finally here is only to free the memory, I am not > sure if it is required because I think we will anyway Reset the memory > context where this memory is allocated as part of error handling. > Eliminated need for TRY/FINALLY to free syncslotname in latest patch [v14] > 5. > #define SUBREL_STATE_DATASYNC 'd' /* data is being synchronized (sublsn > * NULL) */ > +#define SUBREL_STATE_TCOPYDONE 't' /* tablesync copy phase is completed > + * (sublsn NULL) */ > #define SUBREL_STATE_SYNCDONE 's' /* synchronization finished in front of > * apply (sublsn set) */ > > I am not very happy with the new state name SUBREL_STATE_TCOPYDONE as > it is quite different from other adjoining state names and somehow not > going well with the code. How about SUBREL_STATE_ENDCOPY 'e' or > SUBREL_STATE_FINISHEDCOPY 'f'? > Using SUBREL_STATE_FINISHEDCOPY in latest patch [v14] --- [v14] = https://www.postgresql.org/message-id/CAHut%2BPsPO2vOp%2BP7U2szROMy15PKKGanhUrCYQ0ffpy9zG1V1A%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
On Mon, Jan 11, 2021 at 3:32 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Jan 8, 2021 at 8:20 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Fri, Jan 8, 2021 at 7:14 AM Peter Smith <smithpb2250@gmail.com> wrote: > > > > > > FYI, I was able to reproduce this case in debugger. PSA logs showing details. > > > > > > > Thanks for reproducing as I was worried about exactly this case. I > > have one question related to logs: > > > > ## > > ## ALTER SUBSCRIPTION to REFRESH the publication > > > > ## This blocks on some latch until the tablesync worker dies, then it continues > > ## > > > > Did you check which exact latch or lock blocks this? > > > > I have checked this myself and the command is waiting on the drop of > origin till the tablesync worker is finished because replorigin_drop() > requires state->acquired_by to be 0 which will only be true once the > tablesync worker exits. I think this is the reason you might have > noticed that the command can't be finished until the tablesync worker > died. So this can't be an interlock between ALTER SUBSCRIPTION .. > REFRESH command and tablesync worker and to that end it seems you have > below Fixme's in the patch: I have also seen this same blocking reason before in the replorigin_drop(). However, back when I first tested/reproduced the refresh issue [test-refresh] that AlterSubscription_refresh was still *original* unchanged code, so at that time it did not have any replorigin_drop() in at all. In any case in the latest code [v14] the AlterSubscription is immediately stopping the workers so this question may be moot. > > + * FIXME - Usually this cleanup would be OK, but will not > + * always be OK because the logicalrep_worker_stop_at_commit > + * only "flags" the worker to be stopped in the near future > + * but meanwhile it may still be running. In this case there > + * could be a race between the tablesync worker and this code > + * to see who will succeed with the tablesync drop (and the > + * loser will ERROR). > + * > + * FIXME - Also, checking the state is also not guaranteed > + * correct because state might be TCOPYDONE when we checked > + * but has since progressed to SYNDONE > + */ > + > + if (state == SUBREL_STATE_TCOPYDONE) > + { > > I feel this was okay for an earlier code but now we need to stop the > tablesync workers before trying to drop the slot as we do in > DropSubscription. Now, if we do that then that would fix the race > conditions mentioned in Fixme but still, there are few more things I > am worried about: (a) What if the launcher again starts the tablesync > worker? One idea could be to acquire AccessExclusiveLock on > SubscriptionRelationId as we do in DropSubscription which is not a > very good idea but I can't think of any other good way. (b) the patch > is just checking SUBREL_STATE_TCOPYDONE before dropping the > replication slot but the slot could be created even before that (in > SUBREL_STATE_DATASYNC state). One idea could be we can try to drop the > slot and if we are not able to drop then we can simply continue > assuming it didn't exist. The code was modified in the latest patch [v14] something like as suggested. The workers for removed tables are now immediately stopped (like DropSubscription does). Although I did include the AccessExclusiveLock as (a) suggested, AFAIK this was actually ineffective at preventing the workers relaunching. Instead, I am using logicalrep_worker_stop_at_commit to do this - testing shows it as working ok. 
Please see the code and latest test logs [v14] for details. Also, now the Drop/AlterSubscription will only give WARNING if unable to drop slots, a per suggestion (b). This is also tested [v14]. > > One minor comment: > 1. > + SpinLockAcquire(&MyLogicalRepWorker->relmutex); > MyLogicalRepWorker->relstate = SUBREL_STATE_SYNCDONE; > MyLogicalRepWorker->relstate_lsn = current_lsn; > - > > Spurious line removal. Fixed in latest patch [v14] ---- [v14] = https://www.postgresql.org/message-id/CAHut%2BPsPO2vOp%2BP7U2szROMy15PKKGanhUrCYQ0ffpy9zG1V1A%40mail.gmail.com [test-refresh] https://www.postgresql.org/message-id/CAHut%2BPv7YW7AyO_Q_nf9kzogcJcDFQNe8FBP6yXdzowMz3dY_Q%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
> Also PSA some detailed logging evidence of some test scenarios involving > Drop/AlterSubscription: > + Test-20210112-AlterSubscriptionRefresh-ok.txt = > AlterSubscription_refresh which successfully drops a tablesync slot > + Test-20210112-AlterSubscriptionRefresh-warning.txt = > AlterSubscription_refresh gives WARNING that it cannot drop the tablesync > slot (which no longer exists) > + Test-20210112-DropSubscription-warning.txt = DropSubscription with a > disassociated slot_name gives a WARNING that it cannot drop the tablesync > slot (due to broken connection) Hi > * The AlterSubscription_refresh (v14+) is now more similar to DropSubscription w.r.t to stopping workers for any "removed"tables. I have an issue about the above feature. With the patch, it seems does not stop the worker in the case of [1]. I probably missed something, should we stop the worker in such case ? [1] https://www.postgresql.org/message-id/CALj2ACV%2B0UFpcZs5czYgBpujM9p0Hg1qdOZai_43OU7bqHU_xw%40mail.gmail.com Best regards, houzj
On Mon, Jan 4, 2021 at 10:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > 7. > @@ -905,7 +905,7 @@ replorigin_advance(RepOriginId node, > LWLockAcquire(&replication_state->lock, LW_EXCLUSIVE); > > /* Make sure it's not used by somebody else */ > - if (replication_state->acquired_by != 0) > + if (replication_state->acquired_by != 0 && > replication_state->acquired_by != MyProcPid) > { > > I think you won't need this change if you do replorigin_advance before > replorigin_session_setup in your patch. > As you know the replorigin_session_setup sets the replication_state->acquired_by to be the current PID. So without this change the replorigin_advance rejects that same slot state thinking that it is already active for a different process. Root problem is that the same process/PID calling both functions would hang. So this patch change allows replorigin_advance code to be called by self. IIUC that acquired_by check condition is like a sanity check for the originid passed. The patched code only does just like what the comment says: "/* Make sure it's not used by somebody else */" Doesn't "somebody else" means "anyone but me" (i.e. anyone but MyProcPid). Also, “setup” of a thing generally comes before usage of that thing, so won't it seem strange to do (like the suggestion) and deliberately call the "setup" function 2nd instead of 1st? Can you please explain why is it better to do it the suggested way (switch the calls around) than keep the patch code? Probably there is a good reason but I am just not understanding it. ---- Kind Regards, Peter Smith. Fujitsu Australia
On Wed, Jan 13, 2021 at 1:07 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote: > > > Also PSA some detailed logging evidence of some test scenarios involving > > Drop/AlterSubscription: > > + Test-20210112-AlterSubscriptionRefresh-ok.txt = > > AlterSubscription_refresh which successfully drops a tablesync slot > > + Test-20210112-AlterSubscriptionRefresh-warning.txt = > > AlterSubscription_refresh gives WARNING that it cannot drop the tablesync > > slot (which no longer exists) > > + Test-20210112-DropSubscription-warning.txt = DropSubscription with a > > disassociated slot_name gives a WARNING that it cannot drop the tablesync > > slot (due to broken connection) > > Hi > > > * The AlterSubscription_refresh (v14+) is now more similar to DropSubscription w.r.t to stopping workers for any "removed"tables. > I have an issue about the above feature. > > With the patch, it seems does not stop the worker in the case of [1]. > I probably missed something, should we stop the worker in such case ? > > [1] https://www.postgresql.org/message-id/CALj2ACV%2B0UFpcZs5czYgBpujM9p0Hg1qdOZai_43OU7bqHU_xw%40mail.gmail.com > I am not exactly sure of the concern. (If the extra info below does not help can you please describe your concern with more details). This [v14] patch code/feature is only referring to the immediate stopping of only the *** "tablesync" *** worker (if any) for any/each table being removed from the subscription. It has nothing to say about the "apply" worker of the subscription, which continues replicating as before. OTOH, I think the other mail problem is not really related to the "tablesync" workers. As you can see (e.g. steps 7,8,9,10 of [2]), that problem is described as continuing over multiple transactions to replicate unexpected rows - I think this could only be done by the subscription "apply" worker, and is after the "tablesync" worker has gone away. So AFAIK these are 2 quite unrelated problems, and would be solved independently. It just happens that they are both exposed using ALTER SUBSCRIPTION ... REFRESH PUBLICATION; ---- [v14] = https://www.postgresql.org/message-id/CAHut%2BPsPO2vOp%2BP7U2szROMy15PKKGanhUrCYQ0ffpy9zG1V1A%40mail.gmail.com [2] = https://www.postgresql.org/message-id/CALj2ACV%2B0UFpcZs5czYgBpujM9p0Hg1qdOZai_43OU7bqHU_xw%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
> I am not exactly sure of the concern. (If the extra info below does not > help can you please describe your concern with more details). > > This [v14] patch code/feature is only referring to the immediate stopping > of only the *** "tablesync" *** worker (if any) for any/each table being > removed from the subscription. It has nothing to say about the "apply" worker > of the subscription, which continues replicating as before. > > OTOH, I think the other mail problem is not really related to the "tablesync" > workers. As you can see (e.g. steps 7,8,9,10 of [2]), that problem is > described as continuing over multiple transactions to replicate unexpected > rows - I think this could only be done by the subscription "apply" worker, > and is after the "tablesync" worker has gone away. > > So AFAIK these are 2 quite unrelated problems, and would be solved > independently. > > It just happens that they are both exposed using ALTER SUBSCRIPTION ... > REFRESH PUBLICATION; So sorry for the confusion, you are right that these are 2 quite unrelated problems. I misunderstood the 'stop the worker' here. + /* Immediately stop the worker. */ + logicalrep_worker_stop_at_commit(subid, relid); /* prevent re-launching */ + logicalrep_worker_stop(subid, relid); /* stop immediately */ Do you think we can add some comments to describe what type "worker" is stop here ? (sync worker here) And should we add some more comments to talk about the reason of " Immediately stop " here ? it may looks easier to understand. Best regards, Houzj
On Wed, Jan 13, 2021 at 1:30 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote: > > > I am not exactly sure of the concern. (If the extra info below does not > > help can you please describe your concern with more details). > > > > This [v14] patch code/feature is only referring to the immediate stopping > > of only the *** "tablesync" *** worker (if any) for any/each table being > > removed from the subscription. It has nothing to say about the "apply" worker > > of the subscription, which continues replicating as before. > > > > OTOH, I think the other mail problem is not really related to the "tablesync" > > workers. As you can see (e.g. steps 7,8,9,10 of [2]), that problem is > > described as continuing over multiple transactions to replicate unexpected > > rows - I think this could only be done by the subscription "apply" worker, > > and is after the "tablesync" worker has gone away. > > > > So AFAIK these are 2 quite unrelated problems, and would be solved > > independently. > > > > It just happens that they are both exposed using ALTER SUBSCRIPTION ... > > REFRESH PUBLICATION; > > So sorry for the confusion, you are right that these are 2 quite unrelated problems. > I misunderstood the 'stop the worker' here. > > > + /* Immediately stop the worker. */ > + logicalrep_worker_stop_at_commit(subid, relid); /* prevent re-launching */ > + logicalrep_worker_stop(subid, relid); /* stop immediately */ > > Do you think we can add some comments to describe what type "worker" is stop here ? (sync worker here) > And should we add some more comments to talk about the reason of " Immediately stop " here ? it may looks easier to understand. > Another thing related to this is why we need to call both logicalrep_worker_stop_at_commit() and logicalrep_worker_stop()? -- With Regards, Amit Kapila.
On Wed, Jan 13, 2021 at 11:18 AM Peter Smith <smithpb2250@gmail.com> wrote: > > On Mon, Jan 4, 2021 at 10:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > 7. > > @@ -905,7 +905,7 @@ replorigin_advance(RepOriginId node, > > LWLockAcquire(&replication_state->lock, LW_EXCLUSIVE); > > > > /* Make sure it's not used by somebody else */ > > - if (replication_state->acquired_by != 0) > > + if (replication_state->acquired_by != 0 && > > replication_state->acquired_by != MyProcPid) > > { > > > > I think you won't need this change if you do replorigin_advance before > > replorigin_session_setup in your patch. > > > > As you know the replorigin_session_setup sets the > replication_state->acquired_by to be the current PID. So without this > change the replorigin_advance rejects that same slot state thinking > that it is already active for a different process. Root problem is > that the same process/PID calling both functions would hang. > I think the hang happens only if we call unchanged replorigin_advance after session_setup API, right? > So this > patch change allows replorigin_advance code to be called by self. > > IIUC that acquired_by check condition is like a sanity check for the > originid passed. The patched code only does just like what the comment > says: > "/* Make sure it's not used by somebody else */" > Doesn't "somebody else" means "anyone but me" (i.e. anyone but MyProcPid). > > Also, “setup” of a thing generally comes before usage of that thing, > so won't it seem strange to do (like the suggestion) and deliberately > call the "setup" function 2nd instead of 1st? > > Can you please explain why is it better to do it the suggested way > (switch the calls around) than keep the patch code? Probably there is > a good reason but I am just not understanding it. > Because there is no requirement for origin_advance API to be called after session setup. Session setup is required to mark the node as replaying from a remote node, see [1] whereas origin_advance is used for setting up the initial location or setting a new location, see [2] (pg_replication_origin_advance). Now here, after creating the origin, we need to set up the initial location and it seems fine to call origin_advance before session_setup. In short, as such, I don't see any problem with your change in replorigin_advance but OTOH, I don't see the need for the same as well. So, let's try to avoid that change unless we can't do without it. Also, another thing is we need to take RowExclusiveLock on pg_replication_origin as written in comments atop replorigin_advance before calling it. See its usage in pg_replication_origin_advance. Also, write comments on why we need to use replorigin_advance here (... something, like we need to WAL log this for the purpose of recovery...). [1] - https://www.postgresql.org/docs/devel/replication-origins.html [2] - https://www.postgresql.org/docs/devel/functions-admin.html#FUNCTIONS-REPLICATION -- With Regards, Amit Kapila.
On Tue, Jan 12, 2021 at 6:17 PM Peter Smith <smithpb2250@gmail.com> wrote: > > On Mon, Jan 11, 2021 at 3:32 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Fri, Jan 8, 2021 at 8:20 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Fri, Jan 8, 2021 at 7:14 AM Peter Smith <smithpb2250@gmail.com> wrote: > > > > > > > > FYI, I was able to reproduce this case in debugger. PSA logs showing details. > > > > > > > > > > Thanks for reproducing as I was worried about exactly this case. I > > > have one question related to logs: > > > > > > ## > > > ## ALTER SUBSCRIPTION to REFRESH the publication > > > > > > ## This blocks on some latch until the tablesync worker dies, then it continues > > > ## > > > > > > Did you check which exact latch or lock blocks this? > > > > > > > I have checked this myself and the command is waiting on the drop of > > origin till the tablesync worker is finished because replorigin_drop() > > requires state->acquired_by to be 0 which will only be true once the > > tablesync worker exits. I think this is the reason you might have > > noticed that the command can't be finished until the tablesync worker > > died. So this can't be an interlock between ALTER SUBSCRIPTION .. > > REFRESH command and tablesync worker and to that end it seems you have > > below Fixme's in the patch: > > I have also seen this same blocking reason before in the replorigin_drop(). > However, back when I first tested/reproduced the refresh issue > [test-refresh] that > AlterSubscription_refresh was still *original* unchanged code, so at > that time it did not > have any replorigin_drop() in at all. In any case in the latest code > [v14] the AlterSubscription is > immediately stopping the workers so this question may be moot. > > > > > + * FIXME - Usually this cleanup would be OK, but will not > > + * always be OK because the logicalrep_worker_stop_at_commit > > + * only "flags" the worker to be stopped in the near future > > + * but meanwhile it may still be running. In this case there > > + * could be a race between the tablesync worker and this code > > + * to see who will succeed with the tablesync drop (and the > > + * loser will ERROR). > > + * > > + * FIXME - Also, checking the state is also not guaranteed > > + * correct because state might be TCOPYDONE when we checked > > + * but has since progressed to SYNDONE > > + */ > > + > > + if (state == SUBREL_STATE_TCOPYDONE) > > + { > > > > I feel this was okay for an earlier code but now we need to stop the > > tablesync workers before trying to drop the slot as we do in > > DropSubscription. Now, if we do that then that would fix the race > > conditions mentioned in Fixme but still, there are few more things I > > am worried about: (a) What if the launcher again starts the tablesync > > worker? One idea could be to acquire AccessExclusiveLock on > > SubscriptionRelationId as we do in DropSubscription which is not a > > very good idea but I can't think of any other good way. (b) the patch > > is just checking SUBREL_STATE_TCOPYDONE before dropping the > > replication slot but the slot could be created even before that (in > > SUBREL_STATE_DATASYNC state). One idea could be we can try to drop the > > slot and if we are not able to drop then we can simply continue > > assuming it didn't exist. > > The code was modified in the latest patch [v14] something like as suggested. > > The workers for removed tables are now immediately stopped (like > DropSubscription does). 
Although I did include the AccessExclusiveLock > as (a) suggested, AFAIK this was actually ineffective at preventing > the workers relaunching. > The reason why it was ineffective is that you are locking SubscriptionRelationId which is to protect relaunch of apply workers not tablesync workers. But in current form even acquiring SubscriptionRelRelationId lock won't serve the purpose because process_syncing_tables_for_apply() doesn't always acquire it before relaunching the tablesync workers. However, if we acquire SubscriptionRelRelationId in process_syncing_tables_for_apply() then it would prevent relaunch of workers but not sure if that is a good idea. Can you think of some other way? > Instead, I am using > logicalrep_worker_stop_at_commit to do this - testing shows it as > working ok. Please see the code and latest test logs [v14] for > details. > There is still a window where it can relaunch. Basically, after you stop the worker in AlterSubscription_refresh and till the commit happens apply worker can relaunch the tablesync workers. I don't see code-wise how we can protect that. And if the tablesync workers are restarted after we stopped them, the purpose won't be achieved because it can recreate or try to reuse the slot which we have dropped. The other issue with the current code could be that after we drop the slot and origin what if the transaction (in which we are doing Alter Subscription) is rolledback? Basically, the workers will be relaunched and it would assume that slot should be there but the slot won't be present. I have thought of dropping the slot at commit time after we stop the workers but again not sure if that is a good idea because at that point we don't want to establish the connection with the publisher. I think this needs some more thought. -- With Regards, Amit Kapila.
Hi Amit. PSA the v15 patch for the Tablesync Solution1. Main differences from v14: + Addresses review comment, posted 13/Jan [ak13] [ak13] = https://www.postgresql.org/message-id/CAA4eK1KzNbudfwmJD-ureYigX6sNyCU6YgHscg29xWoZG6osvA%40mail.gmail.com ==== Features: * The tablesync slot is now permanent instead of temporary. * The tablesync slot name is no longer tied to the Subscription slot name. * The tablesync slot cleanup (drop) code is added for DropSubscription, AlterSubscription_refresh and for process_syncing_tables_for_sync functions. Drop/AlterSubscription will issue WARNING instead of ERROR in case the slot drop fails. * The tablesync worker is now allowing multiple tx instead of single tx * A new state (SUBREL_STATE_FINISHEDCOPY) is persisted after a successful copy_table in tablesync's LogicalRepSyncTableStart. * If a re-launched tablesync finds state SUBREL_STATE_FINISHEDCOPY then it will bypass the initial copy_table phase. * Now tablesync sets up replication origin tracking in LogicalRepSyncTableStart (similar as done for the apply worker). The origin is advanced when first created. * The tablesync replication origin tracking is cleaned up during DropSubscription and/or process_syncing_tables_for_apply. * The DropSubscription cleanup code was enhanced (v7+) to take care of any crashed tablesync workers. * The AlterSubscription_refresh (v14+) is now more similar to DropSubscription w.r.t to stopping tablesync workers for any "removed" tables. * Updates to PG docs. TODO / Known Issues: * The AlterSubscription_refresh tablesync cleanup code still has some problems [1] [1] = https://www.postgresql.org/message-id/CAA4eK1JuwZF7FHM%2BEPjWdVh%3DXaz-7Eo-G0TByMjWeUU32Xue3w%40mail.gmail.com --- Kind Regards, Peter Smith. Fujitsu Australia
Attachment
On Wed, Jan 13, 2021 at 9:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Jan 13, 2021 at 11:18 AM Peter Smith <smithpb2250@gmail.com> wrote: > > > > On Mon, Jan 4, 2021 at 10:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > 7. > > > @@ -905,7 +905,7 @@ replorigin_advance(RepOriginId node, > > > LWLockAcquire(&replication_state->lock, LW_EXCLUSIVE); > > > > > > /* Make sure it's not used by somebody else */ > > > - if (replication_state->acquired_by != 0) > > > + if (replication_state->acquired_by != 0 && > > > replication_state->acquired_by != MyProcPid) > > > { > > > > > > I think you won't need this change if you do replorigin_advance before > > > replorigin_session_setup in your patch. > > > > > > > As you know the replorigin_session_setup sets the > > replication_state->acquired_by to be the current PID. So without this > > change the replorigin_advance rejects that same slot state thinking > > that it is already active for a different process. Root problem is > > that the same process/PID calling both functions would hang. > > > > I think the hang happens only if we call unchanged replorigin_advance > after session_setup API, right? > > > So this > > patch change allows replorigin_advance code to be called by self. > > > > IIUC that acquired_by check condition is like a sanity check for the > > originid passed. The patched code only does just like what the comment > > says: > > "/* Make sure it's not used by somebody else */" > > Doesn't "somebody else" means "anyone but me" (i.e. anyone but MyProcPid). > > > > Also, “setup” of a thing generally comes before usage of that thing, > > so won't it seem strange to do (like the suggestion) and deliberately > > call the "setup" function 2nd instead of 1st? > > > > Can you please explain why is it better to do it the suggested way > > (switch the calls around) than keep the patch code? Probably there is > > a good reason but I am just not understanding it. > > > > Because there is no requirement for origin_advance API to be called > after session setup. Session setup is required to mark the node as > replaying from a remote node, see [1] whereas origin_advance is used > for setting up the initial location or setting a new location, see [2] > (pg_replication_origin_advance). > > Now here, after creating the origin, we need to set up the initial > location and it seems fine to call origin_advance before > session_setup. In short, as such, I don't see any problem with your > change in replorigin_advance but OTOH, I don't see the need for the > same as well. So, let's try to avoid that change unless we can't do > without it. > > Also, another thing is we need to take RowExclusiveLock on > pg_replication_origin as written in comments atop replorigin_advance > before calling it. See its usage in pg_replication_origin_advance. > Also, write comments on why we need to use replorigin_advance here > (... something, like we need to WAL log this for the purpose of > recovery...). > Modified in latest patch [v15]. ---- [v15] = https://www.postgresql.org/message-id/CAHut%2BPu3he2rOWjbXcNUO6z3aH2LYzW03KV%2BfiMWim49qW9etQ%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
On Wed, Jan 13, 2021 at 5:07 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Jan 12, 2021 at 6:17 PM Peter Smith <smithpb2250@gmail.com> wrote: > > > > On Mon, Jan 11, 2021 at 3:32 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > The workers for removed tables are now immediately stopped (like > > DropSubscription does). Although I did include the AccessExclusiveLock > > as (a) suggested, AFAIK this was actually ineffective at preventing > > the workers relaunching. > > > > The reason why it was ineffective is that you are locking > SubscriptionRelationId which is to protect relaunch of apply workers > not tablesync workers. But in current form even acquiring > SubscriptionRelRelationId lock won't serve the purpose because > process_syncing_tables_for_apply() doesn't always acquire it before > relaunching the tablesync workers. However, if we acquire > SubscriptionRelRelationId in process_syncing_tables_for_apply() then > it would prevent relaunch of workers but not sure if that is a good > idea. Can you think of some other way? > > > Instead, I am using > > logicalrep_worker_stop_at_commit to do this - testing shows it as > > working ok. Please see the code and latest test logs [v14] for > > details. > > > > There is still a window where it can relaunch. Basically, after you > stop the worker in AlterSubscription_refresh and till the commit > happens apply worker can relaunch the tablesync workers. I don't see > code-wise how we can protect that. And if the tablesync workers are > restarted after we stopped them, the purpose won't be achieved because > it can recreate or try to reuse the slot which we have dropped. > > The other issue with the current code could be that after we drop the > slot and origin what if the transaction (in which we are doing Alter > Subscription) is rolledback? Basically, the workers will be relaunched > and it would assume that slot should be there but the slot won't be > present. I have thought of dropping the slot at commit time after we > stop the workers but again not sure if that is a good idea because at > that point we don't want to establish the connection with the > publisher. > > I think this needs some more thought. > I have another idea to solve this problem. Instead of Alter Subscription drop the slot/origin, we can let tablesync worker do it. Basically, we need to register SignalHandlerForShutdownRequest as SIGTERM handler and then later need to check ShutdownRequestPending flag in the tablesync worker. If the flag is set, then we can drop the slot/origin and allow the process to exit cleanly. This will obviate the need to take the lock and all sort of rollback problems. If this works out well then I think we can use this for DropSubscription as well but that is a matter of separate patch. Thoughts? -- With Regards, Amit Kapila.
Hi Amit. PSA the v16 patch for the Tablesync Solution1. Main differences from v15: + Tablesync cleanups of DropSubscription/AlterSubscription_refresh are re-implemented as as ProcessInterrupts function ==== Features: * The tablesync slot is now permanent instead of temporary. * The tablesync slot name is no longer tied to the Subscription slot name. * The tablesync worker is now allowing multiple tx instead of single tx * A new state (SUBREL_STATE_FINISHEDCOPY) is persisted after a successful copy_table in tablesync's LogicalRepSyncTableStart. * If a re-launched tablesync finds state SUBREL_STATE_FINISHEDCOPY then it will bypass the initial copy_table phase. * Now tablesync sets up replication origin tracking in LogicalRepSyncTableStart (similar as done for the apply worker). The origin is advanced when first created. * Cleanup of tablesync resources: - The tablesync slot cleanup (drop) code is added for process_syncing_tables_for_sync functions. - The tablesync replication origin tracking is cleaned process_syncing_tables_for_apply. - A tablesync function to cleanup its own slot/origin is called from ProcessInterrupt. This is indirectly invoked by DropSubscription/AlterSubscrition when they signal the tablesync worker to stop. * Updates to PG docs. TODO / Known Issues: * Race condition observed in "make check" may be related to this patch. * Add test cases. --- Please also see some test scenario logging which shows the new tablesync cleanup function getting called as results of Drop/AlterSUbscription. --- Kind Regards, Peter Smith. Fujitsu Australia
Attachment
Hi Amit. PSA the v17 patch for the Tablesync Solution1. Main differences from v16: + Small refactor for DropSubscription to correct the "make check" deadlock + Added test case + Some comment wording ==== Features: * The tablesync slot is now permanent instead of temporary. * The tablesync slot name is no longer tied to the Subscription slot name. * The tablesync worker is now allowing multiple tx instead of single tx * A new state (SUBREL_STATE_FINISHEDCOPY) is persisted after a successful copy_table in tablesync's LogicalRepSyncTableStart. * If a re-launched tablesync finds state SUBREL_STATE_FINISHEDCOPY then it will bypass the initial copy_table phase. * Now tablesync sets up replication origin tracking in LogicalRepSyncTableStart (similar as done for the apply worker). The origin is advanced when first created. * Cleanup of tablesync resources: - The tablesync slot cleanup (drop) code is added for process_syncing_tables_for_sync functions. - The tablesync replication origin tracking is cleaned process_syncing_tables_for_apply. - A tablesync function to cleanup its own slot/origin is called fro ProcessInterrupts. This is indirectly invoked by DropSubscription/AlterSubscription when they signal the tablesync worker to stop. * Updates to PG docs. * New TAP test case TODO / Known Issues: * None known. --- Kind Regards, Peter Smith. Fujitsu Australia
Attachment
On Tue, Jan 19, 2021 at 2:32 PM Peter Smith <smithpb2250@gmail.com> wrote: > > Hi Amit. > > PSA the v17 patch for the Tablesync Solution1. > Thanks for the updated patch. Below are few comments: 1. Why are we changing the scope of PG_TRY in DropSubscription()? Also, it might be better to keep the replication slot drop part as it is. 2. - * - Tablesync worker finishes the copy and sets table state to SYNCWAIT; - * waits for state change. + * - Tablesync worker does initial table copy; there is a FINISHEDCOPY state to + * indicate when the copy phase has completed, so if the worker crashes + * before reaching SYNCDONE the copy will not be re-attempted. In the last line, shouldn't the state be FINISHEDCOPY instead of SYNCDONE? 3. +void +tablesync_cleanup_at_interrupt(void) +{ + bool drop_slot_needed; + char originname[NAMEDATALEN] = {0}; + RepOriginId originid; + TimeLineID tli; + Oid subid = MySubscription->oid; + Oid relid = MyLogicalRepWorker->relid; + + elog(DEBUG1, + "tablesync_cleanup_at_interrupt for relid = %d", + MyLogicalRepWorker->relid); The function name and message makes it sound like that we drop slot and origin at any interrupt. Isn't it better to name it as tablesync_cleanup_at_shutdown()? 4. + drop_slot_needed = + wrconn != NULL && + MyLogicalRepWorker->relstate != SUBREL_STATE_SYNCDONE && + MyLogicalRepWorker->relstate != SUBREL_STATE_READY; + + if (drop_slot_needed) + { + char syncslotname[NAMEDATALEN] = {0}; + bool missing_ok = true; /* no ERROR if slot is missing. */ I think we can avoid using missing_ok and drop_slot_needed variables. 5. Can we drop the origin along with the slot in process_syncing_tables_for_sync() instead of process_syncing_tables_for_apply()? I think this is possible because of the other changes you made in origin.c. Also, if possible, we can try to use the same code to drop the slot and origin in tablesync_cleanup_at_interrupt and process_syncing_tables_for_sync. 6. + if (MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY) + { + /* + * The COPY phase was previously done, but tablesync then crashed/etc + * before it was able to finish normally. + */ There seems to be a typo (crashed/etc) in the above comment. 7. +# check for occurrence of the expected error +poll_output_until("replication slot \"$slotname\" already exists") + or die "no error stop for the pre-existing origin"; In this test, isn't it better to check for datasync state like below? 004_sync.pl has some other similar test. my $started_query = "SELECT srsubstate = 'd' FROM pg_subscription_rel;"; $node_subscriber->poll_query_until('postgres', $started_query) or die "Timed out while waiting for subscriber to start sync"; Is there a reason why we can't use the existing way to check for failure in this case? -- With Regards, Amit Kapila.
On Thu, Jan 21, 2021 at 3:47 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Jan 19, 2021 at 2:32 PM Peter Smith <smithpb2250@gmail.com> wrote: > > > > Hi Amit. > > > > PSA the v17 patch for the Tablesync Solution1. > > > > Thanks for the updated patch. Below are few comments: > One more comment: In LogicalRepSyncTableStart(), you are trying to remove the slot on the failure of copy which won't work if the publisher is down. If that happens on restart of tablesync worker, we will retry to create the slot with the same name and it will fail because the previous slot is still not removed from the publisher. I think the same problem can happen if, after an error in tablesync worker and we drop the subscription before tablesync worker gets a chance to restart. So, to avoid these problems can we use the TEMPORARY slot for tablesync workers as previously? If I remember correctly, the main problem was we don't know where to start decoding if we fail in catchup phase. But for that origins should be sufficient because if we fail before copy then anyway we have to create a new slot and origin but if we fail after copy then we can use the start_decoding_position from the origin. So before copy, we still need to use CRS_USE_SNAPSHOT while creating a temporary slot but if we are already in FINISHED COPY state at the start of tablesync worker then create a slot with CRS_NOEXPORT_SNAPSHOT option and then use origin's start_pos and proceed decoding changes from that point onwards similar to how currently the apply worker works. -- With Regards, Amit Kapila.
Hi Amit. PSA the v18 patch for the Tablesync Solution1. Main differences from v17: + Design change to use TEMPORARY tablesync slots [ak0122] means lots of the v17 slot cleanup code became unnecessary. + Small refactor in LogicalReplicationSyncTableStart to fix a deadlock scenario. + Addressing some review comments [ak0121]. [ak0121] https://www.postgresql.org/message-id/CAA4eK1LGxuB_RTfZ2HLJT76wv%3DFLV6UPqT%2BFWkiDg61rvQkkmQ%40mail.gmail.com [ak0122] https://www.postgresql.org/message-id/CAA4eK1LS0_mdVx2zG3cS%2BH88FJiwyS3kZi7zxijJ_gEuw2uQ2g%40mail.gmail.com ==== Features: * The tablesync slot name is no longer tied to the Subscription slot name. * The tablesync worker is now allowing multiple tx instead of single tx * A new state (SUBREL_STATE_FINISHEDCOPY) is persisted after a successful copy_table in tablesync's LogicalRepSyncTableStart. * If a re-launched tablesync finds state SUBREL_STATE_FINISHEDCOPY then it will bypass the initial copy_table phase. * Now tablesync sets up replication origin tracking in LogicalRepSyncTableStart (similar as done for the apply worker). The origin is advanced when first created. * The tablesync replication origin tracking record is cleaned up by: - process_syncing_tables_for_apply - DropSubscription - AlterSubscription_refresh * Updates to PG docs. * New TAP test case Known Issues: * None. --- Kind Regards, Peter Smith. Fujitsu Australia
Attachment
On Fri, Jan 22, 2021 at 1:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Jan 21, 2021 at 3:47 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Tue, Jan 19, 2021 at 2:32 PM Peter Smith <smithpb2250@gmail.com> wrote: > > > > > > Hi Amit. > > > > > > PSA the v17 patch for the Tablesync Solution1. > > > > > > > Thanks for the updated patch. Below are few comments: > > > > One more comment: > > In LogicalRepSyncTableStart(), you are trying to remove the slot on > the failure of copy which won't work if the publisher is down. If that > happens on restart of tablesync worker, we will retry to create the > slot with the same name and it will fail because the previous slot is > still not removed from the publisher. I think the same problem can > happen if, after an error in tablesync worker and we drop the > subscription before tablesync worker gets a chance to restart. So, to > avoid these problems can we use the TEMPORARY slot for tablesync > workers as previously? If I remember correctly, the main problem was > we don't know where to start decoding if we fail in catchup phase. But > for that origins should be sufficient because if we fail before copy > then anyway we have to create a new slot and origin but if we fail > after copy then we can use the start_decoding_position from the > origin. So before copy, we still need to use CRS_USE_SNAPSHOT while > creating a temporary slot but if we are already in FINISHED COPY state > at the start of tablesync worker then create a slot with > CRS_NOEXPORT_SNAPSHOT option and then use origin's start_pos and > proceed decoding changes from that point onwards similar to how > currently the apply worker works. > OK. Code is modified as suggested in the latest patch [v18]. Now that tablesync slots are temporary, quite a lot of cleanup code from the previous patch (v17) is no longer required so has been removed. ---- [v18] = https://www.postgresql.org/message-id/CAHut%2BPvm0R%3DMn_uVN_JhK0scE54V6%2BEDGHJg1WYJx0Q8HX_mkQ%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
On Thu, Jan 21, 2021 at 9:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Jan 19, 2021 at 2:32 PM Peter Smith <smithpb2250@gmail.com> wrote: > > > > Hi Amit. > > > > PSA the v17 patch for the Tablesync Solution1. > > > > Thanks for the updated patch. Below are few comments: > 1. Why are we changing the scope of PG_TRY in DropSubscription()? > Also, it might be better to keep the replication slot drop part as it > is. > The latest patch [v18] was re-designed to make tablesync slots as TEMPORARY [ak0122], so this code in DropSubscription is modified a lot. This review comment is not applicable anymore. > 2. > - * - Tablesync worker finishes the copy and sets table state to SYNCWAIT; > - * waits for state change. > + * - Tablesync worker does initial table copy; there is a > FINISHEDCOPY state to > + * indicate when the copy phase has completed, so if the worker crashes > + * before reaching SYNCDONE the copy will not be re-attempted. > > In the last line, shouldn't the state be FINISHEDCOPY instead of SYNCDONE? > OK. The code comment was correct, but maybe confusing. I have reworded it in the latest patch [v18]. > 3. > +void > +tablesync_cleanup_at_interrupt(void) > +{ > + bool drop_slot_needed; > + char originname[NAMEDATALEN] = {0}; > + RepOriginId originid; > + TimeLineID tli; > + Oid subid = MySubscription->oid; > + Oid relid = MyLogicalRepWorker->relid; > + > + elog(DEBUG1, > + "tablesync_cleanup_at_interrupt for relid = %d", > + MyLogicalRepWorker->relid); > > The function name and message makes it sound like that we drop slot > and origin at any interrupt. Isn't it better to name it as > tablesync_cleanup_at_shutdown()? > The latest patch [v18] was re-designed to make tablesync slots as TEMPORARY [ak0122], so this cleanup function is removed. This review comment is not applicable anymore. > 4. > + drop_slot_needed = > + wrconn != NULL && > + MyLogicalRepWorker->relstate != SUBREL_STATE_SYNCDONE && > + MyLogicalRepWorker->relstate != SUBREL_STATE_READY; > + > + if (drop_slot_needed) > + { > + char syncslotname[NAMEDATALEN] = {0}; > + bool missing_ok = true; /* no ERROR if slot is missing. */ > > I think we can avoid using missing_ok and drop_slot_needed variables. > The latest patch [v18] was re-designed to make tablesync slots as TEMPORARY [ak0122], so this code no longer exists. This review comment is not applicable anymore. > 5. Can we drop the origin along with the slot in > process_syncing_tables_for_sync() instead of > process_syncing_tables_for_apply()? I think this is possible because > of the other changes you made in origin.c. Also, if possible, we can > try to use the same code to drop the slot and origin in > tablesync_cleanup_at_interrupt and process_syncing_tables_for_sync. > No, the origin tracking cannot be dropped by the tablesync worker for the normal use-case even with my modified origin.c; it would fail during the commit TX because while trying to do replorigin_session_advance it would find the asserted origin id was not there anymore. Also, the latest patch [v18] was re-designed to make tablesync slots as TEMPORARY [ak0122], so the tablesync_cleanup_at_interrupt function no longer exists (so the origin.c change of v17 has also been removed). > 6. > + if (MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY) > + { > + /* > + * The COPY phase was previously done, but tablesync then crashed/etc > + * before it was able to finish normally. > + */ > > There seems to be a typo (crashed/etc) in the above comment. > OK. Fixed in latest patch [v18]. 
---- [ak0122] = https://www.postgresql.org/message-id/CAA4eK1LS0_mdVx2zG3cS%2BH88FJiwyS3kZi7zxijJ_gEuw2uQ2g%40mail.gmail.com [v18] = https://www.postgresql.org/message-id/CAHut%2BPvm0R%3DMn_uVN_JhK0scE54V6%2BEDGHJg1WYJx0Q8HX_mkQ%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
On Thu, Jan 21, 2021 at 9:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > 7. > +# check for occurrence of the expected error > +poll_output_until("replication slot \"$slotname\" already exists") > + or die "no error stop for the pre-existing origin"; > > In this test, isn't it better to check for datasync state like below? > 004_sync.pl has some other similar test. > my $started_query = "SELECT srsubstate = 'd' FROM pg_subscription_rel;"; > $node_subscriber->poll_query_until('postgres', $started_query) > or die "Timed out while waiting for subscriber to start sync"; > > Is there a reason why we can't use the existing way to check for > failure in this case? Since the new design now uses temporary slots, is this test case still required?. If required, I can change it accordingly. regards, Ajin Cherian Fujitsu Australia
On Sat, Jan 23, 2021 at 8:37 AM Ajin Cherian <itsajin@gmail.com> wrote: > > On Thu, Jan 21, 2021 at 9:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > 7. > > +# check for occurrence of the expected error > > +poll_output_until("replication slot \"$slotname\" already exists") > > + or die "no error stop for the pre-existing origin"; > > > > In this test, isn't it better to check for datasync state like below? > > 004_sync.pl has some other similar test. > > my $started_query = "SELECT srsubstate = 'd' FROM pg_subscription_rel;"; > > $node_subscriber->poll_query_until('postgres', $started_query) > > or die "Timed out while waiting for subscriber to start sync"; > > > > Is there a reason why we can't use the existing way to check for > > failure in this case? > > Since the new design now uses temporary slots, is this test case still > required? > I think so. But do you have any reason to believe that it won't be required anymore? -- With Regards, Amit Kapila.
On Sat, Jan 23, 2021 at 3:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > I think so. But do you have any reason to believe that it won't be > required anymore? A temporary slot will not clash with a permanent slot of the same name, regards, Ajin Cherian Fujitsu
On Sat, Jan 23, 2021 at 4:55 AM Peter Smith <smithpb2250@gmail.com> wrote: > > PSA the v18 patch for the Tablesync Solution1. > Few comments: ============= 1. - * So the state progression is always: INIT -> DATASYNC -> SYNCWAIT -> - * CATCHUP -> SYNCDONE -> READY. + * So the state progression is always: INIT -> DATASYNC -> + * (sync worker FINISHEDCOPY) -> SYNCWAIT -> CATCHUP -> SYNCDONE -> READY. I don't think we need to be specific here that sync worker sets FINISHEDCOPY state. 2. @@ -98,11 +102,16 @@ #include "miscadmin.h" #include "parser/parse_relation.h" #include "pgstat.h" +#include "postmaster/interrupt.h" #include "replication/logicallauncher.h" #include "replication/logicalrelation.h" +#include "replication/logicalworker.h" #include "replication/walreceiver.h" #include "replication/worker_internal.h" +#include "replication/slot.h" I don't think the above includes are required. They seem to the remnant of the previous approach. 3. process_syncing_tables_for_sync(XLogRecPtr current_lsn) { - Assert(IsTransactionState()); + bool sync_done = false; SpinLockAcquire(&MyLogicalRepWorker->relmutex); + sync_done = MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP && + current_lsn >= MyLogicalRepWorker->relstate_lsn; + SpinLockRelease(&MyLogicalRepWorker->relmutex); - if (MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP && - current_lsn >= MyLogicalRepWorker->relstate_lsn) + if (sync_done) { TimeLineID tli; + /* + * Change state to SYNCDONE. + */ + SpinLockAcquire(&MyLogicalRepWorker->relmutex); Why do we need these changes? If you have done it for the code-readability purpose then we can consider this as a separate patch because I don't see why these are required w.r.t this patch. 4. - /* - * To build a slot name for the sync work, we are limited to NAMEDATALEN - - * 1 characters. We cut the original slot name to NAMEDATALEN - 28 chars - * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0'). (It's actually the - * NAMEDATALEN on the remote that matters, but this scheme will also work - * reasonably if that is different.) - */ - StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */ - slotname = psprintf("%.*s_%u_sync_%u", - NAMEDATALEN - 28, - MySubscription->slotname, - MySubscription->oid, - MyLogicalRepWorker->relid); + /* Calculate the name of the tablesync slot. */ + slotname = ReplicationSlotNameForTablesync( + MySubscription->oid, + MyLogicalRepWorker->relid); What is the reason for changing the slot name calculation? If there is any particular reasons, then we can add a comment to indicate why we can't include the subscription's slotname in this calculation. 5. This is WAL + * logged for for the purpose of recovery. Locks are to prevent the + * replication origin from vanishing while advancing. /for for/for 6. + /* Remove the tablesync's origin tracking if exists. */ + snprintf(originname, sizeof(originname), "pg_%u_%u", subid, relid); + originid = replorigin_by_name(originname, true); + if (originid != InvalidRepOriginId) + { + elog(DEBUG1, "DropSubscription: dropping origin tracking for \"%s\"", originname); I don't think we need this and the DEBUG1 message in AlterSubscription_refresh. IT is fine to print this information for background workers like in apply-worker but not sure if need it here. The DropSubscription drops the origin of apply worker but it doesn't use such a DEBUG message so I guess we don't it for tablesync origins as well. 7. 
Have you tested with the new patch the scenario where we crash after FINISHEDCOPY and before SYNCDONE, is it able to pick up the replication using the new temporary slot? Here, we need to test the case where during the catchup phase we have received few commits and then the tablesync worker is crashed/errored out? Basically, check if the replication is continued from the same point? I understand that this can be only tested by adding some logs and we might not be able to write a test for it. -- With Regards, Amit Kapila.
FYI - I have done some long-running testing using the current patch [v18]. 1. The src/test/subscription TAP tests: - Subscription TAP tests were executed in a loop X 150 iterations. - Duration 5 hrs. - All iterations report "Result: PASS" 2. The postgres "make check" tests: - make check was executed in a loop X 150 iterations. - Duration 2 hrs. - All iterations report "All 202 tests passed" --- [v18] https://www.postgresql.org/message-id/CAHut%2BPvm0R%3DMn_uVN_JhK0scE54V6%2BEDGHJg1WYJx0Q8HX_mkQ%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
On Sat, Jan 23, 2021 at 11:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Sat, Jan 23, 2021 at 4:55 AM Peter Smith <smithpb2250@gmail.com> wrote: > > > > PSA the v18 patch for the Tablesync Solution1. > > > > Few comments: > ============= > 1. > - * So the state progression is always: INIT -> DATASYNC -> SYNCWAIT -> > - * CATCHUP -> SYNCDONE -> READY. > + * So the state progression is always: INIT -> DATASYNC -> > + * (sync worker FINISHEDCOPY) -> SYNCWAIT -> CATCHUP -> SYNCDONE -> READY. > > I don't think we need to be specific here that sync worker sets > FINISHEDCOPY state. > This was meant to indicate that *only* the sync worker knows about the FINISHEDCOPY state, whereas all the other states are either known (assigned and/or used) by *both* kinds of workers. But, I can remove it if you feel that distinction is not useful. > 4. > - /* > - * To build a slot name for the sync work, we are limited to NAMEDATALEN - > - * 1 characters. We cut the original slot name to NAMEDATALEN - 28 chars > - * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0'). (It's actually the > - * NAMEDATALEN on the remote that matters, but this scheme will also work > - * reasonably if that is different.) > - */ > - StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */ > - slotname = psprintf("%.*s_%u_sync_%u", > - NAMEDATALEN - 28, > - MySubscription->slotname, > - MySubscription->oid, > - MyLogicalRepWorker->relid); > + /* Calculate the name of the tablesync slot. */ > + slotname = ReplicationSlotNameForTablesync( > + MySubscription->oid, > + MyLogicalRepWorker->relid); > > What is the reason for changing the slot name calculation? If there is > any particular reasons, then we can add a comment to indicate why we > can't include the subscription's slotname in this calculation. > The subscription slot name may be changed (e.g. ALTER SUBSCRIPTION) and so including the subscription slot name as part of the tablesync slot name was considered to be: a) possibly risky/undefined, if the subscription slot_name = NONE b) confusing, if we end up using 2 different slot names for the same tablesync (e.g. if the subscription slot name is changed before a sync worker is re-launched). And since this subscription slot name part is not necessary for uniqueness anyway, it was removed from the tablesync slot name to eliminate those concerns. Also, the tablesync slot name calculation was encapsulated as a separate function because previously (i.e. before v18) it was used by various other cleanup codes. I still like it better as a function, but now it is only called from one place so we could put that code back inline if you prefer it how it was.. ---- Kind Regards, Peter Smith. Fujitsu Australia
On Sun, Jan 24, 2021 at 5:54 PM Peter Smith <smithpb2250@gmail.com> wrote: > > 4. > > - /* > > - * To build a slot name for the sync work, we are limited to NAMEDATALEN - > > - * 1 characters. We cut the original slot name to NAMEDATALEN - 28 chars > > - * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0'). (It's actually the > > - * NAMEDATALEN on the remote that matters, but this scheme will also work > > - * reasonably if that is different.) > > - */ > > - StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */ > > - slotname = psprintf("%.*s_%u_sync_%u", > > - NAMEDATALEN - 28, > > - MySubscription->slotname, > > - MySubscription->oid, > > - MyLogicalRepWorker->relid); > > + /* Calculate the name of the tablesync slot. */ > > + slotname = ReplicationSlotNameForTablesync( > > + MySubscription->oid, > > + MyLogicalRepWorker->relid); > > > > What is the reason for changing the slot name calculation? If there is > > any particular reasons, then we can add a comment to indicate why we > > can't include the subscription's slotname in this calculation. > > > > The subscription slot name may be changed (e.g. ALTER SUBSCRIPTION) > and so including the subscription slot name as part of the tablesync > slot name was considered to be: > a) possibly risky/undefined, if the subscription slot_name = NONE > b) confusing, if we end up using 2 different slot names for the same > tablesync (e.g. if the subscription slot name is changed before a sync > worker is re-launched). > And since this subscription slot name part is not necessary for > uniqueness anyway, it was removed from the tablesync slot name to > eliminate those concerns. > > Also, the tablesync slot name calculation was encapsulated as a > separate function because previously (i.e. before v18) it was used by > various other cleanup codes. I still like it better as a function, but > now it is only called from one place so we could put that code back > inline if you prefer it how it was.. It turns out those (a/b) concerns I wrote above are maybe unfounded, because it seems not possible to alter the slot_name = NONE unless the subscription is first DISABLED. So probably I can revert all this tablesync slot name calculation back to how it originally was in the OSS HEAD if you want. ---- Kind Regards, Peter Smith. Fujitsu Australia
Hi Amit. PSA the v19 patch for the Tablesync Solution1. Main differences from v18: + Patch has been rebased off HEAD @ 24/Jan + Addressing some review comments [ak0123] [ak0123] https://www.postgresql.org/message-id/CAA4eK1JhpuwujrV6ABMmZ3jXfW37ssZnJ3fikrY7rRdvoEmu_g%40mail.gmail.com ==== Features: * The tablesync worker is now allowing multiple tx instead of single tx. * A new state (SUBREL_STATE_FINISHEDCOPY) is persisted after a successful copy_table in tablesync's LogicalRepSyncTableStart. * If a re-launched tablesync finds state SUBREL_STATE_FINISHEDCOPY then it will bypass the initial copy_table phase. * Now tablesync sets up replication origin tracking in LogicalRepSyncTableStart (similar as done for the apply worker). The origin is advanced when first created. * The tablesync replication origin tracking record is cleaned up by: - process_syncing_tables_for_apply - DropSubscription - AlterSubscription_refresh * Updates to PG docs. * New TAP test case. Known Issues: * None. --- Kind Regards, Peter Smith. Fujitsu Australia
Attachment
On Mon, Jan 25, 2021 at 6:15 AM Peter Smith <smithpb2250@gmail.com> wrote: > > On Sun, Jan 24, 2021 at 5:54 PM Peter Smith <smithpb2250@gmail.com> wrote: > > > 4. > > > - /* > > > - * To build a slot name for the sync work, we are limited to NAMEDATALEN - > > > - * 1 characters. We cut the original slot name to NAMEDATALEN - 28 chars > > > - * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0'). (It's actually the > > > - * NAMEDATALEN on the remote that matters, but this scheme will also work > > > - * reasonably if that is different.) > > > - */ > > > - StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */ > > > - slotname = psprintf("%.*s_%u_sync_%u", > > > - NAMEDATALEN - 28, > > > - MySubscription->slotname, > > > - MySubscription->oid, > > > - MyLogicalRepWorker->relid); > > > + /* Calculate the name of the tablesync slot. */ > > > + slotname = ReplicationSlotNameForTablesync( > > > + MySubscription->oid, > > > + MyLogicalRepWorker->relid); > > > > > > What is the reason for changing the slot name calculation? If there is > > > any particular reasons, then we can add a comment to indicate why we > > > can't include the subscription's slotname in this calculation. > > > > > > > The subscription slot name may be changed (e.g. ALTER SUBSCRIPTION) > > and so including the subscription slot name as part of the tablesync > > slot name was considered to be: > > a) possibly risky/undefined, if the subscription slot_name = NONE > > b) confusing, if we end up using 2 different slot names for the same > > tablesync (e.g. if the subscription slot name is changed before a sync > > worker is re-launched). > > And since this subscription slot name part is not necessary for > > uniqueness anyway, it was removed from the tablesync slot name to > > eliminate those concerns. > > > > Also, the tablesync slot name calculation was encapsulated as a > > separate function because previously (i.e. before v18) it was used by > > various other cleanup codes. I still like it better as a function, but > > now it is only called from one place so we could put that code back > > inline if you prefer it how it was.. > > It turns out those (a/b) concerns I wrote above are maybe unfounded, > because it seems not possible to alter the slot_name = NONE unless the > subscription is first DISABLED. > Yeah, but I think the user can still change to some other predefined slot_name. However, I guess it doesn't matter unless it can lead what you have mentioned in (a). As that can't happen, it is probably better to take out that change from the patch. I see your point of moving this calculation to a separate function but not sure if it is worth it unless we have to call it from multiple places or it simplifies the existing code. -- With Regards, Amit Kapila.
On Sat, Jan 23, 2021 at 11:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > 2. > @@ -98,11 +102,16 @@ > #include "miscadmin.h" > #include "parser/parse_relation.h" > #include "pgstat.h" > +#include "postmaster/interrupt.h" > #include "replication/logicallauncher.h" > #include "replication/logicalrelation.h" > +#include "replication/logicalworker.h" > #include "replication/walreceiver.h" > #include "replication/worker_internal.h" > +#include "replication/slot.h" > > I don't think the above includes are required. They seem to the > remnant of the previous approach. > OK. Fixed in the latest patch [v19]. > 3. > process_syncing_tables_for_sync(XLogRecPtr current_lsn) > { > - Assert(IsTransactionState()); > + bool sync_done = false; > > SpinLockAcquire(&MyLogicalRepWorker->relmutex); > + sync_done = MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP && > + current_lsn >= MyLogicalRepWorker->relstate_lsn; > + SpinLockRelease(&MyLogicalRepWorker->relmutex); > > - if (MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP && > - current_lsn >= MyLogicalRepWorker->relstate_lsn) > + if (sync_done) > { > TimeLineID tli; > > + /* > + * Change state to SYNCDONE. > + */ > + SpinLockAcquire(&MyLogicalRepWorker->relmutex); > > Why do we need these changes? If you have done it for the > code-readability purpose then we can consider this as a separate patch > because I don't see why these are required w.r.t this patch. > Yes it was for code readability in v17 when this function used to be much larger. But it is not very necessary anymore and has been reverted in the latest patch [v19]. > 4. > - /* > - * To build a slot name for the sync work, we are limited to NAMEDATALEN - > - * 1 characters. We cut the original slot name to NAMEDATALEN - 28 chars > - * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0'). (It's actually the > - * NAMEDATALEN on the remote that matters, but this scheme will also work > - * reasonably if that is different.) > - */ > - StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */ > - slotname = psprintf("%.*s_%u_sync_%u", > - NAMEDATALEN - 28, > - MySubscription->slotname, > - MySubscription->oid, > - MyLogicalRepWorker->relid); > + /* Calculate the name of the tablesync slot. */ > + slotname = ReplicationSlotNameForTablesync( > + MySubscription->oid, > + MyLogicalRepWorker->relid); > > What is the reason for changing the slot name calculation? If there is > any particular reasons, then we can add a comment to indicate why we > can't include the subscription's slotname in this calculation. > The tablesync slot name changes were not strictly necessary, so the code is all reverted to be the same as OSS HEAD now in the latest patch [v19]. > 5. > This is WAL > + * logged for for the purpose of recovery. Locks are to prevent the > + * replication origin from vanishing while advancing. > > /for for/for > OK. Fixed in the latest patch [v19]. > 6. > + /* Remove the tablesync's origin tracking if exists. */ > + snprintf(originname, sizeof(originname), "pg_%u_%u", subid, relid); > + originid = replorigin_by_name(originname, true); > + if (originid != InvalidRepOriginId) > + { > + elog(DEBUG1, "DropSubscription: dropping origin tracking for > \"%s\"", originname); > > I don't think we need this and the DEBUG1 message in > AlterSubscription_refresh. IT is fine to print this information for > background workers like in apply-worker but not sure if need it here. 
> The DropSubscription drops the origin of apply worker but it doesn't > use such a DEBUG message so I guess we don't it for tablesync origins > as well. > OK. These DEBUG1 logs are removed in the latest patch [v19]. ---- [v19] https://www.postgresql.org/message-id/CAHut%2BPsj7Xm8C1LbqeAbk-3duyS8xXJtL9TiGaeu3P8g272mAA%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
On Sun, Jan 24, 2021 at 12:24 PM Peter Smith <smithpb2250@gmail.com> wrote: > > On Sat, Jan 23, 2021 at 11:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > Few comments: > > ============= > > 1. > > - * So the state progression is always: INIT -> DATASYNC -> SYNCWAIT -> > > - * CATCHUP -> SYNCDONE -> READY. > > + * So the state progression is always: INIT -> DATASYNC -> > > + * (sync worker FINISHEDCOPY) -> SYNCWAIT -> CATCHUP -> SYNCDONE -> READY. > > > > I don't think we need to be specific here that sync worker sets > > FINISHEDCOPY state. > > > > This was meant to indicate that *only* the sync worker knows about the > FINISHEDCOPY state, whereas all the other states are either known > (assigned and/or used) by *both* kinds of workers. But, I can remove > it if you feel that distinction is not useful. > Okay, but I feel you can mention that in the description you have added for FINISHEDCOPY state. It looks a bit odd here and the message you want to convey is also not that clear. -- With Regards, Amit Kapila.
On Sat, Jan 23, 2021 at 11:08 AM Ajin Cherian <itsajin@gmail.com> wrote: > > On Sat, Jan 23, 2021 at 3:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > I think so. But do you have any reason to believe that it won't be > > required anymore? > > A temporary slot will not clash with a permanent slot of the same name, > I have tried below and it seems to be clashing: postgres=# SELECT 'init' FROM pg_create_logical_replication_slot('test_slot2', 'test_decoding'); ?column? ---------- init (1 row) postgres=# SELECT 'init' FROM pg_create_logical_replication_slot('test_slot2', 'test_decoding', true); ERROR: replication slot "test_slot2" already exists Note that the third parameter in the second statement above indicates whether it is a temporary slot or not. What am I missing? -- With Regards, Amit Kapila.
On Mon, Jan 25, 2021 at 8:23 AM Peter Smith <smithpb2250@gmail.com> wrote: > > On Sat, Jan 23, 2021 at 11:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > 2. > > @@ -98,11 +102,16 @@ > > #include "miscadmin.h" > > #include "parser/parse_relation.h" > > #include "pgstat.h" > > +#include "postmaster/interrupt.h" > > #include "replication/logicallauncher.h" > > #include "replication/logicalrelation.h" > > +#include "replication/logicalworker.h" > > #include "replication/walreceiver.h" > > #include "replication/worker_internal.h" > > +#include "replication/slot.h" > > > > I don't think the above includes are required. They seem to the > > remnant of the previous approach. > > > > OK. Fixed in the latest patch [v19]. > You seem to forgot removing #include "replication/slot.h". Check, if it is not required then remove that as well. -- With Regards, Amit Kapila.
On Mon, Jan 25, 2021 at 8:03 AM Peter Smith <smithpb2250@gmail.com> wrote: > > Hi Amit. > > PSA the v19 patch for the Tablesync Solution1. > I see one race condition in this patch where we try to drop the origin via apply process and DropSubscription. I think it can lead to the error "cache lookup failed for replication origin with oid %u". The same problem can happen via exposed API pg_replication_origin_drop but probably because this is not used concurrently so nobody faced this issue. I think for the matter of this patch we can try to suppress such an error either via try..catch, or by adding missing_ok argument to replorigin_drop API, or we can just add to comments that such a race exists. Additionally, we should try to start a new thread for the existence of this problem in pg_replication_origin_drop. What do you think? -- With Regards, Amit Kapila.
Hi Amit. PSA the v20 patch for the Tablesync Solution1. Main differences from v19: + Updated TAP test [ak0123-7] + Fixed comment [ak0125-1] + Removed redundant header [ak0125-2] + Protection against race condition [ak0125-race] [ak0123-7] https://www.postgresql.org/message-id/CAA4eK1JhpuwujrV6ABMmZ3jXfW37ssZnJ3fikrY7rRdvoEmu_g%40mail.gmail.com [ak0125-1] https://www.postgresql.org/message-id/CAA4eK1JmP2VVpH2%3DO%3D5BBbuH7gyQtWn40aXp_Jyjn1%2BKggfq8A%40mail.gmail.com [ak0125-2] https://www.postgresql.org/message-id/CAA4eK1L1j5sfBgHb0-H-%2B2quBstsA3hMcDfP-4vLuU-UF43nXQ%40mail.gmail.com [ak0125-race] https://www.postgresql.org/message-id/CAA4eK1%2ByeLwBCkTvTdPM-hSk1fr6jT8KJc362CN8zrGztq_JqQ%40mail.gmail.com ==== Features: * The tablesync worker is now allowing multiple tx instead of single tx. * A new state (SUBREL_STATE_FINISHEDCOPY) is persisted after a successful copy_table in tablesync's LogicalRepSyncTableStart. * If a re-launched tablesync finds state SUBREL_STATE_FINISHEDCOPY then it will bypass the initial copy_table phase. * Now tablesync sets up replication origin tracking in LogicalRepSyncTableStart (similar as done for the apply worker). The origin is advanced when first created. * The tablesync replication origin tracking record is cleaned up by: - process_syncing_tables_for_apply - DropSubscription - AlterSubscription_refresh * Updates to PG docs. * New TAP test case. Known Issues: * Some records arriving between FINISHEDCOPY and SYNCDONE state may be lost (currently under investigation). --- Kind Regards, Peter Smith. Fujitsu Australia
Attachment
On Thu, Jan 21, 2021 at 9:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > 7. > +# check for occurrence of the expected error > +poll_output_until("replication slot \"$slotname\" already exists") > + or die "no error stop for the pre-existing origin"; > > In this test, isn't it better to check for datasync state like below? > 004_sync.pl has some other similar test. > my $started_query = "SELECT srsubstate = 'd' FROM pg_subscription_rel;"; > $node_subscriber->poll_query_until('postgres', $started_query) > or die "Timed out while waiting for subscriber to start sync"; > > Is there a reason why we can't use the existing way to check for > failure in this case? > The TAP test is updated in the latest patch [v20]. ---- [v20] https://www.postgresql.org/message-id/CAHut%2BPuNwSujoL_dwa%3DTtozJ_vF%3DCnJxjgQTCmNBkazd8J1m-A%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
On Mon, Jan 25, 2021 at 1:58 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Sun, Jan 24, 2021 at 12:24 PM Peter Smith <smithpb2250@gmail.com> wrote: > > > > On Sat, Jan 23, 2021 at 11:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > Few comments: > > > ============= > > > 1. > > > - * So the state progression is always: INIT -> DATASYNC -> SYNCWAIT -> > > > - * CATCHUP -> SYNCDONE -> READY. > > > + * So the state progression is always: INIT -> DATASYNC -> > > > + * (sync worker FINISHEDCOPY) -> SYNCWAIT -> CATCHUP -> SYNCDONE -> READY. > > > > > > I don't think we need to be specific here that sync worker sets > > > FINISHEDCOPY state. > > > > > > > This was meant to indicate that *only* the sync worker knows about the > > FINISHEDCOPY state, whereas all the other states are either known > > (assigned and/or used) by *both* kinds of workers. But, I can remove > > it if you feel that distinction is not useful. > > > > Okay, but I feel you can mention that in the description you have > added for FINISHEDCOPY state. It looks a bit odd here and the message > you want to convey is also not that clear. > The comment is updated in the latest patch [v20]. ---- [v20] https://www.postgresql.org/message-id/CAHut%2BPuNwSujoL_dwa%3DTtozJ_vF%3DCnJxjgQTCmNBkazd8J1m-A%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
On Mon, Jan 25, 2021 at 2:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Jan 25, 2021 at 8:23 AM Peter Smith <smithpb2250@gmail.com> wrote: > > > > On Sat, Jan 23, 2021 at 11:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > 2. > > > @@ -98,11 +102,16 @@ > > > #include "miscadmin.h" > > > #include "parser/parse_relation.h" > > > #include "pgstat.h" > > > +#include "postmaster/interrupt.h" > > > #include "replication/logicallauncher.h" > > > #include "replication/logicalrelation.h" > > > +#include "replication/logicalworker.h" > > > #include "replication/walreceiver.h" > > > #include "replication/worker_internal.h" > > > +#include "replication/slot.h" > > > > > > I don't think the above includes are required. They seem to the > > > remnant of the previous approach. > > > > > > > OK. Fixed in the latest patch [v19]. > > > > You seem to forgot removing #include "replication/slot.h". Check, if > it is not required then remove that as well. > Fixed in the latest patch [v20]. ---- [v20] https://www.postgresql.org/message-id/CAHut%2BPuNwSujoL_dwa%3DTtozJ_vF%3DCnJxjgQTCmNBkazd8J1m-A%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
On Mon, Jan 25, 2021 at 4:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Jan 25, 2021 at 8:03 AM Peter Smith <smithpb2250@gmail.com> wrote: > > > > Hi Amit. > > > > PSA the v19 patch for the Tablesync Solution1. > > > > I see one race condition in this patch where we try to drop the origin > via apply process and DropSubscription. I think it can lead to the > error "cache lookup failed for replication origin with oid %u". The > same problem can happen via exposed API pg_replication_origin_drop but > probably because this is not used concurrently so nobody faced this > issue. I think for the matter of this patch we can try to suppress > such an error either via try..catch, or by adding missing_ok argument > to replorigin_drop API, or we can just add to comments that such a > race exists. OK. This has been isolated to a common function called from 3 places. The potential race ERROR is suppressed by TRY/CATCH. Please see code of latest patch [v20] > Additionally, we should try to start a new thread for the > existence of this problem in pg_replication_origin_drop. What do you > think? OK. It is on my TODO list.. ---- [v20] https://www.postgresql.org/message-id/CAHut%2BPuNwSujoL_dwa%3DTtozJ_vF%3DCnJxjgQTCmNBkazd8J1m-A%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
On Mon, Jan 25, 2021 at 4:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Jan 25, 2021 at 8:03 AM Peter Smith <smithpb2250@gmail.com> wrote: > > > > Hi Amit. > > > > PSA the v19 patch for the Tablesync Solution1. > > > > I see one race condition in this patch where we try to drop the origin > via apply process and DropSubscription. I think it can lead to the > error "cache lookup failed for replication origin with oid %u". The > same problem can happen via exposed API pg_replication_origin_drop but > probably because this is not used concurrently so nobody faced this > issue. I think for the matter of this patch we can try to suppress > such an error either via try..catch, or by adding missing_ok argument > to replorigin_drop API, or we can just add to comments that such a > race exists. Additionally, we should try to start a new thread for the > existence of this problem in pg_replication_origin_drop. What do you > think? OK. A new thread [ps0127] for this problem was started --- [ps0127] = https://www.postgresql.org/message-id/CAHut%2BPuW8DWV5fskkMWWMqzt-x7RPcNQOtJQBp6SdwyRghCk7A%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
On Sat, Jan 23, 2021 at 5:56 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Sat, Jan 23, 2021 at 4:55 AM Peter Smith <smithpb2250@gmail.com> wrote: > > > > PSA the v18 patch for the Tablesync Solution1. > > 7. Have you tested with the new patch the scenario where we crash > after FINISHEDCOPY and before SYNCDONE, is it able to pick up the > replication using the new temporary slot? Here, we need to test the > case where during the catchup phase we have received few commits and > then the tablesync worker is crashed/errored out? Basically, check if > the replication is continued from the same point? > I have tested this and it didn't work, see the below example. Publisher-side ================ CREATE TABLE mytbl1(id SERIAL PRIMARY KEY, somedata int, text varchar(120)); BEGIN; INSERT INTO mytbl1(somedata, text) VALUES (1, 1); INSERT INTO mytbl1(somedata, text) VALUES (1, 2); COMMIT; CREATE PUBLICATION mypublication FOR TABLE mytbl1; Subscriber-side ================ - Have a while(1) loop in LogicalRepSyncTableStart so that tablesync worker stops. CREATE TABLE mytbl1(id SERIAL PRIMARY KEY, somedata int, text varchar(120)); CREATE SUBSCRIPTION mysub CONNECTION 'host=localhost port=5432 dbname=postgres' PUBLICATION mypublication; During debug, stop after we mark FINISHEDCOPY state. Publisher-side ================ INSERT INTO mytbl1(somedata, text) VALUES (1, 3); INSERT INTO mytbl1(somedata, text) VALUES (1, 4); Subscriber-side ================ - Have a breakpoint in apply_dispatch - continue in debugger; - After we replay first commit (which will be for values(1,3), note down the origin position in apply_handle_commit_internal and somehow error out. I have forced the debugger to set to the last line in apply_dispatch where the error is raised. - After the error, again the tablesync worker is restarted and it starts from the position noted in the previous step - It exits without replaying the WAL for (1,4) So, on the subscriber-side, you will see 3 records. Fourth is missing. Now, if you insert more records on the publisher, it will anyway replay those but the fourth one got missing. The temporary slots didn't seem to work because we created again the new temporary slot after the crash and ask it to start decoding from the point we noted in origin_lsn. The publisher didn’t hold the required WAL as our slot was temporary so it started sending from some later point. We retain WAL based on the slots restart_lsn position and wal_keep_size. For our case, the positions of the slots will matter and as we have created temporary slots, there is no way for a publisher to save that WAL. In this particular case, even if the WAL would have been there we only pass the start_decoding_at position but didn’t pass restart_lsn, so it picked a random location (current insert position in WAL) which is ahead of start_decoding_at point so it never sent the required fourth record. Now, I don’t think it will work even if somehow sent the correct restart_lsn because of what I wrote earlier that there is no guarantee that the earlier WAL would have been saved. At this point, I can't think of any way to fix this problem except for going back to the previous approach of permanent slots but let me know if you have any ideas to salvage this approach? -- With Regards, Amit Kapila.
Hi Amit. PSA the v21 patch for the Tablesync Solution1. Main differences from v20: + Rebased to latest OSS HEAD @ 27/Jan + v21 is a merging of patches [v17] and [v20], which was made necessary when it was found [ak0127] that the v20 usage of TEMPORARY tablesync slots did not work correctly. v21 reverts to using PERMANENT tablesync slots same as implemented in v17, while retaining other review comment fixes made for v18, v19, v20. ---- [v17] https://www.postgresql.org/message-id/CAHut%2BPt9%2Bg8qQR0kMC85nY-O4uDQxXboamZAYhHbvkebzC9fAQ%40mail.gmail.com [v20] https://www.postgresql.org/message-id/CAHut%2BPuNwSujoL_dwa%3DTtozJ_vF%3DCnJxjgQTCmNBkazd8J1m-A%40mail.gmail.com [ak0127] https://www.postgresql.org/message-id/CAA4eK1LDsj9kw4FbWAw3CMHyVsjafgDum03cYy-wpGmor%3D8-1w%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
Attachment
On Wed, Jan 27, 2021 at 2:53 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Sat, Jan 23, 2021 at 5:56 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Sat, Jan 23, 2021 at 4:55 AM Peter Smith <smithpb2250@gmail.com> wrote: > > > > > > PSA the v18 patch for the Tablesync Solution1. > > > > 7. Have you tested with the new patch the scenario where we crash > > after FINISHEDCOPY and before SYNCDONE, is it able to pick up the > > replication using the new temporary slot? Here, we need to test the > > case where during the catchup phase we have received few commits and > > then the tablesync worker is crashed/errored out? Basically, check if > > the replication is continued from the same point? > > > > I have tested this and it didn't work, see the below example. > > Publisher-side > ================ > CREATE TABLE mytbl1(id SERIAL PRIMARY KEY, somedata int, text varchar(120)); > > BEGIN; > INSERT INTO mytbl1(somedata, text) VALUES (1, 1); > INSERT INTO mytbl1(somedata, text) VALUES (1, 2); > COMMIT; > > CREATE PUBLICATION mypublication FOR TABLE mytbl1; > > Subscriber-side > ================ > - Have a while(1) loop in LogicalRepSyncTableStart so that tablesync > worker stops. > > CREATE TABLE mytbl1(id SERIAL PRIMARY KEY, somedata int, text varchar(120)); > > > CREATE SUBSCRIPTION mysub > CONNECTION 'host=localhost port=5432 dbname=postgres' > PUBLICATION mypublication; > > During debug, stop after we mark FINISHEDCOPY state. > > Publisher-side > ================ > INSERT INTO mytbl1(somedata, text) VALUES (1, 3); > INSERT INTO mytbl1(somedata, text) VALUES (1, 4); > > > Subscriber-side > ================ > - Have a breakpoint in apply_dispatch > - continue in debugger; > - After we replay first commit (which will be for values(1,3), note > down the origin position in apply_handle_commit_internal and somehow > error out. I have forced the debugger to set to the last line in > apply_dispatch where the error is raised. > - After the error, again the tablesync worker is restarted and it > starts from the position noted in the previous step > - It exits without replaying the WAL for (1,4) > > So, on the subscriber-side, you will see 3 records. Fourth is missing. > Now, if you insert more records on the publisher, it will anyway > replay those but the fourth one got missing. > > The temporary slots didn't seem to work because we created again the > new temporary slot after the crash and ask it to start decoding from > the point we noted in origin_lsn. The publisher didn’t hold the > required WAL as our slot was temporary so it started sending from some > later point. We retain WAL based on the slots restart_lsn position and > wal_keep_size. For our case, the positions of the slots will matter > and as we have created temporary slots, there is no way for a > publisher to save that WAL. > > In this particular case, even if the WAL would have been there we only > pass the start_decoding_at position but didn’t pass restart_lsn, so it > picked a random location (current insert position in WAL) which is > ahead of start_decoding_at point so it never sent the required fourth > record. Now, I don’t think it will work even if somehow sent the > correct restart_lsn because of what I wrote earlier that there is no > guarantee that the earlier WAL would have been saved. > > At this point, I can't think of any way to fix this problem except for > going back to the previous approach of permanent slots but let me know > if you have any ideas to salvage this approach? > OK. 
The latest patch [v21] now restores the permanent slot (and slot cleanup) approach as it was implemented in an earlier version [v17]. Please note that this change also re-introduces some potential slot cleanup problems for some race scenarios. These will be addressed by future patches. ---- [v17] https://www.postgresql.org/message-id/CAHut%2BPt9%2Bg8qQR0kMC85nY-O4uDQxXboamZAYhHbvkebzC9fAQ%40mail.gmail.com [v21] https://www.postgresql.org/message-id/CAHut%2BPvzHRRA_5O0R8KZCb1tVe1mBVPxFtmttXJnmuOmAegoWA%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
On Thu, Jan 28, 2021 at 12:32 PM Peter Smith <smithpb2250@gmail.com> wrote: > > On Wed, Jan 27, 2021 at 2:53 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Sat, Jan 23, 2021 at 5:56 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Sat, Jan 23, 2021 at 4:55 AM Peter Smith <smithpb2250@gmail.com> wrote: > > > > > > > > PSA the v18 patch for the Tablesync Solution1. > > > > > > 7. Have you tested with the new patch the scenario where we crash > > > after FINISHEDCOPY and before SYNCDONE, is it able to pick up the > > > replication using the new temporary slot? Here, we need to test the > > > case where during the catchup phase we have received few commits and > > > then the tablesync worker is crashed/errored out? Basically, check if > > > the replication is continued from the same point? > > > > > > > I have tested this and it didn't work, see the below example. > > > > Publisher-side > > ================ > > CREATE TABLE mytbl1(id SERIAL PRIMARY KEY, somedata int, text varchar(120)); > > > > BEGIN; > > INSERT INTO mytbl1(somedata, text) VALUES (1, 1); > > INSERT INTO mytbl1(somedata, text) VALUES (1, 2); > > COMMIT; > > > > CREATE PUBLICATION mypublication FOR TABLE mytbl1; > > > > Subscriber-side > > ================ > > - Have a while(1) loop in LogicalRepSyncTableStart so that tablesync > > worker stops. > > > > CREATE TABLE mytbl1(id SERIAL PRIMARY KEY, somedata int, text varchar(120)); > > > > > > CREATE SUBSCRIPTION mysub > > CONNECTION 'host=localhost port=5432 dbname=postgres' > > PUBLICATION mypublication; > > > > During debug, stop after we mark FINISHEDCOPY state. > > > > Publisher-side > > ================ > > INSERT INTO mytbl1(somedata, text) VALUES (1, 3); > > INSERT INTO mytbl1(somedata, text) VALUES (1, 4); > > > > > > Subscriber-side > > ================ > > - Have a breakpoint in apply_dispatch > > - continue in debugger; > > - After we replay first commit (which will be for values(1,3), note > > down the origin position in apply_handle_commit_internal and somehow > > error out. I have forced the debugger to set to the last line in > > apply_dispatch where the error is raised. > > - After the error, again the tablesync worker is restarted and it > > starts from the position noted in the previous step > > - It exits without replaying the WAL for (1,4) > > > > So, on the subscriber-side, you will see 3 records. Fourth is missing. > > Now, if you insert more records on the publisher, it will anyway > > replay those but the fourth one got missing. > > ... > > > > At this point, I can't think of any way to fix this problem except for > > going back to the previous approach of permanent slots but let me know > > if you have any ideas to salvage this approach? > > > > OK. The latest patch [v21] now restores the permanent slot (and slot > cleanup) approach as it was implemented in an earlier version [v17]. > Please note that this change also re-introduces some potential slot > cleanup problems for some race scenarios. > I am able to reproduce the race condition where slot/origin will remain on the publisher node even when the corresponding subscription is dropped. Basically, if we error out in the 'catchup' phase in tablesync worker then either it will restart and cleanup slot/origin or if in the meantime we have dropped the subscription and stopped apply worker then probably the slot and origin will be dangling on the publisher. 
I have used exactly the same test procedure as was used to expose the problem in the temporary slots with some minor changes as mentioned below: Subscriber-side ================ - Have a while(1) loop in LogicalRepSyncTableStart so that tablesync worker stops. - Have a while(1) loop in wait_for_relation_state_change so that we can control apply worker via debugger at the right time. Subscriber-side ================ - Have a breakpoint in apply_dispatch - continue in debugger; - After we replay first commit somehow error out. I have forced the debugger to set to the last line in apply_dispatch where the error is raised. - Now, the table sync worker won't restart because the apply worker is looping in wait_for_relation_state_change. - Execute DropSubscription; - We can allow apply worker to continue by skipping the while(1) and it will exit because DropSubscription would have sent a terminate signal. After the above steps, check the publisher (select * from pg_replication_slots) and you will find the dangling tablesync slot. I think to solve the above problem we should drop tablesync slot/origin at the Drop/Alter Subscription time and additionally we need to ensure that apply worker doesn't let tablesync workers restart (or it must not do any work to access the slot because the slots are dropped) once we stopped them. To ensure that, I think we need to make the following changes: 1. Take AccessExclusivelock on subscription_rel during Alter (before calling RemoveSubscriptionRel) and don't release it till transaction end (do table_close with NoLock) similar to DropSubscription. 2. Take share lock (AccessShareLock) in GetSubscriptionRelState (it gets called from logicalrepsyncstartworker), we can release this lock at the end of that function. This will ensure that even if the tablesync worker is restarted, it will be blocked till the transaction performing Alter will commit. 3. Make Alter command to not run in xact block so that we don't keep locks for a longer time and second for the slots related stuff similar to dropsubscription. Few comments on v21: =================== 1. DropSubscription() { .. - /* Clean up dependencies */ + /* Clean up dependencies. */ deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0); .. } The above change seems unnecessary w.r.t current patch. 2. DropSubscription() { .. /* - * If there is no slot associated with the subscription, we can finish - * here. + * If there is a slot associated with the subscription, then drop the + * replication slot at the publisher node using the replication + * connection. */ - if (!slotname) + if (slotname) { - table_close(rel, NoLock); - return; .. } What is the reason for this change? Can't we keep the check in its existing form? -- With Regards, Amit Kapila.
Hi Amit. PSA the v22 patch for the Tablesync Solution1. Differences from v21: + Patch is rebased to latest OSS HEAD @ 29/Jan. + Includes new code as suggested [ak0128] to ensure no dangling slots at Drop/AlterSubscription. + Removes the slot/origin cleanup down by process interrupt logic (cleanup_at_shutdown function). + Addresses some minor review comments. ---- [ak0128] https://www.postgresql.org/message-id/CAA4eK1LMYXZY1SpzgW-WyFdy%2BFTMZ4BMz1dj0rT2rxGv-zLwFA%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
Attachment
On Thu, Jan 28, 2021 at 9:37 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Jan 28, 2021 at 12:32 PM Peter Smith <smithpb2250@gmail.com> wrote: > > > > On Wed, Jan 27, 2021 at 2:53 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Sat, Jan 23, 2021 at 5:56 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > On Sat, Jan 23, 2021 at 4:55 AM Peter Smith <smithpb2250@gmail.com> wrote: > > > > > > > > > > PSA the v18 patch for the Tablesync Solution1. > > > > > > > > 7. Have you tested with the new patch the scenario where we crash > > > > after FINISHEDCOPY and before SYNCDONE, is it able to pick up the > > > > replication using the new temporary slot? Here, we need to test the > > > > case where during the catchup phase we have received few commits and > > > > then the tablesync worker is crashed/errored out? Basically, check if > > > > the replication is continued from the same point? > > > > > > > > > > I have tested this and it didn't work, see the below example. > > > > > > Publisher-side > > > ================ > > > CREATE TABLE mytbl1(id SERIAL PRIMARY KEY, somedata int, text varchar(120)); > > > > > > BEGIN; > > > INSERT INTO mytbl1(somedata, text) VALUES (1, 1); > > > INSERT INTO mytbl1(somedata, text) VALUES (1, 2); > > > COMMIT; > > > > > > CREATE PUBLICATION mypublication FOR TABLE mytbl1; > > > > > > Subscriber-side > > > ================ > > > - Have a while(1) loop in LogicalRepSyncTableStart so that tablesync > > > worker stops. > > > > > > CREATE TABLE mytbl1(id SERIAL PRIMARY KEY, somedata int, text varchar(120)); > > > > > > > > > CREATE SUBSCRIPTION mysub > > > CONNECTION 'host=localhost port=5432 dbname=postgres' > > > PUBLICATION mypublication; > > > > > > During debug, stop after we mark FINISHEDCOPY state. > > > > > > Publisher-side > > > ================ > > > INSERT INTO mytbl1(somedata, text) VALUES (1, 3); > > > INSERT INTO mytbl1(somedata, text) VALUES (1, 4); > > > > > > > > > Subscriber-side > > > ================ > > > - Have a breakpoint in apply_dispatch > > > - continue in debugger; > > > - After we replay first commit (which will be for values(1,3), note > > > down the origin position in apply_handle_commit_internal and somehow > > > error out. I have forced the debugger to set to the last line in > > > apply_dispatch where the error is raised. > > > - After the error, again the tablesync worker is restarted and it > > > starts from the position noted in the previous step > > > - It exits without replaying the WAL for (1,4) > > > > > > So, on the subscriber-side, you will see 3 records. Fourth is missing. > > > Now, if you insert more records on the publisher, it will anyway > > > replay those but the fourth one got missing. > > > > ... > > > > > > At this point, I can't think of any way to fix this problem except for > > > going back to the previous approach of permanent slots but let me know > > > if you have any ideas to salvage this approach? > > > > > > > OK. The latest patch [v21] now restores the permanent slot (and slot > > cleanup) approach as it was implemented in an earlier version [v17]. > > Please note that this change also re-introduces some potential slot > > cleanup problems for some race scenarios. > > > > I am able to reproduce the race condition where slot/origin will > remain on the publisher node even when the corresponding subscription > is dropped. 
Basically, if we error out in the 'catchup' phase in > tablesync worker then either it will restart and cleanup slot/origin > or if in the meantime we have dropped the subscription and stopped > apply worker then probably the slot and origin will be dangling on the > publisher. > > I have used exactly the same test procedure as was used to expose the > problem in the temporary slots with some minor changes as mentioned > below: > Subscriber-side > ================ > - Have a while(1) loop in LogicalRepSyncTableStart so that tablesync > worker stops. > - Have a while(1) loop in wait_for_relation_state_change so that we > can control apply worker via debugger at the right time. > > Subscriber-side > ================ > - Have a breakpoint in apply_dispatch > - continue in debugger; > - After we replay first commit somehow error out. I have forced the > debugger to set to the last line in apply_dispatch where the error is > raised. > - Now, the table sync worker won't restart because the apply worker is > looping in wait_for_relation_state_change. > - Execute DropSubscription; > - We can allow apply worker to continue by skipping the while(1) and > it will exit because DropSubscription would have sent a terminate > signal. > > After the above steps, check the publisher (select * from > pg_replication_slots) and you will find the dangling tablesync slot. > > I think to solve the above problem we should drop tablesync > slot/origin at the Drop/Alter Subscription time and additionally we > need to ensure that apply worker doesn't let tablesync workers restart > (or it must not do any work to access the slot because the slots are > dropped) once we stopped them. To ensure that, I think we need to make > the following changes: > > 1. Take AccessExclusivelock on subscription_rel during Alter (before > calling RemoveSubscriptionRel) and don't release it till transaction > end (do table_close with NoLock) similar to DropSubscription. > 2. Take share lock (AccessShareLock) in GetSubscriptionRelState (it > gets called from logicalrepsyncstartworker), we can release this lock > at the end of that function. This will ensure that even if the > tablesync worker is restarted, it will be blocked till the transaction > performing Alter will commit. > 3. Make Alter command to not run in xact block so that we don't keep > locks for a longer time and second for the slots related stuff similar > to dropsubscription. > OK. The latest patch [v22] changes the code as suggested above. > Few comments on v21: > =================== > 1. > DropSubscription() > { > .. > - /* Clean up dependencies */ > + /* Clean up dependencies. */ > deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0); > .. > } > > The above change seems unnecessary w.r.t current patch. > OK. Modified in patch [v22]. > 2. > DropSubscription() > { > .. > /* > - * If there is no slot associated with the subscription, we can finish > - * here. > + * If there is a slot associated with the subscription, then drop the > + * replication slot at the publisher node using the replication > + * connection. > */ > - if (!slotname) > + if (slotname) > { > - table_close(rel, NoLock); > - return; > .. > } > > What is the reason for this change? Can't we keep the check in its > existing form? > I think the above comment is longer applicable in the latest patch [v22]. Early exit for null slotname is not desirable anymore; we still need to process all the tablesync slots/origins regardless. 
---- [v22] https://www.postgresql.org/message-id/CAHut%2BPtrAVrtjc8srASTeUhbJtviw0Up-bzFSc14Ss%3DmAMxz9g%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
On Fri, Jan 29, 2021 at 4:07 PM Peter Smith <smithpb2250@gmail.com> wrote: > > > Differences from v21: > + Patch is rebased to latest OSS HEAD @ 29/Jan. > + Includes new code as suggested [ak0128] to ensure no dangling slots > at Drop/AlterSubscription. > + Removes the slot/origin cleanup down by process interrupt logic > (cleanup_at_shutdown function). > + Addresses some minor review comments. > I have made the below changes in the patch. Let me know what you think about these? 1. It was a bit difficult to understand the code in DropSubscription so I have rearranged the code to match the way we are doing in HEAD where we drop the slots at the end after finishing all the other cleanup. 2. In AlterSubscription_refresh(), we can't allow workers to be stopped at commit time as we have already dropped the slots because the worker can access the dropped slot. We need to stop the workers before dropping slots. This makes all the code related to logicalrep_worker_stop_at_commit redundant. 3. In AlterSubscription_refresh(), we need to acquire the lock on pg_subscription_rel only when we try to remove any subscription rel. 4. Added/Changed quite a few comments. -- With Regards, Amit Kapila.
Attachment
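[A sketch of the reworked AlterSubscription_refresh() ordering from
point 2 above; ReplicationSlotNameForTablesync and
ReplicationSlotDropAtPubNode are the patch's internal helpers, and this
is an outline rather than the exact code.]

char        syncslotname[NAMEDATALEN];

/* Stop the workers for removed tables immediately (not at commit
 * time), so none of them can touch a slot we are about to drop. */
logicalrep_worker_stop(sub->oid, relid);
RemoveSubscriptionRel(sub->oid, relid);

/* ... after all catalog changes succeed, drop the tablesync slots
 * last, because a remote slot drop cannot be rolled back ... */
ReplicationSlotNameForTablesync(sub->oid, relid, syncslotname);
ReplicationSlotDropAtPubNode(wrconn, syncslotname, true);   /* missing_ok */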
On Sun, Jan 31, 2021 at 12:19 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > I have made the below changes in the patch. Let me know what you think > about these? > 1. It was a bit difficult to understand the code in DropSubscription > so I have rearranged the code to match the way we are doing in HEAD > where we drop the slots at the end after finishing all the other > cleanup. There was a reason why the v22 logic was different from HEAD. The broken connection leaves dangling slots which is unavoidable. But, whereas the user knows the name of the Subscription slot (they named it), there is no easy way for them to know the names of the remaining tablesync slots unless we log them. That is why the v22 code was written to process the tablesync slots even for wrconn == NULL so their name could be logged: elog(WARNING, "no connection; cannot drop tablesync slot \"%s\".", syncslotname); The v23 patch removed this dangling slot name information, so it makes it difficult for the user to know what tablesync slots to cleanup. > 2. In AlterSubscription_refresh(), we can't allow workers to be > stopped at commit time as we have already dropped the slots because > the worker can access the dropped slot. We need to stop the workers > before dropping slots. This makes all the code related to > logicalrep_worker_stop_at_commit redundant. OK. > 3. In AlterSubscription_refresh(), we need to acquire the lock on > pg_subscription_rel only when we try to remove any subscription rel. + if (!sub_rel_locked) + { + rel = table_open(SubscriptionRelRelationId, AccessExclusiveLock); + sub_rel_locked = true; + } OK. But the sub_rel_locked bool is not really necessary. Why not just check for rel == NULL? e.g. if (!rel) rel = table_open(SubscriptionRelRelationId, AccessExclusiveLock); > 4. Added/Changed quite a few comments. > @@ -1042,6 +1115,31 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel) } list_free(subworkers); + /* + * Tablesync resource cleanup (slots and origins). The comment is misleading; this code is only dropping origins. ---- Kind Regards, Peter Smith. Fujitsu Australia
On Sun, Jan 31, 2021 at 12:19 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > 2. In AlterSubscription_refresh(), we can't allow workers to be > stopped at commit time as we have already dropped the slots because > the worker can access the dropped slot. We need to stop the workers > before dropping slots. This makes all the code related to > logicalrep_worker_stop_at_commit redundant. @@ -73,20 +73,6 @@ typedef struct LogicalRepWorkerId Oid relid; } LogicalRepWorkerId; -typedef struct StopWorkersData -{ - int nestDepth; /* Sub-transaction nest level */ - List *workers; /* List of LogicalRepWorkerId */ - struct StopWorkersData *parent; /* This need not be an immediate - * subtransaction parent */ -} StopWorkersData; Since v23 removes that typedef from the code, don't you also have to remove it from src/tools/pgindent/typedefs.list? ---- Kind Regards, Peter Smith. Fujitsu Australia
On Mon, Feb 1, 2021 at 6:48 AM Peter Smith <smithpb2250@gmail.com> wrote: > > On Sun, Jan 31, 2021 at 12:19 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > I have made the below changes in the patch. Let me know what you think > > about these? > > 1. It was a bit difficult to understand the code in DropSubscription > > so I have rearranged the code to match the way we are doing in HEAD > > where we drop the slots at the end after finishing all the other > > cleanup. > > There was a reason why the v22 logic was different from HEAD. > > The broken connection leaves dangling slots which is unavoidable. > I think this is true only when the user specifically requested it by the use of "ALTER SUBSCRIPTION ... SET (slot_name = NONE)", right? Otherwise, we give an error on a broken connection. Also, if that is true then is there a reason to pass missing_ok as true while dropping tablesync slots? > But, > whereas the user knows the name of the Subscription slot (they named > it), there is no easy way for them to know the names of the remaining > tablesync slots unless we log them. > > That is why the v22 code was written to process the tablesync slots > even for wrconn == NULL so their name could be logged: > elog(WARNING, "no connection; cannot drop tablesync slot \"%s\".", > syncslotname); > > The v23 patch removed this dangling slot name information, so it makes > it difficult for the user to know what tablesync slots to cleanup. > Okay, then can we think of combining with the existing error of the replication slot? I think that might produce a very long message, so another idea could be to LOG a separate WARNING for each such slot just before giving the error. > > 2. In AlterSubscription_refresh(), we can't allow workers to be > > stopped at commit time as we have already dropped the slots because > > the worker can access the dropped slot. We need to stop the workers > > before dropping slots. This makes all the code related to > > logicalrep_worker_stop_at_commit redundant. > > OK. > > > 3. In AlterSubscription_refresh(), we need to acquire the lock on > > pg_subscription_rel only when we try to remove any subscription rel. > > + if (!sub_rel_locked) > + { > + rel = table_open(SubscriptionRelRelationId, AccessExclusiveLock); > + sub_rel_locked = true; > + } > > OK. But the sub_rel_locked bool is not really necessary. Why not just > check for rel == NULL? e.g. > if (!rel) > rel = table_open(SubscriptionRelRelationId, AccessExclusiveLock); > Okay, that seems to be better, will change accordingly. > > 4. Added/Changed quite a few comments. > > > > @@ -1042,6 +1115,31 @@ DropSubscription(DropSubscriptionStmt *stmt, > bool isTopLevel) > } > list_free(subworkers); > > + /* > + * Tablesync resource cleanup (slots and origins). > > The comment is misleading; this code is only dropping origins. > Okay, will change to something like: "Cleanup of tablesync replication origins." > @@ -73,20 +73,6 @@ typedef struct LogicalRepWorkerId > Oid relid; > } LogicalRepWorkerId; > > -typedef struct StopWorkersData > -{ > - int nestDepth; /* Sub-transaction nest level */ > - List *workers; /* List of LogicalRepWorkerId */ > - struct StopWorkersData *parent; /* This need not be an immediate > - * subtransaction parent */ > -} StopWorkersData; > > Since v23 removes that typedef from the code, don't you also have to > remove it from src/tools/pgindent/typedefs.list? > Yeah. -- With Regards, Amit Kapila.
On Mon, Feb 1, 2021 at 1:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Feb 1, 2021 at 6:48 AM Peter Smith <smithpb2250@gmail.com> wrote: > > > > On Sun, Jan 31, 2021 at 12:19 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > I have made the below changes in the patch. Let me know what you think > > > about these? > > > 1. It was a bit difficult to understand the code in DropSubscription > > > so I have rearranged the code to match the way we are doing in HEAD > > > where we drop the slots at the end after finishing all the other > > > cleanup. > > > > There was a reason why the v22 logic was different from HEAD. > > > > The broken connection leaves dangling slots which is unavoidable. > > > > I think this is true only when the user specifically requested it by > the use of "ALTER SUBSCRIPTION ... SET (slot_name = NONE)", right? > Otherwise, we give an error on a broken connection. Also, if that is > true then is there a reason to pass missing_ok as true while dropping > tablesync slots? > AFAIK there is always a potential race with DropSubscription dropping slots. The DropSubscription might be running at exactly the same time the apply worker has just dropped the very same tablesync slot. By saying missing_ok = true it means DropSubscription would not give ERROR in such a case, so at least the DROP SUBSCRIPTION would not fail with an unexpected error. > > > But, > > whereas the user knows the name of the Subscription slot (they named > > it), there is no easy way for them to know the names of the remaining > > tablesync slots unless we log them. > > > > That is why the v22 code was written to process the tablesync slots > > even for wrconn == NULL so their name could be logged: > > elog(WARNING, "no connection; cannot drop tablesync slot \"%s\".", > > syncslotname); > > > > The v23 patch removed this dangling slot name information, so it makes > > it difficult for the user to know what tablesync slots to cleanup. > > > > Okay, then can we think of combining with the existing error of the > replication slot? I think that might produce a very long message, so > another idea could be to LOG a separate WARNING for each such slot > just before giving the error. There may be many subscribed tables so I agree combining to one message might be too long. Yes, we can add another loop to output the necessary information. But, isn’t logging each tablesync slot WARNING before the subscription slot ERROR exactly the behaviour which already existed in v22. IIUC the DropSubscription refactoring in V23 was not done for any functional change, but was intended only to make the code simpler, but how is that goal achieved if v23 ends up needing 3 loops where v22 only needed 1 loop to do the same thing? ---- Kind Regards, Peter Smith. Fujitsu Australia.
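[For concreteness, one plausible shape of the missing_ok behaviour
being discussed, assuming the patch's ReplicationSlotDropAtPubNode();
the exact error handling here is a guess, not the patch's code.]

StringInfoData cmd;
WalRcvExecResult *res;

initStringInfo(&cmd);
appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT",
                 quote_identifier(slotname));
res = walrcv_exec(wrconn, cmd.data, 0, NULL);

if (res->status == WALRCV_OK_COMMAND)
    ereport(NOTICE,
            (errmsg("dropped replication slot \"%s\" on publisher",
                    slotname)));
else
    /* with missing_ok, a concurrent drop by the tablesync worker is
     * downgraded to a WARNING instead of failing DROP SUBSCRIPTION */
    ereport(missing_ok ? WARNING : ERROR,
            (errmsg("could not drop replication slot \"%s\" on publisher",
                    slotname)));
pfree(cmd.data);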
On Mon, Feb 1, 2021 at 9:38 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Mon, Feb 1, 2021 at 1:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Mon, Feb 1, 2021 at 6:48 AM Peter Smith <smithpb2250@gmail.com> wrote:
> > >
> > > On Sun, Jan 31, 2021 at 12:19 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > I have made the below changes in the patch. Let me know what you think
> > > > about these?
> > > > 1. It was a bit difficult to understand the code in DropSubscription
> > > > so I have rearranged the code to match the way we are doing in HEAD
> > > > where we drop the slots at the end after finishing all the other
> > > > cleanup.
> > >
> > > There was a reason why the v22 logic was different from HEAD.
> > >
> > > The broken connection leaves dangling slots which is unavoidable.
> > >
> >
> > I think this is true only when the user specifically requested it by
> > the use of "ALTER SUBSCRIPTION ... SET (slot_name = NONE)", right?
> > Otherwise, we give an error on a broken connection. Also, if that is
> > true then is there a reason to pass missing_ok as true while dropping
> > tablesync slots?
> >
>
> AFAIK there is always a potential race with DropSubscription dropping
> slots. The DropSubscription might be running at exactly the same time
> the apply worker has just dropped the very same tablesync slot.
>

We stopped the workers before getting a list of NotReady relations and
then we try to drop the corresponding slots. So, how such a race
condition can happen? Note, because we have a lock on pg_subscription,
there is no chance that the workers can restart till the transaction
end.

> By
> saying missing_ok = true it means DropSubscription would not give
> ERROR in such a case, so at least the DROP SUBSCRIPTION would not fail
> with an unexpected error.
>
> >
> > > But,
> > > whereas the user knows the name of the Subscription slot (they named
> > > it), there is no easy way for them to know the names of the remaining
> > > tablesync slots unless we log them.
> > >
> > > That is why the v22 code was written to process the tablesync slots
> > > even for wrconn == NULL so their name could be logged:
> > > elog(WARNING, "no connection; cannot drop tablesync slot \"%s\".",
> > > syncslotname);
> > >
> > > The v23 patch removed this dangling slot name information, so it makes
> > > it difficult for the user to know what tablesync slots to cleanup.
> > >
> >
> > Okay, then can we think of combining with the existing error of the
> > replication slot? I think that might produce a very long message, so
> > another idea could be to LOG a separate WARNING for each such slot
> > just before giving the error.
>
> There may be many subscribed tables so I agree combining to one
> message might be too long. Yes, we can add another loop to output the
> necessary information. But, isn’t logging each tablesync slot WARNING
> before the subscription slot ERROR exactly the behaviour which already
> existed in v22. IIUC the DropSubscription refactoring in V23 was not
> done for any functional change, but was intended only to make the code
> simpler, but how is that goal achieved if v23 ends up needing 3 loops
> where v22 only needed 1 loop to do the same thing?
>

No, there is a functionality change as well. The way we have code in
v22 can easily lead to a problem when we have dropped the slots but
get an error while removing origins or an entry from subscription rel.
In such cases, we won't be able to roll back the drop of the slots, but
the other database operations will be rolled back. This is the reason
we have to drop the slots at the end. We need to ensure the same thing
for AlterSubscription_refresh. Does this make sense now?

-- With Regards, Amit Kapila.
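[Spelled out as a sketch of the DropSubscription() tail being argued
for here; the helper names are the patch's, and the surrounding code is
abridged.]

/* Transactional catalog work first; all of it is undone if anything
 * below errors out. */
RemoveSubscriptionRel(subid, InvalidOid);   /* all relations of the sub */
deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);

/* Irreversible part last: a dropped remote slot cannot be restored,
 * so only touch the slots once nothing else can fail. */
foreach(lc, rstates)
{
    SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
    char        syncslotname[NAMEDATALEN] = {0};

    ReplicationSlotNameForTablesync(subid, rstate->relid, syncslotname);
    ReplicationSlotDropAtPubNode(wrconn, syncslotname, true);
}
if (slotname)
    ReplicationSlotDropAtPubNode(wrconn, slotname, false);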
On Mon, Feb 1, 2021 at 10:14 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Feb 1, 2021 at 9:38 AM Peter Smith <smithpb2250@gmail.com> wrote: > > > > On Mon, Feb 1, 2021 at 1:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Mon, Feb 1, 2021 at 6:48 AM Peter Smith <smithpb2250@gmail.com> wrote: > > > > > > > > On Sun, Jan 31, 2021 at 12:19 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > I have made the below changes in the patch. Let me know what you think > > > > > about these? > > > > > 1. It was a bit difficult to understand the code in DropSubscription > > > > > so I have rearranged the code to match the way we are doing in HEAD > > > > > where we drop the slots at the end after finishing all the other > > > > > cleanup. > > > > > > > > There was a reason why the v22 logic was different from HEAD. > > > > > > > > The broken connection leaves dangling slots which is unavoidable. > > > > > > > > > > I think this is true only when the user specifically requested it by > > > the use of "ALTER SUBSCRIPTION ... SET (slot_name = NONE)", right? > > > Otherwise, we give an error on a broken connection. Also, if that is > > > true then is there a reason to pass missing_ok as true while dropping > > > tablesync slots? > > > > > > > AFAIK there is always a potential race with DropSubscription dropping > > slots. The DropSubscription might be running at exactly the same time > > the apply worker has just dropped the very same tablesync slot. > > > > We stopped the workers before getting a list of NotReady relations and > then we try to drop the corresponding slots. So, how such a race > condition can happen? > I think it is possible that the state is still not SYNCDONE but the slot is already dropped so here we should be ready with the missing slot. -- With Regards, Amit Kapila.
On Mon, Feb 1, 2021 at 3:44 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Feb 1, 2021 at 9:38 AM Peter Smith <smithpb2250@gmail.com> wrote: > > > > On Mon, Feb 1, 2021 at 1:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Mon, Feb 1, 2021 at 6:48 AM Peter Smith <smithpb2250@gmail.com> wrote: > > > > > > > > On Sun, Jan 31, 2021 at 12:19 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > I have made the below changes in the patch. Let me know what you think > > > > > about these? > > > > > 1. It was a bit difficult to understand the code in DropSubscription > > > > > so I have rearranged the code to match the way we are doing in HEAD > > > > > where we drop the slots at the end after finishing all the other > > > > > cleanup. > > > > > > > > There was a reason why the v22 logic was different from HEAD. > > > > > > > > The broken connection leaves dangling slots which is unavoidable. > > > > > > > > > > I think this is true only when the user specifically requested it by > > > the use of "ALTER SUBSCRIPTION ... SET (slot_name = NONE)", right? > > > Otherwise, we give an error on a broken connection. Also, if that is > > > true then is there a reason to pass missing_ok as true while dropping > > > tablesync slots? > > > > > > > AFAIK there is always a potential race with DropSubscription dropping > > slots. The DropSubscription might be running at exactly the same time > > the apply worker has just dropped the very same tablesync slot. > > > > We stopped the workers before getting a list of NotReady relations and > then we try to drop the corresponding slots. So, how such a race > condition can happen? Note, because we have a lock on pg_subscrition, > there is no chance that the workers can restart till the transaction > end. OK. I think I was forgetting the logicalrep_worker_stop would also go into a loop waiting for the worker process to die. So even if the tablesync worker does simultaneously drop it's own slot, I think it will certainly at least be in SYNCDONE state before DropSubscription does anything else with that worker. > > > By > > saying missing_ok = true it means DropSubscription would not give > > ERROR in such a case, so at least the DROP SUBSCRIPTION would not fail > > with an unexpected error. > > > > > > > > > But, > > > > whereas the user knows the name of the Subscription slot (they named > > > > it), there is no easy way for them to know the names of the remaining > > > > tablesync slots unless we log them. > > > > > > > > That is why the v22 code was written to process the tablesync slots > > > > even for wrconn == NULL so their name could be logged: > > > > elog(WARNING, "no connection; cannot drop tablesync slot \"%s\".", > > > > syncslotname); > > > > > > > > The v23 patch removed this dangling slot name information, so it makes > > > > it difficult for the user to know what tablesync slots to cleanup. > > > > > > > > > > Okay, then can we think of combining with the existing error of the > > > replication slot? I think that might produce a very long message, so > > > another idea could be to LOG a separate WARNING for each such slot > > > just before giving the error. > > > > There may be many subscribed tables so I agree combining to one > > message might be too long. Yes, we can add another loop to output the > > necessary information. But, isn’t logging each tablesync slot WARNING > > before the subscription slot ERROR exactly the behaviour which already > > existed in v22. 
IIUC the DropSubscription refactoring in V23 was not > > done for any functional change, but was intended only to make the code > > simpler, but how is that goal achieved if v23 ends up needing 3 loops > > where v22 only needed 1 loop to do the same thing? > > > > No, there is a functionality change as well. The way we have code in > v22 can easily lead to a problem when we have dropped the slots but > get an error while removing origins or an entry from subscription rel. > In such cases, we won't be able to rollback the drop of slots but the > other database operations will be rolled back. This is the reason we > have to drop the slots at the end. We need to ensure the same thing > for AlterSubscription_refresh. Does this make sense now? > OK. ---- Kind Regards, Peter Smith. Fujitsu Australia.
On Mon, Feb 1, 2021 at 11:23 AM Peter Smith <smithpb2250@gmail.com> wrote: > > On Mon, Feb 1, 2021 at 3:44 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Mon, Feb 1, 2021 at 9:38 AM Peter Smith <smithpb2250@gmail.com> wrote: > > > > > > > I think this is true only when the user specifically requested it by > > > > the use of "ALTER SUBSCRIPTION ... SET (slot_name = NONE)", right? > > > > Otherwise, we give an error on a broken connection. Also, if that is > > > > true then is there a reason to pass missing_ok as true while dropping > > > > tablesync slots? > > > > > > > > > > AFAIK there is always a potential race with DropSubscription dropping > > > slots. The DropSubscription might be running at exactly the same time > > > the apply worker has just dropped the very same tablesync slot. > > > > > > > We stopped the workers before getting a list of NotReady relations and > > then we try to drop the corresponding slots. So, how such a race > > condition can happen? Note, because we have a lock on pg_subscrition, > > there is no chance that the workers can restart till the transaction > > end. > > OK. I think I was forgetting the logicalrep_worker_stop would also go > into a loop waiting for the worker process to die. So even if the > tablesync worker does simultaneously drop it's own slot, I think it > will certainly at least be in SYNCDONE state before DropSubscription > does anything else with that worker. > How is that ensured? We don't have anything like HOLD_INTERRUPTS between the time dropped the slot and updated rel state as SYNCDONE. So, isn't it possible that after we dropped the slot and before we update the state, the SIGTERM signal arrives and led to worker exit? -- With Regards, Amit Kapila.
On Mon, Feb 1, 2021 at 5:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > AFAIK there is always a potential race with DropSubscription dropping > > > > slots. The DropSubscription might be running at exactly the same time > > > > the apply worker has just dropped the very same tablesync slot. > > > > > > > > > > We stopped the workers before getting a list of NotReady relations and > > > then we try to drop the corresponding slots. So, how such a race > > > condition can happen? Note, because we have a lock on pg_subscrition, > > > there is no chance that the workers can restart till the transaction > > > end. > > > > OK. I think I was forgetting the logicalrep_worker_stop would also go > > into a loop waiting for the worker process to die. So even if the > > tablesync worker does simultaneously drop it's own slot, I think it > > will certainly at least be in SYNCDONE state before DropSubscription > > does anything else with that worker. > > > > How is that ensured? We don't have anything like HOLD_INTERRUPTS > between the time dropped the slot and updated rel state as SYNCDONE. > So, isn't it possible that after we dropped the slot and before we > update the state, the SIGTERM signal arrives and led to worker exit? > The worker has the SIGTERM handler of "die". IIUC the "die" function doesn't normally do anything except set some flags to say please die at the next convenient opportunity. My understanding is that the worker process will not actually exit until it next executes CHECK_FOR_INTERRUPTS(), whereupon it will see the ProcDiePending flag and *really* die. So even if the SIGTERM signal arrives immediately after the slot is dropped, the tablesync will still become SYNCDONE. Is this wrong understanding? But your scenario could still be possible if "die" exited immediately (e.g. only in single user mode?). ---- Kind Regards, Peter Smith. Fujitsu Australia
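[The signal-handling behaviour described above, as a minimal sketch;
PostgreSQL's actual die() in tcop/postgres.c handles more cases, so
take this as a simplification rather than the real handler.]

/* SIGTERM handler: only record the request and wake the main loop;
 * the process actually exits at the next CHECK_FOR_INTERRUPTS(). */
static void
die(SIGNAL_ARGS)
{
    int         save_errno = errno;

    InterruptPending = true;
    ProcDiePending = true;

    /* wake the process if it is waiting on its latch */
    SetLatch(MyLatch);

    errno = save_errno;
}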
On Mon, Feb 1, 2021 at 1:08 PM Peter Smith <smithpb2250@gmail.com> wrote: > > On Mon, Feb 1, 2021 at 5:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > AFAIK there is always a potential race with DropSubscription dropping > > > > > slots. The DropSubscription might be running at exactly the same time > > > > > the apply worker has just dropped the very same tablesync slot. > > > > > > > > > > > > > We stopped the workers before getting a list of NotReady relations and > > > > then we try to drop the corresponding slots. So, how such a race > > > > condition can happen? Note, because we have a lock on pg_subscrition, > > > > there is no chance that the workers can restart till the transaction > > > > end. > > > > > > OK. I think I was forgetting the logicalrep_worker_stop would also go > > > into a loop waiting for the worker process to die. So even if the > > > tablesync worker does simultaneously drop it's own slot, I think it > > > will certainly at least be in SYNCDONE state before DropSubscription > > > does anything else with that worker. > > > > > > > How is that ensured? We don't have anything like HOLD_INTERRUPTS > > between the time dropped the slot and updated rel state as SYNCDONE. > > So, isn't it possible that after we dropped the slot and before we > > update the state, the SIGTERM signal arrives and led to worker exit? > > > > The worker has the SIGTERM handler of "die". IIUC the "die" function > doesn't normally do anything except set some flags to say please die > at the next convenient opportunity. My understanding is that the > worker process will not actually exit until it next executes > CHECK_FOR_INTERRUPTS(), whereupon it will see the ProcDiePending flag > and *really* die. So even if the SIGTERM signal arrives immediately > after the slot is dropped, the tablesync will still become SYNCDONE. > Is this wrong understanding? > > But your scenario could still be possible if "die" exited immediately > (e.g. only in single user mode?). > I think it is possible without that as well. There are many calls in-between those two operations which can internally call CHECK_FOR_INTERRUPTS. One of the flows where such a possibility exists is UpdateSubscriptionRelState->SearchSysCacheCopy2->SearchSysCacheCopy->SearchSysCache->SearchCatCache->SearchCatCacheInternal->SearchCatCacheMiss->systable_getnext. This can internally call heapgetpage where we have CHECK_FOR_INTERRUPTS. I think even if today there was no CFI call we can't take a guarantee for the future as the calls used are quite common. So, probably we need missing_ok flag in DropSubscription. One more point in the tablesync code you are calling ReplicationSlotDropAtPubNode with missing_ok as false. What if we get an error after that and before we have marked the state as SYNCDONE? I guess it will always error from ReplicationSlotDropAtPubNode after that because we had already dropped the slot. -- With Regards, Amit Kapila.
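[The window Amit describes, annotated as a sketch; the function names
are as used by the patch, and current_lsn stands in for whatever LSN
the worker reports.]

ReplicationSlotDropAtPubNode(wrconn, syncslotname, false);

/*
 * Anything from here on can reach CHECK_FOR_INTERRUPTS() internally,
 * e.g. via a catcache miss doing systable_getnext(). If a SIGTERM
 * arrived after the drop above, the worker dies here and the relation
 * never reaches SYNCDONE; a later DROP SUBSCRIPTION then tries to drop
 * a slot that no longer exists, hence the need for missing_ok.
 */
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
                           MyLogicalRepWorker->relid,
                           SUBREL_STATE_SYNCDONE,
                           current_lsn);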
On Mon, Feb 1, 2021 at 11:23 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Mon, Feb 1, 2021 at 3:44 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Mon, Feb 1, 2021 at 9:38 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > No, there is a functionality change as well. The way we have code in
> > v22 can easily lead to a problem when we have dropped the slots but
> > get an error while removing origins or an entry from subscription rel.
> > In such cases, we won't be able to roll back the drop of the slots,
> > but the other database operations will be rolled back. This is the
> > reason we have to drop the slots at the end. We need to ensure the
> > same thing for AlterSubscription_refresh. Does this make sense now?
> >
>
> OK.
>

I have updated the patch to display WARNING for each of the tablesync
slots during DropSubscription. As discussed, I have moved the drop
slot related code towards the end in AlterSubscription_refresh. Apart
from this, I have fixed one more issue in the tablesync code wherein,
after catching the exception, we were not clearing the transaction
state on the publisher; see the changes in LogicalRepSyncTableStart. I
have also fixed other comments raised by you. Additionally, I have
removed the test because it was creating the same name slot as the
tablesync worker, and the tablesync worker removed it due to the new
logic in LogicalRepSyncTableStart. Earlier, it was not failing because
of the bug in that code, which I have fixed in the attached.

I wonder whether we should restrict creating slots with the prefix pg_
because we are internally creating slots with those names? I think this
was a problem previously also. We already prohibit it for a few other
objects like origins, schemas, etc.; see the usage of IsReservedName.
(A sketch of such a check follows below.)

-- With Regards, Amit Kapila.
Attachment
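[A hypothetical sketch of the restriction being floated, mirroring the
IsReservedName() checks used for origins and schemas; where exactly it
would live (e.g. ReplicationSlotValidateName() or the slot creation
path) is an open question.]

if (IsReservedName(name))
    ereport(ERROR,
            (errcode(ERRCODE_RESERVED_NAME),
             errmsg("replication slot name \"%s\" is reserved", name),
             errdetail("Slot names beginning with \"pg_\" are reserved for internal use.")));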
On Mon, Feb 1, 2021 at 11:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > I have updated the patch to display WARNING for each of the tablesync > slots during DropSubscription. As discussed, I have moved the drop > slot related code towards the end in AlterSubscription_refresh. Apart > from this, I have fixed one more issue in tablesync code where in > after catching the exception we were not clearing the transaction > state on the publisher, see changes in LogicalRepSyncTableStart. I > have also fixed other comments raised by you. Here are some additional feedback comments about the v24 patch: ~~ ReportSlotConnectionError: 1,2,3,4. + foreach(lc, rstates) + { + SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc); + Oid relid = rstate->relid; + + /* Only cleanup resources of tablesync workers */ + if (!OidIsValid(relid)) + continue; + + /* + * Caller needs to ensure that we have appropriate locks so that + * relstate doesn't change underneath us. + */ + if (rstate->state != SUBREL_STATE_SYNCDONE) + { + char syncslotname[NAMEDATALEN] = { 0 }; + + ReplicationSlotNameForTablesync(subid, relid, syncslotname); + elog(WARNING, "could not drop tablesync replication slot \"%s\"", + syncslotname); + + } + } 1. I wonder if "rstates" would be better named something like "not_ready_rstates", otherwise it is not apparent what states are in this list 2. The comment "/* Only cleanup resources of tablesync workers */" is not quite correct because there is no cleanup happening here. Maybe change to: if (!OidIsValid(relid)) continue; /* not a tablesync worker */ 3. Maybe the "appropriate locks" comment can say what locks are the "appropriate" ones? 4. Spurious blank line after the elog? ~~ AlterSubscription_refresh: 5. + /* + * Drop the tablesync slot. This has to be at the end because otherwise if there + * is an error while doing the database operations we won't be able to rollback + * dropped slot. + */ Maybe "Drop the tablesync slot." should say "Drop the tablesync slots associated with removed tables." ~~ DropSubscription: 6. + /* + * Cleanup of tablesync replication origins. + * + * Any READY-state relations would already have dealt with clean-ups. + * + * Note that the state can't change because we have already stopped both + * the apply and tablesync workers and they can't restart because of + * exclusive lock on the subscription. + */ + rstates = GetSubscriptionNotReadyRelations(subid); + foreach(lc, rstates) I wonder if "rstates" would be better named as "not_ready_rstates", because it is used in several places where not READY is assumed. 7. + { + if (!slotname) + { + /* be tidy */ + list_free(rstates); + return; + } + else + { + ReportSlotConnectionError(rstates, subid, slotname, err); + } + + } Spurious blank line above? 8. The new logic of calling the ReportSlotConnectionError seems to be expecting that the user has encountered some connection error, and *after* that they have assigned slot_name = NONE as a workaround. In this scenario the code looks ok since names of any dangling tablesync slots were being logged at the time of the error. But I am wondering what about where the user might have set slot_name = NONE *before* the connection is broken. In this scenario, there is no ERROR, so if there are still (is it possible?) dangling tablesync slots, their names are never getting logged at all. So how can the user know what to delete? 
~~

> Additionally, I have
> removed the test because it was creating the same name slot as the
> tablesync worker, and the tablesync worker removed it due to the new
> logic in LogicalRepSyncTableStart. Earlier, it was not failing because
> of the bug in that code, which I have fixed in the attached.

Wasn't the point of that test to cause a tablesync slot clash and see
if it could recover? Why not just keep it, and modify the test to make
it work again? Isn't it still valuable, because at least it would
execute the code through the PG_CATCH which otherwise may not get
executed by any other test?

>
> I wonder whether we should restrict creating slots with the prefix pg_
> because we are internally creating slots with those names? I think this
> was a problem previously also. We already prohibit it for a few other
> objects like origins, schemas, etc.; see the usage of IsReservedName.
>

Yes, we could restrict the create slot API if you really wanted to.
But, IMO it is implausible that a user could "accidentally" clash with
the internal tablesync slot name, so in practice maybe this change
would not help much, but it might make it more difficult to test some
scenarios.

---- Kind Regards, Peter Smith. Fujitsu Australia
On Tue, Feb 2, 2021 at 8:29 AM Peter Smith <smithpb2250@gmail.com> wrote: > > On Mon, Feb 1, 2021 at 11:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > I have updated the patch to display WARNING for each of the tablesync > > slots during DropSubscription. As discussed, I have moved the drop > > slot related code towards the end in AlterSubscription_refresh. Apart > > from this, I have fixed one more issue in tablesync code where in > > after catching the exception we were not clearing the transaction > > state on the publisher, see changes in LogicalRepSyncTableStart. I > > have also fixed other comments raised by you. > > Here are some additional feedback comments about the v24 patch: > > ~~ > > ReportSlotConnectionError: > > 1,2,3,4. > + foreach(lc, rstates) > + { > + SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc); > + Oid relid = rstate->relid; > + > + /* Only cleanup resources of tablesync workers */ > + if (!OidIsValid(relid)) > + continue; > + > + /* > + * Caller needs to ensure that we have appropriate locks so that > + * relstate doesn't change underneath us. > + */ > + if (rstate->state != SUBREL_STATE_SYNCDONE) > + { > + char syncslotname[NAMEDATALEN] = { 0 }; > + > + ReplicationSlotNameForTablesync(subid, relid, syncslotname); > + elog(WARNING, "could not drop tablesync replication slot \"%s\"", > + syncslotname); > + > + } > + } > > 1. I wonder if "rstates" would be better named something like > "not_ready_rstates", otherwise it is not apparent what states are in > this list > I don't know if that would be better and it is used in the same way in the existing code. I find the current naming succinct. > 2. The comment "/* Only cleanup resources of tablesync workers */" is > not quite correct because there is no cleanup happening here. Maybe > change to: > if (!OidIsValid(relid)) > continue; /* not a tablesync worker */ > Aren't we trying to cleanup the tablesync slots here? So, I don't see the comment as irrelevant. > 3. Maybe the "appropriate locks" comment can say what locks are the > "appropriate" ones? > > 4. Spurious blank line after the elog? > Will fix both the above. > ~~ > > AlterSubscription_refresh: > > 5. > + /* > + * Drop the tablesync slot. This has to be at the end because > otherwise if there > + * is an error while doing the database operations we won't be able to rollback > + * dropped slot. > + */ > > Maybe "Drop the tablesync slot." should say "Drop the tablesync slots > associated with removed tables." > makes sense, will fix. > ~~ > > DropSubscription: > > 6. > + /* > + * Cleanup of tablesync replication origins. > + * > + * Any READY-state relations would already have dealt with clean-ups. > + * > + * Note that the state can't change because we have already stopped both > + * the apply and tablesync workers and they can't restart because of > + * exclusive lock on the subscription. > + */ > + rstates = GetSubscriptionNotReadyRelations(subid); > + foreach(lc, rstates) > > I wonder if "rstates" would be better named as "not_ready_rstates", > because it is used in several places where not READY is assumed. > Same response as above for similar comment. > 7. > + { > + if (!slotname) > + { > + /* be tidy */ > + list_free(rstates); > + return; > + } > + else > + { > + ReportSlotConnectionError(rstates, subid, slotname, err); > + } > + > + } > > Spurious blank line above? > Will fix. > 8. 
> The new logic of calling the ReportSlotConnectionError seems to be > expecting that the user has encountered some connection error, and > *after* that they have assigned slot_name = NONE as a workaround. In > this scenario the code looks ok since names of any dangling tablesync > slots were being logged at the time of the error. > > But I am wondering what about where the user might have set slot_name > = NONE *before* the connection is broken. In this scenario, there is > no ERROR, so if there are still (is it possible?) dangling tablesync > slots, their names are never getting logged at all. So how can the > user know what to delete? > It has been mentioned in docs that the user is responsible for cleaning that up manually in such a case. The patch has also described how the names are generated so that can help user to remove those. + These table synchronization slots have generated names: + <quote><literal>pg_%u_sync_%u</literal></quote> (parameters: Subscription + <parameter>oid</parameter>, Table <parameter>relid</parameter>) I think if the user changes slot_name associated with the subscription, it would be his responsibility to clean up the previously associated slot. This is currently the case with the main subscription slot as well. I think it won't be advisable for the user to change slot_name unless under some rare cases where the system might be stuck like the one for which we are giving WARNING and providing a hint for setting the slot_name to NONE. > ~~ > > > Additionally, I have > > removed the test because it was creating the same name slot as the > > tablesync worker and tablesync worker removed the same due to new > > logic in LogicalRepSyncStart. Earlier, it was not failing because of > > the bug in that code which I have fixed in the attached. > > Wasn't causing a tablesync slot clash and seeing if it could recover > the point of that test? Why not just keep, and modify the test to make > it work again? > We can do that but my other worry was that we might want to reserve the names for slots that start with pg_. > Isn't it still valuable because at least it would > execute the code through the PG_CATCH which otherwise may not get > executed by any other test? > It is valuable but IIRC there was a test (in subscription/004_sync.pl) where PK violation happens during copy which will lead to the coverage of code in CATCH. > > > > I wonder whether we should restrict creating slots with prefix pg_ > > because we are internally creating slots with those names? I think > > this was a problem previously also. We already prohibit it for few > > other objects like origins, schema, etc., see the usage of > > IsReservedName. > > > > Yes, we could restrict the create slot API if you really wanted to. > But, IMO it is implausible that a user could "accidentally" clash with > the internal tablesync slot name, so in practice maybe this change > would not help much but it might make it more difficult to test some > scenarios. > Isn't the same true for origins? -- With Regards, Amit Kapila.
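[Since the generated names are deterministic, a user can reconstruct
them from the subscription OID and table OID. A tiny self-contained
illustration of the documented format; the OIDs here are made-up
example values.]

#include <stdio.h>

int
main(void)
{
    unsigned int suboid = 16398;    /* subscription OID (example value) */
    unsigned int relid = 16384;     /* table OID (example value) */
    char        slotname[64];

    /* documented tablesync slot name format: pg_%u_sync_%u */
    snprintf(slotname, sizeof(slotname), "pg_%u_sync_%u", suboid, relid);
    printf("%s\n", slotname);       /* prints: pg_16398_sync_16384 */
    return 0;
}

This matches the slot name pg_16398_sync_16384 that appears in Ajin's
logs in the next message.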
On Mon, Feb 1, 2021 at 11:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> I have updated the patch to display WARNING for each of the tablesync
> slots during DropSubscription. As discussed, I have moved the drop
> slot related code towards the end in AlterSubscription_refresh. Apart
> from this, I have fixed one more issue in the tablesync code wherein,
> after catching the exception, we were not clearing the transaction
> state on the publisher; see the changes in LogicalRepSyncTableStart. ...

I was testing this patch. I had a table on the subscriber which had a
row that would cause a PK constraint violation during the table copy.
This results in the subscriber trying to roll back the table copy and
failing.

2021-02-01 23:28:16.041 EST [23738] LOG:  logical replication apply worker for subscription "tap_sub" has started
2021-02-01 23:28:16.051 EST [23740] LOG:  logical replication table synchronization worker for subscription "tap_sub", table "tab_rep" has started
2021-02-01 23:28:21.118 EST [23740] ERROR:  table copy could not rollback transaction on publisher
2021-02-01 23:28:21.118 EST [23740] DETAIL:  The error was: another command is already in progress
2021-02-01 23:28:21.122 EST [8028] LOG:  background worker "logical replication worker" (PID 23740) exited with exit code 1
2021-02-01 23:28:21.125 EST [23908] LOG:  logical replication table synchronization worker for subscription "tap_sub", table "tab_rep" has started
2021-02-01 23:28:21.138 EST [23908] ERROR:  could not create replication slot "pg_16398_sync_16384": ERROR:  replication slot "pg_16398_sync_16384" already exists
2021-02-01 23:28:21.139 EST [8028] LOG:  background worker "logical replication worker" (PID 23908) exited with exit code 1
2021-02-01 23:28:26.168 EST [24048] LOG:  logical replication table synchronization worker for subscription "tap_sub", table "tab_rep" has started
2021-02-01 23:28:34.244 EST [24048] ERROR:  table copy could not rollback transaction on publisher
2021-02-01 23:28:34.244 EST [24048] DETAIL:  The error was: another command is already in progress
2021-02-01 23:28:34.251 EST [8028] LOG:  background worker "logical replication worker" (PID 24048) exited with exit code 1
2021-02-01 23:28:34.254 EST [24337] LOG:  logical replication table synchronization worker for subscription "tap_sub", table "tab_rep" has started
2021-02-01 23:28:34.263 EST [24337] ERROR:  could not create replication slot "pg_16398_sync_16384": ERROR:  replication slot "pg_16398_sync_16384" already exists
2021-02-01 23:28:34.264 EST [8028] LOG:  background worker "logical replication worker" (PID 24337) exited with exit code 1

And one more thing I see is that now we error out in PG_CATCH() in
LogicalRepSyncTableStart() with the above error, and as a result the
tablesync slot is not dropped, causing the slot create to fail on the
next restart. I think this can be avoided: we could attempt a rollback
only on specific failures, and drop the slot prior to erroring out.

regards, Ajin Cherian Fujitsu Australia
Another failure I see in my testing.

On the publisher, create a big enough table:

postgres=# CREATE TABLE tab_rep (a int primary key);
CREATE TABLE
postgres=# INSERT INTO tab_rep SELECT generate_series(1,1000000);
INSERT 0 1000000
postgres=# CREATE PUBLICATION tap_pub FOR ALL TABLES;
CREATE PUBLICATION

On the subscriber, create the subscription but do not enable it:

postgres=# CREATE TABLE tab_rep (a int primary key);
CREATE TABLE
postgres=# CREATE SUBSCRIPTION tap_sub CONNECTION 'host=localhost dbname=postgres port=6972' PUBLICATION tap_pub WITH (enabled = false);

The below two commands on the subscriber should be issued quickly with
no delay between them.

postgres=# ALTER SUBSCRIPTION tap_sub enable;
ALTER SUBSCRIPTION
postgres=# ALTER SUBSCRIPTION tap_sub disable;
ALTER SUBSCRIPTION

This leaves the below state in pg_subscription_rel:

postgres=# select * from pg_subscription_rel;
 srsubid | srrelid | srsubstate | srsublsn
---------+---------+------------+----------
   16395 |   16384 | f          |
(1 row)

The rel is in the SUBREL_STATE_FINISHEDCOPY state. Meanwhile on the
publisher, looking at the slots created:

postgres=# select * from pg_replication_slots;
      slot_name      |  plugin  | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn | wal_status | safe_wal_size
---------------------+----------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------+------------+---------------
 tap_sub             | pgoutput | logical   |  13859 | postgres | f         | f      |            |      |          517 | 0/9303660   | 0/9303698           | reserved   |
 pg_16395_sync_16384 | pgoutput | logical   |  13859 | postgres | f         | f      |            |      |          517 | 0/9303660   | 0/9303698           | reserved   |
(2 rows)

There are two slots: the main slot as well as the tablesync slot. Drop
the table, re-enable the subscription, and then drop the subscription.
Now on the subscriber:

postgres=# drop table tab_rep;
DROP TABLE
postgres=# ALTER SUBSCRIPTION tap_sub enable;
ALTER SUBSCRIPTION
postgres=# drop subscription tap_sub ;
NOTICE:  dropped replication slot "tap_sub" on publisher
DROP SUBSCRIPTION

We see the tablesync slot dangling on the publisher:

postgres=# select * from pg_replication_slots;
      slot_name      |  plugin  | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn | wal_status | safe_wal_size
---------------------+----------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------+------------+---------------
 pg_16395_sync_16384 | pgoutput | logical   |  13859 | postgres | f         | f      |            |      |          517 | 0/9303660   | 0/9303698           | reserved   |
(1 row)

The dropping of the table meant that after the tablesync worker
restarted, it had no idea about the old slot it had created, since the
slot's name uses the relid of the dropped table.

regards, Ajin Cherian Fujitsu Australia
On Mon, Feb 1, 2021 at 11:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> I have updated the patch to display WARNING for each of the tablesync
> slots during DropSubscription. As discussed, I have moved the drop
> slot related code towards the end in AlterSubscription_refresh. Apart
> from this, I have fixed one more issue in the tablesync code wherein,
> after catching the exception, we were not clearing the transaction
> state on the publisher; see the changes in LogicalRepSyncTableStart. ...

I know that in another email [ac0202] Ajin has reported some problem he
found related to this new (LogicalRepSyncTableStart PG_CATCH) code for
some different use-case, but for my test scenario of a "broken
connection during a table copy" the code did appear to be working
properly.

PSA detailed logs which show the test steps and output for this "broken
connection during a table copy" scenario.

---- [ac0202] https://www.postgresql.org/message-id/CAFPTHDaZw5o%2BwMbv3aveOzuLyz_rqZebXAj59rDKTJbwXFPYgw%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
Attachment
On Tue, Feb 2, 2021 at 10:34 AM Ajin Cherian <itsajin@gmail.com> wrote:
>
> On Mon, Feb 1, 2021 at 11:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > I have updated the patch to display WARNING for each of the tablesync
> > slots during DropSubscription. As discussed, I have moved the drop
> > slot related code towards the end in AlterSubscription_refresh. Apart
> > from this, I have fixed one more issue in the tablesync code wherein,
> > after catching the exception, we were not clearing the transaction
> > state on the publisher; see the changes in LogicalRepSyncTableStart. I
> > have also fixed other comments raised by you. Additionally, I have
> > removed the test because it was creating the same name slot as the
> > tablesync worker, and the tablesync worker removed it due to the new
> > logic in LogicalRepSyncTableStart. Earlier, it was not failing because
> > of the bug in that code, which I have fixed in the attached.
> >
>
> I was testing this patch. I had a table on the subscriber which had a
> row that would cause a PK constraint violation during the table copy.
> This results in the subscriber trying to roll back the table copy and
> failing.
>

I am not getting this error. I have tried the below test:

Publisher
===========
CREATE TABLE mytbl1(id SERIAL PRIMARY KEY, somedata int, text varchar(120));
BEGIN;
INSERT INTO mytbl1(somedata, text) VALUES (1, 1);
INSERT INTO mytbl1(somedata, text) VALUES (1, 2);
COMMIT;
CREATE PUBLICATION mypublication FOR TABLE mytbl1;

Subscriber
=============
CREATE TABLE mytbl1(id SERIAL PRIMARY KEY, somedata int, text varchar(120));
BEGIN;
INSERT INTO mytbl1(somedata, text) VALUES (1, 1);
INSERT INTO mytbl1(somedata, text) VALUES (1, 2);
COMMIT;
CREATE SUBSCRIPTION mysub CONNECTION 'host=localhost port=5432 dbname=postgres' PUBLICATION mypublication;

It generates the PK violation the first time, and then I removed the
conflicting rows on the subscriber and it passed. See the logs below.

2021-02-02 13:51:34.316 IST [20796] LOG:  logical replication table synchronization worker for subscription "mysub", table "mytbl1" has started
2021-02-02 13:52:43.625 IST [20796] ERROR:  duplicate key value violates unique constraint "mytbl1_pkey"
2021-02-02 13:52:43.625 IST [20796] DETAIL:  Key (id)=(1) already exists.
2021-02-02 13:52:43.625 IST [20796] CONTEXT:  COPY mytbl1, line 1
2021-02-02 13:52:43.695 IST [27840] LOG:  background worker "logical replication worker" (PID 20796) exited with exit code 1
2021-02-02 13:52:43.884 IST [6260] LOG:  logical replication table synchronization worker for subscription "mysub", table "mytbl1" has started
2021-02-02 13:53:54.680 IST [6260] LOG:  logical replication table synchronization worker for subscription "mysub", table "mytbl1" has finished

Also, a similar test exists in 004_sync.pl; is that also failing for
you? Can you please provide detailed steps that led to this failure?

>
> And one more thing I see is that now we error out in PG_CATCH() in
> LogicalRepSyncTableStart() with the above error, and as a result the
> tablesync slot is not dropped, causing the slot create to fail on the
> next restart. I think this can be avoided: we could attempt a rollback
> only on specific failures, and drop the slot prior to erroring out.
>

Hmm, we have to first roll back before attempting any other operation
because the transaction on the publisher is in an errored state. (A
sketch of this ordering follows below.)

-- With Regards, Amit Kapila.
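[A sketch of that ordering, combined with Ajin's suggestion to drop the
slot before erroring out; this is hypothetical and not necessarily the
patch's actual PG_CATCH block.]

PG_CATCH();
{
    /*
     * The remote transaction is in an aborted state, so nothing else
     * can be sent on this connection until we roll it back.
     */
    walrcv_exec(wrconn, "ROLLBACK", 0, NULL);

    /* only now is it safe to clean up the tablesync slot */
    ReplicationSlotDropAtPubNode(wrconn, slotname, true);

    PG_RE_THROW();
}
PG_END_TRY();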
After seeing Ajin's test [ac0202] which did a DROP TABLE, I have also
tried a simple test where I do a DROP TABLE with very bad timing for
the tablesync worker. It seems that doing this can cause the sync
worker's MyLogicalRepWorker->relid to become invalid. In my test this
caused a stack trace within some logging, but I imagine other bad
things can happen if the tablesync worker can be executed with an
invalid relid. Possibly this is an existing PG bug which has just never
been seen before; the ereport which has failed here is not new code.
PSA the log for the test steps and the stack trace details.

---- [ac0202] https://www.postgresql.org/message-id/CAFPTHDYzjaNfzsFHpER9idAPB8v5j%3DSUbWL0AKj5iVy0BKbTpg%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
Attachment
On Tue, Feb 2, 2021 at 7:40 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Feb 2, 2021 at 10:34 AM Ajin Cherian <itsajin@gmail.com> wrote:
> >
> > On Mon, Feb 1, 2021 at 11:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > I have updated the patch to display WARNING for each of the tablesync
> > > slots during DropSubscription. As discussed, I have moved the drop
> > > slot related code towards the end in AlterSubscription_refresh. Apart
> > > from this, I have fixed one more issue in the tablesync code wherein,
> > > after catching the exception, we were not clearing the transaction
> > > state on the publisher; see the changes in LogicalRepSyncTableStart. I
> > > have also fixed other comments raised by you. Additionally, I have
> > > removed the test because it was creating the same name slot as the
> > > tablesync worker, and the tablesync worker removed it due to the new
> > > logic in LogicalRepSyncTableStart. Earlier, it was not failing because
> > > of the bug in that code, which I have fixed in the attached.
> >
> > I was testing this patch. I had a table on the subscriber which had a
> > row that would cause a PK constraint violation during the table copy.
> > This results in the subscriber trying to roll back the table copy and
> > failing.
> >
>
> I am not getting this error. I have tried the below test:

I am sorry, my above steps were not correct. I think the reason for the
failure I was seeing was some other steps I did prior to this. I will
recreate this and update you with the appropriate steps.

regards, Ajin Cherian Fujitsu Australia
On Tue, Feb 2, 2021 at 11:35 AM Ajin Cherian <itsajin@gmail.com> wrote:
>
> Another failure I see in my testing
>

The problem here is that we are allowing the table to be dropped while
table synchronization is still in progress, and then we don't have any
way to know the corresponding slot or origin. I think we can try to
drop the slot and origin as well, but that is not a good idea because
slots once dropped won't be rolled back. So, I have added a fix to
disallow the drop of the table when table synchronization is still in
progress (a sketch of the new check appears after this message). Apart
from that, I have fixed the comments raised by Peter as discussed above
and made some additional changes in comments, code (the code changes
are cosmetic), and docs.

Let me know if the reported issue is fixed or not.

-- With Regards, Amit Kapila.
Attachment
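[A sketch of the new guard described above, as it might appear in
RemoveSubscriptionRel(); the helper names and errcode are assumptions,
but the message text matches Ajin's confirmation below.]

Form_pg_subscription_rel subrel = (Form_pg_subscription_rel) GETSTRUCT(tup);

if (subrel->srsubstate != SUBREL_STATE_READY &&
    subrel->srsubstate != SUBREL_STATE_SYNCDONE)
    ereport(ERROR,
            (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
             errmsg("could not drop relation mapping for subscription \"%s\"",
                    get_subscription_name(subrel->srsubid, false)),
             errdetail("Table synchronization for relation \"%s\" is in progress and is in state \"%c\".",
                       get_rel_name(relid), subrel->srsubstate),
             errhint("Use ALTER SUBSCRIPTION ... ENABLE to enable subscription if not already enabled or use DROP SUBSCRIPTION ... to drop the subscription.")));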
On Tue, Feb 2, 2021 at 3:31 PM Peter Smith <smithpb2250@gmail.com> wrote: > > After seeing Ajin's test [ac0202] which did a DROP TABLE, I have also > tried a simple test where I do a DROP TABLE with very bad timing for > the tablesync worker. It seems that doing this can cause the sync > worker's MyLogicalRepWorker->relid to become invalid. > I think this should be fixed by latest patch because I have disallowed drop of a table when its synchronization is in progress. You can check once and let me know if the issue still exists? -- With Regards, Amit Kapila.
On Wed, Feb 3, 2021 at 12:26 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Feb 2, 2021 at 3:31 PM Peter Smith <smithpb2250@gmail.com> wrote: > > > > After seeing Ajin's test [ac0202] which did a DROP TABLE, I have also > > tried a simple test where I do a DROP TABLE with very bad timing for > > the tablesync worker. It seems that doing this can cause the sync > > worker's MyLogicalRepWorker->relid to become invalid. > > > > I think this should be fixed by latest patch because I have disallowed > drop of a table when its synchronization is in progress. You can check > once and let me know if the issue still exists? > FYI - I confirmed that the problem scenario that I reported yesterday is no longer possible because now the V25 patch is disallowing the DROP TABLE while the tablesync is still running. PSA my test logs showing it is now working as expected. ---- Kind Regards, Peter Smith. Fujitsu Australia
Attachment
On Wed, Feb 3, 2021 at 12:24 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> The problem here is that we are allowing the table to be dropped while
> table synchronization is still in progress, and then we don't have any
> way to know the corresponding slot or origin. I think we can try to
> drop the slot and origin as well, but that is not a good idea because
> slots once dropped won't be rolled back. So, I have added a fix to
> disallow the drop of the table when table synchronization is still in
> progress. Apart from that, I have fixed the comments raised by Peter
> as discussed above and made some additional changes in comments, code
> (the code changes are cosmetic), and docs.
>
> Let me know if the reported issue is fixed or not.

Yes, the issue is fixed; now the table drop results in an error:

postgres=# drop table tab_rep ;
ERROR:  could not drop relation mapping for subscription "tap_sub"
DETAIL:  Table synchronization for relation "tab_rep" is in progress and is in state "f".
HINT:  Use ALTER SUBSCRIPTION ... ENABLE to enable subscription if not already enabled or use DROP SUBSCRIPTION ... to drop the subscription.

regards, Ajin Cherian Fujitsu Australia
On Wed, Feb 3, 2021 at 6:38 AM Peter Smith <smithpb2250@gmail.com> wrote: > > On Wed, Feb 3, 2021 at 12:26 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Tue, Feb 2, 2021 at 3:31 PM Peter Smith <smithpb2250@gmail.com> wrote: > > > > > > After seeing Ajin's test [ac0202] which did a DROP TABLE, I have also > > > tried a simple test where I do a DROP TABLE with very bad timing for > > > the tablesync worker. It seems that doing this can cause the sync > > > worker's MyLogicalRepWorker->relid to become invalid. > > > > > > > I think this should be fixed by latest patch because I have disallowed > > drop of a table when its synchronization is in progress. You can check > > once and let me know if the issue still exists? > > > > FYI - I confirmed that the problem scenario that I reported yesterday > is no longer possible because now the V25 patch is disallowing the > DROP TABLE while the tablesync is still running. > Thanks for the confirmation. BTW, can you please check if we can reproduce that problem without this patch? If so, we might want to apply this fix irrespective of this patch. If not, why not? -- With Regards, Amit Kapila.
On Wed, Feb 3, 2021 at 1:34 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Feb 3, 2021 at 6:38 AM Peter Smith <smithpb2250@gmail.com> wrote: > > > > On Wed, Feb 3, 2021 at 12:26 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Tue, Feb 2, 2021 at 3:31 PM Peter Smith <smithpb2250@gmail.com> wrote: > > > > > > > > After seeing Ajin's test [ac0202] which did a DROP TABLE, I have also > > > > tried a simple test where I do a DROP TABLE with very bad timing for > > > > the tablesync worker. It seems that doing this can cause the sync > > > > worker's MyLogicalRepWorker->relid to become invalid. > > > > > > > > > > I think this should be fixed by latest patch because I have disallowed > > > drop of a table when its synchronization is in progress. You can check > > > once and let me know if the issue still exists? > > > > > > > FYI - I confirmed that the problem scenario that I reported yesterday > > is no longer possible because now the V25 patch is disallowing the > > DROP TABLE while the tablesync is still running. > > > > Thanks for the confirmation. BTW, can you please check if we can > reproduce that problem without this patch? If so, we might want to > apply this fix irrespective of this patch. If not, why not? > Yes, this was an existing postgres bug. It is independent of the patch. I can reproduce exactly the same stacktrace using the HEAD src pulled @ 3/Feb. PSA my test logs showing the details. ---- Kind Regards, Peter Smith. Fujitsu Australia
Attachment
On Tue, Feb 2, 2021 at 9:03 PM Ajin Cherian <itsajin@gmail.com> wrote: > I am sorry, my above steps were not correct. I think the reason for > the failure I was seeing were some other steps I did prior to this. I > will recreate this and update you with the appropriate steps. The correct steps are as follows: Publisher: postgres=# CREATE TABLE tab_rep (a int primary key); CREATE TABLE postgres=# INSERT INTO tab_rep SELECT generate_series(1,1000000); INSERT 0 1000000 postgres=# CREATE PUBLICATION tap_pub FOR ALL TABLES; CREATE PUBLICATION Subscriber: postgres=# CREATE TABLE tab_rep (a int primary key); CREATE TABLE postgres=# CREATE SUBSCRIPTION tap_sub CONNECTION 'host=localhost dbname=postgres port=6972' PUBLICATION tap_pub WITH (enabled = false); NOTICE: created replication slot "tap_sub" on publisher CREATE SUBSCRIPTION postgres=# ALTER SUBSCRIPTION tap_sub enable; ALTER SUBSCRIPTION Allow the tablesync to complete and then drop the subscription, the table remains full and restarting the subscription should fail with a constraint violation during tablesync but it does not. Subscriber: postgres=# drop subscription tap_sub ; NOTICE: dropped replication slot "tap_sub" on publisher DROP SUBSCRIPTION postgres=# CREATE SUBSCRIPTION tap_sub CONNECTION 'host=localhost dbname=postgres port=6972' PUBLICATION tap_pub WITH (enabled = false); NOTICE: created replication slot "tap_sub" on publisher CREATE SUBSCRIPTION postgres=# ALTER SUBSCRIPTION tap_sub enable; ALTER SUBSCRIPTION This takes the subscriber into an error loop but no mention of what the error was: 2021-02-02 05:01:34.698 EST [1549] LOG: logical replication table synchronization worker for subscription "tap_sub", table "tab_rep" has started 2021-02-02 05:01:34.739 EST [1549] ERROR: table copy could not rollback transaction on publisher 2021-02-02 05:01:34.739 EST [1549] DETAIL: The error was: another command is already in progress 2021-02-02 05:01:34.740 EST [8028] LOG: background worker "logical replication worker" (PID 1549) exited with exit code 1 2021-02-02 05:01:40.107 EST [1711] LOG: logical replication table synchronization worker for subscription "tap_sub", table "tab_rep" has started 2021-02-02 05:01:40.121 EST [1711] ERROR: could not create replication slot "pg_16479_sync_16435": ERROR: replication slot "pg_16479_sync_16435" already exists 2021-02-02 05:01:40.121 EST [8028] LOG: background worker "logical replication worker" (PID 1711) exited with exit code 1 2021-02-02 05:01:45.140 EST [1891] LOG: logical replication table synchronization worker for subscription "tap_sub", table "tab_rep" has started regards, Ajin Cherian Fujitsu Australia
On Wed, Feb 3, 2021 at 2:51 PM Peter Smith <smithpb2250@gmail.com> wrote: > > On Wed, Feb 3, 2021 at 1:34 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Wed, Feb 3, 2021 at 6:38 AM Peter Smith <smithpb2250@gmail.com> wrote: > > > > > > On Wed, Feb 3, 2021 at 12:26 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > On Tue, Feb 2, 2021 at 3:31 PM Peter Smith <smithpb2250@gmail.com> wrote: > > > > > > > > > > After seeing Ajin's test [ac0202] which did a DROP TABLE, I have also > > > > > tried a simple test where I do a DROP TABLE with very bad timing for > > > > > the tablesync worker. It seems that doing this can cause the sync > > > > > worker's MyLogicalRepWorker->relid to become invalid. > > > > > > > > > > > > > I think this should be fixed by latest patch because I have disallowed > > > > drop of a table when its synchronization is in progress. You can check > > > > once and let me know if the issue still exists? > > > > > > > > > > FYI - I confirmed that the problem scenario that I reported yesterday > > > is no longer possible because now the V25 patch is disallowing the > > > DROP TABLE while the tablesync is still running. > > > > > > > Thanks for the confirmation. BTW, can you please check if we can > > reproduce that problem without this patch? If so, we might want to > > apply this fix irrespective of this patch. If not, why not? > > > > Yes, this was an existing postgres bug. It is independent of the patch. > > I can reproduce exactly the same stacktrace using the HEAD src pulled @ 3/Feb. > > PSA my test logs showing the details. > Since this is an existing PG bug independent of this patch, I spawned a new thread [ps0202] to deal with this problem. ---- [ps0202] https://www.postgresql.org/message-id/CAHut%2BPu7Z4a%3Domo%2BTvK5Gub2hxcJ7-3%2BBu1FO_%2B%2BfpFTW0oQfQ%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
On Wed, Feb 3, 2021 at 1:28 PM Ajin Cherian <itsajin@gmail.com> wrote: > > On Tue, Feb 2, 2021 at 9:03 PM Ajin Cherian <itsajin@gmail.com> wrote: > > > I am sorry, my above steps were not correct. I think the reason for > > the failure I was seeing were some other steps I did prior to this. I > > will recreate this and update you with the appropriate steps. > > The correct steps are as follows: > > Publisher: > > postgres=# CREATE TABLE tab_rep (a int primary key); > CREATE TABLE > postgres=# INSERT INTO tab_rep SELECT generate_series(1,1000000); > INSERT 0 1000000 > postgres=# CREATE PUBLICATION tap_pub FOR ALL TABLES; > CREATE PUBLICATION > > Subscriber: > postgres=# CREATE TABLE tab_rep (a int primary key); > CREATE TABLE > postgres=# CREATE SUBSCRIPTION tap_sub CONNECTION 'host=localhost > dbname=postgres port=6972' PUBLICATION tap_pub WITH (enabled = false); > NOTICE: created replication slot "tap_sub" on publisher > CREATE SUBSCRIPTION > postgres=# ALTER SUBSCRIPTION tap_sub enable; > ALTER SUBSCRIPTION > > Allow the tablesync to complete and then drop the subscription, the > table remains full and restarting the subscription should fail with a > constraint violation during tablesync but it does not. > > > Subscriber: > postgres=# drop subscription tap_sub ; > NOTICE: dropped replication slot "tap_sub" on publisher > DROP SUBSCRIPTION > postgres=# CREATE SUBSCRIPTION tap_sub CONNECTION 'host=localhost > dbname=postgres port=6972' PUBLICATION tap_pub WITH (enabled = false); > NOTICE: created replication slot "tap_sub" on publisher > CREATE SUBSCRIPTION > postgres=# ALTER SUBSCRIPTION tap_sub enable; > ALTER SUBSCRIPTION > > This takes the subscriber into an error loop but no mention of what > the error was: > Thanks for the report. The problem here was that the error occurred when we were trying to copy the large data. Now, before fetching the entire data we issued a rollback that led to this problem. I think the alternative here could be to first fetch the entire data when the error occurred then issue the following commands. Instead, I have modified the patch to perform 'drop_replication_slot' in the beginning if the relstate is datasync. Do let me know if you can think of a better way to fix this? -- With Regards, Amit Kapila.
Attachment
On Wed, Feb 3, 2021 at 11:38 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > Thanks for the report. The problem here was that the error occurred > when we were trying to copy the large data. Now, before fetching the > entire data we issued a rollback that led to this problem. I think the > alternative here could be to first fetch the entire data when the > error occurred then issue the following commands. Instead, I have > modified the patch to perform 'drop_replication_slot' in the beginning > if the relstate is datasync. Do let me know if you can think of a > better way to fix this? I have verified that the problem is not seen after this patch. I also agree with the approach taken for the fix, regards, Ajin Cherian Fujitsu Australia
On Thu, Feb 4, 2021 at 9:55 AM Ajin Cherian <itsajin@gmail.com> wrote: > > On Wed, Feb 3, 2021 at 11:38 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > Thanks for the report. The problem here was that the error occurred > > when we were trying to copy the large data. Now, before fetching the > > entire data we issued a rollback that led to this problem. I think the > > alternative here could be to first fetch the entire data when the > > error occurred then issue the following commands. Instead, I have > > modified the patch to perform 'drop_replication_slot' in the beginning > > if the relstate is datasync. Do let me know if you can think of a > > better way to fix this? > > I have verified that the problem is not seen after this patch. I also > agree with the approach taken for the fix, > Thanks. I have fixed one of the issues reported by me earlier [1] wherein the tablesync worker can repeatedly fail if after dropping the slot there is an error while updating the SYNCDONE state in the database. I have moved the drop of the slot just before commit of the transaction where we are marking the state as SYNCDONE. Additionally, I have removed unnecessary includes in tablesync.c, updated the docs for Alter Subscription, and updated the comments at various places in the patch. I have also updated the commit message this time. I am still not very happy with the way we handle concurrent drop origins but probably that would be addressed by the other patch Peter is working on [2]. [1] - https://www.postgresql.org/message-id/CAA4eK1JdWv84nMyCpTboBURjj70y3BfO1xdy8SYPRqNxtH7TEA%40mail.gmail.com [2] - https://www.postgresql.org/message-id/CAHut%2BPsW6%2B7Ucb1sxjSNBaSYPGAVzQFbAEg4y1KsYQiGCnyGeQ%40mail.gmail.com -- With Regards, Amit Kapila.
Attachment
On Thu, Feb 4, 2021 at 8:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > ... > Thanks. I have fixed one of the issues reported by me earlier [1] > wherein the tablesync worker can repeatedly fail if after dropping the > slot there is an error while updating the SYNCDONE state in the > database. I have moved the drop of the slot just before commit of the > transaction where we are marking the state as SYNCDONE. Additionally, > I have removed unnecessary includes in tablesync.c, updated the docs > for Alter Subscription, and updated the comments at various places in > the patch. I have also updated the commit message this time. > Below are my feedback comments for V17 (nothing functional) ~~ 1. V27 Commit message: For the initial table data synchronization in logical replication, we use a single transaction to copy the entire table and then synchronizes the position in the stream with the main apply worker. Typo: "synchronizes" -> "synchronize" ~~ 2. @@ -48,6 +48,23 @@ ALTER SUBSCRIPTION <replaceable class="parameter">name</replaceable> RENAME TO < (Currently, all subscription owners must be superusers, so the owner checks will be bypassed in practice. But this might change in the future.) </para> + + <para> + When refreshing a publication we remove the relations that are no longer + part of the publication and we also remove the tablesync slots if there are + any. It is necessary to remove tablesync slots so that the resources + allocated for the subscription on the remote host are released. If due to + network breakdown or some other error, we are not able to remove the slots, + we give WARNING and the user needs to manually remove such slots later as + otherwise, they will continue to reserve WAL and might eventually cause + the disk to fill up. See also <xref linkend="logical-replication-subscription-slot"/>. + </para> I think the content is good, but the 1st-person wording seemed strange. e.g. "we are not able to remove the slots, we give WARNING and the user needs..." Maybe it should be like: "... PostgreSQL is unable to remove the slots, so a WARNING is reported. The user needs... " ~~ 3. @@ -566,107 +569,197 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data) ... + * XXX If there is a network break down while dropping the "network break down" -> "network breakdown" ~~ 4. @@ -872,7 +970,48 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos) (errmsg("could not connect to the publisher: %s", err))); ... + * XXX We could also instead try to drop the slot, last time we failed + * but for that, we might need to clean up the copy state as it might + * be in the middle of fetching the rows. Also, if there is a network + * break down then it wouldn't have succeeded so trying it next time + * seems like a better bet. "network break down" -> "network breakdown" ~~ 5. @@ -269,26 +313,47 @@ invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue) ... + + /* + * Cleanup the tablesync slot. + * + * This has to be done after updating the state because otherwise if + * there is an error while doing the database operations we won't be + * able to rollback dropped slot. + */ + ReplicationSlotNameForTablesync(MyLogicalRepWorker->subid, + MyLogicalRepWorker->relid, + syncslotname); + + ReplicationSlotDropAtPubNode(wrconn, syncslotname, false /* missing_ok */); + Should this comment also describe why the missing_ok is false for this case? ---- Kind Regards, Peter Smith. Fujitsu Australia
On Fri, Feb 5, 2021 at 7:09 AM Peter Smith <smithpb2250@gmail.com> wrote: > > On Thu, Feb 4, 2021 at 8:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > ... > > > Thanks. I have fixed one of the issues reported by me earlier [1] > > wherein the tablesync worker can repeatedly fail if after dropping the > > slot there is an error while updating the SYNCDONE state in the > > database. I have moved the drop of the slot just before commit of the > > transaction where we are marking the state as SYNCDONE. Additionally, > > I have removed unnecessary includes in tablesync.c, updated the docs > > for Alter Subscription, and updated the comments at various places in > > the patch. I have also updated the commit message this time. > > > > Below are my feedback comments for V17 (nothing functional) > > ~~ > > 1. > V27 Commit message: > For the initial table data synchronization in logical replication, we use > a single transaction to copy the entire table and then synchronizes the > position in the stream with the main apply worker. > > Typo: > "synchronizes" -> "synchronize" > Fixed and added a note about Alter Sub .. Refresh .. command can't be executed in the transaction block. > ~~ > > 2. > @@ -48,6 +48,23 @@ ALTER SUBSCRIPTION <replaceable > class="parameter">name</replaceable> RENAME TO < > (Currently, all subscription owners must be superusers, so the owner checks > will be bypassed in practice. But this might change in the future.) > </para> > + > + <para> > + When refreshing a publication we remove the relations that are no longer > + part of the publication and we also remove the tablesync slots if there are > + any. It is necessary to remove tablesync slots so that the resources > + allocated for the subscription on the remote host are released. If due to > + network breakdown or some other error, we are not able to remove the slots, > + we give WARNING and the user needs to manually remove such slots later as > + otherwise, they will continue to reserve WAL and might eventually cause > + the disk to fill up. See also <xref > linkend="logical-replication-subscription-slot"/>. > + </para> > > I think the content is good, but the 1st-person wording seemed strange. > e.g. > "we are not able to remove the slots, we give WARNING and the user needs..." > Maybe it should be like: > "... PostgreSQL is unable to remove the slots, so a WARNING is > reported. The user needs... " > Changed as per suggestion with a minor tweak. > ~~ > > 3. > @@ -566,107 +569,197 @@ AlterSubscription_refresh(Subscription *sub, > bool copy_data) > ... > + * XXX If there is a network break down while dropping the > > "network break down" -> "network breakdown" > > ~~ > > 4. > @@ -872,7 +970,48 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos) > (errmsg("could not connect to the publisher: %s", err))); > ... > + * XXX We could also instead try to drop the slot, last time we failed > + * but for that, we might need to clean up the copy state as it might > + * be in the middle of fetching the rows. Also, if there is a network > + * break down then it wouldn't have succeeded so trying it next time > + * seems like a better bet. > > "network break down" -> "network breakdown" > Changed as per suggestion. > ~~ > > 5. > @@ -269,26 +313,47 @@ invalidate_syncing_table_states(Datum arg, int > cacheid, uint32 hashvalue) > ... > + > + /* > + * Cleanup the tablesync slot. 
> + * > + * This has to be done after updating the state because otherwise if > + * there is an error while doing the database operations we won't be > + * able to rollback dropped slot. > + */ > + ReplicationSlotNameForTablesync(MyLogicalRepWorker->subid, > + MyLogicalRepWorker->relid, > + syncslotname); > + > + ReplicationSlotDropAtPubNode(wrconn, syncslotname, false /* missing_ok */); > + > > Should this comment also describe why the missing_ok is false for this case? > Yeah that makes sense, so added a comment. Additionally, I have changed the errorcode in RemoveSubscriptionRel, moved the setup of origin before copy_table in LogicalRepSyncTableStart to avoid doing copy again due to an error in setting up origin. I have made a few comment changes as well. -- With Regards, Amit Kapila.
Attachment
Hello On Friday, February 5, 2021 2:23 PM Amit Kapila <amit.kapila16@gmail.com> > On Fri, Feb 5, 2021 at 7:09 AM Peter Smith <smithpb2250@gmail.com> wrote: > > > > On Thu, Feb 4, 2021 at 8:33 PM Amit Kapila <amit.kapila16@gmail.com> > wrote: > > > > > ... > > > > > Thanks. I have fixed one of the issues reported by me earlier [1] > > > wherein the tablesync worker can repeatedly fail if after dropping > > > the slot there is an error while updating the SYNCDONE state in the > > > database. I have moved the drop of the slot just before commit of > > > the transaction where we are marking the state as SYNCDONE. > > > Additionally, I have removed unnecessary includes in tablesync.c, > > > updated the docs for Alter Subscription, and updated the comments at > > > various places in the patch. I have also updated the commit message this > time. > > > > > > > Below are my feedback comments for V17 (nothing functional) > > > > ~~ > > > > 1. > > V27 Commit message: > > For the initial table data synchronization in logical replication, we > > use a single transaction to copy the entire table and then > > synchronizes the position in the stream with the main apply worker. > > > > Typo: > > "synchronizes" -> "synchronize" > > > > Fixed and added a note about Alter Sub .. Refresh .. command can't be > executed in the transaction block. Thank you for the updates. We need to add some tests to prove the new checks of AlterSubscription() work. I chose TAP tests as we need to set connect = true for the subscription. When it can contribute to the development, please utilize this. I used v28 to check my patch and works as we expect. Best Regards, Takamichi Osumi
Attachment
On Fri, Feb 5, 2021 at 12:36 PM osumi.takamichi@fujitsu.com <osumi.takamichi@fujitsu.com> wrote: > > We need to add some tests to prove the new checks of AlterSubscription() work. > I chose TAP tests as we need to set connect = true for the subscription. > When it can contribute to the development, please utilize this. > I used v28 to check my patch and works as we expect. > Thanks for writing the tests but I don't understand why you need to set connect = true for this test? I have tried below '... with connect = false' and it seems to be working: postgres=# CREATE SUBSCRIPTION mysub postgres-# CONNECTION 'host=localhost port=5432 dbname=postgres' postgres-# PUBLICATION mypublication WITH (connect = false); WARNING: tables were not subscribed, you will have to run ALTER SUBSCRIPTION ... REFRESH PUBLICATION to subscribe the tables CREATE SUBSCRIPTION postgres=# Begin; BEGIN postgres=*# Alter Subscription mysub Refresh Publication; ERROR: ALTER SUBSCRIPTION ... REFRESH is not allowed for disabled subscriptions So, if possible lets write this test in src/test/regress/sql/subscription.sql. I have another idea for a test case: What if we write a test such that it fails PK violation on copy and then drop the subscription. Then check there shouldn't be any dangling slot on the publisher? This is similar to a test in subscription/t/004_sync.pl, we can use some of that framework but have a separate test for this. -- With Regards, Amit Kapila.
I did some basic cross-version testing, publisher on PG13 and subscriber on PG14 and publisher on PG14 and subscriber on PG13. Did some basic operations, CREATE, ALTER and STOP subscriptions and it seemed to work fine, no errors. regards, Ajin Cherian Fujitsu Australia.
Hi, We had a bit high-level discussion about this patches with Amit off-list, so I decided to also take a look at the actual code. My main concern originally was the potential for left-over slots on publisher, but I think the state now is relatively okay, with couple of corner cases that are documented and don't seem much worse than the main slot. I wonder if we should mention the max_slot_wal_keep_size GUC in the table sync docs though. Another thing that might need documentation is that the the visibility of changes done by table sync is not anymore isolated in that table contents will show intermediate progress to other backends, rather than switching from nothing to state consistent with rest of replication. Some minor comments about code: > + else if (res->status == WALRCV_ERROR && missing_ok) > + { > + /* WARNING. Error, but missing_ok = true. */ > + ereport(WARNING, I wonder if we need to add error code to the WalRcvExecResult and check for the appropriate ones here. Because this can for example return error because of timeout, not because slot is missing. Not sure if it matters for current callers though (but then maybe don't call the param missign_ok?). > +ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char syncslotname[NAMEDATALEN]) > +{ > + if (syncslotname) > + sprintf(syncslotname, "pg_%u_sync_%u", suboid, relid); > + else > + syncslotname = psprintf("pg_%u_sync_%u", suboid, relid); > + > + return syncslotname; > +} Given that we are now explicitly dropping slots, what happens here if we have 2 different downstreams that happen to get same suboid and reloid, will one of the drop the slot of the other one? Previously with the cleanup being left to temp slot we'd at maximum got error when creating it but with the new logic in LogicalRepSyncTableStart it feels like we could get into situation where 2 downstreams are fighting over slot no? -- Petr
On Sat, Feb 6, 2021 at 2:10 AM Petr Jelinek <petr.jelinek@enterprisedb.com> wrote: > > > +ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char syncslotname[NAMEDATALEN]) > > +{ > > + if (syncslotname) > > + sprintf(syncslotname, "pg_%u_sync_%u", suboid, relid); > > + else > > + syncslotname = psprintf("pg_%u_sync_%u", suboid, relid); > > + > > + return syncslotname; > > +} > > Given that we are now explicitly dropping slots, what happens here if we > have 2 different downstreams that happen to get same suboid and reloid, > will one of the drop the slot of the other one? Previously with the > cleanup being left to temp slot we'd at maximum got error when creating > it but with the new logic in LogicalRepSyncTableStart it feels like we > could get into situation where 2 downstreams are fighting over slot no? > The PG docs [1] says "there is only one copy of pg_subscription per cluster, not one per database". IIUC that means it is not possible for 2 different subscriptions to have the same suboid. And if the suboid is globally unique then syncslotname name is also unique. Is that understanding not correct? ----- [1] https://www.postgresql.org/docs/devel/catalog-pg-subscription.html Kind Regards, Peter Smith. Fujitsu Australia
On Sat, Feb 6, 2021 at 6:22 AM Peter Smith <smithpb2250@gmail.com> wrote: > > On Sat, Feb 6, 2021 at 2:10 AM Petr Jelinek > <petr.jelinek@enterprisedb.com> wrote: > > > > > +ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char syncslotname[NAMEDATALEN]) > > > +{ > > > + if (syncslotname) > > > + sprintf(syncslotname, "pg_%u_sync_%u", suboid, relid); > > > + else > > > + syncslotname = psprintf("pg_%u_sync_%u", suboid, relid); > > > + > > > + return syncslotname; > > > +} > > > > Given that we are now explicitly dropping slots, what happens here if we > > have 2 different downstreams that happen to get same suboid and reloid, > > will one of the drop the slot of the other one? Previously with the > > cleanup being left to temp slot we'd at maximum got error when creating > > it but with the new logic in LogicalRepSyncTableStart it feels like we > > could get into situation where 2 downstreams are fighting over slot no? > > I think so. See, if the alternative suggested below works or if you have any other suggestions for the same? > > The PG docs [1] says "there is only one copy of pg_subscription per > cluster, not one per database". IIUC that means it is not possible for > 2 different subscriptions to have the same suboid. > I think he is talking about two different clusters having separate subscriptions but point to the same publisher. In different clusters, we can get the same subid/relid. I think we need a cluster-wide unique identifier to distinguish among different subscribers. How about using the system_identifier stored in the control file (we can use GetSystemIdentifier to retrieve it). I think one concern could be that adding that to slot name could exceed the max length of slot (NAMEDATALEN -1) but I don't think that is the case here (pg_%u_sync_%u_UINT64_FORMAT (3 + 10 + 6 + 10 + 20 + '\0')). Note last is system_identifier in this scheme. Do you guys think that works or let me know if you have any other better idea? Petr, is there a reason why such an identifier is not considered originally, is there any risk in it? -- With Regards, Amit Kapila.
Hi On Friday, February 5, 2021 5:51 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > On Fri, Feb 5, 2021 at 12:36 PM osumi.takamichi@fujitsu.com > <osumi.takamichi@fujitsu.com> wrote: > > > > We need to add some tests to prove the new checks of AlterSubscription() > work. > > I chose TAP tests as we need to set connect = true for the subscription. > > When it can contribute to the development, please utilize this. > > I used v28 to check my patch and works as we expect. > > > > Thanks for writing the tests but I don't understand why you need to set > connect = true for this test? I have tried below '... with connect = false' and it > seems to be working: > postgres=# CREATE SUBSCRIPTION mysub > postgres-# CONNECTION 'host=localhost port=5432 > dbname=postgres' > postgres-# PUBLICATION mypublication WITH (connect = false); > WARNING: tables were not subscribed, you will have to run ALTER > SUBSCRIPTION ... REFRESH PUBLICATION to subscribe the tables CREATE > SUBSCRIPTION postgres=# Begin; BEGIN postgres=*# Alter Subscription > mysub Refresh Publication; > ERROR: ALTER SUBSCRIPTION ... REFRESH is not allowed for disabled > subscriptions > > So, if possible lets write this test in src/test/regress/sql/subscription.sql. OK. I changed the place to write the tests for those. > I have another idea for a test case: What if we write a test such that it fails PK > violation on copy and then drop the subscription. Then check there shouldn't > be any dangling slot on the publisher? This is similar to a test in > subscription/t/004_sync.pl, we can use some of that framework but have a > separate test for this. I've added this PK violation test to the attached tests. The patch works with v28 and made no failure during regression tests. Best Regards, Takamichi Osumi
Attachment
On 06/02/2021 06:07, Amit Kapila wrote: > On Sat, Feb 6, 2021 at 6:22 AM Peter Smith <smithpb2250@gmail.com> wrote: >> On Sat, Feb 6, 2021 at 2:10 AM Petr Jelinek >> <petr.jelinek@enterprisedb.com> wrote: >>>> +ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char syncslotname[NAMEDATALEN]) >>>> +{ >>>> + if (syncslotname) >>>> + sprintf(syncslotname, "pg_%u_sync_%u", suboid, relid); >>>> + else >>>> + syncslotname = psprintf("pg_%u_sync_%u", suboid, relid); >>>> + >>>> + return syncslotname; >>>> +} >>> Given that we are now explicitly dropping slots, what happens here if we >>> have 2 different downstreams that happen to get same suboid and reloid, >>> will one of the drop the slot of the other one? Previously with the >>> cleanup being left to temp slot we'd at maximum got error when creating >>> it but with the new logic in LogicalRepSyncTableStart it feels like we >>> could get into situation where 2 downstreams are fighting over slot no? >>> > I think so. See, if the alternative suggested below works or if you > have any other suggestions for the same? > >> The PG docs [1] says "there is only one copy of pg_subscription per >> cluster, not one per database". IIUC that means it is not possible for >> 2 different subscriptions to have the same suboid. >> > I think he is talking about two different clusters having separate > subscriptions but point to the same publisher. In different clusters, > we can get the same subid/relid. I think we need a cluster-wide unique > identifier to distinguish among different subscribers. How about using > the system_identifier stored in the control file (we can use > GetSystemIdentifier to retrieve it). I think one concern could be > that adding that to slot name could exceed the max length of slot > (NAMEDATALEN -1) but I don't think that is the case here > (pg_%u_sync_%u_UINT64_FORMAT (3 + 10 + 6 + 10 + 20 + '\0')). Note last > is system_identifier in this scheme. Yep that's what I mean and system_identifier seems like a good choice to me. > Do you guys think that works or let me know if you have any other > better idea? Petr, is there a reason why such an identifier is not > considered originally, is there any risk in it? Originally it was not considered likely because it's all based on pglogical/BDR work where ids are hashes of stuff that's unique across group of instances, not counter based like Oids in PostgreSQL and I simply didn't realize it could be a problem until reading this patch :) -- Petr Jelinek
On Sat, Feb 6, 2021 at 2:10 AM Petr Jelinek <petr.jelinek@enterprisedb.com> wrote: > > Hi, > > Some minor comments about code: > > > + else if (res->status == WALRCV_ERROR && missing_ok) > > + { > > + /* WARNING. Error, but missing_ok = true. */ > > + ereport(WARNING, > > I wonder if we need to add error code to the WalRcvExecResult and check > for the appropriate ones here. Because this can for example return error > because of timeout, not because slot is missing. Not sure if it matters > for current callers though (but then maybe don't call the param > missign_ok?). You are right. The way we are using this function has evolved beyond the original intention. Probably renaming the param to something like "error_ok" would be more appropriate now. ---- Kind Regards, Peter Smith. Fujitsu Australia
On Sun, Feb 7, 2021 at 2:38 PM Peter Smith <smithpb2250@gmail.com> wrote: > > On Sat, Feb 6, 2021 at 2:10 AM Petr Jelinek > <petr.jelinek@enterprisedb.com> wrote: > > > > Hi, > > > > Some minor comments about code: > > > > > + else if (res->status == WALRCV_ERROR && missing_ok) > > > + { > > > + /* WARNING. Error, but missing_ok = true. */ > > > + ereport(WARNING, > > > > I wonder if we need to add error code to the WalRcvExecResult and check > > for the appropriate ones here. Because this can for example return error > > because of timeout, not because slot is missing. Not sure if it matters > > for current callers though (but then maybe don't call the param > > missign_ok?). > > You are right. The way we are using this function has evolved beyond > the original intention. > Probably renaming the param to something like "error_ok" would be more > appropriate now. > PSA a patch (apply on top of V28) to change the misleading param name. ---- Kind Regards, Peter Smith. Fujitsu Australia
Attachment
On Sat, Feb 6, 2021 at 6:30 PM osumi.takamichi@fujitsu.com <osumi.takamichi@fujitsu.com> wrote: > > Hi > > > On Friday, February 5, 2021 5:51 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Feb 5, 2021 at 12:36 PM osumi.takamichi@fujitsu.com > > <osumi.takamichi@fujitsu.com> wrote: > > > > > > We need to add some tests to prove the new checks of AlterSubscription() > > work. > > > I chose TAP tests as we need to set connect = true for the subscription. > > > When it can contribute to the development, please utilize this. > > > I used v28 to check my patch and works as we expect. > > > > > > > Thanks for writing the tests but I don't understand why you need to set > > connect = true for this test? I have tried below '... with connect = false' and it > > seems to be working: > > postgres=# CREATE SUBSCRIPTION mysub > > postgres-# CONNECTION 'host=localhost port=5432 > > dbname=postgres' > > postgres-# PUBLICATION mypublication WITH (connect = false); > > WARNING: tables were not subscribed, you will have to run ALTER > > SUBSCRIPTION ... REFRESH PUBLICATION to subscribe the tables CREATE > > SUBSCRIPTION postgres=# Begin; BEGIN postgres=*# Alter Subscription > > mysub Refresh Publication; > > ERROR: ALTER SUBSCRIPTION ... REFRESH is not allowed for disabled > > subscriptions > > > > So, if possible lets write this test in src/test/regress/sql/subscription.sql. > OK. I changed the place to write the tests for those. > > > > I have another idea for a test case: What if we write a test such that it fails PK > > violation on copy and then drop the subscription. Then check there shouldn't > > be any dangling slot on the publisher? This is similar to a test in > > subscription/t/004_sync.pl, we can use some of that framework but have a > > separate test for this. > I've added this PK violation test to the attached tests. > The patch works with v28 and made no failure during regression tests. > I checked this patch. It applied cleanly on top of V28, and all tests passed OK. Here are two feedback comments. 1. For the regression test there is 2 x SQL and 1 x function test. I thought to cover all the combinations there should be another function test. e.g. Tests ALTER … REFRESH Tests ALTER …. (refresh = true) Tests ALTER … (refresh = true) in a function Tests ALTER … REFRESH in a function <== this combination is not being testing ?? 2. For the 004 test case I know the test is needing some PK constraint violation # Check if DROP SUBSCRIPTION cleans up slots on the publisher side # when the subscriber is stuck on data copy for constraint But it is not clear to me what was the exact cause of that PK violation. I think you must be relying on data that is leftover from some previous test case but I am not sure which one. Can you make the comment more detailed to say *how* the PK violation is happening - e.g something to say which rows, in which table, and inserted by who? ------ Kind Regards, Peter Smith. Fujitsu Australia
On Mon, Feb 8, 2021 at 8:06 AM Peter Smith <smithpb2250@gmail.com> wrote: > > On Sat, Feb 6, 2021 at 6:30 PM osumi.takamichi@fujitsu.com > <osumi.takamichi@fujitsu.com> wrote: > > > > > I have another idea for a test case: What if we write a test such that it fails PK > > > violation on copy and then drop the subscription. Then check there shouldn't > > > be any dangling slot on the publisher? This is similar to a test in > > > subscription/t/004_sync.pl, we can use some of that framework but have a > > > separate test for this. > > I've added this PK violation test to the attached tests. > > The patch works with v28 and made no failure during regression tests. > > > > I checked this patch. It applied cleanly on top of V28, and all tests passed OK. > > Here are two feedback comments. > > 1. For the regression test there is 2 x SQL and 1 x function test. I > thought to cover all the combinations there should be another function > test. e.g. > Tests ALTER … REFRESH > Tests ALTER …. (refresh = true) > Tests ALTER … (refresh = true) in a function > Tests ALTER … REFRESH in a function <== this combination is not being > testing ?? > I am not sure whether there is much value in adding more to this set of negative test cases unless it really covers a different code path which I think won't happen if we add more tests here. -- With Regards, Amit Kapila.
Hello On Mon, Feb 8, 2021 12:40 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > On Mon, Feb 8, 2021 at 8:06 AM Peter Smith <smithpb2250@gmail.com> > wrote: > > > > On Sat, Feb 6, 2021 at 6:30 PM osumi.takamichi@fujitsu.com > > <osumi.takamichi@fujitsu.com> wrote: > > > > > > > I have another idea for a test case: What if we write a test such > > > > that it fails PK violation on copy and then drop the subscription. > > > > Then check there shouldn't be any dangling slot on the publisher? > > > > This is similar to a test in subscription/t/004_sync.pl, we can > > > > use some of that framework but have a separate test for this. > > > I've added this PK violation test to the attached tests. > > > The patch works with v28 and made no failure during regression tests. > > > > > > > I checked this patch. It applied cleanly on top of V28, and all tests passed > OK. > > > > Here are two feedback comments. > > > > 1. For the regression test there is 2 x SQL and 1 x function test. I > > thought to cover all the combinations there should be another function > > test. e.g. > > Tests ALTER … REFRESH > > Tests ALTER …. (refresh = true) > > Tests ALTER … (refresh = true) in a function Tests ALTER … REFRESH in > > a function <== this combination is not being testing ?? > > > > I am not sure whether there is much value in adding more to this set of > negative test cases unless it really covers a different code path which I think > won't happen if we add more tests here. Yeah, I agree. Accordingly, I didn't fix that part. On Mon, Feb 8, 2021 11:36 AM Peter Smith <smithpb2250@gmail.com> wrote: > 2. For the 004 test case I know the test is needing some PK constraint > violation # Check if DROP SUBSCRIPTION cleans up slots on the publisher > side # when the subscriber is stuck on data copy for constraint > > But it is not clear to me what was the exact cause of that PK violation. I think > you must be relying on data that is leftover from some previous test case but > I am not sure which one. Can you make the comment more detailed to say > *how* the PK violation is happening - e.g something to say which rows, in > which table, and inserted by who? I added some comments to clarify how the PK violation happens. Please have a look. Best Regards, Takamichi Osumi
Attachment
On Monday, February 8, 2021 1:44 PM osumi.takamichi@fujitsu.com <osumi.takamichi@fujitsu.com> > On Mon, Feb 8, 2021 12:40 PM Amit Kapila <amit.kapila16@gmail.com> > wrote: > > On Mon, Feb 8, 2021 at 8:06 AM Peter Smith <smithpb2250@gmail.com> > > wrote: > > > > > > On Sat, Feb 6, 2021 at 6:30 PM osumi.takamichi@fujitsu.com > > > <osumi.takamichi@fujitsu.com> wrote: > > > > > > > > > I have another idea for a test case: What if we write a test > > > > > such that it fails PK violation on copy and then drop the subscription. > > > > > Then check there shouldn't be any dangling slot on the publisher? > > > > > This is similar to a test in subscription/t/004_sync.pl, we can > > > > > use some of that framework but have a separate test for this. > > > > I've added this PK violation test to the attached tests. > > > > The patch works with v28 and made no failure during regression tests. > > > > > > > > > > I checked this patch. It applied cleanly on top of V28, and all > > > tests passed > > OK. > > > > > > Here are two feedback comments. > > > > > > 1. For the regression test there is 2 x SQL and 1 x function test. I > > > thought to cover all the combinations there should be another > > > function test. e.g. > > > Tests ALTER … REFRESH > > > Tests ALTER …. (refresh = true) > > > Tests ALTER … (refresh = true) in a function Tests ALTER … REFRESH > > > in a function <== this combination is not being testing ?? > > > > > > > I am not sure whether there is much value in adding more to this set > > of negative test cases unless it really covers a different code path > > which I think won't happen if we add more tests here. > Yeah, I agree. Accordingly, I didn't fix that part. > > > On Mon, Feb 8, 2021 11:36 AM Peter Smith <smithpb2250@gmail.com> > wrote: > > 2. For the 004 test case I know the test is needing some PK constraint > > violation # Check if DROP SUBSCRIPTION cleans up slots on the > > publisher side # when the subscriber is stuck on data copy for > > constraint > > > > But it is not clear to me what was the exact cause of that PK > > violation. I think you must be relying on data that is leftover from > > some previous test case but I am not sure which one. Can you make the > > comment more detailed to say > > *how* the PK violation is happening - e.g something to say which rows, > > in which table, and inserted by who? > I added some comments to clarify how the PK violation happens. > Please have a look. Sorry, I had a one typo in the tests of subscription.sql in v2. I used 'foo' for the first test of "ALTER SUBSCRIPTION mytest SET PUBLICATION foo WITH (refresh = true) in v02", but I should have used 'mypub' to make this test clearly independent from other previous tests. Attached the fixed version. Best Regards, Takamichi Osumi
Attachment
On Fri, Feb 5, 2021 at 8:40 PM Petr Jelinek <petr.jelinek@enterprisedb.com> wrote: > > Hi, > > We had a bit high-level discussion about this patches with Amit > off-list, so I decided to also take a look at the actual code. > Thanks for the discussion and a follow-up review. > My main concern originally was the potential for left-over slots on > publisher, but I think the state now is relatively okay, with couple of > corner cases that are documented and don't seem much worse than the main > slot. > > I wonder if we should mention the max_slot_wal_keep_size GUC in the > table sync docs though. > I have added the reference of this in Alter Subscription where we mentioned the risk of leftover slots. Let me know if you have something else in mind? > Another thing that might need documentation is that the the visibility > of changes done by table sync is not anymore isolated in that table > contents will show intermediate progress to other backends, rather than > switching from nothing to state consistent with rest of replication. > Agreed and updated the docs accordingly. > > Some minor comments about code: > > > + else if (res->status == WALRCV_ERROR && missing_ok) > > + { > > + /* WARNING. Error, but missing_ok = true. */ > > + ereport(WARNING, > > I wonder if we need to add error code to the WalRcvExecResult and check > for the appropriate ones here. Because this can for example return error > because of timeout, not because slot is missing. > I think there are both pros and cons of distinguishing the error ("slot doesnot exist" from others). The benefit is if there a network glitch then the user can probably retry the commands Alter/Drop and it will be successful next time. OTOH, say the network is broken for a long time and the user wants to proceed but there won't be any way to proceed for Alter Subscription ... Refresh or Drop Command. So by giving WARNING at least we can provide a way to proceed and then they can drop such slots later. We have mentioned this in docs as well. I think we can go either way here, let me know what do you think is a better way? > Not sure if it matters > for current callers though (but then maybe don't call the param > missign_ok?). > Sure, if we decide not to change the behavior as suggested by you then this makes sense. > > > +ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char syncslotname[NAMEDATALEN]) > > +{ > > + if (syncslotname) > > + sprintf(syncslotname, "pg_%u_sync_%u", suboid, relid); > > + else > > + syncslotname = psprintf("pg_%u_sync_%u", suboid, relid); > > + > > + return syncslotname; > > +} > > Given that we are now explicitly dropping slots, what happens here if we > have 2 different downstreams that happen to get same suboid and reloid, > will one of the drop the slot of the other one? Previously with the > cleanup being left to temp slot we'd at maximum got error when creating > it but with the new logic in LogicalRepSyncTableStart it feels like we > could get into situation where 2 downstreams are fighting over slot no? > As discussed, added system_identifier to distinguish subscriptions between different clusters. Apart from fixing the above comment, I have integrated it with the new replorigin_drop_by_name() API being discussed in the thread [1] and posted that patch just for ease. I have also integrated Osumi-San's test case patch with minor modifications. [1] - https://www.postgresql.org/message-id/CAA4eK1L7mLhY%3DwyCB0qsEGUpfzWfncDSS9_0a4Co%2BN0GUyNGNQ%40mail.gmail.com -- With Regards, Amit Kapila.
Attachment
On Mon, Feb 8, 2021 at 12:22 PM osumi.takamichi@fujitsu.com <osumi.takamichi@fujitsu.com> wrote: > > On Monday, February 8, 2021 1:44 PM osumi.takamichi@fujitsu.com <osumi.takamichi@fujitsu.com> > > On Mon, Feb 8, 2021 11:36 AM Peter Smith <smithpb2250@gmail.com> > > wrote: > > > 2. For the 004 test case I know the test is needing some PK constraint > > > violation # Check if DROP SUBSCRIPTION cleans up slots on the > > > publisher side # when the subscriber is stuck on data copy for > > > constraint > > > > > > But it is not clear to me what was the exact cause of that PK > > > violation. I think you must be relying on data that is leftover from > > > some previous test case but I am not sure which one. Can you make the > > > comment more detailed to say > > > *how* the PK violation is happening - e.g something to say which rows, > > > in which table, and inserted by who? > > I added some comments to clarify how the PK violation happens. > > Please have a look. > Sorry, I had a one typo in the tests of subscription.sql in v2. > I used 'foo' for the first test of "ALTER SUBSCRIPTION mytest SET PUBLICATION foo WITH (refresh = true) in v02", > but I should have used 'mypub' to make this test clearly independent from other previous tests. > Attached the fixed version. > Thanks. I have integrated this into the main patch with minor modifications in the comments. The main change I have done is to remove the test that was testing that there are two slots remaining after the initial sync failure. This is because on restart of tablesync worker we again try to drop the slot so we can't guarantee that the tablesync slot would be remaining. I think this is a timing issue so it might not have occurred on your machine but I could reproduce that by repeated runs of the tests provided by you. -- With Regards, Amit Kapila.
On Mon, Feb 8, 2021 at 11:42 AM Peter Smith <smithpb2250@gmail.com> wrote: > > On Sun, Feb 7, 2021 at 2:38 PM Peter Smith <smithpb2250@gmail.com> wrote: > > > > On Sat, Feb 6, 2021 at 2:10 AM Petr Jelinek > > <petr.jelinek@enterprisedb.com> wrote: > > > > > > Hi, > > > > > > Some minor comments about code: > > > > > > > + else if (res->status == WALRCV_ERROR && missing_ok) > > > > + { > > > > + /* WARNING. Error, but missing_ok = true. */ > > > > + ereport(WARNING, > > > > > > I wonder if we need to add error code to the WalRcvExecResult and check > > > for the appropriate ones here. Because this can for example return error > > > because of timeout, not because slot is missing. Not sure if it matters > > > for current callers though (but then maybe don't call the param > > > missign_ok?). > > > > You are right. The way we are using this function has evolved beyond > > the original intention. > > Probably renaming the param to something like "error_ok" would be more > > appropriate now. > > > > PSA a patch (apply on top of V28) to change the misleading param name. > PSA an alternative patch. This one adds a new member to WalRcvExecResult and so is able to detect the "slot does not exist" error. This patch also applies on top of V28, if you want it. ------ Kind Regards, Peter Smith. Fujitsu Australia
Attachment
On Mon, Feb 8, 2021 8:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > On Mon, Feb 8, 2021 at 12:22 PM osumi.takamichi@fujitsu.com > <osumi.takamichi@fujitsu.com> wrote: > > On Monday, February 8, 2021 1:44 PM osumi.takamichi@fujitsu.com > > <osumi.takamichi@fujitsu.com> > > > On Mon, Feb 8, 2021 11:36 AM Peter Smith <smithpb2250@gmail.com> > > > wrote: > > > > 2. For the 004 test case I know the test is needing some PK > > > > constraint violation # Check if DROP SUBSCRIPTION cleans up slots > > > > on the publisher side # when the subscriber is stuck on data copy > > > > for constraint > > > > > > > > But it is not clear to me what was the exact cause of that PK > > > > violation. I think you must be relying on data that is leftover > > > > from some previous test case but I am not sure which one. Can you > > > > make the comment more detailed to say > > > > *how* the PK violation is happening - e.g something to say which > > > > rows, in which table, and inserted by who? > > > I added some comments to clarify how the PK violation happens. > > > Please have a look. > > Sorry, I had a one typo in the tests of subscription.sql in v2. > > I used 'foo' for the first test of "ALTER SUBSCRIPTION mytest SET > > PUBLICATION foo WITH (refresh = true) in v02", but I should have used > 'mypub' to make this test clearly independent from other previous tests. > > Attached the fixed version. > > > > Thanks. I have integrated this into the main patch with minor modifications in > the comments. The main change I have done is to remove the test that was > testing that there are two slots remaining after the initial sync failure. This is > because on restart of tablesync worker we again try to drop the slot so we > can't guarantee that the tablesync slot would be remaining. I think this is a > timing issue so it might not have occurred on your machine but I could > reproduce that by repeated runs of the tests provided by you. OK. I understand. Thank you so much that your modified and integrated it into the main patch. Best Regards, Takamichi Osumi
Here are my feedback comments for the V29 patch. ==== FILE: logical-replication.sgml + slots have generated names: <quote><literal>pg_%u_sync_%u_%llu</literal></quote> + (parameters: Subscription <parameter>oid</parameter>, + Table <parameter>relid</parameter>, system identifier<parameter>sysid</parameter>) + </para> 1. There is a missing space before the sysid parameter. ===== FILE: subscriptioncmds.c + * SUBREL_STATE_FINISHEDCOPY. The apply worker can also + * concurrently try to drop the origin and by this time the + * origin might be already removed. For these reasons, + * passing missing_ok = true from here. + */ + snprintf(originname, sizeof(originname), "pg_%u_%u", sub->oid, relid); + replorigin_drop_by_name(originname, true, false); + } 2. Don't really need to say "from here". (same comment applies multiple places, in this file and in tablesync.c) 3. Previously the tablesync origin name format was encapsulated in a common function. IMO it was cleaner/safer how it was before, instead of the same "pg_%u_%u" cut/paste and scattered in many places. (same comment applies multiple places, in this file and in tablesync.c) 4. Calls like replorigin_drop_by_name(originname, true, false); make it unnecessarily hard to read code when the boolean params are neither named as variables nor commented. I noticed on another thread [et0205] there was an idea that having no name/comments is fine because anyway it is not difficult to figure out when using a "modern IDE", but since my review tools are only "vi" and "meld" I beg to differ with that justification. (same comment applies multiple places, in this file and in tablesync.c) [et0205] https://www.postgresql.org/message-id/c1d9833f-eeeb-40d5-89ba-87674e1b7ba3%40www.fastmail.com ===== FILE: tablesync.c 5. Previously there was a function tablesync_replorigin_drop which was encapsulating the tablesync origin name formatting. I thought that was better than the V29 code which now has the same formatting scattered over many places. (same comment applies for worker_internal.h) + * Determine the tablesync slot name. + * + * The name must not exceed NAMEDATALEN - 1 because of remote node constraints + * on slot name length. We do append system_identifier to avoid slot_name + * collision with subscriptions in other clusters. With current scheme + * pg_%u_sync_%u_UINT64_FORMAT (3 + 10 + 6 + 10 + 20 + '\0'), the maximum + * length of slot_name will be 50. + * + * The returned slot name is either: + * - stored in the supplied buffer (syncslotname), or + * - palloc'ed in current memory context (if syncslotname = NULL). + * + * Note: We don't use the subscription slot name as part of tablesync slot name + * because we are responsible for cleaning up these slots and it could become + * impossible to recalculate what name to cleanup if the subscription slot name + * had changed. + */ +char * +ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char syncslotname[NAMEDATALEN]) +{ + if (syncslotname) + sprintf(syncslotname, "pg_%u_sync_%u_" UINT64_FORMAT, suboid, relid, + GetSystemIdentifier()); + else + syncslotname = psprintf("pg_%u_sync_%u_" UINT64_FORMAT, suboid, relid, + GetSystemIdentifier()); + + return syncslotname; +} 6. "We do append" --> "We append" "With current scheme" -> "With the current scheme" 7. Maybe consider to just assign GetSystemIdentifier() to a static instead of calling that function for every slot? static uint64 sysid = GetSystemIdentifier(); IIUC the sysid value is never going to change for a process, right? 
------ Kind Regards, Peter Smith. Fujitsu Australia On Mon, Feb 8, 2021 at 9:59 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Feb 5, 2021 at 8:40 PM Petr Jelinek > <petr.jelinek@enterprisedb.com> wrote: > > > > Hi, > > > > We had a bit high-level discussion about this patches with Amit > > off-list, so I decided to also take a look at the actual code. > > > > Thanks for the discussion and a follow-up review. > > > My main concern originally was the potential for left-over slots on > > publisher, but I think the state now is relatively okay, with couple of > > corner cases that are documented and don't seem much worse than the main > > slot. > > > > I wonder if we should mention the max_slot_wal_keep_size GUC in the > > table sync docs though. > > > > I have added the reference of this in Alter Subscription where we > mentioned the risk of leftover slots. Let me know if you have > something else in mind? > > > Another thing that might need documentation is that the the visibility > > of changes done by table sync is not anymore isolated in that table > > contents will show intermediate progress to other backends, rather than > > switching from nothing to state consistent with rest of replication. > > > > Agreed and updated the docs accordingly. > > > > > Some minor comments about code: > > > > > + else if (res->status == WALRCV_ERROR && missing_ok) > > > + { > > > + /* WARNING. Error, but missing_ok = true. */ > > > + ereport(WARNING, > > > > I wonder if we need to add error code to the WalRcvExecResult and check > > for the appropriate ones here. Because this can for example return error > > because of timeout, not because slot is missing. > > > > I think there are both pros and cons of distinguishing the error > ("slot doesnot exist" from others). The benefit is if there a network > glitch then the user can probably retry the commands Alter/Drop and it > will be successful next time. OTOH, say the network is broken for a > long time and the user wants to proceed but there won't be any way to > proceed for Alter Subscription ... Refresh or Drop Command. So by > giving WARNING at least we can provide a way to proceed and then they > can drop such slots later. We have mentioned this in docs as well. I > think we can go either way here, let me know what do you think is a > better way? > > > Not sure if it matters > > for current callers though (but then maybe don't call the param > > missign_ok?). > > > > Sure, if we decide not to change the behavior as suggested by you then > this makes sense. > > > > > > +ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char syncslotname[NAMEDATALEN]) > > > +{ > > > + if (syncslotname) > > > + sprintf(syncslotname, "pg_%u_sync_%u", suboid, relid); > > > + else > > > + syncslotname = psprintf("pg_%u_sync_%u", suboid, relid); > > > + > > > + return syncslotname; > > > +} > > > > Given that we are now explicitly dropping slots, what happens here if we > > have 2 different downstreams that happen to get same suboid and reloid, > > will one of the drop the slot of the other one? Previously with the > > cleanup being left to temp slot we'd at maximum got error when creating > > it but with the new logic in LogicalRepSyncTableStart it feels like we > > could get into situation where 2 downstreams are fighting over slot no? > > > > As discussed, added system_identifier to distinguish subscriptions > between different clusters. 
>
> Apart from fixing the above comment, I have integrated it with the new
> replorigin_drop_by_name() API being discussed in the thread [1] and
> posted that patch just for ease. I have also integrated Osumi-San's
> test case patch with minor modifications.
>
> [1] - https://www.postgresql.org/message-id/CAA4eK1L7mLhY%3DwyCB0qsEGUpfzWfncDSS9_0a4Co%2BN0GUyNGNQ%40mail.gmail.com
>
> --
> With Regards,
> Amit Kapila.
More V29 Feedback

FILE: alter_subscription.sgml

8.
+ <para>
+ Commands <command>ALTER SUBSCRIPTION ... REFRESH ..</command> and
+ <command>ALTER SUBSCRIPTION ... SET PUBLICATION ..</command> with refresh
+ option as true cannot be executed inside a transaction block.
+ </para>

My guess is those two lots of double dots ("..") were probably meant
to be ellipses ("...").

----
Kind Regards,
Peter Smith.
Fujitsu Australia
Looking at the V29-style tablesync slot names, they now appear like this:

WARNING: could not drop tablesync replication slot
"pg_16397_sync_16389_6927117142022745645"

That is in the order subid + relid + sysid.

Now that I see it in a message it seems a bit strange with the sysid
just tacked onto the end like that.

I am wondering if ordering from parent to child might be more natural;
e.g. sysid + subid + relid gives a more intuitive name IMO.

So in this example it would be "pg_sync_6927117142022745645_16397_16389"
(sketched below).

Thoughts?

----
Kind Regards,
Peter Smith
Fujitsu Australia
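For concreteness, the suggested ordering would amount to something like the
following sketch (illustrative only; the variable names are borrowed from
the V29 function):

/* hypothetical alternative ordering: sysid first, then subid, then relid */
snprintf(syncslotname, NAMEDATALEN, "pg_sync_" UINT64_FORMAT "_%u_%u",
         GetSystemIdentifier(), suboid, relid);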
When looking at the DropSubscription code I noticed that there is a
small difference between the HEAD code and the V29 code when slot_name
= NONE.

HEAD does
------
if (!slotname)
{
    table_close(rel, NoLock);
    return;
}
------

V29 does
------
if (!slotname)
{
    /* be tidy */
    list_free(rstates);
    return;
}
------

Isn't the V29 code missing doing a table_close(rel, NoLock) there?

------
Kind Regards,
Peter Smith.
Fujitsu Australia
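Presumably the intended fix keeps both cleanups, along the lines of this
sketch (assuming rel is open and rstates is allocated at that point):

if (!slotname)
{
    /* be tidy */
    list_free(rstates);
    table_close(rel, NoLock);
    return;
}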
On Tue, Feb 9, 2021 at 12:02 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Here are my feedback comments for the V29 patch.
>

Thanks.

>
> 3.
> Previously the tablesync origin name format was encapsulated in a
> common function. IMO it was cleaner/safer how it was before, instead
> of the same "pg_%u_%u" being cut/pasted and scattered across many places.
> (the same comment applies in multiple places, in this file and in tablesync.c)
>
> 4.
> Calls like replorigin_drop_by_name(originname, true, false); make the
> code unnecessarily hard to read when the boolean params are neither
> named as variables nor commented. I noticed on another thread [et0205]
> there was an idea that having no names/comments is fine because it is
> not difficult to figure out when using a "modern IDE", but since my
> review tools are only "vi" and "meld" I beg to differ with that
> justification.
> (the same comment applies in multiple places, in this file and in tablesync.c)
>

It would be a bit more convenient for you, but for most others I think
it would be noise. Personally, I find the code more readable without
such name comments; they just break the flow of the code unless you
want to study the value of each param in detail. (See the sketch of the
two styles below.)

> [et0205] https://www.postgresql.org/message-id/c1d9833f-eeeb-40d5-89ba-87674e1b7ba3%40www.fastmail.com
>
> =====
>
> FILE: tablesync.c
>
> 5.
> Previously there was a function tablesync_replorigin_drop which
> encapsulated the tablesync origin name formatting. I thought that was
> better than the V29 code, which now has the same formatting scattered
> over many places.
> (the same comment applies for worker_internal.h)
>

Isn't this the same as what you want to say in point-3?

>
> 7.
> Maybe consider just assigning GetSystemIdentifier() to a static
> instead of calling that function for every slot?
> static uint64 sysid = GetSystemIdentifier();
> IIUC the sysid value is never going to change for a process, right?
>

That's right, but I am not sure there is much value in saving one call
here by introducing an extra variable.

I'll fix the other comments raised by you.

--
With Regards,
Amit Kapila.
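For reference, the two styles being debated look roughly like this
(illustrative only; missing_ok and nowait are assumed to be the parameter
names of the replorigin_drop_by_name() API under discussion):

/* bare boolean arguments, as in the patch */
replorigin_drop_by_name(originname, true, false);

/* inline-annotated boolean arguments, as the review suggests */
replorigin_drop_by_name(originname, true /* missing_ok */, false /* nowait */);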
On Tue, Feb 9, 2021 at 1:37 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Looking at the V29-style tablesync slot names, they now appear like this:
>
> WARNING: could not drop tablesync replication slot
> "pg_16397_sync_16389_6927117142022745645"
>
> That is in the order subid + relid + sysid.
>
> Now that I see it in a message it seems a bit strange with the sysid
> just tacked onto the end like that.
>
> I am wondering if ordering from parent to child might be more natural;
> e.g. sysid + subid + relid gives a more intuitive name IMO.
>
> So in this example it would be "pg_sync_6927117142022745645_16397_16389"
>

I have kept the order based on the importance of each parameter. Say
the user sees this message in the server log of the subscriber, either
for the purpose of tracking the origin's progress or for errors: the
sysid parameter won't be of much use there, and they will mostly be
looking at subid and relid. OTOH, if for some reason this parameter
appears in the publisher logs, then sysid might be helpful.

Petr, anyone else, do you have an opinion on this matter?

--
With Regards,
Amit Kapila.
On Tue, Feb 9, 2021 at 12:02 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Here are my feedback comments for the V29 patch.
>
> ====
>
> FILE: logical-replication.sgml
>
> + slots have generated names:
> <quote><literal>pg_%u_sync_%u_%llu</literal></quote>
> + (parameters: Subscription <parameter>oid</parameter>,
> + Table <parameter>relid</parameter>, system
> identifier<parameter>sysid</parameter>)
> + </para>
>
> 1.
> There is a missing space before the sysid parameter.
>
> =====
>
> FILE: subscriptioncmds.c
>
> + * SUBREL_STATE_FINISHEDCOPY. The apply worker can also
> + * concurrently try to drop the origin and by this time the
> + * origin might be already removed. For these reasons,
> + * passing missing_ok = true from here.
> + */
> + snprintf(originname, sizeof(originname), "pg_%u_%u", sub->oid, relid);
> + replorigin_drop_by_name(originname, true, false);
> + }
>
> 2.
> We don't really need to say "from here".
> (the same comment applies in multiple places, in this file and in tablesync.c)
>
> 3.
> Previously the tablesync origin name format was encapsulated in a
> common function. IMO it was cleaner/safer how it was before, instead
> of the same "pg_%u_%u" being cut/pasted and scattered across many places.
> (the same comment applies in multiple places, in this file and in tablesync.c)
>

Fixed all three of the above comments.

> 4.
> Calls like replorigin_drop_by_name(originname, true, false); make the
> code unnecessarily hard to read when the boolean params are neither
> named as variables nor commented. I noticed on another thread [et0205]
> there was an idea that having no names/comments is fine because it is
> not difficult to figure out when using a "modern IDE", but since my
> review tools are only "vi" and "meld" I beg to differ with that
> justification.
> (the same comment applies in multiple places, in this file and in tablesync.c)
>

Already responded to it separately. I went ahead and removed such
comments from other places in the patch.

> [et0205] https://www.postgresql.org/message-id/c1d9833f-eeeb-40d5-89ba-87674e1b7ba3%40www.fastmail.com
>
> =====
>
> FILE: tablesync.c
>
> 5.
> Previously there was a function tablesync_replorigin_drop which
> encapsulated the tablesync origin name formatting. I thought that was
> better than the V29 code, which now has the same formatting scattered
> over many places.
> (the same comment applies for worker_internal.h)
>

I am not sure what you are expecting here that is different from point-3?

> + * Determine the tablesync slot name.
> + *
> + * The name must not exceed NAMEDATALEN - 1 because of remote node constraints
> + * on slot name length. We do append system_identifier to avoid slot_name
> + * collision with subscriptions in other clusters. With current scheme
> + * pg_%u_sync_%u_UINT64_FORMAT (3 + 10 + 6 + 10 + 20 + '\0'), the maximum
> + * length of slot_name will be 50.
> + *
> + * The returned slot name is either:
> + * - stored in the supplied buffer (syncslotname), or
> + * - palloc'ed in current memory context (if syncslotname = NULL).
> + *
> + * Note: We don't use the subscription slot name as part of tablesync slot name
> + * because we are responsible for cleaning up these slots and it could become
> + * impossible to recalculate what name to cleanup if the subscription slot name
> + * had changed.
> + */
> +char *
> +ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char
> syncslotname[NAMEDATALEN])
> +{
> + if (syncslotname)
> + sprintf(syncslotname, "pg_%u_sync_%u_" UINT64_FORMAT, suboid, relid,
> + GetSystemIdentifier());
> + else
> + syncslotname = psprintf("pg_%u_sync_%u_" UINT64_FORMAT, suboid, relid,
> + GetSystemIdentifier());
> +
> + return syncslotname;
> +}
>
> 6.
> "We do append" --> "We append"
> "With current scheme" --> "With the current scheme"
>

Fixed.

> 7.
> Maybe consider just assigning GetSystemIdentifier() to a static
> instead of calling that function for every slot?
> static uint64 sysid = GetSystemIdentifier();
> IIUC the sysid value is never going to change for a process, right?
>

Already responded.

> FILE: alter_subscription.sgml
>
> 8.
> + <para>
> + Commands <command>ALTER SUBSCRIPTION ... REFRESH ..</command> and
> + <command>ALTER SUBSCRIPTION ... SET PUBLICATION ..</command> with refresh
> + option as true cannot be executed inside a transaction block.
> + </para>
>
> My guess is those two lots of double dots ("..") were probably meant
> to be ellipses ("...").
>

Fixed; for the first one I completed the command by adding PUBLICATION.

>
> When looking at the DropSubscription code I noticed that there is a
> small difference between the HEAD code and the V29 code when slot_name
> = NONE.
>
> HEAD does
> ------
> if (!slotname)
> {
> table_close(rel, NoLock);
> return;
> }
> ------
>
> V29 does
> ------
> if (!slotname)
> {
> /* be tidy */
> list_free(rstates);
> return;
> }
> ------
>
> Isn't the V29 code missing doing a table_close(rel, NoLock) there?
>

Yes, good catch. Fixed.

--
With Regards,
Amit Kapila.
Attachment
On Tue, Feb 9, 2021 at 8:32 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Feb 9, 2021 at 12:02 PM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > Here are my feedback comments for the V29 patch.
> >
>
> Thanks.
>
> >
> > 3.
> > Previously the tablesync origin name format was encapsulated in a
> > common function. IMO it was cleaner/safer how it was before, instead
> > of the same "pg_%u_%u" being cut/pasted and scattered across many places.
> > (the same comment applies in multiple places, in this file and in tablesync.c)

OK. I confirmed it is fixed in V30. But I noticed that the new function
name is not quite consistent with the existing function for the slot
name, e.g. ReplicationSlotNameForTablesync versus
ReplicationOriginNameForTableSync (note "TableSync" instead of
"Tablesync").

------
Kind Regards,
Peter Smith.
Fujitsu Australia
On Tue, Feb 9, 2021 at 10:38 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Mon, Feb 8, 2021 at 11:42 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > On Sun, Feb 7, 2021 at 2:38 PM Peter Smith <smithpb2250@gmail.com> wrote:
> > >
> > > On Sat, Feb 6, 2021 at 2:10 AM Petr Jelinek
> > > <petr.jelinek@enterprisedb.com> wrote:
> > > >
> > > > Hi,
> > > >
> > > > Some minor comments about code:
> > > >
> > > > > + else if (res->status == WALRCV_ERROR && missing_ok)
> > > > > + {
> > > > > + /* WARNING. Error, but missing_ok = true. */
> > > > > + ereport(WARNING,
> > > >
> > > > I wonder if we need to add an error code to the WalRcvExecResult and check
> > > > for the appropriate ones here. Because this can for example return an error
> > > > because of a timeout, not because the slot is missing. Not sure if it matters
> > > > for current callers though (but then maybe don't call the param
> > > > missing_ok?).
> > >
> > > You are right. The way we are using this function has evolved beyond
> > > the original intention.
> > > Probably renaming the param to something like "error_ok" would be more
> > > appropriate now.
> > >
> >
> > PSA a patch (apply on top of V28) to change the misleading param name.
>
> PSA an alternative patch. This one adds a new member to
> WalRcvExecResult and so is able to detect the "slot does not exist"
> error. This patch also applies on top of V28, if you want it.
>

PSA v2 of this WalRcvExecResult patch (it is the same as v1 but
includes some PG doc updates). This applies OK on top of v30 of the
main patch.

------
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachment
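For context, the new-member idea has roughly the following shape (a sketch
only: the member list is an approximation of walreceiver.h with sqlstate as
the proposed addition, and pgres/walres are stand-in variable names):

typedef struct WalRcvExecResult
{
    WalRcvExecStatus status;
    int            sqlstate;    /* proposed: encoded SQLSTATE of the error */
    char          *err;
    TupleDesc      tupledesc;
    Tuplestorestate *tuplestore;
} WalRcvExecResult;

/*
 * Populating it: libpq exposes the remote SQLSTATE as a five-character
 * string, which can be packed into an int with the MAKE_SQLSTATE macro.
 */
const char *diag_sqlstate = PQresultErrorField(pgres, PG_DIAG_SQLSTATE);

if (diag_sqlstate && strlen(diag_sqlstate) == 5)
    walres->sqlstate = MAKE_SQLSTATE(diag_sqlstate[0],
                                     diag_sqlstate[1],
                                     diag_sqlstate[2],
                                     diag_sqlstate[3],
                                     diag_sqlstate[4]);

A caller can then compare res->sqlstate against ERRCODE_UNDEFINED_OBJECT to
distinguish a genuinely missing slot from, say, a timeout.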
On Tue, Feb 9, 2021 at 10:38 AM Peter Smith <smithpb2250@gmail.com> wrote:

> PSA an alternative patch. This one adds a new member to
> WalRcvExecResult and so is able to detect the "slot does not exist"
> error. This patch also applies on top of V28, if you want it.

Did some testing with this patch on top of v29. I could see that now,
while dropping the subscription, if the tablesync slot does not exist
on the publisher, then it gives a warning but the command does not
fail.

postgres=# CREATE SUBSCRIPTION tap_sub CONNECTION 'host=localhost
dbname=postgres port=6972' PUBLICATION tap_pub WITH (enabled = false);
NOTICE: created replication slot "tap_sub" on publisher
CREATE SUBSCRIPTION
postgres=# ALTER SUBSCRIPTION tap_sub enable;
ALTER SUBSCRIPTION
postgres=# ALTER SUBSCRIPTION tap_sub disable;
ALTER SUBSCRIPTION

=== here, the tablesync slot exists on the publisher but I go and
=== manually drop it.

postgres=# drop subscription tap_sub;
WARNING: could not drop the replication slot
"pg_16401_sync_16389_6927117142022745645" on publisher
DETAIL: The error was: ERROR: replication slot
"pg_16401_sync_16389_6927117142022745645" does not exist
NOTICE: dropped replication slot "tap_sub" on publisher
DROP SUBSCRIPTION

I have a minor comment on the error message: the "The error was:" part
seems a bit redundant here. Maybe remove it? So that it looks like:

WARNING: could not drop the replication slot
"pg_16401_sync_16389_6927117142022745645" on publisher
DETAIL: ERROR: replication slot
"pg_16401_sync_16389_6927117142022745645" does not exist

regards,
Ajin Cherian
Fujitsu Australia
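That suggestion would make the reporting something like the following
sketch (illustrative only; it simply passes the publisher's error text
through as the detail):

ereport(WARNING,
        (errmsg("could not drop the replication slot \"%s\" on publisher",
                slotname),
         errdetail("%s", res->err)));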
On Wed, Feb 10, 2021 at 7:41 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Tue, Feb 9, 2021 at 10:38 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
>
> PSA v2 of this WalRcvExecResult patch (it is the same as v1 but
> includes some PG doc updates).
> This applies OK on top of v30 of the main patch.
>

Thanks, I have integrated these changes into the main patch and
additionally made some changes to comments and docs. I have also fixed
the function name inconsistency issue you reported and ran pgindent.

--
With Regards,
Amit Kapila.
Attachment
I have reviewed the latest patch (V31) again. I found only a few minor
nitpick issues, not worth listing.

Then I ran the subscription TAP tests 50x in a loop as a kind of stress
test. That ran for 2.5 hrs and the result was all 50x 'Result: PASS'.

So V31 looks good to me.

------
Kind Regards,
Peter Smith.
Fujitsu Australia
On 10 Feb 2021, at 06:32, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Feb 10, 2021 at 7:41 AM Peter Smith <smithpb2250@gmail.com> wrote:
>>
>> On Tue, Feb 9, 2021 at 10:38 AM Peter Smith <smithpb2250@gmail.com> wrote:
>>>
>>
>> PSA v2 of this WalRcvExecResult patch (it is the same as v1 but
>> includes some PG doc updates).
>> This applies OK on top of v30 of the main patch.
>>
>
> Thanks, I have integrated these changes into the main patch and
> additionally made some changes to comments and docs. I have also fixed
> the function name inconsistency issue you reported and ran pgindent.

One thing:

> + else if (res->status == WALRCV_ERROR &&
> + missing_ok &&
> + res->sqlstate == ERRCODE_UNDEFINED_OBJECT)
> + {
> + /* WARNING. Error, but missing_ok = true. */
> + ereport(WARNING,
> (errmsg("could not drop the replication slot \"%s\" on publisher",
> slotname),
> errdetail("The error was: %s", res->err)));

Hmm, why is this a WARNING? We mostly call it with missing_ok = true
when the slot is not expected to be there, so it does not seem correct
to report it as a warning.

--
Petr
On Thu, Feb 11, 2021 at 1:51 PM Petr Jelinek
<petr.jelinek@enterprisedb.com> wrote:
>
> On 10 Feb 2021, at 06:32, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Feb 10, 2021 at 7:41 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >>
> >> On Tue, Feb 9, 2021 at 10:38 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >>>
> >>
> >> PSA v2 of this WalRcvExecResult patch (it is the same as v1 but
> >> includes some PG doc updates).
> >> This applies OK on top of v30 of the main patch.
> >>
> >
> > Thanks, I have integrated these changes into the main patch and
> > additionally made some changes to comments and docs. I have also fixed
> > the function name inconsistency issue you reported and ran pgindent.
>
> One thing:
>
> > + else if (res->status == WALRCV_ERROR &&
> > + missing_ok &&
> > + res->sqlstate == ERRCODE_UNDEFINED_OBJECT)
> > + {
> > + /* WARNING. Error, but missing_ok = true. */
> > + ereport(WARNING,
> > (errmsg("could not drop the replication slot \"%s\" on publisher",
> > slotname),
> > errdetail("The error was: %s", res->err)));
>
> Hmm, why is this a WARNING? We mostly call it with missing_ok = true
> when the slot is not expected to be there, so it does not seem correct
> to report it as a warning.
>

WARNING is for the cases where we don't always expect slots to exist
and we don't want to stop the operation due to it. For example, in
DropSubscription, for some of the rel states like SUBREL_STATE_INIT
and SUBREL_STATE_DATASYNC, the slot won't exist. Similarly, say we
fail (due to a network error) after removing some of the slots; next
time, it will again try to drop the already-dropped slots and fail. For
these reasons, we need to use WARNING. Similarly, for tablesync workers,
when we are trying to initially drop the slot there is no certainty
that it exists, so we can't throw an ERROR and stop the operation
there. There are other cases, like when the table sync worker has
finished syncing the table, where we will raise an ERROR if the slot
doesn't exist. Does this make sense?

--
With Regards,
Amit Kapila.
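To make that policy concrete, a sketch (illustrative only;
drop_tablesync_slot is a hypothetical wrapper standing in for whatever
helper the patch actually uses):

/*
 * During DROP SUBSCRIPTION the tablesync slot may legitimately be absent:
 * it may never have been created for this relation, or a previous,
 * partially failed attempt may already have removed it. So the drop must
 * tolerate a missing slot rather than error out.
 */
drop_tablesync_slot(wrconn, syncslotname, true);    /* missing_ok */

/*
 * By contrast, when the tablesync worker itself finishes syncing a table,
 * its slot is expected to exist, so a missing slot there is an ERROR.
 */
drop_tablesync_slot(wrconn, syncslotname, false);   /* must exist */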
On 11 Feb 2021, at 10:42, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Feb 11, 2021 at 1:51 PM Petr Jelinek
> <petr.jelinek@enterprisedb.com> wrote:
>>
>> On 10 Feb 2021, at 06:32, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>>
>>> On Wed, Feb 10, 2021 at 7:41 AM Peter Smith <smithpb2250@gmail.com> wrote:
>>>>
>>>> On Tue, Feb 9, 2021 at 10:38 AM Peter Smith <smithpb2250@gmail.com> wrote:
>>>>>
>>>>
>>>> PSA v2 of this WalRcvExecResult patch (it is the same as v1 but
>>>> includes some PG doc updates).
>>>> This applies OK on top of v30 of the main patch.
>>>>
>>>
>>> Thanks, I have integrated these changes into the main patch and
>>> additionally made some changes to comments and docs. I have also fixed
>>> the function name inconsistency issue you reported and ran pgindent.
>>
>> One thing:
>>
>>> + else if (res->status == WALRCV_ERROR &&
>>> + missing_ok &&
>>> + res->sqlstate == ERRCODE_UNDEFINED_OBJECT)
>>> + {
>>> + /* WARNING. Error, but missing_ok = true. */
>>> + ereport(WARNING,
>>> (errmsg("could not drop the replication slot \"%s\" on publisher",
>>> slotname),
>>> errdetail("The error was: %s", res->err)));
>>
>> Hmm, why is this a WARNING? We mostly call it with missing_ok = true
>> when the slot is not expected to be there, so it does not seem correct
>> to report it as a warning.
>>
>
> WARNING is for the cases where we don't always expect slots to exist
> and we don't want to stop the operation due to it. For example, in
> DropSubscription, for some of the rel states like SUBREL_STATE_INIT
> and SUBREL_STATE_DATASYNC, the slot won't exist. Similarly, say we
> fail (due to a network error) after removing some of the slots; next
> time, it will again try to drop the already-dropped slots and fail. For
> these reasons, we need to use WARNING. Similarly, for tablesync workers,
> when we are trying to initially drop the slot there is no certainty
> that it exists, so we can't throw an ERROR and stop the operation
> there. There are other cases, like when the table sync worker has
> finished syncing the table, where we will raise an ERROR if the slot
> doesn't exist. Does this make sense?

Well, I was thinking it could be NOTICE or LOG, to be honest; WARNING
seems unnecessarily scary for those use cases to me.

—
Petr
On Thu, Feb 11, 2021 at 3:20 PM Petr Jelinek
<petr.jelinek@enterprisedb.com> wrote:
>
> On 11 Feb 2021, at 10:42, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Feb 11, 2021 at 1:51 PM Petr Jelinek
> > <petr.jelinek@enterprisedb.com> wrote:
> >>
> >> On 10 Feb 2021, at 06:32, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >>>
> >>> On Wed, Feb 10, 2021 at 7:41 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >>>>
> >>>> On Tue, Feb 9, 2021 at 10:38 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >>>>>
> >>>>
> >>>> PSA v2 of this WalRcvExecResult patch (it is the same as v1 but
> >>>> includes some PG doc updates).
> >>>> This applies OK on top of v30 of the main patch.
> >>>>
> >>>
> >>> Thanks, I have integrated these changes into the main patch and
> >>> additionally made some changes to comments and docs. I have also fixed
> >>> the function name inconsistency issue you reported and ran pgindent.
> >>
> >> One thing:
> >>
> >>> + else if (res->status == WALRCV_ERROR &&
> >>> + missing_ok &&
> >>> + res->sqlstate == ERRCODE_UNDEFINED_OBJECT)
> >>> + {
> >>> + /* WARNING. Error, but missing_ok = true. */
> >>> + ereport(WARNING,
> >>> (errmsg("could not drop the replication slot \"%s\" on publisher",
> >>> slotname),
> >>> errdetail("The error was: %s", res->err)));
> >>
> >> Hmm, why is this a WARNING? We mostly call it with missing_ok = true
> >> when the slot is not expected to be there, so it does not seem correct
> >> to report it as a warning.
> >>
> >
> > WARNING is for the cases where we don't always expect slots to exist
> > and we don't want to stop the operation due to it. For example, in
> > DropSubscription, for some of the rel states like SUBREL_STATE_INIT
> > and SUBREL_STATE_DATASYNC, the slot won't exist. Similarly, say we
> > fail (due to a network error) after removing some of the slots; next
> > time, it will again try to drop the already-dropped slots and fail. For
> > these reasons, we need to use WARNING. Similarly, for tablesync workers,
> > when we are trying to initially drop the slot there is no certainty
> > that it exists, so we can't throw an ERROR and stop the operation
> > there. There are other cases, like when the table sync worker has
> > finished syncing the table, where we will raise an ERROR if the slot
> > doesn't exist. Does this make sense?
>
> Well, I was thinking it could be NOTICE or LOG, to be honest; WARNING
> seems unnecessarily scary for those use cases to me.
>

I am fine with LOG and will make that change. Do you have any more
comments, or do you want to spend more time on this patch before we
call it good?

--
With Regards,
Amit Kapila.
On 11 Feb 2021, at 10:56, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Feb 11, 2021 at 3:20 PM Petr Jelinek
> <petr.jelinek@enterprisedb.com> wrote:
>>
>> On 11 Feb 2021, at 10:42, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>>
>>> On Thu, Feb 11, 2021 at 1:51 PM Petr Jelinek
>>> <petr.jelinek@enterprisedb.com> wrote:
>>>>
>>>> On 10 Feb 2021, at 06:32, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>>>>
>>>>> On Wed, Feb 10, 2021 at 7:41 AM Peter Smith <smithpb2250@gmail.com> wrote:
>>>>>>
>>>>>> On Tue, Feb 9, 2021 at 10:38 AM Peter Smith <smithpb2250@gmail.com> wrote:
>>>>>>>
>>>>>>
>>>>>> PSA v2 of this WalRcvExecResult patch (it is the same as v1 but
>>>>>> includes some PG doc updates).
>>>>>> This applies OK on top of v30 of the main patch.
>>>>>>
>>>>>
>>>>> Thanks, I have integrated these changes into the main patch and
>>>>> additionally made some changes to comments and docs. I have also fixed
>>>>> the function name inconsistency issue you reported and ran pgindent.
>>>>
>>>> One thing:
>>>>
>>>>> + else if (res->status == WALRCV_ERROR &&
>>>>> + missing_ok &&
>>>>> + res->sqlstate == ERRCODE_UNDEFINED_OBJECT)
>>>>> + {
>>>>> + /* WARNING. Error, but missing_ok = true. */
>>>>> + ereport(WARNING,
>>>>> (errmsg("could not drop the replication slot \"%s\" on publisher",
>>>>> slotname),
>>>>> errdetail("The error was: %s", res->err)));
>>>>
>>>> Hmm, why is this a WARNING? We mostly call it with missing_ok = true
>>>> when the slot is not expected to be there, so it does not seem correct
>>>> to report it as a warning.
>>>>
>>>
>>> WARNING is for the cases where we don't always expect slots to exist
>>> and we don't want to stop the operation due to it. For example, in
>>> DropSubscription, for some of the rel states like SUBREL_STATE_INIT
>>> and SUBREL_STATE_DATASYNC, the slot won't exist. Similarly, say we
>>> fail (due to a network error) after removing some of the slots; next
>>> time, it will again try to drop the already-dropped slots and fail. For
>>> these reasons, we need to use WARNING. Similarly, for tablesync workers,
>>> when we are trying to initially drop the slot there is no certainty
>>> that it exists, so we can't throw an ERROR and stop the operation
>>> there. There are other cases, like when the table sync worker has
>>> finished syncing the table, where we will raise an ERROR if the slot
>>> doesn't exist. Does this make sense?
>>
>> Well, I was thinking it could be NOTICE or LOG, to be honest; WARNING
>> seems unnecessarily scary for those use cases to me.
>>
>
> I am fine with LOG and will make that change. Do you have any more
> comments, or do you want to spend more time on this patch before we
> call it good?

I am good, thanks!

—
Petr
On Thu, Feb 11, 2021 at 3:32 PM Petr Jelinek
<petr.jelinek@enterprisedb.com> wrote:
>
> On 11 Feb 2021, at 10:56, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> >> Well, I was thinking it could be NOTICE or LOG, to be honest; WARNING
> >> seems unnecessarily scary for those use cases to me.
> >>
> >
> > I am fine with LOG and will make that change. Do you have any more
> > comments, or do you want to spend more time on this patch before we
> > call it good?
>
> I am good, thanks!
>

Okay, attached an updated patch with only that change.

--
With Regards,
Amit Kapila.
Attachment
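Per this exchange, the final behavior presumably ends up along these lines
(a sketch; only the severity level changes from the excerpt quoted earlier):

else if (res->status == WALRCV_ERROR &&
         missing_ok &&
         res->sqlstate == ERRCODE_UNDEFINED_OBJECT)
{
    /* LOG rather than WARNING: a missing slot is often expected here */
    ereport(LOG,
            (errmsg("could not drop the replication slot \"%s\" on publisher",
                    slotname),
             errdetail("The error was: %s", res->err)));
}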
On Thu, Feb 11, 2021 at 10:38 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

> Okay, attached an updated patch with only that change.

I ran Erik's test suite [1] on this patch overnight and found no
errors. No more comments from me. The patch looks good.

regards,
Ajin Cherian
Fujitsu Australia

[1] - https://www.postgresql.org/message-id/93d02794068482f96d31b002e0eb248d%40xs4all.nl
On Fri, Feb 12, 2021 at 7:18 AM Ajin Cherian <itsajin@gmail.com> wrote:
>
> On Thu, Feb 11, 2021 at 10:38 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > Okay, attached an updated patch with only that change.
>
> I ran Erik's test suite [1] on this patch overnight and found no
> errors. No more comments from me. The patch looks good.
>

Thanks, I have pushed the patch but am getting one failure:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=thorntail&dt=2021-02-12%2002%3A28%3A12

The reason seems to be that we are trying to connect while
max_wal_senders is set to zero. I think we can write this test without
trying to connect. The attached patch fixes the problem for me. What
do you think?

--
With Regards,
Amit Kapila.
Attachment
On Fri, Feb 12, 2021 at 2:46 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> Thanks, I have pushed the patch but am getting one failure:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=thorntail&dt=2021-02-12%2002%3A28%3A12
>
> The reason seems to be that we are trying to connect while
> max_wal_senders is set to zero. I think we can write this test without
> trying to connect. The attached patch fixes the problem for me. What
> do you think?

Verified this with installcheck and a configuration modified to have
wal_level = minimal and max_wal_senders = 0.
Tests passed. The changes look good to me.

regards,
Ajin Cherian
Fujitsu Australia
On Fri, Feb 12, 2021 at 10:08 AM Ajin Cherian <itsajin@gmail.com> wrote:
>
> On Fri, Feb 12, 2021 at 2:46 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
>
> > Thanks, I have pushed the patch but am getting one failure:
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=thorntail&dt=2021-02-12%2002%3A28%3A12
> >
> > The reason seems to be that we are trying to connect while
> > max_wal_senders is set to zero. I think we can write this test without
> > trying to connect. The attached patch fixes the problem for me. What
> > do you think?
>
> Verified this with installcheck and a configuration modified to have
> wal_level = minimal and max_wal_senders = 0.
> Tests passed. The changes look good to me.
>

Thanks, I have pushed the fix, and the latest run of 'thorntail' has
passed.

--
With Regards,
Amit Kapila.