Thread: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop
BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop
From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      16643
Logged by:          Henry Hinze
Email address:      henry.hinze@googlemail.com
PostgreSQL version: 13.0
Operating system:   Ubuntu 20.04
Description:

The following setup of logical replication for one table works nicely with PG 12.4. In PG 13.0 the initial startup never finishes. Instead it starts over and over again. All settings are default except for wal_level which was changed to "logical".

To replicate:

-- Create 2 Databases
psql
CREATE DATABASE d1;
CREATE DATABASE d2;

-- In d1: Create table, fill with data, create publication, create replication slot
psql d1
CREATE TABLE t (f int);
INSERT INTO t SELECT * FROM generate_series(1, 10000000);
ALTER TABLE t ADD PRIMARY KEY (f);
CREATE PUBLICATION p;
ALTER PUBLICATION p ADD TABLE t;
SELECT pg_create_logical_replication_slot('my_slot','pgoutput');

-- In d2: Create table, create subscription
psql d2
CREATE TABLE t (f int);
ALTER TABLE t ADD PRIMARY KEY (f);
CREATE SUBSCRIPTION s CONNECTION 'user=postgres dbname=d1' PUBLICATION p WITH (create_slot = false, slot_name='my_slot');

Here are the Logs:
**********************
2020-09-29 21:27:13.267 CEST [17178] postgres@d1 LOG: logical decoding found consistent point at 0/33CDDBB8
2020-09-29 21:27:13.267 CEST [17178] postgres@d1 DETAIL: There are no running transactions.
2020-09-29 21:27:13.267 CEST [17178] postgres@d1 STATEMENT: select pg_create_logical_replication_slot('my_slot','pgoutput');
2020-09-29 21:27:37.543 CEST [17229] LOG: logical replication apply worker for subscription "s" has started
2020-09-29 21:27:37.548 CEST [17230] postgres@d1 LOG: starting logical decoding for slot "my_slot"
2020-09-29 21:27:37.548 CEST [17230] postgres@d1 DETAIL: Streaming transactions committing after 0/33CDDBF0, reading WAL from 0/33CDDBB8.
2020-09-29 21:27:37.548 CEST [17230] postgres@d1 LOG: logical decoding found consistent point at 0/33CDDBB8
2020-09-29 21:27:37.548 CEST [17230] postgres@d1 DETAIL: There are no running transactions.
2020-09-29 21:27:37.552 CEST [17231] LOG: logical replication table synchronization worker for subscription "s", table "t" has started
2020-09-29 21:27:37.704 CEST [17232] postgres@d1 LOG: logical decoding found consistent point at 0/33D49208
2020-09-29 21:27:37.704 CEST [17232] postgres@d1 DETAIL: There are no running transactions.
2020-09-29 21:27:58.888 CEST [17232] postgres@d1 LOG: starting logical decoding for slot "my_slot_16398_sync_16393"
2020-09-29 21:27:58.888 CEST [17232] postgres@d1 DETAIL: Streaming transactions committing after 0/33D49240, reading WAL from 0/33D49208.
2020-09-29 21:27:58.888 CEST [17232] postgres@d1 LOG: logical decoding found consistent point at 0/33D49208
2020-09-29 21:27:58.888 CEST [17232] postgres@d1 DETAIL: There are no running transactions.
2020-09-29 21:28:05.138 CEST [17231] ERROR: error reading result of streaming command:
2020-09-29 21:28:05.143 CEST [17130] LOG: background worker "logical replication worker" (PID 17231) exited with exit code 1
2020-09-29 21:28:05.145 CEST [17256] LOG: logical replication table synchronization worker for subscription "s", table "t" has started
2020-09-29 21:28:05.222 CEST [17257] postgres@d1 LOG: logical decoding found consistent point at 0/62FBD4E8
2020-09-29 21:28:05.222 CEST [17257] postgres@d1 DETAIL: There are no running transactions.
2020-09-29 21:28:49.187 CEST [17257] postgres@d1 LOG: starting logical decoding for slot "my_slot_16398_sync_16393"
2020-09-29 21:28:49.187 CEST [17257] postgres@d1 DETAIL: Streaming transactions committing after 0/62FBD520, reading WAL from 0/62FBD4E8.
2020-09-29 21:28:49.187 CEST [17257] postgres@d1 LOG: logical decoding found consistent point at 0/62FBD4E8
2020-09-29 21:28:49.187 CEST [17257] postgres@d1 DETAIL: There are no running transactions.
2020-09-29 21:28:55.672 CEST [17256] ERROR: error reading result of streaming command:
2020-09-29 21:28:55.676 CEST [17130] LOG: background worker "logical replication worker" (PID 17256) exited with exit code 1
2020-09-29 21:28:55.679 CEST [17308] LOG: logical replication table synchronization worker for subscription "s", table "t" has started
2020-09-29 21:28:55.833 CEST [17309] postgres@d1 LOG: logical decoding found consistent point at 0/9E9AEA38
2020-09-29 21:28:55.833 CEST [17309] postgres@d1 DETAIL: There are no running transactions.
Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop
From
Tom Lane
Date:
PG Bug reporting form <noreply@postgresql.org> writes:
> The following setup of logical replication for one table works nicely with PG 12.4. In PG 13.0 the initial startup never finishes. Instead it starts over and over again.

It kinda looks like something is timing out before it's managed to catch up with the (rather large) initial state of table t. What non-default replication settings are you using?

			regards, tom lane
Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop
From
Henry Hinze
Date:
Hi Tom,
thanks for your reply!
For my comparison test of PG 12 and 13 I did not change any default setting except wal_level.
But I also did some tests and increased wal_keep_size and max_slot_wal_keep_size to 1GB. And I set wal_sender_timeout to 1h but without success. The setup works in PG 13 only with a small amount of data.
Best,
Henry
On Tue, Sep 29, 2020 at 22:09, Tom Lane <tgl@sss.pgh.pa.us> wrote:
PG Bug reporting form <noreply@postgresql.org> writes:
> The following setup of logical replication for one table works nicely with
> PG 12.4. In PG 13.0 the initial startup never finishes. Instead it starts
> over and over again.
It kinda looks like something is timing out before it's managed to
catch up with the (rather large) initial state of table t. What
non-default replication settings are you using?
regards, tom lane
Diplominformatiker Henry Hinze
Dietzgenstraße 75
13156 Berlin
Tel: +49 - 177 - 3160621
USt-ID: DE306639264
Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop
From
Henry Hinze
Date:
I've made an important observation!
Since I had the impression this setup was already working with RC1 of PG 13, I re-installed RC1 and did the same test. And it's working fine!
So it seems that something has changed from RC1 to the final version that breaks logical replication in some cases.
Best,
Henry
On Wed, Sep 30, 2020 at 08:34, Henry Hinze <henry.hinze@gmail.com> wrote:
Hi Tom,
thanks for your reply!
For my comparison test of PG 12 and 13 I did not change any default setting except wal_level.
But I also did some tests and increased wal_keep_size and max_slot_wal_keep_size to 1GB. And I set wal_sender_timeout to 1h but without success. The setup works in PG 13 only with a small amount of data.
Best,
Henry

On Tue, Sep 29, 2020 at 22:09, Tom Lane <tgl@sss.pgh.pa.us> wrote:
PG Bug reporting form <noreply@postgresql.org> writes:
> The following setup of logical replication for one table works nicely with
> PG 12.4. In PG 13.0 the initial startup never finishes. Instead it starts
> over and over again.
It kinda looks like something is timing out before it's managed to
catch up with the (rather large) initial state of table t. What
non-default replication settings are you using?
regards, tom lane
Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop
From
Tom Lane
Date:
Henry Hinze <henry.hinze@gmail.com> writes:
> I've made an important observation!
> Since I had the impression this setup was already working with RC1 of PG 13, I re-installed RC1 and did the same test. And it's working fine!

Ugh. So that points the finger at commits 07082b08c/bfb12cd2b, which are the only nearby change between rc1 and 13.0. A quick comparison of before-and-after checkouts confirms it.

After some digging around, I realize that that commit actually resulted in a protocol break. libpqwalreceiver is expecting to get an additional CommandComplete message after COPY OUT finishes, per libpqrcv_endstreaming(), and it's no longer getting one.

(I have not read the protocol document to see if this is per spec; but spec or no, that's what libpqwalreceiver is expecting.)

The question that this raises is how the heck did that get past our test suites? It seems like the error should have been obvious to even the most minimal testing.

			regards, tom lane
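To make the expectation concrete, here is a minimal sketch using public libpq calls. It is only an illustration, not the actual libpqrcv_endstreaming() code; the function name and the simplified error handling are assumptions.

#include <stdio.h>
#include <libpq-fe.h>

/*
 * Sketch of ending a replication COPY stream from the client side.
 * The walreceiver expects the server to follow the end of COPY with a
 * CommandComplete for the replication command itself; if that message
 * never arrives, this is roughly where "error reading result of
 * streaming command" gets reported.
 */
static int
end_streaming_sketch(PGconn *conn)
{
	PGresult   *res;

	/* Tell the server we are done with the COPY stream. */
	if (PQputCopyEnd(conn, NULL) <= 0 || PQflush(conn) != 0)
	{
		fprintf(stderr, "could not send copy-end: %s", PQerrorMessage(conn));
		return -1;
	}

	/* The server may still be in COPY OUT; drain it if so. */
	res = PQgetResult(conn);
	if (PQresultStatus(res) == PGRES_COPY_OUT)
	{
		PQclear(res);
		if (PQendcopy(conn) != 0)
		{
			fprintf(stderr, "could not end copy: %s", PQerrorMessage(conn));
			return -1;
		}
		res = PQgetResult(conn);
	}

	/* Here the additional CommandComplete is expected. */
	if (PQresultStatus(res) != PGRES_COMMAND_OK)
	{
		fprintf(stderr, "error reading result of streaming command: %s",
				PQerrorMessage(conn));
		PQclear(res);
		return -1;
	}
	PQclear(res);
	return 0;
}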
Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop
From
Alvaro Herrera
Date:
On 2020-Sep-30, Tom Lane wrote:

> Henry Hinze <henry.hinze@gmail.com> writes:
> > I've made an important observation!
> > Since I had the impression this setup was already working with RC1 of PG 13, I re-installed RC1 and did the same test. And it's working fine!
>
> Ugh. So that points the finger at commits 07082b08c/bfb12cd2b, which are the only nearby change between rc1 and 13.0. A quick comparison of before-and-after checkouts confirms it.

Oh dear.

> After some digging around, I realize that that commit actually resulted in a protocol break. libpqwalreceiver is expecting to get an additional CommandComplete message after COPY OUT finishes, per libpqrcv_endstreaming(), and it's no longer getting one.
>
> (I have not read the protocol document to see if this is per spec; but spec or no, that's what libpqwalreceiver is expecting.)

Yeah, definitely. The minimal fix seems to be to add an EndReplicationCommand() call in the T_StartReplicationCmd case. Testing this now ...

> The question that this raises is how the heck did that get past our test suites? It seems like the error should have been obvious to even the most minimal testing.

... yeah, that's indeed an important question. I'm going to guess that the TAP suites are too forgiving :-(
Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop
From
Tom Lane
Date:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> On 2020-Sep-30, Tom Lane wrote:
>> The question that this raises is how the heck did that get past our test suites? It seems like the error should have been obvious to even the most minimal testing.

> ... yeah, that's indeed an important question. I'm going to guess that the TAP suites are too forgiving :-(

One thing I noticed while trying to trace this down is that while the initial table sync is happening, we have *both* a regular walsender/walreceiver pair and a "sync" pair, eg

postgres  905650  0.0  0.0 186052  11888 ?  Ss  17:12  0:00 postgres: logical replication worker for subscription 16398
postgres  905651 50.1  0.0 173704  13496 ?  Ss  17:12  0:09 postgres: walsender postgres [local] idle
postgres  905652  104  0.4 186832 148608 ?  Rs  17:12  0:19 postgres: logical replication worker for subscription 16398 sync 16393
postgres  905653 12.2  0.0 174380  15524 ?  Ss  17:12  0:02 postgres: walsender postgres [local] COPY

Is it supposed to be like that?

Notice also that the regular walsender has consumed significant CPU time; it's not pinning a CPU like the sync walreceiver is, but it's eating maybe 20% of a CPU according to "top". I wonder whether in cases with only small tables (which is likely all that our tests test), the regular walreceiver manages to complete the table sync despite repeated(?) failures of the sync worker.

			regards, tom lane
Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop
From
Peter Eisentraut
Date:
On 2020-09-30 23:32, Tom Lane wrote:
> The question that this raises is how the heck did that get past our test suites? It seems like the error should have been obvious to even the most minimal testing.

What's the difference between this case and what the test suite is testing? Is it that it replicates between two databases on the same instance?

-- 
Peter Eisentraut
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop
From
Peter Eisentraut
Date:
On 2020-09-30 23:52, Tom Lane wrote:
> One thing I noticed while trying to trace this down is that while the initial table sync is happening, we have *both* a regular walsender/walreceiver pair and a "sync" pair, eg
>
> postgres  905650  0.0  0.0 186052  11888 ?  Ss  17:12  0:00 postgres: logical replication worker for subscription 16398
> postgres  905651 50.1  0.0 173704  13496 ?  Ss  17:12  0:09 postgres: walsender postgres [local] idle
> postgres  905652  104  0.4 186832 148608 ?  Rs  17:12  0:19 postgres: logical replication worker for subscription 16398 sync 16393
> postgres  905653 12.2  0.0 174380  15524 ?  Ss  17:12  0:02 postgres: walsender postgres [local] COPY

That's normal. You could also have even more if tables are syncing in parallel.

-- 
Peter Eisentraut
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop
From
Tom Lane
Date:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> On 2020-Oct-01, Peter Eisentraut wrote:
>> What's the difference between this case and what the test suite is testing? Is it that it replicates between two databases on the same instance?

> I don't know why the tests pass, but the message
>   ERROR: error reading result of streaming command:
> does appear in the logs after running src/test/subscription many times (I see it in tests 001, 002, 013 and 014, apart from the new one in 100).

It's indeed surprising that these tests all pass!

> I turned Henry's reproducer into the attached TAP test, and it does reproduce the problem; but if I reduce the number of rows from 5000 to 1000, then it no longer does. I don't quite see why this would be a problem with a larger table only. Do you?

I think we really need to figure that out before concluding that this problem is solved. Now that we've seen this, I'm wondering urgently what other coverage gaps we've got there.

> The fix is the commented-out line in walsender.c; the test reliably passes for me if I uncomment that, and the error message disappear from the server logs in all the other tests.

I agree that this is what we need to do code-wise; we can't let the protocol break stand, or we'll break all sorts of cross-version replication scenarios. However, we'd better also change the protocol spec to say that this is what is supposed to happen.

			regards, tom lane
Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop
From
Alvaro Herrera
Date:
On 2020-Sep-30, Tom Lane wrote:

> After some digging around, I realize that that commit actually resulted in a protocol break. libpqwalreceiver is expecting to get an additional CommandComplete message after COPY OUT finishes, per libpqrcv_endstreaming(), and it's no longer getting one.
>
> (I have not read the protocol document to see if this is per spec; but spec or no, that's what libpqwalreceiver is expecting.)
>
> The question that this raises is how the heck did that get past our test suites? It seems like the error should have been obvious to even the most minimal testing.

So I think I can answer this now. Petr, as intellectual author of this code I would appreciate your thoughts on this -- you probably understand this much better. Same goes for Peter as committer.

I think the logical replication feature is missing a detailed architectural diagram.

What is happening is that the tablesync code, which is in charge of initial syncing of tables, has two main code paths: the first one is an optimization (?) in LogicalRepSyncTableStart(): if the sync takes little time, the process will exit immediately (via finish_sync_worker()) without returning control to the main logical replication apply worker code. In the TAP tests, all tables are pretty small, so that optimization always fires.

So in that case we never get control back to walrcv_endstreaming (the complainant in Henry's test case), which is why replication never fails in the tests. You can verify this because the final "return slotname" line in LogicalRepSyncTableStart has zero coverage.

The other code path returns the slot name to ApplyWorkerMain, which does the normal walrcv_startstreaming / LogicalRepApplyLoop / endstreaming dance. And whenever that code path is taken, all the src/test/subscription tests fail with the missing command tag, and they work correctly when the command tag is restored, which is what we wanted.

My proposal at this stage is to remove the optimization in LogicalRepSyncTableStart, per 0001, if only because the code becomes simpler (0002 reindents code). With this, we get a coverage increase for tablesync.c not only in the final "return slotname" but also for process_syncing_tables_for_sync() which is currently uncovered:
https://coverage.postgresql.org/src/backend/replication/logical/tablesync.c.gcov.html#268

However, and this is one reason why I'd welcome Petr/Peter thoughts on this, I don't really understand what happens in LogicalRepApplyLoop afterwards with a tablesync worker; are we actually doing anything useful there, considering that the actual data copy seems to have occurred in the CopyFrom() call in copy_table? In other words, by the time we return control to ApplyWorkerMain with a slot name, isn't the work all done, and the only thing we need is to synchronize protocol and close the connection? Or is it?

Another thing I noticed is that if I crank up the number of rows in the new t/100_bugs.pl test case(*), the two sync workers run in sequence -- not in parallel. What's going on with that? Shouldn't there be two simultaneous workers?

(*) In the submitted patch, the test uses 10 million rows. I intend to reduce that to, say, 5000 rows in the committed version. Strictly speaking, there's no need for this test at all since with the LogicalRepSyncTableStart changes I propose, even t/001_rep_changes.pl fails. But since my new test case fails even without that change, I prefer to keep it anyway.
Attachment
Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop
From
Petr Jelinek
Date:
Hi,

On 10/10/2020 21:06, Alvaro Herrera wrote:
> On 2020-Sep-30, Tom Lane wrote:
>
>> After some digging around, I realize that that commit actually resulted in a protocol break. libpqwalreceiver is expecting to get an additional CommandComplete message after COPY OUT finishes, per libpqrcv_endstreaming(), and it's no longer getting one.
>>
>> (I have not read the protocol document to see if this is per spec; but spec or no, that's what libpqwalreceiver is expecting.)
>>
>> The question that this raises is how the heck did that get past our test suites? It seems like the error should have been obvious to even the most minimal testing.
>
> So I think I can answer this now. Petr, as intellectual author of this code I would appreciate your thoughts on this -- you probably understand this much better. Same goes for Peter as committer.
>
> I think the logical replication feature is missing an detailed architectural diagram.
>
> What is happening is that the tablesync code, which is in charge of initial syncing of tables, has two main code paths: the first one is an optimization (?) in LogicalRepSyncTableStart() that if the sync takes little time, the process will exit immediately (via finish_sync_worker()) without returning control to the main logical replication apply worker code. In the TAP tests, all tables are pretty small, so that optimization always fires.

It's not only about the size of the tables, it's mainly that there is no write activity, so the main apply is not moving past the LSN at which the table sync started. With a bigger table you get at least something written (running xacts, autovacuum, or whatever) that moves the LSN forward eventually.

> So in that case we never get control back to walrcv_endstreaming (the complainant in Henry's test case), which is why replication never fails in the tests. You can verify this because the final "return slotname" line in LogicalRepSyncTableStart has zero coverage.

Correct.

> The other code path returns the slot name to ApplyWorkerMain, which does the normal walrcv_startstreaming / LogicalRepApplyLoop / endstreaming dance. And whenever that code path is taken, all the src/test/subscription tests fail with the missing command tag, and they work correctly when the command tag is restored, which is what we wanted.
>
> My proposal at this stage is to remove the optimization in LogicalRepSyncTableStart, per 0001, if only because the code becomes simpler (0002 reindents code). With this, we get a coverage increase for tablesync.c not only in the final "return slotname" but also for process_syncing_tables_for_sync() which is currently uncovered:
> https://coverage.postgresql.org/src/backend/replication/logical/tablesync.c.gcov.html#268
>
> However, and this is one reason why I'd welcome Petr/Peter thoughts on this, I don't really understand what happens in LogicalRepApplyLoop afterwards with a tablesync worker; are we actually doing anything useful there, considering that the actual data copy seems to have occurred in the CopyFrom() call in copy_table? In other words, by the time we return control to ApplyWorkerMain with a slot name, isn't the work all done, and the only thing we need is to synchronize protocol and close the connection?

There are 2 possible states at that point: either tablesync is ahead (when the main apply lags or nothing is happening on the publication side) or it's behind the main apply.

When tablesync is ahead we are indeed done and just need to update the state of the table (which is what the code you removed did, but LogicalRepApplyLoop should do it as well, just a bit later). When it's behind we need to do catchup for that table only, which still happens in the tablesync worker. See the explanation at the beginning of tablesync.c; it probably needs some small adjustments after the changes in your first patch.

> Another thing I noticed is that if I crank up the number of rows in the new t/100_bugs.pl test case(*), the two sync workers run in sequence -- not in parallel. What's going on with that? Shouldn't there be two simultaneous workers?

That's likely because the second replication slot is waiting for the xid created by the first one to commit or abort while building a consistent snapshot.

-- 
Petr Jelinek
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/
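For orientation, the two code paths being discussed can be summarized in a deliberately simplified sketch. Every name below is a hypothetical stand-in, not the real tablesync.c code; the real LogicalRepSyncTableStart() is considerably more involved.

#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the real tablesync machinery. */
static void copy_table_sketch(void) { /* initial COPY of the table */ }
static bool sync_is_ahead_of_apply(void) { return true; }
static void finish_sync_worker_sketch(void) { exit(0); }	/* does not return */
static char sync_slot_name[] = "my_slot_16398_sync_16393";

/*
 * Sketch of the two outcomes of starting a table sync, as described above.
 */
static char *
sync_table_start_sketch(void)
{
	copy_table_sketch();

	if (sync_is_ahead_of_apply())
	{
		/*
		 * "Optimization" path: nothing left to replay for this table, so
		 * mark it done and exit without returning to the apply loop.  The
		 * small-table TAP tests always take this path, which is why the
		 * missing CommandComplete was never exercised there.
		 */
		finish_sync_worker_sketch();
	}

	/*
	 * Catchup path: hand the slot name back to the caller, which then
	 * streams changes for this table until it catches up and finally
	 * ends streaming -- the spot that tripped over the protocol change.
	 */
	return sync_slot_name;
}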
Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop
From
Petr Jelinek
Date:
Hi,

On 14/10/2020 03:12, Alvaro Herrera wrote:
> On 2020-Oct-12, Petr Jelinek wrote:
>
>>> However, and this is one reason why I'd welcome Petr/Peter thoughts on this, I don't really understand what happens in LogicalRepApplyLoop afterwards with a tablesync worker; are we actually doing anything useful there, considering that the actual data copy seems to have occurred in the CopyFrom() call in copy_table? In other words, by the time we return control to ApplyWorkerMain with a slot name, isn't the work all done, and the only thing we need is to synchronize protocol and close the connection?
>>
>> There are 2 possible states at that point, either tablesync is ahead (when main apply lags or nothing is happening on publication side) or it's behind the main apply. When tablesync is ahead we are indeed done and just need to update the state of the table (which is what the code you removed did, but LogicalRepApplyLoop should do it as well, just a bit later). When it's behind we need to do catchup for that table only which still happens in the tablesync worker. See the explanation at the beginning of tablesync.c, it probably needs some small adjustments after the changes in your first patch.
>
> ... Ooh, things start to make some sense now. So how about the attached? There are some not really related cleanups. (Changes to protocol.sgml are still pending.)

It would be nice if the new sentences at the beginning of tablesync.c started with uppercase, but that's about as nitpicky as I can be :)

> If I understand correcly, the early exit in tablesync.c is not saving *a lot* of time (we don't actually skip replaying any WAL), even if it's saving execution of a bunch of code. So I stand by my position that removing the code is better because it's clearer about what is actually happening.

I don't really have any problems with the simplification you propose. The saved time is probably on the order of hundreds of ms, which for table sync is insignificant.

-- 
Petr Jelinek
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/
Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop
From
Amit Kapila
Date:
On Thu, Oct 15, 2020 at 8:20 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> On 2020-Oct-14, Petr Jelinek wrote:
>
> > It would be nice if the new sentences at the beginning of tablesync.c started with uppercase, but that's about as nitpicky as I can be :)
>
> OK, fixed that :-) And pushed (to master only). There's one more change I added at the last minute, which is to remove the 'missing_ok' parameter of GetSubscriptionRelState. There are some other cosmetic changes, but nothing of substance.
>
> > > If I understand correcly, the early exit in tablesync.c is not saving *a lot* of time (we don't actually skip replaying any WAL), even if it's saving execution of a bunch of code. So I stand by my position that removing the code is better because it's clearer about what is actually happening.
> >
> > I don't really have any problems with the simplification you propose. The saved time is probably in order of hundreds of ms which for table sync is insignificant.
>
> Great, thanks.
>
> If you think this is done ... I have taken a few notes in the process:
>
> * STREAM COMMIT bug?
> In apply_handle_stream_commit, we do CommitTransactionCommand, but apparently in a tablesync worker we shouldn't do it.
>

In the tablesync stage, we don't allow streaming. See pgoutput_startup where we disable streaming for the init phase. As far as I understand, for tablesync we create the initial slot during which streaming will be disabled then we will copy the table (here logical decoding won't be used) and then allow the apply worker to get any other data which is inserted in the meantime. Now, I might be missing something here but if you can explain it a bit more or share some test to show how we can reach here via tablesync worker then we can discuss the possible solution.

Sorry for responding late.

-- 
With Regards,
Amit Kapila.
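For illustration, the "streaming is disabled for the init phase" behavior amounts to something like the following sketch. The type, function, and parameter names here are made up for the example; the real pgoutput_startup() does much more (option parsing, protocol-version checks, and so on).

#include <stdbool.h>

/* Hypothetical, simplified stand-in for the decoding context. */
typedef struct SketchDecodingContext
{
	bool	streaming;	/* may in-progress transactions be streamed? */
} SketchDecodingContext;

/*
 * Sketch of the idea: while the replication slot is being created (the
 * "init" phase used by tablesync), streaming is simply turned off;
 * otherwise the subscriber's request is honored.
 */
static void
output_plugin_startup_sketch(SketchDecodingContext *ctx,
							 bool is_init,
							 bool subscriber_wants_streaming)
{
	if (is_init)
		ctx->streaming = false;		/* never stream during slot creation */
	else
		ctx->streaming = subscriber_wants_streaming;
}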
Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop
From
Alvaro Herrera
Date:
On 2020-Nov-04, Amit Kapila wrote:

> On Thu, Oct 15, 2020 at 8:20 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> > * STREAM COMMIT bug?
> > In apply_handle_stream_commit, we do CommitTransactionCommand, but apparently in a tablesync worker we shouldn't do it.
>
> In the tablesync stage, we don't allow streaming. See pgoutput_startup where we disable streaming for the init phase. As far as I understand, for tablesync we create the initial slot during which streaming will be disabled then we will copy the table (here logical decoding won't be used) and then allow the apply worker to get any other data which is inserted in the meantime. Now, I might be missing something here but if you can explain it a bit more or share some test to show how we can reach here via tablesync worker then we can discuss the possible solution.

Hmm, okay, that sounds like there would be no bug then. Maybe what we need is just an assert in apply_handle_stream_commit that !am_tablesync_worker(), as in the attached patch. Passes tests.
Attachment
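The proposed assertion amounts to roughly the following sketch (an illustration of the idea under discussion, not the attached patch itself; the rest of the function body is elided):

/* Illustrative sketch only -- not the attached patch. */
static void
apply_handle_stream_commit(StringInfo s)
{
	/*
	 * Streamed transactions are assumed never to reach a tablesync
	 * worker, so make that assumption explicit.
	 */
	Assert(!am_tablesync_worker());

	/*
	 * ... existing stream-commit handling, including the final
	 * CommitTransactionCommand(), would continue here ...
	 */
}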
Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop
From
Amit Kapila
Date:
On Wed, Nov 4, 2020 at 7:19 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> On 2020-Nov-04, Amit Kapila wrote:
>
> > On Thu, Oct 15, 2020 at 8:20 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> >
> > > * STREAM COMMIT bug?
> > > In apply_handle_stream_commit, we do CommitTransactionCommand, but apparently in a tablesync worker we shouldn't do it.
> >
> > In the tablesync stage, we don't allow streaming. See pgoutput_startup where we disable streaming for the init phase. As far as I understand, for tablesync we create the initial slot during which streaming will be disabled then we will copy the table (here logical decoding won't be used) and then allow the apply worker to get any other data which is inserted in the meantime. Now, I might be missing something here but if you can explain it a bit more or share some test to show how we can reach here via tablesync worker then we can discuss the possible solution.
>
> Hmm, okay, that sounds like there would be no bug then. Maybe what we need is just an assert in apply_handle_stream_commit that !am_tablesync_worker(), as in the attached patch. Passes tests.
>

+1. But do we want to have this Assert only in stream_commit API or all stream APIs as well?

-- 
With Regards,
Amit Kapila.
Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop
From
Amit Kapila
Date:
On Thu, Nov 5, 2020 at 9:13 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Nov 4, 2020 at 7:19 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> >
> > On 2020-Nov-04, Amit Kapila wrote:
> >
> > > On Thu, Oct 15, 2020 at 8:20 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > >
> > > > * STREAM COMMIT bug?
> > > > In apply_handle_stream_commit, we do CommitTransactionCommand, but apparently in a tablesync worker we shouldn't do it.
> > >
> > > In the tablesync stage, we don't allow streaming. See pgoutput_startup where we disable streaming for the init phase. As far as I understand, for tablesync we create the initial slot during which streaming will be disabled then we will copy the table (here logical decoding won't be used) and then allow the apply worker to get any other data which is inserted in the meantime. Now, I might be missing something here but if you can explain it a bit more or share some test to show how we can reach here via tablesync worker then we can discuss the possible solution.
> >
> > Hmm, okay, that sounds like there would be no bug then. Maybe what we need is just an assert in apply_handle_stream_commit that !am_tablesync_worker(), as in the attached patch. Passes tests.
> >
>
> +1. But do we want to have this Assert only in stream_commit API or all stream APIs as well?
>

One more point to look at here is at what point the tablesync worker gets involved in applying decoded transactions, if at all. Basically, I would like to ensure that if it uses the slot it has initially created (before the copy) then it is probably fine, because we don't enable streaming with it during the initial phase; but if it uses the slot to decode xacts after the copy then we should check once whether it is still true that streaming is not enabled at that point. I am not completely sure if there are existing test cases to cover any such scenarios, so probably thinking a bit more about this might be helpful.

-- 
With Regards,
Amit Kapila.
Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop
From
Alvaro Herrera
Date:
On 2020-Nov-05, Amit Kapila wrote:

> On Wed, Nov 4, 2020 at 7:19 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> >
> > On 2020-Nov-04, Amit Kapila wrote:
> >
> > > On Thu, Oct 15, 2020 at 8:20 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > >
> > > > * STREAM COMMIT bug?
> > > > In apply_handle_stream_commit, we do CommitTransactionCommand, but apparently in a tablesync worker we shouldn't do it.
> > >
> > > In the tablesync stage, we don't allow streaming. See pgoutput_startup where we disable streaming for the init phase. As far as I understand, for tablesync we create the initial slot during which streaming will be disabled then we will copy the table (here logical decoding won't be used) and then allow the apply worker to get any other data which is inserted in the meantime. Now, I might be missing something here but if you can explain it a bit more or share some test to show how we can reach here via tablesync worker then we can discuss the possible solution.
> >
> > Hmm, okay, that sounds like there would be no bug then. Maybe what we need is just an assert in apply_handle_stream_commit that !am_tablesync_worker(), as in the attached patch. Passes tests.
>
> +1. But do we want to have this Assert only in stream_commit API or all stream APIs as well?

Well, the only reason I care about this is that apply_handle_commit contains a comment that we must not do CommitTransactionCommand in the syncworker case; so if you look at apply_handle_stream_commit and note that it doesn't concern itself with that, you become concerned that it might be broken. I don't think the other routines handling the "stream" thing have that issue.
Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop
From
Amit Kapila
Date:
On Sat, Nov 7, 2020 at 5:31 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> On 2020-Nov-05, Amit Kapila wrote:
>
> > On Wed, Nov 4, 2020 at 7:19 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > >
> > > On 2020-Nov-04, Amit Kapila wrote:
> > >
> > > > On Thu, Oct 15, 2020 at 8:20 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > > >
> > > > > * STREAM COMMIT bug?
> > > > > In apply_handle_stream_commit, we do CommitTransactionCommand, but apparently in a tablesync worker we shouldn't do it.
> > > >
> > > > In the tablesync stage, we don't allow streaming. See pgoutput_startup where we disable streaming for the init phase. As far as I understand, for tablesync we create the initial slot during which streaming will be disabled then we will copy the table (here logical decoding won't be used) and then allow the apply worker to get any other data which is inserted in the meantime. Now, I might be missing something here but if you can explain it a bit more or share some test to show how we can reach here via tablesync worker then we can discuss the possible solution.
> > >
> > > Hmm, okay, that sounds like there would be no bug then. Maybe what we need is just an assert in apply_handle_stream_commit that !am_tablesync_worker(), as in the attached patch. Passes tests.
> > >
> >
> > +1. But do we want to have this Assert only in stream_commit API or all stream APIs as well?
>
> Well, the only reason I care about this is that apply_handle_commit contains a comment that we must not do CommitTransactionCommand in the syncworker case; so if you look at apply_handle_stream_commit and note that it doesn't concern it about that, you become concerned that it might be broken. I don't think the other routines handling the "stream" thing have that issue.
>

Fair enough. As mentioned in my previous email, I think we need to confirm how, after the copy, decoding happens on the upstream for transactions during the phase where the tablesync worker moves from SUBREL_STATE_CATCHUP to SUBREL_STATE_SYNCDONE. I'll try to come up (in the next few days) with some test case to debug and test this particular scenario and share my findings.

-- 
With Regards,
Amit Kapila.
Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop
From
Dilip Kumar
Date:
On Sat, Nov 7, 2020 at 9:23 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Sat, Nov 7, 2020 at 5:31 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> >
> > On 2020-Nov-05, Amit Kapila wrote:
> >
> > > On Wed, Nov 4, 2020 at 7:19 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > > >
> > > > On 2020-Nov-04, Amit Kapila wrote:
> > > >
> > > > > On Thu, Oct 15, 2020 at 8:20 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > > > >
> > > > > > * STREAM COMMIT bug?
> > > > > > In apply_handle_stream_commit, we do CommitTransactionCommand, but apparently in a tablesync worker we shouldn't do it.
> > > > >
> > > > > In the tablesync stage, we don't allow streaming. See pgoutput_startup where we disable streaming for the init phase. As far as I understand, for tablesync we create the initial slot during which streaming will be disabled then we will copy the table (here logical decoding won't be used) and then allow the apply worker to get any other data which is inserted in the meantime. Now, I might be missing something here but if you can explain it a bit more or share some test to show how we can reach here via tablesync worker then we can discuss the possible solution.
> > > >
> > > > Hmm, okay, that sounds like there would be no bug then. Maybe what we need is just an assert in apply_handle_stream_commit that !am_tablesync_worker(), as in the attached patch. Passes tests.
> > > >
> > >
> > > +1. But do we want to have this Assert only in stream_commit API or all stream APIs as well?
> >
> > Well, the only reason I care about this is that apply_handle_commit contains a comment that we must not do CommitTransactionCommand in the syncworker case; so if you look at apply_handle_stream_commit and note that it doesn't concern it about that, you become concerned that it might be broken. I don't think the other routines handling the "stream" thing have that issue.
> >
>
> Fair enough, as mentioned in my previous email, I think we need to confirm once that after copy how the decoding happens on upstream for transactions during the phase where tablesync workers is moving to state SUBREL_STATE_SYNCDONE from SUBREL_STATE_CATCHUP. I'll try to come up (in next few days) with some test case to debug and test this particular scenario and share my findings.

IIUC, the tablesync worker does the initial copy using the consistent snapshot. After that, if the main apply worker is behind us, the tablesync worker waits for the apply worker to reach the tablesync worker's start point, and after that the apply worker can continue applying the changes. OTOH, if the apply worker has already moved ahead in processing the WAL after it launched the tablesync worker, that means the apply worker will have skipped those transactions because the table was not yet in the SYNCDONE state; so now the tablesync worker needs to cover this gap by applying the WAL using the normal apply path, and it can be moved to the SYNCDONE state once it catches up with the actual apply worker. After putting the tablesync worker in the catchup state, the apply worker will wait for the tablesync worker to catch up.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop
From
Dilip Kumar
Date:
On Wed, Nov 4, 2020 at 10:58 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Oct 15, 2020 at 8:20 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> >
> > On 2020-Oct-14, Petr Jelinek wrote:
> >
> > > It would be nice if the new sentences at the beginning of tablesync.c started with uppercase, but that's about as nitpicky as I can be :)
> >
> > OK, fixed that :-) And pushed (to master only). There's one more change I added at the last minute, which is to remove the 'missing_ok' parameter of GetSubscriptionRelState. There are some other cosmetic changes, but nothing of substance.
> >
> > > > If I understand correcly, the early exit in tablesync.c is not saving *a lot* of time (we don't actually skip replaying any WAL), even if it's saving execution of a bunch of code. So I stand by my position that removing the code is better because it's clearer about what is actually happening.
> > >
> > > I don't really have any problems with the simplification you propose. The saved time is probably in order of hundreds of ms which for table sync is insignificant.
> >
> > Great, thanks.
> >
> > If you think this is done ... I have taken a few notes in the process:
> >
> > * STREAM COMMIT bug?
> > In apply_handle_stream_commit, we do CommitTransactionCommand, but apparently in a tablesync worker we shouldn't do it.
> >
>
> In the tablesync stage, we don't allow streaming. See pgoutput_startup where we disable streaming for the init phase. As far as I understand, for tablesync we create the initial slot during which streaming will be disabled then we will copy the table (here logical decoding won't be used) and then allow the apply worker to get any other data which is inserted in the meantime.

I think this assumption is not completely correct, because if the tablesync worker is behind the apply worker then it will start streaming by itself until it moves from the CATCHUP to the SYNCDONE state. So during that time, if streaming is on, the tablesync worker will also request streaming. I think for disabling streaming in the tablesync worker we can do something like this:

diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 3a5b733ee3..21ac29f703 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -3084,7 +3084,7 @@ ApplyWorkerMain(Datum main_arg)
 LOGICALREP_PROTO_STREAM_VERSION_NUM : LOGICALREP_PROTO_VERSION_NUM;
 options.proto.logical.publication_names = MySubscription->publications;
 options.proto.logical.binary = MySubscription->binary;
- options.proto.logical.streaming = MySubscription->stream;
+ options.proto.logical.streaming = am_tablesync_worker() ? false : MySubscription->stream;

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop
From
Amit Kapila
Date:
On Sat, Nov 7, 2020 at 11:33 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Wed, Nov 4, 2020 at 10:58 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > In the tablesync stage, we don't allow streaming. See pgoutput_startup where we disable streaming for the init phase. As far as I understand, for tablesync we create the initial slot during which streaming will be disabled then we will copy the table (here logical decoding won't be used) and then allow the apply worker to get any other data which is inserted in the meantime.
>
> I think this assumption is not completely correct, because if the tablesync worker is behind the apply worker then it will start the streaming by itself until it reaches from CATCHUP to SYNC DONE state. So during that time if streaming is on then the tablesync worker will also send the streaming on.
>

Yeah, this seems to be possible, and this is the reason I mentioned above to dig more into this case. Did you try it via some test case? I think we can generate a test via the debugger: after the tablesync worker reaches the CATCHUP state, stop it via the debugger, then perform some large transaction on the same table, which the apply worker will skip and the tablesync worker will try to apply -- and it should fail.

> I think for disabling the streaming in the tablesync worker we can do something like this.
>

Sure, but why do we want to prohibit streaming in the tablesync worker unless there is some fundamental reason for the same? If we can write a test based on what I described above then we can probably know if there is any real issue with allowing streaming via the tablesync worker.

-- 
With Regards,
Amit Kapila.
Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop
From
Dilip Kumar
Date:
On Mon, Nov 9, 2020 at 11:50 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Sat, Nov 7, 2020 at 11:33 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Wed, Nov 4, 2020 at 10:58 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > In the tablesync stage, we don't allow streaming. See pgoutput_startup where we disable streaming for the init phase. As far as I understand, for tablesync we create the initial slot during which streaming will be disabled then we will copy the table (here logical decoding won't be used) and then allow the apply worker to get any other data which is inserted in the meantime.
> >
> > I think this assumption is not completely correct, because if the tablesync worker is behind the apply worker then it will start the streaming by itself until it reaches from CATCHUP to SYNC DONE state. So during that time if streaming is on then the tablesync worker will also send the streaming on.
> >
>
> Yeah, this seems to be possible and this is the reason I mentioned above to dig more into this case. Did you try it via some test case? I think we can generate a test via debugger where after the tablesync worker reaches CATCHUP state stop it via debugger, then we can perform some large transaction on the same table which apply worker will skip and tablesync worker will try to apply changes and should fail.

Yeah, we can test like that. I haven't tested it yet.

> > I think for disabling the streaming in the tablesync worker we can do something like this.
> >
>
> Sure, but why do we want to prohibit streaming in tablesync worker unless there is some fundamental reason for the same? If we can write a test based on what I described above then we can probably know if there is any real issue with allowing streaming via tablesync worker.

I think there is no fundamental reason for the same, but I thought it is just an initial catchup state, so does it make sense to stream the transaction? But if we want to stream, then we need to put some handling in apply_handle_stream_commit so that it can avoid committing if this is the tablesync worker.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop
From
Amit Kapila
Date:
On Mon, Nov 9, 2020 at 1:33 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Mon, Nov 9, 2020 at 11:50 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Sat, Nov 7, 2020 at 11:33 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > Sure, but why do we want to prohibit streaming in tablesync worker unless there is some fundamental reason for the same? If we can write a test based on what I described above then we can probably know if there is any real issue with allowing streaming via tablesync worker.
>
> I think there is no fundamental reason for the same, but I thought it is just an initial catchup state so does it make sense to stream the transaction. But if we want to stream then we need to put some handling in apply_handle_stream_commit so that it can avoid committing if this is the table-sync worker.
>

Sure, let's try that then, instead of blindly avoiding streaming, if that is possible.

-- 
With Regards,
Amit Kapila.
Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop
From
Peter Smith
Date:
On Tue, Nov 17, 2020 at 1:07 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> Yeah, this seems to be possible and this is the reason I mentioned above to dig more into this case. Did you try it via some test case? I think we can generate a test via debugger where after the tablesync worker reaches CATCHUP state stop it via debugger, then we can perform some large transaction on the same table which apply worker will skip and tablesync worker will try to apply changes and should fail.

Hello Amit.

FYI - This email is just to confirm that your above idea for debugging the tablesync worker does work.

I have made a small temporary patch to aid testing this: PSA. The patch just adds some more LOGs - it helps see what is going on. The patch also gives a 30 second opportunity to attach to the tablesync worker process.

----

So the necessary debugging steps are like this:

0. Start debugger
- Later we will attach to the tablesync worker process, so it is convenient to set all the breakpoints in advance
(gdb) b LogicalRepSyncTableStart
(gdb) b LogicalRepApplyLoop
(gdb) b apply_dispatch
(gdb) b process_syncing_tables_for_sync

1. Pub/Sub nodes: Create/Start Publisher and Subscriber nodes with the same test table.

2. Pub node: Add some initial data to the table (to give tablesync init something to copy)

3. Pub node: CREATE PUBLICATION

4. Sub node: CREATE SUBSCRIPTION
- at this point the slot gets created and the tablesync worker and apply worker are launched
- the patch causes the tablesync worker to sleep 30 seconds to give an opportunity to attach to it in the debugger

5. Debugger/tablesync: Attach to the tablesync worker
- continue
- breaks at LogicalRepSyncTableStart
- step over all the initial COPY code to the end of the function (where the apply worker has told it to CATCHUP)

6. Pub: While the tablesync worker is paused in the debugger, do some other operations on the original table
- e.g. psql -d test_pub -c "INSERT INTO test_tab VALUES(1, 'foo');"

7. Debugger/tablesync: Allow the tablesync worker to continue
- the tablesync worker will return from LogicalRepSyncTableStart and then enter LogicalRepApplyLoop
- if the tablesync worker detects it is "behind" then LogicalRepApplyLoop will call apply_dispatch to deal with the message resulting from the step 6 operation

8. Debugger/tablesync: Some apply handlers (e.g. apply_handle_commit), if not LogicalRepApplyLoop itself, will end up calling process_syncing_tables
- which calls process_syncing_tables_for_sync
- which will cause the tablesync worker to set SYNCDONE status just before killing itself

9. Sub node: The apply worker wait loop will now proceed again, tick/tick/tick every 1 second, to process any incoming replication messages.

---

Using the above technique, you can cause a tablesync worker to enter some of the normal "apply" handling which might otherwise be very hard to reproduce outside of the debugger.

---

Here is an example of some of the logging for the above debugging scenario:

2020-11-17 10:46:11.325 AEDT [9838] LOG: !!>> apply worker: called process_syncing_tables
2020-11-17 10:46:12.328 AEDT [9838] LOG: !!>> apply worker: LogicalRepApplyLoop
2020-11-17 10:46:12.328 AEDT [9838] LOG: !!>> apply worker: called process_syncing_tables
2020-11-17 10:46:13.337 AEDT [9838] LOG: !!>> apply worker: LogicalRepApplyLoop
2020-11-17 10:46:13.337 AEDT [9838] LOG: !!>> apply worker: called process_syncing_tables
2020-11-17 10:46:14.340 AEDT [9838] LOG: !!>> apply worker: LogicalRepApplyLoop
2020-11-17 10:46:14.340 AEDT [9838] LOG: !!>> apply worker: called process_syncing_tables
2020-11-17 10:46:15.155 AEDT [9841] LOG: !!>> tablesync worker: About to call LogicalRepSyncTableStart to do initial syncing
2020-11-17 10:46:15.343 AEDT [9838] LOG: !!>> apply worker: LogicalRepApplyLoop
2020-11-17 10:46:15.343 AEDT [9838] LOG: !!>> apply worker: called process_syncing_tables
2020-11-17 10:46:15.343 AEDT [9838] LOG: !!>> apply worker: LogicalRepApplyLoop
2020-11-17 10:46:15.343 AEDT [9838] LOG: !!>> apply worker: called process_syncing_tables
2020-11-17 10:46:16.349 AEDT [9838] LOG: !!>> apply worker: LogicalRepApplyLoop
2020-11-17 10:46:16.349 AEDT [9838] LOG: !!>> apply worker: called process_syncing_tables
2020-11-17 10:46:17.351 AEDT [9838] LOG: !!>> apply worker: LogicalRepApplyLoop
2020-11-17 10:46:17.351 AEDT [9838] LOG: !!>> apply worker: called process_syncing_tables
2020-11-17 10:46:18.353 AEDT [9838] LOG: !!>> apply worker: LogicalRepApplyLoop
2020-11-17 10:46:18.353 AEDT [9838] LOG: !!>> apply worker: called process_syncing_tables
2020-11-17 10:46:18.650 AEDT [10734] LOG: logical decoding found consistent point at 0/1B6D5E0
2020-11-17 10:46:18.650 AEDT [10734] DETAIL: There are no running transactions.
2020-11-17 10:46:18.650 AEDT [10734] STATEMENT: CREATE_REPLICATION_SLOT "tap_sub_24599_sync_16385" TEMPORARY LOGICAL pgoutput USE_SNAPSHOT
2020-11-17 10:46:18.658 AEDT [9838] LOG: !!>> apply worker: LogicalRepApplyLoop
2020-11-17 10:46:18.658 AEDT [9838] LOG: !!>> apply worker: called process_syncing_tables
2020-11-17 10:46:18.744 AEDT [9841] LOG: !!>> tablesync worker: wait for CATCHUP state notification
2020-11-17 10:46:18.744 AEDT [9838] LOG: !!>> apply worker: LogicalRepApplyLoop
2020-11-17 10:46:18.744 AEDT [9838] LOG: !!>> apply worker: called process_syncing_tables

psql -d test_pub -c "INSERT INTO test_tab VALUES(1, 'foo');"
INSERT 0 1
[postgres@CentOS7-x64 ~]$

2020-11-17 10:47:39.269 AEDT [9841] LOG: !!>> tablesync worker: received CATCHUP state notification
2020-11-17 10:47:44.117 AEDT [9841] LOG: !!>> tablesync worker: Returned from LogicalRepSyncTableStart
2020-11-17 10:48:05.678 AEDT [10734] LOG: starting logical decoding for slot "tap_sub_24599_sync_16385"
2020-11-17 10:48:05.678 AEDT [10734] DETAIL: Streaming transactions committing after 0/1B6D618, reading WAL from 0/1B6D5E0.
2020-11-17 10:48:05.678 AEDT [10734] STATEMENT: START_REPLICATION SLOT "tap_sub_24599_sync_16385" LOGICAL 0/1B6D618 (proto_version '2', publication_names '"tap_pub"')
2020-11-17 10:48:05.678 AEDT [10734] LOG: logical decoding found consistent point at 0/1B6D5E0
2020-11-17 10:48:05.678 AEDT [10734] DETAIL: There are no running transactions.
2020-11-17 10:48:05.678 AEDT [10734] STATEMENT: START_REPLICATION SLOT "tap_sub_24599_sync_16385" LOGICAL 0/1B6D618 (proto_version '2', publication_names '"tap_pub"')
2020-11-17 10:48:09.258 AEDT [9841] LOG: !!>> tablesync worker: LogicalRepApplyLoop
2020-11-17 10:49:36.537 AEDT [9841] LOG: !!>> tablesync worker: called process_syncing_tables
2020-11-17 10:49:36.538 AEDT [9841] LOG: logical replication table synchronization worker for subscription "tap_sub", table "test_tab" has finished
2020-11-17 10:49:36.588 AEDT [9838] LOG: !!>> apply worker: LogicalRepApplyLoop
2020-11-17 10:49:36.589 AEDT [9838] LOG: !!>> apply worker: called process_syncing_tables
2020-11-17 10:49:36.590 AEDT [9838] LOG: !!>> apply worker: called process_syncing_tables
2020-11-17 10:49:37.591 AEDT [9838] LOG: !!>> apply worker: LogicalRepApplyLoop
2020-11-17 10:49:37.591 AEDT [9838] LOG: !!>> apply worker: called process_syncing_tables
2020-11-17 10:49:38.592 AEDT [9838] LOG: !!>> apply worker: LogicalRepApplyLoop
2020-11-17 10:49:38.592 AEDT [9838] LOG: !!>> apply worker: called process_syncing_tables
2020-11-17 10:49:39.594 AEDT [9838] LOG: !!>> apply worker: LogicalRepApplyLoop
2020-11-17 10:49:39.594 AEDT [9838] LOG: !!>> apply worker: called process_syncing_tables
2020-11-17 10:49:40.595 AEDT [9838] LOG: !!>> apply worker: LogicalRepApplyLoop
2020-11-17 10:49:40.595 AEDT [9838] LOG: !!>> apply worker: called process_syncing_tables

---

I have so far only been trying the above with a non-streaming subscription, and TBH stepping through this startup state "dance" of the tablesync/apply workers is already causing more questions than answers for me. Anyway, I will raise any questions as separate emails to this one.

BTW, do you think these tablesync discussions should be moved to the hackers list? I think they are no longer related to the reported BUG of this current thread.

Kind Regards,
Peter Smith.
Fujitsu Australia
Attachment
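The temporary patch described above boils down to extra log markers plus a sleep that leaves time to attach a debugger. A hypothetical fragment in that spirit (not the actual attachment) could be dropped into the tablesync worker's startup path:

/*
 * Hypothetical debugging aid, similar in spirit to the temporary patch
 * described above: log a marker and then pause long enough to attach
 * gdb to the tablesync worker before it starts the initial copy.
 */
elog(LOG, "!!>> tablesync worker: about to call LogicalRepSyncTableStart; attach gdb to PID %d",
	 MyProcPid);
pg_usleep(30 * USECS_PER_SEC);	/* 30-second attach window */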
Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop
From
Amit Kapila
Date:
On Tue, Nov 17, 2020 at 7:44 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Tue, Nov 17, 2020 at 1:07 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > Yeah, this seems to be possible and this is the reason I mentioned above to dig more into this case. Did you try it via some test case? I think we can generate a test via debugger where after the tablesync worker reaches CATCHUP state stop it via debugger, then we can perform some large transaction on the same table which apply worker will skip and tablesync worker will try to apply changes and should fail.
>
> Hello Amit.
>
> FYI - This email is just to confirm that your above idea for debugging the tablesync worker does work.
>

Thanks for trying this out.

> ---
>
> I have so far only been trying the above with a non-streaming subscription, and TBH stepping through this startup state "dance" of the tablesync/apply workers is already causing more questions than answers for me. Anyway, I will raise any questions as separate emails to this one.
>
> BTW, do you think these tablesync discussions should be moved to the hackers list?
>

Sure. I think it is better to start a new thread for the streaming issue we have suspected here and possible ways to fix the same. I guess you have some other observations as well which you might want to discuss separately.

-- 
With Regards,
Amit Kapila.
Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop
From
Peter Smith
Date:
On Wed, Nov 18, 2020 at 1:29 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > On 2020-Nov-04, Amit Kapila wrote: > > > On Thu, Oct 15, 2020 at 8:20 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > > > * STREAM COMMIT bug? > > > In apply_handle_stream_commit, we do CommitTransactionCommand, but > > > apparently in a tablesync worker we shouldn't do it. > > > > In the tablesync stage, we don't allow streaming. See pgoutput_startup > > where we disable streaming for the init phase. As far as I understand, > > for tablesync we create the initial slot during which streaming will > > be disabled then we will copy the table (here logical decoding won't > > be used) and then allow the apply worker to get any other data which > > is inserted in the meantime. Now, I might be missing something here > > but if you can explain it a bit more or share some test to show how we > > can reach here via tablesync worker then we can discuss the possible > > solution. > > Hmm, okay, that sounds like there would be no bug then. Maybe what we > need is just an assert in apply_handle_stream_commit that > !am_tablesync_worker(), as in the attached patch. Passes tests. Hi. Using the same debugging technique described in a previous mail [1], I have tested again but this time using a SUBSCRIPTION capable of streaming. While paused in the debugger (to force an unusual timing situation) I can publish INSERTs en masse and cause streaming replication to occur. To cut a long story short, a tablesync worker CAN in fact end up processing (e.g. apply_dispatch) streaming messages. So the tablesync worker CAN get into the apply_handle_stream_commit. And this scenario, albeit rare, will crash. For example, --- Program received signal SIGABRT, Aborted. 0x00007f37570f22c7 in raise () from /lib64/libc.so.6 (gdb) bt #0 0x00007f37570f22c7 in raise () from /lib64/libc.so.6 #1 0x00007f37570f39b8 in abort () from /lib64/libc.so.6 #2 0x0000000000f2bf6a in ExceptionalCondition (conditionName=0x11330a4 "IsTransactionState()", errorType=0x1133094 "FailedAssertion", fileName=0x1133088 "tablesync.c", lineNumber=273) at assert.c:69 #3 0x0000000000bb046e in process_syncing_tables_for_sync (current_lsn=23984432) at tablesync.c:273 #4 0x0000000000bb1039 in process_syncing_tables (current_lsn=23984432) at tablesync.c:535 #5 0x0000000000bb5c7a in apply_handle_stream_commit (s=0x7ffd2d892fd0) at worker.c:1080 #6 0x0000000000bb8edb in apply_dispatch (s=0x7ffd2d892fd0) at worker.c:1966 #7 0x0000000000bb97c9 in LogicalRepApplyLoop (last_received=23984432) at worker.c:2166 #8 0x0000000000bbca84 in ApplyWorkerMain (main_arg=1) at worker.c:3122 #9 0x0000000000b36f11 in StartBackgroundWorker () at bgworker.c:820 #10 0x0000000000b5cf3a in do_start_bgworker (rw=0x1f3da50) at postmaster.c:5837 #11 0x0000000000b5d661 in maybe_start_bgworkers () at postmaster.c:6062 #12 0x0000000000b5b560 in sigusr1_handler (postgres_signal_arg=10) at postmaster.c:5223 #13 <signal handler called> #14 0x00007f37571b10d3 in __select_nocancel () from /lib64/libc.so.6 #15 0x0000000000b52a40 in ServerLoop () at postmaster.c:1691 #16 0x0000000000b51fa8 in PostmasterMain (argc=3, argv=0x1f164f0) at postmaster.c:1400 #17 0x00000000009ab55d in main (argc=3, argv=0x1f164f0) at main.c:209 --- Also, PSA v2 of my "helper" patch for debugging the tablesync worker. (This patch has more logging than v1) --- [1] - https://www.postgresql.org/message-id/CAHut%2BPsprtsa4o89wtNnKLxxwXeDKAX9nNsdghT1Pv63siz%2BAA%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
Attachment
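A minimal sketch of the assertion idea mentioned above (an Assert in apply_handle_stream_commit that the caller is not a tablesync worker) is shown below. This is only an illustration assembled from the function names used in this thread, not the patch that was attached; and, as the backtrace demonstrates, a tablesync worker can in fact reach this handler, so such an assertion would simply fail in that scenario rather than rule it out.

/*
 * Illustrative sketch only -- not the attached patch.  The guard under
 * discussion, placed at the top of the stream-commit handler in
 * src/backend/replication/logical/worker.c.
 */
static void
apply_handle_stream_commit(StringInfo s)
{
	/*
	 * Assumption being checked: tablesync workers never see streamed
	 * transactions.  The debugger-driven reproduction above shows the
	 * assumption does not hold, so this Assert would fire there.
	 */
	Assert(!am_tablesync_worker());

	/* ... existing stream-commit processing ... */
}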
Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop
From
Amit Kapila
Date:
On Wed, Nov 18, 2020 at 8:18 AM Peter Smith <smithpb2250@gmail.com> wrote: > > On Wed, Nov 18, 2020 at 1:29 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > > > On 2020-Nov-04, Amit Kapila wrote: > > > > > On Thu, Oct 15, 2020 at 8:20 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > > > > > * STREAM COMMIT bug? > > > > In apply_handle_stream_commit, we do CommitTransactionCommand, but > > > > apparently in a tablesync worker we shouldn't do it. > > > > > > In the tablesync stage, we don't allow streaming. See pgoutput_startup > > > where we disable streaming for the init phase. As far as I understand, > > > for tablesync we create the initial slot during which streaming will > > > be disabled then we will copy the table (here logical decoding won't > > > be used) and then allow the apply worker to get any other data which > > > is inserted in the meantime. Now, I might be missing something here > > > but if you can explain it a bit more or share some test to show how we > > > can reach here via tablesync worker then we can discuss the possible > > > solution. > > > > Hmm, okay, that sounds like there would be no bug then. Maybe what we > > need is just an assert in apply_handle_stream_commit that > > !am_tablesync_worker(), as in the attached patch. Passes tests. > > Hi. > > Using the same debugging technique described in a previous mail [1], I > have tested again but this time using a SUBSCRIPTION capable of > streaming. > > While paused in the debugger (to force an unusual timing situation) I > can publish INSERTs en masse and cause streaming replication to occur. > > To cut a long story short, a tablesync worker CAN in fact end up > processing (e.g. apply_dispatch) streaming messages. > So the tablesync worker CAN get into the apply_handle_stream_commit. > And this scenario, albeit rare, will crash. > Thank you for reproducing this issue. Dilip, Peter, is anyone of you interested in writing a fix for this? -- With Regards, Amit Kapila.
Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop
From
Peter Smith
Date:
On Wed, Nov 18, 2020 at 3:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > To cut a long story short, a tablesync worker CAN in fact end up > > processing (e.g. apply_dispatch) streaming messages. > > So the tablesync worker CAN get into the apply_handle_stream_commit. > > And this scenario, albeit rare, will crash. > > > > Thank you for reproducing this issue. Dilip, Peter, is anyone of you > interested in writing a fix for this? Hi Amit. FYI - Sorry, I am away/offline for the next 5 days. However, if this bug still remains unfixed after next Tuesday then I can look at it then. --- IIUC there are 2 options: 1) Disallow streaming for the tablesync worker. 2) Make streaming work for the tablesync worker. I prefer option (a) not only because of the KISS principle, but also because this is how the tablesync worker was previously thought to behave anyway. I expect this fix may be like the code that Dilip already posted [1] [1] https://www.postgresql.org/message-id/CAFiTN-uUgKpfdbwSGnn3db3mMQAeviOhQvGWE_pC9icZF7VDKg%40mail.gmail.com OTOH, option (b) fix may or may not be possible (I don't know), but I have doubts that it is worthwhile to consider making a special fix for a scenario which so far has never been reproduced outside of the debugger. -- Kind Regards, Peter Smith. Fujitsu Australia
Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop
From
Amit Kapila
Date:
On Wed, Nov 18, 2020 at 11:19 AM Peter Smith <smithpb2250@gmail.com> wrote: > > On Wed, Nov 18, 2020 at 3:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > To cut a long story short, a tablesync worker CAN in fact end up > > > processing (e.g. apply_dispatch) streaming messages. > > > So the tablesync worker CAN get into the apply_handle_stream_commit. > > > And this scenario, albeit rare, will crash. > > > > > > > Thank you for reproducing this issue. Dilip, Peter, is anyone of you > > interested in writing a fix for this? > > Hi Amit. > > FYI - Sorry, I am away/offline for the next 5 days. > > However, if this bug still remains unfixed after next Tuesday then I > can look at it then. > Fair enough. Let's see if Dilip or I can get a chance to look into this before that. > --- > > IIUC there are 2 options: > 1) Disallow streaming for the tablesync worker. > 2) Make streaming work for the tablesync worker. > > I prefer option (a) not only because of the KISS principle, but also > because this is how the tablesync worker was previously thought to > behave anyway. I expect this fix may be like the code that Dilip > already posted [1] > [1] https://www.postgresql.org/message-id/CAFiTN-uUgKpfdbwSGnn3db3mMQAeviOhQvGWE_pC9icZF7VDKg%40mail.gmail.com > > OTOH, option (b) fix may or may not be possible (I don't know), but I > have doubts that it is worthwhile to consider making a special fix for > a scenario which so far has never been reproduced outside of the > debugger. > I would prefer option (b) unless the fix is not possible due to design constraints. I don't think it is a good idea to make tablesync workers behave differently unless we have a reason for doing so. -- With Regards, Amit Kapila.
Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop
From
Dilip Kumar
Date:
On Wed, Nov 18, 2020 at 2:12 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Nov 18, 2020 at 11:19 AM Peter Smith <smithpb2250@gmail.com> wrote: > > > > On Wed, Nov 18, 2020 at 3:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > To cut a long story short, a tablesync worker CAN in fact end up > > > > processing (e.g. apply_dispatch) streaming messages. > > > > So the tablesync worker CAN get into the apply_handle_stream_commit. > > > > And this scenario, albeit rare, will crash. > > > > > > > > > > Thank you for reproducing this issue. Dilip, Peter, is anyone of you > > > interested in writing a fix for this? > > > > Hi Amit. > > > > FYI - Sorry, I am away/offline for the next 5 days. > > > > However, if this bug still remains unfixed after next Tuesday then I > > can look at it then. > > > > Fair enough. Let's see if Dilip or I can get a chance to look into > this before that. > > > --- > > > > IIUC there are 2 options: > > 1) Disallow streaming for the tablesync worker. > > 2) Make streaming work for the tablesync worker. > > > > I prefer option (a) not only because of the KISS principle, but also > > because this is how the tablesync worker was previously thought to > > behave anyway. I expect this fix may be like the code that Dilip > > already posted [1] > > [1] https://www.postgresql.org/message-id/CAFiTN-uUgKpfdbwSGnn3db3mMQAeviOhQvGWE_pC9icZF7VDKg%40mail.gmail.com > > > > OTOH, option (b) fix may or may not be possible (I don't know), but I > > have doubts that it is worthwhile to consider making a special fix for > > a scenario which so far has never been reproduced outside of the > > debugger. > > > > I would prefer option (b) unless the fix is not possible due to design > constraints. I don't think it is a good idea to make tablesync workers > behave differently unless we have a reason for doing so. > Okay, I will analyze this and try to submit my finding today. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop
From
Dilip Kumar
Date:
On Fri, Nov 20, 2020 at 10:21 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Wed, Nov 18, 2020 at 2:12 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Wed, Nov 18, 2020 at 11:19 AM Peter Smith <smithpb2250@gmail.com> wrote: > > > > > > On Wed, Nov 18, 2020 at 3:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > To cut a long story short, a tablesync worker CAN in fact end up > > > > > processing (e.g. apply_dispatch) streaming messages. > > > > > So the tablesync worker CAN get into the apply_handle_stream_commit. > > > > > And this scenario, albeit rare, will crash. > > > > > > > > > > > > > Thank you for reproducing this issue. Dilip, Peter, is anyone of you > > > > interested in writing a fix for this? > > > > > > Hi Amit. > > > > > > FYI - Sorry, I am away/offline for the next 5 days. > > > > > > However, if this bug still remains unfixed after next Tuesday then I > > > can look at it then. > > > > > > > Fair enough. Let's see if Dilip or I can get a chance to look into > > this before that. > > > > > --- > > > > > > IIUC there are 2 options: > > > 1) Disallow streaming for the tablesync worker. > > > 2) Make streaming work for the tablesync worker. > > > > > > I prefer option (a) not only because of the KISS principle, but also > > > because this is how the tablesync worker was previously thought to > > > behave anyway. I expect this fix may be like the code that Dilip > > > already posted [1] > > > [1] https://www.postgresql.org/message-id/CAFiTN-uUgKpfdbwSGnn3db3mMQAeviOhQvGWE_pC9icZF7VDKg%40mail.gmail.com > > > > > > OTOH, option (b) fix may or may not be possible (I don't know), but I > > > have doubts that it is worthwhile to consider making a special fix for > > > a scenario which so far has never been reproduced outside of the > > > debugger. > > > > > > > I would prefer option (b) unless the fix is not possible due to design > > constraints. I don't think it is a good idea to make tablesync workers > > behave differently unless we have a reason for doing so. > > > > Okay, I will analyze this and try to submit my finding today. I have done my analysis, basically, the table sync worker is applying all the changes (for multiple transactions from upstream) under the single transaction (on downstream). Now for normal changes, we can just avoid committing in apply_handle_commit and everything is fine for streaming changes we also have the transaction at the stream level which we need to manage the buffiles for storing the streaming changes. So if we want to do that for the streaming transaction then we need to avoid commit transactions on apply_handle_stream_commit as apply_handle_stream_stop for the table sync worker. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
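To make the single-transaction point concrete, a rough sketch of what this means for one of the stream handlers is below. It is only an illustration using function names already mentioned in this thread (apply_handle_stream_stop, am_tablesync_worker, CommitTransactionCommand), not the posted patch; the same guard would be needed wherever the stream handlers currently commit.

/*
 * Sketch only: the tablesync worker applies everything inside the single
 * transaction it opened for the initial copy, so a stream handler must not
 * commit that transaction; only the normal apply worker should.
 */
static void
apply_handle_stream_stop(StringInfo s)
{
	/* ... write this chunk of streamed changes to the per-xact buffile ... */

	/* The synchronization worker runs in a single transaction. */
	if (!am_tablesync_worker())
		CommitTransactionCommand();
}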
Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop
From
Amit Kapila
Date:
On Fri, Nov 20, 2020 at 10:59 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Fri, Nov 20, 2020 at 10:21 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Wed, Nov 18, 2020 at 2:12 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Wed, Nov 18, 2020 at 11:19 AM Peter Smith <smithpb2250@gmail.com> wrote: > > > > > > > > On Wed, Nov 18, 2020 at 3:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > To cut a long story short, a tablesync worker CAN in fact end up > > > > > > processing (e.g. apply_dispatch) streaming messages. > > > > > > So the tablesync worker CAN get into the apply_handle_stream_commit. > > > > > > And this scenario, albeit rare, will crash. > > > > > > > > > > > > > > > > Thank you for reproducing this issue. Dilip, Peter, is anyone of you > > > > > interested in writing a fix for this? > > > > > > > > Hi Amit. > > > > > > > > FYI - Sorry, I am away/offline for the next 5 days. > > > > > > > > However, if this bug still remains unfixed after next Tuesday then I > > > > can look at it then. > > > > > > > > > > Fair enough. Let's see if Dilip or I can get a chance to look into > > > this before that. > > > > > > > --- > > > > > > > > IIUC there are 2 options: > > > > 1) Disallow streaming for the tablesync worker. > > > > 2) Make streaming work for the tablesync worker. > > > > > > > > I prefer option (a) not only because of the KISS principle, but also > > > > because this is how the tablesync worker was previously thought to > > > > behave anyway. I expect this fix may be like the code that Dilip > > > > already posted [1] > > > > [1] https://www.postgresql.org/message-id/CAFiTN-uUgKpfdbwSGnn3db3mMQAeviOhQvGWE_pC9icZF7VDKg%40mail.gmail.com > > > > > > > > OTOH, option (b) fix may or may not be possible (I don't know), but I > > > > have doubts that it is worthwhile to consider making a special fix for > > > > a scenario which so far has never been reproduced outside of the > > > > debugger. > > > > > > > > > > I would prefer option (b) unless the fix is not possible due to design > > > constraints. I don't think it is a good idea to make tablesync workers > > > behave differently unless we have a reason for doing so. > > > > > > > Okay, I will analyze this and try to submit my finding today. > > I have done my analysis, basically, the table sync worker is applying > all the changes (for multiple transactions from upstream) under the > single transaction (on downstream). Now for normal changes, we can > just avoid committing in apply_handle_commit and everything is fine > for streaming changes we also have the transaction at the stream level > which we need to manage the buffiles for storing the streaming > changes. So if we want to do that for the streaming transaction then > we need to avoid commit transactions on apply_handle_stream_commit as > apply_handle_stream_stop for the table sync worker. > And what about apply_handle_stream_abort? And, I guess we need to avoid other related things like update of replorigin_session_origin_lsn, replorigin_session_origin_timestamp, etc. in apply_handle_stream_commit() as we are apply_handle_commit(). I think it is difficult to have a reliable test case for this but feel free to propose if you have any ideas on the same. -- With Regards, Amit Kapila.
Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop
From
Dilip Kumar
Date:
On Fri, Nov 20, 2020 at 11:17 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Nov 20, 2020 at 10:59 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Fri, Nov 20, 2020 at 10:21 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > On Wed, Nov 18, 2020 at 2:12 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > On Wed, Nov 18, 2020 at 11:19 AM Peter Smith <smithpb2250@gmail.com> wrote: > > > > > > > > > > On Wed, Nov 18, 2020 at 3:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > To cut a long story short, a tablesync worker CAN in fact end up > > > > > > > processing (e.g. apply_dispatch) streaming messages. > > > > > > > So the tablesync worker CAN get into the apply_handle_stream_commit. > > > > > > > And this scenario, albeit rare, will crash. > > > > > > > > > > > > > > > > > > > Thank you for reproducing this issue. Dilip, Peter, is anyone of you > > > > > > interested in writing a fix for this? > > > > > > > > > > Hi Amit. > > > > > > > > > > FYI - Sorry, I am away/offline for the next 5 days. > > > > > > > > > > However, if this bug still remains unfixed after next Tuesday then I > > > > > can look at it then. > > > > > > > > > > > > > Fair enough. Let's see if Dilip or I can get a chance to look into > > > > this before that. > > > > > > > > > --- > > > > > > > > > > IIUC there are 2 options: > > > > > 1) Disallow streaming for the tablesync worker. > > > > > 2) Make streaming work for the tablesync worker. > > > > > > > > > > I prefer option (a) not only because of the KISS principle, but also > > > > > because this is how the tablesync worker was previously thought to > > > > > behave anyway. I expect this fix may be like the code that Dilip > > > > > already posted [1] > > > > > [1] https://www.postgresql.org/message-id/CAFiTN-uUgKpfdbwSGnn3db3mMQAeviOhQvGWE_pC9icZF7VDKg%40mail.gmail.com > > > > > > > > > > OTOH, option (b) fix may or may not be possible (I don't know), but I > > > > > have doubts that it is worthwhile to consider making a special fix for > > > > > a scenario which so far has never been reproduced outside of the > > > > > debugger. > > > > > > > > > > > > > I would prefer option (b) unless the fix is not possible due to design > > > > constraints. I don't think it is a good idea to make tablesync workers > > > > behave differently unless we have a reason for doing so. > > > > > > > > > > Okay, I will analyze this and try to submit my finding today. > > > > I have done my analysis, basically, the table sync worker is applying > > all the changes (for multiple transactions from upstream) under the > > single transaction (on downstream). Now for normal changes, we can > > just avoid committing in apply_handle_commit and everything is fine > > for streaming changes we also have the transaction at the stream level > > which we need to manage the buffiles for storing the streaming > > changes. So if we want to do that for the streaming transaction then > > we need to avoid commit transactions on apply_handle_stream_commit as > > apply_handle_stream_stop for the table sync worker. > > > > And what about apply_handle_stream_abort? And, I guess we need to > avoid other related things like update of > replorigin_session_origin_lsn, replorigin_session_origin_timestamp, > etc. in apply_handle_stream_commit() as we are apply_handle_commit(). Yes, we need to change these as well. I have tested using the POC patch and working fine. I will send the patch after some more testing. 
> > I think it is difficult to have a reliable test case for this but feel > free to propose if you have any ideas on the same. I am not sure how to write an automated test case for this. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop
From
Dilip Kumar
Date:
On Fri, Nov 20, 2020 at 11:22 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Fri, Nov 20, 2020 at 11:17 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Fri, Nov 20, 2020 at 10:59 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > On Fri, Nov 20, 2020 at 10:21 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > > On Wed, Nov 18, 2020 at 2:12 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > On Wed, Nov 18, 2020 at 11:19 AM Peter Smith <smithpb2250@gmail.com> wrote: > > > > > > > > > > > > On Wed, Nov 18, 2020 at 3:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > To cut a long story short, a tablesync worker CAN in fact end up > > > > > > > > processing (e.g. apply_dispatch) streaming messages. > > > > > > > > So the tablesync worker CAN get into the apply_handle_stream_commit. > > > > > > > > And this scenario, albeit rare, will crash. > > > > > > > > > > > > > > > > > > > > > > Thank you for reproducing this issue. Dilip, Peter, is anyone of you > > > > > > > interested in writing a fix for this? > > > > > > > > > > > > Hi Amit. > > > > > > > > > > > > FYI - Sorry, I am away/offline for the next 5 days. > > > > > > > > > > > > However, if this bug still remains unfixed after next Tuesday then I > > > > > > can look at it then. > > > > > > > > > > > > > > > > Fair enough. Let's see if Dilip or I can get a chance to look into > > > > > this before that. > > > > > > > > > > > --- > > > > > > > > > > > > IIUC there are 2 options: > > > > > > 1) Disallow streaming for the tablesync worker. > > > > > > 2) Make streaming work for the tablesync worker. > > > > > > > > > > > > I prefer option (a) not only because of the KISS principle, but also > > > > > > because this is how the tablesync worker was previously thought to > > > > > > behave anyway. I expect this fix may be like the code that Dilip > > > > > > already posted [1] > > > > > > [1] https://www.postgresql.org/message-id/CAFiTN-uUgKpfdbwSGnn3db3mMQAeviOhQvGWE_pC9icZF7VDKg%40mail.gmail.com > > > > > > > > > > > > OTOH, option (b) fix may or may not be possible (I don't know), but I > > > > > > have doubts that it is worthwhile to consider making a special fix for > > > > > > a scenario which so far has never been reproduced outside of the > > > > > > debugger. > > > > > > > > > > > > > > > > I would prefer option (b) unless the fix is not possible due to design > > > > > constraints. I don't think it is a good idea to make tablesync workers > > > > > behave differently unless we have a reason for doing so. > > > > > > > > > > > > > Okay, I will analyze this and try to submit my finding today. > > > > > > I have done my analysis, basically, the table sync worker is applying > > > all the changes (for multiple transactions from upstream) under the > > > single transaction (on downstream). Now for normal changes, we can > > > just avoid committing in apply_handle_commit and everything is fine > > > for streaming changes we also have the transaction at the stream level > > > which we need to manage the buffiles for storing the streaming > > > changes. So if we want to do that for the streaming transaction then > > > we need to avoid commit transactions on apply_handle_stream_commit as > > > apply_handle_stream_stop for the table sync worker. > > > > > > > And what about apply_handle_stream_abort? And, I guess we need to > > avoid other related things like update of > > replorigin_session_origin_lsn, replorigin_session_origin_timestamp, > > etc. 
in apply_handle_stream_commit() as we are apply_handle_commit(). > > Yes, we need to change these as well. I have tested using the POC > patch and working fine. I will send the patch after some more > testing. > > > > I think it is difficult to have a reliable test case for this but feel > > free to propose if you have any ideas on the same. > > I am not sure how to write an automated test case for this. Here is the patch. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Attachment
Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop
From
Amit Kapila
Date:
On Fri, Nov 20, 2020 at 11:36 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Fri, Nov 20, 2020 at 11:22 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Fri, Nov 20, 2020 at 11:17 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > And what about apply_handle_stream_abort? And, I guess we need to > > > avoid other related things like update of > > > replorigin_session_origin_lsn, replorigin_session_origin_timestamp, > > > etc. in apply_handle_stream_commit() as we are apply_handle_commit(). > > > > Yes, we need to change these as well. I have tested using the POC > > patch and working fine. I will send the patch after some more > > testing. > > > > > > I think it is difficult to have a reliable test case for this but feel > > > free to propose if you have any ideas on the same. > > > > I am not sure how to write an automated test case for this. > > Here is the patch. > Few comments: ============== 1. + /* The synchronization worker runs in single transaction. */ + if (!am_tablesync_worker()) + { + /* + * Update origin state so we can restart streaming from correct position + * in case of crash. + */ + replorigin_session_origin_lsn = commit_data.end_lsn; + replorigin_session_origin_timestamp = commit_data.committime; + CommitTransactionCommand(); + pgstat_report_stat(false); + store_flush_position(commit_data.end_lsn); + } + else + { + /* Process any invalidation messages that might have accumulated. */ + AcceptInvalidationMessages(); + maybe_reread_subscription(); + } Here, why the check IsTransactionState() is not there similar to apply_handle_commit? Also, this code looks the same as in apply_handle_commit(), can we move this into a common function say apply_handle_commit_internal or something like that? If we decide to do so then we can move a few more things as mentioned below in the common function: apply_handle_commit() { .. in_remote_transaction = false; /* Process any tables that are being synchronized in parallel. */ process_syncing_tables(commit_data.end_lsn); .. } 2. @@ -902,7 +906,9 @@ apply_handle_stream_abort(StringInfo s) { /* Cleanup the subxact info */ cleanup_subxact_info(); - CommitTransactionCommand(); + + if (!am_tablesync_worker()) + CommitTransactionCommand(); Here, also you can add a comment: "/* The synchronization worker runs in single transaction. */" -- With Regards, Amit Kapila.
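Putting the fragments quoted in this review together, the suggested common function might look roughly like the sketch below. This is a reconstruction for illustration only: the name apply_handle_commit_internal comes from the review, the IsTransactionState() check is included because the review asks about it, and the argument list is an assumption; the actual patch may differ.

/*
 * Rough sketch of the suggested helper, assembled from the review fragments
 * above; not the committed code.
 */
static void
apply_handle_commit_internal(LogicalRepCommitData *commit_data)
{
	/* The synchronization worker runs in a single transaction. */
	if (IsTransactionState() && !am_tablesync_worker())
	{
		/*
		 * Update origin state so we can restart streaming from the correct
		 * position in case of a crash.
		 */
		replorigin_session_origin_lsn = commit_data->end_lsn;
		replorigin_session_origin_timestamp = commit_data->committime;

		CommitTransactionCommand();
		pgstat_report_stat(false);

		store_flush_position(commit_data->end_lsn);
	}
	else
	{
		/* Process any invalidation messages that might have accumulated. */
		AcceptInvalidationMessages();
		maybe_reread_subscription();
	}
}

The review additionally suggests folding the in_remote_transaction = false reset and the process_syncing_tables(commit_data.end_lsn) call into the same helper, so that both apply_handle_commit and apply_handle_stream_commit would end up calling it.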
Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop
From
Dilip Kumar
Date:
On Sat, Nov 21, 2020 at 12:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Nov 20, 2020 at 11:36 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Fri, Nov 20, 2020 at 11:22 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > On Fri, Nov 20, 2020 at 11:17 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > > > And what about apply_handle_stream_abort? And, I guess we need to > > > > avoid other related things like update of > > > > replorigin_session_origin_lsn, replorigin_session_origin_timestamp, > > > > etc. in apply_handle_stream_commit() as we are apply_handle_commit(). > > > > > > Yes, we need to change these as well. I have tested using the POC > > > patch and working fine. I will send the patch after some more > > > testing. > > > > > > > > I think it is difficult to have a reliable test case for this but feel > > > > free to propose if you have any ideas on the same. > > > > > > I am not sure how to write an automated test case for this. > > > > Here is the patch. > > > > Few comments: > ============== > 1. > + /* The synchronization worker runs in single transaction. */ > + if (!am_tablesync_worker()) > + { > + /* > + * Update origin state so we can restart streaming from correct position > + * in case of crash. > + */ > + replorigin_session_origin_lsn = commit_data.end_lsn; > + replorigin_session_origin_timestamp = commit_data.committime; > + CommitTransactionCommand(); > + pgstat_report_stat(false); > + store_flush_position(commit_data.end_lsn); > + } > + else > + { > + /* Process any invalidation messages that might have accumulated. */ > + AcceptInvalidationMessages(); > + maybe_reread_subscription(); > + } > > Here, why the check IsTransactionState() is not there similar to > apply_handle_commit? Also, this code looks the same as in > apply_handle_commit(), can we move this into a common function say > apply_handle_commit_internal or something like that? If we decide to > do so then we can move a few more things as mentioned below in the > common function: Moved to the common function as suggested. > apply_handle_commit() > { > .. > in_remote_transaction = false; > > /* Process any tables that are being synchronized in parallel. */ > process_syncing_tables(commit_data.end_lsn); > .. > } This as well. > 2. > @@ -902,7 +906,9 @@ apply_handle_stream_abort(StringInfo s) > { > /* Cleanup the subxact info */ > cleanup_subxact_info(); > - CommitTransactionCommand(); > + > + if (!am_tablesync_worker()) > + CommitTransactionCommand(); > > Here, also you can add a comment: "/* The synchronization worker runs > in single transaction. */" > Done -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Attachment
Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop
From
Amit Kapila
Date:
On Mon, Nov 23, 2020 at 10:51 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Sat, Nov 21, 2020 at 12:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > 2. > > @@ -902,7 +906,9 @@ apply_handle_stream_abort(StringInfo s) > > { > > /* Cleanup the subxact info */ > > cleanup_subxact_info(); > > - CommitTransactionCommand(); > > + > > + if (!am_tablesync_worker()) > > + CommitTransactionCommand(); > > > > Here, also you can add a comment: "/* The synchronization worker runs > > in single transaction. */" > > > > Done > Okay, thanks. I have slightly changed the comments and moved the newly added function in the attached patch. I have tested the reported scenario and additionally verified that the fix is good even if the tablesync worker processed the partial transaction due to streaming. This won't do any harm because later apply worker will replay the entire transaction. This could be a problem if the apply worker also tries to stream the transaction between the SUBREL_STATE_CATCHUP and SUBREL_STATE_SYNCDONE state because then apply worker might have skipped applying the partial transactions processed by tablesync worker. But, I have checked that the apply worker waits for sync worker to complete its processing between these two states. See process_syncing_tables_for_apply. Does this make sense? Peter, can you also please once test the attached and see if this fixes the problem for you as well? -- With Regards, Amit Kapila.
Attachment
Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop
From
Dilip Kumar
Date:
On Mon, Nov 23, 2020 at 3:13 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Nov 23, 2020 at 10:51 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Sat, Nov 21, 2020 at 12:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > 2. > > > @@ -902,7 +906,9 @@ apply_handle_stream_abort(StringInfo s) > > > { > > > /* Cleanup the subxact info */ > > > cleanup_subxact_info(); > > > - CommitTransactionCommand(); > > > + > > > + if (!am_tablesync_worker()) > > > + CommitTransactionCommand(); > > > > > > Here, also you can add a comment: "/* The synchronization worker runs > > > in single transaction. */" > > > > > > > Done > > > > Okay, thanks. I have slightly changed the comments and moved the newly > added function in the attached patch. Okay, looks good to me. > I have tested the reported > scenario and additionally verified that the fix is good even if the > tablesync worker processed the partial transaction due to streaming. > This won't do any harm because later apply worker will replay the > entire transaction. This could be a problem if the apply worker also > tries to stream the transaction between the SUBREL_STATE_CATCHUP and > SUBREL_STATE_SYNCDONE state because then apply worker might have > skipped applying the partial transactions processed by tablesync > worker. But, I have checked that the apply worker waits for sync > worker to complete its processing between these two states. Right > See > process_syncing_tables_for_apply. Does this make sense? Yes, it makes sense to me. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop
From
Peter Smith
Date:
On Mon, Nov 23, 2020 at 8:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > Peter, can you also please once test the attached and see if this > fixes the problem for you as well? I have reviewed the v3 patch code, and repeated the tablesync testing (for a stream-enabled SUBSCRIPTION) using the debugging technique previously described [1]. https://www.postgresql.org/message-id/CAHut%2BPt4PyKQCwqzQ%3DEFF%3DbpKKJD7XKt_S23F6L20ayQNxg77A%40mail.gmail.com Now the test was successful. The stream start/stop/commit/abort are all being handled by the tablesync worker without errors. === TESTING I performed the tablesync stream test by multiple different ways of inserting the en masse data: 1. Cause streaming (using a PREPARE TRANSACTION) ------------------------------------------------ psql -d test_pub -c "BEGIN;INSERT INTO test_tab SELECT i, md5(i::text) FROM generate_series(3, 5000) s(i);PREPARE TRANSACTION 'test_prepared_tab';" psql -d test_pub -c "COMMIT PREPARED 'test_prepared_tab';" tablesync does 'S' (stream start) tablesync does 'R' tablesync does 'I' (insert) x n tablesync does 'E' (stream end) tablesync does 'S' tablesync does 'I' x n tablesync does 'E' tablesync does { 'S', 'I' x n, 'E' } repeats 13 more times tablesync does 'c' (stream commit) - In handle_stream_commit the tablesync worker processes the spool file, calling apply_dispatch for all the messages - 'R' - { 'I' + should_apply_changes_for_rel = true } x 5000 - then calls process_syncing_tables_for_sync, which sets the state to SUBREL_STATE_SYNCDONE - then tablesync worker process exits Now the "apply" worker catches up. - it runs all the same messages again but does nothing because should_apply_changes_for_rel = false 2. Cause streaming (not using a PREPARE TRANSACTION) ---------------------------------------------------- psql -d test_pub -c "INSERT INTO test_tab SELECT i, md5(i::text) FROM generate_series(3, 5000) s(i);" gives very similar results to above 3. Causing a stream abort to happen ----------------------------------- psql -d test_pub -c "BEGIN; INSERT INTO test_tab SELECT i, md5(i::text) FROM generate_series(3, 5000) s(i); INSERT INTO test_tab VALUES(4999, 'already exists'); END;" tablesync does 'S' (stream start) tablesync does 'E' (stream end) tablesync does 'A' (stream abort) tablesync does 'B' tablesync does 'C' - calls process_syncing_tables - tablesync worker process exits apply worker continues... === REVIEW COMMENTS Despite the tests working OK I thought there should be a couple small changes made for this patch, as follows. (1) process_sync_tables should be last IMO the process_syncing_tables should be always the *last* thing any apply handler does, because it seems a bad idea for the worker to change state (e.g. to say it is SUBREL_STATE_SYNCDONE) if in fact there is still a little bit more processing remaining. This is why I made the same modification in the other WIP patch v26-0006 [2] [2] https://www.postgresql.org/message-id/CAHut%2BPuQcLbjrBXM54%2By%2B%3DYsmEDFd3CCtp31K-_jS4Xka2vwbQ%40mail.gmail.com So, I think the process_syncing_tables() call should be put *after* the stream_cleanup_files() in function apply_handle_stream_commit(). But to do this means the process_syncing_tables has to be removed from your new apply_handle_commit_internal function. (2) misleading comment /* * Start a transaction on stream start, this transaction will be committed * on the stream stop. We need the transaction for handling the buffile, * used for serializing the streaming data and subxact info. 
*/ That above comment (in the apply_handle_stream_start function) may now be inaccurate/misleading for the case of tablesync worker, because I think for tablesync you are explicitly avoiding the tx commit within the apply_handle_stream_stop(). --- Kind Regards, Peter Smith. Fujitsu Australia
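For review comment (1), the ordering being proposed for the end of apply_handle_stream_commit() can be sketched as follows. This is an abridged illustration only (the surrounding code, and the xid and commit_data variables, are assumed from the fragments quoted in this thread), not the actual patch:

	/* ... replay the spooled messages and handle the commit itself ... */

	/* Unlink the files with the serialized changes and subxact info. */
	stream_cleanup_files(MyLogicalRepWorker->subid, xid);

	/*
	 * Only now let a tablesync worker conclude it is SUBREL_STATE_SYNCDONE
	 * (and possibly exit), so the explicit cleanup above has already run.
	 */
	process_syncing_tables(commit_data.end_lsn);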
Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop
From
Amit Kapila
Date:
On Wed, Nov 25, 2020 at 11:20 AM Peter Smith <smithpb2250@gmail.com> wrote: > > On Mon, Nov 23, 2020 at 8:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > REVIEW COMMENTS > > Despite the tests working OK > Thanks for the tests. > I thought there should be a couple small > changes made for this patch, as follows. > > (1) process_sync_tables should be last > > IMO the process_syncing_tables should be always the *last* thing any > apply handler does, because it seems a bad idea for the worker to > change state (e.g. to say it is SUBREL_STATE_SYNCDONE) if in fact > there is still a little bit more processing remaining. > Hmm, what makes you think it is a bad idea, is there any comment which makes you feel so? As far as I understand, the only thing tablesync worker needs to ensure before reaching that state is it should apply till the required lsn which is done by the time it is called in the patch. I think doing it sooner is better because it will let apply worker start its work. In any case, this is not related to this bug-fix patch, so, if you want to build a case for doing it later then we can discuss it separately. > > (2) misleading comment > > /* > * Start a transaction on stream start, this transaction will be committed > * on the stream stop. We need the transaction for handling the buffile, > * used for serializing the streaming data and subxact info. > */ > > That above comment (in the apply_handle_stream_start function) may now > be inaccurate/misleading for the case of tablesync worker, because I > think for tablesync you are explicitly avoiding the tx commit within > the apply_handle_stream_stop(). > Okay, I think we can slightly adjust the comment as: "Start a transaction on stream start, this transaction will be committed on the stream stop unless it is a tablesync worker in which case it will be committed after processing all the messages. We need the transaction for handling the buffile, used for serializing the streaming data and subxact info.". I can update the patch once we agree on the previous point. -- With Regards, Amit Kapila.
Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop
From
Peter Smith
Date:
On Wed, Nov 25, 2020 at 7:53 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > I thought there should be a couple small > > changes made for this patch, as follows. > > > > (1) process_sync_tables should be last > > > > IMO the process_syncing_tables should be always the *last* thing any > > apply handler does, because it seems a bad idea for the worker to > > change state (e.g. to say it is SUBREL_STATE_SYNCDONE) if in fact > > there is still a little bit more processing remaining. > > > > Hmm, what makes you think it is a bad idea, is there any comment which > makes you feel so? As far as I understand, the only thing tablesync > worker needs to ensure before reaching that state is it should apply > till the required lsn which is done by the time it is called in the > patch. I think doing it sooner is better because it will let apply > worker start its work. In any case, this is not related to this > bug-fix patch, so, if you want to build a case for doing it later then > we can discuss it separately. Yes, this review issue is independent of the initial streaming bug the patch is solving. But by refactoring this exact line of code into a new function apply_handle_commit_internal() I felt the patch is kind of now taking ownership of it - so it is not *really* unrelated to this patch anymore... (However if you prefer to treat this review item as a separate issue it is fine by me). AFAIK the process_syncing_tables function acts like the main handshake mechanism between the "tablesync" and "apply" workers. e.g. From "tablesync" worker POV - am I finished? Should I tell the apply worker I am DONE? Is it time to exit? e.g. From "apply" worker POV - have the tablesyncs caught up yet? Can I proceed? I do think these handshake functions ought to be consistently located (e.g always kept last in the apply handler functions where possible). It is true, a slightly different placement (e.g. like current code) still passes the tests, but I see more problems than benefits to keep it that way. IIUC for the tablesync worker to be executing some of these handlers at all is a quite a rare occurrence caused by unusual timing. I think putting the process_syncing_tables() call earlier with the objective to allow apply worker to proceed a few ms earlier (for the already rare case) is not worth it. OTOH, putting the handshake function consistently last does have benefits: - the consistent code is easier to read - none of the tablesync timing stuff is very easy to test (certainly not with automated regression tests). Keeping code consistent means less risk of introducing some unforeseen/untestable issue - that stream_cleanup_files(...) called by apply_handle_stream_commit() is supposed to be cleaning up file resources. IIUC by allowing the tablesync worker to call process_syncing_tables() before this cleanup it means the tablesync worker may decide that it is SUBREL_STATE_SYNCDONE and then just exit the process! So in this scenario that file resource cleanup will never get a chance to happen at all. Isn't that just a plain bug? > > > > > (2) misleading comment > > > > /* > > * Start a transaction on stream start, this transaction will be committed > > * on the stream stop. We need the transaction for handling the buffile, > > * used for serializing the streaming data and subxact info. 
> > */ > > > > That above comment (in the apply_handle_stream_start function) may now > > be inaccurate/misleading for the case of tablesync worker, because I > > think for tablesync you are explicitly avoiding the tx commit within > > the apply_handle_stream_stop(). > > > > Okay, I think we can slightly adjust the comment as: "Start a > transaction on stream start, this transaction will be committed on the > stream stop unless it is a tablesync worker in which case it will be > committed after processing all the messages. We need the transaction > for handling the buffile, used for serializing the streaming data and > subxact info.". The reworded comment looks ok now. --- Kind Regards, Peter Smith. Fujitsu Australia
Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop
From
Amit Kapila
Date:
On Thu, Nov 26, 2020 at 6:47 AM Peter Smith <smithpb2250@gmail.com> wrote: > > On Wed, Nov 25, 2020 at 7:53 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > I thought there should be a couple small > > > changes made for this patch, as follows. > > > > > > (1) process_sync_tables should be last > > > > > > IMO the process_syncing_tables should be always the *last* thing any > > > apply handler does, because it seems a bad idea for the worker to > > > change state (e.g. to say it is SUBREL_STATE_SYNCDONE) if in fact > > > there is still a little bit more processing remaining. > > > > > > > Hmm, what makes you think it is a bad idea, is there any comment which > > makes you feel so? As far as I understand, the only thing tablesync > > worker needs to ensure before reaching that state is it should apply > > till the required lsn which is done by the time it is called in the > > patch. I think doing it sooner is better because it will let apply > > worker start its work. In any case, this is not related to this > > bug-fix patch, so, if you want to build a case for doing it later then > > we can discuss it separately. > > Yes, this review issue is independent of the initial streaming bug the > patch is solving. But by refactoring this exact line of code into a > new function apply_handle_commit_internal() I felt the patch is kind > of now taking ownership of it - so it is not *really* unrelated to > this patch anymore... > > (However if you prefer to treat this review item as a separate issue > it is fine by me). > Let's try to see if we can have a common understanding before I push this. I don't think this is an issue neither did you presented any theory or test why you believe it is an issue. > AFAIK the process_syncing_tables function acts like the main handshake > mechanism between the "tablesync" and "apply" workers. > e.g. From "tablesync" worker POV - am I finished? Should I tell the > apply worker I am DONE? Is it time to exit? > e.g. From "apply" worker POV - have the tablesyncs caught up yet? Can I proceed? > Here, I think we need to have a common understanding of finished and or DONE. It is not defined by where that function is called but when it is appropriate to call it. I have tried to explain in my previous response but not sure you have read it carefully. > I do think these handshake functions ought to be consistently located > (e.g always kept last in the apply handler functions where possible). > There is no harm in doing so but you haven't presented any theory which makes me feel that is correct or more appropriate. > It is true, a slightly different placement (e.g. like current code) > still passes the tests, > Hmm, it is not only about passing the tests. It is not always true that if tests pass, the code is correct especially in such cases where writing test is not feasible. We need to understand the design principle behind doing so which I am trying to explain. > but I see more problems than benefits to keep > it that way. IIUC for the tablesync worker to be executing some of > these handlers at all is a quite a rare occurrence caused by unusual > timing. It is due to the reason that we want tablesync worker to apply the WAL up to LSN apply worker has read by that time so that the apply worker can continue from there. > I think putting the process_syncing_tables() call earlier with > the objective to allow apply worker to proceed a few ms earlier (for > the already rare case) is not worth it. > I think that is not the primary objective. 
It is also that having it in common function as the patch is doing allows us to not accidentally remove it or forget calling it from some other similar place. OTOH, I don't see any clear advantage of moving out of the common function. > OTOH, putting the handshake function consistently last does have benefits: > - the consistent code is easier to read > - none of the tablesync timing stuff is very easy to test (certainly > not with automated regression tests). Keeping code consistent means > less risk of introducing some unforeseen/untestable issue > > - that stream_cleanup_files(...) called by > apply_handle_stream_commit() is supposed to be cleaning up file > resources. IIUC by allowing the tablesync worker to call > process_syncing_tables() before this cleanup it means the tablesync > worker may decide that it is SUBREL_STATE_SYNCDONE and then just exit > the process! So in this scenario that file resource cleanup will never > get a chance to happen at all. Isn't that just a plain bug? > > No, we clean up the files on process exit (via SharedFileSetDeleteOnProcExit). If we don't have such a mechanism, it would be a problem in more number of cases than you describe. See comments atop src/backend/replication/logical/worker.c (especially the STREAMED TRANSACTIONS section). > > > > > > > > (2) misleading comment > > > > > > /* > > > * Start a transaction on stream start, this transaction will be committed > > > * on the stream stop. We need the transaction for handling the buffile, > > > * used for serializing the streaming data and subxact info. > > > */ > > > > > > That above comment (in the apply_handle_stream_start function) may now > > > be inaccurate/misleading for the case of tablesync worker, because I > > > think for tablesync you are explicitly avoiding the tx commit within > > > the apply_handle_stream_stop(). > > > > > > > Okay, I think we can slightly adjust the comment as: "Start a > > transaction on stream start, this transaction will be committed on the > > stream stop unless it is a tablesync worker in which case it will be > > committed after processing all the messages. We need the transaction > > for handling the buffile, used for serializing the streaming data and > > subxact info.". > > The reworded comment looks ok now. > Okay, will change once we agree on previous point. -- With Regards, Amit Kapila.
Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop
From
Peter Smith
Date:
On Thu, Nov 26, 2020 at 1:44 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Nov 26, 2020 at 6:47 AM Peter Smith <smithpb2250@gmail.com> wrote: > > > > On Wed, Nov 25, 2020 at 7:53 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > I thought there should be a couple small > > > > changes made for this patch, as follows. > > > > > > > > (1) process_sync_tables should be last > > > > > > > > IMO the process_syncing_tables should be always the *last* thing any > > > > apply handler does, because it seems a bad idea for the worker to > > > > change state (e.g. to say it is SUBREL_STATE_SYNCDONE) if in fact > > > > there is still a little bit more processing remaining. > > > > > > > > > > Hmm, what makes you think it is a bad idea, is there any comment which > > > makes you feel so? As far as I understand, the only thing tablesync > > > worker needs to ensure before reaching that state is it should apply > > > till the required lsn which is done by the time it is called in the > > > patch. I think doing it sooner is better because it will let apply > > > worker start its work. In any case, this is not related to this > > > bug-fix patch, so, if you want to build a case for doing it later then > > > we can discuss it separately. > > > > Yes, this review issue is independent of the initial streaming bug the > > patch is solving. But by refactoring this exact line of code into a > > new function apply_handle_commit_internal() I felt the patch is kind > > of now taking ownership of it - so it is not *really* unrelated to > > this patch anymore... > > > > (However if you prefer to treat this review item as a separate issue > > it is fine by me). > > > > Let's try to see if we can have a common understanding before I push > this. I don't think this is an issue neither did you presented any > theory or test why you believe it is an issue. > Right, there is no "issue", I only felt current code is more confusing that it needs to be: - Since process_syncing_tables is called many times by apply handlers I prefered a common code pattern (e.g. always last) because I would find it easier to understand. YMMV. - And I also thought it would be neater to simply move the process_syncing_tables by one line so you can do the stream file resource cleanup explicitly the normal way instead of relying on implicit cleanup as the tablesync process exits. But I also understand you may choose to leave everything as-is and that is still OK too. > > AFAIK the process_syncing_tables function acts like the main handshake > > mechanism between the "tablesync" and "apply" workers. > > e.g. From "tablesync" worker POV - am I finished? Should I tell the > > apply worker I am DONE? Is it time to exit? > > e.g. From "apply" worker POV - have the tablesyncs caught up yet? Can I proceed? > > > > Here, I think we need to have a common understanding of finished and > or DONE. It is not defined by where that function is called but when > it is appropriate to call it. I have tried to explain in my previous > response but not sure you have read it carefully. > > > I do think these handshake functions ought to be consistently located > > (e.g always kept last in the apply handler functions where possible). > > > > There is no harm in doing so but you haven't presented any theory > which makes me feel that is correct or more appropriate. > > > It is true, a slightly different placement (e.g. like current code) > > still passes the tests, > > > > Hmm, it is not only about passing the tests. 
It is not always true > that if tests pass, the code is correct especially in such cases where > writing test is not feasible. We need to understand the design > principle behind doing so which I am trying to explain. > > > but I see more problems than benefits to keep > > it that way. IIUC for the tablesync worker to be executing some of > > these handlers at all is a quite a rare occurrence caused by unusual > > timing. > > It is due to the reason that we want tablesync worker to apply the WAL > up to LSN apply worker has read by that time so that the apply worker > can continue from there. I think it is not always/only "up to LSN apply worker has read". IIUC, my tests of deliberately sending messages while tablesync is paused in the debugger means the tablesync worker will get *ahead* of the apply worker. > > > I think putting the process_syncing_tables() call earlier with > > the objective to allow apply worker to proceed a few ms earlier (for > > the already rare case) is not worth it. > > > > I think that is not the primary objective. It is also that having it > in common function as the patch is doing allows us to not accidentally > remove it or forget calling it from some other similar place. OTOH, I > don't see any clear advantage of moving out of the common function. > > > OTOH, putting the handshake function consistently last does have benefits: > > - the consistent code is easier to read > > - none of the tablesync timing stuff is very easy to test (certainly > > not with automated regression tests). Keeping code consistent means > > less risk of introducing some unforeseen/untestable issue > > > > - that stream_cleanup_files(...) called by > > apply_handle_stream_commit() is supposed to be cleaning up file > > resources. IIUC by allowing the tablesync worker to call > > process_syncing_tables() before this cleanup it means the tablesync > > worker may decide that it is SUBREL_STATE_SYNCDONE and then just exit > > the process! So in this scenario that file resource cleanup will never > > get a chance to happen at all. Isn't that just a plain bug? > > > > No, we clean up the files on process exit (via > SharedFileSetDeleteOnProcExit). If we don't have such a mechism, it > would be a problem in more number of cases then you describe. See > comments atop src/backend/replication/logical/worker.c (especially the > STREAMED TRANSACTIONS section). Got it. Thanks. --- Kind Regards, Peter Smith. Fujitsu Australia
Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop
From
Amit Kapila
Date:
On Thu, Nov 26, 2020 at 12:24 PM Peter Smith <smithpb2250@gmail.com> wrote: > > > - And I also thought it would be neater to simply move the > process_syncing_tables by one line so you can do the stream file > resource cleanup explicitly the normal way instead of relying on > implicit cleanup as the tablesync process exits. > Okay, I see your point, so changed it accordingly and fixed the code comments as suggested by you. Does this address all your concerns/suggestions? -- With Regards, Amit Kapila.
Attachment
Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop
From
Peter Smith
Date:
On Thu, Nov 26, 2020 at 11:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> Okay, I see your point, so changed it accordingly and fixed the code
> comments as suggested by you. Does this address all your
> concerns/suggestions?

Yes. Now the v4 patch addresses everything I previously mentioned.

However, today I did notice one more small thing you may want to change.

===

REVIEW COMMENT:

(1) missing comment?

@@ -928,7 +917,9 @@ apply_handle_stream_abort(StringInfo s)

 		/* write the updated subxact list */
 		subxact_info_write(MyLogicalRepWorker->subid, xid);
-		CommitTransactionCommand();
+
+		if (!am_tablesync_worker())
+			CommitTransactionCommand();
 	}
 }

That new condition seems to be missing a comment saying "/* The
synchronization worker runs in a single transaction */". Such a comment
accompanies all the other am_tablesync_worker() conditions that look like
this one.

===

I also re-ran the tablesync test using the v4 patch, and stepped through
to see the normal stream file cleanup being executed by the tablesync
worker.

So apart from that trivial missing comment, I think it is now all good.

---
Kind Regards,
Peter Smith.
Fujitsu Australia
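[Editorial note: with the comment Peter suggests, the new condition would
read roughly as sketched below. This is reconstructed from the hunk and the
comment text quoted above for illustration only; it is not the committed
code.]

		/* write the updated subxact list */
		subxact_info_write(MyLogicalRepWorker->subid, xid);

		/* The synchronization worker runs in a single transaction */
		if (!am_tablesync_worker())
			CommitTransactionCommand();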
Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop
From
Amit Kapila
Date:
On Fri, Nov 27, 2020 at 7:29 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Thu, Nov 26, 2020 at 11:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > Okay, I see your point, so changed it accordingly and fixed the code
> > comments as suggested by you. Does this address all your
> > concerns/suggestions?
>
> Yes. Now the v4 patch addresses everything I previously mentioned.
>
> However, today I did notice one more small thing you may want to change.
>
> ===
>
> REVIEW COMMENT:
>
> (1) missing comment?
>
> @@ -928,7 +917,9 @@ apply_handle_stream_abort(StringInfo s)
>
>  		/* write the updated subxact list */
>  		subxact_info_write(MyLogicalRepWorker->subid, xid);
> -		CommitTransactionCommand();
> +
> +		if (!am_tablesync_worker())
> +			CommitTransactionCommand();
>  	}
>  }
>
> That new condition seems to be missing a comment saying "/* The
> synchronization worker runs in a single transaction */". Such a comment
> accompanies all the other am_tablesync_worker() conditions that look like
> this one.
>

Yeah, I had intentionally left it out because, a few lines earlier in the
same function, we have that condition and comment, so adding it again
didn't appear to be a good idea to me.

> ===
>
> I also re-ran the tablesync test using the v4 patch, and stepped through
> to see the normal stream file cleanup being executed by the tablesync
> worker.
>
> So apart from that trivial missing comment, I think it is now all good.
>

Cool, I'm planning to push this in a few minutes' time.

--
With Regards,
Amit Kapila.
Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop
From
Amit Kapila
Date:
On Fri, Nov 27, 2020 at 7:51 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Nov 27, 2020 at 7:29 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > ===
> >
> > I also re-ran the tablesync test using the v4 patch, and stepped through
> > to see the normal stream file cleanup being executed by the tablesync
> > worker.
> >
> > So apart from that trivial missing comment, I think it is now all good.
> >
>
> Cool, I'm planning to push this in a few minutes' time.
>

Pushed.

--
With Regards,
Amit Kapila.