Re: Column Filtering in Logical Replication - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: Column Filtering in Logical Replication |
Date | |
Msg-id | 3162f7f0-86ac-e3a9-dc55-0a660cb9bab1@enterprisedb.com Whole thread Raw |
In response to | Re: Column Filtering in Logical Replication (Amit Kapila <amit.kapila16@gmail.com>) |
Responses |
Re: Column Filtering in Logical Replication
Re: Column Filtering in Logical Replication Re: Column Filtering in Logical Replication |
List | pgsql-hackers |
On 3/18/22 06:52, Amit Kapila wrote: > On Fri, Mar 18, 2022 at 12:47 AM Tomas Vondra > <tomas.vondra@enterprisedb.com> wrote: >> >> I pushed the second fix. Interestingly enough, wrasse failed in the >> 013_partition test. I don't see how that could be caused by this >> particular commit, though - see the pgsql-committers thread [1]. >> > > I have a theory about what's going on here. I think this is due to a > test added in your previous commit c91f71b9dc. The newly added test > added hangs in tablesync because there was no apply worker to set the > state to SUBREL_STATE_CATCHUP which blocked tablesync workers from > proceeding. > > See below logs from pogona [1]. > 2022-03-18 01:33:15.190 CET [2551176][client > backend][3/74:0][013_partition.pl] LOG: statement: ALTER SUBSCRIPTION > sub2 SET PUBLICATION pub_lower_level, pub_all > 2022-03-18 01:33:15.354 CET [2551193][logical replication > worker][4/57:0][] LOG: logical replication apply worker for > subscription "sub2" has started > 2022-03-18 01:33:15.605 CET [2551176][client > backend][:0][013_partition.pl] LOG: disconnection: session time: > 0:00:00.415 user=bf database=postgres host=[local] > 2022-03-18 01:33:15.607 CET [2551209][logical replication > worker][3/76:0][] LOG: logical replication table synchronization > worker for subscription "sub2", table "tab4_1" has started > 2022-03-18 01:33:15.609 CET [2551211][logical replication > worker][5/11:0][] LOG: logical replication table synchronization > worker for subscription "sub2", table "tab3" has started > 2022-03-18 01:33:15.617 CET [2551193][logical replication > worker][4/62:0][] LOG: logical replication apply worker for > subscription "sub2" will restart because of a parameter change > > You will notice that the apply worker is never restarted after a > parameter change. The reason was that the particular subscription > reaches the limit of max_sync_workers_per_subscription after which we > don't allow to restart the apply worker. I think you might want to > increase the values of > max_sync_workers_per_subscription/max_logical_replication_workers to > make it work. > Hmmm. So the theory is that in most runs we manage to sync the tables faster than starting the workers, so we don't hit the limit. But on some machines the sync worker takes a bit longer, we hit the limit. Seems possible, yes. Unfortunately we don't seem to log anything when we hit the limit, so hard to say for sure :-( I suggest we add a WARNING message to logicalrep_worker_launch or something. Not just because of this test, it seems useful in general. However, how come we don't retry the sync? Surely we don't just give up forever, that'd be a pretty annoying behavior. Presumably we just end up sleeping for a long time before restarting the sync worker, somewhere. >> I'd like to test & polish the main patch over the weekend, and get it >> committed early next week. Unless someone thinks it's definitely not >> ready for that ... >> > > I think it is in good shape but apart from cleanup, there are issues > with dependency handling which I have analyzed and reported as one of > the comments in the email [2]. I was getting some weird behavior > during my testing due to that. Apart from that still the patch has DDL > handling code in tablecmds.c which probably is not required. > Similarly, Shi-San has reported an issue with replica full in her > email [3]. It is up to you what to do here but it would be good if you > can once share the patch after fixing these issues so that we can > re-test/review it. Ah, thanks for reminding me - it's hard to keep track of all the issues in threads as long as this one. BTW do you have any opinion on the SET COLUMNS syntax? Peter Smith proposed to get rid of it in [1] but I'm not sure that's a good idea. Because if we ditch it, then removing the column list would look like this: ALTER PUBLICATION pub ALTER TABLE tab; And if we happen to add other per-table options, this would become pretty ambiguous. Actually, do we even want to allow resetting column lists like this? We don't allow this for row filters, so if you want to change a row filter you have to re-add the table, right? So maybe we should just ditch ALTER TABLE entirely. regards [4] https://www.postgresql.org/message-id/CAHut%2BPtc7Rh187eQKrxdUmUNWyfxz7OkhYAX%3DAW411Qwxya0LQ%40mail.gmail.com -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: