Thread: Re: [HACKERS] Interval for launching the table sync worker
At Thu, 06 Apr 2017 17:02:14 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in <20170406.170214.263553093.horiguchi.kyotaro@lab.ntt.co.jp> > At Thu, 6 Apr 2017 16:15:33 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoCrcwi3SwKKOW_efwW0finxyycLgsbw09n5uy2sxneO_A@mail.gmail.com> > > On Thu, Apr 6, 2017 at 3:45 PM, Kyotaro HORIGUCHI > > <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > > > I was thinking the same. > > > > > > At Thu, 6 Apr 2017 11:33:22 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoDCnyRJDUY=ESVVe68AukvOP2dFomTeBFpAd1TiFbjsGg@mail.gmail.com> > > >> Hi all, > > >> > > >> While testing table sync worker for logical replication I noticed that > > >> if the table sync worker of logical replication failed to insert the > > >> data for whatever reason, the table sync worker process exits with > > >> error. And then the main apply worker launches the table sync worker > > >> again soon without interval. This routine is executed at very high > > >> frequency without interval. > > >> > > >> Should we do put a interval (wal_retrieve_interval or make a new GUC > > >> parameter?) for launching the table sync worker? Hmm. I was playing with something wrong. Now I see the invervals 5 seconds. No problem. > > > After introducing encoding conversion, untranslatable characters > > > in a published table causes this situation. > > > > I think it's better to make a new GUC parameter for the table sync > > worker. Having multiple behaviors in wal_retrieve_retry_interval is > > not good idea. Thought? So, this is working, sorry. > I prefer subscription option than GUC. Something like following. > > CREATE SUBSCRIPTION s1 CONNECTION 'blah' > PUBLICATION p1 WITH (noreconnect = true); > > Stored in pg_subscription? > > > > Interval can reduce > > > the frequence of reconnecting, but I think that walreciever > > > should refrain from reconnecting on unrecoverable(or repeating) > > > error in walsender. > > > > > > > Yeah, that's also considerable issue. > > But not to do now. -- Kyotaro Horiguchi NTT Open Source Software Center
On Thu, Apr 6, 2017 at 9:06 PM, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > At Thu, 06 Apr 2017 17:02:14 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in<20170406.170214.263553093.horiguchi.kyotaro@lab.ntt.co.jp> >> At Thu, 6 Apr 2017 16:15:33 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoCrcwi3SwKKOW_efwW0finxyycLgsbw09n5uy2sxneO_A@mail.gmail.com> >> > On Thu, Apr 6, 2017 at 3:45 PM, Kyotaro HORIGUCHI >> > <horiguchi.kyotaro@lab.ntt.co.jp> wrote: >> > > I was thinking the same. >> > > >> > > At Thu, 6 Apr 2017 11:33:22 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoDCnyRJDUY=ESVVe68AukvOP2dFomTeBFpAd1TiFbjsGg@mail.gmail.com> >> > >> Hi all, >> > >> >> > >> While testing table sync worker for logical replication I noticed that >> > >> if the table sync worker of logical replication failed to insert the >> > >> data for whatever reason, the table sync worker process exits with >> > >> error. And then the main apply worker launches the table sync worker >> > >> again soon without interval. This routine is executed at very high >> > >> frequency without interval. >> > >> >> > >> Should we do put a interval (wal_retrieve_interval or make a new GUC >> > >> parameter?) for launching the table sync worker? > > Hmm. I was playing with something wrong. Now I see the invervals > 5 seconds. No problem. Yeah, this issue happens only in the table sync worker. > >> > > After introducing encoding conversion, untranslatable characters >> > > in a published table causes this situation. >> > >> > I think it's better to make a new GUC parameter for the table sync >> > worker. Having multiple behaviors in wal_retrieve_retry_interval is >> > not good idea. Thought? > > So, this is working, sorry. > >> I prefer subscription option than GUC. Something like following. >> >> CREATE SUBSCRIPTION s1 CONNECTION 'blah' >> PUBLICATION p1 WITH (noreconnect = true); >> >> Stored in pg_subscription? >> >> > > Interval can reduce >> > > the frequence of reconnecting, but I think that walreciever >> > > should refrain from reconnecting on unrecoverable(or repeating) >> > > error in walsender. >> > > >> > >> > Yeah, that's also considerable issue. >> >> But not to do now. > I've added this as an open item, and sent a patch for this. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On 06/04/17 16:44, Masahiko Sawada wrote: > On Thu, Apr 6, 2017 at 9:06 PM, Kyotaro HORIGUCHI > <horiguchi.kyotaro@lab.ntt.co.jp> wrote: >> At Thu, 06 Apr 2017 17:02:14 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in<20170406.170214.263553093.horiguchi.kyotaro@lab.ntt.co.jp> >>> At Thu, 6 Apr 2017 16:15:33 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoCrcwi3SwKKOW_efwW0finxyycLgsbw09n5uy2sxneO_A@mail.gmail.com> >>>> On Thu, Apr 6, 2017 at 3:45 PM, Kyotaro HORIGUCHI >>>> <horiguchi.kyotaro@lab.ntt.co.jp> wrote: >>>>> I was thinking the same. >>>>> >>>>> At Thu, 6 Apr 2017 11:33:22 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoDCnyRJDUY=ESVVe68AukvOP2dFomTeBFpAd1TiFbjsGg@mail.gmail.com> >>>>>> Hi all, >>>>>> >>>>>> While testing table sync worker for logical replication I noticed that >>>>>> if the table sync worker of logical replication failed to insert the >>>>>> data for whatever reason, the table sync worker process exits with >>>>>> error. And then the main apply worker launches the table sync worker >>>>>> again soon without interval. This routine is executed at very high >>>>>> frequency without interval. >>>>>> >>>>>> Should we do put a interval (wal_retrieve_interval or make a new GUC >>>>>> parameter?) for launching the table sync worker? >> >> Hmm. I was playing with something wrong. Now I see the invervals >> 5 seconds. No problem. > > Yeah, this issue happens only in the table sync worker. > >> >>>>> After introducing encoding conversion, untranslatable characters >>>>> in a published table causes this situation. >>>> >>>> I think it's better to make a new GUC parameter for the table sync >>>> worker. Having multiple behaviors in wal_retrieve_retry_interval is >>>> not good idea. Thought? >> >> So, this is working, sorry. >> >>> I prefer subscription option than GUC. Something like following. >>> >>> CREATE SUBSCRIPTION s1 CONNECTION 'blah' >>> PUBLICATION p1 WITH (noreconnect = true); >>> >>> Stored in pg_subscription? I don't think that's a very good solution, you'd lose replication on every network glitch, upstream server restart, etc. >>> >>>>> Interval can reduce >>>>> the frequence of reconnecting, but I think that walreciever >>>>> should refrain from reconnecting on unrecoverable(or repeating) >>>>> error in walsender. >>>>> >>>> >>>> Yeah, that's also considerable issue. >>> >>> But not to do now. >> > > I've added this as an open item, and sent a patch for this. > I am not exactly sure what's the open item from this thread. To use the wal_retrieve_interval to limit table sync restarts? -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
At Thu, 6 Apr 2017 18:42:37 +0200, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote in <8c7afb37-be73-c6bd-80bc-e87522f0461a@2ndquadrant.com> > On 06/04/17 16:44, Masahiko Sawada wrote: > > On Thu, Apr 6, 2017 at 9:06 PM, Kyotaro HORIGUCHI > > <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > >>> I prefer subscription option than GUC. Something like following. > >>> > >>> CREATE SUBSCRIPTION s1 CONNECTION 'blah' > >>> PUBLICATION p1 WITH (noreconnect = true); > >>> > >>> Stored in pg_subscription? > > I don't think that's a very good solution, you'd lose replication on > every network glitch, upstream server restart, etc. Yes, you're right. This would work if the apply worker could distinguish permanent errors, but that is overkill for now. > > I've added this as an open item, and sent a patch for this. > > > > I am not exactly sure what's the open item from this thread. To use the > wal_retrieve_interval to limit table sync restarts? It's not me. I also don't think this is critical. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
On Fri, Apr 7, 2017 at 1:23 PM, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > At Thu, 6 Apr 2017 18:42:37 +0200, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote in <8c7afb37-be73-c6bd-80bc-e87522f0461a@2ndquadrant.com> >> On 06/04/17 16:44, Masahiko Sawada wrote: >> > On Thu, Apr 6, 2017 at 9:06 PM, Kyotaro HORIGUCHI >> > <horiguchi.kyotaro@lab.ntt.co.jp> wrote: >> >>> I prefer subscription option than GUC. Something like following. >> >>> >> >>> CREATE SUBSCRIPTION s1 CONNECTION 'blah' >> >>> PUBLICATION p1 WITH (noreconnect = true); >> >>> >> >>> Stored in pg_subscription? >> >> I don't think that's a very good solution, you'd lose replication on >> every network glitch, upstream server restart, etc. > > Yes, you're right. This would work if apply worker distinguishes > permanent error. But it is overkill so far. > >> > I've added this as an open item, and sent a patch for this. >> > >> >> I am not exactly sure what's the open item from this thread. To use the >> wal_retrieve_interval to limit table sync restarts? > > It's not me. I also don't think this critical. > Thank you for the comment. It's not critical but it could be problem. So I thought we should fix it before the PostgreSQL 10 release. If it's not appropriate as an open item I'll remove it. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On 4/7/17 01:10, Masahiko Sawada wrote: > It's not critical but it could be problem. So I thought we should fix > it before the PostgreSQL 10 release. If it's not appropriate as an > open item I'll remove it. You wrote that you "sent" a patch, but I don't see a patch anywhere. I think a nonintrusive patch for this could be considered. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sat, Apr 8, 2017 at 8:13 AM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > On 4/7/17 01:10, Masahiko Sawada wrote: >> It's not critical but it could be problem. So I thought we should fix >> it before the PostgreSQL 10 release. If it's not appropriate as an >> open item I'll remove it. > > You wrote that you "sent" a patch, but I don't see a patch anywhere. > > I think a nonintrusive patch for this could be considered. Oops, I made a mistake. I'll send a patch tomorrow. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Sun, Apr 9, 2017 at 6:25 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > On Sat, Apr 8, 2017 at 8:13 AM, Peter Eisentraut > <peter.eisentraut@2ndquadrant.com> wrote: >> On 4/7/17 01:10, Masahiko Sawada wrote: >>> It's not critical but it could be problem. So I thought we should fix >>> it before the PostgreSQL 10 release. If it's not appropriate as an >>> open item I'll remove it. >> >> You wrote that you "sent" a patch, but I don't see a patch anywhere. >> >> I think a nonintrusive patch for this could be considered. > > Oops, I made a mistake. I'll send a patch tomorrow. > I've attached the patch. It introduces a GUC parameter, table_sync_retry_interval, which controls the interval for launching the table sync worker process. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
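A minimal sketch of the rate-limiting idea in the patch above, assuming a hypothetical GUC table_sync_retry_interval (in milliseconds) and a single, global last-launch timestamp; it is illustrative rather than the actual patch:

/*
 * Sketch only: gate table sync worker launches on a configurable interval.
 * table_sync_retry_interval and last_sync_start_time are hypothetical names.
 */
#include "postgres.h"
#include "utils/timestamp.h"

int         table_sync_retry_interval = 5000;   /* would come from guc.c */

static TimestampTz last_sync_start_time = 0;

static bool
sync_retry_interval_elapsed(void)
{
    TimestampTz now = GetCurrentTimestamp();

    if (last_sync_start_time == 0 ||
        TimestampDifferenceExceeds(last_sync_start_time, now,
                                   table_sync_retry_interval))
    {
        last_sync_start_time = now;
        return true;            /* OK to launch a table sync worker */
    }

    return false;               /* launched too recently; try again later */
}

Because the timestamp is global, a single failing table would delay the launch of sync workers for every other table, which is the concern raised in the reply below.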
On 10/04/17 11:02, Masahiko Sawada wrote: > On Sun, Apr 9, 2017 at 6:25 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> On Sat, Apr 8, 2017 at 8:13 AM, Peter Eisentraut >> <peter.eisentraut@2ndquadrant.com> wrote: >>> On 4/7/17 01:10, Masahiko Sawada wrote: >>>> It's not critical but it could be problem. So I thought we should fix >>>> it before the PostgreSQL 10 release. If it's not appropriate as an >>>> open item I'll remove it. >>> >>> You wrote that you "sent" a patch, but I don't see a patch anywhere. >>> >>> I think a nonintrusive patch for this could be considered. >> >> Oops, I made a mistake. I'll send a patch tomorrow. >> > > I've attached the patch. This patch introduces GUC parameter > table_sync_retry_interval which controls the interval of launching the > table sync worker process. > I don't think solution is quite this simple. This will cause all table sync workers to be delayed which means concurrency will suffer and the initial sync of all tables will take much longer especially if there is little data. We need a way to either detect if we are launching same worker that was already launched before, or alternatively if we are launching crashed worker and only then apply the delay. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 4/10/17 08:10, Petr Jelinek wrote: > I don't think solution is quite this simple. This will cause all table > sync workers to be delayed which means concurrency will suffer and the > initial sync of all tables will take much longer especially if there is > little data. We need a way to either detect if we are launching same > worker that was already launched before, or alternatively if we are > launching crashed worker and only then apply the delay. Perhaps instead of a global last_start_time, we store a per relation last_start_time in SubscriptionRelState? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
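A sketch of what that suggestion could look like: SubscriptionRelState as defined in src/include/catalog/pg_subscription_rel.h, with a hypothetical last_start_time field added. The extra field is illustrative only, not an existing part of the catalog code.

#include "postgres.h"
#include "access/xlogdefs.h"
#include "datatype/timestamp.h"

typedef struct SubscriptionRelState
{
    Oid         relid;          /* relation the state is tracked for */
    XLogRecPtr  lsn;            /* remote LSN of the state change */
    char        state;          /* SUBREL_STATE_* value */
    TimestampTz last_start_time;    /* hypothetical: last sync worker launch */
} SubscriptionRelState;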
On Wed, Apr 12, 2017 at 1:28 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > On 4/10/17 08:10, Petr Jelinek wrote: >> I don't think solution is quite this simple. This will cause all table >> sync workers to be delayed which means concurrency will suffer and the >> initial sync of all tables will take much longer especially if there is >> little data. We need a way to either detect if we are launching same >> worker that was already launched before, or alternatively if we are >> launching crashed worker and only then apply the delay. > > Perhaps instead of a global last_start_time, we store a per relation > last_start_time in SubscriptionRelState? I was thinking the same. But a problem is that the list of SubscriptionRelState is refreshed whenever the syncing table state becomes invalid (table_state_valid = false). I guess we need to improve these logic including GetSubscriptionNotReadyRelations(). Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On 4/12/17 00:48, Masahiko Sawada wrote: > On Wed, Apr 12, 2017 at 1:28 PM, Peter Eisentraut >> Perhaps instead of a global last_start_time, we store a per relation >> last_start_time in SubscriptionRelState? > > I was thinking the same. But a problem is that the list of > SubscriptionRelState is refreshed whenever the syncing table state > becomes invalid (table_state_valid = false). I guess we need to > improve these logic including GetSubscriptionNotReadyRelations(). The table states are invalidated on a syscache callback from pg_subscription_rel, which happens roughly speaking when a table finishes the initial sync. So if we're worried about failing tablesync workers relaunching too quickly, this would only be a problem if a tablesync of another table finishes right in that restart window. That doesn't seem a terrible issue to me. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
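Roughly how the invalidation being discussed works, paraphrased from src/backend/replication/logical/tablesync.c (exact names may differ slightly): a syscache callback on pg_subscription_rel marks the cached list of table states as stale, and the list is rebuilt on the next sync cycle.

#include "postgres.h"
#include "utils/inval.h"
#include "utils/syscache.h"

static bool table_states_valid = false;

/* syscache callback: just flag the cached list as stale */
static void
invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
{
    table_states_valid = false;
}

static void
register_table_state_invalidation(void)
{
    /* typically registered once, at apply worker startup */
    CacheRegisterSyscacheCallback(SUBSCRIPTIONRELMAP,
                                  invalidate_syncing_table_states,
                                  (Datum) 0);
}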
On Wed, Apr 12, 2017 at 11:46 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > On 4/12/17 00:48, Masahiko Sawada wrote: >> On Wed, Apr 12, 2017 at 1:28 PM, Peter Eisentraut >>> Perhaps instead of a global last_start_time, we store a per relation >>> last_start_time in SubscriptionRelState? >> >> I was thinking the same. But a problem is that the list of >> SubscriptionRelState is refreshed whenever the syncing table state >> becomes invalid (table_state_valid = false). I guess we need to >> improve these logic including GetSubscriptionNotReadyRelations(). > > The table states are invalidated on a syscache callback from > pg_subscription_rel, which happens roughly speaking when a table > finishes the initial sync. So if we're worried about failing tablesync > workers relaunching to quickly, this would only be a problem if a > tablesync of another table finishes right in that restart window. That > doesn't seem a terrible issue to me. > I think the table states are invalidated whenever the table sync worker starts, because the table sync worker updates its status of pg_subscription_rel and commits it before starting actual copy. So we cannot rely on that. I thought we can store last_start_time into pg_subscription_rel but it might be overkill. I'm now thinking to change GetSubscriptionNotReadyRealtions so that last_start_time in SubscriptionRelState is taken over to new list. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Thu, Apr 13, 2017 at 11:53 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > On Wed, Apr 12, 2017 at 11:46 PM, Peter Eisentraut > <peter.eisentraut@2ndquadrant.com> wrote: >> On 4/12/17 00:48, Masahiko Sawada wrote: >>> On Wed, Apr 12, 2017 at 1:28 PM, Peter Eisentraut >>>> Perhaps instead of a global last_start_time, we store a per relation >>>> last_start_time in SubscriptionRelState? >>> >>> I was thinking the same. But a problem is that the list of >>> SubscriptionRelState is refreshed whenever the syncing table state >>> becomes invalid (table_state_valid = false). I guess we need to >>> improve these logic including GetSubscriptionNotReadyRelations(). >> >> The table states are invalidated on a syscache callback from >> pg_subscription_rel, which happens roughly speaking when a table >> finishes the initial sync. So if we're worried about failing tablesync >> workers relaunching to quickly, this would only be a problem if a >> tablesync of another table finishes right in that restart window. That >> doesn't seem a terrible issue to me. >> > > I think the table states are invalidated whenever the table sync > worker starts, because the table sync worker updates its status of > pg_subscription_rel and commits it before starting actual copy. So we > cannot rely on that. I thought we can store last_start_time into > pg_subscription_rel but it might be overkill. I'm now thinking to > change GetSubscriptionNotReadyRealtions so that last_start_time in > SubscriptionRelState is taken over to new list. > Attached the latest patch. It didn't actually necessary to change GetSubscriptionNotReadyRelations. I just changed the logic refreshing the sync table state list. Please give me feedback. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
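An illustrative sketch of the carry-over approach in the patch above, reusing the hypothetical last_start_time field shown earlier: when the list of not-ready relations is rebuilt, the previous launch timestamp is copied over for relations that are still present. This is the loop-within-a-loop shape questioned later in the thread.

#include "postgres.h"
#include "nodes/pg_list.h"

static void
carry_over_start_times(List *new_states, List *old_states)
{
    ListCell   *lc;

    foreach(lc, new_states)
    {
        SubscriptionRelState *cur = (SubscriptionRelState *) lfirst(lc);
        ListCell   *lc2;

        foreach(lc2, old_states)
        {
            SubscriptionRelState *prev = (SubscriptionRelState *) lfirst(lc2);

            if (prev->relid == cur->relid)
            {
                /* keep the last launch time across list rebuilds */
                cur->last_start_time = prev->last_start_time;
                break;
            }
        }
    }
}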
I confused sync and apply workers. sync worker failure at start causes immediate retries. At Thu, 13 Apr 2017 11:53:27 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoCR6eHgk0vaHShjO4Bre_VDKjHUbL9EuWHaUgRPSPPyVQ@mail.gmail.com> > On Wed, Apr 12, 2017 at 11:46 PM, Peter Eisentraut > <peter.eisentraut@2ndquadrant.com> wrote: > > On 4/12/17 00:48, Masahiko Sawada wrote: > >> On Wed, Apr 12, 2017 at 1:28 PM, Peter Eisentraut > >>> Perhaps instead of a global last_start_time, we store a per relation > >>> last_start_time in SubscriptionRelState? > >> > >> I was thinking the same. But a problem is that the list of > >> SubscriptionRelState is refreshed whenever the syncing table state > >> becomes invalid (table_state_valid = false). I guess we need to > >> improve these logic including GetSubscriptionNotReadyRelations(). > > > > The table states are invalidated on a syscache callback from > > pg_subscription_rel, which happens roughly speaking when a table > > finishes the initial sync. So if we're worried about failing tablesync > > workers relaunching to quickly, this would only be a problem if a > > tablesync of another table finishes right in that restart window. That > > doesn't seem a terrible issue to me. > > > > I think the table states are invalidated whenever the table sync > worker starts, because the table sync worker updates its status of > pg_subscription_rel and commits it before starting actual copy. So we > cannot rely on that. I thought we can store last_start_time into > pg_subscription_rel but it might be overkill. I'm now thinking to > change GetSubscriptionNotReadyRealtions so that last_start_time in > SubscriptionRelState is taken over to new list. This resolves the problem but, if I understand correctly, the many pallocs in process_syncing_tables_for_apply() is working on ApplyContext and the context is reset before the next visit here (in LogicalRepApplyLoop). Although this is not a problem of this patch, this is a problem generally. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
Ouch! I replied to wrong mail. At Thu, 13 Apr 2017 19:55:04 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in <20170413.195504.89348773.horiguchi.kyotaro@lab.ntt.co.jp> > I confused sync and apply workers. > sync worker failure at start causes immediate retries. > > At Thu, 13 Apr 2017 11:53:27 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoCR6eHgk0vaHShjO4Bre_VDKjHUbL9EuWHaUgRPSPPyVQ@mail.gmail.com> > > On Wed, Apr 12, 2017 at 11:46 PM, Peter Eisentraut > > <peter.eisentraut@2ndquadrant.com> wrote: > > > On 4/12/17 00:48, Masahiko Sawada wrote: > > >> On Wed, Apr 12, 2017 at 1:28 PM, Peter Eisentraut > > >>> Perhaps instead of a global last_start_time, we store a per relation > > >>> last_start_time in SubscriptionRelState? > > >> > > >> I was thinking the same. But a problem is that the list of > > >> SubscriptionRelState is refreshed whenever the syncing table state > > >> becomes invalid (table_state_valid = false). I guess we need to > > >> improve these logic including GetSubscriptionNotReadyRelations(). > > > > > > The table states are invalidated on a syscache callback from > > > pg_subscription_rel, which happens roughly speaking when a table > > > finishes the initial sync. So if we're worried about failing tablesync > > > workers relaunching to quickly, this would only be a problem if a > > > tablesync of another table finishes right in that restart window. That > > > doesn't seem a terrible issue to me. > > > > > > > I think the table states are invalidated whenever the table sync > > worker starts, because the table sync worker updates its status of > > pg_subscription_rel and commits it before starting actual copy. So we > > cannot rely on that. I thought we can store last_start_time into > > pg_subscription_rel but it might be overkill. I'm now thinking to > > change GetSubscriptionNotReadyRealtions so that last_start_time in > > SubscriptionRelState is taken over to new list. The right target of "This" below is found at the following URL. https://www.postgresql.org/message-id/CAD21AoBt_XUdppddFak661_LBM2t3CfK52aLKHG%2Bekd7SkzLmg%40mail.gmail.com > This resolves the problem but, if I understand correctly, the > many pallocs in process_syncing_tables_for_apply() is working on > ApplyContext and the context is reset before the next visit here > (in LogicalRepApplyLoop). > > Although this is not a problem of this patch, this is a problem > generally. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
On 13/04/17 13:01, Kyotaro HORIGUCHI wrote: > Ouch! I replied to wrong mail. > > At Thu, 13 Apr 2017 19:55:04 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in<20170413.195504.89348773.horiguchi.kyotaro@lab.ntt.co.jp> >> I confused sync and apply workers. >> sync worker failure at start causes immediate retries. >> >> At Thu, 13 Apr 2017 11:53:27 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoCR6eHgk0vaHShjO4Bre_VDKjHUbL9EuWHaUgRPSPPyVQ@mail.gmail.com> >>> On Wed, Apr 12, 2017 at 11:46 PM, Peter Eisentraut >>> <peter.eisentraut@2ndquadrant.com> wrote: >>>> On 4/12/17 00:48, Masahiko Sawada wrote: >>>>> On Wed, Apr 12, 2017 at 1:28 PM, Peter Eisentraut >>>>>> Perhaps instead of a global last_start_time, we store a per relation >>>>>> last_start_time in SubscriptionRelState? >>>>> >>>>> I was thinking the same. But a problem is that the list of >>>>> SubscriptionRelState is refreshed whenever the syncing table state >>>>> becomes invalid (table_state_valid = false). I guess we need to >>>>> improve these logic including GetSubscriptionNotReadyRelations(). >>>> >>>> The table states are invalidated on a syscache callback from >>>> pg_subscription_rel, which happens roughly speaking when a table >>>> finishes the initial sync. So if we're worried about failing tablesync >>>> workers relaunching to quickly, this would only be a problem if a >>>> tablesync of another table finishes right in that restart window. That >>>> doesn't seem a terrible issue to me. >>>> >>> >>> I think the table states are invalidated whenever the table sync >>> worker starts, because the table sync worker updates its status of >>> pg_subscription_rel and commits it before starting actual copy. So we >>> cannot rely on that. I thought we can store last_start_time into >>> pg_subscription_rel but it might be overkill. I'm now thinking to >>> change GetSubscriptionNotReadyRealtions so that last_start_time in >>> SubscriptionRelState is taken over to new list. > > The right target of "This" below is found at the following URL. > > https://www.postgresql.org/message-id/CAD21AoBt_XUdppddFak661_LBM2t3CfK52aLKHG%2Bekd7SkzLmg%40mail.gmail.com > >> This resolves the problem but, if I understand correctly, the >> many pallocs in process_syncing_tables_for_apply() is working on >> ApplyContext and the context is reset before the next visit here >> (in LogicalRepApplyLoop). >> >> Although this is not a problem of this patch, this is a problem >> generally. Huh? We explicitly switch to CacheMemoryContext before pallocing anything that should survive long term. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 13/04/17 12:23, Masahiko Sawada wrote: > On Thu, Apr 13, 2017 at 11:53 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> On Wed, Apr 12, 2017 at 11:46 PM, Peter Eisentraut >> <peter.eisentraut@2ndquadrant.com> wrote: >>> On 4/12/17 00:48, Masahiko Sawada wrote: >>>> On Wed, Apr 12, 2017 at 1:28 PM, Peter Eisentraut >>>>> Perhaps instead of a global last_start_time, we store a per relation >>>>> last_start_time in SubscriptionRelState? >>>> >>>> I was thinking the same. But a problem is that the list of >>>> SubscriptionRelState is refreshed whenever the syncing table state >>>> becomes invalid (table_state_valid = false). I guess we need to >>>> improve these logic including GetSubscriptionNotReadyRelations(). >>> >>> The table states are invalidated on a syscache callback from >>> pg_subscription_rel, which happens roughly speaking when a table >>> finishes the initial sync. So if we're worried about failing tablesync >>> workers relaunching to quickly, this would only be a problem if a >>> tablesync of another table finishes right in that restart window. That >>> doesn't seem a terrible issue to me. >>> >> >> I think the table states are invalidated whenever the table sync >> worker starts, because the table sync worker updates its status of >> pg_subscription_rel and commits it before starting actual copy. So we >> cannot rely on that. I thought we can store last_start_time into >> pg_subscription_rel but it might be overkill. I'm now thinking to >> change GetSubscriptionNotReadyRealtions so that last_start_time in >> SubscriptionRelState is taken over to new list. >> > > Attached the latest patch. It didn't actually necessary to change > GetSubscriptionNotReadyRelations. I just changed the logic refreshing > the sync table state list. > Please give me feedback. > Hmm this might work. Although I was actually wondering if we could store the last start timestamp in the worker shared memory and do some magic with that (ie, not clearing subid and relid and try to then do rate limiting based on subid+relid+timestamp stored in shmem). That would then work same way for the main apply workers as well. It would have the disadvantage that if some tables were consistently failing, no other tables could get synchronized as the amount of workers is limited. OTOH the approach chosen in the patch will first try all tables and only then start rate limiting, not quite sure which is better. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
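A very rough sketch of the shared-memory idea above, using an entirely hypothetical slot layout (it does not claim to match LogicalRepWorker): keep subid, relid, and a launch timestamp in the slot after the worker exits, and rate-limit relaunches based on them.

#include "postgres.h"
#include "utils/timestamp.h"

typedef struct WorkerSlotSketch
{
    bool        in_use;
    Oid         subid;
    Oid         relid;
    TimestampTz launch_time;
} WorkerSlotSketch;

static bool
launched_too_recently(WorkerSlotSketch *slots, int nslots,
                      Oid subid, Oid relid, int retry_interval_ms)
{
    int         i;

    for (i = 0; i < nslots; i++)
    {
        WorkerSlotSketch *w = &slots[i];

        /* a free slot that still records a recent launch for this relation */
        if (!w->in_use && w->subid == subid && w->relid == relid &&
            !TimestampDifferenceExceeds(w->launch_time,
                                        GetCurrentTimestamp(),
                                        retry_interval_ms))
            return true;
    }

    return false;
}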
At Thu, 13 Apr 2017 20:02:33 +0200, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote in <a06d19af-a63a-c546-873c-818b26f4ef10@2ndquadrant.com> > >> Although this is not a problem of this patch, this is a problem > >> generally. > > Huh? We explicitly switch to CacheMemoryContext before pallocing > anything that should survive long term. Ah.. yes, sorry for the noise. -- Kyotaro Horiguchi NTT Open Source Software Center
On Fri, Apr 14, 2017 at 7:09 AM, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote: > On 13/04/17 12:23, Masahiko Sawada wrote: >> On Thu, Apr 13, 2017 at 11:53 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >>> On Wed, Apr 12, 2017 at 11:46 PM, Peter Eisentraut >>> <peter.eisentraut@2ndquadrant.com> wrote: >>>> On 4/12/17 00:48, Masahiko Sawada wrote: >>>>> On Wed, Apr 12, 2017 at 1:28 PM, Peter Eisentraut >>>>>> Perhaps instead of a global last_start_time, we store a per relation >>>>>> last_start_time in SubscriptionRelState? >>>>> >>>>> I was thinking the same. But a problem is that the list of >>>>> SubscriptionRelState is refreshed whenever the syncing table state >>>>> becomes invalid (table_state_valid = false). I guess we need to >>>>> improve these logic including GetSubscriptionNotReadyRelations(). >>>> >>>> The table states are invalidated on a syscache callback from >>>> pg_subscription_rel, which happens roughly speaking when a table >>>> finishes the initial sync. So if we're worried about failing tablesync >>>> workers relaunching to quickly, this would only be a problem if a >>>> tablesync of another table finishes right in that restart window. That >>>> doesn't seem a terrible issue to me. >>>> >>> >>> I think the table states are invalidated whenever the table sync >>> worker starts, because the table sync worker updates its status of >>> pg_subscription_rel and commits it before starting actual copy. So we >>> cannot rely on that. I thought we can store last_start_time into >>> pg_subscription_rel but it might be overkill. I'm now thinking to >>> change GetSubscriptionNotReadyRealtions so that last_start_time in >>> SubscriptionRelState is taken over to new list. >>> >> >> Attached the latest patch. It didn't actually necessary to change >> GetSubscriptionNotReadyRelations. I just changed the logic refreshing >> the sync table state list. >> Please give me feedback. >> > > Hmm this might work. Although I was actually wondering if we could store > the last start timestamp in the worker shared memory and do some magic > with that (ie, not clearing subid and relid and try to then do rate > limiting based on subid+relid+timestamp stored in shmem). That would > then work same way for the main apply workers as well. It would have the > disadvantage that if some tables were consistently failing, no other > tables could get synchronized as the amount of workers is limited. Hmm I guess that it's not a good design that a table sync worker and a apply worker for a relation takes sole possession of a worker slot until it successes. It would bother each other. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On 14/04/17 12:18, Masahiko Sawada wrote: > On Fri, Apr 14, 2017 at 7:09 AM, Petr Jelinek > <petr.jelinek@2ndquadrant.com> wrote: >> On 13/04/17 12:23, Masahiko Sawada wrote: >>> On Thu, Apr 13, 2017 at 11:53 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >>>> On Wed, Apr 12, 2017 at 11:46 PM, Peter Eisentraut >>>> <peter.eisentraut@2ndquadrant.com> wrote: >>>>> On 4/12/17 00:48, Masahiko Sawada wrote: >>>>>> On Wed, Apr 12, 2017 at 1:28 PM, Peter Eisentraut >>>>>>> Perhaps instead of a global last_start_time, we store a per relation >>>>>>> last_start_time in SubscriptionRelState? >>>>>> >>>>>> I was thinking the same. But a problem is that the list of >>>>>> SubscriptionRelState is refreshed whenever the syncing table state >>>>>> becomes invalid (table_state_valid = false). I guess we need to >>>>>> improve these logic including GetSubscriptionNotReadyRelations(). >>>>> >>>>> The table states are invalidated on a syscache callback from >>>>> pg_subscription_rel, which happens roughly speaking when a table >>>>> finishes the initial sync. So if we're worried about failing tablesync >>>>> workers relaunching to quickly, this would only be a problem if a >>>>> tablesync of another table finishes right in that restart window. That >>>>> doesn't seem a terrible issue to me. >>>>> >>>> >>>> I think the table states are invalidated whenever the table sync >>>> worker starts, because the table sync worker updates its status of >>>> pg_subscription_rel and commits it before starting actual copy. So we >>>> cannot rely on that. I thought we can store last_start_time into >>>> pg_subscription_rel but it might be overkill. I'm now thinking to >>>> change GetSubscriptionNotReadyRealtions so that last_start_time in >>>> SubscriptionRelState is taken over to new list. >>>> >>> >>> Attached the latest patch. It didn't actually necessary to change >>> GetSubscriptionNotReadyRelations. I just changed the logic refreshing >>> the sync table state list. >>> Please give me feedback. >>> >> >> Hmm this might work. Although I was actually wondering if we could store >> the last start timestamp in the worker shared memory and do some magic >> with that (ie, not clearing subid and relid and try to then do rate >> limiting based on subid+relid+timestamp stored in shmem). That would >> then work same way for the main apply workers as well. It would have the >> disadvantage that if some tables were consistently failing, no other >> tables could get synchronized as the amount of workers is limited. > > Hmm I guess that it's not a good design that a table sync worker and a > apply worker for a relation takes sole possession of a worker slot > until it successes. It would bother each other. > Not sure what you mean by apply worker for relation, I meant the main subscription apply worker, the "per relation apply worker" is same as tablesync worker. Thinking about the possible failure scenarios more, I guess you are right. Because if there is "global" event for the publisher/subscriber connection (ie network goes down or something) it will affect the main worker as well so it won't launch anything. If there is table specific issue it will affect only that table. The corner-cases with doing it per table where we launch tablesync needlessly are pretty much just a) connection limit on publisher reached, b) slot limit on publisher reached, that's probably okay. 
We might be able to catch these two specifically and do some global limiting based on those (something like your original patch except the start-time global variable would only be set on specific error and stored in shmem), but that does not need to be solved immediately. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Apr 14, 2017 at 9:14 PM, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote: > On 14/04/17 12:18, Masahiko Sawada wrote: >> On Fri, Apr 14, 2017 at 7:09 AM, Petr Jelinek >> <petr.jelinek@2ndquadrant.com> wrote: >>> On 13/04/17 12:23, Masahiko Sawada wrote: >>>> On Thu, Apr 13, 2017 at 11:53 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >>>>> On Wed, Apr 12, 2017 at 11:46 PM, Peter Eisentraut >>>>> <peter.eisentraut@2ndquadrant.com> wrote: >>>>>> On 4/12/17 00:48, Masahiko Sawada wrote: >>>>>>> On Wed, Apr 12, 2017 at 1:28 PM, Peter Eisentraut >>>>>>>> Perhaps instead of a global last_start_time, we store a per relation >>>>>>>> last_start_time in SubscriptionRelState? >>>>>>> >>>>>>> I was thinking the same. But a problem is that the list of >>>>>>> SubscriptionRelState is refreshed whenever the syncing table state >>>>>>> becomes invalid (table_state_valid = false). I guess we need to >>>>>>> improve these logic including GetSubscriptionNotReadyRelations(). >>>>>> >>>>>> The table states are invalidated on a syscache callback from >>>>>> pg_subscription_rel, which happens roughly speaking when a table >>>>>> finishes the initial sync. So if we're worried about failing tablesync >>>>>> workers relaunching to quickly, this would only be a problem if a >>>>>> tablesync of another table finishes right in that restart window. That >>>>>> doesn't seem a terrible issue to me. >>>>>> >>>>> >>>>> I think the table states are invalidated whenever the table sync >>>>> worker starts, because the table sync worker updates its status of >>>>> pg_subscription_rel and commits it before starting actual copy. So we >>>>> cannot rely on that. I thought we can store last_start_time into >>>>> pg_subscription_rel but it might be overkill. I'm now thinking to >>>>> change GetSubscriptionNotReadyRealtions so that last_start_time in >>>>> SubscriptionRelState is taken over to new list. >>>>> >>>> >>>> Attached the latest patch. It didn't actually necessary to change >>>> GetSubscriptionNotReadyRelations. I just changed the logic refreshing >>>> the sync table state list. >>>> Please give me feedback. >>>> >>> >>> Hmm this might work. Although I was actually wondering if we could store >>> the last start timestamp in the worker shared memory and do some magic >>> with that (ie, not clearing subid and relid and try to then do rate >>> limiting based on subid+relid+timestamp stored in shmem). That would >>> then work same way for the main apply workers as well. It would have the >>> disadvantage that if some tables were consistently failing, no other >>> tables could get synchronized as the amount of workers is limited. >> >> Hmm I guess that it's not a good design that a table sync worker and a >> apply worker for a relation takes sole possession of a worker slot >> until it successes. It would bother each other. >> > > Not sure what you mean by apply worker for relation, I meant the main > subscription apply worker, the "per relation apply worker" is same as > tablesync worker. Oops, I made a mistake. I meant just a apply worker. > > Thinking about the possible failure scenarios more, I guess you are > right. Because if there is "global" event for the publisher/subscriber > connection (ie network goes down or something) it will affect the main > worker as well so it won't launch anything. If there is table specific > issue it will affect only that table. 
> > The corner-cases with doing it per table where we launch tablesync > needlessly are pretty much just a) connection limit on publisher > reached, b) slot limit on publisher reached, that's probably okay. We > might be able to catch these two specifically and do some global > limiting based on those (something like your original patch except the > start-time global variable would only be set on specific error and > stored in shmem), but that does not need to be solved immediately. > > -- > Petr Jelinek http://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Training & Services Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On 4/13/17 06:23, Masahiko Sawada wrote: > Attached the latest patch. It didn't actually necessary to change > GetSubscriptionNotReadyRelations. I just changed the logic refreshing > the sync table state list. > Please give me feedback. I think this is a reasonable direction, but do we really need table_sync_retry_interval separate from wal_retrieve_retry_interval? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 4/13/17 18:09, Petr Jelinek wrote: > It would have the > disadvantage that if some tables were consistently failing, no other > tables could get synchronized as the amount of workers is limited. OTOH > the approach chosen in the patch will first try all tables and only then > start rate limiting, not quite sure which is better. I think the latter is better. One of the scenarios cited originally somewhere was one table failing because of an encoding issue. So it's useful to allow other tables to continue. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 4/13/17 06:23, Masahiko Sawada wrote: > Attached the latest patch. It didn't actually necessary to change > GetSubscriptionNotReadyRelations. I just changed the logic refreshing > the sync table state list. I think this was the right direction, but then I got worried about having a loop within a loop to copy over the last start times. If you have very many tables, that could be a big nested loop. Here is an alternative proposal to store the last start times in a hash table. (We also might want to lower the interval for the test suite, because there are intentional failures in there, and this patch or one like it will cause the tests to run a few seconds longer.) -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
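A sketch of the hash-table alternative described above: a process-local dynahash keyed by relation OID that remembers when a sync worker for that relation was last started. Entry layout, names, and the check itself are illustrative, not the actual patch.

#include "postgres.h"
#include "utils/hsearch.h"
#include "utils/timestamp.h"

typedef struct SyncStartTimeEntry
{
    Oid         relid;          /* hash key */
    TimestampTz last_start_time;
} SyncStartTimeEntry;

static HTAB *last_start_times = NULL;

static void
init_last_start_times(void)
{
    HASHCTL     ctl;

    MemSet(&ctl, 0, sizeof(ctl));
    ctl.keysize = sizeof(Oid);
    ctl.entrysize = sizeof(SyncStartTimeEntry);
    last_start_times = hash_create("sync worker start times", 256, &ctl,
                                   HASH_ELEM | HASH_BLOBS);
}

static bool
may_launch_sync_worker(Oid relid, int retry_interval_ms)
{
    SyncStartTimeEntry *entry;
    bool        found;
    TimestampTz now = GetCurrentTimestamp();

    entry = (SyncStartTimeEntry *) hash_search(last_start_times, &relid,
                                               HASH_ENTER, &found);
    if (!found ||
        TimestampDifferenceExceeds(entry->last_start_time, now,
                                   retry_interval_ms))
    {
        entry->last_start_time = now;
        return true;
    }

    return false;
}

The hash table outlives the list of table states, which is why the lifetime question comes up in the replies below.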
On 18/04/17 16:22, Peter Eisentraut wrote: > On 4/13/17 06:23, Masahiko Sawada wrote: >> Attached the latest patch. It didn't actually necessary to change >> GetSubscriptionNotReadyRelations. I just changed the logic refreshing >> the sync table state list. > > I think this was the right direction, but then I got worried about > having a loop within a loop to copy over the last start times. If you > have very many tables, that could be a big nested loop. > > Here is an alternative proposal to store the last start times in a hash > table. > Hmm if we create hashtable for this, I'd say create hashtable for the whole table_states then. The reason why it's list now was that it seemed unnecessary to have hashtable when it will be empty almost always but there is no need to have both hashtable + list IMHO. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Apr 18, 2017 at 11:22 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > On 4/13/17 06:23, Masahiko Sawada wrote: >> Attached the latest patch. It didn't actually necessary to change >> GetSubscriptionNotReadyRelations. I just changed the logic refreshing >> the sync table state list. > > I think this was the right direction, but then I got worried about > having a loop within a loop to copy over the last start times. If you > have very many tables, that could be a big nested loop. > > Here is an alternative proposal to store the last start times in a hash > table. > If we use wal_retrieve_retry_interval for the table sync worker, I think we need to update the documentation as well. Currently the documentation mentions that a bit, but since wal_retrieve_retry_interval will be used at three different places for different reason it would confuse the user. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On 4/18/17 11:59, Petr Jelinek wrote: > Hmm if we create hashtable for this, I'd say create hashtable for the > whole table_states then. The reason why it's list now was that it seemed > unnecessary to have hashtable when it will be empty almost always but > there is no need to have both hashtable + list IMHO. The difference is that we blow away the list of states when the catalog changes, but we keep the hash table with the start times around. We need two things with different life times. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 18/04/17 18:14, Peter Eisentraut wrote: > On 4/18/17 11:59, Petr Jelinek wrote: >> Hmm if we create hashtable for this, I'd say create hashtable for the >> whole table_states then. The reason why it's list now was that it seemed >> unnecessary to have hashtable when it will be empty almost always but >> there is no need to have both hashtable + list IMHO. > > The difference is that we blow away the list of states when the catalog > changes, but we keep the hash table with the start times around. We > need two things with different life times. > Why can't we just update the hashtable based on the catalog? I mean once the record is not needed in the list, the table has been synced so there is no need for the timestamp either since we'll not try to start the worker again. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
At Tue, 18 Apr 2017 18:40:56 +0200, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote in <f64d87d1-bef3-5e3e-a999-ba302816a0ee@2ndquadrant.com> > On 18/04/17 18:14, Peter Eisentraut wrote: > > On 4/18/17 11:59, Petr Jelinek wrote: > >> Hmm if we create hashtable for this, I'd say create hashtable for the > >> whole table_states then. The reason why it's list now was that it seemed > >> unnecessary to have hashtable when it will be empty almost always but > >> there is no need to have both hashtable + list IMHO. I understant that but I also don't like the frequent palloc/pfree in long-lasting context and double loop like Peter. > > The difference is that we blow away the list of states when the catalog > > changes, but we keep the hash table with the start times around. We > > need two things with different life times. On the other hand, hash seems overdone. Addition to that, the hash-version leaks stale entries while subscriptions are modified. But vacuuming them costs high. > Why can't we just update the hashtable based on the catalog? I mean once > the record is not needed in the list, the table has been synced so there > is no need for the timestamp either since we'll not try to start the > worker again. Considering the anticipated complexity when many subscriptions exist in the list, and complexity to remove stale entries in the hash, I'm inclining toward poroposing just to add 'worker_start' in pg_subscription_rel. We already have the similars in pg_stat_activity and pg_stat_replication. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
On Wed, Apr 19, 2017 at 5:12 PM, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > At Tue, 18 Apr 2017 18:40:56 +0200, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote in <f64d87d1-bef3-5e3e-a999-ba302816a0ee@2ndquadrant.com> >> On 18/04/17 18:14, Peter Eisentraut wrote: >> > On 4/18/17 11:59, Petr Jelinek wrote: >> >> Hmm if we create hashtable for this, I'd say create hashtable for the >> >> whole table_states then. The reason why it's list now was that it seemed >> >> unnecessary to have hashtable when it will be empty almost always but >> >> there is no need to have both hashtable + list IMHO. > > I understant that but I also don't like the frequent palloc/pfree > in long-lasting context and double loop like Peter. > >> > The difference is that we blow away the list of states when the catalog >> > changes, but we keep the hash table with the start times around. We >> > need two things with different life times. > > On the other hand, hash seems overdone. Addition to that, the > hash-version leaks stale entries while subscriptions are > modified. But vacuuming them costs high. > >> Why can't we just update the hashtable based on the catalog? I mean once >> the record is not needed in the list, the table has been synced so there >> is no need for the timestamp either since we'll not try to start the >> worker again. I guess the table sync worker for the same table could need to be started again. For example, please image a case where the table belonging to the publication is removed from it and the corresponding subscription is refreshed, and then the table is added to it again. We have the record of the table with timestamp in the hash table when the table sync in the first time, but the table sync after refreshed could have to wait for the interval. > > Considering the anticipated complexity when many subscriptions > exist in the list, and complexity to remove stale entries in the > hash, I'm inclining toward poroposing just to add 'worker_start' > in pg_subscription_rel. We already have the similars in > pg_stat_activity and pg_stat_replication. > I was thinking the same. But I'm concerned last start time of table sync worker is not kind of useful information for the user and we already have similar value in pg_stat_activity (pg_stat_replication.backend_start is actually taken from pg_stat_activity.backend_start). I'm not sure whether it's good idea to show the slightly shifed timestamps having same meaning in two system view. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On 19/04/17 14:42, Masahiko Sawada wrote: > On Wed, Apr 19, 2017 at 5:12 PM, Kyotaro HORIGUCHI > <horiguchi.kyotaro@lab.ntt.co.jp> wrote: >> At Tue, 18 Apr 2017 18:40:56 +0200, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote in <f64d87d1-bef3-5e3e-a999-ba302816a0ee@2ndquadrant.com> >>> On 18/04/17 18:14, Peter Eisentraut wrote: >>>> On 4/18/17 11:59, Petr Jelinek wrote: >>>>> Hmm if we create hashtable for this, I'd say create hashtable for the >>>>> whole table_states then. The reason why it's list now was that it seemed >>>>> unnecessary to have hashtable when it will be empty almost always but >>>>> there is no need to have both hashtable + list IMHO. >> >> I understant that but I also don't like the frequent palloc/pfree >> in long-lasting context and double loop like Peter. >> >>>> The difference is that we blow away the list of states when the catalog >>>> changes, but we keep the hash table with the start times around. We >>>> need two things with different life times. >> >> On the other hand, hash seems overdone. Addition to that, the >> hash-version leaks stale entries while subscriptions are >> modified. But vacuuming them costs high. >> >>> Why can't we just update the hashtable based on the catalog? I mean once >>> the record is not needed in the list, the table has been synced so there >>> is no need for the timestamp either since we'll not try to start the >>> worker again. > > I guess the table sync worker for the same table could need to be > started again. For example, please image a case where the table > belonging to the publication is removed from it and the corresponding > subscription is refreshed, and then the table is added to it again. We > have the record of the table with timestamp in the hash table when the > table sync in the first time, but the table sync after refreshed could > have to wait for the interval. > But why do we want to wait in such case where user has explicitly requested refresh? -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Apr 19, 2017 at 10:07 PM, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote: > On 19/04/17 14:42, Masahiko Sawada wrote: >> On Wed, Apr 19, 2017 at 5:12 PM, Kyotaro HORIGUCHI >> <horiguchi.kyotaro@lab.ntt.co.jp> wrote: >>> At Tue, 18 Apr 2017 18:40:56 +0200, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote in <f64d87d1-bef3-5e3e-a999-ba302816a0ee@2ndquadrant.com> >>>> On 18/04/17 18:14, Peter Eisentraut wrote: >>>>> On 4/18/17 11:59, Petr Jelinek wrote: >>>>>> Hmm if we create hashtable for this, I'd say create hashtable for the >>>>>> whole table_states then. The reason why it's list now was that it seemed >>>>>> unnecessary to have hashtable when it will be empty almost always but >>>>>> there is no need to have both hashtable + list IMHO. >>> >>> I understant that but I also don't like the frequent palloc/pfree >>> in long-lasting context and double loop like Peter. >>> >>>>> The difference is that we blow away the list of states when the catalog >>>>> changes, but we keep the hash table with the start times around. We >>>>> need two things with different life times. >>> >>> On the other hand, hash seems overdone. Addition to that, the >>> hash-version leaks stale entries while subscriptions are >>> modified. But vacuuming them costs high. >>> >>>> Why can't we just update the hashtable based on the catalog? I mean once >>>> the record is not needed in the list, the table has been synced so there >>>> is no need for the timestamp either since we'll not try to start the >>>> worker again. >> >> I guess the table sync worker for the same table could need to be >> started again. For example, please image a case where the table >> belonging to the publication is removed from it and the corresponding >> subscription is refreshed, and then the table is added to it again. We >> have the record of the table with timestamp in the hash table when the >> table sync in the first time, but the table sync after refreshed could >> have to wait for the interval. >> > > But why do we want to wait in such case where user has explicitly > requested refresh? > Yeah, sorry, I meant that we don't want to wait but cannot launch the tablesync worker in such case. But after more thought, the latest patch Peter proposed has the same problem. Perhaps we need to scan always whole pg_subscription_rel and remove the entry if the corresponding table get synced. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On 19/04/17 15:57, Masahiko Sawada wrote: > On Wed, Apr 19, 2017 at 10:07 PM, Petr Jelinek > <petr.jelinek@2ndquadrant.com> wrote: >> On 19/04/17 14:42, Masahiko Sawada wrote: >>> On Wed, Apr 19, 2017 at 5:12 PM, Kyotaro HORIGUCHI >>> <horiguchi.kyotaro@lab.ntt.co.jp> wrote: >>>> At Tue, 18 Apr 2017 18:40:56 +0200, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote in <f64d87d1-bef3-5e3e-a999-ba302816a0ee@2ndquadrant.com> >>>>> On 18/04/17 18:14, Peter Eisentraut wrote: >>>>>> On 4/18/17 11:59, Petr Jelinek wrote: >>>>>>> Hmm if we create hashtable for this, I'd say create hashtable for the >>>>>>> whole table_states then. The reason why it's list now was that it seemed >>>>>>> unnecessary to have hashtable when it will be empty almost always but >>>>>>> there is no need to have both hashtable + list IMHO. >>>> >>>> I understant that but I also don't like the frequent palloc/pfree >>>> in long-lasting context and double loop like Peter. >>>> >>>>>> The difference is that we blow away the list of states when the catalog >>>>>> changes, but we keep the hash table with the start times around. We >>>>>> need two things with different life times. >>>> >>>> On the other hand, hash seems overdone. Addition to that, the >>>> hash-version leaks stale entries while subscriptions are >>>> modified. But vacuuming them costs high. >>>> >>>>> Why can't we just update the hashtable based on the catalog? I mean once >>>>> the record is not needed in the list, the table has been synced so there >>>>> is no need for the timestamp either since we'll not try to start the >>>>> worker again. >>> >>> I guess the table sync worker for the same table could need to be >>> started again. For example, please image a case where the table >>> belonging to the publication is removed from it and the corresponding >>> subscription is refreshed, and then the table is added to it again. We >>> have the record of the table with timestamp in the hash table when the >>> table sync in the first time, but the table sync after refreshed could >>> have to wait for the interval. >>> >> >> But why do we want to wait in such case where user has explicitly >> requested refresh? >> > > Yeah, sorry, I meant that we don't want to wait but cannot launch the > tablesync worker in such case. > > But after more thought, the latest patch Peter proposed has the same > problem. Perhaps we need to scan always whole pg_subscription_rel and > remove the entry if the corresponding table get synced. > Yes that's what I mean by "Why can't we just update the hashtable based on the catalog". And if we do that then I don't understand why do we need both hastable and linked list if we need to update both based on catalog reads anyway. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Apr 20, 2017 at 12:30 AM, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote: > On 19/04/17 15:57, Masahiko Sawada wrote: >> On Wed, Apr 19, 2017 at 10:07 PM, Petr Jelinek >> <petr.jelinek@2ndquadrant.com> wrote: >>> On 19/04/17 14:42, Masahiko Sawada wrote: >>>> On Wed, Apr 19, 2017 at 5:12 PM, Kyotaro HORIGUCHI >>>> <horiguchi.kyotaro@lab.ntt.co.jp> wrote: >>>>> At Tue, 18 Apr 2017 18:40:56 +0200, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote in <f64d87d1-bef3-5e3e-a999-ba302816a0ee@2ndquadrant.com> >>>>>> On 18/04/17 18:14, Peter Eisentraut wrote: >>>>>>> On 4/18/17 11:59, Petr Jelinek wrote: >>>>>>>> Hmm if we create hashtable for this, I'd say create hashtable for the >>>>>>>> whole table_states then. The reason why it's list now was that it seemed >>>>>>>> unnecessary to have hashtable when it will be empty almost always but >>>>>>>> there is no need to have both hashtable + list IMHO. >>>>> >>>>> I understant that but I also don't like the frequent palloc/pfree >>>>> in long-lasting context and double loop like Peter. >>>>> >>>>>>> The difference is that we blow away the list of states when the catalog >>>>>>> changes, but we keep the hash table with the start times around. We >>>>>>> need two things with different life times. >>>>> >>>>> On the other hand, hash seems overdone. Addition to that, the >>>>> hash-version leaks stale entries while subscriptions are >>>>> modified. But vacuuming them costs high. >>>>> >>>>>> Why can't we just update the hashtable based on the catalog? I mean once >>>>>> the record is not needed in the list, the table has been synced so there >>>>>> is no need for the timestamp either since we'll not try to start the >>>>>> worker again. >>>> >>>> I guess the table sync worker for the same table could need to be >>>> started again. For example, please image a case where the table >>>> belonging to the publication is removed from it and the corresponding >>>> subscription is refreshed, and then the table is added to it again. We >>>> have the record of the table with timestamp in the hash table when the >>>> table sync in the first time, but the table sync after refreshed could >>>> have to wait for the interval. >>>> >>> >>> But why do we want to wait in such case where user has explicitly >>> requested refresh? >>> >> >> Yeah, sorry, I meant that we don't want to wait but cannot launch the >> tablesync worker in such case. >> >> But after more thought, the latest patch Peter proposed has the same >> problem. Perhaps we need to scan always whole pg_subscription_rel and >> remove the entry if the corresponding table get synced. >> > > Yes that's what I mean by "Why can't we just update the hashtable based > on the catalog". And if we do that then I don't understand why do we > need both hastable and linked list if we need to update both based on > catalog reads anyway. Thanks, I've now understood correctly. Yes, I think you're right. If we update the hash table based on the catalog whenever table state is invalidated, we don't need to have both hash table and list. BTW, in current HEAD the SUBREL_STATE_SYNCWAIT is not stored in the pg_subscription_catalog. So the following condition seems not correct. We should use "syncworker->relstate == SUBSCRIPTION_STATE_SYNCWAIT" instead? /* * There is a worker synchronizing the relation and waiting for * apply to do something. */ if (syncworker && rstate->state == SUBREL_STATE_SYNCWAIT) { /* * There are three possible synchronization situations here. 
* * a) Apply is in front of the table sync: We tell the table * sync to CATCHUP. * * b) Apply is behind the table sync: We tell the table sync * to mark the table as SYNCDONE and finish. * c) Apply and table sync are at the same position: We tell * table sync to mark the table as READY and finish. * * In any case we'll need to wait for table sync to change * the state in catalog and only then continue ourselves. */ Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On 20/04/17 06:21, Masahiko Sawada wrote: > On Thu, Apr 20, 2017 at 12:30 AM, Petr Jelinek > <petr.jelinek@2ndquadrant.com> wrote: >> On 19/04/17 15:57, Masahiko Sawada wrote: >>> On Wed, Apr 19, 2017 at 10:07 PM, Petr Jelinek >>> <petr.jelinek@2ndquadrant.com> wrote: >>>> On 19/04/17 14:42, Masahiko Sawada wrote: >>>>> On Wed, Apr 19, 2017 at 5:12 PM, Kyotaro HORIGUCHI >>>>> <horiguchi.kyotaro@lab.ntt.co.jp> wrote: >>>>>> At Tue, 18 Apr 2017 18:40:56 +0200, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote in <f64d87d1-bef3-5e3e-a999-ba302816a0ee@2ndquadrant.com> >>>>>>> On 18/04/17 18:14, Peter Eisentraut wrote: >>>>>>>> On 4/18/17 11:59, Petr Jelinek wrote: >>>>>>>>> Hmm if we create hashtable for this, I'd say create hashtable for the >>>>>>>>> whole table_states then. The reason why it's list now was that it seemed >>>>>>>>> unnecessary to have hashtable when it will be empty almost always but >>>>>>>>> there is no need to have both hashtable + list IMHO. >>>>>> >>>>>> I understant that but I also don't like the frequent palloc/pfree >>>>>> in long-lasting context and double loop like Peter. >>>>>> >>>>>>>> The difference is that we blow away the list of states when the catalog >>>>>>>> changes, but we keep the hash table with the start times around. We >>>>>>>> need two things with different life times. >>>>>> >>>>>> On the other hand, hash seems overdone. Addition to that, the >>>>>> hash-version leaks stale entries while subscriptions are >>>>>> modified. But vacuuming them costs high. >>>>>> >>>>>>> Why can't we just update the hashtable based on the catalog? I mean once >>>>>>> the record is not needed in the list, the table has been synced so there >>>>>>> is no need for the timestamp either since we'll not try to start the >>>>>>> worker again. >>>>> >>>>> I guess the table sync worker for the same table could need to be >>>>> started again. For example, please image a case where the table >>>>> belonging to the publication is removed from it and the corresponding >>>>> subscription is refreshed, and then the table is added to it again. We >>>>> have the record of the table with timestamp in the hash table when the >>>>> table sync in the first time, but the table sync after refreshed could >>>>> have to wait for the interval. >>>>> >>>> >>>> But why do we want to wait in such case where user has explicitly >>>> requested refresh? >>>> >>> >>> Yeah, sorry, I meant that we don't want to wait but cannot launch the >>> tablesync worker in such case. >>> >>> But after more thought, the latest patch Peter proposed has the same >>> problem. Perhaps we need to scan always whole pg_subscription_rel and >>> remove the entry if the corresponding table get synced. >>> >> >> Yes that's what I mean by "Why can't we just update the hashtable based >> on the catalog". And if we do that then I don't understand why do we >> need both hastable and linked list if we need to update both based on >> catalog reads anyway. > > Thanks, I've now understood correctly. Yes, I think you're right. If > we update the hash table based on the catalog whenever table state is > invalidated, we don't need to have both hash table and list. > > BTW, in current HEAD the SUBREL_STATE_SYNCWAIT is not stored in the > pg_subscription_catalog. So the following condition seems not correct. > We should use "syncworker->relstate == SUBSCRIPTION_STATE_SYNCWAIT" > instead? > > /* > * There is a worker synchronizing the relation and waiting for > * apply to do something. 
> */ > if (syncworker && rstate->state == SUBREL_STATE_SYNCWAIT) > { > /* > * There are three possible synchronization situations here. > * > * a) Apply is in front of the table sync: We tell the table > * sync to CATCHUP. > * > * b) Apply is behind the table sync: We tell the table sync > * to mark the table as SYNCDONE and finish. > > * c) Apply and table sync are at the same position: We tell > * table sync to mark the table as READY and finish. > * > * In any case we'll need to wait for table sync to change > * the state in catalog and only then continue ourselves. > */ > Good catch. Although it's not comment that's wrong, it's the if. We should not compare rstate->state but syncworker->relstate. The reason why SUBREL_STATE_SYNCWAIT is not stored in catalog is that we'd have to commit from sync worker which could leave table in partially synchronized state on crash without the ability to resume afterwards (since the slot on other side will be gone). -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
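To make the suggested fix concrete, here is a hedged sketch of the comparison Petr describes, testing the sync worker's in-memory relstate instead of the catalog-derived rstate->state. The surrounding apply-worker loop is abbreviated, and the explicit spinlock is only one way to read the shared field; a later message in this thread points out that the relmutex is not held at this point and prefers the previously copied value instead.

    /*
     * Sketch only: syncworker is the LogicalRepWorker found earlier in the
     * loop; the body of the if is the existing code and is omitted here.
     */
    if (syncworker)
    {
        char        syncstate;

        /* read the shared state under the worker's spinlock */
        SpinLockAcquire(&syncworker->relmutex);
        syncstate = syncworker->relstate;
        SpinLockRelease(&syncworker->relmutex);

        if (syncstate == SUBREL_STATE_SYNCWAIT)
        {
            /* tell the sync worker to CATCHUP / SYNCDONE / READY, as before */
        }
    }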
On Thu, Apr 20, 2017 at 8:43 PM, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote: > On 20/04/17 06:21, Masahiko Sawada wrote: >> On Thu, Apr 20, 2017 at 12:30 AM, Petr Jelinek >> <petr.jelinek@2ndquadrant.com> wrote: >>> On 19/04/17 15:57, Masahiko Sawada wrote: >>>> On Wed, Apr 19, 2017 at 10:07 PM, Petr Jelinek >>>> <petr.jelinek@2ndquadrant.com> wrote: >>>>> On 19/04/17 14:42, Masahiko Sawada wrote: >>>>>> On Wed, Apr 19, 2017 at 5:12 PM, Kyotaro HORIGUCHI >>>>>> <horiguchi.kyotaro@lab.ntt.co.jp> wrote: >>>>>>> At Tue, 18 Apr 2017 18:40:56 +0200, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote in <f64d87d1-bef3-5e3e-a999-ba302816a0ee@2ndquadrant.com> >>>>>>>> On 18/04/17 18:14, Peter Eisentraut wrote: >>>>>>>>> On 4/18/17 11:59, Petr Jelinek wrote: >>>>>>>>>> Hmm if we create hashtable for this, I'd say create hashtable for the >>>>>>>>>> whole table_states then. The reason why it's list now was that it seemed >>>>>>>>>> unnecessary to have hashtable when it will be empty almost always but >>>>>>>>>> there is no need to have both hashtable + list IMHO. >>>>>>> >>>>>>> I understant that but I also don't like the frequent palloc/pfree >>>>>>> in long-lasting context and double loop like Peter. >>>>>>> >>>>>>>>> The difference is that we blow away the list of states when the catalog >>>>>>>>> changes, but we keep the hash table with the start times around. We >>>>>>>>> need two things with different life times. >>>>>>> >>>>>>> On the other hand, hash seems overdone. Addition to that, the >>>>>>> hash-version leaks stale entries while subscriptions are >>>>>>> modified. But vacuuming them costs high. >>>>>>> >>>>>>>> Why can't we just update the hashtable based on the catalog? I mean once >>>>>>>> the record is not needed in the list, the table has been synced so there >>>>>>>> is no need for the timestamp either since we'll not try to start the >>>>>>>> worker again. >>>>>> >>>>>> I guess the table sync worker for the same table could need to be >>>>>> started again. For example, please image a case where the table >>>>>> belonging to the publication is removed from it and the corresponding >>>>>> subscription is refreshed, and then the table is added to it again. We >>>>>> have the record of the table with timestamp in the hash table when the >>>>>> table sync in the first time, but the table sync after refreshed could >>>>>> have to wait for the interval. >>>>>> >>>>> >>>>> But why do we want to wait in such case where user has explicitly >>>>> requested refresh? >>>>> >>>> >>>> Yeah, sorry, I meant that we don't want to wait but cannot launch the >>>> tablesync worker in such case. >>>> >>>> But after more thought, the latest patch Peter proposed has the same >>>> problem. Perhaps we need to scan always whole pg_subscription_rel and >>>> remove the entry if the corresponding table get synced. >>>> >>> >>> Yes that's what I mean by "Why can't we just update the hashtable based >>> on the catalog". And if we do that then I don't understand why do we >>> need both hastable and linked list if we need to update both based on >>> catalog reads anyway. >> >> Thanks, I've now understood correctly. Yes, I think you're right. If >> we update the hash table based on the catalog whenever table state is >> invalidated, we don't need to have both hash table and list. >> >> BTW, in current HEAD the SUBREL_STATE_SYNCWAIT is not stored in the >> pg_subscription_catalog. So the following condition seems not correct. >> We should use "syncworker->relstate == SUBSCRIPTION_STATE_SYNCWAIT" >> instead? 
>> >> /* >> * There is a worker synchronizing the relation and waiting for >> * apply to do something. >> */ >> if (syncworker && rstate->state == SUBREL_STATE_SYNCWAIT) >> { >> /* >> * There are three possible synchronization situations here. >> * >> * a) Apply is in front of the table sync: We tell the table >> * sync to CATCHUP. >> * >> * b) Apply is behind the table sync: We tell the table sync >> * to mark the table as SYNCDONE and finish. >> >> * c) Apply and table sync are at the same position: We tell >> * table sync to mark the table as READY and finish. >> * >> * In any case we'll need to wait for table sync to change >> * the state in catalog and only then continue ourselves. >> */ >> > > Good catch. Although it's not comment that's wrong, it's the if. We > should not compare rstate->state but syncworker->relstate. I've attached a patch to fix this bug. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Hello, At Thu, 20 Apr 2017 13:21:14 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoDrw0OaHE=oVRRhQX248kjJ7W+1ViM3K76aP46HnHJsnQ@mail.gmail.com> > On Thu, Apr 20, 2017 at 12:30 AM, Petr Jelinek > <petr.jelinek@2ndquadrant.com> wrote: > > On 19/04/17 15:57, Masahiko Sawada wrote: > >> On Wed, Apr 19, 2017 at 10:07 PM, Petr Jelinek > >> <petr.jelinek@2ndquadrant.com> wrote: > >>> On 19/04/17 14:42, Masahiko Sawada wrote: > >>>> On Wed, Apr 19, 2017 at 5:12 PM, Kyotaro HORIGUCHI > >>>> <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > >>>>> At Tue, 18 Apr 2017 18:40:56 +0200, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote in <f64d87d1-bef3-5e3e-a999-ba302816a0ee@2ndquadrant.com> > >>>>>> On 18/04/17 18:14, Peter Eisentraut wrote: > >>>>>>> On 4/18/17 11:59, Petr Jelinek wrote: > >>>>>>>> Hmm if we create hashtable for this, I'd say create hashtable for the > >>>>>>>> whole table_states then. The reason why it's list now was that it seemed > >>>>>>>> unnecessary to have hashtable when it will be empty almost always but > >>>>>>>> there is no need to have both hashtable + list IMHO. > >>>>> > >>>>> I understant that but I also don't like the frequent palloc/pfree > >>>>> in long-lasting context and double loop like Peter. > >>>>> > >>>>>>> The difference is that we blow away the list of states when the catalog > >>>>>>> changes, but we keep the hash table with the start times around. We > >>>>>>> need two things with different life times. > >>>>> > >>>>> On the other hand, hash seems overdone. Addition to that, the > >>>>> hash-version leaks stale entries while subscriptions are > >>>>> modified. But vacuuming them costs high. > >>>>> > >>>>>> Why can't we just update the hashtable based on the catalog? I mean once > >>>>>> the record is not needed in the list, the table has been synced so there > >>>>>> is no need for the timestamp either since we'll not try to start the > >>>>>> worker again. > >>>> > >>>> I guess the table sync worker for the same table could need to be > >>>> started again. For example, please image a case where the table > >>>> belonging to the publication is removed from it and the corresponding > >>>> subscription is refreshed, and then the table is added to it again. We > >>>> have the record of the table with timestamp in the hash table when the > >>>> table sync in the first time, but the table sync after refreshed could > >>>> have to wait for the interval. > >>>> > >>> > >>> But why do we want to wait in such case where user has explicitly > >>> requested refresh? > >>> > >> > >> Yeah, sorry, I meant that we don't want to wait but cannot launch the > >> tablesync worker in such case. > >> > >> But after more thought, the latest patch Peter proposed has the same > >> problem. Perhaps we need to scan always whole pg_subscription_rel and > >> remove the entry if the corresponding table get synced. > >> > > > > Yes that's what I mean by "Why can't we just update the hashtable based > > on the catalog". And if we do that then I don't understand why do we > > need both hastable and linked list if we need to update both based on > > catalog reads anyway. > > Thanks, I've now understood correctly. Yes, I think you're right. If > we update the hash table based on the catalog whenever table state is > invalidated, we don't need to have both hash table and list. Ah, ok. The patch from Peter still generating and replacing the content of the list. The attached patch stores everything into SubscriptionRelState. 
Oppositte to my anticiation, the hash can be efectively kept small and removed. > BTW, in current HEAD the SUBREL_STATE_SYNCWAIT is not stored in the > pg_subscription_catalog. So the following condition seems not correct. > We should use "syncworker->relstate == SUBSCRIPTION_STATE_SYNCWAIT" > instead? regards, -- Kyotaro Horiguchi NTT Open Source Software Center
On 21/04/17 04:38, Masahiko Sawada wrote: > On Thu, Apr 20, 2017 at 8:43 PM, Petr Jelinek > <petr.jelinek@2ndquadrant.com> wrote: >> On 20/04/17 06:21, Masahiko Sawada wrote: >>> On Thu, Apr 20, 2017 at 12:30 AM, Petr Jelinek >>> <petr.jelinek@2ndquadrant.com> wrote: >>>> On 19/04/17 15:57, Masahiko Sawada wrote: >>>>> On Wed, Apr 19, 2017 at 10:07 PM, Petr Jelinek >>>>> <petr.jelinek@2ndquadrant.com> wrote: >>>>>> On 19/04/17 14:42, Masahiko Sawada wrote: >>>>>>> On Wed, Apr 19, 2017 at 5:12 PM, Kyotaro HORIGUCHI >>>>>>> <horiguchi.kyotaro@lab.ntt.co.jp> wrote: >>>>>>>> At Tue, 18 Apr 2017 18:40:56 +0200, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote in <f64d87d1-bef3-5e3e-a999-ba302816a0ee@2ndquadrant.com> >>>>>>>>> On 18/04/17 18:14, Peter Eisentraut wrote: >>>>>>>>>> On 4/18/17 11:59, Petr Jelinek wrote: >>>>>>>>>>> Hmm if we create hashtable for this, I'd say create hashtable for the >>>>>>>>>>> whole table_states then. The reason why it's list now was that it seemed >>>>>>>>>>> unnecessary to have hashtable when it will be empty almost always but >>>>>>>>>>> there is no need to have both hashtable + list IMHO. >>>>>>>> >>>>>>>> I understant that but I also don't like the frequent palloc/pfree >>>>>>>> in long-lasting context and double loop like Peter. >>>>>>>> >>>>>>>>>> The difference is that we blow away the list of states when the catalog >>>>>>>>>> changes, but we keep the hash table with the start times around. We >>>>>>>>>> need two things with different life times. >>>>>>>> >>>>>>>> On the other hand, hash seems overdone. Addition to that, the >>>>>>>> hash-version leaks stale entries while subscriptions are >>>>>>>> modified. But vacuuming them costs high. >>>>>>>> >>>>>>>>> Why can't we just update the hashtable based on the catalog? I mean once >>>>>>>>> the record is not needed in the list, the table has been synced so there >>>>>>>>> is no need for the timestamp either since we'll not try to start the >>>>>>>>> worker again. >>>>>>> >>>>>>> I guess the table sync worker for the same table could need to be >>>>>>> started again. For example, please image a case where the table >>>>>>> belonging to the publication is removed from it and the corresponding >>>>>>> subscription is refreshed, and then the table is added to it again. We >>>>>>> have the record of the table with timestamp in the hash table when the >>>>>>> table sync in the first time, but the table sync after refreshed could >>>>>>> have to wait for the interval. >>>>>>> >>>>>> >>>>>> But why do we want to wait in such case where user has explicitly >>>>>> requested refresh? >>>>>> >>>>> >>>>> Yeah, sorry, I meant that we don't want to wait but cannot launch the >>>>> tablesync worker in such case. >>>>> >>>>> But after more thought, the latest patch Peter proposed has the same >>>>> problem. Perhaps we need to scan always whole pg_subscription_rel and >>>>> remove the entry if the corresponding table get synced. >>>>> >>>> >>>> Yes that's what I mean by "Why can't we just update the hashtable based >>>> on the catalog". And if we do that then I don't understand why do we >>>> need both hastable and linked list if we need to update both based on >>>> catalog reads anyway. >>> >>> Thanks, I've now understood correctly. Yes, I think you're right. If >>> we update the hash table based on the catalog whenever table state is >>> invalidated, we don't need to have both hash table and list. >>> >>> BTW, in current HEAD the SUBREL_STATE_SYNCWAIT is not stored in the >>> pg_subscription_catalog. 
So the following condition seems not correct. >>> We should use "syncworker->relstate == SUBSCRIPTION_STATE_SYNCWAIT" >>> instead? >>> >>> /* >>> * There is a worker synchronizing the relation and waiting for >>> * apply to do something. >>> */ >>> if (syncworker && rstate->state == SUBREL_STATE_SYNCWAIT) >>> { >>> /* >>> * There are three possible synchronization situations here. >>> * >>> * a) Apply is in front of the table sync: We tell the table >>> * sync to CATCHUP. >>> * >>> * b) Apply is behind the table sync: We tell the table sync >>> * to mark the table as SYNCDONE and finish. >>> >>> * c) Apply and table sync are at the same position: We tell >>> * table sync to mark the table as READY and finish. >>> * >>> * In any case we'll need to wait for table sync to change >>> * the state in catalog and only then continue ourselves. >>> */ >>> >> >> Good catch. Although it's not comment that's wrong, it's the if. We >> should not compare rstate->state but syncworker->relstate. > > I've attached a patch to fix this bug. > Rereading the code again, it's actually not bug as we update the rstate to what syncworker says, but it's obviously confusing so probably still worth to commit that. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 21/04/17 10:33, Kyotaro HORIGUCHI wrote: > Hello, > > At Thu, 20 Apr 2017 13:21:14 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoDrw0OaHE=oVRRhQX248kjJ7W+1ViM3K76aP46HnHJsnQ@mail.gmail.com> >> On Thu, Apr 20, 2017 at 12:30 AM, Petr Jelinek >> <petr.jelinek@2ndquadrant.com> wrote: >>> On 19/04/17 15:57, Masahiko Sawada wrote: >>>> On Wed, Apr 19, 2017 at 10:07 PM, Petr Jelinek >>>> <petr.jelinek@2ndquadrant.com> wrote: >>>>> On 19/04/17 14:42, Masahiko Sawada wrote: >>>>>> On Wed, Apr 19, 2017 at 5:12 PM, Kyotaro HORIGUCHI >>>>>> <horiguchi.kyotaro@lab.ntt.co.jp> wrote: >>>>>>> At Tue, 18 Apr 2017 18:40:56 +0200, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote in <f64d87d1-bef3-5e3e-a999-ba302816a0ee@2ndquadrant.com> >>>>>>>> On 18/04/17 18:14, Peter Eisentraut wrote: >>>>>>>>> On 4/18/17 11:59, Petr Jelinek wrote: >>>>>>>>>> Hmm if we create hashtable for this, I'd say create hashtable for the >>>>>>>>>> whole table_states then. The reason why it's list now was that it seemed >>>>>>>>>> unnecessary to have hashtable when it will be empty almost always but >>>>>>>>>> there is no need to have both hashtable + list IMHO. >>>>>>> >>>>>>> I understant that but I also don't like the frequent palloc/pfree >>>>>>> in long-lasting context and double loop like Peter. >>>>>>> >>>>>>>>> The difference is that we blow away the list of states when the catalog >>>>>>>>> changes, but we keep the hash table with the start times around. We >>>>>>>>> need two things with different life times. >>>>>>> >>>>>>> On the other hand, hash seems overdone. Addition to that, the >>>>>>> hash-version leaks stale entries while subscriptions are >>>>>>> modified. But vacuuming them costs high. >>>>>>> >>>>>>>> Why can't we just update the hashtable based on the catalog? I mean once >>>>>>>> the record is not needed in the list, the table has been synced so there >>>>>>>> is no need for the timestamp either since we'll not try to start the >>>>>>>> worker again. >>>>>> >>>>>> I guess the table sync worker for the same table could need to be >>>>>> started again. For example, please image a case where the table >>>>>> belonging to the publication is removed from it and the corresponding >>>>>> subscription is refreshed, and then the table is added to it again. We >>>>>> have the record of the table with timestamp in the hash table when the >>>>>> table sync in the first time, but the table sync after refreshed could >>>>>> have to wait for the interval. >>>>>> >>>>> >>>>> But why do we want to wait in such case where user has explicitly >>>>> requested refresh? >>>>> >>>> >>>> Yeah, sorry, I meant that we don't want to wait but cannot launch the >>>> tablesync worker in such case. >>>> >>>> But after more thought, the latest patch Peter proposed has the same >>>> problem. Perhaps we need to scan always whole pg_subscription_rel and >>>> remove the entry if the corresponding table get synced. >>>> >>> >>> Yes that's what I mean by "Why can't we just update the hashtable based >>> on the catalog". And if we do that then I don't understand why do we >>> need both hastable and linked list if we need to update both based on >>> catalog reads anyway. >> >> Thanks, I've now understood correctly. Yes, I think you're right. If >> we update the hash table based on the catalog whenever table state is >> invalidated, we don't need to have both hash table and list. > > Ah, ok. The patch from Peter still generating and replacing the > content of the list. 
The attached patch stores everything into > SubscriptionRelState. Oppositte to my anticiation, the hash can > be efectively kept small and removed. > Yeah this does not look bad in general, I have some comments about the patch though: > + if (!found) { > + relstate->relid = subrel->srrelid; > + relstate->state = subrel->srsubstate; > + relstate->lsn = subrel->srsublsn; > + relstate->last_worker_start = 0; > + } > + I think we should document why this is done this way - I mean it's not very obvious that it's okay to update just when not found and why and even less obvious that it would be in fact wrong to update already existing entry. I also think that in it's current form the GetSubscriptionNotReadyRelations should be moved to tablesync.c as it's no longer generically usable function but it's behavior depends heavily on how it's used in tablesync.c. I also think we can't add the new fields to SubscriptionRelState directly as that's used by catalog access functions as well. We need to have some struct that's local to tablesync which contains SubscriptionRelState as one field, and the two new fields probably. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
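A sketch of the kind of tablesync-local wrapper being asked for here: keep the catalog-facing SubscriptionRelState untouched and hang the worker-local bookkeeping off it. The extra field names follow the revision that shows up later in this thread (wstate->rs, last_worker_start, alive) but are otherwise illustrative, not a quote from any posted patch. Keeping the relid at the start of the embedded struct lets it double as the dynahash key.

#include "postgres.h"

#include "catalog/pg_subscription_rel.h"    /* SubscriptionRelState */
#include "utils/timestamp.h"

/* Local to tablesync.c in this sketch; not part of the catalog API. */
typedef struct SubscriptionWorkerState
{
    SubscriptionRelState rs;            /* relid, state, lsn, as in the catalog */
    TimestampTz last_worker_start;      /* when a sync worker was last launched */
    bool        alive;                  /* seen in the latest catalog read? */
} SubscriptionWorkerState;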
On Fri, Apr 21, 2017 at 5:33 PM, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > Hello, > > At Thu, 20 Apr 2017 13:21:14 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoDrw0OaHE=oVRRhQX248kjJ7W+1ViM3K76aP46HnHJsnQ@mail.gmail.com> >> On Thu, Apr 20, 2017 at 12:30 AM, Petr Jelinek >> <petr.jelinek@2ndquadrant.com> wrote: >> > On 19/04/17 15:57, Masahiko Sawada wrote: >> >> On Wed, Apr 19, 2017 at 10:07 PM, Petr Jelinek >> >> <petr.jelinek@2ndquadrant.com> wrote: >> >>> On 19/04/17 14:42, Masahiko Sawada wrote: >> >>>> On Wed, Apr 19, 2017 at 5:12 PM, Kyotaro HORIGUCHI >> >>>> <horiguchi.kyotaro@lab.ntt.co.jp> wrote: >> >>>>> At Tue, 18 Apr 2017 18:40:56 +0200, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote in <f64d87d1-bef3-5e3e-a999-ba302816a0ee@2ndquadrant.com> >> >>>>>> On 18/04/17 18:14, Peter Eisentraut wrote: >> >>>>>>> On 4/18/17 11:59, Petr Jelinek wrote: >> >>>>>>>> Hmm if we create hashtable for this, I'd say create hashtable for the >> >>>>>>>> whole table_states then. The reason why it's list now was that it seemed >> >>>>>>>> unnecessary to have hashtable when it will be empty almost always but >> >>>>>>>> there is no need to have both hashtable + list IMHO. >> >>>>> >> >>>>> I understant that but I also don't like the frequent palloc/pfree >> >>>>> in long-lasting context and double loop like Peter. >> >>>>> >> >>>>>>> The difference is that we blow away the list of states when the catalog >> >>>>>>> changes, but we keep the hash table with the start times around. We >> >>>>>>> need two things with different life times. >> >>>>> >> >>>>> On the other hand, hash seems overdone. Addition to that, the >> >>>>> hash-version leaks stale entries while subscriptions are >> >>>>> modified. But vacuuming them costs high. >> >>>>> >> >>>>>> Why can't we just update the hashtable based on the catalog? I mean once >> >>>>>> the record is not needed in the list, the table has been synced so there >> >>>>>> is no need for the timestamp either since we'll not try to start the >> >>>>>> worker again. >> >>>> >> >>>> I guess the table sync worker for the same table could need to be >> >>>> started again. For example, please image a case where the table >> >>>> belonging to the publication is removed from it and the corresponding >> >>>> subscription is refreshed, and then the table is added to it again. We >> >>>> have the record of the table with timestamp in the hash table when the >> >>>> table sync in the first time, but the table sync after refreshed could >> >>>> have to wait for the interval. >> >>>> >> >>> >> >>> But why do we want to wait in such case where user has explicitly >> >>> requested refresh? >> >>> >> >> >> >> Yeah, sorry, I meant that we don't want to wait but cannot launch the >> >> tablesync worker in such case. >> >> >> >> But after more thought, the latest patch Peter proposed has the same >> >> problem. Perhaps we need to scan always whole pg_subscription_rel and >> >> remove the entry if the corresponding table get synced. >> >> >> > >> > Yes that's what I mean by "Why can't we just update the hashtable based >> > on the catalog". And if we do that then I don't understand why do we >> > need both hastable and linked list if we need to update both based on >> > catalog reads anyway. >> >> Thanks, I've now understood correctly. Yes, I think you're right. If >> we update the hash table based on the catalog whenever table state is >> invalidated, we don't need to have both hash table and list. > > Ah, ok. 
The patch from Peter still generating and replacing the > content of the list. The attached patch stores everything into > SubscriptionRelState. Oppositte to my anticiation, the hash can > be efectively kept small and removed. > Thank you for the patch! Actually, I also bumped into the same the situation where we got the following error during hash_seq_search. I guess we cannot commit a transaction during hash_seq_scan but the sequential scan loop in process_syncing_tables_for_apply could attempt to do that. 2017-04-21 21:35:22.587 JST [7508] WARNING: leaked hash_seq_search scan for hash table 0x1f54980 2017-04-21 21:35:22.587 JST [7508] ERROR: no hash_seq_search scan for hash table "Logical replication table sync worker start times" Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Fri, Apr 21, 2017 at 11:19 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > On Fri, Apr 21, 2017 at 5:33 PM, Kyotaro HORIGUCHI > <horiguchi.kyotaro@lab.ntt.co.jp> wrote: >> Hello, >> >> At Thu, 20 Apr 2017 13:21:14 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoDrw0OaHE=oVRRhQX248kjJ7W+1ViM3K76aP46HnHJsnQ@mail.gmail.com> >>> On Thu, Apr 20, 2017 at 12:30 AM, Petr Jelinek >>> <petr.jelinek@2ndquadrant.com> wrote: >>> > On 19/04/17 15:57, Masahiko Sawada wrote: >>> >> On Wed, Apr 19, 2017 at 10:07 PM, Petr Jelinek >>> >> <petr.jelinek@2ndquadrant.com> wrote: >>> >>> On 19/04/17 14:42, Masahiko Sawada wrote: >>> >>>> On Wed, Apr 19, 2017 at 5:12 PM, Kyotaro HORIGUCHI >>> >>>> <horiguchi.kyotaro@lab.ntt.co.jp> wrote: >>> >>>>> At Tue, 18 Apr 2017 18:40:56 +0200, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote in <f64d87d1-bef3-5e3e-a999-ba302816a0ee@2ndquadrant.com> >>> >>>>>> On 18/04/17 18:14, Peter Eisentraut wrote: >>> >>>>>>> On 4/18/17 11:59, Petr Jelinek wrote: >>> >>>>>>>> Hmm if we create hashtable for this, I'd say create hashtable for the >>> >>>>>>>> whole table_states then. The reason why it's list now was that it seemed >>> >>>>>>>> unnecessary to have hashtable when it will be empty almost always but >>> >>>>>>>> there is no need to have both hashtable + list IMHO. >>> >>>>> >>> >>>>> I understant that but I also don't like the frequent palloc/pfree >>> >>>>> in long-lasting context and double loop like Peter. >>> >>>>> >>> >>>>>>> The difference is that we blow away the list of states when the catalog >>> >>>>>>> changes, but we keep the hash table with the start times around. We >>> >>>>>>> need two things with different life times. >>> >>>>> >>> >>>>> On the other hand, hash seems overdone. Addition to that, the >>> >>>>> hash-version leaks stale entries while subscriptions are >>> >>>>> modified. But vacuuming them costs high. >>> >>>>> >>> >>>>>> Why can't we just update the hashtable based on the catalog? I mean once >>> >>>>>> the record is not needed in the list, the table has been synced so there >>> >>>>>> is no need for the timestamp either since we'll not try to start the >>> >>>>>> worker again. >>> >>>> >>> >>>> I guess the table sync worker for the same table could need to be >>> >>>> started again. For example, please image a case where the table >>> >>>> belonging to the publication is removed from it and the corresponding >>> >>>> subscription is refreshed, and then the table is added to it again. We >>> >>>> have the record of the table with timestamp in the hash table when the >>> >>>> table sync in the first time, but the table sync after refreshed could >>> >>>> have to wait for the interval. >>> >>>> >>> >>> >>> >>> But why do we want to wait in such case where user has explicitly >>> >>> requested refresh? >>> >>> >>> >> >>> >> Yeah, sorry, I meant that we don't want to wait but cannot launch the >>> >> tablesync worker in such case. >>> >> >>> >> But after more thought, the latest patch Peter proposed has the same >>> >> problem. Perhaps we need to scan always whole pg_subscription_rel and >>> >> remove the entry if the corresponding table get synced. >>> >> >>> > >>> > Yes that's what I mean by "Why can't we just update the hashtable based >>> > on the catalog". And if we do that then I don't understand why do we >>> > need both hastable and linked list if we need to update both based on >>> > catalog reads anyway. >>> >>> Thanks, I've now understood correctly. Yes, I think you're right. 
If >>> we update the hash table based on the catalog whenever table state is >>> invalidated, we don't need to have both hash table and list. >> >> Ah, ok. The patch from Peter still generating and replacing the >> content of the list. The attached patch stores everything into >> SubscriptionRelState. Oppositte to my anticiation, the hash can >> be efectively kept small and removed. >> > > Thank you for the patch! > Actually, I also bumped into the same the situation where we got the > following error during hash_seq_search. I guess we cannot commit a > transaction during hash_seq_scan but the sequential scan loop in > process_syncing_tables_for_apply could attempt to do that. > So I guess we should commit the changing status to SUBREL_STATE_READY after finished hash_seq_scan. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
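One way to read this suggestion is to finish the hash scan first and only then do the catalog updates that start or commit transactions, since committing while a hash_seq_search is still registered is presumably what produces the warning quoted above. A rough sketch, continuing the illustrative SyncStartTimeEntry type from the earlier sketch; relation_is_synced() and mark_relation_ready() are placeholders, not real functions.

static void
process_ready_relations(HTAB *start_times)
{
    HASH_SEQ_STATUS scan;
    SyncStartTimeEntry *entry;
    List       *ready_rels = NIL;
    ListCell   *lc;

    /* Walk the hash without touching the catalog. */
    hash_seq_init(&scan, start_times);
    while ((entry = (SyncStartTimeEntry *) hash_seq_search(&scan)) != NULL)
    {
        if (relation_is_synced(entry->relid))   /* placeholder check */
            ready_rels = lappend_oid(ready_rels, entry->relid);
    }

    /* The sequential scan has ended; committing is safe from here on. */
    foreach(lc, ready_rels)
        mark_relation_ready(lfirst_oid(lc));    /* placeholder; may commit */

    list_free(ready_rels);
}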
Hello, At Sun, 23 Apr 2017 00:51:52 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoApBU+nz7FYaWX6gjyB9r6WWrTujbL4rrZiocHLc=pEbA@mail.gmail.com> > On Fri, Apr 21, 2017 at 11:19 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Fri, Apr 21, 2017 at 5:33 PM, Kyotaro HORIGUCHI > > <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > >> Hello, > >> > >> At Thu, 20 Apr 2017 13:21:14 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoDrw0OaHE=oVRRhQX248kjJ7W+1ViM3K76aP46HnHJsnQ@mail.gmail.com> > >>> On Thu, Apr 20, 2017 at 12:30 AM, Petr Jelinek > >>> <petr.jelinek@2ndquadrant.com> wrote: > >>> > On 19/04/17 15:57, Masahiko Sawada wrote: > >>> >> On Wed, Apr 19, 2017 at 10:07 PM, Petr Jelinek > >>> >> <petr.jelinek@2ndquadrant.com> wrote: > >>> >> Yeah, sorry, I meant that we don't want to wait but cannot launch the > >>> >> tablesync worker in such case. > >>> >> > >>> >> But after more thought, the latest patch Peter proposed has the same > >>> >> problem. Perhaps we need to scan always whole pg_subscription_rel and > >>> >> remove the entry if the corresponding table get synced. > >>> >> > >>> > > >>> > Yes that's what I mean by "Why can't we just update the hashtable based > >>> > on the catalog". And if we do that then I don't understand why do we > >>> > need both hastable and linked list if we need to update both based on > >>> > catalog reads anyway. > >>> > >>> Thanks, I've now understood correctly. Yes, I think you're right. If > >>> we update the hash table based on the catalog whenever table state is > >>> invalidated, we don't need to have both hash table and list. > >> > >> Ah, ok. The patch from Peter still generating and replacing the > >> content of the list. The attached patch stores everything into > >> SubscriptionRelState. Oppositte to my anticiation, the hash can > >> be efectively kept small and removed. > >> > > > > Thank you for the patch! > > Actually, I also bumped into the same the situation where we got the > > following error during hash_seq_search. I guess we cannot commit a > > transaction during hash_seq_scan but the sequential scan loop in > > process_syncing_tables_for_apply could attempt to do that. Ah. Thanks. I forgot that I saw the same error once. The hash has nothing to do with any transactions. The scan now runs after freezing of the hash. Petr: > I think we should document why this is done this way - I mean it's not > very obvious that it's okay to update just when not found and why and > even less obvious that it would be in fact wrong to update already > existing entry. It is not prudently designed. I changed there.. seemingly more reasonable, maybe. Petr: > I also think that in it's current form the > GetSubscriptionNotReadyRelations should be moved to tablesync.c Removed then moved to tablesync.c. Petr: > I also think we can't add the new fields to > SubscriptionRelState directly as that's used by catalog access > functions as well. Yeah, it was my lazyness. A new struct SubscriptionWorkerState is added in tablesync.c and the interval stuff runs on it. At Sun, 23 Apr 2017 00:51:52 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoApBU+nz7FYaWX6gjyB9r6WWrTujbL4rrZiocHLc=pEbA@mail.gmail.com> > So I guess we should commit the changing status to SUBREL_STATE_READY > after finished hash_seq_scan. We could so either, but I thought that a hash can explicitly "unfreeze" without a side effect. The attached first one adds the function in dynahash.c. 
I don't think that just allowing unregsiterd scan on unfrozen hash is worse that this. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
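For context, hash_freeze() already exists in src/backend/utils/hash/dynahash.c, but no unfreeze counterpart does, so the following is only a sketch of what the proposed addition might look like. It would have to live in dynahash.c itself, because the HTAB struct (with its frozen and tabname fields) is private to that file, and a real version would probably also refuse to unfreeze while a sequential scan is still registered.

void
hash_unfreeze(HTAB *hashp)
{
    /* mirror hash_freeze(): complain if the state does not match */
    if (!hashp->frozen)
        elog(ERROR, "hash table \"%s\" is not frozen", hashp->tabname);

    /* allow insertions and deletions again */
    hashp->frozen = false;
}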
On Mon, Apr 24, 2017 at 4:41 PM, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > Hello, > > At Sun, 23 Apr 2017 00:51:52 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoApBU+nz7FYaWX6gjyB9r6WWrTujbL4rrZiocHLc=pEbA@mail.gmail.com> >> On Fri, Apr 21, 2017 at 11:19 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> > On Fri, Apr 21, 2017 at 5:33 PM, Kyotaro HORIGUCHI >> > <horiguchi.kyotaro@lab.ntt.co.jp> wrote: >> >> Hello, >> >> >> >> At Thu, 20 Apr 2017 13:21:14 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoDrw0OaHE=oVRRhQX248kjJ7W+1ViM3K76aP46HnHJsnQ@mail.gmail.com> >> >>> On Thu, Apr 20, 2017 at 12:30 AM, Petr Jelinek >> >>> <petr.jelinek@2ndquadrant.com> wrote: >> >>> > On 19/04/17 15:57, Masahiko Sawada wrote: >> >>> >> On Wed, Apr 19, 2017 at 10:07 PM, Petr Jelinek >> >>> >> <petr.jelinek@2ndquadrant.com> wrote: >> >>> >> Yeah, sorry, I meant that we don't want to wait but cannot launch the >> >>> >> tablesync worker in such case. >> >>> >> >> >>> >> But after more thought, the latest patch Peter proposed has the same >> >>> >> problem. Perhaps we need to scan always whole pg_subscription_rel and >> >>> >> remove the entry if the corresponding table get synced. >> >>> >> >> >>> > >> >>> > Yes that's what I mean by "Why can't we just update the hashtable based >> >>> > on the catalog". And if we do that then I don't understand why do we >> >>> > need both hastable and linked list if we need to update both based on >> >>> > catalog reads anyway. >> >>> >> >>> Thanks, I've now understood correctly. Yes, I think you're right. If >> >>> we update the hash table based on the catalog whenever table state is >> >>> invalidated, we don't need to have both hash table and list. >> >> >> >> Ah, ok. The patch from Peter still generating and replacing the >> >> content of the list. The attached patch stores everything into >> >> SubscriptionRelState. Oppositte to my anticiation, the hash can >> >> be efectively kept small and removed. >> >> >> > >> > Thank you for the patch! >> > Actually, I also bumped into the same the situation where we got the >> > following error during hash_seq_search. I guess we cannot commit a >> > transaction during hash_seq_scan but the sequential scan loop in >> > process_syncing_tables_for_apply could attempt to do that. > > Ah. Thanks. I forgot that I saw the same error once. The hash has > nothing to do with any transactions. The scan now runs after > freezing of the hash. > > Petr: >> I think we should document why this is done this way - I mean it's not >> very obvious that it's okay to update just when not found and why and >> even less obvious that it would be in fact wrong to update already >> existing entry. > > It is not prudently designed. I changed there.. seemingly more > reasonable, maybe. > > Petr: >> I also think that in it's current form the >> GetSubscriptionNotReadyRelations should be moved to tablesync.c > > Removed then moved to tablesync.c. > > Petr: >> I also think we can't add the new fields to >> SubscriptionRelState directly as that's used by catalog access >> functions as well. > > Yeah, it was my lazyness. A new struct SubscriptionWorkerState is > added in tablesync.c and the interval stuff runs on it. > > > At Sun, 23 Apr 2017 00:51:52 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoApBU+nz7FYaWX6gjyB9r6WWrTujbL4rrZiocHLc=pEbA@mail.gmail.com> >> So I guess we should commit the changing status to SUBREL_STATE_READY >> after finished hash_seq_scan. 
> > We could so either, but I thought that a hash can explicitly > "unfreeze" without a side effect. The attached first one adds the > function in dynahash.c. I don't think that just allowing > unregsiterd scan on unfrozen hash is worse that this. > Thank you for updating the patch. + elog(ERROR, "this hash hashtable \"%s\" is not frozen", + hashp->tabname); Maybe the error message should be "this hashtable \"%s\" is not frozen". + /* + * Freeze hash table. This guarantees that the hash won't be broken for + * the scan below. Unfreezing on error leavs the hash freezed but any + * errors within this loop blows away the worker process involving the + * hash. + */ s/leavs/leaves/ s/freezed/frozen/ + /* This will be true in the next rurn only for live entries */ + wstate->alive = false; s/rurn/run/ + /* + * Remove entries no longer necessary. The flag signals nothing if + * subrel_local_state is not updated above. We can remove entries in + * frozen hash safely. + */ + if (local_state_updated && !wstate->alive) + { + hash_search(subrel_local_state, &wstate->rs.relid, + HASH_REMOVE, NULL); + continue; + } IIUC since the apply worker can change the status from SUBREL_STATE_SYNCWAIT to SUBREL_STATE_READY in a hash_seq_search loop the table sync worker which is changed to SUBREL_STATE_READY by the apply worker before updating the subrel_local_state could be remained in the hash table. I think that we should scan pg_subscription_rel by using only a condition "subid". Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On 24/04/17 17:52, Masahiko Sawada wrote: > On Mon, Apr 24, 2017 at 4:41 PM, Kyotaro HORIGUCHI > <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > + /* > + * Remove entries no longer necessary. The flag signals nothing if > + * subrel_local_state is not updated above. We can remove entries in > + * frozen hash safely. > + */ > + if (local_state_updated && !wstate->alive) > + { > + hash_search(subrel_local_state, &wstate->rs.relid, > + HASH_REMOVE, NULL); > + continue; > + } > > IIUC since the apply worker can change the status from > SUBREL_STATE_SYNCWAIT to SUBREL_STATE_READY in a hash_seq_search loop > the table sync worker which is changed to SUBREL_STATE_READY by the > apply worker before updating the subrel_local_state could be remained > in the hash table. I think that we should scan pg_subscription_rel by > using only a condition "subid". > I don't follow this. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Apr 25, 2017 at 1:42 AM, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote: > On 24/04/17 17:52, Masahiko Sawada wrote: >> On Mon, Apr 24, 2017 at 4:41 PM, Kyotaro HORIGUCHI >> <horiguchi.kyotaro@lab.ntt.co.jp> wrote: >> + /* >> + * Remove entries no longer necessary. The flag signals nothing if >> + * subrel_local_state is not updated above. We can remove entries in >> + * frozen hash safely. >> + */ >> + if (local_state_updated && !wstate->alive) >> + { >> + hash_search(subrel_local_state, &wstate->rs.relid, >> + HASH_REMOVE, NULL); >> + continue; >> + } >> >> IIUC since the apply worker can change the status from >> SUBREL_STATE_SYNCWAIT to SUBREL_STATE_READY in a hash_seq_search loop >> the table sync worker which is changed to SUBREL_STATE_READY by the >> apply worker before updating the subrel_local_state could be remained >> in the hash table. I think that we should scan pg_subscription_rel by >> using only a condition "subid". >> > > I don't follow this. > Hmm, I'd misunderstood something. It should work fine. Sorry for the noise. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Hello, At Tue, 25 Apr 2017 00:52:09 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoCHPfBqKRiFSMmZSXM5qbB5rD_HZMrniL4E19T2r8WpkA@mail.gmail.com> > + elog(ERROR, "this hash hashtable \"%s\" is not frozen", > + hashp->tabname); > > Maybe the error message should be "this hashtable \"%s\" is not frozen". Both "hashtable" and "hash table" appear there, but hash_freeze uses the former, so I followed that. > s/leavs/leaves/ > s/freezed/frozen/ > s/rurn/run/ Thanks! But "rurn" was a typo for "turn". > On Tue, Apr 25, 2017 at 1:42 AM, Petr Jelinek > <petr.jelinek@2ndquadrant.com> wrote: > > On 24/04/17 17:52, Masahiko Sawada wrote: > >> On Mon, Apr 24, 2017 at 4:41 PM, Kyotaro HORIGUCHI > >> <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > >> + /* > >> + * Remove entries no longer necessary. The flag signals nothing if > >> + * subrel_local_state is not updated above. We can remove entries in > >> + * frozen hash safely. > >> + */ > >> + if (local_state_updated && !wstate->alive) > >> + { > >> + hash_search(subrel_local_state, &wstate->rs.relid, > >> + HASH_REMOVE, NULL); > >> + continue; > >> + } > >> > >> IIUC since the apply worker can change the status from > >> SUBREL_STATE_SYNCWAIT to SUBREL_STATE_READY in a hash_seq_search loop > >> the table sync worker which is changed to SUBREL_STATE_READY by the > >> apply worker before updating the subrel_local_state could be remained > >> in the hash table. Every time pg_subscription_rel is updated, the hash entries are marked alive only when the corresponding not-ready relations are found in pg_subscription_rel. If any live entries remain, nrels becomes a positive number and the dead entries are removed in the loop just after. If no entry survives, the hash is destroyed immediately. Some dead entries can survive under certain conditions, but one of the above will happen shortly. If this is hard to understand, I should probably add some more comments. > >> I think that we should scan pg_subscription_rel by > >> using only a condition "subid". > >> > > > > I don't follow this. > > > > Hmm, I'd misunderstood something. It should work fine. Sorry for the noise. Anyway, the typo-fixed version is attached. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
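To restate the bookkeeping described above, the pattern is a small mark-and-sweep over the local hash each time pg_subscription_rel is re-read: presume every entry dead, mark the ones that still correspond to not-ready relations, then drop the rest. A sketch continuing the SubscriptionWorkerState wrapper from earlier; the helper name and call sites are illustrative and this is not the posted patch. Deleting the element just returned by hash_seq_search() is explicitly allowed by dynahash, which is what the sweep relies on.

static void
refresh_local_state(HTAB *local_state, List *not_ready_relids)
{
    HASH_SEQ_STATUS scan;
    SubscriptionWorkerState *wstate;
    ListCell   *lc;

    /* Phase 1: presume every existing entry dead. */
    hash_seq_init(&scan, local_state);
    while ((wstate = (SubscriptionWorkerState *) hash_seq_search(&scan)) != NULL)
        wstate->alive = false;

    /* Phase 2: (re)mark entries for relations that are still not ready. */
    foreach(lc, not_ready_relids)
    {
        Oid         relid = lfirst_oid(lc);
        bool        found;

        wstate = (SubscriptionWorkerState *) hash_search(local_state, &relid,
                                                         HASH_ENTER, &found);
        if (!found)
        {
            wstate->rs.relid = relid;
            wstate->last_worker_start = 0;
        }
        wstate->alive = true;
    }

    /* Phase 3: sweep entries whose relations are gone or already synced. */
    hash_seq_init(&scan, local_state);
    while ((wstate = (SubscriptionWorkerState *) hash_seq_search(&scan)) != NULL)
    {
        if (!wstate->alive)
            hash_search(local_state, &wstate->rs.relid, HASH_REMOVE, NULL);
    }
}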
On 4/21/17 09:59, Petr Jelinek wrote: > Rereading the code again, it's actually not bug as we update the rstate > to what syncworker says, but it's obviously confusing so probably still > worth to commit that. We don't have the syncworker->relmutex at that point, so it's probably better to read the previously-copied value in rstate->state. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
I feel it's getting a bit late for reworkings of this extent, also considering the marginal nature of the problem we are trying to fix. My patch from April 18 is very localized and gets the job done. I think this is still a good direction to investigate, but if we have to extend the hash table API to get it done, this might not be the best time. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 25/04/17 19:54, Peter Eisentraut wrote: > I feel it's getting a bit late for reworkings of this extent, also > considering the marginal nature of the problem we are trying to fix. My > patch from April 18 is very localized and gets the job done. > > I think this is still a good direction to investigate, but if we have to > extend the hash table API to get it done, this might not be the best time. > Yeah the hash API change needed is a bummer at this stage. One thing I am missing in your patch however is cleanup of entries for relations that finished sync. I wonder if it would be enough to just destroy the hash when we get to empty list. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 4/27/17 06:47, Petr Jelinek wrote: > One thing I am missing in your patch however is cleanup of entries for > relations that finished sync. I wonder if it would be enough to just > destroy the hash when we get to empty list. I had omitted that because the amount of memory "leaked" is not much, but I guess it wouldn't hurt to clean it up. How about the attached? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On 27/04/17 21:00, Peter Eisentraut wrote: > On 4/27/17 06:47, Petr Jelinek wrote: >> One thing I am missing in your patch however is cleanup of entries for >> relations that finished sync. I wonder if it would be enough to just >> destroy the hash when we get to empty list. > > I had omitted that because the amount of memory "leaked" is not much, > but I guess it wouldn't hurt to clean it up. > > How about the attached? > Yes that's more or less what I meant. All good. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Apr 28, 2017 at 4:00 AM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > On 4/27/17 06:47, Petr Jelinek wrote: >> One thing I am missing in your patch however is cleanup of entries for >> relations that finished sync. I wonder if it would be enough to just >> destroy the hash when we get to empty list. > > I had omitted that because the amount of memory "leaked" is not much, > but I guess it wouldn't hurt to clean it up. > > How about the attached? > Thank you for updating patch! + /* + * Clean up the hash table when we're done with all tables (just to + * release the bit of memory). + */ + else if (!table_states && last_start_times) + { Isn't it better to use != NIL instead as follows? else if (table_state != NIL && last_start_times) Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
At Fri, 28 Apr 2017 10:20:48 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoBY9UvS9QLrmaENGBGfQKOfGkGaLm=uYH24gmf-6CAoiw@mail.gmail.com> > On Fri, Apr 28, 2017 at 4:00 AM, Peter Eisentraut > <peter.eisentraut@2ndquadrant.com> wrote: > > On 4/27/17 06:47, Petr Jelinek wrote: > >> One thing I am missing in your patch however is cleanup of entries for > >> relations that finished sync. I wonder if it would be enough to just > >> destroy the hash when we get to empty list. > > > > I had omitted that because the amount of memory "leaked" is not much, > > but I guess it wouldn't hurt to clean it up. > > > > How about the attached? > > This seems reasonable enough. > Thank you for updating patch! > > + /* > + * Clean up the hash table when we're done with all tables (just to > + * release the bit of memory). > + */ > + else if (!table_states && last_start_times) > + { > > Isn't it better to use != NIL instead as follows? > > else if (table_state != NIL && last_start_times) Definitely! But it should probably be the reverse condition. - if (table_states && !last_start_times) + if (table_states != NIL && !last_start_times) === - else if (!table_states && last_start_times) + else if (table_states == NIL && last_start_times) regards, -- Kyotaro Horiguchi NTT Open Source Software Center
On Fri, Apr 28, 2017 at 5:26 PM, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > At Fri, 28 Apr 2017 10:20:48 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoBY9UvS9QLrmaENGBGfQKOfGkGaLm=uYH24gmf-6CAoiw@mail.gmail.com> >> On Fri, Apr 28, 2017 at 4:00 AM, Peter Eisentraut >> <peter.eisentraut@2ndquadrant.com> wrote: >> > On 4/27/17 06:47, Petr Jelinek wrote: >> >> One thing I am missing in your patch however is cleanup of entries for >> >> relations that finished sync. I wonder if it would be enough to just >> >> destroy the hash when we get to empty list. >> > >> > I had omitted that because the amount of memory "leaked" is not much, >> > but I guess it wouldn't hurt to clean it up. >> > >> > How about the attached? >> > > > This seems reasonable enough. > >> Thank you for updating patch! >> >> + /* >> + * Clean up the hash table when we're done with all tables (just to >> + * release the bit of memory). >> + */ >> + else if (!table_states && last_start_times) >> + { >> >> Isn't it better to use != NIL instead as follows? >> >> else if (table_state != NIL && last_start_times) > > Definitely! But it should probably be the reverse condition. Yeah, that's right. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
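Putting this subthread together, the create-on-demand and destroy-when-done logic being discussed would come out roughly as below. The variable names table_states and last_start_times and the hash table's display name follow the thread; the entry type reuses the illustrative SyncStartTimeEntry from the earlier sketch rather than whatever the actual patch calls it, and the committed version may well spell the NIL tests differently, as the last message in the thread suggests.

    if (table_states != NIL && last_start_times == NULL)
    {
        /* First not-ready table seen: set up the start-time bookkeeping. */
        HASHCTL     ctl;

        memset(&ctl, 0, sizeof(ctl));
        ctl.keysize = sizeof(Oid);
        ctl.entrysize = sizeof(SyncStartTimeEntry);
        last_start_times =
            hash_create("Logical replication table sync worker start times",
                        256, &ctl, HASH_ELEM | HASH_BLOBS);
    }
    else if (table_states == NIL && last_start_times != NULL)
    {
        /* Everything is synced; release the bit of memory used above. */
        hash_destroy(last_start_times);
        last_start_times = NULL;
    }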
On 4/27/17 15:33, Petr Jelinek wrote: > On 27/04/17 21:00, Peter Eisentraut wrote: >> On 4/27/17 06:47, Petr Jelinek wrote: >>> One thing I am missing in your patch however is cleanup of entries for >>> relations that finished sync. I wonder if it would be enough to just >>> destroy the hash when we get to empty list. >> >> I had omitted that because the amount of memory "leaked" is not much, >> but I guess it wouldn't hurt to clean it up. >> >> How about the attached? > > Yes that's more or less what I meant. All good. committed -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 4/27/17 21:20, Masahiko Sawada wrote: > Isn't it better to use != NIL instead as follows? > > else if (table_state != NIL && last_start_times) I'm not a fan of that in general, and it doesn't really add any clarity here. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services