Re: [HACKERS] Interval for launching the table sync worker - Mailing list pgsql-hackers

From Petr Jelinek
Subject Re: [HACKERS] Interval for launching the table sync worker
Date
Msg-id b1669e2a-4765-d16f-ff64-b10059cecc54@2ndquadrant.com
Whole thread Raw
In response to Re: [HACKERS] Interval for launching the table sync worker  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: [HACKERS] Interval for launching the table sync worker  (Masahiko Sawada <sawada.mshk@gmail.com>)
Re: [HACKERS] Interval for launching the table sync worker  (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>)
List pgsql-hackers
On 13/04/17 12:23, Masahiko Sawada wrote:
> On Thu, Apr 13, 2017 at 11:53 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> On Wed, Apr 12, 2017 at 11:46 PM, Peter Eisentraut
>> <peter.eisentraut@2ndquadrant.com> wrote:
>>> On 4/12/17 00:48, Masahiko Sawada wrote:
>>>> On Wed, Apr 12, 2017 at 1:28 PM, Peter Eisentraut
>>>>> Perhaps instead of a global last_start_time, we store a per relation
>>>>> last_start_time in SubscriptionRelState?
>>>>
>>>> I was thinking the same. But a problem is that the list of
>>>> SubscriptionRelState is refreshed whenever the syncing table state
>>>> becomes invalid (table_state_valid = false). I guess we need to
>>>> improve these logic including GetSubscriptionNotReadyRelations().
>>>
>>> The table states are invalidated on a syscache callback from
>>> pg_subscription_rel, which happens roughly speaking when a table
>>> finishes the initial sync.  So if we're worried about failing tablesync
>>> workers relaunching to quickly, this would only be a problem if a
>>> tablesync of another table finishes right in that restart window.  That
>>> doesn't seem a terrible issue to me.
>>>
>>
>> I think the table states are invalidated whenever the table sync
>> worker starts, because the table sync worker updates its status of
>> pg_subscription_rel and commits it before starting actual copy. So we
>> cannot rely on that. I thought we can store last_start_time into
>> pg_subscription_rel but it might be overkill. I'm now thinking to
>> change GetSubscriptionNotReadyRealtions so that last_start_time in
>> SubscriptionRelState is taken over to new list.
>>
> 
> Attached the latest patch. It didn't actually necessary to change
> GetSubscriptionNotReadyRelations. I just changed the logic refreshing
> the sync table state list.
> Please give me feedback.
> 

Hmm this might work. Although I was actually wondering if we could store
the last start timestamp in the worker shared memory and do some magic
with that (ie, not clearing subid and relid and try to then do rate
limiting based on subid+relid+timestamp stored in shmem). That would
then work same way for the main apply workers as well. It would have the
disadvantage that if some tables were consistently failing, no other
tables could get synchronized as the amount of workers is limited. OTOH
the approach chosen in the patch will first try all tables and only then
start rate limiting, not quite sure which is better.

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services



pgsql-hackers by date:

Previous
From: Petr Jelinek
Date:
Subject: Re: [HACKERS] logical replication apply to run with sync commit offby default
Next
From: Fabien COELHO
Date:
Subject: Re: [HACKERS] Undefined psql variables