On Fri, Apr 14, 2017 at 7:09 AM, Petr Jelinek
<petr.jelinek@2ndquadrant.com> wrote:
> On 13/04/17 12:23, Masahiko Sawada wrote:
>> On Thu, Apr 13, 2017 at 11:53 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>> On Wed, Apr 12, 2017 at 11:46 PM, Peter Eisentraut
>>> <peter.eisentraut@2ndquadrant.com> wrote:
>>>> On 4/12/17 00:48, Masahiko Sawada wrote:
>>>>> On Wed, Apr 12, 2017 at 1:28 PM, Peter Eisentraut
>>>>>> Perhaps instead of a global last_start_time, we store a per relation
>>>>>> last_start_time in SubscriptionRelState?
>>>>>
>>>>> I was thinking the same. But a problem is that the list of
>>>>> SubscriptionRelState is refreshed whenever the syncing table state
>>>>> becomes invalid (table_state_valid = false). I guess we need to
>>>>> improve this logic, including GetSubscriptionNotReadyRelations().
>>>>
>>>> The table states are invalidated on a syscache callback from
>>>> pg_subscription_rel, which happens, roughly speaking, when a table
>>>> finishes the initial sync. So if we're worried about failing tablesync
>>>> workers relaunching too quickly, this would only be a problem if a
>>>> tablesync of another table finishes right in that restart window. That
>>>> doesn't seem like a terrible issue to me.
>>>>
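For context, the invalidation being described works roughly like this; this
is a from-memory sketch of the tablesync code (utils/inval.h and
utils/syscache.h provide the callback API), not a verbatim quote, so the
exact names may differ slightly:

/* Cached list of syncing-table states; flagged stale by the callback. */
static bool table_states_valid = false;

/* Syscache callback for pg_subscription_rel: just mark the cache stale. */
static void
invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
{
    table_states_valid = false;
}

/* Registered once at worker startup: */
CacheRegisterSyscacheCallback(SUBSCRIPTIONRELMAP,
                              invalidate_syncing_table_states,
                              (Datum) 0);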
>>>
>>> I think the table states are invalidated whenever the table sync
>>> worker starts, because the table sync worker updates its status in
>>> pg_subscription_rel and commits it before starting the actual copy. So
>>> we cannot rely on that. I thought we could store last_start_time in
>>> pg_subscription_rel, but it might be overkill. I'm now thinking to
>>> change GetSubscriptionNotReadyRelations so that last_start_time in
>>> SubscriptionRelState is carried over to the new list.
>>>
>>
>> Attached is the latest patch. It turned out not to be necessary to change
>> GetSubscriptionNotReadyRelations; I just changed the logic that refreshes
>> the sync table state list.
>> Please give me feedback.
>>
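Roughly, the idea is something like the following sketch; the extra field
and the lookup_old_state helper are illustrative here, not necessarily what
the attached patch actually does:

/* Hypothetical extra field on SubscriptionRelState for rate limiting. */
typedef struct SubscriptionRelState
{
    Oid         relid;
    XLogRecPtr  lsn;
    char        state;
    TimestampTz last_start_time;    /* illustrative: last worker launch */
} SubscriptionRelState;

/* When rebuilding the list after invalidation, carry the old timestamps
 * over so a refresh doesn't defeat the restart rate limiting. */
foreach(lc, new_table_states)
{
    SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
    SubscriptionRelState *old = lookup_old_state(old_table_states,
                                                 rstate->relid);

    if (old != NULL)
        rstate->last_start_time = old->last_start_time;
}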
>
> Hmm, this might work. Although I was actually wondering if we could store
> the last start timestamp in the worker shared memory and do some magic
> with that (i.e., not clearing subid and relid, and then trying to do rate
> limiting based on subid+relid+timestamp stored in shmem). That would
> then work the same way for the main apply workers as well. It would have
> the disadvantage that if some tables were consistently failing, no other
> tables could get synchronized, as the number of workers is limited.
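A sketch of what that alternative might look like; the launch_time field
and the reuse of wal_retrieve_retry_interval for the check are illustrative
assumptions, not a concrete proposal:

/* Keep subid/relid plus a timestamp in the worker slot even after the
 * worker exits, and rate-limit relaunches against it. */
typedef struct LogicalRepWorker
{
    Oid         subid;
    Oid         relid;
    TimestampTz launch_time;    /* illustrative: when this slot last started */
    /* ... other fields ... */
} LogicalRepWorker;

/* In the launcher, before starting a worker for (subid, relid): */
if (w->subid == subid && w->relid == relid &&
    !TimestampDifferenceExceeds(w->launch_time, GetCurrentTimestamp(),
                                wal_retrieve_retry_interval))
    return;     /* relaunched too recently; wait for the next cycle */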
Hmm, I guess it's not a good design for a table sync worker or an
apply worker for a relation to take sole possession of a worker slot
until it succeeds; they would get in each other's way.
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center