Re: [HACKERS] tablesync patch broke the assumption that logical rep depends on? - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: [HACKERS] tablesync patch broke the assumption that logical rep depends on?
Msg-id CAHGQGwH2-Vp5tfZjhdhGx_Acs7kdPdWawOGw-ZPTS9d0i3z5sw@mail.gmail.com
In response to Re: [HACKERS] tablesync patch broke the assumption that logical rep depends on?  (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>)
Responses Re: [HACKERS] tablesync patch broke the assumption that logical rep depends on?  (Petr Jelinek <petr.jelinek@2ndquadrant.com>)
List pgsql-hackers
On Fri, Apr 14, 2017 at 1:28 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> On 4/10/17 13:28, Fujii Masao wrote:
>>          src/backend/replication/logical/launcher.c
>>          * Worker started and attached to our shmem. This check is safe
>>          * because only launcher ever starts the workers, so nobody can steal
>>          * the worker slot.
>>
>> The tablesync patch enabled even worker to start another worker.
>> So the above assumption is not valid for now.
>>
>> This issue seems to cause the corner case where the launcher picks up
>> the same worker slot that previously-started worker has already picked
>> up to start another worker.
>
> I think what the comment should rather say is that workers are always
> started through logicalrep_worker_launch() and worker slots are always
> handed out while holding LogicalRepWorkerLock exclusively, so nobody can
> steal the worker slot.
>
> Does that make sense?

No, unless I'm missing something.

logicalrep_worker_launch() picks up an unused worker slot (one whose
proc == NULL) while holding LogicalRepWorkerLock. But it releases the lock
before the slot is marked as used (i.e., before the slot's proc is set to
non-NULL). Only later does the newly-launched worker call
logicalrep_worker_attach() and mark the slot as used.

So if another logicalrep_worker_launch() starts after LogicalRepWorkerLock
is released but before the slot is marked as used, it can pick up the same
slot because that slot still looks unused.
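
To make the window concrete, here is a minimal single-threaded sketch of the
interleaving described above. It is not PostgreSQL code: the Slot struct, the
in_use flag, and every function name are hypothetical, and the lock is only
implied by comments. The second variant reserves the slot before the lock
would be released, which is one possible way to close the window, not
necessarily how it should be fixed in the tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SLOTS 4

/* Hypothetical, simplified stand-in for a worker slot. */
typedef struct
{
    void *proc;    /* NULL until the started worker attaches */
    bool  in_use;  /* hypothetical reservation flag, set "under the lock" */
} Slot;

/* Models the behavior described above: a slot is free if proc == NULL. */
int
pick_slot_by_proc(Slot *slots, int n)
{
    for (int i = 0; i < n; i++)
        if (slots[i].proc == NULL)
            return i;
    return -1;
}

/* Alternative: mark the slot as taken before the lock would be released. */
int
pick_slot_and_reserve(Slot *slots, int n)
{
    for (int i = 0; i < n; i++)
    {
        if (!slots[i].in_use)
        {
            slots[i].in_use = true;
            return i;
        }
    }
    return -1;
}

/* Models the started worker attaching to its slot some time later. */
void
attach(Slot *slots, int i, void *proc)
{
    assert(i >= 0 && i < MAX_SLOTS);
    slots[i].proc = proc;
}

/* Two launches run back to back, before either worker has attached. */
bool
race_with_proc_check(void)
{
    Slot slots[MAX_SLOTS] = {0};
    int  w1 = 1, w2 = 2;    /* dummy objects standing in for processes */

    int a = pick_slot_by_proc(slots, MAX_SLOTS);  /* lock, pick, unlock */
    int b = pick_slot_by_proc(slots, MAX_SLOTS);  /* lock, pick, unlock */
    attach(slots, a, &w1);
    attach(slots, b, &w2);  /* overwrites the first worker's slot */
    return a == b;          /* true: both launches got the same slot */
}

/* Same interleaving, but each slot is reserved while the lock is held. */
bool
race_with_reservation(void)
{
    Slot slots[MAX_SLOTS] = {0};
    int  w1 = 1, w2 = 2;

    int a = pick_slot_and_reserve(slots, MAX_SLOTS);
    int b = pick_slot_and_reserve(slots, MAX_SLOTS);
    attach(slots, a, &w1);
    attach(slots, b, &w2);
    return a == b;          /* false: the launches got distinct slots */
}
```

In this sketch race_with_proc_check() reports a collision and
race_with_reservation() does not, because the second launch can no longer
see the first slot as free.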

Regards,

-- 
Fujii Masao


