Re: [HACKERS] Logical replication - TRAP: FailedAssertion in pgstat.c - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: [HACKERS] Logical replication - TRAP: FailedAssertion in pgstat.c
Msg-id CAD21AoB_p+okFK_tROGxG-P1xfSN2TVxwPZoY8gf9BXcahq-WQ@mail.gmail.com
In response to Re: [HACKERS] Logical replication - TRAP: FailedAssertion in pgstat.c  (Petr Jelinek <petr.jelinek@2ndquadrant.com>)
List pgsql-hackers
On Tue, May 9, 2017 at 1:26 AM, Petr Jelinek
<petr.jelinek@2ndquadrant.com> wrote:
> On 08/05/17 17:52, Masahiko Sawada wrote:
>> On Fri, May 5, 2017 at 8:13 PM, Petr Jelinek
>> <petr.jelinek@2ndquadrant.com> wrote:
>>> On 03/05/17 13:23, Erik Rijkers wrote:
>>>> On 2017-05-03 08:17, Petr Jelinek wrote:
>>>>> On 02/05/17 20:43, Robert Haas wrote:
>>>>>> On Thu, Apr 20, 2017 at 2:58 PM, Peter Eisentraut
>>>>
>>>>>>> code path that calls CommitTransactionCommand() should have one, no?
>>>>>>
>>>>>> Is there anything left to be committed here?
>>>>>>
>>>>>
>>>>> Afaics the fix was not committed. Peter wanted more comprehensive fix
>>>>> which didn't happen. I think something like attached should do the job.
>>>>
>>>> I'm running my pgbench-over-logical-replication test in chunks of 15
>>>> minutes, with different pgbench -c (num clients) and -s (scale) values.
>>>>
>>>> With this patch (and nothing else)  on top of master (8f8b9be51fd7 to be
>>>> precise):
>>>>
>>>>> fix-statistics-reporting-in-logical-replication-work.patch
>>>>
>>>> logical replication is still often failing (as expected, I suppose; it
>>>> seems because of "inital snapshot too large") but indeed I do not see
>>>
>>> Yes that's different thing that we've been discussing a bit in snapbuild
>>> woes thread.
>>>
>>>> the 'TRAP: FailedAssertion in pgstat.c' anymore.
>>>>
>>>> (If there is any other configuration of patches worth testing please let
>>>> me know)
>>>>
>>>
>>> Thanks, so the patch works.
>>>
>>
>> I think that we should commit the local transaction that did initial
>> data copy, and then report stat as well. Currently table sync worker
>> doesn't commit the local transaction in LogicalRepSyncTableStart
>> (maybe until apply commit record?) if its status is changed to
>> SUBREL_STATE_CATCHUP. That's why the table sync worker issues
>> assertion failure.
>>
>
> That would fix the assert as well yes, but it would also mean that if
> the worker crashed between the initial copy and the end of catchup there
> would be no way to restart it without manual intervention from user
> since the synchronization position would be lost. Hence the fix I
> proposed which does it differently and has the whole sync in a single
> transaction.

I understand now: the whole data synchronization, including applying
logical changes after the state changes to SUBREL_STATE_CATCHUP, should
be done in a single transaction. Thank you for the explanation.
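
To make the trade-off concrete, here is a toy model of the two approaches.
It is only a sketch and not the actual tablesync code: ToySubscriber,
sync_table and run_with_restart are hypothetical names, "commit" is modelled
as copying a pending list into durable state, and a restarted worker is
assumed to redo the whole sync because it has no recorded position.

```python
from dataclasses import dataclass, field


class Crash(Exception):
    """Simulated worker crash."""


@dataclass
class ToySubscriber:
    # Durable (committed) state that survives a simulated crash.
    rows: list = field(default_factory=list)
    sync_done: bool = False


def sync_table(sub, source_rows, catchup_rows, commit_after_copy, crash_before_catchup):
    # Uncommitted work lives in `pending`; only assigning to sub.rows commits it.
    pending = list(sub.rows)
    pending.extend(source_rows)          # initial data copy
    if commit_after_copy:
        sub.rows = list(pending)         # early commit: data is durable, position is not
    if crash_before_catchup:
        raise Crash("worker died between initial copy and end of catchup")
    pending.extend(catchup_rows)         # apply changes up to the catchup point
    sub.rows = pending                   # final commit
    sub.sync_done = True


def run_with_restart(commit_after_copy):
    sub = ToySubscriber()
    try:
        sync_table(sub, [1, 2, 3], [4], commit_after_copy, crash_before_catchup=True)
    except Crash:
        pass
    # Restart: sync_done is still False, so the worker starts the sync over.
    sync_table(sub, [1, 2, 3], [4], commit_after_copy, crash_before_catchup=False)
    return sub.rows


print(run_with_restart(commit_after_copy=False))  # single transaction
print(run_with_restart(commit_after_copy=True))   # commit right after the copy
```

In this model the single-transaction variant restarts cleanly, while the
early-commit variant is left with the copied rows twice, which stands in
for the state that would need manual cleanup by the user.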

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


