Re: [HACKERS] Logical replication - TRAP: FailedAssertion in pgstat.c - Mailing list pgsql-hackers

From Petr Jelinek
Subject Re: [HACKERS] Logical replication - TRAP: FailedAssertion in pgstat.c
Date
Msg-id 5f656818-4800-62e7-7c28-b2f1b584e2cb@2ndquadrant.com
Whole thread Raw
In response to Re: [HACKERS] Logical replication - TRAP: FailedAssertion in pgstat.c  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: [HACKERS] Logical replication - TRAP: FailedAssertion in pgstat.c  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers
On 08/05/17 17:52, Masahiko Sawada wrote:
> On Fri, May 5, 2017 at 8:13 PM, Petr Jelinek
> <petr.jelinek@2ndquadrant.com> wrote:
>> On 03/05/17 13:23, Erik Rijkers wrote:
>>> On 2017-05-03 08:17, Petr Jelinek wrote:
>>>> On 02/05/17 20:43, Robert Haas wrote:
>>>>> On Thu, Apr 20, 2017 at 2:58 PM, Peter Eisentraut
>>>
>>>>>> code path that calls CommitTransactionCommand() should have one, no?
>>>>>
>>>>> Is there anything left to be committed here?
>>>>>
>>>>
>>>> Afaics the fix was not committed. Peter wanted more comprehensive fix
>>>> which didn't happen. I think something like attached should do the job.
>>>
>>> I'm running my pgbench-over-logical-replication test in chunk of 15
>>> minutes, wth different pgbench -c (num clients) and -s (scale) values.
>>>
>>> With this patch (and nothing else)  on top of master (8f8b9be51fd7 to be
>>> precise):
>>>
>>>> fix-statistics-reporting-in-logical-replication-work.patch
>>>
>>> logical replication is still often failing (as expected, I suppose; it
>>> seems because of "inital snapshot too large") but indeed I do not see
>>
>> Yes that's different thing that we've been discussing a bit in snapbuild
>> woes thread.
>>
>>> the 'TRAP: FailedAssertion in pgstat.c' anymore.
>>>
>>> (If there is any other configuration of patches worth testing please let
>>> me know)
>>>
>>
>> Thanks, so the patch works.
>>
> 
> I think that we should commit the local transaction that did initial
> data copy, and then report stat as well. Currently table sync worker
> doesn't commit the local transaction in LogicalRepSyncTableStart
> (maybe until apply commit record?) if its status is changed to
> SUBREL_STATE_CATCHUP. That's why the table sync worker issues
> assertion failure.
> 

That would fix the assert as well yes, but it would also mean that if
the worker crashed between the initial copy and the end of catchup there
would be no way to restart it without manual intervention from user
since the synchronization position would be lost. Hence the fix I
proposed which does it differently and has the whole sync in a single
transaction.

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: [HACKERS] Adding support for Default partition in partitioning
Next
From: Petr Jelinek
Date:
Subject: Re: [HACKERS] Logical replication - TRAP: FailedAssertion in pgstat.c