Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Date
Msg-id CAA4eK1LqpFqcmQhDEdTKtHCAC1vZ9eGr2JWJ6Fgvc64f+twDGQ@mail.gmail.com
Whole thread Raw
In response to Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions  (Dilip Kumar <dilipbalaut@gmail.com>)
List pgsql-hackers
On Mon, Sep 14, 2020 at 3:08 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Amit Kapila <amit.kapila16@gmail.com> writes:
> > Pushed.
>
> Observe the following reports:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=idiacanthus&dt=2020-09-13%2016%3A54%3A03
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=desmoxytes&dt=2020-09-10%2009%3A08%3A03
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=komodoensis&dt=2020-09-05%2020%3A22%3A02
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dragonet&dt=2020-09-04%2001%3A52%3A03
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dragonet&dt=2020-09-03%2020%3A54%3A04
>
> These are all on HEAD, and all within the last ten days, and I see
> nothing comparable in any branch before that.  So it's hard to avoid
> the conclusion that somebody broke something about ten days ago.
>
> None of these animals provided gdb backtraces; but we do have a built-in
> trace from several, and they all look like pgoutput.so is trying to
> list_free() garbage, somewhere inside a relcache invalidation/rebuild
> scenario:
>

Yeah, this is right, and here is some initial analysis. It seems to be
failing in below code:
rel_sync_cache_relation_cb(){ ...list_free(entry->streamed_txns);..}

This list can have elements only in 'streaming' mode (need to enable
'streaming' with Create Subscription command) whereas none of the
tests in 010_truncate.pl is using 'streaming', so this list should be
empty (NULL). The two different assertion failures shown in BF reports
in list_free code are as below:
Assert(list->length > 0);
Assert(list->length <= list->max_length);

It seems to me that this list is not initialized properly when it is
not used or maybe that is true in some special circumstances because
we initialize it in get_rel_sync_entry(). I am not sure if CCI build
is impacting this in some way.

--
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Next
From: Amit Langote
Date:
Subject: Re: ModifyTable overheads in generic plans