Re: Fix for segfault in logical replication on master - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Fix for segfault in logical replication on master
Date
Msg-id CAA4eK1K3_JcTeFKf6zmh2_C-x2mmPp+o52yHwDwXq0fgc1-1dg@mail.gmail.com
Whole thread Raw
In response to RE: Fix for segfault in logical replication on master  ("osumi.takamichi@fujitsu.com" <osumi.takamichi@fujitsu.com>)
List pgsql-hackers
On Thu, Jun 17, 2021 at 4:09 PM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:
>
> On Thursday, June 17, 2021 2:43 PM Mark Dilger <mark.dilger@enterprisedb.com> wrote:
> > > On Jun 16, 2021, at 10:19 PM, osumi.takamichi@fujitsu.com wrote:
> > >
> > > I started to analyze your report and
> > > will reply after my idea to your modification is settled.
> >
> > Thank you.
> I'll share my first analysis.
>
> > In commit e7eea52b2d, you introduced a new function,
> > RelationGetIdentityKeyBitmap(), which uses some odd logic for determining
> > if a relation has a replica identity index.  That code segfaults under certain
> > conditions.  A test case to demonstrate that is attached.  Prior to patching
> > the code, this new test gets stuck waiting for replication to finish, which never
> > happens.  You have to break out of the test and check
> > tmp_check/log/021_no_replica_identity_publisher.log.
> >
> > I believe this bit of logic in src/backend/utils/cache/relcache.c:
> >
> >       indexDesc = RelationIdGetRelation(relation->rd_replidindex);
> >       for (i = 0; i < indexDesc->rd_index->indnatts; i++)
> >
> > is unsafe without further checks, also attached.
> You are absolutely right.
> I checked the crash scenario and reproduced the core,
> which has a null indexDesc. Also, rd_replidindex must be checked beforehand
> as you included in your patch, because having an index does not necessarily
> mean to have a replica identity index. As the proof of this, the oid of
> rd_replidindex in the scenario is 0. OTOH, I've confirmed your new test
> has passed with your fix.
>
> Also, your test looks essentially minimum(suitable for the problem) to me.
>
> * RelationGetIdentityKeyBitmap
> +       /*
> +        * Fall out if the description is not for an index, suggesting
> +        * affairs have changed since we looked.  XXX Should we log a
> +        * complaint here?
> +        */
> +       if (!indexDesc)
> +               return NULL;
> +       if (!indexDesc->rd_index)
> +       {
> +               RelationClose(indexDesc);
> +               return NULL;
> +       }
> For the 1st check, isn't it better to use RelationIsValid() ?
> I agree with having the check itself of course, though.
>
> Additionally, In what kind of actual scenario, did you think that
> we come to the part to "log a complaint" ?
>

Yeah, I think that part is not required unless there is some case
where it can happen. I guess we might want to have an elog at that
place with a check like:
if (!RelationIsValid(relation))
elog(ERROR, "could not open relation with OID %u", relid);

-- 
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: PoC: Using Count-Min Sketch for join cardinality estimation
Next
From: Amit Kapila
Date:
Subject: Re: Unresolved repliaction hang and stop problem.