Re: hyrax vs. RelationBuildPartitionDesc - Mailing list pgsql-hackers

From Amit Langote
Subject Re: hyrax vs. RelationBuildPartitionDesc
Date
Msg-id CA+HiwqEwrP0TettuL5bbwGH_hu3L=5am9pajy7FmASQYDBq4aw@mail.gmail.com
Whole thread Raw
In response to Re: hyrax vs. RelationBuildPartitionDesc  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: hyrax vs. RelationBuildPartitionDesc
List pgsql-hackers
On Tue, Jun 4, 2019 at 9:25 PM Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, May 17, 2019 at 4:36 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Yeah, I did some additional testing that showed that it's pretty darn
> > hard to get the leak to amount to anything.  The test case that I
> > originally posted did many DDLs in a single transaction, and it
> > seems that that's actually essential to get a meaningful leak; as
> > soon as you exit the transaction the leaked contexts will be recovered
> > during sinval cleanup.
>
> My colleague Amul Sul rediscovered this same leak when he tried to
> attach lots of partitions to an existing partitioned table, all in the
> course of a single transaction.  This seems a little less artificial
> than Tom's original reproducer, which involved attaching and detaching
> the same partition repeatedly.
>
> Here is a patch that tries to fix this, along the lines I previously
> suggested; Amul reports that it does work for him.

Thanks for the patch.

I noticed a crash with one of the scenarios that I think the patch is
meant to address.  Let me describe the steps:

Attach gdb (or whatever) to session 1 in which we'll run a query that
will scan at least two partitions and set a breakpoint in
expand_partitioned_rtentry().  Run the query, so the breakpoint will
be hit.  Step through up to the start of the following loop in this
function:

    i = -1;
    while ((i = bms_next_member(live_parts, i)) >= 0)
    {
        Oid         childOID = partdesc->oids[i];
        Relation    childrel;
        RangeTblEntry *childrte;
        Index       childRTindex;
        RelOptInfo *childrelinfo;

        /* Open rel, acquiring required locks */
        childrel = table_open(childOID, lockmode);

Note that 'partdesc' in the above code is from the partition
directory.  Before stepping through into the loop, start another
session and attach a new partition.  On into the loop.  When the 1st
partition is opened, its locking will result in
RelationClearRelation() being called on the parent relation due to
partition being attached concurrently, which I observed, is actually
invoked a couple of times due to recursion.  Parent relation's
rd_oldpdcxt will be set in this process, which contains the
aforementioned partition descriptor.

Before moving the loop to process the 2nd partition, attach another
partition in session 2.  When the 2nd partition is opened,
RelationClearRelation() will run again and in one of its recursive
invocations, it destroys the rd_oldpdcxt that was set previously,
taking the partition directory's partition descriptor with it.
Anything that tries to access it later crashes.

I couldn't figure out what to blame here though -- the design of
rd_pdoldcxt or the fact that RelationClearRelation() is invoked
multiple times.  I will try to study it more closely tomorrow.

Thanks,
Amit



pgsql-hackers by date:

Previous
From: David Rowley
Date:
Subject: Re: Confusing error message for REINDEX TABLE CONCURRENTLY
Next
From: Amit Kapila
Date:
Subject: Re: Confusing comment for function ExecParallelEstimate