Thread: PHJ file leak.

PHJ file leak.

From

Kyotaro Horiguchi

Date:

11 November 2019, 12:24:18

Hello. While looking a patch, I found that PHJ sometimes complains for
file leaks if accompanied by LIMIT.

Repro is very simple:

create table t as (select a, a as b from generate_series(0, 999999) a);
analyze t;
select t.a from t join t t2 on (t.a = t2.a) limit 1;

Once in several (or dozen of) times execution of the last query
complains as follows.

WARNING:  temporary file leak: File 15 still referenced
WARNING:  temporary file leak: File 17 still referenced

This is using PHJ and the leaked file was a shared tuplestore for
outer tuples, which was opend by sts_parallel_scan_next() called from
ExecParallelHashJoinOuterGetTuple(). It seems to me that
ExecHashTableDestroy is forgeting to release shared tuplestore
accessors. Please find the attached.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index 224cbb32ba..8399683569 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -871,6 +871,9 @@ ExecHashTableDestroy(HashJoinTable hashtable)
         }
     }
 
+    /* Also, free up accessors to shared tuplestores if any */
+    ExecParallelHashCloseBatchAccessors(hashtable);
+
     /* Release working memory (batchCxt is a child, so it goes away too) */
     MemoryContextDelete(hashtable->hashCxt);
 
@@ -2991,6 +2994,7 @@ ExecParallelHashCloseBatchAccessors(HashJoinTable hashtable)
         sts_end_parallel_scan(hashtable->batches[i].outer_tuples);
     }
     pfree(hashtable->batches);
+    hashtable->nbatch = 0;
     hashtable->batches = NULL;
 }

Re: PHJ file leak.

From

Thomas Munro

Date:

11 November 2019, 22:18:39

On Tue, Nov 12, 2019 at 1:24 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
> Hello. While looking a patch, I found that PHJ sometimes complains for
> file leaks if accompanied by LIMIT.

Oops.

> Repro is very simple:
>
> create table t as (select a, a as b from generate_series(0, 999999) a);
> analyze t;
> select t.a from t join t t2 on (t.a = t2.a) limit 1;
>
> Once in several (or dozen of) times execution of the last query
> complains as follows.
>
> WARNING:  temporary file leak: File 15 still referenced
> WARNING:  temporary file leak: File 17 still referenced

Ack.  Reproduced here.

> This is using PHJ and the leaked file was a shared tuplestore for
> outer tuples, which was opend by sts_parallel_scan_next() called from
> ExecParallelHashJoinOuterGetTuple(). It seems to me that
> ExecHashTableDestroy is forgeting to release shared tuplestore
> accessors. Please find the attached.

Thanks for the patch!  Yeah, this seems correct, but I'd like to think
about it some more before committing.  I'm going to be a bit tied up
with travel so that might be next week.

Re: PHJ file leak.

From

Tom Lane

Date:

11 November 2019, 22:24:45

Thomas Munro <thomas.munro@gmail.com> writes:
> On Tue, Nov 12, 2019 at 1:24 AM Kyotaro Horiguchi
> <horikyota.ntt@gmail.com> wrote:
>> Hello. While looking a patch, I found that PHJ sometimes complains for
>> file leaks if accompanied by LIMIT.

> Thanks for the patch!  Yeah, this seems correct, but I'd like to think
> about it some more before committing.  I'm going to be a bit tied up
> with travel so that might be next week.

At this point we've missed the window for this week's releases, so
there's no great hurry (and it'd be best not to push any noncritical
patches into the back branches anyway, for the next 24 hours).

Although the patch seems unobjectionable as far as it goes, I'd like
to understand why we didn't see the need for it long since.  Is there
another call to ExecParallelHashCloseBatchAccessors somewhere, and
if so, is that one wrongly placed?

            regards, tom lane

Re: PHJ file leak.

From

Kyotaro Horiguchi

Date:

12 November 2019, 03:11:00

At Mon, 11 Nov 2019 17:24:45 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote in 
> Thomas Munro <thomas.munro@gmail.com> writes:
> > On Tue, Nov 12, 2019 at 1:24 AM Kyotaro Horiguchi
> > <horikyota.ntt@gmail.com> wrote:
> >> Hello. While looking a patch, I found that PHJ sometimes complains for
> >> file leaks if accompanied by LIMIT.
> 
> > Thanks for the patch!  Yeah, this seems correct, but I'd like to think
> > about it some more before committing.  I'm going to be a bit tied up
> > with travel so that might be next week.
> 
> At this point we've missed the window for this week's releases, so
> there's no great hurry (and it'd be best not to push any noncritical
> patches into the back branches anyway, for the next 24 hours).
> 
> Although the patch seems unobjectionable as far as it goes, I'd like
> to understand why we didn't see the need for it long since.  Is there
> another call to ExecParallelHashCloseBatchAccessors somewhere, and
> if so, is that one wrongly placed?

It's a simple race conditions between leader and workers.

If a scan on workers reached to the end, no batch file is open, but a
batch file is open if it doesn't reaches to the end.

If a worker notices that the channel to the leader is already closed
before reaching the limit, it calls ExecEndNode and doesn't call
ExecShutdownNode. Otherwise, if the worker finds the limit first, the
worker *shutdowns* itself then ends. Everything's clean.

On second thought, it seems a issue of ExecutePlan, rather than PHJ
itself. ExecutePlan should shutdown nodes when output channel is
broken.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index ea4b586984..038bafe777 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1683,7 +1683,15 @@ ExecutePlan(EState *estate,
              * end the loop.
              */
             if (!dest->receiveSlot(slot, dest))
+            {
+                /*
+                 * If we know we won't need to back up, we can release
+                 * resources at this point.
+                 */
+                if (!(estate->es_top_eflags & EXEC_FLAG_BACKWARD))
+                    (void) ExecShutdownNode(planstate);
                 break;
+            }
         }
 
         /*

Re: PHJ file leak.

From

Kyotaro Horiguchi

Date:

12 November 2019, 03:19:58

At Mon, 11 Nov 2019 17:24:45 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote in 
> Although the patch seems unobjectionable as far as it goes, I'd like
> to understand why we didn't see the need for it long since.  Is there
> another call to ExecParallelHashCloseBatchAccessors somewhere, and
> if so, is that one wrongly placed?

The previous patch would be wrong. The root cause is a open batch so
the right thing to be done at scan end is
ExecHashTableDeatchBatch. And the real issue here seems to be in
ExecutePlan, not in PHJ.

regards

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: PHJ file leak.

From

Thomas Munro

Date:

12 November 2019, 03:23:46

On Tue, Nov 12, 2019 at 4:20 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
> At Mon, 11 Nov 2019 17:24:45 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote in
> > Although the patch seems unobjectionable as far as it goes, I'd like
> > to understand why we didn't see the need for it long since.  Is there
> > another call to ExecParallelHashCloseBatchAccessors somewhere, and
> > if so, is that one wrongly placed?
>
> The previous patch would be wrong. The root cause is a open batch so
> the right thing to be done at scan end is
> ExecHashTableDeatchBatch. And the real issue here seems to be in
> ExecutePlan, not in PHJ.

You are right.  Here is the email I just wrote that says the same
thing, but with less efficiency:

On Tue, Nov 12, 2019 at 11:24 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
> > On Tue, Nov 12, 2019 at 1:24 AM Kyotaro Horiguchi
> > <horikyota.ntt@gmail.com> wrote:
> >> Hello. While looking a patch, I found that PHJ sometimes complains for
> >> file leaks if accompanied by LIMIT.
>
> > Thanks for the patch!  Yeah, this seems correct, but I'd like to think
> > about it some more before committing.  I'm going to be a bit tied up
> > with travel so that might be next week.
>
> At this point we've missed the window for this week's releases, so
> there's no great hurry (and it'd be best not to push any noncritical
> patches into the back branches anyway, for the next 24 hours).
>
> Although the patch seems unobjectionable as far as it goes, I'd like
> to understand why we didn't see the need for it long since.  Is there
> another call to ExecParallelHashCloseBatchAccessors somewhere, and
> if so, is that one wrongly placed?

I'll need to look into this some more in a few days, but here's a
partial analysis:

The usual way that these files get closed is by
sts_end_parallel_scan().  For the particular file in question here --
an outer batch file that is open for reading while we probe -- that
usually happens at the end of the per-batch probe phase before we try
to move to another batch or reach end of data.  In case of an early
end, it also happens in ExecShutdownHashJoin(), which detaches from
and cleans up shared resources, which includes closing these files.

There are a few ways that ExecShutdownNode() can be reached:

1.  ExecLimit() (which we now suspect to be bogus -- see nearby
unfinished business[1]), though that only happens in the leader in
this case
2.  ExecutePlan() on end-of-data.
3.  ExecutePlan() on reaching the requested tuple count.

Unfortunately those aren't the only ways out of ExecutePlan()'s loop,
and that may be a problem.  I think what's happening in this case is
that there is a race where dest->receiveSlot(slot, dest) returns false
because the leader has stopped receiving tuples (having received
enough tuples to satisfy the LIMIT) so we exit early, but that path
out of the loop doesn't run ExecShutdownNode().  EXEC_FLAG_BACKWARD
would also inhibit it, but that shouldn't be set in a parallel plan.

So, after thinking about it, I'm not sure the proposed patch is
conceptually sound (even if it seems to work), because
ExecHashTableDestroy() runs at 'end' time and that's after shared
memory has disappeared, so it shouldn't be doing shared
resource-related cleanup work, whereas
ExecParallelHashCloseBatchAccessors() is relates to shared resources
(for example, it calls sts_end_write(), which should never actually do
anything at this time but it is potentially a shm-updating routine,
which seems wrong to me).

Recommending a change is going to require more brainpower than I have
spare today due to other commitments.  ExecShutdownNode() is certainly
a bit tricky.

[1] https://www.postgresql.org/message-id/flat/87ims2amh6.fsf%40jsievers.enova.com

Re: PHJ file leak.

From

Thomas Munro

Date:

12 November 2019, 04:03:21

On Tue, Nov 12, 2019 at 4:23 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Tue, Nov 12, 2019 at 4:20 PM Kyotaro Horiguchi
> <horikyota.ntt@gmail.com> wrote:
> > At Mon, 11 Nov 2019 17:24:45 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote in
> > > Although the patch seems unobjectionable as far as it goes, I'd like
> > > to understand why we didn't see the need for it long since.  Is there
> > > another call to ExecParallelHashCloseBatchAccessors somewhere, and
> > > if so, is that one wrongly placed?
> >
> > The previous patch would be wrong. The root cause is a open batch so
> > the right thing to be done at scan end is
> > ExecHashTableDeatchBatch. And the real issue here seems to be in
> > ExecutePlan, not in PHJ.
>
> You are right.  Here is the email I just wrote that says the same
> thing, but with less efficiency:

And yeah, your Make_parallel_shutdown_on_broken_channel.patch seems
like the real fix here.  It's not optional to run that at
end-of-query, though you might get that impression from various
comments, and it's also not OK to call it before the end of the query,
though you might get that impression from what the code actually does.
CC'ing to Robert who designed ExecShutdownNode(), though admittedly
the matter might have been theoretical until PHJ came along (since the
only other executor node that implements ExecShutdownNode() was
Gather, and you can't have a Gather under a Gather, and if this
happens to you in the leader process then the user is gone and no one
will hear the screams).

Re: PHJ file leak.

From

Thomas Munro

Date:

12 November 2019, 20:48:19

On Tue, Nov 12, 2019 at 5:03 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Tue, Nov 12, 2019 at 4:23 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> > On Tue, Nov 12, 2019 at 4:20 PM Kyotaro Horiguchi
> > <horikyota.ntt@gmail.com> wrote:
> > > The previous patch would be wrong. The root cause is a open batch so
> > > the right thing to be done at scan end is
> > > ExecHashTableDeatchBatch. And the real issue here seems to be in
> > > ExecutePlan, not in PHJ.
> >
> > You are right.  Here is the email I just wrote that says the same
> > thing, but with less efficiency:
>
> And yeah, your Make_parallel_shutdown_on_broken_channel.patch seems
> like the real fix here.  It's not optional to run that at
> end-of-query, though you might get that impression from various
> comments, and it's also not OK to call it before the end of the query,
> though you might get that impression from what the code actually does.

Here's the version I'd like to commit in a day or two, once the dust
has settled on the minor release.  Instead of adding yet another copy
of that code, I just moved it out of the loop; this way there is no
way to miss it.  I think the comment could also be better, but I'll
wait for the concurrent discussions about the meaning of
ExecShutdownNode() to fix that in master.

Attachment

0001-Make-sure-we-call-ExecShutdownNode-if-appropriate.patch

Re: PHJ file leak.

From

Kyotaro Horiguchi

Date:

13 November 2019, 00:42:43

At Wed, 13 Nov 2019 09:48:19 +1300, Thomas Munro <thomas.munro@gmail.com> wrote in 
> On Tue, Nov 12, 2019 at 5:03 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> > On Tue, Nov 12, 2019 at 4:23 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> > > On Tue, Nov 12, 2019 at 4:20 PM Kyotaro Horiguchi
> > > <horikyota.ntt@gmail.com> wrote:
> > > > The previous patch would be wrong. The root cause is a open batch so
> > > > the right thing to be done at scan end is
> > > > ExecHashTableDeatchBatch. And the real issue here seems to be in
> > > > ExecutePlan, not in PHJ.
> > >
> > > You are right.  Here is the email I just wrote that says the same
> > > thing, but with less efficiency:
> >
> > And yeah, your Make_parallel_shutdown_on_broken_channel.patch seems
> > like the real fix here.  It's not optional to run that at
> > end-of-query, though you might get that impression from various
> > comments, and it's also not OK to call it before the end of the query,
> > though you might get that impression from what the code actually does.
> 
> Here's the version I'd like to commit in a day or two, once the dust
> has settled on the minor release.  Instead of adding yet another copy
> of that code, I just moved it out of the loop; this way there is no
> way to miss it.  I think the comment could also be better, but I'll
> wait for the concurrent discussions about the meaning of
> ExecShutdownNode() to fix that in master.

The phatch's shape looks better. Thanks.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: PHJ file leak.

From

Amit Kapila

Date:

13 November 2019, 08:52:45

On Wed, Nov 13, 2019 at 6:13 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> At Wed, 13 Nov 2019 09:48:19 +1300, Thomas Munro <thomas.munro@gmail.com> wrote in
> >
> > Here's the version I'd like to commit in a day or two, once the dust
> > has settled on the minor release.  Instead of adding yet another copy
> > of that code, I just moved it out of the loop; this way there is no
> > way to miss it.  I think the comment could also be better, but I'll
> > wait for the concurrent discussions about the meaning of
> > ExecShutdownNode() to fix that in master.
>
> The phatch's shape looks better. Thanks.
>

+1.  LGTM as well.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: PHJ file leak.

From

Thomas Munro

Date:

15 November 2019, 21:32:14

On Wed, Nov 13, 2019 at 9:52 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Wed, Nov 13, 2019 at 6:13 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
> > The phatch's shape looks better. Thanks.
>
> +1.  LGTM as well.

Thanks.  Pushed.

Re: PHJ file leak.

From

Jeremy Finzel

Date:

06 March 2020, 15:18:34

On Fri, Nov 15, 2019 at 3:32 PM Thomas Munro <thomas.munro@gmail.com> wrote:

On Wed, Nov 13, 2019 at 9:52 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Wed, Nov 13, 2019 at 6:13 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
> > The phatch's shape looks better. Thanks.
>
> +1. LGTM as well.

Thanks. Pushed.

We are hitting this leak in production on an 11.6 system for a query that is using a parallel hash join. Was this fix pushed in 11.7? I can't tell clearly from the release notes for 11.7 or this thread.

Thanks!
Jeremy

Re: PHJ file leak.

From

Tom Lane

Date:

06 March 2020, 15:43:56

Jeremy Finzel <finzelj@gmail.com> writes:
> We are hitting this leak in production on an 11.6 system for a query that
> is using a parallel hash join.  Was this fix pushed in 11.7? I can't tell
> clearly from the release notes for 11.7 or this thread.

It looks like you're asking about this commit:

Author: Thomas Munro <tmunro@postgresql.org>
Branch: master [76cbfcdf3] 2019-11-16 10:11:30 +1300
Branch: REL_12_STABLE Release: REL_12_2 [24897e1a1] 2019-11-16 10:18:45 +1300
Branch: REL_11_STABLE Release: REL_11_7 [bc049d0d4] 2019-11-16 10:19:16 +1300

    Always call ExecShutdownNode() if appropriate.

which is documented thus in the 11.7 release notes:

    <listitem>
<!--
Author: Thomas Munro <tmunro@postgresql.org>
Branch: master [76cbfcdf3] 2019-11-16 10:11:30 +1300
Branch: REL_12_STABLE [24897e1a1] 2019-11-16 10:18:45 +1300
Branch: REL_11_STABLE [bc049d0d4] 2019-11-16 10:19:16 +1300
-->
     <para>
      Ensure parallel plans are always shut down at the correct time
      (Kyotaro Horiguchi)
     </para>

     <para>
      This oversight is known to result in <quote>temporary file
      leak</quote> warnings from multi-batch parallel hash joins.
     </para>
    </listitem>


            regards, tom lane

Re: PHJ file leak.

From

Jeremy Finzel

Date:

06 March 2020, 15:47:49

On Fri, Mar 6, 2020 at 9:43 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Jeremy Finzel <finzelj@gmail.com> writes:
> We are hitting this leak in production on an 11.6 system for a query that
> is using a parallel hash join. Was this fix pushed in 11.7? I can't tell
> clearly from the release notes for 11.7 or this thread.

It looks like you're asking about this commit:

Author: Thomas Munro <tmunro@postgresql.org>
Branch: master [76cbfcdf3] 2019-11-16 10:11:30 +1300
Branch: REL_12_STABLE Release: REL_12_2 [24897e1a1] 2019-11-16 10:18:45 +1300
Branch: REL_11_STABLE Release: REL_11_7 [bc049d0d4] 2019-11-16 10:19:16 +1300

Always call ExecShutdownNode() if appropriate.

which is documented thus in the 11.7 release notes:

<listitem>

<para>
Ensure parallel plans are always shut down at the correct time
(Kyotaro Horiguchi)
</para>

<para>
This oversight is known to result in <quote>temporary file
leak</quote> warnings from multi-batch parallel hash joins.
</para>
</listitem>

regards, tom lane

Thank you! Yep, pretty clear :).