Re: shared tempfile was not removed on statement_timeout - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: shared tempfile was not removed on statement_timeout
Date
Msg-id CA+fd4k4k0QQv=7m2GkXQqi7toy1EU8JUjo6P+u8iw2NKYdzqrA@mail.gmail.com
In response to Re: shared tempfile was not removed on statement_timeout  (Justin Pryzby <pryzby@telsasoft.com>)
Responses Re: shared tempfile was not removed on statement_timeout
List pgsql-hackers
On Wed, 29 Jul 2020 at 10:37, Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> On Mon, Jul 27, 2020 at 05:39:02AM -0500, Justin Pryzby wrote:
> > On Mon, Jul 27, 2020 at 08:00:46PM +1200, Thomas Munro wrote:
> > > Why can't tuplesort_end do it?
> >
> > Because then I think the parallel workers remove their own files, with tests
> > failing like:
> >
> > +ERROR:  could not open temporary file "0.0" from BufFile "0": No such file or directory
> >
> > I look around a bit more and came up with this, which works, but I don't know
> > enough to say if it's right.
>
> I convinced myself this is right, since state->nParticipants==-1 for workers.
> Only the leader should do the cleanup.
>
> Added here:
> https://commitfest.postgresql.org/29/2657/

I've also investigated this issue. As Thomas mentioned before, this
problem is not specific to parallel index creation. Shared temporary
files can be left behind if the process is interrupted while deleting
them as part of detaching the DSM segment.

To fix this issue, possible solutions would be:

1. Like the current patch, we call SharedFileSetDeleteAll() before
DestroyParallelContext(), which calls dsm_detach(), so that we are
sure to delete these files without relying on the on_dsm_detach
callback. That way, even if the process is interrupted during that
cleanup, it will clean up these files again during transaction abort
(AtEOXact_Parallel() calls dsm_detach()). OTOH, a downside would be
that we end up setting a rule that we need to explicitly call
SharedFileSetDeleteAll().

2. We don't use the on_dsm_detach callback to delete the shared file
set. Instead, I wonder if we can delete the files at the end of the
transaction by using the ResourceOwner mechanism, like we do for
non-shared temporary file cleanup. This idea doesn't have the downside
that idea #1 has. OTOH, the lifetime of the shared file set would
change from the parallel context to the transaction, so many temporary
files could be kept until the end of the transaction. Also, we would
need to rework the handling of the shared file set.

3. We use an on_proc_exit callback in addition to the on_dsm_detach
callback to delete the shared file set. This doesn't resolve the root
cause, but that way, even if the process didn't delete the files when
destroying the parallel context, we can make sure to delete them on
process exit.

I think #1 is suitable for back branches. For HEAD, I think #2 and #3
would be better since they avoid setting an implicit rule. Thoughts?

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


