Thread: shared tempfile was not removed on statement_timeout (unreproducible)
I have a nagios check on ancient tempfiles, intended to catch debris left by
crashed processes. But it triggered on this file:

$ sudo find /var/lib/pgsql/12/data/base/pgsql_tmp -ls
142977    4 drwxr-x---  3 postgres postgres    4096 Dec 12 11:32 /var/lib/pgsql/12/data/base/pgsql_tmp
169868    4 drwxr-x---  2 postgres postgres    4096 Dec  7 01:35 /var/lib/pgsql/12/data/base/pgsql_tmp/pgsql_tmp11025.0.sharedfileset
169347 5492 -rw-r-----  1 postgres postgres 5619712 Dec  7 01:35 /var/lib/pgsql/12/data/base/pgsql_tmp/pgsql_tmp11025.0.sharedfileset/0.0
169346 5380 -rw-r-----  1 postgres postgres 5505024 Dec  7 01:35 /var/lib/pgsql/12/data/base/pgsql_tmp/pgsql_tmp11025.0.sharedfileset/1.0

I found:

2019-12-07 01:35:56 | 11025 | postgres | canceling statement due to statement timeout | CLUSTER pg_stat_database_snap USI
2019-12-07 01:35:56 | 11025 | postgres | temporary file: path "base/pgsql_tmp/pgsql_tmp11025.0.sharedfileset/2.0", size 5455872 | CLUSTER pg_stat_database_snap USI

I don't have a saved log entry for sharedfileset/0.0 - that may be significant,
or it may be a bug in my log-archiving script.

The process has not crashed since it started:

postgres 18145     1 0 Nov18 ? 00:51:39 /usr/pgsql-12/bin/postmaster -D /var/lib/pgsql/12/data
postgres 18147 18145 0 Nov18 ? 00:00:53 postgres: logger
postgres 18149 18145 0 Nov18 ? 00:38:42 postgres: checkpointer

version | PostgreSQL 12.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit

Just to be sure:

$ ps -wwf 11025 || echo not running
UID PID PPID C STIME TTY STAT TIME CMD
not running

I wasn't able to reproduce it like this:

PGOPTIONS='-c maintenance_work_mem=128MB -c client_min_messages=debug' psql postgres -c 'CREATE TABLE t (i int unique); INSERT INTO t SELECT generate_series(1,999999)' -c 'SET statement_timeout=4999' -c 'CLUSTER t USING t_i_key'

Actually, I tried using pg_ls_tmpdir(), but it unconditionally masks
non-regular files and thus shared filesets. Maybe that's worth discussion on a
new thread?
src/backend/utils/adt/genfile.c:

		/* Ignore anything but regular files */
		if (!S_ISREG(attrib.st_mode))
			continue;

BTW, there are no other tablespaces.

Justin
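For reference, a check like the one described above can be sketched in a few lines of shell. This is a hypothetical reconstruction, not the actual nagios plugin (whose path and age threshold are not shown in the thread); it is demonstrated against a scratch directory so it is self-contained.

```shell
#!/bin/sh
# Hypothetical sketch of an "ancient tempfiles" check: flag anything under
# pgsql_tmp older than 2 days as possible debris from a crash or an
# interrupted cleanup.  Uses a scratch directory for demonstration.
PGSQL_TMP=$(mktemp -d)/pgsql_tmp
mkdir -p "$PGSQL_TMP/pgsql_tmp11025.0.sharedfileset"

# Simulate week-old debris (GNU touch -d creates the file with that mtime):
touch -d '7 days ago' "$PGSQL_TMP/pgsql_tmp11025.0.sharedfileset/0.0"

# Anything older than 2880 minutes (2 days) is suspicious:
old=$(find "$PGSQL_TMP" -type f -mmin +2880)
if [ -n "$old" ]; then
	echo "WARNING: ancient temp files: $old"
else
	echo "OK"
fi
```

In production the path would be `$PGDATA/base/pgsql_tmp` (and any tablespace `pgsql_tmp` directories), with the threshold tuned to exceed the longest-running query.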
On Fri, Dec 13, 2019 at 7:05 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> I have a nagios check on ancient tempfiles, intended to catch debris left by
> crashed processes. But it triggered on this file:
>
> $ sudo find /var/lib/pgsql/12/data/base/pgsql_tmp -ls
> 142977    4 drwxr-x---  3 postgres postgres    4096 Dec 12 11:32 /var/lib/pgsql/12/data/base/pgsql_tmp
> 169868    4 drwxr-x---  2 postgres postgres    4096 Dec  7 01:35 /var/lib/pgsql/12/data/base/pgsql_tmp/pgsql_tmp11025.0.sharedfileset
> 169347 5492 -rw-r-----  1 postgres postgres 5619712 Dec  7 01:35 /var/lib/pgsql/12/data/base/pgsql_tmp/pgsql_tmp11025.0.sharedfileset/0.0
> 169346 5380 -rw-r-----  1 postgres postgres 5505024 Dec  7 01:35 /var/lib/pgsql/12/data/base/pgsql_tmp/pgsql_tmp11025.0.sharedfileset/1.0
>
> I found:
>
> 2019-12-07 01:35:56 | 11025 | postgres | canceling statement due to statement timeout | CLUSTER pg_stat_database_snap USI
> 2019-12-07 01:35:56 | 11025 | postgres | temporary file: path "base/pgsql_tmp/pgsql_tmp11025.0.sharedfileset/2.0", size 5455872 | CLUSTER pg_stat_database_snap USI

Hmm. I played around with this and couldn't reproduce it, but I
thought of something.
What if the statement timeout is reached while we're in here:

#0  PathNameDeleteTemporaryDir (dirname=0x7fffffffd010 "base/pgsql_tmp/pgsql_tmp28884.31.sharedfileset") at fd.c:1471
#1  0x0000000000a32c77 in SharedFileSetDeleteAll (fileset=0x80182e2cc) at sharedfileset.c:177
#2  0x0000000000a327e1 in SharedFileSetOnDetach (segment=0x80a6e62d8, datum=34385093324) at sharedfileset.c:206
#3  0x0000000000a365ca in dsm_detach (seg=0x80a6e62d8) at dsm.c:684
#4  0x000000000061621b in DestroyParallelContext (pcxt=0x80a708f20) at parallel.c:904
#5  0x00000000005d97b3 in _bt_end_parallel (btleader=0x80fe9b4b0) at nbtsort.c:1473
#6  0x00000000005d92f0 in btbuild (heap=0x80a7bc4c8, index=0x80a850a50, indexInfo=0x80fec1ab0) at nbtsort.c:340
#7  0x000000000067445b in index_build (heapRelation=0x80a7bc4c8, indexRelation=0x80a850a50, indexInfo=0x80fec1ab0, isreindex=true, parallel=true) at index.c:2963
#8  0x0000000000677bd3 in reindex_index (indexId=16532, skip_constraint_checks=true, persistence=112 'p', options=0) at index.c:3591
#9  0x0000000000678402 in reindex_relation (relid=16508, flags=18, options=0) at index.c:3807
#10 0x000000000073928f in finish_heap_swap (OIDOldHeap=16508, OIDNewHeap=16573, is_system_catalog=false, swap_toast_by_content=false, check_constraints=false, is_internal=true, frozenXid=604, cutoffMulti=1, newrelpersistence=112 'p') at cluster.c:1409
#11 0x00000000007389ab in rebuild_relation (OldHeap=0x80a7bc4c8, indexOid=16532, verbose=false) at cluster.c:622
#12 0x000000000073849e in cluster_rel (tableOid=16508, indexOid=16532, options=0) at cluster.c:428
#13 0x0000000000737f22 in cluster (stmt=0x800cfcbf0, isTopLevel=true) at cluster.c:185
#14 0x0000000000a7cc5c in standard_ProcessUtility (pstmt=0x800cfcf40, queryString=0x800cfc120 "cluster t USING t_i_idx ;", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x800cfd030, completionTag=0x7fffffffe0b0 "") at utility.c:654

The CHECK_FOR_INTERRUPTS() inside the walkdir() loop could ereport()
out of there after deleting some but not all of your files, but the
code in dsm_detach() has already popped the callback (which it does
"to avoid infinite error recursion"), so it won't run again on error
cleanup. Hmm. But then... maybe the two log lines you quoted should
be the other way around for that.

> Actually, I tried using pg_ls_tmpdir(), but it unconditionally masks
> non-regular files and thus shared filesets. Maybe that's worth discussion on a
> new thread?
>
> src/backend/utils/adt/genfile.c
>		/* Ignore anything but regular files */
>		if (!S_ISREG(attrib.st_mode))
>			continue;

+1, that's worth fixing.
On Fri, Dec 13, 2019 at 03:03:47PM +1300, Thomas Munro wrote:
> On Fri, Dec 13, 2019 at 7:05 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > I found:
> >
> > 2019-12-07 01:35:56 | 11025 | postgres | canceling statement due to statement timeout | CLUSTER pg_stat_database_snap USI
> > 2019-12-07 01:35:56 | 11025 | postgres | temporary file: path "base/pgsql_tmp/pgsql_tmp11025.0.sharedfileset/2.0", size 5455872 | CLUSTER pg_stat_database_snap USI
>
> Hmm. But then... maybe the two log lines you quoted should
> be the other way around for that.

And it's actually the other way around, when I ORDER BY something better than
left(log_time::text,19):

postgres=# SELECT log_time, session_line ln, pid, database db, left(message,99), left(query,33)
           FROM jrn_postgres_log
           WHERE user_name IS NOT NULL AND log_time BETWEEN '2019-12-07 01:35' AND '2019-12-07 01:36'
           ORDER BY 1,2;
2019-12-07 01:35:56.626-06 | 1 | 11025 | postgres | temporary file: path "base/pgsql_tmp/pgsql_tmp11025.0.sharedfileset/2.0", size 5455872 | CLUSTER pg_stat_database_snap USI
2019-12-07 01:35:56.626-06 | 2 | 11025 | postgres | canceling statement due to statement timeout | CLUSTER pg_stat_database_snap USI
On Fri, Dec 13, 2019 at 3:13 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> On Fri, Dec 13, 2019 at 03:03:47PM +1300, Thomas Munro wrote:
> > Hmm. But then... maybe the two log lines you quoted should
> > be the other way around for that.
>
> And it's actually the other way around, when I ORDER BY something better than
> left(log_time::text,19).

Hah. Ok, so it looks like we shouldn't be relying on the same code path for
'happy' and 'error' cleanup. This could probably be fixed with a well
placed explicit call to SharedFileSetDeleteAll() or a new function
SharedFileSetDestroy(), and perhaps a flag in shmem to say it's been
done so the callback doesn't do it again needlessly. I don't think
this problem is specific to parallel index creation.
On Fri, Dec 13, 2019 at 03:03:47PM +1300, Thomas Munro wrote:
> Hmm. I played around with this and couldn't reproduce it, but I
> thought of something. What if the statement timeout is reached while
> we're in here:
>
> #0  PathNameDeleteTemporaryDir (dirname=0x7fffffffd010 "base/pgsql_tmp/pgsql_tmp28884.31.sharedfileset") at fd.c:1471
> #1  0x0000000000a32c77 in SharedFileSetDeleteAll (fileset=0x80182e2cc) at sharedfileset.c:177
> #2  0x0000000000a327e1 in SharedFileSetOnDetach (segment=0x80a6e62d8, datum=34385093324) at sharedfileset.c:206
> #3  0x0000000000a365ca in dsm_detach (seg=0x80a6e62d8) at dsm.c:684
> #4  0x000000000061621b in DestroyParallelContext (pcxt=0x80a708f20) at parallel.c:904
> #5  0x00000000005d97b3 in _bt_end_parallel (btleader=0x80fe9b4b0) at nbtsort.c:1473
> #6  0x00000000005d92f0 in btbuild (heap=0x80a7bc4c8, index=0x80a850a50, indexInfo=0x80fec1ab0) at nbtsort.c:340
> ...
> The CHECK_FOR_INTERRUPTS() inside the walkdir() loop could ereport()
> out of there after deleting some but not all of your files, but the
> code in dsm_detach() has already popped the callback (which it does
> "to avoid infinite error recursion"), so it won't run again on error
> cleanup. Hmm. But then... maybe the two log lines you quoted should
> be the other way around for that.

With inspiration from re-reading your messages several times in rapid
succession, I was able to reproduce this easily with:

--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -3344,6 +3344,7 @@ walkdir(const char *path,
 		struct stat fst;
 		int			sret;
 
+		usleep(99999);
 		CHECK_FOR_INTERRUPTS();

On Fri, Dec 13, 2019 at 03:49:26PM +1300, Thomas Munro wrote:
> Ok, so it looks like we shouldn't be relying on the same code path for
> 'happy' and 'error' cleanup. This could probably be fixed with a well
> placed explicit call to SharedFileSetDeleteAll() or a new function
> SharedFileSetDestroy(), and perhaps a flag in shmem to say it's been
> done so the callback doesn't do it again needlessly. I don't think
> this problem is specific to parallel index creation.

Find below a caveman-approved patch which avoids leaving behind tmpfiles.
I'm not sure how to do this cleanly, since:

| src/include/utils/tuplesort.h: * Tuplesortstate and Sharedsort are opaque types whose details are not

Maybe we need to add a new parameter like:

| tuplesort_end(Tuplesortstate *state, bool do_delete_fileset)

Arguably, that has the benefit that existing callers *have* to confront whether
they should delete the fileset or not. This is such a minor issue that it's
unfortunate to force a confrontation over it.
-- Justin

diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index efee86784b..f5b0a48d64 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -511,12 +511,17 @@ _bt_spools_heapscan(Relation heap, Relation index, BTBuildState *buildstate,
 	return reltuples;
 }
 
+extern void *tuplesort_shared_fileset(Tuplesortstate *);
+
 /*
  * clean up a spool structure and its substructures.
  */
 static void
 _bt_spooldestroy(BTSpool *btspool)
 {
+	void *fileset = tuplesort_shared_fileset(btspool->sortstate);
+	if (fileset)
+		SharedFileSetDeleteAll(fileset);
 	tuplesort_end(btspool->sortstate);
 	pfree(btspool);
 }
@@ -1669,6 +1674,10 @@ _bt_end_parallel(BTLeader *btleader)
 	/* Free last reference to MVCC snapshot, if one was used */
 	if (IsMVCCSnapshot(btleader->snapshot))
 		UnregisterSnapshot(btleader->snapshot);
+
+	// SharedFileSetDeleteAll(btleader->sharedsort->fileset);
+	// SharedFileSetDeleteAll(btleader->sharedsort2->fileset);
+
 	DestroyParallelContext(btleader->pcxt);
 	ExitParallelMode();
 }
diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index 5f6420efb2..f89d42f475 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -3344,6 +3344,7 @@ walkdir(const char *path,
 		struct stat fst;
 		int			sret;
 
+		usleep(99999);
 		CHECK_FOR_INTERRUPTS();
 
 		if (strcmp(de->d_name, ".") == 0 ||
diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index 3c49476483..3de9592b78 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -1375,6 +1375,11 @@ tuplesort_free(Tuplesortstate *state)
 	MemoryContextReset(state->sortcontext);
 }
 
+void *
+tuplesort_shared_fileset(Tuplesortstate *state)
+{
+	return state->shared ? &state->shared->fileset : NULL;
+}
+
 /*
  * tuplesort_end
  *
On Tue, Jul 21, 2020 at 4:33 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> /*
>  * clean up a spool structure and its substructures.
>  */
> static void
> _bt_spooldestroy(BTSpool *btspool)
> {
> +	void *fileset = tuplesort_shared_fileset(btspool->sortstate);
> +	if (fileset)
> +		SharedFileSetDeleteAll(fileset);
>  	tuplesort_end(btspool->sortstate);
>  	pfree(btspool);
> }

Why can't tuplesort_end do it?
On Mon, Jul 27, 2020 at 08:00:46PM +1200, Thomas Munro wrote:
> On Tue, Jul 21, 2020 at 4:33 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > static void
> > _bt_spooldestroy(BTSpool *btspool)
> > {
> > +	void *fileset = tuplesort_shared_fileset(btspool->sortstate);
> > +	if (fileset)
> > +		SharedFileSetDeleteAll(fileset);
> >  	tuplesort_end(btspool->sortstate);
> >  	pfree(btspool);
> > }
>
> Why can't tuplesort_end do it?

Because then I think the parallel workers remove their own files, with tests
failing like:

+ERROR:  could not open temporary file "0.0" from BufFile "0": No such file or directory

I looked around a bit more and came up with this, which works, but I don't know
enough to say if it's right.

diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index 5f6420efb2..f89d42f475 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -3344,6 +3344,7 @@ walkdir(const char *path,
 		struct stat fst;
 		int			sret;
 
+		usleep(99999);
 		CHECK_FOR_INTERRUPTS();
 
 		if (strcmp(de->d_name, ".") == 0 ||
diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index 3c49476483..c6e5e6d00b 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -1387,6 +1387,9 @@ tuplesort_free(Tuplesortstate *state)
 void
 tuplesort_end(Tuplesortstate *state)
 {
+	if (state->shared && state->shared->workersFinished == state->nParticipants)
+		SharedFileSetDeleteAll(&state->shared->fileset);
+
 	tuplesort_free(state);
 
 	/*
On Mon, Jul 27, 2020 at 05:39:02AM -0500, Justin Pryzby wrote:
> On Mon, Jul 27, 2020 at 08:00:46PM +1200, Thomas Munro wrote:
> > Why can't tuplesort_end do it?
>
> Because then I think the parallel workers remove their own files, with tests
> failing like:
>
> +ERROR:  could not open temporary file "0.0" from BufFile "0": No such file or directory
>
> I looked around a bit more and came up with this, which works, but I don't know
> enough to say if it's right.

I convinced myself this is right, since state->nParticipants == -1 for workers.
Only the leader should do the cleanup.

Added here:
https://commitfest.postgresql.org/29/2657/

-- Justin
Attachment
On Wed, 29 Jul 2020 at 10:37, Justin Pryzby <pryzby@telsasoft.com> wrote:
> I convinced myself this is right, since state->nParticipants == -1 for workers.
> Only the leader should do the cleanup.
>
> Added here:
> https://commitfest.postgresql.org/29/2657/

I've also investigated this issue. As Thomas mentioned before, this
problem is not specific to parallel index creation. Shared temporary
files could be left behind if the process is interrupted while deleting
them as part of detaching the dsm segment.

To fix this issue, possible solutions would be:

1. Like the current patch, we call SharedFileSetDeleteAll() before
DestroyParallelContext(), which calls dsm_detach(), so that we make
sure to delete these files without relying on the on_dsm_detach
callback. That way, even if the process is interrupted during that
cleanup, it will clean these files again during transaction abort
(AtEOXact_Parallel() calls dsm_detach()). OTOH, a downside would be
that we end up setting a rule that we need to explicitly call
SharedFileSetDeleteAll().

2. We don't use the on_dsm_detach callback to delete the shared file set.
Instead, I wonder if we can delete the files at the end of the transaction
by using the ResourceOwner mechanism, like we do for non-shared temporary
file cleanup. This idea doesn't have the cons that idea #1 has. OTOH,
the lifetime of the shared file set changes from the parallel
context to the transaction, leading to keeping many temporary files until
the transaction ends.
Also, we would need to rework the handling of the shared file set.

3. We use on_dsm_detach as well as an on_proc_exit callback to delete the
shared file set. This doesn't resolve the root cause, but that way, even
if the process didn't delete it when destroying the parallel context, we
can make sure to delete it on process exit.

I think #1 is suitable for back branches. For HEAD, I think #2 and #3
would be better in terms of not setting an implicit rule. Thoughts?

Regards,

--
Masahiko Sawada
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
At Wed, 28 Oct 2020 19:03:14 +0900, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote in
> On Wed, 29 Jul 2020 at 10:37, Justin Pryzby <pryzby@telsasoft.com> wrote:
> > I convinced myself this is right, since state->nParticipants == -1 for workers.
> > Only the leader should do the cleanup.
> >
> > Added here:
> > https://commitfest.postgresql.org/29/2657/

+	if (state->shared && state->shared->workersFinished == state->nParticipants)

Isn't it more straightforward to check "state->shared && state->worker == -1"?

> 1. Like the current patch, we call SharedFileSetDeleteAll() before
> DestroyParallelContext() which calls dsm_detach() so that we can make
> sure to delete these files while not relying on on_dsm_detach
> callback. ...
> OTOH a downside would be that we will end up setting a rule that we
> need to explicitly call SharedFileSetDeleteAll().

That seems to be common. We release locks explicitly, but they are
automatically released on error.
Of course it is slightly different in that SharedFileSetOnDetach
unconditionally removes the directory, but that doesn't matter as long as
that behavior doesn't lead to an error. We can skip the cleanup, as Thomas
suggested, when it is not necessary.

Looking at the comment on SharedFileSetOnDetach:

| * everything in them.  We can't raise an error on failures, because this runs
| * in error cleanup paths.

I feel that a function that shouldn't error-out also shouldn't be
cancellable. If that's the case, we can omit the CHECK_FOR_INTERRUPTS() in
walkdir() when elevel is smaller than ERROR.

=====
diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index b58502837a..593c23553e 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -3374,7 +3374,9 @@ walkdir(const char *path,
 	{
 		char		subpath[MAXPGPATH * 2];
 
-		CHECK_FOR_INTERRUPTS();
+		/* omit interrupts while we shouldn't error-out */
+		if (elevel >= ERROR)
+			CHECK_FOR_INTERRUPTS();
 
 		if (strcmp(de->d_name, ".") == 0 ||
 			strcmp(de->d_name, "..") == 0)
=====

> 2. We don't use on_dsm_detach callback to delete the shared file set.
> Instead, I wonder if we can delete them at the end of the transaction
> by using ResourceOwner mechanism, like we do for non-shared temporary
> files cleanup. ...
>
> 3. We use on_dsm_detach as well as on_proc_exit callback to delete the
> shared file set. It doesn't resolve the root cause but that way, even
> if the process didn't delete it on destroying the parallel context, we
> can make sure to delete it on process exit.
>
> I think #1 is suitable for back branches. For HEAD, I think #2 and #3
> would be better in terms of not setting an implicit rule. Thoughts?
As long as we allow dsm_detach to be canceled, the problem persists
whatever else we do. So #2 and #3 seem a bit too much. It seems to me
that #1 plus omitting the CHECK_FOR_INTERRUPTS() is suitable for all
branches.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
On 26/01/2021 06:46, Kyotaro Horiguchi wrote:
> Looking at the comment on SharedFileSetOnDetach:
>
> | * everything in them.  We can't raise an error on failures, because this runs
> | * in error cleanup paths.
>
> I feel that a function that shouldn't error-out also shouldn't be
> cancellable. If that's the case, we can omit the CHECK_FOR_INTERRUPTS() in
> walkdir() when elevel is smaller than ERROR.
>
> =====
> diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
> index b58502837a..593c23553e 100644
> --- a/src/backend/storage/file/fd.c
> +++ b/src/backend/storage/file/fd.c
> @@ -3374,7 +3374,9 @@ walkdir(const char *path,
>  	{
>  		char		subpath[MAXPGPATH * 2];
>  
> -		CHECK_FOR_INTERRUPTS();
> +		/* omit interrupts while we shouldn't error-out */
> +		if (elevel >= ERROR)
> +			CHECK_FOR_INTERRUPTS();
>  
>  		if (strcmp(de->d_name, ".") == 0 ||
>  			strcmp(de->d_name, "..") == 0)
> =====

Don't we potentially have the same problem with all on_dsm_detach
callbacks? Looking at the other on_dsm_detach callbacks, I don't see
any CHECK_FOR_INTERRUPTS() calls in them, but it seems fragile to
assume that.

I'd suggest adding HOLD/RESUME_INTERRUPTS() to dsm_detach(). At least
around the removal of the callback from the list and calling the
callback. Maybe even over the whole function.

- Heikki
At Tue, 26 Jan 2021 11:00:56 +0200, Heikki Linnakangas <hlinnaka@iki.fi> wrote in
> Don't we potentially have the same problem with all on_dsm_detach
> callbacks? Looking at the other on_dsm_detach callbacks, I don't see
> any CHECK_FOR_INTERRUPTS() calls in them, but it seems fragile to
> assume that.
>
> I'd suggest adding HOLD/RESUME_INTERRUPTS() to dsm_detach(). At least
> around the removal of the callback from the list and calling the
> callback. Maybe even over the whole function.

Yes, I first came up with HOLD/RESUME_QUERY_INTERRUPTS() at the same
location.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
On Wed, Jan 27, 2021 at 12:22 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
> At Tue, 26 Jan 2021 11:00:56 +0200, Heikki Linnakangas <hlinnaka@iki.fi> wrote in
> > I'd suggest adding HOLD/RESUME_INTERRUPTS() to dsm_detach(). At least
> > around the removal of the callback from the list and calling the
> > callback. Maybe even over the whole function.
>
> Yes, I first came up with HOLD/RESUME_QUERY_INTERRUPTS() at the same
> location.

+1, this seems like a good idea. This is a little bit like the code
near the comments "Don't joggle the elbow of proc_exit".
On Wed, Jan 27, 2021 at 9:34 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> +1, this seems like a good idea. This is a little bit like the code
> near the comments "Don't joggle the elbow of proc_exit".

So that gives a very simple back-patchable patch.
Attachment
Thomas Munro <thomas.munro@gmail.com> writes:
>> +1, this seems like a good idea. This is a little bit like the code
>> near the comments "Don't joggle the elbow of proc_exit".

> So that gives a very simple back-patchable patch.

Hmm, so is the *rest* of that function perfectly okay with being
interrupted?

			regards, tom lane
On Sun, Jan 31, 2021 at 6:07 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
> > So that gives a very simple back-patchable patch.
>
> Hmm, so is the *rest* of that function perfectly okay with being
> interrupted?

It looks OK to me. There aren't any CFI()s in there.
On Fri, Feb 5, 2021 at 5:47 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Sun, Jan 31, 2021 at 6:07 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Hmm, so is the *rest* of that function perfectly okay with being
> > interrupted?
>
> It looks OK to me. There aren't any CFI()s in there.

Pushed. That closes CF #2657.