Re: shared tempfile was not removed on statement_timeout - Mailing list pgsql-hackers

From Justin Pryzby
Subject Re: shared tempfile was not removed on statement_timeout
Date
Msg-id 20200721043258.GE5748@telsasoft.com
Whole thread Raw
In response to Re: shared tempfile was not removed on statement_timeout (unreproducible)  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: shared tempfile was not removed on statement_timeout
List pgsql-hackers
On Fri, Dec 13, 2019 at 03:03:47PM +1300, Thomas Munro wrote:
> On Fri, Dec 13, 2019 at 7:05 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > I have a nagios check on ancient tempfiles, intended to catch debris left by
> > crashed processes.  But triggered on this file:
> >
> > $ sudo find /var/lib/pgsql/12/data/base/pgsql_tmp -ls
> > 142977    4 drwxr-x---   3 postgres postgres     4096 Dec 12 11:32 /var/lib/pgsql/12/data/base/pgsql_tmp
> > 169868    4 drwxr-x---   2 postgres postgres     4096 Dec  7 01:35
/var/lib/pgsql/12/data/base/pgsql_tmp/pgsql_tmp11025.0.sharedfileset
> > 169347 5492 -rw-r-----   1 postgres postgres  5619712 Dec  7 01:35
/var/lib/pgsql/12/data/base/pgsql_tmp/pgsql_tmp11025.0.sharedfileset/0.0
> > 169346 5380 -rw-r-----   1 postgres postgres  5505024 Dec  7 01:35
/var/lib/pgsql/12/data/base/pgsql_tmp/pgsql_tmp11025.0.sharedfileset/1.0
> >
> > I found:
> >  2019-12-07 01:35:56 | 11025 | postgres | canceling statement due to statement timeout
            | CLUSTER pg_stat_database_snap USI
 
> >  2019-12-07 01:35:56 | 11025 | postgres | temporary file: path "base/pgsql_tmp/pgsql_tmp11025.0.sharedfileset/2.0",
size5455872 | CLUSTER pg_stat_database_snap USI
 
> 
> Hmm.  I played around with this and couldn't reproduce it, but I
> thought of something.  What if the statement timeout is reached while
> we're in here:
> 
> #0  PathNameDeleteTemporaryDir (dirname=0x7fffffffd010 "base/pgsql_tmp/pgsql_tmp28884.31.sharedfileset") at
fd.c:1471
> #1  0x0000000000a32c77 in SharedFileSetDeleteAll (fileset=0x80182e2cc) at sharedfileset.c:177
> #2  0x0000000000a327e1 in SharedFileSetOnDetach (segment=0x80a6e62d8, datum=34385093324) at sharedfileset.c:206
> #3  0x0000000000a365ca in dsm_detach (seg=0x80a6e62d8) at dsm.c:684
> #4  0x000000000061621b in DestroyParallelContext (pcxt=0x80a708f20) at parallel.c:904
> #5  0x00000000005d97b3 in _bt_end_parallel (btleader=0x80fe9b4b0) at nbtsort.c:1473
> #6  0x00000000005d92f0 in btbuild (heap=0x80a7bc4c8, index=0x80a850a50, indexInfo=0x80fec1ab0) at nbtsort.c:340
...

> The CHECK_FOR_INTERRUPTS() inside the walkdir() loop could ereport()
> out of there after deleting some but not all of your files, but the
> code in dsm_detach() has already popped the callback (which it does
> "to avoid infinite error recursion"), so it won't run again on error
> cleanup.  Hmm.  But then... maybe the two log lines you quoted should
> be the other way around for that.

With inspired from re-reading your messages several times in rapid succession,
I was able to reproduce this easily with:

--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -3344,6 +3344,7 @@ walkdir(const char *path,
                struct stat fst;
                int                     sret;
 
+               usleep(99999);
                CHECK_FOR_INTERRUPTS();
 
On Fri, Dec 13, 2019 at 03:49:26PM +1300, Thomas Munro wrote:
> Ok, so it looks like we shouldn't be relying on the same code path for
> 'happy' and 'error' cleanup.  This could probably be fixed with a well
> placed explicit call to SharedFileSetDeleteAll() or a new function
> SharedFileSetDestroy(), and perhaps a flag in shmem to say it's been
> done so the callback doesn't do it again needlessly.  I don't think
> this problem is specific to parallel index creation.

Find below a caveman-approved patch which avoids leaving behind tmpfiles.

I'm not sure how to do this cleanly, since:
| src/include/utils/tuplesort.h: * Tuplesortstate and Sharedsort are opaque types whose details are not

Maybe we need to add a new parameter like:
| tuplesort_end(Tuplesortstate *state, bool do_delete_fileset)

Arguably, that has the benefit that existing callers *have* to confront whether
they should delete the fileset or not.  This is such a minor issue that it's
unfortunate to force a confrontation over it.

-- 
Justin

diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index efee86784b..f5b0a48d64 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -511,12 +511,17 @@ _bt_spools_heapscan(Relation heap, Relation index, BTBuildState *buildstate,
     return reltuples;
 }
 
+extern void *tuplesort_shared_fileset(Tuplesortstate*);
+
 /*
  * clean up a spool structure and its substructures.
  */
 static void
 _bt_spooldestroy(BTSpool *btspool)
 {
+    void *fileset = tuplesort_shared_fileset(btspool->sortstate);
+    if (fileset)
+        SharedFileSetDeleteAll(fileset);
     tuplesort_end(btspool->sortstate);
     pfree(btspool);
 }
@@ -1669,6 +1674,10 @@ _bt_end_parallel(BTLeader *btleader)
     /* Free last reference to MVCC snapshot, if one was used */
     if (IsMVCCSnapshot(btleader->snapshot))
         UnregisterSnapshot(btleader->snapshot);
+
+    // SharedFileSetDeleteAll(btleader->sharedsort->fileset);
+    // SharedFileSetDeleteAll(btleader->sharedsort2->fileset);
+
     DestroyParallelContext(btleader->pcxt);
     ExitParallelMode();
 }
diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index 5f6420efb2..f89d42f475 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -3344,6 +3344,7 @@ walkdir(const char *path,
         struct stat fst;
         int            sret;
 
+        usleep(99999);
         CHECK_FOR_INTERRUPTS();
 
         if (strcmp(de->d_name, ".") == 0 ||
diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index 3c49476483..3de9592b78 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -1375,6 +1375,11 @@ tuplesort_free(Tuplesortstate *state)
     MemoryContextReset(state->sortcontext);
 }
 
+void *tuplesort_shared_fileset(Tuplesortstate *state)
+{
+    return state->shared ? &state->shared->fileset : NULL;
+}
+
 /*
  * tuplesort_end
  *



pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: expose parallel leader in CSV and log_line_prefix
Next
From: Michael Paquier
Date:
Subject: Re: expose parallel leader in CSV and log_line_prefix