Thread: shared tempfile was not removed on statement_timeout (unreproducible)
I have a nagios check on ancient tempfiles, intended to catch debris left by
crashed processes. But it triggered on this file:

$ sudo find /var/lib/pgsql/12/data/base/pgsql_tmp -ls
142977    4 drwxr-x---  3 postgres postgres    4096 Dec 12 11:32 /var/lib/pgsql/12/data/base/pgsql_tmp
169868    4 drwxr-x---  2 postgres postgres    4096 Dec  7 01:35 /var/lib/pgsql/12/data/base/pgsql_tmp/pgsql_tmp11025.0.sharedfileset
169347 5492 -rw-r-----  1 postgres postgres 5619712 Dec  7 01:35 /var/lib/pgsql/12/data/base/pgsql_tmp/pgsql_tmp11025.0.sharedfileset/0.0
169346 5380 -rw-r-----  1 postgres postgres 5505024 Dec  7 01:35 /var/lib/pgsql/12/data/base/pgsql_tmp/pgsql_tmp11025.0.sharedfileset/1.0

I found:

2019-12-07 01:35:56 | 11025 | postgres | canceling statement due to statement timeout | CLUSTER pg_stat_database_snap USI
2019-12-07 01:35:56 | 11025 | postgres | temporary file: path "base/pgsql_tmp/pgsql_tmp11025.0.sharedfileset/2.0", size 5455872 | CLUSTER pg_stat_database_snap USI

I don't have a saved log entry for sharedfileset/0.0 - that may be significant,
or it may be a bug in my log-archiving script.

The process has not crashed since it started:

postgres 18145     1 0 Nov18 ? 00:51:39 /usr/pgsql-12/bin/postmaster -D /var/lib/pgsql/12/data
postgres 18147 18145 0 Nov18 ? 00:00:53 postgres: logger
postgres 18149 18145 0 Nov18 ? 00:38:42 postgres: checkpointer

version | PostgreSQL 12.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit

Just to be sure:

$ ps -wwf 11025 || echo not running
UID PID PPID C STIME TTY STAT TIME CMD
not running

I wasn't able to reproduce it like this:

PGOPTIONS='-c maintenance_work_mem=128MB -c client_min_messages=debug' psql postgres -c 'CREATE TABLE t (i int unique); INSERT INTO t SELECT generate_series(1,999999)' -c 'SET statement_timeout=4999' -c 'CLUSTER t USING t_i_key'

Actually, I tried using pg_ls_tmpdir(), but it unconditionally masks
non-regular files and thus shared filesets. Maybe that's worth discussion on a
new thread?
src/backend/utils/adt/genfile.c:

		/* Ignore anything but regular files */
		if (!S_ISREG(attrib.st_mode))
			continue;

BTW, there are no other tablespaces.

Justin
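For reference, a check like the one described above can be sketched in a few lines of shell. This is a hypothetical reconstruction, not the actual nagios plugin (whose path and age threshold are not shown in the thread); it is demonstrated against a scratch directory so it is self-contained.

```shell
#!/bin/sh
# Hypothetical sketch of an "ancient tempfiles" check: flag anything under
# pgsql_tmp older than 2 days as possible debris from a crash or an
# interrupted cleanup.  Uses a scratch directory for demonstration.
PGSQL_TMP=$(mktemp -d)/pgsql_tmp
mkdir -p "$PGSQL_TMP/pgsql_tmp11025.0.sharedfileset"

# Simulate week-old debris (GNU touch -d creates the file with that mtime):
touch -d '7 days ago' "$PGSQL_TMP/pgsql_tmp11025.0.sharedfileset/0.0"

# Anything older than 2880 minutes (2 days) is suspicious:
old=$(find "$PGSQL_TMP" -type f -mmin +2880)
if [ -n "$old" ]; then
	echo "WARNING: ancient temp files: $old"
else
	echo "OK"
fi
```

In production the path would be `$PGDATA/base/pgsql_tmp` (and any tablespace `pgsql_tmp` directories), with the threshold tuned to exceed the longest-running query.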
On Fri, Dec 13, 2019 at 7:05 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> I have a nagios check on ancient tempfiles, intended to catch debris left by
> crashed processes. But it triggered on this file:
>
> $ sudo find /var/lib/pgsql/12/data/base/pgsql_tmp -ls
> 142977    4 drwxr-x---  3 postgres postgres    4096 Dec 12 11:32 /var/lib/pgsql/12/data/base/pgsql_tmp
> 169868    4 drwxr-x---  2 postgres postgres    4096 Dec  7 01:35 /var/lib/pgsql/12/data/base/pgsql_tmp/pgsql_tmp11025.0.sharedfileset
> 169347 5492 -rw-r-----  1 postgres postgres 5619712 Dec  7 01:35 /var/lib/pgsql/12/data/base/pgsql_tmp/pgsql_tmp11025.0.sharedfileset/0.0
> 169346 5380 -rw-r-----  1 postgres postgres 5505024 Dec  7 01:35 /var/lib/pgsql/12/data/base/pgsql_tmp/pgsql_tmp11025.0.sharedfileset/1.0
>
> I found:
>
> 2019-12-07 01:35:56 | 11025 | postgres | canceling statement due to statement timeout | CLUSTER pg_stat_database_snap USI
> 2019-12-07 01:35:56 | 11025 | postgres | temporary file: path "base/pgsql_tmp/pgsql_tmp11025.0.sharedfileset/2.0", size 5455872 | CLUSTER pg_stat_database_snap USI

Hmm. I played around with this and couldn't reproduce it, but I
thought of something.
What if the statement timeout is reached while we're in here:

#0  PathNameDeleteTemporaryDir (dirname=0x7fffffffd010 "base/pgsql_tmp/pgsql_tmp28884.31.sharedfileset") at fd.c:1471
#1  0x0000000000a32c77 in SharedFileSetDeleteAll (fileset=0x80182e2cc) at sharedfileset.c:177
#2  0x0000000000a327e1 in SharedFileSetOnDetach (segment=0x80a6e62d8, datum=34385093324) at sharedfileset.c:206
#3  0x0000000000a365ca in dsm_detach (seg=0x80a6e62d8) at dsm.c:684
#4  0x000000000061621b in DestroyParallelContext (pcxt=0x80a708f20) at parallel.c:904
#5  0x00000000005d97b3 in _bt_end_parallel (btleader=0x80fe9b4b0) at nbtsort.c:1473
#6  0x00000000005d92f0 in btbuild (heap=0x80a7bc4c8, index=0x80a850a50, indexInfo=0x80fec1ab0) at nbtsort.c:340
#7  0x000000000067445b in index_build (heapRelation=0x80a7bc4c8, indexRelation=0x80a850a50, indexInfo=0x80fec1ab0, isreindex=true, parallel=true) at index.c:2963
#8  0x0000000000677bd3 in reindex_index (indexId=16532, skip_constraint_checks=true, persistence=112 'p', options=0) at index.c:3591
#9  0x0000000000678402 in reindex_relation (relid=16508, flags=18, options=0) at index.c:3807
#10 0x000000000073928f in finish_heap_swap (OIDOldHeap=16508, OIDNewHeap=16573, is_system_catalog=false, swap_toast_by_content=false, check_constraints=false, is_internal=true, frozenXid=604, cutoffMulti=1, newrelpersistence=112 'p') at cluster.c:1409
#11 0x00000000007389ab in rebuild_relation (OldHeap=0x80a7bc4c8, indexOid=16532, verbose=false) at cluster.c:622
#12 0x000000000073849e in cluster_rel (tableOid=16508, indexOid=16532, options=0) at cluster.c:428
#13 0x0000000000737f22 in cluster (stmt=0x800cfcbf0, isTopLevel=true) at cluster.c:185
#14 0x0000000000a7cc5c in standard_ProcessUtility (pstmt=0x800cfcf40, queryString=0x800cfc120 "cluster t USING t_i_idx ;", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x800cfd030, completionTag=0x7fffffffe0b0 "") at utility.c:654

The CHECK_FOR_INTERRUPTS() inside the walkdir() loop could ereport()
out of there after deleting some but not all of your files, but the
code in dsm_detach() has already popped the callback (which it does
"to avoid infinite error recursion"), so it won't run again on error
cleanup. Hmm. But then... maybe the two log lines you quoted should
be the other way around for that.

> Actually, I tried using pg_ls_tmpdir(), but it unconditionally masks
> non-regular files and thus shared filesets. Maybe that's worth discussion on a
> new thread?
>
> src/backend/utils/adt/genfile.c
>		/* Ignore anything but regular files */
>		if (!S_ISREG(attrib.st_mode))
>			continue;

+1, that's worth fixing.
On Fri, Dec 13, 2019 at 03:03:47PM +1300, Thomas Munro wrote:
> On Fri, Dec 13, 2019 at 7:05 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > I found:
> >
> > 2019-12-07 01:35:56 | 11025 | postgres | canceling statement due to statement timeout | CLUSTER pg_stat_database_snap USI
> > 2019-12-07 01:35:56 | 11025 | postgres | temporary file: path "base/pgsql_tmp/pgsql_tmp11025.0.sharedfileset/2.0", size 5455872 | CLUSTER pg_stat_database_snap USI
>
> Hmm. But then... maybe the two log lines you quoted should
> be the other way around for that.

And it's actually the other way around, when I ORDER BY something better than
left(log_time::text,19):

postgres=# SELECT log_time, session_line ln, pid, database db, left(message,99), left(query,33)
           FROM jrn_postgres_log
           WHERE user_name IS NOT NULL AND log_time BETWEEN '2019-12-07 01:35' AND '2019-12-07 01:36'
           ORDER BY 1,2;
2019-12-07 01:35:56.626-06 | 1 | 11025 | postgres | temporary file: path "base/pgsql_tmp/pgsql_tmp11025.0.sharedfileset/2.0", size 5455872 | CLUSTER pg_stat_database_snap USI
2019-12-07 01:35:56.626-06 | 2 | 11025 | postgres | canceling statement due to statement timeout | CLUSTER pg_stat_database_snap USI
On Fri, Dec 13, 2019 at 3:13 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> On Fri, Dec 13, 2019 at 03:03:47PM +1300, Thomas Munro wrote:
> > Hmm. But then... maybe the two log lines you quoted should
> > be the other way around for that.
>
> And it's actually the other way around, when I ORDER BY something better than
> left(log_time::text,19).

Hah. Ok, so it looks like we shouldn't be relying on the same code path for
'happy' and 'error' cleanup. This could probably be fixed with a well
placed explicit call to SharedFileSetDeleteAll() or a new function
SharedFileSetDestroy(), and perhaps a flag in shmem to say it's been
done so the callback doesn't do it again needlessly. I don't think
this problem is specific to parallel index creation.
On Fri, Dec 13, 2019 at 03:03:47PM +1300, Thomas Munro wrote:
> Hmm. I played around with this and couldn't reproduce it, but I
> thought of something. What if the statement timeout is reached while
> we're in here:
>
> #0  PathNameDeleteTemporaryDir (dirname=0x7fffffffd010 "base/pgsql_tmp/pgsql_tmp28884.31.sharedfileset") at fd.c:1471
> #1  0x0000000000a32c77 in SharedFileSetDeleteAll (fileset=0x80182e2cc) at sharedfileset.c:177
> #2  0x0000000000a327e1 in SharedFileSetOnDetach (segment=0x80a6e62d8, datum=34385093324) at sharedfileset.c:206
> #3  0x0000000000a365ca in dsm_detach (seg=0x80a6e62d8) at dsm.c:684
> #4  0x000000000061621b in DestroyParallelContext (pcxt=0x80a708f20) at parallel.c:904
> #5  0x00000000005d97b3 in _bt_end_parallel (btleader=0x80fe9b4b0) at nbtsort.c:1473
> #6  0x00000000005d92f0 in btbuild (heap=0x80a7bc4c8, index=0x80a850a50, indexInfo=0x80fec1ab0) at nbtsort.c:340
> ...
> The CHECK_FOR_INTERRUPTS() inside the walkdir() loop could ereport()
> out of there after deleting some but not all of your files, but the
> code in dsm_detach() has already popped the callback (which it does
> "to avoid infinite error recursion"), so it won't run again on error
> cleanup. Hmm. But then... maybe the two log lines you quoted should
> be the other way around for that.

With inspiration from re-reading your messages several times in rapid
succession, I was able to reproduce this easily with:

--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -3344,6 +3344,7 @@ walkdir(const char *path,
 		struct stat fst;
 		int			sret;
 
+		usleep(99999);
 		CHECK_FOR_INTERRUPTS();

On Fri, Dec 13, 2019 at 03:49:26PM +1300, Thomas Munro wrote:
> Ok, so it looks like we shouldn't be relying on the same code path for
> 'happy' and 'error' cleanup. This could probably be fixed with a well
> placed explicit call to SharedFileSetDeleteAll() or a new function
> SharedFileSetDestroy(), and perhaps a flag in shmem to say it's been
> done so the callback doesn't do it again needlessly. I don't think
> this problem is specific to parallel index creation.

Find below a caveman-approved patch which avoids leaving behind tmpfiles.
I'm not sure how to do this cleanly, since:

| src/include/utils/tuplesort.h: * Tuplesortstate and Sharedsort are opaque types whose details are not

Maybe we need to add a new parameter like:

| tuplesort_end(Tuplesortstate *state, bool do_delete_fileset)

Arguably, that has the benefit that existing callers *have* to confront whether
they should delete the fileset or not. This is such a minor issue that it's
unfortunate to force a confrontation over it.
-- Justin

diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index efee86784b..f5b0a48d64 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -511,12 +511,17 @@ _bt_spools_heapscan(Relation heap, Relation index, BTBuildState *buildstate,
 	return reltuples;
 }
 
+extern void *tuplesort_shared_fileset(Tuplesortstate *);
+
 /*
  * clean up a spool structure and its substructures.
  */
 static void
 _bt_spooldestroy(BTSpool *btspool)
 {
+	void *fileset = tuplesort_shared_fileset(btspool->sortstate);
+	if (fileset)
+		SharedFileSetDeleteAll(fileset);
 	tuplesort_end(btspool->sortstate);
 	pfree(btspool);
 }
@@ -1669,6 +1674,10 @@ _bt_end_parallel(BTLeader *btleader)
 	/* Free last reference to MVCC snapshot, if one was used */
 	if (IsMVCCSnapshot(btleader->snapshot))
 		UnregisterSnapshot(btleader->snapshot);
+
+	// SharedFileSetDeleteAll(btleader->sharedsort->fileset);
+	// SharedFileSetDeleteAll(btleader->sharedsort2->fileset);
+
 	DestroyParallelContext(btleader->pcxt);
 	ExitParallelMode();
 }
diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index 5f6420efb2..f89d42f475 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -3344,6 +3344,7 @@ walkdir(const char *path,
 		struct stat fst;
 		int			sret;
 
+		usleep(99999);
 		CHECK_FOR_INTERRUPTS();
 
 		if (strcmp(de->d_name, ".") == 0 ||
diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index 3c49476483..3de9592b78 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -1375,6 +1375,11 @@ tuplesort_free(Tuplesortstate *state)
 	MemoryContextReset(state->sortcontext);
 }
 
+void *
+tuplesort_shared_fileset(Tuplesortstate *state)
+{
+	return state->shared ? &state->shared->fileset : NULL;
+}
+
 /*
  * tuplesort_end
  *
On Tue, Jul 21, 2020 at 4:33 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> /*
>  * clean up a spool structure and its substructures.
>  */
> static void
> _bt_spooldestroy(BTSpool *btspool)
> {
> +	void *fileset = tuplesort_shared_fileset(btspool->sortstate);
> +	if (fileset)
> +		SharedFileSetDeleteAll(fileset);
>  	tuplesort_end(btspool->sortstate);
>  	pfree(btspool);
> }

Why can't tuplesort_end do it?
On Mon, Jul 27, 2020 at 08:00:46PM +1200, Thomas Munro wrote:
> On Tue, Jul 21, 2020 at 4:33 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > static void
> > _bt_spooldestroy(BTSpool *btspool)
> > {
> > +	void *fileset = tuplesort_shared_fileset(btspool->sortstate);
> > +	if (fileset)
> > +		SharedFileSetDeleteAll(fileset);
> >  	tuplesort_end(btspool->sortstate);
> >  	pfree(btspool);
> > }
>
> Why can't tuplesort_end do it?

Because then I think the parallel workers remove their own files, with tests
failing like:

+ERROR:  could not open temporary file "0.0" from BufFile "0": No such file or directory

I looked around a bit more and came up with this, which works, but I don't know
enough to say if it's right.

diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index 5f6420efb2..f89d42f475 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -3344,6 +3344,7 @@ walkdir(const char *path,
 		struct stat fst;
 		int			sret;
 
+		usleep(99999);
 		CHECK_FOR_INTERRUPTS();
 
 		if (strcmp(de->d_name, ".") == 0 ||
diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index 3c49476483..c6e5e6d00b 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -1387,6 +1387,9 @@ tuplesort_free(Tuplesortstate *state)
 void
 tuplesort_end(Tuplesortstate *state)
 {
+	if (state->shared && state->shared->workersFinished == state->nParticipants)
+		SharedFileSetDeleteAll(&state->shared->fileset);
+
 	tuplesort_free(state);
 
 	/*
On Mon, Jul 27, 2020 at 05:39:02AM -0500, Justin Pryzby wrote:
> On Mon, Jul 27, 2020 at 08:00:46PM +1200, Thomas Munro wrote:
> > Why can't tuplesort_end do it?
>
> Because then I think the parallel workers remove their own files, with tests
> failing like:
>
> +ERROR:  could not open temporary file "0.0" from BufFile "0": No such file or directory
>
> I looked around a bit more and came up with this, which works, but I don't know
> enough to say if it's right.

I convinced myself this is right, since state->nParticipants == -1 for workers.
Only the leader should do the cleanup.

Added here:
https://commitfest.postgresql.org/29/2657/

-- Justin
Attachment
On Wed, 29 Jul 2020 at 10:37, Justin Pryzby <pryzby@telsasoft.com> wrote:
> I convinced myself this is right, since state->nParticipants == -1 for workers.
> Only the leader should do the cleanup.
>
> Added here:
> https://commitfest.postgresql.org/29/2657/

I've also investigated this issue. As Thomas mentioned before, this
problem is not specific to parallel index creation. Shared temporary
files could be left behind if the process is interrupted while deleting
them as part of detaching the dsm segment.

To fix this issue, possible solutions would be:

1. Like the current patch, we call SharedFileSetDeleteAll() before
DestroyParallelContext(), which calls dsm_detach(), so that we make
sure to delete these files without relying on the on_dsm_detach
callback. That way, even if the process is interrupted during that
cleanup, it will clean these files again during transaction abort
(AtEOXact_Parallel() calls dsm_detach()). OTOH, a downside would be
that we end up setting a rule that we need to explicitly call
SharedFileSetDeleteAll().

2. We don't use the on_dsm_detach callback to delete the shared file set.
Instead, I wonder if we can delete the files at the end of the transaction
by using the ResourceOwner mechanism, like we do for non-shared temporary
file cleanup. This idea doesn't have the cons that idea #1 has. OTOH,
the lifetime of the shared file set changes from the parallel
context to the transaction, leading to keeping many temporary files until
the transaction ends.
Also, we would need to rework the handling of the shared file set.

3. We use on_dsm_detach as well as an on_proc_exit callback to delete the
shared file set. This doesn't resolve the root cause, but that way, even
if the process didn't delete it when destroying the parallel context, we
can make sure to delete it on process exit.

I think #1 is suitable for back branches. For HEAD, I think #2 and #3
would be better in terms of not setting an implicit rule. Thoughts?

Regards,

--
Masahiko Sawada
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
At Wed, 28 Oct 2020 19:03:14 +0900, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote in
> On Wed, 29 Jul 2020 at 10:37, Justin Pryzby <pryzby@telsasoft.com> wrote:
> > I convinced myself this is right, since state->nParticipants == -1 for workers.
> > Only the leader should do the cleanup.
> >
> > Added here:
> > https://commitfest.postgresql.org/29/2657/

+	if (state->shared && state->shared->workersFinished == state->nParticipants)

Isn't it more straightforward to check "state->shared && state->worker == -1"?

> 1. Like the current patch, we call SharedFileSetDeleteAll() before
> DestroyParallelContext() which calls dsm_detach() so that we can make
> sure to delete these files while not relying on on_dsm_detach
> callback. ...
> OTOH a downside would be that we will end up setting a rule that we
> need to explicitly call SharedFileSetDeleteAll().

That seems to be common. We release locks explicitly, but they are
automatically released on error.
Of course it is slightly different in that SharedFileSetOnDetach
unconditionally removes the directory, but that doesn't matter as long as
that behavior doesn't lead to an error. We can skip the cleanup, as Thomas
suggested, when it is not necessary.

Looking at the comment on SharedFileSetOnDetach:

| * everything in them.  We can't raise an error on failures, because this runs
| * in error cleanup paths.

I feel that a function that shouldn't error-out also shouldn't be
cancellable. If that's the case, we can omit the CHECK_FOR_INTERRUPTS() in
walkdir() when elevel is smaller than ERROR.

=====
diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index b58502837a..593c23553e 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -3374,7 +3374,9 @@ walkdir(const char *path,
 	{
 		char		subpath[MAXPGPATH * 2];
 
-		CHECK_FOR_INTERRUPTS();
+		/* omit interrupts while we shouldn't error-out */
+		if (elevel >= ERROR)
+			CHECK_FOR_INTERRUPTS();
 
 		if (strcmp(de->d_name, ".") == 0 ||
 			strcmp(de->d_name, "..") == 0)
=====

> 2. We don't use on_dsm_detach callback to delete the shared file set.
> Instead, I wonder if we can delete them at the end of the transaction
> by using ResourceOwner mechanism, like we do for non-shared temporary
> files cleanup. ...
>
> 3. We use on_dsm_detach as well as on_proc_exit callback to delete the
> shared file set. It doesn't resolve the root cause but that way, even
> if the process didn't delete it on destroying the parallel context, we
> can make sure to delete it on process exit.
>
> I think #1 is suitable for back branches. For HEAD, I think #2 and #3
> would be better in terms of not setting an implicit rule. Thoughts?
As long as we allow dsm_detach to be canceled, the problem persists
whatever else we do. So #2 and #3 seem a bit too much. It seems to me
that #1 plus omitting the CHECK_FOR_INTERRUPTS() is suitable for all
branches.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
On 26/01/2021 06:46, Kyotaro Horiguchi wrote:
> Looking at the comment on SharedFileSetOnDetach:
>
> | * everything in them.  We can't raise an error on failures, because this runs
> | * in error cleanup paths.
>
> I feel that a function that shouldn't error-out also shouldn't be
> cancellable. If that's the case, we can omit the CHECK_FOR_INTERRUPTS() in
> walkdir() when elevel is smaller than ERROR.
>
> =====
> diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
> index b58502837a..593c23553e 100644
> --- a/src/backend/storage/file/fd.c
> +++ b/src/backend/storage/file/fd.c
> @@ -3374,7 +3374,9 @@ walkdir(const char *path,
>  	{
>  		char		subpath[MAXPGPATH * 2];
>  
> -		CHECK_FOR_INTERRUPTS();
> +		/* omit interrupts while we shouldn't error-out */
> +		if (elevel >= ERROR)
> +			CHECK_FOR_INTERRUPTS();
>  
>  		if (strcmp(de->d_name, ".") == 0 ||
>  			strcmp(de->d_name, "..") == 0)
> =====

Don't we potentially have the same problem with all on_dsm_detach
callbacks? Looking at the other on_dsm_detach callbacks, I don't see
any CHECK_FOR_INTERRUPTS() calls in them, but it seems fragile to
assume that.

I'd suggest adding HOLD/RESUME_INTERRUPTS() to dsm_detach(). At least
around the removal of the callback from the list and calling the
callback. Maybe even over the whole function.

- Heikki
At Tue, 26 Jan 2021 11:00:56 +0200, Heikki Linnakangas <hlinnaka@iki.fi> wrote in
> Don't we potentially have the same problem with all on_dsm_detach
> callbacks? Looking at the other on_dsm_detach callbacks, I don't see
> any CHECK_FOR_INTERRUPTS() calls in them, but it seems fragile to
> assume that.
>
> I'd suggest adding HOLD/RESUME_INTERRUPTS() to dsm_detach(). At least
> around the removal of the callback from the list and calling the
> callback. Maybe even over the whole function.

Yes, I first came up with HOLD/RESUME_QUERY_INTERRUPTS() at the same
location.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
On Wed, Jan 27, 2021 at 12:22 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
> At Tue, 26 Jan 2021 11:00:56 +0200, Heikki Linnakangas <hlinnaka@iki.fi> wrote in
> > I'd suggest adding HOLD/RESUME_INTERRUPTS() to dsm_detach(). At least
> > around the removal of the callback from the list and calling the
> > callback. Maybe even over the whole function.
>
> Yes, I first came up with HOLD/RESUME_QUERY_INTERRUPTS() at the same
> location.

+1, this seems like a good idea. This is a little bit like the code
near the comments "Don't joggle the elbow of proc_exit".
On Wed, Jan 27, 2021 at 9:34 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> +1, this seems like a good idea. This is a little bit like the code
> near the comments "Don't joggle the elbow of proc_exit".

So that gives a very simple back-patchable patch.
Attachment
Thomas Munro <thomas.munro@gmail.com> writes:
>> +1, this seems like a good idea. This is a little bit like the code
>> near the comments "Don't joggle the elbow of proc_exit".

> So that gives a very simple back-patchable patch.

Hmm, so is the *rest* of that function perfectly okay with being
interrupted?

			regards, tom lane
On Sun, Jan 31, 2021 at 6:07 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
> > So that gives a very simple back-patchable patch.
>
> Hmm, so is the *rest* of that function perfectly okay with being
> interrupted?

It looks OK to me. There aren't any CFI()s in there.
On Fri, Feb 5, 2021 at 5:47 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Sun, Jan 31, 2021 at 6:07 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Hmm, so is the *rest* of that function perfectly okay with being
> > interrupted?
>
> It looks OK to me. There aren't any CFI()s in there.

Pushed. That closes CF #2657.