Re: shared tempfile was not removed on statement_timeout (unreproducible) - Mailing list pgsql-hackers
From | Thomas Munro |
---|---|
Subject | Re: shared tempfile was not removed on statement_timeout (unreproducible) |
Date | |
Msg-id | CA+hUKGJStr-3B6qNnFEOpES8HHc3Wwe3wSrYYQJcQhHuTB9SdQ@mail.gmail.com |
In response to | shared tempfile was not removed on statement_timeout (unreproducible) (Justin Pryzby <pryzby@telsasoft.com>) |
Responses |
Re: shared tempfile was not removed on statement_timeout(unreproducible)
Re: shared tempfile was not removed on statement_timeout |
List | pgsql-hackers |
On Fri, Dec 13, 2019 at 7:05 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> I have a nagios check on ancient tempfiles, intended to catch debris left by
> crashed processes. But triggered on this file:
>
> $ sudo find /var/lib/pgsql/12/data/base/pgsql_tmp -ls
> 142977 4 drwxr-x--- 3 postgres postgres 4096 Dec 12 11:32 /var/lib/pgsql/12/data/base/pgsql_tmp
> 169868 4 drwxr-x--- 2 postgres postgres 4096 Dec 7 01:35 /var/lib/pgsql/12/data/base/pgsql_tmp/pgsql_tmp11025.0.sharedfileset
> 169347 5492 -rw-r----- 1 postgres postgres 5619712 Dec 7 01:35 /var/lib/pgsql/12/data/base/pgsql_tmp/pgsql_tmp11025.0.sharedfileset/0.0
> 169346 5380 -rw-r----- 1 postgres postgres 5505024 Dec 7 01:35 /var/lib/pgsql/12/data/base/pgsql_tmp/pgsql_tmp11025.0.sharedfileset/1.0
>
> I found:
> 2019-12-07 01:35:56 | 11025 | postgres | canceling statement due to statement timeout | CLUSTER pg_stat_database_snap USI
> 2019-12-07 01:35:56 | 11025 | postgres | temporary file: path "base/pgsql_tmp/pgsql_tmp11025.0.sharedfileset/2.0", size5455872 | CLUSTER pg_stat_database_snap USI

Hmm. I played around with this and couldn't reproduce it, but I
thought of something. What if the statement timeout is reached while
we're in here:

#0 PathNameDeleteTemporaryDir (dirname=0x7fffffffd010 "base/pgsql_tmp/pgsql_tmp28884.31.sharedfileset") at fd.c:1471
#1 0x0000000000a32c77 in SharedFileSetDeleteAll (fileset=0x80182e2cc) at sharedfileset.c:177
#2 0x0000000000a327e1 in SharedFileSetOnDetach (segment=0x80a6e62d8, datum=34385093324) at sharedfileset.c:206
#3 0x0000000000a365ca in dsm_detach (seg=0x80a6e62d8) at dsm.c:684
#4 0x000000000061621b in DestroyParallelContext (pcxt=0x80a708f20) at parallel.c:904
#5 0x00000000005d97b3 in _bt_end_parallel (btleader=0x80fe9b4b0) at nbtsort.c:1473
#6 0x00000000005d92f0 in btbuild (heap=0x80a7bc4c8, index=0x80a850a50, indexInfo=0x80fec1ab0) at nbtsort.c:340
#7 0x000000000067445b in index_build (heapRelation=0x80a7bc4c8, indexRelation=0x80a850a50, indexInfo=0x80fec1ab0, isreindex=true, parallel=true) at index.c:2963
#8 0x0000000000677bd3 in reindex_index (indexId=16532, skip_constraint_checks=true, persistence=112 'p', options=0) at index.c:3591
#9 0x0000000000678402 in reindex_relation (relid=16508, flags=18, options=0) at index.c:3807
#10 0x000000000073928f in finish_heap_swap (OIDOldHeap=16508, OIDNewHeap=16573, is_system_catalog=false, swap_toast_by_content=false, check_constraints=false, is_internal=true, frozenXid=604, cutoffMulti=1, newrelpersistence=112 'p') at cluster.c:1409
#11 0x00000000007389ab in rebuild_relation (OldHeap=0x80a7bc4c8, indexOid=16532, verbose=false) at cluster.c:622
#12 0x000000000073849e in cluster_rel (tableOid=16508, indexOid=16532, options=0) at cluster.c:428
#13 0x0000000000737f22 in cluster (stmt=0x800cfcbf0, isTopLevel=true) at cluster.c:185
#14 0x0000000000a7cc5c in standard_ProcessUtility (pstmt=0x800cfcf40, queryString=0x800cfc120 "cluster t USING t_i_idx ;", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x800cfd030, completionTag=0x7fffffffe0b0 "") at utility.c:654

The CHECK_FOR_INTERRUPTS() inside the walkdir() loop could ereport()
out of there after deleting some but not all of your files, but the
code in dsm_detach() has already popped the callback (which it does
"to avoid infinite error recursion"), so it won't run again on error
cleanup. Hmm. But then... maybe the two log lines you quoted should
be the other way around for that.

> Actually, I tried using pg_ls_tmpdir(), but it unconditionally masks
> non-regular files and thus shared filesets. Maybe that's worth discussion on a
> new thread ?
>
> src/backend/utils/adt/genfile.c
>         /* Ignore anything but regular files */
>         if (!S_ISREG(attrib.st_mode))
>                 continue;

+1, that's worth fixing.
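
To make the dsm_detach() hypothesis above easier to follow, here is a minimal standalone C simulation of it. This is only a sketch and not PostgreSQL source: every name in it (delete_all_files, detach, files_left_after_timeout, and so on) is a hypothetical stand-in, setjmp()/longjmp() stand in for ereport(ERROR) unwinding, and a simple counter stands in for CHECK_FOR_INTERRUPTS() noticing the statement timeout. It assumes only what is described above: the callback is popped before it is called, so an error raised part-way through the callback leaves nothing to re-run during error cleanup.

```c
#include <setjmp.h>
#include <stdbool.h>
#include <stddef.h>

/* Everything here is a hypothetical stand-in, not PostgreSQL source. */
#define NFILES 3

typedef void (*detach_callback) (void);

static jmp_buf error_context;      /* where the simulated ereport(ERROR) lands */
static bool file_exists[NFILES];   /* the simulated shared fileset */
static detach_callback on_detach;  /* one-slot "callback list" */
static int deleted_count;
static int timeout_after;          /* fire after this many deletions; 0 = never */

/* Stand-in for the callback that deletes the shared fileset. */
static void
delete_all_files(void)
{
    for (int i = 0; i < NFILES; i++)
    {
        file_exists[i] = false;
        deleted_count++;
        /* Stand-in for CHECK_FOR_INTERRUPTS() seeing the timeout. */
        if (deleted_count == timeout_after)
            longjmp(error_context, 1);
    }
}

/* Stand-in for detach: the callback is popped before it is called. */
static void
detach(void)
{
    detach_callback cb = on_detach;

    on_detach = NULL;
    if (cb != NULL)
        cb();
}

/* Returns how many files survive a detach interrupted by the timeout. */
int
files_left_after_timeout(int fire_after)
{
    int left = 0;

    for (int i = 0; i < NFILES; i++)
        file_exists[i] = true;
    on_detach = delete_all_files;
    deleted_count = 0;
    timeout_after = fire_after;

    if (setjmp(error_context) != 0)
        detach();   /* error cleanup: callback already popped, nothing runs */
    else
        detach();   /* normal path: may be interrupted part-way */

    for (int i = 0; i < NFILES; i++)
        if (file_exists[i])
            left++;
    return left;
}
```

With these stand-ins, files_left_after_timeout(1) leaves two of the three simulated files behind, while files_left_after_timeout(0) (no timeout) leaves none. Whether the real code path behaves this way is exactly the open question in the message above.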