Re: [HACKERS] Parallel Hash take II - Mailing list pgsql-hackers
From: Thomas Munro
Subject: Re: [HACKERS] Parallel Hash take II
Date:
Msg-id: CAEepm=1zWGPpok6BKCZyzm46jz8Zy1fR9+tWcnHPMF6DKgBwiA@mail.gmail.com
In response to: Re: [HACKERS] Parallel Hash take II (Peter Geoghegan <pg@bowt.ie>)
List: pgsql-hackers

Hi Peter,

See responses to a couple of points below.  I'll respond to the other
points separately (i.e. with code/comment changes).

On Wed, Nov 8, 2017 at 10:32 AM, Peter Geoghegan <pg@bowt.ie> wrote:
> On Tue, Nov 7, 2017 at 1:01 PM, Andres Freund <andres@anarazel.de> wrote:
>> +/*
>> + * Delete a BufFile that was created by BufFileCreateShared in the given
>> + * SharedFileSet using the given name.
>> + *
>> + * It is not necessary to delete files explicitly with this function.  It is
>> + * provided only as a way to delete files proactively, rather than waiting for
>> + * the SharedFileSet to be cleaned up.
>> + *
>> + * Only one backend should attempt to delete a given name, and should know
>> + * that it exists and has been exported or closed.
>> + */
>
> This part is new to me.  We now want one backend to delete a given
> filename.  What changed?  Please provide a Message-Id reference if that
> will help me to understand.
>
> For now, I'm going to guess that this development had something to do
> with the need to deal with virtual FDs that do a close() on an FD to
> keep under backend limits.  Do I have that right?

No -- this is simply an option available to client code that wants to
clean up individual temporary files earlier.  Such client code is
responsible for meeting the synchronisation requirements described in
the comment, but it's entirely optional.

For example: SharedTuplestore is backed by multiple files, and it could
take the opportunity to delete individual files after they've been
scanned, if you told it you were going to scan only once
(SHARED_TUPLESTORE_SINGLE_PASS) and if it could prove that no other
backend could still need to read from it, as mentioned in a comment in
sts_end_parallel_scan().  However, since I changed to the page/chunk
based model (spinlock while advancing a block counter, mimicking
Parallel Seq Scan; see the sketch below) instead of the earlier Fisher
Price version (LWLock while reading each tuple, with a shared read head
maintained with tell/seek), I don't actually do that.  Hitting the end
of the file no longer means that no one else is reading from it: someone
else might still be reading from an earlier chunk even though you've
finished reading the final chunk in the file, and the vfd system means
that they must be free to close and reopen the file at any time.

In the current patch version the files are cleaned up wholesale at two
times: SharedFileSet cleanup triggered by DSM destruction, and
SharedFileSet reset triggered by rescan.  Practically, it's always the
former case.  It's vanishingly rare that you'd actually want to rescan a
Parallel Hash that spills to disk, but in that case we delete the old
files and recreate them, and that case is covered by the regression
tests.

If it bothers you that I have an API there that I'm not actually using
yet, I will remove it.
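
To illustrate the chunk-based scan idea mentioned above, here's a rough
sketch (made-up names and a stand-alone struct, not the actual
SharedTuplestore code): each backend holds the spinlock just long enough
to claim the next run of blocks, and then reads those blocks without
holding anything.

    #include "postgres.h"
    #include "storage/block.h"
    #include "storage/spin.h"

    /* Hypothetical shared scan state; the real thing lives in shared memory. */
    typedef struct SketchSharedScan
    {
        slock_t     mutex;          /* protects read_block only */
        BlockNumber read_block;     /* next block to hand out */
        BlockNumber nblocks;        /* number of blocks written to the file */
    } SketchSharedScan;

    /*
     * Claim the next chunk of blocks for the calling backend, or return
     * false if the scan is exhausted.  Only the block counter is covered
     * by the spinlock; actually reading the claimed blocks happens with no
     * lock held, which is why another backend can still be reading an
     * earlier chunk after someone has claimed, or even finished, the final
     * one.
     */
    static bool
    sketch_claim_chunk(SketchSharedScan *scan, BlockNumber chunk_size,
                       BlockNumber *first, BlockNumber *count)
    {
        bool        claimed = false;

        SpinLockAcquire(&scan->mutex);
        if (scan->read_block < scan->nblocks)
        {
            *first = scan->read_block;
            *count = Min(chunk_size, scan->nblocks - scan->read_block);
            scan->read_block += *count;
            claimed = true;
        }
        SpinLockRelease(&scan->mutex);

        return claimed;
    }

Compare the earlier version, which had to hold an LWLock around every
tuple read so that the shared read head (maintained with tell/seek)
stayed consistent.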
>> +   if (vfdP->fdstate & FD_TEMP_FILE_LIMIT)
>> +   {
>> +       /* Subtract its size from current usage (do first in case of error) */
>> +       temporary_files_size -= vfdP->fileSize;
>> +       vfdP->fileSize = 0;
>> +   }
>>
>> So, is it right to do so unconditionally and without regard for errors?
>> If the file isn't deleted, it shouldn't be subtracted from fileSize.  I
>> guess you're managing that through the flag, but that's not entirely
>> obvious.
>
> I think that the problem here is that the accounting is expected to
> always work.  It's not like there is a resowner style error path in
> which temporary_files_size gets reset to 0.

But there is a resowner error path in which File handles get
automatically closed and temporary_files_size gets adjusted.  The
counter goes up when you create, and goes down when you close or the
resowner closes for you.  Eventually you either close and the
bookkeeping is consistent, or you crash and it doesn't matter.  And some
kind of freak multiple close attempt is guarded against by setting the
file's size to 0, so we can't double-subtract.  Do you see a bug?

None of this has any impact on whether files are leaked: either
SharedFileSet removes the files, or you crash (or take a filesystem
snapshot, etc.) and RemovePgTempFiles() mops them up at the next clean
startup.
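
For what it's worth, the bookkeeping boils down to something like this
(a simplified sketch with a stand-in struct and a made-up flag value,
not the actual fd.c code):

    #include "postgres.h"

    #define SKETCH_FD_TEMP_FILE_LIMIT   (1 << 2)    /* arbitrary bit, for illustration */

    /* Stand-in for the parts of fd.c's Vfd that matter here. */
    typedef struct SketchVfd
    {
        int         fdstate;        /* flag bits */
        off_t       fileSize;       /* bytes currently charged to this file */
    } SketchVfd;

    /* Per-backend total charged against the temp file limit. */
    static uint64 temporary_files_size = 0;

    /*
     * Called when a temp file grows: charge only the delta and remember
     * the new size, so that later adjustments stay consistent.
     */
    static void
    sketch_report_temp_file_growth(SketchVfd *vfdP, off_t new_size)
    {
        if (vfdP->fdstate & SKETCH_FD_TEMP_FILE_LIMIT)
        {
            temporary_files_size += new_size - vfdP->fileSize;
            vfdP->fileSize = new_size;
        }
    }

    /*
     * Called when the file goes away, whether by explicit close/delete or
     * by resource owner cleanup.  Zeroing fileSize immediately means a
     * second call for the same entry subtracts nothing, so overlapping
     * cleanup paths can't double-subtract from the counter.
     */
    static void
    sketch_forget_temp_file(SketchVfd *vfdP)
    {
        if (vfdP->fdstate & SKETCH_FD_TEMP_FILE_LIMIT)
        {
            temporary_files_size -= vfdP->fileSize;
            vfdP->fileSize = 0;
        }
    }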
--
Thomas Munro
http://www.enterprisedb.com