From: Peter Geoghegan
Subject: Re: [HACKERS] Parallel Hash take II
Msg-id: CAH2-Wz=aAE6nSYOjjMnBWnAJ+81UsWvn7-6i=t+aqYC6JRfd6Q@mail.gmail.com
In response to: Re: [HACKERS] Parallel Hash take II (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
On Tue, Nov 7, 2017 at 1:01 PM, Andres Freund <andres@anarazel.de> wrote:
> +/*
> + * Build the name for a given segment of a given BufFile.
> + */
> +static void
> +MakeSharedSegmentName(char *name, const char *buffile_name, int segment)
> +{
> +       snprintf(name, MAXPGPATH, "%s.%d", buffile_name, segment);
> +}
>
> Not a fan of this name - you're not "making" a filename here (as in
> allocating or such). I think I'd just remove the Make prefix.

+1

Can we document the theory behind file naming here, if that isn't in
the latest version? This is a routine private to parallel hash join
(or shared tuplestore), not BufFile. Maybe BufFile should have some
opinion on this, though, just as a matter of style.
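
Something like this is what I have in mind, with the prefix dropped
(just a sketch; the comment wording is mine, not from the patch):

static void
SharedSegmentName(char *name, const char *buffile_name, int segment)
{
	/*
	 * Segments of a shared BufFile are named "<buffile_name>.<segment>",
	 * where buffile_name is supplied by the caller (parallel hash join
	 * or shared tuplestore) and segment is the zero-based segment number
	 * within the BufFile.
	 */
	snprintf(name, MAXPGPATH, "%s.%d", buffile_name, segment);
}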

> +/*
> + * Delete a BufFile that was created by BufFileCreateShared in the given
> + * SharedFileSet using the given name.
> + *
> + * It is not necessary to delete files explicitly with this function.  It is
> + * provided only as a way to delete files proactively, rather than waiting for
> + * the SharedFileSet to be cleaned up.
> + *
> + * Only one backend should attempt to delete a given name, and should know
> + * that it exists and has been exported or closed.
> + */

This part is new to me. We now expect only one backend to delete a
given filename. What changed? Please provide a Message-Id reference if
that will help me to understand.

For now, I'm going to guess that this development had something to do
with the need to deal with virtual FDs that do a close() on an FD to
stay under the backend's file descriptor limit. Do I have that right?

> +       /*
> +        * We don't set FD_DELETE_AT_CLOSE for files opened this way, but we still
> +        * want to make sure they get closed at end of xact.
> +        */
> +       ResourceOwnerEnlargeFiles(CurrentResourceOwner);
> +       ResourceOwnerRememberFile(CurrentResourceOwner, file);
> +       VfdCache[file].resowner = CurrentResourceOwner;
>
> So maybe I'm being pedantic here, but wouldn't the right order be to do
> ResourceOwnerEnlargeFiles() *before* creating the file? It's a memory
> allocating operation, so it can fail, which'd leak the file.

I remember going to pains to get this right with my own unifiable
BufFile concept. I'm going to wait for an answer to my question about
file deletion + resowners before commenting further, though.
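
For what it's worth, the ordering Andres describes would look like this
(a sketch only; the opener function name is made up, everything else is
from the patch):

	/*
	 * Reserve the resowner slot before creating the file.  The Enlarge
	 * call allocates memory and so can fail; if it failed after the
	 * file already existed, nothing would own the file and it would
	 * leak.
	 */
	ResourceOwnerEnlargeFiles(CurrentResourceOwner);
	file = OpenTemporaryFileInSet(...);	/* hypothetical opener */
	ResourceOwnerRememberFile(CurrentResourceOwner, file);
	VfdCache[file].resowner = CurrentResourceOwner;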

> +       if (vfdP->fdstate & FD_TEMP_FILE_LIMIT)
> +       {
> +               /* Subtract its size from current usage (do first in case of error) */
> +               temporary_files_size -= vfdP->fileSize;
> +               vfdP->fileSize = 0;
> +       }
>
> So, is it right to do so unconditionally and without regard for errors?
> If the file isn't deleted, it shouldn't be subtracted from fileSize. I
> guess you're managing that through the flag, but that's not entirely
> obvious.

I think that the problem here is that the accounting is expected to
always work. It's not like there is a resowner-style error path in
which temporary_files_size gets reset to 0.
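
To spell out my reading of it (code from the patch, comments mine):

	if (vfdP->fdstate & FD_TEMP_FILE_LIMIT)
	{
		/*
		 * The flag records that this file's size is included in
		 * temporary_files_size, so subtract it here and zero the
		 * cached size so it cannot be subtracted twice.  Doing this
		 * before attempting deletion means the counter stays sane
		 * even if the unlink later fails; nothing else ever resets
		 * temporary_files_size on error.
		 */
		temporary_files_size -= vfdP->fileSize;
		vfdP->fileSize = 0;
	}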

> Is that entirely unproblematic? Are there any DSM callbacks that rely on
> locks still being held? Please split this part into a separate commit
> with such analysis.

I was always confused about the proper use of DSM callbacks myself.
They seemed generally underdocumented.
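
For reference, the pattern as I understand it (a sketch; the callback
name is mine, and whether any locks are still held when it runs is
exactly the question Andres is asking):

#include "storage/dsm.h"

static void
my_detach_callback(dsm_segment *seg, Datum arg)
{
	/*
	 * Runs when this backend detaches from the segment, including
	 * during error and process-exit cleanup.  I wouldn't assume any
	 * particular lock is still held at this point.
	 */
}

	/* At setup time, after dsm_create()/dsm_attach(): */
	on_dsm_detach(seg, my_detach_callback, (Datum) 0);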

> +                       /* Create the output buffer. */
> +                       if (accessor->write_chunk != NULL)
> +                               pfree(accessor->write_chunk);
> +                       accessor->write_chunk = (SharedTuplestoreChunk *)
> +                               palloc0(accessor->write_pages * BLCKSZ);
>
> Are we guaranteed to be in a long-lived memory context here?

I imagine that Thomas looked to tuplestore_begin_heap() + interXact as
a kind of precedent here. See comments above that function.
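
If it isn't actually guaranteed, the defensive version would look
something like this (a sketch; I'm inventing the context field):

	MemoryContext oldcxt;

	/*
	 * Make sure the chunk is allocated in a context that lives at
	 * least as long as the accessor, rather than in whatever context
	 * happens to be current when we're called.
	 */
	oldcxt = MemoryContextSwitchTo(accessor->context);	/* hypothetical */
	accessor->write_chunk = (SharedTuplestoreChunk *)
		palloc0(accessor->write_pages * BLCKSZ);
	MemoryContextSwitchTo(oldcxt);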

-- 
Peter Geoghegan

