From: Peter Geoghegan
Subject: Re: [HACKERS] Parallel Hash take II
Msg-id: CAH2-Wz=aAE6nSYOjjMnBWnAJ+81UsWvn7-6i=t+aqYC6JRfd6Q@mail.gmail.com
In response to: Re: [HACKERS] Parallel Hash take II (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
On Tue, Nov 7, 2017 at 1:01 PM, Andres Freund <andres@anarazel.de> wrote:
> +/*
> + * Build the name for a given segment of a given BufFile.
> + */
> +static void
> +MakeSharedSegmentName(char *name, const char *buffile_name, int segment)
> +{
> +       snprintf(name, MAXPGPATH, "%s.%d", buffile_name, segment);
> +}
>
> Not a fan of this name - you're not "making" a filename here (as in
> allocating or such). I think I'd just remove the Make prefix.

+1

Can we document the theory behind file naming here, if that isn't in
the latest version? This is a routine private to parallel hash join
(or shared tuplestore), not BufFile. Maybe BufFile should have some
opinion on this, though, just as a matter of style.
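
Something like this is what I have in mind, with the prefix dropped
(just a sketch; the comment wording is mine, not from the patch):

static void
SharedSegmentName(char *name, const char *buffile_name, int segment)
{
	/*
	 * Segments of a shared BufFile are named "<buffile_name>.<segment>",
	 * where buffile_name is supplied by the caller (parallel hash join
	 * or shared tuplestore) and segment is the zero-based segment number
	 * within the BufFile.
	 */
	snprintf(name, MAXPGPATH, "%s.%d", buffile_name, segment);
}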

> +/*
> + * Delete a BufFile that was created by BufFileCreateShared in the given
> + * SharedFileSet using the given name.
> + *
> + * It is not necessary to delete files explicitly with this function.  It is
> + * provided only as a way to delete files proactively, rather than waiting for
> + * the SharedFileSet to be cleaned up.
> + *
> + * Only one backend should attempt to delete a given name, and should know
> + * that it exists and has been exported or closed.
> + */

This part is new to me. We now expect only one backend to delete a
given filename. What changed? Please provide a Message-Id reference if
that will help me to understand.

For now, I'm going to guess that this development had something to do
with the need to deal with virtual FDs that do a close() on an FD to
stay under the backend's file descriptor limit. Do I have that right?

> +       /*
> +        * We don't set FD_DELETE_AT_CLOSE for files opened this way, but we still
> +        * want to make sure they get closed at end of xact.
> +        */
> +       ResourceOwnerEnlargeFiles(CurrentResourceOwner);
> +       ResourceOwnerRememberFile(CurrentResourceOwner, file);
> +       VfdCache[file].resowner = CurrentResourceOwner;
>
> So maybe I'm being pedantic here, but wouldn't the right order be to do
> ResourceOwnerEnlargeFiles() *before* creating the file? It's a memory
> allocating operation, so it can fail, which'd leak the file.

I remember going to pains to get this right with my own unifiable
BufFile concept. I'm going to wait for an answer to my question about
file deletion + resowners before commenting further, though.
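
For what it's worth, the ordering Andres describes would look like this
(a sketch only; the opener function name is made up, everything else is
from the patch):

	/*
	 * Reserve the resowner slot before creating the file.  The Enlarge
	 * call allocates memory and so can fail; if it failed after the
	 * file already existed, nothing would own the file and it would
	 * leak.
	 */
	ResourceOwnerEnlargeFiles(CurrentResourceOwner);
	file = OpenTemporaryFileInSet(...);	/* hypothetical opener */
	ResourceOwnerRememberFile(CurrentResourceOwner, file);
	VfdCache[file].resowner = CurrentResourceOwner;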

> +       if (vfdP->fdstate & FD_TEMP_FILE_LIMIT)
> +       {
> +               /* Subtract its size from current usage (do first in case of error) */
> +               temporary_files_size -= vfdP->fileSize;
> +               vfdP->fileSize = 0;
> +       }
>
> So, is it right to do so unconditionally and without regard for errors?
> If the file isn't deleted, it shouldn't be subtracted from fileSize. I
> guess you're managing that through the flag, but that's not entirely
> obvious.

I think that the problem here is that the accounting is expected to
always work. It's not like there is a resowner-style error path in
which temporary_files_size gets reset to 0.
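
To spell out my reading of it (code from the patch, comments mine):

	if (vfdP->fdstate & FD_TEMP_FILE_LIMIT)
	{
		/*
		 * The flag records that this file's size is included in
		 * temporary_files_size, so subtract it here and zero the
		 * cached size so it cannot be subtracted twice.  Doing this
		 * before attempting deletion means the counter stays sane
		 * even if the unlink later fails; nothing else ever resets
		 * temporary_files_size on error.
		 */
		temporary_files_size -= vfdP->fileSize;
		vfdP->fileSize = 0;
	}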

> Is that entirely unproblematic? Are there any DSM callbacks that rely on
> locks still being held? Please split this part into a separate commit
> with such analysis.

I was always confused about the proper use of DSM callbacks myself.
They seemed generally underdocumented.
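
For reference, the pattern as I understand it (a sketch; the callback
name is mine, and whether any locks are still held when it runs is
exactly the question Andres is asking):

#include "storage/dsm.h"

static void
my_detach_callback(dsm_segment *seg, Datum arg)
{
	/*
	 * Runs when this backend detaches from the segment, including
	 * during error and process-exit cleanup.  I wouldn't assume any
	 * particular lock is still held at this point.
	 */
}

	/* At setup time, after dsm_create()/dsm_attach(): */
	on_dsm_detach(seg, my_detach_callback, (Datum) 0);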

> +                       /* Create the output buffer. */
> +                       if (accessor->write_chunk != NULL)
> +                               pfree(accessor->write_chunk);
> +                       accessor->write_chunk = (SharedTuplestoreChunk *)
> +                               palloc0(accessor->write_pages * BLCKSZ);
>
> Are we guaranteed to be in a long-lived memory context here?

I imagine that Thomas looked to tuplestore_begin_heap() + interXact as
a kind of precedent here. See comments above that function.
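
If it isn't actually guaranteed, the defensive version would look
something like this (a sketch; I'm inventing the context field):

	MemoryContext oldcxt;

	/*
	 * Make sure the chunk is allocated in a context that lives at
	 * least as long as the accessor, rather than in whatever context
	 * happens to be current when we're called.
	 */
	oldcxt = MemoryContextSwitchTo(accessor->context);	/* hypothetical */
	accessor->write_chunk = (SharedTuplestoreChunk *)
		palloc0(accessor->write_pages * BLCKSZ);
	MemoryContextSwitchTo(oldcxt);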

-- 
Peter Geoghegan

