Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation) - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
Date
Msg-id CAH2-Wz=50G_B7Y6nZrdpXmaxc4A=s9osA6-1--7-0dOf2Nfaxw@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
On Sun, Mar 12, 2017 at 3:05 PM, Peter Geoghegan <pg@bowt.ie> wrote:
> There is still an open item here, though: The leader-as-worker
> Tuplesortstate, a special case, can still leak files.

I phrased this badly. What I mean is that there can be instances where
temp files are left on disk following a failure such as palloc() OOM;
no backend ends up doing an unlink() iff a leader-as-worker
Tuplesortstate was used and we get unlucky. I did not mean a leak of
virtual or real file descriptors, which would see Postgres print a
refcount leak warning from resowner.c. Naturally, these "leaked" files
will eventually be deleted by the next restart of the server at the
latest, within RemovePgTempFiles(). Note also that a duplicate
unlink() (with annoying LOG message) is impossible under any
circumstances with V9, regardless of whether or not a leader-as-worker
Tuplesort state is involved.

Anyway, I was sure that I needed to completely nail this down in order
to be consistent with existing guarantees, but another look at
OpenTemporaryFile() makes me doubt that. ResourceOwnerEnlargeFiles()
is called, which itself uses palloc(), which can of course fail. There
are remarks over that function within resowner.c about OOM:

/** Make sure there is room for at least one more entry in a ResourceOwner's* files reference array.** This is separate
fromactually inserting an entry because if we run out* of memory, it's critical to do so *before* acquiring the
resource.*/
void
ResourceOwnerEnlargeFiles(ResourceOwner owner)
{   ...
}

But this happens after OpenTemporaryFileInTablespace() has already
returned. Taking care to allocate memory up-front here is motivated by
keeping the vFD cache entry and current resource owner in perfect
agreement about the FD_XACT_TEMPORARY-ness of a file, and that's it.
It's *not* true that there is a broader sense in which
OpenTemporaryFile() is atomic, which for some reason I previously
believed to be the case.

So, I haven't failed to prevent an outcome that wasn't already
possible. It doesn't seem like it would be that hard to fix this, and
then have the parallel tuplesort patch live up to that new higher
standard. But, it's possible that Tom or maybe someone else would
consider that a bad idea, for roughly the same reason that we don't
call RemovePgTempFiles() for *crash* induced restarts, as mentioned by
Thomas up-thead:
* NOTE: we could, but don't, call this during a post-backend-crash restart* cycle.  The argument for not doing it is
thatsomeone might want to examine* the temp files for debugging purposes.  This does however mean that*
OpenTemporaryFilehad better allow for collision with an existing temp* file name.*/
 
void
RemovePgTempFiles(void)
{   ...
}

Note that I did put some thought into making sure OpenTemporaryFile()
does the right thing with collisions with existing temp files. So,
maybe the right thing is to do nothing at all. I don't have strong
feelings either way on this question.

-- 
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Craig Ringer
Date:
Subject: Re: [HACKERS] PATCH: Batch/pipelining support for libpq
Next
From: David Rowley
Date:
Subject: Re: [HACKERS] [NOVICE] Why is there a doubtful copyObject call in add_vars_to_targetlist