Thread: Fixing order of resowner cleanup in 12, for Windows

Fixing order of resowner cleanup in 12, for Windows

From

Thomas Munro

Date:

02 May 2019, 10:30:12

Hi all,

A while back I posted a patch[1] to change the order of resowner
cleanup so that DSM handles are released last.  That's useful for the
error cleanup path on Windows, when a SharedFileSet is cleaned up (a
mechanism that's used by parallel CREATE INDEX and parallel hash join,
for spilling files to disk under a temporary directory, with automatic
cleanup).  Previously we believed that it was OK to unlink things that
other processes might have currently open as long as you use the
FILE_SHARE_DELETE flag, but that turned out not to be the full story:
you can unlink files that someone has open, but you can't unlink the
directory that contains them!  Hence the desire to reverse the
clean-up order.

It didn't seem worth the risk of back-patching the change, because the
only consequence is a confusing message that appears somewhere near
the real error:

LOG: could not rmdir directory
"base/pgsql_tmp/pgsql_tmp5088.0.sharedfileset": Directory not empty

I suppose we probably should make the change to 12 though: then owners
of extensions that use DSM detach hooks (if there any such extensions)
will have a bit of time to get used to the new order during the beta
period.  I'll need to find someone to test this with a fault injection
scenario on Windows before committing it, but wanted to sound out the
list for any objections to this late change?

[1] https://www.postgresql.org/message-id/CAEepm%3D2ikUtjmiJ18bTnwaeUBoiYN%3DwMDSdhU1jy%3D8WzNhET-Q%40mail.gmail.com

-- 
Thomas Munro
https://enterprisedb.com

Re: Fixing order of resowner cleanup in 12, for Windows

From

Tom Lane

Date:

02 May 2019, 14:15:11

Thomas Munro <thomas.munro@gmail.com> writes:
> A while back I posted a patch[1] to change the order of resowner
> cleanup so that DSM handles are released last.  That's useful for the
> error cleanup path on Windows, when a SharedFileSet is cleaned up (a
> mechanism that's used by parallel CREATE INDEX and parallel hash join,
> for spilling files to disk under a temporary directory, with automatic
> cleanup).

I guess what I'm wondering is if there are any potential negative
consequences, ie code that won't work if we change the order like this.
I'm finding it hard to visualize what that would be, but then again
this failure mode wasn't obvious either.

> I suppose we probably should make the change to 12 though: then owners
> of extensions that use DSM detach hooks (if there any such extensions)
> will have a bit of time to get used to the new order during the beta
> period.  I'll need to find someone to test this with a fault injection
> scenario on Windows before committing it, but wanted to sound out the
> list for any objections to this late change?

Since we haven't started beta yet, I don't see a reason not to change
it.  Worst case is that it causes problems and we revert it.

I concur with not back-patching, in any case.

            regards, tom lane

Re: Fixing order of resowner cleanup in 12, for Windows

From

Thomas Munro

Date:

05 May 2019, 22:13:06

On Fri, May 3, 2019 at 2:15 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
> > A while back I posted a patch[1] to change the order of resowner
> > cleanup so that DSM handles are released last.  That's useful for the
> > error cleanup path on Windows, when a SharedFileSet is cleaned up (a
> > mechanism that's used by parallel CREATE INDEX and parallel hash join,
> > for spilling files to disk under a temporary directory, with automatic
> > cleanup).
>
> I guess what I'm wondering is if there are any potential negative
> consequences, ie code that won't work if we change the order like this.
> I'm finding it hard to visualize what that would be, but then again
> this failure mode wasn't obvious either.

I can't think of anything in core.  The trouble here is that we're
talking about hypothetical out-of-tree code that could want to plug in
detach hooks to do anything at all, so it's hard to say.  One idea
that occurred to me is that if someone comes up with a genuine need to
run arbitrary callbacks before locks are released (for example), we
could provide a way to be called in all three phases and receive the
phase, though admittedly in this case FileClose() is in the same phase
as I'm proposing to put dsm_detach(), so there is an ordering
requirement that might require more fine grained phases.  I don't
know.

> > I suppose we probably should make the change to 12 though: then owners
> > of extensions that use DSM detach hooks (if there any such extensions)
> > will have a bit of time to get used to the new order during the beta
> > period.  I'll need to find someone to test this with a fault injection
> > scenario on Windows before committing it, but wanted to sound out the
> > list for any objections to this late change?
>
> Since we haven't started beta yet, I don't see a reason not to change
> it.  Worst case is that it causes problems and we revert it.
>
> I concur with not back-patching, in any case.

Here's a way to produce an error which might produce the log message
on Windows.  Does anyone want to try it?

postgres=# create table foo as select generate_series(1, 10000000)::int i;
SELECT 10000000
postgres=# set synchronize_seqscans = off;
SET
postgres=# create index on foo ((1 / (5000000 - i)));
psql: ERROR:  division by zero
postgres=# create index on foo ((1 / (5000000 - i)));
psql: ERROR:  division by zero
postgres=# create index on foo ((1 / (5000000 - i)));
psql: ERROR:  division by zero
CONTEXT:  parallel worker

(If you don't turn sync scan off, it starts scanning from where it
left off last time and then fails immediately, which may interfere
with the experiment if you run it more than once, I'm not sure).

If it does produce the log message, then the attached patch should
make it go away.

-- 
Thomas Munro
https://enterprisedb.com

Attachment

0001-Detach-from-DSM-segments-last-in-resowner.c.patch

Re: Fixing order of resowner cleanup in 12, for Windows

From

Tom Lane

Date:

05 May 2019, 23:07:52

Thomas Munro <thomas.munro@gmail.com> writes:
> If it does produce the log message, then the attached patch should
> make it go away.

One thing I don't care for about this patch is that the original code
looked like it didn't matter what order we did the resource releases in,
and the patched code still looks like that.  You're not doing future
hackers any service by failing to include a comment that explains that
DSM detach MUST BE LAST, and explaining why.  Even with that, I'd only
rate it about a 75% chance that somebody won't try to add their new
resource type at the end --- but with no comment, the odds they'll
get it right are indistinguishable from zero.

            regards, tom lane

Re: Fixing order of resowner cleanup in 12, for Windows

From

Thomas Munro

Date:

06 May 2019, 00:56:46

On Mon, May 6, 2019 at 11:07 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> One thing I don't care for about this patch is that the original code
> looked like it didn't matter what order we did the resource releases in,
> and the patched code still looks like that.  You're not doing future
> hackers any service by failing to include a comment that explains that
> DSM detach MUST BE LAST, and explaining why.  Even with that, I'd only
> rate it about a 75% chance that somebody won't try to add their new
> resource type at the end --- but with no comment, the odds they'll
> get it right are indistinguishable from zero.

Ok, here's a version that provides a specific reason (the Windows file
handle thing) and also a more general reasoning: we don't really want
extension (or core) authors writing callbacks that depend on eg pins
or locks or whatever else being still held when they run, because
that's fragile, so calling them last is the best and most conservative
choice.  I think if someone does come with legitimate reasons to want
that, we should discuss it then, and perhaps consider something a bit
like the ResourceRelease_callbacks list: its callbacks are invoked for
each phase.

--
Thomas Munro
https://enterprisedb.com

Attachment

0001-Detach-from-DSM-segments-last-in-resowner.c-v2.patch

Re: Fixing order of resowner cleanup in 12, for Windows

From

Tom Lane

Date:

06 May 2019, 03:44:47

Thomas Munro <thomas.munro@gmail.com> writes:
> On Mon, May 6, 2019 at 11:07 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> ... You're not doing future
>> hackers any service by failing to include a comment that explains that
>> DSM detach MUST BE LAST, and explaining why.

> Ok, here's a version that provides a specific reason (the Windows file
> handle thing) and also a more general reasoning: we don't really want
> extension (or core) authors writing callbacks that depend on eg pins
> or locks or whatever else being still held when they run, because
> that's fragile, so calling them last is the best and most conservative
> choice.

LGTM.

> ... I think if someone does come with legitimate reasons to want
> that, we should discuss it then, and perhaps consider something a bit
> like the ResourceRelease_callbacks list: its callbacks are invoked for
> each phase.

Hmm, now that you mention it: this bit at the very end

    /* Let add-on modules get a chance too */
    for (item = ResourceRelease_callbacks; item; item = item->next)
        item->callback(phase, isCommit, isTopLevel, item->arg);

seems kind of misplaced given this discussion.  Should we not run that
*first*, before we release core resources for the same phase?  It's
a lot more plausible that extension resources depend on core resources
than vice versa.

            regards, tom lane

Re: Fixing order of resowner cleanup in 12, for Windows

From

Thomas Munro

Date:

06 May 2019, 04:47:40

On Mon, May 6, 2019 at 3:44 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
> > On Mon, May 6, 2019 at 11:07 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> ... You're not doing future
> >> hackers any service by failing to include a comment that explains that
> >> DSM detach MUST BE LAST, and explaining why.
>
> > Ok, here's a version that provides a specific reason (the Windows file
> > handle thing) and also a more general reasoning: we don't really want
> > extension (or core) authors writing callbacks that depend on eg pins
> > or locks or whatever else being still held when they run, because
> > that's fragile, so calling them last is the best and most conservative
> > choice.
>
> LGTM.

Cool.  I'll wait a bit to see if we can get confirmation from a
Windows hacker that it does what I claim.  Or maybe I should try to
come up with a regression test that exercises it without having to
create a big table.

> > ... I think if someone does come with legitimate reasons to want
> > that, we should discuss it then, and perhaps consider something a bit
> > like the ResourceRelease_callbacks list: its callbacks are invoked for
> > each phase.
>
> Hmm, now that you mention it: this bit at the very end
>
>         /* Let add-on modules get a chance too */
>         for (item = ResourceRelease_callbacks; item; item = item->next)
>                 item->callback(phase, isCommit, isTopLevel, item->arg);
>
> seems kind of misplaced given this discussion.  Should we not run that
> *first*, before we release core resources for the same phase?  It's
> a lot more plausible that extension resources depend on core resources
> than vice versa.

Not sure.   Changing the meaning of the existing callbacks from last
to first in each phase seems a bit unfriendly.  If it's useful to be
able to run a callback before RESOURCE_RELEASE_BEFORE_LOCKS, perhaps
we need a new phase that comes before that?

-- 
Thomas Munro
https://enterprisedb.com

Re: Fixing order of resowner cleanup in 12, for Windows

From

Amit Kapila

Date:

06 May 2019, 09:26:38

On Mon, May 6, 2019 at 3:43 AM Thomas Munro <thomas.munro@gmail.com> wrote:
>
> On Fri, May 3, 2019 at 2:15 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Thomas Munro <thomas.munro@gmail.com> writes:
> > > A while back I posted a patch[1] to change the order of resowner
> > > cleanup so that DSM handles are released last.  That's useful for the
> > > error cleanup path on Windows, when a SharedFileSet is cleaned up (a
> > > mechanism that's used by parallel CREATE INDEX and parallel hash join,
> > > for spilling files to disk under a temporary directory, with automatic
> > > cleanup).
> >
> > I guess what I'm wondering is if there are any potential negative
> > consequences, ie code that won't work if we change the order like this.
> > I'm finding it hard to visualize what that would be, but then again
> > this failure mode wasn't obvious either.
>
> I can't think of anything in core.  The trouble here is that we're
> talking about hypothetical out-of-tree code that could want to plug in
> detach hooks to do anything at all, so it's hard to say.  One idea
> that occurred to me is that if someone comes up with a genuine need to
> run arbitrary callbacks before locks are released (for example), we
> could provide a way to be called in all three phases and receive the
> phase, though admittedly in this case FileClose() is in the same phase
> as I'm proposing to put dsm_detach(), so there is an ordering
> requirement that might require more fine grained phases.  I don't
> know.
>
> > > I suppose we probably should make the change to 12 though: then owners
> > > of extensions that use DSM detach hooks (if there any such extensions)
> > > will have a bit of time to get used to the new order during the beta
> > > period.  I'll need to find someone to test this with a fault injection
> > > scenario on Windows before committing it, but wanted to sound out the
> > > list for any objections to this late change?
> >
> > Since we haven't started beta yet, I don't see a reason not to change
> > it.  Worst case is that it causes problems and we revert it.
> >
> > I concur with not back-patching, in any case.
>
> Here's a way to produce an error which might produce the log message
> on Windows.  Does anyone want to try it?
>

I can give it a try.

> postgres=# create table foo as select generate_series(1, 10000000)::int i;
> SELECT 10000000
> postgres=# set synchronize_seqscans = off;
> SET
> postgres=# create index on foo ((1 / (5000000 - i)));
> psql: ERROR:  division by zero
> postgres=# create index on foo ((1 / (5000000 - i)));
> psql: ERROR:  division by zero
> postgres=# create index on foo ((1 / (5000000 - i)));
> psql: ERROR:  division by zero
> CONTEXT:  parallel worker
>
> (If you don't turn sync scan off, it starts scanning from where it
> left off last time and then fails immediately, which may interfere
> with the experiment if you run it more than once, I'm not sure).
>
> If it does produce the log message, then the attached patch should
> make it go away.
>

Are you referring to log message  "LOG:  could not rmdir directory
"base/pgsql_tmp/pgsql_tmp3692.0.sharedfileset": Directory not empty"?
If so, I am getting it both before and after your patch.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Fixing order of resowner cleanup in 12, for Windows

From

Thomas Munro

Date:

06 May 2019, 11:11:17

On Mon, May 6, 2019 at 9:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Mon, May 6, 2019 at 3:43 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> > Here's a way to produce an error which might produce the log message
> > on Windows.  Does anyone want to try it?
>
> I can give it a try.

Thanks!

> > If it does produce the log message, then the attached patch should
> > make it go away.
>
> Are you referring to log message  "LOG:  could not rmdir directory
> "base/pgsql_tmp/pgsql_tmp3692.0.sharedfileset": Directory not empty"?

Yes.

> If so, I am getting it both before and after your patch.

Huh.  I thought the only problem here was the phenomenon demonstrated
by [1].  I'm a bit stumped... if we've closed all the handles in every
backend before detaching, and then the last to detach unlinks all the
files first and then the directory, how can we get that error?

[1] https://www.postgresql.org/message-id/CAEepm%3D2rH_V5by1kH1Q1HZWPFj%3D4ykjU4JcyoKMNVT6Jh8Q_Rw%40mail.gmail.com


--
Thomas Munro
https://enterprisedb.com

Re: Fixing order of resowner cleanup in 12, for Windows

From

Amit Kapila

Date:

06 May 2019, 11:51:20

On Mon, May 6, 2019 at 4:41 PM Thomas Munro <thomas.munro@gmail.com> wrote:
>
> On Mon, May 6, 2019 at 9:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > If so, I am getting it both before and after your patch.
>
> Huh.  I thought the only problem here was the phenomenon demonstrated
> by [1].  I'm a bit stumped... if we've closed all the handles in every
> backend before detaching, and then the last to detach unlinks all the
> files first and then the directory, how can we get that error?
>

Yeah, I am also not sure what caused that and I have verified it two
times to ensure that I have not made any mistake.  I can try once
again tomorrow after adding some debug messages, but if someone else
can also once confirm the behavior, it would be good.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Fixing order of resowner cleanup in 12, for Windows

From

Robert Haas

Date:

06 May 2019, 16:14:55

On Thu, May 2, 2019 at 10:15 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
> > A while back I posted a patch[1] to change the order of resowner
> > cleanup so that DSM handles are released last.  That's useful for the
> > error cleanup path on Windows, when a SharedFileSet is cleaned up (a
> > mechanism that's used by parallel CREATE INDEX and parallel hash join,
> > for spilling files to disk under a temporary directory, with automatic
> > cleanup).
>
> I guess what I'm wondering is if there are any potential negative
> consequences, ie code that won't work if we change the order like this.
> I'm finding it hard to visualize what that would be, but then again
> this failure mode wasn't obvious either.

I have a thought about this.  It seems to me that when it comes to
backend-private memory, we release it even later: aborting the
transaction does nothing, and we do it only later when we clean up the
transaction.  So I wonder whether we're going to find that we actually
want to postpone reclaiming dynamic shared memory for even longer than
this change would do.  But in general, I think we've already
established the principle that releasing memory needs to happen last,
because every other resource that you might be using is tracked using
data structures that are, uh, stored in memory.  Therefore I suspect
that this change is going in the right direction.

To put that another way, the issue here is that the removal of the
files can't happen after the cleanup of the memory that tells us which
files to remove.  If we had the corresponding problem for the
non-parallel case, it would mean that we were deleting the
transaction's memory context before we finished releasing all the
resources managed by the transaction's resowner, which would be
insane.  I believe I put the call to release DSM segments where it is
on the theory that "we should release dynamic shared memory as early
as possible because freeing memory is good," completing failing to
take into account that this was not at all like what we do for
backend-private memory.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Fixing order of resowner cleanup in 12, for Windows

From

Tom Lane

Date:

06 May 2019, 17:32:14

Robert Haas <robertmhaas@gmail.com> writes:
> I have a thought about this.  It seems to me that when it comes to
> backend-private memory, we release it even later: aborting the
> transaction does nothing, and we do it only later when we clean up the
> transaction.  So I wonder whether we're going to find that we actually
> want to postpone reclaiming dynamic shared memory for even longer than
> this change would do.  But in general, I think we've already
> established the principle that releasing memory needs to happen last,
> because every other resource that you might be using is tracked using
> data structures that are, uh, stored in memory.  Therefore I suspect
> that this change is going in the right direction.

Hmm.  That argument suggests that DSM cleanup shouldn't be part of
resowner cleanup at all, but should be handled as a bespoke, late
step in transaction cleanup, as memory-context release is.

Not sure if that's going too far or not.  It would definitely be a big
change in environment for DSM-cleanup hooks.

            regards, tom lane

Re: Fixing order of resowner cleanup in 12, for Windows

From

Robert Haas

Date:

06 May 2019, 17:47:16

On Mon, May 6, 2019 at 1:32 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > I have a thought about this.  It seems to me that when it comes to
> > backend-private memory, we release it even later: aborting the
> > transaction does nothing, and we do it only later when we clean up the
> > transaction.  So I wonder whether we're going to find that we actually
> > want to postpone reclaiming dynamic shared memory for even longer than
> > this change would do.  But in general, I think we've already
> > established the principle that releasing memory needs to happen last,
> > because every other resource that you might be using is tracked using
> > data structures that are, uh, stored in memory.  Therefore I suspect
> > that this change is going in the right direction.
>
> Hmm.  That argument suggests that DSM cleanup shouldn't be part of
> resowner cleanup at all, but should be handled as a bespoke, late
> step in transaction cleanup, as memory-context release is.
>
> Not sure if that's going too far or not.  It would definitely be a big
> change in environment for DSM-cleanup hooks.

Right.  That's why I favor applying the change to move DSM cleanup to
the end for now, and seeing how that goes.  It could be that we'll
eventually discover that doing it before all of the AtEOXact_BLAH
functions have had a short at doing their thing is still too early,
but the only concrete problem that we know about right now can be
solved by this much-less-invasive change.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Fixing order of resowner cleanup in 12, for Windows

From

Tom Lane

Date:

06 May 2019, 17:58:25

Robert Haas <robertmhaas@gmail.com> writes:
> Right.  That's why I favor applying the change to move DSM cleanup to
> the end for now, and seeing how that goes.  It could be that we'll
> eventually discover that doing it before all of the AtEOXact_BLAH
> functions have had a short at doing their thing is still too early,
> but the only concrete problem that we know about right now can be
> solved by this much-less-invasive change.

But Amit's results say that this *doesn't* fix the problem that we know
about.  I suspect the reason is exactly that we need to run AtEOXact_Files
or the like before closing DSM.  But we should get some Windows developer
to trace through this and identify the cause for-sure before we go
designing an invasive fix.

            regards, tom lane

Re: Fixing order of resowner cleanup in 12, for Windows

From

Robert Haas

Date:

06 May 2019, 18:08:18

On Mon, May 6, 2019 at 1:58 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > Right.  That's why I favor applying the change to move DSM cleanup to
> > the end for now, and seeing how that goes.  It could be that we'll
> > eventually discover that doing it before all of the AtEOXact_BLAH
> > functions have had a short at doing their thing is still too early,
> > but the only concrete problem that we know about right now can be
> > solved by this much-less-invasive change.
>
> But Amit's results say that this *doesn't* fix the problem that we know
> about.  I suspect the reason is exactly that we need to run AtEOXact_Files
> or the like before closing DSM.  But we should get some Windows developer
> to trace through this and identify the cause for-sure before we go
> designing an invasive fix.

Huh, OK.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Fixing order of resowner cleanup in 12, for Windows

From

Thomas Munro

Date:

09 May 2019, 10:23:25

On Tue, May 7, 2019 at 6:08 AM Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, May 6, 2019 at 1:58 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Robert Haas <robertmhaas@gmail.com> writes:
> > > Right.  That's why I favor applying the change to move DSM cleanup to
> > > the end for now, and seeing how that goes.  It could be that we'll
> > > eventually discover that doing it before all of the AtEOXact_BLAH
> > > functions have had a short at doing their thing is still too early,
> > > but the only concrete problem that we know about right now can be
> > > solved by this much-less-invasive change.
> >
> > But Amit's results say that this *doesn't* fix the problem that we know
> > about.  I suspect the reason is exactly that we need to run AtEOXact_Files
> > or the like before closing DSM.  But we should get some Windows developer
> > to trace through this and identify the cause for-sure before we go
> > designing an invasive fix.
>
> Huh, OK.

The reason the patch didn't solve the problem is that
AtEOXact_Parallel() calls DestroyParallelContext().  So DSM segments
that happen to belong to ParallelContext objects are already gone by
the time resowner.c gets involved.

-- 
Thomas Munro
https://enterprisedb.com

Re: Fixing order of resowner cleanup in 12, for Windows

From

Thomas Munro

Date:

17 May 2019, 01:42:15

On Thu, May 9, 2019 at 10:23 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> The reason the patch didn't solve the problem is that
> AtEOXact_Parallel() calls DestroyParallelContext().  So DSM segments
> that happen to belong to ParallelContext objects are already gone by
> the time resowner.c gets involved.

This was listed as an open item for PostgreSQL 12, but I'm going to
move it to "older bugs".  I want to fix it, but now that I understand
what's wrong, it's a slightly bigger design issue than I'm game to try
to fix right now.

This means that 12, like 11, will be capable of leaking empty
temporary directories on Windows whenever an error is raised in the
middle of parallel CREATE INDEX or multi-batch Parallel Hash Join.
The directories are eventually unlinked at restart, and at least
the (potentially large) files inside the directory are unlinked on
abort.  I think we can live with that for a bit longer.

--
Thomas Munro
https://enterprisedb.com