Re: [RFC] Should we fix postmaster to avoid slow shutdown? - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [RFC] Should we fix postmaster to avoid slow shutdown?
Date
Msg-id 21221.1479830385@sss.pgh.pa.us
In response to Re: [RFC] Should we fix postmaster to avoid slow shutdown?  ("Tsunakawa, Takayuki" <tsunakawa.takay@jp.fujitsu.com>)
Responses Re: [RFC] Should we fix postmaster to avoid slow shutdown?
List pgsql-hackers
"Tsunakawa, Takayuki" <tsunakawa.takay@jp.fujitsu.com> writes:
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
>> The point I was trying to make is that I think the forced-removal behavior
>> is not desirable, and therefore committing a patch that makes it be graven
>> in stone is not desirable either.

> I totally agree that we should pursue the direction for escaping from the
> complete loss of stats files.  Personally, I would like to combine that with
> the idea of persistent performance diagnosis information for long-term
> analysis (IIRC, someone proposed it.)  However, I don't think my patch will
> make everyone forget about the problem of stats file loss during recovery.
> The problem exists with or without my patch, and my patch doesn't have the
> power to dilute the importance of the problem.  If you are worried about
> memory, we can add an entry for the problem in TODO list that Bruce-san is
> maintaining.

> Or, maybe we can just stop removing the stats files during recovery by
> keeping the files of previous generation and using it as the current one.
> I haven't seen how fresh the previous generation is (500ms ago?).  A bit
> older might be better than nothing.

Freshness isn't the issue.  The stats file isn't there at all, in the
permanent stats directory, unless the collector takes the time to write
it before exiting.  Without that, we have unrecoverable loss of the stats
data.  Now, that isn't as bad as loss of the SQL data content, but it's
not good either.

It's already the case that the pgstats code writes the stats data under a
temporary file name and then renames it into place atomically.  So the
prospects for corrupt data are not large, and I do not think that the
existing removal behavior was intended to prevent that.  Rather, the
concern was that if you do a point-in-time recovery to someplace much
earlier on the WAL timeline, the stats file will be out of sync with
what's now in your database.  That's a valid point, but deleting the
stats file during *any* recovery seems like an overreaction.
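
For readers who want the write-then-rename pattern spelled out, here is a
minimal sketch.  It is Python for illustration only, not the C code in
pgstats, and the function name and temporary-file prefix are invented for
the example.  The point it shows is that a reader of the final path sees
either the complete old file or the complete new one, never a partial write.

```python
import os
import tempfile


def write_stats_atomically(path: str, payload: bytes) -> None:
    """Write payload to path via a temporary file plus an atomic rename.

    Hypothetical helper: the name and the ".stats-" prefix are assumptions
    made for this sketch, not identifiers from PostgreSQL.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live in the same directory (same filesystem) as the
    # target, otherwise the rename cannot be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".stats-", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(payload)
        # os.replace atomically swaps the new file into place on POSIX.
        os.replace(tmp_path, path)
    except BaseException:
        # Do not leave a stray temporary file behind on failure.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

If the writer dies before the rename, the previous file (if any) is left
untouched; only the temporary file is lost.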

The simplest solution I can think of is to delete the stats file when
doing a PITR operation, but not during simple crash recovery.  I've
not looked to see how hard it would be to do that, but it seems like
it should be a fairly minor logic tweak.  Maybe decide to do the removal
at the point where we intentionally stop following WAL someplace earlier
than its end.
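
To make the proposed rule concrete, a rough sketch of the decision follows.
Again this is illustrative Python, not a patch; the RecoveryOutcome type and
both of its fields are hypothetical names chosen for the example, and the
real recovery code may well expose this information differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryOutcome:
    # Both fields are assumptions made for this sketch.
    target_requested: bool        # a recovery target was asked for (PITR)
    stopped_before_wal_end: bool  # replay intentionally stopped short of the end of WAL


def should_remove_stats_file(outcome: RecoveryOutcome) -> bool:
    """Remove the stats file only when we deliberately stopped before the end of WAL."""
    return outcome.target_requested and outcome.stopped_before_wal_end


# Plain crash recovery replays all available WAL, so the stats file is kept.
crash = RecoveryOutcome(target_requested=False, stopped_before_wal_end=False)
# PITR that stops early leaves the database behind the stats file, so drop it.
pitr = RecoveryOutcome(target_requested=True, stopped_before_wal_end=True)
```

With this rule, should_remove_stats_file(crash) is False and
should_remove_stats_file(pitr) is True.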

Another angle we might take, independently of that, is to delete the
stats file if the stats collector process itself crashes.  This would
provide a recovery avenue if somehow we did have a stats file that
was corrupt enough to crash the collector.  And it would not matter
for post-startup crashes of the stats collector, because the file
would not be there anyway.
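
A sketch of that second idea, under the assumption that whoever supervises
the collector sees its exit code and that any nonzero code counts as a
crash.  The function name is made up, and this is Python for illustration
rather than postmaster code.

```python
import os


def cleanup_after_collector_exit(exit_code: int, stats_path: str) -> bool:
    """Delete the permanent stats file if the collector exited abnormally.

    Returns True if a file was actually removed.  Treating every nonzero
    exit code as a crash is an assumption of this sketch.
    """
    if exit_code == 0:
        # Clean exit: the collector had the chance to write a good file.
        return False
    try:
        os.remove(stats_path)
    except FileNotFoundError:
        # Post-startup crash: the file is not there anyway, nothing to do.
        return False
    return True
```

The FileNotFoundError branch is the "would not matter" case above: removal
is a no-op when the file is already absent.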
        regards, tom lane


