Re: [HACKERS] Re: 9.4.1 -> 9.4.2 problem: could not access status of transaction 1 - Mailing list pgsql-general

From Robert Haas
Subject Re: [HACKERS] Re: 9.4.1 -> 9.4.2 problem: could not access status of transaction 1
Date
Msg-id CA+TgmobdnG62LH51b1Av-G03U1avuCC=Pi1y7uvzgMfQin784g@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Re: 9.4.1 -> 9.4.2 problem: could not access status of transaction 1  (Andres Freund <andres@anarazel.de>)
Responses Re: [HACKERS] Re: 9.4.1 -> 9.4.2 problem: could not access status of transaction 1
List pgsql-general
On Sat, May 30, 2015 at 8:55 PM, Andres Freund <andres@anarazel.de> wrote:
> Is oldestMulti, nextMulti - 1 really suitable for this? Are both
> actually guaranteed to exist in the offsets slru and be valid?  Hm. I
> guess you intend to simply truncate everything else, but just in
> offsets?

oldestMulti in theory is the right thing, I think, but in actuality we
know that some people have 1 here instead of the correct value.

>> One argument against this idea is that we may not want to keep a full
>> set of member files on standbys (due to disk space usage), but that's
>> what will happen unless we truncate during replay.
>
> I think that argument is pretty much the death-knell.=

Yes.  Truncating on the standby is really not optional.

>> > I think at least for 9.5+ we should a) invent proper truncation records
>> > for pg_multixact b) start storing oldestValidMultiOffset in pg_control.
>> > The current hack of scanning the directories to get knowledge we should
>> > have is a pretty bad hack, and we should not continue using it forever.
>> > I think we might end up needing to do a) even in the backbranches.
>>
>> Definitely agree with WAL-logging truncations; also +1 on backpatching
>> that to 9.3.  We already have experience with adding extra WAL records
>> on minor releases, and it didn't seem to have bitten too hard.
>
> I'm inclined to agree. My only problem is that I'm not sure whether we
> can find a way of doing all this without adding a pg_control field. Let
> me try to sketch this out:
>
> 1) We continue determining the oldest SlruScanDirectory(SlruScanDirCbFindEarliest)
>    on the master to find the oldest offsets segment to
>    truncate. Alternatively, if we determine it to be safe, we could use
>    oldestMulti to find that.
> 2) SlruScanDirCbRemoveMembers is changed to return the range of members
>    to remove, instead of doing itself
> 3) We wal log [oldest offset segment guaranteed to not be alive,
>    nextmulti) for offsets, and [oldest members segment guaranteed to not be alive,
>    nextmultioff), and redo truncations for the entire range during
>    recovery.
>
> I'm pretty tired right now, but this sounds doable.

I'm probably biased here, but I think we should finish reviewing,
testing, and committing my patch before we embark on designing this.
So far we have no reports of trouble attributable to the lack of the
WAL-logging support discussed here, as opposed to several reports of
trouble from the status quo within days of release.

I'm having trouble reconstructing the series of events where what
you're worried about here really becomes a problem, and I think we
ought to have a very clear idea about that before back-patching
changes of this type.  It's true that if the state of the SLRU
directory is in the future, because recovery was restarted from an
earlier checkpoint, we might replay a checkpoint and remove some of
those files from the future.  But so what?  By the time we've reached
the minimum recovery point, they will have been recreated by the same
WAL records that created them in the first place.  If, in the previous
replay, we had wrapped all the way around, some of the stuff we keep
may actually already have been overwritten by future WAL records, but
they'll be overwritten again.  Now, that could mess up our
determination of which members to remove, I guess, but I'm not clear
that actually matters either: if the offsets space has wrapped around,
the members space will certainly have wrapped around as well, so we
can remove anything we like at this stage and we're still OK.  I agree
this is ugly the way it is, but where is the actual bug?

As far as your actual outline goes, I think if we do this, we need to
be very careful about step #2.  Right now, we decide what we need to
keep and then remove everything else, but that's kind of wonky because
new stuff may be getting created at the same time, so we keep
adjusting our idea of exactly what needs to be removed.  It would be
far better to invert that logic: decide what needs to be removed -
presumably, everything from the oldest member that now exists up until
some later point - and then remove precisely that stuff and nothing
else.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


pgsql-general by date:

Previous
From: "Glen M. Witherington"
Date:
Subject: Re: Efficient sorting the results of a join, without denormalization
Next
From: Adrian Klaver
Date:
Subject: Re: date type changing to timestamp without time zone in postgres 9.4