Re: [HACKERS] Re: 9.4.1 -> 9.4.2 problem: could not access status of transaction 1 - Mailing list pgsql-general

From Andres Freund
Subject Re: [HACKERS] Re: 9.4.1 -> 9.4.2 problem: could not access status of transaction 1
Msg-id 20150531005555.GB30287@alap3.anarazel.de
In response to Re: [HACKERS] Re: 9.4.1 -> 9.4.2 problem: could not access status of transaction 1  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Responses Re: [HACKERS] Re: 9.4.1 -> 9.4.2 problem: could not access status of transaction 1  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-general
On 2015-05-30 00:52:37 -0300, Alvaro Herrera wrote:
> Andres Freund wrote:
>
> > I considered for a second whether the solution for that could be to not
> > truncate while inconsistent - but I think that doesn't solve anything as
> > then we can end up with directories where every single offsets/member
> > file exists.
>
> Hang on a minute.  We don't need to scan any files to determine the
> truncate point for offsets; we have the valid range for them in
> pg_control, as nextMulti + oldestMulti.  And using those end points, we
> can look for the offsets corresponding to each, and determine the member
> files corresponding to the whole set; it doesn't matter what other files
> exist, we just remove them all.  In other words, maybe we can get away
> with considering truncation separately for offset and members on
> recovery: do it like today for offsets (i.e. at each restartpoint), but
> do it only in TrimMultiXact for members.

Are oldestMulti and nextMulti - 1 really suitable for this? Are both
actually guaranteed to exist in the offsets SLRU and to be valid? Hm. I
guess you intend to simply truncate everything else, but only in
offsets?
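To make the range under discussion concrete, here is a minimal C sketch of how a multixact ID maps to an offsets segment file, and of a wraparound-aware check for whether a segment lies in [oldestMulti, nextMulti - 1]. It assumes the default 8 kB BLCKSZ, 4-byte offset entries and 32 pages per SLRU segment, which gives 65536 offsets segments (files 0000..FFFF). The function names are hypothetical, not PostgreSQL's, and the sketch does not settle whether those end points are actually guaranteed to be valid.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions: BLCKSZ = 8192, 4-byte MultiXactOffset entries (2048 per page),
 * 32 pages per SLRU segment file. Names are illustrative, not PostgreSQL's. */
#define OFFSETS_PER_PAGE     (8192 / 4)
#define PAGES_PER_SEGMENT    32
#define MULTIS_PER_SEGMENT   ((uint32_t) (OFFSETS_PER_PAGE * PAGES_PER_SEGMENT))
/* 2^32 multixact IDs / 65536 per segment = 65536 segment files. */
#define NUM_OFFSETS_SEGMENTS ((uint32_t) ((UINT64_C(1) << 32) / MULTIS_PER_SEGMENT))
#define FIRST_MULTI          1u   /* 0 is InvalidMultiXactId and is skipped */

typedef uint32_t MultiXactId;

/* Offsets segment number (its file name, in hex) holding this multi's entry. */
static uint32_t offsets_segment(MultiXactId multi)
{
    return (multi / OFFSETS_PER_PAGE) / PAGES_PER_SEGMENT;
}

/* nextMulti - 1, accounting for the ID counter skipping 0 on wraparound. */
static MultiXactId prev_multi(MultiXactId next)
{
    assert(next != 0);
    return next == FIRST_MULTI ? UINT32_MAX : next - 1;
}

/* Nonzero if seg lies in the circular, inclusive segment range covering
 * [oldestMulti, nextMulti - 1]. Assumes at least one multi is in use. */
static int offsets_segment_is_live(uint32_t seg, MultiXactId oldestMulti,
                                   MultiXactId nextMulti)
{
    assert(oldestMulti != nextMulti);
    uint32_t oldest_seg = offsets_segment(oldestMulti);
    uint32_t newest_seg = offsets_segment(prev_multi(nextMulti));
    /* Unsigned subtraction wraps mod 2^32, a multiple of the segment count. */
    uint32_t dist = (seg - oldest_seg) % NUM_OFFSETS_SEGMENTS;
    uint32_t span = (newest_seg - oldest_seg) % NUM_OFFSETS_SEGMENTS;
    return dist <= span;
}
```

With the hypothetical values oldestMulti = 4000000000 and nextMulti = 70000 (the counter has wrapped), the live range is segments EE6B through 0001, so under this reading every other offsets file could be removed regardless of what else exists in the directory.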

> One argument against this idea is that we may not want to keep a full
> set of member files on standbys (due to disk space usage), but that's
> what will happen unless we truncate during replay.

I think that argument is pretty much the death knell.

> > I think at least for 9.5+ we should a) invent proper truncation records
> > for pg_multixact b) start storing oldestValidMultiOffset in pg_control.
> > The current hack of scanning the directories to get knowledge we should
> > have is a pretty bad hack, and we should not continue using it forever.
> > I think we might end up needing to do a) even in the backbranches.
>
> Definitely agree with WAL-logging truncations; also +1 on backpatching
> that to 9.3.  We already have experience with adding extra WAL records
> on minor releases, and it didn't seem to have bitten too hard.

I'm inclined to agree. My only problem is that I'm not sure whether we
can find a way of doing all this without adding a pg_control field. Let
me try to sketch this out:

1) On the master, we continue using
   SlruScanDirectory(SlruScanDirCbFindEarliest) to find the oldest
   offsets segment to truncate. Alternatively, if we determine it to be
   safe, we could use oldestMulti to find that.
2) SlruScanDirCbRemoveMembers is changed to return the range of members
   to remove, instead of doing the removal itself.
3) We WAL-log [oldest offsets segment guaranteed not to be alive,
   nextMulti) for offsets, and [oldest members segment guaranteed not to
   be alive, nextMultiOffset) for members, and redo the truncations for
   the entire range during recovery.
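Step 3 could be sketched, under assumptions, as a hypothetical record carrying half-open segment ranges plus a replay helper that decides, for each segment file present on the standby, whether the record covers it. This is an illustration, not a proposed format: the struct and function names are invented, the end points are simply whatever the master chooses to log (the steps above leave their exact choice open), and a real redo routine would unlink files rather than fill an array. nsegs is the number of segment slots in the circular space (65536 for offsets with default 8 kB blocks); the members SLRU's exact wraparound point is not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed offsets segment count with BLCKSZ = 8192 (files 0000..FFFF). */
#define NUM_OFFSETS_SEGMENTS 65536u

/* Hypothetical truncation record: each pair is a half-open range
 * [start, end) of segment numbers to remove, in circular segment space. */
typedef struct MultiXactTruncateRecord
{
    uint32_t offsets_start_seg;
    uint32_t offsets_end_seg;
    uint32_t members_start_seg;
    uint32_t members_end_seg;
} MultiXactTruncateRecord;

/* Nonzero if seg is in the circular half-open range [start, end) of nsegs
 * slots; start == end denotes an empty range. */
static int seg_in_range(uint32_t seg, uint32_t start, uint32_t end, uint32_t nsegs)
{
    assert(seg < nsegs && start < nsegs && end < nsegs);
    uint32_t dist = (seg + nsegs - start) % nsegs;
    uint32_t len = (end + nsegs - start) % nsegs;
    return dist < len;
}

/* Replay sketch for the offsets half of the record: present[] stands in for
 * a directory scan on the standby; removed[i] is set for covered segments.
 * Returns how many segments would be removed. */
static size_t redo_truncate_offsets(const MultiXactTruncateRecord *rec,
                                    const uint32_t *present, size_t n,
                                    int *removed)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
    {
        removed[i] = seg_in_range(present[i], rec->offsets_start_seg,
                                  rec->offsets_end_seg, NUM_OFFSETS_SEGMENTS);
        count += (size_t) removed[i];
    }
    return count;
}
```

The members half would be replayed the same way using members_start_seg and members_end_seg with that SLRU's own segment count. Replaying the whole logged range, rather than re-deriving it from whatever files the standby happens to have, is what removes the dependence on scanning directories during recovery.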

I'm pretty tired right now, but this sounds doable.

Greetings,

Andres Freund

