Andres Freund wrote:
> I considered for a second whether the solution for that could be to not
> truncate while inconsistent - but I think that doesn't solve anything as
> then we can end up with directories where every single offsets/member
> file exists.

Hang on a minute. We don't need to scan any files to determine the
truncate point for offsets; we have the valid range for them in
pg_control, as oldestMulti and nextMulti. And using those end points, we
can look for the offsets corresponding to each, and determine the member
files corresponding to the whole set; it doesn't matter what other files
exist, we just remove them all. In other words, maybe we can get away
with considering truncation separately for offsets and members on
recovery: do it like today for offsets (i.e. at each restartpoint), but
do it only in TrimMultiXact for members.
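
To make the members part concrete, here is a rough sketch (not actual
multixact.c code). It assumes the member offsets for oldestMulti and
nextMulti have already been looked up in the offsets SLRU. The constants
are assumed values for a default 8 kB block build, and the function names
member_segment and segment_is_live are hypothetical. Any member segment
file for which segment_is_live returns false could be removed, whatever
else is in the directory.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for a default 8 kB block build; illustrative only. */
#define MEMBERS_PER_PAGE    1636u
#define PAGES_PER_SEGMENT   32u
#define MEMBERS_PER_SEGMENT (MEMBERS_PER_PAGE * PAGES_PER_SEGMENT)

/* Number of the segment file that holds a given member offset. */
static uint32_t
member_segment(uint32_t member_offset)
{
    return member_offset / MEMBERS_PER_SEGMENT;
}

/*
 * Is segment "seg" inside the live range [oldest_seg, next_seg]?
 * The member space wraps around, so the live range may wrap too.
 */
static bool
segment_is_live(uint32_t seg, uint32_t oldest_seg, uint32_t next_seg)
{
    if (oldest_seg <= next_seg)
        return seg >= oldest_seg && seg <= next_seg;

    /* Wrapped range: live segments sit at both ends of the space. */
    return seg >= oldest_seg || seg <= next_seg;
}
```

The wraparound branch is the part that matters here: the end points come
from pg_control rather than from a directory scan, so stray files left
over from an earlier cycle cannot confuse the decision.
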
One argument against this idea is that we may not want to keep a full
set of member files on standbys (due to disk space usage), but that's
what will happen unless we truncate during replay.

> I think at least for 9.5+ we should a) invent proper truncation records
> for pg_multixact b) start storing oldestValidMultiOffset in pg_control.
> The current hack of scanning the directories to get knowledge we should
> have is a pretty bad hack, and we should not continue using it forever.
> I think we might end up needing to do a) even in the backbranches.

Definitely agree with WAL-logging truncations; also +1 on backpatching
that to 9.3. We already have experience with adding extra WAL records
in minor releases, and it doesn't seem to have bitten too hard.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services