Thread: Rework the way multixact truncations work

Rework the way multixact truncations work

From

Andres Freund

Date:

21 June 2015, 19:24:21

Hi,

As discussed on list, over IM and in person at pgcon I want to make
multixact truncations be WAL logged to address various bugs.

Since that's a comparatively large and invasive change I thought it'd be
a good idea to start a new thread instead of burying it in a already
long thread.

Here's the commit message which hopefully explains what's being changed
and why:

    Rework the way multixact truncations work.

    The fact that multixact truncations are not WAL logged has caused a fair
    share of problems. Amongst others it requires to do computations during
    recovery while the database is not in a consistent state, delaying
    truncations till checkpoints, and handling members being truncated, but
    offset not.

    We tried to put bandaids on lots of these issues over the last years,
    but it seems time to change course. Thus this patch introduces WAL
    logging for truncation, even in the back branches.

    This allows:
    1) to perform the truncation directly during VACUUM, instead of delaying it
       to the checkpoint.
    2) to avoid looking at the offsets SLRU for truncation during recovery,
       we can just use the master's values.
    3) simplify a fair amount of logic to keep in memory limits straight,
       this has gotten much easier

    During the course of fixing this a bunch of bugs had to be fixed:
    1) Data was not purged from memory the member's slru before deleting
       segments. This happend to be hard or impossible to hit due to the
       interlock between checkpoints and truncation.
    2) find_multixact_start() relied on SimpleLruDoesPhysicalPageExist - but
       that doesn't work for offsets that haven't yet been flushed to
       disk. Flush out before running to fix. Not pretty, but it feels
       slightly safer to only make decisions based on on-disk state.

    To handle the case of an updated standby replaying WAL from a not-yet
    upgraded primary we have to recognize that situation and use "old style"
    truncation (i.e. looking at the SLRUs) during WAL replay. In contrast to
    before this now happens in the startup process, when replaying a
    checkpoint record, instead of the checkpointer. Doing this in the
    restartpoint was incorrect, they can happen much later than the original
    checkpoint, thereby leading to wraparound. It's also more in line to how
    the WAL logging now works.

    To avoid "multixact_redo: unknown op code 48" errors standbys should be
    upgraded before primaries. This needs to be expressed clearly in the
    release notes.

    Backpatch to 9.3, where the use of multixacts was expanded. Arguably
    this could be backpatched further, but there doesn't seem to be
    sufficient benefit to outweigh the risk of applying a significantly
    different patch there.


I've tested this a bunch, including using a newer standby against a
older master and such. What I have yet to test is that the concurrency
protections against multiple backends truncating at the same time are
correct.

It'd be very welcome to see some wider testing and review on this.

I've attached three commits:
0001: Add functions to burn through multixacts - that should get its own file.
0002: Lower the lower bound limits for *_freeze_max_age - I think we should
      just do that. There really is no reason for the current limits
      and they make testing hard and force space wastage.
0003: The actual truncation patch.

Greetings,

Andres Freund

Hi,

I'm still working on this patch. I found a bunch of issues. Non of them
super critical and some pre-existing; but nonetheless I don't feel like
it's ready to push yet. So we'll have the first alpha without those
fixes :(

The biggest issues so far are:

* There's, both in the posted patch and as-is in all the branches, a lot
  of places that aren't actually safe against a concurrent truncation. A
  bunch of places grab oldestMultiXactId with an lwlock held, release
  it, and then make decisions based on that.

  A bunch of places (including the find_multixact_start callsites) is
  actually vulnerable to that, for a bunch of others it's less likely to
  be a problem.  All callers of GetMultiXactIdMembers() are vulnerable,
  but with the exception of pg_get_multixact_members() they'll never
  pass in a value that's older than the new oldest member value.

  That's a problem for the current branches. Afaics that can lead to a
  useless round of emergency autovacs via
  SetMultiXactIdLimit()->SetOffsetVacuumLimit().

  SetOffsetVacuumLimit() can protect easily agains that by taking the
  new MultiXactTruncationLock lock. We could do the same for
  pg_get_multixact_members() - afaics the only caller that'll look up too
  old values otherwise - but I don't think it matters, you'll get a
  slightly obscure error if you access a too old xact that's just being
  truncated away and that is that.

* There was no update of the in-memory oldest* values when replaying a
  truncation. That's problematic if a standby is promoted after
  replaying a truncation record, but before a following checkpoint
  record. This would be fixed by an emergency autovacuum, but that's
  obviously not nice. Trivial to fix.

* The in-memory oldest values were updated *after* the truncation
  happened. It's unlikely to matter in reality, but it's safer to
  update them before, so a concurrent GetMultiXactIdMembers() of stuff
  from before the truncation will get the proper error.

* PerformMembersTruncation() probably confused Alvaro because it wasn't
  actually correct - there's no need to have the segment containing old
  oldestOffset (in contrast to oldestOffsetAlive) survive. Except
  leaking a segment that's harmless, but obviously not desirable.

Additionally I'm changing some stuff, some requested by review:
* xl_multixact_truncate's members are now called
  (start|end)Trunc(Off|Memb)
* (start|end)TruncOff have the appropriate type now
* typo fixes
* comment improvements
* pgindent

New version attached.

Greetings,

Andres Freund

On 2015-09-21 16:36:03 +0200, Andres Freund wrote:
> Agreed. I'll update the patch.

Here's updated patches against master. These include the "legacy"
truncation support. There's no meaningful functional differences in this
version except addressing the review comments that I agreed with, and a
fair amount of additional polishing.

I've not:
* removed legacy truncation support
* removed SimpleLruFlush() from find_multixact_start() - imo it's easier
  to reason about the system when the disk state is in sync with the in
  memory state.
* removed the interval syntax from debug messages and xlogdump - they're
  a fair bit more concise and the target audience of those will be able
  to figure it out.
* unsplit DetermineSafeOldestOffset & SetOffsetVacuumLimit - imo the
  separate functions don't make sense anymore now that limits and
  truncations aren't as separate anymore.

What I've tested is the following:
* continous burning of multis, both triggered via members and offsets
* a standby keeping up when the primary is old
* a standby keeping up when the primary is new
* basebackups made while a new primary is under load
* verified that we properly PANIC when a truncation record is replayed
  in an old standby.

Does anybody have additional tests in mind?

I plan to push 0002 fairly soon, it seemed to be pretty
uncontroversial. I'll then work tomorrow afternoon on producing branch
specific versions of 0003 and on producing 0004 removing all the legacy
stuff for 9.5 & master.

Greetings,

Andres Freund

The comment on top of TrimMultiXact states that "no locks are needed
here", but then goes on to grab a few locks. I think we should remove
the comment, or rephrase it to state that we still grab them for
consistency or whatever; or perhaps even remove the lock acquisitions.
(I think the comment is still true: by the time TrimMultiXact runs,
we're out of recovery but not yet running, so it's not possible for
anyone to try to do anything multixact-related.)

I wonder if it would be cleaner to move the setting of finishedStartup
down to just before calling SetMultiXactIdLimit, instead of at the top
of the function.

It's a bit odd that SetMultiXactIdLimit has the "finishedStartup" test
so low. Why bother setting all those local variables only to bail out?
I think it would make more sense to just do it at the top. The only
thing you lose AFAICS is that elog(DEBUG1) message -- is that worth it?
Also, the fact that finishedStartup itself is read without a lock at
least merits a comment.

In MultiXactAdvanceOldest, the test for sawTruncationinCkptCycle seems
reversed? if (!MultiXactState->sawTruncationInCkptCycle)
surely we should be doing truncation if it's set?

Honestly, I wonder whether this message ereport(LOG, (errmsg("performing legacy multixact
truncation"), errdetail("Legacy truncations are sometimes performed when replaying WAL from an older
primary."), errhint("Upgrade the primary, it is susceptible to data corruption.")));

shouldn't rather be a PANIC. (The main reason not to, I think, is that
once you see this, there is no way to put the standby in a working state
without recloning).

I think the prevOldestOffsetKnown test in line 2667 ("if we failed to
get ...") is better expressed as an else-if of the previous "if" block.

I think the two "there are NO MultiXacts" cases in TruncateMultiXact
would benefit in readability from adding braces around the lone
statement (and moving the comment to the line prior).

If the find_multixact_start(oldestMulti) call in TruncateMultiXact
fails, what recourse does the user have? I wonder if the elog() should
be a FATAL instead of just LOG. It's not like it would work on a
subsequent run, is it?

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Rework the way multixact truncations work

From

Andres Freund

Date:

23 September 2015, 18:49:01

On 2015-09-23 15:03:05 -0300, Alvaro Herrera wrote:
> The comment on top of TrimMultiXact states that "no locks are needed
> here", but then goes on to grab a few locks.

Hm. Yea. Although that was the case before.

> It's a bit odd that SetMultiXactIdLimit has the "finishedStartup" test
> so low.  Why bother setting all those local variables only to bail
> out?

Hm. Doesn't seem to matter much to me, but I can change it.

> In MultiXactAdvanceOldest, the test for sawTruncationinCkptCycle seems
> reversed?
>         if (!MultiXactState->sawTruncationInCkptCycle)
> surely we should be doing truncation if it's set?

No, that's correct. If there was a checkpoint cycle where oldestMulti
advanced without seing a truncation record we need to perform a legacy
truncation.

> Honestly, I wonder whether this message
>             ereport(LOG,
>                     (errmsg("performing legacy multixact truncation"),
>                      errdetail("Legacy truncations are sometimes performed when replaying WAL from an older
primary."),
>                      errhint("Upgrade the primary, it is susceptible to data corruption.")));
> shouldn't rather be a PANIC.  (The main reason not to, I think, is that
> once you see this, there is no way to put the standby in a working state
> without recloning).

Huh? The behaviour in that case is still better than what we have in
9.3+ today (not delayed till the restartpoint). Don't see why that
should be a panic. That'd imo make it pretty much impossible to upgrade
a pair of primary/master where you normally upgrade the standby first?

This is all moot given Robert's objection to backpatching this to
9.3/4.

> If the find_multixact_start(oldestMulti) call in TruncateMultiXact
> fails, what recourse does the user have?  I wonder if the elog() should
> be a FATAL instead of just LOG.  It's not like it would work on a
> subsequent run, is it?

It currently only LOGs, I don't want to change that. The cases where we
currently know it's possible to hit this, it should be fixed by the next
set of emergency autovacuums (which we trigger).

Thanks for the look,

Andres

Re: Rework the way multixact truncations work

From

Alvaro Herrera

Date:

23 September 2015, 18:58:24

Andres Freund wrote:
> On 2015-09-23 15:03:05 -0300, Alvaro Herrera wrote:

> > Honestly, I wonder whether this message
> >             ereport(LOG,
> >                     (errmsg("performing legacy multixact truncation"),
> >                      errdetail("Legacy truncations are sometimes performed when replaying WAL from an older
primary."),
> >                      errhint("Upgrade the primary, it is susceptible to data corruption.")));
> > shouldn't rather be a PANIC.  (The main reason not to, I think, is that
> > once you see this, there is no way to put the standby in a working state
> > without recloning).
> 
> Huh? The behaviour in that case is still better than what we have in
> 9.3+ today (not delayed till the restartpoint). Don't see why that
> should be a panic. That'd imo make it pretty much impossible to upgrade
> a pair of primary/master where you normally upgrade the standby first?
> 
> This is all moot given Robert's objection to backpatching this to
> 9.3/4.

I think we need to make a decision here.  Is this a terribly serious
bug/misdesign that needs addressing?  If so, we need to backpatch.  If
not, then by all means lets leave it alone.  I don't think it is a good
idea to leave it open if we think it's serious, which is what I think is
happening.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Rework the way multixact truncations work

From

Andres Freund

Date:

23 September 2015, 19:00:53

On 2015-09-23 15:57:02 -0300, Alvaro Herrera wrote:
> I think we need to make a decision here.  Is this a terribly serious
> bug/misdesign that needs addressing?

Imo yes. Not sure about terribly, but definitely serious. It's several
data loss bugs in one package.

> If so, we need to backpatch.  If not, then by all means lets leave it
> alone.  I don't think it is a good idea to leave it open if we think
> it's serious, which is what I think is happening.

Right, but I don't want to backpatch this over an objection, and it
doesn't seem like I have a chance to convince Robert that it'd be a good
idea. So it'll be 9.5+master for now.

Greetings,

Andres Freund

Re: Rework the way multixact truncations work

From

Andres Freund

Date:

24 September 2015, 14:35:05

Hi,

On 2015-09-23 15:03:05 -0300, Alvaro Herrera wrote:
> I wonder if it would be cleaner to move the setting of finishedStartup
> down to just before calling SetMultiXactIdLimit, instead of at the top
> of the function.

Done. I don't think it makes much of a difference, but there's really no
reason not to change it.

> It's a bit odd that SetMultiXactIdLimit has the "finishedStartup" test
> so low.  Why bother setting all those local variables only to bail
> out?

But we do more than set local variables? We actually set some in-memory
limits (MultiXactState->multiVacLimit et al). What we can't do is to
startup the members wraparound protection because that requires
accessing the SLRUs.

Perhaps we should, independently of this patch really, rename
SetOffsetVacuumLimit() - it may be rather confusing that it actually is
about members/? The current name is correct, but also a bit
confusing. ComputeMembersVacuumLimits()?

> In MultiXactAdvanceOldest, the test for sawTruncationinCkptCycle seems
> reversed?
>         if (!MultiXactState->sawTruncationInCkptCycle)
> surely we should be doing truncation if it's set?

I wanted to add a comment explaining this, but the existing job seems to
do a fair job at that:
        /*
         * If there has been a truncation on the master, detected by seeing a
         * moving oldestMulti, without a corresponding truncation record, we
         * know that the primary is still running an older version of postgres
         * that doesn't yet log multixact truncations. So perform the
         * truncation ourselves.
         */

I've done some additional comment smithing.

Attached is 0002 (prev 0003) including the legacy truncation support,
and 0003 removing that and bumping page magic. I'm slightly inclined to
commit them separately (to 9.5 & master) so that we have something to
backpatch from.

Greetings,

Andres Freund

On Thu, Oct 29, 2015 at 08:46:52AM +0100, Andres Freund wrote:
> On October 29, 2015 7:59:03 AM GMT+01:00, Noah Misch <noah@leadboat.com> wrote:
> >That helps; thanks.  Your design seems good.  I've located only insipid
> >defects.
>
> Great!
>
> > I propose to save some time by writing a patch series
> >eliminating
> >them, which you could hopefully review.  Does that sound good?
>
> Yes, it does.

I have pushed a stack of branches to https://github.com/nmisch/postgresql.git:

mxt0-revert - reverts commits 4f627f8 and aa29c1c
mxt1-disk-independent - see below
mxt2-cosmetic - update already-wrong comments and formatting
mxt3-main - replaces commit 4f627f8
mxt4-rm-legacy - replaces commit aa29c1c

The plan is to squash each branch into one PostgreSQL commit.  In addition to
examining overall "git diff mxt2-cosmetic mxt3-main", I recommend reviewing
itemized changes and commit log entries in "git log -p --reverse --no-merges
mxt2-cosmetic..mxt3-main".  In particular, when a change involves something
you discussed upthread or was otherwise not obvious, I put a statement of
rationale in the commit log.

> +     * Make sure to only attempt truncation if there's values to truncate
> +     * away. In normal processing values shouldn't go backwards, but there's
> +     * some corner cases (due to bugs) where that's possible.

Which bugs are those?  I would like to include more detail if available.

If anything here requires careful study, it's the small mxt1-disk-independent
change, which I have also attached in patch form.  That patch removes the
SlruScanDirCbFindEarliest() test from TruncateMultiXact(), which in turn makes
multixact control decisions independent of whether TruncateMultiXact() is
successful at unlinking segments.  Today, one undeletable segment file can
cause TruncateMultiXact() to give up on truncation completely for a span of
hundreds of millions of MultiXactId.  Patched multixact.c will, like CLOG,
make its decisions strictly based on the content of files it expects to exist.
It will no longer depend on the absence of files it hopes will not exist.

To aid in explaining the change's effects, I will define some terms.  A
"logical wrap" occurs when no range of 2^31 integers covers the set of
MultiXactId stored in xmax fields.  A "file-level wrap" occurs when there
exists a pair of pg_multixact/offsets segment files such that:

  | segno_a * SLRU_PAGES_PER_SEGMENT * MULTIXACT_OFFSETS_PER_PAGE -
    segno_b * SLRU_PAGES_PER_SEGMENT * MULTIXACT_OFFSETS_PER_PAGE | > 2^31

A logical wrap implies either a file-level wrap or missing visibility
metadata, but a file-level wrap does not imply other consequences.  The
SlruScanDirCbFindEarliest() test is usually redundant with
find_multixact_start(), because MultiXactIdPrecedes(oldestMXact, earliest)
almost implies that find_multixact_start() will fail.  The exception arises
when pg_multixact/offsets files compose a file-level wrap, which can happen
when TruncateMultiXact() fails to unlink segments as planned.  When it does
happen, the result of SlruScanDirCbFindEarliest(), and therefore the computed
"earliest" value, is undefined.  (This outcome is connected to our requirement
to use only half the pg_clog or pg_multixact/offsets address space at any one
time.  The PagePrecedes callbacks for these SLRUs cease to be transitive if
more than half their address space is in use.)

The SlruScanDirCbFindEarliest() test can be helpful when a file-level wrap
coexists with incorrect oldestMultiXactId (extant xmax values define "correct
oldestMultiXactId").  If we're lucky with readdir() order, the test will block
truncation so we don't delete still-needed segments.  I am content to lose
that, because (a) the code isn't reliable for (or even directed toward) that
purpose and (b) sites running on today's latest point releases no longer have
incorrect oldestMultiXactId.

Attachment

mxact-trunc-disk-independent-v1.patch

Re: Rework the way multixact truncations work

From

Andres Freund

Date:

08 November 2015, 19:12:01

On November 8, 2015 12:54:07 AM PST, Noah Misch <noah@leadboat.com> wrote:

>I have pushed a stack of branches to
>https://github.com/nmisch/postgresql.git:
>
>mxt0-revert - reverts commits 4f627f8 and aa29c1c
>mxt1-disk-independent - see below
>mxt2-cosmetic - update already-wrong comments and formatting
>mxt3-main - replaces commit 4f627f8
>mxt4-rm-legacy - replaces commit aa29c1c
>
>The plan is to squash each branch into one PostgreSQL commit.  In
>addition to
>examining overall "git diff mxt2-cosmetic mxt3-main", I recommend
>reviewing
>itemized changes and commit log entries in "git log -p --reverse
>--no-merges
>mxt2-cosmetic..mxt3-main".  In particular, when a change involves
>something
>you discussed upthread or was otherwise not obvious, I put a statement
>of
>rationale in the commit log.

I'm not following along right now - in order to make cleanups the plan is to revert a couple commits and then redo them
prettyfied?

Andres
Hi
--- 
Please excuse brevity and formatting - I am writing this on my mobile phone.

Re: Rework the way multixact truncations work

From

Noah Misch

Date:

08 November 2015, 19:52:15

On Sun, Nov 08, 2015 at 11:11:42AM -0800, Andres Freund wrote:
> On November 8, 2015 12:54:07 AM PST, Noah Misch <noah@leadboat.com> wrote:
> 
> >I have pushed a stack of branches to
> >https://github.com/nmisch/postgresql.git:
> >
> >mxt0-revert - reverts commits 4f627f8 and aa29c1c
> >mxt1-disk-independent - see below
> >mxt2-cosmetic - update already-wrong comments and formatting
> >mxt3-main - replaces commit 4f627f8
> >mxt4-rm-legacy - replaces commit aa29c1c
> >
> >The plan is to squash each branch into one PostgreSQL commit.  In
> >addition to
> >examining overall "git diff mxt2-cosmetic mxt3-main", I recommend
> >reviewing
> >itemized changes and commit log entries in "git log -p --reverse
> >--no-merges
> >mxt2-cosmetic..mxt3-main".  In particular, when a change involves
> >something
> >you discussed upthread or was otherwise not obvious, I put a statement
> >of
> >rationale in the commit log.
> 
> I'm not following along right now - in order to make cleanups the plan is to revert a couple commits and then redo
themprettyfied?
 

Yes, essentially.  Given the volume of updates, this seemed neater than
framing those updates as in-tree incremental development.

Re: Rework the way multixact truncations work

From

Andres Freund

Date:

08 November 2015, 19:59:42

On November 8, 2015 11:52:05 AM PST, Noah Misch <noah@leadboat.com> wrote:
>On Sun, Nov 08, 2015 at 11:11:42AM -0800, Andres Freund wrote:
>> On November 8, 2015 12:54:07 AM PST, Noah Misch <noah@leadboat.com>
>wrote:
>> 
>> >I have pushed a stack of branches to
>> >https://github.com/nmisch/postgresql.git:
>> >
>> >mxt0-revert - reverts commits 4f627f8 and aa29c1c
>> >mxt1-disk-independent - see below
>> >mxt2-cosmetic - update already-wrong comments and formatting
>> >mxt3-main - replaces commit 4f627f8
>> >mxt4-rm-legacy - replaces commit aa29c1c
>> >
>> >The plan is to squash each branch into one PostgreSQL commit.  In
>> >addition to
>> >examining overall "git diff mxt2-cosmetic mxt3-main", I recommend
>> >reviewing
>> >itemized changes and commit log entries in "git log -p --reverse
>> >--no-merges
>> >mxt2-cosmetic..mxt3-main".  In particular, when a change involves
>> >something
>> >you discussed upthread or was otherwise not obvious, I put a
>statement
>> >of
>> >rationale in the commit log.
>> 
>> I'm not following along right now - in order to make cleanups the
>plan is to revert a couple commits and then redo them prettyfied?
>
>Yes, essentially.  Given the volume of updates, this seemed neater than
>framing those updates as in-tree incremental development.

I don't like that plan. I don't have a problem doing that in some development branch somewhere, but I fail to see any
benefitdoing that in 9.5/master. It'll just make the history more convoluted for no benefit.
 

I'll obviously still review the changes.

Even for review it's nor particularly convenient, because now the entirety of the large changes essentially needs to be
reviewedanew, given they're not the same. 
 

--- 
Please excuse brevity and formatting - I am writing this on my mobile phone.

Re: Rework the way multixact truncations work

From

Noah Misch

Date:

16 November 2015, 08:21:05

On Sun, Nov 08, 2015 at 11:59:33AM -0800, Andres Freund wrote:
> On November 8, 2015 11:52:05 AM PST, Noah Misch <noah@leadboat.com> wrote:
> >On Sun, Nov 08, 2015 at 11:11:42AM -0800, Andres Freund wrote:
> >> On November 8, 2015 12:54:07 AM PST, Noah Misch <noah@leadboat.com> wrote:
> >> 
> >> >I have pushed a stack of branches to
> >> >https://github.com/nmisch/postgresql.git:
> >> >
> >> >mxt0-revert - reverts commits 4f627f8 and aa29c1c
> >> >mxt1-disk-independent - see below
> >> >mxt2-cosmetic - update already-wrong comments and formatting
> >> >mxt3-main - replaces commit 4f627f8
> >> >mxt4-rm-legacy - replaces commit aa29c1c
> >> >
> >> >The plan is to squash each branch into one PostgreSQL commit.  In
> >> >addition to
> >> >examining overall "git diff mxt2-cosmetic mxt3-main", I recommend
> >> >reviewing
> >> >itemized changes and commit log entries in "git log -p --reverse
> >> >--no-merges
> >> >mxt2-cosmetic..mxt3-main".  In particular, when a change involves
> >> >something
> >> >you discussed upthread or was otherwise not obvious, I put a
> >statement
> >> >of
> >> >rationale in the commit log.
> >> 
> >> I'm not following along right now - in order to make cleanups the
> >plan is to revert a couple commits and then redo them prettyfied?
> >
> >Yes, essentially.  Given the volume of updates, this seemed neater than
> >framing those updates as in-tree incremental development.
> 
> I don't like that plan. I don't have a problem doing that in some development branch somewhere, but I fail to see any
benefitdoing that in 9.5/master. It'll just make the history more convoluted for no benefit.

> 
> I'll obviously still review the changes.

Cleanliness of history is precisely why I did it this way.  If I had framed
the changes as in-tree incremental development, no one "git diff" command
would show the truncation rework or a coherent subset.  To review the whole,
students of this code might resort to a cherry-pick of the repair commits onto
aa29c1c.  That, too, proves dissatisfying; the history would nowhere carry a
finished version of legacy truncation support.  A hacker opting to back-patch
in the future, as commit 4f627f8 contemplated, would need to dig through this
thread for the bits added in mxt3-main and removed in mxt4-rm-legacy.

The benefits may become clearer as you continue to review the branches.

> Even for review it's nor particularly convenient, because now the entirety of the large changes essentially needs to
bereviewed anew, given they're not the same. 

Agreed; I optimized for future readers, and I don't doubt this is less
convenient for you and for others already familiar with commits 4f627f8 and
aa29c1c.  I published branches, not squashed patches, mostly because I think
the individual branch commits will facilitate your study of the changes.  I
admit the cost to you remains high.

Re: Rework the way multixact truncations work

From

Peter Geoghegan

Date:

23 November 2015, 19:44:52

On Sun, Nov 8, 2015 at 11:52 AM, Noah Misch <noah@leadboat.com> wrote:
>> I'm not following along right now - in order to make cleanups the plan is to revert a couple commits and then redo
themprettyfied?

>
> Yes, essentially.  Given the volume of updates, this seemed neater than
> framing those updates as in-tree incremental development.

I think that's an odd way of representing this work. I tend to
remember roughly when major things were committed even years later. An
outright revert should represent a total back out of the original
commit IMV. Otherwise, a git blame can be quite misleading. I can
imagine questioning my recollection, even when it is accurate, if only
because I don't expect this.

-- 
Peter Geoghegan

Re: Rework the way multixact truncations work

From

Noah Misch

Date:

27 November 2015, 22:16:41

On Mon, Nov 23, 2015 at 11:44:45AM -0800, Peter Geoghegan wrote:
> On Sun, Nov 8, 2015 at 11:52 AM, Noah Misch <noah@leadboat.com> wrote:
> >> I'm not following along right now - in order to make cleanups the plan is to revert a couple commits and then redo
themprettyfied?
 
> >
> > Yes, essentially.  Given the volume of updates, this seemed neater than
> > framing those updates as in-tree incremental development.
> 
> I think that's an odd way of representing this work. I tend to
> remember roughly when major things were committed even years later. An
> outright revert should represent a total back out of the original
> commit IMV. Otherwise, a git blame can be quite misleading.

I think you're saying that "clearer git blame" is a more-important reason than
"volume of updates" for preferring an outright revert over in-tree incremental
development.  Fair preference.  If that's a correct reading of your message,
then we do agree on the bottom line.

Re: Rework the way multixact truncations work

From

Robert Haas

Date:

01 December 2015, 22:07:21

On Fri, Nov 27, 2015 at 5:16 PM, Noah Misch <noah@leadboat.com> wrote:
> On Mon, Nov 23, 2015 at 11:44:45AM -0800, Peter Geoghegan wrote:
>> On Sun, Nov 8, 2015 at 11:52 AM, Noah Misch <noah@leadboat.com> wrote:
>> >> I'm not following along right now - in order to make cleanups the plan is to revert a couple commits and then
redothem prettyfied?
 
>> >
>> > Yes, essentially.  Given the volume of updates, this seemed neater than
>> > framing those updates as in-tree incremental development.
>>
>> I think that's an odd way of representing this work. I tend to
>> remember roughly when major things were committed even years later. An
>> outright revert should represent a total back out of the original
>> commit IMV. Otherwise, a git blame can be quite misleading.
>
> I think you're saying that "clearer git blame" is a more-important reason than
> "volume of updates" for preferring an outright revert over in-tree incremental
> development.  Fair preference.  If that's a correct reading of your message,
> then we do agree on the bottom line.

Hmm.  I read Peter's message as agreeing with Andres rather than with
you.  And I have to say I agree with Andres as well.  I think it's
weird to back a commit out only to put a bunch of very similar stuff
back in.  Even figuring out what you've actually changed here seems
rather hard.  I couldn't get github to give me a diff showing your
changes vs. master.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Rework the way multixact truncations work

From

Peter Geoghegan

Date:

01 December 2015, 22:15:18

On Tue, Dec 1, 2015 at 2:07 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Hmm.  I read Peter's message as agreeing with Andres rather than with
> you.  And I have to say I agree with Andres as well.  I think it's
> weird to back a commit out only to put a bunch of very similar stuff
> back in.

Your interpretation was correct. I think it's surprising to structure
things this way, especially since we haven't done things this way in
the past.

-- 
Peter Geoghegan

Re: Rework the way multixact truncations work

From

Noah Misch

Date:

02 December 2015, 14:57:34

On Tue, Dec 01, 2015 at 05:07:15PM -0500, Robert Haas wrote:
> I think it's weird to back a commit out only to put a bunch of very similar
> stuff back in.

I agree with that.  If the original patches and their replacements shared 95%
of diff lines in common, we wouldn't be having this conversation.  These
replacements redo closer to 50% of the lines, so the patches are not very
similar by verbatim line comparison.  Even so, let's stipulate that my
proposal is weird.  I'd rather be weird than lose one of the benefits I
articulated upthread, let alone all of them.

My post-commit review of RLS woke me up to the problems of gradually finishing
work in-tree.  By the time I started that review in 2015-06, the base RLS
feature already spanned twenty-two commits.  (That count has since more than
doubled.)  Reviewing each commit threatened to be wasteful, because I would
presumably find matters already fixed later.  I tried to cherry-pick the
twenty-two commits onto a branch, hoping to review the overall diff as "git
diff master...squash-rls", but that yielded complex merge conflicts.  Where
the conflicts were too much, I reviewed entire files instead.  (Granted, no
matter how this thread ends, I do not expect an outcome that opaque.)  Hackers
have been too reticent to revert and redo defect-laden commits.  If doing that
is weird today, let it be normal.

> Even figuring out what you've actually changed here seems
> rather hard.  I couldn't get github to give me a diff showing your
> changes vs. master.

If you, an expert in the 2015-09-26 commits, want to study the key changes I
made, I recommend perusing these outputs:

git remote add nmisch_github https://github.com/nmisch/postgresql.git
git fetch nmisch_github
git diff nmisch_github/mxt0-revert nmisch_github/mxt1-disk-independent
git log -p --reverse --no-merges nmisch_github/mxt2-cosmetic..nmisch_github/mxt3-main

For the overall diff vs. master that you sought:

git remote add community git://git.postgresql.org/git/postgresql.git
git remote add nmisch_github https://github.com/nmisch/postgresql.git
git fetch --multiple community nmisch_github
git diff community/master...nmisch_github/mxt4-rm-legacy

If anyone not an author or reviewer of the 2015-09-26 commits wishes to review
the work, don't read the above diffs; the intermediate states are
uninteresting.  Read these four:

git remote add nmisch_github https://github.com/nmisch/postgresql.git
git fetch nmisch_github
git diff nmisch_github/mxt0-revert nmisch_github/mxt1-disk-independent
git diff nmisch_github/mxt1-disk-independent nmisch_github/mxt2-cosmetic
git diff nmisch_github/mxt2-cosmetic nmisch_github/mxt3-main
git diff nmisch_github/mxt3-main nmisch_github/mxt4-rm-legacy

Thanks,
nm

Re: Rework the way multixact truncations work

From

Andres Freund

Date:

02 December 2015, 15:46:36

On 2015-12-02 09:57:19 -0500, Noah Misch wrote:
> On Tue, Dec 01, 2015 at 05:07:15PM -0500, Robert Haas wrote:
> > I think it's weird to back a commit out only to put a bunch of very similar
> > stuff back in.
>
> I agree with that.  If the original patches and their replacements shared 95%
> of diff lines in common, we wouldn't be having this conversation. These
> replacements redo closer to 50% of the lines, so the patches are not very
> similar by verbatim line comparison.

Which is a huge problem, because it makes it very hard to see what your
changes actually do. And a significant portion of the changes relative
to master aren't particularly critical. Which is easy to see if if a
commit only changes comments, but harder if you see one commit reverting
things, and another redoing most of the same things.

> Hackers have been too reticent to revert and redo defect-laden
> commits.  If doing that is weird today, let it be normal.

Why? Especially if reverting and redoing includes conflicts that mainly
increases the chance of accidental bugs.

> git remote add community git://git.postgresql.org/git/postgresql.git
> git remote add nmisch_github https://github.com/nmisch/postgresql.git
> git fetch --multiple community nmisch_github
> git diff community/master...nmisch_github/mxt4-rm-legacy

That's a nearly useless diff, because it includes a lot of other changes
(218 files changed, 2828 insertions(+), 8742 deletions(-)) made since
you published the changes. What kinda works is
git diff $(git merge-base community/master nmisch_github/mxt4-rm-legacy)..nmisch_github/mxt4-rm-legacy
which shows the diff to the version of master you start off from.

Review of the above diff:
> @@ -2013,7 +2017,7 @@ TrimMultiXact(void)
>  {
>      MultiXactId nextMXact;
>      MultiXactOffset offset;
> -    MultiXactId oldestMXact;
> +    MultiXactId    oldestMXact;

That's a bit weird, given that nextMXact isn't indented...


> @@ -2190,7 +2194,8 @@ MultiXactSetNextMXact(MultiXactId nextMulti,
> -    /*
> -     * Computing the actual limits is only possible once the data directory is
> -     * in a consistent state. There's no need to compute the limits while
> -     * still replaying WAL - no decisions about new multis are made even
> -     * though multixact creations might be replayed. So we'll only do further
> -     * checks after TrimMultiXact() has been called.
> -     */
> +    /* Before the TrimMultiXact() call at end of recovery, skip the rest. */
>      if (!MultiXactState->finishedStartup)
>          return;
> -
>      Assert(!InRecovery);
> 
> -    /* Set limits for offset vacuum. */
> +    /*
> +     * Setting MultiXactState->oldestOffset entails a find_multixact_start()
> +     * call, which is only possible once the data directory is in a consistent
> +     * state. There's no need for an offset limit while still replaying WAL;
> +     * no decisions about new multis are made even though multixact creations
> +     * might be replayed.
> +     */
>      needs_offset_vacuum = SetOffsetVacuumLimit();

I don't really see the benefit of this change. The previous location of
the comment is where we return early, so it seems appropriate to
document the reason there?

>      /*
> @@ -2354,6 +2356,12 @@ MultiXactAdvanceNextMXact(MultiXactId minMulti,
>          debug_elog3(DEBUG2, "MultiXact: setting next multi to %u", minMulti);
>          MultiXactState->nextMXact = minMulti;
>      }
> +
> +    /*
> +     * MultiXactOffsetPrecedes() gives the wrong answer if nextOffset would
> +     * advance more than 2^31 between calls.  Since we get a call for each
> +     * XLOG_MULTIXACT_CREATE_ID, that should never happen.
> +     */

Independent comment improvement. Good idea though.


>  /*
> - * Update our oldestMultiXactId value, but only if it's more recent than what
> - * we had.
> - *
> - * This may only be called during WAL replay.
> + * Update our oldestMultiXactId value, but only if it's more recent than
> + * what we had.  This may only be called during WAL replay.
>   */

Whatever?

>  void
>  MultiXactAdvanceOldest(MultiXactId oldestMulti, Oid oldestMultiDB)
> @@ -2544,14 +2550,13 @@ GetOldestMultiXactId(void)
>  static bool
>  SetOffsetVacuumLimit(void)
>  {
> -    MultiXactId oldestMultiXactId;
> +    MultiXactId    oldestMultiXactId;
>      MultiXactId nextMXact;
> -    MultiXactOffset oldestOffset = 0;    /* placate compiler */
> -    MultiXactOffset prevOldestOffset;
> +    MultiXactOffset oldestOffset = 0;        /* placate compiler */
>      MultiXactOffset nextOffset;
>      bool        oldestOffsetKnown = false;
> +    MultiXactOffset prevOldestOffset;
>      bool        prevOldestOffsetKnown;
> -    MultiXactOffset offsetStopLimit = 0;

I don't see the benefit of the order changes here.

> @@ -2588,40 +2590,50 @@ SetOffsetVacuumLimit(void)
>      else
>      {
>          /*
> -         * Figure out where the oldest existing multixact's offsets are
> -         * stored. Due to bugs in early release of PostgreSQL 9.3.X and 9.4.X,
> -         * the supposedly-earliest multixact might not really exist.  We are
> +         * Figure out where the oldest existing multixact's offsets are stored.
> +         * Due to bugs in early release of PostgreSQL 9.3.X and 9.4.X, the
> +         * supposedly-earliest multixact might not really exist.  We are
>           * careful not to fail in that case.
>           */
>          oldestOffsetKnown =
>              find_multixact_start(oldestMultiXactId, &oldestOffset);
> -
> -        if (oldestOffsetKnown)
> -            ereport(DEBUG1,
> -                    (errmsg("oldest MultiXactId member is at offset %u",
> -                            oldestOffset)));

That's imo a rather useful debug message.

> -        else
> +        if (!oldestOffsetKnown)
> +        {
> +            /* XXX This message is incorrect if prevOldestOffsetKnown. */
>              ereport(LOG,
>                      (errmsg("MultiXact member wraparound protections are disabled because oldest checkpointed
MultiXact%u does not exist on disk",
 
>                              oldestMultiXactId)));
> +        }
>      }

Hm, the XXX is a "problem" in all 9.3+ - should we just fix it everywhere?

>      LWLockRelease(MultiXactTruncationLock);
> 
>      /*
> -     * If we can, compute limits (and install them MultiXactState) to prevent
> -     * overrun of old data in the members SLRU area. We can only do so if the
> -     * oldest offset is known though.
> +     * There's no need to update anything if we don't know the oldest offset
> +     * or if it hasn't changed.
>       */

Is that really a worthwhile optimization?

> -typedef struct mxtruncinfo
> -{
> -    int            earliestExistingPage;
> -} mxtruncinfo;
> -
> -/*
> - * SlruScanDirectory callback
> - *        This callback determines the earliest existing page number.
> - */
> -static bool
> -SlruScanDirCbFindEarliest(SlruCtl ctl, char *filename, int segpage, void *data)
> -{
> -    mxtruncinfo *trunc = (mxtruncinfo *) data;
> -
> -    if (trunc->earliestExistingPage == -1 ||
> -        ctl->PagePrecedes(segpage, trunc->earliestExistingPage))
> -    {
> -        trunc->earliestExistingPage = segpage;
> -    }
> -
> -    return false;                /* keep going */
> -}
> -

That really seems like an independent change, deserving its own commit +
explanation. Just referring to "See mailing list submission notes for
rationale." makes understanding the change later imo much harder than
all the incremental commits you try to avoid.

> /*
> - * Decide which of two MultiXactMember page numbers is "older" for truncation
> - * purposes.  There is no "invalid offset number" so use the numbers verbatim.
> + * Dummy notion of which of two MultiXactMember page numbers is "older".
> + *
> + * Due to the MultiXactOffsetPrecedes() specification, this function's result
> + * is meaningless unless the system is preserving less than 2^31 members.  It
> + * is adequate for SlruSelectLRUPage() guessing the cheapest slot to reclaim.
> + * Do not pass MultiXactMemberCtl to any of the functions that use the
> + * PagePrecedes callback in other ways.
>   */
>  static bool
>  MultiXactMemberPagePrecedes(int page1, int page2)
> @@ -3157,6 +3134,10 @@ MultiXactIdPrecedesOrEquals(MultiXactId multi1, MultiXactId multi2)
> 
>  /*
>   * Decide which of two offsets is earlier.
> + *
> + * Avoid calling this function.  pg_multixact/members can preserve almost 2^32
> + * members at any given time, but this function is transitive only when the
> + * system is preserving less than 2^31 members.
>   */
>  static bool
>  MultiXactOffsetPrecedes(MultiXactOffset offset1, MultiXactOffset offset2)

As mentioned before, these really seem unrelated.

> @@ -1237,24 +1235,25 @@ SlruDeleteSegment(SlruCtl ctl, int segno)
>      SlruShared    shared = ctl->shared;
>      int            slotno;
>      char        path[MAXPGPATH];
> -    bool        did_write;
> 
>      /* Clean out any possibly existing references to the segment. */
>      LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
>  restart:
> -    did_write = false;
>      for (slotno = 0; slotno < shared->num_slots; slotno++)
>      {
> -        int            pagesegno = shared->page_number[slotno] / SLRU_PAGES_PER_SEGMENT;
> +        int            pagesegno;
> 
>          if (shared->page_status[slotno] == SLRU_PAGE_EMPTY)
>              continue;
> 
>          /* not the segment we're looking for */
> +        pagesegno = shared->page_number[slotno] / SLRU_PAGES_PER_SEGMENT;
>          if (pagesegno != segno)
>              continue;
> 
> -        /* If page is clean, just change state to EMPTY (expected case). */
> +        /*
> +         * If page is clean, just change state to EMPTY (expected case).
> +         */
>          if (shared->page_status[slotno] == SLRU_PAGE_VALID &&
>              !shared->page_dirty[slotno])
>          {
> @@ -1267,18 +1266,10 @@ restart:
>              SlruInternalWritePage(ctl, slotno, NULL);
>          else
>              SimpleLruWaitIO(ctl, slotno);
> -
> -        did_write = true;
> -    }
> -
> -    /*
> -     * Be extra careful and re-check. The IO functions release the control
> -     * lock, so new pages could have been read in.
> -     */
> -    if (did_write)
>          goto restart;
> +    }

I don't think that's really a good idea - this way we restart after
every single page write, whereas currently we only restart after passing
through all pages once. In nearly all cases we'll only ever have to
retry once in total, be because such old pages aren't usually going to
be reread/redirtied.

> @@ -9216,10 +9212,8 @@ xlog_redo(XLogReaderState *record)
>          LWLockRelease(OidGenLock);
>          MultiXactSetNextMXact(checkPoint.nextMulti,
>                                checkPoint.nextMultiOffset);
> -
> -        MultiXactAdvanceOldest(checkPoint.oldestMulti,
> -                               checkPoint.oldestMultiDB);
>          SetTransactionIdLimit(checkPoint.oldestXid, checkPoint.oldestXidDB);
> +        SetMultiXactIdLimit(checkPoint.oldestMulti, checkPoint.oldestMultiDB);
> 
>          /*
>           * If we see a shutdown checkpoint while waiting for an end-of-backup
> @@ -9309,17 +9303,13 @@ xlog_redo(XLogReaderState *record)
>          LWLockRelease(OidGenLock);
>          MultiXactAdvanceNextMXact(checkPoint.nextMulti,
>                                    checkPoint.nextMultiOffset);
> -
> -        /*
> -         * NB: This may perform multixact truncation when replaying WAL
> -         * generated by an older primary.
> -         */
> -        MultiXactAdvanceOldest(checkPoint.oldestMulti,
> -                               checkPoint.oldestMultiDB);
>          if (TransactionIdPrecedes(ShmemVariableCache->oldestXid,
>                                    checkPoint.oldestXid))
>              SetTransactionIdLimit(checkPoint.oldestXid,
>                                    checkPoint.oldestXidDB);
> +        MultiXactAdvanceOldest(checkPoint.oldestMulti,
> +                               checkPoint.oldestMultiDB);
> +
>          /* ControlFile->checkPointCopy always tracks the latest ckpt XID */
>          ControlFile->checkPointCopy.nextXidEpoch = checkPoint.nextXidEpoch;
>          ControlFile->checkPointCopy.nextXid =
>          checkPoint.nextXid;

Why?

> diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
> index 7c4ef58..e2b4f4c 100644
> --- a/src/backend/commands/vacuum.c
> +++ b/src/backend/commands/vacuum.c
> @@ -1136,9 +1136,6 @@ vac_truncate_clog(TransactionId frozenXID,
>      if (bogus)
>          return;
> 
> -    /*
> -     * Truncate CLOG, multixact and CommitTs to the oldest computed value.
> -     */
>      TruncateCLOG(frozenXID);
>      TruncateCommitTs(frozenXID);
>      TruncateMultiXact(minMulti, minmulti_datoid);

Why? Sure, it's not a super important comment, but ...?

> diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
> index 0dc4117..41e51cf 100644
> --- a/src/backend/storage/ipc/procarray.c
> +++ b/src/backend/storage/ipc/procarray.c
> @@ -2192,10 +2192,8 @@ GetOldestSafeDecodingTransactionId(void)
> 
>  /*
>   * GetVirtualXIDsDelayingChkpt -- Get the VXIDs of transactions that are
> - * delaying checkpoint because they have critical actions in progress.
> - *
> - * Constructs an array of VXIDs of transactions that are currently in commit
> - * critical sections, as shown by having delayChkpt set in their PGXACT.
> + * delaying checkpoint because they have critical actions in progress, as
> + * shown by having delayChkpt set in their PGXACT.
> 
>   * Returns a palloc'd array that should be freed by the caller.
>   * *nvxids is the number of valid entries.
> @@ -2204,8 +2202,8 @@ GetOldestSafeDecodingTransactionId(void)
>   * the result is somewhat indeterminate, but we don't really care.  Even in
>   * a multiprocessor with delayed writes to shared memory, it should be certain
>   * that setting of delayChkpt will propagate to shared memory when the backend
> - * takes a lock, so we cannot fail to see a virtual xact as delayChkpt if
> - * it's already inserted its commit record.  Whether it takes a little while
> + * takes a lock, so we cannot fail to see a virtual xact as delayChkpt if it's
> + * already inserted its critical xlog record.  Whether it takes a little while
>   * for clearing of delayChkpt to propagate is unimportant for correctness.
>   */

Seems unrelated, given that this is already used in
MarkBufferDirtyHint(). Don't get me wrong, I think the changes are a
good idea, but it's not really tied to the truncation changes.


- Andres

Re: Rework the way multixact truncations work

From

Noah Misch

Date:

03 December 2015, 09:38:56

On Wed, Dec 02, 2015 at 04:46:26PM +0100, Andres Freund wrote:
> On 2015-12-02 09:57:19 -0500, Noah Misch wrote:
> > Hackers have been too reticent to revert and redo defect-laden
> > commits.  If doing that is weird today, let it be normal.
> 
> Why?

See my paragraph ending with the two sentences you quoted.

> Especially if reverting and redoing includes conflicts that mainly
> increases the chance of accidental bugs.

True.  (That doesn't apply to these patches.)

> > git remote add community git://git.postgresql.org/git/postgresql.git
> > git remote add nmisch_github https://github.com/nmisch/postgresql.git
> > git fetch --multiple community nmisch_github
> > git diff community/master...nmisch_github/mxt4-rm-legacy
> 
> That's a nearly useless diff, because it includes a lot of other changes
> (218 files changed, 2828 insertions(+), 8742 deletions(-)) made since
> you published the changes.

Perhaps you used "git diff a..b", not "git diff a...b".  If not, please send
the outputs of "git rev-parse community/master nmisch_github/mxt4-rm-legacy"
and "git --version".

> > @@ -2190,7 +2194,8 @@ MultiXactSetNextMXact(MultiXactId nextMulti,
> > -    /*
> > -     * Computing the actual limits is only possible once the data directory is
> > -     * in a consistent state. There's no need to compute the limits while
> > -     * still replaying WAL - no decisions about new multis are made even
> > -     * though multixact creations might be replayed. So we'll only do further
> > -     * checks after TrimMultiXact() has been called.
> > -     */
> > +    /* Before the TrimMultiXact() call at end of recovery, skip the rest. */
> >      if (!MultiXactState->finishedStartup)
> >          return;
> > -
> >      Assert(!InRecovery);
> > 
> > -    /* Set limits for offset vacuum. */
> > +    /*
> > +     * Setting MultiXactState->oldestOffset entails a find_multixact_start()
> > +     * call, which is only possible once the data directory is in a consistent
> > +     * state. There's no need for an offset limit while still replaying WAL;
> > +     * no decisions about new multis are made even though multixact creations
> > +     * might be replayed.
> > +     */
> >      needs_offset_vacuum = SetOffsetVacuumLimit();
> 
> I don't really see the benefit of this change. The previous location of
> the comment is where we return early, so it seems appropriate to
> document the reason there?

I made that low-importance change for two reasons.  First, returning at that
point skips more than just the setting a limit; it also skips autovacuum
signalling and wraparound warnings.  Second, the function has just computed
mxid "actual limits", so branch mxt2-cosmetic made the comment specify that we
defer an offset limit, not any and all limits.

> >  static bool
> >  SetOffsetVacuumLimit(void)
> >  {
> > -    MultiXactId oldestMultiXactId;
> > +    MultiXactId    oldestMultiXactId;
> >      MultiXactId nextMXact;
> > -    MultiXactOffset oldestOffset = 0;    /* placate compiler */
> > -    MultiXactOffset prevOldestOffset;
> > +    MultiXactOffset oldestOffset = 0;        /* placate compiler */
> >      MultiXactOffset nextOffset;
> >      bool        oldestOffsetKnown = false;
> > +    MultiXactOffset prevOldestOffset;
> >      bool        prevOldestOffsetKnown;
> > -    MultiXactOffset offsetStopLimit = 0;
> 
> I don't see the benefit of the order changes here.

I reacted the same way.  Commit 4f627f8 reordered some declarations, which I
reverted when I refinished that commit as branch mxt3-main.

> > -        if (oldestOffsetKnown)
> > -            ereport(DEBUG1,
> > -                    (errmsg("oldest MultiXactId member is at offset %u",
> > -                            oldestOffset)));
> 
> That's imo a rather useful debug message.

The branches emit that message at the same times 4f627f8^ and earlier emit it.

> > -        else
> > +        if (!oldestOffsetKnown)
> > +        {
> > +            /* XXX This message is incorrect if prevOldestOffsetKnown. */
> >              ereport(LOG,
> >                      (errmsg("MultiXact member wraparound protections are disabled because oldest checkpointed
MultiXact%u does not exist on disk",
 
> >                              oldestMultiXactId)));
> > +        }
> >      }
> 
> Hm, the XXX is a "problem" in all 9.3+ - should we just fix it everywhere?

I welcome a project to fix it.

> >      LWLockRelease(MultiXactTruncationLock);
> > 
> >      /*
> > -     * If we can, compute limits (and install them MultiXactState) to prevent
> > -     * overrun of old data in the members SLRU area. We can only do so if the
> > -     * oldest offset is known though.
> > +     * There's no need to update anything if we don't know the oldest offset
> > +     * or if it hasn't changed.
> >       */
> 
> Is that really a worthwhile optimization?

I would neither remove that longstanding optimization nor add it from scratch
today.  Branch commit 06c9979 restored it as part of a larger restoration to
the pre-4f627f8 structure of SetOffsetVacuumLimit().

> > -typedef struct mxtruncinfo
> > -{
> > -    int            earliestExistingPage;
> > -} mxtruncinfo;
> > -
> > -/*
> > - * SlruScanDirectory callback
> > - *        This callback determines the earliest existing page number.
> > - */
> > -static bool
> > -SlruScanDirCbFindEarliest(SlruCtl ctl, char *filename, int segpage, void *data)
> > -{
> > -    mxtruncinfo *trunc = (mxtruncinfo *) data;
> > -
> > -    if (trunc->earliestExistingPage == -1 ||
> > -        ctl->PagePrecedes(segpage, trunc->earliestExistingPage))
> > -    {
> > -        trunc->earliestExistingPage = segpage;
> > -    }
> > -
> > -    return false;                /* keep going */
> > -}
> > -
> 
> That really seems like an independent change, deserving its own commit +
> explanation.

Indeed.  I explained that change at length in
http://www.postgresql.org/message-id/20151108085407.GA1097830@tornado.leadboat.com,
including that it's alone on a branch (mxt1-disk-independent), to become its
own PostgreSQL commit.

> > [branch commit 89a7232]
> 
> I don't think that's really a good idea - this way we restart after
> every single page write, whereas currently we only restart after passing
> through all pages once. In nearly all cases we'll only ever have to
> retry once in total, be because such old pages aren't usually going to
> be reread/redirtied.

Your improvement sounds fine, then.  Would both SimpleLruTruncate() and
SlruDeleteSegment() benefit from it?

> > @@ -9216,10 +9212,8 @@ xlog_redo(XLogReaderState *record)
> >          LWLockRelease(OidGenLock);
> >          MultiXactSetNextMXact(checkPoint.nextMulti,
> >                                checkPoint.nextMultiOffset);
> > -
> > -        MultiXactAdvanceOldest(checkPoint.oldestMulti,
> > -                               checkPoint.oldestMultiDB);
> >          SetTransactionIdLimit(checkPoint.oldestXid, checkPoint.oldestXidDB);
> > +        SetMultiXactIdLimit(checkPoint.oldestMulti, checkPoint.oldestMultiDB);
> > 
> >          /*
> >           * If we see a shutdown checkpoint while waiting for an end-of-backup
> > @@ -9309,17 +9303,13 @@ xlog_redo(XLogReaderState *record)
> >          LWLockRelease(OidGenLock);
> >          MultiXactAdvanceNextMXact(checkPoint.nextMulti,
> >                                    checkPoint.nextMultiOffset);
> > -
> > -        /*
> > -         * NB: This may perform multixact truncation when replaying WAL
> > -         * generated by an older primary.
> > -         */
> > -        MultiXactAdvanceOldest(checkPoint.oldestMulti,
> > -                               checkPoint.oldestMultiDB);
> >          if (TransactionIdPrecedes(ShmemVariableCache->oldestXid,
> >                                    checkPoint.oldestXid))
> >              SetTransactionIdLimit(checkPoint.oldestXid,
> >                                    checkPoint.oldestXidDB);
> > +        MultiXactAdvanceOldest(checkPoint.oldestMulti,
> > +                               checkPoint.oldestMultiDB);
> > +
> >          /* ControlFile->checkPointCopy always tracks the latest ckpt XID */
> >          ControlFile->checkPointCopy.nextXidEpoch = checkPoint.nextXidEpoch;
> >          ControlFile->checkPointCopy.nextXid =
> >          checkPoint.nextXid;
> 
> Why?

master does not and will not have legacy truncation, so the deleted comment
does not belong in master.  Regarding the SetMultiXactIdLimit() call:

commit 611a2ec
Author:     Noah Misch <noah@leadboat.com>
AuthorDate: Sat Nov 7 15:06:28 2015 -0500
Commit:     Noah Misch <noah@leadboat.com>
CommitDate: Sat Nov 7 15:06:28 2015 -0500
   In xlog_redo(), believe a SHUTDOWN checkPoint.oldestMulti exactly.      It was so before this branch.  This restores
consistencywith the   handling of nextXid, nextMulti and oldestMulti: we treat them as exact   for
XLOG_CHECKPOINT_SHUTDOWNand as minima for XLOG_CHECKPOINT_ONLINE.   I do not know of a case where this definitely
mattersfor any of these   counters.  It might matter if a bug causes oldestXid to move forward   wrongly, causing it to
thenmove backward later.  (I don't know if   VACUUM does ever move oldestXid backward, but it's a plausible thing to
doif on-disk state fully agrees with an older value.)  That example has   no counterpart for oldestMultiXactId, because
anyupdate first arrives   in an XLOG_MULTIXACT_TRUNCATE_ID record.  Therefore, this commit is   probably cosmetic.
 

> > -    /*
> > -     * Truncate CLOG, multixact and CommitTs to the oldest computed value.
> > -     */
> >      TruncateCLOG(frozenXID);
> >      TruncateCommitTs(frozenXID);
> >      TruncateMultiXact(minMulti, minmulti_datoid);
> 
> Why? Sure, it's not a super important comment, but ...?

Yeah, it scarcely matters either way.  Commit 4f627f8 reduced this comment to
merely restating the code, so I removed it instead.

> > @@ -2204,8 +2202,8 @@ GetOldestSafeDecodingTransactionId(void)
> >   * the result is somewhat indeterminate, but we don't really care.  Even in
> >   * a multiprocessor with delayed writes to shared memory, it should be certain
> >   * that setting of delayChkpt will propagate to shared memory when the backend
> > - * takes a lock, so we cannot fail to see a virtual xact as delayChkpt if
> > - * it's already inserted its commit record.  Whether it takes a little while
> > + * takes a lock, so we cannot fail to see a virtual xact as delayChkpt if it's
> > + * already inserted its critical xlog record.  Whether it takes a little while
> >   * for clearing of delayChkpt to propagate is unimportant for correctness.
> >   */
> 
> Seems unrelated, given that this is already used in
> MarkBufferDirtyHint(). Don't get me wrong, I think the changes are a
> good idea, but it's not really tied to the truncation changes.

Quite so; its branch (one branch = one proposed PostgreSQL commit),
mxt2-cosmetic, contains no truncation changes.  Likewise for the other
independent comment improvements you noted.

nm

Re: Rework the way multixact truncations work

From

Andres Freund

Date:

03 December 2015, 18:03:32

On 2015-12-03 04:38:45 -0500, Noah Misch wrote:
> On Wed, Dec 02, 2015 at 04:46:26PM +0100, Andres Freund wrote:
> > Especially if reverting and redoing includes conflicts that mainly
> > increases the chance of accidental bugs.
> 
> True.  (That doesn't apply to these patches.)

Uh, it does. You had conflicts in your process, and it's hard to verify
that the re-applied patch is actually functionally identical given the
volume of changes. It's much easier to see what actually changes by
looking at iterative commits forward from the current state.


Sorry, but I really just want to see these changes as iterative patches
ontop of 9.5/HEAD instead of this process. I won't revert the reversion
if you push it anyway, but I think it's a rather bad idea.


> > > git remote add community git://git.postgresql.org/git/postgresql.git
> > > git remote add nmisch_github https://github.com/nmisch/postgresql.git
> > > git fetch --multiple community nmisch_github
> > > git diff community/master...nmisch_github/mxt4-rm-legacy
> > 
> > That's a nearly useless diff, because it includes a lot of other changes
> > (218 files changed, 2828 insertions(+), 8742 deletions(-)) made since
> > you published the changes.
> 
> Perhaps you used "git diff a..b", not "git diff a...b".

Ah yes. Neat, didn't know that one.

> > > @@ -2190,7 +2194,8 @@ MultiXactSetNextMXact(MultiXactId nextMulti,
> > > -    /*
> > > -     * Computing the actual limits is only possible once the data directory is
> > > -     * in a consistent state. There's no need to compute the limits while
> > > -     * still replaying WAL - no decisions about new multis are made even
> > > -     * though multixact creations might be replayed. So we'll only do further
> > > -     * checks after TrimMultiXact() has been called.
> > > -     */
> > > +    /* Before the TrimMultiXact() call at end of recovery, skip the rest. */
> > >      if (!MultiXactState->finishedStartup)
> > >          return;
> > > -
> > >      Assert(!InRecovery);
> > > 
> > > -    /* Set limits for offset vacuum. */
> > > +    /*
> > > +     * Setting MultiXactState->oldestOffset entails a find_multixact_start()
> > > +     * call, which is only possible once the data directory is in a consistent
> > > +     * state. There's no need for an offset limit while still replaying WAL;
> > > +     * no decisions about new multis are made even though multixact creations
> > > +     * might be replayed.
> > > +     */
> > >      needs_offset_vacuum = SetOffsetVacuumLimit();
> > 
> > I don't really see the benefit of this change. The previous location of
> > the comment is where we return early, so it seems appropriate to
> > document the reason there?
> 
> I made that low-importance change for two reasons.  First, returning at that
> point skips more than just the setting a limit; it also skips autovacuum
> signalling and wraparound warnings.  Second, the function has just computed
> mxid "actual limits", so branch mxt2-cosmetic made the comment specify that we
> defer an offset limit, not any and all limits.

My question was more about the comment being after the "early return"
than about the content change, should have made that clearer. Can we
just move your comment up?

> > >  static bool
> > >  SetOffsetVacuumLimit(void)
> > >  {
> > > -    MultiXactId oldestMultiXactId;
> > > +    MultiXactId    oldestMultiXactId;
> > >      MultiXactId nextMXact;
> > > -    MultiXactOffset oldestOffset = 0;    /* placate compiler */
> > > -    MultiXactOffset prevOldestOffset;
> > > +    MultiXactOffset oldestOffset = 0;        /* placate compiler */
> > >      MultiXactOffset nextOffset;
> > >      bool        oldestOffsetKnown = false;
> > > +    MultiXactOffset prevOldestOffset;
> > >      bool        prevOldestOffsetKnown;
> > > -    MultiXactOffset offsetStopLimit = 0;
> > 
> > I don't see the benefit of the order changes here.
> 
> I reacted the same way.  Commit 4f627f8 reordered some declarations, which I
> reverted when I refinished that commit as branch mxt3-main.

But the other changes are there, and in the history anyway. As the new
order isn't more meaningful than the current one...

> > > -        if (oldestOffsetKnown)
> > > -            ereport(DEBUG1,
> > > -                    (errmsg("oldest MultiXactId member is at offset %u",
> > > -                            oldestOffset)));
> > 
> > That's imo a rather useful debug message.
> 
> The branches emit that message at the same times 4f627f8^ and earlier emit it.

During testing I found it rather helpful if it was emitted regularly.

> > >      LWLockRelease(MultiXactTruncationLock);
> > > 
> > >      /*
> > > -     * If we can, compute limits (and install them MultiXactState) to prevent
> > > -     * overrun of old data in the members SLRU area. We can only do so if the
> > > -     * oldest offset is known though.
> > > +     * There's no need to update anything if we don't know the oldest offset
> > > +     * or if it hasn't changed.
> > >       */
> > 
> > Is that really a worthwhile optimization?
> 
> I would neither remove that longstanding optimization nor add it from scratch
> today.  Branch commit 06c9979 restored it as part of a larger restoration to
> the pre-4f627f8 structure of SetOffsetVacuumLimit().

There DetermineSafeOldestOffset() did it unconditionally.

> > > -SlruScanDirCbFindEarliest(SlruCtl ctl, char *filename, int segpage, void *data)
> > 
> > That really seems like an independent change, deserving its own commit +
> > explanation.
> 
> Indeed.  I explained that change at length in
> http://www.postgresql.org/message-id/20151108085407.GA1097830@tornado.leadboat.com,
> including that it's alone on a branch (mxt1-disk-independent), to become its
> own PostgreSQL commit.

The comment there doesn't include the explanation...

> > > [branch commit 89a7232]
> > 
> > I don't think that's really a good idea - this way we restart after
> > every single page write, whereas currently we only restart after passing
> > through all pages once. In nearly all cases we'll only ever have to
> > retry once in total, be because such old pages aren't usually going to
> > be reread/redirtied.
> 
> Your improvement sounds fine, then.  Would both SimpleLruTruncate() and
> SlruDeleteSegment() benefit from it?

It probably makes sense to do it in SimpleLruTruncate too - but it does
additional checks as part of the restarts which aren't applicable for
DeleteSegment(), which is IIRC why I didn't also change it.

Greetings,

Andres Freund

Re: Rework the way multixact truncations work

From

Noah Misch

Date:

05 December 2015, 02:55:39

On Thu, Dec 03, 2015 at 07:03:21PM +0100, Andres Freund wrote:
> On 2015-12-03 04:38:45 -0500, Noah Misch wrote:
> > On Wed, Dec 02, 2015 at 04:46:26PM +0100, Andres Freund wrote:
> > > Especially if reverting and redoing includes conflicts that mainly
> > > increases the chance of accidental bugs.
> > 
> > True.  (That doesn't apply to these patches.)
> 
> Uh, it does. You had conflicts in your process, and it's hard to verify
> that the re-applied patch is actually functionally identical given the
> volume of changes. It's much easier to see what actually changes by
> looking at iterative commits forward from the current state.

Ah, we were talking about different topics after all.  I was talking about
_merge_ conflicts in a reversion commit.

> Sorry, but I really just want to see these changes as iterative patches
> ontop of 9.5/HEAD instead of this process. I won't revert the reversion
> if you push it anyway, but I think it's a rather bad idea.

I hear you.  I evaluated your request and judged that the benefits you cited
did not make up for the losses I cited.  Should you wish to change my mind,
your best bet is to find defects in the commits I proposed.  If I introduced
juicy defects, that discovery would lend much weight to your conjectures.

> My question was more about the comment being after the "early return"
> than about the content change, should have made that clearer. Can we
> just move your comment up?

Sure, I will.

> > > >  static bool
> > > >  SetOffsetVacuumLimit(void)
> > > >  {
> > > > -    MultiXactId oldestMultiXactId;
> > > > +    MultiXactId    oldestMultiXactId;
> > > >      MultiXactId nextMXact;
> > > > -    MultiXactOffset oldestOffset = 0;    /* placate compiler */
> > > > -    MultiXactOffset prevOldestOffset;
> > > > +    MultiXactOffset oldestOffset = 0;        /* placate compiler */
> > > >      MultiXactOffset nextOffset;
> > > >      bool        oldestOffsetKnown = false;
> > > > +    MultiXactOffset prevOldestOffset;
> > > >      bool        prevOldestOffsetKnown;
> > > > -    MultiXactOffset offsetStopLimit = 0;
> > > 
> > > I don't see the benefit of the order changes here.
> > 
> > I reacted the same way.  Commit 4f627f8 reordered some declarations, which I
> > reverted when I refinished that commit as branch mxt3-main.
> 
> But the other changes are there, and in the history anyway. As the new
> order isn't more meaningful than the current one...

Right.  A revert+redo patch series can and should purge formatting changes
that did not belong in its predecessor commits.  Alternate change delivery
strategies wouldn't do that.

> > > > -        if (oldestOffsetKnown)
> > > > -            ereport(DEBUG1,
> > > > -                    (errmsg("oldest MultiXactId member is at offset %u",
> > > > -                            oldestOffset)));
> > > 
> > > That's imo a rather useful debug message.
> > 
> > The branches emit that message at the same times 4f627f8^ and earlier emit it.
> 
> During testing I found it rather helpful if it was emitted regularly.

I wouldn't oppose a patch making it happen more often.

> > > >      LWLockRelease(MultiXactTruncationLock);
> > > > 
> > > >      /*
> > > > -     * If we can, compute limits (and install them MultiXactState) to prevent
> > > > -     * overrun of old data in the members SLRU area. We can only do so if the
> > > > -     * oldest offset is known though.
> > > > +     * There's no need to update anything if we don't know the oldest offset
> > > > +     * or if it hasn't changed.
> > > >       */
> > > 
> > > Is that really a worthwhile optimization?
> > 
> > I would neither remove that longstanding optimization nor add it from scratch
> > today.  Branch commit 06c9979 restored it as part of a larger restoration to
> > the pre-4f627f8 structure of SetOffsetVacuumLimit().
> 
> There DetermineSafeOldestOffset() did it unconditionally.

That is true; one won't be consistent with both.  06c9979 materially shortened
the final patch and eliminated some user-visible message emission changes.
Moreover, this is clearly a case of SetOffsetVacuumLimit() absorbing
DetermineSafeOldestOffset(), not vice versa.

> > > > -SlruScanDirCbFindEarliest(SlruCtl ctl, char *filename, int segpage, void *data)
> > > 
> > > That really seems like an independent change, deserving its own commit +
> > > explanation.
> > 
> > Indeed.  I explained that change at length in
> > http://www.postgresql.org/message-id/20151108085407.GA1097830@tornado.leadboat.com,
> > including that it's alone on a branch (mxt1-disk-independent), to become its
> > own PostgreSQL commit.
> 
> The comment there doesn't include the explanation...

If you visit that URL, everything from "If anything here requires careful
study, it's the small mxt1-disk-independent change, which ..." to the end of
the message is my explanation of this change.  What else would you like to
know about it?

> > > > [branch commit 89a7232]
> > > 
> > > I don't think that's really a good idea - this way we restart after
> > > every single page write, whereas currently we only restart after passing
> > > through all pages once. In nearly all cases we'll only ever have to
> > > retry once in total, be because such old pages aren't usually going to
> > > be reread/redirtied.
> > 
> > Your improvement sounds fine, then.  Would both SimpleLruTruncate() and
> > SlruDeleteSegment() benefit from it?
> 
> It probably makes sense to do it in SimpleLruTruncate too - but it does
> additional checks as part of the restarts which aren't applicable for
> DeleteSegment(), which is IIRC why I didn't also change it.

Understood.  There's no rule that these two functions must look as similar as
possible, so I will undo 89a7232.

Thanks,
nm

Re: Rework the way multixact truncations work

From

Andres Freund

Date:

08 December 2015, 11:43:09

On 2015-12-04 21:55:29 -0500, Noah Misch wrote:
> On Thu, Dec 03, 2015 at 07:03:21PM +0100, Andres Freund wrote:
> > Sorry, but I really just want to see these changes as iterative patches
> > ontop of 9.5/HEAD instead of this process. I won't revert the reversion
> > if you push it anyway, but I think it's a rather bad idea.
> 
> I hear you.

Not just me.

> I evaluated your request and judged that the benefits you cited
> did not make up for the losses I cited.  Should you wish to change my mind,
> your best bet is to find defects in the commits I proposed.  If I introduced
> juicy defects, that discovery would lend much weight to your conjectures.

I've absolutely no interest in "proving you wrong". And my desire to
review patches that are in a, in my opinion, barely reviewable format is
pretty low as well.

Andres

Re: Rework the way multixact truncations work

From

Robert Haas

Date:

08 December 2015, 18:05:09

On Tue, Dec 8, 2015 at 6:43 AM, Andres Freund <andres@anarazel.de> wrote:
> On 2015-12-04 21:55:29 -0500, Noah Misch wrote:
>> On Thu, Dec 03, 2015 at 07:03:21PM +0100, Andres Freund wrote:
>> > Sorry, but I really just want to see these changes as iterative patches
>> > ontop of 9.5/HEAD instead of this process. I won't revert the reversion
>> > if you push it anyway, but I think it's a rather bad idea.
>>
>> I hear you.
>
> Not just me.
>
>> I evaluated your request and judged that the benefits you cited
>> did not make up for the losses I cited.  Should you wish to change my mind,
>> your best bet is to find defects in the commits I proposed.  If I introduced
>> juicy defects, that discovery would lend much weight to your conjectures.
>
> I've absolutely no interest in "proving you wrong". And my desire to
> review patches that are in a, in my opinion, barely reviewable format is
> pretty low as well.

I agree.  Noah, it seems to me that you are offering a novel theory of
how patches should be submitted, reviewed, and committed, but you've
got three people, two of them committers, telling you that we don't
like that approach.  I seriously doubt you're going to find anyone who
does.  When stuff gets committed to the tree, people want to to be
able to answer the question "what has just now changed?" and it is
indisputable that what you want to do here will make that harder.
That's not a one-time problem for Andres during the course of review;
that is a problem for every single person who looks at the commit
history from now until the end of time.  I don't think you have the
right to force your proposed approach through in the face of concerted
opposition.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Rework the way multixact truncations work

From

Noah Misch

Date:

09 December 2015, 14:43:43

On Tue, Dec 08, 2015 at 01:05:03PM -0500, Robert Haas wrote:
> On Tue, Dec 8, 2015 at 6:43 AM, Andres Freund <andres@anarazel.de> wrote:
> > On 2015-12-04 21:55:29 -0500, Noah Misch wrote:
> >> On Thu, Dec 03, 2015 at 07:03:21PM +0100, Andres Freund wrote:
> >> > Sorry, but I really just want to see these changes as iterative patches
> >> > ontop of 9.5/HEAD instead of this process. I won't revert the reversion
> >> > if you push it anyway, but I think it's a rather bad idea.
> >>
> >> I hear you.
> >
> > Not just me.
> >
> >> I evaluated your request and judged that the benefits you cited
> >> did not make up for the losses I cited.  Should you wish to change my mind,
> >> your best bet is to find defects in the commits I proposed.  If I introduced
> >> juicy defects, that discovery would lend much weight to your conjectures.
> >
> > I've absolutely no interest in "proving you wrong". And my desire to
> > review patches that are in a, in my opinion, barely reviewable format is
> > pretty low as well.
> 
> I agree.  Noah, it seems to me that you are offering a novel theory of
> how patches should be submitted, reviewed, and committed, but you've
> got three people, two of them committers, telling you that we don't
> like that approach.  I seriously doubt you're going to find anyone who
> does.

Andres writing the patch that became commit 4f627f8 and you reviewing it were
gifts to Alvaro and to the community.  Aware of that, I have avoided[1] saying
that I was shocked to see that commit's defects.  Despite a committer-author
and _two_ committer reviewers, the patch was rife with wrong new comments,
omitted updates to comments it caused to become wrong, and unsolicited
whitespace churn.  (Anyone could have missed the data loss bug, but these
collectively leap off the page.)  This in beleaguered code critical to data
integrity.  You call this thread's latest code a patch submission, but I call
it bandaging the tree after a recent commit that never should have reached the
tree.  Hey, if you'd like me to post the traditional patch files, that's easy.
It would have been easier for me.  I posted branches because it gives more
metadata to guide review.  As for the choice to revert and redo ...

> When stuff gets committed to the tree, people want to to be
> able to answer the question "what has just now changed?" and it is
> indisputable that what you want to do here will make that harder.

I hope those who have not already read commit 4f627f8 will not waste time
reading it.  They should instead ignore multixact changes from commit 4f627f8
through its revert.  The 2015-09-26 commits have not appeared in a supported
release, and no other work has built upon them.  They have no tenure.  (I am
glad you talked the author out of back-patching; otherwise, 9.4.5 and 9.3.10
would have introduced a data loss bug.)  Nobody reported a single defect
before my review overturned half the patch.  A revert will indeed impose on
those who invested time to understand commit 4f627f8, but the silence about
its defects suggests the people understanding it number either zero or one.
Even as its author and reviewers, you would do better to set aside what you
thought you knew about this code.

> That's not a one-time problem for Andres during the course of review;
> that is a problem for every single person who looks at the commit
> history from now until the end of time.

It's a service to future readers that no line of "git blame master <...>" will
refer to a 2015-09-26 multixact commit.  Blame reports will instead refer to
replacement commits designed to be meaningful for study in isolation.  If I
instead structured the repairs as you ask, the blame would find one of 4f627f8
or its various repair commits, none of which would be a self-contained unit of
development.  What's to enjoy about discovering that history?

> I don't think you have the
> right to force your proposed approach through in the face of concerted
> opposition.

That's correct; I do not have that right.  Your objection still worries me.

nm

[1] http://www.postgresql.org/message-id/20151029065903.GC770464@tornado.leadboat.com

Re: Rework the way multixact truncations work

From

Andres Freund

Date:

09 December 2015, 15:42:00

On 2015-12-09 09:43:19 -0500, Noah Misch wrote:
> Aware of that, I have avoided[1] saying that I was shocked to see that
> commit's defects.  Despite a committer-author and _two_ committer
> reviewers, the patch was rife with wrong new comments, omitted updates
> to comments it caused to become wrong,

It's not like that patch wasn't posted for review for months...

> and unsolicited whitespace churn.

Whitespace churn? The commit includes a pgindent run, because Alvaro
asked me to do that, but that just affected a handful of lines. If you
mean the variable ordering: given several variables were renamed anyway,
additionally putting them in a easier to understand order, seems rather
painless. If you mean 'pgindent immune' long lines - multixact.c is far
from the only one with those, and they're prett harmless.

> You call this thread's latest code a patch
> submission, but I call it bandaging the tree after a recent commit
> that never should have reached the tree.

Oh, for christs sake.

> Hey, if you'd like me to
> post the traditional patch files, that's easy.  It would have been
> easier for me.

You've been asked that, repeatedly. At least if you take 'traditional
patch files' to include traditional, iterative, patches ontop of the
current tree.

> I hope those who have not already read commit 4f627f8 will not waste time
> reading it.

We have to, who knows what's hiding in there. Your git log even shows
that you had conflicts in your approach (83cb04 Conflicts:
src/backend/access/transam/multixact.c).

> They should instead ignore multixact changes from commit 4f627f8
> through its revert.  The 2015-09-26 commits have not appeared in a supported
> release, and no other work has built upon them.

> They have no tenure.

Man.

> (I am glad you talked the author out of back-patching; otherwise,
> 9.4.5 and 9.3.10 would have introduced a data loss bug.)

Isn't that a bug in a, as far as we know, impossible scenario? Unless I
miss something there's no known case where it's "expected" that
find_multixact_start() fails after initially succeeding? Sure, it sucks
that the bug survived review and that it was written in the first
place. But it not showing up during testing isn't meaningful, given it's
a should-never-happen scenario.

I'm actually kinda inclined to rip out the whole "previous pass" logic
out alltogether, and replace it with a PANIC. It's a hard to test,
should never happen, scenario. If it happens, things have already
seriously gone sour.

> > That's not a one-time problem for Andres during the course of review;
> > that is a problem for every single person who looks at the commit
> > history from now until the end of time.
> 
> It's a service to future readers that no line of "git blame master <...>" will
> refer to a 2015-09-26 multixact commit.

And a disservice for everyone doing git log, or git blame for
intermediate states of the tree. The benefit for git blame, are almost
nonexistant, not seing a couple newlines changed, or not seing some
intermediate commits isn't really important.

> Blame reports will instead refer to
> replacement commits designed to be meaningful for study in isolation.  If I
> instead structured the repairs as you ask, the blame would find one of 4f627f8
> or its various repair commits, none of which would be a self-contained unit of
> development.

So what? That's how development in general works. And how it actually
happened in this specific case.

Re: Rework the way multixact truncations work

From

Robert Haas

Date:

09 December 2015, 16:08:39

On Wed, Dec 9, 2015 at 9:43 AM, Noah Misch <noah@leadboat.com> wrote:
> Andres writing the patch that became commit 4f627f8 and you reviewing it were
> gifts to Alvaro and to the community.  Aware of that, I have avoided[1] saying
> that I was shocked to see that commit's defects.  Despite a committer-author
> and _two_ committer reviewers, the patch was rife with wrong new comments,
> omitted updates to comments it caused to become wrong, and unsolicited
> whitespace churn.  (Anyone could have missed the data loss bug, but these
> collectively leap off the page.)  This in beleaguered code critical to data
> integrity.  You call this thread's latest code a patch submission, but I call
> it bandaging the tree after a recent commit that never should have reached the
> tree.  Hey, if you'd like me to post the traditional patch files, that's easy.
> It would have been easier for me.  I posted branches because it gives more
> metadata to guide review.  As for the choice to revert and redo ...

Yes, I'd like patch files, one per topic.

I wasn't very happy with the way that patch it was written; it seemed
to me that it touched too much code and move a lot of things around
unnecessarily, and I said so at the time.  I would have preferred
something more incremental, and I asked for it and didn't get it.
Well, I'm not giving up: I'm asking for the same thing here.  I didn't
think it was a good idea for Andres to rearrange that much code in a
single commit, because it was hard to review, and I don't think it's a
good idea for you to do it, either.  To the extent that you found
bugs, I think that proves the point that large commits are hard to
review and small commits that change things just a little bit at a
time are the way to go.

> I hope those who have not already read commit 4f627f8 will not waste time
> reading it.  They should instead ignore multixact changes from commit 4f627f8
> through its revert.  The 2015-09-26 commits have not appeared in a supported
> release, and no other work has built upon them.  They have no tenure.  (I am
> glad you talked the author out of back-patching; otherwise, 9.4.5 and 9.3.10
> would have introduced a data loss bug.)  Nobody reported a single defect
> before my review overturned half the patch.  A revert will indeed impose on
> those who invested time to understand commit 4f627f8, but the silence about
> its defects suggests the people understanding it number either zero or one.
> Even as its author and reviewers, you would do better to set aside what you
> thought you knew about this code.

I just don't find this a realistic model of how people use the git
log.  Maybe you use it this way; I don't.  I don't *want* git blame to
make it seem as if 4f627f8 is not part of the history.  For better or
worse, it is.  Ripping it out and replacing it monolithically will not
change that; it will only make the detailed history harder to
reconstruct, and I *will* want to reconstruct it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Rework the way multixact truncations work

From

Robert Haas

Date:

09 December 2015, 16:18:48

On Wed, Dec 9, 2015 at 10:41 AM, Andres Freund <andres@anarazel.de> wrote:
>> (I am glad you talked the author out of back-patching; otherwise,
>> 9.4.5 and 9.3.10 would have introduced a data loss bug.)
>
> Isn't that a bug in a, as far as we know, impossible scenario? Unless I
> miss something there's no known case where it's "expected" that
> find_multixact_start() fails after initially succeeding? Sure, it sucks
> that the bug survived review and that it was written in the first
> place. But it not showing up during testing isn't meaningful, given it's
> a should-never-happen scenario.

If I correctly understand the scenario that you are describing, that
does happen - not for the same MXID, but for different ones.  At least
the last time I checked, and I'm not sure if we've fixed this, it
could happen because the SLRU page that contains the multixact wasn't
flushed out of the SLRU buffers yet.  But apart from that, it could
happen any time there's a gap in the sequence of files, and that sure
doesn't seem like a can't-happen situation.  We know that, on 9.3,
there's definitely a sequence of events that leads to a 0000 file
followed by a gap followed by the series of files that are still live.
Given the number of other bugs we've fixed in this area, I would not
like to bet on that being the only scenario where this crops up.  It
*shouldn't* happen, and as far as we know, if you start and end on a
version newer than 4f627f8 and aa29c1c, it won't.  Older branches,
though, I wouldn't like to bet on.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Rework the way multixact truncations work

From

Andres Freund

Date:

09 December 2015, 16:21:03

On 2015-12-09 11:18:39 -0500, Robert Haas wrote:
> If I correctly understand the scenario that you are describing, that
> does happen - not for the same MXID, but for different ones.  At least
> the last time I checked, and I'm not sure if we've fixed this, it
> could happen because the SLRU page that contains the multixact wasn't
> flushed out of the SLRU buffers yet.

That should be fixed, with the brute force solution of flushing buffers
before searching for files on disk.

> But apart from that, it could
> happen any time there's a gap in the sequence of files, and that sure
> doesn't seem like a can't-happen situation.  We know that, on 9.3,
> there's definitely a sequence of events that leads to a 0000 file
> followed by a gap followed by the series of files that are still live.
> Given the number of other bugs we've fixed in this area, I would not
> like to bet on that being the only scenario where this crops up.  It
> *shouldn't* happen, and as far as we know, if you start and end on a
> version newer than 4f627f8 and aa29c1c, it won't.  Older branches,
> though, I wouldn't like to bet on.

Ok, fair enough.

andres

Re: Rework the way multixact truncations work

From

Noah Misch

Date:

10 December 2015, 01:23:13

On Wed, Dec 09, 2015 at 11:08:32AM -0500, Robert Haas wrote:
> On Wed, Dec 9, 2015 at 9:43 AM, Noah Misch <noah@leadboat.com> wrote:
> > I hope those who have not already read commit 4f627f8 will not waste time
> > reading it.  They should instead ignore multixact changes from commit 4f627f8
> > through its revert.  The 2015-09-26 commits have not appeared in a supported
> > release, and no other work has built upon them.  They have no tenure.  (I am
> > glad you talked the author out of back-patching; otherwise, 9.4.5 and 9.3.10
> > would have introduced a data loss bug.)  Nobody reported a single defect
> > before my review overturned half the patch.  A revert will indeed impose on
> > those who invested time to understand commit 4f627f8, but the silence about
> > its defects suggests the people understanding it number either zero or one.
> > Even as its author and reviewers, you would do better to set aside what you
> > thought you knew about this code.
> 
> I just don't find this a realistic model of how people use the git
> log.  Maybe you use it this way; I don't.  I don't *want* git blame to
> make it seem as if 4f627f8 is not part of the history.  For better or
> worse, it is.

I would like to understand how you use git, then.  What's one of your models
of using "git log" and/or "git blame" in which you foresee the revert making
history less clear, not more clear?

By the way, it occurs to me that I should also make pg_upgrade blacklist the
range of catversions that might have data loss.  No sense in putting ourselves
in the position of asking whether data files of a 9.9.3 cluster spent time in
a 9.5beta2 cluster.

> Ripping it out and replacing it monolithically will not
> change that; it will only make the detailed history harder to
> reconstruct, and I *will* want to reconstruct it.

What's something that might happen six months from now and lead you to inspect
master or 9.5 multixact.c between 4f627f8 and its revert?

Re: Rework the way multixact truncations work

From

Andres Freund

Date:

10 December 2015, 08:34:15

On 2015-12-09 20:23:06 -0500, Noah Misch wrote:
> By the way, it occurs to me that I should also make pg_upgrade blacklist the
> range of catversions that might have data loss.  No sense in putting ourselves
> in the position of asking whether data files of a 9.9.3 cluster spent time in
> a 9.5beta2 cluster.

I can't see any benefit in that. We're talking about a bug that, afaics,
needs another unknown bug to trigger (so find_multixact_start() fails),
and then very likely needs significant amounts of new multixacts
consumed, without a restart and without find_multixact_start()
succeeding later.

What I think would actually help for questions like this, is to add, as
discussed in some other threads, the following:
1) 'creating version' to pg_control
2) 'creating version' to each pg_class entry
3) 'last relation rewrite in version' to each pg_class entry
4) 'last full vacuum in version' to each pg_class entry

I think for this purpose 'version' should be something akin to
$catversion||$numericversion (int64 probably?) - that way development
branches and release branches are handled somewhat usefully.

I think that'd be useful, both from an investigatory perspective, as
from a tooling perspective, because it'd allow reusing things like hint
bits.

> > Ripping it out and replacing it monolithically will not
> > change that; it will only make the detailed history harder to
> > reconstruct, and I *will* want to reconstruct it.
> 
> What's something that might happen six months from now and lead you to inspect
> master or 9.5 multixact.c between 4f627f8 and its revert?

"Hey, what has happened to multixact.c lately? I'm investigating a bug,
and I wonder if it already has been fixed?", "Uh, what was the problem
with that earlier large commit?", "Hey, what has changed between beta2
and the final release?"...

Re: Rework the way multixact truncations work

From

Peter Geoghegan

Date:

10 December 2015, 08:58:33

On Thu, Dec 10, 2015 at 12:34 AM, Andres Freund <andres@anarazel.de> wrote:
>> > Ripping it out and replacing it monolithically will not
>> > change that; it will only make the detailed history harder to
>> > reconstruct, and I *will* want to reconstruct it.
>>
>> What's something that might happen six months from now and lead you to inspect
>> master or 9.5 multixact.c between 4f627f8 and its revert?
>
> "Hey, what has happened to multixact.c lately? I'm investigating a bug,
> and I wonder if it already has been fixed?", "Uh, what was the problem
> with that earlier large commit?", "Hey, what has changed between beta2
> and the final release?"...

Quite.

I can't believe we're still having this silly discussion. Can we please move on?

-- 
Peter Geoghegan

Re: Rework the way multixact truncations work

From

Bert

Date:

10 December 2015, 09:01:51

<div dir="ltr">+1<br /></div><div class="gmail_extra"><br /><div class="gmail_quote">On Thu, Dec 10, 2015 at 9:58 AM,
PeterGeoghegan <span dir="ltr"><<a href="mailto:pg@heroku.com" target="_blank">pg@heroku.com</a>></span>
wrote:<br/><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span
class="">OnThu, Dec 10, 2015 at 12:34 AM, Andres Freund <<a
href="mailto:andres@anarazel.de">andres@anarazel.de</a>>wrote:<br /> >> > Ripping it out and replacing it
monolithicallywill not<br /> >> > change that; it will only make the detailed history harder to<br /> >>
>reconstruct, and I *will* want to reconstruct it.<br /> >><br /> >> What's something that might happen
sixmonths from now and lead you to inspect<br /> >> master or 9.5 multixact.c between 4f627f8 and its revert?<br
/>><br /> > "Hey, what has happened to multixact.c lately? I'm investigating a bug,<br /> > and I wonder if it
alreadyhas been fixed?", "Uh, what was the problem<br /> > with that earlier large commit?", "Hey, what has changed
betweenbeta2<br /> > and the final release?"...<br /><br /></span>Quite.<br /><br /> I can't believe we're still
havingthis silly discussion. Can we please move on?<br /><span class="HOEnZb"><font color="#888888"><br /> --<br />
PeterGeoghegan<br /></font></span><div class="HOEnZb"><div class="h5"><br /><br /> --<br /> Sent via pgsql-hackers
mailinglist (<a href="mailto:pgsql-hackers@postgresql.org">pgsql-hackers@postgresql.org</a>)<br /> To make changes to
yoursubscription:<br /><a href="http://www.postgresql.org/mailpref/pgsql-hackers" rel="noreferrer"
target="_blank">http://www.postgresql.org/mailpref/pgsql-hackers</a><br/></div></div></blockquote></div><br /><br
clear="all"/><br />-- <br /><div class="gmail_signature">Bert Desmet<br />0477/305361</div></div>

Re: Rework the way multixact truncations work

From

Robert Haas

Date:

10 December 2015, 13:56:01

On Wed, Dec 9, 2015 at 8:23 PM, Noah Misch <noah@leadboat.com> wrote:
> On Wed, Dec 09, 2015 at 11:08:32AM -0500, Robert Haas wrote:
>> On Wed, Dec 9, 2015 at 9:43 AM, Noah Misch <noah@leadboat.com> wrote:
>> > I hope those who have not already read commit 4f627f8 will not waste time
>> > reading it.  They should instead ignore multixact changes from commit 4f627f8
>> > through its revert.  The 2015-09-26 commits have not appeared in a supported
>> > release, and no other work has built upon them.  They have no tenure.  (I am
>> > glad you talked the author out of back-patching; otherwise, 9.4.5 and 9.3.10
>> > would have introduced a data loss bug.)  Nobody reported a single defect
>> > before my review overturned half the patch.  A revert will indeed impose on
>> > those who invested time to understand commit 4f627f8, but the silence about
>> > its defects suggests the people understanding it number either zero or one.
>> > Even as its author and reviewers, you would do better to set aside what you
>> > thought you knew about this code.
>>
>> I just don't find this a realistic model of how people use the git
>> log.  Maybe you use it this way; I don't.  I don't *want* git blame to
>> make it seem as if 4f627f8 is not part of the history.  For better or
>> worse, it is.
>
> I would like to understand how you use git, then.  What's one of your models
> of using "git log" and/or "git blame" in which you foresee the revert making
> history less clear, not more clear?

Well, suppose I wanted to know what bugs were fixed between 9.5beta
and 9.5.0, for example.  I mean, I'm going to run git log
src/backend/access/transam/multixact.c ... and the existing commits
are going to be there.

> By the way, it occurs to me that I should also make pg_upgrade blacklist the
> range of catversions that might have data loss.  No sense in putting ourselves
> in the position of asking whether data files of a 9.9.3 cluster spent time in
> a 9.5beta2 cluster.

Maybe.  But I think we could use a little more vigorous discussion of
that issue, since Andres doesn't seem to be very convinced by your
analysis, and I don't currently understand what you've fixed because I
can't, as mentioned several times, follow your patch stack.

>> Ripping it out and replacing it monolithically will not
>> change that; it will only make the detailed history harder to
>> reconstruct, and I *will* want to reconstruct it.
>
> What's something that might happen six months from now and lead you to inspect
> master or 9.5 multixact.c between 4f627f8 and its revert?

I don't know have anything to add to what others have said in response
to this point, except this: the whole point of using a source code
management system is to tell you what changed and when.  What you are
proposing to do makes it unusable for that purpose.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Rework the way multixact truncations work

From

Andres Freund

Date:

10 December 2015, 14:04:59

On 2015-12-10 08:55:54 -0500, Robert Haas wrote:
> Maybe.  But I think we could use a little more vigorous discussion of
> that issue, since Andres doesn't seem to be very convinced by your
> analysis, and I don't currently understand what you've fixed because I
> can't, as mentioned several times, follow your patch stack.

The issue at hand is that the following block    oldestOffsetKnown =        find_multixact_start(oldestMultiXactId,
&oldestOffset);

...else if (prevOldestOffsetKnown){    /*     * If we failed to get the oldest offset this time, but we have a     *
valuefrom a previous pass through this function, use the old value     * rather than automatically forcing it.     */
oldestOffset = prevOldestOffset;    oldestOffsetKnown = true;}
 
in SetOffsetVacuumLimit() fails to restore offsetStopLimit, which then
is set in shared memory:/* Install the computed values */LWLockAcquire(MultiXactGenLock,
LW_EXCLUSIVE);MultiXactState->oldestOffset= oldestOffset;MultiXactState->oldestOffsetKnown =
oldestOffsetKnown;MultiXactState->offsetStopLimit= offsetStopLimit;LWLockRelease(MultiXactGenLock);
 

so, if find_multixact_start() failed - a "should never happen" occurance
- we install a wrong stop limit. It does get 'repaired' upon the next
suceeding find_multixact_start() in SetOffsetVacuumLimit() or a restart
though.

Adding a 'prevOffsetStopLimit' and using it seems like a ~5 line patch.

Re: Rework the way multixact truncations work

From

Robert Haas

Date:

10 December 2015, 14:32:12

On Thu, Dec 10, 2015 at 9:04 AM, Andres Freund <andres@anarazel.de> wrote:
> On 2015-12-10 08:55:54 -0500, Robert Haas wrote:
>> Maybe.  But I think we could use a little more vigorous discussion of
>> that issue, since Andres doesn't seem to be very convinced by your
>> analysis, and I don't currently understand what you've fixed because I
>> can't, as mentioned several times, follow your patch stack.
>
> The issue at hand is that the following block
>                 oldestOffsetKnown =
>                         find_multixact_start(oldestMultiXactId, &oldestOffset);
>
> ...
>         else if (prevOldestOffsetKnown)
>         {
>                 /*
>                  * If we failed to get the oldest offset this time, but we have a
>                  * value from a previous pass through this function, use the old value
>                  * rather than automatically forcing it.
>                  */
>                 oldestOffset = prevOldestOffset;
>                 oldestOffsetKnown = true;
>         }
> in SetOffsetVacuumLimit() fails to restore offsetStopLimit, which then
> is set in shared memory:
>         /* Install the computed values */
>         LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
>         MultiXactState->oldestOffset = oldestOffset;
>         MultiXactState->oldestOffsetKnown = oldestOffsetKnown;
>         MultiXactState->offsetStopLimit = offsetStopLimit;
>         LWLockRelease(MultiXactGenLock);
>
> so, if find_multixact_start() failed - a "should never happen" occurance
> - we install a wrong stop limit. It does get 'repaired' upon the next
> suceeding find_multixact_start() in SetOffsetVacuumLimit() or a restart
> though.
>
> Adding a 'prevOffsetStopLimit' and using it seems like a ~5 line patch.

So let's do that, then.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Rework the way multixact truncations work

From

Robert Haas

Date:

11 December 2015, 16:20:29

On Thu, Dec 10, 2015 at 9:32 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Dec 10, 2015 at 9:04 AM, Andres Freund <andres@anarazel.de> wrote:
>> On 2015-12-10 08:55:54 -0500, Robert Haas wrote:
>>> Maybe.  But I think we could use a little more vigorous discussion of
>>> that issue, since Andres doesn't seem to be very convinced by your
>>> analysis, and I don't currently understand what you've fixed because I
>>> can't, as mentioned several times, follow your patch stack.
>>
>> The issue at hand is that the following block
>>                 oldestOffsetKnown =
>>                         find_multixact_start(oldestMultiXactId, &oldestOffset);
>>
>> ...
>>         else if (prevOldestOffsetKnown)
>>         {
>>                 /*
>>                  * If we failed to get the oldest offset this time, but we have a
>>                  * value from a previous pass through this function, use the old value
>>                  * rather than automatically forcing it.
>>                  */
>>                 oldestOffset = prevOldestOffset;
>>                 oldestOffsetKnown = true;
>>         }
>> in SetOffsetVacuumLimit() fails to restore offsetStopLimit, which then
>> is set in shared memory:
>>         /* Install the computed values */
>>         LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
>>         MultiXactState->oldestOffset = oldestOffset;
>>         MultiXactState->oldestOffsetKnown = oldestOffsetKnown;
>>         MultiXactState->offsetStopLimit = offsetStopLimit;
>>         LWLockRelease(MultiXactGenLock);
>>
>> so, if find_multixact_start() failed - a "should never happen" occurance
>> - we install a wrong stop limit. It does get 'repaired' upon the next
>> suceeding find_multixact_start() in SetOffsetVacuumLimit() or a restart
>> though.
>>
>> Adding a 'prevOffsetStopLimit' and using it seems like a ~5 line patch.
>
> So let's do that, then.

Who is going to take care of this?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Rework the way multixact truncations work

From

Noah Misch

Date:

12 December 2015, 01:22:49

On Thu, Dec 10, 2015 at 08:55:54AM -0500, Robert Haas wrote:
> I don't know have anything to add to what others have said in response
> to this point, except this: the whole point of using a source code
> management system is to tell you what changed and when.  What you are
> proposing to do makes it unusable for that purpose.

Based on your comments, I'm calling the patch series returned with feedback.
I built the series around the goal of making history maximally reviewable for
persons not insiders to commit 4f627f8.  Having spent 90% of my 2015
PostgreSQL contribution time finding or fixing committed defects, my judgment
of how best to achieve that is no shout from the peanut gallery.  (Neither is
your judgment.)  In particular, I had in view two works, RLS and pg_audit,
that used the post-commit repair strategy you've advocated.  But you gave me a
fair chance to make the case, and you stayed convinced that my repairs oppose
my goal.  I can now follow your development of that belief, which is enough.

Re: Rework the way multixact truncations work

From

Andres Freund

Date:

12 December 2015, 17:02:20

Noah, Robert, All

On 2015-12-11 11:20:21 -0500, Robert Haas wrote:
> On Thu, Dec 10, 2015 at 9:32 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> >> Adding a 'prevOffsetStopLimit' and using it seems like a ~5 line patch.
> >
> > So let's do that, then.
>
> Who is going to take care of this?

Here's two patches:

1) The fix to SetOffsetVacuumLimit().

   I've tested this by introducing a probabilistic "return false;" to
   find_multixact_start(), and torturing postgres by burning through
   billions of multixactids of various sizes.  Behaves about as
   bad^Wgood as without the induced failures; before the patch there
   were moments of spurious warnings/errors when ->offsetStopLimit was
   out of whack.

2) A subset of the comment changes from Noah's repository. Some of the
   comment changes didn't make sense without the removal
   SlruScanDirCbFindEarliest(), a small number of other changes I couldn't
   fully agree with.

   Noah, are you ok with pushing that subset of your changes? Is
   "Slightly edited subset of a larger patchset by Noah Misch" an OK
   attribution?

Noah, on a first glance I think e50cca0ae ("Remove the
SlruScanDirCbFindEarliest() test from TruncateMultiXact().") is a good
idea. So I do encourage you to submit that as a separate patch.

Regards,

Andres

Attachment

Re: Rework the way multixact truncations work

From

Robert Haas

Date:

13 December 2015, 02:28:30

On Sat, Dec 12, 2015 at 12:02 PM, Andres Freund <andres@anarazel.de> wrote:
> Noah, Robert, All
>
> On 2015-12-11 11:20:21 -0500, Robert Haas wrote:
>> On Thu, Dec 10, 2015 at 9:32 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> >> Adding a 'prevOffsetStopLimit' and using it seems like a ~5 line patch.
>> >
>> > So let's do that, then.
>>
>> Who is going to take care of this?
>
> Here's two patches:
>
> 1) The fix to SetOffsetVacuumLimit().
>
>    I've tested this by introducing a probabilistic "return false;" to
>    find_multixact_start(), and torturing postgres by burning through
>    billions of multixactids of various sizes.  Behaves about as
>    bad^Wgood as without the induced failures; before the patch there
>    were moments of spurious warnings/errors when ->offsetStopLimit was
>    out of whack.

I find the commit message you wrote a little difficult to read, and
propose the following version instead, which reads better to me:

Previously, if find_multixact_start() failed, SetOffsetVacuumLimit()
would install 0 into MultiXactState->offsetStopLimit.  Luckily, there
are no known cases where find_multixact_start() will return an error
in 9.5 and above. But if it were to happen, for example due to
filesystem permission issues, it'd be somewhat bad:
GetNewMultiXactId() could continue allocating mxids even if close to a
wraparound, or it could erroneously stop allocating mxids, even if no
wraparound is looming.  The wrong value would be corrected the next
time SetOffsetVacuumLimit() is called, or by a restart.

(I have no comments on the substance of either patch and have reviewed
the first one to a negligible degree - it doesn't look obviously wrong
- and the second one not at all.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company