Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1 - Mailing list pgsql-hackers

From Andres Freund
Subject Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1
Date
Msg-id 20150608131504.GH24997@alap3.anarazel.de
Whole thread Raw
In response to Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1
Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1
List pgsql-hackers
On 2015-06-05 16:56:18 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On June 5, 2015 10:02:37 PM GMT+02:00, Robert Haas <robertmhaas@gmail.com> wrote:
> >> I think we would be foolish to rush that part into the tree.  We
> >> probably got here in the first place by rushing the last round of
> >> fixes too much; let's try not to double down on that mistake.
>
> > My problem with that approach is that I think the code has gotten significantly more complex in the least few
weeks.I have very little trust that the interactions between vacuum, the deferred truncations in the checkpointer, the
statemanagement in shared memory and recovery are correct. There's just too many non-local subtleties here. 
>
> > I don't know what the right thing to do here is.
>
> My gut feeling is that rushing to make a release date is the wrong thing.
>
> If we have confidence that we can ship something on Monday that is
> materially more trustworthy than the current releases, then let's aim to
> do that; but let's ship only patches we are confident in.  We can do
> another set of releases later that incorporate additional fixes.  (As some
> wise man once said, there's always another bug.)

I've tortured hardware a fair bit with HEAD. So far it looks much better
than 9.4.2+ et al. I've noticed a bunch of, to me at least, new issues:

1) the autovacuum trigger logic isn't perfect yet. I.e. especially with
  autovacuum=off you can get into situations where emergency vacuums
  aren't started when necessary. This is particularly likely to happen
  if either very large multixacts are used, or if the server has been
  shut down while emergency autovacuum where happening. No corruption
  ensues, but it's not easy to get out of.

2) I've managed to corrupt a cluster when a standby performed
  restartpoints less frequently than the master performed
  checkpoints. Because truncations happen in the checkpointer it's not
  that hard to end up with entirely full multixact slrus. This is a
  problem on several fronts. We can IIUC end up truncating away the
  wrong data, and we can be in a bad state upon promotion.  None of that
  is new.

3) It's really confusing that truncation (and thus the limits in shared
  memory) happens in checkpoints. If you hit a limit and manually do all
  the necessary vacuums you'll see a "good" limit in
  pg_database.datminmxid, but you'll still into the error. You manually
  have to force a checkpoint for the truncation to actually
  happen. That's particularly problematic because larger installations,
  where I presume wraparound issues are more likely, often have a large
  checkpoint_timeout setting.

Since none of these are really new, I don't think they should prevent us
from doing a back branch release. While I'm still not convinced we're
better of with 9.4.4 than with 9.4.1, we're certainly better of than
with 9.4.[23] et al.

If we want to go ahead with the release I plan to do a bit more testing
today and tomorrow. If not I'm first going to continue working on fixing
the above.

I've a "good" fix for 1). I'm not 100% sure I'll feel confident with
pushing if we wrap today. I am wondering if we shouldn't at least apply
the portion that unconditionally sends a signal in the ERROR
case. That's still an improvement.


One more thing:
Our testing infrastructure sucks. Without writing C code it's basically
impossible to test wraparounds and such. Even if not particularly useful
for non-devs, I really think we should have functions for creating
burning xids/multixacts in core. Or at least in some extension.


pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: Re: [COMMITTERS] pgsql: Map basebackup tablespaces using a tablespace_map file
Next
From: Geoff Winkless
Date:
Subject: Re: [CORE] Restore-reliability mode