Hi,

On 2013-11-21 20:38:47 +0100, Andres Freund wrote:
> Turns out, we don't ever truncate pg_multixact during recovery since
> 9dc842f0832fd71eda826349a0c17ecf8ae93b84 because multixact truncations,
> in contrast to clog, aren't WAL logged themselves. Disabling probably
> was fair game back then since it wasn't too likely to remain in crash
> recovery forever.
> But at the very least since the addition of Hot Standby that's really
> not the case anymore. If I calculate correctly currently you'd end up
> with ~34GB(<9.3)/38GB of pg_multixact which seems a bit much.
>
> I am not 100% sure, but it looks like things could actually continue to
> work despite having an slru wraparound into existing data. But that's
> certainly nothing I'd want to rely on and looks mostly like lucky
> happenstance, especially in 9.3.
>
> If this were a master only issue, I'd say WAL-logging mxact truncation
> would be the way to go, but we can't really do that in the back branches
> since multixact_redo() would throw a fit if we were to introduce a new
> type of wal record and somebody would upgrade a primary first.
>
> So, what I think we need to do is to split StartupMultiXact() into two
> parts, StartupMultiXact() which only sets the offset's, members's
> shared->latest_page_number and TrimMultiXact() which does the remainder
> of the work, executed when finishing crash recovery at the current
> location of StartupMultiXact().
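
For reference, the ~34GB/38GB figure quoted above can be reproduced with
a rough back-of-the-envelope calculation. This is only a sketch under my
own assumptions: 2^32 addressable offsets at 4 bytes each, 2^32
addressable members at 4 bytes each before 9.3 and 5 bytes each (xid
plus one flag byte) in 9.3, with page padding ignored.
pg_multixact_max_bytes() is a made-up name for illustration, not
something that exists in the tree.

```c
#include <stdint.h>

/*
 * Hypothetical helper: upper bound on the size of pg_multixact in bytes
 * if nothing is ever truncated.  Assumes 2^32 offsets of 4 bytes each
 * plus 2^32 members of member_bytes each; page padding is ignored.
 */
static uint64_t
pg_multixact_max_bytes(unsigned member_bytes)
{
	const uint64_t entries = UINT64_C(1) << 32;

	return entries * 4 + entries * member_bytes;
}
```

With member_bytes = 4 this gives roughly 34.4 * 10^9 bytes, with
member_bytes = 5 roughly 38.7 * 10^9 bytes.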

So, I've done this for 9.3+ for now. Testing around that turned up that
our current way of scheduling anti-mxid-wraparound vacuums doesn't really
work:

1) autovacuum.c knows about such vacuums, but vacuum.c doesn't, leading
   to a long cycle of partial vacuums that don't increase relminmxid.
2) Parts of the code used 200 million as a hardcoded constant, others
   used autovacuum_freeze_max_age.
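
To illustrate the kind of check vacuum.c is missing, here is a minimal
standalone sketch, not the code from the patch. The names
mxid_full_scan_limit() and needs_full_scan() and the freeze_table_age
parameter are hypothetical; the point is only that the limit is derived
from one configurable age instead of a hardcoded constant, and that a
relation whose relminmxid is at or behind that limit gets a full-table
scan.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t MultiXactId;

/* Assumption: 0 is the invalid mxid, so 1 is the first usable one. */
#define FIRST_MULTIXACT_ID ((MultiXactId) 1)

/* Hypothetical: oldest mxid a relation may keep without a full scan. */
static MultiXactId
mxid_full_scan_limit(MultiXactId next_mxid, uint32_t freeze_table_age)
{
	MultiXactId limit = next_mxid - freeze_table_age;	/* modulo 2^32 */

	if (limit < FIRST_MULTIXACT_ID)
		limit = FIRST_MULTIXACT_ID;
	return limit;
}

/* Hypothetical: true if relminmxid is at or behind the full-scan limit. */
static bool
needs_full_scan(MultiXactId relminmxid, MultiXactId next_mxid,
				uint32_t freeze_table_age)
{
	MultiXactId limit = mxid_full_scan_limit(next_mxid, freeze_table_age);

	/* wraparound-aware "precedes or equals" comparison */
	return (int32_t) (relminmxid - limit) <= 0;
}
```

For example, with next_mxid = 1000 and freeze_table_age = 500, a relation
with relminmxid = 100 would be scanned fully, while one with
relminmxid = 900 would not.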

0001 fixes the vacuum scheduling and is applicable to 9.3+.

0002 re-adds pg_multixact truncation during crash recovery. The current
code will only work on 9.3+, but if it's deemed acceptable I can
backport it to earlier versions. I am not sure whether it's worth
backporting it to 9.0, given it has neither HS nor SR?

0003 is a debugging-only patch that adds the useful
pg_burn_multixact(num) function to pageinspect (plus some core changes
to make that fast) and allows for low autovacuum_freeze_max_age
settings.

Not sure if it's really worth adding MultiXactIdPrecedesOrEquals() in
0002, but I didn't want the scan_all logic to differ between normal xids
and mxids. I think it'd also be fine to change the logic for xids to use
TransactionIdPrecedes(), but I didn't want to touch that logic
unnecessarily.
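
For anyone not familiar with the comparison functions: the difference
between the two variants is just whether equal values count. A
standalone sketch of the usual modulo-2^32 comparison follows; the
lowercase names are mine, and this is an illustration rather than a copy
of the patch.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t MultiXactId;

/* True if multi1 is logically older than multi2, modulo 2^32. */
static bool
mxid_precedes(MultiXactId multi1, MultiXactId multi2)
{
	int32_t		diff = (int32_t) (multi1 - multi2);

	return diff < 0;
}

/* Same as above, but equal values also count as "precedes". */
static bool
mxid_precedes_or_equals(MultiXactId multi1, MultiXactId multi2)
{
	int32_t		diff = (int32_t) (multi1 - multi2);

	return diff <= 0;
}
```

The only case where the two disagree is multi1 == multi2, which is
exactly the boundary case the scan_all check cares about.
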
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services