Re: Multixid hindsight design - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Multixid hindsight design
Date
Msg-id CANP8+jKbUmvupJTfyVh3KbqG4PErkJbAiu39ceio4AkYjZWYQA@mail.gmail.com
Whole thread Raw
In response to Multixid hindsight design  (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses Re: Multixid hindsight design  (Robert Haas <robertmhaas@gmail.com>)
Re: Multixid hindsight design  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On 11 May 2015 at 22:20, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
I'd like to discuss how we should've implemented the infamous 9.3 multixid/row-locking stuff, and perhaps still should in 9.6. Hindsight is always 20/20 - I'll readily admit that I didn't understand the problems until well after the release - so this isn't meant to bash what's been done. Rather, let's think of the future.

The main problem with the infamous multixid changes was that it made pg_multixact a permanent, critical, piece of data. Without it, you cannot decipher whether some rows have been deleted or not. The 9.3 changes uncovered pre-existing issues with vacuuming and wraparound, but the fact that multixids are now critical turned those the otherwise relatively harmless bugs into data loss.

We have pg_clog, which is a similar critical data structure. That's a pain too - you need VACUUM and you can't easily move tables from one cluster to another for example - but we've learned to live with it. But we certainly don't need any more such data structures.

So the lesson here is that having a permanent pg_multixact is not nice, and we should get rid of it. Here's how to do that:

I think we should think back to exactly what we are trying to store, why and for how long. A clear definition of what we are trying to achieve is essential to solving the problem.

In my understanding we need to store
* at most one xid - the Updating Xid
* potentially many Locking Xids

The current design has two SLRUs and puts all of those xids in the Members SLRU, causing it to need to be persistent.

The problems come from having significant numbers of locking xids. My understanding is that any change in the number of lockers requires the full array to be rewritten. So with N lockers we end up with 2N-1 arrays, each array has an average of N/2 members, or N^2  entries, i.e. an O(N^2) algorithm, which makes it a bad thing to persist. Assuming that design continues mostly unchanged in its core points...

An alternate proposal:

1. Store only the Locking xids in the Members SLRU
2. In the Offsets SLRU store: 1) the Updating Xid and 2) the offset to the Locking xids in the Members SLRU. 

This means the Offsets SLRU will be around twice the size it was before BUT since we reduce the size of each Members array by one, there is a balanced saving there, so this change is disk-space-neutral.

That way if we need to make Offsets SLRU persistent it won't bloat.
We then leave the Members SLRU as non-persistent, just as it was <9.3

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: Multixid hindsight design
Next
From: Simon Riggs
Date:
Subject: Re: Multixid hindsight design