Re: [CORE] postpone next week's release - Mailing list pgsql-hackers

From Andres Freund
Subject Re: [CORE] postpone next week's release
Date
Msg-id 20150604091742.GI18006@awork2.anarazel.de
In response to Re: [CORE] postpone next week's release  (Heikki Linnakangas <hlinnaka@iki.fi>)
List pgsql-hackers
On 2015-06-04 11:51:44 +0300, Heikki Linnakangas wrote:
> I think this explanation is wrong. I agree that there are many places that
> would be good to refactor - like StartupXLOG() - but the multixact code was
> not too bad in that regard. IIRC the patch included some refactoring, it
> added some new helper functions in heapam.c, for example. You can argue that
> it didn't do enough of it, but that was not the big issue.

Yeah, but the bugs were more around the interactions with other parts of
the system, e.g. crash recovery, which is now at about bug 7 or
so. And those are the ones that are hard to understand.

> The big issue was at the architecture level. Basically, we liked vacuuming
> of XIDs and clog so much that we decided that it'd be nice if you had to
> vacuum multixids too, in order to not lose data. Many of the bugs and issues
> were not new - we had multixids before - but we upped the ante and turned
> minor locking bugs into data loss. And that had nothing to do with the code
> structure - we'd have similar issues if we had rewritten everything in Java,
> with the same design.

I think we're probably just using slightly different terms here - for me
one very good way of fixing some structurally bad things *is* improving
the design.

If you look at the bugs around multixacts: The first few were around
ctid-chaining, hard to find and fix because there's about 8-10 places
implementing it with slight differences.  The next bunch were around
vacuuming, some of them oversights, a good bunch of them more
fundamental. Crash recovery wasn't thought about (lack of
testing/review), and more generally the new code tripped over bad old
decisions (hey, wraparound is ok!).  Then there were a bunch of stupid
bugs in crash-recovery (testing mainly), and larger scale bugs (hey, let's
access stuff during recovery).  Then there's the whole row-level locking
code - which is by now among the hardest code to understand in
postgres - and voila, it contained a bunch of oversights that were hard
to spot.

So yes, I think nicer code to work with would have prevented us from
making a significant portion of these. It might have also made us
realize earlier how significant the increase in complexity was.

> So, I'm all for refactoring and adding abstractions where it makes sense,
> but it's not going to solve design problems.

I personally don't really see the multixact changes as being that bad in
terms of overall design. They pretty much just extended an earlier design. Now
that wasn't great, but I don't think too many people had realized that
at that point.  The biggest problem was underestimating the complexity.

Greetings,

Andres Freund


