On Wed, Jan 20, 2016 at 2:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I think a lesson we *should* have learned by now is that we need to put
> more emphasis on testing. That includes not only spending more time on
> it, but investing more in testing infrastructure. The buildfarm has been
> a huge advance in our ability to find/fix portability issues quickly, but
> it does nothing to, say, assess crash safety. Or identify performance
> regressions.
I might add: making testability an explicit goal of development from
the very beginning, during the design of a feature. In essence,
reducing the surface area for bugs when (not if) they happen, by
making it easy to simulate various stresses. Or better yet, making
sure that each mechanism modified has very little chance of
interacting with some other complicated mechanism in production.
Things that happen only once every six months on many production
systems seemed to be where MultiXacts kept giving trouble.
My pet example is how UPSERT doesn't change anything about the
on-disk representation of heap tuples unless their xact aborts.
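
To make that concrete, here's the sort of minimal crash-recovery
stress loop I have in mind. It's only a sketch: the PGDATA/PGBIN
paths and the pgbench workload are placeholders for illustration,
not anything we actually ship.

#!/usr/bin/env python3
# Illustrative sketch only: repeatedly crash a throwaway cluster
# mid-workload and check that it recovers.  PGDATA/PGBIN and the
# pgbench workload are placeholders, not an actual test suite.
import random
import subprocess
import time

PGDATA = "/tmp/crashtest-data"      # assumed throwaway cluster
PGBIN = "/usr/local/pgsql/bin"      # assumed install location
DB = "postgres"

def run(cmd, check=True):
    return subprocess.run(cmd, shell=True, check=check,
                          capture_output=True, text=True)

run(f"{PGBIN}/pg_ctl start -D {PGDATA} -w")
run(f"{PGBIN}/pgbench -i -s 10 {DB}")

for i in range(10):
    # Kick off a write-heavy workload, then kill the server at a
    # random point; an immediate shutdown exercises crash recovery
    # much like pulling the plug on the postmaster would.
    bench = subprocess.Popen(f"{PGBIN}/pgbench -c 4 -T 60 {DB}",
                             shell=True, stdout=subprocess.DEVNULL,
                             stderr=subprocess.DEVNULL)
    time.sleep(random.uniform(5, 30))
    run(f"{PGBIN}/pg_ctl stop -D {PGDATA} -m immediate")
    bench.wait()

    # Restart (which runs crash recovery) and make sure the data is
    # still queryable and sane.
    run(f"{PGBIN}/pg_ctl start -D {PGDATA} -w")
    out = run(f'{PGBIN}/psql -Atc "SELECT sum(abalance) '
              f'FROM pgbench_accounts" {DB}')
    print(f"iteration {i}: recovered, sum(abalance) = {out.stdout.strip()}")

run(f"{PGBIN}/pg_ctl stop -D {PGDATA}")

The point isn't this particular script, of course; it's that every
nontrivial feature should come with some equally cheap way of
hammering on its recovery path.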
> As a concrete example, I recall that Heikki or someone had a tool for
> checking WAL replay by comparing master and slave disk contents. We
> should make an effort to get that into a state where anyone can use it.
I think that tool originally came from Jeff Janes. Jeff also provided
great testing infrastructure for UPSERT, something I was very
grateful for.
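
A bare-bones sketch of the comparison idea, just to show roughly what
such a tool has to do (the paths are placeholders, and a usable
version would also have to mask things that can legitimately differ
between master and slave, such as hint bits, page LSNs, and unused
space on a page):

#!/usr/bin/env python3
# Illustrative sketch only: compare the relation files of two
# cleanly-shut-down data directories, e.g. a master and a standby
# that has finished replaying its WAL.
import hashlib
import os
import sys

def file_digest(path):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def relation_files(datadir):
    # Only look at relation data under base/; skip pg_xact, pg_wal,
    # and other directories that are expected to differ.
    base = os.path.join(datadir, "base")
    for root, _, files in os.walk(base):
        for name in files:
            full = os.path.join(root, name)
            yield os.path.relpath(full, datadir), full

def compare(primary, standby):
    primary_files = dict(relation_files(primary))
    mismatches = 0
    for rel, path in relation_files(standby):
        other = primary_files.get(rel)
        if other is None:
            print(f"only on standby: {rel}")
            mismatches += 1
        elif file_digest(path) != file_digest(other):
            print(f"contents differ: {rel}")
            mismatches += 1
    return mismatches

if __name__ == "__main__":
    # Usage: compare_clusters.py /path/to/master /path/to/standby
    sys.exit(1 if compare(sys.argv[1], sys.argv[2]) else 0)

Masking the legitimately-different fields is where all the real work
is, which is exactly why it would be worth polishing the existing
tool into something anyone can run.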
--
Peter Geoghegan