Summary and Plan for Hot Standby - Mailing list pgsql-hackers

After some time thinking about the best way forward for Hot Standby, I
have some observations and proposals.

First, the project is very large. We have agreed ways to trim the patch,
yet it remains large. Trying to do everything in one lump is almost
always a bad plan, so we need to phase things.

Second, everybody is keen that HS hits the tree, so we can have alpha
code etc. There are a few remaining issues that should *not* be rushed.
The only way to stop those issues from holding up the commit is to
decouple parts of the project.

Third, testing the patch is difficult and continuous change makes it
harder to guarantee everything is working.

There are two remaining areas of significant thought/effort:

* Issues relating to handling of prepared transactions
* How fast Hot Standby mode is enabled in the standby

I propose that we stabilise and eventually commit a version of HS that
circumvents/defers those issues and then address the issues with
separate patches afterwards. This approach will allow us to isolate the
areas of further change so we can have a test blitz to remove silly
mistakes, then follow it with a commit to CVS, and then release as Alpha
to allow further testing.

Let's look at the two areas of difficulty in more detail:

* Issues relating to handling of prepared transactions
There are some delicate issues surrounding what happens at the end of
recovery if there is a prepared transaction still holding an access
exclusive lock. It is straightforward to say, as an interim measure,
"Hot Standby will not work with max_prepared_transactions > 0". I see
that the underlying problem has a fiddly, yet fairly clear solution.
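
As a rough illustration of that interim rule, here is a minimal sketch in
Python rather than the server's C. The function name and the
hot_standby_requested flag are hypothetical and are not taken from the
patch; only the max_prepared_transactions setting comes from the text
above.

```python
def hot_standby_allowed(hot_standby_requested: bool,
                        max_prepared_transactions: int) -> bool:
    """Toy model of the interim restriction (hypothetical helper).

    Hot Standby is only allowed when prepared transactions are disabled,
    i.e. when max_prepared_transactions is zero.
    """
    if max_prepared_transactions < 0:
        raise ValueError("max_prepared_transactions cannot be negative")
    if not hot_standby_requested:
        # Nothing to allow: plain recovery is unaffected by the rule.
        return False
    return max_prepared_transactions == 0


if __name__ == "__main__":
    for mpt in (0, 5):
        print(mpt, hot_standby_allowed(True, mpt))
```

The point of the sketch is only that the restriction is a single check at
startup, which is why it is cheap to adopt as an interim measure.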

* How fast Hot Standby mode is enabled in the standby
We need to have full snapshot information on the standby before we can
allow connections and queries. There are two basic approaches: i) we
wait until we *know* we have full info or ii) we try to collect data and
inject a correct starting condition. Waiting (i) may take a while, but
is clean and requires only a few lines of code. Injecting the starting
condition (ii) requires boatloads of hectic code and we have been unable
to agree a way forwards. If we did have that code, all it would give us
is a faster/more reliable starting point for connections on the standby.
Until we can make approach (ii) work, we should just rely on the easy
approach (i). In many cases, the starting point is very similar. (In
some cases we can actually make (i) faster because the overhead of data
collection forces us to derive the starting conditions minutes apart.)
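
To make approach (i) concrete, here is a toy model in Python. Everything
in it is an assumption made for illustration: the record kinds, the idea
that the primary emits a record listing all in-progress transactions, and
the class and method names. It is not how the patch does it; it only shows
why waiting needs so little logic.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStandby:
    """Hypothetical standby that waits for full snapshot information."""
    snapshot_complete: bool = False
    running_xids: set = field(default_factory=set)

    def replay(self, record: dict) -> None:
        kind = record["kind"]
        if kind == "running_xacts":
            # Assumed record type: lists every transaction in progress
            # on the primary, so from here on we *know* the full state.
            self.running_xids = set(record["xids"])
            self.snapshot_complete = True
        elif kind in ("commit", "abort"):
            # A finished transaction is no longer running.
            self.running_xids.discard(record["xid"])

    def can_accept_connections(self) -> bool:
        # Approach (i): refuse queries until the state is known.
        return self.snapshot_complete


if __name__ == "__main__":
    standby = ToyStandby()
    standby.replay({"kind": "commit", "xid": 100})
    print(standby.can_accept_connections())
    standby.replay({"kind": "running_xacts", "xids": [101, 102]})
    print(standby.can_accept_connections())
```

In this model the cost of approach (i) is purely the wait for the first
record that gives complete information; no state has to be collected or
injected beforehand.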

Phasing the commit seems like the only way.

Please can we agree a way forwards?

-- Simon Riggs           www.2ndQuadrant.com


