Simon Riggs wrote:
> On Wed, 2008-10-22 at 12:29 +0300, Heikki Linnakangas wrote:
>> Simon Riggs wrote:
>>> On Thu, 2008-10-16 at 18:52 +0300, Heikki Linnakangas wrote:
>>>> Simon Riggs wrote:
>>>>> * The backend slot may not be reused for some time, so we should take
>>>>> additional actions to keep state current and true. So we choose to log a
>>>>> snapshot from the master into WAL after each checkpoint. This can then
>>>>> be used to clean up any unobserved xids. It also provides us with our
>>>>> initial state data, see later.
>>>> We don't need to log a complete snapshot, do we? Just oldestxmin should
>>>> be enough.
>>> Possibly, but you're thinking that once we're up and running we can use
>>> less info.
>>>
>>> Trouble is, you don't know when/if the standby will crash or be shut down.
>>> So we need regular full snapshots to allow it to re-establish full
>>> information at regular points. So we may as well drop the whole snapshot
>>> to WAL every checkpoint. To do otherwise would mean more code and less
>>> flexibility.
>> Surely it's less code to write the OldestXmin to the checkpoint record,
>> rather than a full snapshot, no? And to read it off the checkpoint record.
>
> You may be missing my point.
>
> We need an initial state to work from.
>
> I am proposing we write a full snapshot after each checkpoint because it
> allows us to start recovery again from that point. If we wrote only
> OldestXmin as you suggest it would optimise the size of the WAL record
> but it would prevent us from restarting at that point.

Well, you'd just need to treat any xid > oldestxmin that is not marked as
finished in clog as unobserved. That doesn't seem too bad. Not that
storing the full list of in-progress xids is that bad either, though.

Hmm. What about in-progress subtransactions that have overflowed the
shared memory cache? Can we rely on subtrans being up to date, up to the
checkpoint record that recovery starts from?
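
To make the oldestxmin-only idea concrete, here is a rough sketch of the
rule, not actual PostgreSQL code. The names (XidStatus, unobserved_xids,
the clog dict) are hypothetical. It assumes xids are plain integers with
no wraparound, that the checkpoint also tells us the next xid to be
assigned, and it includes oldestxmin itself in the scan, which is an
assumption on my part. Subtransactions are ignored entirely.

```python
from enum import Enum


class XidStatus(Enum):
    """Simplified stand-in for the per-xid status kept in clog."""
    IN_PROGRESS = 0
    COMMITTED = 1
    ABORTED = 2


def unobserved_xids(oldest_xmin, next_xid, clog):
    """Return the xids in [oldest_xmin, next_xid) not marked finished in clog.

    clog is a hypothetical dict mapping xid -> XidStatus; an xid with no
    entry is treated as still in progress.
    """
    finished = (XidStatus.COMMITTED, XidStatus.ABORTED)
    return [
        xid
        for xid in range(oldest_xmin, next_xid)
        if clog.get(xid, XidStatus.IN_PROGRESS) not in finished
    ]


# Example state as it might look at the checkpoint recovery starts from.
clog = {
    100: XidStatus.COMMITTED,
    101: XidStatus.IN_PROGRESS,
    102: XidStatus.ABORTED,
    104: XidStatus.COMMITTED,
}
print(unobserved_xids(100, 106, clog))  # [101, 103, 105]
```

Anything the sketch returns would be treated as unobserved until a commit
or abort record for it is replayed. The cost is a clog lookup per xid in
the range, versus reading a ready-made list from a full snapshot.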

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com