Re: Deriving Recovery Snapshots - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Deriving Recovery Snapshots
Msg-id 48FF3635.4000109@enterprisedb.com
In response to Re: Deriving Recovery Snapshots  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
Simon Riggs wrote:
> On Wed, 2008-10-22 at 12:29 +0300, Heikki Linnakangas wrote:
> 
>> How about:
>>
>> 1. Keep all transactions and subtransactions in UnobservedXids.
>> 2. If it fills up, remove all subtransactions from it, that the startup 
>> process knows to be subtransactions and knows the parents, and update 
>> subtrans. Mark the array as overflowed.
>>
>> To take a snapshot, a backend simply copies UnobservedXids array and the 
>> flag. If it hasn't overflowed, a transaction is considered to be in 
>> progress if it's in the array. If it has overflowed, and the xid is not 
>> in the array, check subtrans
> 
> We can't check subtrans. We do not have any record of what the parent is
> for an unobserved transaction id. So the complete list of unobserved
> xids *must* be added to the snapshot. If that makes snapshot overflow,
> we have a big problem: we would be forced to say "sorry snapshot cannot
> be issued at this time, please wait". Ugh!

That's why we still need the occasional WAL logging in 
AssignTransactionId(): to log the parent-child relationships of the 
subtransactions.

>> For the startup process to know about the parent-child relationships, 
>> we'll need something like WAL changes you suggested. I'm not too 
>> thrilled about adding a new field to all WAL records. Seems simpler to 
>> just rely on the new WAL records on AssignTransactionId(), and we can 
>> only do it, say, every 100 subtransactions, if we make the 
>> UnobservedXids array big enough (100*max_connections).
> 
> Yes, we can make the UnobservedXids array bigger, but only to the point
> where it will all fit within a snapshot.

The list of xids in a snapshot is just a palloc'd array, in 
backend-local memory, so we can easily make it as large as we need to.

> Every new subxid needs to specify its parent's xid. We must supply that
> information somehow: either via an XLOG_XACT_ASSIGNMENT, or as I have
> done in most cases, tuck that into the wasted space on the xlrec.
> Writing a WAL record every 100 subtransactions will not work: we need to
> write to subtrans *before* that xid appears anywhere on disk, so that
> visibility tests can determine the status of the transaction.

I don't follow. It doesn't need to be in subtrans before it appears on 
disk, AFAICS. It can be stored in UnobservedXids at first, and when it 
overflows, we can update subtrans and remove the entries from 
UnobservedXids. A snapshot taken before the overflow will have the 
subxid in its copy of UnobservedXids, and one taken after overflow will 
find it in subtrans.
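
To make the overflow handling concrete, here is a minimal sketch, not 
actual patch code. The struct layout, the function names, the tiny 
capacity and the toy_subtrans array standing in for pg_subtrans are all 
made up for illustration. It also assumes that "check subtrans" means 
walking up to the top-level xid and testing whether that is still in the 
snapshot's copy of the array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Xid;              /* stand-in for TransactionId */
#define INVALID_XID ((Xid) 0)
#define UNOBSERVED_CAPACITY 8      /* hypothetical; tiny on purpose */
#define SUBTRANS_SLOTS 64          /* toy subtrans: parent indexed by xid */

typedef struct
{
    Xid    xids[UNOBSERVED_CAPACITY];
    Xid    parents[UNOBSERVED_CAPACITY]; /* INVALID_XID: top-level or unknown */
    size_t n;
    bool   overflowed;
} UnobservedXids;

/* Toy replacement for pg_subtrans: parent of each xid, or INVALID_XID. */
static Xid toy_subtrans[SUBTRANS_SLOTS];

static bool
in_array(const UnobservedXids *u, Xid xid)
{
    for (size_t i = 0; i < u->n; i++)
        if (u->xids[i] == xid)
            return true;
    return false;
}

/* On overflow: move subxids with a known parent into subtrans. */
static void
spill_known_subxacts(UnobservedXids *u)
{
    size_t keep = 0;

    for (size_t i = 0; i < u->n; i++)
    {
        if (u->parents[i] != INVALID_XID)
        {
            assert(u->xids[i] < SUBTRANS_SLOTS);
            toy_subtrans[u->xids[i]] = u->parents[i];
        }
        else
        {
            u->xids[keep] = u->xids[i];
            u->parents[keep] = INVALID_XID;
            keep++;
        }
    }
    u->n = keep;
    u->overflowed = true;
}

/* Startup process side: returns false if the array is full even after spilling. */
bool
unobserved_add(UnobservedXids *u, Xid xid, Xid parent)
{
    if (u->n == UNOBSERVED_CAPACITY)
        spill_known_subxacts(u);
    if (u->n == UNOBSERVED_CAPACITY)
        return false;
    u->xids[u->n] = xid;
    u->parents[u->n] = parent;
    u->n++;
    return true;
}

/* Backend side: snap is a plain copy of the shared array plus the flag. */
bool
xid_in_progress(const UnobservedXids *snap, Xid xid)
{
    if (in_array(snap, xid))
        return true;
    if (!snap->overflowed)
        return false;
    /* Overflowed: map the xid to its top-level parent via subtrans. */
    while (xid < SUBTRANS_SLOTS && toy_subtrans[xid] != INVALID_XID)
        xid = toy_subtrans[xid];
    return in_array(snap, xid);
}
```

Taking a snapshot is then just a struct copy (UnobservedXids snap = 
shared;). A copy taken before the spill still lists the subxid directly, 
and a copy taken afterwards reaches it through toy_subtrans.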

If UnobservedXids is large enough to hold, say, 100 * max_connections 
xids, and we write a WAL record containing the parent-child 
relationships every 100 assigned subtransactions within a top-level 
transaction, then the top-level transactions and those subtransactions 
whose parent we don't know will always fit into UnobservedXids.
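
The sizing argument can be checked with a small counting sketch. The 
function names are hypothetical. It assumes the assignment record is 
written at the moment the 100th subxid is assigned, that it covers every 
subxid since the previous record, and that each connection runs at most 
one top-level transaction.

```c
#include <stddef.h>

/*
 * Peak number of array slots one top-level transaction can pin when
 * parent-child relationships are logged every `batch` subxids.
 * `batch` must be greater than zero.
 */
size_t
peak_slots_per_xact(size_t nsubxacts, size_t batch)
{
    size_t unknown = 0;   /* subxids whose parent is not yet logged */
    size_t peak = 1;      /* the top-level xid itself */

    for (size_t i = 1; i <= nsubxacts; i++)
    {
        if (i % batch == 0)
            unknown = 0;  /* record logged: all parents now known */
        else
            unknown++;
        if (1 + unknown > peak)
            peak = 1 + unknown;
    }
    return peak;
}

/* Array size that covers the worst case across all connections. */
size_t
required_slots(size_t max_connections, size_t batch)
{
    return max_connections * batch;
}
```

With a batch of 100, a transaction never pins more than 100 slots (the 
top-level xid plus at most 99 subxids with unknown parents), however 
many subtransactions it assigns. That is where 100 * max_connections 
comes from.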

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com

