Re: SSI and Hot Standby - Mailing list pgsql-hackers

From Florian Pflug
Subject Re: SSI and Hot Standby
Date
Msg-id 876DA065-D013-4364-836E-3870D9986D34@phlo.org
Whole thread Raw
In response to Re: SSI and Hot Standby  (Anssi Kääriäinen <anssi.kaariainen@thl.fi>)
Responses Re: SSI and Hot Standby  (Anssi Kääriäinen <anssi.kaariainen@thl.fi>)
List pgsql-hackers
On Jan21, 2011, at 12:55 , Anssi Kääriäinen wrote:
> On 01/21/2011 03:25 AM, Florian Pflug wrote:
>> The COMMIT order in the actual, concurrent, schedule doesn't not necessarily
>> represent the order of the transaction in an equivalent serial schedule. Here's
>> an example
>>
>> T1: BEGIN SERIALIZABLE; -- (Assume snapshot is set here)
>> T1: UPDATE D1 ... ;
>> T2: BEGIN SERIALIZABLE; -- (Assume snapshot is set here)
>> T2: SELECT * FROM D1 ... ;
>> T2: UPDATE D2 ... ;
>> T1: COMMIT;
>> T3: SELECT * FROM D1, D2;
>> T2: COMMIT;
>>
>> Now, the COMMIT order is T1, T3, T2. Lets check if there is a equivalent
>> serial schedule. In any such schedule
>>
>> T2 must run before T1 because T2 didn't see T1's changes to D1
>> T3 must run after T1 because T3 did see T1's changes to D1
>> T3 must run before T2 because T3 didn't see T2's changes to D2
>>
>> This is obviously impossible - if T3 runs before T2 and T2 runs before T1
>> then T3 runs before T1, contradicting the second requirement. There is thus
>> no equivalent serial schedule and we must abort of these transactions with
>> a serialization error.
>>
>> Note that aborting T3 is sufficient, even though T3 is READ ONLY!. With T3 gone,
>> an equivalent serial schedule is T2,T1!

> Sorry for bothering all of you, but I just don't get this. What if T2 rolls back instead of committing? Then the
snapshotof T3 would have been valid, right?  

Yeah. If T2 is removed from the picture, the only remaining ordering constraint is "T3 must run after T1 because T3 did
seeT1's changes to D1", and thus T1,T3 is an equivalent serial schedule. 

> Now, for the snapshot of T3 it doesn't matter if T2 commits or if it doesn't, because it can't see the changes of T2
inany case. Thus, it would seem that the snapshot is valid. On the other hand I can't see anything wrong in the logic
inyour post. What am I missing? I am feeling stupid...   

The problem is that T3 sees the effects of T1 but not those of T2. Since T2 must run *before* T1 in any equivalent
serialschedule, that is impossible. In other words, if you look at an equivalent serial schedule of the *writing*
transactionsT1 and T2 you won't find a place to insert T3 such that it gives the same answer as in the concurrent
schedule.

It isn't really T3's snapshot that is invalid, it's the interleaving of T1,T2,T3 because there is no equivalent serial
schedule(a serial schedule the produces the same results). If, for example T3 reads only *one* of the tables D1,D2 then
thewhole thing suddenly *is* serializable! If T3 reads only D1 an equivalent serial schedule must run T3 after T1, and
ifit reads only D2 then it must run before T2. 

That "validity" of snapshots comes into play if you attempt to distinguish safe and unsafe interleaved schedules
*without*taking the dataset inspected by T3 into account. So you simply assume that T3 reads the *whole* database
(sincethats the worst case), and must thus run *after* all transactions I didn't see as COMMITTED in any serial
schedule.

The root of the whole issue is that this might not be possible! Some not-yet-committed transaction (T2 in the example)
mayhave to be placed *before* some transaction seen as COMMITTED by T3 (T1 is our example). Since T3 needs to run
*after*T1 (since it saw it as committed) it'd thus also see T2 in any serial schedule. But it didn't see T2 in the
interleavedschedule, we're hosed. 

> At least for dumps I don't see how T2 can matter (assuming T3 is the pg_dump's snapshot). Because if you reload from
thedump, T2 never happened in that dump. In the reloaded database it just did not exist at all. 

Still, the would dump reflects a database state that *logically* never existed (i.e. not in any serial schedule). If
youdump for disaster recovery, you might not care. If you dump to copy the data onto some reporting server you might. 

best regards,
Florian Pflug



pgsql-hackers by date:

Previous
From: Florian Pflug
Date:
Subject: Re: Error code for "terminating connection due to conflict with recovery"
Next
From: Florian Pflug
Date:
Subject: Re: REVIEW: EXPLAIN and nfiltered