Re: SSI and Hot Standby - Mailing list pgsql-hackers

From Florian Pflug
Subject Re: SSI and Hot Standby
Msg-id 28809270-AE80-4F89-9DE5-556C92D9B53D@phlo.org
In response to Re: SSI and Hot Standby  (Simon Riggs <simon@2ndQuadrant.com>)
Responses Re: SSI and Hot Standby  (Anssi Kääriäinen <anssi.kaariainen@thl.fi>)
List pgsql-hackers
On Jan21, 2011, at 01:28 , Simon Riggs wrote:
> What I'm still not clear on is why that HS is different. Whatever rules
> apply on the master must also apply on the standby, immutably. Why is it
> we need to pass explicit snapshot information from master to standby? We
> don't do that, except at startup for normal HS. Why do we need that?

> I hear, but do not yet understand, that the SSI transaction sequence on
> the master may differ from the WAL transaction sequence. Is it important
> that the ordering on the master would differ from the standby?

The COMMIT order in the actual, concurrent schedule does not necessarily
represent the order of the transactions in an equivalent serial schedule. Here's
an example:

T1: BEGIN SERIALIZABLE; -- (Assume snapshot is set here)
T1: UPDATE D1 ... ;
T2: BEGIN SERIALIZABLE; -- (Assume snapshot is set here)
T2: SELECT * FROM D1 ... ;
T2: UPDATE D2 ... ;
T1: COMMIT;
T3: SELECT * FROM D1, D2;
T2: COMMIT;

Now, the COMMIT order is T1, T3, T2. Let's check whether there is an equivalent
serial schedule. In any such schedule:

T2 must run before T1 because T2 didn't see T1's changes to D1
T3 must run after T1 because T3 did see T1's changes to D1
T3 must run before T2 because T3 didn't see T2's changes to D2

This is obviously impossible - if T3 runs before T2 and T2 runs before T1
then T3 runs before T1, contradicting the second requirement. There is thus
no equivalent serial schedule and we must abort one of these transactions with
a serialization error.

Note that aborting T3 is sufficient, even though T3 is READ ONLY! With T3 gone,
an equivalent serial schedule is T2,T1!
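
The argument above can be checked mechanically. What follows is only a minimal
sketch, not anything from the SSI patch: it encodes the three "runs before"
requirements as pairs and brute-forces every ordering of the transactions. The
names CONSTRAINTS and serial_schedules are made up for this illustration.

```python
from itertools import permutations

# "Runs before" requirements taken from the example above.
# (a, b) means: a must precede b in any equivalent serial schedule.
CONSTRAINTS = [
    ("T2", "T1"),  # T2 did not see T1's changes to D1
    ("T1", "T3"),  # T3 did see T1's changes to D1
    ("T3", "T2"),  # T3 did not see T2's changes to D2
]


def serial_schedules(transactions, constraints):
    """Return every ordering of `transactions` that satisfies all
    constraints whose two endpoints are both still present.

    Hypothetical helper for illustration only; brute force is fine
    for three transactions but is not how a real system would do it.
    """
    txns = set(transactions)
    # Constraints involving an aborted (absent) transaction no longer apply.
    active = [(a, b) for a, b in constraints if a in txns and b in txns]
    result = []
    for order in permutations(sorted(txns)):
        pos = {t: i for i, t in enumerate(order)}
        if all(pos[a] < pos[b] for a, b in active):
            result.append(order)
    return result


# All three transactions: the requirements form a cycle.
all_three = serial_schedules(["T1", "T2", "T3"], CONSTRAINTS)
# T3 aborted: only the T2-before-T1 requirement remains.
without_t3 = serial_schedules(["T1", "T2"], CONSTRAINTS)

print(all_three)   # []
print(without_t3)  # [('T2', 'T1')]
```

With all three transactions no ordering survives, and with T3 removed the only
surviving ordering is T2, T1, matching the reasoning above.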

On the master, these "run before" requirements are tracked by remembering which
transaction read which parts of the data via the SIREAD-lock mechanism (these
are more flags than locks, since nobody ever blocks on them).

Since we do not want to report SIREAD locks back to the master, the slave has
to prevent such anomalies another way. Kevin's proposed solution does that by
only using those snapshots on the slave for which reading the *whole* database
is safe. The downside is that whether or not a snapshot is safe can only be
decided after all transactions concurrent with it have finished. The snapshot is
thus always a bit outdated, but it shows a state that is known to be possible in
some serial schedule.

The very same mechanism can also be used on the master by setting the isolation
level to SERIALIZABLE READ ONLY DEFERRABLE.

best regards,
Florian Pflug


