Thread: SSI and 2PC

SSI and 2PC

From
"Kevin Grittner"
Date:
In going back through old emails to see what issues might have been
raised but not yet addressed for the SSI patch, I found the subject
issue described in a review by Jeff Davis here:
http://archives.postgresql.org/pgsql-hackers/2010-10/msg01159.php
I think this is already handled based on feedback from Heikki:
http://archives.postgresql.org/pgsql-hackers/2010-09/msg00789.php
Basically, the PREPARE TRANSACTION statement plays the same role for
the prepared transaction that COMMIT does for other transactions --
final conflict checking is done and the transaction becomes immune to
later serialization_failure rollback.  A transaction which starts
after PREPARE TRANSACTION executes is not considered to overlap with
the prepared transaction.
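The overlap rule described above can be modeled with logical timestamps. This is a minimal sketch under stated assumptions. `Xact`, `completion_point` and `overlaps` are hypothetical names, and time is a single integer counter. PREPARE TRANSACTION is treated exactly like COMMIT for overlap purposes. That is the position taken in this message; later messages in the thread revisit it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Xact:
    """Hypothetical record of one transaction on a logical clock."""
    name: str
    start: int                          # when the snapshot was taken
    prepared_at: Optional[int] = None   # when PREPARE TRANSACTION ran
    committed_at: Optional[int] = None  # when a plain COMMIT ran

    def completion_point(self) -> Optional[int]:
        # Assumption: for SSI, PREPARE TRANSACTION plays the role that
        # COMMIT plays for an ordinary transaction.
        if self.prepared_at is not None:
            return self.prepared_at
        return self.committed_at


def overlaps(a: Xact, b: Xact) -> bool:
    """True unless one transaction completed before the other started."""
    a_end, b_end = a.completion_point(), b.completion_point()
    a_first = a_end is not None and a_end < b.start
    b_first = b_end is not None and b_end < a.start
    return not (a_first or b_first)


session1 = Xact("Session1", start=1, prepared_at=5)
print(overlaps(session1, Xact("Session2", start=3)))  # True: started before PREPARE
print(overlaps(session1, Xact("Session2", start=6)))  # False: started after PREPARE
```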
In Jeff's example, if Session2 runs a query before Session1
executes PREPARE TRANSACTION, one of them will fail.  (I tested to
make sure.)
Does that sound sane, or is something else needed here?
-Kevin



Re: SSI and 2PC

From
"Kevin Grittner"
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote:
> In going back through old emails to see what issues might have
> been raised but not yet addressed for the SSI patch, I found the
> subject issue described in a review by Jeff Davis here:
>  
> http://archives.postgresql.org/pgsql-hackers/2010-10/msg01159.php
After reviewing the docs and testing things, I'm convinced that more
work is needed.  Because the transaction's writes aren't visible
until COMMIT PREPARED is run, and write-write conflicts are still
causing serialization failures after PREPARE TRANSACTION, some of
the work being done for SSI on PREPARE TRANSACTION needs to be moved
to COMMIT PREPARED.
It seems likely that shops who use prepared transactions are more
likely than most to care about truly serializable transactions, so I
don't think I should write this off as a limitation for the 9.1
implementation.  Unless someone sees some dire problem with the
patch which I've missed, this seems like my top priority to fix
before cutting a patch.
-Kevin


Re: SSI and 2PC

From
David Fetter
Date:
On Mon, Jan 10, 2011 at 08:49:12AM -0600, Kevin Grittner wrote:
> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote:
>  
> > In going back through old emails to see what issues might have
> > been raised but not yet addressed for the SSI patch, I found the
> > subject issue described in a review by Jeff Davis here:
> >  
> > http://archives.postgresql.org/pgsql-hackers/2010-10/msg01159.php
>  
> After reviewing the docs and testing things, I'm convinced that more
> work is needed.  Because the transaction's writes aren't visible
> until COMMIT PREPARED is run, and write-write conflicts are still
> causing serialization failures after PREPARE TRANSACTION, some of
> the work being done for SSI on PREPARE TRANSACTION needs to be moved
> to COMMIT PREPARED.
>  
> It seems likely that shops who use prepared transactions are more
> likely than most to care about truly serializable transactions, so I
> don't think I should write this off as a limitation for the 9.1
> implementation.  Unless someone sees some dire problem with the
> patch which I've missed, this seems like my top priority to fix
> before cutting a patch.

Could people fix it after the patch?  ISTM that a great way to test it
is to make very sure it's available ASAP to a wide range of people via
the next alpha (or beta, if that's where we're going next).

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: SSI and 2PC

From
"Kevin Grittner"
Date:
David Fetter <david@fetter.org> wrote:
> Could people fix it after the patch?  ISTM that a great way to
> test it is to make very sure it's available ASAP to a wide range
> of people via the next alpha (or beta, if that's where we're going
> next).
People can always pull from the git repo:
git://git.postgresql.org/git/users/kgrittn/postgres.git
Also, I can post a patch against HEAD at any time.  Should I post
one now, and then again after this is solved?
Full disclosure requires that I mention that while Dan has completed
code to fix the page split/combine issues Heikki raised, I don't
think he's done testing it.  (It's hard to test because you don't
hit the problem unless you have a page split or combine right at the
point where the hash table for predicate locks becomes full.)  So,
anyway, there could possibly be some wet paint there.
-Kevin


Re: SSI and 2PC

From
David Fetter
Date:
On Mon, Jan 10, 2011 at 08:59:45AM -0600, Kevin Grittner wrote:
> David Fetter <david@fetter.org> wrote:
> > Could people fix it after the patch?  ISTM that a great way to
> > test it is to make very sure it's available ASAP to a wide range
> > of people via the next alpha (or beta, if that's where we're going
> > next).
>  
> People can always pull from the git repo:
>  
> git://git.postgresql.org/git/users/kgrittn/postgres.git
>  
> Also, I can post a patch against HEAD at any time.  Should I post
> one now, and then again after this is solved?
>  
> Full disclosure requires that I mention that while Dan has completed
> code to fix the page split/combine issues Heikki raised, I don't
> think he's done testing it.  (It's hard to test because you don't
> hit the problem unless you have a page split or combine right at the
> point where the hash table for predicate locks becomes full.)  So,
> anyway, there could possibly be some wet paint there.

Short of a test suite that can inject faults at the exact kinds of
places where this occurs and a way to enumerate all those faults,
there's only so much testing that's possible to do /in vitro/.  Oh,
and such enumerations tend to be combinatorial explosions anyhow. :P

At some point, and that point is rapidly approaching if it's not
already here, you've done what you can to shake out bugs and
infelicities, and the next steps are up to people testing alphas,
betas, and to be completely frank, 9.1.0 and possibly later versions.

This is way, way too big a feature to expect you can get a perfect
handle on it by theory alone.

Cheers,
David.


Re: SSI and 2PC

From
"Kevin Grittner"
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote: 
> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote:
>  
>> In going back through old emails to see what issues might have
>> been raised but not yet addressed for the SSI patch, I found the
>> subject issue described in a review by Jeff Davis here:
>>  
>> http://archives.postgresql.org/pgsql-hackers/2010-10/msg01159.php
>  
> After reviewing the docs and testing things, I'm convinced that
> more work is needed.  Because the transaction's writes aren't
> visible until COMMIT PREPARED is run, and write-write conflicts
> are still causing serialization failures after PREPARE
> TRANSACTION, some of the work being done for SSI on PREPARE
> TRANSACTION needs to be moved to COMMIT PREPARED.
I'm now also convinced that Jeff is right in his assessment that
when a transaction is prepared, information about predicate locks
and conflicts with other prepared transactions must be persisted
somewhere.  (Jeff referred to a "2PC state file".)
I'm trying not to panic here, but I haven't looked at 2PC before
yesterday and am just dipping into the code to support it, and time
is short.  Can anyone give me a pointer to anything I should read
before I dig through the 2PC code, which might accelerate this?
-Kevin


Re: SSI and 2PC

From
Jeff Davis
Date:
On Mon, 2011-01-10 at 11:50 -0600, Kevin Grittner wrote:
> I'm trying not to panic here, but I haven't looked at 2PC before
> yesterday and am just dipping into the code to support it, and time
> is short.  Can anyone give me a pointer to anything I should read
> before I dig through the 2PC code, which might accelerate this?

I don't see much about 2PC outside of twophase.c.

Regarding the original post, I agree that we should have two-phase
commit support for SSI. We opted not to support it for
notifications, but there was a fairly reasonable argument why users
wouldn't value the combination of 2PC and NOTIFY.

I don't expect this to be a huge roadblock for the feature though. It
seems fairly contained. I haven't read the 2PC code either, but I don't
expect that you'll need to change the rest of your algorithm just to
support it.

Regards,
Jeff Davis



Re: SSI and 2PC

From
Florian Pflug
Date:
On Jan10, 2011, at 18:50 , Kevin Grittner wrote:
> I'm trying not to panic here, but I haven't looked at 2PC before
> yesterday and am just dipping into the code to support it, and time
> is short.  Can anyone give me a pointer to anything I should read
> before I dig through the 2PC code, which might accelerate this?


It roughly works as follows:

Upon PREPARE, the locks previously held by the transaction are transferred
to a kind of virtual backend which only consists of a special proc array
entry. The transaction will thus still appear to be running, and will still
be holding its locks, even after the original backend is gone. The information
necessary to reconstruct that proc array entry is also written to the 2PC state,
and used to recreate the "virtual backend" after a restart or crash.

There are also some additional pieces of transaction state which are stored
in the 2PC state file, like the full list of subtransaction xids (the proc array
entry may not contain all of them if it overflowed).

Upon COMMIT PREPARED, the information in the 2PC state file is used to write
a COMMIT WAL record and to update the clog. The transaction is then committed,
and the special proc array entry is removed and all lockmgr locks it held are
released.
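The lifecycle described above can be summarized as a toy in-memory model. Everything here is hypothetical: `TwoPhaseManager`, `DummyProc`, the JSON "state file" and the list standing in for WAL are illustrations, not the structures in twophase.c. The real code also has to deal with fsync, WAL replay and lock-manager details that this sketch ignores.

```python
import json
from dataclasses import dataclass, field


@dataclass
class DummyProc:
    """Stands in for the special proc array entry that outlives the backend."""
    gid: str
    xid: int
    locks: set = field(default_factory=set)
    subxids: list = field(default_factory=list)


class TwoPhaseManager:
    def __init__(self):
        self.prepared = {}     # gid -> DummyProc (the "virtual backends")
        self.state_files = {}  # gid -> serialized state (stands in for disk)
        self.clog = {}         # xid -> "committed"
        self.wal = []          # appended log records

    def prepare(self, gid, xid, held_locks, subxids):
        # Locks move from the original backend to the dummy proc entry.
        self.prepared[gid] = DummyProc(gid, xid, set(held_locks), list(subxids))
        # Persist what is needed to rebuild that entry after a crash,
        # including the full subxid list (the proc entry may have overflowed).
        self.state_files[gid] = json.dumps(
            {"xid": xid, "locks": sorted(held_locks), "subxids": list(subxids)})

    def recover(self):
        # After a restart, recreate every dummy proc from its state file.
        self.prepared = {}
        for gid, blob in self.state_files.items():
            s = json.loads(blob)
            self.prepared[gid] = DummyProc(gid, s["xid"], set(s["locks"]), s["subxids"])

    def commit_prepared(self, gid):
        s = json.loads(self.state_files.pop(gid))
        self.wal.append(("COMMIT", s["xid"], s["subxids"]))
        for x in [s["xid"], *s["subxids"]]:
            self.clog[x] = "committed"
        # Removing the dummy proc releases every lock it held.
        return self.prepared.pop(gid).locks


mgr = TwoPhaseManager()
mgr.prepare("gid1", xid=100, held_locks={"relation T"}, subxids=[101, 102])
mgr.prepared.clear()                 # simulate losing shared memory in a crash
mgr.recover()                        # dummy proc rebuilt from the state file
print(mgr.commit_prepared("gid1"))   # {'relation T'}
print(mgr.clog)                      # {100: 'committed', 101: 'committed', 102: 'committed'}
```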

For 2PC to work for SSI transactions, I guess you must check for conflicts
during PREPARE; at any later point the COMMIT may only fail transiently,
not permanently. Any transaction that adds a conflict with an already
prepared transaction must check if that conflict completes a dangerous
structure, and abort if this is the case, since the already PREPAREd transaction
can no longer be aborted. COMMIT PREPARED then probably doesn't need to do
anything special for SSI transactions, apart from maybe some cleanup actions.
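The check proposed above can be sketched with a conflict graph. This assumes the conservative form of SSI detection, in which any transaction with both an incoming and an outgoing rw-antidependency counts as the pivot of a dangerous structure. `ConflictGraph` and its methods are hypothetical names. Since a prepared transaction can no longer be rolled back, the sketch always makes the active transaction that added the completing edge the one to fail.

```python
from collections import defaultdict


class ConflictGraph:
    """Tracks rw-antidependencies (reader -> writer) among serializable xacts."""

    def __init__(self):
        self.out_edges = defaultdict(set)  # reader -> writers it conflicts out to
        self.in_edges = defaultdict(set)   # writer -> readers conflicting in
        self.prepared = set()              # xacts that ran PREPARE TRANSACTION

    def is_pivot(self, x):
        # Conservative test: both an incoming and an outgoing rw-conflict.
        return bool(self.in_edges[x]) and bool(self.out_edges[x])

    def add_rw_conflict(self, reader, writer, adder):
        """Record reader -rw-> writer, detected by the active xact `adder`.

        Returns True if `adder` must abort with a serialization failure.
        """
        assert adder in (reader, writer) and adder not in self.prepared
        self.out_edges[reader].add(writer)
        self.in_edges[writer].add(reader)
        return self.is_pivot(reader) or self.is_pivot(writer)

    def abort(self, x):
        # Drop every edge touching a rolled-back transaction.
        for w in self.out_edges.pop(x, set()):
            self.in_edges[w].discard(x)
        for r in self.in_edges.pop(x, set()):
            self.out_edges[r].discard(x)


graph = ConflictGraph()
graph.prepared.add("T1")                              # T1 has run PREPARE TRANSACTION
print(graph.add_rw_conflict("T2", "T1", adder="T2"))  # False: T2's SELECT misses T1's update
print(graph.add_rw_conflict("T1", "T2", adder="T2"))  # True: T2's UPDATE hits a row T1 read
graph.abort("T2")
```

Replaying the same two conflicts for a retried T2 returns False and then True again, which is the retry problem described next: while T1 stays prepared, the retry hits the same structure.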

Unfortunately, it seems that doing things this way will undermine the guarantee
that retrying a failed SSI transaction won't fail due to the same conflict as
it did originally. Consider

T1> BEGIN TRANSACTION ISOLATION SERIALIZABLE
T1> SELECT * FROM T
T1> UPDATE T ...
T1> PREPARE TRANSACTION

T2> BEGIN TRANSACTION ISOLATION SERIALIZABLE
T2> SELECT * FROM T
T2> UPDATE T ...   -> Serialization Error

Retrying T2 won't help as long as T1 isn't COMMITTED.

There doesn't seem to be a way around that, however; any correct implementation
of 2PC for SSI will have to behave that way, I fear :-(

Hope this helps & best regards,
Florian Pflug



Re: SSI and 2PC

From
"Kevin Grittner"
Date:
Florian Pflug <fgp@phlo.org> wrote:
> On Jan10, 2011, at 18:50 , Kevin Grittner wrote:
>> I'm trying not to panic here, but I haven't looked at 2PC before
>> yesterday and am just dipping into the code to support it, and
>> time is short.  Can anyone give me a pointer to anything I should
>> read before I dig through the 2PC code, which might accelerate
>> this?
> 
> 
> It roughly works as follows
> 
> Upon PREPARE, the locks previously held by the transaction are
> transferred to a kind of virtual backend which only consists of a
> special proc array entry. The transaction will thus still appear
> to be running, and will still be holding its locks, even after the
> original backend is gone. The information necessary to reconstruct
> that proc array entry is also written to the 2PC state, and used
> to recreate the "virtual backend" after a restart or crash.
> 
> There are also some additional pieces of transaction state which
> are stored in the 2PC state file, like the full list of
> subtransaction xids (the proc array entry may not contain all of
> them if it overflowed).
> 
> Upon COMMIT PREPARED, the information in the 2PC state file is
> used to write a COMMIT WAL record and to update the clog. The
> transaction is then committed, and the special proc array entry is
> removed and all lockmgr locks it held are released.
> 
> For 2PC to work for SSI transactions, I guess you must check for
> conflicts during PREPARE; at any later point the COMMIT may only
> fail transiently, not permanently. Any transaction that adds a
> conflict with an already prepared transaction must check if that
> conflict completes a dangerous structure, and abort if this is the
> case, since the already PREPAREd transaction can no longer be
> aborted. COMMIT PREPARED then probably doesn't need to do anything
> special for SSI transactions, apart from some cleanup actions
> maybe.
Thanks; that all makes sense.  The devil, as they say, is in the
details.  As far as I've worked it out, the PREPARE must persist
both the predicate locks and any conflict pointers which are to
other prepared transactions.  That leaves some fussy work around the
coming and going of prepared transactions, because on recovery you
need to be prepared to ignore conflict pointers with prepared
transactions which committed or rolled back.
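The persistence requirement described above can be sketched as a pair of functions. The record layout, the JSON encoding and the names `serialize_ssi_state` and `recover_ssi_state` are hypothetical. Where the real data would live (a 2PC state file, WAL, or both) is exactly the open question in this message. The filtering follows the rule stated above. Only conflict pointers to other prepared transactions are written. On recovery, pointers to prepared transactions that have since committed or rolled back are ignored.

```python
import json


def serialize_ssi_state(xid, predicate_locks, conflicts_out, conflicts_in,
                        prepared_xids):
    """Build a hypothetical SSI record to persist at PREPARE time.

    Keeps all predicate locks, but only those conflict pointers that refer
    to other prepared transactions.
    """
    return json.dumps({
        "xid": xid,
        "predicate_locks": sorted(predicate_locks),
        "conflicts_out": sorted(x for x in conflicts_out if x in prepared_xids),
        "conflicts_in": sorted(x for x in conflicts_in if x in prepared_xids),
    })


def recover_ssi_state(blob, still_prepared):
    """Rebuild the record after a restart, ignoring pointers to prepared
    transactions that have since committed or rolled back."""
    s = json.loads(blob)
    s["conflicts_out"] = [x for x in s["conflicts_out"] if x in still_prepared]
    s["conflicts_in"] = [x for x in s["conflicts_in"] if x in still_prepared]
    s["predicate_locks"] = set(s["predicate_locks"])
    return s


blob = serialize_ssi_state(100, {"T:page1", "T:tuple(1,3)"}, {200, 300}, {400},
                           prepared_xids={200, 400})
state = recover_ssi_state(blob, still_prepared={400})  # xid 200 resolved meanwhile
print(state["conflicts_out"], state["conflicts_in"])   # [] [400]
```

The predicate lock names such as `"T:page1"` are placeholders; the sketch treats lock targets as opaque strings.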
What I haven't found yet is the right place and means to persist and
recover this stuff, but that's just a matter of digging through
enough source code.  Any tips regarding that may save time.  I'm
also not clear on what, if anything, needs to be written to WAL. I'm
really fuzzy on that, still.
> Unfortunately, it seems that doing things this way will undermine
> the guarantee that retrying a failed SSI transaction won't fail
> due to the same conflict as it did originally.
I hadn't thought of that, but you're right.  Of course, I can't
enforce that guarantee, anyway, without some other patch first being
there to allow me to cancel other transactions with
serialization_failure, even if they are "idle in transaction".
> There doesn't seem to be a way around that, however; any correct
> implementation of 2PC for SSI will have to behave that way, I fear
> :-(
I think you're right.
> Hope this helps & best regards,
It does.  Even the parts which just confirm my tentative conclusions
save me time in not feeling like I need to cross-check so much.  I
can move forward with more confidence.  Thanks.
-Kevin


Re: SSI and 2PC

From
"Kevin Grittner"
Date:
Jeff Davis <pgsql@j-davis.com> wrote:
> I don't expect this to be a huge roadblock for the feature though.
> It seems fairly contained. I haven't read the 2PC code either, but
> I don't expect that you'll need to change the rest of your
> algorithm just to support it.
Agreed; but I am starting to get concerned about whether this
particular area can be completed by the start of the CF.  I might
run a few days over on 2PC support.  Unless ... Dan?  Could you look
into this while I chase down the issue Anssi raised?
-Kevin


Re: SSI and 2PC

From
Heikki Linnakangas
Date:
On 11.01.2011 20:08, Florian Pflug wrote:
> Unfortunately, it seems that doing things this way will undermine the guarantee
> that retrying a failed SSI transaction won't fail due to the same conflict as
> it did originally. Consider
>
> T1>  BEGIN TRANSACTION ISOLATION SERIALIZABLE
> T1>  SELECT * FROM T
> T1>  UPDATE T ...
> T1>  PREPARE TRANSACTION
>
> T2>  BEGIN TRANSACTION ISOLATION SERIALIZABLE
> T2>  SELECT * FROM T
> T2>  UPDATE T ...
>      ->  Serialization Error
>
> Retrying T2 won't help as long as T1 isn't COMMITTED.

T2 should block until T1 commits. I would be very surprised if it 
doesn't behave like that already. In general, a prepared transaction 
should be treated like an in-progress transaction - it might still abort 
too.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: SSI and 2PC

From
"Kevin Grittner"
Date:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
> On 11.01.2011 20:08, Florian Pflug wrote:
>> Unfortunately, it seems that doing things this way will undermine
>> the guarantee that retrying a failed SSI transaction won't fail
>> due to the same conflict as it did originally. Consider
>>
>> T1>  BEGIN TRANSACTION ISOLATION SERIALIZABLE
>> T1>  SELECT * FROM T
>> T1>  UPDATE T ...
>> T1>  PREPARE TRANSACTION
>>
>> T2>  BEGIN TRANSACTION ISOLATION SERIALIZABLE
>> T2>  SELECT * FROM T
>> T2>  UPDATE T ...
>>      ->  Serialization Error
>>
>> Retrying T2 won't help as long as T1 isn't COMMITTED.
> 
> T2 should block until T1 commits. I would be very surprised if it 
> doesn't behave like that already. In general, a prepared
> transaction should be treated like an in-progress transaction - it
> might still abort too.
It shouldn't block if the updates were to different rows, which is
what I took Florian to mean; otherwise this would be handled by SI
and would have nothing to do with the SSI patch.  SSI doesn't
introduce any new blocking (with the one exception of the READ ONLY
DEFERRABLE style we invented to support long-running reports and
backups, and all blocking there is at the front -- once it's
running, it's going full speed ahead).
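The distinction above, between same-row blocking under plain snapshot isolation and SSI's non-blocking detection, can be sketched as a toy model with hypothetical names. It assumes PostgreSQL's first-updater-wins behavior under snapshot isolation. A writer that finds a row locked by another transaction waits for it, and fails with a serialization error if that transaction commits. A prepared transaction keeps its row locks, so it is treated like any other in-progress holder.

```python
from typing import Dict, Optional


def update_outcome(row: str, row_locks: Dict[str, str], requester: str,
                   holder_outcome: Optional[str] = None) -> str:
    """Toy model of an UPDATE under snapshot isolation (SSI not involved).

    row_locks maps a row to the transaction currently holding its row lock;
    a prepared transaction keeps its entries until COMMIT/ROLLBACK PREPARED.
    holder_outcome is "committed" or "aborted" once the holder finishes.
    """
    holder = row_locks.get(row)
    if holder is None or holder == requester:
        return "proceed"                 # no same-row conflict: never blocks
    if holder_outcome is None:
        return "wait"                    # holder still running or prepared
    if holder_outcome == "committed":
        return "serialization failure"   # first updater wins
    return "proceed"                     # holder rolled back


locks = {"row1": "T1"}                   # T1 updated row1, then prepared
print(update_outcome("row2", locks, "T2"))               # proceed
print(update_outcome("row1", locks, "T2"))               # wait
print(update_outcome("row1", locks, "T2", "committed"))  # serialization failure
```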
-Kevin


Re: SSI and 2PC

From
Florian Pflug
Date:
On Jan11, 2011, at 19:41 , Heikki Linnakangas wrote:
> On 11.01.2011 20:08, Florian Pflug wrote:
>> Unfortunately, it seems that doing things this way will undermine the guarantee
>> that retrying a failed SSI transaction won't fail due to the same conflict as
>> it did originally. Consider
>>
>> T1>  BEGIN TRANSACTION ISOLATION SERIALIZABLE
>> T1>  SELECT * FROM T
>> T1>  UPDATE T ...
>> T1>  PREPARE TRANSACTION
>>
>> T2>  BEGIN TRANSACTION ISOLATION SERIALIZABLE
>> T2>  SELECT * FROM T
>> T2>  UPDATE T ...
>>     ->  Serialization Error
>>
>> Retrying T2 won't help as long as T1 isn't COMMITTED.
>
> T2 should block until T1 commits.

The serialization error will occur even if T1 and T2 update *different* rows. This is
due to the SELECTs in the interleaved schedule above returning the state of T prior to
both T1 and T2, which is never the case in a serial schedule: there, whichever
transaction runs second would see the other's UPDATE.
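This point can be checked by brute force over serial orders. The model is simplified: the function name is hypothetical, and it looks only at reads against earlier writes, ignoring write ordering. Every transaction's SELECT saw the initial state, so a serial order is acceptable only if no transaction reads a row written by one placed before it. With `SELECT * FROM T` reading every row, no order survives even though the updated rows differ.

```python
from itertools import permutations


def has_equivalent_serial_order(xacts):
    """xacts maps name -> (read_set, write_set); every transaction is assumed
    to have read the initial state, seeing none of the others' writes.

    A serial order matches that only if no transaction reads a row written
    by a transaction placed before it.
    """
    for order in permutations(xacts):
        written = set()
        for name in order:
            reads, writes = xacts[name]
            if reads & written:
                break                # this order would have shown it a write
            written |= writes
        else:
            return True              # every transaction fit this order
    return False


# Both SELECT every row of T; T1 updates r1, T2 updates r2.
schedule = {"T1": ({"r1", "r2"}, {"r1"}), "T2": ({"r1", "r2"}, {"r2"})}
print(has_equivalent_serial_order(schedule))  # False
```

If T2 read only `r2`, the order T1 then T2 would fit and the function would return True.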

best regards,
Florian Pflug



Re: SSI and 2PC

From
Dan Ports
Date:
On Tue, Jan 11, 2011 at 12:34:44PM -0600, Kevin Grittner wrote:
> Agreed; but I am starting to get concerned about whether this
> particular area can be completed by the start of the CF.  I might
> run a few days over on 2PC support.  Unless ... Dan?  Could you look
> into this while I chase down the issue Anssi raised?

I'll take a look at it, but be forewarned that I currently know
extremely little about 2PC in Postgres...

Dan

-- 
Dan R. K. Ports              MIT CSAIL                http://drkp.net/