Home > mailing lists

Re: Speedup twophase transactions - Mailing list pgsql-hackers

From	Michael Paquier
Subject	Re: Speedup twophase transactions
Date	April 7, 2016 09:29:25
Msg-id	CAB7nPqQgwCcCiCTvGsHJyDDiFs-SXVkNc9=Yy94DVfXLm8d7Tw@mail.gmail.com Whole thread Raw
In response to	Re: Speedup twophase transactions (Stas Kelvich <s.kelvich@postgrespro.ru>)
Responses	Re: Speedup twophase transactions (Jesper Pedersen <jesper.pedersen@redhat.com>) Re: Speedup twophase transactions (Stas Kelvich <s.kelvich@postgrespro.ru>) Re: Speedup twophase transactions (Simon Riggs <simon@2ndQuadrant.com>)
List	pgsql-hackers

Tree view

On Wed, Apr 6, 2016 at 6:47 PM, Stas Kelvich <s.kelvich@postgrespro.ru> wrote:
>> On Apr 2, 2016, at 3:14 AM, Michael Paquier <michael.paquier@gmail.com> wrote:
>>
>> On Fri, Apr 1, 2016 at 10:53 PM, Stas Kelvich <s.kelvich@postgrespro.ru> wrote:
>>> I wrote:
>>>> While testing the patch, I found a bug in the recovery conflict code
>>>> path. You can do the following to reproduce it:
>>>> 1) Start a master with a standby
>>>> 2) prepare a transaction on master
>>>> 3) Stop immediate on standby to force replay
>>>> 4) commit prepare transaction on master
>>>> 5) When starting the standby, it remains stuck here:
>>>
>>> Hm, I wasn’t able to reproduce that. Do you mean following scenario or am I missing something?
>>>
>>> (async replication)
>>>
>>> $node_master->psql('postgres', "
>>>        begin;
>>>        insert into t values (1);
>>>        prepare transaction 'x';
>>> ");
>>> $node_slave->teardown_node;
>>> $node_master->psql('postgres',"commit prepared 'x'");
>>> $node_slave->start;
>>> $node_slave->psql('postgres',"select count(*) from pg_prepared_xacts", stdout => \$psql_out);
>>> is($psql_out, '0', "Commit prepared on master while slave is down.");
>>
>> Actually, not exactly, the transaction prepared on master created a
>> table. Sorry for the lack of precisions in my review.
>
> Sorry for delay.
>
> Actually I can’t reproduce that again, tried with following tx:

Well not for me, here are more details, with a test case attached:
* thread #1: tid = 0x50c5b, 0x00007fff93822f96
libsystem_kernel.dylib`poll + 10, queue = 'com.apple.main-thread',
stop reason = signal SIGSTOP
  * frame #0: 0x00007fff93822f96 libsystem_kernel.dylib`poll + 10
    frame #1: 0x00000001023cdda3
postgres`WaitEventSetWaitBlock(set=0x00007fde50858f28, cur_timeout=-1,
occurred_events=0x00007fff5dc87cf8, nevents=1) + 51 at latch.c:1102
    frame #2: 0x00000001023cd786
postgres`WaitEventSetWait(set=0x00007fde50858f28, timeout=-1,
occurred_events=0x00007fff5dc87cf8, nevents=1) + 390 at latch.c:935
    frame #3: 0x00000001023cd227
postgres`WaitLatchOrSocket(latch=0x000000010b9b12e4, wakeEvents=1,
sock=-1, timeout=-1) + 343 at latch.c:347
    frame #4: 0x00000001023cd0ca
postgres`WaitLatch(latch=0x000000010b9b12e4, wakeEvents=1, timeout=0)
+ 42 at latch.c:302
    frame #5: 0x00000001023eb306 postgres`ProcWaitForSignal + 38 at proc.c:1731
    frame #6: 0x00000001023da24b
postgres`ResolveRecoveryConflictWithLock(locktag=LOCKTAG at
0x00007fff5dc87db8) + 187 at standby.c:391
    frame #7: 0x00000001023ea408
postgres`ProcSleep(locallock=0x00007fde5083dac8,
lockMethodTable=0x000000010286e278) + 1128 at proc.c:1215
    frame #8: 0x00000001023e25e6
postgres`WaitOnLock(locallock=0x00007fde5083dac8,
owner=0x0000000000000000) + 358 at lock.c:1703
    frame #9: 0x00000001023e0cf3
postgres`LockAcquireExtended(locktag=0x00007fff5dc88208, lockmode=8,
sessionLock='\x01', dontWait='\0', reportMemoryError='\0') + 2819 at
lock.c:998
    frame #10: 0x00000001023da706
postgres`StandbyAcquireAccessExclusiveLock(xid=867, dbOid=16384,
relOid=16385) + 358 at standby.c:627
    frame #11: 0x00000001023dac6b
postgres`standby_redo(record=0x00007fde50841e38) + 251 at
standby.c:809

(LOCALLOCK) $1 = {
  tag = {
    lock = {
      locktag_field1 = 16384
      locktag_field2 = 16385
      locktag_field3 = 0
      locktag_field4 = 0
      locktag_type = '\0'
      locktag_lockmethodid = '\x01'
    }
    mode = 8
  }

=# select relname, oid from pg_class where oid > 16000;
 relname |  oid
---------+-------
 aa      | 16385
(1 row)
So recovery is conflicting here. My guess is that this patch is
missing some lock cleanup.

With the test case attached in my case the COMMIT PREPARED record does
not even get replayed.
--
Michael

Attachment

twophase_recovery_bug_2.pl

pgsql-hackers by date:

From: Fujii Masao
Date: 07 April 2016, 09:26:52
Subject: Re: Support for N synchronous standby servers - take 2

From: "Karl O. Pinc"
Date: 07 April 2016, 09:30:36
Subject: Re: Patch to implement pg_current_logfile() function

Re: Speedup twophase transactions - Mailing list pgsql-hackers

Attachment

Previous

Next