Thread: [BUGS] BUG #14721: Assertion of synchronous replication

[BUGS] BUG #14721: Assertion of synchronous replication

From
const_sunny@126.com
Date:
The following bug has been logged on the website:

Bug reference:      14721
Logged by:          Const Zhang
Email address:      const_sunny@126.com
PostgreSQL version: 9.6.2
Operating system:   CentOS7
Description:

Hi all!
   I have found a bug about synchronous replication.   At first, see the stack of the core file.

1
(gdb) bt
2
#0 0x00007fe9aab2e1d7 in raise () from /lib64/libc.so.6
3
#1 0x00007fe9aab2f8c8 in abort () from /lib64/libc.so.6
4
#2 0x0000000000af0699 in ExceptionalCondition (conditionName=0xcdc111
"!(SHMQueueIsDetached(&(MyProc->syncRepLinks)))", errorType=0xb6c443
"FailedAssertion",
5
fileName=0xcdc140
"/home/zl/workspace_pg962/postgres/src/backend/replication/syncrep.c",
lineNumber=294) at
/home/zl/workspace_pg962/postgres/src/backend/utils/error/assert.c:54
6
#3 0x00000000008c7e94 in SyncRepWaitForLSN (lsn=50435080, commit=1 '\001')
at /home/zl/workspace_pg962/postgres/src/backend/replication/syncrep.c:294
7
#4 0x000000000056ed11 in RecordTransactionCommit () at
/home/zl/workspace_pg962/postgres/src/backend/access/transam/xact.c:1343
8
#5 0x0000000000568c96 in CommitTransaction () at
/home/zl/workspace_pg962/postgres/src/backend/access/transam/xact.c:2041
9
#6 0x0000000000568717 in CommitTransactionCommand () at
/home/zl/workspace_pg962/postgres/src/backend/access/transam/xact.c:2768
10
#7 0x000000000092eb56 in finish_xact_command () at
/home/zl/workspace_pg962/postgres/src/backend/tcop/postgres.c:2459
11
#8 0x000000000092cb37 in exec_simple_query (query_string=0x25d0de0 "insert
into x values(1,3),(1,4),(1,5),(1,6);") at
/home/zl/workspace_pg962/postgres/src/backend/tcop/postgres.c:1132
12
#9 0x000000000092bdf0 in PostgresMain (argc=1, argv=0x257f308,
dbname=0x257f168 "postgres", username=0x2550e10 "postgres") at
/home/zl/workspace_pg962/postgres/src/backend/tcop/postgres.c:4066
13
#10 0x0000000000879426 in BackendRun (port=0x2575650) at
/home/zl/workspace_pg962/postgres/src/backend/postmaster/postmaster.c:4317
14
#11 0x0000000000878a50 in BackendStartup (port=0x2575650) at
/home/zl/workspace_pg962/postgres/src/backend/postmaster/postmaster.c:3989
15
#12 0x000000000087509c in ServerLoop () at
/home/zl/workspace_pg962/postgres/src/backend/postmaster/postmaster.c:1729
16
#13 0x0000000000872612 in PostmasterMain (argc=3, argv=0x254ec80) at
/home/zl/workspace_pg962/postgres/src/backend/postmaster/postmaster.c:1337
17
#14 0x0000000000795228 in main (argc=3, argv=0x254ec80) at
/home/zl/workspace_pg962/postgres/src/backend/main/main.c:228   I think it is impossible when i print something about
theassertion.
 

1
(gdb) p MyProc->syncRepLinks
2
$1 = {prev = 0x0, next = 0x0}   So, what causes this assertion? To solve my doubts, i add some debug
log. See the macro DEBUG_SUNNY as below.

1
/*
2* Wait for synchronous replication, if requested by user.
3*
4* Initially backends start in state SYNC_REP_NOT_WAITING and then
5* change that state to SYNC_REP_WAITING before adding ourselves
6* to the wait queue. During SyncRepWakeQueue() a WALSender changes
7* the state to SYNC_REP_WAIT_COMPLETE once replication is confirmed.
8* This backend then resets its state to SYNC_REP_NOT_WAITING.
9*
10* 'lsn' represents the LSN to wait for.  'commit' indicates whether this
LSN
11* represents a commit record.  If it doesn't, then we wait only for the
WAL
12* to be flushed if synchronous_commit is set to the higher level of
13* remote_apply, because only commit records provide apply feedback.
14*/
15
void
16
SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
17
{
18   char       *new_status = NULL;
19   const char *old_status;
20   int         mode;
21
22   /* Cap the level for anything other than commit to remote flush only.
*/
23   if (commit)
24       mode = SyncRepWaitMode;
25   else
26       mode = Min(SyncRepWaitMode, SYNC_REP_WAIT_FLUSH);
27
28   /*
29    * Fast exit if user has not requested sync replication, or there are
no
30    * sync replication standby names defined. Note that those standbys
don't
31    * need to be connected.
32    */
33   if (!SyncRepRequested() || !SyncStandbysDefined())
34       return;
35
36   Assert(SHMQueueIsDetached(&(MyProc->syncRepLinks)));
37   Assert(WalSndCtl != NULL);
38
39   LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
40   Assert(MyProc->syncRepState == SYNC_REP_NOT_WAITING);
41
42   /*
43    * We don't wait for sync rep if WalSndCtl->sync_standbys_defined is
not
44    * set.  See SyncRepUpdateSyncStandbysDefined.
45    *
46    * Also check that the standby hasn't already replied. Unlikely race
47    * condition but we'll be fetching that cache line anyway so it's
likely
48    * to be a low cost check.
49    */
50   if (!WalSndCtl->sync_standbys_defined ||
51       lsn <= WalSndCtl->lsn[mode])
52   {
53       LWLockRelease(SyncRepLock);
54       return;
55   }
56
57   /*
58    * Set our waitLSN so WALSender will know when to wake us, and add
59    * ourselves to the queue.
60    */
61   MyProc->waitLSN = lsn;
62   MyProc->syncRepState = SYNC_REP_WAITING;
63   SyncRepQueueInsert(mode);
64   Assert(SyncRepQueueIsOrderedByLSN(mode));
65   LWLockRelease(SyncRepLock);
66
67   /* Alter ps display to show waiting for sync rep. */
68   if (update_process_title)
69   {
70       int         len;
71
72       old_status = get_ps_display(&len);
73       new_status = (char *) palloc(len + 32 + 1);
74       memcpy(new_status, old_status, len);
75       sprintf(new_status + len, " waiting for %X/%X",
76               (uint32) (lsn >> 32), (uint32) lsn);
77       set_ps_display(new_status, false);
78       new_status[len] = '\0'; /* truncate off " waiting ..." */
79   }
80
81   /*
82    * Wait for specified LSN to be confirmed.
83    *
84    * Each proc has its own wait latch, so we perform a normal latch
85    * check/wait loop here.
86    */
87   for (;;)
88   {
89       /* Must reset the latch before testing state. */
90       ResetLatch(MyLatch);
91
92       /*
93        * Acquiring the lock is not needed, the latch ensures proper
94        * barriers. If it looks like we're done, we must really be done,
95        * because once walsender changes the state to
SYNC_REP_WAIT_COMPLETE,
96        * it will never update it again, so we can't be seeing a stale
value
97        * in that case.
98        */
99       if (MyProc->syncRepState == SYNC_REP_WAIT_COMPLETE)
100           break;
101
102       /*
103        * If a wait for synchronous replication is pending, we can
neither
104        * acknowledge the commit nor raise ERROR or FATAL.  The latter
would
105        * lead the client to believe that the transaction aborted, which
is
106        * not true: it's already committed locally. The former is no good
107        * either: the client has requested synchronous replication, and
is
108        * entitled to assume that an acknowledged commit is also
replicated,
109        * which might not be true. So in this case we issue a WARNING
(which
110        * some clients may be able to interpret) and shut off further
output.
111        * We do NOT reset ProcDiePending, so that the process will die
after
112        * the commit is cleaned up.
113        */
114       if (ProcDiePending)
115       {
116           ereport(WARNING,
117                   (errcode(ERRCODE_ADMIN_SHUTDOWN),
118                    errmsg("canceling the wait for synchronous replication
and terminating connection due to administrator command"),
119                    errdetail("The transaction has already committed
locally, but might not have been replicated to the standby.")));
120           whereToSendOutput = DestNone;
121           SyncRepCancelWait();
122           break;
123       }
124
125       /*
126        * It's unclear what to do if a query cancel interrupt arrives.
We
127        * can't actually abort at this point, but ignoring the interrupt
128        * altogether is not helpful, so we just terminate the wait with a
129        * suitable warning.
130        */
131       if (QueryCancelPending)
132       {
133           QueryCancelPending = false;
134           ereport(WARNING,
135                   (errmsg("canceling wait for synchronous replication due
to user request"),
136                    errdetail("The transaction has already committed
locally, but might not have been replicated to the standby.")));
137           SyncRepCancelWait();
138           break;
139       }
140
141       /*
142        * If the postmaster dies, we'll probably never get an
143        * acknowledgement, because all the wal sender processes will exit.
So
144        * just bail out.
145        */
146       if (!PostmasterIsAlive())
147       {
148           ProcDiePending = true;
149           whereToSendOutput = DestNone;
150           SyncRepCancelWait();
151           break;
152       }
153
154       /*
155        * Wait on latch.  Any condition that should wake us up will set
the
156        * latch, so no need for timeout.
157        */
158       WaitLatch(MyLatch, WL_LATCH_SET | WL_POSTMASTER_DEATH, -1);
159   }
160
161
#ifdef DEBUG_SUNNY
162   if (!SHMQueueIsDetached(&(MyProc->syncRepLinks)))
163       ereport(LOG,
164           (errmsg("[SUNNY] It is impossible. [lsn] %X/%X [prev] %p [next]
%p",
165               (uint32) (MyProc->waitLSN >> 32), (uint32)
MyProc->waitLSN,
166               MyProc->syncRepLinks.prev, MyProc->syncRepLinks.next)));
167
#endif
168
169   /*
170    * WalSender has checked our LSN and has removed us from queue. Clean
up
171    * state and leave.  It's OK to reset these shared memory fields
without
172    * holding SyncRepLock, because any walsenders will ignore us anyway
when
173    * we're not on the queue.
174    */
175   Assert(SHMQueueIsDetached(&(MyProc->syncRepLinks)));
176   MyProc->syncRepState = SYNC_REP_NOT_WAITING;
177   MyProc->waitLSN = 0;
178
179   if (new_status)
180   {
181       /* Reset ps display */
182       set_ps_display(new_status, false);
183       pfree(new_status);
184   }
185
}
186
187
/*
188* Insert MyProc into the specified SyncRepQueue, maintaining sorted
invariant.
189*
190* Usually we will go at tail of queue, though it's possible that we
arrive
191* here out of order, so start at tail and work back to insertion point.
192*/
193
static void
194
SyncRepQueueInsert(int mode)
195
{
196   PGPROC     *proc;
197
198   Assert(mode >= 0 && mode < NUM_SYNC_REP_WAIT_MODE);
199   proc = (PGPROC *) SHMQueuePrev(&(WalSndCtl->SyncRepQueue[mode]),
200                                  &(WalSndCtl->SyncRepQueue[mode]),
201                                  offsetof(PGPROC, syncRepLinks));
202
203   while (proc)
204   {
205       /*
206        * Stop at the queue element that we should after to ensure the
queue
207        * is ordered by LSN.
208        */
209       if (proc->waitLSN < MyProc->waitLSN)
210           break;
211
212       proc = (PGPROC *) SHMQueuePrev(&(WalSndCtl->SyncRepQueue[mode]),
213                                      &(proc->syncRepLinks),
214                                      offsetof(PGPROC, syncRepLinks));
215   }
216
217   if (proc)
218       SHMQueueInsertAfter(&(proc->syncRepLinks),
&(MyProc->syncRepLinks));
219   else
220       SHMQueueInsertAfter(&(WalSndCtl->SyncRepQueue[mode]),
&(MyProc->syncRepLinks));
221
222
#ifdef DEBUG_SUNNY
223   ereport(LOG,
224       (errmsg("[SUNNY] Insert [lsn] %X/%X [prev] %p [next] %p",
225           (uint32) (MyProc->waitLSN >> 32), (uint32) MyProc->waitLSN,
226           MyProc->syncRepLinks.prev, MyProc->syncRepLinks.next)));
227
#endif
228
}
229
230
/*
231* Walk the specified queue from head.  Set the state of any backends that
232* need to be woken, remove them from the queue, and then wake them.
233* Pass all = true to wake whole queue; otherwise, just wake up to
234* the walsender's LSN.
235*
236* Must hold SyncRepLock.
237*/
238
static int
239
SyncRepWakeQueue(bool all, int mode)
240
{
241   volatile WalSndCtlData *walsndctl = WalSndCtl;
242   PGPROC     *proc = NULL;
243   PGPROC     *thisproc = NULL;
244   int         numprocs = 0;
245
246   Assert(mode >= 0 && mode < NUM_SYNC_REP_WAIT_MODE);
247   Assert(SyncRepQueueIsOrderedByLSN(mode));
248
249   proc = (PGPROC *) SHMQueueNext(&(WalSndCtl->SyncRepQueue[mode]),
250                                  &(WalSndCtl->SyncRepQueue[mode]),
251                                  offsetof(PGPROC, syncRepLinks));
252
253   while (proc)
254   {
255       /*
256        * Assume the queue is ordered by LSN
257        */
258       if (!all && walsndctl->lsn[mode] < proc->waitLSN)
259           return numprocs;
260
261       /*
262        * Move to next proc, so we can delete thisproc from the queue.
263        * thisproc is valid, proc may be NULL after this.
264        */
265       thisproc = proc;
266       proc = (PGPROC *) SHMQueueNext(&(WalSndCtl->SyncRepQueue[mode]),
267                                      &(proc->syncRepLinks),
268                                      offsetof(PGPROC, syncRepLinks));
269
270       /*
271        * Remove thisproc from queue.
272        */
273       SHMQueueDelete(&(thisproc->syncRepLinks));
274
275       /*
276        * Set state to complete; see SyncRepWaitForLSN() for discussion
of
277        * the various states.
278        */
279       thisproc->syncRepState = SYNC_REP_WAIT_COMPLETE;
280
281
#ifdef DEBUG_SUNNY
282       ereport(LOG,
283           (errmsg("[SUNNY] Delete [lsn] %X/%X [prev] %p [next] %p",
284               (uint32) (thisproc->waitLSN >> 32), (uint32)
thisproc->waitLSN,
285               thisproc->syncRepLinks.prev,
thisproc->syncRepLinks.next)));
286
#endif
287       /*
288        * Wake only when we have set state and removed from queue.
289        */
290       SetLatch(&(thisproc->procLatch));
291
292       numprocs++;
293   }
294
295   return numprocs;
296
}
   Then i made pressure test by benchmark, and got some log as below.

1
2017-06-28 02:51:33.868
CST,"benchmarksql","postgres",77171,"10.20.16.227:43879",595271c7.12d73,176627,"COMMIT
PREPARED",2017-06-27 22:55:03 CST,769/7909,0,LOG,00000,"[SUNNY] Insert [lsn]
32/7131AB58 [prev] 0x2b614633d6a8 [next] 0x2b61446f1b18",,,,,,"COMMIT
PREPARED 'T11860878'",,,"pgxc"
2
3
2017-06-28 02:51:33.868
CST,"benchmarksql","postgres",77171,"10.20.16.227:43879",595271c7.12d73,176628,"COMMIT
PREPARED waiting for 32/7131AB58",2017-06-27 22:55:03
CST,769/7909,0,LOG,00000,"[SUNNY] It is impossible. [lsn] 32/7131AB58 [prev]
0x2b614633d6a8 [next] 0x2b61446f1b18",,,,,,"COMMIT PREPARED
'T11860878'",,,"pgxc"
4
5
2017-06-28 02:51:33.870
CST,"pgxcn","",52856,"10.20.16.214:43102",59522322.ce78,4758326,"streaming
32/7131AB58",2017-06-27 17:19:30 CST,1/0,0,LOG,00000,"[SUNNY] Delete [lsn]
32/7131AB58 [prev] (nil) [next] (nil)",,,,,,,,,"slave"   You can find the "DELETE" log is later than the "IMPOSSIBLE"
log.What 
conditions does this happen under?


----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 At last, i have made this bug reappear by GDB follow these steps.
 
1. In wal sender process, add a breakpoint at code line
"SHMQueueDelete(&(thisproc->syncRepLinks)); " of "SyncRepWakeQueue".

SUNNY
1
/*
2* Walk the specified queue from head.  Set the state of any backends that
3* need to be woken, remove them from the queue, and then wake them.
4* Pass all = true to wake whole queue; otherwise, just wake up to
5* the walsender's LSN.
6*
7* Must hold SyncRepLock.
8*/
9
static int
10
SyncRepWakeQueue(bool all, int mode)
11
{
12   volatile WalSndCtlData *walsndctl = WalSndCtl;
13   PGPROC     *proc = NULL;
14   PGPROC     *thisproc = NULL;
15   int         numprocs = 0;
16
17   Assert(mode >= 0 && mode < NUM_SYNC_REP_WAIT_MODE);
18   Assert(SyncRepQueueIsOrderedByLSN(mode));
19
20   proc = (PGPROC *) SHMQueueNext(&(WalSndCtl->SyncRepQueue[mode]),
21                                  &(WalSndCtl->SyncRepQueue[mode]),
22                                  offsetof(PGPROC, syncRepLinks));
23
24   while (proc)
25   {
26       /*
27        * Assume the queue is ordered by LSN
28        */
29       if (!all && walsndctl->lsn[mode] < proc->waitLSN)
30           return numprocs;
31
32       /*
33        * Move to next proc, so we can delete thisproc from the queue.
34        * thisproc is valid, proc may be NULL after this.
35        */
36       thisproc = proc;
37       proc = (PGPROC *) SHMQueueNext(&(WalSndCtl->SyncRepQueue[mode]),
38                                      &(proc->syncRepLinks),
39                                      offsetof(PGPROC, syncRepLinks));
40
41       /*
42        * Set state to complete; see SyncRepWaitForLSN() for discussion
of
43        * the various states.
44        */
45       thisproc->syncRepState = SYNC_REP_WAIT_COMPLETE;
46
47       /*
48        * Remove thisproc from queue.
49        */
50       SHMQueueDelete(&(thisproc->syncRepLinks));
51
52
#ifdef DEBUG_SUNNY
53       ereport(LOG,
54           (errmsg("[SUNNY] Delete [lsn] %X/%X [prev] %p [next] %p",
55               (uint32) (thisproc->waitLSN >> 32), (uint32)
thisproc->waitLSN,
56               thisproc->syncRepLinks.prev,
thisproc->syncRepLinks.next)));
57
#endif
58       /*
59        * Wake only when we have set state and removed from queue.
60        */
61       SetLatch(&(thisproc->procLatch));
62
63       numprocs++;
64   }
65
66   return numprocs;
67
}
2. In backend process, add a breakpoint at code line "if
(MyProc->syncRepState == SYNC_REP_WAIT_COMPLETE)" of "SyncRepWaitForLSN".

1
/*
2* Wait for synchronous replication, if requested by user.
3*
4* Initially backends start in state SYNC_REP_NOT_WAITING and then
5* change that state to SYNC_REP_WAITING before adding ourselves
6* to the wait queue. During SyncRepWakeQueue() a WALSender changes
7* the state to SYNC_REP_WAIT_COMPLETE once replication is confirmed.
8* This backend then resets its state to SYNC_REP_NOT_WAITING.
9*
10* 'lsn' represents the LSN to wait for.  'commit' indicates whether this
LSN
11* represents a commit record.  If it doesn't, then we wait only for the
WAL
12* to be flushed if synchronous_commit is set to the higher level of
13* remote_apply, because only commit records provide apply feedback.
14*/
15
void
16
SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
17
{
18   char       *new_status = NULL;
19   const char *old_status;
20   int         mode;
21
22   /* Cap the level for anything other than commit to remote flush only.
*/
23   if (commit)
24       mode = SyncRepWaitMode;
25   else
26       mode = Min(SyncRepWaitMode, SYNC_REP_WAIT_FLUSH);
27
28   /*
29    * Fast exit if user has not requested sync replication, or there are
no
30    * sync replication standby names defined. Note that those standbys
don't
31    * need to be connected.
32    */
33   if (!SyncRepRequested() || !SyncStandbysDefined())
34       return;
35
36   Assert(SHMQueueIsDetached(&(MyProc->syncRepLinks)));
37   Assert(WalSndCtl != NULL);
38
39   LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
40   Assert(MyProc->syncRepState == SYNC_REP_NOT_WAITING);
41
42   /*
43    * We don't wait for sync rep if WalSndCtl->sync_standbys_defined is
not
44    * set.  See SyncRepUpdateSyncStandbysDefined.
45    *
46    * Also check that the standby hasn't already replied. Unlikely race
47    * condition but we'll be fetching that cache line anyway so it's
likely
48    * to be a low cost check.
49    */
50   if (!WalSndCtl->sync_standbys_defined ||
51       lsn <= WalSndCtl->lsn[mode])
52   {
53       LWLockRelease(SyncRepLock);
54       return;
55   }
56
57   /*
58    * Set our waitLSN so WALSender will know when to wake us, and add
59    * ourselves to the queue.
60    */
61   MyProc->waitLSN = lsn;
62   MyProc->syncRepState = SYNC_REP_WAITING;
63   SyncRepQueueInsert(mode);
64   Assert(SyncRepQueueIsOrderedByLSN(mode));
65   LWLockRelease(SyncRepLock);
66
67   /* Alter ps display to show waiting for sync rep. */
68   if (update_process_title)
69   {
70       int         len;
71
72       old_status = get_ps_display(&len);
73       new_status = (char *) palloc(len + 32 + 1);
74       memcpy(new_status, old_status, len);
75       sprintf(new_status + len, " waiting for %X/%X",
76               (uint32) (lsn >> 32), (uint32) lsn);
77       set_ps_display(new_status, false);
78       new_status[len] = '\0'; /* truncate off " waiting ..." */
79   }
80
81   /*
82    * Wait for specified LSN to be confirmed.
83    *
84    * Each proc has its own wait latch, so we perform a normal latch
85    * check/wait loop here.
86    */
87   for (;;)
88   {
89       /* Must reset the latch before testing state. */
90       ResetLatch(MyLatch);
91
92       /*
93        * Acquiring the lock is not needed, the latch ensures proper
94        * barriers. If it looks like we're done, we must really be done,
95        * because once walsender changes the state to
SYNC_REP_WAIT_COMPLETE,
96        * it will never update it again, so we can't be seeing a stale
value
97        * in that case.
98        */
99       if (MyProc->syncRepState == SYNC_REP_WAIT_COMPLETE)
100           break;
101
102       /*
103        * If a wait for synchronous replication is pending, we can
neither
104        * acknowledge the commit nor raise ERROR or FATAL.  The latter
would
105        * lead the client to believe that the transaction aborted, which
is
106        * not true: it's already committed locally. The former is no good
107        * either: the client has requested synchronous replication, and
is
108        * entitled to assume that an acknowledged commit is also
replicated,
109        * which might not be true. So in this case we issue a WARNING
(which
110        * some clients may be able to interpret) and shut off further
output.
111        * We do NOT reset ProcDiePending, so that the process will die
after
112        * the commit is cleaned up.
113        */
114       if (ProcDiePending)
115       {
116           ereport(WARNING,
117                   (errcode(ERRCODE_ADMIN_SHUTDOWN),
118                    errmsg("canceling the wait for synchronous replication
and terminating connection due to administrator command"),
119                    errdetail("The transaction has already committed
locally, but might not have been replicated to the standby.")));
120           whereToSendOutput = DestNone;
121           SyncRepCancelWait();
122           break;
123       }
124
125       /*
126        * It's unclear what to do if a query cancel interrupt arrives.
We
127        * can't actually abort at this point, but ignoring the interrupt
128        * altogether is not helpful, so we just terminate the wait with a
129        * suitable warning.
130        */
131       if (QueryCancelPending)
132       {
133           QueryCancelPending = false;
134           ereport(WARNING,
135                   (errmsg("canceling wait for synchronous replication due
to user request"),
136                    errdetail("The transaction has already committed
locally, but might not have been replicated to the standby.")));
137           SyncRepCancelWait();
138           break;
139       }
140
141       /*
142        * If the postmaster dies, we'll probably never get an
143        * acknowledgement, because all the wal sender processes will exit.
So
144        * just bail out.
145        */
146       if (!PostmasterIsAlive())
147       {
148           ProcDiePending = true;
149           whereToSendOutput = DestNone;
150           SyncRepCancelWait();
151           break;
152       }
153
154       /*
155        * Wait on latch.  Any condition that should wake us up will set
the
156        * latch, so no need for timeout.
157        */
158       WaitLatch(MyLatch, WL_LATCH_SET | WL_POSTMASTER_DEATH, -1);
159   }
160
161
#ifdef DEBUG_SUNNY
162   if (!SHMQueueIsDetached(&(MyProc->syncRepLinks)))
163       ereport(LOG,
164           (errmsg("[SUNNY] It is impossible. [lsn] %X/%X [prev] %p [next]
%p",
165               (uint32) (MyProc->waitLSN >> 32), (uint32)
MyProc->waitLSN,
166               MyProc->syncRepLinks.prev, MyProc->syncRepLinks.next)));
167
#endif
168
169   /*
170    * WalSender has checked our LSN and has removed us from queue. Clean
up
171    * state and leave.  It's OK to reset these shared memory fields
without
172    * holding SyncRepLock, because any walsenders will ignore us anyway
when
173    * we're not on the queue.
174    */
175   Assert(SHMQueueIsDetached(&(MyProc->syncRepLinks)));
176   MyProc->syncRepState = SYNC_REP_NOT_WAITING;
177   MyProc->waitLSN = 0;
178
179   if (new_status)
180   {
181       /* Reset ps display */
182       set_ps_display(new_status, false);
183       pfree(new_status);
184   }
185
}
3. Execute a SQL whatever will generate tansaction log by psql.

.
1
[zl@INTEL175 ~/workspace_pg962/project]$ psql -p 8000 -U postgres
2
psql (9.6.2)
3
Type "help" for help.
4
5
postgres=# insert into x values(1,3),(1,4),(1,5),(1,6);
4. Hold the breakpoint in wal sender process and step next in backend
process. Then a assertion core file will be found.

1
(gdb) bt
2
#0  0x00007fa769cb41d7 in raise () from /lib64/libc.so.6
3
#1  0x00007fa769cb58c8 in abort () from /lib64/libc.so.6
4
#2  0x0000000000af0699 in ExceptionalCondition (conditionName=0xcdc111
"!(SHMQueueIsDetached(&(MyProc->syncRepLinks)))", errorType=0xb6c443
"FailedAssertion", 
5   fileName=0xcdc140
"/home/zl/workspace_pg962/postgres/src/backend/replication/syncrep.c",
lineNumber=294) at
/home/zl/workspace_pg962/postgres/src/backend/utils/error/assert.c:54
6
#3  0x00000000008c7e94 in SyncRepWaitForLSN (lsn=50438264, commit=1 '\001')
at /home/zl/workspace_pg962/postgres/src/backend/replication/syncrep.c:294
7
#4  0x000000000056ed11 in RecordTransactionCommit () at
/home/zl/workspace_pg962/postgres/src/backend/access/transam/xact.c:1343
8
#5  0x0000000000568c96 in CommitTransaction () at
/home/zl/workspace_pg962/postgres/src/backend/access/transam/xact.c:2041
9
#6  0x0000000000568717 in CommitTransactionCommand () at
/home/zl/workspace_pg962/postgres/src/backend/access/transam/xact.c:2768
10
#7  0x000000000092eb56 in finish_xact_command () at
/home/zl/workspace_pg962/postgres/src/backend/tcop/postgres.c:2459
11
#8  0x000000000092cb37 in exec_simple_query (query_string=0x1194de0 "insert
into x values(1,3),(1,4),(1,5),(1,6);") at
/home/zl/workspace_pg962/postgres/src/backend/tcop/postgres.c:1132
12
#9  0x000000000092bdf0 in PostgresMain (argc=1, argv=0x1143308,
dbname=0x1143168 "postgres", username=0x1114e10 "postgres") at
/home/zl/workspace_pg962/postgres/src/backend/tcop/postgres.c:4066
13
#10 0x0000000000879426 in BackendRun (port=0x1139650) at
/home/zl/workspace_pg962/postgres/src/backend/postmaster/postmaster.c:4317
14
#11 0x0000000000878a50 in BackendStartup (port=0x1139650) at
/home/zl/workspace_pg962/postgres/src/backend/postmaster/postmaster.c:3989
15
#12 0x000000000087509c in ServerLoop () at
/home/zl/workspace_pg962/postgres/src/backend/postmaster/postmaster.c:1729
16
#13 0x0000000000872612 in PostmasterMain (argc=3, argv=0x1112c80) at
/home/zl/workspace_pg962/postgres/src/backend/postmaster/postmaster.c:1337
17
#14 0x0000000000795228 in main (argc=3, argv=0x1112c80) at
/home/zl/workspace_pg962/postgres/src/backend/main/main.c:228
18
(gdb) 
   Is this a bug? And how to slove it? Looking forward to your reply.
Thanks!   Sorry about my poor english O(∩_∩)O

                                           Const Sunny                                           2017-6-29
                             


--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14721: Assertion of synchronous replication

From
Thomas Munro
Date:
On Thu, Jun 29, 2017 at 2:36 PM,  <const_sunny@126.com> wrote:
> The following bug has been logged on the website:
>
> Bug reference:      14721
> Logged by:          Const Zhang
> Email address:      const_sunny@126.com
> PostgreSQL version: 9.6.2
> Operating system:   CentOS7

Hi Const,

Thanks for the detailed report.  What type of CPU is this running on?

>     I have found a bug about synchronous replication.
>     At first, see the stack of the core file.
>
> 1
> (gdb) bt
> 2
> #0 0x00007fe9aab2e1d7 in raise () from /lib64/libc.so.6
> 3
> #1 0x00007fe9aab2f8c8 in abort () from /lib64/libc.so.6
> 4
> #2 0x0000000000af0699 in ExceptionalCondition (conditionName=0xcdc111
> "!(SHMQueueIsDetached(&(MyProc->syncRepLinks)))", errorType=0xb6c443
> "FailedAssertion",
> 5
> fileName=0xcdc140
> "/home/zl/workspace_pg962/postgres/src/backend/replication/syncrep.c",
> lineNumber=294) at

I assume this is line 298 in master today.  There is some interesting
IPC going on here.  SyncRepWakeQueue() does three things while holding
SyncRepLock in some other process:
 thisproc->syncRepState = SYNC_REP_WAIT_COMPLETE; SHMQueueDelete(&(thisproc->syncRepLinks));
SetLatch(&(thisproc->procLatch));

Meanwhile SyncRepWaitForLSN() in your process does this:
       /*        * Acquiring the lock is not needed, the latch ensures proper        * barriers. If it looks like we're
done,we must really be done,        * because once walsender changes the state to SYNC_REP_WAIT_COMPLETE,        * it
willnever update it again, so we can't be seeing a stale value        * in that case.        */       if
(MyProc->syncRepState== SYNC_REP_WAIT_COMPLETE)           break;
 

... OK, then outside the loop:
   /*    * WalSender has checked our LSN and has removed us from queue. Clean up    * state and leave.  It's OK to
resetthese shared memory fields without    * holding SyncRepLock, because any walsenders will ignore us anyway when
*we're not on the queue.    */   Assert(SHMQueueIsDetached(&(MyProc->syncRepLinks)));
 

I wonder your CPU core was able to see syncRepState ==
SYNC_REP_WAIT_COMPLETE, but not yet see MyProc->syncRepLinks's next
and previous members as 0.  When you look at them in your debugger you
see them as zero, but that's a bit later:

> (gdb) p MyProc->syncRepLinks
> 2
> $1 = {prev = 0x0, next = 0x0}

I may be way off base here, and haven't studied all of your report
yet.  But my first thought is: shouldn't SyncRepWakeQueue() do things
the other way around, with a barrier in between, like this:
 SHMQueueDelete(&(thisproc->syncRepLinks)); pg_write_barrier(); thisproc->syncRepState = SYNC_REP_WAIT_COMPLETE;
SetLatch(&(thisproc->procLatch));

... and then shouldn't SyncRepWaitForLSN() have a pg_read_barrier()
inserted before Assert(SHMQueueIsDetached(&(MyProc->syncRepLinks)))?

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14721: Assertion of synchronous replication

From
Thomas Munro
Date:
On Thu, Jun 29, 2017 at 4:27 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
>         /*
>          * Acquiring the lock is not needed, the latch ensures proper
>          * barriers. If it looks like we're done, we must really be done,
>          * because once walsender changes the state to SYNC_REP_WAIT_COMPLETE,
>          * it will never update it again, so we can't be seeing a stale value
>          * in that case.
>          */

Yeah, counting on the latch for free barriers doesn't work if you
happen to see SYNC_REP_WAIT_COMPLETE first time through the loop, or
if you see it after a spurious signal woke you and then it's
immediately set to SYNC_REP_WAIT_COMPLETE.  In those cases, the
following Assert statement is making an assertion about cache
coherency that doesn't work even on a friendly TSO system.

Can you reproduce the problem with this experimental patch applied?

-- 
Thomas Munro
http://www.enterprisedb.com

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Attachment

Re: [BUGS] BUG #14721: Assertion of synchronous replication

From
Heikki Linnakangas
Date:
On 06/29/2017 08:11 AM, Thomas Munro wrote:
> On Thu, Jun 29, 2017 at 4:27 PM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>>         /*
>>          * Acquiring the lock is not needed, the latch ensures proper
>>          * barriers. If it looks like we're done, we must really be done,
>>          * because once walsender changes the state to SYNC_REP_WAIT_COMPLETE,
>>          * it will never update it again, so we can't be seeing a stale value
>>          * in that case.
>>          */
>
> Yeah, counting on the latch for free barriers doesn't work if you
> happen to see SYNC_REP_WAIT_COMPLETE first time through the loop, or
> if you see it after a spurious signal woke you and then it's
> immediately set to SYNC_REP_WAIT_COMPLETE.  In those cases, the
> following Assert statement is making an assertion about cache
> coherency that doesn't work even on a friendly TSO system.

Ah yes, that's clearly broken.

> Can you reproduce the problem with this experimental patch applied?

I was able to reproduce this, after adding some sleeps (see attached). 
It must be pretty hard to hit it normally, or we would've gotten reports 
earlier. Although it only really shows with assertions enabled, so that 
could also be why.

In your patch, I think the pg_read_barrier() is only necessary, if 
assertions are enabled. Then again, better safe than sorry, and at least 
on my x86-64 machine with gcc, it doesn't actually change the produced 
machine code at all.

I committed your patch and backported to 9.5, with some additional 
comments. Thanks for the report Const!

- Heikki


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Attachment