Re: [bug fix] Cascaded standby cannot start after a clean shutdown - Mailing list pgsql-hackers
From | Michael Paquier |
---|---|
Subject | Re: [bug fix] Cascaded standby cannot start after a clean shutdown |
Date | |
Msg-id | 20180216071900.GE1174@paquier.xyz Whole thread Raw |
In response to | [bug fix] Cascaded standby cannot start after a clean shutdown ("Tsunakawa, Takayuki" <tsunakawa.takay@jp.fujitsu.com>) |
Responses |
Re: [bug fix] Cascaded standby cannot start after a clean shutdown
RE: [bug fix] Cascaded standby cannot start after a clean shutdown |
List | pgsql-hackers |
On Wed, Feb 14, 2018 at 04:37:05AM +0000, Tsunakawa, Takayuki wrote: > The PostgreSQL version is 9.5. The cluster consists of a master, a > cascading standby (SB1), and a cascaded standby (SB2). The WAL flows > like this: master -> SB1 -> SB2. > > The user shut down SB2 and tried to restart it, but failed with the > following message: > > FATAL: invalid memory alloc request size 3075129344 > > The master and SB1 continued to run. This matches a couple of recent bug reports we have seen around like this one: https://www.postgresql.org/message-id/CAE2gYzzVZNsGn%3D-E6grO4sVQs04J02zNKQofQEO8gu8%3DqCFR%2BQ%40mail.gmail.com I recalled as well this one but in this case the user shot his own foot with the failover scenario he designed: https://www.postgresql.org/message-id/CAAc9rOyKAyzipiq7ee1%3DVbcRy2fRqV_hRujLbC0mrBkL07%3D7wQ%40mail.gmail.com > CAUSE > ============================== > > total_len in the code below was about 3GB, so palloc() rejected the > memory allocation request. Yes, it is limited to 1GB. > [xlogreader.c] > record = (XLogRecord *) (state->readBuf + RecPtr % XLOG_BLCKSZ); > total_len = record->xl_tot_len; > ... > /* > * Enlarge readRecordBuf as needed. > */ > if (total_len > state->readRecordBufSize && > !allocate_recordbuf(state, total_len)) > { > > Why was XLogRecord->xl_tot_len so big? That's because the XLOG reader > read the garbage portion in a WAL file, which is immediately after the > end of valid WAL records. > > Why was there the garbage? Because the cascading standby sends just > the valid WAL records, not the whole WAL blocks, to the cascaded > standby. When the cascaded standby receives those WAL records and > write them in a recycled WAL file, the last WAL block in the file > contains the mix of valid WAL records and the garbage left over. Wait a minute here, when recycled past WAL segments would be filled with zeros before being used. > Why did the XLOG reader try to allocate memory for a garbage? Doesn't > it stop reading the WAL? As the following code shows, when the WAL > record header crosses a WAL block boundary, the XLOG reader first > allocates memory for the whole record, and then checks the validity of > the record header as soon as it reads the entire header. > > [...] > > FIX > ============================== > > One idea is to defer the memory allocation until the entire record > header is read and ValidXLogRecordHeader() is called. However, > ValidXLogRecordHeader() might misjudge the validity if the garbage > contains xl_prev seeming correct. It seems to me that the consolidation of the page read should happen directly in xlogreader.c and not even in one of its callbacks so as no garbage data is presented back to the caller using its own XLogReader. I think that you need to initialize XLogReaderState->readBuf directly in ReadPageInternal() before reading a page and you should be good. With your patch you get visibly only one portion of things addressed, what of other code paths using xlogreader.c's APIs like pg_rewind, 2PC code and such? > FYI, the following unsolved problem may be solved, too. > > https://www.postgresql.org/message-id/CAE2gYzzVZNsGn%3D-E6grO4sVQs04J02zNKQofQEO8gu8%3DqCFR%2BQ%40mail.gmail.com Yeah, I noticed this one too before going through your message in details ;) Please note that your patch has some unnecessary garbage in two places: - * Portions Copyright (c) 2010-2018, PostgreSQL Global Development Group + * Portions Copyright (c) 2010-2017, PostgreSQL Global Development Group [...] - * Only superusers and members of pg_read_all_stats can see details. - * Other users only get the pid value + * Only superusers can see details. Other users only get the pid valu You may want to indent your patch as well before sending it. + if (zbuffer == NULL) + zbuffer = palloc0(XLOG_BLCKSZ); You could just use a static buffer which is MemSet'd with 0 only if necessary. -- Michael
Attachment
pgsql-hackers by date: