Re: Memory leak in XLOG reader facility when decoding records (caused by WAL refactoring) - Mailing list pgsql-bugs

From Michael Paquier
Subject Re: Memory leak in XLOG reader facility when decoding records (caused by WAL refactoring)
Date
Msg-id CAB7nPqRXABT05BxnVNHY+s_DabqJpOm0CcZ99+CVZ6v_Zy6x8Q@mail.gmail.com
Whole thread Raw
In response to Re: Memory leak in XLOG reader facility when decoding records (caused by WAL refactoring)  (Andres Freund <andres@anarazel.de>)
Responses Re: Memory leak in XLOG reader facility when decoding records (caused by WAL refactoring)
List pgsql-bugs


On Mon, Jul 27, 2015 at 12:32 AM, Andres Freund <andres@anarazel.de> wrote:
> On 2015-07-23 22:50:12 +0900, Michael Paquier wrote:
>>
>> While running valgring on pg_rewind, I have noticed that each new call
>> to XLogReadRecord leaks XLogReaderState->main_data and
>> XLogReaderState->blocks[block_id].data because each one of them is
>> palloc'd to store respectively the main data of the record and the
>> data attached to each block. I think that the most correct fix to
>> prevent the leak is to free those pointers when calling ResetDecoder()
>> which is used to reset a reader state between two records.
>
> I don't think that'd be a good fix. Isn't the idea here that we only
> allocate new buffers when the previous one is too small? Your fix will
> greatly increase the memory management overhead.

OK.

>> ==46805== 154 bytes in 1 blocks are definitely lost in loss record 103 of 167
>> ==46805==    at 0x6B1B: malloc (in
>> /usr/local/Cellar/valgrind/3.10.1/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
>> ==46805==    by 0x10000DB2D: pg_malloc_internal (fe_memutils.c:30)
>> ==46805==    by 0x10000DE79: palloc (fe_memutils.c:118)
>> ==46805==    by 0x1000052A0: DecodeXLogRecord (xlogreader.c:1203)
>> ==46805==    by 0x100003AA9: XLogReadRecord (xlogreader.c:461)
>> ==46805==    by 0x1000022F0: extractPageMap (parsexlog.c:78)
>> ==46805==    by 0x10000188C: main (pg_rewind.c:280)
>
> Uh. This is the leak?
>         if (state->main_data_len > 0)
>         {
>                 if (!state->main_data || state->main_data_len > state->main_data_bufsz)
>                 {
>                         if (state->main_data)
>                                 pfree(state->main_data);
>                         state->main_data_bufsz = state->main_data_len;
>                         state->main_data = palloc(state->main_data_bufsz); <--- here
>                 }
>                 memcpy(state->main_data, ptr, state->main_data_len);
>                 ptr += state->main_data_len;
>         }
>
> Where/When are we leaking memory here? The previously used memory is
> freed before we allocate a larger buffer.

Yep, it is here, and it is not related to the calls of XLogReadRecord actually, that's a problem in XLogReaderFree that relies on max_block_id, which gets reinitialized to -1 between two records in ResetDecoder, instead of XLR_MAX_BLOCK_ID to clean up the data blocks when freeing the XLOG reader (note: looking at memory leaks at night can lead to incorrect conclusions). So for example if a first record uses some data blocks and allocates buffers for some data, and then a second record uses no data blocks, we'll leak. So that's less critical than I though as the couple XLofReaderAllocate/XLogReaderFree is called only once in StartupXLOG, still that's a leak.

Attached is a self-contained script able to show the problem if anybody wants to check by himself (the other leaks related to pg_rewind are addressed in another thread), and a patch that takes care of it.
Regards,
--
Michael
Attachment

pgsql-bugs by date:

Previous
From: Andres Freund
Date:
Subject: Re: Memory leak in XLOG reader facility when decoding records (caused by WAL refactoring)
Next
From: Lacey Powers
Date:
Subject: Re: BUG #13497: Build with dtrace fails