Re: Memory leak in XLOG reader facility when decoding records (caused by WAL refactoring) - Mailing list pgsql-bugs
From | Heikki Linnakangas |
---|---|
Subject | Re: Memory leak in XLOG reader facility when decoding records (caused by WAL refactoring) |
Date | |
Msg-id | 55B66B3A.3070906@iki.fi Whole thread Raw |
In response to | Re: Memory leak in XLOG reader facility when decoding records (caused by WAL refactoring) (Michael Paquier <michael.paquier@gmail.com>) |
Responses |
Re: Memory leak in XLOG reader facility when decoding records
(caused by WAL refactoring)
|
List | pgsql-bugs |
On 07/27/2015 04:40 AM, Michael Paquier wrote: > On Mon, Jul 27, 2015 at 12:32 AM, Andres Freund <andres@anarazel.de> wrote: >> On 2015-07-23 22:50:12 +0900, Michael Paquier wrote: >>> >>> While running valgring on pg_rewind, I have noticed that each new call >>> to XLogReadRecord leaks XLogReaderState->main_data and >>> XLogReaderState->blocks[block_id].data because each one of them is >>> palloc'd to store respectively the main data of the record and the >>> data attached to each block. I think that the most correct fix to >>> prevent the leak is to free those pointers when calling ResetDecoder() >>> which is used to reset a reader state between two records. >> >> I don't think that'd be a good fix. Isn't the idea here that we only >> allocate new buffers when the previous one is too small? Your fix will >> greatly increase the memory management overhead. > > OK. > >>> ==46805== 154 bytes in 1 blocks are definitely lost in loss record 103 > of 167 >>> ==46805== at 0x6B1B: malloc (in >>> > /usr/local/Cellar/valgrind/3.10.1/lib/valgrind/vgpreload_memcheck-amd64-darwin.so) >>> ==46805== by 0x10000DB2D: pg_malloc_internal (fe_memutils.c:30) >>> ==46805== by 0x10000DE79: palloc (fe_memutils.c:118) >>> ==46805== by 0x1000052A0: DecodeXLogRecord (xlogreader.c:1203) >>> ==46805== by 0x100003AA9: XLogReadRecord (xlogreader.c:461) >>> ==46805== by 0x1000022F0: extractPageMap (parsexlog.c:78) >>> ==46805== by 0x10000188C: main (pg_rewind.c:280) >> >> Uh. This is the leak? >> if (state->main_data_len > 0) >> { >> if (!state->main_data || state->main_data_len > > state->main_data_bufsz) >> { >> if (state->main_data)g >> pfree(state->main_data); >> state->main_data_bufsz = state->main_data_len; >> state->main_data = > palloc(state->main_data_bufsz); <--- here >> } >> memcpy(state->main_data, ptr, state->main_data_len); >> ptr += state->main_data_len; >> } >> >> Where/When are we leaking memory here? The previously used memory is >> freed before we allocate a larger buffer. > > Yep, it is here, and it is not related to the calls of XLogReadRecord > actually, that's a problem in XLogReaderFree that relies on max_block_id, > which gets reinitialized to -1 between two records in ResetDecoder, instead > of XLR_MAX_BLOCK_ID to clean up the data blocks when freeing the XLOG > reader (note: looking at memory leaks at night can lead to incorrect > conclusions). So for example if a first record uses some data blocks and > allocates buffers for some data, and then a second record uses no data > blocks, we'll leak. So that's less critical than I though as the couple > XLofReaderAllocate/XLogReaderFree is called only once in StartupXLOG, still > that's a leak. > > Attached is a self-contained script able to show the problem if anybody > wants to check by himself (the other leaks related to pg_rewind are > addressed in another thread), and a patch that takes care of it. Thanks, committed. The leak was not actually in the "main_data" handling, as quoted in the above snippet, but in the per-block buffers. - Heikki
pgsql-bugs by date: