Re: Getting better results from valgrind leak tracking - Mailing list pgsql-hackers

From: Andres Freund
Subject: Re: Getting better results from valgrind leak tracking
Msg-id: 20210318031536.lbybxu2jttdhmgjw@alap3.anarazel.de
In response to: Re: Getting better results from valgrind leak tracking (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
On 2021-03-17 22:33:59 -0400, Tom Lane wrote:
> >> I've not yet tried to grok the comment that purports to justify it,
> >> but I fail to see why it'd ever be useful to drop values inherited
> >> from the postmaster.
> 
> > I can't really make sense of it either. I think it may be trying to
> > restore the GUC state to what it would have been in postmaster,
> > disregarding all the settings that were set as part of PostgresInit()
> > etc?
> 
> At least in a non-EXEC_BACKEND build, the most reliable way to reproduce
> the postmaster's settings is to do nothing whatsoever.  And I think the
> same is true for EXEC_BACKEND, really, because the guc.c subsystem is
> responsible for restoring what would have been the inherited-via-fork
> settings.  So I'm really not sure what this is on about, and I'm too
> tired to try to figure it out tonight.
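
For the non-EXEC_BACKEND case that point really is just fork() semantics; a minimal, PostgreSQL-free illustration (variable name made up):

#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

static int fake_guc_setting = 42;   /* the "postmaster" sets this before forking */

int
main(void)
{
    pid_t pid = fork();

    if (pid == 0)
    {
        /* child: inherits the parent's memory image, including every global */
        printf("child sees fake_guc_setting = %d\n", fake_guc_setting);
        _exit(0);
    }
    else if (pid > 0)
        waitpid(pid, NULL, 0);
    return 0;
}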

The restore thing runs after we've already set and initialized GUCs,
including things like user/database default GUCs. I see things like

==2251779== 4,560 bytes in 1 blocks are definitely lost in loss record 383 of 406
==2251779==    at 0x483877F: malloc (vg_replace_malloc.c:307)
==2251779==    by 0x714A45: ConvertTimeZoneAbbrevs (datetime.c:4556)
==2251779==    by 0x88DE95: load_tzoffsets (tzparser.c:465)
==2251779==    by 0x88511D: check_timezone_abbreviations (guc.c:11565)
==2251779==    by 0x884633: call_string_check_hook (guc.c:11232)
==2251779==    by 0x87CB57: parse_and_validate_value (guc.c:7012)
==2251779==    by 0x87DD5F: set_config_option (guc.c:7630)
==2251779==    by 0x88397F: ProcessGUCArray (guc.c:10784)
==2251779==    by 0x32BCCF: ApplySetting (pg_db_role_setting.c:256)
==2251779==    by 0x874CA2: process_settings (postinit.c:1163)
==2251779==    by 0x874A0B: InitPostgres (postinit.c:1048)
==2251779==    by 0x60129A: BackgroundWorkerInitializeConnectionByOid (postmaster.c:5681)
==2251779==    by 0x2BB283: ParallelWorkerMain (parallel.c:1374)
==2251779==    by 0x5EBA6A: StartBackgroundWorker (bgworker.c:864)
==2251779==    by 0x6014FE: do_start_bgworker (postmaster.c:5802)
==2251779==    by 0x6018D2: maybe_start_bgworkers (postmaster.c:6027)
==2251779==    by 0x600811: sigusr1_handler (postmaster.c:5190)
==2251779==    by 0x4DD513F: ??? (in /usr/lib/x86_64-linux-gnu/libpthread-2.31.so)
==2251779==    by 0x556E865: select (select.c:41)
==2251779==    by 0x5FC0CB: ServerLoop (postmaster.c:1700)
==2251779==    by 0x5FBA68: PostmasterMain (postmaster.c:1408)
==2251779==    by 0x4F8BFD: main (main.c:209)

The BackgroundWorkerInitializeConnectionByOid() in that trace is before
the RestoreGUCState() later in ParallelWorkerMain().  But that doesn't
really explain the approach.
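
For context on the shape of that loss record: a string GUC's check hook is supposed to hand any derived data back through *extra (conventionally malloc'd, since guc.c itself frees superseded extras), and going by the stack above that is how the table built by ConvertTimeZoneAbbrevs() gets handed over. A rough sketch of that contract, with made-up stand-in names (MyTable, parse_value, check_my_setting); only the hook signature matches the real one:

#include <stdbool.h>
#include <stdlib.h>

typedef int GucSource;                              /* stand-in for the enum in guc.h */
typedef struct MyTable { int nentries; } MyTable;   /* stand-in for the real table type */

static bool
parse_value(MyTable *tbl, const char *val)
{
    /* pretend to parse; the real code loads a timezone abbreviation file */
    tbl->nentries = (val != NULL) ? 1 : 0;
    return val != NULL;
}

static bool
check_my_setting(char **newval, void **extra, GucSource source)
{
    MyTable *tbl = malloc(sizeof(MyTable));   /* must be malloc'd, not palloc'd */

    if (tbl == NULL)
        return false;
    if (!parse_value(tbl, *newval))
    {
        free(tbl);
        return false;             /* reject the proposed value */
    }
    *extra = tbl;                 /* guc.c takes ownership and frees it later */
    return true;
}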


> In other news, I found that there's a genuine leak in
> RelationBuildLocalRelation: RelationInitTableAccessMethod
> was added in a spot where CurrentMemoryContext is CacheMemoryContext,
> which is bad because it does a syscache lookup that can lead to
> a table access which can leak some memory.  Seems easy to fix though.

Yea, that's the one I was talking about re
ParallelBlockTableScanWorkerData not being freed. While I think we
should also add that pfree, I think you're right, and we shouldn't do
RelationInitTableAccessMethod() while in CacheMemoryContext.
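
For illustration, the general shape of that kind of fix (a sketch of the pattern only, not the actual patch; the surrounding RelationBuildLocalRelation code and the "rel" being built are elided):

    MemoryContext oldcxt;

    oldcxt = MemoryContextSwitchTo(CacheMemoryContext);

    /* ... allocations that must live as long as the relcache entry ... */

    MemoryContextSwitchTo(oldcxt);

    /*
     * Do the table-AM initialization only after switching back, so that the
     * syscache lookup it performs allocates in the caller's short-lived
     * context instead of quietly accumulating in CacheMemoryContext.
     */
    RelationInitTableAccessMethod(rel);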


> The plpgsql situation looks like a mess.  As a short-term answer,
> I'm inclined to recommend adding an exclusion that will ignore anything
> allocated within plpgsql_compile().  Doing better will require a fair
> amount of rewriting.  (Although I suppose we could also consider adding
> an on_proc_exit callback that runs through and frees all the function
> cache entries.)

The error variant of this one seems like it might actually be a
practically relevant leak, on top of unnecessarily increasing memory
usage for compiled plpgsql functions. The latter would be good to fix,
but the former seems like it could be a practical issue for poolers and
the like?

So I think we should at least do the reparenting thing to address that?
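
For reference, the short-term exclusion Tom suggests above could look something like the following Memcheck suppression (a sketch only; the exact leak kinds and frame list would need tuning against real reports, and it could live somewhere like src/tools/valgrind.supp). It hides any leak whose allocation stack passes through plpgsql_compile():

{
   ignore_plpgsql_compile_leaks
   Memcheck:Leak
   match-leak-kinds: definite
   fun:malloc
   ...
   fun:plpgsql_compile
}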

Greetings,

Andres Freund


