Thread: Re: [COMMITTERS] pgsql: Support ALTER TABLESPACE name SET/RESET ( tablespace_options ).

rhaas@postgresql.org (Robert Haas) writes:
> Support ALTER TABLESPACE name SET/RESET ( tablespace_options ).

With this patch, a CLOBBER_CACHE_ALWAYS build starts falling apart all
over the place :-(.  Looks like you blew the memory management somehow;
it appears to be using a previously pfree'd pointer:

#3  0x00000000006cde28 in pfree (pointer=0x7f7f7f7f7f7f7f7f) at mcxt.c:581
#4  0x00000000006ad4f3 in InvalidateTableSpaceCacheCallback (   arg=<value optimized out>, cacheid=<value optimized
out>,   tuplePtr=<value optimized out>) at spccache.c:61
 
#5  0x00000000006a63eb in InvalidateSystemCaches () at inval.c:552
#6  0x00000000006a6451 in AcceptInvalidationMessages () at inval.c:720
#7  0x00000000005fed1a in LockRelationOid (relid=1213, lockmode=1) at lmgr.c:91
#8  0x000000000046ce09 in relation_open (relationId=1213, lockmode=1)   at heapam.c:899
#9  0x000000000046ce74 in heap_open (relationId=13656, lockmode=13656)   at heapam.c:1073
#10 0x00000000006a5210 in SearchCatCache (cache=0x1364a40, v1=1663,    v2=<value optimized out>, v3=0, v4=0) at
catcache.c:1223
#11 0x00000000006ad959 in SearchSysCache (cacheId=42, key1=1663, key2=0,    key3=0, key4=0) at syscache.c:874
#12 0x00000000006ad366 in get_tablespace (spcid=13656) at spccache.c:128
#13 get_tablespace_page_costs (spcid=13656) at spccache.c:164
#14 0x00000000005a8a80 in cost_seqscan (path=0x13cc2b0,    root=<value optimized out>, baserel=0x13ae328) at
costsize.c:186
#15 0x00000000005cb538 in create_seqscan_path (root=0x13af090, rel=0x13ae328)   at pathnode.c:404
#16 0x00000000005a48eb in set_plain_rel_pathlist (rte=<value optimized out>,    rel=<value optimized out>, root=<value
optimizedout>) at allpaths.c:258
 
#17 set_rel_pathlist (rte=<value optimized out>, rel=<value optimized out>,    root=<value optimized out>) at
allpaths.c:201
#18 0x00000000005a4b3e in set_base_rel_pathlists (root=<value optimized out>)   at allpaths.c:157
#19 make_one_rel (root=<value optimized out>) at allpaths.c:93
#20 0x00000000005b8cf0 in query_planner (root=0x13af090, tlist=0x13cbb90,    tuple_fraction=<value optimized out>,
limit_tuples=<valueoptimized out>,    cheapest_path=<value optimized out>, sorted_path=<value optimized out>,
num_groups=0x7fffa31aeec8)at planmain.c:254
 
#21 0x00000000005b9f1e in grouping_planner (root=0x13af090,    tuple_fraction=<value optimized out>) at planner.c:1087
#22 0x00000000005bbc90 in subquery_planner (glob=<value optimized out>,    parse=0x13ae440, parent_root=0x0,
hasRecursion=88'X',    tuple_fraction=<value optimized out>, subroot=<value optimized out>)   at planner.c:506
 
#23 0x00000000005bbea5 in standard_planner (parse=0x13ae440, cursorOptions=0, 
...
        regards, tom lane


I wrote:
> With this patch, a CLOBBER_CACHE_ALWAYS build starts falling apart all
> over the place :-(.  Looks like you blew the memory management somehow;
> it appears to be using a previously pfree'd pointer:

Actually, there were three different problems there:

1. Relying on a HASH_REMOVE'd hashtable entry to still be present and
valid.  This is at least bad style.  I wonder if we should tweak the
dynahash code to memset a free'd entry to 7F's like pfree does.

2. Assuming that a cache entry will remain in existence across a catalog
access.  This will work except when it doesn't, ie, when a cache flush
occurs during the table access.  You've just learned the hard way what
CLOBBER_CACHE_ALWAYS testing is good for ;-)

3. Invoking tablespace_reloptions while switched into
CacheMemoryContext.  This isn't a crasher, but it strikes me as a bad
idea because if reloptions.c happens to leak anything, that'll represent
a permanent (session-lifespan) memory leak.  And it's complicated enough
that being sure it doesn't leak anything is hard.  I think you should
invoke tablespace_reloptions in the caller's memory context, and then if
it succeeds, copy the result into CacheMemoryContext.  That would
probably require fixing the problem you noted earlier today about
making TableSpaceOpts be a valid bytea value, so that it'll be easy to
copy (then you can use datumCopy, for instance).

I fixed the first two because they were in the way of investigating
another problem, but #3 still needs your attention.
        regards, tom lane


On Wed, Jan 6, 2010 at 6:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> With this patch, a CLOBBER_CACHE_ALWAYS build starts falling apart all
>> over the place :-(.  Looks like you blew the memory management somehow;
>> it appears to be using a previously pfree'd pointer:
>
> Actually, there were three different problems there:
>
> 1. Relying on a HASH_REMOVE'd hashtable entry to still be present and
> valid.  This is at least bad style.  I wonder if we should tweak the
> dynahash code to memset a free'd entry to 7F's like pfree does.

Oops.

> 2. Assuming that a cache entry will remain in existence across a catalog
> access.  This will work except when it doesn't, ie, when a cache flush
> occurs during the table access.  You've just learned the hard way what
> CLOBBER_CACHE_ALWAYS testing is good for ;-)

Yeah.  I tried that at an earlier stage of development, but obviously
I should have retested with the final version.

> 3. Invoking tablespace_reloptions while switched into
> CacheMemoryContext.  This isn't a crasher, but it strikes me as a bad
> idea because if reloptions.c happens to leak anything, that'll represent
> a permanent (session-lifespan) memory leak.  And it's complicated enough
> that being sure it doesn't leak anything is hard.  I think you should
> invoke tablespace_reloptions in the caller's memory context, and then if
> it succeeds, copy the result into CacheMemoryContext.  That would
> probably require fixing the problem you noted earlier today about
> making TableSpaceOpts be a valid bytea value, so that it'll be easy to
> copy (then you can use datumCopy, for instance).

Hmm.  That seems a little crufty, but it makes sense.  Will do.

> I fixed the first two because they were in the way of investigating
> another problem, but #3 still needs your attention.

Thanks, and sorry for the hassle.

...Robert


On Wed, Jan 6, 2010 at 6:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> 3. Invoking tablespace_reloptions while switched into
> CacheMemoryContext.  This isn't a crasher, but it strikes me as a bad
> idea because if reloptions.c happens to leak anything, that'll represent
> a permanent (session-lifespan) memory leak.  And it's complicated enough
> that being sure it doesn't leak anything is hard.

What tools do we have for identifying memory leaks?

> I think you should
> invoke tablespace_reloptions in the caller's memory context, and then if
> it succeeds, copy the result into CacheMemoryContext.  That would
> probably require fixing the problem you noted earlier today about
> making TableSpaceOpts be a valid bytea value, so that it'll be easy to
> copy (then you can use datumCopy, for instance).

I think I'll just copy Alvaro's existing pattern in relcache.c, which
just uses palloc0 and memcpy.

...Robert


Robert Haas <robertmhaas@gmail.com> writes:
> What tools do we have for identifying memory leaks?

User complaints :-(

> I think I'll just copy Alvaro's existing pattern in relcache.c, which
> just uses palloc0 and memcpy.

OK.
        regards, tom lane


On Wed, Jan 6, 2010 at 10:13 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> What tools do we have for identifying memory leaks?
>
> User complaints :-(

YGTBFKM.

It seems like we should have a tool that dumps out every memory
context in the system, with the number of allocations and frees and
number of bytes allocated and freed since the last reset.  Maybe the
time of the last reset.  You could run that before and after doing
whatever it is that might leak and compare.

...Robert


Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Jan 6, 2010 at 10:13 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> What tools do we have for identifying memory leaks?
>> 
>> User complaints :-(

> YGTBFKM.

Not really.  Given the memory context architecture, leaks are simply not
a big deal in 99% of the system.  We just need a few coding rules like
"don't run random code in CacheMemoryContext" ;-)

> It seems like we should have a tool that dumps out every memory
> context in the system, with the number of allocations and frees and
> number of bytes allocated and freed since the last reset.  Maybe the
> time of the last reset.  You could run that before and after doing
> whatever it is that might leak and compare.

Once you've identified a place that "might leak" and a test case that
would exercise it, you've already done most of the work.  What you're
describing sounds to me like a lot of work for not much return.

Furthermore, if you do have a leaking test case and you don't know
exactly where the leak is coming from, numbers about how big the leak is
aren't any help in finding the cause.  What you really want is numbers
that are per palloc call site, which would not be simple to get.  I have
occasionally wondered about hooking up something similar to valgrind for
this; but the problem is that it would drown you in false positives
because of the number of places where we intentionally leave stuff to be
cleaned up at context reset.
        regards, tom lane