Thread: event triggers patch breaks with -DCLOBBER_CACHE_ALWAYS

event triggers patch breaks with -DCLOBBER_CACHE_ALWAYS

From
Jeff Davis
Date:
Compiling 3a0e4d with -DCLOBBER_CACHE_ALWAYS causes initdb to segfault.

It looks like the tuple was allocated in the EventTriggerCacheContext,
and before the tuple was freed, another invalidation event came along
and wiped out that context.

It seems like the cleanest fix would be to explicitly allocate things in
the EventTriggerCacheContext, rather than switching the context.

However, it's not 100% clear to me that this is really a problem with
the event trigger patch itself. RelationBuildDesc is doing an allocation
in whatever the caller's context is, and freeing it. But in the
meantime, it does it's own catalog accesses and otherwise executes quite
a lot of code, and assumes that the memory context will stay intact the
entire time. Maybe that's a reasonable assumption, but as seen here, it
leaves room for error.

While looking at it, I observe that no context is passed to
hash_create(), and no hash_destroy() is called. Unless I'm missing
something, that's a leak in TopMemoryContext.

Regards,
    Jeff Davis

#0  0x00000000004683a2 in extractRelOptions (tuple=0x11cd528,
tupdesc=0x7fc9e499b420, amoptions=0) at reloptions.c:778
#1  0x00000000008513f8 in RelationParseRelOptions
(relation=0x7fc9e495f490, tuple=0x11cd528) at relcache.c:388
#2  0x0000000000852695 in RelationBuildDesc (targetRelId=3466,
insertIt=1 '\001') at relcache.c:905
#3  0x0000000000854026 in RelationIdGetRelation (relationId=3466) at
relcache.c:1567
#4  0x000000000048a535 in relation_open (relationId=3466, lockmode=1) at
heapam.c:917
#5  0x000000000084d1bd in BuildEventTriggerCache () at evtcache.c:130
#6  0x000000000084d011 in EventCacheLookup (event=EVT_DDLCommandStart)
at evtcache.c:59
#7  0x00000000005bb1db in EventTriggerDDLCommandStart
(parsetree=0x11becd0) at event_trigger.c:594
#8  0x000000000075d102 in standard_ProcessUtility (parsetree=0x11becd0,
queryString=0x11bef98 "/*\n * PostgreSQL System Views\n *\n * Copyright
(c) 1996-2012, PostgreSQL Global Development Group\n *\n *
src/backend/catalog/system_views.sql\n */\n\nCREATE VIEW pg_roles AS\n
SELECT\n        rolname,\n   "..., params=0x0, dest=0xcdda60,
completionTag=0x7fff3476ea80 "", context=PROCESS_UTILITY_QUERY) at
utility.c:986
#9  0x000000000075bec9 in ProcessUtility (parsetree=0x11becd0,
queryString=0x11bef98 "/*\n * PostgreSQL System Views\n *\n * Copyright
(c) 1996-2012, PostgreSQL Global Development Group\n *\n *
src/backend/catalog/system_views.sql\n */\n\nCREATE VIEW pg_roles AS\n
SELECT\n        rolname,\n   "..., params=0x0, dest=0xcdda60,
completionTag=0x7fff3476ea80 "", context=PROCESS_UTILITY_QUERY) at
utility.c:338
#10 0x000000000075ae31 in PortalRunUtility (portal=0x11c8fe8,
utilityStmt=0x11becd0, isTopLevel=0 '\000', dest=0xcdda60,
completionTag=0x7fff3476ea80 "") at pquery.c:1186
#11 0x000000000075b018 in PortalRunMulti (portal=0x11c8fe8, isTopLevel=0
'\000', dest=0xcdda60, altdest=0xcdda60, completionTag=0x7fff3476ea80
"") at pquery.c:1318
#12 0x000000000075a54a in PortalRun (portal=0x11c8fe8,
count=9223372036854775807, isTopLevel=0 '\000', dest=0xcdda60,
altdest=0xcdda60, completionTag=0x7fff3476ea80 "") at pquery.c:815
#13 0x00000000007545b2 in exec_simple_query (query_string=0x11bef98 "/*
\n * PostgreSQL System Views\n *\n * Copyright (c) 1996-2012, PostgreSQL
Global Development Group\n *\n * src/backend/catalog/system_views.sql\n
*/\n\nCREATE VIEW pg_roles AS\n    SELECT\n        rolname,\n   "...) at
postgres.c:1053
#14 0x00000000007587fe in PostgresMain (argc=10, argv=0x10f29b0,
username=0x10f6eb0 "jdavis") at postgres.c:3972
#15 0x00000000006630fa in main (argc=10, argv=0x10f29b0) at main.c:195

Re: event triggers patch breaks with -DCLOBBER_CACHE_ALWAYS

From
Robert Haas
Date:
On Mon, Jul 23, 2012 at 8:16 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> Compiling 3a0e4d with -DCLOBBER_CACHE_ALWAYS causes initdb to segfault.
>
> It looks like the tuple was allocated in the EventTriggerCacheContext,
> and before the tuple was freed, another invalidation event came along
> and wiped out that context.
>
> It seems like the cleanest fix would be to explicitly allocate things in
> the EventTriggerCacheContext, rather than switching the context.
>
> However, it's not 100% clear to me that this is really a problem with
> the event trigger patch itself. RelationBuildDesc is doing an allocation
> in whatever the caller's context is, and freeing it. But in the
> meantime, it does it's own catalog accesses and otherwise executes quite
> a lot of code, and assumes that the memory context will stay intact the
> entire time. Maybe that's a reasonable assumption, but as seen here, it
> leaves room for error.

Yeah.  Essentially, BuildRelationDesc wants the caller's memory
context to be short-lived (so that any leaks go away) but not so
short-lived that the tuple can get clobbered before the function gets
done using it.  It's worked up until now but it seems kinda fragile.
However, even if we fixed that, I don't think it would fix this
problem completely, because BuildEventTriggerCache() is itself
unprepared for the possibility of a cache reset in the middle of a
rebuild.

After some thought, I think the right fix is to add a three-valued
flag indicating whether the cache is (1) stale (the starting state),
(2) rebuild started, or (3) valid.

(1) At the beginning of BuildEventTriggerCache, we unconditionally set
the state to "rebuild started".  At the end, if the state is still
"rebuild started", we set it to "valid".

(2) InvalidateEventCacheCallback should blow the cache away only if
the current state is something other than "rebuild started".  It
should then unconditionally set the flag to "stale".

(3) EventCacheLookup should rebuild the cache whenever the state is not "valid".

That way, if an invalidation event arrives in the middle of a rebuild,
we finish the rebuild without blowing up, but leave the cache marked
as stale, so that the next lookup will rebuild it again.  The "rebuild
started" state could exist even when no rebuild is in progress if the
rebuild errors out, but it doesn't really matter; the next rebuild
will fix it.

Anyone see a hole in that, or have better ideas?

> While looking at it, I observe that no context is passed to
> hash_create(), and no hash_destroy() is called. Unless I'm missing
> something, that's a leak in TopMemoryContext.

I've pushed a fix for this; thanks for catching it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company