Weirdness using Executor Hooks - Mailing list pgsql-hackers

From Eric Ridge
Subject Weirdness using Executor Hooks
Date
Msg-id CANcm6waOgyFxYyZeVjnJs3WRPJPnf0kLmVVmhNhLy9cKnacYDA@mail.gmail.com
Responses Re: Weirdness using Executor Hooks  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
I've written an extension that hooks ExecutorStart_hook and ExecutorEnd_hook. The hooks are assigned in _PG_init() (with the previous values saved to static variables) and reset to the previous values in _PG_fini(). Possibly also relevant: the extension library is listed in postgresql.conf under local_preload_libraries. This is with Postgres 9.3.4.
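For readers less familiar with the pattern, here is a minimal standalone simulation of the save/chain/restore sequence described above, in the style auto_explain and pg_stat_statements use. The `QueryDesc` struct, the global `ExecutorStart_hook` slot, `standard_ExecutorStart` and the dispatching `ExecutorStart` are hypothetical stand-ins written so the sketch compiles outside a PostgreSQL build (the real declarations come from the server headers); `my_hook_calls` and `standard_calls` are illustrative counters.

```c
#include <stddef.h>

/* Simplified stand-ins for PostgreSQL's QueryDesc and hook type (hypothetical). */
typedef struct QueryDesc { int dummy; } QueryDesc;
typedef void (*ExecutorStart_hook_type)(QueryDesc *queryDesc, int eflags);

/* Stand-in for the server's global hook slot; NULL means "no hook installed". */
static ExecutorStart_hook_type ExecutorStart_hook = NULL;
static int standard_calls = 0;

/* Stand-in for the built-in executor start routine. */
static void standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
    (void) queryDesc;
    (void) eflags;
    standard_calls++;
}

/* Mimics the server's dispatch: call the hook if one is set, else the standard routine. */
static void ExecutorStart(QueryDesc *queryDesc, int eflags)
{
    if (ExecutorStart_hook)
        ExecutorStart_hook(queryDesc, eflags);
    else
        standard_ExecutorStart(queryDesc, eflags);
}

/* --- extension side --- */
static ExecutorStart_hook_type prev_ExecutorStartHook = NULL;  /* hook we displaced */
static int my_hook_calls = 0;

static void my_executor_start_hook(QueryDesc *queryDesc, int eflags)
{
    my_hook_calls++;
    if (prev_ExecutorStartHook)
        prev_ExecutorStartHook(queryDesc, eflags);  /* chain to the displaced hook */
    else
        standard_ExecutorStart(queryDesc, eflags);
}

void _PG_init(void)
{
    prev_ExecutorStartHook = ExecutorStart_hook;  /* save whatever is there now */
    ExecutorStart_hook = my_executor_start_hook;  /* then install ours */
}

void _PG_fini(void)
{
    ExecutorStart_hook = prev_ExecutorStartHook;  /* put the saved hook back */
}
```

The key property is that `_PG_init()` copies whatever is currently in the hook slot, so the saved pointer is only correct if the slot does not already hold `my_executor_start_hook` when it runs.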

What happens is that, rarely (and of course never on my development machine), the saved "prev_ExecutorXXXHook" ends up set to the current value of ExecutorXXX_hook, i.e. to my own hook function, so when my hook function is called:

static void my_executor_start_hook(QueryDesc *queryDesc, int eflags)
{
   executorDepth++;

   if (prev_ExecutorStartHook)   /* this ends up equal to my_executor_start_hook, so it recurses forever */
      prev_ExecutorStartHook(queryDesc, eflags);
   else
      standard_ExecutorStart(queryDesc, eflags);
}

it loops endlessly on itself (i.e., prev_ExecutorStartHook == my_executor_start_hook). Based on GDB backtraces, it looks like gcc compiles the call into some form of tail call, since the backtraces just sit on the line that calls prev_ExecutorXXXHook(...) rather than growing. The backend has to be SIGKILL'd.
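One hypothetical mitigation, which hides the symptom rather than explaining it, is for the hook to refuse to chain to itself. The sketch below uses simplified stand-in types so it compiles on its own; `self_loop_detected` is an illustrative counter where a real extension might log a warning instead.

```c
#include <stddef.h>

/* Simplified stand-ins for PostgreSQL's types (hypothetical). */
typedef struct QueryDesc { int dummy; } QueryDesc;
typedef void (*ExecutorStart_hook_type)(QueryDesc *queryDesc, int eflags);

static int standard_calls = 0;

/* Stand-in for the built-in executor start routine. */
static void standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
    (void) queryDesc;
    (void) eflags;
    standard_calls++;
}

static ExecutorStart_hook_type prev_ExecutorStartHook = NULL;
static int self_loop_detected = 0;  /* illustrative; a real extension might elog(WARNING, ...) */

static void my_executor_start_hook(QueryDesc *queryDesc, int eflags)
{
    /* A saved pointer that refers back to this function can never be a real
     * predecessor; chaining to it would recurse (or, as a tail call, spin) forever. */
    if (prev_ExecutorStartHook == my_executor_start_hook)
    {
        self_loop_detected++;
        prev_ExecutorStartHook = NULL;  /* the true predecessor, if any, is already lost */
    }

    if (prev_ExecutorStartHook)
        prev_ExecutorStartHook(queryDesc, eflags);
    else
        standard_ExecutorStart(queryDesc, eflags);
}
```

By the time this check fires, whatever hook was genuinely installed before this one has already been overwritten, so falling back to `standard_ExecutorStart` may silently skip another extension's hook.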

I've followed the patterns in both the 'auto_explain' and 'pg_stat_statements' contrib extensions, I've been over my code about three dozen times now, and I just can't figure out what's going on. The hooks are clearly known to work, so I'm stumped.

One theory is that I have a bug somewhere that's overwriting memory, but it would be quite a coincidence for only the two saved prev-hook pointers to be changed, and changed to such specific values.

Since it only happens rarely (and never for me during development), another theory concerns deployment. This extension is under fairly constant development, and when we deploy a new binary (and run ALTER EXTENSION UPDATE) we don't restart Postgres. So maybe backends that are already running, and were initialized with the previous version, are getting confused (maybe the kernel re-mmaps the .so or something? I don't know). I always seem to hear about the problem after a backend has been spinning endlessly for a few days. :(
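Whether PostgreSQL can run an extension's `_PG_init()` a second time in an already-running backend after the `.so` is replaced on disk is exactly the open question here, and this sketch does not answer it. It only shows that if initialization did run twice against the same loaded image, the usual save-then-install sequence would produce precisely the observed `prev == self` state, and how a once-only flag avoids it. `naive_init`, `guarded_init` and `hooks_installed` are hypothetical names, and the hook types are simplified stand-ins for PostgreSQL's.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for PostgreSQL's types (hypothetical). */
typedef struct QueryDesc { int dummy; } QueryDesc;
typedef void (*ExecutorStart_hook_type)(QueryDesc *queryDesc, int eflags);

/* Stand-in for the server's global hook slot. */
static ExecutorStart_hook_type ExecutorStart_hook = NULL;
static ExecutorStart_hook_type prev_ExecutorStartHook = NULL;

static void my_executor_start_hook(QueryDesc *queryDesc, int eflags)
{
    (void) queryDesc;  /* body irrelevant to this sketch */
    (void) eflags;
}

/* Save-then-install with no guard: correct only if it runs once per process. */
static void naive_init(void)
{
    prev_ExecutorStartHook = ExecutorStart_hook;
    ExecutorStart_hook = my_executor_start_hook;
}

/* Same sequence, but a repeat call against the same static data is a no-op. */
static bool hooks_installed = false;

static void guarded_init(void)
{
    if (hooks_installed)
        return;  /* saving the slot now would make prev point at ourselves */
    prev_ExecutorStartHook = ExecutorStart_hook;
    ExecutorStart_hook = my_executor_start_hook;
    hooks_installed = true;
}
```

A static flag suits this case because a re-run against the same loaded image sees the same static data. In this simplified model, a freshly mapped copy of the library would get fresh statics but also a different hook address, so it would save a pointer to the old copy's hook rather than to itself.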

Have any of y'all seen anything like this and could I be on the right track with my second theory?

*scratching head*,

eric
