Thread: [BUG v13] Crash with event trigger in extension

[BUG v13] Crash with event trigger in extension

From
Jehan-Guillaume de Rorthais
Date:
Hi,

Philippe Beaudoin reported to me a crash he was able to reproduce with some
extension update script. Please, find in attachment a script that triggers it.
Just edit the three first vars to point to a temporary installation and
instance.

Some git bisect shows the bug was introduced by b5810de3f4, where
multi-statement query strings are using different short memory contexts.

To make it short, any ALTER TABLE command in an extension script is 
currently calling lappend to add its command to the
currentEventTriggerState->commandList list from its own short context instead
of using the context of currentEventTriggerState.

Because of this, when we call EventTriggerCollectSimpleCommand() for the top
level "ALTER EXTENSION" query, the list has long been removed and a new call of
lappend on currentEventTriggerState->commandList raise
Assert(IsPointerList(list)).

Please, find in attachement a bug fix proposal. I did a very quick look at
other ->commandList manipulations in event_trigger.c. They all appear to
be in the correct context in a first look.

Regards,

Attachment

Re: [BUG v13] Crash with event trigger in extension

From
Alvaro Herrera
Date:
On 2020-Sep-02, Jehan-Guillaume de Rorthais wrote:

> Hi,
> 
> Philippe Beaudoin reported to me a crash he was able to reproduce with some
> extension update script. Please, find in attachment a script that triggers it.
> Just edit the three first vars to point to a temporary installation and
> instance.
> 
> Some git bisect shows the bug was introduced by b5810de3f4, where
> multi-statement query strings are using different short memory contexts.

Good find.

I think we should turn your case into a proper TAP test so that we can
incorporate it, for coverage -- the lines you patch do appear covered,
but obviously not in these specific conditions.  I'm just not very sure
*where* to add it; maybe src/test/modules/test_extensions is the place
to have it, though that one doesn't have TAP tests currently.  An easy
addition, though.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [BUG v13] Crash with event trigger in extension

From
Jehan-Guillaume de Rorthais
Date:
On Wed, 2 Sep 2020 13:53:28 -0400
Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

> On 2020-Sep-02, Jehan-Guillaume de Rorthais wrote:
> 
> > Hi,
> > 
> > Philippe Beaudoin reported to me a crash he was able to reproduce with some
> > extension update script. Please, find in attachment a script that triggers
> > it. Just edit the three first vars to point to a temporary installation and
> > instance.
> > 
> > Some git bisect shows the bug was introduced by b5810de3f4, where
> > multi-statement query strings are using different short memory contexts.
> 
> Good find.
> 
> I think we should turn your case into a proper TAP test so that we can
> incorporate it, for coverage -- the lines you patch do appear covered,
> but obviously not in these specific conditions.  I'm just not very sure
> *where* to add it; maybe src/test/modules/test_extensions is the place
> to have it, though that one doesn't have TAP tests currently.  An easy
> addition, though.

Sure, please find in attachment patch v2 adding a regression test.

Regards,

Attachment

Re: [BUG v13] Crash with event trigger in extension

From
Jehan-Guillaume de Rorthais
Date:
On Fri, 4 Sep 2020 10:20:13 +0900
Michael Paquier <michael@paquier.xyz> wrote:

> On Thu, Sep 03, 2020 at 07:41:14PM +0200, Jehan-Guillaume de Rorthais wrote:
> > +$node->safe_psql('postgres',
> > +    q{CREATE EXTENSION test_event_trigger VERSION '1.0'});
> > +$node->safe_psql('postgres',
> > +    q{ALTER EXTENSION test_event_trigger UPDATE TO '2.0'});
> > +
> > +my $ans = $node->safe_psql('postgres', q{
> > +    SELECT extversion
> > +    FROM pg_catalog.pg_extension
> > +    WHERE extname='test_event_trigger'
> > +});
> > +
> > +is($ans, '2.0', 'Extension test_event_trigger upgraded');  
> 
> I may be missing something of course, but if this just tests a
> sequence of SQL commands, why don't you just include a SQL test in
> text_extensions/sql/?  That would be much cheaper than having to
> create a full scenario with a TAP script.

Indeed, it's much more simple and cheaper. However, the result is a bit
confusing as it's not a wrong SQL result, but the backend that actually crash.
The crash message itself appears in results/test_extensions.out, but no other
tests are run. Neither from the same SQL file or from other ones. That Means
the diff is quite large, but the error appears close from the top.

I don't know if some other crash regression tests has been included that way in
the past. But if this is OK, I can append the test to sql/test_extensions.sql
and remove the TAP. Let me know what you think.

Regards,



Re: [BUG v13] Crash with event trigger in extension

From
Alvaro Herrera
Date:
On 2020-Sep-04, Jehan-Guillaume de Rorthais wrote:

> Indeed, it's much more simple and cheaper. However, the result is a bit
> confusing as it's not a wrong SQL result, but the backend that actually crash.
> The crash message itself appears in results/test_extensions.out, but no other
> tests are run. Neither from the same SQL file or from other ones. That Means
> the diff is quite large, but the error appears close from the top.

Oh!  If it crashes without the extension, then we don't *need* the
exension and surely we don't need the test to be in src/test/modules/;
my inclination would be to put it somewhere in
src/test/regress/sql/event_triggers.sql, maybe at the bottom, with a
comment that references this thread.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [BUG v13] Crash with event trigger in extension

From
Jehan-Guillaume de Rorthais
Date:
On Fri, 4 Sep 2020 09:27:15 -0400
Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

> On 2020-Sep-04, Jehan-Guillaume de Rorthais wrote:
> 
> > Indeed, it's much more simple and cheaper. However, the result is a bit
> > confusing as it's not a wrong SQL result, but the backend that actually
> > crash. The crash message itself appears in results/test_extensions.out, but
> > no other tests are run. Neither from the same SQL file or from other ones.
> > That Means the diff is quite large, but the error appears close from the
> > top.  
> 
> Oh!  If it crashes without the extension, then we don't *need* the
> exension and surely we don't need the test to be in src/test/modules/;

Oh wait. It's not what I meant. We *need* an extension to trigger the
backend crash.

> my inclination would be to put it somewhere in
> src/test/regress/sql/event_triggers.sql, maybe at the bottom, with a
> comment that references this thread.

As the regression need an extension to be installed, it still makes sens to
test from src/test/modules/test_extension where various extensions are
installed to run the checks.

Please find in attachment patch v3.

Attachment

Re: [BUG v13] Crash with event trigger in extension

From
Jehan-Guillaume de Rorthais
Date:
On Tue, 8 Sep 2020 14:27:03 +0900
Michael Paquier <michael@paquier.xyz> wrote:

> On Mon, Sep 07, 2020 at 10:35:08AM +0900, Michael Paquier wrote:
> > More comments in each script would be helpful to document to the
> > reader why those objects are defined as such and their purpose in
> > life.  
> 
> I have looked at that, and tweaked a couple of things as per the
> attached.  Alvaro, Jehan-Guillaume, is that fine to you?

I was actually working on an answer with a new version of the patch, but you've
been faster. Thanks.

I would probably set the error message in VERSION '1.0' to 

  CREATE EXTENSION test_event_trigger VERSION '1.0'

But it's really nitpicking at this point.

More importantly, I wonder if this bug might be the visible tip of another
bigger one. I was wondering why this crash do not triggers without disabling
the event trigger first.

It appears that when keeping the event trigger enabled,
currentEventTriggerState might be initiated multiple times: one for the top
level "alter extension ... update", then once per DDL query triggering en event
in the extension update script. I suppose there's no crash because the very
first currentEventTriggerState initialized at top level is overrode by next
ones, each being destroyed in their own context. So at the end of the
extension script, currentEventTriggerState is just destroyed and empty.

I did not had time to investigate further more and produce a demo case though.
I might misunderstand something here.

Comment?

++



Re: [BUG v13] Crash with event trigger in extension

From
Alvaro Herrera
Date:
On 2020-Sep-08, Jehan-Guillaume de Rorthais wrote:

> It appears that when keeping the event trigger enabled,
> currentEventTriggerState might be initiated multiple times: one for the top
> level "alter extension ... update", then once per DDL query triggering en event
> in the extension update script. I suppose there's no crash because the very
> first currentEventTriggerState initialized at top level is overrode by next
> ones, each being destroyed in their own context. So at the end of the
> extension script, currentEventTriggerState is just destroyed and empty.

Hmm.  I spent a lot of time making sure the event trigger thing worked
correctly in reentrant cases, but I don't think the case of execution of
extension scripts was ever considered.  So you're right that it might be
broken, since it appears not to be thoroughly tested (if at all).

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [BUG v13] Crash with event trigger in extension

From
Jehan-Guillaume de Rorthais
Date:
On Tue, 8 Sep 2020 11:33:24 -0300
Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

> On 2020-Sep-08, Jehan-Guillaume de Rorthais wrote:
> 
> > It appears that when keeping the event trigger enabled,
> > currentEventTriggerState might be initiated multiple times: one for the top
> > level "alter extension ... update", then once per DDL query triggering en
> > event in the extension update script. I suppose there's no crash because
> > the very first currentEventTriggerState initialized at top level is
> > overrode by next ones, each being destroyed in their own context. So at the
> > end of the extension script, currentEventTriggerState is just destroyed and
> > empty.  
> 
> Hmm.  I spent a lot of time making sure the event trigger thing worked
> correctly in reentrant cases, but I don't think the case of execution of
> extension scripts was ever considered.  So you're right that it might be
> broken, since it appears not to be thoroughly tested (if at all).

Argh. After some more debugging, I realized I missed the fact that
EventTriggerBeginCompleteQuery/EventTriggerEndCompleteQuery were actually
stacking allocations using currentEventTriggerState->previous. My bad, sorry for
your time.

I still have a last question in mind though. While running the bellow
scenario, the DDL in the extension do not trigger evt_ext_ddl_fnct. Is it
normal/expected?

  BEGIN;
  CREATE OR REPLACE FUNCTION _evt_ext_ddl_fnct()
  RETURNS EVENT_TRIGGER LANGUAGE plpgsql AS
  $$
    DECLARE r record;
    BEGIN
    FOR r in SELECT * FROM pg_event_trigger_ddl_commands() LOOP
      raise notice 'called %: %.%',
          quote_ident(r.command_tag),
          quote_ident(r.schema_name),
          quote_ident(r.object_identity);
    end loop;
    END;
  $$;
  
  CREATE EVENT TRIGGER evt_ext_ddl_fnct
    ON ddl_command_end
    EXECUTE PROCEDURE _evt_ext_ddl_fnct();
  
  CREATE EXTENSION test_event_trigger VERSION '1.0';
  ALTER EXTENSION test_event_trigger UPDATE TO '2.0';

  CREATE TABLE test( i int);
  ROLLBACK;

The only output is:

  NOTICE:  called "CREATE EXTENSION": <NULL>.test_event_trigger
  NOTICE:  called "ALTER EXTENSION": <NULL>.test_event_trigger
  NOTICE:  called "CREATE TABLE": public."public.test"

Regards,



Re: [BUG v13] Crash with event trigger in extension

From
Alvaro Herrera
Date:
On 2020-Sep-08, Jehan-Guillaume de Rorthais wrote:

> I still have a last question in mind though. While running the bellow
> scenario, the DDL in the extension do not trigger evt_ext_ddl_fnct. Is it
> normal/expected?

Hmm, I think this is intentional; see standard_ProcessUtility and
ProcessUtilitySlow, noting the value of isCompleteQuery for the
execution of the extension script.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [BUG v13] Crash with event trigger in extension

From
Jehan-Guillaume de Rorthais
Date:
On Tue, 8 Sep 2020 14:32:22 -0300
Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

> On 2020-Sep-08, Jehan-Guillaume de Rorthais wrote:
> 
> > I still have a last question in mind though. While running the bellow
> > scenario, the DDL in the extension do not trigger evt_ext_ddl_fnct. Is it
> > normal/expected?  
> 
> Hmm, I think this is intentional; see standard_ProcessUtility and
> ProcessUtilitySlow, noting the value of isCompleteQuery for the
> execution of the extension script.

According to extension.c:execute_sql_string(), queries from
extension script are executed as PROCESS_UTILITY_QUERY context. So
isCompleteQuery is indeed always true in ProcessUtilitySlow. A breakpoint here
while running my test case confirm this.

Maybe you were talking about isTopLevel? But this one doesn't seem considered
while defining if event triggers should trigger or not.

Anyway, if event trigger should not trigger during create/alter extension, I
suppose the original memory context bug that starts this discussion shouldn't
happen in the first place (but need to be fixed anyway), isn't it?

Regards,



Re: [BUG v13] Crash with event trigger in extension

From
Jehan-Guillaume de Rorthais
Date:
On Fri, 11 Sep 2020 15:14:41 +0900
Michael Paquier <michael@paquier.xyz> wrote:

> On Wed, Sep 09, 2020 at 12:58:40PM +0200, Jehan-Guillaume de Rorthais wrote:
> > According to extension.c:execute_sql_string(), queries from
> > extension script are executed as PROCESS_UTILITY_QUERY context. So
> > isCompleteQuery is indeed always true in ProcessUtilitySlow. A breakpoint
> > here while running my test case confirm this.
> > 
> > Maybe you were talking about isTopLevel? But this one doesn't seem
> > considered while defining if event triggers should trigger or not.
> > 
> > Anyway, if event trigger should not trigger during create/alter extension, I
> > suppose the original memory context bug that starts this discussion
> > shouldn't happen in the first place (but need to be fixed anyway), isn't
> > it?  
> 
> If I read correctly the code of event_trigger.c, command collection
> and execution are and should be two different things, meaning that we
> should still collect the commands and then at execution time we decide
> if the event trigger associated to the commands should be fired or
> not.
> 
> Anyway, based on your example in [1], I can see that the event trigger
> for ddl_command_end is correctly triggered for the ALTER TABLE command
> included in the extension upgrade script if the event trigger is
> enabled at the time the extension script triggers, which is the
> behavior I would expect.  What may be a problem though is that the
> NOTICE you are trying to print does not show up, but I think that this
> is caused by the particular context where the SQL queries from an
> extension script are triggered within the backend in this case.
> Also, if you want to make sure of the event trigger execution, you can
> just change the notice to an exception in _evt_ext_ddl_fnct() and you
> would see ALTER EXTENSION fail, pg_event_trigger_ddl_commands
> capturing correctly the information associated to ALTER TABLE for
> table "t":
> ERROR:  P0001: called "ALTER TABLE": public."public.t"
> CONTEXT:  PL/pgSQL function _evt_ext_ddl_fnct() line 5 at RAISE
> LOCATION:  exec_stmt_raise, pl_exec.c:3878

Thank for your investigation, explanation and time.

> FWIW, I think that the fix proposed is fine as-is, and that we had
> better apply it.

+1



Re: [BUG v13] Crash with event trigger in extension

From
Jehan-Guillaume de Rorthais
Date:
On Wed, 16 Sep 2020 15:36:53 +0900
Michael Paquier <michael@paquier.xyz> wrote:

> On Tue, Sep 15, 2020 at 09:26:39PM -0300, Alvaro Herrera wrote:
> > I gave this a very long look and was convinced that it was going to
> > break with multi-statement query strings, but it turns out that it
> > wasn't.  So I pushed it without any other changes.
> > 
> > I forgot to add reviewer credit to the commit message :-(  My apologies
> 
> No worries.  Thanks for committing.

Thank you both.