Regression instability + performance issue in TRIGGERS view - Mailing list pgsql-hackers

From Tom Lane
Subject Regression instability + performance issue in TRIGGERS view
Date
Msg-id 5891.1587594470@sss.pgh.pa.us
Whole thread Raw
Responses Re: Regression instability + performance issue in TRIGGERS view
List pgsql-hackers
I looked into the cause of several recent buildfarm failures:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=handfish&dt=2020-04-20%2020%3A32%3A23
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&dt=2020-04-18%2018%3A20%3A12
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=devario&dt=2020-02-27%2014%3A18%3A34
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=ayu&dt=2020-01-17%2022%3A56%3A13

all of which look like this:

diff -U3 /home/filiperosset/dev/build-farm-11/HEAD/pgsql.build/src/test/regress/expected/triggers.out
/home/filiperosset/dev/build-farm-11/HEAD/pgsql.build/src/test/regress/results/triggers.out
--- /home/filiperosset/dev/build-farm-11/HEAD/pgsql.build/src/test/regress/expected/triggers.out    2020-03-19
17:54:52.037720127-0300 
+++ /home/filiperosset/dev/build-farm-11/HEAD/pgsql.build/src/test/regress/results/triggers.out    2020-04-20
17:44:02.024230072-0300 
@@ -2559,22 +2559,7 @@
   FROM information_schema.triggers
   WHERE event_object_table IN ('parent', 'child1', 'child2', 'child3')
   ORDER BY trigger_name COLLATE "C", 2;
-    trigger_name    | event_manipulation | event_object_schema | event_object_table | action_order | action_condition
|action_orientation | action_timing | action_reference_old_table | action_reference_new_table  

---------------------+--------------------+---------------------+--------------------+--------------+------------------+--------------------+---------------+----------------------------+----------------------------
- child1_delete_trig | DELETE             | public              | child1             |            1 |
|STATEMENT          | AFTER         | old_table                  |  
- child1_insert_trig | INSERT             | public              | child1             |            1 |
|STATEMENT          | AFTER         |                            | new_table 
- child1_update_trig | UPDATE             | public              | child1             |            1 |
|STATEMENT          | AFTER         | old_table                  | new_table 
- child2_delete_trig | DELETE             | public              | child2             |            1 |
|STATEMENT          | AFTER         | old_table                  |  
- child2_insert_trig | INSERT             | public              | child2             |            1 |
|STATEMENT          | AFTER         |                            | new_table 
- child2_update_trig | UPDATE             | public              | child2             |            1 |
|STATEMENT          | AFTER         | old_table                  | new_table 
- child3_delete_trig | DELETE             | public              | child3             |            1 |
|STATEMENT          | AFTER         | old_table                  |  
- child3_insert_trig | INSERT             | public              | child3             |            1 |
|STATEMENT          | AFTER         |                            | new_table 
- child3_update_trig | UPDATE             | public              | child3             |            1 |
|STATEMENT          | AFTER         | old_table                  | new_table 
- parent_delete_trig | DELETE             | public              | parent             |            1 |
|STATEMENT          | AFTER         | old_table                  |  
- parent_insert_trig | INSERT             | public              | parent             |            1 |
|STATEMENT          | AFTER         |                            | new_table 
- parent_update_trig | UPDATE             | public              | parent             |            1 |
|STATEMENT          | AFTER         | old_table                  | new_table 
-(12 rows)
-
+ERROR:  cache lookup failed for function 22540
 -- insert directly into children sees respective child-format tuples
 insert into child1 values ('AAA', 42);
 NOTICE:  trigger = child1_insert_trig, new table = (AAA,42)

What appears to be happening is that the concurrent updatable_views
test creates/deletes some triggers and trigger functions, and with
bad timing luck that results in a cache lookup failure within
pg_get_triggerdef().  This isn't a new problem (note that crake's
failure is on v12 not HEAD) but it seems that it's gotten much more
probable of late, probably because of unrelated test changes
altering the timing of these tests.

It's fairly annoying that the triggers view could be prone to failing due
to concurrent DDL, but I don't see any near-term fix for that general
problem --- as long as the ruleutils infrastructure relies on syscache
lookups in any way, we're going to have some problems of that sort.
But what's *really* annoying is that the view can fail due to DDL
changes that aren't even on any of the tables being looked at.  That's
because the planner is failing to push down the WHERE condition, meaning
that this isn't only a regression instability issue but a rather severe
performance problem: there is no way for a query on the triggers view
to not evaluate the entire view.

The reason for the pushdown failure is that the triggers view uses
a window function to compute its action_order output:

         rank() OVER (PARTITION BY n.oid, c.oid, em.num, t.tgtype & 1, t.tgtype & 66 ORDER BY t.tgname)

and the planner can't push the restriction on relname below that for fear
of changing the window function's results.  It does know that restrictions
on the PARTITION BY columns can be pushed down, but the query restriction
is not on any of those columns.

However, it's not like the query restriction is unrelated to that
partition condition, it's just not quite the same thing.
I experimented with

-        rank() OVER (PARTITION BY n.oid, c.oid, em.num, t.tgtype & 1, t.tgtype & 66 ORDER BY t.tgname)
+        rank() OVER (PARTITION BY n.nspname::sql_identifier, c.relname::sql_identifier, em.num, t.tgtype & 1, t.tgtype
&66 ORDER BY t.tgname) 

so as to make the first two partitioning columns actually match the
relevant view output columns, and lo, the planner now successfully
pushes the table-name constraint down to the pg_class relation scan.
(With a little more work, we could make the other partitioning columns
also match view output columns exactly, but I'm not sure that any of
those are worth messing with.  Restrictions on the table and schema
names seem like the only things likely to be interesting for
performance.)

This should not only fix the regression instability but offer a pretty
significant speedup for real-world use of the triggers view, so I
propose squeezing it in even though we're past feature freeze.

However ... what about the back branches?  As noted, we are seeing
this now in the back branches, at least v12.  Are we willing to make
a change in the information_schema in back branches to make the
instability go away?  (And if not, how else could we fix it?)

I note that this is actually a performance regression in the triggers
view: before v11, when this rank() call was added, the planner had
no problem pushing down restrictions on the table name.

I think that a reasonable case could be made for back-patching this
change as far as v11, though of course without a catversion bump.

Thoughts?

            regards, tom lane



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: More efficient RI checks - take 2
Next
From: Tom Lane
Date:
Subject: Re: More efficient RI checks - take 2