Regression instability + performance issue in TRIGGERS view - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Regression instability + performance issue in TRIGGERS view |
Date | |
Msg-id | 5891.1587594470@sss.pgh.pa.us Whole thread Raw |
Responses |
Re: Regression instability + performance issue in TRIGGERS view
|
List | pgsql-hackers |
I looked into the cause of several recent buildfarm failures: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=handfish&dt=2020-04-20%2020%3A32%3A23 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&dt=2020-04-18%2018%3A20%3A12 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=devario&dt=2020-02-27%2014%3A18%3A34 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=ayu&dt=2020-01-17%2022%3A56%3A13 all of which look like this: diff -U3 /home/filiperosset/dev/build-farm-11/HEAD/pgsql.build/src/test/regress/expected/triggers.out /home/filiperosset/dev/build-farm-11/HEAD/pgsql.build/src/test/regress/results/triggers.out --- /home/filiperosset/dev/build-farm-11/HEAD/pgsql.build/src/test/regress/expected/triggers.out 2020-03-19 17:54:52.037720127-0300 +++ /home/filiperosset/dev/build-farm-11/HEAD/pgsql.build/src/test/regress/results/triggers.out 2020-04-20 17:44:02.024230072-0300 @@ -2559,22 +2559,7 @@ FROM information_schema.triggers WHERE event_object_table IN ('parent', 'child1', 'child2', 'child3') ORDER BY trigger_name COLLATE "C", 2; - trigger_name | event_manipulation | event_object_schema | event_object_table | action_order | action_condition |action_orientation | action_timing | action_reference_old_table | action_reference_new_table ---------------------+--------------------+---------------------+--------------------+--------------+------------------+--------------------+---------------+----------------------------+---------------------------- - child1_delete_trig | DELETE | public | child1 | 1 | |STATEMENT | AFTER | old_table | - child1_insert_trig | INSERT | public | child1 | 1 | |STATEMENT | AFTER | | new_table - child1_update_trig | UPDATE | public | child1 | 1 | |STATEMENT | AFTER | old_table | new_table - child2_delete_trig | DELETE | public | child2 | 1 | |STATEMENT | AFTER | old_table | - child2_insert_trig | INSERT | public | child2 | 1 | |STATEMENT | AFTER | | new_table - child2_update_trig | UPDATE | public | child2 | 1 | |STATEMENT | AFTER | old_table | new_table - child3_delete_trig | DELETE | public | child3 | 1 | |STATEMENT | AFTER | old_table | - child3_insert_trig | INSERT | public | child3 | 1 | |STATEMENT | AFTER | | new_table - child3_update_trig | UPDATE | public | child3 | 1 | |STATEMENT | AFTER | old_table | new_table - parent_delete_trig | DELETE | public | parent | 1 | |STATEMENT | AFTER | old_table | - parent_insert_trig | INSERT | public | parent | 1 | |STATEMENT | AFTER | | new_table - parent_update_trig | UPDATE | public | parent | 1 | |STATEMENT | AFTER | old_table | new_table -(12 rows) - +ERROR: cache lookup failed for function 22540 -- insert directly into children sees respective child-format tuples insert into child1 values ('AAA', 42); NOTICE: trigger = child1_insert_trig, new table = (AAA,42) What appears to be happening is that the concurrent updatable_views test creates/deletes some triggers and trigger functions, and with bad timing luck that results in a cache lookup failure within pg_get_triggerdef(). This isn't a new problem (note that crake's failure is on v12 not HEAD) but it seems that it's gotten much more probable of late, probably because of unrelated test changes altering the timing of these tests. It's fairly annoying that the triggers view could be prone to failing due to concurrent DDL, but I don't see any near-term fix for that general problem --- as long as the ruleutils infrastructure relies on syscache lookups in any way, we're going to have some problems of that sort. But what's *really* annoying is that the view can fail due to DDL changes that aren't even on any of the tables being looked at. That's because the planner is failing to push down the WHERE condition, meaning that this isn't only a regression instability issue but a rather severe performance problem: there is no way for a query on the triggers view to not evaluate the entire view. The reason for the pushdown failure is that the triggers view uses a window function to compute its action_order output: rank() OVER (PARTITION BY n.oid, c.oid, em.num, t.tgtype & 1, t.tgtype & 66 ORDER BY t.tgname) and the planner can't push the restriction on relname below that for fear of changing the window function's results. It does know that restrictions on the PARTITION BY columns can be pushed down, but the query restriction is not on any of those columns. However, it's not like the query restriction is unrelated to that partition condition, it's just not quite the same thing. I experimented with - rank() OVER (PARTITION BY n.oid, c.oid, em.num, t.tgtype & 1, t.tgtype & 66 ORDER BY t.tgname) + rank() OVER (PARTITION BY n.nspname::sql_identifier, c.relname::sql_identifier, em.num, t.tgtype & 1, t.tgtype &66 ORDER BY t.tgname) so as to make the first two partitioning columns actually match the relevant view output columns, and lo, the planner now successfully pushes the table-name constraint down to the pg_class relation scan. (With a little more work, we could make the other partitioning columns also match view output columns exactly, but I'm not sure that any of those are worth messing with. Restrictions on the table and schema names seem like the only things likely to be interesting for performance.) This should not only fix the regression instability but offer a pretty significant speedup for real-world use of the triggers view, so I propose squeezing it in even though we're past feature freeze. However ... what about the back branches? As noted, we are seeing this now in the back branches, at least v12. Are we willing to make a change in the information_schema in back branches to make the instability go away? (And if not, how else could we fix it?) I note that this is actually a performance regression in the triggers view: before v11, when this rank() call was added, the planner had no problem pushing down restrictions on the table name. I think that a reasonable case could be made for back-patching this change as far as v11, though of course without a catversion bump. Thoughts? regards, tom lane
pgsql-hackers by date: