Eliminating SPI from RI triggers - take 2 - Mailing list pgsql-hackers
From | Amit Langote |
---|---|
Subject | Eliminating SPI from RI triggers - take 2 |
Date | |
Msg-id | CA+HiwqG5e8pk8s7+7zhr1Nc_PGyhEdM5f=pHkMOdK1RYWXfJsg@mail.gmail.com Whole thread Raw |
Responses |
Re: Eliminating SPI from RI triggers - take 2
Re: Eliminating SPI from RI triggers - take 2 |
List | pgsql-hackers |
Hi, I had proposed $subject for some RI trigger functions in the last dev cycle [1]. Briefly, the proposal was to stop using an SQL query (using the SPI interface) for RI checks that could be done by directly scanning the primary/unique key index of the referenced table, which must always be there. While acknowledging that the patch showed a clear performance benefit, Tom gave the feedback that doing so only for some RI checks but not others is not very desirable [2]. The other cases include querying the referencing table when deleting from the referenced table to handle the referential action clause. Two main hurdles to not using an SQL query for those cases that I hadn't addressed were: 1) What should the hard-coded plan be? Referencing table may not always have an index on the queried foreign key columns. Even if there is one, it's not clear if scanning it is *always* better than scanning the whole table to find the matching rows. 2) While the RI check functions for RESTRICT and NO ACTION actions issue a `SELECT ... LIMIT 1` query, those for CASCADE and SET actions issue a `UPDATE SET / DELETE`. I had no good idea as to how much of the executor functionality would need to be replicated in order to perform the update/delete actions without leaving the ri_triggers.c module. We had an unconference session to discuss these concerns at this year's PGCon, whose minutes can be found at [3]. Among other suggestions, one was to only stop using the SPI interface to issue the RI check/action queries, while continuing to use the same SQL queries as now. That means creating a copy in ri_triggers.c of the functionality of SPI_prepare(), which creates the CachedPlanSource for the query, and of SPI_execute_plan(), which executes a CachedPlan obtained from that CachedPlanSource to produce the result tuples if any. That may not have the same performance boost as skipping the planner/plancache and the executor altogether, but at least it becomes easier to check the difference between semantic behaviors of an RI query implemented as SQL and another implemented using some hard-coded plan if we choose to do, because the logic would no longer be divided between ri_trigger.c and spi.c. I think that will, at least to some degree, alleviate the concerns that Tom expressed about the previous effort. So, I hacked together a patch (attached 0001) that invents an "RI plan" construct (struct RIPlan) to replace the use of an "SPI plan" (struct _SPI_plan). While the latter encapsulates the CachedPlanSource of an RI query directly, I decided to make it an option for a given RI trigger to specify whether it would like to have its RIPlan store CachedPlanSource if its check is still implemented as an SQL query or something else if the implementation will be a hard-coded plan. RIPlan contains callbacks to create, execute, validate, and free a plan that implements a given RI query. For example, an RI plan for checks implemented as SQL will call the callback ri_SqlStringPlanCreate() to parse the query and allocate a CachedPlanSource and ri_SqlStringPlanExecute() to a CachedPlan and executes it PlannedStmt using the executor interface directly. Remaining callbacks ri_SqlStringPlanIsValid() and ri_SqlStringPlanFree() use CachedPlanIsValid() and DropCachedPlan(), respectively, to validate and free a CachedPlan. With that in place, I decided to rebase my previous patch [1] to use this new interface and the result is attached 0002. One notable improvement over the previous standalone patch is that the snapshot setting logic need no longer be in function implementing the proposed hard-coded plan for RI check triggers. That logic and other configuration needed before executing the plan is now a part of the top-level ri_PerformCheck() function that is shared between various RI plan implementations. So whether an RI check or action is implemented using SQL plan or a hard-code plan, the execution should proceed with the effectively same config/environment. I will continue investigating what to do about points (1) and (2) mentioned above and see if we can do away with using SQL in the remaining cases. -- Thanks, Amit Langote EDB: http://www.enterprisedb.com [1] https://postgr.es/m/CA+HiwqGkfJfYdeq5vHPh6eqPKjSbfpDDY+j-kXYFePQedtSLeg@mail.gmail.com [2] https://postgr.es/m/3400437.1649363527%40sss.pgh.pa.us [3] https://wiki.postgresql.org/wiki/PgCon_2022_Developer_Unconference#Removing_SPI_from_RI_trigger_implementation
Attachment
pgsql-hackers by date: