Eliminating SPI from RI triggers - take 2 - Mailing list pgsql-hackers

From Amit Langote
Subject Eliminating SPI from RI triggers - take 2
Date
Msg-id CA+HiwqG5e8pk8s7+7zhr1Nc_PGyhEdM5f=pHkMOdK1RYWXfJsg@mail.gmail.com
Whole thread Raw
Responses Re: Eliminating SPI from RI triggers - take 2
Re: Eliminating SPI from RI triggers - take 2
List pgsql-hackers
Hi,

I had proposed $subject for some RI trigger functions in the last dev
cycle [1].  Briefly, the proposal was to stop using an SQL query
(using the SPI interface) for RI checks that could be done by directly
scanning the primary/unique key index of the referenced table, which
must always be there.  While acknowledging that the patch showed a
clear performance benefit, Tom gave the feedback that doing so only
for some RI checks but not others is not very desirable [2].

The other cases include querying the referencing table when deleting
from the referenced table to handle the referential action clause.
Two main hurdles to not using an SQL query for those cases that I
hadn't addressed were:

1) What should the hard-coded plan be?  Referencing table may not
always have an index on the queried foreign key columns.  Even if
there is one, it's not clear if scanning it is *always* better than
scanning the whole table to find the matching rows.

2) While the RI check functions for RESTRICT and NO ACTION actions
issue a `SELECT ... LIMIT 1` query, those for CASCADE and SET actions
issue a `UPDATE SET / DELETE`.  I had no good idea as to how much of
the executor functionality would need to be replicated in order to
perform the update/delete actions without leaving the ri_triggers.c
module.

We had an unconference session to discuss these concerns at this
year's PGCon, whose minutes can be found at [3].  Among other
suggestions, one was to only stop using the SPI interface to issue the
RI check/action queries, while continuing to use the same SQL queries
as now.  That means creating a copy in ri_triggers.c of the
functionality of SPI_prepare(), which creates the CachedPlanSource for
the query, and of SPI_execute_plan(), which executes a CachedPlan
obtained from that CachedPlanSource to produce the result tuples if
any.   That may not have the same performance boost as skipping the
planner/plancache and the executor altogether, but at least it becomes
easier to check the difference between semantic behaviors of an RI
query implemented as SQL and another implemented using some hard-coded
plan if we choose to do, because the logic would no longer be divided
between ri_trigger.c and spi.c.  I think that will, at least to some
degree, alleviate the concerns that Tom expressed about the previous
effort.

So, I hacked together a patch (attached 0001) that invents an "RI
plan" construct (struct RIPlan) to replace the use of an "SPI plan"
(struct _SPI_plan).  While the latter encapsulates the
CachedPlanSource of an RI query directly, I decided to make it an
option for a given RI trigger to specify whether it would like to have
its RIPlan store CachedPlanSource if its check is still implemented as
an SQL query or something else if the implementation will be a
hard-coded plan.  RIPlan contains callbacks to create, execute,
validate, and free a plan that implements a given RI query.  For
example, an RI plan for checks implemented as SQL will call the
callback ri_SqlStringPlanCreate() to parse the query and allocate a
CachedPlanSource and ri_SqlStringPlanExecute() to a CachedPlan and
executes it PlannedStmt using the executor interface directly.
Remaining callbacks ri_SqlStringPlanIsValid() and
ri_SqlStringPlanFree() use CachedPlanIsValid() and DropCachedPlan(),
respectively, to validate and free a CachedPlan.

With that in place, I decided to rebase my previous patch [1] to use
this new interface and the result is attached 0002.  One notable
improvement over the previous standalone patch is that the snapshot
setting logic need no longer be in function implementing the proposed
hard-coded plan for RI check triggers.  That logic and other
configuration needed before executing the plan is now a part of the
top-level ri_PerformCheck() function that is shared between various RI
plan implementations.  So whether an RI check or action is implemented
using SQL plan or a hard-code plan, the execution should proceed with
the effectively same config/environment.

I will continue investigating what to do about points (1) and (2)
mentioned above and see if we can do away with using SQL in the
remaining cases.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com

[1] https://postgr.es/m/CA+HiwqGkfJfYdeq5vHPh6eqPKjSbfpDDY+j-kXYFePQedtSLeg@mail.gmail.com

[2] https://postgr.es/m/3400437.1649363527%40sss.pgh.pa.us

[3] https://wiki.postgresql.org/wiki/PgCon_2022_Developer_Unconference#Removing_SPI_from_RI_trigger_implementation

Attachment

pgsql-hackers by date:

Previous
From: Dilip Kumar
Date:
Subject: Re: making relfilenodes 56 bits
Next
From: Fujii Masao
Date:
Subject: Re: Backup command and functions can cause assertion failure and segmentation fault