Thread: Eliminating SPI from RI triggers - take 2

Eliminating SPI from RI triggers - take 2

From
Amit Langote
Date:
Hi,

I had proposed $subject for some RI trigger functions in the last dev
cycle [1].  Briefly, the proposal was to stop using an SQL query
(using the SPI interface) for RI checks that could be done by directly
scanning the primary/unique key index of the referenced table, which
must always be there.  While acknowledging that the patch showed a
clear performance benefit, Tom gave the feedback that doing so only
for some RI checks but not others is not very desirable [2].

The other cases include querying the referencing table when deleting
from the referenced table to handle the referential action clause.
Two main hurdles to not using an SQL query for those cases that I
hadn't addressed were:

1) What should the hard-coded plan be?  Referencing table may not
always have an index on the queried foreign key columns.  Even if
there is one, it's not clear if scanning it is *always* better than
scanning the whole table to find the matching rows.

2) While the RI check functions for RESTRICT and NO ACTION actions
issue a `SELECT ... LIMIT 1` query, those for CASCADE and SET actions
issue a `UPDATE SET / DELETE`.  I had no good idea as to how much of
the executor functionality would need to be replicated in order to
perform the update/delete actions without leaving the ri_triggers.c
module.

We had an unconference session to discuss these concerns at this
year's PGCon, whose minutes can be found at [3].  Among other
suggestions, one was to only stop using the SPI interface to issue the
RI check/action queries, while continuing to use the same SQL queries
as now.  That means creating a copy in ri_triggers.c of the
functionality of SPI_prepare(), which creates the CachedPlanSource for
the query, and of SPI_execute_plan(), which executes a CachedPlan
obtained from that CachedPlanSource to produce the result tuples if
any.   That may not have the same performance boost as skipping the
planner/plancache and the executor altogether, but at least it becomes
easier to check the difference between semantic behaviors of an RI
query implemented as SQL and another implemented using some hard-coded
plan if we choose to do, because the logic would no longer be divided
between ri_trigger.c and spi.c.  I think that will, at least to some
degree, alleviate the concerns that Tom expressed about the previous
effort.

So, I hacked together a patch (attached 0001) that invents an "RI
plan" construct (struct RIPlan) to replace the use of an "SPI plan"
(struct _SPI_plan).  While the latter encapsulates the
CachedPlanSource of an RI query directly, I decided to make it an
option for a given RI trigger to specify whether it would like to have
its RIPlan store CachedPlanSource if its check is still implemented as
an SQL query or something else if the implementation will be a
hard-coded plan.  RIPlan contains callbacks to create, execute,
validate, and free a plan that implements a given RI query.  For
example, an RI plan for checks implemented as SQL will call the
callback ri_SqlStringPlanCreate() to parse the query and allocate a
CachedPlanSource and ri_SqlStringPlanExecute() to a CachedPlan and
executes it PlannedStmt using the executor interface directly.
Remaining callbacks ri_SqlStringPlanIsValid() and
ri_SqlStringPlanFree() use CachedPlanIsValid() and DropCachedPlan(),
respectively, to validate and free a CachedPlan.

With that in place, I decided to rebase my previous patch [1] to use
this new interface and the result is attached 0002.  One notable
improvement over the previous standalone patch is that the snapshot
setting logic need no longer be in function implementing the proposed
hard-coded plan for RI check triggers.  That logic and other
configuration needed before executing the plan is now a part of the
top-level ri_PerformCheck() function that is shared between various RI
plan implementations.  So whether an RI check or action is implemented
using SQL plan or a hard-code plan, the execution should proceed with
the effectively same config/environment.

I will continue investigating what to do about points (1) and (2)
mentioned above and see if we can do away with using SQL in the
remaining cases.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com

[1] https://postgr.es/m/CA+HiwqGkfJfYdeq5vHPh6eqPKjSbfpDDY+j-kXYFePQedtSLeg@mail.gmail.com

[2] https://postgr.es/m/3400437.1649363527%40sss.pgh.pa.us

[3] https://wiki.postgresql.org/wiki/PgCon_2022_Developer_Unconference#Removing_SPI_from_RI_trigger_implementation

Attachment

Re: Eliminating SPI from RI triggers - take 2

From
Jacob Champion
Date:
On Thu, Jun 30, 2022 at 11:23 PM Amit Langote <amitlangote09@gmail.com> wrote:
>
> I will continue investigating what to do about points (1) and (2)
> mentioned above and see if we can do away with using SQL in the
> remaining cases.

Hi Amit, looks like isolation tests are failing in cfbot:

    https://cirrus-ci.com/task/6642884727275520

Note also the uninitialized variable warning that cfbot picked up;
that may or may not be related.

Thanks,
--Jacob



Re: Eliminating SPI from RI triggers - take 2

From
Amit Langote
Date:
On Wed, Jul 6, 2022 at 3:24 AM Jacob Champion <jchampion@timescale.com> wrote:
> On Thu, Jun 30, 2022 at 11:23 PM Amit Langote <amitlangote09@gmail.com> wrote:
> >
> > I will continue investigating what to do about points (1) and (2)
> > mentioned above and see if we can do away with using SQL in the
> > remaining cases.
>
> Hi Amit, looks like isolation tests are failing in cfbot:
>
>     https://cirrus-ci.com/task/6642884727275520
>
> Note also the uninitialized variable warning that cfbot picked up;
> that may or may not be related.

Thanks for the heads up.

Yeah, I noticed the warning when I compiled with a different set of
gcc parameters, though not the isolation test failures, so not sure
what the bot is running into.

Attaching updated patches which fix the warning and a few other issues
I noticed.


--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com

Attachment

Re: Eliminating SPI from RI triggers - take 2

From
Amit Langote
Date:
On Wed, Jul 6, 2022 at 11:55 AM Amit Langote <amitlangote09@gmail.com> wrote:
> On Wed, Jul 6, 2022 at 3:24 AM Jacob Champion <jchampion@timescale.com> wrote:
> > On Thu, Jun 30, 2022 at 11:23 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > >
> > > I will continue investigating what to do about points (1) and (2)
> > > mentioned above and see if we can do away with using SQL in the
> > > remaining cases.
> >
> > Hi Amit, looks like isolation tests are failing in cfbot:
> >
> >     https://cirrus-ci.com/task/6642884727275520
> >
> > Note also the uninitialized variable warning that cfbot picked up;
> > that may or may not be related.
>
> Thanks for the heads up.
>
> Yeah, I noticed the warning when I compiled with a different set of
> gcc parameters, though not the isolation test failures, so not sure
> what the bot is running into.
>
> Attaching updated patches which fix the warning and a few other issues
> I noticed.

Hmm, cfbot is telling me that detach-partition-concurrently-2 is
failing on Cirrus-CI [1].  Will look into it.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com

[1] https://cirrus-ci.com/task/5253369525698560?logs=test_world#L317



Re: Eliminating SPI from RI triggers - take 2

From
Robert Haas
Date:
On Fri, Jul 1, 2022 at 2:23 AM Amit Langote <amitlangote09@gmail.com> wrote:
> So, I hacked together a patch (attached 0001) that invents an "RI
> plan" construct (struct RIPlan) to replace the use of an "SPI plan"
> (struct _SPI_plan).
>
> With that in place, I decided to rebase my previous patch [1] to use
> this new interface and the result is attached 0002.

I think inventing something like RIPlan is probably reasonable, but
I'm not sure how much it really does to address the objections that
were raised previously. How do we know that ri_LookupKeyInPkRel does
all the same things that executing a plan would have done? I see that
function contains permission-checking logic, for example, as well as
snapshot-related logic, and maybe there are other subsystems to worry
about, like rules or triggers or row-level security. Maybe there's no
answer to that problem other than careful manual verification, because
after all the only way to be 100% certain we're doing all the things
that would happen if you executed a plan is to execute a plan, which
kind of defeats the point of the whole thing. All I'm saying is that
I'm not sure that this refactoring in and of itself addresses that
concern.

As far as 0002 goes, the part I'm most skeptical about is this:

+static bool
+ri_LookupKeyInPkRelPlanIsValid(RI_Plan *plan)
+{
+ /* Never store anything that can be invalidated. */
+ return true;
+}

Isn't that leaving rather a lot on the table? ri_LookupKeyInPkRel is
going to be called a lot of times and do a lot of things over and over
again that maybe only need to be done once, like checking permissions
and looking up the operators to use and reopening the index. And all
the stuff ExecGetLeafPartitionForKey does too, yikes that's a lot of
stuff. Now maybe that's what Tom wants, I don't know. Certainly, the
existing SQL-based implementation is going to do that stuff on every
call, too; I'm just not sure that's a good thing. I think there's some
debate to be had here over what behavior we need to preserve exactly
vs. what we can and should change. For instance, it seems clear to me
that leaving out permissions checks altogether would be not OK, but if
this implementation arranged to cache the results of a permission
check and the SQL-based implementations don't, is that OK? Maybe Tom
would argue that it isn't, because he considers that a part of the
user-visible behavior, but I'm not sure that's the right view of it. I
think what we're promising the user is that we will check permissions,
not that we're going to do it separately for every trigger firing, or
even that every kind of trigger is going to do it exactly the same
number of times as every other trigger. I think we need some input
from Tom (and perhaps others) on how rigidly we need to maintain the
high-level behavior here before we can really say much about whether
the implementation is as good as it can be.

I suspect, though, that there's more that can be done here in terms of
sharing code. For instance, picking on the permissions checking logic,
presumably that's something that every non-SQL implementation would
need to do. But the rest of what's in ri_LookupKeyInPkRel() is
specific to one particular kind of trigger. If we had multiple non-SQL
trigger types, we'd want to somehow have common logic for permissions
checking for all of them.

I also suspect that we ought to have a separation between planning and
execution even for non-SQL based things. You don't really have that
here. What that ought to look like, though, depends on the answers to
the questions above, about how exactly we think we need to reproduce
the existing behavior.

I find my ego slightly wounded by the comment that "the partition
descriptor machinery has a hack that assumes that the queries
originating in this module push the latest snapshot in the
transaction-snapshot mode." It's true that the partition descriptor
machinery gives different answers depending on the active snapshot,
but, err, is that a hack, or just a perfectly reasonable design
decision? An alternative might be for PartitionDirectoryLookup to take
a snapshot as an explicit argument rather than relying on the global
variable to get that information from context. I generally feel that
we rely too much on global variables where we should be passing around
explicit parameters, so if you're just arguing that explicit
parameters would be better here, then I agree and just didn't think of
it. If you're arguing that making the answer depend on the snapshot is
itself a bad idea, I don't agree with that.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Eliminating SPI from RI triggers - take 2

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> ...  I think there's some
> debate to be had here over what behavior we need to preserve exactly
> vs. what we can and should change.

For sure.  For example, people occasionally complain because
user-defined triggers can defeat RI integrity checks.  Should we
change that?  I dunno, but if we're not using the standard executor
then there's at least some room to consider it.  I think people would
be upset if we stopped firing user triggers at all; but if triggers
couldn't defeat RI actions short of throwing a transaction-aborting
error, I believe a lot of people would consider that an improvement.

> For instance, it seems clear to me
> that leaving out permissions checks altogether would be not OK, but if
> this implementation arranged to cache the results of a permission
> check and the SQL-based implementations don't, is that OK? Maybe Tom
> would argue that it isn't, because he considers that a part of the
> user-visible behavior, but I'm not sure that's the right view of it.

Uh ... if such caching behavior is at all competently implemented,
it will be transparent because the cache will notice and respond to
events that should change its outputs.  So I don't foresee a semantic
problem there.  It may well be that it's practical to cache
permissions-check info for RI checks when it isn't for more general
queries, so looking into ideas like that seems well within scope here.
(Or then again, maybe we should be building a more general permissions
cache?)

I'm too tired to have more than that to say right now, but I agree
that there is room for discussion about exactly what behavior we
want to preserve.

            regards, tom lane



Re: Eliminating SPI from RI triggers - take 2

From
Robert Haas
Date:
On Fri, Jul 8, 2022 at 10:07 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Uh ... if such caching behavior is at all competently implemented,
> it will be transparent because the cache will notice and respond to
> events that should change its outputs.

Well, that assumes that we emit appropriate invalidations in every
place where permissions are updated, and take appropriate locks every
place where they are checked. I think that the first one might be too
optimistic, and the second one is definitely too optimistic. For
instance, consider pg_proc_ownercheck. There's no lock of any kind
taken on the function here, and at least in typical cases, I don't
think the caller takes one either. Compare the extensive tap-dancing
around locking and permissions checking in RangeVarGetRelidExtended
against the blithe unconcern in FuncnameGetCandidates.

I believe that of all the types of SQL objects in the system, only
relations have anything like proper interlocking against concurrent
DDL. Other examples of not caring at all include LookupCollation() and
LookupTypeNameExtended(). There's just no heavyweight locking here at
all, and so no invalidation based on sinval messages can ever be
reliable.

GRANT and REVOKE don't take proper locks, either, even on tables:

rhaas=# begin;
BEGIN
rhaas=*# lock table pgbench_accounts;
LOCK TABLE
rhaas=*#

Then, in another session:

rhaas=# create role foo;
CREATE ROLE
rhaas=# grant select on pgbench_accounts to foo;
GRANT
rhaas=#

Executing "SELECT * FROM pgbench_accounts" in the other session would
have blocked, but the GRANT has no problem at all.

I don't see that any of this is this patch's job to fix. If nobody's
cared enough to fix it any time in the past 20 years, or just didn't
want to pay the locking cost, well then we probably don't need to do
it now either. But I think it means that even the slightest change in
the timing or frequency of permissions checks is in theory a
user-visible change, because there are no grounds for assuming that
the permissions on any of the objects involved aren't changing while
the query is executing.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Eliminating SPI from RI triggers - take 2

From
Amit Langote
Date:
On Sat, Jul 9, 2022 at 1:15 AM Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Jul 1, 2022 at 2:23 AM Amit Langote <amitlangote09@gmail.com> wrote:
> > So, I hacked together a patch (attached 0001) that invents an "RI
> > plan" construct (struct RIPlan) to replace the use of an "SPI plan"
> > (struct _SPI_plan).
> >
> > With that in place, I decided to rebase my previous patch [1] to use
> > this new interface and the result is attached 0002.
>

Thanks for taking a look at this.  I'll try to respond to other points
in a separate email, but I wanted to clarify something about below:

> I find my ego slightly wounded by the comment that "the partition
> descriptor machinery has a hack that assumes that the queries
> originating in this module push the latest snapshot in the
> transaction-snapshot mode." It's true that the partition descriptor
> machinery gives different answers depending on the active snapshot,
> but, err, is that a hack, or just a perfectly reasonable design
> decision?

I think my calling it a hack of "partition descriptor machinery" is
not entirely fair (sorry), because it's talking about the following
comment in find_inheritance_children_extended(), which describes it as
being a hack, so I mentioned the word "hack" in my comment too:

        /*
         * Cope with partitions concurrently being detached.  When we see a
         * partition marked "detach pending", we omit it from the returned set
         * of visible partitions if caller requested that and the tuple's xmin
         * does not appear in progress to the active snapshot.  (If there's no
         * active snapshot set, that means we're not running a user query, so
         * it's OK to always include detached partitions in that case; if the
         * xmin is still running to the active snapshot, then the partition
         * has not been detached yet and so we include it.)
         *
         * The reason for this hack is that we want to avoid seeing the
         * partition as alive in RI queries during REPEATABLE READ or
         * SERIALIZABLE transactions: such queries use a different snapshot
         * than the one used by regular (user) queries.
         */

That bit came in to make DETACH CONCURRENTLY produce sane answers for
RI queries in some cases.

I guess my comment should really have said something like:

HACK: find_inheritance_children_extended() has a hack that assumes
that the queries originating in this module push the latest snapshot
in transaction-snapshot mode.

> An alternative might be for PartitionDirectoryLookup to take
> a snapshot as an explicit argument rather than relying on the global
> variable to get that information from context. I generally feel that
> we rely too much on global variables where we should be passing around
> explicit parameters, so if you're just arguing that explicit
> parameters would be better here, then I agree and just didn't think of
> it. If you're arguing that making the answer depend on the snapshot is
> itself a bad idea, I don't agree with that.

No, I'm not arguing that using a snapshot there is wrong and haven't
really thought hard about an alternative.

I tend to agree passing a snapshot explicitly might be better than
using ActiveSnapshot stuff for this.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com



Re: Eliminating SPI from RI triggers - take 2

From
Amit Langote
Date:
On Wed, Jul 13, 2022 at 8:59 PM Amit Langote <amitlangote09@gmail.com> wrote:
> On Sat, Jul 9, 2022 at 1:15 AM Robert Haas <robertmhaas@gmail.com> wrote:
> Thanks for taking a look at this.  I'll try to respond to other points
> in a separate email, but I wanted to clarify something about below:
>
> > I find my ego slightly wounded by the comment that "the partition
> > descriptor machinery has a hack that assumes that the queries
> > originating in this module push the latest snapshot in the
> > transaction-snapshot mode." It's true that the partition descriptor
> > machinery gives different answers depending on the active snapshot,
> > but, err, is that a hack, or just a perfectly reasonable design
> > decision?
>
> I think my calling it a hack of "partition descriptor machinery" is
> not entirely fair (sorry), because it's talking about the following
> comment in find_inheritance_children_extended(), which describes it as
> being a hack, so I mentioned the word "hack" in my comment too:
>
>         /*
>          * Cope with partitions concurrently being detached.  When we see a
>          * partition marked "detach pending", we omit it from the returned set
>          * of visible partitions if caller requested that and the tuple's xmin
>          * does not appear in progress to the active snapshot.  (If there's no
>          * active snapshot set, that means we're not running a user query, so
>          * it's OK to always include detached partitions in that case; if the
>          * xmin is still running to the active snapshot, then the partition
>          * has not been detached yet and so we include it.)
>          *
>          * The reason for this hack is that we want to avoid seeing the
>          * partition as alive in RI queries during REPEATABLE READ or
>          * SERIALIZABLE transactions: such queries use a different snapshot
>          * than the one used by regular (user) queries.
>          */
>
> That bit came in to make DETACH CONCURRENTLY produce sane answers for
> RI queries in some cases.
>
> I guess my comment should really have said something like:
>
> HACK: find_inheritance_children_extended() has a hack that assumes
> that the queries originating in this module push the latest snapshot
> in transaction-snapshot mode.

Posting a new version with this bit fixed; cfbot complained that 0002
needed a rebase over 3592e0ff98.

I will try to come up with a patch to enhance the PartitionDirectory
interface to allow passing the snapshot to use when scanning
pg_inherits explicitly, so we won't need the above "hack".

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com

Attachment

Re: Eliminating SPI from RI triggers - take 2

From
Amit Langote
Date:
On Thu, Aug 4, 2022 at 1:05 PM Amit Langote <amitlangote09@gmail.com> wrote:
> On Wed, Jul 13, 2022 at 8:59 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > That bit came in to make DETACH CONCURRENTLY produce sane answers for
> > RI queries in some cases.
> >
> > I guess my comment should really have said something like:
> >
> > HACK: find_inheritance_children_extended() has a hack that assumes
> > that the queries originating in this module push the latest snapshot
> > in transaction-snapshot mode.
>
> Posting a new version with this bit fixed; cfbot complained that 0002
> needed a rebase over 3592e0ff98.
>
> I will try to come up with a patch to enhance the PartitionDirectory
> interface to allow passing the snapshot to use when scanning
> pg_inherits explicitly, so we won't need the above "hack".

Sorry about the delay.

So I came up with such a patch that is attached as 0003.

The main problem I want to fix with it is the need for RI_FKey_check()
to "force"-push the latest snapshot that the PartitionDesc code wants
to use to correctly include or omit a detach-pending partition from
the view of that function's RI query.  Scribbling on ActiveSnapshot
that way means that *all* scans involved in the execution of that
query now see a snapshot that they shouldn't likely be seeing; a bug
resulting from this has been demonstrated in a test case added by the
commit 00cb86e75d.

The fix is to make RI_FKey_check(), or really its RI_Plan's execution
function ri_LookupKeyInPkRel() added by patch 0002, pass the latest
snapshot explicitly as a parameter of PartitionDirectoryLookup(),
which passes it down to the PartitionDesc code.  No need to manipulate
ActiveSnapshot.  The actual fix is in patch 0004, which I extracted
out of 0002 to keep the latter a mere refactoring patch without any
semantic changes (though a bit more on that below).  BTW, I don't know
of a way to back-patch a fix like this for the bug, because there is
no way other than ActiveSnapshot to pass the desired snapshot to the
PartitionDesc code if the only way we get to that code is by executing
an SQL query plan.

0003 moves the relevant logic out of
find_inheritance_children_extended() into its callers.  The logic of
deciding which snapshot to use to determine if a detach-pending
partition should indeed be omitted from the consideration of a caller
based on the result of checking the visibility of the corresponding
pg_inherits row with the snapshot; it just uses ActiveSnapshot now.
Given the problems with using ActiveSnapshot mentioned above, I think
it is better to make the callers decide the snapshot and pass it using
a parameter named omit_detached_snapshot.  Only PartitionDesc code
actually cares about sending anything but the parent query's
ActiveSnapshot, so the PartitionDesc and PartitionDirectory interface
has been changed to add the same omit_detached_snapshot parameter.
find_inheritance_children(), the other caller used in many sites that
look at a table's partitions, defaults to using ActiveSnapshot, which
does not seem problematic.  Furthermore, only RI_FKey_check() needs to
pass anything other than ActiveSnapshot, so other users of
PartitionDesc, like user queries, still default to using the
ActiveSnapshot, which doesn't have any known problems either.

0001 and 0002 are mostly unchanged in this version, except I took out
the visibility bug-fix from 0002 into 0004 described above, which
looks better using the interface added by 0003 anyway.  I need to
address the main concern that it's still hard to be sure that the
patch in its current form doesn't break any user-level semantics of
these RI check triggers and other concerns about the implementation
that Robert expressed in [1].

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com

[1]
https://www.postgresql.org/message-id/CA%2BTgmoaiTNj4DgQy42OT9JmTTP1NWcMV%2Bke0i%3D%2Ba7%3DVgnzqGXw%40mail.gmail.com

Attachment

Re: Eliminating SPI from RI triggers - take 2

From
Amit Langote
Date:
On Thu, Sep 29, 2022 at 1:46 PM Amit Langote <amitlangote09@gmail.com> wrote:
> Sorry about the delay.
>
> So I came up with such a patch that is attached as 0003.
>
> The main problem I want to fix with it is the need for RI_FKey_check()
> to "force"-push the latest snapshot that the PartitionDesc code wants
> to use to correctly include or omit a detach-pending partition from
> the view of that function's RI query.  Scribbling on ActiveSnapshot
> that way means that *all* scans involved in the execution of that
> query now see a snapshot that they shouldn't likely be seeing; a bug
> resulting from this has been demonstrated in a test case added by the
> commit 00cb86e75d.
>
> The fix is to make RI_FKey_check(), or really its RI_Plan's execution
> function ri_LookupKeyInPkRel() added by patch 0002, pass the latest
> snapshot explicitly as a parameter of PartitionDirectoryLookup(),
> which passes it down to the PartitionDesc code.  No need to manipulate
> ActiveSnapshot.  The actual fix is in patch 0004, which I extracted
> out of 0002 to keep the latter a mere refactoring patch without any
> semantic changes (though a bit more on that below).  BTW, I don't know
> of a way to back-patch a fix like this for the bug, because there is
> no way other than ActiveSnapshot to pass the desired snapshot to the
> PartitionDesc code if the only way we get to that code is by executing
> an SQL query plan.
>
> 0003 moves the relevant logic out of
> find_inheritance_children_extended() into its callers.  The logic of
> deciding which snapshot to use to determine if a detach-pending
> partition should indeed be omitted from the consideration of a caller
> based on the result of checking the visibility of the corresponding
> pg_inherits row with the snapshot; it just uses ActiveSnapshot now.
> Given the problems with using ActiveSnapshot mentioned above, I think
> it is better to make the callers decide the snapshot and pass it using
> a parameter named omit_detached_snapshot.  Only PartitionDesc code
> actually cares about sending anything but the parent query's
> ActiveSnapshot, so the PartitionDesc and PartitionDirectory interface
> has been changed to add the same omit_detached_snapshot parameter.
> find_inheritance_children(), the other caller used in many sites that
> look at a table's partitions, defaults to using ActiveSnapshot, which
> does not seem problematic.  Furthermore, only RI_FKey_check() needs to
> pass anything other than ActiveSnapshot, so other users of
> PartitionDesc, like user queries, still default to using the
> ActiveSnapshot, which doesn't have any known problems either.
>
> 0001 and 0002 are mostly unchanged in this version, except I took out
> the visibility bug-fix from 0002 into 0004 described above, which
> looks better using the interface added by 0003 anyway.  I need to
> address the main concern that it's still hard to be sure that the
> patch in its current form doesn't break any user-level semantics of
> these RI check triggers and other concerns about the implementation
> that Robert expressed in [1].

Oops, I apparently posted the wrong 0004, containing a bug that
crashes `make check`.

Fixed version attached.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com

Attachment

Re: Eliminating SPI from RI triggers - take 2

From
Amit Langote
Date:
On Thu, Sep 29, 2022 at 4:43 PM Amit Langote <amitlangote09@gmail.com> wrote:
> On Thu, Sep 29, 2022 at 1:46 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > Sorry about the delay.
> >
> > So I came up with such a patch that is attached as 0003.
> >
> > The main problem I want to fix with it is the need for RI_FKey_check()
> > to "force"-push the latest snapshot that the PartitionDesc code wants
> > to use to correctly include or omit a detach-pending partition from
> > the view of that function's RI query.  Scribbling on ActiveSnapshot
> > that way means that *all* scans involved in the execution of that
> > query now see a snapshot that they shouldn't likely be seeing; a bug
> > resulting from this has been demonstrated in a test case added by the
> > commit 00cb86e75d.
> >
> > The fix is to make RI_FKey_check(), or really its RI_Plan's execution
> > function ri_LookupKeyInPkRel() added by patch 0002, pass the latest
> > snapshot explicitly as a parameter of PartitionDirectoryLookup(),
> > which passes it down to the PartitionDesc code.  No need to manipulate
> > ActiveSnapshot.  The actual fix is in patch 0004, which I extracted
> > out of 0002 to keep the latter a mere refactoring patch without any
> > semantic changes (though a bit more on that below).  BTW, I don't know
> > of a way to back-patch a fix like this for the bug, because there is
> > no way other than ActiveSnapshot to pass the desired snapshot to the
> > PartitionDesc code if the only way we get to that code is by executing
> > an SQL query plan.
> >
> > 0003 moves the relevant logic out of
> > find_inheritance_children_extended() into its callers.  The logic of
> > deciding which snapshot to use to determine if a detach-pending
> > partition should indeed be omitted from the consideration of a caller
> > based on the result of checking the visibility of the corresponding
> > pg_inherits row with the snapshot; it just uses ActiveSnapshot now.
> > Given the problems with using ActiveSnapshot mentioned above, I think
> > it is better to make the callers decide the snapshot and pass it using
> > a parameter named omit_detached_snapshot.  Only PartitionDesc code
> > actually cares about sending anything but the parent query's
> > ActiveSnapshot, so the PartitionDesc and PartitionDirectory interface
> > has been changed to add the same omit_detached_snapshot parameter.
> > find_inheritance_children(), the other caller used in many sites that
> > look at a table's partitions, defaults to using ActiveSnapshot, which
> > does not seem problematic.  Furthermore, only RI_FKey_check() needs to
> > pass anything other than ActiveSnapshot, so other users of
> > PartitionDesc, like user queries, still default to using the
> > ActiveSnapshot, which doesn't have any known problems either.
> >
> > 0001 and 0002 are mostly unchanged in this version, except I took out
> > the visibility bug-fix from 0002 into 0004 described above, which
> > looks better using the interface added by 0003 anyway.  I need to
> > address the main concern that it's still hard to be sure that the
> > patch in its current form doesn't break any user-level semantics of
> > these RI check triggers and other concerns about the implementation
> > that Robert expressed in [1].
>
> Oops, I apparently posted the wrong 0004, containing a bug that
> crashes `make check`.
>
> Fixed version attached.

Here's another version that hopefully fixes the crash reported by
Cirrus CI [1] that is not reliably reproducible.

I suspect it may have to do with error_context_stack not being reset
when ri_LookupKeyInPkRel() does an early return; the `return false` in
that case was wrong too:

@@ -2693,7 +2693,7 @@ ri_LookupKeyInPkRel(struct RI_Plan *plan,
         * looking for.
         */
        if (leaf_pk_rel == NULL)
-           return false;
+           goto done;

...
+done:
    /*
     * Pop the error context stack
     */

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com

[1] https://cirrus-ci.com/task/4901906421121024 (permalink?)

Attachment

Re: Eliminating SPI from RI triggers - take 2

From
Amit Langote
Date:
On Thu, Sep 29, 2022 at 6:09 PM Amit Langote <amitlangote09@gmail.com> wrote:
> On Thu, Sep 29, 2022 at 4:43 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > On Thu, Sep 29, 2022 at 1:46 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > > Sorry about the delay.
> > >
> > > So I came up with such a patch that is attached as 0003.
> > >
> > > The main problem I want to fix with it is the need for RI_FKey_check()
> > > to "force"-push the latest snapshot that the PartitionDesc code wants
> > > to use to correctly include or omit a detach-pending partition from
> > > the view of that function's RI query.  Scribbling on ActiveSnapshot
> > > that way means that *all* scans involved in the execution of that
> > > query now see a snapshot that they shouldn't likely be seeing; a bug
> > > resulting from this has been demonstrated in a test case added by the
> > > commit 00cb86e75d.
> > >
> > > The fix is to make RI_FKey_check(), or really its RI_Plan's execution
> > > function ri_LookupKeyInPkRel() added by patch 0002, pass the latest
> > > snapshot explicitly as a parameter of PartitionDirectoryLookup(),
> > > which passes it down to the PartitionDesc code.  No need to manipulate
> > > ActiveSnapshot.  The actual fix is in patch 0004, which I extracted
> > > out of 0002 to keep the latter a mere refactoring patch without any
> > > semantic changes (though a bit more on that below).  BTW, I don't know
> > > of a way to back-patch a fix like this for the bug, because there is
> > > no way other than ActiveSnapshot to pass the desired snapshot to the
> > > PartitionDesc code if the only way we get to that code is by executing
> > > an SQL query plan.
> > >
> > > 0003 moves the relevant logic out of
> > > find_inheritance_children_extended() into its callers.  The logic of
> > > deciding which snapshot to use to determine if a detach-pending
> > > partition should indeed be omitted from the consideration of a caller
> > > based on the result of checking the visibility of the corresponding
> > > pg_inherits row with the snapshot; it just uses ActiveSnapshot now.
> > > Given the problems with using ActiveSnapshot mentioned above, I think
> > > it is better to make the callers decide the snapshot and pass it using
> > > a parameter named omit_detached_snapshot.  Only PartitionDesc code
> > > actually cares about sending anything but the parent query's
> > > ActiveSnapshot, so the PartitionDesc and PartitionDirectory interface
> > > has been changed to add the same omit_detached_snapshot parameter.
> > > find_inheritance_children(), the other caller used in many sites that
> > > look at a table's partitions, defaults to using ActiveSnapshot, which
> > > does not seem problematic.  Furthermore, only RI_FKey_check() needs to
> > > pass anything other than ActiveSnapshot, so other users of
> > > PartitionDesc, like user queries, still default to using the
> > > ActiveSnapshot, which doesn't have any known problems either.
> > >
> > > 0001 and 0002 are mostly unchanged in this version, except I took out
> > > the visibility bug-fix from 0002 into 0004 described above, which
> > > looks better using the interface added by 0003 anyway.  I need to
> > > address the main concern that it's still hard to be sure that the
> > > patch in its current form doesn't break any user-level semantics of
> > > these RI check triggers and other concerns about the implementation
> > > that Robert expressed in [1].
> >
> > Oops, I apparently posted the wrong 0004, containing a bug that
> > crashes `make check`.
> >
> > Fixed version attached.
>
> Here's another version that hopefully fixes the crash reported by
> Cirrus CI [1] that is not reliably reproducible.

And cfbot #1, which failed a bit after the above one, is not happy
with my failing to include utils/snapshot.h in a partdesc.h to which I
added:

@@ -65,9 +66,11 @@ typedef struct PartitionDescData


 extern PartitionDesc RelationGetPartitionDesc(Relation rel, bool
omit_detached);
+extern PartitionDesc RelationGetPartitionDescExt(Relation rel, bool
omit_detached,
+                                                Snapshot
omit_detached_snapshot);

 extern PartitionDirectory CreatePartitionDirectory(MemoryContext
mcxt, bool omit_detached);
-extern PartitionDesc PartitionDirectoryLookup(PartitionDirectory, Relation);
+extern PartitionDesc PartitionDirectoryLookup(PartitionDirectory,
Relation, Snapshot);

So, here's a final revision for today.  Sorry for the noise.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com

Attachment

Re: Eliminating SPI from RI triggers - take 2

From
Andres Freund
Date:
Hi,

On 2022-09-29 18:18:10 +0900, Amit Langote wrote:
> So, here's a final revision for today.  Sorry for the noise.

This appears to fail on 32bit systems. Seems the new test is indeed
worthwhile...

https://cirrus-ci.com/task/6581521615159296?logs=test_world_32#L406

[19:12:24.452] Summary of Failures:
[19:12:24.452]
[19:12:24.452]   2/243 postgresql:main / main/regress                                            FAIL           45.08s
(exitstatus 1)
 
[19:12:24.452]   4/243 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade                         ERROR          71.96s
[19:12:24.452]  32/243 postgresql:recovery / recovery/027_stream_regress                         ERROR          45.84s

Unfortunately ccf36ea2580f66abbc37f27d8c296861ffaad9bf seems to not have
suceeded in capture the test files of the 32bit build (and perhaps broke it
for 64bit builds as well?), so I can't see the regression.diffs contents.


[19:12:24.387]      alter_table                  ... FAILED     4546 ms
...
[19:12:24.387] ========================
[19:12:24.387]  1 of 211 tests failed.
[19:12:24.387] ========================
[19:12:24.387]
...


Greetings,

Andres Freund



Re: Eliminating SPI from RI triggers - take 2

From
Andres Freund
Date:
Hi,

On 2022-10-01 18:21:15 -0700, Andres Freund wrote:
> On 2022-09-29 18:18:10 +0900, Amit Langote wrote:
> > So, here's a final revision for today.  Sorry for the noise.
>
> This appears to fail on 32bit systems. Seems the new test is indeed
> worthwhile...
>
> https://cirrus-ci.com/task/6581521615159296?logs=test_world_32#L406
>
> [19:12:24.452] Summary of Failures:
> [19:12:24.452]
> [19:12:24.452]   2/243 postgresql:main / main/regress                                            FAIL
45.08s(exit status 1)
 
> [19:12:24.452]   4/243 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade                         ERROR
71.96s
> [19:12:24.452]  32/243 postgresql:recovery / recovery/027_stream_regress                         ERROR
45.84s
>
> Unfortunately ccf36ea2580f66abbc37f27d8c296861ffaad9bf seems to not have
> suceeded in capture the test files of the 32bit build (and perhaps broke it
> for 64bit builds as well?), so I can't see the regression.diffs contents.

Oh, that appears to have been an issue on the CI side (*), while uploading the
logs. The previous run did catch the error:

diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/alter_table.out
/tmp/cirrus-ci-build/build-32/testrun/main/regress/results/alter_table.out
--- /tmp/cirrus-ci-build/src/test/regress/expected/alter_table.out    2022-09-30 15:05:49.930613669 +0000
+++ /tmp/cirrus-ci-build/build-32/testrun/main/regress/results/alter_table.out    2022-09-30 15:11:21.050383258 +0000
@@ -672,6 +672,8 @@
 ALTER TABLE FKTABLE ADD FOREIGN KEY(ftest1) references pktable;
 -- Check it actually works
 INSERT INTO FKTABLE VALUES(42);        -- should succeed
+ERROR:  insert or update on table "fktable" violates foreign key constraint "fktable_ftest1_fkey"
+DETAIL:  Key (ftest1)=(42) is not present in table "pktable".
 INSERT INTO FKTABLE VALUES(43);        -- should fail
 ERROR:  insert or update on table "fktable" violates foreign key constraint "fktable_ftest1_fkey"
 DETAIL:  Key (ftest1)=(43) is not present in table "pktable".

Greetings,

Andres Freund

* Error from upload stream: rpc error: code = Unknown desc =



Re: Eliminating SPI from RI triggers - take 2

From
Amit Langote
Date:
On Sun, Oct 2, 2022 at 10:24 AM Andres Freund <andres@anarazel.de> wrote:
> On 2022-10-01 18:21:15 -0700, Andres Freund wrote:
> > On 2022-09-29 18:18:10 +0900, Amit Langote wrote:
> > > So, here's a final revision for today.  Sorry for the noise.
> >
> > This appears to fail on 32bit systems. Seems the new test is indeed
> > worthwhile...
> >
> > https://cirrus-ci.com/task/6581521615159296?logs=test_world_32#L406
> >
> > [19:12:24.452] Summary of Failures:
> > [19:12:24.452]
> > [19:12:24.452]   2/243 postgresql:main / main/regress                                            FAIL
45.08s(exit status 1)
 
> > [19:12:24.452]   4/243 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade                         ERROR
71.96s
> > [19:12:24.452]  32/243 postgresql:recovery / recovery/027_stream_regress                         ERROR
45.84s
> >
> > Unfortunately ccf36ea2580f66abbc37f27d8c296861ffaad9bf seems to not have
> > suceeded in capture the test files of the 32bit build (and perhaps broke it
> > for 64bit builds as well?), so I can't see the regression.diffs contents.
>
> Oh, that appears to have been an issue on the CI side (*), while uploading the
> logs. The previous run did catch the error:
>
> diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/alter_table.out
/tmp/cirrus-ci-build/build-32/testrun/main/regress/results/alter_table.out
> --- /tmp/cirrus-ci-build/src/test/regress/expected/alter_table.out      2022-09-30 15:05:49.930613669 +0000
> +++ /tmp/cirrus-ci-build/build-32/testrun/main/regress/results/alter_table.out  2022-09-30 15:11:21.050383258 +0000
> @@ -672,6 +672,8 @@
>  ALTER TABLE FKTABLE ADD FOREIGN KEY(ftest1) references pktable;
>  -- Check it actually works
>  INSERT INTO FKTABLE VALUES(42);                -- should succeed
> +ERROR:  insert or update on table "fktable" violates foreign key constraint "fktable_ftest1_fkey"
> +DETAIL:  Key (ftest1)=(42) is not present in table "pktable".
>  INSERT INTO FKTABLE VALUES(43);                -- should fail
>  ERROR:  insert or update on table "fktable" violates foreign key constraint "fktable_ftest1_fkey"
>  DETAIL:  Key (ftest1)=(43) is not present in table "pktable".

Thanks for the heads up.  Hmm, this I am not sure how to reproduce on
my own, so I am currently left with second-guessing what may be going
wrong on 32 bit machines with whichever of the 4 patches.

For now, I'll just post 0001, which I am claiming has no semantic
changes (proof pending), to rule out that that one's responsible.


--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com

Attachment

Re: Eliminating SPI from RI triggers - take 2

From
Amit Langote
Date:
On Fri, Oct 7, 2022 at 6:26 PM Amit Langote <amitlangote09@gmail.com> wrote:
> On Sun, Oct 2, 2022 at 10:24 AM Andres Freund <andres@anarazel.de> wrote:
> > On 2022-10-01 18:21:15 -0700, Andres Freund wrote:
> > > On 2022-09-29 18:18:10 +0900, Amit Langote wrote:
> > > > So, here's a final revision for today.  Sorry for the noise.
> > >
> > > This appears to fail on 32bit systems. Seems the new test is indeed
> > > worthwhile...
> > >
> > > https://cirrus-ci.com/task/6581521615159296?logs=test_world_32#L406
> > >
> > > [19:12:24.452] Summary of Failures:
> > > [19:12:24.452]
> > > [19:12:24.452]   2/243 postgresql:main / main/regress                                            FAIL
45.08s(exit status 1)
 
> > > [19:12:24.452]   4/243 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade                         ERROR
71.96s
> > > [19:12:24.452]  32/243 postgresql:recovery / recovery/027_stream_regress                         ERROR
45.84s
> > >
> > > Unfortunately ccf36ea2580f66abbc37f27d8c296861ffaad9bf seems to not have
> > > suceeded in capture the test files of the 32bit build (and perhaps broke it
> > > for 64bit builds as well?), so I can't see the regression.diffs contents.
> >
> > Oh, that appears to have been an issue on the CI side (*), while uploading the
> > logs. The previous run did catch the error:
> >
> > diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/alter_table.out
/tmp/cirrus-ci-build/build-32/testrun/main/regress/results/alter_table.out
> > --- /tmp/cirrus-ci-build/src/test/regress/expected/alter_table.out      2022-09-30 15:05:49.930613669 +0000
> > +++ /tmp/cirrus-ci-build/build-32/testrun/main/regress/results/alter_table.out  2022-09-30 15:11:21.050383258
+0000
> > @@ -672,6 +672,8 @@
> >  ALTER TABLE FKTABLE ADD FOREIGN KEY(ftest1) references pktable;
> >  -- Check it actually works
> >  INSERT INTO FKTABLE VALUES(42);                -- should succeed
> > +ERROR:  insert or update on table "fktable" violates foreign key constraint "fktable_ftest1_fkey"
> > +DETAIL:  Key (ftest1)=(42) is not present in table "pktable".
> >  INSERT INTO FKTABLE VALUES(43);                -- should fail
> >  ERROR:  insert or update on table "fktable" violates foreign key constraint "fktable_ftest1_fkey"
> >  DETAIL:  Key (ftest1)=(43) is not present in table "pktable".
>
> Thanks for the heads up.  Hmm, this I am not sure how to reproduce on
> my own, so I am currently left with second-guessing what may be going
> wrong on 32 bit machines with whichever of the 4 patches.
>
> For now, I'll just post 0001, which I am claiming has no semantic
> changes (proof pending), to rule out that that one's responsible.

Nope, not 0001.  Here's 0001+0002.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com



Re: Eliminating SPI from RI triggers - take 2

From
Alvaro Herrera
Date:
On 2022-Oct-07, Amit Langote wrote:

> > Thanks for the heads up.  Hmm, this I am not sure how to reproduce on
> > my own, so I am currently left with second-guessing what may be going
> > wrong on 32 bit machines with whichever of the 4 patches.
> >
> > For now, I'll just post 0001, which I am claiming has no semantic
> > changes (proof pending), to rule out that that one's responsible.
> 
> Nope, not 0001.  Here's 0001+0002.

Please note that you can set up a github repository so that cirrus-ci
tests whatever patches you like, without having to post them to
pg-hackers.  See src/tools/ci/README, it takes three minutes if you
already have the account and repository.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/



Re: Eliminating SPI from RI triggers - take 2

From
Amit Langote
Date:
On Fri, Oct 7, 2022 at 19:15 Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2022-Oct-07, Amit Langote wrote:

> > Thanks for the heads up.  Hmm, this I am not sure how to reproduce on
> > my own, so I am currently left with second-guessing what may be going
> > wrong on 32 bit machines with whichever of the 4 patches.
> >
> > For now, I'll just post 0001, which I am claiming has no semantic
> > changes (proof pending), to rule out that that one's responsible.
>
> Nope, not 0001.  Here's 0001+0002.

Please note that you can set up a github repository so that cirrus-ci
tests whatever patches you like, without having to post them to
pg-hackers.  See src/tools/ci/README, it takes three minutes if you
already have the account and repository.

Ah, that’s right.  Will do so, thanks for the suggestion.
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com

Re: Eliminating SPI from RI triggers - take 2

From
Amit Langote
Date:
On Fri, Oct 7, 2022 at 7:17 PM Amit Langote <amitlangote09@gmail.com> wrote:
> On Fri, Oct 7, 2022 at 19:15 Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>> On 2022-Oct-07, Amit Langote wrote:
>> > > Thanks for the heads up.  Hmm, this I am not sure how to reproduce on
>> > > my own, so I am currently left with second-guessing what may be going
>> > > wrong on 32 bit machines with whichever of the 4 patches.
>> > >
>> > > For now, I'll just post 0001, which I am claiming has no semantic
>> > > changes (proof pending), to rule out that that one's responsible.
>> >
>> > Nope, not 0001.  Here's 0001+0002.

I had forgotten to actually attach anything with that email.

>> Please note that you can set up a github repository so that cirrus-ci
>> tests whatever patches you like, without having to post them to
>> pg-hackers.  See src/tools/ci/README, it takes three minutes if you
>> already have the account and repository.
>
> Ah, that’s right.  Will do so, thanks for the suggestion.

I'm waiting to hear from GitHub Support to resolve an error I'm facing
trying to add Cirrus CI to my account.


--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com

Attachment

Re: Eliminating SPI from RI triggers - take 2

From
Robert Haas
Date:
On Thu, Sep 29, 2022 at 12:47 AM Amit Langote <amitlangote09@gmail.com> wrote:
> [ patches ]

While looking over this thread I came across this code:

    /* For data reading, executor always omits detached partitions */
    if (estate->es_partition_directory == NULL)
        estate->es_partition_directory =
            CreatePartitionDirectory(estate->es_query_cxt, false);

But CreatePartitionDirectory is declared like this:

extern PartitionDirectory CreatePartitionDirectory(MemoryContext mcxt,
bool omit_detached);

So the comment seems to say the opposite of what the code does. The
code seems to match the explanation in the commit message for
71f4c8c6f74ba021e55d35b1128d22fb8c6e1629, so I am guessing that
perhaps s/always/never/ is needed here.

I also noticed that ExecCreatePartitionPruneState no longer exists in
the code but is still referenced in
src/test/modules/delay_execution/specs/partition-addition.spec

Regarding 0003, it seems unfortunate that
find_inheritance_children_extended() will now have 6 arguments 4 of
which have to do with detached partition handling. That is a lot of
detached partition handling, and it's hard to reason about. I don't
see an obvious way of simplifying things very much, but I wonder if we
could at least have the new omit_detached_snapshot snapshot replace
the existing bool omit_detached flag. Something like the attached
incremental patch.

Probably we need to go further than the attached, though. I don't
think that PartitionDirectoryLookup() should be getting any new
arguments. The whole point of that function is that it's supposed to
ensure that the returned value is stable, and the comments say so. But
with these changes it isn't any more, because it depends on the
snapshot you pass. It seems fine to specify when you create the
partition directory that you want it to show a different, still-stable
view of the world, but as written, it seems to me to undermine the
idea that the return value is expected to be stable at all. Is there a
way we can avoid that?

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Attachment

Re: Eliminating SPI from RI triggers - take 2

From
Amit Langote
Date:
On Wed, Oct 12, 2022 at 2:27 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, Sep 29, 2022 at 12:47 AM Amit Langote <amitlangote09@gmail.com> wrote:
> > [ patches ]
>
> While looking over this thread I came across this code:

Thanks for looking.

>     /* For data reading, executor always omits detached partitions */
>     if (estate->es_partition_directory == NULL)
>         estate->es_partition_directory =
>             CreatePartitionDirectory(estate->es_query_cxt, false);
>
> But CreatePartitionDirectory is declared like this:
>
> extern PartitionDirectory CreatePartitionDirectory(MemoryContext mcxt,
> bool omit_detached);
>
> So the comment seems to say the opposite of what the code does. The
> code seems to match the explanation in the commit message for
> 71f4c8c6f74ba021e55d35b1128d22fb8c6e1629, so I am guessing that
> perhaps s/always/never/ is needed here.

I think you are right.  In commit 8aba9322511 that fixed a bug in this
area, we have this hunk:

-   /* Executor must always include detached partitions */
+   /* For data reading, executor always omits detached partitions */
    if (estate->es_partition_directory == NULL)
        estate->es_partition_directory =
-           CreatePartitionDirectory(estate->es_query_cxt, true);
+           CreatePartitionDirectory(estate->es_query_cxt, false);

The same commit also renamed the include_detached parameter of
CreatePartitionDirectory() to omit_detached but the comment change
didn't quite match with that.

I will fix this and other related comments to be consistent about
using the word "omit".  Will include them in the updated 0003.

> I also noticed that ExecCreatePartitionPruneState no longer exists in
> the code but is still referenced in
> src/test/modules/delay_execution/specs/partition-addition.spec

It looks like we missed that reference in commit 297daa9d435 wherein
we renamed it to just CreatePartitionPruneState().

I have posted a patch to fix this.

> Regarding 0003, it seems unfortunate that
> find_inheritance_children_extended() will now have 6 arguments 4 of
> which have to do with detached partition handling. That is a lot of
> detached partition handling, and it's hard to reason about. I don't
> see an obvious way of simplifying things very much, but I wonder if we
> could at least have the new omit_detached_snapshot snapshot replace
> the existing bool omit_detached flag. Something like the attached
> incremental patch.

Yeah, I was wondering the same too and don't see a reason why we
couldn't do it that way.

I have merged your incremental patch into 0003.

> Probably we need to go further than the attached, though. I don't
> think that PartitionDirectoryLookup() should be getting any new
> arguments. The whole point of that function is that it's supposed to
> ensure that the returned value is stable, and the comments say so. But
> with these changes it isn't any more, because it depends on the
> snapshot you pass. It seems fine to specify when you create the
> partition directory that you want it to show a different, still-stable
> view of the world, but as written, it seems to me to undermine the
> idea that the return value is expected to be stable at all. Is there a
> way we can avoid that?

Ok, I think it makes sense to have CreatePartitionDirectory take in
the snapshot and store it in PartitionDirectoryData for use during
each subsequent PartitionDirectoryLookup().  So we'll be replacing the
current omit_detached flag in PartitionDirectoryData, just as we are
doing for the interface functions.  Done that way in 0003.

Regarding 0002, which introduces ri_LookupKeyInPkRel(), I realized
that it may have been initializing the ScanKeys wrongly.  It was using
ScanKeyInit(), which uses InvalidOid for sk_subtype, causing the index
AM / btree code to use the wrong comparison functions when PK and FK
column types don't match.  That may have been a reason for 32-bit
machine failures pointed out by Andres upthread.  I've fixed it by
using ScanKeyEntryInitialize() to pass the opfamily-specified right
argument (FK column) type OID.

Attached updated patches.

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com

Attachment

Re: Eliminating SPI from RI triggers - take 2

From
Robert Haas
Date:
On Sat, Oct 15, 2022 at 1:47 AM Amit Langote <amitlangote09@gmail.com> wrote:
> I have merged your incremental patch into 0003.

Note that if someone goes to commit 0003, they would have no idea that
I contributed to the effort. You should probably try to keep a running
list of co-authors, reviewers, or other people that need to be
acknowledged in your draft commit messages. On that note, I think that
the commit messages for 0001 and to some extent 0002 need some more
work. In particular, it seems like the commit message for 0001 is
entirely concerned with what the patch does and says nothing about why
it's a good idea. In my opinion, a good commit message needs to do
both, ideally but not always in less space than this patch takes to do
only one of those things. 0002 has the same problem to a lesser
degree, since it is perhaps not so hard to infer that the reason for
avoiding the SQL query is performance.

I am wondering if the ordering for this patch series needs to be
rethought. The commit message for 0004 reads as if it is fixing a bug
introduced by earlier patches in the series. If that is not correct,
maybe it can be made clearer. If it is correct, then that's not good,
because we don't want to commit buggy patches and then make follow-up
commits to remove the bugs. If a planned commit needs new
infrastructure to avoid being buggy, the commits adding that
infrastructure should happen first.

But I think the bigger problem for this patch set is that the
design-level feedback from
https://www.postgresql.org/message-id/CA%2BTgmoaiTNj4DgQy42OT9JmTTP1NWcMV%2Bke0i%3D%2Ba7%3DVgnzqGXw%40mail.gmail.com
hasn't really been addressed, AFAICS. ri_LookupKeyInPkRelPlanIsValid
is still trivial in v7, and that still seems wrong to me. And I still
don't know how we're going to avoid changing the semantics in ways
that are undesirable, or even knowing precisely what we did change. If
we don't have answers to those questions, then I suspect that this
patch set isn't going anywhere.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Eliminating SPI from RI triggers - take 2

From
Andres Freund
Date:
Hi,

On 2022-10-15 14:47:05 +0900, Amit Langote wrote:
> Attached updated patches.

These started to fail to build recently:

[04:43:33.046] ccache cc -Isrc/backend/postgres_lib.a.p -Isrc/include -I../src/include -Isrc/include/storage
-Isrc/include/utils-Isrc/include/catalog -Isrc/include/nodes -fdiagnostics-color=always -pipe -D_FILE_OFFSET_BITS=64
-Wall-Winvalid-pch -g -fno-strict-aliasing -fwrapv -fexcess-precision=standard -D_GNU_SOURCE -Wmissing-prototypes
-Wpointer-arith-Werror=vla -Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type
-Wshadow=compatible-local-Wformat-security -Wdeclaration-after-statement -Wno-format-truncation
-Wno-stringop-truncation-fPIC -pthread -DBUILDING_DLL -MD -MQ src/backend/postgres_lib.a.p/executor_execPartition.c.o
-MFsrc/backend/postgres_lib.a.p/executor_execPartition.c.o.d -o src/backend/postgres_lib.a.p/executor_execPartition.c.o
-c../src/backend/executor/execPartition.c
 
[04:43:33.046] ../src/backend/executor/execPartition.c: In function ‘ExecGetLeafPartitionForKey’:
[04:43:33.046] ../src/backend/executor/execPartition.c:1679:19: error: too few arguments to function
‘build_attrmap_by_name_if_req’
[04:43:33.046]  1679 |    AttrMap *map = build_attrmap_by_name_if_req(RelationGetDescr(root_rel),
[04:43:33.046]       |                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
[04:43:33.046] In file included from ../src/include/access/tupconvert.h:17,
[04:43:33.046]                  from ../src/include/nodes/execnodes.h:32,
[04:43:33.046]                  from ../src/include/executor/execPartition.h:16,
[04:43:33.046]                  from ../src/backend/executor/execPartition.c:21:
[04:43:33.046] ../src/include/access/attmap.h:47:17: note: declared here
[04:43:33.046]    47 | extern AttrMap *build_attrmap_by_name_if_req(TupleDesc indesc,
[04:43:33.046]       |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~


Regards,

Andres



Re: Eliminating SPI from RI triggers - take 2

From
"Gregory Stark (as CFM)"
Date:
On Mon, 17 Oct 2022 at 14:59, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Sat, Oct 15, 2022 at 1:47 AM Amit Langote <amitlangote09@gmail.com> wrote:
>
> But I think the bigger problem for this patch set is that the
> design-level feedback from
> https://www.postgresql.org/message-id/CA%2BTgmoaiTNj4DgQy42OT9JmTTP1NWcMV%2Bke0i%3D%2Ba7%3DVgnzqGXw%40mail.gmail.com
> hasn't really been addressed, AFAICS. ri_LookupKeyInPkRelPlanIsValid
> is still trivial in v7, and that still seems wrong to me. And I still
> don't know how we're going to avoid changing the semantics in ways
> that are undesirable, or even knowing precisely what we did change. If
> we don't have answers to those questions, then I suspect that this
> patch set isn't going anywhere.

Amit, do you plan to work on this patch for this commitfest (and
therefore this release?). And do you think it has a realistic chance
of being ready for commit this month?

It looks to me like you have some good feedback and can progress and
are unlikely to finish this patch for this release. In which case
maybe we can move it forward to the next release?

-- 
Gregory Stark
As Commitfest Manager



Re: Eliminating SPI from RI triggers - take 2

From
Amit Langote
Date:
Hi Greg,

On Tue, Mar 21, 2023 at 3:54 AM Gregory Stark (as CFM)
<stark.cfm@gmail.com> wrote:
> On Mon, 17 Oct 2022 at 14:59, Robert Haas <robertmhaas@gmail.com> wrote:
> > But I think the bigger problem for this patch set is that the
> > design-level feedback from
> >
https://www.postgresql.org/message-id/CA%2BTgmoaiTNj4DgQy42OT9JmTTP1NWcMV%2Bke0i%3D%2Ba7%3DVgnzqGXw%40mail.gmail.com
> > hasn't really been addressed, AFAICS. ri_LookupKeyInPkRelPlanIsValid
> > is still trivial in v7, and that still seems wrong to me. And I still
> > don't know how we're going to avoid changing the semantics in ways
> > that are undesirable, or even knowing precisely what we did change. If
> > we don't have answers to those questions, then I suspect that this
> > patch set isn't going anywhere.
>
> Amit, do you plan to work on this patch for this commitfest (and
> therefore this release?). And do you think it has a realistic chance
> of being ready for commit this month?

Unfortunately, I don't think so.

> It looks to me like you have some good feedback and can progress and
> are unlikely to finish this patch for this release. In which case
> maybe we can move it forward to the next release?

Yes, that's what I am thinking too at this point.

I agree with Robert's point that changing the implementation from an
SQL query plan to a hand-rolled C function is going to change the
semantics in some known and perhaps many unknown ways.  Until I have
enumerated all those semantic changes, it's hard to judge whether the
hand-rolled implementation is correct to begin with.  I had started
doing that a few months back but couldn't keep up due to some other
work.

An example I had found of a thing that would be broken by taking out
the executor out of the equation, as the patch does, is the behavior
of an update under READ COMMITTED isolation, whereby a PK tuple being
checked for existence is concurrently updated and thus needs to
rechecked whether it still satisfies the RI query's conditions.  The
executor has the EvalPlanQual() mechanism to do that, but while the
hand-rolled implementation did refactor ExecLockRows() to allow doing
the tuple-locking without a PlanState, it gave no consideration to
handling rechecking under READ COMMITTED isolation.

There may be other such things and I think I'd better look for them
carefully in the next cycle than in the next couple of weeks for this
release.  My apologies that I didn't withdraw the patch sooner.

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com



Re: Eliminating SPI from RI triggers - take 2

From
Daniel Gustafsson
Date:
> On 21 Mar 2023, at 06:03, Amit Langote <amitlangote09@gmail.com> wrote:
> On Tue, Mar 21, 2023 at 3:54 AM Gregory Stark (as CFM) <stark.cfm@gmail.com> wrote:
>> On Mon, 17 Oct 2022 at 14:59, Robert Haas <robertmhaas@gmail.com> wrote:

>>> But I think the bigger problem for this patch set is that the
>>> design-level feedback from
>>>
https://www.postgresql.org/message-id/CA%2BTgmoaiTNj4DgQy42OT9JmTTP1NWcMV%2Bke0i%3D%2Ba7%3DVgnzqGXw%40mail.gmail.com
>>> hasn't really been addressed, AFAICS. ri_LookupKeyInPkRelPlanIsValid
>>> is still trivial in v7, and that still seems wrong to me. And I still
>>> don't know how we're going to avoid changing the semantics in ways
>>> that are undesirable, or even knowing precisely what we did change. If
>>> we don't have answers to those questions, then I suspect that this
>>> patch set isn't going anywhere.
>>
>> Amit, do you plan to work on this patch for this commitfest (and
>> therefore this release?). And do you think it has a realistic chance
>> of being ready for commit this month?
>
> Unfortunately, I don't think so.

This thread has stalled with the patch not building and/or applying for a
while, so I am going to mark this Returned with Feebdback.  Please feel free to
resubmit to a future CF when there is renewed interest/time to work on this.

--
Daniel Gustafsson




Re: Eliminating SPI from RI triggers - take 2

From
Amit Langote
Date:
On Mon, Jul 10, 2023 at 5:27 PM Daniel Gustafsson <daniel@yesql.se> wrote:
> > On 21 Mar 2023, at 06:03, Amit Langote <amitlangote09@gmail.com> wrote:
> > On Tue, Mar 21, 2023 at 3:54 AM Gregory Stark (as CFM) <stark.cfm@gmail.com> wrote:
> >> On Mon, 17 Oct 2022 at 14:59, Robert Haas <robertmhaas@gmail.com> wrote:
>
> >>> But I think the bigger problem for this patch set is that the
> >>> design-level feedback from
> >>>
https://www.postgresql.org/message-id/CA%2BTgmoaiTNj4DgQy42OT9JmTTP1NWcMV%2Bke0i%3D%2Ba7%3DVgnzqGXw%40mail.gmail.com
> >>> hasn't really been addressed, AFAICS. ri_LookupKeyInPkRelPlanIsValid
> >>> is still trivial in v7, and that still seems wrong to me. And I still
> >>> don't know how we're going to avoid changing the semantics in ways
> >>> that are undesirable, or even knowing precisely what we did change. If
> >>> we don't have answers to those questions, then I suspect that this
> >>> patch set isn't going anywhere.
> >>
> >> Amit, do you plan to work on this patch for this commitfest (and
> >> therefore this release?). And do you think it has a realistic chance
> >> of being ready for commit this month?
> >
> > Unfortunately, I don't think so.
>
> This thread has stalled with the patch not building and/or applying for a
> while, so I am going to mark this Returned with Feebdback.

Agreed, I was about to do so myself.

I'll give this another try later in the cycle.

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com