On Thu, Sep 29, 2022 at 4:43 PM Amit Langote <amitlangote09@gmail.com> wrote:
> On Thu, Sep 29, 2022 at 1:46 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > Sorry about the delay.
> >
> > So I came up with such a patch that is attached as 0003.
> >
> > The main problem I want to fix with it is the need for RI_FKey_check()
> > to "force"-push the latest snapshot that the PartitionDesc code wants
> > to use to correctly include or omit a detach-pending partition from
> > the view of that function's RI query. Scribbling on ActiveSnapshot
> > that way means that *all* scans involved in the execution of that
> > query now see a snapshot that they shouldn't likely be seeing; a bug
> > resulting from this has been demonstrated in a test case added by the
> > commit 00cb86e75d.
> >
> > The fix is to make RI_FKey_check(), or really its RI_Plan's execution
> > function ri_LookupKeyInPkRel() added by patch 0002, pass the latest
> > snapshot explicitly as a parameter of PartitionDirectoryLookup(),
> > which passes it down to the PartitionDesc code. No need to manipulate
> > ActiveSnapshot. The actual fix is in patch 0004, which I extracted
> > out of 0002 to keep the latter a mere refactoring patch without any
> > semantic changes (though a bit more on that below). BTW, I don't know
> > of a way to back-patch a fix like this for the bug, because there is
> > no way other than ActiveSnapshot to pass the desired snapshot to the
> > PartitionDesc code if the only way we get to that code is by executing
> > an SQL query plan.
> >
> > 0003 moves the relevant logic out of
> > find_inheritance_children_extended() into its callers. The logic of
> > deciding which snapshot to use to determine if a detach-pending
> > partition should indeed be omitted from the consideration of a caller
> > based on the result of checking the visibility of the corresponding
> > pg_inherits row with the snapshot; it just uses ActiveSnapshot now.
> > Given the problems with using ActiveSnapshot mentioned above, I think
> > it is better to make the callers decide the snapshot and pass it using
> > a parameter named omit_detached_snapshot. Only PartitionDesc code
> > actually cares about sending anything but the parent query's
> > ActiveSnapshot, so the PartitionDesc and PartitionDirectory interface
> > has been changed to add the same omit_detached_snapshot parameter.
> > find_inheritance_children(), the other caller used in many sites that
> > look at a table's partitions, defaults to using ActiveSnapshot, which
> > does not seem problematic. Furthermore, only RI_FKey_check() needs to
> > pass anything other than ActiveSnapshot, so other users of
> > PartitionDesc, like user queries, still default to using the
> > ActiveSnapshot, which doesn't have any known problems either.
> >
> > 0001 and 0002 are mostly unchanged in this version, except I took out
> > the visibility bug-fix from 0002 into 0004 described above, which
> > looks better using the interface added by 0003 anyway. I need to
> > address the main concern that it's still hard to be sure that the
> > patch in its current form doesn't break any user-level semantics of
> > these RI check triggers and other concerns about the implementation
> > that Robert expressed in [1].
>
> Oops, I apparently posted the wrong 0004, containing a bug that
> crashes `make check`.
>
> Fixed version attached.
Here's another version that hopefully fixes the crash reported by
Cirrus CI [1] that is not reliably reproducible.
I suspect it may have to do with error_context_stack not being reset
when ri_LookupKeyInPkRel() does an early return; the `return false` in
that case was wrong too:
@@ -2693,7 +2693,7 @@ ri_LookupKeyInPkRel(struct RI_Plan *plan,
* looking for.
*/
if (leaf_pk_rel == NULL)
- return false;
+ goto done;
...
+done:
/*
* Pop the error context stack
*/
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
[1] https://cirrus-ci.com/task/4901906421121024 (permalink?)