Re: Remove inner joins based on foreign keys - Mailing list pgsql-hackers

From: Alexandra Wang
Subject: Re: Remove inner joins based on foreign keys
Msg-id: CAK98qZ3ECQtS-QSa2cSHUKGkYjgOPWpfEGg7mJPVnqQ_5OdE0g@mail.gmail.com
In response to: Re: Remove inner joins based on foreign keys (Richard Guo <guofenglinux@gmail.com>)
List: pgsql-hackers
Hi Richard,

On Fri, May 1, 2026 at 1:25 AM Richard Guo <guofenglinux@gmail.com> wrote:
> Attached patch relaxes the check against ref_rel->joininfo.  Nothing
> else has changed.

Thanks for the patch! I've tested and reviewed v4. Overall, this looks
good to me. Please find individual feedback below.

On Wed, Apr 1, 2026 at 2:46 AM Richard Guo <guofenglinux@gmail.com> wrote:
> On Mon, Mar 23, 2026 at 3:12 PM Richard Guo <guofenglinux@gmail.com> wrote:
> > Just for the sake of archives, the timeline of the trap during a
> > cascading delete looks like this:
> >
> > T0: DELETE FROM users WHERE id = 1;
> >
> > T1: The executor finds users row 1 and sets its xmax, physically
> > marking it as dead.
> >
> > T2: [The Gap] The executor pushes the RI trigger into a queue to deal
> > with orders later.  Right now, the orders row still exists, but its
> > referenced row in users is dead.
> >
> > T3: The statement finishes, the trigger fires, and the orders row is
> > finally deleted.
> >
> > The "T2 Gap" is small, but there are several ways to execute user code
> > inside that window, such as RETURNING clauses, volatile functions, or
> > user-defined AFTER ROW triggers.
>
> I've been exploring ways to detect whether a query is executing within
> the RI trigger gap.  It seems one promising approach is to leverage
> the lock manager.
>
> Every DML statement acquires RowExclusiveLock on its target table
> before modifying any rows, so if the trigger gap is active, the lock
> is guaranteed to be held.  We can verify this during planning by
> calling CheckRelationOidLockedByMe(), which is a quite efficient
> check.  (As a point of precedent, the planner already relies on the
> local lock manager's state in several existing code paths.)
>
> For prepared statements and cached plans, a generic plan that was
> built with FK join removal could be reused in a context where the
> trigger gap is active.  To handle this, we can modify
> choose_custom_plan() to check whether any relation involved in an
> FK-based join removal currently holds RowExclusiveLock, and choose
> custom plan if so.
>
> The trade-off is false positives: because RowExclusiveLock persists
> for the entire transaction while the trigger gap is intra-statement,
> the optimization is also skipped after the DML completes but within
> the same transaction, after ROLLBACK TO a savepoint, or when
> RowExclusiveLock is held for other reasons (e.g., LOCK TABLE).  These
> seem like uncommon cases, and the query simply falls back to executing
> the join normally as it would without the optimization at all, so I
> think this is an acceptable trade-off.
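
To make those false-positive cases concrete, here is a sketch using the
users/orders schema from your timeline (the column name o.user_id is my
assumption, not from the patch):

```sql
BEGIN;
DELETE FROM users WHERE id = 1;   -- acquires RowExclusiveLock on users
-- The DELETE is finished and the trigger queue has drained, but the
-- lock is held until end of transaction, so the planner would
-- conservatively keep the join even though the FK invariant holds:
SELECT o.* FROM orders o JOIN users u ON o.user_id = u.id;
COMMIT;
-- After COMMIT the lock is released and FK join removal applies again.
```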

On Tue, Apr 28, 2026 at 3:09 AM Richard Guo <guofenglinux@gmail.com> wrote:
> On Wed, Apr 1, 2026 at 6:45 PM Richard Guo <guofenglinux@gmail.com> wrote:
> > Please see the v2 patch for the implementation details.
> >
> > I didn't find any mention of this approach in the 2014 thread.  I'd
> > appreciate any thoughts or feedback on this direction.
>
> Here is a new rebase over 20efbdffe.  It also tightens up some checks
> in inner_join_is_removable().
>
> I've reconsidered the trigger-gap issue and I think I now have a
> cleaner understanding.  A snapshot captured inside the trigger-gap
> window can outlive the window's closure -- for example via a STABLE
> function that inherits an older snapshot from its caller, as Tom
> pointed out back in 2015 [1].  A query executed against such a stale
> snapshot would still observe the inconsistency even after the trigger
> queue has drained.  So the predicate guarding the optimization has to
> remain positive at least as long as any in-transaction snapshot could
> still be referenced.
>
> I went through several alternatives (a per-statement counter
> incremented in ExecInitModifyTable, AfterTriggerPendingOnRel, etc.)
> and convinced myself that all of them go silent before the relevant
> snapshots are gone.  The lock-based predicate is still the best
> correct approach I can think of: RowExclusiveLock is released only at
> end of transaction, which trivially exceeds the lifetime of any
> in-transaction snapshot.  By the time the lock is released, every
> in-transaction snapshot has been released, so no stale gap-window
> snapshot can still be referenced.  Maybe the false positives are just
> the price we need to pay for that lifetime guarantee. 

Checking RowExclusiveLock makes sense to me.

One nitpick on wording: the RowExclusiveLock acquired at the beginning
of the DML command can be released early by ROLLBACK TO a savepoint
established before that command, so it doesn't strictly "exceed the
lifetime of any in-transaction snapshot." In practice this doesn't
matter, though, because the lock is guaranteed to outlive any snapshot
taken during the trigger gap of the DML command, which is the only
window where the FK invariant can be violated.
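
To spell out that savepoint case (same users/orders schema as in your
timeline; o.user_id is an assumed column name):

```sql
BEGIN;
SAVEPOINT s;
DELETE FROM users WHERE id = 1;   -- subtransaction takes RowExclusiveLock
ROLLBACK TO SAVEPOINT s;          -- aborts the subxact, releasing its locks
-- RowExclusiveLock is gone although the transaction is still open.  This
-- is harmless: the DELETE, its trigger queue, and any gap-window snapshot
-- were all discarded by the rollback, so nothing stale can be observed.
SELECT o.* FROM orders o JOIN users u ON o.user_id = u.id;
COMMIT;
```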

IIUC, as long as we ensure that the snapshot of the join statement --
whether it comes from an after-row trigger, a RETURNING clause, a
cursor, etc. -- is taken outside of the trigger gap, which is exactly
the window guarded by RowExclusiveLock, removing the join from its
plan should be safe.

So, setting aside the false positives you mentioned, which I think are
acceptable, I believe this approach is correct.
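
For the archives, here is one way to run such a join inside the gap, via
a user-defined AFTER ROW trigger (function and trigger names are mine):

```sql
-- Fires during the T2 gap of a cascading delete: the users row is
-- already dead, but the referencing orders rows are not yet deleted.
CREATE FUNCTION peek_gap() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    -- If the planner had removed this join, the count could wrongly
    -- include the soon-to-be-deleted orders rows; the RowExclusiveLock
    -- check forces the real join, so the result stays consistent.
    RAISE NOTICE 'joined rows: %',
        (SELECT count(*) FROM orders o JOIN users u ON o.user_id = u.id);
    RETURN OLD;
END $$;

CREATE TRIGGER peek_gap_trg AFTER DELETE ON users
    FOR EACH ROW EXECUTE FUNCTION peek_gap();
```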

I then tried to find edge cases similar to those from the 2014 thread
and found one bug involving TABLESAMPLE:

-- If the patch removes the join when the referenced table uses TABLESAMPLE,
-- it will return MORE rows than correct (all child rows instead of only
-- those whose parent appeared in the sample).

CREATE TABLE exp_parent (id int PRIMARY KEY, data text);
CREATE TABLE exp_child (
    id int PRIMARY KEY,
    parent_id int NOT NULL REFERENCES exp_parent(id)
);

INSERT INTO exp_parent SELECT g, 'row_' || g FROM generate_series(1, 100) g;
INSERT INTO exp_child SELECT g, g FROM generate_series(1, 100) g;
ANALYZE exp_parent;
ANALYZE exp_child;

-- This join should NOT be removed because TABLESAMPLE limits the parent rows
EXPLAIN (COSTS OFF)
SELECT c.id
FROM exp_child c
INNER JOIN exp_parent p TABLESAMPLE BERNOULLI(1) REPEATABLE(42) ON c.parent_id = p.id;

-- Verify: if join removal fires, this would return ~100 rows.
-- With the join, it should return ~1 row (1% sample).
SELECT count(*)
FROM exp_child c
INNER JOIN exp_parent p TABLESAMPLE BERNOULLI(1) REPEATABLE(42) ON c.parent_id = p.id;

We should add a condition to inner_join_is_removable() to disallow
join removal when the referenced relation carries a TABLESAMPLE clause
(i.e., when the referenced RTE's tablesample field is non-NULL).

Minor things I think might be worth mentioning, but I don't have
strong opinions:

1. Self-join removal has a GUC, enable_self_join_elimination. I didn't
investigate why it was needed, and I don't know in what cases people
would want to turn it off -- maybe for debugging? Do you think we need
a similar GUC for inner-join removal? Disclaimer: I'm not a fan of
unnecessary GUCs; I'm just putting the question out there.

2. Nitpick: should we consider renaming remove_useless_joins() to
remove_useless_left_joins() to reflect what it actually does?

Best,
Alex

--
Alexandra Wang
