Re: Simplify backend terminate and wait logic in postgres_fdw test - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: Simplify backend terminate and wait logic in postgres_fdw test
Date
Msg-id CALj2ACVGSQsq68y-LmyXKZzbNVgSgsiVKSzsrVXzVgnsZTN26Q@mail.gmail.com
Whole thread Raw
In response to Re: Simplify backend terminate and wait logic in postgres_fdw test  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Tue, May 4, 2021 at 9:08 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> writes:
> > On Tue, May 4, 2021 at 4:12 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> The buildfarm is showing that one of these test queries is not stable
> >> under CLOBBER_CACHE_ALWAYS:
>
> > I can reproduce the issue with the failing case. Issue is that the
> > backend pid will be null in the pg_stat_activity because of the cache
> > invalidation that happens at the beginning of the query and hence
> > pg_terminate_backend returns null on null input.
>
> No, that's nonsense: if it were happening that way, the query would
> return one row with a NULL result, but actually it's returning no
> rows.  What's actually happening, it seems, is that because
> pgfdw_inval_callback is constantly getting called due to cache
> flushes, we invariably drop remote connections immediately during
> transaction termination (cf pgfdw_xact_callback).  Thus, by the time
> we inspect pg_stat_activity, there is no remote session to terminate.
>
> I don't like your patch because what it effectively does is mask
> whether termination happened or not; if there were a bug there
> causing that not to happen, the test would still appear to pass.
>
> I think the most expedient fix, if we want to keep this test, is
> just to transiently disable debug_invalidate_system_caches_always.
> (That option wasn't available before v14, but fortunately we
> don't need a fix for the back branches.)
>
> I believe the attached will do the trick, but I'm running the test
> with debug_invalidate_system_caches_always turned on to verify
> that.  Should be done in an hour or so...

Thanks for pushing this change.

If debug_invalidate_system_caches_always is allowed to be used for
cache sensitive test cases, I see an opportunity to make the tests,
that are adjusted by commit f77717b29, more meaningful as they were
before the commit. That commit changed the way below functions show up
output in the tests:
SELECT 1 FROM postgres_fdw_disconnect_all();
SELECT server_name FROM postgres_fdw_get_connections() ORDER BY 1;

If okay, I can work on it (not for PG14 of course). It can be
discussed in a separate thread though.

Thoughts?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Peter Smith
Date:
Subject: Re: AlterSubscription_refresh "wrconn" wrong variable?
Next
From: Craig Ringer
Date:
Subject: [PATCH] Faster, easier valgrind runs with make USE_VALGRIND=1 check