Re: Simplify backend terminate and wait logic in postgres_fdw test - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: Simplify backend terminate and wait logic in postgres_fdw test
Date
Msg-id CALj2ACV4qtU1m5ip++DQdDrgczTqgxqSN5xAQB42o02YhTD_ug@mail.gmail.com
Whole thread Raw
In response to Re: Simplify backend terminate and wait logic in postgres_fdw test  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Responses Re: Simplify backend terminate and wait logic in postgres_fdw test
List pgsql-hackers
On Thu, Apr 8, 2021 at 6:27 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Thu, Apr 8, 2021 at 5:28 PM Michael Paquier <michael@paquier.xyz> wrote:
> >
> > On Thu, Apr 08, 2021 at 04:55:22PM +0530, Bharath Rupireddy wrote:
> > > With the recent commit aaf0432572 which introduced a waiting/timeout
> > > capability for pg_teriminate_backend function, I would like to do
> > > $subject. Attaching a patch, please have a look.
> >
> > +-- Terminate the remote backend having the specified application_name and wait
> > +-- for the termination to complete. 10 seconds timeout here is chosen randomly,
> > +-- we will see a warning if the process doesn't go away within that time.
> > +SELECT pg_terminate_backend(pid, 10000) FROM pg_stat_activity
> > +    WHERE application_name = 'fdw_retry_check';
> >
> > I think that you are making the tests less stable by doing that.  A
> > couple of buildfarm machines are very slow, and 10 seconds would not
> > be enough.  So it seems to me that this patch is trading what is a
> > stable solution for a solution that may finish by randomly bite.
>
> Agree. Please see the attached patch, I removed a fixed waiting time.
> Instead of relying on pg_stat_activity, pg_sleep and
> pg_stat_clear_snapshot, now it depends on pg_terminate_backend and
> pg_wait_for_backend_termination. This way we could reduce the
> functions that the procedure terminate_backend_and_wait uses and also
> the new functions pg_terminate_backend and
> pg_wait_for_backend_termination gets a test coverage.
>
> Thoughts?

I realized that the usage like below in v2 patch is completely wrong,
because pg_terminate_backend without timeout will return true and the
loop exits without calling pg_wait_for_backend_terminatioin. Sorry for
not realizing this earlier.
    SELECT * INTO is_terminated FROM pg_terminate_backend(pid_v);
    LOOP
        EXIT WHEN is_terminated;
        SELECT * INTO is_terminated FROM
pg_wait_for_backend_termination(pid_v, 1000);
    END LOOP;

I feel that we can provide a high timeout value (It can be 1hr on the
similar lines of using pg_sleep(3600) for crash tests in
013_crash_restart.pl with the assumption that the backend running that
command will get killed with SIGQUIT) and make pg_terminate_backend
wait. This unreasonably high timeout looks okay because of the
assumption that the servers in the build farm will not take that much
time to remove the backend from the system processes, so the function
will return much earlier than that. If at all there's a server(which
is impractical IMO) that doesn't remove the backend process even
within 1hr, then that is anyways will fail with the warning.

With the attached patch, we could remove the procedure
terminate_backend_and_wait altogether. Thoughts?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Lots of incorrect comments in nodeFuncs.c
Next
From: "Daniel Westermann (DWE)"
Date:
Subject: Another small guc.c fix