Re: Move backup-related code to xlogbackup.c/.h - Mailing list pgsql-hackers

From Alvaro Herrera
Subject Re: Move backup-related code to xlogbackup.c/.h
Date
Msg-id 20221014082441.adpqpgr75nt7yxs3@alvherre.pgsql
Whole thread Raw
In response to Re: Move backup-related code to xlogbackup.c/.h  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Responses Re: Move backup-related code to xlogbackup.c/.h  (Michael Paquier <michael@paquier.xyz>)
Re: Move backup-related code to xlogbackup.c/.h  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
List pgsql-hackers
On 2022-Oct-13, Bharath Rupireddy wrote:

> The pg_backup_start_callback() can just go ahead and reset
> sessionBackupState. However, it leads us to the complete removal of
> pg_backup_start_callback() itself and use do_pg_abort_backup()
> consistently across, saving 20 LOC attached as v5-0001.

OK, that's not bad -- but there is a fatal flaw here: do_pg_backup_start
only sets sessionBackupState *after* it has finished setting things up,
so if you only change it like this, do_pg_abort_backup will indeed run,
but it'll do nothing because it hits the "quick exit" test.  Therefore,
if a backup aborts while setting up, you'll keep running with forced
page writes until next postmaster crash or restart.  Not good.

ISTM we need to give another flag to the callback function besides
emit_warning: one that says whether to test sessionBackupState or not.
I suppose the easiest way to do it with no other changes is to turn
'arg' into a bitmask.
But alternatively, we could just remove emit_warning as a flag and have
the warning be emitted always; then we can use the boolean for the other
purpose.  I don't think the extra WARNING thrown during backup set-up is
going to be a problem, since it will mostly never be seen anyway (and if
you do see it, it's not a lie.)

However, what's most problematic about this patch is that it introduces
a pretty serious bug, yet that bug goes unnoticed if you just run the
builtin test suites.  I only noticed because I added an elog(ERROR,
"oops") in the area protected by ENSURE_ERROR_CLEANUP and a debug
elog(WARNING) in the cleanup area, then examined the server log after
the pg_basebackup test filed; but this is not very workable.  I wonder
what would be a good way to keep this in check.  The naive way seems to
be to run a pg_basebackup, have it abort partway through (how?), then
test the server and see if forced page writes are enabled or not.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"The problem with the facetime model is not just that it's demoralizing, but
that the people pretending to work interrupt the ones actually working."
                                                           (Paul Graham)



pgsql-hackers by date:

Previous
From: Andy Fan
Date:
Subject: Re: Question about pull_up_sublinks_qual_recurse
Next
From: Michael Paquier
Date:
Subject: Re: Move backup-related code to xlogbackup.c/.h