Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file - Mailing list pgsql-hackers

From Nathan Bossart
Subject Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
Date
Msg-id 20220329020948.GA311150@nathanxps13
Whole thread Raw
In response to Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file  (Magnus Hagander <magnus@hagander.net>)
List pgsql-hackers
On Mon, Mar 28, 2022 at 04:30:27PM -0400, Stephen Frost wrote:
>> -     By default, <function>pg_start_backup</function> can take a long time to finish.
>> +     By default, <function>pg_backup_start</function> can take a long time to finish.
>>       This is because it performs a checkpoint, and the I/O
>>       required for the checkpoint will be spread out over a significant
>>       period of time, by default half your inter-checkpoint interval
> 
> Hrmpf.  Not this patch's fault, but the default isn't 'half your
> inter-checkpoint interval' anymore (checkpoint_completion_target was
> changed to 0.9 ... let's not discuss who did that and missed fixing
> this).  Overall though, maybe we should reword this to avoid having to
> remember to change it again if we ever change the completion target
> again?  Also it seems to imply that pg_backup_start is going to run its
> own independent checkpoint or something, which isn't the typical case.
> I generally explain what is going on here like this:
> 
> By default, <function>pg_backup_start</function> will wait for the next
> checkpoint to complete (see ref: checkpoint_timeout ... maybe also
> wal-configuration.html).

done

>> -     The <function>pg_stop_backup</function> will return one row with three
>> +     The <function>pg_backup_stop</function> will return one row with three
>>       values. The second of these fields should be written to a file named
>>       <filename>backup_label</filename> in the root directory of the backup. The
>>       third field should be written to a file named
> 
> I get that we have <function> and </function>, but those are just
> formatting and don't show up in the actual text, and so it ends up
> being:
> 
> The pg_backup_stop will return
> 
> Which doesn't sound quite right to me.  I'd say we either drop 'The' or
> add 'function' after.  I realize that this patch is mostly doing a
> s/pg_stop_backup/pg_backup_stop/, but, still.

done

>> -        Finishes performing an exclusive or non-exclusive on-line backup.
>> -        The <parameter>exclusive</parameter> parameter must match the
>> -        previous <function>pg_start_backup</function> call.
>> -        In an exclusive backup, <function>pg_stop_backup</function> removes
>> -        the backup label file and, if it exists, the tablespace map file
>> -        created by <function>pg_start_backup</function>.  In a non-exclusive
>> -        backup, the desired contents of these files are returned as part of
>> +        Finishes performing an on-line backup.  The desired contents of the
>> +        backup label file and the tablespace map file are returned as part of
>>          the result of the function, and should be written to files in the
>>          backup area (not in the data directory).
>>         </para>
> 
> Given that this is a crucial part, which the exclusive mode has wrong,
> I'd be a bit more forceful about it, eg:
> 
> The contents of the backup label file and the tablespace map file must
> be stored as part of the backup and must NOT be stored in the live data
> directory (doing so will cause PostgreSQL to fail to restart in the
> event of a crash).

done

>>          The result of the function is a single record.
>>          The <parameter>lsn</parameter> column holds the backup's ending
>>          write-ahead log location (which again can be ignored).  The second and
>> -        third columns are <literal>NULL</literal> when ending an exclusive
>> -        backup; after a non-exclusive backup they hold the desired contents of
>> +        third columns hold the desired contents of
>>          the label and tablespace map files.
>>         </para>
>>         <para>
> 
> I dislike saying 'desired' here as if it's something that we're nicely
> asking but maybe isn't a big deal, and just saying 'label' isn't good
> since we call it 'backup label' elsewhere and we should try hard to be
> consistent and clear.  How about:
> 
> The second column returns the contents of the backup label file, the
> third column returns the contents of the tablespace map file.  These
> must be stored as part of the backup and are required as part of the
> restore process.

done

>>          {
>>              print_msg(_("WARNING: online backup mode is active\n"
>> -                        "Shutdown will not complete until pg_stop_backup() is called.\n\n"));
>> +                        "Shutdown will not complete until pg_backup_stop() is called.\n\n"));
>>          }
>>  
>>          print_msg(_("waiting for server to shut down..."));
>> @@ -1145,7 +1145,7 @@ do_restart(void)
>>              get_control_dbstate() != DB_IN_ARCHIVE_RECOVERY)
>>          {
>>              print_msg(_("WARNING: online backup mode is active\n"
>> -                        "Shutdown will not complete until pg_stop_backup() is called.\n\n"));
>> +                        "Shutdown will not complete until pg_backup_stop() is called.\n\n"));
>>          }
>>  
>>          print_msg(_("waiting for server to shut down..."));
> 
> This... can't actually happen anymore, right?  Shouldn't we just pull
> all of this out?

done

> On a once-over of the rest of the code, I definitely like how much we're
> able to simplify things in this area and remove various hacks in things
> like pg_basebackup and pg_rewind where we previously had to worry about
> backup_label and tablespace_map files being in a live data directory.
> I'm planning to spend more time on this and hopefully be able to get it
> in for v15.

Great!  Much appreciated.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: MDAM techniques and Index Skip Scan patch
Next
From: David Rowley
Date:
Subject: Re: Window Function "Run Conditions"