Thread: Is it correct to say, "invalid data in file \"%s\"", BACKUP_LABEL_FILE in do_pg_backup_stop?

Hi,

After the commit [1], is it correct to say errmsg("invalid data in file \"%s\"", BACKUP_LABEL_FILE))); in do_pg_backup_stop() when we hold the contents in backend global memory, not actually reading from backup_label file? However, it is correct to say that in read_backup_label.

IMO, we can either say "invalid backup_label contents found" or we can be more descriptive and say "invalid "START WAL LOCATION" line found in backup_label content" and "invalid "BACKUP FROM" line found in backup_label content" and so on.

Thoughts?

[1]
commit 39969e2a1e4d7f5a37f3ef37d53bbfe171e7d77a
Author: Stephen Frost <sfrost@snowman.net>
Date:   Wed Apr 6 14:41:03 2022 -0400

    Remove exclusive backup mode
 
errmsg("invalid data in file \"%s\"", BACKUP_LABEL_FILE)));

Regards,
Bharath Rupireddy.
At Wed, 20 Jul 2022 17:09:09 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in 
> Hi,
> 
> After the commit [1], is it correct to say errmsg("invalid data in file
> \"%s\"", BACKUP_LABEL_FILE))); in do_pg_backup_stop() when we hold the
> contents in backend global memory, not actually reading from backup_label
> file? However, it is correct to say that in read_backup_label.
> 
> IMO, we can either say "invalid backup_label contents found" or we can be
> more descriptive and say "invalid "START WAL LOCATION" line found in
> backup_label content" and "invalid "BACKUP FROM" line found in
> backup_label content" and so on.
> 
> Thoughts?

Previously there the case the "char *labelfile" is loaded from a file,
but currently it is alwasy a string build on the process. In that
sense, nowadays it is a kind of internal error, which I think is not
supposed to be exposed to users.

So I think we can leave the code alone to avoid back-patching
obstacles. But if we decided to change the code around, I'd like to
change the string into a C struct, so that we don't need to parse it.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



On Thu, Jul 21, 2022 at 2:33 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> At Wed, 20 Jul 2022 17:09:09 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in
> > Hi,
> >
> > After the commit [1], is it correct to say errmsg("invalid data in file
> > \"%s\"", BACKUP_LABEL_FILE))); in do_pg_backup_stop() when we hold the
> > contents in backend global memory, not actually reading from backup_label
> > file? However, it is correct to say that in read_backup_label.
> >
> > IMO, we can either say "invalid backup_label contents found" or we can be
> > more descriptive and say "invalid "START WAL LOCATION" line found in
> > backup_label content" and "invalid "BACKUP FROM" line found in
> > backup_label content" and so on.
> >
> > Thoughts?
>
> Previously there the case the "char *labelfile" is loaded from a file,
> but currently it is alwasy a string build on the process. In that
> sense, nowadays it is a kind of internal error, which I think is not
> supposed to be exposed to users.
>
> So I think we can leave the code alone to avoid back-patching
> obstacles. But if we decided to change the code around, I'd like to
> change the string into a C struct, so that we don't need to parse it.

Hm. I think we must take this opportunity to clean it up. You are
right, we don't need to parse the label file contents (just like we
used to do previously after reading it from the file) in
do_pg_backup_stop(), instead we can just pass a structure. Also,
do_pg_backup_stop() isn't modifying any labelfile contents, but using
startxlogfilename, startpoint and backupfrom from the labelfile
contents. I think this information can easily be passed as a single
structure. In fact, I might think a bit more here and wrap label_file,
tblspc_map_file to a single structure something like below and pass it
across the functions.

typedef struct BackupState
{
StringInfo label_file;
StringInfo tblspc_map_file;
char startxlogfilename[MAXFNAMELEN];
XLogRecPtr startpoint;
char backupfrom[20];
} BackupState;

This way, the code is more readable, structured and we can remove 2
sscanf() calls, 2 "invalid data in file" errors, 1 strchr() call, 1
strstr() call. Only thing is that it creates code diff from the
previous PG versions which is fine IMO. If okay, I'm happy to prepare
a patch.

Thoughts?

Regards,
Bharath Rupireddy.



At Mon, 25 Jul 2022 14:21:38 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in 
> Hm. I think we must take this opportunity to clean it up. You are
> right, we don't need to parse the label file contents (just like we
> used to do previously after reading it from the file) in
> do_pg_backup_stop(), instead we can just pass a structure. Also,
> do_pg_backup_stop() isn't modifying any labelfile contents, but using
> startxlogfilename, startpoint and backupfrom from the labelfile
> contents. I think this information can easily be passed as a single
> structure. In fact, I might think a bit more here and wrap label_file,
> tblspc_map_file to a single structure something like below and pass it
> across the functions.
> 
> typedef struct BackupState
> {
> StringInfo label_file;
> StringInfo tblspc_map_file;
> char startxlogfilename[MAXFNAMELEN];
> XLogRecPtr startpoint;
> char backupfrom[20];
> } BackupState;
> 
> This way, the code is more readable, structured and we can remove 2
> sscanf() calls, 2 "invalid data in file" errors, 1 strchr() call, 1
> strstr() call. Only thing is that it creates code diff from the
> previous PG versions which is fine IMO. If okay, I'm happy to prepare
> a patch.
> 
> Thoughts?

It is more or less what was in my mind, but it seems that we don't
need StringInfo there, or should avoid it to signal the strings are
not editable.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



On 7/25/22 22:49, Kyotaro Horiguchi wrote:
> At Mon, 25 Jul 2022 14:21:38 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in
>> Hm. I think we must take this opportunity to clean it up. You are
>> right, we don't need to parse the label file contents (just like we
>> used to do previously after reading it from the file) in
>> do_pg_backup_stop(), instead we can just pass a structure. Also,
>> do_pg_backup_stop() isn't modifying any labelfile contents, but using
>> startxlogfilename, startpoint and backupfrom from the labelfile
>> contents. I think this information can easily be passed as a single
>> structure. In fact, I might think a bit more here and wrap label_file,
>> tblspc_map_file to a single structure something like below and pass it
>> across the functions.
>>
>> typedef struct BackupState
>> {
>> StringInfo label_file;
>> StringInfo tblspc_map_file;
>> char startxlogfilename[MAXFNAMELEN];
>> XLogRecPtr startpoint;
>> char backupfrom[20];
>> } BackupState;
>>
>> This way, the code is more readable, structured and we can remove 2
>> sscanf() calls, 2 "invalid data in file" errors, 1 strchr() call, 1
>> strstr() call. Only thing is that it creates code diff from the
>> previous PG versions which is fine IMO. If okay, I'm happy to prepare
>> a patch.
>>
>> Thoughts?
> 
> It is more or less what was in my mind, but it seems that we don't
> need StringInfo there, or should avoid it to signal the strings are
> not editable.

I would prefer to have all the components of backup_label stored 
separately and then generate backup_label from them in pg_backup_stop().

For PG16 I am planning to add some fields to backup_label that are not 
known when pg_backup_start() is called, e.g. min recovery time.

Regards,
-David



On Tue, Jul 26, 2022 at 5:22 PM David Steele <david@pgmasters.net> wrote:
>
> On 7/25/22 22:49, Kyotaro Horiguchi wrote:
> > At Mon, 25 Jul 2022 14:21:38 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in
> >> Hm. I think we must take this opportunity to clean it up. You are
> >> right, we don't need to parse the label file contents (just like we
> >> used to do previously after reading it from the file) in
> >> do_pg_backup_stop(), instead we can just pass a structure. Also,
> >> do_pg_backup_stop() isn't modifying any labelfile contents, but using
> >> startxlogfilename, startpoint and backupfrom from the labelfile
> >> contents. I think this information can easily be passed as a single
> >> structure. In fact, I might think a bit more here and wrap label_file,
> >> tblspc_map_file to a single structure something like below and pass it
> >> across the functions.
> >>
> >> typedef struct BackupState
> >> {
> >> StringInfo label_file;
> >> StringInfo tblspc_map_file;
> >> char startxlogfilename[MAXFNAMELEN];
> >> XLogRecPtr startpoint;
> >> char backupfrom[20];
> >> } BackupState;
> >>
> >> This way, the code is more readable, structured and we can remove 2
> >> sscanf() calls, 2 "invalid data in file" errors, 1 strchr() call, 1
> >> strstr() call. Only thing is that it creates code diff from the
> >> previous PG versions which is fine IMO. If okay, I'm happy to prepare
> >> a patch.
> >>
> >> Thoughts?
> >
> > It is more or less what was in my mind, but it seems that we don't
> > need StringInfo there, or should avoid it to signal the strings are
> > not editable.
>
> I would prefer to have all the components of backup_label stored
> separately and then generate backup_label from them in pg_backup_stop().

+1, because pg_backup_stop is the one that's returning backup_label
contents, so it does make sense for it to prepare it once and for all
and return.

> For PG16 I am planning to add some fields to backup_label that are not
> known when pg_backup_start() is called, e.g. min recovery time.

Can you please point to your patch that does above?

Yes, right now, backup_label or tablespace_map contents are being
filled in by pg_backup_start and are never changed again. But if your
above proposal is for fixing some issue, then it would make sense for
us to carry all the info in a structure to pg_backup_stop and then let
it prepare the backup_label and tablespace_map contents.

If the approach is okay for the hackers, I would like to spend time on it.

Regards,
Bharath Rupireddy.




On 7/26/22 07:59, Bharath Rupireddy wrote:
> On Tue, Jul 26, 2022 at 5:22 PM David Steele <david@pgmasters.net> wrote:
>>
>> On 7/25/22 22:49, Kyotaro Horiguchi wrote:
>>> At Mon, 25 Jul 2022 14:21:38 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in
>>>> Hm. I think we must take this opportunity to clean it up. You are
>>>> right, we don't need to parse the label file contents (just like we
>>>> used to do previously after reading it from the file) in
>>>> do_pg_backup_stop(), instead we can just pass a structure. Also,
>>>> do_pg_backup_stop() isn't modifying any labelfile contents, but using
>>>> startxlogfilename, startpoint and backupfrom from the labelfile
>>>> contents. I think this information can easily be passed as a single
>>>> structure. In fact, I might think a bit more here and wrap label_file,
>>>> tblspc_map_file to a single structure something like below and pass it
>>>> across the functions.
>>>>
>>>> typedef struct BackupState
>>>> {
>>>> StringInfo label_file;
>>>> StringInfo tblspc_map_file;
>>>> char startxlogfilename[MAXFNAMELEN];
>>>> XLogRecPtr startpoint;
>>>> char backupfrom[20];
>>>> } BackupState;
>>>>
>>>> This way, the code is more readable, structured and we can remove 2
>>>> sscanf() calls, 2 "invalid data in file" errors, 1 strchr() call, 1
>>>> strstr() call. Only thing is that it creates code diff from the
>>>> previous PG versions which is fine IMO. If okay, I'm happy to prepare
>>>> a patch.
>>>>
>>>> Thoughts?
>>>
>>> It is more or less what was in my mind, but it seems that we don't
>>> need StringInfo there, or should avoid it to signal the strings are
>>> not editable.
>>
>> I would prefer to have all the components of backup_label stored
>> separately and then generate backup_label from them in pg_backup_stop().
> 
> +1, because pg_backup_stop is the one that's returning backup_label
> contents, so it does make sense for it to prepare it once and for all
> and return.
> 
>> For PG16 I am planning to add some fields to backup_label that are not
>> known when pg_backup_start() is called, e.g. min recovery time.
> 
> Can you please point to your patch that does above?

Currently it is a plan, not a patch. So there is nothing to show yet.

> Yes, right now, backup_label or tablespace_map contents are being
> filled in by pg_backup_start and are never changed again. But if your
> above proposal is for fixing some issue, then it would make sense for
> us to carry all the info in a structure to pg_backup_stop and then let
> it prepare the backup_label and tablespace_map contents.

I think this makes sense even if I don't get these changes into PG16.

> If the approach is okay for the hackers, I would like to spend time on it.

+1 from me.

Regards,
-David



On Tue, Jul 26, 2022 at 5:50 PM David Steele <david@pgmasters.net> wrote:
>
> >> I would prefer to have all the components of backup_label stored
> >> separately and then generate backup_label from them in pg_backup_stop().
> >
> > +1, because pg_backup_stop is the one that's returning backup_label
> > contents, so it does make sense for it to prepare it once and for all
> > and return.
> >
> >> For PG16 I am planning to add some fields to backup_label that are not
> >> known when pg_backup_start() is called, e.g. min recovery time.
> >
> > Can you please point to your patch that does above?
>
> Currently it is a plan, not a patch. So there is nothing to show yet.
>
> > Yes, right now, backup_label or tablespace_map contents are being
> > filled in by pg_backup_start and are never changed again. But if your
> > above proposal is for fixing some issue, then it would make sense for
> > us to carry all the info in a structure to pg_backup_stop and then let
> > it prepare the backup_label and tablespace_map contents.
>
> I think this makes sense even if I don't get these changes into PG16.
>
> > If the approach is okay for the hackers, I would like to spend time on it.
>
> +1 from me.

Here comes the v1 patch. This patch tries to refactor backup related
code, advantages of doing so are following:

1) backup state is more structured now - all in a single structure,
callers can create backup_label contents whenever required, either
during the pg_backup_start or the pg_backup_stop or in between.
2) no parsing of backup_label file lines now in pg_backup_stop, no
error checking for invalid parsing.
3) backup_label and history file contents have most of the things in
common, they can now be created within a single function.
4) makes backup related code extensible and readable.

One downside is that it creates a lot of diff with previous versions.

Please review.

--
Bharath Rupireddy
RDS Open Source Databases: https://aws.amazon.com/

Attachment
On Sat, Jul 30, 2022 at 5:37 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Tue, Jul 26, 2022 at 5:50 PM David Steele <david@pgmasters.net> wrote:
> >
> > >> I would prefer to have all the components of backup_label stored
> > >> separately and then generate backup_label from them in pg_backup_stop().
> > >
> > > +1, because pg_backup_stop is the one that's returning backup_label
> > > contents, so it does make sense for it to prepare it once and for all
> > > and return.
> > >
> > >> For PG16 I am planning to add some fields to backup_label that are not
> > >> known when pg_backup_start() is called, e.g. min recovery time.
> > >
> > > Can you please point to your patch that does above?
> >
> > Currently it is a plan, not a patch. So there is nothing to show yet.
> >
> > > Yes, right now, backup_label or tablespace_map contents are being
> > > filled in by pg_backup_start and are never changed again. But if your
> > > above proposal is for fixing some issue, then it would make sense for
> > > us to carry all the info in a structure to pg_backup_stop and then let
> > > it prepare the backup_label and tablespace_map contents.
> >
> > I think this makes sense even if I don't get these changes into PG16.
> >
> > > If the approach is okay for the hackers, I would like to spend time on it.
> >
> > +1 from me.
>
> Here comes the v1 patch. This patch tries to refactor backup related
> code, advantages of doing so are following:
>
> 1) backup state is more structured now - all in a single structure,
> callers can create backup_label contents whenever required, either
> during the pg_backup_start or the pg_backup_stop or in between.
> 2) no parsing of backup_label file lines now in pg_backup_stop, no
> error checking for invalid parsing.
> 3) backup_label and history file contents have most of the things in
> common, they can now be created within a single function.
> 4) makes backup related code extensible and readable.
>
> One downside is that it creates a lot of diff with previous versions.
>
> Please review.

I added this to current CF - https://commitfest.postgresql.org/39/3808/

-- 
Bharath Rupireddy
RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/



On Mon, Aug 8, 2022 at 7:20 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> > Please review.
>
> I added this to current CF - https://commitfest.postgresql.org/39/3808/

Here's the v2 patch, no change from v1, just rebased because of commit
a8c012869763c711abc9085f54b2a100b60a85fa (Move basebackup code to new
directory src/backend/backup).

-- 
Bharath Rupireddy
RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/

Attachment
On Thu, Aug 11, 2022 at 09:55:13AM +0530, Bharath Rupireddy wrote:
> Here's the v2 patch, no change from v1, just rebased because of commit
> a8c012869763c711abc9085f54b2a100b60a85fa (Move basebackup code to new
> directory src/backend/backup).

I was skimming at this patch, and indeed it is a bit crazy to write
the generate the contents of the backup_label file at backup start,
just to parse them again at backup stop with these extra sscan calls.

-#define PG_STOP_BACKUP_V2_COLS 3
+#define PG_BACKUP_STOP_V2_COLS 3
It seems to me that such changes, while they make sense with the new
naming of the backup start/stop functions are unrelated to what you
are trying to solve primarily here.  This justifies a separate
cleanup, but I am perhaps overly-pedantic here :)
--
Michael

Attachment
On Mon, Sep 12, 2022 at 1:12 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Thu, Aug 11, 2022 at 09:55:13AM +0530, Bharath Rupireddy wrote:
> > Here's the v2 patch, no change from v1, just rebased because of commit
> > a8c012869763c711abc9085f54b2a100b60a85fa (Move basebackup code to new
> > directory src/backend/backup).
>
> I was skimming at this patch, and indeed it is a bit crazy to write
> the generate the contents of the backup_label file at backup start,
> just to parse them again at backup stop with these extra sscan calls.

Thanks for taking a look at the patch.

> -#define PG_STOP_BACKUP_V2_COLS 3
> +#define PG_BACKUP_STOP_V2_COLS 3
> It seems to me that such changes, while they make sense with the new
> naming of the backup start/stop functions are unrelated to what you
> are trying to solve primarily here.  This justifies a separate
> cleanup, but I am perhaps overly-pedantic here :)

I've posted a separate patch [1] to adjust the macro name alone.

Please review the attached v3 patch after removing the above macro changes.

[1] https://www.postgresql.org/message-id/CALj2ACXjvC28ppeDTCrfaSyHga0ggP5nRLJbsjx%3D7N-74UT4QA%40mail.gmail.com

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment
On Mon, Sep 12, 2022 at 5:09 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> Please review the attached v3 patch after removing the above macro changes.

I'm attaching the v4 patch that's rebased on to the latest HEAD.
Please consider this for review.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment
On Wed, Sep 14, 2022 at 02:24:12PM +0530, Bharath Rupireddy wrote:
> I'm attaching the v4 patch that's rebased on to the latest HEAD.
> Please consider this for review.

I have been looking at this patch.

-   StringInfo  labelfile;
-   StringInfo  tblspc_map_file;
    backup_manifest_info manifest;
+   BackupState backup_state;
You could use initialize the state here with a {0}.  That's simpler.

--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -380,6 +380,31 @@ GetRmgr(RmgrId rmid)
 }
 #endif

+/* Structure to hold backup state. */
+typedef struct BackupStateData
+{
Why is that in xlog_internal.h?  This header includes a lot of
declarations about the internals of WAL, but the backup state is not
that internal.  I'd like to think that we should begin separating the
backup-related routines into their own file, as of a set of
xlogbackup.c/h in this case.  That's a split I have been wondering
about for some time now.  The internals of xlog.c for the start/stop
backups are tied to XLogCtlData which map such a switch more
complicated than it looks, so we can begin small and have the routines
to create, free and build the label file and the tablespace map in
this new file.

+   state->name = (char *) palloc0(len + 1);
+   memcpy(state->name, name, len);
Or just pstrdup()?

+BackupState
+get_backup_state(const char *name)
+{
I would name this one create_backup_state() instead.

+void
+create_backup_content_str(BackupState state, bool forhistoryfile)
+{
This could be a build_backup_content().

It seems to me that there is no point in having the list of
tablespaces in BackupStateData?  This actually makes the code harder
to follow, see for example the changes with do_pg_backup_start(), we
the list of tablespace may or may be not passed down as a pointer of
BackupStateData while BackupStateData is itself the first argument of
this routine.  These are independent from the label and backup history
file, as well.

We need to be careful about the file format (see read_backup_label()),
and create_backup_content_str() is careful about that which is good.
Core does not care about the format of the backup history file, though
some community tools may.  I agree that what you are proposing here
makes the generation of these files easier to follow, but let's
document what forhistoryfile is here for, at least.  Saving the
the backup label and history file strings in BackupState is a
confusing interface IMO.  It would be more intuitive to have the
backup state in input, and provide the string generated in output
depending on what we want from the backup state.

-   backup_started_in_recovery = RecoveryInProgress();
+   Assert(state != NULL);
+
+   in_recovery = RecoveryInProgress();
[...]
-   if (strcmp(backupfrom, "standby") == 0 && !backup_started_in_recovery)
+   if (state->started_in_recovery == true && in_recovery == false)

I would have kept the naming to backup_started_in_recovery here.  What
you are doing is logically right by relying on started_in_recovery to
check if recovery was running when the backup started, but this just
creates useless noise in the refactoring.

Something unrelated to your patch that I am noticing while scanning
the area is that we have been always lazy in freeing the label file
data allocated in TopMemoryContext when using the SQL functions if the
backup is aborted.  We are not talking about this much amount of
memory each time a backup is performed, but that could be a cause for
memory bloat in a server if the same session is used and backups keep
failing, as the data is freed only on a successful pg_backup_stop().
Base backups through the replication protocol don't care about that as
the code keeps around the same pointer for the whole duration of
perform_base_backup().  Trying to tie the cleanup of the label file
with the abort phase would be the cause of more complications with
do_pg_backup_stop(), and xlog.c has no need to know about that now.
Just a remark for later.
--
Michael

Attachment
On Fri, Sep 16, 2022 at 12:01 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Wed, Sep 14, 2022 at 02:24:12PM +0530, Bharath Rupireddy wrote:
> > I'm attaching the v4 patch that's rebased on to the latest HEAD.
> > Please consider this for review.
>
> I have been looking at this patch.

Thanks for reviewing it.

> -   StringInfo  labelfile;
> -   StringInfo  tblspc_map_file;
>     backup_manifest_info manifest;
> +   BackupState backup_state;
> You could use initialize the state here with a {0}.  That's simpler.

BackupState is a pointer  to BackupStateData, we can't initialize that
way. However, I got rid of BackupStateData and used BackupState for
the structure directly, whenever pointer to the structure is required,
I'm using BackupState * to be more clear.

> --- a/src/include/access/xlog_internal.h
> +++ b/src/include/access/xlog_internal.h
> @@ -380,6 +380,31 @@ GetRmgr(RmgrId rmid)
>  }
>  #endif
>
> +/* Structure to hold backup state. */
> +typedef struct BackupStateData
> +{
> Why is that in xlog_internal.h?  This header includes a lot of
> declarations about the internals of WAL, but the backup state is not
> that internal.  I'd like to think that we should begin separating the
> backup-related routines into their own file, as of a set of
> xlogbackup.c/h in this case.  That's a split I have been wondering
> about for some time now.  The internals of xlog.c for the start/stop
> backups are tied to XLogCtlData which map such a switch more
> complicated than it looks, so we can begin small and have the routines
> to create, free and build the label file and the tablespace map in
> this new file.

Good idea. It makes a lot more sense to me, because xlog.c is already
a file of 9000 LOC. I've created xlogbackup.c/.h files and added the
new code there. Once this patch gets in, I can offer my hand to move
do_pg_backup_start() and do_pg_backup_stop() from xlog.c and if okay,
pg_backup_start() and pg_backup_stop() from xlogfuncs.c to
xlogbackup.c/.h. Then, we might have to create new get/set APIs for
XLogCtl fields that do_pg_backup_start() and do_pg_backup_stop()
access.

> +   state->name = (char *) palloc0(len + 1);
> +   memcpy(state->name, name, len);
> Or just pstrdup()?

Done.

> +BackupState
> +get_backup_state(const char *name)
> +{
> I would name this one create_backup_state() instead.
>
> +void
> +create_backup_content_str(BackupState state, bool forhistoryfile)
> +{
> This could be a build_backup_content().

I came up with more meaningful names - allocate_backup_state(),
deallocate_backup_state(), build_backup_content().

> It seems to me that there is no point in having the list of
> tablespaces in BackupStateData? This actually makes the code harder
> to follow, see for example the changes with do_pg_backup_start(), we
> the list of tablespace may or may be not passed down as a pointer of
> BackupStateData while BackupStateData is itself the first argument of
> this routine.  These are independent from the label and backup history
> file, as well.

I haven't stored the list of tablespaces in BackupState, it's the
string that do_pg_backup_start() creates is stored in there for
carrying it till pg_backup_stop(). Adding the tablespace_map,
backup_label, history_file in BackupState makes it easy to carry them
across various backup related functions.

> We need to be careful about the file format (see read_backup_label()),
> and create_backup_content_str() is careful about that which is good.
> Core does not care about the format of the backup history file, though
> some community tools may.

Are you suggesting that we need something like check_history_file()
similar to what read_backup_label() does by parsing each line of the
label file and erroring out if not in the required format?

> I agree that what you are proposing here
> makes the generation of these files easier to follow, but let's
> document what forhistoryfile is here for, at least.  Saving the
> the backup label and history file strings in BackupState is a
> confusing interface IMO.  It would be more intuitive to have the
> backup state in input, and provide the string generated in output
> depending on what we want from the backup state.

We need to carry tablespace_map contents from do_pg_backup_start()
till pg_backup_stop(), backup_label and history_file too are easy to
carry across. Hence it will be good to have all of them i.e.
tablespace_map, backup_label and history_file in the BackupState
structure. IMO, this way is good.

> -   backup_started_in_recovery = RecoveryInProgress();
> +   Assert(state != NULL);
> +
> +   in_recovery = RecoveryInProgress();
> [...]
> -   if (strcmp(backupfrom, "standby") == 0 && !backup_started_in_recovery)
> +   if (state->started_in_recovery == true && in_recovery == false)
>
> I would have kept the naming to backup_started_in_recovery here.  What
> you are doing is logically right by relying on started_in_recovery to
> check if recovery was running when the backup started, but this just
> creates useless noise in the refactoring.

PSA new patch.

> Something unrelated to your patch that I am noticing while scanning
> the area is that we have been always lazy in freeing the label file
> data allocated in TopMemoryContext when using the SQL functions if the
> backup is aborted.  We are not talking about this much amount of
> memory each time a backup is performed, but that could be a cause for
> memory bloat in a server if the same session is used and backups keep
> failing, as the data is freed only on a successful pg_backup_stop().
> Base backups through the replication protocol don't care about that as
> the code keeps around the same pointer for the whole duration of
> perform_base_backup().  Trying to tie the cleanup of the label file
> with the abort phase would be the cause of more complications with
> do_pg_backup_stop(), and xlog.c has no need to know about that now.
> Just a remark for later.

Yeah, I think that can be solved by passing in backup_state to
do_pg_abort_backup(). If okay, I can work on this too as 0002 patch in
this thread or we can discuss this separately to get more attention
after this refactoring patch gets in.

I'm attaching v5 patch with the above review comments addressed,
please review it further.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

On 2022/09/17 16:18, Bharath Rupireddy wrote:
> Good idea. It makes a lot more sense to me, because xlog.c is already
> a file of 9000 LOC. I've created xlogbackup.c/.h files and added the
> new code there. Once this patch gets in, I can offer my hand to move
> do_pg_backup_start() and do_pg_backup_stop() from xlog.c and if okay,
> pg_backup_start() and pg_backup_stop() from xlogfuncs.c to
> xlogbackup.c/.h. Then, we might have to create new get/set APIs for
> XLogCtl fields that do_pg_backup_start() and do_pg_backup_stop()
> access.

The definition of SessionBackupState enum type also should be in xlogbackup.h?

> We need to carry tablespace_map contents from do_pg_backup_start()
> till pg_backup_stop(), backup_label and history_file too are easy to
> carry across. Hence it will be good to have all of them i.e.
> tablespace_map, backup_label and history_file in the BackupState
> structure. IMO, this way is good.

backup_label and history_file are not carried between pg_backup_start()
and _stop(), so don't need to be saved in BackupState. Their contents
can be easily created from other saved fields in BackupState,
if necessary. So at least for me it's better to get rid of them from
BackupState and don't allocate TopMemoryContext memory for them.

>> Something unrelated to your patch that I am noticing while scanning
>> the area is that we have been always lazy in freeing the label file
>> data allocated in TopMemoryContext when using the SQL functions if the
>> backup is aborted.  We are not talking about this much amount of
>> memory each time a backup is performed, but that could be a cause for
>> memory bloat in a server if the same session is used and backups keep
>> failing, as the data is freed only on a successful pg_backup_stop().
>> Base backups through the replication protocol don't care about that as
>> the code keeps around the same pointer for the whole duration of
>> perform_base_backup().  Trying to tie the cleanup of the label file
>> with the abort phase would be the cause of more complications with
>> do_pg_backup_stop(), and xlog.c has no need to know about that now.
>> Just a remark for later.
> 
> Yeah, I think that can be solved by passing in backup_state to
> do_pg_abort_backup(). If okay, I can work on this too as 0002 patch in
> this thread or we can discuss this separately to get more attention
> after this refactoring patch gets in.

Or, to avoid such memory bloat, how about allocating the memory for
backup_state only when it's NULL?

> I'm attaching v5 patch with the above review comments addressed,
> please review it further.

Thanks for updating the patch!

+    char            startxlogfile[MAXFNAMELEN_BACKUP]; /* backup start WAL file */
<snip>
+    char            stopxlogfile[MAXFNAMELEN_BACKUP];    /* backup stop WAL file */

These file names seem not necessary in BackupState because they can be
calculated from other fields like startpoint and starttli, etc when
making backup_label and history file contents. If we remove them from
BackupState, we can also remove the definition of MAXFNAMELEN_BACKUP
macro from xlogbackup.h.

+    /* construct backup_label contents */
+    build_backup_content(state, false);

In basebackup case, build_backup_content() is called unnecessary twice
because do_pg_stop_backup() and its caller, perform_base_backup() call
that. This makes me think that it's better to get rid of the call to
build_backup_content() from do_pg_backup_stop(). Instead its callers,
perform_base_backup() and pg_backup_stop() should call that.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



On Sun, Sep 18, 2022 at 7:38 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
> On 2022/09/17 16:18, Bharath Rupireddy wrote:
> > Good idea. It makes a lot more sense to me, because xlog.c is already
> > a file of 9000 LOC. I've created xlogbackup.c/.h files and added the
> > new code there. Once this patch gets in, I can offer my hand to move
> > do_pg_backup_start() and do_pg_backup_stop() from xlog.c and if okay,
> > pg_backup_start() and pg_backup_stop() from xlogfuncs.c to
> > xlogbackup.c/.h. Then, we might have to create new get/set APIs for
> > XLogCtl fields that do_pg_backup_start() and do_pg_backup_stop()
> > access.
>
> The definition of SessionBackupState enum type also should be in xlogbackup.h?

Correct. Basically, all the backup related code from xlog.c,
xlogfuncs.c and elsewhere can go to xlogbackup.c/.h. I will focus on
that refactoring patch once this gets in.

> > We need to carry tablespace_map contents from do_pg_backup_start()
> > till pg_backup_stop(), backup_label and history_file too are easy to
> > carry across. Hence it will be good to have all of them i.e.
> > tablespace_map, backup_label and history_file in the BackupState
> > structure. IMO, this way is good.
>
> backup_label and history_file are not carried between pg_backup_start()
> and _stop(), so don't need to be saved in BackupState. Their contents
> can be easily created from other saved fields in BackupState,
> if necessary. So at least for me it's better to get rid of them from
> BackupState and don't allocate TopMemoryContext memory for them.

Yeah, but they have to be carried from do_pg_backup_stop() to
pg_backup_stop() or callers and also instead of keeping tablespace_map
in BackupState and others elsewhere don't seem to be a good idea to
me. IMO, BackupState is a good place to contain all the information
that's carried across various functions. I've changed the code to
lazily (upon first use in the backend) allocate memory for all of them
as we're concerned of the memory allocation beforehand.

> > Yeah, I think that can be solved by passing in backup_state to
> > do_pg_abort_backup(). If okay, I can work on this too as 0002 patch in
> > this thread or we can discuss this separately to get more attention
> > after this refactoring patch gets in.
>
> Or, to avoid such memory bloat, how about allocating the memory for
> backup_state only when it's NULL?

Ah my bad, I missed that. Done now.

> > I'm attaching v5 patch with the above review comments addressed,
> > please review it further.
>
> Thanks for updating the patch!

Thanks for reviewing it.

> +       char            startxlogfile[MAXFNAMELEN_BACKUP]; /* backup start WAL file */
> <snip>
> +       char            stopxlogfile[MAXFNAMELEN_BACKUP];       /* backup stop WAL file */
>
> These file names seem not necessary in BackupState because they can be
> calculated from other fields like startpoint and starttli, etc when
> making backup_label and history file contents. If we remove them from
> BackupState, we can also remove the definition of MAXFNAMELEN_BACKUP
> macro from xlogbackup.h.

Done.

> +       /* construct backup_label contents */
> +       build_backup_content(state, false);
>
> In basebackup case, build_backup_content() is called unnecessary twice
> because do_pg_stop_backup() and its caller, perform_base_backup() call
> that. This makes me think that it's better to get rid of the call to
> build_backup_content() from do_pg_backup_stop(). Instead its callers,
> perform_base_backup() and pg_backup_stop() should call that.

Yeah, it's a good idea. Done that way. It's easier because we can
create backup_label file contents at any point of time after
pg_backup_start().

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment
On Sun, Sep 18, 2022 at 1:23 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

cfbot fails [1] with v6 patch. I made a silly mistake by not checking
the output of "make check-world -j  16" fully, I just saw the end
message "All tests successful." before posting the v6 patch.

The failure is due to perform_base_backup() accessing BackupState's
tablespace_map without a null check, so I fixed it.

Sorry for the noise. Please review the attached v7 patch further.

[1] https://cirrus-ci.com/task/5816966114967552?logs=test_world#L720

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

On 2022/09/18 23:09, Bharath Rupireddy wrote:
> On Sun, Sep 18, 2022 at 1:23 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> 
> cfbot fails [1] with v6 patch. I made a silly mistake by not checking
> the output of "make check-world -j  16" fully, I just saw the end
> message "All tests successful." before posting the v6 patch.
> 
> The failure is due to perform_base_backup() accessing BackupState's
> tablespace_map without a null check, so I fixed it.
> 
> Sorry for the noise. Please review the attached v7 patch further.

Thanks for updating the patch!

=# SELECT * FROM pg_backup_start('test', true);
=# SELECT * FROM pg_backup_stop();
LOG:  server process (PID 15651) was terminated by signal 11: Segmentation fault: 11
DETAIL:  Failed process was running: SELECT * FROM pg_backup_stop();

With v7 patch, pg_backup_stop() caused the segmentation fault.


=# SELECT * FROM pg_backup_start(repeat('test', 1024));
ERROR:  backup label too long (max 1024 bytes)
STATEMENT:  SELECT * FROM pg_backup_start(repeat('test', 1024));

=# SELECT * FROM pg_backup_start(repeat('testtest', 1024));
LOG:  server process (PID 15844) was terminated by signal 11: Segmentation fault: 11
DETAIL:  Failed process was running: SELECT * FROM pg_backup_start(repeat('testtest', 1024));

When I specified longer backup label in the second run of pg_backup_start()
after the first run failed, it caused the segmentation fault.


+    state = (BackupState *) palloc0(sizeof(BackupState));
+    state->name = pstrdup(name);

pg_backup_start() calls allocate_backup_state() and allocates the memory for
the input backup label before checking its length in do_pg_backup_start().
This can cause the memory for backup label to be allocated too much
unnecessary. I think that the maximum length of BackupState.name should
be MAXPGPATH (i.e., maximum allowed length for backup label).


>> The definition of SessionBackupState enum type also should be in xlogbackup.h?
>
> Correct. Basically, all the backup related code from xlog.c,
> xlogfuncs.c and elsewhere can go to xlogbackup.c/.h. I will focus on
> that refactoring patch once this gets in.

Understood.


> Yeah, but they have to be carried from do_pg_backup_stop() to
> pg_backup_stop() or callers and also instead of keeping tablespace_map
> in BackupState and others elsewhere don't seem to be a good idea to
> me. IMO, BackupState is a good place to contain all the information
> that's carried across various functions.

In v7 patch, since pg_backup_stop() calls build_backup_content(),
backup_label and history_file seem not to be carried from do_pg_backup_stop()
to pg_backup_stop(). This makes me still think that it's better not to include
them in BackupState...

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



On Mon, Sep 19, 2022 at 2:38 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
> Thanks for updating the patch!
>
> =# SELECT * FROM pg_backup_start('test', true);
> =# SELECT * FROM pg_backup_stop();
> LOG:  server process (PID 15651) was terminated by signal 11: Segmentation fault: 11
> DETAIL:  Failed process was running: SELECT * FROM pg_backup_stop();
>
> With v7 patch, pg_backup_stop() caused the segmentation fault.

Fixed. I believed that the regression tests cover pg_backup_start()
and pg_backup_stop(), and relied on make check-world, surprisingly
there's no test that covers these functions. Is it a good idea to add
tests for these functions in misc_functions.sql or backup.sql or
somewhere so that they get run as part of make check? Thoughts?

> =# SELECT * FROM pg_backup_start(repeat('test', 1024));
> ERROR:  backup label too long (max 1024 bytes)
> STATEMENT:  SELECT * FROM pg_backup_start(repeat('test', 1024));
>
> =# SELECT * FROM pg_backup_start(repeat('testtest', 1024));
> LOG:  server process (PID 15844) was terminated by signal 11: Segmentation fault: 11
> DETAIL:  Failed process was running: SELECT * FROM pg_backup_start(repeat('testtest', 1024));
>
> When I specified longer backup label in the second run of pg_backup_start()
> after the first run failed, it caused the segmentation fault.
>
>
> +       state = (BackupState *) palloc0(sizeof(BackupState));
> +       state->name = pstrdup(name);
>
> pg_backup_start() calls allocate_backup_state() and allocates the memory for
> the input backup label before checking its length in do_pg_backup_start().
> This can cause the memory for backup label to be allocated too much
> unnecessary. I think that the maximum length of BackupState.name should
> be MAXPGPATH (i.e., maximum allowed length for backup label).

That's a good idea. I'm marking a flag if the label name overflows (>
MAXPGPATH), later using it in do_pg_backup_start() to error out. We
could've thrown error directly, but that changes the behaviour - right
now, first "
wal_level must be set to \"replica\" or \"logical\" at server start."
gets emitted and then label name overflow error - I don't want to do
that.

> > Yeah, but they have to be carried from do_pg_backup_stop() to
> > pg_backup_stop() or callers and also instead of keeping tablespace_map
> > in BackupState and others elsewhere don't seem to be a good idea to
> > me. IMO, BackupState is a good place to contain all the information
> > that's carried across various functions.
>
> In v7 patch, since pg_backup_stop() calls build_backup_content(),
> backup_label and history_file seem not to be carried from do_pg_backup_stop()
> to pg_backup_stop(). This makes me still think that it's better not to include
> them in BackupState...

I'm a bit worried about the backup state being spread across if we
separate out backup_label and history_file from BackupState and keep
tablespace_map contents there. As I said upthread, we are not
allocating memory for them at the beginning, we allocate only when
needed. IMO, this code is readable and more extensible.

I've also taken help of the error callback mechanism to clean up the
allocated memory in case of a failure. For do_pg_abort_backup() cases,
I think we don't need to clean the memory because that gets called on
proc exit (before_shmem_exit()).

Please review the v8 patch further.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment
On Mon, Sep 19, 2022 at 06:26:34PM +0530, Bharath Rupireddy wrote:
> On Mon, Sep 19, 2022 at 2:38 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> Fixed. I believed that the regression tests cover pg_backup_start()
> and pg_backup_stop(), and relied on make check-world, surprisingly
> there's no test that covers these functions. Is it a good idea to add
> tests for these functions in misc_functions.sql or backup.sql or
> somewhere so that they get run as part of make check? Thoughts?

The main regression test suite should not include direct calls to
pg_backup_start() or pg_backup_stop() as these depend on wal_level,
and we spend a certain amount of resources in keeping the tests a
maximum portable across such configurations, wal_level = minimal being
one of them.  One portable example is in 001_stream_rep.pl.

> That's a good idea. I'm marking a flag if the label name overflows (>
> MAXPGPATH), later using it in do_pg_backup_start() to error out. We
> could've thrown error directly, but that changes the behaviour - right
> now, first "
> wal_level must be set to \"replica\" or \"logical\" at server start."
> gets emitted and then label name overflow error - I don't want to do
> that.

-   if (strlen(backupidstr) > MAXPGPATH)
+   if (state->name_overflowed == true)
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                 errmsg("backup label too long (max %d bytes)",
It does not strike me as a huge issue to force a truncation of such
backup label names.  1024 is large enough for basically all users,
in my opinion.  Or you could just truncate in the internal logic, but
still complain at MAXPGPATH - 1 as the last byte would be for the zero
termination.  In short, there is no need to complicate things with
name_overflowed.

>> In v7 patch, since pg_backup_stop() calls build_backup_content(),
>> backup_label and history_file seem not to be carried from do_pg_backup_stop()
>> to pg_backup_stop(). This makes me still think that it's better not to include
>> them in BackupState...
>
> I'm a bit worried about the backup state being spread across if we
> separate out backup_label and history_file from BackupState and keep
> tablespace_map contents there. As I said upthread, we are not
> allocating memory for them at the beginning, we allocate only when
> needed. IMO, this code is readable and more extensible.

+   StringInfo      backup_label;   /* backup_label file contents */
+   StringInfo      tablespace_map; /* tablespace_map file contents */
+   StringInfo      history_file;   /* history file contents */
IMV, repeating a point I already made once upthread, BackupState
should hold none of these.  Let's just generate the contents of these
files in the contexts where they are needed, making BackupState
something to rely on to build them in the code paths where they are
necessary.  This is going to make the reasoning around the memory
contexts where each one of them is stored much easier and reduce the
changes of bugs in the long-term.

> I've also taken help of the error callback mechanism to clean up the
> allocated memory in case of a failure. For do_pg_abort_backup() cases,
> I think we don't need to clean the memory because that gets called on
> proc exit (before_shmem_exit()).

Memory could still bloat while the process running the SQL functions
is running depending on the error code path, anyway.
--
Michael

Attachment
On Tue, Sep 20, 2022 at 7:20 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> The main regression test suite should not include direct calls to
> pg_backup_start() or pg_backup_stop() as these depend on wal_level,
> and we spend a certain amount of resources in keeping the tests a
> maximum portable across such configurations, wal_level = minimal being
> one of them.  One portable example is in 001_stream_rep.pl.

Understood.

> > That's a good idea. I'm marking a flag if the label name overflows (>
> > MAXPGPATH), later using it in do_pg_backup_start() to error out. We
> > could've thrown error directly, but that changes the behaviour - right
> > now, first "
> > wal_level must be set to \"replica\" or \"logical\" at server start."
> > gets emitted and then label name overflow error - I don't want to do
> > that.
>
> -   if (strlen(backupidstr) > MAXPGPATH)
> +   if (state->name_overflowed == true)
>         ereport(ERROR,
>                 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>                  errmsg("backup label too long (max %d bytes)",
> It does not strike me as a huge issue to force a truncation of such
> backup label names.  1024 is large enough for basically all users,
> in my opinion.  Or you could just truncate in the internal logic, but
> still complain at MAXPGPATH - 1 as the last byte would be for the zero
> termination.  In short, there is no need to complicate things with
> name_overflowed.

We currently allow MAXPGPATH bytes of label name excluding null
termination. I don't want to change this behaviour. In the attached v9
patch, I'm truncating the larger names to  MAXPGPATH + 1 bytes in
backup state (one extra byte for representing that the name has
overflown, and another extra byte for null termination).

> +   StringInfo      backup_label;   /* backup_label file contents */
> +   StringInfo      tablespace_map; /* tablespace_map file contents */
> +   StringInfo      history_file;   /* history file contents */
> IMV, repeating a point I already made once upthread, BackupState
> should hold none of these.  Let's just generate the contents of these
> files in the contexts where they are needed, making BackupState
> something to rely on to build them in the code paths where they are
> necessary.  This is going to make the reasoning around the memory
> contexts where each one of them is stored much easier and reduce the
> changes of bugs in the long-term.

I've separated out these variables from the backup state, please see
the attached v9 patch.

> > I've also taken help of the error callback mechanism to clean up the
> > allocated memory in case of a failure. For do_pg_abort_backup() cases,
> > I think we don't need to clean the memory because that gets called on
> > proc exit (before_shmem_exit()).
>
> Memory could still bloat while the process running the SQL functions
> is running depending on the error code path, anyway.

I didn't get your point. Can you please elaborate it? I think adding
error callbacks at the right places would free up the memory for us.
Please note that we already are causing memory leaks on HEAD today.

I addressed the above review comments. I also changed a wrong comment
[1] that lies before pg_backup_start() even after the removal of
exclusive backup.

I'm attaching v9 patch set herewith, 0001 - refactors the backup code
with backup state, 0002 - adds error callbacks to clean up the memory
allocated for backup variables. Please review them further.

[1]
 * Essentially what this does is to create a backup label file in $PGDATA,
 * where it will be archived as part of the backup dump.  The label file
 * contains the user-supplied label string (typically this would be used
 * to tell where the backup dump will be stored) and the starting time and
 * starting WAL location for the dump.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

On 2022/09/20 20:43, Bharath Rupireddy wrote:
>> -   if (strlen(backupidstr) > MAXPGPATH)
>> +   if (state->name_overflowed == true)
>>          ereport(ERROR,
>>                  (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>>                   errmsg("backup label too long (max %d bytes)",
>> It does not strike me as a huge issue to force a truncation of such
>> backup label names.  1024 is large enough for basically all users,
>> in my opinion.  Or you could just truncate in the internal logic, but
>> still complain at MAXPGPATH - 1 as the last byte would be for the zero
>> termination.  In short, there is no need to complicate things with
>> name_overflowed.
> 
> We currently allow MAXPGPATH bytes of label name excluding null
> termination. I don't want to change this behaviour. In the attached v9
> patch, I'm truncating the larger names to  MAXPGPATH + 1 bytes in
> backup state (one extra byte for representing that the name has
> overflown, and another extra byte for null termination).

This looks much complicated to me.

Instead of making allocate_backup_state() or reset_backup_state()
store the label name in BackupState before do_pg_backup_start(),
how about making do_pg_backup_start() do that after checking its length?
Seems this can simplify the code very much.

If so, ISTM that we can replace allocate_backup_state() and
reset_backup_state() with just palloc0() and MemSet(), respectively.
Also we can remove fill_backup_label_name().

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



On Tue, Sep 20, 2022 at 05:13:49PM +0530, Bharath Rupireddy wrote:
> On Tue, Sep 20, 2022 at 7:20 AM Michael Paquier <michael@paquier.xyz> wrote:
> I've separated out these variables from the backup state, please see
> the attached v9 patch.

Thanks, the separation looks cleaner.

>>> I've also taken help of the error callback mechanism to clean up the
>>> allocated memory in case of a failure. For do_pg_abort_backup() cases,
>>> I think we don't need to clean the memory because that gets called on
>>> proc exit (before_shmem_exit()).
>>
>> Memory could still bloat while the process running the SQL functions
>> is running depending on the error code path, anyway.
>
> I didn't get your point. Can you please elaborate it? I think adding
> error callbacks at the right places would free up the memory for us.
> Please note that we already are causing memory leaks on HEAD today.

I mean that HEAD makes no effort in freeing this memory in
TopMemoryContext on session ERROR.

> I addressed the above review comments. I also changed a wrong comment
> [1] that lies before pg_backup_start() even after the removal of
> exclusive backup.
>
> I'm attaching v9 patch set herewith, 0001 - refactors the backup code
> with backup state, 0002 - adds error callbacks to clean up the memory
> allocated for backup variables. Please review them further.

I have a few comments on 0001.

+#include <access/xlogbackup.h>
Double quotes wanted here.

deallocate_backup_variables() is the only part of xlogbackup.c that
includes references of the tablespace map_and backup_label
StringInfos.  I would be tempted to fully decouple that from
xlogbackup.c/h for the time being.

-   tblspc_map_file = makeStringInfo();
Not sure that there is a need for a rename here.

+void
+build_backup_content(BackupState *state, bool ishistoryfile, StringInfo str)
+{
It would be more natural to have build_backup_content() do by itself
the initialization of the StringInfo for the contents of backup_label
and return it as a result of the routine?  This is short-lived in
xlogfuncs.c when the backup ends.

@@ -248,18 +250,25 @@ perform_base_backup(basebackup_options *opt, bbsink *sink)
[...]
+   /* Construct backup_label contents. */
+   build_backup_content(backup_state, false, backup_label);

Actually, for base backups, perhaps it would be more intuitive to
build and free the StringInfo of the backup_label when we send it for
base.tar rather than initializing it at the beginning and freeing it
at the end?

- * pg_backup_start: set up for taking an on-line backup dump
+ * pg_backup_start: start taking an on-line backup.
  *
- * Essentially what this does is to create a backup label file in $PGDATA,
- * where it will be archived as part of the backup dump.  The label file
- * contains the user-supplied label string (typically this would be used
- * to tell where the backup dump will be stored) and the starting time and
- * starting WAL location for the dump.
+ * This function starts the backup and creates tablespace_map contents.

The last part of the comment is still correct while the former is not,
so this loses some information.
--
Michael

Attachment
On Wed, Sep 21, 2022 at 03:45:49PM +0900, Fujii Masao wrote:
> Instead of making allocate_backup_state() or reset_backup_state()
> store the label name in BackupState before do_pg_backup_start(),
> how about making do_pg_backup_start() do that after checking its length?
> Seems this can simplify the code very much.
>
> If so, ISTM that we can replace allocate_backup_state() and
> reset_backup_state() with just palloc0() and MemSet(), respectively.
> Also we can remove fill_backup_label_name().

Yep, agreed.  Having all these routines feels a bit overengineered.
--
Michael

Attachment
On Wed, Sep 21, 2022 at 12:15 PM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:
>
> This looks much complicated to me.
>
> Instead of making allocate_backup_state() or reset_backup_state()
> store the label name in BackupState before do_pg_backup_start(),
> how about making do_pg_backup_start() do that after checking its length?
> Seems this can simplify the code very much.
>
> If so, ISTM that we can replace allocate_backup_state() and
> reset_backup_state() with just palloc0() and MemSet(), respectively.
> Also we can remove fill_backup_label_name().

Yes, that makes things simpler. I will post the v10 patch-set soon.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Wed, Sep 21, 2022 at 12:27 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> >>> I've also taken help of the error callback mechanism to clean up the
> >>> allocated memory in case of a failure. For do_pg_abort_backup() cases,
> >>> I think we don't need to clean the memory because that gets called on
> >>> proc exit (before_shmem_exit()).
> >>
> >> Memory could still bloat while the process running the SQL functions
> >> is running depending on the error code path, anyway.
> >
> > I didn't get your point. Can you please elaborate it? I think adding
> > error callbacks at the right places would free up the memory for us.
> > Please note that we already are causing memory leaks on HEAD today.
>
> I mean that HEAD makes no effort in freeing this memory in
> TopMemoryContext on session ERROR.

Correct. We can also solve it as part of this commit. Please let me
know your thoughts on 0002 patch.

> I have a few comments on 0001.
>
> +#include <access/xlogbackup.h>
> Double quotes wanted here.

Ah, my bad. Corrected now.

> deallocate_backup_variables() is the only part of xlogbackup.c that
> includes references of the tablespace map_and backup_label
> StringInfos.  I would be tempted to fully decouple that from
> xlogbackup.c/h for the time being.

There's no problem with it IMO, after all, they are backup related
variables. And that function reduces a bit of duplicate code.

> -   tblspc_map_file = makeStringInfo();
> Not sure that there is a need for a rename here.

We're referring tablespace_map and backup_label internally all around,
just to be in sync, I wanted to rename it while we're refactoring this
code.

> +void
> +build_backup_content(BackupState *state, bool ishistoryfile, StringInfo str)
> +{
> It would be more natural to have build_backup_content() do by itself
> the initialization of the StringInfo for the contents of backup_label
> and return it as a result of the routine?  This is short-lived in
> xlogfuncs.c when the backup ends.

See the below explaination.

> @@ -248,18 +250,25 @@ perform_base_backup(basebackup_options *opt, bbsink *sink)
> [...]
> +   /* Construct backup_label contents. */
> +   build_backup_content(backup_state, false, backup_label);
>
> Actually, for base backups, perhaps it would be more intuitive to
> build and free the StringInfo of the backup_label when we send it for
> base.tar rather than initializing it at the beginning and freeing it
> at the end?

sendFileWithContent() is in a for-loop and we are good if we call
build_backup_content() before do_pg_backup_start() just once. Also,
allocating backup_label in the for loop makes error handling trickier,
how do we free-up when sendFileWithContent() errors out? Of course, we
can allocate backup_label once even in the for loop with bool
first_time sort of variable and store StringInfo *ptr_backup_label; in
error callback info, but that would make things unnecessarily complex,
instead we're good allocating and creating backup_label content at the
beginning and freeing-it up at the end.

> - * pg_backup_start: set up for taking an on-line backup dump
> + * pg_backup_start: start taking an on-line backup.
>   *
> - * Essentially what this does is to create a backup label file in $PGDATA,
> - * where it will be archived as part of the backup dump.  The label file
> - * contains the user-supplied label string (typically this would be used
> - * to tell where the backup dump will be stored) and the starting time and
> - * starting WAL location for the dump.
> + * This function starts the backup and creates tablespace_map contents.
>
> The last part of the comment is still correct while the former is not,
> so this loses some information.

Added that part before pg_backup_stop() now where it makes sense with
the refactoring.

I'm attaching the v10 patch-set with the above review comments addressed.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment
On Wed, Sep 21, 2022 at 05:10:39PM +0530, Bharath Rupireddy wrote:
>> deallocate_backup_variables() is the only part of xlogbackup.c that
>> includes references of the tablespace map_and backup_label
>> StringInfos.  I would be tempted to fully decouple that from
>> xlogbackup.c/h for the time being.
>
> There's no problem with it IMO, after all, they are backup related
> variables. And that function reduces a bit of duplicate code.

Hmm.  I'd like to disagree with this statement :)

> Added that part before pg_backup_stop() now where it makes sense with
> the refactoring.

I have put my hands on 0001, and finished with the attached, that
includes many fixes and tweaks.  Some of the variable renames felt out
of place, while some comments were overly verbose for their purpose,
though for the last part we did not lose any information in the last
version proposed.

As I suspected, the deallocate routine felt unnecessary, as
xlogbackup.c/h have no idea what these are.  The remark is
particularly true for the StringInfo of the backup_label file: for
basebackup.c we need to build it when sending base.tar and in
xlogfuncs.c we need it only at the backup stop phase.  The code was
actually a bit wrong, because free-ing StringInfos requires to free
its ->data and then the main object (stringinfo.h explains that).  My
tweaks have shaved something like 10%~15% of the patch, while making
it IMO more readable.

A second issue I had was with the build function, and again it seemed
much cleaner to let the routine do the makeStringInfo() and return the
result.  This is not the most popular routine ever, but this reduces
the workload of the caller of build_backup_content().

So, opinions?
--
Michael

Attachment

On 2022/09/22 16:43, Michael Paquier wrote:
>> Added that part before pg_backup_stop() now where it makes sense with
>> the refactoring.
> 
> I have put my hands on 0001, and finished with the attached, that
> includes many fixes and tweaks.  Some of the variable renames felt out
> of place, while some comments were overly verbose for their purpose,
> though for the last part we did not lose any information in the last
> version proposed.

Thanks for updating the patch! This looks better to me.

+        MemSet(backup_state, 0, sizeof(BackupState));
+        MemSet(backup_state->name, '\0', sizeof(backup_state->name));

The latter MemSet() is not necessary because the former already
resets that with zero, is it?

+        pfree(tablespace_map);
+        tablespace_map = NULL;
+    }
+
+    tablespace_map = makeStringInfo();

tablespace_map doesn't need to be reset to NULL here.

      /* Free structures allocated in TopMemoryContext */
-    pfree(label_file->data);
-    pfree(label_file);
<snip>
+    pfree(backup_label->data);
+    pfree(backup_label);
+    backup_label = NULL;

This source comment is a bit misleading, isn't it? Because the memory
for backup_label is allocated under the memory context other than
TopMemoryContext.

+#include "access/xlog.h"
+#include "access/xlog_internal.h"
+#include "access/xlogbackup.h"
+#include "utils/memutils.h"

Seems "utils/memutils.h" doesn't need to be included.

+        XLByteToSeg(state->startpoint, stopsegno, wal_segment_size);
+        XLogFileName(stopxlogfile, state->starttli, stopsegno, wal_segment_size);
+        appendStringInfo(result, "STOP WAL LOCATION: %X/%X (file %s)\n",
+                         LSN_FORMAT_ARGS(state->startpoint), stopxlogfile);

state->stoppoint and state->stoptli should be used instead of
state->startpoint and state->starttli?

+        pfree(tablespace_map);
+        tablespace_map = NULL;
+        pfree(backup_state);
+        backup_state = NULL;

It's harmless to set tablespace_map and backup_state to NULL after pfree(),
but it's also unnecessary at least here.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Hi,

On 2022-09-22 16:43:19 +0900, Michael Paquier wrote:
> I have put my hands on 0001, and finished with the attached, that
> includes many fixes and tweaks.

Due to the merge of the meson based build this patch needs some
adjustment. See
https://cirrus-ci.com/build/6146162607521792
Looks like it just requires adding xlogbackup.c to
src/backend/access/transam/meson.build.

Greetings,

Andres Freund



On Thu, Sep 22, 2022 at 8:55 PM Andres Freund <andres@anarazel.de> wrote:
>
> Due to the merge of the meson based build this patch needs some
> adjustment. See
> https://cirrus-ci.com/build/6146162607521792
> Looks like it just requires adding xlogbackup.c to
> src/backend/access/transam/meson.build.

Thanks! I will post a new patch with that change.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Thu, Sep 22, 2022 at 3:17 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
> +               MemSet(backup_state, 0, sizeof(BackupState));
> +               MemSet(backup_state->name, '\0', sizeof(backup_state->name));
>
> The latter MemSet() is not necessary because the former already
> resets that with zero, is it?

Yes, earlier, the name was a palloc'd string but now it is a fixed
array. Removed.

> +               pfree(tablespace_map);
> +               tablespace_map = NULL;
> +       }
> +
> +       tablespace_map = makeStringInfo();
>
> tablespace_map doesn't need to be reset to NULL here.
>
> +               pfree(tablespace_map);
> +               tablespace_map = NULL;
> +               pfree(backup_state);
> +               backup_state = NULL;
>
> It's harmless to set tablespace_map and backup_state to NULL after pfree(),
> but it's also unnecessary at least here.

Yes. But we can retain it for the sake of consistency with the other
places and avoid dangling pointers, if at all any new code gets added
in between it will be useful.

>         /* Free structures allocated in TopMemoryContext */
> -       pfree(label_file->data);
> -       pfree(label_file);
> <snip>
> +       pfree(backup_label->data);
> +       pfree(backup_label);
> +       backup_label = NULL;
>
> This source comment is a bit misleading, isn't it? Because the memory
> for backup_label is allocated under the memory context other than
> TopMemoryContext.

Yes, we can just say /* Deallocate backup-related variables. */. The
pg_backup_start() has the info about the variables being allocated in
TopMemoryContext.

> +#include "access/xlog.h"
> +#include "access/xlog_internal.h"
> +#include "access/xlogbackup.h"
> +#include "utils/memutils.h"
>
> Seems "utils/memutils.h" doesn't need to be included.

Yes, removed now.

> +               XLByteToSeg(state->startpoint, stopsegno, wal_segment_size);
> +               XLogFileName(stopxlogfile, state->starttli, stopsegno, wal_segment_size);
> +               appendStringInfo(result, "STOP WAL LOCATION: %X/%X (file %s)\n",
> +                                                LSN_FORMAT_ARGS(state->startpoint), stopxlogfile);
>
> state->stoppoint and state->stoptli should be used instead of
> state->startpoint and state->starttli?

Nice catch! Corrected.

PSA v12 patch with the above review comments addressed.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment
On Fri, Sep 23, 2022 at 06:02:24AM +0530, Bharath Rupireddy wrote:
> PSA v12 patch with the above review comments addressed.

I've read this one, and nothing is standing out at quick glance, so
that looks rather reasonable to me.  I should be able to spend more
time on that at the beginning of next week, and maybe apply it.
--
Michael

Attachment
On Thu, Sep 22, 2022 at 08:25:31AM -0700, Andres Freund wrote:
> Due to the merge of the meson based build this patch needs some
> adjustment. See
> https://cirrus-ci.com/build/6146162607521792
> Looks like it just requires adding xlogbackup.c to
> src/backend/access/transam/meson.build.

Thanks for the reminder.  I have played a bit with meson and ninja,
and that's a rather straight-forward experience.  The output is muuuch
nicer to parse.
--
Michael

Attachment
On Fri, Sep 23, 2022 at 09:12:22PM +0900, Michael Paquier wrote:
> I've read this one, and nothing is standing out at quick glance, so
> that looks rather reasonable to me.  I should be able to spend more
> time on that at the beginning of next week, and maybe apply it.

What I had at hand seemed fine on a second look, so applied after
tweaking a couple of comments.  One thing that I have been wondering
after-the-fact is whether it would be cleaner to return the contents
of the backup history file or backup_label as a char rather than a
StringInfo?  This simplifies a bit what the callers of
build_backup_content() need to do.
--
Michael

Attachment
On Mon, Sep 26, 2022 at 8:13 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> What I had at hand seemed fine on a second look, so applied after
> tweaking a couple of comments.  One thing that I have been wondering
> after-the-fact is whether it would be cleaner to return the contents
> of the backup history file or backup_label as a char rather than a
> StringInfo?  This simplifies a bit what the callers of
> build_backup_content() need to do.

+1 because callers don't use returned StringInfo structure outside of
build_backup_content(). The patch looks good to me. I think it will be
good to add a note about the caller freeing up the retired string's
memory [1], just in case.

[1]
 * Returns the result generated as a palloc'd string. It is the caller's
 * responsibility to free the returned string's memory.
 */
char *
build_backup_content(BackupState *state, bool ishistoryfile)
{

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



At Mon, 26 Sep 2022 11:57:58 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in 
> On Mon, Sep 26, 2022 at 8:13 AM Michael Paquier <michael@paquier.xyz> wrote:
> >
> > What I had at hand seemed fine on a second look, so applied after
> > tweaking a couple of comments.  One thing that I have been wondering
> > after-the-fact is whether it would be cleaner to return the contents
> > of the backup history file or backup_label as a char rather than a
> > StringInfo?  This simplifies a bit what the callers of
> > build_backup_content() need to do.
> 
> +1 because callers don't use returned StringInfo structure outside of
> build_backup_content(). The patch looks good to me. I think it will be
> good to add a note about the caller freeing up the retired string's
> memory [1], just in case.

Doesn't the following (from you :) work?

+ * Returns the result generated as a palloc'd string.

This suggests no need for pfree if the caller properly destroys the
context or pfree is needed otherwise. In this case, the active memory
contexts are "ExprContext" and "Replication command context" so I
think we actually do not need to pfree it but I don't mean we sholnd't
do that in this patch (since those contexts are somewhat remote from
what the function does and pfree doesn't matter at all here.).

> [1]
>  * Returns the result generated as a palloc'd string. It is the caller's
>  * responsibility to free the returned string's memory.
>  */
> char *
> build_backup_content(BackupState *state, bool ishistoryfile)
> {

+1.  A nitpick.

-    if (strcmp(backupfrom, "standby") == 0 && !backup_started_in_recovery)
+    if (state->started_in_recovery == true &&
+        backup_stopped_in_recovery == false)

Using == for Booleans may not be great?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



On Mon, Sep 26, 2022 at 05:10:16PM +0900, Kyotaro Horiguchi wrote:
> -    if (strcmp(backupfrom, "standby") == 0 && !backup_started_in_recovery)
> +    if (state->started_in_recovery == true &&
> +        backup_stopped_in_recovery == false)
>
> Using == for Booleans may not be great?

Yes.  That's why 7d70809 does not use the way proposed by the previous
patch.
--
Michael

Attachment
On Mon, Sep 26, 2022 at 11:57:58AM +0530, Bharath Rupireddy wrote:
> +1 because callers don't use returned StringInfo structure outside of
> build_backup_content(). The patch looks good to me.

Thanks for looking.

> I think it will be
> good to add a note about the caller freeing up the retired string's
> memory [1], just in case.

Not sure that this is worth it.  It is fine to use palloc() in a
dedicated memory context, for one.
--
Michael

Attachment
This commit introduced BackupState struct. The comment of
do_pg_backup_start says that:

> * It fills in backup_state with the information required for the backup,

And the parameters are:

> do_pg_backup_start(const char *backupidstr, bool fast, List **tablespaces,
>                    BackupState *state, StringInfo tblspcmapfile)

So backup_state is different from both the type BackupState and the
parameter state.  I find it annoying.  Don't we either rename the
parameter or fix the comment?

The parameter "state" sounds a bit too generic. So I prefer to rename
the parameter to backup_state, as the attached.

What do you think about this?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment
On Tue, Sep 27, 2022 at 1:54 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> This commit introduced BackupState struct. The comment of
> do_pg_backup_start says that:
>
> > * It fills in backup_state with the information required for the backup,
>
> And the parameters are:
>
> > do_pg_backup_start(const char *backupidstr, bool fast, List **tablespaces,
> >                                  BackupState *state, StringInfo tblspcmapfile)
>
> So backup_state is different from both the type BackupState and the
> parameter state.  I find it annoying.  Don't we either rename the
> parameter or fix the comment?
>
> The parameter "state" sounds a bit too generic. So I prefer to rename
> the parameter to backup_state, as the attached.
>
> What do you think about this?

-1 from me. We have the function context and the structure name there
to represent that the parameter name 'state' is actually 'backup
state'. I don't think we gain anything here. Whenever the BackupState
is used away from any function, the variable name backup_state is
used, static variable in xlogfuncs.c

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com