Thread: Enhance traceability of wal_level changes for backup management

Enhance traceability of wal_level changes for backup management

From
"osumi.takamichi@fujitsu.com"
Date:
Hi,


This thread came from another thread about wal_level [1].

Mainly from backup management tools perspective
such as pgBackRest, EDB's BART and pg_probackup,
it seems worth talking about a way comprehensively
to trace and recognize wal_level changes for various purposes and values
like necessity of invalidating old backups for example.

In the thread [1], I talk about wal_level='none' but
these kind of topic applies changing wal_level to 'minimal'
from higher level too. Accordingly, I made this topic as a new independent thread.

Currently, these backup management tools described above
work when wal_level is higher than minimal
because these use physical online backup or wal archiving
but giving any useful ideas for backup management
related to wal_level changes is welcomed.

[1] -
https://www.postgresql.org/message-id/TYAPR01MB29901EBE5A3ACCE55BA99186FE320%40TYAPR01MB2990.jpnprd01.prod.outlook.com

Best Regards,
    Takamichi Osumi



RE: Enhance traceability of wal_level changes for backup management

From
"osumi.takamichi@fujitsu.com"
Date:
Hello, Sawada-san


I'll continue the discussion of [2].
We talked about how to recognize the time or LSN
when/where wal_level is changed to 'none' there.

You said
> The use case I imagined is that the user temporarily
> changes wal_level to 'none' from 'replica' or 'logical' to speed up loading and
> changes back to the normal. In this case, the backups taken before the
> wal_level change cannot be used to restore the database to the point after the
> wal_level change. So I think backup management tools would want to
> recognize the time or LSN when/where wal_level is changed to ‘none’ in order
> to do some actions such as invalidating older backups, re-calculating backup
> redundancy etc.
> Actually the same is true when the user changes to ‘minimal’. The tools would
> need to recognize the time or LSN in this case too. Since temporarily changing
> wal_level has been an uncommon use case some tools perhaps are not aware
> of that yet. But since introducing wal_level = ’none’ could make the change
> common, I think we need to provide a way for the tools.

I wondered, couldn't backup management tools utilize the information
in the backup that is needed to be made when wal_level is changed to "none" for example ?

As I said before, existing backup management tools support
only wal_level=replica or logical at present. And, if they would wish to alter the
status quo and want to cover the changes over wal_levels, I felt it's natural that
they support feature like taking a full backup, trigged by the wal_level changes (or server stop).

This is because taking a backup is a must for wal_level=none,
as I described in the patch of wal_level=none.
For example, they could prepare an easy way to
take an offline physical backup when the server stops for changing the wal_level.
(Here, they can support the change to minimal, too.)

What did you think ?

[2] - https://www.postgresql.org/message-id/CAD21AoCotoAxxCmMVz6niwg4j6c3Er_-GboTLmHBft8pALpOGA%40mail.gmail.com

Best Regards,
    Takamichi Osumi



Re: Enhance traceability of wal_level changes for backup management

From
Stephen Frost
Date:
Greetings,

* osumi.takamichi@fujitsu.com (osumi.takamichi@fujitsu.com) wrote:
> You said
> > The use case I imagined is that the user temporarily
> > changes wal_level to 'none' from 'replica' or 'logical' to speed up loading and
> > changes back to the normal. In this case, the backups taken before the
> > wal_level change cannot be used to restore the database to the point after the
> > wal_level change. So I think backup management tools would want to
> > recognize the time or LSN when/where wal_level is changed to ‘none’ in order
> > to do some actions such as invalidating older backups, re-calculating backup
> > redundancy etc.
> > Actually the same is true when the user changes to ‘minimal’. The tools would
> > need to recognize the time or LSN in this case too. Since temporarily changing
> > wal_level has been an uncommon use case some tools perhaps are not aware
> > of that yet. But since introducing wal_level = ’none’ could make the change
> > common, I think we need to provide a way for the tools.

I continue to be against the idea of introducing another wal_level.  If
there's additional things we can do to reduce WAL traffic while we
continue to use it to keep the system in a consistent state then we
should implement those for the 'minimal' case and be done with it.
Adding another wal_level is just going to be confusing and increase
complexity, and the chances that someone will end up in a bad state.

> I wondered, couldn't backup management tools utilize the information
> in the backup that is needed to be made when wal_level is changed to "none" for example ?

What information is that, exactly?  If there's a way to detect that the
wal_level has been flipped to 'minimal' and then back to 'replica',
other than scanning the WAL, we'd certainly like to hear of it, so we
can implement logic in pgbackrest to detect that happening.  I'll admit
that I've not gone hunting for such of late, but I don't recall seeing
anything that would change that either.

The idea proposed above about having the LSN and time recorded when a
wal_level change to minimal happens, presumably in pg_control, seems
like it could work as a way to allow external tools to more easily
figure out if the wal_level's been changed to minimal.  Perhaps there's
a reason to track changes between replica and logical but I can't think
of any offhand and backup tools would still need to know if the wal
level was set to *minimal* independently of changes between replica and
logical.

Then again, once we add support for scanning the WAL to pgbackrest,
we'll almost certainly track it that way since that'll work for older
and released versions of PG, so I'm not really sure it's worth it to add
this to pg_control unless there's other reasons to.

> As I said before, existing backup management tools support
> only wal_level=replica or logical at present. And, if they would wish to alter the
> status quo and want to cover the changes over wal_levels, I felt it's natural that
> they support feature like taking a full backup, trigged by the wal_level changes (or server stop).

Sure, but there needs to be a way to actually do that..

> This is because taking a backup is a must for wal_level=none,
> as I described in the patch of wal_level=none.
> For example, they could prepare an easy way to
> take an offline physical backup when the server stops for changing the wal_level.
> (Here, they can support the change to minimal, too.)

pgbackrest does support offline physical backups and it's pretty easy
(just pass in --no-online).  That doesn't really help with the issue of
detecting a wal_level change though.

Thanks,

Stephen

Attachment

RE: Enhance traceability of wal_level changes for backup management

From
"tsunakawa.takay@fujitsu.com"
Date:
From: osumi.takamichi@fujitsu.com <osumi.takamichi@fujitsu.com>
> I wondered, couldn't backup management tools utilize the information
> in the backup that is needed to be made when wal_level is changed to "none"
> for example ?

IIRC, someone proposed in the original thread that the change count can be recorded in pg_control.  The change count is
incrementedwhen wal_level is changed from replica or higher to minimal or lower.  Maybe you can do it easily in
XLogReportParameters().

Then, the backup management tool compares the change counts of pg_control in a backup and that of the current
pg_control. If the change count is different, the tool assumes that the backup cannot be used to recover the database
upto date. 

Ideally, it'd be desirable for PostgreSQL core to have a backup catalog management capability like Oracle RMAN.  Then,
whenthe wal_level is changed, Postgres may be able to invalidate all backups in the backup catalog. 


> As I said before, existing backup management tools support
> only wal_level=replica or logical at present. And, if they would wish to alter the
> status quo and want to cover the changes over wal_levels, I felt it's natural that
> they support feature like taking a full backup, trigged by the wal_level changes
> (or server stop).

In that regard, a feature like Oracle Server Alert would be useful.  When important events occur, the database server
recordsthem in the alert queue.  Administration tools read from the alert queue and act accordingly.  wal_level change
canbe recorded in the alert queue, and the backup management tool polls the queue and detect the change. 


Regards
Takayuki Tsunakawa





RE: Enhance traceability of wal_level changes for backup management

From
"osumi.takamichi@fujitsu.com"
Date:
Hi Stephen


Thank you so much for replying !
On Thursday, January 7, 2021 2:40 AM Stephen Frost <sfrost@snowman.net> wrote:
> * osumi.takamichi@fujitsu.com (osumi.takamichi@fujitsu.com) wrote:
> > You said
> > > The use case I imagined is that the user temporarily changes
> > > wal_level to 'none' from 'replica' or 'logical' to speed up loading
> > > and changes back to the normal. In this case, the backups taken
> > > before the wal_level change cannot be used to restore the database
> > > to the point after the wal_level change. So I think backup
> > > management tools would want to recognize the time or LSN when/where
> > > wal_level is changed to ‘none’ in order to do some actions such as
> > > invalidating older backups, re-calculating backup redundancy etc.
> > > Actually the same is true when the user changes to ‘minimal’. The
> > > tools would need to recognize the time or LSN in this case too.
> > > Since temporarily changing wal_level has been an uncommon use case
> > > some tools perhaps are not aware of that yet. But since introducing
> > > wal_level = ’none’ could make the change common, I think we need to
> provide a way for the tools.
> 
> I continue to be against the idea of introducing another wal_level.  If there's
> additional things we can do to reduce WAL traffic while we continue to use it to
> keep the system in a consistent state then we should implement those for the
> 'minimal' case and be done with it.
The new wal_level=none patch achieves something that cannot be done
or cannot be implemented together with wal_level='minimal' clearly.
Did you have a look at the peformance evaluation result that I conducted in [1] ?
It proved that data loading of 'none' is much faster than that of 'minimal'.


> Adding another wal_level is just going to be confusing and increase complexity,
> and the chances that someone will end up in a bad state.
Even if when we committed another idea,
that is "ALTER TABLE tbl SET UNLOGGED/LOGGED without copying relation data",
the complexity like taking a full backup before bulk data loading didn't change
and when any accidents happened during no wal logging for specific table with the improvement,
user would need to start from the backup again. This looks same to me.
Additionally, the patch itself in that thread is big and more complicated.
The complexity you meant is the wal_level's impact to backup management tools or anything else ?

> > I wondered, couldn't backup management tools utilize the information
> > in the backup that is needed to be made when wal_level is changed to "none"
> for example ?
> 
> What information is that, exactly?  If there's a way to detect that the wal_level
> has been flipped to 'minimal' and then back to 'replica', other than scanning the
> WAL, we'd certainly like to hear of it, so we can implement logic in pgbackrest
> to detect that happening.  I'll admit that I've not gone hunting for such of late,
> but I don't recall seeing anything that would change that either.
Excuse me for making you confused.
I was thinking about control file in the backup as information.

I'm not familiar with the internals of those backup management tools
but do they monitor the control file and its values of the runnning server at short intervals ?
And, if they don't do so and if we want accurate time or LSN that indicates wal_level changes,
I thought we could pick up exact information from control file of cluster directory or its backup
during server stop (because changing wal_level requires server stop).
That's all. Sorry for the noise.

> The idea proposed above about having the LSN and time recorded when a
> wal_level change to minimal happens, presumably in pg_control, seems like it
> could work as a way to allow external tools to more easily figure out if the
> wal_level's been changed to minimal.  Perhaps there's a reason to track
> changes between replica and logical but I can't think of any offhand and backup
> tools would still need to know if the wal level was set to *minimal*
> independently of changes between replica and logical.
Here, probably we use different assumptions.
What you say makes sense if we commit neither the patch of
ALTER TABLE SET UNLOGGED/LOGGED without copy (during wal_level=minimal)
nor the patch of new wal_level 'none' ?
We were talking about a case that fast bulk data loading is needed
and the user changes wal_level to 'none' or 'minimal' temporarily
from higher level in order to speed up data loading and make it back to the higher level again.
After this operation, we need to invalidate the old backups taken before the data loading.
In this case, we have to track the change to 'minimal'. Didn't it make sense ?

> Then again, once we add support for scanning the WAL to pgbackrest, we'll
> almost certainly track it that way since that'll work for older and released
> versions of PG, so I'm not really sure it's worth it to add this to pg_control
> unless there's other reasons to.
> 
> > As I said before, existing backup management tools support only
> > wal_level=replica or logical at present. And, if they would wish to
> > alter the status quo and want to cover the changes over wal_levels, I
> > felt it's natural that they support feature like taking a full backup, trigged by
> the wal_level changes (or server stop).
> 
> Sure, but there needs to be a way to actually do that..
> 
> > This is because taking a backup is a must for wal_level=none, as I
> > described in the patch of wal_level=none.
> > For example, they could prepare an easy way to take an offline
> > physical backup when the server stops for changing the wal_level.
> > (Here, they can support the change to minimal, too.)
> 
> pgbackrest does support offline physical backups and it's pretty easy (just pass
> in --no-online).  That doesn't really help with the issue of detecting a wal_level
> change though.
Thanks for the tip,


[1] -
https://www.postgresql.org/message-id/OSBPR01MB48887ECC140A97494C542264EDF40%40OSBPR01MB4888.jpnprd01.prod.outlook.com

Best Regards,
    Takamichi Osumi

Re: Enhance traceability of wal_level changes for backup management

From
Stephen Frost
Date:
Greetings,

* osumi.takamichi@fujitsu.com (osumi.takamichi@fujitsu.com) wrote:
> On Thursday, January 7, 2021 2:40 AM Stephen Frost <sfrost@snowman.net> wrote:
> > * osumi.takamichi@fujitsu.com (osumi.takamichi@fujitsu.com) wrote:
> > > You said
> > > > The use case I imagined is that the user temporarily changes
> > > > wal_level to 'none' from 'replica' or 'logical' to speed up loading
> > > > and changes back to the normal. In this case, the backups taken
> > > > before the wal_level change cannot be used to restore the database
> > > > to the point after the wal_level change. So I think backup
> > > > management tools would want to recognize the time or LSN when/where
> > > > wal_level is changed to ‘none’ in order to do some actions such as
> > > > invalidating older backups, re-calculating backup redundancy etc.
> > > > Actually the same is true when the user changes to ‘minimal’. The
> > > > tools would need to recognize the time or LSN in this case too.
> > > > Since temporarily changing wal_level has been an uncommon use case
> > > > some tools perhaps are not aware of that yet. But since introducing
> > > > wal_level = ’none’ could make the change common, I think we need to
> > provide a way for the tools.
> >
> > I continue to be against the idea of introducing another wal_level.  If there's
> > additional things we can do to reduce WAL traffic while we continue to use it to
> > keep the system in a consistent state then we should implement those for the
> > 'minimal' case and be done with it.

> The new wal_level=none patch achieves something that cannot be done
> or cannot be implemented together with wal_level='minimal' clearly.

I disagree.

> Did you have a look at the peformance evaluation result that I conducted in [1] ?
> It proved that data loading of 'none' is much faster than that of 'minimal'.

That test claims to have generated 10G worth of WAL, which makes it seem
pretty likely that you didn't use UNLOGGED tables, and didn't create the
table in the same transaction as you performed the data loading for that
table, so, naturally, you got the full amount of WAL.

> > Adding another wal_level is just going to be confusing and increase complexity,
> > and the chances that someone will end up in a bad state.
> Even if when we committed another idea,
> that is "ALTER TABLE tbl SET UNLOGGED/LOGGED without copying relation data",
> the complexity like taking a full backup before bulk data loading didn't change
> and when any accidents happened during no wal logging for specific table with the improvement,
> user would need to start from the backup again. This looks same to me.

Yes, there is still the issue that running with wal_level = minimal for
a period of time means that you can't perform PITR from before the
change to minimal through to the time after it's been changed back to
something higher than minimal.  That's no different, I agree.  That
isn't an argument to introduce another WAL level though.

> Additionally, the patch itself in that thread is big and more complicated.

This isn't a reason to avoid introducing another wal_level.

> The complexity you meant is the wal_level's impact to backup management tools or anything else ?

The complexity of having another wal_level and having to document it and
explain how it behaves to users, particularly when, effectively, it
shouldn't actually be any different from wal_level = minimal, assuming
we've found and implemented the interesting optimizations regarding
wal_level = minimal.

> > > I wondered, couldn't backup management tools utilize the information
> > > in the backup that is needed to be made when wal_level is changed to "none"
> > for example ?
> >
> > What information is that, exactly?  If there's a way to detect that the wal_level
> > has been flipped to 'minimal' and then back to 'replica', other than scanning the
> > WAL, we'd certainly like to hear of it, so we can implement logic in pgbackrest
> > to detect that happening.  I'll admit that I've not gone hunting for such of late,
> > but I don't recall seeing anything that would change that either.
> Excuse me for making you confused.
> I was thinking about control file in the backup as information.
>
> I'm not familiar with the internals of those backup management tools
> but do they monitor the control file and its values of the runnning server at short intervals ?
> And, if they don't do so and if we want accurate time or LSN that indicates wal_level changes,
> I thought we could pick up exact information from control file of cluster directory or its backup
> during server stop (because changing wal_level requires server stop).
> That's all. Sorry for the noise.

If it's in the control file then pg_controldata should be able to pull
it ount and all of the backup tools should be able to manage that, even
if they can't read it directly like tools which are also written in C,
so having it in the control file seems alright to me.

> > The idea proposed above about having the LSN and time recorded when a
> > wal_level change to minimal happens, presumably in pg_control, seems like it
> > could work as a way to allow external tools to more easily figure out if the
> > wal_level's been changed to minimal.  Perhaps there's a reason to track
> > changes between replica and logical but I can't think of any offhand and backup
> > tools would still need to know if the wal level was set to *minimal*
> > independently of changes between replica and logical.

> Here, probably we use different assumptions.
> What you say makes sense if we commit neither the patch of
> ALTER TABLE SET UNLOGGED/LOGGED without copy (during wal_level=minimal)
> nor the patch of new wal_level 'none' ?

Yes, it does.

> We were talking about a case that fast bulk data loading is needed
> and the user changes wal_level to 'none' or 'minimal' temporarily
> from higher level in order to speed up data loading and make it back to the higher level again.
> After this operation, we need to invalidate the old backups taken before the data loading.
> In this case, we have to track the change to 'minimal'. Didn't it make sense ?

The old backups don't actually need to be invaliated, but we want to
strongly encourage a new backup be taken as soon as possible since WAL
replay from an older backup through this period would end up having the
wrong result.  We really shouldn't need to track anything more than the
"LSN when wal_level was last seen to be less than replica" or something
along those lines.  I was questioning if there was any need to track the
changes to wal_level at a more granular level- none come to mind for me,
but perhaps someone else sees a reason to do so.

Thanks,

Stephen

Attachment

RE: Enhance traceability of wal_level changes for backup management

From
"osumi.takamichi@fujitsu.com"
Date:
Hi


Apologies for my delay.
On Wednesday, January 6, 2021 7:03 PM I wrote:
> I'll continue the discussion of [2].
> We talked about how to recognize the time or LSN when/where wal_level is
> changed to 'none' there.
>
> You said
> > The use case I imagined is that the user temporarily changes wal_level
> > to 'none' from 'replica' or 'logical' to speed up loading and changes
> > back to the normal. In this case, the backups taken before the
> > wal_level change cannot be used to restore the database to the point
> > after the wal_level change. So I think backup management tools would
> > want to recognize the time or LSN when/where wal_level is changed to
> > ‘none’ in order to do some actions such as invalidating older backups,
> > re-calculating backup redundancy etc.
> > Actually the same is true when the user changes to ‘minimal’. The
> > tools would need to recognize the time or LSN in this case too. Since
> > temporarily changing wal_level has been an uncommon use case some
> > tools perhaps are not aware of that yet. But since introducing
> > wal_level = ’none’ could make the change common, I think we need to
> provide a way for the tools.
Before my implementation, I'd like to confirm something.

As of now, I think there are two major ideas already.
I think to implement the 1st idea suffices.
If no one disagree with it, I'll proceed with (1) below.

(1) writing the time or LSN in the control file
to indicate when/where wal_level is changed to 'minimal'
from upper level to invalidate the old backups or make alerts to users.
I think we reset this when user executes pg_basebackup successfully.

(2) implementing incremental counters that indicates
drop of wal_level from replica to minimal(or between other levels).
Its purpose was to compare the wal_level changes between snapshots.
When any monitoring tools detect any difference of the counter,
we can predict something happened immediately without checking WAL in between.

The former could give accureate information for backup management
while the latter gives easier way to compare snapshots, I think.

By the way, thankfully I got advice to refer to
Oracle's feature such as Oracle Server Alert or
backup catalog management capability from Tsunakawa-San.
However, because those development would be huge, then
I'd like to choose either the first one or the second one
and for the purpose to give better information, I prefer the first one.

Any comments ?

Best Regards,
    Takamichi Osumi




RE: Enhance traceability of wal_level changes for backup management

From
"osumi.takamichi@fujitsu.com"
Date:
Hello


On Thursday, January 21, 2021 11:19 PM I wrote:
> If no one disagree with it, I'll proceed with (1) below.
>
> (1) writing the time or LSN in the control file to indicate when/where wal_level
> is changed to 'minimal'
> from upper level to invalidate the old backups or make alerts to users.
I attached the first patch which implementes this idea.
It was aligned by pgindent and shows no regression.

Best Regards,
    Takamichi Osumi


Attachment

Re: Enhance traceability of wal_level changes for backup management

From
Stephen Frost
Date:
Greetings,

* tsunakawa.takay@fujitsu.com (tsunakawa.takay@fujitsu.com) wrote:
> From: David Steele <david@pgmasters.net>
> > As a backup software author, I don't see this feature as very useful.
> >
> > The problem is that there are lots of ways for WAL to go missing so
> > monitoring the WAL archive for gaps is essential and this feature would
> > not replace that requirement. The only extra information you'd get is
> > the ability to classify the most recent gap as "intentional", maybe.
>
> But how do you know there's any missing WAL?  I think there are the following cases of missing WAL:

> 1. A WAL segment file is missing.  e.g., 00000001 and 00000003 exist, but 00000002 doesn't.
>
> 2. All consecutive WAL segment files appear to exist, but some WAL records are missing.
> This occurs ?only? when some WAL-optimized statements are run while wal_level = minimal.
>
> Currently, backup management tools can detect 1 by scanning through the WAL archive directory.  But the can't notice
2. The patch addresses this. 

They could notice #2 also by scanning the WAL, but that's certainly a
lot more work than just looking in pg_control.

* Peter Eisentraut (peter.eisentraut@enterprisedb.com) wrote:
> On 08.03.21 03:45, osumi.takamichi@fujitsu.com wrote:
> >OK. The basic idea is to enable backup management
> >tools to recognize wal_level drop between*snapshots*.
> >When you have a snapshot of the cluster at one time and another one
> >at different time, with this new parameter, you can see if
> >anything that causes discontinuity from the drop happens
> >in the middle of the two snapshots without efforts to have a look at the WALs in between.
>
> Is this an actual problem?  Changing wal_level requires a restart.  Are
> users frequently restarting their servers to change wal_level and then
> wonder why their backups are misbehaving or incomplete?  Why?  Just like
> fsync is "breaks your database", wal_level might as well be "breaks your
> backups".  Is it not documented well enough?

We explicitly document that people can switch the WAL level and restart
to do bulk data loads faster, and there's certainly no shortage of
discussion (including what prompted this thread..) about doing exactly
that.  Adding more documentation around that would certainly be good,
as would changing this:

ereport(WARNING,
(errmsg("WAL was generated with wal_level=minimal, data may be missing"),
errhint("This happens if you temporarily set wal_level=minimal without taking a new base backup.")));

into a PANIC instead of a WARNING.  It's simply far too easy to end up
with corruption in the system when doing PITR through a period of time
when the WAL level was set to minimal.  Unfortunately, if the user
didn't happen to know that they needed to take a new full backup after
flipping to minimal and back then they could end up with corruption at
restore/replay time which is certainly not when you want anything to go
wrong.  If it was available in the control file then we could more
proactively make noise at the user to take a new full backup.

Thanks,

Stephen

Attachment

Re: Enhance traceability of wal_level changes for backup management

From
vignesh C
Date:
On Thu, Jan 28, 2021 at 6:14 AM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:
>
> Hello
>
>
> On Thursday, January 21, 2021 11:19 PM I wrote:
> > If no one disagree with it, I'll proceed with (1) below.
> >
> > (1) writing the time or LSN in the control file to indicate when/where wal_level
> > is changed to 'minimal'
> > from upper level to invalidate the old backups or make alerts to users.
> I attached the first patch which implementes this idea.
> It was aligned by pgindent and shows no regression.

The patch does not apply on Head anymore, could you rebase and post a
patch. I'm changing the status to "Waiting for Author".

Regards,
Vignesh