Thread: Is WAL_DEBUG related code still relevant today?

Is WAL_DEBUG related code still relevant today?

From
Bharath Rupireddy
Date:
Hi,

I was recently looking at the code around the WAL_DEBUG macro and GUC.
When enabled, the code does the following:

1. Creates a memory context that allows pallocs within critical sections.
2. Decodes (not logical decoding but DecodeXLogRecord()) every WAL
record using the above memory context that's generated in the server
and emits a LOG message.
3. Emits messages at DEBUG level in AdvanceXLInsertBuffer(),  at LOG
level in XLogFlush(), at LOG level in XLogBackgroundFlush().
4. Emits messages at LOG level for every record that the server
replays/applies in the main redo loop.

I enabled this code by compiling with the WAL_DEBUG macro and setting
wal_debug GUC to on. Firstly, the compilation on Windows failed
because XL_ROUTINE was passed inappropriately for XLogReaderAllocate()
used. After fixing the compilation issue [1], the TAP tests started to
fail [2] which I'm sure we can fix.

I started to think if this code is needed at all in production. How
about we do either of the following?

a) Remove the WAL_DEBUG macro and move all the code under the
wal_debug GUC? Since the GUC is already marked as DEVELOPER_OPTION,
the users will know the consequences of enabling it in production.
b) Remove both the WAL_DEBUG macro and the wal_debug GUC. I don't
think (2) is needed to be in core especially when tools like
pg_walinspect and pg_waldump can do the same job. And, the messages in
(3) and (4) can be turned to some DEBUGX level without being under the
WAL_DEBUG macro.

I have no idea if anyone uses WAL_DEBUG macro and wal_debug GUCs in
production, if we have somebody using it, I think we need to fix the
compilation and test failure issues, and start testing this code
(perhaps I can think of setting up a buildfarm member to help here).

I'm in favour of option (b), but I'd like to hear more thoughts on this.

[1]
diff --git a/src/backend/access/transam/xlog.c
b/src/backend/access/transam/xlog.c
index ca7100d4db..52633793d4 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -1023,8 +1023,12 @@ XLogInsertRecord(XLogRecData *rdata,

palloc(DecodeXLogRecordRequiredSpace(record->xl_tot_len));

                if (!debug_reader)
-                       debug_reader =
XLogReaderAllocate(wal_segment_size, NULL,
-
                   XL_ROUTINE(), NULL);
+                       debug_reader = XLogReaderAllocate(wal_segment_size,
+
                   NULL,
+
                   XL_ROUTINE(.page_read = NULL,
+
                                          .segment_open = NULL,
+
                                          .segment_close = NULL),
+
                                          NULL);

[2]
src/test/subscription/t/029_on_error.pl because the test gets LSN from
an error context message emitted to server logs which the new
WAL_DEBUG LOG messages flood the server logs with.
src/bin/initdb/t/001_initdb.pl because the WAL_DEBUG LOG messages are
emitted to the console while initdb.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Is WAL_DEBUG related code still relevant today?

From
Nathan Bossart
Date:
On Sat, Dec 02, 2023 at 07:36:29PM +0530, Bharath Rupireddy wrote:
> I started to think if this code is needed at all in production. How
> about we do either of the following?

Well, the fact that this code is hidden behind an off-by-default macro
seems like a pretty strong indicator that it is not intended for
production.  But that doesn't mean we should remove it. 

> a) Remove the WAL_DEBUG macro and move all the code under the
> wal_debug GUC? Since the GUC is already marked as DEVELOPER_OPTION,
> the users will know the consequences of enabling it in production.

I think the key to this option is verifying there's no measurable
performance impact.

> b) Remove both the WAL_DEBUG macro and the wal_debug GUC. I don't
> think (2) is needed to be in core especially when tools like
> pg_walinspect and pg_waldump can do the same job. And, the messages in
> (3) and (4) can be turned to some DEBUGX level without being under the
> WAL_DEBUG macro.

Is there anything provided by wal_debug that can't be found via
pg_walinspect/pg_waldump?

> I have no idea if anyone uses WAL_DEBUG macro and wal_debug GUCs in
> production, if we have somebody using it, I think we need to fix the
> compilation and test failure issues, and start testing this code
> (perhaps I can think of setting up a buildfarm member to help here).

+1 for at least fixing the code and tests, provided we decide to keep it.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: Is WAL_DEBUG related code still relevant today?

From
Tom Lane
Date:
Nathan Bossart <nathandbossart@gmail.com> writes:
> On Sat, Dec 02, 2023 at 07:36:29PM +0530, Bharath Rupireddy wrote:
>> I started to think if this code is needed at all in production. How
>> about we do either of the following?

> Well, the fact that this code is hidden behind an off-by-default macro
> seems like a pretty strong indicator that it is not intended for
> production.  But that doesn't mean we should remove it. 

Agreed, production is not the question here.  The question is whether
it's of any use to developers either.  It looks to me that the code's
been broken since v13, if not before, which very strongly suggests
that nobody is using it.  Think I'd vote for nuking it rather than
putting effort into fixing it.

            regards, tom lane



Re: Is WAL_DEBUG related code still relevant today?

From
Bharath Rupireddy
Date:
On Sun, Dec 3, 2023 at 4:16 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Nathan Bossart <nathandbossart@gmail.com> writes:
> > On Sat, Dec 02, 2023 at 07:36:29PM +0530, Bharath Rupireddy wrote:
> >> I started to think if this code is needed at all in production. How
> >> about we do either of the following?
>
> > Well, the fact that this code is hidden behind an off-by-default macro
> > seems like a pretty strong indicator that it is not intended for
> > production.  But that doesn't mean we should remove it.
>
> Agreed, production is not the question here.  The question is whether
> it's of any use to developers either.  It looks to me that the code's
> been broken since v13, if not before, which very strongly suggests
> that nobody is using it.  Think I'd vote for nuking it rather than
> putting effort into fixing it.

How about something like the attached? Please see the commit message
for more detailed information.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

Re: Is WAL_DEBUG related code still relevant today?

From
Bharath Rupireddy
Date:
On Sun, Dec 3, 2023 at 4:00 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Sat, Dec 02, 2023 at 07:36:29PM +0530, Bharath Rupireddy wrote:
> > I started to think if this code is needed at all in production. How
> > about we do either of the following?
>
> Well, the fact that this code is hidden behind an off-by-default macro
> seems like a pretty strong indicator that it is not intended for
> production.  But that doesn't mean we should remove it.

I think all that the WAL_DEBUG code offers can be achieved with other
means after adjusting a few pieces. Please see the commit message in
the v1 patch posted in this thread at
https://www.postgresql.org/message-id/CALj2ACW5zPMT09eqXvh2Ve7Kz02HShTwyjG%2BxTzkrzeKtQMnQQ%40mail.gmail.com.

> > a) Remove the WAL_DEBUG macro and move all the code under the
> > wal_debug GUC? Since the GUC is already marked as DEVELOPER_OPTION,
> > the users will know the consequences of enabling it in production.
>
> I think the key to this option is verifying there's no measurable
> performance impact.

FWIW, enabling this has a huge impact in production. For instance,
recovery TAP tests are ~10% slower with the WAL_DEBUG macro enabled. I
don't think we go the route of keeping this code.

WAL_DEBUG macro enabled:
All tests successful.
Files=38, Tests=531, 157 wallclock secs ( 0.18 usr  0.05 sys + 14.96
cusr 16.11 csys = 31.30 CPU)
Result: PASS

HEAD:
All tests successful.
Files=38, Tests=531, 143 wallclock secs ( 0.15 usr  0.06 sys + 14.24
cusr 15.62 csys = 30.07 CPU)
Result: PASS

> > b) Remove both the WAL_DEBUG macro and the wal_debug GUC. I don't
> > think (2) is needed to be in core especially when tools like
> > pg_walinspect and pg_waldump can do the same job. And, the messages in
> > (3) and (4) can be turned to some DEBUGX level without being under the
> > WAL_DEBUG macro.
>
> Is there anything provided by wal_debug that can't be found via
> pg_walinspect/pg_waldump?

I don't think so. The WAL record decoding can be achieved with
pg_walinspect or pg_waldump. The page comparison check in
generic_xlog.c can be moved under USE_ASSERT_CHECKING macro. PSA v1
patch posted in this thread.

> > I have no idea if anyone uses WAL_DEBUG macro and wal_debug GUCs in
> > production, if we have somebody using it, I think we need to fix the
> > compilation and test failure issues, and start testing this code
> > (perhaps I can think of setting up a buildfarm member to help here).
>
> +1 for at least fixing the code and tests, provided we decide to keep it.

With no real use of this code in production, instead of fixing
compilation issues and TAP test failures to maintain the code, I think
it's better to adjust a few pieces and remove the other stuff like in
the attached v1 patch.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Is WAL_DEBUG related code still relevant today?

From
Noah Misch
Date:
On Sun, Dec 03, 2023 at 08:30:24PM +0530, Bharath Rupireddy wrote:
> On Sun, Dec 3, 2023 at 4:00 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
> > On Sat, Dec 02, 2023 at 07:36:29PM +0530, Bharath Rupireddy wrote:
> > > b) Remove both the WAL_DEBUG macro and the wal_debug GUC. I don't
> > > think (2) is needed to be in core especially when tools like
> > > pg_walinspect and pg_waldump can do the same job. And, the messages in
> > > (3) and (4) can be turned to some DEBUGX level without being under the
> > > WAL_DEBUG macro.
> >
> > Is there anything provided by wal_debug that can't be found via
> > pg_walinspect/pg_waldump?
> 
> I don't think so. The WAL record decoding can be achieved with
> pg_walinspect or pg_waldump.

Can be, but the WAL_DEBUG model is mighty convenient:
- Cooperates with backtrace_functions
- Change log_line_prefix to correlate any log_line_prefix fact with WAL records
- See WAL records interleaved with non-WAL log messages

> > > I have no idea if anyone uses WAL_DEBUG macro and wal_debug GUCs in
> > > production, if we have somebody using it, I think we need to fix the

I don't use it in production, but I use it more than any other of our many
DEBUG macros.

> > > compilation and test failure issues, and start testing this code
> > > (perhaps I can think of setting up a buildfarm member to help here).

WAL_DEBUG compiles and works just fine on GNU/Linux.  I'm not surprised the
failure to compile on Windows has escaped notice, because Windows-specific WAL
behaviors are so rare.  We consistently do our WAL-related development on
non-Windows.  Needless to say, I wouldn't object to fixing WAL_DEBUG for
Windows.

Fixing tests is less valuable, especially since it's clear when a test fails
through extra messages the test didn't expect.  I bet other DEBUG macros make
some tests fail that way, which doesn't devalue those macros.  A test patch
might be okay nonetheless, but a buildfarm member is more likely to have
negative value.  It would create urgent work.  In the hypothetical buildfarm
member's absence, the project would be just fine if that work never happens.
A buildfarm member that compiles but doesn't test could be okay.



Re: Is WAL_DEBUG related code still relevant today?

From
Michael Paquier
Date:
On Sun, Dec 03, 2023 at 11:07:05AM -0800, Noah Misch wrote:
> Can be, but the WAL_DEBUG model is mighty convenient:
> - Cooperates with backtrace_functions
> - Change log_line_prefix to correlate any log_line_prefix fact with WAL records
> - See WAL records interleaved with non-WAL log messages
>
> I don't use it in production, but I use it more than any other of our many
> DEBUG macros.

So do I as a quick workaround to check the validity of records
generated without having to spawn a standby replaying the records.
Since 027_stream_regress.pl exists, I agree that its value has
decreased and that all patches should have queries to check their
records anyway, but it does not make it useless for developers.

> Fixing tests is less valuable, especially since it's clear when a test fails
> through extra messages the test didn't expect.  I bet other DEBUG macros make
> some tests fail that way, which doesn't devalue those macros.  A test patch
> might be okay nonetheless, but a buildfarm member is more likely to have
> negative value.  It would create urgent work.  In the hypothetical buildfarm
> member's absence, the project would be just fine if that work never happens.
> A buildfarm member that compiles but doesn't test could be okay.

I can add the flag in one of my nix animals if we don't have any to
provide minimal coverage, that's not an issue for me.  I'd suggest to
just fix the build on Windows, this flag is a low maintenance burden.
--
Michael

Attachment

Re: Is WAL_DEBUG related code still relevant today?

From
Michael Paquier
Date:
On Mon, Dec 04, 2023 at 10:14:36AM +0900, Michael Paquier wrote:
> I can add the flag in one of my nix animals if we don't have any to
> provide minimal coverage, that's not an issue for me.  I'd suggest to
> just fix the build on Windows, this flag is a low maintenance burden.

Hearing nothing about that, I've reproduced the failure, checked that
the proposed fix is OK, and applied it down to 13 where this was
introduced.

Regarding the tests, like Noah, I am not really sure that it is worth
spending resources on fixing as they'd require wal_debug = on to
break.
--
Michael

Attachment

Re: Is WAL_DEBUG related code still relevant today?

From
Peter Eisentraut
Date:
On 02.12.23 15:06, Bharath Rupireddy wrote:
> I enabled this code by compiling with the WAL_DEBUG macro and setting
> wal_debug GUC to on. Firstly, the compilation on Windows failed
> because XL_ROUTINE was passed inappropriately for XLogReaderAllocate()
> used.

This kind of thing could be mostly avoided if we didn't hide all the 
WAL_DEBUG behind #ifdefs.  For example, in the attached patch, I instead 
changed it so that

     if (XLOG_DEBUG)

resolves to

     if (false)

in the normal case.  That way, we don't need to wrap that in #ifdef 
WAL_DEBUG, and the compiler can see the disabled code and make sure it 
continues to build.

Attachment

Re: Is WAL_DEBUG related code still relevant today?

From
"Euler Taveira"
Date:
On Wed, Dec 6, 2023, at 8:27 AM, Peter Eisentraut wrote:
On 02.12.23 15:06, Bharath Rupireddy wrote:
> I enabled this code by compiling with the WAL_DEBUG macro and setting
> wal_debug GUC to on. Firstly, the compilation on Windows failed
> because XL_ROUTINE was passed inappropriately for XLogReaderAllocate()
> used.

This kind of thing could be mostly avoided if we didn't hide all the 
WAL_DEBUG behind #ifdefs.

AFAICS LOCK_DEBUG also hides its GUCs behind #ifdefs. The fact that XLOG_DEBUG
is a variable but seems like a constant surprises me. I would rename it to
XLogDebug or xlog_debug.

in the normal case.  That way, we don't need to wrap that in #ifdef 
WAL_DEBUG, and the compiler can see the disabled code and make sure it 
continues to build.

I didn't check the LOCK_DEBUG code path to make sure it fits in the same
category as WAL_DEBUG. If it does, maybe it is worth to apply the same logic
there.


--
Euler Taveira

Re: Is WAL_DEBUG related code still relevant today?

From
Tom Lane
Date:
Peter Eisentraut <peter@eisentraut.org> writes:
> This kind of thing could be mostly avoided if we didn't hide all the 
> WAL_DEBUG behind #ifdefs.  For example, in the attached patch, I instead 
> changed it so that
>      if (XLOG_DEBUG)
> resolves to
>      if (false)
> in the normal case.  That way, we don't need to wrap that in #ifdef 
> WAL_DEBUG, and the compiler can see the disabled code and make sure it 
> continues to build.

Hmm, maybe, but I'm not sure this would be an unalloyed good.
The main concern I have is compilers and static analyzers starting
to bleat about unreachable code (warnings like "variable set but
never used", or the like, seem plausible).  The dead code would
also decrease our code coverage statistics, not that those are
wonderful now.

            regards, tom lane



Re: Is WAL_DEBUG related code still relevant today?

From
Michael Paquier
Date:
On Wed, Dec 06, 2023 at 09:46:09AM -0300, Euler Taveira wrote:
> On Wed, Dec 6, 2023, at 8:27 AM, Peter Eisentraut wrote:
>> This kind of thing could be mostly avoided if we didn't hide all the
>> WAL_DEBUG behind #ifdefs.
>
> AFAICS LOCK_DEBUG also hides its GUCs behind #ifdefs. The fact that XLOG_DEBUG
> is a variable but seems like a constant surprises me. I would rename it to
> XLogDebug or xlog_debug.

+1.  Or just wal_debug for greppability.

>> in the normal case.  That way, we don't need to wrap that in #ifdef
>> WAL_DEBUG, and the compiler can see the disabled code and make sure it
>> continues to build.
>
> I didn't check the LOCK_DEBUG code path to make sure it fits in the same
> category as WAL_DEBUG. If it does, maybe it is worth to apply the same logic
> there.

PerformWalRecovery() with its log for RM_XACT_ID is something that
stresses me a bit though because this is in the main redo loop which
is never free.  The same can be said about GenericXLogFinish() because
the extra computation happens while holding a buffer and marking it
dirty.  The ones in xlog.c are free of charge as they are called
outside any critical portions.

This makes me wonder how much we need to care about
trace_recovery_messages, actually, and I've never used it.
--
Michael

Attachment

Re: Is WAL_DEBUG related code still relevant today?

From
"Euler Taveira"
Date:
On Wed, Dec 6, 2023, at 9:51 PM, Michael Paquier wrote:
PerformWalRecovery() with its log for RM_XACT_ID is something that
stresses me a bit though because this is in the main redo loop which
is never free.  The same can be said about GenericXLogFinish() because
the extra computation happens while holding a buffer and marking it
dirty.  The ones in xlog.c are free of charge as they are called
outside any critical portions.

This makes me wonder how much we need to care about
trace_recovery_messages, actually, and I've never used it.

IIUC trace_recovery_messages was a debugging aid in the 9.0 era when the HS was
introduced. I'm also wondering if anyone used it in the past years.

elog.c:

* Intention is to keep this for at least the whole of the 9.0 production
* release, so we can more easily diagnose production problems in the field.
* It should go away eventually, though, because it's an ugly and
* hard-to-explain kluge.
*/
int
trace_recovery(int trace_level)
{
    if (trace_level < LOG &&
        trace_level >= trace_recovery_messages)
        return LOG; 

    return trace_level;
}


--
Euler Taveira

Re: Is WAL_DEBUG related code still relevant today?

From
Michael Paquier
Date:
On Wed, Dec 06, 2023 at 11:32:19PM -0300, Euler Taveira wrote:
> IIUC trace_recovery_messages was a debugging aid in the 9.0 era when the HS was
> introduced. I'm also wondering if anyone used it in the past years.

FWIW, I'd be +1 for getting rid of entirely, with its conditional
block in PerformWalRecovery(), as it does not bring any additional
value now that it is possible to achieve much more with pg_waldump
(pg_xlogdump before that) introduced a couple of years later in 9.3.
--
Michael

Attachment

Re: Is WAL_DEBUG related code still relevant today?

From
Bharath Rupireddy
Date:
On Mon, Dec 4, 2023 at 12:37 AM Noah Misch <noah@leadboat.com> wrote:
>
> On Sun, Dec 03, 2023 at 08:30:24PM +0530, Bharath Rupireddy wrote:
> > On Sun, Dec 3, 2023 at 4:00 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
> > > On Sat, Dec 02, 2023 at 07:36:29PM +0530, Bharath Rupireddy wrote:
> > > > b) Remove both the WAL_DEBUG macro and the wal_debug GUC. I don't
> > > > think (2) is needed to be in core especially when tools like
> > > > pg_walinspect and pg_waldump can do the same job. And, the messages in
> > > > (3) and (4) can be turned to some DEBUGX level without being under the
> > > > WAL_DEBUG macro.
> > >
> > > Is there anything provided by wal_debug that can't be found via
> > > pg_walinspect/pg_waldump?
> >
> > I don't think so. The WAL record decoding can be achieved with
> > pg_walinspect or pg_waldump.
>
> Can be, but the WAL_DEBUG model is mighty convenient:
> - Cooperates with backtrace_functions
> - Change log_line_prefix to correlate any log_line_prefix fact with WAL records
> - See WAL records interleaved with non-WAL log messages

Agree it helps in all of the above situations, but I'm curious to know
what sorts of problems it helps debug with.

The interesting pieces that WAL_DEBUG code does are the following:

1. Decodes the WAL record right after it's written to WAL buffers in
XLogInsertRecord. What problem does it help to detect?
2. Emits a log message for every WAL record applied in the main redo
apply loop. Enabling this isn't cheap for sure even for developer
environments; I've observed a 10% increase in recovery test time)
3. Emits log messages for WAL writes/flushes and WAL buffer page
initializations. These messages don't have to be hiding under a macro,
but a DEBUGX level is sufficient.

> > > > I have no idea if anyone uses WAL_DEBUG macro and wal_debug GUCs in
> > > > production, if we have somebody using it, I think we need to fix the
>
> I don't use it in production, but I use it more than any other of our many
> DEBUG macros.

I'm just curious to know what sorts of problems WAL_DEBUG code helps
debug with. Is the WAL_DEBUG code (1) or (2) or (3) that helped you
the most? Is converting the LOG messages (3) to DEBUGX level going to
help in your case? Can you please throw some light on this?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Is WAL_DEBUG related code still relevant today?

From
Bharath Rupireddy
Date:
On Thu, Dec 7, 2023 at 8:10 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Wed, Dec 06, 2023 at 11:32:19PM -0300, Euler Taveira wrote:
> > IIUC trace_recovery_messages was a debugging aid in the 9.0 era when the HS was
> > introduced. I'm also wondering if anyone used it in the past years.
>
> FWIW, I'd be +1 for getting rid of entirely, with

+1 for removing trace_recovery_messages. Firstly, it doesn't cover all
the recovery related messages as it promises, so it's an incomplete
feature. Secondly, it needs a bit of understanding as to how it gels
with client_min_messages and log_min_messages.

> its conditional
> block in PerformWalRecovery(), as it does not bring any additional
> value now that it is possible to achieve much more with pg_waldump
> (pg_xlogdump before that) introduced a couple of years later in 9.3.

And, I agree that the functionality (description of the WAL record
being applied) of conditional trace_recovery_messages code under
WAL_DEBUG macro in PerformWalRecovery's main redo apply loop can more
easily be achieved with either pg_walinspect or pg_waldump. That's my
point as well for getting rid of WAL_DEBUG macro related code after
converting a few messages to DEBUGX level.

The comment atop trace_recovery [1] function says it should go away
eventually and seems to have served the purpose when the recovery
related code was introduced in PG 9.0.

FWIW, the attached patch is what I've left with after removing
trace_recovery_messages related code, 9 files changed, 19
insertions(+), 97 deletions(-).

[1]
 * Intention is to keep this for at least the whole of the 9.0 production
 * release, so we can more easily diagnose production problems in the field.
 * It should go away eventually, though, because it's an ugly and
 * hard-to-explain kluge.
 */
int
trace_recovery(int trace_level)

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

Re: Is WAL_DEBUG related code still relevant today?

From
Robert Haas
Date:
On Thu, Dec 7, 2023 at 6:20 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> I'm just curious to know what sorts of problems WAL_DEBUG code helps
> debug with. Is the WAL_DEBUG code (1) or (2) or (3) that helped you
> the most? Is converting the LOG messages (3) to DEBUGX level going to
> help in your case? Can you please throw some light on this?

I don't like the idea of removing WAL_DEBUG. I haven't used it in a
while, but I have used it, and it's been very helpful. You can for
example run a psql session and easily see what each command you type
generates in terms of WAL, without having to run pg_waldump over and
over and figure out which output is new. That's not something that I
need to do super-commonly, but I have wanted to do it for certain
projects, and I don't think that maintaining the WAL_DEBUG code in
tree is really causing us very much hassle. In fact, I'd argue that of
all of the various debugging bits that are part of the tree, WAL_DEBUG
is the most useful by a good margin. Things like OPTIMIZER_DEBUG and
LOCK_DEBUG seem to me to have much less utility. LOCK_DEBUG for
example produces a completely unworkable volume of output even for
very simple operations.

I've never been much of a believer in trace_recovery_messages, either,
but I'm somewhat sympathetic to the problem it's trying to solve. My
incremental backup patch set adds a new background process, and what
are you supposed to do if you have a problem with that process? You
can crank up the server debugging level overall, but what if you just
want the walsummarizer process, or say the walsummarizer and any
processes trying to take an incremental backup, to do super-detailed
logging? We don't have a model for that. I thought about adding
trace_walsummarizer_messages or debug_walsummarizer or something for
this exact reason, but I didn't because the existing
trace_recovery_messages setting seems like a kludge I don't want to
propagate. But that leaves me with nothing other than boosting up the
debug level for the whole server, which is an annoying thing to have
to do if you really only care about one subsystem.

I don't know what the right answer is exactly, but there should be
some way of telling the system something more fine-grained than "do
more logging" or "do a lot more logging" or "do really a lot more
logging" ...

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Is WAL_DEBUG related code still relevant today?

From
Noah Misch
Date:
On Thu, Dec 07, 2023 at 04:50:30PM +0530, Bharath Rupireddy wrote:
> On Mon, Dec 4, 2023 at 12:37 AM Noah Misch <noah@leadboat.com> wrote:
> > On Sun, Dec 03, 2023 at 08:30:24PM +0530, Bharath Rupireddy wrote:
> > > On Sun, Dec 3, 2023 at 4:00 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
> > > > On Sat, Dec 02, 2023 at 07:36:29PM +0530, Bharath Rupireddy wrote:

> The interesting pieces that WAL_DEBUG code does are the following:
> 
> 1. Decodes the WAL record right after it's written to WAL buffers in
> XLogInsertRecord. What problem does it help to detect?

I think it helped me understand why a test case I was writing didn't reach the
bug I expected it to reach.

> 2. Emits a log message for every WAL record applied in the main redo
> apply loop. Enabling this isn't cheap for sure even for developer
> environments; I've observed a 10% increase in recovery test time)
> 3. Emits log messages for WAL writes/flushes and WAL buffer page
> initializations. These messages don't have to be hiding under a macro,
> but a DEBUGX level is sufficient.
> 
> > > > > I have no idea if anyone uses WAL_DEBUG macro and wal_debug GUCs in
> > > > > production, if we have somebody using it, I think we need to fix the
> >
> > I don't use it in production, but I use it more than any other of our many
> > DEBUG macros.
> 
> I'm just curious to know what sorts of problems WAL_DEBUG code helps
> debug with. Is the WAL_DEBUG code (1) or (2) or (3) that helped you
> the most?

For me, (1) and (2) came up several times, and (3) came up once.  I don't
remember which of (1) or (2) helped more.

> Is converting the LOG messages (3) to DEBUGX level going to
> help in your case?

Not in my case.



Re: Is WAL_DEBUG related code still relevant today?

From
Michael Paquier
Date:
On Thu, Dec 07, 2023 at 05:29:55PM +0530, Bharath Rupireddy wrote:
> The comment atop trace_recovery [1] function says it should go away
> eventually and seems to have served the purpose when the recovery
> related code was introduced in PG 9.0.
>
> FWIW, the attached patch is what I've left with after removing
> trace_recovery_messages related code, 9 files changed, 19
> insertions(+), 97 deletions(-).

Looks acceptable to me.  Does somebody object to this removal?
--
Michael

Attachment

Re: Is WAL_DEBUG related code still relevant today?

From
Michael Paquier
Date:
On Fri, Dec 08, 2023 at 01:45:18PM +0900, Michael Paquier wrote:
> Looks acceptable to me.  Does somebody object to this removal?

Hearing nothing, done that.
--
Michael

Attachment