On Thu, Jul 15, 2021 at 6:14 AM Jeremy Schneider <schnjere@amazon.com> wrote:
>
> On 7/2/21 18:57, Jeremy Schneider wrote:
>
> The process of trying to understand this recent incident has given me
> some new insight about what information would be helpful up front in
> this error message for faster resolution.
>
> First off, and most importantly, the current WAL record we're
> processing when the error is encountered. I wonder if it could easily
> print the LSN?
>
> Secondly, the transaction ID. In the specific bug Bertrand found, the
> problem is actually not with the actual WAL record that's being
> processed - but rather because previous WAL records in the same
> transaction left the decoder process in a state where the current WAL
> record [a commit] generated an error. So it's the entire transaction
> that needs to be examined to reproduce the error. (Andres actually
> pointed this out on the original thread back in December 2019.) I
> realize that once you know the LSN you can easily get the XID with
> pg_waldump, but personally I'd just as soon include the XID in the
> error message since I think it will usually be a first step for
> debugging any problems with WAL decoding. Then I can go straight to
> filtering that XID on my first pg_waldump run.
>
I don't think it is a bad idea to print additional information as you
are suggesting, but why only for this error? It could be useful to
investigate any other error we get during decoding. I think normally
we add such additional information via error_context. We have recently
added/enhanced it for apply-workers, see commit [1].

I think here we should just print the relation name in the error
message you pointed out and then work on adding additional information
via error context as a separate patch. What do you think?
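
To illustrate the error-context idea, here is a minimal standalone
sketch. It is a simplified model and not backend code: the struct and
the error_context_stack variable only mimic the shape of the backend's
ErrorContextCallback mechanism, and DecodeErrorInfo,
decode_error_callback, report_error and decode_record are hypothetical
names invented for this example. In the backend, the callback would
call errcontext() instead of writing to a buffer.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Simplified stand-in for the backend's error context callback stack. */
typedef struct ErrorContextCallback
{
    struct ErrorContextCallback *previous;
    void        (*callback) (void *arg, char *buf, size_t buflen);
    void       *arg;
} ErrorContextCallback;

static ErrorContextCallback *error_context_stack = NULL;

/* Hypothetical: what a decoder might expose about its current position. */
typedef struct DecodeErrorInfo
{
    uint64_t    lsn;            /* start of the WAL record being decoded */
    uint32_t    xid;            /* transaction the record belongs to */
} DecodeErrorInfo;

/* Context callback: formats the LSN as two 32-bit hex halves, plus the XID. */
static void
decode_error_callback(void *arg, char *buf, size_t buflen)
{
    const DecodeErrorInfo *info = arg;

    snprintf(buf, buflen,
             "decoding WAL record at %" PRIX32 "/%" PRIX32
             ", transaction %" PRIu32,
             (uint32_t) (info->lsn >> 32), (uint32_t) info->lsn, info->xid);
}

/* Writes the error message followed by one CONTEXT line per stacked callback. */
static void
report_error(char *out, size_t outlen, const char *msg)
{
    int         used = snprintf(out, outlen, "ERROR:  %s", msg);
    ErrorContextCallback *cb;

    for (cb = error_context_stack; cb != NULL; cb = cb->previous)
    {
        char        line[128];

        /* Stop appending once the output buffer is full. */
        if (used < 0 || (size_t) used >= outlen)
            break;
        cb->callback(cb->arg, line, sizeof(line));
        used += snprintf(out + used, outlen - (size_t) used,
                         "\nCONTEXT:  %s", line);
    }
}

/* Push the callback, hit a simulated failure, then pop the callback. */
static void
decode_record(uint64_t lsn, uint32_t xid, char *out, size_t outlen)
{
    DecodeErrorInfo info = {lsn, xid};
    ErrorContextCallback errcallback;

    errcallback.callback = decode_error_callback;
    errcallback.arg = &info;
    errcallback.previous = error_context_stack;
    error_context_stack = &errcallback;

    /* Any error raised here picks up the LSN and XID automatically. */
    report_error(out, outlen, "simulated decoding failure");

    error_context_stack = errcallback.previous;
}
```

With this shape, every error raised while a record is being decoded
would carry the LSN and XID, not just the one error discussed in this
thread. For example, calling decode_record() with LSN 16/B374D848 and
XID 739 fills the buffer with the ERROR line followed by
"CONTEXT:  decoding WAL record at 16/B374D848, transaction 739".
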
[1] - https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=abc0910e2e0adfc5a17e035465ee31242e32c4fc
--
With Regards,
Amit Kapila.