Thread: Add missing error codes to PANIC/FATAL error reports in xlog.c and relcache.c

Add missing error codes to PANIC/FATAL error reports in xlog.c and relcache.c

From
Nazir Bilal Yavuz
Date:
Hi,

There is an ongoing thread [1] for adding missing SQL error codes to
PANIC and FATAL error reports in xlogrecovery.c file. I did the same
but for xlog.c and relcache.c files.

I couldn't find a suitable error code for the "cache lookup failed for
relation" error in relcache.c and this error comes up in many places.
Would it be reasonable to create a new error code specifically for
this?

Any kind of feedback would be appreciated.

[1] https://www.postgresql.org/message-id/CAPMWgZ8g17Myb5ZRE5aTNowUohafk4j48mZ_5_Zn9JnR5p2u0w%40mail.gmail.com

-- 
Regards,
Nazir Bilal Yavuz
Microsoft

Attachment
> On 6 Dec 2023, at 14:03, Nazir Bilal Yavuz <byavuz81@gmail.com> wrote:

> There is an ongoing thread [1] for adding missing SQL error codes to
> PANIC and FATAL error reports in xlogrecovery.c file. I did the same
> but for xlog.c and relcache.c files.

-    elog(PANIC, "space reserved for WAL record does not match what was written");
+    ereport(PANIC,
+            (errcode(ERRCODE_DATA_CORRUPTED),
+             errmsg("space reserved for WAL record does not match what was written")));

elogs turned into ereports should use errmsg_internal() to keep the strings
from being translated.

-    elog(FATAL, "could not write init file");
+    ereport(FATAL,
+            (errcode_for_file_access(),
+             errmsg("could not write init file")));

Is it worthwhile adding %m on these to get a little more help when debugging
errors that shouldn't happen?

-    elog(FATAL, "could not write init file");
+    ereport(FATAL,
+            (errcode_for_file_access(),

The extra parenthesis are no longer needed, I don't know if we have a policy to
remove them when changing an ereport call but we should at least not introduce
new ones.

-    elog(FATAL, "cannot read pg_class without having selected a database");
+    ereport(FATAL,
+            (errcode(ERRCODE_INTERNAL_ERROR),

ereport (and thus elog) already defaults to ERRCODE_INTERNAL_ERROR for ERROR or
higher, so unless there is a better errcode an elog() call if preferrable here.

> I couldn't find a suitable error code for the "cache lookup failed for
> relation" error in relcache.c and this error comes up in many places.
> Would it be reasonable to create a new error code specifically for
> this?

We use ERRCODE_UNDEFINED_OBJECT for similar errors elsewhere, perhaps we can
use that for these as well?

--
Daniel Gustafsson




Hi,

Thanks for the review!

On Thu, 22 Feb 2024 at 16:55, Daniel Gustafsson <daniel@yesql.se> wrote:
>
> > On 6 Dec 2023, at 14:03, Nazir Bilal Yavuz <byavuz81@gmail.com> wrote:
>
> > There is an ongoing thread [1] for adding missing SQL error codes to
> > PANIC and FATAL error reports in xlogrecovery.c file. I did the same
> > but for xlog.c and relcache.c files.
>
> -       elog(PANIC, "space reserved for WAL record does not match what was written");
> +       ereport(PANIC,
> +                       (errcode(ERRCODE_DATA_CORRUPTED),
> +                        errmsg("space reserved for WAL record does not match what was written")));
>
> elogs turned into ereports should use errmsg_internal() to keep the strings
> from being translated.

Does errmsg_internal() need to be used all the time when turning elogs
into ereports? errmsg_internal()'s comment says that "This should be
used for "can't happen" cases that are probably not worth spending
translation effort on.". Is it enough to check if the error message
has a translation, and then decide the use of errmsg_internal() or
errmsg()?

> -       elog(FATAL, "could not write init file");
> +       ereport(FATAL,
> +                       (errcode_for_file_access(),
> +                        errmsg("could not write init file")));
>
> Is it worthwhile adding %m on these to get a little more help when debugging
> errors that shouldn't happen?

I believe it is worthwhile, so I will add.

> -       elog(FATAL, "could not write init file");
> +       ereport(FATAL,
> +                       (errcode_for_file_access(),
>
> The extra parenthesis are no longer needed, I don't know if we have a policy to
> remove them when changing an ereport call but we should at least not introduce
> new ones.
>
> -       elog(FATAL, "cannot read pg_class without having selected a database");
> +       ereport(FATAL,
> +                       (errcode(ERRCODE_INTERNAL_ERROR),
>
> ereport (and thus elog) already defaults to ERRCODE_INTERNAL_ERROR for ERROR or
> higher, so unless there is a better errcode an elog() call if preferrable here.

I did not know these, thanks.

> > I couldn't find a suitable error code for the "cache lookup failed for
> > relation" error in relcache.c and this error comes up in many places.
> > Would it be reasonable to create a new error code specifically for
> > this?
>
> We use ERRCODE_UNDEFINED_OBJECT for similar errors elsewhere, perhaps we can
> use that for these as well?

It looks okay to me, ERRCODE_UNDEFINED_OBJECT is mostly used for the
'not exist' errors and it seems the main reason for the 'cache lookup
failed for relation' error is because heap tuple does not exist
anymore.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft



> On 23 Feb 2024, at 13:09, Nazir Bilal Yavuz <byavuz81@gmail.com> wrote:

> Does errmsg_internal() need to be used all the time when turning elogs
> into ereports? errmsg_internal()'s comment says that "This should be
> used for "can't happen" cases that are probably not worth spending
> translation effort on.". Is it enough to check if the error message
> has a translation, and then decide the use of errmsg_internal() or
> errmsg()?

If it's an elog then it won't have a translation as none are included in the
translation set.  If the errmsg is generic enough to be translated anyways via
another (un)related ereport call then we of course use that, but ideally not
create new ones.

--
Daniel Gustafsson




Hi,

On Fri, 23 Feb 2024 at 15:34, Daniel Gustafsson <daniel@yesql.se> wrote:
>
> > On 23 Feb 2024, at 13:09, Nazir Bilal Yavuz <byavuz81@gmail.com> wrote:
>
> > Does errmsg_internal() need to be used all the time when turning elogs
> > into ereports? errmsg_internal()'s comment says that "This should be
> > used for "can't happen" cases that are probably not worth spending
> > translation effort on.". Is it enough to check if the error message
> > has a translation, and then decide the use of errmsg_internal() or
> > errmsg()?
>
> If it's an elog then it won't have a translation as none are included in the
> translation set.  If the errmsg is generic enough to be translated anyways via
> another (un)related ereport call then we of course use that, but ideally not
> create new ones.

Thanks for the explanation.

All of your feedback is addressed in v2.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft

Attachment
> On 26 Feb 2024, at 13:42, Nazir Bilal Yavuz <byavuz81@gmail.com> wrote:

> All of your feedback is addressed in v2.

Nothing sticks out from reading through these patches, they seem quite ready to
me.  Being able to filter and analyze on errorcodes is likely to be more
important going forward as more are running fleets of instances.  I'm marking
these Ready for Committer, unless there are objections I think we should go
ahead with these.  There are probably more errors in the system which could
benefit from the same treatment, but we need to start somewhere.

--
Daniel Gustafsson




> On 6 Mar 2024, at 09:59, Daniel Gustafsson <daniel@yesql.se> wrote:

> Nothing sticks out from reading through these patches, they seem quite ready to
> me.

Took another look at this today and committed it. Thanks!

--
Daniel Gustafsson




Hi,

On Wed, 3 Apr 2024 at 12:11, Daniel Gustafsson <daniel@yesql.se> wrote:
>
> > On 6 Mar 2024, at 09:59, Daniel Gustafsson <daniel@yesql.se> wrote:
>
> > Nothing sticks out from reading through these patches, they seem quite ready to
> > me.
>
> Took another look at this today and committed it. Thanks!

Thanks for the commit!

-- 
Regards,
Nazir Bilal Yavuz
Microsoft