Re: AIO v2.5 - Mailing list pgsql-hackers

From Andres Freund
Subject Re: AIO v2.5
Date
Msg-id qzxq6mqqozctlfcg2kg5744gmyubicvuehnp4a7up472thlvz2@y5xqgd5wcwhw
Whole thread Raw
In response to Re: AIO v2.5  (Andres Freund <andres@anarazel.de>)
Responses Re: AIO v2.5
List pgsql-hackers
Hi,

Tom, CCed you since you have worked most on elog.c


On 2025-03-07 16:23:51 -0500, Andres Freund wrote:
> What about pg_io_handles?

While looking at the view I felt motivated to tackle the one FIXME in the
implementation of the view. Namely that the "error_desc" column wasn't
populated (the view did show that there was an error, but not what the error
was).

Which lead me down a sad sad rabbit hole, largely independent of AIO.


A bit of background:

For AIO completion callbacks can signal errors (e.g. a page header failing
validation). That error can be logged in the callback and/or raised later,
e.g. by the query that issued the IO.

AIO callbacks happen in critical sections, which is required to be able to use
AIO for WAL (see README.md for more details).

Currently errors are logged/raised by ereport()s in functions that gets passed
in an elevel, pretty standard.

A few of the ereports() use errcode_for_file_access() to translate an errno to
an sqlerrcode.


Now on to the problem:

The result of an ereport() can't be put into a view, obviously. I didn't think
it'd be good if the each kind of error needed to be implemented twice, once
with ereport() and once to just return a string to put in the view.


I tried a few things:

1) Use errsave() to allow delayed reporting of the error

I encountered a few problems:

- errsave() doesn't allow the log level to be specified, which means it can't
  directly be used to LOG if no context is specified.

  This could be worked around by always specifying the context, with
  ErrorSaveContext.details_wanted = true and having generic code that changes
  the elevel to whatever is appropriate and then using ThrowErrorData() to log the
  message.

- ersave_start() sets assoc_context to CurrentMemoryContext and
  errsave_finish() allocates an ErrorData copy in CurrentMemoryContext

  This makes naive use of this approach when logging in a critical section
  impossible. If ErrorSaveContext is not passed in an ERROR will be raised,
  even if we just want to log.  If ErrorSaveContext is used, we allocate
  memory in the caller context, which isn't allowed in a critical section.

  The only way I saw to work around that was to switch to ErrorContext before
  calling errsave(). That's doable, the logging is called from one function
  (pgaio_result_report()). That kinda works, but as a consequence we more than
  double the memory usage in ErrorContext as errsave_finish() will palloc a
  new ErrorData and ThrowErrorData() copies that ErrorData and all its string
  back to ErrorContext.


2) Have the error callback format the error using a helper function instead of
   using ereport()

Problems:

- errcode_for_file_access() would need to be reimplemented / split into a
  function translating an errnode into an sqlerrcode without getting it from
  the error data stack

- emitting the log message in a critical section would require either doing
  the error formatting in ErrorContext or creating another context with
  reserved memory to do so.

- allowing to specify DETAIL, HINT etc basically requires a small elog.c
  interface reimplementation


3) Use pre_format_elog_string(), format_elog_string() similar to what guc.c
   does for check hooks, via GUC_check_errmsg(), GUC_check_errhint() ...

Problems:

- Requires to duplicate errcode_for_file_access() for similar reason as in 2)

- Not exactly pretty

- Somewhat gnarly, but doable, to make use of %m safe, the way it's done in
  guc.h afaict isn't safe:
  pre_format_elog_string() is called for each of
  GUC_check_{errmsg,errdetail,errhint}. As the global errno might get set
  during the format_elog_string(), it'll not be the right one during
  the next GUC_check_*.



4) Don't use ereport() directly, but instead put the errstart() in
   pgaio_result_report(), before calling the error description callback.

   When emitting a log message, call errfinish() after the callback. For the
   view, get the message out via CopyErrorData() and free the memory again
   using FlushErrorState

Problems:

- Seems extremely hacky


I implemented all, but don't really like any of them.


Unless somebody has a better idea or we agree that one of the above is
actually a acceptable approach, I'm inclined to simply remove the column
containing the description of the error. The window in which one could see an
IO with an error is rather short most of the time anyway and the error will
also be logged.

It's a bit annoying that adding the column later would require revising the
signature of the error reporting callback at that time, but I think that
degree of churn is acceptable.


The main reason I wanted to write this up is that it seems that we're just
lacking some infrastructure here.

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: per backend WAL statistics
Next
From: Matthias van de Meent
Date:
Subject: Re: Why doesn't GiST VACUUM require a super-exclusive lock, like nbtree VACUUM?