Re: AIO v2.5 - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: AIO v2.5 |
Date | |
Msg-id | qzxq6mqqozctlfcg2kg5744gmyubicvuehnp4a7up472thlvz2@y5xqgd5wcwhw Whole thread Raw |
In response to | Re: AIO v2.5 (Andres Freund <andres@anarazel.de>) |
Responses |
Re: AIO v2.5
|
List | pgsql-hackers |
Hi, Tom, CCed you since you have worked most on elog.c On 2025-03-07 16:23:51 -0500, Andres Freund wrote: > What about pg_io_handles? While looking at the view I felt motivated to tackle the one FIXME in the implementation of the view. Namely that the "error_desc" column wasn't populated (the view did show that there was an error, but not what the error was). Which lead me down a sad sad rabbit hole, largely independent of AIO. A bit of background: For AIO completion callbacks can signal errors (e.g. a page header failing validation). That error can be logged in the callback and/or raised later, e.g. by the query that issued the IO. AIO callbacks happen in critical sections, which is required to be able to use AIO for WAL (see README.md for more details). Currently errors are logged/raised by ereport()s in functions that gets passed in an elevel, pretty standard. A few of the ereports() use errcode_for_file_access() to translate an errno to an sqlerrcode. Now on to the problem: The result of an ereport() can't be put into a view, obviously. I didn't think it'd be good if the each kind of error needed to be implemented twice, once with ereport() and once to just return a string to put in the view. I tried a few things: 1) Use errsave() to allow delayed reporting of the error I encountered a few problems: - errsave() doesn't allow the log level to be specified, which means it can't directly be used to LOG if no context is specified. This could be worked around by always specifying the context, with ErrorSaveContext.details_wanted = true and having generic code that changes the elevel to whatever is appropriate and then using ThrowErrorData() to log the message. - ersave_start() sets assoc_context to CurrentMemoryContext and errsave_finish() allocates an ErrorData copy in CurrentMemoryContext This makes naive use of this approach when logging in a critical section impossible. If ErrorSaveContext is not passed in an ERROR will be raised, even if we just want to log. If ErrorSaveContext is used, we allocate memory in the caller context, which isn't allowed in a critical section. The only way I saw to work around that was to switch to ErrorContext before calling errsave(). That's doable, the logging is called from one function (pgaio_result_report()). That kinda works, but as a consequence we more than double the memory usage in ErrorContext as errsave_finish() will palloc a new ErrorData and ThrowErrorData() copies that ErrorData and all its string back to ErrorContext. 2) Have the error callback format the error using a helper function instead of using ereport() Problems: - errcode_for_file_access() would need to be reimplemented / split into a function translating an errnode into an sqlerrcode without getting it from the error data stack - emitting the log message in a critical section would require either doing the error formatting in ErrorContext or creating another context with reserved memory to do so. - allowing to specify DETAIL, HINT etc basically requires a small elog.c interface reimplementation 3) Use pre_format_elog_string(), format_elog_string() similar to what guc.c does for check hooks, via GUC_check_errmsg(), GUC_check_errhint() ... Problems: - Requires to duplicate errcode_for_file_access() for similar reason as in 2) - Not exactly pretty - Somewhat gnarly, but doable, to make use of %m safe, the way it's done in guc.h afaict isn't safe: pre_format_elog_string() is called for each of GUC_check_{errmsg,errdetail,errhint}. As the global errno might get set during the format_elog_string(), it'll not be the right one during the next GUC_check_*. 4) Don't use ereport() directly, but instead put the errstart() in pgaio_result_report(), before calling the error description callback. When emitting a log message, call errfinish() after the callback. For the view, get the message out via CopyErrorData() and free the memory again using FlushErrorState Problems: - Seems extremely hacky I implemented all, but don't really like any of them. Unless somebody has a better idea or we agree that one of the above is actually a acceptable approach, I'm inclined to simply remove the column containing the description of the error. The window in which one could see an IO with an error is rather short most of the time anyway and the error will also be logged. It's a bit annoying that adding the column later would require revising the signature of the error reporting callback at that time, but I think that degree of churn is acceptable. The main reason I wanted to write this up is that it seems that we're just lacking some infrastructure here. Greetings, Andres Freund
pgsql-hackers by date: