Thread: GetRelationPath() vs critical sections

GetRelationPath() vs critical sections

From
Andres Freund
Date:
Hi,

In the AIO patchset there are cases where we have to LOG failures inside a
critical section. This is necessary because e.g. a buffer read might complete
while we are waiting for a WAL write inside a critical section.

We can't just defer the log message, as the IO might end up being
waited-on/completed-by another backend than the backend that issued the IO, so
we'd defer logging issues until an effectively arbitrary later time.

In general emitting a LOG inside a critical section isn't a huge issue - we
made sure that elog.c has a reserve of memory to be able to log without
crashing.

However, the current message for buffer IO issues use relpath*() (ending up in
a call to GetRelationPath()). Which in turn uses psprintf() to generate the
path. Which in turn violates the no-memory-allocations-in-critical-sections
rule, as the containing memory context will typically not have
->allowInCritSection == true.

It's not obvious to me what the best way to deal with this is.

One idea I had was to add an errrelpath() that switches to
edata->assoc_context before calling relpath(), but that would end up leaking
memory, as FreeErrorDataContents() wouldn't know about the allocation.

Obviously we could add a version of GetRelationPath() that just prints into a
caller provided buffer - but that's somewhat awkward API wise.

A third approach would be to have a dedicated memory context for this kind of
thing that's reset after logging the message - but that comes awkwardly close
to duplicating ErrorContext.


I wonder if we're lacking a bit of infrastructure here...

Greetings,

Andres Freund



Re: GetRelationPath() vs critical sections

From
Noah Misch
Date:
On Wed, Sep 04, 2024 at 11:58:33AM -0400, Andres Freund wrote:
> In general emitting a LOG inside a critical section isn't a huge issue - we
> made sure that elog.c has a reserve of memory to be able to log without
> crashing.
> 
> However, the current message for buffer IO issues use relpath*() (ending up in
> a call to GetRelationPath()). Which in turn uses psprintf() to generate the
> path. Which in turn violates the no-memory-allocations-in-critical-sections
> rule, as the containing memory context will typically not have
> ->allowInCritSection == true.
> 
> It's not obvious to me what the best way to deal with this is.
> 
> One idea I had was to add an errrelpath() that switches to
> edata->assoc_context before calling relpath(), but that would end up leaking
> memory, as FreeErrorDataContents() wouldn't know about the allocation.
> 
> Obviously we could add a version of GetRelationPath() that just prints into a
> caller provided buffer - but that's somewhat awkward API wise.

Agreed.

> A third approach would be to have a dedicated memory context for this kind of
> thing that's reset after logging the message - but that comes awkwardly close
> to duplicating ErrorContext.

That's how I'd try to do it.  Today's ErrorContext is the context for
allocations FreeErrorDataContents() knows how to find.  The new context would
be for calling into arbitrary code unknown to FreeErrorDataContents().  Most
of the time, we'd avoid reset work for the new context, since it would have no
allocations.

Ideally, errstart() would switch to the new context before returning true, and
errfinish() would switch back.  That way, you could just call relpath*()
without an errrelpath() to help.  This does need functions called in ereport()
argument lists to not use CurrentMemoryContext for allocations that need to
survive longer.  I'd not be concerned about imposing that in a major release.
What obstacles would arise if we did that?

> I wonder if we're lacking a bit of infrastructure here...

Conceptually, the ereport() argument list should be a closure that runs in a
suitable mcxt.  I think we're not far from the goal.



Re: GetRelationPath() vs critical sections

From
Thomas Munro
Date:
On Thu, Sep 5, 2024 at 3:58 AM Andres Freund <andres@anarazel.de> wrote:
> Obviously we could add a version of GetRelationPath() that just prints into a
> caller provided buffer - but that's somewhat awkward API wise.

For the record, that's exactly what I did in the patch I proposed to
fix our long standing RelationTruncate() data-eating bug:


https://www.postgresql.org/message-id/flat/CA%2BhUKG%2B5nfWcpnZ%3DZ%3DUpGvY1tTF%3D4QU_0U_07EFaKmH7Nr%2BNLQ%40mail.gmail.com#aa061db119ee7a4b5390af56e24f475d



Re: GetRelationPath() vs critical sections

From
Andy Fan
Date:
Thomas Munro <thomas.munro@gmail.com> writes:

> On Thu, Sep 5, 2024 at 3:58 AM Andres Freund <andres@anarazel.de> wrote:
>> Obviously we could add a version of GetRelationPath() that just prints into a
>> caller provided buffer - but that's somewhat awkward API wise.
>
> For the record, that's exactly what I did in the patch I proposed to
> fix our long standing RelationTruncate() data-eating bug:
>
>
https://www.postgresql.org/message-id/flat/CA%2BhUKG%2B5nfWcpnZ%3DZ%3DUpGvY1tTF%3D4QU_0U_07EFaKmH7Nr%2BNLQ%40mail.gmail.com#aa061db119ee7a4b5390af56e24f475d

I want to have a dicussion on the user provided buffer APIs. I just get
the similar feedback on [1] because of this recently..

I agree that "user provided buffer" API is bad for the reasons like:
a). inconvenient since user need to provide the buffer. b) unsafe
because user may provide a incorrect buffer. But it still have some
advantages, like c). allocate the memory in a expected MemoryContext
rather than CurrentMemoryContext. d). Allocating the memory at the
different time rather than executing the API e). API can write the data
to the user descired buffer directly rather than another
copy after. My user case at [1] is because of (c) and (e), and the user
case here looks because of factor (d).

Come to the badness of "user provided buffer" API, I think we can ease
them by providing both the non-user-buffer API and user-provided-buffer
API. Since the later one is safe and convenient, so
people probably user the non-user-buffer API by default and just the
user who wants the benefits of "provided-buffer" would use that API.

Am I miss some important factors on this topic?

[1]
https://www.postgresql.org/message-id/1882669.1726701697%40sss.pgh.pa.us

(I read the above topic [1] now, I just realize I proposed to [change] the
API rather than adding an new variant, that's not my intention and
that's my fault).

--
Best Regards
Andy Fan