Re: BUG #18374: Printing memory contexts on OOM condition might lead to segmentation fault - Mailing list pgsql-bugs

From Alexander Lakhin
Subject Re: BUG #18374: Printing memory contexts on OOM condition might lead to segmentation fault
Date
Msg-id b1a1eaf3-d5b7-da52-6bb7-c5b3fbe47f3e@gmail.com
In response to Re: BUG #18374: Printing memory contexts on OOM condition might lead to segmentation fault  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: BUG #18374: Printing memory contexts on OOM condition might lead to segmentation fault  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
Hello Tom,

02.03.2024 19:11, Tom Lane wrote:
> PG Bug reporting form <noreply@postgresql.org> writes:
>> When a backend with deeply nested memory contexts hits out-of-memory
>> condition and logs the contexts, it might lead to a segmentation fault
>> (due to the lack of free memory again).
> Hmph.  That's not an out-of-memory crash, that's a stack-too-deep
> crash.

I tried decreasing the memory limit and still got the failure, this time
with a much shorter stack:
ulimit -Sv 200000; TESTS=infinite_recurse make -s check-tests

(gdb) p $rsp
$1 = (void *) 0x7ffcc83d4ff0
(gdb) frame 13269
#13269 0x000056289bc2685a in main (argc=8, argv=0x56289d3b4930) at main.c:198
198                     PostmasterMain(argc, argv);
(gdb) p $rsp
$2 = (void *) 0x7ffcc84834d0
(gdb) p $rsp - 0x7ffcc83d4ff0
$3 = (void *) 0xae4e0

(Far less than ulimit -s == 8 MB.)
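The stack arithmetic above can be double-checked with a small C sketch. The two addresses are copied from the gdb session and the 8 MB figure is the ulimit -s value; stack_used() and print_stack_report() are hypothetical helpers written only for this illustration, and the sketch assumes a 64-bit address space where the stack grows downward, as on x86-64.

```c
#include <stdint.h>
#include <stdio.h>

/* $rsp values copied from the gdb session: the innermost frame and the
 * frame of main().  On x86-64 the stack grows downward, so the outer
 * frame (main) has the higher address. */
static const uint64_t rsp_innermost = 0x7ffcc83d4ff0ULL;
static const uint64_t rsp_main      = 0x7ffcc84834d0ULL;

/* Bytes of stack between an outer frame and an inner (deeper) frame. */
uint64_t stack_used(uint64_t outer, uint64_t inner)
{
    return outer - inner;
}

/* Print the stack consumed, in bytes and KiB, and as a share of 8 MB. */
void print_stack_report(void)
{
    uint64_t used  = stack_used(rsp_main, rsp_innermost);
    uint64_t limit = 8ULL * 1024 * 1024;   /* ulimit -s == 8 MB, in bytes */

    printf("used %llu bytes (%.1f KiB), %.1f%% of the 8 MB limit\n",
           (unsigned long long) used, used / 1024.0, 100.0 * used / limit);
}
```

The difference is 0xae4e0 = 713952 bytes, about 697 KiB, or roughly 8.5% of the 8 MB stack limit.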

This makes me think that it's not a stack overflow issue, but maybe I'm
missing something.

> Seems like we ought to do one or both of these:
>
> 1. Put a CHECK_STACK_DEPTH() call in MemoryContextStatsInternal.
>
> 2. Teach MemoryContextStatsInternal to refuse to recurse more
> than N levels, for N perhaps around 100.
>
> Neither of these are very attractive though, as they'd obscure
> the OOM situation that we're trying to help debug.
>
> It strikes me that we don't actually need recursion in order to
> traverse the context tree: since the nodes have parent pointers,
> it'd be possible to visit them all using only iteration.  The
> recursion seems necessary though to manage the child summarization
> logic as we have it (in particular, we must have a local_totals
> per level to produce summarization like this).  Maybe we could
> modify solution #2 into
>
> 2a. Once we get more than say 100 levels deep, summarize everything
> below that in a single line, obtained in an iterative rather than
> recursive traversal.
>
> I wonder whether MemoryContextDelete and other cleanup methods
> also need to be rewritten to avoid recursion.  In the infinite_recurse
> test case I think we escape trouble because we longjmp out of most
> of the stack before we try to clean up --- but you could probably
> devise a test case that tries to do a subtransaction abort at a
> deep call level, and then maybe kaboom?
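Idea 2a can be sketched in C roughly as follows. This is a minimal illustration, not PostgreSQL code: Node, subtree_total_iterative(), print_stats() and MAX_RECURSE_DEPTH are hypothetical names, and the struct only loosely mimics the parent/first-child/next-sibling links of the memory context tree. The printer recurses per level (so per-level output is kept) only down to a fixed depth, then summarizes everything below in a single line using a walk that follows only the tree links and therefore needs constant stack space.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a memory context: tree links plus a byte
 * count.  Loosely modeled on the context tree; NOT MemoryContextData. */
typedef struct Node
{
    struct Node *parent;
    struct Node *firstchild;
    struct Node *nextchild;     /* next sibling */
    size_t       bytes;
} Node;

#define MAX_RECURSE_DEPTH 100   /* illustrative "N" from idea 2a */

/* Visit every node of the subtree rooted at 'root' without recursion,
 * using only parent/child/sibling links; return the total bytes and
 * store the number of nodes visited in *count. */
size_t subtree_total_iterative(const Node *root, size_t *count)
{
    size_t      total = 0, n = 0;
    const Node *cur = root;

    while (cur != NULL)
    {
        total += cur->bytes;
        n++;
        if (cur->firstchild != NULL)
        {
            cur = cur->firstchild;          /* descend first */
            continue;
        }
        /* Leaf: climb until a node with an unvisited sibling is found,
         * stopping when we get back to the subtree root. */
        while (cur != root && cur->nextchild == NULL)
            cur = cur->parent;
        cur = (cur == root) ? NULL : cur->nextchild;
    }
    if (count != NULL)
        *count = n;
    return total;
}

/* Recursive, per-level printing down to MAX_RECURSE_DEPTH; below that,
 * one summary line gathered iteratively, so recursion depth is bounded. */
void print_stats(const Node *node, int level)
{
    printf("%*s%zu bytes\n", level * 2, "", node->bytes);
    if (node->firstchild == NULL)
        return;

    if (level + 1 >= MAX_RECURSE_DEPTH)
    {
        size_t total = 0, count = 0;

        for (const Node *c = node->firstchild; c != NULL; c = c->nextchild)
        {
            size_t n;

            total += subtree_total_iterative(c, &n);
            count += n;
        }
        printf("%*s%zu deeper contexts: %zu total bytes\n",
               (level + 1) * 2, "", count, total);
        return;
    }

    for (const Node *c = node->firstchild; c != NULL; c = c->nextchild)
        print_stats(c, level + 1);
}
```

Above the cutoff the recursion is kept because, as noted, the summarization logic needs per-level totals; only the part below the cutoff switches to iteration.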

Exploiting MemoryContextStatsInternal(), and protecting it, was discussed
before:
https://www.postgresql.org/message-id/flat/1661334672.728714027%40f473.i.mail.ru
(It looks like the function never got stack-overflow protection in the end.)
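For comparison, option 1 (a stack-depth check) comes down to measuring how far the current frame is from an address recorded near the base of the stack, which is roughly how PostgreSQL's check_stack_depth() works. The sketch below is a standalone approximation under that assumption: record_stack_base(), stack_too_deep(), recurse() and the 512 kB budget are hypothetical, and unlike the real check it simply returns instead of raising an error.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical guard state: an address near the stack base and an
 * assumed stack budget (not PostgreSQL's max_stack_depth setting). */
static uintptr_t    stack_base = 0;
static const size_t max_stack_bytes = 512 * 1024;

/* Call once near the top of the program (e.g. early in main). */
void record_stack_base(void)
{
    char base_marker;

    stack_base = (uintptr_t) &base_marker;
}

/* True when the current frame is more than max_stack_bytes away from
 * the recorded base; works whichever direction the stack grows. */
bool stack_too_deep(void)
{
    char      here;
    uintptr_t cur = (uintptr_t) &here;
    uintptr_t depth = cur > stack_base ? cur - stack_base : stack_base - cur;

    return stack_base != 0 && depth > max_stack_bytes;
}

/* Recurse up to max_levels deep, stopping early if the guard trips;
 * returns the level at which recursion stopped. */
int recurse(int level, int max_levels)
{
    volatile char frame_pad[256];   /* give each frame real stack weight */

    frame_pad[0] = (char) level;
    if (level >= max_levels || stack_too_deep())
        return level;               /* target reached or guard tripped */

    int reached = recurse(level + 1, max_levels);

    frame_pad[1] = frame_pad[0];    /* touch the frame after the call, so
                                     * this is not a tail call */
    return reached;
}
```

With the guard in place, a request for a million levels stops after roughly 512 kB of stack instead of running off the end of it.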

But I'm still not sure we're dealing with the same issue here.

Best regards,
Alexander
