Re: BUG #18374: Printing memory contexts on OOM condition might lead to segmentation fault - Mailing list pgsql-bugs

From: Alexander Lakhin
Subject: Re: BUG #18374: Printing memory contexts on OOM condition might lead to segmentation fault
Msg-id: 95461160-1214-4ac4-d65b-086182797b1d@gmail.com
In response to: Re: BUG #18374: Printing memory contexts on OOM condition might lead to segmentation fault (Alexander Lakhin <exclusion@gmail.com>)
List: pgsql-bugs
04.03.2024 18:00, Alexander Lakhin wrote:
> 04.03.2024 00:39, Tom Lane wrote:
>>> Seems like we need to do some more work at startup to enforce that
>>> we have the amount of stack we think we do, if we're on Linux.
>> After thinking about that some more, I'm really quite unenthused about
>> trying to remap the stack for ourselves.  It'd be both platform- and
>> architecture-dependent, and I'm afraid it'd introduce as many failure
>> modes as it removes.  (Notably, I'm not sure we could guarantee
>> there's a guard page below the stack.)  Since we've not seen reports
>> of this failure from the wild, I doubt it's worth the trouble.

I've discovered that the out-of-stack segfault can be reached without
hitting an ordinary OOM condition at all. Please look at the demo script
that finds a -v value for the given sql recursion (the predefined range
works for a server built with CPPFLAGS="-Og" ./configure --enable-cassert
--enable-debug, using gcc 11.3 on 64-bit Ubuntu):

for v in `seq 240000 100 260000`; do
  echo "limit -v: $v"
  ulimit -Sv $v
  rm server.log
  pg_ctl -l server.log start
  dropdb test
  createdb test
  cat << 'EOF' | psql test >psql.log 2>&1
create function explainer(text) returns setof text language plpgsql as $$
declare
    ln text;
begin
    for ln in execute format('explain analyze %s', $1) loop
        return next ln;
    end loop;
end;
$$;
prepare stmt as select explainer('execute stmt');
select explainer('execute stmt');
EOF
  pg_ctl stop || break
  grep 'was terminated by signal 11' server.log && break;
done

This script fails for me as follows:

limit -v: 241100
waiting for server to start.... done
server started
waiting for server to shut down.......... done
server stopped
2024-03-06 14:45:26.882 UTC [38567] LOG:  server process (PID 38634) was terminated by signal 11: Segmentation fault

(with no out-of-memory errors in the server.log)

Core was generated by `postgres: law test [local] SELECT '.
Program terminated with signal SIGSEGV, Segmentation fault.
warning: Section `.reg-xstate/38634' in core file too small.
#0  0x0000563ea8af7d60 in base_yyparse (yyscanner=yyscanner@entry=0x563eacbeec38) at gram.c:29020
29020   {
(gdb) bt
#0  0x0000563ea8af7d60 in base_yyparse (yyscanner=yyscanner@entry=0x563eacbeec38) at gram.c:29020
#1  0x0000563ea8b37d4e in raw_parser (str=str@entry=0x563eacbf2c58 "explain analyze execute stmt", mode=<optimized out>) at parser.c:77
#2  0x0000563ea8c19217 in _SPI_prepare_plan (src=src@entry=0x563eacbf2c58 "explain analyze execute stmt", plan=plan@entry=0x7fffb16323b0) at spi.c:2235
#3  0x0000563ea8c1c5f5 in SPI_cursor_parse_open (name=name@entry=0x0, src=src@entry=0x563eacbf2c58 "explain analyze execute stmt", options=options@entry=0x7fffb1632450) at spi.c:1554
...
#11389 0x0000563ea8d8bbf2 in exec_simple_query (query_string=query_string@entry=0x563eaa5c6328 "select explainer('execute stmt');") at postgres.c:1273
#11390 0x0000563ea8d8daae in PostgresMain (dbname=<optimized out>, username=<optimized out>) at postgres.c:4675
#11391 0x0000563ea8cf9c5d in BackendRun (port=port@entry=0x563eaa5f38c0) at postmaster.c:4475
#11392 0x0000563ea8cfcc10 in BackendStartup (port=port@entry=0x563eaa5f38c0) at postmaster.c:4151
#11393 0x0000563ea8cfcdae in ServerLoop () at postmaster.c:1769
#11394 0x0000563ea8cfe120 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x563eaa5c1760) at postmaster.c:1468
#11395 0x0000563ea8c3148d in main (argc=3, argv=0x563eaa5c1760) at main.c:197
(gdb) i reg rbp rsp
rbp            0x563eacbeec38      0x563eacbeec38
rsp            0x7fffb16316b0      0x7fffb16316b0
(gdb) x/4 0x7fffb1631ff0
0x7fffb1631ff0: Cannot access memory at address 0x7fffb1631ff0
(gdb) x/4 0x7fffb1632000
0x7fffb1632000: 0       0       0       0
(gdb) p stack_base_ptr
$1 = 0x7fffb17ac660 "\001"
(gdb) p stack_base_ptr - $rsp
$2 = 1552304

So it looks like a very specific corner stack-overflow case.

Best regards,
Alexander