Re: Fix segfault while accessing half-initialized hash table in pgstat_shmem.c - Mailing list pgsql-hackers
From | Steven Niu |
---|---|
Subject | Re: Fix segfault while accessing half-initialized hash table in pgstat_shmem.c |
Date | |
Msg-id | MN2PR15MB302160E8AC4AA87DE2B95183A701A@MN2PR15MB3021.namprd15.prod.outlook.com |
In response to | Fix segfault while accessing half-initialized hash table in pgstat_shmem.c (Mikhail Kot <mikhail.kot@databricks.com>) |
Responses | Re: Re: Fix segfault while accessing half-initialized hash table in pgstat_shmem.c |
List | pgsql-hackers |
I found many instances of the following pattern:
    ptr_1 = dsa_allocate();
    ptr_2 = dsa_get_address(xxx, ptr_1);
    ptr_2->yyy = zzz;
Inside dsa_get_address(dsa_area *area, dsa_pointer dp):
    /* Convert InvalidDsaPointer to NULL. */
    if (!DsaPointerIsValid(dp))
        return NULL;
So unless dsa_allocate() guarantees that it never returns InvalidDsaPointer, there is a risk of a segfault.
In fact, dsa_allocate() does return InvalidDsaPointer in some cases.
So maybe we should add a pointer check everywhere dsa_get_address() is called. Comments?
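To make the suggested check concrete, here is a minimal sketch of the pattern with the NULL test added. It does not use PostgreSQL's DSA code: every toy_* name below is a hypothetical stand-in invented for illustration, and the allocator is a fixed pool of slots that reports exhaustion by returning an invalid handle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins for the DSA types; NOT the PostgreSQL implementation. */
typedef uint64_t toy_dsa_pointer;
#define TOY_INVALID_DSA_POINTER ((toy_dsa_pointer) 0)
#define TOY_NSLOTS 2

typedef struct toy_entry
{
    int         yyy;
} toy_entry;

typedef struct toy_dsa_area
{
    toy_entry   slots[TOY_NSLOTS];  /* backing storage */
    size_t      used;               /* number of slots handed out */
} toy_dsa_area;

/* Hands out a 1-based handle, or the invalid handle when the pool is full. */
static toy_dsa_pointer
toy_dsa_allocate(toy_dsa_area *area)
{
    assert(area != NULL);
    if (area->used >= TOY_NSLOTS)
        return TOY_INVALID_DSA_POINTER;
    return (toy_dsa_pointer) (++area->used);
}

/* Mirrors the quoted behaviour: an invalid handle is converted to NULL. */
static toy_entry *
toy_dsa_get_address(toy_dsa_area *area, toy_dsa_pointer dp)
{
    if (dp == TOY_INVALID_DSA_POINTER)
        return NULL;
    return &area->slots[dp - 1];
}

/* The pattern from the message, with the proposed pointer check added. */
static bool
toy_store_checked(toy_dsa_area *area, int zzz)
{
    toy_dsa_pointer ptr_1 = toy_dsa_allocate(area);
    toy_entry  *ptr_2 = toy_dsa_get_address(area, ptr_1);

    if (ptr_2 == NULL)
        return false;           /* report failure instead of dereferencing NULL */
    ptr_2->yyy = zzz;
    return true;
}
```

With a zero-initialized toy_dsa_area, the first two calls to toy_store_checked() succeed and the third returns false rather than writing through a NULL pointer. How a real caller should react to the failure (raise an error, retry, skip the work) depends on the call site and is not addressed here.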
From: Mikhail Kot <mikhail.kot@databricks.com>
Sent: Wednesday, September 3, 2025 04:09
To: pgsql-hackers@lists.postgresql.org <pgsql-hackers@lists.postgresql.org>
Cc: to@myrrc.dev <to@myrrc.dev>
Subject: Fix segfault while accessing half-initialized hash table in pgstat_shmem.c
Hi,
I've encountered the following segmentation fault lately. It happens when
Postgres is experiencing high memory pressure. There are multiple OOM errors in
the log as well.
Core was generated by `postgres: neondb_owner neondb ::1(46658) BIND
'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 pg_atomic_read_u32_impl (ptr=0x8) at
../../../../src/include/port/atomics/generic.h:48
#1 pg_atomic_read_u32 (ptr=0x8) at ../../../../src/include/port/atomics.h:239
#2 LWLockAttemptLock (lock=lock@entry=0x4,
mode=mode@entry=LW_EXCLUSIVE) at lwlock.c:821
#3 0x000056446bce129f in LWLockConditionalAcquire (lock=0x4,
mode=mode@entry=LW_EXCLUSIVE) at lwlock.c:1386
#4 0x000056446bd0bacf in pgstat_lock_entry
(entry_ref=entry_ref@entry=0x56446d9f4340, nowait=nowait@entry=true)
at pgstat_shmem.c:625
#5 0x000056446bd0a3c9 in pgstat_relation_flush_cb
(entry_ref=0x56446d9f4340, nowait=<optimized out>) at
pgstat_relation.c:794
#6 0x000056446bd069f5 in pgstat_flush_pending_entries
(nowait=<optimized out>) at pgstat.c:1217
#7 pgstat_report_stat (force=<optimized out>, force@entry=false) at
pgstat.c:658
#8 0x000056446bcf16c1 in PostgresMain (dbname=<optimized out>,
username=<optimized out>) at postgres.c:4623
#9 0x000056446bc716b3 in BackendRun (port=<optimized out>,
port=<optimized out>) at postmaster.c:4465
#10 BackendStartup (port=<optimized out>) at postmaster.c:4193
#11 ServerLoop () at postmaster.c:1782
#12 0x000056446bc726ea in PostmasterMain (argc=argc@entry=3,
argv=argv@entry=0x56446cd803b0) at postmaster.c:1466
#13 0x000056446b9d5a00 in main (argc=3, argv=0x56446cd803b0) at main.c:238
The error originates in pgstat_shmem.c, where shhashent is left in a
half-initialized state if pgstat_init_entry(), which calls dsa_allocate0(),
errors out with OOM. A later access to shhashent then causes a segmentation
fault. I propose a patch that fixes this issue. The patch is for the main
branch, but the code is nearly identical in Postgres 13-17, so I suggest
backporting it to the other supported versions.
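The small addresses in the backtrace (lock=0x4, ptr=0x8) are what one would expect if member offsets were applied to a NULL base pointer. The sketch below illustrates that arithmetic with simplified, hypothetical structs; the field names and sizes are assumptions chosen to show the idea and are not copied from the PostgreSQL headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified layouts; names and sizes are assumptions. */
typedef struct toy_lwlock
{
    uint16_t    tranche;        /* usually followed by 2 bytes of padding */
    uint32_t    state;          /* the word an atomic read would touch */
} toy_lwlock;

typedef struct toy_shared_common
{
    uint32_t    magic;
    toy_lwlock  lock;
} toy_shared_common;

/* Offset of the lock inside the shared entry body. */
static size_t
toy_lock_offset(void)
{
    return offsetof(toy_shared_common, lock);
}

/* Offset of the lock's state word, measured from the start of the body. */
static size_t
toy_state_offset(void)
{
    size_t      off = toy_lock_offset() + offsetof(toy_lwlock, state);

    assert(off > toy_lock_offset());    /* state is not the lock's first member */
    return off;
}
```

On common ABIs where uint32_t is 4-byte aligned, these offsets are 4 and 8. If the pointer to the entry body is NULL, taking the address of the lock gives 0x4 and the address of its state word gives 0x8, which is consistent with frames #0 to #3 above.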
The patch changes pgstat_init_entry() so that it returns NULL if memory
allocation fails. It also adds sanity checks to the routines that accept
values returned by pgstat_init_entry().
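The shape of that change can be sketched as follows. This is not the patch itself: the toy_* names, the struct fields, and the failure-injection flag are all hypothetical, and calloc() stands in for the shared-memory allocator. It only shows an initializer that returns NULL on allocation failure and a consumer that checks before touching the entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not the PostgreSQL structures. */
typedef struct toy_shared_stats
{
    int         magic;
    int         lock;           /* 0 = unlocked, 1 = locked */
} toy_shared_stats;

typedef struct toy_hash_entry
{
    toy_shared_stats *body;     /* NULL until initialization succeeds */
    bool        dropped;
} toy_hash_entry;

/* Test hook (an assumption): when true, the next allocation fails once. */
static bool toy_fail_next_alloc = false;

/* Zeroing allocator that can be forced to fail, to simulate OOM. */
static void *
toy_alloc0(size_t size)
{
    if (toy_fail_next_alloc)
    {
        toy_fail_next_alloc = false;
        return NULL;
    }
    return calloc(1, size);
}

/* Returns the body on success, or NULL and leaves ent->body as NULL. */
static toy_shared_stats *
toy_init_entry(toy_hash_entry *ent)
{
    toy_shared_stats *body;

    assert(ent != NULL);
    body = toy_alloc0(sizeof(*body));
    if (body == NULL)
    {
        ent->body = NULL;       /* never publish a half-initialized entry */
        return NULL;
    }
    body->magic = 0x5743;       /* arbitrary marker value */
    ent->body = body;
    ent->dropped = false;
    return body;
}

/* Sanity check in a consumer: refuse to lock an entry that has no body. */
static bool
toy_lock_entry(toy_hash_entry *ent)
{
    if (ent == NULL || ent->body == NULL)
        return false;
    ent->body->lock = 1;
    return true;
}
```

Setting toy_fail_next_alloc before calling toy_init_entry() makes it return NULL, after which toy_lock_entry() returns false instead of dereferencing a NULL body; a second toy_init_entry() call then succeeds. The same flag hints at one way to exercise the failure path deterministically in a test, which is otherwise hard to hit under real memory pressure.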
Reproducing this behaviour is tricky, because under OOM Postgres doesn't
necessarily reach the point where this specific dsa_allocate0() call fails.