Thread: Segfault on postgresql 12.3

Segfault on postgresql 12.3

From
Thomas SIMON
Date:
Hi all,

I just saw strange behavior on my PostgreSQL instance: the server
restarted on its own.

Looking through the logs, I found a segfault in kern.log:

[12:24:09]root@db12:~# cat /var/log/kern.log
2020-08-21T12:00:01.436378+02:00 db12 kernel: postgres[177990]: segfault 
at 0 ip 00005636d2d844f1 sp 00007fff4fa69910 error 4 in 
postgres[5636d2cb7000+775000]
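
For reference, the numeric fields of the kernel line above can be decoded by hand. The following is a minimal sketch, not part of the original report: the helper names fault_offset and decode_fault_error are hypothetical, and the bit meanings are those of the x86 page-fault error code (bit 0: protection violation vs. page not present, bit 1: write vs. read, bit 2: user vs. kernel mode).

```shell
#!/bin/sh
# Decode two fields of a kernel "segfault" line.
# Helper names are hypothetical; the values passed at the bottom are
# copied from the kern.log line above.

# Offset of the faulting instruction inside the mapping: ip - mapping base.
fault_offset() {
    printf '0x%x\n' $(( 0x$1 - 0x$2 ))
}

# Decode the low three bits of the x86 page-fault error code.
decode_fault_error() {
    code=$1
    if [ $(( $code & 1 )) -ne 0 ]; then cause="protection violation"; else cause="page not present"; fi
    if [ $(( $code & 2 )) -ne 0 ]; then access="write"; else access="read"; fi
    if [ $(( $code & 4 )) -ne 0 ]; then mode="user mode"; else mode="kernel mode"; fi
    echo "$access in $mode, $cause"
}

fault_offset 00005636d2d844f1 5636d2cb7000   # ip, then base from postgres[base+size]
decode_fault_error 4                         # the "error 4" field
```

For this line the offset is 0xcd4f1, and error 4 is a user-mode read of a page that is not present. Together with the fault address 0, that is the usual signature of a NULL pointer dereference. Assuming the reported base is also the load base of the executable (not verified here), the offset could be passed to addr2line -f -e with the postgres binary once debug symbols are installed.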

I've also enabled core dumps; the output of file on the core is:

[12:24:13]root@db12:~# file /data/postgresql/12/main/core
/data/postgresql/12/main/core: ELF 64-bit LSB core file x86-64, version 
1 (SYSV), SVR4-style, from 'postgres: 12/main: supervision neteven2 
localhost(34868) SELECT', real uid: 110, effective uid: 110, real gid: 
114, effective gid: 114, execfn: '/usr/lib/postgresql/12/bin/postgres', 
platform: 'x86_64'


In the PostgreSQL logs, I have these messages:

2020-08-21 12:00:01.451 CEST [274137]: [299-1] user=,db=,app=,client= 
LOG:  server process (PID 177990) was terminated by signal 11: 
Segmentation fault
2020-08-21 12:00:01.451 CEST [274137]: [300-1] user=,db=,app=,client= 
DETAIL:  Failed process was running: SELECT usename,count(*) FROM 
pg_stat_activity WHERE pid != pg_backend_pid() GROUP BY usename ORDER BY 1

..
2020-08-21 12:00:02.776 CEST [274137]: [302-1] user=,db=,app=,client= 
LOG:  archiver process (PID 274215) exited with exit code 1
2020-08-21 12:00:02.774 CEST [274214]: [1-1] user=,db=,app=,client= 
WARNING:  terminating connection because of crash of another server process
2020-08-21 12:00:02.774 CEST [274214]: [2-1] user=,db=,app=,client= 
DETAIL:  The postmaster has commanded this server process to roll back 
the current transaction and exit, because another server process 
exited abnormally and possibly corrupted shared memory.
2020-08-21 12:00:02.774 CEST [274214]: [3-1] user=,db=,app=,client= 
HINT:  In a moment you should be able to reconnect to the database and 
repeat your command.
(repeated many times until the full restart)


I'm on version 12.3, on a dedicated on-prem host.

root@db12:~# dpkg -l | grep postgresql
ii  pgdg-keyring              2018.2             all    keyring for apt.postgresql.org
ii  postgresql-12             12.3-1.pgdg90+1    amd64  object-relational SQL database, version 12 server
ii  postgresql-12-repmgr      5.1.0-1.stretch+1  amd64  replication manager for PostgreSQL 12
ii  postgresql-client-12      12.3-1.pgdg90+1    amd64  front-end programs for PostgreSQL 12
ii  postgresql-client-common  215.pgdg90+1       all    manager for multiple PostgreSQL client versions
ii  postgresql-common         215.pgdg90+1       all    PostgreSQL database-cluster manager
ii  postgresql-server-dev-12  12.3-1.pgdg90+1    amd64  development files for PostgreSQL 12 server-side programming


Could you please help me find the root cause?

Best regards
thomas




Re: Segfault on postgresql 12.3

From
Julien Rouhaud
Date:
On Fri, Aug 21, 2020 at 2:25 PM Thomas SIMON <tsimon@neteven.com> wrote:
>
> Hi all,
>
> I just had strange behavior on my postgresql instance, with postgresql
> auto restart
>
> Looking for logs, I've found a segfault in kern.log
>
> [12:24:09]root@db12:~# cat /var/log/kern.log
> 2020-08-21T12:00:01.436378+02:00 db12 kernel: postgres[177990]: segfault
> at 0 ip 00005636d2d844f1 sp 00007fff4fa69910 error 4 in
> postgres[5636d2cb7000+775000]
>
> I've also enabled core dump, file output is :
>
> [12:24:13]root@db12:~# file /data/postgresql/12/main/core
> /data/postgresql/12/main/core: ELF 64-bit LSB core file x86-64, version
> 1 (SYSV), SVR4-style, from 'postgres: 12/main: supervision neteven2
> localhost(34868) SELECT', real uid: 110, effective uid: 110, real gid:
> 114, effective gid: 114, execfn: '/usr/lib/postgresql/12/bin/postgres',
> platform: 'x86_64'
>
>
> In logs , I have these messages
>
> 2020-08-21 12:00:01.451 CEST [274137]: [299-1] user=,db=,app=,client=
> LOG:  server process (PID 177990) was terminated by signal 11:
> Segmentation fault
> 2020-08-21 12:00:01.451 CEST [274137]: [300-1] user=,db=,app=,client=
> DETAIL:  Failed process was running: SELECT usename,count(*) FROM
> pg_stat_activity WHERE pid != pg_backend_pid() GROUP BY usename ORDER BY 1
>
> ..
> 2020-08-21 12:00:02.776 CEST [274137]: [302-1] user=,db=,app=,client=
> LOG:  archiver process (PID 274215) exited with exit code 1
> 2020-08-21 12:00:02.774 CEST [274214]: [1-1] user=,db=,app=,client=
> WARNING:  terminating connection because of crash of another server process
> 2020-08-21 12:00:02.774 CEST [274214]: [2-1] user=,db=,app=,client=
> DETAIL:  The postmaster has commanded this server process to roll back
> the current transaction and exit, because another server process
> exited abnormally and possibly corrupted shared memory.
> 2020-08-21 12:00:02.774 CEST [274214]: [3-1] user=,db=,app=,client=
> HINT:  In a moment you should be able to reconnect to the database and
> repeat your command.
> (many times until full restart)
>
>
> I'm on 12.3 version, on a dedicated host on prem.

Note that version 12.4 is now available; however, I don't see any relevant fix in it.

> root@db12:~# dpkg -l | grep postgresql
> ii  pgdg-keyring 2018.2                                 all
> keyring for apt.postgresql.org
> ii  postgresql-12 12.3-1.pgdg90+1                        amd64
> object-relational SQL database, version 12 server
> ii  postgresql-12-repmgr 5.1.0-1.stretch+1
> amd64        replication manager for PostgreSQL 12
> ii  postgresql-client-12 12.3-1.pgdg90+1
> amd64        front-end programs for PostgreSQL 12
> ii  postgresql-client-common 215.pgdg90+1
> all          manager for multiple PostgreSQL client versions
> ii  postgresql-common 215.pgdg90+1
> all          PostgreSQL database-cluster manager
> ii  postgresql-server-dev-12 12.3-1.pgdg90+1
> amd64        development files for PostgreSQL 12 server-side programming
>
>
> Could you please help me to find what is the root cause ?

This is unfortunately not enough information to find the root issue.

Do you have any custom extension?  Is there any chance you can get a
backtrace of the generated coredump? See

https://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD#Getting_a_trace_from_a_randomly_crashing_backend
for more details on how to do that.
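
As an illustration of the request above, a backtrace can be pulled from a core file without an interactive gdb session. This is a minimal sketch, not taken from the wiki page: the function name core_backtrace is hypothetical, and it assumes gdb is installed. Function names and arguments only resolve fully if the matching debug-symbol package is installed; on apt.postgresql.org it is commonly named postgresql-12-dbgsym, but check the name for your distribution.

```shell
#!/bin/sh
# Print a full backtrace from a core file in batch mode.
# core_backtrace is a hypothetical helper name.
# Usage: core_backtrace <postgres-binary> <core-file>
core_backtrace() {
    binary=$1
    core=$2
    # Refuse to run if either file is missing or unreadable.
    if [ ! -r "$binary" ] || [ ! -r "$core" ]; then
        echo "usage: core_backtrace <binary> <core>" >&2
        return 2
    fi
    # gdb itself is an assumption about the host.
    if ! command -v gdb >/dev/null 2>&1; then
        echo "gdb not found" >&2
        return 3
    fi
    # -batch exits after running the -ex commands; "bt full" also prints locals.
    gdb -q -batch -ex 'bt full' "$binary" "$core"
}

# Example call with the paths mentioned in this thread:
#   core_backtrace /usr/lib/postgresql/12/bin/postgres /data/postgresql/12/main/core
```

Redirecting the output to a file makes it easy to attach to a reply on the list.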



Re: Segfault on postgresql 12.3

From
Thomas SIMON
Date:
Hi Julien,

thanks for answering me.

The only extension I use is repmgr.


I've tried to use gdb to see something (I don't know if I'm using it 
correctly); the backtrace is below:

[16:03:13]root@db13:/tmp$ gdb -q -c /tmp/core 
/usr/lib/postgresql/12/bin/postgres
Reading symbols from /usr/lib/postgresql/12/bin/postgres...(no debugging 
symbols found)...done.
[New LWP 177990]

warning: Unexpected size of section `.reg-xstate/177990' in core file.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `postgres: 12/main: supervision neteven2 
localhost(34868) SELECT               '.
Program terminated with signal SIGSEGV, Segmentation fault.

warning: Unexpected size of section `.reg-xstate/177990' in core file.
#0  0x00005636d2d844f1 in equalTupleDescs ()
(gdb)
(gdb) bt
#0  0x00005636d2d844f1 in equalTupleDescs ()
#1  0x00005636d31a65cf in ?? ()
#2  0x00005636d31b5fd3 in hash_search_with_hash_value ()
#3  0x00005636d31a83b1 in assign_record_type_typmod ()
#4  0x00005636d31b4855 in ?? ()
#5  0x00005636d31b4b43 in get_expr_result_type ()
#6  0x00005636d31b4b7b in get_expr_result_tupdesc ()
#7  0x00005636d2e8bcce in get_rte_attribute_is_dropped ()
#8  0x00005636d303fc7a in AcquireRewriteLocks ()
#9  0x00005636d304039e in ?? ()
#10 0x00005636d3043aa2 in QueryRewrite ()
#11 0x00005636d307de90 in ?? ()
#12 0x00005636d307df70 in pg_analyze_and_rewrite ()
#13 0x00005636d307e68f in ?? ()
#14 0x00005636d30804ad in PostgresMain ()
#15 0x00005636d2d73f00 in ?? ()
#16 0x00005636d3006f89 in PostmasterMain ()
#17 0x00005636d2d75128 in main ()
(gdb) cont
The program is not being run.


thomas




Re: Segfault on postgresql 12.3

From
Julien Rouhaud
Date:
On Fri, Aug 21, 2020 at 4:13 PM Thomas SIMON <tsimon@neteven.com> wrote:
>
> Hi Julien,
>
> thanks for answering me.
>
> The only extension I use is repmgr.

Ok, this shouldn't be a problem.

> I've tried to use gdb to see something (I don't know if I'm using it
> correctly); the backtrace is below:
>
> [16:03:13]root@db13:/tmp$ gdb -q -c /tmp/core
> /usr/lib/postgresql/12/bin/postgres
> Reading symbols from /usr/lib/postgresql/12/bin/postgres...(no debugging
> symbols found)...done.
> [New LWP 177990]
>
> warning: Unexpected size of section `.reg-xstate/177990' in core file.
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> Core was generated by `postgres: 12/main: supervision neteven2
> localhost(34868) SELECT               '.
> Program terminated with signal SIGSEGV, Segmentation fault.
>
> warning: Unexpected size of section `.reg-xstate/177990' in core file.
> #0  0x00005636d2d844f1 in equalTupleDescs ()
> (gdb)
> (gdb) bt
> #0  0x00005636d2d844f1 in equalTupleDescs ()
> #1  0x00005636d31a65cf in ?? ()
> #2  0x00005636d31b5fd3 in hash_search_with_hash_value ()
> #3  0x00005636d31a83b1 in assign_record_type_typmod ()
> #4  0x00005636d31b4855 in ?? ()
> #5  0x00005636d31b4b43 in get_expr_result_type ()
> #6  0x00005636d31b4b7b in get_expr_result_tupdesc ()
> #7  0x00005636d2e8bcce in get_rte_attribute_is_dropped ()
> #8  0x00005636d303fc7a in AcquireRewriteLocks ()
> #9  0x00005636d304039e in ?? ()
> #10 0x00005636d3043aa2 in QueryRewrite ()
> #11 0x00005636d307de90 in ?? ()
> #12 0x00005636d307df70 in pg_analyze_and_rewrite ()
> #13 0x00005636d307e68f in ?? ()
> #14 0x00005636d30804ad in PostgresMain ()
> #15 0x00005636d2d73f00 in ?? ()
> #16 0x00005636d3006f89 in PostmasterMain ()
> #17 0x00005636d2d75128 in main ()
> (gdb) cont
> The program is not being run.

Thanks!

I don't see any obvious problem in that code, and it hasn't changed in
a long time, so I'm starting to think this could be a hardware problem.
Do you have any alarming messages in your system logs and/or dmesg?
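
One way to run that check is to filter the kernel log for messages that often accompany failing hardware. This is only a sketch: the function name scan_hw_errors is hypothetical, and the pattern list is a heuristic assumption, not an exhaustive or authoritative set.

```shell
#!/bin/sh
# Filter kernel log text read from stdin for lines that often point at
# hardware trouble (machine checks, memory controller reports, disk I/O
# errors). scan_hw_errors and its pattern list are illustrative only.
scan_hw_errors() {
    grep -iE 'machine check|hardware error|mce:|edac|i/o error'
}

# Example use on a live system (dmesg may require root):
#   dmesg | scan_hw_errors
#   scan_hw_errors < /var/log/kern.log
```

An empty result does not rule out a hardware fault; it only means none of these patterns appeared in the log.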



Re: Segfault on postgresql 12.3

From
Thomas SIMON
Date:
On 22/08/2020 at 10:11, Julien Rouhaud wrote:
> I don't see any obvious problem in that code, and that's something
> that didn't change for a long time so I'm starting to think this could
> be some hardware problem.  Do you have any alarming messages in your
> system logs and/or dmesg?
I checked again, but found nothing relevant.
From what you say, there is not much we can do, so I'm going to keep this
in mind and wait to see if it happens again ...