Thread: BUG #17326: Postgres crashed when pg_reload_conf() with ssl certificate parameters

BUG #17326: Postgres crashed when pg_reload_conf() with ssl certificate parameters

From: PG Bug reporting form

The following bug has been logged on the website:

Bug reference:      17326
Logged by:          James Pang
Email address:      chaolpan@cisco.com
PostgreSQL version: 13.4
Operating system:   RHEL8.4
Description:

We need SSL enabled for our production environment. When I tested renewing an SSL
certificate and ran pg_reload_conf(), the server crashed. Even with the same certificate
and SSL parameters, running pg_reload_conf() repeatedly often leads to a Postgres crash.
For example:

 =# select name,setting from pg_settings where name like 'ssl_%' order by name;
                  name                  |                setting
----------------------------------------+---------------------------------------
 ssl_ca_file                            | /var/lib/pgsql/sslcerts/awstestca.crt
 ssl_cert_file                          | /var/lib/pgsql/sslcerts/server.crt
 ssl_ciphers                            | HIGH:MEDIUM:+3DES:!aNULL
 ssl_crl_file                           |
 ssl_dh_params_file                     |
 ssl_ecdh_curve                         | prime256v1
 ssl_key_file                           | /var/lib/pgsql/sslcerts/server.key
 ssl_library                            | OpenSSL
 ssl_max_protocol_version               |
 ssl_min_protocol_version               | TLSv1.2
 ssl_passphrase_command                 |
 ssl_passphrase_command_supports_reload | off
 ssl_prefer_server_ciphers              | on
(13 rows)

 =# select pg_reload_conf();
 pg_reload_conf
----------------
 t
(1 row)

 =# select pg_reload_conf();
 pg_reload_conf
----------------
 t
(1 row)

 =# select pg_reload_conf();
FATAL:  terminating connection due to unexpected postmaster exit
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.


RE: BUG #17326: Postgres crashed when pg_reload_conf() with ssl certificate parameters

From: "James Pang (chaolpan)"

The postgres logs show:
2021-12-08 03:57:55.826 UTC::@:[1291058]:[9-1]:2021-12-08 03:33:21 UTC:LOG:  received SIGHUP, reloading configuration files
2021-12-08 03:58:02.832 UTC::@:[1291058]:[10-1]:2021-12-08 03:33:21 UTC:LOG:  received SIGHUP, reloading configuration files
2021-12-08 03:58:03.143 UTC:10.240.212.242(58646):jamet@jamet:[1291076]:[9-1]:2021-12-08 03:33:24 UTC:testsubLOG:  disconnection: session time: 0:24:38.967 user=jamet database=jamet host=10.240.212.242 port=58646
2021-12-08 03:58:03.147 UTC:[local]:postgres@jamet:[1291397]:[3-1]:2021-12-08 03:57:02 UTC:psqlFATAL:  terminating connection due to unexpected postmaster exit
2021-12-08 03:58:03.147 UTC:[local]:postgres@jamet:[1291397]:[4-1]:2021-12-08 03:57:02 UTC:psqlLOG:  disconnection: session time: 0:01:00.405 user=postgres database=jamet host=[local]

James

-----Original Message-----
From: PG Bug reporting form <noreply@postgresql.org> 
Sent: Wednesday, December 8, 2021 12:03 PM
To: pgsql-bugs@lists.postgresql.org
Cc: James Pang (chaolpan) <chaolpan@cisco.com>
Subject: BUG #17326: Postgres crashed when pg_reload_conf() with ssl certificate parameters



> On Wed, Dec 08, 2021 at 06:22:11AM +0000, James Pang (chaolpan) wrote:

Hi,

Thanks for reporting the issue. Any chance to get a stack trace
corresponding to the crash, e.g. like in [1]?

[1]: https://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD



RE: BUG #17326: Postgres crashed when pg_reload_conf() with ssl certificate parameters

From: "James Pang (chaolpan)"

It looks like this issue is related to the "set_user" extension. After I removed all extensions, pg_reload_conf() worked
without issue. When I installed and enabled the "set_user" extension again, the issue was reproduced.
shared_preload_libraries = 'orafce,pgaudit,pg_cron,pg_stat_statements,pg_prewarm,set_user'
#set_user
set_user.superuser_whitelist = '+dba'
#set_user.superuser_allowlist = '+dba'
set_user.block_log_statement=on
set_user.nosuperuser_target_whitelist = ''
#set_user.nosuperuser_target_allowlist = ''

I will try to get a stack trace and post an update.

James

-----Original Message-----
From: Dmitry Dolgov <9erthalion6@gmail.com>
Sent: Wednesday, December 8, 2021 9:46 PM
To: James Pang (chaolpan) <chaolpan@cisco.com>
Cc: pgsql-bugs@lists.postgresql.org
Subject: Re: BUG #17326: Postgres crashed when pg_reload_conf() with ssl certificate parameters



RE: BUG #17326: Postgres crashed when pg_reload_conf() with ssl certificate parameters

From: "James Pang (chaolpan)"

I tried to install debug info and get a stack trace:
1. Using the core dump:
 ]$ gdb -q -c /pgdata/core.1317550.sig11.1639122870s /usr/pgsql-13/bin/postgres
Reading symbols from /usr/pgsql-13/bin/postgres...Reading symbols from .gnu_debugdata for /usr/pgsql-13/bin/postgres...(no debugging symbols found)...done.
(no debugging symbols found)...done.

warning: Can't open file (null) during file-backed mapping note processing

warning: Can't open file (null) during file-backed mapping note processing

warning: Can't open file (null) during file-backed mapping note processing
[New LWP 1317550]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/pgsql-13/bin/postgres'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f72e3290094 in asn1_string_embed_free () from /lib64/libcrypto.so.1.1

2. With gdb attached, the log shows:
Program received signal SIGHUP, Hangup.
0x00007f4fb438e25b in select () from /lib64/libc.so.6
Continuing.

Program received signal SIGHUP, Hangup.
0x00007f4fb438e25b in select () from /lib64/libc.so.6
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x00007f4fb5eef094 in asn1_string_embed_free () from /lib64/libcrypto.so.1.1
Continuing.

Program terminated with signal SIGSEGV, Segmentation fault.
The program no longer exists.

Should I install debug info for set_user module too?

Thanks,

James

-----Original Message-----
From: James Pang (chaolpan)
Sent: Thursday, December 9, 2021 11:34 AM
To: Dmitry Dolgov <9erthalion6@gmail.com>
Cc: pgsql-bugs@lists.postgresql.org
Subject: RE: BUG #17326: Postgres crashed when pg_reload_conf() with ssl certificate parameters


> On Fri, Dec 10, 2021 at 09:05:19AM +0000, James Pang (chaolpan) wrote:
> Should I install debug info for set_user module too?

Eventually yes, but judging from the logs you've posted
("/usr/pgsql-13/bin/postgres...(no debugging symbols found)") the
debugging symbols for postgres itself are not there yet. Do you get a
meaningful stack trace from the coredump with the `bt` command right now?



RE: BUG #17326: Postgres crashed when pg_reload_conf() with ssl certificate parameters

From: "James Pang (chaolpan)"

1. Attach gdb to the postgres process:
  ]# ps -ef | grep postgres
postgres    8790       1  4 06:53 ?        00:00:00 /usr/pgsql-13/bin/postgres
# gdb -p 8790
...
Attaching to process 8790
Reading symbols from /usr/pgsql-13/bin/postgres...Reading symbols from .gnu_debugdata for /usr/pgsql-13/bin/postgres...(no debugging symbols found)...done.

2. start another psql session to run pg_reload_conf()
    jamet=# select pg_reload_conf();
 pg_reload_conf
----------------
 t
(1 row)

Edit postgresql.conf to change ssl_certificate parameter ,

3. (gdb) cont
Continuing.
[Detaching after fork from child process 8828]

Program received signal SIGHUP, Hangup.
0x00007ff49879d25b in select () from /lib64/libc.so.6
(gdb) cont
Continuing.

4. Run pg_reload_conf() again in the psql session:
 $ psql
select pg_reload_conf();

5. gdb receives SIGSEGV:
(gdb) cont
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x00007ff49a2fe094 in asn1_string_embed_free () from /lib64/libcrypto.so.1.1
(gdb) bt
#0  0x00007ff49a2fe094 in asn1_string_embed_free () from /lib64/libcrypto.so.1.1
#1  0x00007ff49a30824f in asn1_primitive_free.localalias () from /lib64/libcrypto.so.1.1
#2  0x00007ff49a3086b8 in asn1_template_free () from /lib64/libcrypto.so.1.1
#3  0x00007ff49a308376 in asn1_item_embed_free () from /lib64/libcrypto.so.1.1
#4  0x00007ff49a3086b8 in asn1_template_free () from /lib64/libcrypto.so.1.1
#5  0x00007ff49a308376 in asn1_item_embed_free () from /lib64/libcrypto.so.1.1
#6  0x00007ff49a3086b8 in asn1_template_free () from /lib64/libcrypto.so.1.1
#7  0x00007ff49a308376 in asn1_item_embed_free () from /lib64/libcrypto.so.1.1
#8  0x00007ff49a3085d9 in ASN1_item_free () from /lib64/libcrypto.so.1.1
#9  0x00007ff49a78059c in ssl_cert_clear_certs () from /lib64/libssl.so.1.1
#10 0x00007ff49a780645 in ssl_cert_free () from /lib64/libssl.so.1.1
#11 0x00007ff49a78a25c in SSL_CTX_free () from /lib64/libssl.so.1.1
#12 0x000000000068b6b8 in be_tls_init ()
#13 0x00000000007271e1 in SIGHUP_handler ()
#14 <signal handler called>
#15 0x00007ff49879d25b in select () from /lib64/libc.so.6
#16 0x000000000072a20c in ServerLoop ()
#17 0x000000000072bd10 in PostmasterMain ()
#18 0x00000000004869a0 in main ()
(gdb) cont
Continuing.

Program terminated with signal SIGSEGV, Segmentation fault.
The program no longer exists.

Thanks,

James
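
As a minimal sketch, the backtrace above can be checked mechanically to see which frames resolve inside the postgres binary and whether a particular function, for example secure_initialize, appears at all. The helper names `parse_backtrace` and `frames_in_executable` are hypothetical, and the regular expression assumes gdb's `bt` line format exactly as it is printed in this thread.

```python
import re

# Assumed gdb "bt" line shapes, taken from the trace in this thread:
#   #12 0x000000000068b6b8 in be_tls_init ()
#   #11 0x00007ff49a78a25c in SSL_CTX_free () from /lib64/libssl.so.1.1
FRAME_RE = re.compile(
    r"^#(?P<num>\d+)\s+(?:0x[0-9a-fA-F]+ in )?(?P<func>\S+) \(.*?\)"
    r"(?: from (?P<lib>\S+))?\s*$"
)


def parse_backtrace(text):
    """Return a list of (frame_number, function, library_or_None)."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:  # lines like "#14 <signal handler called>" are skipped
            frames.append(
                (int(match.group("num")), match.group("func"), match.group("lib"))
            )
    return frames


def frames_in_executable(frames):
    """Functions printed without a 'from <lib>' suffix (the main binary)."""
    return [func for _, func, lib in frames if lib is None]


# Frames #11 to #18 copied from the report above.
SAMPLE_BT = """\
#11 0x00007ff49a78a25c in SSL_CTX_free () from /lib64/libssl.so.1.1
#12 0x000000000068b6b8 in be_tls_init ()
#13 0x00000000007271e1 in SIGHUP_handler ()
#14 <signal handler called>
#15 0x00007ff49879d25b in select () from /lib64/libc.so.6
#16 0x000000000072a20c in ServerLoop ()
#17 0x000000000072bd10 in PostmasterMain ()
#18 0x00000000004869a0 in main ()
"""

frames = parse_backtrace(SAMPLE_BT)
print(frames_in_executable(frames))
print("secure_initialize" in [func for _, func, _ in frames])
```

For the frames quoted here the second line prints False, since secure_initialize does not appear in this trace. The sketch only reports what gdb printed; it does not explain why a frame is missing.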
-----Original Message-----
From: Dmitry Dolgov <9erthalion6@gmail.com>
Sent: Friday, December 10, 2021 10:23 PM
To: James Pang (chaolpan) <chaolpan@cisco.com>
Cc: pgsql-bugs@lists.postgresql.org
Subject: Re: BUG #17326: Postgres crashed when pg_reload_conf() with ssl certificate parameters



On Mon, Dec 13, 2021 at 07:06:16AM +0000, James Pang (chaolpan) wrote:
> Edit postgresql.conf to change ssl_certificate parameter ,

Do you mean ssl_cert_file here?  Also, something that's not completely
clear to me is if this is a problem with a vanilla PostgreSQL
instance or if this is related to the pgaudit extension set_user, as
it has been mentioned as one potential origin of the problem upthread,
but you are not telling if this is the case here.  So what do you have
for shared_preload_libraries in this crash?

> #9  0x00007ff49a78059c in ssl_cert_clear_certs () from /lib64/libssl.so.1.1
> #10 0x00007ff49a780645 in ssl_cert_free () from /lib64/libssl.so.1.1
> #11 0x00007ff49a78a25c in SSL_CTX_free () from /lib64/libssl.so.1.1
> #12 0x000000000068b6b8 in be_tls_init ()
> #13 0x00000000007271e1 in SIGHUP_handler ()

Why is secure_initialize() not showing up in this stack?  That would
be the caller of be_tls_init() in the SIGHUP handler.  The version of
OpenSSL you are linking your binaries to would be useful here.  That
would be a 1.1.0 or a 1.1.1, no?  Any specific minor version letter?
--
Michael

> On Mon, Dec 13, 2021 at 08:10:57PM +0900, Michael Paquier wrote:
> Why is secure_initialize() not showing up in this stack?  That would
> be the caller of be_tls_init() in the SIGHUP handler.  The version of
> OpenSSL you are linking your binaries to would be useful here.  That
> would be a 1.1.0 or a 1.1.1, no?  Any specific minor version letter?

I think I can actually reproduce the issue. In my case the stack is
fine, it contains secure_initialize, and overall it looks like some sort
of memory corruption -- at least openssl gets segfault because it can't
access some memory address it tries to verify in asn1_primitive_free.
Not sure yet why, investigating.



> On Tue, Dec 14, 2021 at 04:46:04PM +0100, Dmitry Dolgov wrote:

After a short investigation, it looks like a set_user problem. The
extension has a duplicated set of parameters, where one is the actual set
and the other is the "deprecated options". If I have both sets set
simultaneously in the configuration (e.g. set_user.superuser_whitelist and
set_user.superuser_allowlist), on SIGHUP in

    set_config_option / PGC_STRING branch / makeDefault condition

something weird happens after set_extra_field, and after this point ssl
context memory seems to be corrupted. Right before that an assign_hook
from set_user is invoked to do something around "deprecated" options,
that's why it looks suspicious. As soon as no "deprecated" options left
in the config the issue disappears.
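
A minimal sketch of how one might scan a postgresql.conf for the situation described above. It assumes the `*_whitelist` spellings are the deprecated names (the thread does not say which side of each pair is deprecated) and covers only the two pairs that appear in this thread. The names `DEPRECATED_TO_CURRENT`, `active_settings` and `find_deprecated` are hypothetical, and the parser is deliberately simple: it ignores include directives and trailing comments.

```python
import re

# Assumption: the *_whitelist names are the deprecated spellings.
# Only the two pairs mentioned in this thread are listed.
DEPRECATED_TO_CURRENT = {
    "set_user.superuser_whitelist": "set_user.superuser_allowlist",
    "set_user.nosuperuser_target_whitelist": "set_user.nosuperuser_target_allowlist",
}

# "name = value" or "name value"; the equals sign is optional in postgresql.conf.
SETTING_RE = re.compile(r"^([A-Za-z_][\w.]*)\s*=?\s*(.*)$")


def active_settings(conf_text):
    """Return {name: raw_value} for uncommented lines; later entries win."""
    settings = {}
    for line in conf_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        match = SETTING_RE.match(stripped)
        if match:
            settings[match.group(1).lower()] = match.group(2)
    return settings


def find_deprecated(conf_text):
    """List (deprecated_name, current_name, current_also_set) tuples."""
    settings = active_settings(conf_text)
    return [
        (old, new, new in settings)
        for old, new in DEPRECATED_TO_CURRENT.items()
        if old in settings
    ]


# Excerpt from the configuration posted earlier in the thread.
SAMPLE_CONF = """\
set_user.superuser_whitelist = '+dba'
#set_user.superuser_allowlist = '+dba'
set_user.block_log_statement=on
set_user.nosuperuser_target_whitelist = ''
#set_user.nosuperuser_target_allowlist = ''
"""

for old, new, both in find_deprecated(SAMPLE_CONF):
    note = "both names are set" if both else "only the old name is set"
    print(f"{old} -> consider {new} ({note})")
```

Whether renaming the parameters is enough to avoid the crash is not established by this sketch; it only flags the configuration pattern reported above.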



On Tue, Dec 14, 2021 at 06:36:54PM +0100, Dmitry Dolgov wrote:
> something weird happens after set_extra_field, and after this point ssl
> context memory seems to be corrupted. Right before that an assign_hook
> from set_user is invoked to do something around "deprecated" options,
> that's why it looks suspicious. As soon as no "deprecated" options left
> in the config the issue disappears.

Hmm, okay.  Thanks.  I have no idea if this extension is doing
something it should not, but I'd like to keep in mind that there could
be something that could be improved in core depending on what this
module is trying to achieve.  At least that's a possibility.
--
Michael


RE: BUG #17326: Postgres crashed when pg_reload_conf() with ssl certificate parameters

From: "James Pang (chaolpan)"

This is a new project that requires security compliance. SSL is a must here, and pgaudit and set_user are installed
to meet the compliance requirements. We tested renewing the SSL certificate by changing the ssl_cert_file and
ssl_key_file parameters to point to the renewed certificate files.
ssl = on
ssl_ciphers = 'HIGH:MEDIUM:+3DES:!aNULL'

ssl_crl_file = ''
#ssl_min_protocol_version = 'TLSv1.2'
ssl_ca_file = '/var/lib/pgsql/sslrenew/idtrca.cer'
#ssl_cert_file = '/var/lib/pgsql/sslrenew/postgres-109798.crt'
#ssl_key_file = '/var/lib/pgsql/sslrenew/postgres-109798.key'

ssl_cert_file = '/var/lib/pgsql/sslrenew/postgres014-110388.crt'
ssl_key_file = '/var/lib/pgsql/sslrenew/postgres014-11038.key'

--
shared_preload_libraries = 'orafce,pgaudit,pg_cron,pg_stat_statements,pg_prewarm,set_user'
pgaudit.log_catalog='on'
pgaudit.log_level='log'
pgaudit.log_parameter=on
pgaudit.log_statement_once=off
pgaudit.log='all, -misc'
pgaudit.log='ddl,role'
pgaudit.role='postgres,jamet'

#set_user
set_user.superuser_whitelist = '+dba'
#set_user.superuser_allowlist = '+dba'
set_user.block_log_statement=on
#set_user.nosuperuser_target_whitelist = ''
set_user.nosuperuser_target_allowlist = ''

#pre_warm
pg_prewarm.autoprewarm = true
pg_prewarm.autoprewarm_interval = 600


The operating system has also had some security hardening applied to meet compliance requirements. OpenSSL is 1.1.1g
with FIPS enabled:
$ openssl version
OpenSSL 1.1.1g FIPS  21 Apr 2020


Yes. Interestingly, when I removed all extensions and ran the test again, then installed orafce, pg_background, and
pgaudit, the issue did not reproduce. Installing the set_user RPM was still fine, but once I ran CREATE EXTENSION for
it again, the issue reproduced.

=# \dx
                                                       List of installed extensions
        Name        | Version |   Schema   |                                          Description
--------------------+---------+------------+-----------------------------------------------------------------------------------------------
 amcheck            | 1.2     | public     | functions for verifying relation integrity
 orafce             | 3.15    | public     | Functions and operators that emulate a subset of functions and packages from the Oracle RDBMS
 pageinspect        | 1.8     | public     | inspect the contents of database pages at a low level
 pg_background      | 1.0     | public     | Run SQL queries in the background
 pg_buffercache     | 1.3     | public     | examine the shared buffer cache
 pg_cron            | 1.4     | public     | Job scheduler for PostgreSQL
 pg_freespacemap    | 1.2     | public     | examine the free space map (FSM)
 pg_permissions     | 1.1     | public     | view object permissions and compare them with the desired state
 pg_stat_statements | 1.8     | public     | track planning and execution statistics of all SQL statements executed
 pgaudit            | 1.5     | public     | provides auditing functionality
 pgstattuple        | 1.5     | public     | show tuple-level statistics
 plpgsql            | 1.0     | pg_catalog | PL/pgSQL procedural language
 postgres_fdw       | 1.0     | public     | foreign-data wrapper for remote PostgreSQL servers
 set_user           | 3.0     | public     | similar to SET ROLE but with added logging
(14 rows)


Thanks,

James

-----Original Message-----
From: Dmitry Dolgov <9erthalion6@gmail.com>
Sent: Tuesday, December 14, 2021 11:46 PM
To: Michael Paquier <michael@paquier.xyz>
Cc: James Pang (chaolpan) <chaolpan@cisco.com>; pgsql-bugs@lists.postgresql.org
Subject: Re: BUG #17326: Postgres crashed when pg_reload_conf() with ssl certificate parameters

> On Mon, Dec 13, 2021 at 08:10:57PM +0900, Michael Paquier wrote:
> On Mon, Dec 13, 2021 at 07:06:16AM +0000, James Pang (chaolpan) wrote:
> > Edit postgresql.conf to change ssl_certificate parameter ,
>
> Do you mean ssl_cert_file here?  Also, something that's not completely
> clear to me is if this is a problem with a vanilla PostgreSQL instance
> or if this is related to the pgaudit extension set_user, as it has
> been mentioned as one potential origin of the problem upthread, but
> you are not telling if this is the case here.  So what do you have for
> shared_preload_libraries in this crash?
>
> > #9  0x00007ff49a78059c in ssl_cert_clear_certs () from
> > /lib64/libssl.so.1.1
> > #10 0x00007ff49a780645 in ssl_cert_free () from /lib64/libssl.so.1.1
> > #11 0x00007ff49a78a25c in SSL_CTX_free () from /lib64/libssl.so.1.1
> > #12 0x000000000068b6b8 in be_tls_init ()
> > #13 0x00000000007271e1 in SIGHUP_handler ()
>
> Why is secure_initialize() not showing up in this stack?  That would
> be the caller of be_tls_init() in the SIGHUP handler.  The version of
> OpenSSL you are linking your binaries to would be useful here.  That
> would be a 1.1.0 or a 1.1.1, no?  Any specific minor version letter?

I think I can actually reproduce the issue. In my case the stack is fine, it contains secure_initialize, and overall it
looks like some sort of memory corruption -- at least openssl gets segfault because it can't access some memory address
it tries to verify in asn1_primitive_free.
Not sure yet why, investigating.


