Thread: libpq crashing on macOS during connection startup
I have a macOS web server using Postgres that has been very stable until a month or two ago. If I restart the web server the problem seems to go away for a while, but starts happening again within days. I thought it was a PHP issue as discussed in the link below, but I just noticed in the crash report it seems to be something related to a call from libpq.

https://github.com/shivammathur/homebrew-php/issues/1862

Any ideas or suggestions appreciated.

John DeSoi, Ph.D.

-------------------------------------
Translated Report (Full Report Below)
-------------------------------------

Process: httpd [54877]
Path: /opt/homebrew/*/httpd
Identifier: httpd
Version: ???
Code Type: ARM-64 (Native)
Parent Process: httpd [6040]
Responsible: httpd [6040]
User ID: 502

Date/Time: 2023-11-30 07:06:00.0651 -0600
OS Version: macOS 12.7 (21G816)
Report Version: 12
Anonymous UUID: 750F146C-B2B5-BECA-EC21-1FEC0471D5AC

Time Awake Since Boot: 1000000 seconds

System Integrity Protection: enabled

Crashed Thread: 0 Dispatch queue: com.apple.root.utility-qos

Exception Type: EXC_BAD_ACCESS (SIGSEGV)
Exception Codes: KERN_INVALID_ADDRESS at 0x0000000000000110
Exception Codes: 0x0000000000000001, 0x0000000000000110
Exception Note: EXC_CORPSE_NOTIFY

VM Region Info: 0x110 is not in any region. Bytes before following region: 105553518919408
      REGION TYPE                  START - END               [ VSIZE] PRT/MAX SHRMOD REGION DETAIL
      UNUSED SPACE AT START
--->  MALLOC_NANO (reserved)       600018000000-600020000000 [128.0M] rw-/rwx SM=NUL ...(unallocated)

Application Specific Information:
*** multi-threaded process forked ***
crashed on child side of fork pre-exec

Kernel Triage:
VM - pmap_enter failed with resource shortage
VM - pmap_enter failed with resource shortage

Thread 0 Crashed:: Dispatch queue: com.apple.root.utility-qos
0 libdispatch.dylib 0x199dd825c _dispatch_apply_with_attr_f + 1136
1 libdispatch.dylib 0x199dd8234 _dispatch_apply_with_attr_f + 1096
2 libdispatch.dylib 0x199dd847c dispatch_apply + 108
3 CoreFoundation 0x19a172a80 __104-[CFPrefsSearchListSource synchronouslySendDaemonMessage:andAgentMessage:andDirectMessage:replyHandler:]_block_invoke.92 + 132
4 CoreFoundation 0x19a007e8c CFPREFERENCES_IS_WAITING_FOR_SYSTEM_AND_USER_CFPREFSDS + 100
5 CoreFoundation 0x19a007ccc -[CFPrefsSearchListSource synchronouslySendDaemonMessage:andAgentMessage:andDirectMessage:replyHandler:] + 232
6 CoreFoundation 0x19a00649c -[CFPrefsSearchListSource alreadylocked_generationCountFromListOfSources:count:] + 252
7 CoreFoundation 0x19a006178 -[CFPrefsSearchListSource alreadylocked_getDictionary:] + 468
8 CoreFoundation 0x19a005cec -[CFPrefsSearchListSource alreadylocked_copyValueForKey:] + 172
9 CoreFoundation 0x19a005c20 -[CFPrefsSource copyValueForKey:] + 60
10 CoreFoundation 0x19a005bcc __76-[_CFXPreferences copyAppValueForKey:identifier:container:configurationURL:]_block_invoke + 44
11 CoreFoundation 0x199ffe9e0 __108-[_CFXPreferences(SearchListAdditions) withSearchListForIdentifier:container:cloudConfigurationURL:perform:]_block_invoke + 384
12 CoreFoundation 0x19a173350 -[_CFXPreferences withSearchListForIdentifier:container:cloudConfigurationURL:perform:] + 384
13 CoreFoundation 0x199ffe394 -[_CFXPreferences copyAppValueForKey:identifier:container:configurationURL:] + 168
14 CoreFoundation 0x199ffe2b0 _CFPreferencesCopyAppValueWithContainerAndConfiguration + 128
15 Heimdal 0x1a5d4cb80 init_context_from_config_file + 2732
16 Heimdal 0x1a5d33944 krb5_set_config_files + 392
17 Heimdal 0x1a5d33284 krb5_init_context_flags + 308
18 Heimdal 0x1a5d33144 krb5_init_context + 32
19 Kerberos 0x1a7fc32e8 mshim_ctx + 64
20 Kerberos 0x1a7fc16e4 context_new_ccache_iterator + 92
21 libkrb5.3.3.dylib 0x1017accc8 api_macos_ptcursor_next + 220
22 libkrb5.3.3.dylib 0x1017a9f0c krb5_cccol_cursor_next + 76
23 libkrb5.3.3.dylib 0x1017aa1f4 krb5_cccol_have_content + 92
24 libgssapi_krb5.2.2.dylib 0x1016a1f58 acquire_cred_context + 1668
25 libgssapi_krb5.2.2.dylib 0x1016a185c acquire_cred_from + 688
26 libgssapi_krb5.2.2.dylib 0x101693b8c gss_add_cred_from + 1108
27 libgssapi_krb5.2.2.dylib 0x101693568 gss_acquire_cred_from + 308
28 libgssapi_krb5.2.2.dylib 0x101693428 gss_acquire_cred + 36
29 libpq.5.dylib 0x1012a9db8 pg_GSS_have_cred_cache + 60
30 libpq.5.dylib 0x10129927c PQconnectPoll + 5600
31 libpq.5.dylib 0x10129623c connectDBComplete + 304
32 libpq.5.dylib 0x1012963a8 PQconnectdb + 44
33 libphp.so 0x10229569c pdo_pgsql_handle_factory + 328
34 libphp.so 0x102282230 zim_PDO___construct + 1496
35 libphp.so 0x10249bd0c ZEND_DO_FCALL_SPEC_RETVAL_UNUSED_HANDLER + 304
36 libphp.so 0x102479868 execute_ex + 52
37 libphp.so 0x10244b314 zend_call_function + 1332
38 libphp.so 0x10236cef0 zif_call_user_func_array + 136
39 libphp.so 0x1024b83e4 ZEND_DO_FCALL_BY_NAME_SPEC_RETVAL_USED_HANDLER + 264
40 libphp.so 0x102479868 execute_ex + 52
41 libphp.so 0x102479a64 zend_execute + 288
42 libphp.so 0x102459d84 zend_execute_scripts + 156
43 libphp.so 0x1023ff9a8 php_execute_script + 460
44 libphp.so 0x10253efa8 php_handler + 1024
45 httpd 0x100cc61a4 ap_run_handler + 64
46 httpd 0x100cc687c ap_invoke_handler + 264
47 httpd 0x100cfe364 ap_internal_redirect + 60
48 mod_rewrite.so 0x10204b6d8 handler_redirect + 136
49 httpd 0x100cc61a4 ap_run_handler + 64
50 httpd 0x100cc687c ap_invoke_handler + 264
51 httpd 0x100cfdf3c ap_process_async_request + 792
52 httpd 0x100cfdfec ap_process_request + 24
53 httpd 0x100cfae64 ap_process_http_connection + 344
54 httpd 0x100cd785c ap_run_process_connection + 64
55 mod_mpm_prefork.so 0x1010e23ec child_main + 1092
56 mod_mpm_prefork.so 0x1010e1e74 make_child + 436
57 mod_mpm_prefork.so 0x1010e18b0 prefork_run + 2056
58 httpd 0x100cd9f30 ap_run_mpm + 84
59 httpd 0x100ccd3b4 main + 2260
60 dyld 0x100fd108c start + 520
On 11/30/23 09:45, John DeSoi wrote:
> I have a macOS web server using Postgres that has been very stable until a month or two ago. If I restart the web server the problem seems to go away for a while, but starts happening again within days. I thought it was a PHP issue as discussed in the link below, but I just noticed in the crash report it seems to be something related to a call from libpq.
>
> https://github.com/shivammathur/homebrew-php/issues/1862
>
> Any ideas or suggestions appreciated.

Did you recently get an OpenSSL upgrade to v3.2.0? This is a shot in the dark, but perhaps related to the discussion here?

https://www.postgresql.org/message-id/flat/CAN55FZ1eDDYsYaL7mv%2BoSLUij2h_u6hvD4Qmv-7PK7jkji0uyQ%40mail.gmail.com

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
> On Nov 30, 2023, at 8:59 AM, Joe Conway <mail@joeconway.com> wrote:
>
> Did you recently get an OpenSSL upgrade to v3.2.0? This is a shot in the dark, but perhaps related to the discussion here?
>
> https://www.postgresql.org/message-id/flat/CAN55FZ1eDDYsYaL7mv%2BoSLUij2h_u6hvD4Qmv-7PK7jkji0uyQ%40mail.gmail.com

No, this server is on openssl 3.1.4. But thanks for sending that, I'm about to set up a new server and I'm sure it will end up with the latest versions.

John DeSoi, Ph.D.
On 11/30/23 06:45, John DeSoi wrote:
> I have a macOS web server using Postgres that has been very stable until a month or two ago. If I restart the web server the problem seems to go away for a while, but starts happening again within days. I thought it was a PHP issue as discussed in the link below, but I just noticed in the crash report it seems to be something related to a call from libpq.

What starts happening?

Does the Postgres log show anything?

Postgres version?

How was Postgres installed?

> https://github.com/shivammathur/homebrew-php/issues/1862
>
> Any ideas or suggestions appreciated.
>
> John DeSoi, Ph.D.

--
Adrian Klaver
adrian.klaver@aklaver.com
> On Nov 30, 2023, at 9:36 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>
> What starts happening?

Random web process crashes when connecting to PostgreSQL.

> Does the Postgres log show anything?

No.

> Postgres version?
>
> How was Postgres installed?

PostgreSQL 15.4 installed with Homebrew.

John DeSoi, Ph.D.
On 11/30/23 07:49, John DeSoi wrote:
>> On Nov 30, 2023, at 9:36 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>>
>> What starts happening?
>
> Random web process crashes when connecting to PostgreSQL.
>
>> Does the Postgres log show anything?
>
> No.

To be clear, at the times the Web processes crash there are no traces in the Postgres log of an issue on the Postgres side?

Is there evidence in the Postgres logs of what the Web process was doing just before it crashed?

>> Postgres version?
>>
>> How was Postgres installed?
>
> PostgreSQL 15.4 installed with Homebrew.
>
> John DeSoi, Ph.D.

--
Adrian Klaver
adrian.klaver@aklaver.com
> On Nov 30, 2023, at 10:21 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>
> To be clear, at the times the Web processes crash there are no traces in the Postgres log of an issue on the Postgres side?
>
> Is there evidence in the Postgres logs of what the Web process was doing just before it crashed?

No entry in the Postgres log that I can see. The backtrace I posted in the original message was today at 7:06am. There is nothing in the Postgres log around that time except for some checkpoint messages.

I think the backtrace shows that Postgres has just connected and is authenticating by calling Kerberos, which calls Heimdal and then crashes in CoreFoundation. I also posted this issue on the Heimdal GitHub account.

John DeSoi, Ph.D.
John DeSoi <john@desoi.dev> writes:
>> On Nov 30, 2023, at 8:59 AM, Joe Conway <mail@joeconway.com> wrote:
>> Did you recently get an OpenSSL upgrade to v3.2.0? This is a shot in the dark, but perhaps related to the discussion here?
>> https://www.postgresql.org/message-id/flat/CAN55FZ1eDDYsYaL7mv%2BoSLUij2h_u6hvD4Qmv-7PK7jkji0uyQ%40mail.gmail.com

> No, this server is on openssl 3.1.4. But thanks for sending that, I'm about to set up a new server and I'm sure it will end up with the latest versions.

The crash appears to be happening within GSSAPI authentication, which presumably indicates that we're not using OpenSSL, so that isn't where to look.

What troubles me about that stack trace is the references to Heimdal. We gave up supporting Heimdal (and v16 explicitly rejects building with it) because its support for Kerberos credentials was too incomplete and flaky. So I'm inclined to guess that you are running into some Heimdal bug. Try to rebuild libpq using MIT Kerberos and see if things get better.

regards, tom lane
> On Nov 30, 2023, at 2:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> What troubles me about that stack trace is the references to Heimdal.
> We gave up supporting Heimdal (and v16 explicitly rejects building
> with it) because its support for Kerberos credentials was too
> incomplete and flaky. So I'm inclined to guess that you are running
> into some Heimdal bug. Try to rebuild libpq using MIT Kerberos
> and see if things get better.

I'm using v16 on my development machine and it is crashing on me at times with the same backtrace. Restarting the web server fixes it for a while for some reason.

Is there a way to simply disable GSSAPI authentication? I could not find it.

The builds are from homebrew (https://brew.sh/). I'll have to see if there is a way for me to override build options.

The otool output below shows that Apple's Kerberos is being used and, I assume by extension, their Heimdal library. The Heimdal project told me as much: Apple has a fork and would not pull from their project.

John DeSoi, Ph.D.

$ otool -L /usr/local/opt/postgresql@16/lib/libpq.5.dylib
/usr/local/opt/postgresql@16/lib/libpq.5.dylib:
    /usr/local/opt/postgresql@16/lib/libpq.5.dylib (compatibility version 5.0.0, current version 5.16.0)
    /usr/local/opt/gettext/lib/libintl.8.dylib (compatibility version 13.0.0, current version 13.0.0)
    /usr/local/opt/openssl@3/lib/libssl.3.dylib (compatibility version 3.0.0, current version 3.0.0)
    /usr/local/opt/openssl@3/lib/libcrypto.3.dylib (compatibility version 3.0.0, current version 3.0.0)
    /usr/local/opt/krb5/lib/libgssapi_krb5.2.2.dylib (compatibility version 2.0.0, current version 2.2.0)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1319.100.3)
    /System/Library/Frameworks/LDAP.framework/Versions/A/LDAP (compatibility version 1.0.0, current version 2.4.0)

$ otool -L /usr/local/opt/krb5/lib/libgssapi_krb5.2.2.dylib
/usr/local/opt/krb5/lib/libgssapi_krb5.2.2.dylib:
    /usr/local/opt/krb5/lib/libgssapi_krb5.2.2.dylib (compatibility version 2.0.0, current version 2.2.0)
    @loader_path/libkrb5.3.3.dylib (compatibility version 3.0.0, current version 3.3.0)
    @loader_path/libk5crypto.3.1.dylib (compatibility version 3.0.0, current version 3.1.0)
    @loader_path/libcom_err.3.0.dylib (compatibility version 3.0.0, current version 3.0.0)
    @loader_path/libkrb5support.1.1.dylib (compatibility version 1.0.0, current version 1.1.0)
    /usr/lib/libresolv.9.dylib (compatibility version 1.0.0, current version 1.0.0)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1319.100.3)

$ otool -L /usr/local/opt/krb5/lib/libkrb5.3.3.dylib
/usr/local/opt/krb5/lib/libkrb5.3.3.dylib:
    /usr/local/opt/krb5/lib/libkrb5.3.3.dylib (compatibility version 3.0.0, current version 3.3.0)
    @loader_path/libk5crypto.3.1.dylib (compatibility version 3.0.0, current version 3.1.0)
    @loader_path/libcom_err.3.0.dylib (compatibility version 3.0.0, current version 3.0.0)
    @loader_path/libkrb5support.1.1.dylib (compatibility version 1.0.0, current version 1.1.0)
    /System/Library/Frameworks/Kerberos.framework/Versions/A/Kerberos (compatibility version 5.0.0, current version 6.0.0)
    /usr/lib/libresolv.9.dylib (compatibility version 1.0.0, current version 1.0.0)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1319.100.3)
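The linkage check in the otool output above can be repeated mechanically. The following is only a minimal sketch, not something posted in the thread: it parses text in the layout `otool -L` prints and reports whether any listed library lives inside Apple's Kerberos.framework. The function names `linked_libraries` and `links_apple_kerberos` are hypothetical, and the sample text is copied from the libkrb5 listing above.

```python
from typing import List


def linked_libraries(otool_output: str) -> List[str]:
    """Return the library paths listed in `otool -L` style output.

    The header line names the inspected file and ends with ':'.
    Every other non-empty line is '<path> (compatibility version ...)'.
    """
    libs = []
    for line in otool_output.splitlines():
        line = line.strip()
        if not line or line.endswith(":"):
            continue  # skip blank lines and the header line
        # Keep only the path that precedes the version annotation.
        libs.append(line.split(" (compatibility version", 1)[0])
    return libs


def links_apple_kerberos(otool_output: str) -> bool:
    """True if any listed library is inside Apple's Kerberos.framework."""
    return any("/Kerberos.framework/" in lib
               for lib in linked_libraries(otool_output))


# Sample copied from the thread (libkrb5.3.3.dylib from Homebrew's krb5).
SAMPLE = """\
/usr/local/opt/krb5/lib/libkrb5.3.3.dylib:
    /usr/local/opt/krb5/lib/libkrb5.3.3.dylib (compatibility version 3.0.0, current version 3.3.0)
    @loader_path/libk5crypto.3.1.dylib (compatibility version 3.0.0, current version 3.1.0)
    @loader_path/libcom_err.3.0.dylib (compatibility version 3.0.0, current version 3.0.0)
    @loader_path/libkrb5support.1.1.dylib (compatibility version 1.0.0, current version 1.1.0)
    /System/Library/Frameworks/Kerberos.framework/Versions/A/Kerberos (compatibility version 5.0.0, current version 6.0.0)
    /usr/lib/libresolv.9.dylib (compatibility version 1.0.0, current version 1.0.0)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1319.100.3)
"""

if __name__ == "__main__":
    print(links_apple_kerberos(SAMPLE))  # True
```

In practice the text would come from running `otool -L` on each library in turn, since the framework dependency here shows up only two hops away from libpq.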
John DeSoi <john@desoi.dev> writes:
> On Nov 30, 2023, at 2:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> What troubles me about that stack trace is the references to Heimdal.
>> We gave up supporting Heimdal (and v16 explicitly rejects building
>> with it) because its support for Kerberos credentials was too
>> incomplete and flaky. So I'm inclined to guess that you are running
>> into some Heimdal bug. Try to rebuild libpq using MIT Kerberos
>> and see if things get better.

> Is there a way to simply disable GSSAPI authentication? I could not find it.

gssencmode=disable in your connection options; but that's a tad inconvenient probably.

> The otool output below shows that Apple's Kerberos is being used and I assume by extension, their Heimdal library. The Heimdal project told me as much - Apple has a fork and would not pull from their project.

Ugh, not only Heimdal but a very obsolete version thereof? It borders on negligence for the homebrew PG package to be building against that. They should be pulling in homebrew's MIT Kerberos package and using that, if they want to enable GSSAPI.

regards, tom lane
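For readers unsure where `gssencmode=disable` goes: libpq accepts it as one more keyword in a keyword/value connection string. The sketch below is an illustration under assumptions, not code from the thread. The helper name `conninfo` and the host, database, and user values are hypothetical, and it makes no claim about whether a particular driver passes extra keywords through to libpq.

```python
def conninfo(**params) -> str:
    """Build a libpq keyword/value connection string.

    libpq requires single quotes around values that are empty or contain
    spaces, with backslashes and single quotes escaped by a backslash.
    """
    parts = []
    for key, value in params.items():
        value = str(value)
        if value == "" or any(c in value for c in " '\\"):
            escaped = value.replace("\\", "\\\\").replace("'", "\\'")
            value = "'" + escaped + "'"
        parts.append(f"{key}={value}")
    return " ".join(parts)


# Hypothetical connection parameters; only gssencmode comes from the thread.
dsn = conninfo(host="localhost", dbname="app", user="web",
               gssencmode="disable")

if __name__ == "__main__":
    print(dsn)  # host=localhost dbname=app user=web gssencmode=disable
```

The resulting string is the kind of argument `PQconnectdb` takes, which is the libpq entry point that appears in the crash report.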
> On Nov 30, 2023, at 7:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> gssencmode=disable in your connection options; but that's a tad
> inconvenient probably.

Yes, the application uses PHP PDO to connect to PostgreSQL. I don't see any way to specify that in the connection options.

> Ugh, not only Heimdal but a very obsolete version thereof? It borders
> on negligence for the homebrew PG package to be building against that.
> They should be pulling in homebrew's MIT Kerberos package and using
> that, if they want to enable GSSAPI.

I was looking at the homebrew source for the PostgreSQL package to see if there was a way to customize the build options. I did not find one but saw the comment below. Apparently this is a known issue and it was suggested to use the MIT Kerberos package 4 years ago. Instead they just added this comment in 2020.

# GSSAPI provided by Kerberos.framework crashes when forked.
# See https://github.com/Homebrew/homebrew-core/issues/47494.

John DeSoi, Ph.D.
John DeSoi <john@desoi.dev> writes:
>> On Nov 30, 2023, at 7:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Ugh, not only Heimdal but a very obsolete version thereof? It borders
>> on negligence for the homebrew PG package to be building against that.
>> They should be pulling in homebrew's MIT Kerberos package and using
>> that, if they want to enable GSSAPI.

> I was looking at the homebrew source for the PostgreSQL package to see if there was a way to customize the build options. I did not find one but saw the comment below. Apparently this is a known issue and it was suggested to use the MIT Kerberos package 4 years ago. Instead they just added this comment in 2020.

> # GSSAPI provided by Kerberos.framework crashes when forked.
> # See https://github.com/Homebrew/homebrew-core/issues/47494.

Oh, thanks for finding that. But you misinterpreted the outcome; the commit that closed that thread did

+# GSSAPI provided by Kerberos.framework crashes when forked.
+# See https://github.com/Homebrew/homebrew-core/issues/47494.
+depends_on "krb5"

The "depends_on" was evidently meant to force building against krb5, and I suppose it did have that effect when committed. Could they have done something since then to break it?

Looking closer, your stack trace seems to show that libpq *is* linked against MIT Kerberos: at least, control flows from libpq.5.dylib to libgssapi_krb5.2.2.dylib, which is not a library that Apple supplies. However, a few subroutines deeper, we somehow end up in Apple's Kerberos framework, and that eventually calls libdispatch, which is the source of the problem according to the discussion in issues/47494.

My guess at this point is that somebody at Homebrew put in a hack (perhaps quite recently) that causes their build of MIT Kerberos to sometimes call Apple's implementation, and that ill-advised idea has re-opened the problem that issues/47494 meant to solve.

I'd suggest filing a bug against Homebrew's krb5 package. Whatever this is, it seems pretty clear that it's not a Postgres bug.

regards, tom lane
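The reading of the stack trace described above (libpq, then MIT Kerberos, then Apple's framework, then libdispatch) can be reproduced by collapsing the backtrace into the sequence of binary images it passes through. This is a rough sketch only: the regular expression assumes the frame layout seen in this crash report, the name `image_chain` is hypothetical, and the sample is a subset of frames copied from the report.

```python
import re
from itertools import groupby
from typing import List

# One frame: index, image name, load address, then the symbol and offset.
FRAME = re.compile(r"^\s*(\d+)\s+(\S+)\s+(0x[0-9a-fA-F]+)\s+(.*)$")


def image_chain(backtrace: str) -> List[str]:
    """Return image names in call order (outermost caller first),
    with runs of consecutive frames from the same image collapsed."""
    images = []
    for line in backtrace.splitlines():
        match = FRAME.match(line)
        if match:
            images.append(match.group(2))
    images.reverse()  # crash reports list the innermost frame first
    return [name for name, _ in groupby(images)]


# Subset of frames copied from the crash report in this thread.
SAMPLE = """\
0 libdispatch.dylib 0x199dd825c _dispatch_apply_with_attr_f + 1136
2 libdispatch.dylib 0x199dd847c dispatch_apply + 108
14 CoreFoundation 0x199ffe2b0 _CFPreferencesCopyAppValueWithContainerAndConfiguration + 128
15 Heimdal 0x1a5d4cb80 init_context_from_config_file + 2732
18 Heimdal 0x1a5d33144 krb5_init_context + 32
19 Kerberos 0x1a7fc32e8 mshim_ctx + 64
20 Kerberos 0x1a7fc16e4 context_new_ccache_iterator + 92
21 libkrb5.3.3.dylib 0x1017accc8 api_macos_ptcursor_next + 220
23 libkrb5.3.3.dylib 0x1017aa1f4 krb5_cccol_have_content + 92
24 libgssapi_krb5.2.2.dylib 0x1016a1f58 acquire_cred_context + 1668
28 libgssapi_krb5.2.2.dylib 0x101693428 gss_acquire_cred + 36
29 libpq.5.dylib 0x1012a9db8 pg_GSS_have_cred_cache + 60
32 libpq.5.dylib 0x1012963a8 PQconnectdb + 44
"""

if __name__ == "__main__":
    print(" -> ".join(image_chain(SAMPLE)))
    # libpq.5.dylib -> libgssapi_krb5.2.2.dylib -> libkrb5.3.3.dylib
    #   -> Kerberos -> Heimdal -> CoreFoundation -> libdispatch.dylib
```

The hand-off from libkrb5.3.3.dylib to Kerberos is the step where the Homebrew-built MIT library calls into Apple's framework.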
> On Dec 1, 2023, at 11:02 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> I'd suggest filing a bug against Homebrew's krb5 package.
> Whatever this is, it seems pretty clear that it's not a
> Postgres bug.

Will do, thank you and everyone else for the help and feedback.

John DeSoi, Ph.D.
> On Nov 30, 2023, at 7:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>> Is there a way to simply disable GSSAPI authentication? I could not find it.
>
> gssencmode=disable in your connection options; but that's a tad
> inconvenient probably.

I discovered there is a PGGSSENCMODE environment variable. I set it to 'disable' in the environment used to run the HTTP server. Hopefully this will solve it.

https://www.postgresql.org/docs/current/libpq-envars.html

John DeSoi, Ph.D.
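The environment-variable approach works because libpq reads PGGSSENCMODE from the process environment as the default for `gssencmode`, so every process that inherits the variable is covered. The sketch below shows that inheritance under stated assumptions: the helper name `env_without_gss_encryption` is hypothetical, and a small Python child process stands in for the web server, whose actual startup mechanism the thread does not describe.

```python
import os
import subprocess
import sys
from typing import Dict, Mapping, Optional


def env_without_gss_encryption(
        base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with PGGSSENCMODE=disable set.

    libpq uses PGGSSENCMODE as the default for the gssencmode
    connection parameter when the connection string does not set it.
    """
    env = dict(os.environ if base is None else base)
    env["PGGSSENCMODE"] = "disable"
    return env


if __name__ == "__main__":
    # Stand-in child process: it only echoes the inherited variable.
    child = subprocess.run(
        [sys.executable, "-c",
         "import os; print(os.environ['PGGSSENCMODE'])"],
        env=env_without_gss_encryption(),
        capture_output=True, text=True, check=True,
    )
    print(child.stdout.strip())  # disable
```

The same effect comes from exporting the variable in whatever script or service definition starts the server.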