Unnecessary connection overhead due copy-on-write (mainly openssl) - Mailing list pgsql-hackers

From Andres Freund
Subject Unnecessary connection overhead due copy-on-write (mainly openssl)
Msg-id hgs2vs74tzxigf2xqosez7rpf3ia5e7izalg5gz3lv3nqfptxx@thanmprbpl4e
List pgsql-hackers
Hi,

Looking at [1] I, again, noticed that a decent portion of our connection
overhead is due to openssl's atexit handler.

On my older workstation (with a few noisy things running):

c=16;pgbench -n -M prepared -c$c -j$c -P1 -T10 -f <(echo 'select') -C
-> 3057 TPS

If I change the exit() in proc_exit() to a _exit():
-> 3633 TPS

The reason for this difference is that by default openssl registers an atexit
handler that frees a lot of memory that was initialized in postmaster. That in
turn triggers page faults, because freeing modifies pages that were shared
with postmaster until then, so they have to be copied in the child process
(copy-on-write). That a) isn't cheap and b) causes contention with postmaster,
since those data structures are shared.


It's possible to tell openssl to not register an atexit handler, see [2]:

> OPENSSL_INIT_NO_ATEXIT
>   By default OpenSSL will attempt to clean itself up when the process exits via
>   an "atexit" handler. Using this option suppresses that behaviour. This means
>   that the application will have to clean up OpenSSL explicitly using
>   OPENSSL_cleanup().

One slight difficulty is that we initialize openssl somewhat indirectly, via
PostmasterMain()->InitProcessGlobals()->pg_prng_strong_seed() which then, if
built with openssl support, triggers initialization within RAND_status().


The quick hack of putting

#ifdef USE_OPENSSL
    OPENSSL_init_crypto(OPENSSL_INIT_NO_ATEXIT, NULL);
#endif

at the start of PostmasterMain() gets the connection speed up a fair bit:
-> 3449 TPS


The reason this isn't as good as using _exit is that there are other libraries
with (effectively) atexit handlers. In particular ICU pulls in libstdc++,
which in turn seems to have a lot of destructors for global objects that
aren't cheap.

If I build without ICU support, the connection rate with exit() (and the
openssl "fix") is
-> 3863 TPS
and if I use _exit() it is
-> 3900 TPS

I.e. at that point the remaining atexit handlers only play a small role.

I don't know if there's a decent solution for the nontrivial overhead due to
ICU -> libstdc++'s atexit handlers.



There are a few related issues where we ourselves are to blame. The most
prominent one is that we go around and delete PostmasterContext in child
processes. That, however, doesn't really save memory, as the memory is still
needed in postmaster; we just end up causing page faults that trigger
copy-on-write.

If I just comment out the MemoryContextDelete in PostgresMain() I see
connection rates improve from
-> 3891 TPS
to
-> 4004 TPS


If I build a much more minimal postgres, disabling all optional dependencies
other than openssl, I see a significant improvement, just due to fewer mmaps
for the libraries:
-> 4865 TPS

Interestingly, further disabling openssl and zlib does not help.


Greetings,

Andres Freund

[1] https://postgr.es/m/CAFbpF8OA44_UG%2BRYJcWH9WjF7E3GA6gka3gvH6nsrSnEe9H0NA%40mail.gmail.com
[2] https://docs.openssl.org/3.1/man3/OPENSSL_init_crypto/#name


