Re: Unnecessary connection overhead due copy-on-write (mainly openssl) - Mailing list pgsql-hackers

From	Nico Williams
Subject	Re: Unnecessary connection overhead due copy-on-write (mainly openssl)
Date	June 6 23:18:50
Msg-id	aENNKkE+JkkwBtmV@ubby
In response to	Re: Unnecessary connection overhead due copy-on-write (mainly openssl) (Jacob Champion <jacob.champion@enterprisedb.com>)
Responses	Re: Unnecessary connection overhead due copy-on-write (mainly openssl)
List	pgsql-hackers

On Fri, Jun 06, 2025 at 11:58:38AM -0700, Jacob Champion wrote:
> > I'd expect all subsystems to recover cleanly from unclean shutdowns.  I
> > know, that's a lot to expect, but nowadays pretty much all filesystems
> > used in production do, for example.
> 
> I guess, but if we stop cleaning up entirely, we will suddenly be
> stressing those code paths... But maybe that's a community service? :)

The latter.

> I realize I'm making an argument from fear and ignorance. Maybe that
> ecosystem is very healthy. I'm just imagining the following
> conversation:
> 
> DBA: we upgraded our server and our HSM is freaking out after a few
> thousand connections; what gives?
> us: oh, we stopped cleaning up after ourselves for performance! tell
> your vendor to fix their drivers!
> DBA: hahahaha

TPMs, for example, have a concept of a session.  You can have up to 64
open sessions, and if you use the TPM resource manager and access it
through a file descriptor, the RM will just clean up when you exit.  But
if you access the raw TPM directly and fail to flush sessions, then yes,
you'll eventually be unable to create new ones.
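To make the resource-manager point concrete, here is a minimal Python
simulation.  It assumes a hypothetical `SimulatedTPM` with the 64-session
cap mentioned above and a hypothetical `ResourceManager` that records
which client opened each session.  It is not tpm2-tss or any real TPM
API; it only shows why leaked sessions exhaust a raw device but not one
fronted by an RM that cleans up when the client exits.

```python
class SessionLimitError(Exception):
    """Raised when the simulated device has no free session slots."""


class SimulatedTPM:
    """Hypothetical stand-in for a raw TPM: a fixed pool of session slots.

    The 64-slot default mirrors the limit mentioned in the text; nothing
    here talks to actual TPM hardware.
    """

    def __init__(self, max_sessions: int = 64) -> None:
        self.max_sessions = max_sessions
        self.open_handles: set[int] = set()
        self._next_handle = 0

    def start_session(self) -> int:
        if len(self.open_handles) >= self.max_sessions:
            raise SessionLimitError("no free session slots")
        self._next_handle += 1
        self.open_handles.add(self._next_handle)
        return self._next_handle

    def flush(self, handle: int) -> None:
        self.open_handles.discard(handle)


class ResourceManager:
    """Hypothetical resource manager: remembers which client opened each
    session and flushes all of that client's sessions when it disconnects."""

    def __init__(self, tpm: SimulatedTPM) -> None:
        self.tpm = tpm
        self.owned: dict[int, set[int]] = {}

    def start_session(self, client: int) -> int:
        handle = self.tpm.start_session()
        self.owned.setdefault(client, set()).add(handle)
        return handle

    def client_exit(self, client: int) -> None:
        for handle in self.owned.pop(client, set()):
            self.tpm.flush(handle)


def leaky_runs(n_runs: int, use_rm: bool) -> int:
    """Each run opens one session and exits without flushing it itself.
    Returns how many runs succeeded before session creation failed."""
    tpm = SimulatedTPM()
    rm = ResourceManager(tpm)
    succeeded = 0
    for client in range(n_runs):
        try:
            if use_rm:
                rm.start_session(client)
                rm.client_exit(client)  # the RM cleans up on exit
            else:
                tpm.start_session()  # raw access: the session leaks
        except SessionLimitError:
            break
        succeeded += 1
    return succeeded


print(leaky_runs(100, use_rm=False))  # 64
print(leaky_runs(100, use_rm=True))   # 100
```

With raw access the 65th run fails; through the RM every run succeeds,
because each client's sessions are flushed on its behalf.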

However, no one will be using a discrete or firmware TPM for TLS server
certificate private key operations: discrete TPMs are way, way too slow
for that, and firmware TPMs are... also way too slow.  You wouldn't
bother with a software TPM for this unless it's for privilege separation.

Anyways, if you were using a TPM, then the user's startup scripts or
postgres itself could just flush all sessions and be done.

Other types of hardware cryptographic providers also tend to have a
notion of "session", and they all tend to have relatively paltry limits,
which means that the software side that calls them will generally need
to be prepared to a) close its own sessions eagerly (at the cost of
extra overhead on the next operation), and b) recover from running out
of sessions (by flushing others at the cost of causing those that were
live to need retries).
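A rough sketch of strategies a) and b), again in Python against a
hypothetical `SimulatedProvider` (not a real PKCS#11 or TPM binding).
The eviction policy here, flushing the oldest live session, is an
assumption made for illustration; the point is only that the caller
closes its own session right away and, when the provider is full,
frees a slot by flushing someone else's session, whose owner then has
to retry.

```python
class SessionLimitError(Exception):
    """Raised when the provider has no free session slots."""


class StaleSessionError(Exception):
    """Raised when a session was flushed out from under its owner."""


class SimulatedProvider:
    """Hypothetical hardware provider with a small session limit."""

    def __init__(self, max_sessions: int) -> None:
        self.max_sessions = max_sessions
        self.live: list[int] = []  # open handles, oldest first
        self.sessions_opened = 0   # counts the setup cost paid so far

    def open_session(self) -> int:
        if len(self.live) >= self.max_sessions:
            raise SessionLimitError("no free session slots")
        self.sessions_opened += 1
        handle = self.sessions_opened
        self.live.append(handle)
        return handle

    def close_session(self, handle: int) -> None:
        if handle in self.live:
            self.live.remove(handle)

    def sign(self, handle: int, data: bytes) -> bytes:
        if handle not in self.live:
            raise StaleSessionError(f"session {handle} was flushed; retry")
        return b"sig:" + data  # placeholder, not a real signature


def sign_once(provider: SimulatedProvider, data: bytes) -> bytes:
    """(b) If the provider is full, flush the oldest live session and try
    once more (assumes max_sessions >= 1).
    (a) Close our own session as soon as the operation finishes."""
    try:
        handle = provider.open_session()
    except SessionLimitError:
        provider.close_session(provider.live[0])  # its owner must retry
        handle = provider.open_session()
    try:
        return provider.sign(handle, data)
    finally:
        provider.close_session(handle)  # eager close frees the slot


p = SimulatedProvider(max_sessions=2)
a = p.open_session()           # another client holds both slots...
b = p.open_session()
print(sign_once(p, b"hello"))  # b'sig:hello'
print(p.live)                  # [2]: session 1 was evicted, ours closed
```

The price of eager closing shows up in `sessions_opened`: every
operation pays for a fresh session setup.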

But anyways, IIUC the OpenSSL engine interface is itself stateless, and
I would expect providers to auto-recover.  In any case, I expect no one
uses PG with HW cryptographic providers to perform TLS server
signatures.  Instead, the best current practice would be to use
short-lived server certificates with software keys, plus longer-lived
credentials in hardware with which to fetch new short-lived credentials
(again with software keys).  The kinds of HSMs that can do high rates
of signatures are neither cheap nor commonly used; those do tend to
have higher session limits, and again you can recover from running out
of sessions by flushing extant sessions.
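The short-lived-certificate arrangement could look roughly like the
following sketch.  `ShortLivedCert`, `needs_renewal`, `current_cert`,
the `fetch_new` callback, and the renew-at-two-thirds-of-lifetime
policy are all hypothetical; the only point is that the hardware-held
credential is touched once per renewal rather than once per TLS
handshake.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable


@dataclass
class ShortLivedCert:
    not_before: datetime
    not_after: datetime
    # The matching private key would be an ordinary software key that the
    # TLS library uses directly, with no hardware session involved.


def needs_renewal(cert: ShortLivedCert, now: datetime,
                  renew_fraction: float = 2 / 3) -> bool:
    """True once `renew_fraction` of the certificate's lifetime has elapsed.
    The 2/3 default is an arbitrary illustrative choice."""
    lifetime = cert.not_after - cert.not_before
    return now >= cert.not_before + lifetime * renew_fraction


def current_cert(cert: ShortLivedCert, now: datetime,
                 fetch_new: Callable[[datetime], ShortLivedCert]) -> ShortLivedCert:
    """Return `cert`, or a replacement from `fetch_new` if it is due.

    `fetch_new` is a hypothetical callback standing in for the one slow
    operation that authenticates with the long-lived, hardware-held
    credential to obtain a new short-lived certificate."""
    if needs_renewal(cert, now):
        return fetch_new(now)
    return cert


calls: list[datetime] = []


def fetch_new(now: datetime) -> ShortLivedCert:
    calls.append(now)  # one hardware-backed operation per renewal
    return ShortLivedCert(now, now + timedelta(hours=24))


start = datetime(2025, 6, 6, tzinfo=timezone.utc)
cert = ShortLivedCert(start, start + timedelta(hours=24))
for hour in range(48):  # e.g. many handshakes per hour, checked hourly
    cert = current_cert(cert, start + timedelta(hours=hour), fetch_new)
print(len(calls))  # 2: renewals at hour 16 and hour 32
```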

> > I doubt that PG w/ OpenSSL in any configuration maintains stateful
> > interactions with HW cryptographic providers.
> 
> (Why? From looking over the Cryptoki/PKCS#11 stuff, for example, isn't
> a lot of that API stateful?)

PKCS#11 is stateful, yes (it has session handles), but there are
generally low limits on how many sessions you can keep open, and
therefore high pressure to close them soon, so the inference is that
this is what actually happens, at the rather high cost of having to set
up new sessions often.  That inference could be wrong, but then, as you
note, you'd be doing the community a service by testing it and making
it true in the future.

Nico
--
