Thread: Internal key management system
Hi, I've started a new separate thread from the previous long thread[1] for internal key management system to PostgreSQL. As I mentioned in the previous mail[2], we've decided to step back and focus on only internal key management system for PG13. The internal key management system introduces the functionality to PostgreSQL that allows user to encrypt and decrypt data without knowing the actual key. Besides, it will be able to be integrated with transparent data encryption in the future. The basic idea is that PostgreSQL generates the master encryption key which is further protected by the user-provided passphrase. The key management system provides two functions to wrap and unwrap the secret by the master encryption key. A user generates a secret key locally and send it to PostgreSQL to wrap it using by pg_kmgr_wrap() and save it somewhere. Then the user can use the encrypted secret key to encrypt data and decrypt data by something like: INSERT INTO tbl VALUES (pg_encrypt('user data', pg_kmgr_unwrap('xxxxx')); SELECT pg_decrypt(secret_column, pg_kmgr_unwrap('xxxxx')) FROM tbl; Where 'xxxxx' is the result of pg_kmgr_wrap function. That way we can get something encrypted and decrypted without ever knowing the actual key that was used to encrypt it. I'm currently updating the patch and will submit it. On Sun, 2 Feb 2020 at 00:37, Sehrope Sarkuni <sehrope@jackdb.com> wrote: > > On Fri, Jan 31, 2020 at 1:21 AM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > On Thu, 30 Jan 2020 at 20:36, Sehrope Sarkuni <sehrope@jackdb.com> wrote: > > > That > > > would allow the internal usage to have a fixed output length of > > > LEN(IV) + LEN(HMAC) + LEN(DATA) = 16 + 32 + 64 = 112 bytes. > > > > Probably you meant LEN(DATA) is 32? DATA will be an encryption key for > > AES256 (master key) internally generated. > > No it should be 64-bytes. That way we can have separate 32-byte > encryption key (for AES256) and 32-byte MAC key (for HMAC-SHA256). > > While it's common to reuse the same 32-byte key for both AES256 and an > HMAC-SHA256 and there aren't any known issues with doing so, when > designing something from scratch it's more secure to use entirely > separate keys. The HMAC key you mentioned above is not the same as the HMAC key derived from the user provided passphrase, right? That is, individual key needs to have its IV and HMAC key. Given that the HMAC key used for HMAC(IV || ENCRYPT(KEY, IV, DATA)) is the latter key (derived from passphrase), what will be the former key used for? > > > > For the user facing piece, padding would enabled to support arbitrary > > > input data lengths. That would make the output length grow by up to > > > 16-bytes (rounding the data length up to the AES block size) plus one > > > more byte if a version field is added. > > > > I think the length of padding also needs to be added to the output. > > Anyway, in the first version the same methods of wrapping/unwrapping > > key are used for both internal use and user facing function. And user > > input key needs to be a multiple of 16 bytes value. > > A separate length field does not need to be added as the > padding-enabled output will already include it at the end[1]. This > would be handled automatically by the OpenSSL encryption / decryption > operations (if it's enabled): > Yes, right. Regards, [1] https://www.postgresql.org/message-id/031401d3f41d%245c70ed90%241552c8b0%24%40lab.ntt.co.jp [2] https://www.postgresql.org/message-id/CAD21AoD8QT0TWs3ma-aB821vwDKa1X519y1w3yrRKkAWjhZcrw%40mail.gmail.com -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hello Masahiko-san, > I've started a new separate thread from the previous long thread[1] > for internal key management system to PostgreSQL. As I mentioned in > the previous mail[2], we've decided to step back and focus on only > internal key management system for PG13. The internal key management > system introduces the functionality to PostgreSQL that allows user to > encrypt and decrypt data without knowing the actual key. Besides, it > will be able to be integrated with transparent data encryption in the > future. > > The basic idea is that PostgreSQL generates the master encryption key > which is further protected by the user-provided passphrase. The key > management system provides two functions to wrap and unwrap the secret > by the master encryption key. A user generates a secret key locally In understand that the secret key is sent in the clear for being encrypted by a master key. > and send it to PostgreSQL to wrap it using by pg_kmgr_wrap() and save > it somewhere. Then the user can use the encrypted secret key to > encrypt data and decrypt data by something like: > > INSERT INTO tbl VALUES (pg_encrypt('user data', pg_kmgr_unwrap('xxxxx')); > SELECT pg_decrypt(secret_column, pg_kmgr_unwrap('xxxxx')) FROM tbl; > > Where 'xxxxx' is the result of pg_kmgr_wrap function. I'm lost. If pg_{en,de}crypt and pg_kmgr_unwrap are functions, what prevent users to: SELECT pg_kmgr_unwrap('xxxx'); so as to recover the key, somehow in contradiction to "allows user to encrypt and decrypt data without knowing the actual key". When dealing with cryptography and key management, I can only recommand extreme caution. -- Fabien.
Hi,
I've started a new separate thread from the previous long thread[1]
for internal key management system to PostgreSQL. As I mentioned in
the previous mail[2], we've decided to step back and focus on only
internal key management system for PG13. The internal key management
system introduces the functionality to PostgreSQL that allows user to
encrypt and decrypt data without knowing the actual key. Besides, it
will be able to be integrated with transparent data encryption in the
future.
The basic idea is that PostgreSQL generates the master encryption key
which is further protected by the user-provided passphrase. The key
management system provides two functions to wrap and unwrap the secret
by the master encryption key. A user generates a secret key locally
and send it to PostgreSQL to wrap it using by pg_kmgr_wrap() and save
it somewhere. Then the user can use the encrypted secret key to
encrypt data and decrypt data by something like:
INSERT INTO tbl VALUES (pg_encrypt('user data', pg_kmgr_unwrap('xxxxx'));
SELECT pg_decrypt(secret_column, pg_kmgr_unwrap('xxxxx')) FROM tbl;
Where 'xxxxx' is the result of pg_kmgr_wrap function.
That way we can get something encrypted and decrypted without ever
knowing the actual key that was used to encrypt it.
I'm currently updating the patch and will submit it.
On Sun, 2 Feb 2020 at 00:37, Sehrope Sarkuni <sehrope@jackdb.com> wrote:
>
> On Fri, Jan 31, 2020 at 1:21 AM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> > On Thu, 30 Jan 2020 at 20:36, Sehrope Sarkuni <sehrope@jackdb.com> wrote:
> > > That
> > > would allow the internal usage to have a fixed output length of
> > > LEN(IV) + LEN(HMAC) + LEN(DATA) = 16 + 32 + 64 = 112 bytes.
> >
> > Probably you meant LEN(DATA) is 32? DATA will be an encryption key for
> > AES256 (master key) internally generated.
>
> No it should be 64-bytes. That way we can have separate 32-byte
> encryption key (for AES256) and 32-byte MAC key (for HMAC-SHA256).
>
> While it's common to reuse the same 32-byte key for both AES256 and an
> HMAC-SHA256 and there aren't any known issues with doing so, when
> designing something from scratch it's more secure to use entirely
> separate keys.
The HMAC key you mentioned above is not the same as the HMAC key
derived from the user provided passphrase, right? That is, individual
key needs to have its IV and HMAC key. Given that the HMAC key used
for HMAC(IV || ENCRYPT(KEY, IV, DATA)) is the latter key (derived from
passphrase), what will be the former key used for?
>
> > > For the user facing piece, padding would enabled to support arbitrary
> > > input data lengths. That would make the output length grow by up to
> > > 16-bytes (rounding the data length up to the AES block size) plus one
> > > more byte if a version field is added.
> >
> > I think the length of padding also needs to be added to the output.
> > Anyway, in the first version the same methods of wrapping/unwrapping
> > key are used for both internal use and user facing function. And user
> > input key needs to be a multiple of 16 bytes value.
>
> A separate length field does not need to be added as the
> padding-enabled output will already include it at the end[1]. This
> would be handled automatically by the OpenSSL encryption / decryption
> operations (if it's enabled):
>
Yes, right.
Regards,
[1] https://www.postgresql.org/message-id/031401d3f41d%245c70ed90%241552c8b0%24%40lab.ntt.co.jp
[2] https://www.postgresql.org/message-id/CAD21AoD8QT0TWs3ma-aB821vwDKa1X519y1w3yrRKkAWjhZcrw%40mail.gmail.com
--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sun, 2 Feb 2020 at 17:05, Fabien COELHO <coelho@cri.ensmp.fr> wrote: > > > Hello Masahiko-san, > > > I've started a new separate thread from the previous long thread[1] > > for internal key management system to PostgreSQL. As I mentioned in > > the previous mail[2], we've decided to step back and focus on only > > internal key management system for PG13. The internal key management > > system introduces the functionality to PostgreSQL that allows user to > > encrypt and decrypt data without knowing the actual key. Besides, it > > will be able to be integrated with transparent data encryption in the > > future. > > > > The basic idea is that PostgreSQL generates the master encryption key > > which is further protected by the user-provided passphrase. The key > > management system provides two functions to wrap and unwrap the secret > > by the master encryption key. A user generates a secret key locally > > In understand that the secret key is sent in the clear for being encrypted > by a master key. Yeah we need to be careful about the secret key not being logged in any logs such as server logs for example when log_statement = 'all'. I guess that wrapping key doesn't often happen during service running but does once at development phase. So it would not be a big problem but probably we need to have something to deal with it. > > > and send it to PostgreSQL to wrap it using by pg_kmgr_wrap() and save > > it somewhere. Then the user can use the encrypted secret key to > > encrypt data and decrypt data by something like: > > > > INSERT INTO tbl VALUES (pg_encrypt('user data', pg_kmgr_unwrap('xxxxx')); > > SELECT pg_decrypt(secret_column, pg_kmgr_unwrap('xxxxx')) FROM tbl; > > > > Where 'xxxxx' is the result of pg_kmgr_wrap function. > > I'm lost. If pg_{en,de}crypt and pg_kmgr_unwrap are functions, what > prevent users to: > > SELECT pg_kmgr_unwrap('xxxx'); > > so as to recover the key, somehow in contradiction to "allows user to > encrypt and decrypt data without knowing the actual key". I might be missing your point but the above 'xxxx' is the wrapped key wrapped by the master key stored in PostgreSQL server. So user doesn't need to know the raw secret key to encrypt/decrypt the data. Even if a malicious user gets 'xxxx' they cannot know the actual secret key without the master key. pg_kmgr_wrap and pg_kmgr_unwrap are functions and it's possible for user to know the raw secret key by using pg_kmgr_unwrap(). The master key stored in PostgreSQL server never be revealed. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> On Sun, 2 Feb 2020 at 00:37, Sehrope Sarkuni <sehrope@jackdb.com> wrote:
> >
> > On Fri, Jan 31, 2020 at 1:21 AM Masahiko Sawada
> > <masahiko.sawada@2ndquadrant.com> wrote:
> > > On Thu, 30 Jan 2020 at 20:36, Sehrope Sarkuni <sehrope@jackdb.com> wrote:
> > > > That
> > > > would allow the internal usage to have a fixed output length of
> > > > LEN(IV) + LEN(HMAC) + LEN(DATA) = 16 + 32 + 64 = 112 bytes.
> > >
> > > Probably you meant LEN(DATA) is 32? DATA will be an encryption key for
> > > AES256 (master key) internally generated.
> >
> > No it should be 64-bytes. That way we can have separate 32-byte
> > encryption key (for AES256) and 32-byte MAC key (for HMAC-SHA256).
> >
> > While it's common to reuse the same 32-byte key for both AES256 and an
> > HMAC-SHA256 and there aren't any known issues with doing so, when
> > designing something from scratch it's more secure to use entirely
> > separate keys.
>
> The HMAC key you mentioned above is not the same as the HMAC key
> derived from the user provided passphrase, right? That is, individual
> key needs to have its IV and HMAC key. Given that the HMAC key used
> for HMAC(IV || ENCRYPT(KEY, IV, DATA)) is the latter key (derived from
> passphrase), what will be the former key used for?
It's not derived from the passphrase, it's unlocked by the passphrase (along with the master encryption key). The server will have 64-bytes of random data, saved encrypted in pg_control, which can be treated as two separate 32-byte keys, let's call them master_encryption_key and master_mac_key. The 64-bytes is unlocked by decrypting it with the user passphrase at startup (which itself would be split into a pair of encryption and MAC keys to do the unlocking).
The wrap and unwrap operations would use both keys:
wrap(plain_text, encryption_key, mac_key) {
// Generate random IV:
iv = pg_strong_random(16);
// Encrypt:
cipher_text = encrypt_aes256_cbc(encryption_key, iv, plain_text);
// Compute MAC on all inputs:
mac = hmac_sha256(mac_key, encryption_key || iv || cipher_text);
// Concat user facing pieces together
wrapped = mac || iv || cipher_text;
return wrapped;
}
unwrap(wrapped, encryption_key, mac_key) {
actual_mac = wrapped.slice(0, 32);
iv = wrapped.slice(0 + 32, 16);
cipher_text = wrapped.slice(0 + 32 + 16);
// Compute MAC on all inputs:
expected_mac = hmac_sha256(mac_key, encryption_key || iv || cipher_text);
if (expected_mac != actual_mac) { return Error("MAC does not match"); }
// MAC matches so decrypt:
plain_text = decrypt_aes256_cbc(encryption_key, iv, cipher_text);
return plain_text;
}
Every input to the encryption operation, including the encryption key, must be included into the HMAC calculation. If you use the same key for both encryption and MAC that's not required as it's already part of the MAC process as the key. Using separate keys requires explicitly adding in the encryption key into the MAC input to ensure that it the correct key prior to decryption in the unwrap operation. Any additional parts of the wrapped output (ex: a "version" byte for the algos or padding choices) should also be included.
The wrap / unwrap above would be used with the encryption and mac keys derived from the user passphrase to unlock the master_encryption_key and master_mac_key from pg_control. Then those would be used by the higher level functions:
pg_kmgr_wrap(plain_text) {
return wrap(plain_text, master_encryption_key, master_mac_key);
}
pg_kmgr_unwrap(wrapped) {
return unwrap(wrapped, master_encryption_key, master_mac_key);
}
Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/
On Mon, Feb 3, 2020 at 10:18 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > I'm lost. If pg_{en,de}crypt and pg_kmgr_unwrap are functions, what > > prevent users to: > > > > SELECT pg_kmgr_unwrap('xxxx'); > > > > so as to recover the key, somehow in contradiction to "allows user to > > encrypt and decrypt data without knowing the actual key". > > I might be missing your point but the above 'xxxx' is the wrapped key > wrapped by the master key stored in PostgreSQL server. So user doesn't > need to know the raw secret key to encrypt/decrypt the data. Even if a > malicious user gets 'xxxx' they cannot know the actual secret key > without the master key. pg_kmgr_wrap and pg_kmgr_unwrap are functions > and it's possible for user to know the raw secret key by using > pg_kmgr_unwrap(). The master key stored in PostgreSQL server never be > revealed. I think I have the same confusion as Fabien. Isn't it bad if somebody just runs pg_kmgr_unwrap() and records the return value? Now they've stolen your encryption key, it seems. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, Feb 3, 2020 at 10:18 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
> > I'm lost. If pg_{en,de}crypt and pg_kmgr_unwrap are functions, what
> > prevent users to:
> >
> > SELECT pg_kmgr_unwrap('xxxx');
> >
> > so as to recover the key, somehow in contradiction to "allows user to
> > encrypt and decrypt data without knowing the actual key".
>
> I might be missing your point but the above 'xxxx' is the wrapped key
> wrapped by the master key stored in PostgreSQL server. So user doesn't
> need to know the raw secret key to encrypt/decrypt the data. Even if a
> malicious user gets 'xxxx' they cannot know the actual secret key
> without the master key. pg_kmgr_wrap and pg_kmgr_unwrap are functions
> and it's possible for user to know the raw secret key by using
> pg_kmgr_unwrap(). The master key stored in PostgreSQL server never be
> revealed.
I think I have the same confusion as Fabien. Isn't it bad if somebody
just runs pg_kmgr_unwrap() and records the return value? Now they've
stolen your encryption key, it seems.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, 7 Feb 2020 at 05:30, Robert Haas <robertmhaas@gmail.com> wrote: > > On Mon, Feb 3, 2020 at 10:18 PM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > I'm lost. If pg_{en,de}crypt and pg_kmgr_unwrap are functions, what > > > prevent users to: > > > > > > SELECT pg_kmgr_unwrap('xxxx'); > > > > > > so as to recover the key, somehow in contradiction to "allows user to > > > encrypt and decrypt data without knowing the actual key". > > > > I might be missing your point but the above 'xxxx' is the wrapped key > > wrapped by the master key stored in PostgreSQL server. So user doesn't > > need to know the raw secret key to encrypt/decrypt the data. Even if a > > malicious user gets 'xxxx' they cannot know the actual secret key > > without the master key. pg_kmgr_wrap and pg_kmgr_unwrap are functions > > and it's possible for user to know the raw secret key by using > > pg_kmgr_unwrap(). The master key stored in PostgreSQL server never be > > revealed. > > I think I have the same confusion as Fabien. Isn't it bad if somebody > just runs pg_kmgr_unwrap() and records the return value? Now they've > stolen your encryption key, it seems. This feature protects data from disk thefts but cannot protect data from attackers who are able to access PostgreSQL server. In this design application side still is responsible for managing the wrapped secret in order to protect it from attackers. This is the same as when we use pgcrypto now. The difference is that data is safe even if attackers steal the wrapped secret and the disk. The data cannot be decrypted either without the passphrase which can be stored to other key management systems or without accessing postgres server. IOW for example, attackers can get the data if they get the wrapped secret managed by application side then run pg_kmgr_unwrap() to get the secret and then steal the disk. Another idea we discussed is to internally integrate pgcrypto with the key management system. That is, the key management system has one master key and provides a C function to pass the master key to other postgres modules. pgcrypto uses that function and provides new encryption and decryption functions something like pg_encrypt_with_key() and pg_decrypt_with_key(). Which encrypts/decrypts the given data by the master key stored in database cluster. That way user still doesn't have to know the encryption key and we can protect data from disk thefts. But the down side would be that we have only one encryption key and that we might need to change pgcrypto much. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi, On 2020-02-07 11:18:29 +0900, Masahiko Sawada wrote: > Another idea we discussed is to internally integrate pgcrypto with the > key management system. Perhaps this has already been discussed (I only briefly looked): I'd strongly advise against having any new infrastrure depend on pgcrypto. Its code quality imo is well below our standards and contains serious red flags like very outdated copies of cryptography algorithm implementations. I think we should consider deprecating and removing it, not expanding its use. It certainly shouldn't be involved in any potential disk encryption system at a later stage. Greetings, Andres Freund
On Fri, 7 Feb 2020 at 11:36, Andres Freund <andres@anarazel.de> wrote: > > Hi, > > On 2020-02-07 11:18:29 +0900, Masahiko Sawada wrote: > > Another idea we discussed is to internally integrate pgcrypto with the > > key management system. > > Perhaps this has already been discussed (I only briefly looked): I'd > strongly advise against having any new infrastrure depend on > pgcrypto. Its code quality imo is well below our standards and contains > serious red flags like very outdated copies of cryptography algorithm > implementations. I think we should consider deprecating and removing > it, not expanding its use. It certainly shouldn't be involved in any > potential disk encryption system at a later stage. Thank you for the advise. Yeah I'm not going to use pgcrypto for transparent data encryption. The KMS patch includes the new basic infrastructure for cryptographic functions (mainly AES-CBC). I'm thinking we can expand that infrastructure so that we can also use it for TDE purpose by supporting new cryptographic functions such as AES-CTR. Anyway, I agree to not have it depend on pgcrypto. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi, On 2020-02-07 20:44:31 +0900, Masahiko Sawada wrote: > Yeah I'm not going to use pgcrypto for transparent data encryption. > The KMS patch includes the new basic infrastructure for cryptographic > functions (mainly AES-CBC). I'm thinking we can expand that > infrastructure so that we can also use it for TDE purpose by > supporting new cryptographic functions such as AES-CTR. Anyway, I > agree to not have it depend on pgcrypto. I thought for a minute, before checking the patch, that you were saying above that the KMS patch includes its *own* implementation of cryptographic functions. I think it's pretty crucial that it continues not to do that... Greetings, Andres Freund
On Sat, 8 Feb 2020 at 03:24, Andres Freund <andres@anarazel.de> wrote: > > Hi, > > On 2020-02-07 20:44:31 +0900, Masahiko Sawada wrote: > > Yeah I'm not going to use pgcrypto for transparent data encryption. > > The KMS patch includes the new basic infrastructure for cryptographic > > functions (mainly AES-CBC). I'm thinking we can expand that > > infrastructure so that we can also use it for TDE purpose by > > supporting new cryptographic functions such as AES-CTR. Anyway, I > > agree to not have it depend on pgcrypto. > > I thought for a minute, before checking the patch, that you were saying > above that the KMS patch includes its *own* implementation of > cryptographic functions. I think it's pretty crucial that it continues > not to do that... I meant that we're going to use OpenSSL for AES encryption and decryption independent of pgcrypto's openssl code, as the first step. That is, KMS is available only when configured --with-openssl. And hopefully we eventually merge these openssl code and have pgcrypto use it, like when we introduced SCRAM. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sat, Feb 08, 2020 at 02:48:54PM +0900, Masahiko Sawada wrote: >On Sat, 8 Feb 2020 at 03:24, Andres Freund <andres@anarazel.de> wrote: >> >> Hi, >> >> On 2020-02-07 20:44:31 +0900, Masahiko Sawada wrote: >> > Yeah I'm not going to use pgcrypto for transparent data encryption. >> > The KMS patch includes the new basic infrastructure for cryptographic >> > functions (mainly AES-CBC). I'm thinking we can expand that >> > infrastructure so that we can also use it for TDE purpose by >> > supporting new cryptographic functions such as AES-CTR. Anyway, I >> > agree to not have it depend on pgcrypto. >> >> I thought for a minute, before checking the patch, that you were saying >> above that the KMS patch includes its *own* implementation of >> cryptographic functions. I think it's pretty crucial that it continues >> not to do that... > >I meant that we're going to use OpenSSL for AES encryption and >decryption independent of pgcrypto's openssl code, as the first step. >That is, KMS is available only when configured --with-openssl. And >hopefully we eventually merge these openssl code and have pgcrypto use >it, like when we introduced SCRAM. > I don't think it's very likely we'll ever merge any openssl code into our repository, e.g. because of licensing. But we already have AES implementation in pgcrypto - why not to use that? I'm not saying we should make this depend on pgcrypto, but maybe we should move the AES library from pgcrypto into src/common or something like that. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi, I wonder if this is meant to support external KMS systems/services like Vault (from HashiCorp) or CloudHSM (from AWS) or a hardware HSM. AFAICS the current implementation does not allow storing keys in such external systems, right? But it seems kinda reasonable to want to do that, when already using the HSM for other parts of the system. Now, I'm not saying the first version we commit has to support this, or that it necessarily makes sense. But for example MariaDB seems to support this [1]. [1] https://mariadb.com/kb/en/encryption-key-management/ -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi, On February 8, 2020 7:08:26 AM PST, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: >On Sat, Feb 08, 2020 at 02:48:54PM +0900, Masahiko Sawada wrote: >>On Sat, 8 Feb 2020 at 03:24, Andres Freund <andres@anarazel.de> wrote: >>> >>> Hi, >>> >>> On 2020-02-07 20:44:31 +0900, Masahiko Sawada wrote: >>> > Yeah I'm not going to use pgcrypto for transparent data >encryption. >>> > The KMS patch includes the new basic infrastructure for >cryptographic >>> > functions (mainly AES-CBC). I'm thinking we can expand that >>> > infrastructure so that we can also use it for TDE purpose by >>> > supporting new cryptographic functions such as AES-CTR. Anyway, I >>> > agree to not have it depend on pgcrypto. >>> >>> I thought for a minute, before checking the patch, that you were >saying >>> above that the KMS patch includes its *own* implementation of >>> cryptographic functions. I think it's pretty crucial that it >continues >>> not to do that... >> >>I meant that we're going to use OpenSSL for AES encryption and >>decryption independent of pgcrypto's openssl code, as the first step. >>That is, KMS is available only when configured --with-openssl. And >>hopefully we eventually merge these openssl code and have pgcrypto use >>it, like when we introduced SCRAM. >> > >I don't think it's very likely we'll ever merge any openssl code into >our repository, e.g. because of licensing. But we already have AES >implementation in pgcrypto - why not to use that? I'm not saying we >should make this depend on pgcrypto, but maybe we should move the AES >library from pgcrypto into src/common or something like that. The code uses functions exposed by openssl, it doesn't copy there code. And no, I don't think we should copy the implemented from pgcrypto - it's not good. We should remove it entirely. Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
On Sat, Feb 08, 2020 at 07:47:24AM -0800, Andres Freund wrote: >Hi, > >On February 8, 2020 7:08:26 AM PST, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: >>On Sat, Feb 08, 2020 at 02:48:54PM +0900, Masahiko Sawada wrote: >>>On Sat, 8 Feb 2020 at 03:24, Andres Freund <andres@anarazel.de> wrote: >>>> >>>> Hi, >>>> >>>> On 2020-02-07 20:44:31 +0900, Masahiko Sawada wrote: >>>> > Yeah I'm not going to use pgcrypto for transparent data >>encryption. >>>> > The KMS patch includes the new basic infrastructure for >>cryptographic >>>> > functions (mainly AES-CBC). I'm thinking we can expand that >>>> > infrastructure so that we can also use it for TDE purpose by >>>> > supporting new cryptographic functions such as AES-CTR. Anyway, I >>>> > agree to not have it depend on pgcrypto. >>>> >>>> I thought for a minute, before checking the patch, that you were >>saying >>>> above that the KMS patch includes its *own* implementation of >>>> cryptographic functions. I think it's pretty crucial that it >>continues >>>> not to do that... >>> >>>I meant that we're going to use OpenSSL for AES encryption and >>>decryption independent of pgcrypto's openssl code, as the first step. >>>That is, KMS is available only when configured --with-openssl. And >>>hopefully we eventually merge these openssl code and have pgcrypto use >>>it, like when we introduced SCRAM. >>> >> >>I don't think it's very likely we'll ever merge any openssl code into >>our repository, e.g. because of licensing. But we already have AES >>implementation in pgcrypto - why not to use that? I'm not saying we >>should make this depend on pgcrypto, but maybe we should move the AES >>library from pgcrypto into src/common or something like that. > >The code uses functions exposed by openssl, it doesn't copy there code. > Sure, I know the code is currently calling ooenssl functions. I was responding to Masahiko-san's message that we might eventually merge this openssl code into our tree. >And no, I don't think we should copy the implemented from pgcrypto - >it's not good. We should remove it entirely. OK, no opinion on the quality of this implementation. -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: > On Sat, Feb 08, 2020 at 07:47:24AM -0800, Andres Freund wrote: >> On February 8, 2020 7:08:26 AM PST, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: >>>> I don't think it's very likely we'll ever merge any openssl code into >>>> our repository, e.g. because of licensing. But we already have AES >>>> implementation in pgcrypto - why not to use that? I'm not saying we >>>> should make this depend on pgcrypto, but maybe we should move the AES >>>> library from pgcrypto into src/common or something like that. >> The code uses functions exposed by openssl, it doesn't copy there code. > Sure, I know the code is currently calling ooenssl functions. I was > responding to Masahiko-san's message that we might eventually merge this > openssl code into our tree. No. This absolutely, positively, will not happen. There will never be crypto functions in our core tree, because then there'd be no recourse for people who want to use Postgres in countries with restrictions on crypto software. It's hard enough for them that we have such code in contrib --- but at least they can remove pgcrypto and be legal. If it's in src/common then they're stuck. For the same reason, I don't think that an "internal key management" feature in the core code is ever going to be acceptable. It has to be an extension. (But, as long as it's an extension, whether it's bringing its own crypto or relying on some other extension for that doesn't matter from the legal standpoint.) regards, tom lane
On Sun, 9 Feb 2020 at 01:53, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > On Sat, Feb 08, 2020 at 07:47:24AM -0800, Andres Freund wrote: > >Hi, > > > >On February 8, 2020 7:08:26 AM PST, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > >>On Sat, Feb 08, 2020 at 02:48:54PM +0900, Masahiko Sawada wrote: > >>>On Sat, 8 Feb 2020 at 03:24, Andres Freund <andres@anarazel.de> wrote: > >>>> > >>>> Hi, > >>>> > >>>> On 2020-02-07 20:44:31 +0900, Masahiko Sawada wrote: > >>>> > Yeah I'm not going to use pgcrypto for transparent data > >>encryption. > >>>> > The KMS patch includes the new basic infrastructure for > >>cryptographic > >>>> > functions (mainly AES-CBC). I'm thinking we can expand that > >>>> > infrastructure so that we can also use it for TDE purpose by > >>>> > supporting new cryptographic functions such as AES-CTR. Anyway, I > >>>> > agree to not have it depend on pgcrypto. > >>>> > >>>> I thought for a minute, before checking the patch, that you were > >>saying > >>>> above that the KMS patch includes its *own* implementation of > >>>> cryptographic functions. I think it's pretty crucial that it > >>continues > >>>> not to do that... > >>> > >>>I meant that we're going to use OpenSSL for AES encryption and > >>>decryption independent of pgcrypto's openssl code, as the first step. > >>>That is, KMS is available only when configured --with-openssl. And > >>>hopefully we eventually merge these openssl code and have pgcrypto use > >>>it, like when we introduced SCRAM. > >>> > >> > >>I don't think it's very likely we'll ever merge any openssl code into > >>our repository, e.g. because of licensing. But we already have AES > >>implementation in pgcrypto - why not to use that? I'm not saying we > >>should make this depend on pgcrypto, but maybe we should move the AES > >>library from pgcrypto into src/common or something like that. > > > >The code uses functions exposed by openssl, it doesn't copy there code. > > > > Sure, I know the code is currently calling ooenssl functions. I was > responding to Masahiko-san's message that we might eventually merge this > openssl code into our tree. Sorry for confusing you. What I wanted to say is to write AES encryption code in src/common using openssl library as the first step apart from pgcrypto's openssl code, and then merge these two code library into src/common as the next step. That is, it's moving the AES library pgcrypto to src/common as you mentioned. IIRC when we introduced SCRAM we moved sha2 library from pgcrypto to src/common. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From: Andres Freund <andres@anarazel.de> > Perhaps this has already been discussed (I only briefly looked): I'd > strongly advise against having any new infrastrure depend on > pgcrypto. Its code quality imo is well below our standards and contains > serious red flags like very outdated copies of cryptography algorithm > implementations. I think we should consider deprecating and removing > it, not expanding its use. It certainly shouldn't be involved in any > potential disk encryption system at a later stage. +1 Regards Takayuki Tsunakawa
On Wed, 5 Feb 2020 at 22:28, Sehrope Sarkuni <sehrope@jackdb.com> wrote: > > On Sat, Feb 1, 2020 at 7:02 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Sun, 2 Feb 2020 at 00:37, Sehrope Sarkuni <sehrope@jackdb.com> wrote: > > > > > > On Fri, Jan 31, 2020 at 1:21 AM Masahiko Sawada > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Thu, 30 Jan 2020 at 20:36, Sehrope Sarkuni <sehrope@jackdb.com> wrote: > > > > > That > > > > > would allow the internal usage to have a fixed output length of > > > > > LEN(IV) + LEN(HMAC) + LEN(DATA) = 16 + 32 + 64 = 112 bytes. > > > > > > > > Probably you meant LEN(DATA) is 32? DATA will be an encryption key for > > > > AES256 (master key) internally generated. > > > > > > No it should be 64-bytes. That way we can have separate 32-byte > > > encryption key (for AES256) and 32-byte MAC key (for HMAC-SHA256). > > > > > > While it's common to reuse the same 32-byte key for both AES256 and an > > > HMAC-SHA256 and there aren't any known issues with doing so, when > > > designing something from scratch it's more secure to use entirely > > > separate keys. > > > > The HMAC key you mentioned above is not the same as the HMAC key > > derived from the user provided passphrase, right? That is, individual > > key needs to have its IV and HMAC key. Given that the HMAC key used > > for HMAC(IV || ENCRYPT(KEY, IV, DATA)) is the latter key (derived from > > passphrase), what will be the former key used for? > > It's not derived from the passphrase, it's unlocked by the passphrase (along with the master encryption key). The serverwill have 64-bytes of random data, saved encrypted in pg_control, which can be treated as two separate 32-byte keys,let's call them master_encryption_key and master_mac_key. The 64-bytes is unlocked by decrypting it with the user passphraseat startup (which itself would be split into a pair of encryption and MAC keys to do the unlocking). > > The wrap and unwrap operations would use both keys: > > wrap(plain_text, encryption_key, mac_key) { > // Generate random IV: > iv = pg_strong_random(16); > // Encrypt: > cipher_text = encrypt_aes256_cbc(encryption_key, iv, plain_text); > // Compute MAC on all inputs: > mac = hmac_sha256(mac_key, encryption_key || iv || cipher_text); > // Concat user facing pieces together > wrapped = mac || iv || cipher_text; > return wrapped; > } > > unwrap(wrapped, encryption_key, mac_key) { > // Split wrapped into its pieces: > actual_mac = wrapped.slice(0, 32); > iv = wrapped.slice(0 + 32, 16); > cipher_text = wrapped.slice(0 + 32 + 16); > // Compute MAC on all inputs: > expected_mac = hmac_sha256(mac_key, encryption_key || iv || cipher_text); > // Compare MAC vs value in wrapped: > if (expected_mac != actual_mac) { return Error("MAC does not match"); } > // MAC matches so decrypt: > plain_text = decrypt_aes256_cbc(encryption_key, iv, cipher_text); > return plain_text; > } > > Every input to the encryption operation, including the encryption key, must be included into the HMAC calculation. If youuse the same key for both encryption and MAC that's not required as it's already part of the MAC process as the key. Usingseparate keys requires explicitly adding in the encryption key into the MAC input to ensure that it the correct keyprior to decryption in the unwrap operation. Any additional parts of the wrapped output (ex: a "version" byte for thealgos or padding choices) should also be included. > > The wrap / unwrap above would be used with the encryption and mac keys derived from the user passphrase to unlock the master_encryption_keyand master_mac_key from pg_control. Then those would be used by the higher level functions: > > pg_kmgr_wrap(plain_text) { > return wrap(plain_text, master_encryption_key, master_mac_key); > } > > pg_kmgr_unwrap(wrapped) { > return unwrap(wrapped, master_encryption_key, master_mac_key); > } Thank you for explaining the details. I had missed something. Attached updated patch incorporated all comments I got so far. The changes are: * Renamed data_encryption_cipher to key_management_cipher * Renamed pg_kmgr_wrap and pg_kmgr_unwrap to pg_wrap_key and pg_unwrap_key * Changed wrap and unwrap procedure based on the comments * Removed the restriction of requiring the input key being a multiple of 16 bytes. * Created a context dedicated to wrap and unwrap data Documentation and regression tests are still missing. Regarding key rotation, currently we allow online key rotation by doing pg_rotate_encryption_key after changing cluster_passphrase_command and loading. But if the server crashed during key rotation it might require the old passphrase in spite of the passphrase command in postgresql.conf having been changed. We need to deal with it but I'm not sure the best approach. Possibly having a new frontend tool that changes the key offline would be a safe approach. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Hi, On 2020-02-08 12:07:57 -0500, Tom Lane wrote: > For the same reason, I don't think that an "internal key management" > feature in the core code is ever going to be acceptable. It has to > be an extension. (But, as long as it's an extension, whether it's > bringing its own crypto or relying on some other extension for that > doesn't matter from the legal standpoint.) I'm not convinced by that. We have optional in-core functionality that requires external libraries, and we add more cases, if necessary. Given that the goal of this work is to be useful for on-disk encryption, I don't see moving it into an extension being viable? I am somewhat doubtful that the, imo significant, complexity of the feature is worth it, but that's imo a different discussion. > > Sure, I know the code is currently calling ooenssl functions. I was > > responding to Masahiko-san's message that we might eventually merge this > > openssl code into our tree. > > No. This absolutely, positively, will not happen. There will never be > crypto functions in our core tree, because then there'd be no recourse for > people who want to use Postgres in countries with restrictions on crypto > software. It's hard enough for them that we have such code in contrib > --- but at least they can remove pgcrypto and be legal. If it's in > src/common then they're stuck Isn't that basically a problem of the past by now? Partially due to changed laws (e.g. France, which used to be a problematic case), but also because it's basically futile to have import restrictions on cryptography by now. Just about every larger project contains significant amounts of cryptographic code and it's entirely impractical to operate anything interfacing with network without some form of transport encryption. And just about all open source distribution mechanism have stopped separating out crypto code a long time ago. I however do agree that we should strive to not introduce cryptographic code into the pg source tree - nobody here seems to have even close to enough experience to maintaining / writing that. Greetings, Andres Freund
On Mon, Feb 10, 2020 at 05:57:47PM -0800, Andres Freund wrote: > Hi, > > On 2020-02-08 12:07:57 -0500, Tom Lane wrote: > > For the same reason, I don't think that an "internal key management" > > feature in the core code is ever going to be acceptable. It has to > > be an extension. (But, as long as it's an extension, whether it's > > bringing its own crypto or relying on some other extension for that > > doesn't matter from the legal standpoint.) > > I'm not convinced by that. We have optional in-core functionality that > requires external libraries, and we add more cases, if necessary. Take for example libreadline, without which our CLI is at best dysfunctional. > > > Sure, I know the code is currently calling ooenssl functions. I > > > was responding to Masahiko-san's message that we might > > > eventually merge this openssl code into our tree. > > > > No. This absolutely, positively, will not happen. There will > > never be crypto functions in our core tree, because then there'd > > be no recourse for people who want to use Postgres in countries > > with restrictions on crypto software. It's hard enough for them > > that we have such code in contrib --- but at least they can remove > > pgcrypto and be legal. If it's in src/common then they're stuck > > Isn't that basically a problem of the past by now? > > Partially due to changed laws (e.g. France, which used to be a > problematic case), It's less of a problem than it once was, but it's not exactly gone yet. https://en.wikipedia.org/wiki/Restrictions_on_the_import_of_cryptography > but also because it's basically futile to have > import restrictions on cryptography by now. Just about every larger > project contains significant amounts of cryptographic code and it's > entirely impractical to operate anything interfacing with network > without some form of transport encryption. And just about all open > source distribution mechanism have stopped separating out crypto > code a long time ago. That's true. We have access to legal counsel. Maybe it's worth asking them how best to include cryptographic functionality, "how" being the question one asks when one wants to get a positive response. > I however do agree that we should strive to not introduce > cryptographic code into the pg source tree - nobody here seems to > have even close to enough experience to maintaining / writing that. +1 for not turning ourselves into implementers of cryptographic primitives. Best, David. -- David Fetter <david(at)fetter(dot)org> http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On Tue, 11 Feb 2020 at 10:57, Andres Freund <andres@anarazel.de> wrote: > > Hi, > > On 2020-02-08 12:07:57 -0500, Tom Lane wrote: > > For the same reason, I don't think that an "internal key management" > > feature in the core code is ever going to be acceptable. It has to > > be an extension. (But, as long as it's an extension, whether it's > > bringing its own crypto or relying on some other extension for that > > doesn't matter from the legal standpoint.) > > I'm not convinced by that. We have optional in-core functionality that > requires external libraries, and we add more cases, if necessary. Given > that the goal of this work is to be useful for on-disk encryption, I > don't see moving it into an extension being viable? As far as I researched it is significantly hard to implement transparent data encryption without introducing into core. Adding a hook to the point where flushing data to the disk for encryption, compression and tracking dirty blocks has ever been proposed but it has been rejected every time. > > I am somewhat doubtful that the, imo significant, complexity of the > feature is worth it, but that's imo a different discussion. > > > > > Sure, I know the code is currently calling ooenssl functions. I was > > > responding to Masahiko-san's message that we might eventually merge this > > > openssl code into our tree. > > > > No. This absolutely, positively, will not happen. There will never be > > crypto functions in our core tree, because then there'd be no recourse for > > people who want to use Postgres in countries with restrictions on crypto > > software. It's hard enough for them that we have such code in contrib > > --- but at least they can remove pgcrypto and be legal. If it's in > > src/common then they're stuck > > Isn't that basically a problem of the past by now? Partially due to > changed laws (e.g. France, which used to be a problematic case), but > also because it's basically futile to have import restrictions on > cryptography by now. Just about every larger project contains > significant amounts of cryptographic code and it's entirely impractical > to operate anything interfacing with network without some form of > transport encryption. And just about all open source distribution > mechanism have stopped separating out crypto code a long time ago. > > I however do agree that we should strive to not introduce cryptographic > code into the pg source tree It doesn't include the case where we introduce the code using openssl cryptographic function library to the core. Is that right? Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Feb 6, 2020 at 4:37 PM Cary Huang <cary.huang@highgo.ca> wrote: > Since the user does not need to know the master secret key used to cipher the data, I don't think we should expose "pg_kmgr_unwrap("xxxx")"SQL function to the user at all. > The wrapped key "xxxx" is stored in control data and it is possible to obtain by malicious user and steal the key by runningSELECT pg_kmgr_unwrap("xxxx"). > Even the user is righteous, it may not be very straightforward for that user to obtain the wrapped key "xxxx" to use inthe unwrap function. I agree. > so instead of: > ------------------ > INSERT INTO tbl VALUES (pg_encrypt('user data', pg_kmgr_unwrap('xxxxx')); > SELECT pg_decrypt(secret_column, pg_kmgr_unwrap('xxxxx')) FROM tbl; > > it would become: > ------------------ > INSERT INTO tbl VALUES (pg_encrypt('user data', 'cluster_pass_phrase'); > SELECT pg_decrypt(secret_column, 'cluster_pass_phrase') FROM tbl; The second one is certainly better than the first one, as it prevents the key from being stolen. It's still pretty bad, though, because the supposedly-secret passphrase will end up in the server log. I have a hard time believing that this feature as currently proposed is worth anything. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Feb 6, 2020 at 9:19 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > This feature protects data from disk thefts but cannot protect data > from attackers who are able to access PostgreSQL server. In this > design application side still is responsible for managing the wrapped > secret in order to protect it from attackers. This is the same as when > we use pgcrypto now. The difference is that data is safe even if > attackers steal the wrapped secret and the disk. The data cannot be > decrypted either without the passphrase which can be stored to other > key management systems or without accessing postgres server. IOW for > example, attackers can get the data if they get the wrapped secret > managed by application side then run pg_kmgr_unwrap() to get the > secret and then steal the disk. If you only care about protecting against the theft of the disk, you might as well just encrypt the whole filesystem, which will probably perform better and probably be a lot harder to break since you won't have short encrypted strings but instead large encrypted blocks of data. Moreover, I think a lot of people who are interested in these kinds of features are hoping for more than just protecting against the theft of the disk. While some people may be hoping for too much in this area, setting your sights only on encryption at rest seems like a fairly low bar. It also doesn't seem very likely to actually provide any security. You're talking about sending the encryption key in the query string, which means that there's a good chance it's going to end up in a log file someplace. One way that could happen is if the user has configured log_statement=all or log_min_duration_statement, but it could also happen any time the query throws an error. In theory, you might arrange for the log messages to be sent to another server that is protected by separate layers of security, but a lot of people are going to just log locally. And, even if you do have a separate server, do you really want to have the logfile over there be full of passwords? I know I can be awfully negative some times, but that it seems like a weakness so serious as to make this whole thing effectively useless. One way to plug this hole is to use new protocol messages for key exchanges. For example, suppose that after authentication is complete, you can send the server a new protocol message: KeyPassphrase <key-name> <passphrase>. The server stores the passphrase in backend-private memory and returns ReadyForQuery, and does not log the message payload anywhere. Now you do this: INSERT INTO tbl VALUES (pg_encrypt('user data', 'key-name'); SELECT pg_decrypt(secret_column, 'key-name') FROM tbl; If the passphrase for the named key has not been loaded into the current session's memory, this produces an error; otherwise, it looks up the passphrase and uses it to do the decryption. Now the passphrase never gets logged anywhere, and, also, the user can't persuade the server to provide it with the encryption key, because there's no SQL-level function to access that data. We could take it a step further: suppose that encryption is a column property, and the value of the property is a key name. If the user hasn't sent a KeyPassphrase message with the relevant key name, attempts to access that column just error out. If they have, then the server just does the encryption and decryption automatically. Now the user can just do: INSERT INTO tbl VALUES ('user data'); SELECT secret_column FROM tbl; It's a huge benefit if the SQL doesn't need to be changed. All that an application needs to do in order to use encryption in this scenario is use PQsetKeyPassphrase() or whatever before doing whatever else they want to do. Even with these changes, the security of this whole approach can be criticized on the basis that a good amount of information about the data can be inferred without decrypting anything. You can tell which encrypted values are long and which are short. If someone builds an index on the column, you can tell the order of all the encrypted values even though you may not know what the actual values are. Those could well be meaningful information leaks, but I think such a system might still be of use for certain purposes. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Feb 6, 2020 at 9:19 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
> This feature protects data from disk thefts but cannot protect data
> from attackers who are able to access PostgreSQL server. In this
> design application side still is responsible for managing the wrapped
> secret in order to protect it from attackers. This is the same as when
> we use pgcrypto now. The difference is that data is safe even if
> attackers steal the wrapped secret and the disk. The data cannot be
> decrypted either without the passphrase which can be stored to other
> key management systems or without accessing postgres server. IOW for
> example, attackers can get the data if they get the wrapped secret
> managed by application side then run pg_kmgr_unwrap() to get the
> secret and then steal the disk.
If you only care about protecting against the theft of the disk, you
might as well just encrypt the whole filesystem, which will probably
perform better and probably be a lot harder to break since you won't
have short encrypted strings but instead large encrypted blocks of
data. Moreover, I think a lot of people who are interested in these
kinds of features are hoping for more than just protecting against the
theft of the disk. While some people may be hoping for too much in
this area, setting your sights only on encryption at rest seems like a
fairly low bar.
It also doesn't seem very likely to actually provide any security.
You're talking about sending the encryption key in the query string,
which means that there's a good chance it's going to end up in a log
file someplace. One way that could happen is if the user has
configured log_statement=all or log_min_duration_statement, but it
could also happen any time the query throws an error. In theory, you
might arrange for the log messages to be sent to another server that
is protected by separate layers of security, but a lot of people are
going to just log locally. And, even if you do have a separate server,
do you really want to have the logfile over there be full of
passwords? I know I can be awfully negative some times, but that it
seems like a weakness so serious as to make this whole thing
effectively useless.
One way to plug this hole is to use new protocol messages for key
exchanges. For example, suppose that after authentication is complete,
you can send the server a new protocol message: KeyPassphrase
<key-name> <passphrase>. The server stores the passphrase in
backend-private memory and returns ReadyForQuery, and does not log the
message payload anywhere. Now you do this:
INSERT INTO tbl VALUES (pg_encrypt('user data', 'key-name');
SELECT pg_decrypt(secret_column, 'key-name') FROM tbl;
If the passphrase for the named key has not been loaded into the
current session's memory, this produces an error; otherwise, it looks
up the passphrase and uses it to do the decryption. Now the passphrase
never gets logged anywhere, and, also, the user can't persuade the
server to provide it with the encryption key, because there's no
SQL-level function to access that data.
We could take it a step further: suppose that encryption is a column
property, and the value of the property is a key name. If the user
hasn't sent a KeyPassphrase message with the relevant key name,
attempts to access that column just error out. If they have, then the
server just does the encryption and decryption automatically. Now the
user can just do:
INSERT INTO tbl VALUES ('user data');
SELECT secret_column FROM tbl;
It's a huge benefit if the SQL doesn't need to be changed. All that an
application needs to do in order to use encryption in this scenario is
use PQsetKeyPassphrase() or whatever before doing whatever else they
want to do.
Even with these changes, the security of this whole approach can be
criticized on the basis that a good amount of information about the
data can be inferred without decrypting anything. You can tell which
encrypted values are long and which are short. If someone builds an
index on the column, you can tell the order of all the encrypted
values even though you may not know what the actual values are. Those
could well be meaningful information leaks, but I think such a system
might still be of use for certain purposes.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
> Isn't that basically a problem of the past by now? Partially due to
> changed laws (e.g. France, which used to be a problematic case), but
> also because it's basically futile to have import restrictions on
> cryptography by now. Just about every larger project contains
> significant amounts of cryptographic code and it's entirely impractical
> to operate anything interfacing with network without some form of
> transport encryption. And just about all open source distribution
> mechanism have stopped separating out crypto code a long time ago.
Australia passed some stunningly backwards crypto laws only quite recently. The Defense Trade Control Act (DCTA) imposes restrictions not only on exporting crypto software, but even on teaching about cryptography without a permit. While supposedly restricted to military items and software, it's rather broad and unclear how that is determined. It's one of those "written broadly, applied selectively, trust us to be nice" laws, because they're NEVER abused, right? See https://www.defence.gov.au/ExportControls/Cryptography.asp .
More recently we passed another idiotic "did you even bother to listen at all to the people who explained this to you" law called the Telecommunications (Assistance and Access) Act. This allows the Government to order companies/organisations to permit "lawful access" to encrypted communication, including end-to-end encrypted communications. It doesn't legislatively order the creation of backdoors, it just legislates that companies must be able to add them on demand, so ... um, it legislates backdoors. The law was drafted quickly, with little consultation, and rammed through Parliament during Christmas with the usual "but the Terrorists" handwaving. (Nevermind that real world terrorist organisations are communicating mainly through videogames chats and other innocuous places, not relying on strong crypto.) The law is already being abused to attack journalists. It has some nasty provisions about what Australia may order employees of a company to do as well, but thankfully the politicians who drafted those provisions did not appear to understand things like revision control or code review, so their real world threat is minimal.
My point? In practice, much of what we do with crypto is subject to a variety of legal questions in many legal jurisdictions. Not much is outright illegal in most places, but it's definitely complicated. I do not participate in anything I know to be illegal or reasonably suspect to be illegal - but with the kind of incredibly broad laws we have now on the books in so many places, talking about the Ceasar Cipher / rot13 could be a violation of someone's crypto law somewhere if you get the wrong judge and the wrong situation.
The main change has been that it got simpler in the US, so enough developers stopped caring. The US's Dep't of Commerce export restrictions were weakened and the set of countries they applied to were narrowed, allowing US companies and citizens the ability to participate in projects containing non-crippled crypto.
There are still plenty of places where any sort of non-backdoored crypto is entirely illegal, we just say "that's your problem" to people in those places.
I wholly support this approach. Pretty much everything is illegal somewhere. Patents are pain enough already.
On Sat, 15 Feb 2020 at 01:00, Robert Haas <robertmhaas@gmail.com> wrote: > > On Thu, Feb 6, 2020 at 9:19 PM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > This feature protects data from disk thefts but cannot protect data > > from attackers who are able to access PostgreSQL server. In this > > design application side still is responsible for managing the wrapped > > secret in order to protect it from attackers. This is the same as when > > we use pgcrypto now. The difference is that data is safe even if > > attackers steal the wrapped secret and the disk. The data cannot be > > decrypted either without the passphrase which can be stored to other > > key management systems or without accessing postgres server. IOW for > > example, attackers can get the data if they get the wrapped secret > > managed by application side then run pg_kmgr_unwrap() to get the > > secret and then steal the disk. > > If you only care about protecting against the theft of the disk, you > might as well just encrypt the whole filesystem, which will probably > perform better and probably be a lot harder to break since you won't > have short encrypted strings but instead large encrypted blocks of > data. Moreover, I think a lot of people who are interested in these > kinds of features are hoping for more than just protecting against the > theft of the disk. While some people may be hoping for too much in > this area, setting your sights only on encryption at rest seems like a > fairly low bar. This feature also protects data from reading database files directly. And it's also good that it's independent of platforms. To be clear, let me summarize scenarios where we will be able to protect data and won't. We can put the cluster key which will be obtained by cluster_passphrase_command into another component in the system, for instance into KMS ideally. The user key is wrapped and saved to an application server or somewhere it can obtain promptly. PostgreSQL server has the master key in the disk which is wrapped by the cluster key along with the user data encrypted by the user key. While running PostgreSQL server, user can unwrap the user key using by pg_unwrap_key to get the user key. Given that attackers stole the database disk that includes encrypted user data and the wrapped master key, what they need to complete their attack is (1) the wrapped user key and an access to PostgreSQL server, (2) the cluster key and the wrapped user key or (3) the master key and the wrapped user key. They cannot get user data with only one of those secrets: the cluster key, the master key and the wrapped user key. In case (1), PostgreSQL needs to be running and they need to be able to access a PostgreSQL server, which may require a password, to execute pg_unwrap_key with the wrapped user key they stole. In case (2), since the wrapped user key is stored in the application server and it will be likely to be accessible without special privilege it may be easy for attackers to get it. However in addition, they need to attack KMS to get the cluster key. Finally in case (3), again, they may be able to steal the wrapped user key. But they need also to be able to login to OS in an unauthorized way and then illegally see the PostgreSQL shared buffer. ISTM these all cases will be not easy for attackers. > > It also doesn't seem very likely to actually provide any security. > You're talking about sending the encryption key in the query string, > which means that there's a good chance it's going to end up in a log > file someplace. One way that could happen is if the user has > configured log_statement=all or log_min_duration_statement, but it > could also happen any time the query throws an error. In theory, you > might arrange for the log messages to be sent to another server that > is protected by separate layers of security, but a lot of people are > going to just log locally. And, even if you do have a separate server, > do you really want to have the logfile over there be full of > passwords? I know I can be awfully negative some times, but that it > seems like a weakness so serious as to make this whole thing > effectively useless. > Since the user key could be logged to server logs attackers will be able to get user data by stealing only the database disk if the server logs locally. But I personally think that it's not a serious problem that will make this feature meaningless, depending on user cases. User will be likely to have user key per users or one key for one instance. So for example, in the case where the system doesn't add new users during running, user can wrap the user key before the system starting service and therefore user will need pay attention only at that time. If user can take care of that we can accept such restriction. > One way to plug this hole is to use new protocol messages for key > exchanges. For example, suppose that after authentication is complete, > you can send the server a new protocol message: KeyPassphrase > <key-name> <passphrase>. The server stores the passphrase in > backend-private memory and returns ReadyForQuery, and does not log the > message payload anywhere. Now you do this: > > INSERT INTO tbl VALUES (pg_encrypt('user data', 'key-name'); > SELECT pg_decrypt(secret_column, 'key-name') FROM tbl; > > If the passphrase for the named key has not been loaded into the > current session's memory, this produces an error; otherwise, it looks > up the passphrase and uses it to do the decryption. Now the passphrase > never gets logged anywhere, and, also, the user can't persuade the > server to provide it with the encryption key, because there's no > SQL-level function to access that data. > > We could take it a step further: suppose that encryption is a column > property, and the value of the property is a key name. If the user > hasn't sent a KeyPassphrase message with the relevant key name, > attempts to access that column just error out. If they have, then the > server just does the encryption and decryption automatically. Now the > user can just do: > > INSERT INTO tbl VALUES ('user data'); > SELECT secret_column FROM tbl; > > It's a huge benefit if the SQL doesn't need to be changed. All that an > application needs to do in order to use encryption in this scenario is > use PQsetKeyPassphrase() or whatever before doing whatever else they > want to do. Your idea seems good. I think the point from development perspective is whether it's worth to have such a dedicated feature in order to provide the transparent encryption feature using pgcrypto. That is, looking at this feature as a building block of transparent data at rest encryption such changes might be overkill. Generally encrypting data using pgcrypto is not good in terms of performance. In transparent data encryption, PostgreSQL would be able to encrypt data by the key stored inside its database. As I mentioned above if this feature can cover a certain use case, it might be enough as is. > > Even with these changes, the security of this whole approach can be > criticized on the basis that a good amount of information about the > data can be inferred without decrypting anything. You can tell which > encrypted values are long and which are short. If someone builds an > index on the column, you can tell the order of all the encrypted > values even though you may not know what the actual values are. Those > could well be meaningful information leaks, but I think such a system > might still be of use for certain purposes. Yeah, that's another reason why I personally hesitate to use pgcrypto as a transparent data encryption feature. It's still under discussion that what data needs to be encrypted by the transparent data at rest encryption but it would be much better than pgcrypto's one from that perspective. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, 19 Feb 2020 at 09:29, Cary Huang <cary.huang@highgo.ca> wrote: > > Hi > > I have tried the attached kms_v3 patch and have some comments: > > 1) In the comments, I think you meant hmac + iv + encrypted data instead of iv + hmac + encrypted data? > > ---> in kmgr_wrap_key( ): > + /* > + * Assemble the wrapped key. The order of the wrapped key is iv, hmac and > + * encrypted data. > + */ Right, will fix. > > > 2) I see that create_keywrap_ctx function in kmgr_utils.c and regular cipher context init will both call ossl_aes256_encrypt_initto initialise context for encryption and key wrapping. In ossl_aes256_encrypt_init, the cipher methodalways initialises to aes-256-cbc, which is ok for keywrap because under CBC block cipher mode, we only had to supplyone unique IV as initial value. But for actual WAL and buffer encryption that will come in later, I think the discussionis to use CTR block cipher mode, which requires one unique IV for each block, and the sequence id from WAL andbuffer can be used to derive unique IV for each block for better security? I think it would be better to allow callerto decide which EVP_CIPHER to initialize? EVP_aes_256_cbc(), EVP_aes_256_ctr() or others? > > +ossl_aes256_encrypt_init(pg_cipher_ctx *ctx, uint8 *key) > +{ > + if (!EVP_EncryptInit_ex(ctx, EVP_aes_256_cbc(), NULL, NULL, NULL)) > + return false; > + if (!EVP_CIPHER_CTX_set_key_length(ctx, PG_AES256_KEY_LEN)) > + return false; > + if (!EVP_EncryptInit_ex(ctx, NULL, NULL, key, NULL)) > + return false; > + > + /* > + * Always enable padding. We don't need to check the return > + * value as EVP_CIPHER_CTX_set_padding always returns 1. > + */ > + EVP_CIPHER_CTX_set_padding(ctx, 1); > + > + return true; > +} It seems good. We can expand it to make caller decide the block cipher mode of operation and key length. I removed such code from the previous patch to make it simple since currently we support only AES-256 CBC. > > 3) Following up point 2), I think we should enhance the enum to include not only the Encryption algorithm and key size,but also the block cipher mode as well because having all 3 pieces of information can describe exactly how KMS is performingthe encryption and decryption. So when we call "ossl_aes256_encrypt_init", we can include the new enum as inputparameter and it will initialise the EVP_CIPHER_CTX with either EVP_aes_256_cbc() or EVP_aes_256_ctr() for differentpurposes (key wrapping, or WAL encryption..etc). > > ---> kmgr.h > +/* Value of key_management_cipher */ > +enum > +{ > + KMGR_CIPHER_OFF = 0, > + KMGR_CIPHER_AES256 > +}; > + > > so it would become > +enum > +{ > + KMGR_CIPHER_OFF = 0, > + KMGR_CIPHER_AES256_CBC = 1, > + KMGR_CIPHER_AES256_CTR = 2 > +}; > > If you agree with this change, several other places will need to be changed as well, such as "kmgr_cipher_string", "kmgr_cipher_value"and the initdb code.... KMGR_CIPHER_XXX is relevant with cipher mode used by KMS and KMS would still use AES256 CBC even if we had TDE which would use AES256 CTR. After more thoughts, I think currently we can specify -e aes-256 to initdb but actually this is not necessary. When --cluster-passphrase-command specified, we enable the internal KMS and always use AES256 CBC. Something like -e option will be needed after supporting TDE with several cipher options. Thoughts? > > 4) the pg_wrap_key and pg_unwrap_key SQL functions defined in kmgr.c > I tried these new SQL functions and found that the pg_unwrap_key will produce the original key with 4 bytes less. Thisis because the result length is not set long enough to accommodate the 4 byte VARHDRSZ header used by the multi-typevariable. > > the len variable in SET_VARSIZE(res, len) should include also the variable header VARHDRSZ. Now it is 4 byte short so itwill produce incomplete output. > > ---> pg_unwrap_key function in kmgr.c > + if (!kmgr_unwrap_key(UnwrapCtx, (uint8 *) VARDATA_ANY(data), datalen, > + (uint8 *) VARDATA(res), &len)) > + ereport(ERROR, > + (errmsg("could not unwrap the given secret"))); > + > + /* > + * The size of unwrapped key can be smaller than the size estimated > + * before unwrapping since the padding is removed during unwrapping. > + */ > + SET_VARSIZE(res, len); > + PG_RETURN_BYTEA_P(res); > > I am only testing their functionalities with random key as input data. It is currently not possible for a user to obtainthe wrapped key from the server in order to use these wrap/unwrap functions. I personally don't think it is a goodidea to expose these functions to user Thank you for testing. I'm going to include regression tests and documentation in the next version patch. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, 20 Feb 2020 at 16:16, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Wed, 19 Feb 2020 at 09:29, Cary Huang <cary.huang@highgo.ca> wrote: > > > > Hi > > > > I have tried the attached kms_v3 patch and have some comments: > > > > 1) In the comments, I think you meant hmac + iv + encrypted data instead of iv + hmac + encrypted data? > > > > ---> in kmgr_wrap_key( ): > > + /* > > + * Assemble the wrapped key. The order of the wrapped key is iv, hmac and > > + * encrypted data. > > + */ > > Right, will fix. > > > > > > > 2) I see that create_keywrap_ctx function in kmgr_utils.c and regular cipher context init will both call ossl_aes256_encrypt_initto initialise context for encryption and key wrapping. In ossl_aes256_encrypt_init, the cipher methodalways initialises to aes-256-cbc, which is ok for keywrap because under CBC block cipher mode, we only had to supplyone unique IV as initial value. But for actual WAL and buffer encryption that will come in later, I think the discussionis to use CTR block cipher mode, which requires one unique IV for each block, and the sequence id from WAL andbuffer can be used to derive unique IV for each block for better security? I think it would be better to allow callerto decide which EVP_CIPHER to initialize? EVP_aes_256_cbc(), EVP_aes_256_ctr() or others? > > > > +ossl_aes256_encrypt_init(pg_cipher_ctx *ctx, uint8 *key) > > +{ > > + if (!EVP_EncryptInit_ex(ctx, EVP_aes_256_cbc(), NULL, NULL, NULL)) > > + return false; > > + if (!EVP_CIPHER_CTX_set_key_length(ctx, PG_AES256_KEY_LEN)) > > + return false; > > + if (!EVP_EncryptInit_ex(ctx, NULL, NULL, key, NULL)) > > + return false; > > + > > + /* > > + * Always enable padding. We don't need to check the return > > + * value as EVP_CIPHER_CTX_set_padding always returns 1. > > + */ > > + EVP_CIPHER_CTX_set_padding(ctx, 1); > > + > > + return true; > > +} > > It seems good. We can expand it to make caller decide the block cipher > mode of operation and key length. I removed such code from the > previous patch to make it simple since currently we support only > AES-256 CBC. > > > > > 3) Following up point 2), I think we should enhance the enum to include not only the Encryption algorithm and key size,but also the block cipher mode as well because having all 3 pieces of information can describe exactly how KMS is performingthe encryption and decryption. So when we call "ossl_aes256_encrypt_init", we can include the new enum as inputparameter and it will initialise the EVP_CIPHER_CTX with either EVP_aes_256_cbc() or EVP_aes_256_ctr() for differentpurposes (key wrapping, or WAL encryption..etc). > > > > ---> kmgr.h > > +/* Value of key_management_cipher */ > > +enum > > +{ > > + KMGR_CIPHER_OFF = 0, > > + KMGR_CIPHER_AES256 > > +}; > > + > > > > so it would become > > +enum > > +{ > > + KMGR_CIPHER_OFF = 0, > > + KMGR_CIPHER_AES256_CBC = 1, > > + KMGR_CIPHER_AES256_CTR = 2 > > +}; > > > > If you agree with this change, several other places will need to be changed as well, such as "kmgr_cipher_string", "kmgr_cipher_value"and the initdb code.... > > KMGR_CIPHER_XXX is relevant with cipher mode used by KMS and KMS would > still use AES256 CBC even if we had TDE which would use AES256 CTR. > > After more thoughts, I think currently we can specify -e aes-256 to > initdb but actually this is not necessary. When > --cluster-passphrase-command specified, we enable the internal KMS and > always use AES256 CBC. Something like -e option will be needed after > supporting TDE with several cipher options. Thoughts? > > > > > 4) the pg_wrap_key and pg_unwrap_key SQL functions defined in kmgr.c > > I tried these new SQL functions and found that the pg_unwrap_key will produce the original key with 4 bytes less. Thisis because the result length is not set long enough to accommodate the 4 byte VARHDRSZ header used by the multi-typevariable. > > > > the len variable in SET_VARSIZE(res, len) should include also the variable header VARHDRSZ. Now it is 4 byte short soit will produce incomplete output. > > > > ---> pg_unwrap_key function in kmgr.c > > + if (!kmgr_unwrap_key(UnwrapCtx, (uint8 *) VARDATA_ANY(data), datalen, > > + (uint8 *) VARDATA(res), &len)) > > + ereport(ERROR, > > + (errmsg("could not unwrap the given secret"))); > > + > > + /* > > + * The size of unwrapped key can be smaller than the size estimated > > + * before unwrapping since the padding is removed during unwrapping. > > + */ > > + SET_VARSIZE(res, len); > > + PG_RETURN_BYTEA_P(res); > > > > I am only testing their functionalities with random key as input data. It is currently not possible for a user to obtainthe wrapped key from the server in order to use these wrap/unwrap functions. I personally don't think it is a goodidea to expose these functions to user > > Thank you for testing. I'm going to include regression tests and > documentation in the next version patch. > Attached the updated version patch. In this version, I've removed -e option of initdb that was used to specify the encryption algorithm and key length like aes-256. The cipher algorithm and key length used by KMS is fixed, aes-256, so it's no longer necessary as long as we support only KMS. When we introduce transparent data encryption and we'd like to support multiple options we will have such option. Therefore, the internal KMS is enabled when PostgreSQL is built with --with-openssl and --cluster-passphrase-command is specified to initdb. The patch includes minimal doc and regression tests. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Thu, 20 Feb 2020 at 16:16, Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Wed, 19 Feb 2020 at 09:29, Cary Huang <cary.huang@highgo.ca> wrote:
> >
> > Hi
> >
> > I have tried the attached kms_v3 patch and have some comments:
> >
> > 1) In the comments, I think you meant hmac + iv + encrypted data instead of iv + hmac + encrypted data?
> >
> > ---> in kmgr_wrap_key( ):
> > + /*
> > + * Assemble the wrapped key. The order of the wrapped key is iv, hmac and
> > + * encrypted data.
> > + */
>
> Right, will fix.
>
> >
> >
> > 2) I see that create_keywrap_ctx function in kmgr_utils.c and regular cipher context init will both call ossl_aes256_encrypt_init to initialise context for encryption and key wrapping. In ossl_aes256_encrypt_init, the cipher method always initialises to aes-256-cbc, which is ok for keywrap because under CBC block cipher mode, we only had to supply one unique IV as initial value. But for actual WAL and buffer encryption that will come in later, I think the discussion is to use CTR block cipher mode, which requires one unique IV for each block, and the sequence id from WAL and buffer can be used to derive unique IV for each block for better security? I think it would be better to allow caller to decide which EVP_CIPHER to initialize? EVP_aes_256_cbc(), EVP_aes_256_ctr() or others?
> >
> > +ossl_aes256_encrypt_init(pg_cipher_ctx *ctx, uint8 *key)
> > +{
> > + if (!EVP_EncryptInit_ex(ctx, EVP_aes_256_cbc(), NULL, NULL, NULL))
> > + return false;
> > + if (!EVP_CIPHER_CTX_set_key_length(ctx, PG_AES256_KEY_LEN))
> > + return false;
> > + if (!EVP_EncryptInit_ex(ctx, NULL, NULL, key, NULL))
> > + return false;
> > +
> > + /*
> > + * Always enable padding. We don't need to check the return
> > + * value as EVP_CIPHER_CTX_set_padding always returns 1.
> > + */
> > + EVP_CIPHER_CTX_set_padding(ctx, 1);
> > +
> > + return true;
> > +}
>
> It seems good. We can expand it to make caller decide the block cipher
> mode of operation and key length. I removed such code from the
> previous patch to make it simple since currently we support only
> AES-256 CBC.
>
> >
> > 3) Following up point 2), I think we should enhance the enum to include not only the Encryption algorithm and key size, but also the block cipher mode as well because having all 3 pieces of information can describe exactly how KMS is performing the encryption and decryption. So when we call "ossl_aes256_encrypt_init", we can include the new enum as input parameter and it will initialise the EVP_CIPHER_CTX with either EVP_aes_256_cbc() or EVP_aes_256_ctr() for different purposes (key wrapping, or WAL encryption..etc).
> >
> > ---> kmgr.h
> > +/* Value of key_management_cipher */
> > +enum
> > +{
> > + KMGR_CIPHER_OFF = 0,
> > + KMGR_CIPHER_AES256
> > +};
> > +
> >
> > so it would become
> > +enum
> > +{
> > + KMGR_CIPHER_OFF = 0,
> > + KMGR_CIPHER_AES256_CBC = 1,
> > + KMGR_CIPHER_AES256_CTR = 2
> > +};
> >
> > If you agree with this change, several other places will need to be changed as well, such as "kmgr_cipher_string", "kmgr_cipher_value" and the initdb code....
>
> KMGR_CIPHER_XXX is relevant with cipher mode used by KMS and KMS would
> still use AES256 CBC even if we had TDE which would use AES256 CTR.
>
> After more thoughts, I think currently we can specify -e aes-256 to
> initdb but actually this is not necessary. When
> --cluster-passphrase-command specified, we enable the internal KMS and
> always use AES256 CBC. Something like -e option will be needed after
> supporting TDE with several cipher options. Thoughts?
>
> >
> > 4) the pg_wrap_key and pg_unwrap_key SQL functions defined in kmgr.c
> > I tried these new SQL functions and found that the pg_unwrap_key will produce the original key with 4 bytes less. This is because the result length is not set long enough to accommodate the 4 byte VARHDRSZ header used by the multi-type variable.
> >
> > the len variable in SET_VARSIZE(res, len) should include also the variable header VARHDRSZ. Now it is 4 byte short so it will produce incomplete output.
> >
> > ---> pg_unwrap_key function in kmgr.c
> > + if (!kmgr_unwrap_key(UnwrapCtx, (uint8 *) VARDATA_ANY(data), datalen,
> > + (uint8 *) VARDATA(res), &len))
> > + ereport(ERROR,
> > + (errmsg("could not unwrap the given secret")));
> > +
> > + /*
> > + * The size of unwrapped key can be smaller than the size estimated
> > + * before unwrapping since the padding is removed during unwrapping.
> > + */
> > + SET_VARSIZE(res, len);
> > + PG_RETURN_BYTEA_P(res);
> >
> > I am only testing their functionalities with random key as input data. It is currently not possible for a user to obtain the wrapped key from the server in order to use these wrap/unwrap functions. I personally don't think it is a good idea to expose these functions to user
>
> Thank you for testing. I'm going to include regression tests and
> documentation in the next version patch.
>
Attached the updated version patch. In this version, I've removed -e
option of initdb that was used to specify the encryption algorithm and
key length like aes-256. The cipher algorithm and key length used by
KMS is fixed, aes-256, so it's no longer necessary as long as we
support only KMS. When we introduce transparent data encryption and
we'd like to support multiple options we will have such option.
Therefore, the internal KMS is enabled when PostgreSQL is built with
--with-openssl and --cluster-passphrase-command is specified to
initdb. The patch includes minimal doc and regression tests.
Regards,
--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
(1)
HiI would like to share with you a front end patch based on kms_v4.patch that you have shared, called "kms_v4_fe.patch".The patch integrates front end tool pg_waldump with the KMSv4 and obtain encryption and decryption cipher contexts from the KMS backend. These cipher contexts can then be used in subsequent encryption and decryption operations provided by cipher.h when TDE is enabled. I added two common functions in your kmgr_utils that other front end tools affected by TDE can also use to obtain the cipher contexts. Do let me know if this is how you would envision KMS APIs to be used by a front end.cheers---- On Mon, 24 Feb 2020 17:55:09 -0800 Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote ----On Thu, 20 Feb 2020 at 16:16, Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Wed, 19 Feb 2020 at 09:29, Cary Huang <cary.huang@highgo.ca> wrote:
> >
> > Hi
> >
> > I have tried the attached kms_v3 patch and have some comments:
> >
> > 1) In the comments, I think you meant hmac + iv + encrypted data instead of iv + hmac + encrypted data?
> >
> > ---> in kmgr_wrap_key( ):
> > + /*
> > + * Assemble the wrapped key. The order of the wrapped key is iv, hmac and
> > + * encrypted data.
> > + */
>
> Right, will fix.
>
> >
> >
> > 2) I see that create_keywrap_ctx function in kmgr_utils.c and regular cipher context init will both call ossl_aes256_encrypt_init to initialise context for encryption and key wrapping. In ossl_aes256_encrypt_init, the cipher method always initialises to aes-256-cbc, which is ok for keywrap because under CBC block cipher mode, we only had to supply one unique IV as initial value. But for actual WAL and buffer encryption that will come in later, I think the discussion is to use CTR block cipher mode, which requires one unique IV for each block, and the sequence id from WAL and buffer can be used to derive unique IV for each block for better security? I think it would be better to allow caller to decide which EVP_CIPHER to initialize? EVP_aes_256_cbc(), EVP_aes_256_ctr() or others?
> >
> > +ossl_aes256_encrypt_init(pg_cipher_ctx *ctx, uint8 *key)
> > +{
> > + if (!EVP_EncryptInit_ex(ctx, EVP_aes_256_cbc(), NULL, NULL, NULL))
> > + return false;
> > + if (!EVP_CIPHER_CTX_set_key_length(ctx, PG_AES256_KEY_LEN))
> > + return false;
> > + if (!EVP_EncryptInit_ex(ctx, NULL, NULL, key, NULL))
> > + return false;
> > +
> > + /*
> > + * Always enable padding. We don't need to check the return
> > + * value as EVP_CIPHER_CTX_set_padding always returns 1.
> > + */
> > + EVP_CIPHER_CTX_set_padding(ctx, 1);
> > +
> > + return true;
> > +}
>
> It seems good. We can expand it to make caller decide the block cipher
> mode of operation and key length. I removed such code from the
> previous patch to make it simple since currently we support only
> AES-256 CBC.
>
> >
> > 3) Following up point 2), I think we should enhance the enum to include not only the Encryption algorithm and key size, but also the block cipher mode as well because having all 3 pieces of information can describe exactly how KMS is performing the encryption and decryption. So when we call "ossl_aes256_encrypt_init", we can include the new enum as input parameter and it will initialise the EVP_CIPHER_CTX with either EVP_aes_256_cbc() or EVP_aes_256_ctr() for different purposes (key wrapping, or WAL encryption..etc).
> >
> > ---> kmgr.h
> > +/* Value of key_management_cipher */
> > +enum
> > +{
> > + KMGR_CIPHER_OFF = 0,
> > + KMGR_CIPHER_AES256
> > +};
> > +
> >
> > so it would become
> > +enum
> > +{
> > + KMGR_CIPHER_OFF = 0,
> > + KMGR_CIPHER_AES256_CBC = 1,
> > + KMGR_CIPHER_AES256_CTR = 2
> > +};
> >
> > If you agree with this change, several other places will need to be changed as well, such as "kmgr_cipher_string", "kmgr_cipher_value" and the initdb code....
>
> KMGR_CIPHER_XXX is relevant with cipher mode used by KMS and KMS would
> still use AES256 CBC even if we had TDE which would use AES256 CTR.
>
> After more thoughts, I think currently we can specify -e aes-256 to
> initdb but actually this is not necessary. When
> --cluster-passphrase-command specified, we enable the internal KMS and
> always use AES256 CBC. Something like -e option will be needed after
> supporting TDE with several cipher options. Thoughts?
>
> >
> > 4) the pg_wrap_key and pg_unwrap_key SQL functions defined in kmgr.c
> > I tried these new SQL functions and found that the pg_unwrap_key will produce the original key with 4 bytes less. This is because the result length is not set long enough to accommodate the 4 byte VARHDRSZ header used by the multi-type variable.
> >
> > the len variable in SET_VARSIZE(res, len) should include also the variable header VARHDRSZ. Now it is 4 byte short so it will produce incomplete output.
> >
> > ---> pg_unwrap_key function in kmgr.c
> > + if (!kmgr_unwrap_key(UnwrapCtx, (uint8 *) VARDATA_ANY(data), datalen,
> > + (uint8 *) VARDATA(res), &len))
> > + ereport(ERROR,
> > + (errmsg("could not unwrap the given secret")));
> > +
> > + /*
> > + * The size of unwrapped key can be smaller than the size estimated
> > + * before unwrapping since the padding is removed during unwrapping.
> > + */
> > + SET_VARSIZE(res, len);
> > + PG_RETURN_BYTEA_P(res);
> >
> > I am only testing their functionalities with random key as input data. It is currently not possible for a user to obtain the wrapped key from the server in order to use these wrap/unwrap functions. I personally don't think it is a good idea to expose these functions to user
>
> Thank you for testing. I'm going to include regression tests and
> documentation in the next version patch.
>
Attached the updated version patch. In this version, I've removed -e
option of initdb that was used to specify the encryption algorithm and
key length like aes-256. The cipher algorithm and key length used by
KMS is fixed, aes-256, so it's no longer necessary as long as we
support only KMS. When we introduce transparent data encryption and
we'd like to support multiple options we will have such option.
Therefore, the internal KMS is enabled when PostgreSQL is built with
--with-openssl and --cluster-passphrase-command is specified to
initdb. The patch includes minimal doc and regression tests.
Regards,
--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, 3 Mar 2020 at 08:49, Cary Huang <cary.huang@highgo.ca> wrote: > > Hi Masahiko > Please see below my comments regarding kms_v4.patch that you have shared earlier. Thank you for reviewing this patch! > > (1) > There is a discrepancy between the documentation and the actual function definition. For example in func.sgml, it statespg_wrap_key takes argument in text data type but in pg_proc.dat and kmgr.c, the function is defined to take argumentin bytea data type. > > ===> doc/src/sgml/func.sgml > + <entry> > + <indexterm> > + <primary>pg_wrap_key</primary> > + </indexterm> > + <literal><function>pg_wrap_key(<parameter>data</parameter> <type>text</type>)</function></literal> > + </entry> > + <entry> > + <type>bytea</type> > + </entry> > > ===> src/include/catalog/pg_proc.dat > +{ oid => '8201', descr => 'wrap the given secret', > + proname => 'pg_wrap_key', > + provolatile => 'v', prorettype => 'bytea', > + proargtypes => 'bytea', prosrc => 'pg_wrap_key' }, > > ===> src/backend/crypto/kmgr.c > +Datum > +pg_wrap_key(PG_FUNCTION_ARGS) > +{ > + bytea *data = PG_GETARG_BYTEA_PP(0); > + bytea *res; > + int datalen; > + int reslen; > + int len; Fixed. > + > > (2) > I think the documentation needs to make clear the difference between a key and a user secret. Some parts of it are tryingto mix the 2 terms together when they shouldn't. To my understanding, a key is expressed as binary data that is actuallyused in the encryption and decryption operations. A user secret, on the other hand, is more like a passphrase, expressedas string, that is used to derive a encryption key for subsequent encryption operations. > > If I just look at the function names "pg_wrap_key" and "pg_unwrap_key", I immediately feel that these functions are usedto encapsulate and uncover cryptographic key materials. The input and output to these 2 functions should all be key materialsexpressed in bytea data type. In previous email discussion, there was only one key material in discussion, calledthe master key (generated during initdb and stored in cluster), and this somehow automatically makes people (includingmyself) associate pg_wrap_key and pg_unwrap_key to be used on this master key and raise a bunch of security concernsaround it. > > Having read the documentation provided by the patch describing pg_wrap_key and pg_unwrap_key, they seem to serve anotherpurpose. It states that pg_wrap_key is used to encrypt a user-supplied secret (text) with the master key and producea wrapped secret while pg_unwrap_key does the opposite, so we can prevent user from having to enter the secret inplaintext when using pgcrypto functions. > > This use case is ok but user secret is not really a cryptographic key material is it? It is more similar to a secret passphraseexpressed in text and pg_wrap_key is merely used to turn the passphrase into a wrapped passphrase expressed inbytea. > > If I give pg_wrap_key with a real key material expressed in bytea, I will not be able to unwrap it properly: > > Select pg_unwrap_key (pg_wrap_key(E'\\x2b073713476f9f0e761e45c64be8175424d2742e7d53df90b6416f1d84168e8a') ); > > pg_unwrap_key > ---------------------------------------------- > +\x077\x13Go\x0Ev\x1EEK\x17T$t.}SߐAo\x1D\x16 > (1 row) > > Maybe we should rename these SQL functions like this to prevent confusion. > => pg_wrap_secret (takes a text, returns a bytea) > => pg_unwrap_secret(takes a bytea, returns a text) Agreed to change argument types. User secret will be normally text password as we do with pgcrypto. So probably these functions can cover most cases. I changed the function name to pg_wrap and pg_unwrap because these functions generically wrap and unwrap the given data. > > if there is a use case for users to encapsulate key materials then we can define 2 more wrap functions for these, if thereis no use case, don't bother: > => pg_wrap_key (takes a bytea, returns a bytea) > => pg_unwrap_key (takes a bytea, returns a bytea) +1. Like pgcrypto has both pgp_sym_encrypt_bytea and pgp_sym_encrypt, maybe we can have such functions. > > (3) > I would rephrase "chapter 32: Encryption Key Management Part III. Server Administration" documentation like this: > > ===================== > PostgreSQL supports Encryption Key Management System, which is enabled when PostgreSQL is built with --with-openssl optionand cluster_passphrase_command is specified during initdb process. The user-provided cluster_passphrase_command inpostgresql.conf and the cluster_passphrase_command specified during initdb process must match, otherwise, the databasecluster will not start up. > > The user-provided cluster passphrase is derived into a Key Encryption Key (KEK), which is used to encapsulate the MasterEncryption Key generated during the initdb process. The encapsulated Master Encryption Key is stored inside the databasecluster. > > Encryption Key Management System provides several functions to allow users to use the master encryption key to wrap andunwrap their own encryption secrets during encryption and decryption operations. This feature allows users to encryptand decrypt data without knowing the actual key. > ===================== > > (4) > I would rephrase "chapter 32.2: Wrap and Unwrap user secret" documentation like this: Note that I rephrase based on (2)and uses pg_(un)wrap_secret instead. > > ===================== > Encryption key management System provides several functions described in Table 9.97, to wrap and unwrap user secrets withthe Master Encryption Key, which is uniquely and securely stored inside the database cluster. > > These functions allow user to encrypt and decrypt user data without having to provide user encryption secret in plain text.One possible use case is to use encryption key management together with pgcrypto. User wraps the user encryption secretwith pg_wrap_secret() and passes the wrapped encryption secret to the pgcrypto encryption functions. The wrapped secretcan be stored in the application server or somewhere secured and should be obtained promptly for cryptographic operationswith pgcrypto. > [same examples follow after...] > ===================== Thank you for suggesting the updated sentences. I've updated the docs based on your suggestions. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Dear Sawada-san I don't know if my environment or email system is weird, but the V5 patch file is only include simply a changed list. and previous V4 patch file size was 64kb, but the v5 patch file size was 2kb. Can you check it? Best regards. Moon. On Tue, Mar 3, 2020 at 5:58 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Tue, 3 Mar 2020 at 08:49, Cary Huang <cary.huang@highgo.ca> wrote: > > > > Hi Masahiko > > Please see below my comments regarding kms_v4.patch that you have shared earlier. > > Thank you for reviewing this patch! > > > > > (1) > > There is a discrepancy between the documentation and the actual function definition. For example in func.sgml, it statespg_wrap_key takes argument in text data type but in pg_proc.dat and kmgr.c, the function is defined to take argumentin bytea data type. > > > > ===> doc/src/sgml/func.sgml > > + <entry> > > + <indexterm> > > + <primary>pg_wrap_key</primary> > > + </indexterm> > > + <literal><function>pg_wrap_key(<parameter>data</parameter> <type>text</type>)</function></literal> > > + </entry> > > + <entry> > > + <type>bytea</type> > > + </entry> > > > > ===> src/include/catalog/pg_proc.dat > > +{ oid => '8201', descr => 'wrap the given secret', > > + proname => 'pg_wrap_key', > > + provolatile => 'v', prorettype => 'bytea', > > + proargtypes => 'bytea', prosrc => 'pg_wrap_key' }, > > > > ===> src/backend/crypto/kmgr.c > > +Datum > > +pg_wrap_key(PG_FUNCTION_ARGS) > > +{ > > + bytea *data = PG_GETARG_BYTEA_PP(0); > > + bytea *res; > > + int datalen; > > + int reslen; > > + int len; > > Fixed. > > > + > > > > (2) > > I think the documentation needs to make clear the difference between a key and a user secret. Some parts of it are tryingto mix the 2 terms together when they shouldn't. To my understanding, a key is expressed as binary data that is actuallyused in the encryption and decryption operations. A user secret, on the other hand, is more like a passphrase, expressedas string, that is used to derive a encryption key for subsequent encryption operations. > > > > If I just look at the function names "pg_wrap_key" and "pg_unwrap_key", I immediately feel that these functions are usedto encapsulate and uncover cryptographic key materials. The input and output to these 2 functions should all be key materialsexpressed in bytea data type. In previous email discussion, there was only one key material in discussion, calledthe master key (generated during initdb and stored in cluster), and this somehow automatically makes people (includingmyself) associate pg_wrap_key and pg_unwrap_key to be used on this master key and raise a bunch of security concernsaround it. > > > > Having read the documentation provided by the patch describing pg_wrap_key and pg_unwrap_key, they seem to serve anotherpurpose. It states that pg_wrap_key is used to encrypt a user-supplied secret (text) with the master key and producea wrapped secret while pg_unwrap_key does the opposite, so we can prevent user from having to enter the secret inplaintext when using pgcrypto functions. > > > > This use case is ok but user secret is not really a cryptographic key material is it? It is more similar to a secretpassphrase expressed in text and pg_wrap_key is merely used to turn the passphrase into a wrapped passphrase expressedin bytea. > > > > If I give pg_wrap_key with a real key material expressed in bytea, I will not be able to unwrap it properly: > > > > Select pg_unwrap_key (pg_wrap_key(E'\\x2b073713476f9f0e761e45c64be8175424d2742e7d53df90b6416f1d84168e8a') ); > > > > pg_unwrap_key > > ---------------------------------------------- > > +\x077\x13Go\x0Ev\x1EEK\x17T$t.}SߐAo\x1D\x16 > > (1 row) > > > > Maybe we should rename these SQL functions like this to prevent confusion. > > => pg_wrap_secret (takes a text, returns a bytea) > > => pg_unwrap_secret(takes a bytea, returns a text) > > Agreed to change argument types. User secret will be normally text > password as we do with pgcrypto. So probably these functions can cover > most cases. I changed the function name to pg_wrap and pg_unwrap > because these functions generically wrap and unwrap the given data. > > > > > if there is a use case for users to encapsulate key materials then we can define 2 more wrap functions for these, ifthere is no use case, don't bother: > > => pg_wrap_key (takes a bytea, returns a bytea) > > => pg_unwrap_key (takes a bytea, returns a bytea) > > +1. Like pgcrypto has both pgp_sym_encrypt_bytea and pgp_sym_encrypt, > maybe we can have such functions. > > > > > (3) > > I would rephrase "chapter 32: Encryption Key Management Part III. Server Administration" documentation like this: > > > > ===================== > > PostgreSQL supports Encryption Key Management System, which is enabled when PostgreSQL is built with --with-openssl optionand cluster_passphrase_command is specified during initdb process. The user-provided cluster_passphrase_command inpostgresql.conf and the cluster_passphrase_command specified during initdb process must match, otherwise, the databasecluster will not start up. > > > > The user-provided cluster passphrase is derived into a Key Encryption Key (KEK), which is used to encapsulate the MasterEncryption Key generated during the initdb process. The encapsulated Master Encryption Key is stored inside the databasecluster. > > > > Encryption Key Management System provides several functions to allow users to use the master encryption key to wrap andunwrap their own encryption secrets during encryption and decryption operations. This feature allows users to encryptand decrypt data without knowing the actual key. > > ===================== > > > > (4) > > I would rephrase "chapter 32.2: Wrap and Unwrap user secret" documentation like this: Note that I rephrase based on (2)and uses pg_(un)wrap_secret instead. > > > > ===================== > > Encryption key management System provides several functions described in Table 9.97, to wrap and unwrap user secretswith the Master Encryption Key, which is uniquely and securely stored inside the database cluster. > > > > These functions allow user to encrypt and decrypt user data without having to provide user encryption secret in plaintext. One possible use case is to use encryption key management together with pgcrypto. User wraps the user encryptionsecret with pg_wrap_secret() and passes the wrapped encryption secret to the pgcrypto encryption functions. Thewrapped secret can be stored in the application server or somewhere secured and should be obtained promptly for cryptographicoperations with pgcrypto. > > [same examples follow after...] > > ===================== > > Thank you for suggesting the updated sentences. I've updated the docs > based on your suggestions. > > Regards, > > -- > Masahiko Sawada http://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, 6 Mar 2020 at 15:25, Moon, Insung <tsukiwamoon.pgsql@gmail.com> wrote: > > Dear Sawada-san > > I don't know if my environment or email system is weird, but the V5 > patch file is only include simply a changed list. > and previous V4 patch file size was 64kb, but the v5 patch file size was 2kb. > Can you check it? > Thank you! I'd attached wrong file. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Fri, Mar 6, 2020 at 03:31:00PM +0900, Masahiko Sawada wrote: > On Fri, 6 Mar 2020 at 15:25, Moon, Insung <tsukiwamoon.pgsql@gmail.com> wrote: > > > > Dear Sawada-san > > > > I don't know if my environment or email system is weird, but the V5 > > patch file is only include simply a changed list. > > and previous V4 patch file size was 64kb, but the v5 patch file size was 2kb. > > Can you check it? > > > > Thank you! I'd attached wrong file. Looking at this thread, I wanted to make a few comments: Everyone seems to think pgcrypto need some maintenance. Who would like to take on that task? This feature does require openssl since all the encryption/decryption happen via openssl function calls. Three are three levels of encrypt here: 1. The master key generated during initdb 2. The passphrase to unlock the master key at boot time. Is that optional or required? 3. The wrap/unwrap key, which can be per-user/application/cluster In the patch, the doc heading "Cluster Encryption Key Rotation" seems confusing. Doesn't that change the pass phrase? Why refer to it as the cluster encryption key here? Could the wrap functions expose the master encryption key by passing in empty string or null? I wonder if we should create a derived key from the master key to use for pg_wrap/pg_unwrap, maybe by appending a fixed string to all strings supplied to these functions. We could create another derived key for use in block-level encryption, so we are sure the two key spaces would never overlap. pgcryptokey shows a method for creating a secret between client and server using SQL that does not expose the secret in the server logs: https://momjian.us/download/pgcryptokey/ I assume we will create a 256-bit key for the master key, but give users an option of 128-bit vs 256-bit keys for block-level encryption. 256-bit keys are considered necessary for security against future quantum computing attacks. This looks like a bug in the patch: - This parameter can only be set in the <filename>postgresql.conf</filename> + This parameter can only be set in the <filename>postgresql.confo</filename> -- Bruce Momjian <bruce@momjian.us> https://momjian.us EnterpriseDB https://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
On Thu, 12 Mar 2020 at 08:13, Bruce Momjian <bruce.momjian@enterprisedb.com> wrote: > > On Fri, Mar 6, 2020 at 03:31:00PM +0900, Masahiko Sawada wrote: > > On Fri, 6 Mar 2020 at 15:25, Moon, Insung <tsukiwamoon.pgsql@gmail.com> wrote: > > > > > > Dear Sawada-san > > > > > > I don't know if my environment or email system is weird, but the V5 > > > patch file is only include simply a changed list. > > > and previous V4 patch file size was 64kb, but the v5 patch file size was 2kb. > > > Can you check it? > > > > > > > Thank you! I'd attached wrong file. > > Looking at this thread, I wanted to make a few comments: > > Everyone seems to think pgcrypto need some maintenance. Who would like > to take on that task? > > This feature does require openssl since all the encryption/decryption > happen via openssl function calls. > > Three are three levels of encrypt here: > > 1. The master key generated during initdb > > 2. The passphrase to unlock the master key at boot time. Is that > optional or required? The passphrase is required if the internal kms is enabled during initdb. Currently hashing the passphrase is also required but it could be optional. Even if we make hashing optional, we still require openssl to wrap and unwrap. > > 3. The wrap/unwrap key, which can be per-user/application/cluster > > In the patch, the doc heading "Cluster Encryption Key Rotation" seems > confusing. Doesn't that change the pass phrase? Why refer to it as the > cluster encryption key here? Agreed, changed to "Changing Cluster Passphrase". > > Could the wrap functions expose the master encryption key by passing in > empty string or null? Currently the wrap function returns NULL if null is passed, and doesn't expose the master encryption key even if empty string is passed because we add random IV for each wrapping. > I wonder if we should create a derived key from > the master key to use for pg_wrap/pg_unwrap, maybe by appending a fixed > string to all strings supplied to these functions. We could create > another derived key for use in block-level encryption, so we are sure > the two key spaces would never overlap. Currently the master key is 32 bytes but you mean to add fixed string to the master key to derive a new key? > > pgcryptokey shows a method for creating a secret between client and > server using SQL that does not expose the secret in the server logs: > > https://momjian.us/download/pgcryptokey/ > > I assume we will create a 256-bit key for the master key, but give users > an option of 128-bit vs 256-bit keys for block-level encryption. > 256-bit keys are considered necessary for security against future > quantum computing attacks. 256-bit keys are more weaker than 128-bit key in terms of quantum computing attacks? > > This looks like a bug in the patch: > > - This parameter can only be set in the <filename>postgresql.conf</filename> > + This parameter can only be set in the <filename>postgresql.confo</filename> Fixed. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Mar 16, 2020 at 04:13:21PM +0900, Masahiko Sawada wrote: > On Thu, 12 Mar 2020 at 08:13, Bruce Momjian > <bruce.momjian@enterprisedb.com> wrote: > > > > On Fri, Mar 6, 2020 at 03:31:00PM +0900, Masahiko Sawada wrote: > > > On Fri, 6 Mar 2020 at 15:25, Moon, Insung <tsukiwamoon.pgsql@gmail.com> wrote: > > > > > > > > Dear Sawada-san > > > > > > > > I don't know if my environment or email system is weird, but the V5 > > > > patch file is only include simply a changed list. > > > > and previous V4 patch file size was 64kb, but the v5 patch file size was 2kb. > > > > Can you check it? > > > > > > > > > > Thank you! I'd attached wrong file. > > > > Looking at this thread, I wanted to make a few comments: > > > > Everyone seems to think pgcrypto need some maintenance. Who would like > > to take on that task? > > > > This feature does require openssl since all the encryption/decryption > > happen via openssl function calls. > > > > Three are three levels of encrypt here: > > > > 1. The master key generated during initdb > > > > 2. The passphrase to unlock the master key at boot time. Is that > > optional or required? > > The passphrase is required if the internal kms is enabled during > initdb. Currently hashing the passphrase is also required but it could > be optional. Even if we make hashing optional, we still require > openssl to wrap and unwrap. I think openssl should be required for any of this --- that is what I was asking. > > Could the wrap functions expose the master encryption key by passing in > > empty string or null? > > Currently the wrap function returns NULL if null is passed, and > doesn't expose the master encryption key even if empty string is > passed because we add random IV for each wrapping. OK, good, makes sense, but you see why I am asking? We never want the master key to be visible. > > I wonder if we should create a derived key from > > the master key to use for pg_wrap/pg_unwrap, maybe by appending a fixed > > string to all strings supplied to these functions. We could create > > another derived key for use in block-level encryption, so we are sure > > the two key spaces would never overlap. > > Currently the master key is 32 bytes but you mean to add fixed string > to the master key to derive a new key? Yes, that was my idea --- make a separate keyspace for wrap/unwrap and block-level encryption. > > pgcryptokey shows a method for creating a secret between client and > > server using SQL that does not expose the secret in the server logs: > > > > https://momjian.us/download/pgcryptokey/ > > > > I assume we will create a 256-bit key for the master key, but give users > > an option of 128-bit vs 256-bit keys for block-level encryption. > > 256-bit keys are considered necessary for security against future > > quantum computing attacks. > > 256-bit keys are more weaker than 128-bit key in terms of quantum > computing attacks? No, I was saying we are using 256-bits for the master key and might allow 128 or 256 keys for block encryption. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EnterpriseDB https://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Sending to pgsql-hackers again. On Tue, 17 Mar 2020 at 03:18, Bruce Momjian <bruce.momjian@enterprisedb.com> wrote: > > On Mon, Mar 16, 2020 at 04:13:21PM +0900, Masahiko Sawada wrote: > > On Thu, 12 Mar 2020 at 08:13, Bruce Momjian > > <bruce.momjian@enterprisedb.com> wrote: > > > > > > On Fri, Mar 6, 2020 at 03:31:00PM +0900, Masahiko Sawada wrote: > > > > On Fri, 6 Mar 2020 at 15:25, Moon, Insung <tsukiwamoon.pgsql@gmail.com> wrote: > > > > > > > > > > Dear Sawada-san > > > > > > > > > > I don't know if my environment or email system is weird, but the V5 > > > > > patch file is only include simply a changed list. > > > > > and previous V4 patch file size was 64kb, but the v5 patch file size was 2kb. > > > > > Can you check it? > > > > > > > > > > > > > Thank you! I'd attached wrong file. > > > > > > Looking at this thread, I wanted to make a few comments: > > > > > > Everyone seems to think pgcrypto need some maintenance. Who would like > > > to take on that task? > > > > > > This feature does require openssl since all the encryption/decryption > > > happen via openssl function calls. > > > > > > Three are three levels of encrypt here: > > > > > > 1. The master key generated during initdb > > > > > > 2. The passphrase to unlock the master key at boot time. Is that > > > optional or required? > > > > The passphrase is required if the internal kms is enabled during > > initdb. Currently hashing the passphrase is also required but it could > > be optional. Even if we make hashing optional, we still require > > openssl to wrap and unwrap. > > I think openssl should be required for any of this --- that is what I > was asking. > > > > Could the wrap functions expose the master encryption key by passing in > > > empty string or null? > > > > Currently the wrap function returns NULL if null is passed, and > > doesn't expose the master encryption key even if empty string is > > passed because we add random IV for each wrapping. > > OK, good, makes sense, but you see why I am asking? We never want the > master key to be visible. Understood. > > > > I wonder if we should create a derived key from > > > the master key to use for pg_wrap/pg_unwrap, maybe by appending a fixed > > > string to all strings supplied to these functions. We could create > > > another derived key for use in block-level encryption, so we are sure > > > the two key spaces would never overlap. > > > > Currently the master key is 32 bytes but you mean to add fixed string > > to the master key to derive a new key? > > Yes, that was my idea --- make a separate keyspace for wrap/unwrap and > block-level encryption. I understand that your idea is to include fixed length string to the 256 bit key in order to separate key space. But if we do that, I think that the key strength would actually be the same as the strength of weaker key length, depending on how we have the fixed string. I think if we want to have multiple key spaces, we need to derive keys from the master key using KDF. How do you think we can have the fixed length string? > > > > pgcryptokey shows a method for creating a secret between client and > > > server using SQL that does not expose the secret in the server logs: > > > > > > https://momjian.us/download/pgcryptokey/ > > > > > > I assume we will create a 256-bit key for the master key, but give users > > > an option of 128-bit vs 256-bit keys for block-level encryption. > > > 256-bit keys are considered necessary for security against future > > > quantum computing attacks. > > > > 256-bit keys are more weaker than 128-bit key in terms of quantum > > computing attacks? > > No, I was saying we are using 256-bits for the master key and might > allow 128 or 256 keys for block encryption. Yes, we might have 128 and 256 keys for block encryption. The current patch doesn't have option, supports only 256 bits key for the master key. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, 19 Mar 2020 at 15:59, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > Sending to pgsql-hackers again. > > On Tue, 17 Mar 2020 at 03:18, Bruce Momjian > <bruce.momjian@enterprisedb.com> wrote: > > > > On Mon, Mar 16, 2020 at 04:13:21PM +0900, Masahiko Sawada wrote: > > > On Thu, 12 Mar 2020 at 08:13, Bruce Momjian > > > <bruce.momjian@enterprisedb.com> wrote: > > > > > > > > On Fri, Mar 6, 2020 at 03:31:00PM +0900, Masahiko Sawada wrote: > > > > > On Fri, 6 Mar 2020 at 15:25, Moon, Insung <tsukiwamoon.pgsql@gmail.com> wrote: > > > > > > > > > > > > Dear Sawada-san > > > > > > > > > > > > I don't know if my environment or email system is weird, but the V5 > > > > > > patch file is only include simply a changed list. > > > > > > and previous V4 patch file size was 64kb, but the v5 patch file size was 2kb. > > > > > > Can you check it? > > > > > > > > > > > > > > > > Thank you! I'd attached wrong file. > > > > > > > > Looking at this thread, I wanted to make a few comments: > > > > > > > > Everyone seems to think pgcrypto need some maintenance. Who would like > > > > to take on that task? > > > > > > > > This feature does require openssl since all the encryption/decryption > > > > happen via openssl function calls. > > > > > > > > Three are three levels of encrypt here: > > > > > > > > 1. The master key generated during initdb > > > > > > > > 2. The passphrase to unlock the master key at boot time. Is that > > > > optional or required? > > > > > > The passphrase is required if the internal kms is enabled during > > > initdb. Currently hashing the passphrase is also required but it could > > > be optional. Even if we make hashing optional, we still require > > > openssl to wrap and unwrap. > > > > I think openssl should be required for any of this --- that is what I > > was asking. > > > > > > Could the wrap functions expose the master encryption key by passing in > > > > empty string or null? > > > > > > Currently the wrap function returns NULL if null is passed, and > > > doesn't expose the master encryption key even if empty string is > > > passed because we add random IV for each wrapping. > > > > OK, good, makes sense, but you see why I am asking? We never want the > > master key to be visible. > > Understood. > > > > > > > I wonder if we should create a derived key from > > > > the master key to use for pg_wrap/pg_unwrap, maybe by appending a fixed > > > > string to all strings supplied to these functions. We could create > > > > another derived key for use in block-level encryption, so we are sure > > > > the two key spaces would never overlap. > > > > > > Currently the master key is 32 bytes but you mean to add fixed string > > > to the master key to derive a new key? > > > > Yes, that was my idea --- make a separate keyspace for wrap/unwrap and > > block-level encryption. > > I understand that your idea is to include fixed length string to the > 256 bit key in order to separate key space. But if we do that, I think > that the key strength would actually be the same as the strength of > weaker key length, depending on how we have the fixed string. I think > if we want to have multiple key spaces, we need to derive keys from the > master key using KDF. Or we can simply generate a different encryption key for block encryption. Therefore we will end up with having two encryption keys inside database. Maybe we can discuss this after the key manager has been introduced. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, 19 Mar 2020 at 18:32, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Thu, 19 Mar 2020 at 15:59, Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > Sending to pgsql-hackers again. > > > > On Tue, 17 Mar 2020 at 03:18, Bruce Momjian > > <bruce.momjian@enterprisedb.com> wrote: > > > > > > On Mon, Mar 16, 2020 at 04:13:21PM +0900, Masahiko Sawada wrote: > > > > On Thu, 12 Mar 2020 at 08:13, Bruce Momjian > > > > <bruce.momjian@enterprisedb.com> wrote: > > > > > > > > > > On Fri, Mar 6, 2020 at 03:31:00PM +0900, Masahiko Sawada wrote: > > > > > > On Fri, 6 Mar 2020 at 15:25, Moon, Insung <tsukiwamoon.pgsql@gmail.com> wrote: > > > > > > > > > > > > > > Dear Sawada-san > > > > > > > > > > > > > > I don't know if my environment or email system is weird, but the V5 > > > > > > > patch file is only include simply a changed list. > > > > > > > and previous V4 patch file size was 64kb, but the v5 patch file size was 2kb. > > > > > > > Can you check it? > > > > > > > > > > > > > > > > > > > Thank you! I'd attached wrong file. > > > > > > > > > > Looking at this thread, I wanted to make a few comments: > > > > > > > > > > Everyone seems to think pgcrypto need some maintenance. Who would like > > > > > to take on that task? > > > > > > > > > > This feature does require openssl since all the encryption/decryption > > > > > happen via openssl function calls. > > > > > > > > > > Three are three levels of encrypt here: > > > > > > > > > > 1. The master key generated during initdb > > > > > > > > > > 2. The passphrase to unlock the master key at boot time. Is that > > > > > optional or required? > > > > > > > > The passphrase is required if the internal kms is enabled during > > > > initdb. Currently hashing the passphrase is also required but it could > > > > be optional. Even if we make hashing optional, we still require > > > > openssl to wrap and unwrap. > > > > > > I think openssl should be required for any of this --- that is what I > > > was asking. > > > > > > > > Could the wrap functions expose the master encryption key by passing in > > > > > empty string or null? > > > > > > > > Currently the wrap function returns NULL if null is passed, and > > > > doesn't expose the master encryption key even if empty string is > > > > passed because we add random IV for each wrapping. > > > > > > OK, good, makes sense, but you see why I am asking? We never want the > > > master key to be visible. > > > > Understood. > > > > > > > > > > I wonder if we should create a derived key from > > > > > the master key to use for pg_wrap/pg_unwrap, maybe by appending a fixed > > > > > string to all strings supplied to these functions. We could create > > > > > another derived key for use in block-level encryption, so we are sure > > > > > the two key spaces would never overlap. > > > > > > > > Currently the master key is 32 bytes but you mean to add fixed string > > > > to the master key to derive a new key? > > > > > > Yes, that was my idea --- make a separate keyspace for wrap/unwrap and > > > block-level encryption. > > > > I understand that your idea is to include fixed length string to the > > 256 bit key in order to separate key space. But if we do that, I think > > that the key strength would actually be the same as the strength of > > weaker key length, depending on how we have the fixed string. I think > > if we want to have multiple key spaces, we need to derive keys from the > > master key using KDF. > > Or we can simply generate a different encryption key for block > encryption. Therefore we will end up with having two encryption keys > inside database. Maybe we can discuss this after the key manager has > been introduced. > Attached updated version patch. This patch incorporated the comments and changed pg_upgrade so that we take over the master encryption key from the old cluster to the new one if both enable key management. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Thu, Mar 19, 2020 at 06:32:57PM +0900, Masahiko Sawada wrote: > On Thu, 19 Mar 2020 at 15:59, Masahiko Sawada > > I understand that your idea is to include fixed length string to the > > 256 bit key in order to separate key space. But if we do that, I think > > that the key strength would actually be the same as the strength of > > weaker key length, depending on how we have the fixed string. I think > > if we want to have multiple key spaces, we need to derive keys from the > > master key using KDF. > > Or we can simply generate a different encryption key for block > encryption. Therefore we will end up with having two encryption keys > inside database. Maybe we can discuss this after the key manager has > been introduced. I know Sehrope liked derived keys so let's get his feedback on this. We might want to have two keys anyway for key rotation purposes. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EnterpriseDB https://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
On Thu, 19 Mar 2020 at 22:00, Bruce Momjian <bruce@momjian.us> wrote: > > On Thu, Mar 19, 2020 at 06:32:57PM +0900, Masahiko Sawada wrote: > > On Thu, 19 Mar 2020 at 15:59, Masahiko Sawada > > > I understand that your idea is to include fixed length string to the > > > 256 bit key in order to separate key space. But if we do that, I think > > > that the key strength would actually be the same as the strength of > > > weaker key length, depending on how we have the fixed string. I think > > > if we want to have multiple key spaces, we need to derive keys from the > > > master key using KDF. > > > > Or we can simply generate a different encryption key for block > > encryption. Therefore we will end up with having two encryption keys > > inside database. Maybe we can discuss this after the key manager has > > been introduced. > > I know Sehrope liked derived keys so let's get his feedback on this. We > might want to have two keys anyway for key rotation purposes. > Agreed. Maybe we can derive a key for the use of wrap and unwrap SQL interface by like HKDF(MK, 'USER_KEY:') or HKDF(KM, 'USER_KEY:' || system_identifier). Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Mar 19, 2020 at 11:42:36PM +0900, Masahiko Sawada wrote: > On Thu, 19 Mar 2020 at 22:00, Bruce Momjian <bruce@momjian.us> wrote: > > > > On Thu, Mar 19, 2020 at 06:32:57PM +0900, Masahiko Sawada wrote: > > > On Thu, 19 Mar 2020 at 15:59, Masahiko Sawada > > > > I understand that your idea is to include fixed length string to the > > > > 256 bit key in order to separate key space. But if we do that, I think > > > > that the key strength would actually be the same as the strength of > > > > weaker key length, depending on how we have the fixed string. I think > > > > if we want to have multiple key spaces, we need to derive keys from the > > > > master key using KDF. > > > > > > Or we can simply generate a different encryption key for block > > > encryption. Therefore we will end up with having two encryption keys > > > inside database. Maybe we can discuss this after the key manager has > > > been introduced. > > > > I know Sehrope liked derived keys so let's get his feedback on this. We > > might want to have two keys anyway for key rotation purposes. > > > > Agreed. Maybe we can derive a key for the use of wrap and unwrap SQL > interface by like HKDF(MK, 'USER_KEY:') or HKDF(KM, 'USER_KEY:' || > system_identifier). Well, the issue is if the user can control the user key, there might be a way to make the user key do nothing. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EnterpriseDB https://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
On Thu, Mar 19, 2020 at 11:42:36PM +0900, Masahiko Sawada wrote:
> On Thu, 19 Mar 2020 at 22:00, Bruce Momjian <bruce@momjian.us> wrote:
> >
> > On Thu, Mar 19, 2020 at 06:32:57PM +0900, Masahiko Sawada wrote:
> > > On Thu, 19 Mar 2020 at 15:59, Masahiko Sawada
> > > > I understand that your idea is to include fixed length string to the
> > > > 256 bit key in order to separate key space. But if we do that, I think
> > > > that the key strength would actually be the same as the strength of
> > > > weaker key length, depending on how we have the fixed string. I think
> > > > if we want to have multiple key spaces, we need to derive keys from the
> > > > master key using KDF.
> > >
> > > Or we can simply generate a different encryption key for block
> > > encryption. Therefore we will end up with having two encryption keys
> > > inside database. Maybe we can discuss this after the key manager has
> > > been introduced.
> >
> > I know Sehrope liked derived keys so let's get his feedback on this. We
> > might want to have two keys anyway for key rotation purposes.
> >
>
> Agreed. Maybe we can derive a key for the use of wrap and unwrap SQL
> interface by like HKDF(MK, 'USER_KEY:') or HKDF(KM, 'USER_KEY:' ||
> system_identifier).
Well, the issue is if the user can control the user key, there is might be
a way to make the user key do nothing.
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Mar 20, 2020 at 12:50:27AM +0900, Masahiko Sawada wrote: > On Fri, Mar 20, 2020 at 0:35 Bruce Momjian <bruce@momjian.us> wrote: > Well, the issue is if the user can control the user key, there is might be > a way to make the user key do nothing. > > Well I meant ‘USER_KEY:’ is a fixed length string for the key used for wrap and > unwrap SQL interface functions. So user cannot control it. We will have another > key derived by, for example, HKDF(MK, ‘TDE_KEY:’ || system_identifier) for > block encryption. OK, yes, something liek that might make sense. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EnterpriseDB https://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
On Fri, 20 Mar 2020 at 01:38, Bruce Momjian <bruce@momjian.us> wrote: > > On Fri, Mar 20, 2020 at 12:50:27AM +0900, Masahiko Sawada wrote: > > On Fri, Mar 20, 2020 at 0:35 Bruce Momjian <bruce@momjian.us> wrote: > > Well, the issue is if the user can control the user key, there is might be > > a way to make the user key do nothing. > > > > Well I meant ‘USER_KEY:’ is a fixed length string for the key used for wrap and > > unwrap SQL interface functions. So user cannot control it. We will have another > > key derived by, for example, HKDF(MK, ‘TDE_KEY:’ || system_identifier) for > > block encryption. > > OK, yes, something liek that might make sense. > Attached the updated version patch. The patch introduces KDF to derive a new key from the master encryption key. We use the derived key for pg_wrap and pg_unwrap SQL functions, instead of directly using the master encryption key. In the future, we will be able to have a separate derived keys for block encryption. As a result of using KDF, the minimum version of OpenSSL when enabling key management is 1.1.0. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Thu, Mar 19, 2020 at 09:33:09PM +0900, Masahiko Sawada wrote: > Attached updated version patch. This patch incorporated the comments > and changed pg_upgrade so that we take over the master encryption key > from the old cluster to the new one if both enable key management. We had a crypto team meeting today, and came away with a few ideas: We should create an SQL-level master key that is different from the block-level master key. By using separate keys, and not deriving them from a single key, they keys can be rotated and migrated to a different cluster independently. For example, users might want to create a new cluster with a new block-level key, but might want to copy the SQL-level key from the old cluster to the new cluster. Both keys would be unlocked with the same passphrase. I was confused by how the wrap/unwrap work. Here is an example from the proposed doc patch: +<programlisting> +=# SELECT pg_wrap('user sercret key'); + pg_wrap +-------------------------------------------------------------------------------------------------------------------------------------------------------------------- + \xb2c89f76f04f95d029f179e0fc3df4ed7254127b5562a9e27d42d1cd037c942dea65ce7c0750c520fa4f4e90481c9eb7e1e42a068248c262c1a6f25c6eab64303b1154ccc9a14361223641aab4a7aabe +(1 row) +</programlisting> + + <para> + Once wrapping the user key, user can encrypt and decrypt user data using the + wrapped user key togehter with the key unwrap functions: + </para> + +<programlisting> + =# INSERT INTO tbl + VALUES (pgp_sym_encrypt('secret data', + pg_unwrap('\xb2c89f76f04f95d029f179e0fc3df4ed7254127b5562a9e27d42d1cd037c942dea65ce7c0750c520fa4f4e90481c9eb7e1e42a068248c262c1a6f25c6eab64303b1154ccc9a14361223641aab4a7aabe'))); + INSERT 1 + + =# SELECT * FROM tbl; + col +-------------------------------------------------------------------------------------------------------------------------------------------------------------- + \xc30d04070302a199ee38bea0320b75d23c01577bb3ffb315d67eecbeca3e40e869cea65efbf0b470f805549af905f94d94c447fbfb8113f585fc86b30c0bd784b10c9857322dc00d556aa8de14 +(1 row) + + =# SELECT pgp_sym_decrypt(col, + pg_unwrap('\xb2c89f76f04f95d029f179e0fc3df4ed7254127b5562a9e27d42d1cd037c942dea65ce7c0750c520fa4f4e90481c9eb7e1e42a068248c262c1a6f25c6eab64303b1154ccc9a14361223641aab4a7aabe')) ascol + FROM tbl; + col +------------------ + user secret data All pg_wrap() does is to take the user string, in this case 'user sercret key' and encrypt it with the SQL-level master key. It doesn't mix the SQL-level master key into the output, which is what I originally thought. This means that the pg_unwrap() call above just returns 'user sercret key'. How would this be used? Users would call pg_wrap() once, and store the result on the client. The client could then use the output of pg_wrap() in all future sessions, without exposing 'user sercret key', which is the key used to encrypt user data. The passing of the parameter to pg_wrap() has to be done in a way that doesn't permanently record the parameter anywhere, like in the logs. pgcryptokey (https://momjian.us/download/pgcryptokey/) has a method of doing this. This is how it passes the data encryption key without making it visible in the logs, using psql: SELECT get_shared_key() \gset \set enc_access_password `echo 'my secret' | tr -d '\n' | openssl dgst -sha256 -binary | gpg2 --symmetric --batch--cipher-algo AES128 --passphrase :'get_shared_key' | xxd -plain | tr -d '\n'` SELECT set_session_access_password(:'enc_access_password'); Removing the sanity checks and user-interface simplicity, it is internally doing this: SELECT set_config('pgcryptokey.shared_key', encode(gen_random_bytes(32), 'hex'), FALSE) AS get_shared_key \gset \set enc_access_password `echo 'my secret' | tr -d '\n' | openssl dgst -sha256 -binary | gpg2 --symmetric --batch--cipher-algo AES128 --passphrase :'get_shared_key' | xxd -plain | tr -d '\n'` SELECT set_config('pgcryptokey.access_password', encode(pgp_sym_decrypt_bytea(decode(:'enc_access_password', 'hex'), :'get_shared_key'), 'hex'), FALSE) || NULL; In English, what it does is the server generates a random key, stores it in a server-side veraible, and sends it to the client. The client hashes a user-supplied key and encrypts it with the random key it got from the server, and sends it to the sever. The server decrypts it using the key it sent (stored in a server-side variable) and stores the the result in another server-side veriable. Perhaps this can be added to our docs as a way of calling pg_wrap(). What good is this feature? Well, the user-supplied data encryption key like 'user sercret key', which is used to encrypt user data, is not visible in the query or the server logs. The wrapped password is visible, but to use it you must be able to connect to a running server (to unwrap it), or have a shut down server and know the paasphrase. Read access to the file system is not sufficient since there is no access to the pass phrase. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EnterpriseDB https://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
On Sat, 21 Mar 2020 at 05:30, Bruce Momjian <bruce@momjian.us> wrote: > > On Thu, Mar 19, 2020 at 09:33:09PM +0900, Masahiko Sawada wrote: > > Attached updated version patch. This patch incorporated the comments > > and changed pg_upgrade so that we take over the master encryption key > > from the old cluster to the new one if both enable key management. > > We had a crypto team meeting today, and came away with a few ideas: > > We should create an SQL-level master key that is different from the > block-level master key. By using separate keys, and not deriving them > from a single key, they keys can be rotated and migrated to a different > cluster independently. For example, users might want to create a new > cluster with a new block-level key, but might want to copy the SQL-level > key from the old cluster to the new cluster. Both keys would be > unlocked with the same passphrase. I've updated the patch according to yesterday's meeting. As the above description by Bruce, the current patch have two encryption keys. Previously we have the master key in pg_control but due to exceeding the safe size limit of pg_control I moved two keys to the dedicated file located at global/pg_key. A wrapped key is 128 bytes and the total size including two wrapped key became 552 bytes while safe limit is 512 bytes. During pg_upgrade we copy the key file from the old cluster to the new cluster. Therefore we can unwrap the data that is wrapped on the old cluster on the new cluster. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Sat, Mar 21, 2020 at 02:12:46PM +0900, Masahiko Sawada wrote: > On Sat, 21 Mar 2020 at 05:30, Bruce Momjian <bruce@momjian.us> wrote: > > We should create an SQL-level master key that is different from the > > block-level master key. By using separate keys, and not deriving them > > from a single key, they keys can be rotated and migrated to a different > > cluster independently. For example, users might want to create a new > > cluster with a new block-level key, but might want to copy the SQL-level > > key from the old cluster to the new cluster. Both keys would be > > unlocked with the same passphrase. > > I've updated the patch according to yesterday's meeting. As the above > description by Bruce, the current patch have two encryption keys. > Previously we have the master key in pg_control but due to exceeding > the safe size limit of pg_control I moved two keys to the dedicated > file located at global/pg_key. A wrapped key is 128 bytes and the > total size including two wrapped key became 552 bytes while safe limit > is 512 bytes. > > During pg_upgrade we copy the key file from the old cluster to the new > cluster. Therefore we can unwrap the data that is wrapped on the old > cluster on the new cluster. I wonder if we should just use two files, one for each key. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EnterpriseDB https://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
On Sat, Mar 21, 2020 at 10:01:02AM -0400, Bruce Momjian wrote: > On Sat, Mar 21, 2020 at 02:12:46PM +0900, Masahiko Sawada wrote: > > On Sat, 21 Mar 2020 at 05:30, Bruce Momjian <bruce@momjian.us> wrote: > > > We should create an SQL-level master key that is different from the > > > block-level master key. By using separate keys, and not deriving them > > > from a single key, they keys can be rotated and migrated to a different > > > cluster independently. For example, users might want to create a new > > > cluster with a new block-level key, but might want to copy the SQL-level > > > key from the old cluster to the new cluster. Both keys would be > > > unlocked with the same passphrase. > > > > I've updated the patch according to yesterday's meeting. As the above > > description by Bruce, the current patch have two encryption keys. > > Previously we have the master key in pg_control but due to exceeding > > the safe size limit of pg_control I moved two keys to the dedicated > > file located at global/pg_key. A wrapped key is 128 bytes and the > > total size including two wrapped key became 552 bytes while safe limit > > is 512 bytes. > > > > During pg_upgrade we copy the key file from the old cluster to the new > > cluster. Therefore we can unwrap the data that is wrapped on the old > > cluster on the new cluster. > > I wonder if we should just use two files, one for each key. Actually, I think we need three files: * TDE WAL key file * TDE block key file * SQL-level file Primaries and standbys have to use the same TDE WAL key file, but can use different TDE block key files to allow for key rotation, so having separate files makes sense --- maybe they need to be in their own directory. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EnterpriseDB https://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
On Sat, 21 Mar 2020 at 23:50, Bruce Momjian <bruce@momjian.us> wrote: > > On Sat, Mar 21, 2020 at 10:01:02AM -0400, Bruce Momjian wrote: > > On Sat, Mar 21, 2020 at 02:12:46PM +0900, Masahiko Sawada wrote: > > > On Sat, 21 Mar 2020 at 05:30, Bruce Momjian <bruce@momjian.us> wrote: > > > > We should create an SQL-level master key that is different from the > > > > block-level master key. By using separate keys, and not deriving them > > > > from a single key, they keys can be rotated and migrated to a different > > > > cluster independently. For example, users might want to create a new > > > > cluster with a new block-level key, but might want to copy the SQL-level > > > > key from the old cluster to the new cluster. Both keys would be > > > > unlocked with the same passphrase. > > > > > > I've updated the patch according to yesterday's meeting. As the above > > > description by Bruce, the current patch have two encryption keys. > > > Previously we have the master key in pg_control but due to exceeding > > > the safe size limit of pg_control I moved two keys to the dedicated > > > file located at global/pg_key. A wrapped key is 128 bytes and the > > > total size including two wrapped key became 552 bytes while safe limit > > > is 512 bytes. > > > > > > During pg_upgrade we copy the key file from the old cluster to the new > > > cluster. Therefore we can unwrap the data that is wrapped on the old > > > cluster on the new cluster. > > > > I wonder if we should just use two files, one for each key. > > Actually, I think we need three files: > > * TDE WAL key file > * TDE block key file > * SQL-level file > > Primaries and standbys have to use the same TDE WAL key file, but can > use different TDE block key files to allow for key rotation, so having > separate files makes sense --- maybe they need to be in their own > directory. I've considered to have separate key files once but it would make things complex to update multiple files atomically. Postgres server will never start if it crashes in the middle of cluster passphrase rotation. Can we consider to have keys related to TDE after we introduce the basic key management system? Probably having keys in a separate file rather than in pg_control file would be better but we don't need these keys so far. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Mar 23, 2020 at 03:55:34PM +0900, Masahiko Sawada wrote: > On Sat, 21 Mar 2020 at 23:50, Bruce Momjian <bruce@momjian.us> wrote: > > Actually, I think we need three files: > > > > * TDE WAL key file > > * TDE block key file > > * SQL-level file > > > > Primaries and standbys have to use the same TDE WAL key file, but can > > use different TDE block key files to allow for key rotation, so having > > separate files makes sense --- maybe they need to be in their own > > directory. > > I've considered to have separate key files once but it would make > things complex to update multiple files atomically. Postgres server > will never start if it crashes in the middle of cluster passphrase > rotation. Can we consider to have keys related to TDE after we > introduce the basic key management system? Probably having keys in a > separate file rather than in pg_control file would be better but we > don't need these keys so far. Well, we need to be able to upgrade this so we have to set it up now in a way that allows that. I am not sure we have ever had a case where we needed to update multiple files atomically at the same time, without the help of WAL. Perhaps we should put the three keys in separate files in a directory called 'cryptokeys', and when we change the pass phrase, we create a new directory called 'cryptokeys.new'. Then once we have created the files in there with the new pass phrase, we remove cryptokeys and rename directory cryptokeys.new to cryptokeys. On boot, if cryptokeys exists and cryptokeys.new does too, remove cryptokeys.new because we crashed during key rotation, If cryptokeys.new exists and cryptokeys doesn't, we rename cryptokeys.new to cryptokeys because we crashed before the rename. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EnterpriseDB https://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
On Tue, 24 Mar 2020 at 07:15, Bruce Momjian <bruce@momjian.us> wrote: > > On Mon, Mar 23, 2020 at 03:55:34PM +0900, Masahiko Sawada wrote: > > On Sat, 21 Mar 2020 at 23:50, Bruce Momjian <bruce@momjian.us> wrote: > > > Actually, I think we need three files: > > > > > > * TDE WAL key file > > > * TDE block key file > > > * SQL-level file > > > > > > Primaries and standbys have to use the same TDE WAL key file, but can > > > use different TDE block key files to allow for key rotation, so having > > > separate files makes sense --- maybe they need to be in their own > > > directory. > > > > I've considered to have separate key files once but it would make > > things complex to update multiple files atomically. Postgres server > > will never start if it crashes in the middle of cluster passphrase > > rotation. Can we consider to have keys related to TDE after we > > introduce the basic key management system? Probably having keys in a > > separate file rather than in pg_control file would be better but we > > don't need these keys so far. > > Well, we need to be able to upgrade this so we have to set it up now in > a way that allows that. > > I am not sure we have ever had a case where we needed to update multiple > files atomically at the same time, without the help of WAL. > > Perhaps we should put the three keys in separate files in a directory > called 'cryptokeys', and when we change the pass phrase, we create a new > directory called 'cryptokeys.new'. Then once we have created the files > in there with the new pass phrase, we remove cryptokeys and rename > directory cryptokeys.new to cryptokeys. On boot, if cryptokeys exists > and cryptokeys.new does too, remove cryptokeys.new because we crashed > during key rotation, If cryptokeys.new exists and cryptokeys doesn't, > we rename cryptokeys.new to cryptokeys because we crashed before the > rename. That seems to work fine. So we will have pg_cryptokeys within PGDATA and each key is stored into separate file named the key id such as "sql", "tde-wal" and "tde-block". I'll update the patch and post. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Mar 24, 2020 at 02:29:57PM +0900, Masahiko Sawada wrote: > That seems to work fine. > > So we will have pg_cryptokeys within PGDATA and each key is stored > into separate file named the key id such as "sql", "tde-wal" and > "tde-block". I'll update the patch and post. Yes, that makes sense to me. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EnterpriseDB https://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
On Tue, 24 Mar 2020 at 23:15, Bruce Momjian <bruce@momjian.us> wrote: > > On Tue, Mar 24, 2020 at 02:29:57PM +0900, Masahiko Sawada wrote: > > That seems to work fine. > > > > So we will have pg_cryptokeys within PGDATA and each key is stored > > into separate file named the key id such as "sql", "tde-wal" and > > "tde-block". I'll update the patch and post. > > Yes, that makes sense to me. > I've attached the updated patch. With the patch, we have three internal keys: SQL key, TDE-block key and TDE-wal key. Only SQL key can be used so far to wrap and unwrap user secret via pg_wrap and pg_unwrap SQL functions. Each keys is saved to the single file located at pg_cryptokeys. After initdb with enabling key manager, the pg_cryptokeys directory has the following files: $ ll data/pg_cryptokeys total 12K -rw------- 1 masahiko staff 132 Mar 25 15:45 0000 -rw------- 1 masahiko staff 132 Mar 25 15:45 0001 -rw------- 1 masahiko staff 132 Mar 25 15:45 0002 I used the integer id rather than string id to make the code simple. When cluster passphrase rotation, we update all keys atomically using temporary directory as follows: 1. Derive the new passphrase 2. Wrap all internal keys with the new passphrase 3. Save all internal keys to the temp directory 4. Remove the original directory, pg_cryptokeys 5. Rename the temp directory to pg_cryptokeys In case of failure during rotation, pg_cyrptokeys and pg_cyrptokeys_tmp can be left in an incomplete state. We recover it by checking if the temporary directory exists and the wrapped keys in the temporary directory are valid. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Wed, Mar 25, 2020 at 05:51:08PM +0900, Masahiko Sawada wrote: > On Tue, 24 Mar 2020 at 23:15, Bruce Momjian <bruce@momjian.us> wrote: > > > > On Tue, Mar 24, 2020 at 02:29:57PM +0900, Masahiko Sawada wrote: > > > That seems to work fine. > > > > > > So we will have pg_cryptokeys within PGDATA and each key is stored > > > into separate file named the key id such as "sql", "tde-wal" and > > > "tde-block". I'll update the patch and post. > > > > Yes, that makes sense to me. > > > > I've attached the updated patch. With the patch, we have three > internal keys: SQL key, TDE-block key and TDE-wal key. Only SQL key > can be used so far to wrap and unwrap user secret via pg_wrap and > pg_unwrap SQL functions. Each keys is saved to the single file located > at pg_cryptokeys. After initdb with enabling key manager, the > pg_cryptokeys directory has the following files: > > $ ll data/pg_cryptokeys > total 12K > -rw------- 1 masahiko staff 132 Mar 25 15:45 0000 > -rw------- 1 masahiko staff 132 Mar 25 15:45 0001 > -rw------- 1 masahiko staff 132 Mar 25 15:45 0002 > > I used the integer id rather than string id to make the code simple. Great, thanks. I assume the final version will use file names. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EnterpriseDB https://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
On Tue, 24 Mar 2020 at 23:15, Bruce Momjian <bruce@momjian.us> wrote:
>
> On Tue, Mar 24, 2020 at 02:29:57PM +0900, Masahiko Sawada wrote:
> > That seems to work fine.
> >
> > So we will have pg_cryptokeys within PGDATA and each key is stored
> > into separate file named the key id such as "sql", "tde-wal" and
> > "tde-block". I'll update the patch and post.
>
> Yes, that makes sense to me.
>
I've attached the updated patch. With the patch, we have three
internal keys: SQL key, TDE-block key and TDE-wal key. Only SQL key
can be used so far to wrap and unwrap user secret via pg_wrap and
pg_unwrap SQL functions. Each keys is saved to the single file located
at pg_cryptokeys. After initdb with enabling key manager, the
pg_cryptokeys directory has the following files:
$ ll data/pg_cryptokeys
total 12K
-rw------- 1 masahiko staff 132 Mar 25 15:45 0000
-rw------- 1 masahiko staff 132 Mar 25 15:45 0001
-rw------- 1 masahiko staff 132 Mar 25 15:45 0002
I used the integer id rather than string id to make the code simple.
When cluster passphrase rotation, we update all keys atomically using
temporary directory as follows:
1. Derive the new passphrase
2. Wrap all internal keys with the new passphrase
3. Save all internal keys to the temp directory
4. Remove the original directory, pg_cryptokeys
5. Rename the temp directory to pg_cryptokeys
In case of failure during rotation, pg_cyrptokeys and
pg_cyrptokeys_tmp can be left in an incomplete state. We recover it by
checking if the temporary directory exists and the wrapped keys in the
temporary directory are valid.
Regards,
--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, 31 Mar 2020 at 09:36, Cary Huang <cary.huang@highgo.ca> wrote: > > Hi > I had a look on kms_v9 patch and have some comments > > --> pg_upgrade.c > keys are copied correctly, but as pg_upgrade progresses further, it will try to start the new_cluster from "issue_warnings_and_set_wal_level()"function, which is called after key copy. The new cluster will fail to start due to themismatch between cluster_passphrase_command and the newly copied keys. This causes pg_upgrade to always finish with failure.We could move "copy_master_encryption_key()" to be called after "issue_warnings_and_set_wal_level()" and this willmake pg_upgrade to finish with success, but user will still have to manually correct the "cluster_passphrase_command"param on the new cluster in order for it to start up correctly. Should pg_upgrade also take careof copying "cluster_passphrase_command" param from old to new cluster after it has copied the encryption keys so usersdon't have to do this step? If the expectation is for users to manually correct "cluster_passphrase_command" param aftersuccessful pg_upgrade and key copy, then there should be a message to remind the users to do so. I think both the old cluster and the new cluster must be initialized with the same passphrase at initdb. Specifying the different passphrase command to the new cluster at initdb and changing it after pg_upgrade doesn't make sense. Also I don't think we need to copy cluster_passphrase_command same as other GUC parameters. I've changed the patch so that pg_upgrade copies the crypto keys only if both new and old cluster enable the key management. User must specify the same passphrase command to both old and new cluster, which is not cumbersome, I think. I also added the description about this to the doc. > > -->Kmgr.c > + /* > + * If there is only temporary directory, it means that the previous > + * rotation failed after wrapping the all internal keys by the new > + * passphrase. Therefore we use the new cluster passphrase. > + */ > + if (stat(KMGR_DIR, &st) != 0) > + { > + ereport(DEBUG1, > + (errmsg("both directories %s and %s exist, use the newly wrapped keys", > + KMGR_DIR, KMGR_TMP_DIR))); > > I think the error message should say "there is only temporary directory exist" instead of "both directories exist" You're right. Fixed. I've attached the new version patch. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Tue, 31 Mar 2020 at 09:36, Cary Huang <cary.huang@highgo.ca> wrote:
>
> Hi
> I had a look on kms_v9 patch and have some comments
>
> --> pg_upgrade.c
> keys are copied correctly, but as pg_upgrade progresses further, it will try to start the new_cluster from "issue_warnings_and_set_wal_level()" function, which is called after key copy. The new cluster will fail to start due to the mismatch between cluster_passphrase_command and the newly copied keys. This causes pg_upgrade to always finish with failure. We could move "copy_master_encryption_key()" to be called after "issue_warnings_and_set_wal_level()" and this will make pg_upgrade to finish with success, but user will still have to manually correct the "cluster_passphrase_command" param on the new cluster in order for it to start up correctly. Should pg_upgrade also take care of copying "cluster_passphrase_command" param from old to new cluster after it has copied the encryption keys so users don't have to do this step? If the expectation is for users to manually correct "cluster_passphrase_command" param after successful pg_upgrade and key copy, then there should be a message to remind the users to do so.
I think both the old cluster and the new cluster must be initialized
with the same passphrase at initdb. Specifying the different
passphrase command to the new cluster at initdb and changing it after
pg_upgrade doesn't make sense. Also I don't think we need to copy
cluster_passphrase_command same as other GUC parameters.
I've changed the patch so that pg_upgrade copies the crypto keys only
if both new and old cluster enable the key management. User must
specify the same passphrase command to both old and new cluster, which
is not cumbersome, I think. I also added the description about this to
the doc.
>
> -->Kmgr.c
> + /*
> + * If there is only temporary directory, it means that the previous
> + * rotation failed after wrapping the all internal keys by the new
> + * passphrase. Therefore we use the new cluster passphrase.
> + */
> + if (stat(KMGR_DIR, &st) != 0)
> + {
> + ereport(DEBUG1,
> + (errmsg("both directories %s and %s exist, use the newly wrapped keys",
> + KMGR_DIR, KMGR_TMP_DIR)));
>
> I think the error message should say "there is only temporary directory exist" instead of "both directories exist"
You're right. Fixed.
I've attached the new version patch.
Regards,
--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
HelloThanks a lot for the patch, I think in terms of functionality, the patch provides very straightforward functionalities regarding key management. In terms of documentation, I think the patch is still lacking some pieces of information that kind of prevent people from fully understanding how KMS works and how it can be used and why, (at least that is the impression I got from the zoom meeting recordings :p). I spent some time today revisiting the key-management documentation in the patch and rephrase and restructure it based on my current understanding of latest KMS design. I mentioned all 3 application level keys that we have agreed and emphasize on explaining the SQL level encryption key because that is the key that can be used right now. Block and WAL levels keys we can add here more information once they are actually used in the TDE development.Please see below the KMS documentation that I have revised and I hope it will be more clear and easier for people to understand KMS. Feel free to make adjustments. Please note that we use the term "wrap" and "unwrap" a lot in our past discussions. Originally we used the terms within a context involving Key encryption keys (KEK). For example, "KMS wraps a master key with KEK". Later, we used the same term in a context involving encrypting user secret /password. For example, "KMS wraps a user secret with SQL key". In my opinion, both make sense but it may be confusing to people having the same term used differently. So in my revision below, the terms "wrap" and "unwrap" refer to encrypting or decrypting user secret / password as they are used in "pg_wrap() and pg_unwrap()". I use the terms "encapsulate" and "restore" when KEK is used to encrypt or decrypt a key.Chapter 32: Encryption Key Management----------------------------------------------PostgreSQL supports internal Encryption Key Management System, which is designed to manage the life cycles of cryptographic keys within the PostgreSQL system. This includes dealing with their generation, storage, usage and rotation.Encryption Key Management is enabled when PostgreSQL is build with --with-openssl and cluster passphrase command is specified during initdb. The cluster passphrase provided by --cluster-passphrase-command option during initdb and the one generated by cluster_passphrase_command in the postgresql.conf must match, otherwise, the database cluster will not start up.32.1 Key Generations and Derivations------------------------------------------When cluster_passphrase_command option is specified to the initdb, the process will derive the cluster passphrase into a Key Encryption Key (KEK) and a HMAC Key using key derivation protocol before the actual generation of application level cryptographic level keys.-Key Encryption Key (KEK)KEK is primarily used to encapsulate or restore a given application level cryptographic key-HMAC KeyHMAC key is used to compute the HASH of a given application level cryptographic key for integrity check purposesThese 2 keys are not stored physically within the PostgreSQL cluster as they are designed to be derived from the correctly configured cluster passphrase.Encryption Key Management System currently manages 3 application level cryptographic keys that have different purposes and usages within the PostgreSQL system and these are generated using pg_strong_random() after KEK and HMAC key derivation during initdb process.The 3 keys are:-SQL Level KeySQL Level Key is used to wrap and unwrap a user secret / passphrase via pg_wrap() and pg_unwrap() SQL functions. These 2 functions are designed to be used in conjunction with the cryptographic functions provided by pgcrypto extension to perform column level encryption/decryption without having to supply a clear text user secret or passphrase that is required by many pgcrypto functions as input. Please refer to [Wrap and Unwrap User Secret section] for usage examples.-Block Level KeyBlock Level Key is primarily used to encrypt / decrypt buffers as part of the Transparent Data Encryption (TDE) feature-WAL Level KeyWAL Level Key is primarily used to encrypt / decrypt WAL files as part of the Transparent Data Encryption (TDE) featureThe 3 application level keys above will be encapsulated and hashed using KEK and HMAC key mentioned above before they are physically stored to pg_cryptokeys directory within the cluster.32.1. Key Initialization-------------------------When a PostgreSQL cluster with encryption key management enabled is started, the cluster_passphrase_command parameter in postgresql.conf will be evaluated and the cluster passphrase will be derived into KEK and HMAC Key in similar ways as initdb.After that, the 3 encapsulated application level cryptographic keys will be retrieved from pg_cryptokeys directory to be restored and integrity-checked by the key management system using the derived KEK and HMAC key. If this process fails, it is likely that the cluster passphrase supplied to the cluster is not the same as that supplied to the initdb process. The cluster will refuse to start in this case and user has to manually correct the cluster passphrase.32.2. Wrap and Unwrap User Secret----------------------------------------Encryption key management system provides pg_wrap() and pg_unwrap SQL functions (listed in Table 9.97) to perform wrap and unwrap operations on user secret with the SQL level encryption key. The SQL level encryption key is one of the 3 application level keys generated during initdb process when cluster_passphrase is supplied.When pg_wrap() and pg_unwrap() functions are invoked, SQL level encryption key will internally be used to perform the encryption and decryption operation with HMAC-based integrity check. From user's point of view, he or she is not aware of the actual SQL level encryption key used internally by both wrap functionsOne possible use case is to combine pg_wrap() and pg_unwrap() with pgcrypto. User wraps the user encryption secret with pg_wrap function and passes the wrapped encryption secret to pg_unwrap function for the pgcrypto encryption functions. The wrapped secret can be stored in the application server or somewhere secured and should be obtained promptly for cryptographic operation with pgcrypto.Here is an example that shows how to encrypt and decrypt data together with wrap and unwrap functions:=# SELECT pg_wrap('my secret passward');pg_wrap--------------------------------------------------------------------------------------------------------------------------------------------------------------------\xb2c89f76f04f95d029f179e0fc3df4ed7254127b5562a9e27d42d1cd037c942dea65ce7c0750c520fa4f4e90481c9eb7e1e42a068248c262c1a6f25c6eab64303b1154ccc9a14361223641aab4a7aabe(1 row)Once wrapping the user key, user can encrypt and decrypt user data using the wrapped user key together with the key unwrap functions:=# INSERT INTO tblVALUES (pgp_sym_encrypt('secret data',pg_unwrap('\xb2c89f76f04f95d029f179e0fc3df4ed7254127b5562a9e27d42d1cd037c942dea65ce7c0750c520fa4f4e90481c9eb7e1e42a068248c262c1a6f25c6eab64303b1154ccc9a14361223641aab4a7aabe')));INSERT 1=# SELECT * FROM tbl;col--------------------------------------------------------------------------------------------------------------------------------------------------------------\xc30d04070302a199ee38bea0320b75d23c01577bb3ffb315d67eecbeca3e40e869cea65efbf0b470f805549af905f94d94c447fbfb8113f585fc86b30c0bd784b10c9857322dc00d556aa8de14(1 row)=# SELECT pgp_sym_decrypt(col,pg_unwrap('\xb2c89f76f04f95d029f179e0fc3df4ed7254127b5562a9e27d42d1cd037c942dea65ce7c0750c520fa4f4e90481c9eb7e1e42a068248c262c1a6f25c6eab64303b1154ccc9a14361223641aab4a7aabe')) as colFROM tbl;col--------------secret data(1 row)The data 'secret data' is practically encrypted by the user secret 'my secret passward' but using wrap and unwrap functions user don't need to know the actual user secret during operation.32.3. Key Rotation Process------------------------------Encryption keys in general are not interminable, the longer the same key is in use, the chance of it being breached increases. Performing key rotation on a regular basis help meet standardized security practices such as PCI-DSS and it is a good practice in security to limit the number of encrypted bytes available for a specific key version. The key lifetimse are based on key length, key strength, algorithm and total number of bytes enciphered. The key management systems provides a efficient method to perform key rotation.Please be aware that the phrase "key rotation" here only refers to the rotation of KEK and HMAC keys. The 3 application level encryption keys (SQL, Block and WAL levels) are not rotated; they will in fact be the same before and after a "key rotation." This can be justified because the actual keys are never stored anywhere physically, presented to user or captured in logging. What is being rotated here is the KEK and HMAC keys who are responsible for encapsulating and restoring the actual application level encryption keys.Since both KEK and HMAC keys are derived from a cluster passphrase, the "key rotation" ultimately refers to the rotation of cluster passphrase and deriving a new KEK and HMAC keys from the new cluster passphrase. The new set of KEK and HMAC keys can then be used to encapsulate all 3 application level encryptions keys and store the new results in pg_cryptokeys directory.To rotate the cluster passphrase, user firstly needs to update cluster_passphrase_command in the postgresql.conf and then execute pg_rotate_cluster_passphrase() SQL function to initiate the rotation.---- On Mon, 30 Mar 2020 21:30:19 -0700 Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote ----On Tue, 31 Mar 2020 at 09:36, Cary Huang <cary.huang@highgo.ca> wrote:
>
> Hi
> I had a look on kms_v9 patch and have some comments
>
> --> pg_upgrade.c
> keys are copied correctly, but as pg_upgrade progresses further, it will try to start the new_cluster from "issue_warnings_and_set_wal_level()" function, which is called after key copy. The new cluster will fail to start due to the mismatch between cluster_passphrase_command and the newly copied keys. This causes pg_upgrade to always finish with failure. We could move "copy_master_encryption_key()" to be called after "issue_warnings_and_set_wal_level()" and this will make pg_upgrade to finish with success, but user will still have to manually correct the "cluster_passphrase_command" param on the new cluster in order for it to start up correctly. Should pg_upgrade also take care of copying "cluster_passphrase_command" param from old to new cluster after it has copied the encryption keys so users don't have to do this step? If the expectation is for users to manually correct "cluster_passphrase_command" param after successful pg_upgrade and key copy, then there should be a message to remind the users to do so.
I think both the old cluster and the new cluster must be initialized
with the same passphrase at initdb. Specifying the different
passphrase command to the new cluster at initdb and changing it after
pg_upgrade doesn't make sense. Also I don't think we need to copy
cluster_passphrase_command same as other GUC parameters.
I've changed the patch so that pg_upgrade copies the crypto keys only
if both new and old cluster enable the key management. User must
specify the same passphrase command to both old and new cluster, which
is not cumbersome, I think. I also added the description about this to
the doc.
>
> -->Kmgr.c
> + /*
> + * If there is only temporary directory, it means that the previous
> + * rotation failed after wrapping the all internal keys by the new
> + * passphrase. Therefore we use the new cluster passphrase.
> + */
> + if (stat(KMGR_DIR, &st) != 0)
> + {
> + ereport(DEBUG1,
> + (errmsg("both directories %s and %s exist, use the newly wrapped keys",
> + KMGR_DIR, KMGR_TMP_DIR)));
>
> I think the error message should say "there is only temporary directory exist" instead of "both directories exist"
You're right. Fixed.
I've attached the new version patch.
Regards,
--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
URL : http://www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
EMAIL: mailto: ahsan.hadi@highgo.ca
Hi Bruce/Joe,In the last meeting we discussed the need for improving the documentation for KMS so it is easier to understand the interface. Cary from highgo had a go at doing that, please see the previous email on this thread from Cary and let us know if it looks good...?-- AhsanOn Wed, Apr 8, 2020 at 3:46 AM Cary Huang <cary.huang@highgo.ca> wrote:--Highgo Software (Canada/China/Pakistan)
URL : http://www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
EMAIL: mailto: ahsan.hadi@highgo.caHelloThanks a lot for the patch, I think in terms of functionality, the patch provides very straightforward functionalities regarding key management. In terms of documentation, I think the patch is still lacking some pieces of information that kind of prevent people from fully understanding how KMS works and how it can be used and why, (at least that is the impression I got from the zoom meeting recordings :p). I spent some time today revisiting the key-management documentation in the patch and rephrase and restructure it based on my current understanding of latest KMS design. I mentioned all 3 application level keys that we have agreed and emphasize on explaining the SQL level encryption key because that is the key that can be used right now. Block and WAL levels keys we can add here more information once they are actually used in the TDE development.Please see below the KMS documentation that I have revised and I hope it will be more clear and easier for people to understand KMS. Feel free to make adjustments. Please note that we use the term "wrap" and "unwrap" a lot in our past discussions. Originally we used the terms within a context involving Key encryption keys (KEK). For example, "KMS wraps a master key with KEK". Later, we used the same term in a context involving encrypting user secret /password. For example, "KMS wraps a user secret with SQL key". In my opinion, both make sense but it may be confusing to people having the same term used differently. So in my revision below, the terms "wrap" and "unwrap" refer to encrypting or decrypting user secret / password as they are used in "pg_wrap() and pg_unwrap()". I use the terms "encapsulate" and "restore" when KEK is used to encrypt or decrypt a key.Chapter 32: Encryption Key Management----------------------------------------------PostgreSQL supports internal Encryption Key Management System, which is designed to manage the life cycles of cryptographic keys within the PostgreSQL system. This includes dealing with their generation, storage, usage and rotation.Encryption Key Management is enabled when PostgreSQL is build with --with-openssl and cluster passphrase command is specified during initdb. The cluster passphrase provided by --cluster-passphrase-command option during initdb and the one generated by cluster_passphrase_command in the postgresql.conf must match, otherwise, the database cluster will not start up.32.1 Key Generations and Derivations------------------------------------------When cluster_passphrase_command option is specified to the initdb, the process will derive the cluster passphrase into a Key Encryption Key (KEK) and a HMAC Key using key derivation protocol before the actual generation of application level cryptographic level keys.-Key Encryption Key (KEK)KEK is primarily used to encapsulate or restore a given application level cryptographic key-HMAC KeyHMAC key is used to compute the HASH of a given application level cryptographic key for integrity check purposesThese 2 keys are not stored physically within the PostgreSQL cluster as they are designed to be derived from the correctly configured cluster passphrase.Encryption Key Management System currently manages 3 application level cryptographic keys that have different purposes and usages within the PostgreSQL system and these are generated using pg_strong_random() after KEK and HMAC key derivation during initdb process.The 3 keys are:-SQL Level KeySQL Level Key is used to wrap and unwrap a user secret / passphrase via pg_wrap() and pg_unwrap() SQL functions. These 2 functions are designed to be used in conjunction with the cryptographic functions provided by pgcrypto extension to perform column level encryption/decryption without having to supply a clear text user secret or passphrase that is required by many pgcrypto functions as input. Please refer to [Wrap and Unwrap User Secret section] for usage examples.-Block Level KeyBlock Level Key is primarily used to encrypt / decrypt buffers as part of the Transparent Data Encryption (TDE) feature-WAL Level KeyWAL Level Key is primarily used to encrypt / decrypt WAL files as part of the Transparent Data Encryption (TDE) featureThe 3 application level keys above will be encapsulated and hashed using KEK and HMAC key mentioned above before they are physically stored to pg_cryptokeys directory within the cluster.32.1. Key Initialization-------------------------When a PostgreSQL cluster with encryption key management enabled is started, the cluster_passphrase_command parameter in postgresql.conf will be evaluated and the cluster passphrase will be derived into KEK and HMAC Key in similar ways as initdb.After that, the 3 encapsulated application level cryptographic keys will be retrieved from pg_cryptokeys directory to be restored and integrity-checked by the key management system using the derived KEK and HMAC key. If this process fails, it is likely that the cluster passphrase supplied to the cluster is not the same as that supplied to the initdb process. The cluster will refuse to start in this case and user has to manually correct the cluster passphrase.32.2. Wrap and Unwrap User Secret----------------------------------------Encryption key management system provides pg_wrap() and pg_unwrap SQL functions (listed in Table 9.97) to perform wrap and unwrap operations on user secret with the SQL level encryption key. The SQL level encryption key is one of the 3 application level keys generated during initdb process when cluster_passphrase is supplied.When pg_wrap() and pg_unwrap() functions are invoked, SQL level encryption key will internally be used to perform the encryption and decryption operation with HMAC-based integrity check. From user's point of view, he or she is not aware of the actual SQL level encryption key used internally by both wrap functionsOne possible use case is to combine pg_wrap() and pg_unwrap() with pgcrypto. User wraps the user encryption secret with pg_wrap function and passes the wrapped encryption secret to pg_unwrap function for the pgcrypto encryption functions. The wrapped secret can be stored in the application server or somewhere secured and should be obtained promptly for cryptographic operation with pgcrypto.Here is an example that shows how to encrypt and decrypt data together with wrap and unwrap functions:=# SELECT pg_wrap('my secret passward');pg_wrap--------------------------------------------------------------------------------------------------------------------------------------------------------------------\xb2c89f76f04f95d029f179e0fc3df4ed7254127b5562a9e27d42d1cd037c942dea65ce7c0750c520fa4f4e90481c9eb7e1e42a068248c262c1a6f25c6eab64303b1154ccc9a14361223641aab4a7aabe(1 row)Once wrapping the user key, user can encrypt and decrypt user data using the wrapped user key together with the key unwrap functions:=# INSERT INTO tblVALUES (pgp_sym_encrypt('secret data',pg_unwrap('\xb2c89f76f04f95d029f179e0fc3df4ed7254127b5562a9e27d42d1cd037c942dea65ce7c0750c520fa4f4e90481c9eb7e1e42a068248c262c1a6f25c6eab64303b1154ccc9a14361223641aab4a7aabe')));INSERT 1=# SELECT * FROM tbl;col--------------------------------------------------------------------------------------------------------------------------------------------------------------\xc30d04070302a199ee38bea0320b75d23c01577bb3ffb315d67eecbeca3e40e869cea65efbf0b470f805549af905f94d94c447fbfb8113f585fc86b30c0bd784b10c9857322dc00d556aa8de14(1 row)=# SELECT pgp_sym_decrypt(col,pg_unwrap('\xb2c89f76f04f95d029f179e0fc3df4ed7254127b5562a9e27d42d1cd037c942dea65ce7c0750c520fa4f4e90481c9eb7e1e42a068248c262c1a6f25c6eab64303b1154ccc9a14361223641aab4a7aabe')) as colFROM tbl;col--------------secret data(1 row)The data 'secret data' is practically encrypted by the user secret 'my secret passward' but using wrap and unwrap functions user don't need to know the actual user secret during operation.32.3. Key Rotation Process------------------------------Encryption keys in general are not interminable, the longer the same key is in use, the chance of it being breached increases. Performing key rotation on a regular basis help meet standardized security practices such as PCI-DSS and it is a good practice in security to limit the number of encrypted bytes available for a specific key version. The key lifetimse are based on key length, key strength, algorithm and total number of bytes enciphered. The key management systems provides a efficient method to perform key rotation.Please be aware that the phrase "key rotation" here only refers to the rotation of KEK and HMAC keys. The 3 application level encryption keys (SQL, Block and WAL levels) are not rotated; they will in fact be the same before and after a "key rotation." This can be justified because the actual keys are never stored anywhere physically, presented to user or captured in logging. What is being rotated here is the KEK and HMAC keys who are responsible for encapsulating and restoring the actual application level encryption keys.Since both KEK and HMAC keys are derived from a cluster passphrase, the "key rotation" ultimately refers to the rotation of cluster passphrase and deriving a new KEK and HMAC keys from the new cluster passphrase. The new set of KEK and HMAC keys can then be used to encapsulate all 3 application level encryptions keys and store the new results in pg_cryptokeys directory.To rotate the cluster passphrase, user firstly needs to update cluster_passphrase_command in the postgresql.conf and then execute pg_rotate_cluster_passphrase() SQL function to initiate the rotation.---- On Mon, 30 Mar 2020 21:30:19 -0700 Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote ----On Tue, 31 Mar 2020 at 09:36, Cary Huang <cary.huang@highgo.ca> wrote:
>
> Hi
> I had a look on kms_v9 patch and have some comments
>
> --> pg_upgrade.c
> keys are copied correctly, but as pg_upgrade progresses further, it will try to start the new_cluster from "issue_warnings_and_set_wal_level()" function, which is called after key copy. The new cluster will fail to start due to the mismatch between cluster_passphrase_command and the newly copied keys. This causes pg_upgrade to always finish with failure. We could move "copy_master_encryption_key()" to be called after "issue_warnings_and_set_wal_level()" and this will make pg_upgrade to finish with success, but user will still have to manually correct the "cluster_passphrase_command" param on the new cluster in order for it to start up correctly. Should pg_upgrade also take care of copying "cluster_passphrase_command" param from old to new cluster after it has copied the encryption keys so users don't have to do this step? If the expectation is for users to manually correct "cluster_passphrase_command" param after successful pg_upgrade and key copy, then there should be a message to remind the users to do so.
I think both the old cluster and the new cluster must be initialized
with the same passphrase at initdb. Specifying the different
passphrase command to the new cluster at initdb and changing it after
pg_upgrade doesn't make sense. Also I don't think we need to copy
cluster_passphrase_command same as other GUC parameters.
I've changed the patch so that pg_upgrade copies the crypto keys only
if both new and old cluster enable the key management. User must
specify the same passphrase command to both old and new cluster, which
is not cumbersome, I think. I also added the description about this to
the doc.
>
> -->Kmgr.c
> + /*
> + * If there is only temporary directory, it means that the previous
> + * rotation failed after wrapping the all internal keys by the new
> + * passphrase. Therefore we use the new cluster passphrase.
> + */
> + if (stat(KMGR_DIR, &st) != 0)
> + {
> + ereport(DEBUG1,
> + (errmsg("both directories %s and %s exist, use the newly wrapped keys",
> + KMGR_DIR, KMGR_TMP_DIR)));
>
> I think the error message should say "there is only temporary directory exist" instead of "both directories exist"
You're right. Fixed.
I've attached the new version patch.
Regards,
--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Sat, 2 May 2020 at 07:17, Cary Huang <cary.huang@highgo.ca> wrote: > > Hi all > > I am sharing here a document patch based on top of kms_v10 that was shared awhile back. This document patch aims to covermore design details of the current KMS design and to help people understand KMS better. Please let me know if you haveany more comments. Thank you for your patch! I've changed the internal key management patch much. Here is the summary of the changes: I've changed the key manager so that it can manage multiple cryptographic keys up to 128 bytes long. Currently, all keys managed by the key manager need to be pre-defined, and the key manager has only one cryptographic key, SQL key which is used to encrypt/decrypt data via SQL function interface. But it's easy to add new keys for potential use cases, for example when we need some keys for transparent data encryption. When the server starting up, the key manager unwraps the internal key and load to the shared memory. Perhaps we need to protect the load key memory space from being swapped out using mlock() but it's not implemented yet. For SQL interface, I've changed the patch much. The encryption process we called 'wrap' and 'unwrap' is actually authenticated encryption with associated data[1] (AEAD) which is not a dedicated way to wrap cryptographic keys. I renamed pg_wrap() and pg_unwrap() to pg_encrypt() and pg_decrypt() to make these function names more understandable. These SQL functions encrypt/decrypt data using the SQL key. So currently, there are two usages of pg_encrypt () and pg_decrypt() functions to encrypt database data: First, we can encrypt data directly using these SQL functions. That way, users don't need to manage and know the encryption key, moreover, we enable users to use AEAD without pgcrypto. Second, by wrapping the user secret key using these SQL functions we can use them in conjunction with the cryptographic functions provided by pgcrypto. Users can wrap their secret key by SQL key via pg_encrypt(), and then use the user secret unwrapped by pg_decrypt() when SELECT, INSERT, UPDATE, and DELETE operation. Here is an example: -- Wrap user secret and save to 'key' variable, or somewhere =# SELECT pg_encrypt('user password') as key \gset -- Encrypt/decrypt user data with the secret string 'user password' which is obtained by unwrapping 'key' variable. =# INSERT INTO tbl VALUES (pgp_sym_encrypt('abc123', pg_decrypt(:'key'))); =# SELECT pgp_sym_decrypt(col, pg_decrypt(:'key')) FROM tbl; However, this usage has a downside that user secret can be logged to server logs when log_statement = 'all' or an error happens. To deal with this issue I've created a PoC patch on top of the key manager patch which adds a libpq function PQencrypt() to encrypt data and new psql meta-command named \encrypt in order to encrypt data while eliminating the possibility of the user data being logged. PQencrypt() just calls pg_encrypt() via PQfn(). Using this command the above example can become as follows: -- Encrypt user secret via PQfn and store it to 'key variable, or somewhere =# \encrypt Enter data: Enter it again: encrypted data: \x8e17079ed65f570f5adcac9023cb5d079708f34563e62f3f9f1f0f26c7ad4ecf7b90dc199d7b3bbf663c8800d98162d02dc30da247ca4c825f3240c4a7c419a7c8785d9f7f974d0ed310f179ecbbab1ecf38ec48d74d41dd13544595d45d5ec9 =# \set key '\x8e17079ed65f570f5adcac9023cb5d079708f34563e62f3f9f1f0f26c7ad4ecf7b90dc199d7b3bbf663c8800d98162d02dc30da247ca4c825f3240c4a7c419a7c8785d9f7f974d0ed310f179ecbbab1ecf38ec48d74d41dd13544595d45d5ec9 -- Encrypt/decrypt user data with the secret string 'user password' which is obtained by unwrapping 'key' variable. =# INSERT INTO tbl VALUES (pgp_sym_encrypt('abc123', pg_decrypt(:'key'))); =# SELECT pgp_sym_decrypt(col, pg_decrypt(:'key')) FROM tbl; BTW after some research, I've found Always Encrypted which is a database encryption feature provided by SQL Server uses a quite similar approach called AEAD_AES_256_CBC_HMAC_SHA_256[2]. AEAD_AES_256_CBC_HMAC_SHA_256 is actually derived from from the specification draft[3]. For documentation, I've incorporated the proposed update by Cary and add some descriptions, especially for AEAD. I've separate the patch into several pieces so it can be reviewed easily. Here is a short description of each patch: 0001 patch introduces AES256-CBC and HMAC-SHA512 to src/common. These functions are enabled only when --with-openssl environment. 0002 patch introduces AEAD algorithm to src/common. 0003 patch introduces the key management module that is split into two parts: utility code and backend code. The key manager reads/writes cryptographic keys, verifies the given passphrase, wraps and unwraps keys using AEAD. Currently the key manager has only one internal key, SQL key. 0004 patch added two SQL functions: pg_encrypt() and pg_decrypt() 0005 and 0006 patch introduce regression tests and documentation respectively. 0007 patch is a PoC patch that adds PQencrypt() function and psql's \encrypt meta-command to encrypt data so that target data are not logged. Regards, [1] https://en.wikipedia.org/wiki/Authenticated_encryption#Authenticated_encryption_with_associated_data_(AEAD) [2] https://docs.microsoft.com/ja-jp/sql/relational-databases/security/encryption/always-encrypted-cryptography?view=sql-server-ver15 [3] https://tools.ietf.org/html/draft-mcgrew-aead-aes-cbc-hmac-sha2-05. -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
- v11-0007-POC-libpq-encryption-protocol.patch
- v11-0003-Add-key-management-module.patch
- v11-0006-Documentation-update.patch
- v11-0004-Add-encrypt-and-decrypt-SQL-functions.patch
- v11-0005-Add-regression-tests-for-key-management.patch
- v11-0002-Add-AEAD-functions-for-both-frontend-and-backend.patch
- v11-0001-Add-encryption-functions-for-both-frontend-and-b.patch
On Fri, May 29, 2020 at 1:50 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > However, this usage has a downside that user secret can be logged to > server logs when log_statement = 'all' or an error happens. To deal > with this issue I've created a PoC patch on top of the key manager > patch which adds a libpq function PQencrypt() to encrypt data and new > psql meta-command named \encrypt in order to encrypt data while > eliminating the possibility of the user data being logged. > PQencrypt() just calls pg_encrypt() via PQfn(). Using this command the > above example can become as follows: If PQfn() calls aren't currently logged, that's probably more of an oversight due to the feature being almost dead than something upon which we want to rely. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hello Masahiko-san, >> I am sharing here a document patch based on top of kms_v10 that was >> shared awhile back. This document patch aims to cover more design >> details of the current KMS design and to help people understand KMS >> better. Please let me know if you have any more comments. A few questions and comments, mostly about the design. If I'm off topic, or these concerns have been clearly addressed in the thread, please accept my apology. A lot of what I write is based on guessing from a look at the doc & code provided in the patch. The patch should provide some explanatory README about the overall design. It is a lot of code, which for me should not be there, inside the backend. Could this whole thing be an extension? I cannot see why not. If it could, then ISTM that it should. If not, what set of features is needed to allow that as an extension? How could pg be improved so that it could be an extension? Also, I'm not at fully at ease with some of the underlying principles behind this proposal. Are we re-inventing/re-implementing kerberos or whatever? Are we re-implementing a brand new KMS inside pg? Why having our own? I think that key management should *not* belong to pg itself, but to some external facility/process with which pg would interact, so that no master key would ever be inside pg process, and possibly not on the same host, if it was me doing it. If some extension could provide it inside the process and stores thing inside some pg_cryptokeys directory, then fine if it fits the threat model being addressed, but the paranoïd user wanting that should have other options which could be summarized as "outside". Another benefit of "outside" is that if there is a security issue attached to the kms, then it would not be a pg security issue, and it would not affect normal pg users which do not use the feature. Also, implementing a crash-safe key rotation algorithm does not look like inside pg backend, that is not its job. Likewise, the AEAD AES-CBC HMAC-SHA512 does definitely not belong to postgres core backend implementation. Why should I use the OpenSSL library and not some other facility? Basically, I'm -1 on having such a feature right inside pg, and +1 on allowing pg to have it outside and interact with it appropriately, preferably through an extension which could be in core. So my take is that pg should allow an extension to: - provide a *generic* way to interact with an *external* kms eg by running a command (possibly setuid something) and interacting with its stdin/stderr what the command does should be of no concern to pg and use some trivial text protocol, and the existing code can be wrapped as an example working implementation. - store some local keys somewhere and provide functions to use these keys to encrypt/decrypt stuff, obviously, as generic as possible. ISTM that what crypto algorithms are actually used should not be hardcoded, but I'm not sure how to achieve that. Maybe simply by redefining the relevant function, maybe at the SQL level. There is an open question on how the "command" validates that it is indeed the right pg which is interacting with it. This means some authentication, probably some passphrase to provide somehow, probably close to what is being implemented, so from an interface point of view, it could look quite the same, but the key point is that the whole thing would be out of postgres process, only encryption keys being used would be in postgres, and probably only in the process which actually needs it. Random comments about details I saw in passing: * key_management_enabled key_management (on|off) ? * initdb -D dbname --cluster-passphrase-command="cat /path/to/passphrase-file" Putting example in the documentation looks like a recommendation. It would put a caveat that doing the above is probably a bad idea. -- Fabien.
On Sun, 31 May 2020 at 17:13, Fabien COELHO <coelho@cri.ensmp.fr> wrote: > > > Hello Masahiko-san, > > >> I am sharing here a document patch based on top of kms_v10 that was > >> shared awhile back. This document patch aims to cover more design > >> details of the current KMS design and to help people understand KMS > >> better. Please let me know if you have any more comments. > > A few questions and comments, mostly about the design. If I'm off topic, > or these concerns have been clearly addressed in the thread, please accept > my apology. Thank you for your comments! Please correct me if I'm misunderstanding your questions and comments. > > A lot of what I write is based on guessing from a look at the doc & code > provided in the patch. The patch should provide some explanatory README > about the overall design. Agreed. > > It is a lot of code, which for me should not be there, inside the backend. > Could this whole thing be an extension? I cannot see why not. If it could, > then ISTM that it should. If not, what set of features is needed to allow > that as an extension? How could pg be improved so that it could be an > extension? Let me explain some information about TDE behind this key manager patch. This key manager is aimed to manage cryptographic keys used for transparent data encryption. As a result of the discussion, we concluded it's safer to use multiple keys to encrypt database data rather than using one key to encrypt the whole thing, for example, in order to make sure different data is not encrypted with the same key and IV. Therefore, in terms of TDE, the minimum requirement is that PostgreSQL can use multiple keys. Using multiple keys in PG, there are roughly two possible designs: 1. Store all keys for TDE into the external KMS, and PG gets them from it as needed. 2. PG manages all keys for TDE inside and protect these keys on disk by the key (i.g. KEK) stored in the external KMS. There are pros and cons to each design. If I take one cons of #1 as an example, the operation between PG and the external KMS could be complex. The operations could be creating, removing and rotate key and so on. We can implement these operations in an extension to interact with different kinds of external KMS, and perhaps we can use KMIP. But the development cost could become high because we might need different extensions for each key management solutions/services. #2 is better at that point; the interaction between PG and KMS is only GET. Other databases employes a similar approach are SQL Server and DB2. In terms of the necessity of introducing the key manager into PG core, I think at least TDE needs to be implemented in PG core. And as this key manager is for managing keys for TDE, I think the key manager also needs to be introduced into the core so that TDE functionality doesn't depend on external modules. > > Also, I'm not at fully at ease with some of the underlying principles > behind this proposal. Are we re-inventing/re-implementing kerberos or > whatever? Are we re-implementing a brand new KMS inside pg? Why having > our own? As I explained above, this key manager is for managing internal keys used by TDE. It's not an alternative to existing key management solutions/services. The requirements of this key manager are generating internal keys, letting other PG components use them, protecting them by KEK when persisting, and support KEK rotation. It doesn’t have a feature like allowing users to store arbitrary keys into this key manager, like other key management solutions/services have. > > I think that key management should *not* belong to pg itself, but to some > external facility/process with which pg would interact, so that no master > key would ever be inside pg process, and possibly not on the same host, if > it was me doing it. > > If some extension could provide it inside the process and stores thing > inside some pg_cryptokeys directory, then fine if it fits the threat model > being addressed, but the paranoïd user wanting that should have other > options which could be summarized as "outside". > > Another benefit of "outside" is that if there is a security issue attached > to the kms, then it would not be a pg security issue, and it would not > affect normal pg users which do not use the feature. I agree that the key used to encrypt data must not be placed in the same host. But it's true only when the key is not protected, right? In this key manager, since we protect all internal keys by KEK it's no problem unless KEK is leaked. KEK can be obtained from outside key management solutions/services through cluster_passphrase_command. > > Also, implementing a crash-safe key rotation algorithm does not look like > inside pg backend, that is not its job. The key rotation this key manager has is KEK rotation, which is very important. Without KEK rotation, when KEK is leaked an attacker can get database data by disk theft. Since KEK is responsible for encrypting all internal keys it's necessary to re-encrypt the internal keys when KEK is rotated. I think PG is the only role that can do that job. In terms of rotation of internal keys, an idea proposed during the discussion is that change the internal keys when pg_basebackup. The sender transfers database data after decryption and the receiver encrypts the received data with the different internal keys from what the sender has. > Likewise, the AEAD AES-CBC > HMAC-SHA512 does definitely not belong to postgres core backend > implementation. Why should I use the OpenSSL library and not some other > facility? The purpose of AEAD is to do both things: encrypting internal keys and integrity checks of these keys. We cannot do integrity checks using only AES. Another option could be to use AES key wrapping[1]. > > Basically, I'm -1 on having such a feature right inside pg, and +1 on > allowing pg to have it outside and interact with it appropriately, > preferably through an extension which could be in core. > > So my take is that pg should allow an extension to: > > - provide a *generic* way to interact with an *external* kms > eg by running a command (possibly setuid something) and interacting > with its stdin/stderr what the command does should be of no concern > to pg and use some trivial text protocol, and the existing code > can be wrapped as an example working implementation. > > - store some local keys somewhere and provide functions to use these > keys to encrypt/decrypt stuff, obviously, as generic as possible. > > ISTM that what crypto algorithms are actually used should not be > hardcoded, but I'm not sure how to achieve that. Maybe simply by > redefining the relevant function, maybe at the SQL level. > I think this key manager satisfies the fist point by cluster_passphrase_command. For the second point, the key manager stores local keys inside PG while protecting them by KEK managed outside of PG. Inspired by SQL Server's Always Encrypted I implemented pg_encrypt() and pg_decrypt() but these are actually not necessary in terms of TDE. We can introduce the key manager with empty internal keys and then introduce TDE with adding necessary keys. I agree with the point that crypto algorithms should not be hardcoded. > There is an open question on how the "command" validates that it is indeed > the right pg which is interacting with it. This means some authentication, > probably some passphrase to provide somehow, probably close to what is > being implemented, so from an interface point of view, it could look quite > the same, but the key point is that the whole thing would be out of > postgres process, only encryption keys being used would be in postgres, > and probably only in the process which actually needs it. > I might be missing your point but is the question of how to verify the passphrase given by cluseter_passphrase_command is correct? > Random comments about details I saw in passing: > > * key_management_enabled > > key_management (on|off) ? > > * initdb -D dbname --cluster-passphrase-command="cat /path/to/passphrase-file" > > Putting example in the documentation looks like a recommendation. It would > put a caveat that doing the above is probably a bad idea. Agreed on the above two points. Regards, [1] https://tools.ietf.org/html/rfc3394 -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sat, 30 May 2020 at 04:20, Robert Haas <robertmhaas@gmail.com> wrote: > > On Fri, May 29, 2020 at 1:50 AM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > However, this usage has a downside that user secret can be logged to > > server logs when log_statement = 'all' or an error happens. To deal > > with this issue I've created a PoC patch on top of the key manager > > patch which adds a libpq function PQencrypt() to encrypt data and new > > psql meta-command named \encrypt in order to encrypt data while > > eliminating the possibility of the user data being logged. > > PQencrypt() just calls pg_encrypt() via PQfn(). Using this command the > > above example can become as follows: > > If PQfn() calls aren't currently logged, that's probably more of an > oversight due to the feature being almost dead than something upon > which we want to rely. Agreed. The patch includes pg_encrypt() and pg_decrypt() SQL functions inspired by Always Encryption but these functions are interfaces of the key manager to make it work independently from TDE and are actually not necessary in terms of TDE. Perhaps it's better to consider whether it's worth having them after introducing TDE. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi
I took a step back today and started to think about the purpose of internal KMS and what it is supposed to do, and how it compares to external KMS. Both are intended to manage the life cycle of encryptions keys including their generation, protection, storage and rotation. External KMS, on the other hand, is a more centralized but expensive way to manage encryption life cycles and many deployment actually starts with internal KMS and later migrate to external one.
Anyhow, the design and implementation of internal KMS should circle around these stages of key lifecycle.
I am saying this, because I want to make sure we can all agree on the scope of internal KMS. Without clear scope, this KMS development will seem to go on forever.
In this patch, the internal KMS exposes pg_encrypt() and pg_decrypt() (was pg_wrap() and pg_unwrap() before) to the user to turn a clear text password into some sort of key material based on the SQL level key generated at initdb. This is used so the user does not have to provide clear text password to pgp_sym_encrypt() provided by pgcrypto extension. The intention is good, I understand, but I don't think it is within the scope of KMS and it is definitely not within the scope of TDE either.
Even if the password can be passed into pgp_sym_encrypt() securely by using pg_decrypt() function, the pgp_sym_encrypt() still will have to take this password and derive into an encryption key using algorithms that internal KMS does not manage currently. This kind of defeats the purpose of internal KMS. So simply using pg_encrypt() and pg_decrypt() is not really a solution to pgcrypto's limitation. This should be in another topic/project that is aimed to improve pgcrypto by integrating it with the internal KMS, similar to TDE where it also has to integrate with the internal KMS later.
So for internal KMS, the only cryptographic functions needed for now is kmgr_wrap_key() and kmgr_unwrap_key() to encapsulate and restore the encryptions keys to satisfy the "key protection" life cycle stage. I don't think pg_encrypt() and pg_decrypt() should be part of internal KMS.
Anyways, I have also reviewed the patch and have a few comments below:
(1)
The ciphering algorithm in my opinion should contain the algorithm name, key length and block cipher mode, which is similar to openssl's definition.
Instead of defining a cipher as PG_CIPHER_AES_CBC, and have key length as separate parameter, I would define them as
#define PG_CIPHER_AES128_CBC 0
#define PG_CIPHER_AES256_CBC 1
#define PG_CIPHER_AES128_CTR 2
#define PG_CIPHER_AES256_CTR 3
I know PG_CIPHER_AES128_CTR and PG_CIPHER_AES256_CTR are not being used now as these are for the TDE in future, but might as well list them here because this KMS is made to work specifically for TDE as I understand.
-----------------------------------------------------------------------------------------------------------
/*
* Supported symmetric encryption algorithm. These identifiers are passed
* to pg_cipher_ctx_create() function, and then actual encryption
* implementations need to initialize their context of the given encryption
* algorithm.
*/
#define PG_CIPHER_AES_CBC 0
#define PG_MAX_CIPHER_ID 1
-----------------------------------------------------------------------------------------------------------
(2)
If the cipher algorithms are defined like (1), then there is no need to pass key length as argument to ossl_cipher_ctx_create() function because it already knows the key length based on the cipher definition. Less argument the better.
-----------------------------------------------------------------------------------------------------------
PgCipherCtx *
pg_cipher_ctx_create(int cipher, uint8 *key, int klen)
{
PgCipherCtx *ctx = NULL;
if (cipher >= PG_MAX_CIPHER_ID)
return NULL;
#ifdef USE_OPENSSL
ctx = (PgCipherCtx *) palloc0(sizeof(PgCipherCtx));
ctx->encctx = ossl_cipher_ctx_create(cipher, key, klen, true);
ctx->decctx = ossl_cipher_ctx_create(cipher, key, klen, false);
#endif
return ctx;
}
-----------------------------------------------------------------------------------------------------------
On Sat, 30 May 2020 at 04:20, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Fri, May 29, 2020 at 1:50 AM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> > However, this usage has a downside that user secret can be logged to
> > server logs when log_statement = 'all' or an error happens. To deal
> > with this issue I've created a PoC patch on top of the key manager
> > patch which adds a libpq function PQencrypt() to encrypt data and new
> > psql meta-command named \encrypt in order to encrypt data while
> > eliminating the possibility of the user data being logged.
> > PQencrypt() just calls pg_encrypt() via PQfn(). Using this command the
> > above example can become as follows:
>
> If PQfn() calls aren't currently logged, that's probably more of an
> oversight due to the feature being almost dead than something upon
> which we want to rely.
Agreed.
The patch includes pg_encrypt() and pg_decrypt() SQL functions
inspired by Always Encryption but these functions are interfaces of
the key manager to make it work independently from TDE and are
actually not necessary in terms of TDE. Perhaps it's better to
consider whether it's worth having them after introducing TDE.
Regards,
--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, 3 Jun 2020 at 08:30, Cary Huang <cary.huang@highgo.ca> wrote: > > Hi > > > > I took a step back today and started to think about the purpose of internal KMS and what it is supposed to do, and howit compares to external KMS. Both are intended to manage the life cycle of encryptions keys including their generation,protection, storage and rotation. External KMS, on the other hand, is a more centralized but expensive way tomanage encryption life cycles and many deployment actually starts with internal KMS and later migrate to external one. > > > > Anyhow, the design and implementation of internal KMS should circle around these stages of key lifecycle. > > > > 1. Key Generation - Yes, internal KMS module generates keys using pseudo random function, but only the keys for TDE and SQL level keys. Users cannot request new key generation > 2. Key Protection - Yes, internal KMS wrap all keys with KEK and HMAC hash derived from a cluster passphrase > 3. Key Storage - Yes, the wrapped keys are stored in the cluster > 4. Key Rotation - Yes, internal KMS has a SQL method to swap out cluster passphrase, which rotates the KEK and HMAC key > > > > I am saying this, because I want to make sure we can all agree on the scope of internal KMS. Without clear scope, thisKMS development will seem to go on forever. > > Yes, the internal KMS is not an alternative to external KMS such as AWS KMS, SafeNet Key Secure, and Vault, but a PostgreSQL internal component that can work with these external solutions (via cluster_passphrase_command). It's the same position as our other management components such as bufmgr, smgr, and lmgr. I agree with this scope. It manages only encryption keys used by PostgreSQL. > > In this patch, the internal KMS exposes pg_encrypt() and pg_decrypt() (was pg_wrap() and pg_unwrap() before) to the userto turn a clear text password into some sort of key material based on the SQL level key generated at initdb. This isused so the user does not have to provide clear text password to pgp_sym_encrypt() provided by pgcrypto extension. Theintention is good, I understand, but I don't think it is within the scope of KMS and it is definitely not within the scopeof TDE either. > I agree that neither pg_encrypt() nor pg_decrypt() is within the scope of KMS and TDE. That's why I've split the patch, and that's why I renamed to pg_encrypt() and pg_decrypt() to clarify the purpose of these functions is not key management. Key wrapping and unwrapping is one of the usages of these functions. I think we can use the internal KMS for several purposes. It can manage encryption keys not only for cluster-wide TDE but also, for example, for column-level TDE and encryption SQL functions. pg_encrypt() and pg_decrypt() are one example of the usage of the internal KMS. Originally since we thought KMS and TDE are not introduced at the same release, the idea is come up with so that users can use KMS functionality with some interface. Therefore these SQL functions are not within the scope of KMS and it should be fine with introducing the internal KMS having 0 keys. > Even if the password can be passed into pgp_sym_encrypt() securely by using pg_decrypt() function, the pgp_sym_encrypt()still will have to take this password and derive into an encryption key using algorithms that internal KMSdoes not manage currently. This kind of defeats the purpose of internal KMS. So simply using pg_encrypt() and pg_decrypt()is not really a solution to pgcrypto's limitation. Yeah, when using pgcrypto, user must manage their encryption keys. The internal KMS doesn't help that because it manages only keys internally used. What pg_encrypt() and pg_decrypt() can help is only to hide the password from server logs. > This should be in another topic/project that is aimed to improve pgcrypto by integrating it with the internal KMS, similarto TDE where it also has to integrate with the internal KMS later. > Agreed. > So for internal KMS, the only cryptographic functions needed for now is kmgr_wrap_key() and kmgr_unwrap_key() to encapsulateand restore the encryptions keys to satisfy the "key protection" life cycle stage. I don't think pg_encrypt()and pg_decrypt() should be part of internal KMS. > Agreed. > > Anyways, I have also reviewed the patch and have a few comments below: > > > > (1) > > The ciphering algorithm in my opinion should contain the algorithm name, key length and block cipher mode, which is similarto openssl's definition. > > > > Instead of defining a cipher as PG_CIPHER_AES_CBC, and have key length as separate parameter, I would define them as > > #define PG_CIPHER_AES128_CBC 0 > > #define PG_CIPHER_AES256_CBC 1 > > #define PG_CIPHER_AES128_CTR 2 > > #define PG_CIPHER_AES256_CTR 3 > Agreed. I was concerned that we will end up having many IDs in the future for example when porting pgcrypto functions into core but I'm okay with that change. > > > I know PG_CIPHER_AES128_CTR and PG_CIPHER_AES256_CTR are not being used now as these are for the TDE in future, but mightas well list them here because this KMS is made to work specifically for TDE as I understand. > > ----------------------------------------------------------------------------------------------------------- > > /* > > * Supported symmetric encryption algorithm. These identifiers are passed > > * to pg_cipher_ctx_create() function, and then actual encryption > > * implementations need to initialize their context of the given encryption > > * algorithm. > > */ > > #define PG_CIPHER_AES_CBC 0 > > #define PG_MAX_CIPHER_ID 1 > > ----------------------------------------------------------------------------------------------------------- > > > > (2) > > If the cipher algorithms are defined like (1), then there is no need to pass key length as argument to ossl_cipher_ctx_create()function because it already knows the key length based on the cipher definition. Less argument thebetter. Agreed. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hello Masahiko-san, > This key manager is aimed to manage cryptographic keys used for > transparent data encryption. As a result of the discussion, we > concluded it's safer to use multiple keys to encrypt database data > rather than using one key to encrypt the whole thing, for example, in > order to make sure different data is not encrypted with the same key > and IV. Therefore, in terms of TDE, the minimum requirement is that > PostgreSQL can use multiple keys. > > Using multiple keys in PG, there are roughly two possible designs: > > 1. Store all keys for TDE into the external KMS, and PG gets them from > it as needed. +1 > 2. PG manages all keys for TDE inside and protect these keys on disk > by the key (i.g. KEK) stored in the external KMS. -1, this is the one where you would need arguing. > There are pros and cons to each design. If I take one cons of #1 as an > example, the operation between PG and the external KMS could be > complex. The operations could be creating, removing and rotate key and > so on. ISTM that only create (delete?) are really needed. Rotating is the problem of the KMS itself, thus does not need to be managed by pg under #1. > We can implement these operations in an extension to interact > with different kinds of external KMS, and perhaps we can use KMIP. I would even put that (KMIP protocol stuff) outside pg core. Even under #2, if some KMS is implemented and managed by pg, I would put the stuff in a separate process which I would probably run with a different uid, so that the KEK is not accessible directly by pg, ever. Once KMS interactions are managed with an outside process, then what this process does becomes an interface, and whether this process actually manages the keys or discuss with some external KMS with some KMIP or whatever is irrelevant to pg. Providing an interface means that anyone could implement their KMS fitting their requirements if they comply with the interface/protocol. Note that I'd be fine with having the current implementation somehow wrapped up as an example KMS. > But the development cost could become high because we might need > different extensions for each key management solutions/services. Yes and no. What I suggest is, I think, pretty simple, and I think I can implement it in a few line of script, so the cost is not high, and having a separate process looks, to me, like a security win and an extensibility win (i.e. another implementation can be provided). > #2 is better at that point; the interaction between PG and KMS is only > GET. I think that it could be the same with #1. I think that having a separate process is a reasonable security requirement, and if you do that #1 and #2 are more or less the same. > Other databases employes a similar approach are SQL Server and DB2. Too bad for them:-) I'd still disagree with having the master key inside the database process, even if Microsoft, IBM and Oracle think it is a good idea. > In terms of the necessity of introducing the key manager into PG core, > I think at least TDE needs to be implemented in PG core. And as this > key manager is for managing keys for TDE, I think the key manager also > needs to be introduced into the core so that TDE functionality doesn't > depend on external modules. Hmmm. My point is that only interactions should be in core. The implementation could be in core, but as a separate process. I agree that pg needs to be able to manage the DEK, so it needs to store data keys. I still do not understand why an extension, possibly distributed with pg, would not be ok. There may be good arguments for that, but I do not think you provided any yet. >> Also, I'm not at fully at ease with some of the underlying principles >> behind this proposal. Are we re-inventing/re-implementing kerberos or >> whatever? Are we re-implementing a brand new KMS inside pg? Why having >> our own? > > As I explained above, this key manager is for managing internal keys > used by TDE. It's not an alternative to existing key management > solutions/services. Hmmm. This seels to suggest that interacting with something outside should be an option. > The requirements of this key manager are generating internal keys, > letting other PG components use them, protecting them by KEK when > persisting, If you want that, I'd still argue that you should have a separate process. > and support KEK rotation. It doesn’t have a feature like > allowing users to store arbitrary keys into this key manager, like > other key management solutions/services have. Hmmm. > I agree that the key used to encrypt data must not be placed in the > same host. But it's true only when the key is not protected, right? The DEK is needed when encrypting and decrypting, obviously, so it would be there once obtained, it cannot be helped. My concern is about the KEK, which AFAICS in your code is somewhere in memory accessible by the postgres process, which is a no go for me. The definition of "protected" is fuzzy, it would depend on what the user requires. Maybe protected for someone is "in a file which is only readable by postgres", and for someone else it means "inside an external hardware components activated by the fingerprint of the CEO". > In > this key manager, since we protect all internal keys by KEK it's no > problem unless KEK is leaked. KEK can be obtained from outside key > management solutions/services through cluster_passphrase_command. Again, I do not think that the KEK should be in postgres process, ever. >> >> Also, implementing a crash-safe key rotation algorithm does not look like >> inside pg backend, that is not its job. > > The key rotation this key manager has is KEK rotation, which is very > important. Without KEK rotation, when KEK is leaked an attacker can > get database data by disk theft. Since KEK is responsible for > encrypting all internal keys it's necessary to re-encrypt the internal > keys when KEK is rotated. I think PG is the only role that can do that > job. I'm not claiming that KEK rotation is a bad thing, I'm saying that it should not be postgres problem. My issue is where you put the thing, not about the thing itself. > I think this key manager satisfies the fist point by > cluster_passphrase_command. For the second point, the key manager > stores local keys inside PG while protecting them by KEK managed > outside of PG. I do not understand. From what I understood from the code, the KEK is loaded into postgres process. That is what I'm disagreeing with, only needed DEK should be there. [...] -- Fabien.
On Wed, 3 Jun 2020 at 16:16, Fabien COELHO <coelho@cri.ensmp.fr> wrote: > > > Hello Masahiko-san, > > > This key manager is aimed to manage cryptographic keys used for > > transparent data encryption. As a result of the discussion, we > > concluded it's safer to use multiple keys to encrypt database data > > rather than using one key to encrypt the whole thing, for example, in > > order to make sure different data is not encrypted with the same key > > and IV. Therefore, in terms of TDE, the minimum requirement is that > > PostgreSQL can use multiple keys. > > > > Using multiple keys in PG, there are roughly two possible designs: > > > > 1. Store all keys for TDE into the external KMS, and PG gets them from > > it as needed. > > +1 In this approach, encryption keys obtained from the external KMS are directly used to encrypt/decrypt data. What KEK and DEK are you referring to in this approach? > > > 2. PG manages all keys for TDE inside and protect these keys on disk > > by the key (i.g. KEK) stored in the external KMS. > > -1, this is the one where you would need arguing. > > > There are pros and cons to each design. If I take one cons of #1 as an > > example, the operation between PG and the external KMS could be > > complex. The operations could be creating, removing and rotate key and > > so on. > > ISTM that only create (delete?) are really needed. Rotating is the problem > of the KMS itself, thus does not need to be managed by pg under #1. With your idea how is the key rotation going to be performed? After invoking key rotation on the external KMS we need to re-encrypt all data encrypted with the old keys? Or you assume that the external KMS employes something like 2-tier key hierarchy? > > > We can implement these operations in an extension to interact > > with different kinds of external KMS, and perhaps we can use KMIP. > > I would even put that (KMIP protocol stuff) outside pg core. > > Even under #2, if some KMS is implemented and managed by pg, I would put > the stuff in a separate process which I would probably run with a > different uid, so that the KEK is not accessible directly by pg, ever. > > Once KMS interactions are managed with an outside process, then what this > process does becomes an interface, and whether this process actually > manages the keys or discuss with some external KMS with some KMIP or > whatever is irrelevant to pg. Providing an interface means that anyone > could implement their KMS fitting their requirements if they comply with > the interface/protocol. Just to be clear we don't keep KEK on neither shared memory nor disk. Postmaster and a backend who executes pg_rotate_cluster_passphrase() get KEK and use it to (re-)encrypt internal keys. But after that they immediately free it. The encryption keys we need to store inside PostgreSQL are DEK. > > Note that I'd be fine with having the current implementation somehow > wrapped up as an example KMS. > > > But the development cost could become high because we might need > > different extensions for each key management solutions/services. > > Yes and no. What I suggest is, I think, pretty simple, and I think I can > implement it in a few line of script, so the cost is not high, and having > a separate process looks, to me, like a security win and an extensibility > win (i.e. another implementation can be provided). How can we get multiple keys from the external KMS? I think we will need to save something like identifiers for each encryption key Postgres needs in the core and ask the external KMS for the key by the identifier via an extension. Is that right? > > > #2 is better at that point; the interaction between PG and KMS is only > > GET. > > I think that it could be the same with #1. I think that having a separate > process is a reasonable security requirement, and if you do that #1 and #2 > are more or less the same. > > > Other databases employes a similar approach are SQL Server and DB2. > > Too bad for them:-) I'd still disagree with having the master key inside > the database process, even if Microsoft, IBM and Oracle think it is a good > idea. > > > In terms of the necessity of introducing the key manager into PG core, > > I think at least TDE needs to be implemented in PG core. And as this > > key manager is for managing keys for TDE, I think the key manager also > > needs to be introduced into the core so that TDE functionality doesn't > > depend on external modules. > > Hmmm. > > My point is that only interactions should be in core. > > The implementation could be in core, but as a separate process. > > I agree that pg needs to be able to manage the DEK, so it needs to store > data keys. > > I still do not understand why an extension, possibly distributed with pg, > would not be ok. There may be good arguments for that, but I do not think > you provided any yet. Hmm I think I don't fully understand your idea yet. With the current patch, KEK could be obtained by either postmaster or backend processs who execute pg_rotate_cluster_passphrase() and KEK isn't stored anywhere on shared memory and disk. With your idea, KEK always is obtained by the particular process by a way provided by an extension. Is my understanding right? > > >> Also, I'm not at fully at ease with some of the underlying principles > >> behind this proposal. Are we re-inventing/re-implementing kerberos or > >> whatever? Are we re-implementing a brand new KMS inside pg? Why having > >> our own? > > > > As I explained above, this key manager is for managing internal keys > > used by TDE. It's not an alternative to existing key management > > solutions/services. > > Hmmm. This seels to suggest that interacting with something outside should > be an option. > > > The requirements of this key manager are generating internal keys, > > letting other PG components use them, protecting them by KEK when > > persisting, > > If you want that, I'd still argue that you should have a separate process. > > > and support KEK rotation. It doesn’t have a feature like > > allowing users to store arbitrary keys into this key manager, like > > other key management solutions/services have. > > Hmmm. > > > I agree that the key used to encrypt data must not be placed in the > > same host. But it's true only when the key is not protected, right? > > The DEK is needed when encrypting and decrypting, obviously, so it would > be there once obtained, it cannot be helped. My concern is about the KEK, > which AFAICS in your code is somewhere in memory accessible by the > postgres process, which is a no go for me. No. In the current patch, we don't save KEK anywhere on shared memory and disk. Once a process (postmaster or backend) used KEK stored in a PgAeadCtx it frees this context. We put the internal keys, DEK, in the shared buffer during startup. > > The definition of "protected" is fuzzy, it would depend on what the user > requires. Maybe protected for someone is "in a file which is only readable > by postgres", and for someone else it means "inside an external hardware > components activated by the fingerprint of the CEO". > > > In > > this key manager, since we protect all internal keys by KEK it's no > > problem unless KEK is leaked. KEK can be obtained from outside key > > management solutions/services through cluster_passphrase_command. > > Again, I do not think that the KEK should be in postgres process, ever. > > >> > >> Also, implementing a crash-safe key rotation algorithm does not look like > >> inside pg backend, that is not its job. > > > > The key rotation this key manager has is KEK rotation, which is very > > important. Without KEK rotation, when KEK is leaked an attacker can > > get database data by disk theft. Since KEK is responsible for > > encrypting all internal keys it's necessary to re-encrypt the internal > > keys when KEK is rotated. I think PG is the only role that can do that > > job. > > I'm not claiming that KEK rotation is a bad thing, I'm saying that it > should not be postgres problem. My issue is where you put the thing, not > about the thing itself. > > > I think this key manager satisfies the fist point by > > cluster_passphrase_command. For the second point, the key manager > > stores local keys inside PG while protecting them by KEK managed > > outside of PG. > > I do not understand. From what I understood from the code, the KEK is > loaded into postgres process. That is what I'm disagreeing with, only > needed DEK should be there. Please refer to kmgr_verify_passphrase() that is responsible for deriving KEK from passphrase, checking if the given passphrase is correct by unwrapping the internal keys, and storing the internal keys into the shared buffer: +bool +kmgr_verify_passphrase(char *passphrase, int passlen, + CryptoKey *keys_in, CryptoKey *keys_out, int nkeys) +{ + PgAeadCtx *tmpctx; + uint8 user_enckey[PG_AEAD_ENC_KEY_LEN]; + uint8 user_hmackey[PG_AEAD_MAC_KEY_LEN]; + + /* + * Create temporary wrap context with encryption key and HMAC key extracted + * from the passphrase. + */ + kmgr_derive_keys(passphrase, passlen, user_enckey, user_hmackey); + tmpctx = pg_create_aead_ctx(user_enckey, user_hmackey); + + for (int i = 0; i < nkeys; i++) + { + + if (!kmgr_unwrap_key(tmpctx, &(keys_in[i]), &(keys_out[i]))) + { + /* The passphrase is not correct */ + pg_free_aead_ctx(tmpctx); + return false; + } + } + + /* The passphrase is correct, free the cipher context */ + pg_free_aead_ctx(tmpctx); + + return true; +} We free tmpctx having KEK immediately after use. Or your argument is that we should not put KEK even onto a postgres process's local memory? Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Jun 3, 2020 at 09:16:03AM +0200, Fabien COELHO wrote: > > > Also, I'm not at fully at ease with some of the underlying principles > > > behind this proposal. Are we re-inventing/re-implementing kerberos or > > > whatever? Are we re-implementing a brand new KMS inside pg? Why having > > > our own? > > > > As I explained above, this key manager is for managing internal keys > > used by TDE. It's not an alternative to existing key management > > solutions/services. > > Hmmm. This seels to suggest that interacting with something outside should > be an option. Our goal is not to implement every possible security idea someone has, because we will never finish, and the final result would be too complex to be unable. You will need to explain exactly why having a separate process has value over coding/user complexity, and you will need to get agreement from a sufficient number of people to move that idea forward. > > The key rotation this key manager has is KEK rotation, which is very > > important. Without KEK rotation, when KEK is leaked an attacker can > > get database data by disk theft. Since KEK is responsible for > > encrypting all internal keys it's necessary to re-encrypt the internal > > keys when KEK is rotated. I think PG is the only role that can do that > > job. > > I'm not claiming that KEK rotation is a bad thing, I'm saying that it should > not be postgres problem. My issue is where you put the thing, not about the > thing itself. > > > I think this key manager satisfies the fist point by > > cluster_passphrase_command. For the second point, the key manager > > stores local keys inside PG while protecting them by KEK managed > > outside of PG. > > I do not understand. From what I understood from the code, the KEK is loaded > into postgres process. That is what I'm disagreeing with, only needed DEK > should be there. One option would be to send the data needing to be encrypted to an external command, and get the decrypted data back. In that way, the KEK is never on the Postgres server. However, the API for doing such an interface seems very complex and could lead to failures. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Hello Bruce, >> Hmmm. This seels to suggest that interacting with something outside >> should be an option. > > Our goal is not to implement every possible security idea someone has, > because we will never finish, and the final result would be too complex > to be unable. Sure. I'm trying to propose something both simple and extensible, so that other people could plug their own KMS if they are not fully satisfied with the way the internal pg KMS works, which IMHO should be the case if someone is motivated and paranoid enough to setup a KMS in the first place. > You will need to explain exactly why having a separate process has value > over coding/user complexity, and you will need to get agreement from a > sufficient number of people to move that idea forward. ISTM that the value is simple: The whole KMS idea turns around a "KEK", which is a secret key which allows to unlock/retrieve/recompute many data keys, aka DEKs. Loosing the KEK basically means loosing all data keys, past, present and possibly future, depending on how the KEK/DEK mechanism operates internally. So the thing you should not want is to lose your KEK. Keeping it inside pg process means that any pg process compromision would result in the KEK to be compromised as well, while the whole point of doing this KMS business was to provide security by isolating realms of data encryption. If you provide an interface instead, which I'm advocating, then where the KEK is does not concern pg, which has just to ask for DEKs. A compromise of pg would compromise local DEKs, but not the KEK "master key". The KEK may be somewhere on the same host, in another process, or possibly on another host, on some attached specialized quantum hardware inacessible to human beings. Postgres should not decide where the user should put its KEK, because it would depend on the threat model being addressed that you do not know. From an implementation point of view, what I suggest is reasonably simple and allows people to interface with the KMS of their choice, including one based on the patch, which would be a demos about what can be done, but other systems would be accessible just as well. The other software engineering aspect is that a KMS is a complex/sensitive thing, so reinventing our own and forcing it on users seems like a bad idea. >> From what I understood from the code, the KEK is loaded into postgres >> process. That is what I'm disagreeing with, only needed DEK should be >> there. > > One option would be to send the data needing to be encrypted to an > external command, and get the decrypted data back. In that way, the KEK > is never on the Postgres server. However, the API for doing such an > interface seems very complex and could lead to failures. I was more thinking of an interface to retrieve DEKs, but to still keep encryption/decryption inside postgres, to limit traffic, but what you suggest could be a valid option, so maybe should be allowed. I disagree with the implementation complexity, though. Basically the protocol only function is sending "GET <opaque-key-identifier>" and retrieving a response which is either the DEK or an error, which looks like a manageable complexity. Attached a simplistic server-side implementation of that for illustration. If you want externalized DEK as well, it would be sending "ENC/DEC <key-identifier> <data>" and the response is an error, or the translated data. Looks manageable as well. Allowing both approaches looks ok. Obviously it requires some more thinking and design, but my point is that postgres should not hold a KEK, ever, nor presume how DEK are to be managed by a DMS, and that is not very difficult to achieve by putting it outside of pg and defining how interactions take place. Providing a reference/example implementation would be nice as well, and Masahiko-san code can be rewrapped quite easily. -- Fabien.
Attachment
On Fri, Jun 5, 2020 at 03:34:54PM +0200, Fabien COELHO wrote: > Obviously it requires some more thinking and design, but my point is that > postgres should not hold a KEK, ever, nor presume how DEK are to be managed > by a DMS, and that is not very difficult to achieve by putting it outside of > pg and defining how interactions take place. Providing a reference/example > implementation would be nice as well, and Masahiko-san code can be rewrapped > quite easily. Well, the decrypted keys are already stored in backend memory, so what risk does haveing the KEK in memory for a brief period avoid? -- Bruce Momjian <bruce@momjian.us> https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Hello Bruce, Sorry for the length (yet again) of this answer, I'm trying to make my point as clear as possible. >> Obviously it requires some more thinking and design, but my point is that >> postgres should not hold a KEK, ever, nor presume how DEK are to be managed >> by a DMS, and that is not very difficult to achieve by putting it outside of >> pg and defining how interactions take place. Providing a reference/example >> implementation would be nice as well, and Masahiko-san code can be rewrapped >> quite easily. > > Well, the decrypted keys are already stored in backend memory, The fact that if the pg process is compromised then the DEK and data encrypted are compromised is more or less unavoidable (maybe only the data could be compromised though, but not the DEK, depending on how the encryption/decryption operates). > so what risk does haveing the KEK in memory for a brief period avoid? My understanding is that the KEK does not protect one key, but all keys, thus all data, possibly even past or future, so it loss impacts more than the here and now local process. If the KEK is ever present in pg process, it presumes that the threat model being addressed allows its loss if the process is compromised, i.e. all (past, present, future) security properties are void once the process is compromised. This may be okay for some use cases, but I can easily foresee that it would not be for all. I can think of use cases where the user/auditor says that the KEK should be elsewhere, and I would tend to agree. So my point is that the implementation should allow it, i.e. define a simple interface, and possibly a reference implementation with good properties which might fit some/typical security requirements, and the patch mostly fits that need, but for the possible isolation of the KEK. ISTM that a reasonable point of comparision is the design of kerberos, with a central authority/server which authenticate parties and allow them to authenticate one another and communicate securely. The design means that any compromised host/service would compromise all its interaction with other parties, but not communications between third parties. The compromission stays local, with the exception is the kerberos server itself, which somehow holds all the keys. For me the KEK is basically the kerberos server, you should provide means to allow the user to isolate that where they think it should be, and not enforce that it is within postgres process. Another point is that what I suggest does not seem very hard from an implementation point of view, and allows extensibility, which is also a win. Lastly, I still think that all this, whatever the design, should be packaged as an extension, unless it is really impossible to do so. I would also try to separate the extension for KMS interaction with the extension for actually using the keys, so that the user could change the underlying cryptographic primitives as they see fit. -- Fabien.
On Thu, 11 Jun 2020 at 16:07, Fabien COELHO <coelho@cri.ensmp.fr> wrote: > > > Hello Bruce, > > Sorry for the length (yet again) of this answer, I'm trying to make my > point as clear as possible. Thank you for your explanation! > > >> Obviously it requires some more thinking and design, but my point is that > >> postgres should not hold a KEK, ever, nor presume how DEK are to be managed > >> by a DMS, and that is not very difficult to achieve by putting it outside of > >> pg and defining how interactions take place. Providing a reference/example > >> implementation would be nice as well, and Masahiko-san code can be rewrapped > >> quite easily. > > > > Well, the decrypted keys are already stored in backend memory, > > The fact that if the pg process is compromised then the DEK and data > encrypted are compromised is more or less unavoidable (maybe only the data > could be compromised though, but not the DEK, depending on how the > encryption/decryption operates). > > > so what risk does haveing the KEK in memory for a brief period avoid? > > My understanding is that the KEK does not protect one key, but all keys, > thus all data, possibly even past or future, so it loss impacts more than > the here and now local process. > > If the KEK is ever present in pg process, it presumes that the threat > model being addressed allows its loss if the process is compromised, i.e. > all (past, present, future) security properties are void once the process > is compromised. Why we should not put KEK in pg process but it's okay for other processes? I guess you're talking about a threat when a malicious user logged in OS (or at least accessible) but I thought there is no difference between pg process and other processes in terms of the process being compromised. So the solution, in that case, would be to outsource encryption/decryption to external servers as Bruce mentioned. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hello Masahiko-san, >> If the KEK is ever present in pg process, it presumes that the threat >> model being addressed allows its loss if the process is compromised, i.e. >> all (past, present, future) security properties are void once the process >> is compromised. > > Why we should not put KEK in pg process but it's okay for other > processes? My point is "elsewhere". Indeed, it could be on another process on the same host, in which case I'd rather have the process run under a different uid, which means another compromission would be required if pg is compromissed locally ; it could also be in a process on another host ; it could be on some special hardware. Your imagination is the limit. > I guess you're talking about a threat when a malicious user logged in OS > (or at least accessible) but I thought there is no difference between pg > process and other processes in terms of the process being compromised. Processes are isolated based on uid, unless root is compromised. Once a id is compromised (eg "postgres"), the hacker has basically access to all files and processes accessible to that id. > So the solution, in that case, would be to outsource > encryption/decryption to external servers as Bruce mentioned. Hosting stuff (keys, encryption) on another server is indeed an option if "elsewhere" is implemented. From a design point of view: 0. KEK, DEK & crypto are managed by pg 1. DEK & crypto are managed by pg, but KEK is outside pg. 2. eveything is managed out of pg. I think that both 1 & 2 are valid options, which do not require the same interface. If you have 1, you can do 0 by giving KEK to a pg process. How DEK are identified and created with the KEK should also be something open, left to the implementation, the interface should not need to know. -- Fabien.
On Thu, 11 Jun 2020 at 20:03, Fabien COELHO <coelho@cri.ensmp.fr> wrote: > > > Hello Masahiko-san, > > >> If the KEK is ever present in pg process, it presumes that the threat > >> model being addressed allows its loss if the process is compromised, i.e. > >> all (past, present, future) security properties are void once the process > >> is compromised. > > > > Why we should not put KEK in pg process but it's okay for other > > processes? > > My point is "elsewhere". > > Indeed, it could be on another process on the same host, in which case I'd > rather have the process run under a different uid, which means another > compromission would be required if pg is compromissed locally ; it could > also be in a process on another host ; it could be on some special > hardware. Your imagination is the limit. > > > I guess you're talking about a threat when a malicious user logged in OS > > (or at least accessible) but I thought there is no difference between pg > > process and other processes in terms of the process being compromised. > > Processes are isolated based on uid, unless root is compromised. Once a id > is compromised (eg "postgres"), the hacker has basically access to all > files and processes accessible to that id. > > > So the solution, in that case, would be to outsource > > encryption/decryption to external servers as Bruce mentioned. > > Hosting stuff (keys, encryption) on another server is indeed an option if > "elsewhere" is implemented. If I understand your idea correctly we put both DEK and KEK "elsewhere", and a postgres process gets only DEK from it. It seems to me this idea assumes that the place storing encryption keys employees 2-tire key hierarchy or similar thing. What if the user wants to or has to manage a single encryption key? For example, storing an encryption key for PostgreSQL TDE into a file in a safe server instead of KMS using DEK and KEK because of budgets or requirements whatever. In this case, if the user does key rotation, that encryption key would need to be rotated, resulting in the user would need to re-encrypt all database data encrypted with old key. It should work but what do you think about how postgres does key rotation and re-encryption? Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hello Masahiko-san, I'm not sure I understood your concern. I try to answer below. > If I understand your idea correctly we put both DEK and KEK > "elsewhere", and a postgres process gets only DEK from it. Yes, that is one of the option. > It seems to me this idea assumes that the place storing encryption keys > employees 2-tire key hierarchy or similar thing. ISTM that there is no such assumption. There is the assumption that there is an interface to retrieve DEK. What is done being the interface to retrieve this DEK should be irrelevant to pg. Having them secure by a KEK looks like an reasonable design, though. Maybe keys are actually stored. Maybe thay are computed based on something, eg key identifier and some secret. Maybe there is indeed a 2-tier something. Maybe whatever. > What if the user wants to or has to manage a single encryption key? Then it has one key identifier and it retrieve one key from the DMS. Having a "management system" for a singleton looks like overkill though, but it should work. > For example, storing an encryption key for PostgreSQL TDE into a file in > a safe server instead of KMS using DEK and KEK because of budgets or > requirements whatever. Good. If you have an interface to retrieve a key, then it can probably contact said server to get it when needed? > In this case, if the user does key rotation, that encryption key would > need to be rotated, resulting in the user would need to re-encrypt all > database data encrypted with old key. Sure, by definition actually changing the key requires a decryption/encryption cycle on all data. > It should work but what do you think about how postgres does key > rotation and re-encryption? If pg actually has the DEK, then it means that while the re-encryption is performed it has to manage two keys simultenaously, this is a question for what is done on pg server with the keys, not really about the DMS ? If the "elsewhere" service does the encryption, maybe the protocol could include it, eg something like: REC key1-id key2-id data-encrypted-with-key1 -> data-encrypted-with-key2 But it could also achieve the same thing with two commands, eg: DEC key1-id data-encrypted-with-key1 -> clear-text-data ENC key2-id clear-text-data -> data-encrypted-with-key2 The question is what should be put in the protocol, and I would tend to think that some careful design time should be put in it. -- Fabien.
On Fri, 12 Jun 2020 at 16:17, Fabien COELHO <coelho@cri.ensmp.fr> wrote: > > > Hello Masahiko-san, > > I'm not sure I understood your concern. I try to answer below. > > > If I understand your idea correctly we put both DEK and KEK > > "elsewhere", and a postgres process gets only DEK from it. > > Yes, that is one of the option. > > > It seems to me this idea assumes that the place storing encryption keys > > employees 2-tire key hierarchy or similar thing. > > ISTM that there is no such assumption. There is the assumption that there > is an interface to retrieve DEK. What is done being the interface to > retrieve this DEK should be irrelevant to pg. Having them secure by a > KEK looks like an reasonable design, though. Maybe keys are actually > stored. Maybe thay are computed based on something, eg key identifier and > some secret. Maybe there is indeed a 2-tier something. Maybe whatever. > > > What if the user wants to or has to manage a single encryption key? > > Then it has one key identifier and it retrieve one key from the DMS. > Having a "management system" for a singleton looks like overkill though, > but it should work. > > > For example, storing an encryption key for PostgreSQL TDE into a file in > > a safe server instead of KMS using DEK and KEK because of budgets or > > requirements whatever. > > Good. If you have an interface to retrieve a key, then it can probably > contact said server to get it when needed? > > > In this case, if the user does key rotation, that encryption key would > > need to be rotated, resulting in the user would need to re-encrypt all > > database data encrypted with old key. > > Sure, by definition actually changing the key requires a > decryption/encryption cycle on all data. > > > It should work but what do you think about how postgres does key > > rotation and re-encryption? > > If pg actually has the DEK, then it means that while the re-encryption is > performed it has to manage two keys simultenaously, this is a question for > what is done on pg server with the keys, not really about the DMS ? Yes. Your explanation made my concern clear. Thanks! > > If the "elsewhere" service does the encryption, maybe the protocol could > include it, eg something like: > > REC key1-id key2-id data-encrypted-with-key1 > -> data-encrypted-with-key2 > > But it could also achieve the same thing with two commands, eg: > > DEC key1-id data-encrypted-with-key1 > -> clear-text-data > > ENC key2-id clear-text-data > -> data-encrypted-with-key2 > > The question is what should be put in the protocol, and I would tend to > think that some careful design time should be put in it. > Summarizing the discussed points so far, I think that the major advantage points of your idea comparing to the current patch's architecture are: * More secure. Because it never loads KEK in postgres processes we can lower the likelihood of KEK leakage. * More extensible. We will be able to implement more protocols to outsource other operations to the external place. On the other hand, here are some downsides and issues: * The external place needs to manage more encryption keys than the current patch does. Some cloud key management services are charged by the number of active keys and key operations. So the number of keys postgres requires affects the charges. It'd be worse if we were to have keys per table. * If this approach supports only GET protocol, the user needs to create encryption keys with appropriate ids in advance so that postgres can get keys by id. If postgres TDE creates keys as needed, CREATE protocol would also be required. * If we need only GET protocol, the current approach (i.g. cluser_passphase_command) would be more simple. I imagine the interface between postgres and the extension is C function. This approach is more extensible but it also means extensions need to support multiple protocols, leading to increase complexity and development cost. * This approach necessarily doesn’t eliminate the data leakage threat completely caused by process compromisation. Since DEK is placed in postgres process memory, it’s still possible that if a postgres process is compromised the attacker can steal database data. The benefit of lowering the possibility of KEK leakage is to deal with the threat that the attacker sees database data encrypted by past or future DEK protected by the stolen KEK. * An open question is, as you previously mentioned, how to verify the key obtained from the external place is the right key. Anything else we need to note? Finally, please understand I’m not controverting your idea but just trying to understand which approach is better from multiple perspectives. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Jun 12, 2020 at 09:17:37AM +0200, Fabien COELHO wrote: > The question is what should be put in the protocol, and I would tend to > think that some careful design time should be put in it. I still don't see the value of this vs. its complexity. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Hello Masahiko-san, > Summarizing the discussed points so far, I think that the major > advantage points of your idea comparing to the current patch's > architecture are: > > * More secure. Because it never loads KEK in postgres processes we can > lower the likelihood of KEK leakage. Yes. > * More extensible. We will be able to implement more protocols to > outsource other operations to the external place. Yes. > On the other hand, here are some downsides and issues: > > * The external place needs to manage more encryption keys than the > current patch does. Why? If the external place is just a separate process on the same host, probably it would manage the very same amount as what your patch. > Some cloud key management services are charged by the number of active > keys and key operations. So the number of keys postgres requires affects > the charges. It'd be worse if we were to have keys per table. Possibly. Note that you do not have to use a cloud storage paid as a service. However, you could do it if there is an interface, because it would allow postgres to do so if the user wishes that. That is the point of having an interface that can be implemented differently for different use cases. > * If this approach supports only GET protocol, the user needs to > create encryption keys with appropriate ids in advance so that > postgres can get keys by id. If postgres TDE creates keys as needed, > CREATE protocol would also be required. I'm not sure. ISTM that if there is a KMS to manage keys, it could be its responsability to actually create a key, however the client (pg) would have to request it, basically say "given me a new key for this id". This could even work with a "get" command only, if the KMS is expected to create a new key when asked for a key which does not exists yet. ISTM that the client could (should?) only have to create identifiers for its keys. > * If we need only GET protocol, the current approach (i.g. > cluser_passphase_command) would be more simple. I imagine the > interface between postgres and the extension is C function. Yes. ISTM that can be pretty simple, something like: A guc to define the process to start the interface (having a process means that its uid can be changed), which would communicate on its stdin/stdout. A guc to define how to interact with the interface (eg whether DEK are retrieved, or whether the interface is to ask for encryption/decryption, or possibly some other modes). A few function: - set_key(<local-id:int>, <key-identifier:bytea>); # may retrieve the DEK, or only note that local id of some key. - encode(<local-id:int>, <data:bytea>) -> <encrypted-data:bytea> # may fail if no key is associated to local-id # or if the service is down somehow - decode(<local-id>, <encrypted-data>) -> <data> # could also fail if there is some integrity check associated > This approach is more extensible Yep. > but it also means extensions need to support multiple protocols, leading > to increase complexity and development cost. I do not understand what you mean by "multiple protocols". For me there is one protocol, possibly a few commands in this protocol between client (postgres) and DMS. Anyway, sending "GET <key-id>" to retreive a DEK, for instance, does not sound "complex". Here is some pseudo code: For get_key: if (mode of operation is to have DEKS locally) try send to KMS "get <key-id>" keys[local-id] = answer catch & rethrow possible errors elif (mode is to keep DEKs remote) key_id[local-id] = key-id; else ... For encode, the code is basically: if (has_key(local-id)) if (mode of operation is to have DEKs locally) return some_encode(key[local-id], data); elif (mode is to keep DEKs remote) send to KMS "encode key_id[local-id] data" return answer; # or error else ... else throw error local-id has no associated key; decode is more or less the same as encode. There is another thing to consider is how the client "proves" its identity to the KMS interface, which might suggest some provisions when starting a process, but you already have things in your patch to deal with the KEK, which could be turned into some generic auth. > * This approach necessarily doesn’t eliminate the data leakage threat > completely caused by process compromisation. Sure, if the process as decrypted data or DEK or whatever, then the process compromission leaks these data. My point is to try to limit the leakage potential of a process compromission. > Since DEK is placed in postgres process memory, May be placed, depending on the mode of operation. > it’s still possible that if a postgres process is compromised the > attacker can steal database data. Obviously. This cannot be helped if pg is to hold unencrypted data. > The benefit of lowering the possibility of KEK leakage is to deal with > the threat that the attacker sees database data encrypted by past or > future DEK protected by the stolen KEK. Yes. > * An open question is, as you previously mentioned, how to verify the > key obtained from the external place is the right key. It woud succeed in decrypting data if there is some associated integrity check. Note that from a cryptographic point if view, depending on the use case, it may be a desirable property that you cannot tell whether it is the right one. > Anything else we need to note? Dunno. I would like to see some thread model, and what properties you would expect depending on the hypothesis. For instance, I guess that the minimal you would like is that stolen database cold data (PGDATA contents) should not allow to recover clear contents, but only encrypted stuff which is the whole point of encrypting data in the first place. This is the "here and now". ISTM that the only possible achievement of the current patch is the above. Then you should also consider past data (prior states of PGDATA which may have been stored somewhere the attacker might recover) and future data (that the attacker may be able to recover later). Now what happens on those (past, present, future) data on: - stolen DEK - stolen KEK - stolen full cold data (whole disk stolen) - access to process & live data (pg account compromission at some point in time) - access process & live data & ability to issue more commands at some point in time... - access to full host live data (root compromission) - ... - network full compromission (eg AD has been subverted, this is the usual target for taking down everything on a network if every authentication and authorization is managed by it, which is often the case in a corporate network). - the pg admin is working for the attacker... - the sys admin is working for the attacker... - ... In the end anyway you would lose, the question is how soon, how many compromissions are necessary. > Finally, please understand I’m not controverting your idea but just > trying to understand which approach is better from multiple > perspectives. The point of a discussion is basically to present arguments. -- Fabien.
Hello Bruce. >> The question is what should be put in the protocol, and I would tend to >> think that some careful design time should be put in it. > > I still don't see the value of this vs. its complexity. Dunno. I'm looking for the value of having such a thing in core. ISTM that there are no clear design goals of the system, no clear description of the target use case(s), no clear explanations of the underlying choices (in something like a README), no saying what it achieves and what it does not achieve. It is only code. If the proposed thing is very specific to one use case, which may be more or less particular, then I'd say the stuff should really be an external extension, and you do not need to ask for a review. Call it pgcryptoXYZ and it is done. However, if the stuff is amenable to many/more use cases, then it may still be an extension because it is specialized somehow and not everyone would like to have it if they do not use it, but having it in core would be much more justified. Also, it would have to be a little more "complex" to be extensible, sure. I do not think that it needs to be very complex in the end, but it needs to be carefully designed to be extensible. Note I still do not see why it should be in core directly, i.e. not an extension. I'm yet to see a convincing argument about that. -- Fabien.
On Sat, 13 Jun 2020 at 05:59, Fabien COELHO <coelho@cri.ensmp.fr> wrote: > > > Hello Masahiko-san, > > > Summarizing the discussed points so far, I think that the major > > advantage points of your idea comparing to the current patch's > > architecture are: > > > > * More secure. Because it never loads KEK in postgres processes we can > > lower the likelihood of KEK leakage. > > Yes. > > > * More extensible. We will be able to implement more protocols to > > outsource other operations to the external place. > > Yes. > > > On the other hand, here are some downsides and issues: > > > > * The external place needs to manage more encryption keys than the > > current patch does. > > Why? If the external place is just a separate process on the same host, > probably it would manage the very same amount as what your patch. In the current patch, the external place needs to manage only one key whereas postgres needs to manages multiple DEKs. But with your idea, the external place needs to manage both KEK and DEKs. > > > Some cloud key management services are charged by the number of active > > keys and key operations. So the number of keys postgres requires affects > > the charges. It'd be worse if we were to have keys per table. > > Possibly. Note that you do not have to use a cloud storage paid as a > service. However, you could do it if there is an interface, because it > would allow postgres to do so if the user wishes that. That is the point > of having an interface that can be implemented differently for different > use cases. The same is true for the current patch. The user can get KEK from anywhere they want using cluster_passphrase_command. But as I mentioned above the number of keys that the user manages outside postgres is different. > > > * If this approach supports only GET protocol, the user needs to > > create encryption keys with appropriate ids in advance so that > > postgres can get keys by id. If postgres TDE creates keys as needed, > > CREATE protocol would also be required. > > I'm not sure. ISTM that if there is a KMS to manage keys, it could be its > responsability to actually create a key, however the client (pg) would > have to request it, basically say "given me a new key for this id". > > This could even work with a "get" command only, if the KMS is expected to > create a new key when asked for a key which does not exists yet. ISTM that > the client could (should?) only have to create identifiers for its keys. Yeah, it depends on KMS, meaning we need different extensions for different KMS. A KMS might support an interface that creates key if not exist during GET but some KMS might support CREATE and GET separately. > > > * If we need only GET protocol, the current approach (i.g. > > cluser_passphase_command) would be more simple. I imagine the > > interface between postgres and the extension is C function. > > Yes. ISTM that can be pretty simple, something like: > > A guc to define the process to start the interface (having a process means > that its uid can be changed), which would communicate on its stdin/stdout. > > A guc to define how to interact with the interface (eg whether DEK are > retrieved, or whether the interface is to ask for encryption/decryption, > or possibly some other modes). > > A few function: > > - set_key(<local-id:int>, <key-identifier:bytea>); > # may retrieve the DEK, or only note that local id of some key. > > - encode(<local-id:int>, <data:bytea>) -> <encrypted-data:bytea> > # may fail if no key is associated to local-id > # or if the service is down somehow > > - decode(<local-id>, <encrypted-data>) -> <data> > # could also fail if there is some integrity check associated > > > This approach is more extensible > > Yep. > > > but it also means extensions need to support multiple protocols, leading > > to increase complexity and development cost. > > I do not understand what you mean by "multiple protocols". For me there is > one protocol, possibly a few commands in this protocol between client > (postgres) and DMS. Anyway, sending "GET <key-id>" to retreive a DEK, for > instance, does not sound "complex". Here is some pseudo code: > > For get_key: > > if (mode of operation is to have DEKS locally) > try > send to KMS "get <key-id>" > keys[local-id] = answer > catch & rethrow possible errors > elif (mode is to keep DEKs remote) > key_id[local-id] = key-id; > else ... > > For encode, the code is basically: > > if (has_key(local-id)) > if (mode of operation is to have DEKs locally) > return some_encode(key[local-id], data); > elif (mode is to keep DEKs remote) > send to KMS "encode key_id[local-id] data" > return answer; # or error > else ... > else > throw error local-id has no associated key; > > decode is more or less the same as encode. > > There is another thing to consider is how the client "proves" its identity > to the KMS interface, which might suggest some provisions when starting a > process, but you already have things in your patch to deal with the KEK, > which could be turned into some generic auth. > > > * This approach necessarily doesn’t eliminate the data leakage threat > > completely caused by process compromisation. > > Sure, if the process as decrypted data or DEK or whatever, then the > process compromission leaks these data. My point is to try to limit the > leakage potential of a process compromission. > > > Since DEK is placed in postgres process memory, > > May be placed, depending on the mode of operation. > > > it’s still possible that if a postgres process is compromised the > > attacker can steal database data. > > Obviously. This cannot be helped if pg is to hold unencrypted data. > > > The benefit of lowering the possibility of KEK leakage is to deal with > > the threat that the attacker sees database data encrypted by past or > > future DEK protected by the stolen KEK. > > Yes. > > > * An open question is, as you previously mentioned, how to verify the > > key obtained from the external place is the right key. > > It woud succeed in decrypting data if there is some associated integrity > check. > > Note that from a cryptographic point if view, depending on the use case, > it may be a desirable property that you cannot tell whether it is the > right one. > > > Anything else we need to note? > > Dunno. > > I would like to see some thread model, and what properties you would > expect depending on the hypothesis. > > For instance, I guess that the minimal you would like is that stolen > database cold data (PGDATA contents) should not allow to recover clear > contents, but only encrypted stuff which is the whole point of encrypting > data in the first place. This is the "here and now". > > ISTM that the only possible achievement of the current patch is the above. > > Then you should also consider past data (prior states of PGDATA which may > have been stored somewhere the attacker might recover) and future data > (that the attacker may be able to recover later). > The current patch can protect old data and future data from theft using KEK rotation for the case of KEK theft. The user needs to rotate KEK on a regular basis. > Now what happens on those (past, present, future) data on: > > - stolen DEK > > - stolen KEK > > - stolen full cold data (whole disk stolen) > > - access to process & live data > (pg account compromission at some point in time) > > - access process & live data & ability to issue more commands at some > point in time... > > - access to full host live data (root compromission) > > - ... > > - network full compromission (eg AD has been subverted, this is the usual > target for taking down everything on a network if every > authentication and authorization is managed by it, which is often > the case in a corporate network). > > - the pg admin is working for the attacker... > > - the sys admin is working for the attacker... > > - ... > > In the end anyway you would lose, the question is how soon, how many > compromissions are necessary. > > > Finally, please understand I’m not controverting your idea but just > > trying to understand which approach is better from multiple > > perspectives. > > The point of a discussion is basically to present arguments. My point is the same as Bruce. I'm concerned about the fact that even if we introduce this approach the present data could still be stolen when a postgres process is compromised. It seems to me that your approach is extensible and can protect data from threats in addition to threats that the current patch can protect but it would bring some costs and complexity instead comparing to the current patch. I'd like to hear opinions from other hackers in the community. I think the actual code would help to explain how your approach is not complexed. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hello Masahiko-san, >>> * The external place needs to manage more encryption keys than the >>> current patch does. >> >> Why? If the external place is just a separate process on the same host, >> probably it would manage the very same amount as what your patch. > > In the current patch, the external place needs to manage only one key > whereas postgres needs to manages multiple DEKs. But with your idea, > the external place needs to manage both KEK and DEKs. Hmmm. I do not see a good use case for a "management system" which would only have to manage a singleton. ISTM that one point of using one KEK is to allows several DEKs under it. Maybe I have missed something in your patch, but only one key is a very restricted use case. >>> Some cloud key management services are charged by the number of active >>> keys and key operations. So the number of keys postgres requires affects >>> the charges. It'd be worse if we were to have keys per table. >> >> Possibly. Note that you do not have to use a cloud storage paid as a >> service. However, you could do it if there is an interface, because it >> would allow postgres to do so if the user wishes that. That is the point >> of having an interface that can be implemented differently for different >> use cases. > > The same is true for the current patch. The user can get KEK from > anywhere they want using cluster_passphrase_command. Yep. Somehow I'm proposing to have a command to get DEKs instead of just the KEK, otherwise it is not that far. > But as I mentioned above the number of keys that the user manages > outside postgres is different. Yep, and I do not think that "only one key" approach is good. I really missed something in the patch. From a use case point of view, I thought that the user could have has many keys has they see fit. Maybe one per cluser, or database, or table, or a row if for some reason the application would demand it. I do not think that the pg should decide that, among other things. That is why I'm constantly refering to a "key identifier", and in the pseudo code I added a "local id" (typically an int). >>> * If this approach supports only GET protocol, the user needs to >>> create encryption keys with appropriate ids in advance so that >>> postgres can get keys by id. If postgres TDE creates keys as needed, >>> CREATE protocol would also be required. >> >> I'm not sure. ISTM that if there is a KMS to manage keys, it could be its >> responsability to actually create a key, however the client (pg) would >> have to request it, basically say "given me a new key for this id". >> >> This could even work with a "get" command only, if the KMS is expected to >> create a new key when asked for a key which does not exists yet. ISTM that >> the client could (should?) only have to create identifiers for its keys. > > Yeah, it depends on KMS, meaning we need different extensions for > different KMS. A KMS might support an interface that creates key if not > exist during GET but some KMS might support CREATE and GET separately. I disagree that it is necessary, but this is debatable. The KMS-side interface code could take care of that, eg: if command is "get X" if (X does not exist in KMS) create a new key stored in KMS, return it; else return KMS-stored key; ... So you can still have a "GET" only interface which adapts to the "final" KMS. Basically, the glue code which implements the interface for the KMS can include some logic to adapt to the KMS point of view. >>> * If we need only GET protocol, the current approach (i.g. >> The point of a discussion is basically to present arguments. > > My point is the same as Bruce. I'm concerned about the fact that even > if we introduce this approach the present data could still be stolen > when a postgres process is compromised. Yes, sure. > It seems to me that your approach is extensible and can protect data > from threats in addition to threats that the current patch can protect > but it would bring some costs and complexity instead comparing to the > current patch. I'd like to hear opinions from other hackers in the > community. I'd like an extensible design to have anything in core. As I said in an other mail, if you want to handle a somehow restricted use case, just provide an external extension and do nothing in core, please. Put in core something that people with a slightly different use case or auditor can build on as well. The current patch makes a dozen hard-coded decisions which it should not, IMHO. > I think the actual code would help to explain how your approach is not > complexed. I provided quite some pseudo code, including some python. I'm not planning to redevelop the whole thing: my contribution is a review, currently about the overall design, then if somehow I agree on the design, I would look at the code more precisely. -- Fabien.
On Sun, 14 Jun 2020 at 19:39, Fabien COELHO <coelho@cri.ensmp.fr> wrote: > > > Hello Masahiko-san, > > >>> * The external place needs to manage more encryption keys than the > >>> current patch does. > >> > >> Why? If the external place is just a separate process on the same host, > >> probably it would manage the very same amount as what your patch. > > > > In the current patch, the external place needs to manage only one key > > whereas postgres needs to manages multiple DEKs. But with your idea, > > the external place needs to manage both KEK and DEKs. > > Hmmm. I do not see a good use case for a "management system" which would > only have to manage a singleton. ISTM that one point of using one KEK is > to allows several DEKs under it. Maybe I have missed something in your > patch, but only one key is a very restricted use case. > > >>> Some cloud key management services are charged by the number of active > >>> keys and key operations. So the number of keys postgres requires affects > >>> the charges. It'd be worse if we were to have keys per table. > >> > >> Possibly. Note that you do not have to use a cloud storage paid as a > >> service. However, you could do it if there is an interface, because it > >> would allow postgres to do so if the user wishes that. That is the point > >> of having an interface that can be implemented differently for different > >> use cases. > > > > The same is true for the current patch. The user can get KEK from > > anywhere they want using cluster_passphrase_command. > > Yep. Somehow I'm proposing to have a command to get DEKs instead of just > the KEK, otherwise it is not that far. > > > But as I mentioned above the number of keys that the user manages > > outside postgres is different. > > Yep, and I do not think that "only one key" approach is good. I really > missed something in the patch. From a use case point of view, I thought > that the user could have has many keys has they see fit. Maybe one per > cluser, or database, or table, or a row if for some reason the application > would demand it. I do not think that the pg should decide that, among > other things. That is why I'm constantly refering to a "key identifier", > and in the pseudo code I added a "local id" (typically an int). What I referred to "only one key" is KEK. In the current patch, postgres needs to manage multiple DEKs and fetches one KEK from somewhere. According to the recent TDE discussion, we would have one DEK for all tables/indexes encryption and one DEK for WAL encryption as the first step. > > >>> * If this approach supports only GET protocol, the user needs to > >>> create encryption keys with appropriate ids in advance so that > >>> postgres can get keys by id. If postgres TDE creates keys as needed, > >>> CREATE protocol would also be required. > >> > >> I'm not sure. ISTM that if there is a KMS to manage keys, it could be its > >> responsability to actually create a key, however the client (pg) would > >> have to request it, basically say "given me a new key for this id". > >> > >> This could even work with a "get" command only, if the KMS is expected to > >> create a new key when asked for a key which does not exists yet. ISTM that > >> the client could (should?) only have to create identifiers for its keys. > > > > Yeah, it depends on KMS, meaning we need different extensions for > > different KMS. A KMS might support an interface that creates key if not > > exist during GET but some KMS might support CREATE and GET separately. > > I disagree that it is necessary, but this is debatable. The KMS-side > interface code could take care of that, eg: > > if command is "get X" > if (X does not exist in KMS) > create a new key stored in KMS, return it; > else > return KMS-stored key; > ... > > So you can still have a "GET" only interface which adapts to the "final" > KMS. Basically, the glue code which implements the interface for the KMS > can include some logic to adapt to the KMS point of view. Is the above code is for the extension side, right? For example, if users want to use a cloud KMS, say AWS KMS, to store DEKs and KEK they need an extension that is loaded to postgres and can communicate with AWS KMS. I imagine that such extension needs to be written in C, the communication between the extension uses AWS KMS API, and the communication between postgres core and the extension uses text protocol. When postgres core needs a DEK identified by KEY-A, it asks for the extension to get the DEK by passing something like “GET KEY-A” message, and then the extension asks the existence of that key to AWK KMS, creates if not exist and returns it to the postgres core. Is my understanding right? When we have TDE feature in the future, we would also need to change frontend tools such as pg_waldump and pg_rewind that read database files so that they can read encrypted files. It means that these frond-end tools also somehow need to communicate with the external place to get DEKs in order to decrypt encrypted database files. In your idea, what do you think about how we can support it? > > >>> * If we need only GET protocol, the current approach (i.g. > > >> The point of a discussion is basically to present arguments. > > > > My point is the same as Bruce. I'm concerned about the fact that even > > if we introduce this approach the present data could still be stolen > > when a postgres process is compromised. > > Yes, sure. > > > It seems to me that your approach is extensible and can protect data > > from threats in addition to threats that the current patch can protect > > but it would bring some costs and complexity instead comparing to the > > current patch. I'd like to hear opinions from other hackers in the > > community. > > I'd like an extensible design to have anything in core. As I said in an > other mail, if you want to handle a somehow restricted use case, just > provide an external extension and do nothing in core, please. Put in core > something that people with a slightly different use case or auditor can > build on as well. The current patch makes a dozen hard-coded decisions > which it should not, IMHO. It might have confused you that I included key manager and encryption SQL functions to the patches but this key manager has been designed dedicated to only TDE. It might be better to remove both SQL interface and SQL key we discussed from the patch set as they are actually not necessary for TDE purposes. Aside from the security risk you mentioned, it was a natural design decision for me that we have our key manager component in postgres core that is responsible for managing encryption keys for our TDE. To make the key manager and TDE simple as much as possible, we discussed that we will have cluster-wide TDE and key manager that manages a few encryption keys used by TDE (e.g. one key for table/index encryption and another key for WAL encryption), as the first step. > > > I think the actual code would help to explain how your approach is not > > complexed. > > I provided quite some pseudo code, including some python. I'm not planning > to redevelop the whole thing: my contribution is a review, currently about > the overall design, then if somehow I agree on the design, I would look at > the code more precisely. Hmm, understood. Let's wait for comments from other members. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
* key protection
* key storage
* key disable/enable
* key destroy
On Sun, 14 Jun 2020 at 19:39, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
>
> Hello Masahiko-san,
>
> >>> * The external place needs to manage more encryption keys than the
> >>> current patch does.
> >>
> >> Why? If the external place is just a separate process on the same host,
> >> probably it would manage the very same amount as what your patch.
> >
> > In the current patch, the external place needs to manage only one key
> > whereas postgres needs to manages multiple DEKs. But with your idea,
> > the external place needs to manage both KEK and DEKs.
>
> Hmmm. I do not see a good use case for a "management system" which would
> only have to manage a singleton. ISTM that one point of using one KEK is
> to allows several DEKs under it. Maybe I have missed something in your
> patch, but only one key is a very restricted use case.
>
> >>> Some cloud key management services are charged by the number of active
> >>> keys and key operations. So the number of keys postgres requires affects
> >>> the charges. It'd be worse if we were to have keys per table.
> >>
> >> Possibly. Note that you do not have to use a cloud storage paid as a
> >> service. However, you could do it if there is an interface, because it
> >> would allow postgres to do so if the user wishes that. That is the point
> >> of having an interface that can be implemented differently for different
> >> use cases.
> >
> > The same is true for the current patch. The user can get KEK from
> > anywhere they want using cluster_passphrase_command.
>
> Yep. Somehow I'm proposing to have a command to get DEKs instead of just
> the KEK, otherwise it is not that far.
>
> > But as I mentioned above the number of keys that the user manages
> > outside postgres is different.
>
> Yep, and I do not think that "only one key" approach is good. I really
> missed something in the patch. From a use case point of view, I thought
> that the user could have has many keys has they see fit. Maybe one per
> cluser, or database, or table, or a row if for some reason the application
> would demand it. I do not think that the pg should decide that, among
> other things. That is why I'm constantly refering to a "key identifier",
> and in the pseudo code I added a "local id" (typically an int).
What I referred to "only one key" is KEK. In the current patch,
postgres needs to manage multiple DEKs and fetches one KEK from
somewhere. According to the recent TDE discussion, we would have one
DEK for all tables/indexes encryption and one DEK for WAL encryption
as the first step.
>
> >>> * If this approach supports only GET protocol, the user needs to
> >>> create encryption keys with appropriate ids in advance so that
> >>> postgres can get keys by id. If postgres TDE creates keys as needed,
> >>> CREATE protocol would also be required.
> >>
> >> I'm not sure. ISTM that if there is a KMS to manage keys, it could be its
> >> responsability to actually create a key, however the client (pg) would
> >> have to request it, basically say "given me a new key for this id".
> >>
> >> This could even work with a "get" command only, if the KMS is expected to
> >> create a new key when asked for a key which does not exists yet. ISTM that
> >> the client could (should?) only have to create identifiers for its keys.
> >
> > Yeah, it depends on KMS, meaning we need different extensions for
> > different KMS. A KMS might support an interface that creates key if not
> > exist during GET but some KMS might support CREATE and GET separately.
>
> I disagree that it is necessary, but this is debatable. The KMS-side
> interface code could take care of that, eg:
>
> if command is "get X"
> if (X does not exist in KMS)
> create a new key stored in KMS, return it;
> else
> return KMS-stored key;
> ...
>
> So you can still have a "GET" only interface which adapts to the "final"
> KMS. Basically, the glue code which implements the interface for the KMS
> can include some logic to adapt to the KMS point of view.
Is the above code is for the extension side, right? For example, if
users want to use a cloud KMS, say AWS KMS, to store DEKs and KEK they
need an extension that is loaded to postgres and can communicate with
AWS KMS. I imagine that such extension needs to be written in C, the
communication between the extension uses AWS KMS API, and the
communication between postgres core and the extension uses text
protocol. When postgres core needs a DEK identified by KEY-A, it asks
for the extension to get the DEK by passing something like “GET KEY-A”
message, and then the extension asks the existence of that key to AWK
KMS, creates if not exist and returns it to the postgres core. Is my
understanding right?
When we have TDE feature in the future, we would also need to change
frontend tools such as pg_waldump and pg_rewind that read database
files so that they can read encrypted files. It means that these
frond-end tools also somehow need to communicate with the external
place to get DEKs in order to decrypt encrypted database files. In
your idea, what do you think about how we can support it?
>
> >>> * If we need only GET protocol, the current approach (i.g.
>
> >> The point of a discussion is basically to present arguments.
> >
> > My point is the same as Bruce. I'm concerned about the fact that even
> > if we introduce this approach the present data could still be stolen
> > when a postgres process is compromised.
>
> Yes, sure.
>
> > It seems to me that your approach is extensible and can protect data
> > from threats in addition to threats that the current patch can protect
> > but it would bring some costs and complexity instead comparing to the
> > current patch. I'd like to hear opinions from other hackers in the
> > community.
>
> I'd like an extensible design to have anything in core. As I said in an
> other mail, if you want to handle a somehow restricted use case, just
> provide an external extension and do nothing in core, please. Put in core
> something that people with a slightly different use case or auditor can
> build on as well. The current patch makes a dozen hard-coded decisions
> which it should not, IMHO.
It might have confused you that I included key manager and encryption
SQL functions to the patches but this key manager has been designed
dedicated to only TDE. It might be better to remove both SQL interface
and SQL key we discussed from the patch set as they are actually not
necessary for TDE purposes. Aside from the security risk you
mentioned, it was a natural design decision for me that we have our
key manager component in postgres core that is responsible for
managing encryption keys for our TDE. To make the key manager and TDE
simple as much as possible, we discussed that we will have
cluster-wide TDE and key manager that manages a few encryption keys
used by TDE (e.g. one key for table/index encryption and another key
for WAL encryption), as the first step.
>
> > I think the actual code would help to explain how your approach is not
> > complexed.
>
> I provided quite some pseudo code, including some python. I'm not planning
> to redevelop the whole thing: my contribution is a review, currently about
> the overall design, then if somehow I agree on the design, I would look at
> the code more precisely.
Hmm, understood. Let's wait for comments from other members.
Regards,
--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi allHaving read through the discussion, I have some comments and suggestions that I would like to share.I think it is still quite early to even talk about external key management system even if it is running on the same host as PG. This is most likely achieved as an extension that can provide communication to external key server and it would be a separate project/discussion. I think the focus now is to finalize on the internal KMS design, and we can discuss about how to migrate internally managed keys to the external when the time is right.
As long as there exists a clean interface, and the "default" (internal) backend is a provider of said functionality, it'll be fine.
Given that having different KMS within a single instance (e.g. per database) is quite unlikely, I suggest just exposing hook-like function-pointer variables and be done with it. Requiring a preloaded library for this purpose doesn't seem too restrictive ---at least at this stage--- and can be very easily evolved in the future --- super-simple API which receives a struct made of function pointers, plus another function to reset it to "internal defaults" and that's it.
Key management system is generally built to manage the life cycle of cryptographic keys, so our KMS in my opinion needs to be built with key life cycle in mind such as:* Key generation
* key protection
* key storage* key rotation* key rewrap
* key disable/enable
* key destroy
Add the support functions for your suggested "key information" functionality, and that's a very rough first draft of the API ...
KMS should not perform the above life cycle management by itself automatically or hardcoded, instead it should expose some interfaces to the end user or even a backend process like TDE to trigger the above.The only key KMS should manage by itself is the KEK, which is derived from cluster passphrase value. This is fine in my opinion. This KEK should exist only within KMS to perform key protection (by wrapping) and key storage (save as file).
Asking for the "cluster password" is something better left optional / made easily overrideable ... or we risk thousands of clusters suddenly not working after a reboot.... :S
Just my .02€
Thanks,
J.L.
Hello Masahiko-san, > What I referred to "only one key" is KEK. Ok, sorry, I misunderstood. >>> Yeah, it depends on KMS, meaning we need different extensions for >>> different KMS. A KMS might support an interface that creates key if not >>> exist during GET but some KMS might support CREATE and GET separately. >> >> I disagree that it is necessary, but this is debatable. The KMS-side >> interface code could take care of that, eg: >> >> if command is "get X" >> if (X does not exist in KMS) >> create a new key stored in KMS, return it; >> else >> return KMS-stored key; >> ... >> >> So you can still have a "GET" only interface which adapts to the "final" >> KMS. Basically, the glue code which implements the interface for the KMS >> can include some logic to adapt to the KMS point of view. > > Is the above code is for the extension side, right? Such a code could be in the command with which pg communicates (eg through its stdin/stdout, or whatever) to get keys. pg talks to the command, the command can do anything, such as storing keys or communicating with an external service to retrieve them, anything really, that is the point. I'm advocating defining the pg/command protocol, something along "GET xxx" as you wrote, and possibly provide a possible/reasonable command implementation, which would be part of the code you put in your patch, only it would be in the command instead of postgres. > For example, if users want to use a cloud KMS, say AWS KMS, to store > DEKs and KEK they need an extension that is loaded to postgres and can > communicate with AWS KMS. I imagine that such extension needs to be > written in C, Why? I could write it in bash, probably. Ok, maybe not so good for suid, but in principle it could be anything. I'd probably write it in C, though. > the communication between the extension uses AWS KMS API, and the > communication between postgres core and the extension uses text > protocol. I'm not sure of the word "extension" above. For me the postgres side could be an extension as in "CREATE EXTENSION". The command itself could be provided in the extension code, but would not be in the "CREATE EXTENSION", it would be something run independently. > When postgres core needs a DEK identified by KEY-A, it asks > for the extension to get the DEK by passing something like “GET KEY-A” > message, and then the extension asks the existence of that key to AWK > KMS, creates if not exist and returns it to the postgres core. Is my > understanding right? Yes. The command in the use-case you outline would just be an intermediary, but for another use-case it would do the storing. The point of aiming at extensibility if that from pg point of view the external commands provide keys, but what these commands really do to do this can be anything. > When we have TDE feature in the future, we would also need to change > frontend tools such as pg_waldump and pg_rewind that read database > files so that they can read encrypted files. It means that these > frond-end tools also somehow need to communicate with the external > place to get DEKs in order to decrypt encrypted database files. In > your idea, what do you think about how we can support it? Hmmm. My idea was that the natural interface would be to get things through postgres. For a debug tool such as pg_waldump, probably it needs to be adapted if it needs to decrypt data to operate. Now I'm not sure I understood, because of the DEK are managed by postgres in your patch, waldump and other external commands would have no access to the decrypted data anyway, so the issue would be the same? With file-level encryption, obviously all commands which have to read and understand the files have to be adapted if they are to still work, which is another argument to have some interface rather than integrated server-side stuff, because these external commands would need to be able to get keys and use them as well. Or I misunderstood something. >> I'd like an extensible design to have anything in core. As I said in an >> other mail, if you want to handle a somehow restricted use case, just >> provide an external extension and do nothing in core, please. Put in core >> something that people with a slightly different use case or auditor can >> build on as well. The current patch makes a dozen hard-coded decisions >> which it should not, IMHO. > > It might have confused you that I included key manager and encryption > SQL functions to the patches but this key manager has been designed > dedicated to only TDE. Hmmm. This is NOT AT ALL what the patch does. The documentation in your patch talks about "column level encryption", which is an application thing. Now you seem to say that it does not matter and can be removed because the use case is elsewhere. > It might be better to remove both SQL interface > and SQL key we discussed from the patch set as they are actually not > necessary for TDE purposes. The documentation part of the patch, at no point, talks about TDE (transparent data encryption), which is a file-level encryption as far as I understand it, i.e. whole files are encrypted. I'm lost, because if you want to do that you cannot easily use padding/HMAC and so because they would change block sizes, and probably you would use CRT instead of CBC to be able to decrypt data selectively. So you certainly succeeded in confusing me deeply:-) > Aside from the security risk you mentioned, it was a natural design > decision for me that we have our key manager component in postgres core > that is responsible for managing encryption keys for our TDE. The patch really needs a README to explain what it really does, and why, and how, and what is the thread model, what are the choices (there should be as few as possible), how it can/could be extended. I've looked at the whole patch, and I could not find the place where files are actually encrypted/decrypted at a low level, that I would expect for file encryption implementation. > To make the key manager and TDE simple as much as possible, we discussed > that we will have cluster-wide TDE and key manager that manages a few > encryption keys used by TDE (e.g. one key for table/index encryption and > another key for WAL encryption), as the first step. Hmmm. Ok. So in fact all that is for TDE, *but* the patch does not do TDE, but provides a column-oriented SQL-level encryption, which is unrelated to your actual objective, which is to do file-level encryption in the end. However, for TDE, it may that you cannot do it with a pg extension because for the extension to work the database must work, which would require some "data" files not to be encrypted in the first place. That seems like a good argument to actually have something in core. Probably for TDE you only want the configuration file not to be encrypted. I'd still advocate to have the key management system possibly outside of pg, and have pg interact with it to get keys when needed. Probably key ids would be the relative file names in that case. The approach of externalizing encryption/decryption would be totally impractical for performance reasons, though. I see value in Cary Huang suggestion on the thread to have dynamically loaded functions implement an interface. That would at least allow to remove some hardcoded choices such as what cypher is actually used, key sizes, and so on. One possible implementation would be to manage things more or less internally as you do, another to fork an external command and talk with it to do the same. However, I do not share the actual interface briefly outlined: I do not thinkpg should have to care about key management functions such as rotation, generation or derivation, storage… the interest of pg should be limited to retrieving the keys it needs. That does not mean such functions do not have security value and should not be implemented, I'd say that it should not be visible/hardcoded in the pg/kms interface, especially if this interface is expected to be generic. As I see it, a pg/kms C-level loadable interface would provide the following function: // options would be supplied by a guc and allow to initialize the // interface with the relevant data, whatever the underlying // implementation needs. error kms_init(char *options); // associate opaque key identifier to a local id error kms_key(local_id int, int key_id_len, byte *key_id); or maybe something like: // would return the local id attributed to the key error/int kms_key(key_id_len, key_id); // the actual functions should be clarified // for TDE file-level, probably the encrypted length is the same as the // input, you cannot have padding, hmac or whatever added. // for SQL app-level, the rules could be different error kms_(en|de)crypt(local_id int, int mode, int len, byte *in, byte *out); // maybe error kms_key_forget(local_id int); error kms_destroy(…); // maybe, to allow extensibility and genericity // eg kms_command("rotate keys with new kek=123"); error kms_command(char *cmd); I'm a little bit unsure that there should be only one KMS active ever, though: a file-level vs app-level encryption could have quite different constraints. Also, should the app-level encryption be able to access keys loaded for file-level encryption? -- Fabien.
On Fri, 19 Jun 2020 at 15:44, Fabien COELHO <coelho@cri.ensmp.fr> wrote: > > > Hello Masahiko-san, > > > What I referred to "only one key" is KEK. > > Ok, sorry, I misunderstood. > > >>> Yeah, it depends on KMS, meaning we need different extensions for > >>> different KMS. A KMS might support an interface that creates key if not > >>> exist during GET but some KMS might support CREATE and GET separately. > >> > >> I disagree that it is necessary, but this is debatable. The KMS-side > >> interface code could take care of that, eg: > >> > >> if command is "get X" > >> if (X does not exist in KMS) > >> create a new key stored in KMS, return it; > >> else > >> return KMS-stored key; > >> ... > >> > >> So you can still have a "GET" only interface which adapts to the "final" > >> KMS. Basically, the glue code which implements the interface for the KMS > >> can include some logic to adapt to the KMS point of view. > > > > Is the above code is for the extension side, right? > > Such a code could be in the command with which pg communicates (eg through > its stdin/stdout, or whatever) to get keys. > > pg talks to the command, the command can do anything, such as storing keys > or communicating with an external service to retrieve them, anything > really, that is the point. > > I'm advocating defining the pg/command protocol, something along "GET xxx" > as you wrote, and possibly provide a possible/reasonable command > implementation, which would be part of the code you put in your patch, > only it would be in the command instead of postgres. > > > For example, if users want to use a cloud KMS, say AWS KMS, to store > > DEKs and KEK they need an extension that is loaded to postgres and can > > communicate with AWS KMS. I imagine that such extension needs to be > > written in C, > > Why? I could write it in bash, probably. Ok, maybe not so good for suid, > but in principle it could be anything. I'd probably write it in C, though. > > > the communication between the extension uses AWS KMS API, and the > > communication between postgres core and the extension uses text > > protocol. > > I'm not sure of the word "extension" above. For me the postgres side could > be an extension as in "CREATE EXTENSION". The command itself could be > provided in the extension code, but would not be in the "CREATE > EXTENSION", it would be something run independently. Oh, I imagined extensions that can be installed by CREATE EXTENSION or specifying it to shared_preload_libraries. If the command runs in the background to talk with postgres there are some problems that we need to deal with. For instance, what if the process downs? Does the postmaster re-execute it? How does it work in single-user mode? etc. It seems to me it will bring another complexity. > > > When postgres core needs a DEK identified by KEY-A, it asks > > for the extension to get the DEK by passing something like “GET KEY-A” > > message, and then the extension asks the existence of that key to AWK > > KMS, creates if not exist and returns it to the postgres core. Is my > > understanding right? > > Yes. The command in the use-case you outline would just be an > intermediary, but for another use-case it would do the storing. The point > of aiming at extensibility if that from pg point of view the external > commands provide keys, but what these commands really do to do this can be > anything. > > > When we have TDE feature in the future, we would also need to change > > frontend tools such as pg_waldump and pg_rewind that read database > > files so that they can read encrypted files. It means that these > > frond-end tools also somehow need to communicate with the external > > place to get DEKs in order to decrypt encrypted database files. In > > your idea, what do you think about how we can support it? > > Hmmm. My idea was that the natural interface would be to get things > through postgres. For a debug tool such as pg_waldump, probably it needs > to be adapted if it needs to decrypt data to operate. > > Now I'm not sure I understood, because of the DEK are managed by postgres > in your patch, waldump and other external commands would have no access to > the decrypted data anyway, so the issue would be the same? With the current patch, we will be able to add --cluster-passphase-command command-line option to front-end tools that want to read encrypted database files. The front-end tools execute the specified command to get KEK and unwrap DEKs. The functions such as running passphrase command, verifying the passphrase is correct, and getting wrapped DEKs from the database cluster are implemented in src/common so both can use these functions. > > With file-level encryption, obviously all commands which have to read and > understand the files have to be adapted if they are to still work, which > is another argument to have some interface rather than integrated > server-side stuff, because these external commands would need to be able > to get keys and use them as well. > > Or I misunderstood something. > > >> I'd like an extensible design to have anything in core. As I said in an > >> other mail, if you want to handle a somehow restricted use case, just > >> provide an external extension and do nothing in core, please. Put in core > >> something that people with a slightly different use case or auditor can > >> build on as well. The current patch makes a dozen hard-coded decisions > >> which it should not, IMHO. > > > > It might have confused you that I included key manager and encryption > > SQL functions to the patches but this key manager has been designed > > dedicated to only TDE. > > Hmmm. This is NOT AT ALL what the patch does. The documentation in your > patch talks about "column level encryption", which is an application > thing. Now you seem to say that it does not matter and can be removed > because the use case is elsewhere. > > > It might be better to remove both SQL interface > > and SQL key we discussed from the patch set as they are actually not > > necessary for TDE purposes. > > The documentation part of the patch, at no point, talks about TDE > (transparent data encryption), which is a file-level encryption as far as > I understand it, i.e. whole files are encrypted. I should have described the positioning of these patches.. The current patch is divided into 7 patches but there are two patches for different purposes. The first patch is to add an internal key manager. This is PostgreSQL’s internal component to manage cryptographic keys mainly for TDE. Originally I proposed both key manager and TDE[1] but these are actually independent each other and we can discuss them separately. Therefore, I started a new thread to discuss only the key manager. According to the discussion so far, since TDE doesn't need to dynamically register DEKs the key manager doesn't have an interface to register DEK for now. We cannot do anything with only the first patch but the plan is to implement TDE on top of the key manager. So we can implement the key manager patch and TDE patch separately but these actually depend on each other. The second patch is to make the key manager use even without TDE. The idea of this patch is to help the incremental development of TDE. There was a discussion that since the development of both the key manager and TDE will take a long time, it’s better to make the key manager work alone by providing SQL interfaces to it. This patch adds new SQL functions: pg_wrap() and pg_unwrap(), to wrap and unwrap the arbitrary user secret by the encryption key called SQL internal key which is managed and stored by the key manager. What this patch does is to register the SQL internal key to the key manager and add SQL functions that wrap and unwrap the given string in the same way of key wrapping used by the key manager. These functions renamed to pg_encrypt() and pg_decrypt(). Given that the purpose of the key manager is to help TDE, discussing the SQL interface part (i.g., the second patch) deviates from the original purpose. I think we should discuss the design and implementation of the key manager first and then other additional interfaces. So I’ve attached a new version patch and removed the second patch part so that we can focus on only the key manager part. > > I'm lost, because if you want to do that you cannot easily use > padding/HMAC and so because they would change block sizes, and probably > you would use CRT instead of CBC to be able to decrypt data selectively. > The patch introducing TDE will add CTR mode. The padding/HMAC is for only wrapping DEKs by the key manager. Since this patch adds only key manager it has only the encryption methods it needs. > So you certainly succeeded in confusing me deeply:-) > > > Aside from the security risk you mentioned, it was a natural design > > decision for me that we have our key manager component in postgres core > > that is responsible for managing encryption keys for our TDE. > > The patch really needs a README to explain what it really does, and why, > and how, and what is the thread model, what are the choices (there should > be as few as possible), how it can/could be extended. > > I've looked at the whole patch, and I could not find the place where files > are actually encrypted/decrypted at a low level, that I would expect for > file encryption implementation. As I explained above, the patch introduces only the key manager which will be a building block of TDE. > > > To make the key manager and TDE simple as much as possible, we discussed > > that we will have cluster-wide TDE and key manager that manages a few > > encryption keys used by TDE (e.g. one key for table/index encryption and > > another key for WAL encryption), as the first step. > > Hmmm. Ok. So in fact all that is for TDE, *but* the patch does not do TDE, > but provides a column-oriented SQL-level encryption, which is unrelated to > your actual objective, which is to do file-level encryption in the end. > > However, for TDE, it may that you cannot do it with a pg extension because > for the extension to work the database must work, which would require some > "data" files not to be encrypted in the first place. That seems like a > good argument to actually have something in core. > > Probably for TDE you only want the configuration file not to be encrypted. Yeah, for TDE, what we have discussed is to encrypt only tables, indexes, temporary files, and WAL (and possibly other database files that could have or help to infer user sensitive data). And each table/index page first several bytes in the page header are not encrypted. > > I'd still advocate to have the key management system possibly outside of > pg, and have pg interact with it to get keys when needed. Probably key ids > would be the relative file names in that case. The approach of > externalizing encryption/decryption would be totally impractical for > performance reasons, though. > > I see value in Cary Huang suggestion on the thread to have dynamically > loaded functions implement an interface. That would at least allow to > remove some hardcoded choices such as what cypher is actually used, key > sizes, and so on. One possible implementation would be to manage things > more or less internally as you do, another to fork an external command and > talk with it to do the same. > > However, I do not share the actual interface briefly outlined: I do not > thinkpg should have to care about key management functions such as > rotation, generation or derivation, storage… the interest of pg should be > limited to retrieving the keys it needs. That does not mean such functions > do not have security value and should not be implemented, I'd say that it > should not be visible/hardcoded in the pg/kms interface, especially if > this interface is expected to be generic. Since the current key manager is designed for only TDE or something encryption feature inside PostgreSQL, the usage of the key manager is limited. It’s minimum and simple but it has minimum extensible that PostgreSQL internal modules such as TDE can register their cryptographic key. I agree to have more an extensible feature if it is expected to cover generic use cases but currently it isn't. > > As I see it, a pg/kms C-level loadable interface would provide the > following function: > > // options would be supplied by a guc and allow to initialize the > // interface with the relevant data, whatever the underlying > // implementation needs. > error kms_init(char *options); > > // associate opaque key identifier to a local id > error kms_key(local_id int, int key_id_len, byte *key_id); > > or maybe something like: > > // would return the local id attributed to the key > error/int kms_key(key_id_len, key_id); > > // the actual functions should be clarified > // for TDE file-level, probably the encrypted length is the same as the > // input, you cannot have padding, hmac or whatever added. > // for SQL app-level, the rules could be different > error kms_(en|de)crypt(local_id int, int mode, int len, > byte *in, byte *out); > > // maybe > error kms_key_forget(local_id int); > error kms_destroy(…); > > // maybe, to allow extensibility and genericity > // eg kms_command("rotate keys with new kek=123"); > error kms_command(char *cmd); My first proposal implements a similar thing; having C-level loadable interface to get, generate, and rotating KEK. But the concern was the development cost and complexity vs benefit. And I don't think it's a good idea to support both key management and encryption/decryption as kms interface. > > I'm a little bit unsure that there should be only one KMS active ever, > though: a file-level vs app-level encryption could have quite different > constraints. Also, should the app-level encryption be able to access keys > loaded for file-level encryption? > You mean app-level encryption also uses encryption keys get from postgres? Regards, [1] https://www.postgresql.org/message-id/CAD21AoBjrbxvaMpTApX1cEsO%3D8N%3Dnc2xVZPB0d9e-VjJ%3DYaRnw%40mail.gmail.com -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Tue, 23 Jun 2020 at 22:46, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Fri, 19 Jun 2020 at 15:44, Fabien COELHO <coelho@cri.ensmp.fr> wrote: > > > > > > Hello Masahiko-san, > > > > > What I referred to "only one key" is KEK. > > > > Ok, sorry, I misunderstood. > > > > >>> Yeah, it depends on KMS, meaning we need different extensions for > > >>> different KMS. A KMS might support an interface that creates key if not > > >>> exist during GET but some KMS might support CREATE and GET separately. > > >> > > >> I disagree that it is necessary, but this is debatable. The KMS-side > > >> interface code could take care of that, eg: > > >> > > >> if command is "get X" > > >> if (X does not exist in KMS) > > >> create a new key stored in KMS, return it; > > >> else > > >> return KMS-stored key; > > >> ... > > >> > > >> So you can still have a "GET" only interface which adapts to the "final" > > >> KMS. Basically, the glue code which implements the interface for the KMS > > >> can include some logic to adapt to the KMS point of view. > > > > > > Is the above code is for the extension side, right? > > > > Such a code could be in the command with which pg communicates (eg through > > its stdin/stdout, or whatever) to get keys. > > > > pg talks to the command, the command can do anything, such as storing keys > > or communicating with an external service to retrieve them, anything > > really, that is the point. > > > > I'm advocating defining the pg/command protocol, something along "GET xxx" > > as you wrote, and possibly provide a possible/reasonable command > > implementation, which would be part of the code you put in your patch, > > only it would be in the command instead of postgres. > > > > > For example, if users want to use a cloud KMS, say AWS KMS, to store > > > DEKs and KEK they need an extension that is loaded to postgres and can > > > communicate with AWS KMS. I imagine that such extension needs to be > > > written in C, > > > > Why? I could write it in bash, probably. Ok, maybe not so good for suid, > > but in principle it could be anything. I'd probably write it in C, though. > > > > > the communication between the extension uses AWS KMS API, and the > > > communication between postgres core and the extension uses text > > > protocol. > > > > I'm not sure of the word "extension" above. For me the postgres side could > > be an extension as in "CREATE EXTENSION". The command itself could be > > provided in the extension code, but would not be in the "CREATE > > EXTENSION", it would be something run independently. > > Oh, I imagined extensions that can be installed by CREATE EXTENSION or > specifying it to shared_preload_libraries. > > If the command runs in the background to talk with postgres there are > some problems that we need to deal with. For instance, what if the > process downs? Does the postmaster re-execute it? How does it work in > single-user mode? etc. It seems to me it will bring another > complexity. > > > > > > When postgres core needs a DEK identified by KEY-A, it asks > > > for the extension to get the DEK by passing something like “GET KEY-A” > > > message, and then the extension asks the existence of that key to AWK > > > KMS, creates if not exist and returns it to the postgres core. Is my > > > understanding right? > > > > Yes. The command in the use-case you outline would just be an > > intermediary, but for another use-case it would do the storing. The point > > of aiming at extensibility if that from pg point of view the external > > commands provide keys, but what these commands really do to do this can be > > anything. > > > > > When we have TDE feature in the future, we would also need to change > > > frontend tools such as pg_waldump and pg_rewind that read database > > > files so that they can read encrypted files. It means that these > > > frond-end tools also somehow need to communicate with the external > > > place to get DEKs in order to decrypt encrypted database files. In > > > your idea, what do you think about how we can support it? > > > > Hmmm. My idea was that the natural interface would be to get things > > through postgres. For a debug tool such as pg_waldump, probably it needs > > to be adapted if it needs to decrypt data to operate. > > > > Now I'm not sure I understood, because of the DEK are managed by postgres > > in your patch, waldump and other external commands would have no access to > > the decrypted data anyway, so the issue would be the same? > > With the current patch, we will be able to add > --cluster-passphase-command command-line option to front-end tools > that want to read encrypted database files. The front-end tools > execute the specified command to get KEK and unwrap DEKs. The > functions such as running passphrase command, verifying the passphrase > is correct, and getting wrapped DEKs from the database cluster are > implemented in src/common so both can use these functions. > > > > > With file-level encryption, obviously all commands which have to read and > > understand the files have to be adapted if they are to still work, which > > is another argument to have some interface rather than integrated > > server-side stuff, because these external commands would need to be able > > to get keys and use them as well. > > > > Or I misunderstood something. > > > > >> I'd like an extensible design to have anything in core. As I said in an > > >> other mail, if you want to handle a somehow restricted use case, just > > >> provide an external extension and do nothing in core, please. Put in core > > >> something that people with a slightly different use case or auditor can > > >> build on as well. The current patch makes a dozen hard-coded decisions > > >> which it should not, IMHO. > > > > > > It might have confused you that I included key manager and encryption > > > SQL functions to the patches but this key manager has been designed > > > dedicated to only TDE. > > > > Hmmm. This is NOT AT ALL what the patch does. The documentation in your > > patch talks about "column level encryption", which is an application > > thing. Now you seem to say that it does not matter and can be removed > > because the use case is elsewhere. > > > > > It might be better to remove both SQL interface > > > and SQL key we discussed from the patch set as they are actually not > > > necessary for TDE purposes. > > > > The documentation part of the patch, at no point, talks about TDE > > (transparent data encryption), which is a file-level encryption as far as > > I understand it, i.e. whole files are encrypted. > > I should have described the positioning of these patches.. The current > patch is divided into 7 patches but there are two patches for > different purposes. > > The first patch is to add an internal key manager. This is > PostgreSQL’s internal component to manage cryptographic keys mainly > for TDE. Originally I proposed both key manager and TDE[1] but these > are actually independent each other and we can discuss them > separately. Therefore, I started a new thread to discuss only the key > manager. According to the discussion so far, since TDE doesn't need to > dynamically register DEKs the key manager doesn't have an interface to > register DEK for now. We cannot do anything with only the first patch > but the plan is to implement TDE on top of the key manager. So we can > implement the key manager patch and TDE patch separately but these > actually depend on each other. > > The second patch is to make the key manager use even without TDE. The > idea of this patch is to help the incremental development of TDE. > There was a discussion that since the development of both the key > manager and TDE will take a long time, it’s better to make the key > manager work alone by providing SQL interfaces to it. This patch adds > new SQL functions: pg_wrap() and pg_unwrap(), to wrap and unwrap the > arbitrary user secret by the encryption key called SQL internal key > which is managed and stored by the key manager. What this patch does > is to register the SQL internal key to the key manager and add SQL > functions that wrap and unwrap the given string in the same way of key > wrapping used by the key manager. These functions renamed to > pg_encrypt() and pg_decrypt(). > > Given that the purpose of the key manager is to help TDE, discussing > the SQL interface part (i.g., the second patch) deviates from the > original purpose. I think we should discuss the design and > implementation of the key manager first and then other additional > interfaces. So I’ve attached a new version patch and removed the > second patch part so that we can focus on only the key manager part. > Since the previous patch sets conflicts with the current HEAD, I've attached the rebased patch set. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Fri, Jul 31, 2020 at 04:06:38PM +0900, Masahiko Sawada wrote: > Since the previous patch sets conflicts with the current HEAD, I've > attached the rebased patch set. Patch 0002 fails to apply, so a rebase is needed. -- Michael
Attachment
On Fri, Jul 31, 2020 at 04:06:38PM +0900, Masahiko Sawada wrote: > > Given that the purpose of the key manager is to help TDE, discussing > > the SQL interface part (i.g., the second patch) deviates from the > > original purpose. I think we should discuss the design and > > implementation of the key manager first and then other additional > > interfaces. So I’ve attached a new version patch and removed the > > second patch part so that we can focus on only the key manager part. > > > > Since the previous patch sets conflicts with the current HEAD, I've > attached the rebased patch set. I have updated the attached patch and am hoping to move this feature forward. The changes I made are: * handle merge conflicts * changed ssl initialization to match other places in our code * changed StrNCpy() to strlcpy * update the docs The first three were needed to get it to compile. I then ran some tests using the attached shell script as my password script. First, I found that initdb called the script twice. The first call worked fine, but the second call would accept a password that didn't match the first call. This is because there are no keys defined, so there is nothing for kmgr_verify_passphrase() to check for passkey verification, so it just succeeds. In fact, I can't figure out how to create any keys with the patch, and pg_encrypt() is documented, but not defined anywhere. Second, in testing starting/stopping the server, pg_ctl doesn't allow the cluster_passphrase_command to read from /dev/tty, which I think is a requirement because the command could likely require a user-supplied unlock key, even if that is not the actual passphrase, just like ssl keys. This is because pg_ctl calls setsid() just before calling execl() to start the server, and setsid() disassociates itself from the controlling terminal. I think the fix is to remove setsid() from pg_ctl and add a postmaster flag to call setsid() after it has potentially called cluster_passphrase_command, and pg_ctl would use that flag. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Attachment
Bruce Momjian <bruce@momjian.us> writes: > Second, in testing starting/stopping the server, pg_ctl doesn't allow > the cluster_passphrase_command to read from /dev/tty, which I think is a > requirement because the command could likely require a user-supplied > unlock key, even if that is not the actual passphrase, just like ssl > keys. This is because pg_ctl calls setsid() just before calling execl() > to start the server, and setsid() disassociates itself from the > controlling terminal. I think the fix is to remove setsid() from pg_ctl > and add a postmaster flag to call setsid() after it has potentially > called cluster_passphrase_command, and pg_ctl would use that flag. We discussed that and rejected it in the thread leading up to bb24439ce [1]. The primary problem being that it's not very clear when the postmaster should daemonize itself, and later generally isn't better. I doubt that this proposal is doing anything to clarify that situation. regards, tom lane [1] https://www.postgresql.org/message-id/flat/CAEET0ZH5Bf7dhZB3mYy8zZQttJrdZg_0Wwaj0o1PuuBny1JkEw%40mail.gmail.com
On Fri, Oct 16, 2020 at 04:56:47PM -0400, Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: > > Second, in testing starting/stopping the server, pg_ctl doesn't allow > > the cluster_passphrase_command to read from /dev/tty, which I think is a > > requirement because the command could likely require a user-supplied > > unlock key, even if that is not the actual passphrase, just like ssl > > keys. This is because pg_ctl calls setsid() just before calling execl() > > to start the server, and setsid() disassociates itself from the > > controlling terminal. I think the fix is to remove setsid() from pg_ctl > > and add a postmaster flag to call setsid() after it has potentially > > called cluster_passphrase_command, and pg_ctl would use that flag. > > We discussed that and rejected it in the thread leading up to > bb24439ce [1]. The primary problem being that it's not very clear > when the postmaster should daemonize itself, and later generally > isn't better. I doubt that this proposal is doing anything to > clarify that situation. Agreed. No reason to destablize the postmaster for this. What about having pg_ctl open /dev/tty, and then pass in an open file descriptor that is a copy of /dev/tty, that can be closed by the postmaster after the cluster_passphrase_command? I just tested this and it worked. I am thinking we would pass the file descriptor number to the postmaster via a command-line argument. Ideally we would pass in the device name of /dev/tty, but I don't know of a good way to do that. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
On Sat, 17 Oct 2020 at 05:24, Bruce Momjian <bruce@momjian.us> wrote: > > On Fri, Jul 31, 2020 at 04:06:38PM +0900, Masahiko Sawada wrote: > > > Given that the purpose of the key manager is to help TDE, discussing > > > the SQL interface part (i.g., the second patch) deviates from the > > > original purpose. I think we should discuss the design and > > > implementation of the key manager first and then other additional > > > interfaces. So I’ve attached a new version patch and removed the > > > second patch part so that we can focus on only the key manager part. > > > > > > > Since the previous patch sets conflicts with the current HEAD, I've > > attached the rebased patch set. > > I have updated the attached patch and am hoping to move this feature > forward. The changes I made are: > > * handle merge conflicts > * changed ssl initialization to match other places in our code > * changed StrNCpy() to strlcpy > * update the docs Thank you for updating the patch! > > The first three were needed to get it to compile. I then ran some tests > using the attached shell script as my password script. First, I found > that initdb called the script twice. The first call worked fine, but > the second call would accept a password that didn't match the first > call. This is because there are no keys defined, so there is nothing > for kmgr_verify_passphrase() to check for passkey verification, so it > just succeeds. In fact, I can't figure out how to create any keys with > the patch, The patch introduces only key management infrastructure but with no key. Currently, there is no interface to dynamically add a new encryption key. We need to add the new key ID to internalKeyLengths array and increase KMGR_MAX_INTERNAL_KEY. The plan was to add a subsequent patch that adds functionality using encryption keys managed by the key manager. The patch was to add two SQL functions: pg_wrap() and pg_unwrap(), along with the internal key wrap key. IIUC, what to integrate with the key manager is still an open question. The idea of pg_wrap() and pg_unwrap() seems good but it still has the problem that the password could be logged to the server log when the user wraps it. Of course, since the key manager is originally designed for cluster-wide transparent encryption, TDE will be one of the users of the key manager. But there was a discussion it’s better to introduce the key manager first and to make it have small functions or integrate with existing features such as pgcrypto because TDE development might take time over the releases. So I'm thinking to deal with the problem the pg_wrap idea has or to make pgcrypto use the key manager so that it doesn't require the user to pass the password as a function argument. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
The patch introduces only key management infrastructure but with no
key. Currently, there is no interface to dynamically add a new
encryption key.
Greetings, * Craig Ringer (craig.ringer@enterprisedb.com) wrote: > On Mon, Oct 19, 2020 at 11:16 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > The patch introduces only key management infrastructure but with no > > key. Currently, there is no interface to dynamically add a new > > encryption key. > > I'm a bit confused by the exact intent and use cases behind this patch. > https://www.postgresql.org/message-id/17156d2e419.12a27f6df87825.436300492203108132%40highgo.ca > that was somewhat helpful but not entirely clear. > > The main intent of this proposal seems to be to power TDE-style encryption > of data at rest, with a single master key for the entire cluster. Has any > consideration been given to user- or role-level key management as part of > this, or is that expected to be done separately and protected by the master > key supplied by this patch? I've not been following very closely, but I definitely agree with the general feedback here (more on that below), but to this point- I do believe that was the intent, or at least I sure hope that it was. Being able to have user/role keys would certainly be good. Having a way for a user to log in and unlock their key would also be really nice. > If so, what if I have a HSM (or virtualised or paravirt or network proxied > HSM) that I want to use to manage my database keys, such that the database > master key is protected by the HSM? Say I want to put my database key in a > smartcard, my machine's TPM, a usb HSM, a virtual HSM provided by my > VM/cloud platform, etc? > > As far as I can tell with the current design I'd have to encrypt my unlock > passphrase and put it in the cluster_passphrase_command script or its > arguments. The script would ask the HSM to decrypt the key passphrase and > write that over stdio where Pg would read it and use it to decrypt the > master key(s). That would work - but it should not be necessary and it > weakens the protection offered by the HSM considerably. Yeah, I do think this is how you'd need to do it and I agree that it'd be better to offer an option that can go to the HSM directly. That said- I don't think we necessarily want to throw out tho command-based option, as users may wish to use a vaulting solution or similar instead of an HSM. What I am curious about though- what are the thoughts around using a vaulting solution's command-line tool vs. writing code to work with an API? Between these various options, what are the risks of having a script vs. using an API and would one or the other weaken the overall solution? Or is what's really needed here is a way to tell us if it's a passphrase we're getting or a proper key, regardless of the method being used to fetch it? > I suggest we allow the user to supply their own KEK via a > cluster_encryption_key GUC. If set, Pg would create an SSLContext with the > supplied key and use that SSLContext to decrypt the application keys - with > no intermediate KEK-derivation based on cluster_passphrase_command > performed. cluster_encryption_key could be set to an openssl engine URI, in > which case OpenSSL would transparently use the supplied engine (usually a > HSM) to decrypt the application keys. We'd install the > cluster_passphrase_command as an openssl askpass callback so that if the > HSM requires an unlock password it can be provided - like how it's done for > libpq in Pg 13. Some thought is required for how to do key rotation here, > though it matters a great deal less when a HSM is managing key escrow. This really locks us into OpenSSL for this, which I don't particularly like. If we do go down this route, we should definitely make it clear that this is for use when PG has been built with OpenSSL, ie: openssl_cluster_encryption_key as the parameter name, or such. > For example if I want to lock my database with a YubiHSM I would configure > something like: > > cluster_encryption_key = 'pkcs11:token=YubiHSM;id=0:0001;type=private' > > The DB would be encrypted and decrypted using application keys unlocked by > the HSM. Backups of the database, stolen disk images, etc, would be > unreadable unless you have access to another HSM with the same key loaded. Well, you would surely just need the key, since you could change the PG config to fetch the key from whereever you have it, you wouldn't need an actual HSM.. > If cluster_encryption_key is unset, Pg would perform its own KEK derivation > based on cluster_passphrase_command as currently implemented. To what I was suggesting above- what if we just had a GUC that's "kek_method" with options 'passphrase' and 'direct', where passphrase goes through KEK and 'direct' doesn't, which just changes how we treat the results of called cluster_passphrase_command? > I really don't think we should be adopting something that doesn't consider > platform based hardware key escrow and protection. I agree that we should consider platform based hardware key escrow and protection. I'm generally supportive of trying to do so in a way that keeps things very flexible for users without us having to write a lot of code that's either library-specific or solution-specific. Thanks, Stephen
Attachment
On Mon, Oct 26, 2020 at 11:02 PM Stephen Frost <sfrost@snowman.net> wrote: TL;DR: * Important to check that key rotation is possible on a replica, i.e. primary and standby can have different cluster passphrase and KEK encrypting the same WAL and heap keys; * with a HSM we can't read the key out, so a pluggable KEK operations context or a configurable URI for the KEK is necessary * I want the SQL key and SQL wrap/unwrap part in a separate patch, I don't think it's fully baked and oppose its inclusion in its current form * Otherwise this looks good so far Explanation and argument for why below. > I've not been following very closely, but I definitely agree with the > general feedback here (more on that below), but to this point- I do > believe that was the intent, or at least I sure hope that it was. Being > able to have user/role keys would certainly be good. Having a way for a > user to log in and unlock their key would also be really nice. Right. AFAICS this is supposed to provide the foundation layer for whole-cluster encryption, and it looks ok for that, caveat about HSMs aside. I see nothing wrong with using a single key for heap (and one for WAL, or even the same key). Finer grained and autovacuum etc becomes seriously painful. I want to take a closer look at how the current implementation will play with physical replication. I assume the WAL and heap keys have to be constant for the full cluster lifetime, short of a dump and reload. But I want to make sure that the KEK+HMAC can differ from one node to another, i.e. that we can perform KEK rotation on a replica to re-encrypt the WAL and heap keys against a new KEK. This is important for backups, and also for effective use of a HSM where we may not want or be able to have the same key on a primary and its replicas. If it isn't already supported it looks like it should be simple, but it's IMO important not to miss. The main issue I have so far is that I don't think the SQL key actually fits well with the current proposal. Its proposed interface and use cases are incomplete, it doesn't fully address key leak risks, there's no user access control, etc. Also the SQL key part could be implemented on top of the base cluster encryption part, I don't see why it actually has to integrate with the whole-cluster key management directly. SQL KEY ---- I'm not against the SQL key and wrap/unwrap functionality - quite the contrary, I think it's really important to have something like it. But is it appropriate to have a single, fixed-for-cluster-lifetime key for this, one with no SQL-level access control over who can or cannot use it, etc? The material encrypted with the key is user-exposed so key rotation is an issue, but is not addressed here. And the interface doesn't really solve the numerous potential problems with key material leaks through logs and error messages. I just think that if we bake in the proposed user visible wrap/unwrap interface now we're going to regret it later. How will it work when we want to add user- or role- level access control for database-stored keys? When we want to introduce a C-level API for extensions to work directly with encrypted data like they can work currently with TOASTed data, to prevent decrypted data from ever becoming SQL function arguments subject to possible leakage and to allow implementation of always-encrypted data types, etc? Most importantly - I don't think the SQL key adds anything really crucial that we cannot do at the SQL level with an extension. An extension "pg_wrap" could provide pg_wrap() and pg_unwrap() already, using a single master key much like the SQL key proposed in this patch. To store the master key it could: 1. Derive the key at shared_preload_libraries time in the same way this key management system proposes to do so, using a pg_wrap.pg_wrap_passphrase_command ; or 2. Read the key from a PEM file on disk, accepting a passphrase to decrypt it via a password command GUC or an SQL-level function call; or 3. Read the key from a pg_wrap.pg_wrap_keys extension catalog, which is superuser-only and which is protected by whole-db encryption if in use; 4. Like (3) but use generic WAL and a custom relation to make the in-db key store opaque without use of extensions like pageinspect. That way we haven't baked some sort of limited wrap/unwrap into Pg's long term user visible API. I'd be totally happy for such a SQL key wrap/unwrap to become part of pgcrypto, or a separate extension that uses pgcrypto, if you're worried about having it available to users. I just don't really want it in src/backend in its current form. To simplify (1) we could make the implementation of the KEK/HMAC derivation accessible from extensions and allow them to provide their own password callback, though that might make life harder if we want to change things later, and it'd mean that you couldn't use the extension on a db that was not configured for whole db encryption. So I'd actually rather make key derivation and storage the extension's problem for now. OTHER TRANSPARENT ENCRYPTION USE CASES ---- Does this patch get in the way of supporting other kinds of transparent encryption that are frequently requested and are in use on other systems already? I don't think so. Whole-cluster encryption is quite separate and the proposed patch doesn't seem to do anything that'd make table-, row- or column-level encryption, per-user key management, etc any harder. Specific use cases I looked at: * Finer grained keying than whole-cluster for transparent encryption-at-rest. As soon as we have relations that require user session supplied information to allow the backend to read the relation we get into a real mess with autovacuum, logical decoding, etc. So if anyone wants to implement that sorts of thing they're probably going to want to do so separately to block-level whole-cluster encryption, in a way that preserves the normal page and page item structure and encrypts the row data only. * Client-driver-assisted transparently encrypted at-rest-and-in-transit data, where the database engine doesn't have the encrypt/decrypt keys at all. Again in this case they're going to have to do that at the row level or column level, not the block (relfilenode extents and WAL) level, otherwise we can't provide autovacuum etc. > > If so, what if I have a HSM [...] > > [...] I agree that it'd > be better to offer an option that can go to the HSM directly. Right. > That > said- I don't think we necessarily want to throw out tho command-based > option, as users may wish to use a vaulting solution or similar instead > of an HSM. I agree. I wasn't proposing to throw out the command based approach, just provide a way to inform postgres that it should do operations with the KEK using an external engine instead of deriving its own KEK from a passphrase and other inputs. > What I am curious about though- what are the thoughts around > using a vaulting solution's command-line tool vs. writing code to work > with an API? I think the code that fetches the cluster passphrase from a command should be interceptable by a hook, so organisations with Wacky Security Policies Written By People Who Have Heard About Computers But Never Used One can jump through the necessary hoops. I am of course absolutely not speaking from experience here, no, not at all... see ssl_passphrase_function in src/backend/libpq/be-secure-openssl.c, and see src/test/modules/ssl_passphrase_callback/ssl_passphrase_func.c . So I suggest something like that - a hook that by default calls an external command but can by overridden by an extension. It wouldn't be openssl specific like the server key passphrase example though. That was done with an openssl specific hook because we don't know if we're going to need a passphrase at all until openssl has opened the key. In the cluster encryption case we'll know if we're doing our own KEK+HMAC generation or not without having to ask the SSL library. > Between these various options, what are the risks of > having a script vs. using an API and would one or the other weaken the > overall solution? Or is what's really needed here is a way to tell us > if it's a passphrase we're getting or a proper key, regardless of the > method being used to fetch it? For various vault systems I don't think it matters at all whether the secret they manage is the key, input used to generate the key, or input used to decrypt a key stored elsewhere. Either way they have the pivotal secret. So I don't see much point allowing the command to return a fully formed key. The point of a HSM that you don't get to read the key. Pg can never read the key, it can only perform encrypt and decrypt operations on the key using the HSM via the SSL library: Pg -> openssl: "this is the ciphertext of the wal_key. Please decrypt it for me." openssl -> engine layer "engine, please decrypt this" pkcs#11 engine-> pkcs#11 provider: "please decrypt this" pkcs#11 provider -> HSM-specific libraries, network proxies, whatever: "please decrypt this" "... here's the plaintext" <- flows back up So the KEK used to encrypt the main cluster keys for heap and wal encryption is never readable by Pg. It usually never enters host memory - in the case of a HSM, the ciphertext is sent over USB or PCIe to the HSM and the cleartext comes back. In openssl, the default engine is file-based with host software crypto implementations. You can specify alternate engines using various OpenSSL APIs, or you can specify them by supplying a URI where you'd usually supply a file path to a key. I'm proposing we make it easy to supply a key URI and let openssl handle the engine etc. It's far from perfect, and it's really meant as a fallback to allow apps that don't natively understand SSL engines etc to still use them in a limited capacity. What I'd *prefer* to do is make the function that sets up the KEK hookable. So by default we'd call a function that'd read the external passphrase from a command use that to generate KEK+HMAC. But an extension hook installed at shared_preload_libraries time could override the behaviour completely and return its own implementation. In the phrasing of the patch as written that'd probably be a hook that returns its own pg_cipher_ctx* , overriding the default implementation of pg_cipher_ctx * pg_cipher_ctx_create(void); . For a HSM the returned context would delegate operations to the HSM. > This really locks us into OpenSSL for this, which I don't particularly > like. We're pretty locked into openssl already. I don't like it either, it was just the option that has the least impact/delay on the main work on this patch. I'd rather abstract KEK operations behind a context object-like struct with function pointer members, like we do in many other places in Pg. Make the default one do the dance of reading the external passphrase and generating the KEK on the fly. Allow plugins to override it with their own, and let them set it up to delegate to a HSM or whatever else they want. Then ship a simple openssl based default implementation of HSM support that can be shoved in shared_preload_libraries. Or if we don't like using s_p_l, add a separate GUC for cluster_encryption_key_manager or whatever, and a different entrypoint, instead of having s_p_l call _PG_init() to register a hook. > > For example if I want to lock my database with a YubiHSM I would configure > > something like: > > > > cluster_encryption_key = 'pkcs11:token=YubiHSM;id=0:0001;type=private' > > > > The DB would be encrypted and decrypted using application keys unlocked by > > the HSM. Backups of the database, stolen disk images, etc, would be > > unreadable unless you have access to another HSM with the same key loaded. > > Well, you would surely just need the key, since you could change the PG > config to fetch the key from wherever you have it, you wouldn't need an > actual HSM. Right - if your HSM was programmed by generating a key and storing that into the HSM and you have that key backed up in file form somewhere, you could likely put it in a pem file and use that directly by pointing Pg at the file instead of an engine URI. But you might not even have the key. In some HSM implementations the key is completely sealed - you can program new HSMs to have the same key by using the same configuration, but you cannot actually obtain the key short of attacks on the HSM hardware itself. That's very much by design - the HSM configuration is usually on an air-gapped system, and it isn't sufficient to decrypt anything unless you also have access to a copy of the HSM hardware itself. Obviously you accept the risks if you take that approach, and you must have an escape route where you can re-encrypt the material protected by the HSM against some other key. But it's not at all uncommon. Key rotation is obviously vital to make this vaguely sane. In Pg's case you'd to change the key configuration, then trigger a key rotation step, which would decrypt with a context obtained from the old config then encrypt with a context obtained from the new config. > > If cluster_encryption_key is unset, Pg would perform its own KEK derivation > > based on cluster_passphrase_command as currently implemented. > > To what I was suggesting above- what if we just had a GUC that's > "kek_method" with options 'passphrase' and 'direct', where passphrase > goes through KEK and 'direct' doesn't, which just changes how we treat > the results of called cluster_passphrase_command? That won't work for a HSM. It is not possible to extract the key. "direct" cannot be implemented.
On Mon, Oct 26, 2020 at 10:05:10PM +0800, Craig Ringer wrote: > For example if I want to lock my database with a YubiHSM I would configure > something like: > > cluster_encryption_key = 'pkcs11:token=YubiHSM;id=0:0001;type=private' Well, openssl uses a prefix before the password string, e.g.: * pass:password * env:var * file:pathname * fd:number * stdin See 'man openssl'. I always thought that API was ugly, but I now see the value in it. We could implement a 'command:' prefix now, and maybe a 'pass:' one, and allow other methods like 'pkcs11' later. I can also imagine using the 'file' one to allow the key to be placed on an encrypted file system that has to be mounted for Postgres to start. You could also have the key on a USB device that has to be inserted to be used, and the 'file' is on the USB key --- seems clearer than having to create a script to 'cat' the file. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
On Tue, Oct 27, 2020 at 03:07:22PM +0800, Craig Ringer wrote: > On Mon, Oct 26, 2020 at 11:02 PM Stephen Frost <sfrost@snowman.net> wrote: > > > TL;DR: > > * Important to check that key rotation is possible on a replica, i.e. > primary and standby can have different cluster passphrase and KEK > encrypting the same WAL and heap keys; > * with a HSM we can't read the key out, so a pluggable KEK operations > context or a configurable URI for the KEK is necessary > * I want the SQL key and SQL wrap/unwrap part in a separate patch, I > don't think it's fully baked and oppose its inclusion in its current > form > * Otherwise this looks good so far > > Explanation and argument for why below. > > > I've not been following very closely, but I definitely agree with the > > general feedback here (more on that below), but to this point- I do > > believe that was the intent, or at least I sure hope that it was. Being > > able to have user/role keys would certainly be good. Having a way for a > > user to log in and unlock their key would also be really nice. > > Right. AFAICS this is supposed to provide the foundation layer for > whole-cluster encryption, and it looks ok for that, caveat about HSMs > aside. I see nothing wrong with using a single key for heap (and one > for WAL, or even the same key). Finer grained and autovacuum etc > becomes seriously painful. You need to use separate keys for heap/index and WAL so you can replicate to another server that uses a different heap/index key, but the same WAL. You can then fail-over to the replica and change the WAL key to complete full key rotation. The replication protocol needs to decrypt, and the receiving end has to encrypt with a different heap/index key. This is the key rotation method this is planned. This is another good reason the keys should be in a separate directory so they can be easily copied or replaced. > I want to take a closer look at how the current implementation will > play with physical replication. I assume the WAL and heap keys have to > be constant for the full cluster lifetime, short of a dump and reload. The WAL key can change if you are willing to stop/start the server. You only read the WAL during crash recovery. > The main issue I have so far is that I don't think the SQL key > actually fits well with the current proposal. Its proposed interface > and use cases are incomplete, it doesn't fully address key leak risks, > there's no user access control, etc. Also the SQL key part could be > implemented on top of the base cluster encryption part, I don't see > why it actually has to integrate with the whole-cluster key management > directly. Agreed. Maybe we should just focus on the TDE use now. I do think the current patch is not commitable since, because there are no defined keys, there is no way to validate the boot-time password. The no-key case should be an unsupported configuration. Maybe we need to just create one key just to verify the boot password. > > SQL KEY > ---- > > I'm not against the SQL key and wrap/unwrap functionality - quite the > contrary, I think it's really important to have something like it. But > is it appropriate to have a single, fixed-for-cluster-lifetime key for > this, one with no SQL-level access control over who can or cannot use > it, etc? The material encrypted with the key is user-exposed so key > rotation is an issue, but is not addressed here. And the interface > doesn't really solve the numerous potential problems with key material > leaks through logs and error messages. > > I just think that if we bake in the proposed user visible wrap/unwrap > interface now we're going to regret it later. How will it work when we > want to add user- or role- level access control for database-stored > keys? When we want to introduce a C-level API for extensions to work > directly with encrypted data like they can work currently with TOASTed > data, to prevent decrypted data from ever becoming SQL function > arguments subject to possible leakage and to allow implementation of > always-encrypted data types, etc? > > Most importantly - I don't think the SQL key adds anything really > crucial that we cannot do at the SQL level with an extension. An > extension "pg_wrap" could provide pg_wrap() and pg_unwrap() already, > using a single master key much like the SQL key proposed in this > patch. To store the master key it could: The idea of the SQL key was to give the boot key a use, but I am now seeing that the SQL key is just holding us back, and is preventing the boot testing that is a requirement. Maybe we just need to forget the SQL part and focus on the TDE usage now, and come back to the SQL part. I am also not 100% clear on the usefulness of the SQL key. > OTHER TRANSPARENT ENCRYPTION USE CASES > ---- > > Does this patch get in the way of supporting other kinds of > transparent encryption that are frequently requested and are in use on > other systems already? > > I don't think so. Whole-cluster encryption is quite separate and the > proposed patch doesn't seem to do anything that'd make table-, row- or > column-level encryption, per-user key management, etc any harder. I think those all are very different and will require more user-level features that what is being done here. > Specific use cases I looked at: > > * Finer grained keying than whole-cluster for transparent > encryption-at-rest. As soon as we have relations that require user > session supplied information to allow the backend to read the relation > we get into a real mess with autovacuum, logical decoding, etc. So if > anyone wants to implement that sorts of thing they're probably going > to want to do so separately to block-level whole-cluster encryption, > in a way that preserves the normal page and page item structure and > encrypts the row data only. Agreed. > * Client-driver-assisted transparently encrypted > at-rest-and-in-transit data, where the database engine doesn't have > the encrypt/decrypt keys at all. Again in this case they're going to > have to do that at the row level or column level, not the block > (relfilenode extents and WAL) level, otherwise we can't provide > autovacuum etc. Yes, this is all going to have to be user-level. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
We could implement a 'command:' prefix now, and maybe
a 'pass:' one, and allow other methods like 'pkcs11' later.
On Tue, Oct 27, 2020 at 10:02:53PM +0800, Craig Ringer wrote: > On Tue, 27 Oct 2020, 19:15 Bruce Momjian, <bruce@momjian.us> wrote: > We don't need to do anything except provide a way to tell OpenSSL where to get > the KEK from, for situations where having Pg generate it internally > undesirable. > > I proposed a simple GUC that we could supply to OpenSSL as a key path because > it's simple. It's definitely not best. > > In my prior mail I outlined what I think is a better way. Abstract key key > initialisation - passphrase fetching KEK/HMAC loading and all of it - behind a > pluggable interface. Looking at the patch, it's mostly there already. We just > need a way to hook the key loading and setup so it can be overridden to use > whatever method is required. Then KEK operations to encrypt and decrypt the > heap and WAL keys happen via that abstraction. > > That way Pg does not have to care about the details of hardware key management, > PKCS#11 or OpenSSL engines, etc. > > A little thought is needed to make key rotation work well. Especially when you > want to switch from cluster passphrase to a plugin that supports use of a HVM > escrowed key, or vice versa. > > But most of what's needed looks like it's there already. It's just down to > making sure the key loading and initialisation is overrideable. I don't know much about how to hook into that stuff so if you have an idea, I am all ears. I have used OpenSSL with Yubikey via pksc11. You can see the use of it on slide 57 and following: https://momjian.us/main/writings/crypto_hw_config.pdf#page=57 Interestingly, that still needed the user to type in a key to unlock the Yubikey, so we might need PKCS11 and a password for the same server start. I would like to get this moving forward so I will work on the idea of passing an open /dev/tty file descriptor from pg_ctl to the postmaster on start. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
On Tue, Oct 27, 2020 at 10:20:35AM -0400, Bruce Momjian wrote: > I don't know much about how to hook into that stuff so if you have an > idea, I am all ears. I have used OpenSSL with Yubikey via pksc11. You > can see the use of it on slide 57 and following: > > https://momjian.us/main/writings/crypto_hw_config.pdf#page=57 > > Interestingly, that still needed the user to type in a key to unlock the > Yubikey, so we might need PKCS11 and a password for the same server > start. > > I would like to get this moving forward so I will work on the idea of > passing an open /dev/tty file descriptor from pg_ctl to the postmaster > on start. Here is an updated patch that uses an argument to pass an open /dev/tty file descriptor to the postmaster. It uses -R for initdb/pg_ctl, -R ### for postmaster/postgres, and %R for cluster_passphrase_command. Here is a sample session: --> $ initdb -R --cluster-passphrase-command '/tmp/pass_fd.sh "%p" %R' The files belonging to this database system will be owned by user "postgres". This user must also own the server process. The database cluster will be initialized with locale "en_US.UTF-8". The default database encoding has accordingly been set to "UTF8". The default text search configuration will be set to "english". Data page checksums are disabled. Key management system is enabled. fixing permissions on existing directory /u/pgsql/data ... ok creating subdirectories ... ok selecting dynamic shared memory implementation ... posix selecting default max_connections ... 100 selecting default shared_buffers ... 128MB selecting default time zone ... America/New_York creating configuration files ... ok running bootstrap script ... --> Enter database encryption pass phrase: B1D7B405EDCD97B7351DD3B7AE0637775FFBC6A2C2EEADAEC009A75A58A79F50 ok performing post-bootstrap initialization ... --> Enter database encryption pass phrase: B1D7B405EDCD97B7351DD3B7AE0637775FFBC6A2C2EEADAEC009A75A58A79F50 ok syncing data to disk ... ok initdb: warning: enabling "trust" authentication for local connections You can change this by editing pg_hba.conf or using the option -A, or --auth-local and --auth-host, the next time you run initdb. Success. You can now start the database server using: pg_ctl -D /u/pgsql/data -l logfile start $ pg_ctl stop pg_ctl: PID file "/u/pgsql/data/postmaster.pid" does not exist Is server running? --> $ pg_ctl -l /u/pg/server.log -R start waiting for server to start... --> Enter database encryption pass phrase: B1D7B405EDCD97B7351DD3B7AE0637775FFBC6A2C2EEADAEC009A75A58A79F50 done server started Attached is my updated patch, based on Masahiko Sawada's patch, and my pass_fd.sh script. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Attachment
>
I don't know much about how to hook into that stuff so if you have an
idea, I am all ears.
I have used OpenSSL with Yubikey via pksc11. You
can see the use of it on slide 57 and following:
https://momjian.us/main/writings/crypto_hw_config.pdf#page=57
Interestingly, that still needed the user to type in a key to unlock the
Yubikey, so we might need PKCS11 and a password for the same server
start.
On Wed, Oct 28, 2020 at 9:43 AM Bruce Momjian <bruce@momjian.us> wrote:
>I don't know much about how to hook into that stuff so if you have an
idea, I am all ears.Yeah, I have a reasonable idea. The main thing will be to re-read the patch and put it into more concrete terms, which I'll try to find time for soon. I need to find time to craft a proper demo that uses a virtual hsm, and can also demonstrate how to use the host TPM or a Yubikey using the simple openssl engine interfaces or a URI.
On Tue, 27 Oct 2020 at 20:34, Bruce Momjian <bruce@momjian.us> wrote: > > On Tue, Oct 27, 2020 at 03:07:22PM +0800, Craig Ringer wrote: > > On Mon, Oct 26, 2020 at 11:02 PM Stephen Frost <sfrost@snowman.net> wrote: > > > > > > TL;DR: > > > > * Important to check that key rotation is possible on a replica, i.e. > > primary and standby can have different cluster passphrase and KEK > > encrypting the same WAL and heap keys; > > * with a HSM we can't read the key out, so a pluggable KEK operations > > context or a configurable URI for the KEK is necessary > > * I want the SQL key and SQL wrap/unwrap part in a separate patch, I > > don't think it's fully baked and oppose its inclusion in its current > > form > > * Otherwise this looks good so far > > > > Explanation and argument for why below. > > > > > I've not been following very closely, but I definitely agree with the > > > general feedback here (more on that below), but to this point- I do > > > believe that was the intent, or at least I sure hope that it was. Being > > > able to have user/role keys would certainly be good. Having a way for a > > > user to log in and unlock their key would also be really nice. > > > > Right. AFAICS this is supposed to provide the foundation layer for > > whole-cluster encryption, and it looks ok for that, caveat about HSMs > > aside. I see nothing wrong with using a single key for heap (and one > > for WAL, or even the same key). Finer grained and autovacuum etc > > becomes seriously painful. > > You need to use separate keys for heap/index and WAL so you can > replicate to another server that uses a different heap/index key, but > the same WAL. You can then fail-over to the replica and change the WAL > key to complete full key rotation. The replication protocol needs to > decrypt, and the receiving end has to encrypt with a different > heap/index key. This is the key rotation method this is planned. This > is another good reason the keys should be in a separate directory so > they can be easily copied or replaced. I think it's better we decrypt WAL in the xlogreader layer, instead of doing in replication protocol. That way, we also can support frontend tools that need to read WAL such as pg_waldump and pg_rewind as well as logical decoding. > > > I want to take a closer look at how the current implementation will > > play with physical replication. I assume the WAL and heap keys have to > > be constant for the full cluster lifetime, short of a dump and reload. > > The WAL key can change if you are willing to stop/start the server. You > only read the WAL during crash recovery. We might need to consider having multiple key generations, rather than in-place rotation. If we simply update the WAL key in-place in the primary, archive WALs restored via restore_command cannot be decrypted in the replica. We might need to do generation management for WAL key and provide the functionality to purge old WAL keys. > > > > SQL KEY > > ---- > > > > I'm not against the SQL key and wrap/unwrap functionality - quite the > > contrary, I think it's really important to have something like it. But > > is it appropriate to have a single, fixed-for-cluster-lifetime key for > > this, one with no SQL-level access control over who can or cannot use > > it, etc? The material encrypted with the key is user-exposed so key > > rotation is an issue, but is not addressed here. And the interface > > doesn't really solve the numerous potential problems with key material > > leaks through logs and error messages. > > > > I just think that if we bake in the proposed user visible wrap/unwrap > > interface now we're going to regret it later. How will it work when we > > want to add user- or role- level access control for database-stored > > keys? When we want to introduce a C-level API for extensions to work > > directly with encrypted data like they can work currently with TOASTed > > data, to prevent decrypted data from ever becoming SQL function > > arguments subject to possible leakage and to allow implementation of > > always-encrypted data types, etc? > > > > Most importantly - I don't think the SQL key adds anything really > > crucial that we cannot do at the SQL level with an extension. An > > extension "pg_wrap" could provide pg_wrap() and pg_unwrap() already, > > using a single master key much like the SQL key proposed in this > > patch. To store the master key it could: > > The idea of the SQL key was to give the boot key a use, but I am now > seeing that the SQL key is just holding us back, and is preventing the > boot testing that is a requirement. Maybe we just need to forget the > SQL part and focus on the TDE usage now, and come back to the SQL part. > I am also not 100% clear on the usefulness of the SQL key. I agree to focus on the TDE usage now. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Oct 28, 2020 at 09:24:35PM +0900, Masahiko Sawada wrote: > On Tue, 27 Oct 2020 at 20:34, Bruce Momjian <bruce@momjian.us> wrote: > > You need to use separate keys for heap/index and WAL so you can > > replicate to another server that uses a different heap/index key, but > > the same WAL. You can then fail-over to the replica and change the WAL > > key to complete full key rotation. The replication protocol needs to > > decrypt, and the receiving end has to encrypt with a different > > heap/index key. This is the key rotation method this is planned. This > > is another good reason the keys should be in a separate directory so > > they can be easily copied or replaced. > > I think it's better we decrypt WAL in the xlogreader layer, instead of > doing in replication protocol. That way, we also can support frontend > tools that need to read WAL such as pg_waldump and pg_rewind as well > as logical decoding. Sure. I was just saying the heap/index files coming from the primary should have decrypted heap/index blocks, but I was not sure what level it should happen. If the data coming out the primary is encrypted, you would need the old (to decrypt) and new (to encrypt) keys on the standby, which seems too complex. To clarify, the data and heap/index pages in the WAL are only encrypted with the WAL key, but when pg_basebackup is streaming the files from PGDATA, it shouldn't be encrypted, or encrypted only with the WAL key, at the time of transfer since the receiver should be re-encrypting it. If that will not work, we should know now. > > > I want to take a closer look at how the current implementation will > > > play with physical replication. I assume the WAL and heap keys have to > > > be constant for the full cluster lifetime, short of a dump and reload. > > > > The WAL key can change if you are willing to stop/start the server. You > > only read the WAL during crash recovery. > > We might need to consider having multiple key generations, rather than > in-place rotation. If we simply update the WAL key in-place in the > primary, archive WALs restored via restore_command cannot be decrypted > in the replica. We might need to do generation management for WAL key > and provide the functionality to purge old WAL keys. Since we have the keys stored in the file system, I think we will use a command-line tool that can access both old and new keys and re-encrypted the archived WAL. I think old/new keys inside the server is too complex. > > The idea of the SQL key was to give the boot key a use, but I am now > > seeing that the SQL key is just holding us back, and is preventing the > > boot testing that is a requirement. Maybe we just need to forget the > > SQL part and focus on the TDE usage now, and come back to the SQL part. > > I am also not 100% clear on the usefulness of the SQL key. > > I agree to focus on the TDE usage now. I admit the SQL key idea was mine, and I now see it was a bad idea since it just adds confusion and doesn't add value. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Greetings, * Craig Ringer (craig.ringer@enterprisedb.com) wrote: > On Mon, Oct 26, 2020 at 11:02 PM Stephen Frost <sfrost@snowman.net> wrote: > > TL;DR: > > * Important to check that key rotation is possible on a replica, i.e. > primary and standby can have different cluster passphrase and KEK > encrypting the same WAL and heap keys; I agree that key rotation would certainly be good to have. > * with a HSM we can't read the key out, so a pluggable KEK operations > context or a configurable URI for the KEK is necessary There's a lot of options around HSMs, the Linux crypto API, potential different encryption libraries, et al. One thing that I'm not sure we're being clear enough on here is when we're talking about a KEK (key encryption key) vs. when we're talking about actually off-loading all of the encryption to an HSM or to an OpenSSL engine (which might in turn use the Linux crypto API...), etc. Agreed that, with some HSMs, we aren't able to actually pull out the key. Depending on the HSM, it may or may not be able to perform encryption and decryption with any kind of speed and therefore we should have options which don't require that. This would be the typical case where we'd have a KEK which encrypts a key we have stored and then that key is what's actually used for the encryption/decryption of the data. > * I want the SQL key and SQL wrap/unwrap part in a separate patch, I > don't think it's fully baked and oppose its inclusion in its current > form I'm generally a fan of having something at the SQL level, but I agree that it doesn't need to be part of this initial capability and could be done later as a separate patch. > Most importantly - I don't think the SQL key adds anything really > crucial that we cannot do at the SQL level with an extension. An > extension "pg_wrap" could provide pg_wrap() and pg_unwrap() already, > using a single master key much like the SQL key proposed in this > patch. To store the master key it could: Lots of things can be done in extensions but, at least for my part, I'd much rather see us build in an SQL key capability (with things like grammar support and being able to tie to to a role cleanly) than to try and figure out how to make this work as an extension. > That way we haven't baked some sort of limited wrap/unwrap into Pg's > long term user visible API. I'd be totally happy for such a SQL key > wrap/unwrap to become part of pgcrypto, or a separate extension that > uses pgcrypto, if you're worried about having it available to users. I > just don't really want it in src/backend in its current form. There's no shortage of interfaces that exist in other database systems for this that we can look at to help guide us in coming up with a good API here. All that said, we can debate that on another thread and independently of this discussion around TDE. > OTHER TRANSPARENT ENCRYPTION USE CASES > ---- > > Does this patch get in the way of supporting other kinds of > transparent encryption that are frequently requested and are in use on > other systems already? > > I don't think so. Whole-cluster encryption is quite separate and the > proposed patch doesn't seem to do anything that'd make table-, row- or > column-level encryption, per-user key management, etc any harder. > > Specific use cases I looked at: > > * Finer grained keying than whole-cluster for transparent > encryption-at-rest. As soon as we have relations that require user > session supplied information to allow the backend to read the relation > we get into a real mess with autovacuum, logical decoding, etc. So if > anyone wants to implement that sorts of thing they're probably going > to want to do so separately to block-level whole-cluster encryption, > in a way that preserves the normal page and page item structure and > encrypts the row data only. I tend to agree with this. > * Client-driver-assisted transparently encrypted > at-rest-and-in-transit data, where the database engine doesn't have > the encrypt/decrypt keys at all. Again in this case they're going to > have to do that at the row level or column level, not the block > (relfilenode extents and WAL) level, otherwise we can't provide > autovacuum etc. +100 to having client-driver-assisted encryption, this solves real attack vectors which traditional TDE simply doesn't, compared to filesystem or block device level encryption (even though lots of people seem to think it does, which is bizarre to me). > > That > > said- I don't think we necessarily want to throw out tho command-based > > option, as users may wish to use a vaulting solution or similar instead > > of an HSM. > > I agree. I wasn't proposing to throw out the command based approach, > just provide a way to inform postgres that it should do operations > with the KEK using an external engine instead of deriving its own KEK > from a passphrase and other inputs. I would think we'd want to enable admins to be able to control if what is being provided is a KEK (where the key is then decrypted by PG and PG then uses whatever libraries it's built with to perform the encryption and decryption in PG process space), or an engine/offloading configuration (where PG doesn't ever see the actual key and all encryption and decryption is done outside of PG's control by an HSM or the Linux kernel through the crypto API or whatever). The use-cases I'm thinking about: - User has a Yubikey, but would like PG to be able to write more than one block at a time. In this case, the Yubikey would have a KEK which PG doesn't ever see. PG would have an encrypted blob that it then asks the yubikey to decrypt which contains the actual key that's then kept in PG's memory to perform the encryption/decryption. Naturally, if that key is stolen then an attacker could decrypt the entire database, even if they don't have the yubikey. An attacker could acquire that key by having sufficient access on the PG sever to be able to read PG's memory. - User has a Thales Luna PCIe HSM, or similar. In this case, the user wants *all* of the encryption/decryption happening on the HSM and none of it happening in PG space, making it impossible for an attacker to acquire the actual key. - User has a yubikey, similar to #1, but would like to have the Linux kernel used to safe-guard the actual key used. This is a bit of an in-between area between the first case above and the second- specifically, a yubikey could have the KEK but then the actual data encryption key isn't given to PG, it's put into the Linux kernel's keyring and PG uses (perhaps through OpenSSL) the Linux crypto API to off-load the actual encryption and decryption to have that happening outside of PG's process space. This would make it much more difficult for an attacker to acquire the key if they only have control over PG or the postgres unix account, since the Linux kernel would prevent access to it, but it wouldn't require a HSM crypto accelerator. Of course, should an attacker gain root or direct physical access to the system somehow, they might be able to acquire the actual data encryption key that way. - User has a vaulting solution, and perhaps wants to store the actual encryption/decryption key there, or perhaps the user wants to store a passphrase in the vault and have PG derive the actual key from that. Either seems like it could be reasonable. - User hasn't got anything special and just wants to keep it simple by using a passphrase that's entered when PG is started up. > > What I am curious about though- what are the thoughts around > > using a vaulting solution's command-line tool vs. writing code to work > > with an API? > > I think the code that fetches the cluster passphrase from a command > should be interceptable by a hook, so organisations with Wacky > Security Policies Written By People Who Have Heard About Computers But > Never Used One can jump through the necessary hoops. I am of course > absolutely not speaking from experience here, no, not at all... see > ssl_passphrase_function in src/backend/libpq/be-secure-openssl.c, and > see src/test/modules/ssl_passphrase_callback/ssl_passphrase_func.c . > > So I suggest something like that - a hook that by default calls an > external command but can by overridden by an extension. It wouldn't be > openssl specific like the server key passphrase example though. That > was done with an openssl specific hook because we don't know if we're > going to need a passphrase at all until openssl has opened the key. In > the cluster encryption case we'll know if we're doing our own KEK+HMAC > generation or not without having to ask the SSL library. What I'm wondering about here is if we should make it an explicit option for a user to pick through the server configuration about if they're giving PG a direct key to use, a KEK that's actually meant to decrypt the data key, a way to fetch the direct key or the KEK, or a engine which has the KEK to ask to decrypt the data key, etc. If we can come up with a way to configure PG that will support the different use cases outlined above without being overly complicated, that'd be great. I'm not sure that I see that in what you've proposed here, but maybe by going through each of the use-cases and showing how a user would configure PG for each with this proposal, I will. > > Between these various options, what are the risks of > > having a script vs. using an API and would one or the other weaken the > > overall solution? Or is what's really needed here is a way to tell us > > if it's a passphrase we're getting or a proper key, regardless of the > > method being used to fetch it? > > For various vault systems I don't think it matters at all whether the > secret they manage is the key, input used to generate the key, or > input used to decrypt a key stored elsewhere. Either way they have the > pivotal secret. So I don't see much point allowing the command to > return a fully formed key. I hadn't really considered that to be a distinction either, so I'm glad that it sounds like we agreed on that point. > The point of a HSM that you don't get to read the key. Pg can never > read the key, it can only perform encrypt and decrypt operations on > the key using the HSM via the SSL library: This really depends on exactly what "key" is being referred to here, and where the encryption/decryption is happening. Hopefully the above use cases help clarify. > Pg -> openssl: > "this is the ciphertext of the wal_key. Please decrypt it for me." > openssl -> engine layer > "engine, please decrypt this" > pkcs#11 engine-> pkcs#11 provider: > "please decrypt this" > pkcs#11 provider -> HSM-specific libraries, network proxies, whatever: > "please decrypt this" > "... here's the plaintext" > <- flows back up Right- in this case, ultimately, the actual key used for the encryption and decryption ends up in PG's memory space as plaintext and could therefore be acquired by an attacker with access to PG memory space. > So the KEK used to encrypt the main cluster keys for heap and wal > encryption is never readable by Pg. It usually never enters host > memory - in the case of a HSM, the ciphertext is sent over USB or PCIe > to the HSM and the cleartext comes back. Agreed, the KEK isn't, but that isn't actually all that interesting since the KEK isn't needed to decrypt the data. > In openssl, the default engine is file-based with host software crypto > implementations. You can specify alternate engines using various > OpenSSL APIs, or you can specify them by supplying a URI where you'd > usually supply a file path to a key. Right. > I'm proposing we make it easy to supply a key URI and let openssl > handle the engine etc. It's far from perfect, and it's really meant as > a fallback to allow apps that don't natively understand SSL engines > etc to still use them in a limited capacity. I agree that it doesn't seem like a bad approach to expose that URI, but I'm not sure that's really the end of it since there's going to be cases where people would like to have a KEK on a yubikey and there'll be other cases where people would like to offload all of the encryption and decryption to a HSM crypto accelerator and, ideally, we'd allow them to be able to configure PG for either of those cases. > What I'd *prefer* to do is make the function that sets up the KEK > hookable. So by default we'd call a function that'd read the external > passphrase from a command use that to generate KEK+HMAC. But an > extension hook installed at shared_preload_libraries time could > override the behaviour completely and return its own implementation. I don't see a problem with adding hooks, where they make sense, but we should also make things work in a sensible way and a way that works with at least the use-cases that I've outlined, ideally, without having to go get an extension or write C code. > > This really locks us into OpenSSL for this, which I don't particularly > > like. > > We're pretty locked into openssl already. I don't like it either, it > was just the option that has the least impact/delay on the main work > on this patch. There's an active patch that's been worked on for quite some time that's getting some renewed interest in adding NSS support, something I certainly support also, so we really shouldn't be taking steps that end up making it more difficult to support alternatives. Perhaps a generic 'key URI' type of option wouldn't be too bad, and each library we support could parse that string out based on what information it needs (eg: for NSS, a database + key nickname could be provided in some specific format), but overall we certainly shouldn't be baking things in which are very OpenSSL-specific and exposed to users. > I'd rather abstract KEK operations behind a context object-like struct > with function pointer members, like we do in many other places in Pg. > Make the default one do the dance of reading the external passphrase > and generating the KEK on the fly. Allow plugins to override it with > their own, and let them set it up to delegate to a HSM or whatever > else they want. > > Then ship a simple openssl based default implementation of HSM support > that can be shoved in shared_preload_libraries. Or if we don't like > using s_p_l, add a separate GUC for cluster_encryption_key_manager or > whatever, and a different entrypoint, instead of having s_p_l call > _PG_init() to register a hook. I definitely think we want to support things directly in PG and not require an extension or something to be in s_p_l for this. > > > For example if I want to lock my database with a YubiHSM I would configure > > > something like: > > > > > > cluster_encryption_key = 'pkcs11:token=YubiHSM;id=0:0001;type=private' > > > > > > The DB would be encrypted and decrypted using application keys unlocked by > > > the HSM. Backups of the database, stolen disk images, etc, would be > > > unreadable unless you have access to another HSM with the same key loaded. > > > > Well, you would surely just need the key, since you could change the PG > > config to fetch the key from wherever you have it, you wouldn't need an > > actual HSM. > > Right - if your HSM was programmed by generating a key and storing > that into the HSM and you have that key backed up in file form > somewhere, you could likely put it in a pem file and use that directly > by pointing Pg at the file instead of an engine URI. Sure. > But you might not even have the key. In some HSM implementations the > key is completely sealed - you can program new HSMs to have the same > key by using the same configuration, but you cannot actually obtain > the key short of attacks on the HSM hardware itself. That's very much > by design - the HSM configuration is usually on an air-gapped system, > and it isn't sufficient to decrypt anything unless you also have > access to a copy of the HSM hardware itself. Obviously you accept the > risks if you take that approach, and you must have an escape route > where you can re-encrypt the material protected by the HSM against > some other key. But it's not at all uncommon. Right, but in such cases you'd need an HSM that's able to perform encryption and decryption at some reasonable rate. > Key rotation is obviously vital to make this vaguely sane. In Pg's > case you'd to change the key configuration, then trigger a key > rotation step, which would decrypt with a context obtained from the > old config then encrypt with a context obtained from the new config. Yes, key rotation is an important part. > > > If cluster_encryption_key is unset, Pg would perform its own KEK derivation > > > based on cluster_passphrase_command as currently implemented. > > > > To what I was suggesting above- what if we just had a GUC that's > > "kek_method" with options 'passphrase' and 'direct', where passphrase > > goes through KEK and 'direct' doesn't, which just changes how we treat > > the results of called cluster_passphrase_command? > > That won't work for a HSM. It is not possible to extract the key. > "direct" cannot be implemented. Perhaps the above helps explain what I was getting at there. Thanks, Stephen
Attachment
On Wed, Oct 28, 2020 at 12:02:46PM +0800, Craig Ringer wrote: > On Wed, Oct 28, 2020 at 9:43 AM Bruce Momjian <bruce@momjian.us> wrote: > I have used OpenSSL with Yubikey via pksc11. You > can see the use of it on slide 57 and following: > > https://momjian.us/main/writings/crypto_hw_config.pdf#page=57 > > Interestingly, that still needed the user to type in a key to unlock the > Yubikey, so we might need PKCS11 and a password for the same server > start. > > Yes, that's possible. But in that case the passphrase will be asked for by > openssl only when required, and we'll need to supply an openssl askpass hook. What we _will_ need is access to a /dev/tty file descriptor, and this patch does that, though it closes it as soon as the internal keys are unlocked so the terminal can be disconnected from the database processes. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
On Wed, Oct 28, 2020 at 02:29:16PM -0400, Bruce Momjian wrote: > On Wed, Oct 28, 2020 at 12:02:46PM +0800, Craig Ringer wrote: > > Yes, that's possible. But in that case the passphrase will be asked for by > > openssl only when required, and we'll need to supply an openssl askpass hook. > > What we _will_ need is access to a /dev/tty file descriptor, and this > patch does that, though it closes it as soon as the internal keys are > unlocked so the terminal can be disconnected from the database > processes. FYI, the file descriptor facility will eventually allow for SSL certificate unlocking passwords to be prompted from the terminal, instead of requiring the use of ssl_passphrase_command, but let's get the facility fully completed first. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
On Wed, Oct 28, 2020 at 05:16:32PM +0800, Craig Ringer wrote: > On Wed, Oct 28, 2020 at 12:02 PM Craig Ringer <craig.ringer@enterprisedb.com> > wrote: > > On Wed, Oct 28, 2020 at 9:43 AM Bruce Momjian <bruce@momjian.us> wrote: > > > > I don't know much about how to hook into that stuff so if you have an > idea, I am all ears. > > > Yeah, I have a reasonable idea. The main thing will be to re-read the patch > and put it into more concrete terms, which I'll try to find time for soon. > I need to find time to craft a proper demo that uses a virtual hsm, and can > also demonstrate how to use the host TPM or a Yubikey using the simple > openssl engine interfaces or a URI. > > > Do you have this in a public git tree anywhere? If not please consider using > "git format-patch -v 1 -1" or similar to generate it, so I can "git am" the > patch. I have made a github branch, and will keep it updated: https://github.com/bmomjian/postgres/tree/key I am also attaching an updated patch. > A few comments on the patch as I read through. Some will be addressed by the > quick PoC I'm preparing for pluggable key derivation, some won't. In no > particular order: > > * The term KEK only appears in the docs; where it appears in the sources it's > lower case. I suggest making "KEK" grep-able in the sources. Fixed. > * BootStrapKmgr() says to call once on "system install" . I suggest "initdb". Done. > * The jumble of #ifdef FRONTEND in src/common/kmgr_utils.c shouldn't remain in > the final patch if possible. This may require some "static inline" wrappers or > helpers. I can do this if you give me more details. > * PgKeyWrapCtx.key is unused and should probably be deleted Removed. > * HMAC support should go into PgCipherCtx so that we can have a single > PgCipherCtx that supports cipher ops, cipher+HMAC ops, or just HMAC ops > * PgKeyWrapCtx.cipherctx should then be supplemented with a hmacctx. It should > be legal to set cipherctx and hmacctx to the same value, since in some cases > the it won't be easy to initialize the backing implementation separately for > key and HMAC. Sorry, I don't know how to do the above items. > FORK? > ---- > > One possible problem with this is that we should not assume we can perform KEK > operations in postmaster children, since there's no guarantee we can use > whatever sits behind a PgCipherCtx after fork(). But AFAICS the KEK doesn't > live beyond the various kmgr_ operations as it is, so there's no reason it > should ever have to be carried over a fork() anyway. Yes, I think so. > * the kmgr appears to need to be able to work in frontend code (?) > * for key rotation we need to be able to change KEKs, and possibly KEK > acquisition methods, at runtime We might need to change the KEK using a command-line tool so we can more easily prompt for the new KEK. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Attachment
> Most importantly - I don't think the SQL key adds anything really
> crucial that we cannot do at the SQL level with an extension. An
> extension "pg_wrap" could provide pg_wrap() and pg_unwrap() already,
> using a single master key much like the SQL key proposed in this
> patch. To store the master key it could:
Lots of things can be done in extensions but, at least for my part, I'd
much rather see us build in an SQL key capability (with things like
grammar support and being able to tie to to a role cleanly) than to try
and figure out how to make this work as an extension.
+100 to having client-driver-assisted encryption, this solves real
attack vectors which traditional TDE simply doesn't, compared to
filesystem or block device level encryption (even though lots of
people seem to think it does, which is bizarre to me).
I would think we'd want to enable admins to be able to control if what
is being provided is a KEK (where the key is then decrypted by PG and PG
then uses whatever libraries it's built with to perform the encryption
and decryption in PG process space), or an engine/offloading
configuration (where PG doesn't ever see the actual key and all
encryption and decryption is done outside of PG's control by an HSM or
the Linux kernel through the crypto API or whatever).
The use-cases I'm thinking about:
- User has a Yubikey, but would like PG to be able to write more than
one block at a time. In this case, the Yubikey would have a KEK which
PG doesn't ever see.
PG would have an encrypted blob that it then
asks the yubikey to decrypt which contains the actual key that's then
kept in PG's memory to perform the encryption/decryption. Naturally,
if that key is stolen then an attacker could decrypt the entire
database, even if they don't have the yubikey. An attacker could
acquire that key by having sufficient access on the PG sever to be
able to read PG's memory.
- User has a Thales Luna PCIe HSM, or similar. In this case, the user
wants *all* of the encryption/decryption happening on the HSM and none
of it happening in PG space, making it impossible for an attacker to
acquire the actual key.
- User has a yubikey, similar to #1, but would like to have the Linux
kernel used to safe-guard the actual key used.
- User has a vaulting solution, and perhaps wants to store the actual
encryption/decryption key there, or perhaps the user wants to store a
passphrase in the vault and have PG derive the actual key from that.
Either seems like it could be reasonable.
What I'm wondering about here is if we should make it an explicit option
for a user to pick through the server configuration about if they're
giving PG a direct key to use, a KEK that's actually meant to decrypt
the data key, a way to fetch the direct key or the KEK, or a engine
which has the KEK to ask to decrypt the data key, etc.
If we can come
up with a way to configure PG that will support the different use cases
outlined above without being overly complicated, that'd be great. I'm
not sure that I see that in what you've proposed here, but maybe by
going through each of the use-cases and showing how a user would
configure PG for each with this proposal, I will.
Prompt:
pass word here
I agree that it doesn't seem like a bad approach to expose that URI, but
I'm not sure that's really the end of it since there's going to be cases
where people would like to have a KEK on a yubikey and there'll be other
cases where people would like to offload all of the encryption and
decryption to a HSM crypto accelerator and, ideally, we'd allow them to
be able to configure PG for either of those cases.
I don't see a problem with adding hooks, where they make sense, but we
should also make things work in a sensible way and a way that works with
at least the use-cases that I've outlined, ideally, without having to go
get an extension or write C code.
There's an active patch that's been worked on for quite some time that's
getting some renewed interest in adding NSS support, something I
certainly support also, so we really shouldn't be taking steps that end
up making it more difficult to support alternatives.
I definitely think we want to support things directly in PG and not
require an extension or something to be in s_p_l for this.
> But you might not even have the key. In some HSM implementations the
> key is completely sealed - you can program new HSMs to have the same
> key by using the same configuration, but you cannot actually obtain
> the key short of attacks on the HSM hardware itself. That's very much
> by design - the HSM configuration is usually on an air-gapped system,
> and it isn't sufficient to decrypt anything unless you also have
> access to a copy of the HSM hardware itself. Obviously you accept the
> risks if you take that approach, and you must have an escape route
> where you can re-encrypt the material protected by the HSM against
> some other key. But it's not at all uncommon.
Right, but in such cases you'd need an HSM that's able to perform
encryption and decryption at some reasonable rate.