Thread: Proposed patch for key managment

Proposed patch for key managment

From
Bruce Momjian
Date:
Attached is a patch for key management, which will eventually be part of
cluster file encryption (CFE), called TDE (Transparent Data Encryption)
by Oracle.  It is an update of Masahiko Sawada's patch from July 31:

    https://www.postgresql.org/message-id/CA+fd4k6RJwNvZTro3q2f5HSDd8HgyUc4CuY9U3e6Ran4C6TO4g@mail.gmail.com

Sawada-san did all the hard work, and I just redirected the patch.  The
general outline of this CFE feature can be seen here:

    https://wiki.postgresql.org/wiki/Transparent_Data_Encryption

The currently planned progression for this feature is to allow secure
retrieval of key encryption keys (KEK) outside of the database, then use
those to encrypt data keys that encrypt heap/index/tmpfile files.

This patch was changed from Masahiko Sawada's version by removing
references to non-cluster file encryption because having SQL-level keys
stored in this way was considered to have limited usefulness.  I have
also remove references to SQL-level KEK rotation since that is best done
as a command-line too.

If most people approve of this general approach, and the design
decisions made, I would like to apply this in the next few weeks, but
this brings complications.  The syntax added by this commit might not
provide a useful feature until PG 15, so how do we hide it from users. 
I was thinking of not applying the doc changes (or commenting them out)
and commenting out the --help output.

Once this patch is applied, I would like to use the /dev/tty file
descriptor passing feature for the ssl_passphrase_command parameter, so
the ssl passphrase can be prompted from the terminal.  (I am attaching
pass_fd.sh as a POC for how file descriptor handling works.)  I would
also then write the KEK rotation command-line tool.  After that, we can
start working on heap/index/tmpfile encryption using this patch as a
base.

Here is an example of the current patch in action, using a KEK like
'7CE7945EA2F7322536F103E4D7D91C21E52288089EF99D186B9A76D666EBCA66',
which is not echoed to the terminal:

    $ initdb -R -c '/u/postgres/tmp/pass_fd.sh "Enter password:" %R'
    The files belonging to this database system will be owned by user "postgres".
    This user must also own the server process.
    
    The database cluster will be initialized with locale "en_US.UTF-8".
    The default database encoding has accordingly been set to "UTF8".
    The default text search configuration will be set to "english".
    
    Data page checksums are disabled.
    Cluster file encryption is enabled.
    
    fixing permissions on existing directory /u/pgsql/data ... ok
    creating subdirectories ... ok
    selecting dynamic shared memory implementation ... posix
    selecting default max_connections ... 100
    selecting default shared_buffers ... 128MB
    selecting default time zone ... America/New_York
    creating configuration files ... ok
    running bootstrap script ...
-->    Enter password:ok
    performing post-bootstrap initialization ...
-->    Enter password:ok
    syncing data to disk ... ok
    
    initdb: warning: enabling "trust" authentication for local connections
    You can change this by editing pg_hba.conf or using the option -A, or
    --auth-local and --auth-host, the next time you run initdb.
    
    Success. You can now start the database server using:
    
        pg_ctl -D /u/pgsql/data -l logfile start
    
    $ pg_ctl -R -l /u/pg/server.log start
    waiting for server to start...
-->    Enter password: done
    server started

A non-matching passphrase looks like this:

    $ pg_ctl -R -l /u/pg/server.log start
    waiting for server to start...
-->    Enter password: stopped waiting
    pg_ctl: could not start server
    Examine the log output.

    $ tail -2 /u/pg/server.log
    2020-12-02 16:32:34.045 EST [13056] FATAL:  cluster passphrase does not match expected passphrase
    2020-12-02 16:32:34.047 EST [13056] LOG:  database system is shut down

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee


Attachment

Re: Proposed patch for key managment

From
Bruce Momjian
Date:
On Wed, Dec  2, 2020 at 04:38:14PM -0500, Bruce Momjian wrote:
> If most people approve of this general approach, and the design
> decisions made, I would like to apply this in the next few weeks, but
> this brings complications.  The syntax added by this commit might not
> provide a useful feature until PG 15, so how do we hide it from users. 
> I was thinking of not applying the doc changes (or commenting them out)
> and commenting out the --help output.

Here is an updated patch to handle the new hash API introduced by
commit 87ae9691d2.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee


Attachment

Re: Proposed patch for key managment

From
Michael Paquier
Date:
On Fri, Dec 04, 2020 at 09:08:03PM -0500, Bruce Momjian wrote:
> Here is an updated patch to handle the new hash API introduced by
> commit 87ae9691d2.

+       if (!ossl_initialized)
+       {
+#ifdef HAVE_OPENSSL_INIT_SSL
+               OPENSSL_init_ssl(OPENSSL_INIT_LOAD_CONFIG, NULL);
+#else
+               OPENSSL_config(NULL);
+               SSL_library_init();
+               SSL_load_error_strings();
+#endif
+               ossl_initialized = true;
This is a duplicate of what's done in be-secure-openssl.c, and it does
not strike me as a good idea to do that potentially twice.

git diff --check complains.

+extern bool pg_HMAC_SHA512(const uint8 *key,
+                           const uint8 *in, int inlen,
+                           uint8 *out);
I think that the split done in this patch makes the HMAC handling in
the core code messier:
- SCRAM makes use of HMAC internally, and we should try to use the
HMAC of OpenSSL if building with it even for SCRAM.
- For the first reason, I think that we should also have a fallback
implementation.
- This API layer should not depend directly on the SHA2 used (SCRAM
uses SHA256 with HMAC).
FWIW, I got plans to work on that once I am done with the business
around MD5 and OpenSSL.

The refactoring done with the ciphers moved from pgcrypto to
src/common/ should be a separate patch.  In short, it would be good to
rework this patch and split it into pieces that are independently
useful.  This would make the review much easier as well.
--
Michael

Attachment

Re: Proposed patch for key managment

From
Masahiko Sawada
Date:
On Sat, 5 Dec 2020 at 11:08, Bruce Momjian <bruce@momjian.us> wrote:
>
> On Wed, Dec  2, 2020 at 04:38:14PM -0500, Bruce Momjian wrote:
> > If most people approve of this general approach, and the design
> > decisions made, I would like to apply this in the next few weeks, but
> > this brings complications.  The syntax added by this commit might not
> > provide a useful feature until PG 15, so how do we hide it from users.
> > I was thinking of not applying the doc changes (or commenting them out)
> > and commenting out the --help output.
>
> Here is an updated patch to handle the new hash API introduced by
> commit 87ae9691d2.
>

Thank you for updating the patch and moving forward!

I've tried to use this patch on my environment (FreeBSD 12.1) but it
doesn't work. I got the following error:

$ bin/initdb -D data -E UTF8 --no-locale -c'echo
"1234567890123456789012345678901234567890123456789012345678901234567890"'
The files belonging to this database system will be owned by user "masahiko".
This user must also own the server process.

The database cluster will be initialized with locale "C".
The default text search configuration will be set to "english".

Data page checksums are disabled.
Cluster file encryption is enabled.

creating directory data ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... Japan
creating configuration files ... ok
running bootstrap script ... child process was terminated by signal
10: Bus error
initdb: removing data directory "data"

The backtrace is:

(lldb) bt
* thread #1, name = 'postgres', stop reason = signal SIGBUS: hardware error
    frame #0: 0x0000000000d3b074
postgres`ResourceArrayRemove(resarr=0x7f7f7f7f7f7f80df,
value=34383251232) at resowner.c:308:23
    frame #1: 0x0000000000d3c0ef
postgres`ResourceOwnerForgetCryptoHash(owner=0x7f7f7f7f7f7f7f7f,
handle=34383251232) at resowner.c:1421:7
    frame #2: 0x0000000000d77c54
postgres`pg_cryptohash_free(ctx=0x000000080166c720) at
cryptohash_openssl.c:228:3
  * frame #3: 0x0000000000d6c065
postgres`kmgr_derive_keys(passphrase="1234567890123456789012345678901234567890123456789012345678901234567890\n",
passlen=71, enckey="\x01d

\x17(\x93\xa8id\xae\xa5\x86\x84\xb9Y>\xa84\x16\U00000085\U0000009d'\r\x11123456789012345678901234567890
1234567890123456789012345678901234567890\n",

mackey="0\x1f\xb4_eg\x95\x02\x9e2\x0e\xba\t\b^\x18\x90U\xa0\x8e\xaei\x01oYJL\xa4\xb3E\x97a,\x06h4\x9fX\x9eS\xb2\x81%^d\xa4\x01d

\x17(\x93\xa8id\xae\xa5\x86\x84\xb9Y>\xa84\x16\U00000085\U0000009d'\r\x1112345678901234567890123456789
01234567890123456789012345678901234567890\n") at kmgr_utils.c:239:2
    frame #4: 0x0000000000d60810 postgres`BootStrapKmgr at kmgr.c:93:2
    frame #5: 0x0000000000647fa1 postgres`BootStrapXLOG at xlog.c:5359:3
    frame #6: 0x000000000066f034 postgres`AuxiliaryProcessMain(argc=7,
argv=0x00007fffffffcdb8) at bootstrap.c:450:4
    frame #7: 0x00000000008e9cfb postgres`main(argc=8,
argv=0x00007fffffffcdb0) at main.c:201:3
    frame #8: 0x000000000051010f postgres`_start(ap=<unavailable>,
cleanup=<unavailable>) at crt1.c:76:7

Investigating the issue, it seems we need to initialize the context
and the state with 0. The following change fixes this issue.

diff --git a/src/common/cryptohash_openssl.c b/src/common/cryptohash_openssl.c
index e5233daab6..a45c86fa67 100644
--- a/src/common/cryptohash_openssl.c
+++ b/src/common/cryptohash_openssl.c
@@ -81,6 +81,8 @@ pg_cryptohash_create(pg_cryptohash_type type)
        return NULL;
    }

+   memset(ctx, 0, sizeof(pg_cryptohash_ctx));
+   memset(state, 0, sizeof(pg_cryptohash_state));
    ctx->data = state;
    ctx->type = type;

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Proposed patch for key managment

From
Bruce Momjian
Date:
On Sat, Dec  5, 2020 at 11:39:18AM +0900, Michael Paquier wrote:
> On Fri, Dec 04, 2020 at 09:08:03PM -0500, Bruce Momjian wrote:
> > Here is an updated patch to handle the new hash API introduced by
> > commit 87ae9691d2.
> 
> +       if (!ossl_initialized)
> +       {
> +#ifdef HAVE_OPENSSL_INIT_SSL
> +               OPENSSL_init_ssl(OPENSSL_INIT_LOAD_CONFIG, NULL);
> +#else
> +               OPENSSL_config(NULL);
> +               SSL_library_init();
> +               SSL_load_error_strings();
> +#endif
> +               ossl_initialized = true;
> This is a duplicate of what's done in be-secure-openssl.c, and it does
> not strike me as a good idea to do that potentially twice.

Yeah, I kind of wondered about that.  In fact, the code from the
original patch would not compile so I got this init code from somewhere
else. I have now removed it and it works fine.  :-)

> git diff --check complains.

Uh, can you be more specific?  I don't see any output from that command.

> +extern bool pg_HMAC_SHA512(const uint8 *key,
> +                           const uint8 *in, int inlen,
> +                           uint8 *out);
> I think that the split done in this patch makes the HMAC handling in
> the core code messier:
> - SCRAM makes use of HMAC internally, and we should try to use the
> HMAC of OpenSSL if building with it even for SCRAM.
> - For the first reason, I think that we should also have a fallback
> implementation.
> - This API layer should not depend directly on the SHA2 used (SCRAM
> uses SHA256 with HMAC).
> FWIW, I got plans to work on that once I am done with the business
> around MD5 and OpenSSL.

Uh, I just kind of kept all that code and didn't modify it.  It would be
great if you can help me improve it.  I will be using the hash code for
the command-line tool that alters the passphrase, so having that in
common/ does help me.

> The refactoring done with the ciphers moved from pgcrypto to
> src/common/ should be a separate patch.  In short, it would be good to

Uh, I am kind of unclear exactly what was done there since I just took
that part of the patch unchanged.

> rework this patch and split it into pieces that are independently
> useful.  This would make the review much easier as well.

I can break out the -R/file descriptor passing part as a separate patch,
and have the ssl_passphrase_command use that, but that's the only part I
know can be useful on its own.

Since the patch is large, I found a way to push the branch to git and
how to make a download link that tracks whatever I push to the 'key'
branch on my github account.  Here is the updated patch link:

    https://github.com/postgres/postgres/compare/master...bmomjian:key.diff

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee




Re: Proposed patch for key managment

From
Bruce Momjian
Date:
On Sat, Dec  5, 2020 at 12:15:13PM +0900, Masahiko Sawada wrote:
> diff --git a/src/common/cryptohash_openssl.c b/src/common/cryptohash_openssl.c
> index e5233daab6..a45c86fa67 100644
> --- a/src/common/cryptohash_openssl.c
> +++ b/src/common/cryptohash_openssl.c
> @@ -81,6 +81,8 @@ pg_cryptohash_create(pg_cryptohash_type type)
>         return NULL;
>     }
> 
> +   memset(ctx, 0, sizeof(pg_cryptohash_ctx));
> +   memset(state, 0, sizeof(pg_cryptohash_state));
>     ctx->data = state;
>     ctx->type = type;

OK, I worked with Sawada-san and added the attached patch.  The updated
full patch is at the same URL:  :-)

    https://github.com/postgres/postgres/compare/master...bmomjian:key.diff

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee


Attachment

Re: Proposed patch for key managment

From
Michael Paquier
Date:
On Fri, Dec 04, 2020 at 10:52:29PM -0500, Bruce Momjian wrote:
> OK, I worked with Sawada-san and added the attached patch.  The updated
> full patch is at the same URL:  :-)
>
>     https://github.com/postgres/postgres/compare/master...bmomjian:key.diff

Oh, I see that you use SHA256 during firstboot, which is why you
bypass the resowner paths in cryptohash_openssl.c.  Wouldn't it be
better to use IsBootstrapProcessingMode() then?

> @@ -72,14 +72,15 @@ pg_cryptohash_create(pg_cryptohash_type type)
>      ctx = ALLOC(sizeof(pg_cryptohash_ctx));
>      if (ctx == NULL)
>          return NULL;
> +    explicit_bzero(ctx, sizeof(pg_cryptohash_ctx));
>
>      state = ALLOC(sizeof(pg_cryptohash_state));
>      if (state == NULL)
>      {
> -        explicit_bzero(ctx, sizeof(pg_cryptohash_ctx));
>          FREE(ctx);
>          return NULL;
>      }
> +    explicit_bzero(state, sizeof(pg_cryptohash_state));

explicit_bzero() is used to give the guarantee that any sensitive data
gets cleaned up after an error or on free.  So I think that your use
is incorrect here for an initialization.

>      if (state->evpctx == NULL)
>      {
> -        explicit_bzero(state, sizeof(pg_cryptohash_state));
> -        explicit_bzero(ctx, sizeof(pg_cryptohash_ctx));
>  #ifndef FRONTEND
>          ereport(ERROR,
>                  (errcode(ERRCODE_OUT_OF_MEMORY),

And this original position is IMO more correct.

Anyway, at quick glance, the backtrace of upthread is indicating a
double-free with an attempt to free a resource owner that has been
already free'd.
--
Michael

Attachment

Re: Proposed patch for key managment

From
Bruce Momjian
Date:
On Sat, Dec  5, 2020 at 09:54:33PM +0900, Michael Paquier wrote:
> On Fri, Dec 04, 2020 at 10:52:29PM -0500, Bruce Momjian wrote:
> > OK, I worked with Sawada-san and added the attached patch.  The updated
> > full patch is at the same URL:  :-)
> > 
> >     https://github.com/postgres/postgres/compare/master...bmomjian:key.diff
> 
> Oh, I see that you use SHA256 during firstboot, which is why you
> bypass the resowner paths in cryptohash_openssl.c.  Wouldn't it be
> better to use IsBootstrapProcessingMode() then?

I tried that, but we also use the resource references before the
resource system is started even in non-bootstrap mode.  Maybe we should
be creating a resource owner for this, but it is so early in the boot
process I don't know if that is safe.

> > @@ -72,14 +72,15 @@ pg_cryptohash_create(pg_cryptohash_type type)
> >      ctx = ALLOC(sizeof(pg_cryptohash_ctx));
> >      if (ctx == NULL)
> >          return NULL;
> > +    explicit_bzero(ctx, sizeof(pg_cryptohash_ctx));
> >  
> >      state = ALLOC(sizeof(pg_cryptohash_state));
> >      if (state == NULL)
> >      {
> > -        explicit_bzero(ctx, sizeof(pg_cryptohash_ctx));
> >          FREE(ctx);
> >          return NULL;
> >      }
> > +    explicit_bzero(state, sizeof(pg_cryptohash_state));
> 
> explicit_bzero() is used to give the guarantee that any sensitive data
> gets cleaned up after an error or on free.  So I think that your use
> is incorrect here for an initialization.

It turns out what we were missing was a clear of state->resowner in
cases where the resowner was null.  I removed the bzero calls completely
and it now runs fine.

> >      if (state->evpctx == NULL)
> >      {
> > -        explicit_bzero(state, sizeof(pg_cryptohash_state));
> > -        explicit_bzero(ctx, sizeof(pg_cryptohash_ctx));
> >  #ifndef FRONTEND
> >          ereport(ERROR,
> >                  (errcode(ERRCODE_OUT_OF_MEMORY),
> 
> And this original position is IMO more correct.

Do we even need them?

> Anyway, at quick glance, the backtrace of upthread is indicating a
> double-free with an attempt to free a resource owner that has been
> already free'd.

I think that is now fixed too.  Updated patch at the same URL:

    https://github.com/postgres/postgres/compare/master...bmomjian:key.diff

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee




Re: Proposed patch for key managment

From
Masahiko Sawada
Date:
On Sun, 6 Dec 2020 at 00:42, Bruce Momjian <bruce@momjian.us> wrote:
>
> On Sat, Dec  5, 2020 at 09:54:33PM +0900, Michael Paquier wrote:
> > On Fri, Dec 04, 2020 at 10:52:29PM -0500, Bruce Momjian wrote:
> > > OK, I worked with Sawada-san and added the attached patch.  The updated
> > > full patch is at the same URL:  :-)
> > >
> > >     https://github.com/postgres/postgres/compare/master...bmomjian:key.diff
> >
> > Oh, I see that you use SHA256 during firstboot, which is why you
> > bypass the resowner paths in cryptohash_openssl.c.  Wouldn't it be
> > better to use IsBootstrapProcessingMode() then?
>
> I tried that, but we also use the resource references before the
> resource system is started even in non-bootstrap mode.  Maybe we should
> be creating a resource owner for this, but it is so early in the boot
> process I don't know if that is safe.
>
> > > @@ -72,14 +72,15 @@ pg_cryptohash_create(pg_cryptohash_type type)
> > >     ctx = ALLOC(sizeof(pg_cryptohash_ctx));
> > >     if (ctx == NULL)
> > >             return NULL;
> > > +   explicit_bzero(ctx, sizeof(pg_cryptohash_ctx));
> > >
> > >     state = ALLOC(sizeof(pg_cryptohash_state));
> > >     if (state == NULL)
> > >     {
> > > -           explicit_bzero(ctx, sizeof(pg_cryptohash_ctx));
> > >             FREE(ctx);
> > >             return NULL;
> > >     }
> > > +   explicit_bzero(state, sizeof(pg_cryptohash_state));
> >
> > explicit_bzero() is used to give the guarantee that any sensitive data
> > gets cleaned up after an error or on free.  So I think that your use
> > is incorrect here for an initialization.
>
> It turns out what we were missing was a clear of state->resowner in
> cases where the resowner was null.  I removed the bzero calls completely
> and it now runs fine.
>
> > >     if (state->evpctx == NULL)
> > >     {
> > > -           explicit_bzero(state, sizeof(pg_cryptohash_state));
> > > -           explicit_bzero(ctx, sizeof(pg_cryptohash_ctx));
> > >  #ifndef FRONTEND
> > >             ereport(ERROR,
> > >                             (errcode(ERRCODE_OUT_OF_MEMORY),
> >
> > And this original position is IMO more correct.
>
> Do we even need them?
>
> > Anyway, at quick glance, the backtrace of upthread is indicating a
> > double-free with an attempt to free a resource owner that has been
> > already free'd.
>
> I think that is now fixed too.  Updated patch at the same URL:
>
>         https://github.com/postgres/postgres/compare/master...bmomjian:key.diff

Thank you for updating the patch!

I think we need explicit_bzero() also in freeing the keywrap context.

BTW, when we need -R option pg_ctl command to start the server, how
can we start it in the single-user mode?

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Proposed patch for key managment

From
Michael Paquier
Date:
On Sat, Dec 05, 2020 at 10:42:05AM -0500, Bruce Momjian wrote:
> On Sat, Dec  5, 2020 at 09:54:33PM +0900, Michael Paquier wrote:
>> On Fri, Dec 04, 2020 at 10:52:29PM -0500, Bruce Momjian wrote:
>>>      if (state->evpctx == NULL)
>>>      {
>>> -        explicit_bzero(state, sizeof(pg_cryptohash_state));
>>> -        explicit_bzero(ctx, sizeof(pg_cryptohash_ctx));
>>>  #ifndef FRONTEND
>>>          ereport(ERROR,
>>>                  (errcode(ERRCODE_OUT_OF_MEMORY),
>>
>> And this original position is IMO more correct.
>
> Do we even need them?

That's the intention to clean up things in a consistent way.
--
Michael

Attachment

Re: Proposed patch for key managment

From
Bruce Momjian
Date:
On Mon, Dec  7, 2020 at 09:30:03AM +0900, Masahiko Sawada wrote:
> Thank you for updating the patch!
> 
> I think we need explicit_bzero() also in freeing the keywrap context.

pg_cryptohash_free() already has this:

    explicit_bzero(state, sizeof(pg_cryptohash_state));
    explicit_bzero(ctx, sizeof(pg_cryptohash_ctx));

Do we need more?

> BTW, when we need -R option pg_ctl command to start the server, how
> can we start it in the single-user mode?

I added code for that, but I hadn't tested it yet.  Now that I tried it,
I realized that it is awkward to supply a file descriptor number (that
will be closed) from the command-line, so I added code and docs to allow
-1 to duplicate standard error, and it worked:

    $ postgres --single -R -1 -D /u/pg/data
    
    Enter password:
    PostgreSQL stand-alone backend 14devel
    backend> select 100;
             1: ?column?    (typeid = 23, len = 4, typmod = -1, byval = t)
            ----
             1: ?column? = "100"    (typeid = 23, len = 4, typmod = -1, byval = t)
            ----

Updated patch at the same URL:

    https://github.com/postgres/postgres/compare/master...bmomjian:key.diff

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee




Re: Proposed patch for key managment

From
Bruce Momjian
Date:
On Mon, Dec  7, 2020 at 11:03:30AM +0900, Michael Paquier wrote:
> On Sat, Dec 05, 2020 at 10:42:05AM -0500, Bruce Momjian wrote:
> > On Sat, Dec  5, 2020 at 09:54:33PM +0900, Michael Paquier wrote:
> >> On Fri, Dec 04, 2020 at 10:52:29PM -0500, Bruce Momjian wrote:
> >>>      if (state->evpctx == NULL)
> >>>      {
> >>> -        explicit_bzero(state, sizeof(pg_cryptohash_state));
> >>> -        explicit_bzero(ctx, sizeof(pg_cryptohash_ctx));
> >>>  #ifndef FRONTEND
> >>>          ereport(ERROR,
> >>>                  (errcode(ERRCODE_OUT_OF_MEMORY),
> >> 
> >> And this original position is IMO more correct.
> > 
> > Do we even need them?
> 
> That's the intention to clean up things in a consistent way.

Ah, I see your point.  Added:

    https://github.com/postgres/postgres/compare/master...bmomjian:key.diff

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee




Re: Proposed patch for key managment

From
Daniel Gustafsson
Date:
> On 2 Dec 2020, at 22:38, Bruce Momjian <bruce@momjian.us> wrote:
>
> Attached is a patch for key management, which will eventually be part of
> cluster file encryption (CFE), called TDE (Transparent Data Encryption)
> by Oracle.

The attackvector protected against seems to be operating systems users
(*without* any legitimate database access) gaining access to the data files.
Such an attacker would also gain access to postgresql.conf and therefore may
have the cluster passphrase command in case it's stored there.  That would
imply the attacker is likely (or perhaps better phrased "not entirely
unlikely") to be able to run that command and decrypt the data *iff* there is
no interactive/2fa aspect to the passphrase command.  Is that an accurate
interpretation?  Do Oracle TDE et.al use a similar setup where an database
restart require manual intervention?

Admittedly I haven't read the other threads on the topic so if it's discussed
somewhere else then I'd appreciate a whacking with a cluestick.

A few other comments from skimming the patch:

+  is data encryption keys, specifically keys zero and one.  Key zero is
+  the key uses to encrypt database heap and index files which are stored in
".. is the key used to .."?

+   matches the initdb-supplied passphrase.  If this check fails, and the
+   the server will refuse to start.
Seems like something is missing, perhaps just s/and the//?

+     <ulink url="https://tools.ietf.org/html/rfc2315">RFC2315</ulink>.</simpara>
Tiny nitpick: turns out that RFC 2315 (with a space) is the correct spelling.

+ * are read-only.  All wrapping and unwrapping key routines depends on
+ * the openssl library for now.
OpenSSL is a name so it should be cased as OpenSSL in text like this.

 #include <openssl/ssl.h>
+#include <openssl/conf.h>
Why do we need this header in be-secure-openssl.c?  There are no other changes
to that file?

+/* Define to 1 if you have the `OPENSSL_init_crypto' function. */
+#undef HAVE_OPENSSL_INIT_CRYPTO
This seems to be unused?

KmgrSaveCryptoKeys doesn't seem to be explicitly clamping down permissions on
the generated file, is that something we want for belt-and-suspender level
paranoia around keys? The same goes for read_one_keyfile etc.

Also, KmgrSaveCryptoKeys assumes that keys are stored in plain files which is
true for OpenSSL but won't necessarily be true for other crypto libraries.
Would it make sense to have an API in be-kmgr.c with the OpenSSL specifics in
be-kmgr-openssl.c like how the TLS backend support is written?  We have spent a
lot effort in making the backend not assume a particular TLS library, it seems
a shame to step away from that here with a tight coupling. (yes, I am biased)

The same goes for src/common/cipher.c which wraps calls in ifdefs, why not just
have an thin wrapper API which cipher-openssl.c implements?

+           case 'K':
+               file_encryption_keylen = atoi(optarg);
+               break;
atoi() will return zero on failing to parse the keylen, where 0 implies
"disabled".  Wouldn't it make sense to parse this with code which can't
silently fall back on disabling in case of users mistyping?

+    * Skip cryptographic keys. It's generally not good idea to copy the
".. not *a* good idea .."

+                       pg_log_fatal("cluser passphrase referenced %%R, but -R not specified");
s/cluser/cluster/  (there are few copy-pasteos of that elsewhere too)

+               elog(ERROR, "too many cryptographic kes");
s/kes/keys/

+#ifndef CIPHER_H
+#define CIPHER_H
The risk for headerguard collision with non-PG headers seem quite high, maybe
PG_CIPHER_H would be better?

I will try to dive in a bit deeper over the holidays.

cheers ./daniel


Re: Proposed patch for key managment

From
Stephen Frost
Date:
Greetings,

* Daniel Gustafsson (daniel@yesql.se) wrote:
> > On 2 Dec 2020, at 22:38, Bruce Momjian <bruce@momjian.us> wrote:
> > Attached is a patch for key management, which will eventually be part of
> > cluster file encryption (CFE), called TDE (Transparent Data Encryption)
> > by Oracle.
>
> The attackvector protected against seems to be operating systems users
> (*without* any legitimate database access) gaining access to the data files.

That isn't the only attack vector that this addresses (though it is one
that this is envisioned to help with- to wit, someone rsync'ing the DB
directory).  TDE also helps with traditional data at rest requirements,
in environments where you don't trust the storage layer to handle that
(for whatever reason), and it's at least imagined that backups with
pg_basebackup would also be encrypted, helping protect against backup
theft.

There's, further, certainly no shortage of folks asking for this.

> Such an attacker would also gain access to postgresql.conf and therefore may
> have the cluster passphrase command in case it's stored there.  That would
> imply the attacker is likely (or perhaps better phrased "not entirely
> unlikely") to be able to run that command and decrypt the data *iff* there is
> no interactive/2fa aspect to the passphrase command.  Is that an accurate
> interpretation?  Do Oracle TDE et.al use a similar setup where an database
> restart require manual intervention?

This is very much an 'it depends' as other products have different
capabilities in this area.  The most similar implementation to what is
being contemplated for PG today would be the big O's "tablespace" TDE,
which offers options of either "Password-based software keystores" (you
have to provide a password when you want to bring that tablespace
online), or "Auto-login software keystores" (there's a "system generated
password" that's used to encrypt the keystore, which seems to be
more-or-less the username and hostname of the system), or HSM based
options.  As such, a DB restart might require manual intervention (if
the keystore is password based, or an HSM that requires manual
involvement) or it might not (auto-login keystore of some kind).

> Admittedly I haven't read the other threads on the topic so if it's discussed
> somewhere else then I'd appreciate a whacking with a cluestick.

I'm sure it was, but hopefully the above helps you avoid having to dig
through the very (very (very)) long threads on this topic.

Thanks,

Stephen

Attachment

Re: Proposed patch for key managment

From
Bruce Momjian
Date:
On Fri, Dec  4, 2020 at 10:32:45PM -0500, Bruce Momjian wrote:
> I can break out the -R/file descriptor passing part as a separate patch,
> and have the ssl_passphrase_command use that, but that's the only part I
> know can be useful on its own.
> 
> Since the patch is large, I found a way to push the branch to git and
> how to make a download link that tracks whatever I push to the 'key'
> branch on my github account.  Here is the updated patch link:
> 
>     https://github.com/postgres/postgres/compare/master...bmomjian:key.diff

I have made some good progress on the patch.  I realized that pg_upgrade
can't just copy the keys from the old cluster --- they encrypt the user
heap/index files that are copied/linked by pg_upgrade, but also encrypt
the system tables, which initdb creates, so the keys have to be copied
at initdb bootstrap time --- I have added an option to do that.  I also
realized that pg_upgrade will be starting/stopping the server, so I need
to add an option to pg_upgrade to allow that prompting.  I can now
successfully pg_upgrade a cluster that uses cluster file encryption, and
keep the same keys.  All at the same URL.

In addition I have completed the command-line tool to allow changing of
the cluster passphrase, which applies over the first diff;  diff at:

    https://github.com/bmomjian/postgres/compare/key...bmomjian:key-alter.diff

My next task is to write a script for Yubikey authentication.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee




Re: Proposed patch for key managment

From
Neil Chen
Date:


Hi, everyone

I have read the patch and did some simple tests. I'm not entirely sure about some code segments; e.g.:

In the BootStrapKmgr() we generate a data encryption key by:
key = generate_crypto_key(file_encryption_keylen);

However, I found that the file_encryption_keylen is always 0 in bootstrap mode because there exitst another variable bootstrap_file_encryption_keylen in xlog.c and bootstrap.c.

We get the REL/WAL key by KmgrGetKey() call and it works like:
return (const CryptoKey *) &(KmgrShmem->intlKeys[id]);

But in bootstrap mode, the KmgrShmem are not assigned. So, if we want to use it to encrypt something in bootstrap mode, I suggest we make the following changes:
if ( in bootstrap mode)
return intlKeys[id]; // a static variable which contains key
else
reutrn (const CryptoKey *) &(KmgrShmem->intlKeys[id]);



--
There is no royal road to learning.
Highgo Software Co.

Re: Proposed patch for key managment

From
Bruce Momjian
Date:
On Wed, Dec  9, 2020 at 10:34:59PM +0100, Daniel Gustafsson wrote:
> > On 2 Dec 2020, at 22:38, Bruce Momjian <bruce@momjian.us> wrote:
> > 
> > Attached is a patch for key management, which will eventually be part of
> > cluster file encryption (CFE), called TDE (Transparent Data Encryption)
> > by Oracle.
> 
> The attackvector protected against seems to be operating systems users
> (*without* any legitimate database access) gaining access to the data files.
> Such an attacker would also gain access to postgresql.conf and therefore may
> have the cluster passphrase command in case it's stored there.  That would
> imply the attacker is likely (or perhaps better phrased "not entirely
> unlikely") to be able to run that command and decrypt the data *iff* there is
> no interactive/2fa aspect to the passphrase command.  Is that an accurate
> interpretation?  Do Oracle TDE et.al use a similar setup where an database
> restart require manual intervention?

I have updated the docs to say "read" access more clearly:

   The purpose of cluster file encryption is to prevent users with read
   access to the directories used to store database files from being able to
   access the data stored in those files.  For example, when using cluster
   file encryption, users who have read access to the cluster directories
   for backup purposes will not be able to read the data stored in the
   these files.

I already said "read access" in the later part of the paragraph, but
obviously it needs to be mentioned early too.  If we need more text
here, please let me know.

As far as someone hijacking the passphrase command, that is a serious
risk.  No matter how you do the authentication, even TFA, you are going
to have someone able to grab that passphrase as it comes out of the
script and passes to the server.  It might help to store the script and
postgres binaries in a more secure location than the data, but I am not
sure how realistic that is.  I can if you are using a NAS for data store
and have local binaries and passphrase script, that would be more secure
than putting the script on the NAS.  Is that something we should explain?
Should we explicity say that there is no protection against write
access?  What can someone with write access to PGDATA do if
postgresql.conf is not stored there?  I don't know.

> Admittedly I haven't read the other threads on the topic so if it's discussed
> somewhere else then I'd appreciate a whacking with a cluestick.
> 
> A few other comments from skimming the patch:
> 
> +  is data encryption keys, specifically keys zero and one.  Key zero is
> +  the key uses to encrypt database heap and index files which are stored in
> ".. is the key used to .."?

Fixed, thanks.

> +   matches the initdb-supplied passphrase.  If this check fails, and the
> +   the server will refuse to start.
> Seems like something is missing, perhaps just s/and the//?

Fixed.

> +     <ulink url="https://tools.ietf.org/html/rfc2315">RFC2315</ulink>.</simpara>
> Tiny nitpick: turns out that RFC 2315 (with a space) is the correct spelling.

Fixed.

> + * are read-only.  All wrapping and unwrapping key routines depends on
> + * the openssl library for now.
> OpenSSL is a name so it should be cased as OpenSSL in text like this.

Fixed, and fixed "depends".

>  #include <openssl/ssl.h>
> +#include <openssl/conf.h>
> Why do we need this header in be-secure-openssl.c?  There are no other changes
> to that file?

Not sure, removed, since it compiles fine without it.
> 
> +/* Define to 1 if you have the `OPENSSL_init_crypto' function. */
> +#undef HAVE_OPENSSL_INIT_CRYPTO
> This seems to be unused?

Agreed, removed.

> KmgrSaveCryptoKeys doesn't seem to be explicitly clamping down permissions on
> the generated file, is that something we want for belt-and-suspender level
> paranoia around keys? The same goes for read_one_keyfile etc.

Well, KmgrSaveCryptoKeys is calling BasicOpenFile, which is calling
BasicOpenFilePerm(fileName, fileFlags, pg_file_create_mode).  The
question is whether we should honor the pg_file_create_mode specified on
the data directory, or make it tighter for these keys.  The file is
already owner-rw-only by default:

    $ ls -l
    drwx------ 2 postgres postgres 4096 Dec 10 13:13 live/
    $ cd live/
    $ ls -l
    total 8
    -rw------- 1 postgres postgres 356 Dec 10 13:13 0
    -rw------- 1 postgres postgres 356 Dec 10 13:13 1

If the key files were mixed in with a bunch of other files of lesser
value, like with Apache, I could see not honoring it, but for this case,
since all the files are of equal value, I think just honoring it makes
sense.

> Also, KmgrSaveCryptoKeys assumes that keys are stored in plain files which is
> true for OpenSSL but won't necessarily be true for other crypto libraries.
> Would it make sense to have an API in be-kmgr.c with the OpenSSL specifics in
> be-kmgr-openssl.c like how the TLS backend support is written?  We have spent a
> lot effort in making the backend not assume a particular TLS library, it seems
> a shame to step away from that here with a tight coupling. (yes, I am biased)

I have no idea, sorry.  I am mostly using the excellent routines
Sawada-san used in the original patch.

> The same goes for src/common/cipher.c which wraps calls in ifdefs, why not just
> have an thin wrapper API which cipher-openssl.c implements?

Same.  I am open to it, sure, but I would need a patch.

> +           case 'K':
> +               file_encryption_keylen = atoi(optarg);
> +               break;
> atoi() will return zero on failing to parse the keylen, where 0 implies
> "disabled".  Wouldn't it make sense to parse this with code which can't
> silently fall back on disabling in case of users mistyping?

Good question, I found 43 references to atoi(optarg) in our code, so it
seems they all should be fixed, no?  Not sure I want to be different here.

> +    * Skip cryptographic keys. It's generally not good idea to copy the
> ".. not *a* good idea .."

Thank, fixed

> +                       pg_log_fatal("cluser passphrase referenced %%R, but -R not specified");
> s/cluser/cluster/  (there are few copy-pasteos of that elsewhere too)

All fixed, thanks.

> +               elog(ERROR, "too many cryptographic kes");
> s/kes/keys/

Fixed.
> 
> +#ifndef CIPHER_H
> +#define CIPHER_H
> The risk for headerguard collision with non-PG headers seem quite high, maybe
> PG_CIPHER_H would be better?

Agreed, fixed.

> I will try to dive in a bit deeper over the holidays.

Thanks.  The diff URL has all the updates.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee




Re: Proposed patch for key managment

From
Bruce Momjian
Date:
On Wed, Dec  9, 2020 at 05:18:37PM -0500, Stephen Frost wrote:
> Greetings,
> 
> * Daniel Gustafsson (daniel@yesql.se) wrote:
> > > On 2 Dec 2020, at 22:38, Bruce Momjian <bruce@momjian.us> wrote:
> > > Attached is a patch for key management, which will eventually be part of
> > > cluster file encryption (CFE), called TDE (Transparent Data Encryption)
> > > by Oracle.
> > 
> > The attackvector protected against seems to be operating systems users
> > (*without* any legitimate database access) gaining access to the data files.
> 
> That isn't the only attack vector that this addresses (though it is one
> that this is envisioned to help with- to wit, someone rsync'ing the DB
> directory).  TDE also helps with traditional data at rest requirements,
> in environments where you don't trust the storage layer to handle that
> (for whatever reason), and it's at least imagined that backups with
> pg_basebackup would also be encrypted, helping protect against backup
> theft.
> 
> There's, further, certainly no shortage of folks asking for this.

Can we flesh this out more in the docs?  Any idea on wording compared to
what I have?

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee




Re: Proposed patch for key managment

From
Bruce Momjian
Date:
On Thu, Dec 10, 2020 at 07:26:48PM +0800, Neil Chen wrote:
> 
> 
>     Hi, everyone
> 
>     I have read the patch and did some simple tests. I'm not entirely sure
>     about some code segments; e.g.:
> 
>     In the BootStrapKmgr() we generate a data encryption key by:
>     key = generate_crypto_key(file_encryption_keylen);
> 
>     However, I found that the file_encryption_keylen is always 0 in bootstrap
>     mode because there exitst another variable bootstrap_file_encryption_keylen
>     in xlog.c and bootstrap.c.

Oh, good point;  that is very helpful.  I was relying on SetConfigOption
to set file_encryption_keylen, but that happens _after_ we create the
keys, so they were zero length.  I have fixed this by passing
bootstrap_file_encryption_keylen to the boot routines.  The diff URL has
the fix:

    https://github.com/postgres/postgres/compare/master...bmomjian:key.diff

>     We get the REL/WAL key by KmgrGetKey() call and it works like:
>     return (const CryptoKey *) &(KmgrShmem->intlKeys[id]);
> 
>     But in bootstrap mode, the KmgrShmem are not assigned. So, if we want to
>     use it to encrypt something in bootstrap mode, I suggest we make the
>     following changes:
>     if ( in bootstrap mode)
>     return intlKeys[id]; // a static variable which contains key
>     else
>     reutrn (const CryptoKey *) &(KmgrShmem->intlKeys[id]);

Yes, you are also correct here.  I had not gotten to using KmgrGetKey
yet, but it clearly needs your suggestion, so have done that.

Thanks for your help.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee




Re: Proposed patch for key managment

From
Bruce Momjian
Date:
On Wed, Dec  9, 2020 at 08:40:50PM -0500, Bruce Momjian wrote:
> My next task is to write a script for Yubikey authentication.

I know Craig Ringer wanted Yubikey support, which allows two-factor
authentication, so I have added it to the most recent patch by adding a
cluster_passphrase_command %d/directory parameter:

    https://github.com/postgres/postgres/compare/master...bmomjian:key.diff

You can also store the PIN in a file, so you don't need a PIN to be
entered by the user for each server start.  Attached is the script I
with a PIN required.  Here is a session:

    $ initdb -K 256 -R -c '/u/postgres/tmp/pass_yubi.sh %R "%d"'
    The files belonging to this database system will be owned by user "postgres".
    This user must also own the server process.
    
    The database cluster will be initialized with locale "en_US.UTF-8".
    The default database encoding has accordingly been set to "UTF8".
    The default text search configuration will be set to "english".
    
    Data page checksums are disabled.
    Cluster file encryption is enabled.
    
    fixing permissions on existing directory /u/pgsql/data ... ok
    creating subdirectories ... ok
    selecting dynamic shared memory implementation ... posix
    selecting default max_connections ... 100
    selecting default shared_buffers ... 128MB
    selecting default time zone ... America/New_York
    creating configuration files ... ok
    running bootstrap script ...
    Enter Yubikey PIN:
    
    WARNING:  The Yubikey can be locked and require a reset if too many pin
    attempts fail.  It is recommended to run this command manually and save
    the passphrase in a secure location for possible recovery.
    
-->    Enter Yubikey PIN:
    ok
    performing post-bootstrap initialization ...
-->    Enter Yubikey PIN:
    ok
    syncing data to disk ... ok
    
    initdb: warning: enabling "trust" authentication for local connections
    You can change this by editing pg_hba.conf or using the option -A, or
    --auth-local and --auth-host, the next time you run initdb.
    
    Success. You can now start the database server using:
    
        pg_ctl -D /u/pgsql/data -l logfile start


    $ pg_ctl -R -l /u/pg/server.log start
    waiting for server to start...
-->    Enter Yubikey PIN:
     done
    server started

It even allows for passphrase rotation using my pg_altercpass tool with
this patch:

        https://github.com/bmomjian/postgres/compare/key...bmomjian:key-alter.diff

The Yubikey-encrypted passphrase is stored in the key directory, so the
encrypted passphrase stays with the data/WAL keys during passphrase
rotation:

    $ pg_altercpass -R '/u/postgres/tmp/pass_yubi.sh %R "%d"' '/u/postgres/tmp/pass_yubi.sh %R "%d"'
    
-->    Enter Yubikey PIN:
    
-->    Enter Yubikey PIN:
    
    WARNING:  The Yubikey can be locked and require a reset if too many pin
    attempts fail.  It is recommended to run this command manually and save
    the passphrase in a secure location for possible recovery.
    
-->    Enter Yubikey PIN:

Yubikey PIN rotation has to be done using the Yubikey tool, and data/WAL
key rotation has to be done via switching to a standby, which hasn't
been implemented yet.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee


Attachment

Re: Proposed patch for key managment

From
Bruce Momjian
Date:
On Fri, Dec 11, 2020 at 01:01:14PM -0500, Bruce Momjian wrote:
> On Wed, Dec  9, 2020 at 08:40:50PM -0500, Bruce Momjian wrote:
> > My next task is to write a script for Yubikey authentication.
> 
> I know Craig Ringer wanted Yubikey support, which allows two-factor
> authentication, so I have added it to the most recent patch by adding a
> cluster_passphrase_command %d/directory parameter:
> 
>     https://github.com/postgres/postgres/compare/master...bmomjian:key.diff
> 
> You can also store the PIN in a file, so you don't need a PIN to be
> entered by the user for each server start.

Here is the output without requiring a PIN;  attached is the script used:

    ++ initdb -K 256 -R -c '/u/postgres/tmp/pass_yubi_nopin.sh "%d"'
    The files belonging to this database system will be owned by user "postgres".
    This user must also own the server process.
    
    The database cluster will be initialized with locale "en_US.UTF-8".
    The default database encoding has accordingly been set to "UTF8".
    The default text search configuration will be set to "english".
    
    Data page checksums are disabled.
    Cluster file encryption is enabled.
    
    fixing permissions on existing directory /u/pgsql/data ... ok
    creating subdirectories ... ok
    selecting dynamic shared memory implementation ... posix
    selecting default max_connections ... 100
    selecting default shared_buffers ... 128MB
    selecting default time zone ... America/New_York
    creating configuration files ... ok
    running bootstrap script ... engine "pkcs11" set.
    
    WARNING:  The Yubikey can be locked and require a reset if too many pin
    attempts fail.  It is recommended to run this command manually and save
    the passphrase in a secure location for possible recovery.
    engine "pkcs11" set.
    ok
    performing post-bootstrap initialization ... engine "pkcs11" set.
    ok
    syncing data to disk ... ok
    
    initdb: warning: enabling "trust" authentication for local connections
    You can change this by editing pg_hba.conf or using the option -A, or
    --auth-local and --auth-host, the next time you run initdb.
    
    Success. You can now start the database server using:
    
        pg_ctl -D /u/pgsql/data -l logfile start
    
    ++ pg_ctl -R -l /u/pg/server.log start
    waiting for server to start... done
    server started
    ++ pg_altercpass -R '/u/postgres/tmp/pass_yubi_nopin.sh "%d"' '/u/postgres/tmp/pass_yubi_nopin.sh "%d"'
    engine "pkcs11" set.
    engine "pkcs11" set.
    
    WARNING:  The Yubikey can be locked and require a reset if too many pin
    attempts fail.  It is recommended to run this command manually and save
    the passphrase in a secure location for possible recovery.
    engine "pkcs11" set.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee


Attachment

Re: Proposed patch for key managment

From
Bruce Momjian
Date:
On Wed, Dec  2, 2020 at 04:38:14PM -0500, Bruce Momjian wrote:
> Attached is a patch for key management, which will eventually be part of
> cluster file encryption (CFE), called TDE (Transparent Data Encryption)
> by Oracle.  It is an update of Masahiko Sawada's patch from July 31:
> 
>     https://www.postgresql.org/message-id/CA+fd4k6RJwNvZTro3q2f5HSDd8HgyUc4CuY9U3e6Ran4C6TO4g@mail.gmail.com
> 
> Sawada-san did all the hard work, and I just redirected the patch.  The
> general outline of this CFE feature can be seen here:
> 
>     https://wiki.postgresql.org/wiki/Transparent_Data_Encryption
> 
> The currently planned progression for this feature is to allow secure
> retrieval of key encryption keys (KEK) outside of the database, then use
> those to encrypt data keys that encrypt heap/index/tmpfile files.
...
> If most people approve of this general approach, and the design
> decisions made, I would like to apply this in the next few weeks, but
> this brings complications.  The syntax added by this commit might not
> provide a useful feature until PG 15, so how do we hide it from users. 
> I was thinking of not applying the doc changes (or commenting them out)
> and commenting out the --help output.

I am getting close to applying these patches, probably this week.  The
patches are cumulative:

    https://github.com/postgres/postgres/compare/master...bmomjian:key.diff
    https://github.com/bmomjian/postgres/compare/key...bmomjian:key-alter.diff

I do have a few questions:

    Why is KmgrShmemData a struct, when it only has a single member?  Are
    all shared memory areas structs?
    
    Should pg_altercpass be using fsync's for directory renames?
    
    Can anyone test this on Windows, particularly -R handling?
    
    What testing infrastructure should this have?
    
    There are a few shell script I should include to show how to create
    commands.  Where should they be stored?  /contrib module?
    
    Are people okay with having the feature enabled, but invisible
    since the docs and --help output are missing?  When we enable
    ssl_passphrase_command to prompt from the terminal, some of the
    command-line options will be useful.

    Do people like the command-letter choices?

    I called the alter passphrase utility pg_altercpass.  I could
    have called it pg_clusterpass, but I wanted to highlight it is
    only for changing the passphrase, not for creating them.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee

 



Re: Proposed patch for key managment

From
Michael Paquier
Date:
On Mon, Dec 14, 2020 at 06:06:15PM -0500, Bruce Momjian wrote:
> I am getting close to applying these patches, probably this week.  The
> patches are cumulative:
>
>     https://github.com/postgres/postgres/compare/master...bmomjian:key.diff
>     https://github.com/bmomjian/postgres/compare/key...bmomjian:key-alter.diff

I strongly object to a commit based on the current state of the patch.
Based on my lookup of the patches you are referring to, I see a couple
of things that should be splitted up and refactored properly before
thinking about the core part of the patch (FWIW, I don't have much
thoughts to offer about the core part because I haven't really thought
about it, but it does not prevent to do a correct refactoring).  Here
are some notes:
- The code lacks a lot of comments IMO.  Why is retrieve_passphrase()
doing what it does?  It seems to me that pg_altercpass needs a large
brushup.
- There are no changes to src/tools/msvc/.  Seeing the diffs in
src/common/, this stuff breaks Windows builds.
- The HMAC split is terrible, as mentioned upthread.  I think that you
would need to extract this stuff as a separate patch, and not add more
mess to the existing mess (SCRAM uses its own HMAC with SHA256, which
is already wrong).  We can and should have a fallback implementation,
because that's easy to do.  And we need to have the fallback
implementation depend on the contents of cryptohash.c (in a
src/common/hmac.c), while the OpenSSL portion requires a
hmac_openssl.c where you can choose the hash type based on
pg_cryptohash_type.  So ossl_HMAC_SHA512() does not do the right
thing.  APIs flexible enough to allow a new SSL library to plug into
this stuff are required.
- Not much a fan of the changes done in cryptohash.c for the resource
owners as well.  It also feels like this could be thought as something
directly for resowner.c.
- The cipher split also should be done in its own patch, and reviewed
on its own.  There is a mix of dependencies between non-OpenSSL and
OpenSSL code paths, making the pluggin of a new SSL library harder to
do.  In short, this requires proper API design, which is not something
we have here.  There are also no changes in pgcrypto for that stuff.

> I do have a few questions:

That looks like a lot of things to sort out as well.

>     Can anyone test this on Windows, particularly -R handling?
>
>     What testing infrastructure should this have?

Using TAP tests would be adapted for those two points.

>     There are a few shell script I should include to show how to create
>     commands.  Where should they be stored?  /contrib module?

Why are these needed.  Is it a matter of documentation?

>     Are people okay with having the feature enabled, but invisible
>     since the docs and --help output are missing?  When we enable
>     ssl_passphrase_command to prompt from the terminal, some of the
>     command-line options will be useful.

You are suggesting to enable the feature by default, make it invisible
to the users and leave it undocumented?  Is there something I missing
here?

>     Do people like the command-letter choices?
>
>     I called the alter passphrase utility pg_altercpass.  I could
>     have called it pg_clusterpass, but I wanted to highlight it is
>     only for changing the passphrase, not for creating them.

I think that you should attach such patches directly to the emails
sent to pgsql-hackers, if those github links disappear for some
reason, it would become impossible to refer to see what has been
discussed here.

+/*
+ * We have to use postgres.h not postgres_fe.h here, because there's so much
+ * backend-only stuff in the kmgr include files we need.  But we need a
+ * frontend-ish environment otherwise.  Hence this ugly hack.
+ */
+#define FRONTEND 1
+
+#include "postgres.h"
I would advise to really understand why this happens and split up the
backend and frontend parts cleanly.  This style ought to be avoided as
much as possible.

@@ -95,9 +101,9 @@ pg_cryptohash_create(pg_cryptohash_type type)

    if (state->evpctx == NULL)
    {
+#ifndef FRONTEND
        explicit_bzero(state, sizeof(pg_cryptohash_state));
    explicit_bzero(ctx, sizeof(pg_cryptohash_ctx));
-#ifndef FRONTEND
I think that this change is incorrect.  Any clean up of memory should
be done for the frontend *and* the backend.

+#ifdef USE_OPENSSL
+       ctx = (PgCipherCtx *) palloc0(sizeof(PgCipherCtx));
+
+       ctx->encctx = ossl_cipher_ctx_create(cipher, key, klen, true);
+       ctx->decctx = ossl_cipher_ctx_create(cipher, key, klen, false);
+#endif
There is a cipher_openssl.c and a cipher.c that includes USE_OPENSSL.
This requires a cleaner split IMO.  We should avoid as much as
possible OpenSSL-specific code paths in files part of src/common/ when
not building with OpenSSL.  So this is now a mixed bag of
dependencies.
--
Michael

Attachment

Re: Proposed patch for key managment

From
Neil Chen
Date:
Hi, Bruce  

I read your question and here are some of my thoughts.

        Why is KmgrShmemData a struct, when it only has a single member?  Are
        all shared memory areas structs?
 
Yes, all areas created by ShmemInitStruct() are structs. I think the significance of using struct is that it delimits an "area"  to store the KMS related things, which makes our memory management more clear and unified.

        What testing infrastructure should this have?

        There are a few shell script I should include to show how to create
        commands.  Where should they be stored?  /contrib module?
 
        Are people okay with having the feature enabled, but invisible
        since the docs and --help output are missing? When we enable
        ssl_passphrase_command to prompt from the terminal, some of the
        command-line options will be useful.
 
Since our implementation is not in contrib, I don't think we should put the script there. Maybe we can refer to postgresql.conf.sample?  
Actually, I wonder whether we can add the UDK(user data encryption key) to the first version of KMS, which can be provided to plug-ins such as pgsodium. In this way, users can at least use it to a certain extent.

Also, I have tested some KMS functionalities by adding very simple TDE logic. In the meantime, I found some confusion in the following places:

1. Previously, we added a variable bootstrap_keys_wrap that is used for encryption during initdb. However, since we save the "wrapped" key, we need to use a global KEK that can be accessed in boot mode to unwrap it before use... I don't know if that's good. To make it simple, I modified the bootstrap_keys_wrap to store the "unwrapped" key so that the encryption function can get it correctly. (The variable name should be changed accordingly).


2. I tried to add support for AES_CTR mode, and the code for encrypting buffer blocks. In the process I found that in pg_cipher_ctx_create, the key length is declared as "byte". However, in the CryptoKey structure, the length is stored as "bit", which leads me to use a form similar to Key->klen / 8 when I call this function. Maybe we should unify the two to avoid unnecessary confusion.

3. This one is not a problem/bug. I have some doubts about the length of data encryption blocks. For the page, we do not encrypt the PageHeaderData, which is 192 bit long. By default, the page size is 8K(65536 bits), so the length of the data we want to encrypt is 65344 bits. This number can't be divisible by 128 and 192 keys and 256 bits long keys. Will this cause a problem?

Here is a simple patch that I wrote with references to Sawada's previous TDE works, hope it can help you.

Thanks.
--
There is no royal road to learning.
HighGo Software Co.
Attachment

Re: Proposed patch for key managment

From
Bruce Momjian
Date:
On Tue, Dec 15, 2020 at 10:05:49AM +0900, Michael Paquier wrote:
> On Mon, Dec 14, 2020 at 06:06:15PM -0500, Bruce Momjian wrote:
> > I am getting close to applying these patches, probably this week.  The
> > patches are cumulative:
> > 
> >     https://github.com/postgres/postgres/compare/master...bmomjian:key.diff
> >     https://github.com/bmomjian/postgres/compare/key...bmomjian:key-alter.diff
> 
> I strongly object to a commit based on the current state of the patch.
> Based on my lookup of the patches you are referring to, I see a couple
> of things that should be splitted up and refactored properly before
> thinking about the core part of the patch (FWIW, I don't have much
> thoughts to offer about the core part because I haven't really thought
> about it, but it does not prevent to do a correct refactoring).  Here
> are some notes:
> - The code lacks a lot of comments IMO.  Why is retrieve_passphrase()
> doing what it does?  It seems to me that pg_altercpass needs a large
> brushup.

Uh, pg_altercpass is a new file I wrote and almost every block has a
comment.  I added a few more, but can you be more specific?

> - There are no changes to src/tools/msvc/.  Seeing the diffs in
> src/common/, this stuff breaks Windows builds.

OK, done in patch at URL.

> - The HMAC split is terrible, as mentioned upthread.  I think that you
> would need to extract this stuff as a separate patch, and not add more
> mess to the existing mess (SCRAM uses its own HMAC with SHA256, which
> is already wrong).  We can and should have a fallback implementation,
> because that's easy to do.  And we need to have the fallback
> implementation depend on the contents of cryptohash.c (in a
> src/common/hmac.c), while the OpenSSL portion requires a
> hmac_openssl.c where you can choose the hash type based on
> pg_cryptohash_type.  So ossl_HMAC_SHA512() does not do the right
> thing.  APIs flexible enough to allow a new SSL library to plug into
> this stuff are required.
> - Not much a fan of the changes done in cryptohash.c for the resource
> owners as well.  It also feels like this could be thought as something
> directly for resowner.c.
> - The cipher split also should be done in its own patch, and reviewed
> on its own.  There is a mix of dependencies between non-OpenSSL and
> OpenSSL code paths, making the pluggin of a new SSL library harder to
> do.  In short, this requires proper API design, which is not something
> we have here.  There are also no changes in pgcrypto for that stuff.

I am going to need someone to help me make these changes.  I don't feel
I know enough about the crypto API to do it, and it will take me 1+ week
to learn it.

> > I do have a few questions:
> 
> That looks like a lot of things to sort out as well.
> 
> >     Can anyone test this on Windows, particularly -R handling?
> >     
> >     What testing infrastructure should this have?
> 
> Using TAP tests would be adapted for those two points.

OK, I will try that.

> >     There are a few shell script I should include to show how to create
> >     commands.  Where should they be stored?  /contrib module?
> 
> Why are these needed.  Is it a matter of documentation?

I have posted some of the scripts.  They involved complex shell
scripting that I doubt the average user can do.  The scripts are for
prompting for a passphrase from the user's terminal, or using the
Yubikey, with our without typing a pin.  I can put them in the docs and
then users can copy them into a file.  Is that the preferred method?

> >     Are people okay with having the feature enabled, but invisible
> >     since the docs and --help output are missing?  When we enable
> >     ssl_passphrase_command to prompt from the terminal, some of the
> >     command-line options will be useful.
> 
> You are suggesting to enable the feature by default, make it invisible
> to the users and leave it undocumented?  Is there something I missing
> here?

The point is that the command-line options will be active, but will not
be documented.  It will not do anything on its own unless you specify
that command-line option.  I can comment-out the command-line options
from being active but the code it going to look very messy.

> >     Do people like the command-letter choices?
> > 
> >     I called the alter passphrase utility pg_altercpass.  I could
> >     have called it pg_clusterpass, but I wanted to highlight it is
> >     only for changing the passphrase, not for creating them.
> 
> I think that you should attach such patches directly to the emails
> sent to pgsql-hackers, if those github links disappear for some
> reason, it would become impossible to refer to see what has been
> discussed here.

Well, the patches are changing frequently.  I am attaching a version to
this email.

> +/*
> + * We have to use postgres.h not postgres_fe.h here, because there's so much
> + * backend-only stuff in the kmgr include files we need.  But we need a
> + * frontend-ish environment otherwise.  Hence this ugly hack.
> + */
> +#define FRONTEND 1
> +
> +#include "postgres.h"
> I would advise to really understand why this happens and split up the
> backend and frontend parts cleanly.  This style ought to be avoided as
> much as possible.

Uh, I got this code from pg_resetwal.c, which does something similar to
pg_altercpass.

> @@ -95,9 +101,9 @@ pg_cryptohash_create(pg_cryptohash_type type)
> 
>     if (state->evpctx == NULL)
>     {
> +#ifndef FRONTEND
>         explicit_bzero(state, sizeof(pg_cryptohash_state));
>     explicit_bzero(ctx, sizeof(pg_cryptohash_ctx));
> -#ifndef FRONTEND
> I think that this change is incorrect.  Any clean up of memory should
> be done for the frontend *and* the backend.

Oh, good point.  Fixed.

> +#ifdef USE_OPENSSL
> +       ctx = (PgCipherCtx *) palloc0(sizeof(PgCipherCtx));
> +
> +       ctx->encctx = ossl_cipher_ctx_create(cipher, key, klen, true);
> +       ctx->decctx = ossl_cipher_ctx_create(cipher, key, klen, false);
> +#endif
> There is a cipher_openssl.c and a cipher.c that includes USE_OPENSSL.
> This requires a cleaner split IMO.  We should avoid as much as
> possible OpenSSL-specific code paths in files part of src/common/ when
> not building with OpenSSL.  So this is now a mixed bag of
> dependencies.

Again, I need help here.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee


Attachment

Re: Proposed patch for key managment

From
Bruce Momjian
Date:
On Tue, Dec 15, 2020 at 10:36:56AM +0800, Neil Chen wrote:
> Hi, Bruce  
> 
> I read your question and here are some of my thoughts.
> 
> 
>             Why is KmgrShmemData a struct, when it only has a single member? 
>     Are
>             all shared memory areas structs?
> 
>  
> Yes, all areas created by ShmemInitStruct() are structs. I think the
> significance of using struct is that it delimits an "area"  to store the KMS
> related things, which makes our memory management more clear and unified.

OK, thanks, that helps.

>             What testing infrastructure should this have?
> 
>             There are a few shell script I should include to show how to create
>             commands.  Where should they be stored?  /contrib module?
> 
>      
> 
>             Are people okay with having the feature enabled, but invisible
>             since the docs and --help output are missing? When we enable
>             ssl_passphrase_command to prompt from the terminal, some of the
>             command-line options will be useful.
> 
>  
> Since our implementation is not in contrib, I don't think we should put the
> script there. Maybe we can refer to postgresql.conf.sample?  

Uh, the script are 20-60 lines long --- I am attaching them to this
email.  Plus, when we allow user prompting for the SSL passphrase, we
will have another script, or maybe three mor if people want to use a
Yubikey to unlock the SSL passphrase.

> Actually, I wonder whether we can add the UDK(user data encryption key) to the
> first version of KMS, which can be provided to plug-ins such as pgsodium. In
> this way, users can at least use it to a certain extent.

I don't thinK I want to get into that at this point.  It can be done
later.

> Also, I have tested some KMS functionalities by adding very simple TDE logic.
> In the meantime, I found some confusion in the following places:

OK, I know Cybertec has a TDE patch too.

> 1. Previously, we added a variable bootstrap_keys_wrap that is used for
> encryption during initdb. However, since we save the "wrapped" key, we need to
> use a global KEK that can be accessed in boot mode to unwrap it before use... I
> don't know if that's good. To make it simple, I modified the
> bootstrap_keys_wrap to store the "unwrapped" key so that the encryption
> function can get it correctly. (The variable name should be changed
> accordingly).

I see what you are saying.  We store the wrapped in bootstrap mode, but
the unwrapped in normal mode.  There is also the case of when we copy
the keys from an old cluster.  I will work on a patch tomorrow and
report back here.

> 2. I tried to add support for AES_CTR mode, and the code for encrypting buffer
> blocks. In the process I found that in pg_cipher_ctx_create, the key length is
> declared as "byte". However, in the CryptoKey structure, the length is stored
> as "bit", which leads me to use a form similar to Key->klen / 8 when I call
> this function. Maybe we should unify the two to avoid unnecessary confusion.

Agreed.  I will look at that too tomorrow.

> 3. This one is not a problem/bug. I have some doubts about the length of data
> encryption blocks. For the page, we do not encrypt the PageHeaderData, which is
> 192 bit long. By default, the page size is 8K(65536 bits), so the length of the
> data we want to encrypt is 65344 bits. This number can't be divisible by 128
> and 192 keys and 256 bits long keys. Will this cause a problem?

Since we are using CTR mode for everything, it should be fine.  We
encrypt WAL as it is written.

> Here is a simple patch that I wrote with references to Sawada's previous TDE
> works, hope it can help you.

Thanks.  I will work on the items you mentioned.  Can you look at the
Cybertec patch and then our wiki to see what needs to be done next?

    https://wiki.postgresql.org/wiki/Transparent_Data_Encryption

Thanks for getting a proof-of-concept patch out there.  :-)

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee


Attachment

Re: Proposed patch for key managment

From
Michael Paquier
Date:
On Mon, Dec 14, 2020 at 10:19:02PM -0500, Bruce Momjian wrote:
> On Tue, Dec 15, 2020 at 10:05:49AM +0900, Michael Paquier wrote:
>> - The HMAC split is terrible, as mentioned upthread.  I think that you
>> would need to extract this stuff as a separate patch, and not add more
>> mess to the existing mess (SCRAM uses its own HMAC with SHA256, which
>> is already wrong).  We can and should have a fallback implementation,
>> because that's easy to do.  And we need to have the fallback
>> implementation depend on the contents of cryptohash.c (in a
>> src/common/hmac.c), while the OpenSSL portion requires a
>> hmac_openssl.c where you can choose the hash type based on
>> pg_cryptohash_type.  So ossl_HMAC_SHA512() does not do the right
>> thing.  APIs flexible enough to allow a new SSL library to plug into
>> this stuff are required.
>> - Not much a fan of the changes done in cryptohash.c for the resource
>> owners as well.  It also feels like this could be thought as something
>> directly for resowner.c.
>> - The cipher split also should be done in its own patch, and reviewed
>> on its own.  There is a mix of dependencies between non-OpenSSL and
>> OpenSSL code paths, making the pluggin of a new SSL library harder to
>> do.  In short, this requires proper API design, which is not something
>> we have here.  There are also no changes in pgcrypto for that stuff.
>
> I am going to need someone to help me make these changes.  I don't feel
> I know enough about the crypto API to do it, and it will take me 1+ week
> to learn it.

I think that designing a correct set of APIs that can be plugged with
any SSL library is the correct move in the long term.  I have on my
agenda to clean up HMAC as SCRAM uses that with SHA256 and you would
use that with SHA512.  Daniel has mentioned that he has been touching
this area, and I also got a patch halfly done though pgcrypto needs
some extra thoughts.  So this is still WIP but you could reuse that
here.

> Uh, I got this code from pg_resetwal.c, which does something similar to
> pg_altercpass.

Yeah, that's actually the part where it is a bad idea to just copy
this pattern.  pg_resetwal should not do that in the long term in my
opinion, but nobody has come to clean up this stuff.  (- -;)

>> +#ifdef USE_OPENSSL
>> +       ctx = (PgCipherCtx *) palloc0(sizeof(PgCipherCtx));
>> +
>> +       ctx->encctx = ossl_cipher_ctx_create(cipher, key, klen, true);
>> +       ctx->decctx = ossl_cipher_ctx_create(cipher, key, klen, false);
>> +#endif
>> There is a cipher_openssl.c and a cipher.c that includes USE_OPENSSL.
>> This requires a cleaner split IMO.  We should avoid as much as
>> possible OpenSSL-specific code paths in files part of src/common/ when
>> not building with OpenSSL.  So this is now a mixed bag of
>> dependencies.
>
> Again, I need help here.

My take would be to try to sort out the HMAC part first, then look at
the ciphers.  One step at a time.
--
Michael

Attachment

Re: Proposed patch for key managment

From
Bruce Momjian
Date:
On Tue, Dec 15, 2020 at 02:20:33PM +0900, Michael Paquier wrote:
> > Uh, I got this code from pg_resetwal.c, which does something similar to
> > pg_altercpass.
> 
> Yeah, that's actually the part where it is a bad idea to just copy
> this pattern.  pg_resetwal should not do that in the long term in my
> opinion, but nobody has come to clean up this stuff.  (- -;)

Glad you asked about this.  It turns out pg_altercpass works fine with
postgres_fe.h, so I will now use that.  pg_resetwal still can't use it
though.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee




Re: Proposed patch for key managment

From
Bruce Momjian
Date:
On Mon, Dec 14, 2020 at 11:16:18PM -0500, Bruce Momjian wrote:
> > 1. Previously, we added a variable bootstrap_keys_wrap that is used for
> > encryption during initdb. However, since we save the "wrapped" key, we need to
> > use a global KEK that can be accessed in boot mode to unwrap it before use... I
> > don't know if that's good. To make it simple, I modified the
> > bootstrap_keys_wrap to store the "unwrapped" key so that the encryption
> > function can get it correctly. (The variable name should be changed
> > accordingly).
> 
> I see what you are saying.  We store the wrapped in bootstrap mode, but
> the unwrapped in normal mode.  There is also the case of when we copy
> the keys from an old cluster.  I will work on a patch tomorrow and
> report back here.

I had not considered that we need the date keys available in bootstrap
mode, even if we copied them from another cluster during pg_upgrade.  I
have updated the diff URLs and attaching a patch showing the changes I
made. Basically, I had to separate BootStrapKmgr() into sections:

1.  copy or create an empty live key directory
2.  get the pass phrase
3.  populate the live key directory if we didn't copy it
4.  decrypt they keys into a file-scoped variable

Thanks for showing me this missing feature.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee


Attachment

Re: Proposed patch for key managment

From
Bruce Momjian
Date:
On Tue, Dec 15, 2020 at 10:36:56AM +0800, Neil Chen wrote:
> 2. I tried to add support for AES_CTR mode, and the code for encrypting buffer
> blocks. In the process I found that in pg_cipher_ctx_create, the key length is
> declared as "byte". However, in the CryptoKey structure, the length is stored
> as "bit", which leads me to use a form similar to Key->klen / 8 when I call
> this function. Maybe we should unify the two to avoid unnecessary confusion.

Yes, I would also like to get opinions on this.  We certainly have to
have the key length be in _bit_ units when visible by users, but I see a
lot of cases where we allocate arrays based on bytes.  I am unclear
where the proper units should be.  At a minimum, we should specify the
units in the function parameter names.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee




Re: Proposed patch for key managment

From
Stephen Frost
Date:
Greetings,

* Bruce Momjian (bruce@momjian.us) wrote:
> On Tue, Dec 15, 2020 at 10:05:49AM +0900, Michael Paquier wrote:
> > On Mon, Dec 14, 2020 at 06:06:15PM -0500, Bruce Momjian wrote:
> > > I am getting close to applying these patches, probably this week.  The
> > > patches are cumulative:
> > >
> > >     https://github.com/postgres/postgres/compare/master...bmomjian:key.diff
> > >     https://github.com/bmomjian/postgres/compare/key...bmomjian:key-alter.diff
> >
> > I strongly object to a commit based on the current state of the patch.

Yeah, I agree that this could use some work though I don't think it's
all that far away from being something we can get in, to allow us to
move forward with the other work around supporting TDE.

> > >     There are a few shell script I should include to show how to create
> > >     commands.  Where should they be stored?  /contrib module?
> >
> > Why are these needed.  Is it a matter of documentation?
>
> I have posted some of the scripts.  They involved complex shell
> scripting that I doubt the average user can do.  The scripts are for
> prompting for a passphrase from the user's terminal, or using the
> Yubikey, with our without typing a pin.  I can put them in the docs and
> then users can copy them into a file.  Is that the preferred method?

If we are going to include these in core anywhere, I would think a new
directory under contrib would make the most sense.  I'd hope that we
could find a way to automate the testing of them though, so that we have
some idea when they break because otherwise I'd be concerned that
they'll break somewhere down the line and we won't notice for quite a
while.

This doesn't seem like something that would make sense to only have in
the documentation, which isn't a terribly convenient way to make use of
them.

> > >     Are people okay with having the feature enabled, but invisible
> > >     since the docs and --help output are missing?  When we enable
> > >     ssl_passphrase_command to prompt from the terminal, some of the
> > >     command-line options will be useful.
> >
> > You are suggesting to enable the feature by default, make it invisible
> > to the users and leave it undocumented?  Is there something I missing
> > here?
>
> The point is that the command-line options will be active, but will not
> be documented.  It will not do anything on its own unless you specify
> that command-line option.  I can comment-out the command-line options
> from being active but the code it going to look very messy.

I'm a bit concerned with what's being contemplated here..  Ideally,
we'll actually get everything committed for v14 but if we don't and this
doesn't serve any use-case then I'm not sure that it should be
included in the release.  I also don't like committing and reverting
things either, but having command line options that aren't documented
but which exist in the code isn't great either, nor is having random
code commented out..

> > >     Do people like the command-letter choices?
> > >
> > >     I called the alter passphrase utility pg_altercpass.  I could
> > >     have called it pg_clusterpass, but I wanted to highlight it is
> > >     only for changing the passphrase, not for creating them.
> >
> > I think that you should attach such patches directly to the emails
> > sent to pgsql-hackers, if those github links disappear for some
> > reason, it would become impossible to refer to see what has been
> > discussed here.
>
> Well, the patches are changing frequently.  I am attaching a version to
> this email.

I agree with having the patches posted to the list.  I don't really
agree with the argument of "they change frequently".

* Michael Paquier (michael@paquier.xyz) wrote:
> On Mon, Dec 14, 2020 at 10:19:02PM -0500, Bruce Momjian wrote:
> > On Tue, Dec 15, 2020 at 10:05:49AM +0900, Michael Paquier wrote:
> >> - The cipher split also should be done in its own patch, and reviewed
> >> on its own.  There is a mix of dependencies between non-OpenSSL and
> >> OpenSSL code paths, making the pluggin of a new SSL library harder to
> >> do.  In short, this requires proper API design, which is not something
> >> we have here.  There are also no changes in pgcrypto for that stuff.
> >
> > I am going to need someone to help me make these changes.  I don't feel
> > I know enough about the crypto API to do it, and it will take me 1+ week
> > to learn it.
>
> I think that designing a correct set of APIs that can be plugged with
> any SSL library is the correct move in the long term.  I have on my
> agenda to clean up HMAC as SCRAM uses that with SHA256 and you would
> use that with SHA512.  Daniel has mentioned that he has been touching
> this area, and I also got a patch halfly done though pgcrypto needs
> some extra thoughts.  So this is still WIP but you could reuse that
> here.

Yeah, looking at what's been done there seems like the right approach to
me as well, ideally we'd have one set of APIs that'll support all these
use cases (not unlike what curl does..).

The other thought I had here off-hand was that we might want to randomly
generate a key during startup and have that available and use it for
anything that's temporary and which is going to get blown away on a
restart, I've been thinking that doing that might allow us to have a
relatively simpler API for transient/temporary files too.

Thanks,

Stephen

Attachment

Re: Proposed patch for key managment

From
Bruce Momjian
Date:
On Tue, Dec 15, 2020 at 02:20:33PM +0900, Michael Paquier wrote:
> On Mon, Dec 14, 2020 at 10:19:02PM -0500, Bruce Momjian wrote:
> > I am going to need someone to help me make these changes.  I don't feel
> > I know enough about the crypto API to do it, and it will take me 1+ week
> > to learn it.
> 
> I think that designing a correct set of APIs that can be plugged with
> any SSL library is the correct move in the long term.  I have on my
> agenda to clean up HMAC as SCRAM uses that with SHA256 and you would
> use that with SHA512.  Daniel has mentioned that he has been touching
> this area, and I also got a patch halfly done though pgcrypto needs
> some extra thoughts.  So this is still WIP but you could reuse that
> here.

I thought this was going to be a huge job, but once I looked at it, it
was clear exactly what you were saying.  Comparing cryptohash.c and
cryptohash_openssl.c I saw exactly what you wanted, and I think I have
completed it in the attached patch.  cryptohash.c implemented the hash
in C code if OpenSSL is not present --- I assume you didn't want me to
do that, but rather to split the API so it was easy to add another
implementation.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee


Attachment

Re: Proposed patch for key managment

From
Bruce Momjian
Date:
On Tue, Dec 15, 2020 at 02:09:09PM -0500, Stephen Frost wrote:
> Greetings,
> 
> * Bruce Momjian (bruce@momjian.us) wrote:
> > On Tue, Dec 15, 2020 at 10:05:49AM +0900, Michael Paquier wrote:
> > > On Mon, Dec 14, 2020 at 06:06:15PM -0500, Bruce Momjian wrote:
> > > > I am getting close to applying these patches, probably this week.  The
> > > > patches are cumulative:
> > > > 
> > > >     https://github.com/postgres/postgres/compare/master...bmomjian:key.diff
> > > >     https://github.com/bmomjian/postgres/compare/key...bmomjian:key-alter.diff
> > > 
> > > I strongly object to a commit based on the current state of the patch.
> 
> Yeah, I agree that this could use some work though I don't think it's
> all that far away from being something we can get in, to allow us to
> move forward with the other work around supporting TDE.

Glad to hear that.  Michael Paquier was right that the common/cipher*.c
API was wrong and should not be committed until fixed, which I think it
has been.  If there are other messy parts of this patch, I would like to
fix those too.

> > I have posted some of the scripts.  They involved complex shell
> > scripting that I doubt the average user can do.  The scripts are for
> > prompting for a passphrase from the user's terminal, or using the
> > Yubikey, with our without typing a pin.  I can put them in the docs and
> > then users can copy them into a file.  Is that the preferred method?
> 
> If we are going to include these in core anywhere, I would think a new
> directory under contrib would make the most sense.  I'd hope that we
> could find a way to automate the testing of them though, so that we have
> some idea when they break because otherwise I'd be concerned that
> they'll break somewhere down the line and we won't notice for quite a
> while.

I am doing automated testing here, but I have a Yubikey.  Michael
Paquier recommended TAP tests, I guess like we do for pg_upgrade, and I
will look into that.

> This doesn't seem like something that would make sense to only have in
> the documentation, which isn't a terribly convenient way to make use of
> them.

Yeah, it is hard to cut-paste 60 lines.  The /contrib directory would be
for SSL and cluster file encryption passphrase control, so it should
have more general usefulness.  Would those file be copied to the install
directory somewhere if you install /contrib?  Do we have any cases of
this?

> > The point is that the command-line options will be active, but will not
> > be documented.  It will not do anything on its own unless you specify
> > that command-line option.  I can comment-out the command-line options
> > from being active but the code it going to look very messy.
> 
> I'm a bit concerned with what's being contemplated here..  Ideally,
> we'll actually get everything committed for v14 but if we don't and this
> doesn't serve any use-case then I'm not sure that it should be
> included in the release.  I also don't like committing and reverting
> things either, but having command line options that aren't documented
> but which exist in the code isn't great either, nor is having random
> code commented out..

Agreed.  Once we use this code for SSL passphrase prompting, many of the
options will actually have usefulness.  What we could do is to not hide
anything, including the docs, and then hide them once we are ready to
start beta.  At that point, maybe putting in the #ifdefs will be
acceptable, since we would not be working on the feature until we
branch, and then we can just remove them again.  We had similar issues
with the Win32 port, but that had fewer visible knobs.

> > > I think that you should attach such patches directly to the emails
> > > sent to pgsql-hackers, if those github links disappear for some
> > > reason, it would become impossible to refer to see what has been
> > > discussed here.
> > 
> > Well, the patches are changing frequently.  I am attaching a version to
> > this email.
> 
> I agree with having the patches posted to the list.  I don't really
> agree with the argument of "they change frequently".

Sure, they are only 35k compressed.  My point is that I am modifying my
git tree all day, and with github, I can easily push the changes and
when someone looks at the diff, they see the most recent version.

> > I think that designing a correct set of APIs that can be plugged with
> > any SSL library is the correct move in the long term.  I have on my
> > agenda to clean up HMAC as SCRAM uses that with SHA256 and you would
> > use that with SHA512.  Daniel has mentioned that he has been touching
> > this area, and I also got a patch halfly done though pgcrypto needs
> > some extra thoughts.  So this is still WIP but you could reuse that
> > here.
> 
> Yeah, looking at what's been done there seems like the right approach to
> me as well, ideally we'd have one set of APIs that'll support all these
> use cases (not unlike what curl does..).

I think I accomplished this in a patch I just posted.

> The other thought I had here off-hand was that we might want to randomly
> generate a key during startup and have that available and use it for
> anything that's temporary and which is going to get blown away on a
> restart, I've been thinking that doing that might allow us to have a
> relatively simpler API for transient/temporary files too.

That is easy to do because of the design.  If you want it done now, I
can to it --- just let me know.

I want to thank everyone for the very helpful feedback I have received
since posting the first version of this patch.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee




Re: Proposed patch for key managment

From
Alastair Turner
Date:
Hi Bruce et al

Firstly, thanks for shaping the patch, getting it down to a manageable scope of cluster file encryption. I think this is a great feature and it matters to a lot of the customers I talk to at VMware about adopting Postgres.

Since it's exciting stuff, I've been trying to lash together a PoC integration with the crypto infrastructure we see at these customers. Unfortunately, in short, the API doesn't seem to suit integration with HSM big iron, like Thales, Utimaco, (other options are available), or their cloudy lookalikes.

I understand the model of wrapping the Data Encryption Key and storing the wrapped key with the encrypted data. The thing I can't find support for in your patch is passing a wrapped DEK to an external system for decryption. A key feature of these key management approaches is that the application handling the encrypted data doesn't get the KEK, the HSM/KSM passes the DEK back after unwrapping it. Google have a neat illustration of their approach, which is very similar to others, at https://cloud.google.com/kms/docs/envelope-encryption#how_to_decrypt_data_using_envelope_encryption

So instead of calling out to a passphrase command which returns input for a PBKDF, Postgres (in the form of initdb or postmaster) should be passing the wrapped DEK and getting back the DEK in plain. The value passed from Postgres to the external process doesn't even have to be a wrapped DEK, it could be a key in the key->value sense, for use in a lookup against Vault, CredHub or the Kubernetes secret store. Making the stored keys opaque to the Postgres processes and pushing the task of turning them into cryptographic material to an external hepler process probably wouldn't shrink the patch overall, but it would move a lot of code from core processes into the helper. Maybe that helper should live in contrib, as discussed below, where it would hopefully be joined by a bridge for KMIP and others.

On Tue, 15 Dec 2020 at 19:09, Stephen Frost <sfrost@snowman.net> wrote:
Greetings,

* Bruce Momjian (bruce@momjian.us) wrote:
> On Tue, Dec 15, 2020 at 10:05:49AM +0900, Michael Paquier wrote:
> > On Mon, Dec 14, 2020 at 06:06:15PM -0500, Bruce Momjian wrote:
 <snip>
> > >   There are a few shell script I should include to show how to create
> > >   commands.  Where should they be stored?  /contrib module?
> >
> > Why are these needed.  Is it a matter of documentation?
>
> I have posted some of the scripts.  They involved complex shell
> scripting that I doubt the average user can do.  The scripts are for
> prompting for a passphrase from the user's terminal, or using the
> Yubikey, with our without typing a pin.  I can put them in the docs and
> then users can copy them into a file.  Is that the preferred method?

If we are going to include these in core anywhere, I would think a new
directory under contrib would make the most sense.  I'd hope that we
could find a way to automate the testing of them though, so that we have
some idea when they break because otherwise I'd be concerned that
they'll break somewhere down the line and we won't notice for quite a
while.

Regards

Alastair 

Re: Proposed patch for key managment

From
Bruce Momjian
Date:
On Tue, Dec 15, 2020 at 11:13:12PM +0000, Alastair Turner wrote:
> Since it's exciting stuff, I've been trying to lash together a PoC integration
> with the crypto infrastructure we see at these customers. Unfortunately, in
> short, the API doesn't seem to suit integration with HSM big iron, like Thales,
> Utimaco, (other options are available), or their cloudy lookalikes.
> 
> I understand the model of wrapping the Data Encryption Key and storing the
> wrapped key with the encrypted data. The thing I can't find support for in your
> patch is passing a wrapped DEK to an external system for decryption. A key
> feature of these key management approaches is that the application handling the
> encrypted data doesn't get the KEK, the HSM/KSM passes the DEK back after
> unwrapping it. Google have a neat illustration of their approach, which is very
> similar to others, at https://cloud.google.com/kms/docs/envelope-encryption#
> how_to_decrypt_data_using_envelope_encryption
> 
> So instead of calling out to a passphrase command which returns input for a
> PBKDF, Postgres (in the form of initdb or postmaster) should be passing the
> wrapped DEK and getting back the DEK in plain. The value passed from Postgres
> to the external process doesn't even have to be a wrapped DEK, it could be a
> key in the key->value sense, for use in a lookup against Vault, CredHub or the
> Kubernetes secret store. Making the stored keys opaque to the Postgres
> processes and pushing the task of turning them into cryptographic material to
> an external hepler process probably wouldn't shrink the patch overall, but it
> would move a lot of code from core processes into the helper. Maybe that helper
> should live in contrib, as discussed below, where it would hopefully be joined
> by a bridge for KMIP and others.

Well, I think we can go two ways with this.  First, if you look at my
attached Yubikey script, you will see that, at boot time, since
$DIR/yubipass.key does not exit, the script generates a passphrase,
wraps it with the Yubikey's PIV #3 public key, and stores it in the live
key directory.  The next time it is called, it decrypts the wrapped
passphrase using the Yubikey, and uses that as the KEK to decrypt the
two DEKs.  Now, you could modify the script to call the HSM at boot time
to retrieve a DEK wrapped in a KEK, and store it in the key directory. 
On all future calls, you can pass the DEK wrapped by KEK to the HSM, and
use the returned DEK as the KEK/passphrase to decrypt the real DEK.  The
big value of this is that the keys are validated, and it uses the
existing API.  Ideally we write a sample script of how to this and
include it in /contrib.

The second approach is to make a new API for what you want.  It would be
similar to the cluster_passphrase_command, but it would be simple in
some ways, but complex in others since we need at least two DEKs, and
key rotation might be very risky since there isn't validation in the
server.  I don't think this can be accomplished by a contrib module, but
would actually be easy to implement, but perhaps hard to document
because the user API might be tricky.

I think my big question is about your sentence, "A key feature of these
key management approaches is that the application handling the
encrypted data doesn't get the KEK, the HSM/KSM passes the DEK back
after unwrapping it."  It is less secure for the HSM to return a KEK
rather than a DEK?  I can't see why it would be.  The application never
sees the KEK used to wrap the DEK it has stored in the file system,
though that DEK is actually used as a passphrase by Postgres.  This is
the same with the Yubikey --- Postgres never sees the private key used
to unlock what it locked with the Yubikey public key.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee


Attachment

Re: Proposed patch for key managment

From
Michael Paquier
Date:
Hi Stephen,

On Tue, Dec 15, 2020 at 02:09:09PM -0500, Stephen Frost wrote:
> Yeah, looking at what's been done there seems like the right approach to
> me as well, ideally we'd have one set of APIs that'll support all these
> use cases (not unlike what curl does..).

Ooh...  This is interesting.  What did curl do wrong here?  It is
always good to hear from such past experiences.
--
Michael

Attachment

Re: Proposed patch for key managment

From
Michael Paquier
Date:
On Tue, Dec 15, 2020 at 04:02:12PM -0500, Bruce Momjian wrote:
> I thought this was going to be a huge job, but once I looked at it, it
> was clear exactly what you were saying.  Comparing cryptohash.c and
> cryptohash_openssl.c I saw exactly what you wanted, and I think I have
> completed it in the attached patch.  cryptohash.c implemented the hash
> in C code if OpenSSL is not present --- I assume you didn't want me to
> do that, but rather to split the API so it was easy to add another
> implementation.

Hmm.  I don't think that this is right.  First, this still mixes
ciphers and HMAC.  Second, it is only possible here to use HMAC with
SHA512 but we want to be able to use SHA256 as well with SCRAM (per se
the scram_HMAC_* business in scram-common.c).  Third, this lacks a
fallback implementation.  Finally, pgcrypto is not touched, but we
should be able to simplify the area around int_digest_list in
px-hmac.c which lists for HMAC as available options MD5, SHA1 and all
four SHA2.

Please note that we have MD5 and SHA2 as choices in cryptohash.h, and
I have already a patch to add SHA1 to the cryptohash set pending for
review: https://commitfest.postgresql.org/31/2868/.  If all that is
done correctly, a lot of code can be cleaned up while making things
still pluggable with various SSL libraries.
--
Michael

Attachment

Re: Proposed patch for key managment

From
Stephen Frost
Date:
Greetings,

* Michael Paquier (michael@paquier.xyz) wrote:
> On Tue, Dec 15, 2020 at 02:09:09PM -0500, Stephen Frost wrote:
> > Yeah, looking at what's been done there seems like the right approach to
> > me as well, ideally we'd have one set of APIs that'll support all these
> > use cases (not unlike what curl does..).
>
> Ooh...  This is interesting.  What did curl do wrong here?  It is
> always good to hear from such past experiences.

Not sure that came across very well- curl did something right in terms
of their vTLS layer which allows for building curl with a number of
different SSL/TLS libraries without the core code having to know about
all the differences.  I was suggesting that we might want to look at how
they did that, and naturally discuss with Daniel and ask him what
thoughts he has from having worked with curl and the vTLS layer.

Thanks,

Stephen

Attachment

Re: Proposed patch for key managment

From
Alastair Turner
Date:
Hi Bruce

On Wed, 16 Dec 2020 at 00:12, Bruce Momjian <bruce@momjian.us> wrote:
>
...
>
> The second approach is to make a new API for what you want....

I am trying to motivate for an alternate API. Specifically, an API
which allows any potential adopter of Postgres and Cluster File
Encryption to adopt them without having to accept any particular
approach to key management, key derivation, wrapping, validation, etc.
A passphrase key-wrapper with validation will probably be very useful
to a lot of people, but making it mandatory and requiring twists and
turns to integrate with already-established security infrastructure
sounds like a barrier to adoption.

> ...It would be
> similar to the cluster_passphrase_command, but it would be simple in
> some ways, but complex in others since we need at least two DEKs, and
> key rotation might be very risky since there isn't validation in the
> server.
>

I guess that the risk you're talking about here is encryption with a
bogus key and bricking the data? In a world where the wrapped keys are
opaque to the application, a key would be validated through a
roundtrip: request the unwrapping of the key, encrypt a known string,
request the unwrapping again, decrypt the known string, compare. If
any of those steps fail, don't use the wrapped key provided.
Validating that the stored keys have not been fiddled/damaged is the
KMS/HSM's responsibility.

>
>... I don't think this can be accomplished by a contrib module, but
> would actually be easy to implement, but perhaps hard to document
> because the user API might be tricky.
>

If integration through a pipeline isn't good enough (it would be a
pain for the passphrase, with multiple prompts), what else do you see
an API having to provide?

>
> I think my big question is about your sentence, "A key feature of these
> key management approaches is that the application handling the
> encrypted data doesn't get the KEK, the HSM/KSM passes the DEK back
> after unwrapping it."  It is less secure for the HSM to return a KEK
> rather than a DEK?  I can't see why it would be.  The application never
> sees the KEK used to wrap the DEK it has stored in the file system,
> though that DEK is actually used as a passphrase by Postgres.  This is
> the same with the Yubikey --- Postgres never sees the private key used
> to unlock what it locked with the Yubikey public key.
>

The protocols and practices are designed for a lot of DEKs and small
number of KEK's. So the compromise of a KEK would, potentially, lead
to compromise of thousands of DEKs. In this particular case with 2
DEKs wrapped by one KEK, it doesn't sound like much of a difference, I
agree. From an acceptance and adoption point of view, I'm just
inclined to use the security ecosystem in an established and
understood way.

Regards
Alastair



Re: Proposed patch for key managment

From
Bruce Momjian
Date:
On Wed, Dec 16, 2020 at 06:07:26PM +0000, Alastair Turner wrote:
> Hi Bruce
> 
> On Wed, 16 Dec 2020 at 00:12, Bruce Momjian <bruce@momjian.us> wrote:
> >
> ...
> >
> > The second approach is to make a new API for what you want....
> 
> I am trying to motivate for an alternate API. Specifically, an API
> which allows any potential adopter of Postgres and Cluster File
> Encryption to adopt them without having to accept any particular
> approach to key management, key derivation, wrapping, validation, etc.
> A passphrase key-wrapper with validation will probably be very useful
> to a lot of people, but making it mandatory and requiring twists and
> turns to integrate with already-established security infrastructure
> sounds like a barrier to adoption.

Attached is a script that uses the AWS Secrets Manager, and it does key
rotation with the new pg_altercpass tool too, just like all the other
methods.

I am not inclined to add any more APIs unless there is clear value for
it, and I am not seeing that yet.

> > ...It would be
> > similar to the cluster_passphrase_command, but it would be simple in
> > some ways, but complex in others since we need at least two DEKs, and
> > key rotation might be very risky since there isn't validation in the
> > server.
> >
> 
> I guess that the risk you're talking about here is encryption with a
> bogus key and bricking the data? In a world where the wrapped keys are
> opaque to the application, a key would be validated through a
> roundtrip: request the unwrapping of the key, encrypt a known string,
> request the unwrapping again, decrypt the known string, compare. If
> any of those steps fail, don't use the wrapped key provided.
> Validating that the stored keys have not been fiddled/damaged is the
> KMS/HSM's responsibility.

OK, but we already have validation.

> >... I don't think this can be accomplished by a contrib module, but
> > would actually be easy to implement, but perhaps hard to document
> > because the user API might be tricky.
> >
> 
> If integration through a pipeline isn't good enough (it would be a
> pain for the passphrase, with multiple prompts), what else do you see
> an API having to provide?

The big problem is that at bootstrap time you have to call the command
at a specific time, and I don't see how that could be done via contrib.
Also, I am trying to see value in offering another API.  We don't need
to serve every need.

> > I think my big question is about your sentence, "A key feature of these
> > key management approaches is that the application handling the
> > encrypted data doesn't get the KEK, the HSM/KSM passes the DEK back
> > after unwrapping it."  It is less secure for the HSM to return a KEK
> > rather than a DEK?  I can't see why it would be.  The application never
> > sees the KEK used to wrap the DEK it has stored in the file system,
> > though that DEK is actually used as a passphrase by Postgres.  This is
> > the same with the Yubikey --- Postgres never sees the private key used
> > to unlock what it locked with the Yubikey public key.
> 
> The protocols and practices are designed for a lot of DEKs and small
> number of KEK's. So the compromise of a KEK would, potentially, lead
> to compromise of thousands of DEKs. In this particular case with 2
> DEKs wrapped by one KEK, it doesn't sound like much of a difference, I
> agree. From an acceptance and adoption point of view, I'm just
> inclined to use the security ecosystem in an established and
> understood way.

Right, if there were many DEKs I can see your point.  Using many DEKs in
a database is a big problem since so many parts are interrelated.  We
looked at per-user or per-tablespace DEKs, but found it unworkable.  We
use two DEKs so we can failover to a standby for DEK rotation purposes.

I think for your purposes, your KMS DEK ends up being a KEK for
Postgres.  I am guessing most applications don't have the validation and
key rotation needs Postgres has, so a DEK is fine, but for Postgres,
because we are supporing already four different authentication methods
via a single command, we have those features already, so getting a DEK
from a KMS that we treat as a KEK seems natural to me.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee


Attachment

Re: Proposed patch for key managment

From
Stephen Frost
Date:
Greetings,

* Alastair Turner (minion@decodable.me) wrote:
> On Wed, 16 Dec 2020 at 00:12, Bruce Momjian <bruce@momjian.us> wrote:
> > The second approach is to make a new API for what you want....
>
> I am trying to motivate for an alternate API. Specifically, an API
> which allows any potential adopter of Postgres and Cluster File
> Encryption to adopt them without having to accept any particular
> approach to key management, key derivation, wrapping, validation, etc.
> A passphrase key-wrapper with validation will probably be very useful
> to a lot of people, but making it mandatory and requiring twists and
> turns to integrate with already-established security infrastructure
> sounds like a barrier to adoption.

Yeah, I mentioned this in one of the other threads where we were
discussing KEKs and DEKs and such.  Forcing one solution for working
with the KEK/DEK isn't really ideal.

That said, maybe there's an option to have the user indicate to PG if
they're passing in a passphrase or a DEK, but we otherwise more-or-less
keep things as they are where the DEK that the user provides actually
goes to encrypting the sub-keys that we use for tables vs. WAL..?  That
avoids the complication of having to have an API that somehow provides
more than one key, while also using the primary DEK key as-is from the
key management service and the KEK never being seen on the system where
PG is running.

> > ...It would be
> > similar to the cluster_passphrase_command, but it would be simple in
> > some ways, but complex in others since we need at least two DEKs, and
> > key rotation might be very risky since there isn't validation in the
> > server.
>
> I guess that the risk you're talking about here is encryption with a
> bogus key and bricking the data? In a world where the wrapped keys are
> opaque to the application, a key would be validated through a
> roundtrip: request the unwrapping of the key, encrypt a known string,
> request the unwrapping again, decrypt the known string, compare. If
> any of those steps fail, don't use the wrapped key provided.
> Validating that the stored keys have not been fiddled/damaged is the
> KMS/HSM's responsibility.

I'm not really sure what the validation concern being raised here is,
but I can understand Bruce's point about having to set up an API that
allows us to ask for two distinct DEK's- but I'd think that keeping the
two keys we have today as sub-keys addresses that as I outline above.

> >... I don't think this can be accomplished by a contrib module, but
> > would actually be easy to implement, but perhaps hard to document
> > because the user API might be tricky.
>
> If integration through a pipeline isn't good enough (it would be a
> pain for the passphrase, with multiple prompts), what else do you see
> an API having to provide?

I certainly agree that it'd be good to have a way to get an encrypted
cluster set up and running which doesn't involve actual prompts.

> > I think my big question is about your sentence, "A key feature of these
> > key management approaches is that the application handling the
> > encrypted data doesn't get the KEK, the HSM/KSM passes the DEK back
> > after unwrapping it."  It is less secure for the HSM to return a KEK
> > rather than a DEK?  I can't see why it would be.  The application never
> > sees the KEK used to wrap the DEK it has stored in the file system,
> > though that DEK is actually used as a passphrase by Postgres.  This is
> > the same with the Yubikey --- Postgres never sees the private key used
> > to unlock what it locked with the Yubikey public key.
>
> The protocols and practices are designed for a lot of DEKs and small
> number of KEK's. So the compromise of a KEK would, potentially, lead
> to compromise of thousands of DEKs. In this particular case with 2
> DEKs wrapped by one KEK, it doesn't sound like much of a difference, I
> agree. From an acceptance and adoption point of view, I'm just
> inclined to use the security ecosystem in an established and
> understood way.

I'm not quite following this- I agree that we don't want PG, or anything
really, to see the private key that's on the Yubikey, as that wouldn't
be good- instead, we should be able to ask for the Yubikey to decrypt
the DEK for us and then use that, which I had thought was happening but
it's not entirely clear from the above discussion (and, admittedly, I've
not tried using the patch with my yubikey yet, but I do plan to...).

Are we sure we're talking about the same thing here..?  That's really
the question I'm asking myself.

There's an entirely independent discussion to be had about figuring out
a way to actually off-load *entirely* the encryption/decryption of data
to the linux crypto system or hardware devices, but unless someone can
step up and write code for those today, I'd suggest that we table that
effort until after we get this initial capability of having TDE with PG
doing all of the encryption/decryption.

Thanks,

Stephen

Attachment

Re: Proposed patch for key managment

From
Bruce Momjian
Date:
On Wed, Dec 16, 2020 at 10:23:23AM +0900, Michael Paquier wrote:
> On Tue, Dec 15, 2020 at 04:02:12PM -0500, Bruce Momjian wrote:
> > I thought this was going to be a huge job, but once I looked at it, it
> > was clear exactly what you were saying.  Comparing cryptohash.c and
> > cryptohash_openssl.c I saw exactly what you wanted, and I think I have
> > completed it in the attached patch.  cryptohash.c implemented the hash
> > in C code if OpenSSL is not present --- I assume you didn't want me to
> > do that, but rather to split the API so it was easy to add another
> > implementation.
> 
> Hmm.  I don't think that this is right.  First, this still mixes
> ciphers and HMAC.

I looked at the code.  The HMAC function body is just one function call
to OpenSSL.  Do you want it moved to cryptohash.c and
cryptohash_openssl.c?  To a new C file?

> Second, it is only possible here to use HMAC with
> SHA512 but we want to be able to use SHA256 as well with SCRAM (per se
> the scram_HMAC_* business in scram-common.c).  Third, this lacks a

I looked at the SCRAM HMAC code.  It looks complex, but I assume ideally
that would be moved into a separate C file and used by cluster file
encryption and SCRAM, right?  I need help to do that because I am
unclear how much of the SCRAM hash code is specific to SCRAM.

> fallback implementation.  Finally, pgcrypto is not touched, but we

I have a fallback implemention --- it fails?  ;-)  Did you want me to
include an AES implementation?

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee




Re: Proposed patch for key managment

From
Bruce Momjian
Date:
On Wed, Dec 16, 2020 at 04:32:00PM -0500, Stephen Frost wrote:
> Greetings,
> 
> * Alastair Turner (minion@decodable.me) wrote:
> > On Wed, 16 Dec 2020 at 00:12, Bruce Momjian <bruce@momjian.us> wrote:
> > > The second approach is to make a new API for what you want....
> > 
> > I am trying to motivate for an alternate API. Specifically, an API
> > which allows any potential adopter of Postgres and Cluster File
> > Encryption to adopt them without having to accept any particular
> > approach to key management, key derivation, wrapping, validation, etc.
> > A passphrase key-wrapper with validation will probably be very useful
> > to a lot of people, but making it mandatory and requiring twists and
> > turns to integrate with already-established security infrastructure
> > sounds like a barrier to adoption.
> 
> Yeah, I mentioned this in one of the other threads where we were
> discussing KEKs and DEKs and such.  Forcing one solution for working
> with the KEK/DEK isn't really ideal.
> 
> That said, maybe there's an option to have the user indicate to PG if
> they're passing in a passphrase or a DEK, but we otherwise more-or-less
> keep things as they are where the DEK that the user provides actually
> goes to encrypting the sub-keys that we use for tables vs. WAL..?  That

Yes, that is what I suggested in an earlier email.

> avoids the complication of having to have an API that somehow provides
> more than one key, while also using the primary DEK key as-is from the
> key management service and the KEK never being seen on the system where
> PG is running.

How is that diffeent from what we have now?  Did you read my reply today
to Alastair with the AWS script?

> > > ...It would be
> > > similar to the cluster_passphrase_command, but it would be simple in
> > > some ways, but complex in others since we need at least two DEKs, and
> > > key rotation might be very risky since there isn't validation in the
> > > server.
> > 
> > I guess that the risk you're talking about here is encryption with a
> > bogus key and bricking the data? In a world where the wrapped keys are
> > opaque to the application, a key would be validated through a
> > roundtrip: request the unwrapping of the key, encrypt a known string,
> > request the unwrapping again, decrypt the known string, compare. If
> > any of those steps fail, don't use the wrapped key provided.
> > Validating that the stored keys have not been fiddled/damaged is the
> > KMS/HSM's responsibility.
> 
> I'm not really sure what the validation concern being raised here is,
> but I can understand Bruce's point about having to set up an API that
> allows us to ask for two distinct DEK's- but I'd think that keeping the
> two keys we have today as sub-keys addresses that as I outline above.

Right, how is a passphrase different than subkeys?

> > >... I don't think this can be accomplished by a contrib module, but
> > > would actually be easy to implement, but perhaps hard to document
> > > because the user API might be tricky.
> > 
> > If integration through a pipeline isn't good enough (it would be a
> > pain for the passphrase, with multiple prompts), what else do you see
> > an API having to provide?
> 
> I certainly agree that it'd be good to have a way to get an encrypted
> cluster set up and running which doesn't involve actual prompts.

Uh, two of my four scripts do not require prompts.

> > > I think my big question is about your sentence, "A key feature of these
> > > key management approaches is that the application handling the
> > > encrypted data doesn't get the KEK, the HSM/KSM passes the DEK back
> > > after unwrapping it."  It is less secure for the HSM to return a KEK
> > > rather than a DEK?  I can't see why it would be.  The application never
> > > sees the KEK used to wrap the DEK it has stored in the file system,
> > > though that DEK is actually used as a passphrase by Postgres.  This is
> > > the same with the Yubikey --- Postgres never sees the private key used
> > > to unlock what it locked with the Yubikey public key.
> > 
> > The protocols and practices are designed for a lot of DEKs and small
> > number of KEK's. So the compromise of a KEK would, potentially, lead
> > to compromise of thousands of DEKs. In this particular case with 2
> > DEKs wrapped by one KEK, it doesn't sound like much of a difference, I
> > agree. From an acceptance and adoption point of view, I'm just
> > inclined to use the security ecosystem in an established and
> > understood way.
> 
> I'm not quite following this- I agree that we don't want PG, or anything
> really, to see the private key that's on the Yubikey, as that wouldn't
> be good- instead, we should be able to ask for the Yubikey to decrypt
> the DEK for us and then use that, which I had thought was happening but
> it's not entirely clear from the above discussion (and, admittedly, I've
> not tried using the patch with my yubikey yet, but I do plan to...).

What it does is to encrypt the passphrase on boot, store it on disk, and
then pass the file to the Yubikey to be decrypted and the passphrase
passed to the server.

> Are we sure we're talking about the same thing here..?  That's really
> the question I'm asking myself.

Seems so.
> 
> There's an entirely independent discussion to be had about figuring out
> a way to actually off-load *entirely* the encryption/decryption of data
> to the linux crypto system or hardware devices, but unless someone can
> step up and write code for those today, I'd suggest that we table that
> effort until after we get this initial capability of having TDE with PG
> doing all of the encryption/decryption.

Agreed.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee




Re: Proposed patch for key managment

From
Bruce Momjian
Date:
On Wed, Dec 16, 2020 at 01:42:57PM -0500, Bruce Momjian wrote:
> On Wed, Dec 16, 2020 at 06:07:26PM +0000, Alastair Turner wrote:
> > Hi Bruce
> > 
> > On Wed, 16 Dec 2020 at 00:12, Bruce Momjian <bruce@momjian.us> wrote:
> > >
> > ...
> > >
> > > The second approach is to make a new API for what you want....
> > 
> > I am trying to motivate for an alternate API. Specifically, an API
> > which allows any potential adopter of Postgres and Cluster File
> > Encryption to adopt them without having to accept any particular
> > approach to key management, key derivation, wrapping, validation, etc.
> > A passphrase key-wrapper with validation will probably be very useful
> > to a lot of people, but making it mandatory and requiring twists and
> > turns to integrate with already-established security infrastructure
> > sounds like a barrier to adoption.
> 
> Attached is a script that uses the AWS Secrets Manager, and it does key
> rotation with the new pg_altercpass tool too, just like all the other
> methods.

Attached is an improved script that does not pass the secret on the
command line.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee


Attachment

Re: Proposed patch for key managment

From
Alastair Turner
Date:
On Wed, 16 Dec 2020 at 21:32, Stephen Frost <sfrost@snowman.net> wrote:
>
> Greetings,
>
> * Alastair Turner (minion@decodable.me) wrote:
> > On Wed, 16 Dec 2020 at 00:12, Bruce Momjian <bruce@momjian.us> wrote:
> > > The second approach is to make a new API for what you want....
> >
> > I am trying to motivate for an alternate API. Specifically, an API
> > which allows any potential adopter of Postgres and Cluster File
> > Encryption to adopt them without having to accept any particular
> > approach to key management, key derivation, wrapping, validation, etc.
> > A passphrase key-wrapper with validation will probably be very useful
> > to a lot of people, but making it mandatory and requiring twists and
> > turns to integrate with already-established security infrastructure
> > sounds like a barrier to adoption.
>
> Yeah, I mentioned this in one of the other threads where we were
> discussing KEKs and DEKs and such.  Forcing one solution for working
> with the KEK/DEK isn't really ideal.
>
> That said, maybe there's an option to have the user indicate to PG if
> they're passing in a passphrase or a DEK, ...

Some options very much like the current cluster_passphrase_command,
but that takes an input sounds to me like it would meet that
requirement. The challenge sent to the command could be just a text
prompt, or it could be the wrapped DEK. The selection of the command
would indicate what the user was expected to pass. The return would be
a DEK.

> ...but we otherwise more-or-less
> keep things as they are where the DEK that the user provides actually
> goes to encrypting the sub-keys that we use for tables vs. WAL..?  ...

The idea of subkeys sounds great, if it can merge the two potential
interactions into one and not complicate rotating the two parts
separately. But having Postgres encrypting them, leaves us open to
many of the same issues. That still feels like managing the keys, so a
corporate who have strong opinions on how that should be done will
still ask all sorts of pointy questions, complicating adoption.

> ...That
> avoids the complication of having to have an API that somehow provides
> more than one key, while also using the primary DEK key as-is from the
> key management service and the KEK never being seen on the system where
> PG is running.
>

Other than calling out (and therefore potentially prompting) twice,
what do you see as the complications of having more than one key?

> > > ...It would be
> > > similar to the cluster_passphrase_command, but it would be simple in
> > > some ways, but complex in others since we need at least two DEKs, and
> > > key rotation might be very risky since there isn't validation in the
> > > server.
> >
...
>
> I'm not quite following this- I agree that we don't want PG, or anything
> really, to see the private key that's on the Yubikey, as that wouldn't
> be good- instead, we should be able to ask for the Yubikey to decrypt
> the DEK for us and then use that, which I had thought was happening but
> it's not entirely clear from the above discussion (and, admittedly, I've
> not tried using the patch with my yubikey yet, but I do plan to...).
>
> Are we sure we're talking about the same thing here..?  That's really
> the question I'm asking myself.
>

I'd describe what the current patch does as using YubiKey to encrypt
and then decrypt an intermediate secret, which is then used to
generate/derive a KEK, which is then used to unwrap the stored,
wrapped DEK.

> There's an entirely independent discussion to be had about figuring out
> a way to actually off-load *entirely* the encryption/decryption of data
> to the linux crypto system or hardware devices, but unless someone can
> step up and write code for those today, I'd suggest that we table that
> effort until after we get this initial capability of having TDE with PG
> doing all of the encryption/decryption.

I'm hopeful that the work on abstracting OpenSSL, nsstls, etc is going
to help in this direct.

Regards

Alastair



Re: Proposed patch for key managment

From
Stephen Frost
Date:
Greetings,

* Alastair Turner (minion@decodable.me) wrote:
> On Wed, 16 Dec 2020 at 21:32, Stephen Frost <sfrost@snowman.net> wrote:
> > * Alastair Turner (minion@decodable.me) wrote:
> > > On Wed, 16 Dec 2020 at 00:12, Bruce Momjian <bruce@momjian.us> wrote:
> > > > The second approach is to make a new API for what you want....
> > >
> > > I am trying to motivate for an alternate API. Specifically, an API
> > > which allows any potential adopter of Postgres and Cluster File
> > > Encryption to adopt them without having to accept any particular
> > > approach to key management, key derivation, wrapping, validation, etc.
> > > A passphrase key-wrapper with validation will probably be very useful
> > > to a lot of people, but making it mandatory and requiring twists and
> > > turns to integrate with already-established security infrastructure
> > > sounds like a barrier to adoption.
> >
> > Yeah, I mentioned this in one of the other threads where we were
> > discussing KEKs and DEKs and such.  Forcing one solution for working
> > with the KEK/DEK isn't really ideal.
> >
> > That said, maybe there's an option to have the user indicate to PG if
> > they're passing in a passphrase or a DEK, ...
>
> Some options very much like the current cluster_passphrase_command,
> but that takes an input sounds to me like it would meet that
> requirement. The challenge sent to the command could be just a text
> prompt, or it could be the wrapped DEK. The selection of the command
> would indicate what the user was expected to pass. The return would be
> a DEK.

If I'm following, you're suggesting something like:

cluster_passphrase_command = 'aws get %q'

and then '%q' gets replaced with "Please provide the WAL DEK: ", or
something like that?  Prompting the user for each key?  Not sure how
well that's work if want to automate everything though.

If you have specific ideas, it'd really be helpful to give examples of
what you're thinking.

> > ...but we otherwise more-or-less
> > keep things as they are where the DEK that the user provides actually
> > goes to encrypting the sub-keys that we use for tables vs. WAL..?  ...
>
> The idea of subkeys sounds great, if it can merge the two potential
> interactions into one and not complicate rotating the two parts
> separately. But having Postgres encrypting them, leaves us open to
> many of the same issues. That still feels like managing the keys, so a
> corporate who have strong opinions on how that should be done will
> still ask all sorts of pointy questions, complicating adoption.

This really needs more detail- what exactly is the concern?  What are
the 'pointy questions'?  What's complicated about using sub-keys?  One
of the great advantages of using sub-keys is that you can rotate the KEK
(or whatever is passed to PG) without having to actually go through the
entire system and decyprt/re-encrypt everything.  There's, of course,
the downside that then you don't get the keys rotated as often as you
might like to, but I am, at least, hopeful that we might be able to find
a way to do that in the future.

> > ...That
> > avoids the complication of having to have an API that somehow provides
> > more than one key, while also using the primary DEK key as-is from the
> > key management service and the KEK never being seen on the system where
> > PG is running.
>
> Other than calling out (and therefore potentially prompting) twice,
> what do you see as the complications of having more than one key?

Mainly just a concern about the API and about what happens if, say, we
decide that we want to have another sub-key, for example.  If we're
handling them then there's really no issue- we just add another key in,
but if that's not happening then it's going to mean changes for
administrators.  If there's a good justification for it, then perhaps
that's alright, but hand waving at what the issue is doesn't really
help.

> > > > ...It would be
> > > > similar to the cluster_passphrase_command, but it would be simple in
> > > > some ways, but complex in others since we need at least two DEKs, and
> > > > key rotation might be very risky since there isn't validation in the
> > > > server.
> > >
> ...
> >
> > I'm not quite following this- I agree that we don't want PG, or anything
> > really, to see the private key that's on the Yubikey, as that wouldn't
> > be good- instead, we should be able to ask for the Yubikey to decrypt
> > the DEK for us and then use that, which I had thought was happening but
> > it's not entirely clear from the above discussion (and, admittedly, I've
> > not tried using the patch with my yubikey yet, but I do plan to...).
> >
> > Are we sure we're talking about the same thing here..?  That's really
> > the question I'm asking myself.
>
> I'd describe what the current patch does as using YubiKey to encrypt
> and then decrypt an intermediate secret, which is then used to
> generate/derive a KEK, which is then used to unwrap the stored,
> wrapped DEK.

This seems like a crux of at least one concern- that the current patch
is deriving the actual KEK from the passphrase and not just taking the
provided value (at least, that's what it looks like from a *very* quick
look into it), and that's the part that I was suggesting that we might
add an option for- to indicate if the cluster passphrase command is
actually returning a passphrase which should be used to derive a key, or
if it's returning a key directly to be used.  That does seem to be a
material difference to me and one which we should care about.

> > There's an entirely independent discussion to be had about figuring out
> > a way to actually off-load *entirely* the encryption/decryption of data
> > to the linux crypto system or hardware devices, but unless someone can
> > step up and write code for those today, I'd suggest that we table that
> > effort until after we get this initial capability of having TDE with PG
> > doing all of the encryption/decryption.
>
> I'm hopeful that the work on abstracting OpenSSL, nsstls, etc is going
> to help in this direct.

Yes, I agree with that general idea but it's a 'next step' kind of
thing, not something we need to try and bake in today.

Thanks,

Stephen

Attachment

Re: Proposed patch for key managment

From
Michael Paquier
Date:
On Wed, Dec 16, 2020 at 05:04:12PM -0500, Bruce Momjian wrote:
> On Wed, Dec 16, 2020 at 10:23:23AM +0900, Michael Paquier wrote:
>> Hmm.  I don't think that this is right.  First, this still mixes
>> ciphers and HMAC.
>
> I looked at the code.  The HMAC function body is just one function call
> to OpenSSL.  Do you want it moved to cryptohash.c and
> cryptohash_openssl.c?  To a new C file?
>
>> Second, it is only possible here to use HMAC with
>> SHA512 but we want to be able to use SHA256 as well with SCRAM (per se
>> the scram_HMAC_* business in scram-common.c).  Third, this lacks a
>
> I looked at the SCRAM HMAC code.  It looks complex, but I assume ideally
> that would be moved into a separate C file and used by cluster file
> encryption and SCRAM, right?  I need help to do that because I am
> unclear how much of the SCRAM hash code is specific to SCRAM.

I have gone though the same exercise for HMAC, with more success it
seems:
https://www.postgresql.org/message-id/X9m0nkEJEzIPXjeZ@paquier.xyz

This includes a fallback implementation that returns the same results
as OpenSSL, and the same results as samples in wikipedia or such
sources.  The set APIs in this refactoring could be reused here and
plugged into any SSL libraries, and my patch has changed SCRAM to
reuse that.  With this, it is also possible to remove all the HMAC
code from pgcrypto but this cannot happen without SHA1 support in
cryptohash.c, another patch I have submitted -hackers recently.  So I
think that it is a win as a whole.

>> fallback implementation.  Finally, pgcrypto is not touched, but we
>
> I have a fallback implemention --- it fails?  ;-)  Did you want me to
> include an AES implementation?

No idea about this one yet.  There are no direct users of AES except
pgcrypto in core.  One thing that would be good IMO is to properly
split the patch of this thread into individual parts that could be
reviewed separately using for example "git format-patch" to generate
patch series.  What's presented is a mixed bag, so that's harder to
look at it and consider how this stuff should work, and if there are
pieces that should be designed better or not.
--
Michael

Attachment

Re: Proposed patch for key managment

From
Alastair Turner
Date:
On Wed, 16 Dec 2020 at 22:43, Stephen Frost <sfrost@snowman.net> wrote:
>
> Greetings,
...
>
> If I'm following, you're suggesting something like:
>
> cluster_passphrase_command = 'aws get %q'
>
> and then '%q' gets replaced with "Please provide the WAL DEK: ", or
> something like that?  Prompting the user for each key?  Not sure how
> well that's work if want to automate everything though.
>
> If you have specific ideas, it'd really be helpful to give examples of
> what you're thinking.

I can think of three specific ideas off the top of my head: the
passphrase key wrapper, the secret store and the cloud/HW KMS.

Since the examples expand the purpose of cluster_passphrase_command,
let's call it cluster_key_challenge_command in the examples.

Starting with the passphrase key wrapper, since it's what's in place now.

 - cluster_key_challenge_command = 'password_key_wrapper %q'
 - Supplied on stdin to the process is the wrapped DEK (either a
combined key for db files and WAL or one for each, on separate calls)
 - %q is "Please provide WAL key wrapper password" or just "...provide
key wrapper password"
 - Returned on stdout is the unwrapped DEK

For a secret store

 - cluster_key_challenge_command = 'vault_key_fetch'
 - Supplied on stdin to the process is the secret's identifier (pg_dek_xxUUIDxx)
 - Returned on stdout is the DEK, which may never have been wrapped,
depending on implementation
 - Access control to the secret store is managed through the challenge
command's own config, certs, HBA, ...

For an HSM or cloud KMS

 - cluster_key_challenge_command = 'gcp_kms_key_fetch'
 - Supplied on stdin to the process is the the wrapped DEK (individual
or combined)
 - Returned on stdout is the DEK (individual or combined)
 - Access control to the kms is managed through the challenge
command's own config, certs, HBA, ...

The secret store and HSM/KMS options may be promptless, and therefore
amenable to automation, depending on the setup of those clients.

>
...
>
> > > ...That
> > > avoids the complication of having to have an API that somehow provides
> > > more than one key, while also using the primary DEK key as-is from the
> > > key management service and the KEK never being seen on the system where
> > > PG is running.
> >
> > Other than calling out (and therefore potentially prompting) twice,
> > what do you see as the complications of having more than one key?
>
> Mainly just a concern about the API and about what happens if, say, we
> decide that we want to have another sub-key, for example.  If we're
> handling them then there's really no issue- we just add another key in,
> but if that's not happening then it's going to mean changes for
> administrators.  If there's a good justification for it, then perhaps
> that's alright, but hand waving at what the issue is doesn't really
> help.
>

Sorry, I wasn't trying to hand wave it away. For automated
interactions, like big iron HSMs or cloud KSMs, the difference between
2 operations and 10 operations to start a DB server won't matter. For
an admin/operator having to type 10 passwords or get 10 clean
thumbprint scans, it would be horrible. My underlying question was, is
that toil the only problem to be solved, or is there another reason to
get into key combination, key splitting and the related issues which
are less documented and less well understood than key wrapping.

>
...
> >
> > I'd describe what the current patch does as using YubiKey to encrypt
> > and then decrypt an intermediate secret, which is then used to
> > generate/derive a KEK, which is then used to unwrap the stored,
> > wrapped DEK.
>
> This seems like a crux of at least one concern- that the current patch
> is deriving the actual KEK from the passphrase and not just taking the
> provided value (at least, that's what it looks like from a *very* quick
> look into it), and that's the part that I was suggesting that we might
> add an option for- to indicate if the cluster passphrase command is
> actually returning a passphrase which should be used to derive a key, or
> if it's returning a key directly to be used.  That does seem to be a
> material difference to me and one which we should care about.
>

Yes. Caring about that is the reason I've been making a nuisance of myself.

> > > There's an entirely independent discussion to be had about figuring out
> > > a way to actually off-load *entirely* the encryption/decryption of data
> > > to the linux crypto system or hardware devices, but unless someone can
> > > step up and write code for those today, I'd suggest that we table that
> > > effort until after we get this initial capability of having TDE with PG
> > > doing all of the encryption/decryption.
> >
> > I'm hopeful that the work on abstracting OpenSSL, nsstls, etc is going
> > to help in this direct.
>
> Yes, I agree with that general idea but it's a 'next step' kind of
> thing, not something we need to try and bake in today.
>

Agreed.

Thanks
Alastair



Re: Proposed patch for key managment

From
Daniel Gustafsson
Date:
> On 16 Dec 2020, at 17:26, Stephen Frost <sfrost@snowman.net> wrote:
>
> Greetings,
>
> * Michael Paquier (michael@paquier.xyz) wrote:
>> On Tue, Dec 15, 2020 at 02:09:09PM -0500, Stephen Frost wrote:
>>> Yeah, looking at what's been done there seems like the right approach to
>>> me as well, ideally we'd have one set of APIs that'll support all these
>>> use cases (not unlike what curl does..).
>>
>> Ooh...  This is interesting.  What did curl do wrong here?  It is
>> always good to hear from such past experiences.
>
> Not sure that came across very well- curl did something right in terms
> of their vTLS layer which allows for building curl with a number of
> different SSL/TLS libraries without the core code having to know about
> all the differences.

The vtls layer in libcurl is actually quite similar to what we have in libpq
with be-secure.c and be-secure-*.c for TLS and with what Michael has been doing
lately with cryptohash.

In vtls library contexts are abstracted to the core code, with implementations
supplying a struct with a set of function pointers implementing functionality
(this difference is due to libcurl supporting multiple TLS libraries compiled
at the same time, something postgres IMO shouldn't do).  We do give
implementations a bit more leeway with how feature complete they must be,
mainly due to the wide variety of libraries supported (from OpenSSL to IBM
GSKit and most ones in between).  While basic it has served us quite well and
we have had first time contributors successfully come with a new TLS library as
a patch.

What we have so far in the tree (an abstract *reasonably generic* API hiding
all library context interactions) makes implementing support for a new TLS
library less daunting as no core code needs to be touched really.  Having
kicked the tyres on this a fair bit I really think we should maintain that
separation, it's complicated enough as it is given just how much code needs to
be written.

cheers ./daniel


Re: Proposed patch for key managment

From
Michael Paquier
Date:
On Thu, Dec 17, 2020 at 01:15:37AM +0100, Daniel Gustafsson wrote:
> In vtls library contexts are abstracted to the core code, with implementations
> supplying a struct with a set of function pointers implementing functionality
> (this difference is due to libcurl supporting multiple TLS libraries compiled
> at the same time, something postgres IMO shouldn't do).  We do give
> implementations a bit more leeway with how feature complete they must be,
> mainly due to the wide variety of libraries supported (from OpenSSL to IBM
> GSKit and most ones in between).  While basic it has served us quite well and
> we have had first time contributors successfully come with a new TLS library as
> a patch.

This infrastructure has been chosen because curl requires to be able
to use multiple types of libraries at run-time, right?  I don't think
we need to get down to that for Postgres and keep things so as we are
only able to use one TLS library at the same time, the one compiled
with.  This makes the protocol simpler.  But perhaps I just lack
ambition and vision.
--
Michael

Attachment

Re: Proposed patch for key managment

From
Stephen Frost
Date:
Greetings,

* Michael Paquier (michael@paquier.xyz) wrote:
> On Wed, Dec 16, 2020 at 05:04:12PM -0500, Bruce Momjian wrote:
> >> fallback implementation.  Finally, pgcrypto is not touched, but we
> >
> > I have a fallback implemention --- it fails?  ;-)  Did you want me to
> > include an AES implementation?
>
> No idea about this one yet.  There are no direct users of AES except
> pgcrypto in core.  One thing that would be good IMO is to properly
> split the patch of this thread into individual parts that could be
> reviewed separately using for example "git format-patch" to generate
> patch series.  What's presented is a mixed bag, so that's harder to
> look at it and consider how this stuff should work, and if there are
> pieces that should be designed better or not.

I don't think there's any need for us to implement a fallback
implementation of AES.  I'm not entirely sure we need it for hashes
but since we've already got it...

Thanks,

Stephen

Attachment

Re: Proposed patch for key managment

From
Bruce Momjian
Date:
On Thu, Dec 17, 2020 at 11:39:55AM -0500, Stephen Frost wrote:
> Greetings,
> 
> * Michael Paquier (michael@paquier.xyz) wrote:
> > On Wed, Dec 16, 2020 at 05:04:12PM -0500, Bruce Momjian wrote:
> > >> fallback implementation.  Finally, pgcrypto is not touched, but we
> > > 
> > > I have a fallback implemention --- it fails?  ;-)  Did you want me to
> > > include an AES implementation?
> > 
> > No idea about this one yet.  There are no direct users of AES except
> > pgcrypto in core.  One thing that would be good IMO is to properly
> > split the patch of this thread into individual parts that could be
> > reviewed separately using for example "git format-patch" to generate
> > patch series.  What's presented is a mixed bag, so that's harder to
> > look at it and consider how this stuff should work, and if there are
> > pieces that should be designed better or not.
> 
> I don't think there's any need for us to implement a fallback
> implementation of AES.  I'm not entirely sure we need it for hashes
> but since we've already got it...

Agreed.  I think there is serious risk we would do AES in a different
way than OpenSSL, especially if I did it.  ;-)  We can add a native AES
one day if we want, but as stated by Michael Paquier, it has to be
tested so we are sure it returns exactly the same values as OpenSSL.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee




Re: Proposed patch for key managment

From
Bruce Momjian
Date:
On Mon, Dec 14, 2020 at 11:16:18PM -0500, Bruce Momjian wrote:
> On Tue, Dec 15, 2020 at 10:36:56AM +0800, Neil Chen wrote:
> > Since our implementation is not in contrib, I don't think we should put the
> > script there. Maybe we can refer to postgresql.conf.sample?  
> 
> Uh, the script are 20-60 lines long --- I am attaching them to this
> email.  Plus, when we allow user prompting for the SSL passphrase, we
> will have another script, or maybe three mor if people want to use a
> Yubikey to unlock the SSL passphrase.

Here is a run of all four authentication methods, and updated scripts. 
I have renamed Yubiki to PIV since the script should work with anY
PIV-enabled deviced, like a CAC.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee


Attachment

Re: Proposed patch for key managment

From
Michael Paquier
Date:
On Thu, Dec 17, 2020 at 12:10:22PM -0500, Bruce Momjian wrote:
> On Thu, Dec 17, 2020 at 11:39:55AM -0500, Stephen Frost wrote:
>> I don't think there's any need for us to implement a fallback
>> implementation of AES.  I'm not entirely sure we need it for hashes
>> but since we've already got it...

We need fallback implementations for cryptohashes (MD5, SHA1/2) and
HMAC because we have SCRAM authentication, pgcrypto and SQL functions
that should be able to work even when not building with any SSL
libraries.  So that's here to stay to ensure compatibility with what
we do.  All this stuff is already in the tree for ages, it was just
not shaped for a more centralized reuse.

> Agreed.  I think there is serious risk we would do AES in a different
> way than OpenSSL, especially if I did it.  ;-)  We can add a native AES
> one day if we want, but as stated by Michael Paquier, it has to be
> tested so we are sure it returns exactly the same values as OpenSSL.

I think that it would be good to put some generalization here, and
look at options that are offered by other SSL libraries, like libnss
so as we don't finish with a design that restricts the use of a given
feature only to OpenSSL.
--
Michael

Attachment

Re: Proposed patch for key managment

From
Bruce Momjian
Date:
On Fri, Dec 18, 2020 at 10:01:22AM +0900, Michael Paquier wrote:
> On Thu, Dec 17, 2020 at 12:10:22PM -0500, Bruce Momjian wrote:
> > On Thu, Dec 17, 2020 at 11:39:55AM -0500, Stephen Frost wrote:
> >> I don't think there's any need for us to implement a fallback
> >> implementation of AES.  I'm not entirely sure we need it for hashes
> >> but since we've already got it...
> 
> We need fallback implementations for cryptohashes (MD5, SHA1/2) and
> HMAC because we have SCRAM authentication, pgcrypto and SQL functions
> that should be able to work even when not building with any SSL
> libraries.  So that's here to stay to ensure compatibility with what
> we do.  All this stuff is already in the tree for ages, it was just
> not shaped for a more centralized reuse.

One question is whether we want to support cluster file encryption
without OpenSSL --- right now we can't.  I am wondering if we really
need the hardware acceleration of OpenSSL for AES so doing our own AES
implementation might not even make sense, performance-wise.

> > Agreed.  I think there is serious risk we would do AES in a different
> > way than OpenSSL, especially if I did it.  ;-)  We can add a native AES
> > one day if we want, but as stated by Michael Paquier, it has to be
> > tested so we are sure it returns exactly the same values as OpenSSL.
> 
> I think that it would be good to put some generalization here, and
> look at options that are offered by other SSL libraries, like libnss
> so as we don't finish with a design that restricts the use of a given
> feature only to OpenSSL.

Uh, you mean the C API or the user API?  I don't think we have any
OpenSSL requirement at the user level, except that my examples use
command-line openssl to generate the passphrase.

I split apart my patch to create cipher{_openssl}.c and hmac{_openssl}.c
so the single HMAC function is not in the cipher file anymore, attached.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee


Attachment

Re: Proposed patch for key managment

From
Neil Chen
Date:


On Fri, Dec 18, 2020 at 3:02 AM Bruce Momjian <bruce@momjian.us> wrote:

Here is a run of all four authentication methods, and updated scripts.
I have renamed Yubiki to PIV since the script should work with anY
PIV-enabled deviced, like a CAC.

 
Thanks for attaching these patches. 
The unfortunate thing is that I am not very familiar with yubikey, so I will try to read it but may not be able to give useful advice. 
Regarding the location of script storage, why don't we name them like "pass_fd.sh.sample" and store them in the $DATA/share/postgresql directory after installation, where other .sample files are also stored here. In the source code directory, just put them in a directory related to KMGR.

Through your suggestions, I am learning about Cybertec's TDE which is a relatively "complete" implementation. I will continue to rely on these TDE patches and the goals listed in the Wiki to verify whether the KMS system can support our future feature.

Thanks.
--
There is no royal road to learning.
HighGo Software Co.

Re: Proposed patch for key managment

From
Bruce Momjian
Date:
On Fri, Dec 18, 2020 at 11:19:02AM +0800, Neil Chen wrote:
> 
> 
> On Fri, Dec 18, 2020 at 3:02 AM Bruce Momjian <bruce@momjian.us> wrote:
> 
> 
>     Here is a run of all four authentication methods, and updated scripts.
>     I have renamed Yubiki to PIV since the script should work with anY
>     PIV-enabled deviced, like a CAC.
> 
> 
>  
> Thanks for attaching these patches. 
> The unfortunate thing is that I am not very familiar with yubikey, so I will
> try to read it but may not be able to give useful advice. 
> Regarding the location of script storage, why don't we name them like
> "pass_fd.sh.sample" and store them in the $DATA/share/postgresql directory
> after installation, where other .sample files are also stored here. In the
> source code directory, just put them in a directory related to KMGR.

Yeah, that makes sense.  They are small.

> Through your suggestions, I am learning about Cybertec's TDE which is a
> relatively "complete" implementation. I will continue to rely on these TDE
> patches and the goals listed in the Wiki to verify whether the KMS system can
> support our future feature.

Great to hear, thanks.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee




Re: Proposed patch for key managment

From
Stephen Frost
Date:
Greetings,

* Michael Paquier (michael@paquier.xyz) wrote:
> On Thu, Dec 17, 2020 at 12:10:22PM -0500, Bruce Momjian wrote:
> > Agreed.  I think there is serious risk we would do AES in a different
> > way than OpenSSL, especially if I did it.  ;-)  We can add a native AES
> > one day if we want, but as stated by Michael Paquier, it has to be
> > tested so we are sure it returns exactly the same values as OpenSSL.
>
> I think that it would be good to put some generalization here, and
> look at options that are offered by other SSL libraries, like libnss
> so as we don't finish with a design that restricts the use of a given
> feature only to OpenSSL.

While I agree with the general idea proposed here, I don't know that we
need to push super hard on it to be somehow perfect right now because it
simply won't be until we actually add support for another library, and I
don't think that's really this patch's responsibility.

So, yes, let's lay the groundwork and structure and perhaps spend a bit
of time looking at other libraries, but not demand this patch also add
support for a second library today, and accept that that means that the
structure we put in place may not end up being exactly perfect.

Thanks,

Stephen

Attachment

Re: Proposed patch for key managment

From
Stephen Frost
Date:
Greetings,

* Bruce Momjian (bruce@momjian.us) wrote:
> On Fri, Dec 18, 2020 at 10:01:22AM +0900, Michael Paquier wrote:
> > On Thu, Dec 17, 2020 at 12:10:22PM -0500, Bruce Momjian wrote:
> > > On Thu, Dec 17, 2020 at 11:39:55AM -0500, Stephen Frost wrote:
> > >> I don't think there's any need for us to implement a fallback
> > >> implementation of AES.  I'm not entirely sure we need it for hashes
> > >> but since we've already got it...
> >
> > We need fallback implementations for cryptohashes (MD5, SHA1/2) and
> > HMAC because we have SCRAM authentication, pgcrypto and SQL functions
> > that should be able to work even when not building with any SSL
> > libraries.  So that's here to stay to ensure compatibility with what
> > we do.  All this stuff is already in the tree for ages, it was just
> > not shaped for a more centralized reuse.
>
> One question is whether we want to support cluster file encryption
> without OpenSSL --- right now we can't.  I am wondering if we really
> need the hardware acceleration of OpenSSL for AES so doing our own AES
> implementation might not even make sense, performance-wise.

No, I don't think we need to support file encryption independently of
any library being available.  Maybe some day we will, but that should
really just be another option to build with.

Guess it wasn't clear, but I was being a bit tounge-in-cheek regarding
the idea of dropping SCRAM support if we aren't built with OpenSSL.

> > > Agreed.  I think there is serious risk we would do AES in a different
> > > way than OpenSSL, especially if I did it.  ;-)  We can add a native AES
> > > one day if we want, but as stated by Michael Paquier, it has to be
> > > tested so we are sure it returns exactly the same values as OpenSSL.
> >
> > I think that it would be good to put some generalization here, and
> > look at options that are offered by other SSL libraries, like libnss
> > so as we don't finish with a design that restricts the use of a given
> > feature only to OpenSSL.
>
> Uh, you mean the C API or the user API?  I don't think we have any
> OpenSSL requirement at the user level, except that my examples use
> command-line openssl to generate the passphrase.

What I would be thinking about with this are really three pieces-

- C API for libpq (not relevant for this, but we have had issues in the
  past with exposing OpenSSL-specific things there)

- C API in backend - we should try to at least set up the structure to
  allow multiple encryption implementations, either via different
  libraries or if someone feels it'd be useful to write a built-in
  implementation, but as I mentioned just a moment ago, we aren't going
  to get this perfect and we should accept that.

- User API - we should avoid having OpenSSL-specific bits exposed to
  users, and I don't think we do today, so I don't think this is an
  issue at the moment.

> I split apart my patch to create cipher{_openssl}.c and hmac{_openssl}.c
> so the single HMAC function is not in the cipher file anymore, attached.

Will try to look at this soon, but in general the idea seems right.

Thanks,

Stephen

Attachment

Re: Proposed patch for key managment

From
Bruce Momjian
Date:
On Fri, Dec 18, 2020 at 08:18:53AM -0500, Stephen Frost wrote:
> What I would be thinking about with this are really three pieces-
> 
> - C API for libpq (not relevant for this, but we have had issues in the
>   past with exposing OpenSSL-specific things there)

Right.

> - C API in backend - we should try to at least set up the structure to
>   allow multiple encryption implementations, either via different
>   libraries or if someone feels it'd be useful to write a built-in
>   implementation, but as I mentioned just a moment ago, we aren't going
>   to get this perfect and we should accept that.

All my OpenSSL-specific stuff is isolated in src/common.

> - User API - we should avoid having OpenSSL-specific bits exposed to
>   users, and I don't think we do today, so I don't think this is an
>   issue at the moment.

Right, there is nothing OpenSSL-specific on the user side, but some of
the scripts assume you can pass an open file descriptor to a
PKCS11-enabled tool, and I don't know if non-OpenSSL libraries support
that.

Ideally, we would have scripts for each library to use their
command-line API for this.  I am hestitant to rename the scripts to
contain the name openssl because I am unclear if we will ever have
non-OpenSSL implementations of these.  I updated the script comments at
the top to indicate "It uses OpenSSL with PKCS11 enabled via OpenSC.".

One point is that the passphrase scripts are useful for cluster file
encryption _and_ unlocking TLS certificates.

> > I split apart my patch to create cipher{_openssl}.c and hmac{_openssl}.c
> > so the single HMAC function is not in the cipher file anymore, attached.
> 
> Will try to look at this soon, but in general the idea seems right.

Should I be using the init/update/final HMAC API that SCRAM uses so it
is easier to integrate into Michael's patch?  I can do that, but didn't
because that is going to require me to create a much larger footprint in
the main code, and that might be harder to integrate once Michael's API
is final.  It is easier for me to change one line to five lines than to
change five lines to five different lines.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee




Re: Proposed patch for key managment

From
Stephen Frost
Date:
Greetings,

* Bruce Momjian (bruce@momjian.us) wrote:
> On Fri, Dec 18, 2020 at 08:18:53AM -0500, Stephen Frost wrote:
> > - C API in backend - we should try to at least set up the structure to
> >   allow multiple encryption implementations, either via different
> >   libraries or if someone feels it'd be useful to write a built-in
> >   implementation, but as I mentioned just a moment ago, we aren't going
> >   to get this perfect and we should accept that.
>
> All my OpenSSL-specific stuff is isolated in src/common.

... and in 'openssl' files, with 'generic' functions on top, right?
That's what I recall from before and seems like the right approach to at
least start with, and then we likely make adjustments as needed to the
API when we add another encryption library.

> > - User API - we should avoid having OpenSSL-specific bits exposed to
> >   users, and I don't think we do today, so I don't think this is an
> >   issue at the moment.
>
> Right, there is nothing OpenSSL-specific on the user side, but some of
> the scripts assume you can pass an open file descriptor to a
> PKCS11-enabled tool, and I don't know if non-OpenSSL libraries support
> that.

That generally seems alright to me.

> Ideally, we would have scripts for each library to use their
> command-line API for this.  I am hestitant to rename the scripts to
> contain the name openssl because I am unclear if we will ever have
> non-OpenSSL implementations of these.  I updated the script comments at
> the top to indicate "It uses OpenSSL with PKCS11 enabled via OpenSC.".

I'm not too worried about the specific names of those scripts.
Definitely having comments that indicate what the dependencies are is
certainly good.

> One point is that the passphrase scripts are useful for cluster file
> encryption _and_ unlocking TLS certificates.

That's certainly good.

> > > I split apart my patch to create cipher{_openssl}.c and hmac{_openssl}.c
> > > so the single HMAC function is not in the cipher file anymore, attached.
> >
> > Will try to look at this soon, but in general the idea seems right.
>
> Should I be using the init/update/final HMAC API that SCRAM uses so it
> is easier to integrate into Michael's patch?  I can do that, but didn't
> because that is going to require me to create a much larger footprint in
> the main code, and that might be harder to integrate once Michael's API
> is final.  It is easier for me to change one line to five lines than to
> change five lines to five different lines.

For my 2c, ideally we'd get a review done of Michael's changes and just
get that committed and have your work here rebased over that.  I don't
see any reason why we can't get that done in relatively short order.
Just because it's registered in the January CF doesn't mean we actually
have to wait for that to get done, it just needs a review from another
committer (or self certification from Michael if he's happy with it).

Thanks,

Stephen

Attachment

Re: Proposed patch for key managment

From
Bruce Momjian
Date:
On Fri, Dec 18, 2020 at 11:13:44AM -0500, Stephen Frost wrote:
> Greetings,
> 
> * Bruce Momjian (bruce@momjian.us) wrote:
> > On Fri, Dec 18, 2020 at 08:18:53AM -0500, Stephen Frost wrote:
> > > - C API in backend - we should try to at least set up the structure to
> > >   allow multiple encryption implementations, either via different
> > >   libraries or if someone feels it'd be useful to write a built-in
> > >   implementation, but as I mentioned just a moment ago, we aren't going
> > >   to get this perfect and we should accept that.
> > 
> > All my OpenSSL-specific stuff is isolated in src/common.
> 
> ... and in 'openssl' files, with 'generic' functions on top, right?
> That's what I recall from before and seems like the right approach to at
> least start with, and then we likely make adjustments as needed to the
> API when we add another encryption library.

Uh, I did it the way cryptohash_openssl.c does it. Sawada-san's patch did
it kind of like that, but had #ifdef calls to OpenSSL versions, while
the cryptohash_openssl.c method has two files with exactly the same
function names, and they are conditionally compiled in the makefile
based on makefile defines, and that is how I did it.  Michael Paquier
pointed this out, and the msvc changes needed to handle it.

> > > > I split apart my patch to create cipher{_openssl}.c and hmac{_openssl}.c
> > > > so the single HMAC function is not in the cipher file anymore, attached.
> > > 
> > > Will try to look at this soon, but in general the idea seems right.
> > 
> > Should I be using the init/update/final HMAC API that SCRAM uses so it
> > is easier to integrate into Michael's patch?  I can do that, but didn't
> > because that is going to require me to create a much larger footprint in
> > the main code, and that might be harder to integrate once Michael's API
> > is final.  It is easier for me to change one line to five lines than to
> > change five lines to five different lines.
> 
> For my 2c, ideally we'd get a review done of Michael's changes and just
> get that committed and have your work here rebased over that.  I don't

That would be nice.

> see any reason why we can't get that done in relatively short order.
> Just because it's registered in the January CF doesn't mean we actually
> have to wait for that to get done, it just needs a review from another
> committer (or self certification from Michael if he's happy with it).

Agreed.  I just don't want to wait until mid-January to get this into
the tree, and I am going to defer to Michael's timeline on this, which
is why I figured I might need to go first.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee




Re: Proposed patch for key managment

From
Stephen Frost
Date:
Greetings Alastair,

* Alastair Turner (minion@decodable.me) wrote:
> On Wed, 16 Dec 2020 at 22:43, Stephen Frost <sfrost@snowman.net> wrote:
> > If I'm following, you're suggesting something like:
> >
> > cluster_passphrase_command = 'aws get %q'
> >
> > and then '%q' gets replaced with "Please provide the WAL DEK: ", or
> > something like that?  Prompting the user for each key?  Not sure how
> > well that's work if want to automate everything though.
> >
> > If you have specific ideas, it'd really be helpful to give examples of
> > what you're thinking.
>
> I can think of three specific ideas off the top of my head: the
> passphrase key wrapper, the secret store and the cloud/HW KMS.
>
> Since the examples expand the purpose of cluster_passphrase_command,
> let's call it cluster_key_challenge_command in the examples.

These ideas don't seem too bad but I'd think we'd pass which key we want
on the command-line using a %i or something like that, rather than using
stdin, unless there's some reason that would be an issue..?  Certainly
the CLI utilities I've seen tend to expect the name of the secret that
you're asking for to be passed on the command line.

> Starting with the passphrase key wrapper, since it's what's in place now.
>
>  - cluster_key_challenge_command = 'password_key_wrapper %q'

I do tend to like this idea of having what
cluster_key_challenge_command, or whatever we call it, expects is an
actual key and have the command that is run be a separate command that
takes the passphrase and runs the KDF (key derivation function) on it,
when a passphrase is what the user wishes to use.

That generally makes more sense to me than having the key derivation
effort built into the backend which I have a hard time seeing any
particular reason for, as long we're already calling out to some
external utility of some kind to get the key.

>  - Supplied on stdin to the process is the wrapped DEK (either a
> combined key for db files and WAL or one for each, on separate calls)
>  - %q is "Please provide WAL key wrapper password" or just "...provide
> key wrapper password"
>  - Returned on stdout is the unwrapped DEK

> For a secret store
>
>  - cluster_key_challenge_command = 'vault_key_fetch'
>  - Supplied on stdin to the process is the secret's identifier (pg_dek_xxUUIDxx)
>  - Returned on stdout is the DEK, which may never have been wrapped,
> depending on implementation
>  - Access control to the secret store is managed through the challenge
> command's own config, certs, HBA, ...

> For an HSM or cloud KMS
>
>  - cluster_key_challenge_command = 'gcp_kms_key_fetch'
>  - Supplied on stdin to the process is the the wrapped DEK (individual
> or combined)
>  - Returned on stdout is the DEK (individual or combined)
>  - Access control to the kms is managed through the challenge
> command's own config, certs, HBA, ...
>
> The secret store and HSM/KMS options may be promptless, and therefore
> amenable to automation, depending on the setup of those clients.

We can't really have what is passed on stdin, or not, be different
without having some other option say which it is and that seems like
it'd be making it overly complicated, and I get why Bruce would rather
not make this too complicated.

With the thought of trying to keep it a reasonably simple interface, I
had a thought along these lines:

- Separate command that runs the KDF, this is simple enough as it's
  really just a hash, and it returns the key on stdout.
- initdb time option that says if we're going to have PG manage the
  sub-keys, or not.
- cluster_key_command defined as "a command that is passed the ID of
  the key, or keys, required for the cluster to start"

initdb --pg-subkeys
  - Calls cluster_key_command once with "$PG_SYSTEM_ID-main" or similar
    and expects the main key to be provided on stdout.  Everything else
    is then more-or-less as is today: PG generates DEK sub-keys for data
    and WAL and then encrypts them with the main key and stores them.

As long as the enveloped keys file exists on the filesystem, when PG
starts, it'll call the cluster_key_command and will expect the 'main'
key to be provided and it'll then decrypt and verify the DEK sub-keys,
very similar to today.

In this scenario, cluster_key_command might be set to call a command
which accepts a passphrase and runs a KDF on it, or it might be set to
call out to an external vaulting system or to a Yubikey, etc.

initdb --no-pg-subkeys
  - Calls cluster_key_command for each of the sub-keys that "pg-subkeys"
    would normally generate itself, passing "$PG_SYSTEM_ID-keyid" for
    each (eg: $PG_SYSTEM_ID-data, $PG_SYSTEM_ID-wal), and getting back
    the keys on stdout to use.

When PG starts, it sees that the enveloped keys file doesn't exist and
does the same as initdb did- calls cluster_key_command multiple times to
get the keys which are needed.  We'd want to make sure to have a way
early on that checks that the data DEK provided actually decrypts the
cluster, and bail out otherwise, before actually writing any data with
that key.  I'll note though that this approach would actually allow you
to change the WAL key, if you wanted to, though, which could certainly
be nice (naturally there would be various caveats about doing so and
that replicas would have to also be switched to the new key, etc, but
that all seems reasonably solvable).  Having a stand-alone utility that
could do that for the --pg-subkeys case would be useful too (and just
generally decrypt it for viewing/backup/replacement/etc).

Down the line this might even allow us to do an online re-keying, at
least once the challenges around enabling online data checksums are
sorted out, but I don't think that's something to worry about today.
Still, it seems like this potentially provides a pretty clean interface
for that to happen eventually.

> > > > ...That
> > > > avoids the complication of having to have an API that somehow provides
> > > > more than one key, while also using the primary DEK key as-is from the
> > > > key management service and the KEK never being seen on the system where
> > > > PG is running.
> > >
> > > Other than calling out (and therefore potentially prompting) twice,
> > > what do you see as the complications of having more than one key?
> >
> > Mainly just a concern about the API and about what happens if, say, we
> > decide that we want to have another sub-key, for example.  If we're
> > handling them then there's really no issue- we just add another key in,
> > but if that's not happening then it's going to mean changes for
> > administrators.  If there's a good justification for it, then perhaps
> > that's alright, but hand waving at what the issue is doesn't really
> > help.
>
> Sorry, I wasn't trying to hand wave it away. For automated
> interactions, like big iron HSMs or cloud KSMs, the difference between
> 2 operations and 10 operations to start a DB server won't matter. For
> an admin/operator having to type 10 passwords or get 10 clean
> thumbprint scans, it would be horrible. My underlying question was, is
> that toil the only problem to be solved, or is there another reason to
> get into key combination, key splitting and the related issues which
> are less documented and less well understood than key wrapping.

I appreciate that you're not trying to hand wave it away but this also
didn't really answer the actual question- what's the advantage to having
all of the keys provided by something external rather than having that
external thing provide just one 'main' key, which we then use to decrypt
our enveloped keys that we actually use?  I can think of some possible
advantages but I'd really like to know what advantages you're seeing in
doing that.

> > > I'd describe what the current patch does as using YubiKey to encrypt
> > > and then decrypt an intermediate secret, which is then used to
> > > generate/derive a KEK, which is then used to unwrap the stored,
> > > wrapped DEK.
> >
> > This seems like a crux of at least one concern- that the current patch
> > is deriving the actual KEK from the passphrase and not just taking the
> > provided value (at least, that's what it looks like from a *very* quick
> > look into it), and that's the part that I was suggesting that we might
> > add an option for- to indicate if the cluster passphrase command is
> > actually returning a passphrase which should be used to derive a key, or
> > if it's returning a key directly to be used.  That does seem to be a
> > material difference to me and one which we should care about.
>
> Yes. Caring about that is the reason I've been making a nuisance of myself.

Right, I think we can agree on this aspect and I've chatted with Bruce
about it some also.  When a direct key can be provided, we should use
that, and not run it through a KDF.  This seems like a particularly
important case that we should care about even at this early stage.

> > > > There's an entirely independent discussion to be had about figuring out
> > > > a way to actually off-load *entirely* the encryption/decryption of data
> > > > to the linux crypto system or hardware devices, but unless someone can
> > > > step up and write code for those today, I'd suggest that we table that
> > > > effort until after we get this initial capability of having TDE with PG
> > > > doing all of the encryption/decryption.
> > >
> > > I'm hopeful that the work on abstracting OpenSSL, nsstls, etc is going
> > > to help in this direct.
> >
> > Yes, I agree with that general idea but it's a 'next step' kind of
> > thing, not something we need to try and bake in today.
>
> Agreed.

Great, glad we can agree to hold off on this until a future point- being
able to agree on how we box in this particular capability is important
or we may never end up releasing it.  I know that's one of the concerns
that others on this thread have and it's an important one.

Thanks,

Stephen

Attachment

Re: Proposed patch for key managment

From
Bruce Momjian
Date:
On Fri, Dec 18, 2020 at 04:36:24PM -0500, Stephen Frost wrote:
> > Starting with the passphrase key wrapper, since it's what's in place now.
> > 
> >  - cluster_key_challenge_command = 'password_key_wrapper %q'
> 
> I do tend to like this idea of having what
> cluster_key_challenge_command, or whatever we call it, expects is an
> actual key and have the command that is run be a separate command that
> takes the passphrase and runs the KDF (key derivation function) on it,
> when a passphrase is what the user wishes to use.
> 
> That generally makes more sense to me than having the key derivation
> effort built into the backend which I have a hard time seeing any
> particular reason for, as long we're already calling out to some
> external utility of some kind to get the key.

I have modified the patch to do as you suggested, and as you explained
to me on the phone.  :-)  Instead of having the server hash the pass
phrase provided by the cluster_passphrase_command, have the
cluster_passphrase_command generate a sha256 hash if the user-provided
input is not already 64 hex bytes.  If it is already 64 hex bytes, e.g,
via openssl rand -hex 32, no need to have the server hash that again. 
Double-hashing is less secure.

I have modified the attached patch to do that, and the scripts --- all
my tests pass.  FYI, I moved hex_decode to src/common so that front-end
could use it, and removed it from ecpg since ecpg can use the common one
now too.

> > Sorry, I wasn't trying to hand wave it away. For automated
> > interactions, like big iron HSMs or cloud KSMs, the difference between
> > 2 operations and 10 operations to start a DB server won't matter. For
> > an admin/operator having to type 10 passwords or get 10 clean
> > thumbprint scans, it would be horrible. My underlying question was, is
> > that toil the only problem to be solved, or is there another reason to
> > get into key combination, key splitting and the related issues which
> > are less documented and less well understood than key wrapping.
> 
> I appreciate that you're not trying to hand wave it away but this also
> didn't really answer the actual question- what's the advantage to having
> all of the keys provided by something external rather than having that
> external thing provide just one 'main' key, which we then use to decrypt
> our enveloped keys that we actually use?  I can think of some possible
> advantages but I'd really like to know what advantages you're seeing in
> doing that.

I am not going be as kind.  Our workflow is:

    Desirability -> Design -> Implement -> Test -> Review -> Commit
    https://wiki.postgresql.org/wiki/Todo#Development_Process

I have already asked about the first item, and received a reply talking
about the design --- that is not helpful.  I only have so much patience
for the "I want my secret decoder ring to glow in the dark" type of
feature additions to this already complex feature.  Unless we stay on
Desirability, I will not be replying.  If you can't answer the
Desirability question, well, talking about items farther right on the
list is not very helpful.

Now, I will say that your question about how a KMS will use this got me
thinking about how to test this, and that got me to implement the AWS
Secret Manager script, so that we definitely helpful.  My point is that
I don't think it is helpful to get into long discussions unless the
Desirability is clearly answered.  This is not just related to this
thread --- this kind of jump-over-Desirability has happened a lot in
relation to this feature, so I thought it would be good to clearly state
it now.

> > > This seems like a crux of at least one concern- that the current patch
> > > is deriving the actual KEK from the passphrase and not just taking the
> > > provided value (at least, that's what it looks like from a *very* quick
> > > look into it), and that's the part that I was suggesting that we might
> > > add an option for- to indicate if the cluster passphrase command is
> > > actually returning a passphrase which should be used to derive a key, or
> > > if it's returning a key directly to be used.  That does seem to be a
> > > material difference to me and one which we should care about.
> > 
> > Yes. Caring about that is the reason I've been making a nuisance of myself.
> 
> Right, I think we can agree on this aspect and I've chatted with Bruce
> about it some also.  When a direct key can be provided, we should use
> that, and not run it through a KDF.  This seems like a particularly
> important case that we should care about even at this early stage.

Agreed, and done in the attached patch.  :-)  I am thankful for all the
ideas that has helped move this feature forward.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee


Attachment

Re: Proposed patch for key managment

From
Alastair Turner
Date:
Hi Stephen

On Fri, 18 Dec 2020 at 21:36, Stephen Frost <sfrost@snowman.net> wrote:
>
> Greetings Alastair,
>
> * Alastair Turner (minion@decodable.me) wrote:
> > On Wed, 16 Dec 2020 at 22:43, Stephen Frost <sfrost@snowman.net> wrote:
...
> > passphrase key wrapper, the secret store and the cloud/HW KMS.
> >
> > Since the examples expand the purpose of cluster_passphrase_command,
> > let's call it cluster_key_challenge_command in the examples.
>
> These ideas don't seem too bad but I'd think we'd pass which key we want
> on the command-line using a %i or something like that, rather than using
> stdin, unless there's some reason that would be an issue..?  Certainly
> the CLI utilities I've seen tend to expect the name of the secret that
> you're asking for to be passed on the command line.

Agreed. I was working on the assumption that the calling process
(initdb or pg_ctl) would have access to the challenge material and be
passing it to the utility, so putting it in a command line would allow
it to leak. If the utility is accessing the challenge material
directly, then just an indicator of which key to work with would be a
lot simpler, true.

>
> > Starting with the passphrase key wrapper, since it's what's in place now.
> >
> >  - cluster_key_challenge_command = 'password_key_wrapper %q'
>
> I do tend to like this idea of having what
> cluster_key_challenge_command, or whatever we call it, expects is an
> actual key and have the command that is run be a separate command that
> takes the passphrase and runs the KDF (key derivation function) on it,
> when a passphrase is what the user wishes to use.
>
> That generally makes more sense to me than having the key derivation
> effort built into the backend which I have a hard time seeing any
> particular reason for, as long we're already calling out to some
> external utility of some kind to get the key.
>
...
>
> With the thought of trying to keep it a reasonably simple interface, I
> had a thought along these lines:
>
> - Separate command that runs the KDF, this is simple enough as it's
>   really just a hash, and it returns the key on stdout.
> - initdb time option that says if we're going to have PG manage the
>   sub-keys, or not.
> - cluster_key_command defined as "a command that is passed the ID of
>   the key, or keys, required for the cluster to start"
>
> initdb --pg-subkeys
>   - Calls cluster_key_command once with "$PG_SYSTEM_ID-main" or similar
>     and expects the main key to be provided on stdout.  Everything else
>     is then more-or-less as is today: PG generates DEK sub-keys for data
>     and WAL and then encrypts them with the main key and stores them.
>
> As long as the enveloped keys file exists on the filesystem, when PG
> starts, it'll call the cluster_key_command and will expect the 'main'
> key to be provided and it'll then decrypt and verify the DEK sub-keys,
> very similar to today.
>
> In this scenario, cluster_key_command might be set to call a command
> which accepts a passphrase and runs a KDF on it, or it might be set to
> call out to an external vaulting system or to a Yubikey, etc.
>
> initdb --no-pg-subkeys
>   - Calls cluster_key_command for each of the sub-keys that "pg-subkeys"
>     would normally generate itself, passing "$PG_SYSTEM_ID-keyid" for
>     each (eg: $PG_SYSTEM_ID-data, $PG_SYSTEM_ID-wal), and getting back
>     the keys on stdout to use.
>
> When PG starts, it sees that the enveloped keys file doesn't exist and
> does the same as initdb did- calls cluster_key_command multiple times to
> get the keys which are needed.  We'd want to make sure to have a way
> early on that checks that the data DEK provided actually decrypts the
> cluster, and bail out otherwise, before actually writing any data with
> that key.  I'll note though that this approach would actually allow you
> to change the WAL key, if you wanted to, though, which could certainly
> be nice (naturally there would be various caveats about doing so and
> that replicas would have to also be switched to the new key, etc, but
> that all seems reasonably solvable). Having a stand-alone utility that
> could do that for the --pg-subkeys case would be useful too (and just
> generally decrypt it for viewing/backup/replacement/etc).
>
> Down the line this might even allow us to do an online re-keying, at
> least once the challenges around enabling online data checksums are
> sorted out, but I don't think that's something to worry about today.
> Still, it seems like this potentially provides a pretty clean interface
> for that to happen eventually.
>

Changing the WAL key independently of the local storage key is the big
operational advantage I see of managing the two keys separately. It
lays the basis for a full key rotation story.

Rotating WAL keys during switchover between a pair with the new key
applicable from a certain timeline number maye be close enough to
online to satisfy most requirements. Not something to worry about
today, as you say.

>
> Right, I think we can agree on this aspect and I've chatted with Bruce
> about it some also.  When a direct key can be provided, we should use
> that, and not run it through a KDF.  This seems like a particularly
> important case that we should care about even at this early stage.
>

Thanks for following it up.

Thanks,
Alastair



Re: Proposed patch for key managment

From
Alastair Turner
Date:
Hi Bruce

On Sat, 19 Dec 2020 at 02:38, Bruce Momjian <bruce@momjian.us> wrote:
>
> I am not going be as kind.  Our workflow is:
>
>         Desirability -> Design -> Implement -> Test -> Review -> Commit
>         https://wiki.postgresql.org/wiki/Todo#Development_Process
>
> I have already asked about the first item, and received a reply talking
> about the design --- that is not helpful.  I only have so much patience
> for the "I want my secret decoder ring to glow in the dark" type of
> feature additions to this already complex feature.  Unless we stay on
> Desirability, I will not be replying.  If you can't answer the
> Desirability question, well, talking about items farther right on the
> list is not very helpful.
>
> Now, I will say that your question about how a KMS will use this got me
> thinking about how to test this, and that got me to implement the AWS
> Secret Manager script, so that we definitely helpful.  My point is that
> I don't think it is helpful to get into long discussions unless the
> Desirability is clearly answered.  This is not just related to this
> thread --- this kind of jump-over-Desirability has happened a lot in
> relation to this feature, so I thought it would be good to clearly state
> it now.
>

Sorry, I have waved Desirability through under the headings of ease of
adoption or not raising barriers to adoption, without detailing what
barriers I see or how to avoid them. I also realise that "don't scare
the users" is so open-ended so as to be actively unhelpful and very
quickly starts to sound like "I want my secret decoder ring to glow
pink, or blue, or green when anyone asks". Given the complexity of the
feature and pixels spilled in discussing it, I understand that it gets
frustrating. That said, I believe there is an important case to be
made here.

In summary, I believe that forcing an approach for key generation and
wrapping onto users of Cluster File Encryption limits the Desirability
of the feature.

Cluster File Encryption for Postgres would be Desirable to many users
I meet if, and only if, the generation and handling of keys fits with
their corporate policies. Therefore, giving a user the option to pass
an encryption key to Postgres for CFE is Desirable. I realise that the
option for feeding the keys in directly is an additional feature, and
that it has a user experience impact if a passphrase is used. It is,
however, a feature which opens up near-infinite flexibility. To
stretch the analogy, it is the clip for attaching coloured or opaque
coverings to the glowy bit on the secret decoder ring.

The generation of keys and the wrapping of keys are contentious issues
and are frequently addressed/specified in corporate security policies,
standards and technical papers (NIST 800-38F is often mentioned in
product docs). There are, for instance, policies in the wild which
require that keys for long term use are generated from the output of a
True Random Number Generator - the practical implication being that
the key used to encrypt data at rest should be created by an HSM. When
and why to use symmetric or asymmetric cyphers for key wrapping is
another item which different organisations have different policies on
- the hardware and cloud service vendors all offer both options, even
if they recommend one and make it easier to use.

Thanks

Alastair



Re: Proposed patch for key managment

From
Bruce Momjian
Date:
On Sat, Dec 19, 2020 at 11:45:15AM +0000, Alastair Turner wrote:
> Sorry, I have waved Desirability through under the headings of ease of
> adoption or not raising barriers to adoption, without detailing what
> barriers I see or how to avoid them. I also realise that "don't scare
> the users" is so open-ended so as to be actively unhelpful and very
> quickly starts to sound like "I want my secret decoder ring to glow
> pink, or blue, or green when anyone asks". Given the complexity of the
> feature and pixels spilled in discussing it, I understand that it gets
> frustrating. That said, I believe there is an important case to be
> made here.

I am pleased you understood my feelings on this.  Our last big
discussion on this topic is here:


https://www.postgresql.org/message-id/flat/CA%2Bfd4k7q5o6Nc_AaX6BcYM9yqTbC6_pnH-6nSD%3D54Zp6NBQTCQ%40mail.gmail.com

and it was so unproductive that we started having closed voice calls
every other Friday so we could discuss this without lots of "decoder
ring" ideas that had to be explained.  The result is our wiki page:

    https://wiki.postgresql.org/wiki/Transparent_Data_Encryption

It has taken me a while to understand why this topic seems to almost
uniquely gravitate toward "decoder ring" discussion.

> In summary, I believe that forcing an approach for key generation and
> wrapping onto users of Cluster File Encryption limits the Desirability
> of the feature.
> 
> Cluster File Encryption for Postgres would be Desirable to many users
> I meet if, and only if, the generation and handling of keys fits with
> their corporate policies. Therefore, giving a user the option to pass
> an encryption key to Postgres for CFE is Desirable. I realise that the
> option for feeding the keys in directly is an additional feature, and
> that it has a user experience impact if a passphrase is used. It is,
> however, a feature which opens up near-infinite flexibility. To
> stretch the analogy, it is the clip for attaching coloured or opaque
> coverings to the glowy bit on the secret decoder ring.
> 
> The generation of keys and the wrapping of keys are contentious issues
> and are frequently addressed/specified in corporate security policies,
> standards and technical papers (NIST 800-38F is often mentioned in
> product docs). There are, for instance, policies in the wild which
> require that keys for long term use are generated from the output of a
> True Random Number Generator - the practical implication being that
> the key used to encrypt data at rest should be created by an HSM. When
> and why to use symmetric or asymmetric cyphers for key wrapping is
> another item which different organisations have different policies on
> - the hardware and cloud service vendors all offer both options, even
> if they recommend one and make it easier to use.

To enable the direct injection of keys into the server, we would need a
new command for this, since trying to make the passphrase command do
this will lead to unnecessary complexity.  The passphrase command should
do one thing, and you can't have it changing its behavior based on
parameters, since the parameters are for the script to process, not to
change behavior of the server.

If we wanted to do this, we would need a new command, and one of the
parameters would be %k or something that would identify the key number
we want to retrieve from the KMS.  Stephen pointed out that we could
still validate that key;  the key would not be stored wrapped in the
file system, but we could store an HMAC in the file system, and use that
for validation.

On other interesting approach would be to allow the server to call out
for a KMS when it needs to generate the initial data keys that are
wrapped by the passphrase;  this is in contrast to calling the KMS
everytime it needs the data keys.

We should create code for the general use-case, which is currently
passphrase-wrapped system-generated data keys, and then get feedback on
what else we need.  However, I should point out that the community is
somewhat immune to the "my company needs this" kind of argument without
logic to back it up, though sometimes I think this entire feature is in
that category.  The data keys are generated using this random value
code:

    https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/port/pg_strong_random.c;hb=HEAD

so someone would have to explain why generating this remotely in a KMS
is superior, not just based on company policy.

My final point is that we can find ways to do what you are suggesting as
an addition to what we are adding now.  What we need is clear
justification of why these additional features are needed.  Requiring
the use of a true random number generator is a valid argument, but we
need to figure out, once this is implemented, who really wants that, and
how to implement it cleanly.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee




Re: Proposed patch for key managment

From
Bruce Momjian
Date:
On Sat, Dec 19, 2020 at 11:58:37AM -0500, Bruce Momjian wrote:
> My final point is that we can find ways to do what you are suggesting as
> an addition to what we are adding now.  What we need is clear
> justification of why these additional features are needed.  Requiring
> the use of a true random number generator is a valid argument, but we
> need to figure out, once this is implemented, who really wants that, and
> how to implement it cleanly.

I added a comment to this script to explain how to generate a true
random passphrase, attached.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee


Attachment

Re: Proposed patch for key managment

From
Alastair Turner
Date:
Thanks, Bruce

On Sat, 19 Dec 2020 at 16:58, Bruce Momjian <bruce@momjian.us> wrote:
>
...
>
> To enable the direct injection of keys into the server, we would need a
> new command for this, since trying to make the passphrase command do
> this will lead to unnecessary complexity.  The passphrase command should
> do one thing, and you can't have it changing its behavior based on
> parameters, since the parameters are for the script to process, not to
> change behavior of the server.
>
> If we wanted to do this, we would need a new command, and one of the
> parameters would be %k or something that would identify the key number
> we want to retrieve from the KMS.  Stephen pointed out that we could
> still validate that key;  the key would not be stored wrapped in the
> file system, but we could store an HMAC in the file system, and use that
> for validation.
>

I think that this is where we have ended up talking at cross-purposes.
My idea (after some refining based on Stephen's feedback) is that
there should be only this new command (the cluster key command) and
that the program for unwrapping stored keys should be called this way.
As could a program to contact an HSM, etc. This moves the generating
and wrapping functionality out of the core Postgres processes, making
it far easier to add alternatives. I see this approach was discussed
on the email thread you linked to, but I can't see where or how a
conclusion was reached about it...

>
> On other interesting approach would be to allow the server to call out
> for a KMS when it needs to generate the initial data keys that are
> wrapped by the passphrase;  this is in contrast to calling the KMS
> everytime it needs the data keys.
>
...
> ...The data keys are generated using this random value
> code:
>
>         https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/port/pg_strong_random.c;hb=HEAD
>
> so someone would have to explain why generating this remotely in a KMS
> is superior, not just based on company policy.
>

The pg_strong_random() function uses OpesnSSL's RAND_bytes() where
available. With appropriate configuration of an OpenSSL engine
supplying an alternative RAND, this could be handed off to a TRNG.

This is an example of the other option for providing flexibility to
support specific key generation, wrapping, ... requirements - handing
the operations off to a library like OpenSSL or nss tls which, in
turn, can use pluggable providers for these functions. FWIW, the
proprietary DBMSs use a pluggable engine approach to meet requirements
in this category.

>
> We should create code for the general use-case, which is currently
> passphrase-wrapped system-generated data keys, and then get feedback on
> what else we need.  However, I should point out that the community is
> somewhat immune to the "my company needs this" kind of argument without
> logic to back it up, though sometimes I think this entire feature is in
> that category. ...

Yeah. Security in general, and crypto in particular, are often
"because someone told me to" requirements.

>
> My final point is that we can find ways to do what you are suggesting as
> an addition to what we are adding now.  What we need is clear
> justification of why these additional features are needed.  Requiring
> the use of a true random number generator is a valid argument, but we
> need to figure out, once this is implemented, who really wants that, and
> how to implement it cleanly.
>

Clean interfaces would be either "above" the database calling out to
commands in user-land or "below" the database in the abstractions
which OpenSSL, nss tls and friends provide. Since the conclusion seems
already to have been reached that the keyring should be inside the
core process and only the passphrase should pop out above, I'll review
the patch in this vein and see if I can suggest any routes to carving
out more manageable subsets of the patch.

"...once this is implemented..." changes become a lot more difficult.
Particularly when the changes would affect what are regarded as the
database's on-disk files. Which I suspect is a contributing factor to
the level of engagement surrounding this feature.

Regards
Alastair



Re: Proposed patch for key managment

From
Stephen Frost
Date:
Greetings Alastair,

* Alastair Turner (minion@decodable.me) wrote:
> On Sat, 19 Dec 2020 at 16:58, Bruce Momjian <bruce@momjian.us> wrote:
> > To enable the direct injection of keys into the server, we would need a
> > new command for this, since trying to make the passphrase command do
> > this will lead to unnecessary complexity.  The passphrase command should
> > do one thing, and you can't have it changing its behavior based on
> > parameters, since the parameters are for the script to process, not to
> > change behavior of the server.
> >
> > If we wanted to do this, we would need a new command, and one of the
> > parameters would be %k or something that would identify the key number
> > we want to retrieve from the KMS.  Stephen pointed out that we could
> > still validate that key;  the key would not be stored wrapped in the
> > file system, but we could store an HMAC in the file system, and use that
> > for validation.
>
> I think that this is where we have ended up talking at cross-purposes.
> My idea (after some refining based on Stephen's feedback) is that
> there should be only this new command (the cluster key command) and
> that the program for unwrapping stored keys should be called this way.
> As could a program to contact an HSM, etc. This moves the generating
> and wrapping functionality out of the core Postgres processes, making
> it far easier to add alternatives. I see this approach was discussed
> on the email thread you linked to, but I can't see where or how a
> conclusion was reached about it...

I do generally like the idea of having the single command able to be
used to either provide the KEK (where PG manages the keyring
internally), or called multiple times to retrieve the DEKs (in which
case PG wouldn't be managing the keyring).

However, after chatting with Bruce about it for a bit this weekend, I'm
not sure that we need to tackle the second case today.  I don't think
there's any doubt that there will be users who will want PG to manage
the keyring and to be able to work with just a passphrase, as Bruce has
worked to make happen here and which we have a patch for.  I'm hoping
Bruce will post a new patch soon based on the work that he and I
discussed today (mostly just clarifications around keys vs. passphrases
and having the PG side accept a key which the command that's run will
produce).  With that, I'm feeling pretty comfortable that we can move
forward and at least get this initial work committed.

> The pg_strong_random() function uses OpesnSSL's RAND_bytes() where
> available. With appropriate configuration of an OpenSSL engine
> supplying an alternative RAND, this could be handed off to a TRNG.

Sure.

> This is an example of the other option for providing flexibility to
> support specific key generation, wrapping, ... requirements - handing
> the operations off to a library like OpenSSL or nss tls which, in
> turn, can use pluggable providers for these functions. FWIW, the
> proprietary DBMSs use a pluggable engine approach to meet requirements
> in this category.

OpenSSL provides for this configuration and gives us a pluggable
architecture to make this happen, so I'm not sure what the concern here
is..?  I agree that eventually it'd be nice to allow the administrator
to, more directly, control the DEKs but we're still going to need to
have a solution for the user who wishes to just provide a passphrase or
a KEK and I don't see any particular reason to hold off on the
implementation of that.

> > My final point is that we can find ways to do what you are suggesting as
> > an addition to what we are adding now.  What we need is clear
> > justification of why these additional features are needed.  Requiring
> > the use of a true random number generator is a valid argument, but we
> > need to figure out, once this is implemented, who really wants that, and
> > how to implement it cleanly.
>
> Clean interfaces would be either "above" the database calling out to
> commands in user-land or "below" the database in the abstractions
> which OpenSSL, nss tls and friends provide. Since the conclusion seems
> already to have been reached that the keyring should be inside the
> core process and only the passphrase should pop out above, I'll review
> the patch in this vein and see if I can suggest any routes to carving
> out more manageable subsets of the patch.

There's no doubt that there needs to be a solution where the keyring is
managed by PG.  Perhaps there are options to allow an external keyring
as well, but we can add that in the future.

> "...once this is implemented..." changes become a lot more difficult.
> Particularly when the changes would affect what are regarded as the
> database's on-disk files. Which I suspect is a contributing factor to
> the level of engagement surrounding this feature.

Yes, it's true that after things are implemented it can be more
difficult to change them- but if you're concerned about the specific
on-disk format of the keyring then please help us understand what your
concern is there.  The concern you've raised so far is just that you'd
like an option to have an external keyring, and that's fine, but we are
also going to need to have an internal one and which comes first doesn't
seem particularly material to me.  I don't think having one or the other
in place first will really detract or make more difficult the adding of
the other later.

Thanks,

Stephen

Attachment

Re: Proposed patch for key managment

From
Alastair Turner
Date:
Hi Stephen

On Sun, 20 Dec 2020 at 22:59, Stephen Frost <sfrost@snowman.net> wrote:
>
...
> However, after chatting with Bruce about it for a bit this weekend, I'm
> not sure that we need to tackle the second case today.  I don't think
> there's any doubt that there will be users who will want PG to manage
> the keyring and to be able to work with just a passphrase, as Bruce has
> worked to make happen here and which we have a patch for.  I'm hoping
> Bruce will post a new patch soon based on the work that he and I
> discussed today (mostly just clarifications around keys vs. passphrases
> and having the PG side accept a key which the command that's run will
> produce)...
>

Having Postgres accept a key which the command will produce sounds great

>
> Yes, it's true that after things are implemented it can be more
> difficult to change them- but if you're concerned about the specific
> on-disk format of the keyring then please help us understand what your
> concern is there.  The concern you've raised so far is just that you'd
> like an option to have an external keyring, and that's fine, but we are
> also going to need to have an internal one and which comes first doesn't
> seem particularly material to me.  I don't think having one or the other
> in place first will really detract or make more difficult the adding of
> the other later.
>
What I'd like specifically is to have the option of an external
keyring as a first class key store, where the keys stored in the
external keyring are used directly to encrypt the database contents.
The examples in this discussion so far have all put the internal
keyring between any other crypto infrastructure and the file
encryption process. Your description above of changes to pass keys out
of the command sound like they may have addressed this.

Regarding the on-disk format, separate storage of the key HMACs and
the wrapped keys sounds like a requirement for implementing a fully
external keyring or doing key wrapping externally via an OpenSSL or
nss tls pluggable engine. I'm still looking through the code.

Thanks
Alastair



Re: Proposed patch for key managment

From
Stephen Frost
Date:
Greetings,

* Alastair Turner (minion@decodable.me) wrote:
> On Sun, 20 Dec 2020 at 22:59, Stephen Frost <sfrost@snowman.net> wrote:
> > Yes, it's true that after things are implemented it can be more
> > difficult to change them- but if you're concerned about the specific
> > on-disk format of the keyring then please help us understand what your
> > concern is there.  The concern you've raised so far is just that you'd
> > like an option to have an external keyring, and that's fine, but we are
> > also going to need to have an internal one and which comes first doesn't
> > seem particularly material to me.  I don't think having one or the other
> > in place first will really detract or make more difficult the adding of
> > the other later.
>
> What I'd like specifically is to have the option of an external
> keyring as a first class key store, where the keys stored in the
> external keyring are used directly to encrypt the database contents.

Right- understood, but that's not what the patch does today and there
isn't anyone who has proposed such a patch, nor explained why we
couldn't add that capability later.

> The examples in this discussion so far have all put the internal
> keyring between any other crypto infrastructure and the file
> encryption process. Your description above of changes to pass keys out
> of the command sound like they may have addressed this.

The updates are intended to make it so that the KEK which wraps the
internal keyring will be obtained directly from the cluster key command,
pushing the transformation of a passphrase (should one be provided) into
a proper key to the script, but otherwise allowing the result of things
like 'openssl rand -hex 32' to be used directly as the KEK.

> Regarding the on-disk format, separate storage of the key HMACs and
> the wrapped keys sounds like a requirement for implementing a fully
> external keyring or doing key wrapping externally via an OpenSSL or
> nss tls pluggable engine. I'm still looking through the code.

This seems a bit confusing as the question at hand is the on-disk format
of the internal keyring, not anything to do with an external keyring
(which we wouldn't have any storage of and so I don't understand how it
makes sense to even discuss the idea of storage for the external
keyring..?).

Again, we will need the ability to perform the encryption using a simple
passphrase provided by the user, while using multiple randomly generated
data encryption keys, and that's what the latest set of patches are
geared towards.  I'm generally in support of adding additional
capabilities in this area in the future, but I don't think we need to
bog down the current effort by demanding that be implemented today.

Thanks,

Stephen

Attachment

Re: Proposed patch for key managment

From
Alastair Turner
Date:
Thanks Stephen,

On Mon, 21 Dec 2020 at 00:33, Stephen Frost <sfrost@snowman.net> wrote:
>
> Greetings,
>
> * Alastair Turner (minion@decodable.me) wrote:
...
> >
> > What I'd like specifically is to have the option of an external
> > keyring as a first class key store, where the keys stored in the
> > external keyring are used directly to encrypt the database contents.
>
> Right- understood, but that's not what the patch does today and there
> isn't anyone who has proposed such a patch, nor explained why we
> couldn't add that capability later.
>

I'm keen to propose a patch along those lines, even if just as a
sample. Minimising the amount of code which would have to be unpicked
in that effort would be great. Particularly avoiding any changes to
data structures, since that may effectively block adding new
capabilities.

>
> >...Your description above of changes to pass keys out
> > of the command sound like they may have addressed this.
>
> The updates are intended to make it so that the KEK which wraps the
> internal keyring will be obtained directly from the cluster key command,
> pushing the transformation of a passphrase (should one be provided) into
> a proper key to the script, but otherwise allowing the result of things
> like 'openssl rand -hex 32' to be used directly as the KEK.
>

Sounds good.

>
> > Regarding the on-disk format, separate storage of the key HMACs and
> > the wrapped keys sounds like a requirement for implementing a fully
> > external keyring or doing key wrapping externally via an OpenSSL or
> > nss tls pluggable engine. I'm still looking through the code.
>
> This seems a bit confusing as the question at hand is the on-disk format
> of the internal keyring, not anything to do with an external keyring
> (which we wouldn't have any storage of and so I don't understand how it
> makes sense to even discuss the idea of storage for the external
> keyring..?).
>

A requirement for validating the keys, no matter which keyring they
came from, was mentioned along the way. Since there would be no point
in validating keys from the internal ring twice, storing the
validation data (HMACs in the current design) separately from the
internal keyring's keys seems like it would avoid changing the data
structures for the internal keyring when adding an external option.

>
> Again, we will need the ability to perform the encryption using a simple
> passphrase provided by the user, while using multiple randomly generated
> data encryption keys, and that's what the latest set of patches are
> geared towards.  I'm generally in support of adding additional
> capabilities in this area in the future, but I don't think we need to
> bog down the current effort by demanding that be implemented today.
>

I'm really not looking to bog down the current effort.

I'll have a go at adding another keyring and/or abstracting the key
wrap and see how well a true peer to the passphrase approach fits in.

Regards
Alastair



Re: Proposed patch for key managment

From
Stephen Frost
Date:
Greetings,

* Alastair Turner (minion@decodable.me) wrote:
> On Mon, 21 Dec 2020 at 00:33, Stephen Frost <sfrost@snowman.net> wrote:
> > * Alastair Turner (minion@decodable.me) wrote:
> > > What I'd like specifically is to have the option of an external
> > > keyring as a first class key store, where the keys stored in the
> > > external keyring are used directly to encrypt the database contents.
> >
> > Right- understood, but that's not what the patch does today and there
> > isn't anyone who has proposed such a patch, nor explained why we
> > couldn't add that capability later.
>
> I'm keen to propose a patch along those lines, even if just as a
> sample. Minimising the amount of code which would have to be unpicked
> in that effort would be great. Particularly avoiding any changes to
> data structures, since that may effectively block adding new
> capabilities.

Ok, sure, but are there such changes that would need to be made for this
case...?  Seems to only be speculation at this point.

> > > Regarding the on-disk format, separate storage of the key HMACs and
> > > the wrapped keys sounds like a requirement for implementing a fully
> > > external keyring or doing key wrapping externally via an OpenSSL or
> > > nss tls pluggable engine. I'm still looking through the code.
> >
> > This seems a bit confusing as the question at hand is the on-disk format
> > of the internal keyring, not anything to do with an external keyring
> > (which we wouldn't have any storage of and so I don't understand how it
> > makes sense to even discuss the idea of storage for the external
> > keyring..?).
>
> A requirement for validating the keys, no matter which keyring they
> came from, was mentioned along the way. Since there would be no point
> in validating keys from the internal ring twice, storing the
> validation data (HMACs in the current design) separately from the
> internal keyring's keys seems like it would avoid changing the data
> structures for the internal keyring when adding an external option.

This doesn't strike me as very clear- specifically which HMACs are you
referring to which would need to be stored separately from the internal
keyring to make it possible for an external keyring to be used?

> > Again, we will need the ability to perform the encryption using a simple
> > passphrase provided by the user, while using multiple randomly generated
> > data encryption keys, and that's what the latest set of patches are
> > geared towards.  I'm generally in support of adding additional
> > capabilities in this area in the future, but I don't think we need to
> > bog down the current effort by demanding that be implemented today.
>
> I'm really not looking to bog down the current effort.

Great, glad to hear that.

> I'll have a go at adding another keyring and/or abstracting the key
> wrap and see how well a true peer to the passphrase approach fits in.

Having patches to review and consider definitely helps, imv.

Thanks,

Stephen

Attachment

Re: Proposed patch for key managment

From
Bruce Momjian
Date:
On Sun, Dec 20, 2020 at 05:59:09PM -0500, Stephen Frost wrote:
> However, after chatting with Bruce about it for a bit this weekend, I'm
> not sure that we need to tackle the second case today.  I don't think
> there's any doubt that there will be users who will want PG to manage
> the keyring and to be able to work with just a passphrase, as Bruce has
> worked to make happen here and which we have a patch for.  I'm hoping
> Bruce will post a new patch soon based on the work that he and I
> discussed today (mostly just clarifications around keys vs. passphrases
> and having the PG side accept a key which the command that's run will
> produce).  With that, I'm feeling pretty comfortable that we can move
> forward and at least get this initial work committed.

OK, here it the updated patch.  The major change, as Stephen pointed
out, is that the cluser_key_command (was cluster_passphrase_command) now
returns a hex-string representing the 32-byte KEK, rather than having
the server hash whatever the command used to return.  This avoids
double-hashing in cases where you are _not_ using a passphrase, but are
computing a random 32-byte value in the script.  This does mean the
script has to run sha256 for passphrases, but it seems like a win. 
Updated script are attached too.

The URLs are still accurate:

    https://github.com/postgres/postgres/compare/master...bmomjian:key.diff
    https://github.com/bmomjian/postgres/compare/key...bmomjian:key-alter.diff

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee


Attachment

Re: Proposed patch for key managment

From
Bruce Momjian
Date:
On Sun, Dec 20, 2020 at 09:38:55PM -0500, Stephen Frost wrote:
> > I'll have a go at adding another keyring and/or abstracting the key
> > wrap and see how well a true peer to the passphrase approach fits in.
> 
> Having patches to review and consider definitely helps, imv.

I disagree.  Our order is:

    Desirability -> Design -> Implement -> Test -> Review -> Commit
    https://wiki.postgresql.org/wiki/Todo#Development_Process

so, by posting a patch, you are decided to skip the first _two_
requirements.  I think it might be time for me to create a new thread
so it is not confused by whatever is posted here.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee




Re: Proposed patch for key managment

From
Bruce Momjian
Date:
On Sun, Dec 20, 2020 at 05:59:09PM -0500, Stephen Frost wrote:
> I do generally like the idea of having the single command able to be
> used to either provide the KEK (where PG manages the keyring
> internally), or called multiple times to retrieve the DEKs (in which
> case PG wouldn't be managing the keyring).
> 
> However, after chatting with Bruce about it for a bit this weekend, I'm
> not sure that we need to tackle the second case today.  I don't think
> there's any doubt that there will be users who will want PG to manage
> the keyring and to be able to work with just a passphrase, as Bruce has
> worked to make happen here and which we have a patch for.  I'm hoping
> Bruce will post a new patch soon based on the work that he and I
> discussed today (mostly just clarifications around keys vs. passphrases
> and having the PG side accept a key which the command that's run will
> produce).  With that, I'm feeling pretty comfortable that we can move
> forward and at least get this initial work committed.

I agree with very little of this sub-discussion, but I am still reading
it to see if I can get useful ideas from it.  Looking at what we used to
do with 'passphrase', we did the prompting in the script, and did the
hash, HMAC validation, data key generation and its wrap in the server. 
With the 'cluster key' patch I just posted, the hashing of the
passphrase is removed from the server and happens only in the script,
because in many cases the hashing is not needed, and double-hashing is
less secure, so that seems like a win.

To go any further, you are starting to look at possible data key
generation in the script, either at boot time, and then wrapped with a
passphrase, or just retrieved on every boot.  That would create a larger
burden for any script, meaning a passphrase usage would have to not only
hash, which it does now, but also verify its MAC, then decrypt keys
stored in the file system, then echo those to its output, to be read by
the server.  I think this is the big point --- I have five scripts, and
only one needs to hash its input (passphrase).  If you go any farther in
pushing code into the scripts, these scripts become much harder to
write, and much harder to do error checks.

I think the common case is passphrase, so I want to optimize for that. 
Pushing anymore features into the scripts is going to make that common
case less reliable, which I am opposed to.

Also, as far as Desirability, we only have _one_ person saying people
will want more options.  I need to hear from a lot more people before
this gets added to Postgres.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee




Re: Proposed patch for key managment

From
"David G. Johnston"
Date:
On Mon, Dec 21, 2020 at 2:44 PM Bruce Momjian <bruce@momjian.us> wrote:
On Sun, Dec 20, 2020 at 09:38:55PM -0500, Stephen Frost wrote:
> > I'll have a go at adding another keyring and/or abstracting the key
> > wrap and see how well a true peer to the passphrase approach fits in.
>
> Having patches to review and consider definitely helps, imv.

I disagree.  Our order is:

        Desirability -> Design -> Implement -> Test -> Review -> Commit
        https://wiki.postgresql.org/wiki/Todo#Development_Process

so, by posting a patch, you are decided to skip the first _two_
requirements.  I think it might be time for me to create a new thread
so it is not confused by whatever is posted here.


I agree with Stephen; so maybe that part of the Wiki needs to be updated instead of having it sit there for use as a hammer when someone disagrees with a proffered patch.

Or maybe consider that "a patch" doesn't only mean "implement" - it is simply another language through which desirability and design can also be communicated.  The author gets to decide how wordy their prose is and choosing a wordy but concise language that is understandable by the vast majority of the target audience seems like it should be considered acceptable.

David J.

Re: Proposed patch for key managment

From
Bruce Momjian
Date:
On Mon, Dec 21, 2020 at 03:00:32PM -0700, David G. Johnston wrote:
> I agree with Stephen; so maybe that part of the Wiki needs to be updated
> instead of having it sit there for use as a hammer when someone disagrees with
> a proffered patch.
> 
> Or maybe consider that "a patch" doesn't only mean "implement" - it is simply
> another language through which desirability and design can also be
> communicated.  The author gets to decide how wordy their prose is and choosing
> a wordy but concise language that is understandable by the vast majority of the
> target audience seems like it should be considered acceptable.

If you want to debate that TODO section, you will need to start a new
thread.  I suggest Alastair's patch also appear in a new thread.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee




Re: Proposed patch for key managment

From
Bruce Momjian
Date:
On Mon, Dec 21, 2020 at 04:35:14PM -0500, Bruce Momjian wrote:
> On Sun, Dec 20, 2020 at 05:59:09PM -0500, Stephen Frost wrote:
> OK, here it the updated patch.  The major change, as Stephen pointed
> out, is that the cluser_key_command (was cluster_passphrase_command) now
> returns a hex-string representing the 32-byte KEK, rather than having
> the server hash whatever the command used to return.  This avoids
> double-hashing in cases where you are _not_ using a passphrase, but are
> computing a random 32-byte value in the script.  This does mean the
> script has to run sha256 for passphrases, but it seems like a win. 
> Updated script are attached too.

Attached is the script patch.  It is also at:

    https://github.com/postgres/postgres/compare/master...bmomjian:cfe-sh.diff

I think it still needs docs but those will have to be done after the
code doc patch is added.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee


Attachment

Re: Proposed patch for key managment

From
Bruce Momjian
Date:
On Mon, Dec 21, 2020 at 10:07:48PM -0500, Bruce Momjian wrote:
> Attached is the script patch.  It is also at:
> 
>     https://github.com/postgres/postgres/compare/master...bmomjian:cfe-sh.diff
> 
> I think it still needs docs but those will have to be done after the
> code doc patch is added.

Here is an updated patch.  Are people happy with the Makefile, its
location in the source tree, and the install directory name?  I used the
directory name 'auth_commands' because I thought 'auth' was too easily
misinterpreted. I put the scripts in /src/backend/utils/auth_commands. 
It also contains a script that can be used for SSL passphrase prompting,
but I haven't written the C code for that yet.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee


Attachment

Re: Proposed patch for key managment

From
Alastair Turner
Date:
Hi Bruce

In ckey_passphrase.sh.sample

+
+echo "$PASS" | sha256sum | cut -d' ' -f1
+

Under the threat model discussed, a copy of the keyfile could be
attacked offline. So getting from passphrase to DEKs should be as
resource intensive as possible to slow down brute-force attempts.
Instead of just a SHA hash, this should be at least a PBKDF2 (PKCS#5)
operation or preferably scrypt (which is memory intensive as well as
computationally intensive). While OpenSSL provides these key
derivation options for it's encoding operations, it does not offer a
command line option for key derivation using them. What's your view on
including third party utilities like nettlepbkdf in the sample or
providing this utility as a compiled program?

In the same theme - in ossl_cipher_ctx_create and the functions using
that context should rather be using the key-wrapping variants of the
cipher. As well as lower throughput, with the resulting effects on
brute force attempts, the key wrap variants provide more robust
ciphertext output
"
KW, KWP, and TKW are each robust in the sense that each bit of output
can be expected to depend in a nontrivial fashion on each bit of
input, even when the length of the input data is greater than one
block. This property is achieved at the cost of a considerably lower
throughput rate, compared to other authenticated-encryption modes, but
the tradeoff may be appropriate for some key management applications.
For example, a robust method may be desired when the length of the
keys to be protected is greater than the block size of the underlying
block cipher, or when the value of the protected data is very high.
" - NIST SP 800-38F

and aim to manage the risk of short or predictable IVs ([1] final para
of page 1)

On Tue, 22 Dec 2020 at 15:40, Bruce Momjian <bruce@momjian.us> wrote:
>
> Here is an updated patch.  Are people happy with the Makefile, its
> location in the source tree, and the install directory name?  I used the
> directory name 'auth_commands' because I thought 'auth' was too easily
> misinterpreted. I put the scripts in /src/backend/utils/auth_commands.
>

What's implemented in these patches is an internal keystore, wrapped
with a key derived from a passphrase. I'd think that the scripts
directory should reflect what they interact with, so
'keystore_commands' or 'local_keystore_command' sounds more specific
and therefore better than 'auth_commands'.

Regards
Alastair

[1] https://web.cs.ucdavis.edu/~rogaway/papers/keywrap.pdf



Re: Proposed patch for key managment

From
Bruce Momjian
Date:
On Tue, Dec 22, 2020 at 08:15:27PM +0000, Alastair Turner wrote:
> Hi Bruce
> 
> In ckey_passphrase.sh.sample
> 
> +
> +echo "$PASS" | sha256sum | cut -d' ' -f1
> +
> 
> Under the threat model discussed, a copy of the keyfile could be
> attacked offline. So getting from passphrase to DEKs should be as
> resource intensive as possible to slow down brute-force attempts.
> Instead of just a SHA hash, this should be at least a PBKDF2 (PKCS#5)

I am satisfied with the security of SHA256.

> On Tue, 22 Dec 2020 at 15:40, Bruce Momjian <bruce@momjian.us> wrote:
> >
> > Here is an updated patch.  Are people happy with the Makefile, its
> > location in the source tree, and the install directory name?  I used the
> > directory name 'auth_commands' because I thought 'auth' was too easily
> > misinterpreted. I put the scripts in /src/backend/utils/auth_commands.
> >
> 
> What's implemented in these patches is an internal keystore, wrapped
> with a key derived from a passphrase. I'd think that the scripts
> directory should reflect what they interact with, so
> 'keystore_commands' or 'local_keystore_command' sounds more specific
> and therefore better than 'auth_commands'.

The point is that some commands are used for keystore and some for SSL
certificate passphrase entry, hence "auth".

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee




Re: Proposed patch for key managment

From
Bruce Momjian
Date:
On Tue, Dec 22, 2020 at 04:13:06PM -0500, Bruce Momjian wrote:
> On Tue, Dec 22, 2020 at 08:15:27PM +0000, Alastair Turner wrote:
> > Hi Bruce
> > 
> > In ckey_passphrase.sh.sample
> > 
> > +
> > +echo "$PASS" | sha256sum | cut -d' ' -f1
> > +
> > 
> > Under the threat model discussed, a copy of the keyfile could be
> > attacked offline. So getting from passphrase to DEKs should be as
> > resource intensive as possible to slow down brute-force attempts.
> > Instead of just a SHA hash, this should be at least a PBKDF2 (PKCS#5)
> 
> I am satisfied with the security of SHA256.

Sorry, I should have said I am happy with a SHA512 HMAC in a 256-bit
keyspace.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee




Re: Proposed patch for key managment

From
Bruce Momjian
Date:
On Tue, Dec 22, 2020 at 10:40:17AM -0500, Bruce Momjian wrote:
> On Mon, Dec 21, 2020 at 10:07:48PM -0500, Bruce Momjian wrote:
> > Attached is the script patch.  It is also at:
> > 
> >     https://github.com/postgres/postgres/compare/master...bmomjian:cfe-sh.diff
> > 
> > I think it still needs docs but those will have to be done after the
> > code doc patch is added.
> 
> Here is an updated patch.  Are people happy with the Makefile, its
> location in the source tree, and the install directory name?  I used the
> directory name 'auth_commands' because I thought 'auth' was too easily
> misinterpreted. I put the scripts in /src/backend/utils/auth_commands. 
> It also contains a script that can be used for SSL passphrase prompting,
> but I haven't written the C code for that yet.

Here is a new patch, build on previous patches, which allows for the SSL
passphrase to be prompted from the terminal.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee


Attachment

Re: Proposed patch for key managment

From
Bruce Momjian
Date:
On Sat, Dec 19, 2020 at 11:45:08AM +0000, Alastair Turner wrote:
> On Fri, 18 Dec 2020 at 21:36, Stephen Frost <sfrost@snowman.net> wrote:
> > These ideas don't seem too bad but I'd think we'd pass which key we want
> > on the command-line using a %i or something like that, rather than using
> > stdin, unless there's some reason that would be an issue..?  Certainly
> > the CLI utilities I've seen tend to expect the name of the secret that
> > you're asking for to be passed on the command line.
> 
> Agreed. I was working on the assumption that the calling process
> (initdb or pg_ctl) would have access to the challenge material and be
> passing it to the utility, so putting it in a command line would allow
> it to leak. If the utility is accessing the challenge material
> directly, then just an indicator of which key to work with would be a
> lot simpler, true.

I want to repeat here what I said in another thread:

> I think ultimately we will need three commands to control the keys.
> First, there is the cluster_key_command, which we have now.  Second, I
> think we will need an optional command which returns random bytes ---
> this would allow users to get random bytes from a different source than
> that used by the server code.
> 
> Third, we will probably need a command that returns the data encryption
> keys directly, either heap/index or WAL keys, probably based on key
> number --- you pass the key number you want, and the command returns the
> data key.  There would not be a cluster key in this case, but the
> command could still prompt the user for perhaps a password to the KMS
> server. It could not be used if any of the previous two commands are
> used. I assume an HMAC would still be stored in the pg_cryptokeys
> directory to check that the right key has been returned.
> 
> I thought we should implement the first command, because it will
> probably be the most common and easiest to use, and then see what people
> want added.

There is also a fourth option where the command returns multiple keys,
one per line of hex digits.  That could be written in shell script, but
it would be fragile and complex.  It could be written in Perl, but that
would add a new language requirement for this feature.  It could be
written in C, but that would limits its flexibility because changes
would require a recompile, and you would probably need that C file to
call external scripts to tailor input like we do now from the server.

You could actually write a full implemention of what we do on the server
side in client code, but pg_alterckey would not work, since it would not
know the data format, so we would need another cluster key alter for that.

It could be written as a C extension, but that would be also have C's
limitations.  In summary, having the server do most of the complex work
for the default case seems best, and eventually allowing the ability for
the client to do everything seems ideal.  I think we need more input
before we go beyond what we do now.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee




Re: Proposed patch for key managment

From
Fabien COELHO
Date:
> I want to repeat here what I said in another thread:
>
>> I think ultimately we will need three commands to control the keys.
>> First, there is the cluster_key_command, which we have now.  Second, I
>> think we will need an optional command which returns random bytes ---
>> this would allow users to get random bytes from a different source than
>> that used by the server code.
>>
>> Third, we will probably need a command that returns the data encryption
>> keys directly, either heap/index or WAL keys, probably based on key
>> number --- you pass the key number you want, and the command returns the
>> data key.  There would not be a cluster key in this case, but the
>> command could still prompt the user for perhaps a password to the KMS
>> server. It could not be used if any of the previous two commands are
>> used. I assume an HMAC would still be stored in the pg_cryptokeys
>> directory to check that the right key has been returned.
>>
>> I thought we should implement the first command, because it will
>> probably be the most common and easiest to use, and then see what people
>> want added.
>
> There is also a fourth option where the command returns multiple keys,
> one per line of hex digits.  That could be written in shell script, but
> it would be fragile and complex.  It could be written in Perl, but that
> would add a new language requirement for this feature.  It could be
> written in C, but that would limits its flexibility because changes
> would require a recompile, and you would probably need that C file to
> call external scripts to tailor input like we do now from the server.
>
> You could actually write a full implemention of what we do on the server
> side in client code, but pg_alterckey would not work, since it would not
> know the data format, so we would need another cluster key alter for that.
>
> It could be written as a C extension, but that would be also have C's
> limitations.  In summary, having the server do most of the complex work
> for the default case seems best, and eventually allowing the ability for
> the client to do everything seems ideal.  I think we need more input
> before we go beyond what we do now.

As I said in the commit thread, I disagree with this approach because it 
pushes for no or partial or maybe bad design.

I think that an API should be carefully thought about, without assumption 
about the underlying cryptography (algorithm, key lengths, modes, how keys 
are derived and stored, and so on), and its usefulness be demonstrated by 
actually being used for one implementation which would be what is 
currently being proposed in the patch, and possibly others thrown in for 
free.

The implementations should not have to be in any particular language: 
Shell, Perl, Python, C should be possible.

After giving it more thought during the day, I think that only one
command and a basic protocol is needed. Maybe something as simple as

   /path/to/command --options arguments…

With a basic (text? binary?) protocol on stdin/stdout (?) for the 
different functions. What the command actually does (connect to a remote 
server, ask for a master password, open some other database, whatever) 
should be irrelevant to pg, which would just get and pass bunch of bytes 
to functions, which could use them for keys, secrets, whatever, and be 
easily replaceable.

The API should NOT make assumptions about the cryptographic design, what 
depends about what, where things are stored… ISTM that Pg should only care 
about naming keys, holding them when created/retrieved (but not create 
them), basically interacting with the key manager, passing the stuff to 
functions for encryption/decryption seen as black boxes.

I may have suggested something along these lines at the beginning of the 
key management thread, probably. Not going this way implicitely implies 
making some assumptions which may or may not suit other use cases, so 
makes them specific not generic. I do not think pg should do that.

-- 
Fabien.

Re: Proposed patch for key management

From
Bruce Momjian
Date:
On Mon, Dec 28, 2020 at 06:15:44PM -0400, Fabien COELHO wrote:
> The API should NOT make assumptions about the cryptographic design, what
> depends about what, where things are stored… ISTM that Pg should only care
> about naming keys, holding them when created/retrieved (but not create
> them), basically interacting with the key manager, passing the stuff to
> functions for encryption/decryption seen as black boxes.
> 
> I may have suggested something along these lines at the beginning of the key
> management thread, probably. Not going this way implicitely implies making
> some assumptions which may or may not suit other use cases, so makes them
> specific not generic. I do not think pg should do that.

I am not sure what else I can add to this discussion.  Having something
that is completely external might be a nice option, but I don't think it
is the common use-case, and will make the most common use cases more
complex.  Adding a pre-defined system will not prevent future work on a
completely external option.  I know archive_command is completely
external, but that is much simpler than what would need to be done for
key management, key rotation, and key verification.

I will say that if the community feels external-only should be the only
option, I will stop working on this feature because I feel the result
would be too fragile to be reliable, and I would not want to be
associated with it.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee




Re: Proposed patch for key managment

From
Stephen Frost
Date:
Greetings,

* Fabien COELHO (coelho@cri.ensmp.fr) wrote:
> I think that an API should be carefully thought about, without assumption
> about the underlying cryptography (algorithm, key lengths, modes, how keys
> are derived and stored, and so on), and its usefulness be demonstrated by
> actually being used for one implementation which would be what is currently
> being proposed in the patch, and possibly others thrown in for free.

Perhaps I'm misunderstanding, but this is basically what we have in the
currently proposed patch as 'cipher.h', with cipher.c as a stub that
fails if called directly, and cipher_openssl.c with the actual OpenSSL
based implementation.

> The implementations should not have to be in any particular language: Shell,
> Perl, Python, C should be possible.

I disagree that it makes any sense to pass actual encryption out to a
shell script.

> After giving it more thought during the day, I think that only one
> command and a basic protocol is needed. Maybe something as simple as
>
>   /path/to/command --options arguments…

This is what's proposed- a command is run to acquire the KEK (as a hex
encoded set of bytes, making it possible to use a shell script, which is
what the patch does too).

> With a basic (text? binary?) protocol on stdin/stdout (?) for the different
> functions. What the command actually does (connect to a remote server, ask
> for a master password, open some other database, whatever) should be
> irrelevant to pg, which would just get and pass bunch of bytes to functions,
> which could use them for keys, secrets, whatever, and be easily replaceable.

Right- we just get bytes back from the command and then we use that as
the KEK.  How that scripts gets those bytes is entirely up to the script
and is not an issue for PG to worry about.

> The API should NOT make assumptions about the cryptographic design, what
> depends about what, where things are stored… ISTM that Pg should only care
> about naming keys, holding them when created/retrieved (but not create
> them), basically interacting with the key manager, passing the stuff to
> functions for encryption/decryption seen as black boxes.

The API to fetch the KEK doesn't care at all about where it's stored or
how it's derived or anything like that.  There's a relatively small
change which could be made to have PG request all of the keys that it'll
need on startup, if we want to go there as has been suggested elsewhere,
but even if we do that, PG needs to be able to do that itself too,
otherwise it's not a complete capability and there seems little point in
doing something that's just a pass-thru to something else and isn't able
to really be used.

Thanks,

Stephen

Attachment

Re: Proposed patch for key managment

From
Bruce Momjian
Date:
On Wed, Dec 30, 2020 at 06:49:34PM -0500, Stephen Frost wrote:
> The API to fetch the KEK doesn't care at all about where it's stored or
> how it's derived or anything like that.  There's a relatively small
> change which could be made to have PG request all of the keys that it'll
> need on startup, if we want to go there as has been suggested elsewhere,
> but even if we do that, PG needs to be able to do that itself too,
> otherwise it's not a complete capability and there seems little point in
> doing something that's just a pass-thru to something else and isn't able
> to really be used.

Right now, the script returns a cluster key (KEK), and during initdb the
server generates data encryption keys and wraps them with the KEK. 
During server start, the server validates the KEK and decrypts the data
keys.  pg_alterckey allows changing the KEK.

I think Fabien is saying this all should _only_ be done using external
tools --- that's what I don't agree with.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee




Re: Proposed patch for key management

From
Fabien COELHO
Date:
Hello Bruce,

>> The API should NOT make assumptions about the cryptographic design, what
>> depends about what, where things are stored… ISTM that Pg should only care
>> about naming keys, holding them when created/retrieved (but not create
>> them), basically interacting with the key manager, passing the stuff to
>> functions for encryption/decryption seen as black boxes.
>>
>> I may have suggested something along these lines at the beginning of the key
>> management thread, probably. Not going this way implicitely implies making
>> some assumptions which may or may not suit other use cases, so makes them
>> specific not generic. I do not think pg should do that.
>
> I am not sure what else I can add to this discussion.  Having something 
> that is completely external might be a nice option, but I don't think it 
> is the common use-case, and will make the most common use cases more 
> complex.

I'm unsure about what is the "common use case". ISTM that having the 
postgres process holding the master key looks like a "no" for me in any 
use case with actual security concern: as the database must be remotely 
accessible, a reasonable security assumption is that a pg account could be 
compromised, and the "master key" (if any, that is just one particular 
cryptographic design) should not be accessible in that case. The first 
barrier would be pg admin account. With external, the options are that the 
key is hold by another id, so the host must be compromised as well, and 
could even be managed remotely on another host (I agree that this later 
option would be fragile because of the implied network connection, but it 
would be a tradeoff depending on the security concerns, but it pretty much 
correspond to the kerberos model).

> Adding a pre-defined system will not prevent future work on a
> completely external option.

Yes and no.

Having two side-by-side cluster-encryption scheme in core, the first that 
could be implemented on top of the second without much effort, would look 
pretty weird. Moreover, required changes in core to allow an API are the 
very same as the one in this patch. The other concern I have with doing 
the cleaning work afterwards is that the API would be somehow in the 
middle of the existing functions if it has not been thought of before.

> I know archive_command is completely external, but that is much simpler 
> than what would need to be done for key management, key rotation, and 
> key verification.

I'm not sure of the design of the key rotation stuff as I recall it from 
the patches I looked at, but the API design looks like the more pressing 
issue. First, the mere existence of an "master key" is a cryptographic 
choice which is debatable, because it creates issues if one must be able 
to change it, hence the whole "key rotation" stuff. ISTM that what core 
needs to know is that one particular key is changed by a new one and the 
underlying file is rewritten. It should not need to know anything about 
master keys and key derivation at all. It should need to know that the 
user wants to change file keys, though.

> I will say that if the community feels external-only should be the only
> option, I will stop working on this feature because I feel the result
> would be too fragile to be reliable,

I'm do not see why it would be the case. I'm just arguing to have key 
management in a separate, possibly suid something-else, process, which 
given the security concerns which dictates the feature looks like a must 
have, or at least must be possible. From a line count point of view, it 
should be a small addition to the current code.

> and I would not want to be associated with it.

You do as you want. I'm no one, you can ignore me and proceed to commit 
whatever you want. I'm only advising to the best of my ability, what I 
think should be the level of API pursued for such a feature in pg, at the 
design level.

I feel that the current proposal is built around one particular use case 
with many implicit and unnecessary assumptions without giving much 
thoughts to the generic API which should really be pg concern, and I do 
not think that it can really be corrected once the ship has sailed, hence 
my attempt at having some thoughts given to the matter before that.

IMHO, the *minimum* should be to have the API clearly designed, and the 
implementation compatible with it at some level, including not having 
assumptions about underlying cryptographic functions and key derivations 
(I mean at the API level, the code itself can do it obviously).

If you want a special "key_mgt_command = internal" because you feel that 
processes are fragile and unreliable and do not bring security, so be it, 
but I think that the API should be designed beforehand nevertheless.

Note that if people think that I'm wrong, then I'm wrong by definition.

-- 
Fabien.

Re: Proposed patch for key managment

From
Fabien COELHO
Date:
Hello Stephen,

>> The implementations should not have to be in any particular language: Shell,
>> Perl, Python, C should be possible.
>
> I disagree that it makes any sense to pass actual encryption out to a
> shell script.

Yes, sure. I'm talking about key management. For actual crypto functions, 
ISTM that they should be "replaceable", i.e. if some institution does not 
believe in AES then they could switch to something else.

>> After giving it more thought during the day, I think that only one
>> command and a basic protocol is needed. Maybe something as simple as
>>
>>   /path/to/command --options arguments…
>
> This is what's proposed- a command is run to acquire the KEK (as a hex
> encoded set of bytes, making it possible to use a shell script, which is
> what the patch does too).

Yep, but that is not what I'm trying to advocate. The "KEK" (if any), 
would stay in the process, not be given back to the database or command 
using it. Then the database/tool would interact with the command to get 
the actual per-file keys when needed.

> [...] The API to fetch the KEK doesn't care at all about where it's 
> stored or how it's derived or anything like that.

> There's a relatively small change which could be made to have PG request 
> all of the keys that it'll need on startup, if we want to go there as 
> has been suggested elsewhere, but even if we do that, PG needs to be 
> able to do that itself too, otherwise it's not a complete capability and 
> there seems little point in doing something that's just a pass-thru to 
> something else and isn't able to really be used.

I do not understand. Postgres should provide a working implementation, 
whether the functions are directly inside pg or though an external 
process, they need to be there anyway. I'm trying to fix a clean, possibly 
external API so that it is possible to have something different from the 
choices made in the patch.

-- 
Fabien.

Re: Proposed patch for key managment

From
Fabien COELHO
Date:
Hello,

>> The API to fetch the KEK doesn't care at all about where it's stored or
>> how it's derived or anything like that.  There's a relatively small
>> change which could be made to have PG request all of the keys that it'll
>> need on startup, if we want to go there as has been suggested elsewhere,
>> but even if we do that, PG needs to be able to do that itself too,
>> otherwise it's not a complete capability and there seems little point in
>> doing something that's just a pass-thru to something else and isn't able
>> to really be used.
>
> Right now, the script returns a cluster key (KEK), and during initdb the
> server generates data encryption keys and wraps them with the KEK.
> During server start, the server validates the KEK and decrypts the data
> keys.  pg_alterckey allows changing the KEK.
>
> I think Fabien is saying this all should _only_ be done using external
> tools --- that's what I don't agree with.

Yep.

I could compromise on "could be done using an external tool", but that 
requires designing the API and thinking about where and how things are 
done before everything is hardwired. Designing afterwards is too late. 
ISTM that the current patch does not separate API design and cryptographic 
design, so both are deeply entwined, and would be difficult to 
disentangle.

-- 
Fabien.



Re: Proposed patch for key managment

From
Stephen Frost
Date:
Greetings,

* Bruce Momjian (bruce@momjian.us) wrote:
> On Wed, Dec 30, 2020 at 06:49:34PM -0500, Stephen Frost wrote:
> > The API to fetch the KEK doesn't care at all about where it's stored or
> > how it's derived or anything like that.  There's a relatively small
> > change which could be made to have PG request all of the keys that it'll
> > need on startup, if we want to go there as has been suggested elsewhere,
> > but even if we do that, PG needs to be able to do that itself too,
> > otherwise it's not a complete capability and there seems little point in
> > doing something that's just a pass-thru to something else and isn't able
> > to really be used.
>
> Right now, the script returns a cluster key (KEK), and during initdb the
> server generates data encryption keys and wraps them with the KEK.
> During server start, the server validates the KEK and decrypts the data
> keys.  pg_alterckey allows changing the KEK.

Right- all of those are really necessary things for PG to have a useful
implementation.

> I think Fabien is saying this all should _only_ be done using external
> tools --- that's what I don't agree with.

I also don't agree that everything should be external.  We effectively
have that today and it certainly doesn't get us any closer to having a
solution in PG, a point which we are frequently pointed to as a reason
why certain, entirely reasonable, use-cases can't be satisfied with PG.

Thanks,

Stephen

Attachment

Re: Proposed patch for key management

From
Stephen Frost
Date:
Greetings,

* Fabien COELHO (coelho@cri.ensmp.fr) wrote:
> >>The API should NOT make assumptions about the cryptographic design, what
> >>depends about what, where things are stored… ISTM that Pg should only care
> >>about naming keys, holding them when created/retrieved (but not create
> >>them), basically interacting with the key manager, passing the stuff to
> >>functions for encryption/decryption seen as black boxes.
> >>
> >>I may have suggested something along these lines at the beginning of the key
> >>management thread, probably. Not going this way implicitely implies making
> >>some assumptions which may or may not suit other use cases, so makes them
> >>specific not generic. I do not think pg should do that.
> >
> >I am not sure what else I can add to this discussion.  Having something
> >that is completely external might be a nice option, but I don't think it
> >is the common use-case, and will make the most common use cases more
> >complex.
>
> I'm unsure about what is the "common use case". ISTM that having the
> postgres process holding the master key looks like a "no" for me in any use
> case with actual security concern: as the database must be remotely
> accessible, a reasonable security assumption is that a pg account could be
> compromised, and the "master key" (if any, that is just one particular
> cryptographic design) should not be accessible in that case. The first
> barrier would be pg admin account. With external, the options are that the
> key is hold by another id, so the host must be compromised as well, and
> could even be managed remotely on another host (I agree that this later
> option would be fragile because of the implied network connection, but it
> would be a tradeoff depending on the security concerns, but it pretty much
> correspond to the kerberos model).

No, this isn't anything like the Kerberos model and there's no case
where the PG account won't have access to the DEK (data encryption key)
during normal operation (except with the possibility of off-loading to a
hardware crypto device which, while interesting, is definitely something
that's of very limited utility and which could be added on as a
capability later anyway.  This is also well understood by those who are
interested in this capability and it's perfectly acceptable that PG will
have, in memory, the DEK.

The keys which are stored on disk with the proposed patch are encrypted
using a KEK which is external to PG- that's all part of the existing
patch and design.

> >Adding a pre-defined system will not prevent future work on a
> >completely external option.
>
> Yes and no.
>
> Having two side-by-side cluster-encryption scheme in core, the first that
> could be implemented on top of the second without much effort, would look
> pretty weird. Moreover, required changes in core to allow an API are the
> very same as the one in this patch. The other concern I have with doing the
> cleaning work afterwards is that the API would be somehow in the middle of
> the existing functions if it has not been thought of before.

This has been considered and the functions which are proposed are
intentionally structured to allow for multiple implementations already,
so it's unclear to me what the argument here is.  Further, we haven't
even gotten to actual cluster encryption yet- this is all just the key
management side of things, which is absolutely a necessary and important
part but argueing that it doesn't address the cluster encryption
approach in core simply doesn't make any sense as that's not a part of
this patch.

Let's have that discussion when we actually get to the point of talking
about cluster encryption.

> >I know archive_command is completely external, but that is much simpler
> >than what would need to be done for key management, key rotation, and key
> >verification.
>
> I'm not sure of the design of the key rotation stuff as I recall it from the
> patches I looked at, but the API design looks like the more pressing issue.
> First, the mere existence of an "master key" is a cryptographic choice which
> is debatable, because it creates issues if one must be able to change it,
> hence the whole "key rotation" stuff. ISTM that what core needs to know is
> that one particular key is changed by a new one and the underlying file is
> rewritten. It should not need to know anything about master keys and key
> derivation at all. It should need to know that the user wants to change file
> keys, though.

No, it's not debatable that a KEK is needed, I disagree entirely.
Perhaps we can have support for an external key store in the future for
the DEKs, but we really need to have our own key management and the
ability to reasonably rotate keys (which is what the KEK provides).
Ideally, we'll get to a point where we can even have multiple DEKs and
deal with rotating them too, but that, if anything, point even stronger
to the need to have a key management system and KEK as we'll be dealing
with that many more keys.

> >I will say that if the community feels external-only should be the only
> >option, I will stop working on this feature because I feel the result
> >would be too fragile to be reliable,
>
> I'm do not see why it would be the case. I'm just arguing to have key
> management in a separate, possibly suid something-else, process, which given
> the security concerns which dictates the feature looks like a must have, or
> at least must be possible. From a line count point of view, it should be a
> small addition to the current code.

All of this hand-waving really isn't helping.

If it's a small addition to the current code then it'd be fantastic if
you'd propose a specific patch which adds what you're suggesting.  I
don't think either Bruce or I would have any issue with others helping
out on this effort, but let's be clear- we need something that *is* part
of core PG, even if we have an ability to have other parts exist outside
of PG.

Thanks,

Stephen

Attachment

Re: Proposed patch for key managment

From
Stephen Frost
Date:
Greetings,

* Fabien COELHO (coelho@cri.ensmp.fr) wrote:
> >>The implementations should not have to be in any particular language: Shell,
> >>Perl, Python, C should be possible.
> >
> >I disagree that it makes any sense to pass actual encryption out to a
> >shell script.
>
> Yes, sure. I'm talking about key management. For actual crypto functions,
> ISTM that they should be "replaceable", i.e. if some institution does not
> believe in AES then they could switch to something else.

The proposed implementation makes it pretty straight-forward to add in
other implementations and other crypto algorithms if people wish to do
so.

> >>After giving it more thought during the day, I think that only one
> >>command and a basic protocol is needed. Maybe something as simple as
> >>
> >>  /path/to/command --options arguments…
> >
> >This is what's proposed- a command is run to acquire the KEK (as a hex
> >encoded set of bytes, making it possible to use a shell script, which is
> >what the patch does too).
>
> Yep, but that is not what I'm trying to advocate. The "KEK" (if any), would
> stay in the process, not be given back to the database or command using it.
> Then the database/tool would interact with the command to get the actual
> per-file keys when needed.

"When needed" is every single write that we do of any file in the entire
backend.  Reaching out to something external every time we go to use the
key really isn't sensible- except, perhaps, in the case where we are
doing full crypto off-loading and we don't actually have to deal with
the KEK or the DEK at all, but that's all on the cluster encryption
side, really, and isn't the focus of this patch and what it's bringing-
all of which is needed anyway.

> >[...] The API to fetch the KEK doesn't care at all about where it's stored
> >or how it's derived or anything like that.
>
> >There's a relatively small change which could be made to have PG request
> >all of the keys that it'll need on startup, if we want to go there as has
> >been suggested elsewhere, but even if we do that, PG needs to be able to
> >do that itself too, otherwise it's not a complete capability and there
> >seems little point in doing something that's just a pass-thru to something
> >else and isn't able to really be used.
>
> I do not understand. Postgres should provide a working implementation,
> whether the functions are directly inside pg or though an external process,
> they need to be there anyway. I'm trying to fix a clean, possibly external
> API so that it is possible to have something different from the choices made
> in the patch.

Right, we need a working implementation that's part of core, that's what
this effort is trying to bring.

Thanks,

Stephen

Attachment

Re: Proposed patch for key management

From
Bruce Momjian
Date:
On Thu, Dec 31, 2020 at 11:11:11AM +0100, Fabien COELHO wrote:
> > I am not sure what else I can add to this discussion.  Having something
> > that is completely external might be a nice option, but I don't think it
> > is the common use-case, and will make the most common use cases more
> > complex.
> 
> I'm unsure about what is the "common use case". ISTM that having the
> postgres process holding the master key looks like a "no" for me in any use
> case with actual security concern: as the database must be remotely
> accessible, a reasonable security assumption is that a pg account could be
> compromised, and the "master key" (if any, that is just one particular
> cryptographic design) should not be accessible in that case. The first
> barrier would be pg admin account.

Let's unpack this logic, since I know others, like Alastair Turner (CC
added), had similar concerns.  Frankly, I feel this feature has limited
security usefulness, so if we don't feel it has sufficient value, let's
mark it as not wanted and move on.

I think there are two basic key configurations:

1.  pass data encryption keys in from an external source
2.  store the data encryption keys wrapped by a key encryption key (KEK)
    passed in from an external source

The current code implements #2 because it simplifies administration,
checking, external key (KEK) rotation, and provides good error reporting
when something goes wrong.  For example, with #1, there is no way to
rotate the externally-stored key except reencrypting the entire cluster.

When using AES256 with GCM (for verification) is #1 more secure than #2?
I don't think so.  If the Postgres account is compromised, they can grab
the data encryption keys as the come in from the external script (#1),
or when they are decrypted by the KEK (#2), or while they are in shared
memory while the server is running.  If the postgres account is not
compromised, I don't think it is any easier to get the data key wrapped
by a KEK than it is to try decrypting the actual heap/index/WAL blocks.
(Once you find the key for one, they would all decrypt.)

> With external, the options are that the
> key is hold by another id, so the host must be compromised as well, and
> could even be managed remotely on another host (I agree that this later
> option would be fragile because of the implied network connection, but it
> would be a tradeoff depending on the security concerns, but it pretty much
> correspond to the kerberos model).

We don't store the data encryption keys raw on the server, but rather
wrap them with a KEK.  If the external keystore is compromised using #1,
you can't rotate those keys without reencrypting the entire cluster.

> > Adding a pre-defined system will not prevent future work on a
> > completely external option.
> 
> Yes and no.
> 
> Having two side-by-side cluster-encryption scheme in core, the first that
> could be implemented on top of the second without much effort, would look
> pretty weird. Moreover, required changes in core to allow an API are the
> very same as the one in this patch. The other concern I have with doing the
> cleaning work afterwards is that the API would be somehow in the middle of
> the existing functions if it has not been thought of before.

The question is whether there is a common use-case, and if so, do we
implement that first, with all the safeguards, and then allow others to
customize?  I don't want to optimize for the rare case if that makes the
common case more error-prone.

> > I know archive_command is completely external, but that is much simpler
> > than what would need to be done for key management, key rotation, and
> > key verification.
> 
> I'm not sure of the design of the key rotation stuff as I recall it from the
> patches I looked at, but the API design looks like the more pressing issue.
> First, the mere existence of an "master key" is a cryptographic choice which
> is debatable, because it creates issues if one must be able to change it,
> hence the whole "key rotation" stuff. ISTM that what core needs to know is

Well, you would want to change the external-stored key independent of
the data encryption keys, which are harder to change.

> > I will say that if the community feels external-only should be the only
> > option, I will stop working on this feature because I feel the result
> > would be too fragile to be reliable,
> 
> I'm do not see why it would be the case. I'm just arguing to have key
> management in a separate, possibly suid something-else, process, which given
> the security concerns which dictates the feature looks like a must have, or
> at least must be possible. From a line count point of view, it should be a
> small addition to the current code.

See above.  Please explain the security value of this complexity.

> > and I would not want to be associated with it.
> 
> You do as you want. I'm no one, you can ignore me and proceed to commit
> whatever you want. I'm only advising to the best of my ability, what I think
> should be the level of API pursued for such a feature in pg, at the design
> level.

You are not the only one who is suggesting this, so we should decide as
a group on an acceptable direction.

> I feel that the current proposal is built around one particular use case
> with many implicit and unnecessary assumptions without giving much thoughts
> to the generic API which should really be pg concern, and I do not think
> that it can really be corrected once the ship has sailed, hence my attempt
> at having some thoughts given to the matter before that.

I have already explained why I am optimizing for the common use case,
and you are not addressing my reasons here.

> IMHO, the *minimum* should be to have the API clearly designed, and the
> implementation compatible with it at some level, including not having
> assumptions about underlying cryptographic functions and key derivations (I
> mean at the API level, the code itself can do it obviously).
> 
> If you want a special "key_mgt_command = internal" because you feel that
> processes are fragile and unreliable and do not bring security, so be it,
> but I think that the API should be designed beforehand nevertheless.

I think we need a separate command for the fully external case, rather
than trying to lay it on top of the command for the internal one.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee




Re: Proposed patch for key management

From
Stephen Frost
Date:
Greetings,

* Bruce Momjian (bruce@momjian.us) wrote:
> On Thu, Dec 31, 2020 at 11:11:11AM +0100, Fabien COELHO wrote:
> > > I am not sure what else I can add to this discussion.  Having something
> > > that is completely external might be a nice option, but I don't think it
> > > is the common use-case, and will make the most common use cases more
> > > complex.
> >
> > I'm unsure about what is the "common use case". ISTM that having the
> > postgres process holding the master key looks like a "no" for me in any use
> > case with actual security concern: as the database must be remotely
> > accessible, a reasonable security assumption is that a pg account could be
> > compromised, and the "master key" (if any, that is just one particular
> > cryptographic design) should not be accessible in that case. The first
> > barrier would be pg admin account.
>
> Let's unpack this logic, since I know others, like Alastair Turner (CC
> added), had similar concerns.  Frankly, I feel this feature has limited
> security usefulness, so if we don't feel it has sufficient value, let's
> mark it as not wanted and move on.

Considering the amount of requests for this, I don't feel it's at all
reasonable to suggest that it's 'not wanted'.

> I think there are two basic key configurations:
>
> 1.  pass data encryption keys in from an external source
> 2.  store the data encryption keys wrapped by a key encryption key (KEK)
>     passed in from an external source

I agree those are two basic configurations (though they aren't the only
possible ones).

> The current code implements #2 because it simplifies administration,
> checking, external key (KEK) rotation, and provides good error reporting
> when something goes wrong.  For example, with #1, there is no way to
> rotate the externally-stored key except reencrypting the entire cluster.

Right, and there also wouldn't be a very simple way to actually get PG
going with a single key or a passphrase.

> When using AES256 with GCM (for verification) is #1 more secure than #2?
> I don't think so.  If the Postgres account is compromised, they can grab
> the data encryption keys as the come in from the external script (#1),
> or when they are decrypted by the KEK (#2), or while they are in shared
> memory while the server is running.  If the postgres account is not
> compromised, I don't think it is any easier to get the data key wrapped
> by a KEK than it is to try decrypting the actual heap/index/WAL blocks.
> (Once you find the key for one, they would all decrypt.)

I can see some usefulness for supporting #1, but that has got next to
nothing to do with what this patch is about and is all about the *next*
step, which is to actually look at supporting encryption of the rest of
the cluster, beyond just the keys.  We need to get past this first step
though, and it seems to be nearly impossible to do so, which has
frustrated multiple attempts to actually accomplish anything here.

I agree that none of these approaches address an online compromise of
the postgres account.  Thankfully, that isn't actually what this is
intended to address and to talk about that case is just a distraction
that isn't solvable and wastes time.

Thanks,

Stephen

Attachment

Re: Proposed patch for key management

From
Bruce Momjian
Date:
On Thu, Dec 31, 2020 at 12:05:53PM -0500, Stephen Frost wrote:
> > Let's unpack this logic, since I know others, like Alastair Turner (CC
> > added), had similar concerns.  Frankly, I feel this feature has limited
> > security usefulness, so if we don't feel it has sufficient value, let's
> > mark it as not wanted and move on.
> 
> Considering the amount of requests for this, I don't feel it's at all
> reasonable to suggest that it's 'not wanted'.

Well, people ask for (or used to ask for) query hints, so request count
and usefulness aren't always correlated.  I brought up this point
because people sometimes want to add complexity to try and guard against
some exploit that is impossible to guard against, so I thought we should
be clear what we can't protect, no matter how we configure it.

> > When using AES256 with GCM (for verification) is #1 more secure than #2?
> > I don't think so.  If the Postgres account is compromised, they can grab
> > the data encryption keys as the come in from the external script (#1),
> > or when they are decrypted by the KEK (#2), or while they are in shared
> > memory while the server is running.  If the postgres account is not
> > compromised, I don't think it is any easier to get the data key wrapped
> > by a KEK than it is to try decrypting the actual heap/index/WAL blocks.
> > (Once you find the key for one, they would all decrypt.)
> 
> I can see some usefulness for supporting #1, but that has got next to
> nothing to do with what this patch is about and is all about the *next*
> step, which is to actually look at supporting encryption of the rest of
> the cluster, beyond just the keys.  We need to get past this first step
> though, and it seems to be nearly impossible to do so, which has
> frustrated multiple attempts to actually accomplish anything here.
> 
> I agree that none of these approaches address an online compromise of
> the postgres account.  Thankfully, that isn't actually what this is
> intended to address and to talk about that case is just a distraction
> that isn't solvable and wastes time.

I assume people don't want the data keys stored in the file system, even
in encrypted form, and they believe this makes the system more secure.  I
don't think it does, and I think it makes external key rotation very
difficult, among other negatives.  Also, keep in mind any failure of the
key management system could make the cluster unreadable, so making it as
simple/error-free as possible is a big requirement for me.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee




Re: Proposed patch for key management

From
Joshua Drake
Date:


On Wed, Dec 30, 2020 at 3:47 PM Bruce Momjian <bruce@momjian.us> wrote:

I will say that if the community feels external-only should be the only
option, I will stop working on this feature because I feel the result
would be too fragile to be reliable, and I would not want to be
associated with it.


I can say that the people that I work with would prefer an "officially" supported mechanism from Postgresql.org. The use of a generic API would be useful for the ecosystem who wants to build on core functionality but Postgresql should have competent built-in support as well.

JD

Re: Proposed patch for key management

From
Joshua Drake
Date:

> >I will say that if the community feels external-only should be the only
> >option, I will stop working on this feature because I feel the result
> >would be too fragile to be reliable,
>
> I'm do not see why it would be the case. I'm just arguing to have key
> management in a separate, possibly suid something-else, process, which given
> the security concerns which dictates the feature looks like a must have, or
> at least must be possible. From a line count point of view, it should be a
> small addition to the current code.

All of this hand-waving really isn't helping.

If it's a small addition to the current code then it'd be fantastic if
you'd propose a specific patch which adds what you're suggesting.  I
don't think either Bruce or I would have any issue with others helping
out on this effort, but let's be clear- we need something that *is* part
of core PG, even if we have an ability to have other parts exist outside
of PG.

+1

JD
 

Re: Proposed patch for key management

From
Alastair Turner
Date:
Hi Stephen, Bruce, Fabien

On Thu, 31 Dec 2020 at 17:05, Stephen Frost <sfrost@snowman.net> wrote:
>
> Greetings,
>
> * Bruce Momjian (bruce@momjian.us) wrote:
> > On Thu, Dec 31, 2020 at 11:11:11AM +0100, Fabien COELHO wrote:
> > > > I am not sure what else I can add to this discussion.  Having something
> > > > that is completely external might be a nice option, but I don't think it
> > > > is the common use-case, and will make the most common use cases more
> > > > complex.
> > >
> > > I'm unsure about what is the "common use case". ISTM that having the
> > > postgres process holding the master key looks like a "no" for me in any use
> > > case with actual security concern: as the database must be remotely
> > > accessible, a reasonable security assumption is that a pg account could be
> > > compromised, and the "master key" (if any, that is just one particular
> > > cryptographic design) should not be accessible in that case. The first
> > > barrier would be pg admin account.
> >
> > Let's unpack this logic, since I know others, like Alastair Turner (CC
> > added), had similar concerns.  Frankly, I feel this feature has limited
> > security usefulness, so if we don't feel it has sufficient value, let's
> > mark it as not wanted and move on.
>
> Considering the amount of requests for this, I don't feel it's at all
> reasonable to suggest that it's 'not wanted'.
>

FWIW, I think that cluster file encryption has great value, even if we
only consider its incremental value on top of disk or volume
encryption. Particularly in big IT estates where monitoring tools,
backup managers, etc have read access to a lot of mounted volumes.
Limiting the impact of these systems (or their credentials) being
compromised is definitely useful.

>
> > I think there are two basic key configurations:
> >
> > 1.  pass data encryption keys in from an external source
> > 2.  store the data encryption keys wrapped by a key encryption key (KEK)
> >     passed in from an external source
>
> I agree those are two basic configurations (though they aren't the only
> possible ones).
>

If we rephrase these two in terms of what the server process does to
get a key, we could say

 1. Get the keys from another process at startup (and maybe reload) time
 2. Get the wrapped keys from a local file...
 2.a. directly and unwrap them using the server's own logic and user prompt
 2.b. via a library which manages wrapping and user prompting

With the general feeling in the mailing list discussions favouring an
approach in the #2 family, I have been looking at options around 2b. I
was hoping to provide some flexibility for users without adding
complexity to the pg code base and use some long standing,
standardised, features rather than reinventing or recoding the wheel.
It has not been a complete success, or failure.

The nearest thing to a standard for wrapped secret store files is
PKCS#12. I searched and cannot find it discussed on-list in this
context before. It is the format used for Java local keystores in
recent versions. The format is supported, up to a point, by OpenSSL,
nss and gnuTLS. OpenSSL also has command line support for many
operations on the files, including updating the wrapping password.

Content in the PKCS12 files is stored in typed "bags". The challenge
is that the OpenSSL command line tools currently support only the
content types related to TLS operations - certificates, public or
private keys, CRLs, ... and not the untyped secret bag which is
appropriate for storing AES keys. The c APIs will work with this
content but there are not as many helper functions as for other
content types.

For future information - the nss command line tools understand the
secret bags, but don't offer nearly so many options for configuration.

For building something now - the OpenSSL pkcs12 functions provide
features like tags and friendly names for keys, and configurable key
derivation from passwords, but the code to interact with the
secretBags is accumulating quickly in my prototype.

After the long intro, my question - If using a standard format,
managed by a library, for the internal keystore does not result in a
smaller or simpler patch, is there enough other value for this to be
worth considering? For security features, standards and building on
well exercised code bases do really have value, but is it enough...

Regards
Alastair



Re: Proposed patch for key management

From
Fabien COELHO
Date:
Hello Stephen,

>> I'm unsure about what is the "common use case". ISTM that having the
>> postgres process holding the master key looks like a "no" for me in any use
>> case with actual security concern: as the database must be remotely
>> accessible, a reasonable security assumption is that a pg account could be
>> compromised, and the "master key" (if any, that is just one particular
>> cryptographic design) should not be accessible in that case. The first
>> barrier would be pg admin account. With external, the options are that the
>> key is hold by another id, so the host must be compromised as well, and
>> could even be managed remotely on another host (I agree that this later
>> option would be fragile because of the implied network connection, but it
>> would be a tradeoff depending on the security concerns, but it pretty much
>> correspond to the kerberos model).
>
> No, this isn't anything like the Kerberos model and there's no case
> where the PG account won't have access to the DEK (data encryption key)
> during normal operation (except with the possibility of off-loading to a
> hardware crypto device which, while interesting, is definitely something
> that's of very limited utility and which could be added on as a
> capability later anyway.  This is also well understood by those who are
> interested in this capability and it's perfectly acceptable that PG will
> have, in memory, the DEK.

I'm sorry that I'm not very clear. My ranting is having a KEK "master key" 
in pg memory. I think I'm fine at this point with having DEKs available on 
the host to code/decode files. What I meant about kerberos is that the 
setup I was describing was making the database dependent on a remote host, 
which is somehow what keroberos does.

> The keys which are stored on disk with the proposed patch are encrypted
> using a KEK which is external to PG- that's all part of the existing
> patch and design.

Yep. My point is that the patch hardwires a cryptographic design with many 
assumptions, whereas I think it should design an API compatible with a 
larger class of designs, and possibly implement one with KEK and so on, 
I'm fine with that.

>>> Adding a pre-defined system will not prevent future work on a
>>> completely external option.
>>
>> Yes and no.
>>
>> Having two side-by-side cluster-encryption scheme in core, the first that
>> could be implemented on top of the second without much effort, would look
>> pretty weird. Moreover, required changes in core to allow an API are the
>> very same as the one in this patch. The other concern I have with doing the
>> cleaning work afterwards is that the API would be somehow in the middle of
>> the existing functions if it has not been thought of before.
>
> This has been considered and the functions which are proposed are
> intentionally structured to allow for multiple implementations already,

Does it? This is unclear to me from looking at the code, and the limited
documentation does not point to that. I can see that the "kek provider" 
and the "random provider" can be changed, but the overall cryptographic 
design seems hardwired.

> so it's unclear to me what the argument here is.

The argument is that professional cryptophers do wrong designs, and I do 
not see why you would do different. So instead of hardwiring choice that 
you think are THE good ones, ISTM that pg should design a reasonably 
flexible API, and also provide an implementation, obviously.

> Further, we haven't even gotten to actual cluster encryption yet- this 
> is all just the key management side of things,

With hardwired choices about 1 KEK and 2 DEK that are debatable, see 
below.

> which is absolutely a necessary and important part but argueing that it 
> doesn't address the cluster encryption approach in core simply doesn't 
> make any sense as that's not a part of this patch.
>
> Let's have that discussion when we actually get to the point of talking
> about cluster encryption.
>
>>> I know archive_command is completely external, but that is much simpler
>>> than what would need to be done for key management, key rotation, and key
>>> verification.
>>
>> I'm not sure of the design of the key rotation stuff as I recall it from the
>> patches I looked at, but the API design looks like the more pressing issue.
>> First, the mere existence of an "master key" is a cryptographic choice which
>> is debatable, because it creates issues if one must be able to change it,
>> hence the whole "key rotation" stuff. ISTM that what core needs to know is
>> that one particular key is changed by a new one and the underlying file is
>> rewritten. It should not need to know anything about master keys and key
>> derivation at all. It should need to know that the user wants to change file
>> keys, though.
>
> No, it's not debatable that a KEK is needed, I disagree entirely.

ISTM that there is cryptographic design which suits your needs and you 
seem to decide that it is the only possible way to do it It seems that you 
know. As a researcher, I know that I do not know:-) As a software 
engineer, I'm trying to suggest enabling more with the patch, including 
not hardwiring a three-key management scheme.

I'm advocating that you pg not decide for others what is best for them, 
but rather allow multiple choices and implementations through an API. Note 
that I'm not claiming that the 1 KEK + 2 DEK + AES… design is bad. I'm 
claiming that it is just one possible design, quite peculiar though, and 
that pg should not hardwire it, and it does not need to anyway.

The documentation included in the key management patch states: "Key zero 
is the key used to encrypt database heap and index files which are stored 
in the file system, plus temporary files created during database 
operation." Read as is, it suggests that the same key is used to encrypt 
many files. From a cryptographic point of view this looks like a bad idea. 
The mode used is unclear. If this is a stream cypher generated in counter 
mode, it could be a very bad idea. Hopefully this is not what is intended, 
but the included documentation is frankly misleading in that case. IMHO 
there should be one key per file. Also, the CTR mode should be avoided if 
possible because it has quite special properties, unless these properties 
are absolutely necessary.

From what I see in the patch, the DEKs cannot be changed, only the KEK 
which encodes de DEK. I cannot say I'm thrilled by such a property. If the 
KEK is lost, then probably the DEKs are lost as well. Changing the KEK 
does not change anything about it, so that the attacker would be able to 
retrieve the contents of later/previous state of the database if they can 
retrieve a previous/later encoded version, even if the KEK is changed 
every day. This is a bad property which somehow provides a false sense of 
security. I'm not sure that it makes much sense to change a KEK without 
changing the associated DEK, security-wise.

Anyway, I'd suggest to provide a script to re-encrypt a cluster, possibly 
while offline, with different DEKs. From an audit perspective, this could 
be a requirement.

I'd expect a very detailed README or documentation somewhere, but there 
are none in the committed/uncommitted patch.

In https://wiki.postgresql.org/wiki/Transparent_Data_Encryption, I see a 
lot of minute cryptographic design decision which are debatable, and a few 
which seem to be still opened, whereas other things are already in the 
process of being committed. I'm annoyed by the decision to reuse the same 
key for a lot of encryption, even if there are discussions about changing 
the iv in some way. If you have some clever way to build an iv, then 
please please change the key as well?

ISTM that pg at the core level should (only) directly provide:

(1) a per-file encryption scheme, with loadable (hook-replaceable 
functions??) to manage pages, maybe:

   encrypt(page_id, *key, *clear_page, *encrypted_page);
   decrypt(page_id, *key, *encrypted_page, *clear_page);

What encrypt/decrypt does is beyond pg core stuff. Ok, a reasonable
implementation should be provided, obviously, possibly in core. There may 
be some block-size issues if not full pages are encrypted, so maybe the 
interface needs to be a little more subtle.

(2) offer a key management scheme interface, to manage *per-file* keys, 
possibly loadable (hook replaceable?). If all files have the same key, 
which is stored in a directory and encoded with a KEK, this is just one 
(debatable) implementation choice. For that, ISTM that what is needed at 
this level is:

   get_key(file_id (relative name? oid? 8 or 16 bytes something?));

Maybe that should be enough to create a new key implicitely, or maybe it 
should be a different interface, eg get_new_key(file_id), not sure. There 
is the question of the key length, which may for instance include an iv. 
Maybe an information seeking function should be provided, to know about 
sizes or whatever, not sure. Maybe there should/could be a parameter to 
manage key versions, not sure.

The internal key infrastructure would interact with this/these functions 
when needed, and store the key in the file management structure. All this 
should be a black box to core pg, that is the point of an API. This is 
what I'd have expected to see in the "postgres core key management" stuff, 
not a hardwired 2-key store with a KEK.

(3) ISTM that the key management interface should be external, or at least 
it should be possible to make it external easily. I do not think that 
there is a significant performance issue because keys are needed once, and 
once loaded they are there. A simple way to do that is a separate process 
with a basic protocol on stdin/stdout to implement "get_key", which is 
basically already half implemented in the patch for retrieving the KEK.

> Perhaps we can have support for an external key store in the future for
> the DEKs, but we really need to have our own key management and the
> ability to reasonably rotate keys (which is what the KEK provides).

I think that the KEK/2-DEK store is a peculiar design which does not 
*need* to be hardwired deeply in core.

> Ideally, we'll get to a point where we can even have multiple DEKs and
> deal with rotating them too, but that, if anything, point even stronger
> to the need to have a key management system and KEK as we'll be dealing
> with that many more keys.
>
>>> I will say that if the community feels external-only should be the only
>>> option, I will stop working on this feature because I feel the result
>>> would be too fragile to be reliable,
>>
>> I'm do not see why it would be the case. I'm just arguing to have key
>> management in a separate, possibly suid something-else, process, which given
>> the security concerns which dictates the feature looks like a must have, or
>> at least must be possible. From a line count point of view, it should be a
>> small addition to the current code.
>
> All of this hand-waving really isn't helping.

I'm sorry that you feel that. I'm really trying to help improve the 
approach and design with my expertise. As I said in another mail, I'm no 
one, you can always chose to ignore me.

> If it's a small addition to the current code then it'd be fantastic if
> you'd propose a specific patch which adds what you're suggesting.

My point is to discuss an API.

Once there is some agreement the possible API outline, then 
implementations can be provided, but if people do not want an API in the 
first place, I'm not sure I can see the point of sending patches.

> I don't think either Bruce or I would have any issue with others helping 
> out on this effort, but let's be clear- we need something that *is* part 
> of core PG, even if we have an ability to have other parts exist outside 
> of PG.

Yep. I have not suggested that PG should not implement the API, on the 
contrary it definitely should.

-- 
Fabien.

Re: Proposed patch for key management

From
Alastair Turner
Date:
Hi Fabien

On Sat, 2 Jan 2021 at 09:50, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
...
> ISTM that pg at the core level should (only) directly provide:
>
> (1) a per-file encryption scheme, with loadable (hook-replaceable
> functions??) to manage pages, maybe:
>
>    encrypt(page_id, *key, *clear_page, *encrypted_page);
>    decrypt(page_id, *key, *encrypted_page, *clear_page);
>
> What encrypt/decrypt does is beyond pg core stuff. Ok, a reasonable
> implementation should be provided, obviously, possibly in core. There may
> be some block-size issues if not full pages are encrypted, so maybe the
> interface needs to be a little more subtle.
>

There are a lot of specifics of the encryption implementation which
need to be addressed in future patches. This patch focuses on making
keys available to the encryption processes at run-time, so ...

>
> (2) offer a key management scheme interface, to manage *per-file* keys,
> possibly loadable (hook replaceable?). If all files have the same key,
> which is stored in a directory and encoded with a KEK, this is just one
> (debatable) implementation choice. For that, ISTM that what is needed at
> this level is:
>
>    get_key(file_id (relative name? oid? 8 or 16 bytes something?));
>

Per-cluster keys for permanent data and WAL allow a useful level of
protection, even if it could be improved upon. It's also going to be
quicker/simpler to implement, so any API should allow for it. If
there's an arbitrary number of DEK's, using a scope label for
accessing them sounds right, so "WAL", "local_data",
"local_data/tablespaceiod" or "local_data/dboid/tableoid".

>
...
> (3) ISTM that the key management interface should be external, or at least
> it should be possible to make it external easily. I do not think that
> there is a significant performance issue because keys are needed once, and
> once loaded they are there. A simple way to do that is a separate process
> with a basic protocol on stdin/stdout to implement "get_key", which is
> basically already half implemented in the patch for retrieving the KEK.
>

If keys can have arbitrary scope, then the pg backend won't know what
to ask for. So the API becomes even simpler with no specific request
on stdin and all the relevant keys on stdout. I generally like this
approach as well, and it will be the only option for some
integrations. On the other hand, there is an advantage to having the
key retrieval piece of key management in-process - the keys are not
being passed around in plain.

There is also a further validation task - probably beyond the scope of
the key management patch and into the encryption patch[es] territory -
checking that the keys supplied are the same keys in use for the data
currently on disk. It feels to me like this should be done at startup,
rather than as each file is accessed, which could make startup quite
slow if there are a lot of keys with narrow scope.

Regards
Alastair



Re: Proposed patch for key management

From
Bruce Momjian
Date:
On Sat, Jan  2, 2021 at 10:50:15AM +0100, Fabien COELHO wrote:
> > No, this isn't anything like the Kerberos model and there's no case
> > where the PG account won't have access to the DEK (data encryption key)
> > during normal operation (except with the possibility of off-loading to a
> > hardware crypto device which, while interesting, is definitely something
> > that's of very limited utility and which could be added on as a
> > capability later anyway.  This is also well understood by those who are
> > interested in this capability and it's perfectly acceptable that PG will
> > have, in memory, the DEK.
> 
> I'm sorry that I'm not very clear. My ranting is having a KEK "master key"
> in pg memory. I think I'm fine at this point with having DEKs available on
> the host to code/decode files. What I meant about kerberos is that the setup
> I was describing was making the database dependent on a remote host, which
> is somehow what keroberos does.

We bzero the master key once we are done with it in the server.  Why is
having the master key in memory for a short time while a problem while
having the data keys always in memory acceptable?  Aren't the data keys
more valuable, and hence they fact the master key is in memory for a
short time no additional risk?

> > The keys which are stored on disk with the proposed patch are encrypted
> > using a KEK which is external to PG- that's all part of the existing
> > patch and design.
> 
> Yep. My point is that the patch hardwires a cryptographic design with many
> assumptions, whereas I think it should design an API compatible with a
> larger class of designs, and possibly implement one with KEK and so on, I'm
> fine with that.

You are not addressing my complexity/fragility reasons for having a
single method.  As stated above, this feature has limited usefulness,
and if it breaks or you lose the KEK, your database server and backups
might be gone.

> > so it's unclear to me what the argument here is.
> 
> The argument is that professional cryptophers do wrong designs, and I do not
> see why you would do different. So instead of hardwiring choice that you
> think are THE good ones, ISTM that pg should design a reasonably flexible
> API, and also provide an implementation, obviously.

See above.

> > Further, we haven't even gotten to actual cluster encryption yet- this
> > is all just the key management side of things,
> 
> With hardwired choices about 1 KEK and 2 DEK that are debatable, see below.

Then let's debate them, since this patch isn't moving forward as long as
people are asking questions about the implementation.

> > > I'm not sure of the design of the key rotation stuff as I recall it from the
> > > patches I looked at, but the API design looks like the more pressing issue.
> > > First, the mere existence of an "master key" is a cryptographic choice which
> > > is debatable, because it creates issues if one must be able to change it,
> > > hence the whole "key rotation" stuff. ISTM that what core needs to know is
> > > that one particular key is changed by a new one and the underlying file is
> > > rewritten. It should not need to know anything about master keys and key
> > > derivation at all. It should need to know that the user wants to change file
> > > keys, though.
> > 
> > No, it's not debatable that a KEK is needed, I disagree entirely.
> 
> ISTM that there is cryptographic design which suits your needs and you seem
> to decide that it is the only possible way to do it It seems that you know.
> As a researcher, I know that I do not know:-) As a software engineer, I'm
> trying to suggest enabling more with the patch, including not hardwiring a
> three-key management scheme.

Well, your open API goal might work, but I am not comfortable
implementing that for reasons already explained, so what do we do?

> I'm advocating that you pg not decide for others what is best for them, but
> rather allow multiple choices and implementations through an API. Note that
> I'm not claiming that the 1 KEK + 2 DEK + AES… design is bad. I'm claiming
> that it is just one possible design, quite peculiar though, and that pg
> should not hardwire it, and it does not need to anyway.
> 
> The documentation included in the key management patch states: "Key zero is
> the key used to encrypt database heap and index files which are stored in
> the file system, plus temporary files created during database operation."
> Read as is, it suggests that the same key is used to encrypt many files.
> From a cryptographic point of view this looks like a bad idea. The mode used
> is unclear. If this is a stream cypher generated in counter mode, it could
> be a very bad idea. Hopefully this is not what is intended, but the included
> documentation is frankly misleading in that case. IMHO there should be one
> key per file. Also, the CTR mode should be avoided if possible because it
> has quite special properties, unless these properties are absolutely
> necessary.

We need CTR since the WAL and even data blocks are not encrypted in full
blocks.  We are using GCM for key validation and tamper detection.

What is the value of one key per file?  It means every CREATE TABLE
needs a new data key, and if we lose any key, we lose some data.  I
would rather walk away from this feature rather than implement this.

This gets to a general thread in this discussion of adding complexity
and fragility without a clear explanation of the security value of
adding it.  This is not new --- we created private voice meetings to
avoid the distraction of such suggestions, but now that the patch is
being considered, that isn't possible anymore.

> > From what I see in the patch, the DEKs cannot be changed, only the KEK
> which encodes de DEK. I cannot say I'm thrilled by such a property. If the
> KEK is lost, then probably the DEKs are lost as well. Changing the KEK does

Yes.

> not change anything about it, so that the attacker would be able to retrieve
> the contents of later/previous state of the database if they can retrieve a
> previous/later encoded version, even if the KEK is changed every day. This
> is a bad property which somehow provides a false sense of security. I'm not
> sure that it makes much sense to change a KEK without changing the
> associated DEK, security-wise.

Well, this is a common security feature.  You might have your KMS
compromised, but if you change your KEK before you are attacked, or if
someone leaves the organization, you get some security value.  However,
you are right that if you can access a DEK wrapped with an old KEK, you
can read the data keys, and the DEK will work with the new cluster.

> Anyway, I'd suggest to provide a script to re-encrypt a cluster, possibly
> while offline, with different DEKs. From an audit perspective, this could be
> a requirement.

We already have that documented in the wiki that it will use a standby
for key rotation, though I have heard someone suggest having LSN-based
keys that can change as transactions increment.

> I'd expect a very detailed README or documentation somewhere, but there are
> none in the committed/uncommitted patch.

For what?  They key management?  We don't have the data encryption part
done yet.

> In https://wiki.postgresql.org/wiki/Transparent_Data_Encryption, I see a lot
> of minute cryptographic design decision which are debatable, and a few which
> seem to be still opened, whereas other things are already in the process of
> being committed. I'm annoyed by the decision to reuse the same key for a lot
> of encryption, even if there are discussions about changing the iv in some
> way. If you have some clever way to build an iv, then please please change
> the key as well?

Why?  See above.  Adding complexity for unclear security value isn't
helpful.

> > All of this hand-waving really isn't helping.
> 
> I'm sorry that you feel that. I'm really trying to help improve the approach
> and design with my expertise. As I said in another mail, I'm no one, you can
> always chose to ignore me.

No, we can't.  People read this thread, conclude the implementation path
is unclear, and assume these issues should be resolved before moving
forward.  I am frankly leaning in the TDE-not-wanted direction based on
discussions here.   Frankly, I am happier with that than to try to push
this rock up hill for several months.

> > If it's a small addition to the current code then it'd be fantastic if
> > you'd propose a specific patch which adds what you're suggesting.
> 
> My point is to discuss an API.
> 
> Once there is some agreement the possible API outline, then implementations
> can be provided, but if people do not want an API in the first place, I'm
> not sure I can see the point of sending patches.

I think you are accurate here.  You can certainly propose an external
API, but I think any external API would have more negatives (complexity,
fragility) than security positives.  Any discussion that ignores the
negatives is unlikely to be adopted.

Perhaps someone proposes an external API, and I, and perhaps others,
complain about its negatives, so it isn't adopted.  So the internal key
management is rejected, the external API is rejected, so we do nothing.
I feel that is where we are now.  Marking this feature as not wanted
would at least eliminate discussions that will never lead to any value.
Another option would be to move forward without complete agreement.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee




Re: Proposed patch for key management

From
Bruce Momjian
Date:
On Sat, Jan  2, 2021 at 12:47:19PM +0000, Alastair Turner wrote:
> If keys can have arbitrary scope, then the pg backend won't know what
> to ask for. So the API becomes even simpler with no specific request
> on stdin and all the relevant keys on stdout. I generally like this
> approach as well, and it will be the only option for some
> integrations. On the other hand, there is an advantage to having the
> key retrieval piece of key management in-process - the keys are not
> being passed around in plain.
> 
> There is also a further validation task - probably beyond the scope of
> the key management patch and into the encryption patch[es] territory -
> checking that the keys supplied are the same keys in use for the data
> currently on disk. It feels to me like this should be done at startup,
> rather than as each file is accessed, which could make startup quite
> slow if there are a lot of keys with narrow scope.

We do that already on startup by using GCM to validate the  KEK when
encrypting each DEK.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee




Re: Proposed patch for key management

From
Bruce Momjian
Date:
On Fri, Jan  1, 2021 at 06:26:36PM +0000, Alastair Turner wrote:
> After the long intro, my question - If using a standard format,
> managed by a library, for the internal keystore does not result in a
> smaller or simpler patch, is there enough other value for this to be
> worth considering? For security features, standards and building on
> well exercised code bases do really have value, but is it enough...

I know Stephen Frost looked at PKCS12 but didn't see much value in it
since the features provided by PKCS12 didn't seem to add much for
Postgres, and added complexity.  Using GCM for key wrapping, heap/index
files, and WAL seemed simple.

FYI, the current patch has an initdb option --copy-encryption-key, which
copies the key from an old cluster, for use by pg_upgrade.  Technically
you could create a directory called pg_cryptokey/live, put wrapped files
0/1 in there, and use that when initializing a new cluster.  That would
allow you to generate the data keys using your own random source.  I am
not sure how useful that would be, but we could document this ability.

There is all sorts of flexibility being proposed:

*  scope of keys
*  encryption method
*  encryption mode
*  internal/external

Some of this is related to wrapping the data keys, and some of it gets
to the encryption of cluster files.  I just don't see how anything with
the level of flexibility being asked for could ever be reliable or
simple to administer, and I don't see the security value of that
flexibility.  I feel at a certain point, we just need to choose the best
options and move forward.

I thought this had all been debated, but people continue to show up and
ask for it to be debated again.  At one level, these discussions are
valuable, but at a certain point you end up re-litigate this that you
get nothing else done.  I know some people who have given up working on
this feature for this reason.

I am not asking we stop discussing it, but I do ask we address the
negatives of suggestions, especially when the negatives have already
been clearly stated.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee




Re: Proposed patch for key management

From
Neil Chen
Date:


On Tue, Jan 5, 2021 at 10:18 AM Bruce Momjian <bruce@momjian.us> wrote:
On Fri, Jan  1, 2021 at 06:26:36PM +0000, Alastair Turner wrote:

There is all sorts of flexibility being proposed:

*  scope of keys
*  encryption method
*  encryption mode
*  internal/external

Some of this is related to wrapping the data keys, and some of it gets
to the encryption of cluster files.  I just don't see how anything with
the level of flexibility being asked for could ever be reliable or
simple to administer, and I don't see the security value of that
flexibility.  I feel at a certain point, we just need to choose the best
options and move forward.

+1. 

I thought this had all been debated, but people continue to show up and
ask for it to be debated again.  At one level, these discussions are
valuable, but at a certain point you end up re-litigate this that you
get nothing else done.  I know some people who have given up working on
this feature for this reason.

I am not asking we stop discussing it, but I do ask we address the
negatives of suggestions, especially when the negatives have already
been clearly stated.

Well, in fact, because the discussion about TDE/KMS has several very long threads, maybe not everyone can read them completely first, and inevitably, there will be topics that have already been discussed.  
  
At least for the initial version, I think we should pay more attention to whether the currently provided functions are acceptable, instead of always staring at what it lacks, and how it should be more flexible. 

For basic KMS functions, we need to ensure its stability and safety. Use a more flexible API to support different encryption algorithms. Yes, it does look more powerful, but it does not see much value for stability and security. Regardless of whether we want to do this or not, but I think this can at least not be discussed in the first version. We will not discuss whether it is safer to use more keys, or even introduce HSM management. We only evaluate whether this design can resist attacks under our Threat_models as the initial version. 

Of course, the submission of a feature needs to accept the opinions of many people, but for a large and complex feature, I think that dividing the stage to limit the scope of the discussion will help us avoid getting into endless disputes. 

--
There is no royal road to learning.
HighGo Software Co.

Re: Proposed patch for key management

From
Alastair Turner
Date:
Hi Bruce

On Mon, 4 Jan 2021 at 18:23, Bruce Momjian <bruce@momjian.us> wrote:
>
> On Fri, Jan  1, 2021 at 06:26:36PM +0000, Alastair Turner wrote:
> > After the long intro, my question - If using a standard format,
> > managed by a library, for the internal keystore does not result in a
> > smaller or simpler patch, is there enough other value for this to be
> > worth considering? For security features, standards and building on
> > well exercised code bases do really have value, but is it enough...
>
> I know Stephen Frost looked at PKCS12 but didn't see much value in it
> since the features provided by PKCS12 didn't seem to add much for
> Postgres, and added complexity...

Yeah, OpenSSL 1.1's PKCS12 implementation doesn't offer much for
managing arbitrary secrets. Building a reasonable abstraction over the
primitives it does provide requires a non-trivial amount of code.

> ... Using GCM for key wrapping, heap/index
> files, and WAL seemed simple.

Having Postgres take on the task of wrapping the keys and deriving the
wrapping key requires less code than getting a library to do it, sure.
It also expands the functional scope of Postgres, what Postgres is
responsible for. This isn't what libraries should ideally do for
software development, but it's not uncommon and it's very fertile
ground for some of the discussion which has dogged this feature's
development.

There's a question of whether the new scope should be taken on at all.
Fabien expressed a strong opinion that it should not be earlier in the
thread. Even if you don't take an absolute view on the additional
scope, lines of code do not all create equal risk. Ensuring the
confidentiality of an encryption key is far higher stakes code than
serialising and deserialising the key which is being secured and
stored by a library or external provider. Which is why I like the idea
of having OpenSSL (and, at some point, nss) do the KDF, wrapping and
storage bit, and have been hacking on some code in that direction.

If all that Postgres is doing is the serde, that's also something
which can objectively be said to be right or wrong. Key derivation and
wrapping are only ever "strong enough for the moment" or "in line with
policy/best practice". Which brings us to flexibility.

> There is all sorts of flexibility being proposed:
>
> *  scope of keys
> *  encryption method
> *  encryption mode
> *  internal/external
>
> Some of this is related to wrapping the data keys, and some of it gets
> to the encryption of cluster files. I just don't see how anything with
> the level of flexibility being asked for could ever be reliable or
> simple to administer,...

Externalising the key wrap and its KDF (either downward to OpenSSL or
to a separate process) makes the related points of flexibility
something else's problem. OpenSSL and the like have put a lot of
effort into making these things configurable and building the APIs
(mainly PCKS11 and KMIP) for handing the functions off to dedicated
hardware. Many or most organisations requiring that sort of
integration will have those complexities budgeted in already.

>... and I don't see the security value of that
> flexibility.  I feel at a certain point, we just need to choose the best
> options and move forward.
>
> I thought this had all been debated, but people continue to show up and
> ask for it to be debated again.  At one level, these discussions are
> valuable, but at a certain point you end up re-litigate this that you
> get nothing else done.  I know some people who have given up working on
> this feature for this reason.

Apologies for covering the PCKS12 ground again, I really did look for
any on-list and wiki references to the topic.

> I am not asking we stop discussing it, but I do ask we address the
> negatives of suggestions, especially when the negatives have already
> been clearly stated.

The negatives are increasing the amount of code involved in delivering
TDE and, implicitly, pushing back the delivery of TDE. I'm not
claiming that they aren't valid. I was hoping that PCKS12 would
deliver some benefits on the code side, but a bit of investigation
showed that wasn't the case.

To offset these, I believe that taking a slightly longer route now
(and I am keen to do some of that slog) has significant benefits in
getting Postgres (or at least the core server processes) out of the
business of key derivation and wrapping. Not just the implementation
detail, but also decisions about what is "good enough" - the KDF
worries me particularly in this regard. With my presales techie hat
on, I'm very aware that everyone's good enough is slightly different,
and not just on one axis. Using library functions (let's say OpenSSL)
will provide a range of flexibility to fit these varied boundaries, at
no incremental cost beyond the library integration, and avoid any
implementation decisions becoming a lightning rod for objections to
Postgres.

Regards
Alastair



Re: Proposed patch for key management

From
Alastair Turner
Date:
On Mon, 4 Jan 2021 at 17:56, Bruce Momjian <bruce@momjian.us> wrote:
>
> On Sat, Jan  2, 2021 at 12:47:19PM +0000, Alastair Turner wrote:
> >
> > There is also a further validation task - probably beyond the scope of
> > the key management patch and into the encryption patch[es] territory -
> > checking that the keys supplied are the same keys in use for the data
> > currently on disk. It feels to me like this should be done at startup,
> > rather than as each file is accessed, which could make startup quite
> > slow if there are a lot of keys with narrow scope.
>
> We do that already on startup by using GCM to validate the  KEK when
> encrypting each DEK.
>
Which validates two things - that the KEK is the same one which was
used to encrypt the DEKs (instead of returning garbage plaintext when
given a garbage key), and that the DEKs have not been tampered with at
rest. What it does not check is that the DEKs are the keys used to
encrypt the data, that one has not been copied or restored independent
of the other.