Thread: Moving forward with TDE

Moving forward with TDE

From
David Christensen
Date:
Hi -hackers,

Working with Stephen, I am attempting to pick up some of the work that
was left off with TDE and the key management infrastructure.  I have
rebased Bruce's KMS/TDE patches as they existed on the
https://wiki.postgresql.org/wiki/Transparent_Data_Encryption wiki
page, which are enclosed in this email.

I would love to open a discussion about how to move forward and get
some of these features built out.  The historical threads here are
quite long and complicated; is there a "current state" other than the
wiki that reflects the general thinking on this feature?  Any major
developments in direction that would not be reflected in the code from
May 2021?

Thanks,

David

Attachment

Re: Moving forward with TDE

From
Aleksander Alekseev
Date:
Hi David,

> Working with Stephen, I am attempting to pick up some of the work that
> was left off with TDE and the key management infrastructure.  I have
> rebased Bruce's KMS/TDE patches as they existed on the
> https://wiki.postgresql.org/wiki/Transparent_Data_Encryption wiki
> page, which are enclosed in this email.

I'm happy to see that the TDE effort was picked up.

> I would love to open a discussion about how to move forward and get
> some of these features built out.  The historical threads here are
> quite long and complicated; is there a "current state" other than the
> wiki that reflects the general thinking on this feature?  Any major
> developments in direction that would not be reflected in the code from
> May 2021?

The patches seem to be well documented and decomposed in small pieces.
That's good.

Unless somebody in the community remembers open questions/issues with
TDE that were never addressed I suggest simply iterating with our
usual testing/reviewing process. For now I'm going to change the
status of the CF entry [1] to "Waiting for Author" since the patchset
doesn't pass the CI [2].

One limitation of the design described on the wiki I see is that it
seems to heavily rely on AES:

> We will use Advanced Encryption Standard (AES) [4]. We will offer three key length options (128, 192, and 256-bits)
selectedat initdb time with --file-encryption-method
 

(there doesn't seem to be any mention of the hash/MAC algorithms,
that's odd). In the future we should be able to add the support of
alternative algorithms. The reason is that the algorithms can become
weak every 20 years or so, and the preferred algorithms may also
depend on the region. This should NOT be implemented in this
particular patchset, but the design shouldn't prevent from
implementing this in the future.

[1]: https://commitfest.postgresql.org/40/3985/
[2]: http://cfbot.cputube.org/

-- 
Best regards,
Aleksander Alekseev



Re: Moving forward with TDE

From
David Christensen
Date:
> Unless somebody in the community remembers open questions/issues with
> TDE that were never addressed I suggest simply iterating with our
> usual testing/reviewing process. For now I'm going to change the
> status of the CF entry [1] to "Waiting for Author" since the patchset
> doesn't pass the CI [2].

Thanks, enclosed is a new version that is rebased on HEAD and fixes a
bug that the new pg_control_init() test picked up.

Known issues (just discovered by me in testing the latest revision) is
that databases created from `template0` are not decrypting properly,
but `template1` works fine, so going to dig in on that soon.

> One limitation of the design described on the wiki I see is that it
> seems to heavily rely on AES:
>
> > We will use Advanced Encryption Standard (AES) [4]. We will offer three key length options (128, 192, and 256-bits)
selectedat initdb time with --file-encryption-method
 
>
> (there doesn't seem to be any mention of the hash/MAC algorithms,
> that's odd). In the future we should be able to add the support of
> alternative algorithms. The reason is that the algorithms can become
> weak every 20 years or so, and the preferred algorithms may also
> depend on the region. This should NOT be implemented in this
> particular patchset, but the design shouldn't prevent from
> implementing this in the future.

Yes, we definitely are considering multiple algorithms support as part
of this effort.

Best,

David

Attachment

Re: Moving forward with TDE

From
Dilip Kumar
Date:
On Fri, Nov 4, 2022 at 3:36 AM David Christensen
<david.christensen@crunchydata.com> wrote:
>
> > Unless somebody in the community remembers open questions/issues with
> > TDE that were never addressed I suggest simply iterating with our
> > usual testing/reviewing process. For now I'm going to change the
> > status of the CF entry [1] to "Waiting for Author" since the patchset
> > doesn't pass the CI [2].
>
> Thanks, enclosed is a new version that is rebased on HEAD and fixes a
> bug that the new pg_control_init() test picked up.

I was looking into the documentation patches 0001 and 0002, I think
the explanation is very clear.  I have a few questions/comments

+By not using the database id in the IV, CREATE DATABASE can copy the
+heap/index files from the old database to a new one without
+decryption/encryption.  Both page copies are valid.  Once a database
+changes its pages, it gets new LSNs, and hence new IV.

How about the WAL_LOG method for creating a database? because in that
we get the new LSN for the pages in the new database, so do we
reencrypt, if yes then this documentation needs to be updated
otherwise we might need to add that code.

+changes its pages, it gets new LSNs, and hence new IV.  Using only the
+LSN and page number also avoids requiring pg_upgrade to preserve
+database oids, tablespace oids, and relfilenodes.

I think this line needs to be changed, because now we are already
preserving dbid/tbsid/relfilenode.  So even though we are not using
those in IV there is no point in saying we are avoiding that
requirement.

I will review the remaining patches soon.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Moving forward with TDE

From
Jacob Champion
Date:
On Mon, Oct 24, 2022 at 9:29 AM David Christensen
<david.christensen@crunchydata.com> wrote:
> I would love to open a discussion about how to move forward and get
> some of these features built out.  The historical threads here are
> quite long and complicated; is there a "current state" other than the
> wiki that reflects the general thinking on this feature?  Any major
> developments in direction that would not be reflected in the code from
> May 2021?

I don't think the patchset here has incorporated the results of the
discussion [1] that happened at the end of 2021. For example, it looks
like AES-CTR is still in use for the pages, which I thought was
already determined to be insufficient.

The following next steps were proposed in that thread:

> 1. modify temporary file I/O to use a more centralized API
> 2. modify the existing cluster file encryption patch to use XTS with a
>    IV that uses more than the LSN
> 3. add XTS regression test code like CTR
> 4. create WAL encryption code using CTR

Does this patchset need review before those steps are taken (or was
there additional conversation/work that I missed)?

Thanks,
--Jacob

[1] https://www.postgresql.org/message-id/flat/20211013222648.GA373%40momjian.us



Re: Moving forward with TDE

From
David Christensen
Date:
> On Nov 15, 2022, at 1:08 PM, Jacob Champion <jchampion@timescale.com> wrote:
>
> On Mon, Oct 24, 2022 at 9:29 AM David Christensen
> <david.christensen@crunchydata.com> wrote:
>> I would love to open a discussion about how to move forward and get
>> some of these features built out.  The historical threads here are
>> quite long and complicated; is there a "current state" other than the
>> wiki that reflects the general thinking on this feature?  Any major
>> developments in direction that would not be reflected in the code from
>> May 2021?
>
> I don't think the patchset here has incorporated the results of the
> discussion [1] that happened at the end of 2021. For example, it looks
> like AES-CTR is still in use for the pages, which I thought was
> already determined to be insufficient.

Good to know about the next steps, thanks.

> The following next steps were proposed in that thread:
>
>> 1. modify temporary file I/O to use a more centralized API
>> 2. modify the existing cluster file encryption patch to use XTS with a
>>   IV that uses more than the LSN
>> 3. add XTS regression test code like CTR
>> 4. create WAL encryption code using CTR
>
> Does this patchset need review before those steps are taken (or was
> there additional conversation/work that I missed)?

This was just a refresh of the old patches on the wiki to work as written on HEAD. If there are known TODOs here this
thenthat work is still needing to be done.  

I was going to take 2) and Stephen was going to work on 3); I am not sure about the other two but will review the
threadyou pointed to. Thanks for pointing that out.  

David




Re: Moving forward with TDE

From
Jacob Champion
Date:
On Tue, Nov 15, 2022 at 11:39 AM David Christensen
<david.christensen@crunchydata.com> wrote:
> Good to know about the next steps, thanks.

You're welcome!

> This was just a refresh of the old patches on the wiki to work as written on HEAD. If there are known TODOs here this
thenthat work is still needing to be done.
 
>
> I was going to take 2) and Stephen was going to work on 3); I am not sure about the other two but will review the
threadyou pointed to. Thanks for pointing that out.
 

I've attached the diffs I'm carrying to build this under meson (as
well as -Wshadow; my removal of the two variables probably needs some
scrutiny). It looks like the testcrypto executable will need
substantial changes after the common/hex.h revert.

--Jacob

Attachment

Re: Moving forward with TDE

From
David Christensen
Date:
Hi Jacob,

Thanks, I've added this patch in my tree [1]. (For now, just adding
fixes and the like atop the original separate patches, but will
eventually get things winnowed down into probably the same 12 parts
the originals were reviewed in.

Best,

David

[1] https://github.com/pgguru/postgres/tree/tde



Re: Moving forward with TDE

From
David Christensen
Date:
Hi Dilip,

Thanks for the feedback here. I will review the docs changes and add to my tree.

Best,

David



Re: Moving forward with TDE

From
vignesh C
Date:
On Fri, 4 Nov 2022 at 03:36, David Christensen
<david.christensen@crunchydata.com> wrote:
>
> > Unless somebody in the community remembers open questions/issues with
> > TDE that were never addressed I suggest simply iterating with our
> > usual testing/reviewing process. For now I'm going to change the
> > status of the CF entry [1] to "Waiting for Author" since the patchset
> > doesn't pass the CI [2].
>
> Thanks, enclosed is a new version that is rebased on HEAD and fixes a
> bug that the new pg_control_init() test picked up.

The patch does not apply on top of HEAD as in [1], please post a rebased patch:
=== Applying patches on top of PostgreSQL commit ID
b82557ecc2ebbf649142740a1c5ce8d19089f620 ===
=== applying patch
./v2-0004-cfe-04-common_over_cfe-03-scripts-squash-commit.patch
patching file src/common/Makefile
Hunk #2 FAILED at 84.
1 out of 2 hunks FAILED -- saving rejects to file src/common/Makefile.rej

[1] - http://cfbot.cputube.org/patch_41_3985.log

Regards,
Vignesh



Re: Moving forward with TDE

From
Chris Travers
Date:
The following review has been posted through the commitfest application:
make installcheck-world:  not tested
Implements feature:       not tested
Spec compliant:           not tested
Documentation:            not tested

I have decided to write a review here in terms of whether we want this feature, and perhaps the way we should look at
encryptionas a project down the road, since I think this is only the beginning.  I am hoping to run some full tests of
thefeature sometime in coming weeks.  Right now this review is limited to the documentation and documented feature.
 

From the documentation, the primary threat model of TDE is to prevent decryption of data from archived wal segments
(anddata files), for example on a backup system.  While there are other methods around this problem to date, I think
thatthis feature is worth pursuing for that reason.  I want to address a couple of reasons for this and then go into
somereservations I have about how some of this is documented.
 

There are current workarounds to ensuring encryption at rest, but these have a number of problems.  Encryption
passphrasesend up lying around the system in various places.  Key rotation is often difficult.  And one mistake can
easilyrender all efforts ineffective.  TDE solves these problems.  The overall design from the internal docs looks
solid. This definitely is something I would recommend for many users.
 

I have a couple small caveats though.  Encryption of data is a large topic and there isn't a one-size-fits-all solution
toindustrial or state requirements.  Having all this key management available in PostgreSQL is a very good thing.  Long
runit is likely to end up being extensible, and therefore both more powerful and offering a wider range of choices for
solutionarchitects.  Implementing encryption is also something that is easy to mess up.  For this reason I think it
wouldbe great if we had a standardized format for discussing encryption options that we could use going forward.  I
don'tthink that should be held against this patch but I think we need to start discussing it now because it will be a
biggerproblem later.
 

A second caveat I have is that key management is a topic where you really need a good overview of internals in order to
implementeffectively.  If you don't know how an SSL handshake works or what is in a certificate, you can easily make
mistakesin setting up SSL.  I can see the same thing happening here.  For example, I don't think it would be safe to
leavethe KEK on an encrypted filesystem that is decrypted at runtime (or at least I wouldn't consider that safe -- your
appetitefor risk may vary).
 

My proposal would be to have build a template for encryption options in the documentation.  This could include topics
likeSSL as well.  In such a template we'd have sections like "Threat model,"  "How it works," "Implementation
Requirements"and so forth.  Again I don't think this needs to be part of the current patch but I think it is something
weneed to start thinking about now.  Maybe after this goes in, I can present a proposed documentation patch.
 

I will also note that I don't consider myself to be very qualified on topics like encryption.  I can reason about key
managementto some extent but some implementation details may be beyond me.  I would hope we could get some extra review
onthis patch set soon. 

Re: Moving forward with TDE

From
Stephen Frost
Date:
Greetings,

* Chris Travers (chris.travers@gmail.com) wrote:
> From the documentation, the primary threat model of TDE is to prevent decryption of data from archived wal segments
(anddata files), for example on a backup system.  While there are other methods around this problem to date, I think
thatthis feature is worth pursuing for that reason.  I want to address a couple of reasons for this and then go into
somereservations I have about how some of this is documented.
 

Agreed, though the latest efforts include an option for *authenticated*
encryption as well as unauthenticated.  That makes it much more
difficult to make undetected changes to the data that's protected by
the authenticated encryption being used.

> There are current workarounds to ensuring encryption at rest, but these have a number of problems.  Encryption
passphrasesend up lying around the system in various places.  Key rotation is often difficult.  And one mistake can
easilyrender all efforts ineffective.  TDE solves these problems.  The overall design from the internal docs looks
solid. This definitely is something I would recommend for many users.
 

There's clearly user demand for it as there's a number of organizations
who have forks which are providing it in one shape or another.  This
kind of splintering of the community is actually an actively bad thing
for the project and is part of what killed Unix, by at least some pretty
reputable accounts, in my view.

> I have a couple small caveats though.  Encryption of data is a large topic and there isn't a one-size-fits-all
solutionto industrial or state requirements.  Having all this key management available in PostgreSQL is a very good
thing. Long run it is likely to end up being extensible, and therefore both more powerful and offering a wider range of
choicesfor solution architects.  Implementing encryption is also something that is easy to mess up.  For this reason I
thinkit would be great if we had a standardized format for discussing encryption options that we could use going
forward. I don't think that should be held against this patch but I think we need to start discussing it now because it
willbe a bigger problem later.
 

Do you have a suggestion as to the format to use?

> A second caveat I have is that key management is a topic where you really need a good overview of internals in order
toimplement effectively.  If you don't know how an SSL handshake works or what is in a certificate, you can easily make
mistakesin setting up SSL.  I can see the same thing happening here.  For example, I don't think it would be safe to
leavethe KEK on an encrypted filesystem that is decrypted at runtime (or at least I wouldn't consider that safe -- your
appetitefor risk may vary).
 

Agreed that we should document this and make clear that the KEK is
necessary for server start but absolutely should be kept as safe as
possible and certainly not stored on disk somewhere nearby the encrypted
cluster.

> My proposal would be to have build a template for encryption options in the documentation.  This could include topics
likeSSL as well.  In such a template we'd have sections like "Threat model,"  "How it works," "Implementation
Requirements"and so forth.  Again I don't think this needs to be part of the current patch but I think it is something
weneed to start thinking about now.  Maybe after this goes in, I can present a proposed documentation patch.
 

I'm not entirely sure that it makes sense to lump this and TLS in the
same place as they end up being rather independent at the end of the
day.  If you have ideas for how to improve the documentation, I'd
certainly encourage you to go ahead and work on that and submit it as a
patch rather than waiting for this to actually land in core.  Having
good and solid documentation is something that will help this get in,
after all, and to the extent that it's covering existing topics like
TLS, those could likely be included independently and that would be of
benefit to everyone.

> I will also note that I don't consider myself to be very qualified on topics like encryption.  I can reason about key
managementto some extent but some implementation details may be beyond me.  I would hope we could get some extra review
onthis patch set soon.
 

Certainly agree with you there though there's an overall trajectory of
patches involved in all of this that's a bit deep.  The plan is to
discuss that at PGCon (On the Road to TDE) and at the PGCon
Unconference after.  I certainly hope those interested will be there.
I'm also happy to have a call with anyone interested in this effort
independent of that, of course.

Thanks!

Stephen

Attachment

Re: Moving forward with TDE

From
Bruce Momjian
Date:
On Wed, Mar  8, 2023 at 04:25:04PM -0500, Stephen Frost wrote:
> Agreed, though the latest efforts include an option for *authenticated*
> encryption as well as unauthenticated.  That makes it much more
> difficult to make undetected changes to the data that's protected by
> the authenticated encryption being used.

I thought some more about this.  GCM-style authentication of encrypted
data has value because it assumes the two end points are secure but that
a malicious actor could modify data during transfer.  In the Postgres
case, it seems the two end points and the transfer are all in the same
place.  Therefore, it is unclear to me the value of using GCM-style
authentication because if the GCM-level can be modified, so can the end
points, and the encryption key exposed.

> There's clearly user demand for it as there's a number of organizations
> who have forks which are providing it in one shape or another.  This
> kind of splintering of the community is actually an actively bad thing
> for the project and is part of what killed Unix, by at least some pretty
> reputable accounts, in my view.

Yes, the number of commercial implementations of this is a concern.  Of
course, it is also possible that those commercial implementations are
meeting checkbox requirements rather than technical ones, and the
community has been hostile to check box-only features.

> Certainly agree with you there though there's an overall trajectory of
> patches involved in all of this that's a bit deep.  The plan is to
> discuss that at PGCon (On the Road to TDE) and at the PGCon
> Unconference after.  I certainly hope those interested will be there.
> I'm also happy to have a call with anyone interested in this effort
> independent of that, of course.

I will not be attending Ottawa.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Embrace your flaws.  They make you human, rather than perfect,
  which you will never be.



Re: Moving forward with TDE

From
Stephen Frost
Date:
Greetings,

On Mon, Mar 27, 2023 at 12:38 Bruce Momjian <bruce@momjian.us> wrote:
On Wed, Mar  8, 2023 at 04:25:04PM -0500, Stephen Frost wrote:
> Agreed, though the latest efforts include an option for *authenticated*
> encryption as well as unauthenticated.  That makes it much more
> difficult to make undetected changes to the data that's protected by
> the authenticated encryption being used.

I thought some more about this.  GCM-style authentication of encrypted
data has value because it assumes the two end points are secure but that
a malicious actor could modify data during transfer.  In the Postgres
case, it seems the two end points and the transfer are all in the same
place.  Therefore, it is unclear to me the value of using GCM-style
authentication because if the GCM-level can be modified, so can the end
points, and the encryption key exposed.

What are the two end points you are referring to and why don’t you feel there is an opportunity between them for a malicious actor to attack the system?

There are simpler cases to consider than an online attack on a single independent system where an attacker having access to modify the data in transit between PG and the storage would imply the attacker also having access to read keys out of PG’s memory. 

As specific examples, consider:

An attack against the database system where the database server is shut down, or a backup, and  the encryption key isn’t available on the system.

The backup system itself, not running as the PG user (an option supported by PG and at least pgbackrest) being compromised, thus allowing for injection of changes into a backup or into a restore.

The beginning of this discussion also very clearly had individuals voicing strong opinions that unauthenticated encryption methods were not acceptable as an end-state for PG due to the clear issue of there then being no protection against modification of data.  The approach we are working towards provides both the unauthenticated option, which clearly has value to a large number of our collective user base considering the number of commercial implementations which have now arisen, and the authenticated solution which goes further and provides the level clearly expected of the PG community. This gets us a win-win situation.

> There's clearly user demand for it as there's a number of organizations
> who have forks which are providing it in one shape or another.  This
> kind of splintering of the community is actually an actively bad thing
> for the project and is part of what killed Unix, by at least some pretty
> reputable accounts, in my view.

Yes, the number of commercial implementations of this is a concern.  Of
course, it is also possible that those commercial implementations are
meeting checkbox requirements rather than technical ones, and the
community has been hostile to check box-only features.

I’ve grown weary of this argument as the other major piece of work it was routinely applied to was RLS and yet that has certainly been seen broadly as a beneficial feature with users clearly leveraging it and in more than some “checkbox” way.

Indeed, it’s similar also in that commercial implementations were done of RLS while there were arguments made about it being a checkbox feature which were used to discourage it from being implemented in core.  Were it truly checkbox, I don’t feel we would have the regular and ongoing discussion about it on the lists that we do, nor see other tools built on top of PG which specifically leverage it. Perhaps there are truly checkbox features out there which we will never implement, but I’m (perhaps due to what my dad would call selective listening on my part, perhaps not) having trouble coming up with any presently. Features that exist in other systems that we don’t want?  Certainly. We don’t characterize those as simply “checkbox” though. Perhaps that’s in part because we provide alternatives- but that’s not the case here. We have no comparable way to have this capability as part of the core system.

We, as a community, are clearly losing value by lack of this capability, if by no other measure than simply the numerous users of the commercial implementations feeling that they simply can’t use PG without this feature, for whatever their reasoning.

Thanks,

Stephen

Re: Moving forward with TDE

From
Bruce Momjian
Date:
On Tue, Mar 28, 2023 at 12:01:56AM +0200, Stephen Frost wrote:
> Greetings,
> 
> On Mon, Mar 27, 2023 at 12:38 Bruce Momjian <bruce@momjian.us> wrote:
> 
>     On Wed, Mar  8, 2023 at 04:25:04PM -0500, Stephen Frost wrote:
>     > Agreed, though the latest efforts include an option for *authenticated*
>     > encryption as well as unauthenticated.  That makes it much more
>     > difficult to make undetected changes to the data that's protected by
>     > the authenticated encryption being used.
> 
>     I thought some more about this.  GCM-style authentication of encrypted
>     data has value because it assumes the two end points are secure but that
>     a malicious actor could modify data during transfer.  In the Postgres
>     case, it seems the two end points and the transfer are all in the same
>     place.  Therefore, it is unclear to me the value of using GCM-style
>     authentication because if the GCM-level can be modified, so can the end
>     points, and the encryption key exposed.
> 
> 
> What are the two end points you are referring to and why don’t you feel there
> is an opportunity between them for a malicious actor to attack the system?

Uh, TLS can use GCM and in this case you assume the sender and receiver
are secure, no?

> There are simpler cases to consider than an online attack on a single
> independent system where an attacker having access to modify the data in
> transit between PG and the storage would imply the attacker also having access
> to read keys out of PG’s memory. 

I consider the operating system and its processes as much more of a
single entity than TLS over a network.

> As specific examples, consider:
> 
> An attack against the database system where the database server is shut down,
> or a backup, and  the encryption key isn’t available on the system.
> 
> The backup system itself, not running as the PG user (an option supported by PG
> and at least pgbackrest) being compromised, thus allowing for injection of
> changes into a backup or into a restore.

I then question why we are not adding encryption to pg_basebackup or
pgbackrest rather than the database system.

> The beginning of this discussion also very clearly had individuals voicing
> strong opinions that unauthenticated encryption methods were not acceptable as
> an end-state for PG due to the clear issue of there then being no protection
> against modification of data.  The approach we are working towards provides

What were the _technical_ reasons for those objections?

> both the unauthenticated option, which clearly has value to a large number of
> our collective user base considering the number of commercial implementations
> which have now arisen, and the authenticated solution which goes further and
> provides the level clearly expected of the PG community. This gets us a win-win
> situation.
> 
>     > There's clearly user demand for it as there's a number of organizations
>     > who have forks which are providing it in one shape or another.  This
>     > kind of splintering of the community is actually an actively bad thing
>     > for the project and is part of what killed Unix, by at least some pretty
>     > reputable accounts, in my view.
> 
>     Yes, the number of commercial implementations of this is a concern.  Of
>     course, it is also possible that those commercial implementations are
>     meeting checkbox requirements rather than technical ones, and the
>     community has been hostile to check box-only features.
> 
> 
> I’ve grown weary of this argument as the other major piece of work it was
> routinely applied to was RLS and yet that has certainly been seen broadly as a
> beneficial feature with users clearly leveraging it and in more than some
> “checkbox” way.

RLS has to overcome that objection, and I think it did, as was better
for doing that.

> We, as a community, are clearly losing value by lack of this capability, if by
> no other measure than simply the numerous users of the commercial
> implementations feeling that they simply can’t use PG without this feature, for
> whatever their reasoning.

That is true, but I go back to my concern over useful feature vs. check
box.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Embrace your flaws.  They make you human, rather than perfect,
  which you will never be.



Re: Moving forward with TDE

From
Stephen Frost
Date:
Greetings,

On Mon, Mar 27, 2023 at 18:17 Bruce Momjian <bruce@momjian.us> wrote:
On Tue, Mar 28, 2023 at 12:01:56AM +0200, Stephen Frost wrote:
> Greetings,
>
> On Mon, Mar 27, 2023 at 12:38 Bruce Momjian <bruce@momjian.us> wrote:
>
>     On Wed, Mar  8, 2023 at 04:25:04PM -0500, Stephen Frost wrote:
>     > Agreed, though the latest efforts include an option for *authenticated*
>     > encryption as well as unauthenticated.  That makes it much more
>     > difficult to make undetected changes to the data that's protected by
>     > the authenticated encryption being used.
>
>     I thought some more about this.  GCM-style authentication of encrypted
>     data has value because it assumes the two end points are secure but that
>     a malicious actor could modify data during transfer.  In the Postgres
>     case, it seems the two end points and the transfer are all in the same
>     place.  Therefore, it is unclear to me the value of using GCM-style
>     authentication because if the GCM-level can be modified, so can the end
>     points, and the encryption key exposed.
>
>
> What are the two end points you are referring to and why don’t you feel there
> is an opportunity between them for a malicious actor to attack the system?

Uh, TLS can use GCM and in this case you assume the sender and receiver
are secure, no?

TLS does use GCM.. pretty much exclusively as far as I can recall. So do a lot of other things though..

> There are simpler cases to consider than an online attack on a single
> independent system where an attacker having access to modify the data in
> transit between PG and the storage would imply the attacker also having access
> to read keys out of PG’s memory. 

I consider the operating system and its processes as much more of a
single entity than TLS over a network.

This may be the case sometimes but there’s absolutely no shortage of other cases and it’s almost more the rule these days, that there is some kind of network between the OS processes and the storage- a SAN, an iSCSI network, NFS, are all quite common.

> As specific examples, consider:
>
> An attack against the database system where the database server is shut down,
> or a backup, and  the encryption key isn’t available on the system.
>
> The backup system itself, not running as the PG user (an option supported by PG
> and at least pgbackrest) being compromised, thus allowing for injection of
> changes into a backup or into a restore.

I then question why we are not adding encryption to pg_basebackup or
pgbackrest rather than the database system.

Pgbackrest has encryption and authentication of it … but that doesn’t actually address the attack vector that I outlined. If the backup user is compromised then they can change the data before it gets to the storage.  If the backup user is compromised then they have access to whatever key is used to encrypt and authenticate the backup and therefore can trivially manipulate the data.

Encryption of backups by the backup tool serves to protect the data after it leaves the backup system and is stored in cloud storage or in whatever format the repository takes.  This is beneficial, particularly when the data itself offers no protection, but simply not the same.

> The beginning of this discussion also very clearly had individuals voicing
> strong opinions that unauthenticated encryption methods were not acceptable as
> an end-state for PG due to the clear issue of there then being no protection
> against modification of data.  The approach we are working towards provides

What were the _technical_ reasons for those objections?

I believe largely the ones I’m bringing up here and which I outline above… I don’t mean to pretend that any of this is of my own independent construction. I don’t believe it is and my apologies if it came across that way.

> both the unauthenticated option, which clearly has value to a large number of
> our collective user base considering the number of commercial implementations
> which have now arisen, and the authenticated solution which goes further and
> provides the level clearly expected of the PG community. This gets us a win-win
> situation.
>
>     > There's clearly user demand for it as there's a number of organizations
>     > who have forks which are providing it in one shape or another.  This
>     > kind of splintering of the community is actually an actively bad thing
>     > for the project and is part of what killed Unix, by at least some pretty
>     > reputable accounts, in my view.
>
>     Yes, the number of commercial implementations of this is a concern.  Of
>     course, it is also possible that those commercial implementations are
>     meeting checkbox requirements rather than technical ones, and the
>     community has been hostile to check box-only features.
>
>
> I’ve grown weary of this argument as the other major piece of work it was
> routinely applied to was RLS and yet that has certainly been seen broadly as a
> beneficial feature with users clearly leveraging it and in more than some
> “checkbox” way.

RLS has to overcome that objection, and I think it did, as was better
for doing that.

Beyond it being called a checkbox - what were the arguments against it?  I don’t object to being challenged to point out the use cases, but I feel that at least some very clear and straight forward ones are outlined from what has been said above. I also don’t believe those are the only ones but I don’t think I could enumerate every use case for RLS either, even after seeing it used for quite a few years.  I do seriously question the level of effort expected of features that are claimed to be “Checkbox” and tossed almost exclusively for that reason on this list given the success of the ones that have been accepted and are in active use by our users today.

> We, as a community, are clearly losing value by lack of this capability, if by
> no other measure than simply the numerous users of the commercial
> implementations feeling that they simply can’t use PG without this feature, for
> whatever their reasoning.

That is true, but I go back to my concern over useful feature vs. check
box.

While it’s easy to label something as checkbox, I don’t feel we have been fair to our users in doing so as it has historically prevented features which our users are demanding and end up getting from commercial providers until we implement them ultimately anyway.  This particular argument simply doesn’t seem to actually hold the value that proponents of it claim, for us at least, and we have clear counter-examples which we can point to and I hope we learn from those.

Thanks!

Stephen

Re: Moving forward with TDE

From
Bruce Momjian
Date:
On Tue, Mar 28, 2023 at 12:57:42AM +0200, Stephen Frost wrote:
>     I consider the operating system and its processes as much more of a
>     single entity than TLS over a network.
> 
> This may be the case sometimes but there’s absolutely no shortage of other
> cases and it’s almost more the rule these days, that there is some kind of
> network between the OS processes and the storage- a SAN, an iSCSI network, NFS,
> are all quite common.

Yes, but consider that the database cluster is having to get its data
from that remote storage --- the remote storage is not an independent
entity that can be corrupted without the databaes server being
compromised. If everything in PGDATA was GCM-verified, it would be
secure, but because some parts are not, I don't think it would be.

>     > As specific examples, consider:
>     >
>     > An attack against the database system where the database server is shut
>     down,
>     > or a backup, and  the encryption key isn’t available on the system.
>     >
>     > The backup system itself, not running as the PG user (an option supported
>     by PG
>     > and at least pgbackrest) being compromised, thus allowing for injection
>     of
>     > changes into a backup or into a restore.
> 
>     I then question why we are not adding encryption to pg_basebackup or
>     pgbackrest rather than the database system.
> 
> Pgbackrest has encryption and authentication of it … but that doesn’t actually
> address the attack vector that I outlined. If the backup user is compromised
> then they can change the data before it gets to the storage.  If the backup
> user is compromised then they have access to whatever key is used to encrypt
> and authenticate the backup and therefore can trivially manipulate the data.

So the idea is that the backup user can be compromised without the data
being vulnerable --- makes sense, though that use-case seems narrow.

>     What were the _technical_ reasons for those objections?
> 
> I believe largely the ones I’m bringing up here and which I outline above… I
> don’t mean to pretend that any of this is of my own independent construction. I
> don’t believe it is and my apologies if it came across that way.

Yes, there is value beyond the check-box, but in most cases those
values are limited considering the complexity of the features, and the
check-box is what most people are asking for, I think.

>     > I’ve grown weary of this argument as the other major piece of work it was
>     > routinely applied to was RLS and yet that has certainly been seen broadly
>     as a
>     > beneficial feature with users clearly leveraging it and in more than some
>     > “checkbox” way.
> 
>     RLS has to overcome that objection, and I think it did, as was better
>     for doing that.
> 
> Beyond it being called a checkbox - what were the arguments against it?  I

The RLS arguments were that queries could expoose some of the underlying
data, but in summary, that was considered acceptable.

>     > We, as a community, are clearly losing value by lack of this capability,
>     if by
>     > no other measure than simply the numerous users of the commercial
>     > implementations feeling that they simply can’t use PG without this
>     feature, for
>     > whatever their reasoning.
> 
>     That is true, but I go back to my concern over useful feature vs. check
>     box.
> 
> While it’s easy to label something as checkbox, I don’t feel we have been fair

No, actually, it isn't.  I am not sure why you are saying that.

> to our users in doing so as it has historically prevented features which our
> users are demanding and end up getting from commercial providers until we
> implement them ultimately anyway.  This particular argument simply doesn’t seem
> to actually hold the value that proponents of it claim, for us at least, and we
> have clear counter-examples which we can point to and I hope we learn from
> those.

I don't think you are addressing actual issues above.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Embrace your flaws.  They make you human, rather than perfect,
  which you will never be.



Re: Moving forward with TDE

From
Stephen Frost
Date:
Greetings,

On Mon, Mar 27, 2023 at 19:19 Bruce Momjian <bruce@momjian.us> wrote:
On Tue, Mar 28, 2023 at 12:57:42AM +0200, Stephen Frost wrote:
>     I consider the operating system and its processes as much more of a
>     single entity than TLS over a network.
>
> This may be the case sometimes but there’s absolutely no shortage of other
> cases and it’s almost more the rule these days, that there is some kind of
> network between the OS processes and the storage- a SAN, an iSCSI network, NFS,
> are all quite common.

Yes, but consider that the database cluster is having to get its data
from that remote storage --- the remote storage is not an independent
entity that can be corrupted without the databaes server being
compromised. If everything in PGDATA was GCM-verified, it would be
secure, but because some parts are not, I don't think it would be.

The remote storage is certainly an independent system. Multi-mount LUNs are entirely possible in a SAN (and absolutely with NFS, or just the NFS server itself is compromised..), so while the attacker may not have any access to the database server itself, they may have access to these other systems, and that’s not even considering in-transit attacks which are also absolutely possible, especially with iSCSI or NFS. 

I don’t understand what is being claimed that the remote storage is “not an independent system” based on my understanding of, eg, NFS. With NFS, a directory on the NFS server is exported and the client mounts that directory as NFS locally, all over a network which may or may not be secured against manipulation.  A user on the NFS server with root access is absolutely able to access and modify files on the NFS server trivially, even if they have no access to the PG server.  Would you explain what you mean?

I do agree that the ideal case would be that we encrypt everything we can (not everything can be for various reasons, but we don’t actually need to either) in the PGDATA directory is encrypted and authenticated, just like it would be ideal if everything was checksum’d and isn’t today. We are progressing in that direction thanks to efforts such as reworking the other subsystems to used shared buffers and a consistent page format, but just like with checksums we do not need to have the perfect solution for us to provide a lot of value here- and our users know that as the same is true of the unauthenticated encryption approaches being offered by the commercial solutions.

>     > As specific examples, consider:
>     >
>     > An attack against the database system where the database server is shut
>     down,
>     > or a backup, and  the encryption key isn’t available on the system.
>     >
>     > The backup system itself, not running as the PG user (an option supported
>     by PG
>     > and at least pgbackrest) being compromised, thus allowing for injection
>     of
>     > changes into a backup or into a restore.
>
>     I then question why we are not adding encryption to pg_basebackup or
>     pgbackrest rather than the database system.
>
> Pgbackrest has encryption and authentication of it … but that doesn’t actually
> address the attack vector that I outlined. If the backup user is compromised
> then they can change the data before it gets to the storage.  If the backup
> user is compromised then they have access to whatever key is used to encrypt
> and authenticate the backup and therefore can trivially manipulate the data.

So the idea is that the backup user can be compromised without the data
being vulnerable --- makes sense, though that use-case seems narrow.

That’s perhaps a fair consideration- but it’s clearly of enough value that many of our users are asking for it and not using PG because we don’t have it today. Ultimately though, this clearly makes it more than a “checkbox” feature. I hope we are able to agree on that now.

>     What were the _technical_ reasons for those objections?
>
> I believe largely the ones I’m bringing up here and which I outline above… I
> don’t mean to pretend that any of this is of my own independent construction. I
> don’t believe it is and my apologies if it came across that way.

Yes, there is value beyond the check-box, but in most cases those
values are limited considering the complexity of the features, and the
check-box is what most people are asking for, I think.

For the users who ask on the lists for this feature, regularly, how many don’t ask because they google or find prior responses on the list to the question of if we have this capability?  How do we know that their cases are “checkbox”?  Consider that there are standards groups which explicitly consider these attack vectors and consider them important enough to require mitigations to address those vectors. Do the end users of PG understand the attack vectors or why they matter?  Perhaps not, but just because they can’t articulate the reasoning does NOT mean that the attack vector doesn’t exist or that their environment is somehow immune to it- indeed, as the standards bodies surely know, the opposite is true- they’re almost certainly at risk of those attack vectors and therefore the standards bodies are absolutely justified in requiring them to provide a solution. Treating these users as unimportant because they don’t have the depth of understanding that we do or that the standards body does is not helping them- it’s actively driving them away from PG. 

>     > I’ve grown weary of this argument as the other major piece of work it was
>     > routinely applied to was RLS and yet that has certainly been seen broadly
>     as a
>     > beneficial feature with users clearly leveraging it and in more than some
>     > “checkbox” way.
>
>     RLS has to overcome that objection, and I think it did, as was better
>     for doing that.
>
> Beyond it being called a checkbox - what were the arguments against it?  I

The RLS arguments were that queries could expoose some of the underlying
data, but in summary, that was considered acceptable.

This is an excellent point- and dovetails very nicely into my argument that protecting primary data (what is provided by users and ends up in indexes and heaps) is valuable even if we don’t (yet..) have protection for other parts of the system. Reducing the size of the attack vector is absolutely useful, especially when it’s such a large amount of the data in the system. Yes, we should, and will, continue to improve- as we do with many features, but we don’t need to wait for perfection to include this feature, just as with RLS and numerous other features we have. 

>     > We, as a community, are clearly losing value by lack of this capability,
>     if by
>     > no other measure than simply the numerous users of the commercial
>     > implementations feeling that they simply can’t use PG without this
>     feature, for
>     > whatever their reasoning.
>
>     That is true, but I go back to my concern over useful feature vs. check
>     box.
>
> While it’s easy to label something as checkbox, I don’t feel we have been fair

No, actually, it isn't.  I am not sure why you are saying that.

I’m confused as to what is required to label a feature as a “checkbox” feature then. What did you us to make that determination of this feature?  I’m happy to be wrong here. 

> to our users in doing so as it has historically prevented features which our
> users are demanding and end up getting from commercial providers until we
> implement them ultimately anyway.  This particular argument simply doesn’t seem
> to actually hold the value that proponents of it claim, for us at least, and we
> have clear counter-examples which we can point to and I hope we learn from
> those.

I don't think you are addressing actual issues above.

Specifics would be really helpful. I don’t doubt that there are things I’m missing, but I’ve tried to address each point raised clearly and concisely.

Thanks!

Stephen

Re: Moving forward with TDE

From
Bruce Momjian
Date:
On Tue, Mar 28, 2023 at 02:03:50AM +0200, Stephen Frost wrote:
> The remote storage is certainly an independent system. Multi-mount LUNs are
> entirely possible in a SAN (and absolutely with NFS, or just the NFS server
> itself is compromised..), so while the attacker may not have any access to the
> database server itself, they may have access to these other systems, and that’s
> not even considering in-transit attacks which are also absolutely possible,
> especially with iSCSI or NFS. 
> 
> I don’t understand what is being claimed that the remote storage is “not an
> independent system” based on my understanding of, eg, NFS. With NFS, a
> directory on the NFS server is exported and the client mounts that directory as
> NFS locally, all over a network which may or may not be secured against
> manipulation.  A user on the NFS server with root access is absolutely able to
> access and modify files on the NFS server trivially, even if they have no
> access to the PG server.  Would you explain what you mean?

The point is that someone could change values in the storage, pg_xact,
encryption settings, binaries, that would allow the attacker to learn
the encryption key.  This is not possible for two secure endpoints and
someone changing data in transit.  Yeah, it took me a while to
understand these boundaries too.

>     So the idea is that the backup user can be compromised without the data
>     being vulnerable --- makes sense, though that use-case seems narrow.
> 
> That’s perhaps a fair consideration- but it’s clearly of enough value that many
> of our users are asking for it and not using PG because we don’t have it today.
> Ultimately though, this clearly makes it more than a “checkbox” feature. I hope
> we are able to agree on that now.

It is more than a check box feature, yes, but I am guessing few people
are wanting the this for the actual features beyond check box.

>     Yes, there is value beyond the check-box, but in most cases those
>     values are limited considering the complexity of the features, and the
>     check-box is what most people are asking for, I think.
> 
> For the users who ask on the lists for this feature, regularly, how many don’t
> ask because they google or find prior responses on the list to the question of
> if we have this capability?  How do we know that their cases are “checkbox”? 

Because I have rarely heard people articulate the value beyond check
box.

> Consider that there are standards groups which explicitly consider these attack
> vectors and consider them important enough to require mitigations to address
> those vectors. Do the end users of PG understand the attack vectors or why they
> matter?  Perhaps not, but just because they can’t articulate the reasoning does
> NOT mean that the attack vector doesn’t exist or that their environment is
> somehow immune to it- indeed, as the standards bodies surely know, the opposite
> is true- they’re almost certainly at risk of those attack vectors and therefore
> the standards bodies are absolutely justified in requiring them to provide a
> solution. Treating these users as unimportant because they don’t have the depth
> of understanding that we do or that the standards body does is not helping
> them- it’s actively driving them away from PG. 

Well, then who is going to explain them here, because I have not heard
them yet.

>     The RLS arguments were that queries could expoose some of the underlying
>     data, but in summary, that was considered acceptable.
> 
> This is an excellent point- and dovetails very nicely into my argument that
> protecting primary data (what is provided by users and ends up in indexes and
> heaps) is valuable even if we don’t (yet..) have protection for other parts of
> the system. Reducing the size of the attack vector is absolutely useful,
> especially when it’s such a large amount of the data in the system. Yes, we
> should, and will, continue to improve- as we do with many features, but we
> don’t need to wait for perfection to include this feature, just as with RLS and
> numerous other features we have. 

The issue is that you needed a certain type of user with a certain type
of access to break RLS, while for this, writing to PGDATA is the simple
case for all the breakage, and the thing we are protecting with
authentication.

>     >     > We, as a community, are clearly losing value by lack of this
>     capability,
>     >     if by
>     >     > no other measure than simply the numerous users of the commercial
>     >     > implementations feeling that they simply can’t use PG without this
>     >     feature, for
>     >     > whatever their reasoning.
>     >
>     >     That is true, but I go back to my concern over useful feature vs.
>     check
>     >     box.
>     >
>     > While it’s easy to label something as checkbox, I don’t feel we have been
>     fair
> 
>     No, actually, it isn't.  I am not sure why you are saying that.
> 
> I’m confused as to what is required to label a feature as a “checkbox” feature
> then. What did you us to make that determination of this feature?  I’m happy to
> be wrong here. 

I don't see the point in me continuing to reply here.  You just seem to
continue asking questions without actually thinking of what I am saying,
and hope I get tired or something.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Embrace your flaws.  They make you human, rather than perfect,
  which you will never be.



Re: Moving forward with TDE

From
Stephen Frost
Date:
Greetings,

On Mon, Mar 27, 2023 at 21:35 Bruce Momjian <bruce@momjian.us> wrote:
On Tue, Mar 28, 2023 at 02:03:50AM +0200, Stephen Frost wrote:
> The remote storage is certainly an independent system. Multi-mount LUNs are
> entirely possible in a SAN (and absolutely with NFS, or just the NFS server
> itself is compromised..), so while the attacker may not have any access to the
> database server itself, they may have access to these other systems, and that’s
> not even considering in-transit attacks which are also absolutely possible,
> especially with iSCSI or NFS. 
>
> I don’t understand what is being claimed that the remote storage is “not an
> independent system” based on my understanding of, eg, NFS. With NFS, a
> directory on the NFS server is exported and the client mounts that directory as
> NFS locally, all over a network which may or may not be secured against
> manipulation.  A user on the NFS server with root access is absolutely able to
> access and modify files on the NFS server trivially, even if they have no
> access to the PG server.  Would you explain what you mean?

The point is that someone could change values in the storage, pg_xact,
encryption settings, binaries, that would allow the attacker to learn
the encryption key.  This is not possible for two secure endpoints and
someone changing data in transit.  Yeah, it took me a while to
understand these boundaries too.

This depends on the specific configuration of the systems, clearly. Being able to change values in other parts of the system isn’t great and we should work to improve on that, but clearly that isn’t so much of an issue that people aren’t willing to accept a partial solution or existing commercial solutions wouldn’t be accepted or considered viable. Indeed, using GCM is objectively an improvement over what’s being offered commonly today.

I also generally object to the idea that being able to manipulate the PGDATA directory necessarily means being able to gain access to the KEK. In trivial solutions, sure, it’s possible, but the NFS server should never be asking some external KMS for the key to a given DB server and a reasonable implementation won’t allow this, and instead would flag and log such an attempt for someone to review, leading to a much faster realization of a compromised system.

Certainly it’s much simpler to reason about an attacker with no knowledge of either system and only network access to see if they can penetrate the communications between the two end-points, but that is not the only case where authenticated encryption is useful.

>     So the idea is that the backup user can be compromised without the data
>     being vulnerable --- makes sense, though that use-case seems narrow.
>
> That’s perhaps a fair consideration- but it’s clearly of enough value that many
> of our users are asking for it and not using PG because we don’t have it today.
> Ultimately though, this clearly makes it more than a “checkbox” feature. I hope
> we are able to agree on that now.

It is more than a check box feature, yes, but I am guessing few people
are wanting the this for the actual features beyond check box.

As I explained previously, perhaps the people asking are doing so for only the “checkbox”, but that doesn’t mean it isn’t a useful feature or that it isn’t valuable in its own right. Those checklists were compiled and enforced for a reason, which the end users might not understand but is still absolutely valuable.  Sad to say, but frankly this is becoming more and more common but we shouldn’t be faulting the users asking for it- if it were truly useless then eventually it would be removed from the standard, but it hasn’t and it won’t be because, while not every end user has a depth of understanding to explain it, it is actually a useful and important capability to have and one that is important to implement.

>     Yes, there is value beyond the check-box, but in most cases those
>     values are limited considering the complexity of the features, and the
>     check-box is what most people are asking for, I think.
>
> For the users who ask on the lists for this feature, regularly, how many don’t
> ask because they google or find prior responses on the list to the question of
> if we have this capability?  How do we know that their cases are “checkbox”? 

Because I have rarely heard people articulate the value beyond check
box.

Have I done so sufficiently then that we can agree that calling it “checkbox” is inappropriate and detrimental to our user base?

> Consider that there are standards groups which explicitly consider these attack
> vectors and consider them important enough to require mitigations to address
> those vectors. Do the end users of PG understand the attack vectors or why they
> matter?  Perhaps not, but just because they can’t articulate the reasoning does
> NOT mean that the attack vector doesn’t exist or that their environment is
> somehow immune to it- indeed, as the standards bodies surely know, the opposite
> is true- they’re almost certainly at risk of those attack vectors and therefore
> the standards bodies are absolutely justified in requiring them to provide a
> solution. Treating these users as unimportant because they don’t have the depth
> of understanding that we do or that the standards body does is not helping
> them- it’s actively driving them away from PG. 

Well, then who is going to explain them here, because I have not heard
them yet.

I thought I was doing so.

>     The RLS arguments were that queries could expoose some of the underlying
>     data, but in summary, that was considered acceptable.
>
> This is an excellent point- and dovetails very nicely into my argument that
> protecting primary data (what is provided by users and ends up in indexes and
> heaps) is valuable even if we don’t (yet..) have protection for other parts of
> the system. Reducing the size of the attack vector is absolutely useful,
> especially when it’s such a large amount of the data in the system. Yes, we
> should, and will, continue to improve- as we do with many features, but we
> don’t need to wait for perfection to include this feature, just as with RLS and
> numerous other features we have. 

The issue is that you needed a certain type of user with a certain type
of access to break RLS, while for this, writing to PGDATA is the simple
case for all the breakage, and the thing we are protecting with
authentication.

This goes back to the “if it isn’t perfect then it’s useless” argument … but that’s exactly the discussion which was had around RLS and ultimately we decided that RLS was still useful even with the leaks- and our users accepted that also and have benefitted from it ever since it was included in core. The same exists here- yes, more needs to be done than the absolute simplest “make install” to have the system be secure (not unlike today with our defaults from a source build with “make install”..) but at least with this capability included it’s possible, and we can write “securing PostgreSQL” documentation on how to, whereas without it there is simply no way to address the attack vectors I’ve articulated here. 

>     >     > We, as a community, are clearly losing value by lack of this
>     capability,
>     >     if by
>     >     > no other measure than simply the numerous users of the commercial
>     >     > implementations feeling that they simply can’t use PG without this
>     >     feature, for
>     >     > whatever their reasoning.
>     >
>     >     That is true, but I go back to my concern over useful feature vs.
>     check
>     >     box.
>     >
>     > While it’s easy to label something as checkbox, I don’t feel we have been
>     fair
>
>     No, actually, it isn't.  I am not sure why you are saying that.
>
> I’m confused as to what is required to label a feature as a “checkbox” feature
> then. What did you us to make that determination of this feature?  I’m happy to
> be wrong here. 

I don't see the point in me continuing to reply here.  You just seem to
continue asking questions without actually thinking of what I am saying,
and hope I get tired or something.

I hope we have others who have a moment to chime in here and provide their viewpoints as I don’t feel this is an accurate representation of the discussion thus far. 

Thanks,

Stephen

Re: Moving forward with TDE

From
Chris Travers
Date:


On Tue, Mar 28, 2023 at 5:02 AM Stephen Frost <sfrost@snowman.net> wrote:

> There's clearly user demand for it as there's a number of organizations
> who have forks which are providing it in one shape or another.  This
> kind of splintering of the community is actually an actively bad thing
> for the project and is part of what killed Unix, by at least some pretty
> reputable accounts, in my view.

Yes, the number of commercial implementations of this is a concern.  Of
course, it is also possible that those commercial implementations are
meeting checkbox requirements rather than technical ones, and the
community has been hostile to check box-only features.

I’ve grown weary of this argument as the other major piece of work it was routinely applied to was RLS and yet that has certainly been seen broadly as a beneficial feature with users clearly leveraging it and in more than some “checkbox” way.

Indeed, it’s similar also in that commercial implementations were done of RLS while there were arguments made about it being a checkbox feature which were used to discourage it from being implemented in core.  Were it truly checkbox, I don’t feel we would have the regular and ongoing discussion about it on the lists that we do, nor see other tools built on top of PG which specifically leverage it. Perhaps there are truly checkbox features out there which we will never implement, but I’m (perhaps due to what my dad would call selective listening on my part, perhaps not) having trouble coming up with any presently. Features that exist in other systems that we don’t want?  Certainly. We don’t characterize those as simply “checkbox” though. Perhaps that’s in part because we provide alternatives- but that’s not the case here. We have no comparable way to have this capability as part of the core system.

We, as a community, are clearly losing value by lack of this capability, if by no other measure than simply the numerous users of the commercial implementations feeling that they simply can’t use PG without this feature, for whatever their reasoning.

I also think this is something of a problem because very few requirements are actually purely technical requirements, and I think the issue is that in many cases there are ways around the lack of the feature. 

So I would phrase this differently.  What is the value of doing this in core?

This dramatically simplifies the question of setting up a PostgreSQL environment that is properly protected with encryption at rest.  That in itself is valuable.  Today you can accomplish something similar with encrypted filesystems and encryption options in things like pgbackrest.  However these are many different pieces of a solution and missing up the setup of any one of them can compromise the data.  Having a single point of encryption and decryption means fewer opportunities to mess it up and that means less risk.  This in turn makes it easier to settle on using PostgreSQL.

There are certainly going to be those who approach encryption at rest as a checkbox item and who don't really care if there are holes in it.  But there are others who really should be concerned (and this is becoming a bigger issue where data privacy, PCI-DSS, and other requirements may come into play), and those need better tooling than we have.  I also think that as data privacy becomes a larger issue, this will become a larger topic.

 Anyway, my contribution to that question.

Best Wishes,
Chris Travers

Thanks,

Stephen


--
Best Wishes,
Chris Travers

Efficito:  Hosted Accounting and ERP.  Robust and Flexible.  No vendor lock-in.

Re: Moving forward with TDE

From
Chris Travers
Date:


On Tue, Mar 28, 2023 at 8:35 AM Bruce Momjian <bruce@momjian.us> wrote:
On Tue, Mar 28, 2023 at 02:03:50AM +0200, Stephen Frost wrote:
> The remote storage is certainly an independent system. Multi-mount LUNs are
> entirely possible in a SAN (and absolutely with NFS, or just the NFS server
> itself is compromised..), so while the attacker may not have any access to the
> database server itself, they may have access to these other systems, and that’s
> not even considering in-transit attacks which are also absolutely possible,
> especially with iSCSI or NFS. 
>
> I don’t understand what is being claimed that the remote storage is “not an
> independent system” based on my understanding of, eg, NFS. With NFS, a
> directory on the NFS server is exported and the client mounts that directory as
> NFS locally, all over a network which may or may not be secured against
> manipulation.  A user on the NFS server with root access is absolutely able to
> access and modify files on the NFS server trivially, even if they have no
> access to the PG server.  Would you explain what you mean?

The point is that someone could change values in the storage, pg_xact,
encryption settings, binaries, that would allow the attacker to learn
the encryption key.  This is not possible for two secure endpoints and
someone changing data in transit.  Yeah, it took me a while to
understand these boundaries too.

>     So the idea is that the backup user can be compromised without the data
>     being vulnerable --- makes sense, though that use-case seems narrow.
>
> That’s perhaps a fair consideration- but it’s clearly of enough value that many
> of our users are asking for it and not using PG because we don’t have it today.
> Ultimately though, this clearly makes it more than a “checkbox” feature. I hope
> we are able to agree on that now.

It is more than a check box feature, yes, but I am guessing few people
are wanting the this for the actual features beyond check box.

>     Yes, there is value beyond the check-box, but in most cases those
>     values are limited considering the complexity of the features, and the
>     check-box is what most people are asking for, I think.
>
> For the users who ask on the lists for this feature, regularly, how many don’t
> ask because they google or find prior responses on the list to the question of
> if we have this capability?  How do we know that their cases are “checkbox”? 

Because I have rarely heard people articulate the value beyond check
box.

I think there is value.  I am going to try to articulate a case for this here.

The first is that if people just want a "checkbox" then they can implement PostgreSQL in ways that have encryption at rest today.  This includes using LUKS and the encryption options in pgbackrest.  That's good enough for a checkbox.  It isn't good enough for a real, secured instance however.

There are a few problems with trying to do this for a secured instance.  The first is that you have multiple links in the encryption chain, and the failure of any one of them ill lead to cleartext exposure of data files.  This is not a problem for those who just want to tick a checkbox.  Also the fact that backups and main systems are separately encrypted there (if the backups are encrypted at all) means that people have to choose between complicating a restore process and simply ditching encryption on the backup, which makes the checkbox somewhat pointless.

Where I have usually seen this come up is in the question of "how do you prevent the problem of someone pulling storage devices from your servers and taking them away to compromise your data?"  Physical security comes into it but often times people want more than that as an answer.  I saw questions like that from external auditors when I was at Adjust.

If you want to actually address that problem, then the current tooling is quite cumbersome.  Yes you can do it, but it is very hard to make sure it has been fully secured and also very hard to monitor.  TDE would make the setup and verification of this much easier.  And in particular it solves a number of other issues that I can see arising from LUKS and similar approaches since it doesn't rely on the kernel to be able to translate plain text to and from cypher text.

I have actually worked with folks who have PII and need to protect it and who currently use LUKS and pg_backrest to do so.  I would be extremely happy to see TDE replace those for their needs.  I can imagine that those who hold high value data would use it as well instead of these other more error prone and less secure setups.
 

> Consider that there are standards groups which explicitly consider these attack
> vectors and consider them important enough to require mitigations to address
> those vectors. Do the end users of PG understand the attack vectors or why they
> matter?  Perhaps not, but just because they can’t articulate the reasoning does
> NOT mean that the attack vector doesn’t exist or that their environment is
> somehow immune to it- indeed, as the standards bodies surely know, the opposite
> is true- they’re almost certainly at risk of those attack vectors and therefore
> the standards bodies are absolutely justified in requiring them to provide a
> solution. Treating these users as unimportant because they don’t have the depth
> of understanding that we do or that the standards body does is not helping
> them- it’s actively driving them away from PG. 

Well, then who is going to explain them here, because I have not heard
them yet.

>     The RLS arguments were that queries could expoose some of the underlying
>     data, but in summary, that was considered acceptable.
>
> This is an excellent point- and dovetails very nicely into my argument that
> protecting primary data (what is provided by users and ends up in indexes and
> heaps) is valuable even if we don’t (yet..) have protection for other parts of
> the system. Reducing the size of the attack vector is absolutely useful,
> especially when it’s such a large amount of the data in the system. Yes, we
> should, and will, continue to improve- as we do with many features, but we
> don’t need to wait for perfection to include this feature, just as with RLS and
> numerous other features we have. 

The issue is that you needed a certain type of user with a certain type
of access to break RLS, while for this, writing to PGDATA is the simple
case for all the breakage, and the thing we are protecting with
authentication.

>     >     > We, as a community, are clearly losing value by lack of this
>     capability,
>     >     if by
>     >     > no other measure than simply the numerous users of the commercial
>     >     > implementations feeling that they simply can’t use PG without this
>     >     feature, for
>     >     > whatever their reasoning.
>     >
>     >     That is true, but I go back to my concern over useful feature vs.
>     check
>     >     box.
>     >
>     > While it’s easy to label something as checkbox, I don’t feel we have been
>     fair
>
>     No, actually, it isn't.  I am not sure why you are saying that.
>
> I’m confused as to what is required to label a feature as a “checkbox” feature
> then. What did you us to make that determination of this feature?  I’m happy to
> be wrong here. 

I don't see the point in me continuing to reply here.  You just seem to
continue asking questions without actually thinking of what I am saying,
and hope I get tired or something.

--
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Embrace your flaws.  They make you human, rather than perfect,
  which you will never be.


--
Best Wishes,
Chris Travers

Efficito:  Hosted Accounting and ERP.  Robust and Flexible.  No vendor lock-in.

Re: Moving forward with TDE [PATCH v3]

From
David Christensen
Date:
Greetings,

I am including an updated version of this patch series; it has been rebased onto 6ec62b7799 and reworked somewhat.

The patches are as follows:

0001 - doc updates
0002 - Basic key management and cipher support
0003 - Backend-related changes to support heap encryption
0004 - modifications to bin tools and programs to manage key rotation and add other knowledge
0005 - Encrypted/authenticated WAL

These are very broad strokes at this point and should be split up a bit more to make things more granular and easier to review, but I wanted to get this update out.

Of note, the encryption supported in this release as exposed to the heap-level is AES-XTS-128 and AES-XTS-256; there is built-in support for CTR and GCM, however based on other discussions related how to store the additional authenticated data on the page, GCM has been removed from the list of supported ciphers.  This could certainly be enabled in the future, however the other pieces that this patchset provides would enable TDE without the additional block size/storage concerns.

Best,

David
Attachment

Re: Moving forward with TDE [PATCH v3]

From
Bruce Momjian
Date:
On Tue, Oct 31, 2023 at 04:23:17PM -0500, David Christensen wrote:
> Greetings,
> 
> I am including an updated version of this patch series; it has been rebased
> onto 6ec62b7799 and reworked somewhat.
> 
> The patches are as follows:
> 
> 0001 - doc updates
> 0002 - Basic key management and cipher support
> 0003 - Backend-related changes to support heap encryption
> 0004 - modifications to bin tools and programs to manage key rotation and add
> other knowledge
> 0005 - Encrypted/authenticated WAL
> 
> These are very broad strokes at this point and should be split up a bit more to
> make things more granular and easier to review, but I wanted to get this update
> out.

This lacks temp table file encryption, right?

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Only you can decide what is important to you.



Re: Moving forward with TDE [PATCH v3]

From
David Christensen
Date:
On Tue, Oct 31, 2023 at 4:30 PM Bruce Momjian <bruce@momjian.us> wrote:
On Tue, Oct 31, 2023 at 04:23:17PM -0500, David Christensen wrote:
> Greetings,
>
> I am including an updated version of this patch series; it has been rebased
> onto 6ec62b7799 and reworked somewhat.
>
> The patches are as follows:
>
> 0001 - doc updates
> 0002 - Basic key management and cipher support
> 0003 - Backend-related changes to support heap encryption
> 0004 - modifications to bin tools and programs to manage key rotation and add
> other knowledge
> 0005 - Encrypted/authenticated WAL
>
> These are very broad strokes at this point and should be split up a bit more to
> make things more granular and easier to review, but I wanted to get this update
> out.

This lacks temp table file encryption, right?

Temporary /files/ are handled in a different patch set and are not included here (not sure of the status of integrating at this point). I  believe that this patch should handle temporary heap files just fine since they go through the Page API.

Re: Moving forward with TDE [PATCH v3]

From
Bruce Momjian
Date:
On Tue, Oct 31, 2023 at 04:32:38PM -0500, David Christensen wrote:
> On Tue, Oct 31, 2023 at 4:30 PM Bruce Momjian <bruce@momjian.us> wrote:
> Temporary /files/ are handled in a different patch set and are not included
> here (not sure of the status of integrating at this point). I  believe that
> this patch should handle temporary heap files just fine since they go through
> the Page API.

Yes, that's what I thought, thanks.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Only you can decide what is important to you.



Re: Moving forward with TDE [PATCH v3]

From
Matthias van de Meent
Date:
On Tue, 31 Oct 2023 at 22:23, David Christensen
<david.christensen@crunchydata.com> wrote:
>
> Greetings,
>
> I am including an updated version of this patch series; it has been rebased onto 6ec62b7799 and reworked somewhat.
>
> The patches are as follows:
>
> 0001 - doc updates
> 0002 - Basic key management and cipher support
> 0003 - Backend-related changes to support heap encryption

I'm quite surprised at the significant number of changes being made
outside the core storage manager files. I thought that changing out
mdsmgr with an encrypted smgr (that could wrap mdsmgr if so desired)
would be the most obvious change to implement cluster-wide encryption
with the least code touched, as relations don't need to know whether
the files they're writing are encrypted, right? Is there a reason to
not implement this at the smgr level that I overlooked in the
documentation of these patches?

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)



Re: Moving forward with TDE [PATCH v3]

From
Andres Freund
Date:
Hi,

On 2023-10-31 16:23:17 -0500, David Christensen wrote:
> The patches are as follows:
>
> 0001 - doc updates
> 0002 - Basic key management and cipher support
> 0003 - Backend-related changes to support heap encryption
> 0004 - modifications to bin tools and programs to manage key rotation and
> add other knowledge
> 0005 - Encrypted/authenticated WAL
>
> These are very broad strokes at this point and should be split up a bit
> more to make things more granular and easier to review, but I wanted to get
> this update out.

Yes, particularly 0003 really needs to be split - as is it's not easily
reviewable.



> From 327e86d52be1df8de9c3a324cb06b85ba5db9604 Mon Sep 17 00:00:00 2001
> From: David Christensen <david@pgguru.net>
> Date: Fri, 29 Sep 2023 15:16:00 -0400
> Subject: [PATCH v3 5/5] Add encrypted/authenticated WAL
>
> When using an encrypted cluster, we need to ensure that the WAL is also
> encrypted. While we could go with an page-based approach, we use instead a
> per-record approach, using GCM for the encryption method and storing the AuthTag
> in the xl_crc field.
>
> We change the xl_crc field to instead be a union struct, with a compile-time
> adjustable size to allow us to customize the number of bytes allocated to the
> GCM authtag.  This allows us to easily adjust the size of bytes needed to
> support our authentication.  (Testing has included up to 12, but leaving at this
> point to 4 due to keeping the size of the WAL records the same.)

Ugh, that'll be quite a bit of overhead in some workloads... You can't really
use such a build for non-encrypted workloads, making this a not very
deployable path...


> @@ -905,20 +905,28 @@ XLogInsertRecord(XLogRecData *rdata,
>      {
>          /*
>           * Now that xl_prev has been filled in, calculate CRC of the record
> -         * header.
> +         * header.  If we are using encrypted WAL, this CRC is overwritten by
> +         * the authentication tag, so just zero
>           */
> -        rdata_crc = rechdr->xl_crc;
> -        COMP_CRC32C(rdata_crc, rechdr, offsetof(XLogRecord, xl_crc));
> -        FIN_CRC32C(rdata_crc);
> -        rechdr->xl_crc = rdata_crc;
> +        if (!encrypt_wal)
> +        {
> +            rdata_crc = rechdr->xl_integrity.crc;
> +            COMP_CRC32C(rdata_crc, rechdr, offsetof(XLogRecord, xl_integrity.crc));
> +            FIN_CRC32C(rdata_crc);
> +            rechdr->xl_integrity.crc = rdata_crc;
> +        }
> +        else
> +            memset(&rechdr->xl_integrity, 0, sizeof(rechdr->xl_integrity));

Why aren't you encrypting most of the data here?  Just as for CRC computation,
encrypting a large record in XLOG_BLCKSZ


>   * XLogRecordDataHeaderLong structs all begin with a single 'id' byte. It's
>   * used to distinguish between block references, and the main data structs.
>   */
> +
> +#define XL_AUTHTAG_SIZE 4
> +#define XL_HEADER_PAD 2
> +
>  typedef struct XLogRecord
>  {
>      uint32        xl_tot_len;        /* total len of entire record */
> @@ -45,14 +49,16 @@ typedef struct XLogRecord
>      XLogRecPtr    xl_prev;        /* ptr to previous record in log */
>      uint8        xl_info;        /* flag bits, see below */
>      RmgrId        xl_rmid;        /* resource manager for this record */
> -    /* 2 bytes of padding here, initialize to zero */
> -    pg_crc32c    xl_crc;            /* CRC for this record */
> -
> +    uint8       xl_pad[XL_HEADER_PAD];        /* required alignment padding */

What does "required" mean here? And why is this defined in a separate define?
And why do we need the explicit field here at all? The union will still
provide sufficient alignment for a uint32.

It also doesn't seem right to remove the comment about needing to zero out the
space.


> From 7557acf60f52da4a86fd9f902bab4804c028dd4b Mon Sep 17 00:00:00 2001
> From: David Christensen <david.christensen@crunchydata.com>
> Date: Tue, 31 Oct 2023 15:24:02 -0400
> Subject: [PATCH v3 3/5] Backend-related changes




> --- a/src/backend/access/heap/rewriteheap.c
> +++ b/src/backend/access/heap/rewriteheap.c
> @@ -323,6 +323,11 @@ end_heap_rewrite(RewriteState state)
>                          state->rs_buffer,
>                          true);
>
> +        PageEncryptInplace(state->rs_buffer, MAIN_FORKNUM,
> +                           RelationIsPermanent(state->rs_new_rel),
> +                           state->rs_blockno,
> +                           RelationGetSmgr(state->rs_new_rel)->smgr_rlocator.locator.relNumber
> +            );
>          PageSetChecksumInplace(state->rs_buffer, state->rs_blockno);
>
>          smgrextend(RelationGetSmgr(state->rs_new_rel), MAIN_FORKNUM,

I don't think it's ok to have to make such changes in a bunch of places. I
think we need to add an abstraction layer between smgr and code like this
first.  Luckily I think Heikki was working abstracting away some of these
direct smgr* uses...



> +Architecture
> +------------
> +
> +Fundamentally, cluster file encryption must store data in a file system
> +in such a way that the keys required to decrypt the file system data can
> +only be accessed using somewhere outside of the file system itself.  The
> +external requirement can be someone typing in a passphrase, getting a
> +key from a key management server (KMS), or decrypting a key stored in
> +the file system using a hardware security module (HSM).  The current
> +architecture supports all of these methods, and includes sample scripts
> +for them.
> +
> +The simplest method for accessing data keys using some external
> +requirement would be to retrieve all data encryption keys from a KMS.
> +However, retrieved keys would still need to be verified as valid.  This
> +method also introduces unacceptable complexity for simpler use-cases,
> +like user-supplied passphrases or HSM usage.  External key rotation
> +would also be very hard since it would require re-encrypting all the
> +file system data with the new externally-stored keys.
> +
> +For these reason, a two-tiered architecture is used, which uses two
> +types of encryption keys: a key encryption key (KEK) and data encryption
> +keys (DEK). The KEK should not be present unencrypted in the file system
> +--- it should be supplied the user, stored externally (e.g., in a KMS)

*by* the user?

> +or stored in the file system encrypted with a HSM (e.g., PIV device).
> +The DEK is used to encrypt database files and is stored in the same file
> +system as the database but is encrypted using the KEK.  Because the DEK
> +is encrypted, its storage in the file system is no more of a security
> +weakness and the storage of the encrypted database files in the same
> +file system.

As is this paragraph doesn't really follow from the prior paragraph for
me. That a KMS would be hard to use isn't obviously related to splitting the
KEK and the DEK.



> +Implementation
> +--------------
> +
> +To enable cluster file encryption, the initdb option
> +--cluster-key-command must be used, which specifies a command to
> +retrieve the KEK.

FWIW, I think "cluster file encryption" is somewhat ambiguous. That could also
mean encrypting on the file system level or such.


> initdb records the cluster_key_command in
> +postgresql.conf.  Every time the KEK is needed, the command is run and
> +must return 64 hex characters which are decoded into the KEK.  The
> +command is called twice during initdb, and every time the server starts.
> +initdb also sets the encryption method in controldata during server
> +bootstrap.
> +
> +initdb runs "postgres --boot", which calls function
> +kmgr.c::BootStrapKmgr(), which calls the cluster key command.  The
> +cluster key command returns a KEK which is used to encrypt random bytes
> +for each DEK and writes them to the file system by
> +kmgr.c::KmgrWriteCryptoKeys() (unless --copy-encryption-keys is used).
> +Currently the DEK files are 0 and 1 and are stored in
> +$PGDATA/pg_cryptokeys/live.  The wrapped DEK files use Key Wrapping with
> +Padding which verifies the validity of the KEK.
> +
> +initdb also does a non-boot backend start which calls
> +kmgr.c::InitializeKmgr(), which calls the cluster key command a second
> +time.  This decrypts/unwraps the DEK keys and stores them in the shared
> +memory structure KmgrShmem. This step also happens every time the server
> +starts. Later patches will use the keys stored in KmgrShmem to
> +encrypt/decrypt database files.  KmgrShmem is erased via
> +explicit_bzero() on server shutdown.

I think this encodes too many details of how initdb works today. It seems
likely that nobody adding or removing a restart will think of updating this
file - nor should they have to.  I'd just say that initdb starts/stops
postgres multiple times.



> +Initialization Vector
> +---------------------
> +
> +Nonce means "number used once". An Initialization Vector (IV) is a
> +specific type of nonce. That is, unique but not necessarily random or
> +secret, as specified by the NIST
> +(https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-38a.pdf).
> +To generate unique IVs, the NIST recommends two methods:
> +
> +    The first method is to apply the forward cipher function, under
> +    the same key that is used for the encryption of the plaintext,
> +    to a nonce. The nonce must be a data block that is unique to
> +    each execution of the encryption operation. For example, the
> +    nonce may be a counter, as described in Appendix B, or a message
> +    number. The second method is to generate a random data block
> +    using a FIPS-approved random number generator.
> +
> +We will use the first method to generate IVs. That is, select nonce
> +carefully and use a cipher with the key to make it unique enough to use
> +as an IV. The nonce selection for buffer encryption and WAL encryption
> +are described below.
> +
> +If the IV was used more than once with the same key (and we only use one
> +data encryption key), changes in the unencrypted data would be visible
> +in the encrypted data.
> +
> +IV for Heap/Index Encryption
> +- - - - - - - - - - - - - -
> +
> +To create the 16-byte IV needed by AES for each page version, we will
> +use the page LSN (8 bytes) and page number (4 bytes).

I still am quite quite unconvinced that using the LSN as a nonce is a good
design decision.

- LSNs can end up being reused after crash restarts
- the LSN does not contain the timeline ID, if a standby is promoted, two
  systems can be using the same LSNs
- The LSN does *NOT* actually change every time a page is modified. Even with
  wal_log_hint_bits, only the first hint bit modification to a page in a
  checkpoint cycles will cause WAL writes - and changing that would have
  a quite substantial overhead.


>  The LSN is ideal for use in the IV because it is +always increasing, and is
> changed every time a page is updated.  The +same LSN is never used for two
> relations with different page contents.

As mentioned above, this is not true - the LSN does *NOT* change every time
the page is updated.


> +However, the same LSN can be used in multiple pages in the same relation
> +--- this can happen when a heap update expires an old tuple and adds a
> +new tuple to another page.  By adding the page number to the IV, we keep
> +the IV unique.

There's many other ways that can happen.


> +CREATE DATABASE can be run with two different strategies: FILE_COPY or
> +WAL_LOG.  If using WAL_LOG, the heap/index files are automatically
> +rewritten with new LSNs as part of the copy operation and will get new
> +IVs automatically.
> +
> +This approach still works with the older FILE_COPY stragegy; by not
> +using the database id in the IV, CREATE DATABASE can copy the heap/index
> +files from the old database to a new one without decryption/encryption.
> +Both page copies are valid.  Once a database changes its pages, it gets
> +new LSNs, and hence new IV.
> +
> +As part of WAL logging, every change of a WAL-logged page gets a new
> +LSN, and therefore a new IV automatically.
> +
> +However, the LSN must then be visible on encrypted pages, so we will not
> +encrypt the LSN on the page. We will also not encrypt the CRC so
> +pg_checksums can still check pages offline without access to the keys.

s/crc/checksum/? The data-page-level checksum isn't a CRC.


> +Non-Permanent Relations
> +- - - - - - - - - - - -
> +
> +To avoid the overhead of generating WAL for non-permanent (unlogged and
> +temporary) relations, we assign fake LSNs that are derived from a
> +counter via xlog.c::GetFakeLSNForUnloggedRel().  (GiST also uses this
> +counter for LSNs.)  We also set a bit in the IV so the use of the same
> +value for WAL (real) and fake LSNs will still generate unique IVs.  Only
> +main forks are encrypted, not init, vm, or fsm files.

Why aren't other forks encrypted? This seems like a very odd design to me.


> +Hint Bits
> +- - - - -
> +
> +For hint bit changes, the LSN normally doesn't change, which is a
> +problem.  By enabling wal_log_hints, you get full page writes to the WAL
> +after the first hint bit change of the checkpoint.  This is useful for
> +two reasons.  First, it generates a new LSN, which is needed for the IV
> +to be secure.  Second, full page images protect against torn pages,
> +which is an even bigger requirement for encryption because the new LSN
> +is re-encrypting the entire page, not just the hint bit changes.  You
> +can safely lose the hint bit changes, but you need to use the same LSN
> +to decrypt the entire page, so a torn page with an LSN change cannot be
> +decrypted.  To prevent this, wal_log_hints guarantees that the
> +pre-hint-bit version (and previous LSN version) of the page is restored.
> +
> +However, if a hint-bit-modified page is written to the file system
> +during a checkpoint, and there is a later hint bit change switching the
> +same page from clean to dirty during the same checkpoint, we need a new
> +LSN, and wal_log_hints doesn't give us a new LSN here.  The fix for this
> +is to update the page LSN by writing a dummy WAL record via
> +xloginsert.c::LSNForEncryption() in such cases.

Ugh, so that's really the plan. That's a substantial overhead in some common
scenarios.

...

Greetings,

Andres Freund



Re: Moving forward with TDE [PATCH v3]

From
Andres Freund
Date:
Hi,

On 2023-11-02 22:09:40 +0100, Matthias van de Meent wrote:
> I'm quite surprised at the significant number of changes being made
> outside the core storage manager files. I thought that changing out
> mdsmgr with an encrypted smgr (that could wrap mdsmgr if so desired)
> would be the most obvious change to implement cluster-wide encryption
> with the least code touched, as relations don't need to know whether
> the files they're writing are encrypted, right? Is there a reason to
> not implement this at the smgr level that I overlooked in the
> documentation of these patches?

You can't really implement encryption transparently inside an smgr without
significant downsides. You need a way to store an initialization vector
associated with the page (or you can store that elsewhere, but then you've
doubled the worst cse amount of random reads/writes). The patch uses the LSN
as the IV (which I doubt is a good idea). For authenticated encryption further
additional storage space is required.

To be able to to use the LSN as the IV, the patch needs to ensure that the LSN
increases in additional situations (a more aggressive version of
wal_log_hint_bits) - which can't be done below smgr, where we don't know about
what WAL logging was done. Nor can you easily just add space on the page below
md.c, for the purpose of storing an LSN independent IV and the authentication
data.

You could decide that the security that XTS provides is sufficient (XTS could
be used without a real IV, with some downsides), but we'd pretty much prevent
ourselves from ever implementing authenticated encryption.

Greetings,

Andres Freund



Re: Moving forward with TDE [PATCH v3]

From
Andres Freund
Date:
On 2023-11-02 19:32:28 -0700, Andres Freund wrote:
> > From 327e86d52be1df8de9c3a324cb06b85ba5db9604 Mon Sep 17 00:00:00 2001
> > From: David Christensen <david@pgguru.net>
> > Date: Fri, 29 Sep 2023 15:16:00 -0400
> > Subject: [PATCH v3 5/5] Add encrypted/authenticated WAL
> >
> > When using an encrypted cluster, we need to ensure that the WAL is also
> > encrypted. While we could go with an page-based approach, we use instead a
> > per-record approach, using GCM for the encryption method and storing the AuthTag
> > in the xl_crc field.

What was the reason for this decision?

?



Re: Moving forward with TDE [PATCH v3]

From
Matthias van de Meent
Date:
On Sat, 4 Nov 2023 at 03:38, Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2023-11-02 22:09:40 +0100, Matthias van de Meent wrote:
> > I'm quite surprised at the significant number of changes being made
> > outside the core storage manager files. I thought that changing out
> > mdsmgr with an encrypted smgr (that could wrap mdsmgr if so desired)
> > would be the most obvious change to implement cluster-wide encryption
> > with the least code touched, as relations don't need to know whether
> > the files they're writing are encrypted, right? Is there a reason to
> > not implement this at the smgr level that I overlooked in the
> > documentation of these patches?
>
> You can't really implement encryption transparently inside an smgr without
> significant downsides. You need a way to store an initialization vector
> associated with the page (or you can store that elsewhere, but then you've
> doubled the worst cse amount of random reads/writes). The patch uses the LSN
> as the IV (which I doubt is a good idea). For authenticated encryption further
> additional storage space is required.

I am unaware of any user of the smgr API that doesn't also use the
buffer cache, and thus implicitly the Page layout with PageHeader
[^1]. The API of smgr is also tailored to page-sized quanta of data
with mostly relation-level information. I don't see why there would be
a veil covering the layout of Page for smgr when all other information
already points to the use of PageHeader and Page layouts. In my view,
it would even make sense to allow the smgr to get exclusive access to
some part of the page in the current Page layout.

Yes, I agree that there will be an impact on usable page size if you
want authenticated encryption, and that AMs will indeed need to
account for storage space now being used by the smgr - inconvenient,
but it serves a purpose. That would happen regardless of whether smgr
or some higher system decides where to store the data for encryption -
as long as it is on the page, the AM effectively can't use those
bytes.
But I'd say that's best solved by making the Page documentation and
PageInit API explicit about the potential use of that space by the
chosen storage method (encrypted, plain, ...) instead of requiring the
various AMs to manually consider encryption when using Postgres' APIs
for writing data to disk without hitting shared buffers; page space
management is already a task of AMs, but handling the actual
encryption is not.

Should the AM really care whether the data on disk is encrypted or
not? I don't think so. When the disk contains encrypted bytes, but
smgrread() and smgrwrite() both produce and accept plaintext data,
who's going to complain? Requiring AMs to be mindful about encryption
on all common paths only adds pitfalls where encryption would be
forgotten by the developer of AMs in one path or another.

> To be able to to use the LSN as the IV, the patch needs to ensure that the LSN
> increases in additional situations (a more aggressive version of
> wal_log_hint_bits) - which can't be done below smgr, where we don't know about
> what WAL logging was done. Nor can you easily just add space on the page below
> md.c, for the purpose of storing an LSN independent IV and the authentication
> data.

I think that getting PageInit to allocate the smgr-specific area would
take some effort, too (which would potentially require adding some
relational context to PageInit, so that it knows which page of which
relation it is going to initialize), but IMHO that would be more
natural than requiring all index and table AMs to be aware the actual
encryption of its pages and require manual handling of that encryption
when the page needs to be written to disk, when it otherwise already
conforms to the various buffer management and file extension APIs
currently in use in PostgreSQL. I would expect "transparent" data
encryption to be handled at the file write layer (i.e. smgr), not
inside the AMs.

Kind regards,

Matthias van de Meent

[^1] ReadBuffer_common uses PageIsVerifiedExtended which verifies that
a page conforms with Postgres' Page layout if checksums are enabled.
Furthermore, all builtin index AMs utilize pd_special, further
implying the use of a PageInit/PageHeader-based page layout.
Additionally, the heap tableAM also complies, and both FSM and VM also
use postgres' Page layout.
As for other AMs that I could check: bloom, rum, and pgvector's
ivfflat and hnsw all use page layouts.



Re: Moving forward with TDE [PATCH v3]

From
Stephen Frost
Date:
Greetings,

Thanks for your feedback on this.

* Andres Freund (andres@anarazel.de) wrote:
> I still am quite quite unconvinced that using the LSN as a nonce is a good
> design decision.

This is a really important part of the overall path to moving this
forward, so I wanted to jump to it and have a specific discussion
around this.  I agree that there are downsides to using the LSN, some of
which we could possibly address (eg: include the timeline ID in the IV),
but others that would be harder to deal with.

The question then is- what's the alternative?

One approach would be to change the page format to include space for an
explicit nonce.  I don't see the community accepting such a new page
format as the only format we support though as that would mean no
pg_upgrade support along with wasted space if TDE isn't being used.
Ideally, we'd instead be able to support multiple page formats where
users could decide when they create their cluster what features they
want- and luckily we even have such an effort underway with patches
posted for review [1].  Certainly, with the base page-special-feature
patch, we could have an option for users to choose that they want a
better nonce than the LSN, or we could bundle that assumption in with,
say, the authenticated-encryption feature (if you want authenticated
encryption, then you want more from the encryption system than the
basics, and therefore we presume you also want a better nonce than the
LSN).

Another approach would be a separate fork, but that then has a number of
downsides too- every write has to touch that too, and a single page of
nonces would cover a pretty large number of pages also.

Ultimately, I agree with you that the LSN isn't perfect and we really
shouldn't be calling it 'ideal' as it isn't, and we can work to fix that
language in the patch, but the lack of any alternative being proposed
that might be acceptable makes this feel a bit like rock management [2].

My apologies if there's something good that's already been specifically
pushed and I just missed it; if so, a link to that suggestion and
discussion would be greatly appreciated.

Thanks again!

Stephen

[1]: https://commitfest.postgresql.org/45/3986/
[2]: https://en.wikipedia.org/wiki/Wikipedia:Bring_me_a_rock ; though
that isn't great for a quick summary (which I tried to find on an
isolated page somewhere and didn't).

The gist is, without a suggestion of things to try, we're left
to our own devices to try and figure out things which might be
successful, only to have those turned down too when we come back with
them, see [1] for what feels like an example of this.  Given your
feedback overall, which I'm very thankful for, I'm hopeful that you see
that this is, indeed, a useful feature that people are asking for and
therefore are willing to spend some time on it, but if the feedback is
that nothing on the page is acceptable to use for the nonce, we can't
put the nonce somewhere else, and we can't change the page format, then
everything else is just moving deck chairs around on the titanic that
has been this effort.

Attachment

Re: Moving forward with TDE [PATCH v3]

From
Bruce Momjian
Date:
On Thu, Nov  2, 2023 at 07:32:28PM -0700, Andres Freund wrote:
> On 2023-10-31 16:23:17 -0500, David Christensen wrote:
> > +Implementation
> > +--------------
> > +
> > +To enable cluster file encryption, the initdb option
> > +--cluster-key-command must be used, which specifies a command to
> > +retrieve the KEK.
> 
> FWIW, I think "cluster file encryption" is somewhat ambiguous. That could also
> mean encrypting on the file system level or such.

We could call it:

*  cluster data file encryption
*  cluster data encryption
*  database cluster encryption

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Only you can decide what is important to you.



Re: Moving forward with TDE [PATCH v3]

From
Bruce Momjian
Date:
On Mon, Nov  6, 2023 at 09:56:37AM -0500, Stephen Frost wrote:
> The gist is, without a suggestion of things to try, we're left
> to our own devices to try and figure out things which might be
> successful, only to have those turned down too when we come back with
> them, see [1] for what feels like an example of this.  Given your
> feedback overall, which I'm very thankful for, I'm hopeful that you see
> that this is, indeed, a useful feature that people are asking for and
> therefore are willing to spend some time on it, but if the feedback is
> that nothing on the page is acceptable to use for the nonce, we can't
> put the nonce somewhere else, and we can't change the page format, then
> everything else is just moving deck chairs around on the titanic that
> has been this effort.

Yeah, I know the feeling, though I thought XTS was immune enough to
nonce/LSN reuse that it was acceptable.

What got me sunk on the feature was the complexity of adding temporary
file encryption support and that tipped the scales in the negative for
me in community value of the feature vs. added complexity. (Yeah, I used
a Titanic reference in the last sentence. ;-) )  However, I am open to
the community value and complexity values changing over time.  My blog
post on the topic:

    https://momjian.us/main/blogs/pgblog/2023.html#October_19_2023

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Only you can decide what is important to you.



Re: Moving forward with TDE [PATCH v3]

From
David Christensen
Date:
Hi, thanks for the detailed feedback here.

I do think it's worth addressing the question Stephen raised as far as what we use for the IV[1]; whether LSN or something else entirely, and if so what.  The choice of LSN here is fairly fundamental to the existing implementation, so if we decide to do something different some of this might be moot.

Best,

David

Re: Moving forward with TDE [PATCH v3]

From
David Christensen
Date:
On Fri, Nov 3, 2023 at 9:53 PM Andres Freund <andres@anarazel.de> wrote:
On 2023-11-02 19:32:28 -0700, Andres Freund wrote:
> > From 327e86d52be1df8de9c3a324cb06b85ba5db9604 Mon Sep 17 00:00:00 2001
> > From: David Christensen <david@pgguru.net>
> > Date: Fri, 29 Sep 2023 15:16:00 -0400
> > Subject: [PATCH v3 5/5] Add encrypted/authenticated WAL
> >
> > When using an encrypted cluster, we need to ensure that the WAL is also
> > encrypted. While we could go with an page-based approach, we use instead a
> > per-record approach, using GCM for the encryption method and storing the AuthTag
> > in the xl_crc field.

What was the reason for this decision?
 
This was mainly to prevent IV reuse by using a per-record encryption rather than per-page, since partial writes out on the WAL buffer would result in reuse there.  This was somewhat of an experiment since authenticated data per record was basically equivalent in function to the CRC.

There was a switch here so normal clusters use the crc field with the existing CRC implementation, only encrypted clusters use this alternate approach.

Re: Moving forward with TDE [PATCH v3]

From
Stephen Frost
Date:
Greetings,

* Bruce Momjian (bruce@momjian.us) wrote:
> On Mon, Nov  6, 2023 at 09:56:37AM -0500, Stephen Frost wrote:
> > The gist is, without a suggestion of things to try, we're left
> > to our own devices to try and figure out things which might be
> > successful, only to have those turned down too when we come back with
> > them, see [1] for what feels like an example of this.  Given your
> > feedback overall, which I'm very thankful for, I'm hopeful that you see
> > that this is, indeed, a useful feature that people are asking for and
> > therefore are willing to spend some time on it, but if the feedback is
> > that nothing on the page is acceptable to use for the nonce, we can't
> > put the nonce somewhere else, and we can't change the page format, then
> > everything else is just moving deck chairs around on the titanic that
> > has been this effort.
>
> Yeah, I know the feeling, though I thought XTS was immune enough to
> nonce/LSN reuse that it was acceptable.

Ultimately it depends on the attack vector you're trying to address, but
generally, I think you're right about the XTS tweak reuse not being that
big of a deal.  XTS isn't CTR or GCM.

With FDE (full disk encryption) you're expecting the attacker to steal
the physical laptop, hard drive, etc, generally, and so the downside of
using the same tweak with XTS over and over again isn't that bad (and is
what most FDE solutions do, aiui, by simply using the sector number; we
could do something similar to that by using the relfilenode + block
number) because that re-use is a problem if the attacker is able to see
multiple copies of the same block over time where the block has been
encrypted with different data but the same key and tweak.

Using the LSN actually is better than what the FDE solutions do because
the LSN varies much more often than the sector number.  Sure, it doesn't
change with every write and maybe an attacker could glean something from
that, but that's quite narrow.  The downside from the LSN based approach
with XTS is probably more that using the LSN means that we can't encrypt
the LSN itself and that is a leak too- but then again, we leak that
through the simple WAL filenames too, to some extent, so it doesn't
strike me as a huge issue.

XTS as a block cipher doesn't suffer from the IV-reuse issue that you
have with streaming ciphers where the same key+IV and different data
leads to being able to trivally retrive the plaintext though and I worry
that maybe that's what people were thinking.

The README and comments I don't think were terribly clear on this and I
think may have even been from back when CTR was being considered, where
IV reuse would have resulted in plaintext being trivially available.

> What got me sunk on the feature was the complexity of adding temporary
> file encryption support and that tipped the scales in the negative for
> me in community value of the feature vs. added complexity. (Yeah, I used
> a Titanic reference in the last sentence. ;-) )  However, I am open to
> the community value and complexity values changing over time.  My blog
> post on the topic:

We do need to address the temporary file situation too and we do have a
bit of an issue that how we deal with temporary files today in PG isn't
very consistent and there's too many ways to do that.  There's a patch
that works on that, though it has some bitrot that we're working on
addressing currently.

There is value in simply fixing that situation wrt temporary file
management independent of encryption, though of course then encryption
of those temporary files becomes much simpler.

Thanks,

Stephen

Attachment

Re: Moving forward with TDE [PATCH v3]

From
Andres Freund
Date:
Hi,

On 2023-11-06 09:56:37 -0500, Stephen Frost wrote:
> * Andres Freund (andres@anarazel.de) wrote:
> > I still am quite quite unconvinced that using the LSN as a nonce is a good
> > design decision.
> 
> This is a really important part of the overall path to moving this
> forward, so I wanted to jump to it and have a specific discussion
> around this.  I agree that there are downsides to using the LSN, some of
> which we could possibly address (eg: include the timeline ID in the IV),
> but others that would be harder to deal with.

> The question then is- what's the alternative?
> 
> One approach would be to change the page format to include space for an
> explicit nonce.  I don't see the community accepting such a new page
> format as the only format we support though as that would mean no
> pg_upgrade support along with wasted space if TDE isn't being used.

Right.


> Ideally, we'd instead be able to support multiple page formats where
> users could decide when they create their cluster what features they
> want- and luckily we even have such an effort underway with patches
> posted for review [1].

I think there are some details wrong with that patch - IMO the existing macros
should just continue to work as-is and instead the places that want the more
narrow definition should be moved to the new macros and it changes places that
should continue to use compile time constants - but it doesn't seem like a
fundamentally bad idea to me.  I certainly like it much better than making the
page size runtime configurable.

(I'll try to reply with the above points to [1])


> Certainly, with the base page-special-feature patch, we could have an option
> for users to choose that they want a better nonce than the LSN, or we could
> bundle that assumption in with, say, the authenticated-encryption feature
> (if you want authenticated encryption, then you want more from the
> encryption system than the basics, and therefore we presume you also want a
> better nonce than the LSN).

I don't think we should support using the LSN as a nonce if we have an
alternative. The cost and complexity overhead is just not worth it.  Yes,
it'll be harder for users to migrate to encryption, but adding complexity
elsewhere in the system to get an inferior result isn't worth it.


> Another approach would be a separate fork, but that then has a number of
> downsides too- every write has to touch that too, and a single page of
> nonces would cover a pretty large number of pages also.

Yea, the costs of doing so is nontrivial. If you were trying to implement
encryption on the smgr level - which I doubt you should but am not certain
about - my suggestion would be to interleave pages with metadata like nonces
and AEAD with the data pages. I.e. one metadata page would be followed by
  (BLCKSZ - SizeOfPageHeaderData) / (sizeof(nonce) + sizeof(AEAD))
pages containing actual relation data.  That way you still get decent locality
during scans and writes.

Relation forks were a mistake, we shouldn't use them in more places.


I think it'd be much better if we also encrypted forks, rather than just the
main fork...

Greetings,

Andres Freund



Re: Moving forward with TDE [PATCH v3]

From
Andres Freund
Date:
Hi,

On 2023-11-06 11:26:44 +0100, Matthias van de Meent wrote:
> On Sat, 4 Nov 2023 at 03:38, Andres Freund <andres@anarazel.de> wrote:
> > On 2023-11-02 22:09:40 +0100, Matthias van de Meent wrote:
> > > I'm quite surprised at the significant number of changes being made
> > > outside the core storage manager files. I thought that changing out
> > > mdsmgr with an encrypted smgr (that could wrap mdsmgr if so desired)
> > > would be the most obvious change to implement cluster-wide encryption
> > > with the least code touched, as relations don't need to know whether
> > > the files they're writing are encrypted, right? Is there a reason to
> > > not implement this at the smgr level that I overlooked in the
> > > documentation of these patches?
> >
> > You can't really implement encryption transparently inside an smgr without
> > significant downsides. You need a way to store an initialization vector
> > associated with the page (or you can store that elsewhere, but then you've
> > doubled the worst cse amount of random reads/writes). The patch uses the LSN
> > as the IV (which I doubt is a good idea). For authenticated encryption further
> > additional storage space is required.
> 
> I am unaware of any user of the smgr API that doesn't also use the
> buffer cache, and thus implicitly the Page layout with PageHeader
> [^1]

Everything indeed uses a PageHeader - but there are a number of places that do
*not* utilize pd_lower/upper/special. E.g. visibilitymap.c just assumes that
those fields are zero - and changing that wouldn't be trivial / free, because
we do a lot of bitmasking/shifting with constants derived from

#define MAPSIZE (BLCKSZ - MAXALIGN(SizeOfPageHeaderData))

which obviously wouldn't be constant anymore if you could reserve space on the
page.


> The API of smgr is also tailored to page-sized quanta of data
> with mostly relation-level information. I don't see why there would be
> a veil covering the layout of Page for smgr when all other information
> already points to the use of PageHeader and Page layouts. In my view,
> it would even make sense to allow the smgr to get exclusive access to
> some part of the page in the current Page layout.
> 
> Yes, I agree that there will be an impact on usable page size if you
> want authenticated encryption, and that AMs will indeed need to
> account for storage space now being used by the smgr - inconvenient,
> but it serves a purpose. That would happen regardless of whether smgr
> or some higher system decides where to store the data for encryption -
> as long as it is on the page, the AM effectively can't use those
> bytes.
> But I'd say that's best solved by making the Page documentation and
> PageInit API explicit about the potential use of that space by the
> chosen storage method (encrypted, plain, ...) instead of requiring the
> various AMs to manually consider encryption when using Postgres' APIs
> for writing data to disk without hitting shared buffers; page space
> management is already a task of AMs, but handling the actual
> encryption is not.

I don't particularly disagree with any detail here - but to me reserving space
for nonces etc at PageInit() time pretty much is the opposite of handling
encryption inside smgr.


> Should the AM really care whether the data on disk is encrypted or
> not? I don't think so. When the disk contains encrypted bytes, but
> smgrread() and smgrwrite() both produce and accept plaintext data,
> who's going to complain? Requiring AMs to be mindful about encryption
> on all common paths only adds pitfalls where encryption would be
> forgotten by the developer of AMs in one path or another.

I agree with that - I think the way the patch currently is designed is not
right.


There's other stuff you can't trivially do at the smgr level. E.g. if
checksums or encryption is enabled, you need to copy the buffer to compute
checksums / do IO if in shared buffers, because somebody could set a hint bit
even with just a shared content lock. But you don't need that when coming from
private buffers during index builds.



> I think that getting PageInit to allocate the smgr-specific area would
> take some effort, too (which would potentially require adding some
> relational context to PageInit, so that it knows which page of which
> relation it is going to initialize), but IMHO that would be more
> natural than requiring all index and table AMs to be aware the actual
> encryption of its pages and require manual handling of that encryption
> when the page needs to be written to disk, when it otherwise already
> conforms to the various buffer management and file extension APIs
> currently in use in PostgreSQL. I would expect "transparent" data
> encryption to be handled at the file write layer (i.e. smgr), not
> inside the AMs.

As mentioned above - I agree that the relevant code shouldn't be in index
AMs. But I somewhat doubt that smgr is the right level either. For one, the
place computing checksums needs awareness of locking / sharing semantics as
well as knowledge about WAL logging. That's IMO above smgr. For another, if we
ever got another smgr implementation - should it have to reimplement
encryption?

ISTM that there's a layer missing. Places bypassing bufmgr.c currently need
their own handling of checksums (and in the future encryption), because the
relevant bufmgr.c code can't be reached without pages in the buffer
pool. Which is why we have PageIsVerifiedExtended() and
PageSetChecksumInplace() calls in gist, hash, heapam, nbtree ... IMO when
checksums were added, we should have added the proper abstraction layer
instead of littering the code with redundant copies.


Greetings,

Andres Freund



Re: Moving forward with TDE [PATCH v3]

From
David Christensen
Date:
On Tue, Nov 7, 2023 at 6:47 PM Andres Freund <andres@anarazel.de> wrote:
Hi,

On 2023-11-06 11:26:44 +0100, Matthias van de Meent wrote:
> On Sat, 4 Nov 2023 at 03:38, Andres Freund <andres@anarazel.de> wrote:
> > On 2023-11-02 22:09:40 +0100, Matthias van de Meent wrote:
> > > I'm quite surprised at the significant number of changes being made
> > > outside the core storage manager files. I thought that changing out
> > > mdsmgr with an encrypted smgr (that could wrap mdsmgr if so desired)
> > > would be the most obvious change to implement cluster-wide encryption
> > > with the least code touched, as relations don't need to know whether
> > > the files they're writing are encrypted, right? Is there a reason to
> > > not implement this at the smgr level that I overlooked in the
> > > documentation of these patches?
> >
> > You can't really implement encryption transparently inside an smgr without
> > significant downsides. You need a way to store an initialization vector
> > associated with the page (or you can store that elsewhere, but then you've
> > doubled the worst cse amount of random reads/writes). The patch uses the LSN
> > as the IV (which I doubt is a good idea). For authenticated encryption further
> > additional storage space is required.
>
> I am unaware of any user of the smgr API that doesn't also use the
> buffer cache, and thus implicitly the Page layout with PageHeader
> [^1]

Everything indeed uses a PageHeader - but there are a number of places that do
*not* utilize pd_lower/upper/special. E.g. visibilitymap.c just assumes that
those fields are zero - and changing that wouldn't be trivial / free, because
we do a lot of bitmasking/shifting with constants derived from

#define MAPSIZE (BLCKSZ - MAXALIGN(SizeOfPageHeaderData))

which obviously wouldn't be constant anymore if you could reserve space on the
page.

While not constants, I was able to get this working with variable values here in a way that did not have the overhead of the original patch for the vismap specifically using Montgomery Multiplication for division/mod. This was actually the heaviest of the changes from moving to runtime-calculated, so we might be able to use this approach in this specific case even if only this change is required for this specific fork.
 
> The API of smgr is also tailored to page-sized quanta of data
> with mostly relation-level information. I don't see why there would be
> a veil covering the layout of Page for smgr when all other information
> already points to the use of PageHeader and Page layouts. In my view,
> it would even make sense to allow the smgr to get exclusive access to
> some part of the page in the current Page layout.
>
> Yes, I agree that there will be an impact on usable page size if you
> want authenticated encryption, and that AMs will indeed need to
> account for storage space now being used by the smgr - inconvenient,
> but it serves a purpose. That would happen regardless of whether smgr
> or some higher system decides where to store the data for encryption -
> as long as it is on the page, the AM effectively can't use those
> bytes.
> But I'd say that's best solved by making the Page documentation and
> PageInit API explicit about the potential use of that space by the
> chosen storage method (encrypted, plain, ...) instead of requiring the
> various AMs to manually consider encryption when using Postgres' APIs
> for writing data to disk without hitting shared buffers; page space
> management is already a task of AMs, but handling the actual
> encryption is not.

I don't particularly disagree with any detail here - but to me reserving space
for nonces etc at PageInit() time pretty much is the opposite of handling
encryption inside smgr.

Originally, I was anticipating that we might want different space amounts reserved on different classes of pages (apart from encryption), so while we'd be storing the default page reserved size in pg_control we'd not be limited to this in the structure of the page calls.  We could presumably just move the logic into PageInit() itself if every reserved allocation is the same and individual call sites wouldn't need to know about it.  The call sites do have more context as to the requirements of the page or the "type" of page in play, which if we made it dependent on page type would need to get passed in somehow, which was where the reserved_page_size parameter came in to the current patch.

> Should the AM really care whether the data on disk is encrypted or
> not? I don't think so. When the disk contains encrypted bytes, but
> smgrread() and smgrwrite() both produce and accept plaintext data,
> who's going to complain? Requiring AMs to be mindful about encryption
> on all common paths only adds pitfalls where encryption would be
> forgotten by the developer of AMs in one path or another.

I agree with that - I think the way the patch currently is designed is not
right.

The things that need to care tend to be the same places that need to care about setting checksums, that or being aware of the LSNs in play or needing to be set.  I'd agree that a common interface for "get this page ready for writing to storage" and "get this page converted from storage" which could handle both checksums, encryption, or additional page features would make sense. (I doubt we'd want hooks to support page in/page out, but if we /did/ want that, this'd also likely live there.)

There's other stuff you can't trivially do at the smgr level. E.g. if
checksums or encryption is enabled, you need to copy the buffer to compute
checksums / do IO if in shared buffers, because somebody could set a hint bit
even with just a shared content lock. But you don't need that when coming from
private buffers during index builds.



> I think that getting PageInit to allocate the smgr-specific area would
> take some effort, too (which would potentially require adding some
> relational context to PageInit, so that it knows which page of which
> relation it is going to initialize), but IMHO that would be more
> natural than requiring all index and table AMs to be aware the actual
> encryption of its pages and require manual handling of that encryption
> when the page needs to be written to disk, when it otherwise already
> conforms to the various buffer management and file extension APIs
> currently in use in PostgreSQL. I would expect "transparent" data
> encryption to be handled at the file write layer (i.e. smgr), not
> inside the AMs.

As mentioned above - I agree that the relevant code shouldn't be in index
AMs. But I somewhat doubt that smgr is the right level either. For one, the
place computing checksums needs awareness of locking / sharing semantics as
well as knowledge about WAL logging. That's IMO above smgr. For another, if we
ever got another smgr implementation - should it have to reimplement
encryption?

ISTM that there's a layer missing. Places bypassing bufmgr.c currently need
their own handling of checksums (and in the future encryption), because the
relevant bufmgr.c code can't be reached without pages in the buffer
pool. Which is why we have PageIsVerifiedExtended() and
PageSetChecksumInplace() calls in gist, hash, heapam, nbtree ... IMO when
checksums were added, we should have added the proper abstraction layer
instead of littering the code with redundant copies.

Agreed that a fixup patch to add /something/ here would be good.  A concern here is what context would need to be passed in; certainly with only checksums just a Page is sufficient, but if we have AAD we'd need to be able to pass that in or otherwise be able to identify it in the page. As a point of references, the existing GCM patch authenticates all unencrypted header fields (i.e., PageHeaderData up to the pd_special field), plus the RelFileNumber and BlockNumber, of which we'd need to pass in RelFileNumber and BlockNumber (and presumably would want the ForkNum as well if expanding the set of authenticated data).  To some extent, we can punt on some of this, as the existing call sites have been modified in this patch to pass that info in already, so it's really about how we marshal that data.

Thanks,

David

Re: Moving forward with TDE [PATCH v3]

From
David Christensen
Date:
On Tue, Nov 7, 2023 at 5:49 PM Andres Freund <andres@anarazel.de> wrote:
Hi,

On 2023-11-06 09:56:37 -0500, Stephen Frost wrote:
> * Andres Freund (andres@anarazel.de) wrote:
> > I still am quite quite unconvinced that using the LSN as a nonce is a good
> > design decision.
>
> This is a really important part of the overall path to moving this
> forward, so I wanted to jump to it and have a specific discussion
> around this.  I agree that there are downsides to using the LSN, some of
> which we could possibly address (eg: include the timeline ID in the IV),
> but others that would be harder to deal with.

> The question then is- what's the alternative?
>
> One approach would be to change the page format to include space for an
> explicit nonce.  I don't see the community accepting such a new page
> format as the only format we support though as that would mean no
> pg_upgrade support along with wasted space if TDE isn't being used.

Right.

Hmm, if we /were/ to introduce some sort of page format change, Couldn't that be a use case for modifying the pd_version field?  Could v4 pages be read in and written out as v5 pages with different interpretations?
 
> Ideally, we'd instead be able to support multiple page formats where
> users could decide when they create their cluster what features they
> want- and luckily we even have such an effort underway with patches
> posted for review [1].

I think there are some details wrong with that patch - IMO the existing macros
should just continue to work as-is and instead the places that want the more
narrow definition should be moved to the new macros and it changes places that
should continue to use compile time constants - but it doesn't seem like a
fundamentally bad idea to me.  I certainly like it much better than making the
page size runtime configurable.

There had been some discussion about this WRT renaming macros and the like (constants, etc)—I think a new pass eliminating the variable blocksize pieces and seeing if we can minimize churn here is worthwhile, will take a look and see what the minimally-viable set of changes is here.
 
(I'll try to reply with the above points to [1])


> Certainly, with the base page-special-feature patch, we could have an option
> for users to choose that they want a better nonce than the LSN, or we could
> bundle that assumption in with, say, the authenticated-encryption feature
> (if you want authenticated encryption, then you want more from the
> encryption system than the basics, and therefore we presume you also want a
> better nonce than the LSN).

I don't think we should support using the LSN as a nonce if we have an
alternative. The cost and complexity overhead is just not worth it.  Yes,
it'll be harder for users to migrate to encryption, but adding complexity
elsewhere in the system to get an inferior result isn't worth it.

From my read, XTS (which I'd see as inferior to authenticated encryption, but better than some other options) could use LSN as an IV without leakage concerns, perhaps mixing in the BlockNumber as well.  If we are going to allow multiple encryption types, I think we may need to consider that needs for IVs may be different, so this may need to be something that is selectable per encryption type.  

I am unclear how much of a requirement this is, but seems like having a design supporting this to be pluggable—even if a static lookup table internally for encryption type, block length, IV source, etc—seems the most future proof if we had to retire an encryption method or prevent creation of specific methods, say.
 
> Another approach would be a separate fork, but that then has a number of
> downsides too- every write has to touch that too, and a single page of
> nonces would cover a pretty large number of pages also.

Yea, the costs of doing so is nontrivial. If you were trying to implement
encryption on the smgr level - which I doubt you should but am not certain
about - my suggestion would be to interleave pages with metadata like nonces
and AEAD with the data pages. I.e. one metadata page would be followed by
  (BLCKSZ - SizeOfPageHeaderData) / (sizeof(nonce) + sizeof(AEAD))
pages containing actual relation data.  That way you still get decent locality
during scans and writes.

Hmm, this is actually an interesting idea, I will think about this a bit.
 
Relation forks were a mistake, we shouldn't use them in more places.


I think it'd be much better if we also encrypted forks, rather than just the
main fork...

I believe the existing code should just work by modifying the PageNeedsToBeEncrypted macro; I will test that and see if anything blows up.

David

Re: Moving forward with TDE

From
Chris Travers
Date:
Hi,

I was re-reading the patches here  and there was one thing I didn't understand.

There are provisions for a separation of data encryption keys for primary and replica I see, and these share a single
WALkey.
 

But if I am setting up a replica from the primary, and the primary is already encrypted, then do these forceably share
thesame data encrypting keys?  Is there a need to have (possibly in a follow-up patch) an ability to decrypt and
re-encryptin pg_basebackup (which would need access to both keys) or is this handled already and I just missed it?
 

Best Wishes,
Chris Travers

Re: Moving forward with TDE

From
Bruce Momjian
Date:
On Sun, Dec 17, 2023 at 06:30:50AM +0000, Chris Travers wrote:
> Hi,
> 
> I was re-reading the patches here  and there was one thing I didn't understand.
> 
> There are provisions for a separation of data encryption keys for primary and replica I see, and these share a single
WALkey.
 
> 
> But if I am setting up a replica from the primary, and the primary is already encrypted, then do these forceably
sharethe same data encrypting keys?  Is there a need to have (possibly in a follow-up patch) an ability to decrypt and
re-encryptin pg_basebackup (which would need access to both keys) or is this handled already and I just missed it?
 

Yes, decrypt and re-encrypt in pg_basebackup would be necessary, or in
the actual protocol stream.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Only you can decide what is important to you.



Re: Moving forward with TDE [PATCH v3]

From
Peter Smith
Date:
2024-01 Commitfest.

Hi, This patch has a CF status of "Needs Review" [1], but it seems
there were CFbot test failures last time it was run [2]. Please have a
look and post an updated version if necessary.

======
[1] https://commitfest.postgresql.org/46/3985/
[2] https://cirrus-ci.com/task/5498215743619072

Kind Regards,
Peter Smith.



Re: Moving forward with TDE [PATCH v3]

From
vignesh C
Date:
On Mon, 22 Jan 2024 at 11:47, Peter Smith <smithpb2250@gmail.com> wrote:
>
> 2024-01 Commitfest.
>
> Hi, This patch has a CF status of "Needs Review" [1], but it seems
> there were CFbot test failures last time it was run [2]. Please have a
> look and post an updated version if necessary.

The patch which you submitted has been awaiting your attention for
quite some time now.  As such, we have moved it to "Returned with
Feedback" and removed it from the reviewing queue. Depending on
timing, this may be reversible.  Kindly address the feedback you have
received, and resubmit the patch to the next CommitFest.

Regards,
Vignesh