Thread: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

Hello Hackers,

This proposes a way to develop "Table-level" Transparent Data Encryption (TDE) and Key Management Service (KMS) support
in PostgreSQL.


Issues with data encryption in PostgreSQL
==========
Currently, data in PostgreSQL can be encrypted using the pgcrypto extension.
However, pgcrypto is inconvenient to use in some cases.

There are two significant inconveniences.

First, if we use pgcrypto, we must call its functions everywhere we encrypt or decrypt data.
Second, we must modify a lot of application code if we want to migrate to PostgreSQL from another database that
uses TDE.

To resolve these inconveniences, many users want PostgreSQL to support TDE.
There have also been a few proposals, comments, and questions about TDE support in the PostgreSQL community.

However, PostgreSQL does not currently support TDE, so the development community has discussed whether it is
necessary to support TDE at all.

In those discussions, the following requirements for TDE support in PostgreSQL emerged.

1) The performance overhead of encrypting and decrypting database data must be minimized.
2) WAL encryption must be supported.
3) A Key Management Service must be supported.

Therefore, I'd like to propose a new TDE design that addresses all of the above requirements.
Since this feature will become very large, I'd like to hear opinions from the community before starting on the patch.

First, my proposal is table-level TDE, in which the user can specify which tables are encrypted.
The indexes, TOAST table, and WAL associated with a TDE-enabled table are also encrypted.

Moreover, I want to support encryption for large objects as well,
but I haven't found a good way to do that so far, so I'd like to leave it as a future TODO.

My proposal for "table-level TDE" has five characteristic features.

1) Buffer-level data encryption and decryption
2) Per-table encryption
3) 2-tier encryption key management
4) Working with external key management services (KMS)
5) WAL encryption

Here are more details on each item.


1. Buffer-level data encryption and decryption
==================
Transparent data encryption and decryption accompany storage operations.
With the ordinary approach, such as using pgcrypto, the biggest problem with encrypted data is the performance
overhead of decrypting it each time a query runs.

My proposal is to encrypt and decrypt data when performing disk I/O operations, to minimize the performance overhead.
The data in the shared memory layer therefore remains unencrypted, which keeps the overhead to a minimum.

With this design, encryption and decryption can be implemented by modifying the code of the storage and buffer
manager modules, which are responsible for performing disk I/O operations.
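
To illustrate the intended call pattern, here is a rough C sketch of hook points around disk I/O; every name
(tde_before_write, tde_after_read, the cipher_* primitives) is illustrative only, not an actual API:

    /* Sketch only: hypothetical hook points for buffer-level TDE.
     * Real integration would modify bufmgr.c and the smgr layer. */
    #include <stdbool.h>
    #include <stddef.h>

    #define BLCKSZ 8192

    typedef struct TableKey { unsigned char key[32]; } TableKey;

    /* assumed cipher primitives (e.g. backed by OpenSSL AES) */
    extern void cipher_encrypt(const TableKey *k, const unsigned char *iv,
                               unsigned char *page, size_t len);
    extern void cipher_decrypt(const TableKey *k, const unsigned char *iv,
                               unsigned char *page, size_t len);

    /* Encrypt just before a dirty page leaves shared buffers for disk. */
    static void
    tde_before_write(unsigned char *page, const TableKey *k,
                     const unsigned char *iv, bool relation_encrypted)
    {
        if (relation_encrypted)
            cipher_encrypt(k, iv, page, BLCKSZ);
    }

    /* Decrypt just after a page is read from disk into a buffer,
     * so everything in shared memory stays plaintext. */
    static void
    tde_after_read(unsigned char *page, const TableKey *k,
                   const unsigned char *iv, bool relation_encrypted)
    {
        if (relation_encrypted)
            cipher_decrypt(k, iv, page, BLCKSZ);
    }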


2. Per-table encryption
==================
Users can enable TDE per table as they see fit.
I introduce a new storage parameter, "encryption_enabled", which enables TDE at the table level.

    -- Create an encrypted table
       CREATE TABLE foo ( ... ) WITH ( ENCRYPTION_ENABLED = ON );

    -- Change to a non-encrypted table
       ALTER TABLE foo SET ( ENCRYPTION_ENABLED = OFF );

This approach avoids any overhead for tables that do not require encryption.
For tables with TDE enabled, the corresponding table key is generated from random values and stored in a new system
catalog after being encrypted by the master key.
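
As a rough sketch of "generated from random values and encrypted by the master key", using OpenSSL purely for
illustration (a real design might prefer an authenticated mode or a dedicated key-wrap construction):

    #include <openssl/evp.h>
    #include <openssl/rand.h>

    /* Generate a random 256-bit table key and wrap it with the master
     * key using AES-256-CBC.  Returns 1 on success, 0 on failure.
     * "wrapped" must have room for 48 bytes; the caller persists both
     * the wrapped key and the IV in the new system catalog. */
    static int
    wrap_new_table_key(const unsigned char *master_key,   /* 32 bytes */
                       unsigned char *wrapped, int *wrapped_len,
                       unsigned char iv[16])
    {
        unsigned char table_key[32];
        EVP_CIPHER_CTX *ctx;
        int len, total = 0;

        if (RAND_bytes(table_key, sizeof(table_key)) != 1 ||
            RAND_bytes(iv, 16) != 1)
            return 0;

        if ((ctx = EVP_CIPHER_CTX_new()) == NULL)
            return 0;
        if (EVP_EncryptInit_ex(ctx, EVP_aes_256_cbc(), NULL,
                               master_key, iv) != 1 ||
            EVP_EncryptUpdate(ctx, wrapped, &len,
                              table_key, sizeof(table_key)) != 1)
            goto fail;
        total = len;
        if (EVP_EncryptFinal_ex(ctx, wrapped + total, &len) != 1)
            goto fail;
        total += len;
        EVP_CIPHER_CTX_free(ctx);
        *wrapped_len = total;
        return 1;
    fail:
        EVP_CIPHER_CTX_free(ctx);
        return 0;
    }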

BTW, I want to support CBC mode encryption[3]. However, I'm not sure how to use the IV in CBC mode for this proposal.
I'd like to hear opinions from security engineers.


3. 2-tier encryption key management
==================
When it comes time to change cryptographic keys, there is normally a performance overhead from decrypting and
re-encrypting all of the data.

To solve this problem we employ 2-tier encryption:
all table keys are stored in the database cluster after being encrypted by the master key, and the master keys
are stored externally to PostgreSQL.

Therefore, without the master key it is impossible to decrypt the table keys, and thus impossible to decrypt the
database data.

When changing the master key, it is not necessary to re-encrypt all of the data.
We use the new master key only to decrypt and re-encrypt the table keys, which minimizes the performance overhead.

As for table keys, every TDE-enabled table has its own table key.
As for master keys, every database has its own master key, and table keys are encrypted by the master key of their
own database.
For WAL encryption, we have another cryptographic key. The WAL key is also encrypted by a master key, but it is
shared across the database cluster.
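
A small sketch of why key rotation stays cheap under 2-tier encryption: only the wrapped table keys are
re-encrypted, never the data pages. The helpers unwrap_key/wrap_key are hypothetical:

    typedef struct WrappedKey
    {
        unsigned char data[48];     /* encrypted table key + padding */
        unsigned char iv[16];
        int           len;
    } WrappedKey;

    /* assumed helpers, e.g. thin wrappers over AES decrypt/encrypt */
    extern int unwrap_key(const unsigned char *master,
                          const WrappedKey *in, unsigned char table_key[32]);
    extern int wrap_key(const unsigned char *master,
                        const unsigned char table_key[32], WrappedKey *out);

    /* Re-encrypt every table key with the new master key.  The data
     * pages themselves are never read or rewritten. */
    static void
    rotate_master_key(const unsigned char *old_master,
                      const unsigned char *new_master,
                      WrappedKey *keys, int nkeys)
    {
        unsigned char table_key[32];
        int           i;

        for (i = 0; i < nkeys; i++)
        {
            unwrap_key(old_master, &keys[i], table_key);
            wrap_key(new_master, table_key, &keys[i]);
            /* real code would update the catalog row and WAL-log it */
        }
    }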


4. Working with external key management services (KMS)
==================
A key management service is an integrated approach to generating, fetching, and managing encryption keys.
It may cover all aspects of key security, from the secure generation of keys through secure storage and retrieval
up to encryption key handling.
Also, various types of KMS are provided by many companies, and users can choose among them.

Therefore I would like to manage the master key using a KMS.
My proposal is to provide callback APIs (generate_key, fetch_key, store_key) in the form of a plug-in interface,
so that users can use whichever type of KMS they want, as sketched below.
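
A rough sketch of what that plug-in interface could look like, following the callback names above; the struct and
the handler convention are illustrative only:

    typedef struct KmsRoutine
    {
        /* Ask the KMS to create a new master key; returns its key ID. */
        char *(*generate_key) (const char *key_id_hint);

        /* Fetch the key material for a given master key ID. */
        int   (*fetch_key) (const char *key_id,
                            unsigned char *key_out, int key_len);

        /* Push key material to the KMS, where supported. */
        int   (*store_key) (const char *key_id,
                            const unsigned char *key, int key_len);
    } KmsRoutine;

    /* Each KMS plug-in (a shared library) would expose its callbacks
     * through a handler function, much like FDW handlers do today. */
    extern KmsRoutine *kms_plugin_handler(void);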

The KMIP protocol and most KMSs manage keys by string IDs, and we can fetch a key from the KMS by its key ID.
So in my proposal, each master key is distinguished by an ID, called the "master key ID".
The master key ID is built, for example, from the database OID and a sequence number, like <OID>_<SeqNo>, and these
IDs are managed inside PostgreSQL.
    
At database startup, all master key IDs are loaded into shared memory, where they are protected by an LWLock.
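
A rough sketch of that shared-memory state (Oid, LWLock, and FLEXIBLE_ARRAY_MEMBER are existing PostgreSQL
definitions; the structs themselves are illustrative):

    typedef struct MasterKeyIdEntry
    {
        Oid  dboid;                 /* database this master key belongs to */
        char key_id[64];            /* e.g. "<OID>_<SeqNo>" */
    } MasterKeyIdEntry;

    typedef struct MasterKeyShmemState
    {
        LWLock *lock;               /* guards reads/writes of the IDs */
        int     num_entries;
        MasterKeyIdEntry entries[FLEXIBLE_ARRAY_MEMBER];
    } MasterKeyShmemState;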

When it comes time to rotate the master keys, run this command:

    ALTER SYSTEM ROTATION MASTER KEY;

This command rotates the master key with the following steps:
1. Generate a new master key.
2. Change the master key ID and emit the corresponding WAL.
3. Re-encrypt all table keys in the database.

Also, during a checkpoint, the master key IDs in shared memory are made durable.


5. WAL encryption
==================
If we encrypt all WAL records, the performance overhead can be significant.
Therefore, I propose encrypting only the WAL records of TDE-enabled tables, excluding the WAL header, when writing
WAL into the WAL buffer, instead of encrypting every WAL record in full.
The WAL encryption key is generated separately, the first time a TDE-enabled table is created. We use 2-tier
encryption for the WAL key as well.
So, when it comes time to rotate the WAL encryption key, run this command:

    ALTER SYSTEM ROTATION WAL KEY;

Next, I will explain how WAL is encrypted.

To do this, I add a flag to the WAL header that indicates whether the subsequent WAL data is encrypted.

Then, when we write WAL for an encrypted table, we write already-encrypted WAL into the WAL buffer.

During recovery, we read the WAL header, check the encryption flag, and judge whether the WAL must be decrypted.
In the case of PITR, we use the WAL key ID stored in the backup file.

With this approach, the performance overhead of writing and reading WAL for unencrypted tables should be almost the
same as before.
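
Here is a rough sketch of the flag and the encrypt/decrypt steps around the WAL buffer. The flag bit, the
simplified header, and wal_cipher are illustrative only; in reality the flag would live in the existing
XLogRecord header:

    #include <stdint.h>
    #include <stddef.h>

    #define XLR_BODY_ENCRYPTED 0x01     /* hypothetical flag bit */

    typedef struct WalRecHdr
    {
        uint32_t xl_tot_len;            /* total length, header included */
        uint8_t  xl_flags;              /* carries the encrypted bit */
    } WalRecHdr;

    /* assumed symmetric cipher over the record body with the WAL key */
    extern void wal_cipher(unsigned char *buf, size_t len, int encrypt);

    /* While copying an assembled record for an encrypted table into the
     * WAL buffers: mark it and encrypt everything after the header. */
    static void
    encrypt_wal_record(WalRecHdr *hdr)
    {
        hdr->xl_flags |= XLR_BODY_ENCRYPTED;
        wal_cipher((unsigned char *) (hdr + 1),
                   hdr->xl_tot_len - sizeof(WalRecHdr), 1);
    }

    /* During recovery: the header stays readable, so check the flag and
     * decrypt the body before replay if needed. */
    static void
    maybe_decrypt_wal_record(WalRecHdr *hdr)
    {
        if (hdr->xl_flags & XLR_BODY_ENCRYPTED)
            wal_cipher((unsigned char *) (hdr + 1),
                       hdr->xl_tot_len - sizeof(WalRecHdr), 0);
    }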


==================
I'd like to discuss the design before starting to make any changes to the code.
After more discussion I want to make a PoC.
Feedback and suggestions are very welcome.

Finally, thank you to Masahiko Sawada for input on the initial design.

Thank you.

[1] What does TDE mean?
    https://en.wikipedia.org/wiki/Transparent_Data_Encryption

[2] What does KMS mean?
    https://en.wikipedia.org/wiki/Key_management#Key_Management_System

[3] What does CBC mode mean?
    https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation

[4] Recently discussed mail
    https://www.postgresql.org/message-id/CA%2BCSw_tb3bk5i7if6inZFc3yyf%2B9HEVNTy51QFBoeUk7UE_V%3Dw%40mail.gmail.com


Regards.
Moon.
----------------------------------------
Moon, Insung
NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
----------------------------------------





Moon, Insung <Moon_Insung_i3@lab.ntt.co.jp> wrote:

This patch seems to implement some of the features you propose, especially
encryption of buffers and WAL. I recommend you check it so that no effort is
duplicated:

> [4] Recently discussed mail
>     https://www.postgresql.org/message-id/CA%2BCSw_tb3bk5i7if6inZFc3yyf%2B9HEVNTy51QFBoeUk7UE_V%3Dw%40mail.gmail.com



--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26, A-2700 Wiener Neustadt
Web: https://www.cybertec-postgresql.com


Hello Moon,

I promised to email links to the articles I mentioned during your talk
at the PGCon Unconference to this thread. Here they are:

* http://cryptowiki.net/index.php?title=Order-preserving_encryption
* https://en.wikipedia.org/wiki/Homomorphic_encryption

Also I realized that I was wrong regarding encryption of the indexes
since they will be encrypted on the block level the same way the heap
will be.

--
Best regards,
Aleksander Alekseev

On Fri, May 25, 2018 at 8:41 PM, Moon, Insung
<Moon_Insung_i3@lab.ntt.co.jp> wrote:
> Hello Hackers,
>
> This proposes a way to develop "Table-level" Transparent Data
> Encryption (TDE) and Key Management Service (KMS) support in
> PostgreSQL.
>
> ...

As per the discussion at the PGCon unconference, I think that first we
need to discuss what threats we want to defend database data against. If
a user wants to defend against a threat such as a malicious user who has
logged in to the OS or the database and steals important data from the
database, this TDE design would not help, because such a user can steal
the data by taking a memory dump or via SQL. That of course differs
depending on system requirements or security compliance, but what
threats do you want to defend database data against, and why?

Also, if I understand correctly, at the unconference session there were
also two suggestions about the design other than Aleksander's:
implementing TDE at the column level using POLICY, and implementing TDE
at the tablespace level. The former was suggested by Joe, but I'm not
sure of the details of that suggestion; I'd love to hear them. The
latter was suggested by Tsunakawa-san. Have you considered that?

You mentioned that encryption of temporary data for query processing
and of large objects is still under consideration. But besides those,
you should also consider the temporary data generated by other
subsystems, such as the reorderbuffer and transition tables.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


On 06/11/2018 11:22 AM, Masahiko Sawada wrote:
> On Fri, May 25, 2018 at 8:41 PM, Moon, Insung
> <Moon_Insung_i3@lab.ntt.co.jp> wrote:
>> Hello Hackers,
>>
>> This proposes a way to develop "Table-level" Transparent Data 
>> Encryption (TDE) and Key Management Service (KMS) support in 
>> PostgreSQL.
>>
>> ...
> 
> As per discussion at PGCon unconference, I think that firstly we
> need to discuss what threats we want to defend database data against.
> If user wants to defend against a threat that is malicious user who 
> logged in OS or database steals an important data on database this 
> design TDE would not help. Because such user can steal the data by 
> getting a memory dump or by SQL. That is of course differs depending 
> on system requirements or security compliance but what threats do
> you want to defend database data against? and why?
> 

I do agree with this - a description of the threat model needs to be 
part of the design discussion, otherwise it's not possible to compare it 
to alternative solutions (e.g. full-disk encryption using LUKS or using 
existing privilege controls and/or RLS).

TDE was proposed/discussed repeatedly in the past, and every time it 
died exactly because it was not very clear which issue it was attempting 
to solve.

Let me share some of the issues mentioned as possibly addressed by TDE 
(I'm not entirely sure TDE actually solves them, I'm just saying those 
were mentioned in previous discussions):

1) enterprise requirement - Companies want in-database encryption, for 
various reasons (because "enterprise solution" or something).

2) like FDE, but OS/filesystem independent - Same config on any OS and 
filesystem, which may make maintenance easier.

3) does not require special OS/filesystem setup - Does not require help 
from system administrators, setup of LUKS devices or whatever.

4) all filesystem access (basebackups/rsync) is encrypted anyway

5) solves key management (the main challenge with pgcrypto)

6) allows encrypting only some of the data (tables, columns) to minimize 
performance impact

IMHO it makes sense to have TDE even if it provides the same "security" 
as disk-level encryption, assuming it's more convenient to setup/use 
from the database.

> Also, if I understand correctly, at unconference session there also 
> were two suggestions about the design other than the suggestion by 
> Alexander: implementing TDE at column level using POLICY, and 
> implementing TDE at table-space level. The former was suggested by
> Joe but I'm not sure the detail of that suggestion. I'd love to hear
> the details of that suggestion. The latter was suggested by
> Tsunakawa-san. Have you considered that?
> 
> You mentioned that encryption of temporary data for query processing 
> and large objects are still under the consideration. But other than 
> them you should consider the temporary data generated by other 
> subsystems such as reorderbuffer and transition table as well.
> 

The severity of those limitations is likely related to the threat model. 
I don't think encrypting temporary data would be a big problem, assuming 
you know which key to use.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Hi,

On 05/25/2018 01:41 PM, Moon, Insung wrote:
> Hello Hackers,
> 
> ...
> 
> BTW, I want to support CBC mode encryption[3]. However, I'm not sure 
> how to use the IV in CBC mode for this proposal. I'd like to hear
> opinions by security engineer.
> 

I'm not a cryptographer either, but this is exactly where you need a 
prior discussion about the threat models - there are a couple of 
chaining modes, each with different weaknesses.

FWIW it may also matter if data_checksums are enabled, because that may 
prevent malleability attacks affecting some of the modes. Assuming an active 
attacker (with the ability to modify the data files) is part of the 
threat model, of course.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


On 06/11/2018 05:22 AM, Masahiko Sawada wrote:
> As per discussion at PGCon unconference, I think that firstly we need
> to discuss what threats we want to defend database data against.

Exactly. While certainly there is demand for encryption for the sake of
"checking a box", different designs will defend against different
threats, and we should be clear on which ones we are trying to protect
against for any particular design.

> Also, if I understand correctly, at unconference session there also
> were two suggestions about the design other than the suggestion by
> Alexander: implementing TDE at column level using POLICY, and
> implementing TDE at table-space level. The former was suggested by Joe
> but I'm not sure the detail of that suggestion. I'd love to hear the
> details of that suggestion.

The idea has not been extensively fleshed out yet, but the thought was
that we create column level POLICY, which would transparently apply some
kind of transform on input and/or output. The transforms would
presumably be expressions, which in turn could use functions (extension
or builtin) to do their work. That would allow encryption/decryption,
DLP (data loss prevention) schemes (masking, redacting), etc. to be
applied based on the policies.

This, in and of itself, would not address key management. There is
probably a separate need for some kind of built in key management --
perhaps a flexible way to integrate with external systems such as Vault
for example, or maybe something self contained, or perhaps both. Or
maybe key management is really tied into the separately discussed effort
to create SQL VARIABLEs somehow.

In any case certainly a lot of room for discussion.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


From: Tomas Vondra [mailto:tomas.vondra@2ndquadrant.com]
> Let me share some of the issues mentioned as possibly addressed by TDE
> (I'm not entirely sure TDE actually solves them, I'm just saying those
> were mentioned in previous discussions):

FYI, our product provides TDE like Oracle and SQL Server, which enables encryption per tablespace.  Relations, WAL
records, and temporary files related to an encrypted tablespace are encrypted.

http://www.fujitsu.com/global/products/software/middleware/opensource/postgres/

(I wonder why the web site doesn't offer the online manual... I've recognized we need to fix this situation.  Anyway, I
guess the downloadable trial version includes the manual.)



> 1) enterprise requirement - Companies want in-database encryption, for
> various reasons (because "enterprise solution" or something).

To assist compliance with PCI DSS, HIPAA, etc.

> 2) like FDE, but OS/filesystem independent - Same config on any OS and
> filesystem, which may make maintenance easier.
> 
> 3) does not require special OS/filesystem setup - Does not require help
> from system administrators, setup of LUKS devices or whatever.
> 
> 4) all filesystem access (basebackups/rsync) is encrypted anyway
> 
> 5) solves key management (the main challenge with pgcrypto)
> 
> 6) allows encrypting only some of the data (tables, columns) to minimize
> performance impact

All yes.


> IMHO it makes sense to have TDE even if it provides the same "security"
> as disk-level encryption, assuming it's more convenient to setup/use
> from the database.

Agreed.


Regards
Takayuki Tsunakawa




> From: Tomas Vondra [mailto:tomas.vondra@2ndquadrant.com]
> On 05/25/2018 01:41 PM, Moon, Insung wrote:
> > BTW, I want to support CBC mode encryption[3]. However, I'm not sure
> > how to use the IV in CBC mode for this proposal. I'd like to hear
> > opinions by security engineer.
> >
> 
> I'm not a cryptographer either, but this is exactly where you need a
> prior discussion about the threat models - there are a couple of
> chaining modes, each with different weaknesses.
Our product uses XTS, which recent FDE software like BitLocker and TrueCrypt use instead of CBC.

https://en.wikipedia.org/wiki/Disk_encryption_theory#XTS

"According to SP 800-38E, "In the absence of authentication or access control, XTS-AES provides more protection than
theother approved confidentiality-only modes against unauthorized manipulation of the encrypted data.""
 



> FWIW it may also matter if data_checksums are enabled, because that may
> prevent malleability attacks affecting of the modes. Assuming active
> attacker (with the ability to modify the data files) is part of the
> threat model, of course.

Encrypt the page after embedding its checksum value.  If a malicious attacker modifies a page on disk, then the
decrypted page would be corrupt anyway, which can be detected by the checksum.


Regards
Takayuki Tsunakawa



On Wed, Jun 13, 2018 at 10:03 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> On 06/11/2018 11:22 AM, Masahiko Sawada wrote:
>>
>> On Fri, May 25, 2018 at 8:41 PM, Moon, Insung
>> <Moon_Insung_i3@lab.ntt.co.jp> wrote:
>>>
>>> Hello Hackers,
>>>
>>> This propose a way to develop "Table-level" Transparent Data Encryption
>>> (TDE) and Key Management Service (KMS) support in PostgreSQL.
>>>
>>> ...
>>
>>
>> As per discussion at PGCon unconference, I think that firstly we
>> need to discuss what threats we want to defend database data against.
>> If user wants to defend against a threat that is malicious user who logged
> in OS or database steals an important data on database this design TDE would
>> not help. Because such user can steal the data by getting a memory dump or
>> by SQL. That is of course differs depending on system requirements or
>> security compliance but what threats do
>> you want to defend database data against? and why?
>>
>
> I do agree with this - a description of the threat model needs to be part of
> the design discussion, otherwise it's not possible to compare it to
> alternative solutions (e.g. full-disk encryption using LUKS or using
> existing privilege controls and/or RLS).
>
> TDE was proposed/discussed repeatedly in the past, and every time it died
> exactly because it was not very clear which issue it was attempting to
> solve.
>
> Let me share some of the issues mentioned as possibly addressed by TDE (I'm
> not entirely sure TDE actually solves them, I'm just saying those were
> mentioned in previous discussions):

Thank you for sharing!

>
> 1) enterprise requirement - Companies want in-database encryption, for
> various reasons (because "enterprise solution" or something).

Yes, our customers often ask me for it, especially for database
migration from a DBMS that supports TDE, in order to reduce the costs of
migration.

>
> 2) like FDE, but OS/filesystem independent - Same config on any OS and
> filesystem, which may make maintenance easier.
>
> 3) does not require special OS/filesystem setup - Does not require help from
> system administrators, setup of LUKS devices or whatever.
>
> 4) all filesystem access (basebackups/rsync) is encrypted anyway
>
> 5) solves key management (the main challenge with pgcrypto)
>
> 6) allows encrypting only some of the data (tables, columns) to minimize
> performance impact
>
> IMHO it makes sense to have TDE even if it provides the same "security" as
> disk-level encryption, assuming it's more convenient to setup/use from the
> database.

Agreed.

>
>> Also, if I understand correctly, at unconference session there also were
>> two suggestions about the design other than the suggestion by Alexander:
>> implementing TDE at column level using POLICY, and implementing TDE at
>> table-space level. The former was suggested by
>> Joe but I'm not sure the detail of that suggestion. I'd love to hear
>> the details of that suggestion. The latter was suggested by
>> Tsunakawa-san. Have you considered that?
>>
>> You mentioned that encryption of temporary data for query processing and
>> large objects are still under the consideration. But other than them you
>> should consider the temporary data generated by other subsystems such as
>> reorderbuffer and transition table as well.
>>
>
> The severity of those limitations is likely related to the threat model. I
> don't think encrypting temporary data would be a big problem, assuming you
> know which key to use.

Agreed. I had thought about the possibility of non-encrypted temporary
data in backups, but since we don't include temporary data in backups it
would not be a big problem.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


On Wed, Jun 13, 2018 at 10:20 PM, Joe Conway <mail@joeconway.com> wrote:
> On 06/11/2018 05:22 AM, Masahiko Sawada wrote:
>> As per discussion at PGCon unconference, I think that firstly we need
>> to discuss what threats we want to defend database data against.
>
> Exactly. While certainly there is demand for encryption for the sake of
> "checking a box", different designs will defend against different
> threats, and we should be clear on which ones we are trying to protect
> against for any particular design.
>
>> Also, if I understand correctly, at unconference session there also
>> were two suggestions about the design other than the suggestion by
>> Alexander: implementing TDE at column level using POLICY, and
>> implementing TDE at table-space level. The former was suggested by Joe
>> but I'm not sure the detail of that suggestion. I'd love to hear the
>> details of that suggestion.
>
> The idea has not been extensively fleshed out yet, but the thought was
> that we create column level POLICY, which would transparently apply some
> kind of transform on input and/or output. The transforms would
> presumably be expressions, which in turn could use functions (extension
> or builtin) to do their work. That would allow encryption/decryption,
> DLP (data loss prevention) schemes (masking, redacting), etc. to be
> applied based on the policies.

It seems a good idea. Where does this design encrypt data: in the buffer
only, or in both the buffer and on disk? And does this design (per-column
encryption) aim to satisfy some specific security compliance requirement?

> This, in and of itself, would not address key management. There is
> probably a separate need for some kind of built in key management --
> perhaps a flexible way to integrate with external systems such as Vault
> for example, or maybe something self contained, or perhaps both.

I agree with having a flexible way to address different requirements. I
had thought that a GUC parameter storing a shell command for fetching the
encryption key would be enough, but considering seamless integration
with various key management systems, I think we need APIs for key
management (fetching keys, storing keys, generating keys, etc.).

> Or
> maybe key management is really tied into the separately discussed effort
> to create SQL VARIABLEs somehow.
>

Could you elaborate on how key management is tied into SQL VARIABLEs?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


On 06/14/2018 12:19 PM, Masahiko Sawada wrote:
> On Wed, Jun 13, 2018 at 10:20 PM, Joe Conway <mail@joeconway.com> wrote:
>> The idea has not been extensively fleshed out yet, but the thought was
>> that we create column level POLICY, which would transparently apply some
>> kind of transform on input and/or output. The transforms would
>> presumably be expressions, which in turn could use functions (extension
>> or builtin) to do their work. That would allow encryption/decryption,
>> DLP (data loss prevention) schemes (masking, redacting), etc. to be
>> applied based on the policies.
> 
> Which does this design encrypt data on, buffer or both buffer and
> disk?


The point of the design is simply to provide a mechanism for input and
output transformation, not to provide the transform function itself.

How you use that transformation would be entirely up to you, but if you
were providing an encryption transform on input, the data would be
encrypted in both the buffer and on disk.

> And does this design (per-column encryption) aim to satisfy something
> specific security compliance?


Again, entirely up to you and dependent on what type of transformation
you provide. If, for example you provided input encryption and output
decryption based on some in memory session variable key, that would be
essentially TDE and would satisfy several common sets of compliance
requirements.


>> This, in and of itself, would not address key management. There is
>> probably a separate need for some kind of built in key management --
>> perhaps a flexible way to integrate with external systems such as Vault
>> for example, or maybe something self contained, or perhaps both.
> 
> I agree to have a flexible way in order to address different
> requirements. I thought that having a GUC parameter to which we store
> a shell command to get encryption key is enough but considering
> integration with various key managements seamlessly I think that we
> need to have APIs for key managements. (fetching key, storing key,
> generating key etc)


I don't like the idea of yet another path for arbitrary shell code
execution. An API for extension code would be preferable.


>> Or
>> maybe key management is really tied into the separately discussed effort
>> to create SQL VARIABLEs somehow.
> 
> Could you elaborate on how key management is tied into SQL VARIABLEs?

Well, the key management probably is not, but the SQL VARIABLE might be
where the key is stored for use.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


On Wed, Jun 13, 2018 at 9:20 AM, Joe Conway <mail@joeconway.com> wrote:
>> Also, if I understand correctly, at unconference session there also
>> were two suggestions about the design other than the suggestion by
>> Alexander: implementing TDE at column level using POLICY, and
>> implementing TDE at table-space level. The former was suggested by Joe
>> but I'm not sure the detail of that suggestion. I'd love to hear the
>> details of that suggestion.
>
> The idea has not been extensively fleshed out yet, but the thought was
> that we create column level POLICY, which would transparently apply some
> kind of transform on input and/or output. The transforms would
> presumably be expressions, which in turn could use functions (extension
> or builtin) to do their work. That would allow encryption/decryption,
> DLP (data loss prevention) schemes (masking, redacting), etc. to be
> applied based on the policies.

It seems to me that column-level encryption is a lot less secure than
block-level encryption.  I am supposing here that the attack vector is
stealing the disk.  If all you've got is a bunch of 8192-byte blocks,
it's unlikely you can infer much about the contents.  You know the
size of the relations and that's probably about it.  If you've got
individual values being encrypted, then there's more latitude to
figure stuff out.  You can infer something about the length of
particular values.  Perhaps you can find cases where the same
encrypted value appears multiple times.  If there's a btree index, you
know the ordering of the values under whatever ordering semantics
apply to that index.  It's unclear to me how useful such information
would be in practice or to what extent it might allow you to attack
the underlying cryptography, but it seems like there might be cases
where the information leakage is significant.  For example, suppose
you're trying to determine which partially-encrypted record is that of
Aaron Aardvark... or this guy:
https://en.wikipedia.org/wiki/Hubert_Blaine_Wolfeschlegelsteinhausenbergerdorff,_Sr.

Recently, it was suggested to me that a use case for column-level
encryption might be to prevent casual DBA snooping.  So, you'd want
the data to appear in pg_dump output encrypted, because the DBA might
otherwise look at it, but you wouldn't really be concerned about the
threat of the DBA loading a hostile C module that would steal user
keys and use them to decrypt all the data, because they don't care
that much and would be fired if they were caught doing it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


On 06/18/2018 09:49 AM, Robert Haas wrote:
> On Wed, Jun 13, 2018 at 9:20 AM, Joe Conway <mail@joeconway.com> wrote:
>>> Also, if I understand correctly, at unconference session there also
>>> were two suggestions about the design other than the suggestion by
>>> Alexander: implementing TDE at column level using POLICY, and
>>> implementing TDE at table-space level. The former was suggested by Joe
>>> but I'm not sure the detail of that suggestion. I'd love to hear the
>>> details of that suggestion.
>>
>> The idea has not been extensively fleshed out yet, but the thought was
>> that we create column level POLICY, which would transparently apply some
>> kind of transform on input and/or output. The transforms would
>> presumably be expressions, which in turn could use functions (extension
>> or builtin) to do their work. That would allow encryption/decryption,
>> DLP (data loss prevention) schemes (masking, redacting), etc. to be
>> applied based on the policies.
> 
> It seems to me that column-level encryption is a lot less secure than
> block-level encryption.  I am supposing here that the attack vector is
> stealing the disk.  If all you've got is a bunch of 8192-byte blocks,
> it's unlikely you can infer much about the contents.  You know the
> size of the relations and that's probably about it. 

Not necessarily. Our pages probably have enough predictable bytes to aid
cryptanalysis, compared to user data in a column which might not be very
predictable.


> If you've got individual values being encrypted, then there's more
> latitude to figure stuff out.  You can infer something about the
> length of particular values.  Perhaps you can find cases where the
> same encrypted value appears multiple times.

This completely depends on the encryption scheme you are using, and the
column level POLICY leaves that entirely up to you.

But in any case most encryption schemes use a random nonce (salt) to
ensure two identical strings do not encrypt to the same result. And
often the encrypted length is padded, so while you might be able to
infer short versus long, you would not usually be able to infer the
exact plaintext length.


> If there's a btree index, you know the ordering of the values under
> whatever ordering semantics apply to that index.  It's unclear to me
> how useful such information would be in practice or to what extent it
> might allow you to attack the underlying cryptography, but it seems
> like there might be cases where the information leakage is
> significant.  For example, suppose you're trying to determine which
> partially-encrypted record is that of Aaron Aardvark... or this guy: 
> https://en.wikipedia.org/wiki/Hubert_Blaine_Wolfeschlegelsteinhausenbergerdorff,_Sr.

Again, this only applies if your POLICY uses this type of encryption,
i.e. order-preserving encryption. If you use strong encryption you will not
be indexing those columns at all, which is pretty commonly the case.

> Recently, it was suggested to me that a use case for column-level
> encryption might be to prevent casual DBA snooping.  So, you'd want
> the data to appear in pg_dump output encrypted, because the DBA might
> otherwise look at it, but you wouldn't really be concerned about the
> threat of the DBA loading a hostile C module that would steal user
> keys and use them to decrypt all the data, because they don't care
> that much and would be fired if they were caught doing it.

Again completely dependent on the extension you use to do the encryption
for the input policy. The keys don't need to be stored with the data,
and the decryption can be transparent only for certain users or if
certain session variables exist which the DBA does not have access to.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


On Mon, Jun 18, 2018 at 10:12 AM, Joe Conway <mail@joeconway.com> wrote:
> Not necessarily. Our pages probably have enough predictable bytes to aid
> cryptanalysis, compared to user data in a column which might not be very
> predictable.

Really?  I would guess that the amount of entropy in a page is WAY
higher than in an individual column value.

> But in any case most encryption schemes use a random nonce (salt) to
> ensure two identical strings do not encrypt to the same result. And
> often the encrypted length is padded, so while you might be able to
> infer short versus long, you would not usually be able to infer the
> exact plaintext length.

Sure, that could be done, although it means that equality comparisons
must be done unencrypted.

> Again completely dependent on the extension you use to do the encryption
> for the input policy. The keys don't need to be stored with the data,
> and the decryption can be transparent only for certain users or if
> certain session variables exist which the DBA does not have access to.

Not arguing with that.  And to be clear, I'm not trying to attack your
proposal.  I'm just trying to have a discussion about advantages and
disadvantages of different approaches.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


On 06/18/2018 10:26 AM, Robert Haas wrote:
> On Mon, Jun 18, 2018 at 10:12 AM, Joe Conway <mail@joeconway.com> wrote:
>> Not necessarily. Our pages probably have enough predictable bytes to aid
>> cryptanalysis, compared to user data in a column which might not be very
>> predictable.
> 
> Really?  I would guess that the amount of entropy in a page is WAY
> higher than in an individual column value.

It isn't about the entropy of the page overall, it is about the
predictability of specific bytes at specific locations on the pages. At
least as far as I understand it.

>> But in any case most encryption schemes use a random nonce (salt) to
>> ensure two identical strings do not encrypt to the same result. And
>> often the encrypted length is padded, so while you might be able to
>> infer short versus long, you would not usually be able to infer the
>> exact plaintext length.
> 
> Sure, that could be done, although it means that equality comparisons
> must be done unencrypted.

Sure. Typically equality comparisons are done on other unencrypted
attributes. Or if you need to do equality on encrypted columns, you can
store non-reversible cryptographic hashes in a separate column.

>> Again completely dependent on the extension you use to do the encryption
>> for the input policy. The keys don't need to be stored with the data,
>> and the decryption can be transparent only for certain users or if
>> certain session variables exist which the DBA does not have access to.
> 
> Not arguing with that.  And to be clear, I'm not trying to attack your
> proposal.  I'm just trying to have a discussion about advantages and
> disadvantages of different approaches.

Understood. Ultimately we might want both page-level encryption and
column level POLICY, as they are each useful for different use-cases.
Personally I believe the former is more generally useful than the
latter, but YMMV.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Jun 18, 2018 at 10:12 AM, Joe Conway <mail@joeconway.com> wrote:
>> Not necessarily. Our pages probably have enough predictable bytes to aid
>> cryptanalysis, compared to user data in a column which might not be very
>> predictable.

> Really?  I would guess that the amount of entropy in a page is WAY
> higher than in an individual column value.

Depending on the specifics of the encryption scheme, having some amount
of known (or guessable) plaintext may allow breaking the cipher, even
if much of the plaintext is not known.  This is cryptology 101, really.

At the same time, having to have a bunch of independently-decipherable
short field values is not real secure either, especially if they're known
to all be encrypted with the same key.  But what you know or can guess
about the plaintext in such cases would be target-specific, rather than
an attack that could be built once and used against any PG database.

            regards, tom lane


On 06/18/2018 10:52 AM, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Mon, Jun 18, 2018 at 10:12 AM, Joe Conway <mail@joeconway.com> wrote:
>>> Not necessarily. Our pages probably have enough predictable bytes to aid
>>> cryptanalysis, compared to user data in a column which might not be very
>>> predictable.
> 
>> Really?  I would guess that the amount of entropy in a page is WAY
>> higher than in an individual column value.
> 
> Depending on the specifics of the encryption scheme, having some amount
> of known (or guessable) plaintext may allow breaking the cipher, even
> if much of the plaintext is not known.  This is cryptology 101, really.

Exactly

> At the same time, having to have a bunch of independently-decipherable
> short field values is not real secure either, especially if they're known
> to all be encrypted with the same key.  But what you know or can guess
> about the plaintext in such cases would be target-specific, rather than
> an attack that could be built once and used against any PG database.

Again, this is dependent on the specific solution for encryption. In some
cases you might do something like generate a single-use random key,
encrypt the payload with that, encrypt the single-use key with the
"global" key, and then append the two results and store them.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



On 06/18/2018 05:06 PM, Joe Conway wrote:
> On 06/18/2018 10:52 AM, Tom Lane wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> On Mon, Jun 18, 2018 at 10:12 AM, Joe Conway <mail@joeconway.com> wrote:
>>>> Not necessarily. Our pages probably have enough predictable bytes to aid
>>>> cryptanalysis, compared to user data in a column which might not be very
>>>> predictable.
>>
>>> Really?  I would guess that the amount of entropy in a page is WAY
>>> higher than in an individual column value.
>>
>> Depending on the specifics of the encryption scheme, having some
>> amount of known (or guessable) plaintext may allow breaking the
>> cipher, even if much of the plaintext is not known. This is
>> cryptology 101, really.
> 
> Exactly
> 
>> At the same time, having to have a bunch of
>> independently-decipherable short field values is not real secure
>> either, especially if they're known to all be encrypted with the
>> same key. But what you know or can guess about the plaintext in
>> such cases would be target-specific, rather than an attack that
>> could be built once and used against any PG database.
> 
> Again is dependent on the specific solution for encryption. In some 
> cases you might do something like generate a single use random key, 
> encrypt the payload with that, encrypt the single use key with the 
> "global" key, append the two results and store.
> 

Yeah, I suppose we could even have per-page keys, for example.

One topic I haven't seen mentioned in this thread yet is indexes. That's 
a pretty significant side-channel, when built on encrypted columns. Even 
if the indexes are encrypted too, you can often deduce a lot of 
information from them.

So what's the plan here? Disallow indexes on encrypted columns? Index 
encrypted values directly? Something else?

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


On Mon, Jun 11, 2018 at 06:22:22PM +0900, Masahiko Sawada wrote:
> As per discussion at PGCon unconference, I think that firstly we need
> to discuss what threats we want to defend database data against. If

We call that a threat model.  There can be many threat models, of
course.

> user wants to defend against a threat that is malicious user who
> logged in OS or database steals an important data on database this
> design TDE would not help. Because such user can steal the data by
> getting a memory dump or by SQL. That is of course differs depending
> on system requirements or security compliance but what threats do you
> want to defend database data against? and why?

This design guards (somewhat) against the threat of storage theft
(e.g., because the storage is remote).  It's a fine threat model to
address, but it's also a lot easier to address in the filesystem or
device drivers -- there's no need to do this in PostgreSQL itself except
so as to support it on all platforms regardless of OS capabilities.

Note that unless the pg_catalog is protected against manipulation via
the remote storage, TDE for user tables might be possible to
compromise.  Like so: the attacker manipulates the pg_catalog to
escalate privilege in order to obtain the TDE keys.  This argues for
full database encryption, not just specific tables or columns.  But
again, this is for the threat model where the storage is the threat.

Another similar threat model is dump management, where dumps are sent
off-site where untrusted users might read them, or even edit them in the
hopes that they will be used for restores and thus compromise the
database.  This is most easily addressed by just encrypting the backups
externally to PG.

Threat models where client users are the threat are easily handled by
PG's permissions system.

I think any threat model where DBAs are not the threat is just not that
interesting to address with crypto within postgres itself...

Encryption to public keys for which postgres does not have private keys
would be one way to address DBAs-as-the-threat, but this is easily done
with an extension...  A small amount of syntactic sugar might help:

  CREATE ROLE "bar" WITH (PUBLIC KEY "...");

  CREATE TABLE foo (
    name TEXT PRIMARY KEY,
    payload TEXT ENCRYPTED TO ROLE "bar" BOUND TO name
  );

but this is just syntactic sugar, so not that valuable.  On the other
hand, just a bit of syntactic sugar can help tick a feature checkbox,
which might be very valuable for marketing reasons even if it's not
valuable for any other reason.

Note that encrypting the payload without a binding to the PK (or similar
name) is very dangerous!  So the encryption option would have to support
some way to indicate what other plaintext to bind in (here the "name"
column).

Note also that for key management reasons it would be necessary to be
able to write the payload as ciphertext rather than as to-be-encrypted
TEXT.

Lastly, for a symmetric encryption option one would need a remote oracle
to do the encryption, which seems rather complicated, but in some cases
may well perform faster.

Nico
-- 


On Fri, May 25, 2018 at 08:41:46PM +0900, Moon, Insung wrote:
> BTW, I want to support CBC mode encryption[3]. However, I'm not sure how to use the IV in CBC mode for this proposal.
> I'd like to hear opinions by security engineer.

Well, CBC makes sense, and since AES uses a 16 byte block size, you
would start with the initialization vector (IV) and run over the 8k page
512 times.  The IV can be any random value that is not repeated, and
does not need to be secret.

However, using the same IV for the entire table would mean that people
can detect if two pages in the same table contain the same data.  You
might care about that, or you might not.  It would prevent detection of
two _tables_ containing the same 8k page.  A more secure solution would
be to use a different IV for each 8k page.

The cleanest idea would be for the per-table IV to be stored per table,
but the IV used for each block to be a mixture of the table's IV and the
page's offset in the table.
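
For example, one way to mix them would be to hash the table's IV
together with the block number (a sketch using OpenSSL's SHA-256; the
helper name is made up):

    #include <openssl/sha.h>
    #include <stdint.h>
    #include <string.h>

    /* Mix the per-table IV with the block number so identical pages at
     * different offsets never encrypt identically. */
    static void
    derive_block_iv(const unsigned char table_iv[16], uint32_t block_no,
                    unsigned char iv_out[16])
    {
        unsigned char buf[20];
        unsigned char digest[SHA256_DIGEST_LENGTH];

        memcpy(buf, table_iv, 16);
        memcpy(buf + 16, &block_no, sizeof(block_no));
        SHA256(buf, sizeof(buf), digest);
        memcpy(iv_out, digest, 16);     /* AES block/IV size */
    }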

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +


On Wed, Jun 13, 2018 at 09:20:58AM -0400, Joe Conway wrote:
> On 06/11/2018 05:22 AM, Masahiko Sawada wrote:
> > As per discussion at PGCon unconference, I think that firstly we need
> > to discuss what threats we want to defend database data against.
> 
> Exactly. While certainly there is demand for encryption for the sake of
> "checking a box", different designs will defend against different
> threats, and we should be clear on which ones we are trying to protect
> against for any particular design.

Yep.  This slide covers the various encryption levels and the threats
they protect against:

    http://momjian.us/main/writings/crypto_hw_use.pdf#page=97

I do not have page-level encryption listed since that is not currently
possible with Postgres.

> > Also, if I understand correctly, at unconference session there also
> > were two suggestions about the design other than the suggestion by
> > Alexander: implementing TDE at column level using POLICY, and
> > implementing TDE at table-space level. The former was suggested by Joe
> > but I'm not sure the detail of that suggestion. I'd love to hear the
> details of that suggestion.
> 
> The idea has not been extensively fleshed out yet, but the thought was
> that we create column level POLICY, which would transparently apply some
> kind of transform on input and/or output. The transforms would
> presumably be expressions, which in turn could use functions (extension
> or builtin) to do their work. That would allow encryption/decryption,
> DLP (data loss prevention) schemes (masking, redacting), etc. to be
> applied based on the policies.

This is currently possible with stock Postgres as you can see from this
and the following slides:

    http://momjian.us/main/writings/crypto_hw_use.pdf#page=77

> This, in and of itself, would not address key management. There is
> probably a separate need for some kind of built in key management --
> perhaps a flexible way to integrate with external systems such as Vault
> for example, or maybe something self contained, or perhaps both. Or
> maybe key management is really tied into the separately discussed effort
> to create SQL VARIABLEs somehow.

I cover key management in this slide, and following:

    http://momjian.us/main/writings/crypto_hw_use.pdf#page=53

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +


On Mon, Jun 18, 2018 at 08:29:32AM -0400, Joe Conway wrote:
> >> Or
> >> maybe key management is really tied into the separately discussed effort
> >> to create SQL VARIABLEs somehow.
> > 
> > Could you elaborate on how key management is tied into SQL VARIABLEs?
> 
> Well, the key management probably is not, but the SQL VARIABLE might be
> where the key is stored for use.

I disagree.  I would need to understand how an extension actually helps
here, because it certainly limits flexibility compared to a shell
command.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +


On Mon, Jun 18, 2018 at 09:49:20AM -0400, Robert Haas wrote:
> figure stuff out.  You can infer something about the length of
> particular values.  Perhaps you can find cases where the same
> encrypted value appears multiple times.  If there's a btree index, you

Most encryption methods use a random initialization vector (IV) for each
encryption, e.g. pgp_sym_encrypt(), so repeated values encrypt
differently; but the length might still allow this, as you stated.

> know the ordering of the values under whatever ordering semantics
> apply to that index.  It's unclear to me how useful such information

I don't think an ordered index is possible, only indexing of encrypted
hashes, i.e. see this and the next slide:

    https://momjian.us/main/writings/crypto_hw_use.pdf#page=86
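
A hedged sketch of the idea (table, column, and key names are
hypothetical):

    -- Index a keyed hash of the plaintext; equality lookups work, but
    -- no ordering information is exposed.
    CREATE TABLE cards (id int PRIMARY KEY, cc_enc bytea, cc_hmac bytea);
    CREATE INDEX cards_cc_hmac_idx ON cards (cc_hmac);

    -- Look up a value by recomputing its HMAC with the index key.
    SELECT id FROM cards
     WHERE cc_hmac = hmac('4111111111111111', 'index-key', 'sha256');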

> would be in practice or to what extent it might allow you to attack
> the underlying cryptography, but it seems like there might be cases
> where the information leakage is significant.  For example, suppose
> you're trying to determine which partially-encrypted record is that of
> Aaron Aardvark... or this guy:
> https://en.wikipedia.org/wiki/Hubert_Blaine_Wolfeschlegelsteinhausenbergerdorff,_Sr.
> 
> Recently, it was suggested to me that a use case for column-level
> encryption might be to prevent casual DBA snooping.  So, you'd want
> the data to appear in pg_dump output encrypted, because the DBA might
> otherwise look at it, but you wouldn't really be concerned about the
> threat of the DBA loading a hostile C module that would steal user
> keys and use them to decrypt all the data, because they don't care
> that much and would be fired if they were caught doing it.

Yes, that is a benefit that is not possible with page-level encryption. 
It also encrypts the WAL and backups automatically;  see:

    http://momjian.us/main/writings/crypto_hw_use.pdf#page=97

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +


On Mon, Jun 18, 2018 at 11:06:20AM -0400, Joe Conway wrote:
> > At the same time, having to have a bunch of independently-decipherable
> > short field values is not real secure either, especially if they're known
> > to all be encrypted with the same key.  But what you know or can guess
> > about the plaintext in such cases would be target-specific, rather than
> > an attack that could be built once and used against any PG database.
> 
> Again is dependent on the specific solution for encryption. In some
> cases you might do something like generate a single use random key,
> encrypt the payload with that, encrypt the single use key with the
> "global" key, append the two results and store.

Even if they are encrypted with the same key, they use different
initialization vectors that are stored inside the encrypted payload, so
you really can't identify much except the length, as Robert stated.
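
You can see this with pgcrypto directly, since pgp_sym_encrypt() picks
fresh randomness on every call:

    -- Encrypting the same value twice with the same key yields
    -- different ciphertexts:
    SELECT pgp_sym_encrypt('secret', 'passphrase')
         = pgp_sym_encrypt('secret', 'passphrase');   -- false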

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +


On Mon, Jun 18, 2018 at 12:29:57PM -0500, Nico Williams wrote:
> On Mon, Jun 11, 2018 at 06:22:22PM +0900, Masahiko Sawada wrote:
> > As per discussion at PGCon unconference, I think that firstly we need
> > to discuss what threats we want to defend database data against. If
> 
> We call that a threat model.  There can be many threat models, of
> course.
> 
> > user wants to defend against a threat that is malicious user who
> > logged in OS or database steals an important data on database this
> > design TDE would not help. Because such user can steal the data by
> > getting a memory dump or by SQL. That is of course differs depending
> > on system requirements or security compliance but what threats do you
> > want to defend database data against? and why?
> 
> This design guards (somewhat) against the threat of the storage theft
> (e.g., because the storage is remote).  It's a fine threat model to
> address, but it's also a lot easier to address in the filesystem or
> device drivers -- there's no need to do this in PostgreSQL itself except
> so as to support it on all platforms regardless of OS capabilities.
> 
> Note that unless the pg_catalog is protected against manipulation by
> remote storage, then TDE for user tables might be possible to
> compromise.  Like so: the attacker manipulates the pg_catalog to
> escalate privilege in order to obtain the TDE keys.  This argues for
> full database encryption, not just specific tables or columns.  But
> again, this is for the threat model where the storage is the threat.

Yes, one big problem with per-column encryption is that administrators
can silently delete data, though they can't add or modify it.

> Another similar threat model is dump management, where dumps are sent
> off-site where untrusted users might read them, or even edit them in the
> hopes that they will be used for restores and thus compromise the
> database.  This is most easily addressed by just encrypting the backups
> externally to PG.
> 
> Threat models where client users are the threat are easily handled by
> PG's permissions system.
> 
> I think any threat model where DBAs are not the threat is just not that
> interesting to address with crypto within postgres itself...


Yes, but in my analysis the only solution there is client-side
encryption:

    http://momjian.us/main/writings/crypto_hw_use.pdf#page=97

You might want to look at the earlier slides too.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +


On Wed, Jun 20, 2018 at 05:16:46PM -0400, Bruce Momjian wrote:
> On Mon, Jun 18, 2018 at 12:29:57PM -0500, Nico Williams wrote:
> > Note that unless the pg_catalog is protected against manipulation by
> > remote storage, then TDE for user tables might be possible to
> > compromise.  Like so: the attacker manipulates the pg_catalog to
> > escalate privilege in order to obtain the TDE keys.  This argues for
> > full database encryption, not just specific tables or columns.  But
> > again, this is for the threat model where the storage is the threat.
> 
> Yes, one big problem with per-column encryption is that administrators
> can silently delete data, though they can't add or modify it.

They can also re-add ("replay") deleted values; this can only be
defeated by also binding TX IDs or the like in the ciphertext.  And if you
don't bind the encrypted values to the PKs then they can add any value
they've seen to different rows.

One can protect to some degree against replay and reuse attacks, but
protecting against silent deletion is much harder.  Protecting against
the rows (or the entire DB) being restored at a past point in time is
even harder -- you quickly end up wanting Merkle hash/MAC trees and key
rotation, but this complicates everything and is performance killing.

> > I think any threat model where DBAs are not the threat is just not that
> > interesting to address with crypto within postgres itself...
> 
> Yes, but in my analysis the only solution there is client-side
> encryption:

For which threat model?

For threat models where the DBAs are not the threat there's no need for
client-side encryption: just encrypt the storage at the postgres
instance (with encrypting device drivers or -preferably- filesystems).

For threat models where the DBAs are the threat then yes, client-side
encryption works (or server-side encryption to public keys), but you
must still bind the encrypted values to the primary keys, and you must
provide integrity protection for as much data as possible -- see above.

Client-side crypto is hard to do well and still get decent performance.
So on the whole I think that crypto is a poor fit for the DBAs-are-the-
threat threat model.  It's better to reduce the number of DBAs/sysadmins
and audit all privileged (and, for good measure, unprivileged) access.

Client-side encryption, of course, wouldn't be a feature of PG..., as PG
is mostly a very smart server + very dumb clients.  The client could be
a lot smarter, for sure -- it could be a full-fledged RDBMS, it could
even be a postgres instance accessing the real server via FDW.

For example, libgda (GNOME Data Access, IIRC) is a smart client
that uses SQLite3 to access remote resources via virtual table
extensions that function a lot like PG's FDW.  This works well because
SQLite3 is embeddable and light-weight.  PG wouldn't fit that bill as
well, but one could start a PG instance to proxy a remote one via FDW,
with crypto done in the proxy.

>     http://momjian.us/main/writings/crypto_hw_use.pdf#page=97
> 
> You might want to look at the earlier slides too.

I will, thanks.

Nico
-- 


On 06/20/2018 05:09 PM, Bruce Momjian wrote:
> On Mon, Jun 18, 2018 at 09:49:20AM -0400, Robert Haas wrote:
>> know the ordering of the values under whatever ordering semantics
>> apply to that index.  It's unclear to me how useful such information
> 
> I don't think an ordered index is possible, only indexing of encrypted
> hashes, i.e. see this and the next slide:

It is possible with homomorphic encryption -- whether we want to support
that in core is another matter.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


On 06/20/2018 05:05 PM, Bruce Momjian wrote:
> On Mon, Jun 18, 2018 at 08:29:32AM -0400, Joe Conway wrote:
>>>> Or
>>>> maybe key management is really tied into the separately discussed effort
>>>> to create SQL VARIABLEs somehow.
>>>
>>> Could you elaborate on how key management is tied into SQL VARIABLEs?
>>
>> Well, the key management probably is not, but the SQL VARIABLE might be
>> where the key is stored for use.
> 
> I disagree.  I would need to understand how an extension actually helps
> here, because it certainly limits flexibility compared to a shell
> command.

That flexibility the shell command gives you is also a huge hole from a
security standpoint.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


On Wed, Jun 20, 2018 at 06:06:56PM -0400, Joe Conway wrote:
> On 06/20/2018 05:09 PM, Bruce Momjian wrote:
> > On Mon, Jun 18, 2018 at 09:49:20AM -0400, Robert Haas wrote:
> >> know the ordering of the values under whatever ordering semantics
> >> apply to that index.  It's unclear to me how useful such information
> > 
> > I don't think an ordered index is possible, only indexing of encrypted
> > hashes, i.e. see this and the next slide:
> 
> It is possible with homomorphic encryption -- whether we want to support
> that in core is another matter.

It's also possible using DNSSEC NSEC3-style designs.

Nico
-- 


On 06/20/2018 05:03 PM, Bruce Momjian wrote:
> On Wed, Jun 13, 2018 at 09:20:58AM -0400, Joe Conway wrote:
>> The idea has not been extensively fleshed out yet, but the thought was
>> that we create column level POLICY, which would transparently apply some
>> kind of transform on input and/or output. The transforms would
>> presumably be expressions, which in turn could use functions (extension
>> or builtin) to do their work. That would allow encryption/decryption,
>> DLP (data loss prevention) schemes (masking, redacting), etc. to be
>> applied based on the policies.
> 
> This is currently possible with stock Postgres as you can see from this
> and the following slides:
> 
>     http://momjian.us/main/writings/crypto_hw_use.pdf#page=77

That is definitely not the same thing. A column-level POLICY would apply
an input and output transform expression over the column transparently
to the database user. That transform might produce, for example, a
different output depending on the logged-in user (certain users see the
entire field whereas other users see a redacted or masked form, or
certain users get the decrypted result while others don't).
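
Something with a similar effect can be approximated today with a plain
view (names are hypothetical), though not transparently to the user the
way a POLICY would be:

    CREATE TABLE people (id int PRIMARY KEY, ssn text);

    -- Privileged users see the field; everyone else gets a masked form.
    CREATE VIEW people_masked AS
    SELECT id,
           CASE WHEN current_user = 'auditor'
                THEN ssn
                ELSE 'XXX-XX-' || right(ssn, 4)
           END AS ssn
      FROM people;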

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


On 06/20/2018 05:12 PM, Bruce Momjian wrote:
> On Mon, Jun 18, 2018 at 11:06:20AM -0400, Joe Conway wrote:
>>> At the same time, having to have a bunch of independently-decipherable
>>> short field values is not real secure either, especially if they're known
>>> to all be encrypted with the same key.  But what you know or can guess
>>> about the plaintext in such cases would be target-specific, rather than
>>> an attack that could be built once and used against any PG database.
>>
>> Again is dependent on the specific solution for encryption. In some
>> cases you might do something like generate a single use random key,
>> encrypt the payload with that, encrypt the single use key with the
>> "global" key, append the two results and store.
> 
> Even if they are encrypted with the same key, they use different
> initialization vectors that are stored inside the encrypted payload, so
> you really can't identify much except the length, as Robert stated.

The more you encrypt with a single key, the more fuel you give to the
person trying to solve for the key with cryptanalysis.

By encrypting only essentially random data (the single use keys,
generated with cryptographically strong random number generator) with
the "master key", and then encrypting the actual payloads (which are
presumably more predictable than the strong random single use keys), you
minimize the probability of someone cracking your master key and you
also minimize the damage caused by someone cracking one of the single
use keys.
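
A rough pgcrypto sketch of that scheme (the table and passphrase
handling are hypothetical):

    CREATE TABLE vault (
        id          serial PRIMARY KEY,
        wrapped_dek bytea,   -- single-use key, encrypted with the master key
        payload_enc bytea    -- payload, encrypted with the single-use key
    );

    -- Encrypt: generate a fresh random data-encryption key (DEK) per row.
    WITH k AS (SELECT encode(gen_random_bytes(32), 'hex') AS dek)
    INSERT INTO vault (wrapped_dek, payload_enc)
    SELECT pgp_sym_encrypt(dek, 'master-passphrase'),
           pgp_sym_encrypt('the payload', dek)
      FROM k;

    -- Decrypt: unwrap the DEK first, then the payload.
    SELECT pgp_sym_decrypt(payload_enc,
                           pgp_sym_decrypt(wrapped_dek, 'master-passphrase'))
      FROM vault;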

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


On Wed, Jun 20, 2018 at 06:19:40PM -0400, Joe Conway wrote:
> On 06/20/2018 05:12 PM, Bruce Momjian wrote:
> > On Mon, Jun 18, 2018 at 11:06:20AM -0400, Joe Conway wrote:
> > Even if they are encrypted with the same key, they use different
> > initialization vectors that are stored inside the encrypted payload, so
> > you really can't identify much except the length, as Robert stated.

Definitely use different IVs, and don't reuse them (or use cipher modes
where IV reuse is not fatal).

> The more you encrypt with a single key, the more fuel you give to the
> person trying to solve for the key with cryptanalysis.

With modern 128-bit block ciphers in modern cipher modes you'd have to
encrypt a staggering amount of data before this becomes a problem.  On the other hand,
you'll still have other reasons to do key rotation.  Key rotation
ultimately means re-encrypting everything.  Getting all of this right is
very difficult.

So again, what's the threat model?  Because if it's sysadmins/DBAs
you're afraid of, there are better things to do.

Nico
-- 


From: Bruce Momjian [mailto:bruce@momjian.us]
> On Fri, May 25, 2018 at 08:41:46PM +0900, Moon, Insung wrote:
> > BTW, I want to support CBC mode encryption[3]. However, I'm not sure how
> to use the IV in CBC mode for this proposal.
> > I'd like to hear opinions by security engineer.
> 
> Well, CBC makes sense, and since AES uses a 16 byte block size, you
> would start with the initialization vector (IV) and run over the 8k page
> 512 times.  The IV can be any random value that is not repeated, and
> does not need to be secret.

XTS is faster (blocks are processed independently, so it parallelizes) and
more secure for storage encryption.  XTS seems to be the standard now:

https://www.truecrypt71a.com/documentation/technical-details/encryption-scheme/
"c.Mode of operation: XTS, LRW (deprecated/legacy), CBC (deprecated/legacy)"

Microsoft Introduces AES-XTS to BitLocker in Windows 10 Version 1511
https://www.petri.com/microsoft-introduces-aes-xts-to-bitlocker-in-windows-10-version-1511


> However, using the same IV for the entire table would mean that people
> can detect if two pages in the same table contain the same data.  You
> might care about that, or you might not.  It would prevent detection of
> two _tables_ containing the same 8k page.  A more secure solution would
> be to use a different IV for each 8k page.
> 
> The cleanest idea would be for the per-table IV to be stored per table,
> but the IV used for each block to be a mixture of the table's IV and the
> page's offset in the table.

TrueCrypt uses the 8-byte sector number for the 16-byte tweak value for XTS when encrypting each sector.  Maybe we can
just use the page number.


Regards
Takayuki Tsunakawa





On Thu, Jun 21, 2018 at 6:57 AM, Nico Williams <nico@cryptonector.com> wrote:
> On Wed, Jun 20, 2018 at 05:16:46PM -0400, Bruce Momjian wrote:
>> On Mon, Jun 18, 2018 at 12:29:57PM -0500, Nico Williams wrote:
>> > Note that unless the pg_catalog is protected against manipulation by
>> > remote storage, then TDE for user tables might be possible to
>> > compromise.  Like so: the attacker manipulates the pg_catalog to
>> > escalate privilege in order to obtain the TDE keys.  This argues for
>> > full database encryption, not just specific tables or columns.  But
>> > again, this is for the threat model where the storage is the threat.
>>
>> Yes, one big problem with per-column encryption is that administrators
>> can silently delete data, though they can't add or modify it.
>
> They can also re-add ("replay") deleted values; this can only be
> defeated by also binding TX IDs or the like in the ciphertext.  And if you
> don't bind the encrypted values to the PKs then they can add any value
> they've seen to different rows.

I think we could avoid that in the implementation. If we implement
per-column encryption by moving all encrypted columns out to another
table, like a TOAST table, and encrypting that whole external table,
then we can do per-column encryption without such concerns. Also, that
way we can encrypt data at disk I/O time even if we use per-column
encryption, which would give better performance. A downside of this
idea is the extra overhead of accessing an encrypted column, but it
would be predictable since we have TOAST.
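
Purely as an illustration of that layout (all names are hypothetical;
the real thing would be internal to the backend, not user DDL):

    -- Encrypted columns live in a TOAST-like side table whose blocks
    -- are encrypted as a whole at disk I/O time; the main table keeps
    -- only a reference.
    CREATE TABLE users (
        id         int PRIMARY KEY,
        name       text,
        secret_ref bigint      -- points into the encrypted side table
    );

    CREATE TABLE pg_encrypted_users (  -- whole relation encrypted at block level
        ref   bigint PRIMARY KEY,
        value bytea
    );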

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


On Thu, Jun 21, 2018 at 10:05:41AM +0900, Masahiko Sawada wrote:
> On Thu, Jun 21, 2018 at 6:57 AM, Nico Williams <nico@cryptonector.com> wrote:
> > On Wed, Jun 20, 2018 at 05:16:46PM -0400, Bruce Momjian wrote:
> >> On Mon, Jun 18, 2018 at 12:29:57PM -0500, Nico Williams wrote:
> >> > Note that unless the pg_catalog is protected against manipulation by
> >> > remote storage, then TDE for user tables might be possible to
> >> > compromise.  Like so: the attacker manipulates the pg_catalog to
> >> > escalate privilege in order to obtain the TDE keys.  This argues for
> >> > full database encryption, not just specific tables or columns.  But
> >> > again, this is for the threat model where the storage is the threat.
> >>
> >> Yes, one big problem with per-column encryption is that administrators
> >> can silently delete data, though they can't add or modify it.
> >
> > They can also re-add ("replay") deleted values; this can only be
> > defeated by also binding TX IDs or the like in the ciphertext.  And if you
> > don't bind the encrypted values to the PKs then they can add any value
> > they've seen to different rows.
> 
> I think we could avoid that in the implementation. If we implement
> per-column encryption by moving all encrypted columns out to another
> table, like a TOAST table, and encrypting that whole external table,
> then we can do per-column encryption without such concerns. Also, that
> way we can encrypt data at disk I/O time even if we use per-column
> encryption, which would give better performance. A downside of this
> idea is the extra overhead of accessing an encrypted column, but it
> would be predictable since we have TOAST.

The case we were discussing was one where the threat model is that the
DBAs are the threat.  It is only in that case that the replay,
cut-n-paste, and silent deletion attacks are relevant.  Encrypting a
table, or the whole DB, on the server side, does nothing to protect
against that threat.

Never lose track of the threat model.

Nico
-- 


On Thu, Jun 21, 2018 at 2:53 PM, Nico Williams <nico@cryptonector.com> wrote:
> On Thu, Jun 21, 2018 at 10:05:41AM +0900, Masahiko Sawada wrote:
>> On Thu, Jun 21, 2018 at 6:57 AM, Nico Williams <nico@cryptonector.com> wrote:
>> > On Wed, Jun 20, 2018 at 05:16:46PM -0400, Bruce Momjian wrote:
>> >> On Mon, Jun 18, 2018 at 12:29:57PM -0500, Nico Williams wrote:
>> >> > Note that unless the pg_catalog is protected against manipulation by
>> >> > remote storage, then TDE for user tables might be possible to
>> >> > compromise.  Like so: the attacker manipulates the pg_catalog to
>> >> > escalate privilege in order to obtain the TDE keys.  This argues for
>> >> > full database encryption, not just specific tables or columns.  But
>> >> > again, this is for the threat model where the storage is the threat.
>> >>
>> >> Yes, one big problem with per-column encryption is that administrators
>> >> can silently delete data, though they can't add or modify it.
>> >
>> > They can also re-add ("replay") deleted values; this can only be
>> > defeated by also binding TX IDs or the like in the ciphertext.  And if you
>> > don't bind the encrypted values to the PKs then they can add any value
>> > they've seen to different rows.
>>
>> I think we could avoid that in the implementation. If we implement
>> per-column encryption by moving all encrypted columns out to another
>> table, like a TOAST table, and encrypting that whole external table,
>> then we can do per-column encryption without such concerns. Also, that
>> way we can encrypt data at disk I/O time even if we use per-column
>> encryption, which would give better performance. A downside of this
>> idea is the extra overhead of accessing an encrypted column, but it
>> would be predictable since we have TOAST.
>
> The case we were discussing was one where the threat model is that the
> DBAs are the threat.  It is only in that case that the replay,
> cut-n-paste, and silent deletion attacks are relevant.  Encrypting a
> table, or the whole DB, on the server side, does nothing to protect
> against that threat.
>
> Never lose track of the threat model.
>

Understood.

>> On Thu, Jun 21, 2018 at 6:57 AM, Nico Williams <nico@cryptonector.com> wrote:
>> So on the whole I think that crypto is a poor fit for the DBAs-are-the-
>> threat threat model.  It's better to reduce the number of DBAs/sysadmins
>> and audit all privileged (and, for good measure, unprivileged) access.

I agree with this. In-database data encryption can mainly defend
against the threat of storage theft and the threat of a memory dump
attack. I'm sure this design was proposed for the former purpose. If
we want to defend against the latter, we must encrypt data even in
database memory. To be honest, I'm not sure there is a practical need
to defend against memory dump attacks. What users often need is to
defend against the threat of storage theft with minimum performance
overhead. It's known that client-side encryption, or encryption in
database memory, adds additional performance overhead. So it would be
better to have several ways to defend against different threats, as
Joe mentioned.

As long as we encrypt data transparently in the database, both the
encryption of the network between database server and client and the
encryption of logical backups (e.g. pg_dump) can be a problem. For
network encryption we can use SSL for now, but for logical backups we
need to address it in other ways.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


On Wed, Jun 20, 2018 at 04:57:18PM -0500, Nico Williams wrote:
> On Wed, Jun 20, 2018 at 05:16:46PM -0400, Bruce Momjian wrote:
> > On Mon, Jun 18, 2018 at 12:29:57PM -0500, Nico Williams wrote:
> > > Note that unless the pg_catalog is protected against manipulation by
> > > remote storage, then TDE for user tables might be possible to
> > > compromise.  Like so: the attacker manipulates the pg_catalog to
> > > escalate privilege in order to obtain the TDE keys.  This argues for
> > > full database encryption, not just specific tables or columns.  But
> > > again, this is for the threat model where the storage is the threat.
> > 
> > Yes, one big problem with per-column encryption is that administrators
> > can silently delete data, though they can't add or modify it.
> 
> They can also re-add ("replay") deleted values; this can only be
> defeated by also binding TX IDs or the like in the ciphertext.  And if you

Yes, and if you bind TX IDs so you can detect loss, you effectively have
to serialize every transaction, which is going to kill performance.

> don't bind the encrypted values to the PKs then they can add any value
> they've seen to different rows.

Yep, you kind of have to add the primary key into the encrypted value.

> One can protect to some degree against replay and reuse attacks, but
> protecting against silent deletion is much harder.  Protecting against
> the rows (or the entire DB) being restored at a past point in time is
> even harder -- you quickly end up wanting Merkle hash/MAC trees and key
> rotation, but this complicates everything and is performance killing.

Yep.

> > > I think any threat model where DBAs are not the threat is just not that
> > > interesting to address with crypto within postgres itself...
> > 
> > Yes, but in my analysis the only solution there is client-side
> > encryption:
> 
> For which threat model?
> 
> For threat models where the DBAs are not the threat there's no need for
> client-side encryption: just encrypt the storage at the postgres
> instance (with encrypting device drivers or -preferably- filesystems).

Agreed.

> For threat models where the DBAs are the threat then yes, client-side
> encryption works (or server-side encryption to public keys), but you
> must still bind the encrypted values to the primary keys, and you must
> provide integrity protection for as much data as possible -- see above.

Yep.

> Client-side crypto is hard to do well and still get decent performance.
> So on the whole I think that crypto is a poor fit for the DBAs-are-the-
> threat threat model.  It's better to reduce the number of DBAs/sysadmins
> and audit all privileged (and, for good measure, unprivileged) access.

Yeah, kind of.  There is the value of preventing accidental viewing of
the data by the DBA, and of course WAL and backup encryption are nice.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +


On Thu, Jun 21, 2018 at 04:49:34PM +0900, Masahiko Sawada wrote:
> >> On Thu, Jun 21, 2018 at 6:57 AM, Nico Williams <nico@cryptonector.com> wrote:
> >> So on the whole I think that crypto is a poor fit for the DBAs-are-the-
> >> threat threat model.  It's better to reduce the number of DBAs/sysadmins
> >> and audit all privileged (and, for good measure, unprivileged) access.
> 
> I agree with this. In-database data encryption can mainly defend
> against the threat of storage theft and the threat of a memory dump
> attack. I'm sure this design was proposed for the former purpose. If
> we want to defend against the latter, we must encrypt data even in
> database memory. To be honest, I'm not sure there is a practical need
> to defend against memory dump attacks. What users often need is to
> defend against the threat of storage theft with minimum performance
> overhead. It's known that client-side encryption, or encryption in
> database memory, adds additional performance overhead. So it would be
> better to have several ways to defend against different threats, as
> Joe mentioned.

If you can view memory you can't really trust the server and have to do
encryption client-side.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +


On Wed, Jun 20, 2018 at 06:19:40PM -0400, Joe Conway wrote:
> On 06/20/2018 05:12 PM, Bruce Momjian wrote:
> > On Mon, Jun 18, 2018 at 11:06:20AM -0400, Joe Conway wrote:
> >>> At the same time, having to have a bunch of independently-decipherable
> >>> short field values is not real secure either, especially if they're known
> >>> to all be encrypted with the same key.  But what you know or can guess
> >>> about the plaintext in such cases would be target-specific, rather than
> >>> an attack that could be built once and used against any PG database.
> >>
> >> Again is dependent on the specific solution for encryption. In some
> >> cases you might do something like generate a single use random key,
> >> encrypt the payload with that, encrypt the single use key with the
> >> "global" key, append the two results and store.
> > 
> > Even if they are encrypted with the same key, they use different
> > initialization vectors that are stored inside the encrypted payload, so
> > you really can't identify much except the length, as Robert stated.
> 
> The more you encrypt with a single key, the more fuel you give to the
> person trying to solve for the key with cryptanalysis.
> 
> By encrypting only essentially random data (the single use keys,
> generated with cryptographically strong random number generator) with
> the "master key", and then encrypting the actual payloads (which are
> presumably more predictable than the strong random single use keys), you
> minimize the probability of someone cracking your master key and you
> also minimize the damage caused by someone cracking one of the single
> use keys.

Yeah, I have a slide about that too, and the previous and next slide:

    http://momjian.us/main/writings/crypto_hw_use.pdf#page=90

The more different keys you use to encrypt data, the more places you
have to store them.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +


On Wed, Jun 20, 2018 at 05:28:43PM -0500, Nico Williams wrote:
> On Wed, Jun 20, 2018 at 06:19:40PM -0400, Joe Conway wrote:
> > On 06/20/2018 05:12 PM, Bruce Momjian wrote:
> > > On Mon, Jun 18, 2018 at 11:06:20AM -0400, Joe Conway wrote:
> > > Even if they are encrypted with the same key, they use different
> > > initialization vectors that are stored inside the encrypted payload, so
> > > you really can't identify much except the length, as Robert stated.
> 
> Definitely use different IVs, and don't reuse them (or use cipher modes
> where IV reuse is not fatal).
> 
> > The more you encrypt with a single key, the more fuel you give to the
> > person trying to solve for the key with cryptanalysis.
> 
> With modern 128-bit block ciphers in modern cipher modes you'd have to
> encrypt a staggering amount of data before this becomes a problem.  On the other hand,
> you'll still have other reasons to do key rotation.  Key rotation
> ultimately means re-encrypting everything.  Getting all of this right is
> very difficult.
> 
> So again, what's the threat model?  Because if it's sysadmins/DBAs
> you're afraid of, there are better things to do.

Agreed.  Databases just don't match the typical cryptographic
solutions and threat models.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +


On Thu, Jun 21, 2018 at 10:14:54AM -0400, Bruce Momjian wrote:
> On Wed, Jun 20, 2018 at 04:57:18PM -0500, Nico Williams wrote:
> > Client-side crypto is hard to do well and still get decent performance.
> > So on the whole I think that crypto is a poor fit for the DBAs-are-the-
> > threat threat model.  It's better to reduce the number of DBAs/sysadmins
> > and audit all privileged (and, for good measure, unprivileged) access.
> 
> Yeah, kind of.  There is the value of preventing accidental viewing of
> the data by the DBA, and of course WAL and backup encryption are nice.

One generally does not use crypto to prevent "accidental" viewing of
plaintext, but to provide real security relative to specific threats.

If you stop at encrypting values with no integrity protection for the
PKs, and no binding to TX IDs and such, you will indeed protect against
accidental viewing of the plaintext, but not against a determined
malicious insider.

Is that worthwhile?  Remember: you'll have to reduce and audit sysadmin
& DBA access anyways.

There is also the risk that users won't understand the limitations of
this sort of encryption feature and might get a false sense of security
from [mis]using it.

I'd want documentation to make it absolutely clear that such a feature
is only meant to reduce the risk of accidental viewing of plaintext by
DBAs and not a real security feature.

Nico
-- 


On Fri, May 25, 2018 at 08:41:46PM +0900, Moon, Insung wrote:
> Issues on data encryption of PostgreSQL
> ==========
> Currently, in PostgreSQL, data encryption can be using pgcrypto Tool.
> However, it is inconvenient to use pgcrypto to encrypts data in some cases.
> 
> There are two significant inconveniences.
> 
> First, if we use pgcrypto to encrypt/decrypt data, we must call pgcrypto functions everywhere we encrypt/decrypt.

Not so.  VIEWs with INSTEAD OF triggers allow you to avoid this.
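
A minimal sketch (names and the in-SQL passphrase are hypothetical and
deliberately naive; real key handling would differ):

    CREATE TABLE secrets_enc (id int PRIMARY KEY, payload bytea);

    -- Applications read and write the view; only the view body and the
    -- trigger ever call pgcrypto.
    CREATE VIEW secrets AS
    SELECT id, pgp_sym_decrypt(payload, 'passphrase') AS payload
      FROM secrets_enc;

    CREATE FUNCTION secrets_ins() RETURNS trigger LANGUAGE plpgsql AS $$
    BEGIN
        INSERT INTO secrets_enc
        VALUES (NEW.id, pgp_sym_encrypt(NEW.payload, 'passphrase'));
        RETURN NEW;
    END $$;

    CREATE TRIGGER secrets_insert INSTEAD OF INSERT ON secrets
        FOR EACH ROW EXECUTE PROCEDURE secrets_ins();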

> Second, we must modify application program code much if we want to do
> database migration to PostgreSQL from other databases that is using
> TDE.

Not so.  See above.

However, I have at times been told that I should use SQL Server or
whatever because it has column encryption.  My answer is always that it
doesn't help (see my other posts on this thread), but from a business
perspective I understand the problem: the competition has a shiny (if
useless) feature XYZ, therefore we must have it also.

I'm not opposed to PG providing encryption features similar to the
competition's provided the documentation makes their false-sense-of-
security dangers clear.

Incidentally, PG w/ pgcrypto and FDW does provide everything one needs
to be able to implement client-side crypto:

 - use PG w/ FDW as a client-side proxy for the real DB
 - use pgcrypto in VIEWs with INSTEAD OF triggers in the proxy
 - access the DB via the proxy

Presto: transparent client-side crypto that protects against DBAs.  See
other posts about properly binding ciphertext and plaintext.

Protection against malicious DBAs is ultimately a very difficult thing
to get right -- if you really have DBAs as a threat and take that threat
seriously then you'll end up implementing a Merkle tree and performance
will go out the window.

> In these discussions, there were requirements necessary to support TDE in PostgreSQL.
> 
> 1) The performance overhead of encryption and decryption database data must be minimized
> 2) Need to support WAL encryption.
> 3) Need to support Key Management Service.

(2) and full database encryption could be done by the filesystem /
device drivers.  I think this is a much better answer than including
encryption in the DB just because it means not adding all that
complexity to PG, though it's not as portable as doing it in the DB (and
this may well be a winning argument).

What (3) looks like depends utterly on the threat model.  We must
discuss threat models first.

The threat models will drive the design, and (1) will drive some
trade-offs.

> Therefore, I'd like to propose the new design of TDE that deals with
> both above requirements.  Since this feature will become very large,
> I'd like to hear opinions from community before starting making the
> patch.

Any discussion of cryptographic applications should start with a
discussion of threat models.  This is not a big hurdle.

Nico
-- 


On Thu, Jun 21, 2018 at 12:12:40PM -0500, Nico Williams wrote:
> On Thu, Jun 21, 2018 at 10:14:54AM -0400, Bruce Momjian wrote:
> > On Wed, Jun 20, 2018 at 04:57:18PM -0500, Nico Williams wrote:
> > > Client-side crypto is hard to do well and still get decent performance.
> > > So on the whole I think that crypto is a poor fit for the DBAs-are-the-
> > > threat threat model.  It's better to reduce the number of DBAs/sysadmins
> > > and audit all privileged (and, for good measure, unprivileged) access.
> > 
> > Yeah, kind of.  There is the value of preventing accidental viewing of
> > the data by the DBA, and of course WAL and backup encryption are nice.
> 
> One generally does not use crypto to prevent "accidental" viewing of
> plaintext, but to provide real security relative to specific threats.
> 
> If you stop at encrypting values with no integrity protection for the
> PKs, and no binding to TX IDs and such, you will indeed protect against
> accidental viewing of the plaintext, but not against a determined
> malicious insider.
> 
> Is that worthwhile?  Remember: you'll have to reduce and audit sysadmin
> & DBA access anyways.
> 
> There is also the risk that users won't understand the limitations of
> this sort of encryption feature and might get a false sense of security
> from [mis]using it.
> 
> I'd want documentation to make it absolutely clear that such a feature
> is only meant to reduce the risk of accidental viewing of plaintext by
> DBAs and not a real security feature.

Agreed.  I can see from this discussion that we have a long way to go
before we can produce something clearly useful, but it will be worth it.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +


On Thu, Jun 21, 2018 at 07:46:35PM -0400, Bruce Momjian wrote:
> Agreed.  I can see from this discussion that we have a long way to go
> before we can produce something clearly useful, but it will be worth it.

Let's start with a set of threat models then.  I'll go first:

1) storage devices as the threat
   a) theft of storage devices
   b) malicious storage device operators

2) malicious backup operators as the threat

3) malicious servers as the threat
   a) compromised servers
   b) insider threat -- rogue admins

4) malicious clients as the threat
   a) compromised clients
   b) insider threat

5) passive adversaries on the network as the threat

6) active adversaries on the network as the threat

7) adversaries on the same host as the server or client


Am I missing any?


For example, modern version control systems that use a Merkle hash tree
have malicious servers as part of their threat model.  Git clients, for
example, can detect non-fast-forward history changes upstream.

For another example, DNSSEC also provides protection against malicious
servers by authenticating not the servers but the _data_.  DNSSEC is a
useful case in point because it's effectively a key/value database that
stores somewhat relational data...


Clearly PG currently covers threat models 4 through 7:

 - passive adversaries on the network (addressed via TLS)
 - active adversaries on the network (addressed via TLS)
 - local adversaries (addressed by standard OS user process isolation)
 - malicious clients (addressed via authentication and authorization)

(1) and (2) can be covered externally:

 - protection against malicious storage or backup operators is trivial
   to provide: just use encrypting filesystems or device drivers, and
   encrypt backups using standard technologies.

One shortcoming of relying on OS functionality for protection against
malicious storage is that not all OSes may provide such functionality.
This could be an argument for implementing full, transparent encryption
for an entire DB in the postgres server.  Not a very compelling
argument, but that's just my opinion -- reasonable people could differ
on this.


PG also authenticates servers, but does nothing to authenticate the data
or functions of the server.  So while PG protects against illegitimate
server impersonators as well as TLS/GSS/SCRAM/... will afford, it does
not protect against rogue server admins nor against compromised servers.


That leaves (3) as the only threat model not covered.  It's also the
most challenging threat model to deal with.

Now, if you're going to protect against malicious servers (insiders)...

 - you can't let the server see any sensitive plaintext (must encrypt it)
 - which includes private/secret keys (the server can't have them, only
   the clients can)
 - you have to not only encrypt but provide integrity protection for
   ciphertext as well as unencrypted plaintext
 - decryption and integrity protection validation can only be done on
   the client (because only they have the necessary secrets!)

There are a lot of choices to make here that will greatly affect any
analysis of the security of the result.

A full analysis will inexorably lead to one conclusion: it's better to
just not have malicious servers (insiders), because if you really have
to defend against them then the only usable models of how to apply
cryptography to the problem are a) Git-like VCS, b) DNSSEC, and both are
rather heavy-duty for a general-purpose RDBMS.

So I think for (3) the best answer is to just not have that problem:
just reduce and audit admin access.

Still, if anyone wants to cover (3), I argue that PG gives you
everything you need right now: FDW and pgcrypto.  Just build a
solution where you have a PG server proxy that acts as a smart
client to untrusted servers:

                  +---------------+       +----------------+
   +--------+     |               |       |                |
   |        |     | postgres      |       | postgres       |
   |        |     | (proxy)       |       | (real server)  |
   | Client |---->|               |------>|                |
   |        |     |               |       |                |
   |        |     | (keys here)   |       | (no keys here) |
   +--------+     |               |       |                |
                  +---------------+       +----------------+

In the proxy use FDW (to talk to the real server) and VIEWs with INSTEAD
OF triggers to do all crypto transparently to the client.

Presto.  Transparent crypto right in your queries and DMLs.

But, you do have to get a number of choices right as to the crypto, and
chances are you won't provide integrity protection for the entire DB
(see above).
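
A sketch of the proxy side (server address, credentials, and key
handling are all hypothetical):

    CREATE EXTENSION postgres_fdw;

    CREATE SERVER real_db FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'real-server.example', dbname 'app');
    CREATE USER MAPPING FOR CURRENT_USER SERVER real_db
        OPTIONS (user 'app', password 'app-password');

    -- The real server stores only ciphertext.
    CREATE FOREIGN TABLE secrets_enc (id int, payload bytea) SERVER real_db;

    -- Decryption happens only here, in the proxy; the same INSTEAD OF
    -- trigger technique handles writes.
    CREATE VIEW secrets AS
    SELECT id, pgp_sym_decrypt(payload, 'proxy-only-passphrase') AS payload
      FROM secrets_enc;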

Nico
-- 


From: Nico Williams [mailto:nico@cryptonector.com]
> Let's start with a set of threat models then.  I'll go first:

Thank you so much for summarizing the current situation.  I'd appreciate it if you could write this on the PostgreSQL
wiki, when the discussion has settled somehow.


>  - local adversaries (addressed by standard OS user process isolation)

Does this also mean that we don't have to worry about the following?

* unencrypted data in the server process memory and core files
* passwords in .pgpass and recovery.conf (someone familiar with PCI DSS audit said this is a problem)
* user data in server logs


> One shortcoming of relying on OS functionality for protection against
> malicious storage is that not all OSes may provide such functionality.
> This could be an argument for implementing full, transparent encryption
> for an entire DB in the postgres server.  Not a very compelling
> argument, but that's just my opinion -- reasonable people could differ
> on this.

Yes, this is one reason I developed TDE in our product.  And in-database encryption allows optimization by encrypting
only user data.


> So I think for (3) the best answer is to just not have that problem:
> just reduce and audit admin access.
> 
> Still, if anyone wants to cover (3), I argue that PG gives you
> everything you need right now: FDW and pgcrypto.  Just build a
> solution where you have a PG server proxy that acts as a smart
> client to untrusted servers:

Does sepgsql help?


Should a malfunctioning or buggy application be considered as a threat?  That's what the sql_firewall extension
addresses.

Regards
Takayuki Tsunakawa





On Fri, Jun 22, 2018 at 05:31:44AM +0000, Tsunakawa, Takayuki wrote:
> From: Nico Williams [mailto:nico@cryptonector.com]
> > Let's start with a set of threat models then.  I'll go first:
> 
> Thank you so much for summarizing the current situation.  I'd
> appreciate it if you could write this on the PostgreSQL wiki, when the
> discussion has settled somehow.

Sure, that's a good idea.

> >  - local adversaries (addressed by standard OS user process isolation)
> 
> Does this also mean that we don't have to worry about the following?
> 
> * unencrypted data in the server process memory and core files
> * passwords in .pgpass and recovery.conf (someone familiar with PCI
>   DSS audit said this is a problem)
> * user data in server logs

Short of using things like Intel SGX or homomorphic encryption, I don't
think we can do anything about plaintext in memory -- at some point it
has to be there, therefore it is as vulnerable as the host OS makes it.

Users can always run only the one postgres instance and nothing else
(and no other non-admin users) on the host to reduce local attack
surface to zero.

So, yes, I think this flavor of local vulnerability should be out of
scope for PG.

> > One shortcoming of relying on OS functionality for protection against
> > malicious storage is that not all OSes may provide such functionality.
> > This could be an argument for implementing full, transparent encryption
> > for an entire DB in the postgres server.  Not a very compelling
> > argument, but that's just my opinion -- reasonable people could differ
> > on this.
> 
> Yes, this is one reason I developed TDE in our product.  And
> in-database encryption allows optimization by encrypting only user
> data.

I understand this motivation.  I wouldn't reject this out of hand, even
though I'm not exactly interested either.

Can you keep the impact on the codebase isolated and limited, and the
performance impact when disabled to zero?

> > So I think for (3) the best answer is to just not have that problem:
> > just reduce and audit admin access.
> > 
> > Still, if anyone wants to cover (3), I argue that PG gives you
> > everything you need right now: FDW and pgcrypto.  Just build a
> > solution where you have a PG server proxy that acts as a smart
> > client to untrusted servers:
> 
> Does sepgsql help?

Any functionality in PG that allows DBAs to manage storage, sessions,
..., without having to see table data will help.  It doesn't have to be
tied to trusted OS functionality.

I've not looked at SEP [0] so I don't know if it helps.  I would prefer
that PG simply have native functionality to allow this sort of
separation -- as I'm not a DBA, I don't really know if PG has this.

[0] https://wiki.postgresql.org/wiki/SEPostgreSQL_SELinux_Overview

> Should a malfunctioning or buggy application be considered as a
> threat?  That's what the sql_firewall extension addresses.

I suppose so, yes.

Nico
-- 


On Fri, Jun 22, 2018 at 2:31 PM, Tsunakawa, Takayuki
<tsunakawa.takay@jp.fujitsu.com> wrote:
> From: Nico Williams [mailto:nico@cryptonector.com]
>> Let's start with a set of threat models then.  I'll go first:
>
> Thank you so much for summarizing the current situation.  I'd appreciate it if you could write this on the PostgreSQL
> wiki, when the discussion has settled somehow.
>
>
>>  - local adversaries (addressed by standard OS user process isolation)
>
> Does this also mean that we don't have to worry about the following?
>
> * unencrypted data in the server process memory and core files
> * passwords in .pgpass and recovery.conf (someone familiar with PCI DSS audit said this is a problem)
> * user data in server logs
>
>
>> One shortcoming of relying on OS functionality for protection against
>> malicious storage is that not all OSes may provide such functionality.
>> This could be an argument for implementing full, transparent encryption
>> for an entire DB in the postgres server.  Not a very compelling
>> argument, but that's just my opinion -- reasonable people could differ
>> on this.
>
> Yes, this is one reason I developed TDE in our product.  And in-database encryption allows optimization by encrypting
> only user data.
>

Me too. In-database encryption is helpful in practice. I think 1) and
2) seem to cover the threat models which in-database data encryption
needs to defend against.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



On 21/06/18 21:43, Nico Williams wrote:
> On Fri, May 25, 2018 at 08:41:46PM +0900, Moon, Insung wrote:
>> Issues on data encryption of PostgreSQL
>> ==========
>> Currently, in PostgreSQL, data encryption can be using pgcrypto Tool.
>> However, it is inconvenient to use pgcrypto to encrypts data in some cases.
>>
>> There are two significant inconveniences.
>>
>> First, if we use pgcrypto to encrypt/decrypt data, we must call pgcrypto functions everywhere we encrypt/decrypt.
> Not so.  VIEWs with INSTEAD OF triggers allow you to avoid this.
>
>> Second, we must modify application program code much if we want to do
>> database migration to PostgreSQL from other databases that is using
>> TDE.
> Not so.  See above.
>
> However, I have at times been told that I should use SQL Server or
> whatever because it has column encryption.  My answer is always that it
> doesn't help (see my other posts on this thread), but from a business
> perspective I understand the problem: the competition has a shiny (if
> useless) feature XYZ, therefore we must have it also.
>
> I'm not opposed to PG providing encryption features similar to the
> competition's provided the documentation makes their false-sense-of-
> security dangers clear.
>
> Incidentally, PG w/ pgcrypto and FDW does provide everything one needs
> to be able to implement client-side crypto:
>
>   - use PG w/ FDW as a client-side proxy for the real DB
>   - use pgcrypto in VIEWs with INSTEAD OF triggers in the proxy
>   - access the DB via the proxy

     Sounds a bit hackish, but it could work. I doubt, however, that a 
solution like this would gain acceptance, given the number of "moving 
parts" and operational complexity associated with it.


>
> Presto: transparent client-side crypto that protects against DBAs.  See
> other posts about properly binding ciphertext and plaintext.
>
> Protection against malicious DBAs is ultimately a very difficult thing
> to get right -- if you really have DBAs as a threat and take that threat
> seriously then you'll end up implementing a Merkle tree and performance
> will go out the window.
>
>> In these discussions, there were requirements necessary to support TDE in PostgreSQL.
>>
>> 1) The performance overhead of encryption and decryption database data must be minimized
>> 2) Need to support WAL encryption.
>> 3) Need to support Key Management Service.
> (2) and full database encryption could be done by the filesystem /
> device drivers.  I think this is a much better answer than including
> encryption in the DB just because it means not adding all that
> complexity to PG, though it's not as portable as doing it in the DB (and
> this may well be a winning argument).
>
> What (3) looks like depends utterly on the threat model.  We must
> discuss threat models first.
>
> The threat models will drive the design, and (1) will drive some
> trade-offs.
>
>> Therefore, I'd like to propose the new design of TDE that deals with
>> both above requirements.  Since this feature will become very large,
>> I'd like to hear opinions from community before starting making the
>> patch.
> Any discussion of cryptographic applications should start with a
> discussion of threat models.  This is not a big hurdle.


     You already mentioned that there are also "business criteria" to 
consider here, and they are important. And there are even more to 
consider. For instance, cases where (1) and even (2) under your proposed 
threat model cannot be fixed by external (device/filesystem) encryption. 
Consider for example the managed database services provided by the cloud 
vendors. While (all?) of them provide transparent disk encryption, are 
they trustworthy? Do businesses want to rely on their encryption scheme, 
key management, and how they respond to requests to hand off encryption 
keys? I believe self-contained solutions are very worthwhile, also 
because of this.


     Álvaro

-- 

Alvaro Hernandez


-----------
OnGres




On 11/06/18 12:22, Masahiko Sawada wrote:
> On Fri, May 25, 2018 at 8:41 PM, Moon, Insung
> <Moon_Insung_i3@lab.ntt.co.jp> wrote:
>> Hello Hackers,
>>
>> This propose a way to develop "Table-level" Transparent Data Encryption (TDE) and Key Management Service (KMS) support in
>> PostgreSQL.
>>
>>
>> Issues on data encryption of PostgreSQL
>> ==========
>> Currently, in PostgreSQL, data encryption can be using pgcrypto Tool.
>> However, it is inconvenient to use pgcrypto to encrypts data in some cases.
>>
>> There are two significant inconveniences.
>>
>> First, if we use pgcrypto to encrypt/decrypt data, we must call pgcrypto functions everywhere we encrypt/decrypt.
>> Second, we must modify application program code much if we want to do database migration to PostgreSQL from other
>> databases that is using TDE.
>>
>> To resolved these inconveniences, many users want to support TDE.
>> There have also been a few proposals, comments, and questions to support TDE in the PostgreSQL community.
>>
>> However, currently PostgreSQL does not support TDE, so in development community, there are discussions whether it's
>> necessary to support TDE or not.
>>
>> In these discussions, there were requirements necessary to support TDE in PostgreSQL.
>>
>> 1) The performance overhead of encryption and decryption database data must be minimized
>> 2) Need to support WAL encryption.
>> 3) Need to support Key Management Service.
>>
>> Therefore, I'd like to propose the new design of TDE that deals with both above requirements.
>> Since this feature will become very large, I'd like to hear opinions from community before starting making the patch.
>>
>> First, my proposal is table-level TDE which is that user can specify tables begin encrypted.
>> Indexes, TOAST table and WAL associated with the table that enables TDE are also encrypted.
>>
>> Moreover, I want to support encryption for large object as well.
>> But I haven't found a good way for it so far. So I'd like to remain it as future TODO.
>>
>> My proposal has five characteristics features of "table-level TDE".
>>
>> 1) Buffer-level data encryption and decryption
>> 2) Per-table encryption
>> 3) 2-tier encryption key management
>> 4) Working with external key management services(KMS)
>> 5) WAL encryption
>>
>> Here are more details for each items.
>>
>>
>> 1. Buffer-level data encryption and decryption
>> ==================
>> Transparent data encryption and decryption accompany by storage operation
>> With ordinally way like using pgcrypto, the biggest problem with encrypted data is the performance overhead of
>> decrypting the data each time the run to queries.
>>
>> My proposal is to encrypt and decrypt data when performing DISK I/O operation to minimize performance overhead.
>> Therefore, the data in the shared memory layer is unencrypted so that performance overhead can minimize.
>>
>> With this design, data encryption/decryption implementations can be developed by modifying the codes of the storage
andbuffer
 
>> manager modules,
>> which are responsible for performing DISK I/O operation.
>>
>>
>> 2. Per-table encryption
>> ==================
>> User can enable TDE per table as they want.
>> I introduce new storage parameter "encryption_enabled" which enables TDE at table-level.
>>
>>      // Generate  the encryption table
>>         CREATE TABLE foo WITH ( ENCRYPTION_ENABLED = ON );
>>
>>      // Change to the non-encryption table
>>         ALTER TABLE foo SET ( ENCRYPTION_ENABLED = OFF );
>>
>> This approach minimizes the overhead for tables that do not require encryption options.
>> For tables that enable TDE, the corresponding table key will be generated with random values, and it's stored into
thenew system
 
>> catalog after being encrypted by the master key.
>>
>> BTW, I want to support CBC mode encryption[3]. However, I'm not sure how to use the IV in CBC mode for this
proposal.
>> I'd like to hear opinions by security engineer.
>>
>>
>> 3. 2-tier encryption key management
>> ==================
>> when it comes time to change cryptographic keys, there is a performance overhead to decryption and re-encryption to
>> all data.
>>
>> To solve this problem we employee 2-tier encryption.
>> 2-tier encryption is All table keys can be stored in the database cluster after being encrypted by the master key, And master keys
>> must be stored at external of PostgreSQL.
>>
>> Therefore, without master key, it is impossible to decrypt the table key. Thus, It is impossible to decrypt the
>> database data.
>>
>> When changing the key, it's not necessary to re-encrypt for all data.
>> We use the new master key only to decrypt and re-encrypt the table key, these operations for minimizing the
>> performance overhead.
>>
>> For table keys, all TDE-enabled tables have different table keys.
>> And for master key, all database have different master keys. Table keys are encrypted by the master key of its own
database.
>> For WAL encryption, we have another cryptographic key. WAL-key is also encrypted by a master key, but it is shared across the
>> database cluster.
>>
>>
>> 4. Working with external key management services(KMS)
>> ==================
>> A key management service is an integrated approach for generating, fetching and managing encryption keys for key
control.
>> They may cover all aspects of security from the secure generation of keys, secure storing keys, and secure fetching keys up to
>> encryption key handling.
>> Also, various types of KMSs are provided by many companies, and users can choose them.
>>
>> Therefore I would like to manage the master key using KMS.
>> Also, my proposal is to create callback APIs(generate_key, fetch_key, store_key) in the form of a plug-in so that users can use many
>> types of KMS as they want.
>>
>> In KMIP protocol and most KMS manage keys by string IDs. We can get keys by key ID from KMS.
>> So in my proposal, all master keys are distinguished by its ID, called "master key ID".
>> The master key ID is made, for example, using the database oid and a sequence number, like <OID>_<SeqNo>. And they are managed in
>> PostgreSQL.
>>
>> When database startup, all master key ID is loaded to shared memory, and they are protected by LWLock.
>>
>> When it comes time to rotate the master keys, run this query.
>>
>>          ALTER SYSTEM ROTATION MASTER KEY;
>>
>> In this query, the master key is rotated with the following step.
>> 1. Generate new master key,
>> 2. Change master key IDs and emit corresponding WAL
>> 3. Re-encrypt all table keys on its database
>>
>> Also during checkpoint, master key IDs on shared memory become a permanent condition.
>>
>>
>> 5. WAL encryption
>> ==================
>> If we encrypt all WAL records, performance overhead can be significant.
>> Therefore, this proposes a method to encrypt only WAL record excluding WAL header when writing WAL on the WAL buffer, instead of
>> encrypting a whole WAL record.
>> WAL encryption key is generated separately when the TDE-enabled table is created the first time. We use 2-tier encryption for WAL
>> encryption as well.
>> So, when it comes time to rotate the WAL encryption key, run this query.
>>
>>          ALTER SYSTEM ROTATION WAL KEY;
>>
>> Next, I will explain how to encrypt WAL.
>>
>> To do this operation, I add a flag to WAL header which indicates whether the subsequent WAL data is encrypted or
not.
>>
>> Then, when we write WAL for encryption table we write "encrypted" WAL on WAL buffer layer.
>>
>> In recovery, we read WAL header and check the flag of encryption, and judges whether WAL must be decrypted.
>> In the case of PITR, we use WAL key ID in the backup file.
>>
>> With this approach, the performance overhead of writing and reading the WAL for unencrypted tables would be almost the same as
>> before.
>>
>>

     I may have missed part of the conversation and/or this may be a 
naïve question, but what about pg_stats? I guess that data should be 
encrypted there too, and I wonder how this would affect the query 
planner and how it could decrypt this information. Also, would a separate 
key be used for the stats?


     Thanks,

     Álvaro


-- 

Alvaro Hernandez


-----------
OnGres



On Mon, Jul 02, 2018 at 06:56:34PM +0300, Alvaro Hernandez wrote:
> On 21/06/18 21:43, Nico Williams wrote:
> >Incidentally, PG w/ pgcrypto and FDW does provide everything one needs
> >to be able to implement client-side crypto:
> >
> >  - use PG w/ FDW as a client-side proxy for the real DB
> >  - use pgcrypto in VIEWs with INSTEAD OF triggers in the proxy
> >  - access the DB via the proxy
> 
> Sounds a bit hackish, but it could work. I doubt however the acceptance
> of a solution like this, given the number of "moving parts" and operational
> complexity associated with it.

Well, you could use SQLite3 instead as the client.  Like how GDA does
it.

I do wish there was a libpostgres -- a light-weight postgres for running
an in-process RDBMS in applications without having to have a separate
set of server processes.  That would work really well for a use case
like this one where you're really going to be using FDW to access the
real data store.

If your objection is to an RDBMS in the application accessing real data
via FDW, well, see all the commentary about threat models.  You really
can't protect against DBAs without client-side crypto (and lots of bad
trade-offs).  You can do the crypto in the application, but you start to
lose the power of SQL.  Anyways, I don't think client-side crypto is the
best answer to the DBA threat -- access reduction + auditing is easier
and better.

In any case, spinning up a postgres instance just for this use case is
easy because it wouldn't have any persistent state to keep locally.
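
A minimal sketch of the proxy arrangement quoted above might look like
the following; all object names, hosts, and credentials here are
hypothetical, made up purely for illustration:

    -- on the client-side proxy instance
    CREATE EXTENSION postgres_fdw;
    CREATE EXTENSION pgcrypto;

    CREATE SERVER real_db FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'db.internal', dbname 'prod');
    CREATE USER MAPPING FOR CURRENT_USER SERVER real_db
        OPTIONS (user 'app', password 'app-pass');

    CREATE FOREIGN TABLE secrets_raw (id int, secret bytea)
        SERVER real_db OPTIONS (table_name 'secrets');

    -- decrypt on read; the key only ever exists in the proxy session
    CREATE VIEW secrets AS
        SELECT id,
               pgp_sym_decrypt(secret, current_setting('app.key')) AS secret
        FROM secrets_raw;

    -- encrypt on write via an INSTEAD OF trigger
    CREATE FUNCTION secrets_insert() RETURNS trigger
    LANGUAGE plpgsql AS $$
    BEGIN
        INSERT INTO secrets_raw VALUES
            (NEW.id, pgp_sym_encrypt(NEW.secret, current_setting('app.key')));
        RETURN NEW;
    END $$;

    CREATE TRIGGER secrets_insert INSTEAD OF INSERT ON secrets
        FOR EACH ROW EXECUTE PROCEDURE secrets_insert();

The application would SET app.key in its proxy session and then read and
write the secrets view; the real DB only ever stores ciphertext.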

> >Any discussion of cryptographic applications should start with a
> >discussion of threat models.  This is not a big hurdle.
> 
> You already mentioned that there are also "business criteria" to
> consider here, and they are important. And there are even more to consider.

The threat model *is* a business criterion.  What are the threats you
want to protect against?  Why aren't you interested in these other
threats?  These are *business* questions.

Of course, one has to understand the issues, including intangibles.  For
example, reputation risk is a huge business concern, but as an
intangible it's easy to miss.  People have a blind spot for intangibles.

> For instance, cases where (1) and even (2) under your proposed threat model
> cannot be fixed by external (device/filesystem) encryption. Consider for
> example the managed database services provided by the cloud vendors. While
> (all?) of them provide transparent disk encryption, are they trust-able? Do

Databases won't be your only cloud security issue.  At some point you're
either using things like SGX or, while you wait for practical, high-
performance homomorphic cryptographic computing, you'll settle for using
the power of contracts and contract law -- contract law is a *business*
tool.

At some point there's not much difference between an insider you can
fire and an insider at a vendor who can fire them (and which vendor you
can fire as well), especially when you can use contract law to get the
vendor to do special things for you, like show you how they do things,
reduce admin access, let you audit them, and so on.

> business want to rely on their encryption scheme, key management, and how
> they respond from requests to hand off encryption keys? I believe
> self-contained solutions are very worth, also because of this.

I don't, because client-side crypto to deal with this is just so very
unsatisfactory.  Besides, what happens when first you move the DB into
the cloud and put a lot of effort into client-side crypto, then you move
the clients into the same cloud too?

And anyways, what was proposed by OP is *server-side* crypto, which
clearly does not help at all in this case.

Nico
-- 


On Mon, Jul 02, 2018 at 06:22:46PM +0900, Masahiko Sawada wrote:
> On Fri, Jun 22, 2018 at 2:31 PM, Tsunakawa, Takayuki
> <tsunakawa.takay@jp.fujitsu.com> wrote:
> > From: Nico Williams [mailto:nico@cryptonector.com]
> >
> >> One shortcoming of relying on OS functionality for protection against
> >> malicious storage is that not all OSes may provide such functionality.
> >> This could be an argument for implementing full, transparent encryption
> >> for an entire DB in the postgres server.  Not a very compelling
> >> argument, but that's just my opinion -- reasonable people could differ
> >> on this.
> >
> > Yes, this is one reason I developed TDE in our product.  And
> > in-database encryption allows optimization by encrypting only user
> > data.

You're likely getting some things terribly wrong.  E.g., integrity
protection.  Most likely you're getting a false sense of security.

> Me too. In-database encryption is helpful in practice. I think 1) and
> 2) seem to cover the threat models which the data encryption in
> database needs to defend.

Yes, but piecemeal encryption seems like a bad idea to me.

Nico
-- 


On Tue, Jul 3, 2018 at 7:16 AM, Nico Williams <nico@cryptonector.com> wrote:
> On Mon, Jul 02, 2018 at 06:22:46PM +0900, Masahiko Sawada wrote:
>> On Fri, Jun 22, 2018 at 2:31 PM, Tsunakawa, Takayuki
>> <tsunakawa.takay@jp.fujitsu.com> wrote:
>> > From: Nico Williams [mailto:nico@cryptonector.com]
>> >
>> >> One shortcoming of relying on OS functionality for protection against
>> >> malicious storage is that not all OSes may provide such functionality.
>> >> This could be an argument for implementing full, transparent encryption
>> >> for an entire DB in the postgres server.  Not a very compelling
>> >> argument, but that's just my opinion -- reasonable people could differ
>> >> on this.
>> >
>> > Yes, this is one reason I developed TDE in our product.  And
>> > in-database encryption allows optimization by encrypting only user
>> > data.
>
> You're likely getting some things terribly wrong.  E.g., integrity
> protection.  Most likely you're getting a false sense of security.
>
>> Me too. In-database encryption is helpful in practice. I think 1) and
>> 2) seem to cover the threat models which the data encryption in
>> database needs to defend.
>
> Yes, but piecemeal encryption seems like a bad idea to me.
>

What do you mean by "piecemeal encryption"? Is it non-whole-database
encryption such as per-table or per-tablespace? If so, could you please
elaborate on why you think so?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Dear Antonin Houska.

> -----Original Message-----
> From: Antonin Houska [mailto:ah@cybertec.at]
> Sent: Tuesday, May 29, 2018 3:23 PM
> To: Moon, Insung
> Cc: pgsql-hackers@postgresql.org
> Subject: Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
>
> Moon, Insung <Moon_Insung_i3@lab.ntt.co.jp> wrote:
>
> This patch seems to implement some of the features you propose, especially encryption of buffers and WAL. I recommend
> you check it so that no effort is
> duplicated:

Yes, encrypting/decrypting between Buffer <-> Disk is the same architecture.
But this idea is not to encrypt all tables; to minimize the performance overhead, it encrypts only the necessary
tables (including the associated XLOG).

Thank you and Best regards.
Moon.

>
> > [4] Recently discussed mail
> >
> > https://www.postgresql.org/message-id/CA%2BCSw_tb3bk5i7if6inZFc3yyf%2B
> > 9HEVNTy51QFBoeUk7UE_V%3Dw%40mail.gmail.com
>
>
>
> --
> Antonin Houska
> Cybertec Schönig & Schönig GmbH
> Gröhrmühlgasse 26, A-2700 Wiener Neustadt
> Web: https://www.cybertec-postgresql.com




Dear Aleksander Alekseev.

> -----Original Message-----
> From: Aleksander Alekseev [mailto:a.alekseev@postgrespro.ru]
> Sent: Thursday, May 31, 2018 10:33 PM
> To: Moon, Insung
> Cc: pgsql-hackers@postgresql.org
> Subject: Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
> 
> Hello Moon,
> 
> I promised to email links to the articles I mentioned during your talk on the PGCon Unconference to this thread. Here
> they are:
> 
> * http://cryptowiki.net/index.php?title=Order-preserving_encryption
> * https://en.wikipedia.org/wiki/Homomorphic_encryption
> 
> Also I realized that I was wrong regarding encryption of the indexes since they will be encrypted on the block level the
> same way the heap will be.

Sorry, I did not explain this correctly at PGCon.
Yes, this idea encrypts at the block level as you said, so there is probably not a big problem with index
encryption.
I will test index encryption with the PoC later.

Thank you and Best regards.
Moon.


> 
> --
> Best regards,
> Aleksander Alekseev




Dear Masahiko Sawada.

> -----Original Message-----
> From: Masahiko Sawada [mailto:sawada.mshk@gmail.com]
> Sent: Monday, June 11, 2018 6:22 PM
> To: Moon, Insung
> Cc: PostgreSQL-development; Joe Conway
> Subject: Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
>
> On Fri, May 25, 2018 at 8:41 PM, Moon, Insung <Moon_Insung_i3@lab.ntt.co.jp> wrote:
> > Hello Hackers,
> >
> > This propose a way to develop "Table-level" Transparent Data
> > Encryption (TDE) and Key Management Service (KMS) support in PostgreSQL.
> >
> >
> > Issues on data encryption of PostgreSQL ========== Currently, in
> > PostgreSQL, data encryption can be using pgcrypto Tool.
> > However, it is inconvenient to use pgcrypto to encrypts data in some cases.
> >
> > There are two significant inconveniences.
> >
> > First, if we use pgcrypto to encrypt/decrypt data, we must call pgcrypto functions everywhere we encrypt/decrypt.
> > Second, we must modify application program code much if we want to do
> > database migration to PostgreSQL from other databases that is using TDE.
> >
> > To resolved these inconveniences, many users want to support TDE.
> > There have also been a few proposals, comments, and questions to support TDE in the PostgreSQL community.
> >
> > However, currently PostgreSQL does not support TDE, so in development
> > community, there are discussions whether it's necessary to support TDE or not.
> >
> > In these discussions, there were requirements necessary to support TDE in PostgreSQL.
> >
> > 1) The performance overhead of encryption and decryption database data
> > must be minimized
> > 2) Need to support WAL encryption.
> > 3) Need to support Key Management Service.
> >
> > Therefore, I'd like to propose the new design of TDE that deals with both above requirements.
> > Since this feature will become very large, I'd like to hear opinions from community before starting making the
patch.
> >
> > First, my proposal is table-level TDE which is that user can specify tables begin encrypted.
> > Indexes, TOAST table and WAL associated with the table that enables TDE are also encrypted.
> >
> > Moreover, I want to support encryption for large object as well.
> > But I haven't found a good way for it so far. So I'd like to remain it as future TODO.
> >
> > My proposal has five characteristics features of "table-level TDE".
> >
> > 1) Buffer-level data encryption and decryption
> > 2) Per-table encryption
> > 3) 2-tier encryption key management
> > 4) Working with external key management services(KMS)
> > 5) WAL encryption
> >
> > Here are more details for each items.
> >
> >
> > 1. Buffer-level data encryption and decryption ==================
> > Transparent data encryption and decryption accompany by storage
> > operation With ordinally way like using pgcrypto, the biggest problem
> > with encrypted data is the performance overhead of decrypting the data each time the run to queries.
> >
> > My proposal is to encrypt and decrypt data when performing DISK I/O operation to minimize performance overhead.
> > Therefore, the data in the shared memory layer is unencrypted so that performance overhead can minimize.
> >
> > With this design, data encryption/decryption implementations can be
> > developed by modifying the codes of the storage and buffer manager
> > modules, which are responsible for performing DISK I/O operation.
> >
> >
> > 2. Per-table encryption
> > ==================
> > User can enable TDE per table as they want.
> > I introduce new storage parameter "encryption_enabled" which enables TDE at table-level.
> >
> >     // Generate  the encryption table
> >        CREATE TABLE foo WITH ( ENCRYPTION_ENABLED = ON );
> >
> >     // Change to the non-encryption table
> >        ALTER TABLE foo SET ( ENCRYPTION_ENABLED = OFF );
> >
> > This approach minimizes the overhead for tables that do not require encryption options.
> > For tables that enable TDE, the corresponding table key will be
> > generated with random values, and it's stored into the new system catalog after being encrypted by the master key.
> >
> > BTW, I want to support CBC mode encryption[3]. However, I'm not sure how to use the IV in CBC mode for this
proposal.
> > I'd like to hear opinions by security engineer.
> >
> >
> > 3. 2-tier encryption key management
> > ==================
> > when it comes time to change cryptographic keys, there is a performance overhead to decryption and re-encryption to
> > all data.
> >
> > To solve this problem we employee 2-tier encryption.
> > 2-tier encryption is All table keys can be stored in the database
> > cluster after being encrypted by the master key, And master keys must be stored at external of PostgreSQL.
> >
> > Therefore, without master key, it is impossible to decrypt the table key. Thus, It is impossible to decrypt the
> > database data.
> >
> > When changing the key, it's not necessary to re-encrypt for all data.
> > We use the new master key only to decrypt and re-encrypt the table key, these operations for minimizing the
> > performance overhead.
> >
> > For table keys, all TDE-enabled tables have different table keys.
> > And for master key, all database have different master keys. Table keys are encrypted by the master key of its own
> > database.
> > For WAL encryption, we have another cryptographic key. WAL-key is also
> > encrypted by a master key, but it is shared across the database cluster.
> >
> >
> > 4. Working with external key management services(KMS)
> > ================== A key management service is an integrated approach
> > for generating, fetching and managing encryption keys for key control.
> > They may cover all aspects of security from the secure generation of
> > keys, secure storing keys, and secure fetching keys up to encryption key handling.
> > Also, various types of KMSs are provided by many companies, and users can choose them.
> >
> > Therefore I would like to manage the master key using KMS.
> > Also, my proposal is to create callback APIs(generate_key, fetch_key,
> > store_key) in the form of a plug-in so that users can use many types of KMS as they want.
> >
> > In KMIP protocol and most KMS manage keys by string IDs. We can get keys by key ID from KMS.
> > So in my proposal, all master keys are distinguished by its ID, called "master key ID".
> > The master key ID is made, for example, using the database oid and a
> > sequence number, like <OID>_<SeqNo>. And they are managed in PostgreSQL.
> >
> > When database startup, all master key ID is loaded to shared memory, and they are protected by LWLock.
> >
> > When it comes time to rotate the master keys, run this query.
> >
> >         ALTER SYSTEM ROTATION MASTER KEY;
> >
> > In this query, the master key is rotated with the following step.
> > 1. Generate new master key,
> > 2. Change master key IDs and emit corresponding WAL 3. Re-encrypt all
> > table keys on its database
> >
> > Also during checkpoint, master key IDs on shared memory become a permanent condition.
> >
> >
> > 5. WAL encryption
> > ==================
> > If we encrypt all WAL records, performance overhead can be significant.
> > Therefore, this proposes a method to encrypt only WAL record excluding
> > WAL header when writing WAL on the WAL buffer, instead of encrypting a whole WAL record.
> > WAL encryption key is generated separately when the TDE-enabled table
> > is created the first time. We use 2-tier encryption for WAL encryption as well.
> > So, when it comes time to rotate the WAL encryption key, run this query.
> >
> >         ALTER SYSTEM ROTATION WAL KEY;
> >
> > Next, I will explain how to encrypt WAL.
> >
> > To do this operation, I add a flag to WAL header which indicates whether the subsequent WAL data is encrypted or
> > not.
> >
> > Then, when we write WAL for encryption table we write "encrypted" WAL on WAL buffer layer.
> >
> > In recovery, we read WAL header and check the flag of encryption, and judges whether WAL must be decrypted.
> > In the case of PITR, we use WAL key ID in the backup file.
> >
> > With this approach, the performance overhead of writing and reading
> > the WAL for unencrypted tables would be almost the same as before.
> >
> >
> > ==================
> > I'd like to discuss the design before starting making any change of code.
> > After a more discussion I want to make a PoC.
> > Feedback and suggestion are very welcome.
> >
> > Finally, thank you initial design input for Masahiko Sawada.
> >
> > Thank you.
> >
> > [1] What does TDE mean?
> >     > https://en.wikipedia.org/wiki/Transparent_Data_Encryption
> >
> > [2] What does KMS mean?
> >     >
> > https://en.wikipedia.org/wiki/Key_management#Key_Management_System
> >
> > [3] What does CBC-Mode mean?
> >     > https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation
> >
> > [4] Recently discussed mail
> >
> > https://www.postgresql.org/message-id/CA%2BCSw_tb3bk5i7if6inZFc3yyf%2B
> > 9HEVNTy51QFBoeUk7UE_V%3Dw%40mail.gmail.com
> >
> >
>
> As per discussion at PGCon unconference, I think that firstly we need to discuss what threats we want to defend
> database data against. If user wants to defend against a threat that is malicious user who logged in OS or database
> steals an important data on database this design TDE would not help. Because such user can steal the data by getting
> a memory dump or by SQL. That of course differs depending on system requirements or security compliance but what
> threats do you want to defend database data against? and why?

Yes. I'm checking against requirement 3.4 of PCI DSS.
This requirement refers to encrypting stored data.
This idea does not protect data against a memory dump (including core dumps).
If encryption at the memory layer is required, I'll reconsider this idea.
And I will do a little more research on enterprise requirements for data encryption.

>
> Also, if I understand correctly, at unconference session there also were two suggestions about the design other than the
> suggestion by Alexander: implementing TDE at column level using POLICY, and implementing TDE at table-space level. The former was
> suggested by Joe but I'm not sure the detail of that suggestion. I'd love to hear the detail of that suggestion. The latter was
> suggested by Tsunakawa-san.
> Have you considered that?

First, thank you to Joe and Tsunakawa-san.
I'm thinking of table-level encryption, but I'll try to find the best way through this discussion.

>
> You mentioned that encryption of temporary data for query processing and large objects are still under the consideration.
> But other than them you should consider the temporary data generated by other subsystems such as reorderbuffer and
> transition table as well.

Yes. Encryption of temporary data, large objects, and the like is considered essential.
In this case, I have not yet decided how to encrypt temporary data. I'll make a PoC patch and find out how to encrypt
temporary data.

Thank you and Best regards.
Moon.



>
> Regards,
>
> --
> Masahiko Sawada
> NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center




Dear Tomas Vondra.

> -----Original Message-----
> From: Tomas Vondra [mailto:tomas.vondra@2ndquadrant.com]
> Sent: Wednesday, June 13, 2018 10:15 PM
> To: Moon, Insung; pgsql-hackers@postgresql.org
> Subject: Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
> 
> Hi,
> 
> On 05/25/2018 01:41 PM, Moon, Insung wrote:
> > Hello Hackers,
> >
> > ...
> >
> > BTW, I want to support CBC mode encryption[3]. However, I'm not sure
> > how to use the IV in CBC mode for this proposal. I'd like to hear
> > opinions by security engineer.
> >
> 
> I'm not a cryptographer either, but this is exactly where you need a prior discussion about the threat models - there
> are a couple of chaining modes, each with different weaknesses.
> 

Thank you for your advice.
First, I researched the security problems further and found that CBC mode is not a safe encryption mode.
Later, when I create a PoC, I will use GCM or XTS encryption mode.
I also now know that using the same IV is dangerous, and I'm doing some more research on this.

Thank you and Best regards.
Moon.


> FWIW it may also matter if data_checksums are enabled, because that may prevent malleability attacks affecting some of
> the modes. Assuming an active attacker (with the ability to modify the data files) is part of the threat model, of course.
> 
> regards
> 
> --
> Tomas Vondra                  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




On Tue, Jul 03, 2018 at 07:28:42PM +0900, Masahiko Sawada wrote:
> On Tue, Jul 3, 2018 at 7:16 AM, Nico Williams <nico@cryptonector.com> wrote:
> > Yes, but piecemeal encryption seems like a bad idea to me.
> 
> What do you mean by "piecemeal encryption"? Is it non-whole-database
> encryption such as per-table or per-tablespace? If so, could you please
> elaborate on why you think so?

I mean that encrypting some columns only, or some tables only, has
integrity protection issues.  See earlier posts in this thread.

Encrypting the whole DB has no such problems, assuming you're doing the
crypto correctly anyways.  But for full DB encryption it's easier to
leave the crypto to the filesystem or device drivers.  (If the devices
are physically in the host and cannot be removed easily, then FDE at the
device works well too.)

Nico
-- 


Dear Tomas Vondra.

> -----Original Message-----
> From: Tomas Vondra [mailto:tomas.vondra@2ndquadrant.com]
> Sent: Wednesday, June 13, 2018 10:03 PM
> To: Masahiko Sawada; Moon, Insung
> Cc: PostgreSQL-development; Joe Conway
> Subject: Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
> 
> On 06/11/2018 11:22 AM, Masahiko Sawada wrote:
> > On Fri, May 25, 2018 at 8:41 PM, Moon, Insung
> > <Moon_Insung_i3@lab.ntt.co.jp> wrote:
> >> Hello Hackers,
> >>
> >> This propose a way to develop "Table-level" Transparent Data
> >> Encryption (TDE) and Key Management Service (KMS) support in
> >> PostgreSQL.
> >>
> >> ...
> >
> > As per discussion at PGCon unconference, I think that firstly we need
> > to discuss what threats we want to defend database data against.
> > If user wants to defend against a threat that is malicious user who
> > logged in OS or database steals an important data on database this
> > design TDE would not help. Because such user can steal the data by
> > getting a memory dump or by SQL. That is of course differs depending
> > on system requirements or security compliance but what threats do you
> > want to defend database data against? and why?
> >
> 
> I do agree with this - a description of the threat model needs to be part of the design discussion, otherwise it's not
> possible to compare it to alternative solutions (e.g. full-disk encryption using LUKS or using existing privilege controls
> and/or RLS).
> 
> TDE was proposed/discussed repeatedly in the past, and every time it died exactly because it was not very clear which
> issue it was attempting to solve.
> 
> Let me share some of the issues mentioned as possibly addressed by TDE (I'm not entirely sure TDE actually solves them,
> I'm just saying those were mentioned in previous discussions):
> 
> 1) enterprise requirement - Companies want in-database encryption, for various reasons (because "enterprise solution"
> or something).

Yes. I do not know the enterprise encryption requirements clearly.
Typically, I identified the encryption requirements from PCI DSS and posted these ideas (storage encryption).
Therefore, following your opinion, I will try to research enterprise encryption requirements further.

> 
> 2) like FDE, but OS/filesystem independent - Same config on any OS and filesystem, which may make maintenance easier.
> 
> 3) does not require special OS/filesystem setup - Does not require help from system administrators, setup of LUKS devices
> or whatever.

Yes. We can use disk encryption like LUKS on Linux, but it does not apply to all OSes, so I proposed TDE.

> 
> 4) all filesystem access (basebackups/rsync) is encrypted anyway
> 
> 5) solves key management (the main challenge with pgcrypto)

In fact, key management is my biggest worry.
First, I am thinking of 2-tier encryption as I wrote in my idea, and of using a KMS to manage the master
key.
However, I am also worried about security problems in managing the table keys and the master key.
Therefore, I want to discuss key management further and develop the KMS support simultaneously with TDE.


Thank you and Best regards.
Moon.


> 
> 6) allows encrypting only some of the data (tables, columns) to minimize performance impact
> 
> IMHO it makes sense to have TDE even if it provides the same "security"
> as disk-level encryption, assuming it's more convenient to setup/use from the database.
> 
> > Also, if I understand correctly, at unconference session there also
> > were two suggestions about the design other than the suggestion by
> > Alexander: implementing TDE at column level using POLICY, and
> > implementing TDE at table-space level. The former was suggested by Joe
> > but I'm not sure the detail of that suggestion. I'd love to hear the
> > detail of that suggestion. The latter was suggested by Tsunakawa-san.
> > Have you considered that?
> >
> > You mentioned that encryption of temporary data for query processing
> > and large objects are still under the consideration. But other than
> > them you should consider the temporary data generated by other
> > subsystems such as reorderbuffer and transition table as well.
> >
> 
> The severity of those limitations is likely related to the threat model.
> I don't think encrypting temporary data would be a big problem, assuming you know which key to use.
> 
> regards
> 
> --
> Tomas Vondra                  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Dear Takayuki Tsunakawa.

> -----Original Message-----
> From: Tsunakawa, Takayuki [mailto:tsunakawa.takay@jp.fujitsu.com]
> Sent: Thursday, June 14, 2018 9:58 AM
> To: 'Tomas Vondra'; Moon, Insung; pgsql-hackers@postgresql.org
> Subject: RE: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
> 
> > From: Tomas Vondra [mailto:tomas.vondra@2ndquadrant.com]
> > On 05/25/2018 01:41 PM, Moon, Insung wrote:
> > > BTW, I want to support CBC mode encryption[3]. However, I'm not sure
> > > how to use the IV in CBC mode for this proposal. I'd like to hear
> > > opinions by security engineer.
> > >
> >
> > I'm not a cryptographer either, but this is exactly where you need a
> > prior discussion about the threat models - there are a couple of
> > chaining modes, each with different weaknesses.
> Our products uses XTS, which recent FDE software like BitLocker and TrueCrypt uses instead of CBC.
> 
> https://en.wikipedia.org/wiki/Disk_encryption_theory#XTS
> 
> "According to SP 800-38E, "In the absence of authentication or access control, XTS-AES provides more protection than
the
> other approved confidentiality-only modes against unauthorized manipulation of the encrypted data.""

Thank you for your advice!

Yes. I found that CBC is not safe at this time.
So let's use XTS mode or GCM mode as you mentioned.

Thank you and Best regards.
Moon.

> 
> 
> 
> > FWIW it may also matter if data_checksums are enabled, because that
> > may prevent malleability attacks affecting some of the modes. Assuming
> > active attacker (with the ability to modify the data files) is part of
> > the threat model, of course.
> 
> Encrypt the page after embedding its checksum value.  If a malicious attacker modifies a page on disk, then the decrypted
> page would be corrupt anyway, which can be detected by checksum.
> 
> 
> Regards
> Takayuki Tsunakawa
> 





Dear Joe.

> -----Original Message-----
> From: Joe Conway [mailto:mail@joeconway.com]
> Sent: Monday, June 18, 2018 9:30 PM
> To: Masahiko Sawada
> Cc: Moon, Insung; PostgreSQL-development
> Subject: Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
> 
> On 06/14/2018 12:19 PM, Masahiko Sawada wrote:
> > On Wed, Jun 13, 2018 at 10:20 PM, Joe Conway <mail@joeconway.com> wrote:
> >> The idea has not been extensively fleshed out yet, but the thought
> >> was that we create column level POLICY, which would transparently
> >> apply some kind of transform on input and/or output. The transforms
> >> would presumably be expressions, which in turn could use functions
> >> (extension or builtin) to do their work. That would allow
> >> encryption/decryption, DLP (data loss prevention) schemes (masking,
> >> redacting), etc. to be applied based on the policies.
> >
> > Which does this design encrypt data on, buffer or both buffer and
> > disk?
> 
> 
> The point of the design is simply to provide a mechanism for input and output transformation, not to provide the transform
> function itself.
> 
> How you use that transformation would be entirely up to you, but if you were providing an encryption transform on input
> the data would be encrypted both buffer and disk.
> 
> > And does this design (per-column encryption) aim to satisfy something
> > specific security compliance?
> 
> 
> Again, entirely up to you and dependent on what type of transformation you provide. If, for example you provided input
> encryption and output decryption based on some in memory session variable key, that would be essentially TDE and would
> satisfy several common sets of compliance requirements.
> 
> 
> >> This, in and of itself, would not address key management. There is
> >> probably a separate need for some kind of built in key management --
> >> perhaps a flexible way to integrate with external systems such as
> >> Vault for example, or maybe something self contained, or perhaps both.
> >
> > I agree to have a flexible way in order to address different
> > requirements. I thought that having a GUC parameter to which we store
> > a shell command to get encryption key is enough but considering
> > integration with various key managements seamlessly I think that we
> > need to have APIs for key managements. (fetching key, storing key,
> > generating key etc)
> 
> 
> I don't like the idea of yet another path for arbitrary shell code execution. An API for extension code would be preferable.

Thank you for your advice on key management.
In fact, how to implement key management was a big worry.
Basically, we will look at the rules of KMIP, and I'll try to create an extension API that can work with most KMSs.

And I have a question.
You said you do not like the idea of another path for arbitrary shell code execution; is there any special reason?
For example, I think specifying a path to shell code is convenient for using several KMSs. Is there a potential
security issue?

Thank you and Best regards.
Moon.


> 
> 
> >> Or
> >> maybe key management is really tied into the separately discussed
> >> effort to create SQL VARIABLEs somehow.
> >
> > Could you elaborate on how key management is tied into SQL VARIABLEs?
> 
> Well, the key management probably is not, but the SQL VARIABLE might be where the key is stored for use.
> 
> Joe
> 
> --
> Crunchy Data - http://crunchydata.com
> PostgreSQL Support for Secure Enterprises Consulting, Training, & Open Source Development





Dear Tom Lane.

> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Monday, June 18, 2018 11:52 PM
> To: Robert Haas
> Cc: Joe Conway; Masahiko Sawada; Moon, Insung; PostgreSQL-development
> Subject: Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
> 
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Mon, Jun 18, 2018 at 10:12 AM, Joe Conway <mail@joeconway.com> wrote:
> >> Not necessarily. Our pages probably have enough predictable bytes to
> >> aid cryptanalysis, compared to user data in a column which might not
> >> be very predicable.
> 
> > Really?  I would guess that the amount of entropy in a page is WAY
> > higher than in an individual column value.
> 
> Depending on the specifics of the encryption scheme, having some amount of known (or guessable) plaintext may allow breaking
> the cipher, even if much of the plaintext is not known.  This is cryptology 101, really.
> 
> At the same time, having to have a bunch of independently-decipherable short field values is not real secure either, especially
> if they're known to all be encrypted with the same key.  But what you know or can guess about the plaintext in such cases
> would be target-specific, rather than an attack that could be built once and used against any PG database.

Yes. If some of the encrypted data is known or guessable, maybe there is a possibility of decrypting the encrypted
data.

But would it be safe to use a different encryption mode such as GCM or XTS to solve this problem?
(Do not use the same IV.)

Thank you and Best regards.
Moon.


> 
>             regards, tom lane






On Tue, Jul 3, 2018 at 5:37 PM Moon, Insung <Moon_Insung_i3@lab.ntt.co.jp> wrote:
> [...]

Hi Moon,

Have you done progress on that patch? I am thinking to work on the project and found that you are already working on it. The last message is almost six months old. I want to check with you that are you still working on that, if yes I can help on that by reviewing the patch etc. If you are not working on that anymore, can you share your done work (if possible)?
--
Ibrar Ahmed

From: Hans-Jürgen Schönig
hello ...

we are actually planning to move this forward but we did not get around to actually sitting down and doing it.
the thing is: we would really like to push this forward and we would certainly be happy if the community could reach a consensus on HOW TO implement it and what we really want.
the reason we went for block level encryption in the first place is that it makes key management really comparatively easy.
there is a module in the server (plugin architecture), which arranges the key so that you can start up. that could be a command line prompt, some integration into some fancy key management or whatever means the user wants. it is really really easy. also, TDE has encryption for everything short of the clog and the textual log (which is pretty pointless).
the clog encryption was left out for reliability issues (robert pointed out an issue with torn writes).
so, if we could somehow find a way to implement this which has a chance to actually get committed we are super open to putting a lot more effort into that.
of course we are also open to helping hands.

in short: what does the community think? how shall we proceed?

    many thanks,

        hans


On 2/6/19 8:08 PM, Ibrar Ahmed wrote:

> [...]
>
> Hi Moon,
>
> Have you done progress on that patch? I am thinking to work on the project and found that you are already working on it. [...]

-- 
Hans-Jürgen Schönig
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: https://www.cybertec-postgresql.com
Dear Ibrar Ahmed.

From: Ibrar Ahmed [mailto:ibrar.ahmad@gmail.com]
Sent: Thursday, February 07, 2019 4:09 AM
To: Moon, Insung
Cc: Tom Lane; PostgreSQL-development
Subject: Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)


> [...]

> Hi Moon,
>
> Have you done progress on that patch? I am thinking to work on the project and found that you are already working on
> it. The last message is almost six months old. I want to check with you that are you still working on that, if yes I can
> help on that by reviewing the patch etc. If you are not working on that anymore, can you share your done work (if
> possible)?
> --
> Ibrar Ahmed

We are currently developing TDE and KMS integration,
and we will be prepared to start a new discussion with the PoC patch as soon as possible.

Currently, we have changed the development direction from per-table to a per-tablespace unit.
We are also researching how to work with the KMIP protocol for the encryption key, for integration with KMS.
We talked about this in the unconference session of PGConf.ASIA,
and a week ago we talked about the development direction of TDE and integration with KMS at FOSDEM PGDAY[1].

We will soon provide the PoC with new discussions.

Regards.

[1] TRANSPARENT DATA ENCRYPTION IN POSTGRESQL AND INTEGRATION WITH KEY MANAGEMENT SERVICES

https://www.postgresql.eu/events/fosdem2019/schedule/session/2307-transparent-data-encryption-in-postgresql-and-integration-with-key-management-services/




On Thu, Feb 7, 2019 at 9:27 AM Moon, Insung
<Moon_Insung_i3@lab.ntt.co.jp> wrote:
>
> [...]

Let me share the details of our progress and the current state.

As our presentation slides describe, I've written the PoC code for
transparent encryption that uses a 2-tier key architecture and has the
key rotation feature. We've discussed the design of transparent
database encryption on -hackers so far, and we found a good design
and implementation. I will share them with our research results. But I
think the design of the integration of PostgreSQL with key management
services (KMS) is more controversial.

For integration with KMS, I'm going to propose adding generic key
management APIs to PostgreSQL core so that it can communicate with
KMSs supporting different interfaces and protocols and can get the
master key (of the 2-tier key architecture) from them. Users can choose a
key management plugin according to their environment.

The integration of PostgreSQL with KMS should be a separate patch from
the TDE patch, and we think that TDE can be done first. But at least
it's essential to provide a way to get the master key from an external
location. Therefore, as the first step, we can propose the basic
components of TDE with a simple interface to get the master key from a
KMS rather than supporting full key management APIs. The basic
components of TDE that we're going to propose are:

* Transparent encryption at a layer between shared buffer and OS page cache
* Per-tablespace encryption
* 2-tier key architecture
* Key rotation
* System catalogs and temporary files encryption

WAL encryption will follow as an additional feature.

The simple interface to get the master key is a GUC parameter that can
store a shell command, say get_encryption_key_command. As its name
suggests, the command is used only for getting the master key; it is
never used for key removal or registration.

The slides explain the TDE feature in detail but don't say much about
KMS, so let me share a rough idea of using TDE in combination with
KMS.

2-Tier Key Architecture and Key Generation
=================================

In our design, we use a 2-tier key architecture which uses two types of
keys: one master key and multiple data encryption keys. As the slides
explain in detail, the benefit of this architecture is fast key
rotation: at key rotation, the only data that must be re-encrypted is
the data encryption keys.

The key generation number is an integer value starting from 1, used for
identifying the master key. It's initialized at initdb time and
incremented whenever the master key is changed (i.e., at key rotation).
For each key generation number we have multiple data encryption keys
associated with tablespaces. The current key generation number is
written to checkpoint records. When starting up, the startup process
executes the shell command set in the get_encryption_key_command GUC
parameter with a key generation number.

For example, we can set something like get_encryption_key_command =
'/bin/sh get_key_from_kms.sh %g', where '%g' is replaced with the
current key generation number and where 'get_key_from_kms.sh' is an
arbitrary shell script to get the master key from a KMS. I assume that
the master keys on the KMS can be identified by their IDs. So the DBA
generates a master key, identified by a key ID in an arbitrary form, on
the KMS beforehand, and get_encryption_key_command has to craft the key
ID in the same manner and pass it to the KMS. The fetched master key is
written to stdout.
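
As a minimal sketch, such a script could look like the following; the
"kms-fetch" client is an assumption standing in for whatever tool the
chosen KMS provides, and the 'ABC_' key ID form matches the example
used below:

    #!/bin/sh
    # get_key_from_kms.sh <generation> -- print the master key on stdout.
    # "kms-fetch" is a stand-in, not a real command.
    GEN="$1"                     # key generation number, substituted for %g
    KEY_ID="ABC_${GEN}"          # craft the key ID in the agreed form
    exec kms-fetch --key-id "$KEY_ID"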

Therefore, the contract between PostgreSQL and the user is:
* The user must prepare the master key, identified by a unique key ID, in advance
* The shell command crafts the key ID in the same form as the key ID on the KMS
* The user must remove old keys from the KMS if necessary (because there is no
interface other than getting the master key)

Initial Setup and Recovery
====================

Since the user data could be encrypted, we need the data encryption
keys and the master key even during recovery. The
get_encryption_key_command will be executed by the startup process
with the key generation number written in the checkpoint record, and
the master key is stored in shared memory.

For example, if we craft the master key ID in the form 'ABC_<key
generation number>', the operation steps from initdb to recovery will
be as follows.

1. User creates the master key of the first generation with ID 'ABC_1' on the KMS
2. User executes initdb and sets get_encryption_key_command = '/bin/sh
get_key_from_kms.sh %g' in postgresql.conf
3. Start PostgreSQL
    3-1. If transparent encryption is disabled or there is no
encrypted data in the database, go to step #4
    3-2. The startup process executes '/bin/sh get_key_from_kms.sh 1'
because the current (initial) key generation is 1
    3-3. get_key_from_kms.sh crafts the key ID 'ABC_1' in the same
form and gets the master key from the KMS
    3-4. If that fails, raise a FATAL error
    3-5. Store the master key in shared memory
    3-6. If there are data encryption keys, decrypt them using the master key
4. Recovery starts

To make sure that we got the correct master key, we can save a hash
value of the master key in the database cluster and compare the two.
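
For instance (just a sketch; the file name under the data directory is
a made-up placeholder), the check could amount to:

    # compare the hash of the fetched key against one saved at initdb time
    sh get_key_from_kms.sh 1 | sha256sum | cut -d' ' -f1 |
        diff -q - "$PGDATA/global/master_key.sha256"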

Key Rotation
===========

When the user requests key rotation (via SQL command or function), the
backend executes get_encryption_key_command with the new key
generation number, re-encrypts all existing data encryption keys
with the new master key, and increments the current key generation
number. As at initialization time, we need to prepare the new
master key on the KMS before executing the key rotation.

So, for example, the operation steps will be:

1. Create the second-generation master key with key ID 'ABC_2' on the KMS
2. Execute key rotation on PostgreSQL (calling the
pg_rotate_encryption_key() function)
    2-1. The backend executes '/bin/sh get_key_from_kms.sh 2', where 2
is the next key generation number
    2-2. It crafts the key ID 'ABC_2' in the same manner and gets the
new master key from the KMS
    2-3. If that fails, raise an error
    2-4. Re-encrypt the data encryption keys using the new master key
    2-5. Increment the current key generation to 2

Of course, some locking is required here.
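
Here is a rough sketch of that re-encryption step, assuming the data
encryption keys are protected with OpenSSL's AES key wrap (RFC 3394).
This is illustrative only, not patch code; compile with -lcrypto:

    /* Sketch: rotation re-wraps every data encryption key (DEK) under
     * the new master key; the user data itself is never touched. */
    #include <stdio.h>
    #include <openssl/evp.h>
    #include <openssl/rand.h>

    #define KEY_LEN     32
    #define WRAPPED_LEN (KEY_LEN + 8)
    #define N_DEKS      4       /* e.g. one per tablespace */

    /* Wrap (encrypt=1) or unwrap (encrypt=0) with AES-256 key wrap. */
    static int
    aes_keywrap(int encrypt, const unsigned char *kek,
                const unsigned char *in, int inlen, unsigned char *out)
    {
        EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
        int         len, total;

        EVP_CIPHER_CTX_set_flags(ctx, EVP_CIPHER_CTX_FLAG_WRAP_ALLOW);
        EVP_CipherInit_ex(ctx, EVP_aes_256_wrap(), NULL, kek, NULL, encrypt);
        EVP_CipherUpdate(ctx, out, &len, in, inlen);
        total = len;
        EVP_CipherFinal_ex(ctx, out + len, &len);
        total += len;
        EVP_CIPHER_CTX_free(ctx);
        return total;
    }

    int
    main(void)
    {
        unsigned char old_master[KEY_LEN], new_master[KEY_LEN];
        unsigned char dek[KEY_LEN];
        unsigned char wrapped[N_DEKS][WRAPPED_LEN];
        int         generation = 1;

        RAND_bytes(old_master, KEY_LEN);   /* generation 1, from KMS */
        RAND_bytes(new_master, KEY_LEN);   /* generation 2, from KMS */

        /* Initial state: DEKs wrapped under the old master key. */
        for (int i = 0; i < N_DEKS; i++)
        {
            RAND_bytes(dek, KEY_LEN);
            aes_keywrap(1, old_master, dek, KEY_LEN, wrapped[i]);
        }

        /* pg_rotate_encryption_key(): unwrap with old, re-wrap with new. */
        for (int i = 0; i < N_DEKS; i++)
        {
            unsigned char plain[KEY_LEN];

            aes_keywrap(0, old_master, wrapped[i], WRAPPED_LEN, plain);
            aes_keywrap(1, new_master, plain, KEY_LEN, wrapped[i]);
        }
        generation++;           /* then durably record generation = 2 */

        printf("re-wrapped %d DEKs for generation %d\n", N_DEKS, generation);
        return 0;
    }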

Integration with KMS
================

The design above has some restrictions on administration but might be
enough for a few use cases. I think these inconveniences would go away
if we had KMS integration. Since KMIP supports protocols for key
management operations such as key registration and key removal, the
key management plugin would be responsible for registering the master
key and getting it using a key ID generated in a unified form. So all
the user needs to do is set up the KMS and the key management plugin;
the user no longer needs to create and remove the master key manually.
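
For example, the plugin interface could look something like the
following. This is purely hypothetical; no such API exists yet, and
the callback names are invented for illustration:

    /* Hypothetical sketch of a key management plugin API. */
    #include <stddef.h>

    typedef struct KmsRoutine
    {
        /* Register a newly generated master key with the KMS. */
        int         (*register_key) (const char *key_id,
                                     const unsigned char *key, int keylen);

        /* Fetch a master key by ID; fills 'key' and '*keylen'. */
        int         (*get_key) (const char *key_id,
                                unsigned char *key, int *keylen);

        /* Remove an old master key, e.g. after rotation. */
        int         (*remove_key) (const char *key_id);
    } KmsRoutine;

    int
    main(void)
    {
        KmsRoutine  kms = {NULL, NULL, NULL};   /* supplied by a plugin */

        (void) kms;             /* placeholder; the server would call
                                 * these instead of a shell command */
        return 0;
    }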

Other use cases of integrating with KMS
==============================

BTW, we can have not only internal key management interfaces for the
TDE feature but also an SQL interface for existing use cases such as
pgcrypto. Currently we need to pass the password to the encryption and
decryption functions:

SELECT decrypt(data, 'secret-key', 'aes') FROM ...;

The password will be logged to the server log when log_statement =
'all'. But with a KMS it would become:

SELECT decrypt(data, get_encryption_key('keyid'), 'aes') FROM ...;

where the get_encryption_key() function gets the encryption key from
the KMS via the loaded plugin. The key string is never written to the
server logs.


We're still researching the details of KMIP and key management APIs
and will share updates. Feedback is very welcome, and we're open to
new ideas.

Thank you for reading.


Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


On Thu, Feb 7, 2019 at 3:28 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> WAL encryption will follow as an additional feature.

I don't think WAL encryption is an optional feature.  You can argue
about whether it's useful to encrypt the disk files in the first place
given that there's no privilege boundary between the OS user and the
database, but a lot of people seem to think it is and maybe they're
right.  However, who can justify encrypting only SOME of the disk
files and not others?  I mean, maybe you could justify not encrypting
the SLRU files on the grounds that they probably don't leak much in
the way of interesting information, but the WAL files certainly do --
your data is there, just as much as in the data files themselves.

To be honest, I think there is a lot to like about the patches
Cybertec has proposed.  Those patches don't have all of the fancy
key-management stuff that you are proposing here, but maybe that
stuff, if we want it, could be added, rather than starting over from
scratch.  It seems to me that those patches get a lot of things right.
In particular, it looked to me when I looked at them like they made a
pretty determined effort to encrypt every byte that might go down to
the disk.  It seems to me that if you want encryption, you want
that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



On Sat, Mar 2, 2019 at 7:27 AM Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Feb 7, 2019 at 3:28 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > WAL encryption will follow as an additional feature.
>
> I don't think WAL encryption is an optional feature.  You can argue
> about whether it's useful to encrypt the disk files in the first place
> given that there's no privilege boundary between the OS user and the
> database, but a lot of people seem to think it is and maybe they're
> right.  However, who can justify encrypting only SOME of the disk
> files and not others?  I mean, maybe you could justify not encrypting
> the SLRU files on the grounds that they probably don't leak much in
> the way of interesting information, but the WAL files certainly do --
> your data is there, just as much as in the data files themselves.

+1.

WAL encryption is not optional; the WAL must be encrypted.
> To be honest, I think there is a lot to like about the patches
> Cybertec has proposed.  Those patches don't have all of the fancy
> key-management stuff that you are proposing here, but maybe that
> stuff, if we want it, could be added, rather than starting over from
> scratch.  It seems to me that those patches get a lot of things right.
> In particular, it looked to me when I looked at them like they made a
> pretty determined effort to encrypt every byte that might go down to
> the disk.  It seems to me that if you want encryption, you want
> that.


The Cybertec patches do the encryption at the instance level. AFAIK,
the current discussion is also trying to reduce the scope of the
encryption to the object level (tablespace, database or table) to
avoid the encryption performance impact for the databases and tables
that don't need it.

IMO, the overall performance of encryption depends on WAL encryption,
whether we choose instance-level or object-level encryption.

As WAL encryption is not optional, even if encryption is set at the
object level, the WAL of the corresponding objects needs to be
encrypted. This should be done at WAL insertion, not at WAL write to
disk, because some records would be encrypted and some not. Or maybe
we should encrypt the entire WAL even if only one object is set for
encryption.


Regards,
Haribabu Kommi
Fujitsu Australia


On Fri, Mar 1, 2019 at 3:52 PM Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
> The Cybertec proposed patches are doing the encryption at the instance
> level, AFAIK, the current discussion is also trying to reduce the scope of the
> encryption to object level like (tablesapce, database or table) to avoid the encryption
> performance impact for the databases, tables that don't need it.

The trick there is that it becomes difficult to figure out which keys
to use for certain things.  For example, you could say, well, this WAL
record is for a table that is encrypted with key 123, so let's use key
123 to encrypt the WAL record also.  So far, so good.  But then how do
you encrypt, say, a logical decoding spill file?  That could have data
in it mixed together from multiple relations, IIUC.  Or what do you do
about SLRUs or other global structures?  If you just exclude that
stuff from the scope of encryption, then you aren't helping the people
who want to Just Encrypt Everything.

Now that having been said I bet a lot of people would find it pretty
cool if we could make this work on a per-table basis.  And I'm not
opposed to that.  I just think it's really hard.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


On Sat, Mar 2, 2019 at 5:27 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, Feb 7, 2019 at 3:28 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > WAL encryption will follow as an additional feature.
>
> I don't think WAL encryption is an optional feature.  You can argue
> about whether it's useful to encrypt the disk files in the first place
> given that there's no privilege boundary between the OS user and the
> database, but a lot of people seem to think it is and maybe they're
> right.  However, who can justify encrypting only SOME of the disk
> files and not others?  I mean, maybe you could justify not encryption
> the SLRU files on the grounds that they probably don't leak much in
> the way of interesting information, but the WAL files certainly do --
> your data is there, just as much as in the data files themselves.
>

Agreed.

> To be honest, I think there is a lot to like about the patches
> Cybertec has proposed.  Those patches don't have all of the fancy
> key-management stuff that you are proposing here, but maybe that
> stuff, if we want it, could be added, rather than starting over from
> scratch.  It seems to me that those patches get a lot of things right.
> In particular, it looked to me when I looked at them like they made a
> pretty determined effort to encrypt every byte that might go down to
> the disk.  It seems to me that you if you want encryption, you want
> that.
>

Agreed. I think those patches lack the key management stuff: the
2-tier key architecture and the integration of Postgres with key
management systems. I'd like to work together and can propose a patch
adding the key management stuff on top of the proposed patches.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


On Sat, Mar 2, 2019 at 6:23 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Fri, Mar 1, 2019 at 3:52 PM Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
> > The Cybertec proposed patches are doing the encryption at the instance
> > level, AFAIK, the current discussion is also trying to reduce the scope of the
> > encryption to object level like (tablesapce, database or table) to avoid the encryption
> > performance impact for the databases, tables that don't need it.
>
> The trick there is that it becomes difficult to figure out which keys
> to use for certain things.  For example, you could say, well, this WAL
> record is for a table that is encrypted with key 123, so let's use key
> 123 to encrypt the WAL record also.  So far, so good.  But then how do
> you encrypt, say, a logical decoding spill file?  That could have data
> in it mixed together from multiple relations, IIUC.

I think that there is no need to use the same key for both the spill
files and WAL, because only one process encrypts/decrypts spill files.
We can use something like a temporary key for that use case, which is
used by only one process and lives for the process lifetime (or
transaction lifetime). The same is true for other temporary files such
as tuplesort and tuplestore, although maybe we need tricks for shared
tuplestore.
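
Something like this, as a rough sketch (assuming OpenSSL for the
random bytes; not actual patch code):

    /* Sketch: a temporary key for spill files, tuplesort, tuplestore.
     * It lives only in backend-local memory, is never written to disk
     * or shared, and simply vanishes at process (or transaction) exit.
     * Compile with -lcrypto. */
    #include <stdio.h>
    #include <openssl/rand.h>

    #define TEMP_KEY_LEN 32

    static unsigned char temp_file_key[TEMP_KEY_LEN];
    static int  temp_file_key_valid = 0;

    /* Return the per-process temporary key, generating it on first use. */
    static const unsigned char *
    get_temp_file_key(void)
    {
        if (!temp_file_key_valid)
        {
            if (RAND_bytes(temp_file_key, TEMP_KEY_LEN) != 1)
                return NULL;    /* would be elog(ERROR) in the server */
            temp_file_key_valid = 1;
        }
        return temp_file_key;
    }

    int
    main(void)
    {
        const unsigned char *key = get_temp_file_key();

        printf("temporary spill-file key %s\n",
               key ? "generated (exists only in this process)" : "failed");
        return 0;
    }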

> Or what do you do
> about SLRUs or other global structures?  If you just exclude that
> stuff from the scope of encryption, then you aren't helping the people
> who want to Just Encrypt Everything.

Why do people want to just encrypt everything? For satisfying some
security compliance?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Masahiko Sawada wrote:
> Why do people want to just encrypt everything? For satisfying some
> security compliance?

I'd say that TDE primarily protects you from masked ninjas that
break into your server room and rip out the disks with your database
on them.

Or from people stealing your file system backups that you leave
lying around in public.

My guess is that this requirement almost always comes from security
departments that don't know a lot about the typical security threats
that databases face, or (worse) from lawmakers.

And these are probably the people who will insist that *everything*
is encrypted, even your commit log (unencrypted log? everyone can
read the commits?).

Yours,
Laurenz Albe



Or on your laptop



On 3/4/19 11:55 AM, Laurenz Albe wrote:
> Masahiko Sawada wrote:
>> Why do people want to just encrypt everything? For satisfying some
>> security compliance?
> I'd say that TDE primarily protects you from masked ninjas that
> break into your server room and rip out the disks with your database
> on them.
>
> Or from people stealing your file system backups that you leave
> lying around in public.
>
> My guess is that this requirement almost always comes from security
> departments that don't know a lot about the typical security threats
> that databases face, or (worse) from lawmakers.
>
> And these are probably the people who will insist that *everything*
> is encrypted, even your commit log (unencrypted log? everyone can
> read the commits?).
>
> Yours,
> Laurenz Albe
>
>
>
>



On 3/4/19 6:55 PM, Laurenz Albe wrote:
> Masahiko Sawada wrote:
>> Why do people want to just encrypt everything? For satisfying some
>> security compliance?
> 
> I'd say that TDE primarily protects you from masked ninjas that
> break into your server room and rip out the disks with your database
> on them.
> 
> Or from people stealing your file system backups that you leave
> lying around in public.
> 
> My guess is that this requirement almost always comes from security
> departments that don't know a lot about the typical security threats
> that databases face, or (worse) from lawmakers.
> 
> And these are probably the people who will insist that *everything*
> is encrypted, even your commit log (unencrypted log? everyone can
> read the commits?).
> 

IMHO it's a sound design principle - deny access by default, then allow
specific cases. It's much easier to reason about such solutions, and
also to validate them.

It's pretty much the same reason why firewall rules generally prohibit
everything by default, and then only allow access for specific ports,
from specific IP ranges, etc. Doing it the other way around is futile.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



On 3/4/19 6:40 AM, Masahiko Sawada wrote:
> On Sat, Mar 2, 2019 at 5:27 AM Robert Haas <robertmhaas@gmail.com> wrote:
>>
>> On Thu, Feb 7, 2019 at 3:28 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>> WAL encryption will follow as an additional feature.
>>
>> I don't think WAL encryption is an optional feature.  You can argue
>> about whether it's useful to encrypt the disk files in the first place
>> given that there's no privilege boundary between the OS user and the
>> database, but a lot of people seem to think it is and maybe they're
>> right.  However, who can justify encrypting only SOME of the disk
>> files and not others?  I mean, maybe you could justify not encryption
>> the SLRU files on the grounds that they probably don't leak much in
>> the way of interesting information, but the WAL files certainly do --
>> your data is there, just as much as in the data files themselves.
>>
> 
> Agreed.
> 
>> To be honest, I think there is a lot to like about the patches
>> Cybertec has proposed.  Those patches don't have all of the fancy
>> key-management stuff that you are proposing here, but maybe that
>> stuff, if we want it, could be added, rather than starting over from
>> scratch.  It seems to me that those patches get a lot of things right.
>> In particular, it looked to me when I looked at them like they made a
>> pretty determined effort to encrypt every byte that might go down to
>> the disk.  It seems to me that you if you want encryption, you want
>> that.
>>
> 
> Agreed. I think the patch lacks the key management stuff: 2-tier key
> architecture and integration of postgres with key management systems.
> I'd like to work together and can propose the patch of key management
> stuff to the proposed patch.
> 

Sounds like a plan. It'd be nice to come up with a unified version of
those two patches, combining the good pieces from both.

I wonder how other databases deal with key management? Surely we're not
the first/only database that tries to do transparent encryption, so
perhaps we could learn something from others? For example, do they use
this 2-tier key architecture? How do they do key management? etc.

I don't say we should copy from them, but it'd allow us to (a) avoid
making the same mistakes and (b) build a solution the users are already
somewhat familiar with.

May I suggest creating a page on the PostgreSQL wiki, explaining the
design and updating it as the discussion develops? It's rather difficult
to follow all the different sub-threads, and IIRC some larger patches
used that successfully for this purpose.

See for example:

* https://wiki.postgresql.org/wiki/Parallel_External_Sort
* https://wiki.postgresql.org/wiki/Parallel_Internal_Sort


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


On Tue, Mar 5, 2019 at 3:46 AM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
>
>
>
> On 3/4/19 6:40 AM, Masahiko Sawada wrote:
> > On Sat, Mar 2, 2019 at 5:27 AM Robert Haas <robertmhaas@gmail.com> wrote:
> >>
> >> On Thu, Feb 7, 2019 at 3:28 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >>> WAL encryption will follow as an additional feature.
> >>
> >> I don't think WAL encryption is an optional feature.  You can argue
> >> about whether it's useful to encrypt the disk files in the first place
> >> given that there's no privilege boundary between the OS user and the
> >> database, but a lot of people seem to think it is and maybe they're
> >> right.  However, who can justify encrypting only SOME of the disk
> >> files and not others?  I mean, maybe you could justify not encryption
> >> the SLRU files on the grounds that they probably don't leak much in
> >> the way of interesting information, but the WAL files certainly do --
> >> your data is there, just as much as in the data files themselves.
> >>
> >
> > Agreed.
> >
> >> To be honest, I think there is a lot to like about the patches
> >> Cybertec has proposed.  Those patches don't have all of the fancy
> >> key-management stuff that you are proposing here, but maybe that
> >> stuff, if we want it, could be added, rather than starting over from
> >> scratch.  It seems to me that those patches get a lot of things right.
> >> In particular, it looked to me when I looked at them like they made a
> >> pretty determined effort to encrypt every byte that might go down to
> >> the disk.  It seems to me that you if you want encryption, you want
> >> that.
> >>
> >
> > Agreed. I think the patch lacks the key management stuff: 2-tier key
> > architecture and integration of postgres with key management systems.
> > I'd like to work together and can propose the patch of key management
> > stuff to the proposed patch.
> >
>
> Sounds like a plan. It'd be nice to come up with a unified version of
> those two patches, combining the good pieces from both.
>
> I wonder how other databases deal with key management? Surely we're not
> the first/only database that tries to do transparent encryption, so
> perhaps we could learn something from others? For example, do they use
> this 2-tier key architecture? How do they do key management? etc.
>
> I don't say we should copy from them, but it'd allow us to (a) avoid
> making the same mistakes and (b) build a solution the users are already
> somewhat familiar with.
>
> May I suggest creating a page on the PostgreSQL wiki, explaining the
> design and updating it as the discussion develops?

Understood. I've been researching the transparent encryption of other
databases and considering the architecture. I'll write it down on the
wiki.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


On Mon, Mar 4, 2019 at 1:01 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> I think that there is no need to use the same key for both the spill
> files and WAL because only one process encrypt/decrypt spill files. We
> can use something like temporary key for that use case, which is used
> by only one process and lives during process lifetime (or transaction
> lifetime). The same is true for for other temporary files such as
> tuplesort and tuplestore, although maybe we need tricks for shared
> tuplestore.

Agreed.  For a shared tuplestore you need a key that is shared between
the processes involved, but it doesn't need to be the same as any
other key.  For anything that is accessed by only a single process,
that process can just generate any old key and, as long as it's
secure, it's fine.

For the WAL, you could potentially create a new WAL record type that
is basically an encrypted wrapper around another WAL record.  So if
table X is encrypted with key K1, then all of the WAL records for
table X are wrapped inside of an encrypted-record WAL record that is
encrypted with key K1.  That's useful for people who want fine-grained
encryption only of certain data.
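
Conceptually, the wrapper might look something like this (just a
sketch of the idea, not a worked-out record format):

    /* Sketch of an "encrypted wrapper" WAL record layout.  The outer
     * record is readable by anyone; the real record is the encrypted
     * payload, decryptable only with the named key. */
    #include <stdio.h>
    #include <stdint.h>

    typedef struct EncryptedWalRecord
    {
        uint32_t    key_id;      /* which table key, e.g. K1 */
        uint32_t    payload_len; /* length of the encrypted inner record */
        unsigned char payload[]; /* the original WAL record, encrypted
                                  * with key 'key_id' */
    } EncryptedWalRecord;

    int
    main(void)
    {
        /* Redo would look up key_id, decrypt the payload, and then
         * replay the inner record as if read from WAL directly. */
        printf("wrapper header: %zu bytes\n", sizeof(EncryptedWalRecord));
        return 0;
    }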

But for people who want to just encrypt everything, you need to
encrypt the entire WAL stream, all SLRU data, etc. and that pretty
much all has to be one key (or sub-keys derived from that one key
somehow).

> > Or what do you do
> > about SLRUs or other global structures?  If you just exclude that
> > stuff from the scope of encryption, then you aren't helping the people
> > who want to Just Encrypt Everything.
>
> Why do people want to just encrypt everything? For satisfying some
> security compliance?

Yeah, I think so.  Perhaps an encrypted filesystem is a better way to
go, but some people want something that is built into the database
server.  The motivation seems to be mostly that they have a compliance
requirement -- either the database itself encrypts everything, or they
cannot use the software.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


On 3/3/19 21:40, Masahiko Sawada wrote:
> On Sat, Mar 2, 2019 at 5:27 AM Robert Haas <robertmhaas@gmail.com> wrote:
>> To be honest, I think there is a lot to like about the patches
>> Cybertec has proposed.  Those patches don't have all of the fancy
>> key-management stuff that you are proposing here, but maybe that
>> stuff, if we want it, could be added, rather than starting over from
>> scratch.  It seems to me that those patches get a lot of things right.
>> In particular, it looked to me when I looked at them like they made a
>> pretty determined effort to encrypt every byte that might go down to
>> the disk.  It seems to me that you if you want encryption, you want
>> that.
> 
> Agreed. I think the patch lacks the key management stuff: 2-tier key
> architecture and integration of postgres with key management systems.
> I'd like to work together and can propose the patch of key management
> stuff to the proposed patch.

Might it make sense to generalize a little bit to secret management? It
would be *great* if PostgreSQL could have a standard "secrets" API which
could then use plugins or extensions to provide an internal
implementation (software or hardware based) and/or plug in to an
external secret management service, whether an OSS package installed on
the box or some 3rd party service off the box.

The two obvious use cases are encryption keys (mentioned here) and
passwords for things like logical replication, FDWs, dblinks, other
extensions, etc. Aside from adding new encryption key secrets, the way
PostgreSQL handles the existing secrets it already has today leaves room
for improvement.

-Jeremy

-- 
Jeremy Schneider
Database Engineer
Amazon Web Services


On Wed, Mar  6, 2019 at 10:49:17AM -0800, Jeremy Schneider wrote:
> Might it make sense to generalize a little bit to secret management? It
> would be *great* if PostgreSQL could have a standard "secrets" API which
> could then use plugins or extensions to provide an internal
> implementation (software or hardware based) and/or plug in to an
> external secret management service, whether an OSS package installed on
> the box or some 3rd party service off the box.
> 
> The two obvious use cases are encryption keys (mentioned here) and
> passwords for things like logical replication, FDWs, dblinks, other
> extensions, etc. Aside from adding new encryption key secrets, the way
> PostgreSQL handles the existing secrets it already has today leaves room
> for improvement.

See this email for a possible implementation:

    https://www.postgresql.org/message-id/20190222035816.uozqvc4wjyag3pme@momjian.us

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +


On Wed, Mar 6, 2019 at 6:32 PM Bruce Momjian <bruce@momjian.us> wrote:
> On Wed, Mar  6, 2019 at 10:49:17AM -0800, Jeremy Schneider wrote:
> > Might it make sense to generalize a little bit to secret management? It
> > would be *great* if PostgreSQL could have a standard "secrets" API which
> > could then use plugins or extensions to provide an internal
> > implementation (software or hardware based) and/or plug in to an
> > external secret management service, whether an OSS package installed on
> > the box or some 3rd party service off the box.
> >
> > The two obvious use cases are encryption keys (mentioned here) and
> > passwords for things like logical replication, FDWs, dblinks, other
> > extensions, etc. Aside from adding new encryption key secrets, the way
> > PostgreSQL handles the existing secrets it already has today leaves room
> > for improvement.
>
> See this email for a possible implementation:
>
>         https://www.postgresql.org/message-id/20190222035816.uozqvc4wjyag3pme@momjian.us

I don't think that actually does what would be needed here.
pgcryptokey can manage the keys themselves, but the secrets (i.e.
passwords) that are used to access those keys are and must be revealed
to everyone who uses them.  I think we can imagine a
secrets-management solution where that's not the case -- where you can
access an encrypted database cluster or an encrypted table or an
encrypted column or an FDW on another server without being able to
access either the encryption key or the password for that key.

Generally, I think our interest should be less in how secrets are
stored inside the database than in how we can integrate with an
external secrets-management solution, and I think that's what Jeremy
is talking about here.  I don't know exactly how that would work, but
you can imagine having a way to tell an FDW "hey, there's a password
for this server, but it's not stored here -- instead go fetch secret
d41d8cd98f00b204e9800998ecf8427e" and the server does that and uses
that password for the connection.  But we don't need to solve the FDW
problem for this effort to move forward.  We do, however, need a
solution that's good enough for whatever we want to do in terms of
TDE.

If we imagine whole-database TDE, then there's really only one secret,
so there's not much to design.  We can just have a command that is
configured via a GUC that has to return the secret, and a user can put
whatever script they like in there.  But if we want to have
fine-grained TDE where different bits are encrypted with different
keys, then we have to have a way to request whichever key is needed
for a certain bit of data.  I don't know whether it's good enough to
just run a script and pass it some identifier and let it return the
corresponding key, or whether we should try to do something more
ambitious than that in the hopes of meeting more use cases.  Sometimes
the perfect can be the enemy of the good, but half-baked solutions are
no good either.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


On 3/7/19 10:06, Robert Haas wrote:
Generally, I think our interest should be less in how secrets are
stored inside the database than in how we can integrate with an
external secrets-management solution, and I think that's what Jeremy
is talking about here.  I don't know exactly how that would work, but
you can imagine having a way to tell an FDW "hey, there's a password
for this server, but it's not stored here -- instead go fetch secret
d41d8cd98f00b204e9800998ecf8427e" and the server does that and uses
that password for the connection.  But we don't need to solve the FDW
problem for this effort to move forward.  We do, however, need a
solution that's good enough for whatever we want to do in terms of
TDE.

Right: the biggest use case I'm thinking about is external secret management systems. For anyone with heavy-weight security requirements, this will be a must.  I use LastPass in my personal life and they have an enterprise product with API access [1] that I know nothing about.  :)  At one previous company where I worked, they used Thycotic [2] which is now OEMed as IBM Security Server [3]. HashiCorp Vault [4] is pretty widely known, and its docs have a handy list of a whole bunch more Secrets Engines [5] they integrate with.  Every major cloud provider has a secrets solution (AWS [6], Azure [7], GCP [8], etc) and then there are open source secret management suites like Cerberus [9] that layer on top of cloud APIs.  There are the services built into orchestration frameworks like Docker [10] and k8s [11].  And of course, don't forget HSMs [12].

I'm in no way discrediting a full implementation within an extension as you've done, Bruce, with pgcryptokey.  In fact I think we will need something like this as a reference implementation, and to build unit tests.  But the problem is that this doesn't provide a standard API for extensions to code against.  Other extensions need a dependency on pgcryptokey, it's up to each extension author to support every secret provider, and realistically FDWs and logical rep can't ever use an API that's not in core.  In my ideal world, core gives us a standard API that internal code and extensions can each code against to (1) store/retrieve secrets [including temporary secrets or tokens; supporting models like kerberos] and (2) provide custom backend implementations to that service.

In short, core could provide the _plumbing_ of a standard secrets API and allow extensions to register as providers and act as consumers of the API.  FDWs, logical replication and TDE are the things on the table right now but there are lots of conceivable things that future developers might need secrets for.  TDE is a great excuse to get an API in place. If it proves successful, then later on we can look at updating logical replication and FDWs to use this API as well.

-Jeremy


1. https://www.lastpass.com/enterprise-password-management
2. https://thycotic.com/products/secret-server/
3. https://ovum.informa.com/resources/product-content/ibm-adopts-thycotic-for-privileged-account-management
4. https://www.vaultproject.io/
5. https://www.vaultproject.io/docs/secrets/index.html
6. https://aws.amazon.com/secrets-manager/
7. https://azure.microsoft.com/en-us/services/key-vault/
8. https://cloud.google.com/solutions/secrets-management/
9. http://engineering.nike.com/cerberus/docs/
10. https://docs.docker.com/engine/swarm/secrets/
11. https://kubernetes.io/docs/concepts/configuration/secret/
12. https://en.wikipedia.org/wiki/Hardware_security_module

-- 
Jeremy Schneider
Database Engineer
Amazon Web Services


On Wed, Mar 6, 2019 at 12:09 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Mon, Mar 4, 2019 at 1:01 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > I think that there is no need to use the same key for both the spill
> > files and WAL because only one process encrypt/decrypt spill files. We
> > can use something like temporary key for that use case, which is used
> > by only one process and lives during process lifetime (or transaction
> > lifetime). The same is true for for other temporary files such as
> > tuplesort and tuplestore, although maybe we need tricks for shared
> > tuplestore.
>
> Agreed.  For a shared tuplestore you need a key that is shared between
> the processes involved, but it doesn't need to be the same as any
> other key.  For anything that is accessed by only a single process,
> that process can just generate any old key and, as long as it's
> secure, it's fine.

Thank you for the advice. Understood.

>
> For the WAL, you could potentially create a new WAL record type that
> is basically an encrypted wrapper around another WAL record.  So if
> table X is encrypted with key K1, then all of the WAL records for
> table X are wrapped inside of an encrypted-record WAL record that is
> encrypted with key K1.  That's useful for people who want fine-grained
> encryption only of certain data.
>
> But for people who want to just encrypt everything, you need to
> encrypt the entire WAL stream, all SLRU data, etc. and that pretty
> much all has to be one key (or sub-keys derived from that one key
> somehow).

Agreed.

For WAL encryption, I wonder if we can have an encryption key
dedicated to WAL. Regardless of the keys for tables and indexes, all
WAL would be encrypted with the WAL key. During recovery the startup
process decrypts WAL and applies it, and the table data is then
encrypted with its table key when flushed. So we would just control
the scope of encrypted objects: the WAL of tables, indexes, etc., or
everything.

>
> > > Or what do you do
> > > about SLRUs or other global structures?  If you just exclude that
> > > stuff from the scope of encryption, then you aren't helping the people
> > > who want to Just Encrypt Everything.
> >
> > Why do people want to just encrypt everything? For satisfying some
> > security compliance?
>
> Yeah, I think so.  Perhaps an encrypted filesystem is a better way to
> go, but some people want something that is built into the database
> server.  The motivation seems to be mostly that they have a compliance
> requirement -- either the database itself encrypts everything, or they
> cannot use the software.
>

Understood. Maybe we need an option to control whether the database is
encrypted including all metadata or excluding it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Masahiko Sawada <sawada.mshk@gmail.com> wrote:

> On Wed, Mar 6, 2019 at 12:09 AM Robert Haas <robertmhaas@gmail.com> wrote:
> >
> > On Mon, Mar 4, 2019 at 1:01 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > I think that there is no need to use the same key for both the spill
> > > files and WAL because only one process encrypt/decrypt spill files. We
> > > can use something like temporary key for that use case, which is used
> > > by only one process and lives during process lifetime (or transaction
> > > lifetime). The same is true for for other temporary files such as
> > > tuplesort and tuplestore, although maybe we need tricks for shared
> > > tuplestore.
> >
> > Agreed.  For a shared tuplestore you need a key that is shared between
> > the processes involved, but it doesn't need to be the same as any
> > other key.  For anything that is accessed by only a single process,
> > that process can just generate any old key and, as long as it's
> > secure, it's fine.
>
> Thank you for the advice. Understood.
>
> >
> > For the WAL, you could potentially create a new WAL record type that
> > is basically an encrypted wrapper around another WAL record.  So if
> > table X is encrypted with key K1, then all of the WAL records for
> > table X are wrapped inside of an encrypted-record WAL record that is
> > encrypted with key K1.  That's useful for people who want fine-grained
> > encryption only of certain data.
> >
> > But for people who want to just encrypt everything, you need to
> > encrypt the entire WAL stream, all SLRU data, etc. and that pretty
> > much all has to be one key (or sub-keys derived from that one key
> > somehow).
>
> Agreed.
>
> For the WAL encryption, I wonder if we can have a encryption key
> dedicated for WAL. Regardless of keys of tables and indexes all WAL
> are encrypted with the WAL key. During the recovery the startup
> process decrypts WAL and applies it, and then the table data will be
> encrypted with its table key when flushing. So we just control the
> scope of encryption object: WAL of tables and indexes etc or
> everything.

My point of view is that a different key usually means a different
user. The user who can decrypt WAL can effectively see all the data,
even though another user put the data (encrypted with another key)
into tables. So in this case, different keys don't really separate
users in terms of data access.

--
Antonin Houska
https://www.cybertec-postgresql.com


Antonin Houska <ah@cybertec.at> wrote:

> Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> > Agreed.
> >
> > For the WAL encryption, I wonder if we can have a encryption key
> > dedicated for WAL. Regardless of keys of tables and indexes all WAL
> > are encrypted with the WAL key. During the recovery the startup
> > process decrypts WAL and applies it, and then the table data will be
> > encrypted with its table key when flushing. So we just control the
> > scope of encryption object: WAL of tables and indexes etc or
> > everything.
>
> My point of view is that different key usually means different user. The user
> who can decrypt WAL can effectively see all the data, even though another user
> put them (encrypted with another key) into tables. So in this case, different
> keys don't really separate users in terms of data access.

Please ignore what I said here. You probably meant that the WAL is both
encrypted and decrypted using the same (dedicated) key.

--
Antonin Houska
https://www.cybertec-postgresql.com


On 3/8/19 5:38 PM, Antonin Houska wrote:
> Antonin Houska <ah@cybertec.at> wrote:
> 
>> Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>
>>> Agreed.
>>>
>>> For the WAL encryption, I wonder if we can have a encryption key
>>> dedicated for WAL. Regardless of keys of tables and indexes all WAL
>>> are encrypted with the WAL key. During the recovery the startup
>>> process decrypts WAL and applies it, and then the table data will be
>>> encrypted with its table key when flushing. So we just control the
>>> scope of encryption object: WAL of tables and indexes etc or
>>> everything.
>>
>> My point of view is that different key usually means different user. The user
>> who can decrypt WAL can effectively see all the data, even though another user
>> put them (encrypted with another key) into tables. So in this case, different
>> keys don't really separate users in terms of data access.
> 
> Please ignore what I said here. You probably meant that the WAL is both
> encrypted and decrypted using the same (dedicated) key.
> 

I think this very much depends on the threat model. If the encryption is
supposed to serve as a second access control layer (orthogonal to the
ACL stuff we already have), then a single WAL key may not be sufficient.

I may be misunderstanding the whole scheme, but it seems to me features
like logical decoding do require knowledge of the WAL key. So sessions
performing logical decoding (which are regular user sessions) would know
the WAL key, which gives them the ability to decode everything.

So if the threat model includes an insider threat (someone with access
to a subset of data, gaining unauthorized access to everything), then
this would be an issue. Such a bad actor might obtain access to the
WAL archive, or possibly just copy the WAL segments on his own ...

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


On Sat, Mar 9, 2019 at 3:08 AM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
>
> On 3/8/19 5:38 PM, Antonin Houska wrote:
> > Antonin Houska <ah@cybertec.at> wrote:
> >
> >> Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >>
> >>> Agreed.
> >>>
> >>> For the WAL encryption, I wonder if we can have a encryption key
> >>> dedicated for WAL. Regardless of keys of tables and indexes all WAL
> >>> are encrypted with the WAL key. During the recovery the startup
> >>> process decrypts WAL and applies it, and then the table data will be
> >>> encrypted with its table key when flushing. So we just control the
> >>> scope of encryption object: WAL of tables and indexes etc or
> >>> everything.
> >>
> >> My point of view is that different key usually means different user. The user
> >> who can decrypt WAL can effectively see all the data, even though another user
> >> put them (encrypted with another key) into tables. So in this case, different
> >> keys don't really separate users in terms of data access.
> >
> > Please ignore what I said here. You probably meant that the WAL is both
> > encrypted and decrypted using the same (dedicated) key.
> >
>
> I think this very much depends on the threat model. If the encryption is
> supposed to serve as a second access control layer (orthogonal to the
> ACL stuff we already have), then a single WAL key may not be sufficient.
>

Agreed.

> I may be misunderstanding the whole scheme, but it seems to me features
> like logical decoding do require knowledge of the WAL key. So sessions
> performing logical decoding (which are regular user sessions) would know
> the WAL key, which gives them the ability to decode everything.

Yeah, currently logical decoding requires superuser privilege, and a
superuser can decode everything regardless of per-table privileges. So
the session performing logical decoding would take the WAL key, decode
WAL with it, and could use the WAL key or a temporary key for spill
files.

I'm trying to implement TDE without changing the current access
control behavior. That is, if a user has access privileges on a table,
he/she can access it as before, with encryption and decryption
happening transparently. I've considered a design with two layers,
encryption and access control; users might see encrypted data if they
have an access privilege but not the decryption privilege. But I think
there would be two problems: the access control layer would get
complex, and applications would need to tolerate getting encrypted
data.

>
> So if the threat model includes insider thread (someone with access to a
> subset of data, gaining unauthorized access to everything), then this
> would be an issue. Such bad actor might obtain access to WAL archive, or
> possibly just copy the WAL segments on his own ...
>

So I think there is no such insider threat problem, right?

We can think of the threat simply: the access control is not changed.
The current design cannot prevent data theft by a malicious user who
has access privileges. That can be addressed by auditing or by more
fine-grained access control.



Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


On Thu, Mar 14, 2019 at 8:30 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> So I think there is no such insider thread problem, right?

No, I think that's a really bad assumption.  If there's no insider
threat, then what exactly ARE you guarding against?  If somebody
steals the disk, you don't need PostgreSQL to do anything; just
encrypt the filesystem and call it good.  Encryption within the
database can only provide any value when someone already has some
degree of access to the system, and we want to prevent them from
getting more access.  Maybe they are legitimately allowed some access
and we want to keep them from exceeding those bounds, or maybe they
hacked something else to gain limited access and we want to keep them
from parlaying that into more access.  Either way, they are to some
extent an insider.

And that is why I think your idea that it's OK to encrypt the WAL with
key A while the tables are meanwhile being encrypted with keys B1, B2,
..., Bn is unsound.  In such an environment, anybody who has key A
does not really need to bother compromising the other keys; they
basically get everything anyway.  There is therefore little point in
having the complexity of multiple keys; just use key A for everything
and be done with it.  To justify the complexity of multiple keys, they
need to be independent of each other from a security perspective; that
is, it must be that all copies of any given piece of data are
encrypted with the same key, and you can therefore get that data only
if you get that key.  And that means, I believe, that fine-grained
encryption must use the same key to encrypt the WAL for table T1 --
and the indexes and TOAST table for table T1 and the WAL for those --
that it uses to encrypt table T1 itself.

If we can't make that happen, then I really don't see any advantage in
supporting multiple keys.  Nobody steals the key to one office if they
can, for the same effort, steal the master key for the whole building.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Replying to myself to resend to the list, since my previous attempt
seems to have been eaten by a grue.

On Tue, Apr 30, 2019 at 1:01 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Tue, Apr 30, 2019 at 1:38 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > It seems to me that encrypting table data in WAL with multiple keys
> > reduces damage in case where a key is theft. However user who has an
> > access privilege of reading WAL can obtain all keys encrypting table
> > data in WAL in the first place.
>
> That better not be true.  If you have a design where reading the WAL
> lets you get *any* encryption key, you have a bad design, I think.  If
> you have a design where reading the WAL lets you get *every*
> encryption key, that's truly terrible.  That's strictly worse than
> full-disk encryption, which at least protects against the disk being
> stolen.
>
> > So as long as postgres's access
> > control facility works fine it's the same as having single encryption
> > key dedicated for WAL.
>
> I think if postgres's access control facility works fine there is no
> need for encryption in the first place.  If we can just use REVOKE to
> block access, we should do that and forget about encryption.  The
> value of encryption only enters the picture when someone can bypass
> the database server permissions in some way, such as by reading the
> files directly.
>
> > Or do you mean the case where a malicious user
> > steals both WAL and key A? It completely depends on from what threat
> > we want to protect data by transparent data encryption but I think we
> > should rather protect from such threat by the access control in that
> > situation. I personally don't think the having an encryption key
> > dedicated for WAL would increase risk much.
>
> Well, what threat are you trying to protect against?
>
> > FWIW, binary log encryption of MySQL uses different encryption key
> > from a key used for table[1]. The key is encrypted by the master key
> > for binary log encryption and is stored in each file headers.
>
> So, if you steal the master key for binary log encryption, you can
> decrypt everything, it sounds like.
>
> > If we use the same key to encrypt the WAL for the table and indexes
> > and TOAST table for the table, what encryption key should we use for
> > temporary files for an intermediate result?
>
> For temporary files, we can just use some temporary key that is only
> stored in server memory and only for the lifetime of the session.
> Once the session ends, we don't ever need to read that data again.
>
> > And should we use each
> > different encryption keys for WAL other than table and indexes
> > resource manager?
>
> Data other than table and index data seems like it is not very
> security-sensitive.  I'm not sure we need to encrypt it at all.  If we
> do, using one key seems fine.
>
> > The advantage of having the dedicated key for WAL
> > encryption would be to make WAL encryption more simple. If we do that,
> > we can encrypt the whole WAL page by key A like the patch proposed by
> > Antonin does).
>
> Yeah.  His design is simpler because it is coarse-grained: the whole
> cluster is encrypted, so there's no need to worry about encrypting
> data differently for different tables.
>
> > Also the advantage of having multiple tablespace keys
> > would be to make postgres enable to re-encryption without rebuilt
> > database cluster.
>
> I don't understand this part, sorry.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company



-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



On Wed, May 1, 2019 at 9:30 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> Replying to myself to resend to the list, since my previous attempt
> seems to have been eaten by a grue.
>
> On Tue, Apr 30, 2019 at 1:01 PM Robert Haas <robertmhaas@gmail.com> wrote:
> >
> > On Tue, Apr 30, 2019 at 1:38 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > It seems to me that encrypting table data in WAL with multiple keys
> > > reduces damage in case where a key is theft. However user who has an
> > > access privilege of reading WAL can obtain all keys encrypting table
> > > data in WAL in the first place.
> >
> > That better not be true.  If you have a design where reading the WAL
> > lets you get *any* encryption key, you have a bad design, I think.

How does the startup process decrypt WAL during recovery without
getting any encryption key if we encrypt user data in WAL by multiple
encryption keys?

> > If
> > you have a design where reading the WAL lets you get *every*
> > encryption key, that's truly terrible.  That's strictly worse than
> > full-disk encryption, which at least protects against the disk being
> > stolen.
> >
> > > So as long as postgres's access
> > > control facility works fine it's the same as having single encryption
> > > key dedicated for WAL.
> >
> > I think of postgres's access control facility works fine there is no
> > need for encryption in the first place.  If we can just use REVOKE to
> > block access, we should do that and forget about encryption.  The
> > value of encryption only enters the picture when someone can bypass
> > the database server permissions in some way, such as by reading the
> > files directly.

Yes, I agree.

> >
> > > Or do you mean the case where a malicious user
> > > steals both WAL and key A? It completely depends on from what threat
> > > we want to protect data by transparent data encryption but I think we
> > > should rather protect from such threat by the access control in that
> > > situation. I personally don't think the having an encryption key
> > > dedicated for WAL would increase risk much.
> >
> > Well, what threat are you trying to protect against?

Data theft bypassing PostgreSQL's ACL; for example, a malicious user
steals storage devices and reads database files directly.

I'm thinking that only users who have an access privilege on a
database object can get the encryption key for that object. Therefore,
if we suppose data-at-rest encryption to serve as yet another access
control layer, then to cover the case where a malicious user steals an
encryption key by breaking the access control system, we would have to
use the same encryption key for the WAL as we used for the database
files. But I think we should rather protect data from that situation
with the access control system and by managing encryption keys more
robustly.

> >
> > > FWIW, binary log encryption of MySQL uses different encryption key
> > > from a key used for table[1]. The key is encrypted by the master key
> > > for binary log encryption and is stored in each file headers.
> >
> > So, if you steal the master key for binary log encryption, you can
> > decrypt everything, it sounds like.

Yes, I think so.

> >
> > > If we use the same key to encrypt the WAL for the table and indexes
> > > and TOAST table for the table, what encryption key should we use for
> > > temporary files for an intermediate result?
> >
> > For temporary files, we can just use some temporary key that is only
> > stored in server memory and only for the lifetime of the session.
> > Once the session ends, we don't ever need to read that data again.
> >

Agreed.

> > > And should we use each
> > > different encryption keys for WAL other than table and indexes
> > > resource manager?
> >
> > Data other than table and index data seems like it is not very
> > security-sensitive.  I'm not sure we need to encrypt it at all.  If we
> > do, using one key seems fine.

Agreed. But it doesn't seem to satisfy users who require everything to
be encrypted, as we discussed before.

> >
> > > The advantage of having the dedicated key for WAL
> > > encryption would be to make WAL encryption more simple. If we do that,
> > > we can encrypt the whole WAL page by key A like the patch proposed by
> > > Antonin does).
> >
> > Yeah.  His design is simpler because it is coarse-grained: the whole
> > cluster is encrypted, so there's no need to worry about encrypting
> > data differently for different tables.
> >
> > > Also the advantage of having multiple tablespace keys
> > > would be to make postgres enable to re-encryption without rebuilt
> > > database cluster.
> >
> > I don't understand this part, sorry.

I wanted to say that if we encrypt the whole database cluster with a
single encryption key, we would need to rebuild the database cluster
to re-encrypt the data. But if we encrypt data in tablespaces with
per-tablespace encryption keys, we can re-encrypt data by moving
tablespaces, without rebuilding the cluster.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



On Tue, May 7, 2019 at 2:10 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > That better not be true.  If you have a design where reading the WAL
> > > lets you get *any* encryption key, you have a bad design, I think.
>
> How does the startup process decrypt WAL during recovery without
> getting any encryption key if we encrypt user data in WAL by multiple
> encryption keys?

The keys have to be supplied from someplace outside of the database
system.  I am imagining a command that gets run with the key ID as an
argument and is expected to print the key out on standard output for
the server to read.

I am not an encryption expert, but it's hard for me to imagine this
working any other way.  I mean, if you store the keys that you need
for decryption inside the database, isn't that the same as storing
your house key in your house, or your car key in your car?  If you
store your car key in the car, then either the car is locked from the
outside, and the key is useless to you, or the car is unlocked from
the outside, and the key is just as available to a thief as it is to
you.  Either way, it provides no security.  What you do is keep your
car key in your pocket or purse; if you try to start the car, it
"requests" the key from you as proof that you are entitled to start
it.  I think the database has to work similarly, except that rather
than protecting the act of "starting" the database, each key is
requested the first time it's needed, when it's discovered that we
need to decrypt some data encrypted with that key.

> > > Well, what threat are you trying to protect against?
>
> Data theft bypassing PostgreSQL's ACL, for example a malicious user
> thefts storage devices and reads datbase files directly.
>
> I'm thinking that only users who have an access privilege of the
> database object can get encryption key for the object. Therefore, when
> a malicious user stole an encryption key by breaking the access
> control system if we suppose data at rest encryption to serve as a yet
> another access control layer we have to use the same encryption key
> for WAL as that we used for database file. But I thought that we
> should rather protect data from that situation by access control
> system and managing encryption keys more robustly.

I don't really follow that logic.  If the encryption keys are managed
robustly enough that they cannot be stolen, then we only need one.  If
there is still enough risk of key theft that we care to protect
against it, we can't use a dedicated key for the WAL without
increasing the risk.

> > > > FWIW, binary log encryption of MySQL uses different encryption key
> > > > from a key used for table[1]. The key is encrypted by the master key
> > > > for binary log encryption and is stored in each file headers.
> > >
> > > So, if you steal the master key for binary log encryption, you can
> > > decrypt everything, it sounds like.
>
> Yes, I think so.

I am not keen to copy that design.  It sounds like having multiple
keys in this design adds a lot of complexity without adding much
security.

> > > Data other than table and index data seems like it is not very
> > > security-sensitive.  I'm not sure we need to encrypt it at all.  If we
> > > do, using one key seems fine.
>
> Agreed. But it seems not to satisfy some user who require to encrypt
> everything, which we discussed before.

Agreed.  I'm thinking possibly we need two different facilities.
Facility #1 could be whole-database encryption: everything is
encrypted with one key on a block level.  And facility #2 could be
per-table encryption: blocks for specific tables (and the related
TOAST tables, indexes, and relation forks) are encrypted with specific
keys and, in addition, the WAL records for those tables (and the
related TOAST tables, indexes, and relation forks) are encrypted with
the same key, but on a per-WAL-record level; the original WAL record
would get "wrapped" by a new WAL record that just says "I am an
encrypted WAL record, key ID %d, encrypted contents: %s" and you have
to get the key to decrypt the contents and decrypt the real WAL record
inside of it.  Then you process that interior record as normal.
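
For illustration, a rough sketch of what such a wrapper record's payload
might look like (the struct and field names here are purely hypothetical,
not an existing WAL format):

    #include <stdint.h>

    /* Hypothetical payload of an "encrypted WAL record" wrapper. */
    typedef struct EncryptedWalRecord
    {
        uint32_t    key_id;         /* which key decrypts the payload */
        uint32_t    payload_len;    /* length of the ciphertext below */
        /* payload_len bytes of ciphertext: the real WAL record */
        char        encrypted_contents[];
    } EncryptedWalRecord;

Recovery would read key_id, fetch that key, decrypt encrypted_contents,
and then replay the interior record as usual.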

I guess if you had both things, you'd want tables for which facility
#2 was enabled to bypass facility #1, so that no relation data blocks
were doubly-encrypted, to avoid the overhead.  But a WAL record would
be doubly-encrypted when both facilities are in use: the record would
get encrypted with the per-table key, and then the blocks it got
stored into would be encrypted with the cluster-wide key.

> I wanted to say that if we encrypt the whole database cluster with a
> single encryption key we would need to rebuild the database cluster
> when re-encrypting data. But if we encrypt data in tablespaces with
> per-tablespace encryption keys we can re-encrypt data by moving
> tablespaces, without rebuilding the cluster.

Interesting.  I suppose that would also be true of per-table keys.
CREATE TABLE newthunk ENCRYPT WITH 'hoge' AS SELECT * FROM thunk; or
something of that sort.

Is there any real advantage of making this per-tablespace rather than
per-table in PostgreSQL's architecture? In some other systems, all the
stuff in a tablespace is glommed together into a big file or a raw
disk partition or something, so if you used different keys for
different things in the tablespace then it might be hard to know which
key to use for which blocks, but we've got separate files for each
relation anyway. Now, that doesn't answer the question of how
recovery, which can't do pg_class lookups, knows which key to use for
which relation, but recovery can't do pg_tablespace lookups either.
But I think there's a simple answer for that: the encrypted 'wrapper'
WAL record must say which key should be used to decrypt the WAL record
inside of it.  And that must be the same key ID that should be used
for the corresponding relation files that the WAL record touches.  So
no problem!

I mean, no problem apart from writing a huge amount of very complex code...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



On Wed, May 8, 2019 at 10:32 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Tue, May 7, 2019 at 2:10 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > > That better not be true.  If you have a design where reading the WAL
> > > > lets you get *any* encryption key, you have a bad design, I think.
> >
> > How does the startup process decrypt WAL during recovery without
> > getting any encryption key if we encrypt user data in WAL by multiple
> > encryption keys?
>
> The keys have to be supplied from someplace outside of the database
> system.  I am imagining a command that gets run with the key ID as an
> argument and is expected to print the key out on standard output for
> the server to read.
>
> I am not an encryption expert, but it's hard for me to imagine this
> working any other way.  I mean, if you store the keys that you need
> for decryption inside the database, isn't that the same as storing
> your house key in your house, or your car key in your car?  If you
> store your car key in the car, then either the car is locked from the
> outside, and the key is useless to you, or the car is unlocked from
> the outside, and the key is just as available to a thief as it is to
> you.  Either way, it provides no security.  What you do is keep your
> car key in your pocket or purse; if you try to start the car, it
> "requests" the key from you as proof that you are entitled to start
> it.

Agreed, keys for decryption must be stored outside of the database.

>  I think the database has to work similarly, except that rather
> than protecting the act of "starting" the database, each key is
> requested the first time it's needed, when it's discovered that we
> need to decrypt some data encrypted with that key.
>

That could depend on the design. In the 2-tier key architecture that we
proposed, since all data keys needed to encrypt table data are stored
inside the database in encrypted form, we can get the master key once at
database startup and use it to decrypt all data keys.
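
A minimal sketch of that startup step, with all function and variable
names assumed for illustration:

    #include <stdint.h>

    #define KEY_LEN 32      /* e.g. AES-256 data keys */

    /* hypothetical unwrap: decrypt one stored data key with the master key */
    extern void unwrap_key(const uint8_t *master_key,
                           const uint8_t *wrapped, uint8_t *plain);

    static void
    decrypt_all_data_keys(const uint8_t *master_key,
                          uint8_t wrapped_keys[][KEY_LEN],
                          uint8_t data_keys[][KEY_LEN], int nkeys)
    {
        /* one pass at startup; afterwards the master key is not needed
         * again until key rotation */
        for (int i = 0; i < nkeys; i++)
            unwrap_key(master_key, wrapped_keys[i], data_keys[i]);
    }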

> > > > Well, what threat are you trying to protect against?
> >
> > Data theft bypassing PostgreSQL's ACL, for example a malicious user
> > steals storage devices and reads database files directly.
> >
> > I'm thinking that only users who have an access privilege on a
> > database object can get the encryption key for the object. Therefore,
> > if we suppose data-at-rest encryption to serve as yet another access
> > control layer, then once a malicious user has stolen an encryption key
> > by breaking the access control system, we would have to use the same
> > encryption key for WAL as the one we used for the database files. But
> > I thought that we should rather protect data from that situation by
> > the access control system and by managing encryption keys more
> > robustly.
>
> I don't really follow that logic.  If the encryption keys are managed
> robustly enough that they cannot be stolen, then we only need one.  If
> there is still enough risk of key theft that we care to protect
> against it, we can't use a dedicated key for the WAL without
> increasing the risk.

In the 2-tier key architecture design, the key dedicated to WAL (the
WAL data key) is stored inside the database and never goes outside it,
which is also true for the data keys of tables and indexes. The master
key is per database cluster, and it encrypts all data keys before they
are stored to disk. Therefore, if the master key is stolen, a malicious
user can see not only all data in the WAL but also all table data,
because all the data keys can be decrypted with the master key. So I
thought the situation you're concerned about is one where a malicious
user who stole the master key, the WAL data key, and the WAL, but not
the table data files, could see data of a table on which they have no
privilege. Is that right?

>
> > > > > FWIW, binary log encryption of MySQL uses a different encryption key
> > > > > from the key used for tables[1]. The key is encrypted by the master key
> > > > > for binary log encryption and is stored in each file's header.
> > > >
> > > > So, if you steal the master key for binary log encryption, you can
> > > > decrypt everything, it sounds like.
> >
> > Yes, I think so.
>
> I am not keen to copy that design.  It sounds like having multiple
> keys in this design adds a lot of complexity without adding much
> security.
>
> > > > Data other than table and index data seems like it is not very
> > > > security-sensitive.  I'm not sure we need to encrypt it at all.  If we
> > > > do, using one key seems fine.
> >
> > Agreed. But it seems not to satisfy some users who require encrypting
> > everything, which we discussed before.
>
> Agreed.  I'm thinking possibly we need two different facilities.
> Facility #1 could be whole-database encryption: everything is
> encrypted with one key on a block level.  And facility #2 could be
> per-table encryption: blocks for specific tables (and the related
> TOAST tables, indexes, and relation forks) are encrypted with specific
> keys and, in addition, the WAL records for those tables (and the
> related TOAST tables, indexes, and relation forks) are encrypted with
> the same key, but on a per-WAL-record level; the original WAL record
> would get "wrapped" by a new WAL record that just says "I am an
> encrypted WAL record, key ID %d, encrypted contents: %s" and you have
> to get the key to decrypt the contents and decrypt the real WAL record
> inside of it.  Then you process that interior record as normal.
>
> I guess if you had both things, you'd want tables for which facility
> #2 was enabled to bypass facility #1, so that no relation data blocks
> were doubly-encrypted, to avoid the overhead.  But a WAL record would
> be doubly-encrypted when both facilities are in use: the record would
> get encrypted with the per-table key, and then the blocks it got
> stored into would be encrypted with the cluster-wide key.

#2 must also encrypt system catalogs as well as the specified user
tables, and temporary files would be encrypted too. So, looking at
which objects get encrypted, is the difference between #1 and #2 just
whether all data in the WAL is encrypted? Or do you think #1 would
encrypt other objects or files, such as large objects and backup_label?
If the difference is only the WAL, #2 can cover #1 by encrypting all
WAL records.

>
> > I wanted to say that if we encrypt the whole database cluster with a
> > single encryption key we would need to rebuild the database cluster
> > when re-encrypting data. But if we encrypt data in tablespaces with
> > per-tablespace encryption keys we can re-encrypt data by moving
> > tablespaces, without rebuilding the cluster.
>
> Interesting.  I suppose that would also be true of per-table keys.
> CREATE TABLE newthunk ENCRYPT WITH 'hoge' AS SELECT * FROM thunk; or
> something of that sort.
>
> Is there any real advantage of making this per-tablespace rather than
> per-table in PostgreSQL's architecture? In some other systems, all the
> stuff in a tablespace is glommed together into a big file or a raw
> disk partition or something, so if you used different keys for
> different things in the tablespace then it might be hard to know which
> key to use for which blocks, but we've got separate files for each
> relation anyway. Now, that doesn't answer the question of how
> recovery, which can't do pg_class lookups, knows which key to use for
> which relation, but recovery can't do pg_tablespace lookups either.
> But I think there's a simple answer for that: the encrypted 'wrapper'
> WAL record must say which key should be used to decrypt the WAL record
> inside of it.  And that must be the same key ID that should be used
> for the corresponding relation files that the WAL record touches.  So
> no problem!

In terms of keys, one advantage could be that we have fewer keys with
per-tablespace keys.

Let me briefly explain the current design I'm thinking of. The design
employs a 2-tier key architecture. That is, a database cluster has one
master key and per-tablespace keys, which are encrypted with the master
key before being stored to disk. Each tablespace key is generated
randomly inside the database at CREATE TABLESPACE time. All the
encrypted tablespace keys are stored, together with the master key ID,
in a file (say, $PGDATA/base/pg_tblsp_keys). That way, the startup
process can easily get all tablespace keys and the master key ID before
starting recovery, and therefore can get all the decrypted tablespace
keys. The reason not to store per-tablespace keys in a column of
pg_tablespace is that we also encrypt pg_tablespace with the tablespace
key. We could instead leave only pg_tablespace unencrypted, but that
would require scanning pg_tablespace before recovery, and eventually we
would also have to leave pg_attribute unencrypted even though it should
be encrypted.
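
To make that concrete, one possible on-disk layout for the key file
(purely illustrative; the struct names, field sizes, and the flat-array
format are all assumptions):

    #include <stdint.h>

    #define TBLSP_KEY_LEN 32    /* size of one wrapped tablespace key; a
                                 * real wrapped key may carry extra
                                 * IV/tag bytes */

    typedef struct TblspKeyFileHeader
    {
        uint32_t    master_key_id;  /* which master key wraps the entries */
        uint32_t    nentries;       /* number of entries that follow */
    } TblspKeyFileHeader;

    typedef struct TblspKeyFileEntry
    {
        uint32_t    tablespace_oid;                 /* lookup key */
        uint8_t     wrapped_key[TBLSP_KEY_LEN];     /* encrypted by master key */
    } TblspKeyFileEntry;

The startup process would read the header, fetch the master key by ID,
and unwrap every entry before replay begins.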

For recovery I'm also considering the idea you suggested: wrapper WAL
records carry the tablespace OID, which is the lookup key for the
tablespace key, so the startup process can get the right key.

Given the above design, fewer data keys is better, and obviously
per-tablespace keys are fewer than per-table keys. Even if we employ
per-tablespace keys, we can still allow users to specify per-table
encryption by using the same encryption key within the tablespace.

FYI, one advantage of per-tablespace encryption from the user's
perspective would be less conversion work when migrating a database:
using the default_tablespace parameter, we need fewer modifications to
CREATE TABLE DDL.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



On Wed, May  8, 2019 at 09:32:08AM -0400, Robert Haas wrote:
> On Tue, May 7, 2019 at 2:10 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > > That better not be true.  If you have a design where reading the WAL
> > > > lets you get *any* encryption key, you have a bad design, I think.
> >
> > How does the startup process decrypt WAL during recovery without
> > getting any encryption key if we encrypt user data in WAL by multiple
> > encryption keys?
> 
> The keys have to be supplied from someplace outside of the database
> system.  I am imagining a command that gets run with the key ID as an
> argument and is expected to print the key out on standard output for
> the server to read.

Agreed.

> I am not an encryption expert, but it's hard for me to imagine this
> working any other way.  I mean, if you store the keys that you need
> for decryption inside the database, isn't that the same as storing
> your house key in your house, or your car key in your car?  If you
> store your car key in the car, then either the car is locked from the
> outside, and the key is useless to you, or the car is unlocked from
> the outside, and the key is just as available to a thief as it is to
> you.  Either way, it provides no security.  What you do is keep your
> car key in your pocket or purse; if you try to start the car, it
> "requests" the key from you as proof that you are entitled to start
> it.  I think the database has to work similarly, except that rather
> than protecting the act of "starting" the database, each key is
> requested the first time it's needed, when it's discovered that we
> need to decrypt some data encrypted with that key.

Two-tier encryption usually stores the encrypted data keys in the
database, and the key access password is supplied externally. 
pgcryptokey does this:

    http://momjian.us/download/pgcryptokey/

        +------------------------+
        |                        |
        |   key access password  |
        |                        |
        |  +------------------+  |
        |  |encrypted_data_key|  |
        |  +------------------+  |
        |                        |
        +------------------------+

> > > > Well, what threat are you trying to protect against?
> >
> > Data theft bypassing PostgreSQL's ACL, for example a malicious user
> > steals storage devices and reads database files directly.
> >
> > I'm thinking that only users who have an access privilege on a
> > database object can get the encryption key for the object. Therefore,
> > if we suppose data-at-rest encryption to serve as yet another access
> > control layer, then once a malicious user has stolen an encryption key
> > by breaking the access control system, we would have to use the same
> > encryption key for WAL as the one we used for the database files. But
> > I thought that we should rather protect data from that situation by
> > the access control system and by managing encryption keys more
> > robustly.
> 
> I don't really follow that logic.  If the encryption keys are managed
> robustly enough that they cannot be stolen, then we only need one.  If
> there is still enough risk of key theft that we care to protect
> against it, we can't use a dedicated key for the WAL without
> increasing the risk.

You can change the key access password periodically by just reencrypting
the encrypted data keys with the new key access password.

Because you need to reencrypt all data when you change the encrypted
data key, you probably need to have at least two such keys active at a
time.  I think you need an API that allows applications to just use the
most recent key, and another API which allows you to select keys by
version number.  pgcryptokey does this by allowing specification of an
encrypted data key by name or key_id.

It might be necessary to allow decryption to try several versions of a
key to see which one decrypts the data.  While this is possible with PGP
because there is a checksum payload, it isn't possible with AES256
because the input/output sizes are the same.  Checking for a valid 8k
block format or WAL format might work.
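
A sketch of that trial-decryption idea, under the assumption that we
have some heuristic page-format check (all names here are invented):

    #include <stdbool.h>
    #include <stdint.h>

    #define BLCKSZ 8192

    /* hypothetical helpers */
    extern void decrypt_block(const uint8_t *key, const uint8_t *in,
                              uint8_t *out);
    extern bool looks_like_valid_page(const uint8_t *page);

    /* Try each key version in turn; return the index of the first key
     * whose output passes the page-format heuristic, or -1. */
    static int
    find_working_key_version(uint8_t keys[][32], int nkeys,
                             const uint8_t *ciphertext)
    {
        uint8_t     plain[BLCKSZ];

        for (int i = 0; i < nkeys; i++)
        {
            decrypt_block(keys[i], ciphertext, plain);
            if (looks_like_valid_page(plain))
                return i;
        }
        return -1;
    }

Since AES-256 by itself provides no integrity check, such a heuristic
could misfire in principle, which is why a design might more likely tag
each block or file with an explicit key version instead.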

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Thu, May  9, 2019 at 05:49:12PM +0900, Masahiko Sawada wrote:
> In terms of keys, one advantage could be that we have less keys with
> per-tablespace keys.
> 
> Let me briefly explain the current design I'm thinking. The design
> employs a 2-tier key architecture. That is, a database cluster has one
> master key and per-tablespace keys which are encrypted with the master
> key before storing to disk. Each tablespace keys are generated
> randomly inside database when CREATE TABLESPACE. The all encrypted
> tablespace keys are stored together with the master key ID to the file
> (say, $PGDATA/base/pg_tblsp_keys). That way, the startup process can
> easily get all tablespace keys and the master key ID before starting
> recovery, and therefore can get the all decrypted tablespace keys. The
> reason why it doesn't store per-tablespace keys in a column of
> pg_tablespace is that we also encrypt pg_tablespace with the
> tablespace key. We could take a way to not encrypt only pg_tablespace,
> however it instead would require to scan pg_tablespace before
> recovery, and eventually we would need to not encrypt pg_attribute
> that should be encrypted.
> 
> During the recovery I'm also thinking the idea you suggested; wrapper
> WAL records have tablespace OID that is the lookup key for tablespace
> key and the startup process can get the tablespace key.
> 
> Given the above design, fewer data keys is better. Obviously
> per-tablespace keys are fewer than per-table keys. And even if we
> employ per-tablespace keys we can allow users to specify per-table
> encryption by using the same encryption key within the tablespace.
> 
> FYI one advantage of per-tablespace encryption from user perspective
> would be less conversion when database migration. Using
> default_tablespace parameter we need less modification of create table
> DDL.

I think we need to step back and see what we want to do.  There are six
levels of possible encryption:

1.  client-side column encryption
2.  server-side column encryption
3.  table-level
4.  database-level
5.  tablespace-level
6.  cluster-level

1 & 2 encrypt the data in the WAL automatically, and option 6 is
encrypting the entire WAL.  This leaves 3-5 as cases where there will be a
mismatch between the object-level encryption and the WAL.  I don't think it
is very valuable to use these options just so that reencryption will be easier.
In many cases, taking any object offline might cause the application to
fail, and having multiple encrypted data keys active will allow key
replacement to be done on an as-needed basis.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Hi Masahiko,

> Let me briefly explain the current design I'm thinking of. The design
> employs a 2-tier key architecture. That is, a database cluster has one
> master key and per-tablespace keys, which are encrypted with the master
> key before being stored to disk. Each tablespace key is generated
> randomly inside the database at CREATE TABLESPACE time. All the
> encrypted tablespace keys are stored, together with the master key ID,
> in a file (say, $PGDATA/base/pg_tblsp_keys). That way, the startup
> process can easily get all tablespace keys and the master key ID before
> starting recovery, and therefore can get all the decrypted tablespace
> keys.

Your design idea sounds very similar to the current Fujitsu Enterprise Postgres (FEP) implementation of TDE.

FEP uses a master encryption key (MEK) for the database cluster. This
MEK is stored in a file at a location set by a GUC variable. This file
is encrypted using a “passphrase” known only to the administrator.

There are also per-tablespace keys, which are randomly generated at the
time of CREATE TABLESPACE and stored in files. There is one tablespace
key file per tablespace. These tablespace key files are encrypted by
the MEK and stored at the location specified by CREATE TABLESPACE.

Not all tablespaces use TDE. An FEP extension of the CREATE TABLESPACE
syntax creates the tablespace key file only when encryption was
requested, e.g.:

    CREATE TABLESPACE my_secure_tablespace
        LOCATION '/home/postgre/FEP/TESTING/tablespacedir'
        WITH (tablespace_encryption_algorithm = 'AES256');

The MEK is not currently obtained from a third party. It is randomly
generated when the master key file is first created by another added
function, e.g.:

    select pgx_set_master_key('passphrase');

Kind Regards,
Peter Smith
Fujitsu Australia

On Mon, May 13, 2019 at 2:09 PM Smith, Peter <peters@fast.au.fujitsu.com> wrote:
>
> Hi Masahiko,
>
> > Let me briefly explain the current design I'm thinking of. The design
> > employs a 2-tier key architecture. That is, a database cluster has one
> > master key and per-tablespace keys, which are encrypted with the master
> > key before being stored to disk. Each tablespace key is generated
> > randomly inside the database at CREATE TABLESPACE time. All the
> > encrypted tablespace keys are stored, together with the master key ID,
> > in a file (say, $PGDATA/base/pg_tblsp_keys). That way, the startup
> > process can easily get all tablespace keys and the master key ID before
> > starting recovery, and therefore can get all the decrypted tablespace
> > keys.
>
> Your design idea sounds very similar to the current Fujitsu Enterprise Postgres (FEP) implementation of TDE.
>

Yeah, I studied the design of TDE from FEP as well as from other
databases supporting TDE.

> FEP uses a master encryption key (MEK) for the database cluster. This
> MEK is stored in a file at a location set by a GUC variable. This file
> is encrypted using a “passphrase” known only to the administrator.
>
> There are also per-tablespace keys, which are randomly generated at the
> time of CREATE TABLESPACE and stored in files. There is one tablespace
> key file per tablespace. These tablespace key files are encrypted by
> the MEK and stored at the location specified by CREATE TABLESPACE.
>
> Not all tablespaces use TDE. An FEP extension of the CREATE TABLESPACE
> syntax creates the tablespace key file only when encryption was
> requested, e.g. CREATE TABLESPACE my_secure_tablespace LOCATION
> '/home/postgre/FEP/TESTING/tablespacedir' WITH
> (tablespace_encryption_algorithm = 'AES256');
>
> The MEK is not currently obtained from a third party. It is randomly
> generated when the master key file is first created by another added
> function, e.g. select pgx_set_master_key('passphrase');

Thank you for explaining!

I think the main difference between FEP and our proposal is the master
key management. In our proposal, postgres can get the master key from
an external key management server or service such as AWS KMS or Gemalto
KeySecure, or from an encrypted file, by using the corresponding
plugin. We believe this extensible architecture would be useful for
applying postgres to various systems.


Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



On Fri, May 10, 2019 at 2:42 AM Bruce Momjian <bruce@momjian.us> wrote:
>
> On Thu, May  9, 2019 at 05:49:12PM +0900, Masahiko Sawada wrote:
> > In terms of keys, one advantage could be that we have less keys with
> > per-tablespace keys.
> >
> > Let me briefly explain the current design I'm thinking. The design
> > employs a 2-tier key architecture. That is, a database cluster has one
> > master key and per-tablespace keys which are encrypted with the master
> > key before storing to disk. Each tablespace keys are generated
> > randomly inside database when CREATE TABLESPACE. The all encrypted
> > tablespace keys are stored together with the master key ID to the file
> > (say, $PGDATA/base/pg_tblsp_keys). That way, the startup process can
> > easily get all tablespace keys and the master key ID before starting
> > recovery, and therefore can get the all decrypted tablespace keys. The
> > reason why it doesn't store per-tablespace keys in a column of
> > pg_tablespace is that we also encrypt pg_tablespace with the
> > tablespace key. We could take a way to not encrypt only pg_tablespace,
> > however it instead would require to scan pg_tablespace before
> > recovery, and eventually we would need to not encrypt pg_attribute
> > that should be encrypted.
> >
> > During the recovery I'm also thinking the idea you suggested; wrapper
> > WAL records have tablespace OID that is the lookup key for tablespace
> > key and the startup process can get the tablespace key.
> >
> > Given the above design, fewer data keys is better. Obviously
> > per-tablespace keys are fewer than per-table keys. And even if we
> > employ per-tablespace keys we can allow users to specify per-table
> > encryption by using the same encryption key within the tablespace.
> >
> > FYI one advantage of per-tablespace encryption from user perspective
> > would be less conversion when database migration. Using
> > default_tablespace parameter we need less modification of create table
> > DDL.
>
> I think we need to step back and see what we want to do.  There are six
> levels of possible encryption:
>
> 1.  client-side column encryption
> 2.  server-side column encryption
> 3.  table-level
> 4.  database-level
> 5.  tablespace-level
> 6.  cluster-level
>
> 1 & 2 encrypt the data in the WAL automatically, and option 6 is
> encrypting the entire WAL.  This leaves 3-5 as cases where there will be a
> mismatch between the object-level encryption and the WAL.  I don't think it
> is very valuable to use these options just so that reencryption will be easier.
> In many cases, taking any object offline might cause the application to
> fail, and having multiple encrypted data keys active will allow key
> replacement to be done on an as-needed basis.
>

Summarizing the design discussion so far and the discussion I had at
PGCon, there are several basic design items here. Each of them is
loosely related to the others, and there are trade-offs.

1. Encryption Levels.
As Bruce suggested, there are 6 levels. Fine-grained control will help
to suppress the performance overhead for tables that we don't actually
need to encrypt. Even in terms of security it might help, since we
don't give the key to users who don't or cannot access encrypted
tables. But whichever level we choose, as long as we encrypt data
inside the database we can protect data from attacks bypassing
PostgreSQL's ACL, such as reading database files directly. The threats
we want to protect against have already gotten consensus so far, I
think.

Among these levels, the tablespace level would be somewhat different
from the others because it corresponds to physical directories rather
than database objects. So in principle it's possible for tables to be
created on an encrypted tablespace while their indexes are created on a
non-encrypted tablespace, which does not make sense though. But having
fewer encryption keys would make for a simpler architecture.

2. Encryption Objects.
Indexes, WAL, and TOAST tables pertaining to encrypted tables, as well
as temporary files, must also be encrypted, but we need to discuss
whether we encrypt non-user data too, such as SLRU data, vm and fsm,
and perhaps even other files such as 2PC state files, backup_label,
etc. Encrypting everything is required by some use cases, but it's also
true that there are users who wish to encrypt the database while
minimizing performance overhead.

3. Encryption keys.
The encryption level is related to the number of encryption keys we
use. The database cluster level would use a single encryption key and
can more easily encrypt everything, including non-user data such as
xact WAL and SLRU data, with the same key. On the other hand, the table
level, for instance, would use multiple keys and can encrypt tables
with different encryption keys. One advantage of having multiple keys
in the database is that encrypted database objects can be re-encrypted
on an as-needed basis. For instance, in a multi-tenant architecture,
stopping the database cluster would affect all services, but with
multiple keys we can re-encrypt data one tenant at a time while
minimizing the downtime of each service. Even in terms of security,
having multiple keys helps diversify risk.

4. Key rotation and 2-tier key hierarchy.
Another design point would be key rotation and the use of a 2-tier key
hierarchy. Periodic key rotation is very important and is required by
some security standards.

For instance, PCI DSS 3.6.4 states "Cryptographic key changes for keys
that have reached the end of their cryptoperiod (for example, after a
defined period of time has passed and/or after a certain amount of
cipher-text has been produced by a given key), as defined by the
associated application vendor or key owner, and based on industry best
practices and guidelines"[1]. A cryptoperiod is the time span during
which a specific cryptographic key is authorized for use[2] (sometimes
expressed as a number of transactions). It is defined based on multiple
factors such as key length, key strength, and algorithm. So, depending
on the user's system, key rotation can be required as often as every
few months.

A 2-tier key hierarchy is a technique for faster key rotation; it uses
2 types of keys: a master key and data keys. The master key is stored
outside the database, whereas the data keys are stored inside it. A
data key is used to encrypt the actual database data and is itself
encrypted with the master key before being stored to disk. At key
rotation, we rotate the master key and re-encrypt only the data keys
rather than the database data. Since we don't need to access and modify
the database data, the key rotation completes in a second. Without
this, we would end up re-encrypting the database data, which could take
a long time depending on the encryption level.

Because the data keys must not go outside the database, we might need
to provide a key rotation mechanism on the database side, for example a
pg_rotate_encryption_key() SQL function or an ALTER SYSTEM ROTATE KEY
SQL command, so that PostgreSQL re-encrypts the data keys with the new
master key. This might also require PostgreSQL to get the new master
key from an external location within the command, and is therefore
relevant to key management.
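
A minimal sketch of what such a rotation might do internally, assuming
hypothetical wrap/unwrap helpers (only the wrapped data keys are
rewritten; the table and WAL data are untouched):

    #include <stdint.h>

    #define KEY_LEN 32

    /* hypothetical helpers: encrypt/decrypt one key with a key-encryption key */
    extern void unwrap_key(const uint8_t *kek, const uint8_t *in, uint8_t *out);
    extern void wrap_key(const uint8_t *kek, const uint8_t *in, uint8_t *out);

    static void
    rotate_master_key(const uint8_t *old_master, const uint8_t *new_master,
                      uint8_t wrapped_keys[][KEY_LEN], int nkeys)
    {
        uint8_t     plain[KEY_LEN];

        for (int i = 0; i < nkeys; i++)
        {
            unwrap_key(old_master, wrapped_keys[i], plain);  /* decrypt data key */
            wrap_key(new_master, plain, wrapped_keys[i]);    /* re-wrap with new key */
        }
        /* then durably rewrite the key file and discard old_master */
    }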

5. Key management.
The encryption key (the master key in a 2-tier key hierarchy) must be
taken from an external location. If we only need to fetch it, a shell
command like curl might be enough. However, it's more secure if we can
seamlessly integrate PostgreSQL with key management services, and DBAs
would no longer need to take care of key lifecycles. In addition,
dedicated programs or knowledge would not be necessary in individual
user systems. Furthermore, integration with a KMS might be helpful even
for column-level encryption; we could combine pgcrypto with it to
provide per-column TDE.
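
For example, the plugin boundary might look something like this (a
hypothetical interface invented purely to illustrate the idea, not a
proposed API):

    #include <stddef.h>
    #include <stdint.h>

    typedef struct KmsPlugin
    {
        /* fetch the current master key for this cluster by ID */
        int     (*get_master_key) (const char *key_id,
                                   uint8_t *key, size_t keylen);
        /* ask the KMS to register a new master key version */
        int     (*rotate_master_key) (const char *key_id);
    } KmsPlugin;

An AWS KMS plugin, a Gemalto KeySecure plugin, and an encrypted-file
plugin would each provide their own implementations of these callbacks.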

Apart from the above discussion, there are assorted concerns about the
fine-grained design. For WAL encryption, as a result of the discussion
so far, I'm going to use the same encryption keys for WAL as those used
for tables. Given that approach, utility commands that read WAL
(pg_waldump and pg_rewind) would need to be able to get arbitrary
encryption keys; pg_waldump might even require the encryption key of
WAL whose table has already been dropped. As I discussed at PGCon[3],
rearranging the WAL format would solve this issue, but it doesn't
resolve the fundamental problem.

Also, system catalog encryption could be a hard part. System catalogs
are initially created at initdb time and then created by copying from
template1 at CREATE DATABASE. Therefore we would need to either modify
initdb so that it's aware of encryption keys and the KMS, or modify
database creation so that it copies the database files while encrypting
them.

There are two proposals so far: cluster-wide encryption and
per-tablespace encryption. It would be good if we pick the best parts
of each proposal in order to provide a useful TDE feature to users.
Whatever design the community accepts, I'd like to contribute to it.

Comments and feedback are very welcome!

[1] https://www.pcisecuritystandards.org/document_library?category=pcidss&document=pci_dss
[2] https://en.wikipedia.org/wiki/Cryptoperiod
[3] https://www.slideshare.net/masahikosawada98/transparent-data-encryption-in-postgresql/28


Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



On Wed, Jun  5, 2019 at 11:54:04AM +0900, Masahiko Sawada wrote:
> On Fri, May 10, 2019 at 2:42 AM Bruce Momjian <bruce@momjian.us> wrote:
> > I think we need to step back and see what we want to do.  There are six
> > levels of possible encryption:
> >
> > 1.  client-side column encryption
> > 2.  server-side column encryption
> > 3.  table-level
> > 4.  database-level
> > 5.  tablespace-level
> > 6.  cluster-level
> >
> > 1 & 2 encrypt the data in the WAL automatically, and option 6 is
> > encrypting the entire WAL.  This leaves 3-5 as cases where there will be
> > mismatch between the object-level encryption and WAL.  I don't think it
> > is very valuable to use these options so reencryption will be easier.
> > In many cases, taking any object offline might cause the application to
> > fail, and having multiple encrypted data keys active will allow key
> > replacement to be done on an as-needed basis.
> >
> 
> Summarizing the design discussion so far and the discussion I had at
> PGCon, there are several basic design items here. Each of them is
> loosely related and there are trade-off.
> 
> 1. Encryption Levels.
> As Bruce suggested there are 6 levels.  The fine grained control will
> help to suppress performance overheads of tables that we don't
> actually need to encrypt. Even in terms of security it might help
> since we don't give the key to users who don't or cannot access
> encrypted tables. But whichever level we choose, we can protect
> data from attacks bypassing PostgreSQL's ACL such as reading database
> files directly, as long as we encrypt data inside the database. The
> threats we want to protect against have already gotten consensus so
> far, I think.

I think level 6 is an obvious must-have.  I think the big question is
whether we gain enough by implementing levels 3-5 compared to the
complexity of the code and user interface.  

The big question is how many people will be mixing encrypted and
unencrypted data in the same cluster, and care about performance?  Just
because someone might care is not enough of a justification.  They can
certainly create separate encrypted and non-encrypted clusters. Can we
implement level 6 and then implement levels 3-5 later if desired?

> Among these levels, the tablespace level would be somewhat different
> from the others because it corresponds to physical directories rather
> than database objects. So in principle it's possible that tables are
> created on an encrypted tablespace while indexes are created on a
> non-encrypted tablespace, which does not make sense though. But having
> fewer encryption keys would be better for a simple architecture.

How would you configure the WAL to know which key to use if we did #5?
Wouldn't system tables and statistics, and perhaps referential integrity
allow for information leakage?

> 2. Encryption Objects.
> Indexes, WAL and TOAST table pertaining to encrypted tables, and
> temporary files must also be encrypted but we need to discuss whether
> we encrypt non-user data as well such as SLRU data, vm and fsm, and
> perhaps even other files such as 2PC state files, backup_label etc.
> Encrypting everything is required by some use case but it's also true
> that there are users who wish to encrypt database while minimizing
> performance overheads.

I don't think we need to encrypt the "status" files like SLRU data, vm
and fsm.

> 3. Encryption keys.
> Encryption levels would be relevant with the number of encryption keys
> we use. The database cluster levels would use single encryption key
> and can encrypt everything easier including non-user data such as xact
> WALs and SRLU data with the same key. On the other hand, for instance
> the table level would use multiple keys and can encrypt tables with
> different encryption keys. One advantage of having multiple keys in
> database would be that it can re-encrypt encrypted database object
> as-needed basis. For instance in multi tenant architecture, the
> stopping database cluster would affect all services but we can
> re-encrypt data one by one while minimizing downtime of each services
> if we use multiple keys. Even in terms of security, having multiple
> keys helps the diversification of risk.

I agree we need a 2-tier key hierarchy.  See my pgcryptokey extension
as an example:

    http://momjian.us/download/pgcryptokey/

> Apart from the above discussion, there are random concerns about the
> design regarding to the fine grained design. For WAL encryption, as a
> result of discussion so far I'm going to use the same encryption for
> WAL encryption as that used for tables. Given that approach, it would
> be required to make utility commands that read WAL (pg_waldump and
> pg_rewind) be able to get arbitrary encryption keys. pg_waldump might
> require even an encryption keys of WAL of which table has already been
> dropped. As I discussed at PGCon[3], by rearranging WAL format would
> solve this issue but it doesn't resolve fundamental issue.

Good point about pg_waldump.  I am a little worried we might open a
security hole by making a new API so they work, so maybe we should avoid
it.

> Also, for system catalog encryption, it could be a hard part. System
> catalogs are initially created at initdb time and created by copying
> from template1 when CREATE DATABASE. Therefore we would need to either
> modify initdb so that it's aware of encryption keys and KMS or modify
> database creation so that it copies database file while encrypting
> them.

I assume initdb will use the same API that you would use to start the
server itself, e.g., type in a password, or contact a key server.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Thu, Jun 13, 2019 at 3:48 AM Bruce Momjian <bruce@momjian.us> wrote:
>
> On Wed, Jun  5, 2019 at 11:54:04AM +0900, Masahiko Sawada wrote:
> > On Fri, May 10, 2019 at 2:42 AM Bruce Momjian <bruce@momjian.us> wrote:
> > > I think we need to step back and see what we want to do.  There are six
> > > levels of possible encryption:
> > >
> > > 1.  client-side column encryption
> > > 2.  server-side column encryption
> > > 3.  table-level
> > > 4.  database-level
> > > 5.  tablespace-level
> > > 6.  cluster-level
> > >
> > > 1 & 2 encrypt the data in the WAL automatically, and option 6 is
> > > encrypting the entire WAL.  This leaves 3-5 as cases where there will be
> > > mismatch between the object-level encryption and WAL.  I don't think it
> > > is very valuable to use these options so reencryption will be easier.
> > > In many cases, taking any object offline might cause the application to
> > > fail, and having multiple encrypted data keys active will allow key
> > > replacement to be done on an as-needed basis.
> > >
> >
> > Summarizing the design discussion so far and the discussion I had at
> > PGCon, there are several basic design items here. Each of them is
> > loosely related and there are trade-off.
> >
> > 1. Encryption Levels.
> > As Bruce suggested there are 6 levels.  The fine grained control will
> > help to suppress performance overheads of tables that we don't
> > actually need to encrypt. Even in terms of security it might help
> > since we don't give the key to users who don't or cannot access
> > encrypted tables. But whichever level we choose, we can protect
> > data from attacks bypassing PostgreSQL's ACL such as reading database
> > files directly, as long as we encrypt data inside the database. The
> > threats we want to protect against have already gotten consensus so
> > far, I think.
>
> I think level 6 is an obvious must-have.  I think the big question is
> whether we gain enough by implementing levels 3-5 compared to the
> complexity of the code and user interface.
>
> The big question is how many people will be mixing encrypted and
> unencrypted data in the same cluster, and care about performance?  Just
> because someone might care is not enough of a justification.  They can
> certainly create separate encrypted and non-encrypted clusters. Can we
> implement level 6 and then implement levels 3-5 later if desired?
>

I guess most users are interested in performance. Users don't want to
sacrifice performance for security, or vice versa. Fine-grained
control would allow us to find a compromise point.

From another point of view, some of our clients want, for security
reasons, to use keys different from the keys other systems are using in
a multi-tenant environment. Also, upon key leakage they need to
re-encrypt the database data. This feature would help in such
situations: we can use different keys for each system and can
re-encrypt data without taking the database down.

> > Among these levels, the tablespace level would be somewhat different
> > from the others because it corresponds to physical directories rather
> > than database objects. So in principle it's possible that tables are
> > created on an encrypted tablespace while indexes are created on a
> > non-encrypted tablespace, which does not make sense though. But having
> > fewer encryption keys would be better for a simple architecture.
>
> How would you configure the WAL to know which key to use if we did #5?
> Wouldn't system tables and statistics, and perhaps referential integrity
> allow for information leakage?

We use something like a map between tablespace OID and encryption key,
kept as a separate file (maybe stored in $PGDATA/global), called a
keyring. Using the keyring we can obtain the encryption key for a given
tablespace OID. For WAL, we add a flag to XLogRecord which indicates
whether the WAL record is encrypted, and we already have the
relfilenode in the header data of the WAL record. So we can obtain the
tablespace OID from that part and look up the corresponding encryption
key.
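
Roughly, the replay-side lookup could look like this (all function
names here are placeholders, not existing PostgreSQL routines):

    #include <stdbool.h>
    #include <stdint.h>

    typedef uint32_t Oid;
    typedef struct XLogRecord XLogRecord;   /* opaque in this sketch */

    /* hypothetical helpers */
    extern bool     record_is_encrypted(const XLogRecord *rec);
    extern Oid      record_tablespace_oid(const XLogRecord *rec);
    extern uint8_t *keyring_lookup(Oid spcOid);     /* keys decrypted at startup */
    extern void     decrypt_record_payload(XLogRecord *rec, const uint8_t *key);

    static void
    maybe_decrypt_record(XLogRecord *rec)
    {
        if (record_is_encrypted(rec))
        {
            /* the tablespace OID comes from the relfilenode info
             * already present in the record's block references */
            Oid         spcOid = record_tablespace_oid(rec);

            decrypt_record_payload(rec, keyring_lookup(spcOid));
        }
    }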

>
> > 2. Encryption Objects.
> > Indexes, WAL and TOAST table pertaining to encrypted tables, and
> > temporary files must also be encrypted but we need to discuss whether
> > we encrypt non-user data as well such as SLRU data, vm and fsm, and
> > perhaps even other files such as 2PC state files, backup_label etc.
> > Encryption everything is required by some use case but it's also true
> > that there are users who wish to encrypt database while minimizing
> > performance overheads.
>
> I don't think we need to encrypt the "status" files like SLRU data, vm
> and fsm.

I agree.

>
> > 3. Encryption keys.
> > Encryption levels would be relevant with the number of encryption keys
> > we use. The database cluster levels would use single encryption key
> > and can encrypt everything easier including non-user data such as xact
> > WALs and SLRU data with the same key. On the other hand, for instance
> > the table level would use multiple keys and can encrypt tables with
> > different encryption keys. One advantage of having multiple keys in
> > database would be that it can re-encrypt encrypted database object
> > as-needed basis. For instance in multi tenant architecture, the
> > stopping database cluster would affect all services but we can
> > re-encrypt data one by one while minimizing downtime of each services
> > if we use multiple keys. Even in terms of security, having multiple
> > keys helps the diversification of risk.
>
> I agree we need a 2 tier key hierarchy.   See my pgcryptokey extension
> as an example:
>
>         http://momjian.us/download/pgcryptokey/

Thanks.

>
> > Apart from the above discussion, there are random concerns about the
> > design regarding to the fine grained design. For WAL encryption, as a
> > result of discussion so far I'm going to use the same encryption for
> > WAL encryption as that used for tables. Given that approach, it would
> > be required to make utility commands that read WAL (pg_waldump and
> > pg_rewind) be able to get arbitrary encryption keys. pg_waldump might
> > require even an encryption keys of WAL of which table has already been
> > dropped. As I discussed at PGCon[3], by rearranging WAL format would
> > solve this issue but it doesn't resolve fundamental issue.
>
> Good point about pg_waldump.  I am a little worried we might open a
> security hole making a new API so they work, so maybe we should avoid
> it.

Yeah, in principle, since the data keys in the 2-tier key architecture
should not go outside the database, I think we should not disclose data
keys to utility commands. So rearranging the WAL format seems the
better solution, but is there any reason why the main data is placed at
the end of a WAL record? I wonder if we can assemble WAL records in the
following order and encrypt only parts 3 and 4 (see the sketch after
the list).

1. Header data (XLogRecord and other headers)
2. Main data (xl_heap_insert, xl_heap_update etc + related data)
3. Block data (Tuple data, FPI)
4. Sub data (e.g tuple data for logical decoding)
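
A sketch of that layout and where the encryption boundary would sit, as
a C comment (the exact ordering shown is only an illustration of the
idea):

    /*
     * Proposed record layout (sketch). Parts 1 and 2 stay in
     * plaintext so that tools which only need headers and main data
     * can parse the record without any key; only parts 3 and 4 are
     * encrypted.
     *
     *   +---------------------------------------+
     *   | 1. XLogRecord and other headers       |  plaintext
     *   | 2. main data (xl_heap_insert, ...)    |  plaintext
     *   |---------------------------------------|  <- encryption boundary
     *   | 3. block data (tuple data, FPIs)      |  encrypted
     *   | 4. sub data (logical decoding tuples) |  encrypted
     *   +---------------------------------------+
     */
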

>
> > Also, for system catalog encryption, it could be a hard part. System
> > catalogs are initially created at initdb time and created by copying
> > from template1 when CREATE DATABASE. Therefore we would need to either
> > modify initdb so that it's aware of encryption keys and KMS or modify
> > database creation so that it copies database file while encrypting
> > them.
>
> I assume initdb will use the same API that you would use to start the
> server itself, e.g., type in a password, or contact a key server.

I realized that in XTS encryption mode, since we craft the tweak using
the relfilenode, the tweaks for the system catalogs in a new database
would change. So we might need to re-encrypt system catalogs at CREATE
DATABASE after all. I suspect that even cluster-wide encryption has the
same problem.
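
To illustrate the problem, a sketch of a tweak crafted from the
relfilenode and block number (the exact composition, and the byte
ordering, are assumptions):

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical XTS tweak: relfilenode || block number, zero-padded.
     * Copying template1's files into a new database assigns new
     * relfilenodes, so the old tweak no longer matches and the copied
     * blocks would have to be decrypted and re-encrypted. */
    static void
    make_xts_tweak(uint32_t relfilenode, uint32_t blkno, uint8_t tweak[16])
    {
        memset(tweak, 0, 16);
        memcpy(tweak, &relfilenode, sizeof(relfilenode));
        memcpy(tweak + sizeof(relfilenode), &blkno, sizeof(blkno));
    }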


Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



On Thu, Jun 13, 2019 at 04:26:47PM +0900, Masahiko Sawada wrote:
> On Thu, Jun 13, 2019 at 3:48 AM Bruce Momjian <bruce@momjian.us> wrote:
> > The big question is how many people will be mixing encrypted and
> > unencrypted data in the same cluster, and care about performance?  Just
> > because someone might care is not enough of a justification.  They can
> > certainly create separate encrypted and non-encrypted clusters. Can we
> > implement level 6 and then implement levels 3-5 later if desired?
> 
> I guess most users are interested in performance. Users don't want to
> sacrifice performance for security and vice versa. Fine grained
> control would allow us to seek a compromise point.

Well, what does that add to the argument?  Yes, everyone cares about
performance, but it is the magnitude of the performance impact vs. the
complexity that is the issue here.  Also, by definition, users will
trade performance for security because encrypting data will slow down
the database.  The open question is how much, and if that overhead is
reasonable based on the complexity.

What I don't want to do is to design a system that is more complex than
required, and it might become so complex we might never get it done.

> > How would you configure the WAL to know which key to use if we did #5?
> > Wouldn't system tables and statistics, and perhaps referential integrity
> > allow for information leakage?
> 
> We use something like a map between tablespace OID and encryption
> key as a separate file (maybe stored in $PGDATA/global), called
> keyring. Using the keyring we can obtain encryption key by tablespace
> oid. For WAL, we add a flag to XLogRecord which indicates whether the
> WAL record is encrypted, and we already have relfilenode in the header
> data of WAL. So we can obtain the tablespace oid from the part and
> obtain the corresponding encryption key.

OK.

> > > 2. Encryption Objects.
> > > Indexes, WAL and TOAST table pertaining to encrypted tables, and
> > > temporary files must also be encrypted but we need to discuss whether
> > > we encrypt non-user data as well such as SLRU data, vm and fsm, and
> > > perhaps even other files such as 2PC state files, backup_label etc.
> > > Encryption everything is required by some use case but it's also true
> > > that there are users who wish to encrypt database while minimizing
> > > performance overheads.
> >
> > I don't think we need to encrypt the "status" files like SLRU data, vm
> > and fsm.
> 
> I agree.

Good.

> > Good point about pg_waldump.  I am a little worried we might open a
> > security hole making a new API so they work, so maybe we should avoid
> > it.
> 
> Yeah, in principle since data key of 2 tier key architecture should
> not go outside database I think we should not tell data keys to
> utility commands. So the rearranging WAL format seems to be a better
> solution but is there any reason why the main data is placed at end of
> WAL record? I wonder if we can assemble WAL records as following order
> and encrypt only 3 and 4.
> 
> 1. Header data (XLogRecord and other headers)
> 2. Main data (xl_heap_insert, xl_heap_update etc + related data)
> 3. Block data (Tuple data, FPI)
> 4. Sub data (e.g tuple data for logical decoding)

Yes, that does sound like a reasonable idea.  It is similar to us not
encrypting the clog --- there is little value.  However, if we only
encrypt the cluster, we don't need to expose the relfilenode and we can
just encrypt the entire WAL --- I like that simplicity.  We might find
that the complexity of encrypting only certain tablespaces makes the
system slower than just encrypting the entire cluster.

> > > Also, for system catalog encryption, it could be a hard part. System
> > > catalogs are initially created at initdb time and created by copying
> > > from template1 when CREATE DATABASE. Therefore we would need to either
> > > modify initdb so that it's aware of encryption keys and KMS or modify
> > > database creation so that it copies database file while encrypting
> > > them.
> >
> > I assume initdb will use the same API that you would use to start the
> > server itself, e.g., type in a password, or contact a key server.
> 
> I realized that in XTS encryption mode, since we craft the tweak using
> the relfilenode, the tweaks for the system catalogs in a new database
> would change. So we might need to re-encrypt system catalogs at CREATE
> DATABASE after all. I suspect that even cluster-wide encryption has
> the same problem.

Yes, this is why I want to just do cluster-wide encryption at this
stage.

In addition, while the 8k blocks would use a block cipher, the WAL would
likely use a stream cipher, and it will be very hard to use multiple
stream ciphers in a single WAL file.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Thu, Jun 13, 2019 at 11:07:25AM -0400, Bruce Momjian wrote:
>On Thu, Jun 13, 2019 at 04:26:47PM +0900, Masahiko Sawada wrote:
>> On Thu, Jun 13, 2019 at 3:48 AM Bruce Momjian <bruce@momjian.us> wrote:
>> > The big question is how many people will be mixing encrypted and
>> > unencrypted data in the same cluster, and care about performance?  Just
>> > because someone might care is not enough of a justification.  They can
>> > certainly create separate encrypted and non-encrypted clusters. Can we
>> > implement level 6 and then implement levels 3-5 later if desired?
>>
>> I guess most users are interested in performance. Users don't want to
>> sacrifice performance for security and vice versa. Fine grained
>> control would allow us to seek a compromise point.
>
>Well, what does that add to the argument?  Yes, everyone cares about
>performance, but it is the magnitude of the performance impact vs. the
>complexity that is the issue here.  Also, by definition, users will
>trade performance for security because encrypting data will slow down
>the database.  The open question is how much, and if that overhead is
>reasonable based on the complexity.
>
>What I don't want to do is to design a system that is more complex than
>required, and it might become so complex we might never get it done.
>

IMHO we should implement the simplest system possible, and optimize the
hell out of it without sacrificing any safety/security aspects. No smart
tunables, no extra GUCs to trade security for performance, nothing.

Then once we have this working, we can see what the impact is, and make
informed choices based on that. It's really hard to make good choices
based on speculation, which is all we have at this point. And the danger
is we'll end up with overly complex system with many parameters - which
is pretty bad when the configuration impacts security, because regular
users may not reaslise the consequences (and we'll get blamed for it).

Also, in my experience the deployments that really need this sort of
encryption tend to be quite valuable, and the owners will be happy with
higher hardware costs to compensate for the performance impact, if it
gives them the feature. So even if the performance impact is 20% (worst
case estimate), I'd say that may be acceptable.

>> > How would you configure the WAL to know which key to use if we did #5?
>> > Wouldn't system tables and statistics, and perhaps referential integrity
>> > allow for information leakage?
>>
>> We use something like a map between tablespace OID and encryption
>> key as a separate file (maybe stored in $PGDATA/global), called
>> keyring. Using the keyring we can obtain encryption key by tablespace
>> oid. For WAL, we add a flag to XLogRecord which indicates whether the
>> WAL record is encrypted, and we already have relfilenode in the header
>> data of WAL. So we can obtain the tablespace oid from the part and
>> obtain the corresponding encryption key.
>
>OK.
>
>> > > 2. Encryption Objects.
>> > > Indexes, WAL and TOAST table pertaining to encrypted tables, and
>> > > temporary files must also be encrypted but we need to discuss whether
>> > > we encrypt non-user data as well such as SLRU data, vm and fsm, and
>> > > perhaps even other files such as 2PC state files, backup_label etc.
>> > > Encryption everything is required by some use case but it's also true
>> > > that there are users who wish to encrypt database while minimizing
>> > > performance overheads.
>> >
>> > I don't think we need to encrypt the "status" files like SLRU data, vm
>> > and fsm.
>>
>> I agree.
>
>Good.
>
>> > Good point about pg_waldump.  I am a little worried we might open a
>> > security hole making a new API so they work, so maybe we should avoid
>> > it.
>>
>> Yeah, in principle since data key of 2 tier key architecture should
>> not go outside database I think we should not tell data keys to
>> utility commands. So the rearranging WAL format seems to be a better
>> solution but is there any reason why the main data is placed at end of
>> WAL record? I wonder if we can assemble WAL records as following order
>> and encrypt only 3 and 4.
>>
>> 1. Header data (XLogRecord and other headers)
>> 2. Main data (xl_heap_insert, xl_heap_update etc + related data)
>> 3. Block data (Tuple data, FPI)
>> 4. Sub data (e.g tuple data for logical decoding)
>
>Yes, that does sound like a reasonable idea.  It is similar to us not
>encrypting the clog --- there is little value.  However, if we only
>encrypt the cluster, we don't need to expose the relfilenode and we can
>just encrypt the entire WAL --- I like that simplicity.  We might find
>that the complexity of encrypting only certain tablespaces makes the
>system slower than just encrypting the entire cluster.
>
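
For readers following the thread, the proposed assembly order would
look roughly like this (illustration only, not an actual WAL format
change):

    /*
     * Proposed record layout, with only parts 3 and 4 encrypted:
     *
     *   +--------------------------------+
     *   | 1. XLogRecord + other headers  |  plaintext
     *   | 2. main data (xl_heap_* etc.)  |  plaintext
     *   | 3. block data (tuples, FPIs)   |  encrypted
     *   | 4. sub data (logical decoding) |  encrypted
     *   +--------------------------------+
     *
     * Tools such as pg_waldump could then still parse 1 and 2 without
     * any access to data keys.
     */
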

I personally find the idea of encrypting tablespaces rather strange.
Tablespaces are meant to define physical locations for objects, but
this would also use them to "mark" objects as encrypted or not. That
just seems misguided and would make life harder for many users.

For example, what if I don't have any tablespaces (except for the
default one), but I want to encrypt only some objects? Suddenly I have
to create a tablespace, which will however cause various difficulties
down the road (during pg_basebackup, etc.).

If we really want to allow encrypting of just some objects (and I'm not
saying we need to), we should either allow defining that for individual
objects, or invent some new logical grouping of objects (where each
group is encrypted or not as a whole).

FWIW my impression is that we only really consider tablespace encryption
because Oracle has it, so users may be familiar with it. That however
ignores that Oracle actually allocates data for a tablespace (IIRC there
is a set of files per tablespace, shared by all objects in it), while
PostgreSQL has files per object.


>> > > Also, for system catalog encryption, it could be a hard part. System
>> > > catalogs are initially created at initdb time and then copied from
>> > > template1 by CREATE DATABASE. Therefore we would need to either
>> > > modify initdb so that it's aware of encryption keys and KMS, or
>> > > modify database creation so that it copies database files while
>> > > encrypting them.
>> >
>> > I assume initdb will use the same API that you would use to start the
>> > server itself, e.g., type in a password, or contact a key server.
>>
>> I realized that in XTS encryption mode, since we craft the tweak using
>> the relfilenode, the tweaks for system catalogs in a new database would
>> change. So we might need to re-encrypt system catalogs when CREATE
>> DATABASE after all. I suspect that even the cluster-wide encryption has
>> the same problem.
>
>Yes, this is why I want to just do cluster-wide encryption at this
>stage.
>
>In addition, while the 8k blocks would use a block cipher, the WAL would
>likely use a stream cipher, and it will be very hard to use multiple
>stream ciphers in a single WAL file.
>

Umm, why? Why would WAL necessarily use a stream cipher instead of a
block cipher with a suitable mode (say, CBC or XTS)? And even if it did
use some stream cipher, why would it be hard to use multiple ciphers?

I mean, we'd probably want/need to start new streams for each WAL
segment anyway, so that tools can read and process each WAL segment
independently. So we wouldn't get very long streams anyway.



regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



On Fri, Jun 14, 2019 at 12:41:17AM +0200, Tomas Vondra wrote:
> On Thu, Jun 13, 2019 at 11:07:25AM -0400, Bruce Momjian wrote:
> IMHO we should implement the simplest system possible, and optimize the
> hell out of it without sacrificing any safety/security aspects. No smart
> tunables, no extra GUCs to trade security for performance, nothing.
> 
> Then once we have this working, we can see what the impact is, and make
> informed choices based on that. It's really hard to make good choices
> based on speculation, which is all we have at this point. And the danger
> is we'll end up with overly complex system with many parameters - which
> is pretty bad when the configuration impacts security, because regular
> users may not realise the consequences (and we'll get blamed for it).
> 
> Also, in my experience the deployments that really need this sort of
> encryption tend to be quite valuable, and the owners will be happy with
> higher hardware costs to compensate for the performance impact, if it
> gives them the feature. So even if the performance impact is 20% (worst
> case estimate), I'd say that may be acceptable.

Totally agree.

> I personally find the idea of encrypting tablespaces rather strange.
> Tablespaces are meant to define physical locations for objects, but
> this would also use them to "mark" objects as encrypted or not. That
> just seems misguided and would make life harder for many users.
> 
> For example, what if I don't have any tablespaces (except for the
> default one), but I want to encrypt only some objects? Suddenly I have
> to create a tablespace, which will however cause various difficulties
> down the road (during pg_basebackup, etc.).

Yes, very good point.

> > In addition, while the 8k blocks would use a block cipher, the WAL would
> > likely use a stream cipher, and it will be very hard to use multiple
> > stream ciphers in a single WAL file.
> > 
> 
> Umm, why? Why would WAL necessarily use a stream cipher instead of a
> block cipher with a suitable mode (say, CBC or XTS)? And even if it did
> use some stream cipher, why would it be hard to use multiple ciphers?

Well, the value of stream ciphers is that you can write as many bytes as
you want, rather than requiring all writes to be a multiple of the block
size.  The idea of having multiple tablespaces with different keys, and
switching stream ciphers while encrypting only the part of the WAL
that needs it, and leaving the relfilenode unencrypted so we know which
keys to use, seems very complex.
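
To make the block-vs-stream distinction concrete, here is a minimal
sketch using OpenSSL's EVP interface with AES-128-CTR; it only
illustrates the no-padding property described above and is not proposed
code:

    #include <openssl/evp.h>

    /* Encrypt an arbitrary-length buffer with AES-128-CTR. CTR needs no
     * padding, so a 37-byte record yields exactly 37 ciphertext bytes;
     * CBC would force writes up to a 16-byte block boundary. */
    static int
    encrypt_ctr(const unsigned char key[16], const unsigned char iv[16],
                const unsigned char *in, int inlen, unsigned char *out)
    {
        EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
        int         outlen = 0;

        EVP_EncryptInit_ex(ctx, EVP_aes_128_ctr(), NULL, key, iv);
        EVP_EncryptUpdate(ctx, out, &outlen, in, inlen);  /* outlen == inlen */
        EVP_CIPHER_CTX_free(ctx);
        return outlen;          /* error handling omitted for brevity */
    }
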

> I mean, we'd probably want/need to start new streams for each WAL
> segment anyway, so that tools can read and process each WAL segment
> independently. So we wouldn't get very long streams anyway.

Well, 16MB is quite a long stream, considering that the AES256 block
size is 128 bits, or 16 bytes.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Thu, Jun 13, 2019 at 07:49:48PM -0400, Bruce Momjian wrote:
>On Fri, Jun 14, 2019 at 12:41:17AM +0200, Tomas Vondra wrote:
>> On Thu, Jun 13, 2019 at 11:07:25AM -0400, Bruce Momjian wrote:
>> IMHO we should implement the simplest system possible, and optimize the
>> hell out of it without sacrificing any safety/security aspects. No smart
>> tunables, no extra GUCs to trade security for performance, nothing.
>>
>> Then once we have this working, we can see what the impact is, and make
>> informed choices based on that. It's really hard to make good choices
>> based on speculation, which is all we have at this point. And the danger
>> is we'll end up with overly complex system with many parameters - which
>> is pretty bad when the configuration impacts security, because regular
>> users may not realise the consequences (and we'll get blamed for it).
>>
>> Also, in my experience the deployments that really need this sort of
>> encryption tend to be quite valuable, and the owners will be happy with
>> higher hardware costs to compensate for the performance impact, if it
>> gives them the feature. So even if the performance impact is 20% (worst
>> case estimate), I'd say that may be acceptable.
>
>Totally agree.
>
>> I personally find the idea of encrypting tablespaces rather strange.
>> Tablespaces are meant to define physical locations for objects, but
>> this would also use them to "mark" objects as encrypted or not. That
>> just seems misguided and would make life harder for many users.
>>
>> For example, what if I don't have any tablespaces (except for the
>> default one), but I want to encrypt only some objects? Suddenly I have
>> to create a tablespace, which will however cause various difficulties
>> down the road (during pg_basebackup, etc.).
>
>Yes, very good point.
>
>> > In addition, while the 8k blocks would use a block cipher, the WAL would
>> > likely use a stream cipher, and it will be very hard to use multiple
>> > stream ciphers in a single WAL file.
>> >
>>
>> Umm, why? Why would WAL necessarily use a stream cipher instead of a
>> block cipher with a suitable mode (say, CBC or XTS)? And even if it did
>> use some stream cipher, why would it be hard to use multiple ciphers?
>
>Well, the value of stream ciphers is that you can write as many bytes as
>you want, rather than requiring all writes to be a multiple of the block
>size.  The idea of having multiple tablespaces with different keys, and
>switching stream ciphers while encrypting only the part of the WAL
>that needs it, and leaving the relfilenode unencrypted so we know which
>keys to use, seems very complex.
>

OK, that makes sense.

FWIW my assumption was that we could just add an "encrypted" flag into
the main XLogRecord header, and then an extra part with important
encryption-related data - the key, and the important metadata needed by
external tools (e.g. relfilenode/block, needed by pg_waldump).

Then we wouldn't need to reshuffle the WAL, I think.
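
Sketched in C, that might look something like this (these are not the
real WAL structures; XLR_ENCRYPTED and XLogCryptoHeader are made-up
names, though Oid/RelFileNode/BlockNumber are the usual PostgreSQL
types):

    /* A spare bit in the record header could mark encrypted records. */
    #define XLR_ENCRYPTED   0x01        /* hypothetical flag */

    /* Plaintext part readable by external tools such as pg_waldump,
     * identifying which key was used without revealing it. */
    typedef struct XLogCryptoHeader
    {
        Oid         spcoid;     /* tablespace oid -> key lookup */
        RelFileNode rnode;      /* relation, kept unencrypted */
        BlockNumber blkno;      /* block, kept unencrypted */
    } XLogCryptoHeader;
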


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



On Fri, Jun 14, 2019 at 9:12 AM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
>
> On Thu, Jun 13, 2019 at 07:49:48PM -0400, Bruce Momjian wrote:
> >On Fri, Jun 14, 2019 at 12:41:17AM +0200, Tomas Vondra wrote:
> >> On Thu, Jun 13, 2019 at 11:07:25AM -0400, Bruce Momjian wrote:
> >> IMHO we should implement the simplest system possible, and optimize the
> >> hell out of it without sacrificing any safety/security aspects. No smart
> >> tunables, no extra GUCs to trade security for performance, nothing.
> >>
> >> Then once we have this working, we can see what the impact is, and make
> >> informed choices based on that. It's really hard to make good choices
> >> based on speculation, which is all we have at this point. And the danger
> >> is we'll end up with overly complex system with many parameters - which
> >> is pretty bad when the configuration impacts security, because regular
> >> users may not realise the consequences (and we'll get blamed for it).
> >>
> >> Also, in my experience the deployments that really need this sort of
> >> encryption tend to be quite valuable, and the owners will be happy with
> >> higher hardware costs to compensate for the performance impact, if it
> >> gives them the feature. So even if the performance impact is 20% (worst
> >> case estimate), I'd say that may be acceptable.
> >
> >Totally agree.
> >
> >> I personally find the idea of encrypting tablespaces rather strange.
> >> Tablespaces are meant to define physical locations for objects, but
> >> this would also use them to "mark" objects as encrypted or not. That
> >> just seems misguided and would make life harder for many users.
> >>
> >> For example, what if I don't have any tablespaces (except for the
> >> default one), but I want to encrypt only some objects? Suddenly I have
> >> to create a tablespace, which will however cause various difficulties
> >> down the road (during pg_basebackup, etc.).
> >
> >Yes, very good point.
> >
> >> > In addition, while the 8k blocks would use a block cipher, the WAL would
> >> > likely use a stream cipher, and it will be very hard to use multiple
> >> > stream ciphers in a single WAL file.
> >> >
> >>
> >> Umm, why? Why would WAL necessarily use a stream cipher instead of a
> >> block cipher with a suitable mode (say, CBC or XTS)? And even if it did
> >> use some stream cipher, why would it be hard to use multiple ciphers?
> >
> >Well, the value of stream ciphers is that you can write as many bytes as
> >you want, rather than requiring all writes to be a multiple of the block
> >size.  The idea of having multiple tablespaces with different keys, and
> >switching stream ciphers while encrypting only the part of the WAL
> >that needs it, and leaving the relfilenode unencrypted so we know which
> >keys to use, seems very complex.
> >
>
> OK, that makes sense.
>
> FWIW my assumption was that we could just add an "encrypted" flag into
> the main XLogRecord header, and then an extra part with important
> encryption-related data - the key, and the important metadata needed by
> external tools (e.g. relfilenode/block, needed by pg_waldump).
>
> Then we wouldn't need to reshuffle the WAL, I think.
>

Hmm, IIUC pg_waldump reads the main data containing the info for the
redo operation of each rmgr, such as xl_heap_insert and xl_heap_update.
I wonder if we would need to write the same data twice.



Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



On Fri, Jun 14, 2019 at 02:12:07AM +0200, Tomas Vondra wrote:
> FWIW my assumption was that we could just add an "encrypted" flag into
> the main XLogRecord header, and then an extra part with important
> encryption-related data - the key, and the important metadata needed by
> external tools (e.g. relfilenode/block, needed by pg_waldump).
> 
> Then we wouldn't need to reshuffle the WAL, I think.

I was thinking we would just encrypt the entire WAL file, and use the
WAL file name as the IV.
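
A sketch of deriving such an IV (hypothetical helper; a real patch
would probably use the timeline and segment numbers directly rather
than parsing the text file name):

    #include <stdint.h>
    #include <string.h>

    /* Build a 16-byte AES IV that is unique per WAL segment. */
    static void
    wal_segment_iv(uint32_t tli, uint64_t segno, unsigned char iv[16])
    {
        memset(iv, 0, 16);
        memcpy(iv, &tli, sizeof(tli));          /* bytes 0-3: timeline id */
        memcpy(iv + 4, &segno, sizeof(segno));  /* bytes 4-11: segment no. */
        /* bytes 12-15 left zero: counter space within the segment (CTR) */
    }
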

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On 6/13/19 11:07 AM, Bruce Momjian wrote:
> On Thu, Jun 13, 2019 at 04:26:47PM +0900, Masahiko Sawada wrote:
>> Yeah, in principle, since the data keys in the 2-tier key architecture
>> should not go outside the database, I think we should not expose data
>> keys to utility commands. So rearranging the WAL format seems to be the
>> better solution, but is there any reason why the main data is placed at
>> the end of the WAL record? I wonder if we can assemble WAL records in
>> the following order and encrypt only 3 and 4.
>>
>> 1. Header data (XLogRecord and other headers)
>> 2. Main data (xl_heap_insert, xl_heap_update etc + related data)
>> 3. Block data (Tuple data, FPI)
>> 4. Sub data (e.g tuple data for logical decoding)
>
> Yes, that does sound like a reasonable idea.  It is similar to us not
> encrypting the clog --- there is little value.  However, if we only
> encrypt the cluster, we don't need to expose the relfilenode and we can
> just encrypt the entire WAL --- I like that simplicity.  We might find
> that the complexity of encrypting only certain tablespaces makes the
> system slower than just encrypting the entire cluster.


There are reasons other than performance to want something more
granular than entire-cluster encryption. Limiting the volume of data
encrypted with any one key, for example. And not encrypting #1 & 2
above would help avoid known-plaintext attacks, I would think.


>> > > Also, for system catalog encryption, it could be a hard part. System
>> > > catalogs are initially created at initdb time and then copied from
>> > > template1 by CREATE DATABASE. Therefore we would need to either
>> > > modify initdb so that it's aware of encryption keys and KMS, or
>> > > modify database creation so that it copies database files while
>> > > encrypting them.
>> >
>> > I assume initdb will use the same API that you would use to start the
>> > server itself, e.g., type in a password, or contact a key server.
>>
>> I realized that in XTS encryption mode, since we craft the tweak using
>> the relfilenode, the tweaks for system catalogs in a new database would
>> change. So we might need to re-encrypt system catalogs when CREATE
>> DATABASE after all. I suspect that even the cluster-wide encryption has
>> the same problem.
>
> Yes, this is why I want to just do cluster-wide encryption at this
> stage.
>
> In addition, while the 8k blocks would use a block cipher, the WAL would
> likely use a stream cipher, and it will be very hard to use multiple
> stream ciphers in a single WAL file.


I don't understand why we would not just use AES for both.

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


On Fri, Jun 14, 2019 at 02:27:17PM -0400, Joe Conway wrote:
> On 6/13/19 11:07 AM, Bruce Momjian wrote:
> > On Thu, Jun 13, 2019 at 04:26:47PM +0900, Masahiko Sawada wrote:
> >> Yeah, in principle, since the data keys in the 2-tier key architecture
> >> should not go outside the database, I think we should not expose data
> >> keys to utility commands. So rearranging the WAL format seems to be the
> >> better solution, but is there any reason why the main data is placed at
> >> the end of the WAL record? I wonder if we can assemble WAL records in
> >> the following order and encrypt only 3 and 4.
> >> 
> >> 1. Header data (XLogRecord and other headers)
> >> 2. Main data (xl_heap_insert, xl_heap_update etc + related data)
> >> 3. Block data (Tuple data, FPI)
> >> 4. Sub data (e.g tuple data for logical decoding)
> > 
> > Yes, that does sound like a reasonable idea.  It is similar to us not
> > encrypting the clog --- there is little value.  However, if we only
> > encrypt the cluster, we don't need to expose the relfilenode and we can
> > just encrypt the entire WAL --- I like that simplicity.  We might find
> > that the complexity of encrypting only certain tablespaces makes the
> > system slower than just encrypting the entire cluster.
> 
> 
> There are reasons other than performance to want something more
> granular than entire-cluster encryption. Limiting the volume of data
> encrypted with any one key, for example. And not encrypting #1 & 2
> above would help avoid known-plaintext attacks, I would think.
> 
> 
> >> > > Also, for system catalog encryption, it could be a hard part. System
> >> > > catalogs are initially created at initdb time and then copied from
> >> > > template1 by CREATE DATABASE. Therefore we would need to either
> >> > > modify initdb so that it's aware of encryption keys and KMS, or
> >> > > modify database creation so that it copies database files while
> >> > > encrypting them.
> >> >
> >> > I assume initdb will use the same API that you would use to start the
> >> > server itself, e.g., type in a password, or contact a key server.
> >> 
> >> I realized that in XTS encryption mode, since we craft the tweak using
> >> the relfilenode, the tweaks for system catalogs in a new database would
> >> change. So we might need to re-encrypt system catalogs when CREATE
> >> DATABASE after all. I suspect that even the cluster-wide encryption has
> >> the same problem.
> > 
> > Yes, this is why I want to just do cluster-wide encryption at this
> > stage.
> > 
> > In addition, while the 8k blocks would use a block cipher, the WAL would
> > likely use a stream cipher, and it will be very hard to use multiple
> > stream ciphers in a single WAL file.
> 
> 
> I don't understand why we would not just use AES for both.

Uh, AES is an encryption cipher.  You can use it with block mode, CBC,
or stream mode, CTR, GCM;  see:

    http://momjian.us/main/writings/crypto.pdf#page=7

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On 6/14/19 6:09 PM, Bruce Momjian wrote:
> On Fri, Jun 14, 2019 at 02:27:17PM -0400, Joe Conway wrote:
>> On 6/13/19 11:07 AM, Bruce Momjian wrote:
>> > In addition, while the 8k blocks would use a block cipher, the WAL would
>> > likely use a stream cipher, and it will be very hard to use multiple
>> > stream ciphers in a single WAL file.
>>
>> I don't understand why we would not just use AES for both.
>
> Uh, AES is an encryption cipher.  You can use it with block mode, CBC,
> or stream mode, CTR, GCM;  see:


AES is a block cipher, not a stream cipher. Yes you can use it in
different modes, including chained modes (and CBC is what I would pick),
but I assumed you were talking about an actual stream cipher algorithm.

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


On Fri, Jun 14, 2019 at 09:37:37PM -0400, Joe Conway wrote:
> On 6/14/19 6:09 PM, Bruce Momjian wrote:
> > On Fri, Jun 14, 2019 at 02:27:17PM -0400, Joe Conway wrote:
> >> On 6/13/19 11:07 AM, Bruce Momjian wrote:
> >> > In addition, while the 8k blocks would use a block cipher, the WAL would
> >> > likely use a stream cipher, and it will be very hard to use multiple
> >> > stream ciphers in a single WAL file.
> >> 
> >> I don't understand why we would not just use AES for both.
> > 
> > Uh, AES is an encryption cipher.  You can use it with block mode, CBC,
> > or stream mode, CTR, GCM;  see:
> 
> 
> AES is a block cipher, not a stream cipher. Yes you can use it in
> different modes, including chained modes (and CBC is what I would pick),
> but I assumed you were talking about an actual stream cipher algorithm.

OK, to be specific, I was thinking of using aes128-cbc for the 8k pages
and aes128-ctr for the WAL.
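
In OpenSSL EVP terms that split is just a choice of cipher object, e.g.
(sketch only, helper name made up):

    #include <openssl/evp.h>
    #include <stdbool.h>

    /* Fixed-size 8k pages suit CBC; append-only, variable-length WAL
     * suits CTR.  Both are AES-128 underneath. */
    static const EVP_CIPHER *
    cipher_for(bool is_wal)
    {
        return is_wal ? EVP_aes_128_ctr() : EVP_aes_128_cbc();
    }
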

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Fri, Jun 14, 2019 at 02:27:17PM -0400, Joe Conway wrote:
> On 6/13/19 11:07 AM, Bruce Momjian wrote:
> > On Thu, Jun 13, 2019 at 04:26:47PM +0900, Masahiko Sawada wrote:
> >> Yeah, in principle, since the data keys in the 2-tier key architecture
> >> should not go outside the database, I think we should not expose data
> >> keys to utility commands. So rearranging the WAL format seems to be the
> >> better solution, but is there any reason why the main data is placed at
> >> the end of the WAL record? I wonder if we can assemble WAL records in
> >> the following order and encrypt only 3 and 4.
> >> 
> >> 1. Header data (XLogRecord and other headers)
> >> 2. Main data (xl_heap_insert, xl_heap_update etc + related data)
> >> 3. Block data (Tuple data, FPI)
> >> 4. Sub data (e.g tuple data for logical decoding)
> > 
> > Yes, that does sound like a reasonable idea.  It is similar to us not
> > encrypting the clog --- there is little value.  However, if we only
> > encrypt the cluster, we don't need to expose the relfilenode and we can
> > just encrypt the entire WAL --- I like that simplicity.  We might find
> > that the complexity of encrypting only certain tablespaces makes the
> > system slower than just encrypting the entire cluster.
> 
> 
> There are reasons other than performance to want something more
> granular than entire-cluster encryption. Limiting the volume of data
> encrypted with any one key, for example. And not encrypting #1 & 2
> above would help avoid known-plaintext attacks, I would think.

There are no known plaintext attacks on AES short of exhaustive search:

    https://crypto.stackexchange.com/questions/1512/why-is-aes-resistant-to-known-plaintext-attacks

Even if we don't encrypt the first part of the WAL record (1 & 2), the
block data (3) probably has enough predictable structure for a
plaintext attack.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On 6/15/19 9:28 PM, Bruce Momjian wrote:
> On Fri, Jun 14, 2019 at 02:27:17PM -0400, Joe Conway wrote:
>> On 6/13/19 11:07 AM, Bruce Momjian wrote:
>> > On Thu, Jun 13, 2019 at 04:26:47PM +0900, Masahiko Sawada wrote:
>> >> Yeah, in principle, since the data keys in the 2-tier key architecture
>> >> should not go outside the database, I think we should not expose data
>> >> keys to utility commands. So rearranging the WAL format seems to be the
>> >> better solution, but is there any reason why the main data is placed at
>> >> the end of the WAL record? I wonder if we can assemble WAL records in
>> >> the following order and encrypt only 3 and 4.
>> >>
>> >> 1. Header data (XLogRecord and other headers)
>> >> 2. Main data (xl_heap_insert, xl_heap_update etc + related data)
>> >> 3. Block data (Tuple data, FPI)
>> >> 4. Sub data (e.g tuple data for logical decoding)
>> >
>> > Yes, that does sound like a reasonable idea.  It is similar to us not
>> > encrypting the clog --- there is little value.  However, if we only
>> > encrypt the cluster, we don't need to expose the relfilenode and we can
>> > just encrypt the entire WAL --- I like that simplicity.  We might find
>> > that the complexity of encrypting only certain tablespaces makes the
>> > system slower than just encrypting the entire cluster.
>>
>>
>> There are reasons other than performance to want something more
>> granular than entire-cluster encryption. Limiting the volume of data
>> encrypted with any one key, for example. And not encrypting #1 & 2
>> above would help avoid known-plaintext attacks, I would think.
>
> There are no known plaintext attacks on AES short of exhaustive search:
>
>     https://crypto.stackexchange.com/questions/1512/why-is-aes-resistant-to-known-plaintext-attacks

Even that non-authoritative stackexchange thread has varying opinions.
Surely you don't claim that limiting known plaintext as much as is
practical is a bad idea in general.

> Even if we don't encrypt the first part of the WAL record (1 & 2), the
> block data (3) probably has enough predictable structure for a
> plaintext attack.

Perhaps.

In any case it doesn't address my first point, which is limiting the
volume encrypted with the same key. Another valid reason is you might
have data at varying sensitivity levels and prefer different keys be
used for each level.

And although I'm not proposing this for the first implementation, yet
another reason is I might want to additionally control "transparent
access" to data based on who is logged in. That could be done by
layering an additional key on top of the per-tablespace key for example.

The bottom line in my mind is encrypting the entire database with a
single key is not much different/better than using filesystem
encryption, so I'm not sure it is worth the effort and complexity to get
that capability. I think having the ability to encrypt at the tablespace
level adds a lot of capability for minimal extra complexity.

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


On Sun, Jun 16, 2019 at 07:07:20AM -0400, Joe Conway wrote:
> On 6/15/19 9:28 PM, Bruce Momjian wrote:
> >> There are reasons other than performance to want something more
> >> granular than entire-cluster encryption. Limiting the volume of data
> >> encrypted with any one key, for example. And not encrypting #1 & 2
> >> above would help avoid known-plaintext attacks, I would think.
> > 
> > There are no known plaintext attacks on AES short of exhaustive search:
> > 
> >     https://crypto.stackexchange.com/questions/1512/why-is-aes-resistant-to-known-plaintext-attacks
> 
> Even that non-authoritative stackexchange thread has varying opinions.
> Surely you don't claim that limiting known plaintext as much as is
> practical is a bad idea in general.

I think we have to look at complexity vs. benefit.

> > Even if we don't encrypt the first part of the WAL record (1 & 2), the
> > block data (3) probably has enough predictable structure for a
> > plaintext attack.
> 
> Perhaps.
> 
> In any case it doesn't address my first point, which is limiting the
> volume encrypted with the same key. Another valid reason is you might
> have data at varying sensitivity levels and prefer different keys be
> used for each level.

That seems quite complex.

> And although I'm not proposing this for the first implementation, yet
> another reason is I might want to additionally control "transparent
> access" to data based on who is logged in. That could be done by
> layering an additional key on top of the per-tablespace key for example.
> 
> The bottom line in my mind is encrypting the entire database with a
> single key is not much different/better than using filesystem
> encryption, so I'm not sure it is worth the effort and complexity to get
> that capability. I think having the ability to encrypt at the tablespace
> level adds a lot of capability for minimal extra complexity.

I disagree.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Sun, Jun 16, 2019 at 09:45:09AM -0400, Bruce Momjian wrote:
> On Sun, Jun 16, 2019 at 07:07:20AM -0400, Joe Conway wrote:
> > And although I'm not proposing this for the first implementation, yet
> > another reason is I might want to additionally control "transparent
> > access" to data based on who is logged in. That could be done by
> > layering an additional key on top of the per-tablespace key for example.
> > 
> > The bottom line in my mind is encrypting the entire database with a
> > single key is not much different/better than using filesystem
> > encryption, so I'm not sure it is worth the effort and complexity to get
> > that capability. I think having the ability to encrypt at the tablespace
> > level adds a lot of capability for minimal extra complexity.
> 
> I disagree.

I will add that OpenSSL has been removing features and compatibility
because the added complexity had hidden exploits that they could not
have anticipated.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On 6/16/19 9:45 AM, Bruce Momjian wrote:
> On Sun, Jun 16, 2019 at 07:07:20AM -0400, Joe Conway wrote:
>> In any case it doesn't address my first point, which is limiting the
>> volume encrypted with the same key. Another valid reason is you might
>> have data at varying sensitivity levels and prefer different keys be
>> used for each level.
>
> That seems quite complex.


How? It is no more complex than encrypting at the tablespace level
already gives you - in that case you get this property for free if you
care to use it.

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


On 6/16/19 9:46 AM, Bruce Momjian wrote:
> On Sun, Jun 16, 2019 at 09:45:09AM -0400, Bruce Momjian wrote:
>> On Sun, Jun 16, 2019 at 07:07:20AM -0400, Joe Conway wrote:
>> > And although I'm not proposing this for the first implementation, yet
>> > another reason is I might want to additionally control "transparent
>> > access" to data based on who is logged in. That could be done by
>> > layering an additional key on top of the per-tablespace key for example.
>> >
>> > The bottom line in my mind is encrypting the entire database with a
>> > single key is not much different/better than using filesystem
>> > encryption, so I'm not sure it is worth the effort and complexity to get
>> > that capability. I think having the ability to encrypt at the tablespace
>> > level adds a lot of capability for minimal extra complexity.
>>
>> I disagree.
>
> I will add that OpenSSL has been removing features and compatibility
> because the added complexity had hidden exploits that they could not
> have anticipated.

Sorry but I'm not buying it.

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


Greetings,

* Joe Conway (mail@joeconway.com) wrote:
> On 6/16/19 9:45 AM, Bruce Momjian wrote:
> > On Sun, Jun 16, 2019 at 07:07:20AM -0400, Joe Conway wrote:
> >> In any case it doesn't address my first point, which is limiting the
> >> volume encrypted with the same key. Another valid reason is you might
> >> have data at varying sensitivity levels and prefer different keys be
> >> used for each level.
> >
> > That seems quite complex.
>
> How? It is no more complex than encrypting at the tablespace level
> already gives you - in that case you get this property for free if you
> care to use it.

Perhaps not surprising, but I'm definitely in agreement with Joe
regarding having multiple keys when possible and (reasonably)
straight-forward to do so.  I also don't buy off on the OpenSSL
argument; their more severe issues certainly haven't been due to key
management issues such as what we're discussing here, so I don't think
the argument applies.

Thanks,

Stephen

On Sun, Jun 16, 2019 at 12:42:55PM -0400, Joe Conway wrote:
> On 6/16/19 9:45 AM, Bruce Momjian wrote:
> > On Sun, Jun 16, 2019 at 07:07:20AM -0400, Joe Conway wrote:
> >> In any case it doesn't address my first point, which is limiting the
> >> volume encrypted with the same key. Another valid reason is you might
> >> have data at varying sensitivity levels and prefer different keys be
> >> used for each level.
> > 
> > That seems quite complex.
> 
> 
> How? It is no more complex than encrypting at the tablespace level
> already gives you - in that case you get this property for free if you
> care to use it.

All keys used to encrypt WAL data must be unlocked at all times, or
crash recovery, PITR, and replication will not work when they hit a
locked key.  Given that, how much value is there in allowing a key per
tablespace?

I don't see how this is better than telling users they have to create a
separate cluster per key.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Greetings,

* Bruce Momjian (bruce@momjian.us) wrote:
> On Sun, Jun 16, 2019 at 12:42:55PM -0400, Joe Conway wrote:
> > On 6/16/19 9:45 AM, Bruce Momjian wrote:
> > > On Sun, Jun 16, 2019 at 07:07:20AM -0400, Joe Conway wrote:
> > >> In any case it doesn't address my first point, which is limiting the
> > >> volume encrypted with the same key. Another valid reason is you might
> > >> have data at varying sensitivity levels and prefer different keys be
> > >> used for each level.
> > >
> > > That seems quite complex.
> >
> > How? It is no more complex than encrypting at the tablespace level
> > already gives you - in that case you get this property for free if you
> > care to use it.
>
> All keys used to encrypt WAL data must be unlocked at all times, or
> crash recovery, PITR, and replication will not work when they hit a
> locked key.  Given that, how much value is there in allowing a key per
> tablespace?

There's a few different things to discuss here, admittedly, but I don't
think it means that there's no value in having a key per tablespace.

Ideally, a given backend would only need, and only have access to, the
keys for the tablespaces that it is allowed to operate on.  I realize
that's a bit farther than what we're talking about today, but hopefully
not too much to be able to consider.

Thanks,

Stephen

On Sun, Jun 16, 2019 at 02:10:23PM -0400, Stephen Frost wrote:
>Greetings,
>
>* Joe Conway (mail@joeconway.com) wrote:
>> On 6/16/19 9:45 AM, Bruce Momjian wrote:
>> > On Sun, Jun 16, 2019 at 07:07:20AM -0400, Joe Conway wrote:
>> >> In any case it doesn't address my first point, which is limiting the
>> >> volume encrypted with the same key. Another valid reason is you might
>> >> have data at varying sensitivity levels and prefer different keys be
>> >> used for each level.
>> >
>> > That seems quite complex.
>>
>> How? It is no more complex than encrypting at the tablespace level
>> already gives you - in that case you get this property for free if you
>> care to use it.
>
>Perhaps not surprising, but I'm definitely in agreement with Joe
>regarding having multiple keys when possible and (reasonably)
>straight-forward to do so.  I also don't buy off on the OpenSSL
>argument; their more severe issues certainly haven't been due to key
>management issues such as what we're discussing here, so I don't think
>the argument applies.
>

I'm not sure what exactly is the "OpenSSL argument" you're disagreeing
with? IMHO Bruce is quite right that the risk of vulnerabilities grows
with the complexity of the system (both due to implementation bugs and
general design weaknesses). I don't think it's tied to the key
management specifically, except that it's one of the parts that may
contribute to the complexity.

(It's often claimed that key management is one of the weakest points of
current crypto systems - we have safe (a)symmetric algorithms, but safe
handling of keys is an issue. I don't have data / papers supporting this
claim, I kinda believe it.)

Now, I'm not opposed to eventually implementing something more
elaborate, but I also think just encrypting the whole cluster (not
necessarily with a single key, but with one master key) would be enough
for the vast majority of users. Plus it's less error prone and easier to
operate (backups, replicas, crash recovery, ...).

But there's about 0% chance we'll get that in v1, of course, so we need
a "minimum viable product" to build on anyway.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



Greetings,

* Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:
> On Sun, Jun 16, 2019 at 02:10:23PM -0400, Stephen Frost wrote:
> >* Joe Conway (mail@joeconway.com) wrote:
> >>On 6/16/19 9:45 AM, Bruce Momjian wrote:
> >>> On Sun, Jun 16, 2019 at 07:07:20AM -0400, Joe Conway wrote:
> >>>> In any case it doesn't address my first point, which is limiting the
> >>>> volume encrypted with the same key. Another valid reason is you might
> >>>> have data at varying sensitivity levels and prefer different keys be
> >>>> used for each level.
> >>>
> >>> That seems quite complex.
> >>
> >>How? It is no more complex than encrypting at the tablespace level
> >>already gives you - in that case you get this property for free if you
> >>care to use it.
> >
> >Perhaps not surprising, but I'm definitely in agreement with Joe
> >regarding having multiple keys when possible and (reasonably)
> >straight-forward to do so.  I also don't buy off on the OpenSSL
> >argument; their more severe issues certainly haven't been due to key
> >management issues such as what we're discussing here, so I don't think
> >the argument applies.
>
> I'm not sure what exactly is the "OpenSSL argument" you're disagreeing
> with? IMHO Bruce is quite right that the risk of vulnerabilities grows
> with the complexity of the system (both due to implementation bugs and
> general design weaknesses). I don't think it's tied to the key
> management specifically, except that it's one of the parts that may
> contribute to the complexity.

While I understand that complexity of the system can lead to
vulnerabilities, I don't agree that it's appropriate or sensible to
treat the issues that OpenSSL has had to deal with as similar to what
we might have to deal with, given this additional proposed complexity.

> (It's often claimed that key management is one of the weakest points of
> current crypto systems - we have safe (a)symmetric algorithms, but safe
> handling of keys is an issue. I don't have data / papers supporting this
> claim, I kinda believe it.)

Yes, I agree entirely that key management is absolutely one of the
hardest things to get right in crypto systems.  It is, however, not what
the OpenSSL issues have been about and so while I agree that we may have
some bugs there, it's not fair to say that they're equivalent to what
OpenSSL has been dealing with.

> Now, I'm not opposed to eventually implementing something more
> elaborate, but I also think just encrypting the whole cluster (not
> necessarily with a single key, but with one master key) would be enough
> for the vast majority of users. Plus it's less error prone and easier to
> operate (backups, replicas, crash recovery, ...).

I agree that it'd be better than nothing but if we have the opportunity
now to introduce what would hopefully be a relatively small capability,
then I'm certainly all for it.

> But there's about 0% chance we'll get that in v1, of course, so we need
> a "minimum viable product" to build on anyway.

There seems like a whole lot of space between something very elaborate
and only supporting one key.

Thanks,

Stephen

On 6/17/19 8:12 AM, Stephen Frost wrote:
>> But there's about 0% chance we'll get that in v1, of course, so we need
>> a "minimum viable product" to build on anyway.
> 
> There seems like a whole lot of space between something very elaborate
> and only supporting one key.

I think this is exactly the point -- IMHO one key per tablespace is a
nice and very sensible compromise. I can imagine all kinds of more
complex things that would be "nice to have", but this gets us most of
the flexibility needed with minimal additional complexity.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



On Fri, Jun 14, 2019 at 7:41 AM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> I personally find the idea of encrypting tablespaces rather strange.
> Tablespaces are meant to define physical locations for objects, but
> this would also use them to "mark" objects as encrypted or not. That
> just seems misguided and would make life harder for many users.
>
> For example, what if I don't have any tablespaces (except for the
> default one), but I want to encrypt only some objects? Suddenly I have
> to create a tablespace, which will however cause various difficulties
> down the road (during pg_basebackup, etc.).

I guess that we can have an encrypted tablespace by default (e.g.
pg_default_enc). Or we could encrypt per table while having encryption
keys per tablespace.

On Mon, Jun 17, 2019 at 6:54 AM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
>
> On Sun, Jun 16, 2019 at 02:10:23PM -0400, Stephen Frost wrote:
> >Greetings,
> >
> >* Joe Conway (mail@joeconway.com) wrote:
> >> On 6/16/19 9:45 AM, Bruce Momjian wrote:
> >> > On Sun, Jun 16, 2019 at 07:07:20AM -0400, Joe Conway wrote:
> >> >> In any case it doesn't address my first point, which is limiting the
> >> >> volume encrypted with the same key. Another valid reason is you might
> >> >> have data at varying sensitivity levels and prefer different keys be
> >> >> used for each level.
> >> >
> >> > That seems quite complex.
> >>
> >> How? It is no more complex than encrypting at the tablespace level
> >> already gives you - in that case you get this property for free if you
> >> care to use it.
> >
> >Perhaps not surprising, but I'm definitely in agreement with Joe
> >regarding having multiple keys when possible and (reasonably)
> >straight-forward to do so.  I also don't buy off on the OpenSSL
> >argument; their more severe issues certainly haven't been due to key
> >management issues such as what we're discussing here, so I don't think
> >the argument applies.
> >
>
> I'm not sure what exactly is the "OpenSSL argument" you're disagreeing
> with? IMHO Bruce is quite right that the risk of vulnerabilities grows
> with the complexity of the system (both due to implementation bugs and
> general design weaknesses). I don't think it's tied to the key
> management specifically, except that it's one of the parts that may
> contribute to the complexity.
>
> (It's often claimed that key management is one of the weakest points of
> current crypto systems - we have safe (a)symmetric algorithms, but safe
> handling of keys is an issue. I don't have data / papers supporting this
> claim, I kinda believe it.)
>
> Now, I'm not opposed to eventually implementing something more
> elaborate, but I also think just encrypting the whole cluster (not
> necessarily with a single key, but with one master key) would be enough
> for the vast majority of users. Plus it's less error prone and easier to
> operate (backups, replicas, crash recovery, ...).
>
> But there's about 0% chance we'll get that in v1, of course, so we need
> a "minimum viable product" to build on anyway.
>

I agree that we need a minimum viable product first. But I'm not sure
that implementing cluster-wide TDE first would be a good first step
toward per-tablespace/table TDE.

The purposes of cluster-wide TDE and table/tablespace TDE are slightly
different in terms of encryption target objects. Cluster-wide TDE
would be a good solution for users who want to encrypt everything,
while table/tablespace TDE would help more demanding use cases in
terms of both security and performance.

Cluster-wide TDE eventually encrypts SLRU data and all WAL, including
WAL not related to user data, while table/tablespace TDE doesn't
unless we develop such functionality. In addition, cluster-wide TDE
also encrypts system catalogs, whereas with table/tablespace TDE the
user would be able to control that somewhat. That is, if we developed
cluster-wide TDE first, then when we develop table/tablespace TDE on
top of it we would need to change TDE so that table/tablespace TDE can
encrypt even non-user data while retaining its simple user interface,
which I'm concerned would rather make the feature complex. We could
support them as different TDE features, but I'm not sure that's a good
choice for users.

From a cryptographic perspective, I think the fine-grained TDE would
be the better solution. Therefore, if we eventually want fine-grained
TDE, I wonder if it might be better to develop the table/tablespace TDE
first while keeping it as simple as possible in v1, and then we can
provide the functionality to encrypt other data in the database
cluster to satisfy the encrypting-everything requirement. I guess that
it's easier to incrementally add encryption target objects than to
make encryption more fine grained without changing the target objects.

FWIW I'm writing a draft patch of per-tablespace TDE and will submit
it this month. We can discuss the complexity of the proposed TDE
further using it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



On 6/17/19 8:29 AM, Masahiko Sawada wrote:
> From a cryptographic perspective, I think the fine-grained TDE would
> be the better solution. Therefore, if we eventually want fine-grained
> TDE, I wonder if it might be better to develop the table/tablespace TDE
> first while keeping it as simple as possible in v1, and then we can
> provide the functionality to encrypt other data in the database
> cluster to satisfy the encrypting-everything requirement. I guess that
> it's easier to incrementally add encryption target objects than to
> make encryption more fine grained without changing the target objects.
> 
> FWIW I'm writing a draft patch of per-tablespace TDE and will submit
> it this month. We can discuss the complexity of the proposed TDE
> further using it.

+1

Looking forward to it.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



Masahiko Sawada <sawada.mshk@gmail.com> wrote:

> Cluster-wide TDE eventually encrypts SLRU data and all WAL, including
> WAL not related to user data, while table/tablespace TDE doesn't
> unless we develop such functionality. In addition, cluster-wide TDE
> also encrypts system catalogs, whereas with table/tablespace TDE the
> user would be able to control that somewhat. That is, if we developed
> cluster-wide TDE first, then when we develop table/tablespace TDE on
> top of it we would need to change TDE so that table/tablespace TDE can
> encrypt even non-user data while retaining its simple user interface,
> which I'm concerned would rather make the feature complex.

Isn't this only a problem for pg_upgrade? If the whole instance
(including the catalog) is encrypted and the user wants to adopt
table/tablespace TDE, then pg_upgrade can simply decrypt the catalog,
plus the tables/tablespaces which should no longer be encrypted.
Conversely, if only some tables/tablespaces are encrypted and the user
wants to encrypt the whole cluster, pg_upgrade will encrypt the
non-encrypted files.

> We can support them as different TDE features but I'm not sure it's a good
> choice for users.

IMO it does not matter which approach (cluster vs table/tablespace) is
implemented first. What matters is to design the user interface so that
adding the other of the two features later does not confuse users.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com



Antonin Houska <ah@cybertec.at> wrote:

> Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> 
> > Cluster-wide TDE eventually encrypts SLRU data and all WAL, including
> > WAL not related to user data, while table/tablespace TDE doesn't
> > unless we develop such functionality. In addition, cluster-wide TDE
> > also encrypts system catalogs, whereas with table/tablespace TDE the
> > user would be able to control that somewhat. That is, if we developed
> > cluster-wide TDE first, then when we develop table/tablespace TDE on
> > top of it we would need to change TDE so that table/tablespace TDE can
> > encrypt even non-user data while retaining its simple user interface,
> > which I'm concerned would rather make the feature complex.
> 
> Isn't this only a problem for pg_upgrade?

Sorry, this is not a use case for pg_upgrade. Rather it's about a separate
encryption/decryption utility.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com



On Mon, Jun 17, 2019 at 08:39:27AM -0400, Joe Conway wrote:
>On 6/17/19 8:29 AM, Masahiko Sawada wrote:
>> From a cryptographic perspective, I think the fine-grained TDE would
>> be the better solution. Therefore, if we eventually want fine-grained
>> TDE, I wonder if it might be better to develop the table/tablespace TDE
>> first while keeping it as simple as possible in v1, and then we can
>> provide the functionality to encrypt other data in the database
>> cluster to satisfy the encrypting-everything requirement. I guess that
>> it's easier to incrementally add encryption target objects than to
>> make encryption more fine grained without changing the target objects.
>>
>> FWIW I'm writing a draft patch of per-tablespace TDE and will submit
>> it this month. We can discuss the complexity of the proposed TDE
>> further using it.
>
>+1
>
>Looking forward to it.
>

Yep. In particular, I'm interested in those aspects:

(1) What's the proposed minimum viable product, and how do we expect to
extend it with the more elaborate features? I don't expect a perfect
specification, but we should have some idea so that we don't paint
ourselves into a corner.
(2) How does it affect recovery, backups and replication (both physical
and logical)? That is, which other parts need to know the encryption keys
to function properly?

(3) What does it mean for external tools (pg_waldump, pg_upgrade,
pg_rewind etc.)? 


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




On Mon, Jun 17, 2019 at 08:29:02AM -0400, Joe Conway wrote:
>On 6/17/19 8:12 AM, Stephen Frost wrote:
>>> But there's about 0% chance we'll get that in v1, of course, so we need
>>> a "minimum viable product" to build on anyway.
>>
>> There seems like a whole lot of space between something very elaborate
>> and only supporting one key.
>
>I think this is exactly the point -- IMHO one key per tablespace is a
>nice and very sensible compromise. I can imagine all kinds of more
>complex things that would be "nice to have", but this gets us most of
>the flexibility needed with minimal additional complexity.
>

Not sure.

I think it's clear the main challenge is encryption of shared resources,
particularly WAL (when each WAL record gets encrypted with the same key as
the object). Considering the importance of WAL, that complicates all sorts
of stuff (recovery, replication, ...).

Encrypting WAL with a single key would be way easier, because we could
(probably) just encrypt whole WAL pages. That may not be appropriate for
some advanced use cases, of course. It would work when used as a db-level
replacement for FDE, which I think was the primary motivation for TDE.
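
For instance, with a single key the unit of encryption could simply be
the fixed-size WAL page. A minimal sketch under that assumption
(hypothetical names; error handling omitted):

    #include <openssl/evp.h>

    #define XLOG_BLCKSZ 8192            /* PostgreSQL's default WAL page size */

    /* Encrypt one whole WAL page with AES-128-CBC. The page size is a
     * multiple of the 16-byte AES block, so padding can be disabled and
     * no per-record bookkeeping is needed. The IV must be unique per
     * page, e.g. derived from (timeline, segment, page offset). */
    static void
    encrypt_wal_page(const unsigned char key[16], const unsigned char iv[16],
                     const unsigned char *page, unsigned char *out)
    {
        EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
        int         len;

        EVP_EncryptInit_ex(ctx, EVP_aes_128_cbc(), NULL, key, iv);
        EVP_CIPHER_CTX_set_padding(ctx, 0);
        EVP_EncryptUpdate(ctx, out, &len, page, XLOG_BLCKSZ);
        EVP_EncryptFinal_ex(ctx, out + len, &len);
        EVP_CIPHER_CTX_free(ctx);
    }
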

In any case, if we end up with a more complex/advanced design, I've
already voiced my opinion that binding the keys to tablespaces is the
wrong abstraction, and I think we'll regret it eventually. For example,
why have we invented publications instead of using tablespaces?


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Greetings,

* Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:
> In any case, if we end up with a more complex/advanced design, I've
> already voiced my opinion that binding the keys to tablespaces is the
> wrong abstraction, and I think we'll regret it eventually. For example,
> why have we invented publications instead of using tablespaces?

I would certainly hope that we don't stop at tablespaces, they just seem
like a much simpler piece to bite off than going to table-level
right off, and they make sense for some environments where there's a
relatively small number of levels of separation, which are already being
segregated into different filesystems (or at least directories) for the
same reason that you want different encryption keys.

Thanks,

Stephen

On Mon, Jun 17, 2019 at 10:33:11AM -0400, Stephen Frost wrote:
>Greetings,
>
>* Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:
>> In any case, if we end up with a more complex/advanced design, I've
>> already voiced my opinion that binding the keys to tablespaces is the
>> wrong abstraction, and I think we'll regret it eventually. For example,
>> why have we invented publications instead of using tablespaces?
>
>I would certainly hope that we don't stop at tablespaces, they just seem
>like a much simpler piece to bite off than going to table-level
>right off, and they make sense for some environments where there's a
>relatively small number of levels of separation, which are already being
>segregated into different filesystems (or at least directories) for the
>same reason that you want different encryption keys.
>

Why not use the right abstraction from the beginning? I already
mentioned publications, which I think we can use as an inspiration. So
it's not like this would be a major design task, IMHO.

In my experience it's pretty difficult to change abstractions the design
is based on, not just because it tends to be invasive implementation-wise,
but also because users get used to it.

FWIW this is one of the reasons why I advocate for v1 not to allow this,
because it's much easier to extend the design

    single group -> multiple groups

compared to

    one way to group objects -> different way to group objects



regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Masahiko Sawada <sawada.mshk@gmail.com> wrote on Mon, Jun 17, 2019 at 8:30 PM:
On Fri, Jun 14, 2019 at 7:41 AM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> I personally find the idea of encrypting tablespaces rather strange.
> Tablespaces are meant to define physical locations for objects, but
> this would also use them to "mark" objects as encrypted or not. That
> just seems misguided and would make life harder for many users.
>
> For example, what if I don't have any tablespaces (except for the
> default one), but I want to encrypt only some objects? Suddenly I have
> to create a tablespace, which will however cause various difficulties
> down the road (during pg_basebackup, etc.).

I guess that we can have an encrypted tablespace by default (e.g.
pg_default_enc). Or we could encrypt per table while having encryption
keys per tablespace.
 
Hi Sawada-san,
I do agree with it. 

On Mon, Jun 17, 2019 at 6:54 AM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
>
> On Sun, Jun 16, 2019 at 02:10:23PM -0400, Stephen Frost wrote:
> >Greetings,
> >
> >* Joe Conway (mail@joeconway.com) wrote:
> >> On 6/16/19 9:45 AM, Bruce Momjian wrote:
> >> > On Sun, Jun 16, 2019 at 07:07:20AM -0400, Joe Conway wrote:
> >> >> In any case it doesn't address my first point, which is limiting the
> >> >> volume encrypted with the same key. Another valid reason is you might
> >> >> have data at varying sensitivity levels and prefer different keys be
> >> >> used for each level.
> >> >
> >> > That seems quite complex.
> >>
> >> How? It is no more complex than encrypting at the tablespace level
> >> already gives you - in that case you get this property for free if you
> >> care to use it.
> >
> >Perhaps not surprising, but I'm definitely in agreement with Joe
> >regarding having multiple keys when possible and (reasonably)
> >straight-forward to do so.  I also don't buy off on the OpenSSL
> >argument; their more severe issues certainly haven't been due to key
> >management issues such as what we're discussing here, so I don't think
> >the argument applies.
> >
>
> I'm not sure what exactly is the "OpenSSL argument" you're disagreeing
> with? IMHO Bruce is quite right that the risk of vulnerabilities grows
> with the complexity of the system (both due to implementation bugs and
> general design weaknesses). I don't think it's tied to the key
> management specifically, except that it's one of the parts that may
> contribute to the complexity.
>
> (It's often claimed that key management is one of the weakest points of
> current crypto systems - we have safe (a)symmetric algorithms, but safe
> handling of keys is an issue. I don't have data / papers supporting this
> claim, I kinda believe it.)
>
> Now, I'm not opposed to eventually implementing something more
> elaborate, but I also think just encrypting the whole cluster (not
> necessarily with a single key, but with one master key) would be enough
> for vast majority of users. Plus it's less error prone and easier to
> operate (backups, replicas, crash recovery, ...).
>
> But there's about 0% chance we'll get that in v1, of course, so we need
> s "minimum viable product" to build on anyway.
>

I agree that we need a minimum viable product first. But I'm not sure
it's true that implementing cluster-wide TDE first is the right first
step toward per-tablespace/table TDE.

Yes, we could complete the per-tablespace/table TDE in version 13.
And we could do cluster-wide TDE in the next version.
But I remember you said there are so many keys to manage at the table level.
Will we add the table-level TDE in the first version?

And I have two questions.
1. Will we add hooks to support replacing the encryption algorithms?
2. Will we implement encryption algorithms ourselves or use ones from existing libraries?

Regards, 

--
Shawn Wang
HIGHGO SOFTWARE 


The purposes of cluster-wide TDE and table/tablespace TDE are slightly
different in terms of encryption target objects. Cluster-wide TDE
would be a good solution for users who want to encrypt everything,
while table/tablespace TDE would help more severe use cases in
terms of both security and performance.

Cluster-wide TDE eventually encrypts SLRU data and all WAL,
including WAL not related to user data, while table/tablespace TDE
doesn't unless we develop such functionality. In addition, cluster-wide
TDE also encrypts system catalogs, but with table/tablespace TDE the
user would be able to control that somewhat. That is, if we developed
cluster-wide TDE first, then when we develop table/tablespace TDE on
top of it we would need to change TDE so that table/tablespace TDE can
encrypt even non-user data while retaining its simple user interface,
which I'm concerned would rather make the feature complex. We could
support them as different TDE features, but I'm not sure that's a good
choice for users.

From a cryptographic perspective, I think fine-grained TDE would be
the better solution. Therefore, if we eventually want fine-grained
TDE, I wonder if it might be better to develop table/tablespace TDE
first while keeping it as simple as possible in v1, and then provide
the functionality to encrypt other data in the database cluster to
satisfy the encrypting-everything requirement. I guess it's easier to
incrementally add encryption target objects than to make the feature
fine-grained while keeping the encryption target objects unchanged.

FWIW I'm writing a draft patch of per-tablespace TDE and will submit
it this month. We can then discuss the complexity of the proposed
TDE more concretely using it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


On Tue, Jun 18, 2019 at 5:07 PM shawn wang <shawn.wang.pg@gmail.com> wrote:
>
> Masahiko Sawada <sawada.mshk@gmail.com> wrote on Mon, Jun 17, 2019 at 8:30 PM:
>>
>> On Fri, Jun 14, 2019 at 7:41 AM Tomas Vondra
>> <tomas.vondra@2ndquadrant.com> wrote:
>> > I personally find the idea of encrypting tablespaces rather strange.
>> > Tablespaces are meant to define the physical location of objects, but this
>> > would also use them to "mark" objects as encrypted or not. That just
>> > seems misguided and would make life harder for many users.
>> >
>> > For example, what if I don't have any tablespaces (except for the
>> > default one), but I want to encrypt only some objects? Suddenly I have
>> > to create a tablespace, which will however cause various difficulties
>> > down the road (during pg_basebackup, etc.).
>>
>> I guess that we can have an encrypted tablespace by default (e.g.
>> pg_default_enc). Or we encrypt per table while having encryption keys
>> per tablespace.
>
>
> Hi Sawada-san,
> I do agree with it.
>>
>>
>> On Mon, Jun 17, 2019 at 6:54 AM Tomas Vondra
>> <tomas.vondra@2ndquadrant.com> wrote:
>> >
>> > On Sun, Jun 16, 2019 at 02:10:23PM -0400, Stephen Frost wrote:
>> > >Greetings,
>> > >
>> > >* Joe Conway (mail@joeconway.com) wrote:
>> > >> On 6/16/19 9:45 AM, Bruce Momjian wrote:
>> > >> > On Sun, Jun 16, 2019 at 07:07:20AM -0400, Joe Conway wrote:
>> > >> >> In any case it doesn't address my first point, which is limiting the
>> > >> >> volume encrypted with the same key. Another valid reason is you might
>> > >> >> have data at varying sensitivity levels and prefer different keys be
>> > >> >> used for each level.
>> > >> >
>> > >> > That seems quite complex.
>> > >>
>> > >> How? It is no more complex than encrypting at the tablespace level
>> > >> already gives you - in that case you get this property for free if you
>> > >> care to use it.
>> > >
>> > >Perhaps not surprising, but I'm definitely in agreement with Joe
>> > >regarding having multiple keys when possible and (reasonably)
>> > >straight-forward to do so.  I also don't buy off on the OpenSSL
>> > >argument; their more severe issues certainly haven't been due to key
>> > >management issues such as what we're discussing here, so I don't think
>> > >the argument applies.
>> > >
>> >
>> > I'm not sure what exactly is the "OpenSSL argument" you're disagreeing
>> > with? IMHO Bruce is quite right that the risk of vulnerabilities grows
>> > with the complexity of the system (both due to implementation bugs and
>> > general design weaknesses). I don't think it's tied to the key
>> > management specifically, except that it's one of the parts that may
>> > contribute to the complexity.
>> >
>> > (It's often claimed that key management is one of the weakest points of
>> > current crypto systems - we have safe (a)symmetric algorithms, but safe
>> > handling of keys is an issue. I don't have data / papers supporting this
>> > claim, I kinda believe it.)
>> >
>> > Now, I'm not opposed to eventually implementing something more
>> > elaborate, but I also think just encrypting the whole cluster (not
>> > necessarily with a single key, but with one master key) would be enough
>> > for vast majority of users. Plus it's less error prone and easier to
>> > operate (backups, replicas, crash recovery, ...).
>> >
>> > But there's about 0% chance we'll get that in v1, of course, so we need
>> > s "minimum viable product" to build on anyway.
>> >
>>
>> I agree that we need a minimum viable product first. But I'm not sure
>> it's true that implementing cluster-wide TDE first is the right first
>> step toward per-tablespace/table TDE.
>
>
> Yes, we could complete the per-tablespace/table TDE in version 13.
> And we could do cluster-wide TDE in the next version.
> But I remember you said there are so many keys to manage at the table level.

I think even if we provide per-table encryption we can have
encryption keys per tablespace. That is, all tables in the same
tablespace use the same encryption key, but the user can
control which objects are encrypted per table.

> Will we add the table-level TDE in the first version?

I hope so, but it's under discussion now.

> And I have two questions.
> 1. Will we add hooks to support replacing the encryption algorithms?
> 2. Will we implement encryption algorithms ourselves or use ones from existing libraries?

Currently the WIP patch uses openssl and supports only AES-256, and I
don't plan to add such extensibility for now. But it might be
a good idea in the future. I think it would not be hard to support the
symmetric encryption algorithms supported by openssl, but would you
like to support other encryption algorithms?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



On 6/20/19 8:34 AM, Masahiko Sawada wrote:
> I think even if we provide per-table encryption we can have
> encryption keys per tablespace. That is, all tables in the same
> tablespace use the same encryption key, but the user can
> control which objects are encrypted per table.
> 
>> Will we add the table-level TDE in the first version?
> 
> I hope so, but it's under discussion now.

+1

>> And I have two questions.
>> 1. Will we add hooks to support replacing the encryption algorithms?
>> 2. Will we implement encryption algorithms ourselves or use ones from existing libraries?
> 
> Currently the WIP patch uses openssl and supports only AES-256, and I
> don't plan to add such extensibility for now. But it might be
> a good idea in the future. I think it would not be hard to support the
> symmetric encryption algorithms supported by openssl, but would you
> like to support other encryption algorithms?

Supporting other symmetric encryption algorithms would be nice but I
don't think that is required for the first version. It would also be
nice but not initially required to support different encryption
libraries. The implementation should be written with both of these
eventualities in mind though IMHO.

I would like to see strategically placed hooks in the key management so
that an extension could, for example, layer another key in between the
master key and the per-tablespace key. This would allow extensions to
provide additional control over when decryption is allowed.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



On Thu, Jun 20, 2019 at 10:46 PM Joe Conway <mail@joeconway.com> wrote:
>
> On 6/20/19 8:34 AM, Masahiko Sawada wrote:
> > I think even if we provide per-table encryption we can have
> > encryption keys per tablespace. That is, all tables in the same
> > tablespace use the same encryption key, but the user can
> > control which objects are encrypted per table.
> >
> >> Will we add the table-level TDE in the first version?
> >
> > I hope so, but it's under discussion now.
>
> +1
>
> >> And I have two questions.
> >> 1. Will we add hooks to support replacing the encryption algorithms?
> >> 2. Will we implement encryption algorithms ourselves or use ones from existing libraries?
> >
> > Currently the WIP patch uses openssl and supports only AES-256, and I
> > don't plan to add such extensibility for now. But it might be
> > a good idea in the future. I think it would not be hard to support the
> > symmetric encryption algorithms supported by openssl, but would you
> > like to support other encryption algorithms?
>
> Supporting other symmetric encryption algorithms would be nice but I
> don't think that is required for the first version. It would also be
> nice but not initially required to support different encryption
> libraries. The implementation should be written with both of these
> eventualities in mind though IMHO.

Agreed.

>
> I would like to see strategically placed hooks in the key management so
> that an extension could, for example, layer another key in between the
> master key and the per-tablespace key. This would allow extensions to
> provide additional control over when decryption is allowed.

Interesting.

Master key management is an important topic. In my proposal, we
provide generic key management APIs such as getkey, removekey, and
generatekey in order to manage the master key. A key management
extension could get the master key from an arbitrary external system,
so it could also layer another key in between the master key and
the per-tablespace key.

Do you have any thoughts on the master key management? That design
would be flexible but complicated. Especially, the API design would be
controversial.

One way is to enter the passphrase into Postgres to unlock the master
key stored in the database, but the passphrase could be written to the
server log if we pass it using a SQL command, which is bad. It would
require inventing another mechanism to prevent particular SQL from being
written to the server log. Another option is to enter the password or
passphrase via a command-line option, but that could require users
to write the plain passphrase or password in script files, which is
bad too.
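
One possible direction (a sketch only, not something any patch here
implements) is to have the server run an administrator-supplied command
and read the passphrase from its stdout, so the secret never appears in
SQL, in the server log, or in a script file. The helper below is
hypothetical:

    #include <stdio.h>
    #include <string.h>

    /*
     * Hypothetical sketch: obtain the passphrase from an external command's
     * stdout so it never has to appear in SQL or in a script file.
     */
    static int
    run_passphrase_command(const char *cmd, char *buf, size_t buflen)
    {
        FILE       *p = popen(cmd, "r");

        if (p == NULL)
            return -1;
        if (fgets(buf, (int) buflen, p) == NULL)
        {
            pclose(p);
            return -1;
        }
        buf[strcspn(buf, "\n")] = '\0';     /* strip trailing newline */
        return (pclose(p) == 0) ? 0 : -1;
    }

The command itself could then range from a simple file read for testing
to a call into a proper secret store.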

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



On Mon, Jun 17, 2019 at 11:02 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
>
> On Mon, Jun 17, 2019 at 08:39:27AM -0400, Joe Conway wrote:
> >On 6/17/19 8:29 AM, Masahiko Sawada wrote:
> >> From perspective of  cryptographic, I think the fine grained TDE would
> >> be better solution. Therefore if we eventually want the fine grained
> >> TDE I wonder if it might be better to develop the table/tablespace TDE
> >> first while keeping it simple as much as possible in v1, and then we
> >> can provide the functionality to encrypt other data in database
> >> cluster to satisfy the encrypting-everything requirement. I guess that
> >> it's easier to incrementally add encryption target objects rather than
> >> making it fine grained while not changing encryption target objects.
> >>
> >> FWIW I'm writing a draft patch of per tablespace TDE and will submit
> >> it in this month. We can more discuss the complexity of the proposed
> >> TDE using it.
> >
> >+1
> >
> >Looking forward to it.
> >
>
> Yep. In particular, I'm interested in those aspects:
>

Attached are the draft patch sets for per-tablespace transparent
data-at-rest encryption. The patch doesn't support full functionality
yet; it includes:

* Per-tablespace encryption
* Encryption and decryption of buffer data during disk I/O
* 2-tier key hierarchy and key rotation
* Temporary file encryption (based on the patch Antonin proposed)
* System catalog encryption
* Generic key management API and test module
* Simple TAP test

but doesn't yet include (I'm still writing these):

* WAL encryption
* Replication support
* pg_upgrade support
* Documentation
* README

and doesn't support:

* SLRU data encryption
* other system file encryption (pg_twophase, pg_subtrans, backup_label, etc.)
* Server log encryption

Before explaining the details of the patch, let me share my thoughts on
the following points.

> (1) What's the proposed minimum viable product, and how do we expect to
> extend it with the more elaborate features. I don't expect perfect
> specification, but we should have some idea so that we don't paint
> ourselves in the corner.

I think the minimum viable product should support the following features.

* Fine-grained control of encrypted objects (not using a single key for
the whole database cluster).
* Encrypting and decrypting tables (including system catalogs), indexes,
TOAST tables, WAL and temporary files during disk I/O.
* Passing either a password, passphrase or encryption key to the postgres
server without the risk of it being written to files.
* Front-end programs provided by the PostgreSQL source code working as
much as possible.
* Key rotation

I think that the following features could be added later.

* SLRU and other data encryption. I think we can use a separate
encryption key for such data.
* Support for other encryption algorithms. I don't have any idea so far,
but it should not be hard to support other symmetric-key algorithms.
* Faster key rotation. It can be done by having a 2-tier key hierarchy.
* Integration with external key management services. The patch
implements this, but I'm sure there are other ways to integrate with
external key management services.

>
> (2) How does it affect recovery, backups and replication (both physical
> and logical)? That is, which other parts need to know the encryption keys
> to function properly?

If we encrypt the whole 8kB WAL block (in the cluster-wide encryption
case) it would not be hard, because we just encrypt with a single key
before writing to disk. On the other hand, if we encrypt only some WAL
records it could be hard; it requires changes around the WAL assembly
code so that it can obtain encryption keys and encrypt WAL data before
inserting it into the WAL buffer. Since WAL is encrypted, recovery needs
to obtain all encryption keys and decrypt the encrypted WAL.

For streaming replication, since wal senders basically don't need to
know the actual contents of WAL (although xlogreader needs to know the
WAL header for validation), they send WAL data in encrypted form and
the wal receiver decrypts it. Therefore, the encryption keys must be
replicated as well. On the other hand, logical replication (and logical
decoding) needs to decrypt WAL data when decoding. Since logical
decoding is performed on the PostgreSQL server side, it's not hard to
obtain all encryption keys. It can send change sets in unencrypted
form, or even in encrypted form if we encrypt them again. We would
change the xlogreader code so that it can decrypt WAL. So I think
that logical replication will be able to get WAL data in unencrypted
form without special handling.

For backups, physical backups must remain encrypted even when taken
with pg_basebackup; otherwise we cannot protect data from a malicious
backup operator. The encryption keys must also be backed up
together with the data. Because this is data-at-rest encryption, logical
backups can be taken in unencrypted form. I think we would need nothing
special for backups.

>
> (3) What does it mean for external tools (pg_waldump, pg_upgrade,
> pg_rewind etc.)?

I think this definitely affects at least pg_waldump, pg_upgrade,
pg_checksums and pg_rewind. By changing the WAL format or giving
encryption keys to these programs we can support pg_waldump and
pg_rewind even for encrypted databases. I prefer the former because
passing encryption keys to front-end programs risks key
leakage. It would also affect external tools that read or write
database files and WAL directly. For instance pg_rman, which is a
recovery management tool, reads database files and takes backups
without the hole in each page. Such programs would need encryption
keys.

Here are the details of the patches.

Usage
======
To enable the TDE feature, please specify the --with-openssl configure
option. Also, please set the kmgr_plugin_library GUC parameter in
postgresql.conf, which specifies the library for the key management
program. The patch includes contrib/kmgr_file, a test
program for key management that stores the master key on local disk.
So for test purposes you can set kmgr_plugin_library = 'kmgr_file'.

After starting up the postgres server, you can create an encrypted
tablespace by specifying the 'encryption' option:

CREATE TABLESPACE enctblspc LOCATION '/path/to/tblsp' WITH (encryption = on);

And then the tables, indexes and TOAST tables created on the
tablespace will be encrypted at rest.

As for system catalogs, those on pg_default and global are not
encrypted. If you want to encrypt system catalogs, you need to create a
database on an encrypted tablespace; while copying the database files
from the source database we encrypt/re-encrypt each system catalog.
You can enable and disable encryption of a table by moving it
between an encrypted tablespace and a non-encrypted tablespace.

Changes
=======

* 0001-Add-encryption-module-supporting-AES-256-by-using-op.patch

This patch is mostly based on the patch Antonin proposed[1], but I
modified some contents. It adds encryption and decryption
functions using openssl. It currently supports AES-256-XTS
for buffer data encryption and AES-256-CTR for WAL encryption.
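
To make the buffer-encryption side concrete, here is a minimal sketch of
AES-256-XTS page encryption using openssl's EVP interface. This is an
illustration, not code from the patch; note that EVP_aes_256_xts() takes
a 64-byte key (two 256-bit halves) and a 16-byte tweak, and deriving the
tweak from the block's physical location is an assumption made here:

    #include <openssl/evp.h>

    #define BLCKSZ 8192

    /*
     * Sketch: encrypt one 8kB page with AES-256-XTS.  Returns 1 on
     * success, 0 on failure.  key64 is the 64-byte XTS key; the tweak
     * would typically encode the page's location (an assumption, not
     * necessarily what the patch does).
     */
    static int
    encrypt_page_xts(const unsigned char *key64,
                     const unsigned char tweak[16],
                     const unsigned char *in, unsigned char *out)
    {
        EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
        int         len;
        int         ok;

        if (ctx == NULL)
            return 0;
        ok = EVP_EncryptInit_ex(ctx, EVP_aes_256_xts(), NULL, key64, tweak)
            && EVP_EncryptUpdate(ctx, out, &len, in, BLCKSZ)
            && EVP_EncryptFinal_ex(ctx, out + len, &len);
        EVP_CIPHER_CTX_free(ctx);
        return ok;
    }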

* 0002-Add-kmgr-plugin-APIs.patch

This patch adds new generic key management APIs: startup, get,
generate, isexist and remove. Kmgr plugin programs can define these
primitive functions to manage a master key that could be located on an
external server. The plugin program is specified by the
'kmgr_plugin_library' GUC parameter and loaded when the postmaster
starts up.
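
To give a concrete picture of the shape of such an API, here is a rough
sketch of what the plugin's callback table might look like. The exact
names and signatures are assumptions for illustration, not the patch's
actual definitions:

    #include <stdbool.h>
    #include <stddef.h>

    /* Hypothetical kmgr plugin callback table (illustrative only). */
    typedef struct KmgrPluginCallbacks
    {
        /* called once when the postmaster loads the plugin */
        void        (*startup) (void);

        /* fetch the master key identified by keyid into buf */
        bool        (*getkey) (const char *keyid,
                               unsigned char *buf, size_t *keylen);

        /* ask the external KMS to generate a new master key for keyid */
        bool        (*generatekey) (const char *keyid);

        /* check whether a master key with this identifier exists */
        bool        (*isexistkey) (const char *keyid);

        /* remove a master key, e.g. after a successful rotation */
        bool        (*removekey) (const char *keyid);
    } KmgrPluginCallbacks;

A plugin such as kmgr_file would fill in this table when loaded, and the
core key management module would call through it whenever it needs the
master key.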

0003-Add-key-management-module-for-transparent-data-encry.patch

This patch adds the key management module, which is responsible for
tablespace key management. All tablespace keys are persisted to a
file on disk, called the keyring file, and loaded into a hash table in
shared memory when the postmaster starts up. The tablespace keys in
shared memory are not encrypted. Whenever an encrypted tablespace
is created or dropped, the keyring file is modified.

The master key identifier is used to look up the master key. It
consists of the system identifier and a sequence number starting from 0,
like 'pg_master_key-6707524-0000'. The sequence number is incremented
on every key rotation.

During key rotation, we generate a new master key ID in PostgreSQL core
and ask the kmgr plugin to generate a new master key identified by it.
We then update all tablespace keys in the keyring
file by re-encrypting them with the new master key.
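
As a small illustration of the identifier scheme described above (the
exact field widths are an assumption here, matched to the example
string):

    #include <stdint.h>
    #include <stdio.h>

    /* Build "pg_master_key-<sysid>-<seqno>" into dst (illustrative only). */
    static void
    build_master_key_id(char *dst, size_t len, uint64_t sysid, uint32_t seqno)
    {
        snprintf(dst, len, "pg_master_key-%llu-%04u",
                 (unsigned long long) sysid, seqno);
    }

    int
    main(void)
    {
        char        keyid[64];

        build_master_key_id(keyid, sizeof(keyid), 6707524, 0);
        printf("%s\n", keyid);  /* pg_master_key-6707524-0000 */

        /* key rotation bumps only the sequence number */
        build_master_key_id(keyid, sizeof(keyid), 6707524, 1);
        printf("%s\n", keyid);  /* pg_master_key-6707524-0001 */
        return 0;
    }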

0004-Add-facility-to-give-process-local-encryption-key.patch

This patch adds functionality to get a process-local temporary key,
which is intended for use in temporary file encryption.

0005-Encrypt-and-decrypt-data-on-encrypted-tablespace-whe.patch

This patch supports buffer encryption; it encrypts and decrypts database
data during disk I/O. It adds new smgr callbacks smgrencrypt and
smgrdecrypt, plus mdencrypt and mddecrypt, but please note that
currently the patch supports only heap and nbtree; I'm trying to
support other access methods. Basically, when bufmgr reads a buffer or
writes a buffer through shared buffers, the access methods don't need
to care about buffer encryption. However, when the access methods
themselves write a buffer directly to disk, they need to call
smgrencrypt.
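
Conceptually, the write path then looks like the sketch below: the
plaintext page in shared buffers is never modified, and only a scratch
copy is encrypted on its way to disk. The function names are
placeholders standing in for the patch's smgrencrypt/mdencrypt, and the
XOR stub is obviously not real crypto:

    #include <stdbool.h>
    #include <stdio.h>

    #define BLCKSZ 8192

    /* Stub standing in for the patch's real mdencrypt (AES-256-XTS). */
    static void
    mdencrypt_stub(const char *src, char *dst, const unsigned char *key)
    {
        for (int i = 0; i < BLCKSZ; i++)
            dst[i] = src[i] ^ key[i % 32];  /* placeholder, NOT real crypto */
    }

    /*
     * Encrypt-on-write: the shared-buffer page stays plaintext; only the
     * scratch copy written to disk is encrypted.
     */
    static void
    smgrwrite_sketch(FILE *f, const char *page,
                     const unsigned char *tblspc_key, bool encrypted)
    {
        char        encbuf[BLCKSZ];
        const char *out = page;

        if (encrypted)
        {
            mdencrypt_stub(page, encbuf, tblspc_key);
            out = encbuf;
        }
        fwrite(out, 1, BLCKSZ, f);
    }

The read path is the mirror image: decrypt into the shared buffer right
after the block is read from disk, so everything above the storage layer
sees plaintext.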

0006-Encrypt-buffile.patch

This is the patch Antonin proposed. I haven't looked at the details of
this patch yet, so I will review it.

0007-Make-Reorderbuffer-encrypt-spilled-out-file.patch

Same as above.

0008-Support-tablespace-encryption.patch

This patch adds the 'encryption' option to tablespaces.

0009-Add-kmgr-plugin-test-module-kmgr_file.patch

This patch adds a test module for the kmgr plugin. It generates a random
master key string and stores it on local disk. Since it stores
the master key without encryption, it is for test purposes only. It also
has a TAP test for TDE.

Feedback and comments are very welcome.


Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

On Sun, Jun 16, 2019 at 03:57:46PM -0400, Stephen Frost wrote:
> Greetings,
> 
> * Bruce Momjian (bruce@momjian.us) wrote:
> > On Sun, Jun 16, 2019 at 12:42:55PM -0400, Joe Conway wrote:
> > > On 6/16/19 9:45 AM, Bruce Momjian wrote:
> > > > On Sun, Jun 16, 2019 at 07:07:20AM -0400, Joe Conway wrote:
> > > >> In any case it doesn't address my first point, which is limiting the
> > > >> volume encrypted with the same key. Another valid reason is you might
> > > >> have data at varying sensitivity levels and prefer different keys be
> > > >> used for each level.
> > > > 
> > > > That seems quite complex.
> > > 
> > > How? It is no more complex than encrypting at the tablespace level
> > > already gives you - in that case you get this property for free if you
> > > care to use it.
> > 
> > All keys used to encrypt WAL data must be unlocked at all times or crash
> > recovery, PITR, and replication will stop when they hit a locked key.
> > Given that, how much value is there in allowing a key per tablespace?
> 
> There's a few different things to discuss here, admittedly, but I don't
> think it means that there's no value in having a key per tablespace.
> 
> Ideally, a given backend would only need, and only have access to, the
> keys for the tablespaces that it is allowed to operate on.  I realize
> that's a bit farther than what we're talking about today, but hopefully
> not too much to be able to consider.

What people really want with more-granular-than-cluster encryption is
the ability to supply their passphrase key _when_ they want to access
their data, and then leave and be sure their data is secure from
decryption.  That will not be possible since the WAL will be encrypted
and any replay of it will need their passphrase key to unlock it, or the
entire system will be unrecoverable.

This is a fundamental issue, and will eventually doom any more granular
encryption approach, unless we want to use the same key for all
encrypted tablespaces, create separate WALs for each tablespace, or say
recovery of some tablespaces will fail.  I doubt any of those will be
acceptable.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Sun, Jun 16, 2019 at 07:07:20AM -0400, Joe Conway wrote:
> On 6/15/19 9:28 PM, Bruce Momjian wrote:
> > There are no known non-exhaustive plaintext attacks on AES:
> > 
> >     https://crypto.stackexchange.com/questions/1512/why-is-aes-resistant-to-known-plaintext-attacks
> 
> Even that non-authoritative stackexchange thread has varying opinions.
> Surely you don't claim that limiting known plaintext as much as is
> practical is a bad idea in general.

AES is used to encrypt TLS/https, and web traffic is practically always
mostly-known plaintext.  I don't know of any cases where only part of a
webpage is encrypted by TLS to avoid encrypting known plaintext.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On 2019-Jul-05, Bruce Momjian wrote:

> What people really want with more-granular-than-cluster encryption is
> the ability to supply their passphrase key _when_ they want to access
> their data, and then leave and be sure their data is secure from
> decryption.  That will not be possible since the WAL will be encrypted
> and any replay of it will need their passphrase key to unlock it, or the
> entire system will be unrecoverable.

I'm not sure I understand why WAL replay needs the passphrase to work.
Why isn't the data saved in WAL already encrypted, which can be applied
as raw bytes to each data block, without needing to decrypt anything?
Only if somebody wants to interpret the bytes they need the passphrase,
no?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Greetings,

* Alvaro Herrera (alvherre@2ndquadrant.com) wrote:
> On 2019-Jul-05, Bruce Momjian wrote:
>
> > What people really want with more-granular-than-cluster encryption is
> > the ability to supply their passphrase key _when_ they want to access
> > their data, and then leave and be sure their data is secure from
> > decryption.  That will not be possible since the WAL will be encrypted
> > and any replay of it will need their passphrase key to unlock it, or the
> > entire system will be unrecoverable.
>
> I'm not sure I understand why WAL replay needs the passphrase to work.
> Why isn't the data saved in WAL already encrypted, which can be applied
> as raw bytes to each data block, without needing to decrypt anything?
> Only if somebody wants to interpret the bytes they need the passphrase,
> no?

I had been specifically thinking of tablespaces because we might be able
to do something exactly along these lines- keep which tablespace the
data is in directly in the WAL (and not encrypted), but then have the
data itself be encrypted, and with the key for that tablespace.

Splitting the WAL by tablespace would be even nicer, of course... :)

Thanks!

Stephen

On 2019-Jul-05, Stephen Frost wrote:

> I had been specifically thinking of tablespaces because we might be able
> to do something exactly along these lines- keep which tablespace the
> data is in directly in the WAL (and not encrypted), but then have the
> data itself be encrypted, and with the key for that tablespace.

Hmm, I was imagining that the user-level data is encrypted, while the
metadata such as the containing relfilenode is not encrypted and thus
can be read by system processes such as checkpointer or WAL-apply
without needing to decrypt anything.  Maybe I'm just lacking imagination
for an attack that uses that unencrypted metadata, though.

> Splitting the WAL by tablespace would be even nicer, of course... :)

Hmm, I think you would have to synchronize the apply anyway (i.e. not
replay in one tablespace ahead of a record in another tablespace with an
earlier LSN.)  What are you thinking are the gains of doing that, anyway?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



On Fri, Jul  5, 2019 at 03:46:28PM -0400, Alvaro Herrera wrote:
> On 2019-Jul-05, Bruce Momjian wrote:
> 
> > What people really want with more-granular-than-cluster encryption is
> > the ability to supply their passphrase key _when_ they want to access
> > their data, and then leave and be sure their data is secure from
> > decryption.  That will not be possible since the WAL will be encrypted
> > and any replay of it will need their passphrase key to unlock it, or the
> > entire system will be unrecoverable.
> 
> I'm not sure I understand why WAL replay needs the passphrase to work.
> Why isn't the data saved in WAL already encrypted, which can be applied
> as raw bytes to each data block, without needing to decrypt anything?
> Only if somebody wants to interpret the bytes they need the passphrase,
> no?

Uh, well, you have the WAL record, and you want to write it to an 8k
page.  You have to read the 8k page from disk into shared buffers, and
you have to decrypt the 8k page to do that, right?  We aren't going to
store 8k pages encrypted in shared buffers, right?

If you did want to do that, or wanted to write them to disk without
decrypting the 8k page, it still would not work since AES is a 16-byte
encryption cipher.  I don't think we can break 8k pages up into 16-byte
chunks and be sure we can just place data into those 16-byte boundaries.

Also, that assumes that we are only encrypting the WAL payload, and not
the relation oids or other header information, which I think is a
mistake because it will lead to information leakage.

You can use AES in stream cipher mode, but then the ordering of the
encryption is critical and you can't move 16-byte chunks around --- they
have to be decrypted in their encrypted order.
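
For illustration, counter (CTR) mode is the usual way to get such a
stream cipher out of AES: each 16-byte counter block is encrypted and
XORed with the data, so the keystream depends on the position, which is
exactly why ordering matters. A minimal openssl sketch (an illustration,
not code from any patch here):

    #include <openssl/evp.h>

    /*
     * Sketch: AES-256-CTR encryption of an arbitrary-length buffer.
     * key is 32 bytes; ivec is the 16-byte initial counter block.
     * Decryption is the identical call with the same counter, which is
     * why data must be processed at the same position it was encrypted
     * at.
     */
    static int
    encrypt_ctr(const unsigned char key[32], const unsigned char ivec[16],
                const unsigned char *in, unsigned char *out, int inlen)
    {
        EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
        int         len;
        int         ok;

        if (ctx == NULL)
            return 0;
        ok = EVP_EncryptInit_ex(ctx, EVP_aes_256_ctr(), NULL, key, ivec)
            && EVP_EncryptUpdate(ctx, out, &len, in, inlen)
            && EVP_EncryptFinal_ex(ctx, out + len, &len);
        EVP_CIPHER_CTX_free(ctx);
        return ok;
    }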

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Fri, Jul 05, 2019 at 03:38:28PM -0400, Bruce Momjian wrote:
>On Sun, Jun 16, 2019 at 03:57:46PM -0400, Stephen Frost wrote:
>> Greetings,
>>
>> * Bruce Momjian (bruce@momjian.us) wrote:
>> > On Sun, Jun 16, 2019 at 12:42:55PM -0400, Joe Conway wrote:
>> > > On 6/16/19 9:45 AM, Bruce Momjian wrote:
>> > > > On Sun, Jun 16, 2019 at 07:07:20AM -0400, Joe Conway wrote:
>> > > >> In any case it doesn't address my first point, which is limiting the
>> > > >> volume encrypted with the same key. Another valid reason is you might
>> > > >> have data at varying sensitivity levels and prefer different keys be
>> > > >> used for each level.
>> > > >
>> > > > That seems quite complex.
>> > >
>> > > How? It is no more complex than encrypting at the tablespace level
>> > > already gives you - in that case you get this property for free if you
>> > > care to use it.
>> >
>> > All keys used to encrypt WAL data must be unlocked at all times or crash
>> > recovery, PITR, and replication will stop when they hit a locked key.
>> > Given that, how much value is there in allowing a key per tablespace?
>>
>> There's a few different things to discuss here, admittedly, but I don't
>> think it means that there's no value in having a key per tablespace.
>>
>> Ideally, a given backend would only need, and only have access to, the
>> keys for the tablespaces that it is allowed to operate on.  I realize
>> that's a bit farther than what we're talking about today, but hopefully
>> not too much to be able to consider.
>
>What people really want with more-granular-than-cluster encryption is
>the ability to supply their passphrase key _when_ they want to access
>their data, and then leave and be sure their data is secure from
>decryption.  That will not be possible since the WAL will be encrypted
>and any replay of it will need their passphrase key to unlock it, or the
>entire system will be unrecoverable.
>
>This is a fundamental issue, and will eventually doom any more granular
>encryption approach, unless we want to use the same key for all
>encrypted tablespaces, create separate WALs for each tablespace, or say
>recovery of some tablespaces will fail.  I doubt any of those will be
>acceptable.
>

I agree this is a pretty crucial challenge, and those requirements seem
in direct conflict. Users use encryption to protect privacy of the data,
but we need access to some of the data to implement some of the
important tasks of an RDBMS.

And it's not just about things like recovery or replication. How do you
do ANALYZE on encrypted data? Sure, if a user runs it in a session that
has the right key, that's fine. But what about autovacuum/autoanalyze?

I suspect the issue here is that we're trying to retrofit a solution for
data-at-rest encryption to something that seems closer to protecting
data during execution.

Which is a worthwhile goal, of course, but perhaps we're trying to use
the wrong tool to achieve it? To paraphrase the hammer/nail saying "If
all you know is a block encryption, everything looks like a block."


What if the granular encryption (not the "whole cluster with a single
key") case does not encrypt whole blocks, but just tuple data? Would
that allow at least the most critical WAL use cases (recovery, physical
replication) to work without having to know all the encryption keys?

Of course, that would be much less efficient compared to plain block
encryption, but that may be the "natural cost" of the feature.

It would not solve e.g. logical replication or ANALYZE, which both
require access to the plaintext data, though.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



On 2019-Jul-05, Bruce Momjian wrote:

> Uh, well, you have the WAL record, and you want to write it to an 8k
> page.  You have to read the 8k page from disk into shared buffers, and
> you have to decrypt the 8k page to do that, right?  We aren't going to
> store 8k pages encrypted in shared buffers, right?

Oh, is that the idea?  I was kinda assuming that the data was kept
as-stored in shared buffers, ie. it would be decrypted on access, not on
read from disk.  The system seems very prone to leakage if you have it
decrypted in shared memory.

If you keep it unencrypted in shared_buffers, you'd require WAL replay
and checkpointer to have every single key in the system.  That sounds
nightmarish -- a single user can create a table, hide the key and block
WAL replay and checkpointing for the whole system.

I haven't read the whole thread, sorry about that -- maybe these points
have already been discussed.

> Also, that assumes that we are only encrypting the WAL payload, and not
> the relation oids or other header information, which I think is a
> mistake because it will lead to information leakage.

Well, that was part of my question.  Why do you care to encrypt metadata
such as the relation OID (or really relfilenode)?  Yes, I was assuming
that only the WAL payload is encrypted.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



On Fri, Jul  5, 2019 at 04:24:54PM -0400, Alvaro Herrera wrote:
> On 2019-Jul-05, Bruce Momjian wrote:
> 
> > Uh, well, you have the WAL record, and you want to write it to an 8k
> > page.  You have to read the 8k page from disk into shared buffers, and
> > you have to decrypt the 8k page to do that, right?  We aren't going to
> > store 8k pages encrypted in shared buffers, right?
> 
> Oh, is that the idea?  I was kinda assuming that the data was kept
> as-stored in shared buffers, ie. it would be decrypted on access, not on
> read from disk.  The system seems very prone to leakage if you have it
> decrypted in shared memory.

Well, the overhead of decrypting on every access will make the slowdown
huge, and I don't know what security value that would have.  I am not
sure what security value TDE itself has, but I think encrypting shared
buffer contents has even less.

> If you keep it unencrypted in shared_buffers, you'd require WAL replay
> and checkpointer to have every single key in the system.  That sounds
> nightmarish -- a single user can create a table, hide the key and block
> WAL replay and checkpointing for the whole system.

Yep, bingo!

> I haven't read the whole thread, sorry about that -- maybe these points
> have already been discussed.
> 
> > Also, that assumes that we are only encrypting the WAL payload, and not
> > the relation oids or other header information, which I think is a
> > mistake because it will lead to information leakage.
> 
> Well, that was part of my question.  Why do you care to encrypt metadata
> such as the relation OID (or really relfilenode)?  Yes, I was assuming
> that only the WAL payload is encrypted.

Well, you would need to decide what WAL information needs to be secured.
Is the fact an insert was performed on a table a security issue? 
Depends on your risks.  My point is that almost anything you do beyond
cluster-level encryption either adds complexity that is bug-prone or
fragile, or adds unacceptable overhead, or leaks security information.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Fri, Jul  5, 2019 at 05:00:42PM -0400, Bruce Momjian wrote:
> On Fri, Jul  5, 2019 at 04:24:54PM -0400, Alvaro Herrera wrote:
> > On 2019-Jul-05, Bruce Momjian wrote:
> > 
> > > Uh, well, you have the WAL record, and you want to write it to an 8k
> > > page.  You have to read the 8k page from disk into shared buffers, and
> > > you have to decrypt the 8k page to do that, right?  We aren't going to
> > > store 8k pages encrypted in shared buffers, right?
> > 
> > Oh, is that the idea?  I was kinda assuming that the data was kept
> > as-stored in shared buffers, ie. it would be decrypted on access, not on
> > read from disk.  The system seems very prone to leakage if you have it
> > decrypted in shared memory.
> 
> Well, the overhead of decrypting on every access will make the slowdown
> huge, and I don't know what security value that would have.  I am not
> sure what security value TDE itself has, but I think encrypting shared
> buffer contents has even less.

Sorry for the delay --- here is some benchmark info:

    https://www.postgresql.org/message-id/4723a402-b14f-4994-2de9-d85b55a56b7f%40cybertec.at

    as far as benchmarking is concerned: i did a quick test yesterday (not
    with the final AES implementation yet) and i got pretty good results.
    with a reasonably well cached database in a typical application I expect
    to lose around 10-20%. if everything fits in memory there is 0 loss of
    course. the worst I got with the standard AES (no hardware support used
    yet) I lost around 45% or so. but this requires a value as low as 32 MB
    of shared buffers or so.

Notice the 0% overhead if everything fits in RAM, meaning it is not
decrypting on RAM access.  If it is 10-20% for a "reasonably well cached
database", I am sure it will be 10x that for decrypting on RAM access.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +




On Fri, Jul  5, 2019 at 05:00:42PM -0400, Bruce Momjian wrote:
> On Fri, Jul  5, 2019 at 04:24:54PM -0400, Alvaro Herrera wrote:
> > On 2019-Jul-05, Bruce Momjian wrote:
> > 
> > > Uh, well, you have the WAL record, and you want to write it to an 8k
> > > page.  You have to read the 8k page from disk into shared buffers, and
> > > you have to decrypt the 8k page to do that, right?  We aren't going to
> > > store 8k pages encrypted in shared buffers, right?
> > 
> > Oh, is that the idea?  I was kinda assuming that the data was kept
> > as-stored in shared buffers, ie. it would be decrypted on access, not on
> > read from disk.  The system seems very prone to leakage if you have it
> > decrypted in shared memory.
> 
> Well, the overhead of decrypting on every access will make the slowdown
> huge, and I don't know what security value that would have.  I am not
> sure what security value TDE itself has, but I think encrypting shared
> buffer contents has even less.

Sorry I didn't answer your question directly.  Since the shared buffers
are in memory, if the decryption key is also unlocked in memory, there
isn't much value to encrypting shared buffers, and the overhead would be
huge.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Fri, Jul  5, 2019 at 04:10:04PM -0400, Bruce Momjian wrote:
> On Fri, Jul  5, 2019 at 03:46:28PM -0400, Alvaro Herrera wrote:
> > On 2019-Jul-05, Bruce Momjian wrote:
> > 
> > > What people really want with more-granular-than-cluster encryption is
> > > the ability to supply their passphrase key _when_ they want to access
> > > their data, and then leave and be sure their data is secure from
> > > decryption.  That will not be possible since the WAL will be encrypted
> > > and any replay of it will need their passphrase key to unlock it, or the
> > > entire system will be unrecoverable.
> > 
> > I'm not sure I understand why WAL replay needs the passphrase to work.
> > Why isn't the data saved in WAL already encrypted, which can be applied
> > as raw bytes to each data block, without needing to decrypt anything?
> > Only if somebody wants to interpret the bytes they need the passphrase,
> > no?
> 
> Uh, well, you have the WAL record, and you want to write it to an 8k
> page.  You have to read the 8k page from disk into shared buffers, and
> you have to decrypt the 8k page to do that, right?  We aren't going to
> store 8k pages encrypted in shared buffers, right?
> 
> If you did want to do that, or wanted to write them to disk without
> decrypting the 8k page, it still would not work since AES is a block
> cipher with a 16-byte block size.  I don't think we can break 8k pages up
> into 16-byte chunks and be sure we can just place data into those 16-byte
> boundaries.

I am not sure I was clear in describing this.  If you want to copy data
directly from WAL to the 8k pages, you have to use either block mode or
streaming mode for both 8k pages and WAL.  If you use block mode, then
changing any data on the pages will change all encrypted storage in the
same pages after the change, and computing WAL will require 16-byte
boundaries.  If you use streaming mode, you have to compute the proper
stream at the point in the 8k pages where you are changing the row.  My
point is that in neither case can you just encrypt the row for WAL
and assume it can be placed in an 8k page.  Neither option seems
desirable.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On 2019-Jul-05, Bruce Momjian wrote:

> On Fri, Jul  5, 2019 at 05:00:42PM -0400, Bruce Momjian wrote:
> > On Fri, Jul  5, 2019 at 04:24:54PM -0400, Alvaro Herrera wrote:

> > > Oh, is that the idea?  I was kinda assuming that the data was kept
> > > as-stored in shared buffers, ie. it would be decrypted on access, not on
> > > read from disk.  The system seems very prone to leakage if you have it
> > > decrypted in shared memory.
> > 
> > Well, the overhead of decrypting on every access will make the slowdown
> > huge, and I don't know what security value that would have.  I am not
> > sure what security value TDE itself has, but I think encrypting shared
> > buffer contents has even less.
> 
> Sorry I didn't answer your question directly.  Since the shared buffers
> are in memory, if the decryption key is also unlocked in memory, there
> isn't much value to encrypting shared buffers, and the overhead would be
> huge.

Oh, I get your point now.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



On Fri, Jul  5, 2019 at 10:24:39PM +0200, Tomas Vondra wrote:
> I agree this is a pretty crucial challenge, and those requirements seem
> in direct conflict. Users use encryption to protect privacy of the data,
> but we need access to some of the data to implement some of the
> important tasks of an RDBMS.
> 
> And it's not just about things like recovery or replication. How do you
> do ANALYZE on encrypted data? Sure, if a user runs it in a session that
> has the right key, that's fine. But what about autovacuum/autoanalyze?

There might be a way to defer ANALYZE and autovacuum/autoanalyze, but
what about VACUUM FREEZE?  We can defer that too, but not the clog
truncation that is eventually the product of the freeze.

What about referential integrity constraints that need to check primary
keys in the encrypted tables?  I also don't see a way of delaying that,
and if you can't do referential integrity checks into the encrypted
tables, doesn't it reduce the value of having encrypted data in the same
database rather than in another database or cluster?

I still feel we have not clearly described what the options are:

1.  Encrypt everything

2.  Encrypt only some tables (for performance reasons), and use only one
key, or use multiple keys to allow for key rotation.  All keys are
always unlocked.

3.  Encrypt only some tables with different keys, and the keys are not
always unlocked.

As Tomas already stated, using tablespaces to distinguish encrypted from
non-encrypted tables doesn't make sense since the file system used for
the storage is immaterial to the encryption status. An easier way would
be to just add a bit to WAL that would indicate if the rest of the WAL
record is encrypted, though I doubt the performance boost is worth the
complexity.

I see the attraction of #3, but operationally it is unclear how we can
decouple data that is not always accessible, as outlined above.  We
could probably work around the WAL issues, but it is going to be much
more overhead.  It is also unclear how the user supplies the keys ---
are they supplied at boot time, and if so, how are later keys unlocked, or
does the client provide them?  If the client provides them, isn't it better
to do client-side encryption, or have the client use pgcrypto with some
key management around it like pgcryptokey?  This presentation shows how
to use triggers to implement transparent encryption at the column level:

    https://momjian.us/main/writings/crypto_hw_use.pdf#page=77

Structurally, I understand the desire to push key control out to users
in #3, but it then becomes very complex to construct a system where the
data is tightly coupled in a Postgres cluster.  The pgcrypto method
above works because it decouples row control, like xmin/xmax and WAL
replay, which is not encrypted, from the row payload, which is
encrypted.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On 2019-07-05 22:24, Tomas Vondra wrote:
> What if the granular encryption (not the "whole cluster with a single
> key") case does not encrypt whole blocks, but just tuple data? Would
> that allow at least the most critical WAL use cases (recovery, physical
> replication) to work without having to know all the encryption keys?

Finding the exact point where you divide up sensitive and non-sensitive
data would be difficult.

For example, say, you encrypt the tuple payload but not the tuple
header, so that vacuum would still work.  Then, someone who has access
to the raw data directory could infer in combination with commit
timestamps for example, that on Friday between 5pm and 6pm, 10000
records were updated, 500 were inserted, and 200 were deleted, and that
table has about this size, and this happens every Friday, and so on.
That seems way too much information to reveal for an allegedly encrypted
data directory.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



On Sun, Jul 7, 2019 at 1:05 AM Bruce Momjian <bruce@momjian.us> wrote:
>
> On Fri, Jul  5, 2019 at 10:24:39PM +0200, Tomas Vondra wrote:
> > I agree this is a pretty crucial challenge, and those requirements seem
> > in direct conflict. Users use encryption to protect privacy of the data,
> > but we need access to some of the data to implement some of the
> > important tasks of an RDBMS.
> >
> > And it's not just about things like recovery or replication. How do you
> > do ANALYZE on encrypted data? Sure, if a user runs it in a session that
> > has the right key, that's fine. But what about autovacuum/autoanalyze?
>
> There might be a way to defer ANALYZE and autovacuum/autoanalyze, but
> what about VACUUM FREEZE?  We can defer that too, but not the clog
> truncation that is eventually the product of the freeze.
>
> What about referential integrity constraints that need to check primary
> keys in the encrypted tables?  I also don't see a way of delaying that,
> and if you can't do referential integrity checks into the encrypted
> tables, doesn't it reduce the value of having encrypted data in the same
> database rather than in another database or cluster?
>

I just thought that PostgreSQL's auxiliary processes such as
autovacuum, startup, checkpointer and bgwriter should always be able to
access all keys because they are already inside the database. Even
today these processes don't check any privileges when accessing
data. What security threats can we protect data from by requiring
privileges even for auxiliary processes? If this is a security problem,
isn't it also true for cluster-wide encryption? I guess that processes
which have an access privilege on the table can always get the
corresponding encryption key. And no process can access an
encryption key directly without accessing a database object.

> I still feel we have not clearly described what the options are:
>
> 1.  Encrypt everything
>
> 2.  Encrypt only some tables (for performance reasons), and use only one
> key, or use multiple keys to allow for key rotation.  All keys are
> always unlocked.
>
> 3.  Encrypt only some tables with different keys, and the keys are not
> always unlocked.
>
> As Tomas already stated, using tablespaces to distinguish encrypted from
> non-encrypted tables doesn't make sense since the file system used for
> the storage is immaterial to the encryption status. An easier way would
> be to just add a bit to WAL that would indicate if the rest of the WAL
> record is encrypted, though I doubt the performance boost is worth the
> complexity.

Okay, instead of using tablespaces we can create groups of tables
encrypted with the same key. I think one of the most
important points here is to provide a granular encryption feature and
keep the number of keys in the database cluster low, not to provide a
per-tablespace encryption feature. I'm not going to insist that it should
be per-tablespace encryption.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



On Mon, Jul  8, 2019 at 06:04:28PM +0900, Masahiko Sawada wrote:
> On Sun, Jul 7, 2019 at 1:05 AM Bruce Momjian <bruce@momjian.us> wrote:
> > What about referential integrity constraints that need to check primary
> > keys in the encrypted tables?  I also don't see a way of delaying that,
> > and if you can't do referential integrity checks into the encrypted
> > tables, doesn't it reduce the value of having encrypted data in the same
> > database rather than in another database or cluster?
> 
> I just thought that PostgreSQL's auxiliary processes such as
> autovacuum, startup, checkpointer and bgwriter should always be able to
> access all keys because they are already inside the database. Even
> today these processes don't check any privileges when accessing
> data. What security threats can we protect data from by requiring
> privileges even for auxiliary processes? If this is a security problem,
> isn't it also true for cluster-wide encryption? I guess that processes
> which have an access privilege on the table can always get the
> corresponding encryption key. And no process can access an
> encryption key directly without accessing a database object.

Well, see my list of three things that users want in an earlier email:

    https://www.postgresql.org/message-id/20190706160514.b67q4f7abcxfdahk@momjian.us

When people are asking for multiple keys (not just for key rotation),
they are asking to have multiple keys that can be supplied by users only
when they need to access the data.  Yes, the keys are always in the
database, but the feature request is that they are only unlocked when the
user needs to access the data.  Obviously, that will not work for
autovacuum when the encryption is at the block level.

If the key is always unlocked, there is questionable security value of
having multiple keys, beyond key rotation.

> > I still feel we have not clearly described what the options are:
> >
> > 1.  Encrypt everything
> >
> > 2.  Encrypt only some tables (for performance reasons), and use only one
> > key, or use multiple keys to allow for key rotation.  All keys are
> > always unlocked.
> >
> > 3.  Encrypt only some tables with different keys, and the keys are not
> > always unlocked.
> >
> > As Tomas already stated, using tablespaces to distinguish encrypted from
> > non-encrypted tables doesn't make sense since the file system used for
> > the storage is immaterial to the encryption status. An easier way would
> > be to just add a bit to WAL that would indicate if the rest of the WAL
> > record is encrypted, though I doubt the performance boost is worth the
> > complexity.
> 
> Okay, instead of using tablespaces we can create groups of tables
> encrypted with the same key. I think one of the most
> important points here is to provide a granular encryption feature and

Why is this important?  What are you trying to accomplish?

> keep the number of keys in the database cluster low, not to provide a
> per-tablespace encryption feature. I'm not going to insist that it should
> be per-tablespace encryption.

It is unclear which item you are looking for.  Which number are you
suggesting from the three listed above in the email URL?

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On 7/8/19 10:19 AM, Bruce Momjian wrote:
> When people are asking for multiple keys (not just for key rotation),
> they are asking to have multiple keys that can be supplied by users only
> when they need to access the data.  Yes, the keys are always in the
> database, but the feature request is that they are only unlocked when the
> user needs to access the data.  Obviously, that will not work for
> autovacuum when the encryption is at the block level.

> If the key is always unlocked, there is questionable security value of
> having multiple keys, beyond key rotation.

That is not true. Having multiple keys also allows you to reduce the
amount of data encrypted with a single key, which is desirable because:

1. It makes cryptanalysis more difficult
2. Puts less data at risk if someone gets "lucky" in doing brute force


Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


On Mon, Jul  8, 2019 at 11:18:01AM -0400, Joe Conway wrote:
> On 7/8/19 10:19 AM, Bruce Momjian wrote:
> > When people are asking for multiple keys (not just for key rotation),
> > they are asking to have multiple keys that can be supplied by users only
> > when they need to access the data.  Yes, the keys are always in the
> > database, but the feature request is that they are only unlocked when the
> > user needs to access the data.  Obviously, that will not work for
> > autovacuum when the encryption is at the block level.
> 
> > If the key is always unlocked, there is questionable security value of
> > having multiple keys, beyond key rotation.
> 
> That is not true. Having multiple keys also allows you to reduce the
> amount of data encrypted with a single key, which is desirable because:
> 
> 1. It makes cryptanalysis more difficult
> 2. Puts less data at risk if someone gets "lucky" in doing brute force

What systems use multiple keys like that?  I know of no website that
does that.  Your arguments seem hypothetical.  What is your goal here?

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Greetings,

* Bruce Momjian (bruce@momjian.us) wrote:
> On Mon, Jul  8, 2019 at 11:18:01AM -0400, Joe Conway wrote:
> > On 7/8/19 10:19 AM, Bruce Momjian wrote:
> > > When people are asking for multiple keys (not just for key rotation),
> > > they are asking to have multiple keys that can be supplied by users only
> > > when they need to access the data.  Yes, the keys are always in the
> > > database, but the feature request is that they are only unlocked when the
> > > user needs to access the data.  Obviously, that will not work for
> > > autovacuum when the encryption is at the block level.
> >
> > > If the key is always unlocked, there is questionable security value of
> > > having multiple keys, beyond key rotation.
> >
> > That is not true. Having multiple keys also allows you to reduce the
> > amount of data encrypted with a single key, which is desirable because:
> >
> > 1. It makes cryptanalysis more difficult
> > 2. Puts less data at risk if someone gets "lucky" in doing brute force
>
> What systems use multiple keys like that?  I know of no website that
> does that.  Your arguments seem hypothetical.  What is your goal here?

Not sure what the reference to 'website' is here, but one doesn't get
certificates for TLS/SSL usage that aren't time-bounded, and when it
comes to the actual on-the-wire encryption that's used, that's a
symmetric key that's generated on-the-fly for every connection.

Wouldn't the fact that they generate a different key for every
connection be a pretty clear indication that it's a good idea to use
multiple keys and not use the same key over and over..?

Of course, we can discuss if what websites do with over-the-wire
encryption is sensible to compare to what we want to do in PG for
data-at-rest, but then we shouldn't be talking about what websites do,
it'd make more sense to look at other data-at-rest encryption systems
and consider what they're doing.

Thanks,

Stephen

On 2019-07-08 17:47, Stephen Frost wrote:
> Of course, we can discuss if what websites do with over-the-wire
> encryption is sensible to compare to what we want to do in PG for
> data-at-rest, but then we shouldn't be talking about what websites do,
> it'd make more sense to look at other data-at-rest encryption systems
> and consider what they're doing.

So, how do encrypted file systems do it?  Are there any encrypted file
systems in general use that allow encrypting only some files or
encrypting different parts of the file system with different keys, or
any of those other granular approaches being discussed?

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



On 7/8/19 11:56 AM, Peter Eisentraut wrote:
> On 2019-07-08 17:47, Stephen Frost wrote:
>> Of course, we can discuss if what websites do with over-the-wire
>> encryption is sensible to compare to what we want to do in PG for
>> data-at-rest, but then we shouldn't be talking about what websites do,
>> it'd make more sense to look at other data-at-rest encryption systems
>> and consider what they're doing.
>
> So, how do encrypted file systems do it?  Are there any encrypted file
> systems in general use that allow encrypting only some files or
> encrypting different parts of the file system with different keys, or
> any of those other granular approaches being discussed?

Well it is fairly common, for good reason IMHO, to encrypt some mount
points and not others on a system. In my mind, and in practice to a
large extent, a postgres tablespace == a unique mount point.

There is a description here:

  https://wiki.archlinux.org/index.php/Disk_encryption

A pertinent quote:
----
After it has been derived, the master key is securely stored in memory
(e.g. in a kernel keyring), for as long as the encrypted block device or
folder is mounted.

It is usually not used for de/encrypting the disk data directly, though.
For example, in the case of stacked filesystem encryption, each file can
be automatically assigned its own encryption key. Whenever the file is
to be read/modified, this file key first needs to be decrypted using the
main key, before it can itself be used to de/encrypt the file contents:

                           ╭┈┈┈┈┈┈┈┈┈┈┈┈╮
                           ┊ master key ┊
   file on disk:           ╰┈┈┈┈┈┬┈┈┈┈┈┈╯
  ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐        │
  ╎╭───────────────────╮╎        ▼          ╭┈┈┈┈┈┈┈┈┈┈╮
  ╎│ encrypted file key│━━━━(decryption)━━━▶┊ file key ┊
  ╎╰───────────────────╯╎                   ╰┈┈┈┈┬┈┈┈┈┈╯
  ╎┌───────────────────┐╎                        ▼
  ╎│ encrypted file    │╎                    ┌┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┐
  ╎│ contents          │◀━━(de/encryption)━━▶┊ readable file ┊
  ╎└───────────────────┘╎                    ┊ contents      ┊
  └ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘                    └┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┘

In a similar manner, a separate key (e.g. one per folder) may be used
for the encryption of file names in the case of stacked filesystem
encryption.
----
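
To make the two-tier scheme concrete, here is a minimal C sketch of the
unwrap step; the names and the XOR "cipher" are placeholders for a real
key-unwrap algorithm (e.g. AES Key Wrap, RFC 3394), not actual crypto:

    /* Two-tier keying: the master key never touches file contents;
     * it only unwraps the per-file key, which encrypts the file. */
    typedef struct { unsigned char bytes[32]; } Key;

    static Key
    unwrap_file_key(const Key *master, const Key *wrapped)
    {
        Key k;

        for (int i = 0; i < 32; i++)
            k.bytes[i] = master->bytes[i] ^ wrapped->bytes[i]; /* placeholder */
        return k;
    }

The point of the indirection is that rotating the master key only means
re-wrapping the small per-file keys, not re-encrypting the file contents.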

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


On Mon, Jul  8, 2019 at 11:47:33AM -0400, Stephen Frost wrote:
> Greetings,
> 
> * Bruce Momjian (bruce@momjian.us) wrote:
> > On Mon, Jul  8, 2019 at 11:18:01AM -0400, Joe Conway wrote:
> > > On 7/8/19 10:19 AM, Bruce Momjian wrote:
> > > > When people are asking for multiple keys (not just for key rotation),
> > > > they are asking to have multiple keys that can be supplied by users only
> > > > when they need to access the data.  Yes, the keys are always in the
> > > > database, but the feature request is that they are only unlocked when the
> > > > user needs to access the data.  Obviously, that will not work for
> > > > autovacuum when the encryption is at the block level.
> > > 
> > > > If the key is always unlocked, there is questionable security value of
> > > > having multiple keys, beyond key rotation.
> > > 
> > > That is not true. Having multiple keys also allows you to reduce the
> > > amount of data encrypted with a single key, which is desirable because:
> > > 
> > > 1. It makes cryptanalysis more difficult
> > > 2. Puts less data at risk if someone gets "lucky" in doing brute force
> > 
> > What systems use multiple keys like that?  I know of no website that
> > does that.  Your arguments seem hypothetical.  What is your goal here?
> 
> Not sure what the reference to 'website' is here, but one doesn't get
> certificates for TLS/SSL usage that aren't time-bounded, and when it
> comes to the actual on-the-wire encryption that's used, that's a
> symmetric key that's generated on-the-fly for every connection.
> 
> Wouldn't the fact that they generate a different key for every
> connection be a pretty clear indication that it's a good idea to use
> multiple keys and not use the same key over and over..?
> 
> Of course, we can discuss if what websites do with over-the-wire
> encryption is sensible to compare to what we want to do in PG for
> data-at-rest, but then we shouldn't be talking about what websites do,
> it'd make more sense to look at other data-at-rest encryption systems
> and consider what they're doing.

(I talked to Joe on chat for clarity.)  In modern TLS, the certificate is
used only for authentication, and Diffie–Hellman is used for key
exchange:

    https://en.wikipedia.org/wiki/Diffie%E2%80%93Hellman_key_exchange

So, the question is whether you can pass so much data in TLS that using
the same key for the entire session is a security issue.  TLS originally
had key renegotiation, but that was removed in TLS 1.3:

    https://www.cloudinsidr.com/content/known-attack-vectors-against-tls-implementation-vulnerabilities/
    To mitigate these types of attacks, TLS 1.3 disallows renegotiation.

Of course, a database is going to process even more data so if the
amount of data encrypted is a problem, we might have a problem too in
using a single key.  This is not related to whether we use one key for
the entire cluster or multiple keys per tablespace --- the problem is
the same.  I guess we could create 1024 keys and use the bottom bits of
the block number to decide what key to use.  However, that still only
pushes the goalposts farther.
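
For illustration only, the bottom-bits idea amounts to something like the
following hypothetical C fragment (NUM_KEYS and the function name are
made up, not proposed code):

    #include <stdint.h>

    #define NUM_KEYS 1024               /* must be a power of two */

    /* Select one of NUM_KEYS pre-generated keys from the low bits of
     * the block number; no per-block key metadata has to be stored. */
    static inline unsigned
    key_index_for_block(uint32_t blkno)
    {
        return blkno & (NUM_KEYS - 1);  /* bottom 10 bits */
    }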

Anyway, I will research the reasonable data size that can be secured
with a single key via AES.  I will look at how PGP encrypts large files
too.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Mon, Jul 08, 2019 at 11:25:10AM -0400, Bruce Momjian wrote:
>On Mon, Jul  8, 2019 at 11:18:01AM -0400, Joe Conway wrote:
>> On 7/8/19 10:19 AM, Bruce Momjian wrote:
>> > When people are asking for multiple keys (not just for key rotation),
>> > they are asking to have multiple keys that can be supplied by users only
>> > when they need to access the data.  Yes, the keys are always in the
>> > database, but the feature request is that they are only unlocked when the
>> > user needs to access the data.  Obviously, that will not work for
>> > autovacuum when the encryption is at the block level.
>>
>> > If the key is always unlocked, there is questionable security value of
>> > having multiple keys, beyond key rotation.
>>
>> That is not true. Having multiple keys also allows you to reduce the
>> amount of data encrypted with a single key, which is desirable because:
>>
>> 1. It makes cryptanalysis more difficult
>> 2. Puts less data at risk if someone gets "lucky" in doing brute force
>
>What systems use multiple keys like that?  I know of no website that
>does that.  Your arguments seem hypothetical.  What is your goal here?
>

I might ask the same question about databases - which databases use an
encryption scheme where the database does not have access to the keys?

Actually, I've already asked this question before ...

The databases I'm familiar with do store all the keys in a vault that's
unlocked during startup, and then users may get keys from it (including
maintenance processes). We could still control access to those keys in
various ways (ACL or whatever), of course.

BTW how do you know this is what users want? Maybe they do, but then
again - maybe they just see it as magic and don't realize the extra
complexity (not just at the database level). In my experience users
generally want more abstract things, like "Ensure data privacy in case of
media theft" or "protection against an evil DBA".


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



Greetings,

* Bruce Momjian (bruce@momjian.us) wrote:
> On Mon, Jul  8, 2019 at 11:47:33AM -0400, Stephen Frost wrote:
> > * Bruce Momjian (bruce@momjian.us) wrote:
> > > On Mon, Jul  8, 2019 at 11:18:01AM -0400, Joe Conway wrote:
> > > > On 7/8/19 10:19 AM, Bruce Momjian wrote:
> > > > > When people are asking for multiple keys (not just for key rotation),
> > > > > they are asking to have multiple keys that can be supplied by users only
> > > > > when they need to access the data.  Yes, the keys are always in the
> > > > > database, but the feature request is that they are only unlocked when the
> > > > > user needs to access the data.  Obviously, that will not work for
> > > > > autovacuum when the encryption is at the block level.
> > > >
> > > > > If the key is always unlocked, there is questionable security value of
> > > > > having multiple keys, beyond key rotation.
> > > >
> > > > That is not true. Having multiple keys also allows you to reduce the
> > > > amount of data encrypted with a single key, which is desirable because:
> > > >
> > > > 1. It makes cryptanalysis more difficult
> > > > 2. Puts less data at risk if someone gets "lucky" in doing brute force
> > >
> > > What systems use multiple keys like that?  I know of no website that
> > > does that.  Your arguments seem hypothetical.  What is your goal here?
> >
> > Not sure what the reference to 'website' is here, but one doesn't get
> > certificates for TLS/SSL usage that aren't time-bounded, and when it
> > comes to the actual on-the-wire encryption that's used, that's a
> > symmetric key that's generated on-the-fly for every connection.
> >
> > Wouldn't the fact that they generate a different key for every
> > connection be a pretty clear indication that it's a good idea to use
> > multiple keys and not use the same key over and over..?
> >
> > Of course, we can discuss if what websites do with over-the-wire
> > encryption is sensible to compare to what we want to do in PG for
> > data-at-rest, but then we shouldn't be talking about what websites do,
> > it'd make more sense to look at other data-at-rest encryption systems
> > and consider what they're doing.
>
> (I talked to Joe on chat for clarity.)  In modern TLS, the certificate is
> used only for authentication, and Diffie–Hellman is used for key
> exchange:
>
>     https://en.wikipedia.org/wiki/Diffie%E2%80%93Hellman_key_exchange

Right, and the key that's figured out for each connection is at least
specific to the server AND client keys/certificates, thus meaning that
they're changed at least as frequently as those change (and clients end
up creating ones on the fly randomly if they don't have one, iirc).

> So, the question is whether you can pass so much data in TLS that using
> the same key for the entire session is a security issue.  TLS originally
> had key renegotiation, but that was removed in TLS 1.3:
>
>     https://www.cloudinsidr.com/content/known-attack-vectors-against-tls-implementation-vulnerabilities/
>     To mitigate these types of attacks, TLS 1.3 disallows renegotiation.

It was removed due to attacks targeting the renegotiation, not because
doing re-keying by itself was a bad idea, or because using multiple keys
was seen as a bad idea.

> Of course, a database is going to process even more data so if the
> amount of data encrypted is a problem, we might have a problem too in
> using a single key.  This is not related to whether we use one key for
> the entire cluster or multiple keys per tablespace --- the problem is
> the same.  I guess we could create 1024 keys and use the bottom bits of
> the block number to decide what key to use.  However, that still only
> pushes the goalposts farther.

All of this is about pushing the goalposts farther away, as I see it.
There's going to be trade-offs here and there isn't going to be any "one
right answer" when it comes to this space.  That's why I'm inclined to
argue that we should try to come up with a relatively *good* solution
that doesn't create a huge amount of work for us, and then build on
that.  To that end, leveraging metadata that we already have outside of
the catalogs (databases, tablespaces, potentially other information that
we store, essentially, in the filesystem metadata already) to decide on
what key to use, and how many we can support, strikes me as a good
initial target.
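
As a rough sketch of what keying off that existing metadata could look
like (everything here is hypothetical, including the struct and function
names), the lookup would need only the tablespace and database OIDs that
are visible in the directory structure, even during WAL replay:

    #include <stddef.h>
    #include <stdint.h>

    typedef uint32_t Oid;              /* stand-in for PostgreSQL's Oid */

    typedef struct
    {
        Oid         tablespace;        /* from the pg_tblspc/<oid> path  */
        Oid         database;          /* from the <dboid>/ subdirectory */
        const void *key;               /* unwrapped key material         */
    } KeyMapEntry;

    static const void *
    lookup_key(const KeyMapEntry *map, size_t n, Oid spc, Oid db)
    {
        for (size_t i = 0; i < n; i++)
            if (map[i].tablespace == spc && map[i].database == db)
                return map[i].key;
        return NULL;                   /* unencrypted, or key not loaded */
    }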

> Anyway, I will research the reasonable data size that can be secured
> with a single key via AES.  I will look at how PGP encrypts large files
> too.

This seems unlikely to lead to a definitive result, but it would be
interesting to hear if there have been studies around that and what
their conclusions were.

When it comes to concerns about autovacuum or other system processes,
those don't have any direct user connections or interactions, so having
them be more privileged and having access to more is reasonable.

Ideally, all of this would leverage a vaulting system or other mechanism
which manages access to the keys and allows their usage to be limited.
That's been generally accepted as a good way to bridge the gap between
having to ask users every time for a key and having keys stored
long-term in memory.  Having *only* the keys for the data which the
currently connected user is allowed to access would certainly be a great
initial capability, even if system processes (including potentially WAL
replay) have to have access to all of the keys.  And yes, shared buffers
being unencrypted and accessible by every backend continues to be an
issue- it'd be great to improve on that situation too.  I don't think
having everything encrypted in shared buffers is likely the solution,
rather, segregating it up might make more sense, again, along similar
lines to keys and using metadata that's outside of the catalogs, which
has been discussed previously, though I don't think anyone's actively
working on it.

Thanks,

Stephen

On Mon, Jul 08, 2019 at 02:39:44PM -0400, Stephen Frost wrote:
>Greetings,
>
>* Bruce Momjian (bruce@momjian.us) wrote:
>> On Mon, Jul  8, 2019 at 11:47:33AM -0400, Stephen Frost wrote:
>> > * Bruce Momjian (bruce@momjian.us) wrote:
>> > > On Mon, Jul  8, 2019 at 11:18:01AM -0400, Joe Conway wrote:
>> > > > On 7/8/19 10:19 AM, Bruce Momjian wrote:
>> > > > > When people are asking for multiple keys (not just for key rotation),
>> > > > > they are asking to have multiple keys that can be supplied by users only
>> > > > > when they need to access the data.  Yes, the keys are always in the
>> > > > > database, but the feature request is that they are only unlocked when the
>> > > > > user needs to access the data.  Obviously, that will not work for
>> > > > > autovacuum when the encryption is at the block level.
>> > > >
>> > > > > If the key is always unlocked, there is questionable security value of
>> > > > > having multiple keys, beyond key rotation.
>> > > >
>> > > > That is not true. Having multiple keys also allows you to reduce the
>> > > > amount of data encrypted with a single key, which is desirable because:
>> > > >
>> > > > 1. It makes cryptanalysis more difficult
>> > > > 2. Puts less data at risk if someone gets "lucky" in doing brute force
>> > >
>> > > What systems use multiple keys like that?  I know of no website that
>> > > does that.  Your arguments seem hypothetical.  What is your goal here?
>> >
>> > Not sure what the reference to 'website' is here, but one doesn't get
>> > certificates for TLS/SSL usage that aren't time-bounded, and when it
>> > comes to the actual on-the-wire encryption that's used, that's a
>> > symmetric key that's generated on-the-fly for every connection.
>> >
>> > Wouldn't the fact that they generate a different key for every
>> > connection be a pretty clear indication that it's a good idea to use
>> > multiple keys and not use the same key over and over..?
>> >
>> > Of course, we can discuss if what websites do with over-the-wire
>> > encryption is sensible to compare to what we want to do in PG for
>> > data-at-rest, but then we shouldn't be talking about what websites do,
>> > it'd make more sense to look at other data-at-rest encryption systems
>> > and consider what they're doing.
>>
>> (I talked to Joe on chat for clarity.)  In modern TLS, the certificate is
>> used only for authentication, and Diffie–Hellman is used for key
>> exchange:
>>
>>     https://en.wikipedia.org/wiki/Diffie%E2%80%93Hellman_key_exchange
>
>Right, and the key that's figured out for each connection is at least
>specific to the server AND client keys/certificates, thus meaning that
>they're changed at least as frequently as those change (and clients end
>up creating ones on the fly randomly if they don't have one, iirc).
>
>> So, the question is whether you can pass so much data in TLS that using
>> the same key for the entire session is a security issue.  TLS originally
>> had key renegotiation, but that was removed in TLS 1.3:
>>
>>     https://www.cloudinsidr.com/content/known-attack-vectors-against-tls-implementation-vulnerabilities/
>>     To mitigate these types of attacks, TLS 1.3 disallows renegotiation.
>
>It was removed due to attacks targeting the renegotiation, not because
>doing re-keying by itself was a bad idea, or because using multiple keys
>was seen as a bad idea.
>
>> Of course, a database is going to process even more data so if the
>> amount of data encrypted is a problem, we might have a problem too in
>> using a single key.  This is not related to whether we use one key for
>> the entire cluster or multiple keys per tablespace --- the problem is
>> the same.  I guess we could create 1024 keys and use the bottom bits of
>> the block number to decide what key to use.  However, that still only
>> pushes the goalposts farther.
>
>All of this is about pushing the goalposts farther away, as I see it.
>There's going to be trade-offs here and there isn't going to be any "one
>right answer" when it comes to this space.  That's why I'm inclined to
>argue that we should try to come up with a relatively *good* solution
>that doesn't create a huge amount of work for us, and then build on
>that.  To that end, leveraging metadata that we already have outside of
>the catalogs (databases, tablespaces, potentially other information that
>we store, essentially, in the filesystem metadata already) to decide on
>what key to use, and how many we can support, strikes me as a good
>initial target.
>
>> Anyway, I will research the reasonable data size that can be secured
>> with a single key via AES.  I will look at how PGP encrypts large files
>> too.
>
>This seems unlikely to lead to a definitive result, but it would be
>interesting to hear if there have been studies around that and what
>their conclusions were.
>
>When it comes to concerns about autovacuum or other system processes,
>those don't have any direct user connections or interactions, so having
>them be more privileged and having access to more is reasonable.
>

I think Bruce's proposal was to minimize the time the keys are "unlocked"
in memory by only unlocking them when the user connects and supplies
some sort of secret (passphrase), and remove them from memory when the
user disconnects. So there's no way for the auxiliary processes to gain
access to those keys, because only the user knows the secret.

FWIW I have doubts this scheme actually measurably improves privacy in
practice, because most busy applications will end up having the keys in
memory all the time anyway.

It also assumes memory is unsafe, i.e. bad actors can read it, and
that's probably a valid concern (root access, vulnerabilities etc.). But
in that case we already have plenty of issues with data in flight
anyway, and I doubt TDE is an answer to that.

>Ideally, all of this would leverage a vaulting system or other mechanism
>which manages access to the keys and allows their usage to be limited.
>That's been generally accepted as a good way to bridge the gap between
>having to ask users every time for a key and having keys stored
>long-term in memory.

Right. I agree with this.

>Having *only* the keys for the data which the
>currently connected user is allowed to access would certainly be a great
>initial capability, even if system processes (including potentially WAL
>replay) have to have access to all of the keys.  And yes, shared buffers
>being unencrypted and accessible by every backend continues to be an
>issue- it'd be great to improve on that situation too.  I don't think
>having everything encrypted in shared buffers is likely the solution,
>rather, segregating it up might make more sense, again, along similar
>lines to keys and using metadata that's outside of the catalogs, which
>has been discussed previously, though I don't think anyone's actively
>working on it.
>

I very much doubt TDE is a solution to this. Essentially, TDE is a good
data-at-rest solution, but this seems more like protecting data during
execution. And in that case I think we may need an entirely different
encryption scheme.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



On Mon, Jul 08, 2019 at 12:16:04PM -0400, Bruce Momjian wrote:
>
> ...
>
>Anyway, I will research the reasonable data size that can be secured
>with a single key via AES.  I will look at how PGP encrypts large files
>too.
>

IMO there are various recommendations about this, for example from NIST.
But it varies with the exact encryption mode (say, GCM, XTS, ...), and the
recommendations are not "per key" but "per key + nonce", etc.

IANAC, but my understanding is that if we use e.g. "OID + blocknum" as the
nonce, then we should be pretty safe.
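
A minimal sketch of such a derived nonce, assuming a 96-bit GCM-style IV
(the exact layout of OID, segment, and block number is an assumption):

    #include <stdint.h>
    #include <string.h>

    /* Build a 96-bit nonce from relation OID, segment number, and block
     * number, so no per-block nonce has to be stored on disk.  Caveat:
     * rewriting the same block reuses the same (key, nonce) pair. */
    static void
    make_block_nonce(unsigned char nonce[12],
                     uint32_t reloid, uint32_t segno, uint32_t blkno)
    {
        memcpy(nonce + 0, &reloid, sizeof(reloid));
        memcpy(nonce + 4, &segno,  sizeof(segno));
        memcpy(nonce + 8, &blkno,  sizeof(blkno));
    }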


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



On Mon, Jul 08, 2019 at 12:09:58PM -0400, Joe Conway wrote:
>On 7/8/19 11:56 AM, Peter Eisentraut wrote:
>> On 2019-07-08 17:47, Stephen Frost wrote:
>>> Of course, we can discuss if what websites do with over-the-wire
>>> encryption is sensible to compare to what we want to do in PG for
>>> data-at-rest, but then we shouldn't be talking about what websites do,
>>> it'd make more sense to look at other data-at-rest encryption systems
>>> and consider what they're doing.
>>
>> So, how do encrypted file systems do it?  Are there any encrypted file
>> systems in general use that allow encrypting only some files or
>> encrypting different parts of the file system with different keys, or
>> any of those other granular approaches being discussed?
>
>Well it is fairly common, for good reason IMHO, to encrypt some mount
>points and not others on a system. In my mind, and in practice to a
>large extent, a postgres tablespace == a unique mount point.
>
>There is a description here:
>
>  https://wiki.archlinux.org/index.php/Disk_encryption
>

That link is a bit overwhelming, as it explains how various encrypted
filesystems do things. There's now official support for this in the
Linux kernel (encryption at the filesystem level, not block device) in
the form of fscrypt, see

  https://www.kernel.org/doc/html/latest/filesystems/fscrypt.html

It's a bit different because it's not stacked encryption; it's
integrated directly into filesystems (like ext4, at the moment) and it
leverages other kernel facilities (like keyring).

The link also discusses the threat model, which is particularly
interesting for this discussion, IMO.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



On Mon, Jul  8, 2019 at 12:09:58PM -0400, Joe Conway wrote:
> On 7/8/19 11:56 AM, Peter Eisentraut wrote:
> > On 2019-07-08 17:47, Stephen Frost wrote:
> >> Of course, we can discuss if what websites do with over-the-wire
> >> encryption is sensible to compare to what we want to do in PG for
> >> data-at-rest, but then we shouldn't be talking about what websites do,
> >> it'd make more sense to look at other data-at-rest encryption systems
> >> and consider what they're doing.
> > 
> > So, how do encrypted file systems do it?  Are there any encrypted file
> > systems in general use that allow encrypting only some files or
> > encrypting different parts of the file system with different keys, or
> > any of those other granular approaches being discussed?
> 
> Well it is fairly common, for good reason IMHO, to encrypt some mount
> points and not others on a system. In my mind, and in practice to a
> large extent, a postgres tablespace == a unique mount point.

Yes, that is a normal partition point for key use because one file
system is independent of others.  You could use different keys for
different directories in the same file system, but internally it all
uses the same storage, and storage theft would potentially happen at the
file system level.

For Postgres, tablespaces are not independent of the database system,
though the media-theft threat still applies.  Of course, in the case of a
tablespace media theft, Postgres would be quite confused, though you
could still start the database system:

    SELECT * FROM test;
    ERROR:  could not open file
    "pg_tblspc/16385/PG_13_201907054/16384/16386": No such file or directory

but the data would be gone.  What you can't do with Postgres is to have
the tablespace be inaccessible and then later reappear.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Mon, Jul  8, 2019 at 02:39:44PM -0400, Stephen Frost wrote:
> > > Of course, we can discuss if what websites do with over-the-wire
> > > encryption is sensible to compare to what we want to do in PG for
> > > data-at-rest, but then we shouldn't be talking about what websites do,
> > > it'd make more sense to look at other data-at-rest encryption systems
> > > and consider what they're doing.
> > 
> > (I talked to Joe on chat for clarity.)  In modern TLS, the certificate is
> > used only for authentication, and Diffie–Hellman is used for key
> > exchange:
> > 
> >     https://en.wikipedia.org/wiki/Diffie%E2%80%93Hellman_key_exchange
> 
> Right, and the key that's figured out for each connection is at least
> specific to the server AND client keys/certificates, thus meaning that
> they're changed at least as frequently as those change (and clients end
> up creating ones on the fly randomly if they don't have one, iirc).
> 
> > So, the question is whether you can pass so much data in TLS that using
> > the same key for the entire session is a security issue.  TLS originally
> > had key renegotiation, but that was removed in TLS 1.3:
> > 
> >     https://www.cloudinsidr.com/content/known-attack-vectors-against-tls-implementation-vulnerabilities/
> >     To mitigate these types of attacks, TLS 1.3 disallows renegotiation.
> 
> It was removed due to attacks targeting the renegotiation, not because
> doing re-keying by itself was a bad idea, or because using multiple keys
> was seen as a bad idea.

Well, if it was a necessary feature, I assume TLS 1.3 would have found
a way to make it secure, no?  Certainly they are not shipping TLS 1.3
with a known weakness.

> > Of course, a database is going to process even more data so if the
> > amount of data encrypted is a problem, we might have a problem too in
> > using a single key.  This is not related to whether we use one key for
> > the entire cluster or multiple keys per tablespace --- the problem is
> > the same.  I guess we could create 1024 keys and use the bottom bits of
> > the block number to decide what key to use.  However, that still only
> > pushes the goalposts farther.
> 
> All of this is about pushing the goalposts farther away, as I see it.
> There's going to be trade-offs here and there isn't going to be any "one
> right answer" when it comes to this space.  That's why I'm inclined to
> argue that we should try to come up with a relatively *good* solution
> that doesn't create a huge amount of work for us, and then build on
> that.  To that end, leveraging metadata that we already have outside of
> the catalogs (databases, tablespaces, potentially other information that
> we store, essentially, in the filesystem metadata already) to decide on
> what key to use, and how many we can support, strikes me as a good
> initial target.

Yes, we will need that for a usable nonce that we don't need to store in
the blocks and WAL files.

> > Anyway, I will research the reasonable data size that can be secured
> > with a single key via AES.  I will look at how PGP encrypts large files
> > too.
> 
> This seems unlikely to lead to a definitive result, but it would be
> interesting to hear if there have been studies around that and what
> their conclusions were.

I found this:


    https://crypto.stackexchange.com/questions/44113/what-is-a-safe-maximum-message-size-limit-when-encrypting-files-to-disk-with-aes
    https://crypto.stackexchange.com/questions/20333/encryption-of-big-files-in-java-with-aes-gcm/20340#20340

The numbers listed are:

    Maximum Encrypted Plaintext Size:  68GB
    Maximum Processed Additional Authenticated Data: 2 x 10^18

The 68GB value is "the maximum bits that can be processed with a single
key/IV(nonce) pair."  We would encrypt 8kB of data for each 8kB page.  If
we assume a unique nonce per page, that gives us 10^32 bytes.

For the WAL we would probably use a different nonce for each 16MB segment,
so we would be OK there too, since that gives us 10^36 bytes before the
segment number causes the nonce to repeat.
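
As a back-of-envelope check of those figures, assuming one 96-bit nonce
per 8kB heap page or per 16MB WAL segment:

    8kB pages:     8192 bytes  * 2^96 nonces ~= 6 x 10^32 bytes per key
    16MB segments: 1.7e7 bytes * 2^96 nonces ~= 1 x 10^36 bytes per key

so the binding limit is the 68GB per key/nonce pair, not the total data
volume under one key.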

> When it comes to concerns about autovacuum or other system processes,
> those don't have any direct user connections or interactions, so having
> them be more privileged and having access to more is reasonable.

Well, I am trying to understand the value of having some keys accessible
by some parts of the system, and some not.  I am unclear what security
value that has.

> Ideally, all of this would leverage a vaulting system or other mechanism
> which manages access to the keys and allows their usage to be limited.
> That's been generally accepted as a good way to bridge the gap between
> having to ask users every time for a key and having keys stored
> long-term in memory.  Having *only* the keys for the data which the
> currently connected user is allowed to access would certainly be a great
> initial capability, even if system processes (including potentially WAL
> replay) have to have access to all of the keys.  And yes, shared buffers
> being unencrypted and accessible by every backend continues to be an
> issue- it'd be great to improve on that situation too.  I don't think
> having everything encrypted in shared buffers is likely the solution,
> rather, segregating it up might make more sense, again, along similar
> lines to keys and using metadata that's outside of the catalogs, which
> has been discussed previously, though I don't think anyone's actively
> working on it.

What is this trying to protect against?  Without a clear case, I don't
see what that complexity is buying us.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Greetings,

* Bruce Momjian (bruce@momjian.us) wrote:
> On Mon, Jul  8, 2019 at 02:39:44PM -0400, Stephen Frost wrote:
> > > > Of course, we can discuss if what websites do with over-the-wire
> > > > encryption is sensible to compare to what we want to do in PG for
> > > > data-at-rest, but then we shouldn't be talking about what websites do,
> > > > it'd make more sense to look at other data-at-rest encryption systems
> > > > and consider what they're doing.
> > >
> > > (I talked to Joe on chat for clarity.)  In modern TLS, the certificate is
> > > used only for authentication, and Diffie–Hellman is used for key
> > > exchange:
> > >
> > >     https://en.wikipedia.org/wiki/Diffie%E2%80%93Hellman_key_exchange
> >
> > Right, and the key that's figured out for each connection is at least
> > specific to the server AND client keys/certificates, thus meaning that
> > they're changed at least as frequently as those change (and clients end
> > up creating ones on the fly randomly if they don't have one, iirc).
> >
> > > So, the question is whether you can pass so much data in TLS that using
> > > the same key for the entire session is a security issue.  TLS originally
> > > had key renegotiation, but that was removed in TLS 1.3:
> > >
> > >     https://www.cloudinsidr.com/content/known-attack-vectors-against-tls-implementation-vulnerabilities/
> > >     To mitigate these types of attacks, TLS 1.3 disallows renegotiation.
> >
> > It was removed due to attacks targeting the renegotiation, not because
> > doing re-keying by itself was a bad idea, or because using multiple keys
> > was seen as a bad idea.
>
> Well, if it was a necessary features, I assume TLS 1.3 would have found
> a way to make it secure, no?  Certainly they are not shipping TLS 1.3
> with a known weakness.

As discussed below- this is about moving goalposts, and that's, in part
at least, why re-keying isn't a *necessary* feature of TLS.  As the
amount of data you transmit over a given TLS connection increases
though, the risk increases and it would be better to re-key.  How much
better?  That depends a great deal on if someone is trying to mount an
attack or not.

> > > Of course, a database is going to process even more data so if the
> > > amount of data encrypted is a problem, we might have a problem too in
> > > using a single key.  This is not related to whether we use one key for
> > > the entire cluster or multiple keys per tablespace --- the problem is
> > > the same.  I guess we could create 1024 keys and use the bottom bits of
> > > the block number to decide what key to use.  However, that still only
> > > pushes the goalposts farther.
> >
> > All of this is about pushing the goalposts farther away, as I see it.
> > There's going to be trade-offs here and there isn't going to be any "one
> > right answer" when it comes to this space.  That's why I'm inclined to
> > argue that we should try to come up with a relatively *good* solution
> > that doesn't create a huge amount of work for us, and then build on
> > that.  To that end, leveraging metadata that we already have outside of
> > the catalogs (databases, tablespaces, potentially other information that
> > we store, essentially, in the filesystem metadata already) to decide on
> > what key to use, and how many we can support, strikes me as a good
> > initial target.
>
> Yes, we will need that for a usable nonce that we don't need to store in
> the blocks and WAL files.

I'm not a fan of the idea of using something which is predictable as a
nonce.  Using the username as the salt for our md5 password mechanism
was, all around, a bad idea.  This seems like it's repeating that
mistake.

> > > Anyway, I will research the reasonable data size that can be secured
> > > with a single key via AES.  I will look at how PGP encrypts large files
> > > too.
> >
> > This seems unlikely to lead to a definitive result, but it would be
> > interesting to hear if there have been studies around that and what
> > their conclusions were.
>
> I found this:
>
>
>     https://crypto.stackexchange.com/questions/44113/what-is-a-safe-maximum-message-size-limit-when-encrypting-files-to-disk-with-aes
>     https://crypto.stackexchange.com/questions/20333/encryption-of-big-files-in-java-with-aes-gcm/20340#20340
>
> The numbers listed are:
>
>     Maximum Encrypted Plaintext Size:  68GB
>     Maximum Processed Additional Authenticated Data: 2 x 10^18

These are specific to AES, from a quick reading of those pages, right?

> The 68GB value is "the maximum bits that can be processed with a single
> key/IV(nonce) pair."  We would encrypt 8kB of data for each 8kB page.  If
> we assume a unique nonce per page, that gives us 10^32 bytes.

A unique nonce per page strikes me as excessive...  but then, I think we
should have an actually random nonce rather than something calculated
from the metadata.

> For the WAL we would probably use a different nonce for each 16MB segment,
> so we would be OK there too, since that gives us 10^36 bytes before the
> segment number causes the nonce to repeat.

This presumes that the segment number is involved in the nonce
selection, which again strikes me as a bad idea.  Maybe it could be
involved in some way, but we should have a properly random nonce.

> > When it comes to concerns about autovacuum or other system processes,
> > those don't have any direct user connections or interactions, so having
> > them be more privileged and having access to more is reasonable.
>
> Well, I am trying to understand the value of having some keys accessible
> by some parts of the system, and some not.  I am unclear what security
> value that has.

A very real risk is a low-privilege process gaining access to the entire
backend process, and therefore being able to access anything that
backend is able to.

> > Ideally, all of this would leverage a vaulting system or other mechanism
> > which manages access to the keys and allows their usage to be limited.
> > That's been generally accepted as a good way to bridge the gap between
> > having to ask users every time for a key and having keys stored
> > long-term in memory.  Having *only* the keys for the data which the
> > currently connected user is allowed to access would certainly be a great
> > initial capability, even if system processes (including potentially WAL
> > replay) have to have access to all of the keys.  And yes, shared buffers
> > being unencrypted and accessible by every backend continues to be an
> > issue- it'd be great to improve on that situation too.  I don't think
> > having everything encrypted in shared buffers is likely the solution,
> > rather, segregating it up might make more sense, again, along similar
> > lines to keys and using metadata that's outside of the catalogs, which
> > has been discussed previously, though I don't think anyone's actively
> > working on it.
>
> What is this trying to protect against?  Without a clear case, I don't
> see what that complexity is buying us.

This is trying to protect against cross-domain leakage due to specific
security vulnerabilities, similar to those just recently fixed, where a
given backend is able to be compromised and is able to be used to run
any code the attacker wishes inside of that backend's process.

If that user-connected backend is only able to access keys/data that is
at the level of their connection, then the data leakage is scoped to
only data at that level.  If they're able to access anything in the
database system, then the entire system and all of the data is
compromised.

Thanks,

Stephen

On Mon, Jul  8, 2019 at 09:30:03PM +0200, Tomas Vondra wrote:
> I think Bruce's proposal was to minimize the time the keys are "unlocked"
> in memory by only unlocking them when the user connects and supplies
> some sort of secret (passphrase), and remove them from memory when the
> user disconnects. So there's no way for the auxiliary processes to gain
> access to those keys, because only the user knows the secret.

I mentioned that because I thought that was the security value that
people wanted.  While I can see the value, I don't see how it can be
cleanly accomplished.  Keeping the keys unlocked at all times seems to
be possible, but of much smaller value.

Part of my goal in this discussion is to reverse the rush to implement
and pick apart exactly what is possible, and desirable.

> FWIW I have doubts this scheme actually measurably improves privacy in
> practice, because most busy applications will end up having the keys in
> memory all the time anyway.

Yep.

> It also assumes memory is unsafe, i.e. bad actors can read it, and
> that's probably a valid concern (root access, vulnerabilities etc.). But
> in that case we already have plenty of issues with data in flight
> anyway, and I doubt TDE is an answer to that.

Agreed.

> > Ideally, all of this would leverage a vaulting system or other mechanism
> > which manages access to the keys and allows their usage to be limited.
> > That's been generally accepted as a good way to bridge the gap between
> > having to ask users every time for a key and having keys stored
> > long-term in memory.
> 
> Right. I agree with this.
> 
> > Having *only* the keys for the data which the
> > currently connected user is allowed to access would certainly be a great
> > initial capability, even if system processes (including potentially WAL
> > replay) have to have access to all of the keys.  And yes, shared buffers
> > being unencrypted and accessible by every backend continues to be an
> > issue- it'd be great to improve on that situation too.  I don't think
> > having everything encrypted in shared buffers is likely the solution,
> > rather, segregating it up might make more sense, again, along similar
> > lines to keys and using metadata that's outside of the catalogs, which
> > has been discussed previously, though I don't think anyone's actively
> > working on it.
> > 
> 
> I very much doubt TDE is a solution to this. Essentially, TDE is a good
> data-at-rest solution, but this seems more like protecting data during
> execution. And in that case I think we may need an entirely different
> encryption scheme.

I thought client-level encryption or pgcrypto-style encryption fits that
need better.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Mon, Jul  8, 2019 at 05:41:51PM -0400, Stephen Frost wrote:
> * Bruce Momjian (bruce@momjian.us) wrote:
> > Well, if it was a necessary feature, I assume TLS 1.3 would have found
> > a way to make it secure, no?  Certainly they are not shipping TLS 1.3
> > with a known weakness.
> 
> As discussed below- this is about moving goalposts, and that's, in part
> at least, why re-keying isn't a *necessary* feature of TLS.  As the

I agree we have to allow rekeying and allow multiple unlocked keys in
the server at the same time.  The open question is whether encrypting
different data with different keys and different unlock controls is
possible or useful.

> amount of data you transmit over a given TLS connection increases
> though, the risk increases and it would be better to re-key.  How much
> better?  That depends a great deal on if someone is trying to mount an
> attack or not.

Yep, we need to allow rekey.

> > > > Of course, a database is going to process even more data so if the
> > > > amount of data encrypted is a problem, we might have a problem too in
> > > > using a single key.  This is not related to whether we use one key for
> > > > the entire cluster or multiple keys per tablespace --- the problem is
> > > > the same.  I guess we could create 1024 keys and use the bottom bits of
> > > > the block number to decide what key to use.  However, that still only
> > > > pushes the goalposts farther.
> > > 
> > > All of this is about pushing the goalposts farther away, as I see it.
> > > There's going to be trade-offs here and there isn't going to be any "one
> > > right answer" when it comes to this space.  That's why I'm inclined to
> > > argue that we should try to come up with a relatively *good* solution
> > > that doesn't create a huge amount of work for us, and then build on
> > > that.  To that end, leveraging metadata that we already have outside of
> > > the catalogs (databases, tablespaces, potentially other information that
> > > we store, essentially, in the filesystem metadata already) to decide on
> > > what key to use, and how many we can support, strikes me as a good
> > > initial target.
> > 
> > Yes, we will need that for a usable nonce that we don't need to store in
> > the blocks and WAL files.
> 
> I'm not a fan of the idea of using something which is predictable as a
> nonce.  Using the username as the salt for our md5 password mechanism
> was, all around, a bad idea.  This seems like it's repeating that
> mistake.

Uh, well, renaming the user was a big problem, but that is the only case
I can think of.  I don't see that as an issue for block or WAL sequence
numbers.  If we want to use a different nonce, we have to find a way to
store it or look it up efficiently.  Considering the nonce size, I don't
see how that is possible.

> > > > Anyway, I will research the reasonable data size that can be secured
> > > > with a single key via AES.  I will look at how PGP encrypts large files
> > > > too.
> > > 
> > > This seems unlikely to lead to a definitive result, but it would be
> > > interesting to hear if there have been studies around that and what
> > > their conclusions were.
> > 
> > I found this:
> > 
> >
> >     https://crypto.stackexchange.com/questions/44113/what-is-a-safe-maximum-message-size-limit-when-encrypting-files-to-disk-with-aes
> >     https://crypto.stackexchange.com/questions/20333/encryption-of-big-files-in-java-with-aes-gcm/20340#20340
> > 
> > The numbers listed are:
> > 
> >     Maximum Encrypted Plaintext Size:  68GB
> >     Maximum Processed Additional Authenticated Data: 2 x 10^18
> 
> These are specific to AES, from a quick reading of those pages, right?

Yes, AES with GCM, which has authentication parts we would not use, so
we would use CBC and CTR, which I think have the same or larger limits.
> 
> > The 68GB value is "the maximum bits that can be processed with a single
> > key/IV(nonce) pair."  We would encrypt 8kB of data for each 8kB page.  If
> > we assume a unique nonce per page, that gives us 10^32 bytes.
> 
> A unique nonce per page strikes me as excessive...  but then, I think we
> should have an actually random nonce rather than something calculated
> from the metadata.

Uh, well, you are much less likely to get duplicate nonce values by
using block number or WAL sequence number.  If you look at the
implementations, few compute random nonce values.

> > For the WAL we would probably use a different nonce for each 16MB segment,
> > so we would be OK there too, since that gives us 10^36 bytes before the
> > segment number causes the nonce to repeat.
> 
> This presumes that the segment number is involved in the nonce
> selection, which again strikes me as a bad idea.  Maybe it could be
> involved in some way, but we should have a properly random nonce.

And you base the randomness goal on what?  A nonce is a number used only
once, and randomness is not a requirement.  You can say you prefer it, but
why?  Most implementations don't use random nonces.

> > > When it comes to concerns about autovacuum or other system processes,
> > > those don't have any direct user connections or interactions, so having
> > > them be more privileged and having access to more is reasonable.
> > 
> > Well, I am trying to understand the value of having some keys accessible
> > by some parts of the system, and some not.  I am unclear what security
> > value that has.
> 
> A very real risk is a low-privilege process gaining access to the entire
> backend process, and therefore being able to access anything that
> backend is able to.

Well, if they get to one key, they will get to them all, right?

> > > Ideally, all of this would leverage a vaulting system or other mechanism
> > > which manages access to the keys and allows their usage to be limited.
> > > That's been generally accepted as a good way to bridge the gap between
> > > having to ask users every time for a key and having keys stored
> > > long-term in memory.  Having *only* the keys for the data which the
> > > currently connected user is allowed to access would certainly be a great
> > > initial capability, even if system processes (including potentially WAL
> > > replay) have to have access to all of the keys.  And yes, shared buffers
> > > being unencrypted and accessible by every backend continues to be an
> > > issue- it'd be great to improve on that situation too.  I don't think
> > > having everything encrypted in shared buffers is likely the solution,
> > > rather, segregating it up might make more sense, again, along similar
> > > lines to keys and using metadata that's outside of the catalogs, which
> > > has been discussed previously, though I don't think anyone's actively
> > > working on it.
> > 
> > What is this trying to protect against?  Without a clear case, I don't
> > see what that complexity is buying us.
> 
> This is trying to protect against cross-domain leakage due to specific
> security vulnerabilities, similar to those just recently fixed, where a
> given backend is able to be compromised and is able to be used to run
> any code the attacker wishes inside of that backend's process.

I am not sure TDE is a good solution to that.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Greetings,

* Bruce Momjian (bruce@momjian.us) wrote:
> On Mon, Jul  8, 2019 at 05:41:51PM -0400, Stephen Frost wrote:
> > * Bruce Momjian (bruce@momjian.us) wrote:
> > > Well, if it was a necessary feature, I assume TLS 1.3 would have found
> > > a way to make it secure, no?  Certainly they are not shipping TLS 1.3
> > > with a known weakness.
> >
> > As discussed below- this is about moving goalposts, and that's, in part
> > at least, why re-keying isn't a *necessary* feature of TLS.  As the
>
> I agree we have to allow rekeying and allow multiple unlocked keys in
> the server at the same time.  The open question is whether encrypting
> different data with different keys and different unlock controls is
> possible or useful.

I'm not sure if there's really a question about if it's *possible*?  As
for if it's useful, I agree there's some debate.

> > amount of data you transmit over a given TLS connection increases
> > though, the risk increases and it would be better to re-key.  How much
> > better?  That depends a great deal on if someone is trying to mount an
> > attack or not.
>
> Yep, we need to allow rekey.

Supporting a way to rekey is definitely a good idea.

> > > > > Of course, a database is going to process even more data so if the
> > > > > amount of data encrypted is a problem, we might have a problem too in
> > > > > using a single key.  This is not related to whether we use one key for
> > > > > the entire cluster or multiple keys per tablespace --- the problem is
> > > > > the same.  I guess we could create 1024 keys and use the bottom bits of
> > > > > the block number to decide what key to use.  However, that still only
> > > > > pushes the goalposts farther.
> > > >
> > > > All of this is about pushing the goalposts farther away, as I see it.
> > > > There's going to be trade-offs here and there isn't going to be any "one
> > > > right answer" when it comes to this space.  That's why I'm inclined to
> > > > argue that we should try to come up with a relatively *good* solution
> > > > that doesn't create a huge amount of work for us, and then build on
> > > > that.  To that end, leveraging metadata that we already have outside of
> > > > the catalogs (databases, tablespaces, potentially other information that
> > > > we store, essentially, in the filesystem metadata already) to decide on
> > > > what key to use, and how many we can support, strikes me as a good
> > > > initial target.
> > >
> > > Yes, we will need that for a usable nonce that we don't need to store in
> > > the blocks and WAL files.
> >
> > I'm not a fan of the idea of using something which is predictable as a
> > nonce.  Using the username as the salt for our md5 password mechanism
> > was, all around, a bad idea.  This seems like it's repeating that
> > mistake.
>
> Uh, well, renaming the user was a big problem, but that is the only case
> I can think of.  I don't see that as an issue for block or WAL sequence
> numbers.  If we want to use a different nonce, we have to find a way to
> store it or look it up efficiently.  Considering the nonce size, I don't
> see how that is possible.

No, this also meant that, as an attacker, I *knew* the salt ahead of
time and therefore could build rainbow tables specifically for that
salt.  I could also use those *same* tables for any system where that
user had an account, even if they used different passwords on different
systems...

I appreciate that *some* of this might not be completely relevant for
the way a nonce is used in cryptography, but I'd be very surprised to
have a cryptographer tell me that a deterministic nonce didn't have
similar issues or didn't reduce the value of the nonce significantly.

> > > > > Anyway, I will to research the reasonable data size that can be secured
> > > > > with a single key via AES.  I will look at how PGP encrypts large files
> > > > > too.
> > > >
> > > > This seems unlikely to lead to a definitive result, but it would be
> > > > interesting to hear if there have been studies around that and what
> > > > their conclusions were.
> > >
> > > I found this:
> > >
> > >
https://crypto.stackexchange.com/questions/44113/what-is-a-safe-maximum-message-size-limit-when-encrypting-files-to-disk-with-aes
> > >     https://crypto.stackexchange.com/questions/20333/encryption-of-big-files-in-java-with-aes-gcm/20340#20340
> > >
> > > The numbers listed are:
> > >
> > >     Maximum Encrypted Plaintext Size:  68GB
> > >     Maximum Processed Additional Authenticated Data: 2 x 10^18
> >
> > These are specific to AES, from a quick reading of those pages, right?
>
> Yes, AES with GCM, which has authentication parts we would not use, so
> we would use CBC and CTR, which I think have the same or larger spaces.
> >
> > > The 68GB value is "the maximum bits that can be processed with a single
> > > key/IV(nonce) pair."  We would encrypt 8k of data for each 8k page.  If
> > > we assume a unique nonce per page, that is 10^32 bytes.
> >
> > A unique nonce per page strikes me as excessive...  but then, I think we
> > should have an actually random nonce rather than something calculated
> > from the metadata.
>
> Uh, well, you are much less likely to get duplicate nonce values by
> using block number or WAL sequence number.  If you look at the
> implementations, few compute random nonce values.

Which implementations..?  Where do their nonce values come from?  I can
see how a nonce might have to be naturally and deterministically random,
if the source for it is sufficiently varied across the key space, but
starting at '1' and going up with the same key seems like it's just
giving a potential attacker more information about what the encrypted
data contains...

> > > For the WAL we would probably use a different nonce for each 16MB
> > > segment, so we would be OK there too, since that gives us 10^36 bytes
> > > before the segment number causes the nonce to repeat.
> >
> > This presumes that the segment number is involved in the nonce
> > selection, which again strikes me as a bad idea.  Maybe it could be
> > involved in some way, but we should have a properly random nonce.
>
> And you base the random goal on what?  A nonce is a number used only once,
> and randomness is not a requirement.  You can say you prefer it, but
> why, because most implementations don't use random nonce.

The encryption schemes I've worked with in the past have used a random
nonce, so I'm wondering where the disconnect is between us on that.

> > > > When it comes to concerns about autovacuum or other system processes,
> > > > those don't have any direct user connections or interactions, so having
> > > > them be more privileged and having access to more is reasonable.
> > >
> > > Well, I am trying to understand the value of having some keys accessible
> > > by some parts of the system, and some not.  I am unclear what security
> > > value that has.
> >
> > A very real risk is a low-privilege process gaining access to the entire
> > backend process, and therefore being able to access anything that
> > backend is able to.
>
> Well, if they get to one key, they will get to them all, right?

That's only the case if all the keys are accessible to a backend process
which is under a user's control.  That would certainly be a bad
situation and one which I'd hope we would avoid.  If the backend that
the user has access to only has access to a subset of the keys, then
while they might be able to access the other encrypted data, they
wouldn't be able to decrypt it.

> > > > Ideally, all of this would leverage a vaulting system or other mechanism
> > > > which manages access to the keys and allows their usage to be limited.
> > > > That's been generally accepted as a good way to bridge the gap between
> > > > having to ask users every time for a key and having keys stored
> > > > long-term in memory.  Having *only* the keys for the data which the
> > > > currently connected user is allowed to access would certainly be a great
> > > > initial capability, even if system processes (including potentially WAL
> > > > replay) have to have access to all of the keys.  And yes, shared buffers
> > > > being unencrypted and accessible by every backend continues to be an
> > > > issue- it'd be great to improve on that situation too.  I don't think
> > > > having everything encrypted in shared buffers is likely the solution,
> > > > rather, segregating it up might make more sense, again, along similar
> > > > lines to keys and using metadata that's outside of the catalogs, which
> > > > has been discussed previously, though I don't think anyone's actively
> > > > working on it.
> > >
> > > What is this trying to protect against?  Without a clear case, I don't
> > > see what that complexity is buying us.
> >
> > This is trying to protect against cross-domain leakage due to specific
> > security vulnerabilities, similar to those just recently fixed, where a
> > given backend is able to be compromised and is able to be used to run
> > any code the attacker wishes inside of that backend's process.
>
> I am not sure TLS is a good solution to that.

... TLS?  Or do you mean TDE here..?

I don't mean to throw this up as a requirement on day one of this
feature, rather I'm trying to show where we could potentially go and why
*this* flexibility (supporting a key per tablespace, specifically)
would make sense and how it could help us build a more secure system in
the future.

Thanks,

Stephen

On Mon, Jul  8, 2019 at 06:04:46PM -0400, Stephen Frost wrote:
> Greetings,
> 
> * Bruce Momjian (bruce@momjian.us) wrote:
> > On Mon, Jul  8, 2019 at 05:41:51PM -0400, Stephen Frost wrote:
> > > * Bruce Momjian (bruce@momjian.us) wrote:
> > > > Well, if it was a necessary features, I assume TLS 1.3 would have found
> > > > a way to make it secure, no?  Certainly they are not shipping TLS 1.3
> > > > with a known weakness.
> > > 
> > > As discussed below- this is about moving goalposts, and that's, in part
> > > at least, why re-keying isn't a *necessary* feature of TLS.  As the
> > 
> > I agree we have to allow rekeying and allow multiple unlocked keys in
> > the server at the same time.  The open question is whether encrypting
> > different data with different keys and different unlock controls is
> > possible or useful.
> 
> I'm not sure if there's really a question about if it's *possible*?  As
> for if it's useful, I agree there's some debate.

Right, it is easily possible to keep all keys unlocked, but the value is
minimal, and the complexity will have a cost, which is my point.

> > > amount of data you transmit over a given TLS connection increases
> > > though, the risk increases and it would be better to re-key.  How much
> > > better?  That depends a great deal on if someone is trying to mount an
> > > attack or not.
> > 
> > Yep, we need to allow rekey.
> 
> Supporting a way to rekey is definitely a good idea.

It is a requirement, I think.  We might have a problem tracking exactly
which key _version_ each table (or 8k block) or WAL file uses.  :-(
Ideally we would allow only two active keys, and somehow mark each page
as using the odd or even key at a given time, or something strange. 
(Yeah, hand waving here.)
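
To sketch that (purely illustrative --- PD_KEY_VERSION is an invented
flag bit, not an existing page header field):

    #include <stdint.h>

    /*
     * Hand-waving sketch of the odd/even key idea: steal one page
     * header flag bit to record which of the two active key versions
     * encrypted this page.
     */
    #define PD_KEY_VERSION 0x0080       /* invented for this example */

    static int
    page_key_version(uint16_t pd_flags)
    {
        return (pd_flags & PD_KEY_VERSION) ? 1 : 0;
    }

Rekeying would then mean rewriting every page still marked with the
retiring key version before that key could be dropped.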

> > Uh, well, renaming the user was a big problem, but that is the only case
> > I can think of.  I don't see that as an issue for block or WAL sequence
> > numbers.  If we want to use a different nonce, we have to find a way to
> > store it or look it up efficiently.  Considering the nonce size, I don't
> > see how that is possible.
> 
> No, this also meant that, as an attacker, I *knew* the salt ahead of
> time and therefore could build rainbow tables specifically for that
> salt.  I could also use those *same* tables for any system where that
> user had an account, even if they used different passwords on different
> systems...

Yes, 'postgres' can be used to create a nice md5 rainbow table that
works on many servers --- good point.  Are rainbow tables possible with
something like AES?

> I appreciate that *some* of this might not be completely relevant for
> the way a nonce is used in cryptography, but I'd be very surprised to
> have a cryptographer tell me that a deterministic nonce didn't have
> similar issues or didn't reduce the value of the nonce significantly.

This post:

    https://stackoverflow.com/questions/36760973/why-is-random-iv-fine-for-aes-cbc-but-not-for-aes-gcm

says:

    GCM is a variation on Counter Mode (CTR).  As you say, with any variant
    of Counter Mode, it is essential  that the Nonce is not repeated with
    the same key.  Hence CTR mode  Nonces often include either a counter or
    a timer element: something that  is guaranteed not to repeat over the
    lifetime of the key.

CTR is what we would use for WAL.  For 8k pages, we would use CBC, which
says we need a random nonce.  I need to dig deeper into the ECB mode
attack.
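
For illustration only, a minimal sketch with OpenSSL's EVP API of what
encrypting one 8k page with AES-256-CBC and a caller-supplied IV could
look like (encrypt_page and its arguments are made up for this example):

    #include <openssl/evp.h>

    #define BLCKSZ 8192             /* PostgreSQL's default page size */

    /*
     * Encrypt one page with AES-256-CBC.  "key" is a 32-byte key and
     * "iv" a 16-byte per-page IV.  Padding is disabled because BLCKSZ
     * is already a multiple of the AES block size.  Returns 1 on
     * success, 0 on failure.
     */
    static int
    encrypt_page(const unsigned char *key, const unsigned char *iv,
                 const unsigned char *page, unsigned char *out)
    {
        EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
        int         len,
                    ok = 0;

        if (ctx == NULL)
            return 0;
        if (EVP_EncryptInit_ex(ctx, EVP_aes_256_cbc(), NULL, key, iv) == 1 &&
            EVP_CIPHER_CTX_set_padding(ctx, 0) == 1 &&
            EVP_EncryptUpdate(ctx, out, &len, page, BLCKSZ) == 1 &&
            EVP_EncryptFinal_ex(ctx, out + len, &len) == 1)
            ok = 1;

        EVP_CIPHER_CTX_free(ctx);
        return ok;
    }

The whole question in this subthread is where that "iv" argument comes
from.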

> > Uh, well, you are much less likely to get duplicate nonce values by
> > using block number or WAL sequence number.  If you look at the
> > implementations, few compute random nonce values.
> 
> Which implementations..?  Where do their nonce values come from?  I can
> see how a nonce might have to be naturally and deterministically random,
> if the source for it is sufficiently varied across the key space, but
> starting at '1' and going up with the same key seems like it's just
> giving a potential attacker more information about what the encrypted
> data contains...

Well, in many modes the nonce is just a counter, but as stated above,
not all modes.  I need to pull out my security books to remember for
which ones it is safe.  (Frankly, it is a lot easier to use a random
nonce for WAL than 8k pages.)

> > And you base the random goal on what?  A nonce is a number used only once,
> > and randomness is not a requirement.  You can say you prefer it, but
> > why, because most implementations don't use random nonce.
> 
> The encryption schemes I've worked with in the past have used a random
> nonce, so I'm wondering where the disconnect is between us on that.

OK.

> > > > > When it comes to concerns about autovacuum or other system processes,
> > > > > those don't have any direct user connections or interactions, so having
> > > > > them be more privileged and having access to more is reasonable.
> > > > 
> > > > Well, I am trying to understand the value of having some keys accessible
> > > > by some parts of the system, and some not.  I am unclear what security
> > > > value that has.
> > > 
> > > A very real risk is a low-privilege process gaining access to the entire
> > > backend process, and therefore being able to access anything that
> > > backend is able to.
> > 
> > Well, if they get to one key, they will get to them all, right?
> 
> That's only the case if all the keys are accessible to a backend process
> which is under a user's control.  That would certainly be a bad
> situation and one which I'd hope we would avoid.  If the backend that
> the user has access to only has access to a subset of the keys, then
> while they might be able to access the other encrypted data, they
> wouldn't be able to decrypt it.

Uh, we already have Postgres security for the data, so what attack
vector has the user reading the RAM, but not seeing all the keys?  Aren't
client-supplied secrets a much better option for this?

> > > > > Ideally, all of this would leverage a vaulting system or other mechanism
> > > > > which manages access to the keys and allows their usage to be limited.
> > > > > That's been generally accepted as a good way to bridge the gap between
> > > > > having to ask users every time for a key and having keys stored
> > > > > long-term in memory.  Having *only* the keys for the data which the
> > > > > currently connected user is allowed to access would certainly be a great
> > > > > initial capability, even if system processes (including potentially WAL
> > > > > replay) have to have access to all of the keys.  And yes, shared buffers
> > > > > being unencrypted and accessible by every backend continues to be an
> > > > > issue- it'd be great to improve on that situation too.  I don't think
> > > > > having everything encrypted in shared buffers is likely the solution,
> > > > > rather, segregating it up might make more sense, again, along similar
> > > > > lines to keys and using metadata that's outside of the catalogs, which
> > > > > has been discussed previously, though I don't think anyone's actively
> > > > > working on it.
> > > > 
> > > > What is this trying to protect against?  Without a clear case, I don't
> > > > see what that complexity is buying us.
> > > 
> > > This is trying to protect against cross-domain leakage due to specific
> > > security vulnerabilities, similar to those just recently fixed, where a
> > > given backend is able to be compromised and is able to be used to run
> > > any code the attacker wishes inside of that backend's process.
> > 
> > I am not sure TLS is a good solution to that.
> 
> ... TLS?  Or do you mean TDE here..?

Sorry, yes, TDE.
> 
> I don't mean to throw this up as a requirement on day one of this
> feature, rather I'm trying to show where we could potentially go and why
> *this* flexibility (supporting a key per tablespace, specifically)
> would make sense and how it could help us build a more secure system in
> the future.

Understood.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On 7/8/19 6:04 PM, Stephen Frost wrote:
> * Bruce Momjian (bruce@momjian.us) wrote:
>> Uh, well, renaming the user was a big problem, but that is the only case
>> I can think of.  I don't see that as an issue for block or WAL sequence
>> numbers.  If we want to use a different nonce, we have to find a way to
>> store it or look it up efficiently.  Considering the nonce size, I don't
>> see how that is possible.
>
> No, this also meant that, as an attacker, I *knew* the salt ahead of
> time and therefore could build rainbow tables specifically for that
> salt.  I could also use those *same* tables for any system where that
> user had an account, even if they used different passwords on different
> systems...
>
> I appreciate that *some* of this might not be completely relevant for
> the way a nonce is used in cryptography, but I'd be very surprised to
> have a cryptographer tell me that a deterministic nonce didn't have
> similar issues or didn't reduce the value of the nonce significantly.

I have worked side by side on projects with bona fide cryptographers and
I can assure you that they recommended random nonces. Granted, that was
in the early 2000s, but I don't think "modern cryptography" has changed
that any more than "web scale" has made Postgres irrelevant in the
intervening years.

Related links:

https://defuse.ca/cbcmodeiv.htm
https://www.cryptofails.com/post/70059609995/crypto-noobs-1-initialization-vectors


Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


Greetings,

* Bruce Momjian (bruce@momjian.us) wrote:
> On Mon, Jul  8, 2019 at 06:04:46PM -0400, Stephen Frost wrote:
> > * Bruce Momjian (bruce@momjian.us) wrote:
> > > On Mon, Jul  8, 2019 at 05:41:51PM -0400, Stephen Frost wrote:
> > > > * Bruce Momjian (bruce@momjian.us) wrote:
> > > > > Well, if it was a necessary features, I assume TLS 1.3 would have found
> > > > > a way to make it secure, no?  Certainly they are not shipping TLS 1.3
> > > > > with a known weakness.
> > > >
> > > > As discussed below- this is about moving goalposts, and that's, in part
> > > > at least, why re-keying isn't a *necessary* feature of TLS.  As the
> > >
> > > I agree we have to allow rekeying and allow multiple unlocked keys in
> > > the server at the same time.  The open question is whether encrypting
> > > different data with different keys and different unlock controls is
> > > possible or useful.
> >
> > I'm not sure if there's really a question about if it's *possible*?  As
> > for if it's useful, I agree there's some debate.
>
> Right, it is easily possible to keep all keys unlocked, but the value is
> minimal, and the complexity will have a cost, which is my point.

Having them all unlocked but only accessible to certain privileged
processes is very different from having them unlocked and available to
every backend process.

> > > > amount of data you transmit over a given TLS connection increases
> > > > though, the risk increases and it would be better to re-key.  How much
> > > > better?  That depends a great deal on if someone is trying to mount an
> > > > attack or not.
> > >
> > > Yep, we need to allow rekey.
> >
> > Supporting a way to rekey is definitely a good idea.
>
> It is a requirement, I think.  We might have a problem tracking exactly
> which key _version_ each table (or 8k block) or WAL file uses.  :-(
> Ideally we would allow only two active keys, and somehow mark each page
> as using the odd or even key at a given time, or something strange.
> (Yeah, hand waving here.)

Well, that wouldn't be the ideal since it would limit us to some small
number of GBs of data written, based on the earlier discussion, right?

I'm not sure that I can see through to a system where we are rewriting
tables that are out on disk every time we hit 60GB of data written.

Or maybe I'm misunderstanding what you're suggesting here..?

> > > Uh, well, renaming the user was a big problem, but that is the only case
> > > I can think of.  I don't see that as an issue for block or WAL sequence
> > > numbers.  If we want to use a different nonce, we have to find a way to
> > > store it or look it up efficiently.  Considering the nonce size, I don't
> > > see how that is possible.
> >
> > No, this also meant that, as an attacker, I *knew* the salt ahead of
> > time and therefore could build rainbow tables specifically for that
> > salt.  I could also use those *same* tables for any system where that
> > user had an account, even if they used different passwords on different
> > systems...
>
> Yes, 'postgres' can be used to create a nice md5 rainbow table that
> works on many servers --- good point.  Are rainbow tables possible with
> something like AES?

I'm not a cryptographer, just to be clear...  but it sure seems like if
you know what the nonce is, and have a strong idea about at least what some
of the contents are, then you could work to pre-calculate a portion of
the encrypted data and be able to determine the key based on that.

> > I appreciate that *some* of this might not be completely relevant for
> > the way a nonce is used in cryptography, but I'd be very surprised to
> > have a cryptographer tell me that a deterministic nonce didn't have
> > similar issues or didn't reduce the value of the nonce significantly.
>
> This post:
>
>     https://stackoverflow.com/questions/36760973/why-is-random-iv-fine-for-aes-cbc-but-not-for-aes-gcm
>
> says:
>
>     GCM is a variation on Counter Mode (CTR).  As you say, with any variant
>     of Counter Mode, it is essential  that the Nonce is not repeated with
>     the same key.  Hence CTR mode  Nonces often include either a counter or
>     a timer element: something that  is guaranteed not to repeat over the
>     lifetime of the key.
>
> CTR is what we would use for WAL.  For 8k pages, we would use CBC, which
> says we need a random nonce.  I need to dig deeper into the ECB mode
> attack.

That page also says:

  Using a random IV / nonce for GCM has been specified as an official
  recommendation by - for instance - NIST. If anybody suggests differently
  then that's up to them.

and a recommendation by NIST certainly holds a lot of water, at least
for me.  They also have a recommendation regarding the amount of data to
encrypt with the same key, and that number is much lower than the 96-bit
randomness of the nonce, with a recommendation to use a
cryptographically sound random source, meaning that the chances of a duplicate
are extremely low.

> > > Uh, well, you are much less likely to get duplicate nonce values by
> > > using block number or WAL sequence number.  If you look at the
> > > implementations, few compute random nonce values.
> >
> > Which implementations..?  Where do their nonce values come from?  I can
> > see how a nonce might have to be naturally and deterministically random,
> > if the source for it is sufficiently varied across the key space, but
> > starting at '1' and going up with the same key seems like it's just
> > giving a potential attacker more information about what the encrypted
> > data contains...
>
> Well, in many modes the nonce is just a counter, but as stated above,
> not all modes.  I need to pull out my security books to remember for
> which ones it is safe.  (Frankly, it is a lot easier to use a random
> nonce for WAL than 8k pages.)

I do appreciate that, but given the recommendation that you can encrypt
gigabytes before needing to change, I don't know that we really gain a
lot by changing for every 8K page.

> > > And you base the random goal on what?  A nonce is a number used only once,
> > > and randomness is not a requirement.  You can say you prefer it, but
> > > why, because most implementations don't use random nonce.
> >
> > The encryption schemes I've worked with in the past have used a random
> > nonce, so I'm wondering where the disconnect is between us on that.
>
> OK.
>
> > > > > > When it comes to concerns about autovacuum or other system processes,
> > > > > > those don't have any direct user connections or interactions, so having
> > > > > > them be more privileged and having access to more is reasonable.
> > > > >
> > > > > Well, I am trying to understand the value of having some keys accessible
> > > > > by some parts of the system, and some not.  I am unclear what security
> > > > > value that has.
> > > >
> > > > A very real risk is a low-privilege process gaining access to the entire
> > > > backend process, and therefore being able to access anything that
> > > > backend is able to.
> > >
> > > Well, if they get to one key, they will get to them all, right?
> >
> > That's only the case if all the keys are accessible to a backend process
> > which is under a user's control.  That would certainly be a bad
> > situation and one which I'd hope we would avoid.  If the backend that
> > the user has access to only has access to a subset of the keys, then
> > while they might be able to access the other encrypted data, they
> > wouldn't be able to decrypt it.
>
> Uh, we already have Postgres security for the data, so what attack
> vector has the user reading the RAM, but not seeing all the keys?  Aren't
> client-supplied secrets a much better option for this?

I'm all for client-supplied secrets, just to be clear, but much of the
point of this effort is to reduce the burden on the application
developers (after all, that's what a lot of what we're doing in the data
layer is for...).

The attack vector, as discussed below, is where the attacker has
complete access to the backend process through some exploit that
bypasses the PG security controls.  We'd like to limit the exposure
from such a situation happening, by having large categories which can't
be breached by even an attacker who has completely compromised a backend.

Note that this will almost certainly involve the kernel, and that's why
multiple shared buffers would be needed, to make it so that a given
backend isn't actually able to access all of shared buffers, but rather,
it's only able to access that portion of the filesystem, and that
portion of shared buffers, and those keys which are able to decrypt the
data that they're, broadly, allowed to see.

Thanks,

Stephen

On Mon, Jul  8, 2019 at 06:23:13PM -0400, Bruce Momjian wrote:
> Yes, 'postgres' can be used to create a nice md5 rainbow table that
> works on many servers --- good point.  Are rainbow tables possible with
> something like AES?
> 
> > I appreciate that *some* of this might not be completely relevant for
> > the way a nonce is used in cryptography, but I'd be very surprised to
> > have a cryptographer tell me that a deterministic nonce didn't have
> > similar issues or didn't reduce the value of the nonce significantly.
> 
> This post:
> 
>     https://stackoverflow.com/questions/36760973/why-is-random-iv-fine-for-aes-cbc-but-not-for-aes-gcm
> 
> says:
> 
>     GCM is a variation on Counter Mode (CTR).  As you say, with any variant
>     of Counter Mode, it is essential  that the Nonce is not repeated with
>     the same key.  Hence CTR mode  Nonces often include either a counter or
>     a timer element: something that  is guaranteed not to repeat over the
>     lifetime of the key.
> 
> CTR is what we would use for WAL.  For 8k pages, we would use CBC, which
> says we need a random nonce.  I need to dig deeper into the ECB mode
> attack.

Looking here:

    https://stackoverflow.com/questions/36760973/why-is-random-iv-fine-for-aes-cbc-but-not-for-aes-gcm

I think the issue is that we can't use a _counter_ for the nonce since
each page-0 of each table would use the same nonce, and each page-1,
etc.  I assume we would use the table oid and page number as the nonce. 
We can't use the database oid since we copy the files from one database
to another via file system copy and not through the shared buffer cache
where they would be re-encrypted.  Using relfilenode seems dangerous.
For WAL I think it would be the WAL segment number.  It would be nice
to mix that with the "Database system identifier:", but are these the
same on primary and replicas?
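
As a sketch of what I mean (make_page_iv is invented, and the byte
layout is arbitrary):

    #include <stdint.h>
    #include <string.h>

    /*
     * Build a 16-byte IV from values we already have on hand.  Whether
     * mixing in the system identifier is safe across primary and
     * replicas is exactly the open question above.
     */
    static void
    make_page_iv(uint64_t sysid, uint32_t reloid, uint32_t blkno,
                 unsigned char iv[16])
    {
        memcpy(iv, &sysid, 8);          /* bytes 0..7:  system identifier */
        memcpy(iv + 8, &reloid, 4);     /* bytes 8..11: table oid */
        memcpy(iv + 12, &blkno, 4);     /* bytes 12..15: block number */
    }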

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Mon, Jul  8, 2019 at 06:43:31PM -0400, Stephen Frost wrote:
> Greetings,
> 
> * Bruce Momjian (bruce@momjian.us) wrote:
> > On Mon, Jul  8, 2019 at 06:04:46PM -0400, Stephen Frost wrote:
> > > * Bruce Momjian (bruce@momjian.us) wrote:
> > > > On Mon, Jul  8, 2019 at 05:41:51PM -0400, Stephen Frost wrote:
> > > > > * Bruce Momjian (bruce@momjian.us) wrote:
> > > > > > Well, if it was a necessary features, I assume TLS 1.3 would have found
> > > > > > a way to make it secure, no?  Certainly they are not shipping TLS 1.3
> > > > > > with a known weakness.
> > > > > 
> > > > > As discussed below- this is about moving goalposts, and that's, in part
> > > > > at least, why re-keying isn't a *necessary* feature of TLS.  As the
> > > > 
> > > > I agree we have to allow rekeying and allow multiple unlocked keys in
> > > > the server at the same time.  The open question is whether encrypting
> > > > different data with different keys and different unlock controls is
> > > > possible or useful.
> > > 
> > > I'm not sure if there's really a question about if it's *possible*?  As
> > > for if it's useful, I agree there's some debate.
> > 
> > Right, it is easily possible to keep all keys unlocked, but the value is
> > minimal, and the complexity will have a cost, which is my point.
> 
> Having them all unlocked but only accessible to certain privileged
> processes is very different from having them unlocked and available to
> every backend process.

Operationally, how would that work?  We unlock them all on boot but
somehow make them inaccessible to some backends after that?

> > > > > amount of data you transmit over a given TLS connection increases
> > > > > though, the risk increases and it would be better to re-key.  How much
> > > > > better?  That depends a great deal on if someone is trying to mount an
> > > > > attack or not.
> > > > 
> > > > Yep, we need to allow rekey.
> > > 
> > > Supporting a way to rekey is definitely a good idea.
> > 
> > It is a requirement, I think.  We might have a problem tracking exactly
> > which key _version_ each table (or 8k block) or WAL file uses.  :-(
> > Ideally we would allow only two active keys, and somehow mark each page
> > as using the odd or even key at a given time, or something strange. 
> > (Yeah, hand waving here.)
> 
> Well, that wouldn't be the ideal since it would limit us to some small
> number of GBs of data written, based on the earlier discussion, right?

No, it is GB per secret-nonce combination.

> I'm not sure that I can see through to a system where we are rewriting
> tables that are out on disk every time we hit 60GB of data written.
> 
> Or maybe I'm misunderstanding what you're suggesting here..?

See above.

> > > > Uh, well, renaming the user was a big problem, but that is the only case
> > > > I can think of.  I don't see that as an issue for block or WAL sequence
> > > > numbers.  If we want to use a different nonce, we have to find a way to
> > > > store it or look it up efficiently.  Considering the nonce size, I don't
> > > > see how that is possible.
> > > 
> > > No, this also meant that, as an attacker, I *knew* the salt ahead of
> > > time and therefore could build rainbow tables specifically for that
> > > salt.  I could also use those *same* tables for any system where that
> > > user had an account, even if they used different passwords on different
> > > systems...
> > 
> > Yes, 'postgres' can be used to create a nice md5 rainbow table that
> > works on many servers --- good point.  Are rainbow tables possible with
> > something like AES?
> 
> I'm not a cryptographer, just to be clear...  but it sure seems like if
> you know what the nonce is, and have a strong idea about at least what some
> of the contents are, then you could work to pre-calculate a portion of
> the encrypted data and be able to determine the key based on that.

Uh, well, you would think so, but for some reason AES just doesn't allow
that kind of attack, unless you brute force it trying every key.  The
nonce is only there to prevent someone from detecting that two encrypted
output pages originally contained the same contents.

> > > I appreciate that *some* of this might not be completely relevant for
> > > the way a nonce is used in cryptography, but I'd be very surprised to
> > > have a cryptographer tell me that a deterministic nonce didn't have
> > > similar issues or didn't reduce the value of the nonce significantly.
> > 
> > This post:
> > 
> >     https://stackoverflow.com/questions/36760973/why-is-random-iv-fine-for-aes-cbc-but-not-for-aes-gcm
> > 
> > says:
> > 
> >     GCM is a variation on Counter Mode (CTR).  As you say, with any variant
> >     of Counter Mode, it is essential  that the Nonce is not repeated with
> >     the same key.  Hence CTR mode  Nonces often include either a counter or
> >     a timer element: something that  is guaranteed not to repeat over the
> >     lifetime of the key.
> > 
> > CTR is what we would use for WAL.  For 8k pages, we would use CBC, which
> > says we need a random nonce.  I need to dig deeper into the ECB mode
> > attack.
> 
> That page also says:
> 
>   Using a random IV / nonce for GCM has been specified as an official
>   recommendation by - for instance - NIST. If anybody suggests differently
>   then that's up to them.

Well, if we could generate a random nonce easily, we would do that.  The
question is how important it is for our application.

> and a recommendation by NIST certainly holds a lot of water, at least
> for me.  They also have a recommendation regarding the amount of data to

Agreed.

> > Well, in many modes the nonce is just a counter, but as stated above,
> > not all modes.  I need to pull out my security books to remember for
> > which ones it is safe.  (Frankly, it is a lot easier to use a random
> > nonce for WAL than 8k pages.)
> 
> I do appreciate that, but given the recommendation that you can encrypt
> gigabytes before needing to change, I don't know that we really gain a
> lot by changing for every 8K page.

Uh, well, if you don't do that, you need to use the contents of the
previous page for the next page, and I think we want to encrypt each 8k
page independently of what was before it.

> > Uh, we already have Postgres security for the data, so what attack
> > vector has the user reading the RAM, but not seeing all the keys?  Aren't
> > client-supplied secrets a much better option for this?
> 
> I'm all for client-supplied secrets, just to be clear, but much of the
> point of this effort is to reduce the burden on the application
> developers (after all, that's what a lot of what we're doing in the data
> layer is for...).
> 
> The attack vector, as discussed below, is where the attacker has
> complete access to the backend process through some exploit that
> bypasses the PG security controls.  We'd like to limit the exposure
> from such a situation happening, by having large categories which can't
> be breached by even an attacker who has completely compromised a backend.

As far as I know, TDE was to prevent someone with file system access
from reading the data.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Greetings,

* Bruce Momjian (bruce@momjian.us) wrote:
> On Mon, Jul  8, 2019 at 06:43:31PM -0400, Stephen Frost wrote:
> > * Bruce Momjian (bruce@momjian.us) wrote:
> > > On Mon, Jul  8, 2019 at 06:04:46PM -0400, Stephen Frost wrote:
> > > > * Bruce Momjian (bruce@momjian.us) wrote:
> > > > > On Mon, Jul  8, 2019 at 05:41:51PM -0400, Stephen Frost wrote:
> > > > > > * Bruce Momjian (bruce@momjian.us) wrote:
> > > > > > > Well, if it was a necessary features, I assume TLS 1.3 would have found
> > > > > > > a way to make it secure, no?  Certainly they are not shipping TLS 1.3
> > > > > > > with a known weakness.
> > > > > >
> > > > > > As discussed below- this is about moving goalposts, and that's, in part
> > > > > > at least, why re-keying isn't a *necessary* feature of TLS.  As the
> > > > >
> > > > > I agree we have to allow rekeying and allow multiple unlocked keys in
> > > > > the server at the same time.  The open question is whether encrypting
> > > > > different data with different keys and different unlock controls is
> > > > > possible or useful.
> > > >
> > > > I'm not sure if there's really a question about if it's *possible*?  As
> > > > for if it's useful, I agree there's some debate.
> > >
> > > Right, it is easily possible to keep all keys unlocked, but the value is
> > > minimal, and the complexity will have a cost, which is my point.
> >
> > Having them all unlocked but only accessible to certain privileged
> > processes is very different from having them unlocked and available to
> > every backend process.
>
> Operationally, how would that work?  We unlock them all on boot but
> somehow make them inaccessible to some backends after that?

That could work and doesn't seem like an insurmountable challenge.  The
way that's been discussed, at least somewhere in the past, is leveraging
the exec backend framework to have the user-connected backends work in
an independent space from the processes launched at startup.

> > > > > > amount of data you transmit over a given TLS connection increases
> > > > > > though, the risk increases and it would be better to re-key.  How much
> > > > > > better?  That depends a great deal on if someone is trying to mount an
> > > > > > attack or not.
> > > > >
> > > > > Yep, we need to allow rekey.
> > > >
> > > > Supporting a way to rekey is definitely a good idea.
> > >
> > > It is a requirement, I think.  We might have a problem tracking exactly
> > > which key _version_ each table (or 8k block) or WAL file uses.  :-(
> > > Ideally we would allow only two active keys, and somehow mark each page
> > > as using the odd or even key at a given time, or something strange.
> > > (Yeah, hand waving here.)
> >
> > Well, that wouldn't be the ideal since it would limit us to some small
> > number of GBs of data written, based on the earlier discussion, right?
>
> No, it is GB per secret-nonce combination.

Hrmpf.  I'm trying to follow the logic that draws this conclusion.

As I understand it, the NIST recommendation is a 96-bit *random* nonce,
and then there's also a recommendation to not encrypt more than 2^32
messages- much less than the 96-bit random nonce, at least potentially
because that limits the repeat-nonce risk to a very low probability.

If the amount-you-can-encrypt is really per secret+nonce combination,
then how do those recommendations make sense..?  This is where I really
think we should be reading through and understanding exactly what the
NIST recommendations are and not just trying to follow through things on
stackoverflow.
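
For what it's worth, actually generating the kind of nonce that
recommendation describes is trivial (a sketch- random_nonce is a made-up
name, and RAND_bytes() is OpenSSL's CSPRNG interface):

    #include <openssl/rand.h>

    /*
     * A 96-bit (12-byte) random nonce.  By the birthday bound, after
     * 2^32 such nonces the chance of any repeat is roughly 2^-33,
     * which may be why the message-count limit is so much smaller
     * than the nonce space.
     */
    static int
    random_nonce(unsigned char nonce[12])
    {
        return RAND_bytes(nonce, 12) == 1;
    }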

> > I'm not sure that I can see through to a system where we are rewriting
> > tables that are out on disk every time we hit 60GB of data written.
> >
> > Or maybe I'm misunderstanding what you're suggesting here..?
>
> See above.

How long would these keys be active for then in the system..?  How much
data would they potentially be used to encrypt?  Strikes me as likely to
be an awful lot...

> > > > > Uh, well, renaming the user was a big problem, but that is the only case
> > > > > I can think of.  I don't see that as an issue for block or WAL sequence
> > > > > numbers.  If we want to use a different nonce, we have to find a way to
> > > > > store it or look it up efficiently.  Considering the nonce size, I don't
> > > > > see how that is possible.
> > > >
> > > > No, this also meant that, as an attacker, I *knew* the salt ahead of
> > > > time and therefore could build rainbow tables specifically for that
> > > > salt.  I could also use those *same* tables for any system where that
> > > > user had an account, even if they used different passwords on different
> > > > systems...
> > >
> > > Yes, 'postgres' can be used to create a nice md5 rainbow table that
> > > works on many servers --- good point.  Are rainbow tables possible with
> > > something like AES?
> >
> > I'm not a cryptographer, just to be clear...  but it sure seems like if
> > you know what the nonce is, and have a strong idea about at least what some
> > of the contents are, then you could work to pre-calculate a portion of
> > the encrypted data and be able to determine the key based on that.
>
> Uh, well, you would think so, but for some reason AES just doesn't allow
> that kind of attack, unless you brute force it trying every key.  The
> nonce is only there to prevent someone from detecting that two encrypted
> output pages originally contained the same contents.

That's certainly interesting, but such a brute-force with every key
would allow it, where, if you use a random nonce, then such an attack
would have to start working only after having access to the data, and
not be something that could be pre-computed.

> > > > I appreciate that *some* of this might not be completely relevant for
> > > > the way a nonce is used in cryptography, but I'd be very surprised to
> > > > have a cryptographer tell me that a deterministic nonce didn't have
> > > > similar issues or didn't reduce the value of the nonce significantly.
> > >
> > > This post:
> > >
> > >     https://stackoverflow.com/questions/36760973/why-is-random-iv-fine-for-aes-cbc-but-not-for-aes-gcm
> > >
> > > says:
> > >
> > >     GCM is a variation on Counter Mode (CTR).  As you say, with any variant
> > >     of Counter Mode, it is essential  that the Nonce is not repeated with
> > >     the same key.  Hence CTR mode  Nonces often include either a counter or
> > >     a timer element: something that  is guaranteed not to repeat over the
> > >     lifetime of the key.
> > >
> > > CTR is what we would use for WAL.  For 8k pages, we would use CBC, which
> > > says we need a random nonce.  I need to dig deeper into the ECB mode
> > > attack.
> >
> > That page also says:
> >
> >   Using a random IV / nonce for GCM has been specified as an official
> >   recommendation by - for instance - NIST. If anybody suggests differently
> >   then that's up to them.
>
> Well, if we could generate a random nonce easily, we would do that.  The
> question is how important it is for our application.

[...]

> > and a recommendation by NIST certainly holds a lot of water, at least
> > for me.  They also have a recommendation regarding the amount of data to
>
> Agreed.

This is just it though, at least from my perspective- we are saying "ok,
well, we know people recommend using a random nonce, but that's hard, so
we aren't going to do that because we don't think it's important for our
application", but we aren't cryptographers.  I liken this to whatever
discussion led to using the username as the salt for our md5
authentication method- great intentions, but not complete understanding,
leading to a less-than-desirable result.

When it comes to this stuff, I don't think we really get to pick and
choose what we follow and what we don't.  If the recommendation from an
authority says we should use random nonces, then we *really* need to
listen and do that, because that authority is a bunch of cryptographers
with a lot more experience and who have definitely spent a great deal
more time thinking about this than we have.

If there's a recommendation from such an authority that says we *don't*
need to use a random nonce, great, I'm happy to go review that and agree
with it, but discussions on stackoverflow or similar don't hold the same
weight that a recommendation from NIST does.

> > > Well, in many modes the nonce is just a counter, but as stated above,
> > > not all modes.  I need to pull out my security books to remember for
> > > which ones it is safe.  (Frankly, it is a lot easier to use a random
> > > nonce for WAL than 8k pages.)
> >
> > I do appreciate that, but given the recommendation that you can encrypt
> > gigabytes before needing to change, I don't know that we really gain a
> > lot by changing for every 8K page.
>
> Uh, well, if you don't do that, you need to use the contents of the
> previous page for the next page, and I think we want to encrypt each 8k
> page independently of what was before it.

I'm not sure that we really want to do this at the 8K level...  I'll
admit that I'm not completely sure *where* to draw that line then
though.

> > > Uh, we already have Postgres security for the data, so what attack
> > > vector has the user reading the RAM, but not seeing all the keys?  Aren't
> > > client-supplied secrets a much better option for this?
> >
> > I'm all for client-supplied secrets, just to be clear, but much of the
> > point of this effort is to reduce the burden on the application
> > developers (after all, that's what a lot of what we're doing in the data
> > layer is for...).
> >
> > The attack vector, as discussed below, is where the attacker has
> > complete access to the backend process through some exploit that
> > bypasses the PG security controls.  We'd like to limit the exposure
> > from such a situation happening, by having large categories which can't
> > be breached by even an attacker who has completely compromised a backend.
>
> As far as I know, TDE was to prevent someone with file system access
> from reading the data.

This seems pretty questionable, doesn't it?  Who gets access to a system
without having some access to what's running at the same time?  Perhaps
if the drive is stolen out from under the running system, but then that
could be protected against using filesystem-level encryption.  If we're
trying to mimic that, which by itself would be good, then wouldn't we
want to do so with similar capabilities- that is, by having
per-tablespace keys?  Since that's what someone running with filesystem
level encryption would have.  Of course, if they don't mount all the
filesystems they've got set up then they have problems, but that's their
choice.

In the end, having this bit of flexibility allows us to have the same
level of options that someone using filesystem-level encryption would
have, but it also starts us down the path to having something which
would work against another attack vector where someone has control over
a complete running backend.

Thanks,

Stephen

On Mon, Jul  8, 2019 at 07:27:12PM -0400, Stephen Frost wrote:
> * Bruce Momjian (bruce@momjian.us) wrote:
> > Operationally, how would that work?  We unlock them all on boot but
> > somehow make them inaccessible to some backends after that?
> 
> That could work and doesn't seem like an insurmountable challenge.  The
> way that's been discussed, at least somewhere in the past, is leveraging
> the exec backend framework to have the user-connected backends work in
> an independent space from the processes launched at startup.

Just do it in another cluster --- why bother with all that?

> > > > > > > amount of data you transmit over a given TLS connection increases
> > > > > > > though, the risk increases and it would be better to re-key.  How much
> > > > > > > better?  That depends a great deal on if someone is trying to mount an
> > > > > > > attack or not.
> > > > > > 
> > > > > > Yep, we need to allow rekey.
> > > > > 
> > > > > Supporting a way to rekey is definitely a good idea.
> > > > 
> > > > It is a requirement, I think.  We might have a problem tracking exactly
> > > > which key _version_ each table (or 8k block) or WAL file uses.  :-(
> > > > Ideally we would allow only two active keys, and somehow mark each page
> > > > as using the odd or even key at a given time, or something strange. 
> > > > (Yeah, hand waving here.)
> > > 
> > > Well, that wouldn't be the ideal since it would limit us to some small
> > > number of GBs of data written, based on the earlier discussion, right?
> > 
> > No, it is GB per secret-nonce combination.
> 
> Hrmpf.  I'm trying to follow the logic that draws this conclusion.
> 
> As I understand it, the NIST recommendation is a 96-bit *random* nonce,
> and then there's also a recommendation to not encrypt more than 2^32
> messages- much less than the 96-bit random nonce, at least potentially
> because that limits the repeat-nonce risk to a very low probability.
> 
> If the amount-you-can-encrypt is really per secret+nonce combination,
> then how do those recommendations make sense..?  This is where I really
> think we should be reading through and understanding exactly what the
> NIST recommendations are and not just trying to follow through things on
> stackoverflow.

Yes, it needs more research.

> > > I'm not sure that I can see through to a system where we are rewriting
> > > tables that are out on disk every time we hit 60GB of data written.
> > > 
> > > Or maybe I'm misunderstanding what you're suggesting here..?
> > 
> > See above.
> 
> How long would these keys be active for then in the system..?  How much
> data would they potentially be used to encrypt?  Strikes me as likely to
> be an awful lot...

I think we need to look at CTR vs GCM.

> > Uh, well, you would think so, but for some reason AES just doesn't allow
> > that kind of attack, unless you brute force it trying every key.  The
> > nonce is only there to prevent someone from detecting that two encrypted
> > output pages originally contained the same contents.
> 
> That's certainly interesting, but such a brute-force with every key
> would allow it, where, if you use a random nonce, then such an attack
> would have to start working only after having access to the data, and
> not be something that could be pre-computed.

Uh, the nonce is going to have to be unencrypted so it can be fed into
the crypto method, so it will be visible.

> > > and a recommendation by NIST certainly holds a lot of water, at least
> > > for me.  They also have a recommendation regarding the amount of data to
> > 
> > Agreed.
> 
> This is just it though, at least from my perspective- we are saying "ok,
> well, we know people recommend using a random nonce, but that's hard, so
> we aren't going to do that because we don't think it's important for our
> application", but we aren't cryptographers.  I liken this to whatever
> discussion led to using the username as the salt for our md5
> authentication method- great intentions, but not complete understanding,
> leading to a less-than-desirable result.
> 
> When it comes to this stuff, I don't think we really get to pick and
> choose what we follow and what we don't.  If the recommendation from an
> authority says we should use random nonces, then we *really* need to
> listen and do that, because that authority is a bunch of cryptographers
> with a lot more experience and who have definitely spent a great deal
> more time thinking about this than we have.
> 
> If there's a recommendation from such an authority that says we *don't*
> need to use a random nonce, great, I'm happy to go review that and agree
> with it, but discussions on stackoverflow or similar don't hold the same
> weight that a recommendation from NIST does.

Yes, we need to get some experts involved.

> > > > Well, in many modes the nonce is just a counter, but as stated above,
> > > > not all modes.  I need to pull out my security books to remember for
> > > > which ones it is safe.  (Frankly, it is a lot easier to use a random
> > > > nonce for WAL than 8k pages.)
> > > 
> > > I do appreciate that, but given the recommendation that you can encrypt
> > > gigabytes before needing to change, I don't know that we really gain a
> > > lot by changing for every 8K page.
> > 
> > Uh, well, if you don't do that, you need to use the contents of the
> > previous page for the next page, and I think we want to encrypt each 8k
> > page independently of what was before it.
> 
> I'm not sure that we really want to do this at the 8K level...  I'll
> admit that I'm not completely sure *where* to draw that line then
> though.

Uh, if you want more than 8k you will need to have surrounding 8k pages
in shared buffers, which seems unworkable.

> > As far as I know, TDE was to prevent someone with file system access
> > from reading the data.
> 
> This seems pretty questionable, doesn't it?  Who gets access to a system
> without having some access to what's running at the same time?  Perhaps
> if the drive is stolen out from under the running system, but then that
> could be protected against using filesystem-level encryption.  If we're
> trying to mimic that, which by itself would be good, then wouldn't we
> want to do so with similar capabilities- that is, by having
> per-tablespace keys?  Since that's what someone running with filesystem
> level encryption would have.  Of course, if they don't mount all the
> filesystems they've got set up then they have problems, but that's their
> choice.
> 
> In the end, having this bit of flexibility allows us to have the same
> level of options that someone using filesystem-level encryption would
> have, but it also starts us down the path to having something which
> would work against another attack vector where someone has control over
> a complete running backend.

Again, why not just use a different cluster?

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Hey everyone,

Here is my input regarding nonces and randomness.

> As I understand it, the NIST recommendation is a 96-bit *random* nonce,

I could not find that exact requirement in the NIST documents, though given the volume of that library it would be easy to miss.  The recommendation I repeatedly saw for the nonce was uniqueness.  There is also an important distinction: the nonce is not the Initialization Vector (IV), though it can be used as part of the IV; more on that later.

The most succinct definition for nonce I found was in SP-800-38A [1] page 4:  "A value that is only used once."
SP-800-90A [2] (page 6) expands on the definition: "A time-varying value that has at most a negligible chance of repeating, e.g., a random value that is generated anew for each use, a timestamp, a sequence number, or some combination of these."

The second definition references randomness but does not require it.  [1] (pg 19) reinforces the importance of uniqueness:  "A procedure should be established to ensure the uniqueness of the message nonces"


> That's certainly interesting, but such a brute-force with every key
> would allow it, where, if you use a random nonce, then such an attack
> would have to start working only after having access to the data, and
> not be something that could be pre-computed

An unpredictable IV can be generated using a non-random nonce including a counter, per [1] (pg 20):

"The first method is to apply the forward cipher function, under the same key that is used for the encryption of the
plaintext, to a nonce. The nonce must be a data block that is unique to each execution of the
encryption operation. For example, the nonce may be a counter, as described in Appendix B, or
a message number. The second method is to generate a random data block using a FIPS approved
random number generator."

A unique nonce gets passed through the cipher with the key; the uniqueness of the nonce is the strength of this method, and the key + cipher handle the randomness for the IV.  The second method listed above does require a random number generator, and if that method is chosen the generator must conform to [2].
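
A sketch of that first method (derive_iv is a made-up name; encrypting a single 16-byte block in ECB mode is one application of the forward cipher function):

    #include <openssl/evp.h>

    /*
     * SP 800-38A's first method for CBC IVs: apply the forward cipher,
     * under the data key, to a unique (not necessarily random) nonce.
     * Error handling is trimmed for brevity.
     */
    static int
    derive_iv(const unsigned char *key,         /* 32-byte AES-256 key */
              const unsigned char nonce[16],    /* unique per use */
              unsigned char iv[16])
    {
        EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
        int         len,
                    ok = 0;

        if (ctx != NULL &&
            EVP_EncryptInit_ex(ctx, EVP_aes_256_ecb(), NULL, key, NULL) == 1 &&
            EVP_CIPHER_CTX_set_padding(ctx, 0) == 1 &&
            EVP_EncryptUpdate(ctx, iv, &len, nonce, 16) == 1)
            ok = 1;

        EVP_CIPHER_CTX_free(ctx);
        return ok;
    }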

> I'm not a fan of the idea of using something which is predictable as a
> nonce.  Using the username as the salt for our md5 password mechanism
> was, all around, a bad idea.  This seems like it's repeating that
> mistake.

Yeah, that MD5 stuff wasn't the greatest.  With MD5 and the username as a salt, the salt is known and you only need to work out the password.  In reality, you only need to find a collision with that password; the high collision rate with MD5 (2^64) [3] made things really bad.  Collisions are not a significant problem today with AES, to the best of my knowledge.

Further, knowing the nonce gets you nowhere; it isn't the salt until it is run through the forward cipher with the encryption key.  Even with the nonce, the attacker doesn't know the salt unless they steal the key, and that's a bigger problem.

The strictest definition of nonce I found was in [2] (pg 19) defining nonces to use in the process of random generation:

"The nonce shall be either:
a. A value with at least (security_strength/2) bits of entropy, or
b. A value that is expected to repeat no more often than a (security_strength/2)-bit random
string would be expected to repeat."

Even there it is randomness (a) or uniqueness (b).

[1] https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-38a.pdf
[2] https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-90Ar1.pdf
[3] https://stackoverflow.com/questions/8852668/what-is-the-clash-rate-for-md5

Thanks,

Ryan Lambert
RustProof Labs
On Tue, Jul 9, 2019 at 3:39 AM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
>
> BTW how do you know this is what users want? Maybe they do, but then
> again - maybe they just see it as magic and don't realize the extra
> complexity (not just at the database level). In my experience users
> generally want more abstract things, like "Ensure data privacy in case
> of media theft," or "protection against evil DBA".
>

I think it's true that users generally want more abstract things at
the system design stage, which is why I've been considering the
functionality of TDE based on security standards such as PCI DSS. These
may set a high bar, but they are good material for defining the
requirements users will want.

BTW I've created a wiki page[1] for TDE summarizing the discussion. I
will keep it up-to-date but please feel free to update it.

[1] https://wiki.postgresql.org/wiki/Transparent_Data_Encryption

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



On Mon, Jul 08, 2019 at 06:24:40PM -0400, Joe Conway wrote:
>On 7/8/19 6:04 PM, Stephen Frost wrote:
>> * Bruce Momjian (bruce@momjian.us) wrote:
>>> Uh, well, renaming the user was a big problem, but that is the only case
>>> I can think of.  I don't see that as an issue for block or WAL sequence
>>> numbers.  If we want to use a different nonce, we have to find a way to
>>> store it or look it up efficiently.  Considering the nonce size, I don't
>>> see how that is possible.
>>
>> No, this also meant that, as an attacker, I *knew* the salt ahead of
>> time and therefore could build rainbow tables specifically for that
>> salt.  I could also use those *same* tables for any system where that
>> user had an account, even if they used different passwords on different
>> systems...
>>
>> I appreciate that *some* of this might not be completely relevant for
>> the way a nonce is used in cryptography, but I'd be very surprised to
>> have a cryptographer tell me that a deterministic nonce didn't have
>> similar issues or didn't reduce the value of the nonce significantly.
>
>I have worked side by side on projects with bona fide cryptographers and
>I can assure you that they recommended random nonces. Granted, that was
>in the early 2000s, but I don't think "modern cryptography" has changed
>that any more than "web scale" has made Postgres irrelevant in the
>intervening years.
>
>Related links:
>
>https://defuse.ca/cbcmodeiv.htm
>https://www.cryptofails.com/post/70059609995/crypto-noobs-1-initialization-vectors
>

AFAIK it very much depends on the encryption mode. CBC mode does require
random nonces; other modes may be fine even with sequences, as long as
the values are not reused. In that case we might even use the LSN, for
example. And I wonder if sha2(LSN) could be considered "random", but
maybe that's an entirely silly idea ...


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



On Mon, Jul 08, 2019 at 06:45:50PM -0400, Bruce Momjian wrote:
>On Mon, Jul  8, 2019 at 06:23:13PM -0400, Bruce Momjian wrote:
>> Yes, 'postgres' can be used to create a nice md5 rainbow table that
>> works on many servers --- good point.  Are rainbow tables possible with
>> something like AES?
>>
>> > I appreciate that *some* of this might not be completely relevant for
>> > the way a nonce is used in cryptography, but I'd be very surprised to
>> > have a cryptographer tell me that a deterministic nonce didn't have
>> > similar issues or didn't reduce the value of the nonce significantly.
>>
>> This post:
>>
>>     https://stackoverflow.com/questions/36760973/why-is-random-iv-fine-for-aes-cbc-but-not-for-aes-gcm
>>
>> says:
>>
>>     GCM is a variation on Counter Mode (CTR).  As you say, with any variant
>>     of Counter Mode, it is essential  that the Nonce is not repeated with
>>     the same key.  Hence CTR mode  Nonces often include either a counter or
>>     a timer element: something that  is guaranteed not to repeat over the
>>     lifetime of the key.
>>
>> CTR is what we use for WAL.  For 8k pages, we would use CBC, which says
>> we need a random nonce.  I need to dig deeper into the ECB mode attack.
>
>Looking here:
>
>    https://stackoverflow.com/questions/36760973/why-is-random-iv-fine-for-aes-cbc-but-not-for-aes-gcm
>
>I think the issue is that we can't use a _counter_ for the nonce since
>each page-0 of each table would use the same nonce, and each page-1,
>etc.  I assume we would use the table oid and page number as the nonce.
>We can't use the database oid since we copy the files from one database
>to another via file system copy and not through the shared buffer cache
>where they would be re-encrypted.  Using relfilenode seems dangerous.
>For WAL I think it would be the WAL segment number.  It would be nice
>to mix that with the "Database system identifier:", but are these the
>same on primary and replicas?
>

Can't we just store the nonce somewhere? What if for encrypted pages we
only use/encrypt (8kB - X bytes), where X bytes is just enough to store
the nonce and maybe some other encryption metadata (key ID?).

This would be similar to the "special" area on a page, except that that
relies on page header which is encrypted (and thus not accessible before
decrypting the page).

So encryption would:

1) encrypt the (8kB - X bytes) with nonce suitable for the used
   encryption mode (sequence, random, ...)

2) store the nonce / key ID etc. to the reserved space

and decryption would

1) look at the encryption metadata at the end (nonce, key ....)

2) decrypt the page using that info
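As a rough sketch of that layout (all names and sizes made up):

    #include <stdint.h>

    /* Hypothetical 8kB page with an unencrypted trailer holding the
     * encryption metadata. */
    #define BLCKSZ        8192
    #define ENC_META_LEN  20
    #define ENC_DATA_LEN  (BLCKSZ - ENC_META_LEN)

    typedef struct EncryptionPageTrailer
    {
        uint32_t key_id;      /* which key encrypted this page */
        uint8_t  nonce[16];   /* nonce / IV used for this page */
    } EncryptionPageTrailer;

    /* encrypt: cipher bytes [0, ENC_DATA_LEN) with a fresh nonce, then
     * write key_id and nonce into the trailer in the clear.
     * decrypt: read the trailer first, then decipher the rest with it. */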

Or maybe we could define a new relation fork for encrypted relations,
storing all this metadata (not sure if we need that just for the main
fork or for all forks including vm, fsm ...)?


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



On Mon, Jul 08, 2019 at 05:41:55PM -0400, Bruce Momjian wrote:
>On Mon, Jul  8, 2019 at 09:30:03PM +0200, Tomas Vondra wrote:
>> I think Bruce's proposal was to minimize the time the key is "unlocked"
>> in memory by only unlocking them when the user connects and supplies
>> some sort of secret (passphrase), and remove them from memory when the
>> user disconnects. So there's no way for the auxiliary processes to gain
>> access to those keys, because only the user knows the secret.
>
>I mentioned that because I thought that was the security value that
>people wanted.  While I can see the value, I don't see how it can be
>cleanly accomplished.  Keeping the keys unlocked at all times seems to
>be possible, but of much smaller value.
>
>Part of my goal in this discussion is to reverse the rush to implement
>and pick apart exactly what is possible, and desirable.
>
>> FWIW I have doubts this scheme actually measurably improves privacy in
>> practice, because most busy applications will end up having the keys in
>> the memory all the time anyway.
>
>Yep.
>
>> It also assumes memory is unsafe, i.e. bad actors can read it, and
>> that's probably a valid concern (root access, vulnerabilities etc.). But
>> in that case we already have plenty of issues with data in flight
>> anyway, and I doubt TDE is an answer to that.
>
>Agreed.
>
>> > Ideally, all of this would leverage a vaulting system or other mechanism
>> > which manages access to the keys and allows their usage to be limited.
>> > That's been generally accepted as a good way to bridge the gap between
>> > having to ask users every time for a key and having keys stored
>> > long-term in memory.
>>
>> Right. I agree with this.
>>
>> > Having *only* the keys for the data which the
>> > currently connected user is allowed to access would certainly be a great
>> > initial capability, even if system processes (including potentially WAL
>> > replay) have to have access to all of the keys.  And yes, shared buffers
>> > being unencrypted and accessible by every backend continues to be an
>> > issue- it'd be great to improve on that situation too.  I don't think
>> > having everything encrypted in shared buffers is likely the solution,
>> > rather, segregating it up might make more sense, again, along similar
>> > lines to keys and using metadata that's outside of the catalogs, which
>> > has been discussed previously, though I don't think anyone's actively
>> > working on it.
>> >
>>
>> I very much doubt TDE is a solution to this. Essentially, TDE is a good
>> data-at-rest solution, but this seems more like protecting data during
>> execution. And in that case I think we may need an entirely different
>> encryption scheme.
>
>I thought client-level encryption or pgcrypto-style encryption fits that
>need better.
>

I'm not sure client-level encryption is something people really want.

It essentially means moving a lot of the logic to the client. For
example, when you want to do grouping or joins on encrypted columns, we
can't do that in the database when only the client knows the key. And we
know how terribly half-baked those client-side implementations are, even
without the additional encryption complexity.

So it's more a case of not having a better in-database solution :-(

pgcrypto is a bit ugly and tedious, but in general it's a step in the
right direction. The main issues are (1) the key management (or lack of
it) and (2) essentially decrypting once and then processing plaintext in
the rest of the query.

I think (1) could be improved if we had a key vault of some sorts (which
we might get as part of TDE) and (2) might be improved by having special
encrypted data types, with operations offloaded to a trusted execution
environment (TrustZone, SGX, ...).

But that's a very separate topic, unrelated to TDE.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



On Mon, Jul 8, 2019 at 11:20 PM Bruce Momjian <bruce@momjian.us> wrote:
>
> On Mon, Jul  8, 2019 at 06:04:28PM +0900, Masahiko Sawada wrote:
> > On Sun, Jul 7, 2019 at 1:05 AM Bruce Momjian <bruce@momjian.us> wrote:
> > > What about referential integrity constraints that need to check primary
> > > keys in the encrypted tables?  I also don't see a way of delaying that,
> > > and if you can't do referential integrity into the encrypted tables, it
> > > reduces the value of having encrypted data in the same database rather
> > > than in another database or cluster?
> >
> > I just thought that PostgreSQL's auxiliary processes such as
> > autovacuum, startup, checkpointer, and bgwriter should always be able
> > to access all keys because they are already inside the database. Even
> > today these processes don't check any privileges when accessing
> > data. What security threats can we protect data from by requiring
> > privileges even for auxiliary processes? If this is a security problem,
> > isn't it also true for cluster-wide encryption? I guess that processes
> > which have an access privilege on the table can always get the
> > corresponding encryption key, and no process can access an encryption
> > key directly without accessing a database object.
>
> Well, see my list of three things that users want in an earlier email:
>
>         https://www.postgresql.org/message-id/20190706160514.b67q4f7abcxfdahk@momjian.us
>
> When people are asking for multiple keys (not just for key rotation),
> they are asking to have multiple keys that can be supplied by users only
> when they need to access the data.  Yes, the keys are always in the
> database, but the feature request is that they are only unlocked when the
> user needs to access the data.  Obviously, that will not work for
> autovacuum when the encryption is at the block level.

I got your point. I also felt that client-side encryption, or
encryption during execution (by using pgcrypto with triggers and
views), would fit these requirements better.

>
> If the key is always unlocked, there is questionable security value of
> having multiple keys, beyond key rotation.
>
> > > I still feel we have not clearly described what the options are:
> > >
> > > 1.  Encrypt everything
> > >
> > > 2.  Encrypt only some tables (for performance reasons), and use only one
> > > key, or use multiple keys to allow for key rotation.  All keys are
> > > always unlocked.
> > >
> > > 3.  Encrypt only some tables with different keys, and the keys are not
> > > always unlocked.
> > >
> > > As Tomas already stated, using tablespaces to distinguish encrypted from
> > > non-encrypted tables doesn't make sense since the file system used for
> > > the storage is immaterial to the encryptions status. An easier way would
> > > be to just add a bit to WAL that would indicate if the rest of the WAL
> > > record is encrypted, though I doubt the performance boost is worth the
> > > complexity.
> >
> > Okay, instead of using tablespaces we can create groups grouping
> > tables being encrypted with the same key. I think the one of the most
> > important point here is to provide a granular encryption feature and
>
> Why is this important?  What are you trying to accomplish?

Because it not only suppresses the performance overhead of
encryption/decryption but also reduces the amount of data that has to
be encrypted.

Having fewer keys in the database would also make the implementation
simpler, especially for recovery. Since system caches are not available
during recovery, we might need a cache mechanism for keys if we have
several thousand keys in the database.

>
> > have less the number of keys in database cluster, not to provide per
> > tablespace encryption feature. I'm not going to insist it should be
> > per tablespace encryption.
>
> It is unclear which item you are looking for.  Which number are you
> suggesting from the three listed above in the email URL?
>

Sorry, I'm referring to #2.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



On 2019-07-08 18:09, Joe Conway wrote:
> In my mind, and in practice to a
> large extent, a postgres tablespace == a unique mount point.

But a critical difference is that in file systems, a separate mount
point has its own journal.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



On 7/9/19 4:34 AM, Tomas Vondra wrote:
> On Mon, Jul 08, 2019 at 06:45:50PM -0400, Bruce Momjian wrote:
>>On Mon, Jul  8, 2019 at 06:23:13PM -0400, Bruce Momjian wrote:
>>> Yes, 'postgres' can be used to create a nice md5 rainbow table that
>>> works on many servers --- good point.  Are rainbow tables possible with
>>> something like AES?
>>>
>>> > I appreciate that *some* of this might not be completely relevant for
>>> > the way a nonce is used in cryptography, but I'd be very surprised to
>>> > have a cryptographer tell me that a deterministic nonce didn't have
>>> > similar issues or didn't reduce the value of the nonce significantly.
>>>
>>> This post:
>>>
>>>     https://stackoverflow.com/questions/36760973/why-is-random-iv-fine-for-aes-cbc-but-not-for-aes-gcm
>>>
>>> says:
>>>
>>>     GCM is a variation on Counter Mode (CTR).  As you say, with any variant
>>>     of Counter Mode, it is essential  that the Nonce is not repeated with
>>>     the same key.  Hence CTR mode  Nonces often include either a counter or
>>>     a timer element: something that  is guaranteed not to repeat over the
>>>     lifetime of the key.
>>>
>>> CTR is what we use for WAL.  For 8k pages, we would use CBC, which says
>>> we need a random nonce.  I need to dig deeper into the ECB mode attack.
>>
>>Looking here:
>>
>>    https://stackoverflow.com/questions/36760973/why-is-random-iv-fine-for-aes-cbc-but-not-for-aes-gcm
>>
>>I think the issue is that we can't use a _counter_ for the nonce since
>>each page-0 of each table would use the same nonce, and each page-1,
>>etc.  I assume we would use the table oid and page number as the nonce.
>>We can't use the database oid since we copy the files from one database
>>to another via file system copy and not through the shared buffer cache
>>where they would be re-encrypted.  Using relfilenode seems dangerous.
>>For WAL I think it would be the WAL segment number.  It would be nice
>>to mix that with the "Database system identifier:", but are these the
>>same on primary and replicas?
>>
>
> Can't we just store the nonce somewhere? What if for encrypted pages we
> only use/encrypt (8kB - X bytes), where X bytes is just enough to store
> the nonce and maybe some other encryption metadata (key ID?).
>
> This would be similar to the "special" area on a page, except that that
> relies on page header which is encrypted (and thus not accessible before
> decrypting the page).
>
> So encryption would:
>
> 1) encrypt the (8kB - X bytes) with nonce suitable for the used
>    encryption mode (sequence, random, ...)
>
> 2) store the nonce / key ID etc. to the reserved space
>
> and decryption would
>
> 1) look at the encryption metadata at the end (nonce, key ....)
>
> 2) decrypt the page using that info

That is pretty much what I had been envisioning.

> Or maybe we could define a new relation fork for encrypted relations,
> storing all this metadata (not sure if we need that just for the main
> fork or for all forks including vm, fsm ...)?

I like the idea of a fork if it is workable.

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


On 7/9/19 4:23 AM, Tomas Vondra wrote:
> On Mon, Jul 08, 2019 at 06:24:40PM -0400, Joe Conway wrote:
>>On 7/8/19 6:04 PM, Stephen Frost wrote:
>>> * Bruce Momjian (bruce@momjian.us) wrote:
>>>> Uh, well, renaming the user was a big problem, but that is the only case
>>>> I can think of.  I don't see that as an issue for block or WAL sequence
>>>> numbers.  If we want to use a different nonce, we have to find a way to
>>>> store it or look it up efficiently.  Considering the nonce size, I don't
>>>> see how that is possible.
>>>
>>> No, this also meant that, as an attacker, I *knew* the salt ahead of
>>> time and therefore could build rainbow tables specifically for that
>>> salt.  I could also use those *same* tables for any system where that
>>> user had an account, even if they used different passwords on different
>>> systems...
>>>
>>> I appreciate that *some* of this might not be completely relevant for
>>> the way a nonce is used in cryptography, but I'd be very surprised to
>>> have a cryptographer tell me that a deterministic nonce didn't have
>>> similar issues or didn't reduce the value of the nonce significantly.
>>
>>I have worked side by side on projects with bona fide cryptographers and
>>I can assure you that they recommended random nonces. Granted, that was
>>in the early 2000s, but I don't think "modern cryptography" has changed
>>that any more than "web scale" has made Postgres irrelevant in the
>>intervening years.
>>
>>Related links:
>>
>>https://defuse.ca/cbcmodeiv.htm
>>https://www.cryptofails.com/post/70059609995/crypto-noobs-1-initialization-vectors
>>
>
> AFAIK it very much depends on the encryption mode. CBC mode does require
> random nonces; other modes may be fine even with sequences, as long as
> the values are not reused. In that case we might even use the LSN, for
> example. And I wonder if sha2(LSN) could be considered "random", but
> maybe that's an entirely silly idea ...


Yeah, we worked mostly with CBC so that could be the case in terms of
what is required. But I don't think a random nonce is ever a bad idea.

But as Stephen pointed out elsewhere on this thread, I think we should
be getting our guidance from places like NIST, which has actual experts
in this stuff.

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


On 7/9/19 6:07 AM, Peter Eisentraut wrote:
> On 2019-07-08 18:09, Joe Conway wrote:
>> In my mind, and in practice to a
>> large extent, a postgres tablespace == a unique mount point.
>
> But a critical difference is that in file systems, a separate mount
> point has its own journal.

While it would be ideal to have separate WAL, and even separate shared
buffer pools, per tablespace, I think that is too much complexity for
the first implementation and we could have a single separate key for all
WAL for now. The main thing I don't think we want is e.g. a 50TB
database with everything encrypted with a single key -- for the reasons
previously stated.

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


Hi Tomas,

> CBC mode does require
> random nonces; other modes may be fine even with sequences, as long as
> the values are not reused.

I disagree that CBC mode requires random nonces, at least based on what NIST has published.  They only require that the IV (not the nonce) must be unpredictable per [1]:

" For the CBC and CFB modes, the IVs must be unpredictable."

The unpredictable IV can be generated from a non-random nonce including a counter:

"There are two recommended methods for generating unpredictable IVs. The first method is to apply the forward cipher function, under the same key that is used for the encryption of the plaintext, to a nonce. The nonce must be a data block that is unique to each execution of the encryption operation. For example, the nonce may be a counter, as described in Appendix B, or a message number."


Thanks,
Ryan Lambert


On 7/9/19 8:39 AM, Ryan Lambert wrote:
> Hi Tomas,
>
>> CBC mode does require
>> random nonces; other modes may be fine even with sequences, as long as
>> the values are not reused.
>
> I disagree that CBC mode requires random nonces, at least based on what
> NIST has published.  They only require that the IV (not the nonce) must
> be unpredictable per [1]:
>
> " For the CBC and CFB modes, the IVs must be unpredictable."
>
> The unpredictable IV can be generated from a non-random nonce including
> a counter:
>
> "There are two recommended methods for generating unpredictable IVs. The
> first method is to apply the forward cipher function, under the same key
> that is used for the encryption of the plaintext, to a nonce. The nonce
> must be a data block that is unique to each execution of the encryption
> operation. For example, the nonce may be a counter, as described in
> Appendix B, or a message number."
>
> [1] https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-38a.pdf


The terms nonce and IV are often used more-or-less interchangeably, and
it is important to be clear when we are talking about an IV specifically:
an IV is a specific type of nonce. Nonce means "number used once", i.e.
unique, whereas an IV (for CBC use anyway) should be unique and random,
but not necessarily kept secret. The NIST requirements that Stephen
referenced elsewhere on this thread are, as I understand it, intended to
ensure the random-but-unique property with high probability.

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


On Tue, Jul  9, 2019 at 08:01:35AM -0400, Joe Conway wrote:
> On 7/9/19 6:07 AM, Peter Eisentraut wrote:
> > On 2019-07-08 18:09, Joe Conway wrote:
> >> In my mind, and in practice to a
> >> large extent, a postgres tablespace == a unique mount point.
> > 
> > But a critical difference is that in file systems, a separate mount
> > point has its own journal.
> 
> While it would be ideal to have separate WAL, and even separate shared
> buffer pools, per tablespace, I think that is too much complexity for
> the first implementation and we could have a single separate key for all
> WAL for now. 

Agreed.  I have thought about this some more.  There is certainly value
in layered security, so if something gets violated, it doesn't open the
whole system.  However, I think the layering has to be done at the right
levels, and I think you want levels that have clear boundaries, like IP
filtering or monitoring.  Placing a boundary inside the database seems
much too complex a level to be effective.  Using separate encrypted and
unencrypted clusters and allowing the encrypted cluster to query the
unencrypted clusters using FDWs does seem like good layering, though the
FDW queries might leak information.

> The main thing I don't think we want is e.g. a 50TB
> database with everything encrypted with a single key -- for the reasons
> previously stated.

Yes, I think we need to research in which cases the nonce must be
random, and how much key space the secret+nonce gives us.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Mon, Jul  8, 2019 at 06:45:50PM -0400, Bruce Momjian wrote:
> On Mon, Jul  8, 2019 at 06:23:13PM -0400, Bruce Momjian wrote:
> > Yes, 'postgres' can be used to create a nice md5 rainbow table that
> > works on many servers --- good point.  Are rainbow tables possible with
> > something like AES?
> > 
> > > I appreciate that *some* of this might not be completely relevant for
> > > the way a nonce is used in cryptography, but I'd be very surprised to
> > > have a cryptographer tell me that a deterministic nonce didn't have
> > > similar issues or didn't reduce the value of the nonce significantly.
> > 
> > This post:
> > 
> >     https://stackoverflow.com/questions/36760973/why-is-random-iv-fine-for-aes-cbc-but-not-for-aes-gcm
> > 
> > says:
> > 
> >     GCM is a variation on Counter Mode (CTR).  As you say, with any variant
> >     of Counter Mode, it is essential  that the Nonce is not repeated with
> >     the same key.  Hence CTR mode  Nonces often include either a counter or
> >     a timer element: something that  is guaranteed not to repeat over the
> >     lifetime of the key.
> > 
> > CTR is what we use for WAL.  For 8k pages, we would use CBC, which says
> > we need a random nonce.  I need to dig deeper into the ECB mode attack.
> 
> Looking here:
> 
>     https://stackoverflow.com/questions/36760973/why-is-random-iv-fine-for-aes-cbc-but-not-for-aes-gcm
> 
> I think the issue is that we can't use a _counter_ for the nonce since
> each page-0 of each table would use the same nonce, and each page-1,
> etc.  I assume we would use the table oid and page number as the nonce. 
> We can't use the database oid since we copy the files from one database
> to another via file system copy and not through the shared buffer cache
> where they would be re-encrypted.  Using relfilenode seems dangerous.

FYI, pg_upgrade already preserves the pg_class.oid, which is why I
recommended it over pg_class.relfilenode:


https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/bin/pg_upgrade/pg_upgrade.c;h=ff78929707ef12699a7579274693f6020c54c755;hb=HEAD#l14

    We control all assignments of pg_class.oid (and relfilenode) so toast
    oids are the same between old and new clusters.  This is important
    because toast oids are stored as toast pointers in user tables.
    
    While pg_class.oid and pg_class.relfilenode are initially the same
    in a cluster, they can diverge due to CLUSTER, REINDEX, or VACUUM
    FULL.  In the new cluster, pg_class.oid and pg_class.relfilenode will
    be the same and will match the old pg_class.oid value.  Because of
    this, old/new pg_class.relfilenode values will not match if CLUSTER,
    REINDEX, or VACUUM FULL have been performed in the old cluster.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Mon, Jul  8, 2019 at 09:57:57PM -0600, Ryan Lambert wrote:
> Hey everyone,
> 
> Here is my input regarding nonces and randomness.
> 
> > As I understand it, the NIST recommendation is a 96-bit *random* nonce,
> 
> I could not find that exact requirement in the NIST documents, though given the
> volume of that library it would be possible to miss.  The recommendation I
> repeatedly saw for the nonce was unique.  There is also an important
> distinction that the nonce is not the Initialization Vector (IV), it can be
> used as part of the IV, more on that later.
> 
> The most succinct definition for nonce I found was in SP-800-38A [1] page 4:
>  "A value that is only used once."
> SP-800-90A [2] (page 6) expands on the definition: "A time-varying value that
> has at most a negligible chance of repeating, e.g., a random value that is
> generated anew for each use, a timestamp, a sequence number, or some
> combination of these."
> 
> The second definition references randomness but does not require it.  [1] (pg
> 19) reinforces the importance of uniqueness:  "A procedure should be
> established to ensure the uniqueness of the message nonces"

Yes, that is what I remembered but the URL I referenced stated
randomness is preferred.  I was hopeful that whatever was preferring
randomness was trying to avoid a problem we didn't have.

> Further, knowing the nonce gets you nowhere; it isn't the salt (the IV)
> until it is run through the forward cipher with the encryption key.  Even
> with the nonce in hand, an attacker doesn't know the salt unless they
> steal the key, and that's a bigger problem.

Yes, I had forgotten about that step --- good point, meaning that the
nonce for block zero is different for every encryption key.

> The strictest definition of nonce I found was in [2] (pg 19) defining nonces to
> use in the process of random generation:
> 
> "The nonce shall be either:
> a. A value with at least (security_strength/2) bits of entropy, or
> b. A value that is expected to repeat no more often than a (security_strength/
> 2)-bit random
> string would be expected to repeat."
> 
> Even there it is randomness (a) or uniqueness (b).

Thanks, this was very helpful.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Tue, Jul  9, 2019 at 10:34:06AM +0200, Tomas Vondra wrote:
> > I think the issue is that we can't use a _counter_ for the nonce since
> > each page-0 of each table would use the same nonce, and each page-1,
> > etc.  I assume we would use the table oid and page number as the nonce.
> > We can't use the database oid since we copy the files from one database
> > to another via file system copy and not through the shared buffer cache
> > where they would be re-encrypted.  Using relfilenode seems dangerous.
> > For WAL I think it would be the WAL segment number.  It would be nice
> > to mix that with the "Database system identifier:", but are these the
> > same on primary and replicas?
> > 
> 
> Can't we just store the nonce somewhere? What if for encrypted pages we
> only use/encrypt (8kB - X bytes), where X bytes is just enough to store
> the nonce and maybe some other encryption metadata (key ID?).

Storing the nonce on each 8k page is going to add complexity, so I am
trying to figure out if it is a security requirement.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Greetings,

* Bruce Momjian (bruce@momjian.us) wrote:
> On Tue, Jul  9, 2019 at 08:01:35AM -0400, Joe Conway wrote:
> > On 7/9/19 6:07 AM, Peter Eisentraut wrote:
> > > On 2019-07-08 18:09, Joe Conway wrote:
> > >> In my mind, and in practice to a
> > >> large extent, a postgres tablespace == a unique mount point.
> > >
> > > But a critical difference is that in file systems, a separate mount
> > > point has its own journal.
> >
> > While it would be ideal to have separate WAL, and even separate shared
> > buffer pools, per tablespace, I think that is too much complexity for
> > the first implementation and we could have a single separate key for all
> > WAL for now.

I agree that all of that isn't necessary for an initial implementation;
I was rather trying to lay out how we could improve on this in the
future, and why having the keying done at the tablespace level makes
sense initially: we can then potentially move forward with further
segregation to improve the situation.  I do believe it's also useful in
its own right, to be clear, just not as nice, since a compromised backend
could still get access to data in shared buffers that it really
shouldn't be able to, even broadly, see.

> Agreed.  I have thought about this some more.  There is certainly value
> in layered security, so if something gets violated, it doesn't open the
> whole system.  However, I think the layering has to be done at the right
> levels, and I think you want levels that have clear boundaries, like IP
> filtering or monitoring.  Placing a boundary inside the database seems
> much too complex a level to be effective.  Using separate encrypted and
> unencrypted clusters and allowing the encrypted cluster to query the
> unencrypted clusters using FDWs does seem like good layering, though the
> FDW queries might leak information.

Using FDWs simply isn't a solution to this, for a few different reasons:
the first is that our solution to authentication for FDWs is to store
passwords in our catalog tables, and an FDW table also doesn't behave
like a regular table in many important cases.

> > The main thing I don't think we want is e.g. a 50TB
> > database with everything encrypted with a single key -- for the reasons
> > previously stated.
>
> Yes, I think we need to research in which cases the nonce must be
> random, and how much key space the secret+nonce gives us.

Agreed on both.

Thanks,

Stephen

On Tue, Jul  9, 2019 at 10:59:12AM -0400, Stephen Frost wrote:
> * Bruce Momjian (bruce@momjian.us) wrote:
> I agree that all of that isn't necessary for an initial implementation;
> I was rather trying to lay out how we could improve on this in the
> future, and why having the keying done at the tablespace level makes
> sense initially: we can then potentially move forward with further
> segregation to improve the situation.  I do believe it's also useful in
> its own right, to be clear, just not as nice, since a compromised backend
> could still get access to data in shared buffers that it really
> shouldn't be able to, even broadly, see.

I think TDE is a feature of questionable value at best, and the idea
that we would fundamentally change the internals of Postgres to add more
features to it seems very unlikely.  I realize we have to discuss it so
we don't block reasonable future feature development.

> > Agreed.  I have thought about this some more.  There is certainly value
> > in layered security, so if something gets violated, it doesn't open the
> > whole system.  However, I think the layering has to be done at the right
> > levels, and I think you want levels that have clear boundaries, like IP
> > filtering or monitoring.  Placing a boundary inside the database seems
> > much too complex a level to be effective.  Using separate encrypted and
> > unencrypted clusters and allowing the encrypted cluster to query the
> > unencrypted clusters using FDWs does seem like good layering, though the
> > FDW queries might leak information.
> 
> Using FDWs simply isn't a solution to this, for a few different reasons:
> the first is that our solution to authentication for FDWs is to store
> passwords in our catalog tables, and an FDW table also doesn't behave
> like a regular table in many important cases.

The FDW authentication problem is something I think we need to improve
no matter what.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Tue, Jul  9, 2019 at 09:16:17AM -0400, Joe Conway wrote:
> On 7/9/19 8:39 AM, Ryan Lambert wrote:
> > Hi Tomas,
> > 
> >> CBC mode does require
> >> random nonces; other modes may be fine even with sequences, as long as
> >> the values are not reused.
> > 
> > I disagree that CBC mode requires random nonces, at least based on what
> > NIST has published.  They only require that the IV (not the nonce) must
> > be unpredictable per [1]:
> > 
> > " For the CBC and CFB modes, the IVs must be unpredictable."
> > 
> > The unpredictable IV can be generated from a non-random nonce including
> > a counter:
> > 
> > "There are two recommended methods for generating unpredictable IVs. The
> > first method is to apply the forward cipher function, under the same key
> > that is used for the encryption of the plaintext, to a nonce. The nonce
> > must be a data block that is unique to each execution of the encryption
> > operation. For example, the nonce may be a counter, as described in
> > Appendix B, or a message number."
> > 
> > [1] https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-38a.pdf
> 
> 
> The terms nonce and IV are often used more-or-less interchangeably, and
> it is important to be clear when we are talking about an IV specifically
> - an IV is a specific type of nonce. Nonce means "number used once".
> i.e. unique, whereas an IV (for CBC use anyway) should be unique and
> random but not necessarily kept secret. The NIST requirements that
> Stephen referenced elsewhere on this thread are as I understand it
> intended to ensure the random but unique property with high probability.

Good point about nonce and IV.  I wonder if running the nonce through
the cipher with the key makes it random enough to use as an IV.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Greetings,

* Bruce Momjian (bruce@momjian.us) wrote:
> On Tue, Jul  9, 2019 at 10:59:12AM -0400, Stephen Frost wrote:
> > * Bruce Momjian (bruce@momjian.us) wrote:
> > I agree that all of that isn't necessary for an initial implementation;
> > I was rather trying to lay out how we could improve on this in the
> > future, and why having the keying done at the tablespace level makes
> > sense initially: we can then potentially move forward with further
> > segregation to improve the situation.  I do believe it's also useful in
> > its own right, to be clear, just not as nice, since a compromised backend
> > could still get access to data in shared buffers that it really
> > shouldn't be able to, even broadly, see.
>
> I think TDE is a feature of questionable value at best, and the idea
> that we would fundamentally change the internals of Postgres to add more
> features to it seems very unlikely.  I realize we have to discuss it so
> we don't block reasonable future feature development.

We'd be getting to something much better than just TDE by going down
that road: we'd be able to properly leverage the kernel to enforce real
MAC (mandatory access control).  I get that this would be a change, but
I'm not entirely convinced that it'd be as much of a fundamental change
as implied here.  I expect that we're going to get to a point where we
want to have multiple shared buffer segments for other reasons anyway.

> > > Agreed.  I have thought about this some more.  There is certainly value
> > > in layered security, so if something gets violated, it doesn't open the
> > > whole system.  However, I think the layering has to be done at the right
> > > levels, and I think you want levels that have clear boundaries, like IP
> > > filtering or monitoring.  Placing a boundary inside the database seems
> > > much too complex a level to be effective.  Using separate encrypted and
> > > unencrypted clusters and allowing the encrypted cluster to query the
> > > unencrypted clusters using FDWs does seem like good layering, though the
> > > FDW queries might leak information.
> >
> > Using FDWs simply isn't a solution to this, for a few different reasons:
> > the first is that our solution to authentication for FDWs is to store
> > passwords in our catalog tables, and an FDW table also doesn't behave
> > like a regular table in many important cases.
>
> The FDW authentication problem is something I think we need to improve
> no matter what.

Yes, constrained delegation with Kerberos would certainly be an
improvement, and having a way to do something like peer auth when local,
and maybe even a server-to-server trust based on certificates or similar
might be an option.

Thanks,

Stephen

On 7/9/19 11:11 AM, Bruce Momjian wrote:
> On Tue, Jul  9, 2019 at 09:16:17AM -0400, Joe Conway wrote:
>> On 7/9/19 8:39 AM, Ryan Lambert wrote:
>> > Hi Tomas,
>> >
>> >> CBC mode does require
>> >> random nonces; other modes may be fine even with sequences, as long as
>> >> the values are not reused.
>> >
>> > I disagree that CBC mode requires random nonces, at least based on what
>> > NIST has published.  They only require that the IV (not the nonce) must
>> > be unpredictable per [1]:
>> >
>> > " For the CBC and CFB modes, the IVs must be unpredictable."
>> >
>> > The unpredictable IV can be generated from a non-random nonce including
>> > a counter:
>> >
>> > "There are two recommended methods for generating unpredictable IVs. The
>> > first method is to apply the forward cipher function, under the same key
>> > that is used for the encryption of the plaintext, to a nonce. The nonce
>> > must be a data block that is unique to each execution of the encryption
>> > operation. For example, the nonce may be a counter, as described in
>> > Appendix B, or a message number."
>> >
>> > [1] https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-38a.pdf
>>
>>
>> The terms nonce and IV are often used more-or-less interchangeably, and
>> it is important to be clear when we are talking about an IV specifically
>> - an IV is a specific type of nonce. Nonce means "number used once".
>> i.e. unique, whereas an IV (for CBC use anyway) should be unique and
>> random but not necessarily kept secret. The NIST requirements that
>> Stephen referenced elsewhere on this thread are as I understand it
>> intended to ensure the random but unique property with high probability.
>
> Good point about nonce and IV.  I wonder if running the nonce through
> the cipher with the key makes it random enough to use as an IV.

Based on that NIST document it seems so.

The trick will be to be 100% sure we never reuse a nonce that is used to
produce the IV when using the same key.

I think the potential to get that wrong (i.e. inadvertently reuse a
nonce) would lead to using the second described method:

  "The second method is to generate a random data block using a
   FIPS-approved random number generator."

That method is what I am used to seeing. But with the second method we
need to store the IV; with the first we could reproduce it if we select
our initial nonce carefully.

So thinking out loud, and perhaps you already said this Bruce, but I
guess the input nonce used to generate the IV could be something like
pg_class.oid and blocknum concatenated together with some delimiting
character, as long as we guarantee that we generate different keys in
different databases. Then there would be no need to store the IV, since
we could reproduce it. This all assumes that we encrypt each block
independently. Sound correct?
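Purely as a sketch of that packing (names made up, assuming a 16-byte
cipher block):

    #include <stdint.h>
    #include <string.h>

    /* Pack pg_class.oid and the block number into one 16-byte nonce
     * block.  It is unique per (relation, block), so with per-database
     * keys it can be fed to the forward cipher to reproduce the IV on
     * demand, and nothing needs to be stored on the page. */
    static void
    make_page_nonce(uint32_t relid, uint32_t blkno, unsigned char nonce[16])
    {
        memset(nonce, 0, 16);
        memcpy(nonce, &relid, sizeof(relid));     /* bytes 0-3 */
        memcpy(nonce + 4, &blkno, sizeof(blkno)); /* bytes 4-7 */
        /* bytes 8-15 stay zero (or could carry a fork number etc.) */
    }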

Joe
--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


On Tue, Jul 9, 2019 at 02:09:38PM -0400, Joe Conway wrote:
> On 7/9/19 11:11 AM, Bruce Momjian wrote:
> > Good point about nonce and IV.  I wonder if running the nonce
> > through the cipher with the key makes it random enough to use as an
> > IV.
>
> Based on that NIST document it seems so.
>
> The trick will be to be 100% sure we never reuse a nonce that is used
> to produce the IV when using the same key.
>
> I think the potential to get that wrong (i.e. inadvertently reuse a
> nonce) would lead to using the second described method
>
>   "The second method is to generate a random data block using a
>   FIPS-approved random number generator."
>
> That method is what I am used to seeing. But with the second method
> we need to store the IV, with the first we could reproduce it if we
> select our initial nonce carefully.
>
> So thinking out loud, and perhaps you already said this Bruce, but I
> guess the input nonce used to generate the IV could be something like
> pg_class.oid and blocknum concatenated together with some delimiting
> character as long as we guarantee that we generate different keys in
> different databases. Then there would be no need to store the IV since
> we could reproduce it.

Uh, yes, and no.  Yes, we can use the pg_class.oid (since it has to
be preserved by pg_upgrade anyway), and the page number.  However,
different databases can have the same pg_class.oid/page number
combination, so there would be duplication between databases.  Now, you
might say let's add the pg_database.oid, but unfortunately, because of
the way we file-system-copy files from one database to another during
database creation (it doesn't go through shared buffers), we can't use
pg_database.oid as part of the nonce.

My only idea here is that we actually decrypt/re-encrypt pages as we
copy them at the file system level during database creation to match the
new pg_database.oid.  This would allow pg_database.oid in the nonce/IV.
(I think we will need to modify pg_upgrade to preserve pg_database.oid.)

If the nonce/IV is 96 bits, then that is 12 bytes or 3 4-byte values.
pg_class.oid is 4 bytes, pg_database.oid is 4 bytes, and that leaves
4 bytes for the block number, which gets us to 32TB before the page
counter would overflow a 4-byte value, and our max table size is 32TB
anyway, so that all works.
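So the 96-bit nonce layout would look something like this (sketch only,
field names made up):

    #include <stdint.h>

    /* 96-bit nonce as three 4-byte values.  A 4-byte block number
     * covers 2^32 * 8kB = 32TB, the current maximum table size. */
    typedef struct PageNonce
    {
        uint32_t dbid;   /* pg_database.oid, if pg_upgrade preserves it */
        uint32_t relid;  /* pg_class.oid, already preserved by pg_upgrade */
        uint32_t blkno;  /* block number within the relation */
    } PageNonce;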

> This all assumes that we encrypt each block independently. Sound
> correct?

Yes, I think 8k encryption granularity is a requirement.  If not, you
would need to potentially load and write multiple 8k pages for a single
8k page change, which seems very complex.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On 7/9/19 3:50 PM, Bruce Momjian wrote:
> On Tue, Jul 9, 2019 at 02:09:38PM -0400, Joe Conway wrote:
>> On 7/9/19 11:11 AM, Bruce Momjian wrote:
>> > Good point about nonce and IV.  I wonder if running the nonce
>> > through the cipher with the key makes it random enough to use as an
>> > IV.
>>
>> Based on that NIST document it seems so.
>>
>> The trick will be to be 100% sure we never reuse a nonce that is used
>> to produce the IV when using the same key.
>>
>> I think the potential to get that wrong (i.e. inadvertently reuse a
>> nonce) would lead to using the second described method
>>
>>   "The second method is to generate a random data block using a
>>   FIPS-approved random number generator."
>>
>> That method is what I am used to seeing. But with the second method
>> we need to store the IV, with the first we could reproduce it if we
>> select our initial nonce carefully.
>>
>> So thinking out loud, and perhaps you already said this Bruce, but I
>> guess the input nonce used to generate the IV could be something like
>> pg_class.oid and blocknum concatenated together with some delimiting
>> character as long as we guarantee that we generate different keys in
>> different databases. Then there would be no need to store the IV since
>> we could reproduce it.
>
> Uh, yes, and no.  Yes, we can use the pg_class.oid (since it has to
> be preserved by pg_upgrade anyway), and the page number.  However,
> different databases can have the same pg_class.oid/page number
> combination, so there would be duplication between databases.

But as I said, "as long as we guarantee that we generate different keys
in different databases". The IV only needs to be unique for a given key.
The combination of oid and page number, when run through the cipher,
should always produce a unique IV with high probability. And if we
generate random keys with sufficient entropy, the chances of collision
should approach zero.

> If the nonce/IV is 96 bits, then that is 12 bytes or 3 4-byte values.
> pg_class.oid is 4 bytes, pg_database.oid is 4 bytes, and that leaves
> 4-bytes for the block number, which gets us to 32TB before the page
> counter would overflow a 4-byte value, and our max table size is 32TB
> anyway, so that all works.

The IV will be the same size as the algorithm block size (128 bits for
AES). It gets XOR'd with the first block of plaintext in CBC. The nonce
used to produce the IV does not need to be the same size, but by running
it through the cipher the output IV will be the correct size.
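For example, a 12-byte nonce zero-padded to a single 16-byte block and
run through AES comes out as exactly the 16 bytes needed for the IV.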

https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation
https://en.wikipedia.org/wiki/Initialization_vector

>> This all assumes that we encrypt each block independently. Sound
>> correct?
>
> Yes, I think 8k encryption granularity is a requirement.  If not, you
> would need to potentially load and write multiple 8k pages for a single
> 8k page change, which seems very complex.

Exactly, and it would also be terrible for performance.

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


On Tue, Jul 09, 2019 at 03:50:39PM -0400, Bruce Momjian wrote:
>On Tue, Jul 9, 2019 at 02:09:38PM -0400, Joe Conway wrote:
>> On 7/9/19 11:11 AM, Bruce Momjian wrote:
>> > Good point about nonce and IV.  I wonder if running the nonce
>> > through the cipher with the key makes it random enough to use as an
>> > IV.
>>
>> Based on that NIST document it seems so.
>>
>> The trick will be to be 100% sure we never reuse a nonce that is used
>> to produce the IV when using the same key.
>>
>> I think the potential to get that wrong (i.e. inadvertently reuse a
>> nonce) would lead to using the second described method
>>
>>   "The second method is to generate a random data block using a
>>   FIPS-approved random number generator."
>>
>> That method is what I am used to seeing. But with the second method
>> we need to store the IV, with the first we could reproduce it if we
>> select our initial nonce carefully.
>>
>> So thinking out loud, and perhaps you already said this Bruce, but I
>> guess the input nonce used to generate the IV could be something like
>> pg_class.oid and blocknum concatenated together with some delimiting
>> character as long as we guarantee that we generate different keys in
>> different databases. Then there would be no need to store the IV since
>> we could reproduce it.
>
>Uh, yes, and no.  Yes, we can use the pg_class.oid (since it has to
>be preserved by pg_upgrade anyway), and the page number.  However,
>different databases can have the same pg_class.oid/page number
>combination, so there would be duplication between databases.  Now, you
>might say let's add the pg_database.oid, but unfortunately, because of
>the way we file-system-copy files from one database to another during
>database creation (it doesn't go through shared buffers), we can't use
>pg_database.oid as part of the nonce.
>
>My only idea here is that we actually decrypt/re-encrypt pages as we
>copy them at the file system level during database creation to match the
>new pg_database.oid.  This would allow pg_database.oid in the nonce/IV.
>(I think we will need to modify pg_upgrade to preserve pg_database.oid.)
>

Or you could just encrypt them with a different key, and you would not
need to make database OID part of the nonce.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



On 7/9/19 4:12 PM, Tomas Vondra wrote:
> On Tue, Jul 09, 2019 at 03:50:39PM -0400, Bruce Momjian wrote:
>>On Tue, Jul 9, 2019 at 02:09:38PM -0400, Joe Conway wrote:

>>> the input nonce used to generate the IV could be something like
>>> pg_class.oid and blocknum concatenated together with some delimiting
>>> character as long as we guarantee that we generate different keys in
>>> different databases.

<snip>

> Or you could just encrypt them with a different key, and you would not
> need to make database OID part of the nonce.

Yeah that was pretty much exactly what I was trying to say above ;-)

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


On 2019-Jul-09, Joe Conway wrote:

> > Or you could just encrypt them with a different key, and you would not
> > need to make database OID part of the nonce.
> 
> Yeah that was pretty much exactly what I was trying to say above ;-)

So you need to decrypt each file and encrypt again when doing CREATE
DATABASE?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



On Tue, Jul 09, 2019 at 05:06:45PM -0400, Alvaro Herrera wrote:
>On 2019-Jul-09, Joe Conway wrote:
>
>> > Or you could just encrypt them with a different key, and you would not
>> > need to make database OID part of the nonce.
>>
>> Yeah that was pretty much exactly what I was trying to say above ;-)
>
>So you need to decrypt each file and encrypt again when doing CREATE
>DATABASE?
>

The question is whether we actually need to do that? Do we change OIDs
of relations when creating the database? If not, we don't need to
re-encrypt because having copies of the same block encrypted with the
same nonce is not an issue (just like copying encrypted files is not an
issue).

Of course, we may need a CREATE DATABASE option that would force
re-encryption with a different key, but it's not necessary because of
nonces or whatnot.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



On 2019-Jul-09, Tomas Vondra wrote:

> On Tue, Jul 09, 2019 at 05:06:45PM -0400, Alvaro Herrera wrote:
> > On 2019-Jul-09, Joe Conway wrote:
> > 
> > > > Or you could just encrypt them with a different key, and you would not
> > > > need to make database OID part of the nonce.
> > > 
> > > Yeah that was pretty much exactly what I was trying to say above ;-)
> > 
> > So you need to decrypt each file and encrypt again when doing CREATE
> > DATABASE?
> 
> The question is whether we actually need to do that?

I mean if the new database is supposed to be encrypted with key B, you
can't just copy the files from the other database, since they are
encrypted with key A, right?  Even if you consider that both copies of
each table have the same OID and each block has the same nonce.

> Do we change OIDs of relations when creating the database? If not, we
> don't need to re-encrypt because having copies of the same block
> encrypted with the same nonce is not an issue (just like copying
> encrypted files is not an issue).

Are you thinking that the files can be decrypted by the two keys
somehow?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Greetings,

* Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:
> On Tue, Jul 09, 2019 at 05:06:45PM -0400, Alvaro Herrera wrote:
> >On 2019-Jul-09, Joe Conway wrote:
> >
> >>> Or you could just encrypt them with a different key, and you would not
> >>> need to make database OID part of the nonce.
> >>
> >>Yeah that was pretty much exactly what I was trying to say above ;-)
> >
> >So you need to decrypt each file and encrypt again when doing CREATE
> >DATABASE?
>
> The question is whether we actually need to do that? Do we change OIDs
> of relations when creating the database? If not, we don't need to
> re-encrypt because having copies of the same block encrypted with the
> same nonce is not an issue (just like copying encrypted files is not an
> issue).
>
> Of course, we may need a CREATE DATABASE option that would force
> re-encryption with a different key, but it's not necessary because of
> nonces or whatnot.

This also depends on if we actually encrypt the template databases.
Seems like that could be optional, if we're supporting different keys
for different databases.

In that case we'd need the "encrypt this database" option during CREATE
DATABASE, of course.

Thanks,

Stephen

On Tue, Jul 09, 2019 at 03:50:39PM -0400, Bruce Momjian wrote:
>On Tue, Jul 9, 2019 at 02:09:38PM -0400, Joe Conway wrote:
>> On 7/9/19 11:11 AM, Bruce Momjian wrote:
>> > Good point about nonce and IV.  I wonder if running the nonce
>> > through the cipher with the key makes it random enough to use as an
>> > IV.
>>
>> Based on that NIST document it seems so.
>>
>> The trick will be to be 100% sure we never reuse a nonce that is used
>> to produce the IV when using the same key.
>>
>> I think the potential to get that wrong (i.e. inadvertently reuse a
>> nonce) would lead to using the second described method
>>
>>   "The second method is to generate a random data block using a
>>   FIPS-approved random number generator."
>>
>> That method is what I am used to seeing. But with the second method
>> we need to store the IV, with the first we could reproduce it if we
>> select our initial nonce carefully.
>>
>> So thinking out loud, and perhaps you already said this Bruce, but I
>> guess the input nonce used to generate the IV could be something like
>> pg_class.oid and blocknum concatenated together with some delimiting
>> character as long as we guarantee that we generate different keys in
>> different databases. Then there would be no need to store the IV since
>> we could reproduce it.
>
>Uh, yes, and no.  Yes, we can use the pg_class.oid (since it has to
>be preserved by pg_upgrade anyway), and the page number.  However,
>different databases can have the same pg_class.oid/page number
>combination, so there would be duplication between databases.  Now, you
>might say let's add the pg_database.oid, but unfortunately, because of
>the way we file-system-copy files from one database to another during
>database creation (it doesn't go through shared buffers), we can't use
>pg_database.oid as part of the nonce.
>
>My only idea here is that we actually decrypt/re-encrypt pages as we
>copy them at the file system level during database creation to match the
>new pg_database.oid.  This would allow pg_database.oid in the nonce/IV.
>(I think we will need to modify pg_upgrade to preserve pg_database.oid.)
>
>If the nonce/IV is 96 bits, then that is 12 bytes or 3 4-byte values.
>pg_class.oid is 4 bytes, pg_database.oid is 4 bytes, and that leaves
>4-bytes for the block number, which gets us to 32TB before the page
>counter would overflow a 4-byte value, and our max table size is 32TB
>anyway, so that all works.
>

I don't think that works, because that'd mean we're encrypting the same
page with the same nonce over and over, which means reusing the nonce
(even if you hash/encrypt it). Or did I miss something?

There are two basic ways to construct nonces - CSPRNG and sequences, and
then a combination of both, i.e. one part is generated from a sequence
and one randomly.

FWIW not sure using OIDs as nonces directly is a good idea, as those are
inherently low entropy data - how often do you see databases with OIDs
above 1M or so? Probably not very often, and in most cases those are
databases where those OIDs are for OIDs and large objects, so irrelevant
for this purpose. I might be wrong but having a 96-bit nonce with maybe
just 32 bits of entropy seems suspicious.

That does not mean we can't use the OIDs at all, but maybe hashing them
into a single 4B value, and then picking the remaining 8B randomly.
Also, we have a "natural" sequence in the database - LSNs, maybe that
would be a good source of nonces too?
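
To make that concrete, here is a minimal sketch of what I have in mind
(purely illustrative; the helper name is mine, OpenSSL used just for
brevity):

    #include <stdint.h>
    #include <string.h>
    #include <openssl/rand.h>
    #include <openssl/sha.h>

    /*
     * Build a 12-byte nonce: 4 bytes from a hash of the identifying
     * values, 8 bytes from a CSPRNG.  The random part would of course
     * have to be stored somewhere.
     */
    static void
    build_nonce(uint32_t dboid, uint32_t reloid, uint32_t blkno,
                unsigned char nonce[12])
    {
        unsigned char buf[12];
        unsigned char hash[SHA256_DIGEST_LENGTH];

        memcpy(buf, &dboid, 4);
        memcpy(buf + 4, &reloid, 4);
        memcpy(buf + 8, &blkno, 4);
        SHA256(buf, sizeof(buf), hash);

        memcpy(nonce, hash, 4);             /* 4B digest prefix */
        (void) RAND_bytes(nonce + 4, 8);    /* 8B random */
    }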


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



On Tue, Jul 09, 2019 at 05:31:49PM -0400, Alvaro Herrera wrote:
>On 2019-Jul-09, Tomas Vondra wrote:
>
>> On Tue, Jul 09, 2019 at 05:06:45PM -0400, Alvaro Herrera wrote:
>> > On 2019-Jul-09, Joe Conway wrote:
>> >
>> > > > Or you could just encrypt them with a different key, and you would not
>> > > > need to make database OID part of the nonce.
>> > >
>> > > Yeah that was pretty much exactly what I was trying to say above ;-)
>> >
>> > So you need to decrypt each file and encrypt again when doing CREATE
>> > DATABASE?
>>
>> The question is whether we actually need to do that?
>
>I mean if the new database is supposed to be encrypted with key B, you
>can't just copy the files from the other database, since they are
>encrypted with key A, right?  Even if you consider that both copies of
>each table have the same OID and each block has the same nonce.
>

Sure, if the databases are supposed to be encrypted with different keys,
then we may need to re-encrypt the files. I don't see a way around that,
but maybe we could use the master key scheme somehow.

>> Do we change OIDs of relations when creating the database? If not, we
>> don't need to re-encrypt because having copies of the same block
>> encrypted with the same nonce is not an issue (just like copying
>> encrypted files is not an issue).
>
>Are you thinking that the files can be decrypted by the two keys
>somehow?
>

No, I was kinda assuming the database will start with the same key, but
that might have been a silly idea.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



On 7/9/19 5:42 PM, Tomas Vondra wrote:
> There are two basic ways to construct nonces - CSPRNG and sequences, and
> then a combination of both, i.e. one part is generated from a sequence
> and one randomly.
>
> FWIW not sure using OIDs as nonces directly is a good idea, as those are
> inherently low entropy data - how often do you see databases with OIDs
> above 1M or so? Probably not very often, and in most cases those are
> databases where those OIDs are for OIDs and large objects, so irrelevant
> for this purpose. I might be wrong but having a 96-bit nonce with maybe
> just 32 bits of entropy seems suspicious.
>
> That does not mean we can't use the OIDs at all, but maybe hashing them
> into a single 4B value, and then picking the remaining 8B randomly.
> Also, we have a "natural" sequence in the database - LSNs, maybe that
> would be a good source of nonces too?

I think you missed the quoted part (upthread) from the NIST document:

  "There are two recommended methods for generating unpredictable IVs.
   The first method is to apply the forward cipher  function, under the
   same key that is used for the encryption of the plaintext, to a
   nonce. The nonce must be a data block that is unique to each
   execution of the encryption operation. For example, the nonce may be
   a counter, as described in Appendix B, or a message number. The
   second method is to generate a random data block using a
   FIPS-approved random number generator."

That first method says a counter as input produces an acceptably
unpredictable IV as long as it is unique to each encryption operation.
If each page is going to be an "encryption operation", then as long as our
input nonce is unique for a given key, we should be ok. If the input
nonce is tableoid+pagenum and the key is different per database (at
least, hopefully different per tablespace too), we should be good to go,
at least from what I can see.
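
For illustration, a sketch of that first method using OpenSSL's EVP
interface (the function name and error handling are mine, not a proposal
for the actual code):

    #include <stdint.h>
    #include <string.h>
    #include <openssl/evp.h>

    /*
     * NIST SP 800-38A, first method: run a nonce that is unique per key
     * (here tableoid+pagenum) through the forward cipher function -- a
     * single AES block in ECB mode, padding disabled -- to produce an
     * unpredictable IV.
     */
    static int
    derive_iv(const unsigned char key[32], uint32_t tableoid,
              uint32_t blkno, unsigned char iv[16])
    {
        unsigned char nonce[16] = {0};
        EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
        int len, ok;

        memcpy(nonce, &tableoid, 4);
        memcpy(nonce + 4, &blkno, 4);      /* remaining bytes stay zero */

        ok = ctx != NULL
            && EVP_EncryptInit_ex(ctx, EVP_aes_256_ecb(), NULL, key, NULL)
            && EVP_CIPHER_CTX_set_padding(ctx, 0)
            && EVP_EncryptUpdate(ctx, iv, &len, nonce, 16);

        EVP_CIPHER_CTX_free(ctx);
        return ok;
    }

Since the nonce is reproducible from the catalog and the block address,
nothing extra would need to be stored.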

Joe
--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


Greetings,

* Joe Conway (mail@joeconway.com) wrote:
> On 7/9/19 5:42 PM, Tomas Vondra wrote:
> > There are two basic ways to construct nonces - CSPRNG and sequences, and
> > then a combination of both, i.e. one part is generated from a sequence
> > and one randomly.
> >
> > FWIW not sure using OIDs as nonces directly is a good idea, as those are
> > inherently low entropy data - how often do you see databases with OIDs
> > above 1M or so? Probably not very often, and in most cases those are
> > databases where those OIDs are for OIDs and large objects, so irrelevant
> > for this purpose. I might be wrong but having a 96-bit nonce with maybe
> > just 32 bits of entropy seems suspicious.
> >
> > That does not mean we can't use the OIDs at all, but maybe hashing them
> > into a single 4B value, and then picking the remaining 8B randomly.
> > Also, we have a "natural" sequence in the database - LSNs, maybe that
> > would be a good source of nonces too?
>
> I think you missed the quoted part (upthread) from the NIST document:
>
>   "There are two recommended methods for generating unpredictable IVs.
>    The first method is to apply the forward cipher  function, under the
>    same key that is used for the encryption of the plaintext, to a
>    nonce. The nonce must be a data block that is unique to each
>    execution of the encryption operation. For example, the nonce may be
>    a counter, as described in Appendix B, or a message number. The
>    second method is to generate a random data block using a
>    FIPS-approved random number generator."
>
> That first method says a counter as input produces an acceptably
> unpredictable IV as long as it is unique to each encryption operation.
> If each page is going to be an "encryption operation", then as long as our
> input nonce is unique for a given key, we should be ok. If the input
> nonce is tableoid+pagenum and the key is different per database (at
> least, hopefully different per tablespace too), we should be good to go,
> at least from what I can see.

What I think Tomas is getting at here is that we don't write a page only
once.

A nonce of tableoid+pagenum will only be unique the first time we write
out that page.  Seems unlikely that we're only going to be writing these
pages once though- what we need is a nonce that's unique for *every
write* of the 8k page, isn't it?  As every write of the page is going to
be encrypting something new.

With sufficient randomness, we can at least be more likely to have a
unique nonce for each 8K write.  Including the LSN seems like it'd be a
possible alternative.

Thanks,

Stephen

> What I think Tomas is getting at here is that we don't write a page only
> once.

> A nonce of tableoid+pagenum will only be unique the first time we write
> out that page.  Seems unlikely that we're only going to be writing these
> pages once though- what we need is a nonce that's unique for *every
> write* of the 8k page, isn't it?  As every write of the page is going to
>  be encrypting something new.

> With sufficient randomness, we can at least be more likely to have a
> unique nonce for each 8K write.  Including the LSN seems like it'd be a
> possible alternative.

Agreed.  I know little of the inner details about the LSN but what I read in [1] sounds encouraging in addition to tableoid + pagenum.

[1] https://www.postgresql.org/docs/current/datatype-pg-lsn.html

Ryan Lambert


Greetings,

* Ryan Lambert (ryan@rustprooflabs.com) wrote:
> > What I think Tomas is getting at here is that we don't write a page only
> > once.
>
> > A nonce of tableoid+pagenum will only be unique the first time we write
> > out that page.  Seems unlikely that we're only going to be writing these
> > pages once though- what we need is a nonce that's unique for *every
> > write* of the 8k page, isn't it?  As every write of the page is going to
> >  be encrypting something new.
>
> > With sufficient randomness, we can at least be more likely to have a
> > unique nonce for each 8K write.  Including the LSN seems like it'd be a
> > possible alternative.
>
> Agreed.  I know little of the inner details about the LSN but what I read
> in [1] sounds encouraging in addition to tableoid + pagenum.
>
> [1] https://www.postgresql.org/docs/current/datatype-pg-lsn.html

Yes, but it's still something that we'd have to store somewhere- the
actual LSN of the page is going to be in the 8K block.

Unless we decide that we can pull the LSN *out* of the 8K block and
store it unencrypted, and then store the *rest* of the block
encrypted...  That might also allow things like backup software to work
on these encrypted data files for page-level backups without needing
access to the key and that'd be pretty neat.

Of course, as with anything, the more data you expose, the higher the
overall risk that someone can figure out some meaning from it.  Still,
if the idea was that we'd use the LSN in this way, then it'd need to be
stored unencrypted regardless...

Thanks,

Stephen

If a random number were generated instead, its result would need to be stored somewhere too, correct?

> That might also allow things like backup software to work
> on these encrypted data files for page-level backups without needing
> access to the key and that'd be pretty neat.

+1

Ryan


 
Greetings,

* Ryan Lambert (ryan@rustprooflabs.com) wrote:
> If a random number were generated instead its result would need to be
> stored somewhere too, correct?

Yes.

Thanks,

Stephen

On Tue, Jul 9, 2019 at 9:01 PM Joe Conway <mail@joeconway.com> wrote:
>
> On 7/9/19 6:07 AM, Peter Eisentraut wrote:
> > On 2019-07-08 18:09, Joe Conway wrote:
> >> In my mind, and in practice to a
> >> large extent, a postgres tablespace == a unique mount point.
> >
> > But a critical difference is that in file systems, a separate mount
> > point has its own journal.
>
> While it would be ideal to have separate WAL, and even separate shared
> buffer pools, per tablespace, I think that is too much complexity for
> the first implementation and we could have a single separate key for all
> WAL for now.

If we encrypt different tables with different keys I think we need to
encrypt WAL with the same keys as we used for tables, as per
discussion so far. And we would need to encrypt each WAL record, not
whole WAL 8k pages.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



On Tue, Jul 9, 2019 at 10:16 PM Joe Conway <mail@joeconway.com> wrote:
>
> On 7/9/19 8:39 AM, Ryan Lambert wrote:
> > Hi Tomas,
> >
> >> CBC mode does require
> >> random nonces, other modes may be fine with even sequences as long as
> >> the values are not reused.
> >
> > I disagree that CBC mode requires random nonces, at least based on what
> > NIST has published.  They only require that the IV (not the nonce) must
> > be unpredictable per [1]:
> >
> > " For the CBC and CFB modes, the IVs must be unpredictable."
> >
> > The unpredictable IV can be generated from a non-random nonce including
> > a counter:
> >
> > "There are two recommended methods for generating unpredictable IVs. The
> > first method is to apply the forward cipher function, under the same key
> > that is used for the encryption of the plaintext, to a nonce. The nonce
> > must be a data block that is unique to each execution of the encryption
> > operation. For example, the nonce may be a counter, as described in
> > Appendix B, or a message number."
> >
> > [1] https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-38a.pdf
>
>
> The terms nonce and IV are often used more-or-less interchangeably, and
> it is important to be clear when we are talking about an IV specifically
> - an IV is a specific type of nonce. Nonce means "number used once".
> i.e. unique, whereas an IV (for CBC use anyway) should be unique and
> random but not necessarily kept secret.

FWIW, it seems that predictable IVs can sometimes be harmful. See


https://crypto.stackexchange.com/questions/3499/why-cant-the-iv-be-predictable-when-its-said-it-doesnt-need-to-be-a-secret

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



On Wed, Jul 10, 2019 at 11:06 AM Stephen Frost <sfrost@snowman.net> wrote:
>
> Greetings,
>
> * Ryan Lambert (ryan@rustprooflabs.com) wrote:
> > > What I think Tomas is getting at here is that we don't write a page only
> > > once.
> >
> > > A nonce of tableoid+pagenum will only be unique the first time we write
> > > out that page.  Seems unlikely that we're only going to be writing these
> > > pages once though- what we need is a nonce that's unique for *every
> > > write* of the 8k page, isn't it?  As every write of the page is going to
> > >  be encrypting something new.
> >
> > > With sufficient randomness, we can at least be more likely to have a
> > > unique nonce for each 8K write.  Including the LSN seems like it'd be a
> > > possible alternative.
> >
> > Agreed.  I know little of the inner details about the LSN but what I read
> > in [1] sounds encouraging in addition to tableoid + pagenum.
> >
> > [1] https://www.postgresql.org/docs/current/datatype-pg-lsn.html
>
> Yes, but it's still something that we'd have to store somewhere- the
> actual LSN of the page is going to be in the 8K block.

Can we use CBC-ESSIV[1] or XTS[2] instead? IIUC with these modes we
can use table oid and page number for IV or tweak and we don't need to
change them each time to encrypt pages.

[1] https://en.wikipedia.org/wiki/Disk_encryption_theory#Encrypted_salt-sector_initialization_vector_.28ESSIV.29
[2] https://en.wikipedia.org/wiki/Disk_encryption_theory#XEX-based_tweaked-codebook_mode_with_ciphertext_stealing_(XTS)
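
For reference, ESSIV derives the IV from the block address and a hash of
the key, roughly like this (a sketch using OpenSSL just to illustrate the
idea, not actual patch code):

    #include <stdint.h>
    #include <string.h>
    #include <openssl/evp.h>
    #include <openssl/sha.h>

    /*
     * ESSIV: IV = E_{H(K)}(sector), where the "sector" here would be
     * table oid + block number.  Because the IV depends on a *hashed*
     * key, it is not predictable by an attacker who knows only the oid
     * and the block number.
     */
    static int
    essiv_iv(const unsigned char *key, size_t keylen,
             uint32_t tableoid, uint32_t blkno, unsigned char iv[16])
    {
        unsigned char salt[SHA256_DIGEST_LENGTH];   /* 32B = AES-256 key */
        unsigned char sector[16] = {0};
        EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
        int len, ok;

        SHA256(key, keylen, salt);                  /* s = H(K) */
        memcpy(sector, &tableoid, 4);
        memcpy(sector + 4, &blkno, 4);

        ok = ctx != NULL
            && EVP_EncryptInit_ex(ctx, EVP_aes_256_ecb(), NULL, salt, NULL)
            && EVP_CIPHER_CTX_set_padding(ctx, 0)
            && EVP_EncryptUpdate(ctx, iv, &len, sector, 16);

        EVP_CIPHER_CTX_free(ctx);
        return ok;
    }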

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Joe Conway <mail@joeconway.com> wrote:

> On 7/8/19 6:04 PM, Stephen Frost wrote:
> > * Bruce Momjian (bruce@momjian.us) wrote:
> >> Uh, well, renaming the user was a big problem, but that is the only case
> >> I can think of.  I don't see that as an issue for block or WAL sequence
> >> numbers.  If we want to use a different nonce, we have to find a way to
> >> store it or look it up efficiently.  Considering the nonce size, I don't
> >> see how that is possible.
> >
> > No, this also meant that, as an attacker, I *knew* the salt ahead of
> > time and therefore could build rainbow tables specifically for that
> > salt.  I could also use those *same* tables for any system where that
> > user had an account, even if they used different passwords on different
> > systems...
> >
> > I appreciate that *some* of this might not be completely relevant for
> > the way a nonce is used in cryptography, but I'd be very surprised to
> > have a cryptographer tell me that a deterministic nonce didn't have
> > similar issues or didn't reduce the value of the nonce significantly.
>
> I have worked side by side on projects with bona fide cryptographers and
> I can assure you that they recommended random nonces. Granted, that was
> in the early 2000s, but I don't think "modern cryptography" has changed
> that any more than "web scale" has made Postgres irrelevant in the
> intervening years.

I think that particular threats have to be considered.

> Related links:

> https://defuse.ca/cbcmodeiv.htm
> https://www.cryptofails.com/post/70059609995/crypto-noobs-1-initialization-vectors

The first one looks more in-depth than the other one, so I focused on it:

* "Statistical Correlations between IV and Plaintext"

My understanding is that predictability of the IV (in our implementation of
full-instance encryption [1] we derive the IV from RelFileNode combined with
block number) can reveal information about the first encryption block (16
bytes) of the page, i.e. part of the PageHeaderData structure. I don't think
this leaks any valuable data. And starting with the 2nd block, the IV is not
predictable because it is the ciphertext of the previous block.

* "Chosen-Plaintext Attacks"

The question here is whether we expect the OS admin to have access to the
database. In [1] we currently don't (cloud, where the DBA has no control over the
storage layer, is the main use case), but if that turns out to be a requirement,
I believe CBC-ESSIV mode [2] can fix the problem.

Anyway, I'm not sure if this kind of attack can reveal more information than
something about the first block of the page (the page header), since each of
the following blocks uses ciphertext of the previous block as the IV.

* "Altering the IV Before Decryption"

I don't think this attack needs special attention - page checksums should
reveal it.


[1] https://commitfest.postgresql.org/23/2104/
[2] https://en.wikipedia.org/wiki/Disk_encryption_theory#Encrypted_salt-sector_initialization_vector_.28ESSIV.29

--
Antonin Houska
Web: https://www.cybertec-postgresql.com



Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

> On Tue, Jul 09, 2019 at 03:50:39PM -0400, Bruce Momjian wrote:
> >On Tue, Jul 9, 2019 at 02:09:38PM -0400, Joe Conway wrote:
> >> On 7/9/19 11:11 AM, Bruce Momjian wrote:
> >> > Good point about nonce and IV.  I wonder if running the nonce
> >> > through the cipher with the key makes it random enough to use as an
> >> > IV.
> >>
> >> Based on that NIST document it seems so.
> >>
> >> The trick will be to be 100% sure we never reuse a nonce that is used
> >> to produce the IV when using the same key.
> >>
> >> I think the potential to get that wrong (i.e. inadvertently reuse a
> >> nonce) would lead to using the second described method
> >>
> >>   "The second method is to generate a random data block using a
> >>   FIPS-approved random number generator."
> >>
> >> That method is what I am used to seeing. But with the second method
> >> we need to store the IV, with the first we could reproduce it if we
> >> select our initial nonce carefully.
> >>
> >> So thinking out loud, and perhaps you already said this Bruce, but I
> >> guess the input nonce used to generate the IV could be something like
> >> pg_class.oid and blocknum concatenated together with some delimiting
> >> character as long as we guarantee that we generate different keys in
> >> different databases. Then there would be no need to store the IV since
> >> we could reproduce it.
> >
> >Uh, yes, and no.  Yes, we can use the pg_class.oid (since it has to
> >be preserved by pg_upgrade anyway), and the page number.  However,
> >different databases can have the same pg_class.oid/page number
> >combination, so there would be duplication between databases.  Now, you
> >might say let's add the pg_database.oid, but unfortunately, because of
> >the way we file-system-copy files from one database to another during
> >database creation (it doesn't go through shared buffers), we can't use
> >pg_database.oid as part of the nonce.
> >
> >My only idea here is that we actually decrypt/re-encrypt pages as we
> >copy them at the file system level during database creation to match the
> >new pg_database.oid.  This would allow pg_database.oid in the nonce/IV.
> >(I think we will need to modify pg_upgrade to preserve pg_database.oid.)
> >
> >If the nonce/IV is 96 bits, then that is 12 bytes or 3 4-byte values.
> >pg_class.oid is 4 bytes, pg_database.oid is 4 bytes, and that leaves
> >4-bytes for the block number, which gets us to 32TB before the page
> >counter would overflow a 4-byte value, and our max table size is 32TB
> >anyway, so that all works.
> >
> 
> I don't think that works, because that'd mean we're encrypting the same
> page with the same nonce over and over, which means reusing the nonce
> (even if you hash/encrypt it). Or did I miss something?

I found out that it's wrong to use the same key (or (key, IV) pair) to encrypt
different plaintexts [1]; however, this is about *stream ciphers*. There should
be some evidence that *block ciphers* have a similar weakness before we accept
another restriction on the IV setup.

[1] https://en.wikipedia.org/wiki/Stream_cipher_attacks#Reused_key_attack

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com



On 7/9/19 7:28 PM, Stephen Frost wrote:
> Greetings,
>
> * Joe Conway (mail@joeconway.com) wrote:
>> On 7/9/19 5:42 PM, Tomas Vondra wrote:
>> > There are two basic ways to construct nonces - CSPRNG and sequences, and
>> > then a combination of both, i.e. one part is generated from a sequence
>> > and one randomly.
>> >
>> > FWIW not sure using OIDs as nonces directly is a good idea, as those are
>> > inherently low entropy data - how often do you see databases with OIDs
>> > above 1M or so? Probably not very often, and in most cases those are
>> > databases where those OIDs are for OIDs and large objects, so irrelevant
>> > for this purpose. I might be wrong but having a 96-bit nonce with maybe
>> > just 32 bits of entropy seems suspicious.
>> >
>> > That does not mean we can't use the OIDs at all, but maybe hashing them
>> > into a single 4B value, and then picking the remaining 8B randomly.
>> > Also, we have a "natural" sequence in the database - LSNs, maybe that
>> > would be a good source of nonces too?
>>
>> I think you missed the quoted part (upthread) from the NIST document:
>>
>>   "There are two recommended methods for generating unpredictable IVs.
>>    The first method is to apply the forward cipher  function, under the
>>    same key that is used for the encryption of the plaintext, to a
>>    nonce. The nonce must be a data block that is unique to each
>>    execution of the encryption operation. For example, the nonce may be
>>    a counter, as described in Appendix B, or a message number. The
>>    second method is to generate a random data block using a
>>    FIPS-approved random number generator."
>>
>> That first method says a counter as input produces an acceptably
>> unpredictable IV as long as it is unique to each encryption operation.
>> If each page is going to be an "encryption operation", then as long as our
>> input nonce is unique for a given key, we should be ok. If the input
>> nonce is tableoid+pagenum and the key is different per database (at
>> least, hopefully different per tablespace too), we should be good to go,
>> at least from what I can see.
>
> What I think Tomas is getting at here is that we don't write a page only
> once.
>
> A nonce of tableoid+pagenum will only be unique the first time we write
> out that page.  Seems unlikely that we're only going to be writing these
> pages once though- what we need is a nonce that's unique for *every
> write* of the 8k page, isn't it?  As every write of the page is going to
> be encrypting something new.


Hmm, good point. I'm not entirely sure it would be required if the two
page versions don't exist at the same time, but I guess backups mean
that it would, so yeah.

> With sufficient randomness, we can at least be more likely to have a
> unique nonce for each 8K write.  Including the LSN seems like it'd be a
> possible alternative.

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


On 7/9/19 10:06 PM, Stephen Frost wrote:
> Greetings,
>
> * Ryan Lambert (ryan@rustprooflabs.com) wrote:
>> > What I think Tomas is getting at here is that we don't write a page only
>> > once.
>>
>> > A nonce of tableoid+pagenum will only be unique the first time we write
>> > out that page.  Seems unlikely that we're only going to be writing these
>> > pages once though- what we need is a nonce that's unique for *every
>> > write* of the 8k page, isn't it?  As every write of the page is going to
>> >  be encrypting something new.
>>
>> > With sufficient randomness, we can at least be more likely to have a
>> > unique nonce for each 8K write.  Including the LSN seems like it'd be a
>> > possible alternative.
>>
>> Agreed.  I know little of the inner details about the LSN but what I read
>> in [1] sounds encouraging in addition to tableoid + pagenum.
>>
>> [1] https://www.postgresql.org/docs/current/datatype-pg-lsn.html
>
> Yes, but it's still something that we'd have to store somewhere- the
> actual LSN of the page is going to be in the 8K block.
>
> Unless we decide that we can pull the LSN *out* of the 8K block and
> store it unencrypted, and then store the *rest* of the block
> encrypted...  That might also allow things like backup software to work
> on these encrypted data files for page-level backups without needing
> access to the key and that'd be pretty neat.
>
> Of course, as with anything, the more data you expose, the higher the
> overall risk that someone can figure out some meaning from it.  Still,
> if the idea was that we'd use the LSN in this way, then it'd need to be
> stored unencrypted regardless...

I don't think we are going to be able to eliminate every possible
side-channel anyway -- this seems like a good compromise to me.

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


On 7/10/19 2:40 AM, Masahiko Sawada wrote:
> On Tue, Jul 9, 2019 at 10:16 PM Joe Conway <mail@joeconway.com> wrote:
>>
>> On 7/9/19 8:39 AM, Ryan Lambert wrote:
>> > Hi Tomas,
>> >
>> >> CBC mode does require
>> >> random nonces, other modes may be fine with even sequences as long as
>> >> the values are not reused.
>> >
>> > I disagree that CBC mode requires random nonces, at least based on what
>> > NIST has published.  They only require that the IV (not the nonce) must
>> > be unpredictable per [1]:
>> >
>> > " For the CBC and CFB modes, the IVs must be unpredictable."
>> >
>> > The unpredictable IV can be generated from a non-random nonce including
>> > a counter:
>> >
>> > "There are two recommended methods for generating unpredictable IVs. The
>> > first method is to apply the forward cipher function, under the same key
>> > that is used for the encryption of the plaintext, to a nonce. The nonce
>> > must be a data block that is unique to each execution of the encryption
>> > operation. For example, the nonce may be a counter, as described in
>> > Appendix B, or a message number."
>> >
>> > [1] https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-38a.pdf
>>
>>
>> The terms nonce and IV are often used more-or-less interchangeably, and
>> it is important to be clear when we are talking about an IV specifically
>> - an IV is a specific type of nonce. Nonce means "number used once".
>> i.e. unique, whereas an IV (for CBC use anyway) should be unique and
>> random but not necessarily kept secret.
>
> FWIW, it seems that predictable IVs can sometimes be harmful. See


Yes, for CBC, as I said above, "IV ... should be unique and random but not
necessarily kept secret". You can argue whether the word "random" should read
"unpredictable" instead, but that was the intention.

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


On 7/10/19 2:38 AM, Masahiko Sawada wrote:
> On Tue, Jul 9, 2019 at 9:01 PM Joe Conway <mail@joeconway.com> wrote:
>>
>> On 7/9/19 6:07 AM, Peter Eisentraut wrote:
>> > On 2019-07-08 18:09, Joe Conway wrote:
>> >> In my mind, and in practice to a
>> >> large extent, a postgres tablespace == a unique mount point.
>> >
>> > But a critical difference is that in file systems, a separate mount
>> > point has its own journal.
>>
>> While it would be ideal to have separate WAL, and even separate shared
>> buffer pools, per tablespace, I think that is too much complexity for
>> the first implementation and we could have a single separate key for all
>> WAL for now.
>
> If we encrypt different tables with different keys I think we need to
> encrypt WAL with the same keys as we used for tables, as per
> discussion so far. And we would need to encrypt each WAL record, not
> whole WAL 8k pages.

That is not a technical requirement to be sure. We may decide we want
that from a security perspective, but that point is debatable. There
have been different goals expressed on this thread:

1. Keep user 1 from decrypting data A and user 2 from decrypting data B
2. Limit the amount of data encrypted with key Kn

We can use K1 for A, K2 for B, and K3 for WAL and achieve goal #2. As
Stephen pointed out, goal #1 would be great to have, but I am not sure
there is consensus that it is required, at least not for the initial
implementation.

Joe
--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


On 7/10/19 4:47 AM, Antonin Houska wrote:
> Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>> I don't think that works, because that'd mean we're encrypting the same
>> page with the same nonce over and over, which means reusing the nonce
>> (even if you hash/encrypt it). Or did I miss something?
>
> I found out that it's wrong to use the same key (or (key, IV) pair) to encrypt
> different plaintexts [1]; however, this is about *stream ciphers*. There should
> be some evidence that *block ciphers* have a similar weakness before we accept
> another restriction on the IV setup.
>
> [1] https://en.wikipedia.org/wiki/Stream_cipher_attacks#Reused_key_attack

There is plenty of guidance specifying that CBC requires a unique,
unpredictable, but not necessarily secret IV. See for example Appendix C
here:

https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-38a.pdf

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


On 7/10/19 4:24 AM, Antonin Houska wrote:
> Joe Conway <mail@joeconway.com> wrote:
>
>> On 7/8/19 6:04 PM, Stephen Frost wrote:
>> > * Bruce Momjian (bruce@momjian.us) wrote:
>> >> Uh, well, renaming the user was a big problem, but that is the only case
>> >> I can think of.  I don't see that as an issue for block or WAL sequence
>> >> numbers.  If we want to use a different nonce, we have to find a way to
>> >> store it or look it up efficiently.  Considering the nonce size, I don't
>> >> see how that is possible.
>> >
>> > No, this also meant that, as an attacker, I *knew* the salt ahead of
>> > time and therefore could build rainbow tables specifically for that
>> > salt.  I could also use those *same* tables for any system where that
>> > user had an account, even if they used different passwords on different
>> > systems...
>> >
>> > I appreciate that *some* of this might not be completely relevant for
>> > the way a nonce is used in cryptography, but I'd be very surprised to
>> > have a cryptographer tell me that a deterministic nonce didn't have
>> > similar issues or didn't reduce the value of the nonce significantly.
>>
>> I have worked side by side on projects with bona fide cryptographers and
>> I can assure you that they recommended random nonces. Granted, that was
>> in the early 2000s, but I don't think "modern cryptography" has changed
>> that any more than "web scale" has made Postgres irrelevant in the
>> intervening years.
>
> I think that particular threats have to be considered.
>
>> Related links:
>
>> https://defuse.ca/cbcmodeiv.htm
>> https://www.cryptofails.com/post/70059609995/crypto-noobs-1-initialization-vectors
>
> The first one looks more in-depth than the other one, so I focused on it:
>
> * "Statistical Correlations between IV and Plaintext"
>
> My understanding is that predictability of the IV (in our implementation of
> full-instance encryption [1] we derive the IV from RelFileNode combined with
> block number) can reveal information about the first encryption block (16
> bytes) of the page, i.e. part of the PageHeaderData structure. I don't think
> this leaks any valuable data. And starting with the 2nd block, the IV is not
> predictable because it is the ciphertext of the previous block.
>
> * "Chosen-Plaintext Attacks"
>
> The question here is whether we expect the OS admin to have access to the
> database. In [1] we currently don't (cloud, where the DBA has no control over the
> storage layer, is the main use case), but if that turns out to be a requirement,
> I believe CBC-ESSIV mode [2] can fix the problem.
>
> Anyway, I'm not sure if this kind of attack can reveal more information than
> something about the first block of the page (the page header), since each of
> the following blocks uses ciphertext of the previous block as the IV.
>
> * "Altering the IV Before Decryption"
>
> I don't think this attack needs special attention - page checksums should
> reveal it.
>
>
> [1] https://commitfest.postgresql.org/23/2104/
> [2] https://en.wikipedia.org/wiki/Disk_encryption_theory#Encrypted_salt-sector_initialization_vector_.28ESSIV.29
>

Please see my other reply (and
https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-38a.pdf
appendix C as pointed out by Ryan downthread).

At least in my mind, I trust a published specification from the
nation-state level over random blogs or wikipedia. If we can find some
equivalent published standards that contradict NIST we should discuss
it, but for my money I would prefer to stick with the NIST recommended
method to produce the IVs.

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


Greetings,

* Joe Conway (mail@joeconway.com) wrote:
> On 7/9/19 7:28 PM, Stephen Frost wrote:
> > * Joe Conway (mail@joeconway.com) wrote:
> >> On 7/9/19 5:42 PM, Tomas Vondra wrote:
> >> > There are two basic ways to construct nonces - CSPRNG and sequences, and
> >> > then a combination of both, i.e. one part is generated from a sequence
> >> > and one randomly.
> >> >
> >> > FWIW not sure using OIDs as nonces directly is a good idea, as those are
> >> > inherently low entropy data - how often do you see databases with OIDs
> >> > above 1M or so? Probably not very often, and in most cases those are
> >> > databases where those OIDs are for OIDs and large objects, so irrelevant
> >> > for this purpose. I might be wrong but having a 96-bit nonce with maybe
> >> > just 32 bits of entropy seems suspicious.
> >> >
> >> > That does not mean we can't use the OIDs at all, but maybe hashing them
> >> > into a single 4B value, and then picking the remaining 8B randomly.
> >> > Also, we have a "natural" sequence in the database - LSNs, maybe that
> >> > would be a good source of nonces too?
> >>
> >> I think you missed the quoted part (upthread) from the NIST document:
> >>
> >>   "There are two recommended methods for generating unpredictable IVs.
> >>    The first method is to apply the forward cipher  function, under the
> >>    same key that is used for the encryption of the plaintext, to a
> >>    nonce. The nonce must be a data block that is unique to each
> >>    execution of the encryption operation. For example, the nonce may be
> >>    a counter, as described in Appendix B, or a message number. The
> >>    second method is to generate a random data block using a
> >>    FIPS-approved random number generator."
> >>
> >> That first method says a counter as input produces an acceptably
> >> unpredictable IV as long as it is unique to each encryption operation.
> >> If each page is going to be an "encryption operation", then as long as our
> >> input nonce is unique for a given key, we should be ok. If the input
> >> nonce is tableoid+pagenum and the key is different per database (at
> >> least, hopefully different per tablespace too), we should be good to go,
> >> at least from what I can see.
> >
> > What I think Tomas is getting at here is that we don't write a page only
> > once.
> >
> > A nonce of tableoid+pagenum will only be unique the first time we write
> > out that page.  Seems unlikely that we're only going to be writing these
> > pages once though- what we need is a nonce that's unique for *every
> > write* of the 8k page, isn't it?  As every write of the page is going to
> > be encrypting something new.
>
> Hmm, good point. I'm not entirely sure it would be required if the two
> page versions don't exist at the same time, but I guess backups mean
> that it would, so yeah.

Uh, or an attacker got a copy of the page and then just waited a few
minutes for a new version to be written and then grabbed that...

Definitely not limited to just concerns about the fact that other
versions would exist in backups too.

Thanks,

Stephen

On 7/10/19 2:45 AM, Masahiko Sawada wrote:
> On Wed, Jul 10, 2019 at 11:06 AM Stephen Frost <sfrost@snowman.net> wrote:
>>
>> Greetings,
>>
>> * Ryan Lambert (ryan@rustprooflabs.com) wrote:
>> > > What I think Tomas is getting at here is that we don't write a page only
>> > > once.
>> >
>> > > A nonce of tableoid+pagenum will only be unique the first time we write
>> > > out that page.  Seems unlikely that we're only going to be writing these
>> > > pages once though- what we need is a nonce that's unique for *every
>> > > write* of the 8k page, isn't it?  As every write of the page is going to
>> > >  be encrypting something new.
>> >
>> > > With sufficient randomness, we can at least be more likely to have a
>> > > unique nonce for each 8K write.  Including the LSN seems like it'd be a
>> > > possible alternative.
>> >
>> > Agreed.  I know little of the inner details about the LSN but what I read
>> > in [1] sounds encouraging in addition to tableoid + pagenum.
>> >
>> > [1] https://www.postgresql.org/docs/current/datatype-pg-lsn.html
>>
>> Yes, but it's still something that we'd have to store somewhere- the
>> actual LSN of the page is going to be in the 8K block.
>
> Can we use CBC-ESSIV[1] or XTS[2] instead? IIUC with these modes we
> can use table oid and page number for IV or tweak and we don't need to
> change them each time to encrypt pages.
>
> [1] https://en.wikipedia.org/wiki/Disk_encryption_theory#Encrypted_salt-sector_initialization_vector_.28ESSIV.29
> [2] https://en.wikipedia.org/wiki/Disk_encryption_theory#XEX-based_tweaked-codebook_mode_with_ciphertext_stealing_(XTS)


From what I can tell [1] is morally equivalent to the NIST method and
does nothing to change the fact that the input nonce needs to be unique
for each encryption operation. I have not had time to review [2] yet...

While it would be very tempting to convince ourselves that a unique
input nonce is not a requirement, I think we are better off being
conservative unless we find some extremely clear guidance that allows us
to draw that conclusion.

Joe
--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


On 7/10/19 8:34 AM, Stephen Frost wrote:
> Greetings,
>
> * Joe Conway (mail@joeconway.com) wrote:
>> On 7/9/19 7:28 PM, Stephen Frost wrote:
>> > * Joe Conway (mail@joeconway.com) wrote:
>> >> On 7/9/19 5:42 PM, Tomas Vondra wrote:
>> >> > There are two basic ways to construct nonces - CSPRNG and sequences, and
>> >> > then a combination of both, i.e. one part is generated from a sequence
>> >> > and one randomly.
>> >> >
>> >> > FWIW not sure using OIDs as nonces directly is a good idea, as those are
>> >> > inherently low entropy data - how often do you see databases with OIDs
>> >> > above 1M or so? Probably not very often, and in most cases those are
>> >> > databases where those OIDs are for OIDs and large objects, so irrelevant
>> >> > for this purpose. I might be wrong but having a 96-bit nonce with maybe
>> >> > just 32 bits of entropy seems suspicious.
>> >> >
>> >> > That does not mean we can't use the OIDs at all, but maybe hashing them
>> >> > into a single 4B value, and then picking the remaining 8B randomly.
>> >> > Also, we have a "natural" sequence in the database - LSNs, maybe that
>> >> > would be a good source of nonces too?
>> >>
>> >> I think you missed the quoted part (upthread) from the NIST document:
>> >>
>> >>   "There are two recommended methods for generating unpredictable IVs.
>> >>    The first method is to apply the forward cipher  function, under the
>> >>    same key that is used for the encryption of the plaintext, to a
>> >>    nonce. The nonce must be a data block that is unique to each
>> >>    execution of the encryption operation. For example, the nonce may be
>> >>    a counter, as described in Appendix B, or a message number. The
>> >>    second method is to generate a random data block using a
>> >>    FIPS-approved random number generator."
>> >>
>> >> That first method says a counter as input produces an acceptably
>> >> unpredictable IV as long as it is unique to each encryption operation.
>> >> If each page is going to be an "encryption operation", then as long as our
>> >> input nonce is unique for a given key, we should be ok. If the input
>> >> nonce is tableoid+pagenum and the key is different per database (at
>> >> least, hopefully different per tablespace too), we should be good to go,
>> >> at least from what I can see.
>> >
>> > What I think Tomas is getting at here is that we don't write a page only
>> > once.
>> >
>> > A nonce of tableoid+pagenum will only be unique the first time we write
>> > out that page.  Seems unlikely that we're only going to be writing these
>> > pages once though- what we need is a nonce that's unique for *every
>> > write* of the 8k page, isn't it?  As every write of the page is going to
>> > be encrypting something new.
>>
>> Hmm, good point. I'm not entirely sure it would be required if the two
>> page versions don't exist at the same time, but I guess backups mean
>> that it would, so yeah.
>
> Uh, or an attacker got a copy of the page and then just waited a few
> minutes for a new version to be written and then grabbed that...
>
> Definitely not limited to just concerns about the fact that other
> versions would exist in backups too.

Agreed :-/

Joe
--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


On Tue, Jul 09, 2019 at 10:06:33PM -0400, Stephen Frost wrote:
>Greetings,
>
>* Ryan Lambert (ryan@rustprooflabs.com) wrote:
>> > What I think Tomas is getting at here is that we don't write a page only
>> > once.
>>

Yes, that's what I meant.

>> > A nonce of tableoid+pagenum will only be unique the first time we write
>> > out that page.  Seems unlikely that we're only going to be writing these
>> > pages once though- what we need is a nonce that's unique for *every
>> > write* of the 8k page, isn't it?  As every write of the page is going to
>> >  be encrypting something new.
>>
>> > With sufficient randomness, we can at least be more likely to have a
>> > unique nonce for each 8K write.  Including the LSN seems like it'd be a
>> > possible alternative.
>>
>> Agreed.  I know little of the inner details about the LSN but what I read
>> in [1] sounds encouraging in addition to tableoid + pagenum.
>>
>> [1] https://www.postgresql.org/docs/current/datatype-pg-lsn.html
>
>Yes, but it's still something that we'd have to store somewhere- the
>actual LSN of the page is going to be in the 8K block.
>
>Unless we decide that we can pull the LSN *out* of the 8K block and
>store it unencrypted, and then store the *rest* of the block
>encrypted...  That might also allow things like backup software to work
>on these encrypted data files for page-level backups without needing
>access to the key and that'd be pretty neat.
>
>Of course, as with anything, the more data you expose, the higher the
>overall risk that someone can figure out some meaning from it.  Still,
>if the idea was that we'd use the LSN in this way, then it'd need to be
>stored unencrypted regardless...
>

Elsewhere in this thread I've already proposed to leave a bit of space at
the end of a page unencrypted, with page-level encryption metadata. That
might be the nonce (no matter how we end up computing it), the key ID used
to encrypt this page, etc.

I don't think we need to put the whole LSN into the nonce in plaintext.
What I was imagining was instead using something like

    sha2(LSN, oid, blockno, random())

or something like that.

Of course, having the LSN (and other stuff like page checksum) unencrypted
would be pretty useful - as you note.
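
A sketch of that computation (hypothetical helper name, OpenSSL for
brevity):

    #include <stdint.h>
    #include <string.h>
    #include <openssl/rand.h>
    #include <openssl/sha.h>

    /*
     * sha2(LSN, oid, blockno, random()): the random salt (or the whole
     * resulting nonce) would live in the unencrypted page-level
     * metadata proposed above.
     */
    static void
    page_nonce(uint64_t lsn, uint32_t reloid, uint32_t blkno,
               unsigned char nonce[SHA256_DIGEST_LENGTH])
    {
        unsigned char buf[24];

        memcpy(buf, &lsn, 8);
        memcpy(buf + 8, &reloid, 4);
        memcpy(buf + 12, &blkno, 4);
        (void) RAND_bytes(buf + 16, 8);     /* the random() part */

        SHA256(buf, sizeof(buf), nonce);
    }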

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



On Wed, Jul 10, 2019 at 03:38:54PM +0900, Masahiko Sawada wrote:
>On Tue, Jul 9, 2019 at 9:01 PM Joe Conway <mail@joeconway.com> wrote:
>>
>> On 7/9/19 6:07 AM, Peter Eisentraut wrote:
>> > On 2019-07-08 18:09, Joe Conway wrote:
>> >> In my mind, and in practice to a
>> >> large extent, a postgres tablespace == a unique mount point.
>> >
>> > But a critical difference is that in file systems, a separate mount
>> > point has its own journal.
>>
>> While it would be ideal to have separate WAL, and even separate shared
>> buffer pools, per tablespace, I think that is too much complexity for
>> the first implementation and we could have a single separate key for all
>> WAL for now.
>
>If we encrypt different tables with different keys I think we need to
>encrypt WAL with the same keys as we used for tables, as per
>discussion so far. And we would need to encrypt each WAL record, not
>whole WAL 8k pages.
>

I don't think we actually need that - we need to ensure that users don't
have access to the key used to encrypt WAL.

This is very much a question of the threat model. If we see TDE as a
data-at-rest solution, then I think it's fine to have a separate keyring
with such keys, and only allow access to system processes.

If our threat model includes cases where people can read memory (no matter
whether it's because of a vulnerability, privilege escalation or just
allowing people to load an extension with a custom C function), then I
think we've already lost. Those people will be able to read the keys
anyway, no matter how many keys are used to encrypt the WAL.

We may need to change how WAL writing works, so that individual backends
don't really write into the WAL buffers directly, and instead hand over the
data to a separate process (with access to the key). We already have the
walwriter, but IIRC it may not be running and we consider that to be OK.
Or maybe we could have "encrypter" process that does just that.

That's surely non-trivial work, but it seems like much less work compared
to reworking the WAL format to allow WAL to be encrypted with different
keys. At least for v0 that should be OK.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




On Wed, Jul 10, 2019 at 08:31:17AM -0400, Joe Conway wrote:
> Please see my other reply (and
> https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-38a.pdf
> appendix C as pointed out by Ryan downthread).
> 
> At least in my mind, I trust a published specification from the
> nation-state level over random blogs or wikipedia. If we can find some
> equivalent published standards that contradict NIST we should discuss
> it, but for my money I would prefer to stick with the NIST recommended
> method to produce the IVs.

So, we have had a flurry of activity on this thread in the past day, so
let me summarize:

*  Using the database oid does make the nonce unique in the cluster as
long as we re-encrypt when we do CREATE DATABASE.  We can avoid some of
that by not encrypting template1, but we have the WITH TEMPLATE option
to use other databases as templates, so we might as well always just
decrypt/re-encrypt.

*  However, the page will be rewritten many times, so if we use just
pg_database/pg_class/page-offset for the nonce, we are re-encrypting
with the same nonce/IV for multiple page values, which is a security
issue.

*  Using the LSN as part of the nonce fixes both problems, and has a
   third benefit:

    *  We don't need to decrypt/re-encrypt during CREATE DATABASE since
       the page contents are the same in both places, and once one
       database changes its pages, it gets a new LSN, and hence a new
       nonce/IV.

    *  For each change of an 8k page, you get a new nonce/IV, so you
       are not encrypting different data with the same nonce/IV

    *  This avoids requiring pg_upgrade to preserve database oids.

*  It was determined that combining known values like pg_class.oid, LSN,
and page number into a nonce, and running that through the encryption
function to create an IV, is sufficient.

However, the LSN must then be visible on the encrypted pages.  I would
like to avoid having different page formats for encrypted and
non-encrypted pages, because if we require additional storage for
encrypted pages (like adding a random number), existing non-encrypted
pages might not be able to fit in the encrypted format, causing
complexity when accessing them and when converting tables to encrypted
format.

Looking at the page header, I see:

    typedef struct PageHeaderData
    {
        /* XXX LSN is member of *any* block, not only page-organized ones */
        PageXLogRecPtr pd_lsn;      /* LSN: next byte after last byte of xlog
                                     * record for last change to this page */
        uint16      pd_checksum;    /* checksum */
        uint16      pd_flags;       /* flag bits, see below */
        LocationIndex pd_lower;     /* offset to start of free space */
        LocationIndex pd_upper;     /* offset to end of free space */

pd_lsn/PageXLogRecPtr is 8 bytes.  (We might only want to use the low
order bits for the nonce.)  LocationIndex is 16 bits, meaning that the
five fields listed above are 16 bytes wide, which is the width of the
typical AES cipher mode block.  I suggest we _not_ encrypt the first 16
bytes of each 8k page, and start encrypting at byte 17 --- that way,
these values are visible and can be used as part of the nonce to create
an IV.
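
To sketch how a page write could then look (hypothetical code using
OpenSSL's EVP API just for illustration --- not actual patch code):

    #include <stdint.h>
    #include <string.h>
    #include <openssl/evp.h>

    #define BLCKSZ 8192

    /*
     * Leave the first 16 bytes (pd_lsn .. pd_upper) in the clear, build
     * the nonce from the visible pd_lsn plus table oid and block number,
     * run it through the forward cipher to get the IV, and CBC-encrypt
     * the remaining BLCKSZ - 16 bytes in place.
     */
    static int
    encrypt_page(const unsigned char key[32], unsigned char *page,
                 uint32_t tableoid, uint32_t blkno)
    {
        unsigned char nonce[16] = {0};
        unsigned char iv[16];
        EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
        int len, ok;

        memcpy(nonce, page, 8);            /* pd_lsn, visible on the page */
        memcpy(nonce + 8, &tableoid, 4);
        memcpy(nonce + 12, &blkno, 4);

        ok = ctx != NULL
            /* forward cipher over the nonce -> unpredictable IV */
            && EVP_EncryptInit_ex(ctx, EVP_aes_256_ecb(), NULL, key, NULL)
            && EVP_CIPHER_CTX_set_padding(ctx, 0)
            && EVP_EncryptUpdate(ctx, iv, &len, nonce, 16)
            /* CBC-encrypt everything after the first 16 bytes */
            && EVP_EncryptInit_ex(ctx, EVP_aes_256_cbc(), NULL, key, iv)
            && EVP_CIPHER_CTX_set_padding(ctx, 0)
            && EVP_EncryptUpdate(ctx, page + 16, &len,
                                 page + 16, BLCKSZ - 16);

        EVP_CIPHER_CTX_free(ctx);
        return ok;
    }

Decryption would mirror this, reading pd_lsn from the clear-text first
16 bytes before deriving the IV.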

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On 2019-Jul-10, Bruce Momjian wrote:

> *  Using the LSN as part of the nonce fixes both problems, and has a
>    third benefit:
> 
>     *  We don't need to decrypt/re-encrypt during CREATE DATABASE since
>        the page contents are the same in both places, and once one
>        database changes its pages, it gets a new LSN, and hence a new
>        nonce/IV.
> 
>     *  For each change of an 8k page, you get a new nonce/IV, so you
>        are not encrypting different data with the same nonce/IV
> 
>     *  This avoids requiring pg_upgrade to preserve database oids.

An ignorant question -- what is it that gets stored in the page for
decryption use, the nonce or the IV derived from it?  I think if you
want to store the nonce, you'd have to store the database OID, because
otherwise how do you know which database OID to use to determine the
full nonce after cloning a database?  You already have the table OID in
the catalog and the LSN in the page header, so you're only missing the
database OID.  (Assuming you make the nonce be database OID || relation
OID || page LSN)

Also, how are key changes handled?  Do we need to store a key identifier
in each page?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



On Wed, Jul 10, 2019 at 01:04:47PM -0400, Alvaro Herrera wrote:
> On 2019-Jul-10, Bruce Momjian wrote:
> 
> > *  Using the LSN as part of the nonce fixes both problems, and has a
> >    third benefit:
> > 
> >     *  We don't need to decrypt/re-encrypt during CREATE DATABASE since
> >        the page contents are the same in both places, and once one
> >        database changes its pages, it gets a new LSN, and hence a new
> >        nonce/IV.
> > 
> >     *  For each change of an 8k page, you get a new nonce/IV, so you
> >        are not encrypting different data with the same nonce/IV
> > 
> >     *  This avoids requiring pg_upgrade to preserve database oids.
> 
> An ignorant question -- what is it that gets stored in the page for
> decryption use, the nonce or the IV derived from it?  I think if you
> want to store the nonce, you'd have to store the database OID, because
> otherwise how do you know which database OID to use to determine the
> full nonce after cloning a database?  You already have the table OID in
> the catalog and the LSN in the page header, so you're only missing the
> database OID.  (Assuming you make the nonce be database OID || relation
> OID || page LSN)

You are right that if you used the database oid in the nonce, you would
need to decrypt/re-encrypt the data during CREATE DATABASE, or store
the original database oid in the page.

The new approach is that a single key would be used for all databases
and the WAL, and use the LSN instead of the database oid, so there is no
need to know which database originally encrypted the page --- any
database can decrypt it.

> Also, how are key changes handled?  Do we need to store a key identifier
> in each page?

Uh, we have not started discussing that yet.  I am thinking we might
need to store the key identifier in the pg_class table and then create a
command to re-encrypt tables.  We can re-key a page at a time, but we
would still need to know when all pages/tables are no longer using the
old key, so doing it just at the table/index level seems appropriate.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +




> what is it that gets stored in the page for
> decryption use, the nonce or the IV derived from it?

I believe storing the IV is preferable and still secure per [1]: "The IV need not be secret"

Beyond needing the database oid, if every decrypt function has to regenerate the IV from the nonce that will affect performance.  I don't know how expensive the forward hash is but it won't be free.




Ryan Lambert

 
Greetings,

* Ryan Lambert (ryan@rustprooflabs.com) wrote:
> > what is it that gets stored in the page for
> > decryption use, the nonce or the IV derived from it?
>
> I believe storing the IV is preferable and still secure per [1]: "The IV
> need not be secret"
>
> Beyond needing the database oid, if every decrypt function has to
> regenerate the IV from the nonce that will affect performance.  I don't
> know how expensive the forward hash is but it won't be free.

Compared to the syscall and possible disk i/o required, I'm not sure
that's something we really need to try to optimize for, particularly if
we could store something more generally useful (like the LSN) in that
little bit of space that's available in each page.

Thanks,

Stephen

On Wed, Jul 10, 2019 at 12:38:02PM -0600, Ryan Lambert wrote:
> 
>     what is it that gets stored in the page for
>     decryption use, the nonce or the IV derived from it?
> 
> 
> I believe storing the IV is preferable and still secure per [1]: "The IV need
> not be secret"
> 
> Beyond needing the database oid, if every decrypt function has to regenerate
> the IV from the nonce that will affect performance.  I don't know how expensive
> the forward hash is but it won't be free.

Well, I think we have three options.  We have 3 4-byte integers
(pg_class.oid, LSN, page-number) that could be concatenated to be the
IV, we could run those through a hash, or we could run them through the
encryption function with the secret.
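
For concreteness, here is a minimal C sketch of the first option ---
plain concatenation into a 16-byte IV buffer.  The names are
hypothetical (this is not from any patch), the LSN is treated as 4
bytes to match the framing above even though the real XLogRecPtr is 8
bytes, and the fields are copied in native byte order, which a real
implementation would need to pin down:

    #include <stdint.h>
    #include <string.h>

    static void
    build_page_iv(uint32_t reloid, uint32_t lsn, uint32_t pageno,
                  unsigned char iv[16])
    {
        memset(iv, 0, 16);           /* zero-pad the unused 4 bytes */
        memcpy(iv + 0, &reloid, 4);  /* pg_class.oid */
        memcpy(iv + 4, &lsn, 4);     /* page LSN (low 32 bits only) */
        memcpy(iv + 8, &pageno, 4);  /* block number in the relation */
    }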

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Greetings,

* Bruce Momjian (bruce@momjian.us) wrote:
> On Wed, Jul 10, 2019 at 12:38:02PM -0600, Ryan Lambert wrote:
> >
> >     what is it that gets stored in the page for
> >     decryption use, the nonce or the IV derived from it?
> >
> >
> > I believe storing the IV is preferable and still secure per [1]: "The IV need
> > not be secret"
> >
> > Beyond needing the database oid, if every decrypt function has to regenerate
> > the IV from the nonce that will affect performance.  I don't know how expensive
> > the forward hash is but it won't be free.
>
> Well, I think we have three options.  We have 3 4-byte integers
> (pg_class.oid, LSN, page-number) that could be concatenated to be the
> IV, we could run those through a hash, or we could run them through the
> encryption function with the secret.

I didn't see where it was said that using a hash was a good idea in this
context..?  Encrypting it with the key looked like it was discussed as a
viable option.  I had understood that part of the point of using the
table OID and page-number was also so that we didn't have to explicitly
store the result, therefore requiring us to need less space on the page
to make this happen.

Thanks,

Stephen

On Wed, Jul 10, 2019 at 02:44:30PM -0400, Stephen Frost wrote:
> Greetings,
> 
> * Bruce Momjian (bruce@momjian.us) wrote:
> > On Wed, Jul 10, 2019 at 12:38:02PM -0600, Ryan Lambert wrote:
> > > 
> > >     what is it that gets stored in the page for
> > >     decryption use, the nonce or the IV derived from it?
> > > 
> > > 
> > > I believe storing the IV is preferable and still secure per [1]: "The IV need
> > > not be secret"
> > > 
> > > Beyond needing the database oid, if every decrypt function has to regenerate
> > > the IV from the nonce that will affect performance.  I don't know how expensive
> > > the forward hash is but it won't be free.
> > 
> > Well, I think we have three options.  We have 3 4-byte integers
> > (pg_class.oid, LSN, page-number) that could be concatenated to be the
> > IV, we could run those through a hash, or we could run them through the
> > encryption function with the secret.
> 
> I didn't see where it was said that using a hash was a good idea in this
> context..?  Encrypting it with the key looked like it was discussed as a

I didn't either, except it was referenced above as "forward hash".  I
don't know why that was suggested, which is why I listed it as an
option/suggestion.

> viable option.  I had understood that part of the point of using the

Agreed.

> table OID and page-number was also so that we didn't have to explicitly
> store the result, therefore requiring us to need less space on the page
> to make this happen.

Yep!

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



> I didn't either, except it was referenced above as "forward hash".  I
> don't know why that was suggested, which is why I listed it as an
> option/suggestion.

My bad, sorry for the confusion!  I meant to say "cipher" not "hash".  I was (trying to) refer to the method of generating unpredictable IV from nonces using the forward cipher function and the encryption key.
Too many closely related words with very specific meanings.

Ryan



Greetings,

* Ryan Lambert (ryan@rustprooflabs.com) wrote:
> > I didn't either, except it was referenced above as "forward hash".  I
> > don't know why that was suggested, which is why I listed it as an
> > option/suggestion.
>
> My bad, sorry for the confusion!  I meant to say "cipher" not "hash".  I
> was (trying to) refer to the method of generating unpredictable IV from
> nonces using the forward cipher function and the encryption key.
> Too many closely related words with very specific meanings.

No worries, just want to try and be clear on these things..  Too easy to
mistakenly think that doing this very-similar-thing will be as secure as
doing the recommended-thing (particularly when the recommended-thing is
a lot harder...), and we don't want to end up doing that and then
discovering it isn't actually secure..

Thanks!

Stephen

On Wed, Jul 10, 2019 at 02:57:54PM -0400, Stephen Frost wrote:
> Greetings,
> 
> * Ryan Lambert (ryan@rustprooflabs.com) wrote:
> > > I didn't either, except it was referenced above as "forward hash".  I
> > > don't know why that was suggested, which is why I listed it as an
> > > option/suggestion.
> > 
> > My bad, sorry for the confusion!  I meant to say "cipher" not "hash".  I
> > was (trying to) refer to the method of generating unpredictable IV from
> > nonces using the forward cipher function and the encryption key.
> > Too many closely related words with very specific meanings.
> 
> No worries, just want to try and be clear on these things..  Too easy to
> mistakenly think that doing this very-similar-thing will be as secure as
> doing the recommended-thing (particularly when the recommended-thing is
> a lot harder...), and we don't want to end up doing that and then
> discovering it isn't actually secure..

Good, so I think we all now agree we have to put the nonce
(pg_class.oid, LSN, page-number) through the cipher using the secret.  I
think Stephen is right that the overhead of this will be minimal for 8k
page writes, and for WAL, we only need to generate the IV when we start
a new 16MB segment, so again, minimal overhead.
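
As a concrete sketch of what "through the cipher" could look like
(assumptions, not code from any patch): run the 16-byte nonce block
through one AES-128-ECB encryption with the secret key, i.e. the
forward cipher function that NIST SP 800-38A suggests for generating
unpredictable IVs:

    #include <openssl/evp.h>

    static int
    derive_iv(const unsigned char key[16],
              const unsigned char nonce[16], /* oid || LSN || page, padded */
              unsigned char iv[16])
    {
        EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
        int len, ok = 0;

        if (ctx == NULL)
            return 0;
        if (EVP_EncryptInit_ex(ctx, EVP_aes_128_ecb(), NULL, key, NULL) == 1)
        {
            EVP_CIPHER_CTX_set_padding(ctx, 0); /* exactly one block in/out */
            ok = (EVP_EncryptUpdate(ctx, iv, &len, nonce, 16) == 1);
        }
        EVP_CIPHER_CTX_free(ctx);
        return ok;
    }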

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On 2019-Jul-10, Bruce Momjian wrote:

> Good, so I think we all now agree we have to put the nonce
> (pg_class.oid, LSN, page-number) through the cipher using the secret.

Actually, why do you need the page number in the nonce?  The LSN already
distinguishes pages -- you can't have two pages with the same LSN, can
you?  (I do think you can have multiple writes of the same page with
different LSNs, if you change hint bits and don't write WAL about it,
but maybe we should force CRC enabled in encrypted tables, which I think
closes this hole?)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Greetings,

* Alvaro Herrera (alvherre@2ndquadrant.com) wrote:
> On 2019-Jul-10, Bruce Momjian wrote:
>
> > Good, so I think we all now agree we have to put the nonce
> > (pg_class.oid, LSN, page-number) through the cipher using the secret.
>
> Actually, why do you need the page number in the nonce?  The LSN already
> distinguishes pages -- you can't have two pages with the same LSN, can
> you?  (I do think you can have multiple writes of the same page with
> different LSNs, if you change hint bits and don't write WAL about it,
> but maybe we should force CRC enabled in encrypted tables, which I think
> closes this hole?)

The point about the LSN not changing is definitely a very good one..  I
agree that we should require checksums to deal with that possibility.

Thanks,

Stephen

On Wed, Jul 10, 2019 at 03:53:55PM -0400, Alvaro Herrera wrote:
> On 2019-Jul-10, Bruce Momjian wrote:
> 
> > Good, so I think we all now agree we have to put the nonce
> > (pg_class.oid, LSN, page-number) through the cipher using the secret.
> 
> Actually, why do you need the page number in the nonce?  The LSN already
> distinguishes pages -- you can't have two pages with the same LSN, can
> you?  (I do think you can have multiple writes of the same page with
> different LSNs, if you change hint bits and don't write WAL about it,
> but maybe we should force CRC enabled in encrypted tables, which I think
> closes this hole?)

Uh, what if a transaction modifies page 0 and page 1 of the same table
--- don't those pages have the same LSN?

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On 2019-Jul-10, Bruce Momjian wrote:

> Uh, what if a transaction modifies page 0 and page 1 of the same table
> --- don't those pages have the same LSN.

No, because WAL being a physical change log, each page gets its own
WAL record with its own LSN.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



On 7/10/19 3:53 PM, Alvaro Herrera wrote:
> On 2019-Jul-10, Bruce Momjian wrote:
> 
>> Good, so I think we all now agree we have to put the nonce
>> (pg_class.oid, LSN, page-number) through the cipher using the secret.

(been traveling -- just trying to get caught up on this thread)

> Actually, why do you need the page number in the nonce?  The LSN already
> distinguishes pages -- you can't have two pages with the same LSN, can
> you?  (I do think you can have multiple writes of the same page with
> different LSNs, if you change hint bits and don't write WAL about it,

Do you mean "multiple writes of the same page without..."?

> but maybe we should force CRC enabled in encrypted tables, which I think
> closes this hole?)

If we can use the LSN (perhaps with CRC) without the page number that
would seem to be a good idea.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



On Wed, Jul 10, 2019 at 04:11:21PM -0400, Alvaro Herrera wrote:
>On 2019-Jul-10, Bruce Momjian wrote:
>
>> Uh, what if a transaction modifies page 0 and page 1 of the same table
>> --- don't those pages have the same LSN.
>
>No, because WAL being a physical change log, each page gets its own
>WAL record with its own LSN.
>

What if you have wal_log_hints=off? AFAIK that won't change the page LSN.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



Greetings,

* Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:
> On Wed, Jul 10, 2019 at 04:11:21PM -0400, Alvaro Herrera wrote:
> >On 2019-Jul-10, Bruce Momjian wrote:
> >
> >>Uh, what if a transaction modifies page 0 and page 1 of the same table
> >>--- don't those pages have the same LSN.
> >
> >No, because WAL being a physical change log, each page gets its own
> >WAL record with its own LSN.
> >
>
> What if you have wal_log_hints=off? AFAIK that won't change the page LSN.

Alvaro suggested elsewhere that we require checksums for these, which
would also force wal_log_hints to be on, and therefore the LSN would
change.

Thanks,

Stephen

On Wed, Jul 10, 2019 at 06:04:30PM -0400, Stephen Frost wrote:
>Greetings,
>
>* Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:
>> On Wed, Jul 10, 2019 at 04:11:21PM -0400, Alvaro Herrera wrote:
>> >On 2019-Jul-10, Bruce Momjian wrote:
>> >
>> >>Uh, what if a transaction modifies page 0 and page 1 of the same table
>> >>--- don't those pages have the same LSN.
>> >
>> >No, because WAL being a physical change log, each page gets its own
>> >WAL record with its own LSN.
>> >
>>
>> What if you have wal_log_hints=off? AFAIK that won't change the page LSN.
>
>Alvaro suggested elsewhere that we require checksums for these, which
>would also force wal_log_hints to be on, and therefore the LSN would
>change.
>

Oh, I see - yes, that would solve the hint bits issue. Not sure we want
to combine the features like this, though, as it increases the costs of
TDE. But maybe it's the best solution.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



On 2019-Jul-10, Joe Conway wrote:

> On 7/10/19 3:53 PM, Alvaro Herrera wrote:

> > (I do think you can have multiple writes of the same page with
> > different LSNs, if you change hint bits and don't write WAL about it,
> 
> Do you mean "multiple writes of the same page without..."?

Right, "twice the same page with the same LSN" is what I was thinking,
which is basically the question Tomas asked afterwards.

> > but maybe we should force CRC enabled in encrypted tables, which I think
> > closes this hole?)
> 
> If we can use the LSN (perhaps with CRC) without the page number that
> would seem to be a good idea.

Umm, I'm not advocating using the CRC as part of the nonce, because that
seems a terrible idea.  I was just saying that if you enable CRC, then
even hint bit changes cause LSN changes (and thus IV changes) because of
the necessary FPIs, so you shouldn't get two writes with the same LSN.

With all this said, I think the case for writing two pages with the same
IV is being overstated a little bit.  As I understand, the reason we
want to avoid using the same IV for too many pages is to dodge a
cryptanalysis attack, which requires a large amount of data encrypted
with the same key/IV in order to be effective.  But if we have two
copies of the same page encrypted with the same key/IV, yes it's twice
as much data as just one copy of the page with that key/IV, but it still
seems like a sufficiently low amount of data that cryptanalysis is
unfeasible.  Right?  I mean, webservers send hundreds of kilobytes
encrypted with the same key; they avoid sending megabytes of it with the
same key/IV, but getting too worked up about 16 kB when we think 8 kB is
fine seems over the top.

So I guess the question is how much data is considered sufficient for a
successful, practical cryptanalysis attack?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




> As I understand, the reason we
> want to avoid using the same IV for too many pages is to dodge a
> cryptanalysis attack, which requires a large amount of data encrypted
> with the same key/IV in order to be effective.  But if we have two
> copies of the same page encrypted with the same key/IV, yes it's twice
> as much data as just one copy of the page with that key/IV, but it still
> seems like a sufficiently low amount of data that cryptanalysis is
> unfeasible.  Right?  I mean, webservers send hundreds of kilobytes
> encrypted with the same key; they avoid sending megabytes of it with the
> same key/IV, but getting too worked up about 16 kB when we think 8 kB is
> fine seems over the top.
> So I guess the question is how much data is considered sufficient for a
> successful, practical cryptanalysis attack?

Yes, a cryptanalysis attack could hypothetically derive critical info about the key from two encrypted blocks with the same IV.

A major (very important) difference with web servers is they use asymmetric encryption with the client to negotiate and share the secure symmetric encryption key for that session.  The vast majority of the encryption work is done w/ short lived symmetric keys, not the TLS keys we all think of (because that's what we configure).  Many DB encryption keys (symmetric) will live for a number of years, so the attack vectors and timelines are far different.  By contrast, the longest CA TLS keys through paid vendors are typically 2 years, most are 1, LetsEncrypt certs only live 3 months.

Are there any metrics on how long a page can live without being modified in one way or another to trigger re-encryption with a new IV?  Is it possible that a single page could live essentially forever without being modified?  If it's the latter, then I would opt on the side of paranoia due to the expected lifecycle of keys.  If it's the former, it probably merits further discussion on the paranoia requirements.  Another consideration: if someone can get this data and there are a LOT of pages sharing IVs, the exposure increases significantly, probably not linearly.

Another (probably bad) idea is if there was a REENCRYPT DATABASE, the hyper-paranoid could force a full rewrite as often as they want.  Large databases seem to be the ones that are most likely to have long living pages, and the least likely to want to wait to reencrypt the whole thing.

Ryan Lambert


On Wed, Jul 10, 2019 at 06:28:42PM -0400, Alvaro Herrera wrote:
> On 2019-Jul-10, Joe Conway wrote:
> 
> > On 7/10/19 3:53 PM, Alvaro Herrera wrote:
> 
> > > (I do think you can have multiple writes of the same page with
> > > different LSNs, if you change hint bits and don't write WAL about it,
> > 
> > Do you mean "multiple writes of the same page without..."?
> 
> Right, "twice the same page with the same LSN" is what I was thinking,
> which is basically the question Tomas asked afterwards.

Just to clarify, the case being discussed above is where we modify a
page with a new row, write the page with an LSN, then change a hint bit
on the page, and, if wal_log_hints is off, write the page with the
hint bit change without changing the LSN since we didn't log hint bits.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Thu, Jul 11, 2019 at 12:18:47AM +0200, Tomas Vondra wrote:
> On Wed, Jul 10, 2019 at 06:04:30PM -0400, Stephen Frost wrote:
> > Greetings,
> > 
> > * Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:
> > > On Wed, Jul 10, 2019 at 04:11:21PM -0400, Alvaro Herrera wrote:
> > > >On 2019-Jul-10, Bruce Momjian wrote:
> > > >
> > > >>Uh, what if a transaction modifies page 0 and page 1 of the same table
> > > >>--- don't those pages have the same LSN.
> > > >
> > > >No, because WAL being a physical change log, each page gets its own
> > > >WAL record with its own LSN.
> > > >
> > > 
> > > What if you have wal_log_hints=off? AFAIK that won't change the page LSN.
> > 
> > Alvaro suggested elsewhere that we require checksums for these, which
> > would also force wal_log_hints to be on, and therefore the LSN would
> > change.
> > 
> 
> Oh, I see - yes, that would solve the hint bits issue. Not sure we want
> to combine the features like this, though, as it increases the costs of
> TDE. But maybe it's the best solution.

Uh, why can't we just force wal_log_hints for encrypted tables?  Why
would we need to use checksums as well?

Why is page-number not needed in the nonce?  Because it is duplicative
of the LSN?  Can we use just LSN?  Do we need pg_class.oid too?

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



> > > > Uh, what if a transaction modifies page 0 and page 1 of the same table
> > > > --- don't those pages have the same LSN.
> > >
> > > No, because WAL being a physical change log, each page gets its own
> > > WAL record with its own LSN.
> >
> > What if you have wal_log_hints=off? AFAIK that won't change the page LSN.
>
> Alvaro suggested elsewhere that we require checksums for these, which
> would also force wal_log_hints to be on, and therefore the LSN would
> change.

Yes, it sounds like the agreement was LSN is unique when wal_log_hints is on.  I don't know enough about the internals to know if pg_class.oid is also needed or not.

Ryan

Joe Conway <mail@joeconway.com> wrote:

> Please see my other reply (and
> https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-38a.pdf
> appendix C as pointed out by Ryan downthread).

Thanks.

> At least in my mind, I trust a published specification from the
> nation-state level over random blogs or wikipedia. If we can find some
> equivalent published standards that contradict NIST we should discuss
> it, but for my money I would prefer to stick with the NIST recommended
> method to produce the IVs.

I don't think of this as a problem of trusting A over B. Those blogs try to
explain the attacks in detail, while the NIST standard is just a set of
recommendations that does not (try to) provide technical details of comparable
depth.

Although I prefer understanding things in detail, I think it's o.k. to say in
documentation that "we use ... cipher because it complies with ... standard".

--
Antonin Houska
Web: https://www.cybertec-postgresql.com



On Thu, Jul 11, 2019 at 06:45:36AM -0600, Ryan Lambert wrote:
> > > > > Uh, what if a transaction modifies page 0 and page 1 of the same table
> > > > > --- don't those pages have the same LSN.
> > > >
> > > > No, because WAL being a physical change log, each page gets its own
> > > > WAL record with its own LSN.
> > >
> > > What if you have wal_log_hints=off? AFAIK that won't change the page LSN.
> >
> > Alvaro suggested elsewhere that we require checksums for these, which
> > would also force wal_log_hints to be on, and therefore the LSN would
> > change.
>
> Yes, it sounds like the agreement was LSN is unique when wal_log_hints is on. 
> I don't know enough about the internals to know if pg_class.oid is also needed
> or not.

Well, so, as far as we know now, every change to a heap/index page
advances the LSN, except for hint bits, which we can force to advance
LSN via wal_log_hints.  We automatically enable wal_log_hints for
checksums, so we can easily enable it automatically for encrypted
clusters.

I assume the LSN used for 8k pages and the segment numbers (for WAL) do
not overlap in numbering, for our nonce.

I think we will eventually have to have this setup verified by security
experts but I think we need to work through what is possible using
existing Postgres facilities before we present a possible solution.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Tue, Jul 9, 2019 at 10:43 AM Bruce Momjian <bruce@momjian.us> wrote:
> FYI, pg_upgrade already preserves the pg_class.oid, which is why I
> recommended it over pg_class.relfilenode:

I think it's strange that pg_upgrade does not preserve the
relfilenode.  I think it would probably make more sense if it did.

Anyway, leaving that aside, you have to be able to read pg_class to
know the OID of a table, and you can't do that in recovery before
reaching consistency. Yet, you still need to be able to modify disk
blocks at that point, to finish recovery. So I can't see how any
system that involves figuring out the nonce from the OID would ever
work.

If we end up with random nonces, we're going to have to store them
someplace - either in some unencrypted portion of the disk blocks
themselves, or in a separate fork, or someplace else. If it's OK for
them to be predictable as long as they vary a lot, we could derive them
from DBOID + RELFILENODE + FORK + BLOCK, but not from DBOID + RELOID +
FORK + BLOCK, because of the aforementioned recovery problem.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



On Thu, Jul 11, 2019 at 03:47:50PM -0400, Robert Haas wrote:
> On Tue, Jul 9, 2019 at 10:43 AM Bruce Momjian <bruce@momjian.us> wrote:
> > FYI, pg_upgrade already preserves the pg_class.oid, which is why I
> > recommended it over pg_class.relfilenode:
> 
> I think it's strange that pg_upgrade does not preserve the
> relfilenode.  I think it would probably make more sense if it did.
> 
> Anyway, leaving that aside, you have to be able to read pg_class to
> know the OID of a table, and you can't do that in recovery before
> reaching consistency. Yet, you still need to be able to modify disk
> blocks at that point, to finish recovery. So I can't see how any
> system that involves figuring out the nonce from the OID would ever
> work.
> 
> If we end up with random nonces, we're going to have to store them
> someplace - either in some unencrypted portion of the disk blocks
> themselves, or in a separate fork, or someplace else. If it's OK for
> them to be predictable as long as they vary a lot, we could derive them
> from DBOID + RELFILENODE + FORK + BLOCK, but not from DBOID + RELOID +
> FORK + BLOCK, because of the aforementioned recovery problem.

Later in this thread, we decided that the page LSN was the best option
as a nonce because it changes every time the page changes.  (We will
enable wal_log_hints.)  We will not encrypt the first 16 bytes of the
page so it can be used for the nonce. (AES block ciphers use 16-byte
blocks.)
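
A sketch of what that layout implies (hypothetical names, not the
actual patch): with the first 16 bytes left in the clear, the remaining
8192 - 16 = 8176 bytes are an exact multiple of the 16-byte AES block,
so a block mode like the CBC discussed elsewhere in this thread needs
no padding:

    #include <openssl/evp.h>

    #define BLCKSZ           8192
    #define PAGE_CLEAR_BYTES 16    /* unencrypted prefix holding the LSN */

    static int
    encrypt_page(const unsigned char key[16], const unsigned char iv[16],
                 unsigned char *page)
    {
        EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
        int len, ok = 0;

        if (ctx == NULL)
            return 0;
        if (EVP_EncryptInit_ex(ctx, EVP_aes_128_cbc(), NULL, key, iv) == 1)
        {
            EVP_CIPHER_CTX_set_padding(ctx, 0);
            /* encrypt in place, skipping the clear-text prefix */
            ok = (EVP_EncryptUpdate(ctx, page + PAGE_CLEAR_BYTES, &len,
                                    page + PAGE_CLEAR_BYTES,
                                    BLCKSZ - PAGE_CLEAR_BYTES) == 1);
        }
        EVP_CIPHER_CTX_free(ctx);
        return ok;
    }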

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Wed, Jul 10, 2019 at 12:26:24PM -0400, Bruce Momjian wrote:
> On Wed, Jul 10, 2019 at 08:31:17AM -0400, Joe Conway wrote:
> > Please see my other reply (and
> > https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-38a.pdf
> > appendix C as pointed out by Ryan downthread).
> > 
> > At least in my mind, I trust a published specification from the
> > nation-state level over random blogs or wikipedia. If we can find some
> > equivalent published standards that contradict NIST we should discuss
> > it, but for my money I would prefer to stick with the NIST recommended
> > method to produce the IVs.
> 
> So, we have had a flurry of activity on this thread in the past day, so
> let me summarize:

Seems we have an updated approach:

First, we need to store the symmetric encryption key in the data
directory, like we do for SSL certificates and private keys.  (Crash
recovery needs access to this key, so we can't easily store it in a
database table.)  We will pattern it after the GUC
ssl_passphrase_command.   We will need to decide on a format for the
symmetric encryption key in the file so we can check that the supplied
passphrase properly unlocks the key.
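
One hypothetical way such a check could work (nothing here is an agreed
format; it only illustrates the idea): store a salt, plus a known magic
string and the data key encrypted under a passphrase-derived key, and
accept the passphrase only if decryption reproduces the magic:

    #include <openssl/evp.h>
    #include <string.h>

    #define KDF_ITERATIONS 100000

    static int
    check_passphrase(const char *passphrase,
                     const unsigned char salt[16],
                     const unsigned char iv[16],
                     const unsigned char wrapped[32], /* magic || data key */
                     unsigned char datakey[16])
    {
        static const unsigned char magic[16] = "PG_TDE_KEY_V1\0\0";
        unsigned char kek[16], plain[32];
        EVP_CIPHER_CTX *ctx;
        int len, ok = 0;

        /* derive a key-encryption key from the passphrase */
        if (PKCS5_PBKDF2_HMAC(passphrase, (int) strlen(passphrase),
                              salt, 16, KDF_ITERATIONS,
                              EVP_sha256(), 16, kek) != 1)
            return 0;

        ctx = EVP_CIPHER_CTX_new();
        if (ctx == NULL)
            return 0;
        if (EVP_DecryptInit_ex(ctx, EVP_aes_128_cbc(), NULL, kek, iv) == 1)
        {
            EVP_CIPHER_CTX_set_padding(ctx, 0);
            if (EVP_DecryptUpdate(ctx, plain, &len, wrapped, 32) == 1 &&
                memcmp(plain, magic, 16) == 0)
            {
                memcpy(datakey, plain + 16, 16); /* passphrase was right */
                ok = 1;
            }
        }
        EVP_CIPHER_CTX_free(ctx);
        return ok;
    }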

Our first implementation will encrypt the entire cluster.  We can later
consider encryption per table or tablespace.  It is unclear if
encrypting different parts of the system with different keys is useful
or feasible.  (This is separate from key rotation.)

We will use CBC AES128 mode for tables/indexes, and CTR AES128 for WAL.
8k pages will use the LSN as a nonce, which will be encrypted to
generate the initialization vector (IV).  We will not encrypt the first
16 bytes of each page so the LSN can be used in this way.  The WAL will
use the WAL file segment number as the nonce and the IV will be created
in the same way.
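
For the WAL side, a sketch under the same assumptions (illustrative
names only): the segment number fills the 16-byte nonce, the IV is the
encrypted nonce as above, and CTR mode streams through the segment, so
one initialization per 16MB segment is enough:

    #include <openssl/evp.h>

    static int
    encrypt_wal_segment(const unsigned char key[16],
                        const unsigned char iv[16], /* encrypted nonce */
                        unsigned char *seg, int nbytes)
    {
        EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
        int len, ok = 0;

        if (ctx == NULL)
            return 0;
        /* CTR advances its own counter across the whole segment */
        if (EVP_EncryptInit_ex(ctx, EVP_aes_128_ctr(), NULL, key, iv) == 1)
            ok = (EVP_EncryptUpdate(ctx, seg, &len, seg, nbytes) == 1);
        EVP_CIPHER_CTX_free(ctx);
        return ok;
    }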

wal_log_hints will be enabled automatically in encryption mode, like we
do for checksum mode, so we never encrypt different 8k pages with the
same IV.

There will need to be a pg_control field to indicate that encryption is
in use.

Right now we don't support the online changing of a cluster's checksum
mode, so I suggest we create a utility like pg_checksums --enable to
allow offline key rotation.  Once we get online checksum mode changing
ability, we can look into using that for encryption key rotation.

Did I miss anything?

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On 7/11/19 6:37 PM, Bruce Momjian wrote:
> On Wed, Jul 10, 2019 at 12:26:24PM -0400, Bruce Momjian wrote:
>> On Wed, Jul 10, 2019 at 08:31:17AM -0400, Joe Conway wrote:
>>> Please see my other reply (and
>>> https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-38a.pdf
>>> appendix C as pointed out by Ryan downthread).
>>>
>>> At least in my mind, I trust a published specification from the
>>> nation-state level over random blogs or wikipedia. If we can find some
>>> equivalent published standards that contradict NIST we should discuss
>>> it, but for my money I would prefer to stick with the NIST recommended
>>> method to produce the IVs.
>>
>> So, we have had a flurry of activity on this thread in the past day, so
>> let me summarize:
> 
> Seems we have an updated approach:

I tried to keep up with this thread, and may have failed, but comments
inline...

> First, we need to store the symmetric encryption key in the data
> directory, like we do for SSL certificates and private keys.  (Crash
> recovery needs access to this key, so we can't easily store it in a
> database table.)  We will pattern it after the GUC
> ssl_passphrase_command.   We will need to decide on a format for the
> symmetric encryption key in the file so we can check that the supplied
> passphrase properly unlocks the key.
> 
> Our first implementation will encrypt the entire cluster.  We can later
> consider encryption per table or tablespace.  It is unclear if
> encrypting different parts of the system with different keys is useful
> or feasible.  (This is separate from key rotation.)

I still object strongly to using a single key for the entire database. I
think we can use a single key for WAL, but we need some way to split the
heap so that multiple keys are used. If not by tablespace, then some
other method.

Regardless of the method to split the heap into different keys, I think
there should be an option for some tables to not be encrypted. If we
decide it must be all or nothing for the first implementation I guess I
could live with it but would be very disappointed.

The keys themselves should be in a file which is encrypted by a master
key. Obtaining the master key should be patterned after the GUC
ssl_passphrase_command.

> We will use CBC AES128 mode for tables/indexes, and CTR AES128 for WAL.
> 8k pages will use the LSN as a nonce, which will be encrypted to
> generate the initialization vector (IV).  We will not encrypt the first
> 16 bytes of each page so the LSN can be used in this way.  The WAL will
> use the WAL file segment number as the nonce and the IV will be created
> in the same way.

I vote for AES 256 rather than 128.

Did we determine that we no longer need table oid because LSNs are
sufficiently unique?

> wal_log_hints will be enabled automatically in encryption mode, like we
> do for checksum mode, so we never encrypt different 8k pages with the
> same IV.

check

> There will need to be a pg_control field to indicate that encryption is
> in use.

I didn't see that discussed but it makes sense.

> Right now we don't support the online changing of a cluster's checksum
> mode, so I suggest we create a utility like pg_checksums --enable to
> allow offline key rotation.  Once we get online checksum mode changing
> ability, we can look into using that for encryption key rotation.

Master key rotation should be trivial if we do it the way I discussed
above. Rotating the individual heap and WAL keys would certainly be a
bigger problem.

Thinking out loud (and I believe somewhere in this massive thread
someone else already said this), if we had a way to flag "key version"
at the page level it seems like we could potentially rekey page-by-page
while online, locking only one page at a time. We really only need to
support 2 key versions and could ping-pong between them as they change.
Or maybe this is a crazy idea.
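
A compilable sketch of that idea (every name here is invented; the
page_encrypt/page_decrypt calls are stand-ins for the real paths, and
the real thing would lock exactly one page at a time):

    #include <stddef.h>
    #include <stdint.h>

    #define BLCKSZ 8192

    /* toy page with a one-byte tag for which active key encrypted it */
    typedef struct
    {
        uint8_t       key_version;       /* ping-pongs between 0 and 1 */
        unsigned char data[BLCKSZ - 1];
    } ToyPage;

    extern void page_decrypt(ToyPage *p, uint8_t version);
    extern void page_encrypt(ToyPage *p, uint8_t version);

    static void
    rekey_pages(ToyPage *pages, size_t npages, uint8_t new_version)
    {
        for (size_t i = 0; i < npages; i++)
        {
            if (pages[i].key_version == new_version)
                continue;            /* already re-keyed */
            page_decrypt(&pages[i], pages[i].key_version);
            page_encrypt(&pages[i], new_version);
            pages[i].key_version = new_version;
        }
    }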

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



On Thu, Jul 11, 2019 at 08:41:52PM -0400, Joe Conway wrote:
> On 7/11/19 6:37 PM, Bruce Momjian wrote:
> > Our first implementation will encrypt the entire cluster.  We can later
> > consider encryption per table or tablespace.  It is unclear if
> > encrypting different parts of the system with different keys is useful
> > or feasible.  (This is separate from key rotation.)
> 
> I still object strongly to using a single key for the entire database. I
> think we can use a single key for WAL, but we need some way to split the
> heap so that multiple keys are used. If not by tablespace, then some
> other method.

What do you base this on?

> Regardless of the method to split the heap into different keys, I think
> there should be an option for some tables to not be encrypted. If we
> decide it must be all or nothing for the first implementation I guess I
> could live with it but would be very disappointed.

What does it mean that you "could live with it"?  Why do you consider having
some tables unencrypted important?

> The keys themselves should be in a file which is encrypted by a master
> key. Obtaining the master key should be patterned after the GUC
> ssl_passphrase_command.
> 
> > We will use CBC AES128 mode for tables/indexes, and CTR AES128 for WAL.
> > 8k pages will use the LSN as a nonce, which will be encrypted to
> > generate the initialization vector (IV).  We will not encrypt the first
> > 16 bytes of each page so the LSN can be used in this way.  The WAL will
> > use the WAL file segment number as the nonce and the IV will be created
> > in the same way.
> 
> I vote for AES 256 rather than 128.

Why?  This page seems to think 128 is sufficient:


https://crypto.stackexchange.com/questions/20/what-are-the-practical-differences-between-256-bit-192-bit-and-128-bit-aes-enc

    For practical purposes, 128-bit keys are sufficient to ensure security.
    The larger key sizes exist mostly to satisfy some US military
    regulations which call for the existence of several distinct "security
    levels", regardless of whether breaking the lowest level is already far
    beyond existing technology.

We might need to run some benchmarks to determine the overhead of going
to AES256, because I am unclear of the security value.

> Did we determine that we no longer need table oid because LSNs are
> sufficiently unique?

I think so.

> > wal_log_hints will be enabled automatically in encryption mode, like we
> > do for checksum mode, so we never encrypt different 8k pages with the
> > same IV.
> 
> check
> 
> > There will need to be a pg_control field to indicate that encryption is
> > in use.
> 
> I didn't see that discussed but it makes sense.

Yes, it seems to make sense, but was not discussed.

> > Right now we don't support the online changing of a cluster's checksum
> > mode, so I suggest we create a utility like pg_checksums --enable to
> > allow offline key rotation.  Once we get online checksum mode changing
> > ability, we can look into using that for encryption key rotation.
> 
> Master key rotation should be trivial if we do it the way I discussed
> above. Rotating the individual heap and WAL keys would certainly be a
> bigger problem.

Yes, sorry, master key rotation is simple.  It is encryption key
rotation that I think needs a tool.

> Thinking out loud (and I believe somewhere in this massive thread
> someone else already said this), if we had a way to flag "key version"
> at the page level it seems like we could potentially rekey page-by-page
> while online, locking only one page at a time. We really only need to
> support 2 key versions and could ping-pong between them as they change.
> Or maybe this is a crazy idea.

Yes, we did talk about this.  It is certainly possible, but we would
still need a tool to guarantee all pages are using the new version, so I
am not sure what per-page buys us except making the later check faster. 
I don't see this as a version-1 feature, frankly.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Thu, Jul 11, 2019 at 06:37:41PM -0400, Bruce Momjian wrote:
> wal_log_hints will be enabled automatically in encryption mode, like we
> do for checksum mode, so we never encrypt different 8k pages with the
> same IV.

Checksum mode can be enabled in encrypted clusters to detect modified
data, since the checksum is encrypted.  The WAL already has checksums.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Fri, Jul 12, 2019 at 7:37 AM Bruce Momjian <bruce@momjian.us> wrote:
>
> On Wed, Jul 10, 2019 at 12:26:24PM -0400, Bruce Momjian wrote:
> > On Wed, Jul 10, 2019 at 08:31:17AM -0400, Joe Conway wrote:
> > > Please see my other reply (and
> > > https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-38a.pdf
> > > appendix C as pointed out by Ryan downthread).
> > >
> > > At least in my mind, I trust a published specification from the
> > > nation-state level over random blogs or wikipedia. If we can find some
> > > equivalent published standards that contradict NIST we should discuss
> > > it, but for my money I would prefer to stick with the NIST recommended
> > > method to produce the IVs.
> >
> > So, we have had a flurry of activity on this thread in the past day, so
> > let me summarize:
>
> Seems we have an updated approach:
>
> First, we need to store the symmetric encryption key in the data
> directory, like we do for SSL certificates and private keys.  (Crash
> recovery needs access to this key, so we can't easily store it in a
> database table.)  We will pattern it after the GUC
> ssl_passphrase_command.   We will need to decide on a format for the
> symmetric encryption key in the file so we can check that the supplied
> passphrase properly unlocks the key.
>
> Our first implementation will encrypt the entire cluster.  We can later
> consider encryption per table or tablespace.  It is unclear if
> encrypting different parts of the system with different keys is useful
> or feasible.  (This is separate from key rotation.)
>
> We will use CBC AES128 mode for tables/indexes, and CTR AES128 for WAL.
> 8k pages will use the LSN as a nonce, which will be encrypted to
> generate the initialization vector (IV).  We will not encrypt the first
> 16 bytes of each page so the LSN can be used in this way.  The WAL will
> use the WAL file segment number as the nonce and the IV will be created
> in the same way.
>
> wal_log_hints will be enabled automatically in encryption mode, like we
> do for checksum mode, so we never encrypt different 8k pages with the
> same IV.

I guess that two different pages can have the same LSN when a heap
update modifies both the page for the old tuple and another page for the
new tuple.

heapam.c:3707
        recptr = log_heap_update(relation, buffer,
                                 newbuf, &oldtup, heaptup,
                                 old_key_tuple,
                                 all_visible_cleared,
                                 all_visible_cleared_new);
        if (newbuf != buffer)
        {
            PageSetLSN(BufferGetPage(newbuf), recptr);
        }
        PageSetLSN(BufferGetPage(buffer), recptr);

Wouldn't it be a problem?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Thank you for summarizing the discussion, it's really helpful. I'll
update the wiki page based on the summary.

On Fri, Jul 12, 2019 at 10:05 AM Bruce Momjian <bruce@momjian.us> wrote:
> > The keys themselves should be in a file which is encrypted by a master
> > key. Obtaining the master key should be patterned after the GUC
> > ssl_passphrase_command.

+1.
I will update the patch set based on the decision on this thread.

> >
> > > We will use CBC AES128 mode for tables/indexes, and CTR AES128 for WAL.
> > > 8k pages will use the LSN as a nonce, which will be encrypted to
> > > generate the initialization vector (IV).  We will not encrypt the first
> > > 16 bytes of each page so the LSN can be used in this way.  The WAL will
> > > use the WAL file segment number as the nonce and the IV will be created
> > > in the same way.
> >
> > I vote for AES 256 rather than 128.
>
> Why?  This page seems to think 128 is sufficient:
>
>
https://crypto.stackexchange.com/questions/20/what-are-the-practical-differences-between-256-bit-192-bit-and-128-bit-aes-enc
>
>         For practical purposes, 128-bit keys are sufficient to ensure security.
>         The larger key sizes exist mostly to satisfy some US military
>         regulations which call for the existence of several distinct "security
>         levels", regardless of whether breaking the lowest level is already far
>         beyond existing technology.
>
> We might need to run some benchmarks to determine the overhead of going
> to AES256, because I am unclear of the security value.

'openssl speed' will help to see the performance differences easily.
FWIW I got the following result in my environment (Intel(R) Core(TM)
i7-3770 CPU @ 3.40GHz).

$ openssl speed -evp aes-128-cbc
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-cbc     642449.60k   656404.63k   700231.23k   706461.71k   706051.44k

$ openssl speed -evp aes-256-cbc
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-256-cbc     466787.73k   496237.08k   503477.16k   507113.32k   508453.80k

Regarding the security value, I found an interesting post by Bruce Schneier.

https://www.schneier.com/blog/archives/2009/07/another_new_aes.html

"And for new applications I suggest that people don't use AES-256.
AES-128 provides more than enough security margin for the foreseeable
future. But if you're already using AES-256, there's no reason to
change."

>
> > Did we determine that we no longer need table oid because LSNs are
> > sufficiently unique?
>
> I think so.
>
> > > wal_log_hints will be enabled automatically in encryption mode, like we
> > > do for checksum mode, so we never encrypt different 8k pages with the
> > > same IV.
> >
> > check
> >
> > > There will need to be a pg_control field to indicate that encryption is
> > > in use.
> >
> > I didn't see that discussed but it makes sense.
>
> Yes, it seems to make sense, but was not discussed.
>
> > > Right now we don't support the online changing of a cluster's checksum
> > > mode, so I suggest we create a utility like pg_checksums --enable to
> > > allow offline key rotation.  Once we get online checksum mode changing
> > > ability, we can look into using that for encryption key rotation.
> >
> > Master key rotation should be trivial if we do it the way I discussed
> > above. Rotating the individual heap and WAL keys would certainly be a
> > bigger problem.
>
> Yes, sorry, master key rotation is simple.  It is encryption key
> rotation that I think needs a tool.

Agreed.

To rotate the master key we can have a SQL function or dedicated SQL
command passing the new master key or the passphrase to postgres.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



On Thu, Jul 11, 2019 at 9:05 PM Bruce Momjian <bruce@momjian.us> wrote:
>
> On Thu, Jul 11, 2019 at 08:41:52PM -0400, Joe Conway wrote:
> > I vote for AES 256 rather than 128.
>
> Why?  This page seems to think 128 is sufficient:
>
>
https://crypto.stackexchange.com/questions/20/what-are-the-practical-differences-between-256-bit-192-bit-and-128-bit-aes-enc
>
>         For practical purposes, 128-bit keys are sufficient to ensure security.
>         The larger key sizes exist mostly to satisfy some US military
>         regulations which call for the existence of several distinct "security
>         levels", regardless of whether breaking the lowest level is already far
>         beyond existing technology.
>
> We might need to run some benchmarks to determine the overhead of going
> to AES256, because I am unclear of the security value.

If the algorithm and key size is not going to be configurable then
better to lean toward the larger size, especially given the desire for
future proofing against standards evolution and potential for the
encrypted data to be very long lived. NIST recommends AES-128 or
higher but there are other publications that recommend AES-256 for
long term usage:

NIST - 2019 : Recommends AES-128, AES-192, or AES-256.
https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-131Ar2.pdf

NSA - 2016 : Recommends AES-256 for future quantum resistance.

https://apps.nsa.gov/iaarchive/library/ia-guidance/ia-solutions-for-classified/algorithm-guidance/cnsa-suite-and-quantum-computing-faq.cfm

ECRYPT - 2015 - Recommends AES-256 for future quantum resistance.
https://www.ecrypt.eu.org/csa/documents/PQC-whitepaper.pdf

ECRYPT - 2018 - Recommends AES-256 for long term use.
https://www.ecrypt.eu.org/csa/documents/D5.4-FinalAlgKeySizeProt.pdf

Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/



On Fri, Jul 12, 2019 at 03:20:37PM +0900, Masahiko Sawada wrote:
> Thank you for summarizing the discussion, it's really helpful. I'll
> update the wiki page based on the summary.
> 
> On Fri, Jul 12, 2019 at 10:05 AM Bruce Momjian <bruce@momjian.us> wrote:
> > > The keys themselves should be in a file which is encrypted by a master
> > > key. Obtaining the master key should be patterned after the GUC
> > > ssl_passphrase_command.
> 
> +1.
> I will update the patch set based on the decision on this thread.

Thanks.

> > > > We will use CBC AES128 mode for tables/indexes, and CTR AES128 for WAL.
> > > > 8k pages will use the LSN as a nonce, which will be encrypted to
> > > > generate the initialization vector (IV).  We will not encrypt the first
> > > > 16 bytes of each page so the LSN can be used in this way.  The WAL will
> > > > use the WAL file segment number as the nonce and the IV will be created
> > > > in the same way.
> > >
> > > I vote for AES 256 rather than 128.
> >
> > Why?  This page seems to think 128 is sufficient:
> >
> >
https://crypto.stackexchange.com/questions/20/what-are-the-practical-differences-between-256-bit-192-bit-and-128-bit-aes-enc
> >
> >         For practical purposes, 128-bit keys are sufficient to ensure security.
> >         The larger key sizes exist mostly to satisfy some US military
> >         regulations which call for the existence of several distinct "security
> >         levels", regardless of whether breaking the lowest level is already far
> >         beyond existing technology.
> >
> > We might need to run some benchmarks to determine the overhead of going
> > to AES256, because I am unclear of the security value.
> 
> 'openssl speed' will help to see the performance differences easily.
> FWIW I got the following result in my environment (Intel(R) Core(TM)
> i7-3770 CPU @ 3.40GHz).
> 
> $ openssl speed -evp aes-128-cbc
> type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
> aes-128-cbc     642449.60k   656404.63k   700231.23k   706461.71k   706051.44k
> 
> $ openssl speed -evp aes-256-cbc
> type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
> aes-256-cbc     466787.73k   496237.08k   503477.16k   507113.32k   508453.80k

I saw similar numbers on my Intel(R) Xeon(R) CPU E5620  @ 2.40GHz with
AES optimization enabled on the CPUs:

    $ grep -i '\<aes\>' /proc/cpuinfo | wc -l
    16

    Doing aes-128-cbc for 3s on 8192 size blocks: 254000 aes-128-cbc's in 3.00s
    Doing aes-256-cbc for 3s on 8192 size blocks: 182496 aes-256-cbc's in 3.00s

which shows AES128 processing about 39% more blocks than AES256 in the
same time (254000 / 182496 is about 1.39), matching the 40% round-count
overhead mentioned here:


https://crypto.stackexchange.com/questions/20/what-are-the-practical-differences-between-256-bit-192-bit-and-128-bit-aes-enc

    The larger key sizes imply some CPU overhead (+20% for a 192-bit key,
    +40% for a 256-bit key: internally, the AES is a sequence of "rounds"
    and the AES standard says that there shall be 10, 12 or 14 rounds, for a
    128-bit, 192-bit or 256-bit key, respectively). So there is some
    rational reason not to use a larger than necessary key.

> Regarding the security value, I found an interesting post by Bruce Schneier.
> 
> https://www.schneier.com/blog/archives/2009/07/another_new_aes.html
> 
> "And for new applications I suggest that people don't use AES-256.
> AES-128 provides more than enough security margin for the forseeable
> future. But if you're already using AES-256, there's no reason to
> change."

Yes, that is what I have heard too.  I think the additional number of
people who use encryption because of its lower overhead will greatly
outweigh the benefit of using AES256 vs AES128.

> > Yes, sorry, master key rotation is simple.  It is encryption key
> > rotation that I think needs a tool.
> 
> Agreed.
> 
> To rotate the master key we can have a SQL function or dedicated SQL
> command passing the new master key or the passphrase to postgres.

Well, depending on how we store the encryption key, we will probably
change the master key via a command-line tool like pg_checksums.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Fri, Jul 12, 2019 at 07:26:21AM -0400, Sehrope Sarkuni wrote:
> On Thu, Jul 11, 2019 at 9:05 PM Bruce Momjian <bruce@momjian.us> wrote:
> >
> > On Thu, Jul 11, 2019 at 08:41:52PM -0400, Joe Conway wrote:
> > > I vote for AES 256 rather than 128.
> >
> > Why?  This page seems to think 128 is sufficient:
> >
> >
https://crypto.stackexchange.com/questions/20/what-are-the-practical-differences-between-256-bit-192-bit-and-128-bit-aes-enc
> >
> >         For practical purposes, 128-bit keys are sufficient to ensure security.
> >         The larger key sizes exist mostly to satisfy some US military
> >         regulations which call for the existence of several distinct "security
> >         levels", regardless of whether breaking the lowest level is already far
> >         beyond existing technology.
> >
> > We might need to run some benchmarks to determine the overhead of going
> > to AES256, because I am unclear of the security value.
> 
> If the algorithm and key size is not going to be configurable then
> better to lean toward the larger size, especially given the desire for
> future proofing against standards evolution and potential for the
> encrypted data to be very long lived. NIST recommends AES-128 or
> higher but there are other publications that recommend AES-256 for
> long term usage:
> 
> NIST - 2019 : Recommends AES-128, AES-192, or AES-256.
> https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-131Ar2.pdf
> 
> NSA - 2016 : Recommends AES-256 for future quantum resistance.
>
https://apps.nsa.gov/iaarchive/library/ia-guidance/ia-solutions-for-classified/algorithm-guidance/cnsa-suite-and-quantum-computing-faq.cfm
> 
> ECRYPT - 2015 - Recommends AES-256 for future quantum resistance.
> https://www.ecrypt.eu.org/csa/documents/PQC-whitepaper.pdf
> 
> ECRYPT - 2018 - Recommends AES-256 for long term use.
> https://www.ecrypt.eu.org/csa/documents/D5.4-FinalAlgKeySizeProt.pdf

Oh, interesting.  Let's see what performance tests with the database
show.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Fri, Jul 12, 2019 at 02:15:02PM +0900, Masahiko Sawada wrote:
> > We will use CBC AES128 mode for tables/indexes, and CTR AES128 for WAL.
> > 8k pages will use the LSN as a nonce, which will be encrypted to
> > generate the initialization vector (IV).  We will not encrypt the first
> > 16 bytes of each page so the LSN can be used in this way.  The WAL will
> > use the WAL file segment number as the nonce and the IV will be created
> > in the same way.
> >
> > wal_log_hints will be enabled automatically in encryption mode, like we
> > do for checksum mode, so we never encrypt different 8k pages with the
> > same IV.
> 
> I guess that two different pages can have the same LSN when a heap
> update modifies both the page for the old tuple and another page for the
> new tuple.
> 
> heapam.c:3707
>         recptr = log_heap_update(relation, buffer,
>                                  newbuf, &oldtup, heaptup,
>                                  old_key_tuple,
>                                  all_visible_cleared,
>                                  all_visible_cleared_new);
>         if (newbuf != buffer)
>         {
>             PageSetLSN(BufferGetPage(newbuf), recptr);
>         }
>         PageSetLSN(BufferGetPage(buffer), recptr);
> 
> Wouldn't it be a problem?

I had the same question.  If someone does:

    UPDATE tab SET col = col + 1

then each row change gets its own LSN.  You are asking if an update that
just expires one row and adds it to a new page gets the same LSN.  I
don't know.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



>> I vote for AES 256 rather than 128.
>
> Why?  This page seems to think 128 is sufficient:
>
>         https://crypto.stackexchange.com/questions/20/what-are-the-practical-differences-between-256-bit-192-bit-and-128-bit-aes-enc
>
>         For practical purposes, 128-bit keys are sufficient to ensure security.
>         The larger key sizes exist mostly to satisfy some US military
>         regulations which call for the existence of several distinct "security
>         levels", regardless of whether breaking the lowest level is already far
>         beyond existing technology.

After researching AES key sizes a bit more, my vote is (surprisingly?) for AES-128.  My reasoning is about security; I did not consider performance impacts in my decision.

The purpose of longer keys over shorter keys is to ensure brute-force attacks are prohibitively expensive.  Finding a flaw in the algorithm is the goal of cryptanalysis; brute-force attacks advance only with increasing computing power.

"The security of a symmetric cryptosystem is a function of two things:  the strength of the algorithm and the length of the key.  The former is more important... " [1] (pg 151)

"The algorithm must be so secure that there is no better way to break it than with a brute-force attack." [1] (pg 152)

Finally, a stated recommendation on key size:  "Insist on at least 112-bit keys." [1] (pg 153)  Schneier also mentions in that section that breaking an 80-bit key (brute force) is likely not realistic for another 30 years.  ETA: 2045 based on the date published.  Brute-forcing a 128-bit key is much further in the future.

Knowing the algorithm is the important part, so on to the strength of the algorithm.  The abstract from [2] states:

"In the case of AES-128, there is no known attack which is faster than the 2^128 complexity of exhaustive search. However, AES-192 and AES-256 were recently shown to be breakable by attacks which require 2^176 and 2^119 time, respectively."

This shows that both AES-128 (2^128) and AES-192 (2^176) provide more protection against this attack than AES-256 does (2^119).  This may be surprising because the AES variants do not all work the same way internally, even though it's "all AES."  Again from [2]:

"The key schedules of AES-128 and AES-192 are slightly different, since they have to apply more mixing operations to the shorter key in order to produce the slightly smaller number of subkeys for the various rounds. This small difference in the key schedules plays a major role in making AES-256 more vulnerable to our attacks, in spite of its longer key and supposedly higher security."

It appears the required key mixing that occurs with shorter key lengths is actually a strength of the underlying algorithm, and simplifying the key mixing is bad.  They go on to restate this in a more succinct and damning way:  "... it clearly indicates that this part of the design of AES-256 is seriously flawed."

Schneier also mentions how small changes can have big impacts on the security: "strong cryptosystems, with a couple of minor changes, can become weak." [1] (pg 152)


[1] Schneier, B. (2015). Applied Cryptography: Protocols, Algorithms and Source Code in C (20th Anniversary ed.). John Wiley & Sons.

[2] Biryukov, A., Dunkelman, O., Keller, N., Khovratovich, D., & Shamir, A. (2009). Key Recovery Attacks of Practical Complexity on AES-256 Variants with up to 10 Rounds. Retrieved from https://eprint.iacr.org/2009/374.pdf


Ryan Lambert
RustProof Labs


On Fri, Jul 12, 2019 at 12:41:19PM -0600, Ryan Lambert wrote:
> >> I vote for AES 256 rather than 128.
> >
> > Why?  This page seems to think 128 is sufficient:
> >
> >         https://crypto.stackexchange.com/questions/20/
> what-are-the-practical-differences-between-256-bit-192-bit-and-128-bit-aes-enc
> >
> >         For practical purposes, 128-bit keys are sufficient to ensure
> security.
> >         The larger key sizes exist mostly to satisfy some US military
> >         regulations which call for the existence of several distinct
> "security
> >         levels", regardless of whether breaking the lowest level is already
> far
> >         beyond existing technology.
> 
> After researching AES key sizes a bit more my vote is (surprisingly?) for
> AES-128.  My reasoning is about security, I did not consider performance
> impacts in my decision.

Thank you for this exhaustive research.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On 7/12/19 2:45 PM, Bruce Momjian wrote:
> On Fri, Jul 12, 2019 at 12:41:19PM -0600, Ryan Lambert wrote:
>> >> I vote for AES 256 rather than 128.
>> >
>> > Why?  This page seems to think 128 is sufficient:
>> >
>> >         https://crypto.stackexchange.com/questions/20/
>> what-are-the-practical-differences-between-256-bit-192-bit-and-128-bit-aes-enc
>> >
>> >         For practical purposes, 128-bit keys are sufficient to ensure
>> security.
>> >         The larger key sizes exist mostly to satisfy some US military
>> >         regulations which call for the existence of several distinct
>> "security
>> >         levels", regardless of whether breaking the lowest level is already
>> far
>> >         beyond existing technology.
>>
>> After researching AES key sizes a bit more my vote is (surprisingly?) for
>> AES-128.  My reasoning is about security, I did not consider performance
>> impacts in my decision.
>
> Thank you for this exhaustive research.

First of all, that is a mischaracterization of the issue. That paper
also states:

"we describe several key derivation attacks of practical complexity on
AES-256 when its number of rounds is reduced to approximately that of
AES-128. The best previously published  attacks  on  such  variants
were  far  from  practical,  requiring  4  related  keys  and  2^120
time  to break a 9-round version of AES-256 [9], and 64 related keys and
2^172time to break a 10-round version of AES-256 ([9], see also [2]). In
this paper we describe an attack on 9-round AES-256 which can findits
complete 256-bit key in 239time by using only the simplest type of
related keys (in which the chosenplaintexts are encrypted under two keys
whose XOR difference can be chosen in many different ways).Our best
attack on 10-round AES-256 requires only two keys and 245time, but it
uses a stronger type ofrelated subkey attack. These attacks can be
extended into a quasi-practical 270attack on 11-round AES,and into a
trivial 226attack on 8-round AES."

Notice 2 key things about this:
1. The attacks described are against a reduced number of rounds. AES256
   is 14 rounds, not 9 or 10.
2. They are "related key" attacks, which can be avoided by not using
   related keys, and we certainly should not be doing that.

Also, please read the links that Sehrope sent earlier if you have not
done so. In particular this one:

https://www.ecrypt.eu.org/csa/documents/PQC-whitepaper.pdf

which says:

"Post-quantum cryptography is an area of cryptography in which systems
are studied under the  security  assumption  that  the  attacker  has  a
 quantum  computer. This  attack  model  is interesting because Shor has
shown a quantum algorithm that breaks RSA, ECC, and finite field
discrete logarithms in polynomial time.  This means that in this model
all commonly used public-key systems are no longer secure.Symmetric
cryptography is also affected but significantly less.  For systems that
do not rely on mathematical structures the main effect is that an
algorithm due to Grover halves the security level, i.e., breaking
AES-128 takes 2^64 quantum operations while current attacks take  2^128
steps.   While  this  is  a  big  change,  it  can  be  managed  quite
easily  by  doubling the key sizes, e.g., by deploying AES-256.  The
operations needed in Grover’s algorithm are inherently  sequential
which  has  led  some  to  doubt  that  even  264quantum  operations
are feasible, but since the remedy of changing to larger key sizes is
very inexpensive it is generally recommended to do so."

In addition, that first paper was written in 2010, yet in 2016 NSA
published one of the other documents referenced by Sehrope:


https://apps.nsa.gov/iaarchive/customcf/openAttachment.cfm?FilePath=/iad/library/ia-guidance/ia-solutions-for-classified/algorithm-guidance/assets/public/upload/CNSA-Suite-and-Quantum-Computing-FAQ.pdf&WpKes=aF6woL7fQp3dJiyWTFKrYn3ZZShmLnzECSjJhf

Which states:

"If you have already implemented Suite B algorithms using the larger
(for TOP SECRET) key sizes, you should continue to use those algorithms
and key sizes through this upcoming transition period. In many products
changing to a larger key size can be done via a configuration change.
Implementations using only the algorithms previously approved for
SECRET and below in Suite B should not be used in NSS. In more precise
terms this means that NSS (National Security Systems) should no longer use
 •ECDH and ECDSA with NIST P-256
 •SHA-256
 •AES-128
 •RSA with 2048-bit keys
 •Diffie-Hellman with 2048-bit keys"


I stand by my position. At a minimum, we need a choice of AES128 and AES256.

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


On 7/11/19 9:05 PM, Bruce Momjian wrote:
> On Thu, Jul 11, 2019 at 08:41:52PM -0400, Joe Conway wrote:
>> On 7/11/19 6:37 PM, Bruce Momjian wrote:
>> > Our first implementation will encrypt the entire cluster.  We can later
>> > consider encryption per table or tablespace.  It is unclear if
>> > encrypting different parts of the system with different keys is useful
>> > or feasible.  (This is separate from key rotation.)
>>
>> I still object strongly to using a single key for the entire database. I
>> think we can use a single key for WAL, but we need some way to split the
>> heap so that multiple keys are used. If not by tablespace, then some
>> other method.
>
> What do you base this on?

I have stated this more than once, and you and Stephen discussed it
earlier as well, but will find all the links back into the thread and
references and address in a separate email.

>> Regardless of the method to split the heap into different keys, I think
>> there should be an option for some tables to not be encrypted. If we
>> decide it must be all or nothing for the first implementation I guess I
>> could live with it but would be very disappointed.
>
> What does it mean that you "could live with it"?  Why do you consider having
> some tables unencrypted important?


I think it is pretty obvious, isn't it? You have talked about the
performance impact. Why would I want to encrypt, for example, a lookup
table, if there is nothing in that table warranting encryption?

I think in many if not most applications the sensitive data is limited
to much less than all of the tables, and I'd rather not take the hit for
those tables.


>> The keys themselves should be in a file which is encrypted by a master
>> key. Obtaining the master key should be patterned after the GUC
>> ssl_passphrase_command.
>>
>> > We will use CBC AES128 mode for tables/indexes, and CTR AES128 for WAL.
>> > 8k pages will use the LSN as a nonce, which will be encrypted to
>> > generate the initialization vector (IV).  We will not encrypt the first
>> > 16 bytes of each pages so the LSN can be used in this way.  The WAL will
>> > use the WAL file segment number as the nonce and the IV will be created
>> > in the same way.
>>
>> I vote for AES 256 rather than 128.
>
> Why?  This page seems to think 128 is sufficient:


Addressed in the other email


>> Thinking out loud (and I believe somewhere in this massive thread
>> someone else already said this), if we had a way to flag "key version"
>> at the page level it seems like we could potentially rekey page-by-page
>> while online, locking only one page at a time. We really only need to
>> support 2 key versions and could ping-pong between them as they change.
>> Or maybe this is a crazy idea.
>
> Yes, we did talk about this.  It is certainly possible, but we would
> still need a tool to guarantee all pages are using the new version, so I
> am not sure what per-page buys us except making the later check faster.
> I don't see this as a version-1 feature, frankly.

If we allow for say, 2 versions of the key to exist at any given time,
and if we could store that key version information on each page, we
could change the key from old to new without locking the entire table at
once, just locking one page at a time. Or at least that was my thinking.
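
To illustrate the ping-pong idea, a toy sketch; the page struct, the
key_version flag, and the XOR "cipher" are placeholders for
illustration only, not PostgreSQL's actual page layout or a real
cipher:

    #include <stdint.h>
    #include <stddef.h>

    #define TOY_PAGE_SIZE 8192

    typedef struct ToyPage
    {
        uint8_t key_version;              /* which key encrypted this page */
        uint8_t data[TOY_PAGE_SIZE - 1];
    } ToyPage;

    /* Placeholder cipher: XOR stands in for AES here. */
    static void
    toy_crypt(uint8_t *buf, size_t len, uint8_t key)
    {
        for (size_t i = 0; i < len; i++)
            buf[i] ^= key;
    }

    /* Re-encrypt one page under the new key while it is locked. */
    static void
    rekey_page(ToyPage *p, const uint8_t keys[2], uint8_t new_version)
    {
        if (p->key_version == new_version)
            return;                       /* already on the new key */

        toy_crypt(p->data, sizeof(p->data), keys[p->key_version]);
        toy_crypt(p->data, sizeof(p->data), keys[new_version]);
        p->key_version = new_version;
    }

A background worker (or the checksum-style online tool mentioned
above) would walk the relation calling this per page until no page
carries the old version.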

Joe
--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


On 7/13/19 9:38 AM, Joe Conway wrote:
> On 7/11/19 9:05 PM, Bruce Momjian wrote:
>> On Thu, Jul 11, 2019 at 08:41:52PM -0400, Joe Conway wrote:
>>> On 7/11/19 6:37 PM, Bruce Momjian wrote:
>>> > Our first implementation will encrypt the entire cluster.  We can later
>>> > consider encryption per table or tablespace.  It is unclear if
>>> > encrypting different parts of the system with different keys is useful
>>> > or feasible.  (This is separate from key rotation.)
>>>
>>> I still object strongly to using a single key for the entire database. I
>>> think we can use a single key for WAL, but we need some way to split the
>>> heap so that multiple keys are used. If not by tablespace, then some
>>> other method.
>>
>> What do you base this on?

Ok, so here we go. See links below. I skimmed through the entire thread
and FWIW it was exhausting.

To some extent this degenerated into a general search for relevant
information:

---
[1] and [2] show that at least some file system encryption uses a
different key per file.
---
[2] also shows that file system encryption uses a KDF (key derivation
function) which we may want to use ourselves. The analogy would be
per-table derived key instead of per file derived key. Note that KDF is
a safe way to derive a key and it is not the same as a "related key"
which was mentioned on another email as an attack vector.
---
[2] also provides additional support for AES 256. It also mentions
CBC versus XTS -- I came across this elsewhere and it bears discussion:

"Currently, the following pairs of encryption modes are supported:

    AES-256-XTS for contents and AES-256-CTS-CBC for filenames
    AES-128-CBC for contents and AES-128-CTS-CBC for filenames
    Adiantum for both contents and filenames

If unsure, you should use the (AES-256-XTS, AES-256-CTS-CBC) pair.

AES-128-CBC was added only for low-powered embedded devices with crypto
accelerators such as CAAM or CESA that do not support XTS."
---
[2] also states this, which again makes me think of a table as the
moral equivalent of a file:

"Unlike dm-crypt, fscrypt operates at the filesystem level rather than
at the block device level. This allows it to encrypt different files
with different keys and to have unencrypted files on the same
filesystem. This is useful for multi-user systems where each user’s
data-at-rest needs to be cryptographically isolated from the others.
However, except for filenames, fscrypt does not encrypt filesystem
metadata."
---
[3] suggests 68 GB per key and unique IV in GCM mode.
---
[4] specifies 68 GB per key and unique IV in CTR mode -- this applies
directly to our proposal to use CTR for WAL.
---
[5] has this to say which seems independent of mode:

"When encrypting data with a symmetric block cipher, which uses blocks
of n bits, some security concerns begin to appear when the amount of
data encrypted with a single key comes close to 2n/2 blocks, i.e. n*2n/2
bits. With AES, n = 128 (AES-128, AES-192 and AES-256 all use 128-bit
blocks). This means a limit of more than 250 millions of terabytes,
which is sufficiently large not to be a problem. That's precisely why
AES was defined with 128-bit blocks, instead of the more common (at that
time) 64-bit blocks: so that data size is practically unlimited."

But goes on to say:
"I wouldn't use n*2^(n/2) bits in any sort of recommendation. Once you
reach that number of bits the probability of a collision will grow
quickly and you will be way over 50% probability of a collision by the
time you reach 2*n*2^(n/2) bits. In order to keep the probability of a
collision negligible I recommend encrypting no more than n*2^(n/4) bits
with the same key. In the case of AES that works out to 64GB"

It is hard to say if that recommendation is per key or per key+IV.
---
[6] shows that Azure SQL Database uses AES 256 for TDE. It also seems to
imply a single key is used although at one point it says "transparent
data encryption master key, also known as the transparent data
encryption protector". The term "master key" indicates that they likely
use derived keys under the covers.
---
[7] is a generally useful read about how many of the things we have been
discussing are done in SQL Server
---
[8] was referenced by Sehrope. In addition to support for AES 256 for
long term use, table 5.1 is interesting. It lists CBC mode as "legacy"
but not "future".
---
[9] IETF RFC for KDF
---
[10] IETF RFC for Key wrapping -- this is probably how we should wrap
the master key with the Key Encryption Key (KEK) -- i.e. the outer key
provided by the user or command on postmaster start (a minimal sketch
follows my conclusions below)
---

Based on all of that I cannot find a requirement that we use more than
one key per database.

But I did find that files in an encrypted file system are encrypted with
derived keys from a master key, and I view this as analogous to what we
are doing.

As an aside to the specific question, I also found more evidence that
AES 256 is appropriate.
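
As a concrete illustration of [10], here is a minimal RFC 3394 key
wrap of the master key, assuming OpenSSL's EVP interface; the key
sizes and function name are assumptions for the sketch, and error
checking is omitted:

    #include <openssl/evp.h>

    /*
     * Sketch: wrap a 32-byte master key under the KEK with AES Key
     * Wrap (RFC 3394).  The wrapped output is input length + 8 bytes.
     */
    static int
    wrap_master_key(const unsigned char kek[32],
                    const unsigned char master_key[32],
                    unsigned char wrapped[40])
    {
        int len, total;
        EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();

        /* OpenSSL requires this flag before using a wrap cipher */
        EVP_CIPHER_CTX_set_flags(ctx, EVP_CIPHER_CTX_FLAG_WRAP_ALLOW);
        EVP_EncryptInit_ex(ctx, EVP_aes_256_wrap(), NULL, kek, NULL);
        EVP_EncryptUpdate(ctx, wrapped, &len, master_key, 32);
        total = len;
        EVP_EncryptFinal_ex(ctx, wrapped + len, &len);
        total += len;
        EVP_CIPHER_CTX_free(ctx);
        return total;                    /* 40 bytes on success */
    }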

Joe

============================
[1]
https://www.postgresql.org/message-id/832657a1-b27a-a33a-5bc2-ce420f95f4be%40joeconway.com

[2]
https://www.postgresql.org/message-id/20190708194733.cztnwhqge4acepzw%40development

[3]
https://www.postgresql.org/message-id/20190708211811.sio5o36zxhps7snx%40momjian.us

[4] https://www.rfc-editor.org/pdfrfc/rfc3686.txt.pdf

[5]
https://security.stackexchange.com/questions/33434/rsa-maximum-bytes-to-encrypt-comparison-to-aes-in-terms-of-security

[6]
https://docs.microsoft.com/en-us/azure/sql-database/transparent-data-encryption-azure-sql?view=sql-server-2017

[7]

https://docs.microsoft.com/en-us/sql/relational-databases/security/encryption/transparent-data-encryption?view=sql-server-2017

[8] https://www.ecrypt.eu.org/csa/documents/D5.4-FinalAlgKeySizeProt.pdf

[9] https://www.ietf.org/rfc/rfc5869.txt

[10] https://tools.ietf.org/html/rfc3394

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


On Sat, Jul 13, 2019 at 02:41:34PM -0400, Joe Conway wrote:
>On 7/13/19 9:38 AM, Joe Conway wrote:
>> On 7/11/19 9:05 PM, Bruce Momjian wrote:
>>> On Thu, Jul 11, 2019 at 08:41:52PM -0400, Joe Conway wrote:
>>>> On 7/11/19 6:37 PM, Bruce Momjian wrote:
>>>> > Our first implementation will encrypt the entire cluster.  We can later
>>>> > consider encryption per table or tablespace.  It is unclear if
>>>> > encrypting different parts of the system with different keys is useful
>>>> > or feasible.  (This is separate from key rotation.)
>>>>
>>>> I still object strongly to using a single key for the entire database. I
>>>> think we can use a single key for WAL, but we need some way to split the
>>>> heap so that multiple keys are used. If not by tablespace, then some
>>>> other method.
>>>
>>> What do you base this on?
>
>Ok, so here we go. See links below. I skimmed through the entire thread
>and FWIW it was exhausting.
>
>To some extent this degenerated into a general search for relevant
>information:
>
>---
>[1] and [2] show that at least some file system encryption uses a
>different key per file.
>---
>[2] also shows that file system encryption uses a KDF (key derivation
>function) which we may want to use ourselves. The analogy would be
>per-table derived key instead of per file derived key. Note that KDF is
>a safe way to derive a key and it is not the same as a "related key"
>which was mentioned on another email as an attack vector.
>---
>[2] also says provides additional support for AES 256. It also mentions
>CBC versus XTS -- I came across this elsewhere and it bears discussion:
>
>"Currently, the following pairs of encryption modes are supported:
>
>    AES-256-XTS for contents and AES-256-CTS-CBC for filenames
>    AES-128-CBC for contents and AES-128-CTS-CBC for filenames
>    Adiantum for both contents and filenames
>
>If unsure, you should use the (AES-256-XTS, AES-256-CTS-CBC) pair.
>
>AES-128-CBC was added only for low-powered embedded devices with crypto
>accelerators such as CAAM or CESA that do not support XTS."
>---
>[2] also states this, which again makes me think in terms of table being
>the moral equivalent to a file:
>
>"Unlike dm-crypt, fscrypt operates at the filesystem level rather than
>at the block device level. This allows it to encrypt different files
>with different keys and to have unencrypted files on the same
>filesystem. This is useful for multi-user systems where each user’s
>data-at-rest needs to be cryptographically isolated from the others.
>However, except for filenames, fscrypt does not encrypt filesystem
>metadata."
>---
>[3] suggests 68 GB per key and unique IV in GCM mode.
>---
>[4] specifies 68 GB per key and unique IV in CTR mode -- this applies
>directly to our proposal to use CTR for WAL.
>---
>[5] has this to say which seems independent of mode:
>
>"When encrypting data with a symmetric block cipher, which uses blocks
>of n bits, some security concerns begin to appear when the amount of
>data encrypted with a single key comes close to 2n/2 blocks, i.e. n*2n/2
>bits. With AES, n = 128 (AES-128, AES-192 and AES-256 all use 128-bit
>blocks). This means a limit of more than 250 millions of terabytes,
>which is sufficiently large not to be a problem. That's precisely why
>AES was defined with 128-bit blocks, instead of the more common (at that
>time) 64-bit blocks: so that data size is practically unlimited."
>

FWIW I was a bit confused at first, because the copy paste mangled the
formulas a bit - it should have been 2^(n/2) and n*2^(n/2).

>But goes on to say:
>"I wouldn't use n*2^(n/2) bits in any sort of recommendation. Once you
>reach that number of bits the probability of a collision will grow
>quickly and you will be way over 50% probability of a collision by the
>time you reach 2*n*2^(n/2) bits. In order to keep the probability of a
>collision negligible I recommend encrypting no more than n*2^(n/4) bits
>with the same key. In the case of AES that works out to 64GB"
>
>It is hard to say if that recommendation is per key or per key+IV.

Hmm, yeah. The question is what collisions they have in mind? Presumably
it's AES(block1,key) = AES(block2,key) in which case it'd be with fixed
IV, so per key+IV.

>---
>[6] shows that Azure SQL Database uses AES 256 for TDE. It also seems to
>imply a single key is used although at one point it says "transparent
>data encryption master key, also known as the transparent data
>encryption protector". The term "master key" indicates that they likely
>use derived keys under the covers.
>---
>[7] is generally useful read about how many of the things we have been
>discussing are done in SQL Server
>---
>[8] was referenced by Sehrope. In addition to support for AES 256 for
>long term use, table 5.1 is interesting. It lists CBC mode as "legacy"
>but not "future".
>---
>[9] IETF RFC for KDF
>---
>[10] IETF RFC for Key wrapping -- this is probably how we should wrap
>the master key with the Key Encryption Key (KEK) -- i.e. the outer key
>provided by the user or command on postmaster start
>---
>
>Based on all of that I cannot find a requirement that we use more than
>one key per database.
>
>But I did find that files in an encrypted file system are encrypted with
>derived keys from a master key, and I view this as analogous to what we
>are doing.
>

My understanding always was that we'd do something like that, i.e. we'd
have a master key (or perhaps multiple of them, for various users), but
the data would be encrypted with secondary (generated) keys, and those
secondary keys would be encrypted by the master key. At least that's
what was proposed at the beginning of this thread by Insung Moon.

But AFAICS the 2-tier key scheme is primarily motivated by operational
reasons, i.e. effort to rotate the master key etc. So I would not expect
to find recommendations to use multiple keys in sources primarily
dealing with cryptography.

One extra thing we should consider is authenticated encryption. We can't
just encrypt the pages (no matter which AES mode is used - XTS/CBC/...),
as that does not provide integrity protection (i.e. can't detect when
the ciphertext was corrupted due to disk failure or intentionally). And
we can't quite rely on checksums, because that checksums the plaintext
and is stored encrypted.

Which seems pretty annoying, because then the checksums won't verify
data as sent to the storage system, and verifying checksums would
require access to all keys (how do you do that in offline mode?).

But the main issue with checksum-then-encrypt is it's essentially
"MAC-then-Encrypt" and that does not provide Authenticated Encryption
security - see [1]. We should be looking at "Encrypt-then-MAC" instead,
in which case we'll need to store the MAC somewhere (probably in the
same place as the nonce/IV/key/... for each page).
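
To make the Encrypt-then-MAC shape concrete, a minimal sketch using
OpenSSL's HMAC; the MAC key, the tag size, and mixing in the block
number are assumptions for illustration, not a settled design:

    #include <stdint.h>
    #include <stddef.h>
    #include <openssl/evp.h>
    #include <openssl/hmac.h>

    /*
     * Sketch: after encrypting a page, MAC the ciphertext (plus the
     * block number, so a page cannot be transplanted elsewhere) and
     * store the 32-byte tag out of line.  Error checking omitted.
     */
    static void
    mac_page(const unsigned char mac_key[32],
             const unsigned char *ciphertext, size_t len,
             uint32_t blkno,
             unsigned char tag[32])
    {
        unsigned int taglen = 32;
        HMAC_CTX *ctx = HMAC_CTX_new();

        HMAC_Init_ex(ctx, mac_key, 32, EVP_sha256(), NULL);
        HMAC_Update(ctx, (const unsigned char *) &blkno, sizeof(blkno));
        HMAC_Update(ctx, ciphertext, len);
        HMAC_Final(ctx, tag, &taglen);
        HMAC_CTX_free(ctx);
    }

On read, the tag would be recomputed over the ciphertext and compared
with the stored tag before decrypting.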

I've also stumbled upon [2], which is a nice doctoral thesis about disk
encryption - in particular chapter 4 is a nice overview of the threat
model and use cases. That guy also had a nice talk at FOSDEM 2018 about
data dm-integrity etc. [3]

[1] https://www.cosic.esat.kuleuven.be/school-iot/slides/AuthenticatedEncryptionII.pdf

[2] https://is.muni.cz/th/vesfr/final.pdf

[3] https://ftp.fau.de/fosdem/2018/Janson/cryptsetup.mp4


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



On 7/13/19 2:41 PM, Joe Conway wrote:
> [2]
> https://www.postgresql.org/message-id/20190708194733.cztnwhqge4acepzw%40development

BTW I managed to mess up this link. This is what I intended to link
there (from Tomas):

[2] https://www.kernel.org/doc/html/latest/filesystems/fscrypt.html

I am sure I have confused the heck out of everyone reading what I wrote
by that error :-/

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


On 7/13/19 5:58 PM, Tomas Vondra wrote:
> On Sat, Jul 13, 2019 at 02:41:34PM -0400, Joe Conway wrote:
>>[2] also says provides additional support for AES 256. It also mentions
>>CBC versus XTS -- I came across this elsewhere and it bears discussion:
>>
>>"Currently, the following pairs of encryption modes are supported:
>>
>>    AES-256-XTS for contents and AES-256-CTS-CBC for filenames
>>    AES-128-CBC for contents and AES-128-CTS-CBC for filenames
>>    Adiantum for both contents and filenames
>>
>>If unsure, you should use the (AES-256-XTS, AES-256-CTS-CBC) pair.
>>
>>AES-128-CBC was added only for low-powered embedded devices with crypto
>>accelerators such as CAAM or CESA that do not support XTS."
>>---
>>[2] also states this, which again makes me think in terms of table being
>>the moral equivalent to a file:
>>
>>"Unlike dm-crypt, fscrypt operates at the filesystem level rather than
>>at the block device level. This allows it to encrypt different files
>>with different keys and to have unencrypted files on the same
>>filesystem. This is useful for multi-user systems where each user’s
>>data-at-rest needs to be cryptographically isolated from the others.
>>However, except for filenames, fscrypt does not encrypt filesystem
>>metadata."

<snip>

>>[5] has this to say which seems independent of mode:
>>
>>"When encrypting data with a symmetric block cipher, which uses blocks
>>of n bits, some security concerns begin to appear when the amount of
>>data encrypted with a single key comes close to 2n/2 blocks, i.e. n*2n/2
>>bits. With AES, n = 128 (AES-128, AES-192 and AES-256 all use 128-bit
>>blocks). This means a limit of more than 250 millions of terabytes,
>>which is sufficiently large not to be a problem. That's precisely why
>>AES was defined with 128-bit blocks, instead of the more common (at that
>>time) 64-bit blocks: so that data size is practically unlimited."
>>
>
> FWIW I was a bit confused at first, because the copy paste mangled the
> formulas a bit - it should have been 2^(n/2) and n*2^(n/2).

Yeah, sorry about that.

>>But goes on to say:
>>"I wouldn't use n*2^(n/2) bits in any sort of recommendation. Once you
>>reach that number of bits the probability of a collision will grow
>>quickly and you will be way over 50% probability of a collision by the
>>time you reach 2*n*2^(n/2) bits. In order to keep the probability of a
>>collision negligible I recommend encrypting no more than n*2^(n/4) bits
>>with the same key. In the case of AES that works out to 64GB"
>>
>>It is hard to say if that recommendation is per key or per key+IV.
>
> Hmm, yeah. The question is what collisions they have in mind? Presumably
> it's AES(block1,key) = AES(block2,key) in which case it'd be with fixed
> IV, so per key+IV.

Seems likely.

>>But I did find that files in an encrypted file system are encrypted with
>>derived keys from a master key, and I view this as analogous to what we
>>are doing.
>>
>
> My understanding always was that we'd do something like that, i.e. we'd
> have a master key (or perhaps multiple of them, for various users), but
> the data would be encrypted with secondary (generated) keys, and those
> secondary keys would be encrypted by the master key. At least that's
> what was proposed at the beginning of this thread by Insung Moon.

In my email I linked the wrong page for [2]. The correct one is here:
[2] https://www.kernel.org/doc/html/latest/filesystems/fscrypt.html

Following that, I think we could end up with three tiers:

1. A master key encryption key (KEK): this is the key supplied by the
   database admin using something akin to ssl_passphrase_command

2. A master data encryption key (MDEK): this is a generated key using a
   cryptographically secure pseudo-random number generator. It is
   encrypted using the KEK, probably with Key Wrap (KW) or maybe
   better Key Wrap with Padding (KWP); see the links below.

3a. Per table data encryption keys (TDEK): use MDEK and HKDF to generate
    table specific keys.

3b. WAL data encryption keys (WDEK):  Similarly use MDEK and a HKDF to
    generate new keys when needed for WAL (based on the other info we
    need to change WAL keys every 68 GB unless I read that wrong).

I believe that would allow us to have multiple keys, but they are all
derived securely from the one MDEK using available info, similar to the
way we intend to use the LSN to derive the IVs -- perhaps table.oid for
tables and something else for WAL. (A sketch of the HKDF derivation
follows the reference links below.)

We also need to figure out how/when to generate a new WDEK. Maybe at
every checkpoint, which would also mean forcing a checkpoint every 68GB?

[HKDF]: https://tools.ietf.org/html/rfc5869
[KW]: https://tools.ietf.org/html/rfc3394
[KWP]: https://tools.ietf.org/html/rfc5649
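
A minimal sketch of the TDEK derivation in 3a, assuming OpenSSL's HKDF
support via EVP_PKEY; the "TDEK:<oid>" label and the use of the table
OID as the HKDF "info" input are assumptions for illustration:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <openssl/evp.h>
    #include <openssl/kdf.h>

    /* Derive a per-table key from the MDEK with HKDF (RFC 5869). */
    static int
    derive_tdek(const unsigned char mdek[32], uint32_t table_oid,
                unsigned char tdek[32])
    {
        size_t  outlen = 32;
        int     ok;
        char    info[32];
        EVP_PKEY_CTX *pctx = EVP_PKEY_CTX_new_id(EVP_PKEY_HKDF, NULL);

        /* per-table context string fed into HKDF's "info" input */
        snprintf(info, sizeof(info), "TDEK:%u", table_oid);

        EVP_PKEY_derive_init(pctx);
        EVP_PKEY_CTX_set_hkdf_md(pctx, EVP_sha256());
        EVP_PKEY_CTX_set1_hkdf_key(pctx, mdek, 32);
        EVP_PKEY_CTX_add1_hkdf_info(pctx, (unsigned char *) info,
                                    strlen(info));
        ok = EVP_PKEY_derive(pctx, tdek, &outlen);
        EVP_PKEY_CTX_free(pctx);
        return ok;
    }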


> But AFAICS the 2-tier key scheme is primarily motivated by operational
> reasons, i.e. effort to rotate the master key etc. So I would not expect
> to find recommendations to use multiple keys in sources primarily
> dealing with cryptography.

It does in [2]


> One extra thing we should consider is authenticated encryption. We can't
> just encrypt the pages (no matter which AES mode is used - XTS/CBC/...),
> as that does not provide integrity protection (i.e. can't detect when
> the ciphertext was corrupted due to disk failure or intentionally). And
> we can't quite rely on checksums, because that checksums the plaintext
> and is stored encrypted.

I agree that authenticated encryption would be a good goal. I'm not sure
we need to require it for the first version, although it would mean
another option for the encryption type. That may be another good reason
to allow both AES 128 and AES 256 CTR/CBC in the first version, as it
will hopefully ensure that when we add different modes later it will be
less painful.

We could check the CRC prior to encryption and throw an ERROR if it is
not correct. After decryption we can check it again -- if it no longer
matches we would know there was a corruption or change of the
ciphertext, no?

Hmm, I guess the entire page of ciphertext could be faked including CRC,
so this would only really cover corruption, not an intentional change if
it were done properly.

> Which seems pretty annoying, because then the checksums won't verify
> data as sent to the storage system, and verify checksums would require
> access to all keys (how do you do that in offline mode?).

Given the scheme above I don't see why that would be an issue. The keys
are all accessible via the MDEK, which is in turn available via the KEK.

> But the main issue with checksum-then-encrypt is it's essentially
> "MAC-then-Encrypt" and that does not provide Authenticated Encryption
> security - see [1]. We should be looking at "Encrypt-then-MAC" instead,
> in which case we'll need to store the MAC somewhere (probably in the
> same place as the nonce/IV/key/... for each page).


Yeah, that's why I think maybe this is a v2 feature.


> I've also stumbled upon [2], which is a nice doctoral thesis about disk
> encryption - in particular chapter 4 is a nice overview of the threat
> model and use cases. That guy also had a nice talk at FOSDEM 2018 about
> data dm-integrity etc. [3]
>
> [1] https://www.cosic.esat.kuleuven.be/school-iot/slides/AuthenticatedEncryptionII.pdf
> [2] https://is.muni.cz/th/vesfr/final.pdf
> [3] https://ftp.fau.de/fosdem/2018/Janson/cryptsetup.mp4

Awesome links -- thanks!

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development




On Sun, Jul 14, 2019 at 12:13:45PM -0400, Joe Conway wrote:
>On 7/13/19 5:58 PM, Tomas Vondra wrote:
>> On Sat, Jul 13, 2019 at 02:41:34PM -0400, Joe Conway wrote:
>>>[2] also says provides additional support for AES 256. It also mentions
>>>CBC versus XTS -- I came across this elsewhere and it bears discussion:
>>>
>>>"Currently, the following pairs of encryption modes are supported:
>>>
>>>    AES-256-XTS for contents and AES-256-CTS-CBC for filenames
>>>    AES-128-CBC for contents and AES-128-CTS-CBC for filenames
>>>    Adiantum for both contents and filenames
>>>
>>>If unsure, you should use the (AES-256-XTS, AES-256-CTS-CBC) pair.
>>>
>>>AES-128-CBC was added only for low-powered embedded devices with crypto
>>>accelerators such as CAAM or CESA that do not support XTS."
>>>---
>>>[2] also states this, which again makes me think in terms of table being
>>>the moral equivalent to a file:
>>>
>>>"Unlike dm-crypt, fscrypt operates at the filesystem level rather than
>>>at the block device level. This allows it to encrypt different files
>>>with different keys and to have unencrypted files on the same
>>>filesystem. This is useful for multi-user systems where each user’s
>>>data-at-rest needs to be cryptographically isolated from the others.
>>>However, except for filenames, fscrypt does not encrypt filesystem
>>>metadata."
>
><snip>
>
>>>[5] has this to say which seems independent of mode:
>>>
>>>"When encrypting data with a symmetric block cipher, which uses blocks
>>>of n bits, some security concerns begin to appear when the amount of
>>>data encrypted with a single key comes close to 2n/2 blocks, i.e. n*2n/2
>>>bits. With AES, n = 128 (AES-128, AES-192 and AES-256 all use 128-bit
>>>blocks). This means a limit of more than 250 millions of terabytes,
>>>which is sufficiently large not to be a problem. That's precisely why
>>>AES was defined with 128-bit blocks, instead of the more common (at that
>>>time) 64-bit blocks: so that data size is practically unlimited."
>>>
>>
>> FWIW I was a bit confused at first, because the copy paste mangled the
>> formulas a bit - it should have been 2^(n/2) and n*2^(n/2).
>
>Yeah, sorry about that.
>
>>>But goes on to say:
>>>"I wouldn't use n*2^(n/2) bits in any sort of recommendation. Once you
>>>reach that number of bits the probability of a collision will grow
>>>quickly and you will be way over 50% probability of a collision by the
>>>time you reach 2*n*2^(n/2) bits. In order to keep the probability of a
>>>collision negligible I recommend encrypting no more than n*2^(n/4) bits
>>>with the same key. In the case of AES that works out to 64GB"
>>>
>>>It is hard to say if that recommendation is per key or per key+IV.
>>
>> Hmm, yeah. The question is what collisions they have in mind? Presumably
>> it's AES(block1,key) = AES(block2,key) in which case it'd be with fixed
>> IV, so per key+IV.
>
>Seems likely.
>
>>>But I did find that files in an encrypted file system are encrypted with
>>>derived keys from a master key, and I view this as analogous to what we
>>>are doing.
>>>
>>
>> My understanding always was that we'd do something like that, i.e. we'd
>> have a master key (or perhaps multiple of them, for various users), but
>> the data would be encrypted with secondary (generated) keys, and those
>> secondary keys would be encrypted by the master key. At least that's
>> what was proposed at the beginning of this thread by Insung Moon.
>
>In my email I linked the wrong page for [2]. The correct one is here:
>[2] https://www.kernel.org/doc/html/latest/filesystems/fscrypt.html
>
>Following that, I think we could end up with three tiers:
>
>1. A master key encryption key (KEK): this is the key supplied by the
>   database admin using something akin to ssl_passphrase_command
>
>2. A master data encryption key (MDEK): this is a generated key using a
>   cryptographically secure pseudo-random number generator. It is
>   encrypted using the KEK, probably with Key Wrap (KW):
>   or maybe better Key Wrap with Padding (KWP):
>
>3a. Per table data encryption keys (TDEK): use MDEK and HKDF to generate
>    table specific keys.
>
>3b. WAL data encryption keys (WDEK):  Similarly use MDEK and a HKDF to
>    generate new keys when needed for WAL (based on the other info we
>    need to change WAL keys every 68 GB unless I read that wrong).
>
>I believe that would allow us to have multiple keys but they are
>derived securely from the one DEK using available info similar to the
>way we intend to use LSN to derive the IVs -- perhaps table.oid for
>tables and something else for WAL.
>
>We also need to figure out how/when to generate new WDEK. Maybe every
>checkpoint, also meaning we would have to force a checkpoint every 68GB?
>

I think that very much depends on what exactly the 68GB refers to - key
or key+IV? If key+IV, then I suppose we can use LSN as IV and we would
not need to change checkpoints. But it's not clear to me why we would
need to force checkpoints at all? Surely we can just write a WAL message
about switching to the new key, or something like that?

>[HKDF]: https://tools.ietf.org/html/rfc5869
>[KW]: https://tools.ietf.org/html/rfc3394
>[KWP]: https://tools.ietf.org/html/rfc5649
>
>
>> But AFAICS the 2-tier key scheme is primarily motivated by operational
>> reasons, i.e. effort to rotate the master key etc. So I would not expect
>> to find recommendations to use multiple keys in sources primarily
>> dealing with cryptography.
>
>It does in [2]
>
>
>> One extra thing we should consider is authenticated encryption. We can't
>> just encrypt the pages (no matter which AES mode is used - XTS/CBC/...),
>> as that does not provide integrity protection (i.e. can't detect when
>> the ciphertext was corrupted due to disk failure or intentionally). And
>> we can't quite rely on checksums, because that checksums the plaintext
>> and is stored encrypted.
>
>I agree that authenticated encryption would be a good goal. I'm not sure
>we need to require it for the first version, although it would mean
>another option for the encryption type. That may be another good reason
>to allow both AES 128 and AES 256 CTR/CBC in the first version, as it
>will hopefully ensure that when we add different modes later it will be
>less painful.
>
>We could check the CRC prior to encryption and throw an ERROR if it is
>not correct. After decryption we can check it again -- if it no longer
>matches we would know there was a corruption or change of the
>ciphertext, no?
>
>Hmm, I guess the entire page of ciphertext could be faked including CRC,
>so this would only really cover corruption, not an intentional change if
>it were done properly.
>

I don't think any of the schemes discussed here provides protection
against this sort of replay attack (i.e. replacing a page with an older
copy of the page). That would probably require having some global
checksum or something like that.

>> Which seems pretty annoying, because then the checksums won't verify
>> data as sent to the storage system, and verify checksums would require
>> access to all keys (how do you do that in offline mode?).
>
>Given the scheme above I don't see why that would be an issue. The keys
>are all accessible via the MDEK, which is in turn available via the KEK.
>

I just don't know how the offline tools will access the KMS to get the
keys, but maybe that's not an issue. Even then, I think it's kind of
against the idea of checksums that they would not checksum what was
actually sent to the storage system.

>> But the main issue with checksum-then-encrypt is it's essentially
>> "MAC-then-Encrypt" and that does not provide Authenticated Encryption
>> security - see [1]. We should be looking at "Encrypt-then-MAC" instead,
>> in which case we'll need to store the MAC somewhere (probably in the
>> same place as the nonce/IV/key/... for each page).
>
>
>Yeah, that's why I think maybe this is a v2 feature.
>

Maybe - as long as we design it with enough flexibility to enable it
later, that might work. That depends on where we store the metadata,
etc.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

> On Sat, Jul 13, 2019 at 02:41:34PM -0400, Joe Conway wrote:
> >On 7/13/19 9:38 AM, Joe Conway wrote:

> >[5] has this to say which seems independent of mode:
> >
> >"When encrypting data with a symmetric block cipher, which uses blocks
> >of n bits, some security concerns begin to appear when the amount of
> >data encrypted with a single key comes close to 2n/2 blocks, i.e. n*2n/2
> >bits. With AES, n = 128 (AES-128, AES-192 and AES-256 all use 128-bit
> >blocks). This means a limit of more than 250 millions of terabytes,
> >which is sufficiently large not to be a problem. That's precisely why
> >AES was defined with 128-bit blocks, instead of the more common (at that
> >time) 64-bit blocks: so that data size is practically unlimited."
> >
> 
> FWIW I was a bit confused at first, because the copy paste mangled the
> formulas a bit - it should have been 2^(n/2) and n*2^(n/2).
> 
> >But goes on to say:
> >"I wouldn't use n*2^(n/2) bits in any sort of recommendation. Once you
> >reach that number of bits the probability of a collision will grow
> >quickly and you will be way over 50% probability of a collision by the
> >time you reach 2*n*2^(n/2) bits. In order to keep the probability of a
> >collision negligible I recommend encrypting no more than n*2^(n/4) bits
> >with the same key. In the case of AES that works out to 64GB"
> >
> >It is hard to say if that recommendation is per key or per key+IV.
> 
> Hmm, yeah. The question is what collisions they have in mind? Presumably
> it's AES(block1,key) = AES(block2,key) in which case it'd be with fixed
> IV, so per key+IV.

I've spent a while trying to understand where the formula comes from. If the
problem can be expressed as "avoidance of repeating blocks of ciphertext",
then it's basically the known "birthday problem". Then we can use this formula
[1]

n ~ sqrt(2 * m * p(n))

(note that the meaning of "n" is different from the formula introduced
upthread) and substitute

1) 0.5 for the probability p(n)

2) 2^b for the number of distinct blocks "m", where "b" is number of bits in
an encryption block

Then the formula becomes

    n ~ sqrt(2^b)

and thus

    n ~ 2^(b/2)

So if the number of safely encrypted blocks was derived this way, I agree that
IV was not taken into consideration: if there is an IV, then identical blocks
of ciphertext are not a problem because they represent different blocks of
plaintext.
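
To make the numbers concrete (my arithmetic, worth double-checking):
with b = 128 the approximation gives

    n ~ 2^64 blocks, i.e. 2^64 * 16 bytes = 2^68 bytes (~295 EB)

per key+IV before ciphertext collisions become likely, while the more
conservative n*2^(n/4)-bit limit quoted upthread works out to

    128 * 2^32 bits = 2^39 bits = 2^36 bytes = 64 GiB (~68.7 GB)

which appears to be where both the "64GB" and "68 GB" figures in this
thread come from.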


[1] https://en.wikipedia.org/wiki/Birthday_problem#Square_approximation

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com



Masahiko Sawada <sawada.mshk@gmail.com> wrote:

> On Mon, Jun 17, 2019 at 11:02 PM Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
> >
> > On Mon, Jun 17, 2019 at 08:39:27AM -0400, Joe Conway wrote:
> > >On 6/17/19 8:29 AM, Masahiko Sawada wrote:
> > >> From perspective of  cryptographic, I think the fine grained TDE would
> > >> be better solution. Therefore if we eventually want the fine grained
> > >> TDE I wonder if it might be better to develop the table/tablespace TDE
> > >> first while keeping it simple as much as possible in v1, and then we
> > >> can provide the functionality to encrypt other data in database
> > >> cluster to satisfy the encrypting-everything requirement. I guess that
> > >> it's easier to incrementally add encryption target objects rather than
> > >> making it fine grained while not changing encryption target objects.
> > >>
> > >> FWIW I'm writing a draft patch of per tablespace TDE and will submit
> > >> it in this month. We can more discuss the complexity of the proposed
> > >> TDE using it.
> > >
> > >+1
> > >
> > >Looking forward to it.
> > >
> >
> > Yep. In particular, I'm interested in those aspects:
> >
> 
> Attached the draft version patch sets of per tablespace transparent
> data at rest encryption.

I was worried that there's competition between us, but now that I've checked
your patch set I see that you already use some parts of

https://commitfest.postgresql.org/23/2104/

although not the latest version. I'm supposed to work on the encryption now,
so I'm thinking about what to do next. I think we should coordinate the
effort, possibly off-list. The earlier we have a single patch set, the more
efficient the work should be.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com



On Sat, Jul 13, 2019 at 09:38:06AM -0400, Joe Conway wrote:
> On 7/11/19 9:05 PM, Bruce Momjian wrote:
> >> Regardless of the method to split the heap into different keys, I think
> >> there should be an option for some tables to not be encrypted. If we
> >> decide it must be all or nothing for the first implementation I guess I
> >> could live with it but would be very disappointed.
> > 
> > What does it mean that you "could live with it"?  Why do you consider having
> > some tables unencrypted important?
> 
> 
> I think it is pretty obvious isn't it? You have talked about the
> performance impact. Why would I want to encrypt, for example a lookup
> table, if there is nothing in that table warranting encryption?

Well, a lookup table is not going to get many writes, and will usually
stay in shared buffers, so I don't see the value in skipping encryption
of it.  However, the big issue is that having encryption on only some
tables has a cost, both in code complexity, and in looking up pg_class
or pg_tablespace rows to find out what needs encryption.  Also, it would
lead to information leakage if we don't encrypt all of WAL but instead
only encrypt the entries for tables/tablespaces that need encryption. 
(I don't think we discussed how WAL would be encrypted in such cases.)

My point is that encrypting only some data might actually make the
system slower due to the lookups, so I think we need to implement
all-cluster encryption first, then see what the overhead is and whether
there are use-cases for leaving some data unencrypted.

> I think in many if not most applications the sensitive data is limited
> to much less than all of the tables, and I'd rather not take the hit for
> those tables.

Agreed, but let's see what the overhead is first, and also decide how
WAL would be handled in such cases.

> >> Thinking out loud (and I believe somewhere in this massive thread
> >> someone else already said this), if we had a way to flag "key version"
> >> at the page level it seems like we could potentially rekey page-by-page
> >> while online, locking only one page at a time. We really only need to
> >> support 2 key versions and could ping-pong between them as they change.
> >> Or maybe this is a crazy idea.
> > 
> > Yes, we did talk about this.  It is certainly possible, but we would
> > still need a tool to guarantee all pages are using the new version, so I
> > am not sure what per-page buys us except making the later check faster. 
> > I don't see this as a version-1 feature, frankly.
> 
> If we allow for say, 2 versions of the key to exist at any given time,
> and if we could store that key version information on each page, we
> could change the key from old to new without locking the entire table at
> once, just locking one page at a time. Or at least that was my thinking.

Yes, that is true, but eventually we will need to do key rotation
again, so we have to be sure the old key is no longer being used. We
would still need a tool to check that all the pages are using the new
key, and that could be done a page at a time.

It does feel like our checksum problem, and I am hoping the
infrastructure that allows that to be done online can be used for this
too.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Sat, Jul 13, 2019 at 02:41:34PM -0400, Joe Conway wrote:
> On 7/13/19 9:38 AM, Joe Conway wrote:
> ---
> [1] and [2] show that at least some file system encryption uses a
> different key per file.

Yes, I see later they did that for per-file keys, but I think with WAL
and crash recovery we decided there was little value in trying that
since all keys would need to be online for recovery.

> ---
> [2] also shows that file system encryption uses a KDF (key derivation
> function) which we may want to use ourselves. The analogy would be
> per-table derived key instead of per file derived key. Note that KDF is
> a safe way to derive a key and it is not the same as a "related key"
> which was mentioned on another email as an attack vector.
> ---
> [2] also says provides additional support for AES 256. It also mentions
> CBC versus XTS -- I came across this elsewhere and it bears discussion:
> 
> "Currently, the following pairs of encryption modes are supported:
> 
>     AES-256-XTS for contents and AES-256-CTS-CBC for filenames
>     AES-128-CBC for contents and AES-128-CTS-CBC for filenames
>     Adiantum for both contents and filenames
> 
> If unsure, you should use the (AES-256-XTS, AES-256-CTS-CBC) pair.
> 
> AES-128-CBC was added only for low-powered embedded devices with crypto
> accelerators such as CAAM or CESA that do not support XTS."

It would be nice to understand what XTS adds to CBC.

> [5] has this to say which seems independent of mode:
> 
> "When encrypting data with a symmetric block cipher, which uses blocks
> of n bits, some security concerns begin to appear when the amount of
> data encrypted with a single key comes close to 2n/2 blocks, i.e. n*2n/2
> bits. With AES, n = 128 (AES-128, AES-192 and AES-256 all use 128-bit
> blocks). This means a limit of more than 250 millions of terabytes,
> which is sufficiently large not to be a problem. That's precisely why
> AES was defined with 128-bit blocks, instead of the more common (at that
> time) 64-bit blocks: so that data size is practically unlimited."
> 
> But goes on to say:
> "I wouldn't use n*2^(n/2) bits in any sort of recommendation. Once you
> reach that number of bits the probability of a collision will grow
> quickly and you will be way over 50% probability of a collision by the
> time you reach 2*n*2^(n/2) bits. In order to keep the probability of a
> collision negligible I recommend encrypting no more than n*2^(n/4) bits
> with the same key. In the case of AES that works out to 64GB"
>
> It is hard to say if that recommendation is per key or per key+IV.

When they mention collision, are they assuming a random nonce?  I am
guessing they do; I think the LSN avoids that problem because we
effectively have a counter.

> ---
> [6] shows that Azure SQL Database uses AES 256 for TDE. It also seems to
> imply a single key is used although at one point it says "transparent
> data encryption master key, also known as the transparent data
> encryption protector". The term "master key" indicates that they likely
> use derived keys under the covers.
> ---
> [7] is generally useful read about how many of the things we have been
> discussing are done in SQL Server
> ---
> [8] was referenced by Sehrope. In addition to support for AES 256 for
> long term use, table 5.1 is interesting. It lists CBC mode as "legacy"
> but not "future".

Interesting.  Is there a reason stated?

> ---
> [9] IETF RFC for KDF
> ---
> [10] IETF RFC for Key wrapping -- this is probably how we should wrap
> the master key with the Key Encryption Key (KEK) -- i.e. the outer key
> provided by the user or command on postmaster start

Yes, I think we all agreed to have a passphrase to lock the encryption
keys.

> ---
> 
> Based on all of that I cannot find a requirement that we use more than
> one key per database.

You mean cluster, right?  That is great news!

> But I did find that files in an encrypted file system are encrypted with
> derived keys from a master key, and I view this as analogous to what we
> are doing.

Agreed.

> As an aside to the specific question, I also found more evidence that
> AES 256 is appropriate.

I think we should make the AES128/AES256 choice optional in version 1 of
the feature, or at least call the initdb option --encrypt-aes128, like
we did with SCRAM, so we have a clear path to adding AES256.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Sat, Jul 13, 2019 at 11:58:02PM +0200, Tomas Vondra wrote:
> One extra thing we should consider is authenticated encryption. We can't
> just encrypt the pages (no matter which AES mode is used - XTS/CBC/...),
> as that does not provide integrity protection (i.e. can't detect when
> the ciphertext was corrupted due to disk failure or intentionally). And
> we can't quite rely on checksums, because that checksums the plaintext
> and is stored encrypted.

Uh, if someone modifies a few bytes of the page, we will decrypt it, but
the checksum (per-page or WAL) will not match our decrypted output.  How
would they make it match the checksum without already knowing the key?
I read [1] but could not see that explained.

This post discussed it:

    https://crypto.stackexchange.com/questions/202/should-we-mac-then-encrypt-or-encrypt-then-mac

I realize in a new system we might prefer encrypt-then-MAC, but TLS and
SSL do it differently, and I don't think the security problems of
MAC-then-Encrypt, e.g. API programming errors, apply to our use-case.

If we want to go crazy, we could encrypt, assume zeros for the CRC,
compute the MAC, and put it in the place where the CRC is, but then tools
that read CRC would see that as an error, so we don't want to go there.
Yes, crazy.

> Which seems pretty annoying, because then the checksums won't verify
> data as sent to the storage system, and verify checksums would require
> access to all keys (how do you do that in offline mode?).

Uh, the keys are stored in a PGDATA file --- seems simple enough, but we
would either have to do whole-cluster encryption or have some per-page
encryption flag.

> But the main issue with checksum-then-encrypt is it's essentially
> "MAC-then-Encrypt" and that does not provide Authenticated Encryption
> security - see [1]. We should be looking at "Encrypt-then-MAC" instead,
> in which case we'll need to store the MAC somewhere (probably in the
> same place as the nonce/IV/key/... for each page).

I don't think we are planning to store the nonce/IV on each page but
rather use the LSN (already on the page), and perhaps in addition, the
page number.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Sun, Jul 14, 2019 at 12:13:45PM -0400, Joe Conway wrote:
> On 7/13/19 5:58 PM, Tomas Vondra wrote:
> In my email I linked the wrong page for [2]. The correct one is here:
> [2] https://www.kernel.org/doc/html/latest/filesystems/fscrypt.html
> 
> Following that, I think we could end up with three tiers:
> 
> 1. A master key encryption key (KEK): this is the key supplied by the
>    database admin using something akin to ssl_passphrase_command
> 
> 2. A master data encryption key (MDEK): this is a generated key using a
>    cryptographically secure pseudo-random number generator. It is
>    encrypted using the KEK, probably with Key Wrap (KW):
>    or maybe better Key Wrap with Padding (KWP):
> 
> 3a. Per table data encryption keys (TDEK): use MDEK and HKDF to generate
>     table specific keys.

Uh, when were per-table encryption keys discussed?  Would they use
pg_class.oid or relfilenode?

> 3b. WAL data encryption keys (WDEK):  Similarly use MDEK and a HKDF to
>     generate new keys when needed for WAL (based on the other info we
>     need to change WAL keys every 68 GB unless I read that wrong).

I thought we were going to use the WAL segment number for each 16MB
file so each 16MB file gets a new nonce.

> I believe that would allow us to have multiple keys but they are
> derived securely from the one DEK using available info similar to the
> way we intend to use LSN to derive the IVs -- perhaps table.oid for
> tables and something else for WAL.

Ah, got it.  We might want to use relfilenode (and have pg_upgrade
preserve it) to avoid having to do catalog lookups during WAL recovery.
However, I thought we were still unclear if that 68GB is per secret or
per nonce/secret.

> > One extra thing we should consider is authenticated encryption. We can't
> > just encrypt the pages (no matter which AES mode is used - XTS/CBC/...),
> > as that does not provide integrity protection (i.e. can't detect when
> > the ciphertext was corrupted due to disk failure or intentionally). And
> > we can't quite rely on checksums, because that checksums the plaintext
> > and is stored encrypted.
> 
> I agree that authenticated encryption would be a good goal. I'm not sure
> we need to require it for the first version, although it would mean
> another option for the encryption type. That may be another good reason
> to allow both AES 128 and AES 256 CTR/CBC in the first version, as it
> will hopefully ensure that when we add different modes later it will be
> less painful.
> 
> We could check the CRC prior to encryption and throw an ERROR if it is
> not correct. After decryption we can check it again -- if it no longer
> matches we would know there was a corruption or change of the
> ciphertext, no?

Yes, that is my hope too.

> Hmm, I guess the entire page of ciphertext could be faked including CRC,
> so this would only really cover corruption, not an intentional change if
> it were done properly.

Uh, how would they get a CRC to decrypt to match their page contents?

> > Which seems pretty annoying, because then the checksums won't verify
> > data as sent to the storage system, and verify checksums would require
> > access to all keys (how do you do that in offline mode?).
> 
> Given the scheme above I don't see why that would be an issue. The keys
> are all accessible via the MDEK, which is in turn available via the KEK.

Yep.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Mon, Jul 15, 2019 at 03:47:59AM +0200, Tomas Vondra wrote:
> On Sun, Jul 14, 2019 at 12:13:45PM -0400, Joe Conway wrote:
> > We could check the CRC prior to encryption and throw an ERROR if it is
> > not correct. After decryption we can check it again -- if it no longer
> > matches we would know there was a corruption or change of the
> > ciphertext, no?
> > 
> > Hmm, I guess the entire page of ciphertext could be faked including CRC,
> > so this would only really cover corruption, not an intentional change if
> > it were done properly.
> > 
> 
> I don't think any of the schemes discussed here provides protection
> against this sort of replay attacks (i.e. replacing a page with an older
> copy of the page). That would probably require having some global
> checksum or something like that.

Uh, I think the only thing we could do is to add the page number into
the nonce so the page would have to be replaced in the same place in the
table, but it hardly seems worth it.

> > > Which seems pretty annoying, because then the checksums won't verify
> > > data as sent to the storage system, and verify checksums would require
> > > access to all keys (how do you do that in offline mode?).
> > 
> > Given the scheme above I don't see why that would be an issue. The keys
> > are all accessible via the MDEK, which is in turn available via the KEK.
> > 
> 
> I just don't know how the offline tools will access the KMS to get the
> keys. But maybe that's not an issue. But even then I think it's kinda
> against the idea of checksums that they would not checksum what was sent
> to the storage system.

Oh, I see your point now.  pg_checksums will look at the page and think it
is corrupt.  It would need access to the keys to verify it, and only for
whole-cluster encryption or if there is a per-page flag (it can't easily
do system table lookups).

The crazy seems more sane now --- "encrypt the page with CRC contents as
zero" (which we probably already do to compute the CRC), then compute
the CRC, and modify the page CRC.

I kind of feel we need to decide this now so our tooling can plan for it.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Sat, Jul 13, 2019 at 07:34:22AM -0400, Joe Conway wrote:
> I stand by my position. At a minimum, we need a choice of AES128 and AES256.

These are compelling arguments.  Agreed.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Mon, Jul 15, 2019 at 03:55:38PM -0400, Bruce Momjian wrote:
>On Mon, Jul 15, 2019 at 03:47:59AM +0200, Tomas Vondra wrote:
>> On Sun, Jul 14, 2019 at 12:13:45PM -0400, Joe Conway wrote:
>> > We could check the CRC prior to encryption and throw an ERROR if it is
>> > not correct. After decryption we can check it again -- if it no longer
>> > matches we would know there was a corruption or change of the
>> > ciphertext, no?
>> >
>> > Hmm, I guess the entire page of ciphertext could be faked including CRC,
>> > so this would only really cover corruption, not an intentional change if
>> > it were done properly.
>> >
>>
>> I don't think any of the schemes discussed here provides protection
>> against this sort of replay attacks (i.e. replacing a page with an older
>> copy of the page). That would probably require having some global
>> checksum or something like that.
>
>Uh, I think the only thing we could do is to add the page number into
>the nonce so the page would have to be replaced in the same place in the
>table, but it hardly seems worth it.
>
>> > > Which seems pretty annoying, because then the checksums won't verify
>> > > data as sent to the storage system, and verify checksums would require
>> > > access to all keys (how do you do that in offline mode?).
>> >
>> > Given the scheme above I don't see why that would be an issue. The keys
>> > are all accessible via the MDEK, which is in turn available via the KEK.
>> >
>>
>> I just don't know how the offline tools will access the KMS to get the
>> keys. But maybe that's not an issue. But even then I think it's kinda
>> against the idea of checksums that they would not checksum what was sent
>> to the storage system.
>
>Oh, I see your point now.  pg_checksums will look at the page and think it
>is corrupt.  It would need access to the keys to verify it, and only for
>whole-cluster encryption or if there is a per-page flag (it can't easily
>do system table lookups).
>
>The crazy seems more sane now --- "encrypt the page with CRC contents as
>zero" (which we probably already do to compute the CRC), then compute
>the CRC, and modify the page CRC.
>

Huh? So you want to

1) set CRC to 0
2) encrypt the page
3) compute CRC
4) set CRC to value computed in (3)
5) encrypt the page again

That seems pretty awful from performance POV, and it does not really
solve much as we'd still need to decrypt the page while verifying the
checksums (because the CRC is in the page header, which is encrypted).

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



On Mon, Jul 15, 2019 at 03:42:39PM -0400, Bruce Momjian wrote:
>On Sat, Jul 13, 2019 at 11:58:02PM +0200, Tomas Vondra wrote:
>> One extra thing we should consider is authenticated encryption. We can't
>> just encrypt the pages (no matter which AES mode is used - XTS/CBC/...),
>> as that does not provide integrity protection (i.e. can't detect when
>> the ciphertext was corrupted due to disk failure or intentionally). And
>> we can't quite rely on checksums, because that checksums the plaintext
>> and is stored encrypted.
>
>Uh, if someone modifies a few bytes of the page, we will decrypt it, but
>the checksum (per-page or WAL) will not match our decrypted output.  How
>would they make it match the checksum without already knowing the key.
>I read [1] but could not see that explained.
>

Our checksum is only 16 bits, so perhaps one way would be to just
generate 64k of randomly modified pages and hope one of them happens to
hit the right checksum value. Not sure how practical such an attack is,
but it requires only filesystem access.

FWIW our CRC algorithm is not quite HMAC, because it's neither keyed nor
a cryptographic hash algorithm. Now, maybe we don't want authenticated
encryption (e.g. XTS is not authenticated, unlike GCM/CCM).

>This post discussed it:
>
>    https://crypto.stackexchange.com/questions/202/should-we-mac-then-encrypt-or-encrypt-then-mac
>
>I realize in a new system we might prefer encrypt-then-mac, TLS and SSL
>do it differently, and I don't think the security problems of
>MAC-then-Encrypt apply to our use-case, e.g. API programming errors.
>
>If we want to go crazy, we could encrypt, assume zeros for the CRC,
>compute the MAC and put it in the place of the CRC is, but then tools
>that read CRC would see that as an error, so we don't want to go there.
>Yes, crazy.
>
>> Which seems pretty annoying, because then the checksums won't verify
>> data as sent to the storage system, and verify checksums would require
>> access to all keys (how do you do that in offline mode?).
>
>Uh, the keys are stored in a PGDATA file --- seems simple enough, but we
>would either have to do whole-cluster encryption or have some per-page
>encryption flag.
>

And how do you know which files are encrypted and which are not, and
which keys are used for which file? Presumably that's in some system
catalog, which is not available in offline mode. 

>> But the main issue with checksum-then-encrypt is it's essentially
>> "MAC-then-Encrypt" and that does not provide Authenticated Encryption
>> security - see [1]. We should be looking at "Encrypt-then-MAC" instead,
>> in which case we'll need to store the MAC somewhere (probably in the
>> same place as the nonce/IV/key/... for each page).
>
>I don't think we are planning to store the nonce/IV on each page but
>rather use the LSN (already on the page), and perhaps in addition, the
>page number.

But the LSN is in the page header, and AFAICS the page header is
encrypted. So how do you decrypt the page without knowing the LSN (which
I think you need to know in order to derive the IV)?

Also, we probably don't want to expose the checksum, because it may
reveal information about page contents (since it's not a HMAC).


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



On Mon, Jul 15, 2019 at 10:44:34PM +0200, Tomas Vondra wrote:
> On Mon, Jul 15, 2019 at 03:55:38PM -0400, Bruce Momjian wrote:
> > The crazy seems more sane now --- "encrypt the page with CRC contents as
> > zero" (which we probably already do to compute the CRC), then compute
> > the CRC, and modify the page CRC.
> > 
> 
> Huh? So you want to
> 
> 1) set CRC to 0
> 2) encrypt the page
> 3) compute CRC
> 4) set CRC to value computed in (3)
> 5) encrypt the page again
> 
> That seems pretty awful from performance POV, and it does not really
> solve much as we'd still need to decrypt the page while verifying the
> checksums (because the CRC is in the page header, which is encrypted).

No, I was thinking we would overwrite whatever the encrypted output was
in the spot that has the CRC with the computed CRC.  Yeah, sounds even
crazier now that I said it --- never mind.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Hi,

Some more thoughts on CBC vs CTR modes. There are a number of
advantages to using CTR mode for page encryption.

CTR encryption modes can be fully parallelized, whereas CBC can only
be parallelized for decryption. While both can use AES specific hardware
such as AES-NI, CTR modes can go a step further and use vectorized
instructions.

On an i7-8559U (with AES-NI) I get a 4x speed improvement for
CTR-based modes vs CBC when run on 8K of data:

# openssl speed -evp ${cipher}
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128-cbc    1024361.51k  1521249.60k  1562033.41k  1571663.87k  1574537.90k  1575512.75k
aes-128-ctr     696866.85k  2214441.86k  4364903.85k  5896221.35k  6559735.81k  6619594.75k
aes-128-gcm     642758.92k  1638619.09k  3212068.27k  5085193.22k  6366035.97k  6474006.53k
aes-256-cbc     940906.25k  1114628.44k  1131255.13k  1138385.92k  1140258.13k  1143592.28k
aes-256-ctr     582161.82k  1896409.32k  3216926.12k  4249708.20k  4680299.86k  4706375.00k
aes-256-gcm     553513.89k  1532556.16k  2705510.57k  3931744.94k  4615812.44k  4673093.63k

For relation data where the encryption is going to be per page,
there's flexibility in how the CTR nonce (IV + counter) is generated.
With an 8K page, the counter need only go up to 512 for each page
(8192-bytes per page / 16-bytes per AES-block). That would require
9-bits for the counter. Rounding that up to 16-bits allows for wider
pages and it still uses only two bytes of the counter while ensuring
that it'd be unique per AES-block. The remaining 14-bytes would be
populated with some other data that is guaranteed unique per
page-write to allow encryption via the same per-relation-file derived
key. From what I gather, the LSN is a candidate though it'd have to be
stored in plaintext for decryption.
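
To make that layout concrete, a small sketch (the exact byte positions
are an assumption for illustration, not a settled format):

    /* Hypothetical 16-byte CTR nonce: 14 bytes of data unique per
     * page-write (here the 8-byte LSN plus 6 reserved bytes), then a
     * 16-bit block counter.  The counter bytes start at zero; AES-CTR
     * increments them once per 16-byte block, so 16 bits comfortably
     * covers an 8kB page (512 blocks). */
    #include <stdint.h>
    #include <string.h>

    void build_ctr_nonce(uint64_t lsn, unsigned char nonce[16])
    {
        memset(nonce, 0, 16);
        for (int i = 0; i < 8; i++)     /* bytes 0..7: big-endian LSN */
            nonce[i] = (unsigned char) (lsn >> (8 * (7 - i)));
        /* bytes 8..13: other unique-per-write data (reserved here) */
        /* bytes 14..15: AES block counter, starts at zero */
    }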

What's important is that writing two pages (either to different
locations or the same page written back again) never reuses the same
nonce with the same key. Using the same nonce with a different key is fine.

With any of these schemes the same inputs will generate the same
outputs. With CTR mode for WAL this would be an issue if the same key
and deterministic nonce (ex: LSN + offset) is reused in multiple
places. That does not have to be the same cluster either. For example
if two replicas are promoted from the same backup with the same master
key, they would generate the same WAL CTR stream, reusing the
key/nonce pair. Ditto for starting off with a master key and deriving
per-relation keys in a cloned installation off some deterministic
attribute such as oid.

This can be avoided by deriving new keys per file (not just per
relation) from a random salt. It'd be stored out of band and combined
with the master key to derive the specific key used for that CTR
stream. If there's a desire for supporting multiple ciphers or key
sizes, that could be stored alongside the salt. Perhaps use the same
location or lack of it to indicate "not encrypted" as well.

Per-file salts and derived keys would facilitate re-keying a table
piecemeal, file by file, by generating a new salt/derived-key,
encrypting a copy of the decrypted file, and doing an atomic rename.
The file's contents would change but its length and any references to
pages or byte offsets would stay valid. (I think this would work for
CBC modes too as there's nothing CTR specific about it.)

What I'm not sure of is how to handle randomizing the relation file IV in a
cloned database. Until the key for a relation file or segment is
rotated it'd have the same deterministic IV generated as its source as
the LSN would continue from the same point. One idea is with 128-bits
for the IV, one could have 64-bits for LSN, 16-bits for AES-block
counter, and the remaining 48-bits be randomized; though you'd need to
store those 48-bits somewhere per-page (basically it's a salt per
page). That'd give some protection from the clone's new data being
encrypted with the same stream as the parent's. Another option would
be to track ranges of LSNs and have a centralized list of 48-bit
randomized salts. That would remove the need for additional salt per
page though you'd have to do a lookup on that shared list to figure
out which to use.

CTR mode is definitely more complicated than a pure random-IV + CBC
but with any deterministic generation of IVs for CBC mode you're going
to have some of these same problems as well.

Regarding CRCs, CTR mode has the advantage of not destroying the rest
of the stream to replace the CRC bytes. With CBC mode any change would
cascade and corrupt the rest of the data downstream from that block.
With CTR mode you can overwrite the CRC's location with the CRC or a
truncated MAC of the encrypted data as each byte is encrypted
separately. At decryption time you simply ignore the decrypted output
of those bytes and zero them out again. A CRC of encrypted data (but
not a partial MAC) could be checked offline without access to the key.
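
Spelled out as a sketch (aes_ctr_crypt_in_place() and crc16() are
hypothetical helpers, and CRC_OFFSET is an assumed location):

    /* CRC-over-ciphertext with CTR.  Because CTR is a stream cipher,
     * overwriting the CRC bytes of ciphertext garbles only those same
     * plaintext bytes, which are zeroed after decryption anyway. */
    #include <stdint.h>
    #include <string.h>

    #define CRC_OFFSET 8                /* assumed checksum location */
    #define CRC_LEN    2

    extern void aes_ctr_crypt_in_place(unsigned char *buf, int len);
    extern uint16_t crc16(const unsigned char *buf, int len);

    void seal_page(unsigned char *page, int page_len)
    {
        uint16_t crc;

        memset(page + CRC_OFFSET, 0, CRC_LEN);    /* zero the CRC slot  */
        aes_ctr_crypt_in_place(page, page_len);   /* encrypt the page   */
        crc = crc16(page, page_len);              /* CRC of ciphertext  */
        memcpy(page + CRC_OFFSET, &crc, CRC_LEN); /* overwrite the slot */
    }

    int open_page(unsigned char *page, int page_len)
    {
        uint16_t stored, crc;

        memcpy(&stored, page + CRC_OFFSET, CRC_LEN);
        memset(page + CRC_OFFSET, 0, CRC_LEN);
        crc = crc16(page, page_len);              /* no key needed here */
        aes_ctr_crypt_in_place(page, page_len);   /* decrypt the page   */
        memset(page + CRC_OFFSET, 0, CRC_LEN);    /* drop garbled bytes */
        return crc == stored;
    }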

Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/



On Mon, Jul 15, 2019 at 11:05:30PM +0200, Tomas Vondra wrote:
> On Mon, Jul 15, 2019 at 03:42:39PM -0400, Bruce Momjian wrote:
> > On Sat, Jul 13, 2019 at 11:58:02PM +0200, Tomas Vondra wrote:
> > > One extra thing we should consider is authenticated encryption. We can't
> > > just encrypt the pages (no matter which AES mode is used - XTS/CBC/...),
> > > as that does not provide integrity protection (i.e. can't detect when
> > > the ciphertext was corrupted due to disk failure or intentionally). And
> > > we can't quite rely on checksums, because that checksums the plaintext
> > > and is stored encrypted.
> > 
> > Uh, if someone modifies a few bytes of the page, we will decrypt it, but
> > the checksum (per-page or WAL) will not match our decrypted output.  How
> > would they make it match the checksum without already knowing the key.
> > I read [1] but could not see that explained.
> > 
> 
> Our checksum is only 16 bits, so perhaps one way would be to just
> generate 64k of randomly modified pages and hope one of them happens to
> hit the right checksum value. Not sure how practical such attack is, but
> it does require just filesystem access.

Yes, that would work, and it opens the question of whether our checksum is
big enough for this, and if it is not, we need to find space for it,
probably with a custom encrypted page format.  :-(   And that makes
adding encryption offline almost impossible because you potentially have
to move tuples around.  Yuck!

> FWIW our CRC algorithm is not quite HMAC, because it's neither keyed nor
> a cryptographic hash algorithm. Now, maybe we don't want authenticated
> encryption (e.g. XTS is not authenticated, unlike GCM/CCM).

I thought just encrypting the CRC value would be enough to detect
changes, but you are right that someone could just try 64k pages until
one hits.

> > This post discussed it:
> > 
> >     https://crypto.stackexchange.com/questions/202/should-we-mac-then-encrypt-or-encrypt-then-mac
> > 
> > I realize in a new system we might prefer encrypt-then-mac, TLS and SSL
> > do it differently, and I don't think the security problems of
> > MAC-then-Encrypt apply to our use-case, e.g. API programming errors.
> > 
> > If we want to go crazy, we could encrypt, assume zeros for the CRC,
> > compute the MAC and put it in the place of the CRC is, but then tools
> > that read CRC would see that as an error, so we don't want to go there.
> > Yes, crazy.
> > 
> > > Which seems pretty annoying, because then the checksums won't verify
> > > data as sent to the storage system, and verify checksums would require
> > > access to all keys (how do you do that in offline mode?).
> > 
> > Uh, the keys are stored in a PGDATA file --- seems simple enough, but we
> > would either have to do whole-cluster encryption or have some per-page
> > encryption flag.
> > 
> 
> And how do you know which files are encrypted and which are not, and
> which keys are used for which file? Presumably that's in some system
> catalog, which is not available in offline mode.

You would need either all-cluster encryption (no need to check) or a
per-page bit that says the page is encrypted, and the bit has to be in
the part of the page that is not encrypted, e.g., near the LSN.

> > > But the main issue with checksum-then-encrypt is it's essentially
> > > "MAC-then-Encrypt" and that does not provide Authenticated Encryption
> > > security - see [1]. We should be looking at "Encrypt-then-MAC" instead,
> > > in which case we'll need to store the MAC somewhere (probably in the
> > > same place as the nonce/IV/key/... for each page).
> > 
> > I don't think we are planning to store the nonce/IV on each page but
> > rather use the LSN (already on the page), and perhaps in addition, the
> > page number.
> 
> But the LSN is in the page header, and AFAICS the page header is
> encrypted. So how do you decrypt the page without knowing the LSN (which
> I think you need to know in order to derive the IV)?

My proposal was that the first 16 bytes of the page are not encrypted.

> Also, we probably don't want to expose the checksum, because it may
> reveal information about page contents (since it's not a HMAC).

Uh, I have not heard of that as an issue.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On 2019-Jul-15, Bruce Momjian wrote:

> My point is that doing encryption of only some data might actually make
> the system slower due to the lookups, so I think we need to implement
> all-cluster encryption and then see what the overhead is, and if there
> are use-cases for not encrypting only some data.

We can keep the keys in the relcache.  It doesn't have to be slow.  It
is certainly slower to have to encrypt *all* data, which can be
massively larger than the sensitive portion of the database.

If we need the keys for offline operation (where relcache is not
reachable), we can keep pointers to the key files in the filesystem --
for example for an encrypted table we would keep a new file, say
<relfilenode>.key, which could be a symlink to the encrypted key file.
The tool already has access to the key data, but the symlink lets it
know *which* key to use; random onlookers cannot get the key data
because the file is encrypted with the master key.

Any table without the key file is assumed to be unencrypted.
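
A small sketch of how an offline tool might apply this (the paths and
helper name are illustrative):

    /* Look for the "<relfilenode>.key" file described above: its
     * presence marks the relation as encrypted, and resolving the
     * symlink tells the tool which key file to use. */
    #include <stdio.h>
    #include <unistd.h>

    int relation_key_file(const char *relpath, char *keypath, size_t keypath_len)
    {
        char linkpath[1024];
        ssize_t n;

        snprintf(linkpath, sizeof(linkpath), "%s.key", relpath);
        n = readlink(linkpath, keypath, keypath_len - 1);
        if (n < 0)
            return 0;           /* no key file: table is unencrypted */
        keypath[n] = '\0';
        return 1;               /* keypath names the key file to use */
    }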

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



On Mon, Jul 15, 2019 at 06:11:41PM -0400, Bruce Momjian wrote:
>On Mon, Jul 15, 2019 at 11:05:30PM +0200, Tomas Vondra wrote:
>> On Mon, Jul 15, 2019 at 03:42:39PM -0400, Bruce Momjian wrote:
>> > On Sat, Jul 13, 2019 at 11:58:02PM +0200, Tomas Vondra wrote:
>> > > One extra thing we should consider is authenticated encryption. We can't
>> > > just encrypt the pages (no matter which AES mode is used - XTS/CBC/...),
>> > > as that does not provide integrity protection (i.e. can't detect when
>> > > the ciphertext was corrupted due to disk failure or intentionally). And
>> > > we can't quite rely on checksums, because that checksums the plaintext
>> > > and is stored encrypted.
>> >
>> > Uh, if someone modifies a few bytes of the page, we will decrypt it, but
>> > the checksum (per-page or WAL) will not match our decrypted output.  How
>> > would they make it match the checksum without already knowing the key.
>> > I read [1] but could not see that explained.
>> >
>>
>> Our checksum is only 16 bits, so perhaps one way would be to just
>> generate 64k of randomly modified pages and hope one of them happens to
>> hit the right checksum value. Not sure how practical such attack is, but
>> it does require just filesystem access.
>
>Yes, that would work, and opens the question of whether our checksum is
>big enough for this, and if it is not, we need to find space for it,
>probably with a custom encrypted page format.  :-(   And that makes
>adding encryption offline almost impossible because you potentially have
>to move tuples around.  Yuck!
>

Right. We've been working on allowing checksums to be disabled online,
and it would be useful to allow something like that for encryption too, I
guess.
And without some sort of page-level flag that won't be possible, which
would be rather annoying.

Not sure it needs to be in the page itself, though - that's pretty much
why I proposed to store metadata (IV, key ID, ...) for encryption in a
new fork. That would be a bit more flexible than storing it in the page
itself (e.g. different encryption schemes might easily store different
amounts of metadata).

Maybe a new fork is way too complex a solution, not sure.

>> FWIW our CRC algorithm is not quite HMAC, because it's neither keyed nor
>> a cryptographic hash algorithm. Now, maybe we don't want authenticated
>> encryption (e.g. XTS is not authenticated, unlike GCM/CCM).
>
>I thought just encrypting the CRC value would be enough to detect
>changes, but you are right that someone could just try 64k pages until
>one hits.
>

Right. Not sure that's really a practical attack we need to worry about,
considering all of this is vulnerable to replay attacks.

>> > This post discussed it:
>> >
>> >     https://crypto.stackexchange.com/questions/202/should-we-mac-then-encrypt-or-encrypt-then-mac
>> >
>> > I realize in a new system we might prefer encrypt-then-mac, TLS and SSL
>> > do it differently, and I don't think the security problems of
>> > MAC-then-Encrypt apply to our use-case, e.g. API programming errors.
>> >
>> > If we want to go crazy, we could encrypt, assume zeros for the CRC,
>> > compute the MAC and put it in the place of the CRC is, but then tools
>> > that read CRC would see that as an error, so we don't want to go there.
>> > Yes, crazy.
>> >
>> > > Which seems pretty annoying, because then the checksums won't verify
>> > > data as sent to the storage system, and verify checksums would require
>> > > access to all keys (how do you do that in offline mode?).
>> >
>> > Uh, the keys are stored in a PGDATA file --- seems simple enough, but we
>> > would either have to do whole-cluster encryption or have some per-page
>> > encryption flag.
>> >
>>
>> And how do you know which files are encrypted and which are not, and
>> which keys are used for which file? Presumably that's in some system
>> catalog, which is not available in offline mode.
>
>You would need either all-cluster encryption (no need to check) or a
>per-page bit that says the page is encrypted, and the bit has to be in
>the part of the page that is not encrypted, e.g., near the LSN.
>
>> > > But the main issue with checksum-then-encrypt is it's essentially
>> > > "MAC-then-Encrypt" and that does not provide Authenticated Encryption
>> > > security - see [1]. We should be looking at "Encrypt-then-MAC" instead,
>> > > in which case we'll need to store the MAC somewhere (probably in the
>> > > same place as the nonce/IV/key/... for each page).
>> >
>> > I don't think we are planning to store the nonce/IV on each page but
>> > rather use the LSN (already on the page), and perhaps in addition, the
>> > page number.
>>
>> But the LSN is in the page header, and AFAICS the page header is
>> encrypted. So how do you decrypt the page without knowing the LSN (which
>> I think you need to know in order to derive the IV)?
>
>My proposal was that the first 16 bytes of the page are not encrypted.
>

Ah, I see.

>> Also, we probably don't want to expose the checksum, because it may
>> reveal information about page contents (since it's not a HMAC).
>
>Uh, I have not heard of that as an issue.
>

To clarify, I think it's more a general issue - the checksum does leak a
bit of information about the plaintext, I think that's fairly obvious. I
don't know if 16 bits is enough for practical attacks, though.

But it clearly is not the same thing as HMAC, so we should not treat it
as such.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



On Mon, Jul 15, 2019 at 06:05:37PM -0400, Bruce Momjian wrote:
>On Mon, Jul 15, 2019 at 10:44:34PM +0200, Tomas Vondra wrote:
>> On Mon, Jul 15, 2019 at 03:55:38PM -0400, Bruce Momjian wrote:
>> > The crazy seems more sane now --- "encrypt the page with CRC contents as
>> > zero" (which we probably already do to compute the CRC), then compute
>> > the CRC, and modify the page CRC.
>> >
>>
>> Huh? So you want to
>>
>> 1) set CRC to 0
>> 2) encrypt the page
>> 3) compute CRC
>> 4) set CRC to value computed in (3)
>> 5) encrypt the page again
>>
>> That seems pretty awful from performance POV, and it does not really
>> solve much as we'd still need to decrypt the page while verifying the
>> checksums (because the CRC is in the page header, which is encrypted).
>
>No, I was thinking we would overwrite whatever the encrypted output was
>in the spot that has the CRC with the computed CRC.  Yeah, sounds even
>crazier now that I said it --- never mind.
>

Uh, how could that possibly work? Symmetric ciphers are "diffusing" the
bits within the block, i.e. replacing 16 bits in a 128-bit ciphertext
block will affect the whole plaintext block, not just the matching 16
bits of plaintext.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



On Tue, Jul 16, 2019 at 02:04:58AM +0200, Tomas Vondra wrote:
> On Mon, Jul 15, 2019 at 06:05:37PM -0400, Bruce Momjian wrote:
> > On Mon, Jul 15, 2019 at 10:44:34PM +0200, Tomas Vondra wrote:
> > > On Mon, Jul 15, 2019 at 03:55:38PM -0400, Bruce Momjian wrote:
> > > > The crazy seems more sane now --- "encrypt the page with CRC contents as
> > > > zero" (which we probably already do to compute the CRC), then compute
> > > > the CRC, and modify the page CRC.
> > > >
> > > 
> > > Huh? So you want to
> > > 
> > > 1) set CRC to 0
> > > 2) encrypt the page
> > > 3) compute CRC
> > > 4) set CRC to value computed in (3)
> > > 5) encrypt the page again
> > > 
> > > That seems pretty awful from performance POV, and it does not really
> > > solve much as we'd still need to decrypt the page while verifying the
> > > checksums (because the CRC is in the page header, which is encrypted).
> > 
> > No, I was thinking we would overwrite whatever the encrypted output was
> > in the spot that has the CRC with the computed CRC.  Yeah, sounds even
> > crazier now that I said it --- never mind.
> > 
> 
> Uh, how could that possibly work? Symmetric ciphers are "diffusing" the
> bits within the block, i.e. replacing 16 bits in a 128-bit ciphertext
> block will affect the whole plaintext block, not just the matching 16
> bits of plaintext.

Yes, it would only work if the checksum was the last part of the page,
or if we used CTR mode, where changing the source bits doesn't affect
the later bits.  I am thinking crazy here, I know, but it seemed worth
mentioning in case someone liked it.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Mon, Jul 15, 2019 at 06:08:28PM -0400, Sehrope Sarkuni wrote:
> Hi,
> 
> Some more thoughts on CBC vs CTR modes. There are a number of
> advantages to using CTR mode for page encryption.
> 
> CTR encryption modes can be fully parallelized, whereas CBC can only
> be parallelized for decryption. While both can use AES specific hardware
> such as AES-NI, CTR modes can go a step further and use vectorized
> instructions.
> 
> On an i7-8559U (with AES-NI) I get a 4x speed improvement for
> CTR-based modes vs CBC when run on 8K of data:
> 
> # openssl speed -evp ${cipher}
> type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
> aes-128-cbc    1024361.51k  1521249.60k  1562033.41k  1571663.87k  1574537.90k  1575512.75k
> aes-128-ctr     696866.85k  2214441.86k  4364903.85k  5896221.35k  6559735.81k  6619594.75k
> aes-128-gcm     642758.92k  1638619.09k  3212068.27k  5085193.22k  6366035.97k  6474006.53k
> aes-256-cbc     940906.25k  1114628.44k  1131255.13k  1138385.92k  1140258.13k  1143592.28k
> aes-256-ctr     582161.82k  1896409.32k  3216926.12k  4249708.20k  4680299.86k  4706375.00k
> aes-256-gcm     553513.89k  1532556.16k  2705510.57k  3931744.94k  4615812.44k  4673093.63k

Wow, I am seeing CTR as 2x faster here too.

> For relation data where the encryption is going to be per page,
> there's flexibility in how the CTR nonce (IV + counter) is generated.
> With an 8K page, the counter need only go up to 512 for each page
> (8192-bytes per page / 16-bytes per AES-block). That would require
> 9-bits for the counter. Rounding that up to 16-bits allows for wider
> pages and it still uses only two bytes of the counter while ensuring
> that it'd be unique per AES-block. The remaining 14-bytes would be
> populated with some other data that is guaranteed unique per
> page-write to allow encryption via the same per-relation-file derived
> key. From what I gather, the LSN is a candidate though it'd have to be
> stored in plaintext for decryption.

Oh, for CTR, we need to increment the counter for each 16-byte block ---
got it.

> What's important is that writing two pages (either to different
> locations or the same page written back again) never reuses the same
> nonce with the same key. Using the same nonce with a different key is fine.

Uh, I think we can use LSN and page-number to be unique.

> With any of these schemes the same inputs will generate the same
> outputs. With CTR mode for WAL this would be an issue if the same key
> and deterministic nonce (ex: LSN + offset) is reused in multiple
> places. That does not have to be the same cluster either. For example

Very good point, since CTR does not use the user data as part of the
later encryption.

> if two replicas are promoted from the same backup with the same master
> key, they would generate the same WAL CTR stream, reusing the
> key/nonce pair. Ditto for starting off with a master key and deriving
> per-relation keys in a cloned installation off some deterministic
> attribute such as oid.

Uh, when we promote a standby, don't we increment the timeline?  Does
that help?  I don't know what we could use to distinguish two standbys
that are both promoted and using the same key --- there is nothing
unique about them.

> This can be avoided by deriving new keys per file (not just per
> relation) from a random salt. It'd be stored out of band and combined
> with the master key to derive the specific key used for that CTR
> stream. If there's a desire for supporting multiple ciphers or key
> sizes, that could be stored alongside the salt. Perhaps use the same
> location or lack of it to indicate "not encrypted" as well.

You mean the cluster would have its own random key?  Unfortunately all
clusters in a replica set have the same Database system identifier as
the primary.

> Per-file salts and derived keys would facilitate re-keying a table
> piecemeal, file by file, by generating a new salt/derived-key,
> encrypting a copy of the decrypted file, and doing an atomic rename.
> The file's contents would change but its length and any references to
> pages or byte offsets would stay valid. (I think this would work for
> CBC modes too as there's nothing CTR specific about it.)

Storing that might be a problem, particularly accessing it during crash
recovery.

> What I'm not sure of is how to handle randomizing the relation file IV in a
> cloned database. Until the key for a relation file or segment is
> rotated it'd have the same deterministic IV generated as its source as
> the LSN would continue from the same point. One idea is with 128-bits
> for the IV, one could have 64-bits for LSN, 16-bits for AES-block
> counter, and the remaining 48-bits be randomized; though you'd need to
> store those 48-bits somewhere per-page (basically it's a salt per
> page). That'd give some protection from the clone's new data being
> encrypted with the same stream as the parent's. Another option would
> be to track ranges of LSNs and have a centralized list of 48-bit
> randomized salts. That would remove the need for additional salt per
> page though you'd have to do a lookup on that shared list to figure
> out which to use.
> 
> CTR mode is definitely more complicated than a pure random-IV + CBC
> but with any deterministic generation of IVs for CBC mode you're going
> to have some of these same problems as well.

This is starting to sound unworkable for our use case.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Sat, Jul 13, 2019 at 12:33 AM Bruce Momjian <bruce@momjian.us> wrote:
>
> On Fri, Jul 12, 2019 at 02:15:02PM +0900, Masahiko Sawada wrote:
> > > We will use CBC AES128 mode for tables/indexes, and CTR AES128 for WAL.
> > > 8k pages will use the LSN as a nonce, which will be encrypted to
> > > generate the initialization vector (IV).  We will not encrypt the first
> > > 16 bytes of each page so the LSN can be used in this way.  The WAL will
> > > use the WAL file segment number as the nonce and the IV will be created
> > > in the same way.
> > >
> > > wal_log_hints will be enabled automatically in encryption mode, like we
> > > do for checksum mode, so we never encrypt different 8k pages with the
> > > same IV.
> >
> > I guess that two different pages can have the same LSN when a heap
> > update modifies both a page for the old tuple and another page for the
> > new tuple.
> >
> > heapam.c:3707
> >         recptr = log_heap_update(relation, buffer,
> >                                  newbuf, &oldtup, heaptup,
> >                                  old_key_tuple,
> >                                  all_visible_cleared,
> >                                  all_visible_cleared_new);
> >         if (newbuf != buffer)
> >         {
> >             PageSetLSN(BufferGetPage(newbuf), recptr);
> >         }
> >         PageSetLSN(BufferGetPage(buffer), recptr);
> >
> > Wouldn't it be a problem?
>
> I had the same question.  If someone does:
>
>         UPDATE tab SET col = col + 1
>
> then each row change gets its own LSN.  You are asking if an update that
> just expires one row and adds it to a new page gets the same LSN.  I
> don't know.

The following script reproduces a case where two different pages have the same LSN.

=# create table test (a int);
CREATE TABLE
=# insert into test select generate_series(1, 226);
INSERT 0 226
=# update test set a = a where a = 1;
UPDATE 1
=# select lsn from page_header(get_raw_page('test', 0));
    lsn
-----------
 0/1690488
(1 row)

=# select lsn from page_header(get_raw_page('test', 1));
    lsn
-----------
 0/1690488
(1 row)

So I think it's better to use the LSN and page number to create the IV.
If a single WAL record modified pages in different tables we would also
need the OID or relfilenode, but I don't think we currently have such
operations.
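
A sketch of such an IV source, combining the LSN with the block number
so the two pages above still get distinct IVs (the layout is an
assumption for illustration):

    /* Build a nonce from (LSN, block number); even when two pages are
     * stamped with the same LSN, the block number keeps them distinct. */
    #include <stdint.h>
    #include <string.h>

    void make_page_nonce(uint64_t lsn, uint32_t blkno, unsigned char nonce[16])
    {
        memset(nonce, 0, 16);
        memcpy(nonce, &lsn, sizeof(lsn));           /* bytes 0..7:  LSN   */
        memcpy(nonce + 8, &blkno, sizeof(blkno));   /* bytes 8..11: blkno */
        /* bytes 12..15: reserved (zero) */
    }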

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



On Mon, Jul 15, 2019 at 9:38 PM Antonin Houska <ah@cybertec.at> wrote:
>
> Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> > On Mon, Jun 17, 2019 at 11:02 PM Tomas Vondra
> > <tomas.vondra@2ndquadrant.com> wrote:
> > >
> > > On Mon, Jun 17, 2019 at 08:39:27AM -0400, Joe Conway wrote:
> > > >On 6/17/19 8:29 AM, Masahiko Sawada wrote:
> > > >> From a cryptographic perspective, I think the fine-grained TDE would
> > > >> be the better solution. Therefore if we eventually want the fine grained
> > > >> TDE I wonder if it might be better to develop the table/tablespace TDE
> > > >> first while keeping it simple as much as possible in v1, and then we
> > > >> can provide the functionality to encrypt other data in database
> > > >> cluster to satisfy the encrypting-everything requirement. I guess that
> > > >> it's easier to incrementally add encryption target objects rather than
> > > >> making it fine grained while not changing encryption target objects.
> > > >>
> > > >> FWIW I'm writing a draft patch of per tablespace TDE and will submit
> > > >> it this month. We can further discuss the complexity of the proposed
> > > >> TDE using it.
> > > >
> > > >+1
> > > >
> > > >Looking forward to it.
> > > >
> > >
> > > Yep. In particular, I'm interested in those aspects:
> > >
> >
> > Attached are the draft patch sets of per-tablespace transparent
> > data-at-rest encryption.
>

Thank you for your email.

> I was worried that there's competition between us but now that I've checked
> your patch set I see that you already use some parts of
>
> https://commitfest.postgresql.org/23/2104/
>
> although not the latest version. I'm supposed to work on the encryption now,
> so I'm thinking about what to do next. I think we should coordinate the effort, possibly
> off-list. The earlier we have a single patch set the more efficient the work
> should be.

Agreed. Let's discuss how we can coordinate the effort. I also think
it could be off-list as that's mostly a non-technical topic.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



On Fri, Jul 12, 2019 at 7:37 AM Bruce Momjian <bruce@momjian.us> wrote:
>
> On Wed, Jul 10, 2019 at 12:26:24PM -0400, Bruce Momjian wrote:
> > On Wed, Jul 10, 2019 at 08:31:17AM -0400, Joe Conway wrote:
> > > Please see my other reply (and
> > > https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-38a.pdf
> > > appendix C as pointed out by Ryan downthread).
> > >
> > > At least in my mind, I trust a published specification from the
> > > nation-state level over random blogs or wikipedia. If we can find some
> > > equivalent published standards that contradict NIST we should discuss
> > > it, but for my money I would prefer to stick with the NIST recommended
> > > method to produce the IVs.
> >
> > So, we have had a flurry of activity on this thread in the past day, so
> > let me summarize:
>
> Seems we have an updated approach:
>
> First, we need to store the symmetric encryption key in the data
> directory, like we do for SSL certificates and private keys.  (Crash
> recovery needs access to this key, so we can't easily store it in a
> database table.)  We will pattern it after the GUC
> ssl_passphrase_command.   We will need to decide on a format for the
> symmetric encryption key in the file so we can check that the supplied
> passphrase properly unlocks the key.
>
> Our first implementation will encrypt the entire cluster.  We can later
> consider encryption per table or tablespace.  It is unclear if
> encrypting different parts of the system with different keys is useful
> or feasible.  (This is separate from key rotation.)
>
> We will use CBC AES128 mode for tables/indexes, and CTR AES128 for WAL.
> 8k pages will use the LSN as a nonce, which will be encrypted to
> generate the initialization vector (IV).  We will not encrypt the first
> 16 bytes of each page so the LSN can be used in this way.  The WAL will
> use the WAL file segment number as the nonce and the IV will be created
> in the same way.
>
> wal_log_hints will be enabled automatically in encryption mode, like we
> do for checksum mode, so we never encrypt different 8k pages with the
> same IV.
>
> There will need to be a pg_control field to indicate that encryption is
> in use.
>
> Right now we don't support the online changing of a cluster's checksum
> mode, so I suggest we create a utility like pg_checksums --enable to
> allow offline key rotation.  Once we get online checksum mode changing
> ability, we can look into using that for encryption key rotation.
>

I've reconsidered the design of the TDE feature based on the discussion
so far. One of the main open questions is the granularity of
encryption objects: cluster encryption or more-granular-than-cluster
encryption. The following describes the new TDE design when we
choose table-level encryption or some new group-level encryption.

General
========
We will use AES and support both AES-128 and AES-256. The user can
specify a new initdb option, something like --aes-128 or --aes-256, to
enable encryption, and must specify --encryption-key-passphrase-command
along with it. (I guess we also require the openssl library.) If these
options are specified, we write the key length to the control file and
derive the KEK and generate the MDEK during initdb. wal_log_hints will
be enabled automatically in encryption mode, like we do for checksum
mode.

Key Management
==============
We will use 3-tier key architecture as Joe proposed.

  1. A master key encryption key (KEK): this is the key supplied by the
     database admin using something akin to ssl_passphrase_command

  2. A master data encryption key (MDEK): this is a generated key using a
     cryptographically secure pseudo-random number generator. It is
     encrypted using the KEK, probably with Key Wrap (KW):
     or maybe better Key Wrap with Padding (KWP):

  3a. Per table data encryption keys (TDEK): use MDEK and HKDF to generate
      table specific keys.

  3b. WAL data encryption keys (WDEK):  Similarly use MDEK and a HKDF to
      generate new keys when needed for WAL.

We store the MDEK in a plain file (say global/pgkey) after it is
encrypted with the KEK. We might also want to store a hash of the KEK
passphrase in order to verify the correctness of the given passphrase.
However, we don't need to store the TDEK and WDEK as we can derive them
as needed. The key file can be read by both backend processes and
front-end tools.

At postmaster startup, it reads the key file, decrypts the MDEK, and
derives the WDEK using the key id for WAL. The WDEK is loaded into a key
hash map (keyid -> key) in shared memory. We also derive TDEKs as needed
when reading tables or indexes and add them to the key hash map if not
already present.
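
As a sketch of the derivation step, using OpenSSL's EVP HKDF interface
(available since 1.1.0); binding the key to the relfilenode via the
info string is my assumption:

    /* Derive a per-relation TDEK from the decrypted MDEK with HKDF.
     * Error handling trimmed for brevity. */
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <openssl/evp.h>
    #include <openssl/kdf.h>

    int derive_tdek(const unsigned char *mdek, int mdek_len,
                    uint32_t relfilenode, unsigned char *tdek, size_t tdek_len)
    {
        char info[32];
        int ok;
        EVP_PKEY_CTX *pctx = EVP_PKEY_CTX_new_id(EVP_PKEY_HKDF, NULL);

        snprintf(info, sizeof(info), "TDEK/%u", relfilenode);
        EVP_PKEY_derive_init(pctx);
        EVP_PKEY_CTX_set_hkdf_md(pctx, EVP_sha256());
        EVP_PKEY_CTX_set1_hkdf_key(pctx, mdek, mdek_len);
        EVP_PKEY_CTX_add1_hkdf_info(pctx, (const unsigned char *) info,
                                    strlen(info));
        ok = EVP_PKEY_derive(pctx, tdek, &tdek_len) > 0;
        EVP_PKEY_CTX_free(pctx);
        return ok;
    }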

Buffer Encryption
==============
We will use AES-CBC for buffer encryption. We will add a key id (4
bytes) after the pd_lsn (8 bytes) in PageHeaderData, and we will not
encrypt the first 16 bytes of each page so the LSN and key id can be
used. We can store an invalid key id to indicate that the table is not
encrypted. There are two benefits of storing the key id in the page
header: offline tools can get the key id (and know whether the table is
encrypted) and it's helpful for online rekeying in the future.

I considered storing the IV and key id in a new fork, but I felt that
it is complex because we would always need to have the fork in shared
buffers whenever any page of its main fork is written to disk. If most
of the shared buffers are dirty and their new forks are not loaded into
shared buffers, we might need to load the new fork, write the page to
disk, and then evict some pages, over and over.

We will use (page lsn, page number) to create a nonce. IVs are created
by encrypting the nonce with its TDEK.
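
A sketch of that write path (the constants and function shape are
illustrative; error handling omitted):

    /* Encrypt a page for write-out: the first 16 bytes (pd_lsn plus
     * key id) stay in plaintext; the rest is AES-CBC-encrypted with
     * the table's TDEK and the nonce-derived IV. */
    #include <string.h>
    #include <openssl/evp.h>

    #define PAGE_SIZE      8192
    #define PAGE_HDR_PLAIN 16   /* LSN (8) + key id (4) + spare (4) */

    int encrypt_page(const unsigned char *src, unsigned char *dst,
                     const unsigned char *tdek, const unsigned char *iv)
    {
        EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
        int len, ok;

        memcpy(dst, src, PAGE_HDR_PLAIN);       /* header stays readable */
        EVP_EncryptInit_ex(ctx, EVP_aes_128_cbc(), NULL, tdek, iv);
        EVP_CIPHER_CTX_set_padding(ctx, 0);     /* body is block-aligned */
        ok = EVP_EncryptUpdate(ctx, dst + PAGE_HDR_PLAIN, &len,
                               src + PAGE_HDR_PLAIN,
                               PAGE_SIZE - PAGE_HDR_PLAIN) == 1 &&
             EVP_EncryptFinal_ex(ctx, dst + PAGE_HDR_PLAIN + len, &len) == 1;
        EVP_CIPHER_CTX_free(ctx);
        return ok;
    }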

WAL Encryption
=============
We will use AES-CTR for WAL encryption and encrypt each WAL page with the WDEK.

We will use the WAL segment number to create a nonce. Similar to buffer
encryption, IVs are created using the nonce and the WDEK.

If we want to support enabling or disabling encryption after initdb we
might want to have a key id in the WAL page header.

Front-end Tool Support
==================
We will add an --encryption-key-passphrase-command option to the
front-end tools that read database files or WAL segment files directly.
They can get the KEK via --encryption-key-passphrase-command and get the
MDEK by reading the key file. They can also determine the key length by
checking the control file. Since they can derive the TDEK using the key
id stored in the page header, they can decrypt database files.
Similarly, they can also decrypt WAL as they know the key id of the WDEK.

Master Key Rotation
================
We will support a new command-line tool that rotates the master key
offline. It accepts the --old-encryption-key-passphrase-command and
--new-encryption-key-passphrase-command options to get the old KEK and
new KEK respectively. It decrypts the MDEK with the old key and encrypts
it with the new key.
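
The core step of that tool might look like this sketch, using OpenSSL's
RFC 3394 key wrap (names are illustrative, error handling omitted):

    /* Unwrap the MDEK with the old KEK, then wrap it with the new KEK. */
    #include <openssl/evp.h>

    int rewrap_mdek(const unsigned char *old_kek, const unsigned char *new_kek,
                    const unsigned char *in, int in_len,
                    unsigned char *out, int *out_len)
    {
        unsigned char mdek[64];
        int mdek_len, len;
        EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();

        /* Wrap ciphers must be explicitly allowed. */
        EVP_CIPHER_CTX_set_flags(ctx, EVP_CIPHER_CTX_FLAG_WRAP_ALLOW);
        EVP_DecryptInit_ex(ctx, EVP_aes_256_wrap(), NULL, old_kek, NULL);
        EVP_DecryptUpdate(ctx, mdek, &mdek_len, in, in_len);
        EVP_DecryptFinal_ex(ctx, mdek + mdek_len, &len);
        mdek_len += len;

        EVP_CIPHER_CTX_reset(ctx);
        EVP_CIPHER_CTX_set_flags(ctx, EVP_CIPHER_CTX_FLAG_WRAP_ALLOW);
        EVP_EncryptInit_ex(ctx, EVP_aes_256_wrap(), NULL, new_kek, NULL);
        EVP_EncryptUpdate(ctx, out, out_len, mdek, mdek_len);
        EVP_EncryptFinal_ex(ctx, out + *out_len, &len);
        *out_len += len;

        EVP_CIPHER_CTX_free(ctx);
        return 1;
    }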

There is a concern about the performance overhead of both looking up
keys and checking whether an object is encrypted, but with this design
we can learn both by reading the page and by checking the hash map in
shared memory. It works fine unless we have to keep a huge number of
keys in the hash map, so I guess the overhead doesn't become noticeable.
In addition, this key management design is similar to the PoC patch I
created before, whose performance overhead I evaluated a few months ago.
In that evaluation, I didn't see such overhead. See [1].

[1]
https://www.slideshare.net/masahikosawada98/transparent-data-encryptoin-in-postgresql-and-integratino-with-key-management-service/31




Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

> On Mon, Jul 15, 2019 at 03:42:39PM -0400, Bruce Momjian wrote:
> >On Sat, Jul 13, 2019 at 11:58:02PM +0200, Tomas Vondra wrote:
> >> One extra thing we should consider is authenticated encryption. We can't
> >> just encrypt the pages (no matter which AES mode is used - XTS/CBC/...),
> >> as that does not provide integrity protection (i.e. can't detect when
> >> the ciphertext was corrupted due to disk failure or intentionally). And
> >> we can't quite rely on checksums, because that checksums the plaintext
> >> and is stored encrypted.
> >
> >Uh, if someone modifies a few bytes of the page, we will decrypt it, but
> >the checksum (per-page or WAL) will not match our decrypted output.  How
> >would they make it match the checksum without already knowing the key.
> >I read [1] but could not see that explained.
> >
> 
> Our checksum is only 16 bits, so perhaps one way would be to just
> generate 64k of randomly modified pages and hope one of them happens to
> hit the right checksum value. Not sure how practical such attack is, but
> it does require just filesystem access.

I don't think you can easily generate 64k of different checksums this way. If
the data is random, I suppose that each set of 2^(128 - 16) blocks will
contain the same checksum after decryption. Thus even if you generate 64k of
different ciphertext blocks that contain the checksum, some (many?) checksums
will be duplicates. Unfortunately the math to describe this problem does not
seem to be trivial.

Also note that if you try to generate ciphertext whose decryption will
result in a particular checksum value, you can hardly control the other 14
bytes of the block, which in turn are used to verify the checksum.

> FWIW our CRC algorithm is not quite HMAC, because it's neither keyed nor
> a cryptographic hash algorithm. Now, maybe we don't want authenticated
> encryption (e.g. XTS is not authenticated, unlike GCM/CCM).

I'm also not sure if we should try to guarantee data authenticity /
integrity. As someone already mentioned elsewhere, a page MAC does not help if
the whole page is replaced. (An extreme case is that an old filesystem snapshot
containing the whole data directory is restored, although that will probably
make the database crash soon.)

We can guarantee the integrity and authenticity of a backup, but that's a
separate feature: someone may need this even though it's OK for them to run
the cluster unencrypted.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com



Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

> On Mon, Jul 15, 2019 at 06:11:41PM -0400, Bruce Momjian wrote:
> >On Mon, Jul 15, 2019 at 11:05:30PM +0200, Tomas Vondra wrote:
> >> On Mon, Jul 15, 2019 at 03:42:39PM -0400, Bruce Momjian wrote:
> >> > On Sat, Jul 13, 2019 at 11:58:02PM +0200, Tomas Vondra wrote:
> >> > > One extra thing we should consider is authenticated encryption. We can't
> >> > > just encrypt the pages (no matter which AES mode is used - XTS/CBC/...),
> >> > > as that does not provide integrity protection (i.e. can't detect when
> >> > > the ciphertext was corrupted due to disk failure or intentionally). And
> >> > > we can't quite rely on checksums, because that checksums the plaintext
> >> > > and is stored encrypted.
> >> >
> >> > Uh, if someone modifies a few bytes of the page, we will decrypt it, but
> >> > the checksum (per-page or WAL) will not match our decrypted output.  How
> >> > would they make it match the checksum without already knowing the key.
> >> > I read [1] but could not see that explained.
> >> >
> >>
> >> Our checksum is only 16 bits, so perhaps one way would be to just
> >> generate 64k of randomly modified pages and hope one of them happens to
> >> hit the right checksum value. Not sure how practical such attack is, but
> >> it does require just filesystem access.
> >
> >Yes, that would work, and opens the question of whether our checksum is
> >big enough for this, and if it is not, we need to find space for it,
> >probably with a custom encrypted page format.  :-(   And that makes
> >adding encryption offline almost impossible because you potentially have
> >to move tuples around.  Yuck!
> >
>
> Right. We've been working on allowing to disable checksum online, and it
> would be useful to allow something like that for encryption too I guess.
> And without some sort of page-level flag that won't be possible, which
> would be rather annoying.
>
> Not sure it needs to be in the page itself, though - that's pretty much
> why I proposed to store metadata (IV, key ID, ...) for encryption in a
> new fork. That would be a bit more flexible than storing it in the page
> itself (e.g. different encryption schemes might easily store different
> amounts of metadata).
>
> Maybe a new fork is way too complex solution, not sure.

One problem with this new fork would be that the contents of its buffers (the
MAC values) are not determined until the corresponding buffers of the MAIN
fork get encrypted. However, encryption is performed by the storage layer
(md.c), which is not expected to lock other buffers (such as those of the
"MAC fork"), read their pages from disk, or insert their WAL records.

This is different from the FSM or VM forks whose buffers are only updated
above the storage layer.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com



On Fri, Jul 19, 2019 at 12:04:36PM +0200, Antonin Houska wrote:
>Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>
>> On Mon, Jul 15, 2019 at 03:42:39PM -0400, Bruce Momjian wrote:
>> >On Sat, Jul 13, 2019 at 11:58:02PM +0200, Tomas Vondra wrote:
>> >> One extra thing we should consider is authenticated encryption. We can't
>> >> just encrypt the pages (no matter which AES mode is used - XTS/CBC/...),
>> >> as that does not provide integrity protection (i.e. can't detect when
>> >> the ciphertext was corrupted due to disk failure or intentionally). And
>> >> we can't quite rely on checksums, because that checksums the plaintext
>> >> and is stored encrypted.
>> >
>> >Uh, if someone modifies a few bytes of the page, we will decrypt it, but
>> >the checksum (per-page or WAL) will not match our decrypted output.  How
>> >would they make it match the checksum without already knowing the key.
>> >I read [1] but could not see that explained.
>> >
>>
>> Our checksum is only 16 bits, so perhaps one way would be to just
>> generate 64k of randomly modified pages and hope one of them happens to
>> hit the right checksum value. Not sure how practical such attack is, but
>> it does require just filesystem access.
>
>I don't think you can easily generate 64k of different checksums this way. If
>the data is random, I suppose that each set of 2^(128 - 16) blocks will
>contain the same checksum after decryption. Thus even if you generate 64k of
>different ciphertext blocks that contain the checksum, some (many?) checksums
>will be duplicates. Unfortunately the math to describe this problem does not
>seem to be trivial.
>

I'm not sure what your point is, or why you care about the 128 bits, but I
don't think the math is very complicated (and it's exactly the same with
or without encryption). The probability of a checksum collision for a randomly
modified page is 1/64k, so p=~0.00153%. So the probability of *not* getting a
collision is (1-p)=99.9985%. So with N pages, the probability of no
collisions is pow((1-p),N), which behaves like this:

      N     pow((1-p),N)
    --------------------
    10000           85%
    20000           73%
    30000           63%
    46000           49%
    200000           4%

So with a 1.6GB relation you have about a 96% chance of a checksum collision.
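
For reference, a minimal standalone C sketch that reproduces these numbers
(the only input is the 16-bit checksum; nothing PostgreSQL-specific):

    #include <math.h>
    #include <stdio.h>

    /* Chance that N randomly modified 8kB pages produce no 16-bit
     * checksum collision: pow(1 - 1/65536, N). */
    int
    main(void)
    {
        const double p = 1.0 / 65536.0;
        const int    n[] = {10000, 20000, 30000, 46000, 200000};

        for (int i = 0; i < 5; i++)
            printf("%8d  %5.1f%%\n", n[i], 100.0 * pow(1.0 - p, n[i]));
        return 0;               /* build with: cc prob.c -lm */
    }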

>Also note that if you try to generate ciphertext, decryption of which will
>result in a particular value of the checksum, you can hardly control the other 14
>bytes of the block, which in turn are used to verify the checksum.
>

But we don't care about the 14 bytes. In fact, we want the page header
(which includes both the checksum and the other 14B in the block) to
remain unchanged - the attack only needs to modify the remaining parts of
the 8kB page in a way that generates the same checksum on the plaintext.

And that's not that hard to do, IMHO, because the header is stored at the
beginning of the page. So we can just randomly modify the last AES block
(last 16B on the page) to confine the corruption to the last block.

Now, I'm not saying this attack is particularly practical - it would
generate a fair number of checkpoint failures before getting the first
collision. So it'd trigger quite a few alerts, I guess.
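
To make that concrete, a minimal OpenSSL sketch of the CBC property being
relied on here (dummy key/IV, nothing PostgreSQL-specific): tampering with
the last ciphertext block garbles only the final 16 plaintext bytes, so the
page header and its stored checksum come through each attempt unchanged.

    #include <openssl/evp.h>
    #include <stdio.h>
    #include <string.h>

    /* Tamper with the last AES-CBC ciphertext block of an 8kB "page":
     * only the last 16 decrypted bytes come out garbled. */
    int
    main(void)
    {
        unsigned char   key[32] = {0}, iv[16] = {0};    /* dummy key/IV */
        unsigned char   page[8192], ct[8192 + 16], pt[8192 + 16];
        int             len;
        EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();

        memset(page, 'x', sizeof(page));

        EVP_EncryptInit_ex(ctx, EVP_aes_256_cbc(), NULL, key, iv);
        EVP_CIPHER_CTX_set_padding(ctx, 0);     /* page is block-aligned */
        EVP_EncryptUpdate(ctx, ct, &len, page, 8192);
        EVP_EncryptFinal_ex(ctx, ct + len, &len);

        ct[8191] ^= 0x01;                       /* corrupt the last block */

        EVP_DecryptInit_ex(ctx, EVP_aes_256_cbc(), NULL, key, iv);
        EVP_CIPHER_CTX_set_padding(ctx, 0);
        EVP_DecryptUpdate(ctx, pt, &len, ct, 8192);
        EVP_DecryptFinal_ex(ctx, pt + len, &len);

        printf("first 8176 bytes intact: %s\n",
               memcmp(page, pt, 8192 - 16) == 0 ? "yes" : "no");
        EVP_CIPHER_CTX_free(ctx);
        return 0;
    }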

>> FWIW our CRC algorithm is not quite HMAC, because it's neither keyed nor
>> a cryptographic hash algorithm. Now, maybe we don't want authenticated
>> encryption (e.g. XTS is not authenticated, unlike GCM/CCM).
>
>I'm also not sure if we should try to guarantee data authenticity /
>integrity. As someone already mentioned elsewhere, page MAC does not help if
>the whole page is replaced. (An extreme case is that old filesystem snapshot
>containing the whole data directory is restored, although that will probably
>make the database crash soon.)
>
>We can guarantee integrity and authenticity of backup, but that's a separate
>feature: someone may need this although it's o.k. for him to run the cluster
>unencrypted.
>

Yes, I do agree with that. I think attempts to guarantee data authenticity
and/or integrity at the page level are mostly futile (replay attacks are an
example of why). IMHO we should consider that to be outside the threat
model TDE is expected to address.

IMO a better way to handle authenticity/integrity would be based on WAL,
which is essentially an authoritative log of operations. We should be able
to parse WAL, deduce expected state (min LSN, checksums) for each page,
and validate the cluster state based on that.

I still think having to decrypt the page in order to verify a checksum
(because the header is part of the encrypted page, and is computed from
the plaintext version) is not great.

regards

>-- 
>Antonin Houska
>Web: https://www.cybertec-postgresql.com

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




On Fri, Jul 19, 2019 at 01:32:01PM +0200, Antonin Houska wrote:
>Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>
>> On Mon, Jul 15, 2019 at 06:11:41PM -0400, Bruce Momjian wrote:
>> >On Mon, Jul 15, 2019 at 11:05:30PM +0200, Tomas Vondra wrote:
>> >> On Mon, Jul 15, 2019 at 03:42:39PM -0400, Bruce Momjian wrote:
>> >> > On Sat, Jul 13, 2019 at 11:58:02PM +0200, Tomas Vondra wrote:
>> >> > > One extra thing we should consider is authenticated encryption. We can't
>> >> > > just encrypt the pages (no matter which AES mode is used - XTS/CBC/...),
>> >> > > as that does not provide integrity protection (i.e. can't detect when
>> >> > > the ciphertext was corrupted due to disk failure or intentionally). And
>> >> > > we can't quite rely on checksums, because that checksums the plaintext
>> >> > > and is stored encrypted.
>> >> >
>> >> > Uh, if someone modifies a few bytes of the page, we will decrypt it, but
>> >> > the checksum (per-page or WAL) will not match our decrypted output.  How
>> >> > would they make it match the checksum without already knowing the key.
>> >> > I read [1] but could not see that explained.
>> >> >
>> >>
>> >> Our checksum is only 16 bits, so perhaps one way would be to just
>> >> generate 64k of randomly modified pages and hope one of them happens to
>> >> hit the right checksum value. Not sure how practical such attack is, but
>> >> it does require just filesystem access.
>> >
>> >Yes, that would work, and opens the question of whether our checksum is
>> >big enough for this, and if it is not, we need to find space for it,
>> >probably with a custom encrypted page format.  :-(   And that makes
>> >adding encryption offline almost impossible because you potentially have
>> >to move tuples around.  Yuck!
>> >
>>
>> Right. We've been working on allowing to disable checksum online, and it
>> would be useful to allow something like that for encryption too I guess.
>> And without some sort of page-level flag that won't be possible, which
>> would be rather annoying.
>>
>> Not sure it needs to be in the page itself, though - that's pretty much
>> why I proposed to store metadata (IV, key ID, ...) for encryption in a
>> new fork. That would be a bit more flexible than storing it in the page
>> itself (e.g. different encryption schemes might easily store different
>> amounts of metadata).
>>
>> Maybe a new fork is way too complex a solution, not sure.
>
>One problem with this new fork would be that the contents of its buffers (the MAC
>values) are not determined until the corresponding buffers of the MAIN fork get
>encrypted. However, encryption is performed by the storage layer (md.c), which
>is not expected to lock other buffers (such as those of the "MAC fork"), read
>their pages from disk or insert their WAL records.
>
>This is different from the FSM or VM forks, whose buffers are only updated
>above the storage layer.
>

Yes, that seems like a valid issue :-(


-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

> On Fri, Jul 19, 2019 at 12:04:36PM +0200, Antonin Houska wrote:
> >Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> >
> >> On Mon, Jul 15, 2019 at 03:42:39PM -0400, Bruce Momjian wrote:
> >> >On Sat, Jul 13, 2019 at 11:58:02PM +0200, Tomas Vondra wrote:
> >> >> One extra thing we should consider is authenticated encryption. We can't
> >> >> just encrypt the pages (no matter which AES mode is used - XTS/CBC/...),
> >> >> as that does not provide integrity protection (i.e. can't detect when
> >> >> the ciphertext was corrupted due to disk failure or intentionally). And
> >> >> we can't quite rely on checksums, because that checksums the plaintext
> >> >> and is stored encrypted.
> >> >
> >> >Uh, if someone modifies a few bytes of the page, we will decrypt it, but
> >> >the checksum (per-page or WAL) will not match our decrypted output.  How
> >> >would they make it match the checksum without already knowing the key.
> >> >I read [1] but could not see that explained.
> >> >
> >>
> >> Our checksum is only 16 bits, so perhaps one way would be to just
> >> generate 64k of randomly modified pages and hope one of them happens to
> >> hit the right checksum value. Not sure how practical such attack is, but
> >> it does require just filesystem access.
> >
> >I don't think you can easily generate 64k of different checksums this way. If
> >the data is random, I suppose that each set of 2^(128 - 16) blocks will
> >contain the same checksum after decryption. Thus even if you generate 64k of
> >different ciphertext blocks that contain the checksum, some (many?) checksums
> >will be duplicates. Unfortunately the math to describe this problem does not
> >seem to be trivial.
> >
>
> I'm not sure what your point is, or why you care about the 128 bits, but I
> don't think the math is very complicated (and it's exactly the same with
> or without encryption). The probability of a checksum collision for a randomly
> modified page is 1/64k, so p=~0.00153%. So the probability of *not* getting a
> collision is (1-p)=99.9985%. So with N pages, the probability of no
> collisions is pow((1-p),N), which behaves like this:
>
>      N     pow((1-p),N)
>    --------------------
>    10000           85%
>    20000           73%
>    30000           63%
>    46000           49%
>    200000           4%
>
> So with a 1.6GB relation you have about a 96% chance of a checksum collision.

I thought your attack proposal was to find a valid (encrypted) checksum for a
given encrypted page. Instead it seems that you were only trying to say that
it's not too hard to generate a page with a valid checksum in general. Thus the
attacker can try to modify the ciphertext again and again in a way that is not
quite random, but the chance to pass the checksum verification may still be
relatively high.

> >Also note that if you try to generate ciphertext, decryption of which will
> >result in a particular value of the checksum, you can hardly control the other 14
> >bytes of the block, which in turn are used to verify the checksum.
> >
>
> Now, I'm not saying this attack is particularly practical - it would
> generate a fair number of checkpoint failures before getting the first
> collision. So it'd trigger quite a few alerts, I guess.

You probably mean "checksum failures". I agree. And even if the checksum
passed the verification, page or tuple headers would probably be incorrect and
cause other errors.

> >> FWIW our CRC algorithm is not quite HMAC, because it's neither keyed nor
> >> a cryptographic hash algorithm. Now, maybe we don't want authenticated
> >> encryption (e.g. XTS is not authenticated, unlike GCM/CCM).
> >
> >I'm also not sure if we should try to guarantee data authenticity /
> >integrity. As someone already mentioned elsewhere, page MAC does not help if
> >the whole page is replaced. (An extreme case is that old filesystem snapshot
> >containing the whole data directory is restored, although that will probably
> >make the database crash soon.)
> >
> >We can guarantee integrity and authenticity of backup, but that's a separate
> >feature: someone may need this although it's o.k. for him to run the cluster
> >unencrypted.
> >
>
> Yes, I do agree with that. I think attempts to guarantee data authenticity
> and/or integrity at the page level are mostly futile (replay attacks are an
> example of why). IMHO we should consider that to be outside the threat
> model TDE is expected to address.

When writing my previous email I forgot that, besides improving data
integrity, authenticated encryption also tries to detect an attempt to get
the encryption key via a "chosen-ciphertext attack (CCA)". The fact that pages are
encrypted / decrypted independently of each other should not be a problem
here. We just need to consider whether this kind of CCA is a threat we try to
protect against.

> IMO a better way to handle authenticity/integrity would be based on WAL,
> which is essentially an authoritative log of operations. We should be able
> to parse WAL, deduce expected state (min LSN, checksums) for each page,
> and validate the cluster state based on that.

ok. A replica that was cloned from the master before any corruption could have
happened can be used for such checks. But that should be done by an external
tool rather than by PG core.

> I still think having to decrypt the page in order to verify a checksum
> (because the header is part of the encrypted page, and is computed from
> the plaintext version) is not great.

Should we forbid the checksums if the cluster is encrypted? Even if the
checksum is encrypted, I think it can still help to detect I/O corruption: if
the encrypted data is corrupted, then the checksum verification should fail
after decryption anyway.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com



On Fri, Jul 19, 2019 at 04:02:19PM +0200, Antonin Houska wrote:
>Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>
>> On Fri, Jul 19, 2019 at 12:04:36PM +0200, Antonin Houska wrote:
>> >Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>> >
>> >> On Mon, Jul 15, 2019 at 03:42:39PM -0400, Bruce Momjian wrote:
>> >> >On Sat, Jul 13, 2019 at 11:58:02PM +0200, Tomas Vondra wrote:
>> >> >> One extra thing we should consider is authenticated encryption. We can't
>> >> >> just encrypt the pages (no matter which AES mode is used - XTS/CBC/...),
>> >> >> as that does not provide integrity protection (i.e. can't detect when
>> >> >> the ciphertext was corrupted due to disk failure or intentionally). And
>> >> >> we can't quite rely on checksums, because that checksums the plaintext
>> >> >> and is stored encrypted.
>> >> >
>> >> >Uh, if someone modifies a few bytes of the page, we will decrypt it, but
>> >> >the checksum (per-page or WAL) will not match our decrypted output.  How
>> >> >would they make it match the checksum without already knowing the key.
>> >> >I read [1] but could not see that explained.
>> >> >
>> >>
>> >> Our checksum is only 16 bits, so perhaps one way would be to just
>> >> generate 64k of randomly modified pages and hope one of them happens to
>> >> hit the right checksum value. Not sure how practical such attack is, but
>> >> it does require just filesystem access.
>> >
>> >I don't think you can easily generate 64k of different checksums this way. If
>> >the data is random, I suppose that each set of 2^(128 - 16) blocks will
>> >contain the same checksum after decryption. Thus even if you generate 64k of
>> >different ciphertext blocks that contain the checksum, some (many?) checksums
>> >will be duplicates. Unfortunately the math to describe this problem does not
>> >seem to be trivial.
>> >
>>
>> I'm not sure what your point is, or why you care about the 128 bits, but I
>> don't think the math is very complicated (and it's exactly the same with
>> or without encryption). The probability of a checksum collision for a randomly
>> modified page is 1/64k, so p=~0.00153%. So the probability of *not* getting a
>> collision is (1-p)=99.9985%. So with N pages, the probability of no
>> collisions is pow((1-p),N), which behaves like this:
>>
>>      N     pow((1-p),N)
>>    --------------------
>>    10000           85%
>>    20000           73%
>>    30000           63%
>>    46000           49%
>>    200000           4%
>>
>> So with a 1.6GB relation you have about a 96% chance of a checksum collision.
>
>I thought your attack proposal was to find a valid (encrypted) checksum for a
>given encrypted page. Instead it seems that you were only trying to say that
>it's not too hard to generate a page with a valid checksum in general. Thus the
>attacker can try to modify the ciphertext again and again in a way that is not
>quite random, but the chance to pass the checksum verification may still be
>relatively high.
>
>> >Also note that if you try to generate ciphertext, decryption of which will
>> >result in a particular value of the checksum, you can hardly control the other 14
>> >bytes of the block, which in turn are used to verify the checksum.
>> >
>>
>> Now, I'm not saying this attack is particularly practical - it would
>> generate a fair number of checkpoint failures before getting the first
>> collision. So it'd trigger quite a few alerts, I guess.
>
>You probably mean "checksum failures". I agree. And even if the checksum
>passed the verification, page or tuple headers would probably be incorrect and
>cause other errors.
>
>> >> FWIW our CRC algorithm is not quite HMAC, because it's neither keyed nor
>> >> a cryptographic hash algorithm. Now, maybe we don't want authenticated
>> >> encryption (e.g. XTS is not authenticated, unlike GCM/CCM).
>> >
>> >I'm also not sure if we should try to guarantee data authenticity /
>> >integrity. As someone already mentioned elsewhere, page MAC does not help if
>> >the whole page is replaced. (An extreme case is that old filesystem snapshot
>> >containing the whole data directory is restored, although that will probably
>> >make the database crash soon.)
>> >
>> >We can guarantee integrity and authenticity of backup, but that's a separate
>> >feature: someone may need this although it's o.k. for him to run the cluster
>> >unencrypted.
>> >
>>
>> Yes, I do agree with that. I think attempts to guarantee data authenticity
>> and/or integrity at the page level are mostly futile (replay attacks are an
>> example of why). IMHO we should consider that to be outside the threat
>> model TDE is expected to address.
>
>When writing my previous email I forgot that, besides improving data
>integrity, authenticated encryption also tries to detect an attempt to get
>the encryption key via a "chosen-ciphertext attack (CCA)". The fact that pages are
>encrypted / decrypted independently of each other should not be a problem
>here. We just need to consider whether this kind of CCA is a threat we try to
>protect against.
>
>> IMO a better way to handle authenticity/integrity would be based on WAL,
>> which is essentially an authoritative log of operations. We should be able
>> to parse WAL, deduce expected state (min LSN, checksums) for each page,
>> and validate the cluster state based on that.
>
>ok. A replica that was cloned from the master before any corruption could have
>happened can be used for such checks. But that should be done by an external
>tool rather than by PG core.
>
>> I still think having to decrypt the page in order to verify a checksum
>> (because the header is part of the encrypted page, and is computed from
>> the plaintext version) is not great.
>
>Should we forbid the checksums if the cluster is encrypted? Even if the
>checksum is encrypted, I think it can still help to detect I/O corruption: if
>the encrypted data is corrupted, then the checksum verification should fail
>after decryption anyway.
>

Forbid checksums? I don't see how that could be acceptable. We either have
to accept the limitations of the current design (having to decrypt
everything before checking the checksums) or change the design.

I personally think we should do the latter - not just because of this
"decrypt-then-verify" issue, but consider how much work we've done to
allow enabling checksums on-line (it's still not there, but it's likely
doable in PG13). How are we going to do that with encryption? ISTM we
should design it so that we can enable encryption on-line too - maybe not
in v1, but it should be possible. So how are we going to do that? With
checksums it's (fairly) easy because we can "not verify" the page before
we know all pages have a checksum, but with encryption that's not
possible. And if the whole page is encrypted, then what?

Of course, maybe we don't need such capability for the use-case we're
trying to solve with encryption. I can imagine that someone is running a
large system, has issues with data corruption, and decides to enable
checksums to remedy that. Maybe there's no such scenario in the privacy
case? But we can probably come up with scenarios where a new company
policy forces people to enable encryption on all systems, or something
like that.

That being said, I don't know how to solve this, but it seems to me that
any system where we can't easily decide whether a page is encrypted or not
(because everything, including the page header, is encrypted) has this
exact issue. Maybe we could keep some part of the header unencrypted
(likely an information leak, and it does not solve decrypt-then-verify). Or maybe
we need to store some additional information on each page (which breaks
the on-disk format).

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




On Sat, Jul 20, 2019 at 1:30 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> Forbid checksums? I don't see how that could be acceptable. We either have
> to accept the limitations of the current design (having to decrypt
> everything before checking the checksums) or change the design.
>
> I personally think we should do the latter - not just because of this
> "decrypt-then-verify" issue, but consider how much work we've done to
> allow enabling checksums on-line (it's still not there, but it's likely
> doable in PG13). How are we going to do that with encryption? ISTM we
> should design it so that we can enable encryption on-line too - maybe not
> in v1, but it should be possible. So how are we going to do that? With
> checksums it's (fairly) easy because we can "not verify" the page before
> we know all pages have a checksum, but with encryption that's not
> possible. And if the whole page is encrypted, then what?
>
> Of course, maybe we don't need such capability for the use-case we're
> trying to solve with encryption. I can imagine that someone is running a
> large system, has issues with data corruption, and decides to enable
> checksums to remedy that. Maybe there's no such scenario in the privacy
> case? But we can probably come up with scenarios where a new company
> policy forces people to enable encryption on all systems, or something
> like that.
>
> That being said, I don't know how to solve this, but it seems to me that
> any system where we can't easily decide whether a page is encrypted or not
> (because everything, including the page header, is encrypted) has this
> exact issue. Maybe we could keep some part of the header unencrypted
> (likely an information leak, and it does not solve decrypt-then-verify). Or maybe
> we need to store some additional information on each page (which breaks
> the on-disk format).

How about storing the CRC of the encrypted pages? It would not leak
any additional information and serves the same purpose as a
non-encrypted one, namely I/O corruption detection. I took a look at
pg_checksums and besides checking for empty pages, the checksum
validation path does not interpret any other fields to calculate the
checksum. I think even the offline checksum enabling path looks like
it may work out of the box. Checksums of encrypted data are not a
replacement for a MAC and this would allow that validation to run
without any knowledge of keys.

Related, I think CTR mode should be considered for pages. It has
performance advantages at 8K data sizes, but even better, it allows
arbitrary bytes of the ciphertext to be replaced. For example, after
encrypting a block you can replace the two checksum bytes with the CRC
of the ciphertext, vs. CBC mode where that would cause corruption to
cascade forward. The same could be used for leaving things like
pd_pagesize_version in plaintext at its current offset. For anything
deemed non-sensitive, having it readable without having to decrypt the
page is useful.

It does not have to be full bytes either. CTR mode operates as a
stream of bits so it's possible to replace nibbles or even individual
bits. It can be something as small as one bit for an "is_encrypted" flag
or a handful of bits used to infer a derived key. For example, with
2 bits you could have 00 represent unencrypted, 01/10 represent
old/new key, and 11 be future use. Something like that could
facilitate online key rotation.
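
To make the byte-replacement property concrete, a minimal OpenSSL sketch
(dummy key/IV; bytes 8-9 stand in for pd_checksum, whose offset in
PageHeaderData is assumed here). Since CTR encryption and decryption are the
same keystream XOR, one helper serves both directions:

    #include <openssl/evp.h>
    #include <stdio.h>
    #include <string.h>

    /* XOR the data with the AES-CTR keystream (enc == dec in CTR). */
    static void
    aes_ctr(const unsigned char *in, unsigned char *out, int n,
            const unsigned char *key, const unsigned char *iv)
    {
        EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
        int             len;

        EVP_EncryptInit_ex(ctx, EVP_aes_256_ctr(), NULL, key, iv);
        EVP_EncryptUpdate(ctx, out, &len, in, n);
        EVP_CIPHER_CTX_free(ctx);
    }

    int
    main(void)
    {
        unsigned char key[32] = {0}, iv[16] = {0};  /* dummy key/IV */
        unsigned char page[8192], ct[8192], pt[8192];

        memset(page, 'x', sizeof(page));
        aes_ctr(page, ct, sizeof(page), key, iv);

        ct[8] = 0xAB;           /* overwrite the two checksum bytes       */
        ct[9] = 0xCD;           /* (offset 8) with e.g. a ciphertext CRC  */

        aes_ctr(ct, pt, sizeof(page), key, iv);

        /* Only bytes 8 and 9 differ; the rest decrypts cleanly. */
        printf("byte 7 intact: %d, byte 10 intact: %d\n",
               pt[7] == page[7], pt[10] == page[10]);
        return 0;
    }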

Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/



On Mon, Jul 15, 2019 at 07:39:20PM -0400, Alvaro Herrera wrote:
> On 2019-Jul-15, Bruce Momjian wrote:
> 
> > My point is that doing encryption of only some data might actually make
> > the system slower due to the lookups, so I think we need to implement
> > all-cluster encryption and then see what the overhead is, and if there
> > are use-cases for not encrypting only some data.
> 
> We can keep the keys in the relcache.  It doesn't have to be slow.  It
> is certainly slower to have to encrypt *all* data, which can be
> massively larger than the sensitive portion of the database.
> 
> If we need the keys for offline operation (where relcache is not
> reachable), we can keep pointers to the key files in the filesystem --
> for example for an encrypted table we would keep a new file, say
> <relfilenode>.key, which could be a symlink to the encrypted key file.
> The tool already has access to the key data, but the symlink lets it
> know *which* key to use; random onlookers cannot get the key data
> because the file is encrypted with the master key.
> 
> Any table without the key file is assumed to be unencrypted.
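
A minimal sketch of how a tool might test for that layout (the directory
and naming conventions here are assumed, purely illustrative):

    #include <stdbool.h>
    #include <stdio.h>
    #include <sys/stat.h>

    /* Sketch of the proposed layout: a relation is treated as encrypted
     * iff a <relfilenode>.key symlink sits beside its data file. */
    static bool
    relation_is_encrypted(const char *reldir, unsigned int relfilenode)
    {
        char        path[1024];
        struct stat st;

        snprintf(path, sizeof(path), "%s/%u.key", reldir, relfilenode);
        return lstat(path, &st) == 0;   /* no key file => unencrypted */
    }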

The relcache and symlinks idea is an interesting one.  Are we still
encrypting all of the WAL?  If so, the savings is only on heap/index file
writes, and I just don't know how much of a benefit skipping encryption will
be --- we can test it later.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Sat, Jul 20, 2019 at 03:39:25PM -0400, Sehrope Sarkuni wrote:
> How about storing the CRC of the encrypted pages? It would not leak
> any additional information and serves the same purpose as a
> non-encrypted one, namely I/O corruption detection. I took a look at
> pg_checksums and besides checking for empty pages, the checksum
> validation path does not interpret any other fields to calculate the
> checksum. I think even the offline checksum enabling path looks like
> it may work out of the box. Checksums of encrypted data are not a
> replacement for a MAC and this would allow that validation to run
> without any knowledge of keys.
> 
> Related, I think CTR mode should be considered for pages. It has
> performance advantages at 8K data sizes, but even better, it allows
> arbitrary bytes of the ciphertext to be replaced. For example, after
> encrypting a block you can replace the two checksum bytes with the CRC
> of the ciphertext, vs. CBC mode where that would cause corruption to
> cascade forward. The same could be used for leaving things like
> pd_pagesize_version in plaintext at its current offset. For anything
> deemed non-sensitive, having it readable without having to decrypt the
> page is useful.

Yes, I did cover that here:

    https://www.postgresql.org/message-id/20190716002519.yyvgl7qi4ewl6pc2@momjian.us
    
    Yes, it would only work if the checksum was the last part of the page,
    or if we used CTR mode, where changing the source bits doesn't affect
    the later bits.  I am thinking crazy here, I know, but it seemed worth
    mentioning in case someone liked it.
    
    https://www.postgresql.org/message-id/20190715194239.iqq5jdj54ru32kmt@momjian.us
    
    If we want to go crazy, we could encrypt, assume zeros for the CRC,
    compute the MAC and put it in the place where the CRC is, but then tools
    that read the CRC would see that as an error, so we don't want to go there.
    Yes, crazy.

I know this thread is long so you might have missed it.

I do think CTR mode is the way to go for the heap/index pages and the
WAL, and will reply to another email on that topic now.

> It does not have to be full bytes either. CTR mode operates as a
> stream of bits so it's possible to replace nibbles or even individual
> bits. It can be something as small as one bit for an "is_encrypted" flag
> or a handful of bits used to infer a derived key. For example, with
> 2 bits you could have 00 represent unencrypted, 01/10 represent
> old/new key, and 11 be future use. Something like that could
> facilitate online key rotation.

Yes, if we do all-cluster encryption, we can just consult pg_control,
but if we do per-table/index, that might be needed.  There is another
email suggesting that a symlink to a key file could indicate an encrypted
table/index.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Tue, Jul 16, 2019 at 01:24:54PM +0900, Masahiko Sawada wrote:
> On Sat, Jul 13, 2019 at 12:33 AM Bruce Momjian <bruce@momjian.us> wrote:
> > then each row change gets its own LSN.  You are asking if an update that
> > just expires one row and adds it to a new page gets the same LSN.  I
> > don't know.
> 
> The following scripts can reproduce that different two pages have the same LSN.
> 
> =# create table test (a int);
> CREATE TABLE
> =# insert into test select generate_series(1, 226);
> INSERT 0 226
> =# update test set a = a where a = 1;
> UPDATE 1
> =# select lsn from page_header(get_raw_page('test', 0));
>     lsn
> -----------
>  0/1690488
> (1 row)
> 
> =# select lsn from page_header(get_raw_page('test', 1));
>     lsn
> -----------
>  0/1690488
> (1 row)
> 
> So I think it's better to use the LSN and page number to create the IV. If we
> modified different tables with a single WAL record we would also need the OID or
> relfilenode, but I don't think we currently have such operations.

OK, good to know, thanks.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Thu, Jul 18, 2019 at 12:04:25PM +0900, Masahiko Sawada wrote:
> I've re-considered the design of the TDE feature based on the discussion
> so far. One of the main open questions is the granularity of
> encryption objects: cluster encryption or more-granular-than-cluster
> encryption. The following describes the new TDE design when we
> choose table-level encryption or something-new-group-level encryption.
> 
> General
> ========
> We will use AES and support both AES-128 and AES-256. The user can specify
> a new initdb option, something like --aes-128 or --aes-256, to enable
> encryption and must specify --encryption-key-passphrase-command along
> with it. (I guess we also require the openssl library.) If these options are
> specified, we write the key length to the control file and derive the
> KEK and generate the MDEK during initdb. wal_log_hints will be enabled
> automatically in encryption mode, like we do for checksum mode.

Agreed.  pg_control will store the none/AES128/AES256 indicator.

> Key Management
> ==============
> We will use 3-tier key architecture as Joe proposed.
> 
>   1. A master key encryption key (KEK): this is the key supplied by the
>      database admin using something akin to ssl_passphrase_command
> 
>   2. A master data encryption key (MDEK): this is a generated key using a
>      cryptographically secure pseudo-random number generator. It is
>      encrypted using the KEK, probably with Key Wrap (KW):
>      or maybe better Key Wrap with Padding (KWP):
> 
>   3a. Per table data encryption keys (TDEK): use MDEK and HKDF to generate
>       table specific keys.

What is the value of a per-table encryption key?  How is HKDF derived?
Are we still unclear if the 68GB limit is per encryption key or per
encryption key/IV combination?
 
>   3b. WAL data encryption keys (WDEK):  Similarly use MDEK and a HKDF to
>       generate new keys when needed for WAL.
> 
> We store the MDEK in a plain file (say global/pgkey) after it is encrypted
> with the KEK. I might want to store a hash of the KEK passphrase
> in order to verify the correctness of the given passphrase. However we
> don't need to store the TDEK and WDEK as we can derive them as needed. The
> key file can be read by both backend processes and front-end tools.

Yes, we need to verify the pass phrase.
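
As a minimal sketch of the wrapping step (RFC 3394 AES Key Wrap via
OpenSSL; the 32-byte key sizes and the function name are assumptions, not
settled design):

    #include <openssl/evp.h>
    #include <string.h>

    /* Sketch: wrap a 32-byte MDEK under the KEK using AES-256 Key Wrap
     * (RFC 3394).  The wrapped blob is what would be stored in the key
     * file (e.g. global/pgkey); output is input + 8 bytes. */
    static int
    wrap_mdek(const unsigned char kek[32], const unsigned char mdek[32],
              unsigned char wrapped[40])
    {
        EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
        int             len, ok;

        /* Key-wrap ciphers must be explicitly allowed in OpenSSL. */
        EVP_CIPHER_CTX_set_flags(ctx, EVP_CIPHER_CTX_FLAG_WRAP_ALLOW);
        ok = EVP_EncryptInit_ex(ctx, EVP_aes_256_wrap(), NULL, kek, NULL) &&
             EVP_EncryptUpdate(ctx, wrapped, &len, mdek, 32) &&
             EVP_EncryptFinal_ex(ctx, wrapped + len, &len);
        EVP_CIPHER_CTX_free(ctx);
        return ok;
    }

A convenient side effect: RFC 3394 has an integrity check built in, so
unwrapping with a wrong KEK fails cleanly, which by itself verifies the
passphrase-derived KEK.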

> When the postmaster starts up, it reads the key file, decrypts the MDEK and
> derives the WDEK using the key id for the WDEK. The WDEK is loaded into the
> key hash map (keyid -> key) in shared memory. Also we derive the TDEK as
> needed when reading tables or indexes and add it to the key hash map as well
> if not already present.
> 
> Buffer Encryption
> ==============
> We will use AES-CBC for buffer encryption. We will add a key id (4 bytes)

I think we might want to use CTR for this, and will post after this.

> just after the pd_lsn (8 bytes) in PageHeaderData, and we will not encrypt the
> first 16 bytes of each page so the LSN and key id can be used. We can
> store an invalid key id to tell us that the table is not encrypted.
> There are two benefits of storing the key id in the page header: offline tools
> can get the key id (and know whether the table is encrypted or not), and it's
> helpful for online rekeying in the future.

I don't remember anyone suggesting different keys for different tables. 
How would this even be managed by the user?

> I've considered storing the IV and key id in a new fork but I felt that
> it is complex because we will always need to have the fork in shared
> buffers when any page of its main fork is written to disk.
> If almost all of the shared buffers are dirty and their new
> forks are not loaded into shared buffers, we might need to load the
> new fork, write the page to disk and then evict some pages,
> over and over.
> 
> We will use (page lsn, page number) to create a nonce. IVs are created
> by encrypting the nonce with its TDEK.

Agreed.

> WAL Encryption
> =============
> We will use AES-CTR for WAL encryption and encrypt each WAL pages with WDEK.
> 
> We will use WAL segment number to create a nonce. Similar to buffer
> encryption, IVs are created using the nonce and WDEK.

Yes.  If there is concern about collision of table/index and WAL IVs, we
can add a constant to the two uses, as Joe Conway mentioned.
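
A minimal sketch of one possible IV construction along those lines (the
constants, layout, and names are assumptions, shown only to illustrate the
domain separation):

    #include <openssl/evp.h>
    #include <stdint.h>
    #include <string.h>

    #define IV_KIND_REL 0x01    /* domain-separation constants, so a  */
    #define IV_KIND_WAL 0x02    /* relation page and a WAL page never */
                                /* share an IV even at the same LSN   */

    /* Build a 16-byte IV by encrypting a structured nonce with the key
     * (a single AES block in ECB, i.e. a pseudo-random permutation). */
    static void
    make_iv(const unsigned char key[32], uint8_t kind,
            uint64_t lsn_or_segno, uint32_t pageno, unsigned char iv[16])
    {
        unsigned char   nonce[16] = {0};
        int             len;
        EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();

        nonce[0] = kind;
        memcpy(nonce + 4, &lsn_or_segno, 8);
        memcpy(nonce + 12, &pageno, 4);

        EVP_EncryptInit_ex(ctx, EVP_aes_256_ecb(), NULL, key, NULL);
        EVP_CIPHER_CTX_set_padding(ctx, 0);
        EVP_EncryptUpdate(ctx, iv, &len, nonce, 16);
        EVP_CIPHER_CTX_free(ctx);
    }

A heap/index page would use make_iv(tdek, IV_KIND_REL, page_lsn, blkno, iv);
a WAL page would use make_iv(wdek, IV_KIND_WAL, segno, 0, iv).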

> If we want to support enabling or disabling encryption after initdb we
> might want to have key id in the WAL page header.
> 
> Front-end Tool Support
> ==================
> We will add --encryption-key-passphrase-command option to the
> front-end tools that read database files or WAL segment files directly.
> They can get KEK via --encryption-key-passphrase-command and get MDEK
> by reading the key file. Also they can know the key length by checking
> the control file. Since they can derive the TDEK using the key id stored in
> the page header, they can decrypt database files. Similarly, they
> can decrypt WAL as they know the key id of the WDEK.
>
> Master Key Rotation
> ================
> We will support a new command-line tool that rotates the master key
> offline. It accepts an --old-encryption-key-passphrase-command option and
> --new-encryption-key-passphrase-command to get the old KEK and new KEK
> respectively. It decrypts the MDEK with the old key and encrypts it with
> the new key.

That handles changing the passphrase, but what about rotating the
encryption key?  Don't we want to support that, at least in offline
mode?

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Thu, Jul 25, 2019 at 01:18:44PM -0400, Bruce Momjian wrote:
> > Key Management
> > ==============
> > We will use 3-tier key architecture as Joe proposed.
> > 
> >   1. A master key encryption key (KEK): this is the key supplied by the
> >      database admin using something akin to ssl_passphrase_command
> > 
> >   2. A master data encryption key (MDEK): this is a generated key using a
> >      cryptographically secure pseudo-random number generator. It is
> >      encrypted using the KEK, probably with Key Wrap (KW):
> >      or maybe better Key Wrap with Padding (KWP):
> > 
> >   3a. Per table data encryption keys (TDEK): use MDEK and HKDF to generate
> >       table specific keys.
> 
> What is the value of a per-table encryption key?  How is HKDF derived?
> Are we still unclear if the 68GB limit is per encryption key or per
> encryption key/IV combination?

Oh, I see you got this from Joe Conway's email.  Let me reply to that
now.  (I am obviously having problems keeping this thread in my head as
well.)

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Fri, Jul 26, 2019 at 2:18 AM Bruce Momjian <bruce@momjian.us> wrote:
>
> On Thu, Jul 18, 2019 at 12:04:25PM +0900, Masahiko Sawada wrote:
> > I've re-considered the design of the TDE feature based on the discussion
> > so far. One of the main open questions is the granularity of
> > encryption objects: cluster encryption or more-granular-than-cluster
> > encryption. The following describes the new TDE design when we
> > choose table-level encryption or something-new-group-level encryption.
> >
> > General
> > ========
> > We will use AES and support both AES-128 and AES-256. The user can specify
> > a new initdb option, something like --aes-128 or --aes-256, to enable
> > encryption and must specify --encryption-key-passphrase-command along
> > with it. (I guess we also require the openssl library.) If these options are
> > specified, we write the key length to the control file and derive the
> > KEK and generate the MDEK during initdb. wal_log_hints will be enabled
> > automatically in encryption mode, like we do for checksum mode.
>
> Agreed.  pg_control will store the none/AES128/AES256 indicator.
>
> > Key Management
> > ==============
> > We will use 3-tier key architecture as Joe proposed.
> >
> >   1. A master key encryption key (KEK): this is the key supplied by the
> >      database admin using something akin to ssl_passphrase_command
> >
> >   2. A master data encryption key (MDEK): this is a generated key using a
> >      cryptographically secure pseudo-random number generator. It is
> >      encrypted using the KEK, probably with Key Wrap (KW):
> >      or maybe better Key Wrap with Padding (KWP):
> >
> >   3a. Per table data encryption keys (TDEK): use MDEK and HKDF to generate
> >       table specific keys.
>
> What is the value of a per-table encryption key?  How is HKDF derived?

The per-table encryption key is derived from the MDEK with a salt and the
table's OID as info. I think we can store the salts for the encryption keys in a
separate file so that off-line tools can also read them.
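
A minimal sketch of that derivation with OpenSSL's HKDF (SHA-256 and the
exact salt/info layout are assumptions):

    #include <openssl/evp.h>
    #include <openssl/kdf.h>
    #include <stdint.h>

    /* Sketch: TDEK = HKDF-SHA256(key = MDEK, salt = stored salt,
     * info = table OID).  The same MDEK and inputs always re-derive the
     * same TDEK, so only the salt needs to be stored on disk. */
    static int
    derive_tdek(const unsigned char mdek[32],
                const unsigned char *salt, size_t saltlen,
                uint32_t table_oid, unsigned char tdek[32])
    {
        EVP_PKEY_CTX *pctx = EVP_PKEY_CTX_new_id(EVP_PKEY_HKDF, NULL);
        size_t        outlen = 32;
        int           ok;

        ok = EVP_PKEY_derive_init(pctx) > 0 &&
             EVP_PKEY_CTX_set_hkdf_md(pctx, EVP_sha256()) > 0 &&
             EVP_PKEY_CTX_set1_hkdf_key(pctx, mdek, 32) > 0 &&
             EVP_PKEY_CTX_set1_hkdf_salt(pctx, salt, saltlen) > 0 &&
             EVP_PKEY_CTX_add1_hkdf_info(pctx, (unsigned char *) &table_oid,
                                         sizeof(table_oid)) > 0 &&
             EVP_PKEY_derive(pctx, tdek, &outlen) > 0;
        EVP_PKEY_CTX_free(pctx);
        return ok;
    }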

> Are we still unclear if the 68GB limit is per encryption key or per
> encryption key/IV combination?

I think that 68GB refers to key+IV but I'll research that.

>
> >   3b. WAL data encryption keys (WDEK):  Similarly use MDEK and a HKDF to
> >       generate new keys when needed for WAL.
> >
> > We store the MDEK in a plain file (say global/pgkey) after it is encrypted
> > with the KEK. I might want to store a hash of the KEK passphrase
> > in order to verify the correctness of the given passphrase. However we
> > don't need to store the TDEK and WDEK as we can derive them as needed. The
> > key file can be read by both backend processes and front-end tools.
>
> Yes, we need to verify the pass phrase.
>
> > When the postmaster starts up, it reads the key file, decrypts the MDEK and
> > derives the WDEK using the key id for the WDEK. The WDEK is loaded into the
> > key hash map (keyid -> key) in shared memory. Also we derive the TDEK as
> > needed when reading tables or indexes and add it to the key hash map as well
> > if not already present.
> >
> > Buffer Encryption
> > ==============
> > We will use AES-CBC for buffer encryption. We will add a key id (4 bytes)
>
> I think we might want to use CTR for this, and will post after this.
>
> > just after the pd_lsn (8 bytes) in PageHeaderData, and we will not encrypt the
> > first 16 bytes of each page so the LSN and key id can be used. We can
> > store an invalid key id to tell us that the table is not encrypted.
> > There are two benefits of storing the key id in the page header: offline tools
> > can get the key id (and know whether the table is encrypted or not), and it's
> > helpful for online rekeying in the future.
>
> I don't remember anyone suggesting different keys for different tables.
> How would this even be managed by the user?

I think it's still unclear whether we implement one key for the whole
database cluster or different keys for different tables in the first
version. I'm evaluating the performance overhead of the latter, which
you were concerned about, and will share it.

I prefer tablespace-level or something-new-group-level to
table-level, but if we choose the latter we can create a new group of
tables that are encrypted with the same key. That is, the user creates a
group and then associates tables with that group. Tablespace-level is
implemented in the patch I submitted before. Or, it's just an idea, but
another option could be to allow users to create an encryption key object
first and then specify which tables are encrypted with which
encryption key in DDL. For example, the user creates a named encryption
key via a SQL function and creates an encrypted table with CREATE
TABLE ... WITH (encryption_key = 'mykey');.

>
> > I've considered storing the IV and key id in a new fork but I felt that
> > it is complex because we will always need to have the fork in shared
> > buffers when any page of its main fork is written to disk.
> > If almost all of the shared buffers are dirty and their new
> > forks are not loaded into shared buffers, we might need to load the
> > new fork, write the page to disk and then evict some pages,
> > over and over.
> >
> > We will use (page lsn, page number) to create a nonce. IVs are created
> > by encrypting the nonce with its TDEK.
>
> Agreed.
>
> > WAL Encryption
> > =============
> > We will use AES-CTR for WAL encryption and encrypt each WAL pages with WDEK.
> >
> > We will use WAL segment number to create a nonce. Similar to buffer
> > encryption, IVs are created using the nonce and WDEK.
>
> Yes.  If there is concern about collision of table/index and WAL IVs, we
> can add a constant to the two uses, as Joe Conway mentioned.
>
> > If we want to support enabling or disabling encryption after initdb we
> > might want to have key id in the WAL page header.
> >
> > Front-end Tool Support
> > ==================
> > We will add --encryption-key-passphrase-command option to the
> > front-end tools that read database files or WAL segment files directly.
> > They can get KEK via --encryption-key-passphrase-command and get MDEK
> > by reading the key file. Also they can know the key length by checking
> > the control file. Since they can derive the TDEK using the key id stored in
> > the page header, they can decrypt database files. Similarly, they
> > can decrypt WAL as they know the key id of the WDEK.
> >
> > Master Key Rotation
> > ================
> > We will support a new command-line tool that rotates the master key
> > offline. It accepts an --old-encryption-key-passphrase-command option and
> > --new-encryption-key-passphrase-command to get the old KEK and new KEK
> > respectively. It decrypts the MDEK with the old key and encrypts it with
> > the new key.
>
> That handles changing the passphrase, but what about rotating the
> encryption key?  Don't we want to support that, at least in offline
> mode?

Yeah, supporting rotating the encryption key is a good idea. Agreed.

After more thought, it's just an idea, but I wonder if the first
implementation step of TDE for v13 could be the key management module.
That is, (in the 3-tier case) PostgreSQL gets the KEK by passphrase command or
directly, and creates the MDEK. The user can create a named encryption
key using a SQL function, and the key manager derives the DEK and stores
its salt on disk. Also we have an internal interface to get an
encryption key.

The good point is not only that we can develop incrementally, but also that if
PostgreSQL is able to manage (symmetric) encryption keys inside the
database cluster and has interfaces to get and add keys, pgcrypto will
also be able to use it. That way, we can provide column-level TDE
first by a combination of pgcrypto, triggers and views while keeping
encryption keys truly secret. After that we can add other levels of TDE
using the key management module. We would then be able to focus on how
to encrypt buffers and WAL.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



On Sun, Jul 14, 2019 at 12:13:45PM -0400, Joe Conway wrote:
> In my email I linked the wrong page for [2]. The correct one is here:
> [2] https://www.kernel.org/doc/html/latest/filesystems/fscrypt.html
> 
> Following that, I think we could end up with three tiers:
> 
> 1. A master key encryption key (KEK): this is the key supplied by the
>    database admin using something akin to ssl_passphrase_command
> 
> 2. A master data encryption key (MDEK): this is a generated key using a
>    cryptographically secure pseudo-random number generator. It is
>    encrypted using the KEK, probably with Key Wrap (KW):
>    or maybe better Key Wrap with Padding (KWP):
> 
> 3a. Per table data encryption keys (TDEK): use MDEK and HKDF to generate
>     table specific keys.
> 
> 3b. WAL data encryption keys (WDEK):  Similarly use MDEK and a HKDF to
>     generate new keys when needed for WAL (based on the other info we
>     need to change WAL keys every 68 GB unless I read that wrong).
> 
> I believe that would allow us to have multiple keys but they are
> derived securely from the one DEK using available info similar to the
> way we intend to use LSN to derive the IVs -- perhaps table.oid for
> tables and something else for WAL.
> 
> We also need to figure out how/when to generate new WDEK. Maybe every
> checkpoint, also meaning we would have to force a checkpoint every 68GB?

Masahiko Sawada copied this section as a desired direction, so I want to
drill down into it.  I think we have five possible approaches for level
3 listed above.

The simplest approach would be to say that the LSN/page-number and WAL
segment-number used as IV will not collide and we can just use them
directly.

The second approach is to say they will collide and that we need to mix
a constant into the IV for tables/indexes and a different one for WAL. 
In a way I would like to mix the pg_controldata Database system
Identifier into there too, but I am unclear on the value and complexity
involved.

A third approach would be to say that we will have duplicate LSNs
between a table and its index?  Maybe we need three constants, one for
heap, one for indexes, and one for WAL.

A fourth approach would be to say we will have duplicate LSNs on
different heap files, or index files.  We would then modify pg_upgrade to
preserve relfilenode and use that.  (I don't think pg_class.oid is
visible during recovery, and it certainly isn't visible in offline
mode.)

However, we need to be clear that adding relfilenode is only helping to
avoid tables/indexes with the same LSN pages.  It doesn't address the
68GB limit since our tables can be larger than that.

A fifth approach would be to decide that 68GB is the limit for a single
key (not single key/IV combo).  If that is the case we need a different
key for each 68GB of a file, and because we break files into 1GB chunks,
we would just use a different key for each chunk, and I guess store the
keys in the file system, encrypted with the master key.

My big point is that we need to decide where the IV collisions will
happen, and what our encryption limit per key (not per key/IV
combination) is.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Fri, Jul 26, 2019 at 02:54:04AM +0900, Masahiko Sawada wrote:
> On Fri, Jul 26, 2019 at 2:18 AM Bruce Momjian <bruce@momjian.us> wrote:
> >
> > On Thu, Jul 18, 2019 at 12:04:25PM +0900, Masahiko Sawada wrote:
> > > I've re-considered the design of the TDE feature based on the discussion
> > > so far. One of the main open questions is the granularity of
> > > encryption objects: cluster encryption or more-granular-than-cluster
> > > encryption. The following describes the new TDE design when we
> > > choose table-level encryption or something-new-group-level encryption.
> > >
> > > General
> > > ========
> > > We will use AES and support both AES-128 and AES-256. The user can specify
> > > a new initdb option, something like --aes-128 or --aes-256, to enable
> > > encryption and must specify --encryption-key-passphrase-command along
> > > with it. (I guess we also require the openssl library.) If these options are
> > > specified, we write the key length to the control file and derive the
> > > KEK and generate the MDEK during initdb. wal_log_hints will be enabled
> > > automatically in encryption mode, like we do for checksum mode.
> >
> > Agreed.  pg_control will store the none/AES128/AES256 indicator.
> >
> > > Key Management
> > > ==============
> > > We will use 3-tier key architecture as Joe proposed.
> > >
> > >   1. A master key encryption key (KEK): this is the key supplied by the
> > >      database admin using something akin to ssl_passphrase_command
> > >
> > >   2. A master data encryption key (MDEK): this is a generated key using a
> > >      cryptographically secure pseudo-random number generator. It is
> > >      encrypted using the KEK, probably with Key Wrap (KW):
> > >      or maybe better Key Wrap with Padding (KWP):
> > >
> > >   3a. Per table data encryption keys (TDEK): use MDEK and HKDF to generate
> > >       table specific keys.
> >
> > What is the value of a per-table encryption key?  How is HKDF derived?
> 
> The per-table encryption key is derived from the MDEK with a salt and the
> table's OID as info. I think we can store the salts for the encryption keys in a
> separate file so that off-line tools can also read them.

Thanks. I just sent an email with five possible options for this.  I
think relfilenode will be fine --- I am not sure what salt adds to this.

> > Are we still unclear if the 68GB limit is per encryption key or per
> > encryption key/IV combination?
> 
> I think that 68GB refers to key+IV but I'll research that.

Yes, please.  I think we need a definite answer on that question, which
you will see in my later email.

> > I don't remember anyone suggesting different keys for different tables.
> > How would this even be managed by the user?
> 
> I think it's still unclear whether we implement one key for the whole
> database cluster or different keys for different tables in the first
> version. I'm evaluating the performance overhead of the latter, which
> you were concerned about, and will share it.

I am not worried about the performance of this --- if it is not secure with
a single key, it is useless, so we have to do it.  If the single key is
secure, I don't think multiple keys help us.  I think we already
decided that the keys always have to be online for crash recovery, so we
can't allow users to control their keys anyway.

> I prefer tablespace-level or something-new-group-level to
> table-level, but if we choose the latter we can create a new group of
> tables that are encrypted with the same key. That is, the user creates a
> group and then associates tables with that group. Tablespace-level is
> implemented in the patch I submitted before. Or, it's just an idea, but
> another option could be to allow users to create an encryption key object
> first and then specify which tables are encrypted with which
> encryption key in DDL. For example, the user creates a named encryption
> key via a SQL function and creates an encrypted table with CREATE
> TABLE ... WITH (encryption_key = 'mykey');.

That seems very complex so I think we need agreement to go in that
direction, and see what I said above about multiple keys.

> > That handles changing the passphrase, but what about rotating the
> > encryption key?  Don't we want to support that, at least in offline
> > mode?
> 
> Yeah, supporting rotating the encryption key is a good idea. Agreed.
> 
> After more thought, it's just an idea, but I wonder if the first
> implementation step of TDE for v13 could be the key management module.
> That is, (in the 3-tier case) PostgreSQL gets the KEK by passphrase command or
> directly, and creates the MDEK. The user can create a named encryption
> key using a SQL function, and the key manager derives the DEK and stores
> its salt on disk. Also we have an internal interface to get an
> encryption key.
> 
> The good point is not only that we can develop incrementally, but also that if
> PostgreSQL is able to manage (symmetric) encryption keys inside the
> database cluster and has interfaces to get and add keys, pgcrypto will
> also be able to use it. That way, we can provide column-level TDE
> first by a combination of pgcrypto, triggers and views while keeping
> encryption keys truly secret. After that we can add other levels of TDE
> using the key management module. We would then be able to focus on how
> to encrypt buffers and WAL.

Uh, remember, all keys have to be online all the time, so what value
does this add?

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Sat, Jul 20, 2019 at 07:30:30PM +0200, Tomas Vondra wrote:
> Forbid checksums? I don't see how that could be acceptable. We either have
> to accept the limitations of the current design (having to decrypt
> everything before checking the checksums) or change the design.

Yes, checksums certainly have to work.

> I personally think we should do the latter - not just because of this
> "decrypt-then-verify" issue, but consider how much work we've done to
> allow enabling checksums on-line (it's still not there, but it's likely
> doable in PG13). How are we going to do that with encryption? ISTM we
> should design it so that we can enable encryption on-line too - maybe not
> in v1, but it should be possible. So how are we going to do that? With
> checksums it's (fairly) easy because we can "not verify" the page before
> we know all pages have a checksum, but with encryption that's not
> possible. And if the whole page is encrypted, then what?

Well, I assumed we would start with a command-line offline tool to
add/remove encryption, and I assumed the command-line tool pg_checksums
would use the same code to decrypt the page to add/remove checksums and
rewrite it.  I don't think we will ever allow add/remove encryption in
online mode.

Does that help?

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Fri, Jul 19, 2019 at 01:59:41PM +0200, Tomas Vondra wrote:
> On Fri, Jul 19, 2019 at 12:04:36PM +0200, Antonin Houska wrote:
> > We can guarantee integrity and authenticity of backup, but that's a separate
> > feature: someone may need this although it's o.k. for him to run the cluster
> > unencrypted.

> Yes, I do agree with that. I think attempts to guarantee data authenticity
> and/or integrity at the page level is mostly futile (replay attacks are an
> example of why). IMHO we should consider that to be outside the threat
> model TDE is expected to address.

Yes, I think we can say that checksums _help_ detect unauthorized
database changes, and usually detects database corruption, but it isn't
a fully secure solution.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Thu, Jul 25, 2019 at 02:05:05PM -0400, Bruce Momjian wrote:
> The second approach is to say they will collide and that we need to mix
> a constant into the IV for tables/indexes and a different one for WAL. 
> In a way I would like to mix the pg_controldata Database system
> Identifier into there too, but I am unclear on the value and complexity
> involved.
> 
> A third approach would be to say that we will have duplicate LSNs
> between a table and its index?  Maybe we need three constants, one for
> heap, one for indexes, and one for WAL.
> 
> A fourth approach would be to say we will have duplicate LSNs on
> different heap files, or index files.  We would then modify pg_upgrade to
> preserve relfilenode and use that.  (I don't think pg_class.oid is
> visible during recovery, and it certainly isn't visible in offline
> mode.)
> 
> However, we need to be clear that adding relfilenode is only helping to
> avoid tables/indexes with the same LSN pages.  It doesn't address the
> 68GB limit since our tables can be larger than that.
> 
> A fifth approach would be to decide that 68GB is the limit for a single
> key (not single key/IV combo).  If that is the case we need a different
> key for each 68GB of a file, and because we break files into 1GB chunks,
> we would just use a different key for each chunk, and I guess store the
> keys in the file system, encrypted with the master key.
> 
> My big point is that we need to decide where the IV collisions will
> happen, and what our encryption limit per key (not per key/IV
> combination) is.

After talking to Joe Conway, I just want to mention that if we decide
that the LSN is unique among heap and index, or among heap or index, we
will need to make sure future WAL records retain this uniqueness.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Greetings,

* Bruce Momjian (bruce@momjian.us) wrote:
> After talking to Joe Conway, I just want to mention that if we decide
> that the LSN is unique among heap and index, or among heap or index, we
> will need to make sure future WAL records retain this uniqueness.

One thing comes to mind regarding this and I'll admit that I don't quite
remember the details off-hand, but I don't want to leave it unmentioned
now and forget it later.

What about pg_upgrade?

Thanks,

Stephen

On 2019-Jul-15, Bruce Momjian wrote:

> Uh, if someone modifies a few bytes of the page, we will decrypt it, but
> the checksum (per-page or WAL) will not match our decrypted output.  How
> would they make it match the checksum without already knowing the key?
> I read [1] but could not see that explained.
> 
> This post discussed it:
> 
>     https://crypto.stackexchange.com/questions/202/should-we-mac-then-encrypt-or-encrypt-then-mac

I find all the discussion downthread from this post pretty confusing.
Why are we encrypting the page header in the first place?  It seems to
me that the encrypted area should cover only the line pointers and the
tuple data area; the page header needs to be unencrypted so that it can
be used at all: firstly because you need to obtain the LSN from it in
order to compute the IV, and secondly because the checksum must be
validated *before* decrypting (per Moxie Marlinspike's "cryptographic
doom" principle mentioned in a comment in the SE question).

I am not totally clear on whether the special space and the "page hole"
need to be encrypted.  I tend to think that they should *not* be
encrypted; in particular, encrypting a large area containing zeroes seems
a plentiful source of known cleartext, which seems a bad thing.  Special
space also seems to contain known cleartext; maybe not as much as the
page hole, but still seems better avoided.

Given this, it seems to me that we should first encrypt those two data
areas, and *then* compute the CRC on the complete page just like we do
today ... and the result is stored in an unencrypted area (the page
header) and so it doesn't affect the encryption.

The checksum we currently have is not cryptographically secure -- it's
not a crypto-strong signature.  If we want that, we need some further
protection.  Maybe for encrypted tables we replace our current checksum
with a cryptographically secure signature ...?  Pretty sure 16 bits are
insufficient for that, but I suppose we would just use a different page
header with room for a proper sig.

Am I misunderstanding something?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



On Thu, Jul 25, 2019 at 03:41:05PM -0400, Stephen Frost wrote:
> Greetings,
> 
> * Bruce Momjian (bruce@momjian.us) wrote:
> > After talking to Joe Conway, I just want to mention that if we decide
> > that the LSN is unique among heap and index, or among heap or index, we
> > will need to make sure future WAL records retain this uniqueness.
> 
> One thing comes to mind regarding this and I'll admit that I don't quite
> remember exactly off-hand but I also don't want to not mention it now
> and forget to later.
> 
> What about pg_upgrade?

So, we don't carry WAL from the old cluster to the new cluster, so if
the WAL is changed and had duplicates, it would only be new WAL records.
pg_upgrade seems immune to most of this, and that is by design. 
However, I am hesitant to change the heap/index page format for
encryption because if we add fields, old pages might not fit as
encrypted pages, and then you have to move rows around, and things
become _much_ more complicated.

I don't see any other pg_upgrade issues, unless someone else does.  Oh,
we will have to check pg_control for a matching encryption format.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Greetings,

* Bruce Momjian (bruce@momjian.us) wrote:
> On Thu, Jul 25, 2019 at 03:41:05PM -0400, Stephen Frost wrote:
> > Greetings,
> >
> > * Bruce Momjian (bruce@momjian.us) wrote:
> > > After talking to Joe Conway, I just want to mention that if we decide
> > > that the LSN is unique among heap and index, or among heap or index, we
> > > will need to make sure future WAL records retain this uniqueness.
> >
> > One thing comes to mind regarding this and I'll admit that I don't quite
> > remember exactly off-hand but I also don't want to not mention it now
> > and forget to later.
> >
> > What about pg_upgrade?
>
> So, we don't carry WAL from the old cluster to the new cluster, so if
> the WAL is changed and had duplicates, it would only be new WAL records.

Right, we don't carry it forward- but what I couldn't remember is if
we start from more-or-less LSN 0 or if pg_upgrade will arrange it such that
the new major version will start from LSN-of-old+1 (or whatever).  Seems
like it'd *have* to be the latter, but just thought of it and wanted to
make sure.

> pg_upgrade seems immune to most of this, and that is by design.
> However, I am hesitant to change the heap/index page format for
> encryption because if we add fields, old pages might not fit as
> encrypted pages, and then you have to move rows around, and things
> become _much_ more complicated.

Yeah, I'm afraid we are going to have a hard time making this work
without changing the page format for encrypted..  I don't know if that's
actually a *huge* issue like we've considered it to be in the past or
not, as making someone rewrite just the few sensitive tables in their
environment might not be that bad, and there's also logical replication
today..

> I don't see any other pg_upgrade issues, unless someone else does.  Oh,
> we will have to check pg_control for a matching encryption format.

Yes, certainly it'd need to be updated for at least that, when upgrading
an encrypted cluster.

Thanks!

Stephen

On Thu, Jul 25, 2019 at 03:43:34PM -0400, Alvaro Herrera wrote:
> On 2019-Jul-15, Bruce Momjian wrote:
> 
> > Uh, if someone modifies a few bytes of the page, we will decrypt it, but
> > the checksum (per-page or WAL) will not match our decrypted output.  How
> > would they make it match the checksum without already knowing the key?
> > I read [1] but could not see that explained.
> > 
> > This post discussed it:
> > 
> >     https://crypto.stackexchange.com/questions/202/should-we-mac-then-encrypt-or-encrypt-then-mac
> 
> I find all the discussion downthread from this post pretty confusing.

Agreed.

> Why are we encrypting the page header in the first place?  It seems to
> me that the encrypted area should cover only the line pointers and the
> tuple data area; the page header needs to be unencrypted so that it can
> be used at all: firstly because you need to obtain the LSN from it in

Yes, the plan was to not encrypt the first 16 bytes so the LSN was visible.

> order to compute the IV, and secondly because the checksum must be
> validated *before* decrypting (per Moxie Marlinspike's "cryptographic
> doom" principle mentioned in a comment in the SE question).

Uh, I think we are still on the fence about writing the checksum _after_
encryption, but I think we are leaning against that, meaning online or
offline encryption must be able to decrypt the page.  Since we will
already need an offline tool to enable/remove encryption anyway, it
seems we can just reuse that code for pg_checksums.

I think we have three options for the CRC:

1.  compute the CRC and then encrypt everything

2.  encrypt and then CRC, and store the CRC unchanged

3.  encrypt and then CRC, and store the CRC encrypted

The only way offline tools can verify the CRC without access to the keys
is via #2, but #2 gives us _no_ detection of tampering.  I realize the
CRC tampering detection of #1 and #3 is not great, but it certainly has
some value.
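
To make the orderings concrete, here is a minimal sketch; encrypt_page(),
store_crc(), store_crc_encrypted(), and compute_crc() are hypothetical
stand-ins for the real page encryption and checksum routines, not
anything that exists today:

    #include <stdint.h>

    /* Hypothetical helpers standing in for the real routines. */
    extern void     encrypt_page(char *page, const unsigned char *key,
                                 const unsigned char *iv);
    extern uint16_t compute_crc(const char *page);
    extern void     store_crc(char *page, uint16_t crc);
    extern void     store_crc_encrypted(char *page, uint16_t crc,
                                        const unsigned char *key);

    static void
    option_1(char *page, const unsigned char *key, const unsigned char *iv)
    {
        /* CRC first, then encrypt everything: tampering is detectable,
         * but offline tools need the key to verify the CRC. */
        store_crc(page, compute_crc(page));
        encrypt_page(page, key, iv);
    }

    static void
    option_2(char *page, const unsigned char *key, const unsigned char *iv)
    {
        /* Encrypt, then CRC over the ciphertext, stored in the clear:
         * verifiable without the key, but an attacker can recompute the
         * CRC after tampering, so no tamper detection. */
        encrypt_page(page, key, iv);
        store_crc(page, compute_crc(page));
    }

    static void
    option_3(char *page, const unsigned char *key, const unsigned char *iv)
    {
        /* Encrypt, CRC the ciphertext, store the CRC itself encrypted:
         * tamper detection returns, but offline verification again
         * requires the key. */
        encrypt_page(page, key, iv);
        store_crc_encrypted(page, compute_crc(page), key);
    }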

> I am not totally clear on whether the special space and the "page hole"
> need to be encrypted.  I tend to think that they should *not* be
> encrypted; in particular, encrypting a large area containing zeroes seem
> a plentiful source of known cleartext, which seems a bad thing.  Special
> space also seems to contain known cleartext; maybe not as much as the
> page hole, but still seems better avoided.

Uh, there are no known attacks on AES using known plaintext (SSL, e.g.,
uses AES), so I think we are good with encrypting everything after the
first 16 bytes.

> Given this, it seems to me that we should first encrypt those two data
> areas, and *then* compute the CRC on the complete page just like we do
> today ... and the result is stored in an unencrypted area (the page
> header) and so it doesn't affect the encryption.

Yes, that is a possibility.

> The checksum we currently have is not cryptographically secure -- it's
> not a crypto-strong signature.  If we want that, we need some further
> protection.  Maybe for encrypted tables we replace our current checksum
> with a cryptographically secure signature ...?  Pretty sure 16 bits are
> insufficient for that, but I suppose we would just use a different page
> header with room for a proper sig.

Yes, the checksum is more best-effort than fully secure, but replay of
pages makes a fully secure solution hard anyway.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Thu, Jul 25, 2019 at 03:55:01PM -0400, Stephen Frost wrote:
> Greetings,
> 
> * Bruce Momjian (bruce@momjian.us) wrote:
> > On Thu, Jul 25, 2019 at 03:41:05PM -0400, Stephen Frost wrote:
> > > Greetings,
> > > 
> > > * Bruce Momjian (bruce@momjian.us) wrote:
> > > > After talking to Joe Conway, I just want to mention that if we decide
> > > > that the LSN is unique among heap and index, or among heap or index, we
> > > > will need to make sure future WAL records retain this uniqueness.
> > > 
> > > One thing comes to mind regarding this and I'll admit that I don't quite
> > > remember exactly off-hand but I also don't want to not mention it now
> > > and forget to later.
> > > 
> > > What about pg_upgrade?
> > 
> > So, we don't carry WAL from the old cluster to the new cluster, so if
> > the WAL is changed and had duplicates, it would only be new WAL records.
> 
> Right, we don't carry it forward- but what I couldn't remember is if
> we start from more-or-less LSN 0 or if pg_upgrade will arrange it such that
> the new major version will start from LSN-of-old+1 (or whatever).  Seems
> like it'd *have* to be the latter, but just thought of it and wanted to
> make sure.

pg_upgrade uses pg_resetwal -l to set the next WAL segment file based on
the value in the old cluster:

    /* now reset the wal archives in the new cluster */
    prep_status("Resetting WAL archives");
    exec_prog(UTILITY_LOG_FILE, NULL, true, true,
    /* use timeline 1 to match controldata and no WAL history file */
-->           "\"%s/pg_resetwal\" -l 00000001%s \"%s\"", new_cluster.bindir,
              old_cluster.controldata.nextxlogfile + 8,
              new_cluster.pgdata);

> > pg_upgrade seems immune to most of this, and that is by design. 
> > However, I am hesitant to change the heap/index page format for
> > encryption because if we add fields, old pages might not fit as
> > encrypted pages, and then you have to move rows around, and things
> > become _much_ more complicated.
> 
> Yeah, I'm afraid we are going to have a hard time making this work
> without changing the page format for encrypted..  I don't know if that's
> actually a *huge* issue like we've considered it to be in the past or
> not, as making someone rewrite just the few sensitive tables in their
> environment might not be that bad, and there's also logical replication
> today..

It is hard to do that while the server is offline.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Mon, Jul 15, 2019 at 06:08:28PM -0400, Sehrope Sarkuni wrote:
> Hi,
> 
> Some more thoughts on CBC vs CTR modes. There are a number of
> advantages to using CTR mode for page encryption.
> 
> CTR encryption modes can be fully parallelized, whereas CBC can only
> parallelized for decryption. While both can use AES specific hardware
> such as AES-NI, CTR modes can go a step further and use vectorized
> instructions.
> 
> On an i7-8559U (with AES-NI) I get a 4x speed improvement for
> CTR-based modes vs CBC when run on 8K of data:
> 
> # openssl speed -evp ${cipher}
> type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
> aes-128-cbc    1024361.51k  1521249.60k  1562033.41k  1571663.87k  1574537.90k  1575512.75k
> aes-128-ctr     696866.85k  2214441.86k  4364903.85k  5896221.35k  6559735.81k  6619594.75k
> aes-128-gcm     642758.92k  1638619.09k  3212068.27k  5085193.22k  6366035.97k  6474006.53k
> aes-256-cbc     940906.25k  1114628.44k  1131255.13k  1138385.92k  1140258.13k  1143592.28k
> aes-256-ctr     582161.82k  1896409.32k  3216926.12k  4249708.20k  4680299.86k  4706375.00k
> aes-256-gcm     553513.89k  1532556.16k  2705510.57k  3931744.94k  4615812.44k  4673093.63k

I am back to this email now.  I think there is a strong case that we
should use CTR mode for both WAL and heap/index files because CTR mode
is faster.  CBC mode has the advantage of being more immune to IV
duplication, but I think the fact that the page format is similar enough
among pages means we don't gain a lot from that, and therefore IV
uniqueness must be closely honored anyway.
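
For reference, encrypting one page with AES-128-CTR through OpenSSL's
EVP interface could look roughly like the sketch below; the key/IV
handling is a placeholder, and skipping the unencrypted first 16 bytes
of the page is omitted for brevity:

    #include <openssl/evp.h>
    #include <string.h>

    /* Encrypt an 8kB page in place with AES-128-CTR.  key is 16 bytes;
     * iv is the 16-byte per-page IV discussed in this thread (LSN, page
     * number, counter).  Returns 0 on success, -1 on failure. */
    static int
    encrypt_page_ctr(unsigned char *page, int page_len,
                     const unsigned char *key, const unsigned char *iv)
    {
        EVP_CIPHER_CTX *ctx;
        unsigned char   buf[8192];
        int             len;
        int             rc = -1;

        if (page_len > (int) sizeof(buf))
            return -1;
        if ((ctx = EVP_CIPHER_CTX_new()) == NULL)
            return -1;
        if (EVP_EncryptInit_ex(ctx, EVP_aes_128_ctr(), NULL, key, iv) == 1 &&
            EVP_EncryptUpdate(ctx, buf, &len, page, page_len) == 1 &&
            EVP_EncryptFinal_ex(ctx, buf + len, &len) == 1)
        {
            memcpy(page, buf, page_len);    /* CTR output length == input */
            rc = 0;
        }
        EVP_CIPHER_CTX_free(ctx);
        return rc;
    }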

> For relation data where the encryption is going to be per page,
> there's flexibility in how the CTR nonce (IV + counter) is generated.
> With an 8K page, the counter need only go up to 512 for each page
> (8192-bytes per page / 16-bytes per AES-block). That would require
> 9-bits for the counter. Rounding that up to 16-bits allows for wider
> pages and it still uses only two bytes of the counter while ensuring
> that it'd be unique per AES-block. The remaining 14-bytes would be
> populated with some other data that is guaranteed unique per
> page-write to allow encryption via the same per-relation-file derived
> key. From what I gather, the LSN is a candidate though it'd have to be
> stored in plaintext for decryption.

Yes, the LSN is 8 bytes, and the page number is 4 bytes.  That leaves
four bytes for the counter.

> What's important is that writing the two pages (either different
> locations or the same page back again) never reuses the same nonce
> with the same key. Using the same nonce with a different key is fine.
> 
> With any of these schemes the same inputs will generate the same
> outputs. With CTR mode for WAL this would be an issue if the same key
> and deterministic nonce (ex: LSN + offset) is reused in multiple
> places. That does not have to be the same cluster either. For example
> if two replicas are promoted from the same backup with the same master
> key, they would generate the same WAL CTR stream, reusing the
> key/nonce pair. Ditto for starting off with a master key and deriving
> per-relation keys in a cloned installation off some deterministic
> attribute such as oid.

I think we need to document that sharing keys among clusters (except
for identical replicas) is insecure.

We can add the "Database system identifier" into the IV, which would
avoid the problem of two clusters using the same key, but it wouldn't
avoid the problem of promoting two replicas to primaries, because they
would have the same "Database system identifier", so I think it is just
simpler to say "don't do that".

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Greetings,

* Bruce Momjian (bruce@momjian.us) wrote:
> On Thu, Jul 25, 2019 at 03:55:01PM -0400, Stephen Frost wrote:
> > * Bruce Momjian (bruce@momjian.us) wrote:
> > > On Thu, Jul 25, 2019 at 03:41:05PM -0400, Stephen Frost wrote:
> > > > * Bruce Momjian (bruce@momjian.us) wrote:
> > > > > After talking to Joe Conway, I just want to mention that if we decide
> > > > > that the LSN is unique among heap and index, or among heap or index, we
> > > > > will need to make sure future WAL records retain this uniqueness.
> > > >
> > > > One thing comes to mind regarding this and I'll admit that I don't quite
> > > > remember exactly off-hand but I also don't want to not mention it now
> > > > and forget to later.
> > > >
> > > > What about pg_upgrade?
> > >
> > > So, we don't carry WAL from the old cluster to the new cluster, so if
> > > the WAL is changed and had duplicates, it would only be new WAL records.
> >
> > Right, we don't carry it forward- but what I couldn't remember is if
> > we start from more-or-less LSN 0 or if pg_upgrade will arrange it such that
> > the new major version will start from LSN-of-old+1 (or whatever).  Seems
> > like it'd *have* to be the latter, but just thought of it and wanted to
> > make sure.
>
> pg_upgrade uses pg_resetwal -l to set the next WAL segment file based on
> the value in the old cluster:
>
>     /* now reset the wal archives in the new cluster */
>     prep_status("Resetting WAL archives");
>     exec_prog(UTILITY_LOG_FILE, NULL, true, true,
>     /* use timeline 1 to match controldata and no WAL history file */
> -->           "\"%s/pg_resetwal\" -l 00000001%s \"%s\"", new_cluster.bindir,
>               old_cluster.controldata.nextxlogfile + 8,
>               new_cluster.pgdata);

Ah, right, ok, we reset the timeline but not the LSN, ok.

> > > pg_upgrade seems immune to most of this, and that is by design.
> > > However, I am hesitant to change the heap/index page format for
> > > encryption because if we add fields, old pages might not fit as
> > > encrypted pages, and then you have to move rows around, and things
> > > become _much_ more complicated.
> >
> > Yeah, I'm afraid we are going to have a hard time making this work
> > without changing the page format for encrypted..  I don't know if that's
> > actually a *huge* issue like we've considered it to be in the past or
> > not, as making someone rewrite just the few sensitive tables in their
> > environment might not be that bad, and there's also logical replication
> > today..
>
> It is hard to do that while the server is offline.

I don't see any reason to assume we must only support encrypting
individual tables when the server is offline, or that we have to support
any option which involves the server being offline when it comes to
doing encryption.

I'm not against supporting a "shut down the server and then encrypt
everything and then start it up" option, but I don't see any
particularly good reason to explicitly design the system with that
use-case in mind.

There seems to be a strong thrust on this thread to assume that a
database MUST go from ALL DECRYPTED to ALL ENCRYPTED in one shot (and
therefore we have to shut down the server to do it), but I don't get why
that's the case, particularly if we support any kind of mixed setup
where there's some data that's encrypted and some that isn't, and since
we're talking about using different keys for different things, it seems
to me that we almost certainly should be able to easily say "well,
there's no key for this, so just don't go through the decrypt/encrypt
routines".

I can see an argument for why we might need to go through a restart and
possibly use some off-line tool when a user decides they want, say, the
WAL, or CLOG, or the other control files, to be encrypted, or basic
encryption capability to be set up for the cluster (like generating the
master key and storing it or making some changes to the control file to
indicate that some things in this cluster has encrypted bits), but
saying we must have the server offline to support encrypted tables that
have a different page format is a bit like saying we need to take the
server offline to add a new tableam (like zheap) and that we have to use
some off-line utility while the server is down to convert a given table
from heapam to zheap, isn't it?

Thanks,

Stephen

On Thu, Jul 25, 2019 at 05:50:57PM -0400, Stephen Frost wrote:
> > > > pg_upgrade seems immune to most of this, and that is by design. 
> > > > However, I am hesitant to change the heap/index page format for
> > > > encryption because if we add fields, old pages might not fit as
> > > > encrypted pages, and then you have to move rows around, and things
> > > > become _much_ more complicated.
> > > 
> > > Yeah, I'm afraid we are going to have a hard time making this work
> > > without changing the page format for encrypted..  I don't know if that's
> > > actually a *huge* issue like we've considered it to be in the past or
> > > not, as making someone rewrite just the few sensitive tables in their
> > > environment might not be that bad, and there's also logical replication
> > > today..
> > 
> > It is hard to do that while the server is offline.
> 
> I don't see any reason to assume we must only support encrypting
> individual tables when the server is offline, or that we have to support
> any option which involves the server being offline when it comes to
> doing encryption.
> 
> I'm not against supporting a "shut down the server and then encrypt
> everything and then start it up" option, but I don't see any
> particularly good reason to explicitly design the system with that
> use-case in mind.

You are right that we can allow it online, but we haven't been
discussing these cases since it is easy to do this because we have
access to the keys.  I do think whatever code we use for checksum online
changes will be used for encryption online changes.  We would need a
per-page bit to indicate encryption, hopefully in the first 16 bytes.

> There seems to be a strong thrust on this thread to assume that a
> database MUST go from ALL DECRYPTED to ALL ENCRYPTED in one shot (and
> therefore we have to shut down the server to do it), but I don't get why
> that's the case, particularly if we support any kind of mixed setup
> where there's some data that's encrypted and some that isn't, and since
> we're talking about using different keys for different things, it seems
> to me that we almost certainly should be able to easily say "well,
> there's no key for this, so just don't go through the decrypt/encrypt
> routines".

No, we can't easily do different keys for different things since all the
keys have to be online for crash recovery, so there isn't much value to
having different keys since they always have to be online.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Greetings,

* Bruce Momjian (bruce@momjian.us) wrote:
> On Thu, Jul 25, 2019 at 05:50:57PM -0400, Stephen Frost wrote:
> > > > > pg_upgrade seems immune to most of this, and that is by design.
> > > > > However, I am hesitant to change the heap/index page format for
> > > > > encryption because if we add fields, old pages might not fit as
> > > > > encrypted pages, and then you have to move rows around, and things
> > > > > become _much_ more complicated.
> > > >
> > > > Yeah, I'm afraid we are going to have a hard time making this work
> > > > without changing the page format for encrypted..  I don't know if that's
> > > > actually a *huge* issue like we've considered it to be in the past or
> > > > not, as making someone rewrite just the few sensitive tables in their
> > > > environment might not be that bad, and there's also logical replication
> > > > today..
> > >
> > > It is hard to do that while the server is offline.
> >
> > I don't see any reason to assume we must only support encrypting
> > individual tables when the server is offline, or that we have to support
> > any option which involves the server being offline when it comes to
> > doing encryption.
> >
> > I'm not against supporting a "shut down the server and then encrypt
> > everything and then start it up" option, but I don't see any
> > particularly good reason to explicitly design the system with that
> > use-case in mind.
>
> You are right that we can allow it online, but we haven't been
> discussing these cases since it is easy to do this because we have
> access to the keys.  I do think whatever code we use for checksum online
> changes will be used for encryption online changes.  We would need a
> per-page bit to indicate encryption, hopefully in the first 16 bytes.

Arranging to have an individual table move from being plain to
encrypted is something that would be nice to support in an online and
non-blocking manner, but *that* is a bunch of additional work that we
don't need to do today as opposed to being something that's just part of
the initial design.  Sure, it might use functions/capabilities that
pg_checksums also use, but I don't know that we need to think about the
code sharing there being much more than that, just that those
capabilities should be built out in a way that they can be used for
multiple things (and based on what I saw, that looks like it's exactly
how that code was being written already).

> > There seems to be a strong thrust on this thread to assume that a
> > database MUST go from ALL DECRYPTED to ALL ENCRYPTED in one shot (and
> > therefore we have to shut down the server to do it), but I don't get why
> > that's the case, particularly if we support any kind of mixed setup
> > where there's some data that's encrypted and some that isn't, and since
> > we're talking about using different keys for different things, it seems
> > to me that we almost certainly should be able to easily say "well,
> > there's no key for this, so just don't go through the decrypt/encrypt
> > routines".
>
> No, we can't easily do different keys for different things since all the
> keys have to be online for crash recovery, so there isn't much value to
> having different keys since they always have to be online.

Wasn't this already discussed?  We should have a master key and then
additional keys for different tables, et al, which are encrypted with
the master key.  Joe, I believe, covered all this quite well.

Either way though, I don't think it really goes against the point that I
was trying to make- we should be able to figure out if a table is
encrypted or not based on some information that we arrange to have
available during crash recovery and online processing, and the absence
of such should allow us to skip the encryption/decryption routines and
work with the table as we do today.  We should be thinking about
migrating from a completely unencrypted database to a database which has
all the 'core' bits encrypted (possibly as part of pg_upgrade or through
an offline tool of some kind) but the user data not encrypted, and then
online allow users to create new tables which are encrypted (maybe by
putting them in a particular tablespace or as a single table) and then
operate with those tables just like they would any other table in the
system, and let them manage how they move their sensitive data from some
other table into the encrypted table (or from another system into the
encrypted table).

Thanks,

Stephen

On Thu, Jul 25, 2019 at 02:05:05PM -0400, Bruce Momjian wrote:
> Masahiko Sawada copied this section as a desired direction, so I want to
> drill down into it.  I think we have five possible approaches for level
> 3 listed above.
> 
> The simplest approach would be to say that the LSN/page-number and WAL
> segment-number used as IV will not collide and we can just use them
> directly.

Looking at the bits we have, the IV for AES is 16 bytes.  Since we know
we have to use LSN (to change the IV for each page write), and the page
number (so WAL updates that change multiple pages with the same LSN use
different IVs), that uses 12 bytes:

    LSN         8 bytes
    page-number 4 bytes

That leaves 4 bytes unused.  If we use CTR, we need 11 bits for the
counter to support 32k pages sizes (per Sehrope Sarkuni), and we can use
the remaining 5 bits as constants to indicate heap, index, or WAL. 
(Technically, since we are not encrypting the first 16 bytes, we could
use one less bit for the counter.)  If we also use relfilenode, that is
4 bytes, so we have no bits for the heap/index/WAL constant, and no
space for the CTR counter, meaning we would have to use CBC mode.
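
As a strawman, that byte budget could be packed like this (the struct
name and the exact bit split are hypothetical, not a settled layout):

    #include <stdint.h>

    /* Hypothetical 16-byte AES-CTR IV layout; field split is illustrative. */
    typedef struct PageIV
    {
        uint64_t    lsn;        /* 8 bytes: page LSN, new on every page write */
        uint32_t    pageno;     /* 4 bytes: block number within the relation */
        uint32_t    extra;      /* 4 bytes: 5 high bits as a heap/index/WAL
                                 * constant, low 11 bits as the AES-CTR block
                                 * counter (enough for 32kB pages) */
    } PageIV;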

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Thu, Jul 25, 2019 at 07:41:14PM -0400, Stephen Frost wrote:
> > You are right that we can allow it online, but we haven't been
> > discussing these cases since it is easy to do this because we have
> > access to the keys.  I do think whatever code we use for checksum online
> > changes will be used for encryption online changes.  We would need a
> > per-page bit to indicate encryption, hopefully in the first 16 bytes.
> 
> Arranging to have an individual table move from being plain to
> encrypted is something that would be nice to support in an online and
> non-blocking manner, but *that* is a bunch of additional work that we
> don't need to do today as opposed to being something that's just part of
> the initial design.  Sure, it might use functions/capabilities that
> pg_checksums also use, but I don't know that we need to think about the
> code sharing there being much more than that, just that those
> capabilities should be built out in a way that they can be used for
> multiple things (and based on what I saw, that looks like it's exactly
> how that code was being written already).

Yes, we need to see how we are going to do that for checksums and
encryption and come up with a plan.

> > > There seems to be a strong thrust on this thread to assume that a
> > > database MUST go from ALL DECRYPTED to ALL ENCRYPTED in one shot (and
> > > therefore we have to shut down the server to do it), but I don't get why
> > > that's the case, particularly if we support any kind of mixed setup
> > > where there's some data that's encrypted and some that isn't, and since
> > > we're talking about using different keys for different things, it seems
> > > to me that we almost certainly should be able to easily say "well,
> > > there's no key for this, so just don't go through the decrypt/encrypt
> > > routines".
> > 
> > No, we can't easily do different keys for different things since all the
> > keys have to be online for crash recovery, so there isn't much value to
> > having different keys since they always have to be online.
> 
> Wasn't this already discussed?  We should have a master key and then
> additional keys for different tables, et al, which are encrypted with
> the master key.  Joe, I believe, covered all this quite well.

Yes, I am disagreeing with that.  I posted a 5-option email that went
over the issue and explored the value in different keys.  I am still
unclear of the benefit since it doesn't fix the 68GB limit unless we do
it per 1GB file, and we don't even know if that limit is per key or per
key/IV combo.  We can't move ahead until we decide that.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Greetings,

* Bruce Momjian (bruce@momjian.us) wrote:
> On Thu, Jul 25, 2019 at 07:41:14PM -0400, Stephen Frost wrote:
> > > You are right that we can allow it online, but we haven't been
> > > discussing these cases since it is easy to do this because we have
> > > access to the keys.  I do think whatever code we use for checksum online
> > > changes will be used for encryption online changes.  We would need a
> > > per-page bit to indicate encryption, hopefully in the first 16 bytes.
> >
> > Arranging to have an individual table move from being plain to
> > encrypted is something that would be nice to support in an online and
> > non-blocking manner, but *that* is a bunch of additional work that we
> > don't need to do today as opposed to being something that's just part of
> > the initial design.  Sure, it might use functions/capabilities that
> > pg_checksums also use, but I don't know that we need to think about the
> > code sharing there being much more than that, just that those
> > capabilities should be built out in a way that they can be used for
> > multiple things (and based on what I saw, that looks like it's exactly
> > how that code was being written already).
>
> Yes, we need to see how we are going to do that for checksums and
> encryption and come up with a plan.

This is already being done though- Andres has a patch posted already and
my recollection from a quick look at that is that it should work just
fine for enabling checksums as well as potentially moving a table to be
encrypted online- the main common bit being that we need a way to say
"OK, everything has been done but we need to flip this flag and make
sure that everyone knows that this is now all checksum'd or all
encrypted".  The only thing there that I'm not sure about is that when
it comes to checksums, I believe the idea is that it's cluster-wide,
while with encryption that would only be true if we were trying to do
something like move the entire cluster from unencrypted to encrypted in
an online fashion (including WAL, CLOG, et al...) and if that's the case
then there's a bunch of other complicated bits, I believe, that we'd
have to work out, and I don't really think it's necessary or sensible to
worry about that right now.  Those are problems we don't currently have
with checksums either- the WAL already has them and I don't think
anyone's trying to address the fact that other rather core pieces of
the system don't currently.

> > > > There seems to be a strong thrust on this thread to assume that a
> > > > database MUST go from ALL DECRYPTED to ALL ENCRYPTED in one shot (and
> > > > therefore we have to shut down the server to do it), but I don't get why
> > > > that's the case, particularly if we support any kind of mixed setup
> > > > where there's some data that's encrypted and some that isn't, and since
> > > > we're talking about using different keys for different things, it seems
> > > > to me that we almost certainly should be able to easily say "well,
> > > > there's no key for this, so just don't go through the decrypt/encrypt
> > > > routines".
> > >
> > > No, we can't easily do different keys for different things since all the
> > > keys have to be online for crash recovery, so there isn't much value to
> > > having different keys since they always have to be online.
> >
> > Wasn't this already discussed?  We should have a master key and then
> > additional keys for different tables, et al, which are encrypted with
> > the master key.  Joe, I believe, covered all this quite well.
>
> Yes, I am disagreeing with that.  I posted a 5-option email that went
> over the issue and explored the value in different keys.  I am still
> unclear of the benefit since it doesn't fix the 68GB limit unless we do
> it per 1GB file, and we don't even know if that limit is per key or per
> key/IV combo.  We can't move ahead until we decide that.

I understand the 68GB limit that you're referring to to be per key/IV
combo, which means that a key per relation should be just fine.

Even if it was per key, and it means having a key per 1GB file,
that wouldn't change the point that I was making, so I'm not entirely
sure why it's being mentioned in this context.

I disagree with any approach that lacks a master key with additional
sub-keys, if that helps clarify things.

Thanks,

Stephen

On Thu, Jul 25, 2019 at 08:07:28PM -0400, Stephen Frost wrote:
> > Yes, we need to see how we are going to do that for checksums and
> > encryption and come up with a plan.
> 
> This is already being done though- Andres has a patch posted already and
> my recollection from a quick look at that is that it should work just
> fine for enabling checksums as well as potentially moving a table to be
> encrypted online- the main common bit being that we need a way to say
> "OK, everything has been done but we need to flip this flag and make
> sure that everyone knows that this is now all checksum'd or all
> encrypted".  The only thing there that I'm not sure about is that when
> it comes to checksums, I believe the idea is that it's cluster-wide,
> while with encryption that would only be true if we were trying to do
> something like move the entire cluster from unencrypted to encrypted in
> an online fashion (including WAL, CLOG, et al...) and if that's the case
> then there's a bunch of other complicated bits, I believe, that we'd
> have to work out, and I don't really think it's necessary or sensible to
> worry about that right now.  Those are problems we don't currently have
> with checksums either- the WAL already has them and I don't think
> anyone's trying to address the fact that other rather core pieces of
> the system don't currently.

OK,

> > > > > There seems to be a strong thrust on this thread to assume that a
> > > > > database MUST go from ALL DECRYPTED to ALL ENCRYPTED in one shot (and
> > > > > therefore we have to shut down the server to do it), but I don't get why
> > > > > that's the case, particularly if we support any kind of mixed setup
> > > > > where there's some data that's encrypted and some that isn't, and since
> > > > > we're talking about using different keys for different things, it seems
> > > > > to me that we almost certainly should be able to easily say "well,
> > > > > there's no key for this, so just don't go through the decrypt/encrypt
> > > > > routines".
> > > > 
> > > > No, we can't easily do different keys for different things since all the
> > > > keys have to be online for crash recovery, so there isn't much value to
> > > > having different keys since they always have to be online.
> > > 
> > > Wasn't this already discussed?  We should have a master key and then
> > > additional keys for different tables, et al, which are encrypted with
> > > the master key.  Joe, I believe, covered all this quite well.
> > 
> > Yes, I am disagreeing with that.  I posted a 5-option email that went
> > over the issue and explored the value in different keys.  I am still
> > unclear of the benefit since it doesn't fix the 68GB limit unless we do
> > it per 1GB file, and we don't even know if that limit is per key or per
> > key/IV combo.  We can't move ahead until we decide that.
> 
> I understand the 68GB limit that you're referring to to be per key/IV
> combo, which means that a key per relation should be just fine.

Yes, that is what I thought too.

> Even if it was per key, and it means having a key per 1GB file,
> that wouldn't change the point that I was making, so I'm not entirely
> sure why it's being mentioned in this context.

Because I thought we would use a single key for the entire cluster
(heap/index/WAL), and only use another key for encryption key rotation.

> I disagree with any approach that lacks a master key with additional
> sub-keys, if that helps clarify things.

We all know we need a passphrase that unlocks an encryption key.  Are
you saying we need per-object/table keys?  Why, other than for key
rotation?

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Thu, Jul 25, 2019 at 7:51 PM Bruce Momjian <bruce@momjian.us> wrote:
Looking at the bits we have, the IV for AES is 16 bytes.  Since we know
we have to use LSN (to change the IV for each page write), and the page
number (so WAL updates that change multiple pages with the same LSN use
different IVs), that uses 12 bytes:

        LSN         8 bytes
        page-number 4 bytes

That leaves 4 bytes unused.  If we use CTR, we need 11 bits for the
counter to support 32k pages sizes (per Sehrope Sarkuni), and we can use
the remaining 5 bits as constants to indicate heap, index, or WAL.
(Technically, since we are not encrypting the first 16 bytes, we could
use one less bit for the counter.)  If we also use relfilenode, that is
4 bytes, so we have no bits for the heap/index/WAL constant, and no
space for the CTR counter, meaning we would have to use CBC mode.

You can still use CTR mode and include those to make the key + IV unique by adding them to the derived key rather than the IV.

The IV per-page would still be LSN + page-number (with the block number added as it's evaluated across the page) and the relfilenode, heap/index, database, and anything else to make it unique can be included in the HKDF to create the per-file derived key.

Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/
On Thu, Jul 25, 2019 at 08:44:40PM -0400, Sehrope Sarkuni wrote:
> On Thu, Jul 25, 2019 at 7:51 PM Bruce Momjian <bruce@momjian.us> wrote:
> 
>     Looking at the bits we have, the IV for AES is 16 bytes.  Since we know
>     we have to use LSN (to change the IV for each page write), and the page
>     number (so WAL updates that change multiple pages with the same LSN use
>     different IVs), that uses 12 bytes:
> 
>             LSN         8 bytes
>             page-number 4 bytes
> 
>     That leaves 4 bytes unused.  If we use CTR, we need 11 bits for the
>     counter to support 32k pages sizes (per Sehrope Sarkuni), and we can use
>     the remaining 5 bits as constants to indicate heap, index, or WAL.
>     (Technically, since we are not encrypting the first 16 bytes, we could
>     use one less bit for the counter.)  If we also use relfilenode, that is
>     4 bytes, so we have no bits for the heap/index/WAL constant, and no
>     space for the CTR counter, meaning we would have to use CBC mode.
> 
> 
> You can still use CTR mode and include those to make the key + IV unique by
> adding them to the derived key rather than the IV.
>
> The IV per-page would still be LSN + page-number (with the block number added
> as it's evaluated across the page) and the relfilenode, heap/index, database,
> and anything else to make it unique can be included in the HKDF to create the
> per-file derived key.

I thought if we didn't have to hash the stuff together we would be less
likely to get collisions with the IV.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Thu, Jul 25, 2019 at 8:50 PM Bruce Momjian <bruce@momjian.us> wrote:
On Thu, Jul 25, 2019 at 08:44:40PM -0400, Sehrope Sarkuni wrote:
> You can still use CTR mode and include those to make the key + IV unique by
> adding them to the derived key rather than the IV.
>
> The IV per-page would still be LSN + page-number (with the block number added
> as it's evaluated across the page) and the relfilenode, heap/index, database,
> and anything else to make it unique can be included in the HKDF to create the
> per-file derived key.

I thought if we didn't have to hash the stuff together we would be less
likely to get collisions with the IV.
 
IV creation does not use any hashing and would never have collisions with the same key, as it's LSN + page + block (concatenation).

The derived keys would also not have collisions as the HKDF prevents that. Deriving two matching keys with different inputs has the same chance as randomly generating matching HMACs (effectively nil with something like HMAC-SHA-256).

So there wouldn't be any reuse of the same key + IV. Even if two different files are encrypted with the same LSN + page the total operation (key + IV) would be different as they'd be using different derived keys.

Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/

 
On Thu, Jul 25, 2019 at 09:11:18PM -0400, Sehrope Sarkuni wrote:
> On Thu, Jul 25, 2019 at 8:50 PM Bruce Momjian <bruce@momjian.us> wrote:
> 
>     On Thu, Jul 25, 2019 at 08:44:40PM -0400, Sehrope Sarkuni wrote:
>     > You can still use CTR mode and include those to make the key + IV unique
>     by
>     > adding them to the derived key rather than the IV.
>     >
>     > The IV per-page would still be LSN + page-number (with the block number
>     added
>     > as it's evaluated across the page) and the relfilenode, heap/index,
>     database,
>     > and anything else to make it unique can be included in the HKDF to create
>     the
>     > per-file derived key.
> 
>     I thought if we didn't have to hash the stuff together we would be less
>     likely to get collisions with the IV.
> 
>  
> IV creation not use any hashing and would never have collisions with the same
> key as it's LSN + page + block (concatenation).
> 
> The derived keys would also not have collisions as the HKDF prevents that.
> Deriving two matching keys with different inputs has the same chance as
> randomly generating matching HMACs (effectively nil with something like
> HMAC-SHA-256).
> 
> So there wouldn't be any reuse of the same key + IV. Even if two different
> files are encrypted with the same LSN + page the total operation (key + IV)
> would be different as they'd be using different derived keys.

Oh, mix the value into the derived key, not into the IV --- got it.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Hi,

Before my reply, I wanted to say that I've been lurking on this thread
for a bit as I've tried to better inform myself on encryption at rest
and how it will apply to what we want to build. I actually built a
(poor) prototype in Python of the key management system that Joe &
Masahiko both laid out, in addition to performing some "buffer
encrpytion" with it. It's not worth sharing at this point.

With the disclaimer that I'm not as familiar with a lot of concepts as I
would like to be:

On 7/25/19 1:54 PM, Masahiko Sawada wrote:
> On Fri, Jul 26, 2019 at 2:18 AM Bruce Momjian <bruce@momjian.us> wrote:
>>
>> On Thu, Jul 18, 2019 at 12:04:25PM +0900, Masahiko Sawada wrote:
>>> I've re-considered the design of the TDE feature based on the
>>> discussion so far. One of the main open questions is the granularity
>>> of encryption objects: cluster encryption or more-granular-than-cluster
>>> encryption. The following describes the new TDE design when we
>>> choose table-level encryption or something-new-group-level encryption.
>>>
>>> General
>>> ========
>>> We will use AES and support both AES-128 and AES-256. The user can
>>> specify a new initdb option, something like --aes-128 or --aes-256, to
>>> enable encryption, and must specify --encryption-key-passphrase-command
>>> along with it. (I guess we also require the openssl library.) If these
>>> options are specified, we write the key length to the control file and
>>> derive the KEK and generate the MDEK during initdb. wal_log_hints will
>>> be enabled automatically in encryption mode, like we do for checksum
>>> mode.
>>
>> Agreed.  pg_control will store the none/AES128/AES256 indicator.
>>
>>> Key Management
>>> ==============
>>> We will use 3-tier key architecture as Joe proposed.
>>>
>>>   1. A master key encryption key (KEK): this is the key supplied by the
>>>      database admin using something akin to ssl_passphrase_command
>>>
>>>   2. A master data encryption key (MDEK): this is a generated key using a
>>>      cryptographically secure pseudo-random number generator. It is
>>>      encrypted using the KEK, probably with Key Wrap (KW):
>>>      or maybe better Key Wrap with Padding (KWP):
>>>
>>>   3a. Per table data encryption keys (TDEK): use MDEK and HKDF to generate
>>>       table specific keys.
>>
>> What is the value of a per-table encryption key?  How is HKDF derived?
>
> Per-table encryption key is derived from MDEK with salt and its OID as
> info. I think we can store the salt for each encryption key in a
> separate file so that off-line tools can also read them.

+1 with using the info/salt for the HKDF as described above. The other
decision will be the hashing algorithm to use. SHA-256?


>>>   3b. WAL data encryption keys (WDEK):  Similarly use MDEK and a HKDF to
>>>       generate new keys when needed for WAL.
>>>
>>> We store the MDEK in a plain file (say global/pgkey) after it is
>>> encrypted with the KEK. I might want to store a hash of the KEK's
>>> passphrase in order to verify the correctness of the given passphrase.
>>> However, we don't need to store TDEK and WDEK as we can derive them as
>>> needed. The key file can be read by both backend processes and
>>> front-end tools.
>>
>> Yes, we need to verify the pass phrase.

Just to clarify, this would be a hash of the KEK?

From my experiments, the MDEK key unwrapping fails if you do not have
the correct KEK (as it should). If it's a matter of storing a hash of
the KEK, I'm not sure there is much added benefit in having it, but I
would not necessarily oppose it either.

>>> When the postmaster starts up, it reads the key file, decrypts the
>>> MDEK, and derives the WDEK using the key id for WDEK.

I don't know if this is getting too far ahead, but what happens if the
supplied KEK fails to decrypt the MDEK? Will the postmaster refuse to start?
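
For what it's worth, that "unwrap as verification" behavior can be
sketched with OpenSSL's AES Key Wrap; the key sizes and names here are
hypothetical. Since Key Wrap includes an integrity check, a wrong KEK
simply makes the unwrap fail, which verifies the passphrase without
storing any hash, and the postmaster could refuse to start when it fails:

    #include <openssl/evp.h>

    /* Unwrap the stored, KEK-wrapped MDEK (32-byte key -> 40 bytes
     * wrapped).  Returns 0 and fills mdek_out on success; returns -1
     * when the KEK is wrong and the integrity check fails. */
    static int
    unwrap_mdek(const unsigned char *kek,       /* 32-byte KEK */
                const unsigned char *wrapped,   /* 40-byte wrapped MDEK */
                unsigned char *mdek_out)        /* 32 bytes */
    {
        EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
        int             len;
        int             rc = -1;

        if (ctx == NULL)
            return -1;
        EVP_CIPHER_CTX_set_flags(ctx, EVP_CIPHER_CTX_FLAG_WRAP_ALLOW);
        if (EVP_DecryptInit_ex(ctx, EVP_aes_256_wrap(), NULL, kek, NULL) == 1 &&
            EVP_DecryptUpdate(ctx, mdek_out, &len, wrapped, 40) == 1 &&
            EVP_DecryptFinal_ex(ctx, mdek_out + len, &len) == 1)
            rc = 0;                 /* correct KEK: MDEK recovered */
        EVP_CIPHER_CTX_free(ctx);
        return rc;                  /* wrong KEK: unwrap failed */
    }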

>>> The WDEK is loaded into the key hash map
>>> (keyid -> key) in shared memory. Also, we derive TDEKs as needed
>>> when reading tables or indexes and add them to the key hash map as
>>> well if they don't exist.

+1 to this approach.

>>>
>>> Buffer Encryption
>>> ==============
>>> We will use AES-CBC for buffer encryption. We will add a key id (4 bytes)
>>
>> I think we might want to use CTR for this, and will post after this.

Not sure if I missed this post or not (as several people mentioned, it
is easy to get lost in this thread).

I think what will help drive this decision is whether or not we consider
the data we are storing on disk as a "file system" in itself. Trying to
make myself literate in disk encryption theory[1], it seems a big
weakness in using CTR mode for encryption is that we need to be able to
guarantee a fresh counter for every page we encrypt[2], so if we can
guarantee the uniqueness of the IV per TDEK, this is on the table.

XTS mode, on the other hand, appears to be more resilient to IV reuse,
as the "tweak" was designed to represent a disk sector, though there are
still problems. However, I presume this is one of many reasons why
fscrypt uses XTS[3].

For data malleability, CTR is described as more vulnerable, but both
modes (or, for that matter, all modes?) require some sort of digital
signature (and most of my research has led to Encrypt-then-MAC, which I
know is being discussed elsewhere in the thread).

>>
>>> after the pd_lsn (8 bytes) in PageHeaderData, and we will not encrypt
>>> the first 16 bytes of each page so the LSN and key id can be used. We
>>> can store an invalid key id to tell us that the table is not encrypted.
>>> There are two benefits of storing the key id in the page header: offline
>>> tools can get the key id (and know whether the table is encrypted) and
>>> it's helpful for online rekeying in the future.
>>
>> I don't remember anyone suggesting different keys for different tables.
>> How would this even be managed by the user?
>
> I think it's still unclear whether we implement one key for the whole
> database cluster or different keys for different tables as the first
> version. I'm evaluating the performance overhead of the latter that
> you were concerned about and will share it.
>
> I prefer tablespace-level or something-new-group-level to
> table-level, but if we choose the latter we can create a new group of
> tables that are encrypted with the same key. That is, the user creates
> a group and then associates tables with that group. Tablespace-level is
> implemented in the patch I submitted before.

I may not be following here...but the TDEKs can be derived with an
(OID,salt) combination, so even if it were per tablespace we would be
storing a salt -- I'm not sure how it would differ from being per-table,
other than the additional overhead of storing the salt per table...

...I think the pain is realized if/when there is a TDEK rotation, i.e.
when the amount of data encrypted by the (OID,salt) pair exceeds the
safe per-key encryption limit.

>  Or, it's just an idea, but
> another idea could be to allow users to create encryption key objects
> first and then specify which tables are encrypted with which
> encryption key in DDL. For example, the user creates a named
> encryption key by SQL function and creates an encrypted table by
> CREATE TABLE ... WITH (encryption_key = 'mykey');.

-1 for storing encryption keys in the DDL. If someone has `log_statement
= ddl` or above, those keys will end up in plaintext in said logs.

I would be +1 for being able to explicitly set tables to be encrypted,
and +1 for a GUC that turns on encryption for all tables. I see a lot of
footguns with configurability and understand there are implementation
headaches as well, but wanted to float the ideas.

>>> I've considered storing the IV and key id in a new fork, but I felt
>>> that it is complex because we will always need to have the fork in
>>> shared buffers when any page of its main fork is written to disk.
>>> If almost all buffers in shared buffers are dirtied and their new
>>> forks are not loaded into shared buffers, we might need to load the
>>> new fork and write the page to disk and then evict some pages,
>>> over and over.
>>>
>>> We will use (page lsn, page number) to create a nonce. IVs are created
>>> by encrypting the nonce with its TDEK.
>>
>> Agreed.

We just need to ensure this adds up to 16 bytes for the IV based on all
of the encryption methods we are considering. I believe this gets us to
12, so we need 4 additional bytes.

To echo an idea up thread, we could make this completely
nondeterministic and keep a randomly generated IV on the page header
(understanding this takes up even more space, and we may need some more
space anyway based on the outcome of the MAC discussion). Or perhaps we
just need to keep 4 bytes for a random salt on the page header that can
be appended to the page LSN / page no. pair.
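
To make the byte arithmetic concrete, here is a rough sketch in Python of
one possible 16-byte IV layout (8-byte LSN, 4-byte page number, 4-byte
random salt); the field order and widths are illustrative assumptions,
not a settled on-disk format:

    import os
    import struct

    def make_iv(page_lsn: int, page_no: int, salt: bytes) -> bytes:
        # 8-byte LSN || 4-byte page number || 4-byte random salt = 16 bytes.
        assert len(salt) == 4
        return struct.pack(">QI", page_lsn, page_no) + salt

    # The 4-byte salt would live in the page header per the idea above.
    iv = make_iv(page_lsn=0x1690488, page_no=1, salt=os.urandom(4))
    assert len(iv) == 16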

>>
>>> WAL Encryption
>>> =============
>>> We will use AES-CTR for WAL encryption and encrypt each WAL pages with WDEK.
>>>
>>> We will use WAL segment number to create a nonce. Similar to buffer
>>> encryption, IVs are created using the nonce and WDEK.

Same comment as above RE needing 16 bytes for the IV, as well as
possible solutions.

>>
>> Yes.  If there is concern about collision of table/index and WAL IVs, we
>> can add a constant to the two uses, as Joe Conway mentioned.
>>
>>> If we want to support enabling or disabling encryption after initdb we
>>> might want to have key id in the WAL page header.

Makes sense. I think the big question is if one enables encryption after
initdb and after there is already data in the database, what happens?
Sounds like it could be a bit of a challenge :)

>>>
>>> Front-end Tool Support
>>> ==================
>>> We will add --encryption-key-passphrase-command option to the
>>> front-end tools that read database files or WAL segment files directly.
>>> They can get KEK via --encryption-key-passphrase-command and get MDEK
>>> by reading the key file. Also they can know the key length by checking
>>> the control file. Since they can derive the TDEK using the key id stored in
>>> the page header they can decrypt database files. Similarly, they also
>>> can decrypt WAL as they can know the key id of WDEK.

+1.

>>>
>>> Master Key Rotation
>>> ================
>>> We will support new command-line tool that rotates the master key
>>> offline. It accepts --old-encryption-key-passphrase-command option and
>>> --new-encryption-key-passphrase-command to get old KEK and new KEK
>>> respectively. It decrypts the MDEK with the old key and encrypts it with
>>> the new key.
>>
>> That handles changing the passphrase, but what about rotating the
>> encryption key?  Don't we want to support that, at least in offline
>> mode?
>
> Yeah, supporting rotating the encryption key is a good idea. Agreed.

I think part of the reason for having the KEK is we can rotate the KEK
without needing to rotate the MDEK.

Rotating the MDEK could cause a pretty significant downtime event based
on the size of your data. Perhaps something like that should be there
for emergencies, but arguably rotating an MDEK would be the equivalent
of a logical restore to another cluster.
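
As a rough sketch of why KEK rotation is cheap (Python, assuming the
third-party "cryptography" package, whose aes_key_wrap/aes_key_unwrap
implement the Key Wrap mentioned up thread): only the wrapped MDEK is
rewritten, so no table or WAL data is touched:

    import os
    from cryptography.hazmat.primitives.keywrap import aes_key_wrap, aes_key_unwrap

    # One-time setup: the MDEK is random and stored only in wrapped form.
    mdek = os.urandom(32)                  # AES-256 master data encryption key
    old_kek = os.urandom(32)
    wrapped = aes_key_wrap(old_kek, mdek)  # what sits in the key file

    # KEK rotation: unwrap with the old KEK, rewrap with the new one.
    # The MDEK (and every key derived from it) is unchanged, so no
    # table or WAL data needs to be re-encrypted.
    new_kek = os.urandom(32)
    rewrapped = aes_key_wrap(new_kek, aes_key_unwrap(old_kek, wrapped))
    assert aes_key_unwrap(new_kek, rewrapped) == mdek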

>
> After more thought, it's just an idea, but I wonder if the first
> implementation step of TDE for v13 could be the key management module.
> That is, (in the 3-tier case) PostgreSQL gets the KEK by passphrase command or
> directly, and creates the MDEK. The user can create a named encryption
> key using a SQL function, and the key manager derives the DEK and stores
> its salt to the disk. Also we have an internal interface to get an
> encryption key.
>
> The good point is not only to develop incrementally but also that if
> PostgreSQL is able to manage (symmetric) encryption keys inside the
> database cluster and has interfaces to get and add keys, pgcrypto also
> will be able to use it. That way, we can provide column-level TDE
> first by a combination of pgcrypto, triggers and views while keeping
> encryption keys truly secret. After that we can add other levels of TDE
> using the key management module. We would then be able to focus on how
> to encrypt buffers and WAL.

I think it is a logical starting point to get the key management module
into place, and the rest of the system builds out from there. That is
how I built my (poor) prototype :)

Given you can already get column level encryption with pgcrypto with
external key management, my suggestion is to spend the effort getting
the TDE architecture nailed down.

(I would also be -1 for making the MDEK available to the user in any way
other than it sitting in the encrypted storage file where it is wrapped.
If they wish to unwrap the MDEK from there with the KEK, that would be
their choice.)

I also want to thank everyone for their efforts on the thread. It has
been a lot to follow to date (and I am sure there will be plenty more to
come), but it speaks to the excitement of wanting to get these features
into PostgreSQL and do it well :)

Jonathan


>>>> Buffer Encryption
>>>> ==============
>>>> We will use AES-CBC for buffer encryption. We will add key id (4byte)
>>>
>>> I think we might want to use CTR for this, and will post after this.
>
> Not sure if I missed this post or not (as several people mentioned, it
> is easy to get lost in this thread).
>
> I think what will help drive this decision is whether or not we consider
> the data we are storing on disk as a "file system" in itself. Trying to
> make myself literate in disk encryption theory[1], it seems a big
> weakness in using CTR mode for encryption is we need to be able to
> guarantee a fresh counter for every page we encrypt[2], so if we can
> guarantee the uniqueness of IV per TDEK, this is on the table.
>
> XTS mode, on the other hand, appears to be more durable to reusing an IV
> as the "tweak" was designed to represent a disk sector, though there are
> still problems. However, I presume this is one of many reasons why
> fscrypt uses XTS[3].

Much like Joe earlier, I forgot my citations:

[1] https://en.wikipedia.org/wiki/Disk_encryption_theory
[2]
https://crypto.stackexchange.com/questions/14628/why-do-we-use-xts-over-ctr-for-disk-encryption
[3] https://www.kernel.org/doc/html/latest/filesystems/fscrypt.html

Jonathan


On 2019-Jul-25, Bruce Momjian wrote:

> On Thu, Jul 25, 2019 at 03:43:34PM -0400, Alvaro Herrera wrote:

> > Why are we encrypting the page header in the first place?  It seems to
> > me that the encrypted area should cover only the line pointers and the
> > tuple data area; the page header needs to be unencrypted so that it can
> > be used at all: firstly because you need to obtain the LSN from it in
> 
> Yes, the plan was to not encrypt the first 16 bytes so the LSN was visible.

I don't see the value of encrypting the rest of the page header
(which includes the page checksum).

> > order to compute the IV, and secondly because the checksum must be
> > validated *before* decrypting (per Moxie Marlinspike's "cryptographic
> > doom" principle mentioned in a comment in the SE question).
> 
> Uh, I think we are still on the fence about writing the checksum _after_
> encryption,

I don't see the reason for doing that.  The "cryptographic doom
principle" page talks about this kind of scenario, and ISTM that the
ultimate suggestion is that the page checksum ought to be verifiable
prior to doing any decryption.

Are you worried about an attacker forging the page checksum by
installing another encrypted page that gives the same checksum?  I'm not
sure how that attack works ... I mean why can the attacker install
arbitrary pages?

> The only way offline tools can verify the CRC without access to the keys
> is via #2, but #2 gives us _no_ detection of tampering.  I realize the
> CRC tampering detection of #1 and #3 is not great, but it certainly has
> some value.

It seems to me that you're trying to invent a cryptographic signature
scheme on your own.  That seems very likely to backfire.

> > I am not totally clear on whether the special space and the "page hole"
> > need to be encrypted.  I tend to think that they should *not* be
>> encrypted; in particular, encrypting a large area containing zeroes seems
> > a plentiful source of known cleartext, which seems a bad thing.  Special
> > space also seems to contain known cleartext; maybe not as much as the
> > page hole, but still seems better avoided.
> 
> Uh, there are no known attacks on AES with known plain-text, e.g., SSL
> uses AES, so I think we are good with encrypting everything after the
> first 16 bytes.

Well, maybe there aren't any attacks *now*, but I don't know what will
happen in the future.  I'm not clear what's the intended win by
encrypting the all-zeroes page hole anyway.  If you leave it
unencrypted, the attacker knows the size of the hole, as well as the
size of the tuple data area and the size of the LP array.  Is that a
side-channel that leaks much?

> > The checksum we currently have is not cryptographically secure -- it's
> > not a crypto-strong signature.  If we want that, we need some further
> > protection.  Maybe for encrypted tables we replace our current checksum
>> with a cryptographically secure signature ...?  Pretty sure 16 bits are
> > insufficient for that, but I suppose we would just use a different page
> > header with room for a proper sig.
> 
> Yes, checksum is more for best-effort than fully secure, but replay of
> pages makes a fully secure solution hard anyway.

What do you mean with "replay of pages"?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



On 2019-Jul-25, Alvaro Herrera wrote:

> > Uh, there are no known attacks on AES with known plain-text, e.g., SSL
> > uses AES, so I think we are good with encrypting everything after the
> > first 16 bytes.
> 
> Well, maybe there aren't any attacks *now*, but I don't know what will
> happen in the future.  I'm not clear what's the intended win by
> encrypting the all-zeroes page hole anyway.  If you leave it
> unencrypted, the attacker knows the size of the hole, as well as the
> size of the tuple data area and the size of the LP array.  Is that a
> side-channel that leaks much?

This answer https://crypto.stackexchange.com/a/31090 is interesting for
three reasons:

1. it says we don't really have to worry about cleartext attacks, at
least not in the immediate future, so encrypting the hole should be OK;

2. it seems to reinforce a point I tried to make earlier, which is that
reusing the IV a small number of times is *not that bad*:

> On the other hand if the Key and IV are reused between messages then
> the same plaintext will lead to the same ciphertext, so you can
> potentially decrypt a message using a sufficiently large corpus of known
> matching plaintext/ciphertext pairs, even without ever recovering the
> key.

Actually the attack being described presumes that you know *both the*
*unencrypted data and the encrypted data* for a certain key/IV pair,
and only then you can decrypt some other data.  It doesn't follow that
you can decrypt data just because somebody reused the IV for a second
page ... I haven't seen any literature referenced that explains what
this attack is.

3. It seems clear that AES is sufficiently complicated that explaining
it to non-cryptographers is a lost cause.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



On Fri, Jul 26, 2019 at 10:57 AM Jonathan S. Katz <jkatz@postgresql.org> wrote:
>
> Hi,
>
> Before my reply, I wanted to say that I've been lurking on this thread
> for a bit as I've tried to better inform myself on encryption at rest
> and how it will apply to what we want to build. I actually built a
> (poor) prototype in Python of the key management system that Joe &
> Masahiko both laid out, in addition to performing some "buffer
> encrpytion" with it. It's not worth sharing at this point.
>
> With the disclaimer that I'm not as familiar with a lot of concepts as I
> would like to be:
>
> On 7/25/19 1:54 PM, Masahiko Sawada wrote:
> > On Fri, Jul 26, 2019 at 2:18 AM Bruce Momjian <bruce@momjian.us> wrote:
> >>
> >> On Thu, Jul 18, 2019 at 12:04:25PM +0900, Masahiko Sawada wrote:
> >>> I've re-considered the design of the TDE feature based on the discussion
> >>> so far. One of the main open questions is the granularity of
> >>> encryption objects: cluster encryption or more-granular-than-cluster
> >>> encryption. The following describes the new TDE design when we
> >>> choose table-level encryption or something-new-group-level encryption.
> >>>
> >>> General
> >>> ========
> >>> We will use AES and support both AES-128 and AES-256. The user can specify
> >>> a new initdb option, something like --aes-128 or --aes-256, to enable
> >>> encryption and must specify --encryption-key-passphrase-command along
> >>> with it. (I guess we also require the openssl library.) If these options are
> >>> specified, we write the key length to the control file and derive the
> >>> KEK and generate the MDEK during initdb. wal_log_hints will be enabled
> >>> automatically in encryption mode, like we do for checksum mode.
> >>
> >> Agreed.  pg_control will store the none/AES128/AES256 indicator.
> >>
> >>> Key Management
> >>> ==============
> >>> We will use 3-tier key architecture as Joe proposed.
> >>>
> >>>   1. A master key encryption key (KEK): this is the key supplied by the
> >>>      database admin using something akin to ssl_passphrase_command
> >>>
> >>>   2. A master data encryption key (MDEK): this is a generated key using a
> >>>      cryptographically secure pseudo-random number generator. It is
> >>>      encrypted using the KEK, probably with Key Wrap (KW):
> >>>      or maybe better Key Wrap with Padding (KWP):
> >>>
> >>>   3a. Per table data encryption keys (TDEK): use MDEK and HKDF to generate
> >>>       table specific keys.
> >>
> >> What is the value of a per-table encryption key?  How is HKDF derived?
> >
> > The per-table encryption key is derived from the MDEK with a salt and its OID as
> > info. I think we can store the salts for each encryption key in a
> > separate file so that off-line tools can also read them.
>
> +1 with using the info/salt for the HKDF as described above. The other
> decision will be the hashing algorithm to use. SHA-256?

Yeah, SHA-256 would be better for safety.
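
For what it's worth, a minimal sketch of the derivation in Python
(stdlib only), assuming HKDF-SHA256 per RFC 5869 with the table OID as
the HKDF info; the "TDEK" label, key sizes, and example OID are
illustrative assumptions:

    import hashlib
    import hmac
    import struct

    def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
        # RFC 5869: extract a PRK, then expand it to the requested length.
        prk = hmac.new(salt, ikm, hashlib.sha256).digest()
        okm = block = b""
        for i in range((length + 31) // 32):
            block = hmac.new(prk, block + info + bytes([i + 1]), hashlib.sha256).digest()
            okm += block
        return okm[:length]

    # Per-table key: MDEK as input key material, a stored per-key salt,
    # and the table OID as the HKDF "info", as described above.
    mdek = b"\x00" * 32   # stand-in; the real MDEK is randomly generated
    salt = b"\x01" * 32   # stored beside the data so offline tools can read it
    tdek = hkdf_sha256(mdek, salt, info=b"TDEK" + struct.pack(">I", 16384))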

>
>
> >>>   3b. WAL data encryption keys (WDEK):  Similarly use MDEK and a HKDF to
> >>>       generate new keys when needed for WAL.
> >>>
> >>> We store the MDEK in a plain file (say global/pgkey) after it is encrypted
> >>> with the KEK. I might want to store a hash of the KEK's passphrase
> >>> in order to verify the correctness of the given passphrase. However we
> >>> don't need to store the TDEK and WDEK as we can derive them as needed. The
> >>> key file can be read by both backend processes and front-end tools.
> >>
> >> Yes, we need to verify the pass phrase.
>
> Just to clarify, this would be a hash of the KEK?

No, it's a hash of the passphrase. Or we might be able to use crypt(3) to
verify the input passphrase.

Apart from passing the passphrase, there are users who would rather
pass the key directly, for example when using external key management
services. So it might be good if we provide both ways.

>
> From my experiments, the MDEK key unwrapping fails if you do not have
> the correct KEK (as it should). If it's a matter of storing a hash of
> the KEK, I'm not sure if there is much added benefit to have it, but I
> would not necessarily oppose it either.
>
> >>> When the postmaster starts up, it reads the key file, decrypts the MDEK,
> >>> and derives the WDEK using the key id for the WDEK.
>
> I don't know if this is getting too far ahead, but what happens if the
> supplied KEK fails to decrypt the MDEK? Will the postmaster refuse to start up?

I think it should refuse to start up. It would not be able to operate
properly without the correct keys, and we prevent startup to protect
against a possibly malicious user.

>
> >>> The WDEK is loaded into the key hash map
> >>> (keyid -> key) in shared memory. Also we derive the TDEK as needed
> >>> when reading tables or indexes and add it to the key hash map as well
> >>> if it does not exist.
>
> +1 to this approach.
>
> >>>
> >>> Buffer Encryption
> >>> ==============
> >>> We will use AES-CBC for buffer encryption. We will add key id (4byte)
> >>
> >> I think we might want to use CTR for this, and will post after this.
>
> Not sure if I missed this post or not (as several people mentioned, it
> is easy to get lost in this thread).
>
> I think what will help drive this decision is whether or not we consider
> the data we are storing on disk as a "file system" in itself. Trying to
> make myself literate in disk encryption theory[1], it seems a big
> weakness in using CTR mode for encryption is we need to be able to
> guarantee a fresh counter for every page we encrypt[2], so if we can
> guarantee the uniqueness of IV per TDEK, this is on the table.
>
> XTS mode, on the other hand, appears to be more durable to reusing an IV
> as the "tweak" was designed to represent a disk sector, though there are
> still problems. However, I presume this is one of many reasons why
> fscrypt uses XTS[3].
>
> For data malleability, CTR is described to be more vulnerable, but both
> modes (all of these modes?) require some sort of digital signature (and
> most of my research has led to Encrypt-then-MAC, which I know is being
> discussed elsewhere in the thread).
>
> >>
> >>> to after the pd_lsn (8 bytes) in PageHeaderData, and we will not encrypt
> >>> the first 16 bytes of each page so the LSN and key id can be used. We can
> >>> store an invalid key id to tell us that the table is not encrypted.
> >>> There are two benefits of storing the key id in the page header: offline tools
> >>> can get the key id (and know whether the table is encrypted or not), and it's
> >>> helpful for online rekey in the future.
> >>
> >> I don't remember anyone suggesting different keys for different tables.
> >> How would this even be managed by the user?
> >
> > I think it's still unclear whether we implement one key for the whole
> > database cluster or different keys for different tables as the first
> > version. I'm evaluating the performance overhead of the latter that
> > you were concerned about and will share it.
> >
> > I prefer tablespace-level or something-new-group-level to
> > table-level, but if we choose the latter we can create a new group of
> > tables that are encrypted with the same key. That is, the user creates a
> > group and then associates tables with that group. Tablespace-level is
> > implemented in the patch I submitted before.
>
> I may not be following here...but the TDEKs can be derived with an
> (OID, salt) combination, so even if it was per tablespace we would be
> storing a salt -- I'm not sure how it would affect being per-table other
> than the additional overhead of storing the salt per table...
>
> ...I think the pain is realized if/when there is a TDEK rotation, i.e.
> the amount of data encrypted under the (OID, salt) pair exceeds the safe
> limit for a single key.
>
> >  Or, it's just an idea, but
> > another idea could be to allow users to create an encryption key object
> > first and then specify which tables are encrypted with which
> > encryption key in DDL. For example, the user creates an encryption key
> > with a name via SQL function and creates an encrypted table with CREATE
> > TABLE ... WITH (encryption_key = 'mykey');.
>
> -1 for storing encryption keys in the DDL. If someone has `log_statement
> = ddl` or above, those keys will get stored in plaintext in said logs.

Sorry, I meant creating an encrypted table by specifying the
encryption key. What I wanted to say is that the user can create an
encryption key object via SQL function or similar, with a name (say
'mykey'). The encryption key here is the TDEK, not the MDEK. And then the
user can specify the encryption key object by name at table creation.
Therefore the key is never logged.

Furthermore, the encryption key object could also be used by pgcrypto;
currently we have to pass the key itself as an argument to pgcrypto
functions such as decrypt() or encrypt(), but we could change such
functions so that we can specify the name of a key object instead. It's
just an idea though.

>
> I would be +1 for being able to explicitly set tables to be encrypted,
> and +1 for a GUC that turns on encryption for all tables. I see a lot of
> footguns with configurability and understand there are implementation
> headaches as well, but wanted to float the ideas.

That seems like a good idea.

>
> >>> I've considered storing the IV and key id in a new fork but I felt that
> >>> it is complex because we will always need to have the fork in the
> >>> shared buffers when any page of its main fork is written to the disk.
> >>> If almost all buffers of the shared buffers are dirtied and their new
> >>> forks are not loaded into the shared buffers, we might need to load the
> >>> new fork and write the page to the disk and then evict some pages,
> >>> over and over.
> >>>
> >>> We will use (page lsn, page number) to create a nonce. IVs are created
> >>> by encrypting the nonce with its TDEK.
> >>
> >> Agreed.
>
> We just need to ensure this adds up to 16 bytes for the IV based on all
> of the encryption methods we are considering. I believe this gets us to
> 12, so we need 4 additional bytes.
>
> To echo an idea up thread, we could make this completely
> nondeterministic and keep a randomly generated IV on the page header
> (understanding this takes up even more space, and we may need some more
> space anyway based on the outcome of the MAC discussion). Or perhaps we
> just need to keep 4 bytes for a random salt on the page header that can
> be appended to the page LSN / page no. pair.
>
> >>
> >>> WAL Encryption
> >>> =============
> >>> We will use AES-CTR for WAL encryption and encrypt each WAL pages with WDEK.
> >>>
> >>> We will use WAL segment number to create a nonce. Similar to buffer
> >>> encryption, IVs are created using the nonce and WDEK.
>
> Same comment as above RE needing 16 bytes for the IV, as well as
> possible solutions.
>
> >>
> >> Yes.  If there is concern about collision of table/index and WAL IVs, we
> >> can add a constant to the two uses, as Joe Conway mentioned.
> >>
> >>> If we want to support enabling or disabling encryption after initdb we
> >>> might want to have key id in the WAL page header.
>
> Makes sense. I think the big question is if one enables encryption after
> initdb and after there is already data in the database, what happens?
> Sounds like it could be a bit of a challenge :)

I guess that when the user requests to encrypt a table we can mark
every page as needing to be encrypted before being written, so that the
table ends up encrypted. For WAL, I've not considered it deeply yet, but
we might need to switch to a new WAL segment and enable WAL encryption
from that next WAL file.

>
> >>>
> >>> Front-end Tool Support
> >>> ==================
> >>> We will add --encryption-key-passphrase-command option to the
> >>> front-end tools that read database files or WAL segment files directly.
> >>> They can get KEK via --encryption-key-passphrase-command and get MDEK
> >>> by reading the key file. Also they can know the key length by checking
> >>> the control file. Since they can derive the TDEK using the key id stored in
> >>> the page header they can decrypt database files. Similarly, they also
> >>> can decrypt WAL as they can know the key id of WDEK.
>
> +1.
>
> >>>
> >>> Master Key Rotation
> >>> ================
> >>> We will support new command-line tool that rotates the master key
> >>> offline. It accepts --old-encryption-key-passphrase-command option and
> >>> --new-encryption-key-passphrase-command to get old KEK and new KEK
> >>> respectively. It decrypts the MDEK with the old key and encrypts it with
> >>> the new key.
> >>
> >> That handles changing the passphrase, but what about rotating the
> >> encryption key?  Don't we want to support that, at least in offline
> >> mode?
> >
> > Yeah, supporting rotating the encryption key is a good idea. Agreed.
>
> I think part of the reason for having the KEK is we can rotate the KEK
> without needing to rotate the MDEK.
>
> Rotating the MDEK could cause a pretty significant downtime event based
> on the size of your data. Perhaps something like that should be there
> for emergencies, but arguably rotating an MDEK would be the equivalent
> of a logical restore to another cluster.

Yeah, it actually depends on the size of your *encrypted* data. So if
we encrypt only some important tables, rotating the MDEK would not take
a long time.

>
> >
> > After more thought, it's just an idea, but I wonder if the first
> > implementation step of TDE for v13 could be the key management module.
> > That is, (in the 3-tier case) PostgreSQL gets the KEK by passphrase command or
> > directly, and creates the MDEK. The user can create a named encryption
> > key using a SQL function, and the key manager derives the DEK and stores
> > its salt to the disk. Also we have an internal interface to get an
> > encryption key.
> >
> > The good point is not only to develop incrementally but also that if
> > PostgreSQL is able to manage (symmetric) encryption keys inside the
> > database cluster and has interfaces to get and add keys, pgcrypto also
> > will be able to use it. That way, we can provide column-level TDE
> > first by a combination of pgcrypto, triggers and views while keeping
> > encryption keys truly secret. After that we can add other levels of TDE
> > using the key management module. We would then be able to focus on how
> > to encrypt buffers and WAL.
>
> I think it is a logical starting point to get the key management module
> into place, and the rest of the system builds out from there. That is
> how I built my (poor) prototype :)
>
> Given you can already get column level encryption with pgcrypto with
> external key management, my suggestion is to spend the effort getting
> the TDE architecture nailed down.

Thank you! Agreed.

>
> (I would also be -1 for making the MDEK available to the user in any way
> other than it sitting in the encrypted storage file where it is wrapped.
> If they wish to unwrap the MDEK from there with the KEK, that would be
> their choice.)

Agreed.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



On Thu, Jul 25, 2019 at 10:57:08PM -0400, Alvaro Herrera wrote:
> On 2019-Jul-25, Bruce Momjian wrote:
> 
> > On Thu, Jul 25, 2019 at 03:43:34PM -0400, Alvaro Herrera wrote:
> 
> > > Why are we encrypting the page header in the first place?  It seems to
> > > me that the encrypted area should cover only the line pointers and the
> > > tuple data area; the page header needs to be unencrypted so that it can
> > > be used at all: firstly because you need to obtain the LSN from it in
> > 
> > Yes, the plan was to not encrypt the first 16 bytes so the LSN was visible.
> 
> I don't see the value of encrypting the rest of the page header
> (which includes the page checksum).

Well, let's unpack this.  Encrypting the page in more finely grained
parts than 16 bytes is going to require the use of CTR, but I think we
are leaning toward that anyway.

One advantage of not encrypting the hole is that it might be faster, but
I think it might reduce parallelism possibilities, so it might be
slower.  This might need testing.

Not encrypting the hole does leak the size of the hole to the attacker,
but the size of the table is also visible to the attacker, so I don't
know if the hole size helps.  Knowing index hole size might be useful to
an attacker --- not sure.

> > > order to compute the IV, and secondly because the checksum must be
> > > validated *before* decrypting (per Moxie Marlinspike's "cryptographic
> > > doom" principle mentioned in a comment in the SE question).
> > 
> > Uh, I think we are still on the fence about writing the checksum _after_
> > encryption,
> 
> I don't see the reason for doing that.  The "cryptographic doom
> principle" page talks about this kind of scenario, and ISTM that the
> ultimate suggestion is that the page checksum ought to be verifiable
> prior to doing any decryption.

Uh, I listed the three options for the CRC and gave the benefits of
each:

    https://www.postgresql.org/message-id/20190725200343.xo4dcjm5azrfn6zr@momjian.us

Obviously I was not clear on the benefits.  To quote:

    1.  compute CRC and then encrypt everything
    3.  encrypt and then CRC, and store the CRC encrypted

Numbers 1 & 3 give us tampering detection, though with the CRC being so
small, it isn't totally secure.

> Are you worried about an attacker forging the page checksum by
> installing another encrypted page that gives the same checksum?  I'm not
> sure how that attack works ... I mean why can the attacker install
> arbitrary pages?

Well, with #2

    2   encrypt and then CRC, and store the CRC unchanged

you can modify the page, even small parts, and just replace the CRC to
match your changes.  In #1 and #3, you would get a CRC error in almost
all cases since you have no way of setting the decrypted CRC without
knowing the key.  You can change the encrypted CRC, but the odds that
the decrypted one would match the page is very slim.

> > The only way offline tools can verify the CRC without access to the keys
> > is via #2, but #2 gives us _no_ detection of tampering.  I realize the
> > CRC tampering detection of #1 and #3 is not great, but it certainly has
> > some value.
> 
> It seems to me that you're trying to invent a cryptographic signature
> scheme on your own.  That seems very likely to backfire.

Well, we have to live within the constraints we have.  The question is
whether there is sufficient value in having such tampering detection (#1
& #3) compared to the ease of having offline tools verify the checksums
without needing access to the keys (#2).

> > > I am not totally clear on whether the special space and the "page hole"
> > > need to be encrypted.  I tend to think that they should *not* be
> > > encrypted; in particular, encrypting a large area containing zeroes seems
> > > a plentiful source of known cleartext, which seems a bad thing.  Special
> > > space also seems to contain known cleartext; maybe not as much as the
> > > page hole, but still seems better avoided.
> > 
> > Uh, there are no known attacks on AES with known plain-text, e.g., SSL
> > uses AES, so I think we are good with encrypting everything after the
> > first 16 bytes.
> 
> Well, maybe there aren't any attacks *now*, but I don't know what will
> happen in the future.  I'm not clear what's the intended win by
> encrypting the all-zeroes page hole anyway.  If you leave it
> unencrypted, the attacker knows the size of the hole, as well as the
> size of the tuple data area and the size of the LP array.  Is that a
> side-channel that leaks much?

See above.

> > > The checksum we currently have is not cryptographically secure -- it's
> > > not a crypto-strong signature.  If we want that, we need some further
> > > protection.  Maybe for encrypted tables we replace our current checksum
> > > with a cryptographically secure signature ...?  Pretty sure 16 bits are
> > > insufficient for that, but I suppose we would just use a different page
> > > header with room for a proper sig.
> > 
> > Yes, checksum is more for best-effort than fully secure, but replay of
> > pages makes a fully secure solution hard anyway.
> 
> What do you mean with "replay of pages"?

Someone can replace the entire page with an old copy of the page they
saved, and since they didn't modify the page, even for #1 and #3, the
checksum would match, unless the encryption key has been rotated.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Thu, Jul 25, 2019 at 11:30:55PM -0400, Alvaro Herrera wrote:
> On 2019-Jul-25, Alvaro Herrera wrote:
> 
> > > Uh, there are no known attacks on AES with known plain-text, e.g., SSL
> > > uses AES, so I think we are good with encrypting everything after the
> > > first 16 bytes.
> > 
> > Well, maybe there aren't any attacks *now*, but I don't know what will
> > happen in the future.  I'm not clear what's the intended win by
> > encrypting the all-zeroes page hole anyway.  If you leave it
> > unencrypted, the attacker knows the size of the hole, as well as the
> > size of the tuple data area and the size of the LP array.  Is that a
> side-channel that leaks much?
> 
> This answer https://crypto.stackexchange.com/a/31090 is interesting for
> three reasons:
> 
> 1. it says we don't really have to worry about cleartext attacks, at
> least not in the immediate future, so encrypting the hole should be OK;
> 
> 2. it seems to reinforce a point I tried to make earlier, which is that
> reusing the IV a small number of times is *not that bad*:

I think using LSN and page number, we will _never_ reuse the IV, except
for cases like promoting two standbys, which I think we have to document
as an insecure practice.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Sat, Jul 27, 2019 at 1:32 PM Bruce Momjian <bruce@momjian.us> wrote:
> Uh, I listed the three options for the CRC and gave the benefits of
> each:
>
>         https://www.postgresql.org/message-id/20190725200343.xo4dcjm5azrfn6zr@momjian.us
>
> Obviously I was not clear on the benefits.  To quote:
>
>         1.  compute CRC and then encrypt everything
>         3.  encrypt and then CRC, and store the CRC encrypted
>
> Numbers 1 & 3 give us tampering detection, though with the CRC being so
> small, it isn't totally secure.
>
> > Are you worried about an attacker forging the page checksum by
> > installing another encrypted page that gives the same checksum?  I'm not
> > sure how that attack works ... I mean why can the attacker install
> > arbitrary pages?
>
> Well, with #2
>
>         2   encrypt and then CRC, and store the CRC unchanged
>
> you can modify the page, even small parts, and just replace the CRC to
> match your changes.  In #1 and #3, you would get a CRC error in almost
> all cases since you have no way of setting the decrypted CRC without
> knowing the key.  You can change the encrypted CRC, but the odds that
> the decrypted one would match the page is very slim.

Regarding #1 and #3, with CTR mode you do not need to know the key to
make changes to the CRC. Flipping bits of the encrypted CRC would flip
the same bits of the decrypted one. This was one of the issues with
the older WiFi encryption standard WEP[1] which used RC4 + CRC32. It's
not the exact same usage pattern, but I wouldn't be surprised if there
is a way to make in-place updates and matching CRC32 changes even if
it's encrypted.
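
To spell this out with a toy in Python (assuming the third-party
"cryptography" package): because CRC32 is affine and CTR passes XORs
straight through to the plaintext, an attacker can flip page bits and fix
up a ciphertext-stored CRC (option #3) without ever knowing the key:

    import os
    import zlib
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    key, nonce = os.urandom(32), os.urandom(16)

    def ctr(data: bytes) -> bytes:
        # CTR is symmetric: the same operation encrypts and decrypts.
        return Cipher(algorithms.AES(key), modes.CTR(nonce)).encryptor().update(data)

    page = os.urandom(64)                      # stand-in "page" contents
    crc = zlib.crc32(page).to_bytes(4, "big")
    ct = ctr(page + crc)

    # Flip one page bit and the matching CRC-difference bits, no key needed:
    # CRC32 is affine, so crc(p ^ d) == crc(p) ^ crc(d) ^ crc(zeros).
    delta = bytearray(64)
    delta[10] = 0x01
    crc_delta = (zlib.crc32(bytes(delta)) ^ zlib.crc32(bytes(64))).to_bytes(4, "big")
    forged = bytes(a ^ b for a, b in zip(ct, bytes(delta) + crc_delta))

    # The tampered page still passes the CRC check after decryption.
    pt = ctr(forged)
    assert zlib.crc32(pt[:64]).to_bytes(4, "big") == pt[64:]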

Given the non-cryptographic nature of CRC and its 16-bit size, I'd
round down the malicious tamper detection it provides to zero. At best
it catches random disk errors so might as well keep it in plain text
and checkable offline.

More generally, without a cryptographic MAC I don't think it's
possible to provide any meaningful malicious tamper detection. And
even that would have to be off-page to deal with page replay (which I
think is out of scope).

[1]: https://en.wikipedia.org/wiki/CRC-32#Data_integrity

Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/



On 7/27/19 3:02 PM, Sehrope Sarkuni wrote:
> More generally, without a cryptographic MAC I don't think it's
> possible to provide any meaningful malicious tamper detection. And
> even that would have to be off-page to deal with page replay (which I
> think is out of scope).
>
> [1]: https://en.wikipedia.org/wiki/CRC-32#Data_integrity

Yes, exactly -- pretty sure I made that point down thread but who knows;
I know I at least thought it ;-P

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


On Sat, Jul 27, 2019 at 03:02:02PM -0400, Sehrope Sarkuni wrote:
> On Sat, Jul 27, 2019 at 1:32 PM Bruce Momjian <bruce@momjian.us> wrote:
> > Uh, I listed the three options for the CRC and gave the benefits of
> > each:
> >
> >         https://www.postgresql.org/message-id/20190725200343.xo4dcjm5azrfn6zr@momjian.us
> >
> > Obviously I was not clear on the benefits.  To quote:
> >
> >         1.  compute CRC and then encrypt everything
> >         3.  encrypt and then CRC, and store the CRC encrypted
> >
> > Numbers 1 & 3 give us tampering detection, though with the CRC being so
> > small, it isn't totally secure.
> >
> > > Are you worried about an attacker forging the page checksum by
> > > installing another encrypted page that gives the same checksum?  I'm not
> > > sure how that attack works ... I mean why can the attacker install
> > > arbitrary pages?
> >
> > Well, with #2
> >
> >         2   encrypt and then CRC, and store the CRC unchanged
> >
> > you can modify the page, even small parts, and just replace the CRC to
> > match your changes.  In #1 and #3, you would get a CRC error in almost
> > all cases since you have no way of setting the decrypted CRC without
> > knowing the key.  You can change the encrypted CRC, but the odds that
> > the decrypted one would match the page is very slim.
> 
> Regarding #1 and #3, with CTR mode you do not need to know the key to
> make changes to the CRC. Flipping bits of the encrypted CRC would flip
> the same bits of the decrypted one. This was one of the issues with
> the older WiFi encryption standard WEP[1] which used RC4 + CRC32. It's
> not the exact same usage pattern, but I wouldn't be surprised if there
> is a way to make in-place updates and matching CRC32 changes even if
> it's encrypted.

I see.

> Given the non-cryptographic nature of CRC and its 16-bit size, I'd
> round down the malicious tamper detection it provides to zero. At best
> it catches random disk errors so might as well keep it in plain text
> and checkable offline.

OK, zero is pretty low.  ;-)  Let's just go with #2 then, and use CTR
mode so it is easy to skip the CRC bytes in the page.
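
A rough sketch of that "skip the CRC bytes" idea in Python (assuming the
third-party "cryptography" package; the checksum offset and size are
illustrative values, not the final page layout): generate the CTR
keystream for the whole page but XOR it only outside the checksum field,
so the checksum stays plaintext and offline-verifiable:

    import os
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    PAGE_SIZE = 8192
    CKSUM_OFF, CKSUM_LEN = 8, 2   # pd_checksum offset/size; illustrative

    def ctr_page_skip_crc(page: bytes, key: bytes, iv: bytes) -> bytes:
        # Make a keystream by encrypting zeroes, then XOR everything except
        # the checksum bytes, which stay in plaintext for offline checking.
        ks = Cipher(algorithms.AES(key), modes.CTR(iv)).encryptor().update(bytes(PAGE_SIZE))
        out = bytearray(a ^ b for a, b in zip(page, ks))
        out[CKSUM_OFF:CKSUM_OFF + CKSUM_LEN] = page[CKSUM_OFF:CKSUM_OFF + CKSUM_LEN]
        return bytes(out)

    # The same operation decrypts; the plaintext checksum round-trips.
    key, iv = os.urandom(32), os.urandom(16)
    page = os.urandom(PAGE_SIZE)
    enc = ctr_page_skip_crc(page, key, iv)
    assert ctr_page_skip_crc(enc, key, iv) == page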

> More generally, without a cryptographic MAC I don't think it's
> possible to provide any meaningful malicious tamper detection. And
> even that would have to be off-page to deal with page replay (which I
> think is out of scope).

Yeah.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Thu, Jul 25, 2019 at 01:03:06PM -0400, Bruce Momjian wrote:
> On Tue, Jul 16, 2019 at 01:24:54PM +0900, Masahiko Sawada wrote:
> > On Sat, Jul 13, 2019 at 12:33 AM Bruce Momjian <bruce@momjian.us> wrote:
> > > then each row change gets its own LSN.  You are asking if an update that
> > > just expires one row and adds it to a new page gets the same LSN.  I
> > > don't know.
> > 
> > The following script reproduces two different pages having the same LSN.
> > 
> > =# create table test (a int);
> > CREATE TABLE
> > =# insert into test select generate_series(1, 226);
> > INSERT 0 226
> > =# update test set a = a where a = 1;
> > UPDATE 1
> > =# select lsn from page_header(get_raw_page('test', 0));
> >     lsn
> > -----------
> >  0/1690488
> > (1 row)
> > 
> > =# select lsn from page_header(get_raw_page('test', 1));
> >     lsn
> > -----------
> >  0/1690488
> > (1 row)
> > 
> > So I think it's better to use LSN and page number to create IV. If we
> > modify different tables by single WAL we also would need OID or
> > relfilenode but I don't think currently we have such operations.
> 
> OK, good to know, thanks.

I did some more research on which cases use a single LSN to modify
multiple 8k pages.  The normal program flow is:

        XLogBeginInsert();
    ...
        XLogRegisterBuffer(0, meta, ...
-->     recptr = XLogInsert(RM_BRIN_ID, XLOG_...

        page = BufferGetPage(meta);
        PageSetLSN(page, recptr);

XLogInsert() calls BufferGetTag(), which fills in the buffer's
RelFileNode (which internally is the tablespace, database, and
pg_class.relfilenode).  So, to use the LSN and page-number for the IV,
we need to make sure that there is no other encryption of those values
in a different relation.  What I did was to find cases where
XLogRegisterBuffer/PageSetLSN are called more than once for a single
LSN.  I found cases in:

    brin_doupdate
    brin_doinsert
    brinRevmapDesummarizeRange
    revmap_physical_extend
    GenericXLogFinish
    ginPlaceToPage
    shiftList
    ginDeletePage
    gistXLogSplit
    gistXLogPageDelete
    gistXLogUpdate
    hashbucketcleanup
    _hash_doinsert
    _hash_vacuum_one_page
    _hash_addovflpage
    _hash_freeovflpage
    _hash_squeezebucket
    _hash_init
    _hash_expandtable
    _hash_splitbucket
    log_heap_visible
    log_heap_update
    _bt_insertonpg
    _bt_split
    _bt_newroot
    _bt_getroot
    _bt_mark_page_halfdead
    _bt_unlink_halfdead_page
    addLeafTuple
    moveLeafs
    doPickSplit
    spgSplitNodeAction
    log_newpage_range

Most of these are either updating different pages in the same
relation (so the page-number for the IV would be different), or are
modifying other types of files, like vm.  (We have not discussed whether we
are going to encrypt vm or fsm.  I am guessing we are not.)

You might say, well, is it terrible if we reuse the LSN in a different
relation with the same page number?  Yes.  The way CTR works, it
generates a stream of bits using the key and IV (which will be LSN and
page number).  It then XORs it with the page contents to encrypt it.  If
we encrypt an all-zero gap in a page, or a place in the page where the
page format is known, a user can XOR that with the encrypted data and
get the bit stream at that point.  They can then go to another page that
uses the same key and IV and XOR that to get the decrypted data.  CBC
mode is slightly better because it mixes the user data into the future
16-byte blocks, but lots of our early-byte page format is known, so it
isn't great.
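
A toy demonstration of that attack in Python (assuming the third-party
"cryptography" package): with the same key and IV used for two pages,
XORing the two ciphertexts cancels the shared keystream, so a known
region in one page (say an all-zero hole) reveals the other page:

    import os
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    key, iv = os.urandom(32), os.urandom(16)   # same key and IV twice: the bug

    def ctr_encrypt(data: bytes) -> bytes:
        return Cipher(algorithms.AES(key), modes.CTR(iv)).encryptor().update(data)

    page_a = b"\x00" * 32                      # a known region, e.g. a page hole
    page_b = b"secret row data, 32 bytes long!!"

    ct_a, ct_b = ctr_encrypt(page_a), ctr_encrypt(page_b)

    # XOR of the ciphertexts cancels the keystream, recovering page_b
    # wherever page_a is known -- no key required.
    recovered = bytes(a ^ b ^ p for a, b, p in zip(ct_a, ct_b, page_a))
    assert recovered == page_b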

You might say, wow, that is a lot of places to make sure we don't reuse
the LSN in a different relation with the same page number --- let's mix
the relfilenode in the IV so we are sure the IV is not reused.

Well, the pg_class.relfilenode is only unique within the
tablespace/database, i.e., from src/include/storage/relfilenode.h:

 * relNode identifies the specific relation.  relNode corresponds to
 * pg_class.relfilenode (NOT pg_class.oid, because we need to be able
 * to assign new physical files to relations in some situations).
 * Notice that relNode is only unique within a database in a particular
 * tablespace.

So, we would need to mix the tablespace, database oid, and relfilenode
into the IV to be unique.  We would then need to have pg_upgrade
preserve the relfilenode, change CREATE DATABASE to decrypt/encrypt when
creating a new database, and no longer allow files to be moved between
tablespaces without decryption/encryption.

There are just a whole host of complexities we add to encryption if we
add the requirement of preserving the relfilenode, tablespace, and
database to decrypt each page.  I just don't think we want to go there
unless we have a valid reason.

I am thinking of writing some Assert() code that checks that all buffers
using a single LSN are from the same relation (and therefore different
page numbers).  I would do it by creating a static array, clearing it on
XLogBeginInsert(), adding to it for each  XLogInsert(), then checking on
PageSetLSN() that everything in the array is from the same file.  Does
that make sense?
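
For illustration, a hedged Python model of that check (the real thing
would be C in the xlog code using BufferGetTag(); all names here are
made up):

    # A toy model of the proposed check; the real thing would be C in the
    # xlog code using BufferGetTag(). All names here are made up.
    registered_rnodes = []   # (spcNode, dbNode, relNode) per registered buffer

    def xlog_begin_insert():
        registered_rnodes.clear()

    def xlog_register_buffer(spc, db, rel):
        registered_rnodes.append((spc, db, rel))

    def page_set_lsn_check():
        # Every buffer stamped with this LSN must come from one relation,
        # so (LSN, page number) alone stays unique as an IV.
        assert len(set(registered_rnodes)) <= 1, \
            "one LSN would stamp pages of different relations"

    xlog_begin_insert()
    xlog_register_buffer(1663, 5, 16384)
    xlog_register_buffer(1663, 5, 16384)  # same relation: OK
    page_set_lsn_check()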

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Sat, Jul 27, 2019 at 12:27 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Fri, Jul 26, 2019 at 10:57 AM Jonathan S. Katz <jkatz@postgresql.org> wrote:
> >
> > Hi,
> >
> > Before my reply, I wanted to say that I've been lurking on this thread
> > for a bit as I've tried to better inform myself on encryption at rest
> > and how it will apply to what we want to build. I actually built a
> > (poor) prototype in Python of the key management system that Joe &
> > Masahiko both laid out, in addition to performing some "buffer
> > encrpytion" with it. It's not worth sharing at this point.
> >
> > With the disclaimer that I'm not as familiar with a lot of concepts as I
> > would like to be:
> >
> > On 7/25/19 1:54 PM, Masahiko Sawada wrote:
> > > On Fri, Jul 26, 2019 at 2:18 AM Bruce Momjian <bruce@momjian.us> wrote:
> > >>
> > >> On Thu, Jul 18, 2019 at 12:04:25PM +0900, Masahiko Sawada wrote:
> > >>> I've re-considered the design of the TDE feature based on the discussion
> > >>> so far. One of the main open questions is the granularity of
> > >>> encryption objects: cluster encryption or more-granular-than-cluster
> > >>> encryption. The following describes the new TDE design when we
> > >>> choose table-level encryption or something-new-group-level encryption.
> > >>>
> > >>> General
> > >>> ========
> > >>> We will use AES and support both AES-128 and AES-256. The user can specify
> > >>> a new initdb option, something like --aes-128 or --aes-256, to enable
> > >>> encryption and must specify --encryption-key-passphrase-command along
> > >>> with it. (I guess we also require the openssl library.) If these options are
> > >>> specified, we write the key length to the control file and derive the
> > >>> KEK and generate the MDEK during initdb. wal_log_hints will be enabled
> > >>> automatically in encryption mode, like we do for checksum mode.
> > >>
> > >> Agreed.  pg_control will store the none/AES128/AES256 indicator.
> > >>
> > >>> Key Management
> > >>> ==============
> > >>> We will use 3-tier key architecture as Joe proposed.
> > >>>
> > >>>   1. A master key encryption key (KEK): this is the key supplied by the
> > >>>      database admin using something akin to ssl_passphrase_command
> > >>>
> > >>>   2. A master data encryption key (MDEK): this is a generated key using a
> > >>>      cryptographically secure pseudo-random number generator. It is
> > >>>      encrypted using the KEK, probably with Key Wrap (KW):
> > >>>      or maybe better Key Wrap with Padding (KWP):
> > >>>
> > >>>   3a. Per table data encryption keys (TDEK): use MDEK and HKDF to generate
> > >>>       table specific keys.
> > >>
> > >> What is the value of a per-table encryption key?  How is HKDF derived?
> > >
> > > The per-table encryption key is derived from the MDEK with a salt and its OID as
> > > info. I think we can store the salts for each encryption key in a
> > > separate file so that off-line tools can also read them.
> >
> > +1 with using the info/salt for the HKDF as described above. The other
> > decision will be the hashing algorithm to use. SHA-256?
>
> Yeah, SHA-256 would be better for safety.

After more thought, I'm confused about why we need to have the MDEK. We
can use a KEK derived from the passphrase, with the TDEK and WDEK derived
from the KEK. That way, we don't need to store any key in a database
file. What is the advantage of the 3-tier key architecture?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



On Mon, Jul 29, 2019 at 4:39 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> After more thought, I'm confused about why we need to have the MDEK. We
> can use a KEK derived from the passphrase, with the TDEK and WDEK derived
> from the KEK. That way, we don't need to store any key in a database
> file. What is the advantage of the 3-tier key architecture?

The separate MDEK serves a couple purposes:

1. Allows for rotating the passphrase without actually changing any of
the downstream derived keys.
2. Verification that the passphrase itself is correct by checking if
it can unlock and authenticate (via a MAC) the MDEK.
3. Ensures it's generated from a strong random source (ex: /dev/urandom).

If the MDEK was directly derived via a deterministic function of the
passphrase, then that passphrase could never change, as all your
derived keys would also change (and thus could no longer decrypt their
existing data). The encrypted MDEK provides a level of indirection for
passphrase rotation.

An argument could be made to push that problem upstream, i.e. let the
supplier of the passphrase deal with the indirection. You would still
need to verify the supplied passphrase/key is correct via something
like authenticating against a stored MAC. If you're going to do that,
you might as well directly support decrypting and managing your own MDEK.
That also lets you ensure it was properly generated via a strong random
source.

Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/



On Mon, Jul 29, 2019 at 7:17 PM Sehrope Sarkuni <sehrope@jackdb.com> wrote:
>
> On Mon, Jul 29, 2019 at 4:39 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > After more thought, I'm confused about why we need to have the MDEK. We
> > can use a KEK derived from the passphrase, with the TDEK and WDEK derived
> > from the KEK. That way, we don't need to store any key in a database
> > file. What is the advantage of the 3-tier key architecture?
>
> The separate MDEK serves a couple purposes:
>
> 1. Allows for rotating the passphrase without actually changing any of
> the downstream derived keys.
> 2. Verification that the passphrase itself is correct by checking if
> it can unlock and authenticate (via a MAC) the MDEK.
> 3. Ensures it's generated from a strong random source (ex: /dev/urandom).
>
> If the MDEK was directly derived via a deterministic function of the
> passphrase, then that passphrase could never change, as all your
> derived keys would also change (and thus could no longer decrypt their
> existing data). The encrypted MDEK provides a level of indirection for
> passphrase rotation.

Understood. Thank you for the explanation!

>
> An argument could be made to push that problem upstream, i.e. let the
> supplier of the passphrase deal with the indirection. You would still
> need to verify the supplied passphrase/key is correct via something
> like authenticating against a stored MAC.

So do we need a key for the MAC of the passphrase/key in order to verify?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



On Mon, Jul 29, 2019 at 6:42 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > An argument could be made to push that problem upstream, i.e. let the
> > supplier of the passphrase deal with the indirection. You would still
> > need to verify the supplied passphrase/key is correct via something
> > like authenticating against a stored MAC.
>
> So do we need a key for the MAC of the passphrase/key in order to verify?

Yes. Any 128 or 256-bit value is a valid AES key, and any 16-byte input
can be "decrypted" with it in both CTR and CBC mode; you'll just end
up with garbage data if the key does not match. Verification of the
key prior to usage (i.e. starting the DB and encrypting/decrypting data)
is a must, as otherwise you'll end up with all kinds of corruption or
data loss.

From a single user supplied passphrase you would derive the MDEK and
compute a MAC (either using the same key or via a separate derived
MDEK-MAC key). If the computed MAC matches against the previously
stored value then you know the MDEK is correct as well.
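
A minimal sketch of that verification flow in Python (stdlib only; the
PBKDF2 parameters and the "MDEK-MAC" derivation label are illustrative
assumptions, not a settled scheme):

    import hashlib
    import hmac
    import os

    # Setup: derive a KEK from the passphrase and store an HMAC over the
    # MDEK, computed with a MAC key that is itself derived from the KEK.
    kdf_salt = os.urandom(16)
    mdek = os.urandom(32)

    def mac_for(passphrase: bytes) -> bytes:
        kek = hashlib.pbkdf2_hmac("sha256", passphrase, kdf_salt, 100_000)
        mac_key = hmac.new(kek, b"MDEK-MAC", hashlib.sha256).digest()
        return hmac.new(mac_key, mdek, hashlib.sha256).digest()

    stored_mac = mac_for(b"correct horse battery staple")

    # Startup: recompute the MAC from the supplied passphrase and compare
    # in constant time; refuse to start on a mismatch.
    def verify(passphrase: bytes) -> bool:
        return hmac.compare_digest(mac_for(passphrase), stored_mac)

    assert verify(b"correct horse battery staple")
    assert not verify(b"wrong passphrase")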

Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/



On Mon, Jul 29, 2019 at 11:33 AM Bruce Momjian <bruce@momjian.us> wrote:
>
> On Thu, Jul 25, 2019 at 01:03:06PM -0400, Bruce Momjian wrote:
> > On Tue, Jul 16, 2019 at 01:24:54PM +0900, Masahiko Sawada wrote:
> > > On Sat, Jul 13, 2019 at 12:33 AM Bruce Momjian <bruce@momjian.us> wrote:
> > > > then each row change gets its own LSN.  You are asking if an update that
> > > > just expires one row and adds it to a new page gets the same LSN.  I
> > > > don't know.
> > >
> > > The following script reproduces two different pages having the same LSN.
> > >
> > > =# create table test (a int);
> > > CREATE TABLE
> > > =# insert into test select generate_series(1, 226);
> > > INSERT 0 226
> > > =# update test set a = a where a = 1;
> > > UPDATE 1
> > > =# select lsn from page_header(get_raw_page('test', 0));
> > >     lsn
> > > -----------
> > >  0/1690488
> > > (1 row)
> > >
> > > =# select lsn from page_header(get_raw_page('test', 1));
> > >     lsn
> > > -----------
> > >  0/1690488
> > > (1 row)
> > >
> > > So I think it's better to use LSN and page number to create IV. If we
> > > modify different tables by single WAL we also would need OID or
> > > relfilenode but I don't think currently we have such operations.
> >
> > OK, good to know, thanks.
>
> I did some more research on which cases use a single LSN to modify
> multiple 8k pages.  The normal program flow is:
>
>         XLogBeginInsert();
>         ...
>         XLogRegisterBuffer(0, meta, ...
> -->     recptr = XLogInsert(RM_BRIN_ID, XLOG_...
>
>         page = BufferGetPage(meta);
>         PageSetLSN(page, recptr);
>
> XLogInsert() calls BufferGetTag(), which fills in the buffer's
> RelFileNode (which internally is the tablespace, database, and
> pg_class.relfilenode).  So, to use the LSN and page-number for the IV,
> we need to make sure that there is no other encryption of those values
> in a different relation.  What I did was to find cases where
> XLogRegisterBuffer/PageSetLSN are called more than once for a single
> LSN.  I found cases in:
>
>         brin_doupdate
>         brin_doinsert
>         brinRevmapDesummarizeRange
>         revmap_physical_extend
>         GenericXLogFinish
>         ginPlaceToPage
>         shiftList
>         ginDeletePage
>         gistXLogSplit
>         gistXLogPageDelete
>         gistXLogUpdate
>         hashbucketcleanup
>         _hash_doinsert
>         _hash_vacuum_one_page
>         _hash_addovflpage
>         _hash_freeovflpage
>         _hash_squeezebucket
>         _hash_init
>         _hash_expandtable
>         _hash_splitbucket
>         log_heap_visible
>         log_heap_update
>         _bt_insertonpg
>         _bt_split
>         _bt_newroot
>         _bt_getroot
>         _bt_mark_page_halfdead
>         _bt_unlink_halfdead_page
>         addLeafTuple
>         moveLeafs
>         doPickSplit
>         spgSplitNodeAction
>         log_newpage_range
>
> Most of these are either updating different pages in the same
> relation (so the page-number for the IV would be different), or are
> modifying other types of files, like vm.  (We have not discussed whether we
> are going to encrypt vm or fsm.  I am guessing we are not.)
>
> You might say, well, is it terrible if we reuse the LSN in a different
> relation with the same page number?  Yes.  The way CTR works, it
> generates a stream of bits using the key and IV (which will be LSN and
> page number).  It then XORs it with the page contents to encrypt it.  If
> we encrypt an all-zero gap in a page, or a place in the page where the
> page format is known, a user can XOR that with the encrypted data and
> get the bit stream at that point.  They can then go to another page that
> uses the same key and IV and XOR that to get the decrypted data.  CBC
> mode is slightly better because it mixes the user data into the future
> 16-byte blocks, but lots of our early-byte page format is known, so it
> isn't great.
>
> You might say, wow, that is a lot of places to make sure we don't reuse
> the LSN in a different relation with the same page number --- let's mix
> the relfilenode in the IV so we are sure the IV is not reused.
>
> Well, the pg_class.relfilenode is only unique within the
> tablespace/database, i.e., from src/include/storage/relfilenode.h:
>
>  * relNode identifies the specific relation.  relNode corresponds to
>  * pg_class.relfilenode (NOT pg_class.oid, because we need to be able
>  * to assign new physical files to relations in some situations).
>  * Notice that relNode is only unique within a database in a particular
>  * tablespace.
>
> So, we would need to mix the tablespace, database oid, and relfilenode
> into the IV to be unique.  We would then need to have pg_upgrade
> preserve the relfilenode, change CREATE DATABASE to decrypt/encrypt when
> creating a new database, and no longer allow files to be moved between
> tablespaces without decryption/encryption.
>
> There are just a whole host of complexities we add to encryption if we
> add the requirement of preserving the relfilenode, tablespace, and
> database to decrypt each page.  I just don't think we want to go there
> unless we have a valid reason.
>
> I am thinking of writing some Assert() code that checks that all buffers
> using a single LSN are from the same relation (and therefore different
> page numbers).  I would do it by creating a static array, clearing it on
> XLogBeginInsert(), adding to it for each  XLogInsert(), then checking on
> PageSetLSN() that everything in the array is from the same file.  Does
> that make sense?

I had the same concern before. We could have a BKPBLOCK_SAME_REL flag in
XLogRecordBlockHeader, which indicates that the relation of the block
is the same as the previous block and therefore we skip writing the
RelFileNode. At first glance I thought it was possible that one WAL
record could contain different RelFileNodes, but I didn't find any code
attempting to do that.

Checking that all buffers using a single LSN are from the same
relation would be a good idea, but I think it's hard to test it and
regard the test result as okay. Even if we passed 'make check-world',
it might still be possible for it to happen. And even assertion failures
don't happen in production environments. So I guess it would be better
to have an IV that we never reuse in a different relation with the same
page number. An idea I came up with is that we make the IV from (PageLSN,
PageNumber, relNode) and have the encryption keys per tablespace.
That way, we never reuse the IV in a different relation with the same page
number because relNode is unique within a database in a particular
tablespace as you mentioned.
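To make the layout concrete, a rough sketch of packing those fields into a
16-byte IV (the function and field names here are mine, purely for
illustration, not settled code):

    /*
     * Illustrative only: pack (PageLSN, PageNumber, relNode) into a
     * 16-byte AES IV.  Note these three fields already fill all 16
     * bytes, leaving no low-order bits for the per-16-byte-block CTR
     * counter, which is one of the trade-offs discussed later in this
     * thread.
     */
    #include <stdint.h>
    #include <string.h>

    static void
    build_page_iv(uint64_t page_lsn, uint32_t page_number,
                  uint32_t rel_node, unsigned char iv[16])
    {
        memcpy(iv, &page_lsn, sizeof(page_lsn));
        memcpy(iv + 8, &page_number, sizeof(page_number));
        memcpy(iv + 12, &rel_node, sizeof(rel_node));
    }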

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center





On Mon, Jul 29, 2019, 20:43 Masahiko Sawada <sawada.mshk@gmail.com> wrote:
That way, we never reuse the IV in a different relation with the same page
number because relNode is unique within a database in a particular
tablespace as you mentioned.

Sorry, I meant that we can ensure IV+key is unique.

--
Masahiko Sawada
On Mon, Jul 29, 2019 at 08:43:06PM +0900, Masahiko Sawada wrote:
> > I am thinking of writing some Assert() code that checks that all buffers
> > using a single LSN are from the same relation (and therefore different
> > page numbers).  I would do it by creating a static array, clearing it on
> > XLogBeginInsert(), adding to it for each  XLogInsert(), then checking on
> > PageSetLSN() that everything in the array is from the same file.  Does
> > that make sense?
> 
> I had the same concern before. We could have the BKPBLOCK_SAME_REL flag in
> XLogRecordBlockHeader, which indicates that the relation of the block
> is the same as the previous block so that we skip writing the
> RelFileNode. At first glance I thought it's possible that one WAL
> record can contain different RelFileNodes, but I didn't find any code
> attempting to do that.

Yes, the point is that the WAL record makes it possible, so we either
have to test for it or allow it.

> Checking that all buffers using a single LSN are from the same
> relation would be a good idea, but I think it's hard to test it and
> regard the test result as okay. Even if we passed 'make check-world',
> it might still be possible for it to happen. And even assertion failures

Yes, the problem is that if you embed the relfilenode or tablespace or
database in the encryption IV, you then need to make sure you
re-encrypt any files that move between these.  I am hesitant to do that
since it then requires these workarounds for encryption going forward.
We know that most people will not be using encryption, so that will not
be well tested either.  For pg_upgrade, I used a minimal-impact
approach, and it has allowed dramatic changes in our code without
requiring changes and retesting of pg_upgrade.

> don't happen in production environments. So I guess it would be better
> to have an IV that we never reuse in a different relation with the same
> page number. An idea I came up with is that we make the IV from (PageLSN,
> PageNumber, relNode) and have the encryption keys per tablespace.
> That way, we never reuse the IV in a different relation with the same page
> number because relNode is unique within a database in a particular
> tablespace as you mentioned.

Yes, this is what we are discussing.  Whether the relfilenode is part of
the IV, or we derive a key with a mix of the master encryption key and
relfilenode is mostly a matter of what fits into which bits.  With CTR,
I think we agreed it has to be LSN and page-number (and CTR counter),
and we only have 5 bits left.  If we wanted to add anything else, it
would be done via the creation of a derived key;  this was covered here:

    https://www.postgresql.org/message-id/CAH7T-ap1Q9yHjGSO4ZJaVhU3L=u14TSHmR++Ccc_Hk3EoqKpUQ@mail.gmail.com

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Sun, Jul 28, 2019 at 10:33:03PM -0400, Bruce Momjian wrote:
> I did some more research on which cases use a single LSN to modify
> multiple 8k pages.  The normal program flow is:
> 
>         XLogBeginInsert();
>     ...
> -->     XLogRegisterBuffer(0, meta, ...
>         recptr = XLogInsert(RM_BRIN_ID, XLOG_...
> 
>         page = BufferGetPage(meta);
>         PageSetLSN(page, recptr);
> 
> XLogInsert() calls BufferGetTag(), which fills in the buffer's

Correction, XLogRegisterBuffer() calls BufferGetTag().  I have updated the
quote above.  That is the function I checked, not XLogInsert().

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Mon, Jul 29, 2019 at 9:44 AM Bruce Momjian <bruce@momjian.us> wrote:
> Checking that all buffers using a single LSN are from the same
> relation would be a good idea, but I think it's hard to test it and
> regard the test result as okay. Even if we passed 'make check-world',
> it might still be possible for it to happen. And even assertion failures

Yes, the problem is that if you embed the relfilenode or tablespace or
database in the encryption IV, you then need to make sure you
re-encrypt any files that move between these.  I am hesitant to do that
since it then requires these workarounds for encryption going forward.
We know that most people will not be using encryption, so that will not
be well tested either.  For pg_upgrade, I used a minimal-impact
approach, and it has allowed dramatic changes in our code without
requiring changes and retesting of pg_upgrade.

Will there be a per-relation salt stored in a separate file? I saw it mentioned in a few places (most recently https://www.postgresql.org/message-id/aa386c3f-fb89-60af-c7a3-9263a633ca1a%40postgresql.org) but there's also discussion of trying to make the TDEK unique without a separate salt so I'm unsure.

With a per-relation salt there is no need to include fixed attributes (database, relfilenode, or tablespace) to ensure the derived key is unique per relation. A long salt (32-bytes from /dev/urandom) alone guarantees that uniqueness. Copying or moving files would then be possible by also copying the salt. It does not need to be a salt per file on disk either, one salt can be used for many files for the same relation by including the fork number, type, or segment in the TDEK derivation (so each file on disk for that relation ends up with a unique TDEK).
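To sketch what that derivation might look like (illustrative only; this
assumes OpenSSL 1.1.1's HKDF API, and the names, key sizes, and info labels
here are made up for the example, not an agreed-on design):

    /*
     * Derive a per-file TDEK from the master key (MDEK) using a stored
     * 32-byte per-relation salt, mixing the fork and segment numbers
     * into the HKDF "info" input so each file on disk gets its own key.
     */
    #include <openssl/evp.h>
    #include <openssl/kdf.h>
    #include <stdio.h>

    static int
    derive_tdek(const unsigned char mdek[32],  /* master data encryption key */
                const unsigned char salt[32],  /* per-relation random salt */
                int forknum, int segno,        /* distinguish files on disk */
                unsigned char tdek[32])
    {
        EVP_PKEY_CTX *pctx = EVP_PKEY_CTX_new_id(EVP_PKEY_HKDF, NULL);
        unsigned char info[64];
        int         infolen;
        size_t      outlen = 32;
        int         ok;

        if (pctx == NULL)
            return 0;
        infolen = snprintf((char *) info, sizeof(info),
                           "TDEK fork=%d seg=%d", forknum, segno);
        ok = EVP_PKEY_derive_init(pctx) > 0 &&
            EVP_PKEY_CTX_set_hkdf_md(pctx, EVP_sha256()) > 0 &&
            EVP_PKEY_CTX_set1_hkdf_key(pctx, mdek, 32) > 0 &&
            EVP_PKEY_CTX_set1_hkdf_salt(pctx, salt, 32) > 0 &&
            EVP_PKEY_CTX_add1_hkdf_info(pctx, info, infolen) > 0 &&
            EVP_PKEY_derive(pctx, tdek, &outlen) > 0;
        EVP_PKEY_CTX_free(pctx);
        return ok;
    }

Because the fork and segment numbers go into the HKDF info input, one stored
32-byte salt per relation would be enough to give every file on disk its own
TDEK.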

There's the usual gotcha with copying encrypted data: the copies are byte-for-byte identical, so it's obvious they hold the same contents. But any subsequent changes would have a different LSN and encrypt differently going forward. If the main use cases are copying an entire database or moving a tablespace, having that be simpler/faster seems like a good idea. It could be a known limitation, like promoting multiple replicas. Plus, with a key rotation tool, anyone who wants everything re-encrypted could run one after the copy.

Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/ 
On 2019-Jul-27, Bruce Momjian wrote:

> I think using LSN and page number, we will _never_ reuse the IV, except
> for cases like promoting two standbys, which I think we have to document
> as an insecure practice.

Actually, why is it an insecure practice?  If you promote two standbys,
then the encrypted pages are the same pages, so it's not two different
messages with the same key/IV -- they're still *one* message.  And as
soon as they start getting queries, they will most likely diverge
because the LSNs of records after the promotion will (most likely) no
longer match.  It takes one different WAL record length for the
"encryption histories" to diverge completely ...

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



On 2019-Jul-27, Sehrope Sarkuni wrote:

> Given the non-cryptographic nature of CRC and its 16-bit size, I'd
> round down the malicious tamper detection it provides to zero. At best
> it catches random disk errors so might as well keep it in plain text
> and checkable offline.

But what attack are we protecting against?  We fear that somebody will
steal a disk or a backup.  We don't fear that they will *write* data.
The CRC is there to protect against data corruption.  So whether or not
the CRC protects against malicious tampering is beside the point.

If we were trying to protect against an attacker having access to
*writing* data in the production server, this encryption scheme is
useless: they could just as well read unencrypted data from shared
buffers anyway.

I think trying to protect against malicious data tampering is a second
step *after* this one is done.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



On Mon, Jul 29, 2019 at 4:10 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
On 2019-Jul-27, Bruce Momjian wrote:

> I think using LSN and page number, we will _never_ reuse the IV, except
> for cases like promoting two standbys, which I think we have to document
> as an insecure practice.

Actually, why is it an insecure practice?  If you promote two standbys,
then the encrypted pages are the same pages, so it's not two different
messages with the same key/IV -- they're still *one* message.  And as
soon as they start getting queries, they will most likely diverge
because the LSNs of records after the promotion will (most likely) no
longer match.  It takes one different WAL record length for the
"encryption histories" to diverge completely ...

You could have a sequence of post promotion events like:

# Replica 1
LSN=t+0 Operation A
LSN=t+1 Operation B
...
LSN=t+n Operation C

# Replica 2
LSN=t+0 Operation X
LSN=t+1 Operation Y
...
LSN=t+n Operation Z

If the LSN and modified page numbers of C and Z are the same
... and the net effect of Z is known (ex: setting a bunch of bytes on the row to zero)
... and you can read the encrypted pages of both replicas (ex: have access to the encrypted storage tier but not necessarily the live server)
... then you can XOR the encrypted pages to get the plain text for the bytes after operation C.

Yes, it's not likely and yes it has a lot of "if..." involved, but it is possible.
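To make the XOR recovery concrete, here is a rough standalone sketch
(OpenSSL's EVP API with a made-up key and page contents; illustrative only,
not PostgreSQL code) showing how two CTR ciphertexts under the same key+IV
leak each other:

    /*
     * Encrypt two different "pages" with the same key and IV in
     * AES-CTR, then recover the second plaintext from the two
     * ciphertexts plus the known first plaintext alone.
     * Build with: cc xor_demo.c -lcrypto
     */
    #include <openssl/evp.h>
    #include <stdio.h>

    static void
    ctr_encrypt(const unsigned char *key, const unsigned char *iv,
                const unsigned char *in, unsigned char *out, int len)
    {
        EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
        int         n;

        EVP_EncryptInit_ex(ctx, EVP_aes_256_ctr(), NULL, key, iv);
        EVP_EncryptUpdate(ctx, out, &n, in, len);
        EVP_CIPHER_CTX_free(ctx);
    }

    int
    main(void)
    {
        unsigned char key[32] = "0123456789abcdef0123456789abcdef";
        unsigned char iv[16] = {0};     /* same IV reused: the bug */
        unsigned char page1[16] = "known plaintext!"; /* e.g. page header */
        unsigned char page2[16] = "secret contents!";
        unsigned char c1[16], c2[16], leaked[16];
        int         i;

        ctr_encrypt(key, iv, page1, c1, 16);
        ctr_encrypt(key, iv, page2, c2, 16);

        /* c1 XOR c2 == page1 XOR page2, so a known page1 exposes page2 */
        for (i = 0; i < 16; i++)
            leaked[i] = c1[i] ^ c2[i] ^ page1[i];

        printf("recovered: %.16s\n", (const char *) leaked);
        return 0;
    }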

I don't think this will be an issue in practice, but it should be documented. Otherwise, it's not unreasonable for someone to expect that a promoted replica would be using new keys for everything after each promotion.

Encryption for WAL can avoid this type of problem entirely by generating a new random salt and adding a "Use new salt XYZ for WDEK going forward" record. The two replicas would generate different salts so all subsequent encrypted WAL data would be different (even the exact same records). Unfortunately, that doesn't work for pages without a lot more complexity to keep track of which key version to use based upon the LSN.
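As a concrete shape for such a record (purely hypothetical; no such record
type exists today), it could be as small as:

    /*
     * Hypothetical WAL record emitted at promotion or startup:
     * everything after it would be encrypted with a new derived key,
     * e.g. WDEK' = HKDF(MDEK, new_salt).
     */
    #include <stdint.h>

    typedef struct xl_encryption_new_salt
    {
        uint8_t     new_salt[32];   /* fresh salt from a strong random source */
    } xl_encryption_new_salt;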

Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/
On Mon, Jul 29, 2019 at 4:15 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
On 2019-Jul-27, Sehrope Sarkuni wrote:

> Given the non-cryptographic nature of CRC and its 16-bit size, I'd
> round down the malicious tamper detection it provides to zero. At best
> it catches random disk errors so might as well keep it in plain text
> and checkable offline.

But what attack are we protecting against?  We fear that somebody will
steal a disk or a backup.  We don't fear that they will *write* data.
The CRC is there to protect against data corruption.  So whether or not
the CRC protects against malicious tampering is beside the point.

That was in response to using an encrypted CRC for tamper detection. I agree that it does not provide meaningful protection so there is no point in adding complexity to use it for that.

I agree it's better to leave the CRC as-is for detecting corruption which also has the advantage of playing nice with existing checksum tooling.
 
If we were trying to protect against an attacker having access to
*writing* data in the production server, this encryption scheme is
useless: they could just as well read unencrypted data from shared
buffers anyway.

The attack situation is someone being able to modify pages at the storage tier. They cannot necessarily read server memory or the encryption key, but they could make changes to existing data or an existing backup that would be subsequently read by the server.

Dealing with that is way out of scope but similar to the replica promotion I think it should be kept track of and documented.
 
I think trying to protect against malicious data tampering is a second
step *after* this one is done.

+1

Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/

On 7/29/19 6:11 PM, Sehrope Sarkuni wrote:
> On Mon, Jul 29, 2019 at 4:15 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
>     On 2019-Jul-27, Sehrope Sarkuni wrote:
>
>     > Given the non-cryptographic nature of CRC and its 16-bit size, I'd
>     > round down the malicious tamper detection it provides to zero. At best
>     > it catches random disk errors so might as well keep it in plain text
>     > and checkable offline.
>
>     But what attack are we protecting against?  We fear that somebody will
>     steal a disk or a backup.  We don't fear that they will *write* data.
>     The CRC is there to protect against data corruption.  So whether or not
>     the CRC protects against malicious tampering is beside the point.
>
>
> That was in response to using an encrypted CRC for tamper detection. I
> agree that it does not provide meaningful protection so there is no
> point in adding complexity to use it for that.
>
> I agree it's better to leave the CRC as-is for detecting corruption
> which also has the advantage of playing nice with existing checksum tooling.
>  
>
>     If we were trying to protect against an attacker having access to
>     *writing* data in the production server, this encryption scheme is
>     useless: they could just as well read unencrypted data from shared
>     buffers anyway.
>
>
> The attack situation is someone being able to modify pages at the
> storage tier. They cannot necessarily read server memory or the
> encryption key, but they could make changes to existing data or an
> existing backup that would be subsequently read by the server.
>
> Dealing with that is way out of scope but similar to the replica
> promotion I think it should be kept track of and documented.
>  
>
>     I think trying to protect against malicious data tampering is a second
>     step *after* this one is done.
>
>
> +1

Well said; +1

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


On Mon, Jul 29, 2019 at 04:09:52PM -0400, Alvaro Herrera wrote:
> On 2019-Jul-27, Bruce Momjian wrote:
> 
> > I think using LSN and page number, we will _never_ reuse the IV, except
> > for cases like promoting two standbys, which I think we have to document
> > as an insecure practice.
> 
> Actually, why is it an insecure practice?  If you promote two standbys,
> then the encrypted pages are the same pages, so it's not two different
> messages with the same key/IV -- they're still *one* message.  And as
> soon as they start getting queries, they will most likely diverge
> because the LSNs of records after the promotion will (most likely) no
> longer match.  It takes one different WAL record length for the
> "encryption histories" to diverge completely ...

That is a very good point, but if the LSN were reused in _any_ table with
the same page number, it would be insecure, and it would be easy to scan
for such cases.  However, you are right that it is rarer than I
thought.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Mon, Jul 29, 2019 at 05:53:40PM -0400, Sehrope Sarkuni wrote:
> I don't think this will be an issue in practice, but it should be documented.
> Otherwise, it's not unreasonable for someone to expect that a promoted replica
> would be using new keys for everything after each promotion.
> 
> Encryption for WAL can avoid this type of problem entirely by generating a new
> random salt and adding a "Use new salt XYZ for WDEK going forward" record. The
> two replicas would generate different salts so all subsequent encrypted WAL
> data would be different (even the exact same records). Unfortunately, that
> doesn't work for pages without a lot more complexity to keep track of which key
> version to use based upon the LSN.

Oh, yeah, WAL is the big issue here, not the heap/index files, since we
know they will use the same segment number in both clusters.  We can't
use the timeline in the WAL IV since they will both be on the same
timeline.  Anyway, I think the heap/index is still an issue so we should
just document "don't do that".

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Sun, Jul 28, 2019 at 10:33:03PM -0400, Bruce Momjian wrote:
> I am thinking of writing some Assert() code that checks that all buffers
> using a single LSN are from the same relation (and therefore different
> page numbers).  I would do it by creating a static array, clearing it on
> XLogBeginInsert(), adding to it for each  XLogInsert(), then checking on
> PageSetLSN() that everything in the array is from the same file.  Does
> that make sense?

So, I started looking at how to implement the Assert checks and found
that Heikki has already added (in commit 2c03216d83) Assert checks to
avoid duplicate block numbers in WAL.  I just added the attached patch
to check that all RelFileNodes are the same.

I ran the regression tests with asserts on and got no failures, so I
think we are good.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +

On Mon, Jul 29, 2019 at 8:35 PM Bruce Momjian <bruce@momjian.us> wrote:
On Sun, Jul 28, 2019 at 10:33:03PM -0400, Bruce Momjian wrote:
> I am thinking of writing some Assert() code that checks that all buffers
> using a single LSN are from the same relation (and therefore different
> page numbers).  I would do it by creating a static array, clearing it on
> XLogBeginInsert(), adding to it for each  XLogInsert(), then checking on
> PageSetLSN() that everything in the array is from the same file.  Does
> that make sense?

So, I started looking at how to implement the Assert checks and found
that Heikki has already added (in commit 2c03216d83) Assert checks to
avoid duplicate block numbers in WAL.  I just added the attached patch
to check that all RelFileNodes are the same.

From the patch:

/*
! * The initialization vector (IV) is used for page-level
! * encryption.  We use the LSN and page number as the IV, and IV
! * values must never be reused since it is insecure. It is safe
! * to use the LSN on multiple pages in the same relation since
! * the page number is part of the IV.  It is unsafe to reuse the
! * LSN in different relations because the page number might be
! * the same, and hence the IV.  Therefore, we check here that
! * we don't have WAL records for different relations using the
! * same LSN.
! */

If each relation file has its own derived key, the derived TDEK for that relation file, then there is no issue with reusing the same IV = LSN || Page Number. The TDEKs will be different so Key + IV will never collide.

In general it's fine to use the same IV with different keys. Only reuse of Key + IV is a problem and the entire set of possible counter values (IV + 0, IV + 1, ...) generated with a key must be unique. That's also why we must leave at least log2(PAGE_SIZE / AES_BLOCK_SIZE) bits at the end of the IV to be filled in with 0, 1, 2, ... for each 16-byte AES-block on the page (for 8k pages that is log2(8192 / 16) = 9 bits; 11 bits covers pages up to 32k). If our per-page IV prefix used any of those bits then the counter could overflow into the next page's IV's range.

I ran the regression tests with asserts on and got no failures, so I
think we are good.

It's not strictly required but it also doesn't hurt that LSN is unique per-relation so that's still good news!

Might be useful for something down the road like a separate stream of MACs computed per-LSN.

Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/ 
 
On Mon, Jul 29, 2019 at 8:18 PM Sehrope Sarkuni <sehrope@jackdb.com> wrote:
>
> On Mon, Jul 29, 2019 at 6:42 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > An argument could be made to push that problem upstream, i.e. let the
> > > supplier of the passphrase deal with the indirection. You would still
> > > need to verify the supplied passphrase/key is correct via something
> > > like authenticating against a stored MAC.
> >
> > So do we need the key for MAC of passphrase/key in order to verify?
>
> Yes. Any 128 or 256-bit value is a valid AES key and any 16-byte input
> can be "decrypted" with it in both CTR and CBC mode, you'll just end
> up with garbage data if the key does not match. Verification of the
> key prior to usage (i.e. starting DB and encrypting/decrypting data)
> is a must as otherwise you'll end up with all kinds of corruption or
> data loss.
>

Do you mean that we encrypt a 16-byte input with the correct key, store
it on disk, and then decrypt it with the user-supplied key and compare
the result to the input data?

> From a single user supplied passphrase you would derive the MDEK and
> compute a MAC (either using the same key or via a separate derived
> MDEK-MAC key). If the computed MAC matches against the previously
> stored value then you know the MDEK is correct as well.

You meant KEK, not MDEK?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



On Tue, Jul 30, 2019 at 07:44:20AM -0400, Sehrope Sarkuni wrote:
> On Mon, Jul 29, 2019 at 8:35 PM Bruce Momjian <bruce@momjian.us> wrote:
> From the patch:
> 
> /*
> ! * The initialization vector (IV) is used for page-level
> ! * encryption.  We use the LSN and page number as the IV, and IV
> ! * values must never be reused since it is insecure. It is safe
> ! * to use the LSN on multiple pages in the same relation since
> ! * the page number is part of the IV.  It is unsafe to reuse the
> ! * LSN in different relations because the page number might be
> ! * the same, and hence the IV.  Therefore, we check here that
> ! * we don't have WAL records for different relations using the
> ! * same LSN.
> ! */
> 
> If each relation file has its own derived key, the derived TDEK for that
> relation file, then there is no issue with reusing the same IV = LSN || Page
> Number. The TDEKs will be different so Key + IV will never collide.

So, this email explains that we are considering not using the
relfilenode/tablespace/database to create a derived key per relation,
but using the same key for all relations because the IV will be unique per
page across all relations:

    https://www.postgresql.org/message-id/20190729134442.2bxakegiqafxgj6u@momjian.us

There is talk of using a derived key with a constant to make sure all
heap/index files use a different derived key than WAL, but I am not
sure.  This is related to whether WAL IVs and per-heap/index IVs can
collide.

There are other emails in the thread that also discuss the topic.  The
issue is that we add a lot of complexity to other parts of the system
(e.g. pg_upgrade, CREATE DATABASE, moving relations between tablespaces)
to create a derived key, so we should make sure we need it before we do
it.

> In general it's fine to use the same IV with different keys. Only reuse of Key
> + IV is a problem and the entire set of possible counter values (IV + 0, IV +
> 1, ...) generated with a key must be unique. That's also why we must leave at
> least log2(PAGE_SIZE / AES_BLOCK_SIZE) bits at the end of the IV to be filled
> in with 0, 1, 2, ... for each 16-byte AES-block on the page. If our per-page IV
> prefix used any of those bits then the counter could overflow into the next
> page's IV's range.

Agreed.

Attached is an updated patch that checks only main relation forks, which
I think are the only files we are going to encrypt, and it has better
comments.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +

On Tue, Jul 30, 2019 at 8:16 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Mon, Jul 29, 2019 at 8:18 PM Sehrope Sarkuni <sehrope@jackdb.com> wrote:
>
> On Mon, Jul 29, 2019 at 6:42 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > An argument could be made to push that problem upstream, i.e. let the
> > > supplier of the passphrase deal with the indirection. You would still
> > > need to verify the supplied passphrase/key is correct via something
> > > like authenticating against a stored MAC.
> >
> > So do we need the key for MAC of passphrase/key in order to verify?
>
> Yes. Any 128 or 256-bit value is a valid AES key and any 16-byte input
> can be "decrypted" with it in both CTR and CBC mode, you'll just end
> up with garbage data if the key does not match. Verification of the
> key prior to usage (i.e. starting DB and encrypting/decrypting data)
> is a must as otherwise you'll end up with all kinds of corruption or
> data loss.
>

Do you mean that we encrypt a 16-byte input with the correct key, store
it on disk, and then decrypt it with the user-supplied key and compare
the result to the input data?

Yes but we don't compare via decryption of a known input. We instead compute a MAC of the encrypted master key using the user supplied key, and compare that against an expected MAC stored alongside the encrypted master key.

The pseudo code would be something like:

// Read key text from user:
string raw_kek = read_from_user()
// Normalize it to a fixed size of 64-bytes
byte[64] kek = SHA512(SHA512(raw_kek))
// Split the 64-bytes into a separate encryption and MAC key
byte[32] user_encryption_key = kek.slice(0,32)
byte[32] user_mac_key = kek.slice(32, 64)

// Read our saved MAC and encrypted master key
byte[80] mac_iv_encrypted_master_key = read_from_file()
// First 32-bytes is the MAC of the rest
byte[32] expected_mac = mac_iv_encrypted_master_key.slice(0, 32)
// Rest is a random IV + Encrypted master key
byte[48] iv_encrypted_master_key = mac_iv_encrypted_master_key.slice(32, 80)

// Compute the MAC with the user supplied key
byte[32] actual_mac = HMAC(user_mac_key, iv_encrypted_master_key)
// If it does not match then the user key is invalid
if (actual_mac != expected_mac) {
  print_err_and_exit("Bad user key!") 
}

// Our MAC was correct
// ... so we know user supplied key is valid
// ... and we know our iv and encrypted_key are valid
byte[16] iv = iv_encrypted_master_key.slice(0,16)
byte[32] encrypted_master_key = iv_encrypted_master_key.slice(16, 48)
// ... so we can use all three to decrypt the master key (MDEK)
byte[32] master_key = decrypt_aes_cbc(user_encryption_key, iv, encrypted_master_key)


> From a single user supplied passphrase you would derive the MDEK and
> compute a MAC (either using the same key or via a separate derived
> MDEK-MAC key). If the computed MAC matches against the previously
> stored value then you know the MDEK is correct as well.

You meant KEK, not MDEK?

If the KEK is incorrect then the MAC validation would fail and the decrypt would never be attempted.

If the MAC matches then both the KEK (user supplied key) and MDEK ("master_key" in the pseudo code above) would be confirmed to be valid. So the MDEK is safe to use for deriving keys for encrypt / decrypt.

I'm using the definitions for "KEK" and "MDEK" from Joe's mail https://www.postgresql.org/message-id/c878de71-a0c3-96b2-3e11-9ac2c35357c3%40joeconway.com

Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/

 
On Tue, Jul 30, 2019 at 5:03 AM Sehrope Sarkuni <sehrope@jackdb.com> wrote:
>
> On Mon, Jul 29, 2019 at 9:44 AM Bruce Momjian <bruce@momjian.us> wrote:
>>
>> > Checking that all buffers using a single LSN are from the same
>> > relation would be a good idea, but I think it's hard to test it and
>> > regard the test result as okay. Even if we passed 'make check-world',
>> > it might still be possible for it to happen. And even assertion failures
>>
>> Yes, the problem is that if you embed the relfilenode or tablespace or
>> database in the encryption IV, you then need to make sure you
>> re-encrypt any files that move between these.  I am hesitant to do that
>> since it then requires these workarounds for encryption going forward.
>> We know that most people will not be using encryption, so that will not
>> be well tested either.  For pg_upgrade, I used a minimal-impact
>> approach, and it has allowed dramatic changes in our code without
>> requiring changes and retesting of pg_upgrade.
>
>
> Will there be a per-relation salt stored in a separate file? I saw it mentioned in a few places (most recently
> https://www.postgresql.org/message-id/aa386c3f-fb89-60af-c7a3-9263a633ca1a%40postgresql.org) but there's also
> discussion of trying to make the TDEK unique without a separate salt so I'm unsure.
>
> With a per-relation salt there is no need to include fixed attributes (database, relfilenode, or tablespace) to
> ensure the derived key is unique per relation. A long salt (32-bytes from /dev/urandom) alone guarantees that
> uniqueness. Copying or moving files would then be possible by also copying the salt. It does not need to be a salt
> per file on disk either, one salt can be used for many files for the same relation by including the fork number,
> type, or segment in the TDEK derivation (so each file on disk for that relation ends up with a unique TDEK).

If we can derive a unique TDEK using (database, tablespace, relfilenode)
as info I think it's better to use that rather than a random salt
per relation since it doesn't require storing additional information.
As described in the HKDF RFC[1], if the input key is already
present as a cryptographically strong key we can skip the extract part,
which is where a salt is used.

[1] https://tools.ietf.org/html/rfc5869

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



On Tue, Jul 30, 2019 at 8:16 AM Bruce Momjian <bruce@momjian.us> wrote:
On Tue, Jul 30, 2019 at 07:44:20AM -0400, Sehrope Sarkuni wrote:
> If each relation file has its own derived key, the derived TDEK for that
> relation file, then there is no issue with reusing the same IV = LSN || Page
> Number. The TDEKs will be different so Key + IV will never collide.

So, this email explains that we are considering not using the
relfilenode/tablespace/database to create a derived key per relation,
but using the same key for all relations because the IV will be unique per
page across all relations:

        https://www.postgresql.org/message-id/20190729134442.2bxakegiqafxgj6u@momjian.us

There is talk of using a derived key with a constant to make sure all
heap/index files use a different derived key than WAL, but I am not
sure.  This is related to whether WAL IVs and per-heap/index IVs can
collide.

Ah, I read that to imply that derived keys were a must. Specifically this piece at the end:

From Joe's email on 2019-07-13 18:41:34:
>> Based on all of that I cannot find a requirement that we use more than
>> one key per database.
>>
>> But I did find that files in an encrypted file system are encrypted with
>> derived keys from a master key, and I view this as analogous to what we
>> are doing.

I read that as the "one key per database" is the MDEK.

And I read the part about derived keys as referring to separate derived keys for relations. Perhaps I misread and it was referring to different keys for WAL vs pages.
 
There are other emails in the thread that also discuss the topic.  The
issue is that we add a lot of complexity to other parts of the system
(e.g. pg_upgrade, CREATE DATABASE, moving relations between tablespaces)
to create a derived key, so we should make sure we need it before we do
it.

Yes, it definitely complicates things, both the derivation and the potential additional storage for the salts (they're small and fixed size, but you still need to put them somewhere).

I think key rotation for the TDEK will be impossible without some stored salt and a per-relation derived key. It might not be needed in a first cut though, as the "default salt" could be no salt or a placeholder of all zeroes. Even if the rotation itself is out of scope for a first pass, the potential to eventually add it should be there.

We should keep in mind that because we do not have a MAC on the encrypted pages, we'll need to know which derived key to use. We can't try multiple options to see which is correct, as any key would "succeed" and produce garbage decrypted data.
 
> In general it's fine to use the same IV with different keys. Only reuse of Key
> + IV is a problem and the entire set of possible counter values (IV + 0, IV +
> 1, ...) generated with a key must be unique. That's also why we must leave at
> least log2(PAGE_SIZE / AES_BLOCK_SIZE) bits at the end of the IV to be filled
> in with 0, 1, 2, ... for each 16-byte AES-block on the page. If our per-page IV
> prefix used any of those bits then the counter could overflow into the next
> page's IV's range.

Agreed.

Attached is an updated patch that checks only main relation forks, which
I think are the only files we are going to encrypt, and it has better
comments.

Okay that makes sense in the context of using a single key and relying on the LSN based IV to be unique.

Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/

 
On Tue, Jul 30, 2019 at 10:06 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Tue, Jul 30, 2019 at 5:03 AM Sehrope Sarkuni <sehrope@jackdb.com> wrote:
>
> On Mon, Jul 29, 2019 at 9:44 AM Bruce Momjian <bruce@momjian.us> wrote:
>>
>> > Checking that all buffers using a single LSN are from the same
>> > relation would be a good idea, but I think it's hard to test it and
>> > regard the test result as okay. Even if we passed 'make check-world',
>> > it might still be possible for it to happen. And even assertion failures
>>
>> Yes, the problem is that if you embed the relfilenode or tablespace or
>> database in the encryption IV, you then need to make sure you
>> re-encrypt any files that move between these.  I am hesitant to do that
>> since it then requires these workarounds for encryption going forward.
>> We know that most people will not be using encryption, so that will not
>> be well tested either.  For pg_upgrade, I used a minimal-impact
>> approach, and it has allowed dramatic changes in our code without
>> requiring changes and retesting of pg_upgrade.
>
>
> Will there be a per-relation salt stored in a separate file? I saw it mentioned in a few places (most recently https://www.postgresql.org/message-id/aa386c3f-fb89-60af-c7a3-9263a633ca1a%40postgresql.org) but there's also discussion of trying to make the TDEK unique without a separate salt so I'm unsure.
>
> With a per-relation salt there is no need to include fixed attributes (database, relfilenode, or tablespace) to ensure the derived key is unique per relation. A long salt (32-bytes from /dev/urandom) alone guarantees that uniqueness. Copying or moving files would then be possible by also copying the salt. It does not need to be a salt per file on disk either, one salt can be used for many files for the same relation by including the fork number, type, or segment in the TDEK derivation (so each file on disk for that relation ends up with a unique TDEK).

If we can derive a unique TDEK using (database, tablespace, relfilenode)
as info I think it's better to use that rather than a random salt
per relation since it doesn't require storing additional information.
As described in the HKDF RFC[1], if the input key is already
present as a cryptographically strong key we can skip the extract part,
which is where a salt is used.

[1] https://tools.ietf.org/html/rfc5869

Yes, a random salt is not required for security reasons. Any unique values mixed into the HKDF are fine and the derived keys will still be unique. The HKDF ensures that uniqueness.

The separate salt allows you to disconnect the key derivation from the physical attributes of the file. The physical attributes (ex: database, tablespace, file node) are very convenient as they're unique and do not require additional storage. However using them prevents copying or moving the encrypted files as one or more of them would be different at the destination (so the derived key would no longer decrypt the existing data). So you would have to decrypt / encrypt everything as part of a copy.

If copying raw files without a decrypt/encrypt cycle is desired then the key derivation cannot include physical attributes (or, per Bruce's note above, there would be no separate per-relation key derivation at all). I thought it'd be a nice property to have as it limits the amount of code that needs to be crypto aware (ex: copying a database or moving a table to a different tablespace would not change beyond ensuring the salt is also copied).

Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/
On Tue, Jul 30, 2019 at 10:14:14AM -0400, Sehrope Sarkuni wrote:
>     > In general it's fine to use the same IV with different keys. Only reuse
>     > of Key + IV is a problem and the entire set of possible counter values
>     > (IV + 0, IV + 1, ...) generated with a key must be unique. That's also
>     > why we must leave at least log2(PAGE_SIZE / AES_BLOCK_SIZE) bits at the
>     > end of the IV to be filled in with 0, 1, 2, ... for each 16-byte
>     > AES-block on the page. If our per-page IV prefix used any of those bits
>     > then the counter could overflow into the next page's IV's range.
>
>     Agreed.
>
>     Attached is an updated patch that checks only main relation forks, which
>     I think are the only files we are going to encrypt, and it has better
>     comments.
>
>
> Okay that makes sense in the context of using a single key and relying on the
> LSN based IV to be unique.

I had more time to think about the complexity of adding relfilenode to
the IV.  Since relfilenode is only unique within a database/tablespace,
we would need to have pg_upgrade preserve database/tablespace oids
(which I assume are the same as the directory and tablespace symlinks). 
Then, to decode a page, you would need to look up those values.  This is
in addition to the new complexity of CREATE DATABASE and moving files
between tablespaces.  I am also concerned that crash recovery operations
and cluster forensics and repair would need to also deal with this.

I am not even clear if pg_upgrade preserving relfilenode is possible ---
when we wrap the relfilenode counter, does it start at 1 or at the
first-user-relation-oid?  If the former, it could conflict with oids
assigned to new system tables in later major releases.  Tying the
preservation of relations to two restrictions seems risky.

Using just the page LSN and page number allows a page to be
decrypted/encrypted independently of its file name, tablespace, and
database, and I think that is a win for simplicity.  Of course, if it is
insecure we will not do it.

I am thinking for the heap/index IV, it would be:

    uint64 lsn;
    uint32 page_number;
    /* only uses 11 bits for a zero-based CTR counter for 32k pages */
    uint32 counter;

and for WAL it would be:

    uint64 segment_number;
    uint32    counter;
    /* guarantees this IV doesn't match any relation IV */
    uint32   2^32-1 /* all 1's */    
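As a sketch of how those fields might be packed into the 16-byte CTR IV
(byte order and exact placement are my assumptions, nothing settled):

    #include <stdint.h>
    #include <string.h>

    /* heap/index IV: LSN, page number, counter starting at 0 */
    static void
    build_rel_iv(uint64_t lsn, uint32_t page_number, unsigned char iv[16])
    {
        uint32_t    counter = 0;    /* only low 11 bits used for 32k pages */

        memcpy(iv, &lsn, 8);
        memcpy(iv + 8, &page_number, 4);
        memcpy(iv + 12, &counter, 4);
    }

    /*
     * WAL IV: segment number, counter, and an all-1's pad; a relation
     * IV's counter field never reaches 2^32-1, so the two cannot match.
     */
    static void
    build_wal_iv(uint64_t segment_number, unsigned char iv[16])
    {
        uint32_t    counter = 0;
        uint32_t    pad = 0xFFFFFFFF;

        memcpy(iv, &segment_number, 8);
        memcpy(iv + 8, &counter, 4);
        memcpy(iv + 12, &pad, 4);
    }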

Anyway, these are my thoughts so far.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Mon, Jul 29, 2019 at 10:44 PM Bruce Momjian <bruce@momjian.us> wrote:
>
> On Mon, Jul 29, 2019 at 08:43:06PM +0900, Masahiko Sawada wrote:
> > > I am thinking of writing some Assert() code that checks that all buffers
> > > using a single LSN are from the same relation (and therefore different
> > > page numbers).  I would do it by creating a static array, clearing it on
> > > XLogBeginInsert(), adding to it for each  XLogInsert(), then checking on
> > > PageSetLSN() that everything in the array is from the same file.  Does
> > > that make sense?
> >
> > I had the same concern before. We could have the BKPBLOCK_SAME_REL flag in
> > XLogRecordBlockHeader, which indicates that the relation of the block
> > is the same as the previous block so that we skip writing the
> > RelFileNode. At first glance I thought it's possible that one WAL
> > record can contain different RelFileNodes, but I didn't find any code
> > attempting to do that.
>
> Yes, the point is that the WAL record makes it possible, so we either
> have to test for it or allow it.
>
> > Checking that all buffers using a single LSN are from the same
> > relation would be a good idea, but I think it's hard to test it and
> > regard the test result as okay. Even if we passed 'make check-world',
> > it might still be possible for it to happen. And even assertion failures
>
> Yes, the problem is that if you embed the relfilenode or tablespace or
> database in the encryption IV, you then need to make sure you
> re-encrypt any files that move between these.  I am hesitant to do that
> since it then requires these workarounds for encryption going forward.
> We know that most people will not be using encryption, so that will not
> be well tested either.  For pg_upgrade, I used a minimal-impact
> approach, and it has allowed dramatic changes in our code without
> requiring changes and retesting of pg_upgrade.
>
> > don't happen in production environments. So I guess it would be better
> > to have an IV that we never reuse in a different relation with the same
> > page number. An idea I came up with is that we make the IV from (PageLSN,
> > PageNumber, relNode) and have the encryption keys per tablespace.
> > That way, we never reuse the IV in a different relation with the same page
> > number because relNode is unique within a database in a particular
> > tablespace as you mentioned.
>
> Yes, this is what we are discussing.  Whether the relfilenode is part of
> the IV, or we derive a key with a mix of the master encryption key and
> relfilenode is mostly a matter of what fits into which bits.  With CTR,
> I think we agreed it has to be LSN and page-number (and CTR counter),
> and we only have 5 bits left.  If we wanted to add anything else, it
> would be done via the creation of a derived key;  this was covered here:
>

Just to confirm, we have 21 bits left in the nonce for CTR? We have the LSN (8
bytes), page-number (4 bytes) and counter (11 bits) in the 16-byte nonce
space. Even though we have 21 bits left, we cannot store the relfilenode in
the IV.

BTW I've received a review of the current design from some
cryptologists in our company. They recommended using CTR rather than
CBC. The main reason is that with a block cipher it's important to
ensure uniqueness for every input block to the cipher. CBC makes that
hard to ensure because the previous output is the next block's
input, whereas CTR encrypts each block separately with
key+nonce, so if we can ensure the uniqueness of the IV we can meet
that requirement. Also, it's not necessary to encrypt the IV as it's
okay for it to be predictable. So I vote for CTR, at least for
tables/indexes encryption; there might already be a consensus, though.

For WAL encryption, before flushing WAL we encrypt the whole 8k WAL page
and then write only the encrypted data of the new WAL record using
pg_pwrite() rather than writing the whole encrypted page. So each time we
encrypt an 8k WAL page we end up encrypting different data with the
same key+nonce, but since we don't write anything to disk other than the
space where we actually wrote WAL records, it's not a problem. Is that right?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



On Wed, Jul 31, 2019 at 5:48 AM Bruce Momjian <bruce@momjian.us> wrote:
>
> On Tue, Jul 30, 2019 at 10:14:14AM -0400, Sehrope Sarkuni wrote:
> >     > In general it's fine to use the same IV with different keys. Only reuse
> >     > of Key + IV is a problem and the entire set of possible counter values
> >     > (IV + 0, IV + 1, ...) generated with a key must be unique. That's also
> >     > why we must leave at least log2(PAGE_SIZE / AES_BLOCK_SIZE) bits at the
> >     > end of the IV to be filled in with 0, 1, 2, ... for each 16-byte
> >     > AES-block on the page. If our per-page IV prefix used any of those bits
> >     > then the counter could overflow into the next page's IV's range.
> >
> >     Agreed.
> >
> >     Attached is an updated patch that checks only main relation forks, which
> >     I think are the only files we are going to encrypt, and it has better
> >     comments.
> >
> >
> > Okay that makes sense in the context of using a single key and relying on the
> > LSN based IV to be unique.
>
> I had more time to think about the complexity of adding relfilenode to
> the IV.  Since relfilenode is only unique within a database/tablespace,
> we would need to have pg_upgrade preserve database/tablespace oids
> (which I assume are the same as the directory and tablespace symlinks).
> Then, to decode a page, you would need to look up those values.  This is
> in addition to the new complexity of CREATE DATABASE and moving files
> between tablespaces.  I am also concerned that crash recovery operations
> and cluster forensics and repair would need to also deal with this.
>
> I am not even clear if pg_upgrade preserving relfilenode is possible ---
> when we wrap the relfilenode counter, does it start at 1 or at the
> first-user-relation-oid?  If the former, it could conflict with oids
> assigned to new system tables in later major releases.  Tying the
> preservation of relations to two restrictions seems risky.
>
> Using just the page LSN and page number allows a page to be
> decrypted/encrypted independently of its file name, tablespace, and
> database, and I think that is a win for simplicity.  Of course, if it is
> insecure we will not do it.
>
> I am thinking for the heap/index IV, it would be:
>
>         uint64 lsn;
>         uint32 page_number;
>         /* only uses 11 bits for a zero-based CTR counter for 32k pages */
>         uint32 counter;
>

+1
IIUC, since this requires ensuring uniqueness of key+IV, we would need
to use different keys for different relations. Is that right?

> and for WAL it would be:
>
>         uint64 segment_number;
>         uint32    counter;
>         /* guarantees this IV doesn't match any relation IV */
>         uint32   2^32-1 /* all 1's */

I would propose to include the page number within a WAL segment in the IV
so that we can encrypt each WAL page with the counter always starting
from 0. And if we use different encryption keys for tables/indexes and
WAL, I think we don't need the 2^32-1.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



On Tue, Jul 30, 2019 at 10:45 PM Sehrope Sarkuni <sehrope@jackdb.com> wrote:
>
> On Tue, Jul 30, 2019 at 8:16 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>
>> On Mon, Jul 29, 2019 at 8:18 PM Sehrope Sarkuni <sehrope@jackdb.com> wrote:
>> >
>> > On Mon, Jul 29, 2019 at 6:42 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> > > > An argument could be made to push that problem upstream, i.e. let the
>> > > > supplier of the passphrase deal with the indirection. You would still
>> > > > need to verify the supplied passphrase/key is correct via something
>> > > > like authenticating against a stored MAC.
>> > >
>> > > So do we need the key for MAC of passphrase/key in order to verify?
>> >
>> > Yes. Any 128 or 256-bit value is a valid AES key and any 16-byte input
>> > can be "decrypted" with it in both CTR and CBC mode, you'll just end
>> > up with garbage data if the key does not match. Verification of the
>> > key prior to usage (i.e. starting DB and encrypting/decrypting data)
>> > is a must as otherwise you'll end up with all kinds of corruption or
>> > data loss.
>> >
>>
>> Do you mean that we encrypt a 16-byte input with the correct key, store
>> it on disk, and then decrypt it with the user-supplied key and compare
>> the result to the input data?
>
>
> Yes but we don't compare via decryption of a known input. We instead compute a MAC of the encrypted master key using
> the user supplied key, and compare that against an expected MAC stored alongside the encrypted master key.
>
> The pseudo code would be something like:
>
> // Read key text from user:
> string raw_kek = read_from_user()
> // Normalize it to a fixed size of 64-bytes
> byte[64] kek = SHA512(SHA512(raw_kek))
> // Split the 64-bytes into a separate encryption and MAC key
> byte[32] user_encryption_key = kek.slice(0,32)
> byte[32] user_mac_key = kek.slice(32, 64)
>
> // Read our saved MAC and encrypted master key
> byte[80] mac_iv_encrypted_master_key = read_from_file()
> // First 32-bytes is the MAC of the rest
> byte[32] expected_mac = mac_iv_encrypted_master_key.slice(0, 32)
> // Rest is a random IV + Encrypted master key
> byte[48] iv_encrypted_master_key = mac_iv_encrypted_master_key.slice(32, 80)
>
> // Compute the MAC with the user supplied key
> byte[32] actual_mac = HMAC(user_mac_key, iv_encrypted_master_key)
> // If it does not match then the user key is invalid
> if (actual_mac != expected_mac) {
>   print_err_and_exit("Bad user key!")
> }
>
> // Our MAC was correct
> // ... so we know user supplied key is valid
> // ... and we know our iv and encrypted_key are valid
> byte[16] iv = iv_encrypted_master_key.slice(0,16)
> byte[32] encrypted_master_key = iv_encrypted_master_key.slice(16, 48)
> // ... so we can use all three to decrypt the master key (MDEK)
> byte[32] master_key = decrypt_aes_cbc(user_encryption_key, iv, encrypted_master_key)
>
>
>> > From a single user supplied passphrase you would derive the MDEK and
>> > compute a MAC (either using the same key or via a separate derived
>> > MDEK-MAC key). If the computed MAC matches against the previously
>> > stored value then you know the MDEK is correct as well.
>>
>> You meant KEK, not MDEK?
>
>
> If the KEK is incorrect then the MAC validation would fail and the decrypt would never be attempted.
>
> If the MAC matches then both the KEK (user supplied key) and MDEK ("master_key" in the pseudo code above) would be
> confirmed to be valid. So the MDEK is safe to use for deriving keys for encrypt / decrypt.
>
> I'm using the definitions for "KEK" and "MDEK" from Joe's mail
> https://www.postgresql.org/message-id/c878de71-a0c3-96b2-3e11-9ac2c35357c3%40joeconway.com
>

Now I understand. Thank you for the explanation!

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



On Wed, Jul 31, 2019 at 3:29 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
>
> For WAL encryption, before flushing WAL we encrypt the whole 8k WAL page
> and then write only the encrypted data of the new WAL record using
> pg_pwrite() rather than writing the whole encrypted page. So each time we
> encrypt an 8k WAL page we end up encrypting different data with the
> same key+nonce, but since we don't write anything to disk other than the
> space where we actually wrote WAL records, it's not a problem. Is that right?

Hmm, that's incorrect. We always write an entire 8k WAL page even if we
write only a few WAL records into a page. It's bad because we encrypt
different page contents with the same key+IV, but we cannot change the
IV for each WAL write as we would end up also changing
already-flushed WAL records. So we might need to change the WAL write
path so that it writes only the WAL records we actually added.


Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



On Tue, Jul 30, 2019 at 4:48 PM Bruce Momjian <bruce@momjian.us> wrote:
I had more time to think about the complexity of adding relfilenode to
the IV.  Since relfilenode is only unique within a database/tablespace,
we would need to have pg_upgrade preserve database/tablespace oids
(which I assume are the same as the directory and tablespace symlinks).
Then, to decode a page, you would need to look up those values.  This is
in addition to the new complexity of CREATE DATABASE and moving files
between tablespaces.  I am also concerned that crash recovery operations
and cluster forensics and repair would need to also deal with this.

I am not even clear if pg_upgrade preserving relfilenode is possible ---
when we wrap the relfilenode counter, does it start at 1 or at the
first-user-relation-oid?  If the former, it could conflict with oids
assigned to new system tables in later major releases.  Tying the
preservation of relations to two restrictions seems risky.

Agreed. Unless you know for sure the input is going to be immutable across copies or upgrades, including anything in either the IV or key derivation gets risky and could tie you down for the future. That's partly why I like the idea of a separate salt (basically you directly pay for the complexity by tracking that).

Even if we do not include a separate per-relation salt or things like relfilenode when generating a derived key, we can still include other types of immutable attributes. For example the fork type could be included to eventually allow multiple forks for the same relation to be encrypted with the same IV = LSN + Page Number as the derived key per-fork would be distinct.
 
Using just the page LSN and page number allows a page to be
decrypted/encrypted independently of its file name, tablespace, and
database, and I think that is a win for simplicity.  Of course, if it is
insecure we will not do it.

As the LSN + Page Number combo is unique across all relations (not just within one relation) I think we're good for pages.

I am thinking for the heap/index IV, it would be:

        uint64 lsn;
        uint32 page_number;
        /* only uses 11 bits for a zero-based CTR counter for 32k pages */
        uint32 counter;

Looks good. 
 
and for WAL it would be:

        uint64 segment_number;
        uint32    counter;
        /* guarantees this IV doesn't match any relation IV */
        uint32   2^32-1 /* all 1's */   

I need to read up more on the structure of the WAL records but here are some high-level thoughts:

WAL encryption should not use the same key as page encryption so there's no need to design the IV to try to avoid matching the page IVs. Even a basic derivation with a single fixed WDEK = HKDF(MDEK, "WAL") and TDEK = HKDF(MDEK, "PAGE") would ensure separate keys. That's the literal string "WAL" or "PAGE" being added as a salt to generate the respective keys; all that matters is that they're different.

Ideally WAL encryption would generate new derived keys as part of the WAL stream. The WAL stream is not fixed so you have the luxury of being able to add a "Use new random salt XZY going forward" record. Forcing generation of a new salt/key upon promotion of a replica would ensure that at least the WAL is unique going forward. Could also generate a new one upon server startup, after every N bytes, or a new one for each new WAL file. There's much more flexibility compared to page encryption.

As WAL is a single continuous stream, we can start the IV for each derived WAL key from zero. There's no need to complicate it further as Key + IV will never be reused.

If WAL is always written as full pages we need to ensure that the empty parts of the page are actual zeros and not "encrypted zeroes". Otherwise an XOR of the empty section of the first write of a page against a subsequent one would give you the plain text.

The non-fixed size of the WAL allows for the addition of a MAC, though I'm not yet sure of the best way to incorporate it. It could be part of each encrypted record or its own summary record (providing a MAC for a series of WAL records). After I've gone through this a bit more I'm looking to put together a write-up with this and some other thoughts in one place.

Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/ 
On Wed, Jul 31, 2019 at 2:32 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
Just to confirm, we have 21 bits left for the nonce in CTR? We have LSN
(8 bytes), page number (4 bytes) and counter (11 bits) in the 16-byte
nonce space. Even though we have 21 bits left we cannot store the
relfilenode in the IV.

Fields like relfilenode, database, or tablespace could be added to the derived key, not the per-page IV. There are no space limitations, as they are additional inputs into the HKDF (key derivation function).

Additional salt of any size, either some stored random value or something deterministic like the relfilenode/database/tablespace, can be added to the HKDF. There are separate issues with including those specific fields, but it's not a size limitation.
 
For WAL encryption, before flushing WAL we encrypt the whole 8k WAL
page and then write only the encrypted data of the new WAL record using
pg_pwrite() rather than writing the whole encrypted page. So each time
we encrypt the 8k WAL page we end up encrypting different data with the
same key+nonce, but since we don't write to the disk other than the
space where we actually wrote WAL records, it's not a problem. Is that right?

Ah, this is what I was referring to in my previous mail. I'm not familiar with how the writes happen yet (reading up...) but, yes, we would need to ensure that encrypted data is not written more than once (i.e. no writing of encrypt(zero) followed by writing of encrypt(non-zero) at the same spot).

Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/

 
On Wed, Jul 31, 2019 at 03:29:59PM +0900, Masahiko Sawada wrote:
> Just to confirm, we have 21 bits left for nonce in CTR? We have LSN (8
> bytes), page-number (4 bytes) and counter (11 bits) in 16 bytes nonce
> space. Even though we have 21 bits left we cannot store relfilenode to
> the IV.

No.  The nonce is the LSN and page number, the CTR counter (11 bits),
and 21 extra bits.  CTR needs a nonce for every 16-byte block, if you
want to think of it that way.
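
In concrete byte terms, the 16-byte counter block could be packed something like this (a sketch; the field layout follows the discussion above, and the helper name is illustrative):

    #include <stdint.h>

    /* Sketch: pack the heap/index IV into a 16-byte AES-CTR counter block. */
    static void
    page_iv(unsigned char iv[16], uint64_t lsn, uint32_t pageno, uint32_t counter)
    {
        /* bytes 0-7: LSN; bytes 8-11: page number; bytes 12-15: counter */
        for (int i = 0; i < 8; i++)
            iv[i] = (unsigned char) (lsn >> (56 - 8 * i));
        for (int i = 0; i < 4; i++)
            iv[8 + i] = (unsigned char) (pageno >> (24 - 8 * i));

        /*
         * Only 11 bits of the counter are needed for the 2048 16-byte
         * blocks of a 32k page; the remaining 21 bits stay zero.  The
         * counter occupies the low-order bytes because AES-CTR
         * increments the counter block big-endian from the right.
         */
        for (int i = 0; i < 4; i++)
            iv[12 + i] = (unsigned char) (counter >> (24 - 8 * i));
    }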

Even though there isn't space for the relfilenode in the nonce, we could
use the relfilenode/tablespace/database id by mixing it into a derived
key based on the master key, but as I stated in another email, we don't
want to do that unless we have to.

> BTW I've received a review of the current design from some
> cryptologists in our company. They recommended using CTR rather than
> CBC. The main reason is that in a block cipher it's important to
> ensure uniqueness of every input block to the block cipher. CBC makes
> that hard to ensure because the previous output is the next block's
> input, whereas CTR encrypts each block separately with key+nonce, so
> if we can ensure the uniqueness of the IV we can meet that. Also,
> it's not necessary to encrypt the IV, as it's okay for it to be
> predictable. So I vote for CTR, at least for tables/indexes
> encryption; there already might be consensus though.

Yes, you are more likely to get a duplicate nonce in CBC mode than in
the CTR mode we are proposing.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Wed, Jul 31, 2019 at 04:58:49PM +0900, Masahiko Sawada wrote:
> On Wed, Jul 31, 2019 at 3:29 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> >
> > For WAL encryption,  before flushing WAL we encrypt whole 8k WAL page
> > and then write only the encrypted data of the new WAL record using
> > pg_pwrite() rather than write whole encrypted page. So each time we
> > encrypt 8k WAL page we end up with encrypting different data with the
> > same key+nonce but since we don't write to the disk other than space
> > where we actually wrote WAL records it's not a problem. Is that right?
> 
> Hmm, that's incorrect. We always write an entire 8k WAL page even if
> we write only a few WAL records into a page. It's bad because we
> encrypt different pages with the same key+IV, but we cannot change
> the IV for each WAL write as we would end up also changing
> already-flushed WAL records. So we might need to change the WAL write
> so that it writes only the WAL records we actually wrote.

Uh, I don't understand.  We use the LSN to write the 8k page, and we use
a different nonce scheme for the WAL.  The LSN changes each time the
page is modified. The 8k page in the WAL is encrypted just like the rest
of the WAL.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Wed, Jul 31, 2019 at 09:43:00AM -0400, Sehrope Sarkuni wrote:
> On Wed, Jul 31, 2019 at 2:32 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> 
>     Just to confirm, we have 21 bits left for nonce in CTR? We have LSN (8
>     bytes), page-number (4 bytes) and counter (11 bits) in 16 bytes nonce
>     space. Even though we have 21 bits left we cannot store relfilenode to
>     the IV.
> 
> 
> Fields like relfilenode, database, or tablespace could be added to the derived
> key, not the per-page IV. There's no space limitations as they are additional
> inputs into the HKDF (key derivation function).

Yes, but we want to avoid that for other reasons.

>     For WAL encryption,  before flushing WAL we encrypt whole 8k WAL page
>     and then write only the encrypted data of the new WAL record using
>     pg_pwrite() rather than write whole encrypted page. So each time we
>     encrypt 8k WAL page we end up with encrypting different data with the
>     same key+nonce but since we don't write to the disk other than space
>     where we actually wrote WAL records it's not a problem. Is that right?
> 
> Ah, this is what I was referring to in my previous mail. I'm not familiar with
> how the writes happen yet (reading up...) but, yes, we would need to ensure
> that encrypted data is not written more than once (i.e. no writing of encrypt
> (zero) followed by writing of encrypt(non-zero) at the same spot).

Right.  The 8k page LSN changes each time the page is modified, and the
LSN is part of the page nonce.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Wed, Jul 31, 2019 at 04:11:03PM +0900, Masahiko Sawada wrote:
> On Wed, Jul 31, 2019 at 5:48 AM Bruce Momjian <bruce@momjian.us> wrote:
> > I am thinking for the heap/index IV, it would be:
> >
> >         uint64 lsn;
> >         uint32 page_number;
> >         /* only uses 11 bits for a zero-based CTR counter for 32k pages */
> >         uint32 counter;
> >
> 
> +1
> IIUC, since this would require ensuring uniqueness of the key+IV
> pair, we need to use different keys for different relations. Is that right?

No.  My other email states that the LSN is only used for a single
relation, so there is no need for the relfilenode in the nonce.  A
single LSN writing to multiple parts of the relation generates a unique
nonce since the page number is also part of the nonce.

> > and for WAL it would be:
> >
> >         uint64 segment_number;
> >         uint32    counter;
> >         /* guarantees this IV doesn't match any relation IV */
> >         uint32   0xFFFFFFFF /* all 1's */
> 
> I would propose including the page number within a WAL segment in the IV
> so that we can encrypt each WAL page with the counter always starting
> from 0. And if we use different encryption keys for tables/indexes and

What is the value of that?

> And if we use different encryption keys for tables/indexes and
> WAL I think we don't need 2^32-1.

I see little value in using different encryption keys for tables/indexes
and WAL.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Wed, Jul 31, 2019 at 09:25:01AM -0400, Sehrope Sarkuni wrote:
> On Tue, Jul 30, 2019 at 4:48 PM Bruce Momjian <bruce@momjian.us> wrote:
> 
>     I had more time to think about the complexity of adding relfilenode to
>     the IV.  Since relfilenode is only unique within a database/tablespace,
>     we would need to have pg_upgrade preserve database/tablespace oids
>     (which I assume are the same as the directory and tablespace symlinks).
>     Then, to decode a page, you would need to look up those values.  This is
>     in addition to the new complexity of CREATE DATABASE and moving files
>     between tablespaces.  I am also concerned that crash recovery operations
>     and cluster forensics and repair would need to also deal with this.
> 
>     I am not even clear if pg_upgrade preserving relfilenode is possible ---
>     when we wrap the relfilenode counter, does it start at 1 or at the
>     first-user-relation-oid?  If the former, it could conflict with oids
>     assigned to new system tables in later major releases.  Tying the
>     preservation of relations to two restrictions seems risky.
> 
> 
> Agreed. Unless you know for sure the input is going to be immutable across copies
> or upgrades, including anything in either the IV or key derivation gets risky
> and could tie you down in the future. That's partly why I like the idea of a
> separate salt (basically you directly pay for the complexity by tracking that).

Yes, fragility is something to be concerned about.  The system is
already very complex, and we occasionally have to do forensic work or
repairs.

> Even if we do not include a separate per-relation salt or things like
> relfilenode when generating a derived key, we can still include other types of
> immutable attributes. For example the fork type could be included to eventually
> allow multiple forks for the same relation to be encrypted with the same IV =
> LSN + Page Number as the derived key per-fork would be distinct.

Yes, the fork number could be useful in this case.  I was thinking we
would just leave the extra bits as zeros, and we can then set them to '1'
or something else for a different fork.

> 
>     Using just the page LSN and page number allows a page to be
>     decrypted/encrypted independently of its file name, tablespace, and
>     database, and I think that is a win for simplicity.  Of course, if it is
>     insecure we will not do it.
> 
> 
> As LSN + Page Number combo is unique for all relations (not just one relation)
> I think we're good for pages.

Yes, a single LSN can only be used for a single relation, and I added
an Assert to check that.  Good.

>     I am thinking for the heap/index IV, it would be:
> 
>             uint64 lsn;
>             uint32 page_number;
>             /* only uses 11 bits for a zero-based CTR counter for 32k pages */
>             uint32 counter;
> 
> 
> Looks good. 
>  
> 
>     and for WAL it would be:
> 
>             uint64 segment_number;
>             uint32    counter;
>             /* guarantees this IV doesn't match any relation IV */
>             uint32   0xFFFFFFFF /* all 1's */
> 
> 
> I need to read up more on the structure of the WAL records but here's some high
> level thoughts:
> 
> WAL encryption should not use the same key as page encryption so there's no
> need to design the IV to try to avoid matching the page IVs. Even a basic
> derivation with a single fixed WDEK = HKDF(MDEK, "WAL") and TDEK = HKDF(MDEK,
> "PAGE") would ensure separate keys. That's the literal string "WAL" or
> "PAGE" being added as a salt to generate the respective keys, all that matters
> is they're different.

I was thinking the WAL would use the same key since the nonce is unique
between the two.  What value is there in using a different key?

> Ideally WAL encryption would generate new derived keys as part of the WAL
> stream. The WAL stream is not fixed so you have the luxury of being able to add
> a "Use new random salt XZY going forward" records. Forcing generation of a new
> salt/key upon promotion of a replica would ensure that at least the WAL is
> unique going forward. Could also generate a new upon server startup, after

Ah, yes, good point, and using a derived key would make that easier. 
The tricky part is what to use to create the new derived key, unless we
generate a random number and store that somewhere in the data directory,
but that might lead to fragility, so I am worried.  We have pg_rewind,
which allows the WAL to go backwards.  What is the value in doing
this?

> every N bytes, or a new one for each new WAL file. There's much more
> flexibility compared to page encryption.
> 
> As WAL is a single continuous stream, we can start the IV for each derived WAL
> key from zero. There's no need to complicate it further as Key + IV will never
> be reused.

Uh, you want a new random key for each WAL file?  I was going to use the
WAL segment number as the nonce, which is always increasing, and easily
determined.  The file is 16MB.

> If WAL is always written as full pages we need to ensure that the empty parts
> of the page are actual zeros and not "encrypted zeroes". Otherwise an XOR of
> the empty section of the first write of a page against a subsequent one would
> give you the plain text.

Right, I think we need the segment number as part of the nonce for WAL.

> The non-fixed size of the WAL allows for the addition of a MAC though I'm not
> sure yet the best way to incorporate it. It could be part of each encrypted
> record or its own summary record (providing a MAC for a series of WAL records).
> After I've gone through this a bit more I'm looking to put together a write up
> with this and some other thoughts in one place.

I don't think we want to add a MAC at this point since the MAC for 8k
pages seems unattainable.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Tue, Aug 6, 2019 at 9:42 AM Bruce Momjian <bruce@momjian.us> wrote:
>
> On Wed, Jul 31, 2019 at 04:58:49PM +0900, Masahiko Sawada wrote:
> > On Wed, Jul 31, 2019 at 3:29 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > >
> > > For WAL encryption,  before flushing WAL we encrypt whole 8k WAL page
> > > and then write only the encrypted data of the new WAL record using
> > > pg_pwrite() rather than write whole encrypted page. So each time we
> > > encrypt 8k WAL page we end up with encrypting different data with the
> > > same key+nonce but since we don't write to the disk other than space
> > > where we actually wrote WAL records it's not a problem. Is that right?
> >
> > Hmm, that's incorrect. We always write an entire 8k WAL page even if
> > we write only a few WAL records into a page. It's bad because we
> > encrypt different pages with the same key+IV, but we cannot change
> > the IV for each WAL write as we would end up also changing
> > already-flushed WAL records. So we might need to change the WAL write
> > so that it writes only the WAL records we actually wrote.
>
> Uh, I don't understand.  We use the LSN to write the 8k page, and we use
> a different nonce scheme for the WAL.  The LSN changes each time the
> page is modified. The 8k page in the WAL is encrypted just like the rest
> of the WAL.

What I'm thinking about WAL encryption is that WAL records in the WAL
buffer are not encrypted. When writing to the disk, we copy the
contents of the 8k WAL page to a temporary buffer, encrypt it, and then
write it. And according to the current behavior, every time we write
WAL we write it per 8k WAL page rather than per WAL record.

The nonce for WAL encryption is {segment number, counter}. Suppose we
write 100 bytes of WAL at the beginning of the first 8k WAL page in WAL
segment 50. We encrypt the entire 8k WAL page with the nonce starting
from {50, 0} and write it to the disk. After that, suppose we append
200 bytes of WAL to the same WAL page. We again encrypt the entire 8k
WAL page with the nonce starting from {50, 0} and write it to the disk.
The two 8k WAL pages we wrote to the disk are different but we
encrypted them with the same nonce, which I think is bad.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



On Tue, Aug  6, 2019 at 12:00:27PM +0900, Masahiko Sawada wrote:
> What I'm thinking about WAL encryption is that WAL records on WAL
> buffer is not encrypted. When writing to the disk we copy the contents
> of 8k WAL page to a temporary buffer and encrypt it, and then write
> it. And according to the current behavior, every time we write WAL we
> write WAL per 8k WAL pages rather than WAL records.
> 
> The nonce for WAL encryption is {segment number, counter}. Suppose we
> write 100 bytes WAL at beginning of the first 8k WAL page in WAL
> segment 50. We encrypt the entire 8k WAL page with the nonce starting
> from {50, 0} and write to the disk. After that, suppose we append 200
> bytes WAL to the same WAL page. We again encrypt the entire 8k WAL
> page with the nonce starting from {50, 0} and write to the disk. The
> two 8k WAL pages we wrote to the disk are different but we encrypted
> them with the same nonce, which I think it's bad.

OK, I think you are missing something.   Let me go over the details. 
First, I think we are all agreed we are using CTR for heap/index pages,
and for WAL, because CTR allows byte granularity, it is faster, and
might be more secure.

So, to write 8k heap/index pages, we use the agreed-on LSN/page-number
to encrypt each page.  In CTR mode, we do that by creating an 8k bit
stream, which is created in 16-byte chunks with AES by incrementing the
counter used for each 16-byte chunk.  We then XOR the bits with what we
want to encrypt, and skip the LSN and CRC parts of the page.

For WAL, we effectively create a 16MB bitstream, though we can create it
in parts as needed.  (Creating it in parts is easier in CTR mode.)  The
nonce is the segment number, but each 16-byte chunk uses a different
counter.  Therefore, even if you are encrypting the same 8k page several
times in the WAL, the 8k page would be different because of the LSN (and
other changes), and the bitstream you encrypt/XOR it with would be
different because the counter would be different for that offset in the
WAL.
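
As a rough illustration of that XOR step, here is a sketch using OpenSSL's EVP AES-256-CTR interface (the IV construction and the single skipped byte range are simplified, and the helper name is illustrative):

    #include <openssl/evp.h>

    /*
     * Sketch: build an 8k keystream by CTR-encrypting zeros, then XOR
     * it over the page while leaving a range we must not encrypt
     * (e.g. the LSN and checksum) untouched.
     */
    static int
    xor_page(unsigned char *page, const unsigned char key[32],
             const unsigned char iv[16], size_t skip_off, size_t skip_len)
    {
        unsigned char stream[8192] = {0};
        int outlen;
        EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();

        if (ctx == NULL ||
            EVP_EncryptInit_ex(ctx, EVP_aes_256_ctr(), NULL, key, iv) != 1 ||
            EVP_EncryptUpdate(ctx, stream, &outlen, stream, 8192) != 1)
        {
            EVP_CIPHER_CTX_free(ctx);
            return 0;
        }
        EVP_CIPHER_CTX_free(ctx);

        /* XOR the keystream over the page, skipping the protected range */
        for (size_t i = 0; i < 8192; i++)
            if (i < skip_off || i >= skip_off + skip_len)
                page[i] ^= stream[i];
        return 1;
    }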

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Hi Bruce,
(off-list)

I think I'm missing something about the basics of encryption. Please
let me ask you about it off-list.

On Tue, Aug 6, 2019 at 11:36 PM Bruce Momjian <bruce@momjian.us> wrote:
>
> On Tue, Aug  6, 2019 at 12:00:27PM +0900, Masahiko Sawada wrote:
> > What I'm thinking about WAL encryption is that WAL records on WAL
> > buffer is not encrypted. When writing to the disk we copy the contents
> > of 8k WAL page to a temporary buffer and encrypt it, and then write
> > it. And according to the current behavior, every time we write WAL we
> > write WAL per 8k WAL pages rather than WAL records.
> >
> > The nonce for WAL encryption is {segment number, counter}. Suppose we
> > write 100 bytes WAL at beginning of the first 8k WAL page in WAL
> > segment 50. We encrypt the entire 8k WAL page with the nonce starting
> > from {50, 0} and write to the disk. After that, suppose we append 200
> > bytes WAL to the same WAL page. We again encrypt the entire 8k WAL
> > page with the nonce starting from {50, 0} and write to the disk. The
> > two 8k WAL pages we wrote to the disk are different but we encrypted
> > them with the same nonce, which I think it's bad.
>
> OK, I think you are missing something.   Let me go over the details.
> First, I think we are all agreed we are using CTR for heap/index pages,
> and for WAL, because CTR allows byte granularity, it is faster, and
> might be more secure.
>
> So, to write 8k heap/index pages, we use the agreed-on LSN/page-number
> to encrypt each page.  In CTR mode, we do that by creating an 8k bit
> stream, which is created in 16-byte chunks with AES by incrementing the
> counter used for each 16-byte chunk.  We then XOR the bits with what we
> want to encrypt, and skip the LSN and CRC parts of the page.
>
> For WAL, we effectively create a 16MB bitstream, though we can create it
> in parts as needed.  (Creating it in parts is easier in CTR mode.)  The
> nonce is the segment number, but each 16-byte chunk uses a different
> counter.  Therefore, even if you are encrypting the same 8k page several
> times in the WAL, the 8k page would be different because of the LSN (and
> other changes), and the bitstream you encrypt/XOR it with would be
> different because the counter would be different for that offset in the
> WAL.

Well, so you mean that, for example, we encrypt only 100 bytes of WAL
when we append a 100-byte WAL record?

For WAL encryption, if we encrypt the entire 8k WAL page and write the
entire page, the encrypted-and-written page will contain 100 bytes of
WAL record data and (8192-100) bytes of garbage (omitting the WAL page
header for simplicity), although the WAL data in the WAL buffer is
still unencrypted. And then if we append 200 bytes again, the
encrypted-and-written page will contain 300 bytes of WAL record data
and (8192-300) bytes of garbage, while the data in the WAL buffer is
still unencrypted.

In this case I think the first 100 bytes of the two 8k WAL pages are
the same because we encrypted both from the beginning of the page with
counter = 0. But the next 200 bytes are different; they are (encrypted)
garbage in the former case but (encrypted) WAL record data in the
latter case. I think that's a problem.

On the other hand, if we encrypt the 8k WAL page with a different
counter in the nonce after appending the 200-byte WAL record, the first
100 bytes (and of course the entire 8k page) will be different.
However, since that is the same as changing an already-flushed WAL
record on the disk, it's bad.

Also, if we encrypt only the appended data instead of the entire 8k
page, we would need to have the information somewhere about how many
bytes of the WAL page contain valid data. Otherwise reading WAL would
not work.

Please advise me what I am missing.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Hi,

On Wed, Aug 7, 2019, 00:31 Masahiko Sawada <sawada.mshk@gmail.com> wrote:
Hi Bruce,
(off-list)

I think I'm missing something about the basics of encryption. Please
let me ask you about it off-list.

Sorry for the noise, it was not off-list. I made a mistake.


On Wed, Aug  7, 2019 at 12:31:58AM +0900, Masahiko Sawada wrote:
> Well, so you mean that for example we encrypt only 100 bytes WAL
> record when append 100 bytes WAL records?
> 
> For WAL encryption, if we encrypt the entire 8k WAL page and write the
> entire page, the encrypted-and-written page will contain 100 bytes WAL
> record data and (8192-100) bytes garbage (omitted WAL page header for
> simplify), although WAL data on WAL buffer is still not encrypted
> state. And then if we append 200 bytes again, the
> encrypted-and-written page will contain 300 bytes WAL record data and
> (8192-300)bytes garbage, data on WAL buffer is still not encrypted
> state though.
> 
> In this case I think the first 100 bytes of two 8k WAL pages are the
> same because we encrypted both from the beginning of the page with the
> counter = 0. But the next 200 bytes are different; it's (encrypted)
> garbage in the former case but it's (encrypted) WAL record data in the
> latter case. I think that's a problem.
> 
> On the other hand, if we encrypt 8k WAL page with the different
> counter of nonce after appending the 200-byte WAL record, the first 100 bytes
> (and of course the entire 8k page also) will be different. However
> since it's the same thing doing as changing already-flushed WAL record
> on the disk it's bad.
> 
> Also, if we encrypt only the appended data instead of the entire 8k
> page, we would need to have the information somewhere about how many
> bytes of the WAL page contain valid data. Otherwise reading WAL would
> not work.

OK, onlist reply.  We are going to encrypt the _entire_ WAL stream as we
write it, which is possible with CTR.  If we write 200 bytes of WAL, we
encrypt/XOR 200 bytes of WAL.  If we write 10k of WAL, and 8k of that is
an 8k page, we encrypt the entire 10k of WAL --- we don't care if there
is an 8k page in there or not.

CTR mode creates a bit stream for the first 16 bytes with nonce of
(segment_number, counter = 0), and the next 16 bytes with 
(segment_number, counter = 1), etc.  We only XOR using the parts of the
bit stream we want to use.  We don't care what the WAL content is --- we
just XOR it with the stream with the matching counter for that part of
the WAL.

It is true we are encrypting the same 8k page in the heap/index page,
and in WAL, with different key/nonce combinations, which I think is
secure.  What is insecure is to encrypt two different pieces of data
with the same key/nonce combination.
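
In other words, finding the matching piece of keystream for a given byte of WAL is just arithmetic (a sketch; the helper name and the "off" parameter are illustrative):

    #include <stdint.h>

    /*
     * Sketch: locate the piece of CTR keystream covering a given byte
     * of WAL, where "off" is the byte offset within the 16MB segment.
     */
    static void
    wal_stream_pos(uint64_t off, uint64_t *counter, int *within)
    {
        *counter = off / 16;   /* which 16-byte keystream block */
        *within  = off % 16;   /* byte position inside that block */
    }

The WAL byte is then XORed against byte *within of the keystream block generated with counter value *counter.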

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Tue, Aug  6, 2019 at 01:55:38PM -0400, Bruce Momjian wrote:
> CTR mode creates a bit stream for the first 16 bytes with nonce of
> (segment_number, counter = 0), and the next 16 bytes with 
> (segment_number, counter = 1), etc.  We only XOR using the parts of the
> bit stream we want to use.  We don't care what the WAL content is --- we
> just XOR it with the stream with the matching counter for that part of
> the WAL.

The diagram which is part of this section might be helpful:

    https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#Counter_(CTR)
    https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#/media/File:CTR_encryption_2.svg

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Hi,

On 8/6/19 3:01 PM, Bruce Momjian wrote:
> On Tue, Aug  6, 2019 at 01:55:38PM -0400, Bruce Momjian wrote:
>> CTR mode creates a bit stream for the first 16 bytes with nonce of
>> (segment_number, counter = 0), and the next 16 bytes with
>> (segment_number, counter = 1), etc.  We only XOR using the parts of the
>> bit stream we want to use.  We don't care what the WAL content is --- we
>> just XOR it with the stream with the matching counter for that part of
>> the WAL.
>
> The diagram which is part of this section might be helpful:
>
>     https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#Counter_(CTR)
>     https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#/media/File:CTR_encryption_2.svg

This is going to be a slightly long (understatement) email that I
thought would be easier to communicate all in one place vs. replying to
individual parts on this long thread. My main goal is to present some
things I have researched on TDE, some of which have been mentioned on
the thread, and compile them in one place (it's also why I was slow to
respond on some other things on the thread -- sorry!)

While compiling this note, one idea that came to mind is that given the
complexity of this topic and getting the key pieces (no pun intended)
correct, it may be worthwhile if the people interested in working on TDE
make some time where we can all either get together and/or set up a
group call where we hash out the architecture we want to build to ensure
we have a fundamentally sound implementation (even if it takes a few
versions to implement it fully).

Since my last note, I really dove in to understand what other RDBMS
systems are doing and what we can learn from them, as well as what
would make sense in PostgreSQL. In particular, one goal is to be able
to build a TDE system that keeps data at rest both confidential and
integrity-protected while minimizing the amount of overhead we
introduce into the system.

Additionally, I would strongly suggest, if not require, that what we
build follows guidelines such as those outlined by NIST, as failure to
do so could mean that some teams would be unable to utilize our TDE
solution. And of course, it would mitigate the risk that we introduce
security vulnerabilities :)

(I've also continued to build out my terrible prototype to experiment
with some of the methods suggested. It's written in Python and
leverages the "cryptography"[0] library (which itself has some good
recommendations on how to use its various parts), and it's still not
worth sharing yet (though happy to share if asked off-list -- you will
be underwhelmed).)

Below I outline some of my findings from looking at other systems and
at our own code, and make recommendations to the best of my ability. I
broke it up into these 3 sections, which are interspersed with research
and recommendations:

1. Encryption Key Management
2. Encryption/Decryption of Pages + WAL Records
3. Automated Key Rotation

Of course, they are tightly intertwined, but thought it would be easier
to look at it in this way. It does stop short of certain implementation
details.

Anyway, without further ado:

#1 Encryption Key Management
----------------------------

While I thought the system we had proposed on-list (KEK, MDEK, TDEK,
WDEK) made a lot of sense, I still decided to look at what other
systems do. In particular, I looked at SQL Server and the "redwood
city" database.

It turns out that SQL Server has a somewhat similar architecture to
what we propose[1]:


https://docs.microsoft.com/en-us/sql/relational-databases/security/encryption/transparent-data-encryption?view=sql-server-2017

- A Service Master Key (equivalent to the key-encrypting key KEK) is
created when SQL Server is setup (equivalent to initdb)
- SQL Server uses Windows facilities to protect the master cluster key.
We are trying to achieve this with our KEK, which we are saying we would
store in the GUC `tde_passphrase` (or equivalent; I forget the name we
proposed, if we proposed one. Sorry.). If we wanted to protect it at the
OS level, Peter Eisentraut shows an equivalent way to do this with the
"ssl_passphrase" GUC and systemd[2]
- SQL Server lets you create a "master key" for all the clusters, explicitly
- Here is where the differences start: SQL Server then lets you create a
certificate that can be protected by the master key.
- For the specific databases in the SQL Server cluster, you can then
create a "database encryption key" (they call it "DEK") that is used to
encrypt the data, and ask that said key is protected by a certificate.

(For SQL Server, it appears that TDE is all or nothing within a
database, but *not* within the entire cluster.)

SQL Server does support a multi-key approach which allows for, amongst
other things, key rotation. For what we are looking to do, it may be too
much, but it seems similar to the ideas proposed.

The redwood city database also has something similar to what we are
proposing[3]:

https://docs.oracle.com/database/121/ASOAG/introduction-to-transparent-data-encryption.htm#ASOAG10139

Basically:

    - A master key (equiv to our MDEK) is managed by an "external
security module".
    - Each tablespace has a key that can be encrypted/decrypted by the
MDEK. This would be similar to our TDEK.

The keys are secured in a wallet[8].

This is a long way of saying that we are headed in the right direction
architecture-wise.

Here are some thoughts on that. Using terms defined in Joe's email[4].
Note I am assuming we are going with AES-256 (and a bit more on this later).

    0. First, I propose that the KEK he defined is the MDEK. More on
that in a second.

    1. To have an MDEK stored in a GUC, we'd have to have people
comfortable with storing random bytes in a GUC. It may be better to either:

        a) Allow for the GUC to point to a file that has the MDEK stored
        b) Allow the GUC to be a passphrase that can unlock the MDEK,
which is stored internally in the catalog.

    For a), we might want to provide some facilities for the user to
generate an MDEK; there have been suggestions to do that from initdb.

    2. We need to ensure we allow the user to cleanly rotate the MDEK.
Just updating the passphrase would be no good, as they'd basically lock
themselves out of all of their data :) This was proposed on the list,
but would need to ensure we can handle this in a tidy way with what we
can control in the configuration.

    3. We then have a "database encryption key" (DDEK) which is
presumably generated on CREATE DATABASE (as I am not sure I've seen that
explicitly stated anywhere). This is encrypted by the MDEK using a
padded key-wrapping function[5].

    4. Each relation is encrypted by a "table encryption key" (TDEK)
that is generated by an HMAC-based key derivation function (HKDF) using
SHA-256 and the DDEK. Guidelines for how to safely generate this are
present in NIST 800-56C[6]. We would want a two-step key derivation,
which would include[6] (see the sketch after this list):

        a) A cryptographically secure, randomly generated salt of 64
bytes[6] (Section 5; Table 4) that is created when the relation is created.
        b) Some "fixed info" bit string, which is likely to be composed
of the relation OID (4 bytes) and a "key identifier" (which I have
pegged at 1 byte, but more on this later)

    5. Last but not least, the "WAL encryption key" (WDEK). I have
looked at this one the least as I was focused on the pages, but this is
also one where we can use a HKDF. However, I believe the difference is
we need to use the **MDEK** here to generate the WDEK, which would include:

        a) A cryptographically-secure randomly generated salt of 64
bytes[6] (Section 5; Table 4) that is created when the **cluster** is
initialized.
        b) A counter bit string, that is incremented each time we've
encrypted more than 64GB of WAL. (More in key management)
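
Here is a sketch of what that two-step derivation could look like with OpenSSL's HKDF (the salt feeds the extraction step and the "fixed info" feeds the expansion step; the field layout and the function name are illustrative, not settled):

    #include <stdint.h>
    #include <openssl/evp.h>
    #include <openssl/kdf.h>

    /*
     * Sketch: derive a per-relation TDEK from the DDEK using a stored
     * random salt plus fixed info = relation OID || key identifier.
     */
    static int
    derive_tdek(const unsigned char *ddek, size_t ddek_len,
                const unsigned char salt[64],
                uint32_t reloid, uint8_t key_id,
                unsigned char *tdek, size_t tdek_len)
    {
        unsigned char info[5];
        EVP_PKEY_CTX *pctx = EVP_PKEY_CTX_new_id(EVP_PKEY_HKDF, NULL);

        /* fixed info: 4-byte relation OID (big-endian), 1-byte key id */
        for (int i = 0; i < 4; i++)
            info[i] = (unsigned char) (reloid >> (24 - 8 * i));
        info[4] = key_id;

        int ok = pctx != NULL
            && EVP_PKEY_derive_init(pctx) > 0
            && EVP_PKEY_CTX_set_hkdf_md(pctx, EVP_sha256()) > 0
            && EVP_PKEY_CTX_set1_hkdf_salt(pctx, salt, 64) > 0
            && EVP_PKEY_CTX_set1_hkdf_key(pctx, ddek, ddek_len) > 0
            && EVP_PKEY_CTX_add1_hkdf_info(pctx, info, sizeof(info)) > 0
            && EVP_PKEY_derive(pctx, tdek, &tdek_len) > 0;
        EVP_PKEY_CTX_free(pctx);
        return ok;
    }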

One open question is: do we allow users to explicitly add their own DDEK
or TDEK? I would say yes for the DDEK (given it would be stored anyway),
no for the TDEK for now, if only because of the potential desire to
automatically rotate the TDEK once 64GB of data has been encrypted by
the same TDEK (more on this in a bit). It seems, if we follow the SQL
Server example, that the level of granularity it goes down to is the
database, so we may be safe there. Redwood city lets you select the
tablespace to use TDE, so we can be mindful of that as well.

I well understand that the nuances are in the details, but this should
build on good practices started by other RDBMSs, follow guidelines
recommended by NIST, and provide something that is relatively easy for
our users to implement.

#2 Encryption/Decryption of Pages + WAL Records
-----------------------------------------------

To me there are two goals with this part:

    1. Ensure confidentiality of the data, in particular, if the storage
device is removed
    2. Ensure the integrity of the data should someone tamper with the
ciphertext

The other thing to consider (per this discussion) is doing so in a way
that adds minimal overhead to PostgreSQL, particularly in terms of
additional storage. (Though if you want TDE you are already accepting
some overhead.)

With that all said, I first wanted to understand what other RDBMS did
(surprise) for their encryption algorithms and modes.

I had a bit of trouble trying to figure out what SQL Server did, but if
this[7] is still accurate, it's a combination of AES-{128,192,256} and
the CBC mode. They provide an integrity check with a SHA1 hash.

With redwood city, they appear to support a few algorithms and modes[9]
and I did not narrow down which is used by default[10]. They list out
CBC, CFB, ECB, and OFB, and also indicate they use a SHA1 hash to
provide the integrity check.

This led me to an exploration of all the different modes we could
utilize and what would make the most sense for PostgreSQL. Depending on
what you read, you could easily get your head (or your drives)
spinning[11][12][13][14].

However, while reading all of these, a few themes emerged:

    1. Unless you're using authenticated encryption[15], your data is
subject to tampering (this has been brought up on the list).

    2. All of the modes have tradeoffs, given they are focused on the
performance of encrypting/decrypting.

    3. If you're using anything with counters, you MUST choose a secure
counting method and NEVER repeat the counter with the same key. This has
also been brought up on the list.

(It goes without saying, but we also want to ensure we use something
that is in the clear on intellectual property, which you could run into
with some modes)

No matter which mode we choose, we will have to store an integrity
check, and the exploration I began was how we could do this while taking
up the least amount of space. But we _cannot_ compromise on the
integrity check, otherwise any encrypted data is subject to tampering. I
know I read this in one of the guidelines somewhere, but I can't seem to
find it at the moment I'm typing this.

I believe the mode we are looking for is GCM[16], which has similar
benefits to CTR (parallelism) but allows for an integrity check (via
its GMAC). And if you follow NIST 800-38d[17] (below for convenience),
you can actually construct IVs deterministically! It also has the added
benefit that you can associate additional, unencrypted data and have it
authenticated as part of the decryption process (a nice example of how
to do so is in the Python cryptography library[18]).

Here are the guidelines:
https://csrc.nist.gov/publications/detail/sp/800-38d/final

A lot of what we are concerned about is in Section 8. For IV construction:

    - The size they recommend is 96 bits. If the IV is 96 bits or less,
there is guidance for deterministic construction (Sec 8.2.1)
    - An IV can have a "fixed" field and an "invocation" field. The
"fixed" field could be something like the relation OID (32-bits). The
invocation field could be the page number (32-bits). 64-bit IVs are okay
-- they recommend 96 bits.

    ^ The beauty of this is that if we decide we are OK with 64-bit
IVs, we do not need to store any additional data :) If we want 96 bits
we'd likely need to store a 4-byte "salt", if you will. Or we could
leave it blank in case we ever have more than 2**32 pages (*hides*).

    - You cannot repeat an IV for a given key.

    - Part of the output of the encryption is a tag, which can be
anywhere from 32 - 128 bits in length. The recommendation is to use 128
bits. This would then be the only additional storage we need. (If
smaller tags are to be considered, there are limits to the number of
invocations of the decryption function).

This means at most we only need to add 8 additional bytes of storage on
the pages + WAL headers.
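
To make that concrete, here is a sketch of GCM encryption of one page with a deterministic 96-bit IV using OpenSSL (the IV layout, with 4 spare bytes between the fixed and invocation fields, and the function name are illustrative):

    #include <stdint.h>
    #include <openssl/evp.h>

    /*
     * Sketch: AES-256-GCM over a page.  Fixed IV field = relation OID,
     * invocation field = page number, 16-byte authentication tag.
     */
    static int
    gcm_encrypt_page(const unsigned char key[32], uint32_t reloid,
                     uint32_t pageno, const unsigned char *in, int inlen,
                     unsigned char *out, unsigned char tag[16])
    {
        unsigned char iv[12] = {0};   /* OpenSSL's default GCM IV length */
        int len;
        EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();

        /* bytes 0-3: relation OID; bytes 4-7: spare; bytes 8-11: page no. */
        for (int i = 0; i < 4; i++)
        {
            iv[i]     = (unsigned char) (reloid >> (24 - 8 * i));
            iv[8 + i] = (unsigned char) (pageno >> (24 - 8 * i));
        }

        int ok = ctx != NULL
            && EVP_EncryptInit_ex(ctx, EVP_aes_256_gcm(), NULL, key, iv) == 1
            && EVP_EncryptUpdate(ctx, out, &len, in, inlen) == 1
            && EVP_EncryptFinal_ex(ctx, out + len, &len) == 1
            && EVP_CIPHER_CTX_ctrl(ctx, EVP_CTRL_GCM_GET_TAG, 16, tag) == 1;
        EVP_CIPHER_CTX_free(ctx);
        return ok;
    }

Decryption would supply the stored tag via EVP_CTRL_GCM_SET_TAG before trusting the page, which is where the tamper detection comes from.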

However, even if we were to use 96-bit IVs with 64 bits of variable
space, we would need to eventually rotate keys (esp. for WAL). As such:

#3 Automated Key Rotation
-------------------------

If we need to rotate an encryption key once we are at the point where a
counter will reset, that means we also need to know which key encrypted
the data. If we decide that 64GB is our magic number, based on some
back-of-the-envelope calculation with an assist from a nice blog post by
Simon[19], I determined that each relation would require a maximum of
512 keys (32TB / 64GB). Using the HKDF method of deriving keys, we can
just use a byte to store which "key" should be used to encrypt/decrypt a
particular page. I believe it would be similar for WAL.

BUT...AIUI that 64GB limit is for a key/IV pair. If each page has its
own IV and our counter is tied to the total number of pages available in
a relation, do we need to rotate that key? (I apologize at this point,
I've been writing this note for a couple of hours and my brain is
getting a bit mushy, so I'm fine with being told I'm way off base).

The WDEK is different, as we know the counter can wrap. As such, we
likely need to keep some kind of "key id" (4 bytes?) on the WAL to know
which WDEK was used. The key id would be passed in as part of the
"counter bit string" to the HKDF. We want to make it large enough that
the probability of wrapping around is low and we stay within the
guidelines for using an HKDF properly.

---------------

At this point I'm running out of steam as I type this. I realize overall
there is a lot to consider.

I'll also make the suggestion again that perhaps the people who are
interested in working on TDE have a discussion/meetup etc. to iron out
details.

If we can build a system that's architecturally sound (read: no one can
file a CVE on the architecture), adheres to guidelines for teams that
opt to utilize things like TDE, stores data confidentially and with
integrity (where we can detect tampering), and is easy(-ish) to use,
we'll be in a good position :)

Thanks,

Jonathan

[0] https://cryptography.io/en/latest/
[1]

https://docs.microsoft.com/en-us/sql/relational-databases/security/encryption/transparent-data-encryption?view=sql-server-2017
[2]
https://www.2ndquadrant.com/en/blog/postgresql-passphrase-protected-ssl-keys-systemd/
[3]
https://docs.oracle.com/database/121/ASOAG/introduction-to-transparent-data-encryption.htm#ASOAG10139
[4]
https://www.postgresql.org/message-id/c878de71-a0c3-96b2-3e11-9ac2c35357c3%40joeconway.com
[5] https://tools.ietf.org/html/rfc5649.html
[6] https://csrc.nist.gov/publications/detail/sp/800-56c/rev-1/final
[7]
https://blogs.msdn.microsoft.com/sqlsecurity/2009/03/30/sql-server-encryptbykey-cryptographic-message-description/
[8]
https://docs.oracle.com/cd/B19306_01/network.102/b14268/asotrans.htm#BABGHIDE
[9]
https://docs.oracle.com/database/121/DBSEG/data_encryption.htm#DBSEG80084
[10]
https://docs.oracle.com/cd/B19306_01/network.102/b14268/asotrans.htm#BABHJCHD
[11]
https://www.daemonology.net/blog/2009-06-11-cryptographic-right-answers.html
[12]
https://www.kernel.org/doc/html/latest/filesystems/fscrypt.html#encryption-modes-and-usage
[13] https://sockpuppet.org/blog/2014/04/30/you-dont-want-xts/
[14]
https://crypto.stackexchange.com/questions/14628/why-do-we-use-xts-over-ctr-for-disk-encryption
[15] https://en.wikipedia.org/wiki/Authenticated_encryption
[16] https://en.wikipedia.org/wiki/Galois/Counter_Mode
[17] https://csrc.nist.gov/publications/detail/sp/800-38d/final
[18]

https://cryptography.io/en/latest/hazmat/primitives/symmetric-encryption/#cryptography.hazmat.primitives.ciphers.modes.GCM
[19] https://www.2ndquadrant.com/en/blog/postgresql-maximum-table-size/


On Tue, Aug  6, 2019 at 06:13:30PM -0400, Jonathan Katz wrote:
> Hi,
> 
> On 8/6/19 3:01 PM, Bruce Momjian wrote:
> > On Tue, Aug  6, 2019 at 01:55:38PM -0400, Bruce Momjian wrote:
> >> CTR mode creates a bit stream for the first 16 bytes with nonce of
> >> (segment_number, counter = 0), and the next 16 bytes with 
> >> (segment_number, counter = 1), etc.  We only XOR using the parts of the
> >> bit stream we want to use.  We don't care what the WAL content is --- we
> >> just XOR it with the stream with the matching counter for that part of
> >> the WAL.
> > 
> > The diagram which is part of this section might be helpful:
> > 
> >     https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#Counter_(CTR)
> >     https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#/media/File:CTR_encryption_2.svg
> 
> This is going to be a slightly long (understatement) email that I
> thought would be easier to try to communicate all in one place vs.
> replying to individual parts on this long thread. My main goal was to
> present some things I had researched on TDE, some of which had been
> mentioned on thread, and compile it in one place (it's also why I was
> slow to respond on some other things on the thread -- sorry!)

This basically tries to re-litigate many discussions we have already
had, and I don't see much value in replying point by point.  It
relitigates:

*  table/tablespace-level encryption keys (single WAL file and unlocked
keys for recovery)

*  CTR mode

*  Authentication of data (we decided we would not do this for v1 of
this feature)

* Use of something like "ssl_passphrase"

If you want to relitigate something, you will need to state that, and
reference the previous arguments in explaining your disagreement.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Wed, Aug 7, 2019 at 2:55 AM Bruce Momjian <bruce@momjian.us> wrote:
>
> On Wed, Aug  7, 2019 at 12:31:58AM +0900, Masahiko Sawada wrote:
> > Well, so you mean that for example we encrypt only 100 bytes WAL
> > record when append 100 bytes WAL records?
> >
> > For WAL encryption, if we encrypt the entire 8k WAL page and write the
> > entire page, the encrypted-and-written page will contain 100 bytes WAL
> > record data and (8192-100) bytes garbage (omitted WAL page header for
> > simplify), although WAL data on WAL buffer is still not encrypted
> > state. And then if we append 200 bytes again, the
> > encrypted-and-written page will contain 300 bytes WAL record data and
> > (8192-300)bytes garbage, data on WAL buffer is still not encrypted
> > state though.
> >
> > In this case I think the first 100 bytes of two 8k WAL pages are the
> > same because we encrypted both from the beginning of the page with the
> > counter = 0. But the next 200 bytes are different; it's (encrypted)
> > garbage in the former case but it's (encrypted) WAL record data in the
> > latter case. I think that's a problem.
> >
> > On the other hand, if we encrypt 8k WAL page with the different
> > counter of nonce after appending the 200-byte WAL record, the first 100 bytes
> > (and of course the entire 8k page also) will be different. However
> > since it's the same thing doing as changing already-flushed WAL record
> > on the disk it's bad.
> >
> > Also, if we encrypt only the appended data instead of the entire 8k
> > page, we would need to have the information somewhere about how many
> > bytes of the WAL page contain valid data. Otherwise reading WAL
> > would not work.
>
> OK, onlist reply.  We are going to encrypt the _entire_ WAL stream as we
> write it, which is possible with CTR.  If we write 200 bytes of WAL, we
> encrypt/XOR 200 bytes of WAL.  If we write 10k of WAL, and 8k of that is
> an 8k page, we encrypt the entire 10k of WAL --- we don't care if there
> is an 8k page in there or not.
>
> CTR mode creates a bit stream for the first 16 bytes with nonce of
> (segment_number, counter = 0), and the next 16 bytes with
> (segment_number, counter = 1), etc.  We only XOR using the parts of the
> bit stream we want to use.  We don't care what the WAL content is --- we
> just XOR it with the stream with the matching counter for that part of
> the WAL.
>
> It is true we are encrypting the same 8k page in the heap/index page,
> and in WAL, with different key/nonce combinations, which I think is
> secure.  What is insecure is to encrypt two different pieces of data
> with the same key/nonce combination.
>

I understood. IIUC, in your approach postgres processes encrypt WAL
records when inserting them into the WAL buffer. So WAL data is
encrypted even in the WAL buffer.

It works, but I think the implementation might be complex; for
example, using openssl, we would use the EVP functions to encrypt data
with AES-256-CTR. We would need to make the IV and pass it to them,
but as far as I can tell these functions don't manage the counter
value of the nonce for us across writes. That is, we need to calculate
the correct counter value for each encryption and pass it to the EVP
functions. Suppose we encrypt 20 bytes of WAL. The first 16 bytes are
encrypted with a nonce of (segment_number, 0) and the next 4 bytes
with a nonce of (segment_number, 1). After that, suppose we encrypt 12
bytes of WAL. We cannot use a nonce of (segment_number, 2) but should
use a nonce of (segment_number, 1). Therefore we would need 4 bytes of
padding, encrypt it, and then throw those 4 bytes away.
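
Maybe we could avoid the padding by initializing a fresh EVP context
with the counter block set to offset / 16 and discarding the first
offset % 16 keystream bytes. A rough, untested sketch (the IV layout
here is simplified to segment number plus a 64-bit counter in the
low-order bytes, since as far as I know OpenSSL increments the whole
16-byte counter block big-endian):

    #include <stdint.h>
    #include <openssl/evp.h>

    /* Sketch: encrypt "len" bytes of WAL starting at "offset" in a segment. */
    static int
    ctr_encrypt_at(const unsigned char key[32], uint64_t segno,
                   uint64_t offset, const unsigned char *in,
                   unsigned char *out, int len)
    {
        unsigned char iv[16];
        unsigned char sink[16] = {0};
        int n, skip = (int) (offset % 16);
        uint64_t block = offset / 16;
        EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();

        /* bytes 0-7: segment number; bytes 8-15: starting counter value */
        for (int i = 0; i < 8; i++)
        {
            iv[i]     = (unsigned char) (segno >> (56 - 8 * i));
            iv[8 + i] = (unsigned char) (block >> (56 - 8 * i));
        }

        int ok = ctx != NULL
            && EVP_EncryptInit_ex(ctx, EVP_aes_256_ctr(), NULL, key, iv) == 1
            /* advance the keystream to mid-block; the output is discarded */
            && (skip == 0 || EVP_EncryptUpdate(ctx, sink, &n, sink, skip) == 1)
            && EVP_EncryptUpdate(ctx, out, &n, in, len) == 1;
        EVP_CIPHER_CTX_free(ctx);
        return ok;
    }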


Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



On Wed, Aug  7, 2019 at 05:13:31PM +0900, Masahiko Sawada wrote:
> I understood. IIUC in your approach postgres processes encrypt WAL
> records when inserting to the WAL buffer. So WAL data is encrypted
> even on the WAL buffer.
> 
> It works but I think the implementation might be complex; For example
> using openssl, we would use EVP functions to encrypt data by
> AES-256-CTR. We would need to make IV and pass it to them and these
> functions however don't manage the counter value of nonce as long as I
> didn't miss. That is, we need to calculate the correct counter value
> for each encryption and pass it to EVP functions. Suppose we encrypt
> 20 bytes of WAL. The first 16 bytes is encrypted with nonce of
> (segment_number, 0) and the next 4 bytes is encrypted with nonce of
> (segment_number, 1). After that suppose we encrypt 12 bytes of WAL. We
> cannot use nonce of (segment_number, 2) but should use nonce of
> (segment_number , 1). Therefore we would need 4 bytes padding and to
> encrypt it and then to throw that 4 bytes away .

Since we want to have per-byte control over encryption, for both
heap/index pages (skip LSN and CRC), and WAL (encrypt to the last byte),
I assumed we would need to generate a bit stream of a specified size and
do the XOR ourselves against the data.  I assume ssh does this, so we
would have to study the method.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Mon, Aug 5, 2019 at 9:02 PM Bruce Momjian <bruce@momjian.us> wrote:
On Wed, Jul 31, 2019 at 09:25:01AM -0400, Sehrope Sarkuni wrote:
> Even if we do not include a separate per-relation salt or things like
> relfilenode when generating a derived key, we can still include other types of
> immutable attributes. For example the fork type could be included to eventually
> allow multiple forks for the same relation to be encrypted with the same IV =
> LSN + Page Number as the derived key per-fork would be distinct.

Yes, the fork number could be useful in this case.  I was thinking we
would just leave the extra bits as zeros and we can then set it to '1'
or something else for a different fork.

Key derivation has more flexibility as you're not limited by the number of unused bits in the IV.
 
> WAL encryption should not use the same key as page encryption so there's no
> need to design the IV to try to avoid matching the page IVs. Even a basic
> derivation with a single fixed WDEK = HKDF(MDEK, "WAL") and TDEK = HKDF(MDEK,
> "PAGE") would ensure separate keys. That's the the literal string "WAL" or
> "PAGE" being added as a salt to generate the respective keys, all that matters
> is they're different.

I was thinking the WAL would use the same key since the nonce is unique
between the two.  What value is there in using a different key?

Never having to worry about overlap in Key + IV usage is the main advantage. While it's possible to structure IVs to avoid that happening, it's much easier to completely avoid that situation by ensuring different parts of an application are using separate derived keys.
 
> Ideally WAL encryption would generating new derived keys as part of the WAL
> stream. The WAL stream is not fixed so you have the luxury of being able to add
> a "Use new random salt XZY going forward" records. Forcing generation of a new
> salt/key upon promotion of a replica would ensure that at least the WAL is
> unique going forward. Could also generate a new upon server startup, after

Ah, yes, good point, and using a derived key would make that easier.
The tricky part is what to use to create the new derived key, unless we
generate a random number and store that somewhere in the data directory,
but that might lead to fragility, so I am worried. 

Simplest approach for derived keys would be to use immutable attributes of the WAL files as an input to the key derivation. Something like HKDF(MDEK, "WAL:" || timeline_id || wal_segment_num) should be fine for this (see the sketch after the list below), as it is:

* Unique per WAL file
* Known prior to writing to a given WAL file
* Known prior to reading a given WAL file
* Does not require any additional persistence
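
To make the derivation above concrete, here is a rough sketch using
OpenSSL's HKDF interface (the function and parameter names are
hypothetical, and the integers are mixed in in native byte order for
brevity; a real patch would pin down the encoding):

    #include <stdint.h>
    #include <string.h>
    #include <openssl/evp.h>
    #include <openssl/kdf.h>

    /*
     * Hypothetical helper: WDEK = HKDF(MDEK, "WAL:" || timeline_id ||
     * segment_num).  Returns 1 on success.
     */
    static int
    derive_wal_key(const unsigned char *mdek, int mdek_len,
                   uint32_t timeline_id, uint64_t segment_num,
                   unsigned char *wdek, size_t wdek_len)
    {
        unsigned char info[4 + sizeof(uint32_t) + sizeof(uint64_t)];
        EVP_PKEY_CTX *pctx;
        int         ok = 0;

        /* Build the HKDF "info" input from the immutable attributes. */
        memcpy(info, "WAL:", 4);
        memcpy(info + 4, &timeline_id, sizeof(timeline_id));
        memcpy(info + 4 + sizeof(timeline_id), &segment_num, sizeof(segment_num));

        pctx = EVP_PKEY_CTX_new_id(EVP_PKEY_HKDF, NULL);
        if (pctx != NULL &&
            EVP_PKEY_derive_init(pctx) == 1 &&
            EVP_PKEY_CTX_set_hkdf_md(pctx, EVP_sha256()) == 1 &&
            EVP_PKEY_CTX_set1_hkdf_key(pctx, mdek, mdek_len) == 1 &&
            EVP_PKEY_CTX_add1_hkdf_info(pctx, info, sizeof(info)) == 1 &&
            EVP_PKEY_derive(pctx, wdek, &wdek_len) == 1)
            ok = 1;
        EVP_PKEY_CTX_free(pctx);
        return ok;
    }

The same pattern with a different prefix and different attributes would
give the other derived keys discussed in this thread.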

We have pg_rewind,
which allows to make the WAL go backwards.  What is the value in doing
this?

Good point re: pg_rewind. Having key rotation records in the stream would complicate that as you'd have to jump back / forward to figure out which key to use. It's doable but much more complicated.

A unique WDEK per WAL file that is derived from the segment number would not have that problem. A unique key per-file means the IVs can all start at zero and each file can be treated as one encrypted stream. Any encryption/decryption code would only need to touch the write/read callsites.
 
> every N bytes, or a new one for each new WAL file. There's much more
> flexibility compared to page encryption.
>
> As WAL is a single continuous stream, we can start the IV for each derived WAL
> key from zero. There's no need to complicate it further as Key + IV will never
> be reused.

Uh, you want a new random key for each WAL file?  I was going to use the
WAL segment number as the nonce, which is always increasing, and easily
determined.  The file is 16MB.

Ideally yes as it would allow for multiple replicas promoted off the same primary to immediately diverge as each would have its own keys. I don't consider it a requirement but if it's possible without significant added complexity I say that's a win.

I'm still reading up on the file and record format to understand how complex that would be. Though given your point re: pg_rewind and the lack of handling for page encryption divergence when promoting multiple replicas, I doubt the complexity will be worth it.
 
> If WAL is always written as full pages we need to ensure that the empty parts
> of the page are actual zeros and not "encrypted zeroes". Otherwise an XOR of
> the empty section of the first write of a page against a subsequent one would
> give you the plain text.

Right, I think we need the segment number as part of the nonce for WAL.

+1 to using segment number but it's better as a derived key instead of coming up with new IV constructs and reusing the MDEK.
 
> The non-fixed size of the WAL allows for the addition of a MAC though I'm not
> sure yet the best way to incorporate it. It could be part of each encrypted
> record or its own summary record (providing a MAC for a series of WAL records).
> After I've gone through this a bit more I'm looking to put together a write up
> with this and some other thoughts in one place.

I don't think we want to add a MAC at this point since the MAC for 8k
pages seems unattainable.

Even without a per-page MAC, a MAC at some level for WAL has its own benefits such as perfect corruption detection. It could be per-record, per-N-records, per-checkpoint, or per-file. The current WAL file format already handles arbitrary gaps so there is significantly more flexibility in adding it vs pages. I'm not saying it should be a requirement but, unlike pages, I would not rule it out just yet as it may not be that complicated.

Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/

 
On Wed, Aug 7, 2019 at 7:19 AM Bruce Momjian <bruce@momjian.us> wrote:
On Wed, Aug  7, 2019 at 05:13:31PM +0900, Masahiko Sawada wrote:
> I understood. IIUC in your approach postgres processes encrypt WAL
> records when inserting to the WAL buffer. So WAL data is encrypted
> even on the WAL buffer.

I was originally thinking of not encrypting the shared WAL buffers but that may have issues. If the buffers are already encrypted and contiguous in shared memory, it's possible to write out many via a single pg_pwrite(...) call as is currently done in XLogWrite(...).

If they're not encrypted you'd need to do more work in that critical section. That'd involve allocating a commensurate amount of memory to hold the encrypted pages and then encrypting them all prior to the single pg_pwrite(...) call. Reusing one buffer is possible but it would require encrypting and writing the pages one by one. Both of those seem like a bad idea.

Better to pay the encryption cost at the time of WAL record creation and keep the writing process as fast and simple as possible.
 
> It works but I think the implementation might be complex; For example
> using openssl, we would use EVP functions to encrypt data by
> AES-256-CTR. We would need to make IV and pass it to them and these
> functions however don't manage the counter value of nonce as long as I
> didn't miss. That is, we need to calculate the correct counter value
> for each encryption and pass it to EVP functions. Suppose we encrypt
> 20 bytes of WAL. The first 16 bytes is encrypted with nonce of
> (segment_number, 0) and the next 4 bytes is encrypted with nonce of
> (segment_number, 1). After that suppose we encrypt 12 bytes of WAL. We
> cannot use nonce of (segment_number, 2) but should use nonce of
> (segment_number , 1). Therefore we would need 4 bytes padding and to
> encrypt it and then to throw that 4 bytes away .

Since we want to have per-byte control over encryption, for both
heap/index pages (skip LSN and CRC), and WAL (encrypt to the last byte),
I assumed we would need to generate a bit stream of a specified size and
do the XOR ourselves against the data.  I assume ssh does this, so we
would have to study the method.

The lower level non-EVP OpenSSL functions allow specifying the offset within the 16-byte AES block from which the encrypt/decrypt should proceed. It's the "num" parameter of their encrypt/decrypt functions. For a continuous encrypted stream such as a WAL file, a "pread(...)" of a possibly non-16-byte aligned section would involve determining the 16-byte counter (byte_offset / 16) and the intra-block offset (byte_offset % 16). I'm not sure how one handles initializing the internal encrypted counter and that might be one more step that would need to be done. But it's definitely possible to read / write less than a block via those APIs (not the EVP ones).

I don't think the EVP functions have parameters for the intra-block offset but you can mimic it by initializing the IV/block counter and then skipping over the intra-block offset by either reading or writing a dummy partial block. The EVP read and write functions both deal with individual bytes so once you've seeked to your desired offset you can read or write the real individual bytes.
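
Roughly, and only as a sketch (the IV layout below, with the low 8 bytes
holding a big-endian block counter and the high bytes zeroed where a
per-file nonce would go, is an assumption), that EVP-based seek could
look like:

    #include <stdint.h>
    #include <openssl/evp.h>

    /*
     * Hypothetical sketch: position an AES-256-CTR EVP context at an
     * arbitrary byte offset by setting the block counter in the IV and
     * then consuming the intra-block bytes with a dummy partial-block
     * update whose output is discarded.
     */
    static int
    ctr_seek(EVP_CIPHER_CTX *ctx, const unsigned char *key, uint64_t byte_offset)
    {
        unsigned char iv[16] = {0};
        unsigned char dummy[16] = {0};
        unsigned char sink[16];
        int         n;
        uint64_t    block = byte_offset / 16;       /* AES block counter */
        int         intra = (int) (byte_offset % 16);

        /* Low 8 bytes of the IV hold the big-endian block counter. */
        for (int i = 0; i < 8; i++)
            iv[15 - i] = (unsigned char) (block >> (8 * i));

        if (EVP_EncryptInit_ex(ctx, EVP_aes_256_ctr(), NULL, key, iv) != 1)
            return 0;
        /* Burn the intra-block offset so the next update starts mid-block. */
        if (intra > 0 && EVP_EncryptUpdate(ctx, sink, &n, dummy, intra) != 1)
            return 0;
        return 1;
    }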

Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/
 
On Wed, Aug  7, 2019 at 11:41:51AM -0400, Sehrope Sarkuni wrote:
> On Wed, Aug 7, 2019 at 7:19 AM Bruce Momjian <bruce@momjian.us> wrote:
> 
>     On Wed, Aug  7, 2019 at 05:13:31PM +0900, Masahiko Sawada wrote:
>     > I understood. IIUC in your approach postgres processes encrypt WAL
>     > records when inserting to the WAL buffer. So WAL data is encrypted
>     > even on the WAL buffer.
> 
> 
> I was originally thinking of not encrypting the shared WAL buffers but that may
> have issues. If the buffers are already encrypted and contiguous in shared
> memory, it's possible to write out many via a single pg_pwrite(...) call as is
> currently done in XLogWrite(...).

The shared buffers will not be encrypted --- they are encrypted only
when being written to storage.  We felt encrypting shared buffers would
be too much overhead, for little gain.  I don't know if we will encrypt
while writing to the WAL buffers or while writing the WAL buffers to
the file system.

> If they're not encrypted you'd need to do more work in that critical section.
> That'd involve allocating a commensurate amount of memory to hold the encrypted
> pages and then encrypting them all prior to the single pg_pwrite(...) call.
> Reusing one buffer is possible but it would require encrypting and writing the
> pages one by one. Both of those seem like a bad idea.

Well, right now the 8k pages are part of the WAL stream, so I don't know
that it would be any more overhead than other WAL writes.  I am hoping we
can generate the encryption bit stream in chunks earlier so we can just
do the XOR as we are writing the data to the WAL buffers.

> Better to pay the encryption cost at the time of WAL record creation and keep
> the writing process as fast and simple as possible.

Yes, I don't think we know at the time of WAL record creation what
_offset_ the records will have when they are written to WAL, so I am
thinking we need to do it later, and as I said, I am hoping we can
generate the encryption bit stream earlier.

>     > It works but I think the implementation might be complex; For example
>     > using openssl, we would use EVP functions to encrypt data by
>     > AES-256-CTR. We would need to make IV and pass it to them and these
>     > functions however don't manage the counter value of nonce as long as I
>     > didn't miss. That is, we need to calculate the correct counter value
>     > for each encryption and pass it to EVP functions. Suppose we encrypt
>     > 20 bytes of WAL. The first 16 bytes is encrypted with nonce of
>     > (segment_number, 0) and the next 4 bytes is encrypted with nonce of
>     > (segment_number, 1). After that suppose we encrypt 12 bytes of WAL. We
>     > cannot use nonce of (segment_number, 2) but should use nonce of
>     > (segment_number , 1). Therefore we would need 4 bytes padding and to
>     > encrypt it and then to throw that 4 bytes away .
> 
>     Since we want to have per-byte control over encryption, for both
>     heap/index pages (skip LSN and CRC), and WAL (encrypt to the last byte),
>     I assumed we would need to generate a bit stream of a specified size and
>     do the XOR ourselves against the data.  I assume ssh does this, so we
>     would have to study the method.
> 
> 
> The lower level non-EVP OpenSSL functions allow specifying the offset within
> the 16-byte AES block from which the encrypt/decrypt should proceed. It's the
> "num" parameter of their encrypt/decrypt functions. For a continuous encrypted
> stream such as a WAL file, a "pread(...)" of a possibly non-16-byte aligned
> section would involve determining the 16-byte counter (byte_offset / 16) and
> the intra-block offset (byte_offset % 16). I'm not sure how one handles
> initializing the internal encrypted counter and that might be one more step
> that would need to be done. But it's definitely possible to read / write less than
> a block via those APIs (not the EVP ones).
> 
> I don't think the EVP functions have parameters for the intra-block offset but
> you can mimic it by initializing the IV/block counter and then skipping over
> the intra-block offset by either reading or writing a dummy partial block. The
> EVP read and write functions both deal with individual bytes so once you've
> seeked to your desired offset you can read or write the real individual bytes.

Can we generate the bit stream in 1MB chunks or something and just XOR
as needed?

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Wed, Aug 7, 2019 at 1:39 PM Bruce Momjian <bruce@momjian.us> wrote:
On Wed, Aug  7, 2019 at 11:41:51AM -0400, Sehrope Sarkuni wrote:
> On Wed, Aug 7, 2019 at 7:19 AM Bruce Momjian <bruce@momjian.us> wrote:
>
>     On Wed, Aug  7, 2019 at 05:13:31PM +0900, Masahiko Sawada wrote:
>     > I understood. IIUC in your approach postgres processes encrypt WAL
>     > records when inserting to the WAL buffer. So WAL data is encrypted
>     > even on the WAL buffer.
>
>
> I was originally thinking of not encrypting the shared WAL buffers but that may
> have issues. If the buffers are already encrypted and contiguous in shared
> memory, it's possible to write out many via a single pg_pwrite(...) call as is
> currently done in XLogWrite(...).

The shared buffers will not be encrypted --- they are encrypted only
when being written to storage.  We felt encrypting shared buffers would
be too much overhead, for little gain.  I don't know if we will encrypt
while writing to the WAL buffers or while writing the WAL buffers to
the file system.

My mistake on the wording. By "shared WAL buffers" I meant the shared memory used for WAL buffers, XLogCtl->pages. Not the shared buffers for pages.
 
> If they're not encrypted you'd need to do more work in that critical section.
> That'd involve allocating a commensurate amount of memory to hold the encrypted
> pages and then encrypting them all prior to the single pg_pwrite(...) call.
> Reusing one buffer is possible but it would require encrypting and writing the
> pages one by one. Both of those seem like a bad idea.

Well, right now the 8k pages are part of the WAL stream, so I don't know
that it would be any more overhead than other WAL writes.

The total work is the same, but the timing, memory usage, and number of syscalls could change.

Right now the XLogWrite(...) code can write many WAL pages at once via a single call to pg_pwrite(...): https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/transam/xlog.c;h=f55352385732c6b0124eff5265462f3883fe7435;hb=HEAD#l2491

If the blocks are not encrypted then you either need to allocate and encrypt everything (could be up to wal_buffers max size) to do it as one write, or encrypt chunks of WAL and do multiple writes. I'm not sure how big an issue this would be in practice as it'd be workload specific. 
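
As a sketch of that second option (the names and chunk size are invented,
and a real patch would use pg_pwrite and Postgres error handling rather
than raw pwrite), encrypting into one reusable buffer and issuing
multiple writes could look like:

    #include <unistd.h>
    #include <openssl/evp.h>

    #define ENC_CHUNK (64 * 1024)       /* arbitrary reusable buffer size */

    /*
     * Hypothetical sketch: encrypt WAL into one fixed buffer and issue
     * multiple writes, instead of allocating up to wal_buffers worth of
     * memory for a single write.  CTR keeps its counter across calls,
     * so chunking does not change the resulting ciphertext.
     */
    static ssize_t
    encrypt_and_pwrite(int fd, EVP_CIPHER_CTX *ctx,
                       const unsigned char *src, size_t len, off_t offset)
    {
        unsigned char enc[ENC_CHUNK];
        size_t      done = 0;
        int         n;

        while (done < len)
        {
            size_t  chunk = (len - done > ENC_CHUNK) ? ENC_CHUNK : len - done;

            if (EVP_EncryptUpdate(ctx, enc, &n, src + done, (int) chunk) != 1)
                return -1;
            if (pwrite(fd, enc, (size_t) n, offset + (off_t) done) != n)
                return -1;
            done += chunk;
        }
        return (ssize_t) done;
    }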
 
I am hoping we can
generate the encryption bit stream in chunks earlier so we can just
do the XOR as we are writing the data to the WAL buffers.

For pure CTR that sounds doable as it'd be the same as doing an XOR with encrypted zero. Anything with a built-in MAC like GCM would not work though (I'm not proposing we use that, just keeping it in mind).

You'd also increase your memory requirements (one allocation for the encryption bit stream and one for the encrypted data, right?).
 
> Better to pay the encryption cost at the time of WAL record creation and keep
> the writing process as fast and simple as possible.

Yes, I don't think we know at the time of WAL record creation what
_offset_ the records will have when they are written to WAL, so I am
thinking we need to do it later, and as I said, I am hoping we can
generate the encryption bit stream earlier.

>     > It works but I think the implementation might be complex; For example
>     > using openssl, we would use EVP functions to encrypt data by
>     > AES-256-CTR. We would need to make IV and pass it to them and these
>     > functions however don't manage the counter value of nonce as long as I
>     > didn't miss. That is, we need to calculate the correct counter value
>     > for each encryption and pass it to EVP functions. Suppose we encrypt
>     > 20 bytes of WAL. The first 16 bytes is encrypted with nonce of
>     > (segment_number, 0) and the next 4 bytes is encrypted with nonce of
>     > (segment_number, 1). After that suppose we encrypt 12 bytes of WAL. We
>     > cannot use nonce of (segment_number, 2) but should use nonce of
>     > (segment_number , 1). Therefore we would need 4 bytes padding and to
>     > encrypt it and then to throw that 4 bytes away .
>
>     Since we want to have per-byte control over encryption, for both
>     heap/index pages (skip LSN and CRC), and WAL (encrypt to the last byte),
>     I assumed we would need to generate a bit stream of a specified size and
>     do the XOR ourselves against the data.  I assume ssh does this, so we
>     would have to study the method.
>
>
> The lower level non-EVP OpenSSL functions allow specifying the offset within
> the 16-byte AES block from which the encrypt/decrypt should proceed. It's the
> "num" parameter of their encrypt/decrypt functions. For a continuous encrypted
> stream such as a WAL file, a "pread(...)" of a possibly non-16-byte aligned
> section would involve determining the 16-byte counter (byte_offset / 16) and
> the intra-block offset (byte_offset % 16). I'm not sure how one handles
> initializing the internal encrypted counter and that might be one more step
> that would need to be done. But it's definitely possible to read / write less than
> a block via those APIs (not the EVP ones).
>
> I don't think the EVP functions have parameters for the intra-block offset but
> you can mimic it by initializing the IV/block counter and then skipping over
> the intra-block offset by either reading or writing a dummy partial block. The
> EVP read and write functions both deal with individual bytes so once you've
> seeked to your desired offset you can read or write the real individual bytes.

Can we generate the bit stream in 1MB chunks or something and just XOR
as needed?

With the provisos above, yes I think that would work though I don't think it's a good idea. Better to start off using the functions directly and then look into optimizing only if they're a bottleneck. As a first pass I'd break it up as separate writes with the encryption happening at write time. If that works fine there's no need to complicate things further.
 
Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/ 
On Wed, Aug  7, 2019 at 08:56:18AM -0400, Sehrope Sarkuni wrote:
> On Mon, Aug 5, 2019 at 9:02 PM Bruce Momjian <bruce@momjian.us> wrote:
> 
>     On Wed, Jul 31, 2019 at 09:25:01AM -0400, Sehrope Sarkuni wrote:
>     > Even if we do not include a separate per-relation salt or things like
>     > relfilenode when generating a derived key, we can still include other
>     types of
>     > immutable attributes. For example the fork type could be included to
>     eventually
>     > allow multiple forks for the same relation to be encrypted with the same
>     IV =
>     > LSN + Page Number as the derived key per-fork would be distinct.
> 
>     Yes, the fork number could be useful in this case.  I was thinking we
>     would just leave the extra bits as zeros and we can then set it to '1'
>     or something else for a different fork.
> 
> 
> Key derivation has more flexibility as you're not limited by the number of
> unused bits in the IV.

Understood, though I was not aware of the usefulness of key derivation
until this thread.

>     > WAL encryption should not use the same key as page encryption so there's
>     no
>     > need to design the IV to try to avoid matching the page IVs. Even a basic
>     > derivation with a single fixed WDEK = HKDF(MDEK, "WAL") and TDEK = HKDF
>     (MDEK,
>     > "PAGE") would ensure separate keys. That's the the literal string "WAL"
>     or
>     > "PAGE" being added as a salt to generate the respective keys, all that
>     matters
>     > is they're different.
> 
>     I was thinking the WAL would use the same key since the nonce is unique
>     between the two.  What value is there in using a different key?
> 
> 
> Never having to worry about overlap in Key + IV usage is the main advantage. While
> it's possible to structure IVs to keep that from happening, it's much easier
> to completely avoid that situation by ensuring different parts of an
> application are using separate derived keys.

Understood.

>     > Ideally WAL encryption would generating new derived keys as part of the
>     WAL
>     > stream. The WAL stream is not fixed so you have the luxury of being able
>     to add
>     > a "Use new random salt XZY going forward" records. Forcing generation of
>     a new
>     > salt/key upon promotion of a replica would ensure that at least the WAL
>     is
>     > unique going forward. Could also generate a new upon server startup,
>     after
> 
>     Ah, yes, good point, and using a derived key would make that easier.
>     The tricky part is what to use to create the new derived key, unless we
>     generate a random number and store that somewhere in the data directory,
>     but that might lead to fragility, so I am worried. 
> 
> 
> Simplest approach for derived keys would be to use immutable attributes of the
> WAL files as an input to the key derivation. Something like HKDF(MDEK, "WAL:" |

So, I am thinking we should use "WAL:" for WAL and "REL:" for heap/index
files.

> | timeline_id || wal_segment_num) should be fine for this as it is:

I considered using the timeline in the nonce, but then remembered that
in timeline switch, we _copy_ the part of the old WAL up to the timeline
switch to the new timeline;  see:


https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/transam/xlog.c;h=f55352385732c6b0124eff5265462f3883fe7435;hb=HEAD#l5502

   * Initialize the starting WAL segment for the new timeline. If the switch
   * happens in the middle of a segment, copy data from the last WAL segment
   * of the old timeline up to the switch point, to the starting WAL segment
   * on the new timeline.

We would need to decrypt/encrypt to do the copy, and I just wasn't sure
of the value of the timeline in the nonce.  One value to it is that if
there is some WAL generated after the timeline switch in the old
primary that isn't transferred, there would potentially be new data
encrypted with the same key/nonce in the new primary, but if that WAL is
not used, odds are it is gone/destroyed/inaccessible, or it would have
been used during the switchover, so it didn't seem worth worrying about.

One _big_ reason to add the timeline is if you had a standby that you
recovered and rolled forward only to a specific transaction, then
continued running it as a new primary.  In that case, you would have
different WAL encrypted with the same key/nonce, but that sounds like
the same as promoting two standbys, and we should just document not to
do it.

Maybe we need to consider this further.

> * Unique per WAL file
> * Known prior to writing to a given WAL file
> * Known prior to reading a given WAL file
> * Does not require any additional persistence

Agreed.

>     We have pg_rewind,
>     which allows to make the WAL go backwards.  What is the value in doing
>     this?
> 
> 
> Good point re: pg_rewind. Having key rotation records in the stream would
> complicate that as you'd have to jump back / forward to figure out which key to
> use. It's doable but much more complicated.

Yep.

> A unique WDEK per WAL file that is derived from the segment number would not
> have that problem. A unique key per-file means the IVs can all start at zero
> and the each file can be treated as one encrypted stream. Any encryption/
> decryption code would only need to touch the write/read callsites.

So, I am now wondering when we should be using a non-zero nonce to
start, and when we should be using derived keys.   Should we add the
page-number to the derived key for heap/index files too and just use the
LSN for the nonce, or add the LSN to the derived key too?

>     > every N bytes, or a new one for each new WAL file. There's much more
>     > flexibility compared to page encryption.
>     >
>     > As WAL is a single continuous stream, we can start the IV for each
>     derived WAL
>     > key from zero. There's no need to complicate it further as Key + IV will
>     never
>     > be reused.
> 
>     Uh, you want a new random key for each WAL file?  I was going to use the
>     WAL segment number as the nonce, which is always increasing, and easily
>     determined.  The file is 16MB.
> 
> Ideally yes as it would allow for multiple replicas promoted off the same
> primary to immediately diverge as each would have its own keys. I don't
> consider it a requirement but if it's possible without significant added
> complexity I say that's a win.

Yeah, it is probably lots of added complexity, and there would be
duplicates unless we got random numbers for heap/index pages too.  We
would then have to modify the heap/index page format, and then the
non-encrypted format might not fit in the encrypted format, and then we
can't do the conversion offline --- as you can see, the negatives pile
up.
 
> I'm still reading up on the file and record format to understand how complex
> that would be. Though given your point re: pg_rewind and the lack of handling
> for page encryption divergence when promoting multiple replicas, I doubt the
> complexity will be worth it.

Yep, there is _ideal_, and what is reasonable complexity to implement in
Postgres, while still remaining secure.

>     > If WAL is always written as full pages we need to ensure that the empty
>     parts
>     > of the page are actual zeros and not "encrypted zeroes". Otherwise an XOR
>     of
>     > the empty section of the first write of a page against a subsequent one
>     would
>     > give you the plain text.
> 
>     Right, I think we need the segment number as part of the nonce for WAL.
>
> +1 to using segment number but it's better as a derived key instead of coming
> up with new IV constructs and reusing the MDEK.

OK.

>     > The non-fixed size of the WAL allows for the addition of a MAC though I'm
>     not
>     > sure yet the best way to incorporate it. It could be part of each
>     encrypted
>     > record or its own summary record (providing a MAC for a series of WAL
>     records).
>     > After I've gone through this a bit more I'm looking to put together a
>     write up
>     > with this and some other thoughts in one place.
> 
>     I don't think we want to add a MAC at this point since the MAC for 8k
>     pages seems unattainable.
> 
> Even without a per-page MAC, a MAC at some level for WAL has its own benefits
> such as perfect corruption detection. It could be per-record, per-N-records,
> per-checkpoint, or per-file. The current WAL file format already handles
> arbitrary gaps so there is significantly more flexibility in adding it vs
> pages. I'm not saying it should be a requirement but, unlike pages, I would not
> rule it out just yet as it may not be that complicated.

We already have a CRC in the WAL that detects corruption, and that would
be encrypted, so it is a MAC.  It is an int32, so twice as many bits as
the heap/index page CRC --- better, but not great.  It would be pretty
trivial to increase that to 64 bite if desired.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Wed, Aug  7, 2019 at 07:40:05PM -0400, Sehrope Sarkuni wrote:
> With the provisos above, yes I think that would work though I don't think it's
> a good idea. Better to start off using the functions directly and then look
> into optimizing only if they're a bottleneck. As a first pass I'd break it up
> as separate writes with the encryption happening at write time. If that works
> fine there's no need to complicate things further.

OK, sounds like a plan!

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Tue, Jul  9, 2019 at 11:09:01AM -0400, Bruce Momjian wrote:
> On Tue, Jul  9, 2019 at 10:59:12AM -0400, Stephen Frost wrote:
> > * Bruce Momjian (bruce@momjian.us) wrote:
> > I agree that all of that isn't necessary for an initial implementation,
> > I was rather trying to lay out how we could improve on this in the
> > future and why having the keying done at a tablespace level makes sense
> > initially because we can then potentially move forward with further
> > segregation to improve the situation.  I do believe it's also useful in
> > its own right, to be clear, just not as nice since a compromised backend
> > could still get access to data in shared buffers that it really
> > shouldn't be able to, even broadly, see.
> 
> I think TDE is a feature of questionable value at best and the idea that
> we would fundamentally change the internals of Postgres to add more
> features to it seems very unlikely.  I realize we have to discuss it so
> we don't block reasonable future feature development.

I have a new crazy idea.  I know we concluded that allowing multiple
independent keys, e.g., per user, per table, didn't make sense since
they have to be unlocked all the time, e.g., for crash recovery and
vacuum freeze.

However, that assumes that all heap/index pages are encrypted, and all
of WAL.  What if we encrypted only the user-data part of the page, i.e.,
tuple data.  We left xmin/xmax unencrypted, and only stored the
encrypted part of that data in WAL, and didn't encrypt any more of WAL. 
That might allow crash recovery and the freeze part of VACUUM FREEZE to
work.  (I don't think we could vacuum since we couldn't read the index
pages to find the matching rows since the index values would be encrypted
too.  We might be able to not encrypt the tid in the index tuple.)

Is this something to consider in version one of this feature?  Probably
not, but later?  Never?  Would the information leakage be too great,
particularly from indexes?

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Greetings,

* Bruce Momjian (bruce@momjian.us) wrote:
> On Tue, Jul  9, 2019 at 11:09:01AM -0400, Bruce Momjian wrote:
> > On Tue, Jul  9, 2019 at 10:59:12AM -0400, Stephen Frost wrote:
> > > * Bruce Momjian (bruce@momjian.us) wrote:
> > > I agree that all of that isn't necessary for an initial implementation,
> > > I was rather trying to lay out how we could improve on this in the
> > > future and why having the keying done at a tablespace level makes sense
> > > initially because we can then potentially move forward with further
> > > segregation to improve the situation.  I do believe it's also useful in
> > > its own right, to be clear, just not as nice since a compromised backend
> > > could still get access to data in shared buffers that it really
> > > shouldn't be able to, even broadly, see.
> >
> > I think TDE is a feature of questionable value at best and the idea that
> > we would fundamentally change the internals of Postgres to add more
> > features to it seems very unlikely.  I realize we have to discuss it so
> > we don't block reasonable future feature development.
>
> I have a new crazy idea.  I know we concluded that allowing multiple
> independent keys, e.g., per user, per table, didn't make sense since
> they have to be unlocked all the time, e.g., for crash recovery and
> vacuum freeze.

I'm a bit confused as I never agreed that made any sense and I continue
to feel that it doesn't make sense to have one key for everything.

Crash recovery doesn't happen "all the time" and neither does vacuum
freeze, and autovacuum processes are independent of individual client
backends- we don't need to (and shouldn't) have the keys in shared
memory.

> However, that assumes that all heap/index pages are encrypted, and all
> of WAL.  What if we encrypted only the user-data part of the page, i.e.,
> tuple data.  We left xmin/xmax unencrypted, and only stored the
> encrypted part of that data in WAL, and didn't encrypt any more of WAL.

This is pretty much what Alvaro was suggesting a while ago, isn't it..?
Have just the user data be encrypted in the table and in the WAL stream.

> That might allow crash recovery and the freeze part of VACUUM FREEZE to
> work.  (I don't think we could vacuum since we couldn't read the index
> pages to find the matching rows since the index values would be encrypted
> too.  We might be able to not encrypt the tid in the index tuple.)

Why do we need the indexed values to vacuum the index..?  We don't
today, as I recall.  We would need the tids though, yes.

> Is this something to consider in version one of this feature?  Probably
> not, but later?  Never?  Would the information leakage be too great,
> particularly from indexes?

What would be leaking from the indexes..?  That an encrypted blob in the
index pointed to a given tid?  Wouldn't someone be able to see that same
information by looking directly at the relation too?

Thanks,

Stephen

On Thu, Aug 08, 2019 at 03:07:59PM -0400, Stephen Frost wrote:
>Greetings,
>
>* Bruce Momjian (bruce@momjian.us) wrote:
>> On Tue, Jul  9, 2019 at 11:09:01AM -0400, Bruce Momjian wrote:
>> > On Tue, Jul  9, 2019 at 10:59:12AM -0400, Stephen Frost wrote:
>> > > * Bruce Momjian (bruce@momjian.us) wrote:
>> > > I agree that all of that isn't necessary for an initial implementation,
>> > > I was rather trying to lay out how we could improve on this in the
>> > > future and why having the keying done at a tablespace level makes sense
>> > > initially because we can then potentially move forward with further
>> > > segregation to improve the situation.  I do believe it's also useful in
>> > > its own right, to be clear, just not as nice since a compromised backend
>> > > could still get access to data in shared buffers that it really
>> > > shouldn't be able to, even broadly, see.
>> >
>> > I think TDE is a feature of questionable value at best and the idea that
>> > we would fundamentally change the internals of Postgres to add more
>> > features to it seems very unlikely.  I realize we have to discuss it so
>> > we don't block reasonable future feature development.
>>
>> I have a new crazy idea.  I know we concluded that allowing multiple
>> independent keys, e.g., per user, per table, didn't make sense since
>> they have to be unlocked all the time, e.g., for crash recovery and
>> vacuum freeze.
>
>I'm a bit confused as I never agreed that made any sense and I continue
>to feel that it doesn't make sense to have one key for everything.
>
>Crash recovery doesn't happen "all the time" and neither does vacuum
>freeze, and autovacuum processes are independent of individual client
>backends- we don't need to (and shouldn't) have the keys in shared
>memory.
>

Don't people do physical replication / HA pretty much all the time?


>> However, that assumes that all heap/index pages are encrypted, and all
>> of WAL.  What if we encrypted only the user-data part of the page, i.e.,
>> tuple data.  We left xmin/xmax unencrypted, and only stored the
>> encrypted part of that data in WAL, and didn't encrypt any more of WAL.
>
>This is pretty much what Alvaro was suggesting a while ago, isn't it..?
>Have just the user data be encrypted in the table and in the WAL stream.
>

It's also moving us much closer to pgcrypto-style encryption ...

>> That might allow crash recovery and the freeze part of VACUUM FREEZE to
>> work.  (I don't think we could vacuum since we couldn't read the index
>> pages to find the matching rows since the index values would be encrypted
>> too.  We might be able to not encrypt the tid in the index tuple.)
>
>Why do we need the indexed values to vacuum the index..?  We don't
>today, as I recall.  We would need the tids though, yes.
>

Well, we also do collect statistics on the data, for example. But even
if we assume we wouldn't do that for encrypted indexes (which seems like
a pretty bad idea to me), you'd probably end up leaking information
about ordering of the values. Which is generally a pretty serious
information leak, AFAICS.

>> Is this something to consider in version one of this feature?  Probably
>> not, but later?  Never?  Would the information leakage be too great,
>> particularly from indexes?
>
>What would be leaking from the indexes..?  That an encrypted blob in the
>index pointed to a given tid?  Wouldn't someone be able to see that same
>information by looking directly at the relation too?
>

Ordering of values, for example. Depending on how exactly the data is
encrypted we might also be leaking information about which values are
equal, etc. It also seems quite a bit more expensive to use such an index.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



Greetings,

* Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:
> On Thu, Aug 08, 2019 at 03:07:59PM -0400, Stephen Frost wrote:
> >* Bruce Momjian (bruce@momjian.us) wrote:
> >>On Tue, Jul  9, 2019 at 11:09:01AM -0400, Bruce Momjian wrote:
> >>> On Tue, Jul  9, 2019 at 10:59:12AM -0400, Stephen Frost wrote:
> >>> > * Bruce Momjian (bruce@momjian.us) wrote:
> >>> > I agree that all of that isn't necessary for an initial implementation,
> >>> > I was rather trying to lay out how we could improve on this in the
> >>> > future and why having the keying done at a tablespace level makes sense
> >>> > initially because we can then potentially move forward with further
> >>> > segregation to improve the situation.  I do believe it's also useful in
> >>> > its own right, to be clear, just not as nice since a compromised backend
> >>> > could still get access to data in shared buffers that it really
> >>> > shouldn't be able to, even broadly, see.
> >>>
> >>> I think TDE is a feature of questionable value at best and the idea that
> >>> we would fundamentally change the internals of Postgres to add more
> >>> features to it seems very unlikely.  I realize we have to discuss it so
> >>> we don't block reasonable future feature development.
> >>
> >>I have a new crazy idea.  I know we concluded that allowing multiple
> >>independent keys, e.g., per user, per table, didn't make sense since
> >>they have to be unlocked all the time, e.g., for crash recovery and
> >>vacuum freeze.
> >
> >I'm a bit confused as I never agreed that made any sense and I continue
> >to feel that it doesn't make sense to have one key for everything.
> >
> >Crash recovery doesn't happen "all the time" and neither does vacuum
> >freeze, and autovacuum processes are independent of individual client
> >backends- we don't need to (and shouldn't) have the keys in shared
> >memory.
>
> Don't people do physical replication / HA pretty much all the time?

Strictly speaking, that isn't actually crash recovery, it's physical
replication / HA, and while those are certainly nice to have it's no
guarantee that they're required or that you'd want to have the same keys
for them- conceptually, at least, you could have WAL with one key that
both sides know and then different keys for the actual data files, if we
go with the approach where the WAL is encrypted with one key and then
otherwise is plaintext.

> >>However, that assumes that all heap/index pages are encrypted, and all
> >>of WAL.  What if we encrypted only the user-data part of the page, i.e.,
> >>tuple data.  We left xmin/xmax unencrypted, and only stored the
> >>encrypted part of that data in WAL, and didn't encrypt any more of WAL.
> >
> >This is pretty much what Alvaro was suggesting a while ago, isn't it..?
> >Have just the user data be encrypted in the table and in the WAL stream.
>
> It's also moving us much closer to pgcrypto-style encryption ...

Yes, it is, and there's good parts and bad parts to that, to be sure.

> >>That might allow crash recovery and the freeze part of VACUUM FREEZE to
> >>work.  (I don't think we could vacuum since we couldn't read the index
> >>pages to find the matching rows since the index values would be encrypted
> >>too.  We might be able to not encrypt the tid in the index tuple.)
> >
> >Why do we need the indexed values to vacuum the index..?  We don't
> >today, as I recall.  We would need the tids though, yes.
>
> Well, we also do collect statistics on the data, for example. But even
> if we assume we wouldn't do that for encrypted indexes (which seems like
> a pretty bad idea to me), you'd probably end up leaking information
> about ordering of the values. Which is generally a pretty serious
> information leak, AFAICS.

I agree entirely that order information would be bad to leak- but this
is all new ground here and we haven't actually sorted out what such a
partially encrypted btree would look like.  We don't actually have to
have the down-links in the tree be unencrypted to allow vacuuming of
leaf pages, after all.

> >>Is this something to consider in version one of this feature?  Probably
> >>not, but later?  Never?  Would the information leakage be too great,
> >>particularly from indexes?
> >
> >What would be leaking from the indexes..?  That an encrypted blob in the
> >index pointed to a given tid?  Wouldn't someone be able to see that same
> >information by looking directly at the relation too?
>
> Ordering of values, for example. Depending on how exactly the data is
> encrypted we might also be leaking information about which values are
> equal, etc. It also seems quite a bit more expensive to use such an index.

Using an encrypted index isn't going to be free.  It's not clear that
this would be much more expensive than if the entire index is encrypted,
or that people would actually be unhappy if there was such an additional
expense if it meant that they could have vacuum run without the keys.

Thanks,

Stephen

On Thu, Aug  8, 2019 at 03:07:59PM -0400, Stephen Frost wrote:
> Greetings,
> 
> * Bruce Momjian (bruce@momjian.us) wrote:
> > On Tue, Jul  9, 2019 at 11:09:01AM -0400, Bruce Momjian wrote:
> > > On Tue, Jul  9, 2019 at 10:59:12AM -0400, Stephen Frost wrote:
> > > > * Bruce Momjian (bruce@momjian.us) wrote:
> > > > I agree that all of that isn't necessary for an initial implementation,
> > > > I was rather trying to lay out how we could improve on this in the
> > > > future and why having the keying done at a tablespace level makes sense
> > > > initially because we can then potentially move forward with further
> > > > segregation to improve the situation.  I do believe it's also useful in
> > > > its own right, to be clear, just not as nice since a compromised backend
> > > > could still get access to data in shared buffers that it really
> > > > shouldn't be able to, even broadly, see.
> > > 
> > > I think TDE is a feature of questionable value at best and the idea that
> > > we would fundamentally change the internals of Postgres to add more
> > > features to it seems very unlikely.  I realize we have to discuss it so
> > > we don't block reasonable future feature development.
> > 
> > I have a new crazy idea.  I know we concluded that allowing multiple
> > independent keys, e.g., per user, per table, didn't make sense since
> > they have to be unlocked all the time, e.g., for crash recovery and
> > vacuum freeze.
> 
> I'm a bit confused as I never agreed that made any sense and I continue
> to feel that it doesn't make sense to have one key for everything.

I clearly explained why multiple keys, while desirable, have many
negatives.  If you want to address my replies, we can go over them
again.  What people want, and what we can reasonably accomplish, are two
different things.

> Crash recovery doesn't happen "all the time" and neither does vacuum
> freeze, and autovacuum processes are independent of individual client
> backends- we don't need to (and shouldn't) have the keys in shared
> memory.

Uh, I just don't know what that would look like, honestly.  I am trying
to get us toward something that is easily implemented and easy to
control.
 
> > However, that assumes that all heap/index pages are encrypted, and all
> > of WAL.  What if we encrypted only the user-data part of the page, i.e.,
> > tuple data.  We left xmin/xmax unencrypted, and only stored the
> > encrypted part of that data in WAL, and didn't encrypt any more of WAL. 
> 
> This is pretty much what Alvaro was suggesting a while ago, isn't it..?
> Have just the user data be encrypted in the table and in the WAL stream.

Well, I think he was saying that to reduce the overhead of encryption. 
I didn't see it as a way of allowing recovery and vacuum freeze.  My
exact reply was:

> Well, you would need to decide what WAL information needs to be secured.
> Is the fact an insert was performed on a table a security issue?
> Depends on your risks.  My point is that almost anything you do beyond
> cluster-level encryption either adds complexity that is bug-prone or
> fragile, or adds unacceptable overhead, or leaks security information.

> > That might allow crash recovery and the freeze part of VACUUM FREEZE to
> > work.  (I don't think we could vacuum since we couldn't read the index
> > pages to find the matching rows since the index values would be encrypted
> > too.  We might be able to not encrypt the tid in the index tuple.)
> 
> Why do we need the indexed values to vacuum the index..?  We don't
> today, as I recall.  We would need the tids though, yes.

Uh, well, if we are doing index cleaning via a sequential scan of the
index, which I think we have done for many years, I think just looking
at the tids should work.  However, I don't know if we ever adjust index
entries, like re-balancing the trees.

> > Is this something to consider in version one of this feature?  Probably
> > not, but later?  Never?  Would the information leakage be too great,
> > particularly from indexes?
> 
> What would be leaking from the indexes..?  That an encrypted blob in the
> index pointed to a given tid?  Wouldn't someone be able to see that same
> information by looking directly at the relation too?

Well, I assume we would encrypt the heap and its indexes.  For example,
if there is an employee table, and there is an index on the employee
last name and employee salary, it would be trivial to get a list of
employee salaries sorted by last name by just joining the tids, though
you would not know the last names.  That seems like an information leak
to me.  Plus, which tables were updated would be visible in WAL.  And we
would have issues with system tables, pg_statistics, and lots of other
complexity.

I can see value in eventually doing this, perhaps before we perform
cluster-wide encryption, but doing it without cluster-wide encryption
seems like it would leak too much information to be useful.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Thu, Aug  8, 2019 at 06:31:42PM -0400, Stephen Frost wrote:
> > >Crash recovery doesn't happen "all the time" and neither does vacuum
> > >freeze, and autovacuum processes are independent of individual client
> > >backends- we don't need to (and shouldn't) have the keys in shared
> > >memory.
> > 
> > Don't people do physical replication / HA pretty much all the time?
> 
> Strictly speaking, that isn't actually crash recovery, it's physical
> replication / HA, and while those are certainly nice to have it's no
> guarantee that they're required or that you'd want to have the same keys
> for them- conceptually, at least, you could have WAL with one key that
> both sides know and then different keys for the actual data files, if we
> go with the approach where the WAL is encrypted with one key and then
> otherwise is plaintext.

Uh, yes, you could have two encryption keys in the data directory, one
for heap/indexes, one for WAL, both unlocked with the same passphrase,
but what would be the value in that?

> > >>That might allow crash recovery and the freeze part of VACUUM FREEZE to
> > >>work.  (I don't think we could vacuum since we couldn't read the index
> > >>pages to find the matching rows since the index values would be encrypted
> > >>too.  We might be able to not encrypt the tid in the index tuple.)
> > >
> > >Why do we need the indexed values to vacuum the index..?  We don't
> > >today, as I recall.  We would need the tids though, yes.
> > 
> > Well, we also do collect statistics on the data, for example. But even
> > if we assume we wouldn't do that for encrypted indexes (which seems like
> > a pretty bad idea to me), you'd probably end up leaking information
> > about ordering of the values. Which is generally a pretty serious
> > information leak, AFAICS.
> 
> I agree entirely that order information would be bad to leak- but this
> is all new ground here and we haven't actually sorted out what such a
> partially encrypted btree would look like.  We don't actually have to
> have the down-links in the tree be unencrypted to allow vacuuming of
> leaf pages, after all.

Agreed, but I think we kind of know that the value in cluster-wide
encryption is different from multi-key encryption --- both have their
value, but right now cluster-wide is the easiest and simplest, and
probably meets more user needs than multi-key encryption.  If others
want to start scoping out what multi-key encryption would look like, we
can discuss it.  I personally would like to focus on cluster-wide
encryption for PG 13.

> > >>Is this something to consider in version one of this feature?  Probably
> > >>not, but later?  Never?  Would the information leakage be too great,
> > >>particularly from indexes?
> > >
> > >What would be leaking from the indexes..?  That an encrypted blob in the
> > >index pointed to a given tid?  Wouldn't someone be able to see that same
> > >information by looking directly at the relation too?
> > 
> > Ordering of values, for example. Depending on how exactly the data is
> > encrypted we might also be leaking information about which values are
> > equal, etc. It also seems quite a bit more expensive to use such an index.
> 
> Using an encrypted index isn't going to be free.  It's not clear that
> this would be much more expensive than if the entire index is encrypted,
> or that people would actually be unhappy if there was such an additional
> expense if it meant that they could have vacuum run without the keys.

Yes, I think information leakage is always going to keep multi-key
encryption from fulfilling all the features of cluster-wide encryption.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Thu, Aug 8, 2019 at 2:16 PM Bruce Momjian <bruce@momjian.us> wrote:
On Wed, Aug  7, 2019 at 08:56:18AM -0400, Sehrope Sarkuni wrote:
> Simplest approach for derived keys would be to use immutable attributes of the
> WAL files as an input to the key derivation. Something like HKDF(MDEK, "WAL:" |

So, I am thinking we should use "WAL:" for WAL and "REL:" for heap/index
files.

Sounds good. Any unique convention is fine. The main thing to keep in mind is that they're directly tied to the master key, so it's not possible to rotate them without changing the master key.

This is in contrast to saving a WDEK key to a file (similar to how the MDEK key would be saved) and unlocking it with the MDEK. That has more moving parts but would allow that key to be independent of the MDEK. In a later message Stephen refers to an example of a replica receiving encrypted WAL and applying it with a different MDEK for the page buffers. That's doable with an independent WDEK.
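
A sketch of what wrapping such an independent WDEK with the MDEK might
look like, assuming OpenSSL's RFC 3394 AES key wrap (all names here are
hypothetical):

    #include <openssl/evp.h>

    /*
     * Hypothetical sketch: wrap a 32-byte WDEK with the MDEK using
     * AES-256 key wrap (RFC 3394).  Output is 40 bytes, which could
     * then be stored in a file in the data directory.  Returns 1 on
     * success.
     */
    static int
    wrap_wdek(const unsigned char *mdek, const unsigned char *wdek,
              unsigned char *wrapped, int *wrapped_len)
    {
        EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
        int         outlen, tmplen;

        if (ctx == NULL)
            return 0;
        /* Key wrap mode must be explicitly allowed by OpenSSL. */
        EVP_CIPHER_CTX_set_flags(ctx, EVP_CIPHER_CTX_FLAG_WRAP_ALLOW);
        if (EVP_EncryptInit_ex(ctx, EVP_aes_256_wrap(), NULL, mdek, NULL) != 1 ||
            EVP_EncryptUpdate(ctx, wrapped, &outlen, wdek, 32) != 1 ||
            EVP_EncryptFinal_ex(ctx, wrapped + outlen, &tmplen) != 1)
        {
            EVP_CIPHER_CTX_free(ctx);
            return 0;
        }
        *wrapped_len = outlen + tmplen;
        EVP_CIPHER_CTX_free(ctx);
        return 1;
    }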
 
> | timeline_id || wal_segment_num) should be fine for this as it is:

I considered using the timeline in the nonce, but then remembered that
in timeline switch, we _copy_ the part of the old WAL up to the timeline
switch to the new timeline;  see:

    https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/transam/xlog.c;h=f55352385732c6b0124eff5265462f3883fe7435;hb=HEAD#l5502

   * Initialize the starting WAL segment for the new timeline. If the switch
   * happens in the middle of a segment, copy data from the last WAL segment
   * of the old timeline up to the switch point, to the starting WAL segment
   * on the new timeline.

We would need to decrypt/encrypt to do the copy, and I just wasn't sure
of the value of the timeline in the nonce.  One value to it is that if
there is some WAL generated after the timeline switch in the old
primary that isn't transferred, there would potentially be new data
encrypted with the same key/nonce in the new primary, but if that WAL is
not used, odds are it is gone/destroyed/inaccessible, or it would have
been used during the switchover, so it didn't seem worth worrying about.

One _big_ reason to add the timeline is if you had a standby that you
recovered and rolled forward only to a specific transaction, then
continued running it as a new primary.  In that case, you would have
different WAL encrypted with the same key/nonce, but that sounds like
the same as promoting two standbys, and we should just document not to
do it.

Maybe we need to consider this further.

Good points. Yes, anything short of generating a new key at promotion time will have these issues. If we're not going to do that, there's no point in adding the timeline id if it does not change anything. I had thought only the combo was unique, but it sounds like the segment number is unique on its own. One thing I like about a unique per-file key is that it simplifies the IV generation (i.e. it can start at zero).

What about discarding the rest of the WAL file at promotion and skipping to a new file? A random per-file key stored in the first page header would ensure that going forward all WAL data is encrypted differently. Combine that with independent WAL and MDEK keys and everything would be different between two replicas promoted from the same point.
 
> A unique WDEK per WAL file that is derived from the segment number would not
> have that problem. A unique key per-file means the IVs can all start at zero
> and the each file can be treated as one encrypted stream. Any encryption/
> decryption code would only need to touch the write/read callsites.

So, I am now wondering when we should be using a non-zero nonce to
start, and when we should be using derived keys.   Should we add the
page-number to the derived key for heap/index files too and just use the
LSN for the nonce, or add the LSN to the derived key too?

The main cost of using multiple keys is that you need to derive or unlock them for each usage.

A per-type, per-relation, or per-file derived key with the same non-repeating guarantees for the IV (ex: LSN + Page Number) is as secure but allows for caching all needed derived keys in memory (it's one per open file descriptor).
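
A trivial sketch of that per-descriptor caching, reusing the
hypothetical derive_wal_key() from the earlier HKDF sketch (the struct
and function names are invented):

    #include <stddef.h>
    #include <stdint.h>

    /* From the earlier HKDF sketch. */
    extern int derive_wal_key(const unsigned char *mdek, int mdek_len,
                              uint32_t timeline_id, uint64_t segment_num,
                              unsigned char *wdek, size_t wdek_len);

    /* One derived WDEK cached per open WAL segment. */
    typedef struct WalSegKey
    {
        uint64_t    segment_num;            /* cache tag */
        unsigned char key[32];              /* derived key for this segment */
    } WalSegKey;

    static void
    ensure_segment_key(WalSegKey *slot, const unsigned char *mdek,
                       uint32_t timeline_id, uint64_t segment_num)
    {
        /* Re-derive only when the descriptor moves to a new segment. */
        if (slot->segment_num != segment_num)
        {
            derive_wal_key(mdek, 32, timeline_id, segment_num,
                           slot->key, sizeof(slot->key));
            slot->segment_num = segment_num;
        }
    }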

Having page-level derived keys incorporating the LSN + Page Number and starting the per-page IV at zero works, but you'd have to perform an HKDF for each page read or write. A cache of those derived keys would be much larger (32 bytes per page), so presumably you're not going to have them all cached, or maybe you'd not bother with any caching.
  
> Even without a per-page MAC, a MAC at some level for WAL has its own benefits
> such as perfect corruption detection. It could be per-record, per-N-records,
> per-checkpoint, or per-file. The current WAL file format already handles
> arbitrary gaps so there is significantly more flexibility in adding it vs
> pages. I'm not saying it should be a requirement but, unlike pages, I would not
> rule it out just yet as it may not be that complicated.

We already have a CRC in the WAL that detects corruption, and that would
be encrypted, so it is a MAC. 

Encrypting a CRC does not make it a cryptographic MAC. It'd have problems similar to those discussed for the per-page CRC though it'd still be useful for basic corruption detection.
 
It is an int32, so twice as many bits as
the heap/index page CRC --- better, but not great. It would be pretty
trivial to increase that to 64 bite if desired.

"64 bite" is referring to "bit" or "byte"? ;-) I'm guessing bits...

For the WAL record CRC I think it makes sense to keep the shared WAL buffer format in place and leave it on the plaintext (rather than on the cipher text, as is being proposed for the page buffers). The WAL records are not fixed length, so if the rest of the stream is encrypted there is no way for a program without the key to figure out the record offsets. You would need *some* information, like the record size, left in plaintext. I'm not sure of the implications of that.

I still think there could be a separate full MAC at some aggregated level. Per-page seems like a good fit as that's how the writes happen and it could be calculated just before the actual per-page write.
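
As one possible shape for that (a sketch only; nothing here is settled),
a per-page WAL MAC could be as simple as HMAC-SHA256 over the encrypted
page with a dedicated, derived MAC key, which, unlike an encrypted CRC,
is a real cryptographic MAC:

    #include <openssl/evp.h>
    #include <openssl/hmac.h>

    /*
     * Hypothetical sketch: compute a 32-byte HMAC-SHA256 over an
     * (encrypted) 8k WAL page with a dedicated MAC key.  Returns 1 on
     * success.
     */
    static int
    wal_page_mac(const unsigned char *mac_key, int key_len,
                 const unsigned char *page, size_t page_len,
                 unsigned char *mac, unsigned int *mac_len)
    {
        return HMAC(EVP_sha256(), mac_key, key_len,
                    page, page_len, mac, mac_len) != NULL;
    }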

Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/

 
On Thu, Aug 8, 2019 at 6:31 PM Stephen Frost <sfrost@snowman.net> wrote:
Strictly speaking, that isn't actually crash recovery, it's physical
replication / HA, and while those are certainly nice to have it's no
guarantee that they're required or that you'd want to have the same keys
for them- conceptually, at least, you could have WAL with one key that
both sides know and then different keys for the actual data files, if we
go with the approach where the WAL is encrypted with one key and then
otherwise is plaintext.

I like the idea of separating the WAL key from the rest of the data files.  It'd all be unlocked by the MDEK and you'd still need derived keys per WAL-file, but disconnecting all that from the data files solves a lot of the problems with promoted replicas.

This would complicate cloning a replica as using a different MDEK would involve decrypting / encrypting everything rather than just copying the files. Even if that's not baked in a first version, the separation allows for eventually supporting that.
 
Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/

 
On Fri, Aug 9, 2019 at 10:25 AM Bruce Momjian <bruce@momjian.us> wrote:
>
> On Thu, Aug  8, 2019 at 06:31:42PM -0400, Stephen Frost wrote:
> > > >Crash recovery doesn't happen "all the time" and neither does vacuum
> > > >freeze, and autovacuum processes are independent of individual client
> > > >backends- we don't need to (and shouldn't) have the keys in shared
> > > >memory.
> > >
> > > Don't people do physical replication / HA pretty much all the time?
> >
> > Strictly speaking, that isn't actually crash recovery, it's physical
> > replication / HA, and while those are certainly nice to have it's no
> > guarantee that they're required or that you'd want to have the same keys
> > for them- conceptually, at least, you could have WAL with one key that
> > both sides know and then different keys for the actual data files, if we
> > go with the approach where the WAL is encrypted with one key and then
> > otherwise is plaintext.
>
> Uh, yes, you could have two encryption keys in the data directory, one
> for heap/indexes, one for WAL, both unlocked with the same passphrase,
> but what would be the value in that?
>
> > > >>That might allow crash recovery and the freeze part of VACUUM FREEZE to
> > > >>work.  (I don't think we could vacuum since we couldn't read the index
> > > >>pages to find the matching rows since the index values would be encrypted
> > > >>too.  We might be able to not encrypt the tid in the index typle.)
> > > >
> > > >Why do we need the indexed values to vacuum the index..?  We don't
> > > >today, as I recall.  We would need the tids though, yes.
> > >
> > > Well, we also do collect statistics on the data, for example. But even
> > > if we assume we wouldn't do that for encrypted indexes (which seems like
> > > a pretty bad idea to me), you'd probably end up leaking information
> > > about ordering of the values. Which is generally a pretty serious
> > > information leak, AFAICS.
> >
> > I agree entirely that order information would be bad to leak- but this
> > is all new ground here and we haven't actually sorted out what such a
> > partially encrypted btree would look like.  We don't actually have to
> > have the down-links in the tree be unencrypted to allow vacuuming of
> > leaf pages, after all.
>
> Agreed, but I think we kind of know that the value in cluster-wide
> encryption is different from multi-key encryption --- both have their
> value, but right now cluster-wide is the easiest and simplest, and
> probably meets more user needs than multi-key encryption.  If others
> want to start scoping out what multi-key encryption would look like, we
> can discuss it.  I personally would like to focus on cluster-wide
> encryption for PG 13.

I agree that cluster-wide is simpler but I'm not sure that it meets
the real needs of users. One example is re-encryption; if a key leak
happens, with cluster-wide encryption we end up re-encrypting the
whole database regardless of the amount of sensitive user data in it.
I think that's a big constraint for users because it's common that
the data that needs to be encrypted, such as a master table, doesn't
account for a large portion of the database. That's one reason why I
think finer-granularity encryption such as table/tablespace level is
required.

And in terms of feature development, would we implement
fine-granularity encryption in the future even if the first step is
cluster-wide encryption? And would both TDEs encrypt the same kinds
of database objects (i.e. only tables, indexes and WAL)? If so, how
would users choose between them depending on the case?

I imagined the case where we had cluster-wide encryption as the
first TDE feature. We would enable TDE at initdb time by specifying a
command-line parameter, TDE would then be enabled cluster-wide, and
all tables/indexes and WAL would automatically be encrypted. Then, if
we later implement finer-granularity encryption, how would we expose
it to users? WAL encryption and table/index encryption would be
enabled at the same time, but we would want to enable encryption for
particular tables/indexes after initdb. If cluster-wide encryption is
something like a shortcut for encrypting all tables/indexes, I
personally think that implementing the finer-granularity one first
and then using it to achieve the coarser granularity would be
easier.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



On Thu, Aug 08, 2019 at 06:31:42PM -0400, Stephen Frost wrote:
>Greetings,
>
>* Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:
>> On Thu, Aug 08, 2019 at 03:07:59PM -0400, Stephen Frost wrote:
>> >* Bruce Momjian (bruce@momjian.us) wrote:
>> >>On Tue, Jul  9, 2019 at 11:09:01AM -0400, Bruce Momjian wrote:
>> >>> On Tue, Jul  9, 2019 at 10:59:12AM -0400, Stephen Frost wrote:
>> >>> > * Bruce Momjian (bruce@momjian.us) wrote:
>> >>> > I agree that all of that isn't necessary for an initial implementation,
>> >>> > I was rather trying to lay out how we could improve on this in the
>> >>> > future and why having the keying done at a tablespace level makes sense
>> >>> > initially because we can then potentially move forward with further
>> >>> > segregation to improve the situation.  I do believe it's also useful in
>> >>> > its own right, to be clear, just not as nice since a compromised backend
>> >>> > could still get access to data in shared buffers that it really
>> >>> > shouldn't be able to, even broadly, see.
>> >>>
>> >>> I think TDE is feature of questionable value at best and the idea that
>> >>> we would fundmentally change the internals of Postgres to add more
>> >>> features to it seems very unlikely.  I realize we have to discuss it so
>> >>> we don't block reasonable future feature development.
>> >>
>> >>I have a new crazy idea.  I know we concluded that allowing multiple
>> >>independent keys, e.g., per user, per table, didn't make sense since
>> >>they have to be unlocked all the time, e.g., for crash recovery and
>> >>vacuum freeze.
>> >
>> >I'm a bit confused as I never agreed that made any sense and I continue
>> >to feel that it doesn't make sense to have one key for everything.
>> >
>> >Crash recovery doesn't happen "all the time" and neither does vacuum
>> >freeze, and autovacuum processes are independent of individual client
>> >backends- we don't need to (and shouldn't) have the keys in shared
>> >memory.
>>
>> Don't people do physical replication / HA pretty much all the time?
>
>Strictly speaking, that isn't actually crash recovery, it's physical
>replication / HA, and while those are certainly nice to have it's no
>guarantee that they're required or that you'd want to have the same keys
>for them- conceptually, at least, you could have WAL with one key that
>both sides know and then different keys for the actual data files, if we
>go with the approach where the WAL is encrypted with one key and then
>otherwise is plaintext.
>

Uh? IMHO not breaking physical replication / HA should be pretty much
required for any new feature, unless it's somehow obviously clear that
it's not needed for that particular feature. I very much doubt we can make
that conclusion for encrypted instances (at least I don't see why it would
be the case in general).

One reason is that those features are also used for backups, which I hope
we both agree is not an optional feature. Maybe it's possible to modify
pg_basebackup to re-encrypt all the data, but to do that it clearly needs
to know all encryption keys (although not necessarily on the same side).

>> >>However, that assumes that all heap/index pages are encrypted, and all
>> >>of WAL.  What if we encrypted only the user-data part of the page, i.e.,
>> >>tuple data.  We left xmin/xmax unencrypted, and only stored the
>> >>encrypted part of that data in WAL, and didn't encrypt any more of WAL.
>> >
>> >This is pretty much what Alvaro was suggesting a while ago, isn't it..?
>> >Have just the user data be encrypted in the table and in the WAL stream.
>>
>> It's also moving us much closer to pgcrypto-style encryption ...
>
>Yes, it is, and there's good parts and bad parts to that, to be sure.
>
>> >>That might allow crash recovery and the freeze part of VACUUM FREEZE to
>> >>work.  (I don't think we could vacuum since we couldn't read the index
>> >>pages to find the matching rows since the index values would be encrypted
>> >>too.  We might be able to not encrypt the tid in the index typle.)
>> >
>> >Why do we need the indexed values to vacuum the index..?  We don't
>> >today, as I recall.  We would need the tids though, yes.
>>
>> Well, we also do collect statistics on the data, for example. But even
>> if we assume we wouldn't do that for encrypted indexes (which seems like
>> a pretty bad idea to me), you'd probably end up leaking information
>> about ordering of the values. Which is generally a pretty serious
>> information leak, AFAICS.
>
>I agree entirely that order information would be bad to leak- but this
>is all new ground here and we haven't actually sorted out what such a
>partially encrypted btree would look like.  We don't actually have to
>have the down-links in the tree be unencrypted to allow vacuuming of
>leaf pages, after all.
>

Well, I'm not all that familiar with the btree code, but I still think you
can deduce an awful amount of information from having the leaf pages alone
(not sure if we could deduce a total order, but presumably yes).

>> >>Is this something considering in version one of this feature?  Probably
>> >>not, but later?  Never?  Would the information leakage be too great,
>> >>particularly from indexes?
>> >
>> >What would be leaking from the indexes..?  That an encrypted blob in the
>> >index pointed to a given tid?  Wouldn't someone be able to see that same
>> >information by looking directly at the relation too?
>>
>> Ordering of values, for example. Depending on how exactly the data is
>> encrypted we might also be leaking information about which values are
>> equal, etc. It also seems quite a bit more expensive to use such index.
>
>Using an encrypted index isn't going to be free.  It's not clear that
>this would be much more expensive than if the entire index is encrypted,
>or that people would actually be unhappy if there was such an additional
>expense if it meant that they could have vacuum run without the keys.
>

With whole-page encryption, the page would be decrypted when loading it
into shared buffers, and then accessed without encryption/decryption (at
least that's how it was proposed initially). I assume we wouldn't do that
when only encrypting the index keys (because that would mean anything that
accesses the index through shared buffers has to do the decryption,
including autovacuum et al). Which means you have to do decryption on each
index access (which you previously did not). IMHO that's a pretty clear
and significant additional overhead.

I know there were proposals to keep it encrypted in shared buffers, but
I'm not sure that's what we'll end up doing (I have not followed the
recent discussion all that closely, though).


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




On Fri, Aug 09, 2019 at 11:51:23PM +0900, Masahiko Sawada wrote:
>On Fri, Aug 9, 2019 at 10:25 AM Bruce Momjian <bruce@momjian.us> wrote:
>>
>> On Thu, Aug  8, 2019 at 06:31:42PM -0400, Stephen Frost wrote:
>> > > >Crash recovery doesn't happen "all the time" and neither does vacuum
>> > > >freeze, and autovacuum processes are independent of individual client
>> > > >backends- we don't need to (and shouldn't) have the keys in shared
>> > > >memory.
>> > >
>> > > Don't people do physical replication / HA pretty much all the time?
>> >
>> > Strictly speaking, that isn't actually crash recovery, it's physical
>> > replication / HA, and while those are certainly nice to have it's no
>> > guarantee that they're required or that you'd want to have the same keys
>> > for them- conceptually, at least, you could have WAL with one key that
>> > both sides know and then different keys for the actual data files, if we
>> > go with the approach where the WAL is encrypted with one key and then
>> > otherwise is plaintext.
>>
>> Uh, yes, you could have two encryption keys in the data directory, one
>> for heap/indexes, one for WAL, both unlocked with the same passphrase,
>> but what would be the value in that?
>>
>> > > >>That might allow crash recovery and the freeze part of VACUUM FREEZE to
>> > > >>work.  (I don't think we could vacuum since we couldn't read the index
>> > > >>pages to find the matching rows since the index values would be encrypted
>> > > >>too.  We might be able to not encrypt the tid in the index typle.)
>> > > >
>> > > >Why do we need the indexed values to vacuum the index..?  We don't
>> > > >today, as I recall.  We would need the tids though, yes.
>> > >
>> > > Well, we also do collect statistics on the data, for example. But even
>> > > if we assume we wouldn't do that for encrypted indexes (which seems like
>> > > a pretty bad idea to me), you'd probably end up leaking information
>> > > about ordering of the values. Which is generally a pretty serious
>> > > information leak, AFAICS.
>> >
>> > I agree entirely that order information would be bad to leak- but this
>> > is all new ground here and we haven't actually sorted out what such a
>> > partially encrypted btree would look like.  We don't actually have to
>> > have the down-links in the tree be unencrypted to allow vacuuming of
>> > leaf pages, after all.
>>
>> Agreed, but I think we kind of know that the value in cluster-wide
>> encryption is different from multi-key encryption --- both have their
>> value, but right now cluster-wide is the easiest and simplest, and
>> probably meets more user needs than multi-key encryption.  If others
>> want to start scoping out what multi-key encryption would look like, we
>> can discuss it.  I personally would like to focus on cluster-wide
>> encryption for PG 13.
>
>I agree that cluster-wide is simpler but I'm not sure that it meets
>the real needs of users. One example is re-encryption; if a key leak
>happens, with cluster-wide encryption we end up re-encrypting the
>whole database regardless of the amount of sensitive user data in it.
>I think that's a big constraint for users because it's common that
>the data that needs to be encrypted, such as a master table, doesn't
>account for a large portion of the database. That's one reason why I
>think finer-granularity encryption such as table/tablespace level is
>required.
>

TBH I think it's mostly pointless to design for key leakage.

My understanding is that all this work is motivated by the assumption that
Bob can obtain access to the data directory (say, a backup of it). So if
he also manages to get access to the encryption key, we probably have to
assume he already has access to a current snapshot of the data directory,
which means any re-encryption is pretty futile.

What we can (and should) optimize for is key rotation, but as that only
changes the master key and not the actual encryption keys, the overhead is
pretty low.

We can of course support "forced" re-encryption, but I think it's
acceptable if that's fairly expensive as long as it can be throttled and
executed in the background (kinda similar to the patch to enable checksums
in the background).

>And in terms of feature development, would we implement
>fine-granularity encryption in the future even if the first step is
>cluster-wide encryption? And would both TDEs encrypt the same kinds
>of database objects (i.e. only tables, indexes and WAL)? If so, how
>would users choose between them depending on the case?
>
>I imagined the case where we had cluster-wide encryption as the
>first TDE feature. We would enable TDE at initdb time by specifying a
>command-line parameter, TDE would then be enabled cluster-wide, and
>all tables/indexes and WAL would automatically be encrypted. Then, if
>we later implement finer-granularity encryption, how would we expose
>it to users? WAL encryption and table/index encryption would be
>enabled at the same time, but we would want to enable encryption for
>particular tables/indexes after initdb. If cluster-wide encryption is
>something like a shortcut for encrypting all tables/indexes, I
>personally think that implementing the finer-granularity one first
>and then using it to achieve the coarser granularity would be
>easier.
>

Not sure, but I'd expect it to be the other way around, i.e. the more
granular encryption being more complicated. One reason is that with
cluster-wide you can just assume everything is encrypted and handle it the
same way, while with fine-grained encryption you need to know whether each
individual object is encrypted, maybe handle it in different ways, etc.

But that's just my guess, really.

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




On Fri, Aug  9, 2019 at 11:51:23PM +0900, Masahiko Sawada wrote:
> I agree that cluster-wide is simpler but I'm not sure that it meets
> the real needs of users. One example is re-encryption; if a key leak
> happens, with cluster-wide encryption we end up re-encrypting the
> whole database regardless of the amount of sensitive user data in it.
> I think that's a big constraint for users because it's common that
> the data that needs to be encrypted, such as a master table, doesn't
> account for a large portion of the database. That's one reason why I
> think finer-granularity encryption such as table/tablespace level is
> required.
>
> And in terms of feature development, would we implement
> fine-granularity encryption in the future even if the first step is
> cluster-wide encryption? And would both TDEs encrypt the same kinds
> of database objects (i.e. only tables, indexes and WAL)? If so, how
> would users choose between them depending on the case?
>
> I imagined the case where we had cluster-wide encryption as the
> first TDE feature. We would enable TDE at initdb time by specifying a
> command-line parameter, TDE would then be enabled cluster-wide, and
> all tables/indexes and WAL would automatically be encrypted. Then, if
> we later implement finer-granularity encryption, how would we expose
> it to users? WAL encryption and table/index encryption would be
> enabled at the same time, but we would want to enable encryption for
> particular tables/indexes after initdb. If cluster-wide encryption is
> something like a shortcut for encrypting all tables/indexes, I
> personally think that implementing the finer-granularity one first
> and then using it to achieve the coarser granularity would be
> easier.

I don't know how to move this thread forward, so I am going to try to
explain how I view it.  People want feature X and feature Y.  Feature X
seems straightforward to implement and use, and seems secure.  Feature Y
is none of those, so far.

When I explain that feature X is the direction we should go in for PG
13, people give more reasons they want feature Y, but don't give any
details about how the problems of implementation, use, and security can be
addressed.

I just don't see that continuing to discuss feature Y with no details on
how to implement it really helps, so I have no reply to these ideas.
People can talk about whatever feature they want on these lists, but I
have no way to help discussions that don't address facts.

I will close with what I have already stated, that what people want, and
what we can reasonably accomplish, are two different things.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Thu, Aug  8, 2019 at 10:34:26PM -0400, Sehrope Sarkuni wrote:
> On Thu, Aug 8, 2019 at 6:31 PM Stephen Frost <sfrost@snowman.net> wrote:
> 
>     Strictly speaking, that isn't actually crash recovery, it's physical
>     replication / HA, and while those are certainly nice to have it's no
>     guarantee that they're required or that you'd want to have the same keys
>     for them- conceptually, at least, you could have WAL with one key that
>     both sides know and then different keys for the actual data files, if we
>     go with the approach where the WAL is encrypted with one key and then
>     otherwise is plaintext.
> 
> 
> I like the idea of separating the WAL key from the rest of the data files. 
> It'd all be unlocked by the MDEK and you'd still need derived keys per
> WAL-file, but disconnecting all that from the data files solves a lot of the
> problems with promoted replicas.
> 
> This would complicate cloning a replica as using a different MDEK would involve
> decrypting / encrypting everything rather than just copying the files. Even if
> that's not baked in a first version, the separation allows for eventually
> supporting that.

OK, I can get behind that idea.  One cool idea would be for the WAL on
primary and standbys to use the same WAL key, but to use different
heap/index keys.  When the standby is promoted, there would be a way for
the WAL to start using a new encryption key, and the heap/index would
already be using its own encryption key.

Setting up such a system seems complicated.  The big problem is that the
base backup would use the primary's key, unless we allowed pg_basebackup
to decrypt/encrypt with a new heap/index key.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Fri, Aug  9, 2019 at 05:01:47PM +0200, Tomas Vondra wrote:
> I know there were proposals to keep it encrypted in shared buffers, but
> I'm not sure that's what we'll end up doing (I have not followed the
> recent discussion all that closely, though).

There is no plan to encrypt shared buffers.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Thu, Aug  8, 2019 at 10:17:53PM -0400, Sehrope Sarkuni wrote:
> On Thu, Aug 8, 2019 at 2:16 PM Bruce Momjian <bruce@momjian.us> wrote:
> 
>     On Wed, Aug  7, 2019 at 08:56:18AM -0400, Sehrope Sarkuni wrote:
>     > Simplest approach for derived keys would be to use immutable attributes
>     of the
>     > WAL files as an input to the key derivation. Something like HKDF(MDEK,
>     "WAL:" |
> 
>     So, I am thinking we should use "WAL:" for WAL and "REL:" for heap/index
>     files.
> 
> 
> Sounds good. Any unique convention is fine. Main thing to keep in mind is that
> they're directly tied to the master key so it's not possible to rotate them
> without changing the master key.

A recent email talked about using two different encryption keys for
heap/index and WAL, which allows for future features, and allows for key
rotation of the two independently.  (I already stated how hard key
rotation would be with WAL and pg_rewind.)

> This is in contrast to saving a WDEK key to a file (similar to how the MDEK key
> would be saved) and unlocking it with the MDEK. That has more moving parts but
> would allow that key to be independent of the MDEK. In a later message Stephen
> refers to an example of a replica receiving encrypted WAL and applying it with
> a different MDEK for the page buffers. That's doable with an independent WDEK.

I assumed we would call a command on boot to get a passphrase, which
would unlock the encryption keys.  Is there more being described above?

>     > | timeline_id || wal_segment_num) should be fine for this as it is:
> 
>     I considered using the timeline in the nonce, but then remembered that
>     in timeline switch, we _copy_ the part of the old WAL up to the timeline
>     switch to the new timeline;  see:
> 
>         https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/
>     backend/access/transam/xlog.c;h=f55352385732c6b0124eff5265462f3883fe7435;hb
>     =HEAD#l5502
> 
>        * Initialize the starting WAL segment for the new timeline. If the
>     switch
>        * happens in the middle of a segment, copy data from the last WAL
>     segment
>        * of the old timeline up to the switch point, to the starting WAL
>     segment
>        * on the new timeline.
> 
>     We would need to decrypt/encrypt to do the copy, and just wasn't sure of
>     the value of the timeline in the nonce.  One value to it is that if
>     there is some WAL generated after the timeline switch in the old
>     primary that isn't transferred, there would potentially be new data
>     encrypted with the same key/nonce in the new primary, but if that WAL is
>     not used, odds are it is gone/destroyed/inaccessible, or it would have
>     been used during the switchover, so it didn't seem worth worrying about.
> 
>     One _big_ reason to add the timeline is if you had a standby that you
>     recovered and rolled forward only to a specific transaction, then
>     continued running it as a new primary.  In that case, you would have
>     different WAL encrypted with the same key/nonce, but that sounds like
>     the same as promoting two standbys, and we should just document not to
>     do it.
> 
>     Maybe we need to consider this further.
> 
> 
> Good points. Yes, anything short of generating a new key at promotion time will
> have these issues. If we're not going to do that, no point in adding the
> timeline id if it does not change anything. I had thought only the combo was
> unique but sounds like the segment number is unique on its own. One thing I
> like about a unique per-file key is that it simplifies the IV generation (i.e.
> can start at zero).

Uh, well, the segment number is unique, _except_ for a timeline switch,
where the segment number exists in both the old timeline file and the
new timeline file, though the new timeline file has more WAL because
writes after the switch happen only in the new timeline.  So, in a way,
the WAL data is the same in the old and new timeline files, e.g.
000000010000000000000001 and 000000020000000000000001, but
000000010000000000000001 stops at the timeline switch and
000000020000000000000001 has more WAL data that represents cluster
activity since the timeline switch, though the two files are both 16MB.

> What about discarding the rest of the WAL file at promotion and skipping to a
> new file? A random per-file key in the first page header would ensure that
> going forward all WAL data is encrypted differently. Combine that with
> independent WAL and MDEK keys and everything would be different between two
> replicas promoted from the same point.

Yes, we could do that, but I am hesitant to change the WAL format just
for this, unless there is value to the random number.  We already
discussed that the LSN could be duplicated in different heap/index
files, so we would still have issues.

>     > A unique WDEK per WAL file that is derived from the segment number would
>     not
>     > have that problem. A unique key per-file means the IVs can all start at
>     zero
>     > and each file can be treated as one encrypted stream. Any encryption/
>     > decryption code would only need to touch the write/read callsites.
> 
>     So, I am now wondering when we should be using a non-zero nonce to
>     start, and when we should be using derived keys.   Should we add the
>     page-number to the derived key for heap/index files too and just use the
>     LSN for nonce, or add the LSN to the derived key too?
> 
> 
> The main cost of using multiple keys is that you need to derive or unlock them
> for each usage.

Right.

> A per-type, per-relation, or per-file derived key with the same non-repeating
> guarantees for the IV (ex: LSN + Page Number) is as secure but allows for
> caching all needed derived keys in memory (it's one per open file descriptor).

Yes, that is why I was trying to use a single encryption key.

> Having page-level derived keys incorporating the LSN + Page Number and starting
> the per-page IV at zero works, but you'd have to perform an HKDF for each page
> read or write. A cache of those derived keys would be much larger (32 bytes per
> page), so presumably you're not going to have them all cached, or maybe not
> bother with any caching.

The good news is that the heap/index writes mostly happen in the
background.  I think we are good just using an IV of LSN + page number
so we can keep the encryption simple, right?
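For concreteness, a sketch of what "IV of LSN + page number" could look
like as a 16-byte CTR counter block; the exact field layout here is an
illustrative assumption, not settled design:

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    /* 16-byte counter block: LSN in bytes 0-7, block number in bytes
     * 8-11, bytes 12-15 left zero for AES-CTR's intra-page block counter
     * (an 8kB page needs 512 16-byte blocks, well below 2^32). */
    static void
    heap_page_iv(uint8_t iv[16], uint64_t lsn, uint32_t blkno)
    {
        memset(iv, 0, 16);
        for (int i = 0; i < 8; i++)
            iv[i] = (uint8_t) (lsn >> (56 - 8 * i));
        for (int i = 0; i < 4; i++)
            iv[8 + i] = (uint8_t) (blkno >> (24 - 8 * i));
    }

    int
    main(void)
    {
        uint8_t iv[16];

        heap_page_iv(iv, UINT64_C(0x16B3A4C000), 7);
        for (int i = 0; i < 16; i++)
            printf("%02x", iv[i]);
        printf("\n");
        return 0;
    }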

>     > Even without a per-page MAC, a MAC at some level for WAL has its own
>     benefits
>     > such as perfect corruption detection. It could be per-record,
>     per-N-records,
>     > per-checkpoint, or per-file. The current WAL file format already handles
>     > arbitrary gaps so there is significantly more flexibility in adding it vs
>     > pages. I'm not saying it should be a requirement but, unlike pages, I
>     would not
>     > rule it out just yet as it may not be that complicated.
> 
>     We already have a CRC in the WAL that detects corruption, and that would
>     be encrypted, so it is a MAC. 
> 
> 
> Encrypting a CRC does not make it a cryptographic MAC. It'd have problems
> similar to those discussed for the per-page CRC though it'd still be useful for
> basic corruption detection.

OK. I thought only the CRC length was the problem with using it as a MAC.
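To illustrate the distinction: a CRC (encrypted or not) has no secret,
while a MAC does. A toy sketch of a keyed MAC over an 8k page using
OpenSSL's one-shot HMAC (the key would come from the key hierarchy; all
values here are placeholders):

    #include <stdio.h>
    #include <openssl/hmac.h>

    int
    main(void)
    {
        unsigned char key[32] = {0};    /* would be a derived MAC key */
        unsigned char page[8192] = {0}; /* stand-in for an 8k WAL page */
        unsigned char mac[32];
        unsigned int  mac_len;

        /* Unlike a CRC, an attacker without the key cannot recompute
         * this after modifying the page. */
        HMAC(EVP_sha256(), key, sizeof(key), page, sizeof(page),
             mac, &mac_len);

        for (unsigned int i = 0; i < mac_len; i++)
            printf("%02x", mac[i]);
        printf("\n");
        return 0;
    }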
  
> 
>     It is an int32, so twice as many bits as
>     the heap/index page CRC --- better, but not great. It would be pretty
> 
>     trivial to increase that to 64 bite if desired.
> 
> 
> "64 bite" is referring to "bit" or "byte"? ;-) I'm guessing bits...

Sorry, bits.

> For the WAL record CRC I think it makes sense to keep the shared WAL buffer
> format in place and leave it on the plaintext (rather than on the ciphertext,
> as is being proposed for the page buffers). The WAL records are not fixed
> length, so if the rest of the stream is encrypted there is no way for a program
> without the key to figure out the record offsets. We would need *some*
> information, like the record size, left in plaintext. Not sure of the
> implications of that.

OK, I am assuming we can encrypt when the WAL buffers are written to the
file system.

> I still think there could be a separate full MAC at some aggregated level.
> Per-page seems like a good fit as that's how the writes happen and it could be
> calculated just before the actual per-page write.

Well, since people can re-insert old pages without detection, and
because adding a per-page MAC to heap/index would change the page
format, meaning off-line encryption of pages would be very hard, it just
doesn't seem worth it.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Fri, Aug  9, 2019 at 10:54:51PM -0400, Bruce Momjian wrote:
> On Thu, Aug  8, 2019 at 10:17:53PM -0400, Sehrope Sarkuni wrote:
> > On Thu, Aug 8, 2019 at 2:16 PM Bruce Momjian <bruce@momjian.us> wrote:
> > 
> >     On Wed, Aug  7, 2019 at 08:56:18AM -0400, Sehrope Sarkuni wrote:
> >     > Simplest approach for derived keys would be to use immutable attributes
> >     of the
> >     > WAL files as an input to the key derivation. Something like HKDF(MDEK,
> >     "WAL:" |
> > 
> >     So, I am thinking we should use "WAL:" for WAL and "REL:" for heap/index
> >     files.
> > 
> > 
> > Sounds good. Any unique convention is fine. Main thing to keep in mind is that
> > they're directly tied to the master key so it's not possible to rotate them
> > without changing the master key.
> 
> A recent email talked about using two different encryption keys for
> heap/index and WAL, which allows for future features, and allows for key
> rotation of the two independently.  (I already stated how hard key
> rotation would be with WAL and pg_rewind.)

So, I just had an idea if we use separate encryption keys for
heap/index and for WAL --- we already know we will have an offline tool
that can rotate the passphrase or encryption keys.  If we allow the
encryption keys to be rotated independently, we can create a standby,
and immediately rotate its heap/index encryption key.  We can then start
streaming replication.  When we promote the standby to primary, we can
then shut it down and rotate the WAL encryption key --- the new primary
would then have no shared keys with the old primary.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Wed, Aug  7, 2019 at 08:56:18AM -0400, Sehrope Sarkuni wrote:
> On Mon, Aug 5, 2019 at 9:02 PM Bruce Momjian <bruce@momjian.us> wrote:
>     I was thinking the WAL would use the same key since the nonce is unique
>     between the two.  What value is there in using a different key?

> Never having to worry about overlap in Key + IV usage is main advantage. While
> it's possible to structure IVs to avoid that from happening, it's much easier
> to completely avoid that situation by ensuring different parts of an
> application are using separate derived keys.

Now that we are considering different encryption keys for heap/index
files and WAL, there is no chance of overlap, so it seems we can go back
to using a non-zero IV rather than derived keys.
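For concreteness, a sketch of one possible non-zero IV layout under a
dedicated WAL key (the field split is an illustrative assumption): the
segment number occupies the high half of the 16-byte counter block and the
low half counts 16-byte blocks within the segment, so no (key, IV) pair
ever repeats as long as segment numbers don't:

    #include <stdio.h>
    #include <stdint.h>

    /* 16-byte AES-CTR counter block: segment number in bytes 0-7,
     * 16-byte-block offset within the segment in bytes 8-15.  A 16MB
     * segment has 2^20 blocks, so the low half never carries into the
     * segment-number half. */
    static void
    wal_iv(uint8_t iv[16], uint64_t segno, uint64_t block_in_seg)
    {
        for (int i = 0; i < 8; i++)
        {
            iv[i]     = (uint8_t) (segno >> (56 - 8 * i));
            iv[8 + i] = (uint8_t) (block_in_seg >> (56 - 8 * i));
        }
    }

    int
    main(void)
    {
        uint8_t iv[16];

        wal_iv(iv, 42, 0);
        for (int i = 0; i < 16; i++)
            printf("%02x", iv[i]);
        printf("\n");
        return 0;
    }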

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Masahiko Sawada <sawada.mshk@gmail.com> wrote:

> Attached are the draft version patch sets of per-tablespace transparent
> data-at-rest encryption. The patch doesn't support full functionality,
> it includes:
>
> * Per tablespace encryption
> * Encryption and decryption buffer data when disk I/O.
> * 2 tier key hierarchy and key rotation
> * Temporary file encryption (based on the patch Antonin proposd)
> * System catalog encryption
> * Generic key management API and test module
> * Simple TAP test

I've checked your patch series to find out how to adjust [1] to make a
future merge easier.

The biggest issue I see is that front-end applications won't be able to load
the KMGR plugin. We also used some sort of external library in the first
version of [1], but when I was trying to make the front-ends aware of
encryption, I found out that dfmgr.c cannot be linked to them w/o significant
rework. So I gave up and moved the encrypt_block() and decrypt_block()
functions to the core.

A few more notes regarding key management:

* InitializeKmgr()

  ** the function probably does not have to acquire KeyringControlLock, for
  the same reason that load_keyring_file() does not (i.e. it's only called
  by postmaster during startup)

  ** the lines

         char *key = NULL;

     as well as

    /* Get the master key */
    key = KmgrPluginGetKey(KmgrCtl->masterKeyId);

    Assert(key != NULL);

  should be enclosed in #ifdef USE_ASSERT_CHECKING - #endif, otherwise I
  suppose (but haven't verified) the compiler will produce a warning that
  the variable is set but not used.

  Actually ERROR might be more suitable for external (loadable) KMGR plugin,
  but, as explained above, I'm not sure if such an approach is viable.

* KmgrPluginGetKey() only seems to deal with the master key, not with the
  tablespace keys, so I suggest that the name contain the word 'Master'.

* KmgrPluginRemoveKey() seems to be unused.

* KeyringCreateKey() - I wondered why the key is returned encrypted. Actually
  the only call of the function that I found is the one in CreateTableSpace(),
  and it does not use the return value at all. Shouldn't KeyringGetKey()
  handle creation of the key if it does not exist yet?

* KeyringAddKey() seems to be unused.

* keyring size (kmgr.c):

    /*
     * Since we have encryption keys per tablspace, we expect this value is enough
     * for most usecase.
     */
    #define KMGR_KEYRING_SIZE 128

    There's no guarantee that the number of tablespaces won't exceed any
    (reasonably low) constant value. The KMGR module should be able to
    allocate additional memory dynamically.
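For that last point, a minimal sketch of a keyring that grows on demand
instead of using the fixed KMGR_KEYRING_SIZE; plain malloc/realloc here for
brevity, where the real module would use the server's allocator and
locking:

    #include <stdlib.h>
    #include <string.h>

    typedef struct KeyringEntry
    {
        unsigned int  tablespace_oid;
        unsigned char key[32];
    } KeyringEntry;

    typedef struct Keyring
    {
        KeyringEntry *entries;
        size_t        nused;
        size_t        nalloc;
    } Keyring;

    /* Append a key, doubling the array when full. */
    static int
    keyring_add(Keyring *kr, unsigned int oid, const unsigned char key[32])
    {
        if (kr->nused == kr->nalloc)
        {
            size_t newsize = kr->nalloc ? kr->nalloc * 2 : 128;
            KeyringEntry *p = realloc(kr->entries, newsize * sizeof(*p));

            if (p == NULL)
                return 0;
            kr->entries = p;
            kr->nalloc = newsize;
        }
        kr->entries[kr->nused].tablespace_oid = oid;
        memcpy(kr->entries[kr->nused].key, key, 32);
        kr->nused++;
        return 1;
    }

    int
    main(void)
    {
        Keyring       kr = {0};
        unsigned char key[32] = {0};

        /* 1000 tablespaces, well past the fixed 128-slot limit. */
        for (unsigned int oid = 16384; oid < 16384 + 1000; oid++)
            if (!keyring_add(&kr, oid, key))
                return 1;
        free(kr.entries);
        return 0;
    }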


[1] https://commitfest.postgresql.org/23/2104/

--
Antonin Houska
Web: https://www.cybertec-postgresql.com



On Wed, Jul 10, 2019 at 08:07:49PM -0400, Bruce Momjian wrote:
> On Thu, Jul 11, 2019 at 12:18:47AM +0200, Tomas Vondra wrote:
> > On Wed, Jul 10, 2019 at 06:04:30PM -0400, Stephen Frost wrote:
> > > Greetings,
> > > 
> > > * Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:
> > > > On Wed, Jul 10, 2019 at 04:11:21PM -0400, Alvaro Herrera wrote:
> > > > >On 2019-Jul-10, Bruce Momjian wrote:
> > > > >
> > > > >>Uh, what if a transaction modifies page 0 and page 1 of the same table
> > > > >>--- don't those pages have the same LSN.
> > > > >
> > > > >No, because WAL being a physical change log, each page gets its own
> > > > >WAL record with its own LSN.
> > > > >
> > > > 
> > > > What if you have wal_log_hints=off? AFAIK that won't change the page LSN.
> > > 
> > > Alvaro suggested elsewhere that we require checksums for these, which
> > > would also force wal_log_hints to be on, and therefore the LSN would
> > > change.
> > > 
> > 
> > Oh, I see - yes, that would solve the hint bits issue. Not sure we want
> > to combine the features like this, though, as it increases the costs of
> > TDE. But maybe it's the best solution.
> 
> Uh, why can't we just force wal_log_hints for encrypted tables?  Why
> would we need to use checksums as well?

When we were considering CBC mode for heap/index pages, a change of a
hint bit would change all later 16-byte encrypted blocks.  Now that we
are using CTR mode, a change of a hint bit will only change a bit on the
stored page.

Someone could compare the old and new pages and see that a bit was
changed.  This would make wal_log_hints less of a requirement with CTR
mode, though leaking hint bit changes is probably not ideal.  Perhaps
wal_log_hints should be a recommendation and not a requirement.
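A toy demonstration of that CTR property (demo key/IV, not patch code):
flipping one plaintext bit changes exactly the corresponding ciphertext
bit, which is why the hint-bit change is visible to someone comparing page
images:

    #include <stdio.h>
    #include <openssl/evp.h>

    static void
    ctr_encrypt(unsigned char *out, const unsigned char *in, int len,
                const unsigned char *key, const unsigned char *iv)
    {
        EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
        int outl;

        EVP_EncryptInit_ex(ctx, EVP_aes_256_ctr(), NULL, key, iv);
        EVP_EncryptUpdate(ctx, out, &outl, in, len);
        EVP_CIPHER_CTX_free(ctx);
    }

    int
    main(void)
    {
        unsigned char key[32] = {0}, iv[16] = {0};  /* demo values only */
        unsigned char page[64] = {0}, c1[64], c2[64];

        ctr_encrypt(c1, page, sizeof(page), key, iv);
        page[10] ^= 0x01;                           /* "hint bit" flip */
        ctr_encrypt(c2, page, sizeof(page), key, iv);

        /* Only byte 10 differs, and only in that one bit. */
        for (int i = 0; i < 64; i++)
            if (c1[i] != c2[i])
                printf("byte %d differs: %02x vs %02x\n", i, c1[i], c2[i]);
        return 0;
    }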

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Thu, Jul 25, 2019 at 11:30:55PM -0400, Alvaro Herrera wrote:
> On 2019-Jul-25, Alvaro Herrera wrote:
> > On the other hand if the Key and IV are reused between messages then
> > the same plaintext will lead to the same ciphertext, so you can
> > potentially decrypt a message using a sufficiently large corpus of known
> > matching plaintext/ciphertext pairs, even without ever recovering the
> > key.
> 
> Actually the attack being described presumes that you know *both the*
> *unencrypted data and the encrypted data* for a certain key/IV pair,
> and only then you can decrypt some other data.  It doesn't follow that
> you can decrypt data just because somebody reused the IV for a second
> page ... I haven't seen any literature referenced that explains what
> this attack is.

I never addressed this exact comment.  If someone can guess at some
known heap/index format markers at specific offsets in a page, they can
XOR them with the encrypted data to recover the encryption keystream.
They could then XOR that keystream against another encrypted page at the
same offsets and with the same key/IV to see unencrypted user data if it
exists at the same page offsets.  (The all-zero empty space is a huge
known format marker area.)

This is why CTR is so sensitive to reuse of the key/IV settings for
encrypting different data.
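Here is a toy demonstration of that attack with illustrative values: two
different "pages" encrypted under the same key/IV, with the second
plaintext recovered using only the two ciphertexts and the known contents
of the first:

    #include <stdio.h>
    #include <openssl/evp.h>

    int
    main(void)
    {
        unsigned char key[32] = {1}, iv[16] = {0};  /* reused key/IV pair! */
        unsigned char p1[32] = "known-format-marker-area......";
        unsigned char p2[32] = "secret user data lives here...";
        unsigned char c1[32], c2[32], rec[32];
        EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
        int outl;

        EVP_EncryptInit_ex(ctx, EVP_aes_256_ctr(), NULL, key, iv);
        EVP_EncryptUpdate(ctx, c1, &outl, p1, sizeof(p1));
        EVP_EncryptInit_ex(ctx, EVP_aes_256_ctr(), NULL, key, iv);
        EVP_EncryptUpdate(ctx, c2, &outl, p2, sizeof(p2));
        EVP_CIPHER_CTX_free(ctx);

        /* c1 XOR p1 is the keystream; keystream XOR c2 is p2. */
        for (int i = 0; i < 32; i++)
            rec[i] = c1[i] ^ p1[i] ^ c2[i];

        printf("recovered: %.31s\n", rec);  /* prints the secret */
        return 0;
    }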

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Tue, Jul 30, 2019 at 04:48:31PM -0400, Bruce Momjian wrote:
> I am not even clear if pg_upgrade preserving relfilenode is possible ---
> when we wrap the relfilenode counter, does it start at 1 or at the
> first-user-relation-oid?  If the former, it could conflict with oids
> assigned to new system tables in later major releases.  Tying the
> preservation of relations to two restrictions seems risky.

For the curious, when relfilenode wraps, it starts at
FirstNormalObjectId, because GetNewRelFileNode eventually calls
GetNewObjectId(), so the concern above is wrong, though this is not an
issue anymore.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Wed, Jul 31, 2019 at 09:43:00AM -0400, Sehrope Sarkuni wrote:
> On Wed, Jul 31, 2019 at 2:32 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>     For WAL encryption, before flushing WAL we encrypt the whole 8k WAL page
>     and then write only the encrypted data of the new WAL record using
>     pg_pwrite() rather than writing the whole encrypted page. So each time we
>     encrypt the 8k WAL page we end up encrypting different data with the
>     same key+nonce, but since we don't write to the disk anything other than
>     the space where we actually wrote WAL records it's not a problem. Is
>     that right?
> 
> Ah, this is what I was referring to in my previous mail. I'm not familiar with
> how the writes happen yet (reading up...) but, yes, we would need to ensure
> that encrypted data is not written more than once (i.e. no writing of encrypt
> (zero) followed by writing of encrypt(non-zero) at the same spot).

No one replied to this comment, so I will state here that we never write
zeros to the WAL and go back and write something else --- the WAL is
append-only.  We might do that for heap/index pages, but they would get
a new LSN (and hence a new IV) when that happens.
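To state the property being relied on explicitly: with CTR the keystream
depends only on the key, IV, and byte offset, so re-encrypting the same 8k
page after appending a record yields byte-identical ciphertext for the
untouched prefix. A toy check with demo values:

    #include <stdio.h>
    #include <string.h>
    #include <openssl/evp.h>

    int
    main(void)
    {
        unsigned char key[32] = {0}, iv[16] = {0};  /* demo values only */
        unsigned char page[8192] = {0}, c1[8192], c2[8192];
        EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
        int outl;

        EVP_EncryptInit_ex(ctx, EVP_aes_256_ctr(), NULL, key, iv);
        EVP_EncryptUpdate(ctx, c1, &outl, page, sizeof(page));

        memset(page + 4096, 'x', 100);              /* "append" a record */
        EVP_EncryptInit_ex(ctx, EVP_aes_256_ctr(), NULL, key, iv);
        EVP_EncryptUpdate(ctx, c2, &outl, page, sizeof(page));
        EVP_CIPHER_CTX_free(ctx);

        /* The first 4096 ciphertext bytes are unchanged, so writing only
         * the new record's bytes never overwrites already-written
         * ciphertext with different ciphertext. */
        printf("prefix identical: %s\n",
               memcmp(c1, c2, 4096) == 0 ? "yes" : "no");
        return 0;
    }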

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Wed, Aug  7, 2019 at 08:56:18AM -0400, Sehrope Sarkuni wrote:
> On Mon, Aug 5, 2019 at 9:02 PM Bruce Momjian <bruce@momjian.us> wrote:
>     I don't think we want to add a MAC at this point since the MAC for 8k
>     pages seems unattainable.
> 
> Even without a per-page MAC, a MAC at some level for WAL has its own benefits
> such as perfect corruption detection. It could be per-record, per-N-records,
> per-checkpoint, or per-file. The current WAL file format already handles
> arbitrary gaps so there is significantly more flexibility in adding it vs
> pages. I'm not saying it should be a requirement but, unlike pages, I would not
> rule it out just yet as it may not be that complicated.

FYI, the WAL already has a CRC that detects corruption and
partially-written records (which are ignored and stop the reading of
WAL).

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Sat, Aug 10, 2019 at 08:06:17AM -0400, Bruce Momjian wrote:
> So, I just had an idea if we use separate encryption keys for
> heap/index and for WAL --- we already know we will have an offline tool
> that can rotate the passphrase or encryption keys.  If we allow the
> encryption keys to be rotated independently, we can create a standby,
> and immediately rotate its heap/index encryption key.  We can then start
> streaming replication.  When we promote the standby to primary, we can
> then shut it down and rotate the WAL encryption key --- the new primary
> would then have no shared keys with the old primary.

To help move this forward, I created a new wiki TDE section titled "TODO
for Full-Cluster Encryption" and marked some unresolved items with
question marks:

    https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#TODO_for_Full-Cluster_Encryption

I have also updated some of the other text to match conclusions we have
made.

I know some of the items are done, but if we have agreement on moving
forward, I can help with some of the missing code.  This looks doable
for PG 13 if we start soon.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Sat, Aug 10, 2019 at 12:18 AM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
>
> On Fri, Aug 09, 2019 at 11:51:23PM +0900, Masahiko Sawada wrote:
> >On Fri, Aug 9, 2019 at 10:25 AM Bruce Momjian <bruce@momjian.us> wrote:
> >>
> >> On Thu, Aug  8, 2019 at 06:31:42PM -0400, Stephen Frost wrote:
> >> > > >Crash recovery doesn't happen "all the time" and neither does vacuum
> >> > > >freeze, and autovacuum processes are independent of individual client
> >> > > >backends- we don't need to (and shouldn't) have the keys in shared
> >> > > >memory.
> >> > >
> >> > > Don't people do physical replication / HA pretty much all the time?
> >> >
> >> > Strictly speaking, that isn't actually crash recovery, it's physical
> >> > replication / HA, and while those are certainly nice to have it's no
> >> > guarantee that they're required or that you'd want to have the same keys
> >> > for them- conceptually, at least, you could have WAL with one key that
> >> > both sides know and then different keys for the actual data files, if we
> >> > go with the approach where the WAL is encrypted with one key and then
> >> > otherwise is plaintext.
> >>
> >> Uh, yes, you could have two encryption keys in the data directory, one
> >> for heap/indexes, one for WAL, both unlocked with the same passphrase,
> >> but what would be the value in that?
> >>
> >> > > >>That might allow crash recovery and the freeze part of VACUUM FREEZE to
> >> > > >>work.  (I don't think we could vacuum since we couldn't read the index
> >> > > >>pages to find the matching rows since the index values would be encrypted
> >> > > >>too.  We might be able to not encrypt the tid in the index typle.)
> >> > > >
> >> > > >Why do we need the indexed values to vacuum the index..?  We don't
> >> > > >today, as I recall.  We would need the tids though, yes.
> >> > >
> >> > > Well, we also do collect statistics on the data, for example. But even
> >> > > if we assume we wouldn't do that for encrypted indexes (which seems like
> >> > > a pretty bad idea to me), you'd probably end up leaking information
> >> > > about ordering of the values. Which is generally a pretty serious
> >> > > information leak, AFAICS.
> >> >
> >> > I agree entirely that order information would be bad to leak- but this
> >> > is all new ground here and we haven't actually sorted out what such a
> >> > partially encrypted btree would look like.  We don't actually have to
> >> > have the down-links in the tree be unencrypted to allow vacuuming of
> >> > leaf pages, after all.
> >>
> >> Agreed, but I think we kind of know that the value in cluster-wide
> >> encryption is different from multi-key encryption --- both have their
> >> value, but right now cluster-wide is the easiest and simplest, and
> >> probably meets more user needs than multi-key encryption.  If others
> >> want to start scoping out what multi-key encryption would look like, we
> >> can discuss it.  I personally would like to focus on cluster-wide
> >> encryption for PG 13.
> >
> >I agree that cluster-wide is simpler but I'm not sure that it meets
> >the real needs of users. One example is re-encryption; if a key leak
> >happens, with cluster-wide encryption we end up re-encrypting the
> >whole database regardless of the amount of sensitive user data in it.
> >I think that's a big constraint for users because it's common that
> >the data that needs to be encrypted, such as a master table, doesn't
> >account for a large portion of the database. That's one reason why I
> >think finer-granularity encryption such as table/tablespace level is
> >required.
> >
>
> TBH I think it's mostly pointless to design for key leakage.
>
> My understanding is that all this work is motivated by the assumption that
> Bob can obtain access to the data directory (say, a backup of it). So if
> he also manages to get access to the encryption key, we probably have to
> assume he already has access to a current snapshot of the data directory,
> which means any re-encryption is pretty futile.
>
> What we can (and should) optimize for is key rotation, but as that only
> changes the master key and not the actual encryption keys, the overhead is
> pretty low.
>
> We can of course support "forced" re-encryption, but I think it's
> acceptable if that's fairly expensive as long as it can be throttled and
> executed in the background (kinda similar to the patch to enable checksums
> in the background).

I'm not sure that we can ignore the risk of MDEK leakage. Once the
MDEK is leaked, for whatever reason, all that is left for an attacker
is to steal the data. A user who realizes that the MDEK has leaked
will have to re-encrypt the data. Even if the data has already been
stolen, the user will want to re-encrypt it to protect against
further attacks. KEK rotation is futile in this case.

>
> >And in terms of feature development, would we implement
> >fine-granularity encryption in the future even if the first step is
> >cluster-wide encryption? And would both TDEs encrypt the same kinds
> >of database objects (i.e. only tables, indexes and WAL)? If so, how
> >would users choose between them depending on the case?
> >
> >I imagined the case where we had cluster-wide encryption as the
> >first TDE feature. We would enable TDE at initdb time by specifying a
> >command-line parameter, TDE would then be enabled cluster-wide, and
> >all tables/indexes and WAL would automatically be encrypted. Then, if
> >we later implement finer-granularity encryption, how would we expose
> >it to users? WAL encryption and table/index encryption would be
> >enabled at the same time, but we would want to enable encryption for
> >particular tables/indexes after initdb. If cluster-wide encryption is
> >something like a shortcut for encrypting all tables/indexes, I
> >personally think that implementing the finer-granularity one first
> >and then using it to achieve the coarser granularity would be
> >easier.
> >
>
> Not sure, but I'd expect it to be the other way around, i.e. the more
> granular encryption being more complicated. One reason is that with
> cluster-wide you can just assume everything is encrypted and handle it the
> same way, while with fine-grained encryption you need to know whether each
> individual object is encrypted, maybe handle it in different ways, etc.
>
> But that's just my guess, really.
>

I meant the case where we want to implement both features (i.e.,
cluster-wide for encrypting everything and table/tablespace level for
finer-granularity encryption). If we want only one of them,
cluster-wide is easier, as you mentioned. But if we want both of
them, I think that implementing finer-granularity encryption first
and using it to achieve coarse-granularity encryption would be
easier.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



On Sat, Aug 10, 2019 at 1:19 AM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

> We can of course support "forced" re-encryption, but I think it's acceptable
> if that's fairly expensive as long as it can be throttled and executed in the
> background (kinda similar to the patch to enable checksums in the background).

As an alternative way to provide for a "forced" re-encryption couldn't you just run pg_dumpall + psql?

Regards,
--
Peter Smith
Fujitsu Australia



Bruce Momjian <bruce@momjian.us> wrote:

> On Sat, Aug 10, 2019 at 08:06:17AM -0400, Bruce Momjian wrote:
> > So, I just had an idea if we use separate encryption keys for
> > heap/index and for WAL --- we already know we will have an offline tool
> > that can rotate the passphrase or encryption keys.  If we allow the
> > encryption keys to be rotated independently, we can create a standby,
> > and immediately rotate its heap/index encryption key.  We can then start
> > streaming replication.  When we promote the standby to primary, we can
> > then shut it down and rotate the WAL encryption key --- the new primary
> > would then have no shared keys with the old primary.
>
> To help move this forward, I created a new wiki TDE section titled "TODO
> for Full-Cluster Encryption" and marked some unresolved items with
> question marks:
>
>     https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#TODO_for_Full-Cluster_Encryption
>
> I have also updated some of the other text to match conclusions we have
> made.
>
> I know some of the items are done, but if we have agreement on moving
> forward, I can help with some of the missing code.  This looks doable
> for PG 13 if we start soon.

I can work on it right away but don't know where to start.

First, I think we should use a code repository to integrate [1] and [2]
instead of sending diffs back and forth. That would force us to resolve
conflicts soon and help to avoid duplicate work. The diffs would be created
only when we need to post the next patch version to pgsql-hackers for review,
otherwise the discussions of details can take place elsewhere.

A separate branch can be created for the Full-Cluster Encryption at some point
- there are probably multiple branching strategies.

The most difficult problem I see now regarding the collaboration is agreement
on the key management user interface. The Full-Cluster Encryption feature [1]
should not add configuration variables or even tools that the next, more
sophisticated version [2] deprecates immediately. Part of the problem is that
[2] puts all (key management related) interaction of postgres with the
environment into an external library. As I pointed out in my response to [2],
this will not work for frontend applications (e.g. pg_waldump). I think the
key management UI for [2] needs to be designed first even if PG 13 should
adopt only [1].

At least it should be clear how [2] will retrieve the master key because [1]
should not do it in a different way. (The GUC cluster_passphrase_command
mentioned in [3] seems viable, although I think [1] uses an approach which is
more convenient if the passphrase should be read from the console.) Rotation
of the master key is another thing that both versions of the feature should
do in the same way. And of course, the frontend applications need a
consistent approach too.

I'm not too happy to start another (potentially long) discussion in this
already huge thread, but I think the UI stuff belongs to the -hackers list
rather than to an offline discussion.


[1] https://commitfest.postgresql.org/23/2104/

[2] https://www.postgresql.org/message-id/CAD21AoBjrbxvaMpTApX1cEsO%3D8N%3Dnc2xVZPB0d9e-VjJ%3DYaRnw%40mail.gmail.com

[3]
https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#TODO_for_Full-Cluster_Encryption

--
Antonin Houska
Web: https://www.cybertec-postgresql.com



On Wed, Aug 14, 2019 at 04:36:35PM +0200, Antonin Houska wrote:
> I can work on it right away but don't know where to start.

I think the big open question is whether there will be acceptance of an
all-cluster encryption feature.  I guess if no one objects, we can move
forward.

> First, I think we should use a code repository to integrate [1] and [2]
> instead of sending diffs back and forth. That would force us to resolve
> conflicts soon and help to avoid duplicate work. The diffs would be created
> only when we need to post the next patch version to pgsql-hackers for review,
> otherwise the discussions of details can take place elsewhere.

Well, we can do that, or just follow the TODO list and apply items as we
complete them.  We have found that doing everything in one big patch is
just too hard to review and get accepted.

> The most difficult problem I see now regarding the collaboration is agreement
> on the key management user interface. The Full-Cluster Encryption feature [1]
> should not add configuration variables or even tools that the next, more
> sophisticated version [2] deprecates immediately. Part of the problem is that

Yes, the all-cluster encryption feature has _no_ SQL-level API to
control it, just a GUC variable that you can use SHOW to see the
encryption mode.
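
For example, checking the mode might look like this (the variable name
here is purely illustrative, nothing is settled):

    SHOW data_encryption_cipher;
     data_encryption_cipher
    ------------------------
     aes-128
    (1 row)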

> [2] puts all (key management related) interaction of postgres with the
> environment into an external library. As I pointed out in my response to [2],
> this will not work for frontend applications (e.g. pg_waldump). I think the
> key management UI for [2] needs to be designed first even if PG 13 should
> adopt only [1].

I think there are several directions we can go after all-cluster
encryption, and it does matter because we would want minimal API
breakage.  The options are:

1)  Allow per-table encryption control to limit encryption overhead,
though all of WAL still needs to be encrypted;  we could add a
per-record encryption flag to WAL records to avoid that.

2)  Allow user-controlled keys, which are always unlocked, and encrypt
WAL with one key

3)  Encrypt only the user-data portion of pages with user-controlled
keys.  FREEZE and crash recovery work since only the user data is
encrypted.  WAL is not encrypted, except for the user-data portion

I think once we implement all-cluster encryption, there will be little
value to #1 unless we find that page encryption is a big performance
hit, which I think is unlikely based on performance tests so far.

I don't think #2 has much value since the keys have to always be
unlocked to allow freeze and crash recovery.

I don't think #3 is viable since there is too much information leakage,
particularly for indexes because the tid is visible.

Now, if someone says they still want 2 & 3, which has happened many
times, explain how these issues can be reasonably addressed.

I frankly think we will implement all-cluster encryption, and nothing
else.  I think the next big encryption feature after that will be
client-side encryption support, which can be done now but is complex; 
it needs to be easier.

> At least it should be clear how [2] will retrieve the master key because [1]
> should not do it in a different way. (The GUC cluster_passphrase_command
> mentioned in [3] seems viable, although I think [1] uses an approach which is
> more convenient if the passphrase should be read from the console.) Rotation of
> the master key is another thing that both versions of the feature should do in
> the same way. And of course, the frontend applications need a consistent
> approach too.

I don't see the value of an external library for key storage.

> I'm not too happy to start another (potentially long) discussion in this
> already huge thread, but I think the UI stuff belongs to the -hackers list
> rather than to an offline discussion.

I think the big question is whether we do anything, or just decide we
can't agree and stop.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Thu, Aug 15, 2019 at 10:19 AM Bruce Momjian <bruce@momjian.us> wrote:
>
> On Wed, Aug 14, 2019 at 04:36:35PM +0200, Antonin Houska wrote:
> > I can work on it right away but don't know where to start.
>
> I think the big open question is whether there will be acceptance of an
> all-cluster encryption feature.  I guess if no one objects, we can move
> forward.
>

I still feel that we need to have per-table/tablespace keys, although
that might not be in the first implementation. I think the safety of
table/tablespace-level and cluster-level encryption would be almost the
same, but the former would have an advantage in terms of operation and
performance.

> > First, I think we should use a code repository to integrate [1] and [2]
> > instead of sending diffs back and forth. That would force us to resolve
> > conflicts soon and help to avoid duplicate work. The diffs would be created
> > only when we need to post the next patch version to pgsql-hackers for review,
> > otherwise the discussions of details can take place elsewhere.
>
> Well, we can do that, or just follow the TODO list and apply items as we
> complete them.  We have found that doing everything in one big patch is
> just too hard to review and get accepted.
>
> > The most difficult problem I see now regarding the collaboration is agreement
> > on the key management user interface. The Full-Cluster Encryption feature [1]
> > should not add configuration variables or even tools that the next, more
> > sophisticated version [2] deprecates immediately. Part of the problem is that
>
> Yes, the all-cluster encryption feature has _no_ SQL-level API to
> control it, just a GUC variable that you can use SHOW to see the
> encryption mode.
>
> > [2] puts all (key management related) interaction of postgres with the
> > environment into an external library. As I pointed out in my response to [2],
> > this will not work for frontend applications (e.g. pg_waldump). I think the
> > key management UI for [2] needs to be designed first even if PG 13 should
> > adopt only [1].
>
> I think there are several directions we can go after all-cluster
> encryption, and it does matter because we would want minimal API
> breakage.  The options are:
>
> 1)  Allow per-table encryption control to limit encryption overhead,
> though all of WAL still needs to be encrypted;  we could add a
> per-record encryption flag to WAL records to avoid that.
>
> 2)  Allow user-controlled keys, which are always unlocked, and encrypt
> WAL with one key
>
> 3)  Encrypt only the user-data portion of pages with user-controlled
> keys.  FREEZE and crash recovery work since only the user data is
> encrypted.  WAL is not encrypted, except for the user-data portion
>
> I think once we implement all-cluster encryption, there will be little
> value to #1 unless we find that page encryption is a big performance
> hit, which I think is unlikely based on performance tests so far.
>
> I don't think #2 has much value since the keys have to always be
> unlocked to allow freeze and crash recovery.
>
> I don't think #3 is viable since there is too much information leakage,
> particularly for indexes because the tid is visible.
>
> Now, if someone says they still want 2 & 3, which has happened many
> times, explain how these issues can be reasonably addressed.
>
> I frankly think we will implement all-cluster encryption, and nothing
> else.  I think the next big encryption feature after that will be
> client-side encryption support, which can be done now but is complex;
> it needs to be easier.
>
> > At least it should be clear how [2] will retrieve the master key because [1]
> > should not do it in a different way. (The GUC cluster_passphrase_command
> > mentioned in [3] seems viable, although I think [1] uses an approach which is
> > more convenient if the passphrase should be read from the console.)

I think that we can also provide a way to pass the encryption key directly
to the postmaster rather than using a passphrase. Since it's common for
users to store keys in a KMS, it would be useful if we could do that.

> > Rotation of
> > the master key is another thing that both versions of the feature should do in
> > the same way. And of course, the frontend applications need a consistent
> > approach too.
>
> I don't see the value of an external library for key storage.

I think the big benefit is that PostgreSQL can seamlessly work with
external services such as a KMS. For instance, during key rotation
PostgreSQL can register a new key with the KMS and use it, and it can
remove keys when they are no longer necessary. That is, it enables
PostgreSQL not only to get keys from the KMS but also to register and
remove them. We could also decrypt the MDEK in the KMS instead of in
PostgreSQL, which is safer. In addition, once someone creates a plugin
library for an external service, individual projects don't need to
create their own.


BTW I've created a PoC patch for the cluster encryption feature. The
attached patch set implements some items of the TODO list, and some of
them can be used even for finer-granularity encryption. Anyway, the
implemented components are the following:

* Initialization stuff (initdb support). initdb has new command line
options: --enc-cipher and --cluster-passphrase-command. --enc-cipher
option accepts either aes-128 or aes-256 values while
--cluster-passphrase-command accepts an arbitrary command. ControlFile
has an integer indicating cluster encryption support, 'off', 'aes-128'
or 'aes-256'.

* 3-tier encryption keys. During initdb we create the KEK and MDEK and
write the metadata file (global/pg_kmgr). When the postmaster starts up,
it reads the kmgr file, verifies the passphrase using an HMAC, unwraps
the MDEK and derives the TDEK and WDEK from the MDEK. Currently the MDEK,
TDEK and WDEK are stored in shared memory as this is still a PoC, but we
could also keep them in process-local memory.

* All cryptographic functions are implemented using OpenSSL. Since
HKDF and key wrap were introduced in OpenSSL 1.1.0, it requires
1.1.0 or higher (see the derivation sketch after this list).

* Buffer encryption. All table and index data except for the vm and fsm
are transparently encrypted.
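
As a rough sketch of the HKDF derivation mentioned above, deriving the
TDEK or WDEK from the MDEK with OpenSSL 1.1.0+ could look like this
(function and label names are mine, not the code in the attached
patches):

    #include <string.h>
    #include <openssl/evp.h>
    #include <openssl/kdf.h>

    /* Derive a per-purpose key (e.g. TDEK or WDEK) from the MDEK.
     * "info" is a distinguishing label such as "TDEK" or "WDEK". */
    static int
    derive_key_from_mdek(unsigned char *mdek, size_t mdek_len,
                         const char *info,
                         unsigned char *out, size_t out_len)
    {
        EVP_PKEY_CTX *pctx = EVP_PKEY_CTX_new_id(EVP_PKEY_HKDF, NULL);
        int ok = pctx != NULL
            && EVP_PKEY_derive_init(pctx) > 0
            && EVP_PKEY_CTX_set_hkdf_md(pctx, EVP_sha256()) > 0
            && EVP_PKEY_CTX_set1_hkdf_key(pctx, mdek, mdek_len) > 0
            && EVP_PKEY_CTX_add1_hkdf_info(pctx, (unsigned char *) info,
                                           strlen(info)) > 0
            && EVP_PKEY_derive(pctx, out, &out_len) > 0;

        EVP_PKEY_CTX_free(pctx);
        return ok;
    }

Distinct info labels give independent-looking keys, but since both are
derived from the MDEK they cannot be rotated independently.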

Missing features so far are the following:

* WAL encryption
* Temporary file encryption
* Command-line tool to change passphrase (KEK key rotation)
* Front-end tool support (pg_waldump, pg_rewind)
* Documentation
* Regression tests

Since some of the above items are already implemented in other patches,
we can use them.

We can create a database cluster with cluster encryption enabled as follows:

$ initdb -D data --enc-cipher=aes-128 \
    --cluster-passphrase-command='echo "secret password"'
$ pg_controldata | grep encryption
Data encryption cipher:               aes-128
$ pg_ctl start

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Bruce Momjian <bruce@momjian.us> wrote:

> On Wed, Aug 14, 2019 at 04:36:35PM +0200, Antonin Houska wrote:
> > I can work on it right away but don't know where to start.
>
> I think the big open question is whether there will be acceptance of an
> all-cluster encryption feature.  I guess if no one objects, we can move
> forward.
>
> > First, I think we should use a code repository to integrate [1] and [2]
> > instead of sending diffs back and forth. That would force us to resolve
> > conflicts soon and help to avoid duplicate work. The diffs would be created
> > only when we need to post the next patch version to pgsql-hackers for review,
> > otherwise the discussions of details can take place elsewhere.
>
> Well, we can do that, or just follow the TODO list and apply items as we
> complete them.  We have found that doing everything in one big patch is
> just too hard to review and get accepted.
>
> > The most difficult problem I see now regarding the collaboration is agreement
> > on the key management user interface. The Full-Cluster Encryption feature [1]
> > should not add configuration variables or even tools that the next, more
> > sophisticated version [2] deprecates immediately. Part of the problem is that
>
> Yes, the all-cluster encryption feature has _no_ SQL-level API to
> control it, just a GUC variable that you can use SHOW to see the
> encryption mode.
>
> > [2] puts all (key management related) interaction of postgres with the
> > environment into an external library. As I pointed out in my response to [2],
> > this will not work for frontend applications (e.g. pg_waldump). I think the
> > key management UI for [2] needs to be designed first even if PG 13 should
> > adopt only [1].
>
> I think there are several directions we can go after all-cluster
> encryption,

I think I misunderstood. What you summarize in

https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#TODO_for_Full-Cluster_Encryption

does include

https://www.postgresql.org/message-id/CAD21AoBjrbxvaMpTApX1cEsO=8N=nc2xVZPB0d9e-VjJ=YaRnw@mail.gmail.com

i.e. per-tablespace keys, right? Then the collaboration should be easier than
I thought.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com



On Wed, Aug 14, 2019 at 09:19:44PM -0400, Bruce Momjian wrote:
> I think there are several directions we can go after all-cluster
> encryption, and it does matter because we would want minimal API
> breakage.  The options are:
> 
> 1)  Allow per-table encryption control to limit encryption overhead,
> though all of WAL still needs to be encrypted;  we could add a
> per-record encryption flag to WAL records to avoid that.
> 
> 2)  Allow user-controlled keys, which are always unlocked, and encrypt
> WAL with one key
> 
> 3)  Encrypt only the user-data portion of pages with user-controlled
> keys.  FREEZE and crash recovery work since only the user data is
> encrypted.  WAL is not encrypted, except for the user-data portion
> 
...
> I don't think #3 is viable since there is too much information leakage,
> particularly for indexes because the tid is visible.

Thinking some more, it might be possible to encrypt the index tid and
still have crash recovery and the freeze part of vacuum work, which might
be sufficient to allow the user keys to remain locked.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Thu, Aug 15, 2019 at 06:10:24PM +0900, Masahiko Sawada wrote:
> On Thu, Aug 15, 2019 at 10:19 AM Bruce Momjian <bruce@momjian.us> wrote:
> >
> > On Wed, Aug 14, 2019 at 04:36:35PM +0200, Antonin Houska wrote:
> > > I can work on it right away but don't know where to start.
> >
> > I think the big open question is whether there will be acceptance of an
> > all-cluster encryption feature.  I guess if no one objects, we can move
> > forward.
> >
> 
> I still feel that we need to have per-table/tablespace keys, although
> that might not be in the first implementation. I think the safety of
> table/tablespace-level and cluster-level encryption would be almost the
> same, but the former would have an advantage in terms of operation and
> performance.

I assume you are talking about my option #1.  I can see if you only need
a few tables encrypted, e.g., credit card numbers, it can be excessive
to encrypt the entire cluster.  (I think you would need to encrypt
pg_statistic too.)

The tricky part will be WAL --- if we encrypt all of WAL, the per-table
overhead might be minimal compared to the WAL encryption overhead.  The
better solution would be to add a flag to WAL records to indicate
encrypted entries, but you would then leak when an encryption change
happens, as well as the WAL record length.  (FYI, numeric values have
different lengths, as do character strings.)  I assume we would still use
a single key for all tables/indexes, and one for WAL, plus key rotation
requirements.
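
To make the flag idea concrete, it could be a spare bit in the record
header, roughly like this (the names and bit value are illustrative, not
taken from the actual xlogrecord.h):

    #include <stdbool.h>
    #include <stdint.h>

    #define XLR_INFO_ENCRYPTED  0x80    /* hypothetical: payload is encrypted */

    /* stand-in for the WAL record header, reduced to what the sketch needs */
    typedef struct DemoXLogRecord
    {
        uint32_t    xl_tot_len;         /* total record length */
        uint8_t     xl_info;            /* flag bits */
    } DemoXLogRecord;

    static bool
    record_is_encrypted(const DemoXLogRecord *rec)
    {
        return (rec->xl_info & XLR_INFO_ENCRYPTED) != 0;
    }

Even with such a flag, the record length itself stays visible, which is
the leak described above.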

I personally would like to see full cluster implemented first to find
out exactly what the overhead is.  As I stated earlier, the overhead of
determining which things to encrypt, both in code complexity, user
interface, and processing overhead, might not be worth it.

I can see why you would think that encrypting less would be easier than
encrypting more, but security boundaries are hard to construct, and
anything that requires a user API, even more so.

> > > At least it should be clear how [2] will retrieve the master key because [1]
> > should not do it in a different way. (The GUC cluster_passphrase_command
> > mentioned in [3] seems viable, although I think [1] uses an approach which is
> > more convenient if the passphrase should be read from the console.)
> 
> I think that we can also provide a way to pass the encryption key directly
> to the postmaster rather than using a passphrase. Since it's common for
> users to store keys in a KMS, it would be useful if we could do that.

Why would it not be simpler to have the cluster_passphrase_command run
whatever command-line program it wants?  If you don't want to use a
shell command, create an executable and call that.
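
For example, something like this in postgresql.conf (a hypothetical
sketch; the script name and whatever KMS lookup it performs are
illustrative):

    cluster_passphrase_command = '/usr/local/bin/fetch-kek-passphrase --key-id pg-master'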

> > > Rotation of
> > > the master key is another thing that both versions of the feature should do in
> > the same way. And of course, the frontend applications need a consistent
> > approach too.
> >
> > I don't see the value of an external library for key storage.
> 
> I think the big benefit is that PostgreSQL can seamlessly work with
> external services such as a KMS. For instance, during key rotation
> PostgreSQL can register a new key with the KMS and use it, and it can
> remove keys when they are no longer necessary. That is, it enables
> PostgreSQL not only to get keys from the KMS but also to register and
> remove them. We could also decrypt the MDEK in the KMS instead of in
> PostgreSQL, which is safer. In addition, once someone creates a plugin
> library for an external service, individual projects don't need to
> create their own.

I think the big win for an external library is when you don't want the
overhead of calling an external program.  For example, we certainly
would not want to call an external program while processing a query.  Do
we have any such requirements for encryption, especially since we only
are going to allow offline mode for encryption mode changes and key
rotation in the first version?

> BTW I've created a PoC patch for the cluster encryption feature. The
> attached patch set implements some items of the TODO list, and some of
> them can be used even for finer-granularity encryption. Anyway, the
> implemented components are the following:

Nice, thanks.

> * Initialization stuff (initdb support). initdb has new command line
> options: --enc-cipher and --cluster-passphrase-command. --enc-cipher
> option accepts either aes-128 or aes-256 values while
> --cluster-passphrase-command accepts an arbitrary command. ControlFile
> has an integer indicating cluster encryption support, 'off', 'aes-128'
> or 'aes-256'.

Nice.  If we get agreement we want to do this for PG 13, we can start
applying these patches.

> * 3-tier encryption keys. During initdb we create the KEK and MDEK and
> write the metadata file (global/pg_kmgr). When the postmaster starts up,
> it reads the kmgr file, verifies the passphrase using an HMAC, unwraps
> the MDEK and derives the TDEK and WDEK from the MDEK. Currently the MDEK,
> TDEK and WDEK are stored in shared memory as this is still a PoC, but we
> could also keep them in process-local memory.

Uh, I thought we were going to have the TDEK and WDEK be created
separately, rather than derived from a single key, so we could do key
rotation on them independently, which might help with promoting standby
servers.

For example, someone could create a standby, rotate the TDEK right away,
then, once the standby is promoted, they can rotate the WDEK and have a
server that never reuses keys from the old primary.  Is that not a
use-case worth worrying about?  Maybe we need to discuss that more.

Oh, here's an even better reason to use separate, non-derived keys for
TDEK and WDEK.  How would you rotate keys for a primary server and its
standbys?  If the TDEK and WDEK are derived from the same key, you could
not modify the TDEK independently of the WDEK.  However, if they are
decoupled, you could shut down and rotate the TDEK of each standby, then
switch-over to a standby and rotate the TDEK on the old primary.  Once
you have rotated all the TDEK keys, you could shut down all servers and
quickly rotate the WDEK.  (The WDEK has to be the same for streaming
replication to work.)
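
As a sketch of that sequence (the tool name and options are hypothetical;
so far the TODO list only mentions a passphrase-rotation tool):

    # rotate the table/index key on each standby while it is shut down
    $ pg_ctl -D /path/to/standby stop
    $ pg_rotatekey -D /path/to/standby --tdek
    $ pg_ctl -D /path/to/standby start

    # switch over, rotate the TDEK on the old primary the same way, then,
    # with all servers shut down, rotate the shared WAL key everywhere
    $ pg_rotatekey -D /path/to/primary --wdek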

> * All cryptographic functions are implemented using OpenSSL. Since
> HKDF and key wrap were introduced in OpenSSL 1.1.0, it requires
> 1.1.0 or higher.

Sure.

> * Buffer encryption. All table and index data except for the vm and fsm
> are transparently encrypted.

Nice.

> Missing features so far are the following:
> 
> * WAL encryption
> * Temporary file encryption
> * Command-line tool to change passphrase (KEK key rotation)

I think we need the command-line tool to also rotate TDEK and WDEK, if
we go in that direction.

> * Front-end tool support (pg_waldump, pg_rewind)
> * Documentation
> * Regression tests
> 
> Since some of the above items are already implemented in other patches,
> we can use them.
> 
> We can create a database cluster with cluster encryption enabled as follows:
> 
> $ initdb -D data --enc-cipher=aes-128 \
>     --cluster-passphrase-command='echo "secret password"'
> $ pg_controldata | grep encryption
> Data encryption cipher:               aes-128
> $ pg_ctl start

Nice!

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Thu, Aug 15, 2019 at 11:24:46AM +0200, Antonin Houska wrote:
> > I think there are several directions we can go after all-cluster
> > encryption,
> 
> I think I misunderstood. What you summarize in
> 
> https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#TODO_for_Full-Cluster_Encryption
> 
> does include
> 
> https://www.postgresql.org/message-id/CAD21AoBjrbxvaMpTApX1cEsO=8N=nc2xVZPB0d9e-VjJ=YaRnw@mail.gmail.com
> 
> i.e. per-tablespace keys, right? Then the collaboration should be easier than
> I thought.

No, there is a single tables/indexes key and a WAL key, plus keys for
rotation.  I explained why per-tablespace keys don't add much value.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Greetings,

* Bruce Momjian (bruce@momjian.us) wrote:
> On Thu, Aug 15, 2019 at 11:24:46AM +0200, Antonin Houska wrote:
> > > I think there are several directions we can go after all-cluster
> > > encryption,
> >
> > I think I misunderstood. What you summarize in
> >
> > https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#TODO_for_Full-Cluster_Encryption
> >
> > does include
> >
> > https://www.postgresql.org/message-id/CAD21AoBjrbxvaMpTApX1cEsO=8N=nc2xVZPB0d9e-VjJ=YaRnw@mail.gmail.com
> >
> > i.e. per-tablespace keys, right? Then the collaboration should be easier than
> > I thought.
>
> No, there is a single tables/indexes key and a WAL key, plus keys for
> rotation.  I explained why per-tablespace keys don't add much value.

Nothing in the discussion that I've seen, at least, has changed my
opinion that tablespace-based keys *would* add significant value,
particularly if it'd be difficult to support per-table keys.  Of course,
if we can get per-table keys without too much difficulty then that would
be better.

Thanks,

Stephen

Greetings,

* Bruce Momjian (bruce@momjian.us) wrote:
> On Thu, Aug 15, 2019 at 06:10:24PM +0900, Masahiko Sawada wrote:
> > On Thu, Aug 15, 2019 at 10:19 AM Bruce Momjian <bruce@momjian.us> wrote:
> > >
> > > On Wed, Aug 14, 2019 at 04:36:35PM +0200, Antonin Houska wrote:
> > > > I can work on it right away but don't know where to start.
> > >
> > > I think the big open question is whether there will be acceptance of an
> > > all-cluster encryption feature.  I guess if no one objects, we can move
> > > forward.
> >
> > I still feel that we need to have per-table/tablespace keys, although
> > that might not be in the first implementation. I think the safety of
> > table/tablespace-level and cluster-level encryption would be almost the
> > same, but the former would have an advantage in terms of operation and
> > performance.
>
> I assume you are talking about my option #1.  I can see if you only need
> a few tables encrypted, e.g., credit card numbers, it can be excessive
> to encrypt the entire cluster.  (I think you would need to encrypt
> pg_statistic too.)

Or we would need a separate encrypted pg_statistic, or a way to encrypt
certain entries inside pg_statistic.

> The tricky part will be WAL --- if we encrypt all of WAL, the per-table
> overhead might be minimal compared to the WAL encryption overhead.  The
> better solution would be to add a flag to WAL records to indicate
> encrypted entries, but you would then leak when an encryption change
> happens, as well as the WAL record length.  (FYI, numeric values have
> different lengths, as do character strings.)  I assume we would still use
> a single key for all tables/indexes, and one for WAL, plus key rotation
> requirements.

I don't think the fact that a change was done to an encrypted blob is an
actual 'leak'- anyone can tell that by looking at the encrypted
data before and after.  Further, the actual change would be encrypted,
right?  Length of data is necessary to include in the vast majority of
cases that the data is being dealt with and so I'm not sure that it
makes sense for us to be worrying about that as a leak, unless you have
a specific recommendation from a well known source discussing that
concern?

> I personally would like to see full cluster implemented first to find
> out exactly what the overhead is.  As I stated earlier, the overhead of
> determining which things to encrypt, both in code complexity, user
> interface, and processing overhead, might not be worth it.

I disagree with this and feel that the overhead that's being discussed
here (user interface, figuring out if we should encrypt it or not,
processing overhead for those determinations) is along the lines of
UNLOGGED tables, yet there wasn't any question about if that was a valid
or useful feature to implement.  The biggest challenge here is really
around key management and I agree that's difficult but it's also really
important and something that we need to be thinking about- and thinking
about how to work with multiple keys and not just one.  Building in an
assumption that we will only ever work with one key would make this
capability nothing more than DBA-managed filesystem-level encryption
(though even there different tablespaces could have different keys...)
and I worry would make later work to support multiple keys more
difficult and less likely to actually happen.  It's also not clear to me
why we aren't building in *some* mechanism to work with multiple keys
from the start as part of the initial design.

> I can see why you would think that encrypting less would be easier than
> encrypting more, but security boundaries are hard to construct, and
> anything that requires a user API, even more so.

I'm not sure I'm following here- I'm pretty sure everyone understands
that selective encryption will require more work to implement, in part
because an API needs to be put in place and we need to deal with
multiple keys, etc.  I don't think anyone thinks that'll be "easier".

> > > > At least it should be clear how [2] will retrieve the master key because [1]
> > > > should not do it in a different way. (The GUC cluster_passphrase_command
> > > > mentioned in [3] seems viable, although I think [1] uses an approach which is
> > > > more convenient if the passphrase should be read from the console.)
> >
> > I think that we can also provide a way to pass the encryption key directly
> > to the postmaster rather than using a passphrase. Since it's common for
> > users to store keys in a KMS, it would be useful if we could do that.
>
> Why would it not be simpler to have the cluster_passphrase_command run
> whatever command-line program it wants?  If you don't want to use a
> shell command, create an executable and call that.

Having direct integration with a KMS would certainly be valuable, and I
don't see a reason to deny users that option if someone would like to
spend time implementing it- in addition to a simpler mechanism such as a
passphrase command, which I believe is what was being suggested here.

> > > > Rotation of
> > > > the master key is another thing that both versions of the feature should do in
> > > > the same way. And of course, the frontend applications need a consistent
> > > > approach too.
> > >
> > > I don't see the value of an external library for key storage.
> >
> > I think the big benefit is that PostgreSQL can seamlessly work with
> > external services such as a KMS. For instance, during key rotation
> > PostgreSQL can register a new key with the KMS and use it, and it can
> > remove keys when they are no longer necessary. That is, it enables
> > PostgreSQL not only to get keys from the KMS but also to register and
> > remove them. We could also decrypt the MDEK in the KMS instead of in
> > PostgreSQL, which is safer. In addition, once someone creates a plugin
> > library for an external service, individual projects don't need to
> > create their own.
>
> I think the big win for an external library is when you don't want the
> overhead of calling an external program.  For example, we certainly
> would not want to call an external program while processing a query.  Do
> we have any such requirements for encryption, especially since we only
> are going to allow offline mode for encryption mode changes and key
> rotation in the first version?

The strong push for a stripped-down and "first version" that is
extremely limited is really grating on me as it seems we have quite a
few people who are interested in making progress here and a small number
of others who are pushing back and putting up limitations that "the
first version can't have X" or "the first version can't have Y".

I'm all for incremental development, but we need to be thinking about
the larger picture when we develop features and make sure that we don't
bake in assumptions that will later become very difficult for us to work
ourselves out of (especially when it comes to user interface and things
like GUCs...), but where we decide to draw a line shouldn't be based on
assumptions about what's going to be difficult and what isn't- let's let
those who want to work on this capability work on it and as we see the
progress, if there's issues which come up with a specific area that seem
likely to prove difficult to include, then we can consider backing away
from that while keeping it in mind while doing further development.

In other words, I feel like we're getting trapped here in a
"requirements definition" phase of a traditional waterfall-style
development cycle where we have to decide, up front, the EXACT set of features
and capabilities that we want and then we are going to expect people to
develop according to EXACTLY that set, and we'll shoot down anything
that comes across which is trying to do more or is trying to be more
flexible in anticipation of capabilities that we know we will want down
the road.  It's likely clear already but I'll say it anyway- I don't
think it's a good idea to go down that route.

Thanks,

Stephen

On Fri, Aug 16, 2019 at 10:01 AM Stephen Frost <sfrost@snowman.net> wrote:
>
> Greetings,
>
> * Bruce Momjian (bruce@momjian.us) wrote:
> > On Thu, Aug 15, 2019 at 06:10:24PM +0900, Masahiko Sawada wrote:
> > > On Thu, Aug 15, 2019 at 10:19 AM Bruce Momjian <bruce@momjian.us> wrote:
> > > >
> > > > On Wed, Aug 14, 2019 at 04:36:35PM +0200, Antonin Houska wrote:
> > > > > I can work on it right away but don't know where to start.
> > > >
> > > > I think the big open question is whether there will be acceptance of an
> > > > all-cluster encryption feature.  I guess if no one objects, we can move
> > > > forward.
> > >
> > > I still feel that we need to have per-table/tablespace keys, although
> > > that might not be in the first implementation. I think the safety of
> > > table/tablespace-level and cluster-level encryption would be almost the
> > > same, but the former would have an advantage in terms of operation and
> > > performance.
> >
> > I assume you are talking about my option #1.  I can see if you only need
> > a few tables encrypted, e.g., credit card numbers, it can be excessive
> > to encrypt the entire cluster.  (I think you would need to encrypt
> > pg_statistic too.)
>
> Or we would need a separate encrypted pg_statistic, or a way to encrypt
> certain entries inside pg_statistic.

I think we also need to encrypt other system catalogs. For instance
pg_proc might also have sensitive data in the prosrc column. So I think
it's better to encrypt all system catalogs rather than picking out some
catalogs, since it would not be a big overhead. Since system catalogs
are created during CREATE DATABASE by copying files, tablespace-level
or database-level encryption would be well suited to system catalog
encryption.

>
> > The tricky part will be WAL --- if we encrypt all of WAL, the per-table
> > overhead might be minimal compared to the WAL encryption overhead.  The
> > better solution would be to add a flag to WAL records to indicate
> > encrypted entries, but you would then leak when an encryption change
> > happens, as well as the WAL record length.  (FYI, numeric values have
> > different lengths, as do character strings.)  I assume we would still use
> > a single key for all tables/indexes, and one for WAL, plus key rotation
> > requirements.
>
> I don't think the fact that a change was done to an encrypted blob is an
> actual 'leak'- anyone can tell that by looking at the encrypted
> data before and after.  Further, the actual change would be encrypted,
> right?  Length of data is necessary to include in the vast majority of
> cases that the data is being dealt with and so I'm not sure that it
> makes sense for us to be worrying about that as a leak, unless you have
> a specific recommendation from a well known source discussing that
> concern?
>
> > I personally would like to see full cluster implemented first to find
> > out exactly what the overhead is.  As I stated earlier, the overhead of
> > determining which things to encrypt, both in code complexity, user
> > interface, and processing overhead, might not be worth it.
>
> I disagree with this and feel that the overhead that's being discussed
> here (user interface, figuring out if we should encrypt it or not,
> processing overhead for those determinations) is along the lines of
> UNLOGGED tables, yet there wasn't any question about if that was a valid
> or useful feature to implement.  The biggest challenge here is really
> around key management and I agree that's difficult but it's also really
> important and something that we need to be thinking about- and thinking
> about how to work with multiple keys and not just one.  Building in an
> assumption that we will only ever work with one key would make this
> capability nothing more than DBA-managed filesystem-level encryption
> (though even there different tablespaces could have different keys...)
> and I worry would make later work to support multiple keys more
> difficult and less likely to actually happen.  It's also not clear to me
> why we aren't building in *some* mechanism to work with multiple keys
> from the start as part of the initial design.
>
> > I can see why you would think that encrypting less would be easier than
> > encrypting more, but security boundaries are hard to construct, and
> > anything that requires a user API, even more so.
>
> I'm not sure I'm following here- I'm pretty sure everyone understands
> that selective encryption will require more work to implement, in part
> because an API needs to be put in place and we need to deal with
> multiple keys, etc.  I don't think anyone thinks that'll be "easier".
>
> > > > > At least it should be clear how [2] will retrieve the master key because [1]
> > > > > should not do it in a different way. (The GUC cluster_passphrase_command
> > > > > mentioned in [3] seems viable, although I think [1] uses an approach which is
> > > > > more convenient if the passphrase should be read from the console.)
> > >
> > > I think that we can also provide a way to pass the encryption key directly
> > > to the postmaster rather than using a passphrase. Since it's common for
> > > users to store keys in a KMS, it would be useful if we could do that.
> >
> > Why would it not be simpler to have the cluster_passphrase_command run
> > whatever command-line program it wants?  If you don't want to use a
> > shell command, create an executable and call that.
>
> Having direct integration with a KMS would certainly be valuable, and I
> don't see a reason to deny users that option if someone would like to
> spend time implementing it- in addition to a simpler mechanism such as a
> passphrase command, which I believe is what was being suggested here.
>
> > > > > Rotation of
> > > > > the master key is another thing that both versions of the feature should do in
> > > > > the same way. And of course, the frontend applications need a consistent
> > > > > approach too.
> > > >
> > > > I don't see the value of an external library for key storage.
> > >
> > > I think the big benefit is that PostgreSQL can seamlessly work with
> > > external services such as a KMS. For instance, during key rotation
> > > PostgreSQL can register a new key with the KMS and use it, and it can
> > > remove keys when they are no longer necessary. That is, it enables
> > > PostgreSQL not only to get keys from the KMS but also to register and
> > > remove them. We could also decrypt the MDEK in the KMS instead of in
> > > PostgreSQL, which is safer. In addition, once someone creates a plugin
> > > library for an external service, individual projects don't need to
> > > create their own.
> >
> > I think the big win for an external library is when you don't want the
> > overhead of calling an external program.  For example, we certainly
> > would not want to call an external program while processing a query.  Do
> > we have any such requirements for encryption, especially since we only
> > are going to allow offline mode for encryption mode changes and key
> > rotation in the first version?
>
> The strong push for a stripped-down and "first version" that is
> extremely limited is really grating on me as it seems we have quite a
> few people who are interested in making progress here and a small number
> of others who are pushing back and putting up limitations that "the
> first version can't have X" or "the first version can't have Y".
>
> I'm all for incremental development, but we need to be thinking about
> the larger picture when we develop features and make sure that we don't
> bake in assumptions that will later become very difficult for us to work
> ourselves out of (especially when it comes to user interface and things
> like GUCs...), but where we decide to draw a line shouldn't be based on
> assumptions about what's going to be difficult and what isn't- let's let
> those who want to work on this capability work on it and as we see the
> progress, if there's issues which come up with a specific area that seem
> likely to prove difficult to include, then we can consider backing away
> from that while keeping it in mind while doing further development.

I totally agree. That's why I pointed out the difficulty of supporting
finer-granularity encryption after cluster-wide encryption is in place,
and why I worried about backward compatibility. I think we need to
implement what users want while keeping it as simple as possible, even
if the problem is complex.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center





On Thu, Aug 15, 2019 at 8:21 PM Bruce Momjian <bruce@momjian.us> wrote:
> On Thu, Aug 15, 2019 at 11:24:46AM +0200, Antonin Houska wrote:
> > > I think there are several directions we can go after all-cluster
> > > encryption,
> >
> > I think I misunderstood. What you summarize in
> >
> > https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#TODO_for_Full-Cluster_Encryption
> >

Do we have any status of the TODOs, i.e. what has been done and what is
left? It would be much better if we had a link to the discussion of each
item.

> > does include
> >
> > https://www.postgresql.org/message-id/CAD21AoBjrbxvaMpTApX1cEsO=8N=nc2xVZPB0d9e-VjJ=YaRnw@mail.gmail.com
> >
> > i.e. per-tablespace keys, right? Then the collaboration should be easier than
> > I thought.
>
> No, there is a single tables/indexes key and a WAL key, plus keys for
> rotation.  I explained why per-tablespace keys don't add much value.


--
Ibrar Ahmed
On Thu, Aug 15, 2019 at 09:01:05PM -0400, Stephen Frost wrote:
> * Bruce Momjian (bruce@momjian.us) wrote:
> > I assume you are talking about my option #1.  I can see if you only need
> > a few tables encrypted, e.g., credit card numbers, it can be excessive
> > to encrypt the entire cluster.  (I think you would need to encrypt
> > pg_statistic too.)
> 
> Or we would need a separate encrypted pg_statistic, or a way to encrypt
> certain entries inside pg_statistic.

Yes.

> > The tricky part will be WAL --- if we encrypt all of WAL, the per-table
> > overhead might be minimal compared to the WAL encryption overhead.  The
> > better solution would be to add a flag to WAL records to indicate
> > encrypted entries, but you would then leak when an encryption change
> > happens, as well as the WAL record length.  (FYI, numeric values have
> > different lengths, as do character strings.)  I assume we would still use
> > a single key for all tables/indexes, and one for WAL, plus key rotation
> > requirements.
> 
> I don't think the fact that a change was done to an encrypted blob is an
> actual 'leak'- anyone can tell that by looking at the encrypted
> data before and after.  Further, the actual change would be encrypted,
> right?  Length of data is necessary to include in the vast majority of
> cases that the data is being dealt with and so I'm not sure that it
> makes sense for us to be worrying about that as a leak, unless you have
> a specific recommendation from a well known source discussing that
> concern?

Yes, it is a minor negative, but we would need to see some performance
reason to accept that minor negative, and I have already stated why I
think there might be no performance reason to do so.  Masahiko Sawada's
talk at PGCon 2019 supports that conclusion:

    https://www.youtube.com/watch?v=TXKoo2SNMzk

> > I personally would like to see full cluster implemented first to find
> > out exactly what the overhead is.  As I stated earlier, the overhead of
> > determining which things to encrypt, both in code complexity, user
> > interface, and processing overhead, might not be worth it.
> 
> I disagree with this and feel that the overhead that's being discussed
> here (user interface, figuring out if we should encrypt it or not,
> processing overhead for those determinations) is along the lines of
> UNLOGGED tables, yet there wasn't any question about if that was a valid
> or useful feature to implement.  The biggest challenge here is really

We implemented UNLOGGED tables because there was a clear performance win
in doing so.  I have not seen any measurements for encryption,
particularly when WAL is considered.

> around key management and I agree that's difficult but it's also really
> important and something that we need to be thinking about- and thinking
> about how to work with multiple keys and not just one.  Building in an
> assumption that we will only ever work with one key would make this
> capability nothing more than DBA-managed filesystem-level encryption

Agreed, that's all it is.

> (though even there different tablespaces could have different keys...)
> and I worry would make later work to support multiple keys more
> difficult and less likely to actually happen.  It's also not clear to me
> why we aren't building in *some* mechanism to work with multiple keys
> from the start as part of the initial design.

Well, every time I look at multiple keys, I go over exactly what that
means and how it behaves, but get no feedback on how to address the
problems.

> > I can see why you would think that encrypting less would be easier than
> > encrypting more, but security boundaries are hard to construct, and
> > anything that requires a user API, even more so.
> 
> I'm not sure I'm following here- I'm pretty sure everyone understands
> that selective encryption will require more work to implement, in part
> because an API needs to be put in place and we need to deal with
> multiple keys, etc.  I don't think anyone thinks that'll be "easier".

Uh, I thought Masahiko Sawada stated that, but looking back, I don't see
it, so I must be wrong.

> > > > > At least it should be clear how [2] will retrieve the master key because [1]
> > > > > should not do it in a different way. (The GUC cluster_passphrase_command
> > > > > mentioned in [3] seems viable, although I think [1] uses an approach which is
> > > > > more convenient if the passphrase should be read from the console.)
> > > 
> > > I think that we can also provide a way to pass the encryption key directly
> > > to the postmaster rather than using a passphrase. Since it's common for
> > > users to store keys in a KMS, it would be useful if we could do that.
> > 
> > Why would it not be simpler to have the cluster_passphrase_command run
> > whatever command-line program it wants?  If you don't want to use a
> > shell command, create an executable and call that.
> 
> Having direct integration with a KMS would certainly be valuable, and I
> don't see a reason to deny users that option if someone would like to
> spend time implementing it- in addition to a simpler mechanism such as a
> passphrase command, which I believe is what was being suggested here.

OK,  I am just trying to see why we would not use the
cluster_passphrase_command-like interface to do that.

> > > > > Rotation of
> > > > > the master key is another thing that both versions of the feature should do in
> > > > > the same way. And of course, the frontend applications need a consistent
> > > > > approach too.
> > > >
> > > > I don't see the value of an external library for key storage.
> > > 
> > > I think the big benefit is that PostgreSQL can seamlessly work with
> > > external services such as a KMS. For instance, during key rotation
> > > PostgreSQL can register a new key with the KMS and use it, and it can
> > > remove keys when they are no longer necessary. That is, it enables
> > > PostgreSQL not only to get keys from the KMS but also to register and
> > > remove them. We could also decrypt the MDEK in the KMS instead of in
> > > PostgreSQL, which is safer. In addition, once someone creates a plugin
> > > library for an external service, individual projects don't need to
> > > create their own.
> > 
> > I think the big win for an external library is when you don't want the
> > overhead of calling an external program.  For example, we certainly
> > would not want to call an external program while processing a query.  Do
> > we have any such requirements for encryption, especially since we only
> > are going to allow offline mode for encryption mode changes and key
> > rotation in the first version?
> 
> The strong push for a stripped-down and "first version" that is
> extremely limited is really grating on me as it seems we have quite a

Well, "grating" doesn't change any facts.  If you want to change that,
you will need to do as I stated earlier:

    https://www.postgresql.org/message-id/20190810021716.ovpqenqjb3b7uokc@momjian.us

> few people who are interested in making progress here and a small number
> of others who are pushing back and putting up limitations that "the
> first version can't have X" or "the first version can't have Y".
>
> I'm all for incremental development, but we need to be thinking about
> the larger picture when we develop features and make sure that we don't
> bake in assumptions that will later become very difficult for us to work
> ourselves out of (especially when it comes to user interface and things
> like GUCs...), but where we decide to draw a line shouldn't be based on
> assumptions about what's going to be difficult and what isn't- let's let
> those who want to work on this capability work on it and as we see the
> progress, if there's issues which come up with a specific area that seem
> likely to prove difficult to include, then we can consider backing away
> from that while keeping it in mind while doing further development.

I have seen no one present a clear description of how anything beyond
all-cluster encryption would work or be secure.  Wishing that were not
the case doesn't change things.

> In other words, I feel like we're getting trapped here in a
> "requirements definition" phase of a traditional waterfall-style
> development cycle where we have to decide, up front, the EXACT set of features
> and capabilities that we want and then we are going to expect people to
> develop according to EXACTLY that set, and we'll shoot down anything
> that comes across which is trying to do more or is trying to be more
> flexible in anticipation of capabilities that we know we will want down
> the road.  It's likely clear already but I'll say it anyway- I don't
> think it's a good idea to go down that route.

I will continue to shoot down whatever I think has no reasonable chance
of working.  I could just let it go and watch it fail, but I don't see
that as a good approach.

I will state what I have already told some people privately: for
this feature, we have many people understanding 40% of the problem, but
thinking they understand 90%.  I do agree we should plan for our
eventual full feature set, but I can't figure out what that feature set
looks like beyond full-cluster encryption, and no one is addressing my
concerns to move that forward.  Vague complaints that they don't like
the process are not changing that.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Fri, Aug 16, 2019 at 05:58:59PM +0500, Ibrar Ahmed wrote:
> 
> 
> On Thu, Aug 15, 2019 at 8:21 PM Bruce Momjian <bruce@momjian.us> wrote:
> 
>     On Thu, Aug 15, 2019 at 11:24:46AM +0200, Antonin Houska wrote:
>     > > I think there are several directions we can go after all-cluster
>     > > encryption,
>     >
>     > I think I misunderstood. What you summarize in
>     >
>     > https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#TODO_for_Full-Cluster_Encryption
>     >
> 
> Do we have any status of the TODOs, i.e. what has been done and what is
> left? It would be much better if we had a link to the discussion of each
> item.

I think some are done and some are in process, but I don't have a good
handle on that yet.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Bruce Momjian <bruce@momjian.us> wrote:

> I have seen no one present a clear description of how anything beyond
> all-cluster encryption would work or be secure.  Wishing that were not
> the case doesn't change things.

Since this email thread has grown a lot and is difficult to follow, it might
help if we summarized various approaches on the wiki, with their pros and
cons, and included some links to the corresponding emails in the
archive. There might be people who would like to think about the problems but
don't have time to read the whole thread. Overview of the pending problems of
particular approaches might be useful for newcomers, but also for people who
followed only part of the discussion. I mean an overview of the storage
problems; the key management seems to be less controversial.

If you think it makes sense, I can spend some time next week on the
research. However I'd need at least an outline of the approaches proposed
because I also missed some parts of the thread.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com



On Fri, Aug 16, 2019 at 07:47:37PM +0200, Antonin Houska wrote:
> Bruce Momjian <bruce@momjian.us> wrote:
> 
> > I have seen no one present a clear description of how anything beyond
> > all-cluster encryption would work or be secure.  Wishing that were not
> > the case doesn't change things.
> 
> Since this email thread has grown a lot and is difficult to follow, it might
> help if we summarized various approaches on the wiki, with their pros and
> cons, and included some links to the corresponding emails in the
> archive. There might be people who would like to think about the problems but
> don't have time to read the whole thread. Overview of the pending problems of
> particular approaches might be useful for newcomers, but also for people who
> followed only part of the discussion. I mean an overview of the storage
> problems; the key management seems to be less controversial.
> 
> If you think it makes sense, I can spend some time next week on the
> research. However I'd need at least an outline of the approaches proposed
> because I also missed some parts of the thread.

I suggest we schedule a voice call and I will go over all the issues and
explain why I came to the conclusions listed.  It is hard to know what
level of detail to explain that in an email, beyond what I have already
posted on this thread.  The only other option is to read all the emails
_I_ sent on the thread to get an idea.

I am able to do that for others as well.  

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Bruce Momjian <bruce@momjian.us> wrote:

> On Thu, Aug 15, 2019 at 09:01:05PM -0400, Stephen Frost wrote:
> > * Bruce Momjian (bruce@momjian.us) wrote:
> > > Why would it not be simpler to have the cluster_passphrase_command run
> > > whatever command-line program it wants?  If you don't want to use a
> > > shell command, create an executable and call that.
> > 
> > Having direct integration with a KMS would certainly be valuable, and I
> > don't see a reason to deny users that option if someone would like to
> > spend time implementing it- in addition to a simpler mechanism such as a
> > passphrase command, which I believe is what was being suggested here.
> 
> OK,  I am just trying to see why we would not use the
> cluster_passphrase_command-like interface to do that.

One problem that occurs to me is that PG may need to send some sort of
credentials to the KMS. If it runs a separate process to execute the command,
it needs to pass those credentials to it. Whether it does so via parameters
or environment variables, either can be seen by other users.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com



On Fri, Aug 16, 2019 at 06:04:39PM -0400, Bruce Momjian wrote:
> I suggest we schedule a voice call and I will go over all the issues and
> explain why I came to the conclusions listed.  It is hard to know what
> level of detail to explain that in an email, beyond what I have already
> posted on this thread.  The only other option is to read all the emails
> _I_ sent on the thread to get an idea.
> 
> I am able to do that for others as well.  

Also, people can certainly ask questions on this list, and I can answer
them, or I can do a Skype/Zoom/IRC chat/call if people want.  The points
of complexity are really the amount of data encrypted, the number of
keys and who controls them, and performance overhead.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Sat, Aug 17, 2019 at 08:16:06AM +0200, Antonin Houska wrote:
> Bruce Momjian <bruce@momjian.us> wrote:
> 
> > On Thu, Aug 15, 2019 at 09:01:05PM -0400, Stephen Frost wrote:
> > > * Bruce Momjian (bruce@momjian.us) wrote:
> > > > Why would it not be simpler to have the cluster_passphrase_command run
> > > > whatever command-line program it wants?  If you don't want to use a
> > > > shell command, create an executable and call that.
> > > 
> > > Having direct integration with a KMS would certainly be valuable, and I
> > > don't see a reason to deny users that option if someone would like to
> > > spend time implementing it- in addition to a simpler mechanism such as a
> > > passphrase command, which I believe is what was being suggested here.
> > 
> > OK,  I am just trying to see why we would not use the
> > cluster_passphrase_command-like interface to do that.
> 
> One problem that occurs to me is that PG may need to send some sort of
> credentials to the KMS. If it runs a separate process to execute the command,
> it needs to pass those credentials to it. Whether it does so via parameters or
> environment variables, both can be seen by other users.

Yes, that would be a good reason to use an external library, if we can't
figure out a clean API like opening a pipe into the command-line tool
and piping in the secret.
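
For illustration, a rough sketch of that pipe idea (hypothetical helper,
not part of any patch): the server runs the configured command via
popen() in write mode and feeds the credential over stdin, so the secret
never appears in argv or the environment:

    #include <stdio.h>
    #include <string.h>

    /*
     * Sketch: pass the credential to the KMS command on its stdin so it
     * is not visible in "ps" output or /proc/<pid>/environ.  A real
     * implementation would also read the returned key material back
     * from the command and handle errors properly.
     */
    static int
    feed_secret_to_command(const char *command, const char *secret)
    {
        FILE       *pipe = popen(command, "w");

        if (pipe == NULL)
            return -1;

        if (fwrite(secret, 1, strlen(secret), pipe) != strlen(secret))
        {
            pclose(pipe);
            return -1;
        }

        return pclose(pipe);    /* the command's exit status */
    }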

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +





On Sat, Aug 17, 2019 at 3:04 AM Bruce Momjian <bruce@momjian.us> wrote:
On Fri, Aug 16, 2019 at 07:47:37PM +0200, Antonin Houska wrote:
> Bruce Momjian <bruce@momjian.us> wrote:
>
> > I have seen no one present a clear description of how anything beyond
> > all-cluster encryption would work or be secure.  Wishing that were not
> > the case doesn't change things.
>
> Since this email thread has grown a lot and is difficult to follow, it might
> help if we summarized various approaches on the wiki, with their pros and
> cons, and included some links to the corresponding emails in the
archive. There might be people who would like to think about the problems but
> don't have time to read the whole thread. Overview of the pending problems of
> particular approaches might be useful for newcomers, but also for people who
> followed only part of the discussion. I mean an overview of the storage
> problems; the key management seems to be less controversial.
>
> If you think it makes sense, I can spend some time next week on the
> research. However I'd need at least an outline of the approaches proposed
> because I also missed some parts of the thread.

I suggest we schedule a voice call and I will go over all the issues and
explain why I came to the conclusions listed.  It is hard to know what
level of detail to explain that in an email, beyond what I have already
posted on this thread.  The only other option is to read all the emails
_I_ sent on the thread to get an idea.

+1 for a voice call, Bruce; we usually have a weekly TDE call. I will include you in
that call.  Currently, in that group are:

moon_insung_i3@lab.ntt.co.jp,
sawada.mshk@gmail.com,
shawn.wang@highgo.ca,
ahsan.hadi@highgo.ca,
ibrar.ahmad@gmail.com

I am able to do that for others as well. 

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +


--
Ibrar Ahmed
Greetings,

* Ibrar Ahmed (ibrar.ahmad@gmail.com) wrote:
> On Sat, Aug 17, 2019 at 3:04 AM Bruce Momjian <bruce@momjian.us> wrote:
> > +1 for voice call, bruce we usually have a weekly TDE call. I will include
> you in
> that call.  Currently, in that group are

> moon_insung_i3@lab.ntt.co.jp,
> sawada.mshk@gmail.com,
> shawn.wang@highgo.ca,
> ahsan.hadi@highgo.ca,
> ibrar.ahmad@gmail.com
>
> I am able to do that for others as well.

If you could add me to that call, I'll do my best to attend.

(If it's a gmail calendar invite, please send to frost.stephen.p @
gmail).

Thanks!

Stephen

Greetings,

* Bruce Momjian (bruce@momjian.us) wrote:
> On Sat, Aug 17, 2019 at 08:16:06AM +0200, Antonin Houska wrote:
> > Bruce Momjian <bruce@momjian.us> wrote:
> >
> > > On Thu, Aug 15, 2019 at 09:01:05PM -0400, Stephen Frost wrote:
> > > > * Bruce Momjian (bruce@momjian.us) wrote:
> > > > > Why would it not be simpler to have the cluster_passphrase_command run
> > > > > whatever command-line program it wants?  If you don't want to use a
> > > > > shell command, create an executable and call that.
> > > >
> > > > Having direct integration with a KMS would certainly be valuable, and I
> > > > don't see a reason to deny users that option if someone would like to
> > > > spend time implementing it- in addition to a simpler mechanism such as a
> > > > passphrase command, which I believe is what was being suggested here.
> > >
> > > OK,  I am just trying to see why we would not use the
> > > cluster_passphrase_command-like interface to do that.
> >
> > One problem that occurs to me is that PG may need to send some sort of
> > credentials to the KMS. If it runs a separate process to execute the command,
> > it needs to pass those credentials to it. Whether it does so via parameters or
> > environment variables, both can be seen by other users.
>
> Yes, that would be a good reason to use an external library, if we can't
> figure out a clean API like opening a pipe into the command-line tool
> and piping in the secret.

Having to install something additional to make that whole mechanism
happen would also be less than ideal, imv.  That includes even something
as simple as "install X, then configure passphrase_command".  Our experience with
archive_command shows that it really isn't a very good approach, even
when everything can be passed in on a command line.

Thanks,

Stephen

Greetings,

* Bruce Momjian (bruce@momjian.us) wrote:
> I will state what I have already told some people privately, that for
> this feature, we have many people understanding 40% of the problem, but
> thinking they understand 90%.  I do agree we should plan for our
> eventual full feature set, but I can't figure out what that feature set
> looks like beyond full-cluster encryption, and no one is addressing my
> concerns to move that forward.  Vague complaints that they don't like the
> process are not changing that.

I don't particularly care for these "40%" and "90%" characterizations
and I'm concerned that some might also, reasonably, find that to be a
way to dismiss the opinions and comments from anyone who isn't in the
clearly subjective "90%" crowd.

Regarding what the eventual feature-set is, I believe it's abundantly
clear where we want to eventually go and it's surprising to me that it's
unclear- we should be aiming for parity with the other major database
vendors when it comes to TDE.  That's pretty clear and straightforward
to define, as well, and as facts:

Oracle:
- Supports column-level and tablespace-level.
- Has a Master Encryption Key (MEK), and table keys.
- Supports having the MEK be external to the database system.
- For tablespaces, can also use an external key store with
  different keys for different tablespaces.
- Supports Triple-DES and AES (128, 192, 256 bit)
- Supports a NOMAC parameter to improve performance.
- Has a mechanism for changing the keys/algorithms for tables
  with encrypted columns.

SQL Server:
- Supports database-level encryption
- Has an instance master key and a per-database master key
- Includes a key store for having other keys
- Provides a function-based approach for encrypting at a column level
  (imagine pgcrypto, but where the key can be pulled from a key-store in
  the database which has to be unlocked)

DB2:
- Supports a Master Key and a Data Encryption Key
- Supports encryption at a per-database level

Sybase:
- Supports a key encryption key
- Supports column level encryption with column encryption keys

MySQL:
- Supports a master encryption key
- Supports having the master key in an external data store which speaks
  Oasis KMIP
- Supports per-tablespace encryption
- Supports per-table encryption

Every one of the database systems above uses at least a two-tier system
(SQL server seems to possibly support three-tier) where there is a MEK
and then multiple keys under the MEK to allow partial encryption of the
system, at *least* at a database or tablespace level but a number go
down to column-level, either directly or using a function-based approach
with a key store.
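
As a concrete illustration of that 2-tier arrangement (a sketch only, not
any vendor's actual code; OpenSSL AES-256-GCM chosen here as the wrapping
cipher): each table or tablespace key is random, and only its MEK-wrapped
form is ever stored, so re-keying the MEK means re-wrapping a handful of
small keys rather than re-encrypting the data.

    #include <openssl/evp.h>
    #include <openssl/rand.h>

    /*
     * Sketch: generate a random data key and "wrap" (encrypt) it under
     * the master encryption key (MEK).  Only the wrapped key, IV and tag
     * would be written to disk.  Error handling omitted for brevity.
     */
    static int
    wrap_new_data_key(const unsigned char *mek,   /* 32-byte master key */
                      unsigned char *wrapped,     /* out: 32 bytes */
                      unsigned char *iv,          /* out: 12 bytes */
                      unsigned char *tag)         /* out: 16 bytes */
    {
        unsigned char   data_key[32];
        EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
        int             len;

        RAND_bytes(data_key, sizeof(data_key)); /* fresh random table key */
        RAND_bytes(iv, 12);                     /* unique IV per wrap */

        EVP_EncryptInit_ex(ctx, EVP_aes_256_gcm(), NULL, mek, iv);
        EVP_EncryptUpdate(ctx, wrapped, &len, data_key, sizeof(data_key));
        EVP_EncryptFinal_ex(ctx, wrapped + len, &len);
        EVP_CIPHER_CTX_ctrl(ctx, EVP_CTRL_GCM_GET_TAG, 16, tag);

        EVP_CIPHER_CTX_free(ctx);
        return 0;
    }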

Every one has some kind of key store, and a number support an external
key store.

There is not one that uses a single key or which requires that the
entire instance be encrypted.

Being PostgreSQL, I would expect us to shoot for as much flexibility as
possible, similar to what we've done for our ACL system where we
support down to a column-level (and row level with RLS).

That's our target end-goal.  Having an incremental plan to get there
where we start with something simpler and then work towards a more
complicated implementation is fine- but that base, as I've said multiple
times and as supported by what we see other database systems have,
should include some kind of key store with support for multiple keys and
a way to encrypt something less than the entire system.  Every other
database system that we consider at all comparable has at least that.

Thanks,

Stephen

On Sat, Aug 17, 2019 at 12:43 PM Ibrar Ahmed <ibrar.ahmad@gmail.com> wrote:
+1 for voice call, bruce we usually have a weekly TDE call.

Please add me to the call as well. Thanks!

Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/

The current calendar entry for the TDE weekly call will not work for the EST timezone. I will change the invite so we can accommodate people from multiple time zones.

Stay tuned.



Greetings,

On Sat, Aug 17, 2019 at 18:30 Ahsan Hadi <ahsan.hadi@gmail.com> wrote:
The current calendar entry for TDE weekly call will not work for EST timezone. I will change the invite so we can accommodate people from multiple time zones.

I appreciate the thought but at least for my part, I already have regular conference calls after midnight to support Asian and Australian time zones, so I’m willing to work to support whatever has already been worked out.  (I also won’t complain about a time that’s more convenient for everyone, of course.)

Thanks!

Stephen
On 2019-08-17 08:16, Antonin Houska wrote:
> One problem that occurs to me is that PG may need to send some sort of
> credentials to the KMS. If it runs a separate process to execute the command,
> it needs to pass those credentials to it. Whether it does so via parameters or
> environment variables, both can be seen by other users.

You could do it via stdin or a file, perhaps.

Where would the PostgreSQL server ultimately get the KMS credentials from?

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



> From: Ibrar Ahmed <ibrar.ahmad@gmail.com> Sent: Sunday, 18 August 2019 2:43 AM
> +1 for voice call, bruce we usually have a weekly TDE call. I will include you in

If you don't mind, please also add me to that TDE call list.

Thanks/Regards,
---
Peter Smith
Fujitsu Australia

> -----Original Message-----
> From: Stephen Frost <sfrost@snowman.net> Sent: Friday, 16 August 2019 11:01 AM

> Having direct integration with a KMS would certainly be valuable, and I don't see a reason to deny users that option
> if someone would like to spend time implementing it- in addition to a simpler mechanism such as a passphrase command,
> which I believe is what was being suggested here.

Yes. We recently made an internal PoC for FEP to enable it to reach out to AWS KMS whenever the MKEY was rotated or
TDKEY was created. This was achieved by inserting some hooks in our TDE code - these hooks were implemented by a
contrib module loaded by the shared_preload_libraries GUC variable. So when no special "tdekey_aws" module was loaded,
our TDE functionality simply reverts to its default (random) MDEK/TDEK keys.

Even if the OSS community chooses not to implement any KMS integration, the TDE design could consider providing hooks
in a few appropriate places to make it easy for people who may need to add their own later.
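
To illustrate the hook idea (names hypothetical, but following the usual
PostgreSQL hook convention): core would expose a function pointer that a
shared_preload_libraries module fills in from its _PG_init().

    /* In core (hypothetical kmgr.h): a hook the key manager consults. */
    typedef void (*kmgr_key_rotation_hook_type) (const char *key_id);
    extern kmgr_key_rotation_hook_type kmgr_key_rotation_hook;

    /* In a contrib module such as the "tdekey_aws" PoC above: */
    #include "postgres.h"
    #include "fmgr.h"

    PG_MODULE_MAGIC;

    static void
    tdekey_aws_rotate(const char *key_id)
    {
        /* push the rotated key out to the external KMS here */
    }

    void
    _PG_init(void)
    {
        /* install the callback; core uses random keys if it stays NULL */
        kmgr_key_rotation_hook = tdekey_aws_rotate;
    }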

Regards,
---
Peter Smith
Fujitsu Australia






On Mon, 19 Aug 2019 at 6:23 AM, Smith, Peter <peters@fast.au.fujitsu.com> wrote:
> From: Ibrar Ahmed <ibrar.ahmad@gmail.com> Sent: Sunday, 18 August 2019 2:43 AM
> +1 for voice call, bruce we usually have a weekly TDE call. I will include you in

If you don't mind, please also add me to that TDE call list.

Sure will do.


Thanks/Regards,
---
Peter Smith
Fujitsu Australia
I have shared a calendar invite for the TDE/KMS weekly meeting with the members who expressed interest in joining the meeting in this chain. Hopefully I haven't missed anyone.

I am not aware of everyone's timezone but I have tried to set up a time that's not very inconvenient. It won't be ideal for everyone as we are dealing with multiple time zones, but do let me know if it is too bizarre for you and I will try to find another slot.

I will share a zoom link for the meeting on the invite in due course.

-- Ahsan
 

On 8/19/19 8:51 AM, Ahsan Hadi wrote:
> I have shared a calendar invite for TDE/KMS weekly meeting with the
> members who expressed interest of joining the meeting in this chain.
> Hopefully I haven't missed anyone.
> 
> I am not aware of everyone's timezone but I have tried to setup a time
> that's not very inconvenient. It won't be ideal for everyone as we are
> dealing with multiple timezone but do let me know It is too bizarre for
> you and I will try to find another slot.    
> 
> I will share a zoom link for the meeting on the invite in due course.


Please add me as well. I would like to join when I can.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



On Sat, Aug 17, 2019 at 01:52:17PM -0400, Stephen Frost wrote:
> Being PostgreSQL, I would expect us to shoot for as much flexibility as
> we possible, similar to what we've done for our ACL system where we
> support down to a column-level (and row level with RLS).
> 
> That's our target end-goal.  Having an incremental plan to get there
> where we start with something simpler and then work towards a more
> complicated implementation is fine- but that base, as I've said multiple
> times and as supported by what we see other database systems have,
> should include some kind of key store with support for multiple keys and
> a way to encrypt something less than the entire system.  Every other
> database system that we consider at all comparable has at least that.

Well, we don't blindly copy features from other databases.  A feature
has to be useful for our users and reasonable to implement in Postgres.
This has been the criterion for every other Postgres feature I have seen
developed.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Greetings,

* Bruce Momjian (bruce@momjian.us) wrote:
> On Sat, Aug 17, 2019 at 01:52:17PM -0400, Stephen Frost wrote:
> > Being PostgreSQL, I would expect us to shoot for as much flexibility as
> > we possible, similar to what we've done for our ACL system where we
> > support down to a column-level (and row level with RLS).
> >
> > That's our target end-goal.  Having an incremental plan to get there
> > where we start with something simpler and then work towards a more
> > complicated implementation is fine- but that base, as I've said multiple
> > times and as supported by what we see other database systems have,
> > should include some kind of key store with support for multiple keys and
> > a way to encrypt something less than the entire system.  Every other
> > database system that we consider at all comparable has at least that.
>
> Well, we don't blindly copy features from other databases.  A feature
> has to be useful for our users and reasonable to implement in Postgres.
> This has been the criterion for every other Postgres feature I have seen
> developed.

Having listed out the feature set of each of the other major databases
when it comes to TDE is exactly how we objectively look at what is being
done in the industry, and that then gives us an understanding of what
users (and auditors) coming from other platforms will expect.

I entirely agree that we shouldn't just copy N feature from X other
database system unless we feel that's the best approach, but when every
other database system out there has capability Y for the general feature
X that we're thinking about implementing, we should be questioning an
approach which doesn't include that.

Thanks,

Stephen

On Fri, Aug 23, 2019 at 07:45:22AM -0400, Stephen Frost wrote:
> Greetings,
> 
> * Bruce Momjian (bruce@momjian.us) wrote:
> > On Sat, Aug 17, 2019 at 01:52:17PM -0400, Stephen Frost wrote:
> > > Being PostgreSQL, I would expect us to shoot for as much flexibility as
> > > we possible, similar to what we've done for our ACL system where we
> > > support down to a column-level (and row level with RLS).
> > > 
> > > That's our target end-goal.  Having an incremental plan to get there
> > > where we start with something simpler and then work towards a more
> > > complicated implementation is fine- but that base, as I've said multiple
> > > times and as supported by what we see other database systems have,
> > > should include some kind of key store with support for multiple keys and
> > > a way to encrypt something less than the entire system.  Every other
> > > database system that we consider at all comparable has at least that.
> > 
> > Well, we don't blindly copy features from other databases.  A feature
> > has to be useful for our users and reasonable to implement in Postgres.
> > This has been the criterion for every other Postgres feature I have seen
> > developed.
> 
> Having listed out the feature set of each of the other major databases
> when it comes to TDE is exactly how we objectively look at what is being
> done in the industry, and that then gives us an understanding of what
> users (and auditors) coming from other platforms will expect.
> 
> I entirely agree that we shouldn't just copy N feature from X other
> database system unless we feel that's the best approach, but when every
> other database system out there has capability Y for the general feature
> X that we're thinking about implementing, we should be questioning an
> approach which doesn't include that.

Agreed.  The features of other databases are a clear source for what we
should consider and run through the useful/reasonable filter.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Greetings,

* Bruce Momjian (bruce@momjian.us) wrote:
> On Fri, Aug 23, 2019 at 07:45:22AM -0400, Stephen Frost wrote:
> > Having listed out the feature set of each of the other major databases
> > when it comes to TDE is exactly how we objectively look at what is being
> > done in the industry, and that then gives us an understanding of what
> > users (and auditors) coming from other platforms will expect.
> >
> > I entirely agree that we shouldn't just copy N feature from X other
> > database system unless we feel that's the best approach, but when every
> > other database system out there has capability Y for the general feature
> > X that we're thinking about implementing, we should be questioning an
> > approach which doesn't include that.
>
> Agreed.  The features of other databases are a clear source for what we
> should consider and run through the useful/reasonable filter.

Following on from that- when other databases don't have something that
we're thinking about implementing, maybe we should be contemplating if
it really makes sense as a requirement for us.

Specifically in this case- I went back and tried to figure out what
other database systems have an "encrypt EVERYTHING" option.  I didn't
have much luck finding one though.  So I think we need to ask ourselves-
the "check box" that we're trying to check off with TDE, do the other
database systems check that box?  If so, then it looks like the "check
box" isn't actually "encrypt EVERYTHING", it's more along the lines of
"make sure all regular user data is encrypted automatically" or some
such, and that's a very different requirement, which seems to be
answered by the other systems by having a KMS + tablespace/database
level encryption.  We certainly shouldn't be putting a lot of effort
into building something that is either overkill or won't be interesting
to users due to limitations like "have to take the entire cluster
offline to re-key it".

Now, that KMS has to be encrypted using a master key, of course, and we
have to make sure that it is able to survive across a crash, and it'd
sure be nice if it was indexed.  One option for such a KMS would be
something entirely external (which could potentially just be another PG
database or something) but it'd be nice if we had something built-in.
We might also want it to be replicated (or maybe we don't, as was
discussed on the call, to allow for a replica to use an independent set
of keys- of course that leads to issues with pg_rewind and such though).

Anything built-in does seem like it'd be a fair bit of work to get it to
address those requirements, but that does seem to be what the other
database systems have done.  Unfortunately, their documentation doesn't
seem to really say exactly what they've done to address that.

A couple random ideas that probably won't work, but I'll put them out
there for others to shoot down-

Some kind of 2-phase WAL pass, where we do WAL replay for the
non-encrypted bits first (which would include the KMS) and then go back
and WAL replay the encrypted stuff.  Seems terrible.

An independent WAL for the KMS only.  Ugh, do we need another walwriter
then?  and buffers, and lots of other stuff.

Some kind of flat-file based approach with a temp file and renaming of
files using durable_rename(), like what we used to do with
pg_shadow/authid, and now do with replorigin_checkpoint and such?
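
If we went that route, a minimal sketch of the dance (keyring file name
invented, error paths abbreviated; durable_rename() is the existing
helper in fd.c):

    #include "postgres.h"

    #include <fcntl.h>
    #include <unistd.h>

    #include "storage/fd.h"

    /*
     * Sketch: atomically replace the keyring file.  Write the new
     * contents to a temp file, fsync it, then durably rename it into
     * place -- the same pattern as replorigin_checkpoint.
     */
    static void
    write_keyring_file(const char *buf, Size len)
    {
        int     fd;

        fd = OpenTransientFile("pg_keyring.tmp",
                               O_CREAT | O_TRUNC | O_WRONLY | PG_BINARY);
        if (fd < 0)
            ereport(ERROR, (errmsg("could not create keyring temp file")));

        if (write(fd, buf, len) != (ssize_t) len || pg_fsync(fd) != 0)
            ereport(ERROR, (errmsg("could not write keyring temp file")));

        CloseTransientFile(fd);

        /* renames and fsyncs so the switch survives a crash */
        durable_rename("pg_keyring.tmp", "pg_keyring", ERROR);
    }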

Something else?

Thoughts?

Thanks!

Stephen

On Fri, Aug 23, 2019 at 10:35:17AM -0400, Stephen Frost wrote:
> > Agreed.  The features of other databases are a clear source for what we
> > should consider and run through the useful/reasonable filter.
> 
> Following on from that- when other databases don't have something that
> we're thinking about implementing, maybe we should be contemplating if
> it really makes sense as a requirement for us.

Yes, that's a good point.

> Specifically in this case- I went back and tried to figure out what
> other database systems have an "encrypt EVERYTHING" option.  I didn't
> have much luck finding one though.  So I think we need to ask ourselves-
> the "check box" that we're trying to check off with TDE, do the other
> database system check that box?  If so, then it looks like the "check
> box" isn't actually "encrypt EVERYTHING", it's more along the lines of
> "make sure all regular user data is encrypted automatically" or some
> such, and that's a very different requirement, which seems to be
> answered by the other systems by having a KMS + tablespace/database
> level encryption.  We certainly shouldn't be putting a lot of effort
> into building something that is either overkill or won't be interesting
> to users due to limitations like "have to take the entire cluster
> offline to re-key it".

Well, I think they might do that to reduce encryption overhead.  I think
tests have shown that is not an issue, but we will need to test further.
I am not sure of the downside of encrypting everything, since it leaks
the least information and has a minimal user API and code impact.  What
is the value of encrypting only the user rows?  Better key control?

> Now, that KMS has to be encrypted using a master key, of course, and we
> have to make sure that it is able to survive across a crash, and it'd
> sure be nice if it was indexed.  One option for such a KMS would be
> something entirely external (which could potentially just be another PG
> database or something) but it'd be nice if we had something built-in.
> We might also want it to be replicated (or maybe we don't, as was
> discussed on the call, to allow for a replica to use an independent set
> of keys- of course that leads to issues with pg_rewind and such though).

I think the replica could use a different key for the relations, but the
WAL key would have to be the same.

> Anything built-in does seem like it'd be a fair bit of work to get it to
> address those requirements, but that does seem to be what the other
> database systems have done.  Unfortunately, their documentation doesn't
> seem to really say exactly what they've done to address that.

I would like the pgcrypto key support to be per-database so pg_dump will
dump the data encrypted, along with its locked keys.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Greetings,

* Bruce Momjian (bruce@momjian.us) wrote:
> On Fri, Aug 23, 2019 at 10:35:17AM -0400, Stephen Frost wrote:
> > Following on from that- when other databases don't have something that
> > we're thinking about implementing, maybe we should be contemplating if
> > it really makes sense as a requirement for us.
>
> Yes, that's a good point.
>
> > Specifically in this case- I went back and tried to figure out what
> > other database systems have an "encrypt EVERYTHING" option.  I didn't
> > have much luck finding one though.  So I think we need to ask ourselves-
> > the "check box" that we're trying to check off with TDE, do the other
> > database system check that box?  If so, then it looks like the "check
> > box" isn't actually "encrypt EVERYTHING", it's more along the lines of
> > "make sure all regular user data is encrypted automatically" or some
> > such, and that's a very different requirement, which seems to be
> > answered by the other systems by having a KMS + tablespace/database
> > level encryption.  We certainly shouldn't be putting a lot of effort
> > into building something that is either overkill or won't be interesting
> > to users due to limitations like "have to take the entire cluster
> > offline to re-key it".
>
> Well, I think they might do that to reduce encryption overhead.  I think
> tests have shown that is not an issue, but we will need to test further.

I seriously doubt that's why and I don't think there's actually much
value in trying to figure out the "why" here- the question is, do those
systems answer the check-box requirement that was brought up on the call
as the justification for this feature?  If so, then clearly not
everything is required to be encrypted and we shouldn't be stressing
over trying to do that.

> I am not sure of the downside of encrypting everything, since it leaks
> the least information and has a minimal user API and code impact.  What
> is the value of encrypting only the user rows?  Better key control?

Yes, better key control, and better user API, and avoiding having an
implementation that isn't actually what people either expect or want.  I
don't agree at all that this distinction has a "minimal user API
impact"- much of the reason we were throwing out the idea of having a
proper KMS for the "bulk data encryption", at least from what I gathered
on the call, is because of the issues around having to try and bootstrap
a fully encrypted system and deal with crash recovery and hypothesized
leaks.  If we can accept that it's alright for some data to be
unencrypted, then that certainly makes life easier for us, and from what
it looks like, that's pretty typical in industry.  I daresay it seems
likely that could get us all the way to table-level encryption of whole
tuples as discussed elsewhere.  I had a further side-chat with Sehrope
where I believe I explained why the concern regarding tids and ordering
isn't actually valid too, would be great if we could discuss that at
some point as well.  I'd be happy to chat with you about it first and
then if we agree, write up the discussion for the list as well.

> > Now, that KMS has to be encrypted using a master key, of course, and we
> > have to make sure that it is able to survive across a crash, and it'd
> > sure be nice if it was indexed.  One option for such a KMS would be
> > something entirely external (which could potentially just be another PG
> > database or something) but it'd be nice if we had something built-in.
> > We might also want it to be replicated (or maybe we don't, as was
> > discussed on the call, to allow for a replica to use an independent set
> > of keys- of course that leads to issues with pg_rewind and such though).
>
> I think the replica could use a different key for the relations, but the
> WAL key would have to be the same.

This depends on how the WAL is sent to the replica-- if it's sent
unencrypted then the replica could have a different key, at least
potentially.  There are some very interesting questions around pg_rewind
support and archive_mode = always, but that's pretty far down the road
and we may have to tell the users that they have to make some choices
about if they want to have support for those features.

> > Anything built-in does seem like it'd be a fair bit of work to get it to
> > address those requirements, but that does seem to be what the other
> > database systems have done.  Unfortunately, their documentation doesn't
> > seem to really say exactly what they've done to address that.
>
> I would like the pgcrypto key support to be per-database so pg_dump will
> dump the data encrypted, along with its locked keys.

Yes, a built-in KMS would also need pg_dump support.

Thanks,

Stephen

On Fri, Aug 23, 2019 at 10:04:13PM -0400, Stephen Frost wrote:
> > Well, I think they might do that to reduce encryption overhead.  I think
> > tests have shown that is not an issue, but we will need to test further.
> 
> I seriously doubt that's why and I don't think there's actually much
> value in trying to figure out the "why" here- the question is, do those
> systems answer the check-box requirement that was brought up on the call
> as the justification for this feature?  If so, then clearly not
> everything is required to be encrypted and we shouldn't be stressing
> over trying to do that.

We will stress in trying _not_ to encrypt everything.

> > I am not sure of the downside of encrypting everything, since it leaks
> > the least information and has a minimal user API and code impact.  What
> > is the value of encrypting only the user rows?  Better key control?
> 
> Yes, better key control, and better user API, and avoiding having an

Uh, there is no user API for all-cluster encryption except for the
administrator.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Dear Hackers. 
It's been a long time since I sent a mail.

On Sat, Aug 24, 2019 at 9:27 AM Bruce Momjian <bruce@momjian.us> wrote:
On Fri, Aug 23, 2019 at 10:35:17AM -0400, Stephen Frost wrote:
> > Agreed.  The features of other databases are a clear source for what we
> > should consider and run through the useful/reasonable filter.
>
> Following on from that- when other databases don't have something that
> we're thinking about implementing, maybe we should be contemplating if
> it really makes sense as a requirement for us.

Yes, that's a good point.

> Specifically in this case- I went back and tried to figure out what
> other database systems have an "encrypt EVERYTHING" option.  I didn't
> have much luck finding one though.  So I think we need to ask ourselves-
> the "check box" that we're trying to check off with TDE, do the other
> database system check that box?  If so, then it looks like the "check
> box" isn't actually "encrypt EVERYTHING", it's more along the lines of
> "make sure all regular user data is encrypted automatically" or some
> such, and that's a very different requirement, which seems to be
> answered by the other systems by having a KMS + tablespace/database
> level encryption.  We certainly shouldn't be putting a lot of effort
> into building something that is either overkill or won't be interesting
> to users due to limitations like "have to take the entire cluster
> offline to re-key it".

Well, I think they might do that to reduce encryption overhead.  I think
tests have shown that is not an issue, but we will need to test further.
I am not sure of the downside of encrypting everything, since it leaks
the least information and has a minimal user API and code impact.  What
is the value of encrypting only the user rows?  Better key control?

Maybe my thinking here is wrong. Please bear with me.

I think there is value in encrypting at a finer granularity rather than
encrypting the whole cluster. It may be advantageous for manageability.

Of course, encrypting the whole cluster is an excellent choice because
it has minimal impact on the code, and it is perhaps simpler to implement
and to provide management APIs for.

But what about the database user or DBA? I thought of the example below.

Suppose we have a system with multiple tenants
(tenant here means a table, tablespace, or database) in one database cluster.
(I think it's similar to a cloud service, and this is going to be a common case in the future.)

We need encryption for tenant A but no encryption for tenant B.
In this case, is there a reason to encrypt tenant B as well?
It is a great advantage of TDE that the user's data
is encrypted without the user having to think about encryption.
But there is no reason to encrypt even a tenant that does not require encryption.
As another example, in terms of key management, I thought encrypting with
finer granularity was a good choice (especially for key rotation).

Consider a situation where both tenants A and B need encryption, tenant A needs to
rotate its key once every three months, and tenant B needs to rotate its key once a year.
(Of course, maybe it is a very rare case.)

If we encrypt the whole cluster and do not manage encryption keys per tenant,
then when tenant A's key is rotated, tenant B's key is rotated as well,
which can cause operational discomfort.

Of course, encrypting the whole cluster and adding management of
per-tenant keys would solve the problem.
But if we implement that part, I think it's the same as
the implementation of granular encryption.

Best Regards.
Moon.
 

> Now, that KMS has to be encrypted using a master key, of course, and we
> have to make sure that it is able to survive across a crash, and it'd
> sure be nice if it was indexed.  One option for such a KMS would be
> something entirely external (which could potentially just be another PG
> database or something) but it'd be nice if we had something built-in.
> We might also want it to be replicated (or maybe we don't, as was
> discussed on the call, to allow for a replica to use an independent set
> of keys- of course that leads to issues with pg_rewind and such though).

I think the replica could use a different key for the relations, but the
WAL key would have to be the same.

> Anything built-in does seem like it'd be a fair bit of work to get it to
> address those requirements, but that does seem to be what the other
> database systems have done.  Unfortunately, their documentation doesn't
> seem to really say exactly what they've done to address that.

I would like the pgcrypto key support to be per-database so pg_dump will
dump the data encrypted, along with its locked keys.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +


Dear Hackers.


> Specifically in this case- I went back and tried to figure out what
> other database systems have an "encrypt EVERYTHING" option.  I didn't
> have much luck finding one though.  So I think we need to ask ourselves-
> the "check box" that we're trying to check off with TDE, do the other
> database system check that box?  If so, then it looks like the "check
> box" isn't actually "encrypt EVERYTHING", it's more along the lines of
> "make sure all regular user data is encrypted automatically" or some
> such, and that's a very different requirement, which seems to be
> answered by the other systems by having a KMS + tablespace/database
> level encryption.  We certainly shouldn't be putting a lot of effort
> into building something that is either overkill or won't be interesting
> to users due to limitations like "have to take the entire cluster
> offline to re-key it".
>
> Now, that KMS has to be encrypted using a master key, of course, and we
> have to make sure that it is able to survive across a crash, and it'd
> sure be nice if it was indexed.

Sorry, does KMS here mean Key Management System (or Service)?
I may be mistaken, but my understanding is that a KMS manages
cryptographic keys.
In other words, the master key (or KEK) is kept in the KMS (not on the
PostgreSQL server side),
and PostgreSQL fetches the master key from the KMS, and then encrypts or
decrypts data on the PostgreSQL server side.
Of course, some KMSs support an encryption function,
which encrypts plain text inside the KMS itself. Is this
project aiming to use that function?


>
> A couple random ideas that probably won't work, but I'll put them out
> there for others to shoot down-
>
> Some kind of 2-phase WAL pass, where we do WAL replay for the
> non-encrypted bits first (which would include the KMS) and then go back
> and WAL replay the encrypted stuff.  Seems terrible.

Sorry, can you give an example of what the 2-phase WAL pass is?
My understanding is that the WAL read process decrypts WAL data when
reading an encrypted WAL page (per-page encryption) or
WAL record (per-record encryption) and then replays it.
Is this a different case?

Best Regards.
Moon.

>
> An independent WAL for the KMS only.  Ugh, do we need another walwriter
> then?  and buffers, and lots of other stuff.
>
> Some kind of flat-file based approach with a temp file and renaming of
> files using durable_rename(), like what we used to do with
> pg_shadow/authid, and now do with replorigin_checkpoint and such?
>
> Something else?
>
> Thoughts?
>
> Thanks!
>
> Stephen



On Fri, Aug 23, 2019 at 11:35 PM Stephen Frost <sfrost@snowman.net> wrote:
>
> Greetings,
>
> * Bruce Momjian (bruce@momjian.us) wrote:
> > On Fri, Aug 23, 2019 at 07:45:22AM -0400, Stephen Frost wrote:
> > > Having listed out the feature set of each of the other major databases
> > > when it comes to TDE is exactly how we objectively look at what is being
> > > done in the industry, and that then gives us an understanding of what
> > > users (and auditors) coming from other platforms will expect.
> > >
> > > I entirely agree that we shouldn't just copy N feature from X other
> > > database system unless we feel that's the best approach, but when every
> > > other database system out there has capability Y for the general feature
> > > X that we're thinking about implementing, we should be questioning an
> > > approach which doesn't include that.
> >
> > Agreed.  The features of other databases are a clear source for what we
> > should consider and run through the useful/reasonable filter.
>
> Following on from that- when other databases don't have something that
> we're thinking about implementing, maybe we should be contemplating if
> it really makes sense as a requirement for us.
>
> Specifically in this case- I went back and tried to figure out what
> other database systems have an "encrypt EVERYTHING" option.  I didn't
> have much luck finding one though.  So I think we need to ask ourselves-
> the "check box" that we're trying to check off with TDE, do the other
> database system check that box?  If so, then it looks like the "check
> box" isn't actually "encrypt EVERYTHING", it's more along the lines of
> "make sure all regular user data is encrypted automatically" or some
> such, and that's a very different requirement, which seems to be
> answered by the other systems by having a KMS + tablespace/database
> level encryption.  We certainly shouldn't be putting a lot of effort
> into building something that is either overkill or won't be interesting
> to users due to limitations like "have to take the entire cluster
> offline to re-key it".
>
> Now, that KMS has to be encrypted using a master key, of course, and we
> have to make sure that it is able to survive across a crash, and it'd
> sure be nice if it was indexed.  One option for such a KMS would be
> something entirely external (which could potentially just be another PG
> database or something) but it'd be nice if we had something built-in.
> We might also want it to be replicated (or maybe we don't, as was
> discussed on the call, to allow for a replica to use an independent set
> of keys- of course that leads to issues with pg_rewind and such though).

I think most users would expect the physical standby server to use the
same key as the primary server's, at least for the master key.
Otherwise they would need to switch keys at every failover.
Even for WAL encryption keys, since it's common to fetch
archived WAL files that are produced by the primary server by
restore_command using scp, the standby server needs to use the same
keys or at least know them. In logical replication, I think that since
we would send unencrypted data and encrypt it on the subscriber, which
is initiated as a different database cluster, we can use different
keys on both sides.

> Anything built-in does seem like it'd be a fair bit of work to get it to
> address those requirements, but that does seem to be what the other
> database systems have done.  Unfortunately, their documentation doesn't
> seem to really say exactly what they've done to address that.

I guess that this depends on the number of encryption keys we use. If
we have encryption keys per tablespace or database, the number of keys
would be at most several dozen or several hundred. It's enough to have
them in flat-file format on disk and to load them into a hash
table in shared memory. We would not need a complex mechanism.
OTOH if we have keys per table, we would need to consider indexes and
buffering as they might not fit in memory.
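
To illustrate that (struct and names invented, but using the existing
dynahash API):

    #include "postgres.h"

    #include "storage/shmem.h"
    #include "utils/hsearch.h"

    /* Hypothetical keyring entry: one key per tablespace or database. */
    typedef struct KeyringEntry
    {
        Oid             key_owner;  /* hash key: tablespace/database OID */
        unsigned char   key[32];    /* decrypted data encryption key */
    } KeyringEntry;

    static HTAB *KeyringHash = NULL;

    static void
    init_keyring_hash(void)
    {
        HASHCTL     ctl;

        memset(&ctl, 0, sizeof(ctl));
        ctl.keysize = sizeof(Oid);
        ctl.entrysize = sizeof(KeyringEntry);

        /* at most a few hundred keys, per the estimate above */
        KeyringHash = ShmemInitHash("TDE keyring", 128, 1024,
                                    &ctl, HASH_ELEM | HASH_BLOBS);
    }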

> A couple random ideas that probably won't work, but I'll put them out
> there for others to shoot down-
>
> Some kind of 2-phase WAL pass, where we do WAL replay for the
> non-encrypted bits first (which would include the KMS) and then go back
> and WAL replay the encrypted stuff.  Seems terrible.
>
> An independent WAL for the KMS only.  Ugh, do we need another walwriter
> then?  and buffers, and lots of other stuff.
>
> Some kind of flat-file based approach with a temp file and renaming of
> files using durable_rename(), like what we used to do with
> pg_shadow/authid, and now do with replorigin_checkpoint and such?

The PoC patch I created does that for the keyring file. On key
rotation, the corresponding WAL record contains all keys re-encrypted with the
master key identifier, and the recovery restores the keyring file. One
good point of this approach is that external tools and the startup process
can read it easily. It doesn't require backend code such as the system cache
and heap functions.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



On 8/26/19 2:53 AM, Masahiko Sawada wrote:
> I guess that this depends on the number of encryption keys we use. If
> we have encryption keys per tablespace or database the number of keys
> would be at most several dozen or several hundred. It's enough to have
> them in flat-file format on the disk and to load them to the hash
> table on the shared memory. We would not need a complex mechanism.
> OTOH if we have keys per tables, we would need to consider indexes and
> buffering as they might not fit in the memory.

Master key(s) need to be kept in memory, but derived keys (using KDF)
would be calculated at time of use, I would think.

>> Some kind of flat-file based approach with a temp file and renaming of
>> files using durable_rename(), like what we used to do with
>> pg_shadow/authid, and now do with replorigin_checkpoint and such?
>
> The PoC patch I created does that for the keyring file. When key
> rotation, the correspond WAL contains all re-encrypted keys with the
> master key identifier, and the recovery restores the keyring file. One
> good point of this approach is that external tools and startup process
> read it easier. It doesn't require backend codes such as system cache
> and heap functions.

That sounds like a good approach.

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


On Mon, Aug 26, 2019 at 7:49 PM Joe Conway <mail@joeconway.com> wrote:
>
> On 8/26/19 2:53 AM, Masahiko Sawada wrote:
> > I guess that this depends on the number of encryption keys we use. If
> > we have encryption keys per tablespace or database the number of keys
> > would be at most several dozen or several hundred. It's enough to have
> > them in flat-file format on the disk and to load them to the hash
> > table on the shared memory. We would not need a complex mechanism.
> > OTOH if we have keys per tables, we would need to consider indexes and
> > buffering as they might not fit in the memory.
>
> Master key(s) need to be kept in memory, but derived keys (using KDF)
> would be calculated at time of use, I would think.

Yes, we can do that and the PoC patch does so. I'm rather concerned
about the salt and info used to derive keys. We would need at least the info, which
could perhaps be an OID, for each key. Also these data need to be
accessible by both the frontend tools and the startup process. If the info is
very small data, say a 4-byte OID, we could have all of them in
memory even if we have keys per table.
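
A sketch of such a derivation (assuming OpenSSL 1.1.1's HKDF; using the
OID as the "info" input is just the idea above, and a real implementation
would still need an agreed salt):

    #include <openssl/evp.h>
    #include <openssl/kdf.h>

    /*
     * Sketch: derive a per-relation key from the master key with
     * HKDF-SHA256, feeding the relation OID in as "info".  Error
     * checks omitted.
     */
    static int
    derive_relation_key(const unsigned char *master_key, size_t master_len,
                        unsigned int relid,
                        unsigned char *out, size_t outlen)
    {
        EVP_PKEY_CTX *pctx = EVP_PKEY_CTX_new_id(EVP_PKEY_HKDF, NULL);

        EVP_PKEY_derive_init(pctx);
        EVP_PKEY_CTX_set_hkdf_md(pctx, EVP_sha256());
        EVP_PKEY_CTX_set1_hkdf_key(pctx, master_key, master_len);
        EVP_PKEY_CTX_add1_hkdf_info(pctx, (unsigned char *) &relid,
                                    sizeof(relid));
        EVP_PKEY_derive(pctx, out, &outlen);

        EVP_PKEY_CTX_free(pctx);
        return 0;
    }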

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Greetings, 

(Apologies for any naïve thoughts below. Please correct my misunderstandings)

I am trying to understand the background for the ideas proposed and/or already decided, but it is increasingly
difficult to follow.
 

I’ve been watching the TDE list for several months and over that time there have been dozens of different ideas
floated. Each of them has its own good points; some are conflicting.

IMO any TDE implementation will be a trade-off between a number of factors:
* Design – e.g. Simple v Complex solution
* Security – e.g. Acceptance that a simpler solution may not handle every possible threat
* Cost/Feasibility – e.g. How hard will TDE be to implement/maintain. 
* User expectations - e.g. What is the “threat model” the end user actually wants to protect against
* User expectations – e.g. Comparison with other products
* Completeness – e.g. Acknowledgement that first implementation may not meet the end-goal.
* Future proof – e.g. ability to evolve in future TDE versions (with minimal re-write of what came before)
* Usability – e.g. Online/offline considerations
* Usability – e.g. Will a more complex solution end up being too difficult to actually use/administer
* etc…

New TDE ideas keep popping up all the time. The discussion has sometimes become mired in technical details; I’m losing
sight of the bigger picture.
 

Would it be possible to share a *tabulation* of all the TDE components? Each component may have a number of design
choices (options), with brief lists of Pros/Cons for each of those options so that each can be concisely summarised
on their respective merits.
 

I think this would be of great help in understanding how we got to where we are now, as well as helping to focus on how
to proceed.
 

For example,

=====
Component: TDKEY
* Option: use derived keys; List of Pros/Cons
* Option: use random keys; List of Pros/Cons
* Option: use keys from some external source and encrypted by MDKEY; List of Pros/Cons
* Option: use same TKEY for all tables/tablespaces; List of Pros/Cons
* Option: … 
* Option: …
* => Decision (i.e. the least-worst compromise/combination of the possible options)
=====

~

Postscript: 

After writing this, I recalled recently reading a mail from Antonin
https://www.postgresql.org/message-id/44057.1565977657%40antos which says pretty much the same thing!
 

Also, I recognise that there is an offline shared Google doc which already includes some of this information, but I
think it would be valuable if it could be formatted as a Pros/Cons summary table and shared on the Wiki page for everybody
to see.
 


Kind Regards,
---
Peter Smith
Fujitsu Australia

-----Original Message-----
From: Masahiko Sawada <sawada.mshk@gmail.com> Sent: Thursday, 15 August 2019 7:10 PM

> BTW I've created PoC patch for cluster encryption feature. Attached patch set has done some items of TODO list and
> some of them can be used even for finer granularity encryption. Anyway, the implemented components are followings:
 

Hello Sawada-san,

I guess your original patch code may be getting a bit out-dated by the ongoing TDE discussions, but I have done some
code review of it anyway.
 

Hopefully a few comments below can still be of use going forward:

---

REVIEW COMMENTS

* src/backend/storage/encryption/enc_cipher.c – For functions EncryptionCipherValue/String maybe should log warnings
for unexpected values instead of silently assigning to default 0/”off”.
 

* src/backend/storage/encryption/enc_cipher.c – For function EncryptionCipherString, the purpose of returning ”unknown”
is unclear because that will map back to “off” again anyway via EncryptionCipherValue. Why not just return "off" (with
a warning logged)?
 

* src/include/storage/enc_common.h – Typo in comment: "Encrypton".

* src/include/storage/encryption.h - The macro DataEncryptionEnabled may be better to be using enum TDE_ENCRYPTION_OFF
instead of magic number 0
 

* src/backend/storage/encryption/kmgr.c - Function BootStrapKmgr will report error if USE_OPENSSL is not defined. The
check seems premature because it would fail even if the user is not using encryption. Shouldn't the lack of openssl be
OK when user is not using TDE at all (i.e. when encryption is "none")?
 

* src/backend/storage/encryption/kmgr.c - In function BootStrapMgr suggest better to check if
(bootstrap_data_encryption_cipher == TDE_ENCRYPTION_OFF) using enum instead of the magic number 0.
 

* src/backend/storage/encryption/kmgr.c - The run_cluster_passphrase_command function seems mostly a clone of
an existing run_ssl_passphrase_command function. Is it possible to refactor to share the common code?
 

* src/backend/storage/encryption/kmgr.c - The function derive_encryption_key declares a char key_len. Why char? It
seems int everywhere else.
 

* src/backend/bootstrap/bootstrap.c - Suggest better if variable declaration bootstrap_data_encryption_cipher = 0 uses
enum TDE_ENCRYPTION_OFF instead of magic number 0
 

* src/backend/utils/misc/guc.c - It looks like the default value for GUC variable data_encryption_cipher is AES128.
Wouldn't "off" be the more appropriate default value? Otherwise it seems inconsistent with the logic of initdb (which
insists that the -e option is mandatory if you wanted any encryption).
 

* src/backend/utils/misc/guc.c - There is a missing entry in the config_group_names[]. The patch changed the
config_group[] in guc_tables.h, so I think there needs to be a matching item in the config_group_names.
 

* src/bin/initdb/initdb.c - The function check_encryption_cipher would disallow an encryption value of "none". Although
maybe it is not very useful to say -e none, it does seem inconsistent to reject it, given that "none" was a valid value
for the GUC variable data_encryption_cipher.
 

* contrib/bloom/blinsert.c - In function btbuildempty the arguments for PageEncryptionInPlace seem in the wrong order
(forknum should be 2nd).
 

* src/backend/access/hash/hashpage.c - In function _hash_alloc_buckets the arguments for PageEncryptionInPlace seem in
the wrong order (forknum should be 2nd).
 

* src/backend/access/spgist/spginsert.c - In function spgbuildempty the arguments for PageEncryptionInPlace seem in the
wrong order (forknum should be 2nd). This error looks repeated 3X.
 

* in multiple files - The encryption enums have equivalent strings ("off", "aes-128", "aes-256") but they are scattered
as string literals in many places (e.g. pg_controldata.c, initdb.c, guc.c, enc_cipher.c). Suggest it would be better to
declare those as string constants in one place only.
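
For that last point, a sketch of what centralizing them could look like
(hypothetical enc_common.h contents):

    /* Keep the cipher enum and its user-visible names side by side. */
    typedef enum
    {
        TDE_ENCRYPTION_OFF = 0,
        TDE_ENCRYPTION_AES_128,
        TDE_ENCRYPTION_AES_256
    } TdeEncryptionCipher;

    /* single source of truth for the cipher name strings */
    static const char *const tde_cipher_names[] = {
        "off",
        "aes-128",
        "aes-256"
    };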
 

---

Kind Regards,

Peter Smith
Fujitsu Australia

On Fri, Sep 6, 2019 at 3:34 PM Smith, Peter <peters@fast.au.fujitsu.com> wrote:
>
> -----Original Message-----
> From: Masahiko Sawada <sawada.mshk@gmail.com> Sent: Thursday, 15 August 2019 7:10 PM
>
> > BTW I've created PoC patch for cluster encryption feature. Attached patch set has done some items of TODO list and
> > some of them can be used even for finer granularity encryption. Anyway, the implemented components are followings:
>
> Hello Sawada-san,
>
> I guess your original patch code may be getting a bit out-dated by the ongoing TDE discussions, but I have done some
> code review of it anyway.
>
> Hopefully a few comments below can still be of use going forward:
>
> ---
>
> REVIEW COMMENTS
>
> * src/backend/storage/encryption/enc_cipher.c – For functions EncryptionCipherValue/String maybe should log warnings
> for unexpected values instead of silently assigning to default 0/”off”.
>
> * src/backend/storage/encryption/enc_cipher.c – For function EncryptionCipherString, the purpose of returning ”unknown”
> is unclear because that will map back to “off” again anyway via EncryptionCipherValue. Why not just return "off" (with
> a warning logged)?
>
> * src/include/storage/enc_common.h – Typo in comment: "Encrypton".
>
> * src/include/storage/encryption.h - The macro DataEncryptionEnabled may be better to be using enum
> TDE_ENCRYPTION_OFF instead of magic number 0
>
> * src/backend/storage/encryption/kmgr.c - Function BootStrapKmgr will report error if USE_OPENSSL is not defined. The
> check seems premature because it would fail even if the user is not using encryption. Shouldn't the lack of openssl be
> OK when user is not using TDE at all (i.e. when encryption is "none")?
>
> * src/backend/storage/encryption/kmgr.c - In function BootStrapMgr suggest better to check if
> (bootstrap_data_encryption_cipher == TDE_ENCRYPTION_OFF) using enum instead of the magic number 0.
>
> * src/backend/storage/encryption/kmgr.c - The run_cluster_passphrase_command function seems mostly a clone
> of an existing run_ssl_passphrase_command function. Is it possible to refactor to share the common code?
>
> * src/backend/storage/encryption/kmgr.c - The function derive_encryption_key declares a char key_len. Why char? It
> seems int everywhere else.
>
> * src/backend/bootstrap/bootstrap.c - Suggest better if variable declaration bootstrap_data_encryption_cipher = 0
> uses enum TDE_ENCRYPTION_OFF instead of magic number 0
>
> * src/backend/utils/misc/guc.c - It looks like the default value for GUC variable data_encryption_cipher is AES128.
> Wouldn't "off" be the more appropriate default value? Otherwise it seems inconsistent with the logic of initdb (which
> insists that the -e option is mandatory if you wanted any encryption).
>
> * src/backend/utils/misc/guc.c - There is a missing entry in the config_group_names[]. The patch changed the
> config_group[] in guc_tables.h, so I think there needs to be a matching item in the config_group_names.
>
> * src/bin/initdb/initdb.c - The function check_encryption_cipher would disallow an encryption value of "none".
> Although maybe it is not very useful to say -e none, it does seem inconsistent to reject it, given that "none" was a
> valid value for the GUC variable data_encryption_cipher.
>
> * contrib/bloom/blinsert.c - In function btbuildempty the arguments for PageEncryptionInPlace seem in the wrong order
(forknumshould be 2nd). 
>
> * src/backend/access/hash/hashpage.c - In function _hash_alloc_buckets the arguments for PageEncryptionInPlace seem
inthe wrong order (forknum should be 2nd). 
>
> * src/backend/access/spgist/spginsert.c - In function spgbuildempty the arguments for PageEncryptionInPlace seem in
thewrong order (forknum should be 2nd). This error looks repeated 3X. 
>
> * in multiple files - The encryption enums have equivalent strings ("off", "aes-128", "aes-256") but they are
scatteredas string literals in many places (e.g. pg_controldata.c, initdb.c, guc.c, enc_cipher.c). Suggest it would be
betterto declare those as string constants in one place only.em 
>

Thank you for reviewing this patch.

I've updated the TDE patches. These patches implement the key system,
buffer encryption and WAL encryption. Please refer to the ToDo of
cluster-wide encryption for more details of the design and components.
It lacks temporary file encryption and frontend tool encryption. For
temporary file encryption, we are discussing which files should be
encrypted on another thread and I thought that temporary file
encryption might be related to that. So I'm currently studying the
temporary file encryption patch that Antonin already submitted[1] but
some changes might be needed based on that discussion. For frontend
tool support, Shawn will share his patch that is built on my patch.

I haven't changed the usage of this feature. Please refer to the
email[2] for how to set up an encrypted database cluster.

[1] https://www.postgresql.org/message-id/7082.1562337694%40localhost
[2] https://www.postgresql.org/message-id/CAD21AoBc-o%3DKZ%3DBPB5wWVNnBepqe8yqVs_D3eAd3Tr%3DX%3DtTGpQ%40mail.gmail.com

Regards,

--
Masahiko Sawada

Attachment
Hello.

On Thu, Oct 31, 2019 at 11:25 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Fri, Sep 6, 2019 at 3:34 PM Smith, Peter <peters@fast.au.fujitsu.com> wrote:
> >
> > -----Original Message-----
> > From: Masahiko Sawada <sawada.mshk@gmail.com> Sent: Thursday, 15 August 2019 7:10 PM
> >
> > [... review comments quoted in full above ...]
> >
>
> Thank you for reviewing this patch.
>
> I've updated the TDE patches. These patches implement the key system,
> buffer encryption and WAL encryption. Please refer to the ToDo of
> cluster-wide encryption for more details of the design and components.
> It lacks temporary file encryption and frontend tool encryption. For
> temporary file encryption, we are discussing which files should be
> encrypted on another thread and I thought that temporary file
> encryption might be related to that. So I'm currently studying the
> temporary file encryption patch that Antonin already submitted[1] but
> some changes might be needed based on that discussion.

Thank you for sharing the patch!

I'm currently looking into temporary file encryption while following
Antonin's patch. The discussion started about a month ago but has not
yet settled on a precise method. The changes I expect so far are as
follows.
For temporary files (files generated by "file/buffile.c", except those
from "logical/reorderbuffer.c"), I have not yet observed a case where
data is overwritten in place. However, as Antonin pointed out, the
structure of "file/buffile.c" makes overwriting very likely, so we
have to assume the IV value must change on each overwrite.
The simplest method is to add the small per-page header that
Sawada-san and I proposed previously and record the number of
overwrites in it: whenever a page is overwritten, the counter in the
header is incremented, and the IV value changes with it.
Perhaps this part will require more discussion; a minimal sketch of
the idea follows.
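
A minimal sketch of that idea, with hypothetical names (TempPageHeader
and temp_page_mark_overwrite are illustrative, not taken from any
posted patch):

    #include <stdint.h>

    /* Hypothetical small header stored at the start of each temp-file page. */
    typedef struct TempPageHeader
    {
        uint32_t    overwrite_count;    /* incremented on every in-place rewrite */
    } TempPageHeader;

    /*
     * Called before a page is rewritten in place.  Bumping the counter
     * changes the per-page IV, so the same key+IV pair is never used to
     * encrypt two different page contents.
     */
    static void
    temp_page_mark_overwrite(TempPageHeader *hdr)
    {
        hdr->overwrite_count++;
    }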

I'm researching such a method, but I want to spend more time on the
potential side effects. If possible, I would like to fix this part on
top of the patch shared by Sawada-san.

There was also a reasonable suggestion that the temporary files
created in "logical/reorderbuffer.c" should use the machinery of
"file/buffile.c". Antonin's earlier patch already did this, and if
there is no problem I would like to reuse it.

Of course, I may not have written code of excellent quality, so I
will make an interim report if possible.

Best regards.
Moon.

> For frontend tool support,
> Shawn will share his patch that is built on my patch.
>
> I haven't changed the usage of this feature. Please refer to the
> email[2] for how to set up an encrypted database cluster.
>
> [1] https://www.postgresql.org/message-id/7082.1562337694%40localhost
> [2]
https://www.postgresql.org/message-id/CAD21AoBc-o%3DKZ%3DBPB5wWVNnBepqe8yqVs_D3eAd3Tr%3DX%3DtTGpQ%40mail.gmail.com
>
> Regards,
>
> --
> Masahiko Sawada



On Mon, Aug 5, 2019 at 8:44 PM Bruce Momjian <bruce@momjian.us> wrote:
> Right.  The 8k page LSN changes each time the page is modified, and the
> LSN is part of the page nonce.

What about hint bit changes?

I think even with wal_log_hints=on, it's not the case that *every*
change to hint bits results in an LSN change.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



On Tue, Aug 6, 2019 at 10:36 AM Bruce Momjian <bruce@momjian.us> wrote:
> OK, I think you are missing something.   Let me go over the details.
> First, I think we are all agreed we are using CTR for heap/index pages,
> and for WAL, because CTR allows byte granularity, it is faster, and
> might be more secure.
>
> So, to write 8k heap/index pages, we use the agreed-on LSN/page-number
> to encrypt each page.  In CTR mode, we do that by creating an 8k bit
> stream, which is created in 16-byte chunks with AES by incrementing the
> counter used for each 16-byte chunk.  We then XOR the bits with what we
> want to encrypt, and skip the LSN and CRC parts of the page.

Seems reasonable (not that I am an encryption expert).

> For WAL, we effectively create a 16MB bitstream, though we can create it
> in parts as needed.  (Creating it in parts is easier in CTR mode.)  The
> nonce is the segment number, but each 16-byte chunk uses a different
> counter.  Therefore, even if you are encrypting the same 8k page several
> times in the WAL, the 8k page would be different because of the LSN (and
> other changes), and the bitstream you encrypt/XOR it with would be
> different because the counter would be different for that offset in the
> WAL.

But, if you encrypt the same WAL page several times, the LSN won't
change, because a WAL page doesn't have an LSN on it, and if it did,
it wouldn't be changing, because an LSN is just a position within the
WAL stream, so any given byte on any given WAL page always has the
same LSN, whatever it is.

And if the counter value changed on re-encryption, I don't see how
we'd know what counter value to use when decrypting.  There's no way
for the code that is decrypting to know how many times the page got
rewritten as it was being filled.

Please correct me if I'm being stupid here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Robert Haas <robertmhaas@gmail.com> wrote:

> On Mon, Aug 5, 2019 at 8:44 PM Bruce Momjian <bruce@momjian.us> wrote:
> > Right.  The 8k page LSN changes each time the page is modified, and the
> > LSN is part of the page nonce.
> 
> What about hint bit changes?
> 
> I think even with wal_log_hints=on, it's not the case that *every*
> change to hint bits results in an LSN change.

A change to hint bits does not result in an LSN change in the case I described here

https://www.postgresql.org/message-id/28452.1572443058%40antos

but I consider this a bug (BTW, I discovered this problem when thinking about
the use of LSN as encryption IV). Do you mean any other case? If LSN does not
get changed, then the related full-page image WAL record is not guaranteed to
be on disk during crash recovery. Thus if page checksum is invalid due to
torn-page write, there's no WAL record to fix the page.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com



Robert Haas <robertmhaas@gmail.com> wrote:

> [...]
> 
> And if the counter value changed on re-encryption, I don't see how
> we'd know what counter value to use when decrypting.  There's no way
> for the code that is decrypting to know how many times the page got
> rewritten as it was being filled.
> 
> Please correct me if I'm being stupid here.

In my implementation (I haven't checked whether Masahiko Sawada changed this
in his patch) I avoided repeated encryption of different data using the same
key+IV by omitting the unused part of the WAL page from encryption. Already
written records can be encrypted repeatedly because they do not change.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com





On Sat, 2 Nov 2019 at 21:33, Antonin Houska <ah@cybertec.at> wrote:
Robert Haas <robertmhaas@gmail.com> wrote:

> [...]

In my implementation (I haven't checked whether Masahiko Sawada changed this
in his patch) I avoided repeated encryption of different data using the same
key+IV by omitting the unused part of the WAL page from encryption. Already
written records can be encrypted repeatedly because they do not change.


Yeah, my patch doesn't change this part. The IV for WAL encryption consists of the segment file number, the page offset within the segment file, and the counter for the CTR cipher mode.
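
A minimal sketch of such an IV layout (the function name and exact
byte positions are illustrative assumptions, not taken from the
posted patches):

    #include <stdint.h>
    #include <string.h>

    /*
     * Pack the WAL segment number and the page offset within the segment
     * into the high bytes of a 16-byte CTR IV, leaving the low 4 bytes
     * for the block counter that CTR mode increments per 16-byte block.
     */
    static void
    wal_encryption_iv(uint8_t iv[16], uint64_t segno, uint32_t offset)
    {
        memset(iv, 0, 16);
        memcpy(iv, &segno, sizeof(segno));          /* bytes 0..7  */
        memcpy(iv + 8, &offset, sizeof(offset));    /* bytes 8..11 */
        /* bytes 12..15: CTR block counter, starts at 0 */
    }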

Regards,

--
Masahiko Sawada  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sat, Nov  2, 2019 at 01:34:41PM +0100, Antonin Houska wrote:
> Robert Haas <robertmhaas@gmail.com> wrote:
> 
> > [...]
> 
> In my implementation (I haven't checked whether Masahiko Sawada changed this
> in his patch) I avoided repeated encryption of different data using the same
> key+IV by omitting the unused part of the WAL page from encryption. Already
> written records can be encrypted repeatedly because they do not change.

Right.  Even though AES with CTR generates encryption bit patterns in
16-byte chunks, you only XOR the bytes you have written.  So, if the WAL
record is 167 bytes, you generate 11 16-byte patterns, but you only XOR
the first seven bytes of the 11th 16-byte block.  CTR is not like CBC
which has to encrypt in 16-byte chunks.
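
As a concrete illustration of that partial-block behaviour, a minimal
sketch using OpenSSL's EVP interface (key and IV handling simplified;
this is not code from the patch):

    #include <openssl/evp.h>

    /*
     * CTR is a stream cipher: the output length always equals the input
     * length, so encrypting 167 bytes consumes eleven 16-byte keystream
     * blocks but XORs only the first 7 bytes of the eleventh block.
     */
    static int
    ctr_encrypt(const unsigned char key[32], const unsigned char iv[16],
                const unsigned char *in, int inlen, unsigned char *out)
    {
        EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
        int         outlen = 0;
        int         ok = ctx != NULL &&
            EVP_EncryptInit_ex(ctx, EVP_aes_256_ctr(), NULL, key, iv) &&
            EVP_EncryptUpdate(ctx, out, &outlen, in, inlen);

        EVP_CIPHER_CTX_free(ctx);
        return ok ? outlen : -1;    /* outlen == inlen in CTR mode */
    }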

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Hi hackers,
As arranged, I will complete the modification of the front-end tools to support TDE.
I have now completed the modification of the pg_waldump, pg_resetwal, and pg_rewind tools.
My design:
1. Add two options, -D and -c, to the front-end tools. You can use -c to get a password from the user to generate the KEK; use the -D option to get the cluster's encryption keys, walkey and relkey.
2. pg_waldump adds a WAL decryption function.
3. pg_rewind adds a WAL decryption function.
4. pg_resetwal adds WAL encryption.

Regards,

--
Shawn Wang 

Masahiko Sawada <sawada.mshk@gmail.com> wrote on Thu, Oct 31, 2019 at 10:25 PM:
On Fri, Sep 6, 2019 at 3:34 PM Smith, Peter <peters@fast.au.fujitsu.com> wrote:
> [...]
Attachment
On Sat, Nov 2, 2019 at 8:23 AM Antonin Houska <ah@cybertec.at> wrote:
> Change to hint bits does not result in LSN change in the case I described here
>
> https://www.postgresql.org/message-id/28452.1572443058%40antos
>
> but I consider this a bug (BTW, I discovered this problem when thinking about
> the use of LSN as encryption IV). Do you mean any other case? If LSN does not
> get changed, then the related full-page image WAL record is not guaranteed to
> be on disk during crash recovery. Thus if page checksum is invalid due to
> torn-page write, there's no WAL record to fix the page.

I thought the idea was that the first change to hint bits after a
given checkpoint produced an FPI, but subsequent changes within the
same checkpoint cycle do not. That's OK from a crash recovery
perspective, because redo begins at a checkpoint: either the page was
never modified after the last checkpoint, in which case the last write
to that relation was successfully fsync'd and the page is not torn, or
we restore an FPI at least once, which clobbers any torn page left
behind by the crash with a known good state. So a crash can lose some
hint bits settings, if they weren't the first to that page in that
checkpoint cycle, but it never leaves the page in an invalid state.

The same scheme will work for TDE as far as crash recovery is
concerned, but it seems like it has a cryptographic weakness if the
LSN is used as the IV, because the second hint bit write to the page
in the same checkpoint cycle would have no reason to bump the LSN,
which means you'd be encrypting with the same IV twice.
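
To see concretely why reusing an IV in CTR mode is a problem, here is
a tiny self-contained illustration (plain C with made-up values):
XORing two ciphertexts produced with the same keystream cancels the
keystream and leaks the XOR of the plaintexts:

    #include <stdio.h>

    int
    main(void)
    {
        /* one keystream block reused for two different plaintexts */
        unsigned char ks[4] = {0xde, 0xad, 0xbe, 0xef};
        unsigned char p1[4] = {'h', 'i', 'n', 't'};
        unsigned char p2[4] = {'H', 'I', 'N', 'T'};

        for (int i = 0; i < 4; i++)
        {
            unsigned char c1 = p1[i] ^ ks[i];
            unsigned char c2 = p2[i] ^ ks[i];

            /* c1 ^ c2 == p1 ^ p2: the keystream cancels out */
            printf("%02x == %02x\n", c1 ^ c2, p1[i] ^ p2[i]);
        }
        return 0;
    }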

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Robert Haas <robertmhaas@gmail.com> wrote:

> On Sat, Nov 2, 2019 at 8:23 AM Antonin Houska <ah@cybertec.at> wrote:
> > Change to hint bits does not result in LSN change in the case I described here
> >
> > https://www.postgresql.org/message-id/28452.1572443058%40antos
> >
> > but I consider this a bug (BTW, I discovered this problem when thinking about
> > the use of LSN as encryption IV). Do you mean any other case? If LSN does not
> > get changed, then the related full-page image WAL record is not guaranteed to
> > be on disk during crash recovery. Thus if page checksum is invalid due to
> > torn-page write, there's now WAL record to fix the page.
>
> I thought the idea was that the first change to hint bits after a
> given checkpoint produced an FPI, but subsequent changes within the
> same checkpoint cycle do not.

Got it, this is what happens in XLogSaveBufferForHint().

Perhaps we can fix it by issuing an XLOG_NOOP record in the cases that
produce no FPI. Of course only if the encryption is enabled.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com



On Fri, Nov 01, 2019 at 09:38:37AM +0900, Moon, Insung wrote:
> Of course, I may not have written code of excellent quality, so I
> will make an interim report if possible.

The last patch has rotted, and does not apply anymore.  A rebase would
be nice, so I am switching the patch to waiting on author, and have
moved it to the next CF.

The discussion has been going on for a long time..
--
Michael

Attachment
On Sun, Dec 1, 2019 at 12:03 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Fri, Nov 01, 2019 at 09:38:37AM +0900, Moon, Insung wrote:
> > Of course, I may not have written code of excellent quality, so I
> > will make an interim report if possible.
>
> The last patch has rotted, and does not apply anymore.  A rebase would
> be nice, so I am switching the patch to waiting on author, and have
> moved it to the next CF.
>

We have discussed off-list and in weekly voice meetings for several
months the purpose of this feature and the target for PG13, and we
concluded to step back and focus on only an internal key management
system for PG13. Transparent data encryption support is now the target
for PG14 or later. The key management system is an important
infrastructure for TDE, but it can work independently of TDE. The plan
for PG13 we discussed is to introduce an internal key management
system that has one encryption key for the whole database cluster and
to give it an interface to get encryption keys that are managed inside
the PostgreSQL database, in order to integrate it with other
components such as pgcrypto.

The idea is to get something encrypted and decrypted without ever
knowing the actual key that was used to encrypt it. The attached patch
has two APIs to wrap and unwrap a secret with the encryption key
stored inside the database cluster. The user generates a secret key
locally, sends it to the PostgreSQL server to wrap it using
pg_kmgr_wrap(), and saves it somewhere. Then the user can use the
saved and wrapped secret key to encrypt and decrypt user data with
something like:

INSERT INTO tbl VALUES (pg_encrypt('user data', pg_kmgr_unwrap('xxxxx')));

Where 'xxxxx' is the result of the pg_kmgr_wrap function.

I've attached the KMS patch. It requires the OpenSSL library. What the
patch does is simple and is not changed much from the previous version
of the patch that included KMS and TDE: we generate one encryption key
called the master key for the whole database cluster at initdb time,
which is stored in pg_control and wrapped by a key encryption key
(KEK) derived from a user-provided passphrase. When the postmaster
starts up it verifies the correctness of the passphrase provided by
the user using an HMAC key which is also derived from the
user-provided passphrase. The server won't start if the passphrase is
incorrect. Once the passphrase is verified, the master key is loaded
into shared memory and becomes active.
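
A minimal sketch of that verification step (the SHA-512 split used
here to derive the two keys is an assumption for illustration; the
patch's actual derivation may differ):

    #include <openssl/evp.h>
    #include <openssl/hmac.h>
    #include <openssl/sha.h>
    #include <openssl/crypto.h>

    /*
     * Derive a KEK and an HMAC key from the passphrase, then check the
     * stored HMAC of the wrapped master key before accepting it.
     */
    static int
    verify_passphrase(const unsigned char *pass, size_t passlen,
                      const unsigned char *wrapped_key, size_t wrapped_len,
                      const unsigned char expected_hmac[32])
    {
        unsigned char keys[64];        /* [0..31] KEK, [32..63] HMAC key */
        unsigned char mac[32];
        unsigned int  maclen = sizeof(mac);

        SHA512(pass, passlen, keys);
        if (HMAC(EVP_sha256(), keys + 32, 32,
                 wrapped_key, wrapped_len, mac, &maclen) == NULL)
            return 0;
        return CRYPTO_memcmp(mac, expected_hmac, 32) == 0;
    }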

I added two options to initdb: --cluster-passphrase-command and -e,
which take a passphrase command and a cipher algorithm (currently only
aes-128 and aes-256) respectively. The internal KMS is enabled by
executing initdb with those options as follows:

$ initdb -D data --cluster-passphrase-command="echo 'password'" -e aes-256

I believe the internal KMS would be useful in several use cases, but
I'd like to have a discussion around the integration with pgcrypto,
because pgcrypto would be the first user of the KMS and pgcrypto can
be more powerful with the KMS. I'll register this KMS patch to the
next Commit Fest.

I really appreciate the people (CC'd) who participated in the off-list
discussions/meetings for their many inputs, suggestions and code reviews.

Regards,


--
Masahiko Sawada  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment


On Tue, Dec 31, 2019 at 1:05 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
[...]

I believe the internal KMS would be useful in several use cases, but
I'd like to have a discussion around the integration with pgcrypto,
because pgcrypto would be the first user of the KMS and pgcrypto can
be more powerful with the KMS. I'll register this KMS patch to the
next Commit Fest.

It is already there "KMS - Internal key management system" (https://commitfest.postgresql.org/26/2196/).

[...]


--
Ibrar Ahmed
Hello Sawada and all

I would like to elaborate more on Sehrope and Sawada's discussion on passing a NULL IV to the "pg_cipher_encrypt/decrypt"
functions during the kmgr_wrap_key and kmgr_unwrap_key routines in kmgr_utils.c. OpenSSL implements key wrap according to
RFC 3394 as Sawada mentioned, and passing NULL will make OpenSSL use the default IV, which equals A6A6A6A6A6A6A6A6. I
have confirmed this on my side; a key wrapped with a "NULL" IV can be unwrapped successfully with IV=A6A6A6A6A6A6A6A6, and
unwrap will fail if the IV is set to anything else other than NULL or A6A6A6A6A6A6A6A6.

I would like to provide some comments on the encryption and decryption routines provided by cipher_openssl.c, which
cipher.c and kmgr_utils.c are using. I see that "ossl_cipher_encrypt" calls only "EVP_EncryptInit_ex" and "EVP_EncryptUpdate"
to complete the encryption. The same thing applies to the decryption routines. According to my past experience with
OpenSSL and the usages online, it is highly recommended to use the "init-update-final" cycle to complete the encryption,
and I see that the "final" part (EVP_EncryptFinal) is missing. This call will properly handle the last block of data,
especially when padding is taken into account. The functions still work now because the input is an encryption key whose
size is a multiple of each cipher block and no padding is used. I think it will be safer to use the proper
"init-update-final" cycle for encryption/decryption.

According to the OpenSSL EVP documentation, "EVP_EncryptUpdate" can be called multiple times at different offsets of the
input data to be encrypted. I see that "pg_cipher_encrypt" only calls "EVP_EncryptUpdate" once, which makes me assume
that the application should invoke "pg_cipher_encrypt" multiple times until the entire data block is encrypted? I am
asking because if we were to use "EVP_EncryptFinal" to complete the encryption cycle, then it is better to let
"pg_cipher_encrypt" figure out how many times "EVP_EncryptUpdate" should be called and finalize it with
"EVP_EncryptFinal" at the last block.

Lastly, I think we are missing a cleanup routine that calls "EVP_CIPHER_CTX_free()" to free up the EVP_CIPHER_CTX when
encryption is done.
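
A minimal sketch of the full cycle described above, including the
cleanup (illustrative only; the patch's ossl_cipher_encrypt may be
organized differently):

    #include <openssl/evp.h>

    static int
    encrypt_full_cycle(const EVP_CIPHER *cipher,
                       const unsigned char *key, const unsigned char *iv,
                       const unsigned char *in, int inlen,
                       unsigned char *out, int *outlen)
    {
        EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
        int         len = 0;
        int         fin = 0;
        int         ok = ctx != NULL &&
            EVP_EncryptInit_ex(ctx, cipher, NULL, key, iv) &&
            EVP_EncryptUpdate(ctx, out, &len, in, inlen) &&
            EVP_EncryptFinal_ex(ctx, out + len, &fin);  /* handles last block */

        if (ok)
            *outlen = len + fin;
        EVP_CIPHER_CTX_free(ctx);       /* cleanup on success and failure */
        return ok;
    }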
 

Thank you

Cary Huang
HighGo Software Canada
On Sat, 4 Jan 2020 at 15:11, cary huang <hcary328@gmail.com> wrote:
>
> Hello Sawada and all
>
> I would like to elaborate more on Sehrope and Sawada's discussion on passing a NULL IV to the "pg_cipher_encrypt/decrypt"
> functions during the kmgr_wrap_key and kmgr_unwrap_key routines in kmgr_utils.c. OpenSSL implements key wrap according to
> RFC 3394 as Sawada mentioned, and passing NULL will make OpenSSL use the default IV, which equals A6A6A6A6A6A6A6A6. I
> have confirmed this on my side; a key wrapped with a "NULL" IV can be unwrapped successfully with IV=A6A6A6A6A6A6A6A6, and
> unwrap will fail if the IV is set to anything else other than NULL or A6A6A6A6A6A6A6A6.
>

Sehrope also suggested that I not use a fixed IV in order to avoid
getting the same result from the same value. I'm researching it now.
Also, it currently uses the key wrap algorithm[1], which accepts only
a multiple of 8 bytes as input. Since that's not good for some cases,
it's better to use the key wrap with padding algorithm[2] instead,
which seems to be available in OpenSSL 1.1.0 or later.

> I would like to provide some comments on the encryption and decryption routines provided by cipher_openssl.c, which
> cipher.c and kmgr_utils.c are using. I see that "ossl_cipher_encrypt" calls only "EVP_EncryptInit_ex" and "EVP_EncryptUpdate"
> to complete the encryption. The same thing applies to the decryption routines. According to my past experience with
> OpenSSL and the usages online, it is highly recommended to use the "init-update-final" cycle to complete the encryption,
> and I see that the "final" part (EVP_EncryptFinal) is missing. This call will properly handle the last block of data,
> especially when padding is taken into account. The functions still work now because the input is an encryption key whose
> size is a multiple of each cipher block and no padding is used. I think it will be safer to use the proper
> "init-update-final" cycle for encryption/decryption.

Agreed.

>
> According to the OpenSSL EVP documentation, "EVP_EncryptUpdate" can be called multiple times at different offsets of the
> input data to be encrypted. I see that "pg_cipher_encrypt" only calls "EVP_EncryptUpdate" once, which makes me assume
> that the application should invoke "pg_cipher_encrypt" multiple times until the entire data block is encrypted? I am
> asking because if we were to use "EVP_EncryptFinal" to complete the encryption cycle, then it is better to let
> "pg_cipher_encrypt" figure out how many times "EVP_EncryptUpdate" should be called and finalize it with
> "EVP_EncryptFinal" at the last block.

IIUC EVP_EncryptUpdate can encrypt the entire data block.
EVP_EncryptFinal_ex encrypts any data that remains in a partial block.

>
> Lastly, I think we are missing a cleanup routine that calls "EVP_CIPHER_CTX_free()" to free up the EVP_CIPHER_CTX
> when encryption is done.

Right.

While reading the pgcrypto code I thought that it would be better to
change the cryptographic code (cipher.c) so that pgcrypto can use it
instead of having duplicated code. I'm trying to change it accordingly.

[1] https://tools.ietf.org/html/rfc3394
[2] https://tools.ietf.org/html/rfc5649

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



On Mon, Jan 6, 2020 at 4:43 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote:
On Sat, 4 Jan 2020 at 15:11, cary huang <hcary328@gmail.com> wrote:
>>
>> Hello Sawada and all
>>
>> [...]
>>

>Sehrope also suggested me not to use the fixed IV in order to avoid
>getting the same result from the same value. I'm researching it now.
>Also, currently it's using key wrap algorithm[1] but it accepts only
>multiple of 8 bytes as input. Since it's not good for some cases it's
>better to use key wrap with padding algorithm[2] instead, which seems
>available in OpenSSL 1.1.0 or later.

Since the current kmgr only supports AES128 and AES256, the master key generated by kmgr during bootstrap will always be
a multiple of 8 bytes. I believe AES keys in general must always be a multiple of 8. I have not done enough research as
to which encryption algorithms involve keys that are not a multiple of 8, so I think the current key wrap algorithm is
fine. With the key wrap algorithm defined in RFC 3394, the IV is used only as an "initial value" and it has to be
static: either we randomize one or we use the default A6A6A6... by passing NULL. It is different from the CTR block
cipher mode, which has been selected to encrypt WAL and buffers. In CTR mode, each block requires a different and unique
IV as input in order to be secure, and we have agreed to use segment IDs as IVs. For this reason, I think the current
key wrap implementation is fine. The least we can do is to generate an IV during bootstrap and store it in the control
file, and this generated IV would then be used for all key wrapping / unwrapping purposes instead of the default one.

Best regards
Cary Huang
HighGo Software Canada
Hi,

I took a look at this patch. With some additions I think the feature
itself is useful but the patch needs more work. It also doesn't have
any of its own automated tests yet so the testing below was done
manually.

The attached file, kms_v2.patch, is a rebased version of the
kms_v1.patch that fixes some bit rot. It sorts some of the Makefile
additions but otherwise is the original patch. This version applies
cleanly on master and passes make check.

I don't have a Windows machine to test it, but I think the Windows
build files for these changes are missing. The updated
src/common/Makefile has a comment to coordinate updates to
Mkvcbuild.pm but I don't see kmgr_utils.c or cipher_openssl.c
referenced anywhere in there.

The patch adds "pg_kmgr" to the list of files to skip in
pg_checksums.c but there's no additional "pg_kmgr" file written to the
data directory. Perhaps that's from a prior version that saved data to
its own file?

The constant AES128_KEY_LEN is defined in cipher.c but it's not used
anywhere. RE: AES-128, not sure the value of even supporting it for
this feature (v.s. just supporting AES-256). Unlike something like
table data encryption, I'd expect a KMS to be used much less
frequently so any performance boost of AES-128 vs AES-256 would be
meaningless.

The functions pg_cipher_encrypt(...), pg_cipher_decrypt(...), and
pg_compute_HMAC(...) return true if OpenSSL is not configured. Should
that be false? The ctx init functions all return false when not
configured. I don't think that code path would ever be reached as you
would not have a valid context but seems more consistent to have them
all return false.

There's a comment referring to "Encryption keys (TDEK and WDEK)
length" but this feature is only for a KMS so that should be renamed.

The passphrase is hashed to split it into two 32-byte keys but the min
length is only 8-bytes:

    #define KMGR_MIN_PASSPHRASE_LEN 8

... that should be at least 64-bytes to reflect how it's being used
downstream. Depending on the format of the passphrase commands output
it should be even longer (ex: binary data in hex should really be
double that). The overall min should be 64 bytes but maybe add a note
to the docs to explain how the output will be used and the expected
amount of entropy.

In pg_kmgr_wrap(...) it checks that the input is a multiple of 8 bytes:

    if (datalen % 8 != 0)
        ereport(ERROR,
            (errmsg("input data must be multiple of 8 bytes")));

...but after testing it, the OpenSSL key wrap functions it invokes
require a multiple of 16-bytes (block size of AES). Otherwise you get
a generic error:

# SELECT pg_kmgr_wrap('abcd1234'::bytea);
ERROR:  could not wrap the given secret

In ossl_compute_HMAC(...) it refers to AES256_KEY_LEN. Should be
SHA256_HMAC_KEY_LEN (they're both 32-bytes but naming is wrong)

    return HMAC(EVP_sha256(), key, AES256_KEY_LEN, data,
        (uint32) data_size, result, (uint32 *) result_size);

In pg_rotate_encryption_key(...) the error message for short
passphrases should be "at least %d bytes":

    if (passlen < KMGR_MIN_PASSPHRASE_LEN)
        ereport(ERROR,
            (errmsg("passphrase must be more than %d bytes",
            KMGR_MIN_PASSPHRASE_LEN)));

Rotating the passphrase via "SELECT pg_rotate_encryption_key()" and
restarting the server worked (good). Having the server attempt to
start with invalid output from the command gives an error "FATAL:
cluster passphrase does not match expected passphrase" (good).

Round tripping via wrap/unwrap works (good!):

# SELECT convert_from(pg_kmgr_unwrap(pg_kmgr_wrap('abcd1234abcd1234'::bytea)),
'utf8');
   convert_from
------------------
 abcd1234abcd1234
(1 row)

Trying to unwrap gibberish fails (also good!):

# SELECT pg_kmgr_unwrap('\x123456789012345678901234567890123456789012345678');
ERROR:  could not unwrap the given secret

The pg_kmgr_wrap/unwrap functions use EVP_aes_256_wrap()[1] which
implements RFC 5649[2] with the default IVs so they always return the
same value for the same input:

# SELECT x, pg_kmgr_wrap('abcd1234abcd1234abcd1234') FROM
generate_series(1,5) x;
 x |                            pg_kmgr_wrap
---+--------------------------------------------------------------------
 1 | \x51041d1fe52916fd15f456c2b67108473d9bf536795e2b6d4db81c065c8cd688
 2 | \x51041d1fe52916fd15f456c2b67108473d9bf536795e2b6d4db81c065c8cd688
 3 | \x51041d1fe52916fd15f456c2b67108473d9bf536795e2b6d4db81c065c8cd688
 4 | \x51041d1fe52916fd15f456c2b67108473d9bf536795e2b6d4db81c065c8cd688
 5 | \x51041d1fe52916fd15f456c2b67108473d9bf536795e2b6d4db81c065c8cd688
(5 rows)

The IVs should be randomized so that repeated wrap operations give
distinct results. To do that, the output format needs to include the
randomized IV. It need not be secret but it needs to be included in
the wrapped output. Related, IIUC, the wrapping mechanism of RFC5649
does provide some integrity checking but it's only 64-bits (v.s. say
256-bits for a full HMAC-SHA-256).

Rather than use EVP_aes_256_wrap() with its defaults, we can generate
a random IV and have the output be "IV || ENCRYPT(KEY, IV, DATA) ||
HMAC(IV || ENCRYPT(KEY, IV, DATA))". For a fixed length internal input
(ex: the KEK encrypted key stored in pg_control) there's no need for
padding as we're dealing with multiples of 16-bytes (ex: KEK encrypted
enc-key / mac-key would be 64-bytes).
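
A minimal sketch of that output format, under the stated assumptions
(fixed-size internal input, padding disabled; the function and
parameter names are illustrative, not from the patch):

    #include <openssl/evp.h>
    #include <openssl/hmac.h>
    #include <openssl/rand.h>

    /* out = IV || AES-256-CBC(enc_key, IV, data) || HMAC-SHA-256(mac_key, IV || ct) */
    static int
    wrap_key(const unsigned char enc_key[32], const unsigned char mac_key[32],
             const unsigned char *data, int datalen,  /* multiple of 16 */
             unsigned char *out)                      /* 16 + datalen + 32 bytes */
    {
        unsigned char *iv = out;
        unsigned char *ct = out + 16;
        EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
        int         len = 0;
        int         fin = 0;
        unsigned int maclen = 32;
        int         ok = ctx != NULL &&
            RAND_bytes(iv, 16) == 1 &&               /* fresh IV per call */
            EVP_EncryptInit_ex(ctx, EVP_aes_256_cbc(), NULL, enc_key, iv) &&
            EVP_CIPHER_CTX_set_padding(ctx, 0) &&
            EVP_EncryptUpdate(ctx, ct, &len, data, datalen) &&
            EVP_EncryptFinal_ex(ctx, ct + len, &fin) &&
            HMAC(EVP_sha256(), mac_key, 32,
                 out, 16 + datalen, ct + datalen, &maclen) != NULL;

        EVP_CIPHER_CTX_free(ctx);
        return ok;
    }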

It'd also be useful if the user level wrap/unwrap API allowed for
arbitrary sized inputs (not just multiples of 16-byte). Having the
output be in a standard format (i.e. matching OpenSSL's
EVP_aes_256_wrap API) is nice, but as it's meant to be an opaque
interface I think it's fine if the output is not usable outside the
database. I don't see anyone using the wrapped data directly as it's
random bytes without the key. The primary contract for the interface:
"data == unwrap(wrap(data))". This would require enabling padding
which would round up the size of the output to the next 16-bytes.
Adding a prefix byte for a "version" would be nice too as it could be
used to infer the specific cipher/mac combo (Ex: v1 would be
AES256/HMAC-SHA256). I don't think the added size is an issue as
again, the output is opaque. Similar things can also be accomplished
by combining the 16-byte only version with pgcrypto but like this it'd
be usable out of the box without additional extensions.

[1]: https://www.openssl.org/docs/man1.1.1/man3/EVP_aes_256_wrap.html
[2]: https://tools.ietf.org/html/rfc5649

Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/

Attachment
On Sun, 26 Jan 2020 at 01:35, Sehrope Sarkuni <sehrope@jackdb.com> wrote:
>
> Hi,
>
> I took a look at this patch. With some additions I think the feature
> itself is useful but the patch needs more work. It also doesn't have
> any of its own automated tests yet so the testing below was done
> manually.
>
> The attached file, kms_v2.patch, is a rebased version of the
> kms_v1.patch that fixes some bit rot. It sorts some of the Makefile
> additions but otherwise is the original patch. This version applies
> cleanly on master and passes make check.


Thank you for the comments and updating the patch.

>
> I don't have a Windows machine to test it, but I think the Windows
> build files for these changes are missing. The updated
> src/common/Makefile has a comment to coordinate updates to
> Mkvcbuild.pm but I don't see kmgr_utils.c or cipher_openssl.c
> referenced anywhere in there.

Will support Windows building.

>
> The patch adds "pg_kmgr" to the list of files to skip in
> pg_checksums.c but there's no additional "pg_kmgr" file written to the
> data directory. Perhaps that's from a prior version that saved data to
> its own file?

Right, it's unnecessary. Will remove it.

>
> The constant AES128_KEY_LEN is defined in cipher.c but it's not used
> anywhere. RE: AES-128, not sure the value of even supporting it for
> this feature (v.s. just supporting AES-256). Unlike something like
> table data encryption, I'd expect a KMS to be used much less
> frequently so any performance boost of AES-128 vs AES-256 would be
> meaningless.

Ok. I agree to support only AES256 for this feature.

> The functions pg_cipher_encrypt(...), pg_cipher_decrypt(...), and
> pg_compute_HMAC(...) return true if OpenSSL is not configured. Should
> that be false? The ctx init functions all return false when not
> configured. I don't think that code path would ever be reached as you
> would not have a valid context but seems more consistent to have them
> all return false.

Agreed.

>
> There's a comment referring to "Encryption keys (TDEK and WDEK)
> length" but this feature is only for a KMS so that should be renamed.
>

Will remove it.

> The passphrase is hashed to split it into two 32-byte keys but the min
> length is only 8-bytes:
>
>     #define KMGR_MIN_PASSPHRASE_LEN 8
>
> ... that should be at least 64-bytes to reflect how it's being used
> downstream. Depending on the format of the passphrase commands output
> it should be even longer (ex: binary data in hex should really be
> double that). The overall min should be 64-byte but maybe add a note
> to the docs to explain how the output will be used and the expected
> amount of entropy.

Agreed.

>
> In pg_kmgr_wrap(...) it checks that the input is a multiple of 8 bytes:
>
>     if (datalen % 8 != 0)
>         ereport(ERROR,
>             (errmsg("input data must be multiple of 8 bytes")));
>
> ...but after testing it, the OpenSSL key wrap functions it invokes
> require a multiple of 16-bytes (block size of AES). Otherwise you get
> a generic error:
>
> # SELECT pg_kmgr_wrap('abcd1234'::bytea);
> ERROR:  could not wrap the given secret

Thank you for testing it. I will follow your suggestion described below.

>
> In ossl_compute_HMAC(...) it refers to AES256_KEY_LEN. Should be
> SHA256_HMAC_KEY_LEN (they're both 32-bytes but naming is wrong)
>
>     return HMAC(EVP_sha256(), key, AES256_KEY_LEN, data,
>         (uint32) data_size, result, (uint32 *) result_size);

Will fix.

>
> In pg_rotate_encryption_key(...) the error message for short
> passphrases should be "at least %d bytes":
>
>     if (passlen < KMGR_MIN_PASSPHRASE_LEN)
>         ereport(ERROR,
>             (errmsg("passphrase must be more than %d bytes",
>             KMGR_MIN_PASSPHRASE_LEN)));

Agreed. Will fix.

>
> Rotating the passphrase via "SELECT pg_rotate_encryption_key()" and
> restarting the server worked (good). Having the server attempt to
> start with invalid output from the command gives an error "FATAL:
> cluster passphrase does not match expected passphrase" (good).
>
> Round tripping via wrap/unwrap works (good!):
>
> # SELECT convert_from(pg_kmgr_unwrap(pg_kmgr_wrap('abcd1234abcd1234'::bytea)),
> 'utf8');
>    convert_from
> ------------------
>  abcd1234abcd1234
> (1 row)
>
> Trying to unwrap gibberish fails (also good!):
>
> # SELECT pg_kmgr_unwrap('\x123456789012345678901234567890123456789012345678');
> ERROR:  could not unwrap the given secret
>

Thank you for testing!

> The pg_kmgr_wrap/unwrap functions use EVP_aes_256_wrap()[1] which
> implements RFC 5649[2] with the default IVs so they always return the
> same value for the same input:
>
> # SELECT x, pg_kmgr_wrap('abcd1234abcd1234abcd1234') FROM
> generate_series(1,5) x;
>  x |                            pg_kmgr_wrap
> ---+--------------------------------------------------------------------
>  1 | \x51041d1fe52916fd15f456c2b67108473d9bf536795e2b6d4db81c065c8cd688
>  2 | \x51041d1fe52916fd15f456c2b67108473d9bf536795e2b6d4db81c065c8cd688
>  3 | \x51041d1fe52916fd15f456c2b67108473d9bf536795e2b6d4db81c065c8cd688
>  4 | \x51041d1fe52916fd15f456c2b67108473d9bf536795e2b6d4db81c065c8cd688
>  5 | \x51041d1fe52916fd15f456c2b67108473d9bf536795e2b6d4db81c065c8cd688
> (5 rows)
>
> The IVs should be randomized so that repeated wrap operations give
> distinct results. To do that, the output format needs to include the
> randomized IV. It need not be secret but it needs to be included in
> the wrapped output. Related, IIUC, the wrapping mechanism of RFC5649
> does provide some integrity checking but it's only 64-bits (v.s. say
> 256-bits for a full HMAC-SHA-256).
>
> Rather than use EVP_aes_256_wrap() with its defaults, we can generate
> a random IV and have the output be "IV || ENCRYPT(KEY, IV, DATA) ||
> HMAC(IV || ENCRYPT(KEY, IV, DATA))". For a fixed length internal input
> (ex: the KEK encrypted key stored in pg_control) there's no need for
> padding as we're dealing with multiples of 16-bytes (ex: KEK encrypted
> enc-key / mac-key would be 64-bytes).
>
> It'd also be useful if the user level wrap/unwrap API allowed for
> arbitrary sized inputs (not just multiples of 16-byte). Having the
> output be in a standard format (i.e. matching OpenSSL's
> EVP_aes_256_wrap API) is nice, but as it's meant to be an opaque
> interface I think it's fine if the output is not usable outside the
> database. I don't see anyone using the wrapped data directly as it's
> random bytes without the key. The primary contract for the interface:
> "data == unwrap(wrap(data))". This would require enabling padding
> which would round up the size of the output to the next 16-bytes.
> Adding a prefix byte for a "version" would be nice too as it could be
> used to infer the specific cipher/mac combo (Ex: v1 would be
> AES256/HMAC-SHA256). I don't think the added size is an issue as
> again, the output is opaque. Similar things can also be accomplished
> by combining the 16-byte only version with pgcrypto but like this it'd
> be usable out of the box without additional extensions.
>

Thank you for the suggestion. I like it. We can do an integrity check
of the user-input wrapped key by using HMAC when unwrapping. Regarding
the output format, did you mean to use aes-256 rather than
aes-256-key-wrap? I think that DATA in the output is the user-input
key, so it still must be a multiple of 16 bytes if we use
aes-256-key-wrap.

BTW regarding the implementation of the cipher functions using
OpenSSL in src/common, I'm concerned whether we should integrate it
with the openssl.c in pgcrypto. Since pgcrypto with OpenSSL currently
supports aes, des and bf etc., the cipher function code in this patch
has similar functionality. Similarly, when we introduced SCRAM we
moved the sha2 functions from pgcrypto to src/common. I thought we
could move all cipher functions in pgcrypto to src/common, but it
might be overkill because the internal KMS will use only AES with a
256-bit key length as of now.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



On Wed, Jan 29, 2020 at 3:43 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
> Thank you for the suggestion. I like it. We can do an integrity check
> of the user-input wrapped key by using HMAC when unwrapping. Regarding
> the output format, did you mean to use aes-256 rather than
> aes-256-key-wrap? I think that DATA in the output is the user-input
> key, so it still must be a multiple of 16 bytes if we use
> aes-256-key-wrap.

Yes I'm suggesting not using the key wrap functions and instead using
the regular EVP_aes_256_cbc with a random IV per invocation. For
internal usage (e.g. the encrypted key) it does not need padding as we
know the input value would always be a multiple of 16-bytes. That
would allow the internal usage to have a fixed output length of
LEN(IV) + LEN(HMAC) + LEN(DATA) = 16 + 32 + 64 = 112 bytes.

For the user-facing piece, padding would be enabled to support
arbitrary input data lengths. That would make the output length grow
by up to 16 bytes (rounding the data length up to the AES block
size), plus one more byte if a version field is added.
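
As a worked example (illustrative only), the wrapped size for the
user-facing piece would come out to:

    /* PKCS#7 always adds 1..16 bytes, rounding up to the next AES block */
    static int
    wrapped_size(int data_len)
    {
        int     ct_len = (data_len / 16 + 1) * 16;  /* padded ciphertext */

        return 1 /* version */ + 16 /* IV */ + ct_len + 32 /* HMAC */;
    }

So e.g. a 20-byte input would wrap to 1 + 16 + 32 + 32 = 81 bytes.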

> BTW, regarding the implementation of the cipher functions using
> OpenSSL in src/common, I'm concerned about whether we should
> integrate it with openssl.c in pgcrypto. Since pgcrypto with OpenSSL
> currently supports aes, des, bf, etc., the cipher function code in
> this patch has similar functionality. Similarly, when we introduced
> SCRAM we moved the sha2 functions from pgcrypto to src/common. I
> thought we could move all the cipher functions in pgcrypto to
> src/common, but it might be overkill because the internal KMS will
> use only AES with a 256-bit key length for now.

I'd keep the patch smaller and the functions internal to the KMS for
now. Maybe address it again after the patch is complete as it'll be
easier to see overlaps that could be combined.

Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/



On Thu, 30 Jan 2020 at 20:36, Sehrope Sarkuni <sehrope@jackdb.com> wrote:
>
> On Wed, Jan 29, 2020 at 3:43 AM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> > Thank you for the suggestion; I like it. We can do an integrity check
> > of the user-supplied wrapped key by using the HMAC when unwrapping.
> > Regarding the output format, did you mean to use aes-256 rather than
> > aes-256-key-wrap? I think DATA in the output is the user's input key,
> > so it must still be a multiple of 16 bytes if we use
> > aes-256-key-wrap.
>
Yes, I'm suggesting not using the key-wrap functions and instead
using the regular EVP_aes_256_cbc with a random IV per invocation.
For internal usage (e.g. the encrypted key) it does not need padding,
as we know the input value would always be a multiple of 16 bytes.

That makes sense.

> That
> would allow the internal usage to have a fixed output length of
> LEN(IV) + LEN(HMAC) + LEN(DATA) = 16 + 32 + 64 = 112 bytes.

Did you mean LEN(DATA) is 32? DATA will be the internally generated
encryption key for AES256 (the master key).

>
> For the user-facing piece, padding would be enabled to support
> arbitrary input data lengths. That would make the output length grow
> by up to 16 bytes (rounding the data length up to the AES block
> size), plus one more byte if a version field is added.

I think the length of the padding also needs to be added to the
output. Anyway, in the first version the same key wrapping/unwrapping
method is used for both internal use and the user-facing functions,
and the user input key needs to be a multiple of 16 bytes.

>
> > BTW, regarding the implementation of the cipher functions using
> > OpenSSL in src/common, I'm concerned about whether we should
> > integrate it with openssl.c in pgcrypto. Since pgcrypto with OpenSSL
> > currently supports aes, des, bf, etc., the cipher function code in
> > this patch has similar functionality. Similarly, when we introduced
> > SCRAM we moved the sha2 functions from pgcrypto to src/common. I
> > thought we could move all the cipher functions in pgcrypto to
> > src/common, but it might be overkill because the internal KMS will
> > use only AES with a 256-bit key length for now.
>
> I'd keep the patch smaller and the functions internal to the KMS for
> now. Maybe address it again after the patch is complete as it'll be
> easier to see overlaps that could be combined.

Agreed.

BTW, I think this topic is better discussed on a separate thread, as
its scope no longer includes TDE. I'll start a new thread for
introducing the internal KMS.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



On Fri, Jan 31, 2020 at 1:21 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
> On Thu, 30 Jan 2020 at 20:36, Sehrope Sarkuni <sehrope@jackdb.com> wrote:
> > That
> > would allow the internal usage to have a fixed output length of
> > LEN(IV) + LEN(HMAC) + LEN(DATA) = 16 + 32 + 64 = 112 bytes.
>
> Did you mean LEN(DATA) is 32? DATA will be the internally generated
> encryption key for AES256 (the master key).

No, it should be 64 bytes. That way we can have a separate 32-byte
encryption key (for AES256) and a 32-byte MAC key (for HMAC-SHA256).

While it's common to reuse the same 32-byte key for both AES256 and
HMAC-SHA256, and there aren't any known issues with doing so, when
designing something from scratch it's more secure to use entirely
separate keys.
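
For example (names made up for illustration), the internal key
material could simply be two independently generated halves:

    #include <openssl/rand.h>

    typedef struct KmsWrapKeys
    {
        unsigned char enc_key[32];  /* AES256 key */
        unsigned char mac_key[32];  /* HMAC-SHA256 key, independent */
    } KmsWrapKeys;

    /* Both halves come straight from the CSPRNG */
    static int
    generate_wrap_keys(KmsWrapKeys *keys)
    {
        return RAND_bytes((unsigned char *) keys, sizeof(KmsWrapKeys)) == 1;
    }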

> > For the user-facing piece, padding would be enabled to support
> > arbitrary input data lengths. That would make the output length grow
> > by up to 16 bytes (rounding the data length up to the AES block
> > size), plus one more byte if a version field is added.
>
> I think the length of the padding also needs to be added to the
> output. Anyway, in the first version the same key wrapping/unwrapping
> method is used for both internal use and the user-facing functions,
> and the user input key needs to be a multiple of 16 bytes.

A separate length field does not need to be added, as the
padding-enabled output will already include it at the end[1]. This is
handled automatically by the OpenSSL encryption / decryption
operations when padding is enabled.

[1]: https://en.wikipedia.org/wiki/Padding_(cryptography)#PKCS#5_and_PKCS#7
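
Sketching the unwrap side (again, illustrative names only, trimmed
error handling, and assuming the plain IV || ciphertext || HMAC
layout without the version byte), the HMAC is checked before
decrypting and OpenSSL strips the padding itself:

    #include <openssl/crypto.h>
    #include <openssl/evp.h>
    #include <openssl/hmac.h>

    /* in = IV (16) || ciphertext || HMAC (32); returns the unpadded length */
    static int
    unwrap_user(const unsigned char *enc_key, const unsigned char *mac_key,
                const unsigned char *in, int in_len, unsigned char *out)
    {
        int     ct_len = in_len - 16 - 32;
        unsigned char mac[32];
        EVP_CIPHER_CTX *ctx;
        int     len, out_len;

        /* Verify the MAC over IV || ciphertext before decrypting anything */
        HMAC(EVP_sha256(), mac_key, 32, in, 16 + ct_len, mac, NULL);
        if (CRYPTO_memcmp(mac, in + 16 + ct_len, 32) != 0)
            return -1;      /* integrity check failed */

        ctx = EVP_CIPHER_CTX_new();
        EVP_DecryptInit_ex(ctx, EVP_aes_256_cbc(), NULL, enc_key, in);
        /* Padding is left enabled: DecryptFinal strips the PKCS#7 bytes */
        EVP_DecryptUpdate(ctx, out, &len, in + 16, ct_len);
        out_len = len;
        if (EVP_DecryptFinal_ex(ctx, out + len, &len) != 1)
        {
            EVP_CIPHER_CTX_free(ctx);
            return -1;      /* bad padding, likely a wrong key */
        }
        out_len += len;
        EVP_CIPHER_CTX_free(ctx);

        return out_len;
    }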

> BTW, I think this topic is better discussed on a separate thread, as
> its scope no longer includes TDE. I'll start a new thread for
> introducing the internal KMS.

Good idea.

Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/